Pod Affinity

Pod affinity is the ability to schedule a new pod onto a node that already runs one or more pods matching certain criteria. Karpenter honors pod affinity rules declared on pods and takes them into account when provisioning nodes. In this section we will use podAffinity to ensure that frontend pods are deployed in the same availability zone as backend pods. The example here shows podAffinity, but podAntiAffinity works the same way (it allows new pods to be scheduled onto nodes that do not run any pods matching the criteria).
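
For contrast, a podAntiAffinity stanza uses exactly the same structure; a minimal sketch (the app: backend selector and zone topology key below are illustrative, mirroring the labels used later in this section):

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: backend
            topologyKey: topology.kubernetes.io/zone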

First, delete the resources created in the previous sections:

kubectl delete deployment inflate
kubectl delete nodepools.karpenter.sh default
kubectl delete ec2nodeclasses.karpenter.k8s.aws default
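
Both of the following commands should return no resources once the deletion completes (assuming nothing else in the cluster created NodePools or EC2NodeClasses):

kubectl get nodepools.karpenter.sh
kubectl get ec2nodeclasses.karpenter.k8s.aws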

Deploy the NodePool:

mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> podaffinity.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  limits:
    cpu: "20"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      # Requirements that constrain the parameters of provisioned nodes.
      # These requirements are combined with pod.spec.affinity.nodeAffinity rules.
      # Operators { In, NotIn } are supported to enable including or excluding values
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
          operator: In
          values: ["on-demand"]
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  securityGroupSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  subnetSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    intent: apps
    managed-by: karpenter
EoF

kubectl apply -f podaffinity.yaml
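
You can confirm that both resources exist before continuing:

kubectl get nodepools.karpenter.sh default
kubectl get ec2nodeclasses.karpenter.k8s.aws default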

Deploy the applications

export AZ1="$AWS_REGION"b
export AZ2="$AWS_REGION"c
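
Verify that both variables resolved (this assumes AWS_REGION is already exported and that your region has availability zones with the b and c suffixes):

echo $AZ1 $AZ2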

We will deploy two applications, backend and inflate. The backend application uses nodeAffinity to require placement in AZ2, while inflate requires AZ1:

cd ~/environment/karpenter
cat <<EoF> nodeaffinity-pod-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      affinity: 
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "topology.kubernetes.io/zone"
                  operator: "In"
                  values: ["$AZ2"]
      terminationGracePeriodSeconds: 0
      containers:
        - name: backend
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      affinity: 
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "topology.kubernetes.io/zone"
                  operator: "In"
                  values: ["$AZ1"]
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f nodeaffinity-pod-deploy.yaml
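
While Karpenter provisions the new nodes, you can follow the pods as they get scheduled (press Ctrl+C to stop watching):

kubectl get pods -o wide -w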

Karpenter will then launch one node in each of the two AZs.


Each of the two nodes runs two pods:

kongpingfan:~/environment/karpenter $  kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name
NODE                                            NAME
ip-192-168-130-42.us-west-2.compute.internal    backend-7fb8544cf9-6n5mq
ip-192-168-130-42.us-west-2.compute.internal    backend-7fb8544cf9-z547g
ip-192-168-189-213.us-west-2.compute.internal   inflate-7c56688b5d-bwm76
ip-192-168-189-213.us-west-2.compute.internal   inflate-7c56688b5d-cwmxz
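
To confirm which AZ each node landed in, print the zone label on the Karpenter-provisioned nodes (the -L flag adds the label value as an extra column):

kubectl get nodes -L topology.kubernetes.io/zone -l eks-immersion-team=my-team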

Test podAffinity

Now deploy the frontend application. The frontend pods need to run in the same AZ as the backend pods (to minimize cross-AZ traffic), so we add a podAffinity rule to the deployment:

cd ~/environment/karpenter
cat <<EoF> podaffinity-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: backend
            topologyKey: topology.kubernetes.io/zone 
      terminationGracePeriodSeconds: 0
      containers:
        - name: frontend
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f podaffinity-deploy.yaml
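
You can watch the frontend pods being placed (one should fit on the existing backend node, and the other should trigger a new node in the same AZ):

kubectl get pods -l app=frontend -o wide -w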

All frontend pods end up in the same AZ as the backend pods, and they are spread across two nodes (the first node no longer has enough free capacity, so Karpenter creates an additional node):


kongpingfan:~/environment/karpenter $  kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name
NODE                                            NAME
ip-192-168-130-42.us-west-2.compute.internal    backend-7fb8544cf9-6n5mq
ip-192-168-130-42.us-west-2.compute.internal    backend-7fb8544cf9-z547g
ip-192-168-36-213.us-west-2.compute.internal    frontend-8547476cdc-krdtn
ip-192-168-130-42.us-west-2.compute.internal    frontend-8547476cdc-tv59n
ip-192-168-189-213.us-west-2.compute.internal   inflate-7c56688b5d-bwm76
ip-192-168-189-213.us-west-2.compute.internal   inflate-7c56688b5d-cwmxz
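
To double-check that every frontend pod shares an AZ with the backend pods, print the zone of each node hosting a frontend pod (a small helper loop, not part of the original workshop):

for node in $(kubectl get pods -l app=frontend -o jsonpath='{.items[*].spec.nodeName}'); do
  kubectl get node "$node" -L topology.kubernetes.io/zone --no-headers
done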