Pod Disruption Budget (PDB)

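When a PDB is in place, Karpenter evicts pods using a retry strategy with exponential backoff. Pods are never force-deleted, so a PDB that cannot be satisfied blocks the node from being removed. A PDB specifies the minimum number of pods that must stay running for a Deployment, ReplicationController, ReplicaSet, or StatefulSet. This prevents pods from being evicted en masse and keeps production applications running smoothly.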
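To make the retry behavior concrete, here is a minimal sketch of how wait times grow under exponential backoff. The function name backoff_delay, the 1-second base, and the 60-second cap are illustrative assumptions, not Karpenter's actual parameters.

```shell
# Illustrative only: the base delay and cap are assumptions, not Karpenter's real values.
# Usage: backoff_delay FAILED_ATTEMPTS [BASE_SECONDS] [CAP_SECONDS]
backoff_delay() {
  attempt=$1
  delay=${2:-1}
  cap=${3:-60}
  i=0
  # Double the delay once per failed attempt, stopping once the cap is reached.
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  # Clamp the result so it never exceeds the cap.
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}
```

With these assumed defaults, backoff_delay 3 prints 8, and a large attempt count settles at the 60-second cap.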

In this section we set a PDB that requires at least 4 running pods and observe how Karpenter behaves when a drain conflicts with it.


First, delete the resources created in the previous section:

kubectl delete deployment inflate
kubectl delete deployment backend
kubectl delete deployment frontend
kubectl delete nodepools.karpenter.sh default
kubectl delete ec2nodeclasses.karpenter.k8s.aws default

Set environment variables for two AZs:

export AZ1="$AWS_REGION"b
export AZ2="$AWS_REGION"c
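The manifest in the next step is written with an unquoted heredoc, so the shell substitutes ${AZ2} and $CLUSTER_NAME when the file is generated, and an unset variable would silently produce an empty value. A small guard like the one below can catch that beforehand. The helper name require_vars is hypothetical, and the sketch assumes CLUSTER_NAME was exported earlier in the workshop.

```shell
# Hypothetical helper: succeed only if every named variable is set and non-empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running require_vars AWS_REGION AZ2 CLUSTER_NAME before the cat command returns a non-zero status and names each variable that still needs to be exported.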

Deploy a NodePool (together with its EC2NodeClass) that restricts instances to fewer than 3 vCPUs and places them in AZ2:

mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> pdb-nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
    expireAfter: Never
  limits:
    cpu: "20"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: Lt
          values: ["3"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["${AZ2}"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
          operator: In
          values: ["on-demand"]
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  securityGroupSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  subnetSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    intent: apps
    managed-by: karpenter
EoF

kubectl apply -f pdb-nodepool.yaml

Deploy the PDB and the application

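First create a PDB that selects pods labeled app: inflate. It rejects any eviction that would leave fewer than 4 of those pods available, which in turn blocks node reclamation: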

cd ~/environment/karpenter
cat <<EoF> pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: inflate
EoF

kubectl apply -f pdb.yaml
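To see how much headroom the budget leaves at any point, the PDB's status can be queried. The sketch below wraps that query in a function whose name, pdb_disruptions_allowed, is made up for illustration; .status.disruptionsAllowed is a field of the policy/v1 PodDisruptionBudget status. Until the Deployment from the next step is running, expect the value to be 0.

```shell
# Hypothetical wrapper: print how many voluntary evictions the named PDB currently permits.
pdb_disruptions_allowed() {
  kubectl get pdb "$1" -o jsonpath='{.status.disruptionsAllowed}'
}
```

Once all 6 replicas are ready, pdb_disruptions_allowed inflate-pdb should print 2 under these settings (6 healthy pods minus the 4 required).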

Create a Deployment with 6 replicas. Each container requests 1 Gi of memory and 1 CPU:

cd ~/environment/karpenter
cat <<EoF> pdb-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 6
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: app
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        resources:
          requests:
            memory: 1Gi
            cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f pdb-deploy.yaml

Test the PDB

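Karpenter creates 6 nodes in AZ2 (one per pod, because an instance with fewer than 3 vCPUs cannot fit two pods that each request a full CPU once system reservations are subtracted):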

image-20231028223619269

The following command also confirms that the 6 pods run on 6 different nodes:

kongpingfan:~/environment/karpenter $ kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name
NODE                                           NAME
ip-192-168-49-6.us-west-2.compute.internal     inflate-754f46b654-4bl42
ip-192-168-141-71.us-west-2.compute.internal   inflate-754f46b654-62mld
ip-192-168-128-83.us-west-2.compute.internal   inflate-754f46b654-8h7bl
ip-192-168-133-59.us-west-2.compute.internal   inflate-754f46b654-gwjj5
ip-192-168-49-199.us-west-2.compute.internal   inflate-754f46b654-h7qd4
ip-192-168-52-131.us-west-2.compute.internal   inflate-754f46b654-k669m
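Counting by eye works for 6 pods, but the same output can be summarized for larger deployments. This sketch reads the two-column output above on stdin and prints the number of distinct nodes. It skips the header line and any unscheduled pods, which custom-columns renders as <none>. The function name count_distinct_nodes is an assumption.

```shell
# Hypothetical helper: count distinct node names in column 1 of the piped output.
count_distinct_nodes() {
  awk 'NR > 1 && $1 != "" && $1 != "<none>" { seen[$1] = 1 }
       END { n = 0; for (node in seen) n++; print n }'
}
```

It can be used by piping the earlier command into it: kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name | count_distinct_nodes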

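Now drain one of the nodes in the EKS cluster. The eviction is allowed, and Karpenter creates a new node for the displaced pod. The reason is that 5 of the 6 pods are still running, which is above the PDB minimum of 4: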

kubectl drain --ignore-daemonsets $(kubectl get nodes -l "eks-immersion-team" -o name | tail -n1)

image-20231028223924950

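Next, try to drain three nodes at once. Because this would drop the number of running pods below the PDB minimum of 4, the command reports an error: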

kubectl drain --ignore-daemonsets $(kubectl get nodes -l "eks-immersion-team" -o name | tail -n3)

image-20240803223828023

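Draining the first two nodes succeeds, but the third fails. After a few retries the PDB is satisfied again (Karpenter has launched new nodes and the pods have been rescheduled), and the third node is finally drained.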
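This sequence follows from simple arithmetic: an eviction is allowed only while the number of healthy pods exceeds minAvailable. A minimal sketch of that rule, using a hypothetical function name and ignoring details such as pod readiness timing:

```shell
# Hypothetical helper: evictions a minAvailable-style PDB would permit right now.
# Usage: evictions_allowed HEALTHY_PODS MIN_AVAILABLE
evictions_allowed() {
  allowed=$(( $1 - $2 ))
  # A budget that is already violated permits no evictions, never a negative number.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

evictions_allowed 6 4   # all replicas healthy
evictions_allowed 4 4   # after two evictions, before replacements are ready
```

The first call prints 2 and the second prints 0, which matches the behavior seen here: two drains go through, and the third has to wait until replacement pods become healthy.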

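The cluster eventually returns to a steady state. Along the way, Karpenter first launched two nodes (while two were being removed) and then launched one more (while the last one was removed):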

image-20231028224907152