Pod Disruption Budget (PDB)

If a PDB is set, Karpenter uses an eviction policy with exponential backoff and never force-deletes Pods, so an unsatisfied PDB can block node deletion. A PDB specifies the minimum number of running Pods for a Deployment, ReplicationController, ReplicaSet, or StatefulSet, preventing mass Pod evictions and keeping production applications running smoothly.

In this section, we set a PDB that requires a minimum of 4 running Pods and observe how Karpenter behaves when it runs into this constraint.


First, delete the resources created in the previous section:

kubectl delete deployment inflate
kubectl delete provisioners.karpenter.sh default
kubectl delete awsnodetemplates.karpenter.k8s.aws default

Set environment variables for two AZs:

export AZ1="$AWS_REGION"b
export AZ2="$AWS_REGION"c
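
These exports assume AWS_REGION is already set in your shell. A quick sanity check (sample output shown for us-west-2, the region used later in this section):

echo $AZ1 $AZ2   # e.g. us-west-2b us-west-2c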

Deploy the Provisioner and AWSNodeTemplate. The Provisioner restricts instances to fewer than 3 vCPUs and places them in AZ2:

mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> pdb-provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # References cloud provider-specific custom resource, see your cloud provider specific documentation
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30

  # Labels are arbitrary key-values that are applied to all nodes
  labels:
    eks-immersion-team: my-team

  # Requirements that constrain the parameters of provisioned nodes.
  # These requirements are combined with pod.spec.affinity.nodeAffinity rules.
  # Operators { In, NotIn, Gt, Lt } are supported to enable including or excluding values
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m"]
    - key: "karpenter.k8s.aws/instance-cpu" 
      operator: Lt
      values: ["3"] 
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["$AZ2"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "20"

---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    alpha.eksctl.io/cluster-name: ${CLUSTER_NAME}
  securityGroupSelector:
    aws:eks:cluster-name: ${CLUSTER_NAME}
  tags:
    managed-by: "karpenter"
    intent: "apps"
EoF

kubectl apply -f pdb-provisioner.yaml
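
To confirm that both objects were created, you can query them directly:

kubectl get provisioners.karpenter.sh default
kubectl get awsnodetemplates.karpenter.k8s.aws default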

Deploy the PDB and Application

First, create a PDB that matches the app: inflate label. Whenever fewer than 4 matching Pods would remain available, it disallows further evictions, which blocks node reclamation:

cd ~/environment/karpenter
cat <<EoF> pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: inflate
EoF

kubectl apply -f pdb.yaml      
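
You can inspect the PDB's status at any time. Before the application is deployed there are no matching Pods, so ALLOWED DISRUPTIONS is 0 (output is illustrative):

$ kubectl get pdb inflate-pdb
NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
inflate-pdb   4               N/A               0                     5s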

Create a Deployment with 6 replicas; each container requests 1Gi of memory and 1 CPU:

cd ~/environment/karpenter
cat <<EoF> pdb-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 6
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: app
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        resources:
          requests:
            memory: 1Gi
            cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f pdb-deploy.yaml
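
Karpenter has to launch new nodes before the Pods can schedule, so allow a minute or two. You can wait for the rollout to finish and then list the Pods:

kubectl rollout status deployment/inflate
kubectl get pods -l app=inflate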

Test the PDB

Because the Provisioner only allows instances with fewer than 3 vCPUs, each node has room for just one 1-CPU Pod, so Karpenter creates 6 nodes in AZ2.

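One way to confirm this from the command line (the -L flag adds a column showing each node's zone):

kubectl get nodes -l eks-immersion-team=my-team -L topology.kubernetes.io/zone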

Running the following command also confirms that the 6 Pods land on 6 different nodes:

$ kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name
NODE                                           NAME
ip-192-168-49-6.us-west-2.compute.internal     inflate-754f46b654-4bl42
ip-192-168-141-71.us-west-2.compute.internal   inflate-754f46b654-62mld
ip-192-168-128-83.us-west-2.compute.internal   inflate-754f46b654-8h7bl
ip-192-168-133-59.us-west-2.compute.internal   inflate-754f46b654-gwjj5
ip-192-168-49-199.us-west-2.compute.internal   inflate-754f46b654-h7qd4
ip-192-168-52-131.us-west-2.compute.internal   inflate-754f46b654-k669m

Now let's drain one of the nodes from the EKS cluster. Karpenter allows the eviction and creates a replacement node:

kubectl drain --ignore-daemonsets $(kubectl get nodes -l "eks-immersion-team" -o name | tail -n1)

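While the drain is in progress, you can watch the replacement node appear in another terminal:

kubectl get nodes -l eks-immersion-team=my-team -w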

Check the Karpenter logs:

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter

The logs show that Karpenter first brings up a new machine and only then deletes the drained one.


Now let's try draining three nodes at once. Because the number of available Pods would then fall below the PDB's minimum of 4, the eviction is rejected with an error; see the sketch below.

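The exact command isn't shown above; a minimal sketch that extends the earlier single-node drain, draining the last three labeled nodes one after another, would be:

for node in $(kubectl get nodes -l "eks-immersion-team" -o name | tail -n3); do
  kubectl drain --ignore-daemonsets "$node"
done

The rejected evictions surface as "Cannot evict pod as it would violate the pod's disruption budget" errors, and kubectl keeps retrying them.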

Draining the first two nodes goes through without issue, but the third one errors out. After several retries, once the PDB is satisfied again (Karpenter has brought up new nodes and the Pods have been rescheduled onto them), the third node is finally drained.
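
If you want to see this from the PDB's point of view, watch ALLOWED DISRUPTIONS drop to 0 and then recover as the replacement Pods become Ready:

kubectl get pdb inflate-pdb -w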

Eventually the cluster returns to a balanced state. Along the way, Karpenter first launched two nodes (while the two drained nodes were terminated), then launched one more (while the last drained node was terminated).

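Once everything settles, re-running the earlier command should again show 6 Pods spread across 6 (new) nodes:

kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name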