If a PDB is configured, Karpenter evicts pods using an exponential-backoff eviction strategy and never force-deletes them, so the PDB can block the removal of a node. A PDB specifies the minimum number of pods that must stay running for a Deployment, ReplicationController, ReplicaSet, or StatefulSet, preventing pods from being evicted in bulk and keeping production applications running smoothly.
In this section we set a PDB that requires a minimum of 4 running pods and observe how Karpenter behaves when it runs into that constraint.
First, delete the resources created in the previous section:
kubectl delete deployment inflate
kubectl delete deployment backend
kubectl delete deployment frontend
kubectl delete nodepools.karpenter.sh default
kubectl delete ec2nodeclasses.karpenter.k8s.aws default
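(Optional) Before continuing, you can confirm that the old resources are gone:
# the inflate/backend/frontend deployments and the default NodePool/EC2NodeClass should no longer be listed
kubectl get deployments
kubectl get nodepools.karpenter.sh,ec2nodeclasses.karpenter.k8s.aws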
Set environment variables for the two AZs:
export AZ1="$AWS_REGION"b
export AZ2="$AWS_REGION"c
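(Optional) Double-check that the variables resolve to valid Availability Zone names:
echo "AZ1=$AZ1 AZ2=$AZ2"
# should list both zones; an empty result means a zone name is not valid in this region
aws ec2 describe-availability-zones --filters Name=zone-name,Values="$AZ1","$AZ2" --query 'AvailabilityZones[].ZoneName' --output text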
Deploy a NodePool that restricts instances to fewer than 3 vCPUs and provisions them only in AZ2:
mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> pdb-nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
    expireAfter: Never
  limits:
    cpu: "20"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: Lt
          values: ["3"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["${AZ2}"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
          operator: In
          values: ["on-demand"]
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  subnetSelectorTerms:
    - tags:
        alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    intent: apps
    managed-by: karpenter
EoF
kubectl apply -f pdb-nodepool.yaml
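(Optional) A quick check that both objects were accepted:
kubectl get nodepools.karpenter.sh default
kubectl get ec2nodeclasses.karpenter.k8s.aws default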
First create a PDB that matches the app: inflate label and forbids further evictions whenever fewer than 4 of those pods would remain available, which in turn blocks node reclamation:
cd ~/environment/karpenter
cat <<EoF> pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: inflate
EoF
kubectl apply -f pdb.yaml
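(Optional) Inspecting the PDB at this point shows MIN AVAILABLE set to 4 and ALLOWED DISRUPTIONS at 0, since no pods with the app: inflate label exist yet:
kubectl get pdb inflate-pdb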
Create 6 replicas, with each container requesting 1Gi of memory and 1 CPU:
cd ~/environment/karpenter
cat <<EoF> pdb-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 6
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: app
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              memory: 1Gi
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF
kubectl apply -f pdb-deploy.yaml
You can see that Karpenter has launched 6 nodes in AZ2.
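(Optional) You can also confirm this from the command line by listing the nodes that carry the NodePool's label together with their zone:
kubectl get nodes -l eks-immersion-team=my-team -L topology.kubernetes.io/zone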
Running the command below also confirms that the 6 pods are spread across 6 different nodes:
kongpingfan:~/environment/karpenter $ kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name
NODE                                           NAME
ip-192-168-49-6.us-west-2.compute.internal     inflate-754f46b654-4bl42
ip-192-168-141-71.us-west-2.compute.internal   inflate-754f46b654-62mld
ip-192-168-128-83.us-west-2.compute.internal   inflate-754f46b654-8h7bl
ip-192-168-133-59.us-west-2.compute.internal   inflate-754f46b654-gwjj5
ip-192-168-49-199.us-west-2.compute.internal   inflate-754f46b654-h7qd4
ip-192-168-52-131.us-west-2.compute.internal   inflate-754f46b654-k669m
Now let's drain one of these nodes from the EKS cluster. The eviction is allowed and Karpenter creates a replacement node, because 5 of the 6 pods remain available, which is still above the PDB minimum of 4:
kubectl drain --ignore-daemonsets $(kubectl get nodes -l "eks-immersion-team" -o name | tail -n1)
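(Optional) While the drain runs, you can watch the PDB and the pods to confirm that at least 4 pods stay available and that the evicted pod is rescheduled onto the replacement node:
kubectl get pdb inflate-pdb
kubectl get pods -l app=inflate -o wide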
Now let's try draining three nodes at once. Because the number of available pods would drop below the PDB minimum of 4, the drain reports an error:
kubectl drain --ignore-daemonsets $(kubectl get nodes -l "eks-immersion-team" -o name | tail -n3)
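(Optional) Watching the PDB in another terminal shows ALLOWED DISRUPTIONS drop to 0 once only 4 pods are still available, which is why the third eviction keeps being retried:
kubectl get pdb inflate-pdb -w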
Draining the first two nodes goes through without issue, but the third one fails with an error. After several retries, once the PDB is satisfied again (Karpenter has brought up new nodes and the pods have been rescheduled onto them), the third node is finally drained.
The cluster eventually returns to a balanced state, but during this process Karpenter first launched two nodes (while the two drained nodes were removed together), and then launched one more (while the last node was removed).
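(Optional) To follow the rebalancing, watch the nodes, and in another terminal the pods; node names and timing will differ in your environment:
kubectl get nodes -l eks-immersion-team=my-team -w
kubectl get pods -l app=inflate -o wide -w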