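If the AMI used on a node does not match the AMI ID set on the AWSNodeTemplate, Karpenter automatically annotates the node as drifted with karpenter.sh/voluntary-disruption: "drifted". Once a node is marked as drifted, Karpenter automatically evicts its pods and terminates the node, unless a PDB blocks the eviction or a pod carries the annotation karpenter.sh/do-not-evict: "true".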
In this section, we will learn how to enable Drift and test it by updating the AMI ID.
First, delete the resources created earlier:
kubectl delete deployment inflate
kubectl delete provisioners.karpenter.sh default
kubectl delete awsnodetemplates.karpenter.k8s.aws default
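Karpenter does not enable drift detection by default. You can confirm this in the karpenter-global-settings ConfigMap, where the featureGates.driftEnabled key is set to "false".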
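One way to read the current value from the command line is sketched below. The helper name drift_gate_value is made up for illustration, and the key featureGates.driftEnabled is the one this section uses when turning the feature off again.

```shell
# Print the current value of the drift feature gate.
# Assumption: the gate lives under the ConfigMap key featureGates.driftEnabled.
# The dot inside the key is escaped so that JSONPath treats it as one field name.
drift_gate_value() {
  kubectl get configmap karpenter-global-settings -n karpenter \
    -o jsonpath='{.data.featureGates\.driftEnabled}'
}
```

Calling drift_gate_value should print the value currently stored under that key.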
Edit this ConfigMap and change the value of featureGates.driftEnabled to "true":
kubectl edit configmap -n karpenter karpenter-global-settings
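If you prefer not to open an editor, the same change can be made with a merge patch. This is a minimal sketch: set_drift_gate is a hypothetical helper name, and it assumes the same featureGates.driftEnabled key.

```shell
# Set the drift feature gate without opening an editor.
# $1 is the desired value: "true" or "false".
# ConfigMap values are strings, so the value is quoted inside the JSON patch.
set_drift_gate() {
  kubectl patch configmap karpenter-global-settings -n karpenter \
    --type merge \
    -p "{\"data\":{\"featureGates.driftEnabled\":\"$1\"}}"
}
```

For example, set_drift_gate true enables the gate and set_drift_gate false disables it. The Karpenter restart is still required either way.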
After the change, restart the Karpenter pods for it to take effect:
kubectl rollout restart deploy karpenter -n karpenter
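The EKS cluster is currently running version 1.25. We will test as follows:
1. Configure an AWSNodeTemplate that uses the 1.24 AMI, and have the Provisioner reference it.
2. Create another AWSNodeTemplate that uses the 1.25 AMI, then update the Provisioner to reference it.
First, save the ID of the 1.24 image in an environment variable: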
export AMI_OLD=$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.24/amazon-linux-2/recommended/image_id --region $AWS_REGION --query "Parameter.Value" --output text)
echo 1.24=$AMI_OLD
Create a Node Template based on this image version, named oldnode:
cd ~/environment/karpenter
cat << EOF > oldnode_template.yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: oldnode
spec:
  amiSelector:
    aws::ids: $AMI_OLD
  subnetSelector:
    alpha.eksctl.io/cluster-name: ${CLUSTER_NAME}
  securityGroupSelector:
    kubernetes.io/cluster/${CLUSTER_NAME}: owned
  tags:
    managed-by: "karpenter"
    intent: "apps"
EOF
kubectl create -f oldnode_template.yaml
Create a Provisioner based on this NodeTemplate:
mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF > provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef: # references the oldnode template
    name: oldnode
  labels:
    eks-immersion-team: my-team
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m", "r"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "1000"
      memory: 1000Gi
  consolidation:
    enabled: true
EoF
kubectl apply -f provisioner.yaml
Deploy the application. Karpenter will launch a node based on the 1.24 image:
cd ~/environment/karpenter
cat <<EoF > drift-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF
kubectl apply -f drift-deploy.yaml
Run the following command to confirm that the node that came up is running the expected version:
kubectl get nodes -l eks-immersion-team=my-team
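The VERSION column shows the kubelet version, not the AMI itself. If you also want to compare the node's AMI with $AMI_OLD, a sketch like the following could be used. Both function names are hypothetical; the code assumes the node's spec.providerID has the usual AWS form aws:///<availability-zone>/<instance-id> and that $AWS_REGION is set as in the earlier commands.

```shell
# Extract the EC2 instance ID from a node's spec.providerID by
# stripping everything up to and including the last slash.
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Print the AMI ID that the given node ($1 = node name) was launched from.
node_ami() {
  provider_id=$(kubectl get node "$1" -o jsonpath='{.spec.providerID}')
  aws ec2 describe-instances \
    --instance-ids "$(instance_id_from_provider_id "$provider_id")" \
    --region "$AWS_REGION" \
    --query 'Reservations[0].Instances[0].ImageId' --output text
}
```

Running node_ami with the node name from the previous command should print an AMI ID that can be compared with the output of echo $AMI_OLD.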
Next, retrieve the ID of the new-version AMI:
export AMI_NEW=$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.25/amazon-linux-2/recommended/image_id --region $AWS_REGION --query "Parameter.Value" --output text)
echo 1.25=$AMI_NEW
Based on this version, create a new Node Template:
cd ~/environment/karpenter
cat << EOF > newnode_template.yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: newnode
spec:
  amiSelector:
    aws::ids: $AMI_NEW
  subnetSelector:
    alpha.eksctl.io/cluster-name: ${CLUSTER_NAME}
  securityGroupSelector:
    kubernetes.io/cluster/${CLUSTER_NAME}: owned
  tags:
    managed-by: "karpenter"
    intent: "apps"
EOF
kubectl create -f newnode_template.yaml
Update the Provisioner's Node Template reference from oldnode to newnode:
kubectl edit provisioner default
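The same update can be scripted instead of edited by hand. The sketch below uses a merge patch on spec.providerRef.name, the field set in provisioner.yaml above; set_provider_ref is a hypothetical helper name.

```shell
# Point a Provisioner at a different AWSNodeTemplate without opening an editor.
# $1 = Provisioner name, $2 = AWSNodeTemplate name.
# Custom resources do not support strategic merge patches, hence --type merge.
set_provider_ref() {
  kubectl patch provisioner "$1" --type merge \
    -p "{\"spec\":{\"providerRef\":{\"name\":\"$2\"}}}"
}
```

In this walkthrough the call would be set_provider_ref default newnode.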
Run the following command to check the node status:
kubectl get nodes -l eks-immersion-team=my-team
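The original node is first marked Ready,SchedulingDisabled. Karpenter then launches a new node running version 1.25 and finally removes the old node. This shows that Karpenter's Drift detection is working.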
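While the replacement is in progress, you can also look for the drift annotation on the old node. The following is a sketch only: show_drift_annotation is a hypothetical helper name, and it assumes the annotation key karpenter.sh/voluntary-disruption.

```shell
# List each matching node with the value of its drift annotation, if any.
# Dots in the annotation key are escaped so JSONPath reads it as one field.
show_drift_annotation() {
  kubectl get nodes -l eks-immersion-team=my-team \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.karpenter\.sh/voluntary-disruption}{"\n"}{end}'
}
```

A node that Karpenter has flagged should show a value next to its name, while a freshly launched node should show an empty second column.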
When finished, turn Drift detection off again:
kubectl edit configmap -n karpenter karpenter-global-settings # set featureGates.driftEnabled: "false"
Restart the Karpenter pods:
kubectl rollout restart deploy karpenter -n karpenter
Delete both NodeTemplates and the Provisioner:
kubectl delete awsnodetemplate oldnode
kubectl delete awsnodetemplate newnode
kubectl delete provisioner default
Delete the application:
kubectl delete deployment inflate