This section introduces how Karpenter provisions Spot instances and how it handles Spot interruption events.
First, delete the resources created earlier:
kubectl delete deployment inflate
kubectl delete provisioners.karpenter.sh default
kubectl delete awsnodetemplates.karpenter.k8s.aws default
Deploy a Provisioner whose karpenter.sh/capacity-type is spot:
mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF > spot.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef:
    name: default
  labels:
    eks-immersion-team: my-team
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m", "r"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: "1000"
      memory: 1000Gi
  consolidation:
    enabled: true
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    alpha.eksctl.io/cluster-name: ${CLUSTER_NAME}
  securityGroupSelector:
    aws:eks:cluster-name: ${CLUSTER_NAME}
  tags:
    managed-by: "karpenter"
    intent: "apps"
EoF
kubectl apply -f spot.yaml
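To confirm both resources were accepted, you can list them (an illustrative check; the exact output depends on your cluster):

```shell
# Both objects should appear, with an AGE of a few seconds
kubectl get provisioners.karpenter.sh default
kubectl get awsnodetemplates.karpenter.k8s.aws default
```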
Deploy an application with two replicas:
cd ~/environment/karpenter
cat <<EOF > spot-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        intent: apps
        karpenter.sh/capacity-type: spot
        eks-immersion-team: my-team
      containers:
        - image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          name: inflate
          resources:
            requests:
              cpu: "1"
              memory: 256M
EOF
kubectl apply -f spot-deploy.yaml
Karpenter will launch a Spot instance to run the pods:
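To verify the capacity type of the new node, you can print its karpenter.sh/capacity-type label (an illustrative check; the column should show spot):

```shell
# -L adds the label value as a column in the output
kubectl get nodes -L karpenter.sh/capacity-type -l eks-immersion-team=my-team
```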
Karpenter natively supports Spot interruption events: it watches for incoming reclaim notices and, once one is detected, automatically drains and terminates the affected node.
ec2-spot-interrupter can be used to trigger Spot reclaim events; we will use it for testing.
Install ec2-spot-interrupter:
wget https://github.com/aws/amazon-ec2-spot-interrupter/releases/download/v0.0.10/ec2-spot-interrupter_0.0.10_Linux_amd64.tar.gz
tar -xzvf ec2-spot-interrupter_0.0.10_Linux_amd64.tar.gz
First, get the instance ID of the Spot instance created above:
export NODE_NAME=$(kubectl get nodes -l "eks-immersion-team" -o name | cut -d/ -f2)
echo $NODE_NAME
export NODE_ID=$(aws ec2 describe-instances --query "Reservations[].Instances[?PrivateDnsName == '${NODE_NAME}'].InstanceId" --output text)
echo $NODE_ID
Run ec2-spot-interrupter; it sends a reclaim notice. A Spot instance has a two-minute window between receiving the notice and actually being reclaimed:
./ec2-spot-interrupter --instance-ids $NODE_ID
After receiving the reclaim notice, Karpenter first drains the Spot node:
At the same time, it launches a new Spot instance and schedules the pods onto it:
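You can follow this sequence in Karpenter's controller logs (an illustrative command; the namespace and label selector assume the Helm install shown later in this section):

```shell
# Interruption-related entries show the SQS message being received and the node being drained
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=200 | grep -i interruption
```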
How does Karpenter receive Spot interruption events? When we created Karpenter at the beginning, several EventBridge rules and an SQS queue were created:
Let's examine the last rule; the event it matches is the Spot reclaim notice:
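For reference, the event pattern of such a rule typically looks like the following (a sketch based on the standard EC2 Spot interruption event; the rule created in your account may differ slightly):

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Spot Instance Interruption Warning"]
}
```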
When such an event occurs, it is sent to the SQS queue:
Karpenter listens on this SQS queue, reading and processing new messages as they arrive. When Karpenter was created in the first chapter, its role was already granted permission to access SQS:
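The relevant statement in the controller's IAM policy looks roughly like this (a sketch; the queue ARN is a placeholder, substitute the ARN of your interruption queue):

```json
{
  "Effect": "Allow",
  "Action": [
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
    "sqs:GetQueueUrl",
    "sqs:ReceiveMessage"
  ],
  "Resource": "<interruption-queue-arn>"
}
```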
The command we used to install Karpenter was:
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter --create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
--set settings.aws.clusterName=${CLUSTER_NAME} \
--set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
--set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
--set settings.aws.interruptionQueueName=${CLUSTER_NAME} \
--wait
As long as the interruption queue is specified (settings.aws.interruptionQueueName in the command above), Karpenter's Spot interruption handling is enabled.
Interruption handling covers several other event types besides Spot reclaims; their EventBridge rules are created alongside:
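Besides the Spot reclaim notice, these rules roughly match the following event patterns (a sketch based on Karpenter's documented interruption sources; the exact rule names vary by installation):

```json
[
  { "source": ["aws.ec2"],    "detail-type": ["EC2 Instance Rebalance Recommendation"] },
  { "source": ["aws.ec2"],    "detail-type": ["EC2 Instance State-change Notification"] },
  { "source": ["aws.health"], "detail-type": ["AWS Health Event"] }
]
```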
When Karpenter detects any of these events, it automatically drains and replaces the node.
The handling logic can be found in the code: https://github.com/aws/karpenter/blob/main/pkg/controllers/interruption/controller.go
Except for EC2 Rebalance Recommendation, the other three event types all trigger CordonAndDrain.