Workload consolidation

When consolidation is enabled, Karpenter automatically looks for opportunities to reschedule these workloads onto a more cost-effective set of EC2 instances, whether those instances are already in the cluster or still need to be launched. Karpenter attempts to reduce the overall cost of the nodes in two ways:

Delete - a node can be removed if all of its pods can run on the free capacity of the other nodes in the cluster.
Replace - a node can be replaced if all of its pods can run on a combination of the free capacity of the other nodes and a single, cheaper replacement node.

For spot nodes, only the delete mechanism is used; Karpenter does not replace spot nodes. For example, if we launch a c5.4xlarge spot instance and at some point its CPU utilization is only 10%, Karpenter will not launch a c5.large to replace it.
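To see which of your nodes are spot and which are on-demand (and therefore which consolidation mechanisms can apply to them), you can surface the relevant well-known labels as extra columns. This is just an observation aid; the label keys below are the standard ones set by Karpenter and the kubelet:

```shell
# List nodes with their capacity type (spot / on-demand) and instance type.
# karpenter.sh/capacity-type is set by Karpenter on nodes it provisions;
# node.kubernetes.io/instance-type is set by the kubelet.
kubectl get nodes \
  -L karpenter.sh/capacity-type \
  -L node.kubernetes.io/instance-type
```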
Karpenter uses three mechanisms for consolidation:

Empty Node Consolidation - delete entirely empty nodes in parallel
Single-Node Consolidation - try to delete a single node, possibly launching a replacement node that is cheaper than it
Multi-Node Consolidation - try to delete two or more nodes in parallel, possibly launching a replacement node that is cheaper than the nodes being removed

Karpenter does this by preferring to terminate the nodes whose removal disrupts the workloads the least overall.

First, clean up the existing NodePool and deployment:
kubectl delete deployment inflate
kubectl delete nodepool.karpenter.sh default
kubectl delete ec2nodeclass.karpenter.k8s.aws default
Let's deploy the NodePool we used earlier, but this time with consolidation enabled:
mkdir ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> singlenode.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  limits:
    cpu: "10"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
            - m
            - r
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  subnetSelectorTerms:
    - tags:
        alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    intent: apps
    managed-by: karpenter
    eks-immersion-team: my-team
EoF
kubectl apply -f singlenode.yaml
nodepool.karpenter.sh/default created
ec2nodeclass.karpenter.k8s.aws/default created
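Before deploying the application, you can read the disruption settings back from the NodePool to confirm that consolidation is enabled; the jsonpath expression below simply extracts the field we set in singlenode.yaml:

```shell
# Read back the consolidation policy from the NodePool we just applied.
# This should print WhenUnderutilized.
kubectl get nodepool default \
  -o jsonpath='{.spec.disruption.consolidationPolicy}{"\n"}'
```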
Now let's deploy the application:
cd ~/environment/karpenter
cat <<EoF> basic-rightsizing.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 8
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              memory: 1Gi
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF
kubectl apply -f basic-rightsizing.yaml
deployment.apps/inflate created
Initially there are 8 pending pods:
After a while, Karpenter launches two nodes and schedules all 8 pods:
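You can watch this happen yourself by listing the pending pods and the nodes Karpenter launched; the label selectors below match the labels used in the manifests above:

```shell
# Pods still waiting to be scheduled
kubectl get pods -l app=inflate --field-selector=status.phase=Pending

# Nodes provisioned for the team, with their instance types
kubectl get nodes -l eks-immersion-team=my-team \
  -L node.kubernetes.io/instance-type
```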
Now, let's scale the deployment down to 4 replicas.
kubectl scale deployment inflate --replicas 4
At first, 4 pods are removed from the 2xlarge node, leaving the remaining pods distributed 3 and 1 across the two nodes:
After a while, Karpenter notices that the pod on the second node can be moved to the first node, and deletes the second node:
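You can follow the consolidation live by watching the nodes and Karpenter's NodeClaims (NodeClaim is the v1beta1 resource Karpenter creates for each node it manages); the underutilized node should disappear from both views:

```shell
# Watch the team's nodes shrink from two to one
kubectl get nodes -l eks-immersion-team=my-team -w

# In another terminal, watch the corresponding NodeClaims
kubectl get nodeclaims -w
```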
Check the Karpenter logs:
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter
We should see "deprovisioning via consolidation delete" and "terminating 1 nodes" events in the log output:
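Since the Karpenter controller log is verbose, it can help to filter it; the grep pattern below is just one way to pick out the consolidation events mentioned above:

```shell
# Filter the Karpenter controller log for consolidation-related events
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter \
  | grep -iE 'consolidation|terminating'
```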