As the workloads in a Kubernetes cluster change, new EC2 instances may need to be launched to make sure they have the compute resources they need. Over time, those instances can become underutilized as some workloads scale down or are removed from the cluster. Karpenter's workload consolidation automatically looks for opportunities to reschedule these workloads onto a more cost-efficient set of EC2 instances, whether those instances are already in the cluster or need to be launched.
Consolidation is enabled through a configuration change in the Provisioner (see the sketch after the list below). When consolidation is enabled, Karpenter tries to reduce the overall cost of nodes in two ways:
- Delete: a node can be removed if all of its pods can run on the free capacity of other nodes in the cluster.
- Replace: a node can be replaced if all of its pods can run on a combination of the free capacity of other nodes plus a single, cheaper replacement node.
For Spot nodes, only the delete mechanism is used; Karpenter will not replace them. For example, if we launch a c5.4xlarge Spot instance and at some point its CPU utilization is only 10%, Karpenter will not launch a c5.large to replace it.
Consolidation is performed using three methods:
- Empty node consolidation - delete all entirely empty nodes in parallel.
- Single-node consolidation - try to delete a single node, possibly launching a node that is cheaper than it as a replacement.
- Multi-node consolidation - try to delete two or more nodes in parallel, possibly launching a single node that is cheaper than the nodes being removed as a replacement. Karpenter does this by choosing to terminate the nodes whose removal disrupts the workloads the least overall:
  - nodes running fewer pods
  - nodes that will expire soon
  - nodes with lower-priority pods
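The Provisioner manifests later in this section set consolidation.enabled: true directly in their spec. As a minimal sketch, assuming a Provisioner named default already exists in the cluster, the same field can also be toggled with a merge patch:

# Sketch only: enable consolidation on an existing Provisioner (the name "default" is an assumption)
kubectl patch provisioners.karpenter.sh default --type merge \
  -p '{"spec":{"consolidation":{"enabled":true}}}'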
First, delete the resources created earlier:
kubectl delete deployment inflate
kubectl delete provisioners.karpenter.sh default
kubectl delete awsnodetemplates.karpenter.k8s.aws default
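If you want to confirm the cleanup, you can optionally watch the nodes that Karpenter had provisioned terminate:

# optional: watch until only the managed node group nodes remain
kubectl get nodes -w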
Deploy the Provisioner. Note the consolidation block near the end of the spec, which enables consolidation:
mkdir ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> singlenode.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef:
    name: default
  labels:
    eks-immersion-team: my-team
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m", "r"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "10"
  consolidation:
    enabled: true
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    alpha.eksctl.io/cluster-name: ${CLUSTER_NAME}
  securityGroupSelector:
    aws:eks:cluster-name: ${CLUSTER_NAME}
  tags:
    managed-by: "karpenter"
    intent: "apps"
EoF
kubectl apply -f singlenode.yaml
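Optionally, verify that consolidation is enabled on the Provisioner you just applied:

# optional check; the grep context size is arbitrary
kubectl get provisioners.karpenter.sh default -o yaml | grep -A 2 consolidation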
Create an application where each pod requests 1Gi of memory and 1 CPU, with 8 replicas:
cd ~/environment/karpenter
cat <<EoF> basic-rightsizing.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 8
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              memory: 1Gi
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF
kubectl apply -f basic-rightsizing.yaml
You can see that the 8 pods are scheduled onto two new nodes:
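To check this yourself, list the pods and the nodes Karpenter launched (the node label comes from the Provisioner above):

# show which node each pod landed on, and the Karpenter-provisioned nodes
kubectl get pods -l app=inflate -o wide
kubectl get nodes -l eks-immersion-team=my-team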
Scale the application down to 4 replicas:
kubectl scale deployment inflate --replicas 4
Check the Karpenter logs:
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter
In the output you will see "deprovisioning via consolidation delete" and "terminating 1 nodes" events:
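You can also watch consolidation at the node level; after a short while only one labelled node should remain (optional check):

# optional: watch the node count drop as consolidation removes the underutilized node
kubectl get nodes -l eks-immersion-team=my-team -w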
First, delete the resources created above:
kubectl delete deployment inflate
kubectl delete provisioners.karpenter.sh default
kubectl delete awsnodetemplates.karpenter.k8s.aws default
Create the Provisioner:
mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> multinode.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef:
    name: default
  labels:
    eks-immersion-team: my-team
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m"]
    - key: "karpenter.k8s.aws/instance-cpu"
      operator: Lt
      values: ["5"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: 1k
  consolidation:
    enabled: true
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    alpha.eksctl.io/cluster-name: ${CLUSTER_NAME}
  securityGroupSelector:
    aws:eks:cluster-name: ${CLUSTER_NAME}
  tags:
    managed-by: "karpenter"
    intent: "apps"
EoF
kubectl apply -f multinode.yaml
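Before deploying the workload, you can optionally start a watch on the nodes in a second terminal to see the instances Karpenter launches:

# optional: watch new nodes appear as the deployment below is applied
kubectl get nodes -l eks-immersion-team=my-team -w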
Deploy the application:
cd ~/environment/karpenter
cat <<EoF> multinode-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 20
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              memory: 1Gi
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF
kubectl apply -f multinode-deploy.yaml
You will see 7 nodes created: the first 6 each run 3 pods and the last one runs 2 pods:
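One way to check the distribution yourself is to list the pods sorted by the node they landed on:

# group the inflate pods by node name to see how many run on each node
kubectl get pods -l app=inflate -o wide --sort-by=.spec.nodeName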
Scale the replicas down to 10:
kubectl scale deployment inflate --replicas 10
At first, 10 pods are removed at random; the per-node pod distribution is now 1 1 3 1 2 2 0.
Karpenter first terminates the node with no pods on it (the last one):
It then consolidates the nodes, removing two more. The final pod distribution is 3 3 3 1:
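As before, the Karpenter logs show what happened; filtering for consolidation keeps the output short (the grep pattern is just an illustration):

# look for the consolidation delete/replace decisions in the recent Karpenter logs
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter --tail=100 | grep -i consolidation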
Now set the replicas to 4:
kubectl scale deployment inflate --replicas 4
The pod count per node first becomes 2 0 1 1; unsurprisingly, the second (now empty) node is the first to be terminated:
Node consolidation then continues, and in the end only two nodes are needed, with a pod distribution of 3 1:
Better still, the second node (the one running a single pod) is a c6a.large, squeezing out the maximum cost savings:
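You can confirm the instance types via the standard node label:

# show each Karpenter-provisioned node with its EC2 instance type as an extra column
kubectl get nodes -l eks-immersion-team=my-team -L node.kubernetes.io/instance-type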