Splitting Spot and On-Demand Capacity by Ratio

In this section, we will see how Karpenter runs workloads on both on-demand and Spot instances, guaranteeing a baseline of on-demand nodes at a desired ratio while using Spot instances to optimize cost.

First, clean up the resources created earlier:

kubectl delete deployment inflate
kubectl delete nodepool.karpenter.sh default
kubectl delete ec2nodeclass.karpenter.k8s.aws default

Deploy the NodePools

Let's deploy two NodePools that use Karpenter's scheduling capabilities to split the workload between on-demand and Spot instances at the desired ratio.

mkdir ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> ratiosplit.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  limits:
    cpu: "100"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
      - key: capacity-spread
        operator: In
        values:
        - "1"
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: kubernetes.io/os
        operator: In
        values:
        - linux

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  limits:
    cpu: "100"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
      - key: capacity-spread
        operator: In
        values:
        - "2"
        - "3"
        - "4"
        - "5"
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
      - key: kubernetes.io/os
        operator: In
        values:
        - linux

---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  securityGroupSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  subnetSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    intent: apps
    managed-by: karpenter
    eks-immersion-team: my-team
EoF

kubectl apply -f ratiosplit.yaml
nodepool.karpenter.sh/on-demand created
nodepool.karpenter.sh/spot created
ec2nodeclass.karpenter.k8s.aws/default created

Let's deploy an application with 5 replicas. Using capacity-spread as a topology key, the pods are spread evenly across that label's five values. Since four of the values map to the Spot NodePool and one to the on-demand NodePool, we end up with a 4:1 ratio of Spot to on-demand nodes:

cd ~/environment/karpenter
cat <<EoF> capacity-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: capacity-spread
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: inflate
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f capacity-deploy.yaml

Karpenter schedules the 5 application pods across 5 nodes: 1 on-demand node and 4 Spot nodes.

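You can verify the split by inspecting the node labels; Karpenter applies karpenter.sh/capacity-type and the capacity-spread value from the NodePool requirements to each node it provisions:

```shell
# Show each provisioned node with its capacity type and spread value
kubectl get nodes -l eks-immersion-team=my-team \
  -L karpenter.sh/capacity-type -L capacity-spread
```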

Let's scale the deployment to 10 replicas:

kubectl scale deployment inflate --replicas 10
deployment.apps/inflate scaled

Karpenter creates 5 new nodes, for 10 in total. The new nodes also follow the 4:1 Spot to on-demand ratio, so across the 10 nodes we should see 2 on-demand nodes and 8 Spot nodes.

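The arithmetic behind the expected split can be sketched as follows (illustrative only, not part of the workshop commands):

```shell
# The NodePools expose five capacity-spread values: "1" maps to
# on-demand and "2"-"5" map to Spot. With maxSkew: 1, the topology
# spread constraint places replicas evenly across the five values,
# yielding a 4:1 Spot:on-demand split.
replicas=10
values=5
per_value=$((replicas / values))   # pods (and hence nodes) per value
echo "spot: $((per_value * 4)), on-demand: $((per_value * 1))"
```

With 10 replicas this prints `spot: 8, on-demand: 2`, matching the node counts described above.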

Clean Up

Delete the deployment, NodePool, and EC2NodeClass resources:

kubectl delete deployment inflate
kubectl delete nodepool on-demand
kubectl delete nodepool spot
kubectl delete ec2nodeclasses.karpenter.k8s.aws default