Taints and Tolerations

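In Kubernetes, taints and tolerations control which nodes a pod can be scheduled onto. A taint is a marker placed on a node that repels pods; only pods that declare a matching toleration can be scheduled there. In this section we add a taint to the NodePool (called Provisioner in earlier Karpenter releases), so that every node it launches carries that taint.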
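To see which taints the nodes in a cluster currently carry, a small helper along the following lines can be used. This is a sketch rather than part of the original walkthrough: the function name `list_node_taints` is made up for illustration, and it assumes `kubectl` is already configured for the cluster.

```shell
# Hypothetical helper (not part of the workshop): print every node together
# with the keys and effects of its taints.
# Assumes kubectl is installed and points at the target cluster.
list_node_taints() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
}
```

Calling `list_node_taints` before and after the steps below makes it easy to compare which nodes are tainted.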

First, delete the resources created earlier:

kubectl delete deployment inflate
kubectl delete nodepools.karpenter.sh default
kubectl delete ec2nodeclasses.karpenter.k8s.aws default

Create the NodePool and its EC2NodeClass. Note that the NodePool carries a taint with key systemnodes and effect NoSchedule:

mkdir -p ~/environment/karpenter
cd ~/environment/karpenter
cat <<EoF> taint.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
    expireAfter: Never
  limits:
    cpu: "10"
  template:
    metadata:
      labels:
        eks-immersion-team: my-team
    spec:
      nodeClassRef:
        name: default
      # Requirements that constrain the parameters of provisioned nodes.
      # These requirements are combined with pod.spec.affinity.nodeAffinity rules.
      # Operators { In, NotIn } are supported to enable including or excluding values
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
          operator: In
          values: ["on-demand"]
      taints:
      - key: systemnodes
        effect: NoSchedule
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  securityGroupSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  subnetSelectorTerms:
  - tags:
      alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    intent: apps
    managed-by: karpenter
EoF

kubectl apply -f taint.yaml
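Before deploying anything, it can be worth confirming that the taint was actually stored on the NodePool. The helper below is a minimal sketch, not part of the original steps; the function name `nodepool_taints` is hypothetical, and the JSONPath assumes the `karpenter.sh/v1beta1` layout used in taint.yaml above.

```shell
# Hypothetical helper: print the taints declared in a NodePool's node template.
# $1 is the NodePool name; it defaults to "default", the name used in taint.yaml.
nodepool_taints() {
  kubectl get nodepools.karpenter.sh "${1:-default}" \
    -o jsonpath='{.spec.template.spec.taints}'
}
```

Running `nodepool_taints` should print a list that includes the systemnodes key with the NoSchedule effect.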

Deploy the application. This Deployment does not declare a toleration for the taint:

cd ~/environment/karpenter
cat <<EoF> taint-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f taint-deploy.yaml

After a while, the pods are still in the Pending state:

[Screenshot: image-20231028105035576]
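The same check can be made from the command line. The following is a sketch under the assumption that the Deployment was created in the current namespace; the function name `pending_inflate_pods` is invented for illustration.

```shell
# Hypothetical helper: list the inflate pods that are still Pending.
# The label app=inflate comes from taint-deploy.yaml.
pending_inflate_pods() {
  kubectl get pods -l app=inflate --field-selector=status.phase=Pending
}
```

While the toleration is missing, `pending_inflate_pods` is expected to keep listing all three replicas.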

Check the Karpenter logs:

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter

Karpenter cannot launch a node for these pods, because they do not tolerate the NodePool's taint:

[Screenshot: image-20240803221058162]
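Besides the Karpenter log, Kubernetes events are another place to look for scheduling failures. The helper below is only a sketch: the function name `failed_scheduling_events` is hypothetical, it looks in the current namespace, and which component reports the events and how the messages are worded depends on the versions in use.

```shell
# Hypothetical helper: show FailedScheduling events in the current namespace,
# ordered by creation time (oldest first).
failed_scheduling_events() {
  kubectl get events \
    --field-selector reason=FailedScheduling \
    --sort-by=.metadata.creationTimestamp
}
```

Run `failed_scheduling_events` while the pods are Pending and read the most recent entries at the bottom of the output.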

Now add a toleration to the application and redeploy it:

cd ~/environment/karpenter
cat <<EoF> taint-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
      tolerations:
      - key: "systemnodes"
        operator: "Exists"
        effect: "NoSchedule"
      nodeSelector:
        eks-immersion-team: my-team
EoF

kubectl apply -f taint-deploy.yaml

After a short wait, Karpenter launches a new node to run the pods:

[Screenshot: image-20231028105739634]
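To confirm the result without a screenshot, the two helpers below show where the pods landed and whether the new node carries the taint. They are a sketch, not part of the original walkthrough; both function names are hypothetical, and the labels are the ones defined in taint.yaml and taint-deploy.yaml.

```shell
# Hypothetical helper: show each inflate pod and the node it was scheduled onto.
inflate_pod_placement() {
  kubectl get pods -l app=inflate -o wide
}

# Hypothetical helper: show the nodes labelled by the NodePool template,
# together with the keys of their taints.
team_nodes_with_taints() {
  kubectl get nodes -l eks-immersion-team=my-team \
    -o 'custom-columns=NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key'
}
```

If everything worked, the nodes named by `inflate_pod_placement` should also appear in the output of `team_nodes_with_taints`, with systemnodes among their taint keys.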