
Kubernetes Scheduling

Scheduling

Manual Scheduling

Modify the Pod definition file to include the nodeName field in the spec.

pradeep@learnk8s$ cat manual-sched-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-manual
spec:
  nodeName: k8s
  containers:
  - image: nginx
    name: nginx
pradeep@learnk8s$ kubectl create -f manual-sched-pod.yaml
pod/nginx-manual created
pradeep@learnk8s$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
demo-6c54f77c95-6g7zq   1/1     Running   0          20m
demo-6c54f77c95-sb4c9   1/1     Running   0          20m
demo-6c54f77c95-w2bsw   1/1     Running   0          20m
nginx                   1/1     Running   0          25m
nginx-manual            1/1     Running   0          48s
pradeep@learnk8s$ kubectl get pods -o wide | grep manual
nginx-manual            1/1     Running   0          2m9s   10.244.0.5   k8s       <none>           <none>
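
Because nodeName is set directly in the spec, this pod bypasses the default scheduler and is bound straight to the named node; if that node does not exist or cannot run the pod, there is no fallback. As a quick check (a sketch, not part of the session above), you can read the field back from the API:

kubectl get pod nginx-manual -o jsonpath='{.spec.nodeName}'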

Node Selector

You can constrain a Pod so that it can only run on a particular set of node(s). nodeSelector provides a very simple way to constrain pods to nodes with particular labels. First, let’s label one of the nodes with disktype=ssd.

pradeep@learnk8s$ kubectl label nodes k8s-m02 disktype=ssd
node/k8s-m02 labeled
pradeep@learnk8s$ kubectl get nodes --show-labels
NAME      STATUS   ROLES                  AGE     VERSION   LABELS
k8s       Ready    control-plane,master   3d11h   v1.23.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s,kubernetes.io/os=linux,minikube.k8s.io/commit=3e64b11ed75e56e4898ea85f96b2e4af0301f43d,minikube.k8s.io/name=k8s,minikube.k8s.io/updated_at=2022_02_07T17_03_56_0700,minikube.k8s.io/version=v1.25.1,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8s-m02   Ready    <none>                 4m3s    v1.23.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-m02,kubernetes.io/os=linux
pradeep@learnk8s$
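
Instead of scanning the full --show-labels output, a label selector is a quicker way to confirm which nodes carry the new label (output omitted here):

kubectl get nodes -l disktype=ssd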

Now, let’s create a pod with the nodeSelector field set to the label we just assigned, so that the pod gets scheduled on this node.

pradeep@learnk8s$ cat pod-node-selector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-selector
  labels:
    env: test
spec:
  containers:
  - name: nginx-node-selector
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
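
The transcript above only shows the manifest; the pod itself still has to be created, following the same workflow as the earlier examples:

kubectl create -f pod-node-selector.yaml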

Finally, verify the node on which this pod is running.

pradeep@learnk8s$ kubectl get pods -o wide | grep selector
nginx-node-selector     1/1     Running   0          17s    10.244.1.2    k8s-m02   <none>           <none>

As expected, this new pod is running on the node (k8s-m02) that carries the label specified by the pod’s nodeSelector.

If you describe this pod, you will see the configured Node-Selectors.

pradeep@learnk8s$ kubectl describe pod nginx-node-selector | grep Node-Selectors
Node-Selectors:              disktype=ssd

Node Affinity

There are currently two types of node affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution.

You can think of them as “hard” and “soft” respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (similar to nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee.

The IgnoredDuringExecution part of the names means that, similar to how nodeSelector works, if labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node.

pradeep@learnk8s$ cat pod-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd

  containers:
  - name: with-node-affinity
    image: nginx
pradeep@learnk8s$ kubectl create -f pod-node-affinity.yaml
pod/with-node-affinity created
pradeep@learnk8s$ kubectl get pods -o wide | grep affinity
with-node-affinity      1/1     Running   0          9s     10.244.1.3    k8s-m02   <none>           <none>
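
The example above uses the “hard” required form. For comparison, a minimal sketch of the “soft” preferred form (not part of this session; the pod name and weight below are illustrative) could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: with-preferred-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: with-preferred-node-affinity
    image: nginx

With the preferred form, the scheduler favors nodes labeled disktype=ssd but will still place the pod elsewhere if no such node is available.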

Taints and Tolerations

Taints allow a node to repel a set of pods.

Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.

pradeep@learnk8s$ kubectl describe nodes | grep Taint
Taints:             <none>
Taints:             <none>
pradeep@learnk8s$ kubectl taint nodes k8s key1=value1:NoSchedule
node/k8s tainted
pradeep@learnk8s$ kubectl describe nodes | grep Taint
Taints:             key1=value1:NoSchedule
Taints:             <none>
pradeep@learnk8s$ kubectl taint nodes k8s-m02 key2=value2:NoSchedule
node/k8s-m02 tainted
pradeep@learnk8s$ kubectl describe nodes | grep Taint
Taints:             key1=value1:NoSchedule
Taints:             key2=value2:NoSchedule
pradeep@learnk8s$ cat pod-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-taint-demo
  labels:
    env: test
spec:
  containers:
  - name: nginx-taint-demo
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
pradeep@learnk8s$ kubectl create -f pod-toleration.yaml
pod/nginx-taint-demo created
pradeep@learnk8s$ kubectl get pods -o wide | grep taint-demo
nginx-taint-demo        1/1     Running   0          9m3s    10.244.0.15   k8s       <none>           <none>
pradeep@learnk8s$ cat pod-toleration-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-taint-demo-2
  labels:
    env: test
spec:
  containers:
  - name: nginx-taint-demo-2
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key2"
    operator: "Equal"
    value: "value2"
    effect: "NoSchedule"
pradeep@learnk8s$ kubectl create -f pod-toleration-2.yaml
pod/nginx-taint-demo-2 created
pradeep@learnk8s$ kubectl get pods -o wide | grep taint-demo
nginx-taint-demo        1/1     Running   0          11m     10.244.0.15   k8s       <none>           <none>
nginx-taint-demo-2      1/1     Running   0          9m35s   10.244.2.5    k8s-m02   <none>    <none>
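
In this session both nodes carry a taint, so each pod can only land on the node whose taint it tolerates. In general, though, a toleration allows scheduling onto a tainted node but does not force it; to actually pin a pod to a specific tainted node, you could combine a toleration with a nodeSelector (a sketch, not run here, using the hostname label seen earlier):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-taint-and-select
spec:
  containers:
  - name: nginx-taint-and-select
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: k8s
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"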
pradeep@learnk8s$ cat pod-no-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-no-tolerate
  labels:
    env: test
spec:
  containers:
  - name: nginx-no-tolerate
    image: nginx
    imagePullPolicy: IfNotPresent
pradeep@learnk8s$ kubectl create -f pod-no-toleration.yaml
pod/nginx-no-tolerate created
pradeep@learnk8s$ kubectl get pods -o wide | grep no-tolerate
nginx-no-tolerate       0/1     Pending   0          10m     <none>        <none>    <none>          <none>

The Pod got created but is stuck in the Pending state; it is not Running yet. Let’s find out why!

pradeep@learnk8s$ kubectl describe pods nginx-no-tolerate
Name:         nginx-no-tolerate
Namespace:    default
Priority:     0
Node:         <none>
Labels:       env=test
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx-no-tolerate:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6mz6d (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-6mz6d:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  43s (x11 over 11m)  default-scheduler  0/2 nodes are available: 1 node(s) had taint {key1: value1}, that the pod didn't tolerate, 1 node(s) had taint {key2: value2}, that the pod didn't tolerate.
  

Look at the Reason, FailedScheduling: none of the nodes (0/2) are available because each one has a taint that our pod does not tolerate.

Now, let’s delete the Taint on one of the nodes and try creating the Pod again.

To untaint a node, append a - to the end of the taint that you plan to remove.

pradeep@learnk8s$ kubectl taint node k8s  key1=value1:NoSchedule-
node/k8s untainted
pradeep@learnk8s$ kubectl describe nodes | grep Taint
Taints:             <none>
Taints:             key2=value2:NoSchedule

Now the nginx-no-tolerate pod changes to the Running state and gets scheduled on the k8s node, which no longer has any taints.

pradeep@learnk8s$ kubectl get pods -o wide | grep no-tolerate
nginx-no-tolerate       1/1     Running   0          16h   10.244.0.16   k8s       <none>           <none>
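
To leave the cluster as it was before this exercise, you could remove the remaining taint and the disktype label, and delete the demo pods (a sketch; these commands were not run in the session above):

kubectl taint nodes k8s-m02 key2=value2:NoSchedule-
kubectl label nodes k8s-m02 disktype-
kubectl delete pod nginx-manual nginx-node-selector with-node-affinity nginx-taint-demo nginx-taint-demo-2 nginx-no-tolerate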