Certified Kubernetes Administrator (CKA) #13 Scheduling 1: nodeSelector, nodeAffinity, podAffinity/antiAffinity

If #12 ConfigMap and Secret in Depth wrapped up workloads and their configuration, this post starts controlling which node those workloads land on. By default, kube-scheduler picks a suitable node on its own. But in operations, requirements never stop coming: “only on nodes with a GPU,” “next to the cache in the same availability zone,” “spread the replicas across different nodes.” Expressing these placement intents in a manifest is scheduling.

This post, #13, covers four tools: nodeSelector, nodeAffinity, podAffinity, and podAntiAffinity. They all express placement from the Pod’s side: “which nodes a Pod likes.” The taints/tolerations covered next in #14 deal with the opposite, “which Pods a node pushes away,” so reading the two posts together completes both sides of scheduling.

What the scheduler does

Let’s get the big picture first. When you create a Pod, its manifest has an empty nodeName field. When kube-scheduler finds a Pod with no node assigned, it picks a node in two stages.

  1. Filtering. It screens out nodes that can’t take this Pod. Nodes short on resources, nodes that don’t match the nodeSelector conditions, and nodes with a taint the Pod can’t tolerate are excluded.
  2. Scoring. It scores the remaining candidate nodes and picks the highest. The weight of preferred rules, resource headroom, whether the image is already cached, and more all factor into the score.

Once the scheduler settles on a node, it writes the result into the Pod’s nodeName; this is called binding. The kubelet on that node sees the Pod assigned to it and starts the containers. Apart from the manual nodeName placement at the end, every tool covered in this post is a way to intervene in these filtering and scoring stages.
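As a rough illustration of binding, the excerpt below sketches the parts of a Pod that change once it has been scheduled, roughly as they would appear in k get pod web -o yaml. The Pod name web and node name node01 are placeholders, and most fields are omitted.

```yaml
# Excerpt of a scheduled Pod (illustrative; most fields omitted)
apiVersion: v1
kind: Pod
metadata:
  name: web                          # placeholder name
spec:
  schedulerName: default-scheduler   # the scheduler responsible for this Pod
  nodeName: node01                   # empty until binding, then set to the chosen node
status:
  conditions:
    - type: PodScheduled
      status: "True"                 # "False" while the Pod is still waiting for a node
```

While a Pod sits in Pending because no node passed filtering, nodeName stays empty and the PodScheduled condition stays "False".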

Here are the tools for expressing placement intent, with what each one matches on and how strictly it is enforced.

| Tool | Basis | Enforcement |
| --- | --- | --- |
| nodeSelector | Node labels | Hard (Pending if no match) |
| nodeAffinity (required) | Node labels | Hard (Pending if no match) |
| nodeAffinity (preferred) | Node labels | Soft (score only, placed even if no match) |
| podAffinity / podAntiAffinity | Position of other Pods | Both required and preferred available |

nodeSelector: simple label matching

The simplest tool. Write a label key and value in the Pod’s spec.nodeSelector, and the Pod is placed only on nodes that have all of those labels. If no node satisfies the condition, the Pod stays in Pending.

First, label the node.

# Assign a label to the node
k label node node01 disktype=ssd

# Check labels
k get nodes --show-labels
k get nodes -l disktype=ssd

Then specify that label in the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  nodeSelector:
    disktype: ssd
  containers:
    - name: web
      image: nginx:1.27

nodeSelector does AND matching only. If you write multiple keys, the node must have all of them, and conditions like “either of two” or “not this value” can’t be expressed. When you need that expressiveness, move on to nodeAffinity.
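To make the AND behavior concrete, here is a minimal sketch with two keys. The Pod name and the environment=prod label are hypothetical; only disktype=ssd comes from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-prod            # hypothetical name
spec:
  nodeSelector:
    disktype: ssd           # the node must carry this label...
    environment: prod       # ...AND this one (hypothetical label)
  containers:
    - name: web
      image: nginx:1.27
```

A node labeled only disktype=ssd is filtered out here. If no node carries both labels, the Pod stays in Pending.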

nodeAffinity: required and preferred

nodeAffinity is an extended version of nodeSelector. It serves the same purpose of “picking by node label,” but it lets you express operators and enforcement in fine detail. There are two kinds.

  • requiredDuringSchedulingIgnoredDuringExecution. A hard rule. If no node satisfies the condition, the Pod stays in Pending. Same enforcement as nodeSelector, but you can use operators.
  • preferredDuringSchedulingIgnoredDuringExecution. A soft rule. It gives bonus points to nodes that satisfy the condition, but if no such node exists, the Pod is simply placed on another node.

The name is long because it reads in two parts. The leading requiredDuringScheduling or preferredDuringScheduling says how strictly the rule is applied at scheduling time, and the trailing IgnoredDuringExecution means a Pod already running won’t be evicted even if the node’s labels change later.

Operators

These are the operators used in nodeAffinity’s matchExpressions.

| Operator | Meaning |
| --- | --- |
| In | Value is in the list |
| NotIn | Value is not in the list |
| Exists | Key exists (value irrelevant) |
| DoesNotExist | Key does not exist |
| Gt / Lt | Value is greater / less than (integer) |

NotIn and DoesNotExist are the negative conditions nodeSelector lacked. Thanks to them, placements like “only on nodes that don’t have this label” become possible.
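The sketch below combines the negative and numeric operators in one required rule. The label keys dedicated and cpu-count, the value hdd, and the Pod itself are hypothetical; they only illustrate how each operator is written.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch                        # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: NotIn      # also matches nodes with no disktype label at all
                values:
                  - hdd
              - key: dedicated       # hypothetical label key
                operator: DoesNotExist   # Exists / DoesNotExist take no values
              - key: cpu-count       # hypothetical label key
                operator: Gt
                values:
                  - "8"              # Gt / Lt take exactly one value, read as an integer
  containers:
    - name: batch
      image: busybox:1.36
      command: ["sleep", "3600"]
```

All three expressions sit in one term, so a node has to satisfy every one of them.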

nodeAffinity example

The following example places the Pod only on a node whose disktype is ssd or nvme, and prefers a node among those where zone=ap-northeast-1a.

apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
                  - nvme
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: zone
                operator: In
                values:
                  - ap-northeast-1a
  containers:
    - name: db
      image: postgres:16

Two spots in the structure are easy to get confused.

  • required is a nodeSelectorTerms list. The list items are joined by OR. Multiple matchExpressions within a single item are joined by AND.
  • preferred is a weighted list. Each item carries a weight (1 to 100), and a node that satisfies it gets that many extra points. If a node satisfies multiple preferred rules, the weights add up.
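Both points can be seen in one manifest. This is a sketch under assumed labels: environment=prod and the Pod name are hypothetical, while disktype and zone reuse the keys from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db-or-and                    # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          # Term 1: disktype=ssd AND environment=prod
          - matchExpressions:
              - key: disktype
                operator: In
                values: [ssd]
              - key: environment     # hypothetical label key
                operator: In
                values: [prod]
          # OR term 2: disktype=nvme, regardless of environment
          - matchExpressions:
              - key: disktype
                operator: In
                values: [nvme]
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50                 # added for nodes in this zone
          preference:
            matchExpressions:
              - key: zone
                operator: In
                values: [ap-northeast-1a]
        - weight: 20                 # added for prod nodes
          preference:
            matchExpressions:
              - key: environment
                operator: In
                values: [prod]
  containers:
    - name: db
      image: postgres:16
```

A node passes filtering if it satisfies either term. Among the nodes that pass, one matching both preferences collects 50 + 20 = 70 toward its node affinity score, one matching only the zone collects 50, and one matching neither collects nothing but is still a candidate.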

podAffinity and podAntiAffinity: relative to other Pods

Where nodeAffinity is based on node labels, podAffinity and podAntiAffinity are based on the position of other Pods already running.

  • podAffinity. Places the Pod near a Pod with a certain label. For example, you can pin an app Pod to the same node as a cache Pod to cut network latency.
  • podAntiAffinity. Keeps the Pod away from a Pod with a certain label. For example, you can spread replicas of the same Deployment across different nodes so a single node failure doesn’t take everything down.

topologyKey defines “the same place”

The heart of podAffinity is topologyKey. It defines the “unit of nearness” — whether “the same node” or “the same availability zone” — by a node label key.

| topologyKey | Meaning of “the same place” |
| --- | --- |
| kubernetes.io/hostname | Same node |
| topology.kubernetes.io/zone | Same availability zone |
| topology.kubernetes.io/region | Same region |

The behavior reads as follows. podAffinity means “place me on a node that has the same topologyKey value as the node where a matching Pod is running,” and podAntiAffinity means the opposite: “avoid such nodes.” Set topologyKey to kubernetes.io/hostname and the unit becomes “same node / different node”; set it to topology.kubernetes.io/zone and the unit becomes “same zone / different zone.”
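For the podAffinity side, a minimal sketch of the “next to the cache” case might look like the following. It assumes the cache Pods carry a label app=cache and run in the same namespace as this Pod; both the labels and the Pod name are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                          # hypothetical name
  labels:
    app: frontend                    # hypothetical label
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: cache             # the Pods to stay close to (assumed label)
          topologyKey: topology.kubernetes.io/zone   # "close" = same zone
  containers:
    - name: app
      image: nginx:1.27
```

Swapping topologyKey to kubernetes.io/hostname tightens “close” from the same zone to the same node. Because the rule is required, the Pod stays in Pending if no app=cache Pod is running anywhere yet.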

podAntiAffinity example: one replica per node

The following example keeps Pods labeled app=web on different nodes. Putting it in a Deployment’s Pod template prevents replicas from piling up on one node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx:1.27

You pick “which Pods to use as the basis” with labelSelector, and “at which unit to spread” with topologyKey. Here Pods labeled app=web refuse to share a node, so if fewer than three nodes are available, the leftover replicas stay in Pending. If you use preferred instead of required, the scheduler still spreads the replicas where it can, but when nodes are scarce it stacks them on the same node rather than leaving them in Pending.

The preferred form adds only a weight to the same structure.

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web
                topologyKey: kubernetes.io/hostname

For preferred, just remember that labelSelector and topologyKey move one level down under podAffinityTerm, with weight attached above them.
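The two forms can also sit side by side in the same podAntiAffinity block, which makes the structural difference easy to compare. The fragment below is a sketch for the web Deployment's Pod template, indented to match the fragment above. Combining a hard per-node rule with a soft per-zone rule is an illustrative choice, not something the Deployment above requires.

```yaml
      affinity:
        podAntiAffinity:
          # Hard: never two app=web Pods on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
          # Soft: prefer different zones too, when zones are available
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web
                topologyKey: topology.kubernetes.io/zone
```

In the required entry, labelSelector and topologyKey sit directly under the list item. In the preferred entry, the same two fields are wrapped in podAffinityTerm next to weight.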

Manual placement: bypassing the scheduler with nodeName

Every tool so far constrains or nudges the scheduler, but the final decision is still the scheduler’s. By contrast, when you write spec.nodeName directly on a Pod, it skips the scheduler entirely and binds straight to that node.

apiVersion: v1
kind: Pod
metadata:
  name: pinned
spec:
  nodeName: node01
  containers:
    - name: app
      image: nginx:1.27

Since this approach bypasses both filtering and scoring, the Pod is bound to the node even if the node has no spare resources or carries a taint. If the node can’t actually run the Pod, the Pod may never come up at all, so this is rarely used in practice. That said, it’s useful in exceptions like needing to bring up a control plane component while the scheduler itself is down. A static Pod bypasses the scheduler in a similar way: the kubelet starts Pods from its manifest directory directly, without the scheduler.
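For comparison, a static Pod is just an ordinary Pod manifest saved on the node's own disk. The sketch below assumes a kubeadm-style cluster, where the kubelet's manifest directory is typically /etc/kubernetes/manifests; the file name and Pod name are hypothetical.

```yaml
# Saved on the node itself, e.g. /etc/kubernetes/manifests/static-web.yaml
# (path assumes a kubeadm-style setup; check the kubelet's staticPodPath)
apiVersion: v1
kind: Pod
metadata:
  name: static-web                   # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.27
```

No nodeName is needed, because the kubelet that reads the file runs the Pod on its own node. The Pod then shows up in k get pods with the node name appended to its name.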

Debugging: why is it stuck in Pending

If you set affinity rules too tightly, the Pod easily gets trapped in Pending. The cause shows up immediately with describe.

# Check the cause of Pending
k describe pod web

# Recheck whether node labels match the conditions
k get nodes --show-labels

The scheduler’s message is printed in the Events of the describe output. If nodeAffinity doesn’t match, you get phrasing like “didn't match Pod's node affinity/selector”; if podAntiAffinity leaves nodes short, you get “didn't match pod anti-affinity rules.” Just reading that one line tells you which rule is tripping you up.

Exam points

In the CKA exam, scheduling is one axis of the Workloads and Scheduling domain (15%). Here’s a rundown of the tasks that come up often within this post’s scope.

  • Assigning node labels. Practice k label node <node> <key>=<value> until it is muscle memory. You have to apply the label the question asks for first, or nodeSelector/nodeAffinity won’t work.
  • Distinguishing nodeSelector vs nodeAffinity. “Nodes that have this label” is enough with nodeSelector; “either of two” or “nodes that don’t have this label” needs nodeAffinity’s In/NotIn/Exists.
  • The enforcement difference between required and preferred. If the question says “must,” use required; if it says “if possible,” use preferred. weight is mandatory for preferred.
  • Spreading replicas with podAntiAffinity. Memorize the pattern of setting topologyKey: kubernetes.io/hostname and a labelSelector of your own Pod’s labels. Remember that if the replica count exceeds the node count and the rule is required, some will stay in Pending.
  • YAML structure traps. required nodeAffinity uses nodeSelectorTerms, preferred uses weight+preference, and preferred podAffinity uses weight+podAffinityTerm. The indentation difference among these three forms is the most common mistake.
  • Pending debugging. One line in the Events of k describe pod tells you instantly which rule is the cause.

In the exam, time spent writing affinity YAML by hand is wasted. The fastest path is to build the skeleton with kubectl create deployment ... $do, copy an affinity example from the official Assigning Pods to Nodes docs, slot it into the Pod template, and change only the values.

Wrap-up

What this post locked in:

  • The scheduler picks a node in two stages, filtering and scoring, then binds the result into the Pod’s nodeName.
  • nodeSelector. Node label AND matching. The simplest tool, and a hard rule. When expressiveness falls short, move on to nodeAffinity.
  • nodeAffinity. required (hard) and preferred (soft + weight). Operators like In/NotIn/Exists express even negative conditions.
  • podAffinity/podAntiAffinity. Based on the position of other Pods, pin Pods to the same place or spread them to different places at the topologyKey unit.
  • nodeName. Manual placement that bypasses the scheduler. Static Pods skip the scheduler in a similar way, but nodeName is not recommended for ordinary workloads.
  • Debugging. If it’s Pending, check which rule is blocking with the Events of k describe pod.

Next: Scheduling 2

Every tool in this post expressed “which nodes a Pod likes.” #14 Scheduling 2: Taints/tolerations, Priority/PriorityClass, preemption deals with the opposite direction. We’ll work hands-on with the manifests for which Pods a node pushes away via taints/tolerations, which Pod claims a spot first when resources run short via PriorityClass, and how a lower-priority Pod gets evicted via preemption, filling in the other half of scheduling.
