Certified Kubernetes Administrator (CKA) #11 Workloads 2: DaemonSet, StatefulSet, Job, CronJob

In #10 Workloads 1 we ran stateless apps with Deployment and ReplicaSet and got hands-on with rolling updates and rollbacks. But not every workload ends at “several identical Pods on whatever node.” There are agents that must run exactly one per node, databases where each Pod needs its own name and its own disk, batch jobs that should run once and stop, and cases where that job has to run automatically every morning.

This post covers the four workloads that Deployment can’t solve — DaemonSet, StatefulSet, Job, and CronJob. We’ll lay out what problem each one exists to solve, how it differs from Deployment, and the YAML and kubectl patterns that show up often on the exam.

Why Deployment isn’t enough #

The Deployment we saw in #10 is a workload for stateless apps. It runs as many identical Pods as the replica count, it doesn’t care which node they land on, and there’s no reason to tell the Pods apart. The four requirements this model can’t solve are the subject of this post.

| Requirement | The workload that solves it |
| --- | --- |
| Exactly one Pod must run on every node | DaemonSet |
| Each Pod must have a unique name, order, and its own disk | StatefulSet |
| Run once and see it through to completion | Job |
| Run repeatedly on a fixed schedule | CronJob |

Comparing these four against Deployment makes the differences clear.

| Aspect | Deployment | DaemonSet | StatefulSet | Job / CronJob |
| --- | --- | --- | --- | --- |
| Purpose | Stateless apps | Per-node agents | Stateful apps | Batch / one-shot / scheduled work |
| Pod count | Set by replicas | Automatic, one per eligible node | Set by replicas | Set by completions |
| Pod name | Random hash | Random suffix | Stable ordinal (0, 1, 2…) | Random suffix |
| Creation/termination order | No guarantee | No guarantee | Ordered | Runs until completions is reached |
| Storage | Usually shared/external | Usually a host path | A dedicated PVC per Pod | Usually none |
| restartPolicy | Always | Always | Always | OnFailure/Never |

Now let’s look at each one.

DaemonSet: one per node #

A DaemonSet is a workload that runs exactly one Pod on every node in the cluster, or on a selected subset of nodes. When a new node joins, the DaemonSet automatically adds a Pod to it too; when a node leaves, that Pod disappears with it. That’s why there’s no replicas field: nobody sets the count by hand, because the number of eligible nodes is the Pod count.

The typical use is infrastructure agents that have to operate per node.

  • Log collectors (fluentd, fluent-bit)
  • Node monitoring (node-exporter)
  • Network plugins (CNI); kube-proxy itself often runs as a DaemonSet too
  • Storage daemons

Basic manifest #

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.8.0
          ports:
            - containerPort: 9100

It looks almost exactly like a Deployment manifest with kind changed to DaemonSet and only replicas removed. The rule that the selector and the template’s labels must match is the same too.

Running on specific nodes only: nodeSelector #

When it should land on only some nodes rather than all of them, put a nodeSelector in the template’s Pod spec. Pods come up only on nodes whose labels match.

spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd

Running on control plane nodes too: tolerations #

By default, control plane nodes carry a taint, so ordinary Pods aren’t placed on them. For a DaemonSet like a log collector that must run on control plane nodes too, you need a toleration for that taint. The detailed behavior of taints and tolerations is covered in #14.

spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

Update strategy #

A DaemonSet’s default update strategy is RollingUpdate. When you change the template, it replaces the Pod on each node with the new version, and maxUnavailable (default 1) controls how many of those Pods may be unavailable at once.

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

The other strategy is OnDelete. Even when you change the template it doesn’t replace anything automatically; only when you delete the Pod yourself does a new one come up in its place with the new template. Use it when you want to control the timing of replacement by hand.
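
As a minimal sketch of the OnDelete strategy described above, the fragment below switches the update strategy on the node-exporter DaemonSet from earlier. OnDelete takes no rollingUpdate block, so maxUnavailable is left out.

```yaml
# Fragment of the DaemonSet spec; the rest of the manifest stays as in the basic example.
spec:
  updateStrategy:
    type: OnDelete   # Pods keep the old template until you delete them yourself
```

With this in place, editing the template changes nothing on its own. The Pod on a given node is re-created from the new template only after you delete it, for example with k delete pod on that one Pod.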

# Check status (compare node count against desired/ready); k is shorthand for kubectl here
k get daemonset -n monitoring
k rollout status ds/node-exporter -n monitoring

StatefulSet: stable IDs and ordering #

A StatefulSet is a workload for apps that must preserve state. You use it for things like databases, message brokers, and distributed KV stores, where each instance must have a unique identity and its own data. It guarantees three things Deployment can’t.

  1. Stable network ID. Pod names are fixed as ordinals, like name-0, name-1, name-2. Even if a Pod dies and comes back, it returns with the same name and the same DNS name.
  2. Ordering guarantee. Creation proceeds in order from 0 (0 must be Ready before 1 is created), and deletion and scale-down proceed in reverse, starting from the highest number.
  3. Per-Pod dedicated storage. With volumeClaimTemplates, an independent PVC is created for each Pod, and even if a Pod is re-created, it reconnects to the PVC with the same number.

A headless Service is needed first #

A StatefulSet’s stable DNS names only work when a headless Service exists. A headless Service is a Service made with clusterIP: None; instead of getting a single cluster IP, it exposes each Pod’s DNS record directly. As a result, each Pod gets a unique address of the form pod-name.service-name.namespace.svc.cluster.local. Service types and how they behave overall are covered in #18.

StatefulSet + headless Service example #

# headless Service: clusterIP is None
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
    - name: web
      port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx       # must match the headless Service name above
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - name: web
              containerPort: 80
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:     # auto-create a PVC per Pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi

Let’s lay out what this manifest creates.

  • Pods are created in order as web-0, web-1, web-2.
  • Each Pod automatically gets a PVC named data-web-0, data-web-1, data-web-2.
  • DNS names stay stable, like web-0.nginx.<namespace>.svc.cluster.local.

# Check that the Pod names came up as ordinals
k get pods -l app=nginx

# Check the auto-created PVCs
k get pvc

# Check the headless Service's endpoints (each Pod IP)
k get endpoints nginx

Watch out when scaling and deleting #

When you scale down, Pods disappear starting from the highest number, but the PVCs created by volumeClaimTemplates are not deleted automatically. This is intended behavior to prevent data loss. The PVCs remain even when you delete the StatefulSet, so if cleanup is needed you have to delete the PVCs yourself.

# Scale down (web-2 disappears first, PVC remains)
k scale statefulset web --replicas=2

# Delete only the StatefulSet and leave the Pods (rarely used)
k delete statefulset web --cascade=orphan
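
Keeping the PVCs is the default. On clusters recent enough to support the persistentVolumeClaimRetentionPolicy field, that default can be changed per StatefulSet. The fragment below is a sketch under that assumption, so check that your cluster version supports the field before relying on it.

```yaml
# Fragment of the StatefulSet spec (assumes the cluster supports this field).
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep PVCs when the StatefulSet is deleted (the default)
    whenScaled: Delete    # delete the PVC of a Pod removed by scale-down
```

If the field is omitted, both values behave as Retain, which matches the behavior described above.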

The default update strategy is RollingUpdate, which replaces Pods in reverse order, starting from the highest number. Setting a partition value enables a staged rollout (canary) that updates only the numbers at or above that value.
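
A staged rollout with partition might look like the fragment below, applied to the three-replica web StatefulSet from the example. The value 2 is only an illustration.

```yaml
# Fragment of the StatefulSet spec for web (replicas: 3).
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only Pods with ordinal >= 2 (here web-2) get the new template
```

After a template change, web-2 is replaced while web-0 and web-1 stay on the old version. Lowering partition to 0 then lets the rollout continue through the remaining Pods in reverse order.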

Job: work that runs toward completion #

Deployment and DaemonSet are workloads that keep Pods alive. A Job, by contrast, is a workload that finishes once its Pods have completed successfully the specified number of times. You use it for work that should run once (or a fixed number of times) and stop, like data migrations, batch computations, and backup scripts.

There are four key fields.

| Field | Meaning |
| --- | --- |
| completions | Total number of Pods that must succeed. Default 1 |
| parallelism | Number of Pods that can run at once. Default 1 |
| backoffLimit | Retry limit on failure. Default 6. Exceeding it fails the Job |
| restartPolicy | Only OnFailure or Never allowed (Always not permitted) |

Unlike a Deployment, a Job’s Pod can’t use restartPolicy: Always. Infinite restarts don’t make sense for work that aims for completion.

Basic Job example #

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  completions: 4        # must succeed 4 times total to complete
  parallelism: 2        # run 2 in parallel at a time
  backoffLimit: 4       # retry up to 4 times on failure
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pi
          image: perl:5.34
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]

# Job status (complete when COMPLETIONS reaches 4/4)
k get jobs

# Check the Pods the Job created, and their logs
k get pods --selector=job-name=pi
k logs job/pi

# Clean up the completed Job
k delete job pi

Setting activeDeadlineSeconds lets you forcibly terminate a Job that runs past that time, and setting ttlSecondsAfterFinished automatically cleans up a completed Job and its Pods after a set time.
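
As a rough sketch, both fields sit directly under the Job’s spec, next to backoffLimit. The Job name pi-with-deadline and the values 120 and 300 are made up for illustration; the container is the same one as in the basic example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-deadline          # hypothetical name
spec:
  activeDeadlineSeconds: 120      # fail the Job and stop its Pods after 120 seconds
  ttlSecondsAfterFinished: 300    # delete the Job and its Pods 300 seconds after it finishes
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pi
          image: perl:5.34
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```

activeDeadlineSeconds covers the whole Job, retries included, so the Job fails when the deadline passes even if backoffLimit has not been reached.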

CronJob: stamping out Jobs on a schedule #

A CronJob is a workload that periodically creates Jobs according to a cron expression. A CronJob doesn’t create Pods directly; on each schedule it creates one Job object, and that Job creates the Pods that do the work. You use it for recurring work like nightly backups, periodic reports, and cache cleanup.

The key fields are as follows.

| Field | Meaning |
| --- | --- |
| schedule | cron expression (minute hour day month weekday) |
| concurrencyPolicy | Policy for when the previous run hasn’t finished |
| successfulJobsHistoryLimit | Number of successful Jobs to keep. Default 3 |
| failedJobsHistoryLimit | Number of failed Jobs to keep. Default 1 |
| startingDeadlineSeconds | Allowed delay when the scheduled time is missed |

concurrencyPolicy has three values.

  • Allow (default). Starts a new Job concurrently even if the previous run hasn’t finished.
  • Forbid. Skips this run if the previous one hasn’t finished.
  • Replace. Cancels the previous run and replaces it with the new Job.

CronJob example #

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"           # every day at 3 AM
  concurrencyPolicy: Forbid       # skip if the previous backup hasn't finished
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 120
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: bitnami/postgresql:16
              command: ["/bin/sh", "-c", "pg_dump ... > /backup/dump.sql"]

schedule is five fields: minute hour day month weekday. 0 3 * * * means daily at 03:00, */15 * * * * means every 15 minutes, and 0 0 * * 0 means midnight every Sunday.
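
One more expression, written as a CronJob spec fragment, may help when reading the five fields. The schedule value is an illustrative choice, and timeZone is an optional field that is assumed to be available on your cluster version.

```yaml
# Field order: minute  hour  day-of-month  month  day-of-week
spec:
  schedule: "30 2 * * 1-5"   # 02:30, Monday through Friday
  timeZone: "Etc/UTC"        # optional; without it the controller's local time zone applies
```

Quoting the schedule avoids YAML parsing surprises with expressions that begin with an asterisk.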

# CronJob list and last scheduled time
k get cronjob

# Check the Jobs the CronJob stamped out
k get jobs --watch

# Run once manually right now (when testing)
k create job manual-backup --from=cronjob/db-backup

To pause a CronJob for a while, set spec.suspend: true and new Job creation stops. This is handy for preventing a backup from running during maintenance.
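
In manifest form, the pause is a single field on the CronJob spec. The fragment below is a sketch against the db-backup example above.

```yaml
# Fragment of the CronJob spec for db-backup.
spec:
  suspend: true   # stop creating new Jobs; Jobs that are already running are not affected
```

Setting the field back to false resumes scheduling.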

Exam points #

Here are the points that decide your score when these workloads show up on the CKA hands-on exam.

  • Save time with imperative creation. For Job and CronJob, it’s faster to generate the skeleton with kubectl create, dump the manifest with $do (a shell variable holding --dry-run=client -o yaml), and edit it.

    k create job test --image=busybox $do -- /bin/sh -c "echo hi" > job.yaml
    k create cronjob hello --image=busybox --schedule="*/1 * * * *" $do \
      -- /bin/sh -c "date" > cron.yaml

  • You can’t make a DaemonSet with kubectl create. kubectl create has no daemonset subcommand, so memorize the conversion pattern: dump a Deployment manifest with $do, change kind to DaemonSet, and remove replicas and the Deployment-only strategy field.

  • The restartPolicy trap. A Job’s and CronJob’s Pod template only allows OnFailure or Never. Leaving the default (Always) in place gets the manifest rejected, so always specify it.

  • StatefulSet pairs serviceName with a headless Service. Without the headless Service (clusterIP: None) that serviceName points to, stable DNS won’t work. Making it a habit to submit both in one manifest is safe.

  • PVCs remain. Scaling down or deleting a StatefulSet does not auto-delete the PVCs created by volumeClaimTemplates. If the question asks for cleanup, you have to delete the PVCs yourself.

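  • selector and template labels must match. As we saw in #10, DaemonSet and StatefulSet require a selector, and creation is rejected if selector.matchLabels and template.metadata.labels don’t match. Job and CronJob are the exception: the Job controller generates the selector automatically, so leave it out of those manifests.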
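
The DaemonSet conversion from the list above can be sketched as follows. It assumes the skeleton came from a Deployment dry run; the name agent, the busybox image, and the sleep command are placeholders, not part of any real exam task.

```yaml
apiVersion: apps/v1
kind: DaemonSet              # changed from Deployment
metadata:
  name: agent                # hypothetical name
  labels:
    app: agent
spec:
  # replicas: 1              # removed: a DaemonSet has no replicas field
  # strategy: {}             # removed: a DaemonSet uses updateStrategy instead
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent           # must match selector.matchLabels
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ["sleep", "3600"]   # placeholder so the container keeps running
```

The two commented-out lines are the ones to delete from the generated Deployment skeleton; everything else carries over unchanged.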

Practicing building these workloads up and tearing them down yourself on a real cluster is what builds the hand-speed you’ll need in the exam room. If you want broader background, the Kubernetes intermediate series revisits these same resources from an operational angle.

Summary #

What we covered in this post:

  • DaemonSet. One Pod per node. No replicas. Limit target nodes with nodeSelector, extend to control plane nodes with tolerations, RollingUpdate/OnDelete update strategies
  • StatefulSet. Stable ordinal IDs, creation/deletion order guarantee, per-Pod dedicated PVC. The headless Service (clusterIP: None) that serviceName points to and volumeClaimTemplates are the core. PVCs are not auto-deleted
  • Job. Work that aims for completion. completions/parallelism/backoffLimit, restartPolicy allows only OnFailure/Never
  • CronJob. Creates a Job per schedule. schedule (five cron fields), concurrencyPolicy (Allow/Forbid/Replace), history limits, pause with suspend
  • Exam points. Secure the skeleton with imperative creation, write DaemonSet by conversion, the restartPolicy trap, StatefulSet pairs with a headless Service, handling PVC remnants

Next: ConfigMap and Secret in Depth #

That covers the full shape of workloads. But how to inject config values and secret information into a Pod is something we haven’t touched yet.

In #12 ConfigMap and Secret in Depth, we’ll work through the ConfigMap that separates config from code, the types and base64 encoding of the Secret that holds sensitive information, how to inject both into a Pod as environment variables and as volumes, and how changes propagate when values change.
