Chapter 4

Deployment and ReplicaSet

This chapter covers declarative deployment and rolling updates: the three-tier relationship among Deployment, ReplicaSet, and Pod; self-healing with replicas: 3; RollingUpdate’s maxSurge and maxUnavailable; rollback with rollout undo; and the workloads a Deployment doesn’t solve (StatefulSet · DaemonSet · Job).

Chapter 3 kubectl and your first Pod ended on one line: a Pod is mortal, so a Pod you bring up directly simply disappears when it dies. That line is the starting point of this chapter. Here we cover the first controller that fills that gap automatically, the Deployment, and the ReplicaSet beneath it. We look at how replicas: 3 keeps Pods running, how self-healing works when you delete one Pod, and how rolling updates and rollbacks work when you change the image tag.

By the end of this chapter you’ll have your first manifest that leaves Pods to a controller instead of bringing them up by hand. From here on, this is essentially the base form of the manifests people write for operations.

Deployment, ReplicaSet, Pod — the relationship of three tiers #

The stars of this chapter are three resources, but you write only the top layer. It helps to fix the picture in your head like this.

structure of three tiers

   ┌──────────────────────┐
   │     Deployment       │  ← the manifest a person writes
   │  (web)               │
   └──────────┬───────────┘
              │ creates / manages
              ▼
   ┌──────────────────────┐
   │     ReplicaSet       │  ← Deployment creates it automatically
   │  (web-abc123)        │
   └──────────┬───────────┘
              │ creates / maintains
              ▼
   ┌──────────┬──────────┬──────────┐
   │   Pod    │   Pod    │   Pod    │  ← the actual workload
   │ web-...  │ web-...  │ web-...  │
   └──────────┴──────────┴──────────┘

Organizing each tier’s responsibility in one line gives this.

Deployment — the manifest you write. It says, “bring up N copies of this Pod template, and switch to a new version with a rolling update.” It is the resource you touch most often in operations.
ReplicaSet — the intermediate object a Deployment creates automatically. Its responsibility is simple: always maintain N copies of this Pod template. People rarely write a ReplicaSet manifest directly.
Pod — the actual workload. The ReplicaSet creates it, and when it dies the ReplicaSet creates it again. It is the same Pod we brought up by hand in Chapter 3, but now it comes back on its own no matter who kills it.

Why two layers #

At first glance, one Deployment layer seems enough. You may ask why ReplicaSet exists separately. The reason is version rollout.

When deploying a new version, the Deployment creates one more ReplicaSet for it. It raises the new ReplicaSet’s replicas gradually to 1, 2, 3 while lowering the old ReplicaSet’s replicas gradually from 3 to 2, 1, 0. In between there’s a short stretch where both ReplicaSets’ Pods are up together, and this overlap is the heart of the rolling update. When done, the old ReplicaSet sits empty at 0 but remains as an object, so when a rollback is needed you revert by growing that one back to N.

In short, a Deployment is the layer that handles the transition between versions and a ReplicaSet is the layer that maintains one version at N. Because the two are split, the old version and the new version can briefly coexist in the same cluster.

Your first Deployment manifest #

This time write the same nginx:1.27 as a Deployment rather than a Pod. Name the file web.yaml and write it as follows.

web.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80

Compared with the Pod manifest from Chapter 3, there are three newly arrived parts.

apiVersion: apps/v1 — Pod was v1, but Deployment is in the apps/v1 API group. The controller-family resources (Deployment, StatefulSet, DaemonSet, ReplicaSet) all belong to the same group.
spec.replicas: 3 — a declaration that 3 of this Pod template must always be up.
spec.selector.matchLabels + spec.template — the label condition by which the Deployment finds the Pods it manages, and the template of what shape those Pods should be. The shape under template is exactly the same as the Pod’s metadata + spec we saw in Chapter 3.

One rule — the selector and template labels must match #

The mistake that blows up most often when first writing a manifest is here. Every label in spec.selector.matchLabels must also appear, with the same value, in spec.template.metadata.labels; the template may carry extra labels, but it cannot miss one the selector asks for. If they don’t match, K8s rejects the manifest. It’s not a mere recommended convention but a validation rule the apiserver enforces.

That’s why both are set to app: web in the manifest above. If you set the selector to app: web but changed only the template’s label to app: nginx, kubectl apply spits out an error like the following.

when the labels are off

The Deployment "web" is invalid: spec.template.metadata.labels:
  Invalid value: map[string]string{"app":"nginx"}:
  `selector` does not match template `labels`

A simple model in your head is this — the selector is “how I recognize the Pods I manage,” and the template is “the labels of the Pods I create.” The near-tautological rule is that the two must match so I recognize the Pods I created.
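The matching rule can be sketched in a few lines of Python. This is only a model of the idea, not the apiserver’s validation code, and the function name selector_matches is made up for illustration. It assumes a selector that uses matchLabels only, where every key and value in the selector must be present in the Pod’s labels.

```python
def selector_matches(match_labels: dict, pod_labels: dict) -> bool:
    """Return True when every selector key/value pair is present on the Pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


selector = {"app": "web"}

# The template from web.yaml: the selector recognizes the Pods it creates.
print(selector_matches(selector, {"app": "web"}))                   # True
# Extra labels on the template are fine.
print(selector_matches(selector, {"app": "web", "tier": "front"}))  # True
# The mismatch from the error above: the selector would not see its own Pods.
print(selector_matches(selector, {"app": "nginx"}))                 # False
```

The third case is the one the apiserver refuses: a Deployment whose selector cannot recognize the Pods its own template produces.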

Applying it #

Apply web.yaml to the cluster.

create the Deployment

kubectl apply -f web.yaml

example output

deployment.apps/web created

Let’s see the three kinds of resource at once. kubectl get can take several resource kinds at once, separated by commas.

three tiers at once

kubectl get deploy,rs,pods

example output

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   3/3     3            3           20s

NAME                            DESIRED   CURRENT   READY   AGE
replicaset.apps/web-abc123      3         3         3       20s

NAME                  READY   STATUS    RESTARTS   AGE
pod/web-abc123-aa11   1/1     Running   0          20s
pod/web-abc123-bb22   1/1     Running   0          20s
pod/web-abc123-cc33   1/1     Running   0          20s

How to read it is as follows.

Deployment line: READY 3/3 means all 3 desired Pods are ready, UP-TO-DATE 3 is the number of Pods updated to the current template, and AVAILABLE 3 is the number of Pods that have stayed Ready for at least minReadySeconds (0 by default) and are fit to take traffic.
ReplicaSet line: the name has the form web-abc123. The trailing abc123 looks random, but it is a hash K8s computes from the Pod template, so a changed template gets a differently named ReplicaSet. DESIRED 3 / CURRENT 3 / READY 3 are the key columns.
Pod line: the name has two generated parts, as in web-abc123-aa11. The front part (web-abc123) matches the ReplicaSet name, and only the last part (aa11) is a random suffix. Who created the Pod is shown right in the name.

Pinning the name pattern in one line — <deployment>-<replicaset-hash>-<pod-suffix>. You meet this shape often to the end of the book.
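As a small illustration, the pattern can be taken apart in Python. The helper name split_pod_name is hypothetical, and the sketch assumes the name really follows the three-part form above. It splits from the right because a Deployment name may itself contain hyphens. The name is only a reading aid; the authoritative link between a Pod and its ReplicaSet is the owner reference described at the end of this chapter.

```python
def split_pod_name(pod_name: str) -> tuple:
    """Split '<deployment>-<replicaset-hash>-<pod-suffix>' into its three parts."""
    # rsplit from the right: the Deployment name may contain hyphens itself.
    deployment, rs_hash, pod_suffix = pod_name.rsplit("-", 2)
    return deployment, rs_hash, pod_suffix


print(split_pod_name("web-abc123-aa11"))     # ('web', 'abc123', 'aa11')
print(split_pod_name("my-web-abc123-aa11"))  # ('my-web', 'abc123', 'aa11')
```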

Killing a Pod — self-healing #

It’s time to confirm this manifest’s first effect. In Chapter 3, deleting a Pod just made it disappear. Now let’s confirm how it’s different.

delete one Pod by hand

kubectl delete pod web-abc123-aa11

example output

pod "web-abc123-aa11" deleted

Immediately get the Pod list again.

look again

kubectl get pods

example output

NAME                  READY   STATUS    RESTARTS   AGE
web-abc123-bb22       1/1     Running   0          2m
web-abc123-cc33       1/1     Running   0          2m
web-abc123-dd44       1/1     Running   0          5s

There are still three up. But looking at one line closely shows a difference — bb22 and cc33 are AGE 2m, while the newly seen dd44 is AGE 5s. It’s a Pod just brought up fresh. The changed random value after the name is also a clue that it’s a new Pod.

This is the reconcile loop we saw as a diagram in Chapter 1 What Kubernetes Is at work. The ReplicaSet holds “3 Pods must exist,” and the moment a person deleted one, desired (3) and actual (2) diverged. The ReplicaSet controller inside the controller-manager detects that difference and asks the apiserver to create one more Pod. The scheduler decides a node, the kubelet brings up the container, and it’s back to 3. A person did nothing extra.

The same thing happens at the node level. When a node that some Pods were on dies, K8s doesn’t literally move those Pods: after a timeout they are evicted from the dead node, and the ReplicaSet controller creates replacements that the scheduler places on live nodes. “The service must stay alive even when a node dies,” which we saw in Chapter 1, is in practice the problem this ReplicaSet controller solves.
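The reconcile idea fits in a few lines of Python. This is a toy model under simple assumptions, not the controller’s real code: the function reconcile and the new_names iterator are hypothetical, Pods are plain strings, and scheduling and container startup are left out.

```python
def reconcile(desired: int, running: list, new_names) -> list:
    """One pass of a ReplicaSet-style loop: close the gap between desired and actual."""
    pods = list(running)
    while len(pods) < desired:   # too few: create replacements
        pods.append(next(new_names))
    del pods[desired:]           # too many: remove the surplus
    return pods


pods = ["web-abc123-aa11", "web-abc123-bb22", "web-abc123-cc33"]
pods.remove("web-abc123-aa11")   # a person deletes one Pod: actual is now 2

# The controller notices desired (3) != actual (2) and creates one more.
pods = reconcile(3, pods, iter(["web-abc123-dd44"]))
print(pods)  # ['web-abc123-bb22', 'web-abc123-cc33', 'web-abc123-dd44']
```

The same function also shrinks the list when the desired count is lowered, which is why scaling down needs no separate mechanism.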

Adjusting replicas #

When 3 is too few or too many, there are two paths to adjusting the count.

Declarative — change the number in the manifest and apply again. The cleanest path.

web.yaml — replicas to 5 only

spec:
  replicas: 5
  ...

apply again

kubectl apply -f web.yaml

example output

deployment.apps/web configured

Looking at kubectl get pods, it’s soon grown to 5. Shrinking is the same way — reduce the number in the manifest and apply.

Imperative — fast but temporary.

imperative scale

kubectl scale deploy/web --replicas=5

example output

deployment.apps/web scaled

It’s light for momentarily growing and shrinking. But the downside is clear — the manifest’s replicas value and the cluster’s actual state diverge. The manifest still says replicas: 3 while the cluster has 5 up. The next time someone calls kubectl apply -f web.yaml once more without much thought, the 5 shrink back to 3.

So, organizing it in one line — the declarative manifest is always the source of truth. Use kubectl scale only when you need to touch something up quickly while debugging, or when you intend to re-sync the manifest soon, and the normal flow is to fix the manifest and apply. This principle is the foundation of the entire desired state model we saw in Chapter 1, and it’s covered more seriously once more with ArgoCD / Flux in Chapter 20 GitOps.

The option to grow and shrink replicas automatically with load instead of a person deciding is covered in Chapter 13 Autoscaling with HPA · VPA · Cluster Autoscaler.

Rolling update — the default behavior of zero-downtime deploys #

Now, for the first time in the book, we deploy a new version. Change the image tag from nginx:1.27 to nginx:1.28 — just one character.

web.yaml — image tag to 1.28 only

      containers:
        - name: web
          image: nginx:1.28
          ports:
            - containerPort: 80

apply the new version

kubectl apply -f web.yaml

example output

deployment.apps/web configured

The visible output is a single line, but quite a lot happens underneath.

What happens inside #

The Deployment controller sees the template changed and creates a new ReplicaSet for that new template. Then it grows the new RS’s replicas from 0 to 1, 2, 3, and shrinks the old RS’s replicas from 3 to 2, 1, 0. At each step a new Pod must come into Ready state before it moves to the next step.

If you look with kubectl get rs mid-progress, two ReplicaSet lines show.

mid-rollout

kubectl get rs

example output — mid-way

NAME             DESIRED   CURRENT   READY   AGE
web-abc123       2         2         2       10m   ← old RS (1.27)
web-def456       2         2         1       30s   ← new RS (1.28)

The old RS has shrunk from 3 to 2, and the new RS has risen from 0 to 2, with one of its Pods not yet Ready. This one frame is the essence of the rolling update. The two RS’s Pods are briefly up together, and traffic during that time is spread across the Ready Pods of both versions by the Service we’ll cover in Chapter 5 Service.
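To see the whole sequence rather than one frame, here is a simplified Python simulation. It is a sketch, not the Deployment controller’s algorithm: the function rolling_update is hypothetical, and it assumes every new Pod becomes Ready immediately. The limits used are one surge Pod and zero unavailable Pods, which is what the 25% defaults resolve to for 3 replicas.

```python
def rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> list:
    """Return the (old, new) replica counts after each scaling action."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0  # simplification: every Pod counted here is Ready
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up the new ReplicaSet, limited by the surge ceiling.
        room = replicas + max_surge - (old + new)
        grow = min(room, replicas - new)
        if grow > 0:
            new += grow
            history.append((old, new))
        # Scale down the old ReplicaSet, limited by the availability floor.
        spare = (old + new) - (replicas - max_unavailable)
        shrink = min(spare, old)
        if shrink > 0:
            old -= shrink
            history.append((old, new))
    return history


print(rolling_update(3, max_surge=1, max_unavailable=0))
# [(3, 0), (3, 1), (2, 1), (2, 2), (1, 2), (1, 3), (0, 3)]
```

The (2, 2) entry is the frame shown in the kubectl get rs output above. At no step does the total exceed 4 Pods or drop below 3.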

Monitoring progress #

If you want to see rollout progress in one line, the following command is the most convenient.

rollout progress

kubectl rollout status deploy/web

example output

Waiting for deployment "web" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "web" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "web" rollout to finish: 1 old replicas are pending termination...
deployment "web" successfully rolled out

Progress steps drop on the screen line by line, and when success prints at the end, the new-version deploy is done. Looking at kubectl get rs again afterward shows the old RS empty at DESIRED 0 but remaining as an object. This structure makes the rollback in the next section possible.

The default strategy, in one line #

The exact reason the flow above happens is that a Deployment’s spec.strategy default is RollingUpdate, and its two parameters are as follows.

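maxSurge: 25% is the limit on how many Pods can be temporarily up beyond the desired count. The percentage rounds up, so with 3 as the base, up to 1 extra is allowed.
maxUnavailable: 25% is the limit on how many Pods can be temporarily short of the desired count. The percentage rounds down, so with 3 as the base it resolves to 0: no Pod may be missing, and an old Pod is removed only after a new one is Ready.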
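The percentages turn into Pod counts with different rounding: maxSurge rounds up and maxUnavailable rounds down. The Python sketch below only reproduces that arithmetic; the function name resolve_rolling_limits is made up for illustration.

```python
def resolve_rolling_limits(replicas: int, max_surge_pct: int = 25,
                           max_unavailable_pct: int = 25) -> tuple:
    """Convert percentage limits into absolute Pod counts."""
    surge = -(-replicas * max_surge_pct // 100)          # integer ceiling
    unavailable = replicas * max_unavailable_pct // 100  # integer floor
    return surge, unavailable


for replicas in (3, 4, 10):
    print(replicas, resolve_rolling_limits(replicas))
# 3 (1, 0)
# 4 (1, 1)
# 10 (3, 2)
```

With 3 replicas the defaults allow one extra Pod and no missing Pods, which is why a rollout of this chapter’s Deployment always keeps at least 3 Pods serving.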

The other option is Recreate — kill all old Pods and bring up new ones. It isn’t zero-downtime, but it’s sometimes used for stateful workloads where two versions must not be up at once (e.g., a DB migration occupying the same volume). For a regular web server the default RollingUpdate is enough.

What happens on failure #

Let’s deliberately write the image tag wrong — say something like nginx:1.99-not-real.

apply with a nonexistent tag

kubectl apply -f web.yaml

kubectl rollout status deploy/web stays stuck for a long time, and looking with kubectl get pods shows one newly created Pod in ImagePullBackOff state.

example output

NAME                  READY   STATUS             RESTARTS   AGE
web-abc123-aa11       1/1     Running            0          15m
web-abc123-bb22       1/1     Running            0          15m
web-abc123-cc33       1/1     Running            0          15m
web-ghi789-zz99       0/1     ImagePullBackOff   0          40s

The interesting part is that the 3 old Pods are still alive. If the new Pod can’t come into Ready, the Deployment doesn’t move to the next step. That is, it doesn’t shrink the old RS to 0. Even while the rollout is stuck, the old version takes traffic normally. The heart of zero-downtime is here.

The debugging order at this point is exactly as organized at the end of Chapter 3.

looking into a stuck rollout

kubectl describe deploy/web
kubectl describe pod web-ghi789-zz99

Once the progress deadline (10 minutes by default) passes, describe deploy’s Conditions show Progressing as False with a message like ReplicaSet ... has timed out progressing, and describe pod’s Events have Failed to pull image "nginx:1.99-not-real". The answer is almost always in these two outputs. The finished version of the diagnostic tree is organized in Chapter 27 kubectl debugging patterns.

Rollback #

If you find a new version went up wrong, the path to revert to the old version is prepared in one line.

rollout history

kubectl rollout history deploy/web

example output

deployment.apps/web
REVISION  CHANGE-CAUSE
1         <none>
2         <none>

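A revision list shows. Number 1 is the initial nginx:1.27, and number 2 is the nginx:1.28 we just put up. (If you also applied the broken tag from the previous section, that attempt appears as revision 3.) To revert to the immediately previous revision, do this.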

revert to the previous revision

kubectl rollout undo deploy/web

example output

deployment.apps/web rolled back

If you want to specify a particular revision, use the --to-revision flag.

to a specific revision

kubectl rollout undo deploy/web --to-revision=1

The reason this is possible fits in one line: a revision is another guise of a ReplicaSet. When deploying the new version, the old ReplicaSet didn’t disappear; it stayed emptied to replicas: 0. undo makes that old RS’s template the current one again and grows the RS back to N with the same rolling procedure. So the old version takes traffic again as soon as its Pods are Ready.
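One detail that surprises people is how revision numbers move on undo. The Python sketch below is a plain model, not Kubernetes code; rollout_undo and the history dictionary are hypothetical names. It assumes the behavior where the template you roll back to is re-registered as the newest revision, so its old number leaves the list.

```python
from typing import Dict, Optional


def rollout_undo(history: Dict[int, str],
                 to_revision: Optional[int] = None) -> Dict[int, str]:
    """Model `rollout undo`: the chosen old template becomes the newest revision."""
    revisions = sorted(history)
    latest = revisions[-1]
    if to_revision is None:
        if len(revisions) < 2:
            raise ValueError("no previous revision to roll back to")
        to_revision = revisions[-2]   # default: the immediately previous one
    if to_revision == latest:
        return dict(history)          # already current: nothing to do
    template = history[to_revision]   # KeyError if that revision was pruned
    updated = {rev: tpl for rev, tpl in history.items() if rev != to_revision}
    updated[latest + 1] = template
    return updated


print(rollout_undo({1: "nginx:1.27", 2: "nginx:1.28"}))
# {2: 'nginx:1.28', 3: 'nginx:1.27'}
```

After the undo, kubectl rollout history would list revisions 2 and 3 in this model: nginx:1.27 is running again, but under a new revision number.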

By default K8s keeps up to 10 old revisions. You can raise or lower that with the manifest’s spec.revisionHistoryLimit. Set it too high and old ReplicaSets pile up in the cluster, making cleanup messy; set it too low and you can’t jump back to a much older version in one step. Match it to the deploy frequency of your operational environment. For a typical web service the default 10 is fine.

What Deployment doesn’t solve #

The Deployment we’ve covered so far doesn’t handle every workload shape. Let’s note the cases of a different kind in one section.

Stateful workloads: workloads where each instance must have its own name and its own disk, like a database, fit StatefulSet. Pod names are assigned stably as web-0, web-1, and each Pod is bound 1:1 to its own PVC, created from the volumeClaimTemplates in the manifest. The start order is also guaranteed as 0 → 1 → 2. A Deployment gives its Pods random names and treats them as interchangeable, with no per-Pod disk, so it’s not suitable for a DB.
Workloads that must run one per node: workloads like log collectors (Fluent Bit, Filebeat), node monitors (Node Exporter), and CNI agents fit DaemonSet. When a new node joins the cluster, one comes up on that node automatically.
One-off jobs: a task that runs once and finishes, like a migration, backup, or batch job, uses a Job (run immediately) or CronJob (scheduled). It’s a workload where a Pod going into Phase Succeeded is natural.

These three are covered in Chapter 8 StatefulSet / DaemonSet / Job / CronJob. This chapter focuses only on the most-touched Deployment. Still, it’s good to fix the classification in your head up front — no state, Deployment; state, StatefulSet; one per node, DaemonSet; one-off, Job.

Cleanup and teardown #

Wipe today’s resources clean. Deleting one Deployment cleans up the ReplicaSet and Pods beneath it together. This is the part where K8s garbage-collects through the parent-child relationship (owner reference).

clean up with the manifest

kubectl delete -f web.yaml

example output

deployment.apps "web" deleted

really empty?

kubectl get deploy,rs,pods

example output

No resources found in default namespace.

There’s also the path of deleting directly by name.

clean up by name

kubectl delete deploy web

Confirm that deleting only the Deployment makes the ReplicaSet and Pods disappear all at once. This owner reference model works the same way later in the book too.
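The owner reference idea can be modeled in Python as a map from each object to its owner. This is a sketch under simple assumptions, not the garbage collector’s real implementation: the function cascade_delete and the "kind/name" keys are hypothetical, and the real cleanup runs in the background rather than in one synchronous call.

```python
def cascade_delete(objects: dict, name: str) -> dict:
    """Return the objects left after deleting `name` and everything it owns.

    `objects` maps an object name to its owner's name (None for no owner).
    """
    doomed = {name}
    changed = True
    while changed:  # keep sweeping until no new dependents are found
        changed = False
        for obj, owner in objects.items():
            if owner in doomed and obj not in doomed:
                doomed.add(obj)
                changed = True
    return {obj: owner for obj, owner in objects.items() if obj not in doomed}


cluster = {
    "deploy/web": None,
    "rs/web-abc123": "deploy/web",
    "pod/web-abc123-aa11": "rs/web-abc123",
    "pod/web-abc123-bb22": "rs/web-abc123",
    "pod/web-abc123-cc33": "rs/web-abc123",
}

print(cascade_delete(cluster, "deploy/web"))                 # {}
print(len(cascade_delete(cluster, "pod/web-abc123-aa11")))   # 4
```

Deleting the Deployment empties the whole tree, while deleting a single Pod removes only that Pod (which, in a real cluster, the ReplicaSet then replaces).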

Exercises #

After bringing up web.yaml with replicas: 3 as in the body, delete one Pod with kubectl delete pod <name>. Record in time order how the AGE column of kubectl get pods changes, and note how the random suffix at the end of the new Pod’s name changed. Write a paragraph about where the ReplicaSet controller’s reconcile loop filled the difference.
After rolling out once from nginx:1.27 to nginx:1.28, deliberately apply once more with a nonexistent tag like nginx:1.99-not-real. Organize how kubectl get rs and kubectl get pods get stuck, and how the 3 old Pods remaining in a state that can take traffic connects to the zero-downtime heart of §“What happens on failure.” Record the full flow up to reverting cleanly with kubectl rollout undo deploy/web as one pass.
After growing imperatively with kubectl scale deploy/web --replicas=5, re-run kubectl apply of the manifest (replicas: 3) once more and confirm the 5 shrink back to 3. Note in a paragraph how the conclusion of §“Adjusting replicas,” “the declarative manifest is always the source of truth,” connects to the mental model of Chapter 20 GitOps.

In one line: a Deployment is the manifest that handles “maintain N of this Pod template + rolling transition to a new version,” the ReplicaSet beneath it takes responsibility for maintaining N of one version, and a Pod takes responsibility for the actual execution. Even if you delete a Pod, the ReplicaSet brings up a new one and self-healing works automatically. A new-version deploy is a rolling update that gradually grows the new RS and gradually shrinks the old RS; on failure the old RS keeps taking traffic, and one line of kubectl rollout undo reverts by growing the old RS again.

Next chapter #

Even this far, one thing is still unsolved — how do you send traffic from outside into a Pod inside the cluster? The 3 nginx Pods we just created merely have cluster-internal IPs, and those IPs change every time a Pod dies and comes back up. The ReplicaSet revives Pods automatically, but since the IP of a Pod revived that way differs each time, where the client should connect is ambiguous.

In Chapter 5 Service we cover how a Service puts a stable virtual IP / DNS name in front of Pods, the difference between the three kinds — ClusterIP for cluster-internal communication, NodePort that opens externally via a node port, and LoadBalancer that attaches a cloud load balancer — and the flow of attaching a Service in front of the app: web Pods we brought up in this chapter to make your first external connection. The 3 Pods of this chapter become a “service with an address” for the first time there.