K8s Basics #4: Deployment and ReplicaSet — Declarative Deploys and Rolling Updates

14 min read

As #3 (kubectl and your first Pod) ended, a Pod you create directly is simply gone once it’s deleted, and nothing brings it back. This post covers the first controller that fills that gap automatically, Deployment, along with the ReplicaSet that sits underneath it. In one pass: declare replicas: 3 to keep Pods alive, watch auto-recovery when one Pod is deleted, and see how rolling updates and rollbacks behave when you change an image tag.

This post is part of K8s Basics, a 7-post series.

By the end of this post you’ll have the first manifest that hands Pod management off to a controller instead of creating Pods by hand. From here on this is essentially the baseline shape of manifests in real-world ops.

Deployment, ReplicaSet, Pod — three layers

The cast for this post is three resources, but the human only writes the top one. A handy mental picture:

The three layers
   ┌──────────────────────┐
   │     Deployment       │  ← what you write in the manifest
   │  (web)               │
   └──────────┬───────────┘
              │ creates / manages
   ┌──────────────────────┐
   │     ReplicaSet       │  ← created automatically by Deployment
   │  (web-abc123)        │
   └──────────┬───────────┘
              │ creates / maintains
   ┌──────────┬──────────┬──────────┐
   │   Pod    │   Pod    │   Pod    │  ← actual workload
   │ web-...  │ web-...  │ web-...  │
   └──────────┴──────────┴──────────┘

One-line responsibilities:

  • Deployment — the manifest the human writes. Declares “spin up N Pods of this template, and here’s how to switch to a new version (rolling update).” Effectively the resource you touch most often in real-world ops.
  • ReplicaSet — the intermediate object Deployment creates automatically. One job — “keep N Pods of this template alive at all times.” You almost never write a ReplicaSet manifest yourself.
  • Pod — the actual workload. ReplicaSet creates them; if one dies, ReplicaSet creates another. The same Pods we hand-created in #3 — but now if one dies, a replacement comes back on its own.

Why two layers?

It can look like one Deployment layer should be enough. Why does ReplicaSet exist as its own thing? The reason is new-version deploys.

When you push a new version, Deployment creates a brand-new ReplicaSet. It scales the new RS replicas up — 1, 2, 3 — while scaling the old RS down — 3, 2, 1. There’s a brief window where Pods from both RSes are up at the same time. That’s the heart of a rolling update. When it finishes, the old RS sits at 0 but stays around as an object — so a rollback can scale it back up.

In short, Deployment is the layer that handles transitions between versions, and ReplicaSet is the layer that maintains a single version at N. Splitting them lets old and new versions coexist briefly in the same cluster.

Your first Deployment manifest

This time we write the same nginx:1.27 not as a Pod but as a Deployment. Save the file as web.yaml:

web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80

Three new pieces compared to the Pod manifest in #3:

  • apiVersion: apps/v1 — Pod was v1, but Deployment lives in the apps/v1 API group. Controller-style resources (Deployment, StatefulSet, DaemonSet, ReplicaSet) all share that group.
  • spec.replicas: 3 — the declaration that 3 Pods of this template should always be up.
  • spec.selector.matchLabels + spec.template — the label condition Deployment uses to find the Pods it manages, and the template describing the Pod’s shape. The shape under template is exactly the metadata + spec of the Pod we saw in #3.

One rule — selector and template labels must match

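The most common mistake in a first manifest is right here: spec.selector.matchLabels and spec.template.metadata.labels have to match. More precisely, every label in the selector must also appear on the template. The template may carry extra labels, but not conflicting ones. If they don’t match, K8s rejects the manifest. This isn’t just a convention; it’s a validation rule the API server enforces.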

That’s why both fields above are app: web. If you set the selector to app: web but change the template’s label to app: nginx, kubectl apply will spit out:

When labels don't match
The Deployment "web" is invalid: spec.template.metadata.labels:
  Invalid value: map[string]string{"app":"nginx"}:
  `selector` does not match template `labels`

Simple mental model — the selector says “how I recognize the Pods I manage,” the template says “the labels on the Pods I create.” They have to match so the controller recognizes the Pods it just created — almost a tautology.
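For reference, here is the mismatched version described above: selector app: web, template label app: nginx. It is deliberately broken, so applying it should produce the error shown. Everything else is the web.yaml from this post, trimmed.

```yaml
# Deliberately INVALID: the selector asks for app: web,
# but the Pods this template creates would be labeled app: nginx.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: nginx   # does not satisfy the selector, so the API server rejects it
    spec:
      containers:
        - name: web
          image: nginx:1.27
```

Changing that label back to app: web fixes it. Keeping app: web and adding a second label next to it (say a hypothetical tier: frontend) would also pass, because the rule only requires the selector’s labels to be present on the template.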

Apply it

Push web.yaml to the cluster.

Create the Deployment
kubectl apply -f web.yaml
Example output
deployment.apps/web created

Let’s see all three resource kinds at once. kubectl get accepts a comma-separated list.

All three layers in one shot
kubectl get deploy,rs,pods
Example output
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   3/3     3            3           20s

NAME                            DESIRED   CURRENT   READY   AGE
replicaset.apps/web-abc123      3         3         3       20s

NAME                  READY   STATUS    RESTARTS   AGE
pod/web-abc123-aa11   1/1     Running   0          20s
pod/web-abc123-bb22   1/1     Running   0          20s
pod/web-abc123-cc33   1/1     Running   0          20s

How to read it:

  • Deployment row — READY 3/3 means all 3 desired replicas are ready, UP-TO-DATE 3 is the count of Pods already running the current template, and AVAILABLE 3 is the count that have stayed Ready for at least minReadySeconds (0 by default) and can take traffic.
  • ReplicaSet row — the name is web-abc123. The abc123 suffix is a hash K8s computes from the template. The columns to read are DESIRED 3 / CURRENT 3 / READY 3.
  • Pod row — names like web-abc123-aa11 carry two suffixes. The first part (web-abc123) matches the ReplicaSet name. The chain of who-created-what is right there in the name.

The naming pattern in one line — <deployment>-<replicaset-hash>-<pod-suffix>. You’ll see it constantly through the rest of the series.
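The hash in those names does real work. Below is a sketch of roughly what kubectl get rs web-abc123 -o yaml returns, abridged: abc123 is the same illustrative hash as above, and uid, status, and other server-set fields are left out. The Deployment adds a pod-template-hash label to the ReplicaSet’s selector and template, so each ReplicaSet only claims Pods from its own template version, and ownerReferences records which Deployment it belongs to.

```yaml
# Abridged excerpt of the auto-created ReplicaSet (illustrative hash, uid omitted).
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-abc123                       # <deployment>-<pod-template-hash>
  labels:
    app: web
    pod-template-hash: abc123
  annotations:
    deployment.kubernetes.io/revision: "1"   # the revision number rollout history reports
  ownerReferences:                       # "this ReplicaSet belongs to Deployment web"
    - apiVersion: apps/v1
      kind: Deployment
      name: web
      controller: true
      blockOwnerDeletion: true
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
      pod-template-hash: abc123          # added by the Deployment controller
  template:
    metadata:
      labels:
        app: web
        pod-template-hash: abc123
    spec:
      containers:
        - name: web
          image: nginx:1.27
```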

Killing a Pod — self-healing

Time to see what this manifest changed. In #3, deleting a Pod just made it disappear. This time it’s different.

Delete one Pod
kubectl delete pod web-abc123-aa11
Example output
pod "web-abc123-aa11" deleted

Pull the Pod list right after.

Look again
kubectl get pods
Example output
NAME                  READY   STATUS    RESTARTS   AGE
web-abc123-bb22       1/1     Running   0          2m
web-abc123-cc33       1/1     Running   0          2m
web-abc123-dd44       1/1     Running   0          5s

Still three Pods. Look closely — bb22 and cc33 show AGE 2m, but the new dd44 shows AGE 5s. A freshly created Pod. The changed suffix is another hint that this is a new one.

This is the reconcile loop from #1 at work. ReplicaSet holds “3 Pods should exist,” and the moment a human deleted one, desired (3) and actual (2) diverged. The ReplicaSet controller in controller-manager noticed the gap and asked the API server to create one more Pod. The scheduler picked a node, the kubelet started the container, and we’re back to 3. The human did nothing.

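The same thing happens at the node level. If a node hosting some Pods dies, K8s doesn’t move those Pods. After a grace period (five minutes by default) they’re evicted, the ReplicaSet controller sees fewer than N, and it creates replacements that the scheduler places on healthy nodes. The “service stays up when a node dies” line from #1 is, at its core, what the ReplicaSet controller delivers.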
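Why isn’t the node-failure case instant? Unless a Pod declares its own tolerations for these taints, the API server adds two by default (through its DefaultTolerationSeconds admission plugin) that let the Pod stay bound to a not-ready or unreachable node for 300 seconds. A sketch of what that looks like in kubectl get pod <pod-name> -o yaml, abridged to just this part:

```yaml
# Abridged excerpt of a Pod spec: these entries are added automatically.
spec:
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300   # stay on a NotReady node for up to 5 minutes
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300   # same window for an unreachable node
```

Only after that window are the Pods evicted, which is the point where the ReplicaSet’s count drops and replacements get created elsewhere.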

Adjusting replicas

When 3 isn’t enough — or it’s too many — there are two ways to adjust.

Declarative — change the number in the manifest and apply again. The cleanest way.

web.yaml: change only replicas, to 5
spec:
  replicas: 5
  ...
Apply again
kubectl apply -f web.yaml
Example output
deployment.apps/web configured

kubectl get pods will soon show 5. Scaling down is the same — drop the number in the manifest and apply.

Imperative — fast, but temporary.

Scale imperatively
kubectl scale deploy/web --replicas=5
Example output
deployment.apps/web scaled

Useful for a quick burst up or down. The downside is clear — the manifest’s replicas value drifts from the cluster’s actual state. The manifest still says replicas: 3, but the cluster has 5 running. Next time someone runs kubectl apply -f web.yaml without thinking, those 5 collapse back to 3.

The one-line rule — the declarative manifest is always the source of truth. Use kubectl scale only for a quick fix during debugging, or when you’re about to sync the manifest immediately after. The normal flow is: edit the manifest, then apply. That principle is the foundation of the entire desired-state model from #1.

Rolling updates — the default of zero-downtime deploys

Now the first new-version deploy of this series. Change the image tag from nginx:1.27 to nginx:1.28 — one character.

web.yaml: change only the image tag, to 1.28
      containers:
        - name: web
          image: nginx:1.28
          ports:
            - containerPort: 80
Apply the new version
kubectl apply -f web.yaml
Example output
deployment.apps/web configured

The output is one line, but a fair amount happens behind it.

What’s happening underneath

The Deployment controller notices the template changed and creates a new ReplicaSet for the new template. It scales the new RS replicas up from 0 to 1, 2, 3, and scales the old RS down from 3 to 2, 1, 0. At each step the new Pod has to reach Ready before the next step happens.

In the middle of a rollout, kubectl get rs shows two ReplicaSets.

Mid-rollout
kubectl get rs
Example output — mid-flight
NAME             DESIRED   CURRENT   READY   AGE
web-abc123       2         2         2       10m   ← old RS (1.27)
web-def456       2         2         1       30s   ← new RS (1.28)

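Old RS down to 2, new RS up to 2. That snapshot is the heart of a rolling update. Pods from both versions are up briefly, and once a Service (covered in #5) sits in front of them, it sends traffic to every Ready Pod regardless of which ReplicaSet it belongs to, so some requests hit 1.27 and some hit 1.28 until the rollout finishes.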

Watching progress

The most convenient one-liner to track a rollout:

Rollout status
kubectl rollout status deploy/web
Example output
Waiting for deployment "web" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "web" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "web" rollout to finish: 1 old replicas are pending termination...
deployment "web" successfully rolled out

Steps print one line at a time, and a success line at the end means the deploy is done. After that, kubectl get rs shows the old RS at DESIRED 0 but still around as an object. That structure makes the next section’s rollback possible.

One line on the default strategy

The flow above happens because Deployment’s spec.strategy defaults to RollingUpdate, with two parameters:

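  • maxSurge: 25% — how many Pods above desired may exist temporarily. The percentage rounds up, so with 3 desired, 25% (0.75) becomes 1: up to 4 Pods during the rollout.
  • maxUnavailable: 25% — how many Pods below desired may be unavailable temporarily. The percentage rounds down, so with 3 desired, 0.75 becomes 0: all 3 must stay available, which is why each new Pod has to be Ready before an old one goes away.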
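Written out explicitly, the defaults look like the fragment below. You don’t need to add it to web.yaml, since leaving spec.strategy out gives the same result. The comments show how the two percentages resolve for replicas: 3; note that they round in opposite directions.

```yaml
# Equivalent to omitting spec.strategy entirely.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%         # rounds up:   ceil(0.75)  = 1 -> at most 4 Pods in total
      maxUnavailable: 25%   # rounds down: floor(0.75) = 0 -> never fewer than 3 available
```

Absolute numbers work as well (maxSurge: 1 with maxUnavailable: 0 is a common explicit choice). The two can’t both be 0, since the rollout would then have no room to make progress.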

The other option is Recreate: kill all old Pods first, then start the new ones. There’s a window with no Pods at all, so it isn’t zero-downtime, but it’s useful when two versions must never run side by side (e.g., both would need the same volume, or the old version can’t cope with a freshly migrated database schema). For an ordinary web server, the default RollingUpdate is enough.
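Switching strategies is a one-line change in the manifest. A minimal fragment, assuming the rest of web.yaml stays as it is:

```yaml
# Recreate: scale the old ReplicaSet to 0 first, then bring up the new one.
# Expect a gap with no running Pods between the two phases.
spec:
  strategy:
    type: Recreate   # a rollingUpdate block is not allowed with this type
```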

What if the rollout fails?

Try a wrong image tag on purpose: set the image in web.yaml to nginx:1.99-not-real.

Apply with a non-existent tag
kubectl apply -f web.yaml

kubectl rollout status deploy/web keeps waiting (once the progress deadline passes, 10 minutes by default, it exits with an error), and kubectl get pods shows a freshly created Pod stuck at ImagePullBackOff.

Example output
NAME                  READY   STATUS             RESTARTS   AGE
web-def456-dd11       1/1     Running            0          15m
web-def456-ee22       1/1     Running            0          15m
web-def456-ff33       1/1     Running            0          15m
web-ghi789-zz99       0/1     ImagePullBackOff   0          40s

Notably, the three old Pods are alive and well. Because the new Pod never reaches Ready, the Deployment never advances to the next step, so it never scales the old RS down. Even with the rollout stuck, the old version keeps taking traffic normally. That’s the core of zero-downtime.

The debug order is the same as the closing pattern from #3:

Inspect a stuck rollout
kubectl describe deploy/web
kubectl describe pod web-ghi789-zz99

In describe deploy, the Conditions section eventually shows Progressing False with reason ProgressDeadlineExceeded (with a message like ReplicaSet ... has timed out progressing), and describe pod’s Events have Failed to pull image "nginx:1.99-not-real". The answer is almost always in those two outputs.
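How long the Deployment waits before declaring the rollout stuck is itself part of the spec. The fragment below just writes out the default values for illustration; web.yaml doesn’t need it.

```yaml
spec:
  progressDeadlineSeconds: 600  # default: mark the rollout as failed after 10 minutes without progress
  minReadySeconds: 0            # default: a new Pod counts as available as soon as it is Ready
```

Hitting the deadline only flips the Progressing condition to False. The controller does not roll back on its own; the old Pods keep serving until you fix the tag or run a rollback yourself.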

Rollback

If a bad version made it out, rolling back is one command.

Rollout history
kubectl rollout history deploy/web
Example output
deployment.apps/web
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>

A revision list. 1 is the original nginx:1.27, 2 is nginx:1.28, and 3 is the broken nginx:1.99-not-real from the previous section. To roll back to the previous revision (here, 2):

Roll back to the previous revision
kubectl rollout undo deploy/web
Example output
deployment.apps/web rolled back

To pick a specific revision, use --to-revision:

Roll back to a specific revision
kubectl rollout undo deploy/web --to-revision=1

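The reason this works boils down to one line: a revision is just an old ReplicaSet still hanging around. When a new version is deployed, the old ReplicaSet doesn’t disappear; it sits at replicas: 0 but stays as an object. undo copies that revision’s Pod template back into the Deployment, and the controller scales the matching old RS back up (through the same rolling-update steps) while scaling the current one down. No new ReplicaSet has to be created, so the old version is back as soon as its Pods are Ready.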

By default K8s keeps 10 old ReplicaSets as revisions, and spec.revisionHistoryLimit raises or lowers that number. Too many, and old ReplicaSets clutter the cluster (and your kubectl get rs output); too few, and you can’t reach back to a much older version in one step. Match it to your deploy frequency. For a typical web service, the default 10 is fine.
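Both knobs live in the Deployment manifest. A sketch, assuming you also want the CHANGE-CAUSE column in kubectl rollout history to say something other than <none>: the kubernetes.io/change-cause annotation on the Deployment is what fills it, and the text "image nginx:1.28" is just an example value.

```yaml
metadata:
  name: web
  annotations:
    # Copied onto the new ReplicaSet and shown as CHANGE-CAUSE in rollout history.
    kubernetes.io/change-cause: "image nginx:1.28"
spec:
  revisionHistoryLimit: 10   # default: keep up to 10 old ReplicaSets for rollback
```

Update the annotation in the same edit as the image tag, so each revision records why it exists.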

What Deployment doesn’t solve

Deployment doesn’t fit every workload shape. The ones with a different grain:

  • Stateful workloads — for things like databases where each instance needs its own name and its own disk, StatefulSet is the right resource. Pod names are stable (web-0, web-1), and each Pod gets its own PVC, created from a volumeClaimTemplates section in the manifest. Start order is also guaranteed by default (0 → 1 → 2). Deployment Pods are interchangeable, with random name suffixes and no per-Pod disk, so Deployment is not a fit for DBs.
  • One-per-node workloads — log shippers (Fluent Bit, Filebeat), node monitors (Node Exporter), CNI agents — DaemonSet is right. When a new node joins the cluster, one shows up on it automatically.
  • One-shot jobs — migrations, backups, batch jobs — anything that runs once and finishes — use Job (immediate) or CronJob (scheduled). Workloads where Pods naturally end up in Succeeded.

These three are covered one at a time in K8s Intermediate. This series stays focused on Deployment, the one you touch most. But it’s worth getting the categorization right in your head ahead of time — stateless → Deployment, stateful → StatefulSet, one-per-node → DaemonSet, one-shot → Job.

Clean up

Tear down today’s resources cleanly. Deleting one Deployment removes its ReplicaSets and Pods underneath — K8s handles that through owner references and garbage collection.
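Each Pod points at its ReplicaSet the same way the ReplicaSet points at its Deployment. A sketch of the relevant part of kubectl get pod web-abc123-bb22 -o yaml, abridged, using the illustrative names from earlier and omitting the uid:

```yaml
# Abridged Pod metadata: the owner is the ReplicaSet, not the Deployment.
metadata:
  name: web-abc123-bb22
  labels:
    app: web
    pod-template-hash: abc123
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: web-abc123
      controller: true
      blockOwnerDeletion: true
```

Deleting the Deployment lets the garbage collector remove the ReplicaSets that name it as owner, and then the Pods that name those ReplicaSets.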

Clean up by manifest
kubectl delete -f web.yaml
Example output
deployment.apps "web" deleted
Really empty?
kubectl get deploy,rs,pods
Example output
No resources found in default namespace.

By name works too:

Clean up by name
kubectl delete deploy web

The same owner-reference cleanup applies to every controller you’ll meet in the rest of the series.

Summary

What this post pinned down:

  • Deployment / ReplicaSet / Pod — three layers. The human only writes Deployment. ReplicaSet is the auto-created middle object; Pod is the workload it produces.
  • The manifest spine is apiVersion: apps/v1 / kind: Deployment / metadata / spec. Inside spec, the new fields are replicas, selector.matchLabels, template — and selector and template labels must match.
  • Deleting a Pod gets it replaced soon after by the ReplicaSet controller, which closes the gap between desired (N) and actual. That’s the simplest face of the reconcile loop from #1.
  • For replicas adjustments, edit the manifest and apply; kubectl scale is temporary. The declarative manifest is always the source of truth.
  • Rolling updates work by creating a new ReplicaSet and gradually emptying the old one. Default strategy is RollingUpdate (maxSurge 25%, maxUnavailable 25%); track with kubectl rollout status.
  • Rollback is one command — kubectl rollout undo. It works because the old ReplicaSet was sitting at replicas: 0 the whole time.

Next — Service

Even now, one thing isn’t solved — how do you get traffic from outside the cluster to those Pods? Our 3 nginx Pods have cluster-internal IPs, but those IPs change every time a Pod dies and is recreated. ReplicaSet keeps the Pods alive, but the moving IPs leave clients without a stable place to connect to.

#5 Service — ClusterIP / NodePort / LoadBalancer covers (1) how Service puts a stable virtual IP / DNS name in front of Pods, (2) the three types — ClusterIP for in-cluster traffic, NodePort to expose via node ports, LoadBalancer to attach a cloud LB, and (3) putting a Service in front of the app: web Pods we just built and making the first external connection. The 3 Pods from this post turn into “a service with an address” for the first time there.
