Certified Kubernetes Application Developer (CKAD) #2 Pod and Container Lifecycle — Restart Policy and Container States
In #1, we drilled the kubectl environment for a two-hour practical exam until it became muscle memory: aliases, dry-run, generators, vim indentation, and context switching. With setup done, we now move to the smallest deployable unit in CKAD: the Pod. A Pod is the concrete thing that every workload (Deployment, Job, DaemonSet) ultimately produces, and being able to diagnose “why isn’t this Pod coming up?” on the exam translates directly into points in every domain.
This post is organized along two axes. One is what states a Pod passes through (the lifecycle and restartPolicy); the other is why the container inside it got stuck in that state (container states and exit codes). We’ll learn both by checking them directly with commands and manifests.
The lifecycle a Pod passes through
From creation until it disappears, a Pod passes through a defined set of phases. Many of the values you see in the STATUS column of k get pod are this phase; the rest are container-state reasons. Mixing the two throws your diagnosis off, so we start by distinguishing the phase. If K8s practical track #3 covered the basic concept of a Pod, here we narrow its state transitions down to the exam’s perspective.
| phase | Meaning |
|---|---|
| Pending | Registered with the API server but the container hasn’t run yet. Waiting to be scheduled, pulling the image, or mounting volumes |
| Running | The Pod is bound to a node and all containers are created. At least one container is running, or starting/restarting |
| Succeeded | All containers terminated normally (exit code 0) and won’t be restarted. The healthy completion shape of a Job |
| Failed | All containers terminated, and at least one of them did so abnormally (a non-zero code) |
| Unknown | Communication with the node was lost, so the Pod’s state can’t be determined. Usually a node failure |
You can pull the phase out directly with kubectl get pod -o jsonpath.

    k get pod nginx -o jsonpath='{.status.phase}'

The key here is the first-level branch: a long-lasting Pending means a scheduling, image, or volume problem, while Failed or repeated restarts mean a problem inside the container. Which side it falls on determines where you look.
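The first-level branch can be sketched as a small shell function. This is only an illustration: the name first_branch and the threshold of 3 restarts are hypothetical choices, not Kubernetes rules, and the inputs are the phase and RESTARTS values you would read off k get pod yourself.

```shell
# Hypothetical helper: map a Pod phase and restart count to where to look first.
# The threshold of 3 restarts is an arbitrary illustration, not a Kubernetes rule.
first_branch() {
  phase=$1
  restarts=${2:-0}
  # Repeated restarts point inside the container regardless of the phase.
  if [ "$restarts" -ge 3 ]; then
    echo "container"
    return 0
  fi
  case "$phase" in
    Pending)           echo "scheduling-image-volume" ;;
    Failed)            echo "container" ;;
    Unknown)           echo "node" ;;
    Running|Succeeded) echo "healthy" ;;
    *)                 echo "unrecognized phase: $phase" ;;
  esac
}

first_branch Pending 0   # scheduling-image-volume
first_branch Running 7   # container
```

The point of the sketch is the ordering: the restart count is checked before the phase, because a Pod that keeps restarting usually still reports Running.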
restartPolicy: who uses what
restartPolicy decides whether the kubelet restarts a container on the same node when it terminates. It’s a Pod-level setting with three possible values, and it defaults to Always when omitted.
| Value | Behavior | Workloads that typically use it |
|---|---|---|
| Always | Always restarts regardless of exit code. For services that must stay up | Deployment, ReplicaSet, DaemonSet, StatefulSet |
| OnFailure | Restarts only on abnormal (non-zero) termination. Leaves it alone on normal completion (0) | Job, CronJob |
| Never | Never restarts regardless of exit code | One-shot, run-once tasks |
There’s a trap here that the exam likes to set. The Pods a Deployment creates always have restartPolicy fixed to Always. Putting restartPolicy: OnFailure in a Deployment manifest produces a validation error. Conversely, a Job must specify a restartPolicy, and Always is not allowed. In other words, the kind of workload constrains the possible values of restartPolicy.
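As a rough illustration of how the workload kind constrains restartPolicy, the sketch below encodes the rules above in a shell function. The name policy_allowed is hypothetical, and the check is a local lookup table, not a call to the API server's validation.

```shell
# Hypothetical helper: would this restartPolicy be accepted for this workload kind?
# Prints "ok" or "rejected"; it mirrors the constraints described in the text.
policy_allowed() {
  kind=$1
  policy=$2
  case "$kind" in
    Deployment|ReplicaSet|DaemonSet|StatefulSet)
      # These controllers only accept Always in the Pod template.
      if [ "$policy" = "Always" ]; then echo ok; else echo rejected; fi ;;
    Job|CronJob)
      # A Job must set the policy explicitly, and Always is not allowed.
      case "$policy" in
        OnFailure|Never) echo ok ;;
        *)               echo rejected ;;
      esac ;;
    Pod)
      # A bare Pod accepts all three values.
      case "$policy" in
        Always|OnFailure|Never) echo ok ;;
        *)                      echo rejected ;;
      esac ;;
    *)
      echo "unknown kind: $kind" ;;
  esac
}

policy_allowed Deployment OnFailure   # rejected
policy_allowed Job Always             # rejected
policy_allowed Job Never              # ok
```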
A Pod manifest with restartPolicy set explicitly looks like this.

    apiVersion: v1
    kind: Pod
    metadata:
      name: oneshot
    spec:
      restartPolicy: OnFailure   # stops on normal completion, restarts on failure
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo done; exit 0"]

The Pod above terminates normally with exit 0, so under restartPolicy: OnFailure it won’t restart and the phase goes to Succeeded. Change command to exit 1 and it’s an abnormal termination, so the kubelet keeps restarting it and you reproduce the CrashLoopBackOff of the next section.
A restartPolicy restart always means restarting the container on the same node — not moving the Pod to a different node. Spinning up a new Pod on another node when the node itself dies is the job of higher-level controllers like Deployment and ReplicaSet.
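The decision the kubelet makes for a terminated container can be summarized in a few lines. The sketch below is a simplified model under the rules in this section, with a hypothetical function name; it ignores backoff timing and Job-level settings.

```shell
# Hypothetical model of the restart decision: given a restartPolicy and a
# container exit code, print "restart" or "stop".
restart_decision() {
  policy=$1
  exit_code=$2
  case "$policy" in
    Always)
      echo restart ;;                      # restarts even after exit 0
    OnFailure)
      if [ "$exit_code" -ne 0 ]; then
        echo restart                       # abnormal termination
      else
        echo stop                          # normal completion
      fi ;;
    Never)
      echo stop ;;
    *)
      echo "unknown policy: $policy" ;;
  esac
}

restart_decision OnFailure 0   # stop
restart_decision OnFailure 1   # restart
restart_decision Always 0      # restart
```

Note the last case: under Always, a container that exits 0 is restarted too, which is why one-shot commands belong under OnFailure or Never.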
Reading container states and reasons
If the phase is the Pod’s macro-level state, a container state is the micro-level state of each individual container. It shows up under the State field of k describe pod and comes in three forms.
| State | Meaning |
|---|---|
| Waiting | Not running yet. Pulling the image, waiting on dependencies, etc. The reason records why it’s waiting |
| Running | Running normally. Comes with a startedAt timestamp |
| Terminated | Execution has ended. Comes with exitCode, reason, startedAt, and finishedAt |
Most of diagnosis comes down to reading the reason attached to Waiting and Terminated. The reasons you’ll run into repeatedly on the exam are summarized below.
| reason | State | Meaning and first-level cause |
|---|---|---|
| ContainerCreating | Waiting | Creating the container. The volume-mount / image-prep stage. A long stall here suggests a missing volume or secret |
| ImagePullBackOff | Waiting | Repeated image-pull failures, now backing off. Image name typo, missing tag, or missing private-registry auth |
| ErrImagePull | Waiting | Image pull failed immediately. The stage just before ImagePullBackOff |
| CrashLoopBackOff | Waiting | The container dies right after starting, over and over, so it waits with growing restart intervals. A problem with the process inside the container |
| OOMKilled | Terminated | The kernel force-killed it for exceeding the memory limit (limits.memory). Exit code 137 |
| Completed | Terminated | Terminated normally (exit code 0). The success shape of a Job |
| Error | Terminated | Terminated with a non-zero code. An application error |
The biggest misunderstanding here is treating CrashLoopBackOff as the error itself. It is a state: the container keeps dying, so the kubelet waits with an exponentially growing restart interval (10s, 20s, 40s, and so on, capped at 5 minutes). The cause isn’t the BackOff itself but why the container is dying, so you have to drill down to the logs and the exit code.
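To make the growing interval concrete, here is a minimal sketch that prints the delay sequence: doubling from 10 seconds and capped at 300 seconds. The function name backoff_schedule is hypothetical, and the real kubelet timing also depends on when the container last ran cleanly, which this sketch ignores.

```shell
# Print the first N restart delays in seconds: start at 10, double each time,
# cap at 300 (5 minutes). A simplified model of the kubelet's backoff.
backoff_schedule() {
  count=$1
  delay=10
  i=0
  out=""
  while [ "$i" -lt "$count" ]; do
    out="$out${out:+ }$delay"       # append with a space separator
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_schedule 7   # 10 20 40 80 160 300 300
```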
How to read exit codes
The exitCode of a Terminated state narrows the cause down fast. These values are worth memorizing for the exam.
| Exit code | Meaning |
|---|---|
| 0 | Normal termination. Completed |
| 1 | A generic application error. You need to look at the logs |
| 137 | 128 + 9 (SIGKILL). Force-killed. Either OOMKilled or a force kill after the grace period |
| 143 | 128 + 15 (SIGTERM). Ended after receiving the normal termination signal. Usually a clean graceful shutdown |
You can check the exit code directly with this one line.

    k get pod oneshot -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'

For a container that has already been restarted, the previous run’s code is under lastState.terminated instead of state.terminated. If you see 137, first check whether it’s OOMKilled (the reason in describe); if so, the move is to raise limits.memory or reduce the application’s memory use. We cover resource limits in detail in #16.
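The 128 + signal arithmetic behind 137 and 143 can be sketched as a small decoder. The function name describe_exit_code is hypothetical, and the reading of codes above 128 as signals is a shell convention: an application can also choose to exit with such a number itself.

```shell
# Hypothetical decoder: classify a container exit code.
# By convention, a code above 128 means "terminated by signal (code - 128)".
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "normal termination"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error"
  fi
}

describe_exit_code 0     # normal termination
describe_exit_code 1     # application error
describe_exit_code 137   # killed by signal 9
describe_exit_code 143   # killed by signal 15
```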
The diagnostic command set
Once the state is classified, the commands you actually use to dig in are fixed. Getting this sequence into muscle memory determines your exam troubleshooting speed.

    # 1) Wide view: which Pods are in what state, and how many restarts
    k get pod -o wide
    # 2) Deep view of one Pod: container State, reason, exitCode, and events on one screen
    k describe pod oneshot
    # 3) Current container logs
    k logs oneshot
    # 4) Previous (pre-crash) container logs: the key to diagnosing CrashLoopBackOff
    k logs oneshot --previous
    # 5) Cluster events in chronological order (scheduling and pull failures show up here)
    k get events --sort-by=.metadata.creationTimestamp

The RESTARTS column of k get pod is the first-level signal. If the number climbs fast, it’s a CrashLoop; if it stays at 0 while Pending, it’s a scheduling, image, or volume problem. The Events section at the bottom of describe prints messages like Failed to pull image and Back-off restarting failed container directly, while OOMKilled shows up as the reason under the container’s Last State, so confirm the basis for a reason in these two places.
The decisive clue for CrashLoopBackOff is k logs --previous. Even when the container has already died and the current log is empty, the previous instance’s log still holds why it died — a missing config file, a port conflict, a bad argument, and so on.
Exam staple: “why does this Pod keep restarting?”
A common CKAD question type hands you a broken Pod and asks you to find and fix the cause. Drilling down in this order solves nearly every case.
- Run k get pod -o wide to check STATUS and RESTARTS. See whether it’s CrashLoopBackOff and whether RESTARTS is climbing.
- Run k describe pod <name> to read the container State, the reason and exitCode of lastState, and the Events at the bottom.
- If the reason is ImagePullBackOff, look at the image name, tag, and registry auth (no need to go inside the container).
- If the reason is CrashLoopBackOff, run k logs <name> --previous to find why it died in the previous log.
- If the exit code is 137, check whether it’s OOMKilled and inspect limits.memory.
- After fixing the cause in the manifest, run k apply, or delete and recreate the Pod when the fix touches an immutable Pod field such as command, and confirm STATUS goes to Running.
The key is narrowing it down enough with describe and logs before you exec into the container. Exam time is limited, and most causes reveal themselves in one screen of describe and the previous log.
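The drill-down order above can be condensed into a lookup from reason to the next command. This is a sketch with a hypothetical function name (next_step); it only prints the command to run, using the k alias from this series, and does not touch a cluster.

```shell
# Hypothetical triage helper: given a container reason and a Pod name,
# print the next command worth running. Nothing is executed.
next_step() {
  reason=$1
  pod=$2
  case "$reason" in
    ImagePullBackOff|ErrImagePull)
      # Image name, tag, and registry auth are visible in describe.
      echo "k describe pod $pod" ;;
    CrashLoopBackOff|Error)
      # The previous instance's log holds why the container died.
      echo "k logs $pod --previous" ;;
    OOMKilled)
      # Inspect the configured memory limit.
      echo "k get pod $pod -o jsonpath='{.spec.containers[*].resources.limits.memory}'" ;;
    *)
      # Default: start from the one-screen overview.
      echo "k describe pod $pod" ;;
  esac
}

next_step CrashLoopBackOff crasher   # k logs crasher --previous
next_step ImagePullBackOff web       # k describe pod web
```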
Reproducing CrashLoop with a deliberately failing container
Building it once yourself makes the diagnostic screens familiar. The Pod below terminates abnormally one second after starting, reproducing CrashLoopBackOff.

    apiVersion: v1
    kind: Pod
    metadata:
      name: crasher
    spec:
      restartPolicy: Always   # restarts after every exit → CrashLoopBackOff
      containers:
      - name: app
        image: busybox:1.36
        command: ["sh", "-c", "echo starting; sleep 1; echo boom >&2; exit 1"]

Save it as crasher.yaml, then apply it and observe.

    k apply -f crasher.yaml
    k get pod crasher -w        # watch STATUS go from Running (via Error) to CrashLoopBackOff
    k describe pod crasher      # lastState.terminated: reason=Error, exitCode=1
    k logs crasher --previous   # "starting" and "boom" remain in the previous log

Watching with -w (watch), you can see the backoff widen the restart interval directly. Once the logs confirm the cause is exit 1, fix the manifest and recreate the Pod: command is an immutable Pod field, so k apply against the existing Pod is rejected, and you need to delete the Pod and apply again, or use k replace --force -f crasher.yaml. Changing only the command to exit 0 is not enough here, because restartPolicy: Always restarts even a container that exits 0. Also switch restartPolicy to OnFailure or Never, and the phase settles to Succeeded.
Handling individual containers in a multi-container Pod
When a Pod has multiple containers, you have to point logs and exec at a specific container with -c. The multi-container patterns themselves are covered in #3, but get the shape of the diagnostic commands down in advance.

    # Logs and a shell for a specific container in a multi-container Pod
    k logs mypod -c sidecar
    k exec -it mypod -c app -- sh

Drop the -c and it defaults to the first container, with a notice when there are several. When the exam says something like “check the logs of the sidecar container,” specifying -c is part of the correct answer.
Exam points
- The 5 phases. Pending (before running), Running (running), Succeeded (normal completion), Failed (abnormal termination), Unknown (lost node communication)
- The 3 restartPolicy values and their constraints. Always (Deployment etc., fixed), OnFailure / Never (a Job uses one of these two; Always is not allowed)
- The 3 container states. Waiting, Running, Terminated. Diagnosis is reading the reason of Waiting and Terminated
- The staple reasons. ImagePullBackOff / ErrImagePull (image), CrashLoopBackOff (internal process), OOMKilled (memory), Completed (normal)
- Exit codes. 0 (normal), 1 (app error), 137 (SIGKILL / OOM), 143 (SIGTERM / graceful)
- Diagnostic order. get pod -o wide → describe pod → logs --previous → get events. exec comes last
- CrashLoopBackOff is not an error but a restart-backoff state. The cause is in logs --previous
- For multi-container, specify the container with -c on logs and exec
Wrap-up
What this post locked in:
- We separated the Pod’s phase (macro state) from the container state (micro state) and set up the first-level branch of diagnosis.
- We confirmed that restartPolicy has its possible values constrained by the kind of workload (Deployment = Always fixed, Job = OnFailure/Never).
- We learned how to narrow causes down fast with reasons like CrashLoopBackOff, ImagePullBackOff, and OOMKilled, and with exit codes 0, 1, 137, and 143.
- We confirmed the diagnostic command order of describe → logs --previous → events with a Pod we reproduced ourselves.
Now that you have an eye for reading state, the next step is the design where you deliberately put several containers inside one Pod.
Next: Multi-container patterns
If this post covered the state and diagnosis of a single container, the next is the pattern of placing several containers together inside one Pod.
#3 Multi-container Patterns: Init container, sidecar, ambassador, adapter covers init containers that run sequentially before the main container, sidecars that assist alongside the main container, and the ambassador and adapter patterns — laying out when and in what shape to use each, with manifests.