Certified Kubernetes Application Developer (CKAD) #2 Pod and Container Lifecycle — Restart Policy and Container States

In #1, we got the kubectl environment for running a two-hour practical exam under our fingers — aliases, dry-run, generators, vim indentation, and context switching. With setup done, we now move into the smallest deployable unit in CKAD: the Pod. A Pod is the concrete thing that every workload (Deployment, Job, DaemonSet) ultimately produces, and the ability to diagnose “why isn’t this Pod coming up?” on the exam translates directly into points no matter the domain.

This post is organized along two axes. One is what states a Pod passes through (the lifecycle and restartPolicy); the other is why the container inside it got stuck in that state (container states and exit codes). We’ll learn both by checking them directly with commands and manifests.

The lifecycle a Pod passes through #

From creation until it disappears, a Pod passes through a defined set of phases. Many of the values you see in the STATUS column of k get pod are this phase; the rest are container-state reasons. Mixing the two throws your diagnosis off, so we start by distinguishing the phase. If K8s practical track #3 covered the basic concept of a Pod, here we narrow its state transitions down to the exam’s perspective.

| Phase | Meaning |
| --- | --- |
| Pending | Registered with the API server but the container hasn’t run yet. Waiting to be scheduled, pulling the image, or mounting volumes |
| Running | The Pod is bound to a node and all containers are created. At least one container is running, or starting/restarting |
| Succeeded | All containers terminated normally (exit code 0) and won’t be restarted. The healthy completion shape of a Job |
| Failed | All containers terminated, and at least one of them did so abnormally (a non-zero code) |
| Unknown | Communication with the node was lost, so the Pod’s state can’t be determined. Usually a node failure |

You can pull the phase out directly with kubectl get pod -o jsonpath.

k get pod nginx -o jsonpath='{.status.phase}'

The key here is the first-level branch: a long-lasting Pending means a scheduling, image, or volume problem, while Failed or repeated restarts mean a problem inside the container. Which side it falls on determines where you look.
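The branch above can be written down as a small lookup. This is only a sketch: `triage_phase` is a hypothetical helper name, and its hints simply restate the phase table. It takes the phase string as an argument, so it runs without a cluster.

```shell
#!/bin/sh
# triage_phase PHASE (hypothetical helper): map a Pod phase to the first place to look.
triage_phase() {
  case "$1" in
    Pending)   echo "scheduling, image pull, or volume: read Events in describe" ;;
    Running)   echo "check RESTARTS; if climbing, read container state and logs" ;;
    Succeeded) echo "completed normally: nothing to fix" ;;
    Failed)    echo "container exited non-zero: read exit code and logs" ;;
    Unknown)   echo "node unreachable: check the node, not the Pod" ;;
    *)         echo "not a phase: probably a container-state reason"; return 1 ;;
  esac
}

# Usage against a live cluster (assumes a Pod named nginx exists):
#   triage_phase "$(kubectl get pod nginx -o jsonpath='{.status.phase}')"
```

The script calls `kubectl` rather than `k` in the usage line because shell aliases are not expanded inside scripts by default.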

restartPolicy: who uses what #

restartPolicy decides whether the kubelet restarts a container on the same node when it terminates. It’s a Pod-level setting with three possible values.

| Value | Behavior | Workloads that typically use it |
| --- | --- | --- |
| Always | Always restarts regardless of exit code. For services that must stay up | Deployment, ReplicaSet, DaemonSet, StatefulSet |
| OnFailure | Restarts only on abnormal (non-zero) termination. Leaves it alone on normal completion (0) | Job, CronJob |
| Never | Never restarts regardless of exit code | One-shot, run-once tasks |

There’s a trap here that the exam likes to set. The Pods a Deployment creates always have restartPolicy fixed to Always. Putting restartPolicy: OnFailure in a Deployment manifest produces a validation error. Conversely, a Job must specify a restartPolicy, and Always is not allowed. In other words, the kind of workload constrains the possible values of restartPolicy.
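As an illustration of the Job side of that constraint, the sketch below writes a minimal Job manifest with `restartPolicy: Never`. The names `run-once` and `job-never.yaml` are made up for the example, and the apply step is left as a comment because it needs a cluster.

```shell
#!/bin/sh
# Write a minimal Job manifest. restartPolicy sits in the Pod template and,
# for a Job, must be OnFailure or Never.
# (run-once and job-never.yaml are hypothetical names for this sketch.)
cat > job-never.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: run-once
spec:
  template:
    spec:
      restartPolicy: Never   # Always would be rejected by API validation
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo done; exit 0"]
EOF

# With a cluster available:
#   kubectl apply -f job-never.yaml
#   kubectl get job run-once
```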

A Pod manifest with restartPolicy set explicitly looks like this.

apiVersion: v1
kind: Pod
metadata:
  name: oneshot
spec:
  restartPolicy: OnFailure   # stops on normal completion, restarts on failure
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sh", "-c", "echo done; exit 0"]

The Pod above terminates normally with exit 0, so under restartPolicy: OnFailure it won’t restart and the phase goes to Succeeded. Change command to exit 1 and it’s an abnormal termination, so the kubelet keeps restarting it and you reproduce the CrashLoopBackOff of the next section.
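The restart rule itself is small enough to model in a few lines of shell. This is a sketch of the decision described in the restartPolicy table, not of the kubelet's actual code; `should_restart` is a hypothetical helper name.

```shell
#!/bin/sh
# should_restart POLICY EXIT_CODE (hypothetical helper)
# Prints "restart" or "stay-terminated", mirroring the restartPolicy table.
should_restart() {
  policy=$1
  code=$2
  case "$policy" in
    Always) echo restart ;;
    OnFailure)
      if [ "$code" -ne 0 ]; then echo restart; else echo stay-terminated; fi ;;
    Never) echo stay-terminated ;;
    *) echo "unknown policy: $policy" >&2; return 2 ;;
  esac
}

should_restart OnFailure 0   # the oneshot Pod: prints stay-terminated (phase Succeeded)
should_restart OnFailure 1   # same Pod with exit 1: prints restart
should_restart Always 0      # prints restart: even a clean exit is restarted under Always
```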

A restartPolicy restart always means restarting the container on the same node — not moving the Pod to a different node. Spinning up a new Pod on another node when the node itself dies is the job of higher-level controllers like Deployment and ReplicaSet.

Reading container states and reasons #

If the phase is the Pod’s macro-level state, a container state is the micro-level state of each individual container. It shows up under the State field of k describe pod and comes in three forms.

| State | Meaning |
| --- | --- |
| Waiting | Not running yet. Pulling the image, waiting on dependencies, etc. The reason records why it’s waiting |
| Running | Running normally. Comes with a startedAt timestamp |
| Terminated | Execution has ended. Comes with exitCode, reason, startedAt, and finishedAt |

Eighty percent of diagnosis is reading the reason attached to Waiting and Terminated. The reasons you’ll run into repeatedly on the exam are summarized below.

| Reason | State | Meaning and first-level cause |
| --- | --- | --- |
| ContainerCreating | Waiting | Creating the container. The volume-mount / image-prep stage. A long stall here suggests a missing volume or secret |
| ImagePullBackOff | Waiting | Repeated image-pull failures, now backing off. Image name typo, missing tag, or missing private-registry auth |
| ErrImagePull | Waiting | Image pull failed immediately. The stage just before ImagePullBackOff |
| CrashLoopBackOff | Waiting | The container dies right after starting, over and over, so it waits with growing restart intervals. A problem with the process inside the container |
| OOMKilled | Terminated | The kernel force-killed it for exceeding the memory limit (limits.memory). Exit code 137 |
| Completed | Terminated | Terminated normally (exit code 0). The success shape of a Job |
| Error | Terminated | Terminated with a non-zero code. An application error |

The most common misunderstanding is treating CrashLoopBackOff as an error in its own right. It is a state: the container keeps dying, so the kubelet waits an exponentially growing interval before each restart (10s, 20s, 40s, and so on, capped at 5 minutes). The cause isn’t the BackOff itself but why the container is dying, so you have to drill down to the logs and the exit code.
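To make the growth of the restart interval concrete, here is a minimal sketch that prints the delay sequence. It assumes only what the paragraph states (start at 10s, double each time, cap at 300s); `backoff_schedule` is a hypothetical helper, not a kubectl feature.

```shell
#!/bin/sh
# backoff_schedule N (hypothetical helper): print the first N restart delays in seconds,
# assuming the delay starts at 10s, doubles each time, and is capped at 300s (5 minutes).
backoff_schedule() {
  n=$1
  delay=10
  out=""
  i=0
  while [ "$i" -lt "$n" ]; do
    out="$out${out:+ }$delay"          # append with a space separator after the first item
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then delay=300; fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_schedule 7   # prints: 10 20 40 80 160 300 300
```

After five or six quick crashes the Pod is already waiting the full five minutes between attempts, which is why reading the previous log beats waiting for the next restart.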

How to read exit codes #

The exitCode of a Terminated state narrows the cause down fast. These values are worth memorizing for the exam.

| Exit code | Meaning |
| --- | --- |
| 0 | Normal termination. Completed |
| 1 | A generic application error. You need to look at the logs |
| 137 | 128 + 9 (SIGKILL). Force-killed. Either OOMKilled or a force kill after the grace period |
| 143 | 128 + 15 (SIGTERM). Ended after receiving the normal termination signal. Usually a clean graceful shutdown |

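You can check the exit code directly with a one-liner. A container that terminated and was not restarted, like oneshot above, reports it under state.terminated.

k get pod oneshot -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'

For a container that has already been restarted (a CrashLoop, for instance), the previous instance’s exit code is under lastState.terminated instead.

k get pod <name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'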

If you see 137, first check whether it’s OOMKilled (the reason in describe); if so, the move is to raise limits.memory or reduce the application’s memory use. We cover resource limits in detail in #16.
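The 128 + signal arithmetic can be checked locally without a cluster. The sketch below decodes an exit code and then reproduces 137 by having a child shell send itself SIGKILL; `explain_exit_code` is a hypothetical helper name, and the wording of its output is made up for the example.

```shell
#!/bin/sh
# explain_exit_code CODE (hypothetical helper): codes above 128 are read as 128 + signal number.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "0: normal termination"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l N prints the signal name without the SIG prefix
    name=$(kill -l "$sig" 2>/dev/null || echo "?")
    echo "$code: killed by signal $sig (SIG$name)"
  else
    echo "$code: application error, read the logs"
  fi
}

explain_exit_code 137   # prints: 137: killed by signal 9 (SIGKILL)
explain_exit_code 143   # prints: 143: killed by signal 15 (SIGTERM)

# Reproduce 137 locally: a child shell sends itself SIGKILL.
status=0
sh -c 'kill -9 $$' || status=$?
echo "child exit status: $status"   # prints: child exit status: 137
```

The decoding only tells you which signal ended the process. Whether a 137 was an OOM kill or a forced kill after the grace period still has to come from the reason in describe.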

The diagnostic command set #

Once the state is classified, the commands you actually use to dig in are fixed. Getting this sequence under your fingers determines your exam troubleshooting speed.

# 1) Look wide: which Pod is in which state, and how many restarts
k get pod -o wide

# 2) Look deep at one Pod: container State, reason, exitCode, and events on one screen
k describe pod oneshot

# 3) Current container logs
k logs oneshot

# 4) Previous (pre-crash) container logs: the key to diagnosing CrashLoopBackOff
k logs oneshot --previous

# 5) Cluster events in chronological order (scheduling and pull failures show up here)
k get events --sort-by=.metadata.creationTimestamp

The RESTARTS column of k get pod is the first-level signal. If the number climbs fast, it’s a CrashLoop; if it stays at 0 while Pending, it’s a scheduling, image, or volume problem. The Events section at the bottom of describe prints messages like Failed to pull image and Back-off restarting failed container directly, while OOMKilled typically shows up as the reason under Last State in the container section, so confirm the basis for a reason in both places.

The decisive clue for CrashLoopBackOff is k logs --previous. Even when the container has already died and the current log is empty, the previous instance’s log still holds why it died — a missing config file, a port conflict, a bad argument, and so on.
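One practical wrinkle: `--previous` fails when the container has not been restarted yet, because there is no previous instance. A small wrapper can try the previous log first and fall back to the current one. This is a sketch; `crash_logs` is a hypothetical helper name, and it calls `kubectl` rather than `k` because aliases are not expanded in scripts.

```shell
#!/bin/sh
# crash_logs POD [CONTAINER] (hypothetical helper): prefer the previous instance's log,
# fall back to the current one when there is no previous instance yet.
crash_logs() {
  pod=$1
  container=${2:-}
  if [ -n "$container" ]; then
    kubectl logs "$pod" -c "$container" --previous 2>/dev/null ||
      kubectl logs "$pod" -c "$container"
  else
    kubectl logs "$pod" --previous 2>/dev/null || kubectl logs "$pod"
  fi
}

# Usage (assumes a Pod named oneshot exists in the current namespace):
#   crash_logs oneshot
```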

Exam staple: “why does this Pod keep restarting?” #

A common CKAD question type hands you a broken Pod and asks you to find and fix the cause. Drilling down in this order solves nearly every case.

  1. Run k get pod -o wide to check STATUS and RESTARTS. See whether it’s CrashLoopBackOff and whether RESTARTS is climbing.
  2. Run k describe pod <name> to read the container State, the reason and exitCode of lastState, and the Events at the bottom.
  3. If the reason is ImagePullBackOff, look at the image name, tag, and registry auth (no need to go inside the container).
  4. If the reason is CrashLoopBackOff, run k logs <name> --previous to find why it died in the previous log.
  5. If the exit code is 137, check whether it’s OOMKilled and inspect limits.memory.
  6. After fixing the cause in the manifest, k apply or delete and recreate the Pod, and confirm STATUS goes to Running.

The key is narrowing it down enough with describe and logs before you exec into the container. Exam time is limited, and most causes reveal themselves in one screen of describe and the previous log.
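The checklist above can also be compressed into a lookup from the observed reason to the next move. This is a sketch of the decision order only; `next_step` is a hypothetical helper, and its output strings are reminders rather than commands it runs.

```shell
#!/bin/sh
# next_step REASON [EXIT_CODE] (hypothetical helper): suggest the next move for a
# STATUS/reason string as shown by kubectl get pod, following the checklist order.
next_step() {
  reason=$1
  code=${2:-}
  case "$reason" in
    ImagePullBackOff|ErrImagePull)
      echo "check image name, tag, and registry auth" ;;
    OOMKilled)
      echo "inspect limits.memory" ;;
    CrashLoopBackOff|Error)
      if [ "$code" = "137" ]; then
        echo "check for OOMKilled and inspect limits.memory"
      else
        echo "kubectl logs <name> --previous"
      fi ;;
    Pending|ContainerCreating)
      echo "kubectl describe pod <name> and read Events" ;;
    *)
      echo "kubectl describe pod <name>" ;;
  esac
}

next_step ImagePullBackOff        # prints: check image name, tag, and registry auth
next_step CrashLoopBackOff 1      # prints: kubectl logs <name> --previous
next_step CrashLoopBackOff 137    # prints: check for OOMKilled and inspect limits.memory
```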

Reproducing CrashLoop with a deliberately failing container #

Building it once yourself gets the diagnostic screens under your fingers. The Pod below terminates abnormally one second after starting, reproducing CrashLoopBackOff.

apiVersion: v1
kind: Pod
metadata:
  name: crasher
spec:
  restartPolicy: Always   # restart after every abnormal exit → CrashLoopBackOff
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo starting; sleep 1; echo boom >&2; exit 1"]

Save it as crasher.yaml, apply it, and watch.

k apply -f crasher.yaml
k get pod crasher -w          # watch STATUS go from Running to CrashLoopBackOff
k describe pod crasher        # lastState.terminated: reason=Error, exitCode=1
k logs crasher --previous     # "starting" and "boom" are still in the previous log

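Watching with -w (watch), you can see the backoff widen the restart interval directly. Once the logs confirm the cause is exit 1, fix the manifest, keeping two things in mind. First, most Pod spec fields, command and restartPolicy included, are immutable on an existing Pod, so delete and recreate it (k replace --force -f crasher.yaml does both in one step) rather than relying on k apply. Second, under restartPolicy: Always even exit 0 gets restarted, so the phase settles to Succeeded only if you change the command to exit 0 and also set restartPolicy to OnFailure or Never.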

Handling individual containers in a multi-container Pod #

When a Pod has multiple containers, you have to point logs and exec at a specific container with -c. The multi-container patterns themselves are covered in #3, but get the shape of the diagnostic commands down in advance.

# Logs and a shell for a specific container in a multi-container Pod
k logs mypod -c sidecar
k exec -it mypod -c app -- sh

Drop the -c and it defaults to the first container, with a warning when there are several. When the exam says something like “check the logs of the sidecar container,” specifying -c is part of the correct answer.
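When you don't yet know the container names, they can be read from the Pod spec and fed to `-c` one at a time. The sketch below does that; `logs_per_container` is a hypothetical helper name, and it calls `kubectl` rather than `k` because aliases are not expanded in scripts.

```shell
#!/bin/sh
# logs_per_container POD (hypothetical helper): print each container's log under a header,
# passing -c explicitly for every container in the Pod.
logs_per_container() {
  pod=$1
  # .spec.containers[*].name yields the container names separated by spaces
  for c in $(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}'); do
    echo "== $c =="
    kubectl logs "$pod" -c "$c"
  done
}

# Usage (assumes a Pod named mypod with containers app and sidecar):
#   logs_per_container mypod
```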

Exam points #

  • The 5 phases. Pending (before running), Running (running), Succeeded (normal completion), Failed (abnormal termination), Unknown (lost node communication)
  • The 3 restartPolicy values and their constraints. Always (Deployment etc., fixed), OnFailure / Never (a Job uses one of these two; Always is not allowed)
  • The 3 container states. Waiting, Running, Terminated. Diagnosis is reading the reason of Waiting and Terminated
  • The staple reasons. ImagePullBackOff / ErrImagePull (image), CrashLoopBackOff (internal process), OOMKilled (memory), Completed (normal)
  • Exit codes. 0 (normal), 1 (app error), 137 (SIGKILL / OOM), 143 (SIGTERM / graceful)
  • Diagnostic order. get pod -o wide → describe pod → logs --previous → get events. exec comes last
  • CrashLoopBackOff is not an error but a restart-backoff state. The cause is in logs --previous
  • For multi-container, specify the container with -c on logs and exec

Wrap-up #

What this post locked in:

  • We separated the Pod’s phase (macro state) from the container state (micro state) and set up the first-level branch of diagnosis.
  • We confirmed that restartPolicy has its possible values constrained by the kind of workload (Deployment = Always fixed, Job = OnFailure/Never).
  • We learned how to narrow causes down fast with reasons like CrashLoopBackOff, ImagePullBackOff, and OOMKilled, and with exit codes 0, 1, 137, and 143.
  • We confirmed the diagnostic command order of describe → logs --previous → events with a Pod we reproduced ourselves.

Now that you have an eye for reading state, the next step is the design where you deliberately put several containers inside one Pod.

Next — Multi-container patterns #

If this post covered the state and diagnosis of a single container, the next is the pattern of placing several containers together inside one Pod.

#3 Multi-container Patterns: Init container, sidecar, ambassador, adapter covers init containers that run sequentially before the main container, sidecars that assist alongside the main container, and the ambassador and adapter patterns — laying out when and in what shape to use each, with manifests.
