Certified Kubernetes Application Developer (CKAD) #7 Workloads 3: Job, CronJob (Backoff, Concurrency)
After learning zero-downtime deployments with Deployment in #5 and node-level and state-preserving workloads with DaemonSet and StatefulSet in #6, this post looks at a completely different kind of workload. If Deployment is for services that must stay up continuously, Job is for work that runs once and finishes. Data migrations, backups, and batch computations — things that are done once they complete — belong here.
When a service Pod dies it has to be brought back, but a batch Job Pod must not be restarted once it finishes successfully. Handling this notion of “done” is the heart of Job. And running that Job periodically on a cron schedule is what CronJob does. Since backoffLimit and concurrencyPolicy come up especially often on the exam, this post drills both YAML and kubectl until they feel natural.
Job: work that runs once and finishes #
A Job is a workload that creates one or more Pods and guarantees that they run to a specified number of successful completions. Unlike a Deployment that always keeps N Pods alive, a Job stops creating Pods once the defined work is done.
Let’s generate the simplest possible Job.

    k create job pi --image=perl:5.34 \
      $do -- perl -Mbignum=bpi -wle 'print bpi(2000)' > job.yaml

$do is the shell variable holding --dry-run=client -o yaml, defined in #1. The skeleton the command above produces looks like this.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi
    spec:
      template:
        spec:
          containers:
          - name: pi
            image: perl:5.34
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never

A Job differs from a Deployment in two ways here: its apiVersion is batch/v1, and the Pod template's restartPolicy is Never.
restartPolicy: Always is forbidden in a Job #
A Job's Pod template accepts only OnFailure or Never for restartPolicy. Unlike in a Deployment, Always is not allowed: once the work finishes, it must not run again.
| Value | Behavior |
|---|---|
| Never | Does not restart the failed Pod; the Job creates a new Pod to retry |
| OnFailure | Restarts the container inside the same Pod to retry |
Never has the advantage of leaving each failed Pod behind so debugging logs remain, while OnFailure has the advantage of not increasing the Pod count. On the exam, just follow whatever the question requires.
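To see the difference between the two values in practice, here is a minimal sketch of a Job that fails on purpose with restartPolicy: OnFailure. The Job name and the failing command are assumptions made up for illustration; only the restartPolicy line matters.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-onfailure          # hypothetical name, for illustration only
spec:
  template:
    spec:
      restartPolicy: OnFailure   # the container is restarted inside the same Pod
      containers:
      - name: task
        image: busybox:1.36
        # fails every time so the retry behavior becomes visible
        command: ["sh", "-c", "echo attempt; exit 1"]
```

With this manifest, k get pods -l job-name=flaky-onfailure should show a single Pod whose RESTARTS counter climbs. Switching the value to Never should instead leave several Pods in Error status, one per attempt. Keep in mind that with OnFailure the Pod is terminated once the retry limit is reached, which can make the logs harder to retrieve afterwards.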
completions and parallelism #
A Job’s behavior is determined by two fields.
| Field | Meaning | Default |
|---|---|---|
| completions | How many completions count as success | 1 |
| parallelism | How many Pods run at the same time | 1 |
For example, with completions: 6 and parallelism: 2, the Job runs two Pods at a time and keeps going until it has filled six successes.
backoffLimit: retry limit #
backoffLimit sets how many times a Job retries on failure. Once this count is exceeded, the Job is marked Failed and creates no more Pods. The default is 6.

    spec:
      backoffLimit: 4

An increasing delay (exponential backoff) is applied between retries. This acts as a safeguard against an endlessly failing job consuming cluster resources. When the exam asks you to “retry up to N times only,” this is the field to set.
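The fragment above only shows the field itself. As a complete, runnable sketch, the following Job always fails so that the limit is actually reached; the name always-fails and the command are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: always-fails             # hypothetical name, for illustration only
spec:
  backoffLimit: 2                # give up after two retries
  template:
    spec:
      restartPolicy: Never       # each failed attempt leaves its own Pod behind
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "echo failing on purpose; exit 1"]
```

Once the retries are used up, k describe job always-fails should report a Failed condition with the reason BackoffLimitExceeded, and k get pods -l job-name=always-fails lists the failed Pods that were left behind.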
activeDeadlineSeconds: time limit #
activeDeadlineSeconds is a time limit that forcibly terminates the Job once this many seconds have passed since it started. If backoffLimit is a count-based limit, this one is time-based. Whichever condition is reached first applies.

    spec:
      activeDeadlineSeconds: 100

If it is terminated for exceeding the time, the Job is treated as failed with the reason DeadlineExceeded.
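A small sketch of where the field goes, assuming a deliberately slow task (the name too-slow and the sleep command are made up). The point to watch is the level: both the Job spec and the Pod spec have a field called activeDeadlineSeconds, and the one described here sits directly under the Job's spec.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: too-slow                 # hypothetical name, for illustration only
spec:
  activeDeadlineSeconds: 100     # Job-level limit, counted from the Job's start across all retries
  backoffLimit: 6
  template:
    spec:
      # a Pod spec has its own activeDeadlineSeconds; it is not the field discussed here
      restartPolicy: Never
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "sleep 300"]   # runs longer than the deadline on purpose
```

After roughly 100 seconds the running Pod should be terminated and k describe job too-slow should show the DeadlineExceeded reason, even though backoffLimit has not been reached.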
ttlSecondsAfterFinished: automatic cleanup #
A completed or failed Job remains in the cluster as-is by default. Setting ttlSecondsAfterFinished causes the Job and its Pods to be deleted automatically once the specified number of seconds has passed after termination.

    spec:
      ttlSecondsAfterFinished: 60

Use this to prevent old Jobs from accumulating after batch work finishes; setting it to 0 deletes them immediately upon completion.
Parallel execution patterns #
Job produces three representative patterns through combinations of completions and parallelism.
| Pattern | Setting | Use |
|---|---|---|
| Single task | completions unset, parallelism unset | A one-off task that runs once and finishes |
| Fixed completion count | completions: N (parallelism optional) | Process N independent items |
| Work queue | parallelism: M, completions unset | Workers pull items from a queue, process them, and stop when the queue is empty |
In the work-queue pattern, each Pod looks at an external queue and pulls work on its own, so you leave completions empty and set only the worker count with parallelism.
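As a rough sketch of the work-queue shape, the manifest below sets only parallelism. The Job name and the worker command are placeholders: a real worker would connect to your queue and exit with status 0 once it finds the queue empty.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-workers            # hypothetical name, for illustration only
spec:
  parallelism: 3                 # three workers run at the same time
  # completions is intentionally omitted: the workers decide when the work is done
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: busybox:1.36
        # placeholder for a real worker that pulls items until the queue is empty
        command: ["sh", "-c", "echo draining queue && sleep 5"]
```

In this mode, once any Pod exits successfully the Job creates no new Pods, and the Job is complete when the remaining Pods have terminated as well.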
Fixed completion count + parallel Job example #
The following Job reaches six completions by running three Pods at a time, and allows at most two retries.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: batch-import
    spec:
      completions: 6
      parallelism: 3
      backoffLimit: 2
      activeDeadlineSeconds: 120
      ttlSecondsAfterFinished: 120
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: importer
            image: busybox:1.36
            command: ["sh", "-c", "echo importing && sleep 5"]

This Job runs three Pods at a time until it has six successes, is marked Failed once it needs more than two retries, is forcibly terminated if it runs longer than 120 seconds, and is cleaned up automatically 120 seconds after it finishes.
CronJob: running a Job periodically #
A CronJob is a workload that creates Jobs on a set schedule. If a Job is one-off, a CronJob repeats it. Running a backup every morning at dawn, or generating a report every 5 minutes, are typical use cases.
Let’s generate the skeleton.

    k create cronjob report \
      --image=busybox:1.36 \
      --schedule="*/5 * * * *" \
      $do -- /bin/sh -c 'date; echo report' > cronjob.yaml

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: report
    spec:
      schedule: "*/5 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
              - name: report
                image: busybox:1.36
                command: ["/bin/sh", "-c", "date; echo report"]

A CronJob holds the Job’s spec verbatim under spec.jobTemplate, and a Pod template nests again inside that, giving a three-level nested structure. This is the resource where indentation errors are most likely on the exam, so generating the skeleton with a generator is the safer route.
schedule: cron notation #
schedule uses the standard five-field cron notation.

    ┌── minute (0-59)
    │ ┌── hour (0-23)
    │ │ ┌── day of month (1-31)
    │ │ │ ┌── month (1-12)
    │ │ │ │ ┌── day of week (0-6, 0=Sunday)
    │ │ │ │ │
    * * * * *

| Notation | Meaning |
|---|---|
| */5 * * * * | Every 5 minutes |
| 0 * * * * | On the hour, every hour |
| 0 2 * * * | Every day at 2 AM |
| 0 0 * * 0 | Every Sunday at midnight |
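The five fields also accept ranges and steps, so they can be combined. As an illustrative sketch (the CronJob name and the business-hours schedule are assumptions, not something the exam prescribes), the following runs every 15 minutes between 09:00 and 17:45 on weekdays.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: office-hours-report      # hypothetical name, for illustration only
spec:
  # minute=*/15, hour=9-17, day of month=*, month=*, day of week=1-5 (Mon-Fri)
  schedule: "*/15 9-17 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.36
            command: ["/bin/sh", "-c", "date; echo report"]
```

Keep the schedule in quotes when editing the YAML by hand: a value that starts with an unquoted * is read by the YAML parser as an alias and the manifest fails to load.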
concurrencyPolicy: concurrency policy #
concurrencyPolicy decides what to do when the next schedule arrives while the previous run has not yet finished. It is an exam regular, so memorize the difference between the three values exactly.
| Value | Behavior |
|---|---|
| Allow (default) | Allows concurrent runs. Creates a new Job even if the previous Job is still running |
| Forbid | Skips the new schedule if the previous Job has not finished |
| Replace | Cancels the previous Job and replaces it with the new one |
Forbid suits a backup job that must not overlap, and Replace suits a job where only the latest run matters.
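One way to observe the three values is to make a run deliberately longer than the schedule interval. The sketch below assumes a hypothetical CronJob named overlap-demo that fires every minute while each run sleeps for 90 seconds, so every new schedule arrives while the previous Job is still active.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: overlap-demo             # hypothetical name, for illustration only
spec:
  schedule: "* * * * *"          # every minute
  concurrencyPolicy: Replace     # try Allow and Forbid here to compare
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: slow
            image: busybox:1.36
            # sleeps longer than the one-minute schedule interval on purpose
            command: ["/bin/sh", "-c", "date; sleep 90"]
```

Watching k get jobs -w should show the difference: with Replace the running Job disappears when the next one starts, with Forbid the overlapping run is skipped, and with Allow two Jobs are active at the same time. The policy only applies to Jobs created by the same CronJob.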
startingDeadlineSeconds: starting deadline #
This field allows a Job to start late — within this many seconds after the scheduled time — when the controller was down or a node problem prevented it from starting on time. Once the deadline is exceeded, the run is skipped.

    spec:
      startingDeadlineSeconds: 30

suspend: pausing #
Setting suspend: true stops the CronJob from creating new Jobs. It is used to briefly turn off a periodic job during maintenance. It does not affect Jobs that are already running.

    # Pause
    k patch cronjob report -p '{"spec":{"suspend":true}}'
    # Resume
    k patch cronjob report -p '{"spec":{"suspend":false}}'

History limits #
A CronJob keeps a set number of finished Jobs.
| Field | Meaning | Default |
|---|---|---|
| successfulJobsHistoryLimit | How many successful Jobs to keep | 3 |
| failedJobsHistoryLimit | How many failed Jobs to keep | 1 |
This setting prevents old Jobs from accumulating indefinitely; raise the value to keep more history for debugging.
CronJob example applying concurrencyPolicy #
A backup CronJob that prevents overlapping runs, allows a late start of only up to 30 seconds, and keeps its history trimmed.

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: db-backup
    spec:
      schedule: "0 2 * * *"
      concurrencyPolicy: Forbid
      startingDeadlineSeconds: 30
      successfulJobsHistoryLimit: 5
      failedJobsHistoryLimit: 2
      jobTemplate:
        spec:
          backoffLimit: 3
          template:
            spec:
              restartPolicy: OnFailure
              containers:
              - name: backup
                image: busybox:1.36
                command: ["/bin/sh", "-c", "echo backing up && sleep 10"]

It runs every day at 2 AM but skips this run if the previous backup has not finished, and retries up to three times on failure via the backoffLimit inside the jobTemplate.
Hands-on: working with Job and CronJob #
Creating a Job and checking logs #

    # Create the Job
    k apply -f job.yaml
    # Check status (done when the COMPLETIONS column reaches 6/6)
    k get job batch-import
    # View the Pods the Job created
    k get pods -l job-name=batch-import
    # Check logs (view the log of one Pod by the Job name)
    k logs job/batch-import
    # Detailed progress and events
    k describe job batch-import

The COMPLETIONS column of k get job shows progress in success/target format. You can also check elapsed time with DURATION and AGE.
Working with a CronJob and triggering manually #
Because a CronJob only creates a Job when the scheduled time arrives, you trigger it manually to verify behavior immediately on the exam.

    # Create the CronJob
    k apply -f cronjob.yaml
    # Confirm registration (SCHEDULE, LAST SCHEDULE, ACTIVE columns)
    k get cronjob db-backup
    # Run once now without waiting for the schedule (create a Job from the CronJob)
    k create job manual-run --from=cronjob/db-backup
    # Logs of the manually run Job
    k logs job/manual-run
    # List of Jobs the CronJob created
    k get jobs

k create job <name> --from=cronjob/<CronJob name> clones the CronJob’s jobTemplate verbatim and creates a Job immediately. This is the fastest way to verify that the work runs correctly before grading.
Exam points #
- Job is batch/v1. Get the apiVersion wrong and the resource won’t be created. Generating it with a generator gets it right automatically.
- restartPolicy is Never or OnFailure only. Using Always in a Job or CronJob Pod template is rejected.
- backoffLimit is a regular. When a “retry up to N times” requirement appears, this is the field. Remembering the default of 6 saves time.
- The three concurrencyPolicy values. Questions that ask you to distinguish Allow (default), Forbid (skip), and Replace (replace) precisely come up often.
- CronJob is three-level nesting. With a structure that goes all the way down to jobTemplate.spec.template.spec, indentation mistakes are frequent. Generating the skeleton with a generator and then editing only the fields is the safer route.
- Manual trigger. To verify a CronJob immediately, use k create job <job name> --from=cronjob/<name>.
- The combination of completions and parallelism splits the single, fixed, and queue patterns. When both are left unset, both default to 1.
Wrap-up #
What this post locked in:
- Job is a run-once batch workload. completions sets the target completion count, parallelism sets the concurrent run count.
- Retries and limits. Control failures with backoffLimit (count) and activeDeadlineSeconds (time), and clean up finished Jobs automatically with ttlSecondsAfterFinished.
- restartPolicy is Never or OnFailure. Always cannot be used in a Job.
- CronJob repeats a Job on a cron schedule. Tune its behavior with schedule, concurrencyPolicy, startingDeadlineSeconds, suspend, and history limits.
- Hands-on. Verify a Job with k logs job/<name>, and trigger a CronJob manually with k create job <job name> --from=cronjob/<name>.
If you want a broader grasp of batch workload concepts, the Kubernetes Intermediate series is a good companion read.
Next: Deployment strategies #
With the run-to-completion workload types (Job, CronJob) wrapped up, we have finished the workload group spanning #5 through #7. Next we return to the question of how to swap those workloads out.
#8 Deployment Strategies: Blue-green, canary covers blue-green deployment, which moves an entire version at once, and canary deployment, which sends only part of the traffic to the new version. We will implement both strategies directly on the exam by combining Deployment, Service, and label selectors.