Certified Kubernetes Administrator (CKA) #27 Full-Length Practice Exam — 17 Tasks with Solutions


Saturday, June 6, 2026


From the exam environment in #1 through the exam tips in #26, we have covered every domain once. This final post of the series is one you solve rather than read. Like the real CKA, it gathers 17 tasks that span every domain in one place. These are not multiple choice: they are hands-on scenarios in which you build and fix a cluster from an empty terminal and on the nodes themselves, and each task carries a point value.

The recommended time limit is 2 hours, the same as the real exam. The pass line is 66%, scored by summing the point values of all 17 tasks. If you get stuck on a task, mark it, move on, and bank points first from the high-value tasks you are comfortable with. That is the way over the pass line.

Because the CKA spans multiple clusters, switching context before anything else is what prevents wrong answers. Each task is graded only if you solve it in the specified context, and since many tasks take you onto the nodes, you also need to be comfortable with SSH, systemctl, and etcdctl. Solve each task fully on your own first, then unfold the solution. If you read the solution first, your hands never learn it.

How to take it

Solving on a multi-node cluster built with kubeadm is closest to the real thing. If that is hard locally, stand up a control plane and one or two workers on two or three cloud VMs. A single-node minikube does not give you the right feel for etcd backup/restore or node troubleshooting.
For each task, switch to the specified context first. As this series has repeated, a misconfigured context scores 0 even if your answer is correct.

k config use-context <the context the question specifies>

Some tasks have you SSH into the nodes, so check ahead of time that you can connect using the hostnames the exam presents (such as node01). If you need root on a node, switch with sudo -i.
Solve all 17 to the end, then unfold the solutions and grade them in one pass. Peeking at solutions mid-exam dulls your sense of the real thing. Applying the alias k=kubectl and export do="--dry-run=client -o yaml" setup from #1 first will save you time.
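The setup and the context switch described above can be sketched as a small shell snippet. This is a minimal sketch, not the setup from #1 itself: bash does not expand aliases in scripts by default, so it uses a function k instead of an alias, and the helper name use_ctx is hypothetical.

```shell
#!/usr/bin/env bash
# Shorthand for kubectl, written as a function so it also works in scripts,
# where bash does not expand aliases by default.
k() { kubectl "$@"; }

# Flags for generating a manifest skeleton without touching the cluster.
export do="--dry-run=client -o yaml"

# use_ctx is a hypothetical helper: switch context, then print the active one
# so a wrong context is visible before any work starts.
use_ctx() {
  k config use-context "$1" >/dev/null && k config current-context
}

# Example (needs a real kubeconfig):
#   use_ctx cluster1
#   k -n dev create deploy demo --image=nginx $do > demo.yaml
```

Printing the active context after every switch costs a second and makes a wrong-context mistake hard to miss.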

Domain distribution

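The 17 tasks roughly follow the domain weights of the real CKA. Troubleshooting carries the largest weight on the real exam at 30%, and here it gets four tasks, second only to the five for Cluster Architecture, Installation, Configuration.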

#	Domain	Tasks	Task numbers
1	Cluster Architecture, Installation, Configuration	5	1, 2, 3, 4, 5
2	Workloads and Scheduling	3	6, 7, 8
3	Services and Networking	3	9, 10, 11
4	Storage	2	12, 13
5	Troubleshooting	4	14, 15, 16, 17

The points reflect the domain weights and task difficulty and total 118, so the 66% pass line works out to 78 points. The scoring criteria are laid out at the end of the post.

Task 1 (8 points): Cluster Architecture, Installation, Configuration

In context cluster1, save a snapshot of the running etcd to /opt/etcd-backup.db. Work by SSHing into the control plane node, where etcd runs as a static Pod and the certificates live under /etc/kubernetes/pki/etcd.

Solution

Connect to the control plane node, switch to root, and save the snapshot.

ssh cluster1-controlplane
sudo -i

ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Once it finishes, check the snapshot status.

ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd-backup.db

Explanation: snapshot save connects to etcd with a client certificate, so all three of --cacert, --cert, and --key are required. You can find the paths in the --trusted-ca-file, --cert-file, and --key-file values of the etcd static Pod manifest (/etc/kubernetes/manifests/etcd.yaml). A classic trap is dropping ETCDCTL_API=3: etcdctl older than 3.4 then runs against the v2 API and the command fails. From etcdctl 3.4 on, v3 is the default, so setting the variable is harmless but no longer strictly needed.
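Looking up the certificate paths in the manifest can be sketched as a small helper. This is a minimal sketch that assumes kubeadm's manifest layout, where each argument is a list item of the form "- --flag=value"; the function name etcd_flag is hypothetical.

```shell
#!/usr/bin/env bash
# etcd_flag MANIFEST FLAG
# Prints the value of one "--FLAG=value" argument from a static Pod manifest.
# The match is anchored right after the list dash, so asking for cert-file
# does not pick up peer-cert-file.
etcd_flag() {
  sed -n "s/^[[:space:]]*-[[:space:]]*--$2=\(.*\)\$/\1/p" "$1" | head -n 1
}

# Example on the control plane node:
#   m=/etc/kubernetes/manifests/etcd.yaml
#   ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
#     --endpoints=https://127.0.0.1:2379 \
#     --cacert="$(etcd_flag "$m" trusted-ca-file)" \
#     --cert="$(etcd_flag "$m" cert-file)" \
#     --key="$(etcd_flag "$m" key-file)"
```

Reading the paths from the manifest instead of typing them from memory avoids the most common typo in this task.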

Task 2 (8 points): Cluster Architecture, Installation, Configuration

Restore the snapshot /opt/etcd-backup.db from Task 1 into a new data directory /var/lib/etcd-restore, and edit the manifest so the etcd static Pod uses this directory. After the restore, confirm the cluster returns to normal.

Solution

Restore the snapshot into the new directory.

ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-restore

In the etcd static Pod manifest, change the host path to the new directory.

vim /etc/kubernetes/manifests/etcd.yaml

  volumes:
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd-restore
        type: DirectoryOrCreate

Wait for kubelet to detect the manifest change and bring the etcd Pod back up, then verify.

crictl ps | grep etcd
k get nodes

Explanation: snapshot restore only creates a new data directory — it does not restart etcd. The actual switch happens at the step where you change hostPath.path in the static Pod manifest to the new directory, making kubelet recreate the etcd Pod against the new data. Watch out for the permissions on the parent path of --data-dir and for the brief moment the existing etcd container goes down.
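If you prefer not to edit the manifest in vim, the hostPath switch can be sketched with sed. This is a minimal sketch that assumes GNU sed and that the manifest still uses kubeadm's default data path /var/lib/etcd; the function name point_etcd_at is hypothetical.

```shell
#!/usr/bin/env bash
# point_etcd_at MANIFEST NEW_DIR
# Rewrites only the hostPath line that ends exactly in /var/lib/etcd, so the
# certificate volume (/etc/kubernetes/pki/etcd) is left alone.
# Keep any backup copy outside the manifests directory, because kubelet
# tries to run every manifest it finds there.
point_etcd_at() {
  sed -i "s#^\([[:space:]]*path:[[:space:]]*\)/var/lib/etcd[[:space:]]*\$#\1$2#" "$1"
}

# Example on the control plane node:
#   cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
#   point_etcd_at /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd-restore
#   grep -n 'path:' /etc/kubernetes/manifests/etcd.yaml
```

Only the host side of the volume changes here; the container keeps mounting it at the same path, so the --data-dir argument inside the manifest can stay as it is.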

Task 3 (8 points): Cluster Architecture, Installation, Configuration

Upgrade the control plane and worker node of context cluster1 from v1.31.0 to v1.31.1. Upgrade the control plane first, then upgrade the worker node01. The worker must be emptied of workloads during its upgrade.

Solution

On the control plane node, bump kubeadm, then check and apply the upgrade plan.

ssh cluster1-controlplane
sudo -i

apt-get update && apt-get install -y kubeadm=1.31.1-1.1
kubeadm upgrade plan
kubeadm upgrade apply v1.31.1

Drain the control plane node, bump its kubelet and kubectl, restart kubelet, then uncordon it.

kubectl drain cluster1-controlplane --ignore-daemonsets
apt-get install -y kubelet=1.31.1-1.1 kubectl=1.31.1-1.1
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon cluster1-controlplane

Drain the worker from the control plane, then upgrade it on the node itself.

kubectl drain node01 --ignore-daemonsets

ssh node01
sudo -i
apt-get update && apt-get install -y kubeadm=1.31.1-1.1
kubeadm upgrade node
apt-get install -y kubelet=1.31.1-1.1
systemctl daemon-reload && systemctl restart kubelet
exit

kubectl uncordon node01

Explanation: The key difference is using kubeadm upgrade apply for the control plane and kubeadm upgrade node for the worker. The upgrade order is install kubeadm → upgrade → install kubelet/kubectl → restart kubelet, and draining without --ignore-daemonsets gets blocked by DaemonSet Pods. Version jumps of only one minor at a time are allowed.
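The one-minor-at-a-time rule can be sketched as a quick pre-flight check. This is a minimal sketch with a hypothetical function name, can_upgrade; kubeadm upgrade plan remains the authoritative check on a real cluster.

```shell
#!/usr/bin/env bash
# can_upgrade FROM TO  (versions like v1.31.0)
# Succeeds when TO is in the same minor release as FROM (and not a lower
# patch) or exactly one minor ahead. Major version changes are rejected.
can_upgrade() {
  local from=${1#v} to=${2#v}
  local fmaj fmin fpat tmaj tmin tpat
  IFS=. read -r fmaj fmin fpat <<<"$from"
  IFS=. read -r tmaj tmin tpat <<<"$to"
  [ "$fmaj" -eq "$tmaj" ] || return 1
  if [ "$tmin" -eq "$fmin" ]; then
    [ "$tpat" -ge "$fpat" ]
  else
    [ "$tmin" -eq $((fmin + 1)) ]
  fi
}

can_upgrade v1.31.0 v1.31.1 && echo "patch upgrade allowed"
can_upgrade v1.29.0 v1.31.1 || echo "skipping a minor is not allowed"
```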

Task 4 (8 points): Cluster Architecture, Installation, Configuration

Create a ServiceAccount deployer that operates only in namespace dev, and configure RBAC so this ServiceAccount can create, get, update, and delete Deployments within dev. Verify the permissions are correct with kubectl auth can-i.

Solution

Create the ServiceAccount, Role, and RoleBinding.

k -n dev create serviceaccount deployer
k -n dev create role deploy-manager \
  --verb=create,get,update,delete \
  --resource=deployments.apps
k -n dev create rolebinding deployer-binding \
  --role=deploy-manager \
  --serviceaccount=dev:deployer

Verify the permissions.

k -n dev auth can-i create deployments --as=system:serviceaccount:dev:deployer
k -n dev auth can-i delete deployments --as=system:serviceaccount:dev:deployer

Explanation: A Role is a namespace-scoped set of permissions, and a RoleBinding connects that Role to a subject (here, the ServiceAccount). kubectl create role resolves a bare deployments to the apps group on its own, but spelling it out as --resource=deployments.apps removes any ambiguity. The empty-rule trap appears when you write the Role YAML by hand with apiGroups: [""] (the core group), which matches no Deployments. When verifying, write the subject in the form system:serviceaccount:<ns>:<name>; both commands should return yes.
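Checking every requested verb in one pass can be sketched as a loop around kubectl auth can-i. This is a minimal sketch; the function name check_sa_verbs is hypothetical.

```shell
#!/usr/bin/env bash
# check_sa_verbs NAMESPACE SERVICEACCOUNT RESOURCE VERB...
# Prints "verb=answer" for each verb by impersonating the ServiceAccount.
# can-i exits non-zero on "no", which is fine here: only its output is used.
check_sa_verbs() {
  local ns=$1 sa=$2 res=$3 verb
  shift 3
  for verb in "$@"; do
    printf '%s=%s\n' "$verb" \
      "$(kubectl -n "$ns" auth can-i "$verb" "$res" --as="system:serviceaccount:$ns:$sa")"
  done
}

# Example:
#   check_sa_verbs dev deployer deployments create get update delete
#   check_sa_verbs dev deployer secrets get   # should answer no if the Role covers only Deployments
```

Testing one resource the Role should not cover is a cheap way to confirm you did not over-grant.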

Task 5 (6 points): Cluster Architecture, Installation, Configuration

Check the expiry date of the certificate /etc/kubernetes/pki/apiserver.crt, and renew all cluster certificates with kubeadm. After renewal, check again that the expiry date has moved into the future.

Solution

On the control plane node, check the certificate expiry status.

ssh cluster1-controlplane
sudo -i

kubeadm certs check-expiration

To see a specific certificate’s expiry date directly, check it with openssl.

openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate

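Renew all certificates, then restart the control plane static Pods. Restarting kubelet alone does not recreate them, so move the manifests out of the manifests directory and back.

kubeadm certs renew all
mkdir -p /tmp/manifests && mv /etc/kubernetes/manifests/*.yaml /tmp/manifests/
sleep 20   # give kubelet time to notice and stop the static Pods
mv /tmp/manifests/*.yaml /etc/kubernetes/manifests/

Rerun kubeadm certs check-expiration (or the openssl command above) to confirm the expiry date has moved forward.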

Explanation: kubeadm certs check-expiration shows the expiry dates of each certificate and kubeconfig in a table. certs renew all only reissues the certificate files, so you must restart the components — apiserver and the other static Pods — to make them read the new certificates and actually take effect. kubeadm auto-renews certificates during a control plane upgrade, so a regular upgrade can double as a renewal.
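To confirm that the renewal actually moved the date, one option is to turn the notAfter value into a day count before and after. This is a minimal sketch that assumes GNU date (for -d); the function name days_left is hypothetical.

```shell
#!/usr/bin/env bash
# days_left "NOTAFTER" [NOW_EPOCH]
# NOTAFTER is the date part of `openssl x509 -noout -enddate` output,
# for example "Jun  6 12:00:00 2027 GMT". NOW_EPOCH defaults to the current time.
days_left() {
  local end now
  end=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (end - now) / 86400 ))
}

# Example on the control plane node:
#   line=$(openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate)
#   days_left "${line#notAfter=}"
```

Run it once before kubeadm certs renew all and once after; the second number should be clearly larger.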

Task 6 (6 points): Workloads and Scheduling

In namespace apps, create a DaemonSet log-agent that runs exactly one fluentd log collector on every node. Use the image fluent/fluentd:v1.16, and it does not need to run on the control plane node because of its taint.

Solution

Since this is a kind whose skeleton you can’t build with dry-run, write the manifest by hand.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: apps
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd:v1.16

k apply -f log-agent.yaml
k -n apps get ds log-agent

Explanation: A DaemonSet places one Pod on each eligible node and has no replicas field. The control plane node usually carries a node-role.kubernetes.io/control-plane:NoSchedule taint, so without a matching toleration nothing schedules there, which means the requirement that it “does not need to run there” is satisfied by simply not adding a toleration. As with a Deployment, the selector and the template labels must match.

Task 7 (7 points): Workloads and Scheduling

Add the taint gpu=true:NoSchedule to the worker node node01, and in namespace apps create a Pod ml-job (image nginx) that tolerates this taint and schedules only onto nodes with the label disktype=ssd.

Solution

Add the taint to the node, and add the disktype=ssd label if the node does not already have it.

k taint node node01 gpu=true:NoSchedule
k label node node01 disktype=ssd

Create a Pod with a toleration and nodeAffinity.

apiVersion: v1
kind: Pod
metadata:
  name: ml-job
  namespace: apps
spec:
  tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  containers:
    - name: ml-job
      image: nginx

k apply -f ml-job.yaml
k -n apps get pod ml-job -o wide

Explanation: A taint is the mechanism by which a node repels Pods, and a toleration is a Pod’s declaration that it will tolerate that taint. A toleration alone still lets the Pod land on other nodes, so to pin it to a specific node you must add nodeAffinity or nodeSelector as well. Quote the toleration’s value as "true": unquoted, YAML parses true as a boolean and the API server rejects it because the field expects a string.

Task 8 (7 points): Workloads and Scheduling

In namespace data, create a StatefulSet cache. Use the image redis:7 with replicas 3, and give the Pods stable DNS names through a headless Service cache. Each Pod has a 1Gi PVC via volumeClaimTemplates.

Solution

Define the headless Service and the StatefulSet together.

apiVersion: v1
kind: Service
metadata:
  name: cache
  namespace: data
spec:
  clusterIP: None
  selector:
    app: cache
  ports:
    - port: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cache
  namespace: data
spec:
  serviceName: cache
  replicas: 3
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      containers:
        - name: redis
          image: redis:7
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

k apply -f cache.yaml
k -n data get pods -l app=cache

Explanation: A StatefulSet must be tied to a headless Service (clusterIP: None) via serviceName to grant stable DNS in the form cache-0.cache.data.svc.cluster.local. volumeClaimTemplates automatically creates a separate PVC per Pod, and even when a Pod is recreated it reattaches to the same PVC. Pods being named sequentially from 0 is another difference from a Deployment.
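The naming pattern is easy to spell out for all three replicas. This is a minimal sketch that assumes the default cluster domain cluster.local; the function name sts_dns is hypothetical.

```shell
#!/usr/bin/env bash
# sts_dns NAME SERVICE NAMESPACE REPLICAS [CLUSTER_DOMAIN]
# Prints the per-Pod DNS names a StatefulSet gets from its headless Service.
# Pass a fifth argument if your cluster domain is not cluster.local.
sts_dns() {
  local name=$1 svc=$2 ns=$3 replicas=$4 domain=${5:-cluster.local} i
  for ((i = 0; i < replicas; i++)); do
    echo "$name-$i.$svc.$ns.svc.$domain"
  done
}

sts_dns cache cache data 3
# cache-0.cache.data.svc.cluster.local
# cache-1.cache.data.svc.cluster.local
# cache-2.cache.data.svc.cluster.local
```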

Task 9 (6 points): Services and Networking

In namespace net, create a Deployment web with the nginx image and replicas 2, and expose it through a NodePort Service web-np accessible from outside the cluster on the node’s port 30080. The Service port is 80 and the target port is 80.

Solution

k -n net create deploy web --image=nginx --replicas=2
k -n net expose deploy web --name=web-np --type=NodePort --port=80 --target-port=80

Pin the nodePort value to 30080.

k -n net patch svc web-np --type='json' \
  -p='[{"op":"replace","path":"/spec/ports/0/nodePort","value":30080}]'

Explanation: When you create a NodePort Service with expose, nodePort is auto-assigned from the default 30000-32767 range, so to pin it to a specific value you either patch it or write nodePort: 30080 directly into the dry-run YAML and apply. Verify with curl <node IP>:30080 and confirm you get a response.
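The dry-run route mentioned above can be sketched as a filter that adds the nodePort line to the generated YAML. This is a minimal sketch that assumes GNU sed and the default port range; the function name pin_nodeport is hypothetical.

```shell
#!/usr/bin/env bash
# pin_nodeport PORT
# Reads Service YAML on stdin and adds "nodePort: PORT" after each targetPort
# line, keeping the indentation. Rejects ports outside the default range.
pin_nodeport() {
  local port=$1
  if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
    echo "nodePort $port is outside 30000-32767" >&2
    return 1
  fi
  sed "s/^\([[:space:]]*\)targetPort:.*\$/&\n\1nodePort: $port/"
}

# Example:
#   kubectl -n net expose deploy web --name=web-np --type=NodePort \
#     --port=80 --target-port=80 --dry-run=client -o yaml \
#     | pin_nodeport 30080 | kubectl apply -f -
```

This creates the Service with the right port in a single apply, so there is no window in which it holds a random nodePort.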

Task 10 (6 points): Services and Networking

In namespace net, create an Ingress web-ing. Route traffic arriving at host web.example.com path / to port 80 of the Service web-np from Task 9, using Prefix for pathType and nginx for ingressClassName.

Solution

k -n net create ingress web-ing \
  --class=nginx \
  --rule="web.example.com/*=web-np:80" $do > ing.yaml

The generated manifest looks like this.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ing
  namespace: net
spec:
  ingressClassName: nginx
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-np
                port:
                  number: 80

k apply -f ing.yaml

Explanation: The --rule of create ingress takes the form host/path=service:port, and the trailing /* of the path converts to pathType: Prefix. Drop --class=nginx and ingressClassName is left empty, so routing won’t work on a cluster with no default IngressClass. Actual operation requires the ingress-nginx controller, but grading looks at the correctness of the resource definition.
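How the --rule string breaks down can be sketched with plain parameter expansion. This is a minimal sketch that mirrors the host/path=service:port form only; it does not handle the optional tls suffix, and the function name parse_rule is hypothetical.

```shell
#!/usr/bin/env bash
# parse_rule "host/path=service:port"
# Splits a `kubectl create ingress --rule` value into its parts and reports
# the resulting pathType: a trailing * means Prefix, anything else Exact.
parse_rule() {
  local rule=$1 left backend host path type
  left=${rule%%=*}      # host/path
  backend=${rule#*=}    # service:port
  host=${left%%/*}
  path=/${left#*/}
  if [[ $path == *\* ]]; then
    type=Prefix
    path=${path%\*}
  else
    type=Exact
  fi
  echo "host=$host path=$path pathType=$type service=${backend%%:*} port=${backend##*:}"
}

parse_rule "web.example.com/*=web-np:80"
# host=web.example.com path=/ pathType=Prefix service=web-np port=80
```

Writing the rule without the trailing * would give pathType: Exact, which does not meet this task's Prefix requirement.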

Task 11 (6 points): Services and Networking

In namespace net, create a NetworkPolicy db-allow. Allow ingress traffic to Pods with the label app=db only when a Pod with the label role=api in the same namespace accesses TCP port 5432, and block all other ingress.

Solution

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow
  namespace: net
spec:
  podSelector:
    matchLabels:
      app: db
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: api
      ports:
        - protocol: TCP
          port: 5432

k apply -f db-allow.yaml

Explanation: podSelector selects the Pods the policy applies to (app=db), and ingress.from selects the allowed sources (role=api). Once any NetworkPolicy with policyTypes: Ingress selects a Pod, all ingress that no rule explicitly allows is blocked, so other ingress is shut off without a separate deny-all. Putting from and ports in the same rule makes it an AND of both conditions. The policy is enforced only on a policy-capable CNI such as Calico.

Task 12 (8 points): Storage

Create a 2Gi PersistentVolume pv-data (accessMode ReadWriteOnce, reclaimPolicy Retain) backed by the host path /mnt/data. Then, in namespace storage, create a 1Gi PVC pvc-data that binds to exactly this PV, and a Pod pv-user (nginx) that mounts the PVC at /usr/share/nginx/html.

Solution

Define the PV, PVC, and Pod in order.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data
  namespace: storage
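spec:
  storageClassName: ""   # skip any default StorageClass so the claim binds statically
  volumeName: pv-data    # pin the claim to this PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi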
---
apiVersion: v1
kind: Pod
metadata:
  name: pv-user
  namespace: storage
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-data

k apply -f pv.yaml
k -n storage get pvc pvc-data

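Explanation: For a PVC to bind to a PV, the accessModes and the storageClassName must match, and the PV’s capacity must be greater than or equal to the PVC’s request (2Gi ≥ 1Gi). On a cluster with a default StorageClass, a PVC that omits storageClassName is provisioned dynamically instead, so set storageClassName: "" (or volumeName: pv-data) on the claim to pin it to this PV. A PV is a cluster-wide resource, so it has no namespace, while the PVC and the Pod must live in the same namespace. With persistentVolumeReclaimPolicy: Retain, the PV’s data survives after you delete the PVC, and the PV moves to the Released state.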
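The capacity rule can be sketched as a small comparison. This is a minimal sketch that handles only the Mi, Gi, and Ti suffixes, not the full Kubernetes quantity syntax; the function names to_mi and pv_fits are hypothetical.

```shell
#!/usr/bin/env bash
# to_mi QUANTITY: converts a Mi/Gi/Ti quantity to MiB.
to_mi() {
  case $1 in
    *Ti) echo $(( ${1%Ti} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# pv_fits PV_CAPACITY PVC_REQUEST: succeeds when the PV is large enough.
pv_fits() {
  local pv pvc
  pv=$(to_mi "$1") && pvc=$(to_mi "$2") || return 1
  [ "$pv" -ge "$pvc" ]
}

pv_fits 2Gi 1Gi && echo "pv-data can satisfy pvc-data"
```

On the cluster itself, the STATUS and VOLUME columns of k -n storage get pvc pvc-data show whether the claim bound and to which PV.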

Task 13 (6 points): Storage

Create a StorageClass fast for dynamic provisioning (a StorageClass is cluster-scoped, so it takes no namespace). The provisioner is rancher.io/local-path, volumeBindingMode is WaitForFirstConsumer, and allowVolumeExpansion is true. Then, in namespace storage, create a 3Gi PVC pvc-fast that uses this StorageClass.

Solution

Define the StorageClass and PVC.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-fast
  namespace: storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 3Gi

k apply -f sc.yaml
k -n storage get pvc pvc-fast

Explanation: WaitForFirstConsumer defers volume creation until a Pod that uses the PVC is scheduled, so a PVC that stays Pending on its own is normal. allowVolumeExpansion: true is what lets you grow the claim later by raising the PVC’s resources.requests.storage, provided the provisioner supports expansion. If you don’t specify storageClassName on a PVC, the cluster’s default StorageClass is used, if one is defined.
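To watch the claim leave Pending, you need a Pod that consumes it. This is a minimal sketch that prints such a Pod; the Pod name fast-user, the container name, and the mount path /data are hypothetical values that the task does not specify.

```shell
#!/usr/bin/env bash
# consumer_manifest POD PVC NAMESPACE
# Prints a minimal Pod that mounts the PVC. Scheduling this Pod is what lets
# a WaitForFirstConsumer claim get its volume.
consumer_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
  namespace: $3
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: $2
EOF
}

# Example:
#   consumer_manifest fast-user pvc-fast storage | kubectl apply -f -
#   kubectl -n storage get pvc pvc-fast   # should move from Pending to Bound
```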

Task 14 (7 points): Troubleshooting

The worker node node01 in context cluster1 is in NotReady state. Diagnose the cause and bring the node back to Ready.

Solution

First check the node status and conditions.

k get nodes
k describe node node01

Connect to the node and look at the kubelet status and logs.

ssh node01
sudo -i
systemctl status kubelet
journalctl -u kubelet -e

If kubelet is stopped, restart it; if it’s a config error, fix the file the logs point to and then restart.

systemctl enable --now kubelet
systemctl restart kubelet

Explanation: The most common causes of NotReady are that the kubelet service died (systemctl restart kubelet), a kubeconfig/CA path error, or disk/memory pressure. The Conditions in describe node and journalctl -u kubelet pin down the cause, and if kubelet is inactive at boot, enable --now turns on automatic start at boot too. The key is recognizing that this is not a control plane problem but something you must look at on the node itself.

Task 15 (7 points): Troubleshooting

On the control plane of context cluster2, kubectl fails with connection refused. The apiserver static Pod manifest has been edited incorrectly. Find the cause, fix it, and bring the apiserver back to normal.

Solution

Connect to the control plane node and look at the apiserver container status.

ssh cluster2-controlplane
sudo -i

crictl ps -a | grep kube-apiserver
crictl logs <apiserver-container-id>

Review the static Pod manifest and fix the wrong field (for example, a typo’d port, a wrong certificate path, or broken indentation).

vim /etc/kubernetes/manifests/kube-apiserver.yaml

Wait for kubelet to detect the manifest change and bring the apiserver Pod back up, then verify.

crictl ps | grep kube-apiserver
k get nodes

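Explanation: kubectl only works once the apiserver is up, so in this state the key is to read the container logs directly with crictl instead of kubectl. If the manifest is broken badly enough that kubelet cannot parse it, no container is created at all and crictl shows nothing; in that case journalctl -u kubelet names the offending file. kubelet recreates a static Pod automatically when its manifest file changes, and briefly moving the file out of the directory and back forces a restart. A typo in the --etcd-servers address or in a certificate path is a common cause.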
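The move-out-and-back restart can be sketched as a helper. This is a minimal sketch: the 20-second default matches kubelet's default file check interval, and the function name bounce_static_pod and the temporary holding directory are hypothetical.

```shell
#!/usr/bin/env bash
# bounce_static_pod MANIFEST [WAIT_SECONDS]
# Moves a static Pod manifest out of kubelet's watch directory and back,
# which makes kubelet remove and then recreate the Pod.
bounce_static_pod() {
  local manifest=$1 wait=${2:-20} hold
  hold=$(mktemp -d) || return 1
  mv "$manifest" "$hold/" || return 1
  sleep "$wait"
  mv "$hold/$(basename "$manifest")" "$manifest"
}

# Example on the control plane node:
#   bounce_static_pod /etc/kubernetes/manifests/kube-apiserver.yaml
#   crictl ps | grep kube-apiserver
```

Holding the file outside the manifests directory matters, because kubelet would otherwise try to run the copy as a second Pod.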

Task 16 (7 points): Troubleshooting

Requests sent to the Service frontend in namespace apps are not responding. The Pods are all Running, but the Service’s endpoints are empty. Diagnose the cause and fix it so traffic reaches the Pods.

Solution

Compare the Service, the endpoints, and the Pod labels.

k -n apps get svc frontend -o wide
k -n apps get endpoints frontend
k -n apps describe svc frontend
k -n apps get pods --show-labels

If the Service’s selector is out of step with the Pods’ labels, fix the selector to match the Pod labels. The app=frontend value below is an example; use whatever --show-labels reports.

k -n apps patch svc frontend \
  -p '{"spec":{"selector":{"app":"frontend"}}}'
k -n apps get endpoints frontend

Explanation: Empty endpoints are a sign that the Service’s selector matches no Pod’s labels. Even when the Pods are Running, they are not registered as endpoints while the selector is out of step, so traffic cannot reach them. kubectl patch merges into the existing selector, so a stale key that the Pods do not carry must be removed as well (set it to null in the patch, or use k edit). After the fix, verify that get endpoints fills in with Pod IPs, and check for port name or number mismatches at the same time.
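The comparison you do by eye between the SELECTOR column and the Pod labels can be sketched as a check. This is a minimal sketch that covers the equality-based selectors a Service uses, with both arguments in the comma-separated key=value form that kubectl prints; the function name selector_matches is hypothetical.

```shell
#!/usr/bin/env bash
# selector_matches "k1=v1,k2=v2" "label1=v,label2=v,..."
# Succeeds only if every selector pair appears among the Pod's labels.
selector_matches() {
  local pair
  local -a pairs
  IFS=, read -r -a pairs <<<"$1"
  for pair in "${pairs[@]}"; do
    case ",$2," in
      *",$pair,"*) ;;
      *) echo "no Pod label matches $pair" >&2; return 1 ;;
    esac
  done
}

selector_matches "app=frontend" "app=frontend,tier=web" && echo "selector ok"
selector_matches "app=front-end" "app=frontend,tier=web" || echo "selector mismatch"
```

Every selector pair must match, which is why a single stale key is enough to leave the endpoints empty.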

Task 17 (7 points): Troubleshooting

The Pod report in namespace apps is in CrashLoopBackOff state. Diagnose the cause, save the previous logs of the most recently terminated container to the file /tmp/report.log, and if the cause is out of memory (OOMKilled), raise the memory limit to 256Mi to bring it back to normal.

Solution

Diagnose the state and the termination cause.

k -n apps describe pod report
k -n apps logs report --previous > /tmp/report.log

If Last State is OOMKilled, raise the memory limit. A Pod’s resources can’t be edited in place, so pull the manifest, fix it, and recreate.

k -n apps get pod report -o yaml > report.yaml

    resources:
      limits:
        memory: "256Mi"
      requests:
        memory: "128Mi"

k -n apps delete pod report
k apply -f report.yaml

Explanation: CrashLoopBackOff means the container keeps terminating, so the current log may be empty. Use --previous (-p) to pull the logs of the container that terminated just before. If the Last State Reason in describe is OOMKilled, an insufficient memory limit is the cause. A running Pod’s resources are immutable by default, so you must edit the manifest, then delete and recreate the Pod. If the Pod is managed by a Deployment, fix the Deployment’s template instead.
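Reading the termination reason without scrolling through describe can be sketched with a jsonpath query. This is a minimal sketch that looks only at the first container of the Pod; the function name last_exit_reason is hypothetical.

```shell
#!/usr/bin/env bash
# last_exit_reason NAMESPACE POD
# Prints why the Pod's first container last terminated
# (for example OOMKilled or Error).
last_exit_reason() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Example:
#   if [ "$(last_exit_reason apps report)" = "OOMKilled" ]; then
#     kubectl -n apps get pod report -o yaml > report.yaml   # then raise limits.memory
#   fi
```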

Scoring criteria

Grade by summing each task’s points. The total is 118, and the 66% pass line is 78 points or higher.

Domain	Tasks (points)	Subtotal
Cluster Architecture, Installation, Configuration	1 (8), 2 (8), 3 (8), 4 (8), 5 (6)	38
Workloads and Scheduling	6 (6), 7 (7), 8 (7)	20
Services and Networking	9 (6), 10 (6), 11 (6)	18
Storage	12 (8), 13 (6)	14
Troubleshooting	14 (7), 15 (7), 16 (7), 17 (7)	28
Total	17 tasks	118

Grading is result-based, just like the real exam. It looks not at how you typed the commands but at whether the resources you created and the cluster state you recovered match the requirements. Even within a single task, partial credit is split by item (labels, ports, paths, fields), so when you are stuck on one item, completing the parts you can still earns points.
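Self-grading can be sketched as a few lines of shell arithmetic. This is a minimal sketch that takes the point values from the task headings and rounds the 66% line up to a whole point; the function name score is hypothetical.

```shell
#!/usr/bin/env bash
# Point values per task, in task order 1..17, as listed in the task headings.
points=(8 8 8 8 6 6 7 7 6 6 6 8 6 7 7 7 7)

total=0
for p in "${points[@]}"; do
  total=$((total + p))
done

# Pass mark: 66% of the total, rounded up to a whole point.
pass=$(( (total * 66 + 99) / 100 ))

# score TASK...: sums the points of the tasks you solved in full.
score() {
  local sum=0 t
  for t in "$@"; do
    sum=$((sum + points[t - 1]))
  done
  echo "$sum"
}

echo "total=$total pass=$pass"
# total=118 pass=78
```

For example, score 1 2 3 4 9 12 14 15 16 17 adds up the tasks you list, and you compare the result with pass. Partial credit within a task has to be added by hand.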

Reviewing weak domains

After grading, go back to the corresponding post in the table below for any low-scoring domain and review it.

Domain	Related tasks	Posts to review
Cluster Architecture, Installation, Configuration	1, 2, 3, 4, 5	#6, #7, #8, #9
Workloads and Scheduling	6, 7, 8	#11, #13, #14
Services and Networking	9, 10, 11	#18, #19, #20
Storage	12, 13	#16, #17
Troubleshooting	14, 15, 16, 17	#22, #23, #24, #25

If you ran short on time on a particular task, the issue may be hand speed rather than domain knowledge. In that case, re-read #1 setup and #26 time management, and solve the same 17 tasks once more against the clock. Once etcd backup/restore and node troubleshooting are in your hands, the time per task drops noticeably.

Closing the series

Starting from the exam environment setup in #1, we passed through every CKA domain across 27 posts: control plane, nodes, kubeadm install, HA, upgrades, etcd backup/restore, certificates, RBAC, workloads, scheduling, resources, storage, Service, Ingress, NetworkPolicy, and the four flavors of troubleshooting. If you cleared the 66% line on this mock, you have built hands that can clear the pass line in the real exam room too. Congratulations.

If CKA was the hands-on exam for the cluster operator, the next step is CKS (Certified Kubernetes Security Specialist), which keeps that cluster secure. The feel for etcd, certificates, RBAC, and troubleshooting you built here is a direct stepping stone into it, so carry the momentum of passing straight into the next hands-on exam.