Certified Kubernetes Security Specialist (CKS) #7: seccomp Profiles
In #6 AppArmor Profiles, we bundled which files a container can access and which capabilities it can use into a profile. The counterpart tool in the same System Hardening domain is seccomp. Where AppArmor looks at files and capabilities, seccomp filters the system calls themselves, that is, the requests a container makes to the kernel. In this post we cover the concept of seccomp and its three profile types, how to apply it to a Pod, how to load a custom profile onto the node and reference it, and how to verify that blocking actually works.
What is seccomp #
seccomp (secure computing mode) is a Linux kernel feature that restricts the system calls (syscalls) a process can make. A system call is the only path through which a user-space process asks the kernel to do work. Opening a file, creating a network socket, spawning a new process, loading a kernel module — every one of these happens through a system call. Linux has over 300 system calls, and most containers use only a tiny fraction of them.
The problem is that once an attacker takes over a container, they can freely use the rest of the system calls. System calls like mount, keyctl, unshare, and bpf can become footholds for privilege escalation and container escape. seccomp blocks the system calls a container does not use ahead of time, narrowing the attack surface at the system-call level.
A seccomp profile is a JSON document that sets a default action (defaultAction) and lists the system calls that are exceptions to it. The most common pattern is “block by default, but allow only known-safe system calls.”
The difference from AppArmor #
seccomp and AppArmor are both System Hardening tools, but they block at different layers.
| Item | seccomp | AppArmor |
|---|---|---|
| Target | System calls (syscalls) | File paths, capabilities, network |
| Question | “Should I allow this system call?” | “Should I read or write this file?” |
| Definition location | JSON profile | Text profile (/etc/apparmor.d/) |
| Apply key | securityContext.seccompProfile | annotation or securityContext.appArmorProfile |
| Apply scope | Pod or container | Container (annotation); Pod or container (securityContext.appArmorProfile) |
The two are complementary, not competing. Block dangerous system calls with seccomp and bundle file and capability access with AppArmor, and your defenses stack layer upon layer. The exam covers the two tools separately, but in practice applying them together is the standard.
The three profile types #
Kubernetes’s seccompProfile.type takes three values.
| type | Meaning | Notes |
|---|---|---|
| RuntimeDefault | Apply the default profile the container runtime provides | A sensible default vetted by containerd / CRI-O. Recommended |
| Localhost | Reference a custom profile file loaded onto the node | Specify the file path with localhostProfile |
| Unconfined | No seccomp applied. All system calls allowed | Effectively defenseless. Avoid it |
Make RuntimeDefault your default #
The first principle to memorize is that RuntimeDefault is your default. Runtimes like containerd and CRI-O ship with a vetted default profile that blocks dangerous system calls a container workload almost never uses, such as mount, reboot, and keyctl, without breaking ordinary applications.
One thing to watch for is that Kubernetes does not apply RuntimeDefault on its own. Unless the node is configured otherwise, a Pod that does not specify seccomp runs as Unconfined, with no system-call restriction at all. To fix this, turn on the kubelet’s --seccomp-default flag (seccompDefault: true in the kubelet configuration file; releases before v1.27 also required the SeccompDefault feature gate). With it on, RuntimeDefault is applied automatically to every Pod that does not specify a profile. When the exam gives you a task like “enforce a default seccomp on every Pod on the node,” this is the flag to recall.
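The fallback rule in this paragraph can be modeled in a few lines of Python. This is a minimal sketch, not kubelet code: the function name effective_type and its parameters are hypothetical, and it only captures the three cases described above.

```python
from typing import Optional


def effective_type(declared_type: Optional[str], seccomp_default: bool) -> str:
    """Model which seccomp profile type a container ends up with.

    declared_type: the seccompProfile.type from the manifest, or None if unset.
    seccomp_default: whether the kubelet runs with --seccomp-default enabled.
    """
    if declared_type is not None:
        # An explicit profile in the manifest is used as written.
        return declared_type
    # Nothing declared: the node-level setting decides the fallback.
    return "RuntimeDefault" if seccomp_default else "Unconfined"


print(effective_type(None, seccomp_default=False))        # Unconfined
print(effective_type(None, seccomp_default=True))         # RuntimeDefault
print(effective_type("Localhost", seccomp_default=True))  # Localhost
```

The first case is the one to worry about: nothing in the manifest and nothing on the node means no filtering at all.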
The securityContext.seccompProfile setting #
A seccomp profile is specified in two places: at the Pod level and at the container level. Placed at the Pod level it applies to all containers; placed at the container level it applies only to that container and overrides the Pod-level setting.
Applying RuntimeDefault to the whole Pod #

    apiVersion: v1
    kind: Pod
    metadata:
      name: secure-app
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        image: nginx:1.27

A seccompProfile placed under spec.securityContext applies RuntimeDefault to every container in this Pod. Most exam tasks end in this form.
Applying at the container level #

    apiVersion: v1
    kind: Pod
    metadata:
      name: mixed-app
    spec:
      containers:
      - name: app
        image: nginx:1.27
        securityContext:
          seccompProfile:
            type: RuntimeDefault
      - name: sidecar
        image: busybox:1.36
        command: ["sleep", "3600"]
        securityContext:
          seccompProfile:
            type: Localhost
            localhostProfile: profiles/audit.json

This is an example of using a different profile per container within the same Pod. app references the runtime default, while sidecar references a custom profile loaded onto the node. The container-level setting takes precedence over the Pod-level setting.
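The precedence rule can be sketched as a small Python function that walks a Pod manifest (already parsed into a dict) and reports which profile each container ends up with. This is an illustration under assumptions: the function name effective_seccomp is hypothetical, the sample Pod is a variation that sets a Pod-level default plus one container override, and only spec.containers is considered.

```python
def effective_seccomp(pod: dict) -> dict:
    """Map each container name to the seccompProfile that applies to it.

    A container-level securityContext.seccompProfile wins; otherwise the
    Pod-level one is inherited; None means neither level sets a profile.
    """
    spec = pod.get("spec", {})
    pod_profile = spec.get("securityContext", {}).get("seccompProfile")
    result = {}
    for container in spec.get("containers", []):
        own = container.get("securityContext", {}).get("seccompProfile")
        result[container["name"]] = own if own is not None else pod_profile
    return result


# Hypothetical Pod: RuntimeDefault for the whole Pod, overridden by sidecar.
pod = {
    "spec": {
        "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [
            {"name": "app", "image": "nginx:1.27"},
            {
                "name": "sidecar",
                "image": "busybox:1.36",
                "securityContext": {
                    "seccompProfile": {
                        "type": "Localhost",
                        "localhostProfile": "profiles/audit.json",
                    }
                },
            },
        ],
    }
}

for name, profile in effective_seccomp(pod).items():
    print(name, profile)
```

Here app inherits the Pod-level RuntimeDefault, while sidecar keeps its own Localhost profile.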
Writing a custom profile #
When the runtime default is not enough, you write a JSON profile yourself. A custom profile must be loaded into a fixed directory on the node to be referenced with the Localhost type.
The profile directory #
The kubelet looks for custom seccomp profiles at the following path on the node.

    /var/lib/kubelet/seccomp/

The path you write in localhostProfile is relative to this directory. By convention, profiles are collected under profiles/. For example, if you place a file at the following location,

    /var/lib/kubelet/seccomp/profiles/audit.json

the manifest references it as localhostProfile: profiles/audit.json. An absolute path or a path outside the directory is not allowed.
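The path rule can be sketched in Python as follows. This is a hedged illustration rather than the kubelet's actual validation: the function name resolve_localhost_profile is hypothetical, and the root directory is the default one named in the text (a kubelet started with a different root directory would use a different base).

```python
from pathlib import PurePosixPath

# Default profile directory from the text; assumed, not read from the node.
SECCOMP_ROOT = PurePosixPath("/var/lib/kubelet/seccomp")


def resolve_localhost_profile(localhost_profile: str) -> PurePosixPath:
    """Resolve a localhostProfile value to the file expected on the node."""
    rel = PurePosixPath(localhost_profile)
    if rel.is_absolute():
        raise ValueError("localhostProfile must be a relative path")
    if ".." in rel.parts:
        raise ValueError("localhostProfile must stay inside the seccomp directory")
    return SECCOMP_ROOT / rel


print(resolve_localhost_profile("profiles/audit.json"))
# /var/lib/kubelet/seccomp/profiles/audit.json
```

The resolved path is the file to look for on the node when a Localhost Pod refuses to start.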
The profile JSON structure #
The core of a custom profile is two fields: defaultAction and syscalls.

    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
          "names": [
            "accept4",
            "bind",
            "listen",
            "read",
            "write",
            "close",
            "exit_group"
          ],
          "action": "SCMP_ACT_ALLOW"
        }
      ]
    }

Since defaultAction is SCMP_ACT_ERRNO, every system call not listed is blocked, and calling it returns an error (EPERM). Only the system calls placed in the names of the syscalls block are allowed with SCMP_ACT_ALLOW. This “block by default + allow explicitly” approach is the safest whitelist pattern. The list here is deliberately short for illustration; a real workload needs many more system calls (execve, mmap, and futex among them) just to start.
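To make the “default action plus exceptions” logic concrete, here is a minimal Python model of how such a profile maps a system-call name to an action. It is a simplification under stated assumptions: the real filter runs in the kernel, and profiles can also match on arguments, which this sketch ignores. The function name action_for is hypothetical.

```python
import json

# The example profile from this section, without the architectures list.
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["accept4", "bind", "listen", "read", "write", "close", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
"""


def action_for(profile: dict, syscall: str) -> str:
    """Return the action the profile assigns to a syscall name."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # No rule names this syscall, so the default action applies.
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)
print(action_for(profile, "read"))   # SCMP_ACT_ALLOW
print(action_for(profile, "mkdir"))  # SCMP_ACT_ERRNO
```

Anything absent from names, such as mkdir or mount, falls through to defaultAction and is rejected.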
The main action values are as follows.
| Action | Behavior |
|---|---|
| SCMP_ACT_ERRNO | Block the call. Return an error code |
| SCMP_ACT_ALLOW | Allow the call |
| SCMP_ACT_LOG | Allow but log it (for auditing) |
| SCMP_ACT_KILL | Kill the calling thread on the call (SCMP_ACT_KILL_PROCESS kills the whole process) |
To build an audit-purpose profile, set defaultAction to SCMP_ACT_LOG, observe which system calls the workload actually uses, and then build the allow list from those results.
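The second half of that workflow, turning observed system calls into an allow list, can be sketched in Python. This is an illustration only: the function name build_allowlist_profile and the sample names are hypothetical, and collecting the logged calls and translating them into syscall names is left out.

```python
import json
from typing import Iterable


def build_allowlist_profile(observed: Iterable[str]) -> dict:
    """Build a block-by-default profile that allows only the observed syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
        "syscalls": [
            {
                # De-duplicate and sort so the output is stable and reviewable.
                "names": sorted(set(observed)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


# Hypothetical names gathered while running under SCMP_ACT_LOG.
observed = ["read", "write", "close", "accept4", "read"]
print(json.dumps(build_allowlist_profile(observed), indent=2))
```

The printed JSON has the same shape as the profile shown earlier and could be saved under the profile directory on the node.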
A Pod that references the custom profile #

    apiVersion: v1
    kind: Pod
    metadata:
      name: custom-seccomp
    spec:
      securityContext:
        seccompProfile:
          type: Localhost
          localhostProfile: profiles/audit.json
      containers:
      - name: app
        image: nginx:1.27

Set type to Localhost and write a directory-relative path in localhostProfile. If that file is not on the node, the API server still accepts the Pod, but the container fails to start and the Pod reports a container-creation error. When a custom-profile task comes up in the exam, first check whether the file is loaded at the correct path.
Verification #
The procedure for confirming that the seccomp you applied actually blocks system calls is the heart of verification.
Confirming the profile was applied #

    kubectl get pod secure-app -o jsonpath='{.spec.securityContext.seccompProfile}'

Confirm that the profile type went into the Pod spec. For the container level, look at .spec.containers[0].securityContext.seccompProfile.
Testing a blocked system call #
In a profile whose defaultAction is block, deliberately call a system call not on the allow list and see whether it is blocked. For example, if the profile does not allow mkdir, directory creation should fail.

    kubectl exec custom-seccomp -- mkdir /tmp/test
    mkdir: can't create directory '/tmp/test': Operation not permitted

Operation not permitted is the signal that the system call was blocked by SCMP_ACT_ERRNO. Conversely, in a profile that allows ordinary operations like RuntimeDefault, a plain command should work normally. Checking both sides finishes the task: does what should be blocked get blocked, and does what should run still run?
Checking whether it fell back to Unconfined #
A frequent mistake is thinking you specified a profile when the container actually runs as Unconfined. If seccompProfile is empty in the Pod spec and the kubelet’s --seccomp-default is also off, the container comes up with no system-call restriction. If the jsonpath query above returns empty, the profile was not applied, so review the manifest again.
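A complementary check, added here as a suggestion rather than part of the procedure above, is the Seccomp field in /proc/1/status inside the container, which the kernel reports as 0 (disabled), 1 (strict), or 2 (filter). The Python sketch below parses that field from text such as the output of kubectl exec secure-app -- cat /proc/1/status (assuming the image provides cat). The function name seccomp_mode and the sample text are hypothetical.

```python
# Meaning of the Seccomp field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Extract the seccomp mode from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so "Seccomp_filters:" is not picked up.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    return "unknown"


# Hypothetical excerpt of a status file for a confined container.
sample = "Name:\tnginx\nSeccomp:\t2\nSeccomp_filters:\t1\n"
print(seccomp_mode(sample))  # filter
```

A result of disabled for a container you believed was confined points to the Unconfined fallback described above.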
Exam points #
- RuntimeDefault is the recommended default. Most “apply seccomp to the Pod” tasks end with the single line securityContext.seccompProfile.type: RuntimeDefault.
- A profile is specified in two places, the Pod level (spec.securityContext) and the container level (spec.containers[].securityContext), and the container level takes precedence.
- Load a custom profile into the node’s /var/lib/kubelet/seccomp/ directory, and write a relative path based on this directory in localhostProfile.
- The whitelist pattern for a JSON profile is the combination of defaultAction: SCMP_ACT_ERRNO (block by default) + SCMP_ACT_ALLOW for the allowed system calls.
- Unconfined means no seccomp applied, so it is a value to avoid. Remember that not specifying a profile can fall back to Unconfined.
- Node-wide enforcement is handled with the kubelet’s --seccomp-default flag (releases before v1.27 also required the SeccompDefault feature gate).
- For verification, confirm whether it is applied with kubectl get pod -o jsonpath, then finish by calling a system call that should be blocked and checking that Operation not permitted comes out.
Wrap-up #
What this post locked in:
- seccomp is a Linux kernel feature that filters the system calls a container throws at the kernel, narrowing the attack surface.
- The profile types are three: RuntimeDefault (runtime default, recommended), Localhost (references a custom file on the node), and Unconfined (not applied).
- You apply it with securityContext.seccompProfile, at both the Pod level and the container level.
- A custom profile is loaded into /var/lib/kubelet/seccomp/ and referenced with the Localhost type, defining the allow list with defaultAction and syscalls.
- If seccomp looks at system calls, AppArmor looks at files and capabilities. Stack the two together and the defense grows thicker.
Next — kernel hardening #
We bundled system calls with seccomp and files and capabilities with AppArmor. The last piece of the System Hardening domain is reducing the kernel privileges handed to the container itself.
In #8 kernel hardening, capabilities, /proc protection, we’ll build firsthand the kernel-level hardening settings — securityContext.capabilities to drop Linux capabilities to the minimum, allowPrivilegeEscalation and privileged to block privilege escalation, and /proc masking and readOnlyRootFilesystem.