K8s Advanced #3: Admission Controller — OPA Gatekeeper / Kyverno
This is the third post in the K8s Advanced series. Intermediate #7 covered how RBAC controls K8s API permissions, NetworkPolicy controls Pod-to-Pod traffic, and ResourceQuota controls resource totals. This post covers one more layer of policy on top of those: policies that enforce the shape of the manifest itself. Rules like “containers without limits cannot be created,” “images must come only from our ECR registry,” and “every workload must have an owner label” can’t be expressed by RBAC alone. These rules belong in the admission stage of the K8s API server, and the two tools that plug policy engines into that stage are OPA Gatekeeper and Kyverno.
The K8s Advanced series has six posts.
- #1 CNI in depth — Calico / Cilium / eBPF
- #2 RBAC / ServiceAccount in depth — Aggregated ClusterRole / Impersonation / IRSA / Workload Identity
- #3 Admission Controller — OPA Gatekeeper / Kyverno ← this post
- #4 CRD and the Operator pattern — controller-runtime
- #5 Observability — Prometheus / Grafana / Loki / OpenTelemetry
- #6 GitOps — ArgoCD / Flux
Admission stage — right before manifests enter etcd #
From the moment you type kubectl apply -f my-pod.yaml to when that manifest is stored in etcd, the path is not a straight line. The K8s API server passes the request through five stages in order.
1. Authentication — who called
2. Authorization — RBAC check. Can the caller use this verb on this resource
3. Mutating Admission — mutate the manifest (defaulting, sidecar injection, etc.)
4. Validating Admission — does the manifest satisfy policy
5. etcd storage
Stages 3 and 4 are the Admission Controller topic of this post. Even requests that pass authentication and RBAC can be rejected at this stage, and the manifest itself can be mutated before storage.
Mutating vs Validating #
The difference between the two is clear.
- Mutating Admission — mutates the manifest. Examples: auto-injecting sidecar containers into all Pods, auto-filling missing labels, applying defaults. Multiple mutating controllers can apply to the same object in sequence.
- Validating Admission — only inspects the manifest. Pass or reject. No mutation occurs. Importantly, it sees the final manifest after all mutations are done.
The order is always mutating → validating. Since the manifest with all mutations done is passed to the inspection stage, validating rules only need to evaluate “does the final form satisfy policy.”
Built-in Admission Controllers #
Several admission controllers are already compiled into the K8s API server. The ones frequently encountered in operational clusters:
| Controller | Kind | Role |
|---|---|---|
| NamespaceLifecycle | Validating | Block object creation in namespaces being deleted |
| LimitRanger | Mutating + Validating | Apply LimitRange defaults + reject violations |
| ResourceQuota | Validating | Reject when ResourceQuota total is exceeded |
| ServiceAccount | Mutating | Auto-attach default ServiceAccount to Pods |
| PodSecurity | Validating | Enforce Pod Security Standards (stable since 1.25) |
| DefaultStorageClass | Mutating | Auto-fill default SC into PVCs |
ResourceQuota and LimitRange, covered in Intermediate #7, actually operate at this admission stage. When a manifest exceeds the ResourceQuota total, the ResourceQuota admission controller rejects it at stage 4. Built-in controllers are enabled via the --enable-admission-plugins API server flag and disabled via --disable-admission-plugins.
Webhook — inserting into the admission stage from outside #
Built-in controllers are embedded in K8s code, so users can’t change their definitions. When an operations team wants to insert its own policy into the admission stage, it uses a webhook. There are two kinds.
- MutatingWebhookConfiguration: sends the manifest to an external HTTP service, and that service returns the mutated manifest.
- ValidatingWebhookConfiguration: sends the manifest to an external HTTP service, and that service returns allow/deny.
The K8s API server uses the result of the webhook call to pass, mutate, or reject the request. Both OPA Gatekeeper and Kyverno are policy engines layered on top of this webhook mechanism. They don’t add new admission types to K8s; they are tools that make the standard webhook easier to use.
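To make the mechanism concrete, here is a minimal sketch of a ValidatingWebhookConfiguration. The object name, webhook name, Service name, namespace, and path are hypothetical placeholders, and the caBundle is left as a comment. Gatekeeper and Kyverno install and manage their own equivalents of this object, so it is rarely written by hand.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook          # hypothetical name
webhooks:
  - name: validate.policy.example.com   # hypothetical; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                 # reject requests if the webhook is unreachable
    timeoutSeconds: 5
    clientConfig:
      service:
        namespace: policy-system        # hypothetical
        name: policy-webhook            # hypothetical
        path: /validate                 # hypothetical
        port: 443
      # caBundle: <base64-encoded CA certificate that signed the webhook's TLS cert>
    rules:
      # Which requests the API server forwards to the webhook.
      - operations: ["CREATE", "UPDATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        scope: "Namespaced"
```

The rules list is what keeps the webhook off the path of unrelated requests: only Pod CREATE and UPDATE calls are forwarded in this sketch.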
OPA Gatekeeper — policy expressed in Rego #
OPA (Open Policy Agent) is a general-purpose policy engine used outside K8s as well. Policies are written in its own language Rego, and the OPA engine evaluates those policies. Gatekeeper is a tool that wraps OPA as a K8s admission webhook.
Gatekeeper’s core objects are:
- ConstraintTemplate: the blueprint of a policy written in Rego. “Defines this kind of policy”
- Constraint: an instance of a ConstraintTemplate. “Apply this policy to which resources with which parameters”
This separation is Gatekeeper’s model. The “shape” of the policy is written once in ConstraintTemplate, and parameterized to be instantiated multiple times.
ConstraintTemplate example — enforce required labels #
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels
          # Default to an empty object so that objects with no labels at all
          # are still evaluated instead of silently passing.
          provided := object.get(input.review.object.metadata, "labels", {})
          missing := required[_]
          not provided[missing]
          msg := sprintf("Missing required label: %v", [missing])
        }
The code inside the rego block is the actual policy. input.review.object is the manifest at the admission stage, and input.parameters are the parameters passed from the Constraint. If violation[...] is not empty, the manifest is rejected. Applying the ConstraintTemplate creates a new CRD called K8sRequiredLabels in K8s.
Constraint example — instance of the template above #
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespace-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner", "team"]
Once this Constraint is applied, any newly created Namespace missing the owner or team label is rejected at the admission stage.
$ kubectl create ns test
Error from server (Forbidden): admission webhook "validation.gatekeeper.sh" denied the request:
[namespace-must-have-owner] Missing required label: owner
[namespace-must-have-owner] Missing required label: team
Gatekeeper’s auxiliary features #
Beyond policy evaluation, Gatekeeper has a few operations-friendly features.
- dry-run / audit mode: applying a Constraint with enforcementAction: dryrun doesn’t reject requests but only records violations. Used to measure the scope of impact before enforcing a policy in production.
- Config object to limit evaluation scope: system namespaces like kube-system can be excluded from evaluation.
- External data reference: Constraints can reference OPA’s data object to evaluate policies against other K8s objects or external data.
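The first two features can be sketched as follows. This reuses the namespace-must-have-owner Constraint from earlier with enforcementAction set to dryrun, and adds a Config object that excludes kube-system. The gatekeeper-system namespace assumes a default installation; adjust it if Gatekeeper runs elsewhere.

```yaml
# Same Constraint as before, but violations are only recorded, not rejected.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespace-must-have-owner
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner", "team"]
---
# Config object: exclude kube-system from all Gatekeeper processes.
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system   # assumes the default install namespace
spec:
  match:
    - excludedNamespaces: ["kube-system"]
      processes: ["*"]
```

Recorded violations show up in the Constraint's status, so the scope of impact can be inspected with kubectl before switching enforcementAction to deny.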
Kyverno — policy expressed in YAML #
Kyverno is a tool in the same category as OPA Gatekeeper, but with a different approach. Writing policy in YAML, without learning a new language, is Kyverno’s biggest distinction. K8s users are already familiar with YAML, so the barrier to adopting policies is low.
Kyverno’s three actions #
A Kyverno policy does one (or more) of three things.
- validate — check that the manifest satisfies the rule (Validating Admission)
- mutate — mutate the manifest (Mutating Admission)
- generate — automatically create another object (Kyverno-only feature)
generate isn’t an action of the admission stage itself, but it expresses patterns like “when a Namespace is created, automatically generate a default NetworkPolicy inside it” as a single policy.
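The Namespace-to-NetworkPolicy pattern mentioned above could look roughly like this. It is a minimal sketch: the policy and rule names are hypothetical, and it assumes Kyverno has been granted RBAC permission to create NetworkPolicy objects.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-networkpolicy        # hypothetical name
spec:
  rules:
    - name: default-deny-ingress         # hypothetical name
      match:
        any:
          - resources:
              kinds: ["Namespace"]
      generate:
        # The object to create whenever a matching Namespace appears.
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-ingress
        namespace: "{{request.object.metadata.name}}"
        synchronize: true                # keep the generated object in sync with this policy
        data:
          spec:
            podSelector: {}              # selects every Pod in the namespace
            policyTypes: ["Ingress"]
```

With synchronize enabled, edits to the policy propagate to the generated NetworkPolicy, which keeps the default consistent across namespaces.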
Validate example — reject containers without limits #
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-cpu-memory-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Pod must have CPU and memory limits."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
?* inside pattern means “any value is fine but it can’t be empty.” Applying this policy requires every container in every new Pod to set both limits.cpu and limits.memory. This enforces the resource model covered in Intermediate #4 at the admission level.
Mutate example — auto-add labels to all Pods #
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-labels
spec:
  rules:
    - name: add-managed-by
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet"]
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              managed-by: platform-team
Applying this policy adds the managed-by label to the metadata of every matching Deployment and StatefulSet (the objects named in match, not their Pod templates) at the admission stage, even when the manifest doesn’t carry it. This is a way to enforce label standards without changing a line of application code.
Gatekeeper vs Kyverno — which to use #
Comparing the two tools in one table.
| Dimension | OPA Gatekeeper | Kyverno |
|---|---|---|
| Policy language | Rego (must learn anew) | YAML |
| Expressiveness | Very high (full Rego query language) | Moderate (declarative pattern matching) |
| Learning curve | Steep | Low |
| Policy actions | validate, mutate (stable since Gatekeeper 3.10) | validate, mutate, generate, cleanup |
| Non-K8s policy | OPA itself usable beyond K8s | K8s-only |
| Policy library | Rich (gatekeeper-library) | Rich (kyverno/policies) |
The selection decision usually comes down to this.
- If you can absorb Rego’s learning burden and want to use the same policy engine beyond K8s — Gatekeeper is natural. Advantageous when carrying policy consistency across multiple systems in a large organization.
- If you want the K8s operations team to write and maintain policies directly, and the entry barrier to writing policies itself is the largest cost — Kyverno is faster. The learning cost difference in the first month of adoption is large.
Both tools have a solid track record at production scale. Unless you need Rego’s expressiveness from the start, a natural path is to consider Kyverno first and move to Gatekeeper when that expressiveness is truly needed.
Operational principles to lock in #
A few operational principles are worth locking in when adopting admission webhooks.
1. failurePolicy’s two choices — Fail vs Ignore #
This field decides how the API server behaves when the webhook can’t be called (timeout, network outage, policy engine Pod down).
- failurePolicy: Fail rejects requests when the webhook cannot respond. Policy is never bypassed, but the policy engine’s availability is tied to the cluster’s overall availability. If the policy engine goes down, new workloads cannot be created.
- failurePolicy: Ignore passes requests through when the webhook cannot respond. Availability is preserved but policy can be bypassed.
The practical approach is to mark critical policies as Fail and supplementary ones as Ignore. Running the policy engine itself with multiple replicas (2 or more) and protecting it with a PodDisruptionBudget is the baseline.
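The PodDisruptionBudget part of that baseline can be sketched like this. The name, namespace, and label selector are hypothetical; match them to the labels your policy engine's Pods actually carry. The tools' Helm charts may already offer an equivalent setting, so check there before writing this by hand.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: policy-engine-pdb        # hypothetical name
  namespace: policy-system       # hypothetical; use the namespace your engine runs in
spec:
  # Keep at least one webhook Pod running during voluntary disruptions
  # such as node drains.
  minAvailable: 1
  selector:
    matchLabels:
      app: policy-engine         # hypothetical; match your engine's actual Pod labels
```

A PodDisruptionBudget only guards against voluntary evictions, so it complements running multiple replicas rather than replacing it.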
2. Exclude system namespaces with namespaceSelector #
Namespaces where K8s’s own workloads run, like kube-system and kube-public, are usually excluded from policy evaluation. This is a safeguard against policy blocking the cluster’s own boot process.
namespaceSelector:
  matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system", "kube-public", "kube-node-lease"]
3. Gradual adoption with dry-run #
Applying a new policy in enforce mode directly to an operational cluster can reject existing workload updates one after another. The standard flow:
1. Apply in dry-run mode (Gatekeeper's dryrun, Kyverno's Audit)
2. Collect violation logs for a period → measure scope of impact
3. Clean up violating workloads first
4. Switch to enforce mode
Skipping this cycle leads to incidents where existing manifests are rejected en masse, halting GitOps sync. Even when the policy intent is sound, the rollout procedure must be gradual.
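On the Kyverno side, step 1 of this flow could look like the following sketch. It is the require-resource-limits policy from earlier with validationFailureAction set to Audit. The background setting is an addition here, on the assumption that you also want existing resources scanned.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Audit   # step 1: record violations, do not reject
  background: true                 # also scan resources that already exist
  rules:
    - name: require-cpu-memory-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Pod must have CPU and memory limits."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
```

In Audit mode, violations are reported through policy reports rather than rejections. Once the violating workloads are cleaned up, step 4 is a one-line change from Audit to Enforce.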
4. Webhook latency monitoring #
Admission webhooks are on the critical path of every manifest change. When the policy engine slows down, kubectl apply slows with it. Both Gatekeeper and Kyverno expose their own metrics, so tying P99 latency and rejection rate into the observability stack covered in #5 is the standard.
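As one possible starting point, the sketch below alerts on webhook P99 latency. It uses the API server's own apiserver_admission_webhook_admission_duration_seconds histogram rather than the engines' metrics, and it assumes the Prometheus Operator is installed (for the PrometheusRule kind). The names, namespace, 0.5s threshold, and 10m window are illustrative assumptions, not recommendations.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: admission-webhook-latency   # hypothetical name
  namespace: monitoring             # hypothetical namespace
spec:
  groups:
    - name: admission-webhooks
      rules:
        - alert: AdmissionWebhookSlow
          # P99 latency per webhook, as measured by the API server.
          expr: |
            histogram_quantile(0.99,
              sum by (le, name) (
                rate(apiserver_admission_webhook_admission_duration_seconds_bucket[5m])
              )
            ) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Admission webhook {{ $labels.name }} P99 latency above 0.5s"
```

Measuring from the API server side captures what kubectl users actually experience, including network time to the webhook, which the engine's own metrics would not show.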
Closing #
This post walked through the K8s API server’s admission stage and the policy engines layered on top of it. Of the five stages a manifest passes through before it is stored in etcd, mutating and validating admission are the entry points for policy, and webhooks let external policy engines plug in alongside the built-in controllers. The two standard external engines are OPA Gatekeeper and Kyverno, and the comparison came down to Gatekeeper’s expressiveness versus Kyverno’s low entry barrier. Finally, we covered four operational principles: failurePolicy, system namespace exclusion, dry-run adoption, and webhook latency monitoring. The next post covers extending the K8s API itself: defining new object kinds via CRD and operating those objects through Operator patterns built on controller-runtime.