K8s Advanced #4: CRD and the Operator Pattern — controller-runtime
The fourth post in the K8s Advanced series. The objects covered so far have all been standard resources K8s ships built in — Pod, Deployment, Service, ConfigMap, Secret, NetworkPolicy, etc. But the real appeal of the K8s API is being able to add domain objects on top. Domain concepts like a PostgreSQL cluster, a Redis primary-replica topology, or a Kafka broker group can become first-class objects inside K8s, queryable via kubectl get postgrescluster, declared via manifests, and operated automatically by controllers. The two axes of this extension are CustomResourceDefinition (defining new object kinds) and Operator (the controller that operates those objects).
The K8s Advanced series has six posts:
- #1 CNI in depth — Calico / Cilium / eBPF
- #2 RBAC / ServiceAccount in depth — Aggregated ClusterRole / Impersonation / IRSA / Workload Identity
- #3 Admission Controller — OPA Gatekeeper / Kyverno
- #4 CRD and the Operator pattern — controller-runtime ← this post
- #5 Observability — Prometheus / Grafana / Loki / OpenTelemetry
- #6 GitOps — ArgoCD / Flux
Two paths to K8s API extension #
K8s offers two paths for extending its API.
| Path | Characteristic |
|---|---|
| CRD (CustomResourceDefinition) | Register a new object kind via manifest. The K8s API server stores that object in etcd and treats it like a standard object. |
| Aggregation Layer | Run a separate API server and have the K8s API server delegate calls to it. More flexible but with high operational cost. |
The path overwhelmingly used in operational clusters is CRD. Aggregation Layer is rarely used outside of cases tightly coupled with K8s core, like metrics-server. The topic of this post is CRD.
CRD — defining a new object kind #
CRD is itself a kind of K8s object. Applying a single CRD registers a new object kind in the cluster from that moment on, and you can create and query that object via kubectl.
The simplest CRD example #
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.myteam.example.com
spec:
  group: myteam.example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
    shortNames: ["wg"]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: ["small", "medium", "large"]
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
              required: ["size"]
            status:
              type: object
              properties:
                phase:
                  type: string
      subresources:
        status: {}

After applying this CRD, the following manifest becomes valid in that cluster.
apiVersion: myteam.example.com/v1
kind: Widget
metadata:
  name: my-first-widget
  namespace: default
spec:
  size: medium
  replicas: 3

kubectl get widgets
kubectl get wg my-first-widget -o yaml

Key elements of CRD definition #
A few parts of the CRD manifest above matter from an operational perspective.
- scope (Namespaced vs Cluster): whether the object lives inside a namespace or is cluster-wide. This is hard to change once decided, so choose based on the domain’s granularity. ConfigMap and Pod are Namespaced; Node and StorageClass are cluster-scoped.
- schema.openAPIV3Schema: defines the object’s shape as an OpenAPI v3 schema. The K8s API server validates every create and update request against this schema and rejects objects that violate it. Constraints such as enum, minimum, and required are expressed here.
- subresources.status: the empty {} exposes the status field as its own subresource. This is the standard K8s pattern that lets controllers update only status without touching spec. This single line is very important.
- versions: a CRD can serve multiple versions simultaneously, and the one version with storage: true determines the format actually stored in etcd. Changing the storage version requires migrating the stored objects.
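To make the schema’s constraints concrete, here is a minimal sketch in plain Go of the checks the schema above implies for a Widget’s spec. The names WidgetSpec and validateWidgetSpec are hypothetical and exist only for illustration; the real validation is performed inside the API server from the OpenAPI schema, not by code like this.

```go
package main

import (
	"errors"
	"fmt"
)

// WidgetSpec mirrors the spec fields declared in the CRD schema.
// Replicas is a pointer because the schema does not list it as required.
type WidgetSpec struct {
	Size     string
	Replicas *int
}

// validateWidgetSpec applies the same constraints as the schema:
// size is required and must be one of the enum values, and replicas,
// when present, must be between 1 and 10.
// Simplification: an empty Size is treated as "field missing".
func validateWidgetSpec(s WidgetSpec) error {
	switch s.Size {
	case "small", "medium", "large":
		// allowed enum value
	case "":
		return errors.New("spec.size: required value")
	default:
		return fmt.Errorf("spec.size: unsupported value %q", s.Size)
	}
	if s.Replicas != nil && (*s.Replicas < 1 || *s.Replicas > 10) {
		return fmt.Errorf("spec.replicas: %d is outside [1, 10]", *s.Replicas)
	}
	return nil
}
```

With these rules, the my-first-widget manifest (size medium, replicas 3) passes, while a manifest with size huge or replicas 11 would be rejected before it ever reaches etcd.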
CRD alone isn’t enough — the role of Operator #
Just registering a CRD doesn’t give the object meaning. A Widget object sitting in etcd is just a manifest — no action takes place on its own. A controller that watches that object and actually creates and manages things is needed.
The K8s community calls this controller an Operator. An Operator is essentially a bundle of two parts:
- CRD — definition of the new domain object’s shape
- Custom Controller — code that looks at that object and runs a reconcile loop
The reason this model is powerful is that it extends K8s’s core paradigm — declarative desired state + controller’s reconcile loop — to the user’s domain. You simply declare in a manifest “I want a PostgreSQL cluster of 1 primary + 3 replicas,” and the Operator automatically creates and manages the StatefulSet, Service, PVC, ConfigMap, Secret, and backup CronJob.
Reconcile loop — the essence of a controller #
The core pattern of a K8s controller is an infinite loop.
loop forever:
    observed_state = query actual state from K8s API
    desired_state = read desired state from manifest
    if observed_state != desired_state:
        close the gap via K8s API calls
    sleep until next trigger

This simple loop is the common model of built-in controllers (Deployment, StatefulSet, Job, etc.) and user-defined Operators. Writing an Operator means deciding “how to implement this reconcile function for my domain object.”
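One pass of the pseudocode above can be sketched in plain Go. This is a toy model under stated assumptions: State is a hypothetical in-memory map from resource name to replica count, standing in for both the manifests and the cluster, and no K8s API is involved.

```go
package main

// State maps a resource name to its replica count. It stands in for both
// the desired state (from manifests) and the observed state (from the cluster).
type State map[string]int

// reconcile performs one pass of the loop: it compares desired and observed
// and mutates observed until the two match. It returns the number of
// corrective actions taken, so a call on an already converged state returns 0.
func reconcile(desired, observed State) int {
	actions := 0
	// Create or update anything that is missing or has drifted.
	for name, want := range desired {
		if got, ok := observed[name]; !ok || got != want {
			observed[name] = want
			actions++
		}
	}
	// Delete anything that exists but is no longer desired.
	for name := range observed {
		if _, ok := desired[name]; !ok {
			delete(observed, name)
			actions++
		}
	}
	return actions
}
```

The property worth noticing is idempotency: running reconcile a second time on the same inputs performs no actions. A real reconcile function should behave the same way, because it may be invoked repeatedly for the same object.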
controller-runtime — the standard skeleton for Operators #
It’s possible to write a reconcile loop by calling the K8s API client directly, but the boilerplate is enormous. controller-runtime is a Go library maintained under the Kubernetes SIGs organization (kubernetes-sigs) that standardizes the Operator skeleton. Kubebuilder and Operator SDK are higher-level tools that wrap controller-runtime.
The shape of a Reconciler #
A controller-runtime-based Operator nearly always has the following shape.
package controller

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	myteamv1 "myteam.example.com/api/v1"
)

// buildDeployment, needsUpdate, and updateDeployment are helpers not shown here
// (they are where corev1 and metav1 would be used).

type WidgetReconciler struct {
	client.Client
}

func (r *WidgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// 1. desired state: read the Widget object
	var widget myteamv1.Widget
	if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
		// A NotFound error means the Widget is already gone; nothing to do.
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	// 2. observed state: read the Deployment the Widget should own
	var deploy appsv1.Deployment
	err := r.Get(ctx, client.ObjectKey{
		Namespace: widget.Namespace,
		Name:      widget.Name,
	}, &deploy)
	switch {
	case apierrors.IsNotFound(err):
		// create if missing
		newDeploy := buildDeployment(&widget)
		if err := r.Create(ctx, newDeploy); err != nil {
			return reconcile.Result{}, err
		}
	case err != nil:
		// any other read error: return it so the request is retried,
		// instead of falling through and reporting Ready
		return reconcile.Result{}, err
	default:
		// if it exists, compare with desired state and update
		if needsUpdate(&deploy, &widget) {
			updateDeployment(&deploy, &widget)
			if err := r.Update(ctx, &deploy); err != nil {
				return reconcile.Result{}, err
			}
		}
	}

	// 3. update status
	widget.Status.Phase = "Ready"
	if err := r.Status().Update(ctx, &widget); err != nil {
		return reconcile.Result{}, err
	}
	return reconcile.Result{}, nil
}

controller-runtime calls this function once for each existing Widget when the controller starts, and again whenever a Widget changes. controller-runtime handles all the underlying infrastructure like watch / queue / retry / leader election, so the Operator developer only focuses on the reconcile logic above.
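One detail of the queue is worth a sketch, because it explains why Reconcile receives only a namespace/name key and not the event that triggered it: repeated changes to the same object collapse into a single pending request. The code below is a simplified, single-goroutine model in plain Go. The names Queue and Request are hypothetical, and the real work queue additionally handles concurrency, rate limiting, and retries.

```go
package main

// Request identifies one object by namespace and name. Like the request
// passed to Reconcile, it carries no details about what changed.
type Request struct {
	Namespace string
	Name      string
}

// Queue is a toy deduplicating work queue: a key that is already waiting
// is not added a second time.
type Queue struct {
	order   []Request
	pending map[Request]bool
}

// NewQueue returns an empty queue ready for use.
func NewQueue() *Queue {
	return &Queue{pending: make(map[Request]bool)}
}

// Add enqueues req unless it is already waiting to be processed.
func (q *Queue) Add(req Request) {
	if q.pending[req] {
		return
	}
	q.pending[req] = true
	q.order = append(q.order, req)
}

// Get pops the oldest pending request. The second result is false when
// the queue is empty.
func (q *Queue) Get() (Request, bool) {
	if len(q.order) == 0 {
		return Request{}, false
	}
	req := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, req)
	return req, true
}
```

Because three quick edits to the same Widget can surface as a single Reconcile call, the function has to read the current state and converge on it rather than react to individual events.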
Three standard operational patterns #
There are three patterns that come up in almost every Operator. They are standard K8s API mechanisms, and knowing them well takes you far.
1. ownerReference — auto-cleanup of child objects #
When a Widget creates a Deployment, that Deployment should also be deleted when the Widget is deleted. The mechanism that lets K8s’s garbage collector handle this automatically, without explicit deletion code, is ownerReference.
deploy := &appsv1.Deployment{
	ObjectMeta: metav1.ObjectMeta{
		Name:      widget.Name,
		Namespace: widget.Namespace,
		OwnerReferences: []metav1.OwnerReference{
			*metav1.NewControllerRef(&widget, myteamv1.GroupVersion.WithKind("Widget")),
		},
	},
	Spec: ...
}

A Deployment with this ownerReference attached is deleted automatically by K8s’s garbage collector once its parent Widget is deleted. There’s no need to put explicit deletion logic in the Operator code.
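The cascading behavior can be modeled in a few lines of plain Go. This is only a toy sketch: Object and collectGarbage are hypothetical names, the store is an in-memory map keyed by UID, and the real garbage collector runs asynchronously inside the control plane.

```go
package main

// Object is a stripped-down stand-in for a K8s object: a UID plus the UIDs
// of its owners (the information metadata.ownerReferences records).
type Object struct {
	UID    string
	Owners []string
}

// collectGarbage repeatedly removes objects that declare owners when none of
// those owners still exist, so deleting a parent cascades to children and
// grandchildren. Objects with no owners are never collected.
func collectGarbage(store map[string]Object) {
	for {
		removed := false
		for uid, obj := range store {
			if len(obj.Owners) == 0 {
				continue
			}
			alive := false
			for _, owner := range obj.Owners {
				if _, ok := store[owner]; ok {
					alive = true
					break
				}
			}
			if !alive {
				delete(store, uid)
				removed = true
			}
		}
		if !removed {
			return
		}
	}
}
```

In this model, deleting a Widget removes the Deployment it owns, and then the ReplicaSet owned by that Deployment, while unrelated objects stay untouched.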
2. finalizer — hook for external resource cleanup #
When an object is deleted, child K8s objects are cleaned up automatically via ownerReference, but resources outside K8s are not. An Operator that creates external resources such as cloud LBs, S3 buckets, or RDS instances must clean those up as well when the object is deleted. The mechanism for hooking into deletion is the finalizer.
When you try to delete an object that has finalizers registered, the K8s API server doesn’t delete it immediately; it only sets the metadata.deletionTimestamp field. The object enters a “deletion in progress” state and remains in etcd until its finalizer list is empty.
// controllerutil is imported from
// "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil".
const widgetFinalizer = "widget.myteam.example.com/finalizer"

func (r *WidgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	var widget myteamv1.Widget
	if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	// is deletion in progress?
	if !widget.DeletionTimestamp.IsZero() {
		if controllerutil.ContainsFinalizer(&widget, widgetFinalizer) {
			// clean up external resources
			if err := cleanupExternalResources(&widget); err != nil {
				// returning the error retries cleanup; the finalizer stays in place
				return reconcile.Result{}, err
			}
			// when cleanup is done, remove finalizer → K8s actually deletes the object
			controllerutil.RemoveFinalizer(&widget, widgetFinalizer)
			if err := r.Update(ctx, &widget); err != nil {
				return reconcile.Result{}, err
			}
		}
		return reconcile.Result{}, nil
	}

	// ensure finalizer if not in deletion
	if !controllerutil.ContainsFinalizer(&widget, widgetFinalizer) {
		controllerutil.AddFinalizer(&widget, widgetFinalizer)
		if err := r.Update(ctx, &widget); err != nil {
			return reconcile.Result{}, err
		}
	}

	// proceed with normal reconcile
	...
}

Objects with finalizers aren’t actually deleted until the controller removes the finalizer after external resource cleanup completes, so a delete request alone, even a forced one, doesn’t leave cloud resources orphaned. The protection only lapses if someone manually removes the finalizer from the object.
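The deletion semantics that this pattern relies on can be modeled with a toy store in plain Go. Resource and Store are hypothetical names, and the Deleting flag stands in for a non-zero metadata.deletionTimestamp; this is a sketch of the rule, not of the API server’s implementation.

```go
package main

// Resource models only the metadata fields that matter for deletion.
type Resource struct {
	Name       string
	Finalizers []string
	Deleting   bool // stands in for a non-zero metadata.deletionTimestamp
}

// Store is a toy object store keyed by name.
type Store map[string]*Resource

// Delete mimics the API server: a resource with finalizers is only marked
// as deleting, while one without finalizers is removed immediately.
func (s Store) Delete(name string) {
	r, ok := s[name]
	if !ok {
		return
	}
	if len(r.Finalizers) > 0 {
		r.Deleting = true
		return
	}
	delete(s, name)
}

// RemoveFinalizer drops one finalizer. If the resource is being deleted and
// that was the last finalizer, the resource is removed from the store.
func (s Store) RemoveFinalizer(name, finalizer string) {
	r, ok := s[name]
	if !ok {
		return
	}
	kept := r.Finalizers[:0]
	for _, f := range r.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
	if r.Deleting && len(r.Finalizers) == 0 {
		delete(s, name)
	}
}
```

The window between Delete and RemoveFinalizer is exactly where the Reconcile function above runs its external cleanup.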
3. status subresource — separation of spec and status #
The significance of the one line subresources.status: {} in the CRD becomes clear here. CRDs with this subresource keep two behaviors strictly separated:
- Users (or GitOps tools) modify only spec. Any status modifications they send are ignored.
- Controllers modify only status via r.Status().Update(). They don’t touch spec.
The reason this separation matters is that it prevents conflicts with GitOps. When ArgoCD syncs git manifests to the cluster, status fields are values controllers fill, so they don’t exist in git. With status subresource separated, ArgoCD doesn’t look at status and only compares spec, so even when controllers update status, ArgoCD doesn’t see it as “drift.”
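The two write paths can be sketched as two small functions in plain Go. This is a toy model: the Widget struct here is a hypothetical two-field stand-in (one spec field, one status field), and updateMain and updateStatus are illustrative names, not API calls.

```go
package main

// Widget is a toy object with the two halves the status subresource separates.
type Widget struct {
	Size  string // part of spec
	Phase string // part of status
}

// updateMain models a write to the main resource endpoint when the status
// subresource is enabled: spec changes are stored, status changes are dropped.
func updateMain(stored, incoming Widget) Widget {
	stored.Size = incoming.Size
	return stored
}

// updateStatus models a write to the status endpoint: only the status
// portion of the incoming object is stored.
func updateStatus(stored, incoming Widget) Widget {
	stored.Phase = incoming.Phase
	return stored
}
```

A user applying a manifest that happens to include a status block changes only Size, and a controller calling the status path changes only Phase, so the two writers cannot overwrite each other’s half of the object.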
Operator build tools — Kubebuilder vs Operator SDK #
There are two tools providing higher abstraction beyond using controller-runtime directly.
| Tool | Characteristic |
|---|---|
| Kubebuilder | Kubernetes project’s official tool. CRD scaffolding, Makefile, kustomize integration. Wraps controller-runtime most directly. |
| Operator SDK | Red Hat’s tool. Based on Kubebuilder + adds Helm / Ansible-based Operator options. |
The flow of generating the skeleton and filling in only the reconcile function is nearly the same for both tools. Kubebuilder is closer to the standard for full-fledged Operators written directly in Go, and Operator SDK is smoother for migration scenarios from Helm chart to Operator.
kubebuilder init --domain example.com --repo github.com/myteam/widget-operator
kubebuilder create api --group myteam --version v1 --kind Widget

These two commands generate the whole project skeleton: API types for the CRD, controller skeleton, manifests, Dockerfile, Makefile. After that, filling in the Reconcile function in the generated controller file is where the actual work happens.
When should you write an Operator #
CRD and Operator are powerful tools but not needed for every domain object. The value of an Operator is large when these conditions converge:
- Operational automation of stateful workloads — procedures people regularly ran like DB cluster setup, backup, failover, upgrades
- Abstracting a bundle of K8s objects as one domain object — bundling Deployment + Service + Ingress + PDB + HPA + ServiceMonitor as one object
- Maintaining consistency between external resources and K8s objects — auto-syncing things like cloud LBs and DNS records with K8s objects
- Codifying domain knowledge — operational know-how like “when PostgreSQL primary dies, promote the replica with the smallest sync lag”
Conversely, for a simple bundle that a single Helm chart covers, no Operator is needed. An Operator carries significant code maintenance cost, so adopt one only where the automation value is genuinely large.
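As an illustration of what “codifying domain knowledge” from the list above can look like, the failover rule about promoting the replica with the smallest sync lag fits in a short Go function. Everything here is a hypothetical sketch: the Replica type, its LagBytes and Healthy fields, and pickPromotionCandidate are assumptions made for the example, not part of any real PostgreSQL Operator.

```go
package main

// Replica describes one PostgreSQL replica as an Operator might see it.
type Replica struct {
	Name     string
	LagBytes int64 // replication lag behind the failed primary
	Healthy  bool  // assumption: unhealthy replicas are never promoted
}

// pickPromotionCandidate returns the healthy replica with the smallest lag.
// The second result is false when no healthy replica exists.
func pickPromotionCandidate(replicas []Replica) (Replica, bool) {
	var best Replica
	found := false
	for _, r := range replicas {
		if !r.Healthy {
			continue
		}
		if !found || r.LagBytes < best.LagBytes {
			best = r
			found = true
		}
	}
	return best, found
}
```

A rule this small is easy to write; the maintenance cost of an Operator comes from everything around it, such as detecting the failure reliably and carrying out the promotion safely.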
Closing #
This post covered the CRD as the K8s API extension mechanism and the Operator pattern layered on top of it. We followed the flow in which a CRD registers a new object kind and a controller-runtime-based Operator attaches a reconcile loop to that object, extending K8s’s declarative model into the user’s domain. We also covered the three standard patterns: auto-cleaning child objects via ownerReference, hooking external resource cleanup via finalizer, and preventing GitOps conflicts via the status subresource. The next post covers how to observe a cluster where all these components run: the observability stack composed of Prometheus / Grafana / Loki / OpenTelemetry.