K8s Advanced #4: CRD and the Operator Pattern — controller-runtime

The fourth post in the K8s Advanced series. The objects covered so far have all been standard resources K8s ships built in — Pod, Deployment, Service, ConfigMap, Secret, NetworkPolicy, etc. But the real appeal of the K8s API is being able to add domain objects on top. Domain concepts like a PostgreSQL cluster, a Redis primary-replica topology, or a Kafka broker group can become first-class objects inside K8s, queryable via kubectl get postgrescluster, declared via manifests, and operated automatically by controllers. The two axes of this extension are CustomResourceDefinition (defining new object kinds) and Operator (the controller that operates those objects).

The K8s Advanced series consists of six posts.

Two paths to K8s API extension #

K8s offers two paths for extending its API.

Path | Characteristic
CRD (CustomResourceDefinition) | Register a new object kind via manifest. The K8s API server stores that object in etcd and treats it like a standard object.
Aggregation Layer | Run a separate API server and have the K8s API server delegate calls to it. More flexible but with high operational cost.

The path overwhelmingly used in production clusters is CRD. The Aggregation Layer is rarely used outside of components tightly coupled with K8s core, such as metrics-server. The topic of this post is CRD.

CRD — defining a new object kind #

CRD is itself a kind of K8s object. Applying a single CRD registers a new object kind in the cluster from that moment on, and you can create and query that object via kubectl.

The simplest CRD example #

widget-crd.yaml — defining new object kind 'Widget'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.myteam.example.com
spec:
  group: myteam.example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
    shortNames: ["wg"]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: ["small", "medium", "large"]
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
              required: ["size"]
            status:
              type: object
              properties:
                phase:
                  type: string
      subresources:
        status: {}

After applying this CRD, the following manifest becomes valid in that cluster.

my-widget.yaml — one new Widget object
apiVersion: myteam.example.com/v1
kind: Widget
metadata:
  name: my-first-widget
  namespace: default
spec:
  size: medium
  replicas: 3

Queryable with standard kubectl
kubectl get widgets
kubectl get wg my-first-widget -o yaml

Key elements of CRD definition #

A few parts of the manifest above matter from an operational perspective.

  • scope (Namespaced vs Cluster) - whether the object lives inside a namespace or is cluster-wide. This is hard to change once the CRD is in use, so choose it based on the granularity of the domain concept. ConfigMap and Pod are Namespaced; Node and StorageClass are Cluster-scoped.
  • schema.openAPIV3Schema - defines the object’s shape as an OpenAPI v3 schema. The K8s API server validates every create and update request against this schema and rejects objects that violate it before they are stored. Constraints such as enum, minimum, maximum, and required are expressed here.
  • subresources.status: {} - exposes the status field as its own subresource (the /status endpoint). This is the standard K8s pattern that lets controllers update status without touching spec. This single line is very important, for reasons covered in the status subresource section later in this post.
  • versions - a CRD can serve multiple versions simultaneously, and the one version marked storage: true determines the format actually persisted in etcd. Changing the storage version requires migrating the objects already stored, and versions whose schemas differ need a conversion webhook.
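To make the schema constraints concrete, here is a minimal sketch in plain Go (standard library only) of the checks that the Widget schema above implies. The `WidgetSpec` type and the `validateWidgetSpec` function are hypothetical names invented for this illustration. In a real cluster the API server derives these checks from the OpenAPI schema by itself, so no such code has to be written.

```go
package main

import (
	"errors"
	"fmt"
)

// WidgetSpec mirrors the spec fields declared in the CRD schema above.
// Replicas is a pointer so that "not set" can be told apart from 0.
type WidgetSpec struct {
	Size     string
	Replicas *int
}

// validateWidgetSpec is a hypothetical helper that applies the same
// constraints the schema declares: required, enum, minimum, maximum.
func validateWidgetSpec(s WidgetSpec) error {
	if s.Size == "" {
		return errors.New("spec.size: required")
	}
	switch s.Size {
	case "small", "medium", "large":
		// allowed enum value
	default:
		return fmt.Errorf("spec.size: unsupported value %q", s.Size)
	}
	// replicas is optional, but when present it must be within 1..10.
	if s.Replicas != nil && (*s.Replicas < 1 || *s.Replicas > 10) {
		return fmt.Errorf("spec.replicas: %d is outside 1..10", *s.Replicas)
	}
	return nil
}

func main() {
	three, twenty := 3, 20
	fmt.Println(validateWidgetSpec(WidgetSpec{Size: "medium", Replicas: &three}))
	fmt.Println(validateWidgetSpec(WidgetSpec{Size: "huge"}))
	fmt.Println(validateWidgetSpec(WidgetSpec{Size: "small", Replicas: &twenty}))
}
```

The first call passes, while the second and third are rejected, which is roughly what kubectl apply would report for a manifest with size: huge or replicas: 20.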

CRD alone isn’t enough — the role of Operator #

Just registering a CRD doesn’t give the object meaning. A Widget object sitting in etcd is just a manifest — no action takes place on its own. A controller that watches that object and actually creates and manages things is needed.

The K8s community calls this combination an Operator. An Operator is essentially a bundle of two parts:

  • CRD — definition of the new domain object’s shape
  • Custom Controller — code that looks at that object and runs a reconcile loop

The reason this model is powerful is that it extends K8s’s core paradigm — declarative desired state + controller’s reconcile loop — to the user’s domain. You simply declare in a manifest “I want a PostgreSQL cluster of 1 primary + 3 replicas,” and the Operator automatically creates and manages the StatefulSet, Service, PVC, ConfigMap, Secret, and backup CronJob.

Reconcile loop — the essence of a controller #

The core pattern of a K8s controller is an infinite loop.

Pseudo-code of reconcile loop
loop forever:
  observed_state = query actual state from K8s API
  desired_state = read desired state from manifest
  if observed_state != desired_state:
    close the gap via K8s API calls
  sleep until next trigger

This simple loop is the common model of built-in controllers (Deployment, StatefulSet, Job, etc.) and user-defined Operators. Writing an Operator means deciding “how to implement this reconcile function for my domain object.”
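The pseudo-code above can be turned into a small runnable sketch. It assumes two in-memory maps stand in for the state a real controller would read through the K8s API, and the name `reconcileOnce` is made up for this example. The point it illustrates is that one pass is driven by the current difference between desired and observed state, so running it again after convergence does nothing.

```go
package main

import "fmt"

// reconcileOnce compares desired replica counts against observed ones and
// closes the gap in place. It returns the number of corrections it made.
// Both maps are stand-ins for state that a real controller reads through
// the K8s API.
func reconcileOnce(desired, observed map[string]int) int {
	changes := 0
	for name, want := range desired {
		if got, ok := observed[name]; !ok || got != want {
			observed[name] = want // create the workload or scale it
			changes++
		}
	}
	for name := range observed {
		if _, ok := desired[name]; !ok {
			delete(observed, name) // no longer declared: remove it
			changes++
		}
	}
	return changes
}

func main() {
	desired := map[string]int{"web": 3, "worker": 2}
	observed := map[string]int{"web": 1, "legacy": 4}

	fmt.Println("corrections in first pass:", reconcileOnce(desired, observed))
	fmt.Println("corrections in second pass:", reconcileOnce(desired, observed))
	fmt.Println("observed now:", observed)
}
```

Because the function only looks at the gap, it does not matter which event triggered it or how many events were missed in between. That property is what makes retries safe.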

controller-runtime — the standard skeleton for Operators #

It’s possible to write a reconcile loop by calling the K8s API client directly, but the boilerplate is enormous. controller-runtime is a Go library maintained under the kubernetes-sigs organization (import path sigs.k8s.io/controller-runtime) that standardizes the Operator skeleton. Kubebuilder and Operator SDK are higher-level tools that wrap controller-runtime.

The shape of a Reconciler #

A controller-runtime-based Operator nearly always has the following shape.

WidgetReconciler — simplified skeleton
package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    myteamv1 "myteam.example.com/api/v1"
)

// WidgetReconciler reconciles Widget objects. The embedded client.Client
// provides Get / Create / Update / Status against the API server.
type WidgetReconciler struct {
    client.Client
}

// Reconcile drives one Widget toward its desired state.
// buildDeployment, needsUpdate, and updateDeployment are domain-specific
// helpers that are not shown here.
func (r *WidgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    // 1. desired state: read the Widget object
    var widget myteamv1.Widget
    if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
        // The Widget may already be gone; NotFound is not an error here.
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // 2. observed state: read the Deployment this Widget should own
    var deploy appsv1.Deployment
    err := r.Get(ctx, client.ObjectKey{
        Namespace: widget.Namespace,
        Name:      widget.Name,
    }, &deploy)

    switch {
    case apierrors.IsNotFound(err):
        // create if missing
        newDeploy := buildDeployment(&widget)
        if err := r.Create(ctx, newDeploy); err != nil {
            return reconcile.Result{}, err
        }
    case err != nil:
        // any other read error: return it so the request is retried
        // instead of silently reporting "Ready"
        return reconcile.Result{}, err
    default:
        // if it exists, compare with desired state and update
        if needsUpdate(&deploy, &widget) {
            updateDeployment(&deploy, &widget)
            if err := r.Update(ctx, &deploy); err != nil {
                return reconcile.Result{}, err
            }
        }
    }

    // 3. update status through the status subresource
    widget.Status.Phase = "Ready"
    if err := r.Status().Update(ctx, &widget); err != nil {
        return reconcile.Result{}, err
    }

    return reconcile.Result{}, nil
}

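This function is called with the namespace and name of a Widget whenever that Widget changes, and again when a previous attempt returned an error. Because the same object can be reconciled many times, the function has to be idempotent. controller-runtime handles all the underlying infrastructure like watch / queue / retry / leader election, so the Operator developer only focuses on the reconcile logic above.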
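One piece of that queue infrastructure is worth picturing: a burst of events for the same object should not turn into a burst of reconciles. The following is a toy model of a deduplicating key queue, written with the standard library only. `keyQueue` and its methods are hypothetical names, and this is not controller-runtime’s actual implementation, only a sketch of the idea.

```go
package main

import "fmt"

// keyQueue is a toy FIFO queue that holds each key at most once while it is
// waiting, so a burst of events for one object yields a single reconcile.
type keyQueue struct {
	order   []string
	pending map[string]bool
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless it is already waiting.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest waiting key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newKeyQueue()
	// Three events for default/a and one for default/b arrive in a burst.
	for _, k := range []string{"default/a", "default/a", "default/b", "default/a"} {
		q.Add(k)
	}
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println("reconcile", key)
	}
}
```

Since the queue carries only keys and not event payloads, the reconcile function always has to re-read the current object, which fits the "read desired state, read observed state" structure of the skeleton above.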

Three standard operational patterns #

There are three patterns that come up in almost every Operator. They are standard K8s API mechanisms, and knowing them well takes you far.

1. ownerReference — auto-cleanup of child objects #

When a Widget creates a Deployment, that Deployment should also be deleted when the Widget is deleted. The mechanism that lets K8s’s garbage collector handle this automatically, without explicit deletion code, is ownerReference.

Attaching ownerReference when creating child objects
deploy := &appsv1.Deployment{
    ObjectMeta: metav1.ObjectMeta{
        Name:      widget.Name,
        Namespace: widget.Namespace,
        OwnerReferences: []metav1.OwnerReference{
            *metav1.NewControllerRef(&widget, myteamv1.GroupVersion.WithKind("Widget")),
        },
    },
    Spec: ...
}

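A Deployment carrying this ownerReference is deleted by K8s’s garbage collector once its parent Widget is gone. By default this happens asynchronously in the background rather than at the exact moment of deletion. There’s no need to put explicit deletion logic in the Operator code. One constraint to keep in mind: a namespaced owner must live in the same namespace as the objects it owns.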
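The cascading effect can be sketched with a small in-memory model. It assumes each object has at most one owner (real objects can list several in metadata.ownerReferences), and the `object` type and `collectGarbage` function are hypothetical names for illustration, not the garbage collector’s real code.

```go
package main

import "fmt"

// object is a stripped-down stand-in for a K8s object: a UID plus the UID
// of its single owner ("" means no owner).
type object struct {
	Kind     string
	UID      string
	OwnerUID string
}

// collectGarbage removes every object whose owner is no longer present,
// repeating until nothing changes so that deletion cascades down the chain.
func collectGarbage(objs map[string]object) {
	for {
		removed := false
		for uid, o := range objs {
			if o.OwnerUID == "" {
				continue // unowned objects are never collected
			}
			if _, ownerAlive := objs[o.OwnerUID]; !ownerAlive {
				delete(objs, uid)
				removed = true
			}
		}
		if !removed {
			return
		}
	}
}

func main() {
	objs := map[string]object{
		"w1": {Kind: "Widget", UID: "w1"},
		"d1": {Kind: "Deployment", UID: "d1", OwnerUID: "w1"},
		"r1": {Kind: "ReplicaSet", UID: "r1", OwnerUID: "d1"},
		"s1": {Kind: "Service", UID: "s1"}, // unowned, must survive
	}

	delete(objs, "w1") // the user deletes the Widget
	collectGarbage(objs)

	for _, o := range objs {
		fmt.Println("still present:", o.Kind)
	}
}
```

Deleting the Widget removes the Deployment, and the ReplicaSet follows because its own owner has disappeared. The unowned Service is left alone, which is why every child object the Operator creates needs the ownerReference set.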

2. finalizer — hook for external resource cleanup #

When an object is deleted, child K8s objects are auto-cleaned via ownerReference, but resources outside K8s are not. For an Operator that creates external resources like cloud LBs, S3 buckets, RDS instances, those external resources must be cleaned together when the object is deleted. The mechanism for hooking this is finalizer.

When you try to delete an object with finalizers registered, the K8s API server doesn’t immediately delete the object but only sets the metadata.deletionTimestamp field. That object enters a “deletion in progress” state and remains in etcd until the finalizer list is emptied.

Using finalizer for external resource cleanup
// Needs the import "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil".
const widgetFinalizer = "widget.myteam.example.com/finalizer"

func (r *WidgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    var widget myteamv1.Widget
    if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // is deletion in progress
    if !widget.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(&widget, widgetFinalizer) {
            // clean up external resources
            if err := cleanupExternalResources(&widget); err != nil {
                return reconcile.Result{}, err
            }
            // when cleanup is done, remove finalizer → K8s actually deletes the object
            controllerutil.RemoveFinalizer(&widget, widgetFinalizer)
            if err := r.Update(ctx, &widget); err != nil {
                return reconcile.Result{}, err
            }
        }
        return reconcile.Result{}, nil
    }

    // ensure finalizer if not in deletion
    if !controllerutil.ContainsFinalizer(&widget, widgetFinalizer) {
        controllerutil.AddFinalizer(&widget, widgetFinalizer)
        if err := r.Update(ctx, &widget); err != nil {
            return reconcile.Result{}, err
        }
    }

    // proceed with normal reconcile
    ...
}

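Objects with finalizers aren’t actually removed until the controller has finished external resource cleanup and cleared the finalizer, so an ordinary kubectl delete cannot leave cloud resources orphaned. The guarantee holds only while the finalizer is in place: if someone manually strips it from the object, the object is removed without cleanup. Conversely, if the controller isn’t running, the object stays in the deleting state until it comes back.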
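The deletion semantics described in this section can be modeled in a few lines of plain Go. This is a toy sketch: `store`, `resource`, `requestDelete`, and `removeFinalizer` are hypothetical names, and a boolean stands in for metadata.deletionTimestamp.

```go
package main

import "fmt"

// resource models only the metadata fields that matter for deletion.
type resource struct {
	Finalizers []string
	Deleting   bool // stands in for a non-nil metadata.deletionTimestamp
}

// store is a toy stand-in for the API server's storage.
type store map[string]*resource

// requestDelete mimics a DELETE call: with finalizers present the object is
// only marked as deleting; without any it is removed immediately.
func (s store) requestDelete(name string) {
	r, ok := s[name]
	if !ok {
		return
	}
	if len(r.Finalizers) == 0 {
		delete(s, name)
		return
	}
	r.Deleting = true
}

// removeFinalizer mimics a controller's update that drops one finalizer.
// Once a deleting object has no finalizers left, it disappears.
func (s store) removeFinalizer(name, finalizer string) {
	r, ok := s[name]
	if !ok {
		return
	}
	kept := r.Finalizers[:0]
	for _, f := range r.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
	if r.Deleting && len(r.Finalizers) == 0 {
		delete(s, name)
	}
}

func main() {
	const fin = "widget.myteam.example.com/finalizer"
	s := store{"my-first-widget": {Finalizers: []string{fin}}}

	s.requestDelete("my-first-widget")
	_, present := s["my-first-widget"]
	fmt.Println("after delete request, still stored:", present)

	s.removeFinalizer("my-first-widget", fin)
	_, present = s["my-first-widget"]
	fmt.Println("after finalizer removal, still stored:", present)
}
```

The window between the two calls is where the controller does its external cleanup. The model also shows why a stuck controller leaves objects stuck: nothing else will call removeFinalizer for it.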

3. status subresource — separation of spec and status #

The significance of the one line subresources.status: {} in the CRD becomes clear here. CRDs with this subresource keep two behaviors strictly separated:

  • Users (or GitOps tools) modify only spec. Writes to the main resource endpoint ignore any change to status.
  • Controllers modify only status via r.Status().Update(), which writes to the /status endpoint. That endpoint ignores any change to spec.

The reason this separation matters is that it prevents conflicts with GitOps. When ArgoCD syncs git manifests to the cluster, status fields are values controllers fill, so they don’t exist in git. With status subresource separated, ArgoCD doesn’t look at status and only compares spec, so even when controllers update status, ArgoCD doesn’t see it as “drift.”
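A small model makes the two write paths easier to see. The sketch below assumes a stored Widget and two hypothetical functions, `updateMain` and `updateStatus`, that stand in for the main resource endpoint and the /status endpoint. It is an illustration of the behavior described above, not API server code.

```go
package main

import "fmt"

type widgetSpec struct{ Size string }
type widgetStatus struct{ Phase string }

// widget holds just enough fields to show the split.
type widget struct {
	Spec   widgetSpec
	Status widgetStatus
}

// updateMain mimics a write to the main resource endpoint when the status
// subresource is enabled: spec is accepted, any status change is dropped.
func updateMain(stored, incoming widget) widget {
	stored.Spec = incoming.Spec
	return stored
}

// updateStatus mimics a write to the /status endpoint: only status is
// accepted, any spec change is dropped.
func updateStatus(stored, incoming widget) widget {
	stored.Status = incoming.Status
	return stored
}

func main() {
	stored := widget{Spec: widgetSpec{Size: "small"}, Status: widgetStatus{Phase: "Ready"}}

	// A GitOps sync sends spec only; its empty status must not wipe "Ready".
	stored = updateMain(stored, widget{Spec: widgetSpec{Size: "large"}})
	fmt.Printf("%+v\n", stored)

	// The controller reports progress; the spec in its stale copy is ignored.
	stored = updateStatus(stored, widget{
		Spec:   widgetSpec{Size: "small"},
		Status: widgetStatus{Phase: "Scaling"},
	})
	fmt.Printf("%+v\n", stored)
}
```

Neither writer can clobber the other’s half of the object, even when it sends a stale or incomplete copy of the other half.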

Operator build tools — Kubebuilder vs Operator SDK #

There are two tools providing higher abstraction beyond using controller-runtime directly.

Tool | Characteristic
Kubebuilder | Kubernetes project’s official tool. CRD scaffolding, Makefile, kustomize integration. Wraps controller-runtime most directly.
Operator SDK | Red Hat’s tool. Based on Kubebuilder + adds Helm / Ansible-based Operator options.

The flow of generating the skeleton and filling in only the reconcile function is nearly the same for both tools. Kubebuilder is closer to the standard for full-fledged Operators written directly in Go, and Operator SDK is smoother for migration scenarios from Helm chart to Operator.

Starting a new Operator project with Kubebuilder
kubebuilder init --domain example.com --repo github.com/myteam/widget-operator
kubebuilder create api --group myteam --version v1 --kind Widget

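These two commands generate the project skeleton: Go API types, controller skeleton, kustomize manifests, Dockerfile, and Makefile. The CRD YAML is then generated from the Go types by running make manifests. After that, filling in the Reconcile function in the generated widget_controller.go is the bulk of the actual work.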

When should you write an Operator #

CRD and Operator are powerful tools but not needed for every domain object. The value of an Operator is large when these conditions converge:

  • Operational automation of stateful workloads — procedures people regularly ran like DB cluster setup, backup, failover, upgrades
  • Abstracting a bundle of K8s objects as one domain object — bundling Deployment + Service + Ingress + PDB + HPA + ServiceMonitor as one object
  • Maintaining consistency between external resources and K8s objects — auto-syncing things like cloud LBs and DNS records with K8s objects
  • Codifying domain knowledge — operational know-how like “when PostgreSQL primary dies, promote the replica with the smallest sync lag”

Conversely, for a simple bundle that a single Helm chart covers, no Operator is needed. An Operator carries significant code maintenance cost, so adopt one only where the automation value is genuinely large.

Closing #

This post covered the CRD as the K8s API extension mechanism and the Operator pattern layered on top of it. We followed the flow where a CRD registers a new object kind and a controller-runtime-based Operator attaches a reconcile loop to that object, extending K8s’s declarative model into the user’s domain. We also covered the three standard patterns: auto-cleaning child objects via ownerReference, hooking external resource cleanup via finalizer, and preventing GitOps conflicts via the status subresource. The next post covers how to observe a cluster where all these components run, using the observability stack composed of Prometheus / Grafana / Loki / OpenTelemetry.
