Chapter 18

The CRD and Operator Pattern

This chapter covers the two axes of extending the K8s API with objects from your own domain. You define a new object kind with a CustomResourceDefinition, and an Operator built on controller-runtime runs the reconcile loop from Chapter 1 over that object, extending K8s’s declarative model all the way to your domain. We also cover the three standard patterns (ownerReference · finalizer · status subresource) and the build tools Kubebuilder · Operator SDK.

Chapter 17 Admission Controller noted that objects like Gatekeeper’s ConstraintTemplate and Kyverno’s ClusterPolicy aren’t built-in K8s resources but new object kinds the two tools define as CRDs. This chapter’s subject is the CRD itself. Every object we’ve covered so far has been a resource built into K8s: Pod, Deployment, Service, ConfigMap, Secret, NetworkPolicy, and so on. But the real attraction of the K8s API is that you can add objects of your own domain on top of it. Domain concepts like a PostgreSQL cluster, a Redis master-replica topology, or a Kafka broker group become first-class objects inside K8s: queried with kubectl get postgrescluster, declared with a manifest, and operated automatically by a controller. The two axes of this extension are the CustomResourceDefinition (defining a new object kind) and the Operator (the controller that operates that object).

By the end of this chapter you’ll have a model in which the reconcile loop of Chapter 1 What Kubernetes Is extends beyond K8s’s built-in resources to your own domain objects. The starting point is that the Chapter 4 Deployment controller and this chapter’s Operator are different instances of the same pattern.

The two paths to extending the K8s API

K8s provides two paths to extend its own API.

Path	Characteristics
CRD (CustomResourceDefinition)	Registers a new object kind with a manifest. The K8s API server stores that object in etcd and treats it like a standard object
Aggregation Layer	Brings up a separate API server and the K8s API server delegates calls to it. More flexible but with a high operational cost

In production clusters the CRD is by far the more common path. The Aggregation Layer is rarely used outside cases that tie very deeply into the K8s core, like metrics-server. This chapter’s subject is the CRD.

CRD: defining a new object kind

The CRD is itself one of K8s’s object kinds. Apply one CRD and, from that moment, a new object kind is registered in the cluster, and you can create and query that object with kubectl.

The simplest CRD example

widget-crd.yaml — defining a new object kind 'Widget'

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.myteam.example.com
spec:
  group: myteam.example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
    shortNames: ["wg"]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: ["small", "medium", "large"]
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
              required: ["size"]
            status:
              type: object
              properties:
                phase:
                  type: string
      subresources:
        status: {}

Once you apply this CRD, the following manifest becomes valid in that cluster.

my-widget.yaml — one new Widget object

apiVersion: myteam.example.com/v1
kind: Widget
metadata:
  name: my-first-widget
  namespace: default
spec:
  size: medium
  replicas: 3

Queryable with standard kubectl

kubectl get widgets
kubectl get wg my-first-widget -o yaml
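What kubectl does here is call a REST endpoint that the API server started serving when the CRD was applied. The following is a minimal sketch of how that path is assembled for a namespaced custom resource; the helper name resourcePath is hypothetical and exists only for illustration.

```go
package main

import "fmt"

// resourcePath builds the REST path the API server serves for a namespaced
// custom resource. An empty name yields the collection (list) path.
// resourcePath is a hypothetical helper, not part of any K8s library.
func resourcePath(group, version, namespace, plural, name string) string {
	p := fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s", group, version, namespace, plural)
	if name != "" {
		p += "/" + name
	}
	return p
}

func main() {
	// The values below come from widget-crd.yaml and my-widget.yaml.
	fmt.Println(resourcePath("myteam.example.com", "v1", "default", "widgets", ""))
	fmt.Println(resourcePath("myteam.example.com", "v1", "default", "widgets", "my-first-widget"))
}
```

You can hit the same path directly with kubectl get --raw followed by the printed path, which is a quick way to confirm that the new kind really is an ordinary API resource.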

The core elements of a CRD definition

Let’s pin down a few parts of the manifest above that matter on the operational side.

scope (Namespaced vs Cluster) decides whether this object lives inside a namespace or cluster-wide. Once decided it’s hard to change, so decide by looking at the domain’s boundaries. ConfigMap · Pod are Namespaced; Node · StorageClass are Cluster.
schema.openAPIV3Schema defines the object’s shape as an OpenAPI schema. The K8s API server validates every create and update request against this schema and rejects objects that violate it before they’re stored in etcd. Constraints like enum, minimum, and required are expressed here.
subresources.status: {} separates the status field into a distinct subresource. It’s a K8s standard pattern that lets a controller update only status without touching spec. We return to this line in §“status subresource” later.
versions lists the versions of the kind. A CRD can serve several versions at once, and exactly one version, marked storage: true, determines the format stored in etcd. Changing the storage version requires migrating the objects already stored.
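To make the schema constraints concrete, here is a small sketch that hand-codes the same rules widget-crd.yaml declares (size is required and must be one of three values, replicas is optional and bounded). The API server enforces these from the schema itself; the type and function names below are hypothetical and only mirror that behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// WidgetSpec mirrors the spec properties declared in widget-crd.yaml.
type WidgetSpec struct {
	Size     string
	Replicas *int // nil means "not set"; the schema does not require it
}

// validateWidgetSpec is a hypothetical hand-written equivalent of the
// enum / minimum / maximum / required rules in the CRD schema.
// An empty Size is treated as "missing" in this sketch.
func validateWidgetSpec(s WidgetSpec) error {
	switch s.Size {
	case "small", "medium", "large":
		// allowed by the enum
	case "":
		return errors.New("spec.size: required value")
	default:
		return fmt.Errorf("spec.size: unsupported value %q", s.Size)
	}
	if s.Replicas != nil && (*s.Replicas < 1 || *s.Replicas > 10) {
		return fmt.Errorf("spec.replicas: %d is outside [1, 10]", *s.Replicas)
	}
	return nil
}

func main() {
	three, twenty := 3, 20
	fmt.Println(validateWidgetSpec(WidgetSpec{Size: "medium", Replicas: &three}))
	fmt.Println(validateWidgetSpec(WidgetSpec{Size: "huge"}))
	fmt.Println(validateWidgetSpec(WidgetSpec{Size: "small", Replicas: &twenty}))
}
```

The point of declaring these rules in the CRD rather than in controller code is that invalid objects never reach the controller at all.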

A CRD alone isn’t enough: the role of the Operator

Merely registering a CRD gives the object no meaning. Even if you create a Widget object, by itself it’s just a manifest stored in etcd, and no behavior happens. You need a controller that looks at that object and actually creates and operates something.

The K8s community calls this controller an Operator. An Operator is essentially a bundle of the following two parts.

CRD — the shape definition of a new domain object
Custom Controller — the code that looks at that object and runs a reconcile loop

The reason this model is powerful is that it extends K8s’s core paradigm — declarative desired state + the controller’s reconcile loop — all the way to your own domain. We just declare with a manifest “we want a cluster of 1 PostgreSQL master + 3 replicas,” and the Operator automatically creates and manages everything from a Chapter 8 StatefulSet, Service, PVC, ConfigMap, Secret, all the way to a backup CronJob.

Reconcile loop: the essence of a controller

The core pattern of a K8s controller is an infinite loop. It’s the exact form of the reconcile loop we saw as a picture in Chapter 1.

Pseudocode of the reconcile loop

loop forever:
  observed_state = query the actual state from the K8s API
  desired_state = read the desired state from the manifest
  if observed_state != desired_state:
    close the gap with a K8s API call
  sleep until the next trigger

This simple loop is the common model of the built-in controllers (Chapter 4 Deployment / ReplicaSet, StatefulSet, Job, etc.) and a custom Operator. Writing an Operator is deciding “how to implement this reconcile function for my domain object.” If in Chapter 4 we saw the Deployment controller’s reconcile filling the gap between ReplicaSet and Pod, this chapter’s Operator applies the same pattern to a custom object like Widget.
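The pseudocode can be turned into a small runnable sketch. It assumes a toy in-memory cluster in place of the K8s API, and the names cluster, reconcile, and converge are hypothetical. What it shows is the level-triggered nature of the loop: reconcile only compares states, so it recovers from a failure the same way it handles the initial rollout.

```go
package main

import "fmt"

// cluster is a stand-in for the observed state held by the API server.
type cluster struct{ runningPods int }

// reconcile compares desired with observed state and takes one corrective
// step. It returns true when there was nothing left to do.
func reconcile(c *cluster, desiredReplicas int) bool {
	switch {
	case c.runningPods < desiredReplicas:
		c.runningPods++ // stands in for "create a Pod"
	case c.runningPods > desiredReplicas:
		c.runningPods-- // stands in for "delete a Pod"
	default:
		return true
	}
	return false
}

// converge calls reconcile until the gap is closed and reports how many
// corrective steps were needed.
func converge(c *cluster, desiredReplicas int) int {
	steps := 0
	for !reconcile(c, desiredReplicas) {
		steps++
	}
	return steps
}

func main() {
	c := &cluster{runningPods: 1}
	fmt.Println("steps to reach 3 replicas:", converge(c, 3))

	c.runningPods = 0 // simulate every Pod crashing
	fmt.Println("steps after the failure:", converge(c, 3))
}
```

Note that reconcile never asks what event happened. It is handed the desired state, looks at the current state, and closes the gap, which is why the same function handles creation, drift, and recovery.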

controller-runtime: the standard skeleton of an Operator

Writing the reconcile loop from scratch by calling the K8s API client directly is possible but requires a great deal of boilerplate. controller-runtime is a Go library maintained under the Kubernetes SIGs umbrella (kubernetes-sigs) that standardizes the skeleton of an Operator. Kubebuilder and Operator SDK are higher-level tools built on controller-runtime.

The shape of a Reconciler

A controller-runtime-based Operator almost always has the following shape.

WidgetReconciler — a simplified skeleton

package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    myteamv1 "myteam.example.com/api/v1"
)

// WidgetReconciler reconciles Widget objects. The helpers buildDeployment,
// needsUpdate, and updateDeployment are assumed to be defined elsewhere in
// this package.
type WidgetReconciler struct {
    client.Client
}

func (r *WidgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    // 1. desired state: read the Widget object
    var widget myteamv1.Widget
    if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
        // NotFound means the Widget is already gone, so there is nothing to do
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // 2. observed state: read the Deployment the Widget should create
    var deploy appsv1.Deployment
    err := r.Get(ctx, client.ObjectKey{
        Namespace: widget.Namespace,
        Name:      widget.Name,
    }, &deploy)

    switch {
    case apierrors.IsNotFound(err):
        // create it if absent
        if err := r.Create(ctx, buildDeployment(&widget)); err != nil {
            return reconcile.Result{}, err
        }
    case err != nil:
        // any other read error: return it so the request is retried
        return reconcile.Result{}, err
    case needsUpdate(&deploy, &widget):
        // if present, compare with desired state and update
        updateDeployment(&deploy, &widget)
        if err := r.Update(ctx, &deploy); err != nil {
            return reconcile.Result{}, err
        }
    }

    // 3. update status (simplified: a real Operator would derive the phase
    // from the Deployment's own status)
    widget.Status.Phase = "Ready"
    if err := r.Status().Update(ctx, &widget); err != nil {
        return reconcile.Result{}, err
    }

    return reconcile.Result{}, nil
}

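This function isn’t called once and done. controller-runtime calls it with the namespace/name key of a Widget each time that Widget is created, changed, or deleted, and again whenever a previous call returned an error or asked for a requeue. Because controller-runtime handles the underlying watch, queue, retry, and leader-election machinery, the Operator developer can focus on the reconcile logic above.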
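One piece of that machinery is worth seeing in miniature: the work queue hands Reconcile an object key rather than an event, and a key that is already waiting is not queued twice. The sketch below is a simplified, single-goroutine stand-in written for illustration; keyQueue is a hypothetical type, not the client-go implementation, which also handles concurrency and rate-limited retries.

```go
package main

import "fmt"

// keyQueue is a FIFO of object keys that drops a key which is already
// waiting. It is a hypothetical, non-concurrent stand-in for the work
// queue a controller uses.
type keyQueue struct {
	order   []string
	pending map[string]bool
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key, q.order = q.order[0], q.order[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newKeyQueue()
	// Three change events for widget "a" arrive before the worker runs.
	for _, k := range []string{"default/a", "default/a", "default/b", "default/a"} {
		q.Add(k)
	}
	for k, ok := q.Get(); ok; k, ok = q.Get() {
		fmt.Println("reconcile", k)
	}
}
```

Collapsing events this way is safe only because Reconcile reads the current state itself instead of relying on what the event said.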

Three standard operational patterns

There are three patterns you almost always meet when writing an Operator. Since they’re standard K8s API mechanisms, knowing them well carries you a long way.

1. ownerReference: automatic cleanup of child objects

If a Widget created a Deployment, when the Widget is deleted that Deployment should be deleted with it. The mechanism that lets K8s’s garbage collector handle it automatically rather than handling it by hand in code is the ownerReference. The model from Chapter 4, where a Deployment bound a ReplicaSet and a ReplicaSet bound Pods with ownerReference, applies directly to your own domain.

Attaching an ownerReference when creating a child object

deploy := &appsv1.Deployment{
    ObjectMeta: metav1.ObjectMeta{
        Name:      widget.Name,
        Namespace: widget.Namespace,
        OwnerReferences: []metav1.OwnerReference{
            *metav1.NewControllerRef(&widget, myteamv1.GroupVersion.WithKind("Widget")),
        },
    },
    Spec: ...
}

A Deployment with this ownerReference attached is deleted by K8s’s garbage collector once the parent Widget is deleted (by default in the background, shortly after the parent is gone). There’s no need to put explicit deletion logic in the Operator code.
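The cascade the garbage collector performs can be modeled in a few lines. This is a sketch over an in-memory map, assuming each object records at most one owner UID; the types object and store and the method deleteWithDependents are hypothetical and only illustrate the ownership walk, not the real collector.

```go
package main

import "fmt"

// object is a toy K8s object that remembers the UID of its owner.
type object struct {
	name     string
	ownerUID string // empty for a root object such as the Widget
}

// store maps UID to object, standing in for etcd.
type store map[string]object

// deleteWithDependents removes the object with the given UID and then,
// like the garbage collector, every object that names it as owner,
// recursively.
func (s store) deleteWithDependents(uid string) {
	delete(s, uid)
	var children []string
	for childUID, o := range s {
		if o.ownerUID == uid {
			children = append(children, childUID)
		}
	}
	for _, childUID := range children {
		s.deleteWithDependents(childUID)
	}
}

func main() {
	s := store{
		"uid-widget": {name: "my-first-widget"},
		"uid-deploy": {name: "my-first-widget", ownerUID: "uid-widget"},
		"uid-rs":     {name: "my-first-widget-abc", ownerUID: "uid-deploy"},
		"uid-other":  {name: "unrelated-configmap"},
	}
	s.deleteWithDependents("uid-widget")
	fmt.Println("objects left:", len(s))
}
```

The chain Widget → Deployment → ReplicaSet disappears from a single delete, while the unrelated object survives because nothing links it to the Widget.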

2. finalizer: a hook for cleaning up external resources

When an object is deleted, the child objects inside K8s are auto-cleaned via ownerReference, but resources outside K8s are not. For an Operator that created external resources like a cloud LB, an S3 bucket, or an RDS instance, those external resources must be cleaned up with the object’s deletion. The mechanism that hangs this hook is the finalizer.

When you try to delete an object with a finalizer registered, the K8s API server doesn’t delete the object immediately but only fills the metadata.deletionTimestamp field. That object enters a “deletion in progress” state and remains in etcd until the finalizer list is emptied.

Cleaning up external resources with a finalizer

// controllerutil is sigs.k8s.io/controller-runtime/pkg/controller/controllerutil
const widgetFinalizer = "widget.myteam.example.com/finalizer"

func (r *WidgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    var widget myteamv1.Widget
    if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // is deletion in progress
    if !widget.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(&widget, widgetFinalizer) {
            // clean up external resources
            if err := cleanupExternalResources(&widget); err != nil {
                return reconcile.Result{}, err
            }
            // once cleanup is done, remove the finalizer → K8s really deletes the object
            controllerutil.RemoveFinalizer(&widget, widgetFinalizer)
            if err := r.Update(ctx, &widget); err != nil {
                return reconcile.Result{}, err
            }
        }
        return reconcile.Result{}, nil
    }

    // if not being deleted, ensure the finalizer
    if !controllerutil.ContainsFinalizer(&widget, widgetFinalizer) {
        controllerutil.AddFinalizer(&widget, widgetFinalizer)
        if err := r.Update(ctx, &widget); err != nil {
            return reconcile.Result{}, err
        }
    }

    // proceed with the normal reconcile
    ...
}

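Because an object with a finalizer isn’t actually removed until external-resource cleanup is done, even a hurried delete request doesn’t orphan a cloud resource. The one bypass is removing the finalizer by hand, which skips the cleanup, so treat it as a last resort. The flip side is that if the controller isn’t running, the object stays stuck in the “deletion in progress” state.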
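The API-server side of this contract is easy to model. The following sketch assumes a toy in-memory server; apiServer, object, and their methods are hypothetical names, and the deleting flag stands in for a non-zero metadata.deletionTimestamp.

```go
package main

import "fmt"

// object is a toy stored object.
type object struct {
	finalizers []string
	deleting   bool // stands in for a non-zero metadata.deletionTimestamp
}

// apiServer is a hypothetical in-memory stand-in for the API server.
type apiServer struct{ objects map[string]*object }

// Delete marks the object as deleting. It is removed from storage only if
// no finalizer remains.
func (a *apiServer) Delete(name string) {
	if o, ok := a.objects[name]; ok {
		o.deleting = true
		a.purgeIfDone(name)
	}
}

// RemoveFinalizer drops one finalizer, then re-checks whether the object
// can finally be removed.
func (a *apiServer) RemoveFinalizer(name, finalizer string) {
	o, ok := a.objects[name]
	if !ok {
		return
	}
	kept := o.finalizers[:0]
	for _, f := range o.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.finalizers = kept
	a.purgeIfDone(name)
}

func (a *apiServer) purgeIfDone(name string) {
	if o := a.objects[name]; o != nil && o.deleting && len(o.finalizers) == 0 {
		delete(a.objects, name)
	}
}

// Exists reports whether the object is still stored.
func (a *apiServer) Exists(name string) bool {
	_, ok := a.objects[name]
	return ok
}

func main() {
	const fin = "widget.myteam.example.com/finalizer"
	api := &apiServer{objects: map[string]*object{
		"my-first-widget": {finalizers: []string{fin}},
	}}

	api.Delete("my-first-widget")
	fmt.Println("after delete, still stored:", api.Exists("my-first-widget"))

	// The controller finishes external cleanup, then removes its finalizer.
	api.RemoveFinalizer("my-first-widget", fin)
	fmt.Println("after finalizer removal, still stored:", api.Exists("my-first-widget"))
}
```

The controller code above this sketch is the other half of the contract: it is the only party that knows when cleanup is done, so it is the one that removes the finalizer.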

3. status subresource: separating spec and status

The meaning of the one line subresources.status: {} written in the CRD is completed here. A CRD with this subresource separates the following two behaviors.

The user (or GitOps tool) modifies only spec. A write to the main resource endpoint ignores any change to status.
The controller modifies only status with r.Status().Update(). A write to the /status endpoint ignores any change to spec.

The reason this separation matters is that it prevents conflicts with GitOps. When ArgoCD, covered in Chapter 20 GitOps, syncs git’s manifest to the cluster, the status field is a value the controller fills, so it isn’t in git. If the status subresource is separated, ArgoCD doesn’t look at status and compares only spec, so even when the controller updates status, ArgoCD doesn’t recognize it as “drift.”
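The two write paths can be sketched with a toy server that stores a single Widget. The types and method names here are hypothetical; the sketch only models the rule that each endpoint accepts its own half of the object and drops the other.

```go
package main

import "fmt"

type WidgetSpec struct{ Size string }
type WidgetStatus struct{ Phase string }

type Widget struct {
	Spec   WidgetSpec
	Status WidgetStatus
}

// apiServer is a hypothetical stand-in holding one stored Widget.
type apiServer struct{ stored Widget }

// Update models a write to the main resource endpoint: spec is taken from
// the request and any status in the request is dropped.
func (a *apiServer) Update(w Widget) {
	a.stored.Spec = w.Spec
}

// UpdateStatus models a write to the /status endpoint: status is taken
// from the request and any spec change in the request is dropped.
func (a *apiServer) UpdateStatus(w Widget) {
	a.stored.Status = w.Status
}

func main() {
	api := &apiServer{}

	// A user or GitOps tool applies a manifest; its status is not honored.
	api.Update(Widget{
		Spec:   WidgetSpec{Size: "medium"},
		Status: WidgetStatus{Phase: "set-by-user"},
	})

	// The controller reports progress; its spec change is not honored.
	api.UpdateStatus(Widget{
		Spec:   WidgetSpec{Size: "large"},
		Status: WidgetStatus{Phase: "Ready"},
	})

	fmt.Printf("%+v\n", api.stored)
}
```

Because neither writer can overwrite the other’s half, the controller and the GitOps tool can both write to the same object without fighting over it.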

Operator build tools: Kubebuilder vs Operator SDK

There are two tools that provide a higher abstraction than using controller-runtime directly.

Tool	Characteristics
Kubebuilder	The official Kubernetes project tool. CRD scaffolding, Makefile, kustomize integration. Wraps controller-runtime most directly
Operator SDK	Red Hat’s tool. Based on Kubebuilder + adds Helm / Ansible-based Operator options

The flow of creating a skeleton and filling in only the reconcile function is nearly the same for both tools. A full-fledged Operator written directly in Go is closer to standard with Kubebuilder, and the migration scenario of starting from a Helm chart and moving to an Operator is smoother with Operator SDK.

Starting a new Operator project with Kubebuilder

kubebuilder init --domain example.com --repo github.com/myteam/widget-operator
kubebuilder create api --group myteam --version v1 --kind Widget

These two commands generate the CRD definition, the controller skeleton, manifests, Dockerfile, and Makefile all at once. After that, filling in the Reconcile function in the generated widget_controller.go is the body of the actual work.

Operators you often meet in operations

Once you know this chapter’s model, the intent and behavior of the various Operators already installed in a production cluster become easy to read. Let’s briefly pin down the cases you often meet.

cert-manager: the automatic certificate-issuance tool we touched on in Chapter 10 Ingress. It defines the CRDs Certificate, Issuer, and ClusterIssuer and automates the ACME challenge flow with Let’s Encrypt via reconcile.
AWS Load Balancer Controller: the ALB Ingress Controller of Chapter 10 is also an Operator. It keeps the Ingress object and the AWS ALB consistent with reconcile.
External Secrets Operator: the tool covered in Chapter 29 Secret Operations. The ExternalSecret CRD syncs values from AWS Secrets Manager · Vault · GCP SM into a K8s Secret.
CloudNativePG · Zalando Postgres Operator: they layer on top of Chapter 8 StatefulSet and automate the setup · backup · failover of a PostgreSQL cluster. This is the area where the Operator’s value is largest: the operational automation of stateful workloads like a DB.
Karpenter: the EKS node-automation tool of Chapter 13 Autoscaling. It runs reconcile on top of the NodePool and EC2NodeClass CRDs to dynamically bring up and clean up nodes.

When you meet an unfamiliar API group like apiVersion: postgres-operator.crunchydata.com/v1beta1 in a production cluster’s manifest directory, it’s almost always a CRD some Operator defined. With this chapter’s model you know which two questions to ask: which Operator provides this CRD, and what resources its reconcile creates and manages.
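The first step in answering those questions is mechanical: split the apiVersion into group and version, then look for a CRD whose name ends in that group. Below is a minimal sketch; splitAPIVersion and crdName are hypothetical helpers, and the naming rule they encode (plural, a dot, then the group) is the one visible in widget-crd.yaml.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAPIVersion splits "group/version". Core resources such as "v1"
// have no slash and therefore an empty group.
func splitAPIVersion(apiVersion string) (group, version string) {
	if i := strings.LastIndex(apiVersion, "/"); i >= 0 {
		return apiVersion[:i], apiVersion[i+1:]
	}
	return "", apiVersion
}

// crdName returns the name a CRD object carries for a given kind:
// "<plural>.<group>", as in widgets.myteam.example.com.
func crdName(plural, group string) string {
	return plural + "." + group
}

func main() {
	group, version := splitAPIVersion("postgres-operator.crunchydata.com/v1beta1")
	fmt.Println("group:", group, "version:", version)

	g, _ := splitAPIVersion("myteam.example.com/v1")
	fmt.Println("CRD name:", crdName("widgets", g))
}
```

With the group in hand, filtering the output of kubectl get crd by that suffix lists every kind the same Operator registered.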

When you should write an Operator

CRD and Operator are powerful tools, but not every domain object needs them. The Operator’s value is large when the following conditions converge.

Operational automation of stateful workloads — procedures a person used to run regularly, like the setup, backup, failover, and upgrade of a DB cluster
Abstracting a bundle of several K8s objects into one domain object — bundling Deployment + Service + Ingress + PDB + HPA + ServiceMonitor, etc., into one object
Keeping external resources and K8s objects consistent — auto-syncing things like a cloud LB or DNS records bound to K8s objects
Codifying domain knowledge — operational know-how like “if the PostgreSQL primary dies, promote the replica with the smallest sync lag”
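“Codifying domain knowledge” can be as small as one function inside the reconcile logic. As an illustration of the PostgreSQL rule quoted above, here is a sketch that picks the promotion candidate. The replica type, its fields, and the extra rule of skipping unhealthy replicas are assumptions made for this example, not the behavior of any particular Operator.

```go
package main

import "fmt"

// replica is a hypothetical view of one PostgreSQL replica.
type replica struct {
	name     string
	lagBytes int64 // replication lag behind the failed primary
	healthy  bool
}

// pickPromotionCandidate returns the healthy replica with the smallest
// sync lag. found is false when no healthy replica exists.
func pickPromotionCandidate(replicas []replica) (best replica, found bool) {
	for _, r := range replicas {
		if !r.healthy {
			continue // assumption: never promote an unhealthy replica
		}
		if !found || r.lagBytes < best.lagBytes {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	replicas := []replica{
		{name: "pg-1", lagBytes: 4096, healthy: true},
		{name: "pg-2", lagBytes: 0, healthy: false},
		{name: "pg-3", lagBytes: 512, healthy: true},
	}
	if r, ok := pickPromotionCandidate(replicas); ok {
		fmt.Println("promote", r.name)
	}
}
```

A runbook step that a person used to perform under pressure becomes a tested function that runs on every failover.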

Conversely, for a simple bundle that ends with a single Helm chart, there’s no need to write an Operator. An Operator is a tool with a high code-maintenance cost, so it’s better to adopt it only where the automation value is truly large.

Exercises

Check the CRDs installed in your cluster (kubectl get crd). Organize into a table which CRDs exist beyond the standard K8s objects (Pod, Service, etc.), and map which Operator defined each CRD (cert-manager, ALB Controller, Karpenter, ArgoCD, etc.). Organize, one paragraph each, what resources that Operator’s reconcile creates and manages with the model of §“Operators you often meet in operations.”
Apply widget-crd.yaml from the body unchanged, then create one Widget object. Because the CRD is registered but there’s no controller watching it, only the object is stored in etcd and no behavior happens. Organize in one paragraph in your own words how this “the object exists but there’s no controller” state connects to the model of §“A CRD alone isn’t enough.”
Write out, as a simulation, the difference between a CRD with a status subresource and one without. When ArgoCD syncs a git manifest to the cluster, while the status field is being updated by the controller, that value isn’t in git. Organize in one paragraph, connecting to Chapter 20 GitOps, how ArgoCD’s drift detection behaves differently when the status subresource is separated and when it isn’t.

In one line: a CRD extends the K8s API with a new object kind, and a controller-runtime-based Operator applies the reconcile loop of Chapter 1 to that object, extending K8s’s declarative model into your own domain. The three standard operational patterns are ownerReference (auto-cleanup of children) · finalizer (a hook for cleaning up external resources) · status subresource (preventing GitOps conflicts). Core production-cluster tools like cert-manager · External Secrets · CloudNativePG · Karpenter all stand on this pattern.

Next chapter

Up through this chapter we’ve followed the depth of K8s itself: CNI · RBAC · Admission · CRD. With the CRD we’ve reached the extension mechanism of the K8s object model. The next chapter shifts the vantage point one notch: how to observe the cluster on which all these components run.

Chapter 19 Observability covers the three dimensions of metrics · logs · traces of the cluster and workloads. It organizes the metric stack of Prometheus + Grafana, the log aggregation of Loki, the distributed traces of OpenTelemetry, and the standard exporters that layer on top, like kube-state-metrics · node_exporter. The point that the signals of Chapter 11 resources.requests / limits · Chapter 12 Health check · Chapter 13 Autoscaling all reach the operator through this observability stack becomes the final knot of these chapters.