Certified Kubernetes Administrator (CKA) #17 Storage 2: StorageClass, Dynamic Provisioning, Reclaim Policy, Expansion

In #16 Storage 1 we covered PV and PVC, along with static provisioning — where an administrator creates PVs ahead of time. The static approach works reliably, but every time a user requests storage, an administrator has to create a PV by hand. In a cluster with dozens of nodes and PVC requests arriving constantly, this approach quickly becomes a bottleneck.

This post covers StorageClass and dynamic provisioning, which remove that bottleneck. We’ll make a PV appear automatically from nothing more than a single PVC. Then, from an operations perspective, we’ll sort out two things: the reclaim policy, which decides whether data is kept or deleted once its PVC is gone, and volume expansion, which grows an in-use volume without downtime.

The limits of static provisioning

Let’s recall the flow of the static approach from #16. An administrator first creates a PV, then a user requests capacity and access mode with a PVC, and the controller finds a PV that meets the conditions and ties it to the PVC (binding). The problem is that the PV has to be created in advance.

  • An administrator can’t know in advance the capacity and number of requests users will make.
  • Having an administrator create a PV by hand every time a request comes in increases operational burden.
  • If the capacity of a pre-created PV doesn’t exactly match the request, resources are wasted.

Dynamic provisioning flips this flow around. The moment a user creates a PVC, the StorageClass that PVC points to creates the PV automatically. The administrator doesn’t have to create PVs one by one — they only need to prepare a single StorageClass that defines what kind of storage to create and how.

What is a StorageClass

A StorageClass is a cluster-scoped object that defines a class of storage. It’s a template that says, “when this grade of storage is requested, create a volume this way.” It doesn’t belong to a namespace.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Here’s what each field means.

  • provisioner: the entity that actually creates the volume. It specifies which CSI driver or in-tree plugin handles the provisioning. On a cloud it’s a CSI driver such as AWS EBS, GCE Persistent Disk, or Azure Disk; on-premises it’s typically NFS, Ceph, or a local provisioner.
  • parameters: options passed to the provisioner. Values like disk type (gp3, premium), filesystem, and replica count are defined differently for each provisioner.
  • reclaimPolicy: how PVs created by this StorageClass are handled after their PVC is gone. We’ll cover this in detail later.
  • volumeBindingMode: when to bind and provision the PV. This value has a big effect on behavior.
  • allowVolumeExpansion: this must be true to later grow the PVC’s capacity.

The flow of dynamic provisioning

In dynamic provisioning, the only thing a user creates is a single PVC. When a PVC specifies a StorageClass with storageClassName, that StorageClass’s provisioner automatically creates a matching PV and binds it to the PVC.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi

When you apply this PVC, things proceed in the following order.

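  1. The user creates data-pvc, specifying the grade with storageClassName: fast-ssd.
  2. Because fast-ssd sets volumeBindingMode: WaitForFirstConsumer, the PVC stays Pending until a Pod that uses it is scheduled. (With Immediate, this wait is skipped.)
  3. The provisioner of the fast-ssd StorageClass creates a 20Gi volume on actual storage (e.g., EBS).
  4. A PV representing that volume is created automatically and bound to data-pvc.
  5. The Pod mounts the PVC and starts using the volume.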

The key point is that the administrator never created a PV by hand. A PV appeared automatically with nothing more than the PVC and the StorageClass.

# List StorageClasses (check the default marker)
k get storageclass

# Check whether the PVC is bound to a PV
k get pvc data-pvc
k get pv
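
To watch the flow end to end, some Pod has to reference the PVC. The following is a minimal sketch that writes such a Pod manifest to a file. Apart from data-pvc, every name in it is an assumption made for illustration: the Pod name data-pod, the busybox image, the sleep command, and the /data mount path.

```shell
# Write a minimal Pod manifest that mounts data-pvc.
# data-pod, busybox, and /data are hypothetical choices, not part of the original setup.
cat <<'EOF' > data-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data          # where the volume appears inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc         # the PVC created above
EOF
```

After k apply -f data-pod.yaml, and assuming the fast-ssd class and its CSI driver are installed, k get pvc data-pvc should move from Pending to Bound once the Pod is scheduled.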

The default StorageClass

If you omit storageClassName in a PVC, the cluster’s default StorageClass is used. The default is designated by an annotation on the StorageClass.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

The StorageClass whose annotation value is "true" becomes the default. To switch the default to another StorageClass, set the current default’s annotation to "false" and set the new StorageClass’s annotation to "true". It’s safest to keep only one default. If two or more are marked, the result depends on the Kubernetes version: recent versions use the most recently created default, while older versions reject PVCs that omit storageClassName.

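# Switch the default from standard to fast-ssd: unset the old one, then set the new one
k patch storageclass standard \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
k patch storageclass fast-ssd \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'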
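
After changing the annotation, it is worth confirming that exactly one class is marked as default. kubectl get storageclass appends (default) to the name of a default class, so a small filter can count the markers. This is only a sketch: count_default_classes is a hypothetical helper name, and it reads the command’s text output from stdin rather than calling the cluster itself.

```shell
# Hypothetical helper: count lines carrying the "(default)" marker
# in `kubectl get storageclass` output read from stdin.
count_default_classes() {
  # grep exits non-zero when nothing matches but still prints 0,
  # so "|| true" keeps the function's exit status clean.
  grep -c '(default)' || true
}
```

Running k get storageclass | count_default_classes should print 1 on a healthy cluster. A result of 0 or 2 means the default is missing or ambiguous.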

Specifying storageClassName: "" (an empty string) explicitly means you don’t want to use the default and want to turn dynamic provisioning off. In that case, the PVC binds only to a statically created PV whose own storageClassName is also empty. This distinction is a frequent point of confusion on the exam.
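
A PVC that opts out of dynamic provisioning can be sketched as follows. The name static-pvc and the 5Gi size are assumptions for illustration only.

```shell
# Write a PVC manifest that disables dynamic provisioning.
# static-pvc and 5Gi are hypothetical values.
cat <<'EOF' > static-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""   # empty string: skip the default class, bind only to class-less PVs
  resources:
    requests:
      storage: 5Gi
EOF
```

If no matching class-less PV exists, this PVC stays Pending instead of triggering provisioning. Removing the storageClassName line entirely would hand the request to the default StorageClass.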

volumeBindingMode

volumeBindingMode determines when the PV is created and bound. The difference between the two values is directly tied to scheduling, so you need to understand it precisely.

| Value | Behavior | Use |
|---|---|---|
| Immediate | Provisions and binds the PV the instant the PVC is created | Network storage accessible from anywhere |
| WaitForFirstConsumer | Delays binding until a Pod that uses the PVC is scheduled | Storage with topology constraints (a specific zone or node) |

Immediate creates the volume the moment the PVC comes into existence. But for storage tied to a specific zone or node, like cloud disks or local volumes, this causes a problem. If the volume is created in one zone first while it hasn’t yet been decided which node the Pod will go to, and the scheduler picks a node in a different zone, the volume can’t be mounted.

WaitForFirstConsumer avoids this problem. It delays binding and provisioning until the Pod is actually scheduled, then creates a volume matching the topology of the node the scheduler picked. For storage with topology constraints, this value is effectively the standard choice, even though the field itself falls back to Immediate when you omit it.

Reclaim policy: do you protect the data

reclaimPolicy determines how the PV and the actual data are handled after a PVC is deleted and the PV is released. When you specify it on a StorageClass, the PVs that class creates inherit that policy.

| Value | Behavior after PVC deletion | Data |
|---|---|---|
| Delete | Deletes the PV and the actual volume (EBS disk, etc.) together | Gone |
| Retain | Leaves the PV in the Released state and preserves the volume too | Preserved |

Delete is the default for dynamic provisioning. When you delete a PVC, the PV — and the actual disk behind it — are deleted along with it. It’s suitable for temporary or recreatable data and wastes no resources. The catch is the risk that if you delete a PVC by mistake, the data disappears immediately.

Retain keeps the PV and the actual data even when you delete the PVC. When the PVC is gone, the PV moves to the Released state, and a PV in this state is not automatically bound to another PVC. To recover or reuse the data, an administrator has to intervene by hand.

# Clear the claimRef of a Released PV to make it Available again
k patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
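
Once the PV is Available again, a new PVC can be pointed at it explicitly with spec.volumeName instead of waiting for the controller to pick it. This is a minimal sketch under stated assumptions: the PV came from the fast-ssd class and holds 20Gi, as in the earlier example. The PVC name restored-pvc and the fallback PV name pv-example are hypothetical placeholders.

```shell
# Name of the PV to reattach; pv-example is a placeholder, override it via PV_NAME.
PV_NAME="${PV_NAME:-pv-example}"

# Unquoted heredoc delimiter so ${PV_NAME} is expanded into the manifest.
cat <<EOF > restored-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # must match the PV's storageClassName
  volumeName: ${PV_NAME}       # bind to this specific PV
  resources:
    requests:
      storage: 20Gi            # must not exceed the PV's capacity
EOF
```

Applying the file with k apply -f restored-pvc.yaml should bind the claim to the named PV, with the old data still on the volume.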

For data you can’t afford to lose, like a database, it’s safest to use Retain so that deleting a PVC doesn’t lead straight to deleting the data. The policy of an already-created PV can be changed after the fact, like this.

# Change a specific PV to Retain (prevent accidental deletion)
k patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

A StorageClass’s reclaimPolicy applies to the PVs that class will create going forward. It doesn’t affect PVs that already exist, so to protect existing PVs you have to patch the PV directly as shown above.
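
For new workloads, one option is a dedicated class whose PVs are Retain from the moment they are created. The sketch below reuses the provisioner and disk type from the fast-ssd example. The class name fast-ssd-retain is a hypothetical choice.

```shell
# Write a StorageClass manifest whose PVs default to Retain.
# fast-ssd-retain is a hypothetical name; the other values mirror fast-ssd.
cat <<'EOF' > fast-ssd-retain.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-retain
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain              # PVs from this class keep their data after PVC deletion
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
```

PVCs for databases would then set storageClassName: fast-ssd-retain. The trade-off is that Released PVs and their disks pile up until someone cleans them up by hand.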

Volume expansion: growing a volume

When a disk fills up during operation, you have to grow the volume. Kubernetes supports expansion without downtime by growing the PVC’s requested capacity. There are two conditions.

  • allowVolumeExpansion: true must be set on the relevant StorageClass.
  • The provisioner (CSI driver) must support expansion.

To expand, you raise the PVC’s spec.resources.requests.storage to a larger value.

# Expand a 20Gi PVC to 50Gi
k patch pvc data-pvc \
  -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

One thing to watch out for: shrinking is impossible. Capacity can only grow, never shrink. Also, expansion involves two steps — growing the actual storage and growing the filesystem — and in some cases the filesystem expansion finishes while the Pod is up, while in others it requires a Pod restart. Which one it is depends on the CSI driver, so you need the habit of checking the PVC and Pod state together after expansion.

# Check expansion progress
k get pvc data-pvc
k describe pvc data-pvc   # Check Conditions for FileSystemResizePending, etc.
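
One way to tell whether a resize has fully landed is to compare the size the PVC requests with the size it reports in its status, since status.capacity is typically updated only after the filesystem step finishes. The sketch below is an illustration, not a standard tool: pvc_sizes and resize_state are hypothetical helper names, and the comparison is a plain string match that assumes both values use the same unit.

```shell
# Print "<requested><TAB><actual>" for a PVC (needs cluster access).
pvc_sizes() {
  kubectl get pvc "$1" \
    -o jsonpath='{.spec.resources.requests.storage}{"\t"}{.status.capacity.storage}{"\n"}'
}

# Hypothetical helper: read "<requested> <actual>" on stdin and report the state.
resize_state() {
  read -r requested actual
  if [ "$requested" = "$actual" ]; then
    echo "resize complete: $actual"
  else
    echo "resize pending: requested $requested, actual ${actual:-unknown}"
  fi
}
```

Usage would be pvc_sizes data-pvc | resize_state. If it keeps reporting pending, the Conditions in k describe pvc data-pvc show whether a Pod restart is needed.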

CSI in one line

A value like ebs.csi.aws.com in the provisioner field is the name of a CSI (Container Storage Interface) driver. CSI is a standard interface that lets Kubernetes handle volumes from various storage vendors in a uniform way, and features like dynamic provisioning, snapshots, and expansion work only when the CSI driver supports them.

Exam points

The Storage domain in CKA carries a 10% weight, but dynamic provisioning and reclaim policy come up often. Practice the following until you can do them without looking anything up.

  • Switching the default StorageClass: a common task is to set one class’s default annotation to "false" and another class’s to "true". Make sure you don’t end up with two defaults.
  • The difference between omitting storageClassName and an empty string: omitting it uses the default, while "" turns dynamic provisioning off and binds only to static PVs.
  • Protecting data by changing reclaimPolicy: changing a specific PV to Retain so data doesn’t disappear when the PVC is deleted is a classic task. Memorize k patch pv.
  • Reusing a Released PV: a Retain PV becomes Available again only after you clear its claimRef.
  • volume expansion: confirm allowVolumeExpansion: true, then grow the PVC’s requests.storage. Shrinking is impossible.
  • WaitForFirstConsumer: you should be able to explain why a PVC in this mode sits at Pending simply because no Pod that uses it has been scheduled yet.

In particular, a situation where a PVC stalls at Pending is directly tied to troubleshooting. With WaitForFirstConsumer, it’s an intended Pending until you attach a Pod, but if there’s no default StorageClass or the storageClassName is wrong, it’s a real error. The instinct to distinguish these two through the events in k describe pvc is what makes the difference in your score.
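
The distinction above can be sketched as a small filter over the describe output. Treat it as an illustration only: classify_pending is a hypothetical helper, and the event messages it matches are the ones recent Kubernetes versions emit as far as I know, so verify them against your own cluster before relying on them.

```shell
# Hypothetical helper: classify why a PVC is Pending,
# reading `kubectl describe pvc` text from stdin.
# The matched message fragments are assumptions about Kubernetes event wording.
classify_pending() {
  events=$(cat)
  case "$events" in
    *"waiting for first consumer"*)
      echo "expected: create a Pod that uses the PVC" ;;
    *storageclass*"not found"*)
      echo "error: storageClassName names a class that does not exist" ;;
    *"no storage class is set"*)
      echo "error: no storageClassName and no default StorageClass" ;;
    *)
      echo "unknown: read the Events section by hand" ;;
  esac
}
```

Usage would be k describe pvc data-pvc | classify_pending. On the exam you will read the events yourself, but the three branches are the ones worth recognizing at a glance.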

Wrap-up

What this post locked in:

  • A StorageClass is a cluster-scoped template that defines a storage grade. It’s made up of provisioner, parameters, reclaimPolicy, volumeBindingMode, and allowVolumeExpansion.
  • Dynamic provisioning is the approach where, when a PVC specifies a StorageClass, a PV is created automatically. The administrator doesn’t need to create PVs by hand.
  • The default StorageClass is designated by an annotation and applies when storageClassName is omitted. An empty string means turning dynamic provisioning off.
  • volumeBindingMode splits into Immediate (right away) and WaitForFirstConsumer (wait until the Pod is scheduled), and for storage with topology constraints the latter is safe.
  • reclaimPolicy splits into Delete (delete the PV and data together) and Retain (preserve). Protect important data with Retain.
  • Volume expansion grows an in-use volume by raising the PVC’s requested capacity, provided the StorageClass sets allowVolumeExpansion: true and the CSI driver supports it. Shrinking is impossible.

Next: Networking 1

Now that we’ve finished storage, we move on to networking. Pods die and come back at any time and their IPs change, so we need a stable access point. That access point is the Service.

In #18 Networking 1: Service we’ll sort out, with YAML, how ClusterIP (in-cluster access), NodePort (node port exposure), LoadBalancer (external load balancing), and ExternalName (DNS alias) each work, how selectors and endpoints are tied together, and how kube-proxy routes traffic.
