Certified Kubernetes Application Developer (CKAD) #16 Resource Management: requests/limits, QoS Class, LimitRange

If #15 SecurityContext and Capabilities pinned down what privileges a container runs with, this post covers how much of the node’s resources a container may consume. If a Pod consumes the node’s CPU and memory without limit, a single workload can paralyze the entire node. Kubernetes has each container declare “guarantee me at least this much (requests)” and “no more than this (limits)”, taking care of both scheduling and stability.

This topic belongs to CKAD’s largest domain, Application Environment, Configuration and Security (25%). On the exam it shows up both as tasks where you attach requests/limits with exact units, and as identification questions such as “what is this Pod’s QoS class” or “why did it get OOMKilled”. Knowing the units and behavior by heart lets you pick up these points quickly.

requests vs limits: what’s the difference #

Resource declarations go under spec.containers[].resources, per container. The two keys mean different things.

  • requests: the amount this container must be guaranteed at minimum. The scheduler places the Pod only on a node whose allocatable resources, minus the requests of the Pods already running there, can still fit this value. In other words, requests is the basis for scheduling.
  • limits: the ceiling this container is allowed to use. The runtime enforces that the container cannot exceed this value.

With requests but no limits, the container can use the node’s spare capacity without a ceiling; with limits but no requests, Kubernetes treats requests as equal to limits.
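
As a small illustration of the two omission cases above, the sketch below puts both in one Pod. The Pod and container names are made up for the example, and it assumes a namespace with no LimitRange, since a LimitRange would fill in the missing values itself.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: partial-resources-demo   # hypothetical name
spec:
  containers:
    - name: requests-only
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:                # used for scheduling; no limits, so no ceiling
          cpu: "100m"
          memory: "64Mi"
    - name: limits-only
      image: busybox
      command: ["sleep", "3600"]
      resources:
        limits:                  # requests are omitted, so they default to these values
          cpu: "200m"
          memory: "128Mi"
```

Running k get pod partial-resources-demo -o yaml afterwards should show the limits-only container with requests equal to its limits.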

Units: write CPU and memory exactly #

A frequently missed detail on CKAD is units. CPU and memory use different notation systems.

| Resource | Units | Meaning |
| --- | --- | --- |
| CPU | 1, 0.5, 500m | 1 is one vCPU/core, equal to 1000m (millicores); 500m is 0.5 core |
| memory | 128Mi, 1Gi, 512M | Mi/Gi are binary (1Mi = 1024Ki), M/G are decimal (1M = 1000k) |

The m in CPU is millicore, so 500m equals 0.5 core. For memory, Mi (mebibyte) and M (megabyte) are different values, so if the exam asks for Mi, don’t write M. By convention, Mi and Gi are the usual choice.

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "250m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"

This Pod is guaranteed 0.25 core and 128Mi, and can use up to 0.5 core and 256Mi. The scheduler finds a node where the sum of requests fits and places the Pod there.
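
As a quick reference for the unit rules above, the fragment below writes the same kind of block with alternative notations. It is only the resources part of a container spec, and the values are chosen purely to contrast the suffixes.

```yaml
# Fragment of a container spec: only the resources block is shown.
resources:
  requests:
    cpu: "0.5"        # same quantity as "500m"
    memory: "128M"    # decimal: 128 * 1000 * 1000 = 128,000,000 bytes
  limits:
    cpu: "1"          # same quantity as "1000m"
    memory: "128Mi"   # binary: 128 * 1024 * 1024 = 134,217,728 bytes
```

The two memory values differ by 6,217,728 bytes, which is why M and Mi are not interchangeable when a task names an exact unit.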

What happens when you exceed the ceiling #

CPU and memory behave very differently when a container pushes past its limits. This difference comes up regularly on the exam.

  • CPU exceeded: CPU is a compressible resource. A container that tries to use more than its limit isn’t killed; it’s throttled, meaning it only gets CPU time up to the limit. It slows down but doesn’t terminate.
  • memory exceeded: memory is an incompressible resource. When usage goes past the limit, the kernel’s OOM killer terminates the container process and the container state shows OOMKilled. The container restarts according to restartPolicy, and repeated overruns lead to CrashLoopBackOff.

When you see OOMKilled, the cause is almost always that the memory limit is lower than actual usage. There are two ways to fix it: raise the limit, or reduce the application’s memory usage.

# Check the termination cause
k describe pod resource-demo | grep -A3 "Last State"
# Last State: Terminated
#   Reason:   OOMKilled

QoS class: three tiers and eviction #

Kubernetes automatically assigns a Pod a Quality of Service (QoS) class based on how you specified requests and limits. This tier determines who gets kicked out first (eviction) when the node runs short on resources.

| QoS class | Condition | Eviction priority |
| --- | --- | --- |
| Guaranteed | Every container specifies both cpu and memory, with requests == limits | Evicted last |
| Burstable | Some requests/limits specified, but the Guaranteed condition is unmet | Middle |
| BestEffort | No requests and no limits at all | Evicted first |

The core points are as follows.

  • Guaranteed: when every container specifies both cpu and memory, and for each one requests equals limits. It’s the last to be evicted when the node runs short on memory.
  • Burstable: when requests exist but limits are larger, or only some resources are specified. Usage beyond requests is what makes such a Pod an eviction candidate under pressure.
  • BestEffort: a Pod with no resource declarations at all. Under node pressure it becomes the first eviction target.
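
For contrast with the Guaranteed manifest, here is a minimal sketch of the other two classes. The Pod names are hypothetical, and the result assumes a namespace without a LimitRange, which would otherwise inject defaults and change the class.

```yaml
# BestEffort: no resources block on any container
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-demo
spec:
  containers:
    - name: app
      image: nginx
---
# Burstable: only a memory request is set, so the Guaranteed condition is unmet
apiVersion: v1
kind: Pod
metadata:
  name: burstable-demo
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          memory: "128Mi"
```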

A manifest that becomes Guaranteed #

Setting requests and limits to the same value for both cpu and memory yields Guaranteed. When the exam gives you a task like “make this Pod Guaranteed”, use this pattern.

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"

Even if you specify only limits, Kubernetes fills in requests with the same value, so the result is still Guaranteed. On the exam, however, specifying both requests and limits is safer for making your intent explicit.

Checking the QoS class #

The assigned QoS class shows up directly via describe.

k describe pod guaranteed-demo | grep "QoS Class"
# QoS Class:  Guaranteed

LimitRange: setting defaults and bounds on a namespace #

Omitting requests/limits on individual Pods makes them BestEffort, which is risky. LimitRange is a policy object that fills in defaults for containers within a namespace and enforces a minimum/maximum allowed range.

  • default: the default limits to fill in when a container doesn’t specify limits.
  • defaultRequest: the default requests to fill in when requests aren’t specified.
  • min / max: the lower and upper bounds a container may specify. A Pod outside this range is rejected at creation.

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-limits
  namespace: dev
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "250m"
        memory: "128Mi"
      min:
        cpu: "100m"
        memory: "64Mi"
      max:
        cpu: "1"
        memory: "512Mi"

In a dev namespace where this LimitRange applies, if you create a Pod without requests/limits, the container is automatically assigned requests 250m/128Mi and limits 500m/256Mi, making it Burstable. Conversely, if you submit limits that exceed max, the Pod is rejected.

k apply -f limitrange.yaml
k describe limitrange cpu-mem-limits -n dev
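
The two outcomes described above can be tried with a pair of Pods like the following. This is a sketch that assumes the cpu-mem-limits LimitRange is already applied in dev; the Pod names are made up.

```yaml
# Submitted with no resources block; the LimitRange fills in the defaults.
apiVersion: v1
kind: Pod
metadata:
  name: defaults-demo
  namespace: dev
spec:
  containers:
    - name: app
      image: nginx
      # After admission, the stored spec is expected to carry:
      #   requests: cpu 250m, memory 128Mi   (from defaultRequest)
      #   limits:   cpu 500m, memory 256Mi   (from default)
---
# Expected to be rejected at creation: the cpu limit is above max (cpu: "1").
apiVersion: v1
kind: Pod
metadata:
  name: over-max-demo
  namespace: dev
spec:
  containers:
    - name: app
      image: nginx
      resources:
        limits:
          cpu: "2"
          memory: "256Mi"
```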

ResourceQuota: capping the namespace total #

Where LimitRange handles the defaults and ranges of each individual container, ResourceQuota restricts the total across the entire namespace. Its purpose is to prevent one team from monopolizing cluster resources. Once a ResourceQuota constrains requests or limits for cpu or memory, every new Pod in that namespace must carry those values, whether written in the manifest or filled in by a LimitRange; a Pod without them is rejected.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "2Gi"
    limits.cpu: "4"
    limits.memory: "4Gi"
    pods: "10"

This ResourceQuota caps, across all Pods in the dev namespace, the sum of requests at cpu 2 cores and memory 2Gi, the sum of limits at cpu 4 cores and memory 4Gi, and the Pod count at 10. Putting LimitRange and ResourceQuota together applies “automatic default fill-in + total ceiling” at the same time.

k apply -f resourcequota.yaml
k get resourcequota dev-quota -n dev
# Show current usage alongside the limit
k describe resourcequota dev-quota -n dev
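
To see how the totals add up against dev-quota, consider a Deployment like the sketch below. The name and replica count are hypothetical, and the arithmetic in the comments assumes nothing else is running in dev.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quota-demo             # hypothetical name
  namespace: dev
spec:
  replicas: 4
  selector:
    matchLabels:
      app: quota-demo
  template:
    metadata:
      labels:
        app: quota-demo
    spec:
      containers:
        - name: app
          image: nginx
          resources:
            requests:
              cpu: "500m"      # 4 x 500m  = 2 cores (quota requests.cpu: 2)
              memory: "512Mi"  # 4 x 512Mi = 2Gi     (quota requests.memory: 2Gi)
            limits:
              cpu: "1"         # 4 x 1     = 4 cores (quota limits.cpu: 4)
              memory: "1Gi"    # 4 x 1Gi   = 4Gi     (quota limits.memory: 4Gi)
```

Four replicas use the quota exactly. Scaling to five would still be accepted at the Deployment level, but the fifth Pod would be refused by the quota, which shows up in the ReplicaSet’s events.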

Checking real usage: k top #

To see whether the declared values match actual usage, use kubectl top. This command works only when metrics-server is installed in the cluster.

# Actual per-Pod CPU/memory usage
k top pod
k top pod resource-demo -n dev

# Per-node usage
k top node

The typical tuning approach is to take real usage from k top as the baseline, setting requests to typical usage and limits to peak usage.
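
As a worked example of that approach, suppose (purely hypothetically) that k top pod showed around 120m CPU and 150Mi memory in steady state, with peaks near 400m and 300Mi. A resources block derived from those numbers might look like this; the small headroom added on top is a choice made for this sketch, not a rule.

```yaml
# Fragment of a container spec, sized from assumed k top readings.
resources:
  requests:
    cpu: "150m"      # a little above the assumed typical 120m
    memory: "160Mi"  # a little above the assumed typical 150Mi
  limits:
    cpu: "500m"      # headroom over the assumed 400m peak
    memory: "384Mi"  # headroom over the assumed 300Mi peak, to avoid OOMKilled
```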

Exam points #

  • Identifying the QoS class is key. If requests and limits are specified for both cpu/memory and the two values are equal, it’s Guaranteed; if only some are present or the values differ, Burstable; if neither is present, BestEffort. Confirm instantly from the QoS Class line of describe.
  • OOMKilled = memory limits exceeded. Read Reason: OOMKilled from Last State in k describe pod, and resolve it by raising memory limits or reducing usage. Don’t confuse this with CPU overruns, which don’t kill but only throttle.
  • Write units exactly. For CPU, 500m = 0.5 core; for memory, Mi/Gi (binary) and M/G (decimal) are different values. If the question asks for Mi, don’t write M.
  • Distinguish LimitRange’s default vs defaultRequest. default is the limits default, defaultRequest is the requests default.
  • Building the skeleton with dry-run and then adding only resources is the fast flow. After k run app --image=nginx $do > pod.yaml, edit the resources block. ($do is assumed here to be a shell variable holding --dry-run=client -o yaml.)

Resource management fits in a single manifest block, but the exam also asks identification questions about QoS and OOMKilled behavior, so knowing the units and class conditions by heart translates straight into points. The broader context of resource requests and node scheduling is also covered in K8s Intermediate #4, which is worth reading alongside this.

Summary #

What this post locked in:

  • requests is the scheduling basis, limits is the ceiling. CPU is written in m (millicore), memory in Mi/Gi units.
  • CPU overrun is throttling, memory overrun is OOMKilled. CPU slows down without dying, but memory terminates.
  • Three QoS classes: Guaranteed (requests == limits, all specified), Burstable (partially specified), BestEffort (unspecified). Eviction starts from BestEffort.
  • LimitRange sets default/defaultRequest and min/max on a namespace, while ResourceQuota restricts the namespace total.
  • Verify with k describe pod (QoS Class, OOMKilled) and k top pod (real usage).

Next: Volumes #

Now that the amount of resources is settled, we turn to where a container puts its data.

#17 Volumes: emptyDir, PVC, projected, ephemeral walks through four volume types hands-on in YAML: emptyDir for sharing a directory between containers inside a Pod, PersistentVolumeClaim for requesting persistent storage, projected volumes for mounting ConfigMaps and Secrets gathered in one place, and ephemeral volumes whose lifetime is bound to the Pod.
