Hardware Basics #3: Memory — RAM, the Hierarchy, and What Happens When Swapping Starts

Wednesday, March 25, 2026

In #2 we said the CPU keeps a cache to cut down the time it idles waiting on memory. This post is about what the CPU is waiting on. Memory is quiet most of the time, then drops the whole system off a cliff the moment it runs short. Understand that behavior and you can answer “memory usage is high, is that okay?” by measuring rather than guessing.

What RAM is #

RAM (Random Access Memory) is the working space where the CPU keeps the data it’s using right now. When you run a program, its code and data are loaded from storage into RAM, and the CPU reads and writes what’s in RAM.

RAM has two important properties.

Volatile — its contents vanish when the power goes off. Permanent data belongs on storage.
Random access — it reads any location at roughly the same speed, unlike some storage that is fast only when read sequentially.

The core idea is simple: RAM is fast but limited, and it empties when the power goes off.

The memory hierarchy #

A computer’s storage stack isn’t one thing but several layers. Higher up is faster, smaller, and pricier; lower down is slower, bigger, and cheaper.

Layer	Rough access time	Size	Volatile?
Registers	under 1 ns	hundreds of bytes	volatile
L1 cache	about 1 ns	tens of KB	volatile
L2 / L3 cache	a few – tens of ns	a few MB	volatile
RAM	about 100 ns	GB – hundreds of GB	volatile
SSD / NVMe	tens – hundreds of μs	hundreds of GB – TB	non-volatile
HDD	a few ms	TB	non-volatile

Notice how the units jump with each row down. Against RAM at about 100 ns, an SSD is hundreds to thousands of times slower and an HDD tens of thousands of times slower.

Feeling the gap — if a RAM access were 1 second

RAM access   100 ns   →  1 second
SSD access   100 μs   →  about 17 minutes
HDD access   10 ms    →  about 28 hours

This gap is why swapping, covered below, hurts so much. The difference between finding data in RAM and having to go all the way down to disk is this large.
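The rescaling above is plain division, and a minimal sketch makes it easy to redo with your own numbers. The latencies are the rough figures from the table, not measurements, and the function names are illustrative.

```python
# Rescale access times so that one RAM access (assumed ~100 ns) lasts 1 second.
RAM_NS = 100  # rough RAM access time taken from the table above

# Rough figures from the table, expressed in nanoseconds.
LATENCIES_NS = {
    "RAM": 100,
    "SSD": 100_000,      # 100 microseconds
    "HDD": 10_000_000,   # 10 milliseconds
}


def human_scale_seconds(latency_ns: float, base_ns: float = RAM_NS) -> float:
    """Seconds an access would take if a base_ns access took 1 second."""
    return latency_ns / base_ns


def describe(seconds: float) -> str:
    """Format a duration using the largest convenient unit."""
    if seconds < 60:
        return f"{seconds:.0f} s"
    if seconds < 3600:
        return f"{seconds / 60:.0f} min"
    return f"{seconds / 3600:.0f} h"


for name, ns in LATENCIES_NS.items():
    print(f"{name}: {describe(human_scale_seconds(ns))}")
```

Swap in the latency of your own storage to see where it lands on the same scale.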

Capacity, bandwidth, and latency are different axes #

Three things get blurred together when people talk about memory. Keep them apart and the spec sheet reads differently.

Axis	Meaning	When short
Capacity	how much you can hold at once (GiB)	swapping begins
Bandwidth	how much you can move per second (GB/s)	bulk processing backs up
Latency	time per single access (ns)	frequent random access slows

In operations, capacity is the one you hit most. The moment capacity runs short, the system starts borrowing the slow disk, and that’s when performance collapses.
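To see why bandwidth and latency are separate axes, here is a back-of-the-envelope sketch. The 20 GB/s bandwidth and 100 ns latency are assumed round numbers rather than any particular machine's specs, and the model ignores caches, prefetching, and parallel accesses.

```python
GB = 10**9  # decimal gigabyte, matching the GB/s unit used for bandwidth


def bulk_transfer_seconds(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream size_bytes sequentially: limited by bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


def dependent_access_seconds(accesses: int, latency_ns: float) -> float:
    """Time for accesses that each wait on the previous one: limited by latency."""
    return accesses * latency_ns / 1e9


# Assumed figures: 20 GB/s memory bandwidth, 100 ns per random access.
stream = bulk_transfer_seconds(8 * GB, 20 * GB)
chase = dependent_access_seconds(100_000_000, 100)

print(f"stream 8 GB sequentially:       {stream:.1f} s")
print(f"100M dependent random accesses: {chase:.1f} s")
```

Under these assumptions, the random-access workload touches far less data than the streaming one yet takes much longer, because each access pays the full latency.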

When memory runs short — swap #

When memory fills up, the OS moves data it isn’t actively using to a region of the disk to free up space in RAM. That region is swap, and the act of moving is swapping. The Linux swap setup was covered with actual commands in RHEL Basics #6.

Swap is a safety net that keeps processes from dying when memory runs low. But it comes at a steep cost. Recall the table: RAM is about 100 ns, disk is hundreds to tens of thousands of times slower. Once frequently used data is pushed to swap, the CPU waits on disk every time.

When swapping takes over

Normal:  CPU → RAM (100 ns)                  fast response
Short:   CPU → swap (disk, ~ms)              response thousands of times slower
Severe:  pushed out as soon as read (thrashing)   system looks nearly frozen

Pushing data to disk and immediately needing it back, then pushing out something else, over and over, is called thrashing. The CPU then spends its time waiting on data movement rather than computing, and the system slows as if frozen. It’s a common cause of “the server suddenly became unresponsive.”
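One way to tell whether swapping is actually happening is to compare the kernel's cumulative swap counters over an interval. The sketch below assumes Linux, where /proc/vmstat exposes pswpin and pswpout as pages swapped in and out since boot. The sample text and function names are illustrative, not real system output.

```python
def parse_vmstat(text: str) -> dict[str, int]:
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_per_second(before: dict[str, int], after: dict[str, int],
                          interval_s: float) -> tuple[float, float]:
    """Return (swap-in, swap-out) rates in pages/s between two snapshots."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in / interval_s, swapped_out / interval_s


# Made-up snapshots taken 2 seconds apart (illustrative values only).
BEFORE = "pswpin 1000\npswpout 5000\nnr_free_pages 12345\n"
AFTER = "pswpin 1600\npswpout 5200\nnr_free_pages 12000\n"

rate_in, rate_out = swap_pages_per_second(
    parse_vmstat(BEFORE), parse_vmstat(AFTER), interval_s=2.0
)
print(f"swap-in {rate_in:.0f} pages/s, swap-out {rate_out:.0f} pages/s")
```

On a real host you would read /proc/vmstat twice with a sleep in between. Sustained nonzero rates in both directions are the pattern to worry about, while a one-off burst of swap-out is usually harmless.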

OOM Killer — Linux’s last resort #

When swap also fills and there’s no more memory to give, Linux triggers the OOM Killer (Out Of Memory Killer). It picks a victim by a per-process badness score, which is driven mainly by how much memory the process uses, and force-terminates it to keep the whole system from locking up.

In operations, if a database or application “died for no reason,” suspect the OOM Killer. The kernel log (dmesg or journalctl -k) carries a line like Out of memory: Killed process. The fix is to add memory, reduce the process’s memory use, or, for a container, adjust its memory limit.
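A small sketch of scanning saved log text for that message. The assumption is that the marker is followed by a PID and a process name in parentheses. The exact layout and trailing fields vary by kernel version, and the sample lines are made up for illustration.

```python
import re

# Assumed shape: "Out of memory: Killed process <pid> (<name>) ...".
# Matching is case-insensitive so capitalization variants are also caught.
OOM_PATTERN = re.compile(
    r"out of memory: killed process (\d+) \(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(log_text: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for each OOM kill line found in log_text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_PATTERN.finditer(log_text)]


# Illustrative log text, not captured from a real system.
SAMPLE_LOG = """\
kernel: Out of memory: Killed process 4242 (exampled)
kernel: some unrelated message
"""

for pid, name in find_oom_kills(SAMPLE_LOG):
    print(f"OOM Killer terminated pid {pid} ({name})")
```

Feed it the kernel log text, for example the saved output of dmesg, to list which processes were killed.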

Page cache — spare memory isn’t idle #

On Linux, free often shows little free memory and people panic. Most of the time it’s normal. Linux uses spare RAM as a disk cache (the page cache).

Once a file is read, caching it in RAM means the next read of the same file comes straight from RAM instead of going to disk. The kernel reclaims this cache when an application needs the memory; clean pages can simply be dropped, while modified pages are written back to disk first.

How to read free -h

              total   used   free   buff/cache   available
Mem:          16Gi    6Gi    0.5Gi   9.5Gi        9Gi
                                     └─ cache     └─ what you can actually use

The value to watch isn’t free but available. Since buff/cache is reclaimed when needed, ample available means memory isn’t short. There’s no need to panic that free is near zero.
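The same comparison can be scripted from /proc/meminfo, which reports MemTotal, MemFree, and MemAvailable. The sketch below parses text in that format. The sample mirrors the free -h example above rather than a real host, and the function names are illustrative.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse '/proc/meminfo'-style text into {field: value in KiB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ratio(fields: dict[str, int], key: str) -> float:
    """Fraction of total memory represented by the given field."""
    return fields[key] / fields["MemTotal"]


# Sample matching the example above: 16 GiB total, 0.5 GiB free, 9 GiB available.
SAMPLE_MEMINFO = """\
MemTotal:       16777216 kB
MemFree:          524288 kB
MemAvailable:    9437184 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(f"free:      {ratio(info, 'MemFree'):.0%}")
print(f"available: {ratio(info, 'MemAvailable'):.0%}")
```

On a Linux host, pass the contents of /proc/meminfo instead of the sample. A low free ratio next to a healthy available ratio is the normal page-cache picture.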

Common pitfalls #

“Memory usage is 90%, should I scale up?” #

The number alone can’t tell you. If that 90% includes page cache, it’s normal. You judge by whether available is sufficient and whether swapping is actually happening.
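That judgment can be written down as a tiny check. This is a sketch only: the function name and both thresholds are hypothetical placeholders to tune per workload, not recommended values.

```python
def memory_verdict(available_ratio: float,
                   swap_in_pages_per_s: float,
                   *,
                   min_available_ratio: float = 0.10,   # hypothetical threshold
                   max_swap_in_rate: float = 0.0) -> str:  # hypothetical threshold
    """Judge memory health from available memory and swap activity.

    The raw 'used' percentage is deliberately not an input, because page
    cache inflates it without indicating a shortage.
    """
    low_available = available_ratio < min_available_ratio
    actively_swapping = swap_in_pages_per_s > max_swap_in_rate
    if low_available or actively_swapping:
        return "investigate"
    return "ok"


# 90% "used" on the dashboard, but most of it is reclaimable cache.
print(memory_verdict(available_ratio=0.56, swap_in_pages_per_s=0.0))
# Little available memory and sustained swap-in.
print(memory_verdict(available_ratio=0.04, swap_in_pages_per_s=350.0))
```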

“Keeping swap on is safe” #

Swap only keeps you from dying; it doesn’t preserve performance. By the time swap is used in earnest, the response is already broken. Watch swap usage alongside a response-latency alarm.

“More memory makes things faster” #

The misconception from #1. Adding it when short avoids the cliff, but adding more when you already have enough doesn’t make things faster.

“A container can use all the host’s memory” #

Without a limit, one container can swallow the host’s memory and threaten the others. In operations it’s safer to state a memory limit. Container resource isolation is the subject of #7.
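To check whether a limit is actually in effect, the sketch below assumes cgroup v2, where a cgroup's memory.max file holds either a byte count or the literal string max (no limit) and memory.current holds current usage in bytes. Where those files appear depends on the host and container runtime, so the code only parses their contents. The function names are illustrative.

```python
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse cgroup v2 memory.max contents; None means no limit is set."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)


def limit_usage_ratio(current_bytes: int, limit_bytes: Optional[int]) -> Optional[float]:
    """Fraction of the limit in use, or None when the cgroup is unlimited."""
    if limit_bytes is None:
        return None
    return current_bytes / limit_bytes


# Illustrative file contents: a 512 MiB limit with 256 MiB currently in use.
limit = parse_memory_max("536870912\n")
usage = limit_usage_ratio(268435456, limit)
print(f"limit: {limit} bytes, in use: {usage:.0%}")

# An unlimited cgroup reports the literal string "max".
print(parse_memory_max("max\n"))
```

A result of None is the case the warning above is about: nothing stops that container from growing until the host itself runs short.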

Wrap-up #

What we covered:

RAM is the CPU’s working space. Fast, but volatile and limited.
Storage forms a hierarchy — registers → cache → RAM → SSD → HDD — and the speed gap widens sharply with each step down.
Memory has separate axes: capacity, bandwidth, latency. In operations you hit capacity most.
When capacity runs short, swap begins, and severe swapping becomes thrashing, which slows the system as if frozen.
With no memory left to give, the OOM Killer force-terminates a process.
Linux uses spare RAM as the page cache. Watch available, not free.

Next — storage #

The place the system spills to when memory runs short is the disk. That disk is the subject of the next two posts. Post #4, “Storage ① Devices: HDD / SSD / NVMe and IOPS / Throughput / Latency,” sorts out a single disk’s types and performance metrics: how HDD, SSD, and NVMe differ, and what IOPS, throughput, and latency mean, three metrics that are often confused with capacity.