Hardware Basics #2: CPU — Cores, Threads, Clock, Cache, and What a vCPU Really Is
In #1 we placed the CPU as the resource that handles computation. This post looks one level deeper. Cores, threads, clock, and cache are four words you meet every time you pick an instance, yet what they mean exactly is often hazy. At the end we’ll pin down what the vCPU on a cloud spec sheet actually is.
Where this post sits in the Hardware Basics series:
- #1 Four resources that run a computer — CPU / Memory / Storage / Network
- #2 CPU — Cores / Threads / Clock / Cache, and what a vCPU really is ← this post
- #3 Memory — RAM, the hierarchy, and what happens when swapping starts
- #4 Storage ① Devices — HDD / SSD / NVMe and IOPS / Throughput / Latency
- #5 Storage ② Layout and connection — RAID and DAS / NAS / SAN
- #6 Network — Bandwidth and latency, from the NIC to the data center
- #7 Virtualization and containers — How one physical server becomes many
- #8 Cloud — From owning to renting, from on-prem to IaaS / PaaS / SaaS
- #9 Reading cloud instance specs — Choosing to match the workload
What the CPU does
The CPU repeats the same motion without end. Fetch one instruction from memory, decode what it is, execute it, and write the result — then fetch the next. It runs this cycle billions of times a second.
fetch → decode → execute → write → fetch the next → ...
Two questions govern performance: how fast it runs this cycle (clock), and how many workers run the cycle (cores and threads).
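The fetch, decode, execute, write loop can be sketched as a toy interpreter. This is only an illustration: the three-instruction set (`LOAD`, `ADD`, `MUL`), the single accumulator, and the `run` function are all made up for this example and resemble no real CPU.

```python
# Toy machine with a hypothetical three-instruction set, for illustration only.
def run(program):
    """Run a list of (opcode, operand) pairs and return the final accumulator."""
    acc = 0   # single register holding the running result
    pc = 0    # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]        # fetch
        opcode, operand = instruction    # decode
        if opcode == "LOAD":             # execute
            result = operand
        elif opcode == "ADD":
            result = acc + operand
        elif opcode == "MUL":
            result = acc * operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        acc = result                     # write the result back
        pc += 1                          # move on to the next instruction
    return acc


program = [("LOAD", 2), ("ADD", 3), ("MUL", 4)]
print(run(program))  # (2 + 3) * 4 = 20
```

A real core overlaps these stages for many instructions at once, which is why the clock rate alone does not tell you how many instructions finish per second.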
Cores and threads
A core — the real unit of computation
A core is a computation unit that runs the instruction cycle independently. With 4 cores you can make progress on 4 different tasks at once. CPUs used to have a single core per chip; today multi-core — many cores on one chip — is the norm.
More cores means more work you can handle at the same time. If a web server receives 100 requests concurrently, more cores process more of them side by side.
Threads and hyper-threading
A thread is a single flow of work the CPU processes. One core usually handles one thread at a time, but with simultaneous multithreading (SMT, which Intel brands as Hyper-Threading) a single core keeps two threads in flight and shares its execution resources between them.
The idea is this: while one thread idles waiting on memory, the other thread uses the execution units that would otherwise sit empty. The core doesn’t truly become two; it just cuts down idle time to raise utilization.
4 physical cores
└─ each core handles 2 threads
= the OS sees 8 'logical processors'
So the same chip is listed as “4 cores, 8 threads”: 4 physical cores, 8 logical processors as seen by the OS. Performance doesn’t scale like 8 cores. Depending on the workload, the usual gain is around 20–30% more throughput than the same 4 cores without hyper-threading.
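To see the logical-processor count on your own machine, Python's standard `os.cpu_count()` reports what the OS exposes, meaning hardware threads rather than physical cores. The helper names below are made up for this sketch, and the `threads_per_core=2` default is an assumption that only holds when hyper-threading is on.

```python
import os


def logical_processors():
    """Logical processors reported by the OS (hardware threads, not physical cores)."""
    return os.cpu_count() or 1  # os.cpu_count() returns None if it cannot tell


def logical_from_physical(physical_cores, threads_per_core=2):
    """Logical processors for a chip, assuming every core runs threads_per_core threads."""
    return physical_cores * threads_per_core


print(logical_from_physical(4))  # the "4 cores, 8 threads" chip: 8
print(logical_processors())      # whatever this machine reports
```

The standard library has no portable call for the physical core count, so the second function just encodes the arithmetic from the diagram.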
Clock — cycles per second
The clock is how fast the CPU runs its cycle. The unit is hertz (Hz); 3.0 GHz means three billion cycles a second.
A higher clock makes a single core faster. But clock alone can’t compare two CPUs, because how much work each cycle does (IPC), how big the cache is, and how many cores there are all differ.
| Comparison | By clock alone | In reality |
|---|---|---|
| Old 4.0 GHz vs new 3.0 GHz | Old looks faster | The new one often does more per cycle and wins |
| Single-threaded work | Higher clock wins | Mostly true |
| Handling many requests | Can’t judge by clock | Core count matters more |
The point: clock is one gauge of a single core’s speed, and once generation and core count differ, comparing clock numbers means little.
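A rough way to see why the old 4.0 GHz chip can lose: single-core throughput is approximately clock rate times IPC. The IPC values below (1.0 and 1.5) are invented for illustration, not measurements of any real CPU, and the function name is hypothetical.

```python
def instructions_per_second(clock_ghz, ipc):
    """Rough single-core throughput: cycles per second times instructions per cycle."""
    return clock_ghz * 1e9 * ipc


# Hypothetical IPC figures, chosen only to illustrate the comparison.
old_chip = instructions_per_second(clock_ghz=4.0, ipc=1.0)
new_chip = instructions_per_second(clock_ghz=3.0, ipc=1.5)

print(new_chip / old_chip)  # 1.125: the lower-clocked chip does more work
```

Real IPC varies with the workload and with cache behavior, so treat this as a way to think about the comparison rather than a benchmark.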
Cache — fast storage inside the CPU
#1 noted the CPU is far faster than memory. If the CPU waited on slow memory every time, cores would idle. So the CPU keeps a small, very fast store of its own — the cache.
The cache is usually three levels.
| Level | Size | Speed | Location |
|---|---|---|---|
| L1 | tens of KB | fastest | per core |
| L2 | hundreds of KB – few MB | fast | per core or shared |
| L3 | few – tens of MB | moderate | shared across cores |
The CPU looks in L1 first. Finding the data there is a cache hit. If it isn’t there, the lookup drops to the next level, and if no level has it, the request goes all the way out to slow memory. That is a cache miss. Frequent misses leave cores waiting on memory, wasting that fast clock.
So at equal core count and clock, a CPU with more cache is often faster, because frequently used data is more likely to stay in cache.
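The cost of misses can be estimated with a simple average-access-time calculation. This sketch collapses the three levels into a single cache, and the latencies (1 ns for cache, 100 ns for memory) are round, assumed numbers rather than figures for any particular CPU.

```python
def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Average time per access: every access pays the cache lookup, misses also pay memory."""
    miss_rate = 1.0 - hit_rate
    return cache_ns + miss_rate * memory_ns


# Assumed latencies: 1 ns cache, 100 ns main memory.
print(f"{average_access_ns(0.99, 1, 100):.1f} ns")  # 2.0 ns at a 99% hit rate
print(f"{average_access_ns(0.90, 1, 100):.1f} ns")  # 11.0 ns at a 90% hit rate
```

Under these assumptions, dropping the hit rate from 99% to 90% makes the average access more than five times slower, which is why a larger cache can matter more than a few hundred extra MHz.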
What a vCPU really is
Cloud spec sheets list vCPUs, not cores. On AWS, for example, a t3.medium has 2 vCPUs and a c5.xlarge has 4. What exactly is a vCPU?
On most clouds, 1 vCPU is not one physical core but one hyper-thread. So a physical core handling two threads is sold as 2 vCPUs.
1 physical core (hyper-threading ON)
├─ thread 1 → vCPU 1
└─ thread 2 → vCPU 2
A 4-vCPU instance = typically 2 physical cores
So expecting “4 vCPUs” to be 4 physical cores misses the mark. 2 physical cores is the more accurate picture. Some high-performance or bare-metal types do provide 1 vCPU per physical core, so check the instance documentation for the exact value. Reading instance types as a whole is the subject of #9.
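The conversion from vCPUs to physical cores is a one-line calculation. The function name is hypothetical, and `threads_per_core=2` is an assumption that matches the hyper-threaded case described above; pass 1 for instance types that map one vCPU to one core.

```python
import math


def estimated_physical_cores(vcpus, threads_per_core=2):
    """Estimate physical cores behind a vCPU count (assumes a uniform threads-per-core value)."""
    return math.ceil(vcpus / threads_per_core)


print(estimated_physical_cores(4))                      # 2 physical cores
print(estimated_physical_cores(4, threads_per_core=1))  # 4 physical cores
```

This is only an estimate for sizing: the real threads-per-core value comes from the provider's documentation for the instance type.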
When more cores doesn’t help
Let’s revisit the misconception from #1. The idea that more cores is always faster is often false.
For a task to speed up, it has to split into pieces that run on several cores at once. But many tasks have a serial section that can’t be split — a part where the next step needs the previous step’s result. That section runs in order on a single core no matter how many cores you have.
Whole task = 20% serial + 80% parallelizable
1 core: ████████████████████ (100)
4 cores: ████ + ████ (serial 20 + parallel 20 = 40)
∞ cores: ████ + ▏ (serial 20 + ~0 → converges to 20)
With a 20% serial section, infinite cores still won’t get you past about 5x. This is Amdahl’s law.
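The same numbers fall out of the Amdahl's law formula, speedup = 1 / (serial + parallel / cores). A minimal sketch, with a function name chosen for this example:

```python
def amdahl_speedup(serial_fraction, cores):
    """Speedup over one core when only the non-serial part is spread across cores."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# 20% serial work, as in the diagram above.
for cores in (1, 4, 16, 1024):
    print(cores, round(amdahl_speedup(0.2, cores), 2))
```

With 4 cores the speedup is 2.5x (100 units of work finishing in 40), and the value creeps toward 5x without ever reaching it as the core count grows.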
In short, you benefit from more cores in two cases: handling many independent requests at once (like a web server), or one task that parallelizes well. For single-threaded work, the speed of one core (clock, IPC, cache) matters more than core count.
Common pitfalls
“CPU at 100% means scale up, period”
Distinguish whether the CPU is truly busy computing or mostly waiting on another resource. Usage can appear high while the real wait is on disk or network, so measure the bottleneck first, as in #1.
“vCPU count = physical core count”
On most instances, a vCPU corresponds to one hyper-thread. 4 vCPUs typically means 2 physical cores. For CPU-sensitive workloads, account for this when picking a type.
“Expecting steady performance from burstable instances”
Burstable instances such as AWS’s T family guarantee only a baseline and can burst above it for a while by spending CPU credits. Once the credits run out, performance is throttled back to the baseline (or, in unlimited mode, the extra usage is billed), so they don’t fit workloads that use CPU steadily.
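The credit mechanism can be pictured with a deliberately simplified model. Everything here is an assumption for illustration: one credit buys one minute at 100% of a vCPU, credits accrue each minute at the baseline rate, and there is no balance cap or unlimited mode. The function and the numbers (20% baseline, 2 starting credits) are hypothetical, not any provider's actual accounting.

```python
def simulate_burst(wanted_per_minute, baseline, starting_credits):
    """Return the CPU fraction actually delivered each minute under a toy credit model."""
    credits = starting_credits
    delivered = []
    for wanted in wanted_per_minute:
        credits += baseline             # earn credits at the baseline rate
        actual = min(wanted, credits)   # can only spend what is in the bank
        credits -= actual
        delivered.append(actual)
    return delivered


# A workload that wants 100% CPU for five minutes straight.
result = simulate_burst([1.0] * 5, baseline=0.2, starting_credits=2.0)
print([round(x, 2) for x in result])  # [1.0, 1.0, 0.6, 0.2, 0.2]
```

The burst lasts only while credits remain; after that the workload is held at the 20% baseline, which is the behavior a steady CPU consumer runs into.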
“Comparing different-generation CPUs by clock number”
Across generations, per-cycle throughput differs, so comparing by clock alone misses the mark. Clock compares cleanly only within the same generation.
Wrap-up
What we covered:
- A core computes independently; more of them means more work handled at once.
- Hyper-threading (SMT) lets one core run two threads at once, sharing its execution resources to raise utilization. It doesn’t double performance.
- Clock is a gauge of single-core speed; once generation and core count differ, the number means little.
- Cache (L1/L2/L3) cuts time spent waiting on slow memory. Frequent misses leave even a fast clock idle.
- The cloud’s 1 vCPU is usually one hyper-thread. 4 vCPUs is roughly 2 physical cores.
- To benefit from more cores, the work must parallelize or the requests must be many. A serial section sets the limit, per Amdahl’s law.
Next — memory
However fast the CPU is, it waits if the data isn’t nearby. That working space is memory. #3 Memory — RAM, the hierarchy, and what happens when swapping starts covers what RAM is, the memory hierarchy from registers to disk, and how performance falls off a cliff when memory runs short and the system spills to slow disk.