Hardware Advanced #6: Data Center Cooling and Racks — Electricity Always Becomes Heat
In Hardware Advanced #5 we followed the flow of power through a data center and arrived at one fact: nearly all the power that enters a server comes back out as heat. A server drawing 1kW is also a 1kW electric heater. Cooling is therefore the mirror image of power. If power design is the question of how to put a given amount of energy in, cooling design is the question of how to take that same energy out — and the two must be matched. In this post we’ll follow the path heat takes on its way out, starting from the airflow through a single server and moving up through racks, thermal aisles, liquid cooling, and operating temperature standards.
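To make the "power in equals heat out" idea concrete, here is a minimal Python sketch that converts an IT load into the units cooling equipment is usually rated in. The function name `cooling_load` is made up for this example, and it assumes that essentially all input power ends up as heat, as described above.

```python
# Standard unit conversions.
BTU_PER_HR_PER_WATT = 3.412        # 1 W is about 3.412 BTU/hr
KW_PER_REFRIGERATION_TON = 3.517   # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_load(it_power_kw: float) -> dict:
    """Treat every watt of IT power as a watt of heat that must be removed."""
    heat_kw = it_power_kw  # assumption: ~100% of input power becomes heat
    return {
        "heat_kw": heat_kw,
        "btu_per_hr": heat_kw * 1000 * BTU_PER_HR_PER_WATT,
        "refrigeration_tons": heat_kw / KW_PER_REFRIGERATION_TON,
    }


if __name__ == "__main__":
    for kw in (1, 10, 100):
        load = cooling_load(kw)
        print(f"{kw:>4} kW IT load -> {load['btu_per_hr']:,.0f} BTU/hr, "
              f"{load['refrigeration_tons']:.2f} tons of refrigeration")
```

The point of the sketch is only that the cooling number is not a separate design input: it falls directly out of the power number.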
Server cooling basics: breathe in the front, exhale out the back
Airflow through a rackmount server goes one way. It pulls cool air in through the front, pushes it across the CPU and memory heatsinks, and expels hot air out the back. Front intake, rear exhaust. It looks trivial, but this contract is the starting point for all of data center cooling. The aisle designs we’ll get to shortly only work because every piece of equipment is assumed to blow in the same direction.
The first thing that surprises anyone walking into a server room is the noise. There’s a physical reason 1U servers are especially loud. The only fans that fit inside a 44mm-tall chassis are small ones around 40mm in diameter, and generating enough static pressure to force air through dense heatsinks with fans that small takes rotation speeds north of 10,000 RPM. Compared with a desktop’s 120mm fan spinning at around 1,500 RPM, the roar of a 1U server is not a malfunction — it’s the design working as intended. Fan power draw isn’t negligible either; in high-density servers, fans can account for around 10% of total power consumption.
Hot aisle, cold aisle: mixing destroys efficiency
If every server blows in one direction, you can separate cold air from hot air through rack placement alone. The technique is to arrange rows of racks facing each other. The aisle where the fronts face each other receives only cold supply air (the cold aisle), and the aisle where the backs face each other collects only hot exhaust (the hot aisle).
Hot aisle (exhaust) Cold aisle (intake) Hot aisle (exhaust)
↑↑↑↑ ↓↓↓↓ ↑↑↑↑
[Rack rear] [Rack front] [Rack rear]
[Rack ←row] [row→ Rack ←row] [row→ Rack]

This separation breaks down in two ways.
- Recirculation — hot exhaust from the hot aisle wraps over the top or around the sides of a rack and drifts into the cold aisle, so servers re-inhale their own hot air. Intake temperatures climb, leading straight to overheating equipment.
- Bypass — cold air from the cold aisle escapes into the hot aisle without ever passing through a server. The equipment stays safe, but you’re throwing cooling energy into thin air.
The physical barrier against both is containment. Doors close off the ends of the aisle and ceiling panels cover the top, turning either the cold aisle or the hot aisle into a fully sealed room. Enclose the cold side and you have cold aisle containment; enclose the hot side and you have hot aisle containment. Either way, the principle is the same: make sure cold air and hot air have no path to meet except through the inside of a server.
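A simple mixing model shows why even modest recirculation matters. The sketch below is a steady-state energy balance, not a CFD result: it assumes perfectly mixed air, a fixed temperature rise across the server, and hypothetical values for the supply temperature and leak fractions. The function names are invented for this example.

```python
def intake_temp_c(supply_c: float, server_delta_t: float,
                  recirc_fraction: float) -> float:
    """Steady-state server intake temperature with recirculation.

    The intake is a mix: (1 - r) supply air plus r of the server's own exhaust.
        T_in  = (1 - r) * T_supply + r * T_out
        T_out = T_in + delta_T
    Solving for T_in gives T_supply + r * delta_T / (1 - r).
    """
    if not 0 <= recirc_fraction < 1:
        raise ValueError("recirc_fraction must be in [0, 1)")
    return supply_c + recirc_fraction * server_delta_t / (1 - recirc_fraction)


def supply_airflow_needed(server_airflow: float, bypass_fraction: float) -> float:
    """Supply airflow required when a fraction of it bypasses the servers."""
    if not 0 <= bypass_fraction < 1:
        raise ValueError("bypass_fraction must be in [0, 1)")
    return server_airflow / (1 - bypass_fraction)


if __name__ == "__main__":
    # Hypothetical numbers: 22 C supply air, 12 K rise across the server.
    for r in (0.0, 0.1, 0.2, 0.3):
        print(f"recirculation {r:.0%}: intake {intake_temp_c(22.0, 12.0, r):.1f} C")
    for b in (0.0, 0.2, 0.4):
        print(f"bypass {b:.0%}: supply {supply_airflow_needed(1.0, b):.2f}x server airflow")
```

Under these assumptions, recirculation shows up as a higher intake temperature while bypass shows up as extra supply air that does no work, which matches the split described above: one threatens the equipment, the other wastes energy.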
Airflow details: blanking panels and cables
If you’ve done containment and one particular server still runs hot, the culprit is usually a small gap.
- Empty U slots — when a rack is sparsely populated, the empty slots become holes connecting the hot aisle to the cold aisle. Hot air from the back flows backward through the gaps and into the intake of the server right above. Blanking panels that fill those gaps are just a few dollars’ worth of plastic, but they’re the cheapest part you can buy for cutting off recirculation inside a rack.
- Cable bundles at the rear — an unmanaged tangle of cables covering the back of a server blocks exhaust, internal temperatures rise, and the fans spin faster and burn more power.
- Floor tile placement — with a pressurized raised-floor design, cold air rises through perforated tiles. Any perforated tile sitting outside the cold aisle is pure bypass.
A large share of cooling problems come not from chiller capacity but from airflow details like these. The total volume of cold air supplied is sufficient — it just never makes it to the front of the servers.
Rack density: the limit line for air cooling
The power a single rack consumes, measured in kW per rack, determines the cooling method. Air holds very little heat per unit volume, so the airflow a rack needs grows in direct proportion to its heat load, and the fan power needed to move that air grows much faster than the airflow itself.
| Rack density | Cooling method | Notes |
|---|---|---|
| Up to ~10kW | Standard air cooling | The default for traditional enterprise floor space |
| 10–20kW | Air cooling + containment | Airflow management is a prerequisite |
| 20–40kW | The limit zone for air cooling | Needs supplements like rear-door coolers |
| 40kW and up | Liquid cooling | Airflow and fan power become impractical with air alone |
AI servers are what put this table to the test. A single server carrying eight GPUs draws around 10kW, so stacking just a few pushes a rack into air cooling’s limit zone. Configurations like GB200 NVL72, where one rack exceeds 100kW, are designed around liquid cooling from the start. Not because liquid cooling is some new technology, but because density has exceeded the physical limits of air as a medium.
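The density limits in the table follow from the sensible-heat relation Q = P / (ρ · c_p · ΔT). The sketch below is a rough illustration: the air properties are approximate room-temperature values, and the 12 K intake-to-exhaust rise is an assumed figure, not one taken from any specific server.

```python
# Approximate properties of air near room temperature at sea level.
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0
CFM_PER_M3_S = 2118.88  # cubic feet per minute in one cubic metre per second


def airflow_m3_per_s(rack_kw: float, delta_t_k: float = 12.0) -> float:
    """Airflow needed to carry rack_kw of heat at a given air temperature rise.

    Uses Q = P / (rho * cp * delta_T). The default delta_t_k of 12 K is an
    assumed, illustrative rise across the servers.
    """
    heat_w = rack_kw * 1000.0
    return heat_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)


if __name__ == "__main__":
    for rack_kw in (10, 20, 40, 100):
        flow = airflow_m3_per_s(rack_kw)
        print(f"{rack_kw:>4} kW rack: {flow:5.2f} m^3/s "
              f"({flow * CFM_PER_M3_S:,.0f} CFM)")
```

With these assumptions a 10kW rack already needs roughly 0.7 m³/s of air, and the requirement scales linearly from there, so a 100kW rack would need about ten times that through the same rack face.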
Liquid cooling: D2C and immersion
Water carries roughly 3,500 times more heat than the same volume of air for the same temperature rise. There are two main ways to exploit that difference.
- D2C (direct-to-chip): a cold plate replaces the heatsink on top of the CPU and GPU, with coolant flowing through it. A CDU (coolant distribution unit) installed at the rack or row level handles circulation and heat exchange. The liquid carries away 70–80% of the server’s heat, while air still handles the rest (memory, power circuitry, and so on), making it a hybrid design. Because it can be phased into existing floor space, it’s the mainstream choice for today’s AI infrastructure.
- Immersion cooling: the entire server is submerged in electrically non-conductive coolant. Fans disappear entirely, and every heat-producing component touches the liquid directly, giving the highest heat recovery of any approach. But it requires dedicated tanks and maintenance procedures, and even pulling a server out becomes a fundamentally different operation from what you’re used to.
From an operator’s perspective, liquid cooling also brings a new set of operational concerns into the data center: plumbing, leak detection, and coolant water-quality management. It introduces failure scenarios that didn’t exist in the air-cooled era, so adoption typically happens only when density forces it, and only to the extent it’s needed.
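The gap between water and air can be checked with a few lines of arithmetic. This sketch compares volumetric heat capacity and then estimates the coolant flow for a D2C rack. It assumes plain water (real loops often use a water-glycol mix with somewhat different properties), and the 120kW rack, 75% liquid capture, and 10 K coolant temperature rise are hypothetical inputs chosen for illustration.

```python
# Approximate fluid properties near room temperature.
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_KG_K = 1005.0
WATER_DENSITY_KG_M3 = 998.0
WATER_CP_J_KG_K = 4186.0


def volumetric_ratio_water_to_air() -> float:
    """How much more heat a volume of water holds than the same volume of air."""
    water = WATER_DENSITY_KG_M3 * WATER_CP_J_KG_K
    air = AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K
    return water / air


def coolant_flow_l_per_min(rack_kw: float, liquid_fraction: float,
                           delta_t_k: float) -> float:
    """Water flow needed to remove the liquid-cooled share of a rack's heat."""
    heat_w = rack_kw * 1000.0 * liquid_fraction
    flow_m3_s = heat_w / (WATER_DENSITY_KG_M3 * WATER_CP_J_KG_K * delta_t_k)
    return flow_m3_s * 1000.0 * 60.0  # m^3/s -> litres per minute


if __name__ == "__main__":
    print(f"water vs air, per unit volume: {volumetric_ratio_water_to_air():,.0f}x")
    flow = coolant_flow_l_per_min(rack_kw=120, liquid_fraction=0.75, delta_t_k=10)
    print(f"120 kW rack, 75% to liquid, 10 K rise: {flow:.0f} L/min")
```

The takeaway is one of scale: heat that would need cubic metres of air per second can be carried by a flow of water measured in litres per minute.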
Temperature standards: colder is not better
It’s tempting to assume a server room is safer the colder it is, but operating standards in modern data centers have moved in the opposite direction. ASHRAE’s recommended range is 18–27°C at the server intake — considerably warmer than the refrigerator-like server rooms of 20 years ago.
The reason lies in PUE, which we covered in #5. Raise the intake temperature setpoint by one degree and the chillers run for fewer hours, and on days when the outside air is cold enough, free cooling — using outside air with no chillers at all — becomes possible. Less power spent on cooling means a lower PUE for the same IT load. Cooling beyond what’s necessary isn’t a safety margin; it’s an electricity bill.
It isn’t free, of course. The higher the intake temperature, the less time you have before equipment hits its thermal limits when a cooling failure occurs. The setpoint is ultimately a trade-off between cooling power and ride-through time during a failure, and the ASHRAE recommended range is the balance point the industry has agreed on.
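The PUE side of this trade-off is simple arithmetic: total facility power divided by IT power. The sketch below uses entirely hypothetical load figures to show how a drop in cooling power moves PUE, plus a small helper that checks an intake reading against the 18–27°C recommended range quoted above. It does not model how much cooling power a given setpoint change actually saves, since that depends on the plant and the climate.

```python
ASHRAE_RECOMMENDED_C = (18.0, 27.0)  # recommended intake range quoted in the text


def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float = 0.0) -> float:
    """Power usage effectiveness: total facility power over IT power."""
    if it_kw <= 0:
        raise ValueError("it_kw must be positive")
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


def in_recommended_range(intake_c: float) -> bool:
    """True if a server intake temperature is inside the recommended range."""
    low, high = ASHRAE_RECOMMENDED_C
    return low <= intake_c <= high


if __name__ == "__main__":
    # Hypothetical facility: 1,000 kW of IT load, 100 kW of other overhead.
    for cooling_kw in (400, 300, 200):
        print(f"cooling {cooling_kw} kW -> PUE {pue(1000, cooling_kw, 100):.2f}")
    for reading in (16.0, 22.0, 29.0):
        print(f"intake {reading} C in range: {in_recommended_range(reading)}")
```

Note that the IT load never changes in this example. Every kilowatt removed from cooling goes straight to a lower PUE.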
Racks and floor space: weight, cable runs, room for hands
Finally, the design details of the rack itself. The standard 19-inch, 42U rack form factor stays the same, but what you fill it with changes the floor design around it.
- Weight load — a 42U rack fully loaded with servers approaches or exceeds a metric ton. If it exceeds the raised floor’s design load per unit area, you can’t fill the rack, so for high-density racks the floor-load review comes before placement. Liquid-cooled racks weigh even more once you add coolant and manifolds.
- Cable runs — power cables and network cables get separate paths, with enough slack (a service loop) that nothing tears when a server is pulled out on its rails. Cable management that keeps the exhaust clear is part of cooling performance.
- Maintenance space — servers slide out on front rails, so the aisle in front of a rack needs to be at least as deep as the equipment. The rear needs room too, for cable and PDU work. Aisle width isn’t waste — it’s a design factor that determines how fast you can recover during a failure.
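The floor-load review mentioned above comes down to a first-pass division of weight by footprint. This sketch is deliberately simplified: it treats the load as uniform over the rack footprint, whereas a real review also looks at point loads under casters or leveling feet. The 0.6 m by 1.2 m footprint and the 1,200 kg/m² floor rating are hypothetical values, not figures from any standard.

```python
def floor_load_kg_per_m2(rack_mass_kg: float, width_m: float, depth_m: float) -> float:
    """Uniform load a rack places on its own footprint."""
    return rack_mass_kg / (width_m * depth_m)


def fits_floor(rack_mass_kg: float, width_m: float, depth_m: float,
               floor_rating_kg_per_m2: float) -> bool:
    """First-pass check of rack load against the floor's rated load."""
    return floor_load_kg_per_m2(rack_mass_kg, width_m, depth_m) <= floor_rating_kg_per_m2


if __name__ == "__main__":
    # Hypothetical footprint and floor rating.
    width_m, depth_m, rating = 0.6, 1.2, 1200.0
    for mass_kg in (600, 800, 1000, 1400):
        load = floor_load_kg_per_m2(mass_kg, width_m, depth_m)
        verdict = "ok" if fits_floor(mass_kg, width_m, depth_m, rating) else "exceeds rating"
        print(f"{mass_kg:>5} kg rack: {load:,.0f} kg/m^2 ({verdict})")
```

Running a check like this before placement is what lets you find out on paper, rather than on the floor, that a rack can only be partly filled.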
Wrap-up
The picture we built in this post:
- Nearly all the power that enters a server becomes heat, so cooling capacity is the mirror of power capacity. The starting point is the airflow contract of front intake and rear exhaust.
- Hot/cold aisle separation and containment serve a single purpose: keeping cold air and hot air from meeting anywhere but inside a server. Details like blanking panels and cable management complete that separation.
- As rack density climbs, air cooling hits its limits, and the density of AI servers is forcing the shift to liquid cooling — D2C and immersion.
- The temperature setpoint isn’t “colder is better” but a trade-off between cooling power (PUE) and ride-through time during a failure, with ASHRAE’s recommended 18–27°C as the agreed balance point.
- Rack and floor details — weight load, cable runs, maintenance space — are part of cooling design too.
Next: firmware, BMC, and the lifecycle
The next post, “Hardware Advanced #7: Firmware, BMC, and the Lifecycle,” is the final entry in the series. We’ll close out with the BMC — the other computer that watches over a server from below the OS — remote management through IPMI and Redfish, the operational procedures of firmware updates, and the lifecycle of a single server from deployment to decommissioning.