Hardware Basics #5: Storage ② Layout and Connection — RAID and DAS / NAS / SAN

In #4 we covered the types and performance of a single disk. A single disk has three limits: its capacity is finite, its speed has a ceiling, and if it fails, the data is gone. To get past these limits, operators bundle many disks together (RAID) and attach them to servers in different ways (DAS / NAS / SAN). Those are the two topics here.

RAID — many disks as one

RAID bundles several disks so the OS sees them as one. Depending on how you bundle, you gain speed or safety. Each method is identified by a level number; the common ones are 0, 1, 5, 6, and 10.

RAID 0 — striping

Splits data across several disks. Split across two disks, and sequential reads and writes can approach twice the speed of one disk, because both disks work in parallel. But there is no redundancy: if one disk fails, the pieces left on the other are unusable and the whole array is lost. It aims only at speed and gives up safety.
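
As a rough illustration of striping, the sketch below deals fixed-size chunks round-robin across in-memory "disks". The function names and the 4-byte chunk size are made up for the example; real arrays use much larger stripe units and operate on block devices, not byte strings.

```python
def stripe(data: bytes, disks: int, chunk: int = 4) -> list[bytes]:
    """Deal fixed-size chunks round-robin across `disks` (RAID 0 style)."""
    parts = [bytearray() for _ in range(disks)]
    for offset in range(0, len(data), chunk):
        parts[(offset // chunk) % disks] += data[offset:offset + chunk]
    return [bytes(p) for p in parts]


def unstripe(parts: list[bytes], chunk: int = 4) -> bytes:
    """Reassemble the original data by reading chunks round-robin."""
    total = sum(len(p) for p in parts)
    offsets = [0] * len(parts)
    out = bytearray()
    disk = 0
    while len(out) < total:
        out += parts[disk][offsets[disk]:offsets[disk] + chunk]
        offsets[disk] += chunk
        disk = (disk + 1) % len(parts)
    return bytes(out)


parts = stripe(b"ABCDEFGHIJ", disks=2)
print(parts)            # each disk holds only every other chunk
print(unstripe(parts))  # all disks together give the data back
```

Because each disk holds only alternating chunks, losing either one leaves gaps throughout the data, which is why a single failure takes the whole array with it.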

RAID 1 — mirroring

Writes the same data identically to two disks. If one fails, the other still has it intact, so it’s safe. The cost is that two disks give you the usable capacity of one. It trades half the capacity for safety.

RAID 5 / 6 — parity

Splits data across several disks while distributing parity (recovery information) alongside it. If a disk fails, the lost data is reconstructed from the surviving data and the parity. RAID 5 survives one disk failure; RAID 6 survives two. The parity takes up roughly one disk’s worth of capacity in RAID 5 and two in RAID 6 regardless of array size, so the overhead is smaller than mirroring’s and shrinks further as the disk count grows.
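
Single parity is commonly computed as a byte-wise XOR, which is what makes reconstruction possible. The sketch below is a minimal illustration under that assumption: `xor_blocks` and the sample blocks are invented for the example, the parity sits in one fixed place rather than rotating across disks as in a real RAID 5, and RAID 6's second, independent parity is not shown.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data blocks (one per data disk) plus one parity block.
data = [b"disk", b"zero", b"one!"]
parity = xor_blocks(data)

# Disk 1 fails: XOR the surviving data blocks with the parity block.
lost_index = 1
survivors = [block for i, block in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[lost_index])  # True
```

XOR-ing a value in twice cancels it out, so combining the parity with every surviving block leaves exactly the missing one. With two blocks missing there is not enough information left, which matches RAID 5's one-disk limit.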

RAID 10 — mirroring and striping combined

Striping (RAID 0) across mirrored (RAID 1) pairs. You gain both speed and safety but spend half the capacity on mirroring. It’s common for databases where both performance and safety matter.
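
How much safety RAID 10 gives depends on which disks fail, not just how many. The sketch below enumerates every two-disk failure in a hypothetical four-disk array made of two mirrored pairs; the function name and the disk numbering are assumptions for the example.

```python
from itertools import combinations


def raid10_survives(failed: set[int], pairs: list[tuple[int, int]]) -> bool:
    """The array survives unless some mirrored pair has lost both members."""
    return not any(a in failed and b in failed for a, b in pairs)


pairs = [(0, 1), (2, 3)]  # four disks, two mirrored pairs
two_disk_failures = list(combinations(range(4), 2))
survivable = [f for f in two_disk_failures if raid10_survives(set(f), pairs)]

print(f"{len(survivable)} of {len(two_disk_failures)} two-disk failures are survivable")
# prints "4 of 6 two-disk failures are survivable"
```

The two fatal cases are the ones where both failures land in the same pair. So RAID 10 is guaranteed to survive one failure and survives a second one only if it hits a different pair.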

| RAID | Gains | Loses | Survives |
| --- | --- | --- | --- |
| 0 | speed, full capacity | no safety | 0 disks |
| 1 | safety | half capacity | 1 disk |
| 5 | safety + capacity efficiency | parity cost on writes | 1 disk |
| 6 | stronger safety | larger parity cost | 2 disks |
| 10 | speed + safety | half capacity | 1 disk per pair |
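
The capacity side of the table can be turned into a small calculator. This is a sketch under simplifying assumptions: all disks are the same size, RAID 1 is treated as one copy replicated on every disk, and the minimum disk counts are the usual textbook ones. `usable_capacity` and `MIN_DISKS` are names invented for the example.

```python
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for `disks` identical disks at a given RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unknown RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "10" and disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")

    data_disks = {
        "0": disks,        # nothing spent on redundancy
        "1": 1,            # every disk holds the same copy
        "5": disks - 1,    # one disk's worth of parity
        "6": disks - 2,    # two disks' worth of parity
        "10": disks // 2,  # half spent on mirroring
    }[level]
    return data_disks * disk_tb


for level, disks in [("0", 4), ("1", 2), ("5", 4), ("6", 4), ("10", 4)]:
    print(f"RAID {level:>2} with {disks} x 4 TB: {usable_capacity(level, disks, 4.0)} TB usable")
```

Running the RAID 5 case with more disks shows the efficiency gain: four disks leave three quarters of the raw capacity usable, eight disks leave seven eighths.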

RAID is not a backup. RAID only lets you survive disk failure. Delete a file by mistake, get hit by ransomware, or overwrite the wrong data, and that change is reflected on every disk alike. A backup must be kept separately, somewhere apart.
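
A toy model makes the difference concrete. The `Mirror` class below is hypothetical and stores files in plain dictionaries; it only illustrates that a mirror repeats every operation, including a mistaken delete, on all of its disks, while a copy kept apart from the array does not.

```python
import copy
from typing import Optional


class Mirror:
    """Toy two-disk mirror: every write and delete is applied to all live disks."""

    def __init__(self) -> None:
        self.disks: list = [{}, {}]  # a failed disk is represented as None

    def write(self, name: str, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name: str) -> None:
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index: int) -> None:
        self.disks[index] = None

    def read(self, name: str) -> Optional[bytes]:
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None


array = Mirror()
array.write("report.txt", b"q3 numbers")
backup = copy.deepcopy(array.disks[0])  # a separate copy, kept apart from the array

array.fail_disk(0)
after_disk_failure = array.read("report.txt")  # still readable from disk 1

array.delete("report.txt")                     # the mistake hits every live disk
after_delete = array.read("report.txt")        # None: the mirror cannot help
restored = backup["report.txt"]                # only the backup still has it
```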

Connection — DAS / NAS / SAN

If RAID answered “how do you bundle disks,” the next question is “how do you attach disks to a server.” It splits three ways.

DAS — directly attached

DAS (Direct Attached Storage) attaches the disk directly to the server. The disk inside a laptop, the disks slotted into a server — all DAS. Fastest and simplest, but usable only from that server, and access is cut off when the server dies.

NAS — file-level, shared over the network

NAS (Network Attached Storage) is a network-connected storage device that several servers share at the file level, typically over a file-sharing protocol such as NFS or SMB. Several servers can read and write the same folder at once. It fits file-sharing workloads, but because every access crosses the network, it is typically slower than DAS.

SAN — block-level, dedicated network

SAN (Storage Area Network) serves disks to servers at the block level over a dedicated high-speed storage network. To the server, the volume looks like its own local disk even though it sits across the network. It deals in blocks (the unit covered in #4) rather than files, so it’s used for performance-critical workloads like databases. It needs a dedicated network and dedicated gear, so it’s expensive.

| Method | Unit | Sharing | Character |
| --- | --- | --- | --- |
| DAS | block | one server only | fast and simple |
| NAS | file | shared by several servers | fits file sharing |
| SAN | block | serves blocks to several servers | fast but expensive |
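
The "Unit" column is the difference that matters most in practice: it decides what the server asks the storage for. The sketch below contrasts the two interfaces with hypothetical in-memory classes. `BlockDevice`, `FileShare`, and the 512-byte block size are assumptions for illustration, not any vendor's API.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed size for the example


class BlockDevice:
    """Block-level view (DAS, SAN): numbered fixed-size blocks, no notion of files."""

    def __init__(self, block_count: int) -> None:
        self.block_count = block_count
        self._data = bytearray(block_count * BLOCK_SIZE)

    def _start(self, number: int) -> int:
        if not 0 <= number < self.block_count:
            raise IndexError(f"block {number} is out of range")
        return number * BLOCK_SIZE

    def write_block(self, number: int, payload: bytes) -> None:
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload is larger than one block")
        start = self._start(number)
        # Pad to a full block so the device size never changes.
        self._data[start:start + BLOCK_SIZE] = payload.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, number: int) -> bytes:
        start = self._start(number)
        return bytes(self._data[start:start + BLOCK_SIZE])


class FileShare:
    """File-level view (NAS): the storage device runs the file system itself."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write_file(self, path: str, content: bytes) -> None:
        self._files[path] = content

    def read_file(self, path: str) -> bytes:
        return self._files[path]


volume = BlockDevice(block_count=8)
volume.write_block(3, b"row 42")          # the server decides what block 3 means

share = FileShare()
share.write_file("/reports/q3.txt", b"hello")  # the server only names a file
```

With the block interface, the server brings its own file system or database layout and the storage just stores numbered blocks. With the file interface, that bookkeeping lives on the storage device, which is what lets several servers share the same folder.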

Cloud storage is a repackaging of these concepts

Here’s the real payoff of this post. Cloud storage wasn’t invented anew; it’s a repackaging of the concepts above as services. With AWS as the example, the mapping lines up cleanly.

| On-prem concept | Cloud equivalent | Character |
| --- | --- | --- |
| DAS (directly attached disk) | instance store | fast but gone when the instance stops |
| SAN (block, over network) | EBS | block volume attached to an instance, persistent |
| NAS (file, over network) | EFS / FSx | a file share across several instances |
| RAID mirroring / parity | durability from replication | automatically replicated across devices |

The question #4 left open — “why instance store data vanishes when the instance stops, but EBS persists” — is answered here. Instance store is DAS attached directly to that instance, so it shares the instance’s fate. EBS is block storage across the network (the SAN equivalent), so it lives separately from the instance and survives. The high durability the cloud advertises is built by scaling RAID’s mirroring and parity ideas up to data-center scale, replicating across many devices automatically.
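
The lifecycle difference can be modelled in a few lines. This is a toy model, not AWS code: `NetworkVolume` and `Instance` are hypothetical classes, and `stop()` simply encodes the behaviour described above.

```python
class NetworkVolume:
    """Stands in for an EBS-style volume: it exists independently of any instance."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}


class Instance:
    """Stands in for a VM with a local, instance-store-style disk."""

    def __init__(self, volume: NetworkVolume) -> None:
        self.local_disk: dict[str, bytes] = {}
        self.volume = volume

    def stop(self) -> None:
        # Modelled behaviour: the local disk shares the instance's fate.
        self.local_disk = {}
        # The network volume is untouched; it lives outside the instance.


volume = NetworkVolume()
first = Instance(volume)
first.local_disk["cache"] = b"scratch"
first.volume.data["orders.db"] = b"persistent rows"
first.stop()

second = Instance(volume)  # attach the same volume to a fresh instance
print("cache" in second.local_disk)       # False
print(second.volume.data["orders.db"])    # b'persistent rows'
```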

Common pitfalls

“I did RAID, so I don’t need backups”

The most dangerous misconception. RAID survives disk failure only. A file deleted by mistake, data overwritten wrongly, ransomware — all reflected identically on every disk. Always keep a separate backup.

“RAID 5 means I’m safe”

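RAID 5 survives only one disk failure. While the failed disk is being rebuilt, every surviving disk has to be read in full, so load piles on the survivors, and if a second disk fails before the rebuild finishes, the array is lost. With large or numerous disks, where rebuilds take longer and more disks are exposed, consider RAID 6 or RAID 10.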
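
A back-of-the-envelope estimate shows why disk count and rebuild time matter. The sketch below assumes disks fail independently at a constant rate derived from an annual failure rate (AFR); the function name and the sample numbers are illustrative assumptions, not measurements.

```python
HOURS_PER_YEAR = 8760


def second_failure_probability(survivors: int, afr: float, rebuild_hours: float) -> float:
    """Chance that at least one surviving disk fails during the rebuild window.

    Assumes independent failures at a constant rate implied by `afr`.
    """
    # Probability that a single disk gets through the rebuild window.
    one_disk_ok = (1 - afr) ** (rebuild_hours / HOURS_PER_YEAR)
    return 1 - one_disk_ok ** survivors


# Small array with a quick rebuild vs. a large array with a slow rebuild.
small = second_failure_probability(survivors=3, afr=0.02, rebuild_hours=6)
large = second_failure_probability(survivors=11, afr=0.02, rebuild_hours=48)
print(f"small array: {small:.6f}")
print(f"large array: {large:.6f}")
```

The absolute numbers from this model are likely optimistic: disks bought together tend to age together, the rebuild itself adds load, and read errors on the survivors are not counted. The direction is the useful part: more survivors and longer rebuilds both raise the risk.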

“NAS and SAN are the same thing”

The unit differs. NAS shares files; SAN serves blocks. If you need block-level performance, like a database, NAS may fall short.

“Instance store is fast, so put the database there”

Fast it is, but its data vanishes when the instance stops or terminates. Keep persistent data on EBS and use the instance store only for temporary caches or scratch space.

Wrap-up

What we covered:

  • RAID bundles disks for speed (0), safety (1), safety with better capacity efficiency (5, 6), or speed and safety together (10).
  • RAID is not a backup. It survives disk failure only; mistakes, deletions, and ransomware get through.
  • Connection splits into DAS (direct, block, private), NAS (network, file, shared), and SAN (dedicated network, block, high performance).
  • Cloud storage is a repackaging of these: instance store = DAS, EBS = SAN, EFS = NAS, with high durability a data-center-scale version of RAID replication.

Next — network

Everything so far has been a story inside one server; now we step outside it. NAS, SAN, and EBS all cross the network, so the topic has already been foreshadowed. #6 (Network: bandwidth and latency, from the NIC to the data center) sorts out the most-confused pair in operations, bandwidth and latency, along with the floor that distance puts under latency and why traffic within the same AZ, across regions, and over the internet all behave differently.
