Hardware Intermediate #7: Storage Networking — iSCSI, FC, NVMe-oF, Multipath

Up through Part 6, the disks lived inside the server. But as we saw in Basics #5, once scale grows, storage moves out of the server into shared equipment (a SAN), and servers use disks across the network as if they were their own. This post is about the technologies that actually make that connection — the “block device across the network.” Even with the same SAN, the choice of how you connect it has a substantial impact on cost and operations.

First, let’s draw one boundary clearly. This post is about block storage — what appears to the server as a disk. File-level sharing with NFS and Samba is the territory of RHEL Intermediate #3, and it sits at a different layer.

iSCSI: a SAN over an ordinary network

iSCSI is a protocol that carries disk commands (SCSI) over an ordinary TCP/IP network. When the server (the initiator) logs in to the storage (the target), a volume across the network shows up like a local disk (/dev/sdX).

Its strength is, above all, universality. You can start with your existing Ethernet and switches, no dedicated gear, and the operational knowledge is an extension of general networking. In exchange, it inherits all of an ordinary network’s problems. Contention with other traffic and TCP’s latency variation translate directly into disk latency. That’s why the first rule of running iSCSI is separating storage traffic: give it dedicated NICs and a dedicated network (at minimum a dedicated VLAN), and don’t mix it with service traffic.
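
The separation rule can be turned into a simple check. The following is a minimal sketch, not a definitive tool: it assumes a hypothetical addressing plan (the subnets and portal addresses below are invented for illustration) and only verifies that an iSCSI portal address falls inside the storage subnet and outside every service subnet.

```python
import ipaddress


def on_dedicated_storage_network(portal_ip, storage_cidr, service_cidrs):
    """Return True if the portal is in the storage subnet and in no service subnet."""
    ip = ipaddress.ip_address(portal_ip)
    in_storage = ip in ipaddress.ip_network(storage_cidr)
    in_service = any(ip in ipaddress.ip_network(cidr) for cidr in service_cidrs)
    return in_storage and not in_service


# Hypothetical addressing plan (assumption, not from the original post)
STORAGE_NET = "10.10.50.0/24"
SERVICE_NETS = ["10.10.1.0/24", "10.10.2.0/24"]

print(on_dedicated_storage_network("10.10.50.20", STORAGE_NET, SERVICE_NETS))  # True
print(on_dedicated_storage_network("10.10.1.20", STORAGE_NET, SERVICE_NETS))   # False
```

An address check like this only catches mistakes in the plan. Whether the traffic is physically separated still depends on the NIC, VLAN, and switch configuration.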

FC: a network born for storage

FC (Fibre Channel) is a separate network designed for storage from the start. You build a SAN fabric with dedicated adapters (HBAs) and dedicated switches, and control which server sees which volume through zoning.

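|                       | iSCSI                         | FC                                   |
|-----------------------|-------------------------------|--------------------------------------|
| Network               | existing Ethernet             | dedicated fabric (HBAs, FC switches) |
| Cost and entry        | low                           | high                                 |
| Latency consistency   | varies with the design        | stable                               |
| Operational knowledge | extension of the network team | separate specialty (zoning, etc.)    |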

The point is not superiority but a trade. FC is the choice that buys consistent latency and isolation with money and expertise; iSCSI is the choice that takes universality while leaving the network-design responsibility with the operator. Why mission-critical databases traditionally sat on FC, and why iSCSI is widely used for virtualization and general workloads — both answers are inside this table.

NVMe-oF: a fast path for fast disks

In Basics #4 we saw that NVMe is an interface that dropped SATA’s command model and was redesigned for flash. The same thing repeats on the network. Put NVMe disks behind iSCSI/FC, which carry SCSI commands, and a protocol translation layer is introduced; NVMe-oF (NVMe over Fabrics) removes that gap by extending NVMe commands, without translation, over the network — RDMA over Ethernet, FC, or plain TCP. The goal is latency close to local NVMe, delivered across the network.

From an operator’s perspective, here is where it stands. It is becoming the standard choice when modern all-flash storage meets latency-sensitive workloads. Among the variants, NVMe/TCP runs on ordinary Ethernet NICs, with no RDMA-capable adapters or FC HBAs required, and is spreading as the successor to iSCSI. If you are designing a new SAN, “iSCSI or FC” now comes with a companion question: “shouldn’t this be NVMe-oF?”

Multipath: the disk must survive a broken path

Whatever you connect with, disks across the network gain a new failure mode: the disk is fine, but the path (cable, switch, adapter) breaks. So a SAN keeps two or more paths as a baseline, and bundling those paths into a single managed device is multipath (a feature that groups multiple paths to the same volume, handling failover and load distribution).

multipath -ll (abridged)
mpatha (3600a0980...) dm-2 VENDOR,MODEL
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 3:0:0:1  sdb  8:16  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 4:0:0:1  sdc  8:32  active ready running

This is Linux’s device-mapper multipath bundling two paths (sdb, sdc) into a single mpatha. There are three operational points.

  • Applications must use the bundled device (mpatha). Use sdb directly and I/O stops the moment that path breaks.
  • One dead path means the same thing as Part 6’s degraded state. The service runs, but the margin is gone — so path state belongs on your alert list too.
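  • Understand the policy you set. Spreading I/O across all paths is one option, not a universal default: the default path grouping policy differs between multipath versions and distributions. Some storage also has preferred-path configurations (ALUA), as in the output above, where the two paths sit in separate groups with different priorities (prio=50 and prio=10). Following the storage vendor’s recommended settings is the safe move.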
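
Putting path state on the alert list can start as small as parsing the output shown above. The sketch below is an illustration under assumptions: it handles only the abridged layout in this post (real `multipath -ll` output varies by version and configuration), and the function names and the `expected_paths` threshold are hypothetical.

```python
import re

# The abridged sample from this post, including its elided WWID.
SAMPLE = """\
mpatha (3600a0980...) dm-2 VENDOR,MODEL
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 3:0:0:1  sdb  8:16  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 4:0:0:1  sdc  8:32  active ready running
"""

# Map header: name, WWID in parentheses, dm device.
MAP_RE = re.compile(r"^(\S+)\s+\((\S+)\)\s+(dm-\d+)")
# Path line: H:C:T:L, device name, major:minor, then three state words.
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(\w+)\s+(\d+:\d+)\s+(\w+)\s+(\w+)\s+(\w+)")


def parse_multipath(text):
    """Return {map_name: [path dicts]} for the abridged output layout."""
    maps = {}
    current = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group(1)
            maps[current] = []
            continue
        path = PATH_RE.search(line)
        if path and current is not None:
            hctl, dev, _majmin, dm_state, path_state, online = path.groups()
            healthy = (dm_state, path_state, online) == ("active", "ready", "running")
            maps[current].append({"dev": dev, "hctl": hctl, "healthy": healthy})
    return maps


def degraded_maps(maps, expected_paths=2):
    """Return the maps that have fewer healthy paths than expected."""
    return {
        name: paths
        for name, paths in maps.items()
        if sum(p["healthy"] for p in paths) < expected_paths
    }


print(degraded_maps(parse_multipath(SAMPLE)))  # {}
```

Feeding it a copy of the sample in which the sdc line reads `failed faulty running` (a hand-edited example, not captured output) makes `degraded_maps` return the `mpatha` entry, which is the condition worth alerting on.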

In the cloud: the same structure, repackaged

Cloud block storage (EBS and the like) is exactly the structure in this post. The volume attached to an instance is not a local disk but a block device across the network, with the redundant paths and the storage equipment operated on your behalf by the cloud. The “limits on both the volume and the instance” we saw in Part 5 are also a consequence of this structure: disk I/O ultimately passes through network bandwidth. On-prem SAN knowledge carries over to the cloud; only its form changes, from building paths to reading spec sheets.
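
The two-sided limit can be written as a small piece of arithmetic. This is a deliberately simplified model with invented numbers, not any provider’s actual formula: it assumes each volume has its own throughput limit and that the instance has one aggregate limit for all attached volumes.

```python
def effective_limit(volume_limit_mibps, instance_limit_mibps):
    """A single volume's ceiling is the smaller of its own limit and the instance's."""
    return min(volume_limit_mibps, instance_limit_mibps)


def per_instance_ceiling(volume_limits_mibps, instance_limit_mibps):
    """Several volumes together still cannot exceed the instance-level limit."""
    return min(sum(volume_limits_mibps), instance_limit_mibps)


# Hypothetical spec-sheet numbers (assumptions for illustration only)
print(effective_limit(1000, 600))            # 600
print(per_instance_ceiling([250, 250], 600)) # 500
```

In the first case a faster volume buys nothing, because the instance is the bottleneck. In the second the volumes are the bottleneck and the instance has headroom.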

Common pitfalls

  • Mixing iSCSI into the service network — the moment traffic surges, disk latency spikes along with it. Dedicated NIC/VLAN separation is half of running iSCSI.
  • Using the raw devices beneath multipath directly — write /dev/sdb into fstab and the redundancy becomes void. Use the bundled device and its identifier (WWID).
  • Letting a path failure pass quietly — with one path dead the service stays healthy, so it’s hard to notice. Alerts are what prevent the incident you only discover after the second path dies.
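
The fstab pitfall is easy to check mechanically. The sketch below is a rough filter under assumptions: the fstab content is a made-up example, the function name is hypothetical, and a `/dev/sdX` entry is only a candidate for review, since it may be a legitimate local disk rather than a path beneath multipath.

```python
import re

# Matches raw SCSI disk names such as /dev/sdb or /dev/sdb1.
RAW_SD = re.compile(r"^/dev/sd[a-z]+\d*$")


def raw_device_entries(fstab_text):
    """Return (line number, device, mount point) for entries that use /dev/sdX."""
    hits = []
    for lineno, line in enumerate(fstab_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if RAW_SD.match(fields[0]):
            mount_point = fields[1] if len(fields) > 1 else ""
            hits.append((lineno, fields[0], mount_point))
    return hits


# Hypothetical fstab content (assumption, not from the original post)
SAMPLE_FSTAB = """\
# storage mounts
/dev/mapper/mpatha   /data   xfs   defaults,_netdev   0 0
/dev/sdb1            /logs   xfs   defaults           0 0
"""

print(raw_device_entries(SAMPLE_FSTAB))  # [(3, '/dev/sdb1', '/logs')]
```

Here the `/data` entry uses the bundled device and passes, while `/logs` is flagged for a closer look.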

Wrap-up

The picture from this post:

  • iSCSI, FC, and NVMe-oF each have their own place: universality, consistency and isolation, and flash-era low latency.
  • The new failure mode of networked storage is a broken path, and multipath is the answer. Use the bundled device and watch path states.
  • Cloud block storage is a repackaging of this structure. Its networked nature shows up as limits and latency characteristics.

Next: GPU servers

The next post, “Hardware Intermediate #8: GPUs and Accelerators — The Fifth Resource of the AI Era,” invites a fifth guest to join the series’ four resources. We’ll cover how the GPU — the protagonist of AI workloads — works inside a server, why VRAM is the new bottleneck, how to read the numbers in nvidia-smi, and the technologies for sharing a GPU (passthrough, vGPU, MIG).
