AWS Certified Solutions Architect - Associate (SAA-C03) #11 Domain 3-3 High-Performing Architectures — Choosing Storage
Following #10 Caching, this post covers where to store your data. Choosing AWS storage starts with three types (block, file, and object) and then narrows to the service that fits the performance, sharing, and cost requirements within each type.
The three storage types #
| Type | Service | Access method | Analogy |
|---|---|---|---|
| Block | EBS | Attached to an instance as a disk | A hard disk plugged into a computer |
| File | EFS, FSx | Network file system (shared) | A shared network drive |
| Object | S3 | Per-object via API/HTTP | An unlimited file repository |
The first fork is “is it a disk for one instance (EBS), a file system shared by multiple instances (EFS/FSx), or object storage (S3)?”
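The fork above can be sketched as a tiny Python function. The function name and its two boolean parameters are made up for illustration; they are a simplification of real requirements, not an AWS API.

```python
def pick_storage_family(shared_by_multiple_instances: bool,
                        accessed_per_object_via_api: bool) -> str:
    """Return the storage family suggested by the first fork.

    Both parameters are hypothetical simplifications of a real requirement.
    """
    if accessed_per_object_via_api:
        # Data is read and written per object over API/HTTP.
        return "object (S3)"
    if shared_by_multiple_instances:
        # Several instances mount the same file system over the network.
        return "file (EFS/FSx)"
    # Otherwise it is a disk attached to one instance.
    return "block (EBS)"


print(pick_storage_family(shared_by_multiple_instances=True,
                          accessed_per_object_via_api=False))
```

Checking the object case first is a design choice for this sketch: once access is per object over an API, whether several instances read the data no longer changes the answer.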
EBS — block storage #
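EBS is a block volume you attach to an EC2 instance. A volume lives in one AZ and can only be attached to instances in that same AZ, normally one instance at a time (io1/io2 Multi-Attach is a limited exception, still within a single AZ), so it is not storage shared across AZs. You pick the volume type based on the workload.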
| Type | Class | Suitable for |
|---|---|---|
| gp3 / gp2 | General-purpose SSD | Most workloads, boot volumes |
| io2 / io1 | Provisioned IOPS SSD | High-IOPS, high-performance databases |
| st1 | Throughput-optimized HDD | Big data, log processing (sequential access) |
| sc1 | Cold HDD | Infrequently used large capacity (lowest cost) |
A “database that needs high IOPS” is io2, “large sequential throughput (logs, big data)” is st1, “rarely accessed bulk data at the lowest cost” is sc1, and the general case is gp3.
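As a rough sketch, the rule of thumb above maps onto a small lookup function. The function and parameter names are hypothetical, and the priority order (IOPS first, then throughput, then cold data) is an assumption of this sketch, not an AWS rule.

```python
def pick_ebs_volume_type(needs_high_iops: bool = False,
                         large_sequential_throughput: bool = False,
                         infrequently_accessed: bool = False) -> str:
    """Map a workload description to an EBS volume type (exam-level heuristic)."""
    if needs_high_iops:
        return "io2"   # Provisioned IOPS SSD, e.g. a high-performance database
    if large_sequential_throughput:
        return "st1"   # Throughput-optimized HDD, e.g. logs and big data
    if infrequently_accessed:
        return "sc1"   # Cold HDD, lowest cost
    return "gp3"       # General-purpose SSD, the default choice


print(pick_ebs_volume_type(needs_high_iops=True))
print(pick_ebs_volume_type())
```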
EFS — shared file system (Linux) #
EFS is a managed NFS that multiple EC2 instances can mount concurrently across multiple AZs. Capacity grows and shrinks automatically, making it well-suited as a shared file system for Linux workloads. It’s the default answer to “multiple instances need to share the same files.”
- Storage classes: Standard, IA (infrequent access), One Zone (single AZ, low cost)
- You can lower cost by automatically moving infrequently accessed files to IA.
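The automatic move to IA is configured as a lifecycle policy on the file system. Below is a minimal sketch of what such a policy could look like, built as plain Python data in the shape the EFS `PutLifecycleConfiguration` API takes. The 30-day threshold is an arbitrary example value, and nothing here calls AWS.

```python
import json

# Lifecycle policies in the shape expected by EFS PutLifecycleConfiguration.
# The 30-day threshold is an example value chosen for this sketch.
efs_lifecycle_policies = [
    # Move files not accessed for 30 days to the IA storage class.
    {"TransitionToIA": "AFTER_30_DAYS"},
    # Move a file back to the primary storage class the first time it is read again.
    {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
]

print(json.dumps(efs_lifecycle_policies, indent=2))
```

With boto3 installed and credentials configured, this list would be passed as the `LifecyclePolicies` argument of `put_lifecycle_configuration` on an EFS client, together with a real `FileSystemId`.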
FSx — special-purpose file systems #
FSx is a managed file system tailored to specific workloads. Two mainly appear on the exam.
| FSx type | Protocol/use |
|---|---|
| FSx for Windows File Server | SMB, shared files in a Windows environment (AD integration) |
| FSx for Lustre | HPC, ML, and other high-throughput, high-performance computing |
“SMB file sharing on Windows” is FSx for Windows File Server, and “an ultra-high-throughput file system for HPC and machine learning” is FSx for Lustre. (Beyond these there are also FSx for NetApp ONTAP and FSx for OpenZFS.)
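Putting EFS and the two exam-relevant FSx types side by side, the choice can be sketched as below. The function name, the `protocol` strings, and the `hpc_workload` flag are hypothetical, and ONTAP and OpenZFS are deliberately left out of this simplified picker.

```python
def pick_file_system(protocol: str, hpc_workload: bool = False) -> str:
    """Pick a managed file system from the protocol and workload (exam-level heuristic)."""
    if hpc_workload:
        # Ultra-high throughput for HPC and machine learning.
        return "FSx for Lustre"
    normalized = protocol.strip().upper()
    if normalized == "SMB":
        # Windows file sharing with AD integration.
        return "FSx for Windows File Server"
    if normalized == "NFS":
        # Shared file system for Linux instances across AZs.
        return "EFS"
    raise ValueError(f"unsupported protocol in this sketch: {protocol!r}")


print(pick_file_system("smb"))
print(pick_file_system("NFS"))
```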
S3 storage classes #
S3 is object storage, and you pick the class based on access frequency and retrieval time. Class selection is the heart of the cost questions.
| Class | Access frequency | Characteristics |
|---|---|---|
| Standard | Frequent | Default. Immediate access |
| Intelligent-Tiering | Unknown/variable | Moves between tiers automatically based on access pattern |
| Standard-IA | Rare | Cheaper storage, retrieval fee. Multi-AZ |
| One Zone-IA | Rare | Even cheaper, single AZ (lower availability; data is lost if that AZ is destroyed) |
| Glacier Instant Retrieval | Archive | Archive with immediate access available |
| Glacier Flexible Retrieval | Archive | Retrieval in minutes to hours |
| Glacier Deep Archive | Long-term storage | Lowest cost, standard retrieval within about 12 hours |
- Don’t know the access pattern → Intelligent-Tiering (auto-optimizes)
- Used occasionally but needs immediate access → Standard-IA (or Glacier Instant Retrieval)
- Long-term storage rarely used, slow retrieval is fine → Glacier Deep Archive (lowest cost)
- One Zone-IA is single-AZ, so you can lose the data if that AZ fails, making it suitable only for regeneratable data.
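The selection rules above can be written down as a small decision function. This is a sketch under stated assumptions: the function name, the `access` labels, and the `max_wait_hours` cutoffs are invented here to mirror the table, while the returned strings are the storage class identifiers the S3 API uses.

```python
def pick_s3_storage_class(access: str,
                          max_wait_hours: float = 0,
                          regeneratable: bool = False) -> str:
    """Pick an S3 storage class.

    access: one of "frequent", "unknown", "rare", "archive" (hypothetical labels).
    max_wait_hours: how long a retrieval may take; 0 means immediate access.
    regeneratable: True if the data can be recreated after an AZ loss.
    """
    if access == "frequent":
        return "STANDARD"
    if access == "unknown":
        return "INTELLIGENT_TIERING"      # auto-optimizes by access pattern
    if access == "rare":
        # One Zone-IA only for data that can be regenerated.
        return "ONEZONE_IA" if regeneratable else "STANDARD_IA"
    if access == "archive":
        if max_wait_hours <= 0:
            return "GLACIER_IR"           # Glacier Instant Retrieval
        if max_wait_hours < 12:
            return "GLACIER"              # Glacier Flexible Retrieval
        return "DEEP_ARCHIVE"             # lowest cost, slow retrieval
    raise ValueError(f"unknown access label: {access!r}")


print(pick_s3_storage_class("unknown"))
print(pick_s3_storage_class("archive", max_wait_hours=12))
```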
Lifecycle policies #
Lifecycle rules automatically move objects to cheaper classes or expire them as time passes. For example: “after 30 days to Standard-IA, after 90 days to Glacier, after 365 days delete.” Combined with the versioning from #8 Backup, a rule can also move noncurrent versions to a cheaper class, so old versions are kept at low cost.
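The 30/90/365-day example can be expressed as a lifecycle configuration. The sketch below only builds the configuration as a Python dict in the shape the S3 `PutBucketLifecycleConfiguration` API takes. The rule ID and the 30-day noncurrent-version threshold are assumptions added for illustration, and nothing here calls AWS.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-then-expire",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},          # empty prefix applies to all objects
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 365},       # delete after one year
            # Only meaningful on a versioned bucket; the threshold is an assumption.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

With boto3 installed and credentials configured, this dict would be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration` on an S3 client, along with a real bucket name.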
Exam question patterns #
- “A file system shared by multiple EC2 (Linux).” → EFS
- “Windows SMB sharing.” → FSx for Windows File Server
- “HPC/ML high-throughput file system.” → FSx for Lustre
- “High-IOPS block for a single instance.” → EBS io2
- “Don’t know the access pattern, automatic cost optimization.” → S3 Intelligent-Tiering
- “Lowest-cost long-term storage, 12-hour retrieval OK.” → Glacier Deep Archive
- “Automatically move old objects to lower cost.” → S3 lifecycle policy
Common traps #
1) Thinking EBS is shared by multiple instances/multiple AZs #
EBS is by default for a single AZ and a single instance. The shared file system is EFS.
2) Overlooking the durability of One Zone-IA #
One Zone-IA is stored in a single AZ only, so there’s a loss risk if that AZ fails. It’s unsuitable for important original data.
3) Ignoring the retrieval-time differences among Glacier classes #
Glacier Instant Retrieval is immediate, Flexible Retrieval takes minutes to hours, and Deep Archive takes about 12 hours. If “immediate access” is the clue, Deep Archive is the wrong answer.
4) Recommending EFS for a Windows share #
EFS is an NFS file system for Linux clients. A Windows SMB share is FSx for Windows File Server.
Wrap-up #
What this post locked in:
- The three types: block (EBS), file (EFS, FSx), object (S3). Whether the storage is shared is the first fork
- EBS: single AZ, single instance. io2 (high IOPS), st1 (throughput), gp3 (general purpose)
- EFS (Linux shared NFS) vs. FSx (Windows SMB / Lustre HPC)
- S3 classes: Standard, Intelligent-Tiering (pattern unknown), IA, Glacier (archive; Deep Archive is lowest cost but slow)
- Automatic cost optimization with lifecycle policies
Next — Domain 3-4 Choosing a DB #
Now that we’ve nailed storage, the last topic in the high-performing domain is choosing a database.
In #12 Domain 3-4 Choosing a DB we’ll cover the difference between RDS’s Multi-AZ (high availability) and read replicas (read scaling), cloud-native Aurora, NoSQL DynamoDB, and Redshift for analytics, organizing how to choose the database that fits a workload.