S3 — static hosting, presigned URLs
AWS's oldest object storage, S3. The shape of a bucket and the global uniqueness of its name, policies and Public Access Block, static site hosting, presigned URLs, and the patterns for lowering cost with storage classes.
If the EC2 of Chapters 8 and 9 is the compute domain, then S3 (Simple Storage Service) is AWS’s object storage domain. Launched in 2006 as one of AWS’s first services, it is among the oldest and still one of the most used.
S3 is effectively an infinite-capacity object store that looks like a global file system (though directories are fake). Eleven nines (99.999999999%) of durability, a price of about $0.023 per GB per month, and a role as the data hub for every other AWS service: these three are what define S3.
In this chapter we start from S3’s structure and lay out, in one flow, policies and security, static hosting, presigned URLs, and storage classes. The static hosting covered here is completed with HTTPS and an Edge cache in Chapter 14 CloudFront, and the S3 PUT trigger carries into the event processing of Chapter 17 Lambda Basics.
Buckets and objects #
S3 has only two core things to remember.
- A Bucket — a container that holds objects. Created per account / region.
- An Object — the actual file. Identified by a key.
my-bucket/                              ← bucket (name is globally unique)
  images/
    profile/2026/avatar-001.jpg         ← object (key = the full path)
    profile/2026/avatar-002.jpg
  videos/
    intro.mp4
  index.html

Directories don’t actually exist. The / in the listing above is part of the key: images/profile/2026/avatar-001.jpg is the full key of one object. The console just renders keys folder-style, splitting on /.
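To make the "fake directories" point concrete, here is a minimal sketch of how a flat list of keys can be grouped into folder-like entries. The function name `list_level` and the sample keys are illustrative assumptions; against a real bucket, the `Prefix` and `Delimiter` parameters of a list request play this role.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split flat keys into 'folders' and 'files' one level below prefix."""
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so show it as a folder
            folders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(folders), sorted(files)


keys = [
    "images/profile/2026/avatar-001.jpg",
    "images/profile/2026/avatar-002.jpg",
    "videos/intro.mp4",
    "index.html",
]
top_folders, top_files = list_level(keys)
```

Nothing here creates a directory. The "folders" are just the distinct key prefixes up to the next delimiter.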
The global uniqueness of a bucket name #
A bucket name must be unique across all AWS accounts worldwide. A plain name like my-bucket has already been taken by someone.
my-company-dev-uploads-2026
acme-prod-static-ap-northeast-2

The rules are as follows.
- 3 to 63 characters, using lowercase letters, digits, hyphens (-), and dots (.).
- A dot (.) is allowed but causes problems with SSL certificate wildcards, so usually only - is used.
- It can’t be in IP address form, and it can’t start with xn-- (Punycode).
- Uppercase and underscores are not allowed.
Putting the environment / purpose / region / company name in the name makes billing and search easier.
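The naming rules can be checked before calling AWS at all. The following is a minimal sketch covering only the rules listed in this section; the function name `is_valid_bucket_name` is hypothetical, and S3 enforces a few additional rules that are not modeled here.

```python
import re

# Lowercase letters, digits, dots, and hyphens, 3 to 63 characters in total
_ALLOWED = re.compile(r"[a-z0-9.-]{3,63}")
# Four dot-separated number groups, i.e. something shaped like an IP address
_IP_FORM = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate name against the rules listed in this section."""
    if not _ALLOWED.fullmatch(name):
        return False  # wrong length, uppercase, underscore, or other characters
    if _IP_FORM.fullmatch(name):
        return False  # IP address form is not allowed
    if name.startswith("xn--"):
        return False  # Punycode prefix is not allowed
    return True
```

Passing this check says nothing about global uniqueness. Only the create request itself can tell you whether the name is already taken.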
A bucket is per-region #
A bucket name is globally unique, but the data lives in one region. If you create it in ap-northeast-2 (Seoul), the object data stays inside the Seoul region. The console shows which region a bucket is in. This distinction of global name / regional data is in the same vein as the global services vs regional services discussion in Chapter 1 Intro to AWS.
Cross-region replication is set up explicitly with S3 Replication (CRR, Cross Region Replication).
The core attributes of an object #
One object has the following.
| Attribute | Role |
|---|---|
| Key | The object’s full path. Unique within the bucket |
| Body | The actual data (up to 5TB) |
| Content-Type | How the browser should handle it (image/jpeg, application/json) |
| Metadata | User-defined headers (x-amz-meta-*) |
| ACL | Object-level permissions (rarely used now, replaced by bucket policy) |
| Storage Class | The storage class (Standard, IA, Glacier, etc.) |
| Version ID | The version identifier, if versioning is on |
| ETag | A content hash (mostly MD5) |
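As a small illustration of the ETag row, the sketch below computes the MD5 hex digest that a simple single-part upload's ETag usually matches. The helper names are hypothetical. The "mostly" in the table matters: multipart uploads produce an ETag with a `-N` suffix that is not a plain MD5, and KMS-encrypted objects also differ.

```python
import hashlib


def expected_etag(data: bytes) -> str:
    """MD5 hex digest of the body, as typically seen for single-part uploads."""
    return hashlib.md5(data).hexdigest()


def looks_multipart(etag: str) -> bool:
    """Multipart ETags look like '<hex>-<part count>' rather than a plain MD5."""
    return "-" in etag.strip('"')
```

S3 returns the ETag wrapped in double quotes, so strip them before comparing it with a locally computed digest.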
Uploading via console / CLI / SDK #
# a single file
aws s3 cp ./image.jpg s3://my-bucket/images/profile/avatar.jpg
# sync an entire folder
aws s3 sync ./public s3://my-bucket --delete
# specify Content-Type
aws s3 cp ./index.html s3://my-bucket/ --content-type "text/html; charset=utf-8"
# download
aws s3 cp s3://my-bucket/data.json ./

import boto3

s3 = boto3.client("s3")
s3.upload_file("image.jpg", "my-bucket", "images/avatar.jpg")
s3.download_file("my-bucket", "data.json", "data.json")

The five layers of security #
S3’s security operates as overlapping layers. The list runs from the broadest guardrail at the top to the narrowest at the bottom, and a block at a higher layer can’t be undone by an allow further down.
1. Public Access Block ← most powerful. A block decision sits above everything
2. SCP (Organizations) ← an account-level guard
3. IAM Policy ← per user / role
4. Bucket Policy ← per bucket
5. Object ACL ← per object (old way, rarely used)

Public Access Block: first of all #
Public Access Block (PAB) is a safeguard that prevents a bucket from being made public by mistake. There are four options.
| Option | Meaning |
|---|---|
| BlockPublicAcls | Prevent new ACLs from becoming public |
| IgnorePublicAcls | Ignore existing public ACLs |
| BlockPublicPolicy | Prevent new bucket policies from becoming public |
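| RestrictPublicBuckets | For buckets with a public policy, limit access to AWS service principals and authorized users in the bucket owner’s account |
Since April 2023, every new bucket is created with all four options enabled by default, and turning them on at the account level as well is common practice. Only buckets that are deliberately public, such as static hosting buckets, have them explicitly disabled.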
aws s3control put-public-access-block \
  --account-id 123456789012 \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

Bucket Policy: a JSON policy #
A Bucket Policy is a JSON policy attached directly to a bucket. It defines who (Principal) can do what (Action) where (Resource).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logdelivery.elasticloadbalancing.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-alb-logs/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MyAppRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}

IAM Policy #
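A policy attached to an IAM user / role. It combines with the Bucket Policy to produce the final effect. Within a single account, an Allow in either policy is enough as long as nothing explicitly denies the request. In the cross-account case, both must pass for Allow to apply.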
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-bucket/uploads/*"
  }]
}

Detailed IAM setup is covered in Chapter 2 IAM.
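How the IAM Policy and the Bucket Policy combine can be written down as a tiny decision function. This is a deliberately simplified model, not AWS's actual evaluation engine: it ignores SCPs, permission boundaries, and Public Access Block, and the function name and parameters are illustrative assumptions.

```python
def s3_access_allowed(identity_allows: bool, bucket_allows: bool,
                      same_account: bool, explicit_deny: bool = False) -> bool:
    """Simplified combination of an IAM policy and a bucket policy."""
    if explicit_deny:
        # An explicit Deny in either policy always wins
        return False
    if same_account:
        # Same account: an Allow on either side is sufficient
        return identity_allows or bucket_allows
    # Cross-account: both the caller's IAM policy and the bucket policy must allow
    return identity_allows and bucket_allows
```

The cross-account branch is the one that tends to surprise people: a bucket policy that names a role in another account does nothing until that role's own IAM policy also allows the action.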
Static site hosting #
S3 can just host static HTML / CSS / JS. It’s the simplest static site hosting method.
aws s3 website s3://my-static-site/ \
  --index-document index.html \
  --error-document 404.html

After this command, the bucket responds at the following URL.

http://my-static-site.s3-website.ap-northeast-2.amazonaws.com

Allowing public access #
Under the defaults, PAB blocks it. Static hosting is intentionally public, so do the following.
- Disable the two policy-related items (BlockPublicPolicy and RestrictPublicBuckets) in the bucket’s PAB.
- Allow GetObject to everyone with a Bucket Policy.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-static-site/*"
  }]
}

The limits of S3 static hosting #
S3 alone can’t do the following.
- HTTPS — the S3 website endpoint is HTTP.
- Custom domain + SSL certificate — not directly.
- Edge cache — no fast worldwide responses.
So in operations you almost always use the S3 + CloudFront pattern (covered in Chapter 14 CloudFront). At that point the pattern is to turn PAB back on and allow access only from CloudFront via Origin Access Control (OAC).
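For reference, the plain public-hosting setup from this section (before CloudFront enters the picture) can be sketched with boto3 as well. The helper names are hypothetical, and the assumption is a bucket that already exists and an account-level PAB that permits public policies. The two ACL-related PAB options stay on because only the bucket policy needs to be public.

```python
import json


def website_configuration(index="index.html", error="404.html"):
    """Body for put_bucket_website: index and error documents."""
    return {"IndexDocument": {"Suffix": index}, "ErrorDocument": {"Key": error}}


def relaxed_public_access_block():
    """Keep the ACL guards on, relax only the two policy-related options."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    }


def public_read_policy(bucket):
    """Bucket policy that lets anyone GET objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


def enable_static_hosting(bucket):
    """Apply all three settings; needs boto3 and credentials, so it is not run here."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3")
    # PAB has to be relaxed first, otherwise the public policy is rejected
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=relaxed_public_access_block(),
    )
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(public_read_policy(bucket)))
    s3.put_bucket_website(Bucket=bucket, WebsiteConfiguration=website_configuration())
```

The order of the calls is the point of the sketch: the policy step fails while BlockPublicPolicy is still on.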
Presigned URL — temporary permission #
A presigned URL grants temporary permission: anyone holding the URL can download or upload one specific object for just N minutes. It’s a pattern for briefly delegating your own permission to a user who has none.
The most common use cases are as follows.
- Uploading a user profile image — the client PUTs directly to S3.
- Downloading a payment receipt — a 5-minute link.
- Private video streaming — a 1-hour token.
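import boto3

s3 = boto3.client("s3")

filename = "photo.jpg"  # example value; in practice it comes from the upload request

url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "my-bucket",
        "Key": f"uploads/user-123/{filename}",
        "ContentType": "image/jpeg",
    },
    ExpiresIn=600,  # 10 minutes
)
# the client makes a PUT request to this URL

# ContentType is part of the signature, so the client must send the same header
curl -X PUT -H "Content-Type: image/jpeg" --upload-file ./photo.jpg "<presigned-url>"

The security of presigned URLs #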
- The URL itself carries the signature, so it works like a bearer token. Anyone with just that URL can use it.
- It automatically becomes invalid once the expiry time passes.
- Use HTTPS only. If it leaks over HTTP, it’s exposed.
- You can bake in constraints too: a fixed ContentType on a PUT URL, or a content-length-range on a POST form.
POST form vs PUT URL #
There are two upload methods.
- PUT URL — simple. Pass metadata via headers, and fix a single ContentType.
- POST form (presigned post): complex. Safer with multiple conditions (content-length-range, starts-with, etc.).
For large-scale or important uploads, the POST form is recommended. For simple cases, a PUT URL is enough.
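A minimal sketch of the POST form variant follows, using boto3's `generate_presigned_post`. The helper names, the 5 MB limit, and the image-only restriction are assumptions chosen for illustration; the call itself needs credentials and is therefore wrapped in a function rather than run here.

```python
def upload_conditions(max_bytes: int):
    """Policy conditions that S3 checks when the form is submitted."""
    return [
        ["content-length-range", 1, max_bytes],       # reject empty or oversized bodies
        ["starts-with", "$Content-Type", "image/"],   # only image/* content types
    ]


def create_upload_form(bucket: str, key: str, max_bytes: int = 5 * 1024 * 1024):
    """Return {'url': ..., 'fields': {...}} for an HTML form or multipart POST."""
    import boto3  # imported lazily so upload_conditions can be used without it

    s3 = boto3.client("s3")
    return s3.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields={"Content-Type": "image/jpeg"},
        Conditions=upload_conditions(max_bytes),
        ExpiresIn=600,  # 10 minutes, same as the PUT example
    )
```

The client sends every entry of the returned `fields` as form fields, followed by the file itself. Unlike a PUT URL, an upload that violates the size range is rejected by S3 rather than stored.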
Versioning and lifecycle #
Versioning — object history #
When you turn on Versioning for a bucket, the previous version is automatically preserved when you PUT to the same key multiple times.
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

After it’s on, the following holds.
- Even a Delete doesn’t actually delete. Only a Delete Marker is added.
- A previous version can be restored with --version-id.
- Storage cost sums up all versions (a trap).
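Restoring a previous version amounts to finding its version ID and copying it back on top of the key. The sketch below is one way to do that under simplifying assumptions: the helper names are hypothetical, pagination of the version listing is ignored, and the sample data only mimics the shape of a `list_object_versions` response.

```python
from datetime import datetime, timezone


def previous_version_id(versions, key):
    """Pick the most recent noncurrent version of key, or None if there is none."""
    candidates = [v for v in versions if v["Key"] == key and not v["IsLatest"]]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["LastModified"])["VersionId"]


def restore_previous(bucket, key):
    """Copy the previous version over the key; needs boto3 and credentials."""
    import boto3  # imported lazily so the selection logic can be used without it

    s3 = boto3.client("s3")
    listing = s3.list_object_versions(Bucket=bucket, Prefix=key)
    version_id = previous_version_id(listing.get("Versions", []), key)
    if version_id is None:
        raise LookupError(f"no previous version of {key}")
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )
    return version_id


def _at(day):
    return datetime(2026, 3, day, tzinfo=timezone.utc)


sample_versions = [
    {"Key": "data.json", "VersionId": "v3", "IsLatest": True, "LastModified": _at(3)},
    {"Key": "data.json", "VersionId": "v2", "IsLatest": False, "LastModified": _at(2)},
    {"Key": "data.json", "VersionId": "v1", "IsLatest": False, "LastModified": _at(1)},
]
```

The copy creates yet another version instead of rewriting history, so the restored object is the new latest and every older version is still there, and still billed.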
Lifecycle — automatic cleanup / transition #
A lifecycle rule automatically moves old objects to a cheaper class or deletes them.
{
  "Rules": [{
    "ID": "ArchiveOldLogs",
    "Status": "Enabled",
    "Filter": { "Prefix": "logs/" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}

In operations a lifecycle is nearly essential. Without one, the bill becomes scary a few months later.
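When Versioning is on, the lifecycle usually needs a second rule for noncurrent versions. Below is a minimal sketch that builds both rules as Python dicts and applies them with boto3. The rule ID `CleanUpOldVersions`, the 30-day default, and the helper names are assumptions for illustration.

```python
def archive_logs_rule():
    """The logs/ rule from this section, as a Python dict."""
    return {
        "ID": "ArchiveOldLogs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }


def noncurrent_cleanup_rule(noncurrent_days=30):
    """Delete versions once they have been noncurrent for the given number of days."""
    return {
        "ID": "CleanUpOldVersions",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
    }


def apply_lifecycle(bucket, rules):
    """Replace the bucket's lifecycle configuration; needs boto3 and credentials."""
    import boto3  # imported lazily so the rule builders work without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```

Note that the put call replaces the whole configuration, so pass every rule the bucket should keep, for example `apply_lifecycle("my-bucket", [archive_logs_rule(), noncurrent_cleanup_rule()])`.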
Storage classes — the cost item #
By placing the same data in a different class based on how often and how fast you retrieve it, you save a great deal on cost.
| Class | Price (per GB-month) | Access frequency | Retrieval time | Role |
|---|---|---|---|---|
| Standard | $0.023 | Daily | Instant | Default. Hot data |
| Standard-IA | $0.0125 | Occasional | Instant | Backups, analytics data |
| One Zone-IA | $0.01 | Occasional, recreatable | Instant | One AZ only — low importance |
| Intelligent-Tiering | Auto | Pattern unknown | Instant | Access frequency varies |
| Glacier Instant Retrieval | $0.004 | Once a quarter | Instant | Archive + occasionally needed instantly |
| Glacier Flexible Retrieval | $0.0036 | 1~2 times a year | Minutes~hours | General archive |
| Glacier Deep Archive | $0.00099 | Almost never | Within 12 hours | Long-term compliance |

The numbers are approximate values for ap-northeast-2. For details, see the official price list.
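To get a feel for the spread, the table's prices can be plugged into a one-line calculation. This is a rough sketch using the approximate figures above: it covers storage only and leaves out retrieval fees, request charges, and minimum storage durations, all of which matter for the colder classes. The dictionary keys are just labels chosen for this example.

```python
# Approximate storage prices in USD per GB-month, copied from the table above
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,
    "GLACIER_FLEXIBLE": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD for the given class."""
    return gb * PRICE_PER_GB_MONTH[storage_class]


def compare_classes(gb: float) -> dict:
    """Monthly storage cost of the same data in every class, rounded to cents."""
    return {name: round(monthly_storage_cost(gb, name), 2) for name in PRICE_PER_GB_MONTH}
```

For 1,000 GB this works out to roughly $23 a month in Standard against about $1 in Deep Archive, which is why the class decision is worth making deliberately.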
A class decision guide #
This data: do you look at it daily / weekly?
├── YES → Standard
└── NO →
    Occasionally (once a month or less)?
    ├── YES → Standard-IA (One Zone-IA if recreatable)
    └── NO →
        Pattern predictable?
        ├── YES → Glacier family
        └── NO → Intelligent-Tiering

Trap: class transition cost #
Each transition, such as Standard to IA, is billed as a request, at roughly $0.01 per 1,000 objects. With 100 million objects that is about $1,000 for a single transition, so the cost adds up. Don’t move objects often with a lifecycle. Decide by looking at object size and access frequency.
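Whether a transition pays off depends mostly on object size, which a short calculation makes visible. The sketch below takes the transition price as a parameter quoted per 1,000 requests; the default of $0.01 and the Standard / Standard-IA storage prices are approximate assumptions to check against the current price list, and the function names are hypothetical.

```python
def transition_cost(object_count: int, price_per_1000: float = 0.01) -> float:
    """One-off cost in USD of transitioning object_count objects."""
    return object_count / 1000 * price_per_1000


def breakeven_months(object_size_bytes: int,
                     price_from: float = 0.023,
                     price_to: float = 0.0125,
                     price_per_1000: float = 0.01) -> float:
    """Months of cheaper storage needed to recover one object's transition fee."""
    size_gb = object_size_bytes / 1024 ** 3
    monthly_saving = size_gb * (price_from - price_to)
    return (price_per_1000 / 1000) / monthly_saving
```

Under these assumptions a 10 MB object recovers its transition fee within days, while a 128 KB object needs most of a year. That gap is the reason to filter lifecycle rules by size or prefix rather than moving everything.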
S3’s consistency #
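Before December 2020, overwrites and deletes were only eventually consistent, so a read right after a write could return stale data. Since December 2020, strong read-after-write consistency is guaranteed in all regions.
- GET immediately after PUT is possible.
- LIST after DELETE is reflected immediately.
That said, bucket-level configuration changes, such as turning on versioning for the first time, may still take a short while to propagate.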
The role of S3 and other services #
| Service often used together | Pattern |
|---|---|
| CloudFront | S3 + Edge cache + your domain (Chapter 14) |
| Lambda | Image conversion / indexing via an S3 PUT trigger (Chapter 17) |
| Athena | SQL over CSV / Parquet / JSON in S3 |
| Glue | Data catalog / ETL over S3 |
| CloudTrail / VPC Flow Logs / ALB Logs | All stored in S3 |
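As a small taste of the Lambda row, here is a minimal sketch of a handler that pulls the bucket and key out of an S3 event notification. The handler name and the sample event are illustrative, and the real processing (image conversion, indexing) is left out. One detail worth knowing is that object keys arrive URL-encoded in the event.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Collect (bucket, key) pairs from an S3 event notification."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event, e.g. a space arrives as '+'
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "images/profile/my+photo.jpg"}}}
    ]
}
```

Forgetting to decode the key is a common cause of "NoSuchKey" errors when the handler goes on to fetch the object.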
Common pitfalls #
- A bucket unintentionally public: Misconfigured S3 buckets are behind a large share of the data leaks that make the news. Start new buckets with all four PAB options on, and explicitly disable them only in intended cases like static hosting.
- A cost bomb: Billing is threefold, covering storage per GB, number of requests, and data transfer. In particular, traffic going out to the internet (Egress) is about $0.09 per GB, so for a popular static site this is large. Bundle it with CloudFront to cut Egress and accelerate with the Edge cache (Chapter 14).
- Millions of small files: There’s a GET / PUT cost for every single object, so a pattern of millions of small files is surprisingly costly. The answer is to bundle them (tar.gz, Parquet) for storage or move to a different store.
- A year with no Lifecycle: If logs or temp files just sit in Standard, the bill explodes a few months later. Set up the lifecycle on the first day you create the bucket.
- Turning on Versioning and forgetting: If you turn on Versioning with no lifecycle, storage cost grows without bound. If you turned it on, clean up old noncurrent versions with a lifecycle.
- Presigned URL expiry too long: A 24-hour presigned URL is effectively long-lived access. Usually set it to 5 to 15 minutes, 1 hour at the longest.
- s3:* wildcard IAM: An Action: "s3:*" policy is dangerous. Spell out only the actions that are needed, such as GetObject / PutObject / ListBucket.
Exercises #
- Of the five security layers listed earlier in this chapter, write down which layer you have to disable, and how, to deliberately make a static-hosting bucket public. Then, in one sentence, explain why the OAC pattern of Chapter 14 CloudFront turns the same bucket’s PAB back on.
- Looking at the code that sets ExpiresIn to 600 seconds for a presigned PUT URL, write down what problems arise when the expiry is too long versus too short, based on §“The security of presigned URLs”.
- Assume one log bucket and write a Lifecycle rule yourself that moves objects to Standard-IA after 30 days, to Glacier after 90 days, and deletes them after 365 days. Note which item in §“A cost bomb” this rule reduces, connecting it to Chapter 27 Cost Optimization.
In short: S3 is infinite object storage with only two concepts — buckets (globally unique names) and objects (keys) — and directories are fake. For security, PAB is at the top, and new buckets turn on all four PAB options. Static hosting has no HTTPS or Edge, so bundle it with CloudFront; a presigned URL is a 5~15-minute temporary permission delegation; and Versioning must always be paired with a Lifecycle.
Next chapter #
We’ve got the object domain in hand. Next, Chapter 11 RDS moves on to relational DBs. We’ll lay out RDS’s managed model, automated backups and PITR, Multi-AZ, parameter / option groups, and how to handle minor vs major upgrades.