AWS Basics #6: Security Basics — MFA, Key Rotation, Least Privilege
#2 IAM gave us the permission model and #5 took us all the way to SSO. This post layers production-grade security guardrails on top.
90% of AWS security incidents are one of these:
- Root / user password compromise (phishing) — no MFA
- Access keys leaked to git / Slack / logs
- Overly broad permissions amplifying the blast radius after compromise
- CloudTrail / GuardDuty disabled — incidents go undetected
Putting guardrails on these four brings the odds of an incident down to single-digit percentages.
MFA — the single most important thing #
Password-only authentication is not an acceptable production standard in 2026. One phishing attempt and the password is gone. MFA (Multi-Factor Authentication) demands a second factor on top of the password, usually a 6-digit code from a phone app.
Kinds of MFA #
| Kind | What it is | Recommendation |
|---|---|---|
| Virtual MFA (TOTP) | A phone app (Google Authenticator, 1Password, Authy) | Standard — almost everywhere |
| Hardware MFA | A USB key like YubiKey | Root / high-privilege — strongest |
| U2F / WebAuthn | Browser + hardware key | Production-grade credentials |
| SMS | Text message | Don’t use (SIM-swap attacks) |
Hardware MFA is ideal for root, virtual MFA is plenty for regular users.
Activate MFA on the root user #
Right after signup, the very first task.
Console (root login) → top-right user menu → Security credentials
→ Multi-factor authentication (MFA) → Assign MFA device
→ Pick Virtual MFA / Hardware MFA
→ Scan the QR code with your phone app
→ Enter two consecutive codes (the app rotates every 30s)
After this, every root login = password + 6-digit code.
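To make the "rotates every 30s" and "6-digit code" parts concrete, here is a minimal sketch of the TOTP algorithm (RFC 6238) that virtual MFA apps implement. The function name `totp` and the sample secret are illustrative only. The sketch assumes the common defaults of HMAC-SHA1, a 30-second step, and 6 digits, and it takes the secret as raw bytes, whereas a QR code normally carries it base32-encoded.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Return the time-based one-time code for `secret` at Unix time `at`."""
    if at is None:
        at = time.time()
    # The moving factor is the number of whole `step`-second windows since the epoch.
    counter = int(at) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Hypothetical shared secret (the RFC test value, not a real MFA seed).
secret = b"12345678901234567890"
print(totp(secret, at=30))  # start of one 30-second window
print(totp(secret, at=59))  # same window, so the same code
print(totp(secret, at=60))  # next window, so a different code
```

Asking for two consecutive codes during registration presumably lets the service confirm that the device holds the right secret and that its clock is close enough to land in the expected windows.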
Force MFA on IAM users #
Root alone is not enough. Force it on every IAM user. Two ways.
Option 1: A policy — “deny almost every action unless the session has MFA”
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowSelfManageCredentials",
"Effect": "Allow",
"Action": [
"iam:ChangePassword",
"iam:CreateVirtualMFADevice",
"iam:EnableMFADevice",
"iam:GetUser",
"iam:ListMFADevices",
"iam:ResyncMFADevice"
],
"Resource": [
"arn:aws:iam::*:user/${aws:username}",
"arn:aws:iam::*:mfa/${aws:username}"
]
},
{
"Sid": "DenyAllExceptListedIfNoMFA",
"Effect": "Deny",
"NotAction": [
"iam:CreateVirtualMFADevice",
"iam:EnableMFADevice",
"iam:GetUser",
"iam:ListMFADevices",
"iam:ResyncMFADevice",
"iam:ChangePassword",
"sts:GetSessionToken"
],
"Resource": "*",
"Condition": {
"BoolIfExists": { "aws:MultiFactorAuthPresent": "false" }
}
}
]
}
Attach this to the group all users belong to and they can do effectively nothing without MFA — except register their MFA device.
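The interplay of `NotAction` and `BoolIfExists` in the Deny statement above is easy to misread, so here is a toy model of it. This is not IAM's evaluation engine: `deny_applies` is a hypothetical helper that looks only at this one statement, matches action names exactly, and ignores every other policy attached to the user.

```python
from typing import Optional

# Actions exempted from the Deny through NotAction in the policy above.
EXEMPT_ACTIONS = {
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:GetUser",
    "iam:ListMFADevices",
    "iam:ResyncMFADevice",
    "iam:ChangePassword",
    "sts:GetSessionToken",
}


def deny_applies(action: str, mfa_present: Optional[bool]) -> bool:
    """Toy evaluation of the DenyAllExceptListedIfNoMFA statement.

    mfa_present is None when the request carries no
    aws:MultiFactorAuthPresent key at all.
    """
    # BoolIfExists matches when the key is missing OR equals "false".
    condition_matches = mfa_present is None or mfa_present is False
    # NotAction: the statement covers every action except the listed ones.
    action_covered = action not in EXEMPT_ACTIONS
    return condition_matches and action_covered


print(deny_applies("s3:ListBucket", False))       # no MFA in the session
print(deny_applies("s3:ListBucket", True))        # MFA-backed session
print(deny_applies("iam:EnableMFADevice", None))  # registering the device
```

The `None` case matters: requests signed with plain long-term access keys do not carry the MFA key at all, so under this policy they are denied as well, which is likely why `sts:GetSessionToken` is on the exempt list.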
Option 2: SSO (#5)
IAM Identity Center forces MFA automatically at console / CLI login. No policy authoring needed.
First-login MFA-registration flow #
A new IAM user logs in for the first time → the policy above blocks everything except MFA registration → they register their MFA → normal use afterward. That’s the standard flow.
Access-key rotation — 90 days is the norm #
These are the access keys from #4. The longer a key lives, the more leakage risk it accumulates.
Rotation policy #
| Item | Recommended cadence |
|---|---|
| User access keys | 90 days |
| CI / CD keys | 60 days (or move to OIDC) |
| Service-account keys | 30–60 days |
| Temporary credentials (SSO / Role) | No rotation needed (auto-issued short-lived) |
Rotation procedure #
Use IAM’s ability to hold two keys at once.
# 1) Issue a new key (now two active keys)
aws iam create-access-key --user-name curtis
# 2) Replace the key everywhere (CI env vars, ~/.aws/credentials, etc.)
# 3) Monitor for a few days — is the old key still in use?
# Check via CloudTrail or IAM credential report
# 4) Deactivate the old key (don't delete yet — for rollback)
aws iam update-access-key --user-name curtis --access-key-id AKIA-OLD --status Inactive
# 5) Monitor a week more → really unused → delete
aws iam delete-access-key --user-name curtis --access-key-id AKIA-OLD
IAM Credential Report — rotation audit #
A single CSV with every user’s keys / MFA / activity status.
aws iam generate-credential-report
aws iam get-credential-report --query Content --output text | base64 -d > report.csv
user
mfa_active
access_key_1_active
access_key_1_last_rotated
access_key_1_last_used_date
access_key_2_active
access_key_2_last_rotated
password_last_used
Keys older than 90 days, users without MFA, idle users for over a month — all in one place.
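Reading that CSV by eye does not scale, so a small script helps. The sketch below flags users without MFA and active keys past the 90-day mark. The function name `audit_report` and the sample rows are made up for illustration; the sketch assumes the report's timestamps are ISO 8601 with a UTC offset and that missing values appear as placeholders such as `N/A`.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import List


def audit_report(csv_text: str, now: datetime, max_age_days: int = 90) -> List[str]:
    """Return human-readable findings for a credential report CSV."""
    findings = []
    cutoff = now - timedelta(days=max_age_days)
    for row in csv.DictReader(io.StringIO(csv_text)):
        user = row["user"]
        if row["mfa_active"] != "true":
            findings.append(f"{user}: no MFA")
        for n in ("1", "2"):
            if row.get(f"access_key_{n}_active") != "true":
                continue
            try:
                rotated_at = datetime.fromisoformat(row[f"access_key_{n}_last_rotated"])
            except ValueError:
                continue  # placeholder such as "N/A", nothing to compare
            if rotated_at < cutoff:
                findings.append(f"{user}: access key {n} older than {max_age_days} days")
    return findings


# Made-up sample rows with only the columns this sketch reads.
SAMPLE = """user,mfa_active,access_key_1_active,access_key_1_last_rotated,access_key_2_active,access_key_2_last_rotated
alice,true,true,2025-01-01T00:00:00+00:00,false,N/A
bob,false,true,2025-05-20T00:00:00+00:00,false,N/A
"""

for finding in audit_report(SAMPLE, now=datetime(2025, 6, 1, tzinfo=timezone.utc)):
    print(finding)
```

Passing `now` explicitly keeps the check reproducible; a scheduled job would pass `datetime.now(timezone.utc)` and forward the findings wherever alerts go.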
When a key leaks #
You pushed code containing a key. Bots usually find it within minutes.
Immediate steps (chronological) #
# 1) Deactivate the leaked key immediately
aws iam update-access-key --user-name <user> --access-key-id <KEY-ID> --status Inactive
# 2) Issue a replacement key
aws iam create-access-key --user-name <user>
# 3) Check how the leaked key was used
# Console → CloudTrail → Event history
# Filter by AccessKeyId → look for unintended usage
# 4) Delete the leaked key
aws iam delete-access-key --user-name <user> --access-key-id <KEY-ID>
# 5) Clean up the repository
# Use BFG Repo-Cleaner or git filter-repo to scrub the key from history
# (if it was already pushed, history scrub alone isn't enough; key deletion is more important)
What AWS does for you #
AWS scans public repos like GitHub for leaked keys and, when found:
- Emails you immediately
- In some cases auto-deactivates the key + attaches a policy
That’s a backup safety net — finding it yourself is faster.
IAM Access Analyzer — finding overly broad permissions #
Analyzes your account’s policies / resource policies (S3 bucket policies, KMS key policies, etc.) to find anything externally accessible. Free.
Activation #
Console → IAM → Access Analyzer → Create analyzer
- Type: Account / Organization
- Name: my-account-analyzer
Within 24 hours you’ll see a list of resources reachable from outside.
What it catches #
| Resource | Risk |
|---|---|
| S3 bucket — public read | Anyone can read objects |
| S3 bucket — granted to another account | Verify it’s intended |
| KMS key — external use | Encryption key exposure |
| IAM Role — external trust | Another account can assume it |
| Lambda — external invoke permission | Anyone can invoke |
| RDS snapshots / SQS / SNS / Secrets Manager / EBS / ECR — public | Data / message exposure |
Policy validation #
When you author a new policy, Access Analyzer also surfaces recommendations.
- Unused permissions
- Wildcards that are too broad
- Suggestions to add conditions
Action Last Accessed — find unused permissions #
For each IAM user / role it shows the last action used. Permissions unused for 90 days are candidates to narrow.
aws iam generate-service-last-accessed-details --arn arn:aws:iam::123:role/MyRole
# (after a moment)
aws iam get-service-last-accessed-details --job-id ...
Least privilege — patterns that hold up #
“Only what’s needed, only where it’s needed.” Easy on paper, but how do you do it in production?
Pattern 1: start broad → narrow #
Crafting perfect permissions up front is hard. The realistic flow:
- Start with PowerUserAccess / a service’s *FullAccess
- After a week of use, check Access Analyzer’s Action Last Accessed
- Drop unused services / actions
- Replace wildcards with ARNs
- Add conditions
Repeat each quarter.
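The "drop unused services" step can be driven from the JSON that `get-service-last-accessed-details` returns. A minimal sketch, assuming the output has been saved as text and that `LastAuthenticated` is an ISO 8601 timestamp with an offset (it is absent for services never used). The helper name `unused_services` and the sample payload are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List


def unused_services(details_json: str, now: datetime, idle_days: int = 90) -> List[str]:
    """Return service namespaces not used within the last `idle_days` days."""
    details = json.loads(details_json)
    cutoff = now - timedelta(days=idle_days)
    unused = []
    for svc in details.get("ServicesLastAccessed", []):
        last = svc.get("LastAuthenticated")
        # Never used, or last used before the cutoff: candidate for removal.
        if last is None or datetime.fromisoformat(last) < cutoff:
            unused.append(svc["ServiceNamespace"])
    return sorted(unused)


# Made-up payload trimmed to the fields this sketch reads.
SAMPLE = json.dumps({
    "JobStatus": "COMPLETED",
    "ServicesLastAccessed": [
        {"ServiceNamespace": "s3", "LastAuthenticated": "2025-05-20T08:00:00+00:00"},
        {"ServiceNamespace": "rds", "LastAuthenticated": "2025-01-10T08:00:00+00:00"},
        {"ServiceNamespace": "ec2"},
    ],
})

print(unused_services(SAMPLE, now=datetime(2025, 6, 1, tzinfo=timezone.utc)))
```

The result is only a list of candidates. A service used once a year (for example during an annual restore drill) would show up here too, so it is worth a human look before removing anything.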
Pattern 2: separate users from roles #
Re-confirming the #2 patterns.
- Humans = SSO (#5)
- Machines = Roles (instance profiles, execution roles, OIDC)
- CI/CD = OIDC + Role (GitHub Actions, GitLab)
This is the setup where permanent access keys nearly disappear.
Pattern 3: permission boundary #
“Whatever this user does, it stays inside this fence.” Give a junior developer IAM rights without letting them create new policies / users to expand their own permissions.
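{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:*",
"s3:*",
"rds:*",
"logs:*"
],
"Resource": "*",
"Condition": {
"StringEquals": { "aws:RequestTag/env": "dev" }
}
}
]
}
Users with this boundary can’t create resources outside dev, even with their own policies. One caveat: a request that carries no env tag fails the StringEquals condition, so calls without request tags (most reads and updates) also fall outside this boundary. A production boundary would likely need additional statements, for example ones conditioned on aws:ResourceTag, to cover those.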
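The key property of a boundary is that it never grants anything by itself; it only caps what identity policies can grant. A toy model of that intersection, deliberately ignoring conditions, resources, explicit denies, and SCPs. The helper names are hypothetical, and `fnmatchcase` stands in for IAM's wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Action patterns from the boundary above (its tag condition is ignored here).
BOUNDARY_ACTIONS = ["ec2:*", "s3:*", "rds:*", "logs:*"]


def matches(action: str, patterns: Iterable[str]) -> bool:
    """True if `action` matches any pattern; IAM action names are case-insensitive."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(action: str, identity_actions: Iterable[str],
                        boundary_actions: Iterable[str] = BOUNDARY_ACTIONS) -> bool:
    """An action is usable only if BOTH the identity policy and the boundary allow it."""
    return matches(action, identity_actions) and matches(action, boundary_actions)


# Even an identity policy of "*" cannot reach past the boundary.
print(effectively_allowed("s3:GetObject", ["*"]))
print(effectively_allowed("iam:CreateUser", ["*"]))
# The boundary alone grants nothing: the identity policy must allow it too.
print(effectively_allowed("ec2:RunInstances", ["s3:*"]))
```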
Pattern 4: environment separation #
- Account separation (Organizations) — the strongest
- VPC separation — even within one account, separate prod and dev VPCs (Intermediate #1)
- Tag separation — env=prod tag for permission / cost separation (#3 tag strategy)
Pattern 5: break-glass role #
ReadOnly day-to-day, briefly elevated to Admin during an incident. Split into two SSO Permission Sets.
| Item | Day-to-day | During incident |
|---|---|---|
| In use | ReadOnly | Break-glass-Admin (1 hour) |
| Notification | — | Auto-notify a Slack channel |
| Audit | — | All actions logged in CloudTrail |
CloudTrail — who did what #
CloudTrail records every API call in the account. Auto-activated on signup (free 90-day event history). For production, create a Trail to persist into S3.
Post-signup check #
Console → CloudTrail → Trails
→ Create one Multi-region Trail (production standard)
- Name: my-trail
- Storage: a new S3 bucket
- Log file SSE-KMS encryption: on
- Log file validation: on (tamper detection)
Two kinds of CloudTrail events #
| Kind | What it is | Cost |
|---|---|---|
| Management events | Control-plane API calls (RunInstances, DeleteBucket, etc.) | Free (first copy per region) |
| Data events | S3 object access, Lambda invocations, etc. | Paid (high volume) |
Most Trails enable management only. Data events are turned on selectively for specific buckets / functions.
Common queries #
In the console’s Event history, search by:
- AccessKeyId — traces of a leaked key in use
- UserName — what one user did
- EventName — ConsoleLogin, DeleteBucket, RunInstances
- Time range
For larger setups, CloudTrail Lake or Athena enables SQL analysis.
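Between the console search and a full Athena setup, the same AccessKeyId search can be run over a log file that a Trail delivered to S3. A minimal sketch, assuming the file has already been downloaded and decompressed into a string. The helper name `key_activity` and the sample records are made up; the field names (`Records`, `userIdentity.accessKeyId`, `eventName`, `awsRegion`) follow the CloudTrail record format.

```python
import json
from collections import Counter


def key_activity(log_json: str, access_key_id: str) -> Counter:
    """Count (region, event name) pairs for calls made with one access key."""
    records = json.loads(log_json).get("Records", [])
    hits = [
        r for r in records
        if r.get("userIdentity", {}).get("accessKeyId") == access_key_id
    ]
    return Counter((r["awsRegion"], r["eventName"]) for r in hits)


# Made-up log file content, using AWS's documentation example key ID.
SAMPLE = json.dumps({"Records": [
    {"eventName": "RunInstances", "awsRegion": "ap-southeast-1",
     "userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"}},
    {"eventName": "RunInstances", "awsRegion": "ap-southeast-1",
     "userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"}},
    {"eventName": "ListBuckets", "awsRegion": "us-east-1",
     "userIdentity": {"accessKeyId": "AKIAI44QH8DHBEXAMPLE"}},
]})

for (region, event), count in key_activity(SAMPLE, "AKIAIOSFODNN7EXAMPLE").items():
    print(region, event, count)
```

Grouping by region is a deliberate choice here: a burst of `RunInstances` in a region you never use is one of the clearer signs that a leaked key is being abused.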
GuardDuty — automated threat detection #
GuardDuty analyzes CloudTrail / VPC Flow Logs / DNS logs with ML and surfaces suspicious activity.
Examples it catches #
| Pattern | Description |
|---|---|
| EC2 talking to an unusual region | Compromise / C&C |
| Access key used from an unusual region | Use after leak |
| Cryptocurrency-mining traffic | Post-compromise mining |
| Unusual port-scan behavior from EC2 | Lateral movement |
| Communication with Tor exit nodes | Suspicious traffic |
| Anomalous IAM-user behavior | Stolen credentials |
Activation #
Console → GuardDuty → Get started → Enable
- 30-day free trial
- After trial, billed by data volume (typically $10–50 / month for small ops)
In production always turn this on. Per dollar, the incidents it prevents are far more valuable.
Security Hub — unified security posture #
Aggregates results from many security tools into one place. Auto-runs standards like CIS Benchmark and AWS Foundational Security Best Practices.
Console → Security Hub → Enable
- Turn on every recommended standard
- Findings from GuardDuty / Access Analyzer / Inspector aggregate here
Right after signup it’s a bit much — turn it on once your resource footprint is non-trivial, perhaps a quarter in.
Real incident scenarios #
Case 1: access key pushed to GitHub → crypto mining #
The most common scenario. Bots find it in minutes, costs accumulate by the hour.
Response: deactivate immediately → new key → check usage in CloudTrail → if needed, isolate the account. For billing disputes, contact AWS Support; most Free Tier incidents are forgiven retroactively, but don’t count on that a second time.
Prevention: pre-commit hooks (gitleaks, truffleHog), GitHub secret scanning, don’t keep keys locally at all (SSO).
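To show roughly what such scanners look for, here is a simplified sketch of a pattern check for access key IDs. It is not how gitleaks or truffleHog are implemented (their rule sets are far broader), and the function name `find_access_key_ids` is hypothetical. It relies on the fact that key IDs are 20 characters long and start with `AKIA` (long-term) or `ASIA` (temporary).

```python
import re
from typing import List

# 4-character prefix plus 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> List[str]:
    """Return every substring of `text` that looks like an AWS access key ID."""
    return ACCESS_KEY_ID.findall(text)


# AWS's documentation example key ID, not a real credential.
staged = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\nregion = us-east-1\n"
hits = find_access_key_ids(staged)
if hits:
    print("possible access key IDs:", hits)
```

A pre-commit hook would run a check like this over the staged diff and exit non-zero on a hit. Matching only the key ID keeps false positives low, at the cost of missing a secret key committed without its ID.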
Case 2: overly broad S3 bucket policy #
To “let everyone in our company see it,” someone adds Principal: "*" → public read → indexed by search engines.
Response: Access Analyzer catches it. S3 Block Public Access enforces at account / bucket level.
Prevention: keep S3 Block Public Access on always, real-public needs go through CloudFront + Origin Access Control.
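The mistake in this case is mechanical enough to check for before a policy is applied. A minimal sketch of such a check, with a hypothetical helper name and a made-up bucket policy. It flags only unconditional Allow statements with a wildcard principal, which is far narrower than what Access Analyzer evaluates.

```python
import json
from typing import List


def public_statements(policy_json: str) -> List[str]:
    """Return Sids (or positions) of unconditional Allow statements open to everyone."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    flagged = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        is_public = principal == "*" or aws == "*" or (isinstance(aws, list) and "*" in aws)
        # A Condition may narrow access again, so only unconditional grants are flagged.
        if is_public and "Condition" not in stmt:
            flagged.append(stmt.get("Sid", f"statement[{i}]"))
    return flagged


# Made-up bucket policy: one public statement, one scoped to a single account.
SAMPLE = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "CompanyRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "PartnerRead", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

print(public_statements(SAMPLE))
```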
Case 3: root password phishing without MFA #
A fake “AWS billing alert” email → fake login page → password entered. Without root MFA, that’s a full breach.
Response: change the password immediately, audit every resource, contact AWS Support.
Prevention: root MFA is mandatory (ideally hardware), do daily work via IAM / SSO.
Case 4: a former employee’s keys still alive #
The IAM user / keys remain after offboarding. Months later, an incident.
Response: immediately deactivate / delete the user + audit Trail.
Prevention: include IAM cleanup in the offboarding checklist. With SSO, deactivating in the IdP is the single step.
Case 5: CloudTrail was off #
Investigation begins and there’s no log covering the time of the incident: someone (intentionally) turned the Trail off, or it was never enabled.
Response: too late. Try partial reconstruction from other logs (CloudWatch, GuardDuty findings).
Prevention: Trail enabled + log file validation + S3 object lock. Use SCPs (Service Control Policies) to deny disabling CloudTrail in the first place.
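As a sketch of what such an SCP could contain, the snippet below builds the policy document and prints it as JSON ready to paste into Organizations. The Sid and the exact action list are assumptions chosen for illustration; the four actions are the CloudTrail calls that stop, delete, or reconfigure a trail.

```python
import json


def cloudtrail_guard_scp() -> dict:
    """Build an SCP document that denies tampering with CloudTrail trails."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyCloudTrailTampering",  # hypothetical name
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                    "cloudtrail:PutEventSelectors",
                ],
                "Resource": "*",
            }
        ],
    }


print(json.dumps(cloudtrail_guard_scp(), indent=2))
```

Two things to keep in mind: SCPs do not restrict the organization's management account, and a blanket deny like this also blocks legitimate trail changes, so teams often add a condition that exempts one tightly controlled admin role.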
Common pitfalls #
1) Just turning on MFA and stopping there #
Even with MFA on, a leaked access key still works (CLI calls don’t go through MFA — see the credential chain in #4). Keep an eye on keys too — or eliminate them by switching to SSO.
2) Promising to rotate but not actually doing it #
Automate periodic checks with the Credential Report, and send a Slack alert for any key older than 90 days.
3) Wide-open iam:PassRole #
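iam:PassRole on Resource "*" lets a user hand any role in the account, including an admin role, to a service they control. That is effectively privilege escalation. Tighten it to specific role ARNs.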
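A quick way to find this pitfall in existing policies is to scan for Allow statements that pair PassRole with a wildcard resource. A minimal sketch with a hypothetical helper name and a made-up policy; it checks only literal `*` resources and does not evaluate conditions such as `iam:PassedToService`.

```python
import json
from typing import List


def wildcard_passrole(policy_json: str) -> List[str]:
    """Return Sids (or positions) of Allow statements granting PassRole on '*'."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        # iam:* and * include PassRole implicitly.
        grants_passrole = any(a.lower() in ("iam:passrole", "iam:*", "*") for a in actions)
        if grants_passrole and "*" in resources:
            flagged.append(stmt.get("Sid", f"statement[{i}]"))
    return flagged


# Made-up policy: one loose statement, one scoped to a single role ARN.
SAMPLE = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "Loose", "Effect": "Allow", "Action": "iam:PassRole", "Resource": "*"},
        {"Sid": "Tight", "Effect": "Allow", "Action": "iam:PassRole",
         "Resource": "arn:aws:iam::111122223333:role/app-execution-role"},
    ],
})

print(wildcard_passrole(SAMPLE))
```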
4) Conditions so strict you lock yourself out #
You allow only the office IP via aws:SourceIp, and then you travel. Apply console IP guards last, with a fallback (VPN / break-glass user).
5) Turning off GuardDuty #
“To save money” — but a single incident dwarfs a year of GuardDuty cost. In production, always on.
6) Ignoring Security Hub findings #
If you turn it on but never look, it’s noise. Review findings weekly, or route them to Slack through EventBridge.
Wrap-up #
What we covered:
- MFA — required everywhere. Hardware for root if possible, virtual TOTP for the rest. No SMS
- MFA enforcement policy — force on IAM users via policy. SSO does it automatically
- 90-day access-key rotation — issue new → swap → deactivate old → delete a week later. Audit via Credential Report
- On leak — deactivate immediately, check usage in CloudTrail, new key, then delete
- IAM Access Analyzer — externally accessible resources, policy validation, Action Last Accessed (unused permissions)
- Least-privilege patterns — start broad → narrow, user/role split, permission boundary, env separation, break-glass
- CloudTrail — one multi-region Trail + S3 + log validation. SCP to block disabling
- GuardDuty — ML-based threat detection. Production essential
- Security Hub — unified standard checks. Once your footprint is non-trivial
- Incident scenarios — key push, overly broad S3 policy, root phishing, ex-employee keys, Trail off
- Pitfalls — MFA-only isn’t safe, promise-only rotation, PassRole wildcards, IP conditions too strict, GuardDuty off, ignoring Hub findings
Next — CloudWatch #
The last stop in the series. We’ll cover CloudWatch, the eyes of all production work.
#7 CloudWatch intro — logs and metrics walks through Logs / Metrics / Alarms / Dashboards, log groups and retention, Metric Filters, and the basics of Logs Insights queries.