AWS in Practice #4: IaC — Terraform Fundamentals

Tuesday, May 5, 2026

The infrastructure built in #1 through #3 is still managed by hand through the console and CLI. Recreating it from scratch would mean working from memory or scattered notes, and the result would not match the original. This post moves that infrastructure into Terraform.

What we’ll cover:

Why IaC — repeatability / code review / drift tracking
The shape of Terraform — provider, resource, data, variable, output, state
State is the real core — S3 + DynamoDB lock backend
Modules — reusable units, environment branching
Code-ifying #1’s ECS infrastructure line by line

Why IaC #

Four pain points of a console-only workflow:

Can’t reproduce: ask for “staging just like prod” and human memory always leaves subtle differences
Can’t track changes: “who changed the SG last week?” means digging through CloudTrail. With code, it’s git log
Can’t review: a one-line SG inbound change on the production cluster never gets a second pair of eyes
Fear of delete / recreate: get one resource wrong and you’re afraid to touch it again

IaC (Infrastructure as Code) expresses infrastructure as declarative code, addressing all four problems at once.

Tool	Role
Terraform	Multi-cloud and the de facto standard. The focus of this post
Pulumi	Written in TypeScript / Python / Go. Strong for dynamic logic
AWS CDK	TypeScript / Python, synthesized to CloudFormation templates
CloudFormation	AWS-native YAML/JSON. Weak at dynamic expressions
OpenTofu	Open-source fork of Terraform, created after HashiCorp’s 2023 license change

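This series standardizes on Terraform. If your company uses OpenTofu by policy, everything in this post should carry over: the HCL syntax is the same and the CLI is a near drop-in replacement, although newer features have started to diverge between the two.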

1) Terraform’s five blocks #

main.tf — the smallest shape

# 1) Provider — how to talk to AWS
provider "aws" {
  region = "ap-northeast-2"
}

# 2) Resource — actual infrastructure to create
resource "aws_ecr_repository" "blog_api" {
  name                 = "blog-api"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

# 3) Data — query existing resources
data "aws_caller_identity" "current" {}

# 4) Variable — external input
variable "environment" {
  type    = string
  default = "dev"
}

# 5) Output — expose results
output "ecr_url" {
  value = aws_ecr_repository.blog_api.repository_url
}

Five blocks in one place make a unit of infrastructure.
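The example above declares a variable and a data source but never reads them. A minimal sketch of how the blocks reference each other, assuming a hypothetical log bucket and tag set that are not part of the original setup:

```hcl
# Assumes the provider, variable "environment", and
# data "aws_caller_identity" "current" blocks from main.tf.

# Hypothetical bucket: the account ID from the data source keeps the name globally unique.
resource "aws_s3_bucket" "logs" {
  bucket = "blog-logs-${var.environment}-${data.aws_caller_identity.current.account_id}"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# Outputs can expose data source attributes as well as resource attributes.
output "account_id" {
  value = data.aws_caller_identity.current.account_id
}
```

Running terraform apply -var="environment=prod" would override the "dev" default for that run.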

The 4-step workflow #

Terraform's 4 steps

terraform init      # download providers, init backend
terraform plan      # preview what gets created/changed/destroyed
terraform apply     # apply
terraform destroy   # delete

The output of plan is Terraform’s biggest value: it shows exactly what will be created, changed, or destroyed, so a reviewer can catch an incident before the code merges.

plan output example

Terraform will perform the following actions:

  # aws_security_group.fargate will be created
  + resource "aws_security_group" "fargate" {
      + arn                    = (known after apply)
      + name                   = "sg-fargate"
      + ingress = [
          + {
              + from_port = 8000
              + to_port   = 8000
              + protocol  = "tcp"
              + ...
            },
        ]
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Symbols: + add, ~ change in place, - destroy, -/+ replace (the resource is destroyed and created again with a new ID, so always look closely at these).

2) State — the real core #

Terraform stores “the state of the infrastructure built so far” in state (the .tfstate file). This file has to exist for the next plan to compute the diff.

Where state lives

Actual AWS infrastructure  ←──────  Terraform code
                                      │
                                      ▼
                              state (last apply's result)

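Terraform checks three things against each other, code ↔ state ↔ AWS: plan refreshes state against the real AWS resources, then diffs the code against that refreshed state to decide what to change.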

What happens if state breaks #

Situation	Result
State lost	Terraform thinks “nothing was created” → tries to recreate existing resources
Two people apply simultaneously	State breaks or one overwrites the other’s changes
State file as plaintext in git	Password / key exposure (state contains secrets in many resources)

Local .tfstate is for learning only. Production needs a remote backend.

S3 + DynamoDB Backend #

The most common production pattern.

backend.tf

terraform {
  required_version = ">= 1.7"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "blog-api/prod/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

Roles:

Component	Role
S3 bucket	State file storage (versioning + encryption enabled)
DynamoDB table	Lock table that blocks concurrent applies
bucket key prefix	`<project>/<env>/terraform.tfstate` pattern for env separation
encrypt = true	Server-side encryption of the state object (SSE-S3 by default; KMS only when kms_key_id is also set)
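If state should be encrypted with a specific customer-managed KMS key, the S3 backend accepts a kms_key_id argument alongside encrypt. A sketch, using a placeholder key ARN that is not from the original setup:

```hcl
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "blog-api/prod/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true

    # Placeholder ARN: replace with the ARN of a KMS key you manage.
    kms_key_id = "arn:aws:kms:ap-northeast-2:123456789012:key/REPLACE-WITH-KEY-ID"
  }
}
```

Any role that runs plan or apply then also needs permission to use that key (kms:Decrypt, kms:GenerateDataKey).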

One-time bootstrap to set up the backend #

S3 and DynamoDB themselves need to exist first. A classic chicken-and-egg problem. Two approaches:

Manual creation via console / CLI once (this post’s assumption)
Create with local backend in a separate “bootstrap” folder, then migrate backend to S3

bootstrap

aws s3api create-bucket \
  --bucket myorg-terraform-state \
  --region ap-northeast-2 \
  --create-bucket-configuration LocationConstraint=ap-northeast-2

aws s3api put-bucket-versioning \
  --bucket myorg-terraform-state \
  --versioning-configuration Status=Enabled

aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-northeast-2

Never destroy these two via Terraform. Your state lives inside.
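For the second approach, the same two resources could be declared in the bootstrap folder and applied once with the default local backend. This is a sketch under that assumption; the public access block and the prevent_destroy guards are additions that the CLI commands above do not include:

```hcl
# bootstrap/main.tf (applied once, local backend)
provider "aws" {
  region = "ap-northeast-2"
}

resource "aws_s3_bucket" "state" {
  bucket = "myorg-terraform-state"

  lifecycle {
    prevent_destroy = true # every environment's state lives here
  }
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the attribute name the S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }

  lifecycle {
    prevent_destroy = true
  }
}
```

To move the bootstrap folder's own state into the bucket afterwards, add a backend "s3" block there and run terraform init -migrate-state.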

3) Directory structure — environment separation #

Production shape

infra/
├─ modules/
│   ├─ network/        ← VPC, Subnets, SGs
│   ├─ ecs-service/    ← ALB + Service + Auto Scaling
│   └─ rds/            ← DB
├─ envs/
│   ├─ dev/
│   │   ├─ main.tf
│   │   ├─ backend.tf
│   │   ├─ variables.tf
│   │   └─ terraform.tfvars
│   └─ prod/
│       ├─ main.tf
│       ├─ backend.tf
│       ├─ variables.tf
│       └─ terraform.tfvars
└─ bootstrap/          ← S3 / DynamoDB (one-time)

Use different backend keys per environment to keep state separate:

envs/dev/backend.tf

terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "blog-api/dev/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

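This keeps dev and prod state in separate objects, so an apply in envs/dev reads and writes only the dev state. The bucket and lock table are still shared, so for hard isolation restrict each role to its own key with IAM, or use separate AWS accounts.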
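Separate keys only separate the files; what a dev pipeline can actually reach is decided by IAM. A minimal sketch of a policy for a hypothetical dev role, assuming the bucket, table, and account ID used elsewhere in this post:

```hcl
# Hypothetical policy document for the role that applies envs/dev.
data "aws_iam_policy_document" "dev_state_access" {
  # The backend lists the bucket to find state objects.
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::myorg-terraform-state"]
  }

  # Read and write only the dev state object, never blog-api/prod/...
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::myorg-terraform-state/blog-api/dev/terraform.tfstate"]
  }

  # The lock table is shared; lock items are keyed by "<bucket>/<key>".
  statement {
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = ["arn:aws:dynamodb:ap-northeast-2:123456789012:table/terraform-state-lock"]
  }
}
```

With a policy like this attached, a dev run that is pointed at the prod key by mistake fails with an access error.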

4) Modules — units of reuse #

Don’t repeat the same infrastructure pattern in dev / prod.

modules/ecs-service/variables.tf

variable "name"          { type = string }
variable "cluster_arn"   { type = string }
variable "image"         { type = string }
variable "vpc_id"        { type = string }
variable "subnet_ids"    { type = list(string) }
variable "alb_sg_id"     { type = string }
variable "desired_count" {
  type    = number
  default = 2
}
variable "cpu" {
  type    = string
  default = "512"
}
variable "memory" {
  type    = string
  default = "1024"
}
variable "container_port" {
  type    = number
  default = 8000
}
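Module inputs can also reject bad values at plan time with a validation block. A sketch of what two of the declarations above might look like with validation added; these would replace the plain declarations, and the allowed CPU list is an assumption that leaves out the larger Fargate sizes:

```hcl
variable "desired_count" {
  type    = number
  default = 2

  validation {
    condition     = var.desired_count >= 1
    error_message = "desired_count must be at least 1."
  }
}

variable "cpu" {
  type    = string
  default = "512"

  validation {
    # Fargate CPU units from 256 to 4096; extend the list if you need larger tasks.
    condition     = contains(["256", "512", "1024", "2048", "4096"], var.cpu)
    error_message = "cpu must be one of 256, 512, 1024, 2048, or 4096."
  }
}
```

A caller that passes cpu = "300" then gets the error message during plan instead of an API failure halfway through apply.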

modules/ecs-service/main.tf (excerpt)

resource "aws_security_group" "fargate" {
  name        = "sg-${var.name}-fargate"
  description = "Fargate task SG"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [var.alb_sg_id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_lb_target_group" "this" {
  name        = "tg-${var.name}"
  port        = var.container_port
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    interval            = 15
  }
}

resource "aws_ecs_task_definition" "this" {
  family                   = var.name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.execution.arn
  task_role_arn            = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name  = "api"
    image = var.image
    portMappings = [{ containerPort = var.container_port, protocol = "tcp" }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.this.name
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = "api"
      }
    }
  }])
}

resource "aws_ecs_service" "this" {
  name            = var.name
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [aws_security_group.fargate.id]
    assign_public_ip = false # callers pass private subnets; assumes they reach ECR via NAT or VPC endpoints
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.this.arn
    container_name   = "api"
    container_port   = var.container_port
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }
}

output "target_group_arn" { value = aws_lb_target_group.this.arn }
output "service_name"     { value = aws_ecs_service.this.name }
output "fargate_sg_id"    { value = aws_security_group.fargate.id }

#1’s console work is now in this single file.
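The excerpt references a few objects it does not declare: data.aws_region.current, aws_cloudwatch_log_group.this, and the two IAM roles. A minimal sketch of what they might look like, assuming a hypothetical log group naming scheme and retention period:

```hcl
data "aws_region" "current" {}

resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${var.name}" # hypothetical naming scheme
  retention_in_days = 30                 # assumption; pick your own retention
}

# Both roles are assumed by ECS tasks.
data "aws_iam_policy_document" "ecs_tasks_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Execution role: used by ECS itself to pull the image and write logs.
resource "aws_iam_role" "execution" {
  name               = "${var.name}-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

resource "aws_iam_role_policy_attachment" "execution" {
  role       = aws_iam_role.execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task role: assumed by the application code; attach app-specific policies here.
resource "aws_iam_role" "task" {
  name               = "${var.name}-task"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}
```

Keeping the two roles separate means the application never inherits the permissions ECS needs for pulling images.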

Using the module #

envs/prod/main.tf

module "network" {
  source = "../../modules/network"
  name   = "blog-prod"
  cidr   = "10.0.0.0/16"
  azs    = ["ap-northeast-2a", "ap-northeast-2c"]
}

module "rds" {
  source              = "../../modules/rds"
  name                = "blog-prod"
  vpc_id              = module.network.vpc_id
  db_subnet_ids       = module.network.db_subnet_ids
  fargate_sg_id       = module.api.fargate_sg_id
  multi_az            = true
  instance_class      = "db.t4g.small"
  deletion_protection = true
}

module "api" {
  source        = "../../modules/ecs-service"
  name          = "blog-prod"
  cluster_arn   = aws_ecs_cluster.blog.arn
  image         = var.image # injected by CI
  vpc_id        = module.network.vpc_id
  subnet_ids    = module.network.private_subnet_ids
  alb_sg_id     = module.network.alb_sg_id
  desired_count = 4
  cpu           = "1024"
  memory        = "2048"
}

The dev environment uses a lighter configuration: desired_count = 1, multi_az = false, instance_class = "db.t4g.micro". Same module, different variables is the key.
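As a sketch, envs/dev/main.tf might look like the following. The CIDR range and the deletion_protection setting are assumptions, and the file expects its own aws_ecs_cluster.blog and var.image just like prod:

```hcl
# envs/dev/main.tf (sketch)
module "network" {
  source = "../../modules/network"
  name   = "blog-dev"
  cidr   = "10.1.0.0/16" # hypothetical; any range that does not overlap prod
  azs    = ["ap-northeast-2a", "ap-northeast-2c"]
}

module "rds" {
  source              = "../../modules/rds"
  name                = "blog-dev"
  vpc_id              = module.network.vpc_id
  db_subnet_ids       = module.network.db_subnet_ids
  fargate_sg_id       = module.api.fargate_sg_id
  multi_az            = false
  instance_class      = "db.t4g.micro"
  deletion_protection = false # assumption: dev databases are disposable
}

module "api" {
  source        = "../../modules/ecs-service"
  name          = "blog-dev"
  cluster_arn   = aws_ecs_cluster.blog.arn
  image         = var.image
  vpc_id        = module.network.vpc_id
  subnet_ids    = module.network.private_subnet_ids
  alb_sg_id     = module.network.alb_sg_id
  desired_count = 1
  # cpu and memory fall back to the module defaults ("512" / "1024")
}
```

The only differences from prod are the values, which is what makes a diff between the two environment folders easy to review.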

5) Terraform ↔ CI/CD integration #

How to combine Terraform with the GitHub Actions pipeline from #3.

Two flows #

Flow	Description
A. Separate infra / app	Infra changes via separate PR + apply, app deploy just updates the image
B. One bundled workflow	Image build → terraform apply puts new image into service

A is recommended to start. Infrastructure changes are infrequent and high-risk; app deploys are frequent and lower-risk. Keeping them separate reflects that difference.
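One consequence of flow A: if the app pipeline registers new task definition revisions outside Terraform, the next plan will want to point the service back at the revision Terraform knows about. A common way to avoid that is ignore_changes. A sketch, assuming the app pipeline owns the task definition revision:

```hcl
resource "aws_ecs_service" "this" {
  # ... same arguments as in modules/ecs-service/main.tf ...

  lifecycle {
    # Assumption: the app deploy updates the service's task definition,
    # so Terraform should not treat the newer revision as drift.
    ignore_changes = [task_definition]
  }
}
```

If Auto Scaling adjusts the task count, desired_count is usually added to the same list for the same reason.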

Plan as a PR comment #

.github/workflows/terraform-plan.yml

name: Terraform Plan
on:
  pull_request:
    paths: ['infra/**']

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run: { working-directory: infra/envs/prod }
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-plan
          aws-region: ap-northeast-2
      - uses: hashicorp/setup-terraform@v3
        with: { terraform_version: 1.9.0 }
      - run: terraform init
      - run: terraform plan -no-color -out=tfplan
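      - name: Comment Plan
        uses: actions/github-script@v7
        with:
          script: |
            // Render the saved plan; raise maxBuffer because large plans exceed the 1 MiB default.
            const out = require('child_process')
              .execSync('terraform show -no-color tfplan', {
                cwd: 'infra/envs/prod',
                maxBuffer: 16 * 1024 * 1024,
              })
              .toString();
            // GitHub rejects comment bodies over 65,536 characters, so truncate long plans.
            const body = out.length > 60000 ? out.slice(0, 60000) + '\n... (truncated)' : out;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '```\n' + body + '\n```'
            });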

The plan comment puts every pending change in front of reviewers in one place, which makes PR review the most effective point to catch a potential production incident before the code merges.

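The terraform-plan role needs only read access to the infrastructure and the state object, plus write access to the DynamoDB lock table, because plan takes the state lock by default. Apply needs a separate, more privileged role.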
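A sketch of what the plan role could look like in Terraform. The repository name myorg/blog-infra is hypothetical, and the sketch assumes the GitHub OIDC provider already exists in the account. One detail worth noting: plan takes the state lock by default, so the role needs write access to the lock table on top of read-only access.

```hcl
# Assumes an existing OIDC provider for GitHub Actions in this account.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "plan_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Hypothetical repository; only pull_request runs may assume the role.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:myorg/blog-infra:pull_request"]
    }
  }
}

resource "aws_iam_role" "terraform_plan" {
  name               = "terraform-plan"
  assume_role_policy = data.aws_iam_policy_document.plan_trust.json
}

# Read-only view of the account so plan can refresh resources and read state.
resource "aws_iam_role_policy_attachment" "plan_read_only" {
  role       = aws_iam_role.terraform_plan.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

# Lock acquisition and release write to the lock table.
data "aws_iam_policy_document" "plan_lock" {
  statement {
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = ["arn:aws:dynamodb:ap-northeast-2:123456789012:table/terraform-state-lock"]
  }
}

resource "aws_iam_role_policy" "plan_lock" {
  name   = "state-lock"
  role   = aws_iam_role.terraform_plan.id
  policy = data.aws_iam_policy_document.plan_lock.json
}
```

The apply role would use the same trust pattern but restrict the sub claim to the main branch and carry write permissions.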

6) Drift tracking #

Anything changed by hand in the console diverges from state — this is called drift. terraform plan shows the diff, effectively asking “should I revert this?”

Periodic drift check

terraform plan -detailed-exitcode
# exit 0 = no diff
# exit 1 = error
# exit 2 = diff exists (not a failure)

Run daily in CI, alert Slack on exit 2.

Pitfalls — Terraform operations #

1) State lock not released #

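If apply is killed before it can clean up (a second Ctrl-C, a cancelled CI job), the DynamoDB lock entry stays behind. The next plan or apply fails with “Error acquiring the state lock.”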

Force unlock (careful)

terraform force-unlock <LOCK_ID>

LOCK_ID is in the error message. Always confirm that no one else is actually working before doing this.

2) Manual state edits #

Opening .tfstate in vim and editing directly almost always ends in regret. Instead:

state commands

terraform state list                       # list resources
terraform state show aws_ecr_repository.x  # show one resource
terraform state rm aws_ecr_repository.x    # remove from state (doesn't delete actual resource)
terraform state mv module.a.x module.b.x   # move resource
terraform import aws_ecr_repository.x my-repo  # register existing resource into state
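Two of these operations also have declarative forms that go through plan and code review instead of being run by hand: import blocks (Terraform 1.5+) and moved blocks (Terraform 1.1+). A sketch with placeholder addresses:

```hcl
# Adopt an existing repository into state on the next apply.
# "my-repo" is the same placeholder name used in the CLI example.
import {
  to = aws_ecr_repository.x
  id = "my-repo"
}

resource "aws_ecr_repository" "x" {
  name = "my-repo"
}

# Record a rename so plan does not propose destroy + create.
# Both addresses are hypothetical.
moved {
  from = aws_security_group.fargate
  to   = aws_security_group.fargate_tasks
}
```

Both blocks can be deleted from the code once the change has been applied in every environment.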

3) Plaintext password in state #

Values such as the aws_db_instance password and the aws_secretsmanager_secret_version secret_string are stored in state as plaintext. Encrypting the state bucket and restricting access to it are essential.

State bucket policy (example)

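data "aws_iam_policy_document" "state_bucket" {
  # Deny any request to the state bucket that is not made over TLS.
  statement {
    effect  = "Deny"
    actions = ["s3:*"]
    resources = [
      "arn:aws:s3:::myorg-terraform-state",
      "arn:aws:s3:::myorg-terraform-state/*",
    ]
    # A bucket policy statement needs a principal; "*" covers every caller.
    principals {
      type        = "*"
      identifiers = ["*"]
    }
    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}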
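For the database password specifically, one way to keep it out of state altogether is to let RDS manage it in Secrets Manager. A minimal sketch, assuming a hypothetical master username:

```hcl
resource "aws_db_instance" "blog" {
  # ... engine, instance_class, storage, networking ...

  username = "blog" # hypothetical

  # RDS generates the master password and stores it in Secrets Manager,
  # so no password argument appears in code or in state.
  manage_master_user_password = true
}

# The application reads the password from this secret at runtime.
output "db_secret_arn" {
  value = aws_db_instance.blog.master_user_secret[0].secret_arn
}
```

State then holds only the secret's ARN, not the password itself.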

4) `-/+ destroy/create` #

If -/+ shows in plan, Terraform destroys the resource and creates a new one. For RDS, that means data loss. Check carefully for output like this:

  # aws_db_instance.blog must be replaced
-/+ resource "aws_db_instance" "blog" {
      ~ storage_encrypted = false -> true # forces replacement
    }

A change like this requires a separate migration procedure, for example snapshot, encrypted copy, then restore. Not every risky-looking RDS change is a replacement: a major engine version bump is applied in place once allow_major_version_upgrade = true is set.
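For reference, an in-place major version upgrade might be expressed like this. The target version is a placeholder; check which versions RDS offers for your engine before applying:

```hcl
resource "aws_db_instance" "blog" {
  # ... other arguments unchanged ...

  engine_version              = "17.0" # placeholder target version
  allow_major_version_upgrade = true   # without this, the API rejects a major version change
  apply_immediately           = false  # defer to the next maintenance window
}
```

The plan for a change like this should show ~ (update in place), not -/+. If it shows -/+, stop and find out which argument is forcing the replacement.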

5) Provider version not pinned #

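Without a version constraint in required_providers, the next init may pull a breaking major version. Always pin with a pattern like ~> 5.0, and commit .terraform.lock.hcl so every machine and CI run resolves the same exact provider version.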

6) `terraform destroy` accident #

Accidentally destroying production. Protection:

Protect important resources

resource "aws_db_instance" "blog" {
  # ...
  lifecycle {
    prevent_destroy = true
  }
}

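A resource with prevent_destroy = true makes any plan that would destroy or replace it fail with an error. The guard lives inside the resource block, so deleting the whole block from the code removes the protection along with it.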
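Because prevent_destroy only guards the Terraform side, it is often paired with protection on the AWS side. A sketch for the database, with a hypothetical snapshot name:

```hcl
resource "aws_db_instance" "blog" {
  # ... other arguments ...

  deletion_protection       = true         # the RDS API itself refuses to delete the instance
  skip_final_snapshot       = false        # take a snapshot if the instance is ever deleted
  final_snapshot_identifier = "blog-final" # hypothetical snapshot name

  lifecycle {
    prevent_destroy = true # plan fails before any destroy is attempted
  }
}
```

With both in place, deleting the database takes two deliberate code changes and two applies, which is the point.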

Wrapping up #

What we covered in this post:

Why IaC: reproducibility, change tracking, review, and fearless delete / recreate
Five blocks: provider, resource, data, variable, output
Workflow: init → plan → apply → destroy. Plan is the biggest value
State: Terraform’s core. Local state is for learning only
S3 + DynamoDB Backend: the production standard, with encryption and versioning
Bootstrap: create the backend itself once, via console / CLI or a separate bootstrap folder
Directory structure: modules/ + envs/{dev,prod}/, with a separate backend key per env
Modules: same pattern, different variables. dev = light, prod = full options
CI/CD integration: plan as a PR comment, separate roles for plan and apply
Drift tracking: run plan -detailed-exitcode periodically
Pitfalls: stuck locks, manual state edits, plaintext passwords, -/+, unpinned providers, destroy protection

Next — Monitoring #

Infrastructure is now code and deployment is automated. The next question is whether it is running at all, and whether it is running well.

In #5 Monitoring — CloudWatch alarms and X-Ray we’ll cover the core metrics of ECS / RDS / ALB, operational queries in Logs Insights, sending alarms to Slack, and X-Ray distributed tracing for a one-line answer to “why did this one request take 5 seconds?”
