K8s Practice #1: EKS Cluster Setup — Terraform / eksctl / IRSA / Addons

12 min read

The first post in the K8s Practice series. If the Basics, Intermediate, and Advanced tracks (20 posts) were about learning K8s at the object level, from a single manifest up to policy engines and observability, this six-post Practice series is about putting one real service on a cluster and operating it. We take an imaginary backend service, myshop-api, deploy it to EKS, connect it to RDS, ship it through CI/CD, and carry it through monitoring and operations end to end. This post is the starting point of that flow: bringing up an EKS cluster from scratch. We declare the VPC and cluster with Terraform, set up node groups and IRSA, and install the essential addons.

This post is part of the six-post K8s Practice series.

Tip
The hands-on posts in this series have you write Terraform, Helm, and K8s YAML manifests by hand. One misplaced indent or quote sends kubectl apply into an error that points away from the real cause, leaving you to trace it back from the cluster side. Pasting the manifest into utilrepo’s YAML validator before applying surfaces syntax errors with line and column numbers. utilrepo is a collection of lightweight web utilities that run in your browser, so secrets never leave your machine, and it also catches multi-document manifests joined by --- and tab-space mixes you’d otherwise miss.

myshop-api — the imaginary service running through the series #

Here is the scenario that ties the six posts together, in one line: the imaginary company myshop runs its own backend API service, myshop-api, on EKS. The requirements are simple:

  • A small REST API written in Python (FastAPI) or Go
  • Uses RDS PostgreSQL as the data store
  • Externally exposed via HTTPS
  • Two environments: prod and dev
  • Pod count that scales automatically with traffic

This scenario carries through from the cluster in #1 to the operational cycle in #6. Each post's output is the next post's input, so by the last post you will have one full operating cycle of a real cluster in hand.

Choosing setup tools — Terraform vs eksctl #

There are several ways to create an EKS cluster.

Tool | Model | Where it fits
AWS Console | Point and click | Learning / one-off exploration
eksctl | One YAML file + CLI | PoC / fast setup / learning
Terraform | Declarative HCL | Production clusters / multiple environments / IaC standard
AWS CDK / Pulumi | Code (TypeScript, Python, etc.) | Code-oriented teams / complex conditional logic

The de facto standard for production clusters is Terraform. VPC, IAM, the EKS control plane, node groups, addons, RDS, and Route53 can all be declared in a single codebase stored in git, which makes the cluster itself reproducible as code. The same code applies identically to dev and prod, and every change goes through PR review.

eksctl can’t fill that role because its abstraction is EKS-specific, but it is the most intuitive tool for fast setup and learning. This post uses Terraform as the main tool and briefly covers eksctl for comparison.

Terraform project structure #

First, the layout of the Terraform code for the myshop-api infrastructure.

terraform/ directory structure
terraform/
├── modules/
│   ├── network/         # VPC, subnets, NAT, routing
│   ├── eks/             # EKS cluster + node groups
│   └── addons/          # VPC CNI, EBS CSI, IRSA roles
├── envs/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       └── ...
└── versions.tf

Reusable units live in modules/, and envs/ instantiates them differently for dev and prod. The modules themselves are shared; only inputs such as instance type, node count, and multi-AZ settings differ between environments.

Provider and backend #

envs/prod/main.tf — provider and state backend
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "myshop-tfstate"
    key            = "eks/prod/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "myshop-tfstate-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = "ap-northeast-2"

  default_tags {
    tags = {
      Project     = "myshop"
      Environment = "prod"
      ManagedBy   = "terraform"
    }
  }
}

This is the standard pattern: the state file lives in S3 and the state lock in DynamoDB. default_tags attaches the listed tags to every taggable resource created through this provider, which simplifies cost tracking.
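Resource-level tags are merged with default_tags, and when the same key appears in both, the resource-level value wins. A minimal Python sketch of that merge rule; the overriding resource tags are hypothetical values for illustration:

```python
def effective_tags(default_tags, resource_tags):
    """Merge provider default_tags with resource tags; resource tags win on key conflicts."""
    merged = dict(default_tags)
    merged.update(resource_tags)
    return merged


DEFAULT_TAGS = {"Project": "myshop", "Environment": "prod", "ManagedBy": "terraform"}

# Hypothetical resource that overrides Environment and adds its own Name tag.
tags = effective_tags(DEFAULT_TAGS, {"Environment": "prod-canary", "Name": "myshop-prod-db"})
print(tags)
```

Project and ManagedBy still come from the provider, so cost reports stay consistent even when an individual resource customizes its tags.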

VPC — the foundation of EKS #

EKS does not create a VPC of its own. The control plane places its ENIs in subnets of a VPC you provide, so the VPC and subnets must exist before the cluster is created.

VPC module — terraform-aws-modules/vpc/aws #

Rather than defining a VPC from scratch, the common practice is to use the community module.

modules/network/main.tf
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.env}"
  cidr = "10.10.0.0/16"

  azs              = ["ap-northeast-2a", "ap-northeast-2c"]
  private_subnets  = ["10.10.1.0/24",  "10.10.2.0/24"]
  public_subnets   = ["10.10.101.0/24", "10.10.102.0/24"]
  database_subnets = ["10.10.201.0/24", "10.10.202.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = var.env == "dev"
  one_nat_gateway_per_az = var.env == "prod"

  enable_dns_hostnames = true
  enable_dns_support   = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
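Because the VPC CNI gives every Pod an IP address from the VPC by default, the subnet plan directly caps how many Pods the cluster can run. A minimal Python sketch for sanity-checking the CIDRs above before terraform apply, using only the standard ipaddress module. The subnet names pairing each CIDR with an AZ are hypothetical labels; the 5-address deduction reflects the addresses AWS reserves in every subnet.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = ipaddress.ip_network("10.10.0.0/16")

# Same CIDRs as the module inputs; the names are hypothetical labels (-a / -c = AZ suffix).
SUBNETS = {
    "private-a": "10.10.1.0/24",
    "private-c": "10.10.2.0/24",
    "public-a": "10.10.101.0/24",
    "public-c": "10.10.102.0/24",
    "database-a": "10.10.201.0/24",
    "database-c": "10.10.202.0/24",
}


def check_subnet_plan(vpc, subnets):
    """Return problems: subnets outside the VPC CIDR, or subnets overlapping each other."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} {net} is outside {vpc}")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} {net_a} overlaps {b} {net_b}")
    return problems


def usable_ips(cidr):
    """AWS reserves 5 addresses in every subnet (the first four and the last one)."""
    return ipaddress.ip_network(cidr).num_addresses - 5


print(check_subnet_plan(VPC_CIDR, SUBNETS) or "subnet plan OK")
print("usable IPs per private /24:", usable_ips("10.10.1.0/24"))
```

Those 251 addresses per private subnet are shared by nodes and Pods, which is worth keeping in mind before scaling node counts up.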

For EKS, the important part of this module is the two subnet tag blocks at the end.

  • kubernetes.io/role/elb = 1 on the public subnets: the AWS Load Balancer Controller places internet-facing LBs in these subnets.
  • kubernetes.io/role/internal-elb = 1 on the private subnets: internal LBs, reachable only from inside the VPC and connected networks, are placed here.

Without these tags, the controller cannot auto-discover which subnets to use, and the Ingress created in #2 won’t get a load balancer.
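The subnet auto-discovery these tags enable can be pictured roughly as follows. This is a simplified, hypothetical Python model, not the AWS Load Balancer Controller's actual code (the real controller applies additional filters), and the subnet IDs are made up:

```python
from dataclasses import dataclass, field

# Which role tag is looked up, per load balancer scheme.
ROLE_TAG = {
    "internet-facing": "kubernetes.io/role/elb",
    "internal": "kubernetes.io/role/internal-elb",
}


@dataclass
class Subnet:
    subnet_id: str
    az: str
    tags: dict = field(default_factory=dict)


def discover_subnets(subnets, scheme):
    """Pick at most one subnet per AZ carrying the role tag for this scheme (simplified model)."""
    tag_key = ROLE_TAG[scheme]
    chosen = {}
    for s in sorted(subnets, key=lambda s: s.subnet_id):
        if s.tags.get(tag_key) == "1" and s.az not in chosen:
            chosen[s.az] = s.subnet_id
    return sorted(chosen.values())


SUBNETS = [
    Subnet("subnet-pub-a", "ap-northeast-2a", {"kubernetes.io/role/elb": "1"}),
    Subnet("subnet-pub-c", "ap-northeast-2c", {"kubernetes.io/role/elb": "1"}),
    Subnet("subnet-priv-a", "ap-northeast-2a", {"kubernetes.io/role/internal-elb": "1"}),
    Subnet("subnet-priv-c", "ap-northeast-2c", {"kubernetes.io/role/internal-elb": "1"}),
    Subnet("subnet-db-a", "ap-northeast-2a"),  # untagged: never chosen
]

print(discover_subnets(SUBNETS, "internet-facing"))  # ['subnet-pub-a', 'subnet-pub-c']
print(discover_subnets(SUBNETS, "internal"))         # ['subnet-priv-a', 'subnet-priv-c']
```

An ALB needs subnets in at least two AZs, so an empty result (no tagged subnets) leaves the Ingress without a load balancer.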

single NAT vs NAT per AZ #

single_nat_gateway is enabled only in dev. Each NAT Gateway is billed per hour plus per GB of data processed, so dev consolidates to one, while prod runs one per AZ so that losing one AZ does not cut off outbound traffic for workloads in the other AZs.
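The two flags boil down to how many NAT Gateways exist, and therefore which AZ failure takes out egress. A small Python sketch that mirrors the HCL expressions above; the $35/month per gateway is the rough figure from the cost table later in this post and excludes data processing charges:

```python
AZS = ["ap-northeast-2a", "ap-northeast-2c"]
NAT_MONTHLY_USD = 35  # rough per-gateway figure, before data processing charges


def nat_flags(env):
    """Same expressions as the module inputs single_nat_gateway / one_nat_gateway_per_az."""
    return {"single_nat_gateway": env == "dev", "one_nat_gateway_per_az": env == "prod"}


def nat_gateway_count(env, azs=AZS):
    flags = nat_flags(env)
    if flags["single_nat_gateway"]:
        return 1  # every private subnet routes through one NAT in one AZ
    if flags["one_nat_gateway_per_az"]:
        return len(azs)  # each AZ's private subnets route through their own NAT
    raise ValueError(f"env {env!r} is not covered by this sketch")


for env in ("dev", "prod"):
    n = nat_gateway_count(env)
    print(f"{env}: {n} NAT Gateway(s), about ${n * NAT_MONTHLY_USD}/month before data charges")
```

In dev, the single gateway sits in one AZ, so an outage there removes egress for both AZs; that is the trade-off accepted for the lower bill.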

EKS module — control plane + node groups #

With the VPC ready, define the EKS cluster itself.

modules/eks/main.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "${var.project}-${var.env}"
  cluster_version = "1.30"

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

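  enable_irsa = true

  # v20 of this module no longer grants the cluster creator admin access by default;
  # without this, kubectl calls from the creating identity may be rejected as Unauthorized.
  enable_cluster_creator_admin_permissions = true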

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent              = true
      service_account_role_arn = module.ebs_csi_irsa.iam_role_arn
    }
  }

  eks_managed_node_groups = {
    general = {
      desired_size = var.env == "prod" ? 3 : 2
      min_size     = 2
      max_size     = 10

      instance_types = ["t3.medium"]
      capacity_type  = var.env == "prod" ? "ON_DEMAND" : "SPOT"

      labels = {
        role = "general"
      }
    }
  }
}

This single module creates all of the following:

  • EKS control plane (managed K8s 1.30)
  • The cluster’s IAM Role
  • OIDC provider (IRSA’s foundation — enable_irsa = true)
  • Managed Node Group (worker nodes created as EC2 instances)
  • 4 standard addons (VPC CNI, CoreDNS, kube-proxy, EBS CSI Driver)

The OIDC provider covered in the IRSA section of Advanced #2 is created here, and in #3, when a ServiceAccount is bound to an IAM Role, this OIDC provider is what the trust relationship is anchored on.

Managed Node Group as the standard #

EKS worker nodes can be configured in three ways.

Type | Model
Managed Node Group | EKS manages the EC2 nodes’ lifecycle. The most standard path.
Self-managed Node Group | You operate the EC2 group yourself. For advanced customization.
Fargate | Serverless; no nodes to manage, but with cost and feature constraints (for example, no DaemonSets).

Managed Node Group is the standard choice for a first adoption. You declare the instance type, size, and capacity_type (ON_DEMAND / SPOT) in code, and EKS handles node joining, removal, and upgrades. Nodes run on an EKS-optimized AMI (Amazon Linux 2023, Amazon Linux 2, or Bottlerocket) with kubelet and containerd pre-installed.

Two modes of cluster endpoint #

The combination of the cluster_endpoint_public_access and cluster_endpoint_private_access flags controls where the cluster API server can be reached from.

public | private | Meaning
true | false | Reachable from anywhere on the internet; IAM authentication and RBAC are the only barriers.
true | true | Reachable from the internet and from inside the VPC (the most common setting).
false | true | Reachable only from inside the VPC. The strictest option; requires a bastion host or VPN.

Security guides for prod usually recommend the last option (private only), but GitHub-hosted Actions runners that call kubectl directly need public access. A common compromise is to keep public access on and restrict it with an IP allowlist (cluster_endpoint_public_access_cidrs).
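Conceptually, the allowlist is a CIDR membership test applied to requests arriving at the public endpoint; traffic reaching the private endpoint from inside the VPC is not filtered by it. A minimal Python model of that test (not EKS code), with hypothetical allowlist entries taken from documentation IP ranges. Note that the module's default for cluster_endpoint_public_access_cidrs is 0.0.0.0/0, which admits everyone.

```python
import ipaddress


def allowed_by_public_cidrs(client_ip, public_access_cidrs):
    """True if client_ip is inside any allowlisted CIDR (conceptual model of the endpoint filter)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in public_access_cidrs)


# Hypothetical allowlist: an office egress range and a single VPN egress address.
ALLOWLIST = ["203.0.113.0/24", "198.51.100.7/32"]

print(allowed_by_public_cidrs("203.0.113.45", ALLOWLIST))      # True
print(allowed_by_public_cidrs("198.51.100.8", ALLOWLIST))      # False
print(allowed_by_public_cidrs("198.51.100.8", ["0.0.0.0/0"]))  # True: the default admits everyone
```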

IRSA — granting IAM Role to addons #

Addons such as the EBS CSI Driver, the AWS Load Balancer Controller, and External Secrets call AWS APIs from inside the cluster. IRSA is how those calls get their permissions:

modules/eks/irsa-ebs-csi.tf
module "ebs_csi_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name = "${var.project}-${var.env}-ebs-csi"

  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

This module automatically creates:

  • IAM Role (named myshop-prod-ebs-csi)
  • Policy (EBS disk create / delete / attach permissions)
  • Trust policy (the trust policy from the IRSA section of Advanced #2, allowing only the ebs-csi-controller-sa ServiceAccount in the kube-system namespace to assume this Role)

This IAM Role’s ARN is passed to cluster_addons.aws-ebs-csi-driver.service_account_role_arn above, and EKS annotates the EBS CSI addon’s ServiceAccount with it. As a result, creating a PVC automatically provisions an EBS volume.
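For reference, the trust policy behind this has roughly the following shape. This Python sketch builds it as a dict; the OIDC provider ARN (account 123456789012, ID EXAMPLE0123456789) is hypothetical, and the exact conditions the module emits may differ by version.

```python
import json


def irsa_trust_policy(oidc_provider_arn, namespace, service_account):
    """Trust policy letting exactly one ServiceAccount assume the role via the cluster's OIDC provider."""
    # The ARN ends with the issuer host and path: ...:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<ID>
    issuer = oidc_provider_arn.split(":oidc-provider/", 1)[1]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": oidc_provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token subject must be this exact ServiceAccount.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        # The token audience must be STS.
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


ARN = "arn:aws:iam::123456789012:oidc-provider/oidc.eks.ap-northeast-2.amazonaws.com/id/EXAMPLE0123456789"
policy = irsa_trust_policy(ARN, "kube-system", "ebs-csi-controller-sa")
print(json.dumps(policy, indent=2))
```

A Pod running under any other ServiceAccount presents a token with a different sub claim, so STS refuses its AssumeRoleWithWebIdentity call.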

The same pattern applies to External Secrets in #3 and the CloudWatch integration in #5. A 1:1 mapping of ServiceAccount to IAM Role per workload is the standard security pattern for K8s in practice.

kubeconfig — accessing the cluster #

After creating the cluster with Terraform, fetch a kubeconfig locally so kubectl can talk to the cluster.

Update kubeconfig
aws eks update-kubeconfig \
  --region ap-northeast-2 \
  --name myshop-prod
Verify
kubectl get nodes
kubectl get pods -A
Expected output
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-10-1-145.ap-northeast-2.compute.internal  Ready    <none>   3m    v1.30.0
ip-10-10-2-201.ap-northeast-2.compute.internal  Ready    <none>   3m    v1.30.0
ip-10-10-2-83.ap-northeast-2.compute.internal   Ready    <none>   3m    v1.30.0

When all three nodes are Ready and the system Pods in the kube-system namespace (coredns, kube-proxy, aws-node, ebs-csi-controller, ebs-csi-node) are Running, the cluster is healthy.
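To script that check (for example in CI), kubectl wait --for=condition=Ready nodes --all is the sturdier option. If all you have is the table output, here is a minimal Python sketch that parses kubectl get nodes output and lists nodes whose STATUS is not exactly Ready. The sample reuses the expected output above; the NotReady node used in the check is hypothetical.

```python
def not_ready_nodes(kubectl_output):
    """Return node names whose STATUS column is not exactly 'Ready' (e.g. NotReady, cordoned)."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_i, status_i = header.index("NAME"), header.index("STATUS")
    return [cols[name_i] for cols in (line.split() for line in lines[1:])
            if cols[status_i] != "Ready"]


SAMPLE = """
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-10-1-145.ap-northeast-2.compute.internal  Ready    <none>   3m    v1.30.0
ip-10-10-2-201.ap-northeast-2.compute.internal  Ready    <none>   3m    v1.30.0
ip-10-10-2-83.ap-northeast-2.compute.internal   Ready    <none>   3m    v1.30.0
"""

print(not_ready_nodes(SAMPLE) or "all nodes Ready")
```

A cordoned node shows up as Ready,SchedulingDisabled, which this sketch also flags, since it will not accept new Pods.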

Access permissions — the role of aws-auth ConfigMap #

On EKS, the IAM user or role that creates the cluster is granted cluster-admin access by default, but v20 of the Terraform module turns that off unless enable_cluster_creator_admin_permissions = true. To grant access to other IAM principals, either edit the aws-auth ConfigMap or use the newer EKS Access Entries model, available on clusters from 1.23 onward. Access Entries are now the recommended approach, and the Terraform module exposes them through its access_entries input.

EKS Access Entries — Terraform
access_entries = {
  developers = {
    principal_arn = "arn:aws:iam::123456789012:role/Developer"

    policy_associations = {
      view = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
        access_scope = {
          type = "cluster"
        }
      }
    }
  }
}

This binds one IAM Role to cluster-wide read-only access. AmazonEKSViewPolicy corresponds to the Kubernetes view ClusterRole, so principals using this Role can read most objects in the cluster, although, as with view, not Secrets.

eksctl — the path of fast setup #

When a quick cluster is needed for learning or PoC, eksctl gets you there in a single command.

cluster.yaml — eksctl manifest
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: myshop-dev
  region: ap-northeast-2
  version: "1.30"

vpc:
  nat:
    gateway: Single

managedNodeGroups:
  - name: general
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 5
    spot: true

addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true
Create cluster
eksctl create cluster -f cluster.yaml

This single command automatically creates the VPC, EKS cluster, node groups, OIDC provider, and basic addons. It takes 15–20 minutes, and kubeconfig is automatically updated when it finishes.

A quick eksctl summary:

  • Strengths: lowest learning curve; one file describes an entire cluster
  • Weaknesses: hard to manage alongside multiple environments and surrounding resources such as RDS and Route53. It uses CloudFormation internally, so its state lives apart from Terraform’s.

For production clusters the destination is almost always Terraform, but eksctl is the fastest way to learn or to run a one-off PoC.

Karpenter — a new path for node autoscaling #

Managed Node Groups can be autoscaled with Cluster Autoscaler, which resizes each node group’s Auto Scaling group, but it is limited to the instance types predefined in those groups. In environments with large traffic swings and varied workload resource requirements, Karpenter is becoming the new standard.

Karpenter is a different kind of autoscaling from the metric-based autoscaling covered in Advanced #5. It watches for Pods stuck in Pending, picks an instance type that matches their resource requests in real time, and launches new nodes. Instead of a predefined instance pool, it chooses the best fit among all EC2 types permitted by the NodePool’s requirements.

Karpenter NodePool — node auto-provisioning policy
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t3.medium", "t3.large", "m5.large", "m5.xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 100
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
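The core idea, launching a node sized for whatever is Pending, can be illustrated with a deliberately simplified Python sketch. This is not Karpenter’s algorithm: the real scheduling simulation weighs price, can spread Pods across several new nodes, and subtracts kubelet and system reservations, none of which is modeled here. The catalog lists the four types the NodePool allows with their nominal vCPU and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu_millicores: int
    memory_mib: int


# The four types the NodePool allows; nominal capacity, ignoring kubelet/system reservations.
CATALOG = [
    InstanceType("t3.medium", 2000, 4096),
    InstanceType("t3.large", 2000, 8192),
    InstanceType("m5.large", 2000, 8192),
    InstanceType("m5.xlarge", 4000, 16384),
]


def pick_instance_type(pending_pods, catalog=CATALOG):
    """Return the smallest type fitting the summed (cpu_millicores, memory_mib) requests, or None."""
    need_cpu = sum(cpu for cpu, _ in pending_pods)
    need_mem = sum(mem for _, mem in pending_pods)
    # sorted() is stable, so ties keep catalog order (t3.large before m5.large).
    for it in sorted(catalog, key=lambda t: (t.cpu_millicores, t.memory_mib)):
        if it.cpu_millicores >= need_cpu and it.memory_mib >= need_mem:
            return it.name
    return None


print(pick_instance_type([(500, 1024), (500, 1024)]))   # t3.medium
print(pick_instance_type([(1000, 3072), (500, 2048)]))  # t3.large
print(pick_instance_type([(5000, 1024)]))               # None
```

The last case returns None because a Pod requesting 5 vCPU fits none of the allowed types, so with this NodePool it would stay Pending.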

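Adopting Karpenter from the start is a heavier lift, since it needs its own controller installation, IAM setup, and an EC2NodeClass (such as the default one the NodePool references). The natural flow is to begin with Managed Node Groups and migrate to Karpenter once traffic patterns stabilize. The series revisits this in the #6 operations checklist.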

First checks after cluster setup #

Commands worth running once right after the cluster comes up.

Version and node health
kubectl version
kubectl get nodes -o wide
System Pods
kubectl get pods -n kube-system
OIDC issuer check (the IAM OIDC provider used by IRSA is registered from this issuer)
aws eks describe-cluster \
  --name myshop-prod \
  --region ap-northeast-2 \
  --query "cluster.identity.oidc.issuer" \
  --output text
EKS addon status
aws eks list-addons --cluster-name myshop-prod --region ap-northeast-2
aws eks describe-addon --cluster-name myshop-prod \
  --addon-name vpc-cni --region ap-northeast-2

These four checks cover the cluster, nodes, system Pods, OIDC, and addons. If anything looks wrong, fix it before moving on to the next post.
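The issuer printed by describe-cluster is what the IAM OIDC provider is registered from, and IRSA trust policies reference that provider’s ARN. A small Python sketch that derives the expected ARN from the issuer so it can be compared against the output of aws iam list-open-id-connect-providers; the issuer ID and account number are hypothetical.

```python
from urllib.parse import urlparse


def oidc_provider_arn(issuer_url, account_id):
    """Derive the IAM OIDC provider ARN that corresponds to an EKS cluster's issuer URL."""
    parsed = urlparse(issuer_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"unexpected issuer URL: {issuer_url!r}")
    # The ARN drops the https:// scheme and keeps host + path.
    return f"arn:aws:iam::{account_id}:oidc-provider/{parsed.netloc}{parsed.path}"


ISSUER = "https://oidc.eks.ap-northeast-2.amazonaws.com/id/EXAMPLE0123456789"
print(oidc_provider_arn(ISSUER, "123456789012"))
```

If that ARN is missing from the list, the cluster has an issuer but no IAM OIDC provider, and IRSA-based addons such as the EBS CSI Driver will fail to assume their roles.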

First impression of cost #

EKS cluster cost breaks down into four main components.

Item | Cost (ap-northeast-2 basis)
EKS control plane | $0.10/hour (≈ $73/month)
EC2 nodes (3× t3.medium) | About $80/month (ON_DEMAND) / about $25/month (SPOT)
NAT Gateway | About $35/month per gateway + data processing
EBS / Load Balancer / data transfer | Usage-based

The smallest prod cluster starts at roughly $200–$300/month. Using SPOT instances and a single NAT Gateway in dev cuts that substantially, although the control-plane fee stays fixed at about $73/month. Cost is revisited in #6, but it pays to know the baseline from the moment the cluster is created.
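The arithmetic behind those figures fits in a few lines. A rough Python sketch that plugs in the table’s numbers; the per-node figures are the 3-node totals divided by three, and all values are this post’s approximations rather than live AWS prices. It assumes prod = 3 ON_DEMAND nodes with 2 NAT Gateways and dev = 2 SPOT nodes with 1 NAT Gateway, matching the Terraform inputs above.

```python
HOURS_PER_MONTH = 730  # the hours-per-month convention used on AWS pricing pages

CONTROL_PLANE_PER_HOUR = 0.10
NODE_MONTHLY = {"ON_DEMAND": 80 / 3, "SPOT": 25 / 3}  # per t3.medium, derived from the table
NAT_MONTHLY = 35  # per NAT Gateway, before data processing charges


def baseline_monthly(nodes, capacity_type, nat_gateways):
    """Rough fixed monthly cost in USD; excludes EBS, load balancers, and data transfer."""
    control_plane = CONTROL_PLANE_PER_HOUR * HOURS_PER_MONTH
    return control_plane + nodes * NODE_MONTHLY[capacity_type] + nat_gateways * NAT_MONTHLY


prod = baseline_monthly(nodes=3, capacity_type="ON_DEMAND", nat_gateways=2)
dev = baseline_monthly(nodes=2, capacity_type="SPOT", nat_gateways=1)
print(f"prod ≈ ${prod:.0f}/month, dev ≈ ${dev:.0f}/month")
```

The fixed control-plane fee makes up most of dev’s total, which is why cheaper nodes and fewer NAT Gateways only go so far.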

Closing #

That wraps up the first post of the K8s Practice series. We declared the VPC, EKS control plane, node groups, IRSA, and standard addons as a single Terraform codebase, and noted eksctl as a fast way to get started and Karpenter as a way to broaden node autoscaling. At this point the cluster is empty: nodes are up and system Pods are running, but not a single line of myshop-api’s manifests has been applied yet. The next post fills that gap, organizing Deployment / Service / Ingress / ConfigMap / Secret as one bundle and wiring up per-environment deployment with a Helm chart.
