K8s Practice #1: EKS Cluster Setup — Terraform / eksctl / IRSA / Addons
The first post in the K8s Practice series. If the Basics, Intermediate, and Advanced tracks (20 posts) were the path of learning K8s at the object level — from a single manifest to policy engines and observability — the Practice series of 6 posts is the flow of putting one real service on top and operating it. We take the imaginary backend service myshop-api, put it on EKS, connect it to RDS, deploy it through CI/CD, and carry it through monitoring and operations as one bundle. This post is the starting point of that flow — bringing up an EKS cluster from scratch. We declare the VPC and cluster with Terraform, set up node groups and IRSA, and install the essential addons.
This series is K8s Practice, 6 posts.
- #1 EKS Cluster Setup — Terraform / eksctl / IRSA / Addons ← this post
- #2 App deployment skeleton — Deployment / Service / Ingress / Helm
- #3 DB integration — RDS / Secrets Manager / External Secrets / connection pool
- #4 CI/CD pipeline — GitHub Actions / ECR / ArgoCD
- #5 Monitoring/alarming — Prometheus / CloudWatch / Alertmanager
- #6 Operations checklist — upgrades / backup,recovery / cost / security
myshop-api — the imaginary service running through the series #
First, the scenario that ties the six posts together, in one line: the imaginary company myshop runs its own backend API service, myshop-api, on EKS. The specs are simple.
- A small REST API written in Python (FastAPI) or Go
- Uses RDS PostgreSQL as the data store
- Externally exposed via HTTPS
- Two environments: prod and dev
- Pod count auto-adjusts with traffic swings
This scenario is followed consistently from the cluster in #1 to the operational cycle in #6. Each post feeds into the next, so by the end of the last post you have walked through one full cycle of running a real cluster.
Choosing setup tools — Terraform vs eksctl #
There are several ways to create an EKS cluster.
| Tool | Model | Where it fits |
|---|---|---|
| AWS Console | Click | Learning / one-time look |
| eksctl | One YAML + CLI | PoC / fast setup / learning |
| Terraform | Declared in HCL | Operational cluster / multi-environment / IaC standard |
| AWS CDK / Pulumi | Declared in code (TypeScript, Python, etc.) | Code-friendly teams / complex branching |
The de facto standard for operational clusters is Terraform. VPC, IAM, EKS control plane, node groups, addons, RDS, and Route53 can all be declared in a single codebase and stored in git, making the cluster itself reproducible as code. The same modules apply to both dev and prod with different variables, and changes go through PR review.
eksctl can’t quite replace that position — its abstraction is too EKS-specific — but it is the most intuitive tool for fast setup and learning. This post covers Terraform as the main tool and briefly touches on eksctl as a comparison option.
Terraform project structure #
Here is the Terraform code structure for the myshop-api infrastructure.
terraform/
├── modules/
│   ├── network/    # VPC, subnets, NAT, routing
│   ├── eks/        # EKS cluster + node groups
│   └── addons/     # VPC CNI, EBS CSI, IRSA roles
├── envs/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       └── ...
└── versions.tf

A structure that puts reusable units in modules/ and instantiates dev / prod differently in envs/. The dev / prod difference is instance type, node count, and multi-AZ; the modules themselves are shared.
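As a rough sketch of how an environment directory might instantiate the shared modules, envs/dev/main.tf could look something like the following. The module input and output names (project, env, vpc_id, private_subnet_ids) are assumptions for illustration; the real ones depend on how modules/network and modules/eks are written.

```hcl
# envs/dev/main.tf (illustrative sketch; input and output names are assumed)

module "network" {
  source = "../../modules/network"

  project = var.project
  env     = "dev"
}

module "eks" {
  source = "../../modules/eks"

  project = var.project
  env     = "dev"

  # Assumes modules/network exposes these two outputs.
  vpc_id             = module.network.vpc_id
  private_subnet_ids = module.network.private_subnet_ids
}
```

envs/prod/main.tf would carry the same two blocks with env = "prod", which is what keeps the environments structurally identical.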
Provider and backend #
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "myshop-tfstate"
    key            = "eks/prod/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "myshop-tfstate-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = "ap-northeast-2"

  default_tags {
    tags = {
      Project     = "myshop"
      Environment = "prod"
      ManagedBy   = "terraform"
    }
  }
}

A standard pattern with the state file on S3 and lock on DynamoDB. default_tags automatically attaches tags to every resource made by this provider, simplifying cost tracking.
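The bucket and lock table named in the backend block have to exist before terraform init can use them. One way to create them is a small one-off configuration applied once with local state. The sketch below assumes a hypothetical bootstrap/ directory with its own AWS provider block; the bucket and table names are taken from the backend block, and everything else is illustrative.

```hcl
# bootstrap/main.tf (hypothetical one-off configuration, applied with local state)

resource "aws_s3_bucket" "tfstate" {
  bucket = "myshop-tfstate"
}

# Versioning keeps earlier state files recoverable.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend's DynamoDB locking expects a string hash key named "LockID".
resource "aws_dynamodb_table" "tfstate_lock" {
  name         = "myshop-tfstate-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```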
VPC — the foundation of EKS #
EKS does not create its own VPC. The control plane’s ENI is plugged into a user-provided VPC, so the VPC and subnets must be defined before creating the cluster.
VPC module — terraform-aws-modules/vpc/aws #
The standard is using a community module rather than defining a VPC from scratch.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.env}"
  cidr = "10.10.0.0/16"

  azs              = ["ap-northeast-2a", "ap-northeast-2c"]
  private_subnets  = ["10.10.1.0/24", "10.10.2.0/24"]
  public_subnets   = ["10.10.101.0/24", "10.10.102.0/24"]
  database_subnets = ["10.10.201.0/24", "10.10.202.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = var.env == "dev"
  one_nat_gateway_per_az = var.env == "prod"

  enable_dns_hostnames = true
  enable_dns_support   = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}

The key when paired with EKS is the last two tags.
- kubernetes.io/role/elb = 1: attached to public subnets; the AWS Load Balancer Controller creates external LBs in these subnets.
- kubernetes.io/role/internal-elb = 1: attached to private subnets; internal LBs (cluster-internal + within VPC) are created here.
Without these tags, the LB doesn’t know which subnet to go into, and the Ingress to be created in #2 won’t work.
single NAT vs NAT per AZ #
single_nat_gateway is only enabled in dev. The cost of a single NAT Gateway is not negligible (per hour + per data transfer), so dev environments consolidate into one, while prod has one per AZ so that workloads in other AZs are not affected if one AZ goes down.
EKS module — control plane + node groups #
Once VPC is ready, define the EKS cluster itself.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "${var.project}-${var.env}"
  cluster_version = "1.30"

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  enable_irsa = true

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent              = true
      service_account_role_arn = module.ebs_csi_irsa.iam_role_arn
    }
  }

  eks_managed_node_groups = {
    general = {
      desired_size = var.env == "prod" ? 3 : 2
      min_size     = 2
      max_size     = 10

      instance_types = ["t3.medium"]
      capacity_type  = var.env == "prod" ? "ON_DEMAND" : "SPOT"

      labels = {
        role = "general"
      }
    }
  }
}

This single module creates all of the following:
- EKS control plane (managed K8s 1.30)
- Cluster’s IAM Role
- OIDC provider (IRSA’s foundation, enabled by enable_irsa = true)
- Managed Node Group (worker nodes created as EC2 instances)
- 4 standard addons (VPC CNI, CoreDNS, kube-proxy, EBS CSI Driver)
The OIDC provider covered in the Advanced #2 IRSA section is automatically activated here, and in #3 when binding a ServiceAccount to an IAM Role, that OIDC provider becomes the foundation of trust.
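Since later posts need the OIDC provider and cluster details, a wrapper module like modules/eks would typically re-export them. A minimal sketch, assuming the community module is instantiated as module "eks" inside modules/eks; the output names on the left are our own choice, while the attributes on the right are outputs of the community module.

```hcl
# modules/eks/outputs.tf (sketch; output names on the left are our own choice)

output "cluster_name" {
  value = module.eks.cluster_name
}

output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

# ARN of the IAM OIDC provider; IRSA role modules take this as provider_arn.
output "oidc_provider_arn" {
  value = module.eks.oidc_provider_arn
}
```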
Managed Node Group as the standard #
EKS worker nodes can be configured in three ways.
| Type | Model |
|---|---|
| Managed Node Group | EKS manages the EC2 nodes’ lifecycle. Most standard path. |
| Self-managed Node Group | User operates EC2 group directly. For advanced customization. |
| Fargate | Serverless. No need to care about nodes themselves. Has cost / constraints. |
Managed Node Group is the standard for first adoption. You declare the instance type, size, and capacity_type (ON_DEMAND / SPOT), and EKS manages node joining, removal, and upgrades. The node OS is Amazon Linux (AL2 or AL2023) or Bottlerocket, shipped as an AMI that includes kubelet, containerd, and the CNI agent pre-installed.
Two modes of cluster endpoint #
The combination of cluster_endpoint_public_access and cluster_endpoint_private_access flags controls cluster API server access.
| public | private | Meaning |
|---|---|---|
| true | false | Reachable from anywhere on the internet (IAM authentication and RBAC are the only security boundary) |
| true | true | Internet + VPC internal both (most common setting) |
| false | true | Only from VPC internal. Most strict. Requires bastion or VPN. |
Security guides for prod environments usually recommend the last option (private only), but for GitHub Actions to call kubectl directly, public access must be enabled. The compromise of enabling public access with an IP allowlist (cluster_endpoint_public_access_cidrs) is frequently used.
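The allowlist compromise might look like the fragment below. It only shows the endpoint-related arguments of the module "eks" block, and the CIDR is a documentation range (RFC 5737) standing in for a real office or VPN egress range, so treat it as a placeholder.

```hcl
# Fragment of the module "eks" block; the CIDR below is a documentation
# range (RFC 5737) standing in for your office or VPN egress IPs.
module "eks" {
  # ... other arguments as shown earlier ...

  cluster_endpoint_public_access       = true
  cluster_endpoint_private_access      = true
  cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"]
}
```

One caveat worth checking for your setup: GitHub-hosted runners use a large and changing set of IP ranges, so an allowlist tends to pair better with self-hosted runners or a fixed egress IP.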
IRSA — granting IAM Role to addons #
Addons like the EBS CSI Driver, AWS Load Balancer Controller, and External Secrets call AWS APIs from inside the cluster. IRSA is the pattern for granting IAM permissions to those calls.
module "ebs_csi_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name             = "${var.project}-${var.env}-ebs-csi"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

This module automatically creates:
- IAM Role (named myshop-prod-ebs-csi)
- Policy (EBS disk create / delete / attach permissions)
- Trust policy (the trust policy from Advanced #2 IRSA, restricting only the ebs-csi-controller-sa ServiceAccount in the kube-system namespace to take this Role)
This IAM Role’s ARN is passed to cluster_addons.aws-ebs-csi-driver.service_account_role_arn above, and EKS automatically adds the role annotation to the EBS CSI addon’s ServiceAccount. As a result, creating a PVC automatically provisions an EBS volume.
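To make the trust policy less of a black box, here is roughly what it amounts to when written out by hand. This is a sketch of the shape of the policy, not the module's literal source; it assumes module.eks.oidc_provider (the issuer URL without the leading https://) and module.eks.oidc_provider_arn are available, and the data source name is made up.

```hcl
# Sketch of the trust policy the IRSA module generates (not its literal source).
# module.eks.oidc_provider is the issuer URL without the leading "https://".
data "aws_iam_policy_document" "ebs_csi_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }

    # Only this ServiceAccount in this namespace may assume the role.
    condition {
      test     = "StringEquals"
      variable = "${module.eks.oidc_provider}:sub"
      values   = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
    }

    # The token must have been issued for AWS STS.
    condition {
      test     = "StringEquals"
      variable = "${module.eks.oidc_provider}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}
```

The sub condition is the part that does the scoping: a Pod running under any other ServiceAccount presents a different subject and the AssumeRoleWithWebIdentity call is rejected.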
The same pattern applies to External Secrets in #3 and to the CloudWatch side covered in #5. A 1:1 mapping of ServiceAccount + IAM Role per workload is the standard security structure of K8s practice.
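As one more instance of the same pattern, the role for the AWS Load Balancer Controller (needed for the Ingress in #2) could be declared as below. The role name suffix and the ServiceAccount name are assumptions; the ServiceAccount shown is the one the controller's Helm chart commonly uses, so adjust it if your install differs.

```hcl
module "lb_controller_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name                              = "${var.project}-${var.env}-lb-controller"
  attach_load_balancer_controller_policy = true

  oidc_providers = {
    main = {
      provider_arn = module.eks.oidc_provider_arn

      # Assumes the controller runs in kube-system under this ServiceAccount name.
      namespace_service_accounts = ["kube-system:aws-load-balancer-controller"]
    }
  }
}
```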
kubeconfig — accessing the cluster #
After creating the cluster with Terraform, kubeconfig must be fetched locally for kubectl to communicate with that cluster.
aws eks update-kubeconfig \
  --region ap-northeast-2 \
  --name myshop-prod

kubectl get nodes
kubectl get pods -A

NAME                                             STATUS   ROLES    AGE   VERSION
ip-10-10-1-145.ap-northeast-2.compute.internal   Ready    <none>   3m    v1.30.0
ip-10-10-2-201.ap-northeast-2.compute.internal   Ready    <none>   3m    v1.30.0
ip-10-10-2-83.ap-northeast-2.compute.internal    Ready    <none>   3m    v1.30.0

When three nodes are up and the system Pods (coredns, kube-proxy, aws-node, ebs-csi-controller) in the kube-system namespace are all in Running state, the cluster is normal.
Access permissions — the role of aws-auth ConfigMap #
By default, the IAM user/Role that created the EKS cluster receives system:masters permission. To grant access to other IAM principals, either modify the aws-auth ConfigMap or use the newer model, EKS Access Entries (available on Kubernetes 1.23 and later). Access Entries is the recommended approach and the one the Terraform EKS module supports directly.
access_entries = {
  developers = {
    principal_arn = "arn:aws:iam::123456789012:role/Developer"

    policy_associations = {
      view = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
        access_scope = {
          type = "cluster"
        }
      }
    }
  }
}

This block, placed inside the module "eks" block, binds one IAM Role to cluster-wide view permission. The policy grants permissions equivalent to K8s RBAC’s view ClusterRole, so users with this Role can read most objects in the cluster, though not Secrets.
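Two variations are worth sketching. First, as far as I can tell, v20 of the community EKS module does not bootstrap admin access for the cluster creator on its own and instead exposes an enable_cluster_creator_admin_permissions input; if kubectl returns an authorization error right after apply, that input is the first thing to check against the module docs for your version. Second, an access entry can be scoped to a namespace instead of the whole cluster. The role ARN and the myshop namespace below are hypothetical.

```hcl
# Fragment of the module "eks" block (v20 inputs; check the docs for your version).
module "eks" {
  # ... other arguments as shown earlier ...

  # Adds an access entry granting cluster admin to the identity running Terraform.
  enable_cluster_creator_admin_permissions = true

  access_entries = {
    backend_team = {
      # Hypothetical role ARN and namespace, for illustration only.
      principal_arn = "arn:aws:iam::123456789012:role/BackendTeam"

      policy_associations = {
        edit = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSEditPolicy"
          access_scope = {
            type       = "namespace"
            namespaces = ["myshop"]
          }
        }
      }
    }
  }
}
```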
eksctl — the path of fast setup #
When a quick cluster is needed for learning or PoC, eksctl gets you there in a single command.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: myshop-dev
  region: ap-northeast-2
  version: "1.30"
vpc:
  nat:
    gateway: Single
managedNodeGroups:
  - name: general
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 5
    spot: true
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver
iam:
  withOIDC: true

eksctl create cluster -f cluster.yaml

This single command automatically creates the VPC, EKS cluster, node groups, OIDC provider, and basic addons. It takes 15–20 minutes, and kubeconfig is automatically updated when it finishes.
A quick eksctl summary:
- Strengths — lowest learning curve, one manifest for an entire cluster
- Weaknesses — hard to manage as a bundle with surrounding resources like multi-environment / RDS / Route53. Internally uses CloudFormation, so state is separated from Terraform.
The destination for operational clusters is almost always Terraform, but eksctl is the fastest route for first learning or a one-time PoC.
Karpenter — a new path for node autoscaling #
Managed Node Group’s own autoscaling (Cluster Autoscaler) is possible, but limited to a few pre-defined instance types. In environments with large traffic swings and varied workload resource requirements, Karpenter is establishing itself as the new standard.
Karpenter is a different kind of autoscaling from the metric-based autoscaling covered in Advanced #5. It looks at Pods in Pending state, picks an instance type that matches the Pod’s resource requirements in real time, and brings up new nodes. Instead of using a pre-defined instance pool, it selects the best fit from every EC2 instance type that the NodePool’s requirements allow.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t3.medium", "t3.large", "m5.large", "m5.xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 100
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized

Adopting Karpenter from the start is a heavier lift, so the natural flow is to begin with Managed Node Group and migrate to Karpenter once traffic patterns stabilize. The series will revisit this in the #6 operations checklist.
First checks after cluster setup #
Commands worth running once right after the cluster comes up.
kubectl version
kubectl get nodes -o wide

kubectl get pods -n kube-system

aws eks describe-cluster \
  --name myshop-prod \
  --region ap-northeast-2 \
  --query "cluster.identity.oidc.issuer" \
  --output text

aws eks list-addons --cluster-name myshop-prod --region ap-northeast-2
aws eks describe-addon --cluster-name myshop-prod \
  --addon-name vpc-cni --region ap-northeast-2

These four groups of commands verify that the cluster, nodes, system Pods, OIDC, and addons are all normal. If anything looks off, resolve it before moving on to the next post.
First impression of cost #
EKS cluster cost breaks down into the following components.
| Item | Cost (ap-northeast-2 basis) |
|---|---|
| EKS control plane | $0.10/hour (≈ $73/month) |
| EC2 nodes (3× t3.medium) | About $80/month (ON_DEMAND) / $25 (SPOT) |
| NAT Gateway | About $35/month + data transfer |
| EBS / Load Balancer / data transfer | Per usage |
The starting cost of the smallest prod cluster is around $200–$300/month. Using SPOT instances and a Single NAT in the dev environment brings it to less than half that. Cost is revisited in #6, but it pays to be aware of the baseline from the moment the cluster is created.
Closing #
That wraps up the first post in the K8s Practice series. We declared the VPC, EKS control plane, node groups, IRSA, and standard addons as a single codebase with Terraform, and noted two side paths: starting fast with eksctl and broadening node autoscaling with Karpenter. At this point the cluster is empty: nodes are up and system Pods are alive, but not a single line of manifest for myshop-api has been applied yet. The next post fills that gap by organizing Deployment / Service / Ingress / ConfigMap / Secret as a single bundle and connecting per-environment deployment via a Helm chart.