
Why is EKS so expensive?

"EKS is expensive" is usually one of three things in a trench coat: too many control planes, an oversized node fleet, or low utilisation that nobody is watching. This guide walks through each driver, why it sneaks up, and the fix.



The three drivers

Driver 1 — Control planes you forgot you own

Each EKS cluster's control plane costs $0.10/hour (~$73/month, ~$876/year). That number sounds small until you count clusters:

Pattern                                             | Clusters             | Monthly cost | Annual cost
One per environment                                 | 3 (dev/staging/prod) | $219         | $2,628
One per team                                        | 8                    | $584         | $7,008
One per service (microservice-cluster anti-pattern) | 20+                  | $1,460+      | $17,520+

This is the easiest EKS tax to pay by accident: terraform apply creates a control plane in minutes, while deleting a cluster takes a coordination meeting.
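Counting is the first step. Below is a minimal sketch, assuming boto3 is installed and AWS credentials are configured, that lists every EKS cluster in each region enabled for the account and prices the control planes at the $0.10/hour rate above. The function names are illustrative and not part of any tool mentioned in this guide.

```python
"""Inventory EKS control planes in every enabled region and price them.

A sketch, assuming boto3 is installed and AWS credentials are configured.
"""
from typing import Dict, List

CONTROL_PLANE_HOURLY_USD = 0.10  # standard-support rate quoted above
HOURS_PER_MONTH = 730


def list_clusters(eks_client) -> List[str]:
    """Return every cluster name visible to one regional EKS client."""
    names: List[str] = []
    # ListClusters is paginated; the paginator follows nextToken for us.
    for page in eks_client.get_paginator("list_clusters").paginate():
        names.extend(page["clusters"])
    return names


def inventory(clients_by_region: Dict[str, object]) -> Dict[str, List[str]]:
    """Map region -> cluster names, leaving out regions with no clusters."""
    found: Dict[str, List[str]] = {}
    for region, client in clients_by_region.items():
        names = list_clusters(client)
        if names:
            found[region] = names
    return found


def monthly_control_plane_cost(cluster_count: int) -> float:
    return cluster_count * CONTROL_PLANE_HOURLY_USD * HOURS_PER_MONTH


def main() -> None:
    import boto3  # imported lazily so the helpers above work without AWS access

    session = boto3.Session()
    # Without AllRegions=True, describe_regions returns only the regions enabled for this account.
    ec2 = session.client("ec2", region_name="us-east-1")
    regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
    found = inventory({r: session.client("eks", region_name=r) for r in regions})
    total = sum(len(names) for names in found.values())
    for region, names in sorted(found.items()):
        print(f"{region}: {', '.join(names)}")
    print(f"{total} control planes = ${monthly_control_plane_cost(total):,.0f}/month")
```

Saved as eks_inventory.py (any name works), it runs with python -c "import eks_inventory; eks_inventory.main()". The region loop is the point: forgotten clusters tend to live in regions nobody checks.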

Fix: consolidate via namespaces + RBAC instead of separate clusters. One cluster per environment tier (dev / staging / prod) is a sensible bar. Teams or services get a namespace + RoleBinding + NetworkPolicy, not their own control plane.
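Here is a sketch of what "a namespace + RoleBinding + NetworkPolicy" per team can look like, generated as a Kubernetes List in JSON. The team name payments, the group payments-devs, and the policy name are hypothetical. The group only means something once it is mapped to IAM principals (through EKS access entries or the aws-auth ConfigMap), and the NetworkPolicy is enforced only if the cluster's CNI supports it, for example the Amazon VPC CNI with network policy enabled, or Calico.

```python
"""Generate a namespace + RoleBinding + NetworkPolicy for one team.

A sketch: the team name and RBAC group are hypothetical placeholders.
Output is JSON, which kubectl apply -f accepts as well as YAML.
"""
import json

RBAC_API = "rbac.authorization.k8s.io"


def team_manifests(team: str, group: str) -> dict:
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team},
    }
    # Bind the team's group to the built-in "edit" ClusterRole, scoped to its own namespace.
    role_binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{team}-edit", "namespace": team},
        "roleRef": {"apiGroup": RBAC_API, "kind": "ClusterRole", "name": "edit"},
        "subjects": [{"apiGroup": RBAC_API, "kind": "Group", "name": group}],
    }
    # Admit ingress only from pods in the same namespace: an empty podSelector
    # in "from" (with no namespaceSelector) means "any pod in this namespace".
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": team},
        "spec": {
            "podSelector": {},  # the policy applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, role_binding, network_policy]}


print(json.dumps(team_manifests("payments", "payments-devs"), indent=2))
```

Piping the output into kubectl apply -f - creates all three objects. Because the policy admits traffic only from the same namespace, a team whose ingress controller runs in another namespace would need an extra from rule for it.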


Driver 2 — The node fleet is the dollar number

The control plane is the visible line item; the node fleet is the big one. Twelve m5.large workers (On-Demand, us-east-1) running 24/7 cost roughly $0.096/hr × 730 hr × 12 = $841/month, over 11× a single cluster's control plane.

The two failure modes:

  • Oversized pod requests. Pods asking for 2 vCPU when they need 0.5 vCPU pin the cluster autoscaler to a node count that's 4× what the actual workload needs.
  • Static node groups. A managed node group with min: 6, max: 12 will never go below 6 even at midnight on a Sunday.
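The cluster autoscaler scales on requested resources, not on measured usage, so padded requests turn almost directly into nodes. This first-fit-decreasing sketch estimates the node count for a set of CPU requests. It ignores memory, DaemonSets, and affinity, and the 4,000-millicore allocatable figure is a hypothetical round number rather than any real instance type's allocatable CPU.

```python
"""First-fit-decreasing estimate of how many nodes a set of CPU requests needs.

A sketch: packs on CPU requests only (real schedulers also weigh memory,
DaemonSets, and affinity). The 4000m allocatable figure is hypothetical.
"""
from typing import List


def nodes_needed(requests_mcpu: List[int], allocatable_mcpu: int) -> int:
    free: List[int] = []  # remaining millicores on each node opened so far
    for req in sorted(requests_mcpu, reverse=True):
        if req > allocatable_mcpu:
            raise ValueError(f"a {req}m request fits on no {allocatable_mcpu}m node")
        for i, room in enumerate(free):
            if req <= room:
                free[i] -= req  # place the pod on the first node with room
                break
        else:
            free.append(allocatable_mcpu - req)  # nothing fits: open a new node
    return len(free)


ALLOCATABLE = 4000  # hypothetical per-node allocatable CPU, in millicores
padded = nodes_needed([2000] * 32, ALLOCATABLE)      # 32 pods requesting 2 vCPU
right_sized = nodes_needed([500] * 32, ALLOCATABLE)  # the same pods at 0.5 vCPU
print(padded, right_sized)  # 16 4
```

Same pods, same work, a quarter of the nodes: the 4× in the bullet above falls straight out of the requests.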

Fix:

  1. Right-size pod requests. Vertical Pod Autoscaler recommends realistic requests. Apply its recommendations and the autoscaler can pack more pods per node.
  2. Use Karpenter instead of static node groups. Karpenter provisions nodes just in time, choosing the cheapest instance type that fits the pending pods. It also consolidates underutilised nodes, removing them or replacing several with a single cheaper one.
  3. Mix Spot capacity into the fleet. Stateless workloads run on Spot; the steady baseline stays On-Demand and is covered by a Compute Savings Plan. Most clusters can run 60–80% on Spot once you've marked which workloads can tolerate an interruption.
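Acting on step 1 starts with seeing how far current requests are from what VPA recommends. A minimal sketch, assuming a VPA in recommendation-only mode (updateMode "Off") and the JSON that kubectl returns for the Deployment and the VPA. It compares CPU only, and the container name api is hypothetical.

```python
"""Compare each container's CPU request with the VPA's recommended target.

A sketch, assuming a VPA in recommendation-only mode (updateMode: "Off") and
objects shaped like the JSON from `kubectl get deploy <name> -o json` and
`kubectl get vpa <name> -o json`. Memory would need its own quantity parser.
"""
from decimal import Decimal
from typing import Dict


def cpu_millicores(quantity: str) -> int:
    """Parse a Kubernetes CPU quantity such as "500m" or "2" into millicores."""
    if quantity.endswith("m"):
        return int(Decimal(quantity[:-1]))
    return int(Decimal(quantity) * 1000)


def cpu_overrequest(deployment: dict, vpa: dict) -> Dict[str, float]:
    """Return requested / recommended CPU per container (4.0 means 4x too high)."""
    requested = {
        c["name"]: cpu_millicores(c["resources"]["requests"]["cpu"])
        for c in deployment["spec"]["template"]["spec"]["containers"]
        if "cpu" in c.get("resources", {}).get("requests", {})
    }
    # The recommendation block is absent until VPA has gathered enough data.
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    return {
        r["containerName"]: requested[r["containerName"]] / cpu_millicores(r["target"]["cpu"])
        for r in recs
        if r["containerName"] in requested
    }


# Hypothetical objects: a container asking for 2 vCPU that VPA thinks needs 500m.
deploy = {"spec": {"template": {"spec": {"containers": [
    {"name": "api", "resources": {"requests": {"cpu": "2"}}}]}}}}
vpa = {"status": {"recommendation": {"containerRecommendations": [
    {"containerName": "api", "target": {"cpu": "500m"}}]}}}
print(cpu_overrequest(deploy, vpa))  # {'api': 4.0}
```

Sorting containers by this ratio gives a reasonable order for applying VPA's recommendations: the biggest over-requesters free up the most node capacity.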

Driver 3 — Low utilisation that nobody is looking at

A typical untuned EKS cluster runs at 20–30% average CPU utilisation. Against a 50% target (a generous bar by Kubernetes-best-practice standards), that leaves 40–60% of the node fleet's spend as waste: 1 − 30/50 = 40% at the top of that range, 1 − 20/50 = 60% at the bottom.

For our 12 × m5.large example at 30% utilisation: 40% of $841 ≈ $336/month of waste hiding inside an otherwise normal-looking cluster.

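Fix: set pod requests realistically (see VPA above), turn on bin-packing (consolidation via spec.disruption.consolidationPolicy on a Karpenter NodePool; for the scheduler, the NodeResourcesFit plugin's MostAllocated scoring strategy, which on EKS means running a second scheduler because the managed control plane's scheduler can't be reconfigured), and set CPU/memory utilisation alarms with a real threshold: anything sustained below 30% for a week is a right-sizing ticket.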
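The "sustained below 30% for a week" rule is easy to encode once you have daily averages. This sketch assumes one daily-average CPU utilisation percentage per node, exported from whatever you already run (CloudWatch Container Insights or Prometheus, for instance); the node names and samples are invented.

```python
"""Flag nodes whose CPU utilisation stayed below a threshold for a full week.

A sketch: assumes one daily-average CPU utilisation percentage per node,
exported from your existing monitoring. Names and samples are made up.
"""
from typing import Dict, List

THRESHOLD_PCT = 30.0  # "sustained below 30%" from the text
WINDOW_DAYS = 7


def rightsizing_tickets(daily_cpu_pct: Dict[str, List[float]]) -> List[str]:
    """Return nodes whose last WINDOW_DAYS daily averages were all below threshold."""
    flagged = []
    for node, samples in sorted(daily_cpu_pct.items()):
        window = samples[-WINDOW_DAYS:]
        # Require a full week of data so a brand-new node isn't flagged on day two.
        if len(window) == WINDOW_DAYS and max(window) < THRESHOLD_PCT:
            flagged.append(node)
    return flagged


samples = {
    "node-a": [22, 25, 19, 28, 24, 21, 26],  # quiet all week
    "node-b": [22, 25, 19, 55, 24, 21, 26],  # one busy day
    "node-c": [12, 14],                      # only two days of data
}
print(rightsizing_tickets(samples))  # ['node-a']
```

Using the weekly maximum rather than the mean is deliberate: a single busy day (node-b) keeps a node off the list, which avoids ticketing nodes that genuinely need headroom for a weekly batch job.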


Worked example

A typical Series-A SaaS company:

  • 3 EKS clusters (dev / staging / prod)
  • 12 m5.large workers, total across all clusters
  • 30% average CPU utilisation against a 50% target

3 × $0.10/hr × 730 hr   = $219/mo (control planes)
12 × $0.096/hr × 730 hr = $841/mo (node fleet)
                          ────────
Total                     $1,060/mo (~$12,720/yr)

Low-util waste = (1 − 30/50) × $841 = ~$336/mo (~$4,000/yr)
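The same arithmetic as a small Python function, so you can substitute your own cluster count, node price, and utilisation. It is a sketch using the prices quoted above; node prices vary by region and purchase option, and the waste line is this guide's 1 − utilisation/target model, floored at zero.

```python
"""Reproduce the worked example's arithmetic with swappable inputs.

A sketch using the prices quoted above (EKS standard-support control plane,
m5.large On-Demand in us-east-1); check current pricing for your region.
"""
HOURS_PER_MONTH = 730


def eks_monthly(clusters: int, nodes: int, node_hourly: float,
                util_pct: float, target_pct: float,
                control_plane_hourly: float = 0.10) -> dict:
    control = clusters * control_plane_hourly * HOURS_PER_MONTH
    fleet = nodes * node_hourly * HOURS_PER_MONTH
    # Share of the fleet idle relative to the target; zero once at or above target.
    waste_share = max(0.0, 1 - util_pct / target_pct)
    return {
        "control_planes": control,
        "node_fleet": fleet,
        "total": control + fleet,
        "low_util_waste": waste_share * fleet,
    }


costs = eks_monthly(clusters=3, nodes=12, node_hourly=0.096, util_pct=30, target_pct=50)
for item, usd in costs.items():
    print(f"{item:>15}: ${usd:,.0f}/mo  (${usd * 12:,.0f}/yr)")
```

Changing clusters from 3 to 2 shows the control-plane saving from merging dev and staging; changing util_pct shows how much of the waste line a right-sizing pass has to move.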

The fix sequence (VPA recommendations, then a Karpenter rollout, then merging the dev and staging clusters into one) typically reclaims the entire low-utilisation slice within a quarter; the merge also drops one control plane, another $73/month.


When not to consolidate clusters

Cluster-per-environment is the right default; cluster-per-team is usually wrong. But there are legitimate reasons to keep separate clusters:

  • Regulatory / compliance isolation (PCI workloads, HIPAA).
  • Distinct Kubernetes version trains (a team running a canary version vs. the rest of the fleet on stable).
  • Geographic isolation (a cluster per region for data-residency reasons).

These are real. "We didn't get around to consolidating" is not.


Where to go next

  • Run the EKS calculator with your real cluster count + nodes + utilisation to get the numbers for your footprint.
  • NAT Gateway traffic from EKS nodes (especially ECR image pulls) is a big chunk of the data-transfer line; see Reduce AWS NAT Gateway cost.
  • Run the Cloud Waste Radar to see EKS in context with the rest of your AWS footprint.

Want the real number for your clusters?

Book a free 30-minute audit and we'll come back with the exact monthly cost per cluster, the low-utilisation slice on each, and the Karpenter-rollout savings forecast.
