Skylynk — AWS cloud consulting that actually moves the needle

ECS is not wrong. It is just limited.

ECS works. For teams starting out on AWS containers, it is genuinely the right choice: less operational overhead, tight native integration with ALB and IAM, no control plane to manage, and a simpler mental model than Kubernetes. Fargate removes even the node management concern. If you have a handful of services with straightforward deployment requirements, ECS is fine and you should not migrate.

The limitations surface at scale and complexity. There is no first-class GitOps story for ECS. Deployments happen via `aws ecs update-service`, which is imperative: you push a command rather than declaring a desired state in version-controlled YAML. Task definition revisions also accumulate quickly. After a year of active development, it is normal to have three hundred revisions of a task definition with no clear record of what changed between them or why.
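To make the revision problem concrete, here is a minimal sketch of the kind of diff tooling teams end up writing for themselves. It assumes you have already exported two task definition revisions as plain dictionaries; the `orders` family, revision numbers, and image tags are hypothetical.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into {"a.b[0].c": value} pairs."""
    items: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def diff_revisions(old: dict, new: dict) -> dict[str, tuple[Any, Any]]:
    """Return {path: (old_value, new_value)} for every path that differs."""
    a, b = flatten(old), flatten(new)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Hypothetical revisions 41 and 42 of an "orders" task definition.
rev_41 = {
    "family": "orders",
    "cpu": "256",
    "containerDefinitions": [{"name": "app", "image": "orders:1.4.0"}],
}
rev_42 = {
    "family": "orders",
    "cpu": "512",
    "containerDefinitions": [{"name": "app", "image": "orders:1.5.0"}],
}

changes = diff_revisions(rev_41, rev_42)
for path, (before, after) in changes.items():
    print(f"{path}: {before!r} -> {after!r}")
```

A diff like this tells you what changed, but not why. The "why" is what a commit message and a pull request give you for free in a GitOps workflow.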

Scheduling in ECS is coarser than Kubernetes. You can specify CPU and memory, but you cannot express pod affinity, topology spread constraints, or node selectors in the same way. For workloads that need to co-locate for performance or spread for availability, ECS gives you fewer levers.

The plugin ecosystem gap is the most significant long-term constraint. Tools like Karpenter (intelligent node provisioning), Cilium (eBPF-based networking with fine-grained policy), external-secrets-operator (Kubernetes-native secret sync from Secrets Manager), and KEDA (event-driven autoscaling) have no direct ECS equivalents. Teams that need these capabilities build bespoke tooling that they then have to maintain.

Signs you have outgrown ECS

The most reliable signal is copy-pasting. When your engineers are copying task definitions across services and manually editing the JSON or Terraform for each one, you have a maintainability problem that gets worse with each new service. ECS has no equivalent to Helm — there is no standard way to parameterize a deployment template and reuse it across multiple services with per-service overrides.

A weak canary deployment story is the second signal. ECS blue/green deployments through CodeDeploy offer predefined canary and linear traffic shifting, but anything beyond that, such as metric-driven analysis between steps, requires custom tooling: a Lambda that shifts weights on an ALB target group, CloudWatch alarms that trigger rollback, and integration between all of these that you built and maintain. In Kubernetes, Argo Rollouts handles this with a few lines of YAML and a CRD. The operational difference is significant when you are shipping tens of deployments per day.

CI/CD that requires manual `aws ecs update-service` calls is a symptom of the deeper GitOps gap. When your deployment process is a shell script in a CI pipeline that calls the AWS CLI, changes to your deployment process require changes to the script, not to a declarative manifest that can be reviewed, approved, and applied consistently.

The clearest signal is when engineers start asking for Kubernetes features by name. When someone asks "can we have a HorizontalPodAutoscaler for this service?" or "can we use a sidecar for this?" — these are not signs of Kubernetes hype. They are signs that the team has encountered a real problem and knows what the standard solution looks like.

What EKS actually gives you

GitOps with ArgoCD changes the deployment model fundamentally. Every deploy is a Git commit to a repository that represents the desired state of your cluster. ArgoCD continuously reconciles the cluster to match that state. Rollbacks are a `git revert`. The audit trail is the Git history. Approval workflows are pull request reviews. This is a qualitatively different operational model from `aws ecs update-service`.
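The reconciliation idea can be sketched in a few lines. This is a toy model, not ArgoCD's implementation: it assumes desired and actual state are both reduced to a mapping of workload name to image tag, and the function name and sample workloads are hypothetical.

```python
def plan_reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Compare desired state (from Git) with actual state (from the cluster)
    and list the actions needed to make the cluster match Git."""
    actions = []
    for name, image in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name} at {image}")
        elif actual[name] != image:
            actions.append(f"update {name}: {actual[name]} -> {image}")
    # Anything running that Git no longer declares gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"prune {name}")
    return actions


# Hypothetical state: Git declares two workloads, the cluster runs two others.
desired = {"orders": "orders:1.5.0", "billing": "billing:2.0.1"}
actual = {"orders": "orders:1.4.0", "legacy-cron": "cron:0.9.0"}

for action in plan_reconcile(desired, actual):
    print(action)
```

In this model a rollback is nothing special. Reverting the commit changes `desired`, and the same comparison produces the update that takes the cluster back.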

Helm gives you parameterized, reusable deployment manifests. One chart for your service, with a values file per environment. Upgrading your base chart (updating the container port, adding a new environment variable default) propagates across all services automatically on the next deploy cycle. The alternative — propagating a change across thirty ECS task definitions — is a week of work and a merge conflict minefield.
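The per-environment override model is easy to picture as a recursive merge. The sketch below approximates how chart defaults and a values file combine (nested maps merge, everything else is replaced). It is an illustration rather than Helm's own code, it does not model Helm's special handling of null values, and the keys and numbers are hypothetical.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults shared by every service.
chart_defaults = {
    "replicaCount": 2,
    "service": {"port": 8080},
    "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}},
}

# Hypothetical production overrides: only what differs from the defaults.
production_overrides = {
    "replicaCount": 6,
    "resources": {"requests": {"cpu": "500m"}},
}

production = merge_values(chart_defaults, production_overrides)
print(production)
```

The override file stays small because it only states differences. A change to `chart_defaults` reaches every service that does not explicitly override that key.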

The Kubernetes scheduler is more expressive. Topology spread constraints ensure pods are distributed across availability zones automatically. Pod affinity rules can co-locate a service with its cache. Disruption budgets guarantee a minimum number of pods are available during node drains. These are not features you implement — they are configuration you express in YAML and the scheduler enforces.
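As an illustration of how little there is to write, the sketch below builds a zone-level topology spread constraint and a matching PodDisruptionBudget as plain dictionaries and prints them as JSON, which Kubernetes accepts alongside YAML. The helper name, the `app` label convention, and the `orders` service are assumptions for the example.

```python
import json


def spread_and_budget(app: str, min_available: int) -> tuple[dict, dict]:
    """Return (topology spread constraint, PodDisruptionBudget) for one app label."""
    selector = {"matchLabels": {"app": app}}
    # Goes in the pod spec under "topologySpreadConstraints".
    spread = {
        "maxSkew": 1,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": selector,
    }
    # A standalone object that limits voluntary evictions such as node drains.
    pdb = {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {"minAvailable": min_available, "selector": selector},
    }
    return spread, pdb


spread, pdb = spread_and_budget("orders", min_available=2)
print(json.dumps(spread, indent=2))
print(json.dumps(pdb, indent=2))
```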

The plugin ecosystem is the other major gain. Karpenter provisions nodes in response to actual pod scheduling demand, using spot instances intelligently and consolidating nodes when utilization drops. KEDA scales deployments based on queue depth, Kafka consumer lag, or any CloudWatch metric. External-secrets-operator syncs secrets from Secrets Manager into Kubernetes secrets automatically. These capabilities significantly reduce the amount of custom glue code your team needs to write and maintain.

The migration is the hard part

A naive lift-and-shift from ECS to EKS creates a worse cluster than your ECS setup. The common failure mode: take the ECS task definitions, convert them to Kubernetes Deployments one-to-one, deploy them to a cluster with the default configuration, and declare the migration complete. The result is a cluster with no proper RBAC, no pod security standards, no network policy, a single node group with no autoscaling, and a deployment process that is just as manual as before — but with more YAML.

Proper cluster design starts with managed node groups across three AZs, a separate node group for system workloads, and Karpenter for application workloads if your team can handle the operational complexity. VPC design matters: the VPC CNI allocates an IP per pod, so your subnets need to be sized for peak pod count, not just node count. Cilium as a replacement for the VPC CNI offers eBPF-based networking and richer network policy capabilities at the cost of more configuration.
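A rough sizing calculation shows why pod count dominates. The sketch below is a simplification: it assumes one IP per node and one per pod, adds a headroom percentage, and accounts for the five addresses AWS reserves in every subnet. It ignores warm IP pools, extra network interfaces, and prefix delegation, so treat the result as a lower bound. The function name and the sample numbers are hypothetical.

```python
AWS_RESERVED_IPS_PER_SUBNET = 5  # AWS reserves five addresses in every subnet


def smallest_subnet_prefix(peak_nodes: int, peak_pods: int, headroom_percent: int = 20) -> int:
    """Return the longest IPv4 prefix length (smallest subnet) that fits the demand."""
    demand = peak_nodes + peak_pods
    # Integer ceiling of demand * (1 + headroom) to avoid float rounding surprises.
    padded = (demand * (100 + headroom_percent) + 99) // 100
    needed = padded + AWS_RESERVED_IPS_PER_SUBNET
    host_bits = (needed - 1).bit_length()  # smallest n with 2**n >= needed
    prefix = min(28, 32 - host_bits)  # /28 is the smallest subnet a VPC allows
    if prefix < 16:
        raise ValueError("demand exceeds a /16, the largest subnet a VPC allows")
    return prefix


# Hypothetical per-AZ peak: 20 nodes running 600 pods.
print("sized for nodes only: /", smallest_subnet_prefix(20, 0), sep="")
print("sized for nodes and pods: /", smallest_subnet_prefix(20, 600), sep="")
```

With these numbers, a subnet sized for nodes alone would be a /27, while the same nodes with their pods need a /22. That gap is the one that catches teams coming from ECS.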

RBAC from day one means defining what namespaces exist, what service accounts applications use, and what those service accounts can do, all before the first application is deployed. Retrofitting RBAC onto a cluster where everything runs as the default service account is painful. Pod security standards (baseline or restricted) should be enforced at the namespace level using the built-in Pod Security Admission controller.
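As a minimal sketch of what "from day one" can look like, the helper below generates a namespace with Pod Security Admission labels and a dedicated service account for one application. The label keys are the standard `pod-security.kubernetes.io` ones; the helper name, the `payments` namespace, and the `orders` service account are hypothetical.

```python
import json

ALLOWED_LEVELS = {"privileged", "baseline", "restricted"}


def namespace_with_service_account(namespace: str, app: str, level: str = "restricted") -> list[dict]:
    """Return a Namespace enforcing a pod security level plus a per-app ServiceAccount."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown pod security level: {level}")
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": namespace,
            "labels": {
                "pod-security.kubernetes.io/enforce": level,
                "pod-security.kubernetes.io/warn": level,
                "pod-security.kubernetes.io/audit": level,
            },
        },
    }
    sa = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": app, "namespace": namespace},
        # Pods must opt in to an API token instead of getting one by default.
        "automountServiceAccountToken": False,
    }
    return [ns, sa]


manifests = namespace_with_service_account("payments", "orders", level="baseline")
print(json.dumps(manifests, indent=2))
```

Roles and role bindings for each service account would follow the same pattern. The point is that they exist in the repository before the first workload does.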

The GitOps repo structure needs to be designed before migration: how environments are represented (branches vs directories), what the promotion workflow is, and how secrets are handled (Sealed Secrets, External Secrets Operator, or Vault Agent Injector). These decisions are hard to change after services are running.

What to do before you migrate

Before writing any Kubernetes YAML, do the inventory work. List every ECS service: container count, CPU and memory allocation, scaling behavior, dependencies (what does it call? what calls it?), statefulness (does it write to disk? does it need persistent storage?). Stateful workloads, meaning anything that writes to EFS mounts or relies on ECS-managed service discovery to reach specific stateful tasks, need specific handling in Kubernetes that is worth planning before you start.
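The inventory does not need a tool; a small structured record per service is enough. The sketch below is one possible shape for it. The field names, the sample services, and the ordering heuristic (stateless first, then fewest callers) are assumptions for illustration, not a prescribed method.

```python
from dataclasses import dataclass, field


@dataclass
class EcsService:
    name: str
    containers: int
    cpu_units: int
    memory_mib: int
    calls: list[str] = field(default_factory=list)  # services this one depends on
    efs_mounts: list[str] = field(default_factory=list)
    writes_local_disk: bool = False

    @property
    def stateful(self) -> bool:
        return bool(self.efs_mounts) or self.writes_local_disk


def migration_order(services: list[EcsService]) -> list[str]:
    """Order candidates: stateless first, then fewest callers, then by name."""
    callers = {s.name: 0 for s in services}
    for s in services:
        for target in s.calls:
            if target in callers:
                callers[target] += 1
    ranked = sorted(services, key=lambda s: (s.stateful, callers[s.name], s.name))
    return [s.name for s in ranked]


# Hypothetical inventory of four services.
inventory = [
    EcsService("web", 1, 512, 1024, calls=["orders"]),
    EcsService("orders", 2, 1024, 2048, calls=["billing"]),
    EcsService("billing", 1, 512, 1024),
    EcsService("reports", 1, 256, 512, efs_mounts=["/mnt/reports"]),
]

print(migration_order(inventory))
```

The ordering is only a starting point for discussion. Business criticality, which the record above does not capture, should override it.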

Decide on your namespace strategy before anything else. Namespaces as team boundaries, namespaces as environment boundaries, or namespaces as service boundaries all have different implications for RBAC, network policy, and resource quotas. There is no universally correct answer, but changing namespace strategy mid-migration is expensive.

Choose your ingress controller. AWS Load Balancer Controller (ALB-backed) integrates well with existing AWS infrastructure. NGINX ingress is more feature-rich and cloud-agnostic. Traefik is a reasonable middle ground. The choice affects your load balancer cost, configuration model, and what routing features are available.

Do not start with your most critical service. Start with a non-production workload, run it on EKS alongside ECS, learn the operational patterns, and build confidence before migrating anything that pages engineers when it breaks. The goal of the first migration is institutional knowledge, not cost savings.

What Skylynk does

Skylynk's DevOps and Platform engagement handles ECS-to-EKS migrations end to end: cluster design, networking, RBAC, GitOps repository structure, ArgoCD setup, Helm chart authoring for your services, and the migration sequence itself. We work with your team through the migration rather than handing off a cluster and a ticket queue.

For teams that are not sure whether to migrate, we start with the inventory and assessment: what do you have, what are the pain points, and would EKS solve them or just introduce new problems. The DevOps & Platform service page describes the full engagement. If your ECS setup is showing the symptoms described under "Signs you have outgrown ECS" in this post, it is worth a conversation.

