Understanding the True Cost of Ownership: ECS vs. EKS


Source: dev.to

While there are already plenty of articles on the Total Cost of Ownership (TCO) of a fully managed service like ECS versus one like EKS that shares more of the responsibility with its users, the discussion is almost always very high-level and geared towards C-level executives. There's certainly value in having those discussions, but the problem I see over and over again is at the ground level: developers and DevOps teams struggle to internalize what TCO really means for them on a day-to-day basis.

I recently went through an exercise that highlights some of these key points, so I wanted to walk through how TCO actually plays out in practice in terms of concrete workstreams for both dev and infra teams.

Background

To lay out some context: there is a homegrown, legacy ETL system that has been running on ECS for years. It was developed when the team had no embedded DevOps engineers, so a few developers wrote some bespoke Terraform code and chose ECS because it required less DevOps overhead upfront.

While the system is fairly simple (e.g., it moves files from S3 to a data lake and does some simple transformations), it became such a critical component of the entire data pipeline that it turned into one of those "don't break what works" systems: always on the backlog for migration, but never with enough momentum to carry it through.

During this time, the DevOps team grew in size and EKS became the norm at the company for container orchestration. All new workloads were deployed onto EKS, and all the internal tooling, both for managing the cluster itself and for adding controls to the applications (e.g., network policies, security), was geared towards supporting Kubernetes workloads.

At every quarterly planning event, the question of "why aren't we using a single container orchestration system?" would be brought up. Every now and then, the DevOps team would do an initial analysis showing that ECS was actually costing more in operations and management, because backporting new EKS features to ECS was expensive in both time and internal resources. This would in turn trigger the dev teams to do their due diligence and estimate how much effort a migration would take. But because things were "still working", the migration would always fall behind in priority, and the issue would go stale and be forgotten until the next time the TCO discussion bubbled up.

Problems Bubbling Up

Cracks started showing when new feature requests finally came in for the legacy ETL system. From the dev side, this was a well-scoped problem. For example, instead of storing data as CSV, the system would now convert it to Parquet so that other systems could ingest it efficiently. After the feature was developed, the dev team worked with the infra teams to run some preliminary scaling analysis and pushed to prod with no problem.

Or so they thought.

After a few weeks, the team was getting paged for two reasons. First, the ETL containers would sometimes eat up too many resources on the node and prevent other workloads, including observability agents, from being scheduled. Second, the finance team noticed a huge uptick in network costs as soon as this feature was released.

Both the dev team and the infra teams were confused. After all, they had done some scalability testing, and nothing they were doing was groundbreaking (these exact problems had already been solved on the EKS side). What they found was that best practices like anti-affinity rules, container limits, and S3 Private Endpoints were only thought to be in place. Because of the bespoke Terraform code and subtle differences between ECS and EKS, they were in fact not working as intended (e.g., the S3 Private Endpoint was enabled only for the VPCs hosting EKS, not the one hosting ECS).
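A gap like the missing S3 endpoint can be caught with a small audit. The following is a minimal sketch, not the tooling the teams actually used. It assumes the endpoint inventory is a list of dictionaries shaped like the `VpcEndpoints` entries returned by the EC2 `describe_vpc_endpoints` call (`VpcId`, `ServiceName`, `State`). The function name `vpcs_missing_s3_endpoint` and the VPC IDs are hypothetical.

```python
def vpcs_missing_s3_endpoint(workload_vpc_ids, vpc_endpoints):
    """Return the workload VPC IDs that have no available S3 endpoint.

    vpc_endpoints is assumed to look like the "VpcEndpoints" list from
    EC2 describe_vpc_endpoints: dicts with VpcId, ServiceName and State.
    """
    covered = {
        ep["VpcId"]
        for ep in vpc_endpoints
        # S3 service names have the form com.amazonaws.<region>.s3
        if ep["ServiceName"].endswith(".s3")
        # Compare case-insensitively, since casing of State can vary
        and ep.get("State", "").lower() == "available"
    }
    return sorted(set(workload_vpc_ids) - covered)


# Hypothetical inventory: only the EKS VPC has an S3 endpoint.
endpoints = [
    {
        "VpcId": "vpc-eks-prod",
        "ServiceName": "com.amazonaws.us-east-1.s3",
        "State": "available",
    },
]
workload_vpcs = {"eks": "vpc-eks-prod", "ecs": "vpc-ecs-legacy"}

print(vpcs_missing_s3_endpoint(workload_vpcs.values(), endpoints))
# prints ['vpc-ecs-legacy']
```

Running a check like this against every VPC that hosts a workload, rather than only the ones the platform team manages day to day, is what would have flagged the ECS VPC here.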

Takeaways

This "incident" finally illustrated to the dev teams what the hidden operational and maintenance costs are and how they can manifest in practice. Even though ECS is easier to manage and requires very little input from developers, there is a hidden cost to maintaining two different infrastructure systems across teams. So while the argument that "ECS is so easy to use and it's working" is true, it does not change the fact that it masks a TCO problem that can bubble up in the future.

TCO discussions often focus on how running EKS adds operational burden, but as this case study shows, the picture is more nuanced. If the rest of the team is running on EKS and has more expertise there, maintaining a more "fully-managed" solution can bring challenges of its own.
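The container limits that were assumed but not actually in place can be checked in a similar way. This is a rough sketch under stated assumptions: the input is a dictionary shaped like an ECS task definition, where each entry in `containerDefinitions` may carry a hard `memory` limit or only a soft `memoryReservation`, and a task-level `memory` value is treated as bounding every container. The function name `unbounded_containers` and the sample task definition are hypothetical.

```python
def unbounded_containers(task_definition):
    """Return names of containers that have no hard memory limit.

    A task-level "memory" value caps the whole task, so in that case
    every container is treated as bounded (an assumption of this sketch).
    """
    if task_definition.get("memory"):
        return []
    return [
        container["name"]
        for container in task_definition.get("containerDefinitions", [])
        # "memoryReservation" alone is only a soft limit
        if not container.get("memory")
    ]


# Hypothetical task definition for the legacy ETL service.
etl_task = {
    "family": "legacy-etl",
    "containerDefinitions": [
        {"name": "etl-worker", "memoryReservation": 512},
        {"name": "log-router", "memory": 256},
    ],
}

print(unbounded_containers(etl_task))
# prints ['etl-worker']
```

A container that shows up in this list can grow until it starves its neighbours on the same instance, which matches the first symptom described above.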
