Cost optimisation on AWS: Navigating NAT Charges with Private ECS Tasks on Fargate

While working on a new project recently, I explored deploying ECS Fargate containers in private subnets, with ingress allowed only through an Application Load Balancer. We chose this configuration primarily for security and firewall reasons; cost optimisation was another important consideration for this architecture.

The containers also needed egress access to other (non-AWS) services, which is routed through a NAT Gateway.


Note: Some parts of the architecture (such as the database) are omitted from this post in order to focus on the essential components.

With this configuration alone, the images would be fetched from ECR (and S3) through the NAT Gateway, which presents the following challenges:

  1. Cost Implications of NAT Gateway Usage: The NAT Gateway accrues costs based on a per-GB data processing fee, in addition to an hourly charge. In the us-east-1 region at the time of writing, the data processing fee is $0.045 per GB. At first glance this might seem negligible, but consider: if your container images are around 400MB, deploying just three containers already exceeds 1GB. This can quickly add up, leading to unexpectedly high charges; such unexpected expenses have been reported before (see the article referenced in the cost impact section below). Furthermore, repeated deployments due to failures exacerbate this cost, as the image is pulled multiple times.

  2. Security Concerns with Data Transit: While this post focuses primarily on cost, it's worth noting that routing traffic over the public internet can pose security risks. For a deeper dive into this aspect, refer to AWS's documentation on VPC Endpoints and ECR.

The Networking Behind Docker Image Retrieval in Private Subnets

ECS interacts with three AWS services behind the scenes when pulling Docker images:

  • ECR DKR: Utilized for Docker Registry APIs. Docker client commands like push and pull engage with this endpoint.
  • ECR API: This endpoint handles calls to the Amazon ECR API, facilitating actions like DescribeImages and CreateRepository.
  • S3: ECR stores the actual layers of Docker images in AWS-managed S3 buckets, with ARNs of the form arn:aws:s3:::prod-<region>-starport-layer-bucket.

ECS also needs access to other services, such as ECS telemetry and CloudWatch, but these are not directly involved in the Docker image pull.

Understanding and Mitigating NAT Gateway Traffic

In this section, we'll explore different strategies to minimise NAT gateway traffic and, consequently, its associated costs.

The experiment deploys one container instance in each scenario. With each step, we add the VPC endpoint(s) mentioned in the scenario and evaluate the difference.

The infrastructure is created using Terraform and can be found in this git repository. The project uses community-maintained AWS Terraform modules, which simplify the process. The code examples that follow use the vpc-endpoints module to create the Gateway and Interface endpoints.
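For context, this is roughly what a VPC with private subnets and a NAT Gateway looks like when defined with the community VPC module. This is a minimal sketch: the name, CIDRs, and availability zones are placeholder assumptions, not the demo project's actual values.

```hcl
# Hypothetical values for illustration; see the demo repository for the
# real configuration.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "fargate-demo"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.0.0.0/24", "10.0.1.0/24"]
  private_subnets = ["10.0.10.0/24", "10.0.11.0/24"]

  # A single NAT Gateway shared by all private subnets keeps the hourly
  # cost down, at the price of reduced availability.
  enable_nat_gateway = true
  single_nat_gateway = true
}
```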

In addition, I created a custom CloudWatch dashboard with a widget showing the sum of BytesOutToSource (the number of bytes sent through the NAT Gateway to the clients in your VPC) and BytesOutToDestination (the number of bytes sent out through the NAT Gateway to the destination) as an indication of the data processed by the NAT Gateway.
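As a rough sketch, such a widget could be defined in Terraform as follows. The dashboard name and the NAT Gateway reference (aws_nat_gateway.this) are assumptions; the demo project may structure this differently.

```hcl
# Sketch of a CloudWatch dashboard with one metric widget summing the
# two NAT Gateway byte counters from the AWS/NATGateway namespace.
resource "aws_cloudwatch_dashboard" "nat_traffic" {
  dashboard_name = "nat-gateway-traffic"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        width  = 12
        height = 6
        properties = {
          title  = "Total Bytes Out"
          region = "us-east-1"
          stat   = "Sum"
          period = 300
          metrics = [
            ["AWS/NATGateway", "BytesOutToDestination", "NatGatewayId", aws_nat_gateway.this.id],
            ["AWS/NATGateway", "BytesOutToSource", "NatGatewayId", aws_nat_gateway.this.id]
          ]
        }
      }
    ]
  })
}
```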

The Docker image used in these scenarios is a very simple Node.js image with a size of ~403MB.

That's enough about the setup, let's dive into the scenarios and results.

1. Only NAT Gateway, no VPC endpoints

As we see in the Total Bytes Out graph below, all the data (~414MB) for pulling the Docker image flows through the NAT Gateway.

Data processed, only NAT Gateway

2. NAT Gateway + S3 Gateway endpoint

Now let's add an S3 Gateway endpoint to the VPC. Gateway endpoints have no cost associated with them; AWS offers them for S3 and DynamoDB.

In this case, we add the S3 endpoint using the vpc-endpoints module:

s3 = {
  service         = "s3"
  service_type    = "Gateway"
  tags            = { Name = "S3 Gateway Endpoint" }
  policy          = data.aws_iam_policy_document.s3_endpoint_policy.json
  route_table_ids = module.vpc.private_route_table_ids
}

And the corresponding endpoint policy:

data "aws_iam_policy_document" "s3_endpoint_policy" {
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::prod-${local.region}-starport-layer-bucket/*"] # to access the layer files

    principals {
      type        = "*"
      identifiers = ["*"]
    }
  }
}

Note that the S3 Gateway endpoint must be created in the same region as the S3 bucket.

Data Processed, NAT Gateway + S3 endpoint

As we see here, the data processed by the NAT Gateway drops drastically (to ~245KB), confirming that our image layers are now largely transferred through the S3 Gateway endpoint.

Note: If your containers have existing connections to Amazon S3, their connections might be briefly interrupted when you add the Amazon S3 gateway endpoint. Source

3. NAT Gateway + S3 Gateway endpoint + ECR DKR interface endpoints

In the next step, we add an ECR DKR interface endpoint.

ecr_dkr = {
  service             = "ecr.dkr"
  private_dns_enabled = true
  tags                = { Name = "ECR DKR Interface Endpoint" }
  subnet_ids          = [module.vpc.private_subnets[0]] # Interface endpoints are priced per AZ
  policy              = data.aws_iam_policy_document.generic_endpoint_policy.json
}

See the demo project for details on the endpoint policy.
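One detail worth calling out: an Interface endpoint places an ENI in your subnet, and that ENI needs a security group allowing HTTPS from within the VPC, otherwise image pulls through the endpoint will time out. A minimal sketch, assuming the VPC module from earlier (the resource name is hypothetical):

```hcl
# Allow HTTPS from inside the VPC to the Interface endpoint's ENI.
resource "aws_security_group" "vpc_endpoints" {
  name   = "vpc-endpoints"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }
}
```

The security group can then be passed to the vpc-endpoints module so it is attached to every Interface endpoint it creates.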

Note that Interface endpoints also incur hourly and data-processing fees, but these tend to be lower than NAT Gateway charges. Depending on the amount of data the NAT Gateway processes for a particular service, it may make sense to include them for cost optimisation reasons.

Data Processed, NAT Gateway + S3 Gateway endpoint + ECR DKR interface endpoint

In this instance the traffic for a single deployment dropped further to ~33KB.

4. NAT Gateway + S3 Gateway endpoint + ECR DKR and API interface endpoints

Adding the ECR API endpoint:

ecr_api = {
  service             = "ecr.api"
  private_dns_enabled = true
  tags                = { Name = "ECR API Interface Endpoint" }
  subnet_ids          = [module.vpc.private_subnets[0]] # Interface endpoints are priced per AZ
  policy              = data.aws_iam_policy_document.generic_endpoint_policy.json
}

Data processed, NAT Gateway + S3 Gateway endpoint + ECR DKR and API interface endpoint

Comparing the scenarios

The results needed to be plotted on a logarithmic scale for visibility. As we see below, the S3 Gateway endpoint has the biggest impact on the data processed by the NAT gateway.

NAT Gateway bytes processed comparison graph

The cost impact

Considering a scenario similar to the original article, how much of an impact could the S3 Gateway endpoint have made?

The article mentions that their NAT Gateway processed 16TB of data with a 500MB Docker image: approximately 32,000 deployments. This was caused by a failing health check triggering repeated deployments, which can easily happen in real-world scenarios.

Let's simulate the same scenario with our Docker image, which is 403MB.

Without the S3 endpoint, the NAT Gateway processes ~414MB per deployment.
With an S3 Gateway endpoint, it processes ~0.245MB.

If there were 32,000 deployments with the image in our example:

1. Without the S3 Gateway endpoint
Data processed: 414MB × 32,000 = 13,248,000MB = 13,248GB
Cost ($0.045/GB) = $596.16

2. With the S3 Gateway endpoint
Data processed: 0.245MB × 32,000 = 7,840MB = 7.84GB
Cost ($0.045/GB) = $0.3528

This could of course be mitigated further with VPC Interface endpoints, but since they come with their own costs, it is worth analysing them against the requirements of your specific setup.

Wrapping up

Looking at the data processed by the NAT gateway in different scenarios, I think it's fair to say:

  • Definitely consider creating an S3 gateway endpoint, since these are available at no additional cost and drastically reduce the data processed by the NAT Gateway for this and other scenarios.
  • Depending on the number of deployments and security aspects of your architecture, consider using VPC interface endpoints.

If there are questions or feedback, please feel free to reach out!

