In this chapter, we’ll learn from the inside out how an ECS Windows container instance works. We will look at the available ECS-optimized AMIs for Windows and at how the ECS container agent plays a vital role between the container instance and the ECS control plane. Then, we will learn about the four pillars that need to be considered when right-sizing a Windows container instance. Finally, we will use Terraform to deploy an Auto Scaling group to launch Windows container instances in an ECS cluster.
We are going to cover the following main topics:
In the Deploying a Windows container instance with Terraform section, you will need to have the following technologies installed and expertise in them:
To access the source code used in this chapter, visit the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.
Important note
It is strongly recommended that you use an AWS test account to perform the activities described in the book and never run them against your production environment.
AWS provides customers with the Amazon ECS Windows-optimized AMIs, which are preconfigured with the necessary components such as Docker Engine, ECS Agent, and Hyper-V vSwitch, to run Windows containers as tasks successfully.
There are four Amazon ECS Windows-optimized AMI variants:
The Full AMI has the Windows Desktop Experience GUI installed, and the Core AMI installation is based on Server Core (only PowerShell). The main difference between one and another is the GUI shell packages:
Not having these shell packages installed drastically reduces the amount of provisioned block storage (Amazon EBS) needed to run the Windows Server operating system, directly impacting the solution cost. Always remember, with Amazon EBS, you pay for what you provision. The AMI choice isn’t a big deal when you run one or two Amazon ECS Windows container instances; however, when running an ECS cluster that scales out/in multiple times a day with thousands of EC2 Windows instances, the AMI version plays a crucial role in the solution cost. In simple words, use the Core AMI as much as possible.
When working with Terraform, one of the easiest ways to use the latest ECS-optimized Windows AMI is through data sources. Data sources allow Terraform to query data defined outside of Terraform and use the result elsewhere in the Terraform code. The following aws_ami data source retrieves the ID of the latest ECS-optimized Windows AMI published by Amazon:
    data "aws_ami" "ecs_optimized_ami" {
      most_recent = true
      owners      = ["amazon"]

      filter {
        name   = "name"
        values = ["Windows_Server-2019-English-Core-ECS_Optimized-*"]
      }
    }
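AWS also publishes the ID of the latest ECS-optimized Windows AMI as a public SSM parameter, so the same lookup can be expressed with an aws_ssm_parameter data source. The following is a minimal sketch rather than the chapter’s code: the parameter path is assumed to follow the naming AWS uses for the Windows Server 2019 Core variant, and the data source and output names are hypothetical.

```hcl
# Sketch: resolve the latest ECS-optimized Windows AMI ID through SSM Parameter Store.
# Assumption: the public parameter path below matches AWS naming for this AMI variant.
data "aws_ssm_parameter" "ecs_optimized_ami_id" {
  name = "/aws/service/ami-windows-latest/Windows_Server-2019-English-Core-ECS_Optimized/image_id"
}

# The provider marks SSM parameter values as sensitive. An AMI ID is not a secret,
# so nonsensitive() allows it to be shown in the plan and apply output.
output "ecs_optimized_ami_id" {
  value = nonsensitive(data.aws_ssm_parameter.ecs_optimized_ami_id.value)
}
```

In a launch template, image_id could then reference data.aws_ssm_parameter.ecs_optimized_ami_id.value instead of the aws_ami data source.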
To close this section, I recommend using the ECS Windows-optimized AMI as a starting point. Then, you can use tools such as HashiCorp Packer or EC2 Image Builder to apply the necessary configurations and hardening required by your company.
As explained in the previous chapter, the ECS container agent is responsible for communication between the Amazon ECS cluster and the Amazon EC2 instance. The ECS agent sends information about the currently running tasks and the resource utilization of containers from the container instance to the Amazon ECS cluster. The ECS agent also receives requests from the Amazon ECS cluster to start and stop tasks:
Figure 4.1 – ECS agent two-way communication with an ECS cluster
ECS Agent runs as a Windows service on the Windows container instance, and it communicates with the Docker daemon through a named pipe at \\.\pipe\docker_engine. A named pipe is a mechanism for facilitating communication between two processes using shared memory.
You can use PipeList from Windows Sysinternals to list the open named pipes:
Figure 4.2 – Listing pipes with PipeList
Assuming we are launching a new Amazon EC2 Windows instance based on the ECS-optimized Windows AMI to be part of an ECS cluster, this instance will become a Windows container instance.
When IaC is used, we need to bootstrap the instance so that it joins an ECS cluster. The bootstrap is usually done through EC2 user data, which passes the parameters to be included in the ECS agent configuration.
The following is the PowerShell bootstrap to join the Amazon EC2 Windows instance into the ECS cluster:
    <powershell>
    Initialize-ECSAgent -Cluster ${aws_ecs_cluster.ecs_windows_cluster.name} -EnableTaskIAMRole -AwsvpcBlockIMDS -EnableTaskENI -LoggingDrivers '["json-file","awslogs"]'
    [Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")
    </powershell>
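Let’s understand the parameters:
- -Cluster: The name of the ECS cluster that the container instance registers with.
- -EnableTaskIAMRole: Enables IAM roles for tasks, so containers can obtain credentials from the task role.
- -AwsvpcBlockIMDS: Blocks access to the instance metadata service (IMDS) for task containers running in awsvpc network mode.
- -EnableTaskENI: Turns on task networking, which is required to use the awsvpc network mode.
- -LoggingDrivers: The logging drivers that the agent makes available to tasks, in this case json-file and awslogs.
- ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE: Allows the awslogs logging driver to authenticate using the credentials of the task execution IAM role.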
In Terraform, we can bootstrap these Windows container instances using a launch template by specifying the following user data inside the aws_launch_template resource:
    user_data = base64encode(<<-EOF
      <powershell>
      Initialize-ECSAgent -Cluster ${aws_ecs_cluster.ecs_windows_cluster.name} -EnableTaskIAMRole -AwsvpcBlockIMDS -EnableTaskENI -LoggingDrivers '["json-file","awslogs"]'
      [Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")
      </powershell>
      EOF
    )
The Windows container instance bootstrap is a crucial part of the deployment because, when the instance is launched via the console, neither awsvpc mode nor the logging drivers are enabled by default.
In this section, we dove deep into how the ECS agent works and the parameters to successfully bootstrap a Windows container instance.
I have seen many customers struggle to identify the right EC2 instance type for their Windows container workloads, usually because they apply the same right-sizing approach that is used on Linux and forget to match the hardware requirements for Windows Server.
When right-sizing an ECS Windows container instance, we need to take four pillars into account: storage, processor, memory, and network.
We will explore these pillars next.
Storage is one of the most complex calculations during right-sizing as it uses different inputs, such as AMI, container image size, the number of other containers running in the same host, temporary files, paging, and dump files.
Following the official Microsoft documentation for hardware requirements for Windows Server, a minimum of 32 GB is required for a successful installation based on a Server Core installation, which, in our case, means the Amazon ECS-optimized Windows Server 2022 Core AMI. However, if you choose an EC2 instance type with more than 16 GB of RAM, additional disk space will be required for paging and dump files. So, let’s start building our storage right-sizing spreadsheet.
In this exercise, we will use three columns: Resource, Disk Space (GB), and Total Disk Space (GB), which keeps a running total.
Important note
In the Enumerating the Windows container image sizes section of Chapter 1, Windows Container 101, we took a deep dive into Windows container image sizes.
Assuming we will use Amazon ECS-optimized Windows Server 2022 Core AMI, which already caches the Server Core and Nanoserver Windows container base images as part of the AMI, we have an additional 5.28 GB on top of the 32 GB required by Microsoft:
| Resource | Disk Space (GB) | Total Disk Space (GB) |
| --- | --- | --- |
| Core AMI | 37.28 GB | 37.28 GB |
Table 4.1 — Core AMI used disk space
In summary, following the Microsoft official hardware requirements and deploying the Windows Server with the Amazon ECS-optimized Windows Server 2022 Core AMI, we need 37.28 GB in storage capacity.
Now, let’s assume we will deploy four different Windows tasks based on the ASP.NET framework, each with a 1 GB application layer. We will end up with four additional Windows container images, each with the following layer chain:
Figure 4.3 — ASP.NET application container image layer chain
Following this example, we need to increase the storage capacity to accommodate the framework/aspnet:4.8 image and the application layer:
| Resource | Disk Space (GB) | Total Disk Space (GB) |
| --- | --- | --- |
| Core AMI | 32 GB | 32 GB |
| Server Core base image | 4.99 GB | 36.99 GB |
| Nanoserver base image | 296 MB | 37.28 GB |
| Framework/aspnet:4.8 image | 1.11 GB | 38.39 GB |
| Task 1 – Application 1 image | 1 GB | 39.39 GB |
| Task 2 – Application 2 image | 1 GB | 40.39 GB |
| Task 3 – Application 3 image | 1 GB | 41.39 GB |
| Task 4 – Application 4 image | 1 GB | 42.39 GB |
| Temporary files | 5 GB | 47.39 GB |
Table 4.2 — Disk space in use after container launch
We had to add 5.11 GB on top of the Core AMI to accommodate the framework/aspnet:4.8 image (1.11 GB) and 4 GB of application layers. We must also plan for the temporary files generated by the host and each container, so let’s assume an extra 5 GB of free space, which also leaves room if we add new application images and/or sidecar containers.
In this exercise, we end up with an Amazon EBS volume of 47.39 GB per Windows container instance in the cluster. Remember, container image layers are immutable, and as a result, they can be shared between different container images that use the same layer, drastically reducing disk space. In our example earlier, framework/aspnet:4.8 is shared between all four ASP.NET application container images.
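The running total in Table 4.2 can be reproduced with a few Terraform locals. This is only a sketch of the arithmetic: the local and output names are hypothetical, and the result is rounded up on the assumption that the EBS volume size is set as a whole number.

```hcl
locals {
  # Sizes in GB, taken from Table 4.2.
  core_ami_gb          = 32
  server_core_image_gb = 4.99
  nanoserver_image_gb  = 0.296
  aspnet_image_gb      = 1.11 # stored once and shared by all four application images
  app_layer_gb         = 1
  app_image_count      = 4
  temp_files_gb        = 5

  required_disk_gb = (
    local.core_ami_gb +
    local.server_core_image_gb +
    local.nanoserver_image_gb +
    local.aspnet_image_gb +
    local.app_layer_gb * local.app_image_count +
    local.temp_files_gb
  )

  # Round up to a whole number for the EBS volume size.
  ebs_volume_size_gb = ceil(local.required_disk_gb)
}

output "ebs_volume_size_gb" {
  value = local.ebs_volume_size_gb
}
```

The inputs add up to 47.396 GB, which the table shows as 47.39 GB, so ebs_volume_size_gb evaluates to 48. Because the framework image is counted once rather than per task, adding a fifth application only adds its own 1 GB layer.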
Important note
With Amazon EBS, you pay for what you provision, independent of usage time. So, it doesn’t matter whether your Windows container instance lived for 5 minutes or 24 hours; the EBS cost will be the same.
With this important note in mind, I have a question for you: If your application has traffic spikes that require more tasks to handle the load, would you plan to use a bigger EC2 instance or multiple EC2 instances?
Processor performance depends on factors such as clock frequency, number of processor cores, processor cache size, and processor generation.
Important note
Remember that 50% CPU utilization on a 4 vCPU EC2 instance powered by an Intel Ice Lake processor, such as c6i.xlarge, doesn’t represent the same amount of work as 50% utilization on a 4 vCPU EC2 instance powered by Intel Cascade Lake, such as c5.xlarge. This can easily mislead the right-sizing plan, resulting in the wrong EC2 instance selection.
Following the official Microsoft documentation for hardware requirements for Windows Server, there isn’t much valuable information about processor requirements. Instead, you need to monitor your Windows container instance over consecutive days to capture the load from Windows tasks, Docker Engine CPU cycles, ECS agent CPU cycles, and any additional endpoint protection or tools you may have installed. A healthy EC2 instance avoids both underutilization (CPU consumption between 20% and 50%) and overutilization (CPU consumption between 88% and 100%).
Using Amazon CloudWatch Container Insights is an excellent way to monitor Windows tasks, ECS services, and container instances, giving you the metrics needed to properly set the CPU and memory values in the container definitions within a task definition.
Important note
This book isn’t meant to teach you Amazon EC2 instances in depth, so we’ll assume from now on that we’ll be using the latest EC2 instance generation. At the time I am writing the book, Intel Ice Lake processors are the newest generation available.
Memory is a combination of the EC2 instance OS, Windows containers, and any additional tools you might have installed in the container instance. For example, a new EC2 instance based on the Amazon ECS-optimized Windows Server 2022 Core AMI consumes 1.2 GiB, spread across the Windows Server 2022 operating system, Docker Engine, and ECS agent.
By default, a Windows container running over a Nanoserver base image consumes 40 MB, followed by Server Core with 50 MB. Then, you need to identify how much memory will be consumed by the framework and application. Typically, a “Hello World” app running on IIS and ASP.NET in a Windows container consumes 240 MB on average.
Again, it is hard to give exact numbers for the container application, so a monitoring tool such as Amazon CloudWatch Container Insights will be your best friend for fetching the metrics you need to make an informed decision.
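As a rough illustration of how these figures combine, the following sketch estimates how many "Hello World"-sized tasks fit in memory on one container instance. The 8 GiB instance size is an assumption chosen for the example, all names are hypothetical, and the 240 MB figure is treated as MiB for simplicity. Real workloads should be sized from monitored metrics rather than from this estimate.

```hcl
locals {
  instance_memory_mib = 8 * 1024   # assumption: an instance with 8 GiB of memory
  host_baseline_mib   = 1.2 * 1024 # OS, Docker Engine, and ECS agent (figure from the text)
  task_memory_mib     = 240        # IIS and ASP.NET "Hello World" average (figure from the text)

  # Memory left for tasks after the host baseline is subtracted.
  available_mib = local.instance_memory_mib - local.host_baseline_mib

  # Whole tasks that fit in the remaining memory.
  max_tasks_by_memory = floor(local.available_mib / local.task_memory_mib)
}

output "max_tasks_by_memory" {
  value = local.max_tasks_by_memory
}
```

With these inputs, 6,963.2 MiB remain for tasks and the estimate works out to 29. A production application will almost certainly consume more than the "Hello World" figure, so the real number is lower.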
Network plays an essential role in the right-size planning; as I have already mentioned, it isn’t typical for legacy Windows applications to be heavy on network demand, but the type of network mode selected directly affects Windows task density on a host and solution cost.
As mentioned in Chapter 3, Amazon ECS – Overview, in the Amazon ECS – task networking section, awsvpc mode gives you a lot of flexibility and control at the expense of lower Windows task density compared to the default (NAT) mode. For example, a c6i.xlarge EC2 instance has 4 vCPUs and 8 GiB of memory, but it only supports 4 network interfaces. Because one of them is the instance’s primary network interface, a maximum of 3 Windows tasks can be scheduled on the Windows container instance if awsvpc mode is used. This isn’t the best use of computing resources, nor is it cost-effective; you would probably end up with too many underutilized EC2 instances just to accommodate the necessary number of Windows tasks.
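The network interface limit behind this example can be looked up rather than hardcoded. The sketch below assumes the AWS provider’s aws_ec2_instance_type data source and its maximum_network_interfaces attribute; the local and output names are hypothetical. It also assumes that one interface is taken by the instance itself and that each awsvpc task uses one of the remaining interfaces.

```hcl
# Look up the hardware limits of the instance type used in the example.
data "aws_ec2_instance_type" "container_instance" {
  instance_type = "c6i.xlarge"
}

locals {
  # One interface is the primary network interface of the instance;
  # each awsvpc task needs one of the remaining interfaces.
  max_awsvpc_tasks = data.aws_ec2_instance_type.container_instance.maximum_network_interfaces - 1
}

output "max_awsvpc_tasks" {
  value = local.max_awsvpc_tasks
}
```

For an instance type that supports 4 network interfaces, this evaluates to 3. Comparing that number with the CPU and memory estimates shows which pillar caps task density first.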
The network mode selection will ultimately influence the Windows task density on a Windows container instance, affecting the amount of CPU and memory in use. Therefore, you must balance how much control/flexibility versus density is the right choice for your solution.
Now that we have learned about the four pillars that need to be considered when right-sizing a Windows container instance, let’s get our hands dirty and deploy it using Terraform.
In Chapter 3, Amazon ECS – Overview, in the Deploying an Amazon ECS cluster with Terraform section, we covered and deployed the IAM role essentials and policies to deploy an ECS cluster and Windows container instances successfully.
In this chapter, we will first deploy the Windows container instance requirements, such as security groups and a launch template, then we’ll deploy the Windows container instance via the Auto Scaling group.
Important note
You will see code snippets for the remainder of this section. The full Terraform code for this chapter can be found at https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.
The security group to be created will have the following rules:
| Name | Source | Protocol | Port |
| --- | --- | --- | --- |
| ALB-SG-Ingress | 0.0.0.0/0 | TCP | 80, 443 |
| ContainerInstance-SG-Ingress | ALB-SG-Ingress | TCP | 32768-65535 |
Table 4.3 – Security group for external and internal traffic
Let’s first create ALB-SG-Ingress. I’m using for_each inside a dynamic block to create one ingress rule per port specified in var.alb_ingress_ports, whose values you can find in the variable.tf file:
    ## Security Groups
    resource "aws_security_group" "alb_ingress" {
      name        = "ALB-SG-Ingress"
      description = "Ingress traffic from Internet"
      vpc_id      = data.aws_vpc.vpc_id.id

      dynamic "ingress" {
        for_each = var.alb_ingress_ports
        content {
          from_port   = ingress.value
          to_port     = ingress.value
          protocol    = "tcp"
          cidr_blocks = ["0.0.0.0/0"]
        }
      }

      egress {
        from_port = "0"
        …
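For reference, var.alb_ingress_ports could be declared along the following lines. This is a hypothetical sketch based on the ports listed in Table 4.3; the declaration in the repository may differ.

```hcl
# Hypothetical declaration; the default mirrors the ALB-SG-Ingress ports in Table 4.3.
variable "alb_ingress_ports" {
  description = "TCP ports opened to the internet on the ALB security group"
  type        = list(number)
  default     = [80, 443]
}
```

With a list as input, the dynamic block generates one ingress rule per element, and ingress.value holds the port number for that rule.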
Second, let’s deploy ContainerInstance-SG-Ingress, which will allow all traffic originating on the ALB to reach the Windows container instances on dynamic TCP ports, which are used by the Windows tasks when the default (NAT) network mode is used:
    resource "aws_security_group" "ecs_container_instances_ingress" {
      name        = "ContainerInstance-SG-Ingress"
      description = "Ingress traffic from ALB to Container Instance - Dynamic Ports"
      vpc_id      = data.aws_vpc.vpc_id.id

      ingress {
        from_port       = 32768
        to_port         = 65535
        protocol        = "tcp"
        security_groups = [aws_security_group.alb_ingress.id]
      }

      egress {
        from_port = "0"
        ..
Now that we have all the requirements in place, let’s deploy a launch template that will be used by the Auto Scaling group to deploy ECS Windows container instances:
    ## Launch Template
    data "aws_ami" "ecs_optimized_ami" {
      most_recent = true
      owners      = ["amazon"]

      filter {
        name   = "name"
        values = ["Windows_Server-2019-English-Core-ECS_Optimized-*"]
      }
    }

    resource "aws_launch_template" "ecs_container_instances" {
      name                   = "ECS Windows container instance"
      image_id               = data.aws_ami.ecs_optimized_ami.id
      instance_type          = "t3.large"
      vpc_security_group_ids = [aws_security_group.ecs_container_instances_ingress.id]
      …
As the final step to deploy an ECS Windows container instance, let’s deploy an Auto Scaling group that uses the preceding Launch template:
    ## Auto_Scaling_Group
    resource "aws_autoscaling_group" "asg_ecs_cluster" {
      name                = "ecs-asg"
      desired_capacity    = "1"
      max_size            = "10"
      min_size            = "1"
      vpc_zone_identifier = data.aws_subnets.private_subnets.ids
      force_delete        = true
      enabled_metrics     = local.asg_metrics

      launch_template {
        id      = aws_launch_template.ecs_container_instances.id
        version = aws_launch_template.ecs_container_instances.latest_version
      }

      instance_refresh {
        …
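The Auto Scaling group references local.asg_metrics and data.aws_subnets.private_subnets, which are defined elsewhere in the repository. The following is a hypothetical sketch of what such definitions might look like; the vpc_id variable and the Tier tag are assumptions made for illustration only.

```hcl
# Hypothetical input; the chapter's code resolves the VPC through a data source instead.
variable "vpc_id" {
  description = "ID of the VPC that hosts the Windows container instances"
  type        = string
}

data "aws_subnets" "private_subnets" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }

  # Assumption: private subnets are tagged with Tier = "private".
  tags = {
    Tier = "private"
  }
}

locals {
  # Group-level metrics that an Auto Scaling group can publish to CloudWatch.
  asg_metrics = [
    "GroupMinSize",
    "GroupMaxSize",
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
    "GroupPendingInstances",
    "GroupStandbyInstances",
    "GroupTerminatingInstances",
    "GroupTotalInstances",
  ]
}
```

Spreading vpc_zone_identifier across subnets in different Availability Zones lets the Auto Scaling group balance Windows container instances between them.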
At this point, we have an ECS cluster with an ECS Windows container instance and roles, policies, and auto-scaling properly set up. The deployment we just did looks like the following figure:
Figure 4.4 – ECS Windows container instances deployment
Congratulations, you have accomplished a lot! We just finished the deployment of an Amazon ECS cluster using AWS best practices. This cluster is prepared to receive thousands of simultaneous connections and will scale out to a maximum of 10 Windows container instances to handle the traffic.
In this chapter, we learned about the differences between ECS-optimized Windows AMIs and how to bootstrap an Amazon ECS agent; then, we delved into the four pillars to right-size a Windows container instance. Finally, we deployed an ECS Windows container instance and its requirements, such as a security group, a launch template, and an Auto Scaling group.
In Chapter 5, Deploying an EC2 Windows-Based Task, we will dive deep into ECS task definitions, gMSA support, and persistent storage. Then, we will deploy a Windows task using Terraform.