4

Deploying a Windows Container Instance

In this chapter, we’ll learn from the inside out how an ECS Windows container instance works. We will understand the available ECS-optimized AMI for Windows and how the ECS container agent plays a vital role between the container instance and the ECS control plane. Then, we will learn about the four pillars that need to be considered when right-sizing a Windows container instance. Finally, we will use Terraform to deploy an Auto Scaling group to launch Windows container instances in an ECS cluster.

We are going to cover the following main topics:

  • Amazon ECS-optimized Windows AMIs
  • Amazon ECS agent
  • Right-sizing a Windows container instance
  • Deploying a Windows container instance with Terraform

Technical requirements

To follow the Deploying a Windows container instance with Terraform section, you will need the following tools, permissions, and expertise:

  • AWS CLI
  • Terraform CLI
  • IAM user with AmazonECS_FullAccess, IAMFullAccess, and AmazonEC2FullAccess managed policies
  • Terraform development expertise

The source code used in this chapter is available in the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.

Important note

It is strongly recommended that you use an AWS test account to perform the activities described in the book and never run them against your production environment.

Amazon ECS-optimized Windows AMIs

AWS provides customers with the Amazon ECS Windows-optimized AMIs, which are preconfigured with the necessary components such as Docker Engine, ECS Agent, and Hyper-V vSwitch, to run Windows containers as tasks successfully.

There are four Amazon ECS Windows-optimized AMI variants:

  • Amazon ECS-optimized Windows Server 2022 Full AMI
  • Amazon ECS-optimized Windows Server 2022 Core AMI
  • Amazon ECS-optimized Windows Server 2019 Full AMI
  • Amazon ECS-optimized Windows Server 2019 Core AMI

The Full AMI has the Windows Desktop Experience GUI installed, while the Core AMI is based on a Server Core installation (managed through PowerShell only). The main difference between the two is the set of GUI shell packages:

  • Microsoft-Windows-Server-Gui-Mgmt-Package
  • Microsoft-Windows-Server-Shell-Package
  • Microsoft-Windows-Server-Gui-RSAT-Package
  • Microsoft-Windows-Cortana-PAL-Desktop-Package

Not having these shell packages installed drastically reduces the amount of provisioned block storage (Amazon EBS) needed to run the Windows Server operating system, directly impacting the solution cost. Always remember that with Amazon EBS, you pay for what you provision. The AMI choice isn’t a big deal when you run one or two Amazon ECS Windows container instances; however, when running an ECS cluster that scales out and in multiple times a day with thousands of EC2 Windows instances, the AMI variant plays a crucial role in the solution cost. In simple words, use the Core AMI as much as possible.

When working with Terraform, one of the easiest ways to use the latest ECS-optimized Windows AMI is through data sources. Data sources allow Terraform to query for data outside of Terraform and then use it as a value elsewhere in the Terraform code. The following example uses the aws_ami data source to look up the most recent ECS-optimized Windows AMI published by Amazon that matches a name filter:

data "aws_ami" "ecs_optimized_ami" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["Windows_Server-2019-English-Core-ECS_Optimized-*"]
  }
}
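AWS also publishes the latest ECS-optimized Windows AMI IDs as public AWS Systems Manager (SSM) parameters. The following is a minimal sketch of that alternative, assuming the parameter path shown below follows the public naming scheme for the Windows Server 2022 Core variant; the data source label and the local name are hypothetical and are not part of the chapter's repository:

```hcl
# Sketch only: read the latest ECS-optimized Windows AMI ID from a public SSM parameter.
# The parameter path is an assumption; adjust it to the variant you need.
data "aws_ssm_parameter" "ecs_optimized_ami_ssm" {
  name = "/aws/service/ami-windows-latest/Windows_Server-2022-English-Core-ECS_Optimized/image_id"
}

locals {
  # The provider marks SSM parameter values as sensitive; an AMI ID is not a secret,
  # so nonsensitive() is used here to allow it to appear in plans and outputs.
  ecs_ami_id = nonsensitive(data.aws_ssm_parameter.ecs_optimized_ami_ssm.value)
}
```

With this approach, local.ecs_ami_id could be passed to image_id in a launch template in place of the aws_ami lookup.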

To close this section, I recommend using the ECS Windows-optimized AMI as a starting point. Then, you can use tools such as HashiCorp Packer or EC2 Image Builder to apply the necessary configurations and hardening required by your company.

Amazon ECS agent

As explained in the previous chapter, the ECS container agent is responsible for the communication between the Amazon ECS cluster and the Amazon EC2 instance. The ECS agent sends information about the currently running tasks and the resource utilization of containers from the container instance to the Amazon ECS cluster. The ECS agent also receives requests from the Amazon ECS cluster to start and stop tasks:

Figure 4.1 – ECS agent two-way communication with an ECS cluster

The ECS agent runs as a Windows service on the Windows container instance, and it communicates with the Docker daemon through a named pipe at \\.\pipe\docker_engine. A named pipe is a mechanism for facilitating communication between two processes using shared memory.

You can use PipeList from Windows Sysinternals to list the open named pipes on the instance:

Figure 4.2 – Listing pipes with PipeList

Assuming we are launching a new Amazon EC2 Windows instance based on the ECS-optimized Windows AMI to be part of an ECS cluster, this instance will become a Windows container instance.

When the instance is launched through IaC, we need to bootstrap it in order to add it to an ECS cluster. The bootstrap is usually done through EC2 user data, which passes the parameters to be included in the ECS agent configuration.

The following is the PowerShell bootstrap to join the Amazon EC2 Windows instance into the ECS cluster:

<powershell>
Initialize-ECSAgent -Cluster ${aws_ecs_cluster.ecs_windows_cluster.name} -EnableTaskIAMRole -AwsvpcBlockIMDS -EnableTaskENI -LoggingDrivers '["json-file","awslogs"]'
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE",$TRUE, "Machine")
</powershell>

Let’s understand the parameters:

  • Cluster specifies the existing ECS cluster name.
  • EnableTaskIAMRole enables tasks to assume an IAM role; this will make port 80 unavailable for tasks.
  • AwsvpcBlockIMDS is an optional parameter that blocks instance metadata service (IMDS) access for the tasks running in the awsvpc mode.
  • EnableTaskENI turns on task networking and is required to use the awsvpc network mode.
  • LoggingDrivers specifies the logging drivers that the ECS agent makes available to tasks on this instance (here, json-file and awslogs).
  • [Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE",$TRUE, "Machine") creates a system variable that enables the awslogs log driver to authenticate using the task execution IAM role. On Windows, the default value is false.

In Terraform, we can bootstrap these Windows container instances using a launch template by specifying the following user data inside the aws_launch_template resource:

user_data = base64encode(<<EOF
<powershell>
Initialize-ECSAgent -Cluster ${aws_ecs_cluster.ecs_windows_cluster.name} -EnableTaskIAMRole -AwsvpcBlockIMDS -EnableTaskENI -LoggingDrivers '["json-file","awslogs"]'
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE",$TRUE, "Machine")
</powershell>
EOF
)

The Windows container instance bootstrap is a crucial part of the deployment because, when the instance is launched via the console, neither the awsvpc mode nor the logging drivers are enabled by default.

In this section, we dove deep into how the ECS agent works and the parameters to successfully bootstrap a Windows container instance.

Right-sizing a Windows container instance

I have seen many customers struggle to identify the right EC2 instance type for their Windows container workloads, usually because they apply the same right-sizing approach that is used on Linux and forget to match the hardware requirements for Windows Server.

When right-sizing an ECS Windows container instance, we need to take into account four pillars:

  • Storage
  • Processor
  • Memory
  • Network

We will explore these pillars next.

Storage

Storage is one of the most complex calculations during right-sizing as it uses different inputs, such as AMI, container image size, the number of other containers running in the same host, temporary files, paging, and dump files.

According to the official Microsoft documentation on hardware requirements for Windows Server, a minimum of 32 GB of disk space is needed for a successful Server Core installation, which, in our case, means the Amazon ECS-optimized Windows Server 2022 Core AMI. However, if you choose an EC2 instance type with more than 16 GB of RAM, additional disk space will be required for paging and dump files. So, let’s start building our storage right-sizing spreadsheet.

In this exercise, we will use three columns:

  • Resource to be measured
  • Required disk space (GB) to accommodate the resource
  • Total disk space (GB), which is the sum of the resources

Important note

In the Enumerating the Windows container image sizes section of Chapter 1, Windows Container 101, we covered Windows container image sizes in greater detail.

Assuming we will use the Amazon ECS-optimized Windows Server 2022 Core AMI, which already caches the Server Core and Nano Server Windows container base images as part of the AMI, we have an additional 5.28 GB on top of the 32 GB required by Microsoft:

Resource | Disk Space (GB) | Total Disk Space (GB)
--- | --- | ---
Core AMI | 37.28 GB | 37.28 GB

Table 4.1 — Core AMI used disk space

In summary, following the Microsoft official hardware requirements and deploying the Windows Server with the Amazon ECS-optimized Windows Server 2022 Core AMI, we need 37.28 GB in storage capacity.

Now, let’s assume we will deploy four different Windows tasks based on the ASP.NET framework, each with an application layer of 1 GB. We will end up with three additional Windows container images with the following layer chain:

Figure 4.3 — ASP.NET application container image layer chain

Following this example, we need to increase the storage capacity to accommodate the framework/aspnet:4.8 image and the application layer:

Resource | Disk Space (GB) | Total Disk Space (GB)
--- | --- | ---
Core AMI | 32 GB | 32 GB
Server Core base image | 4.99 GB | 36.99 GB
Nano Server base image | 296 MB | 37.28 GB
Framework/aspnet:4.8 image | 1.11 GB | 38.39 GB
Task 1 – Application 1 image | 1 GB | 39.39 GB
Task 2 – Application 2 image | 1 GB | 40.39 GB
Task 3 – Application 3 image | 1 GB | 41.39 GB
Task 4 – Application 4 image | 1 GB | 42.39 GB
Temporary files | 5 GB | 47.39 GB

Table 4.2 — Disk space in use after container launch

We had to add 5.11 GB on top of the Core AMI to accommodate the framework/aspnet:4.8 image (1.11 GB) and 4 GB of application layers. We must also plan for the temporary files generated by the host and each container, so let’s assume an extra 5 GB of free space, which also leaves some room for new application images and/or sidecar containers.

In this exercise, we end up with an Amazon EBS volume of 47.39 GB per Windows container instance in the cluster. Remember, container image layers are immutable, and as a result, they can be shared between different container images that use the same layer, drastically reducing disk space. In our earlier example, framework/aspnet:4.8 is shared between all four ASP.NET application container images.
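The running total in Table 4.2 can also be expressed in Terraform so that the root volume size follows the sizing assumptions. This is a sketch only: the local names and the sizing_example resource are hypothetical, the figures are copied from Table 4.2, and treating the chapter's GB figures as GiB (the unit volume_size expects) is a simplifying assumption:

```hcl
locals {
  # Disk space figures (GB) copied from Table 4.2.
  disk_space_gb = {
    core_ami           = 32
    server_core_base   = 4.99
    nano_server_base   = 0.296 # 296 MB
    aspnet_framework   = 1.11
    application_images = 4     # four application layers of 1 GB each
    temporary_files    = 5
  }

  # EBS volumes are sized in whole units, so round the total up.
  root_volume_size_gb = ceil(sum(values(local.disk_space_gb)))
}

# Hypothetical, partial launch template that only illustrates the root volume setting.
resource "aws_launch_template" "sizing_example" {
  block_device_mappings {
    device_name = "/dev/sda1" # assumed root device name of the Windows AMI

    ebs {
      volume_size           = local.root_volume_size_gb
      volume_type           = "gp3"
      delete_on_termination = true
    }
  }
}
```

Keeping the inputs in a map makes it easy to revisit a single assumption, such as the temporary files allowance, without recalculating the table by hand.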

Important note

With Amazon EBS, you pay for the capacity you provision, regardless of how much of it you actually use. So, it doesn’t matter whether your Windows container instance fills 10 GB or the entire volume; for as long as the volume exists, you are billed for its full provisioned size.

With this important note in mind, I have a question for you: If your application has traffic spikes that require more tasks to handle the load, would you plan to use a bigger EC2 instance or multiple EC2 instances?

Processor

Processor performance depends on factors such as clock frequency, number of processor cores, processor cache size, and processor generation.

Important note

Remember that 50% CPU utilization on a 4 vCPU EC2 instance powered by an Intel Ice Lake processor such as c6i.xlarge won’t be 50% utilization on a 4 vCPU EC2 instance powered by Intel Cascade Lake, such as c5.xlarge. Sometimes, this can easily mislead the right-size plan, resulting in a wrong EC2 instance selection.

The official Microsoft documentation on hardware requirements for Windows Server offers little guidance on processor requirements. Instead, monitor your Windows container instance over several consecutive days to capture the load from the Windows tasks, Docker Engine CPU cycles, ECS agent CPU cycles, and any endpoint protection or other tools you may have installed. A healthy EC2 instance avoids both underutilization (CPU consumption between 20% and 50%) and overutilization (CPU consumption between 88% and 100%).
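One way to keep an eye on the overutilization band is a CloudWatch alarm on the average CPU of the Auto Scaling group. The following sketch assumes the aws_autoscaling_group.asg_ecs_cluster resource deployed later in this chapter; the alarm name, period, and number of evaluation periods are illustrative choices, not values from the book's repository:

```hcl
# Sketch only: alarm when the container instances stay in the overutilization band.
resource "aws_cloudwatch_metric_alarm" "container_instance_cpu_high" {
  alarm_name          = "ecs-windows-cpu-overutilized" # hypothetical name
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 88  # lower bound of the overutilization band described above
  period              = 300 # seconds per data point (assumption)
  evaluation_periods  = 3   # three consecutive periods before alarming (assumption)

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.asg_ecs_cluster.name
  }
}
```

A second alarm with a LessThanOrEqualToThreshold comparison could flag the underutilization band in the same way.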

Using Amazon CloudWatch Container Insights is an excellent way to monitor Windows tasks, ECS services, and container instances, giving you the metrics you need to properly set the CPU and memory values in the container definitions within a task definition.
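In Terraform, Container Insights is turned on through a setting block on the ECS cluster. The snippet below is a sketch under assumptions: the resource label matches the cluster referenced in this chapter's user data, while the cluster name value is hypothetical and the actual cluster definition from Chapter 3 may differ:

```hcl
# Sketch only: an ECS cluster with CloudWatch Container Insights enabled.
resource "aws_ecs_cluster" "ecs_windows_cluster" {
  name = "ecs-windows-cluster" # hypothetical cluster name

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}
```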

Important note

This book isn’t meant to teach you Amazon EC2 instances in depth, so we’ll assume from now on that we’ll be using the latest EC2 instance generation. At the time of writing, Intel Ice Lake processors are the newest generation available.

Memory

Memory consumption is a combination of the EC2 instance OS, the Windows containers, and any additional tools you might have installed in the container instance. For example, a new EC2 instance based on the Amazon ECS-optimized Windows Server 2022 Core AMI consumes about 1.2 GiB, spread across the Windows Server 2022 operating system, Docker Engine, and the ECS agent.

By default, a Windows container running over a Nano Server base image consumes 40 MB, followed by Server Core with 50 MB. Then, you need to identify how much memory will be consumed by the framework and the application. Typically, a “Hello World” app running on IIS and ASP.NET in a Windows container consumes 240 MB on average.

Again, it is hard to give exact numbers for the container application, and a monitoring tool such as Amazon CloudWatch Container Insights will be your best friend to fetch the necessary metrics so that you can make an informed decision.

Network

Network plays an essential role in the right-size planning; as I have already mentioned, it isn’t typical for legacy Windows applications to be heavy on network demand, but the type of network mode selected directly affects Windows task density on a host and solution cost.

As mentioned in Chapter 3, Amazon ECS – Overview, in the Amazon ECS – task networking section, the awsvpc mode gives you a lot of flexibility and control at the expense of lower Windows task density compared to the default (NAT) mode. For example, a c6i.xlarge EC2 instance has 4 vCPUs and 8 GiB of memory, but it only supports 4 network interfaces. Because one of them is used by the instance itself, a maximum of 3 Windows tasks can be scheduled on the Windows container instance if awsvpc mode is used. This isn’t the best use of computing resources, nor is it cost-effective, and it will probably lead you to run too many underutilized EC2 instances just to accommodate the necessary number of Windows tasks.

The network mode selection will ultimately influence the Windows task density on a Windows container instance, affecting the amount of CPU and memory in use. Therefore, you must balance how much control/flexibility versus density is the right choice for your solution.
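The task density limit in awsvpc mode can be derived from the instance type's network interface limit. The following is a minimal sketch that looks up that limit with the aws_ec2_instance_type data source; the data source label, local, and output names are hypothetical:

```hcl
# Sketch only: estimate how many awsvpc-mode tasks fit on one container instance.
data "aws_ec2_instance_type" "container_instance" {
  instance_type = "c6i.xlarge"
}

locals {
  # One network interface is used by the instance itself; each awsvpc task needs its own.
  max_awsvpc_tasks = data.aws_ec2_instance_type.container_instance.maximum_network_interfaces - 1
}

output "max_awsvpc_tasks" {
  description = "Upper bound of awsvpc-mode tasks per instance, based on ENI limits only"
  value       = local.max_awsvpc_tasks
}
```

For c6i.xlarge, the result should match the three tasks discussed above. Comparing this number with what the instance's CPU and memory could host in the default (NAT) mode makes the density trade-off explicit.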

Now that we have learned about the four pillars that need to be considered when right-sizing a Windows container instance, let’s get our hands dirty and deploy it using Terraform.

Deploying a Windows container instance with Terraform

In Chapter 3, Amazon ECS – Overview, in the Deploying an Amazon ECS cluster with Terraform section, we covered and deployed the IAM role essentials and policies to deploy an ECS cluster and Windows container instances successfully.

In this chapter, we will first deploy the Windows container instance requirements, such as security groups and a launch template, then we’ll deploy the Windows container instance via the Auto Scaling group.

Important note

You will see code snippets for the remainder of this section. The full Terraform code for this chapter can be found at https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.

Deploying security groups

The security groups to be created will have the following rules:

Name | Source | Protocol | Port
--- | --- | --- | ---
ALB-SG-Ingress | 0.0.0.0/0 | TCP | 80, 443
ContainerInstance-SG-Ingress | ALB-SG-Ingress | TCP | 32768-65535

Table 4.3 – Security group for external and internal traffic

Let’s first create ALB-SG-Ingress. I’m using a dynamic block with for_each to create one ingress rule per port specified in var.alb_ingress_ports; you can find the values in the variable.tf file:

## Security Groups
resource "aws_security_group" "alb_ingress" {
  name        = "ALB-SG-Ingress"
  description = "Ingress traffic from Internet"
  vpc_id      = data.aws_vpc.vpc_id.id
  dynamic "ingress" {
    for_each = var.alb_ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
  egress {
    from_port        = "0"
…
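For reference, a variable that feeds the dynamic block above could be declared as follows. This is a hypothetical sketch based on the ports listed in Table 4.3; the description and type are assumptions, and the actual declaration in the repository's variable.tf may differ:

```hcl
# Sketch only: ports that the dynamic "ingress" block iterates over.
variable "alb_ingress_ports" {
  description = "TCP ports opened to the internet on the ALB security group"
  type        = list(number)
  default     = [80, 443]
}
```

Because for_each iterates over this list, adding a port to it creates another ingress rule without touching the security group resource.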

Second, let’s deploy ContainerInstance-SG-Ingress, which will allow all traffic originating on the ALB to reach the Windows container instances on dynamic TCP ports, which are used by the Windows tasks when the default (NAT) network mode is used:

resource "aws_security_group" "ecs_container_instances_ingress" {
  name        = "ContainerInstance-SG-Ingress"
  description = "Ingress traffic from ALB to Container Instance - Dynamic Ports"
  vpc_id      = data.aws_vpc.vpc_id.id
  ingress {
    from_port       = 32768
    to_port         = 65535
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_ingress.id]
  }
  egress {
    from_port        = "0"
..

Now that we have all the requirements in place, let’s deploy a launch template that will be used by the Auto Scaling group to deploy ECS Windows container instances:

## Launch Template
data "aws_ami" "ecs_optimized_ami" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["Windows_Server-2019-English-Core-ECS_Optimized-*"]
  }
}
resource "aws_launch_template" "ecs_container_instances" {
  name                   = "ECS-Windows-container-instance" # launch template names cannot contain spaces
  image_id               = data.aws_ami.ecs_optimized_ami.id
  instance_type          = "t3.large"
  vpc_security_group_ids = [aws_security_group.ecs_container_instances_ingress.id]
…

As the final step to deploy an ECS Windows container instance, let’s deploy an Auto Scaling group that uses the preceding Launch template:

## Auto_Scaling_Group
resource "aws_autoscaling_group" "asg_ecs_cluster" {
  name                = "ecs-asg"
  desired_capacity    = 1
  max_size            = 10
  min_size            = 1
  vpc_zone_identifier = data.aws_subnets.private_subnets.ids
  force_delete        = true
  enabled_metrics     = local.asg_metrics
  launch_template {
    id      = aws_launch_template.ecs_container_instances.id
    version = aws_launch_template.ecs_container_instances.latest_version
  }
  instance_refresh {
…

At this point, we have an ECS cluster with an ECS Windows container instance and roles, policies, and auto-scaling properly set up. The deployment we just did looks like the following figure:

Figure 4.4 – ECS Windows container instances deployment

Congratulations, you have accomplished a lot! We just finished the deployment of an Amazon ECS cluster using the AWS best practices. This cluster is prepared to receive thousands of simultaneous connections, which will scale out to a maximum of 10 Windows container instances to handle the traffic.

Summary

In this chapter, we learned about the differences between ECS-optimized Windows AMIs and how to bootstrap an Amazon ECS agent; then, we delved into the four pillars to right-size a Windows container instance. Finally, we deployed an ECS Windows container instance and its requirements, such as security groups, a launch template, and an Auto Scaling group.

In Chapter 5, Deploying an EC2 Windows-Based Task, we will dive deep into ECS task definitions, gMSA support, and persistent storage. Then, we will deploy a Windows task using Terraform.
