3

Amazon ECS – Overview

In this chapter, we’ll learn about Amazon Elastic Container Service (ECS) and its relevant Windows components, such as container instances, services, tasks, and task definitions, and then deep-dive into task networking. Finally, we’ll deploy an empty Amazon ECS cluster using Terraform, which will be the first step in the building block exercise of configuring Amazon ECS entirely with Windows container instances and tasks.

We are going to cover the following main topics:

  • Amazon ECS – fundamentals
  • Amazon ECS – task networking
  • Deploying an Amazon ECS cluster with Terraform

This chapter will give us the fundamentals for Chapters 4, 5, and 6, which will deep-dive into the Windows specifics and deployments.

Technical requirements

For the Deploying an Amazon ECS cluster with Terraform section, you will need the following tools and expertise:

  • The AWS CLI
  • The Terraform CLI
  • An IAM user account with AmazonECS_FullAccess and IAMFullAccess managed policies attached
  • Terraform development expertise

To get access to the source code used in this chapter, access the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.

Important note

It is strongly recommended to use an AWS test account to perform the activities described in this book and never use it against your production environment.

Amazon ECS – fundamentals

Amazon ECS is a managed container orchestrator created by AWS that allows customers to run containers as tasks. A task is defined in a task definition, a kind of configuration blueprint in which you specify container configurations, such as the amount of vCPU, memory, network ports, and more.

Amazon ECS comprises the following components:

  • Clusters
  • Container instances
  • Task definitions
  • Tasks
  • Services

Clusters are the logical grouping of tasks or services. Amazon ECS clusters are free of charge, and you only pay for the underlying infrastructure, such as Amazon EC2 Windows instances, Amazon EBS, Amazon CloudWatch, and so on. Figure 3.1 illustrates an empty Amazon ECS cluster and an existing VPC. When you deploy an empty Amazon ECS cluster, no resources are created inside or outside the VPC.

A container instance is an Amazon EC2 instance that runs the ECS agent and is registered as a member of an ECS cluster. The ECS agent acts as an intermediary, handling the communication between the container instance and the ECS control plane. It also receives requests from the Amazon ECS cluster to start and stop tasks. It is a best practice to deploy container instances in the private subnets within a VPC:

Figure 3.1 – Amazon ECS cluster with two container instances

The task definition is a JSON-formatted text file that works as a blueprint for your application. First, you select the launch type compatibility (Fargate, EC2, or External, which is ECS Anywhere). Next, you set which operating system family (Windows or Linux) the task definition belongs to. Finally, you need to set the task size, which is the amount of vCPU and memory (GB) the containers inside the task can consume from the container instance.

The next step within a task definition is called the container definition, which you set to define how the container will behave, such as the container image name, ports to be exposed, vCPU and memory limits (MiB) to be consumed, health checks, storage, and logging:

Figure 3.2 – A task definition created, which the ECS cluster uses to launch a container

A task definition is immutable, and it can’t be edited once created. Therefore, if any parameter needs to be changed, a new revision must be created.

Important note

Configuring the task size is not supported for Windows containers; however, it must be specified. The amount of vCPU and memory (MiB) a Windows container can consume from the operating system is set in the container definition.
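
As a rough illustration of the task definition and container definition described above, the following Terraform sketch declares a Windows task definition for the EC2 launch type. The resource name, family, container name, image, and sizing values are placeholders chosen for this example and are not taken from the book's repository. The point is only to show where the operating system family, the task size, and the container-level vCPU and memory limits live:

```hcl
# Hypothetical example: every name, the image, and the sizes are illustrative assumptions.
resource "aws_ecs_task_definition" "example_windows_task" {
  family                   = "example-windows-task"
  requires_compatibilities = ["EC2"]

  # Task size, declared at the task level.
  cpu    = "1024"
  memory = "2048"

  # Operating system family the task definition belongs to.
  runtime_platform {
    operating_system_family = "WINDOWS_SERVER_2019_CORE"
    cpu_architecture        = "X86_64"
  }

  # Container definition: image plus container-level limits.
  container_definitions = jsonencode([
    {
      name      = "example-iis"
      image     = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
      essential = true
      cpu       = 1024 # CPU units (1024 units = 1 vCPU)
      memory    = 2048 # hard memory limit in MiB
    }
  ])
}
```

Because a task definition is immutable, changing any of these values and applying again should register a new revision rather than edit the existing one.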

A task is the instantiation of a task definition as one or more running containers. A task can run standalone or as part of a service.

A standalone task usually runs as a short-lived container that starts, does some processing, and shuts down – for instance, a job application. As part of a service, tasks usually run as long-running containers – for instance, an ASP.NET web application:

Figure 3.3 – Task definition instantiation through a task

Services run and maintain the desired number of tasks in an Amazon ECS cluster at all times. They do this using a component called the Amazon ECS service scheduler, which is responsible for launching new tasks or replacing existing ones. There are two service scheduler strategies available:

  • DAEMON deploys exactly one task on each active container instance that meets the task placement constraints. It is a good strategy if the Amazon ECS cluster is fully dedicated to a specific application or the daemon tasks need to be prioritized, being the first tasks to launch and the last to stop
  • REPLICA places and maintains the desired number of tasks across the cluster. By default, it spreads tasks evenly across Availability Zones, choosing container instances in the Amazon ECS cluster that support the task definition parameters, such as CPU, memory, ports, and container instance attributes
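
To make the two strategies more concrete, here is a minimal sketch of a service that uses the REPLICA strategy in Terraform. The variable names, service name, and desired count are hypothetical. The cluster and task definition are passed in as variables so the snippet does not depend on any other resource:

```hcl
# Hypothetical inputs: the ARNs are assumed to point at an existing cluster
# and an existing task definition.
variable "replica_example_cluster_arn" {
  type        = string
  description = "ARN of the ECS cluster that hosts the service"
}

variable "replica_example_task_definition_arn" {
  type        = string
  description = "ARN of the task definition the service launches"
}

resource "aws_ecs_service" "example_replica_service" {
  name            = "example-replica-service"
  cluster         = var.replica_example_cluster_arn
  task_definition = var.replica_example_task_definition_arn

  # REPLICA keeps desired_count tasks running. With DAEMON, desired_count
  # would be omitted because the scheduler decides the task count.
  scheduling_strategy = "REPLICA"
  desired_count       = 2

  # Spread the tasks across Availability Zones explicitly.
  ordered_placement_strategy {
    type  = "spread"
    field = "attribute:ecs.availability-zone"
  }
}
```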

The following figure illustrates a service using the REPLICA strategy. Two tasks were deployed, each with two Windows containers, spread across different Availability Zones:

Figure 3.4 – Two tasks scheduled by replica mode through services

In this section, we learned about the Amazon ECS core components, including how the ECS control plane communicates with the container instances through the ECS agent, then how ECS uses task definitions to describe a Windows container and its resources. Finally, we learned how ECS uses tasks and services to deploy Windows containers into container instances.

Amazon ECS – task networking

For Windows, task networking is limited to two modes, default and awsvpc.

The default uses Docker’s built-in virtual network, a Network Address Translation (NAT) mode on Windows. In the default mode, Docker Engine is responsible for creating and managing the host network on Windows, which is built on top of a Hyper-V virtual switch (vSwitch). That doesn’t mean the Hyper-V hypervisor role is installed; instead, it only uses networking capabilities. Each Windows container is connected to the Hyper-V vSwitch using a virtual network interface card (vNIC):

Figure 3.5 – The Docker network and Windows adapters

A simple north-south traffic flow would be as follows:

  1. Multiple Windows containers run within a standalone task with dynamic ports enabled.
  2. The data packet is sent to the vNIC attached to the Windows container.
  3. The data packet is sent to the vSwitch, and a Windows Network Address Translation (WinNAT) port forwarding rule is created.
  4. The data packet is routed to the VPC through the Windows container instance's Elastic Network Interface (ENI).

This is depicted in the following figure:

Figure 3.6 – North-south traffic from the Windows task to the VPC

The default mode is straightforward to use. Because all tasks share the single ENI of the Windows container instance, the number of tasks using dynamic ports that can run on a single Windows container instance is bounded by the number of ephemeral ports available in the 49153-65535 and 32768-61000 ranges. Tasks also launch faster because there is no need to call AWS APIs to create and attach an ENI.

Important note

To enable a task to use dynamic ports, set Host port = 0 as part of the container definition.
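
The note above can be sketched in Terraform as a port mapping with hostPort set to 0. This is only an illustration under a few assumptions: the resource name, family, container name, image, and sizes are placeholders, and network_mode is left unset on the assumption that this is what selects the default mode for a Windows task:

```hcl
# Hypothetical example: names, image, and sizes are illustrative assumptions.
resource "aws_ecs_task_definition" "example_dynamic_port_task" {
  family                   = "example-dynamic-port-task"
  requires_compatibilities = ["EC2"]

  # network_mode is intentionally not set (assumed to result in the default mode).

  runtime_platform {
    operating_system_family = "WINDOWS_SERVER_2019_CORE"
    cpu_architecture        = "X86_64"
  }

  container_definitions = jsonencode([
    {
      name      = "example-web"
      image     = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
      essential = true
      cpu       = 512
      memory    = 1024
      portMappings = [
        {
          containerPort = 80
          hostPort      = 0 # 0 requests a dynamic host port from the ephemeral range
          protocol      = "tcp"
        }
      ]
    }
  ])
}
```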

On the other hand, there are two drawbacks:

  • All tasks share the same host network namespace
  • Inbound/outbound traffic is controlled at the container instance ENI level, meaning all the containers running on the Windows container instance share the same network policies

A good use case for the default mode is for short-lived containers that need to start up fast, process some data, and shut down.

Meanwhile, awsvpc tasks allocate their own ENI and a private IPv4 address from within the VPC that the Windows container is deployed to. Under the hood, a secondary ENI is attached to the Windows container instance for each task in its network namespace:

Figure 3.7 – One vNIC and ENI per task in awsvpc mode

When awsvpc mode is set, the ECS agent creates an additional pause container within the task before starting the Windows container in order to set up a new network namespace, and attach an ENI and Hyper-V vNIC by running the amazon-ecs-cni plugin. Once completed, the ECS agent starts the Windows container within the task and plugs the Hyper-V vNIC into it:

Figure 3.8 – Task networking in awsvpc mode

The awsvpc mode offers better network performance because each task has its own ENI, so traffic from different tasks does not share a single ENI as it does in the default mode. However, two drawbacks need to be taken into account:

  • The number of tasks per container instance is limited by the number of secondary interfaces the EC2 instance type supports, drastically reducing the task density compared with the default mode
  • Tasks may take longer to launch and terminate because the ECS control plane must handle ENI creation, attachment, detachment, and termination
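
For comparison with the default mode, the following sketch shows how awsvpc mode might be expressed in Terraform: the task definition sets network_mode to awsvpc, and the service supplies the subnets and security groups for the task ENI. All variable and resource names, the image, and the sizes are hypothetical, and any extra container instance configuration that awsvpc mode may require on Windows is out of scope here:

```hcl
# Hypothetical inputs, assumed to reference existing resources.
variable "awsvpc_example_cluster_arn" {
  type = string
}

variable "awsvpc_example_subnet_ids" {
  type = list(string)
}

variable "awsvpc_example_security_group_ids" {
  type = list(string)
}

resource "aws_ecs_task_definition" "example_awsvpc_task" {
  family                   = "example-awsvpc-task"
  requires_compatibilities = ["EC2"]
  network_mode             = "awsvpc" # each task gets its own ENI

  runtime_platform {
    operating_system_family = "WINDOWS_SERVER_2019_CORE"
    cpu_architecture        = "X86_64"
  }

  container_definitions = jsonencode([
    {
      name      = "example-web"
      image     = "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019"
      essential = true
      cpu       = 512
      memory    = 1024
      # No hostPort: the container port is reached on the task ENI's address.
      portMappings = [{ containerPort = 80, protocol = "tcp" }]
    }
  ])
}

resource "aws_ecs_service" "example_awsvpc_service" {
  name            = "example-awsvpc-service"
  cluster         = var.awsvpc_example_cluster_arn
  task_definition = aws_ecs_task_definition.example_awsvpc_task.arn
  desired_count   = 1

  # Network settings applied to the task ENI, not to the container instance.
  network_configuration {
    subnets         = var.awsvpc_example_subnet_ids
    security_groups = var.awsvpc_example_security_group_ids
  }
}
```

Because the security groups are attached per task ENI, each service can carry its own network policy, which the default mode cannot offer.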

In this section, we dove deep into the internals of how Amazon ECS handles task networking on Windows containers. First, we learned how ECS uses the Docker default network through NAT (default) mode; then, we learned how awsvpc mode enables Windows tasks to have a dedicated EC2 ENI by using the amazon-ecs-cni plugin.

Deploying an Amazon ECS cluster with Terraform

This is the first deployment topic we will go over, and it is essential to understand how this will work. I believe that filling up pages with code doesn’t make much sense; instead, the complete code is available on GitHub per chapter, and in the book, we will use code snippets to illustrate each step.

Terraform offers many string functions and expressions, which can be hard to follow at first glance. Therefore, I will try to keep the code as simple as possible so that you can follow it easily, whether you are a Terraform beginner or an advanced developer.

Decoupling your Terraform code into modules is a typical pattern used in Terraform. For example, customers usually create reusable standalone modules to deploy security groups, ELB, EC2, and so on and then merge them into the root module. To keep the code simple and avoid many inter-module dependencies, I have described the entire Amazon ECS deployment and its components, such as IAM roles, instance profiles, ELB, EC2 instances, and so on, in one main.tf file, so you can easily find the code snippet and learn from it.

Important note

This book is not meant to teach you Terraform. Work experience with Terraform is a requirement to go over the deployment topics in the book. The code in the book is given as snippets from main.tf, which is available in the GitHub repository.

Why Terraform?

Terraform is an open source Infrastructure-as-Code (IaC) tool created by HashiCorp; customers of all sizes and segments use Terraform to standardize their AWS environments. It uses a native syntax called HCL, which is easy to read and learn. I mainly chose to write this book and use Terraform for the deployment topics because almost all customers I’ve worked with used Terraform as their IaC tool.

HashiCorp officially publishes an AWS provider in the Terraform Registry, which can be accessed at the following URL: https://registry.terraform.io/namespaces/hashicorp. AWS employees and the community work together to make sure the provider keeps up with the pace of innovation and the service launches AWS delivers throughout the year. On the provider web page, you will find everything you need to deploy AWS services using Terraform. It is very well curated, documented, and rich in examples. In this first deployment building block, we will deploy the following:

  • IAM roles and instance profiles
  • An Amazon ECS cluster
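
Before any of these resources can be created, Terraform needs to know which provider to download and which region to target. A minimal sketch of that configuration could look like the following. The version constraint and the region are assumptions made for illustration, so adjust them to what you have tested in your own account:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # assumed constraint, not taken from the book's repository
    }
  }
}

provider "aws" {
  region = "us-east-1" # placeholder region
}
```

Credentials are expected to come from the AWS CLI configuration listed in the technical requirements, so none are hardcoded here.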

IAM roles and instance profiles

Before deploying the cluster, we want to ensure that the necessary IAM roles and instance profiles are in place so the cluster components can successfully work.

We will start by creating ecsTaskExecutionRole. This role grants the ECS agent permission to make AWS API calls on your behalf when it launches a task. The most common calls are as follows:

  • Pulling a container image from the Amazon ECR private repository
  • Using awslogs to send containers logs to CloudWatch Logs

AWS already provides a managed policy named AmazonECSTaskExecutionRolePolicy, which contains the permissions for the use cases described previously. The policy has the following effect and actions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

The following Terraform code snippet performs these actions:

  • Create an IAM role named ecsTaskExecutionRole
  • Attach the AmazonECSTaskExecutionRolePolicy IAM managed policy to the ecsTaskExecutionRole role
  • Create an assume role policy that allows the ECS tasks service principal (ecs-tasks.amazonaws.com) to assume the role

Here is the snippet:

resource "aws_iam_role" "ecsTaskExecutionRole" {
  name                = "ecsTaskExecutionRole"
  path                = "/"
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"]
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Sid    = ""
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      },
    ]
  })
}

In the next step, we will create ecsInstanceRole. The container instance, and the ECS agent running on it, uses this role to call AWS APIs on your behalf, for example, to register the instance with the cluster. AWS provides AmazonEC2ContainerServiceforEC2Role as a managed IAM policy. The policy has the following effect and actions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeTags",
                "ecs:CreateCluster",
                "ecs:DeregisterContainerInstance",
                "ecs:DiscoverPollEndpoint",
                "ecs:Poll",
                "ecs:RegisterContainerInstance",
                "ecs:StartTelemetrySession",
                "ecs:UpdateContainerInstancesState",
                "ecs:Submit*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

The following Terraform code snippet performs these actions:

  • Create an IAM role named ecsInstanceRole
  • Attach the AmazonEC2ContainerServiceforEC2Role IAM managed policy to the ecsInstanceRole role
  • Create an assume role policy that allows the Amazon EC2 service principal (ec2.amazonaws.com) to assume the role

Let us check the snippet:

resource "aws_iam_role" "ecsInstanceRole" {
  name                = "ecsInstanceRole"
  path                = "/"
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"]
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Sid    = ""
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      },
    ]
  })
}

We need to add the ecsInstanceRole we just created to an instance profile. The instance profile is responsible for passing an IAM role to an EC2 instance.

We set the role argument to reference the role resource created in the prior step, role = aws_iam_role.ecsInstanceRole.name:

resource "aws_iam_instance_profile" "ecs_windows_ecsInstanceRole_profile" {
  name = "ecs_windows_ecsInstanceRole_profile"
  role = aws_iam_role.ecsInstanceRole.name
}
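
As an alternative to the managed_policy_arns argument used in the two role snippets, the managed policies can be attached with standalone attachment resources. To the best of my knowledge, recent AWS provider releases flag managed_policy_arns as deprecated, so treat the following as a sketch of the other style rather than the code in the repository. The attachment resource names are hypothetical. If you adopt this style, remove managed_policy_arns from both roles so the two mechanisms do not conflict:

```hcl
# Hypothetical resource names; the role references and policy ARNs come from the snippets above.
resource "aws_iam_role_policy_attachment" "example_task_execution_attachment" {
  role       = aws_iam_role.ecsTaskExecutionRole.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role_policy_attachment" "example_instance_role_attachment" {
  role       = aws_iam_role.ecsInstanceRole.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
```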

Amazon ECS clusters

Finally, we will deploy the Amazon ECS cluster. As I mentioned in the beginning, this will start as an empty cluster, so we can use it as a building block for the upcoming chapters. The cluster will have Container Insights enabled to collect task resource consumption metrics through the ECS agent:

resource "aws_ecs_cluster" "ecs_windows_cluster" {
  name = "ecs-cluster"
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}
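
To make the deployed resources easier to check after an apply, a few output blocks can be added next to the resources. This is an optional sketch, and the output names are hypothetical. The referenced resources are the ones defined in this chapter:

```hcl
# Hypothetical output names; the values reference resources defined in this chapter.
output "example_ecs_cluster_arn" {
  description = "ARN of the empty ECS cluster"
  value       = aws_ecs_cluster.ecs_windows_cluster.arn
}

output "example_task_execution_role_arn" {
  description = "ARN of the task execution role"
  value       = aws_iam_role.ecsTaskExecutionRole.arn
}

output "example_instance_profile_name" {
  description = "Name of the instance profile that wraps ecsInstanceRole"
  value       = aws_iam_instance_profile.ecs_windows_ecsInstanceRole_profile.name
}
```

Terraform prints these values at the end of terraform apply, and they can be listed again later with terraform output.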

Deploying the AWS services

So far, we have walked through the code snippets that deploy the IAM roles and an Amazon ECS cluster. If you want to test the deployment, access the following GitHub repository: https://github.com/PacktPublishing/Running-Windows-Containers-on-AWS/tree/main/ecs-ec2-windows.

There is a file named chapter03.tf that contains all the resources covered in the chapter in a single TF file. Rename it main.tf and run the following:

terraform init
terraform apply

Check the AWS services and resources created in your account, then destroy (remove) them by running the following command:

terraform destroy

In Chapter 4, Deploying a Windows Container Instance, we will recreate the cluster with Windows container instances.

In this section, we got our hands dirty by coding in Terraform HCL. We learned about the necessary components for deploying an Amazon ECS cluster and its dependencies, such as IAM roles and policies, from code snippets.

Summary

In this chapter, we learned about Amazon ECS fundamentals and how its components are related; then, we delved into ECS task networking and the options available for Windows containers. Finally, we explored code snippets that deployed the necessary IAM roles and instance profiles used by the ECS Windows container instances and an empty ECS cluster.

In Chapter 4, Deploying a Windows Container Instance, we will learn about Amazon-optimized Windows AMIs, Auto Scaling groups, and capacity providers, and launch a Windows container instance inside the ECS cluster we deployed in this chapter.
