Chapter 9
Monitoring and Metrics

THE AWS CERTIFIED SYSOPS ADMINISTRATOR - ASSOCIATE EXAM TOPICS COVERED IN THIS CHAPTER MAY INCLUDE, BUT ARE NOT LIMITED TO, THE FOLLOWING:

  • Domain 1.0 Monitoring and Metrics
  • 1.1 Demonstrate ability to monitor availability and performance
  • 1.2 Demonstrate ability to monitor and manage billing and cost optimization processes
  • Domain 3.0 Analysis
  • 3.1 Optimize the environment to ensure maximum performance
  • 3.2 Identify performance bottlenecks and implement remedies
  • 3.3 Identify potential issues on a given application deployment
  • Domain 6.0 Security
  • 6.2 Ensure data integrity and access controls when using the AWS platform
  • 6.4 Demonstrate ability to prepare for security assessment use of AWS


Introduction to Monitoring and Metrics

This chapter covers the collection of metrics from Amazon CloudWatch which, in conjunction with other AWS Cloud services, can assist in the management, deployment, and optimization of workloads in the cloud.

In addition to performance monitoring, services such as Amazon CloudWatch Logs, AWS Config, AWS Trusted Advisor, and AWS CloudTrail can provide a detailed inventory of provisioned resources for security audits and financial accounting.

Amazon CloudWatch was designed to monitor cloud-based computing. Even so, when using Amazon CloudWatch Logs, systems in an existing, on-premises datacenter can send log information to Amazon CloudWatch for monitoring.

Sometimes, it is important to monitor and view the health of AWS in general. To do this, there are two tools: the AWS Service Health Dashboard and the AWS Personal Health Dashboard. These services display the general status of AWS and provide a personalized view of the performance and availability of provisioned resources.

This chapter covers these topics separately and also highlights how they can work together to maintain a robust environment on AWS.

An Overview of Monitoring

Everything fails, all the time.

—Werner Vogels

Computing systems are incredibly complex. Troubleshooting them effectively requires easily understood data delivered in real time. Service Level Agreements (SLAs) often require high levels of availability, and a lack of meaningful information can lead to lost time and revenue.

Why Monitor?

Monitoring provides several major benefits:

  • Monitoring enables systems operators to catch issues before they become problems. This, in turn, maintains high availability and delivers high-quality customer service.
  • Monitoring provides tools for making informed decisions about capacity planning.
  • Monitoring is an input mechanism for automation.
  • Monitoring provides visibility into the cost, utilization, and security of computing resources.

Monitoring is the process of observing and recording resource utilization in real time. Alarms are notifications, based on this information, that are sent in response to a predefined condition. Frequently, this condition involves failure. Alarms can also be configured to send notifications when resources are being underutilized and money is being wasted.

Traditional monitoring tools have been designed around on-premises datacenters with the idea that servers are going to be in place for an extended amount of time. Because of this, these tools have difficulty distinguishing between an Amazon Elastic Compute Cloud (Amazon EC2) instance that has failed and one that was terminated purposely. AWS created its own monitoring service, Amazon CloudWatch, that is integrated with other AWS Cloud services and uses AWS Identity and Access Management (IAM) to keep monitoring data secure.

Amazon CloudWatch

Amazon CloudWatch is a service that monitors the health and status of AWS resources in real time. It provides system-wide visibility into resource utilization, application performance, and operational health by tracking, measuring, reporting, alerting, and reacting to events that occur in an environment.

Amazon CloudWatch Logs collects and monitors log files, can set alarms, and automatically reacts to changes in AWS resources. Logs can be monitored in real time or stored for analysis.

Amazon CloudWatch Alarms monitor a single metric and perform one or more actions based on customer-defined criteria.

Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. These streams are delivered to Amazon EC2 instances, AWS Lambda functions, Amazon Kinesis Streams, Amazon EC2 Container Service (Amazon ECS) tasks, AWS Step Functions state machines, Amazon Simple Notification Service (Amazon SNS) topics, Amazon Simple Queue Service (Amazon SQS) queues, or built-in targets.

AWS CloudTrail

AWS CloudTrail records Application Programming Interface (API) calls made in an account, including calls made to the Amazon CloudWatch Events API. This includes calls made by the AWS Management Console, the AWS Command Line Interface (AWS CLI), and other AWS Cloud services. When AWS CloudTrail logging is turned on, these events are written as log files to an Amazon Simple Storage Service (Amazon S3) bucket.

AWS Config

AWS Config provides a detailed view of the configuration of AWS resources in an AWS account, including how the resources are related to one another. It also provides historical information to show how configurations and relationships have changed over time. Related to monitoring, AWS Config allows customers to create rules that check the configuration of their AWS resources and check for compliance with an organization’s policies. When an AWS Config rule is triggered, it generates an event that can be captured by Amazon CloudWatch Events.

Amazon CloudWatch can monitor AWS resources, such as Amazon EC2 instances, Amazon DynamoDB tables, Amazon Relational Database Service (Amazon RDS) DB instances, custom metrics generated by applications and services, and log files generated by applications and operating systems.

AWS Trusted Advisor

AWS Trusted Advisor is an online resource designed to help reduce cost, increase performance, and improve security by optimizing an AWS environment. It provides real-time guidance to help provision resources following AWS best practices.

AWS Trusted Advisor checks for best practices in four categories:

  • Cost Optimization
  • Security
  • Fault Tolerance
  • Performance Improvement

The following four AWS Trusted Advisor checks are available at no charge to all AWS customers to help improve security and performance:

  • Service Limits
  • Security Groups – Specific Ports Unrestricted
  • IAM Use
  • Multi-Factor Authentication (MFA) on the Root Account

AWS Service Health Dashboard

The AWS Service Health Dashboard provides access to current status and historical data about every AWS Cloud service. If there’s a problem with a service, it is possible to expand the appropriate line in the details section to get more information.

In addition to the dashboard, it is also possible to subscribe to the RSS feed for any service.

For anyone experiencing a real-time operational issue with one of the AWS Cloud services currently reporting as being healthy, there is a Contact Us link at the top of the page to report an issue.

The AWS Service Health Dashboard is available at http://status.aws.amazon.com/.


AWS Personal Health Dashboard

The AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that impact customers. While the AWS Service Health Dashboard displays the general status of AWS Cloud services, the AWS Personal Health Dashboard provides a personalized view into the performance and availability of the AWS Cloud services underlying provisioned AWS resources.

The dashboard displays relevant and timely information to help manage events in progress and provides proactive notification to help plan scheduled activities. Alerts are automatically triggered by changes in the health of AWS resources and provide event visibility and also guidance to help diagnose and resolve issues quickly.

Now let’s take a deep dive into each of these services.

Amazon CloudWatch

Amazon CloudWatch monitors, in real time, AWS resources and applications running on AWS. Amazon CloudWatch is used to collect and track metrics, which are variables used to measure resources and applications. Amazon CloudWatch Alarms send notifications and can automatically make changes to the resources being monitored based on user-defined rules. Amazon CloudWatch is basically a metrics repository. An AWS product such as Amazon EC2 puts metrics into the repository and customers retrieve statistics based on those metrics. Additionally, custom metrics can be placed into Amazon CloudWatch for reporting and statistical analysis.

For example, it is possible to monitor the CPU usage and disk reads and writes of Amazon EC2 instances. With this information, it is possible to determine when additional instances should be launched to handle the increased load. Additionally, these new instances can be launched automatically before there is a problem, eliminating the need for human intervention. Conversely, monitoring data can be used to stop underutilized instances automatically in order to save money.

In addition to monitoring the built-in metrics that come with AWS, it is possible to create, monitor, and trigger actions using custom metrics. Amazon CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health. Figure 9.1 illustrates how Amazon CloudWatch connects to both AWS and on-premises environments.


FIGURE 9.1 Amazon CloudWatch integration

Amazon CloudWatch can be accessed using the following methods:

  • The Amazon CloudWatch console
  • The AWS Command Line Interface (AWS CLI)
  • The Amazon CloudWatch API
  • The AWS Software Development Kits (SDKs)

Metrics

At the core of Amazon CloudWatch are metrics, which are time-ordered sets of data points that contain information about the performance of resources. By default, several services provide some metrics at no additional charge. These metrics include information from Amazon EC2 instances, Amazon Elastic Block Store (Amazon EBS) volumes, and Amazon RDS DB instances. It is also possible to enable detailed monitoring for some resources, such as Amazon EC2 instances.



Custom Metrics

In addition to monitoring AWS resources, Amazon CloudWatch can be used to monitor data produced from applications, scripts, and services. A custom metric is any metric provided to Amazon CloudWatch via an agent or an API.

Custom metrics can be used to monitor the time it takes to load a web page, capture request error rates, monitor the number of processes or threads on an instance, or track the amount of work performed by an application.

Ways to create custom metrics include the following:

  • The PutMetricData API.
  • AWS-provided sample monitoring scripts for Windows and Linux.
  • The Amazon CloudWatch collectd plugin.
  • Applications and tools offered by AWS Partner Network (APN) partners.

Custom metrics come at an additional cost based on usage.
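
For example, the first option in the list above, the PutMetricData API, can be called from the AWS SDK for Python (boto3). The following is a minimal sketch; the namespace, metric name, dimension, and value are assumptions chosen purely for illustration.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish a single data point for a hypothetical custom metric.
cloudwatch.put_metric_data(
    Namespace="MyApp/Performance",          # custom namespace (assumption)
    MetricData=[
        {
            "MetricName": "PageLoadTime",   # custom metric name (assumption)
            "Dimensions": [
                {"Name": "Environment", "Value": "Production"}
            ],
            "Value": 0.87,                  # seconds taken to load the page
            "Unit": "Seconds",
        }
    ],
)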


Amazon CloudWatch Metrics Retention

In November 2016, Amazon CloudWatch changed the length of time metrics are stored inside the service as follows:

  • One-minute data points are available for 15 days.
  • Five-minute data points are available for 63 days.
  • One-hour data points are available for 455 days.

If metrics need to be available for longer than those periods, they can be archived using the GetMetricStatistics API call.

Metrics cannot be deleted. They automatically expire after 15 months if no new data is published to them.
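
As a rough sketch of that archiving approach, the following boto3 call retrieves hourly data points for an instance so that they can be copied to longer-term storage before they expire; the instance ID is a placeholder.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Pull hourly CPU statistics for the last 14 days so they can be stored
# outside of Amazon CloudWatch (for example, in Amazon S3).
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-1234567890abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=3600,                 # one-hour data points
    Statistics=["Average", "Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])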



Namespaces

A namespace is a container for a collection of Amazon CloudWatch metrics. Each namespace is isolated from other namespaces. This isolation ensures that data collected is only from services specified and prevents different applications from mistakenly aggregating the same statistics.

There are no default namespaces. When creating a custom metric, a namespace is required. If the specified namespace does not exist, Amazon CloudWatch will create it.

Namespace names must contain valid XML characters and be fewer than 256 characters in length.

Allowed characters in namespace names are as follows:

  • Alphanumeric characters (0-9, A-Z, a-z)
  • Period (.)
  • Hyphen (-)
  • Underscore (_)
  • Forward Slash (/)
  • Hash (#)
  • Colon (:)

AWS namespaces use the following naming convention: AWS/service. For example, Amazon EC2 uses the AWS/EC2 namespace. Sample AWS Namespaces are shown in Table 9.1.


TABLE 9.1 A Small Sample of AWS Namespaces

AWS Product              Namespace
Auto Scaling             AWS/AutoScaling
Amazon EC2               AWS/EC2
Amazon EBS               AWS/EBS
Elastic Load Balancing   AWS/ELB (Classic Load Balancers)
Elastic Load Balancing   AWS/ApplicationELB (Application Load Balancers)

For a comprehensive list of namespaces, see http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-namespaces.html.

Dimensions

A dimension is a name/value pair that uniquely identifies a metric and further clarifies the metric data stored. A metric can have up to 10 dimensions.

Every metric has specific characteristics that describe it. Think of dimensions as categories or metadata for those characteristics. The categories can aid in the design of a structure for a statistics plan. Because dimensions are part of the unique identifier for a metric, whenever a unique name/value pair is added to a metric, a new metric is created.


Dimensions can be used to filter the results from Amazon CloudWatch. For example, it is possible to get statistics for a specific Amazon EC2 instance by specifying the instance ID dimension when doing a search.


Dimension Combinations

Amazon CloudWatch treats each unique combination of dimensions as a separate metric, even if the metrics use the same metric name. It is not possible to retrieve statistics using combinations of dimensions that have not been specifically published. When retrieving statistics, specify the same values for the namespace, metric name, and dimension parameters that were used when the metrics were created. The start and end times can be specified for Amazon CloudWatch to use for aggregation.

To illustrate, here are four distinct metrics named ServerStats in the DataCenterMetric namespace that have the following properties:

Dimensions: Server=Prod, Domain=Titusville, Unit: Count, Timestamp: 2017-05-18T12:30:00Z, Value: 105
 
Dimensions: Server=Test, Domain=Titusville, Unit: Count, Timestamp: 2017-05-18T12:31:00Z, Value: 115
 
Dimensions: Server=Prod, Domain=Rockets, Unit: Count, Timestamp: 2017-05-18T12:32:00Z, Value: 95
 
Dimensions: Server=Test, Domain=Rockets, Unit: Count, Timestamp: 2017-05-18T12:33:00Z, Value: 97

If those four metrics are the only ones that have been published, statistics can be retrieved for these combinations of dimensions:

  • Server=Prod,Domain=Titusville
  • Server=Prod,Domain=Rockets
  • Server=Test,Domain=Titusville
  • Server=Test,Domain=Rockets

It is not possible to retrieve statistics for the following dimensions, because those combinations were never published as metrics. Statistics can only be retrieved by specifying the same dimension combinations that were used when the data was published (see the sketch after this list):

  • Server=Prod
  • Server=Test
  • Domain=Titusville
  • Domain=Rockets
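
The following boto3 sketch mirrors the ServerStats example above: it publishes a data point with both the Server and Domain dimensions and then retrieves statistics by specifying that exact combination. A query on either dimension alone would return no data points.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish a data point carrying both the Server and Domain dimensions.
cloudwatch.put_metric_data(
    Namespace="DataCenterMetric",
    MetricData=[{
        "MetricName": "ServerStats",
        "Dimensions": [
            {"Name": "Server", "Value": "Prod"},
            {"Name": "Domain", "Value": "Titusville"},
        ],
        "Value": 105,
        "Unit": "Count",
    }],
)

# Retrieve statistics using the same dimension combination.
stats = cloudwatch.get_metric_statistics(
    Namespace="DataCenterMetric",
    MetricName="ServerStats",
    Dimensions=[
        {"Name": "Server", "Value": "Prod"},
        {"Name": "Domain", "Value": "Titusville"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum", "SampleCount"],
)
print(stats["Datapoints"])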


Statistics

Statistics are metric data aggregations over specified periods of time. Amazon CloudWatch provides statistics based on the metric data points provided by custom data or by other services in AWS to Amazon CloudWatch. Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure within the time period specified. Available CloudWatch statistics are provided in Table 9.2.

TABLE 9.2 Available CloudWatch Statistics

Statistic     Description
Minimum       The lowest value observed during the specified period. Use this value to determine low volumes of activity for an application.
Maximum       The highest value observed during the specified period. Use this value to determine high volumes of activity for an application.
Sum           All values submitted for the matching metric added together. This statistic can be useful for determining the total volume of a metric.
Average       The value of Sum/SampleCount during the specified period. By comparing this statistic with Minimum and Maximum, the full scope of a metric can be determined, and it is possible to discover how close average use is to the Minimum and Maximum.
SampleCount   The number of data points used for the statistical calculation.
pNN.NN        The value of the specified percentile. You can specify any percentile, using up to two decimal places.


Pre-calculated statistics can be added to Amazon CloudWatch. Instead of data point values, specify values for SampleCount, Minimum, Maximum, and Sum. Amazon CloudWatch calculates the average for you. Values added in this way are aggregated with any other values associated with the matching metric.
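
A minimal sketch of publishing such a pre-calculated statistic set with boto3 follows; the namespace, metric name, and values are invented for illustration.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish an aggregate of many observations as a single statistic set.
# Amazon CloudWatch derives the Average from Sum / SampleCount.
cloudwatch.put_metric_data(
    Namespace="MyApp/Performance",     # custom namespace (assumption)
    MetricData=[{
        "MetricName": "RequestLatency",
        "StatisticValues": {
            "SampleCount": 250,        # number of observations aggregated
            "Sum": 212.5,
            "Minimum": 0.2,
            "Maximum": 3.1,
        },
        "Unit": "Seconds",
    }],
)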

Units

Each statistic has a unit of measure. Example units include bytes, seconds, count, and percent.

A unit can be specified when creating a custom metric. If one is not specified, Amazon CloudWatch uses None as the unit. Units provide conceptual meaning to data.


Metric data points that specify a unit of measure are aggregated separately. When getting statistics without specifying a unit, Amazon CloudWatch aggregates all data points of the same unit together. If there are two identical metrics that have different units, two separate data streams are returned—one for each unit.

Periods

A period is the length of time associated with a specific Amazon CloudWatch statistic. Each statistic represents an aggregation of the metrics data collected for a specified period of time. Although periods are expressed in seconds, the minimum granularity for a period is one minute.

Because of this minimum granularity, period values are expressed as multiples of 60. By varying the length of the period, the data aggregation can be adjusted.


When retrieving statistics, specify a period, start time, and an end time. These parameters determine the overall length of time associated with the collected statistic.

Default values for the start time and end time return statistics from the past hour.

The values specified for the start time and end time determine how many periods Amazon CloudWatch will return.


Periods are also important for Amazon CloudWatch Alarms. When creating an alarm to monitor a specific metric, Amazon CloudWatch will compare that metric to a specified threshold value. Customers have extensive control over how Amazon CloudWatch makes comparisons. In addition to the period length, the number of evaluation periods can be specified as well. For example, if three evaluation periods are specified, Amazon CloudWatch compares a window of three data points. Amazon CloudWatch only sends a notification if the oldest data point is breaching and the others are breaching or are missing.


Aggregation

Amazon CloudWatch aggregates statistics according to the period length specified when retrieving statistics. When multiple data points are published with the same or similar timestamps, Amazon CloudWatch aggregates them by period length.


Data points for a metric that share the same timestamp, namespace, and dimension can be published, and Amazon CloudWatch will return aggregated statistics for them. It is also possible to publish multiple data points for the same or different metrics with any timestamp.

For large datasets, a pre-aggregated dataset called a statistic set can be inserted. With statistic sets, Amazon CloudWatch is given the Minimum, Maximum, Sum, and SampleCount for a number of data points. This approach is commonly used for data that needs to be collected many times in a minute.


Amazon CloudWatch doesn’t differentiate the source of a metric. If a metric is published with the same namespace and dimensions from different sources, Amazon CloudWatch treats it as a single metric. This can be useful for service metrics in a distributed, scaled system.


Dashboards

Amazon CloudWatch dashboards are customizable pages in the Amazon CloudWatch console that can be used to monitor resources in a single view. Monitored resources can be in a single region or multiple regions. Use Amazon CloudWatch dashboards to create customized views of the metrics and alarms for AWS resources.

With dashboards, it is possible to create the following:

  • A single view for selected metrics and alarms to help assess the health of resources and applications across one or more regions
  • An operational playbook that provides guidance for team members during operational events about how to respond to specific incidents
  • A common view of critical resource and application measurements that can be shared by team members for faster communication flow during operational events
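
Dashboards can also be created or updated programmatically with the PutDashboard API. The sketch below builds a single-widget dashboard that graphs CPUUtilization for one instance; the dashboard name, instance ID, and region are placeholders, and the widget definition is kept deliberately minimal.

import json
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# One metric widget graphing CPU utilization for a single instance.
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [
                    ["AWS/EC2", "CPUUtilization", "InstanceId", "i-1234567890abcdef0"]
                ],
                "period": 300,
                "stat": "Average",
                "region": "us-east-1",
                "title": "Web server CPU",
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="OperationsOverview",          # placeholder name
    DashboardBody=json.dumps(dashboard_body),
)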

Percentiles

A percentile indicates the relative standing of a value in a dataset and is often used to isolate anomalies. For example, the 95th percentile means that 95 percent of the data is below this value and 5 percent of the data is above this value. Percentiles help in a better understanding of the distribution of metric data.

Percentiles can be used with the following services:

  • Amazon EC2
  • Amazon RDS
  • Amazon Kinesis
  • Application Load Balancer
  • Elastic Load Balancing
  • Amazon API Gateway
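
When retrieving statistics with the GetMetricStatistics API, percentiles are requested through the ExtendedStatistics parameter rather than the Statistics parameter. A minimal sketch, assuming a Classic Load Balancer named my-load-balancer, follows.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Request the 95th and 99th percentile of ELB latency for the last hour.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ELB",
    MetricName="Latency",
    Dimensions=[{"Name": "LoadBalancerName", "Value": "my-load-balancer"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    ExtendedStatistics=["p95", "p99"],
)

for point in response["Datapoints"]:
    print(point["Timestamp"], point["ExtendedStatistics"])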

 


Monitoring Baselines

Monitoring is an important part of maintaining the reliability, availability, and performance of Amazon EC2 instances. Collect monitoring data from all of the parts of an AWS solution to be able to debug any multi-point failures easily.

In order to monitor an environment effectively, have a plan that answers the following questions:

  • What are the goals for monitoring?
  • What resources need to be monitored?
  • How often will these resources be monitored?
  • What monitoring tools will be used?
  • Who will perform the monitoring tasks?
  • Who should be notified when something goes wrong?

After the monitoring goals have been defined and the monitoring plan has been created, the next step is to establish a baseline for normal Amazon EC2 performance.

Measure Amazon EC2 performance at various times and under different load conditions. While monitoring Amazon EC2, store a history of collected monitoring data. Over time, the historical data can be compared to current data to identify normal performance patterns and also anomalies.

For example, if CPU utilization, disk I/O, and network utilization are being monitored for Amazon EC2 instances and performance falls outside of the established baseline, reconfigure or optimize the instance to reduce CPU utilization, improve disk I/O, or reduce network traffic (see Table 9.3).

Amazon EC2 Status Checks

Inside Amazon EC2, there are two types of status checks: a system status check and an instance status check.

TABLE 9.3 Establishing an Amazon EC2 Baseline

Item to Monitor                                  Amazon EC2 Metric
CPU utilization                                  CPUUtilization
Memory utilization                               Requires an agent
Memory used                                      Requires an agent
Memory available                                 Requires an agent
Network utilization                              NetworkIn, NetworkOut
Disk performance                                 DiskReadOps, DiskWriteOps
Swap utilization (Linux instances)               Requires an agent
Swap used (Linux instances)                      Requires an agent
Page File utilization (Windows instances only)   Requires an agent
Page File used (Windows instances only)          Requires an agent
Page File available (Windows instances only)     Requires an agent
Disk Reads/Writes                                DiskReadBytes, DiskWriteBytes
Disk Space utilization (Linux instances)         Requires an agent
Disk Space used (Linux instances)                Requires an agent
Disk Space available (Linux instances only)      Requires an agent

System Status Checks

System status checks monitor the AWS systems required to use your instance in order to ensure that they are working properly. These checks detect problems on the hardware that an instance is using.

When a system status check fails, there are three possible courses of action. One option is to wait for AWS to fix the issue. If the instance boots from an Amazon EBS volume, stopping and starting the instance will move it to new hardware. If the instance uses an instance store volume, terminate it and launch a replacement to put the workload on new hardware.

The following are examples of problems that can cause system status checks to fail:

  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host that impact network reachability

Instance Status Checks

Instance status checks monitor the software and network configuration of individual instances. These checks detect problems that require user involvement to repair. When an instance status check fails, it can often be fixed with a reboot or by reconfiguring the Amazon EC2 instance.

The following are examples of problems that can cause instance status checks to fail:

  • Failed system status checks
  • Incorrect networking or startup configuration
  • Exhausted memory
  • Corrupted file system
  • Incompatible kernel
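
Both system and instance status checks can also be read programmatically. The following boto3 sketch prints the current check results for a single instance; the instance ID is a placeholder.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# IncludeAllInstances=True also returns instances that are not running.
response = ec2.describe_instance_status(
    InstanceIds=["i-1234567890abcdef0"],   # placeholder
    IncludeAllInstances=True,
)

for status in response["InstanceStatuses"]:
    print("Instance:", status["InstanceId"])
    print("  System status:  ", status["SystemStatus"]["Status"])
    print("  Instance status:", status["InstanceStatus"]["Status"])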

 


Authentication and Access Control

Access to Amazon CloudWatch requires credentials. Those credentials must have permissions to access AWS resources, such as viewing the Amazon CloudWatch console or retrieving Amazon CloudWatch metric data.

Every AWS resource is owned by an AWS account, and permissions to create or access a resource are governed by permissions policies. An account administrator can attach permissions policies to IAM identities: users, groups, and roles.

Permissions Required to Use the Amazon CloudWatch Console

For a user to work with the Amazon CloudWatch console, there is a minimum set of permissions required to allow that user to describe other AWS resources in the AWS account.

The Amazon CloudWatch management console requires permissions from the following services:

  • Auto Scaling
  • AWS CloudTrail
  • Amazon CloudWatch
  • Amazon CloudWatch Events
  • Amazon CloudWatch Logs
  • Amazon EC2
  • Amazon Elasticsearch Service
  • IAM
  • Amazon Kinesis
  • AWS Lambda
  • Amazon S3
  • Amazon SNS
  • Amazon SQS
  • Amazon Simple Workflow Service (Amazon SWF)

 


AWS Managed Policies for Amazon CloudWatch

AWS provides standalone IAM policies that cover many common use cases. These policies are created and administered by AWS and grant the required permissions for services without customers having to create and maintain their own.


Customers create their own IAM policies to allow permissions for Amazon CloudWatch actions and resources. Attach these custom policies to the IAM users or groups that require those permissions.


Amazon CloudWatch Resources and Operations

Amazon CloudWatch has no specific resources that can be controlled. As a result, there are no Amazon CloudWatch Amazon Resource Names (ARNs) to use in an IAM policy.

For example, it is not possible to give a user access to Amazon CloudWatch data for a specific set of Amazon EC2 instances or a specific load balancer.


When writing an IAM policy, use an asterisk (*) as the resource name to control access to Amazon CloudWatch actions.
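
A minimal sketch of such a policy, created with boto3, follows. It grants a small set of read-only CloudWatch actions against all resources; the policy name and the specific actions chosen are assumptions for illustration.

import json
import boto3

iam = boto3.client("iam")

# Because CloudWatch exposes no ARNs of its own, the Resource element is "*".
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics",
                "cloudwatch:DescribeAlarms",
            ],
            "Resource": "*",
        }
    ],
}

iam.create_policy(
    PolicyName="CloudWatchReadOnlyMetrics",          # placeholder name
    PolicyDocument=json.dumps(policy_document),
)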

AWS Cloud Services Integration

The following AWS Cloud services support Amazon CloudWatch without additional charges. Customers have the option to choose which of the preselected metrics they want to use.

Auto Scaling Groups Seven preselected metrics at a one-minute frequency

Elastic Load Balancing Thirteen preselected metrics at a one-minute frequency

Amazon Route 53 health checks One preselected metric at a one-minute frequency

Amazon EBS Provisioned IOPS (Solid State Drive [SSD]) volumes Ten preselected metrics at a one-minute frequency

Amazon EBS General Purpose (SSD) volumes Ten preselected metrics at a one-minute frequency

Amazon EBS Magnetic volumes Eight preselected metrics at a five-minute frequency

AWS Storage Gateway Eleven preselected gateway metrics and five preselected storage volume metrics at a five-minute frequency

Amazon CloudFront Six preselected metrics at a one-minute frequency

Amazon DynamoDB tables Seven preselected metrics at a five-minute frequency

Amazon ElastiCache nodes Thirty-nine preselected metrics at a one-minute frequency

Amazon RDS DB instances Fourteen preselected metrics at a one-minute frequency

Amazon EMR job flows Twenty-six preselected metrics at a five-minute frequency

Amazon Redshift Sixteen preselected metrics at a one-minute frequency

Amazon SNS topics Four preselected metrics at a five-minute frequency

Amazon SQS queues Eight preselected metrics at a five-minute frequency

AWS OpsWorks Fifteen preselected metrics at a one-minute frequency

Amazon CloudWatch Logs Six preselected metrics at one-minute frequency

Estimated charges on your AWS bill It is also possible to enable metrics to monitor AWS charges. The number of metrics depends on the AWS products and services used. These metrics are offered at no additional charge.

Amazon CloudWatch Limits

Table 9.4 lists Amazon CloudWatch limits.

TABLE 9.4 Amazon CloudWatch Limits

Resource                          Default Limit
Actions                           5/alarm. This limit cannot be changed.
Alarms                            10/month/customer for no additional charge; 5,000 per region per account.
API requests                      1,000,000/month/customer for no additional charge.
Custom metrics                    No limit.
DescribeAlarms                    3 Transactions per Second (TPS). This is the maximum number of operation requests that can be made per second without being throttled. A limit increase can be requested.
Dimensions                        10/metric. This limit cannot be changed.
GetMetricStatistics               400 TPS. This is the maximum number of operation requests per second before being throttled. A limit increase can be requested.
ListMetrics                       25 TPS. This is the maximum number of operation requests per second before being throttled. A limit increase can be requested.
Metric data                       15 months. This limit cannot be changed.
MetricDatum items                 20/PutMetricData request. A MetricDatum object can contain a single value or a StatisticSet object representing many values. This limit cannot be changed.
Metrics                           10/month/customer for no additional charge.
Period                            One day (86,400 seconds). This limit cannot be changed.
PutMetricAlarm request            3 TPS. The maximum number of operation requests you can make per second without being throttled. A limit increase can be requested.
PutMetricData request             40 KB for HTTP POST requests; 150 TPS. The maximum number of operation requests that you can make per second without being throttled. A limit increase can be requested.
Amazon SNS email notifications    1,000/month/customer for no additional charge.

Amazon CloudWatch Alarms

Amazon CloudWatch Alarms are used to initiate an action automatically in response to a predefined condition. An alarm watches a single metric over a specified time period and, based on the value of that metric relative to a threshold over time, performs one or more specified actions. Those actions include executing an Amazon EC2 action, triggering an Auto Scaling policy, and publishing a notification to an Amazon SNS topic.

Alarms only trigger actions after sustained state changes. Amazon CloudWatch Alarms are not generated simply because a metric is in a particular state. The state must change and be maintained for a user-specified number of periods.


An Amazon CloudWatch Alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA.

  • When the monitored metric is within the range that has been defined as acceptable, it is in the OK state.
  • When a metric breaches a user-defined threshold, it transitions to the ALARM state.
  • If the data needed to make a decision is missing or incomplete, it is in the INSUFFICIENT_DATA state.

Actions are set to respond to the alarm as it transitions into each of the three states. Actions only happen on state transitions and will not be re-executed if the condition persists.

Multiple actions are allowed for an alarm. If an alarm is triggered, Amazon SNS could be used to send a notification email, while at the same time an Auto Scaling policy is updated.
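
A minimal boto3 sketch of such an alarm follows. It watches average CPUUtilization on one instance and publishes to an Amazon SNS topic when the threshold is breached for three consecutive five-minute periods; the instance ID and topic ARN are placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="HighCPUUtilization",
    AlarmDescription="Average CPU above 80% for 15 minutes",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-1234567890abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,                 # five-minute periods
    EvaluationPeriods=3,        # three consecutive periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[
        "arn:aws:sns:us-east-1:111122223333:ops-alerts"   # placeholder topic ARN
    ],
)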

Alarms and Thresholds

Alarms are designed to be triggered when three things happen:

  • A monitored metric has reached a particular value or is within a defined range.
  • The metric's value stays at that value or within the specified range for a number of consecutive data points.
  • This condition persists for a user-defined number of periods.

In short, an Amazon CloudWatch Alarm is triggered when a monitored metric reaches a particular value, is reported at that value multiple times in a row, and stays at that value for a user-defined number of periods.


In Figure 9.2, the alarm threshold is set to three units and the alarm is evaluated over three periods. The alarm goes to ALARM state if the oldest of the three periods evaluated has matched the alarm criteria and the next two periods have met the criteria or are missing.

In the figure, this happens with the third through fifth time periods when the alarm’s state is set to ALARM. At period six, the value drops below the threshold and the state reverts to OK.

Later, during the ninth time period, the threshold is breached again, but for only one period. Because of this, the alarm state remains OK. Figure 9.2 shows this as a graph.


FIGURE 9.2 A threshold breach without a change in alarm state


Alarms can also be added to dashboards. When an alarm is on a dashboard, it turns red when it is in the ALARM state, making it easier to monitor its status.


Missing Data Points

Similar to how each alarm is always in one of three states, each specific data point reported to Amazon CloudWatch falls under one of three categories:

  • Good: Within the threshold
  • Bad: Violating the threshold
  • Missing: Data for the metric is not available, or not enough data is available for the metric to determine the alarm state.

Customers can specify how alarms handle missing data points. They can be treated as:

  • Missing: The alarm looks back further in time to find additional data points.
  • Good: Treated as a data point that is within the threshold
  • Bad: Treated as a data point that is breaching the threshold
  • Ignored: The current alarm state is maintained.

The best choice of how to treat missing data points depends on the type of metric. For a metric that continually reports data, such as CPUUtilization of an instance, it might be best to treat missing data points as bad because their absence indicates something is wrong. For a metric that generates data points only when an error occurs, such as ThrottledRequests in Amazon DynamoDB, missing data points should be treated as good.

Choosing the best option for an alarm prevents unnecessary and misleading alarm condition changes and more accurately indicates the health of a system.
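
The missing-data behavior is configured per alarm through the TreatMissingData setting. The following sketch treats missing data as breaching, which suits a continuously reporting metric such as CPUUtilization; for an error-count style metric, notBreaching would usually be the better choice. The instance ID and topic ARN are placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# For a metric that reports continuously, treat missing data as breaching.
cloudwatch.put_metric_alarm(
    AlarmName="HighCPUUtilizationStrict",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-1234567890abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="breaching",   # or "notBreaching", "ignore", "missing"
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],  # placeholder
)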

Common Amazon CloudWatch Metrics

There are hundreds of metrics available for monitoring on AWS. The common ones are listed here, broken down by service, with a brief explanation. As we have mentioned throughout, this book is designed to do more than just prepare you for an exam—it should serve you well as a day-to-day guide in working with AWS.

Amazon EC2

There are two types of EC2 status checks: system status checks and instance status checks.

System Status Checks

System Status Checks monitor AWS hardware to ensure that instances are working properly. These checks detect problems with an instance that requires AWS involvement to repair. When a system status check fails, customers can choose to wait for AWS to fix the issue, or they can resolve it by either stopping and starting an instance or by terminating and replacing it.


The following are examples of problems that can cause system status checks to fail:

  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host that impact network reachability

Instance Status Checks

Instance Status Checks monitor the software and network configuration of an individual instance. These checks detect problems that require customer involvement to repair. When an instance status check fails, typical solutions include rebooting or reconfiguring the instance.

The following are examples of problems that can cause instance status checks to fail:

  • Failed system status checks
  • Incorrect networking or startup configuration
  • Exhausted memory
  • Corrupted file system
  • Incompatible kernel

The following Amazon CloudWatch metrics offer insight into the usage and utilization of Amazon EC2 instances. For Amazon EC2, common metrics include the following:

  • CPUUtilization
  • NetworkIn
  • NetworkOut
  • DiskReadOps
  • DiskWriteOps
  • DiskReadBytes
  • DiskWriteBytes

CPUUtilization This metric is the percentage of allocated Amazon EC2 compute units that are currently in use on an instance. This metric identifies the processing power required to run an application on a selected instance.

NetworkIn This metric represents the number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.

Similar to NetworkOut, the number reported is the number of bytes received during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second. With Detailed Monitoring, divide it by 60.

NetworkOut This metric is the number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.

Similar to NetworkIn, the number reported is the number of bytes sent during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second. With Detailed Monitoring, divide it by 60.

DiskReadOps This metric reports the completed read operations from all instance store volumes available to the instance in a specified period of time.

To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period.

DiskWriteOps This metric reports the completed write operations to all instance store volumes available to the instance in a specified period of time.

To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period.

DiskReadBytes This metric reports the number of bytes read from all instance store volumes available to the instance.

This metric is used to determine the volume of the data the application reads from the hard disk of the instance. It can be used to determine the speed of the application.

The number reported is the number of bytes read during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second. With Detailed Monitoring, divide it by 60.

DiskWriteBytes This metric reports the number of bytes written to all instance store volumes available to the instance.

The metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application.

The number reported is the number of bytes written during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second. With Detailed Monitoring, divide it by 60.

Amazon Elastic Block Store Volume Monitoring

Amazon Elastic Block Store (Amazon EBS) sends data points to Amazon CloudWatch for several metrics. Amazon EBS General Purpose SSD (gp2), Throughput Optimized HDD (st1), Cold HDD (sc1), and Magnetic (standard) volumes automatically send five-minute metrics to CloudWatch. Provisioned IOPS SSD (io1) volumes automatically send one-minute metrics to CloudWatch.

Common Amazon EBS metrics include the following:

  • VolumeReadBytes
  • VolumeWriteBytes
  • VolumeReadOps
  • VolumeWriteOps
  • VolumeTotalReadTime
  • VolumeTotalWriteTime
  • VolumeIdleTime
  • VolumeQueueLength
  • VolumeThroughputPercentage
  • VolumeConsumedReadWriteOps
  • BurstBalance

VolumeReadBytes and VolumeWriteBytes These metrics provide information on the I/O operations in a specified period of time. The Sum statistic reports the total number of bytes transferred during the period. The Average statistic reports the average size of each I/O operation during the period. The SampleCount statistic reports the total number of I/O operations during the period. The Minimum and Maximum statistics are not relevant for this metric.


VolumeReadOps and VolumeWriteOps These metrics report the total number of I/O operations in a specified period of time. To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period.

VolumeTotalReadTime and VolumeTotalWriteTime These metrics report the total number of seconds spent by all operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period.

For example, for a period of 5 minutes (300 seconds): If 700 operations completed during that period, and each operation took 1 second, the value would be 700 seconds.

VolumeIdleTime This metric represents the total number of seconds in a specified period of time when no read or write operations were submitted.

VolumeQueueLength This metric is the number of read and write operation requests waiting to be completed in a specified period of time.


VolumeThroughputPercentage This metric is used with Provisioned IOPS SSD volumes only. It is the percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume. Provisioned IOPS SSD volumes deliver within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.


VolumeConsumedReadWriteOps This metric is used with Provisioned IOPS SSD volumes only. It reports the total amount of read and write operations (normalized to 256K capacity units) consumed in a specified period of time.

I/O operations that are smaller than 256K each count as 1 consumed IOPS. I/O operations that are larger than 256K are counted in 256K capacity units. For example, a 1024K I/O would count as 4 consumed IOPS.

BurstBalance This metric is only used with General Purpose SSD (gp2), Throughput Optimized HDD (st1), and Cold HDD (sc1) volumes. It provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket.

Data is reported to Amazon CloudWatch only when the volume is active. If the volume is not attached, no data is reported.

Amazon EBS Status Checks

Volume status checks help customers understand, track, and manage potential inconsistencies in the data on an Amazon EBS volume. They are designed to provide you with the information needed to determine if an Amazon EBS volume is impaired and to help customers control how a potentially inconsistent volume is handled.

Volume status checks are automated tests that run every five minutes and return a pass or fail status. If all checks pass, the status of the volume is ok. If a check fails, the status of the volume is impaired. If the status is insufficient-data, the checks may still be in progress on the volume.

There are four status types for Provisioned IOPS EBS Volumes: ok, warning, impaired, and insufficient-data.

ok This status means that the volume is performing as expected.

warning This status means that the volume is either Degraded or Severely Degraded.

Degraded means that the volume performance is below expectations. Severely Degraded means that the volume performance is well below expectations.

impaired Impaired means that a volume has either Stalled or is Not Available. Stalled means that the volume performance is severely impacted. Not Available means that it is unable to determine I/O performance because I/O is disabled.

insufficient-data Insufficient-data means that there have not been enough data points collected but that it is online.

Amazon ElastiCache

The following Amazon CloudWatch metrics offer insight into Amazon ElastiCache performance. In most cases, the recommendation is to set CloudWatch Alarms for these metrics to be able to take corrective action before performance issues occur. For Amazon ElastiCache, common metrics include the following:

  • CPUUtilization
  • SwapUsage
  • Evictions
  • CurrConnections

CPUUtilization This is a host-level metric reported as a percent.

  • Memcached Because Memcached is multi-threaded, this metric can be as high as 90 percent. If this threshold is exceeded, scale the cache cluster up by using a larger cache node type, or scale out by adding more cache nodes.
  • Redis Because Redis is single-threaded, the threshold is calculated as utilization/number of processor cores. For example, when using a cache.m1.xlarge node that has four cores, the threshold for a 90 percent CPUUtilization would be 90/4, or 22.5 percent.

Administrators have to determine their own threshold value based on the number of cores in the cache node being used.

If this threshold is exceeded and the main workload is from read requests, scale the cache cluster out by adding read replicas. If the main workload is from write requests, AWS recommends scaling up by using a larger cache instance type.

SwapUsage This is a host-level metric reported in bytes.

  • Memcached This metric should not exceed 50 MB. If it does, AWS recommends that the ConnectionOverhead parameter value be increased.
  • Redis At this time, AWS has no recommendation for this parameter; there is not a need to set an Amazon CloudWatch Alarm for it.

Evictions This is a metric published for both Memcached and Redis cache clusters. AWS recommends customers determine their own alarm thresholds for this metric based on application needs.

  • Memcached If the chosen threshold is exceeded, scale the cache cluster up by using a larger node type or scale out by adding more nodes.
  • Redis If you exceed your chosen threshold, scale your cluster up by using a larger node type.

CurrConnections This is a cache engine metric, published for both Memcached and Redis cache clusters. AWS recommends customers determine their own alarm thresholds for this metric based on application needs.

Whether running Memcached or Redis, an increasing number of CurrConnections might indicate a problem with an application.

Amazon RDS Metrics

When using Amazon RDS resources, Amazon RDS sends metrics and dimensions to Amazon CloudWatch every minute. Common metrics include the following:

  • DatabaseConnections
  • DiskQueueDepth
  • FreeStorageSpace
  • ReplicaLag
  • ReadIOPS
  • WriteIOPS
  • ReadLatency
  • WriteLatency

DatabaseConnections This metric is a count of the number of database connections in use.

DiskQueueDepth This metric is a count of the number of outstanding I/O operations waiting to access the disk.

FreeStorageSpace This metric, measured in bytes, is the amount of available storage space.


ReplicaLag This metric, measured in seconds, is the amount of time a Read Replica DB instance lags behind the source DB instance. It applies to MySQL, MariaDB, and PostgreSQL Read Replicas.

ReadIOPS This metric is the average number of disk read I/O operations per second.

WriteIOPS This metric is the average number of disk write I/O operations per second.

ReadLatency This metric, measured in seconds, is the average amount of time taken per disk read I/O operation.

WriteLatency This metric, measured in seconds, is the average amount of time taken per disk write I/O operation.

AWS Elastic Load Balancer

AWS ELB reports metrics to Amazon CloudWatch only when requests are flowing through the load balancer. When there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals. If there are no requests flowing through the load balancer or no data for a metric, the metric is not reported.

Common metrics reported to Amazon CloudWatch from an ELB include the following:

  • BackendConnectionErrors
  • HealthyHostCount
  • UnHealthyHostCount
  • RequestCount
  • Latency
  • HTTPCode_Backend_2XX
  • HTTPCode_Backend_3XX
  • HTTPCode_Backend_4XX
  • HTTPCode_Backend_5XX
  • HTTPCode_ELB_4XX
  • HTTPCode_ELB_5XX
  • SpilloverCount
  • SurgeQueueLength

BackendConnectionErrors This metric is the number of connections that were not successfully established between the load balancer and the registered instances. Because the load balancer retries the connection when there are errors, this count can exceed the request rate. This count also includes any connection errors related to health checks.

HealthyHostCount This metric is the number of healthy instances registered with a load balancer. A newly registered instance is considered healthy after it passes the first health check.

If cross-zone load balancing is enabled, the number of healthy instances for the LoadBalancerName dimension is calculated across all Availability Zones. Otherwise, it is calculated per Availability Zone.

UnHealthyHostCount This metric is the number of unhealthy instances registered with a load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks. An unhealthy instance is considered healthy again after it meets the healthy threshold configured for health checks.

RequestCount This metric is the number of requests completed or connections made during the specified interval, which is either one or five minutes.

  • HTTP listener The number of requests received and routed, including HTTP error responses from the registered instances
  • TCP listener The number of connections made to the registered instances

Latency This metric represents the elapsed time, in seconds, between when a request has been sent to an instance and the reply received.

  • HTTP listener The total time elapsed, in seconds, from the time the load balancer sent the request to a registered instance until the instance started to send the response headers
  • TCP listener The total time elapsed, in seconds, for the load balancer to establish a connection to a registered instance successfully.

HTTPCode_Backend_2XX This metric is the number of HTTP 2XX response codes generated by registered instances. 2XX status codes report success. The action was successfully received, understood, and accepted. This count does not include any response codes generated by the load balancer.

HTTPCode_Backend_3XX This metric is the number of HTTP 3XX response codes generated by registered instances. 3XX status codes report redirection. Further action must be taken in order to complete the request.

HTTPCode_Backend_4XX This metric is the number of HTTP 4XX response codes generated by registered instances. 4XX status codes report client errors. The request contains bad syntax or cannot be fulfilled.

HTTPCode_Backend_5XX This metric is the number of HTTP 5XX response codes generated by registered instances. 5XX status codes report server errors. The server failed to fulfill an apparently valid request.

HTTPCode_ELB_4XX This metric is the number of HTTP 4XX client error codes generated by the load balancer. Client errors are generated when a request is malformed or incomplete. This error is generated by the ELB.

HTTPCode_ELB_5XX This metric is the number of HTTP 5XX server error codes generated by the load balancer. This count does not include any response codes generated by the registered instances.

The metric is reported if there are no healthy instances registered to the load balancer, or if the request rate exceeds the capacity of the instances (spillover) or the load balancer. This error is generated by the ELB.

SpilloverCount This metric is the total number of requests that were rejected because the surge queue is full.

  • HTTP listener The load balancer returns an HTTP 503 error code.
  • TCP listener The load balancer closes the connection.

SurgeQueueLength This metric is the total number of requests that are pending routing. The load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request.

The maximum size of the queue is 1,024. Additional requests are rejected when the queue is full.


Amazon CloudWatch Events

Amazon CloudWatch Events delivers a near real-time stream of system events that describes changes in AWS resources. Using relatively simple rules, it is possible to route events to one or more targets for processing.

Think of Amazon CloudWatch Events as the central nervous system for an AWS environment. It is connected to supported AWS Cloud services, and it becomes aware of operational changes as they happen. Then, driven by rules, it sends messages and activates functions in response to the changes.

Events

An event is a change in an AWS environment, and it can be generated in four different ways:

  • They arise from within AWS when resources change state, like when an Amazon EC2 instance state changes from pending to running.
  • API calls and console sign-ins can generate events and deliver them to Amazon CloudWatch Events via AWS CloudTrail.
  • Code can be run to generate application-level events and publish them to Amazon CloudWatch Events for processing.
  • They can be issued on a scheduled basis, with options for periodic or Cron-style scheduling.

 


Remember, an event indicates there has been a change in an AWS environment. AWS resources can generate events when their state changes. AWS CloudTrail publishes events from API calls. Custom application-level events can be created and published to Amazon CloudWatch Events. Scheduled events are generated on a periodic basis.


Amazon CloudWatch Events can be used to schedule actions that trigger at certain times using cron or rate expressions. All scheduled events use the Universal Time (UTC) time zone and a minimum precision of one minute.

Rules

A rule matches incoming events and routes them to targets for processing. A single rule can route to multiple targets, all of which are processed in parallel. This enables different parts of an organization to look for and process the events that are of interest to them.

A rule can customize the JSON sent to the target by passing only certain parts or by overwriting it with a constant.
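
A minimal boto3 sketch of a rule follows. It matches Amazon EC2 instance state-change events and routes them to an AWS Lambda function; the function ARN is a placeholder, and the permission that allows Amazon CloudWatch Events to invoke the function is omitted for brevity.

import json
import boto3

events = boto3.client("events", region_name="us-east-1")

# Match EC2 instances entering the "running" state.
events.put_rule(
    Name="ec2-running-instances",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["running"]},
    }),
    State="ENABLED",
)

# Route matched events to a Lambda function target.
events.put_targets(
    Rule="ec2-running-instances",
    Targets=[{
        "Id": "notify-lambda",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:on-instance-running",  # placeholder
    }],
)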


Targets

A target processes data in JSON format that has been sent to it from Amazon CloudWatch Events. Amazon CloudWatch Events delivers a near real-time stream of system events to one or more target functions or streams for analysis.

These targets include the following:

  • Amazon EC2 instances
  • AWS Lambda functions
  • Amazon Kinesis Streams
  • Amazon ECS tasks
  • AWS Step Functions state machines
  • Amazon SNS topics
  • Amazon SQS queues
  • Built-in targets

Metrics and Dimensions

Amazon CloudWatch Events sends metrics to Amazon CloudWatch every minute.

The AWS/Events namespace includes the metrics shown in Table 9.5.

TABLE 9.5 Amazon CloudWatch Events Metrics

Metric Description
Invocations
  • Measures the number of times a target is invoked for a rule in response to an event. This includes successful and failed invocations, but it does not include throttled or retried attempts until they fail permanently.
  • Amazon CloudWatch Events only sends this metric to Amazon CloudWatch if it has a non-zero value.
  • Valid Dimensions: RuleName
  • Units: Count
FailedInvocations
  • Measures the number of invocations that failed permanently. This does not include invocations that are retried or that succeeded after a retry attempt.
  • Valid Dimensions: RuleName
  • Units: Count
TriggeredRules
  • Measures the number of triggered rules that matched with any event.
  • Valid Dimensions: RuleName
  • Units: Count
MatchedEvents
  • Measures the number of events that matched with any rule.
  • Valid Dimensions: None
  • Units: Count
ThrottledRules
  • Measures the number of triggered rules that are being throttled.
  • Valid Dimensions: RuleName
  • Units: Count

Amazon CloudWatch Events metrics use a single dimension: RuleName. As the name implies, it filters available metrics by rule name.

Amazon CloudWatch Logs

Amazon CloudWatch Logs can be used to monitor, store, and access log files from Amazon EC2 instances, AWS CloudTrail, and servers running in an on-premises datacenter. It is then possible to retrieve and report on the associated log data from Amazon CloudWatch Logs.

Amazon CloudWatch Logs can monitor and store application logs, system logs, web server logs, and other custom logs. By setting alarms on these metrics, notifications can be generated for application or web server issues, and the necessary actions can be taken.

Amazon CloudWatch Logs is made up of several components:

  • Log agents
  • Log events
  • Log streams
  • Log groups
  • Metric filters
  • Retention policies

Log Agents A log agent directs logs to Amazon CloudWatch. Under the shared responsibility model, AWS does not have visibility above the Hypervisor, so an agent must send the log data into Amazon CloudWatch.

Log Events A log event is an activity reported to the log file by the operating system or application along with a timestamp. Log events support only text format.

Log events contain two properties: the timestamp of when the event occurred and the raw log message.

By default, any line that begins with a non-whitespace character closes the previous log message and starts a new log message.

Log Streams A log stream is a group of log events reported by a single source, such as a web server.

Log Groups A log group is a group of log streams from multiple resources, such as a group of web servers managing the same content.

Retention policies and metric filters are set on log groups—not log streams.

Metric Filters Metric filters tell Amazon CloudWatch how to extract metric observations from ingested log events and turn them into Amazon CloudWatch metrics.

For example, a metric filter named 404_Error can search log events for 404 access errors. An alarm can then be created to monitor those 404 errors across different servers.
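Such a metric filter can be created from the AWS CLI (a sketch; the log group name, filter pattern, and metric namespace are illustrative and depend on the format of the log being ingested):

  aws logs put-metric-filter --log-group-name apache-access \
      --filter-name 404_Error \
      --filter-pattern '[ip, user, username, timestamp, request, status_code = 404, size]' \
      --metric-transformations metricName=404Count,metricNamespace=LogMetrics,metricValue=1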

Retention Policies Retention policies determine how long events are retained inside Amazon CloudWatch Logs. Policies are assigned to log groups and applied to all of the log streams in the group.

Retention time can be set from 1 day to 10 years. You can also opt for logs to never expire.
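For example, a 30-day retention policy can be applied to a log group with the AWS CLI (the log group name is illustrative):

  aws logs put-retention-policy --log-group-name apache-access --retention-in-days 30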

Archived Data

All log events uploaded to Amazon CloudWatch are retained. It is possible to specify the retention duration. Data is compressed, put into an archive, and stored. Charges are incurred for storage of the archived data.


Log Monitoring

Use Amazon CloudWatch Logs to monitor applications and systems using log data. Amazon CloudWatch Logs can track the number of errors that occur in application logs and send a notification whenever the rate of errors exceeds a threshold.

Because Amazon CloudWatch Logs uses existing log data for monitoring, no code changes are required. Each log event needs a timestamp: the current time is used if the datetime_format isn’t provided. If the provided datetime_format is invalid for a given log message, the timestamp from the last log event with a successfully parsed timestamp is used; if no previous log events exist, the current time is used. A warning message is logged whenever a log event falls back to the current time or the time of a previous log event.


Agents

An agent is required to publish log data to Amazon CloudWatch Logs because AWS has no visibility above the Hypervisor. There are agents available for Linux and Windows.

Agents have the following components:

  • A plugin to the AWS CLI that pushes log data to Amazon CloudWatch Logs
  • A script that runs the Amazon CloudWatch Logs aws logs push command to send data to Amazon CloudWatch Logs
  • A cron job that ensures that the daemon is always running

Amazon CloudWatch Logs Agent for Linux

The Amazon CloudWatch Logs agent requires Python version 2.6, 2.7, 3.0, or 3.3 and any of the following versions of Linux:

  • Amazon Linux version 2014.03.02 or later
  • Ubuntu Server version 12.04, 14.04, or 16.04
  • CentOS version 6, 6.3, 6.4, 6.5, or 7.0
  • Red Hat Enterprise Linux (RHEL) version 6.5 or 7.0
  • Debian 8.0

Amazon CloudWatch Logs: Agents and IAM

The Amazon CloudWatch Logs agent requires the CreateLogGroup, CreateLogStream, DescribeLogStreams, and PutLogEvents operations.


Here is a sample IAM policy for using an agent:
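(A minimal policy granting the four operations listed above; the wildcard Resource is illustrative and can be scoped to specific log groups.)

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:DescribeLogStreams",
          "logs:PutLogEvents"
        ],
        "Resource": "arn:aws:logs:*:*:*"
      }
    ]
  }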



Amazon CloudWatch Logs Agent for Windows

Starting with EC2Config version 2.2.5, it is possible to export all Windows Server log messages from the system log, security log, application log, and Internet Information Services (IIS) log and send them to Amazon CloudWatch Logs. EC2Config version 2.2.10 or later adds the ability to export any event log data, event tracing for Windows data, or text-based log files to Amazon CloudWatch Logs. Windows performance counter data can also be exported to Amazon CloudWatch.


TABLE 9.6 Microsoft Windows Agents

Operating System Agent Notes
Windows Server 2016 SSM Agent The EC2Config service is not supported on Windows Server 2016.
Windows Server 2008-2012 R2 EC2Config or SSM Agent If an instance is running EC2Config version 3.x or earlier, then the EC2Config service sends log data to Amazon CloudWatch. If an instance is running EC2Config version 4.x or later, then SSM Agent sends log data to Amazon CloudWatch.


It is also possible to install Amazon CloudWatch Logs agents and create log streams using AWS OpsWorks and Chef. Chef is a third-party systems and cloud infrastructure automation tool. Chef uses “recipes” to install and configure software and “cookbooks,” which are collections of recipes, to perform configuration and policy distribution tasks.


Searching and Filtering Log Data

After an agent begins publishing logs to Amazon CloudWatch, it is possible both to search for and filter log data by creating one or more filters. Metric filters define the terms and patterns to look for in log data as it is sent to Amazon CloudWatch Logs.

Amazon CloudWatch Logs uses these metric filters to turn log data into Amazon CloudWatch metrics that can be graphed or used to set an alarm condition.



Amazon CloudWatch Logs Metrics and Dimensions

Amazon CloudWatch Logs sends data to Amazon CloudWatch every minute.

Metrics

The AWS/Logs namespace includes the metrics shown in Table 9.7.

TABLE 9.7 AWS/Logs Namespace Metrics

Metric Description
IncomingBytes The volume of log events in uncompressed bytes uploaded to Amazon CloudWatch Logs. When used with the LogGroupName dimension, this is the volume of log events in uncompressed bytes uploaded to the log group. Valid Dimensions: LogGroupName Valid Statistic: Sum Units: Bytes
IncomingLogEvents The number of log events uploaded to Amazon CloudWatch Logs. When used with the LogGroupName dimension, this is the number of log events uploaded to the log group. Valid Dimensions: LogGroupName Valid Statistic: Sum Units: None
ForwardedBytes The volume of log events in compressed bytes forwarded to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName Valid Statistic: Sum Units: Bytes
ForwardedLogEvents The number of log events forwarded to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName Valid Statistic: Sum Units: None
DeliveryErrors The number of log events for which Amazon CloudWatch Logs received an error when forwarding data to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName Valid Statistic: Sum Units: None
DeliveryThrottling The number of log events for which Amazon CloudWatch Logs was throttled when forwarding data to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName Valid Statistic: Sum Units: None

Dimensions

Amazon CloudWatch Logs supports the filtering of metrics using the dimensions shown in Table 9.8.

TABLE 9.8 Amazon CloudWatch Logs Dimensions

Dimension Description
LogGroupName The name of the Amazon CloudWatch Logs log group from which to display metrics
DestinationType The subscription destination for the Amazon CloudWatch Logs data, which can be AWS Lambda, Amazon Kinesis Streams, or Amazon Kinesis Firehose
FilterName The name of the subscription filter that is forwarding data from the log group to the destination. The subscription filter name is automatically converted by Amazon CloudWatch to ASCII and any unsupported characters get replaced with a question mark (?).

Monitoring AWS Charges

Customers can monitor AWS costs using Amazon CloudWatch. With Amazon CloudWatch, it is possible to create billing alerts that send notifications when estimated charges for provisioned services exceed a customer-defined threshold.

When charges exceed these thresholds, AWS can send an email notification or publish a notification to an Amazon SNS topic. To create billing alerts and register for notifications, enable them in the AWS Billing and Cost Management console.
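For example, a billing alarm that notifies an Amazon SNS topic when estimated charges exceed $100 can be created with the AWS CLI (a sketch; the topic ARN and threshold are placeholders, and billing metric data is stored in the US East (N. Virginia) Region):

  aws cloudwatch put-metric-alarm --region us-east-1 \
      --alarm-name monthly-billing-alarm \
      --namespace AWS/Billing --metric-name EstimatedCharges \
      --dimensions Name=Currency,Value=USD \
      --statistic Maximum --period 21600 --evaluation-periods 1 \
      --threshold 100 --comparison-operator GreaterThanThreshold \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts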



Detailed Billing

In December 2016, Amazon announced the addition of detailed billing to Amazon CloudWatch Logs. Reports can be generated based on usage and cost per log group.

Tags and Log Groups

You can use tags to classify log groups and give them categories such as purpose, owner, or environment. You can create a custom set of categories to meet specific needs, and you can also use tags to categorize and track AWS costs. When you apply tags to your AWS resources, including log groups, AWS cost allocation reports include usage and costs aggregated by tags.

  • Tags can be added to log groups to get a detailed view of costs across business dimensions.
  • Up to 50 tags can be added to each log group.
  • Tags are added to log groups using the AWS CLI or the Amazon CloudWatch Logs API, as shown in the example after this list.
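For example (illustrative log group name and tags):

  aws logs tag-log-group --log-group-name apache-access \
      --tags Environment=Production,Team=WebOps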

Log Group Tag Restrictions

The following restrictions apply to tags.

Basic Restrictions

  • The maximum number of tags per log group is 50.
  • The keys and values of a tag are case sensitive.
  • Tags in a deleted log group cannot be changed or edited.

Tag Key Restrictions

  • Each tag key must be unique. If a tag is added with a key that’s already in use, the new tag overwrites the existing key/value pair.
  • Tag keys cannot start with aws: because this prefix is reserved for use by AWS. AWS creates tags that begin with this prefix on customers’ behalf, but customers cannot edit or delete them.
  • Tag keys must be between 1 and 128 Unicode characters in length.
  • Tag keys must consist of the following characters:
    • Unicode letters and digits
    • Whitespace
    • Underscore (_)
    • Period (.)
    • Forward slash (/)
    • Equals sign (=)
    • Plus sign (+)
    • Hyphen (-)
    • At symbol (@)

Tag Value Restrictions

  • Tag values must be between 0 and 255 Unicode characters in length.

 


At the end of the billing cycle, the total charges (tagged and untagged) on the billing report with cost allocation tags reconcile with the total charges on the Bills page and on other billing reports for the same period.

Tags can also be used to filter views in Cost Explorer.



Cost Explorer

Cost Explorer is an AWS tool for viewing charts of your costs. Spend data can be viewed for up to the past 13 months and used to forecast spending for the next 3 months. Cost Explorer can also be used to see patterns in how much is spent on AWS resources over time, identify areas that need further inquiry, and see trends that assist in understanding costs.

Cost Explorer can reveal which service is being used the most and which Availability Zone gets the most network traffic.

With Cost Explorer, there are a variety of filters:

  • API operation
  • Availability Zone
  • AWS Cloud service
  • Custom cost allocation tags
  • Amazon EC2 instance type
  • Linked account(s)
  • Platform
  • Purchase option
  • Region
  • Tenancy
  • Usage type
  • Usage type group

 


Cost Explorer uses the same dataset used to generate the AWS Cost and Usage Reports and the detailed billing reports. The dataset can also be downloaded as a comma-separated value (CSV) file for detailed analysis.

AWS Billing and Cost Management Metrics and Dimensions

The AWS Billing and Cost Management service sends metrics to Amazon CloudWatch.

Metrics

The AWS/Billing namespace uses the metric in Table 9.9.

TABLE 9.9 AWS/Billing Namespace Metric

Metric Description
EstimatedCharges The estimated charges for AWS usage. This can be either estimated charges for one service or a roll-up of estimated charges for all services.

Dimensions

AWS Billing and Cost Management supports filtering metrics using the dimensions in Table 9.10.

TABLE 9.10 AWS Billing and Cost Management Dimensions

Dimension Description
ServiceName The name of the AWS Cloud service. This dimension is omitted for the total of estimated charges across all services.
LinkedAccount The linked account number. This is used for Consolidated Billing only. This dimension is included only for accounts that are linked to a separate paying account in a Consolidated Billing relationship. It is not included for accounts that are not linked to a Consolidated Billing paying account.
Currency The monetary currency to bill the account. This dimension is required. Unit: USD

AWS CloudTrail

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of an AWS account. With AWS CloudTrail, it is possible to log, continuously monitor, and retain events related to API calls across an AWS infrastructure.

AWS CloudTrail provides a history of AWS API calls for an account. This includes API calls made through the AWS Management Console, AWS SDKs, command-line tools, and other AWS Cloud services. This history simplifies security analysis, resource change tracking, and troubleshooting.

AWS CloudTrail provides visibility into user activity by recording API calls made on an account. It records important information about each API call, including the name of the API, the identity of the caller, the time of the API call, the request parameters, and the response elements returned by the AWS Cloud service. This information can be used in the tracking of changes made to AWS resources and help troubleshoot operational issues. This makes it easier to ensure compliance with internal policies and regulatory standards.

What Are Trails?

A trail is a configuration that enables logging of the AWS API activity and related events in an account. AWS CloudTrail delivers the logs to an Amazon S3 bucket and, optionally, to an Amazon CloudWatch Logs log group.

It is possible to specify an Amazon SNS topic that receives notifications of log file deliveries. For a trail that applies to all regions, the trail configuration in each region is identical.

Types of Trails

You can create trails with the AWS CloudTrail console, the AWS CLI, or the AWS CloudTrail API. There are two types of trails: those that apply to all regions and those that apply to one region.

Trails that Apply to All Regions

When creating a trail that applies to all regions, AWS CloudTrail creates the same trail in each region. It then records the log files in each region and delivers the log files to an Amazon S3 bucket that is user-specified. This is the default option when you create a trail in the AWS CloudTrail console.

A trail that applies to all regions has the following advantages:

  • The configuration settings for the trail apply consistently across all regions.
  • Log files from all regions are sent to a single Amazon S3 bucket and, optionally, to an Amazon CloudWatch Logs log group.
  • Trail configurations for all regions are managed from one location.
  • Events are immediately received from new regions. When a new region launches, AWS CloudTrail automatically creates a trail in the new region with the same settings as your original trail.
  • Because trails are created even in regions that are used infrequently, unusual activity in those regions can still be monitored.

When applying a trail to all regions, AWS CloudTrail uses the trail created in a particular region to create trails with identical configurations in all other regions in an account. This has the following effects:

  • If an Amazon SNS topic has been configured for the trail, Amazon SNS notifications about log file deliveries in all regions are sent to that single Amazon SNS topic.
  • Global service events are delivered from a single region to the specified Amazon S3 bucket and, if one has been configured, to the Amazon CloudWatch Logs log group.
  • If log file integrity validation has been activated, log file integrity validation is enabled in all regions for the trail.

A Trail that Applies to One Region

When creating a trail that applies to one region, AWS CloudTrail records the log files in that region only and delivers log files to a user-specified Amazon S3 bucket. When creating additional individual trails that apply to specific regions, those trails can be set to deliver log files to a single Amazon S3 bucket regardless of region.


Multiple Trails per Region

If there are different but related user groups such as developers, security personnel, and IT auditors that need access to AWS CloudTrail, create multiple trails per region. This allows each group to receive its own copy of the log files.

AWS CloudTrail supports five trails per region. A trail that applies to all regions counts as one trail in every region. To see a list of the trails in all regions, open the Trails page of the AWS CloudTrail console.

Encryption

By default, log files are encrypted using Amazon S3 Server-Side Encryption (SSE). Log files can be stored in an Amazon S3 bucket for as long as they are needed. Amazon S3 lifecycle rules can be defined to archive or delete log files automatically.

AWS CloudTrail Log Delivery

AWS CloudTrail typically delivers log files within 15 minutes of an API call. In addition, AWS CloudTrail publishes log files multiple times an hour—approximately every five minutes. These log files contain API calls from services that support AWS CloudTrail.


Overview: Creating a Trail

Whether a trail is created or updated with the AWS CloudTrail console or the AWS CLI, the same basic steps are followed (a minimal AWS CLI sketch appears after the list).

  1. Turn on AWS CloudTrail by creating a trail. By default, when you create a trail in a region in the AWS CloudTrail console, the trail applies to all regions.
  2. Create an Amazon S3 bucket or specify an existing bucket where the log files are to be delivered. By default, log files from all regions in an account are delivered to the specified bucket.
  3. Configure the trail to log the types of events desired. The choices are read-only, write-only, or all management and data events. By default, trails log all management events.
  4. Create an Amazon SNS topic to receive notifications when log files are delivered. Delivery notifications from all regions are sent to the Amazon SNS topic specified.
  5. Configure Amazon CloudWatch Logs to receive logs from AWS CloudTrail so that they can be monitored for specific log events.
  6. Turn on log file encryption. This encrypts files for added security.
  7. Turn on integrity validation for log files. This enables the delivery of digest files that you can use to validate the integrity of log files after AWS CloudTrail has delivered them.
  8. Add tags to the trail.
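A minimal AWS CLI sketch of steps 1, 2, 4, and 7 (the trail, bucket, and topic names are placeholders; the bucket policy and topic policy must allow AWS CloudTrail to write and publish):

  # Create a trail that applies to all regions, delivers to an existing bucket,
  # notifies an existing topic, and enables log file integrity validation
  aws cloudtrail create-trail --name my-trail \
      --s3-bucket-name my-cloudtrail-bucket \
      --sns-topic-name cloudtrail-log-delivery \
      --is-multi-region-trail --enable-log-file-validation

  # Start recording API activity on the trail
  aws cloudtrail start-logging --name my-trail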

Monitoring with AWS CloudTrail

Amazon CloudWatch is a web service that collects and tracks metrics to monitor AWS resources and the applications that run on AWS. Amazon CloudWatch Logs is a feature of Amazon CloudWatch used specifically to monitor log data. Integration with Amazon CloudWatch Logs enables AWS CloudTrail to send events containing API activity in an AWS account to an Amazon CloudWatch Logs log group.

AWS CloudTrail events that are sent to Amazon CloudWatch Logs can trigger alarms according to the metric filters defined by customers. Optionally, you can configure Amazon CloudWatch Alarms to send notifications or make changes to the resources being monitored based on log stream events that metric filters extract.

Using Amazon CloudWatch Logs, you can track AWS CloudTrail events alongside events from the operating system, applications, or other AWS Cloud services that are sent to Amazon CloudWatch Logs.

AWS CloudTrail vs. Amazon CloudWatch

AWS CloudTrail adds depth to the monitoring capabilities already offered by AWS. Amazon CloudWatch focuses on performance monitoring and system health, and AWS CloudTrail focuses on API activity. While AWS CloudTrail does not report on system performance or health, you can use AWS CloudTrail in combination with Amazon CloudWatch Logs alarms to create notifications to gain a deeper understanding of AWS resources and their utilization.

AWS CloudTrail: Trail Naming Requirements

AWS CloudTrail trail names must meet the following requirements:

  • Contain only ASCII letters (a-z, A-Z), numbers (0-9), periods (.), underscores (_), and dashes (-)
  • Start with a letter or number, and end with a letter or number
  • Be between 3 and 128 characters long
  • Have no adjacent periods, underscores, or dashes; names like my-_namespace and my--namespace are invalid
  • Not be in IP address format; for example, 10.9.28.68 is invalid

Getting and Viewing AWS CloudTrail Log Files

AWS CloudTrail delivers log files to an Amazon S3 bucket specified during the creation of the trail. Typically, log files appear in the bucket within 15 minutes of the recorded AWS API call or other AWS event. Log files are generally published every five minutes.

Finding AWS CloudTrail Log Files

AWS CloudTrail publishes log files to the Amazon S3 bucket in a gzip archive. In the Amazon S3 bucket, the log file has a formatted name that includes the following elements:

  • The bucket name specified when you created the trail
  • The (optional) prefix that you specified when the trail was created
  • The string "AWSLogs"
  • The account number
  • The string "CloudTrail"
  • A region identifier
  • The year the log file was published in YYYY format
  • The month the log file was published in MM format
  • The day the log file was published in DD format
  • An alphanumeric string that separates the file from others that cover the same time period

This is what a complete log file object name looks like:
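(A representative example; the bucket name, account ID, region, date, and trailing unique string are placeholders, and no optional prefix is shown.)

  my-cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/us-west-2/2017/06/01/123456789012_CloudTrail_us-west-2_20170601T2030Z_EXAMPLEa1b2c3d4.json.gz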

Retrieve Log Files

To retrieve a log file, use the Amazon S3 console, the AWS CLI, or the Amazon S3 API.

To find your log files with the Amazon S3 console, do the following:

  • Open the Amazon S3 console.
  • Choose the bucket specified for the trails.
  • Navigate through the object hierarchy to find the correct log.

All log files have a .gz extension.



Configuring Amazon SNS Notifications for AWS CloudTrail

It is possible to be notified when AWS CloudTrail publishes new log files to an Amazon S3 bucket. You manage notifications using Amazon SNS.

Notifications are optional. To activate them, configure AWS CloudTrail to send update information to an Amazon SNS topic whenever a new log file has been sent. To receive these notifications, subscribe to the topic. To handle notifications programmatically, subscribe an Amazon SQS queue to the topic.
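For example, an Amazon SQS queue can be subscribed to the topic with the AWS CLI (the topic and queue ARNs are placeholders; the queue’s access policy must also allow the topic to send messages to it):

  aws sns subscribe \
      --topic-arn arn:aws:sns:us-west-2:123456789012:cloudtrail-log-delivery \
      --protocol sqs \
      --notification-endpoint arn:aws:sqs:us-west-2:123456789012:cloudtrail-log-queue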

Controlling User Permissions for AWS CloudTrail

AWS CloudTrail integrates with IAM, which controls access to AWS CloudTrail and other AWS resources that AWS CloudTrail requires, including Amazon S3 buckets and Amazon SNS topics. Use IAM to control which AWS users can create, configure, or delete AWS CloudTrail trails, start and stop logging, and access the buckets that contain log information.

Granting Permissions for AWS CloudTrail Administration

To administer an AWS CloudTrail trail, grant explicit permissions to IAM users to perform the actions associated with the AWS CloudTrail tasks. For most scenarios, you can accomplish this by using an AWS managed policy that contains predefined permissions.

A typical approach is to create an IAM group that has the appropriate permissions and then add individual IAM users to that group. For example, you can create one IAM group for users that should have full access to AWS CloudTrail actions and a separate group for users who should be able to view trail information but not create or change trails.

These are the AWS Managed Policies for AWS CloudTrail:

AWSCloudTrailFullAccess This policy gives users in the group full access to AWS CloudTrail actions and permissions to manage the Amazon S3 bucket, the log group for Amazon CloudWatch Logs, and an Amazon SNS topic for a trail.

AWSCloudTrailReadOnlyAccess This policy lets users in the group view trails and buckets.
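For example, these managed policies can be attached to IAM groups with the AWS CLI (the group names are illustrative):

  aws iam attach-group-policy --group-name CloudTrailAdmins \
      --policy-arn arn:aws:iam::aws:policy/AWSCloudTrailFullAccess

  aws iam attach-group-policy --group-name CloudTrailAuditors \
      --policy-arn arn:aws:iam::aws:policy/AWSCloudTrailReadOnlyAccess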

Log Management and Data Events

When creating a trail, the trail logs read-only and write-only management events for your account. If desired, update the trail to specify whether or not the trail should log data events. Data events are object-level API operations that access Amazon S3 object resources, such as GetObject, DeleteObject, and PutObject. Only events that match the trail settings are delivered to the Amazon S3 bucket and Amazon CloudWatch Logs log group. If the event doesn’t match the settings for a trail, the trail doesn’t log the event.
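For example, Amazon S3 data events for a single bucket can be added to a trail with the AWS CLI (the trail and bucket names are placeholders):

  aws cloudtrail put-event-selectors --trail-name my-trail \
      --event-selectors '[{"ReadWriteType": "All", "IncludeManagementEvents": true, "DataResources": [{"Type": "AWS::S3::Object", "Values": ["arn:aws:s3:::my-data-bucket/"]}]}]'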

Amazon SNS Topic Policy for AWS CloudTrail

To send notifications to an Amazon SNS topic, AWS CloudTrail must have the required permissions. AWS CloudTrail automatically attaches the required permissions to the topic when the following occurs:

  • Create an Amazon SNS topic as part of creating or updating a trail in the AWS CloudTrail console.
  • Create an Amazon SNS topic with the AWS CLI create-subscription and update-subscription commands.

AWS CloudTrail adds the following fields in the policy automatically:

  • The allowed SIDs
  • The service principal name for AWS CloudTrail
  • The Amazon SNS topic, including region, account ID, and topic name

The following policy allows AWS CloudTrail to send notifications about log file delivery from supported regions:
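(A representative policy; the statement ID, region, account ID, and topic name are placeholders.)

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AWSCloudTrailSNSPolicy",
        "Effect": "Allow",
        "Principal": { "Service": "cloudtrail.amazonaws.com" },
        "Action": "SNS:Publish",
        "Resource": "arn:aws:sns:us-west-2:123456789012:cloudtrail-log-delivery"
      }
    ]
  }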

AWS Config

AWS Config is a fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security and governance. AWS Config can discover existing AWS resources, export a complete inventory of AWS resources with all configuration details, and determine how a resource was configured at any point in time. These capabilities enable compliance auditing, security analysis, resource change tracking, and troubleshooting.

AWS Config makes it easy to track resource configuration without the need for upfront investments and to avoid the complexity of installing and updating agents for data collection or maintaining large databases. After AWS Config is enabled, continuously updated details can be viewed of all configuration attributes associated with AWS resources. Amazon SNS can be configured to provide notifications of every configuration change.

AWS Config provides a detailed view of the configuration of AWS resources in an AWS account. This includes how resources are related to one another and how they were configured in the past to show how configurations and relationships change over time.

An AWS resource is an entity in AWS such as an Amazon EC2 instance, an Amazon EBS volume, a security group, or an Amazon Virtual Private Cloud (Amazon VPC).

With AWS Config, you can do the following:

  • Evaluate AWS resource configurations for desired settings.
  • Get a snapshot of the current configurations of the supported resources that are associated with an AWS account.
  • Retrieve configurations of one or more resources that exist in an account.
  • Retrieve historical configurations of one or more resources.
  • Receive a notification whenever a resource is created, modified, or deleted.
  • View relationships between resources, such as those that use a particular security group.

Ways to Use AWS Config

When running applications on AWS, resources must be created and managed collectively. As the demand for an application grows, so too does the need to keep track of the addition of AWS resources. AWS Config is designed to help oversee application resources in the following scenarios.

Resource Administration

To exercise better governance over resource configurations and to detect resource misconfigurations, fine-grained visibility is needed into what resources exist and how these resources are configured at any time. Use AWS Config to automatically send notifications whenever resources are created, modified, or deleted. There is no need to monitor these changes by polling calls made to each individual resource.

Use AWS Config rules to evaluate the configuration settings of AWS resources. When AWS Config detects that a resource violates the conditions in one of the established rules, AWS Config flags the resource as noncompliant and sends a notification. AWS Config continuously evaluates resources as they are created, changed, or deleted.

Auditing and Compliance

Some data requires frequent audits to ensure compliance with internal policies and best practices. To demonstrate compliance, access is needed to the historical configurations of the resources. This information is provided by AWS Config.

Managing and Troubleshooting Configuration Changes

When using multiple AWS resources that depend on one another, a change in the configuration of one resource might have unintended consequences on related resources. With AWS Config, it is possible to view how one resource is related to other resources and assess the impact of the proposed change.

The historical configurations of resources provided by AWS Config can assist in troubleshooting issues by providing access to the last known good configuration of a problem resource.

Security Analysis

To analyze potential security weaknesses, detailed historical information about AWS resource configurations is required. This information could include the IAM permissions that are granted to your users or the Amazon EC2 security group rules that control access to your resources.

Use AWS Config to view the IAM policy that was assigned to an IAM user, group, or role at any time in which AWS Config was recording. This information can help determine the permissions that belonged to a user at a specific time.

Use AWS Config to view the configuration of Amazon EC2 security groups and the port rules that were open at a specific time. This information can help determine whether a security group was blocking incoming TCP traffic to a specific port.

AWS Config Rules

An AWS Config rule represents desired configurations for a resource, and it is evaluated against configuration changes on the relevant resources, as recorded by AWS Config. The results of evaluating a rule against the configuration of a resource are available on a dashboard. Using AWS Config rules, customers can assess their overall compliance and risk status from a configuration perspective, view compliance trends over time, and pinpoint which configuration change caused a resource to drift out of compliance with a rule.

A rule represents desired Configuration Item (CI) attribute values for resources, which are evaluated by comparing those attribute values with CIs recorded by AWS Config. There are two types of rules: AWS managed rules and customer managed rules.

AWS Managed Rules

AWS managed rules are prebuilt and managed by AWS. Choose the rule to enable and then supply a few configuration parameters to get started.
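For example, the AWS managed rule that checks whether security groups disallow unrestricted incoming SSH traffic can be enabled with the AWS CLI (a sketch; the rule name is illustrative, and INCOMING_SSH_DISABLED is the managed rule’s source identifier):

  aws configservice put-config-rule --config-rule '{
      "ConfigRuleName": "restricted-ssh",
      "Scope": { "ComplianceResourceTypes": ["AWS::EC2::SecurityGroup"] },
      "Source": { "Owner": "AWS", "SourceIdentifier": "INCOMING_SSH_DISABLED" }
  }'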

Customer Managed Rules

It is possible to develop custom rules and add them to AWS Config. Associate each custom rule with an AWS Lambda function. This Lambda function contains the logic that evaluates whether AWS resources comply with the rule.

Associate this function with a rule, and the rule invokes the function either in response to configuration changes or periodic intervals. The function then evaluates whether resources comply with the rule, and it sends its evaluation results to AWS Config.



How Rules Are Evaluated

Any rule can be set up as a change-triggered rule or as a periodic rule.

A change-triggered rule is executed when AWS Config records a configuration change for any of the resources specified. Additionally, one of the following must be specified:

Tag Key Any configuration change recorded for a resource carrying the specified tag key:value triggers an evaluation of the rule.

Resource Type(s) Any configuration changes recorded for any resource within the specified resource type(s) will trigger an evaluation of the rule.

Resource ID Any changes recorded to the resource specified by the resource type and resource ID will trigger an evaluation of the rule.

A periodic rule is triggered at a specified frequency. Available frequencies are 1 hour, 3 hours, 6 hours, 12 hours, or 24 hours. A periodic rule has a full snapshot of current CIs for all resources available to the rule.

Configuration Items

A CI is the configuration of a resource at a given point in time. A CI consists of five sections:

  • Basic information about the resource that is common across different resource types (for example, ARNs, tags)
  • Configuration data specific to the resource (such as an Amazon EC2 instance type)
  • Map of relationships with other resources (for example, Amazon EC2::Volume vol-6886ff28 is “attached to instance” Amazon EC2 instance i-24601abc)
  • AWS CloudTrail event IDs that are related to this state
  • Metadata that helps identify information about the CI, such as the version of the CI and when the CI was captured

Rule Evaluation

Evaluation of a rule determines whether a resource complies with the rule at a particular point in time. It is the result of evaluating the rule against the configuration of the resource. AWS Config rules capture and store the result of each evaluation. This result includes the resource, the rule, the time of evaluation, and a link to the CI that caused noncompliance.

Rule Compliance

A resource is compliant if it conforms with all rules that apply to it. Otherwise, it is noncompliant. Similarly, a rule is compliant if all resources evaluated by the rule comply with the rule. Otherwise, it is noncompliant.

In some cases, such as when inadequate permissions are available to the rule, an evaluation may not exist for the resource, leading to a state of insufficient data. This state is excluded from determining the compliance status of a resource or rule.

AWS Config and AWS CloudTrail

AWS CloudTrail records user API activity on an account and allows access to information about this activity. You can use AWS CloudTrail to get full details about API actions, such as identity of the caller, the time of the API call, the request parameters, and the response elements returned by the AWS Cloud service.

AWS Config records point-in-time configuration details for AWS resources as CIs. You can use a CI to answer “What did my AWS resource look like?” at a point in time. You can use AWS CloudTrail to answer “Who made an API call to modify this resource?”

In practice, you can use the AWS Config console to detect that a security group was incorrectly configured in the past. With the integrated AWS CloudTrail information, you can find the user that misconfigured the security group and learn when it happened.

Each custom rule is simply an AWS Lambda function. When the function is invoked in order to evaluate a resource, it is provided with the resource’s CI. The function can inspect the item and make calls to other AWS API functions as desired. After the AWS Lambda function makes its decision about compliance, it calls the PutEvaluations function to record the decision.

Pricing

With AWS Config, customers are charged based on the number of CIs recorded for supported resources in an AWS account and are charged only once for recording the CI. There is no additional fee or any upfront commitment for retaining the CI. Users can stop recording CIs at any time and continue to access the CIs previously recorded. Charges per CI are rolled up into the monthly bill.

If you are using AWS Config rules, charges are based on active AWS Config rules in that month. When a rule is compared with an AWS resource, the result is recorded as an evaluation. A rule is active if it has one or more evaluations in a month.

Configuration snapshots and configuration history files are delivered to an Amazon S3 bucket. Configuration change notifications are delivered via Amazon SNS. Standard rates for Amazon S3 and Amazon SNS apply. Customer-managed rules are authored using AWS Lambda. Standard rates for AWS Lambda apply.

Summary

Amazon CloudWatch monitors AWS resources and the applications that run on AWS in real time. Use CloudWatch to collect and track metrics, which are variables used to measure resources and applications.

Amazon CloudWatch Alarms send notifications or automatically make changes to the resources being monitored based on customer-defined rules. This monitoring data can be used to determine whether additional instances should be launched in order to handle the increased load or stop under-utilized instances to save money.

In addition to monitoring the built-in metrics that come with AWS, custom metrics can be imported and monitored. Custom metrics can include detailed information about an Amazon EC2 Instance or data from servers running in an on-premises datacenter.

Amazon CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health.

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of an AWS account. With CloudTrail, customers can log, continuously monitor, and retain events related to API calls across an AWS infrastructure.

AWS CloudTrail provides a history of AWS API calls for an account. This includes API calls made through the AWS Management Console, AWS SDKs, command-line tools, and other AWS services. This history simplifies security analysis, resource change tracking, and troubleshooting.

AWS Config is a service that enables customers to assess, audit, and evaluate the configurations of their AWS resources. Config continuously monitors and records your AWS resource configurations and allows the automation of the evaluation of recorded configurations against desired configurations.

With AWS Config, review changes in configurations and relationships between AWS resources, dive into detailed resource configuration histories, and determine the overall compliance against the configurations specified in internal guidelines. This simplifies compliance auditing, security analysis, change management, and operational troubleshooting.

Resources to Review


Exam Essentials

Be familiar with Amazon CloudWatch. Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications that you run on AWS. You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, and set alarms.

Amazon CloudWatch can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB tables, and Amazon RDS DB instances, as well as custom metrics generated by your applications and services and any log files your applications generate.

You can use Amazon CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health. You can use these insights to react and keep your application running smoothly.

Amazon CloudWatch Events has three components: Events, Rules, and Targets. Events indicate a change in an AWS environment. Targets process events. Rules match incoming events and route them to targets for processing.

Understand what Amazon CloudWatch Logs is and what it can do. Amazon CloudWatch Logs lets you monitor and troubleshoot systems and applications using existing system, application, and custom log files.

With Amazon CloudWatch Logs, monitor logs in near real time for specific phrases, values, or patterns. Log data can be stored and accessed indefinitely in highly durable, low-cost storage without filling hard drives.

Be able to create or edit an Amazon CloudWatch Alarm. You can choose specific metrics to trigger the alarm and specify thresholds for those metrics. You can then set your alarm to change state when a metric exceeds a threshold that you have defined.

Know how to create a monitoring plan. Creating a monitoring plan involves answering some basic questions. What are your goals for monitoring? What resources will you monitor? How often will you monitor these resources? What monitoring tools will you use? Who will perform the monitoring tasks? Who should be notified when something goes wrong?

Know and understand custom metrics. You can now store your business and application metrics in Amazon CloudWatch. You can view graphs, set alarms, and initiate automated actions based on these metrics, just as you can for the metrics that Amazon CloudWatch already stores for your AWS resources.

Visibility for metrics above the Hypervisor requires an agent. Amazon CloudWatch can tell you CPU utilization, disk I/O, and network I/O at the Hypervisor level, but it has no way of knowing memory utilization or which specific tasks or processes are affecting performance. CloudWatch can see disk I/O but cannot see disk usage. Collecting those metrics requires an agent.

Be familiar with what an Amazon CloudWatch Alarm is and how it works. You can create a CloudWatch Alarm that watches a single metric. The alarm performs one or more actions based on the value of the metric relative to a threshold over a number of time periods. The action can be an Amazon EC2 action, an Auto Scaling action, or a notification sent to an Amazon SNS topic.

Know the three states of an Amazon CloudWatch Alarm. These are OK, ALARM, and INSUFFICIENT_DATA. If an alarm is in the OK state, a monitored metric is within the range you have defined as acceptable. If the alarm is in the ALARM state, the metric has breached a threshold. If data is missing or incomplete, it is in the INSUFFICIENT_DATA state.

There are two levels of monitoring: Basic and Detailed. Basic Monitoring for Amazon EC2 sends CPU load, disk I/O, and network I/O metric data to Amazon CloudWatch in five-minute periods by default. To send metric data for an instance to CloudWatch in one-minute periods, enable Detailed Monitoring on the instance. Some services, like Amazon RDS, have Detailed Monitoring on by default.

Amazon CloudWatch performs two types of Amazon EC2 status checks: System and Instance. A System Status Check monitors the AWS systems required to use your instance to ensure that they are working properly. These checks detect problems with your instance that require AWS involvement to repair. An Instance Status Check monitors the software and network configuration of your individual instance. These checks detect problems that require your involvement to repair.

Be familiar with some common metrics used for monitoring. There are many metrics available, and not all of them are tested on the exam. Some are tested, however, and it is a good idea to know the common ones: VolumeQueueLength, DatabaseConnections, DiskQueueDepth, FreeStorageSpace, ReplicaLag, ReadIOPS, WriteIOPS, ReadLatency, WriteLatency, SurgeQueueLength, and SpilloverCount.

Know how to set up an Amazon CloudWatch Event subscription for Amazon RDS. Amazon RDS uses the Amazon Simple Notification Service (Amazon SNS) to provide notification when an Amazon RDS event occurs. These notifications can be in any notification form supported by Amazon SNS for an AWS Region, such as an email, a text message, or a call to an HTTP endpoint. Details can be found here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.html.

Amazon ElastiCache has two engines available: Redis and Memcached. Memcached is multi-threaded, and Redis is single-threaded.

Know the Amazon EBS Volume Status Checks. Degraded and Severely Degraded performance means that the EBS Volume Status Check is in a Warning state and there is some I/O. Stalled or Not Available means that the EBS Volume is in the Impaired state, and there is no I/O.

Learn the metrics SurgeQueueLength and SpilloverCount. When the surge queue is full, spillover occurs: requests are dropped without notifying end users, and the customer experience is negatively impacted.


Exercises

By now you should have set up an account in AWS. If you haven’t, now would be the time to do so. It is important to note that these exercises run in your AWS account and may incur charges.

Use the Free Tier when launching resources. For more information, see https://aws.amazon.com/s/dm/optimization/server-side-test/free-tier/free_np/.

If you have not yet installed the AWS Command Line utilities, refer to Chapter 2, “Working with AWS Cloud Services,” Exercise 2.1 (Linux) or Exercise 2.2 (Windows).

The reference for the AWS CLI can be found at http://docs.aws.amazon.com/cli/latest/reference/.












Review Questions

  1. Which of the following requires a custom Amazon CloudWatch metric to monitor?

    1. Amazon EC2 CPU Utilization
    2. Amazon EC2 Disk IO
    3. Amazon EC2 Memory Utilization
    4. Amazon EC2 Network IO
  2. While using Auto Scaling with an ELB in front of several Amazon EC2 instances, you want to configure Auto Scaling to remove one instance when CPU Utilization is below 20 percent. How is this accomplished?

    1. Configure Amazon CloudWatch Logs to send a notification to the Auto Scaling Group when CPU utilization is less than 20 percent, and configure the Auto Scaling policy to remove the instance.
    2. Configure Amazon CloudWatch to send a notification to the Auto Scaling Group when the aggregated CPU Utilization is less than 20 percent, and configure the Auto Scaling policy to remove the instance.
    3. Monitor the Amazon EC2 instances with Amazon CloudWatch, and use Auto Scaling to remove an instance with scheduled actions.
    4. Configure Amazon CloudWatch to generate an email using Amazon SNS when CPU utilization is less than 20 percent. Log into the console, and lower the desired capacity number inside Auto Scaling to remove the instance.
  3. Your company has configured the custom metric upload with Amazon CloudWatch, and it has authorized employees to upload data using AWS CLI as well as AWS SDK. How can you track API calls made to CloudWatch?

    1. Use AWS CloudTrail to monitor the API calls.
    2. Create an IAM role to allow users who assume the role to view the data using an Amazon S3 bucket policy.
    3. Enable logging with Amazon CloudWatch to capture metrics for the API calls.
    4. Enable detailed monitoring with Amazon CloudWatch.
  4. Of the services listed here, which provide detailed monitoring without extra charges being incurred? (Choose two.)

    1. AWS Auto Scaling
    2. Amazon Route 53
    3. Amazon Elastic MapReduce
    4. Amazon Relational Database Service
    5. Amazon Simple Notification Service
  5. You have configured an ELB Classic Load Balancer to distribute traffic among multiple Amazon EC2 instances. Which of the following will aid troubleshooting efforts related to back-end servers?

    1. HTTPCode_Backend_2XX
    2. HTTPCode_Backend_3XX
    3. HTTPCode_Backend_4XX
    4. HTTPCode_Backend_5XX
  6. What is the minimum time interval for data that Amazon CloudWatch receives and aggregates?

    1. Fifteen seconds
    2. One minute
    3. Three minutes
    4. Five minutes
  7. Using the Free Tier, what is the frequency of updates received by Amazon CloudWatch?

    1. Fifteen seconds
    2. One minute
    3. Three minutes
    4. Five minutes
  8. The type of monitoring automatically available in five-minute periods is called what?

    1. Elastic
    2. Simple
    3. Basic
    4. Detailed
  9. You have created an Auto Scaling Group using the AWS CLI. You now want to enable Detailed Monitoring for this group. How is this accomplished?

    1. Enable Detailed Monitoring from the AWS console.
    2. When creating an alarm on the Auto Scaling Group, Detailed Monitoring is automatically activated.
    3. When creating Auto Scaling Groups using the AWS CLI or API, Detailed Monitoring is enabled for Auto Scaling by default.
    4. Auto Scaling Groups do not support Detailed Monitoring.
  10. There are 10 Amazon EC2 instances running in multiple regions using an internal memory management tool to capture log files and send them to Amazon CloudWatch in US-West-2. Additionally, you are using the AWS CLI to configure CloudWatch to use the same namespace and metric in all regions. Which of the following is true?

    1. Amazon CloudWatch will receive and aggregate statistical data based on the namespace and metric.
    2. Amazon CloudWatch will process the data only for the server that responds first and ignore the other regions.
    3. Amazon CloudWatch will process the statistical data for the most recent response regardless of region and overwrite other data.
    4. Amazon CloudWatch cannot receive data across regions.
  11. You have misconfigured an Amazon EC2 instance’s clock and are sending data to Amazon CloudWatch via the API. Because of the misconfiguration, logs are being sent 60 minutes in the future. Which of the following is true?

    1. Amazon CloudWatch will process the data.
    2. It is not possible to send data from the future.
    3. It is not possible to send data manually to Amazon CloudWatch.
    4. Agents cannot send data for more than 60 minutes in the future.
  12. You have a system that sends data to Amazon CloudWatch every five minutes for tracking/monitoring. Which of these parameters is required as part of the put-metric-data request?

    1. Key
    2. Namespace
    3. Metric Name
    4. Timestamp
  13. To monitor API calls against AWS, use _______________ to capture the history of API requests and use _______________ to respond to operational changes in real time.

    1. AWS Config; Amazon Inspector
    2. AWS CloudTrail; AWS Config
    3. AWS CloudTrail; Amazon CloudWatch Events
    4. AWS Config; AWS Lambda