THE AWS CERTIFIED SYSOPS ADMINISTRATOR - ASSOCIATE EXAM TOPICS COVERED IN THIS CHAPTER MAY INCLUDE, BUT ARE NOT LIMITED TO, THE FOLLOWING:
This chapter covers the collection of metrics with Amazon CloudWatch, which, in conjunction with other AWS Cloud services, can assist in the management, deployment, and optimization of workloads in the cloud.
In addition to performance monitoring, services such as Amazon CloudWatch Logs, AWS Config, AWS Trusted Advisor, and AWS CloudTrail can provide a detailed inventory of provisioned resources for security audits and financial accounting.
Amazon CloudWatch was designed to monitor cloud-based computing. Even so, when using Amazon CloudWatch Logs, systems in an existing, on-premises datacenter can send log information to Amazon CloudWatch for monitoring.
Sometimes, it is important to monitor and view the health of AWS in general. To do this, there are two tools: the AWS Service Health Dashboard and the AWS Personal Health Dashboard. These services display the general status of AWS and provide a personalized view of the performance and availability of provisioned resources.
This chapter covers these topics separately and also highlights how they can work together to maintain a robust environment on AWS.
Everything fails, all the time.
—Werner Vogels
Computing systems are incredibly complex. Effectively troubleshooting them requires easily understood data delivered in real time. Service Level Agreements (SLAs) often require high levels of availability, and a lack of meaningful information can lead to lost time and revenue.
Monitoring provides several major benefits.
Monitoring is the process of observing and recording resource utilization in real time. Alarms are notifications, generated from this information, in response to a predefined condition. Frequently, this condition involves failure, but alarms can also be configured to send notifications when resources are being underutilized and money is being wasted.
Traditional monitoring tools have been designed around on-premises datacenters with the idea that servers are going to be in place for an extended amount of time. Because of this, these tools have difficulty distinguishing between an Amazon Elastic Compute Cloud (Amazon EC2) instance that has failed and one that was terminated purposely. AWS created its own monitoring service, Amazon CloudWatch, that is integrated with other AWS Cloud services and uses AWS Identity and Access Management (IAM) to keep monitoring data secure.
Amazon CloudWatch is a service that monitors the health and status of AWS resources in real time. It provides system-wide visibility into resource utilization, application performance, and operational health by tracking, measuring, reporting, alerting, and reacting to events that occur in an environment.
Amazon CloudWatch Logs collects and monitors log files, can set alarms, and automatically reacts to changes in AWS resources. Logs can be monitored in real time or stored for analysis.
Amazon CloudWatch Alarms monitor a single metric and perform one or more actions based on customer-defined criteria.
Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. These streams are delivered to Amazon EC2 instances, AWS Lambda functions, Amazon Kinesis Streams, Amazon EC2 Container Service (Amazon ECS) tasks, AWS Step Functions state machines, Amazon Simple Notification Service (Amazon SNS) topics, Amazon Simple Queue Service (Amazon SQS) queues, or built-in targets.
AWS CloudTrail monitors calls made to the Amazon CloudWatch Events Application Programming Interface (API) for an account. This includes calls made by the AWS Management Console, the AWS Command Line Interface (AWS CLI), and other AWS Cloud services. When AWS CloudTrail logging is turned on, Amazon CloudWatch Events writes log files to an Amazon Simple Storage Service (Amazon S3) bucket.
AWS Config provides a detailed view of the configuration of AWS resources in an AWS account, including how the resources are related to one another. It also provides historical information to show how configurations and relationships have changed over time. Related to monitoring, AWS Config allows customers to create rules that check the configuration of their AWS resources and check for compliance with an organization’s policies. When an AWS Config rule is triggered, it generates an event that can be captured by Amazon CloudWatch Events.
Amazon CloudWatch can monitor AWS resources, such as Amazon EC2 instances, Amazon DynamoDB tables, Amazon Relational Database Service (Amazon RDS) DB instances, custom metrics generated by applications and services, and log files generated by applications and operating systems.
AWS Trusted Advisor is an online resource designed to help reduce cost, increase performance, and improve security by optimizing an AWS environment. It provides real-time guidance to help provision resources following AWS best practices.
AWS Trusted Advisor checks for best practices in four categories:
The following four AWS Trusted Advisor checks are available at no charge to all AWS customers to help improve security and performance:
The AWS Service Health Dashboard provides access to current status and historical data about every AWS Cloud service. If there’s a problem with a service, it is possible to expand the appropriate line in the details section to get more information.
In addition to the dashboard, it is also possible to subscribe to the RSS feed for any service.
For anyone experiencing a real-time operational issue with one of the AWS Cloud services currently reporting as being healthy, there is a Contact Us link at the top of the page to report an issue.
The AWS Service Health Dashboard is available at http://status.aws.amazon.com/.
The AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that impact customers. While the AWS Service Health Dashboard displays the general status of AWS Cloud services, the AWS Personal Health Dashboard provides a personalized view into the performance and availability of the AWS Cloud services underlying provisioned AWS resources.
The dashboard displays relevant and timely information to help manage events in progress and provides proactive notification to help plan scheduled activities. Alerts are automatically triggered by changes in the health of AWS resources and provide event visibility and also guidance to help diagnose and resolve issues quickly.
Now let’s take a deep dive into each of these services.
Amazon CloudWatch monitors, in real time, AWS resources and applications running on AWS. Amazon CloudWatch is used to collect and track metrics, which are variables used to measure resources and applications. Amazon CloudWatch Alarms send notifications and can automatically make changes to the resources being monitored based on user-defined rules. Amazon CloudWatch is basically a metrics repository. An AWS product such as Amazon EC2 puts metrics into the repository and customers retrieve statistics based on those metrics. Additionally, custom metrics can be placed into Amazon CloudWatch for reporting and statistical analysis.
For example, it is possible to monitor the CPU usage and disk reads and writes of Amazon EC2 instances. With this information, it is possible to determine when additional instances should be launched to handle the increased load. Additionally, these new instances can be launched automatically before there is a problem, eliminating the need for human intervention. Conversely, monitoring data can be used to stop underutilized instances automatically in order to save money.
In addition to monitoring the built-in metrics that come with AWS, it is possible to create, monitor, and trigger actions using custom metrics. Amazon CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health. Figure 9.1 illustrates how Amazon CloudWatch connects to both AWS and on-premises environments.
Amazon CloudWatch can be accessed using the following methods:
At the core of Amazon CloudWatch are metrics: time-ordered sets of data points that contain information about the performance of resources. By default, several services provide some metrics at no additional charge, including metrics from Amazon EC2 instances, Amazon Elastic Block Store (Amazon EBS) volumes, and Amazon RDS DB instances. It is also possible to enable detailed monitoring for some resources, such as Amazon EC2 instances.

In addition to monitoring AWS resources, Amazon CloudWatch can be used to monitor data produced by applications, scripts, and services. A custom metric is any metric provided to Amazon CloudWatch via an agent or an API. Custom metrics can be used to monitor the time it takes to load a web page, capture request error rates, monitor the number of processes or threads on an instance, or track the amount of work performed by an application. Custom metrics come at an additional cost based on usage.

In November 2016, Amazon CloudWatch changed the length of time metrics are stored inside the service. If metrics need to be available for longer than the retention period, they can be retrieved and archived using the GetMetricStatistics API call. Metrics cannot be deleted; they automatically expire after 15 months if no new data is published to them.

A namespace is a container for a collection of Amazon CloudWatch metrics. Each namespace is isolated from the others. This isolation ensures that data is collected only from the services specified and prevents different applications from mistakenly aggregating the same statistics. There are no default namespaces: when creating a custom metric, a namespace is required, and if the specified namespace does not exist, Amazon CloudWatch will create it. Namespace names must contain valid XML characters and be fewer than 256 characters in length.
AWS namespaces use the naming convention AWS/service. For example, Amazon EC2 uses the AWS/EC2 namespace. A small sample of AWS namespaces is shown in Table 9.1.

TABLE 9.1 A Small Sample of AWS Namespaces

A dimension is a name/value pair that uniquely identifies a metric and further clarifies the metric data stored. A metric can have up to 10 dimensions. Every metric has specific characteristics that describe it; think of dimensions as categories or metadata for those characteristics. These categories can aid in the design of a structure for a statistics plan.

Because dimensions are part of the unique identifier for a metric, whenever a unique name/value pair is added to a metric, a new metric is created. Dimensions can be used to filter the results from Amazon CloudWatch. For example, it is possible to get statistics for a specific Amazon EC2 instance by specifying the instance ID dimension when doing a search.

Amazon CloudWatch treats each unique combination of dimensions as a separate metric, even if the metrics use the same metric name. It is not possible to retrieve statistics using combinations of dimensions that have not been specifically published. When retrieving statistics, specify the same values for the namespace, metric name, and dimension parameters that were used when the metrics were created. The start and end times can also be specified for Amazon CloudWatch to use for aggregation.

To illustrate, consider four distinct metrics named ServerStats in the DataCenterMetric namespace, each published with a different combination of dimensions. If those four metrics are the only ones that have been published, statistics can be retrieved only for the exact combinations of dimensions that were published; other combinations, even subsets of the published dimensions, return no statistics.
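As a rough sketch of the identity rule just described (plain Python, not an AWS SDK call; the Server and Domain dimension names are illustrative), each unique combination of namespace, metric name, and dimensions acts as the key for a separate metric:

```python
from collections import defaultdict

def metric_key(namespace, name, dimensions):
    """Build the identity CloudWatch uses: dimensions are part of the key,
    so adding a new name/value pair yields a brand-new metric."""
    return (namespace, name, frozenset(dimensions.items()))

store = defaultdict(list)

# Same metric name, different dimensions: two separate metrics.
store[metric_key("DataCenterMetric", "ServerStats",
                 {"Server": "Prod", "Domain": "Frankfurt"})].append(105)
store[metric_key("DataCenterMetric", "ServerStats",
                 {"Server": "Beta", "Domain": "Frankfurt"})].append(115)

# Retrieval must specify exactly the dimensions that were published.
prod = store[metric_key("DataCenterMetric", "ServerStats",
                        {"Server": "Prod", "Domain": "Frankfurt"})]
```

Looking up `ServerStats` with only `{"Server": "Prod"}` would miss the data entirely, which mirrors why partial dimension combinations cannot be queried in CloudWatch.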
Statistics are metric data aggregations over specified periods of time. Amazon CloudWatch provides statistics based on the metric data points provided by custom data or by other AWS services. Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure within the time period specified. Available CloudWatch statistics are provided in Table 9.2.

TABLE 9.2 Available CloudWatch Statistics
Pre-calculated statistics can be added to Amazon CloudWatch. Instead of data point values, specify values for SampleCount, Minimum, Maximum, and Sum, and Amazon CloudWatch calculates the average for you. Values added in this way are aggregated with any other values associated with the matching metric.

Each statistic has a unit of measure. Example units include bytes, seconds, count, and percent. A unit can be specified when creating a custom metric; if one is not specified, Amazon CloudWatch uses None as the unit. Units provide conceptual meaning to data. Metric data points that specify a unit of measure are aggregated separately. When getting statistics without specifying a unit, Amazon CloudWatch aggregates all data points of the same unit together. If there are two identical metrics with different units, two separate data streams are returned, one for each unit.

A period is the length of time associated with a specific Amazon CloudWatch statistic. Each statistic represents an aggregation of the metric data collected for a specified period of time. Although periods are expressed in seconds, the minimum granularity for a period is one minute. Because of this minimum granularity, period values are expressed as multiples of 60. By varying the length of the period, the data aggregation can be adjusted.

When retrieving statistics, specify a period, a start time, and an end time. These parameters determine the overall length of time associated with the collected statistic. Default values for the start time and end time return statistics from the past hour. The values specified for the start time and end time determine how many periods Amazon CloudWatch will return.

Periods are also important for Amazon CloudWatch Alarms. When creating an alarm to monitor a specific metric, Amazon CloudWatch compares that metric to a specified threshold value. Customers have extensive control over how Amazon CloudWatch makes comparisons.
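The pre-calculated form described above (SampleCount, Minimum, Maximum, Sum) can be sketched in a few lines of plain Python; the function name is illustrative, not part of any AWS SDK:

```python
def to_statistic_set(values):
    """Pre-aggregate raw data points into the statistic set shape that
    CloudWatch accepts in place of individual values."""
    if not values:
        raise ValueError("a statistic set needs at least one data point")
    return {
        "SampleCount": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
    }

# CloudWatch derives the average from the set: Sum / SampleCount.
latencies_ms = [12.0, 15.0, 9.0, 24.0]
stat_set = to_statistic_set(latencies_ms)
average = stat_set["Sum"] / stat_set["SampleCount"]
```

Publishing one small set like this instead of four individual data points is exactly the bandwidth saving that statistic sets (covered below) provide for data collected many times a minute.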
In addition to the period length, the number of evaluation periods can be specified as well. For example, if three evaluation periods are specified, Amazon CloudWatch compares a window of three data points. Amazon CloudWatch only sends a notification if the oldest data point is breaching and the others are breaching or missing.

Amazon CloudWatch aggregates statistics according to the period length specified when retrieving statistics. Multiple data points with the same or similar timestamps can be published, and Amazon CloudWatch aggregates them by period length. Data points for a metric that share the same timestamp, namespace, and dimensions can be published, and Amazon CloudWatch will return aggregated statistics for them. It is also possible to publish multiple data points for the same or different metrics with any timestamp.

For large datasets, a pre-aggregated dataset called a statistic set can be inserted. With statistic sets, give Amazon CloudWatch the Min, Max, Sum, and SampleCount for a number of data points. This is commonly used for data that needs to be collected many times in a minute.

Amazon CloudWatch doesn't differentiate the source of a metric. If a metric is published with the same namespace and dimensions from different sources, Amazon CloudWatch treats it as a single metric. This can be useful for service metrics in a distributed, scaled system.

Amazon CloudWatch dashboards are customizable pages in the Amazon CloudWatch console that can be used to monitor resources in a single view. Monitored resources can be in a single region or in multiple regions. Use Amazon CloudWatch dashboards to create customized views of the metrics and alarms for AWS resources.

A percentile indicates the relative standing of a value in a dataset and is often used to isolate anomalies. For example, the 95th percentile means that 95 percent of the data is below this value and 5 percent of the data is above this value.
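The percentile definition above can be made concrete with a short Python sketch using nearest-rank selection (one common convention; CloudWatch's own interpolation may differ in edge cases):

```python
import math

def percentile(values, p):
    """Return the value at or below which p percent of the data falls,
    using nearest-rank selection on the sorted data."""
    if not values:
        raise ValueError("percentile of an empty dataset is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# With 100 latency samples of 1..100 ms, the 95th percentile is 95 ms:
# 95 percent of the samples fall at or below it.
samples = list(range(1, 101))
p95 = percentile(samples, 95)
```

Alarming on p95 or p99 instead of the average is what lets the outlier tail show through, since a handful of slow requests barely moves an average.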
Percentiles help provide a better understanding of the distribution of metric data and can be used with several AWS services.

Monitoring is an important part of maintaining the reliability, availability, and performance of Amazon EC2 instances. Collect monitoring data from all parts of an AWS solution so that any multi-point failure can be debugged easily. To monitor an environment effectively, have a plan that defines the monitoring goals.

After the monitoring goals have been defined and the monitoring plan has been created, the next step is to establish a baseline for normal Amazon EC2 performance. Measure Amazon EC2 performance at various times and under different load conditions, and store a history of the collected monitoring data. Over time, the historical data can be compared to current data to identify normal performance patterns as well as anomalies. For example, if monitoring shows that CPU utilization, disk I/O, or network utilization for an Amazon EC2 instance has fallen outside of an established baseline, reconfigure or optimize the instance to reduce CPU utilization, improve disk I/O, or reduce network traffic (see Table 9.3).

TABLE 9.3 Establishing an Amazon EC2 Baseline

Inside Amazon EC2, there are two types of status checks: system status checks and instance status checks.

System status checks monitor the AWS systems required to use an instance in order to ensure that they are working properly. These checks detect problems on the hardware that an instance is using. When a system status check fails, there are three possible courses of action: wait for AWS to fix the issue; if the instance boots from an Amazon EBS volume, stop and start the instance to move it to new hardware; or, if the instance uses an instance store volume, terminate it and start a new one on new hardware.
Instance status checks monitor the software and network configuration of individual instances. These checks detect problems that require user involvement to repair. When an instance status check fails, it can often be fixed with a reboot or by reconfiguring the Amazon EC2 instance.

Access to Amazon CloudWatch requires credentials, and those credentials must have permissions to access AWS resources, such as viewing the Amazon CloudWatch console or retrieving Amazon CloudWatch metric data. Every AWS resource is owned by an AWS account, and permissions to create or access a resource are governed by permissions policies. An account administrator can attach permissions policies to IAM identities: users, groups, and roles. For a user to work with the Amazon CloudWatch console, there is a minimum set of permissions required to allow that user to describe other AWS resources in the AWS account.

AWS provides standalone IAM policies that cover many common use cases. These policies are created and administered by AWS and grant the required permissions for services without customers having to create and maintain their own. Customers can also create their own IAM policies to allow permissions for Amazon CloudWatch actions and resources and attach these custom policies to the IAM users or groups that require them.

Amazon CloudWatch has no specific resources that can be controlled. As a result, there are no Amazon CloudWatch Amazon Resource Names (ARNs) to use in an IAM policy. For example, it is not possible to give a user access to Amazon CloudWatch data for a specific set of Amazon EC2 instances or a specific load balancer.
When writing an IAM policy, use an asterisk (*) as the resource name to control access to Amazon CloudWatch actions.

The following AWS Cloud services support Amazon CloudWatch without additional charges. Customers can choose which of the preselected metrics they want to use.

Auto Scaling groups: Seven preselected metrics at a one-minute frequency
Elastic Load Balancing: Thirteen preselected metrics at a one-minute frequency
Amazon Route 53 health checks: One preselected metric at a one-minute frequency
Amazon EBS Provisioned IOPS (Solid State Drive [SSD]) volumes: Ten preselected metrics at a one-minute frequency
Amazon EBS General Purpose (SSD) volumes: Ten preselected metrics at a one-minute frequency
Amazon EBS Magnetic volumes: Eight preselected metrics at a five-minute frequency
AWS Storage Gateway: Eleven preselected gateway metrics and five preselected storage volume metrics at a five-minute frequency
Amazon CloudFront: Six preselected metrics at a one-minute frequency
Amazon DynamoDB tables: Seven preselected metrics at a five-minute frequency
Amazon ElastiCache nodes: Thirty-nine preselected metrics at a one-minute frequency
Amazon RDS DB instances: Fourteen preselected metrics at a one-minute frequency
Amazon EMR job flows: Twenty-six preselected metrics at a five-minute frequency
Amazon Redshift: Sixteen preselected metrics at a one-minute frequency
Amazon SNS topics: Four preselected metrics at a five-minute frequency
Amazon SQS queues: Eight preselected metrics at a five-minute frequency
AWS OpsWorks: Fifteen preselected metrics at a one-minute frequency
Amazon CloudWatch Logs: Six preselected metrics at a one-minute frequency
Estimated charges on your AWS bill: It is also possible to enable metrics that monitor AWS charges. The number of metrics depends on the AWS products and services used. These metrics are offered at no additional charge.

Table 9.4 lists Amazon CloudWatch limits.
TABLE 9.4 Amazon CloudWatch Limits

Amazon CloudWatch Alarms are used to initiate an action automatically in response to a predefined condition. An alarm watches a single metric over a specified time period and, based on the value of that metric relative to a threshold over time, performs one or more specified actions. Those actions include triggering an Auto Scaling policy, publishing to an Amazon SNS topic, and updating a dashboard.

Alarms trigger actions only after sustained state changes. Amazon CloudWatch Alarms are not generated simply because a metric is in a particular state; the state must change and be maintained for a user-specified number of periods. An Amazon CloudWatch Alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA. Actions are set to respond to the transition of the metric as it moves into each of the three states. Actions happen only on state transitions and are not re-executed if the condition persists. Multiple actions are allowed for an alarm: if an alarm is triggered, Amazon SNS could be used to send a notification email while, at the same time, an Auto Scaling policy is updated.

In short, an Amazon CloudWatch Alarm is triggered when a monitored metric breaches its threshold, is reported that way multiple times in a row, and stays that way for the specified number of periods.

In Figure 9.2, the alarm threshold is set to three units and the alarm is evaluated over three periods. The alarm goes to the ALARM state if the oldest of the three periods evaluated has matched the alarm criteria and the next two periods have met the criteria or are missing. In the figure, this happens in the third through fifth time periods, when the alarm's state is set to ALARM. At period six, the value drops below the threshold and the state reverts to OK. Later, during the ninth time period, the threshold is breached again, but only for one period; because of this, the alarm state remains OK.
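The evaluation rule described above can be sketched in plain Python (no AWS SDK; the three point states are simplified placeholders, and real CloudWatch evaluation has further nuances such as configurable missing-data treatment):

```python
BREACHING, OK, MISSING = "breaching", "ok", "missing"

def alarm_fires(window):
    """Evaluate one window of data points, oldest first: the alarm fires
    only if the oldest data point is breaching and every newer point is
    breaching or missing."""
    if not window:
        return False
    oldest, rest = window[0], window[1:]
    return oldest == BREACHING and all(p in (BREACHING, MISSING) for p in rest)

# Three evaluation periods: a single spike does not fire the alarm,
# but three sustained breaches do, and a gap in the data does not
# rescue an otherwise-breaching window.
assert not alarm_fires([OK, BREACHING, OK])
assert alarm_fires([BREACHING, BREACHING, BREACHING])
assert alarm_fires([BREACHING, MISSING, BREACHING])
```

This is why the one-period spike in the ninth time period of Figure 9.2 leaves the alarm in the OK state: the oldest point in that three-period window is not breaching.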
Figure 9.2 shows this as a graph.

Alarms can also be added to dashboards. When an alarm is on a dashboard, it turns red while it is in the ALARM state, making it easier to monitor its status.

Just as each alarm is always in one of three states, each data point reported to Amazon CloudWatch falls into one of three categories: good (within the threshold), bad (violating the threshold), or missing. Customers can specify how alarms handle missing data points. The best choice depends on the type of metric. For a metric that continually reports data, such as the CPUUtilization of an instance, it might be best to treat missing data points as bad, because their absence indicates that something is wrong. For a metric that generates data points only when an error occurs, such as ThrottledRequests in Amazon DynamoDB, missing data points should be treated as good. Choosing the best option for an alarm prevents unnecessary and misleading alarm state changes and more accurately indicates the health of a system.

There are hundreds of metrics available for monitoring on AWS. The common ones are listed here, broken down by service, with a brief explanation. As we have mentioned throughout, this book is designed to do more than just prepare you for an exam; it should serve you well as a day-to-day guide for working with AWS.

There are two types of Amazon EC2 status checks: system status checks and instance status checks.

System status checks monitor AWS hardware to ensure that instances are working properly. These checks detect problems with an instance that require AWS involvement to repair. When a system status check fails, customers can choose to wait for AWS to fix the issue, or they can resolve it themselves by stopping and starting the instance or by terminating and replacing it.

Instance status checks monitor the software and network configuration of an individual instance.
These checks detect problems that require customer involvement to repair. When an instance status check fails, typical solutions include rebooting or reconfiguring the instance.

The following Amazon CloudWatch metrics offer insight into the usage and utilization of Amazon EC2 instances. For Amazon EC2, common metrics include the following:

CPUUtilization This metric is the percentage of allocated Amazon EC2 compute units that are currently in use on an instance. It identifies the processing power required to run an application on a selected instance.

NetworkIn This metric is the number of bytes received on all network interfaces by the instance. It identifies the volume of incoming network traffic to a single instance. The number reported is the total bytes received during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second; with Detailed Monitoring, divide it by 60.

NetworkOut This metric is the number of bytes sent out on all network interfaces by the instance. It identifies the volume of outgoing network traffic from a single instance. The number reported is the total bytes sent during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second; with Detailed Monitoring, divide it by 60.

DiskReadOps This metric reports the completed read operations from all instance store volumes available to the instance in a specified period of time. To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period.

DiskWriteOps This metric reports the completed write operations to all instance store volumes available to the instance in a specified period of time.
To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period.

DiskReadBytes This metric reports the number of bytes read from all instance store volumes available to the instance. It is used to determine the volume of data the application reads from the hard disk of the instance, which can help gauge the speed of the application. The number reported is the total bytes read during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second; with Detailed Monitoring, divide it by 60.

DiskWriteBytes This metric reports the number of bytes written to all instance store volumes available to the instance. It is used to determine the volume of data the application writes onto the hard disk of the instance, which can help gauge the speed of the application. The number reported is the total bytes written during the period. When using Basic Monitoring, divide this number by 300 to find bytes/second; with Detailed Monitoring, divide it by 60.

Amazon Elastic Block Store (Amazon EBS) sends data points to Amazon CloudWatch for several metrics. Amazon EBS General Purpose SSD (gp2), Throughput Optimized HDD (st1), Cold HDD (sc1), and Magnetic (standard) volumes automatically send five-minute metrics to CloudWatch. Provisioned IOPS SSD (io1) volumes automatically send one-minute metrics to CloudWatch. Common Amazon EBS metrics include the following:

VolumeReadBytes and VolumeWriteBytes These metrics provide information on the I/O operations in a specified period of time. The Sum statistic reports the total number of bytes transferred during the period. The Average statistic reports the average size of each I/O operation during the period. The SampleCount statistic reports the total number of I/O operations during the period. The Minimum and Maximum statistics are not relevant for these metrics.
VolumeReadOps and VolumeWriteOps These metrics report the total number of I/O operations in a specified period of time. To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period.

VolumeTotalReadTime and VolumeTotalWriteTime These metrics report the total number of seconds spent by all operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 5 minutes (300 seconds), if 700 operations completed during that period and each operation took 1 second, the value would be 700 seconds.

VolumeIdleTime This metric represents the total number of seconds in a specified period of time when no read or write operations were submitted.

VolumeQueueLength This metric is the number of read and write operation requests waiting to be completed in a specified period of time.

VolumeThroughputPercentage This metric is used with Provisioned IOPS SSD volumes only. It is the percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume. Provisioned IOPS SSD volumes deliver within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.

VolumeConsumedReadWriteOps This metric is used with Provisioned IOPS SSD volumes only. It is the total amount of read and write operations (normalized to 256K capacity units) consumed in a specified period of time. I/O operations smaller than 256K each count as 1 consumed IOPS; I/O operations larger than 256K are counted in 256K capacity units. For example, a 1024K I/O would count as 4 consumed IOPS.

BurstBalance This metric is used only with General Purpose SSD (gp2), Throughput Optimized HDD (st1), and Cold HDD (sc1) volumes.
It provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket. Data is reported to Amazon CloudWatch only while the volume is active; if the volume is not attached, no data is reported.

Volume status checks help customers understand, track, and manage potential inconsistencies in the data on an Amazon EBS volume. They are designed to provide the information needed to determine whether an Amazon EBS volume is impaired and to help customers control how a potentially inconsistent volume is handled.

Volume status checks are automated tests that run every five minutes and return a pass or fail status. If all checks pass, the status of the volume is ok. If a check fails, the status of the volume is impaired. If the status is insufficient-data, the checks may still be in progress on the volume. There are four status types for Provisioned IOPS EBS volumes: ok, warning, impaired, and insufficient-data.

ok This status means that the volume is performing as expected.

warning This status means that the volume is either Degraded or Severely Degraded. Degraded means that volume performance is below expectations; Severely Degraded means that volume performance is well below expectations.

impaired This status means that the volume has either Stalled or is Not Available. Stalled means that volume performance is severely impacted; Not Available means that I/O performance cannot be determined because I/O is disabled.

insufficient-data This status means that not enough data points have been collected, but the volume is online.

The following Amazon CloudWatch metrics offer insight into Amazon ElastiCache performance. In most cases, the recommendation is to set CloudWatch Alarms for these metrics in order to take corrective action before performance issues occur.
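The divide-by-period arithmetic described for the Amazon EC2 and Amazon EBS metrics above can be sketched with two small helpers (plain Python, names illustrative):

```python
def per_second(total, period_seconds):
    """Convert a per-period Sum (bytes or operations) into a rate.
    Basic Monitoring reports 300-second periods and Detailed Monitoring
    60-second periods, hence the divide-by-300 / divide-by-60 rule."""
    return total / period_seconds

def consumed_iops(io_size_kib):
    """Normalize one I/O to 256K capacity units, as described for
    VolumeConsumedReadWriteOps: anything up to 256K counts as 1 consumed
    IOPS, and larger I/Os count in 256K units (ceiling division)."""
    return max(1, -(-io_size_kib // 256))

# NetworkIn of 1,500,000 bytes under Basic Monitoring: 5,000 bytes/second.
rate = per_second(1_500_000, 300)

# A 1024K I/O counts as 4 consumed IOPS; a 100K I/O counts as 1.
big, small = consumed_iops(1024), consumed_iops(100)
```

The same `per_second` arithmetic gives average IOPS from DiskReadOps or VolumeReadOps: divide the period's total operations by the period length in seconds.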
For Amazon ElastiCache, common metrics include the following:

CPUUtilization This is a host-level metric reported as a percentage. Administrators must determine their own threshold value based on the number of cores in the cache node being used. If this threshold is exceeded and the main workload is from read requests, scale the cache cluster out by adding read replicas. If the main workload is from write requests, AWS recommends scaling up by using a larger cache instance type.

SwapUsage This is a host-level metric reported in bytes.

Evictions This metric is published for both Memcached and Redis cache clusters. AWS recommends that customers determine their own alarm thresholds for this metric based on application needs.

CurrConnections This is a cache engine metric published for both Memcached and Redis cache clusters. AWS recommends that customers determine their own alarm thresholds for this metric based on application needs. Whether running Memcached or Redis, an increasing number of CurrConnections might indicate a problem with an application.

When using Amazon RDS resources, Amazon RDS sends metrics and dimensions to Amazon CloudWatch every minute. Common metrics include the following:

DatabaseConnections This metric is a count of the number of database connections in use.

DiskQueueDepth This metric is a count of the number of outstanding I/O operations waiting to access the disk.

FreeStorageSpace This metric, measured in bytes, is the amount of available storage space.

ReplicaLag This metric, measured in seconds, is the amount of time a Read Replica DB instance lags behind the source DB instance. It applies to MySQL, MariaDB, and PostgreSQL Read Replicas.

ReadIOPS This metric is the average number of disk read I/O operations per second.

WriteIOPS This metric is the average number of disk write I/O operations per second.

ReadLatency This metric, measured in seconds, is the average amount of time taken per disk read I/O operation.
WriteLatency This metric, measured in seconds, is the average amount of time taken per disk write I/O operation.

Elastic Load Balancing reports metrics to Amazon CloudWatch only when requests are flowing through the load balancer, measuring and sending them in 60-second intervals. If there are no requests flowing through the load balancer, or no data for a metric, the metric is not reported. Common metrics reported to Amazon CloudWatch from an ELB include the following:

BackendConnectionErrors This metric is the number of connections that were not successfully established between the load balancer and the registered instances. Because the load balancer retries the connection when there are errors, this count can exceed the request rate. This count also includes any connection errors related to health checks.

HealthyHostCount This metric is the number of healthy instances registered with a load balancer. A newly registered instance is considered healthy after it passes the first health check. If cross-zone load balancing is enabled, the number of healthy instances for the LoadBalancerName dimension is calculated across all Availability Zones; otherwise, it is calculated per Availability Zone.

UnHealthyHostCount This metric is the number of unhealthy instances registered with a load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks. An unhealthy instance is considered healthy again after it meets the healthy threshold configured for health checks.

RequestCount This metric is the number of requests completed or connections made during the specified interval, which is either one or five minutes.

Latency This metric represents the elapsed time, in seconds, between when a request is sent to an instance and when the reply is received.
HTTPCode_Backend_2XX This metric is the number of HTTP 2XX response codes generated by registered instances. 2XX status codes report success: the action was successfully received, understood, and accepted. This count does not include any response codes generated by the load balancer.

HTTPCode_Backend_3XX This metric is the number of HTTP 3XX response codes generated by registered instances. 3XX status codes report redirection: further action must be taken in order to complete the request.

HTTPCode_Backend_4XX This metric is the number of HTTP 4XX response codes generated by registered instances. 4XX status codes report client errors: the request contains bad syntax or cannot be fulfilled.

HTTPCode_Backend_5XX This metric is the number of HTTP 5XX response codes generated by registered instances. 5XX status codes report server errors: the server failed to fulfill an apparently valid request.

HTTPCode_ELB_4XX This metric is the number of HTTP 4XX client error codes generated by the load balancer. Client errors are generated when a request is malformed or incomplete. This error is generated by the ELB.

HTTPCode_ELB_5XX This metric is the number of HTTP 5XX server error codes generated by the load balancer. This count does not include any response codes generated by the registered instances. The metric is reported if there are no healthy instances registered to the load balancer, or if the request rate exceeds the capacity of the instances (spillover) or the load balancer. This error is generated by the ELB.

SpilloverCount This metric is the total number of requests that were rejected because the surge queue is full.

SurgeQueueLength This metric is the total number of requests that are pending routing. The load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request. The maximum size of the queue is 1,024; additional requests are rejected when the queue is full.
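As one hedged example of acting on these ELB metrics, the parameters for a CloudWatch alarm on SurgeQueueLength could be assembled as below and passed to boto3's `put_metric_alarm`. The alarm name, load balancer name, and SNS topic ARN are placeholders:

```python
def surge_queue_alarm(load_balancer_name, topic_arn, threshold=512):
    """Build put_metric_alarm parameters that fire well before the
    1,024-request surge queue limit is reached and spillover begins."""
    return {
        "AlarmName": f"{load_balancer_name}-surge-queue-depth",
        "Namespace": "AWS/ELB",
        "MetricName": "SurgeQueueLength",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }

# cloudwatch = boto3.client("cloudwatch")
# cloudwatch.put_metric_alarm(**surge_queue_alarm("my-elb", "arn:aws:sns:..."))
```

The threshold of 512 is an illustrative starting point, not an AWS recommendation; pick a value that leaves headroom below the 1,024 queue maximum.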
Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using relatively simple rules, it is possible to route events to one or more targets for processing. Think of Amazon CloudWatch Events as the central nervous system for an AWS environment. It is connected to supported AWS Cloud services, and it becomes aware of operational changes as they happen. Then, driven by rules, it sends messages and activates functions in response to those changes.

An event indicates a change in an AWS environment, and it can be generated in four different ways: AWS resources can generate events when their state changes; AWS CloudTrail publishes events from API calls; custom application-level events can be created and published to Amazon CloudWatch Events; and scheduled events are generated on a periodic basis. For example, Amazon EC2 generates an event when the state of an Amazon EC2 instance changes from pending to running, and Auto Scaling generates events when it launches or terminates instances. Amazon CloudWatch Events can also be used to schedule actions that trigger at certain times using cron or rate expressions. All scheduled events use the Coordinated Universal Time (UTC) time zone and a minimum precision of one minute.

A rule matches incoming events and routes them to targets for processing. A single rule can route to multiple targets, which are processed in parallel. This enables different parts of an organization to look for and process the events that are of interest to them. A rule can customize the JSON sent to the target by passing only certain parts or by overwriting it with a constant. Because rules sent to multiple targets are processed in parallel, their order is lost.

A target processes data in JSON format that has been sent to it from Amazon CloudWatch Events.
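The two kinds of triggers just described can be sketched as follows: an event pattern matching EC2 instances entering the running state, and schedule expressions in cron and rate form. The rule names in the comments are hypothetical; with boto3, these values would be passed to `put_rule`:

```python
import json

# Event pattern: match Amazon EC2 instance state changes to "running".
ec2_running_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

# Scheduled rules use cron or rate expressions, always evaluated in UTC.
schedule_expression = "rate(5 minutes)"   # every five minutes
nightly_expression = "cron(0 2 * * ? *)"  # 02:00 UTC daily

# events = boto3.client("events")
# events.put_rule(Name="ec2-running",
#                 EventPattern=json.dumps(ec2_running_pattern))
# events.put_rule(Name="nightly", ScheduleExpression=nightly_expression)
print(json.dumps(ec2_running_pattern))
```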
Amazon CloudWatch Events delivers a near real-time stream of system events to one or more target functions or streams for analysis. These targets include the following:

Amazon CloudWatch Events sends metrics to Amazon CloudWatch every minute. The AWS/Events namespace includes the metrics shown in Table 9.5.

TABLE 9.5 Amazon CloudWatch Events Metrics

Amazon CloudWatch Events metrics use a single dimension: RuleName. As the name implies, it filters available metrics by rule name.

Amazon CloudWatch Logs can be used to monitor, store, and access log files from Amazon EC2 instances, AWS CloudTrail, and servers running in an on-premises datacenter. It is then possible to retrieve and report on the associated log data from Amazon CloudWatch Logs. Amazon CloudWatch Logs can monitor and store application logs, system logs, web server logs, and other custom logs. By setting alarms on these metrics, notifications can be generated for application or web server issues, and the necessary actions can be taken.

Amazon CloudWatch Logs is made up of several components:

Log Agents A log agent directs logs to Amazon CloudWatch. Under the shared responsibility model, AWS does not have visibility above the hypervisor, so agents must send the data into Amazon CloudWatch.

Log Events A log event is an activity reported to the log file by the operating system or application, along with a timestamp. Log events support only text format and contain two properties: the timestamp of when the event occurred and the raw log message. By default, any line that begins with a non-whitespace character closes the previous log message and starts a new one.

Log Streams A log stream is a group of log events reported by a single source, such as a web server.

Log Groups A log group is a group of log streams from multiple resources, such as a group of web servers managing the same content. Retention policies and metric filters are set on log groups, not log streams.
Metric Filters Metric filters tell Amazon CloudWatch how to extract metric observations from ingested log events and turn them into Amazon CloudWatch metrics. For example, a metric filter named 404_Error could scan log events for 404 access errors, and an alarm could then be created to monitor those 404 errors across different servers.

Retention Policies Retention policies determine how long events are retained inside Amazon CloudWatch Logs. Policies are assigned to log groups and applied to all of the log streams in the group. Retention time can be set from 1 day to 10 years, or logs can be set to never expire.

All log events uploaded to Amazon CloudWatch are retained, and it is possible to specify the retention duration. Data is compressed, put into an archive, and stored. Charges are incurred for storage of the archived data.

Amazon CloudWatch Logs can ingest logs from sources external to AWS. This means that log data from an on-premises datacenter can be sent to Amazon CloudWatch Logs for reporting. Use Amazon CloudWatch Logs to monitor applications and systems using log data. For example, Amazon CloudWatch Logs can track the number of errors that occur in application logs and send a notification whenever the rate of errors exceeds a threshold. Because Amazon CloudWatch Logs uses existing log data for monitoring, no code changes are required.

The current time is used for each log event if datetime_format isn't provided. If the provided datetime_format is invalid for a given log message, the timestamp from the last log event with a successfully parsed timestamp is used. If no previous log events exist, the current time is used. A warning message is logged when a log event falls back to the current time or the time of a previous log event. Timestamps are used for retrieving log events and generating metrics, so if the wrong format is specified, log events could become non-retrievable and generate inaccurate metrics.
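The timestamp fallback behavior just described can be mimicked in Python. This is a simplified sketch of the logic, not the agent's actual code, and it assumes the timestamp is a fixed-width field at the start of each message:

```python
from datetime import datetime

def event_timestamp(message, datetime_format, last_parsed=None):
    """Resolve a log event's timestamp the way the agent does: parse with
    datetime_format; on failure, fall back to the timestamp of the last
    successfully parsed event, and finally to the current time."""
    if datetime_format:
        try:
            # Assume a fixed-width timestamp at the start of the message.
            width = len(datetime.now().strftime(datetime_format))
            return datetime.strptime(message[:width], datetime_format)
        except ValueError:
            pass  # the real agent logs a warning at this point
    return last_parsed or datetime.now()

ts = event_timestamp("2017-03-01 12:00:00 GET /index.html", "%Y-%m-%d %H:%M:%S")
print(ts.year, ts.hour)  # 2017 12
```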
An agent is required to publish log data to Amazon CloudWatch Logs because AWS has no visibility above the hypervisor. There are agents available for Linux and Windows. Agents have the following components:

The Amazon CloudWatch Logs agent requires Python version 2.6, 2.7, 3.0, or 3.3 and any of the following versions of Linux:

The Amazon CloudWatch Logs agent requires permission for the CreateLogGroup, CreateLogStream, DescribeLogStreams, and PutLogEvents operations. With the latest agent, DescribeLogStreams is no longer needed. Here is a sample IAM policy for using an agent:

Starting with the Amazon Linux Amazon Machine Image (AMI) 2014.09, the Amazon CloudWatch Logs agent is available as a Red Hat Package Manager (RPM) installation with the awslogs package. Earlier versions of Amazon Linux can access the awslogs package by updating their instance with the sudo yum update -y command. By installing the awslogs package as an RPM instead of using the Amazon CloudWatch Logs installer, instances receive regular package updates and patches from AWS without having to reinstall the Amazon CloudWatch Logs agent manually. Do not update the Amazon CloudWatch Logs agent using the RPM installation method if the Python script was used to install the agent. Doing so may cause configuration issues that prevent the Amazon CloudWatch Logs agent from sending logs to Amazon CloudWatch.

Starting with EC2Config version 2.2.5, it is possible to export all Windows Server log messages from the system log, security log, application log, and Internet Information Services (IIS) log and send them to Amazon CloudWatch Logs. EC2Config version 2.2.10 or later adds the ability to export any event log data, Event Tracing for Windows data, or text-based log files to Amazon CloudWatch Logs. Windows performance counter data can also be exported to Amazon CloudWatch. Amazon EC2 instances use an agent to send log data to Amazon CloudWatch.
For Microsoft Windows, the agent is either the EC2Config service or the Systems Manager (SSM) Agent. By default, the EC2Config service is included in AWS Windows Server 2003-2012 R2 AMIs. EC2Config starts when the instance boots and performs tasks during startup and each time an Amazon EC2 instance starts or stops. EC2Config can also perform tasks on demand. Some of these tasks are automatically enabled, while others must be enabled manually. Windows Server 2016 AMIs do not use EC2Config. Instead, these AMIs use the EC2Launch PowerShell script. Table 9.6 shows which agent types are available on different versions of Microsoft Windows. TABLE 9.6 Microsoft Windows Agents
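The sample agent policy mentioned earlier was not reproduced here, but one consistent with the listed operations (CreateLogGroup, CreateLogStream, DescribeLogStreams, and PutLogEvents) would look roughly like the sketch below. The wildcard resource is an assumption for illustration and should be narrowed to specific log group ARNs in practice:

```python
import json

# Sketch of an IAM policy granting the CloudWatch Logs agent the operations
# named in the text; Resource "*" is a placeholder, not a recommendation.
agent_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(agent_policy, indent=2))
```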
Log data is encrypted in transit and at rest. It is also possible to install Amazon CloudWatch Logs agents and create log streams using AWS OpsWorks and Chef. Chef is a third-party systems and cloud infrastructure automation tool. Chef uses “recipes” to install and configure software and “cookbooks,” which are collections of recipes, to perform configuration and policy distribution tasks.

It is possible to monitor application logs for specific literal terms (such as NullReferenceException) or to count the number of occurrences of a literal term at a particular position in log data. For example, the literal term could be a 404 status code in an Apache access log. When the term is found, Amazon CloudWatch Logs reports the data to a customer-specified Amazon CloudWatch metric. After an agent begins publishing logs to Amazon CloudWatch, it is possible both to search for and to filter log data by creating one or more filters.

Metric filters define the terms and patterns to look for in log data as it is sent to Amazon CloudWatch Logs. Amazon CloudWatch Logs uses these metric filters to turn log data into Amazon CloudWatch metrics that can be graphed or used to set an alarm condition. Filters do not retroactively filter data; they only publish the metric data points for events that happen after the filter was created. Metric filters consist of the following key elements:

Filter Pattern A symbolic description of how Amazon CloudWatch Logs should interpret the data in each log event. For example, a log entry could contain timestamps, IP addresses, and strings. Use the pattern to specify what to look for in the log file.

Metric Name The name of the Amazon CloudWatch metric to which the monitored log information should be published.

Metric Namespace The destination namespace of the new Amazon CloudWatch metric.

Metric Value The data published to the metric. For example, when counting the occurrences of a particular term like “Error,” the value will be “1” for each occurrence.
When counting a value like bytes transferred, the published value will be the value in the log event.

Amazon CloudWatch Logs supports the rotation of logs. The following file rotation mechanisms are supported:

The fingerprint (source ID) of the file is calculated by hashing the log stream key and the first line of file content. To override this behavior, use the file_fingerprint_lines option. When file rotation happens, the new file is supposed to have new content, and the old file is not supposed to have content appended; the agent pushes the new file after it finishes reading the old file.

Amazon CloudWatch Logs sends data to Amazon CloudWatch every minute. The AWS/Logs namespace includes the metrics shown in Table 9.7.

TABLE 9.7 AWS/Logs Namespace Metrics

Amazon CloudWatch Logs supports the filtering of metrics using the dimensions shown in Table 9.8.

TABLE 9.8 Amazon CloudWatch Logs Dimensions

Customers can monitor AWS costs using Amazon CloudWatch. With Amazon CloudWatch, it is possible to create billing alerts that send notifications when the usage of provisioned services exceeds a customer-defined limit. When usage exceeds these thresholds, AWS can send an email notification or publish a notification to an Amazon SNS topic. To create billing alerts and register for notifications, enable them in the AWS Billing and Cost Management console. Using the procedure shown, it is possible to sign up to receive notifications from AWS when prices change. Notifications sent every time a price changes: Notifications about price changes once a day:

In December 2016, Amazon announced the addition of detailed billing to Amazon CloudWatch Logs. Reports can be generated based on usage and cost per log group. You can use tags to classify log groups and give them categories such as purpose, owner, or environment. You can create a custom set of categories to meet specific needs, and you can also use tags to categorize and track AWS costs.
When you apply tags to your AWS resources, including log groups, AWS cost allocation reports include usage and costs aggregated by tags. Using tags is a simple yet powerful way to manage AWS resources and organize data, including billing data. Tags can be applied to resources that represent business categories such as cost centers, application names, or owners to organize costs across multiple services. It is an AWS best practice for customers to tag as many of their resources as possible.

The following restrictions apply to tags: basic restrictions, tag key restrictions, and tag value restrictions. Tag values can be blank (a length of zero characters).

At the end of the billing cycle, the total charges (tagged and untagged) on the billing report with cost allocation tags reconcile with the total charges on the Bills page total and other billing reports for the same period. Tags can also be used to filter views in Cost Explorer. In order for tags to appear on your billing reports, they must be activated in the Billing console. To activate tags, sign in to the AWS Management Console and open the AWS Billing and Cost Management console: https://console.aws.amazon.com/billing/home#/. If tags are added or changed on a resource partway through a billing period, costs are split into two separate lines in the Cost Allocation Report: the first line shows costs before the update, and the second line shows costs after the update.

Cost Explorer is an AWS tool that can be used to view charts of costs. This spend data can be viewed for up to the past 13 months and used to forecast the spend for the next 3 months. It can also be used to see patterns in how much is spent on AWS resources over time, identify areas that need further inquiry, and see trends that assist in understanding costs. Cost Explorer can reveal which service is being used the most and which Availability Zone gets the most network traffic.
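Billing alerts like those described above watch the EstimatedCharges metric in the AWS/Billing namespace. A hedged sketch of the alarm parameters follows; the alarm name, period, and SNS topic ARN are illustrative choices, and billing metrics are only available in the US East (N. Virginia) region:

```python
def billing_alarm(limit_usd, topic_arn):
    """Parameters for a CloudWatch alarm that notifies when month-to-date
    estimated charges exceed a customer-defined limit."""
    return {
        "AlarmName": f"billing-over-{limit_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # estimated charges update every few hours
        "EvaluationPeriods": 1,
        "Threshold": float(limit_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(
#     **billing_alarm(100, "arn:aws:sns:..."))
```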
With Cost Explorer, there are a variety of filters: Every time a filter is applied to costs, Cost Explorer creates a new chart. It is possible to use a browser’s bookmark feature to save configuration settings for repeated use. When returning to the saved link, Cost Explorer refreshes the page using current cost data for the selected time range and displays the most recent forecast. This feature makes it easy to save a configuration that is used often, such as “Spend Report – Last Seven Days.” Cost Explorer uses the same dataset used to generate the AWS Cost and Usage Reports and the detailed billing reports. The dataset can also be downloaded as a comma-separated values (CSV) file for detailed analysis.

The AWS Billing and Cost Management service sends metrics to Amazon CloudWatch. The AWS/Billing namespace uses the metric in Table 9.9.

TABLE 9.9 AWS/Billing Namespace Metric

AWS Billing and Cost Management supports filtering metrics using the dimensions in Table 9.10.

TABLE 9.10 AWS Billing and Cost Management Metrics

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of an AWS account. With AWS CloudTrail, it is possible to log, continuously monitor, and retain events related to API calls across an AWS infrastructure. AWS CloudTrail provides a history of AWS API calls for an account, including API calls made through the AWS Management Console, AWS SDKs, command-line tools, and other AWS Cloud services. This history simplifies security analysis, resource change tracking, and troubleshooting.

AWS CloudTrail provides visibility into user activity by recording API calls made on an account. It records important information about each API call, including the name of the API, the identity of the caller, the time of the API call, the request parameters, and the response elements returned by the AWS Cloud service.
This information can be used to track changes made to AWS resources and to help troubleshoot operational issues, making it easier to ensure compliance with internal policies and regulatory standards.

A trail is a configuration that enables logging of AWS API activity and related events in an account. AWS CloudTrail delivers the logs to an Amazon S3 bucket and, optionally, to an Amazon CloudWatch Logs log group. It is possible to specify an Amazon SNS topic that receives notifications of log file deliveries. For a trail that applies to all regions, the trail configuration in each region is identical. You can create trails with the AWS CloudTrail console, the AWS CLI, or the AWS CloudTrail API.

There are two types of trails: those that apply to all regions and those that apply to one region. When creating a trail that applies to all regions, AWS CloudTrail creates the same trail in each region, records the log files in each region, and delivers the log files to a user-specified Amazon S3 bucket. This is the default option when you create a trail in the AWS CloudTrail console. A trail that applies to all regions has the following advantages: When applying a trail to all regions, AWS CloudTrail uses the trail created in a particular region to create trails with identical configurations in all other regions in an account. This has the following effects:

When creating a trail that applies to one region, AWS CloudTrail records the log files in that region only and delivers log files to a user-specified Amazon S3 bucket. When creating additional individual trails that apply to specific regions, those trails can be set to deliver log files to a single Amazon S3 bucket regardless of region. For both types of trails, it is possible to use an Amazon S3 bucket from any region. If there are different but related user groups, such as developers, security personnel, and IT auditors, that need access to AWS CloudTrail, create multiple trails per region.
This allows each group to receive its own copy of the log files. AWS CloudTrail supports five trails per region. A trail that applies to all regions counts as one trail in every region. To see a list of the trails in all regions, open the Trails page of the AWS CloudTrail console. By default, log files are encrypted using Amazon S3 Server-Side Encryption (SSE). Log files can be stored in an Amazon S3 bucket for as long as they are needed, and Amazon S3 lifecycle rules can be defined to archive or delete log files automatically.

AWS CloudTrail typically delivers log files within 15 minutes of an API call. In addition, AWS CloudTrail publishes log files multiple times an hour, approximately every five minutes. These log files contain API calls from services that support AWS CloudTrail.

AWS CloudTrail captures API calls made directly by a user or on behalf of a user by an AWS Cloud service. Services that make API calls on behalf of users include AWS CloudFormation, AWS Elastic Beanstalk, AWS OpsWorks, and Auto Scaling. For example, an AWS CloudFormation CreateStack call can result in additional API calls to Amazon EC2, Amazon RDS, Amazon EBS, or other services as required by the AWS CloudFormation template. This behavior is normal and expected. To identify whether an API call was made by an AWS Cloud service, review the invokedBy field in the AWS CloudTrail event. When creating or updating a trail with the AWS CloudTrail console or the AWS CLI, the same steps need to be followed.

Amazon CloudWatch is a web service that collects and tracks metrics to monitor AWS resources and the applications that run on them. Amazon CloudWatch Logs is a feature of Amazon CloudWatch used specifically to monitor log data. Integration with Amazon CloudWatch Logs enables AWS CloudTrail to send events containing API activity in an AWS account to an Amazon CloudWatch Logs log group.
AWS CloudTrail events that are sent to Amazon CloudWatch Logs can trigger alarms according to the metric filters defined by customers. Optionally, you can configure Amazon CloudWatch Alarms to send notifications or make changes to the resources being monitored based on log stream events that metric filters extract. Using Amazon CloudWatch Logs, you can track AWS CloudTrail events alongside events from the operating system, applications, or other AWS Cloud services that are sent to Amazon CloudWatch Logs. AWS CloudTrail adds depth to the monitoring capabilities already offered by AWS. Amazon CloudWatch focuses on performance monitoring and system health, and AWS CloudTrail focuses on API activity. While AWS CloudTrail does not report on system performance or health, you can use AWS CloudTrail in combination with Amazon CloudWatch Logs alarms to create notifications to gain a deeper understanding of AWS resources and their utilization. AWS CloudTrail trail names must meet the following requirements: AWS CloudTrail delivers log files to an Amazon S3 bucket specified during the creation of the trail. Typically, log files appear in the bucket within 15 minutes of the recorded AWS API call or other AWS event. Log files are generally published every five minutes. AWS CloudTrail publishes log files to the Amazon S3 bucket in a gzip archive. In the Amazon S3 bucket, the log file has a formatted name that includes the following elements: This is what a complete log file object name looks like: To retrieve a log file, use the Amazon S3 console, the AWS CLI, or the Amazon S3 API. To find your log files with the Amazon S3 console, do the following: All log files have a .gz extension. You can find the bucket name on the Trails page of the AWS CloudTrail console. Log files are written in JSON format. There are a number of options available for viewing JSON-formatted files, including browser plugins, text editors, and Integrated Development Environments (IDEs). 
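Because CloudTrail log files are gzip-compressed JSON with a top-level Records array, a minimal reader needs only the standard library. This sketch builds a tiny in-memory example rather than fetching a real object from the trail's Amazon S3 bucket:

```python
import gzip
import io
import json

def read_cloudtrail_log(fileobj):
    """Decompress a CloudTrail .gz log file and yield (eventName, eventSource)
    for each record in its top-level "Records" array."""
    with gzip.open(fileobj, "rt") as f:
        for record in json.load(f)["Records"]:
            yield record["eventName"], record["eventSource"]

# Stand-in for a log file downloaded from the trail's S3 bucket.
sample = {"Records": [{"eventName": "CreateStack",
                       "eventSource": "cloudformation.amazonaws.com"}]}
blob = io.BytesIO(gzip.compress(json.dumps(sample).encode()))
print(list(read_cloudtrail_log(blob)))
# [('CreateStack', 'cloudformation.amazonaws.com')]
```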
Use your preferred search engine to research more details. It is possible to be notified when AWS CloudTrail publishes new log files to an Amazon S3 bucket. You manage notifications using Amazon SNS. Notifications are optional. To activate them, configure AWS CloudTrail to send update information to an Amazon SNS topic whenever a new log file has been sent. To receive these notifications, subscribe to the topic. To handle notifications programmatically, subscribe an Amazon SQS queue to the topic. AWS CloudTrail integrates with IAM, which controls access to AWS CloudTrail and other AWS resources that AWS CloudTrail requires, including Amazon S3 buckets and Amazon SNS topics. Use IAM to control which AWS users can create, configure, or delete AWS CloudTrail trails, start and stop logging, and access the buckets that contain log information. To administer an AWS CloudTrail trail, grant explicit permissions to IAM users to perform the actions associated with the AWS CloudTrail tasks. For most scenarios, you can accomplish this by using an AWS managed policy that contains predefined permissions. A typical approach is to create an IAM group that has the appropriate permissions and then add individual IAM users to that group. For example, you can create one IAM group for users that should have full access to AWS CloudTrail actions and a separate group for users who should be able to view trail information but not create or change trails. These are the AWS Managed Policies for AWS CloudTrail: AWSCloudTrailFullAccess This policy gives users in the group full access to AWS CloudTrail actions and permissions to manage the Amazon S3 bucket, the log group for Amazon CloudWatch Logs, and an Amazon SNS topic for a trail. AWSCloudTrailReadOnlyAccess This policy lets users in the group view trails and buckets. When creating a trail, the trail logs read-only and write-only management events for your account. 
If desired, update the trail to specify whether the trail should log data events. Data events are object-level API operations that access Amazon S3 object resources, such as GetObject, DeleteObject, and PutObject. Only events that match the trail settings are delivered to the Amazon S3 bucket and Amazon CloudWatch Logs log group; if an event doesn’t match the settings for a trail, the trail doesn’t log it.

To send notifications to an Amazon SNS topic, AWS CloudTrail must have the required permissions. AWS CloudTrail automatically attaches the required permissions to the topic when the following occurs: AWS CloudTrail adds the following fields to the policy automatically: The following policy allows AWS CloudTrail to send notifications about log file delivery from supported regions: Amazon SNS Topic Policy

AWS Config is a fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security and governance. AWS Config can discover existing AWS resources, export a complete inventory of AWS resources with all configuration details, and determine how a resource was configured at any point in time. These capabilities enable compliance auditing, security analysis, resource change tracking, and troubleshooting. AWS Config makes it easy to track resource configuration without the need for upfront investments and avoids the complexity of installing and updating agents for data collection or maintaining large databases. After AWS Config is enabled, continuously updated details of all configuration attributes associated with AWS resources can be viewed. Amazon SNS can be configured to provide notifications of every configuration change.

AWS Config provides a detailed view of the configuration of AWS resources in an AWS account, including how resources are related to one another and how they were configured in the past, showing how configurations and relationships change over time.
An AWS resource is an entity in AWS such as an Amazon EC2 instance, an Amazon EBS volume, a security group, or an Amazon Virtual Private Cloud (Amazon VPC). With AWS Config, you can do the following: When running applications on AWS, resources must be created and managed collectively. As the demand for an application grows, so too does the need to keep track of the addition of AWS resources. AWS Config is designed to help oversee application resources in the following scenarios. To exercise better governance over resource configurations and to detect resource misconfigurations, fine-grained visibility is needed into what resources exist and how these resources are configured at any time. Use AWS Config to automatically send notifications whenever resources are created, modified, or deleted. There is no need to monitor these changes by polling calls made to each individual resource. Use AWS Config rules to evaluate the configuration settings of AWS resources. When AWS Config detects that a resource violates the conditions in one of the established rules, AWS Config flags the resource as noncompliant and sends a notification. AWS Config continuously evaluates resources as they are created, changed, or deleted. Some data requires frequent audits to ensure compliance with internal policies and best practices. To demonstrate compliance, access is needed to the historical configurations of the resources. This information is provided by AWS Config. When using multiple AWS resources that depend on one another, a change in the configuration of one resource might have unintended consequences on related resources. With AWS Config, it is possible to view how one resource is related to other resources and assess the impact of the proposed change. The historical configurations of resources provided by AWS Config can assist troubleshooting issues by providing access to the last known good configuration of a problem resource. 
To analyze potential security weaknesses, detailed historical information about AWS resource configurations is required. This information could include the IAM permissions granted to your users or the Amazon EC2 security group rules that control access to your resources.

Use AWS Config to view the IAM policy that was assigned to an IAM user, group, or role at any time at which AWS Config was recording. This information can help determine the permissions that belonged to a user at a specific time.

Use AWS Config to view the configuration of Amazon EC2 security groups and the port rules that were open at a specific time. This information can help determine whether a security group was blocking incoming TCP traffic to a specific port.

An AWS Config rule represents desired configurations for a resource, and it is evaluated against configuration changes on the relevant resources, as recorded by AWS Config. The results of evaluating a rule against the configuration of a resource are available on a dashboard. Using AWS Config rules, customers can assess their overall compliance and risk status from a configuration perspective, view compliance trends over time, and pinpoint which configuration change caused a resource to drift out of compliance with a rule.

A rule represents desired Configuration Item (CI) attribute values for resources, which are evaluated by comparing those attribute values with CIs recorded by AWS Config. There are two types of rules: AWS managed rules and customer managed rules.

AWS managed rules are prebuilt and managed by AWS. Choose the rule to enable, and then supply a few configuration parameters to get started.

It is possible to develop custom rules and add them to AWS Config. Associate each custom rule with an AWS Lambda function, which contains the logic that evaluates whether AWS resources comply with the rule.
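A minimal sketch of the compliance logic such a Lambda function might contain follows. The resource-type check and the approved instance types are invented for illustration, and reporting the result back to AWS Config with the PutEvaluations call is omitted; only the pure decision step is shown.

```python
# Hypothetical decision logic for a custom AWS Config rule. The "approved
# instance types" rule is an example of our own, not an AWS managed rule.

def evaluate_compliance(configuration_item):
    """Flag EC2 instances whose type is not in an approved list."""
    approved = {"t2.micro", "t2.small"}   # hypothetical rule parameter
    if configuration_item.get("resourceType") != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"           # rule only targets EC2 instances
    instance_type = configuration_item.get("configuration", {}).get("instanceType")
    if instance_type is None:
        return "INSUFFICIENT_DATA"        # nothing to evaluate against
    return "COMPLIANT" if instance_type in approved else "NON_COMPLIANT"

# A trimmed-down configuration item, shaped loosely like what the rule
# would receive:
ci = {"resourceType": "AWS::EC2::Instance",
      "configuration": {"instanceType": "m4.large"}}
print(evaluate_compliance(ci))   # NON_COMPLIANT
```

A real handler would wrap this function, extract the configuration item from the Lambda event, and send the verdict back to AWS Config, as the surrounding text describes.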
Associate this function with a rule, and the rule invokes the function either in response to configuration changes or at periodic intervals. The function then evaluates whether resources comply with the rule and sends its evaluation results to AWS Config.

Customers can create up to 50 AWS Config rules in an AWS account by default. This is a soft limit, and it can be increased by contacting AWS Support.

Any rule can be set up as a change-triggered rule or as a periodic rule. A change-triggered rule is executed when AWS Config records a configuration change for any of the resources specified. Additionally, one of the following must be specified:

Tag Key: Any configuration changes recorded for resources with the specified tag key:value trigger an evaluation of the rule.

Resource Type(s): Any configuration changes recorded for any resource within the specified resource type(s) trigger an evaluation of the rule.

Resource ID: Any changes recorded to the resource specified by the resource type and resource ID trigger an evaluation of the rule.

A periodic rule is triggered at a specified frequency. Available frequencies are 1 hour, 3 hours, 6 hours, 12 hours, and 24 hours. A periodic rule has available a full snapshot of current CIs for all resources.

A CI is the configuration of a resource at a given point in time. A CI consists of five sections:

Evaluation of a rule determines whether a resource is compliant with the rule at a particular point in time. It is the result of evaluating a rule against the configuration of a resource. AWS Config rules capture and store the result of each evaluation.
This result includes the resource, the rule, the time of evaluation, and a link to the CI that caused noncompliance.

A resource is compliant if it conforms to all rules that apply to it; otherwise, it is noncompliant. Similarly, a rule is compliant if all resources evaluated by the rule comply with it; otherwise, it is noncompliant. In some cases, such as when the rule has inadequate permissions, an evaluation may not exist for the resource, leading to a state of insufficient data. This state is excluded from determining the compliance status of a resource or rule.

AWS CloudTrail records user API activity on an account and allows access to information about this activity. You can use AWS CloudTrail to get full details about API actions, such as the identity of the caller, the time of the API call, the request parameters, and the response elements returned by the AWS Cloud service. AWS Config records point-in-time configuration details for AWS resources as CIs. You can use a CI to answer "What did my AWS resource look like?" at a point in time. You can use AWS CloudTrail to answer "Who made an API call to modify this resource?" In practice, you can use the AWS Config console to detect that a security group was incorrectly configured in the past. With the integrated AWS CloudTrail information, you can find the user who misconfigured the security group and learn when it happened.

Each custom rule is simply an AWS Lambda function. When the function is invoked to evaluate a resource, it is provided with the resource's CI. The function can inspect the item and make calls to other AWS API functions as desired. After the AWS Lambda function makes its decision about compliance, it calls the PutEvaluations function to record the decision.

With AWS Config, customers are charged based on the number of CIs recorded for supported resources in an AWS account and are charged only once for recording the CI.
There is no additional fee or any upfront commitment for retaining CIs. Users can stop recording CIs at any time and continue to access the CIs previously recorded. Charges per CI are rolled up into the monthly bill.

If you are using AWS Config rules, charges are based on active AWS Config rules in that month. When a rule is compared with an AWS resource, the result is recorded as an evaluation. A rule is active if it has one or more evaluations in a month.

Configuration snapshots and configuration history files are delivered to an Amazon S3 bucket. Configuration change notifications are delivered via Amazon SNS. Standard rates for Amazon S3 and Amazon SNS apply. Customer managed rules are authored using AWS Lambda, and standard rates for AWS Lambda apply.

Amazon CloudWatch monitors AWS resources and the applications run on AWS in real time. Use CloudWatch to collect and track metrics, which are variables used to measure resources and applications. Amazon CloudWatch Alarms send notifications or automatically make changes to the resources being monitored based on customer-defined rules. This monitoring data can be used to determine whether additional instances should be launched to handle increased load, or whether underutilized instances should be stopped to save money.

In addition to monitoring the built-in metrics that come with AWS, custom metrics can be imported and monitored. Custom metrics can include detailed information about an Amazon EC2 instance or data from servers running in an on-premises datacenter. Amazon CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health.

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of an AWS account. With CloudTrail, customers can log, continuously monitor, and retain events related to API calls across an AWS infrastructure. AWS CloudTrail provides a history of AWS API calls for an account.
This includes API calls made through the AWS Management Console, the AWS SDKs, command-line tools, and other AWS services. This history simplifies security analysis, resource change tracking, and troubleshooting.

AWS Config is a service that enables customers to assess, audit, and evaluate the configurations of their AWS resources. AWS Config continuously monitors and records AWS resource configurations and allows the evaluation of recorded configurations against desired configurations to be automated. With AWS Config, review changes in configurations and relationships between AWS resources, dive into detailed resource configuration histories, and determine overall compliance against the configurations specified in internal guidelines. This simplifies compliance auditing, security analysis, change management, and operational troubleshooting.

Yes, this book is about systems operations. However, everyone should be aware of the principle behind the Well-Architected Framework, as it is part of all three associate-level exams.

Be familiar with Amazon CloudWatch. Amazon CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, and set alarms. Amazon CloudWatch can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB tables, and Amazon RDS DB instances, as well as custom metrics generated by your applications and services and any log files your applications generate. You can use Amazon CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health, and you can use these insights to react and keep your application running smoothly.

Amazon CloudWatch Events has three components: Events, Rules, and Targets. Events indicate a change in an AWS environment. Targets process events. Rules match incoming events and route them to targets for processing.
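The Events, Rules, and Targets relationship can be sketched with a toy router. The event shape below loosely resembles a CloudWatch Events event, but the pattern matching is a simplified stand-in for the real event-pattern language.

```python
# Toy sketch of Events -> Rules -> Targets routing. Patterns, events, and
# target names are invented for illustration.

def rule_matches(pattern, event):
    """A pattern matches when, for every key it names, the event's value
    is one of the pattern's accepted values."""
    return all(event.get(key) in accepted for key, accepted in pattern.items())

def route(event, rules):
    """Return the targets of every rule whose pattern matches the event."""
    targets = []
    for pattern, rule_targets in rules:
        if rule_matches(pattern, event):
            targets.extend(rule_targets)
    return targets

rules = [
    ({"source": ["aws.ec2"],
      "detail-type": ["EC2 Instance State-change Notification"]},
     ["lambda:notify-ops"]),
    ({"source": ["aws.s3"]}, ["sqs:s3-events"]),
]
event = {"source": "aws.ec2",
         "detail-type": "EC2 Instance State-change Notification"}
print(route(event, rules))   # ['lambda:notify-ops']
```

An event that matches no rule is simply dropped, and an event can match several rules, reaching every matched rule's targets.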
Understand what Amazon CloudWatch Logs is and what it can do. Amazon CloudWatch Logs lets you monitor and troubleshoot systems and applications using existing system, application, and custom log files. With Amazon CloudWatch Logs, monitor logs in near real time for specific phrases, values, or patterns. Log data can be stored and accessed indefinitely in highly durable, low-cost storage without filling hard drives.

Be able to create or edit an Amazon CloudWatch Alarm. You can choose specific metrics to trigger the alarm and specify thresholds for those metrics. You can then set your alarm to change state when a metric exceeds a threshold that you have defined.

Know how to create a monitoring plan. Creating a monitoring plan involves answering some basic questions: What are your goals for monitoring? What resources will you monitor? How often will you monitor these resources? What monitoring tools will you use? Who will perform the monitoring tasks? Who should be notified when something goes wrong?

Know and understand custom metrics. You can store your business and application metrics in Amazon CloudWatch. You can view graphs, set alarms, and initiate automated actions based on these metrics, just as you can for the metrics that Amazon CloudWatch already stores for your AWS resources.

Visibility for metrics above the hypervisor requires an agent. Amazon CloudWatch can report CPU utilization at the hypervisor level, but it has no way of knowing which specific tasks or processes are affecting performance. Similarly, CloudWatch can see disk I/O but cannot see disk usage. Gathering metrics such as memory utilization or disk space requires an agent.

Be familiar with what an Amazon CloudWatch Alarm is and how it works. You can create a CloudWatch Alarm that watches a single metric. The alarm performs one or more actions based on the value of the metric relative to a threshold over a number of time periods. The action can be an Amazon EC2 action, an Auto Scaling action, or a notification sent to an Amazon SNS topic.
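An alarm's evaluation can be sketched roughly as follows, assuming a greater-than-or-equal threshold and a fixed number of evaluation periods; both are simplifications of the real service, and the function is ours.

```python
# Simplified sketch of CloudWatch alarm evaluation. A single batch of
# datapoints stands in for the alarm's evaluation window.

def alarm_state(datapoints, threshold, periods_required):
    """Return OK, ALARM, or INSUFFICIENT_DATA for a >= threshold alarm."""
    if len(datapoints) < periods_required:
        return "INSUFFICIENT_DATA"        # missing or incomplete data
    recent = datapoints[-periods_required:]
    if all(value >= threshold for value in recent):
        return "ALARM"                    # threshold breached every period
    return "OK"                           # within the acceptable range

print(alarm_state([55, 92, 95, 97], threshold=90, periods_required=3))  # ALARM
print(alarm_state([55, 60], threshold=90, periods_required=3))  # INSUFFICIENT_DATA
```

The three return values correspond to the three alarm states covered next; the alarm's configured action fires on the transition into the ALARM state.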
Know the three states of an Amazon CloudWatch Alarm. These are OK, ALARM, and INSUFFICIENT_DATA. If an alarm is in the OK state, the monitored metric is within the range you have defined as acceptable. If the alarm is in the ALARM state, the metric has breached a threshold. If data is missing or incomplete, the alarm is in the INSUFFICIENT_DATA state.

Know the two levels of monitoring: Basic and Detailed. Basic Monitoring for Amazon EC2 sends CPU load, disk I/O, and network I/O metric data to Amazon CloudWatch in five-minute periods by default. To send metric data for an instance to CloudWatch in one-minute periods, enable Detailed Monitoring on the instance. Some services, such as Amazon RDS, have Detailed Monitoring on by default.

Amazon CloudWatch performs two types of Amazon EC2 status checks: System and Instance. A System Status Check monitors the AWS systems required to use your instance to ensure that they are working properly. These checks detect problems with your instance that require AWS involvement to repair. An Instance Status Check monitors the software and network configuration of your individual instance. These checks detect problems that require your involvement to repair.

Be familiar with some common metrics used for monitoring. There are many metrics available, and not all of them are tested on the exam. Some are tested, however, and it is a good idea to know the common ones: VolumeQueueLength, DatabaseConnections, DiskQueueDepth, FreeStorageSpace, ReplicaLag, ReadIOPS, WriteIOPS, ReadLatency, WriteLatency, SurgeQueueLength, and SpilloverCount.

Know how to set up an Amazon CloudWatch Event subscription for Amazon RDS. Amazon RDS uses the Amazon Simple Notification Service (Amazon SNS) to provide notification when an Amazon RDS event occurs. These notifications can be in any notification form supported by Amazon SNS for an AWS Region, such as an email, a text message, or a call to an HTTP endpoint.
Details can be found here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.html.

Know the Amazon ElastiCache engines. Amazon ElastiCache has two engines available: Redis and Memcached. Memcached is multi-threaded, and Redis is single-threaded.

Know the Amazon EBS volume status checks. Degraded and Severely Degraded performance means that the Amazon EBS volume status check is in the Warning state and some I/O is occurring. Stalled or Not Available means that the Amazon EBS volume is in the Impaired state and there is no I/O.

Learn the metrics SurgeQueueLength and SpilloverCount. When the SurgeQueueLength is exceeded, spillover occurs. Requests are dropped without notifying end users, and the customer experience is negatively impacted.

The more a learner focuses on the meaning of information being presented, the more elaborately he or she will process the information. This principle is so obvious that it is easy to miss. What it means is this: When you are trying to drive a piece of information into your brain's memory systems, make sure you understand exactly what that information means.

Medina, John. Brain Rules (Updated and Expanded): 12 Principles for Surviving and Thriving at Work, Home, and School (p. 139). Pear Press. Kindle Edition.

As you prepare for your exam, make sure that you are worried less about facts and figures and more about how to accomplish tasks associated with systems operations. Worry less about the tool and more about what the tool does and how it integrates with other tools.

Yes, facts and figures are important. You will need to know things like what an Amazon SQS queue is and how it differs from an Amazon SNS topic. Instead of trying to memorize all of the things an Amazon SQS queue can do, study it in terms of how it solves a problem that you have. If you understand how something like Amazon SQS can improve your environment by decoupling computing resources, the details will fall into place on their own.
It will no longer be a fact that you've memorized, but rather something that makes your life easier.

By now, you should have set up an AWS account. If you haven't, now would be the time to do so. It is important to note that these exercises are performed in your AWS account and thus are not free. Use the Free Tier when launching resources. For more information, see https://aws.amazon.com/s/dm/optimization/server-side-test/free-tier/free_np/.

If you have not yet installed the AWS Command Line utilities, refer to Chapter 2, "Working with AWS Cloud Services," Exercise 2.1 (Linux) or Exercise 2.2 (Windows). The reference for the AWS CLI can be found at http://docs.aws.amazon.com/cli/latest/reference/.

Search for Available Metrics. It is possible to search within all of the metrics in an account using targeted search terms. Metrics are returned that have matching results within their namespace, metric name, or dimensions. This exercise assumes the existence of an Amazon EC2 instance backed with an Amazon EBS volume. Select a namespace from the results of the search to view the metrics. Perform the following:

View Available Metrics for Running Amazon EC2 Instances by Namespace and Dimension Using the Amazon CloudWatch Console. This exercise requires at least one running Amazon EC2 instance. Select Per-Instance Metrics. The All Metrics tab displays all metrics for that dimension in the namespace. In this window, the following can be performed: The following graphic shows an example of this window. Available metrics will vary.

View Available Metrics by Namespace, Dimension, or Metric Using the AWS CLI. To view all of the metrics for Amazon EC2 in the AWS/EC2 namespace, use the following command from the AWS CLI: The JSON output will look something like this: The following graphic shows the output in table format instead of JSON. Here is the command:

List All Available Metrics for a Specific Resource.
Specify the AWS/EC2 namespace and the InstanceId dimension to view the results for a single instance. In this example, the instance ID is i-0ab76393e31b62ec2.

List All Resources That Use a Single Metric. Specify the AWS/EC2 namespace and the metric CPUUtilization to view the results for the single metric.

Get Statistics for a Specific Resource. The following exercise displays the maximum CPU utilization of a specific Amazon EC2 instance using the Amazon CloudWatch console. You must have the ID of an Amazon EC2 instance. Retrieve the instance ID using the AWS Management Console or the describe-instances command. By default, Basic Monitoring is enabled and collects metrics at five-minute intervals. Detailed Monitoring can be enabled to capture data points at one-minute intervals; this will incur additional charges. Open the Amazon CloudWatch console:

Get CPU Utilization for a Single Amazon EC2 Instance from the Command Line. You must have the ID of an Amazon EC2 instance. Retrieve the instance ID using the AWS Management Console or the describe-instances command. By default, Basic Monitoring is enabled and collects metrics at five-minute intervals. Detailed Monitoring can be enabled to capture data points at one-minute intervals; this will incur additional charges.

From the AWS CLI, use the command get-metric-statistics with the metric name CPUUtilization. The returned statistics are six-minute values for the requested 24-hour time interval. Each value represents the maximum CPU utilization percentage for the specified instance for a particular six-minute time period. Data points are not returned in chronological order. In Linux, the backslashes (\) allow commands to span more than one line and are presented here for readability. The output in JSON looks like this: The following graphic shows the output in table format instead of JSON.

Create a Billing Alert. In order to create a billing alarm, billing alerts must first be enabled.
This only needs to be done once and, once enabled, billing alerts cannot be disabled.

Create a Billing Alarm. To create a billing alarm, billing alerts must be turned on first. These steps send alerts to a single email address. Choose Create Alarm. Define the alarm as follows:

Create an Amazon CloudWatch Dashboard. To get started with Amazon CloudWatch dashboards, first create a dashboard.

Which of the following requires a custom Amazon CloudWatch metric to monitor?

While using Auto Scaling with an ELB in front of several Amazon EC2 instances, you want to configure Auto Scaling to remove one instance when CPU utilization is below 20 percent. How is this accomplished?

Your company has configured custom metric uploads with Amazon CloudWatch, and it has authorized employees to upload data using the AWS CLI as well as AWS SDKs. How can you track API calls made to CloudWatch?

Of the services listed here, which provide detailed monitoring without extra charges being incurred? (Choose two.)

You have configured an ELB Classic Load Balancer to distribute traffic among multiple Amazon EC2 instances. Which of the following will aid troubleshooting efforts related to back-end servers?

What is the minimum time interval for data that Amazon CloudWatch receives and aggregates?

Using the Free Tier, what is the frequency of updates received by Amazon CloudWatch?

The type of monitoring automatically available in five-minute periods is called what?

You have created an Auto Scaling group using the AWS CLI. You now want to enable Detailed Monitoring for this group. How is this accomplished?

There are 10 Amazon EC2 instances running in multiple regions using an internal memory management tool to capture log files and send them to Amazon CloudWatch in US-West-2. Additionally, you are using the AWS CLI to configure CloudWatch to use the same namespace and metric in all regions. Which of the following is true?
You have misconfigured an Amazon EC2 instance's clock and are sending data to Amazon CloudWatch via the API. Because of the misconfiguration, logs are being sent 60 minutes in the future. Which of the following is true?

You have a system that sends data to Amazon CloudWatch every five minutes for tracking/monitoring. Which of these parameters is required as part of the put-metric-data request?

To monitor API calls against AWS, use _______________ to capture the history of API requests and use _______________ to respond to operational changes in real time.
Metrics
Custom Metrics
Amazon CloudWatch Metrics Retention
Namespaces
AWS Product
Namespace
Auto Scaling
AWS/AutoScaling
Amazon EC2
AWS/EC2
Amazon EBS
AWS/EBS
Elastic Load Balancing
AWS/ELB (Classic Load Balancers)
Elastic Load Balancing
AWS/ApplicationELB (Application Load Balancers)
For a comprehensive list of namespaces, see http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-namespaces.html.
Dimensions
Dimension Combinations
Dimensions: Server=Prod, Domain=Titusville, Unit: Count, Timestamp: 2017-05-18T12:30:00Z, Value: 105
Dimensions: Server=Test, Domain=Titusville, Unit: Count, Timestamp: 2017-05-18T12:31:00Z, Value: 115
Dimensions: Server=Prod, Domain=Rockets, Unit: Count, Timestamp: 2017-05-18T12:32:00Z, Value: 95
Dimensions: Server=Test, Domain=Rockets, Unit: Count, Timestamp: 2017-05-18T12:33:00Z, Value: 97
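Each of the four publications above carries a unique combination of dimensions, and CloudWatch treats each unique combination as a separate metric. A local sketch of that grouping follows; the data structure is ours, not CloudWatch's.

```python
# Sketch: each unique dimension combination identifies a distinct metric,
# so the four publications above create four separate metrics. The values
# are the sample counts from the text.
from collections import defaultdict

datapoints = [
    ({"Server": "Prod", "Domain": "Titusville"}, 105),
    ({"Server": "Test", "Domain": "Titusville"}, 115),
    ({"Server": "Prod", "Domain": "Rockets"}, 95),
    ({"Server": "Test", "Domain": "Rockets"}, 97),
]

metrics = defaultdict(list)
for dimensions, value in datapoints:
    # The sorted dimension set acts as the metric's identity.
    key = tuple(sorted(dimensions.items()))
    metrics[key].append(value)

print(len(metrics))   # 4 distinct metrics, one per dimension combination
```

A consequence worth remembering: retrieving statistics for Server=Prod alone returns nothing unless that exact dimension combination was published, because the combinations are not automatically aggregated.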
Statistics
Statistic
Description
Minimum
The lowest value observed during the specified period. Use this value to determine low volumes of activity for an application.
Maximum
The highest value observed during the specified period. Use this value to determine high volumes of activity for an application.
Sum
All values submitted for the matching metric added together. This statistic can be useful for determining the total volume of a metric.
Average
The value of Sum/SampleCount during the specified period. By comparing this statistic with Minimum and Maximum, the full scope of a metric can be determined, and it is possible to discover how close average use is to the Minimum and Maximum.
SampleCount
The number of data points used for the statistical calculation.
pNN.NN
The value of the specified percentile. You can specify any percentile using up to two decimal places.
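The statistics in the table above can be reproduced locally for a small sample of datapoints. CloudWatch computes them server-side; the nearest-rank percentile below is one simple convention, not necessarily the one CloudWatch uses.

```python
# The arithmetic each statistic in the table describes, applied to a
# small sample of datapoints.

values = [12.0, 30.0, 30.0, 48.0]

minimum = min(values)
maximum = max(values)
total = sum(values)                      # Sum
sample_count = len(values)               # SampleCount
average = total / sample_count           # Average = Sum / SampleCount

def percentile(data, p):
    """Nearest-rank percentile, one simple convention among several."""
    ordered = sorted(data)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

print(minimum, maximum, total, sample_count, average)  # 12.0 48.0 120.0 4 30.0
print(percentile(values, 50))                          # 30.0
```

Comparing the Average against the Minimum and Maximum, as the table suggests, shows how representative the mean is of the underlying datapoints.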
Units
Periods
Aggregation
Dashboards
Percentiles
Monitoring Baselines
Amazon EC2 Status Checks
Item to Monitor
Amazon EC2 Metric
CPU utilization
CPUUtilization
Memory utilization
Requires an agent
Memory used
Requires an agent
Memory available
Requires an agent
Network utilization
NetworkIn NetworkOut
Disk performance
DiskReadOps DiskWriteOps
Disk Swap utilization (Linux instances)
Requires an agent
Swap used (Linux instances)
Requires an agent
Page File utilization (Windows instances only)
Requires an agent
Page File used (Windows instances only)
Requires an agent
Page File available (Windows instances only)
Requires an agent
Disk Reads/Writes
DiskReadBytes DiskWriteBytes
Disk Space utilization (Linux instances)
Requires an agent
Disk Space used (Linux instances)
Requires an agent
Disk Space available (Linux instances only)
Requires an agent
System Status Checks
Instance Status Checks
Authentication and Access Control
Permissions Required to Use the Amazon CloudWatch Console
AWS Managed Policies for Amazon CloudWatch
Amazon CloudWatch Resources and Operations
AWS Cloud Services Integration
Amazon CloudWatch Limits
Resource
Default Limit
Actions
5/alarm. This limit cannot be changed.
Alarms
10/month/customer for no additional charge; 5,000 per region per account
API requests
1,000,000/month/customer for no additional charge
Custom metrics
No limit
DescribeAlarms
3 Transactions per Second (TPS). This is the maximum number of operation requests that can be made per second without being throttled. A limit increase can be requested.
Dimensions
10/metric. This limit cannot be changed.
GetMetricStatistics
400 TPS. This is the maximum number of operation requests per second before being throttled. A limit increase can be requested.
ListMetrics
25 TPS. This is the maximum number of operation requests per second before being throttled. A limit increase can be requested.
Metric data
15 months. This limit cannot be changed.
MetricDatum items
20/PutMetricData request. A MetricDatum object can contain a single value or a StatisticSet object representing many values. This limit cannot be changed.
Metrics
10/month/customer for no additional charge
Period
One day (86,400 seconds). This limit cannot be changed.
PutMetricAlarm request
3 TPS. This is the maximum number of operation requests you can make per second without being throttled. A limit increase can be requested.
PutMetricData request
40 KB for HTTP POST requests; 150 TPS. This is the maximum number of operation requests that you can make per second without being throttled. A limit increase can be requested.
Amazon SNS email notifications
1,000/month/customer for no additional charge
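Given the 20-item MetricDatum limit per PutMetricData request in the table above, larger batches must be split across multiple requests. A minimal chunking sketch follows; the helper name is ours.

```python
# Sketch: split a list of pretend MetricDatum items into batches that fit
# the 20-per-request limit noted in the table above.

def chunk(items, size=20):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

datapoints = list(range(53))             # 53 pretend MetricDatum items
batches = list(chunk(datapoints))
print([len(b) for b in batches])         # [20, 20, 13]
```

Each batch would then be submitted as its own PutMetricData call, which also has to respect the request-rate and payload-size limits in the same table.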
Amazon CloudWatch Alarms
Alarms and Thresholds
Missing Data Points
Common Amazon CloudWatch Metrics
Amazon EC2
System Status Checks
Instance Status Checks
Amazon Elastic Block Store Volume Monitoring
Amazon EBS Status Checks
Amazon ElastiCache
Amazon RDS Metrics
AWS Elastic Load Balancer
Amazon CloudWatch Events
Events
Rules
Targets
Metrics and Dimensions
Metric
Description
Invocations
FailedInvocations
TriggeredRules
MatchedEvents
ThrottledRules
Amazon CloudWatch Logs
Archived Data
Log Monitoring
Agents
Amazon CloudWatch Logs Agent for Linux
Amazon CloudWatch Logs: Agents and IAM
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogStreams"
],
"Resource": [
"arn:aws:logs:*:*:*"
]
}
]
}
Amazon CloudWatch Logs Agent for Windows
Operating System
Agent
Notes
Windows Server 2016
SSM Agent
The EC2Config service is not supported on Windows Server 2016.
Windows Server 2008-2012 R2
EC2Config or SSM Agent
If an instance is running EC2Config version 3.x or earlier, then the EC2Config service sends log data to Amazon CloudWatch. If an instance is running EC2Config version 4.x or later, then SSM Agent sends log data to Amazon CloudWatch.
Searching and Filtering Log Data
Amazon CloudWatch Logs Metrics and Dimensions
Metrics
Metric
Description
IncomingBytes
The volume of log events in uncompressed bytes uploaded to Amazon CloudWatch Logs. When used with the LogGroupName dimension, this is the volume of log events in uncompressed bytes uploaded to the log group. Valid Dimensions: LogGroupName. Valid Statistic: Sum. Units: Bytes.
IncomingLogEvents
The number of log events uploaded to Amazon CloudWatch Logs. When used with the LogGroupName dimension, this is the number of log events uploaded to the log group. Valid Dimensions: LogGroupName. Valid Statistic: Sum. Units: None.
ForwardedBytes
The volume of log events in compressed bytes forwarded to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName. Valid Statistic: Sum. Units: Bytes.
ForwardedLogEvents
The number of log events forwarded to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName. Valid Statistic: Sum. Units: None.
DeliveryErrors
The number of log events for which Amazon CloudWatch Logs received an error when forwarding data to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName. Valid Statistic: Sum. Units: None.
DeliveryThrottling
The number of log events for which Amazon CloudWatch Logs was throttled when forwarding data to the subscription destination. Valid Dimensions: LogGroupName, DestinationType, FilterName. Valid Statistic: Sum. Units: None.
Dimensions
Dimension
Description
LogGroupName
The name of the Amazon CloudWatch Logs log group from which to display metrics
DestinationType
The subscription destination for the Amazon CloudWatch Logs data, which can be AWS Lambda, Amazon Kinesis Streams, or Amazon Kinesis Firehose
FilterName
The name of the subscription filter that is forwarding data from the log group to the destination. The subscription filter name is automatically converted by Amazon CloudWatch to ASCII and any unsupported characters get replaced with a question mark (?).
Monitoring AWS Charges
Steps to Configure Price Change Notifications
arn:aws:sns:us-east-1:278350005181:price-list-api
arn:aws:sns:us-east-1:278350005181:daily-aggregated-price-list-api
Detailed Billing
Tags and Log Groups
Log Group Tag Restrictions
Steps to Configure Billing Tags
Cost Explorer
AWS Billing and Cost Management Metrics and Dimensions
Metrics
Metric
Description
EstimatedCharges
The estimated charges for AWS usage. This can be either estimated charges for one service or a roll-up of estimated charges for all services.
Dimensions
Dimension
Description
ServiceName
The name of the AWS Cloud service. This dimension is omitted for the total of estimated charges across all services.
LinkedAccount
The linked account number. This is used for Consolidated Billing only. This dimension is included only for accounts that are linked to a separate paying account in a Consolidated Billing relationship. It is not included for accounts that are not linked to a Consolidated Billing paying account.
Currency
The monetary currency to bill the account. This dimension is required. Unit: USD
AWS CloudTrail
What Are Trails?
Types of Trails
Trails that Apply to All Regions
A Trail that Applies to One Region
Multiple Trails per Region
Encryption
AWS CloudTrail Log Delivery
Overview: Creating a Trail
Monitoring with AWS CloudTrail
AWS CloudTrail vs. Amazon CloudWatch
AWS CloudTrail: Trail Naming Requirements
Getting and Viewing AWS CloudTrail Log Files
Finding AWS CloudTrail Log Files
bucket/prefix/AWSLogs/AccountID/CloudTrail/region/YYYY/MM/DD/file_name.json.gz
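Given the key layout above, the account ID, region, and date can be pulled out of an object key programmatically. The key below is a made-up example that follows that pattern; the parsing helper is ours.

```python
# Sketch: extract fields from an AWS CloudTrail object key following the
# bucket/prefix/AWSLogs/AccountID/CloudTrail/region/YYYY/MM/DD/... layout.

def parse_cloudtrail_key(key):
    parts = key.split("/")
    i = parts.index("AWSLogs")           # anchor on the fixed path segment
    return {
        "account_id": parts[i + 1],
        "region": parts[i + 3],
        "date": "-".join(parts[i + 4:i + 7]),   # YYYY-MM-DD
    }

key = ("my-bucket/logs/AWSLogs/123456789012/CloudTrail/"
       "us-east-1/2017/05/18/123456789012_CloudTrail_us-east-1_file.json.gz")
print(parse_cloudtrail_key(key))
# {'account_id': '123456789012', 'region': 'us-east-1', 'date': '2017-05-18'}
```

Anchoring on the fixed AWSLogs segment keeps the parser working regardless of what bucket name or optional prefix precedes it.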
Retrieve Log Files
Configuring Amazon SNS Notifications for AWS CloudTrail
Controlling User Permissions for AWS CloudTrail
Granting Permissions for AWS CloudTrail Administration
Log Management and Data Events
Amazon SNS Topic Policy for AWS CloudTrail
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AWSCloudTrailSNSPolicy20131101",
"Effect": "Allow",
"Principal": {"Service": "cloudtrail.amazonaws.com"},
"Action": "SNS:Publish",
"Resource": "arn:aws:sns:Region:SNSTopicOwnerAccountId:SNSTopicName"
}]
}
AWS Config
Ways to Use AWS Config
Resource Administration
Auditing and Compliance
Managing and Troubleshooting Configuration Changes
Security Analysis
AWS Config Rules
AWS Managed Rules
Customer Managed Rules
How Rules Are Evaluated
Configuration Items
Rule Evaluation
Rule Compliance
AWS Config and AWS CloudTrail
Pricing
Summary
Resources to Review
Exam Essentials
Test Taking Tip
Exercises
EXERCISE 9.1
EXERCISE 9.2
EXERCISE 9.3
aws cloudwatch list-metrics --namespace AWS/EC2
{
"Metrics": [
{
"Namespace": "AWS/EC2",
"Dimensions": [
{
"Name": "InstanceId",
"Value": "i-0ab76393e31b62ec2"
}
],
"MetricName": "DiskWriteBytes"
},
{
"Namespace": "AWS/EC2",
"Dimensions": [
{
"Name": "InstanceId",
"Value": "i-0ab76393e31b62ec2"
}
],
"MetricName": "DiskReadOps"
},
...
]
}
aws cloudwatch list-metrics --namespace AWS/EC2 --output table
EXERCISE 9.4
aws cloudwatch list-metrics --namespace AWS/EC2 --dimensions Name=InstanceId,Value=i-0ab76393e31b62ec2
EXERCISE 9.5
aws cloudwatch list-metrics --namespace AWS/EC2 --metric-name CPUUtilization
EXERCISE 9.6
Requirements
EXERCISE 9.7
Requirements
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0e022540492655edd \
--statistics Maximum \
--start-time 2017-05-01T09:28:00 \
--end-time 2017-05-02T09:28:00 \
--period 360 \
--output json
{
"Datapoints": [
{
"Timestamp": "2017-05-01T13:58:00Z",
"Maximum": 0.67,
"Unit": "Percent"
},
{
"Timestamp": "2017-05-02T01:46:00Z",
"Maximum": 0.83,
"Unit": "Percent"
},
{
"Timestamp": "2017-05-02T05:52:00Z",
"Maximum": 0.67,
"Unit": "Percent"
},
{
"Timestamp": "2017-05-01T16:22:00Z",
"Maximum": 0.49,
"Unit": "Percent"
},
...
{
"Timestamp": "2017-05-02T06:52:00Z",
"Maximum": 0.67,
"Unit": "Percent"
}
],
"Label": "CPUUtilization"
}
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0e022540492655edd \
--statistics Maximum \
--start-time 2017-05-01T09:28:00 \
--end-time 2017-05-02T09:28:00 \
--period 360 \
--output table
EXERCISE 9.8
EXERCISE 9.9
EXERCISE 9.10
Review Questions