Chapter 13
Network Troubleshooting

THE AWS CERTIFIED ADVANCED NETWORKING – SPECIALTY EXAM OBJECTIVES COVERED IN THIS CHAPTER MAY INCLUDE, BUT ARE NOT LIMITED TO, THE FOLLOWING:

Domain 2.0: Design and Implement AWS Networks
2.1 Apply AWS networking concepts

Domain 6.0: Manage, Optimize, and Troubleshoot the Network
6.1 Troubleshoot and resolve a network issue

Introduction to Network Troubleshooting

AWS provides a number of networking features to connect within the AWS Cloud, outside the AWS Cloud to the Internet, and in a hybrid manner to an on-premises environment. This chapter discusses tools and techniques for troubleshooting networking issues that can arise with these connections, and it walks through a number of common troubleshooting scenarios. Both the AWS troubleshooting tools and the approach to these common scenarios are required knowledge for the exam.

Methodology for Troubleshooting

Troubleshooting can follow either a bottom-up approach (traversing through the Open Systems Interconnection [OSI] model one by one) or a top-down approach (working through likely areas that can cause issues). There are scenarios where each has its own merits, and they can be used in combination to help resolve issues quickly. The approach can also change based on the environment in which the troubleshooting is occurring.

Traversing through the OSI model systematically from Layer 1 through Layer 7 is often a useful way to pinpoint issues. Such a method can be optimized by taking into account the environment. For example, there is implicit routing for all subnets by default within a Virtual Private Cloud (VPC). This can rule out Layer 2 and Layer 3 communication issues within a VPC; thus, troubleshooting should start at Layer 4. In another example, when custom routing is set up through Amazon Elastic Compute Cloud (Amazon EC2) instances, Layer 3 troubleshooting may be required to ensure that routing is occurring as expected.

Stepping back and taking a top-down approach to pinpoint likely problem areas is also a valuable way to troubleshoot. Knowing service limits, for example, can help resolve issues that are otherwise difficult to diagnose. Being able to recognize security group and network Access Control List (ACL) issues without digging through the network stack layer by layer is another example of how this approach can be helpful.

Network Troubleshooting Tools

AWS offers a rich set of tools that can be combined with traditional tools to help troubleshoot networking connectivity issues. Both traditional tools and AWS-native tools are discussed in this section.

Traditional Tools

In this section, we discuss traditional network troubleshooting tools, many of which you may already be familiar with.

Packet Captures

For troubleshooting when deep packet inspection is necessary, packet captures can be useful. Packet capture tools like Wireshark (Windows/Linux) and tcpdump (Linux) can be run on an Amazon EC2 instance. By listening at the interface level, these tools are able to view the packets as they are sent to and received from the network, revealing both packet header and payload.

ping

ping is a utility that measures network round-trip times using the Internet Control Message Protocol (ICMP). It is commonly used to test whether a host is up and responsive on a network, and it can be useful for troubleshooting within AWS. Network ACLs, security groups, and operating system firewalls must all be configured to allow ICMP traffic for this tool to be useful. Note that many network devices and operating systems do not allow ICMP traffic by default.

traceroute

traceroute is a utility that discovers the path to a destination IP address or hostname. This tool can be helpful in verifying the route that traffic is following through a network. It works by sending probe packets with increasing Time-To-Live (TTL) values and recording the ICMP Time Exceeded reply that each hop returns when the TTL expires. (Windows tracert sends ICMP echo requests as probes; most Linux traceroute implementations send UDP probes by default.) Note that not all devices in a network path will return these ICMP replies, so there may not be a value for every hop in the route. In addition, this tool will not provide meaningful results within a VPC or across VPC peering links because each is only one network hop away.

Telnet

Telnet is a text-based TCP utility. While the default telnet port is 23, telnet can be set to initiate a TCP connection on any user-specified port. This can be very helpful for troubleshooting if a service is running on a port and responding to traffic.
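
As a rough scripted equivalent of the Telnet check, the following Python sketch attempts a TCP handshake and reports whether it completed. The function name tcp_port_open and the default timeout are illustrative choices, not part of any AWS tool.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection attempts the same TCP handshake that
        # `telnet host port` would attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, tcp_port_open("10.0.1.25", 443) (a hypothetical private address) returns True only when something is listening on port 443 and the path allows the traffic. A result of False does not by itself tell you whether the connection was refused or silently dropped.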

nslookup

nslookup is a command-line utility that resolves hostnames into IP addresses. It can be useful in network troubleshooting to confirm your Domain Name System (DNS) server settings and determine to what IP address a hostname is being resolved.
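
The same check can be scripted. The sketch below uses Python's standard resolver call to list the addresses that a hostname resolves to; the helper name resolve is an assumption made for illustration.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses for hostname."""
    # getaddrinfo goes through the operating system's resolver, so it
    # uses whatever DNS servers the instance has been configured with.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the
    # address is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})
```

Unlike nslookup, which queries a DNS server directly, this call also honors local sources such as the hosts file, so the two can disagree when a local override exists.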

AWS-Native Tools

In this section, we discuss AWS-native tools for troubleshooting, which provide additional insight to augment traditional troubleshooting tools.

Amazon CloudWatch

Amazon CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

The services shown in Table 13.1 send network metrics to Amazon CloudWatch that can be useful in troubleshooting.

TABLE 13.1 Amazon CloudWatch Metrics

AWS Service	Metrics Sent to Amazon CloudWatch
Amazon EC2	Sends metrics to Amazon CloudWatch recording the number of bytes and packets in and out of each Amazon EC2 instance
Amazon VPC Virtual Private Network (VPN)	Sends metrics to Amazon CloudWatch recording tunnel state and bytes in and out
AWS Direct Connect	Sends metrics to Amazon CloudWatch recording connection state, bits per second egress and ingress, packets per second egress and ingress, Cyclic Redundancy Check (CRC) error count, and connection-light level egress and ingress (only for 10 Gbps port speeds)
Amazon Route 53	Sends metrics to Amazon CloudWatch recording health check count, connection time, health check percentage, health check status, Secure Sockets Layer (SSL) handshake time, and time to first byte
Amazon CloudFront	Sends metrics to Amazon CloudWatch recording requests, bytes downloaded, bytes uploaded, total error rate, 4xx error rate, and 5xx error rate
Elastic Load Balancing	Sends metrics to Amazon CloudWatch recording healthy host count, 4xx and 5xx load balancer error count, back-end/target response code count (2xx, 3xx, 4xx, and 5xx), and a number of additional metrics
Amazon Relational Database Service (Amazon RDS)	Sends metrics to Amazon CloudWatch recording network receive and transmit throughput
Amazon Redshift	Sends metrics to Amazon CloudWatch recording network receive and transmit throughput

Note: There are many more Amazon CloudWatch metrics that are recorded.

Amazon VPC Flow Logs

Amazon VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data is stored using Amazon CloudWatch Logs. After you have created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs.

Flow logs can help you with a number of tasks, such as troubleshooting why specific traffic is not reaching an instance. If you see that the traffic is logged in flow logs, then you know it is reaching the VPC. If the matching record has an action of REJECT, then it points to a permission issue and can help you diagnose overly restrictive security group rules or network ACLs. You can also use flow logs as a security tool to monitor the traffic that is reaching your instance. Flow logs can also be exported to other services like the Amazon Elasticsearch Service to gain insight into traffic and to enable visualizations.
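
To make this concrete, the following sketch parses flow log records in the default space-separated format, where the action field is either ACCEPT or REJECT, and keeps only the rejected ones. The class and function names are illustrative, and the two sample records use made-up account, interface, and address values.

```python
from typing import List, NamedTuple


class FlowRecord(NamedTuple):
    """One record in the default flow log format (fields kept as strings)."""
    version: str
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: str
    dstport: str
    protocol: str
    packets: str
    byte_count: str
    start: str
    end: str
    action: str
    log_status: str


def parse_flow_record(line: str) -> FlowRecord:
    fields = line.split()
    if len(fields) != len(FlowRecord._fields):
        raise ValueError(f"expected {len(FlowRecord._fields)} fields, got {len(fields)}")
    return FlowRecord(*fields)


def rejected(lines: List[str]) -> List[FlowRecord]:
    """Return only the records whose action is REJECT."""
    return [r for r in map(parse_flow_record, lines) if r.action == "REJECT"]


# Illustrative sample data, not taken from a real account.
SAMPLE = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 "
    "49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]
```

Calling rejected(SAMPLE) returns the single record destined for port 3389, which is the kind of entry to look for when a security group or network ACL is suspected.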

AWS Config

With AWS Config, you can capture a comprehensive history of your AWS resource configuration changes to simplify troubleshooting of your operational issues. AWS Config can be used to help identify AWS resource changes that may have caused operational issues. AWS Config leverages AWS CloudTrail records to correlate configuration changes to particular events in your account. You can obtain the details of the event Application Programming Interface (API) call that invoked the change (for example, who made the request, at what time, and from which IP address) from the AWS CloudTrail logs.

AWS Trusted Advisor

AWS Trusted Advisor is an online resource to help you reduce cost, increase performance, and improve security by optimizing your AWS environment. AWS Trusted Advisor provides real-time guidance to help you provision your resources following AWS best practices. The metrics and recommendations that it reports include network-related service limits for VPCs, Elastic IP addresses, and load balancers. These service limit metrics can help you quickly identify whether a service limit has been reached, and they allow you to request limit increases proactively.

AWS Identity and Access Management (IAM) Policy Simulator

The AWS Identity and Access Management (IAM) Policy Simulator is a very useful tool in troubleshooting IAM permissions. The simulator evaluates the policies that you choose and determines the effective permissions for each of the actions that you specify. The simulator uses the same policy evaluation engine that is used during real requests to AWS Cloud services.

Troubleshooting Common Scenarios

There are some common scenarios and technologies that can experience network connectivity issues. A description of each, situations in which they occur, and key points to consider in troubleshooting are discussed in this section.

Internet Connectivity

New VPCs do not have public Internet connectivity by default. User action is required to set up the appropriate requirements for Internet connectivity.

There are five requirements for connectivity to the Internet from an Amazon EC2 instance:

A public IP address is assigned to the instance (note that for IPv6, all addresses are assigned by AWS and are public), or the instance sends its traffic through a Network Address Translation (NAT) gateway that has a public IP address and resides in a public subnet.
An Internet gateway is attached to the VPC.
There is a default route to an Internet gateway in the route table on the public subnet. If a NAT gateway is used, then the default route on private subnets should point to the NAT gateway.
Outbound ports are open in the instance security group (80 and 443 for web traffic).
Inbound and outbound ports are open in the subnet network ACL (80 and 443 outbound and ephemeral port range inbound).
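
The five requirements above can be treated as a checklist. The sketch below is only an illustration: the InternetPath fields are hypothetical booleans that you would fill in by inspecting the console or API, and no AWS calls are made.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InternetPath:
    """Hypothetical summary of one instance's Internet-facing configuration."""
    has_public_ip_or_nat: bool
    igw_attached: bool
    default_route_present: bool
    sg_allows_outbound: bool
    nacl_allows_both_directions: bool


# (attribute name, message reported when the requirement is not met)
CHECKS = [
    ("has_public_ip_or_nat", "no public IP address and no NAT gateway in a public subnet"),
    ("igw_attached", "no Internet gateway attached to the VPC"),
    ("default_route_present", "no default route to the Internet gateway or NAT gateway"),
    ("sg_allows_outbound", "security group does not allow the outbound ports"),
    ("nacl_allows_both_directions", "network ACL blocks outbound ports or inbound ephemeral ports"),
]


def failed_requirements(path: InternetPath) -> List[str]:
    """Return a message for every requirement that is not satisfied."""
    return [message for attr, message in CHECKS if not getattr(path, attr)]
```

An empty list means all five conditions hold; otherwise the messages come back in the same order as the requirements, which is a reasonable order in which to investigate.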

Virtual Private Network

AWS provides a managed Virtual Private Network (VPN) service to allow for easy connectivity between on-premises environments and VPCs. After you have created your VPN, you can download the IP Security (IPsec) VPN configuration from the VPC console to configure the firewall or device in your local network that will connect to the VPN.

The following should be checked if there are issues with traffic over a VPN tunnel:

Verify that the VPN tunnels are connected. (See the following section for more details on Internet Key Exchange [IKE] phase 1 and phase 2 troubleshooting.)
Verify that the proper VPC is attached to the Virtual Private Gateway (VGW).
Verify that there are either propagated or static routes for the on-premises subnets reached over the VPN, with the VGW as the target, in each subnet route table.
Verify that the subnet network ACLs and instance security groups are set to allow the traffic that you would like to flow over the VPN.

These steps can be optimized using a top-down approach. For example, if some traffic is traversing a VPN connection and other traffic is not, then you can skip the steps on troubleshooting the VPN tunnel connectivity and start looking at routing and security groups/network ACLs.

Internet Key Exchange (IKE) Phase 1 and Phase 2 Troubleshooting

In the event that the VPN tunnels are not established, IKE phase 1 followed by IKE phase 2 of the IPsec tunnel should be investigated.

If there are issues establishing an IKE phase 1 connection, then the following should be checked:

Verify that IKEv1 is being used instead of IKEv2; AWS only supports IKEv1.
Verify that Diffie-Hellman (DH) group 2 is being used.
Verify that the phase 1 lifetime is set to 28,800 seconds (480 minutes or 8 hours).
Verify that phase 1 is using the Secure Hash Algorithm (SHA) 1 hashing algorithm.
Verify that phase 1 is using Advanced Encryption Standard (AES) 128 as the encryption algorithm.
Verify that the customer gateway device is configured with the correct Preshared Key (PSK) specified in the downloaded AWS VPN configuration for the tunnels.
If the customer gateway endpoint is behind a NAT device, verify that IKE traffic leaving the customer on-premises network is sourced from the configured customer gateway IP address and on User Datagram Protocol (UDP) port 500. Also test by disabling NAT traversal on the customer gateway device.
Verify that UDP packets on port 500 (and port 4500 if using NAT traversal) are allowed to pass to and from your network to the AWS VPN endpoints. Ensure that there is no device in place between your customer gateway and the VGW that could be blocking UDP port 500; this includes checking Internet Service Providers (ISPs) that could be blocking UDP port 500.

If there are issues establishing an IKE phase 2 connection, then the following should be checked:

Verify that Encapsulating Security Payload (ESP) protocol 50 is not blocked inbound or outbound.
Verify that the security association lifetime is 3,600 seconds (60 minutes).
Verify that there are no firewall ACLs interfering with IPsec traffic.
Verify that phase 2 is using the SHA-1 hashing algorithm.
Verify that phase 2 is using AES-128 as the encryption algorithm.
Verify that Perfect Forward Secrecy (PFS) is enabled and that DH group 2 is being used for key generation.
Enhanced AWS VPN endpoints support some additional advanced encryption and hashing algorithms, such as AES 256; SHA-2(256); and DH groups 5, 14–18, 22, 23, and 24 for phase 2. If your VPN connection requires any of these additional features, contact AWS to verify that you are using the enhanced VPN endpoints. Typically, you must re-create the VGW of your VPC to move to the enhanced VPN endpoints.
If you are using policy-based routing, verify that you have correctly defined the source and destination networks in your encryption domain.
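
One way to apply the phase 1 and phase 2 checklists is to compare the customer gateway settings against the baseline values listed above. The sketch below assumes you have transcribed the device configuration into a dictionary by hand; the key names are hypothetical, and the baseline deliberately ignores the additional algorithms offered by the enhanced VPN endpoints.

```python
from typing import Dict, List

# Baseline values taken from the phase 1 and phase 2 checklists above.
EXPECTED: Dict[str, Dict[str, object]] = {
    "phase1": {
        "ike_version": 1,
        "dh_group": 2,
        "lifetime_seconds": 28800,
        "hash": "sha1",
        "encryption": "aes128",
    },
    "phase2": {
        "lifetime_seconds": 3600,
        "hash": "sha1",
        "encryption": "aes128",
        "pfs_dh_group": 2,
    },
}


def mismatches(device: Dict[str, Dict[str, object]]) -> List[str]:
    """List every setting on the device that differs from the baseline."""
    problems = []
    for phase, expected in EXPECTED.items():
        actual = device.get(phase, {})
        for key, want in expected.items():
            got = actual.get(key)
            if got != want:
                problems.append(f"{phase}.{key}: expected {want!r}, found {got!r}")
    return problems
```

A device configured for IKEv2, for instance, would be reported as a phase 1 mismatch on ike_version, which points you at the first item in the phase 1 list.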

AWS Direct Connect

AWS Direct Connect lets you establish a dedicated network connection between your network and one of the AWS Direct Connect locations. Using industry standard 802.1q Virtual Local Area Networks (VLANs), this dedicated connection can be partitioned into multiple Virtual Interfaces (VIFs). This allows you to use the same connection to access public resources, such as objects stored in Amazon Simple Storage Service (Amazon S3) using public IP address space and private resources such as Amazon EC2 instances running within a VPC using private IP space, all while maintaining network separation between the public and private environments. VIFs can be reconfigured at any time to meet your changing needs.

An AWS Direct Connect connection can either be established directly within an AWS Direct Connect location or extended to your location through an AWS Partner Network (APN) partner. Some APN partners also offer hosted VIFs at sub-1 Gbps speeds. Note that these hosted VIFs are not full AWS Direct Connect connections and only support a single VIF.

The following are items to consider when troubleshooting an AWS Direct Connect connection:

AWS Direct Connect requires single mode fiber.
A VIF must have a public or private Border Gateway Protocol (BGP) Autonomous System Number (ASN). If you are using a public ASN, you must own it.
For a public VIF, you must specify public IPv4 addresses (/30) that you own.
For IPv6, regardless of the type of VIF, AWS automatically allocates a /125 IPv6 Classless Inter-Domain Routing (CIDR) for you. You cannot specify your own peer IPv6 addresses.
There is a limit of 50 VIFs per AWS Direct Connect connection.
There is a limit of 100 routes per BGP session. Advertising more routes than this can cause port flapping.
AWS Direct Connect supports a Maximum Transmission Unit (MTU) up to 1,522 bytes at the physical connection layer. If your network path does not support this, then you should set an MTU value below 1,500 bytes on your router to avoid issues.
Security groups and network ACLs must be configured to allow access.
Route propagation needs to be enabled on each subnet route table for the routes learned through BGP to show up.
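
If the 100-route limit is a concern, it can help to check how far the advertised prefixes can be summarized before they are announced over BGP. The following sketch uses Python's standard ipaddress module; the function names and the constant are illustrative, and the prefixes in the usage note are made up.

```python
import ipaddress
from typing import List

BGP_ROUTE_LIMIT = 100  # per-session limit quoted in the list above


def summarize(prefixes: List[str]) -> List[str]:
    """Collapse adjacent or overlapping prefixes into the fewest networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


def within_limit(prefixes: List[str]) -> bool:
    """Return True if the prefix count fits within the per-session limit."""
    return len(prefixes) <= BGP_ROUTE_LIMIT
```

For example, the 256 prefixes 10.1.0.0/24 through 10.1.255.0/24 exceed the limit on their own, but summarize collapses them into the single route 10.1.0.0/16.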

Security Groups

Security groups are implicit DENY. Unless a rule allowing incoming or outgoing traffic is created, traffic will not flow. This means that even if two instances are in the same subnet, they will not be able to communicate with each other unless there is a rule created allowing such traffic. Security groups are also stateful, meaning that if an inbound or outbound rule is created, it will allow the return traffic.

The following are items to consider when troubleshooting security groups:

There is a limit of 50 on the number of inbound and outbound rules.
Up to five security groups can be added per network interface.
Only ALLOW rules can be added to a security group.

Amazon VPC Flow Logs can be useful for troubleshooting security group-related issues. Traffic will be recorded as a rejected packet if there is not a rule in place to allow it.

Network Access Control Lists

By default, the network access control list (ACL) on a VPC is set to allow all inbound and outbound traffic. If network ACLs are set to be more restrictive, care must be taken to allow all required traffic. Note that network ACLs are not stateful like security groups—return traffic to an outbound port must be explicitly permitted with an ALLOW rule. For example, locking down outbound to port 80 and port 443 only in a subnet would also require an inbound ALLOW rule for ephemeral ports (1024-65535). A good understanding of network traffic flows and ports and protocols in use should be established prior to implementing network ACLs on a subnet.

The following are items to consider when troubleshooting network ACLs:

Network ACLs are not stateful like security groups.
Return traffic may need the entire ephemeral port range of 1024-65535 to be open.
There is a limit of 20 inbound and outbound rules per ACL.
By default, each custom network ACL denies all inbound and outbound traffic until you add rules.
Rules are evaluated starting with the lowest numbered rule. As soon as a rule matches traffic, it is applied regardless of any higher-numbered rule that may contradict it.

Applications commonly communicate over a number of ports and often require an inbound rule for return traffic. Amazon VPC Flow Logs can be useful for troubleshooting network ACL-related issues. Traffic will be recorded as a rejected packet if there is not a rule to allow it or there is an explicit rule to block it.
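
The ordered, stateless evaluation described above can be modeled in a few lines. This is a simplified sketch that matches on port only (real network ACL rules also match on protocol and CIDR), and all names in it are illustrative.

```python
from typing import List, NamedTuple


class AclRule(NamedTuple):
    number: int
    port_from: int
    port_to: int
    allow: bool


def evaluate(rules: List[AclRule], port: int) -> bool:
    """Apply rules in ascending rule-number order; the first match wins."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # nothing matched, so the traffic is denied


def web_request_succeeds(outbound: List[AclRule], inbound: List[AclRule],
                         server_port: int, client_port: int) -> bool:
    """Model an outbound request and its reply through a stateless ACL."""
    # The request is checked against outbound rules on the server port.
    # The reply is checked separately against inbound rules on the
    # client's ephemeral port, because no connection state is kept.
    return evaluate(outbound, server_port) and evaluate(inbound, client_port)


OUTBOUND = [AclRule(100, 80, 80, True), AclRule(110, 443, 443, True)]
INBOUND_NO_EPHEMERAL: List[AclRule] = []
INBOUND_WITH_EPHEMERAL = [AclRule(100, 1024, 65535, True)]
```

With OUTBOUND and INBOUND_NO_EPHEMERAL, a request to port 443 from client port 50000 fails even though the outbound rule allows it; adding the inbound ephemeral range makes it succeed.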

Routing

Routing within a VPC is controlled by a route table attached to each subnet. Note that unless a route table is explicitly associated with a subnet, the main route table for the VPC will be used for each subnet. Knowing the caveats of routing and how routing within VPC works is beneficial to troubleshooting. The following are some common considerations when troubleshooting route tables:

The most specific route that matches traffic in a route table is used.
IPv4 and IPv6 routes are handled separately.
The default local route of the VPC CIDR cannot be modified.
When you add an Internet gateway, an egress-only Internet gateway (for IPv6), a VGW, a NAT device, a peering connection, or a VPC endpoint in your VPC, you must update the route table for any subnet that uses these gateways or connections.
There is a limit of 50 non-propagated routes that you can have in a route table.
There is a hard limit of 100 routes that can be propagated to a VPC route table. More general routes or a default route should be used if this limit is reached.
If a custom Amazon EC2 instance is used as a router, then its elastic network interface needs to be added in the route table as the target of the relevant routes. Note that source/destination checking must also be disabled on the instance for traffic to flow.
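
The most-specific-route rule can be checked by hand with a short script. The sketch below models a route table as a dictionary from destination CIDR to target; the target identifiers are placeholders, not real resource IDs.

```python
import ipaddress
from typing import Dict, Optional


def select_target(route_table: Dict[str, str], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific route matching destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    candidates = [(ipaddress.ip_network(cidr), target)
                  for cidr, target in route_table.items()]
    # An IPv4 address never matches an IPv6 network (and vice versa),
    # so the two address families are effectively handled separately.
    matching = [(net, target) for net, target in candidates if ip in net]
    if not matching:
        return None
    # The longest prefix is the most specific route.
    return max(matching, key=lambda item: item[0].prefixlen)[1]


# Hypothetical route table; the identifiers are placeholders.
ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "igw-example",
    "192.168.0.0/16": "vgw-example",
    "192.168.10.0/24": "eni-example",
}
```

Here traffic to 192.168.10.5 goes to eni-example rather than vgw-example, because the /24 route is more specific than the /16.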

Virtual Private Cloud (VPC) Peering Connections

A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses. Instances in either VPC can communicate with each other as if they are within the same network. You can create a VPC peering connection between your own VPCs or with a VPC in another AWS account. In both cases, the VPCs must be in the same AWS Region.

AWS uses the existing infrastructure of a VPC to create a VPC peering connection; it is neither a gateway nor a VPN connection, and it does not rely on a separate piece of physical hardware. There is no single point of failure for communication or a bandwidth bottleneck.

The following are items to consider when troubleshooting VPC peering connections:

VPC peering connections are not transitive. Traffic not in a destination VPC CIDR will not flow over a peering link.
A VPC peering connection cannot be used to reach out to the Internet or to a VPN connection.
A route needs to be added to each subnet route table with the remote VPC CIDR as the destination and the peering connection as the target. This needs to be done on both sides of the VPC peering connection.
There cannot be conflicting or overlapping VPC CIDR ranges.
There is a limit of 50 VPC peering connections per VPC. This limit can be increased to a maximum of 125 by opening an AWS Support ticket.
Security groups and network ACLs need to be set to allow traffic to flow on the source and destination instances and subnets.
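
Overlapping CIDR ranges are easy to test for before a peering request is made. The following sketch uses Python's standard ipaddress module; the function name can_peer is an illustrative choice, and it checks only the CIDR condition from the list above.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two VPC CIDR ranges do not overlap."""
    network_a = ipaddress.ip_network(cidr_a)
    network_b = ipaddress.ip_network(cidr_b)
    return not network_a.overlaps(network_b)
```

For example, 10.0.0.0/16 and 10.1.0.0/16 pass the check, whereas 10.0.0.0/16 and 10.0.128.0/17 do not because the second range sits inside the first.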

Connectivity to AWS Cloud Services

AWS Cloud services that reside outside of a VPC require a public IP address for access. This can be accomplished through a NAT gateway, public IP address, proxy server, or by setting up an endpoint on a VPC (if an endpoint is available for the service).

The following considerations should be checked when there are issues accessing AWS Cloud services:

There should be an Internet gateway, proxy, or VPC endpoint that enables connectivity from private IP addresses within the VPC.
If a VPC endpoint is used, there should be a route in the route table that has the VPC endpoint as the target.
Security groups and network ACLs should be checked to confirm they are properly set to allow the instances and services to communicate.
IAM roles (if in use) should allow communication with the service. If a VPC endpoint is used, then the IAM policy on the endpoint must be set to allow access. The IAM Policy Simulator tool is helpful in troubleshooting permissions issues.

Amazon CloudFront Connectivity

Amazon CloudFront is a global Content Delivery Network (CDN) service that securely delivers data, videos, applications, and APIs to viewers with low latency and high transfer speeds. Amazon CloudFront is integrated with AWS—both physical locations which are directly connected to the AWS global infrastructure, as well as software that works seamlessly with services including AWS Shield for Distributed Denial of Service (DDoS) mitigation, Amazon S3, Elastic Load Balancing, or Amazon EC2 as origins for your applications, and AWS Lambda to run custom code close to your viewers.

The following are some common items to consider when troubleshooting Amazon CloudFront connectivity problems:

If you are using a custom DNS entry, then it must be a CNAME that points to your Amazon CloudFront distribution’s domain name.
If the Amazon CloudFront origin is an Amazon S3 bucket, the objects must either be publicly readable, or an Origin Access Identity (OAI) must be created, attached to the distribution, and granted permissions on the Amazon S3 objects.
An HTTP 502 status code (Bad Gateway) indicates that Amazon CloudFront was not able to serve the requested object because it could not connect to the origin server. If 502 errors are seen, connectivity to the origin server should be confirmed.
An HTTP 503 status code (Service Unavailable) typically indicates a lack of capacity on the origin server. If 503 errors are seen, capacity at the origin server should be confirmed.

Elastic Load Balancing Functionality

Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve fault tolerance in your applications, seamlessly providing the required amount of load balancing capacity needed to route application traffic.

Due to the scalable nature of Elastic Load Balancing, the fleet of AWS-managed Elastic Load Balancing instances will grow and shrink to meet demand. This scaling requires allocating a sufficient amount of available IP addresses to the Elastic Load Balancing subnet. Failure to account for the scaling of an Elastic Load Balancing fleet can result in errors and the inability of the load balancer to balance traffic.

The following are some common considerations to take into account when troubleshooting Elastic Load Balancing:

Verify that a public load balancer resides only in public subnets.
Verify that the load balancer security group and network ACLs allow inbound traffic from the clients and outbound traffic to the clients.
Verify that the load balancer target’s security groups and network ACLs allow inbound traffic from the load balancer subnet and outbound traffic to the load balancer subnet.
Verify that health checks receive the expected success code. The default is 200; this should be modified if your application uses an alternate success code.
If your application depends on session state held on a specific back-end instance, verify that sticky sessions are enabled on the load balancer; such stateful connections will not work correctly otherwise.

Domain Name System

Domain Name System (DNS) provides hostname to IP resolution. AWS creates a DNS server by default within a VPC. (This option can be disabled.) AWS has a managed DNS service called Amazon Route 53 that provides the ability to create public and private hosted zones. Private hosted zones are only available within a VPC, while public hosted zones are accessible globally.

DNS entries have a TTL value for each DNS record within a domain that specifies how long a client can cache the values of a DNS query. These TTL values are only a suggestion, and they can be ignored by caching at intermediary DNS servers, operating systems, and individual applications. For this reason, DNS queries may take some time to resolve to the correct values, even after the TTL period has expired.

The following are items to consider when troubleshooting DNS issues:

AWS reserves the base address of the VPC CIDR plus two as a DNS server (for example, 10.0.0.2 in a 10.0.0.0/16 VPC). If a custom DNS server is needed, then an alternate Dynamic Host Configuration Protocol (DHCP) option set can be created.
If a DNS entry is not resolving properly after an entry update, then the TTL of a DNS entry may not have expired.
DNS values can be cached by ISPs, operating systems, and applications, and they can return incorrect values even after a TTL has expired.
Only CNAME records should be used to point to AWS Cloud services. Using A records can cause errors.
The enableDnsHostnames and enableDnsSupport VPC attributes must be set to true for Amazon Route 53 private hosted zones.

The nslookup command is a useful tool for troubleshooting DNS issues to determine to which IP address a hostname resolves.
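
When confirming DNS server settings, it helps to know which address the AWS-provided resolver should have. It sits at the base of the VPC network range plus two, which the sketch below computes; the function name is an illustrative choice.

```python
import ipaddress


def vpc_dns_resolver(vpc_cidr: str) -> str:
    """Return the address of the AWS-provided DNS server for a VPC CIDR."""
    network = ipaddress.ip_network(vpc_cidr)
    # Base of the VPC network range plus two.
    return str(network.network_address + 2)
```

For a VPC with the CIDR 10.0.0.0/16 this gives 10.0.0.2, which is the server address you would expect nslookup to report on an instance that uses the default DHCP option set.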

Hitting Service Limits

Every AWS Cloud service has some type of limit. There are hard limits, which cannot be increased, and soft limits, which can be increased with an AWS Support ticket. It is very important to understand these limits. A lot of troubleshooting time can be saved by recognizing when a service limit is the root cause of an issue. Each service’s page on the AWS website lists the limits. A subset of network limits can also be seen in AWS Trusted Advisor:

Elastic IP address
Amazon VPC
Subnets per security group
Internet gateway
Active load balancers

Many services will show an error in the API response or AWS Management Console when trying to create or allocate resources once a limit has been reached.

Summary

In this chapter, you reviewed core concepts of troubleshooting connectivity within AWS and connectivity from AWS to on-premises networks.

Core troubleshooting tools consist of the following:

Traditional Tools
- Packet captures
- ping
- traceroute
- Telnet
- nslookup
AWS-Native Tools
- Amazon CloudWatch
- Amazon VPC Flow Logs
- AWS Config
- AWS Trusted Advisor
- IAM Policy Simulator

In this chapter, you also reviewed some common troubleshooting scenarios. The best way to get experience in troubleshooting is to use the tools and address common issues that may arise. It is recommended that you complete the exercises at the end of this chapter in order to gain hands-on experience with network troubleshooting in AWS.

Exam Essentials

Understand methodologies for troubleshooting. It is important to understand how to troubleshoot common network anomalies that occur and how doing so in a cloud or hybrid environment can be different from on-premises networking.

Understand tools for troubleshooting. In addition to traditional troubleshooting tools, there are a number of AWS tools discussed in this chapter with which you should be familiar.

Understand the conditions required for Internet connectivity. There are five conditions that must be met for connectivity to the Internet from an Amazon EC2 instance:

A public IP address is assigned to the instance (note that for IPv6, all addresses are assigned by AWS and are public), or the instance sends its traffic through a NAT gateway that has a public IP address and resides in a public subnet.
An Internet gateway is attached to the VPC.
There is a default route to an Internet gateway in the route table on the public subnet. If a NAT gateway is used, then the default route on private subnets should point to the NAT gateway.
Outbound ports are open in the instance security group (ports 80 and 443 for web traffic).
Inbound and outbound ports are open in the subnet network ACL (ports 80 and 443 outbound and ephemeral port range inbound).

Understand network ACLs vs. security groups. Security groups are stateful, whereas network ACLs are not. There is an implicit DENY with security groups. Rules must be added to allow network traffic. If network ACLs are used, then care must be taken to ensure that return traffic (whether inbound or outbound) is allowed.

Understand how routing works with Amazon VPC. There is an implicit route within a VPC for its CIDR. All other routes to destinations outside of the CIDR need to be added to the route table. There is a main route table for all subnets when a VPC is initially created, and additional route tables can be added. Each subnet is associated with exactly one route table at a time; however, multiple subnets can share the same route table. More specific routes have a higher preference.

Understand VPN IPsec and how to troubleshoot. There are two phases to IPsec to establish a VPN tunnel. You should know the requirements for each phase and how to troubleshoot when one or both fail to complete. You should also understand how routing works with VPN tunnels and how it works as a standby if an AWS Direct Connect is also in use.

Understand AWS Direct Connect and how to troubleshoot. There are a number of requirements that must be completed before traffic can flow over an AWS Direct Connect connection. There is also a difference between a private VIF (connectivity to a VPC) and public VIF (connectivity to public AWS Cloud services). In the case of a hosted VIF, there is only one VIF that can be created with each.

Understand VPC peering and valid versus invalid configurations. VPC peering will not be established if there are overlapping or conflicting CIDR addresses. Peering connections are not transitive. Any traffic that is not in the CIDR range of the VPC peer will not flow over the peering connection.
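The CIDR rules for peering can be checked before a peering request is made. This is a minimal sketch using Python's standard ipaddress module; the function names are hypothetical, and it compares a single CIDR per VPC, whereas a real VPC can have several CIDR blocks that would all need to be compared.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Peering requires that the two VPC CIDRs do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def flows_over_peering(destination_ip, peer_cidr):
    """Only traffic destined for the peer VPC's CIDR can use the connection."""
    return ipaddress.ip_address(destination_ip) in ipaddress.ip_network(peer_cidr)


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))         # True
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))       # False, the ranges overlap
print(flows_over_peering("10.1.2.3", "10.1.0.0/16"))  # True
print(flows_over_peering("8.8.8.8", "10.1.0.0/16"))   # False
```

The last call reflects the non-transitive nature of peering: a destination outside the peer's CIDR, such as the Internet or a third VPC, cannot be reached through the peering connection.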

Understand how DNS and Amazon Route 53 work and how to troubleshoot. DNS resolution is provided by default within a VPC by an AWS-managed endpoint. Amazon Route 53 can be used for hosting private zones within a VPC and public zones outside of a VPC. CNAMEs should be used to point to AWS-provided endpoint hostnames.

Resources to Review

For further information, refer to the following pages on the AWS website:

AWS Troubleshooting Amazon VPC Connectivity:
https://aws.amazon.com/premiumsupport/knowledge-center/connect-vpc/
AWS VPN Troubleshooting:
https://aws.amazon.com/premiumsupport/knowledge-center/vpn-tunnel-troubleshooting/
AWS Direct Connect Troubleshooting:
http://docs.aws.amazon.com/directconnect/latest/UserGuide/Troubleshooting.html
AWS CloudFront Troubleshooting:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Troubleshooting.html
AWS Instance SSH Connectivity Troubleshooting:
https://aws.amazon.com/premiumsupport/knowledge-center/instance-vpc-troubleshoot/

Exercises

The best way to become familiar with troubleshooting is to install and leverage the tools mentioned in this chapter. There is no substitute for the experience that comes from working within the AWS environment, becoming familiar with how networking works, and learning how to work through common troubleshooting situations.

EXERCISE 13.2

Test Instance-to-Instance Connectivity with ping

In this exercise, you will test the ability of two instances within the same VPC to communicate using the ping tool.

Open up a Secure Shell (SSH) connection to your instance in the public subnet.
Use the ping command to try to reach the instance in the private subnet.
You will notice that the ICMP traffic fails unless you have already modified the security groups.
Under the Amazon VPC section of the AWS Management Console, navigate to Security Groups.
Create a new security group that allows inbound and outbound ICMP traffic from the subnets in which the two instances reside.
Under the Amazon EC2 section of the AWS Management Console, select each instance and attach the newly created security group. This option is available under Actions, then Networking, then Security Groups.

The instances will now be able to ping each other.
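The same rule can be expressed as data instead of console clicks. The sketch below only builds a rule in the IpPermissions shape used by the Amazon EC2 security group API, where a protocol of icmp with FromPort and ToPort set to -1 means all ICMP types and codes. The helper name and the example subnet CIDRs are assumptions, and no AWS call is made here.

```python
import ipaddress


def icmp_permission(subnet_cidrs):
    """Build one rule allowing all ICMP types and codes from the given subnets."""
    for cidr in subnet_cidrs:
        # Raises ValueError on a malformed CIDR before anything is sent to AWS.
        ipaddress.ip_network(cidr)
    return {
        "IpProtocol": "icmp",
        "FromPort": -1,  # -1 means every ICMP type
        "ToPort": -1,    # -1 means every ICMP code
        "IpRanges": [{"CidrIp": cidr} for cidr in subnet_cidrs],
    }


# Hypothetical CIDRs for the public and private subnets used in the exercise.
rule = icmp_permission(["10.0.0.0/24", "10.0.1.0/24"])
print(rule)
```

With credentials configured, a dictionary like this could be passed in the IpPermissions list of an authorize-security-group-ingress request. Because security groups are stateful, an inbound rule on the target instance is enough for the echo reply to return.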

EXERCISE 13.4

Using traceroute

In this exercise, you will use the traceroute tool to determine the route of network traffic.

Create a new VPC. Peer the new VPC with your existing VPC.
Create a t2.micro instance in the new VPC and enable inbound ICMP traffic.
Use SSH to access your public Amazon EC2 instance. Run traceroute to the IP address of the instance in your VPC peer.
You will notice that the traffic goes out over the default route to the Internet and that traceroute will eventually time out.
Add a route to the route table that is attached to the public subnet, using the CIDR of the new VPC that you created as the destination and the VPC peering connection as the target.
Run traceroute again. You will notice that traffic will now traverse the VPC peering connection; however, it will still fail to complete.
Add a route to the route table in the new VPC, using the CIDR of the existing VPC as the destination and the VPC peering connection as the target.
Run traceroute again. This time it will complete and return a response.

Using the traceroute tool, you were able to identify a routing issue and correct it by adding a route to the route table.
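The three traceroute results in this exercise correspond to three states of the two route tables. The following sketch models that reasoning in Python; the function name, the CIDRs, and the pcx-example identifier are hypothetical, and route tables are simplified to a mapping of destination CIDR to target.

```python
def peering_route_status(local_routes, peer_routes, local_cidr, peer_cidr, pcx_id):
    """Report which direction of a peered path is missing a route, if any."""
    forward_ok = local_routes.get(peer_cidr) == pcx_id
    return_ok = peer_routes.get(local_cidr) == pcx_id
    if not forward_ok:
        return "no forward route: traffic follows the default route and times out"
    if not return_ok:
        return "no return route: traffic reaches the peer but replies are lost"
    return "ok: routes exist in both directions"


LOCAL, PEER, PCX = "10.0.0.0/16", "10.1.0.0/16", "pcx-example"

# First traceroute: neither route table has a route to the other VPC.
print(peering_route_status({}, {}, LOCAL, PEER, PCX))
# Second traceroute: only the existing VPC's route table has been updated.
print(peering_route_status({PEER: PCX}, {}, LOCAL, PEER, PCX))
# Third traceroute: both route tables have been updated.
print(peering_route_status({PEER: PCX}, {LOCAL: PCX}, LOCAL, PEER, PCX))
```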

Review Questions

You place an application load balancer in front of two web servers that are stateful. Users begin to report intermittent connectivity issues when accessing the website. Why is the site not responding?
1. The website needs to have port 443 open.
2. Sticky sessions must be enabled on the application load balancer.
3. The web servers need to have their security group set to allow all Transmission Control Protocol (TCP) traffic from 0.0.0.0/0.
4. The network Access Control List (ACL) on the subnet needs to allow a stateful connection.
You create a new instance, and you are able to connect over Secure Shell (SSH) to its private IP address from your corporate network. The instance does not have Internet access, however. Your internal policies forbid direct access to the Internet. What is required to enable access to the Internet?
1. Assign a public IP address to the instance.
2. Ensure that port 80 and port 443 are not set to DENY in the instance security group.
3. Deploy a Network Address Translation (NAT) gateway in the private subnet.
4. Ensure that there is a default route in the subnet route table that goes to your on-premises network.
You create a Network Address Translation (NAT) gateway in a private subnet. Your instances cannot communicate with the Internet. What action must you take?
1. Add a default route out to the Internet gateway.
2. Ensure that outbound traffic is allowed on port 80 and port 443.
3. Delete the NAT gateway and deploy it in a public subnet.
4. Place the instances in a public subnet.
What is not required for Internet connectivity from a public subnet?
1. Public IP
2. Network Address Translation (NAT) gateway
3. Outbound rule in a security group
4. Inbound rule in the network Access Control List (ACL)
5. Outbound rule in the network ACL
6. An Internet gateway
7. A default route to an Internet gateway
You are trying to add two new Virtual Private Cloud (VPC) peering connections to a VPC with 24 existing peering connections. The first connection works fine, but the second connection returns an error message. What should you do?
1. Submit a request to AWS Support to have your VPC peer limit increased.
2. Select another AWS Region to set up the VPC peering connection.
3. Retry the request again; the error may go away.
4. Deploy a Virtual Private Network (VPN) instance to connect the VPC.
You created a new endpoint for your Virtual Private Cloud (VPC) that does not have Internet connectivity. Your instance cannot connect to Amazon Simple Storage Service (Amazon S3). What could be the problem?
1. There is no route in your route table to the Amazon S3 VPC endpoint.
2. The Amazon S3 bucket is in another region.
3. Your bucket access list is not properly configured.
4. The VPC endpoint does not have the proper AWS Identity and Access Management (IAM) policy attached to it.
5. All of the above
You recently set up Amazon Route 53 for a private hosted zone for a highly-available application hosted on AWS. After adding a few A records, you notice that the instance hostnames are not resolving within the Virtual Private Cloud (VPC). What actions should be taken? (Choose two.)
1. Allow port 53 on the instance security group.
2. Create a Dynamic Host Configuration Protocol (DHCP) option set.
3. Set enableDnsHostnames to true on the VPC.
4. Set enableDnsSupport to true on the VPC.
You discover that the default Virtual Private Cloud (VPC) has been deleted from region us-east-1 by a coworker in the morning. You will be deploying a lot of new services during the afternoon. What should you do?
1. It’s not important, so no action is required.
2. Designate a VPC that you create as the default VPC.
3. Create an AWS Support ticket to have your VPC re-created.
4. Perform an Application Programming Interface (API) call or go through the AWS Management Console to create a new default VPC.
You are responsible for your company’s AWS resources. You notice a significant amount of traffic from an IP address range in a foreign country where your company does not have customers. Further investigation of the traffic indicates that the source of the traffic is scanning for open ports on your Amazon Elastic Compute Cloud (Amazon EC2) instances. Which one of the following resources can prevent the IP address from reaching the instances?
1. Security group
2. Network Address Translation (NAT) gateway
3. Network Access Control List (ACL)
4. A Virtual Private Cloud (VPC) endpoint
Which of the following tools can be used to record the source and destination IP addresses of traffic? (Choose two.)
1. Flow logs
2. Packet capture on an instance
3. AWS CloudTrail
4. AWS Identity and Access Management (IAM)
