THE AWS CERTIFIED ADVANCED NETWORKING – SPECIALTY EXAM OBJECTIVES COVERED IN THIS CHAPTER MAY INCLUDE, BUT ARE NOT LIMITED TO, THE FOLLOWING:
Amazon CloudFront is a global Content Delivery Network service that speeds up the distribution of your static and dynamic web content. Amazon CloudFront delivers your content through a worldwide network of edge locations. Amazon CloudFront integrates with other AWS products to give developers and organizations an easy way to distribute content to end users with low latency, high data transfer speeds, and no minimum usage commitments. This chapter reviews the components that make up Amazon CloudFront and then examines its advanced features. The chapter concludes with key exercises and questions related to Amazon CloudFront and the AWS Certified Advanced Networking – Specialty Exam.
A Content Delivery Network (CDN) is a globally-distributed network of caching servers that accelerate the downloading of web pages, images, videos, and other content. CDNs use Domain Name System (DNS) geolocation to determine the geographic location of each request for a web page or other content. They then serve that content from caching servers closest to that location—whether “closest” is measured in distance or time (latency)—instead of the original web server. A CDN allows you to increase the scalability and decrease the latency of a website or mobile application easily in response to traffic spikes. In most cases, using a CDN is completely transparent—end users simply experience better website performance, while the load on your original website is reduced.
CDNs were primarily invented to circumvent a constant that has yet to be overcome in the networking world: the speed of light. In a vacuum, the speed of light is roughly 300,000 kilometers per second; in fiber-optic cables, it can be up to 30 percent slower. When such fiber-optic cables and their associated optical repeaters traverse the vast expanse of the Pacific Ocean, for example, responses from web servers back to clients can take upwards of hundreds of milliseconds. In the networking world, this results in reduced throughput and poor performance for customers. By using a CDN, you can overcome the limitations of serving content over large distances by caching or pre-positioning data at predefined locations. You can also isolate the load on your centralized web servers by having each edge location where your content is cached serve the content for you, therefore increasing your scale immensely on an edge location basis.
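As a rough illustration of the speed-of-light constraint described above, the following sketch computes the minimum round-trip time over fiber. The two distances are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
# Rough propagation-delay arithmetic. The distances used below are
# illustrative assumptions, not measured cable lengths.
SPEED_OF_LIGHT_KM_S = 300_000   # speed of light in a vacuum, from the text
FIBER_SLOWDOWN = 0.30           # "up to 30 percent slower" in fiber


def round_trip_ms(distance_km: float, slowdown: float = FIBER_SLOWDOWN) -> float:
    """Return the minimum round-trip propagation time in milliseconds."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * (1.0 - slowdown)
    return 2 * distance_km / speed_km_s * 1000


if __name__ == "__main__":
    for label, km in [("nearby edge location", 100), ("trans-Pacific origin", 10_000)]:
        print(f"{label}: {round_trip_ms(km):.1f} ms")
```

Under these assumptions a single trans-Pacific round trip costs roughly 95 milliseconds of propagation delay alone. Connection setup and the request itself each need at least one round trip, which is how response times reach hundreds of milliseconds.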
Amazon CloudFront is the AWS CDN. It can be used to deliver your web content using Amazon’s global network of edge locations. When a user requests content that is served with Amazon CloudFront, the user is routed to the edge location that provides the lowest latency (time delay), so content is delivered with the best possible performance. If the content is already in the edge location with the lowest latency, Amazon CloudFront delivers it immediately. If the content is not currently in that edge location, Amazon CloudFront retrieves it from the origin server, such as an Amazon Simple Storage Service (Amazon S3) bucket or a web server. The origin server stores the original, definitive versions of your content.
Amazon CloudFront is optimized to work with other AWS Cloud services that serve as the origin server, including Amazon S3 buckets, Amazon S3 static websites, Amazon Elastic Compute Cloud (Amazon EC2) instances, and Elastic Load Balancing load balancers. Amazon CloudFront also works seamlessly with non-AWS origin servers, such as an existing on-premises web server. Amazon CloudFront also integrates with Amazon Route 53.
Amazon CloudFront supports all content that can be served over HTTP or HTTPS. This includes any popular static files that are a part of your web application, such as HTML files, images, JavaScript, and CSS files, and also audio, video, media files, or software downloads. Amazon CloudFront also supports serving dynamically generated web pages, so it can be used to deliver your entire website. Lastly, Amazon CloudFront supports media streaming, using both HTTP and Real-Time Messaging Protocol (RTMP).
There are three core concepts that you need to understand in order to start using Amazon CloudFront: distributions, origins, and cache control. With these concepts, you can use Amazon CloudFront to speed up delivery of content from your websites.
To use Amazon CloudFront, you start by creating a distribution, which is identified by a DNS domain name such as d111111abcdef8.cloudfront.net. To serve files from Amazon CloudFront, you simply use the distribution domain name in place of your website’s domain name; the rest of the file paths stay unchanged. You can use the Amazon CloudFront distribution domain name as-is, or more typically you create a user-friendly DNS name in your own domain by creating a Canonical Name (CNAME) record in Amazon Route 53 or another DNS service that refers to the distribution’s domain name. DNS resolves requests for the CNAME to your Amazon CloudFront distribution domain name, so clients reach the distribution without any change on their side. If you use Amazon Route 53 as your DNS service, you can also use an alias record to point a zone apex such as “example.com” (which cannot be a CNAME) at your CloudFront distribution.
When you create a distribution, you must specify the DNS domain name of the origin—the Amazon S3 bucket or HTTP server—from which you want Amazon CloudFront to retrieve the definitive version of your objects (web files). For example, note the following:
Once requested and served from an edge location, objects stay in the cache until they expire or are evicted to make room for more frequently requested content. By default, objects expire from the cache after 24 hours. After an object expires, the next request results in Amazon CloudFront forwarding the request to the origin to verify that the object is unchanged or to fetch a new version if it has changed.
Optionally, you can control how long objects stay in an Amazon CloudFront cache before expiring. To do this, you can choose to use Cache-Control headers set by your origin server, or you can set the minimum, maximum, and default Time to Live (TTL) for objects in your Amazon CloudFront distribution.
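The interaction between an origin's Cache-Control header and the distribution's TTL settings can be sketched as a clamp. This is a simplified model, not the service's full rule set: it considers only the max-age directive and ignores s-maxage, Expires, no-cache, no-store, and private. The function name and the 365-day default for the maximum TTL are assumptions made for illustration.

```python
import re
from typing import Optional

DAY = 24 * 60 * 60  # seconds


def effective_ttl(cache_control: Optional[str],
                  min_ttl: int = 0,
                  default_ttl: int = DAY,
                  max_ttl: int = 365 * DAY) -> int:
    """Return how long (in seconds) an object stays cached in this model.

    If the origin sends max-age, the value is clamped to [min_ttl, max_ttl].
    If it does not, the default TTL is used, clamped to the same range.
    """
    if cache_control:
        match = re.search(r"max-age=(\d+)", cache_control, re.IGNORECASE)
        if match:
            return max(min_ttl, min(int(match.group(1)), max_ttl))
    return max(min_ttl, min(default_ttl, max_ttl))


if __name__ == "__main__":
    print(effective_ttl(None))                       # no header: default TTL
    print(effective_ttl("public, max-age=60"))       # origin value is honored
    print(effective_ttl("max-age=60", min_ttl=300))  # raised to the minimum TTL
```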
You can also remove copies of an object from all Amazon CloudFront edge locations at any time by calling the invalidation Application Programming Interface (API) or through the Amazon CloudFront console. This feature removes the object from every Amazon CloudFront edge location regardless of the expiration period you set for that object on your origin server. The invalidation feature is designed to be used in unexpected circumstances, such as to correct an error or to make an unanticipated update to a website—not as part of your everyday workflow.
Instead of invalidating objects manually or programmatically, it is a best practice to use a version identifier as part of the object (file) path name. For example, note the following:
When using versioning, users will see the latest content through Amazon CloudFront when you update your site without using invalidation. Old versions will expire from the cache automatically. That said, depending on other settings, you may need to invalidate the base page that includes references to the versioned objects.
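One way to apply a version identifier is to insert it as a directory in the object path. The sketch below assumes that naming scheme; the function name and the "v2" label are hypothetical, and other schemes (such as a version suffix in the file name) work equally well.

```python
import posixpath


def versioned_path(path: str, version: str) -> str:
    """Insert a version directory immediately before the file name."""
    directory, filename = posixpath.split(path)
    return posixpath.join(directory, version, filename)


if __name__ == "__main__":
    # Each new version gets a new path, so edge caches treat it as a new object.
    print(versioned_path("/images/logo.jpg", "v2"))
```

Because the base page that references these paths keeps the same name, it is the one object that may still need a short TTL or an invalidation.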
After some initial setup, Amazon CloudFront works transparently to speed up delivery of your content. This overview provides you with the steps required to set up Amazon CloudFront to serve your content, as well as the process that happens behind the scenes when serving content to your users.
The following steps walk you through the process required to configure Amazon CloudFront:
CloudFront uses your origin server to retrieve your files for distribution from Amazon CloudFront edge locations.
An origin server stores the original, definitive version of your objects. If you are serving content over HTTP, your origin server is either an Amazon S3 bucket or an HTTP server, such as a web server. Your HTTP server can run on an Amazon EC2 instance or on a server that you manage; these servers are also known as custom origins.
If you distribute media files on demand using Adobe RTMP, your origin server is always an Amazon S3 bucket.
As you build your website or application, you can use the domain name that Amazon CloudFront provides for your URLs when referencing objects. For example, if Amazon CloudFront returns the domain d111111abcdef8.cloudfront.net for your distribution, the URL for logo.jpg in your Amazon S3 bucket or the root directory of your web server would be as follows: http://d111111abcdef8.cloudfront.net/logo.jpg. A more typical practice, however, is to use relative paths that do not specify the host part of the URL at all, unless another host name is actually required. This provides more flexibility in terms of site construction and the use of CNAMEs, load balancers, and CloudFront distributions. For example, an image file would be referenced as “/images/website-logo.png”, or to take the previous example, “logo.jpg”. This allows the reference to work properly whether the web page is accessed directly from the server by its DNS name or IP address, via the CloudFront distribution’s DNS name, or via a CNAME such as www.example.com that you provide that points to the CloudFront distribution’s DNS name.
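The benefit of relative references can be seen by resolving the same reference against pages served under different host names. The sketch below uses the standard library's URL resolution; the page URLs and the function name are illustrative assumptions.

```python
from urllib.parse import urljoin


def resolve(page_url: str, reference: str) -> str:
    """Resolve a reference found in a page the way a browser would."""
    return urljoin(page_url, reference)


if __name__ == "__main__":
    pages = [
        "http://d111111abcdef8.cloudfront.net/index.html",  # distribution name
        "http://www.example.com/index.html",                # CNAME
    ]
    for page in pages:
        # The same reference follows whichever host served the page.
        print(resolve(page, "/images/website-logo.png"))
```

Note that a reference without a leading slash, such as "logo.jpg", resolves relative to the directory of the page that contains it rather than the site root.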
Optionally, you can configure your origin server to add headers to the files that indicate how long you want them to stay in the cache in Amazon CloudFront edge locations. By default, each object stays in an edge location for 24 hours before it expires. The minimum expiration time is 0 seconds, with no maximum expiration time limit.
Figure 7.1 shows an overview of the steps required to configure your Amazon CloudFront distribution.
The following steps outline what happens when users request objects after you’ve configured Amazon CloudFront to deliver your content.
The process for CloudFront content delivery is shown in Figure 7.2.
Amazon CloudFront edge locations are the regional points of presence that are used to cache objects and store these closer to your application or website’s end users. As of the time of this writing, Amazon CloudFront has a global network of 100 edge locations in 50 cities across 23 countries. These edge locations include 89 Points of Presence and 11 Regional Edge Caches.
Regional Edge Caches are CloudFront locations that are deployed globally in AWS regions, at closer proximity to your users. These locations sit between your origin server and the global edge locations that serve traffic directly to your users. As the popularity of your objects declines, individual edge locations may evict those objects to make room for more popular content. Regional Edge Caches have a much larger cache size than their global edge location counterparts, which allows objects to remain in cache longer.
When a user makes a request to your website or application, DNS routes the request to the Amazon CloudFront edge location that can best serve the user’s request. This location is typically the nearest Amazon CloudFront edge location in terms of latency. In the edge location, Amazon CloudFront checks its cache for the requested files. If the files are in the cache, Amazon CloudFront returns them to the user. If the files are not in the cache, the edge servers go to the nearest Regional Edge Cache to fetch the object. In the Regional Edge Cache location, Amazon CloudFront again checks its cache for the requested files. If the files are in the cache, Amazon CloudFront forwards the files to the requested edge location.
As soon as the first byte arrives from a Regional Edge Cache location, Amazon CloudFront will begin to forward the files to the user. Amazon CloudFront also adds the files to the cache in the requested edge location for the next time someone requests those files.
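The lookup order described above can be modeled as a small tiered cache. This is a conceptual sketch only: dictionaries stand in for the caches and the origin, the function name is hypothetical, and expiry and eviction are left out.

```python
from typing import Dict, Tuple


def fetch(path: str,
          edge: Dict[str, bytes],
          regional: Dict[str, bytes],
          origin: Dict[str, bytes]) -> Tuple[bytes, str]:
    """Return the object and the tier that supplied it."""
    if path in edge:
        return edge[path], "edge"
    if path in regional:
        body, source = regional[path], "regional"
    else:
        # Miss in both caches: the Regional Edge Cache fetches from the origin.
        body, source = origin[path], "origin"
        regional[path] = body
    # The edge location keeps a copy for the next request.
    edge[path] = body
    return body, source


if __name__ == "__main__":
    origin = {"/logo.jpg": b"image-bytes"}
    edge, regional = {}, {}
    print(fetch("/logo.jpg", edge, regional, origin)[1])  # first request
    print(fetch("/logo.jpg", edge, regional, origin)[1])  # repeat request
```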
Amazon CloudFront Regional Edge Cache locations are suited for content that might not be popular enough to remain consistently within Amazon CloudFront edge locations but still might benefit from being located closer to the requestor of the content.
Some important points to consider for Amazon CloudFront Regional Edge Caches:
When you want to use Amazon CloudFront to distribute your content, you create a distribution and specify configuration settings such as your origin and whether you want your files to be available to everyone or have restricted access.
You can also configure Amazon CloudFront to require users to use HTTPS to access your content, forward cookies and/or query strings to your origin, prevent users from particular countries from accessing your content, and create access logs.
You can use web distributions to serve the following content over HTTP or HTTPS:
Amazon CloudFront can do much more than simply serve static web files. To start using the service’s advanced features, you will need to understand how to use cache behaviors and how to restrict access to sensitive content.
Serving static assets, as described previously, is a common way to use a CDN. An Amazon CloudFront distribution, however, can easily be set up to also serve dynamic content and to use more than one origin server. You can control which requests are served by which origin and how requests are cached using a feature called cache behaviors.
A cache behavior lets you configure a variety of Amazon CloudFront functionalities for a given URL path pattern for files on your website, as shown in Figure 7.3. One cache behavior applies to all PHP files in a web server (dynamic content) using the path pattern *.php, while another behavior applies to all JPEG images in another origin server (static content) using the path pattern *.jpg.
The functionality that you can configure for each cache behavior includes the following:
Cache behaviors are applied in order; if a request does not match the first path pattern, it drops down to the next path pattern. Normally, the last path pattern specified is * to match all files.
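The ordered, first-match evaluation of path patterns can be sketched with shell-style wildcard matching. The origin names below are hypothetical, and the standard library matcher is only an approximation of the service's pattern syntax (it also accepts bracket expressions, for example).

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# (path pattern, origin) pairs in evaluation order; the names are hypothetical.
BEHAVIORS: List[Tuple[str, str]] = [
    ("*.php", "dynamic-web-server"),
    ("*.jpg", "static-image-server"),
    ("*", "default-origin"),  # catch-all, listed last
]


def select_origin(path: str, behaviors: List[Tuple[str, str]] = BEHAVIORS) -> str:
    """Return the origin of the first behavior whose pattern matches the path."""
    for pattern, origin in behaviors:
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError(f"no cache behavior matches {path!r}")


if __name__ == "__main__":
    for path in ["/index.php", "/images/photo.jpg", "/css/site.css"]:
        print(path, "->", select_origin(path))
```

Order matters: if the catch-all pattern were listed first, it would match every request and the more specific behaviors would never apply.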
Amazon CloudFront handles all content in one distribution, including dynamically generated content that cannot be cached alongside the wide array of content that can be (see previous and following sections). You might assume that there is no performance benefit for uncacheable content. After all, if the Amazon CloudFront edge location must reach back to the origin each time it receives a request for a URL representing dynamic content, how can it speed up content delivery? By the same reasoning, you might assume that Amazon CloudFront cannot speed up access to a cacheable item the first time it is requested, before it is in the cache.
As it turns out, even dynamic or initially uncached content will often be delivered with lower latency to end users. The reason has to do with the time it takes to set up the TCP or TLS connections that underlie the content caching and delivery mechanisms. Each such connection takes a finite amount of time to establish, and if a connection from CloudFront to the origin can be reused, significant latency gains are possible.
For example, let’s assume that the round-trip latency between an end user and the Amazon CloudFront edge location is 30 milliseconds, and the round-trip latency between the edge location and the origin is 100 milliseconds. (For context, as of this writing, even over the high-performance AWS backbone, the round-trip latency from the Singapore region to the Northern Virginia region was about 240 milliseconds.) In all cases, before any content can be delivered for the very first time, TCP connection establishment across the three hosts (which requires one full round trip for the SYN and SYN/ACK packets on each leg, from client to edge and from edge to origin) will take at least 130 milliseconds (ignoring local overhead, which is much higher for TLS connections).
Now, let’s assume a new client connects to the edge location and begins requesting content from the same origin, whether dynamic content, or content not yet in the edge cache. The Amazon CloudFront edge server will often be able to re-use an existing connection to the origin server, and avoid the connection setup overhead. This can reduce the first-byte delivery time by 100 milliseconds or more. That may not seem like a lot, but even 1/10 of a second per TCP connection can add up quickly. Avoiding the overhead of establishing an encrypted TLS session each time will decrease latency even more. So using Amazon CloudFront is a performance win even in cases where content caching is not playing a role. Your users will be happy to receive the best possible performance in all of these scenarios.
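The arithmetic in this example can be written out as a small model. It is deliberately simplified: it counts one round trip per TCP handshake and one round trip per leg for the request and response, and it ignores TLS, server processing time, and transfer time. The function name is hypothetical.

```python
def first_byte_ms(client_edge_rtt: float,
                  edge_origin_rtt: float,
                  reuse_origin_connection: bool) -> float:
    """Estimate time to first byte for content the edge must fetch from the origin."""
    # The client always opens a connection to the edge location.
    handshake = client_edge_rtt
    if not reuse_origin_connection:
        # The edge must also open a new connection to the origin.
        handshake += edge_origin_rtt
    # The request travels client -> edge -> origin, and the response returns.
    request_response = client_edge_rtt + edge_origin_rtt
    return handshake + request_response


if __name__ == "__main__":
    cold = first_byte_ms(30, 100, reuse_origin_connection=False)
    warm = first_byte_ms(30, 100, reuse_origin_connection=True)
    print(f"new origin connection: {cold} ms, reused: {warm} ms, saved: {cold - warm} ms")
```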
Amazon CloudFront also supports connections from clients via the HTTP/2 protocol. That new protocol, already supported by most modern browsers, provides a significant number of enhancements that improve performance by connection re-use, multiplexing, server push, etc. Even if your origin server does not support HTTP/2 yet, those enhanced features in use between your end-users and the Amazon CloudFront edge servers can significantly improve performance even when Amazon CloudFront is accessing your origin server using HTTP/1.x. Not only can you use Amazon CloudFront to optimize origin access via connection re-use, but content in the edge cache will be delivered faster than it could be from your origin servers, even ignoring latency differences between the edge and the origin.
Using cache behaviors and multiple origins, you can easily use Amazon CloudFront to serve your whole website and to support different behaviors for different client devices.
In many cases, you may want to restrict access to content in Amazon CloudFront only to selected requestors, such as paid subscribers or to applications or users in your company network. Amazon CloudFront provides several mechanisms to allow you to serve private content:
Signed URLs Use URLs that are valid only between certain times and optionally from certain IP addresses.
Signed cookies Require authentication via public and private key pairs.
Origin Access Identities (OAI) Restrict access to an Amazon S3 bucket only to a special Amazon CloudFront user associated with your distribution. This is the easiest way to ensure that content in a bucket is accessed only by Amazon CloudFront.
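To make the signed URL mechanism more concrete, the sketch below builds the kind of policy statement that limits a URL to an expiry time and, optionally, a source IP range, and then encodes it in a URL-safe form. It stops short of signing: producing the signature requires an RSA private key and a cryptography library outside the standard library, so that step is omitted. The function names, resource URL, and IP range are illustrative assumptions.

```python
import base64
import json
from typing import Optional


def build_custom_policy(resource_url: str,
                        expires_epoch: int,
                        source_ip_cidr: Optional[str] = None) -> str:
    """Return a compact JSON policy restricting access by time and optional IP range."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_ip_cidr:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    return json.dumps(policy, separators=(",", ":"))


def url_safe_b64(data: bytes) -> str:
    """Base64-encode, then swap the characters that are unsafe in a query string."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


if __name__ == "__main__":
    policy = build_custom_policy(
        "http://d111111abcdef8.cloudfront.net/images/image.jpg",
        expires_epoch=1767225600,        # hypothetical expiry timestamp
        source_ip_cidr="192.0.2.0/24",   # documentation-only IP range
    )
    print(policy)
    print(url_safe_b64(policy.encode("utf-8")))
```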
RTMP distributions stream media files using Adobe Media Server and the Adobe RTMP. When using an RTMP distribution for Amazon CloudFront, you need to provide both your media files and a media player to your end users. Media player examples include JW Player, Flowplayer, and Adobe Flash.
End users will view your media files using the media player that you provide for them. They do not use the media player (if any) that is already installed on their computer or device. This is due in part to the fact that when the end user streams your media file, the media player begins to play the content of the file while the file is still being downloaded from Amazon CloudFront. The media file is not stored locally on the end user’s system.
To use Amazon CloudFront to serve media in this way, you need two types of distributions: a web distribution to serve the media player and an RTMP distribution for the media files. The web distribution will serve files over HTTP, while the RTMP distribution will stream media files over RTMP or a variant of RTMP.
Figure 7.4 shows that the media files and your media player are stored in different buckets in Amazon S3. You could also make the media player available to users in other ways, such as using Amazon CloudFront and a custom origin; however, the media files must use an Amazon S3 bucket at the origin.
Figure 7.4 also shows two separate buckets being used: one for your media files and the other for your media player. You can also store media files and your media player in the same Amazon S3 bucket (not shown in the figure).
In Figure 7.4, there are two distributions used for Amazon CloudFront streaming:
There are other streaming options available with Amazon CloudFront.
Wowza Streaming Engine 4.2 You can use the Wowza Streaming Engine 4.2 to create live streaming sessions for global delivery using Amazon CloudFront. Wowza Streaming Engine 4.2 supports the following HTTP-based streaming protocols:
For these protocols, Amazon CloudFront will break video into smaller chunks that are cached in the Amazon CloudFront network for improved performance and scalability.
Live HTTP streaming using Amazon CloudFront and any HTTP origin Amazon CloudFront supports any live encoder, such as Elemental Live. The encoder must output HTTP-based streams to stream live performances, webinars, and other events.
On-demand video streaming using Amazon CloudFront and other media players When streaming media files using Amazon CloudFront, you provide both the media files and media player that you want end users to utilize to play the media file.
In Amazon CloudFront, an alternate domain name lets you use your own domain name (for example, www.example.com) for links to your objects instead of using the domain name that CloudFront assigns to your distribution. Both web and RTMP distributions support alternate domain names.
When you create a distribution, Amazon CloudFront returns a domain name for the distribution, for example: d111111abcdef8.cloudfront.net.
When you use the Amazon CloudFront domain name for your objects, the URL for an object called /images/image.jpg would be: http://d111111abcdef8.cloudfront.net/images/image.jpg.
If you want to use your own domain name, such as www.example.com, instead of the cloudfront.net domain name that Amazon CloudFront assigned to your distribution, you can add an alternate domain name to your distribution for www.example.com. You can then use the following URL for /images/image.jpg: http://www.example.com/images/image.jpg.
When you add alternate domain names, you can use the wildcard * at the beginning of a domain name instead of specifying subdomains individually. For example, with an alternative domain name of *.example.com, you can use any domain name that ends with example.com in your object URLs, such as www.example.com, product-name.example.com, and marketing.product-name.example.com.
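The wildcard rule can be sketched as a suffix check on the host name. This is an illustration of the matching idea only, with a hypothetical function name; it is not the service's validation logic.

```python
def matches_alternate_name(host: str, alternate_name: str) -> bool:
    """Return True if a request host is covered by an alternate domain name."""
    host = host.lower().rstrip(".")
    alternate_name = alternate_name.lower()
    if alternate_name.startswith("*."):
        suffix = alternate_name[1:]  # for example ".example.com"
        # The wildcard must stand for at least one label before the suffix.
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == alternate_name


if __name__ == "__main__":
    for host in ["www.example.com", "marketing.product-name.example.com", "example.com"]:
        print(host, matches_alternate_name(host, "*.example.com"))
```

In this model the bare domain example.com is not covered by *.example.com and would need its own alternate domain name entry.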
For web distributions, you can configure Amazon CloudFront to require that viewers use HTTPS to request your objects, and even automatically redirect users from an HTTP endpoint to the HTTPS endpoint for your distribution. This results in connections between users and Amazon CloudFront being encrypted. You also can configure Amazon CloudFront to use HTTPS to retrieve objects from your origin so that connections are encrypted when Amazon CloudFront communicates with your origin from edge locations and Regional Edge Caches.
Here is the process that is followed when Amazon CloudFront receives a request for an object, and you require HTTPS to communicate with both your users and your origin:
AWS Certificate Manager (ACM) is designed to simplify and automate many of the tasks that are traditionally associated with management of SSL/TLS certificates. ACM takes care of the complexity surrounding the provisioning, deployment, and renewal of digital certificates, with certificates being provided by Amazon’s certificate authority (CA), Amazon Trust Services.
You can provision SSL/TLS certificates and associate them with Amazon CloudFront distributions. First, you provision a certificate using ACM and then deploy it to your Amazon CloudFront distribution. ACM also has the ability to manage certificate renewals for you. ACM allows you to provision, deploy, and manage the certificate with no additional charges. There are, however, additional charges when using Amazon CloudFront and HTTPS.
To use an ACM Certificate with Amazon CloudFront, you must request or import the certificate in the US East (N. Virginia) Region. ACM certificates in this region that are associated with an Amazon CloudFront distribution are disseminated to all the geographic locations configured for that distribution.
If you need to remove objects from Amazon CloudFront edge locations and Regional Edge Caches before they expire, you can invalidate them. There is no charge for the first 1,000 invalidation paths per month; you pay for each invalidation path over 1,000 in a month.
To invalidate objects, you can specify either the path for individual objects or a path that ends with the * wildcard, which might apply to one object or many objects. The following are examples of specific object and wildcard invalidations:
An alternative to invalidating objects is to use object versioning to serve a different version of the object that has a different fully-qualified name (name including path).
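The difference between a specific-object invalidation path and one that ends with the * wildcard can be sketched as an exact match versus a prefix match. The paths and the function name below are hypothetical.

```python
def is_invalidated(object_path: str, invalidation_path: str) -> bool:
    """Return True if the invalidation path covers the given object path."""
    if invalidation_path.endswith("*"):
        # A trailing wildcard covers every object sharing the prefix.
        return object_path.startswith(invalidation_path[:-1])
    return object_path == invalidation_path


if __name__ == "__main__":
    cached = ["/images/image1.jpg", "/images/image2.jpg", "/css/site.css"]
    for pattern in ["/images/image1.jpg", "/images/image*", "/*"]:
        covered = [p for p in cached if is_invalidated(p, pattern)]
        print(pattern, "->", covered)
```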
Amazon CloudFront can create log files that contain detailed information about every user request that Amazon CloudFront receives. Access logs are available for both web and RTMP distributions. When you enable logging for your distribution, you specify the Amazon S3 bucket in which you want Amazon CloudFront to store log files.
You can store the log files for multiple distributions in the same bucket. When you enable logging, you can specify an optional prefix for the file names so that you can keep track of which log files are associated with which distributions.
AWS Lambda@Edge is an extension of AWS Lambda, a compute service that lets you execute functions that customize the content that is delivered through Amazon CloudFront. You can author functions in one region and execute them in AWS Regions and edge locations globally, without provisioning or managing servers. Just as with AWS Lambda, Lambda@Edge scales automatically, from a few requests per day to thousands per second. Lambda@Edge processes requests at edge locations instead of an origin server, which can significantly reduce latency and improve the user experience.
When you associate an Amazon CloudFront distribution with a Lambda@Edge function, Amazon CloudFront intercepts requests and responses at Edge locations. Lambda@Edge functions execute in response to Amazon CloudFront events in the region or edge location that is closest to your customer.
You can execute AWS Lambda functions when the following Amazon CloudFront events occur:
The following are some example use cases for Lambda@Edge:
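As a minimal sketch of one such function, the handler below inspects the User-Agent header of an incoming request and rewrites the path for mobile devices. It assumes the function is attached to a viewer request event and that the event carries the request under Records, cf, request, with header names in lowercase. The "/mobile" path prefix and the simple "Mobile" substring check are hypothetical choices for illustration.

```python
def lambda_handler(event, context):
    """Rewrite the request path for mobile devices, then let processing continue."""
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    # Each header maps to a list of {"key": ..., "value": ...} entries.
    user_agent = headers.get("user-agent", [{"value": ""}])[0]["value"]

    # Hypothetical rule: serve mobile clients from a /mobile path prefix.
    if "Mobile" in user_agent and not request["uri"].startswith("/mobile/"):
        request["uri"] = "/mobile" + request["uri"]

    # Returning the request tells the service to continue with it.
    return request
```

Returning the (possibly modified) request object lets normal cache lookup and origin fetch proceed with the rewritten path.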
With Amazon CloudFront Field-Level Encryption, you can encrypt sensitive pieces of content at the edge before requests are forwarded to your origin servers. The data is encrypted using a public key that you supply. That data can then be decrypted inside your application using the associated private key. In an era of agile DevOps teams developing large applications from a range of APIs and loosely coupled microservices, isolating sensitive data when it first enters the application, and decrypting it only at one or a few key points in its lifecycle, can significantly improve application security while enabling greater agility in secure application development.
You configure Amazon CloudFront field-level encryption by going through a series of steps that include uploading the public key, creating encryption profiles, setting up a configuration that makes use of those profiles, and then linking that configuration to a cache behavior. You can specify up to 10 fields in an HTTP POST request that are to be encrypted, and you can set it so that different profiles are applied to each request based on a query string within the request URL.
When everything is properly configured, sensitive data fields coming from end users are encrypted automatically at the edge, and the request body, containing both encrypted and unencrypted data, can flow to and throughout your application. The data is decrypted only at the point where the application (most likely a particular microservice carefully designed and managed to handle sensitive data) needs to read the original values. Meanwhile, provided that the configuration is correct, all other parts of the application, as well as general logging, monitoring, and performance tracing facilities, never inadvertently examine, record, or expose the sensitive data elements that arrived from the user.
In this chapter, you learned about Amazon CloudFront, a global CDN service that integrates with other AWS products to give developers and organizations an easy way to distribute content to end users with low latency, high data transfer speeds, and no minimum usage commitments.
You learned about the different capabilities and features of Amazon CloudFront, including edge locations, Regional Edge Caches, web and RTMP distributions, origin servers, dynamic content delivery, access logs, Lambda@Edge, and field-level encryption.
CDNs are one of the main ways to provide consistent performance to users who are geographically dispersed across the globe. They can also reduce load on your origin server and provide increased web application scalability, performance, and security.
Know the basic use cases for Amazon CloudFront. Know when to use Amazon CloudFront, such as for popular static and dynamic content with geographically-distributed users.
Know how Amazon CloudFront works. Amazon CloudFront optimizes downloads by using geolocation to identify the geographical location of users and then serving and caching content at the edge location closest to each user.
Know how to create an Amazon CloudFront distribution and what types of origins are supported. To create a distribution, you specify an origin and the type of distribution, and then Amazon CloudFront creates a new domain name for the distribution. Origins supported include Amazon S3 buckets or static Amazon S3 websites and HTTP servers located on Amazon EC2 or in your own data center.
Know how to use Amazon CloudFront for dynamic content and multiple origins. Understand how to specify multiple origins for different types of content and how to use cache behaviors and path strings to control what content is served by which origin.
Know what mechanisms are available to serve private content through Amazon CloudFront. Amazon CloudFront can serve private content using Amazon S3 OAIs, signed URLs, and signed cookies.
Know how access logs for Amazon CloudFront work. Amazon CloudFront can create log files that contain detailed information about every user request that Amazon CloudFront receives.
Know how and why you would invalidate objects from Amazon CloudFront. If you need to remove objects from an Amazon CloudFront edge location cache before the object expires, you can invalidate the object from the Amazon CloudFront edge location caches.
Know how Lambda@Edge works and the use cases where it would be useful. Lambda@Edge is an extension of AWS Lambda, a compute service that lets you execute functions that customize the content that is delivered through Amazon CloudFront. You can execute AWS Lambda functions when Amazon CloudFront events occur.
Know why and how you would use ACM. ACM is designed to simplify and automate many of the tasks that are traditionally associated with management of SSL/TLS certificates. To use an ACM certificate with Amazon CloudFront, you must request or import the certificate in the US East (N. Virginia) Region.
Know why and how you would use HTTPS with Amazon CloudFront. For web distributions, you can configure Amazon CloudFront to require that viewers use HTTPS to request your objects.
The best way to become familiar with Amazon CloudFront is to build your own Amazon CloudFront distribution, which is what you will be doing in this section.
For assistance completing these exercises, refer to the Amazon CloudFront user guide located at: https://aws.amazon.com/documentation/cloudfront/.
What is a Content Delivery Network (CDN)?
You are using Amazon CloudFront for your website. A user requests content, which is routed to a local edge location. What happens before the requested content is available at that edge location?
Amazon CloudFront can work with which of the following origin servers? (Choose three.)
What is the default expiry time for an Amazon CloudFront cache?
What does the Amazon CloudFront invalidation feature do?
What does an Amazon CloudFront cache behavior do?
What does Amazon CloudFront do when it uses HTTP Live Streaming (HLS), HTTP Dynamic Streaming (HDS), Smooth Streaming, and MPEG DASH formats for streaming video?
When adding an alternate domain to your Amazon CloudFront distribution, the wildcard * can be used to do what?
When using AWS Certificate Manager (ACM) and Amazon CloudFront, you configured your certificate within ACM. When you try to enable Amazon CloudFront, however, you do not see the certificate available for use. What could be the problem?
How can you use the wildcard * when invalidating objects with Amazon CloudFront?
What do Amazon CloudFront access logs do?