Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 6
Storage Systems

THE AWS CERTIFIED SYSOPS ADMINISTRATOR - ASSOCIATE EXAM TOPICS COVERED IN THIS CHAPTER MAY INCLUDE, BUT ARE NOT LIMITED TO, THE FOLLOWING:

Domain 4.0: Deployment and Provisioning
4.1 Demonstrate ability to build the environment to conform to the architected design
Content may include the following:
- Optimizing storage for performance
4.2 Demonstrate ability to provision cloud resources and manage implementation automation
Content may include the following:
- Deploying storage solutions on the cloud
- Understanding the different storage options available on the AWS platform
- Optimizing storage for performance

Domain 5.0: Data Management
5.1 Demonstrate ability to implement backups for different services
Content may include the following:
- Working with snapshots
- Using AWS Storage Gateway

5.2 Demonstrate ability to enforce compliance requirements
Domain 6.0: Security
6.2 Ensure data integrity and access controls when using the AWS platform
Content may include the following:
- How to secure data at rest
- Using Amazon Simple Storage Service (Amazon S3) lifecycle policies
- Using Amazon Glacier vaults

images

Understanding Different Storage Options

One of the advantages of running your application on AWS is the wide variety of storage options that are available to you. AWS provides solutions to allow your team to optimize for the business needs of specific data. Variances in read or write capacity, simultaneous access options, storage cost, durability, availability, and retrieval speeds all become factors in choosing the right service. For any given enterprise application, you may find yourself managing many, if not all, of these different options as you optimize for different use cases.

Block Storage vs. Object Storage

The first factor in choosing a storage solution concerns block storage vs. object storage. Although this is primarily an architecture discussion, as systems operators you need to be aware of their primary differentiators in order to optimize usage on your end.

The key difference between block and object storage systems is the fundamental unit of storage. For block storage, each file (object) being saved to the drive is broken down into “blocks” of a specific size. If you were using 512 byte blocks and had a 10 KB file, you would create 20 blocks worth of drive space. Each block is saved and tracked independently of the others.

For object storage, each file is saved as a single object regardless of size. A 10 KB file is saved as a single 10 KB object, and a 30 MB file is saved as a single 30 MB object.

The best way to visualize this behavior is to compare what happens on a file update.

Updating a File (Block Storage vs. Object Storage)

In this scenario, we begin with a chapter for a book. After all of the graphics are embedded in the document, and the file size happens to be 30 MB. After it is saved to storage, the editor does one final review and she finds a capitalization error on the last page of the chapter. She finishes the edit and overwrites the file with the correction (the update is the key to this scenario).

If she was using block storage (assuming the 512-byte block), a 30 MB file is broken down into 60,000 blocks. Her edit of a single capitalization error only affected one out of the 60,000, so her save action only updates the one 512-byte block (or the delta—the difference between the documents) and her save is complete. This is a great method for objects with regular updates because the save only has to account for the delta and the rest of the object is untouched.

If she was using object storage, there is no concept of just updating the delta. Each save creates a unique, discrete object. An update will simply discard the previous object and upload all 30 MB as a brand new object.

On the surface, the object storage approach seems like a waste of network bandwidth and time required for loading a complete document just to update a single character. As you look closer at the specific use cases for block vs. object storage, however, you will understand why they behave so differently and why one would be preferable to the other in certain circumstances.

Block Storage Basics

To make it easier to understand, block storage is mountable drive storage. AWS provides unformatted drive space, then based on your operating system configuration, you choose the format, block size, and other factors. In most cases, these formatting decisions are pre-made by the Amazon Machine Image (AMI) selected, but AWS does not prevent you from making your own selections.

Block storage volumes are provisioned and attached to Amazon Elastic Compute Cloud (Amazon EC2) instances. In the case of Amazon Elastic Block Store (Amazon EBS), the volume’s lifecycle is independent of the instance itself.

Content management in block storage is controlled entirely by the operating system of the instance to which the volume is attached. AWS has no visibility inside the individual blocks; it only has visibility into the volume properties (such as volume encryption, volume size, IOPS).

Object Storage Basics

Unlike block storage, which functions as provisioned, mountable volumes controlled by the operating systems, object storage is 100 percent Application Programming Interface (API) driven. Applications with the proper credentials and authorization make calls to the object storage services for reads, writes, updates, deletes, and other functions.

Because all actions in and out of AWS object storage are API controlled, AWS can provide you with much more granular control over your content and more visibility of actual usage.

With the notable exception of AWS Snowball, all AWS object storage systems are regional in scope. Content placed in Amazon Simple Storage Service (Amazon S3) or Amazon Glacier is automatically replicated across multiple facilities in multiple Availability Zones, providing data durability that is not possible with single datacenter storage solutions. A later section in this chapter provides more information on durability and availability in object storage.

Because AWS has more visibility into your interactions with content in object storage, AWS does not require any pre-provisioning for object storage (again with the notable exception of AWS Snowball). You are billed only for the exact volume in storage for the hours that it is in storage.

Retrieval Times (Hot vs. Cold Storage)

When choosing the right storage medium, for both block storage and object storage, the second criteria to consider is retrieval times. In many cases, this consideration is framed in terms of hot and cold storage options.

The classic example is Amazon S3 vs. Amazon Glacier (which is examined in more detail later in this chapter). Amazon S3 is a highly available system where content can be delivered to many requests simultaneously for high and immediate parallel throughput. Amazon Glacier is designed for cold storage of content that is not needed regularly, and when it is needed for auditing or analysis, operations can wait three to five hours for the content to be made available.

Block storage does not have such a dramatic differentiation, but there are still media selections that can have an effect on the volume and velocity of requests in and out of the provisioned drives. The easy example here that you might see on the certification exam is the difference between a 1 TB magnetic volume and a 1 TB Solid State Drive (SSD) general purpose (gp2) volume in Amazon EBS. The first has a hard cap of no more than 100 IOPS while the gp2 volume has provisioned 3,000 IOPS as a baseline. If your data needed high throughput without bottlenecks, you would want to stay in the gp2 line of volumes.

Cost Efficiency

When choosing and managing the proper storage for your content, the third major criteria is cost. This book does not list actual prices because they can be different across the various regions and because AWS is constantly lowering prices wherever they can engineer new efficiencies. Be aware of the current pricing sheet for your AWS Region of choice.

The driving principles of AWS pricing are that customers only pay for what they provision, and customers with efficient operations only provision what they are actually using.

With block storage, this is specifically an operational consideration. Block storage volumes are provisioned for a specific size, and you are billed based on provisioned size each hour. AWS pricing charts show block storage as a monthly rate to avoid publishing a long list of zeros after a decimal place, even though the hourly rate is the actual billing that is applied to your volume.

Cost-efficient methods will focus on not overprovisioning volumes based on projected storage needs; rather, they will provision based on actual amounts and then grow the storage as needed. Provisioning a 3 TB drive based on estimates of needing that much in the next few years, even though current data sizes are only around 100 MB, would be a classic example of a cost scenario that systems operators can and should address.

With object storage, there is no concept of pre-provisioning (once again with the exception of AWS Snowball). AWS knows exactly how much data you have in the object storage systems, and you are billed for exactly that amount. You never have to tell AWS in advance how much you plan on storing, and your billing stops once you delete the content.

There are significant cost efficiencies that can be leveraged when examining the various types of object storage. Based on durability needs, retrieval times, and frequency, you will be able to improve the cost efficiency of your application dramatically. These different options are discussed in the Amazon S3 section of this chapter.

Block Storage on AWS

From a systems operation viewpoint, block storage must be provisioned, inventoried, attached, secured, monitored, duplicated, shared, resized, restored, detached, and decommissioned.

In some cases, the methods that you will use are common throughout all types of block storage. Let’s examine the various types and go through AWS operations.

Amazon Elastic Block Store (Amazon EBS)

Amazon EBS is the foundational block storage service from AWS. It is built as network-attached storage that is variable in size from 1 GB to 16 TB. As mentioned earlier, volumes are provisioned by the operator and then attached to instances Figure 6.1 shows how the command-line interface is used to create an Amazon EBS volume.

Image shows program code design of AWS CLI to create an Amazon EBS volume. It has codes with Availabilityzone, Encrypted, Volumeype, VolumeId, State, Iops, and so on. — **FIGURE 6.1** AWS CLI to create an Amazon EBS volume

Amazon EBS volume usage is traditionally divided into boot volumes and data volumes. While boot and data can be provisioned on a single Amazon EBS volume, resizing a single volume requires a complete shutdown of the instance. For stateless servers that effectively ignore any data storage, however, there is no reason to have a separate data volume.

Once created, a volume’s lifecycle is fundamentally independent from the Amazon EC2 instances. However, boot volumes are often destroyed when instances are terminated. Figure 6.2 shows an Amazon EC2 instance with Amazon EBS volumes attached.

Image shows connection between host computer (Amazon EC2 instance: block device mapping and instance store) with Amazon EBS volumes (Root device volume and vol-xxxxxxxx). Block device mapping connected to vol-xxxxxxxx and instance store. — **FIGURE 6.2** Amazon EC2 instance with attached Amazon EBS volumes

The fact that Amazon EBS is network-attached is the distinguishing attribute of the service. Because Amazon EBS is a mounted volume, minimizing latency is critical to AWS operations. To ensure the lowest latency possible, volumes cannot be mounted across Availability Zones.

For example, an account has an Amazon EC2 instance running in us-west-2a and two unattached volumes: One is running in us-west-2a, and the other is running in us-west-2b. Only the volume in -2a is allowed to be mounted. If the operator is in the Amazon EBS Console, only the -2a volume will be selectable. If the operator is using the API, the attempt to attach the volume will be rejected. Figure 6.3 demonstrates the AWS CLI with a rejected volume attachment.

Image shows program script designed for AWS CLI with a rejected volume attachment showing screen An error occurred (InvalidParameterValue when calling the AttachVolume operation: Value...). — **FIGURE 6.3** AWS CLI with a rejected volume attachment

Amazon EBS Volume Types

There are five types of Amazon EBS volumes that can be provisioned as of this writing. Table 6.1 details the various Amazon EBS volume types.

TABLE 6.1 The Various Amazon EBS Volume Types

Volume Type	API Reference	Sizes	Max IOPS*	Max Throughput	Price (us-east-1)
General Purpose SSD	gp2	1 GB–16 TB	10,000	160 MB/s	$.10/GB-month
Provisioned IOPS SSD	io1	4 GB–16 TB	20,000	320 MB/s	$.125/GB-month $.065/provisioned IOPS
Throughput Optimized HDD	st1	500 GB–16 TB	500	500 MB/s	$.045/GB-month
Cold HDD	sc1	500 GB–16 TB	250	250 MB/s	$.025/GB-month
Magnetic (Previous Gen)	standard	1 GB–1 TB	40-200	40-90 MB/s	$.05/GB-month $.05/million I/O
* io1/gp2/standard based on 16 K I/O size, st1/sc1 based on 1 MB I/O packet size.

The most visible difference between types is whether they are SSD or magnetic Hard Disk Drives (HDD). Do not assume that SSD is always the best performance choice for your Amazon EC2 instances. If you will be using Amazon EBS, you must decide which Amazon EBS volume type is the best for your business value.

For the certification exam, it is important to align the volume type’s characteristics to the values and desired outcome presented in the question. If you are looking for the lowest-cost option, provisioned IOPS volumes are almost always the wrong choice. If maximum throughput is the need, you should avoid choosing default magnetic drives.

Pricing

Let’s start with the last column: Price. Although actual costs vary region to region, the prices in Table 6.1 are good representations of comparative costs.

A common statement that we hear about Amazon EBS is that customers choose gp2 for speed and magnetic for price. With the introductions of st1 and sc1 volumes, there are now magnetic drives that become the most performant options for some workloads while still reducing costs.

Determining the cheapest Amazon EBS volume is more than just choosing the lowest GB-month rate—there is also the minimum volume size to consider. The Cold HDD (sc1) is the lowest cost at $.025 GB-month, but there is a minimum volume size of 500 GB. For example, if you only need a 50 GB volume for all of your needs, assuming cost is your only concern, going with the legacy default magnetic drive at 50 GB will be the least expensive. The flex point for choosing an Amazon EBS volume type is 250 GB. Once you require that much space, changing from default to sc1 will start saving money while providing significantly improved performance. Don’t let the label “cold” keep you from using this volume type in production systems. Compared to the legacy default volumes, sc1 is a significant improvement on all performance metrics.

Provisioned IOPS is often the most expensive option. As a rule, these drives should not be used arbitrarily in operations unless you are specifically ensuring the availability of more than 3,000 IOPS. Provisioned IOPS volumes not only have the highest GB-month storage cost, but they have the additional $.065 per provisioned IOPS per month. A 10,000 IOPS volume will include a monthly cost of $650 above and beyond the monthly cost for the storage itself.

Maximum IOPS

IOPS is the measurement of how many read/write actions can happen against the volume per second. The ratings for the volumes are based on a standard packet size, although you can choose the actual packet size that your application will use based on your needs. Because of the number of variables in play, it is important to understand the fundamentals of calculating potential IOPS.

The ratings on the volumes are based on 16 KB packets for SSD volumes and 1 MB packets for HDD volumes. The reason for the difference is to optimize each media type based on their specialty. In this case, SSD volumes handle many small packets with greater efficiency, while HDD is best suited for fewer large packets.

You can choose the actual packet size your application uses, but there are some limitations:

Packet sizes for SSD are capped at 256 KB.
HDD volumes can go as large as 1 MB.

Maximum Throughput

Throughput is the product of packet size and IOPS. Even though st1 and sc1 volumes have relatively small maximum IOPS (hundreds compared to thousands for gp2 or io1), the total throughput for st1 and sc1 far outstrips the SSD options because of the significantly larger packet options.

For large numbers of small random writes, gp2 (or io1) will provide the most throughput due to the higher IOPS. Using the smallest packet size of 16 KB, an SSD with 10,000 IOPS would sustain 160 MB/s throughput (which is the maximum throughput for gp2). Using the same 16 KB packet with an st1 volume, even at max IOPS with only 500 IOPS available, the volume would reach capacity at a paltry 8 MB/s—only 5 percent of the throughput of the SSD options.

By changing the packet size to the maximum allowed, however, the st1 volume running 1 MB packets at the same 500 IOPS is now able to read/write at 500 MB/s or over 300 percent of the maximum throughput of gp2.

Provisioning Amazon EBS

When provisioning Amazon EBS volumes, the systems operator declares the following:

Volume type
Size
Availability Zone
IOPS (only if io1)
Snapshot ID (if this volume is a copy of a saved snapshot)
Encryption options (including AWS Key Management Service [AWS KMS] key ID)

Once a drive is provisioned, the values are not editable. If you want to change a volume from gp2 to io1, you must take a snapshot of the original gp2 volume and create a new io1 volume based on that snapshot ID (decommissioning the original gp2 drive as it is no longer needed).

The same process applies to any of the other properties not on the list that you would want to change (such as volume encryption or volume size). Changing these properties requires taking a snapshot, creating a new volume, and then deleting the old one.

Calculating IOPS and Throughput

Amazon EBS gp2 volumes have a set amount of IOPS based on volume size: 3 IOPS/GB. For example, a 100 GB volume has a 300 IOPS baseline, a 1 TB volume has 3,000 IOPS, and so on up to a maximum of 10,000 IOPS with a 3.33 TB volume. Any volume larger than that will be throttled at the 10,000 IOPS maximum per volume.

Note that the maximum output of an Amazon EC2 instance is not tied to the IOPS of individual volumes. For example, let’s say that your application is using a c4.2xlarge Amazon EC2 instance and you need approximately 15 TB of storage for your database. You could provision a single 15 TB gp2 Amazon EBS volume with 10,000 IOPS maximum, or you could provision three 5 TB gp2 volumes with 10,000 IOPS each and stripe them together at Raid0. This would present a logical volume of 15 TB and give you an aggregate IOPS of 30,000, assuming that you were distributing read/writes randomly throughout the array.

Bursting IOPS and Throughput

Both gp2 and st1 volumes have bursting models for drive communication. In the case of gp2, the standard formula for 3 IOPS/GB provisioned is the baseline until you get to 10,000 IOPS (3.33 TB volume). For volumes under 1 TB (< 3,000 IOPS), AWS provides the options to occasionally burst up to 3,000 IOPS for a short period of time. This means that even a 50 GB volume, which has a baseline performance of 150 IOPS, could operate on demand at 3,000 IOPS until the bursting window closes.

The ability to burst is based on I/O credits accrued by the volume at a rate of three credits per GB provisioned per second. Once bursting is triggered, the credits are consumed at three credits per IOPS. The maximum amount of credits a volume can keep is approximately 54 million, which is enough for max bursting (3,000 IOPS) for 30 minutes. Once a volume is out of credits, AWS will throttle communication back down to the baseline amount.

st1 has a similar bursting model, but based on MB/s, not IOPS. In the case of st1 volumes, the cap is 500 MB/s, which becomes the bursting cap with credits accrued in a similar manner. Throughput credits accrue at 40 MB/s per TB provisioned. A volume then spends credits to burst to 500 MB/s until credits are depleted. Note that the 500 MB/s cap only applies to volumes that are equal to or larger than 2 TB; volumes that are smaller have reduced cap sizes based on the formula of 250 MB/s per provisioned TB. Thus a 500 GB volume (smallest size allowed in st1) would only be able to burst to 125 MB/s before being capped, which is still significantly more throughput than its baseline of 20 MB/s.

Provisioned IOPS

Provisioned IOPS is the primary feature of io1 volumes. Although there are no minimums on how many IOPS can be provisioned, as a general rule it is considered to be a costing anti-pattern to use io1 with less provisioned IOPS than a similarly sized volume for gp2 would have for baseline IOPS. Although there may be edge cases where an enterprise might choose that scenario, for the certification exam, consider any of those cases to be “zebras” instead of “horses.”

There is another good practice minimum to consider when provisioning IOPS when latency to the drive matters. To get the best I/O latency to an io1 volume, it is recommended to have at least two IOPS provisioned for each GB; with any less, you might encounter latency issues. For example, a 3,000 GB volume should be provisioned with at least 6,000 IOPS for best performance.

There are maximums on how many IOPS can be provisioned: a ratio of 50 IOPS per GB provisioned up to a max of 20,000 IOPS per volume. Thus a 100 GB volume could only be provisioned for 5,000 IOPS, but anything over 400 GB can be provisioned for the maximum amount.

Amazon EC2 Capacity

You should pay attention to the IOPS/throughput capacity of the Amazon EC2 instances that will use the volume. Technically, a systems operator could provision five volumes with all at 20,000 IOPS to give a 100,000 IOPS possible aggregate capacity and attach them to a single instance. However, if that instance was an m4.large, there are only 3,600 maximum IOPS that can come out of that instance. It’s the equivalent of buying a race car and deciding to drive it in a residential area—there is no way to use the full potential of the one because of the limits of the other.

In fact, there are no instances (as of this printing) that could use that kind of potential. The maximum IOPS in an Amazon EC2 instance caps at 65,000. Any application that needs more than 65,000 IOPS will either need to shard the data transactions onto multiple Amazon EC2 instances or consider instance store, which is discussed later in this chapter.

Throughput also has Amazon EC2 limitations. Some instances can go as high as 12 GB/s for maximum throughput, but only if the instance has been launched with the Amazon EBS Optimization property turned on. (Refer to Chapter 4, “Compute,” for more information on Amazon EBS optimization on Amazon EC2.)

Mounting Amazon EBS Volumes

Amazon EBS volumes operate independently from Amazon EC2 instances. Although they can be created at launch time by the AMI and can be terminated simultaneously as well, it is important to remember from an operations perspective that Amazon EBS and Amazon EC2 are separate services. A key benefit is that Amazon EC2 enables the resizing of instances (as mentioned in Chapter 4). Amazon EBS volumes persist when an instance is stopped, so the instance type can be changed.

When Amazon EC2 instance types are changed, the underlying hardware is always swapped out. Amazon EBS persistence makes this changing of infrastructure seamless to the operator.

To mount an Amazon EBS volume, use the following commands:

aws ec2 attach-volume --volume-id vol-1234567890abcdef0
--instance-id i-abc1234 --device /dev/sdf

Amazon EBS volumes can only attach to one Amazon EC2 instance at a time. Although there is no limit to how many instances a volume may have been attached to, there is no concurrent attachment option. For use cases such as shared volumes where multiple instances must communicate with a single volume, the best architecture is to use Amazon Elastic File System (Amazon EFS), which is discussed later in this chapter.

Amazon EBS Snapshots

Amazon EBS volumes are redundant arrays and can withstand the loss of individual drives without loss of data. As previously mentioned, however, Amazon EBS volumes live in one—and only one—Availability Zone. If data durability is important, systems operations teams should include volume snapshots as part of their regular operational configuration.

Snapshots create permanent images of the volume and store them at the regional level (across multiple Availability Zones) in Amazon S3. It is important to realize that the snapshots are not stored as Amazon S3 objects and cannot be viewed using the Amazon S3 Console or APIs. They are unique Amazon EBS images and can only be accessed through the Amazon EC2 service.

Because they are not true Amazon S3 objects, snapshots only store the sectors that have changed (the delta) from the previous snapshot, allowing the total volume of data stored (and thus billed) to be dramatically reduced.

Snapshots for data volumes do not require the volume to be stopped (or quiesced), but choosing to hold writes to the volume will provide a much more accurate dataset. Boot volumes must be stopped before they can be snapped—another key reason to keep boot and data volumes separate.

Snapshots are locked to the region where they were created. If your operational policy requires keeping copies of data in different regions, you must initiate a copy of that image to the target region. The new snapshot will have a new ID. This fact is important when creating automated deployment methods because new volumes cannot be created from image IDs that are not in the region where the new volume is being created. Figure 6.4 demonstrates the AWS CLI command used to create the snapshots.

Images shows program script designed for AWS CLI command to create snapshots containing Description, Encrypted, VolumeID, State, VolumeSize, Progress, and so on. — **FIGURE 6.4** AWS CLI command to create snapshots

Sharing Snapshots

Your team can share snapshots to other accounts both in and out of your organization by managing the permissions on those images. After a volume is shared, anyone granted access to the volume will be able to make a complete copy of the volume (there is no partial sharing).

Public sharing can only happen on unencrypted volumes. If your volume is encrypted, you can share the volume to specific accounts as long as you provide the custom Customer Master Key (CMK) used to encrypt the volume. Volumes encrypted with your account’s default CMK can never be shared.

Creating an AMI from a Snapshot

A snapshot of a root volume has most of the pieces needed to create an AMI. What is left is to put the additional metadata pieces in the system and register the snapshot as an AMI. Using the register-image command in the AWS CLI, you can identify which snapshot you want to use, along with the new name for the AMI.

Deregistration will cause the image to no longer be available as an AMI. Figure 6.5 demonstrates the AWS CLI used to describe your snapshots.

Images shows Program script using register image command designed for AWS CLI command to describe snapshot containing “OwnerAlias”: “amazon”, “Description”: “Business/Industry Summary (Windows)”, and so on. — **FIGURE 6.5** AWS CLI command to describe snapshots

Securing Amazon EBS

All Amazon EBS volumes are automatically locked to the account that created them and the Availability Zone where they were created. Unless the volume owner makes a snapshot and shares the snapshot with another account, there is no way for an instance in another account to have direct access to any data on the original Amazon EBS volume.

Customers who want additional security on their volumes can encrypt the data at either the client level or the volume level. Client-level encryption is done in the operating system of the instance to which the volume is currently attached. Because this is controlled by the operating system of the Amazon EC2 instance, AWS has no access or visibility into what keys or algorithms are being used. From an operational perspective, this means that customers using client-side encryption must be responsible for maintaining availability and durability for all keys in use.

AWS offers volume-level encryption as well. In the case of volume-level encryption, all data is encrypted at rest and in transit between the instance and the volumes. With Amazon EBS volume-level encryption, each volume gets a unique encryption key only used by that volume or other volumes created from snapshots of that volume. AWS uses the industry-standard Advanced Encryption Standard (AES)-256 algorithm to encrypt data and the volume keys. For greater security, customers can choose to use AWS KMS for the volume key management, which can be combined with AWS Identity and Access Management (IAM) permissions and AWS CloudTrail logging to provide greater protection and visibility.

Monitoring Amazon EBS

Basic volume monitoring (included with all volumes) happens in five-minute intervals. The metrics cover the fundamental pieces of disk management: number of bytes (read/write), I/O per second (read/write), idle time, queue length, and so on.

For volumes with provisioned IOPS, an additional metric of throughput percentage is also available. Throughput percentage is a critical metric because it shows a percentage of actual throughput used compared to provisioned. For cost analysis, this number must be closely regulated to ensure that you are not overpaying for IOPS.

For gp2, st1, and sc1 volumes, burst balance is also available. This feature provides visibility into burst bucket credits that are available to the volume.

Decommissioning Amazon EBS

Amazon EBS volumes are designed for persistence—as such, they can survive the termination of associated Amazon EC2 instances or even be directly detached from instances (as long as they are not the root volume). Detaching a volume does not change the billing for the volume, and it continues to count toward the storage limit for the account that owns the volume. A detached volume can be reattached to any other Amazon EC2 instance in the same Availability Zone of the same account.

Instance Store

For many Amazon EC2 instance types, AWS provides an additional direct-attach block storage option through instance store (also referred to as ephemeral storage). From an operational perspective, instance store has some very specific advantages and some considerations.

Instance store is ideal for temporary storage of information that changes frequently (for example buffers, caches, scratch data, and other temporary content) or for data that is replicated across a fleet of instances (such as a load-balanced pool of web servers).

Instance Store vs. Amazon EBS

Amazon EBS is designed for persistence, whereas instance store is designed for comparatively inexpensive, disposable, usually high-speed communication. Instance store lives on the same hardware as the Amazon EC2 instance itself and is included in the hourly cost of the instance. This also means that there is no external network overhead that must be factored into communication potential.

The primary consideration is also a potential benefit: The fact that the storage is on the same hardware as the instance means that the storage does not have any persistence beyond the life of the Amazon EC2 instance. If the instance is stopped and started, all data on the instance store volume is lost.

Root volumes can run on Amazon EBS or instance store, but if the root volume is on instance store, there is no “stop” option, as a stop will terminate the instance.

Provisioning Instance Store

Unlike Amazon EBS, instance store is not flexible when considering volume size, media type, or throughput. Each instance type has a single specific storage option, if instance storage is available for that instance—some instances only support Amazon EBS for volume types and do not have instance storage options. Check the Amazon EC2 documentation for current instance storage options for each instance type.

Instance store-backed volumes are only available on creation of the Amazon EC2 instance. If an instance is launched without the volumes, they cannot be added later.

Instance Store Security

Security concerns are simplified to some degree when using instance store. The data never leaves the hardware, so encryption in transit is not a consideration. The instance cannot be detached and reattached to another instance. All encryption must be done at the operating system level.

These considerations mean that all security controls are managed at the operating system and above, hence it’s your responsibility to secure. Refer to the shared responsibility model described in Chapter 3, “Security and AWS Identity and Access Management (IAM),” for more information.

AMIs on Instance Store

To create an instance store-backed AMI, start from an instance that you have launched from an existing instance store-backed Linux AMI. Customize the instance to suit your needs, then bundle the volume and register a new AMI. Figure 6.6 summarizes the process of creating an AMI from an instance store-backed instance.

Image shows process starts with launch instance from AMI #1 to host computer (instance #1 and root volume + bundle) to upload in S3 bucket and register new AMI #2, and again launch instance to host computer. — **FIGURE 6.6** Process to create an AMI from an instance store-backed instance

The bundling step is different compared to Amazon EBS, with which you can create an AMI directly from an instance or a snapshot.

Once an instance store AMI is created, all root volumes will be built on the instance store when using that AMI.

The bundling process is a multi-step function that we are not going to cover in this book, as it falls outside the scope of the exam. Nevertheless, you should look up the online documentation and practice making an instance store-backed AMI to understand the process.

Amazon Elastic File System (Amazon EFS)

Amazon EFS is not a true unformatted block service but rather a fully managed file storage service. Multiple Linux Amazon EC2 instances connect to Amazon EFS through standard operating system I/O APIs. Amazon EFS can also connect to on-premises Linux servers through AWS Direct Connect.

Amazon EFS is designed to be automatically flexible in sizing, growing from gigabytes of shared storage up to petabytes in a single logical volume.

Amazon EFS vs. Amazon EBS

When comparing the two options, it is important to look closely at the primary use cases. Amazon EFS is designed to provide regional, durable, multi-user file systems. Amazon EBS is designed for single Amazon EC2 volumes.

Because of its design, Amazon EBS is significantly cheaper when looking at per-GB cost of storage. If there is no need for multiple instances to attach to the storage, Amazon EBS will likely be a good place to start your operation.

Once you have multiple users, however, Amazon EFS will often be a much less expensive option. Cost savings are more apparent once you consider the regional redundancy and durability that is rolled into Amazon EFS, bypassing the need to build your own highly available, regionally redundant file system. This becomes even more important when you consider the maintenance overhead that you avoid when using Amazon EFS over a DIY solution.

Provisioning Amazon EFS

After creating Amazon EFS using the Create API, you create endpoints in your Amazon Virtual Private Cloud (Amazon VPC) called “mount targets.” When mounting from an Amazon EC2 instance, you provide the Amazon EFS Domain Name System (DNS) name in your mount command. The DNS resolves to the mount target’s IP address. If your instance has access to the mount target in the VPC, you will then be able to access the file system.

This will be the same mount target that peered Amazon VPC instances or on-premises instances will use. They are granted access to the host VPC through the networking methods, and that will complete their ability to connect to the file system as well.

Instances then connect to the system with an NFSv4.1 client. See Figure 6.7 for the AWS CLI command to create an Amazon EFS.

Image shows program script developed for Amazon EFS create command such as “SizeInBytes”: {“Value”: 0}, “CreationToken”: “CertBook”, and so on. — **FIGURE 6.7** Amazon EFS create command

Securing Amazon EFS

The mount target you created when you provisioned the file system can use Amazon VPC security groups (the same ones that you use to secure access to Amazon EC2). All of the same rules of ingress and egress control can then be managed in detail with that toolset.

The mount targets are also specifically assigned to subnets (which by extension means that you should include mount points in a subnet in each Availability Zone in which you intend to run connected instances). This means that network Access Control List (ACL) control would also apply at the subnet level, if needed, for additional controls over an instance connecting from outside the subnet where the mount targets exist.

Encryption is not yet available natively on Amazon EFS as of this writing.

File access inside the system is controlled with standard UNIX-style read/write/execute permissions based on the user and group ID asserted by the mounting NFSv4.1 client.

Object Storage on AWS

The decision to use block storage vs. object storage is often an architecting question, as discussed in the introduction to this chapter. Understanding their core differences and maintaining content in object stores falls under the systems operator’s scope of responsibility.

In AWS, object storage systems are fully managed, abstracted services. Interactions are all API driven, which means that IAM privileges are critical to security, because these services live outside of the subnets of your Amazon VPC.

Amazon Simple Storage Service (Amazon S3)

Amazon S3 is regionally provisioned object storage service. Content saved into Amazon S3 is automatically replicated into at least three facilities. The multiple copies provide the extreme levels of durability, while the regional distribution allows for massive parallel throughput. AWS datacenters are geographically separated, providing protection against natural disasters.

There is no data minimum or data limit to total Amazon S3 content. You are billed for exactly the amount in storage for the time it is in storage, plus the charges for API calls in and out of the system and data volume costs for data transfer out of the region.

Content stored in Amazon S3 is organized in buckets. Buckets are globally unique named objects; they are unique across all AWS accounts. If Customer A has a bucket called “projectatestresults,” Customer B would be rejected in an attempt to create the exact same bucket. You cannot create a bucket in another bucket.

The reason behind the uniqueness is based on the property of Amazon S3 storage in that all content in Amazon S3 is automatically web-enabled. When content is saved into Amazon S3, there is a unique URL created for the object, which includes the bucket name as the domain, hence the need for global uniqueness.

Amazon S3 supports both virtual-hosted-style and path-style URLs to access a bucket.

In a virtual-hosted-style URL, the bucket name is part of the domain name in the URL. For example:
- http://bucket.s3.amazonaws.com
- http://bucket.s3-aws-region.amazonaws.com

In a virtual hosted-style URL, you can use either of these endpoints. If you make a request to the http://bucket.s3.amazonaws.com endpoint, the DNS has sufficient information to route your request directly to the region where your bucket resides.

In a path-style URL, the bucket name is not part of the domain (unless you use a region-specific endpoint). For example:
- US East (N. Virginia) region endpoint, http://s3.amazonaws.com/bucket
- Region-specific endpoint, http://s3-aws-region.amazonaws.com/bucket.

In a path-style URL, the endpoint you use must match the region in which the bucket resides. For example, if your bucket is in the South America (São Paulo) region, you must use the http://s3-sa-east-1.amazonaws.com/bucket endpoint. If your bucket is in the US East (N. Virginia) region, you must use the http://s3.amazonaws.com/bucket endpoint.

The fact that content is web-enabled does not mean that it is exposed to the public. Default properties of Amazon S3 keep content private for the account owner until explicit commands to share are invoked. (This is covered further later in this chapter.)

Durability vs. Availability

One of the big engineered benefits of Amazon S3 is that it is designed to provide eleven 9s of durability for content. In other words, 99.999999999 percent durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001 percent of objects.

To run the numbers in a more specific example, if you store 10,000 objects in Amazon S3, you can, on average, expect to incur a loss of a single object once every 10,000,000 years.

As a systems operator, do not confuse the durability numbers with availability numbers. Amazon S3 Standard is designed to deliver up to four 9’s (99.99 percent) of availability. AWS even provides a Service Level Agreement (SLA) with penalties if availability drops below 99.9 percent on any given month of service (which translates to approximately 44 minutes of time in a month).

To understand the difference between durability and availability, think in terms of the question each is trying to answer. Availability addresses the question, “Can I reach my content?” While AWS is very proud of our track record on availability, there are many factors that control access and are taken into account in the engineering profile. Durability addresses the question, “Does my content still exist?” From an enterprise survivability perspective, this is a much more critical question. Most businesses could survive the unfortunate loss of service for a few minutes a year; few businesses would be well positioned if their data was lost.

Durability is the primary reason that AWS recommends that all Amazon EBS volumes that contain mission-critical data be saved as snapshots into Amazon S3. Availability can always be restored as long as the data survives.

Accessing Amazon S3

Amazon S3 is accessed using APIs. Amazon S3 objects can also be accessed directly through HTTP calls with the right permissions.

Permissions to content (or buckets) are granted through IAM, or they can be granted directly to the operator through object-level permissions. Either approach is sufficient to grant access.

IAM policy documents and Amazon S3 bucket policies appear almost identical. The key difference between them is that the bucket policies include an account number. The account number is used to explicitly authorize users (or roles) from that account.

Amazon S3 – Reduced Redundancy Storage (Amazon S3-RRS)

Amazon S3 is designed to deliver eleven 9s of durability, four 9s of availability, and low-cost throughput, which solves most architectural storage needs. In some cases, however, the business need does not require durability.

For example, you have an application that manages high-definition imagery and uses a front end with a thumbnail browser. The high-definition images are best stored in Amazon S3 because they are mission-critical assets, but the thumbnails are all derivatives of the original image. In this scenario, there is very little damage to the business if a thumbnail is lost, because it can be detected and regenerated quickly from the original. Because durability is not as critical for the thumbnails, it doesn’t make sense to pay for the massive redundancies and durability of regular Amazon S3 to store them.

This is an example of the business scenarios behind Amazon S3 – Reduced Redundancy Storage (Amazon S3-RRS). Unlike standard Amazon S3, which replicates your data across at least three geographically separate facilities, Amazon S3-RRS only uses two facilities. There is no difference in the availability (99.99 percent) or the SLA between Amazon S3 and Amazon S3-RRS, but there is significant difference in the results of the durability engineering. Amazon S3 is designed for eleven 9s, whereas Amazon S3-RRS is designed for four 9s of durability.

This reduction in engineered durability means that Amazon S3-RRS can provide somewhat cheaper storage without reducing availability.

From a certification perspective, you must look at a question and determine the business value behind the content being stored. If the question refers to storing content that must be highly available and cost effective, do not immediately assume that the correct approach is to use Amazon S3-RRS. Unless the question includes some mention about the data being reproducible or the original content is stored separately, the loss of durability might be a problem.

If the question asks for the least-expensive solution as the primary goal and does not mention anything about needing durable content however, you can assume that Amazon S3-RRS would be a better choice than standard Amazon S3.

If the question specifically mentions that the data being stored can be re-created on demand, then you should absolutely look at Amazon S3-RRS as the primary candidate for your answer.

Amazon S3 Infrequent Access (Amazon S3-IA)

Amazon S3 Infrequent Access (Amazon S3-IA) solves a completely different business need. In the scenarios suited for Amazon S3-RRS, high availability and high parallel throughput were still needed, but durability was not a primary value. For Amazon S3-IA, the objects still need the massive durability of Amazon S3 and the high availability, but the throughput is not expected at all.

The example scenario in this case would be from the same imaging application mentioned previously that has an archive section for images that are more than six months old. Although they might occasionally be called up for a review, the reality is that they won’t be looked at more than once or twice a year, if ever.

For those business cases, AWS offers Amazon S3-IA. It is still Amazon S3 in terms of durability and availability, but the difference is that the way the storage is engineered for accessibility is much less expensive in general, but the cost per individual requests are notably higher. The reduced cost for availability means that Amazon S3-IA is the most cost-efficient option, as long as the objects are only retrieved infrequently.

The breakpoint for cost effectiveness in Amazon S3-IA is around two retrievals per month. If you expect more retrievals than that on average, it is cheaper to stay with standard Amazon S3.

Exam questions that specifically call out objects that are only accessed once a month or less than 20 times a year should most likely be addressed with Amazon S3-IA. As always, look for other clues that would invalidate the savings presented by Amazon S3-IA.

For example, an additional attribute of Amazon S3-IA is minimum file size (128 KB). Standard Amazon S3 has no file size minimum—a 4 KB file in Amazon S3 is only billed for the 4 KB space. The same 4 KB file stored in Amazon S3-IA would be billed at 128 KB, so even if Amazon S3-IA was normally half the cost of regular Amazon S3, in this edge case, Amazon S3-IA would be 16 times the cost of storing the same 4 KB file in Amazon S3. If the exam question specifically mentioned tiny files (less than 64 KB in size), that would be a signal not to choose Amazon S3-IA as the cost-effective answer.

The other edge case, where Amazon S3-IA is not the best answer, is when files are only stored for a few days. Amazon S3-IA has a minimum 30-day storage time. Content stored for one week is still billed for the entire month. Amazon S3-IA would normally be less expensive, but the shortened shelf life makes standard Amazon S3 a more cost-effective choice.

Versioning

The default behavior of Amazon S3 is to have a single version of an object. When the object is updated, all copies of that object are updated as well. When the object is deleted, all copies are deleted. For operational needs that require back versions to be retained, Amazon S3 versioning can be enabled.

After versioning is enabled, updates to an object do not replace the original object; rather, the new version is stored and marked as the current version of the object and the previous version is retained but no longer retrieved when the object is HTTP requested.

The previous versions are still accessible through the APIs. Operators can select those versions and mark them as the current version (restore) or delete/expire them when they are no longer needed.

Deletes in versioned buckets get divided into two operational functions. “Normal” deletes actually add another version to the stack of versions referenced by the object. In this case, it is just a marker indicating that the file does not exist. If there is an HTTP call to the object, it will return a 404 error. Deleting this 0 KB version will restore the previous version (effectively providing a “recycle bin” function).

Because normal deletes in this case only serve to add a 404 header, to actually purge the object from Amazon S3, you must perform “delete version.” This can be done version by version or wildcarded to delete the entire stack.

It is important to account for cost whenever versioning is enabled. Each object version is billed at the full Amazon S3 storage rates. Operations teams that do not have a lifecycle plan for versioning can find storage costs escalating.

As such, versioning only makes sense on the exam when the questions are concerned with object survivability in the event of intentional or accidental deletes or overwrites. If those are not concerns, then for price considerations, versioning is likely not the best choice.

Amazon S3 Security Options

Amazon S3 is API driven and is a completely abstracted service. As such, there are no security groups or Amazon VPC subnets that control access to the Amazon S3 endpoints (although individual instances can certainly block themselves from accessing Amazon S3).

Assigning the correct IAM policy document to individual users, groups, and roles is the first method for granting access to Amazon S3. You can assign permissions to individual objects or to buckets.

The advantage to using group policies is that individual buckets don’t have to be named. If new buckets are created in an account, anyone with wildcarded Amazon S3 permissions will be able to access the new bucket without edits to the policy document.

Bucket policies have the advantage of being able to grant access to other accounts without needing to create a role for them in the account that owns the bucket.

Cross-Region Replication

There are additional security concerns when dealing with Amazon S3: termination protection and tampering protection. Both of those can be addressed by enabling cross-region replication.

At first glance, cross-region replication is a tool designed to provide extreme global durability. If you have a business that for regulatory reasons needs data replicated with hundreds or thousands of miles of distance between copies, this feature will make those copies automatically.

To enable cross-region replication, you must first turn on versioning in both the source and destination buckets. After versioning is turned on, you can enable the replication by identifying the source and destination. Note that this is not bucket synchronization—deleted versions in the source bucket are not deleted from the target bucket.

How does this protect your content? Even if someone accidentally (or intentionally) deletes all content from the source bucket, operations can go to the target bucket and find the content in its entirety.

In addition, cross-region replication can be done from a source bucket in one account to a target bucket in a completely separate account. This multi-account strategy allows a security team to ensure that no single individual has permissions to both buckets where the data is copied. Even full administrator privileges on one account would not automatically give them any access to modify or delete content in the other.

Multi-Factor Authentication (MFA) Delete

The other unique element of security that Amazon S3 and Amazon Glacier offers is the ability to attach a physical or virtual Multi-Factor Authentication (MFA) device against an object or bucket.

Once this MFA device is attached, any delete statement will go through two checks. First, the properly authorized operator must still have the right IAM or object permissions to delete the object. Second, the operator must also present the current one-time password generated by the MFA device.

This feature will prevent the deletion of content by a compromised account or malicious agent. It will prevent the deletion of content by an automation method that would unintentionally have caught the protected object.

It also solves these problems without any billing expense. MFA Delete is provided at no additional charge, whereas enabling cross-region replication will incur the cost of data transfer between regions and the storage costs of the duplicate datasets.

Amazon Glacier

If we look at the use cases of standard Amazon S3, Amazon S3-IA, or Amazon S3-RRS, there may have been variable demands on durability requirements or how often an object would be retrieved. In all three primary use cases, however, high availability was always the need—being able to get to the content immediately and on demand was critical.

But what about a situation where availability is not needed, where as long as the data is protected, intact, and eventually retrievable, then the business need is solved? This is the primary scenario for Amazon Glacier.

Unlike the various flavors of Amazon S3, which are all enabled for immediate consumption online, there is no online availability for content stored in Amazon Glacier. The content is in “cold storage.” Durability, however, is designed to be the same between Amazon S3 and Amazon Glacier: eleven 9s.

Retrieving content from Amazon Glacier is done through the normal API request; you can request content for immediate retrieval at additional expense. Once content is retrieved, a copy is placed into Amazon S3 where it becomes a normal Amazon S3 object (with associated availability and cost structure). Retrieving an object does not delete the object from Amazon Glacier.

You store your data in Amazon Glacier as archives. You may upload a single file as an archive, but your costs will be lower if you aggregate your data. TAR and ZIP are common formats that customers use to aggregate multiple files into a single file before uploading to Amazon Glacier. The total volume of data and number of archives that you can store are virtually unlimited. Individual Amazon Glacier archives can range in size from 1 byte to 40 TB. The largest archive that can be uploaded in a single upload request is 4 GB. For items larger than 100 MB, customers should consider using the multipart upload capability. Archives stored in Amazon Glacier are immutable (that is, archives can be uploaded and deleted but cannot be edited or overwritten).

Security

From a security perspective, Amazon Glacier operates in a similar manner to Amazon S3. Users can be granted IAM permissions to individual archives or vaults (which are root-level storage containers similar to Amazon S3 buckets). Object-level permission is also possible, which can allow for multi-account workflows to the vaults or archives. MFA Delete offers additional protection against accidental or intentional deletions.

In addition, Amazon Glacier offers Vault Lock for customers with specific compliance needs. For example, the United States Securities and Exchange Commission (SEC) rule 17a-4 states that all “electronic records must be preserved exclusively in a non-rewriteable and non-erasable format.”

Legal clarifications to that rule require that there be no possibility of content deletion. An operations team can store content in Amazon Glacier, make copies of that archive in vaults owned by different accounts and different regions, remove all IAM permissions, attach deny policies against the objects and the vaults, and attach different MFA devices against each copy protecting against deletion, finally locking those devices in a safe owned by an escrow company. However, without Amazon Glacier Vault Lock, they still would not be compliant with SEC rule 17a-4. That is because with the right root-level permissions on all of the accounts and the cooperation of the escrow company holding the MFA devices, the content could be deleted or altered.

Although such a complex action would be difficult to do unnoticed, it is still technically possible. Vault Lock adds an additional, immutable element to the storage engine.

Enabling Vault Lock requires two separate API requests. The first request establishes the parameters of the Vault Lock in terms of duration. The second API (which must be executed within 24 hours of the first) confirms the Vault Lock option.

Once Vault Lock is confirmed, any content stored in the vault enters a locked state until that duration period passes. For example, if a duration of one year is put on a vault, content placed in the vault on January 1, 2020 would not be capable of being deleted until January 1, 2021. It does not mean that it will be deleted on that day, but under no conditions can it be deleted or altered during the year 2020.

What if the company changes its mind? The company can change its mind all it wants; however, until the duration expires for any content under Vault Lock, there is no way for the company to delete the content. What if the company’s account is delinquent? In those cases, under the AWS Terms and Conditions, AWS will continue to honor the Vault Lock conditions and not delete the content, although access to the content may be denied until the account is made current.

The only way to “disable” a Vault Lock is to delete the vault after all archives in the vault have outlived their duration protection. Given the irrevocable nature of the Vault Lock feature, it is considered a best practice not to use Vault Lock unless a company is under specific compliance language that provides no other interpretation.

A final security note on Vault Lock is that the option does not provide read-lock. Users with permissions may still download copies of the content, so sensitive material will still need proper governance models to protect against unauthorized downloads.

Lifecycle Policies

An important aspect of the relationship between Amazon S3 and Amazon Glacier is the ability to set lifecycle policies that automatically manage the transitions of content and their expiry.

For example, a company might have content that needs accessibility for the first 90 days followed by occasional usage for the next two years, at which point the records must be retained but would only be accessed in the event of an audit. Finally, the content must be deleted after a total of seven years for compliance. To accommodate this, systems operators can create a lifecycle policy on the original bucket:

On object creation plus 90 days, change attribute of content to Amazon S3-IA.
The next part of the lifecycle policy statement reads: At object creation plus two years, copy the content to an Amazon Glacier vault, and delete the content from the Amazon S3-IA bucket.
The final policy is on the Amazon Glacier vault where creation date plus five years deletes the object (the five years in Amazon Glacier, plus the two years in Amazon S3, results in a seven-year total object lifecycle).

When creating a lifecycle policy, it is the responsibility of the systems operator to ensure that only the correct content is affected by the scope of the policy. To clarify, if you have a deletion policy on an Amazon Glacier vault, you should only place content in that vault if that content has the same lifecycle rules. Failure to do so will result in content being purged unexpectedly.

Systems Operator Scenario: The Newspaper

You have just been put in charge of operations for the city Tribune’s Content Management System (CMS). The system has already been architected, and now you need to make sure that it is operating properly on the AWS Cloud.

For the needs of this chapter, let’s look at how the storage systems would be designed based on the various needs of the CMS.

Storage Needs

The Tribune’s CMS covers a number of processes of the production system: The reporter process, along with storage of all notes; the photo editing process, including storage of outtakes; the web publishing system for all stories, photos, and videos; and the archive system for research internally and externally.

Solution Breakdown

Here is a breakdown of the Tribune’s CMS processes.

Reporter

The reporter uses a word processor for all of her work. This implies a heavy read/write workload best served by Amazon EBS at first glance. However, given the fact that editors are involved and story creation is a collaborative effort, a better operational solution would be to use Amazon EFS as a mounted volume for the word processor application.

Using Amazon EFS would provide immediate shared storage for all content in the creation and editing phases of the workflow without any additional operational overhead.

After the content is fully edited and ready for publication, you could serve the content on the website from the Amazon EFS (or Amazon EBS) volume. A more scalable solution would be to load the document into Amazon S3 and have the web page call the story directly from Amazon S3, which is a more cost-efficient and scalable publishing option.

The final concern for reporters is their story notes, which are not intended for the public but must be retained for verification in the event of legal challenges to the story. Amazon Glacier becomes the ideal solution for these documents because retrieval needs are rarely immediate and storage costs are the lowest.

Photo Editor

The photographer will often shoot hundreds of photos or hours of video for a single shot or clip that will end up in the publication. The outtakes can often be mined years later for important details that seemed insignificant during the shoot. After photo selection is complete, storing the outtakes in Amazon Glacier again provides the most cost-effective solution for a deep archive.

For media that is marked for production, you can store the raw files in Amazon S3 and serve them directly from their bucket to internal production applications or to the public websites.

If the publication needs pre-rendered thumbnails or mobile-sized versions of the media, they can be stored in Amazon S3-RRS as a way of saving costs, so that those versions can be re-created on demand from the originals if they are lost.

Production and Archive

By storing the content (text and media) in Amazon S3 (or Amazon S3-RRS), the CMS is already positioned with the most cost-effective, scalable option to deliver the content.

At some point, the content will be determined to be out of date. In the case of a news website, it may take months before consumers stop searching for or clicking on old stories, but eventually all content will reach a point of diminishing results. When the content reaches that point, automated processes can transition the content to Amazon S3-IA. Why not Amazon Glacier? Because in the case of a news archive, even if consumers don’t care about the old story for years, once someone is doing research and wants to open that file (or video), odds are they don’t want to wait five hours for the content to be available. By using Amazon S3-IA, the archive is still cost effective but immediately available and searchable.

Additional Storage Solutions

Amazon S3, Amazon EBS, Amazon EFS, and Amazon Glacier represent the core of the AWS storage solutions. When studying to be successful on the exam, you must be very comfortable with these services.

AWS does provide a number of other services that may be considered part of the storage solutions discussion. You should be familiar with them, but they will not likely be a significant portion of the AWS Certified SysOps Administrator – Associate exam.

Amazon CloudFront

Global content distribution can be delivered using Amazon CloudFront. This Content Delivery Network (CDN) can help accelerate the delivery of your content by caching copies close to consumers.

Content is stored in origin locations that can be on AWS in an Amazon S3 bucket or served from an Amazon EC2 instance. Amazon CloudFront can even cache content stored on-premises.

Amazon CloudFront is a good choice for distribution of frequently accessed static content that benefits from edge delivery, like popular website images, videos, media files, or software downloads. For on-demand media files, you can also choose to stream your content using Real-Time Messaging Protocol (RTMP) delivery. Amazon CloudFront also supports delivery of live media over HTTP.

Lastly, Amazon CloudFront has a geo-restriction feature that lets you specify a list of countries in which your users can access your content. Alternatively, you can specify the countries in which your users cannot access your content. In both cases, Amazon CloudFront responds to a request from a viewer in a restricted country with an HTTP status code 403 (Forbidden).

AWS Storage Gateway

AWS Storage Gateway allows existing on-premises storage solutions to be extended into AWS. By installing a virtual appliance in the local datacenter, storage can be duplicated or extended into the AWS Region.

For the exam, you will likely see at least a couple of questions that involve AWS Storage Gateway. The important things to understand are the different options available when configuring the appliance.

AWS Storage Gateway Options

The first option is called the file interface. This enables you to use Amazon S3 with your on-premises workflows by accessing the Amazon S3 bucket through a Network File System (NFS) mount point. This allows customers to migrate their files into Amazon S3 through object-based workloads while maintaining their on-premises investments.

The second option is the volume interface. In this case, you are presented with an iSCSI block storage endpoint. Data is accessed on the local volumes, which is then stored on AWS in the form of Amazon EBS snapshots.

There are two configuration modes using the volume interface. The cached mode is designed to extend your local storage. All content is saved in AWS, and only the frequently accessed data is cached locally. This is an excellent operational solution for on-premises storage solutions that are exceeding their local capacity limits. The other mode is the stored mode, which is a one-to-one backup of all local content on AWS. This is a primary solution when durable and inexpensive off-site backups are needed. This becomes an excellent start to a hybrid disaster recovery plan.

The third option for AWS Storage Gateway is the tape interface. In this configuration, you connect to your existing backup application using an iSCSI Virtual Tape Library (VTL). These virtual tapes are then asynchronously stored in Amazon S3 or Amazon Glacier if less expensive, longer-term storage is needed.

AWS Snowball

For massive data transfers, ground transport can be faster than the Internet. Being able simply to ship large hard drives will get the data into AWS faster (and often cheaper) than public Internet speeds can accommodate. For example, suppose you had 100 TB of data to transfer to AWS. Even over a 100 Mbps data transfer line, it would take 120 days to complete the data transfer. Physically shipping the data is a far faster option.

For those data transfer cases, AWS provides AWS Snowball, a physically hardened, secure appliance that ships directly to your on-premises location. Each device weighs around 50 pounds and stores 80 TB (as of this writing). Using multiple AWS Snowball appliances in parallel can provide an easy to implement a migration solution for datasets that are multiple petabytes in size.

AWS Snowball is secured through tamper-resistant seals and a built-in Trusted Platform Module (TPM) that uses a dedicated processor designed to detect any unauthorized modifications to the hardware, firmware, or software.

Data is automatically encrypted when using AWS Snowball with encryption keys managed through AWS KMS. The actual encryption keys are never stored on the appliance.

AWS Snowball with AWS Greengrass

As AWS Snowball has increased in popularity, AWS has added additional functionality, including the ability to access APIs run on the device by leveraging AWS Greengrass technology. Those APIs allow the AWS Snowball appliance to act as if it was an Amazon S3 endpoint itself, even when disconnected from the Internet.

AWS Snowmobile

Some companies have dataset transfer needs that range in the exabyte scale. For those unique transfers, AWS can deploy 45-foot-long shipping containers called AWS Snowmobiles. Pulled by tractor trailers, each container can transfer up to 100 PB of data.

For security, AWS Snowmobile uses dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, AWS KMS encryption, and an optional escort security vehicle while in transit. For more information on AWS KMS, refer to Chapter 3.

Like AWS Snowball, AWS Snowmobile can deliver multiple containers to a single location, providing exabytes of capacity where needed.

Summary

There is no one-size-fits-all solution when it comes to storage. As you prepare for the exam, you need to dive deep into the core storage solutions: the block storage options of Amazon EBS and Amazon EFS and the object storage solutions of Amazon S3 and Amazon Glacier. Make sure that you are comfortable with their use cases and when to deploy each option.

Resources to Review

Amazon EBS documentation: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html
Amazon EFS main page: https://aws.amazon.com/efs/
Amazon EFS documentation: https://aws.amazon.com/documentation/efs/
Amazon S3 main page: http://aws.amazon.com/s3/
Amazon S3 documentation: http://aws.amazon.com/documentation/s3
“IAM Policies and Bucket Policies and ACLs! Oh, My! (Controlling Access to Amazon S3 Resources)” blog post: https://aws.amazon.com/blogs/security/ iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/
Configuring Static Website Hosting section of the Amazon S3 user guide: http://docs.aws.amazon.com/AmazonS3/latest/user-guide/static-website-hosting.html
Amazon Glacier main page: https://aws.amazon.com/glacier/
Amazon Glacier documentation: https://aws.amazon.com/documentation/glacier/
Amazon CloudFront main page: https://aws.amazon.com/cloudfront/
Amazon CloudFront documentation: https://aws.amazon.com/documentation/cloudfront/
AWS Storage Gateway main page: https://aws.amazon.com/storagegateway/
AWS Storage Gateway documentation: https://aws.amazon.com/documentation/storage-gateway/
“File Interface to Storage Gateway” blog post: https://aws.amazon.com/blogs/aws/category/aws-storage-gateway/
AWS Snowball main page: https://aws.amazon.com/snowball/
AWS Snowball documentation: https://aws.amazon.com/documentation/snowball/
What’s new at AWS: https://aws.amazon.com/new
AWS Simple Monthly Calculator: http://calculator.s3.amazonaws.com/index.html

Exam Essentials

Understand block storage vs. object storage. The difference between block storage and object storage is the fundamental unit of storage. With block storage, each file being saved to the drive is broken down into “blocks” of a specific size. With object storage, each file is saved as a single object regardless of size.

Understand when to use Amazon S3 and when to use Amazon EBS or Amazon EFS. This is an architectural decision based on the content type and the rate of change. Amazon S3 can hold any type of data; however, Amazon S3 would not be a good choice for a database or any rapidly changing data types. Remember the case for eventual consistency.

Understand the lifecycle of Amazon EBS volumes. Amazon EBS volumes can be provisioned at the launch of an Amazon EC2 instance or created and attached at a later time. Amazon EBS volumes can persist through the lifecycle of an Amazon EC2 instance, provided the Deleted on Termination flag is set to false. The AWS best practice for an unused Amazon EBS volume is to snapshot it and delete it. The Amazon EBS volume can be created from the snapshot if needed. Remember that with Amazon EBS, you pay for what you provision, as opposed to paying for what you use with Amazon S3.

Understand different types of Amazon EBS volumes. Understand bursting. Amazon EBS volumes are tied to the Availability Zone.

Understand how Amazon EBS snapshots work. Snapshots are stored in Amazon S3; however, you do not access them via the Amazon S3 Console. They are accessed via the Amazon EC2 console under Snapshots.

Understand instance store storage. Depending on the Amazon EC2 instance family, you have access to the local storage. This storage is called the instance store. Snapshots cannot be taken of the instance store. Instance store storage is ephemeral and doesn’t survive a restart or termination of the Amazon EC2 instance. For more information, refer to Chapter 4.

Have a detailed understanding of how Amazon S3 lifecycle polices work. Lifecycle configuration enables you to specify the lifecycle management of objects in a bucket. The configuration is a set of one or more rules, where each rule defines an action for Amazon S3 to apply to a group of objects. Know the two types: Transition actions and expiration actions.

Understand Amazon S3 versioning. When Amazon S3 versioning is turned on, it cannot be turned off, only paused. Understand that when versioning is turned on, items deleted are assigned a delete marker and are unable to be accessed. The deleted objects are still in Amazon S3 and you still pay for them.

Understand how to interface with Amazon Glacier. When objects are moved from Amazon S3 to Amazon Glacier, they can only be accessed from the Amazon S3 APIs (for example, in the case of an object that has been moved to Amazon Glacier as the result of a lifecycle policy).

Understand Amazon Glacier vaults. Amazon Glacier stores objects as archives and stores the archives in vaults. You can have (as of this writing) 1,000 vaults per account per region. Vaults are immutable; when retrieving an object, it is not removed from Amazon Glacier but is copied to Amazon S3.

Know how MFA Delete works with Amazon S3. You can optionally add another layer of security by configuring a bucket to enable MFA Delete, which requires additional authentication for changing the versioning state of your bucket and for permanently deleting an object version. MFA Delete requires two forms of authentication together: your security credentials and the combination of a valid serial number, a space, and the six-digit code displayed on an approved authentication device. MFA Delete thus provides added security in the event, for example, that your security credentials are compromised.

Know how to control access to Amazon S3 resources. IAM policies specify what actions are allowed or denied on what AWS resources. Amazon S3 bucket policies, on the other hand, are attached only to Amazon S3 buckets. Amazon S3 bucket policies specify what actions are allowed or denied for which principles on the bucket to which the bucket policy is attached. Amazon S3 bucket policies are a type of ACL.

Understand the features of Amazon CloudFront. Amazon CloudFront is a CDN that can help accelerate the delivery of content by caching copies close to consumers. Content is stored in origin locations, which can be on AWS in an Amazon S3 bucket or served from an Amazon EC2 instance. Amazon CloudFront can even cache content stored on-premises.

Understand media that can be delivered from Amazon CloudFront. This media includes static website CSS style sheets, HTML pages, images, videos, media files, or software downloads. For on-demand media files, you can also choose to stream your content using RTMP delivery. Amazon CloudFront also supports delivery of live media over HTTP.

Understand Amazon CloudFront’s geo-restriction feature. This feature lets you specify a list of countries in which your users can access your content. Alternatively, you can specify the countries in which your users cannot access your content.

Understand AWS Storage Gateway’s interfaces and modes of operation. The first option is called the file interface. This enables you to use Amazon S3 with your on-premises workflows by accessing the Amazon S3 bucket through an NFS mount point. The second option is the volume interface. In this case, you are presented with an iSCSI block storage endpoint. The third option for AWS Storage Gateway is the tape interface. In this configuration, you connect to your existing backup application using an iSCSI VTL. The file volume interface operates in one of two modes: cached mode and stored mode. Understand the differences.

Exercises

By now you should have set up an account in AWS. If you haven’t, now would be the time to do so. It is important to note that these exercises are in your AWS account and thus are not free.

Use the Free Tier when launching resources. The AWS Free Tier applies to participating services across the following AWS Regions: US East (Northern Virginia), US West (Oregon), US West (Northern California), Canada (Central), EU (London), EU (Ireland), EU (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), and South America (Sao Paulo). For more information, see https://aws.amazon.com/s/dm/optimization/server-side-test/free-tier/free_np/.

If you have not yet installed the AWS Command Line utilities, refer to Chapter 2, “Working with AWS Cloud Services,” Exercise 2.1 (Linux) or Exercise 2.2 (Windows).

The reference for the AWS CLI can be found at http://docs.aws.amazon.com/cli/latest/reference/.

Getting to know storage systems can be simple when using your own account. Follow the steps in previous chapters to log in to your account. As always, remember that although AWS is a very cost-effective solution, you will be responsible for all charges incurred. If you don’t want to keep the items permanently, remember to delete them after your practice session.

Remember, the goal of this book isn’t just to prepare you to pass the AWS Certified SysOps Administrator – Associate exam. It should also serve as a reference companion in your day-to-day duties as an AWS Certified SysOps Administrator.

EXERCISE 6.3

Create and Attach an Amazon EFS Volume.

In this exercise, you will create a new Amazon EFS volume and attach it to a running instance.

While signed in to the AWS Management Console, open the Amazon EC2 service dashboard. If you don’t have a running Linux instance, create, and launch one.
Open the Amazon EFS service dashboard. Select Create File System.
Select the Amazon VPC where your Linux instance is running.
Accept the default mount targets, and take note of the security group ID assigned to the targets.
Choose any settings, and then create the file system.
Assign the same default security group used by the file system to your Linux instance.
Log in to the console of the Linux instance, and install the NFS client on the Amazon EC2 instance. For Amazon Linux:
```
sudo yum install -y nfs-utils
```
Create a new directory on your Amazon EC2 instance, such as efs:
```
sudo mkdir efs
```

Mount the file systems using the DNS name:

sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-12341234.efs.region-1.amazonaws.com:/ efs

You have now mounted the Amazon EFS volume to the instance.

EXERCISE 6.4

Create and Use an Amazon S3 Bucket.

In this exercise, you will create an Amazon S3 bucket and then load and publish content.

While signed in to the AWS Management Console, open the Amazon S3 service dashboard.
Select Create Bucket.
Provide a globally unique name for the bucket, and choose the appropriate region. Leave all other settings at their defaults for now. Create the bucket.
Select the bucket, and choose Upload.
Click Add Files, and navigate to an image that you want to share publicly. Click Next.
Because you want this image to be publicly accessible, under Manage Public Permissions, give everyone read access to the object. Review, and upload the image.
Click on the object name to go to the properties screen. Select, and open the URL for the image in a new browser window. You should now see your image.

You have now created a bucket and loaded content that is web-enabled and publicly accessible.

EXERCISE 6.8

Enable Lifecycle Rules.

In this exercise, you will link content created in the Amazon S3 bucket with a lifecycle that will move the content to Amazon Glacier and eventually purge the object.

While signed in to the AWS Management Console, open the Amazon S3 service dashboard.
Select the bucket that you created in Exercise 6.4, and go to the Management tab.
Under Lifecycle, choose Add Lifecycle Rule.
Name the rule Move to Glacier.
For transition, select Current Version. Add a transition rule to move to Amazon Glacier after 30 days.
For Expiration, select current and previous versions and Expire After 31 Days.
For incomplete multipart uploads, expire them after seven days.

Your account will now automatically move content from Amazon S3 to Amazon Glacier after 30 days and delete the content from Amazon S3 one day later.

Review Questions

You are running a website that keeps historical photos from previous elections. Users need to be able to search for and display images from the last 100 years. On average, most images are requested only once or twice a year. What would be the most cost-effective, highly available method of storing and serving the images?
1. Save images on Amazon Elastic Block Store (Amazon EBS) and use a web server running on Amazon Elastic Compute Cloud (Amazon EC2).
2. Save images on Amazon Elastic File System (Amazon EFS) and use a web server running on Amazon EC2.
3. Save images on Amazon Simple Storage Service (Amazon S3) and serve direct from Amazon S3.
4. Save images on Amazon Simple Storage Service – Infrequent Access (Amazon S3-IA) and serve direct from Amazon S3.
5. Save images on Amazon Glacier and serve direct from Amazon Glacier.
You are running a legacy Sybase database on an Amazon Elastic Compute Cloud (Amazon EC2) instance running Windows. You average 1,200 transactions per second against the database, but at peak levels have as many as 2,500 transactions per second. The current size of the database is 1.8 TB. What is the best data volume Amazon Elastic Block Store (Amazon EBS) configuration for cost without sacrificing performance?
1. One Amazon EBS Magnetic Volume provisioned at 2 TB with 2,500 provisioned IOPS
2. Two Amazon EBS Magnetic Volumes, each provisioned at 1 TB and Raid0 connected
3. One Amazon EBS gp2 volume provisioned at 2 TB
4. One Amazon EBS Solid State Drive (SSD) provisioned at 2 TB with 2,500 provisioned IOPS
5. Two Amazon EBS gp2 volumes, each provisioned at 1 TB and Raid0 connected
Your application’s Amazon Elastic Compute Cloud (Amazon EC2) instances need a single shared volume to edit common files. What is the best volume to attach to the instances?
1. One Amazon Elastic Block Store (Amazon EBS) volume with IOPS
2. One Amazon EBS ma1 volume
3. One Amazon Elastic File System (Amazon EFS) volume
4. One Amazon Simple Storage Service (Amazon S3) volume
5. One Amazon EBS gp2 volume
Your company is planning to keep a media archive of photos that are rarely accessed (no more than 10 times a year on average). Business needs expect that the media be available for publishing on request with a response time of less than 800 ms. What is the most cost-efficient storage method for the media?
1. Amazon Elastic Block Store (Amazon EBS) with provisioned IOPS
2. Amazon EBS gp2
3. Amazon Simple Storage Service – Reduced Redundancy Storage (Amazon S3-RRS)
4. Amazon S3 – Infrequent Access (Amazon S3-IA)
5. Amazon Glacier
Which of the following would be good use cases for Amazon Simple Storage Service (Amazon S3)? (Choose three.)
1. Compiled application installers
2. Video clips storage from motion-activated cameras
3. .dat files from active databases
4. Scratch disk for video transcoders
5. Data warehouse repositories
6. Amazon Elastic Compute Cloud (Amazon EC2) session state offloading
Your company has a compliance requirement to record all writes to an Amazon Simple Storage Service (Amazon S3) bucket and any time that content was read publicly. What are the two steps needed to achieve compliance? (Choose two.)
1. Activate AWS Identity and Access Management (IAM) logging.
2. Activate AWS CloudTrail logging.
3. Activate Amazon CloudWatch logging.
4. Activate Server Access logging.
5. Activate ClearCut logging.
Your company must retain legacy compliance data for five years in “an immutable form” in the unlikely event of a tax audit. What is the most cost-effective method that will achieve compliance?
1. Amazon Glacier with Vault Lock activated
2. Amazon Glacier with AWS Identity and Access Management (IAM) permissions to edit and delete objects denied
3. Amazon Simple Storage Service (Amazon S3) with cross-region replication activated
4. Amazon S3 Infrequent Access (Amazon S3-IA) with Bucket Lock activated
5. AWS Storage Gateway with tape interface
You want to use AWS Storage Gateway to increase the amount of storage available to your on-premises application block storage systems. What is the correct configuration?
1. AWS Storage Gateway file interface
2. AWS Storage Gateway volume interface with cached mode
3. AWS Storage Gateway volume interface with stored mode
4. AWS Storage Gateway tape interface
What is a good use case for Amazon Elastic Compute Cloud (Amazon EC2) instance store?
1. Compiled application installers
2. Video clips storage from motion-activated cameras
3. .dat files from active databases
4. Scratch disk for video transcoders
5. Data warehouse repositories
6. Amazon EC2 session state offloading
What step must you do as part of provisioning and mounting Amazon Elastic File System (Amazon EFS) on your Amazon Elastic Compute Cloud (Amazon EC2) instance?
1. Select the authorized AWS Key Management Service (AWS KMS) master key.
2. Create mount targets in the appropriate Amazon Virtual Private Cloud (Amazon VPC) subnets.
3. Activate versioning on the associated Amazon Simple Storage Service (Amazon S3) buckets.
4. Create the Amazon EFS role in AWS Identity and Access Management (IAM).
5. Connect to the iSCSI Amazon EFS endpoint.
In what ways does Amazon Simple Storage Service (Amazon S3) object storage differ from block and file storage? (Choose two.)
1. Amazon S3 stores data in fixed blocks.
2. Objects can be any size.
3. Objects are stored in buckets.
4. Objects contain both data and metadata.
Which of the following are features of Amazon Elastic Block Store (Amazon EBS)? (Choose two.)
1. Data stored on Amazon EBS is automatically replicated within an Availability Zone.
2. Amazon EBS data is automatically backed up to tape.
3. Amazon EBS volumes can be encrypted transparently to workloads on the attached instance.
4. Data on an Amazon EBS volume is lost when the attached instance is stopped.
You need to take a snapshot of an Amazon Elastic Block Store (Amazon EBS) volume. How long will the volume be unavailable?
1. It depends on the provisioned size of the volume.
2. The volume will be available immediately.
3. It depends on the amount of data stored on the volume.
4. It depends on whether the attached instance is an Amazon EBS-optimized instance.
You are restoring an Amazon Elastic Block Store (Amazon EBS) volume from a snapshot. How long will it be before the data is available?
1. It depends on the provisioned size of the volume.
2. The data will be available immediately.
3. It depends on the amount of data stored on the volume.
4. It depends on whether the attached instance is an Amazon EBS-optimized instance.
You store critical data in Amazon Simple Storage Service (Amazon S3); the data must be protected against inadvertent or intentional deletion. How can this data be protected? (Choose two.)
1. Use cross-region replication to copy data to another bucket automatically.
2. Set a vault lock.
3. Enable versioning on the bucket.
4. Use a lifecycle policy to migrate data to Amazon Glacier.
5. Enable MFA Delete on the bucket.
Amazon Glacier is well-suited to data that is which of the following? (Choose two.)
1. Is infrequently or rarely accessed
2. Must be immediately available when needed
3. Is available after a three- to five-hour restore period
4. Is frequently erased within 30 days
Amazon EFS supports all of the Windows operating systems.
1. True
2. False
When using Amazon Glacier, the size of an individual archive can be virtually unlimited.
1. True
2. False
You can take periodic snapshots of instance storage.
1. True
2. False
How do you resize an instance store-backed volume?
1. Stop the Amazon Elastic Cloud Compute (Amazon EC2) instance and use the resize volume command.
2. Take a snapshot of the instance volume, resize the instance store volume, and use the snapshot to restore data to the new resized instance-store volume.
3. You cannot resize an instance store volume.
4. Attach another instance volume, copy the data to the new volume, and delete the old volume.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.