In the previous chapter, we covered a lot about Amazon RDS and how you can leverage it to host highly scalable and fault-tolerant databases.
In this chapter, we will be exploring yet another popular and widely used core AWS service: the Simple Storage Service (S3). This chapter will cover many important aspects of S3, such as its use cases and its key terms and concepts, along with a few steps showing how to use S3 to store and retrieve objects. It will also walk through a few simple steps for archiving your data using both the AWS Management Console and the AWS CLI. So, buckle up and get ready for an awesome time.
Ever used Dropbox to store and back up your important data and files? Or how about Netflix to watch your favorite TV shows online? Both Dropbox and Netflix have one very interesting thing in common, which you may have guessed already: both have relied on Amazon S3 to store and retrieve data. How much data are we talking about here? Well, way back in 2008, S3 was storing approximately 30 billion objects, or unique data elements. That number has grown enormously ever since, with approximately 2 trillion objects reportedly stored in S3 as of April 2013, so no prizes for guessing what it has gone up to today! But enough numbers; let's learn a bit more about what Amazon S3 actually is.
To begin with, Amazon S3 is a highly scalable, durable, and low-cost storage-as-a-service offering that AWS makes available for everyone to use. Using S3, you can upload virtually any file, folder, or data from anywhere on the web and retrieve it just as easily, all while paying only for the storage that you use! Now that's amazing, isn't it?
How much data can you upload to S3? Well, it's virtually unlimited (although a single object can be at most 5 TB in size), so feel free to upload your songs, movies, high-resolution pictures; anything and everything goes! S3 treats each file that you upload as an individual object and stores it redundantly across the underlying secure hardware. You don't have to worry about the replication process or about scaling the hardware; it is all taken care of by AWS itself.
You can leverage S3 for a variety of purposes, a few of which are listed as follows:
How does it all work? Well, to begin with, you first need to create something called a bucket. A bucket is a top-level entity in S3 and acts as a logical container that holds all your objects. You can create multiple buckets and store various objects in them as you please; however, there are a few pointers that you must always keep in mind when working with them:
It is equally important to note that S3 does not organize objects hierarchically, even though you can create folders and store objects in them. Folders are just a logical representation that AWS provides to make storing and arranging objects easier. Underneath, S3 does not use any hierarchy at all, because it is a flat storage system. This enables S3 to add new storage and scale virtually without limits, without having to reorganize the objects that already reside in it.
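To make the "folders are just a logical representation" idea concrete, here is a small pure-Python toy model, not boto3 and not how S3 is implemented internally. The bucket is a flat dictionary from full object keys to bytes. "Folders" are derived at listing time by grouping keys on a prefix and a delimiter (usually `/`). Real S3 list requests expose this same idea through `Prefix` and `Delimiter` parameters and return the grouped folders as `CommonPrefixes`. The function name `list_keys` and the sample keys are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple


def list_keys(
    store: Dict[str, bytes], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Toy listing over a flat key space (hypothetical helper, not an AWS API).

    Returns (objects, common_prefixes): keys directly "inside" the prefix, and
    the pseudo-folders formed by cutting each remaining key at the delimiter.
    """
    objects: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(store):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # An empty delimiter disables grouping, so every key is a plain object.
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            objects.append(key)
        else:
            folder = prefix + rest[: idx + len(delimiter)]
            if folder not in common_prefixes:
                common_prefixes.append(folder)
    return objects, common_prefixes


# A flat "bucket": no folder entries exist, only full keys containing slashes.
bucket = {
    "photos/2013/beach.jpg": b"...",
    "photos/2013/party.jpg": b"...",
    "photos/cover.png": b"...",
    "notes.txt": b"...",
}

print(list_keys(bucket))                   # -> (['notes.txt'], ['photos/'])
print(list_keys(bucket, prefix="photos/")) # -> (['photos/cover.png'], ['photos/2013/'])
```

Notice that `photos/2013/` is never stored anywhere. It exists only because some keys happen to share that prefix, which is why a flat key space can keep growing without any directory structure to maintain.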
Buckets also provide some simple access control mechanisms, with which you can control which users may perform operations such as creating, deleting, or listing the objects in the bucket. You can even assign bucket permissions that govern who can upload data to it or download data from it.
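One common way to express such bucket permissions is a JSON bucket policy. The following Python sketch builds a read-only policy for a single IAM user. The user may list the bucket (`s3:ListBucket`, which applies to the bucket ARN) and download objects (`s3:GetObject`, which applies to the object ARNs under `/*`). Because `s3:PutObject` and `s3:DeleteObject` are not granted, this policy by itself allows no uploads or deletions. The bucket name `example-bucket`, the account ID `111122223333`, the user `alice`, and the helper `read_only_policy` are all placeholders. Your own policy will depend on your accounts and requirements.

```python
import json


def read_only_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy granting one principal list + download access (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                # Listing is an operation on the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AllowDownload",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                # Downloading is an operation on the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholder bucket and principal; substitute your own values.
policy = read_only_policy("example-bucket", "arn:aws:iam::111122223333:user/alice")
print(json.dumps(policy, indent=2))
```

You could paste the printed JSON into the bucket policy editor in the Management Console. Alternatively, save it as `policy.json` and apply it with `aws s3api put-bucket-policy --bucket example-bucket --policy file://policy.json`.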
S3 also provides different storage classes for the objects that you store in it. Each storage class has its own performance characteristics and cost, as described here:
With this basic understanding in mind, let's see how we can use the AWS Management Console to create a bucket of our choice and upload a few objects to it.