Chapter 8. Database-as-a-Service Using Amazon RDS

In the previous chapter, you learnt a lot about the concepts of Auto Scaling and Elastic Load Balancing, and how you can leverage them to host highly scalable and fault tolerant applications.

In this chapter, we are going to shift our attention away from all those web servers and EC2 instances and talk more about the database offerings provided by AWS, with special emphasis on Amazon RDS. This chapter will help you understand the overall concept of RDS and demonstrate how you can leverage RDS in your own application's hosting environment. We will also study some of AWS's other popular database-as-a-service options along the way, so let's get started without any further ado!

An overview of Amazon RDS

Before we go ahead and dive into the amazing world of RDS, it is essential to understand what exactly AWS provides you when it comes to database-as-a-service offerings and how you can use them effectively. To start off, AWS provides a bunch of awesome and really simple-to-use database services that are broadly divided into two classes: the relational databases, which consist of your MySQL and Oracle databases, and the non-relational databases, which consist of a proprietary NoSQL database similar to MongoDB. Each of these database services is designed by AWS to provide you with the utmost ease and flexibility of use along with built-in robustness and fault tolerance. This means that all you need to do as an end user or a developer is configure the database service once, run it just as you would run any standard database without worrying about the internal complexities of clustering, sharding, and so on, and pay only for the resources that you use! Now that's awesome, isn't it?

However, there is a small catch to this! Since the service is provided and maintained by AWS, you as a user or developer are not provided with all the fine tuning and configuration settings that you would generally find if you were to install and configure a database on your own. If you really want to have complete control over your databases and their configurations, then you might as well install them on EC2 instances directly. Then you can fine-tune them just as you would on any traditional OS, but remember that in doing so, you will have to take care of the database and all its inner complexities.

With these basic concepts in mind, let us go ahead and learn a thing or two about Amazon Relational Database Service (RDS). Amazon RDS is a database service that allows you to configure and scale popular relational databases such as MySQL and Oracle based on your requirements. Besides the database, RDS also provides additional features such as automated backup mechanisms, point-in-time recovery, replication options such as Multi-AZ deployments and Read Replicas, and much more! Using these services, you can get up and running with a completely scalable and fault tolerant database in a matter of minutes, all with just a few clicks of a button! And the best part of all this is that you don't need to make any major changes to your existing applications or code. You can run your apps with RDS just as you would run them with any other traditional hosted database, with one major advantage: you don't have to bother about the underlying infrastructure or the database management. It is all taken care of by AWS itself!

RDS currently supports five popular relational database engines, namely MySQL, Oracle, Microsoft's SQL Server, PostgreSQL, and MariaDB. Besides these, AWS also provides a MySQL-like proprietary database called Amazon Aurora. Aurora is a drop-in replacement for MySQL that provides up to five times the performance of a standard MySQL database. It is specifically designed to scale with ease without any major consequences for your application or code. How does it achieve that? Well, it uses a combination of something called an Aurora Cluster Volume, one Primary Instance, and one or more Aurora Replicas. The Cluster Volume is nothing more than virtual database storage that spans multiple AZs. Each AZ is provided with a copy of the cluster data so that the database is available even if an entire AZ goes offline. Each cluster gets one Primary Instance that's responsible for performing all the read/write operations, data modifications, and so on. Along with the Primary Instance, you also get a few Aurora Replicas, which are database instances just like the Primary Instance. A Replica can only perform read operations and is generally used to distribute the database's workload across the Cluster. You can have up to 15 Replicas in a Cluster besides the Primary Instance, as shown in the following image:

[Image: An overview of Amazon RDS]

Note

You can also read more on Amazon Aurora at http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Aurora.html.
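To make the division of labor between the Primary Instance and the Aurora Replicas more concrete, here is a minimal sketch in Python. It is only a toy model written for this section, not an AWS API: the class name, the instance names, and the rule that any statement starting with SELECT counts as a read are all assumptions made for illustration.

```python
from itertools import cycle

MAX_REPLICAS = 15  # replica limit per cluster, as stated in the text above


class AuroraClusterModel:
    """Toy model of an Aurora cluster; hypothetical, not an AWS API."""

    def __init__(self, primary, replicas):
        if len(replicas) > MAX_REPLICAS:
            raise ValueError(f"a cluster can have at most {MAX_REPLICAS} replicas")
        self.primary = primary
        self.replicas = list(replicas)
        # With no replicas available, reads fall back to the primary.
        self._readers = cycle(self.replicas or [self.primary])

    def route(self, statement):
        """Return the name of the instance that should run the statement."""
        is_read = statement.lstrip().upper().startswith("SELECT")
        # Reads rotate across the replicas; everything else goes to the primary.
        return next(self._readers) if is_read else self.primary


cluster = AuroraClusterModel("primary", ["replica-1", "replica-2"])
targets = [
    cluster.route("SELECT * FROM orders"),
    cluster.route("INSERT INTO orders VALUES (1)"),
    cluster.route("SELECT count(*) FROM orders"),
]
print(targets)  # ['replica-1', 'primary', 'replica-2']
```

The round-robin rotation is simply one easy way to spread reads in this model; it is not meant to describe how Aurora itself balances connections.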

With this basic information in mind, let us now understand some of RDS's core components and take a look at how RDS actually works.

RDS instance types

To begin with, RDS operates in a way very similar to EC2. Just as you have EC2 instances configured with a certain amount of CPU and storage resources, RDS too has instances that are spun up each time you configure a database service. The major difference between these instances and your traditional EC2 ones is that they cannot be accessed remotely via SSH, even if you want to. Why? Well, since it's a managed service and everything is provided and maintained by AWS itself, there is no need for you to SSH into them! Each instance already has a particular database engine preinstalled and configured in it. All you need to do is select your instance type and assign it some storage, and voila! There you have it: a running database service of your choice in under 5 minutes! Let's have a quick look at some of the RDS instance types and their common uses:

  • Micro instances (db.t1.micro): Just as we have micro instances in our EC2 environments, the same is provided for RDS. Each database micro instance is provided with just 1 CPU and approximately 600 MB of RAM, which is good enough if you just want to test RDS or play around with it. This instance type, however, is not recommended for any production workloads. Along with this micro instance, RDS also provides a slightly better instance type in the form of a db.m1.small, which provides 1 CPU with 1.7 GB of RAM.
  • Standard instances (db.m3): Besides your micro instances, RDS provides a standard set of instance types that can be used on a daily basis for moderate production workloads. This class of instance provides up to 8 CPUs and about 30 GB of RAM but, more importantly, these instances are specially created for better network performance.
  • Memory optimized (db.r3): As the name suggests, this instance class provides really high-end, memory optimized instances that are capable of faster performance and more computing capacity compared to your standard instance classes. This instance class provides a maximum of 32 CPUs with a RAM capacity of up to 244 GB along with a network throughput of 10 Gbps.

    Note

    The db.r3 DB instance classes are not presently available in the South America (Sao Paulo) and AWS GovCloud (US) regions.

  • Burst capable (db.t2): This instance class provides a baseline performance level with the ability to burst to full CPU usage if required. This particular class of database instance, however, can only be launched in a VPC environment. This category offers up to 2 CPUs with approximately 8 GB of RAM.

    Along with an instance type, each RDS instance is also backed by an EBS volume. You can use this EBS volume for storing your database files, logs, and lots more! More importantly, you can also select the type of storage to go with your instances as per your requirements. Here's a quick look at the different storage types provided with your RDS instances:

  • Magnetic (standard): Magnetic storage is an ideal choice for applications that have a light to moderate I/O requirement. A magnetic volume can provide approximately 100 IOPS on average, with the ability to burst up to hundreds of IOPS. The disk sizes can range anywhere from 5 GB to 3 TB. An important point to note here, however, is that since magnetic storage is shared, your overall performance can vary depending on the resource usage of other customers.
  • General purpose (SSD): These are the most commonly used storage types of the lot and are generally a good choice of storage if you are running a small to medium-sized database. General purpose or SSD-backed storage can provide better performance compared to magnetic storage, with much lower latencies and higher IOPS. General purpose storage volumes can provide a base performance of three IOPS/GB and have the ability to burst up to 3,000 IOPS as required. These volumes can range in size from 5 GB to 6 TB for MySQL, MariaDB, PostgreSQL, and Oracle DB instances, and from 20 GB to 4 TB for SQL Server DB instances.
  • Provisioned IOPS: Although general purpose volumes are good for moderate database workloads, they are not a good option when it comes to dedicated performance requirements and higher IOPS. In such cases, provisioned IOPS is the best choice of storage type for your instances. You can specify IOPS anywhere between 1,000 and 30,000 depending on the database engine you select as well as the disk size that you specify. A MySQL, MariaDB, PostgreSQL, or Oracle database instance with approximately 6 TB of storage can get up to 30,000 IOPS. Similarly, a SQL Server DB instance with approximately 4 TB of disk size can get up to 20,000 IOPS.

Note

You cannot decrease the storage of your RDS instance once it has been allocated.
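The general purpose (SSD) figures above lend themselves to a quick back-of-the-envelope calculation. The following Python sketch works out the baseline IOPS for a given volume size and checks a resize request against the no-shrinking rule from the note. The function names and engine labels are hypothetical, and treating 1 TB as 1,024 GB is an assumption made here for simplicity.

```python
BASE_IOPS_PER_GB = 3   # general purpose (SSD) baseline quoted in the text
BURST_IOPS = 3000      # general purpose (SSD) burst ceiling quoted in the text

# Size limits in GB from the text; 1 TB is treated as 1024 GB (an assumption).
# The engine keys are hypothetical labels chosen for this sketch.
SIZE_LIMITS_GB = {
    "mysql": (5, 6 * 1024),
    "mariadb": (5, 6 * 1024),
    "postgresql": (5, 6 * 1024),
    "oracle": (5, 6 * 1024),
    "sqlserver": (20, 4 * 1024),
}


def baseline_iops(size_gb, engine="mysql"):
    """Baseline IOPS of a general purpose volume of the given size."""
    low, high = SIZE_LIMITS_GB[engine]
    if not low <= size_gb <= high:
        raise ValueError(f"{engine} volumes must be {low} to {high} GB")
    return size_gb * BASE_IOPS_PER_GB


def is_valid_resize(current_gb, requested_gb):
    """Allocated storage can grow or stay the same, but never shrink."""
    return requested_gb >= current_gb


for size in (100, 500):
    print(size, "GB ->", baseline_iops(size), "baseline IOPS, bursting to", BURST_IOPS)
print(is_valid_resize(100, 50))  # False: storage cannot be decreased
```

A 100 GB volume therefore starts from a baseline of 300 IOPS, which is why small databases benefit most from the burst capability.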

With the RDS instance types in mind, let's now look at some of the key services as well as processes provided by Amazon RDS.

Multi-AZ deployments and Read Replicas

We all know how important it is, and how much hard work it takes, to keep a database up and running at all times, especially one running a production workload. This is no easy feat, especially when you have to manage the intricacies and all the tedious configuration parameters. Thankfully, Amazon RDS provides us with a very simple and easy-to-use framework in which tasks such as providing high availability for your databases, clustering, mirroring, and so on are all performed with just a click of a button!

Let's take high availability, for example. RDS leverages your region's availability zones and mirrors your entire primary database over to another AZ in the same region. This is called a Multi-AZ deployment, and it can easily be enabled using the RDS database deployment wizard. How does it work? Well, it's quite simple actually. It all starts when you select the Multi-AZ deployment option while deploying your database. At that moment, RDS will automatically create and maintain another database as a standby replica in a different AZ. If you use MySQL, MariaDB, Oracle, or PostgreSQL as your database engine, the mirroring technology used by RDS is proprietary to AWS. If you go for a SQL Server deployment, the mirroring technology used is SQL Server Mirroring by default. Once the standby replica database instance is created, it is kept continuously in sync with the primary database instance, and in the event of a database failure or even a planned maintenance activity, RDS will automatically fail over from the primary to the standby replica database instance within a couple of minutes:

[Image: Multi-AZ deployments and Read Replicas]

Note

Amazon RDS guarantees an SLA of 99.95 percent! To know more about the RDS SLA, refer to http://aws.amazon.com/rds/sla/.
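The Multi-AZ option that you select in the wizard can also be requested programmatically. The following is a minimal sketch, assuming you use the boto3 SDK for Python and its RDS client's create_db_instance call. The identifier, instance class, credentials, and region below are placeholders invented for illustration, and the launch helper is never invoked here, so the sketch runs without boto3 installed or AWS credentials configured.

```python
def multi_az_instance_params(identifier, engine, instance_class, storage_gb,
                             username, password):
    """Build keyword arguments for the RDS create_db_instance call."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,   # size in GB
        "MasterUsername": username,
        "MasterUserPassword": password,
        "MultiAZ": True,                  # ask RDS to keep a standby in another AZ
    }


def launch(params, region="us-east-1"):
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    rds = boto3.client("rds", region_name=region)
    return rds.create_db_instance(**params)


# All values below are placeholders, not recommendations.
params = multi_az_instance_params(
    "demo-db", "mysql", "db.m3.medium", 20, "dbadmin", "change-me-please"
)
print(params["MultiAZ"])  # True
```

Keeping the parameter building separate from the actual call makes it easy to review what will be sent before anything is created (and billed) in your account.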

However remarkable and easy a Multi-AZ deployment may be, it still has some minor drawbacks of its own. Firstly, you can't use a Multi-AZ deployment for scaling out your databases and, secondly, there is no failover provided if your entire region goes down. With these issues in mind, RDS provides an additional feature for our database instances called Read Replicas.

Read Replicas are database instances that enable you to offload your primary database instance's workload by having read queries routed through them. The data from your primary instance is copied asynchronously to the read replica instance using the database engine's built-in replication engine. How does it all work? Well, it's very similar to the steps required for creating an AMI from a running EC2 instance! First up, RDS will create a snapshot based on your primary database instance. Next, this snapshot is used to spawn a read replica instance. Once the instance is up and running, the database engine will start the asynchronous replication process so that whenever a change is made to the data in the primary, it gets automatically replicated over to the read replica instance as well. You can then connect your application to the new read replica and offload all your read queries to it! As of this writing, RDS supports only the MySQL, MariaDB, and PostgreSQL database engines for read replicas:

[Image: Multi-AZ deployments and Read Replicas]

Note

You can create up to five Read Replicas for a given database instance.
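As a rough illustration of the replica limit and of how a replica is tied to its source, here is a minimal sketch that assumes the boto3 SDK for Python and its RDS client's create_db_instance_read_replica call. The naming scheme, instance class, and region are assumptions made for this example, and the helper only covers a source instance that has no replicas yet. The create_replicas function is not invoked, so the sketch runs without boto3 or AWS credentials.

```python
MAX_READ_REPLICAS = 5  # limit per source instance, from the note above


def read_replica_requests(source_id, count, instance_class="db.m3.medium"):
    """Build one request per replica for a source that has no replicas yet."""
    if not 1 <= count <= MAX_READ_REPLICAS:
        raise ValueError(f"count must be between 1 and {MAX_READ_REPLICAS}")
    return [
        {
            # Hypothetical naming scheme: <source>-replica-<n>
            "DBInstanceIdentifier": f"{source_id}-replica-{n}",
            "SourceDBInstanceIdentifier": source_id,
            "DBInstanceClass": instance_class,
        }
        for n in range(1, count + 1)
    ]


def create_replicas(requests, region="us-east-1"):
    """Send the requests to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    rds = boto3.client("rds", region_name=region)
    return [rds.create_db_instance_read_replica(**req) for req in requests]


requests = read_replica_requests("demo-db", 2)
for req in requests:
    print(req["DBInstanceIdentifier"], "<-", req["SourceDBInstanceIdentifier"])
```

Once the replicas are available, your application would send its read queries to their endpoints and keep its writes pointed at the primary.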

You can additionally use these Read Replicas as a failover mechanism by deploying them in a different region altogether. The only downside to this is that you will have to manually promote the replica to a primary when the latter fails. We will be creating and promoting a Read Replica later on in this chapter, but for now let's look at how you can create and get started with your first database using RDS.
