The concept of strategies in Trove

In Trove, a strategy is a construct that lets developers extend Trove's functionality by writing specialized implementations behind a common abstraction.

This makes the architecture fully pluggable: different technologies and different code can perform the same function across different database engines.

Strategies are used for backups, restores, replication, clustering, and storage (the storage strategy determines where backups are stored, along with the associated properties). They are implemented in the guest agent code, although they can also be implemented for the API and task manager components. Implementing them in the guest agent means the code runs closest to the place where the action has to happen.

Each strategy must implement, at a minimum, a set of functions (these can be seen in the base.py file for that type of strategy), which the system then calls to perform the work.

For example, each backup strategy must provide the command to execute in order to take the backup, and each storage strategy must implement a save function that saves the backup to that particular storage system.
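The contract described above can be sketched in a few lines of Python. This is a minimal illustration and not Trove's actual base.py: the class names BackupStrategy, StorageStrategy, EchoBackup, and InMemoryStorage, as well as the cmd and save signatures, are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class BackupStrategy(ABC):
    """Hypothetical base class: every backup strategy supplies a command."""

    @property
    @abstractmethod
    def cmd(self):
        """Shell command that writes the backup to standard output."""


class StorageStrategy(ABC):
    """Hypothetical base class: every storage strategy can save data."""

    @abstractmethod
    def save(self, filename, data):
        """Persist the data and return a location string."""


class EchoBackup(BackupStrategy):
    """Toy backup strategy, used only for illustration."""

    @property
    def cmd(self):
        return "echo hello"


class InMemoryStorage(StorageStrategy):
    """Toy storage strategy that keeps backups in a dictionary."""

    def __init__(self):
        self.objects = {}

    def save(self, filename, data):
        self.objects[filename] = data
        return "memory://" + filename
```

Because the base classes are abstract, a strategy that forgets to provide cmd or save cannot be instantiated, which is one way to enforce the "minimum list of functions" idea.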

The following diagram shows the concept of strategies. The control components use an abstract term, say create_backup, and send it over the message bus; the guest agent then looks up the default or configured strategy for that particular database engine and executes its commands.

Figure: The concept of strategies in Trove

The same concept applies to every operation that supports strategies. Note that not all the control components are shown here; the diagram is for representation purposes only.
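The dispatch idea can be sketched as follows. This is a simplified illustration, not the guest agent's real code: the GuestAgent class, the STRATEGIES registry, the message layout, and the two toy backup functions are all hypothetical.

```python
def mysql_backup(backup_id):
    """Toy stand-in for a MySQL backup strategy."""
    return f"mysql strategy ran for {backup_id}"


def redis_backup(backup_id):
    """Toy stand-in for a Redis backup strategy."""
    return f"redis strategy ran for {backup_id}"


# Hypothetical registry: data store name -> backup strategy.
STRATEGIES = {"mysql": mysql_backup, "redis": redis_backup}


class GuestAgent:
    """Receives abstract messages and runs the engine-specific strategy."""

    def __init__(self, datastore):
        self.datastore = datastore

    def handle(self, message):
        # The sender only names the abstract operation, e.g. create_backup.
        handler = getattr(self, message["method"])
        return handler(**message["args"])

    def create_backup(self, backup_id):
        # The agent, not the sender, decides which strategy applies.
        strategy = STRATEGIES[self.datastore]
        return strategy(backup_id)


message = {"method": "create_backup", "args": {"backup_id": "b-1"}}
```

The same message produces different behavior depending on the data store the agent manages: GuestAgent("mysql").handle(message) runs the MySQL function, while GuestAgent("redis").handle(message) runs the Redis one.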

The backup/restore strategy in action

In order to better understand how the strategy will work, let's take a look at the following diagram that shows the backup taking place. The steps are enumerated as follows:

Figure: The backup/restore strategy in action

  1. The Trove API passes the command on to the Trove Task Manager.
  2. The Trove Task Manager leaves a message in the RabbitMQ queue for the Guest Agent to pick up.
  3. The Guest Agent pulls the message and checks the backup and storage strategies (configured or default) for the particular data store version.
  4. The Guest Agent executes the backup commands, which it gets from the strategy definition. For example, if the MySQLDump strategy is used, the command executed is mysqldump --all-databases --user <username> --password, along with the commands to zip and encrypt the backup. These are all defined in the strategy files, as shown in the next section.
  5. The Guest Agent stores the backup as specified by the storage strategy.
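Step 4 amounts to composing a shell pipeline from the strategy's base command plus optional compression and encryption stages. The sketch below shows one way this could look; the function names, the key value, and the exact gzip and openssl invocations are assumptions for illustration and are not copied from Trove's strategy files.

```python
import shlex


def build_backup_command(base_cmd, compress=True, encrypt=True, key="secret"):
    """Join the backup command with optional gzip and openssl stages."""
    stages = [base_cmd]
    if compress:
        stages.append("gzip")
    if encrypt:
        # Quote the passphrase argument so special characters survive the shell.
        passphrase = shlex.quote("pass:" + key)
        stages.append("openssl enc -aes-256-cbc -salt -pass " + passphrase)
    return " | ".join(stages)


def backup_filename(backup_id, compress=True, encrypt=True):
    """Derive a file name whose extensions reflect the applied stages."""
    name = backup_id
    if compress:
        name += ".gz"
    if encrypt:
        name += ".enc"
    return name
```

For instance, build_backup_command("mysqldump --all-databases", encrypt=False) yields a two-stage pipeline that dumps and compresses without encrypting.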

Configuring the backup strategies

The strategies are configured by default, but we can choose to override them. The configuration options are:

  • backup_strategy: The name of the strategy to use, for example, InnoBackupEx, MySQLDump, MongoDump, and so on
  • backup_namespace: The Python module from which the strategy code is loaded
  • backup_incremental_strategy: The name of the strategy that needs to be used while taking incremental backups

These configuration options are set in the trove-guestagent.conf file and are injected into the guest at build time.
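To make the relationship between a strategy name and its namespace concrete, here is a small sketch that reads the three options from a sample configuration and resolves a class from a module path. The [mysql] section name, the sample values, and the helper names read_backup_options and load_class are assumptions for this example; Trove's own loading code may differ.

```python
import configparser
import importlib

# Assumed sample; the section name and value formats are illustrative only.
SAMPLE_CONF = """\
[mysql]
backup_strategy = InnoBackupEx
backup_namespace = trove.guestagent.strategies.backup.mysql_impl
backup_incremental_strategy = InnoBackupExIncremental
"""


def read_backup_options(conf_text, datastore):
    """Return the backup-related options for one data store section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser[datastore]
    return {
        "strategy": section["backup_strategy"],
        "namespace": section["backup_namespace"],
        "incremental": section["backup_incremental_strategy"],
    }


def load_class(namespace, name):
    """Import the module named by namespace and return the class called name."""
    module = importlib.import_module(namespace)
    return getattr(module, name)


options = read_backup_options(SAMPLE_CONF, "mysql")
```

On a machine with Trove installed, load_class(options["namespace"], options["strategy"]) would be expected to return the strategy class. The helper itself is generic; load_class("collections", "OrderedDict") returns the standard library class in the same way.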

We don't have to configure anything additional in the guest agent configuration; this section is purely informational.

In order to understand the different strategies available to us and the corresponding namespaces, let us take a look at the following table, which shows the different backup strategies that are available in Trove at the time of writing the book:

Data store name / Backup type   Strategy name             Strategy namespace
MySQL / Full                    MySQLDump                 trove.guestagent.strategies.backup.mysql_impl
MySQL / Full                    InnoBackupEx              trove.guestagent.strategies.backup.mysql_impl
MySQL / Incremental             InnoBackupExIncremental   trove.guestagent.strategies.backup.mysql_impl
Couchbase / Full                CbBackup                  trove.guestagent.strategies.backup.experimental.couchbase_impl
MongoDB / Full                  MongoDump                 trove.guestagent.strategies.backup.experimental.mongo_impl
PostgreSQL / Full               PgDump                    trove.guestagent.strategies.backup.experimental.postgresql_impl
Redis / Full                    RedisBackup               trove.guestagent.strategies.backup.experimental.redis_impl

As we can see, at this point in time, only MySQL (and its variants, like MariaDB) can perform incremental backups, and it offers two strategies for full backups (if we choose not to use InnoDB, we can just use MySQLDump). Also, not all data stores support full backups at this moment.

Because strategies are pluggable, we can also implement a simple backup strategy of our own by writing a different Python class. However, in most cases we don't have to, as the ones provided by default with Trove are sufficient.

Configuring the storage strategies

The storage strategy denotes the place where the backups are stored. At the time of writing this book, only SwiftStorage, which uses Swift, the object storage service in OpenStack, has been implemented. The default configuration parameters are:

  • storage_strategy: The name of the storage strategy
  • storage_namespace: The Python module where this strategy is implemented

There are plans to add support for other storage strategies, such as AWS S3. Since SwiftStorage is the only strategy available at the moment, let us also look at its sub-configuration parameters: the container in which the backups are stored, whether the backup is encrypted, which key is used if it is, and so on. All of these are set using the following configuration variables:

  • backup_swift_container: The place where the backups will be stored (default value is database_backups)
  • backup_use_gzip_compression: Whether to compress the backup (default is true)
  • backup_use_openssl_encryption: Whether to encrypt the backup (default is true)
  • backup_aes_cbc_key: The key to use for encryption
  • backup_use_snet: Whether the backup can use the Swift service network (default is false)
  • backup_chunk_size: Chunk size for backups
  • backup_segment_max_size: Max size for each segment of the backup

In most cases, the defaults work fine, but these options can be changed should we need to tweak their values.
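To see how a chunk size and a maximum segment size can interact, consider the sketch below. It assumes that the chunk size is the unit in which the backup stream is read and that the segment size caps each stored object; this is an illustration of the idea, not SwiftStorage's actual implementation, and the function name and segment naming scheme are hypothetical.

```python
import io


def split_into_segments(stream, chunk_size, segment_max_size):
    """Read the stream chunk by chunk and yield (segment_name, data) pairs.

    A new segment starts whenever adding the next chunk would push the
    current segment past segment_max_size.
    """
    index = 0
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if buffer and len(buffer) + len(chunk) > segment_max_size:
            yield f"segment_{index:08d}", bytes(buffer)
            index += 1
            buffer = bytearray()
        buffer.extend(chunk)
    if buffer:
        # Emit whatever is left as the final, possibly shorter, segment.
        yield f"segment_{index:08d}", bytes(buffer)


segments = list(split_into_segments(io.BytesIO(b"abcdefghij"), 2, 4))
```

With a 10-byte stream, 2-byte chunks, and a 4-byte segment limit, the data is split into three segments of 4, 4, and 2 bytes, and concatenating them restores the original stream.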
