In Trove, a strategy is a construct that lets developers extend Trove's functionality by writing specialized implementations behind a common abstraction.
The architecture is fully pluggable: different technologies and different code can perform the same function across different database engines.
Strategies are used for backups, restores, replication, clustering, and storage (which determines where the backups are stored, along with the associated properties). They are implemented in the guest agent code (they can also be implemented for the API and task manager components), so the code runs as close as possible to the place where the action has to happen.
So, effectively, each strategy needs to implement a minimum set of functions (these can be seen in the base.py file for that particular strategy), which the system then calls to perform the work.
For example, each backup strategy needs to provide the command that is executed to take the backup, and each storage strategy needs to implement a save function, which allows us to save to that particular storage system.
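The contract described above can be sketched in Python. This is a minimal illustration and not Trove's actual base.py: the class names BackupStrategy, StorageStrategy, MySQLDumpSketch, and InMemoryStorage are hypothetical, and only the ideas of "a command to run" and "a save function" come from the text.

```python
from abc import ABC, abstractmethod


class BackupStrategy(ABC):
    """Hypothetical base class: a backup strategy supplies the command to run."""

    @property
    @abstractmethod
    def cmd(self):
        """Shell command that produces the backup."""


class StorageStrategy(ABC):
    """Hypothetical base class: a storage strategy knows how to save data."""

    @abstractmethod
    def save(self, filename, data):
        """Store the data under the given name and return its location."""


class MySQLDumpSketch(BackupStrategy):
    """Illustrative strategy that only builds the command string."""

    def __init__(self, user):
        self.user = user

    @property
    def cmd(self):
        return f"mysqldump --all-databases --user {self.user} --password"


class InMemoryStorage(StorageStrategy):
    """Stand-in for a real object store, used only for illustration."""

    def __init__(self):
        self.objects = {}

    def save(self, filename, data):
        self.objects[filename] = data
        return f"memory://{filename}"
```

Because the base classes are abstract, a strategy that forgets to provide its command or its save function cannot be instantiated, which is one way to enforce the minimum set of functions.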
The following diagram shows the concept of strategies. The control components use an abstract operation name, say create_backup, and send it as a message over the message bus; the guest agent then looks at the default or configured strategy for that particular database engine and executes the corresponding commands.
The same concept applies to everything that supports strategies. Please note that not all the control components are shown in this case, and the diagram is for representation purposes only.
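The dispatch described above can be sketched as follows. This is a simplified model under stated assumptions, not the guest agent's real code: GuestAgentSketch, the message dictionary layout, and the DEFAULT_BACKUP_STRATEGY mapping (including which strategy is the default for each engine) are all hypothetical.

```python
# Hypothetical per-engine defaults; the real defaults live in Trove's configuration.
DEFAULT_BACKUP_STRATEGY = {
    "mysql": "InnoBackupEx",
    "mongodb": "MongoDump",
    "redis": "RedisBackup",
}


class GuestAgentSketch:
    """Toy guest agent that maps an abstract message to a concrete strategy."""

    def __init__(self, datastore, configured_strategy=None):
        self.datastore = datastore
        self.configured_strategy = configured_strategy
        # Only explicitly listed operations can be invoked from a message.
        self.handlers = {"create_backup": self.create_backup}

    def resolve_backup_strategy(self):
        # A configured strategy overrides the default for the engine.
        return self.configured_strategy or DEFAULT_BACKUP_STRATEGY[self.datastore]

    def handle(self, message):
        handler = self.handlers[message["method"]]
        return handler(**message.get("args", {}))

    def create_backup(self, backup_id):
        strategy = self.resolve_backup_strategy()
        return f"backup {backup_id} taken with {strategy}"
```

The control side only ever says create_backup; which tool actually runs is decided on the guest, for example `GuestAgentSketch("mysql").handle({"method": "create_backup", "args": {"backup_id": "b1"}})`.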
In order to better understand how the strategy will work, let's take a look at the following diagram that shows the backup taking place. The steps are enumerated as follows:
mysqldump --all-databases --user <username> --password, along with the command to zip and encrypt the backup (these are all defined in the strategy files, as shown in the next section).

The strategies are configured by default, but we can choose to override them. The configuration options are:

- backup_strategy: The name of the strategy to use, for example, InnoBackupEx, MySQLDump, MongoDump, and so on
- backup_namespace: The file to load the code for the strategies from
- backup_incremental_strategy: The name of the strategy that needs to be used while taking incremental backups

These configuration options are set in the trove-guestagent.conf file, from which they are injected into the guest during build time.
We don't have to configure anything additional in the guest agent configuration; this section is purely informational.
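To illustrate how a strategy name and a namespace could together identify a class, here is a minimal sketch that uses Python's importlib. It is not Trove's loader: load_strategy and the stand-in module demo_backup_strategies are hypothetical, and the option values are illustrative rather than real defaults.

```python
import importlib
import sys
import types


def load_strategy(namespace, name):
    """Import the module named by `namespace` and return the class `name`."""
    module = importlib.import_module(namespace)
    try:
        return getattr(module, name)
    except AttributeError:
        raise ImportError(f"strategy {name!r} not found in {namespace!r}") from None


# Stand-in module so the sketch runs without Trove installed (hypothetical name).
demo = types.ModuleType("demo_backup_strategies")


class MySQLDump:
    """Placeholder strategy class; it does not take a real backup."""


demo.MySQLDump = MySQLDump
sys.modules[demo.__name__] = demo

# Option names come from the text; the values are illustrative only.
conf = {
    "backup_namespace": "demo_backup_strategies",
    "backup_strategy": "MySQLDump",
}
strategy_cls = load_strategy(conf["backup_namespace"], conf["backup_strategy"])
```

Overriding a strategy then amounts to changing two strings in the configuration; no calling code needs to know which class is behind them.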
In order to understand the different strategies available to us and the corresponding namespaces, let us take a look at the following table, which shows the different backup strategies that are available in Trove at the time of writing the book:
Data store name / Backup type | Strategy name | Strategy namespace |
---|---|---|
MySQL / Full | MySQLDump | |
MySQL / Full | InnoBackupEx | |
MySQL / Incremental | InnoBackupExIncremental | |
Couchbase / Full | CbBackup | |
MongoDB / Full | MongoDump | |
PostgreSQL / Full | PgDump | |
Redis / Full | RedisBackup | |
As we can see, at this point in time, only MySQL (and its variants like MariaDB) can perform incremental backups, and it offers two strategies for full backup (if we choose not to use InnoDB, we could just use MySQLDump). Also, not all the different data stores support full backup at this moment.
Because strategies are pluggable, we can also implement a backup strategy of our own by writing a different Python class. However, in most cases, we don't have to, as the ones provided by default with Trove are sufficient.
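The strategy names in the table above can be captured in a small lookup, which makes the "who supports what" question easy to answer in code. The strategy names are taken from the table; the dictionary layout and the helper functions strategies_for and supports_incremental are hypothetical and exist only for this sketch.

```python
# Strategy names from the table, keyed by (data store, backup type).
BACKUP_STRATEGIES = {
    ("MySQL", "full"): ["MySQLDump", "InnoBackupEx"],
    ("MySQL", "incremental"): ["InnoBackupExIncremental"],
    ("Couchbase", "full"): ["CbBackup"],
    ("MongoDB", "full"): ["MongoDump"],
    ("PostgreSQL", "full"): ["PgDump"],
    ("Redis", "full"): ["RedisBackup"],
}


def strategies_for(datastore, backup_type="full"):
    """Return the strategy names listed for a data store and backup type."""
    return BACKUP_STRATEGIES.get((datastore, backup_type), [])


def supports_incremental(datastore):
    """True when the table lists at least one incremental strategy."""
    return bool(strategies_for(datastore, "incremental"))
```

With this lookup, `supports_incremental("MySQL")` is true and `supports_incremental("Redis")` is false, which matches the observation that only MySQL offers incremental backups.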
The storage strategy denotes the place where the backups can be stored. At the time of writing this book, only SwiftStorage, which stores backups in Swift, the object storage service in OpenStack, has been implemented. The default configuration parameters are:

- storage_strategy: The name of the storage strategy
- storage_namespace: The file where this strategy is implemented

There are plans to add support for other storage strategies like AWS S3 and so on. But since this is the only strategy available to us at the moment, let us take a moment to also look at its sub-configuration parameters. These cover the container (bucket) where the backups need to be stored, whether the backup needs to be encrypted, which key needs to be used if it is encrypted, and so on. All of these are configured using the following configuration variables:
- backup_swift_container: The place where the backups will be stored (default value is database_backups)
- backup_use_gzip_compression: Do we compress the backup (default is true)
- backup_use_openssl_encryption: Do we encrypt the backup (default is true)
- backup_aes_cbc_key: Which key to use for encryption
- backup_use_snet: Can the backup use the Swift service network (default is false)
- backup_chunk_size: Chunk size for backups
- backup_segment_max_size: Max size for each segment of the backup

Most times, the defaults would work fine. But these options can be configured should we need to tweak their values.
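To make the effect of these options more concrete, the following sketch shows one plausible way they could shape a backup. It is an assumption-laden illustration, not Trove's implementation: build_backup_pipeline and iter_segments are hypothetical helpers, the .gz and .enc suffixes are assumed, and the interaction between chunk size and segment size is my reading of the option descriptions. The gzip and openssl enc invocations themselves are standard command-line usage.

```python
import io
import shlex


def build_backup_pipeline(dump_cmd, use_gzip=True, use_openssl=True, aes_key=None):
    """Compose the dump command with optional compression and encryption stages."""
    parts = [dump_cmd]
    extension = ""
    if use_gzip:
        parts.append("gzip")
        extension += ".gz"
    if use_openssl:
        if not aes_key:
            raise ValueError("encryption requested but no key supplied")
        passphrase = shlex.quote(f"pass:{aes_key}")
        parts.append(f"openssl enc -aes-256-cbc -salt -pass {passphrase}")
        extension += ".enc"
    return " | ".join(parts), extension


def iter_segments(stream, chunk_size, segment_max_size):
    """Read the stream chunk by chunk and yield segments within the size cap.

    Assumes chunk_size <= segment_max_size.
    """
    segment = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Start a new segment when the next chunk would exceed the cap.
        if segment and len(segment) + len(chunk) > segment_max_size:
            yield bytes(segment)
            segment = bytearray()
        segment.extend(chunk)
    if segment:
        yield bytes(segment)


# Illustrative values only; these are not Trove's defaults.
pipeline, suffix = build_backup_pipeline("mysqldump --all-databases", aes_key="secret")
segments = list(iter_segments(io.BytesIO(b"0123456789"), chunk_size=3, segment_max_size=6))
```

Turning off compression or encryption simply removes a stage from the pipeline, while the chunk and segment sizes only affect how the resulting stream is cut up before it is uploaded.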