Chapter 6: ShardingSphere-Proxy Installation and Startup

Unlike ShardingSphere-JDBC, ShardingSphere-Proxy provides a proxy service requiring independent deployment. ShardingSphere-Proxy is based on database protocols that allow users to perform operations on distributed databases in the same way as operating centralized databases – they can connect database terminals, third-party clients, or any other JDBC-supporting application to ShardingSphere-Proxy.

Now that you have reached this point of the book, you should be feeling confident in your understanding of Apache ShardingSphere and wondering how to start using it. This chapter will guide you through your first steps in installing and setting up ShardingSphere and, ultimately, getting it up and running.

In this chapter, we're going to cover the following main topics:

  • Introduction to Distributed SQL
  • Configuration

The first section will get you started at the beginning of the process, by showing you how to install your binary package.

Technical requirements

To run the practical examples in this chapter, you will need the following tools:

  • Text editor (not mandatory): You can use Vim or VS Code to modify the YAML configuration files.
  • 7-Zip or the tar command: You can use 7-Zip on Windows or the tar command on Linux or macOS to decompress the proxy artifact.
  • A MySQL/PG client: You can use the default CLI or other SQL clients such as Navicat or DataGrip to execute SQL queries.

For ShardingSphere-Proxy, you can either use the binary package or Docker to perform the installation.

You can find the complete code file here:

https://github.com/PacktPublishing/A-Definitive-Guide-to-Apache-ShardingSphere

Installing with the binary package

When you deploy ShardingSphere-Proxy with the binary package, please do the following:

  1. Install the Java Runtime Environment (JRE), version 8 or later.
  2. Download the binary package for ShardingSphere-Proxy. Navigate to https://archive.apache.org/dist/shardingsphere/ and download the apache-shardingsphere-${version}-shardingsphere-proxy-bin.tar.gz file.
  3. Decompress the binary package.
  4. Modify the configuration files in the conf directory. For the modification details, please refer to the following sections:
    • Configuration – sharding
    • Configuration – read/write splitting
    • Configuration – encryption
    • Configuration – shadow database
    • Configuration – mode
    • Configuration – scaling
    • Configuration – mixed configurations
    • Configuration – server (the server.yaml configuration)

Installing with Docker

If you prefer to install with Docker, you can refer to the following steps:

  1. Pull the Docker image with docker pull apache/shardingsphere-proxy.
  2. Create the server.yaml and config-xxx.yaml files under /${your_work_dir}/conf/ to configure the services and rules.
  3. To start the proxy container, please use a command similar to the following format:

    docker run -d -v /${your_work_dir}/conf:/opt/shardingsphere-proxy/conf -e PORT=3308 -p13308:3308 apache/shardingsphere-proxy:latest

    Note

    You can map the container's port to a host port. In the preceding command, 3308 is the container port inside Docker, while 13308 is the host port it is mapped to.

    The configuration must be mounted to /opt/shardingsphere-proxy/conf.

At this point, the preconditions for ShardingSphere-Proxy are met.

Introduction to Distributed SQL

Leveraging the power of ShardingSphere's efficient SQL Parser Engine, Distributed SQL (DistSQL) is an interactive management tool developed by Apache ShardingSphere to assist users. As a part of the distributed database management solution (DDBMS), DistSQL can identify and handle resource and rule configurations and query requests.

Based on function, DistSQL statements fall into three categories:

  • Resource and Rule Definition Language (RDL) is used to create, modify, and delete resources and rules.
  • Resource and Rule Query Language (RQL) is used to query resources and rules, so you won't need to check files when you want to know the configuration state.
  • Resource and Rule Administration Language (RAL) is used to control advanced features such as permission controls, transaction type switching, and circuit breakers.

Note that, at the time of writing, DistSQL statements are only supported by ShardingSphere-Proxy.

Configuration – sharding

In this section, we'd like to showcase how to configure data sharding in ShardingSphere-Proxy by means of DistSQL and YAML. With DistSQL, users can use SQL-like statements to configure and manage sharding rules.

In this section, the focus is placed on the DistSQL syntax, and some examples are given to illustrate how users can use DistSQL to perform sharding operations. Additionally, the YAML method relies on configuration files to configure and manage data sharding features. We'd like to introduce the configuration items and their definitions, as well as provide scenario examples.

DistSQL – the SQL syntax

There are four table types: the sharding table, the binding table, the broadcast table, and the single table. To perform operations such as creating table rules, modifying table rules, deleting table rules, and displaying table rules, DistSQL provides four statements respectively: CREATE, ALTER, DROP, and SHOW.

Sharding table

The DistSQL statements designed for sharding table management primarily focus on sharding rules, which require a sharding strategy configuration, a sharding algorithm, and a distributed key generator. Accordingly, DistSQL provides statements for managing sharding strategies, sharding algorithms, and distributed key generation.

This is the simplest syntax definition for creating a sharding table rule, and more optional parameters can be found in the latest documentation.
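A sketch of that minimal creation syntax, reconstructed from the t_order example used later in this section (the placeholder names are illustrative; consult the documentation for the full grammar), looks like this:

```sql
CREATE SHARDING TABLE RULE tableName (
   RESOURCES(resourceName [, resourceName] ...),
   SHARDING_COLUMN=columnName, TYPE(NAME=shardingAlgorithmType, PROPERTIES("key"="value")),
   GENERATED_KEY(COLUMN=columnName, TYPE(NAME=keyGenerateAlgorithmType, PROPERTIES("key"="value")))
);
```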

Drop sharding table rule

The syntax for dropping sharding table rules is simpler than that for creating or modifying them. You can do so according to the following example:

DROP SHARDING TABLE RULE tableName [, tableName] ...

Query sharding table rules

Usually, you can view all sharding table rules in the current database using the SHOW SHARDING TABLE RULES syntax.

Let's look at an example. The SQL statement that is used to create a sharding table rule is shown as follows:

CREATE SHARDING TABLE RULE t_order (
   RESOURCES(ds_0, ds_1),
   SHARDING_COLUMN=order_id, TYPE(NAME=hash_mod, PROPERTIES("sharding-count"=4)),
   GENERATED_KEY(COLUMN=another_id, TYPE(NAME=snowflake, PROPERTIES("worker-id"=123)))
);

Binding table rules

DistSQL also provides a related syntax for managing binding table rules. Binding table rules build on sharding table rules: they declare binding relationships between sharding table rules that share the same sharding configuration.

First, here is the syntax for creating sharding binding table rules:

CREATE SHARDING BINDING TABLE RULES (tableName [, tableName] ...) [, (tableName [, tableName] ...)]

Essentially, the syntax for modifying and deleting binding table rules is the same as the creation syntax; just change CREATE to ALTER or DROP.
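Following that substitution, the binding rules could be modified or removed as follows (the table names here are illustrative):

```sql
ALTER SHARDING BINDING TABLE RULES (t_order, t_order_item);
DROP SHARDING BINDING TABLE RULES (t_order, t_order_item);
```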

Next, let's query the sharding binding table rules as follows:

SHOW SHARDING BINDING TABLE RULES [FROM schemaName]

Now, let's take a look at an example. Here is the statement to create binding table rules. The primary table and the joined table must share the same sharding rules, as with the t_order and t_order_item tables. In this case, as you can see, both use order_id as the sharding key, so they can be bound to each other. With the binding in place, a multi-table join query will not produce a Cartesian product across shards, which greatly increases query efficiency. The statement is as follows:

CREATE SHARDING BINDING TABLE RULES (t_order, t_order_item);

Broadcast table

DistSQL also provides a syntax for creating broadcast table rules. Broadcast tables are independent of sharding table rules, and the tables referenced when creating broadcast table rules can be existing tables or tables that have not been created yet.

First, here is the syntax for creating sharding broadcast table rules:

CREATE SHARDING BROADCAST TABLE RULES (tableName [, tableName] ...)

Similar to binding table rules, modifying broadcast table rules and deleting broadcast table rules only requires changing CREATE in the creation syntax to either ALTER or DROP.
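For example, applying that substitution to the t_config rule created below gives the following (illustrative) statements:

```sql
ALTER SHARDING BROADCAST TABLE RULES (t_config);
DROP SHARDING BROADCAST TABLE RULES (t_config);
```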

Next, here is how to query sharding broadcast table rules:

SHOW SHARDING BROADCAST TABLE RULES [FROM schemaName]

Let's check out an example. This is the statement to create broadcast table rules:

CREATE SHARDING BROADCAST TABLE RULES (t_config);

YAML

We will now look at the YAML configurations:

  • Here are the rules:
    • Table configuration items: The table-related configuration items in sharding, covering the ordinary sharding table, the automatic sharding table, the binding table, and the broadcast table. They are mainly used to configure the relevant sharding information so that the sharding feature runs properly:
      • tables: The user-defined sharding table rule configuration
      • autoTables: The automatic sharding table rule configuration
      • bindingTables: The binding table rule configuration
      • broadcastTables: The broadcast table rule configuration
    • Strategy configuration items: These are the shard-related strategy configuration items, including the default database sharding strategy, the default table sharding strategy, and the default primary key generation strategy. They can be used to specify the shard-related strategy information:
      • defaultDatabaseStrategy: The default database sharding strategy
      • defaultTableStrategy: The default table sharding strategy
      • defaultKeyGenerateStrategy: The default key generation strategy
    • Default sharding column: This is the default shard key. When a sharding strategy does not contain a shard key configuration, the default shard key will be used:
      • defaultShardingColumn: The default sharding column name
    • Sharding algorithms: The sharding algorithm configuration information. These algorithms calculate the corresponding routing information when sharding:
      • shardingAlgorithms: The sharding algorithm configuration
    • Key generator: This is the algorithm used to configure primary key generation:
      • keyGenerators: The key generator configuration
  • tables: The sharding table configuration. This is used to configure the sharding information of a table, including the table name, real data nodes, the database sharding strategy, the table sharding strategy, and the primary key generation strategy:
    • logicTable: The logical table name.
    • actualDataNodes: The data node names (the format is data source name + table name).
    • databaseStrategy: The database sharding strategy. Without configuration, the default database sharding strategy will be used.
    • tableStrategy: The table sharding strategy. Without configuration, the default table sharding strategy will be used.
    • keyGenerateStrategy: The key generation strategy. Without configuration, the default key generation strategy will be used.
  • autoTables: The automatic shard table configuration. Automatic sharding information for configuring tables, including the table name, the real data source information, the sharding strategy, and the primary key generation strategy:
    • logicTable: The logic table name.
    • actualDataSources: The data source names.
    • shardingStrategy: The sharding strategy.
    • keyGenerateStrategy: The key generation strategy. Without configuration, the default key generation strategy will be used.
  • databaseStrategy: The database sharding strategy. This is the sharding strategy for specifying data sources, including the standard sharding strategy, the complex sharding strategy, the hint sharding strategy, and the no sharding strategy:
    • strategyType: The strategy type, such as standard, complex, hint, and none
    • shardingColumn: The sharding column name
    • shardingAlgorithmName: The sharding algorithm name
  • tableStrategy: The sharding strategy for the table. The user specifies the sharding strategy for the table, including the standard sharding strategy, the complex sharding strategy, the hint sharding strategy, and the no sharding strategy:
    • strategyType: The strategy type, such as standard, complex, hint, and none
    • shardingColumn: The sharding column name
    • shardingAlgorithmName: The sharding algorithm name
  • keyGenerateStrategy: The primary key generation strategy configuration. With this, you can configure the primary key generation strategy under the sharding function and specify the corresponding primary key generation algorithm:
    • column: The column name
    • keyGeneratorName: The key generator name

      Note

      For the configuration items of defaultDatabaseStrategy, please refer to the table under databaseStrategy.

      For the configuration items of defaultTableStrategy, please refer to tableStrategy.

      For the configuration items of defaultKeyGenerateStrategy, please refer to keyGenerateStrategy.

  • shardingAlgorithms: The sharding algorithm configuration. The algorithms are used to configure sharding:
    • shardingAlgorithmName: The sharding algorithm name
    • type: The sharding algorithm type
    • props: The sharding algorithm properties
  • keyGenerators: The configuration of the primary key generation algorithm. The algorithm is used to configure the primary key generation:
    • keyGeneratorName: The key generator name
    • type: The key generator type
    • props: The key generator properties

For sharding, you will find a YAML example in the following script:

rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_inline

Bear in mind that you can also configure other strategies depending on your requirements.
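The snippet above references a t_order_inline algorithm without defining it. A fuller sketch that also wires in the shardingAlgorithms and keyGenerators sections described earlier might look like this (the INLINE expression and SNOWFLAKE generator are illustrative choices):

```yaml
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_inline
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
  shardingAlgorithms:
    t_order_inline:
      type: INLINE
      props:
        algorithm-expression: t_order_${order_id % 2}
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
```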

Configuration – read/write splitting

In this section, we will introduce you to DistSQL's management syntax for read/write splitting rules:

  1. First, let's create the read/write splitting rules:

    CREATE READWRITE_SPLITTING RULE ruleName (

        WRITE_RESOURCE=resourceName,

        READ_RESOURCES(resourceName [ , resourceName] *),

        TYPE(NAME=algorithmName, PROPERTIES("key"="value" [, "key"="value" ]* ))

    );

This is the standard syntax of the read/write splitting rule. The DistSQL syntax with read/write splitting also provides configuration methods for dynamic data sources.  

To modify the syntax of the read/write splitting rule, replace CREATE with ALTER and keep the other parts unchanged.

  2. Next, we will learn how to drop read/write splitting rules. The delete syntax is as follows:

    DROP READWRITE_SPLITTING RULE ruleName, [ruleName]*

  3. Now, let's learn how to show read/write splitting rules. The query syntax is as follows:

    SHOW READWRITE_SPLITTING RULES [FROM schemaName]

The DistSQL statements used to create read/write splitting table rules are shown here:

CREATE READWRITE_SPLITTING RULE write_read_ds (
WRITE_RESOURCE=write_ds,
READ_RESOURCES(read_ds_0, read_ds_1),
TYPE(NAME=random)
);
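As noted above, modifying the rule only swaps the CREATE keyword for ALTER; for example, to switch the rule to another load balancer (ROUND_ROBIN here is an illustrative choice):

```sql
ALTER READWRITE_SPLITTING RULE write_read_ds (
WRITE_RESOURCE=write_ds,
READ_RESOURCES(read_ds_0, read_ds_1),
TYPE(NAME=ROUND_ROBIN)
);
```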

You will find that modifying a rule is similar to creating one, and showing and dropping rules are straightforward. Now, let's review the configuration options when using YAML.

YAML

Now, we will look at the configuration for the rules and data sources of ShardingSphere's read/write splitting feature:

  • Rules configuration: You can configure various ShardingSphere rules, such as read/write splitting, data sharding, encryption, and decryption:
    • dataSources: The data source configuration
    • loadBalancers: The secondary database load balancer configuration
  • Read/write splitting data source configuration: The primary-secondary data source information is used to configure read/write splitting to help ShardingSphere identify the primary-secondary data source:
    • writeDataSourceName: The primary database name
    • readDataSourceNames: The secondary database names
    • loadBalancerName: The name of the load balancer to use

In the following code, you can find an example to configure your data source for the read/write splitting feature:

rules:
- !READWRITE_SPLITTING
 dataSources:
   read_write_ds:
     writeDataSourceName: write_ds
     readDataSourceNames:
       - read_ds_0
       - read_ds_1
     loadBalancerName: random
 loadBalancers:
   random:
     type: RANDOM

This wraps up the configuration items for ShardingSphere's features. In the next section, we will take you through the encryption feature's configuration.

Configuration – encryption

Data encryption can transform data based on encryption rules and encryption algorithms to protect private user data, while data decryption reverses the data encryption process.

In this section, we will present DistSQL statements and examples concerning data encryption alongside the YAML configurations and examples.

DistSQL

Now, let's look at the DistSQL statements:

  • Create Encrypt Rules:

    CREATE ENCRYPT RULE tableName(

    COLUMNS(

    (NAME=columnName, PLAIN=plainColumnName, CIPHER=cipherColumnName,

    TYPE(NAME=encryptAlgorithmType, PROPERTIES('key'='value')))

    ), QUERY_WITH_CIPHER_COLUMN=true);

To modify the syntax of the encrypt rule, replace CREATE with ALTER and keep the other parts unchanged. Here's an example of creating encrypt rules:

CREATE ENCRYPT RULE t_encrypt (

COLUMNS(

(NAME=user_name,PLAIN=user_name_plain,CIPHER=user_name_cipher,TYPE(NAME=AES,PROPERTIES('aes-key-value'='123456abc'))),

(NAME=password,CIPHER=password_cipher,TYPE(NAME=MD5))

),QUERY_WITH_CIPHER_COLUMN=true);

  • Drop Encrypt Rules: If you'd like to drop encrypt rules, you can do so as follows:

    DROP ENCRYPT RULE ruleName, [ruleName]*

  • Show Encrypt Rules: The query syntax is as follows:

    SHOW ENCRYPT RULES [FROM schemaName]

Modifying rules is similar to creating them, and showing and dropping rules are straightforward.

In actual application scenarios, you can even configure your own encryption and decryption algorithms as needed.

YAML configuration items

In Apache ShardingSphere-Proxy, the YAML configuration is also used to manage data encryption.

The following list provides the YAML configuration items for the encryption feature:

  • !ENCRYPT: The encryption and decryption rule identification. This is used to identify the rule as an encryption and decryption rule:
    • tables: The encryption table configuration.
    • encryptors: The encryption algorithm configuration.
    • queryWithCipherColumn: Whether to query with the cipher column. If disabled, queries use the plaintext column instead (provided plaintext columns exist).
  • tables: The related configuration of encryption and decryption tables. This is the table name and corresponding column information for configuring encryption and decryption:
    • tableName: The encryption table name
    • columns: The encryption column configuration
  • columns: The configuration related to the encryption and decryption columns. This is used to configure the logical names of the encryption and decryption columns and the names of the corresponding encrypted columns and auxiliary query columns, to help ShardingSphere encrypt and decrypt data correctly:
    • columnName: The encryption column name
    • cipherColumn: The cipher column name
    • assistedQueryColumn: The query assistant column name
    • plainColumn: The plaintext column name
    • encryptorName: The encryptor algorithm name
  • encryptors: The configuration of encryption and decryption algorithms. Configure the relevant algorithms that can be used for encryption and decryption:
    • encryptAlgorithmName: The encryption algorithm name
    • type: The encryption algorithm type
    • props: The encryption algorithm properties, such as the key used by the algorithm

Here is an example you can refer to for your encryption:

rules:
- !ENCRYPT
  tables:
    t_encrypt:
      columns:
        name:
          plainColumn: name_plain
          cipherColumn: name
          encryptorName: name-encryptor
  encryptors:
    # define the encryptor referenced above; AES is an illustrative choice
    name-encryptor:
      type: AES
      props:
        aes-key-value: 123456abc

This is an example that you could base your potential algorithm on, but bear in mind that you can also configure other algorithms.

This marks the end of your required encryption feature configuration. Now we are going to move on to the shadow database feature configuration items.

Configuration – shadow database

Shadow databases can route SQL executions according to configured shadow rules and shadow algorithms; therefore, test data is isolated from production data. This feature can solve data corruption problems in stress tests.

In this section, we will introduce you to DistSQL statements and the examples concerning shadow databases alongside related YAML configurations and examples.

DistSQL

DistSQL also provides support for the shadow database feature. This means that you can create, modify, delete, and query shadow rules using DistSQL:

  • Create Shadow Rule: The create syntax is as follows:

    CREATE SHADOW RULE shadow_rule(

    SOURCE=resourceName,

    SHADOW=resourceName,

    tableName((algorithmName,

    TYPE(NAME=shadowAlgorithmType, PROPERTIES('key'='value')))

    ));

In addition to creating shadow rules, DistSQL also provides a syntax for creating shadow algorithms and setting default shadow algorithms.  

To modify the syntax of the shadow rule, replace CREATE with ALTER and keep the other parts unchanged.

  • Drop Shadow Rule: This syntax supports removing the shadow algorithm. The delete syntax is as follows:

    DROP SHADOW RULE ruleName, [ruleName]*

  • Show Shadow Rule: This syntax also supports the query shadow algorithm. The query syntax is as follows:

    SHOW SHADOW RULES [FROM schemaName]

Now, let's look at an example to create shadow database rules:

CREATE SHADOW RULE shadow_rule(
SOURCE=ds,SHADOW=ds_shadow,
t_order((simple_hint_algorithm, TYPE(NAME=SIMPLE_HINT, PROPERTIES("foo"="bar")))));

Modifying rules is similar to creating them, and showing and dropping rules are straightforward.

YAML

You can also use YAML configurations to configure the shadow database feature in Apache ShardingSphere-Proxy. The following list provides you with the configuration items for the shadow database feature:

  • !SHADOW: Here are the basic configuration items and parameter descriptions of the shadow rules. Each parameter is described in detail as follows:
    • dataSources: This configures mappings from production data sources to shadow data sources.
    • tables: The shadow table name and configuration.
    • defaultShadowAlgorithmName: The default shadow algorithm name.
    • shadowAlgorithms: The shadow algorithm name and configuration.
  • dataSources: Here is a list of descriptions of the data source mapping relationship configuration parameters in the shadow rule:
    • shadowDataSource: The production data source and shadow data source mapping configuration names
    • sourceDataSourceName: The production data source name
    • shadowDataSourceName: The shadow data source name
  • defaultShadowAlgorithmName: Here is a description of the shadow algorithm configuration that takes effect, by default, in the shadow rule:

    Note

    This is optional; you can choose the configuration according to your needs.

    • defaultShadowAlgorithmName: The default algorithm name (optional)
  • tables:
    • tableName: The shadow table name
    • dataSourceNames: The shadow-table-related data source mapping configuration list
    • shadowAlgorithmNames: The shadow algorithms related to the shadow tables
  • shadowAlgorithms:
    • shadowAlgorithmName: The shadow algorithm name
    • type: The shadow algorithm type
    • props: The shadow algorithm configuration

Here is an example of a shadow data source mapping configuration that you can refer to:

rules:
- !SHADOW
  dataSources:
    shadowDataSource:
      # Configure sourceDataSourceName and shadowDataSourceName
  tables:
    t_order:
      # Configure dataSourceNames and shadowAlgorithmNames
  shadowAlgorithms:
    simple_hint_algorithm:
      # Configure type and props
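Filling in those placeholders to mirror the DistSQL example shown earlier gives a complete (illustrative) shadow rule:

```yaml
rules:
- !SHADOW
  dataSources:
    shadowDataSource:
      sourceDataSourceName: ds
      shadowDataSourceName: ds_shadow
  tables:
    t_order:
      dataSourceNames:
        - shadowDataSource
      shadowAlgorithmNames:
        - simple_hint_algorithm
  shadowAlgorithms:
    simple_hint_algorithm:
      type: SIMPLE_HINT
      props:
        foo: bar
```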

The previous example is fairly standard, and you will be able to adapt it to your own environments quite often. In the next section, we will guide you through ShardingSphere's various mode configuration items.

Configuration – mode

The mode determines how ShardingSphere runs: the memory and standalone modes are easy for developers and testers to use, while the cluster mode is meant for production environment deployments.

ShardingSphere provides three operation modes, namely memory mode, standalone mode, and cluster mode. The mode doesn't support modification through DistSQL, so here we will only present the YAML configuration method.

In the following sections, you will find that we have broken down the ShardingSphere mode configuration items into mode, repository, and props sections:

  • mode: With the operation mode configuration, you can configure memory, a single machine, and cluster mode. When using standalone mode or cluster mode, you can choose whether to override the remote configuration with the local configuration:
    • type: The operation mode: memory, standalone, or cluster
    • repository: The operation mode configuration
    • overwrite: Whether to overwrite remote configuration: true or false
  • repository: When using the standalone mode, you can specify the persistent configuration based on the file. When using cluster mode, you can use ZooKeeper or etcd for persistent configuration information:
    • type: The configuration center: ZooKeeper or etcd
    • props: The persistent configuration
  • props: In cluster running mode, you need to configure the namespace and registry information:
    • namespace: The configuration naming space
    • server-lists: The configuration center address

Here is an example of a mode configuration that you can refer to:

mode:
 type: Cluster
 repository:
   type: ZooKeeper
   props:
     namespace: governance_ds
     server-lists: localhost:2181
     retryIntervalMilliseconds: 500
     timeToLiveSeconds: 60
     maxRetries: 3
     operationTimeoutMilliseconds: 500
 overwrite: true

This completes our section on how to configure the various modes of Apache ShardingSphere. The next section will take you through the configuration items for the scaling feature.

Configuration – scaling

Elastic scale-out is a core ShardingSphere capability, and its relevant configurations are placed in the sharding rule YAML configuration. Although scale-out is a long-running process, it can be easily operated through DistSQL.

Scale-out has two modes, namely an automated process and a manual process. The completionDetector configuration should be active in automated processes, while the other configurations can be turned on according to actual needs; the triggered elastic scale-out job then runs until it is done. In the manual process, each stage of elastic scale-out can be controlled.

This section will discuss two things:

  • The elastic scale-out job DistSQL syntax and examples
  • The elastic scale-out YAML configuration

DistSQL for job management

Users can control the whole process of scaling data migration through DistSQL, including starting and stopping scaling jobs, viewing progress, disabling reads, checking scaling, switching configurations, and more.

Currently, elastic scale-out can only be triggered by Sharding DistSQL.

The scaling syntax is different from the syntax of other features; you will find that it has more operation modes. Nevertheless, it is good for you to know the common syntax, which is as follows:

SHOW SCALING LIST ;
SHOW SCALING STATUS jobId ;
START SCALING jobId ;
STOP SCALING jobId ;
DROP SCALING jobId ;
RESET SCALING jobId ;

In addition to the previous syntax, the verification syntax is also similar:

CHECK SCALING jobId [BY TYPE(NAME=algorithmType, PROPERTIES('key'='value'))] ;

You should note that scaling provides an advanced syntax, as follows:

SHOW SCALING CHECK ALGORITHMS ;
STOP SCALING SOURCE WRITING jobId ;
CHECKOUT SCALING jobId ;

The scaling process can become more complicated at times, so the provided syntax is more diverse.

Now for the examples! The following examples will prove to be handy references when it comes to scaling. Please note that because there are many scaling syntaxes, we will not show them here one by one. Only the most commonly used syntaxes are used as an example:

  • Stop Scaling:

    STOP SCALING 1449610040776297;

  • Start Scaling:

    START SCALING 1449610040776297;

  • Stop Scaling Source Writing:

    STOP SCALING SOURCE WRITING 1449610040776297;

In the preceding example, 1449610040776297 is the operation number, which is commonly referred to as jobId. The last result returned in the example is Query OK.

YAML – configuration items

The elastic scale-out configuration is in server.yaml. The elastic scale-out job will be triggered according to the configuration when sharding needs to be adjusted.

Now, let's look at the scaling configuration items, followed by an example:

  • blockQueueSize: This is the queue size of the data transmission channel.
  • workerThread: This is the worker thread pool size, that is, the number of migration task threads allowed to run concurrently.
  • clusterAutoSwitchAlgorithm: When configured, the system will detect when the scaling job has finished and switch the cluster configuration automatically.
  • dataConsistencyCheckAlgorithm: When configured, the system will use this defined algorithm to do a data consistency check when triggered.

Here is an example of a scaling configuration that you can refer to:

scaling:
  blockQueueSize: 10000
  workerThread: 40
  clusterAutoSwitchAlgorithm:
    type: IDLE
    props:
      incremental-task-idle-minute-threshold: 30
  dataConsistencyCheckAlgorithm:
    type: DEFAULT

Note

Scaling is not supported for tables without primary keys or for tables with compound primary keys.

When scaling at the current storage node, a new database cluster needs to be prepared as the target end.

Configuration – mixed configurations

In this section, we will discuss how to realize the rule configuration of data sharding based on read/write splitting. Please note that the data source used in this process should be the one after aggregating read/write splitting.

DistSQL

The SQL syntax is consistent with each feature when used singularly. For more detailed syntax, please refer to the following sections:

  • Configuration – sharding
  • Configuration – read/write splitting
  • Configuration – mode

Now for the example! Perform the following steps:

  1. First, create the read/write splitting rules:

    CREATE READWRITE_SPLITTING RULE ds_0 (

       WRITE_RESOURCE=write_ds_0,

       READ_RESOURCES(write_ds_0_read_ds_0, write_ds_0_read_ds_1),TYPE(NAME=ROUND_ROBIN)

    ), ds_1 (

       WRITE_RESOURCE=write_ds_1,

       READ_RESOURCES(write_ds_1_read_ds_0, write_ds_1_read_ds_1),TYPE(NAME=ROUND_ROBIN)

    );

  2. After the read/write splitting rule has been created successfully, use the read/write splitting data source as the sharded data source of the sharding table rule. Create a sharding table rule for t_order:

    CREATE SHARDING TABLE RULE t_order (
       RESOURCES(ds_0, ds_1),
       SHARDING_COLUMN=order_id, TYPE(NAME=hash_mod, PROPERTIES("sharding-count"=4)),
       GENERATED_KEY(COLUMN=order_id, TYPE(NAME=SNOWFLAKE, PROPERTIES("worker-id"=123)))
    );

Now, the mixed read/write splitting and sharding table rule creation has been completed. The corresponding rules can be viewed through the SHOW syntax.

You can also use the DDL statement to create the t_order table through the proxy, which will actually create four shard nodes on write_ds_0 and write_ds_1, that is, four tables.
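For instance, a plain CREATE TABLE issued against the proxy (the column definitions here are hypothetical) is automatically routed to every shard node:

```sql
CREATE TABLE t_order (
    order_id BIGINT NOT NULL,
    user_id INT NOT NULL,
    status VARCHAR(50),
    PRIMARY KEY (order_id)
);
```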

YAML

The configuration items are consistent with each feature when used as a standalone feature. For detailed configuration items, please refer to the following sections:

  • Configuration – sharding
  • Configuration – read/write splitting
  • Configuration – mode

For example, in addition to using DistSQL to create rules, configuration files can also be used to create read/write splitting and sharding rules. The configuration files are as follows:

  1. Define the logic schema and data sources:

    schemaName: sharding_db
    dataSources:
      # define data sources write_ds_0, write_ds_1, write_ds_0_read_0,
      # write_ds_0_read_1, write_ds_1_read_0, and write_ds_1_read_1 here

  2. Then, define the read/write splitting rule and the sharding rule in the same file:

    rules:
    - !READWRITE_SPLITTING
      # define readwrite_splitting rule here
    - !SHARDING
      # define sharding rule here
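Fleshed out, the rule skeleton might look like the following minimal sketch. The property names follow the ShardingSphere 5.x YAML API as we understand it; the JDBC URL, credentials, pool size, and algorithm names are illustrative assumptions, so consult the documentation for your exact version:

```yaml
schemaName: sharding_db

dataSources:
  write_ds_0:
    url: jdbc:mysql://127.0.0.1:3306/write_ds_0   # illustrative URL
    username: root                                # illustrative credentials
    password: root
    maxPoolSize: 50
  # ... write_ds_1 and the four read data sources are defined the same way

rules:
- !READWRITE_SPLITTING
  dataSources:
    ds_0:
      writeDataSourceName: write_ds_0
      readDataSourceNames:
        - write_ds_0_read_ds_0
        - write_ds_0_read_ds_1
      loadBalancerName: round_robin
    ds_1:
      writeDataSourceName: write_ds_1
      readDataSourceNames:
        - write_ds_1_read_ds_0
        - write_ds_1_read_ds_1
      loadBalancerName: round_robin
  loadBalancers:
    round_robin:
      type: ROUND_ROBIN
- !SHARDING
  autoTables:
    t_order:
      actualDataSources: ds_0,ds_1            # the read/write splitting data sources
      shardingStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_hash_mod
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
  shardingAlgorithms:
    t_order_hash_mod:
      type: HASH_MOD
      props:
        sharding-count: 4
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
      props:
        worker-id: 123
```

Note how the sharding rule references ds_0 and ds_1, the logical data sources produced by the read/write splitting rule, mirroring the DistSQL example above.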

Mixed – encryption + read/write splitting + cluster

This section discusses how to configure data encryption and read/write splitting together in one application scenario.

DistSQL

The DistSQL syntax remains consistent with what you have learned so far; it is the same as configuring a single feature, even though we are combining multiple features here. For more detailed syntax, please refer to the following sections:

  • Configuration – encryption
  • Configuration – read/write splitting
  • Configuration – mode

Let's look at an example. This section will use DistSQL to configure hybrid rules for data encryption and read/write splitting and present a complete read/write demonstration. Perform the following steps:

  1. Create the read/write splitting rules:

    CREATE READWRITE_SPLITTING RULE wr_group (
        WRITE_RESOURCE=write_ds,
        READ_RESOURCES(read_ds_0,read_ds_1),
        TYPE(NAME=random)
    );

  2. Create the encryption rules:

    -- Encrypt the password column and store the encrypted value
    -- in the password_cipher column
    CREATE ENCRYPT RULE t_encrypt (
        COLUMNS(
            (NAME=password,CIPHER=password_cipher,TYPE(NAME=AES,PROPERTIES('aes-key-value'='123456abc')))
        )
    );

Now, the rules for read/write splitting and encryption have been created.

  3. Create an encrypted table, t_encrypt, for verification.

Please note that the DDL statement only designates the original column and the extra plaintext column; the encrypted column is processed automatically by ShardingSphere.

The table is created with a standard MySQL DDL statement, which is not shown here. Note, however, that the created table needs to contain the password field. The fields of the t_encrypt table are id, user_name, and password.
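As a sketch of such a statement (column types and lengths are illustrative assumptions), the DDL might look like this:

```sql
-- Only the original password column is declared; ShardingSphere rewrites
-- it to the password_cipher column in the physical table automatically.
CREATE TABLE t_encrypt (
    id        INT NOT NULL AUTO_INCREMENT,
    user_name VARCHAR(50),
    password  VARCHAR(255),
    PRIMARY KEY (id)
);
```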

  4. Insert data into t_encrypt:

    -- SQL statement input

    INSERT INTO t_encrypt (user_name, password) VALUES ('user_name', 'plain_password');

    -- Actual SQL statement executed by ShardingSphere

    INSERT INTO t_encrypt (user_name, password_cipher) VALUES ('user_name', 'OYd7QrmOWUiJKBj0oDkNIw==');

  5. Query the data in t_encrypt: execute SELECT * FROM t_encrypt from both the proxy and the real database to verify that the data has been encrypted.

When the data is queried directly from the physical data source, the value of password is OYd7QrmOWUiJKBj0oDkNIw==, which shows that the password has been stored encrypted.

When the data is queried through ShardingSphere, the value of password is plain_password; the password has been automatically decrypted.

YAML configuration

Configuration items are consistent with those you have reviewed under each specific feature in this chapter. Configuring multiple features concurrently, as we are doing here, does not require any special or new configuration beyond what you learned previously. For more detailed configuration items, please refer to the following sections:

  • Configuration – encryption
  • Configuration – read/write splitting
  • Configuration – mode

In ShardingSphere-Proxy, the configuration format of the YAML file is consistent with that of ShardingSphere-JDBC, except that schemaName is added to every config-xxx.yaml file to designate the name of the logic schema. Perform the following steps:

  1. Define the logic schema and data sources:

    schemaName: mixture_db

    dataSources:

      # define data sources write_ds, read_ds_0 and read_ds_1

  2. Then, create the encrypt rule and the READWRITE_SPLITTING rule in the same file:

    rules:

    - !ENCRYPT

    # define encrypt rule here

    - !READWRITE_SPLITTING

    # define readwrite_splitting rule here

Compared to DistSQL, when using a YAML file for configuration, you do not need to create the schema manually. Other operations are the same as with DistSQL.
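Putting the two steps together, a minimal configuration sketch for this scenario could read as follows. The property names follow the ShardingSphere 5.x YAML API as we understand it; the URL, credentials, pool size, and load balancer name are illustrative assumptions:

```yaml
schemaName: mixture_db

dataSources:
  write_ds:
    url: jdbc:mysql://127.0.0.1:3306/write_ds   # illustrative URL
    username: root                              # illustrative credentials
    password: root
    maxPoolSize: 50
  # ... read_ds_0 and read_ds_1 are defined the same way

rules:
- !ENCRYPT
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc
  tables:
    t_encrypt:
      columns:
        password:
          cipherColumn: password_cipher
          encryptorName: aes_encryptor
- !READWRITE_SPLITTING
  dataSources:
    wr_group:
      writeDataSourceName: write_ds
      readDataSourceNames:
        - read_ds_0
        - read_ds_1
      loadBalancerName: random_lb
  loadBalancers:
    random_lb:
      type: RANDOM
```

This mirrors the DistSQL rules created earlier: the wr_group read/write splitting group and the AES-encrypted password column of t_encrypt.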

Configuration – server

ShardingSphere can configure properties related to authority control, the transaction type, and the system configuration, which are all in the server.yaml configuration file.

Authority

ShardingSphere provides control over access to the proxy. Control over the client end's access to the proxy can be achieved through the following YAML configurations:

Table 6.1


Here's an example of the ALL_PRIVILEGES_PERMITTED type:

rules:
 - !AUTHORITY
   users:
     - root@%:root
     - sharding@:sharding
   provider: 
     type: ALL_PRIVILEGES_PERMITTED

Here's the code for the SCHEMA_PRIVILEGES_PERMITTED type:

rules:
 - !AUTHORITY    
   users:
     - root@:root      
     - my_user@:pwd    
   provider:
     type: SCHEMA_PRIVILEGES_PERMITTED      
     props:
        user-schema-mappings: root@=sharding_db, root@=test_db, [email protected]=sharding_db
# the root user can access sharding_db when connecting from any host
# the root user can access test_db when connecting from any host
# my_user can only access sharding_db when connecting from 127.0.0.1

Transaction

ShardingSphere provides transaction capability. The transaction type, LOCAL, XA, or BASE, can be chosen through the YAML configuration:

Table 6.2


For example, take a look at the following:

rules:
 - !TRANSACTION
   defaultType: XA
   providerType: Atomikos # Narayana is also supported

Props configuration

ShardingSphere provides a set of YAML configuration items for the proxy's system-level properties:

  • Feature switch:
Table 6.3


  • Tuning parameters:
Table 6.4


Here's an example:

props:
   max-connections-size-per-query: 1
   kernel-executor-size: 16 
   proxy-frontend-flush-threshold: 128
   proxy-opentracing-enabled: false
   proxy-hint-enabled: false
   sql-show: true
   check-table-metadata-enabled: false
   show-process-list-enabled: false
   proxy-backend-query-fetch-size: -1
   check-duplicate-table-enabled: false
   sql-comment-parse-enabled: false
   proxy-frontend-executor-size: 0 
   proxy-backend-executor-suitable: OLAP
   proxy-frontend-max-connections: 0 
   sql-federation-enabled: false

That concludes the server configuration items, giving you a complete overview of how to configure authority, transactions, and system-level properties in server.yaml.

Summary

Now, you are ready to take the necessary steps to get your version of Apache ShardingSphere up and running. Thanks to this chapter, you now understand how to download and install ShardingSphere, which is an important milestone toward achieving your distributed database goals.

You can consider yourself adept at using Apache ShardingSphere. In the next chapter, you will take your skill set to an advanced level by mastering ShardingSphere-JDBC, too.

Mastering ShardingSphere-JDBC will set you up to be able to fully leverage the Database Plus concept and the plugin-oriented platform that sets this ecosystem apart.
