Chapter 11: Performance Benchmarking and Migration

In Chapter 10, Troubleshooting Issues, we learned how to troubleshoot issues with the help of metrics and logs. In this chapter, we will go over some important parameters that we should consider during performance benchmarking, how to do performance benchmarking, and important things to consider during migration. Finally, we will look at some specific examples of migrating from some traditional databases, such as PostgreSQL.

The following topics will be covered in this chapter:

  • Performance – Things to consider
  • Performance benchmarking for CockroachDB
  • Migration – Things to consider
  • Migrating from traditional databases

Technical requirements

There are some commands discussed in this chapter that require you to have CockroachDB installed. If you still haven't done so, please refer to the Technical requirements section of Chapter 2, How Does CockroachDB Work Internally?.

Performance – Things to consider

In this section, we will discuss general factors related to the infrastructure that affect performance, some standard benchmarking suites, and running a benchmark for your specific needs in CockroachDB.

Infrastructure

It is important to know the ideal configuration for your infrastructure as that significantly influences the performance of CockroachDB that runs on top of it. Here are some key factors to consider in this regard:

  • Central processing unit (CPU): It is almost a cliché to say that the CPU plays an important role in performance. We have to ensure that each node gets at least four virtual CPUs (vCPUs). You should constantly monitor CPU usage across all nodes to quickly identify hot nodes that receive more traffic than the others, since a node whose CPU reaches maximum capacity can slow down query response times.
  • Memory: The Pebble key-value store, table metadata, and CockroachDB's internal data structures all use memory. It is recommended that each node has at least 4 gibibytes (GiB) of random-access memory (RAM). You should also keep track of the memory usage of all nodes. If a node or a group of nodes starts to approach full memory capacity, you can either increase the memory of the affected node(s) or scale out the cluster by adding more nodes. Resharding the ranges ensures that each node handles fewer ranges, so its memory usage can go down. Ideally, 20% of the memory should always be free, so treat 80% memory usage as full capacity and start readjusting the cluster if any node reaches that threshold (a quick spot-check sketch follows this list).
  • Operating system: Any Linux-based operating system would be ideal. Container-optimized operating systems, Ubuntu, Red Hat Enterprise Linux (RHEL), and CentOS are among the popular choices for an operating system for running CockroachDB.
  • Storage input/output (I/O): Many cloud providers offer storage options that are highly available. You can also choose specific configurations to ensure the storage layer withstands zonal and regional failures. Solid-state drives (SSDs) are preferred for faster I/O. As per the recommendation from Cockroach Labs, you should not exceed 2.5 tebibytes (TiB) of storage per node and 150 gibibytes (GiB) per vCPU for optimal performance.
  • Network: Since more than one node is involved in a distributed Structured Query Language (SQL) query and SQL clients can also consume data from various geographies, network performance is critical in determining the overall performance. If a cluster is spread across multiple regions and supports regional failures, then the overall query-response time can be slower, as each commit would have to be communicated between at least two regions. So, pay attention to network hops between the nodes, client, and cluster.
  • Cloud provider: We now have several cloud providers that offer compute, memory, storage, and network resources on the cloud. Also, you get several options to choose from for each of these resources. There is also an increasing use of hybrid cloud and multi-cloud architectures, which further complicates the deployments. Cloud providers also refresh their hardware; so, when you are benchmarking across multiple cloud providers, you might see inconsistencies in performance if you repeat the same benchmarks every 6 months.
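
For example, a quick way to spot-check memory pressure on a node is to scrape its Prometheus endpoint, which every CockroachDB node exposes under /_status/vars on the HTTP port (8080 by default). The following is a minimal sketch; the localhost address, the insecure setup, and the sys_rss (resident memory) metric name are assumptions that you should verify for your deployment:

# Spot-check a node's resident memory via its Prometheus metrics endpoint
$ curl -s http://localhost:8080/_status/vars | grep '^sys_rss'

If the reported value approaches 80% of the node's RAM, it is time to add memory or scale out the cluster, as discussed previously.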

Next, we will learn about some popular benchmark suites.

Popular benchmark suites

Several popular benchmark suites are used to benchmark databases and data warehouses. Here is a list of some commonly used ones:

  • TPC-C (http://www.tpc.org/tpcc/): TPC-C is short for Transaction Processing Performance Council Benchmark C, a benchmark used to compare the performance of online transaction processing (OLTP) systems. The TPC-C benchmark defines a set of functional requirements that can be run on a given transaction processing system. This benchmark requires reporting performance, measured as transaction throughput, and price/performance, which is the overall infrastructure cost divided by that throughput.
  • TPC-H (http://www.tpc.org/tpch/): TPC-H is a decision support benchmark. It involves running ad hoc queries and concurrent data modification and measuring performance.
  • YCSB (https://github.com/brianfrankcooper/YCSB): Short for Yahoo! Cloud Serving Benchmark, YCSB is an open source benchmark suite used for evaluating retrieval and maintenance capabilities of databases. It is often used to measure the performance of NoSQL databases and workloads involving simple database operations.
  • Sysbench (https://github.com/akopytov/sysbench): Sysbench is a benchmark suite used for measuring system performance, especially when you are running a database.

It takes a lot of effort and resources to run these benchmarks. Fortunately, Cockroach Labs conducts extensive benchmarking across multiple cloud providers every year, covering CPU, network, and storage performance as well as TPC-C results, and publishes the results in its annual cloud report. The benchmarks for 2020 and 2021 can be accessed at https://www.cockroachlabs.com/guides/2020-cloud-report/ and https://www.cockroachlabs.com/guides/2021-cloud-report/. So, unless you want to measure the performance of some specific use cases and workload patterns, these cloud reports are a great starting point for deciding which cloud provider to pick and whether CockroachDB's performance is acceptable.
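
Note that the cockroach binary ships with built-in load generators for a couple of these suites, so you can run them without any external tooling. As a minimal sketch (the connection string and duration are illustrative), YCSB can be driven directly through the workload command:

# Load the YCSB schema and data, then run the workload for one minute
$ cockroach workload init ycsb 'postgresql://root@localhost:26257?sslmode=disable'
$ cockroach workload run ycsb --duration=1m 'postgresql://root@localhost:26257?sslmode=disable'

We will use the same workload command to run TPC-C in the next section.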

Benchmarking your specific use cases

In any benchmark, you should also pay attention to whether you want to conduct a synthetic benchmark or a real-world benchmark. Synthetic benchmarking involves simulating the traffic with dummy data. Real-world benchmarking is much closer to what you already have in production, benchmarking the real workloads with the real data. Real-world benchmarks can give a much better indication of how CockroachDB is going to perform when put in production. Although there are standard suites available for performance benchmarks, it's always handy to do your own benchmarking for specific needs. Here are some categories that can fall under this:

  • Large data: When the data is large, you would need a greater number of nodes to manage the data—so, the performance of a 3-node cluster might vary from the performance of a 17-node cluster. So, based on the specific volume of data that you have in your mind, you can benchmark CockroachDB against it. You should also ensure that the read-and-write queries are similar to what you would expect in production, in order to keep it close to the real world.
  • High traffic: Your workload might have to be more skewed toward high read or high write traffic, or both. So, based on what is relevant to your query patterns, it's important to benchmark for high traffic workloads tailored to your needs.
  • Scaling with an increase in data volume: If you are using a database in a year-old start-up with very few customers, the volume of data will obviously be much less. But as the company grows and there are more use cases and customers start consuming more data, the data volume can explode. So, it is good to prepare your infrastructure for 5-10X growth, both in terms of data volume and traffic, and see how many resources you would need to scale out. As part of this exercise, you will also know where you stand with respect to automation. By automating all the repeatable tasks, you can be better prepared to handle failures and scale out the infrastructure.
  • Failover: Some businesses are more sensitive to downtime than others. For example, if your database supports an application used by doctors and health specialists to view patient records in real time, or an application used for online booking, it may be expected to provide six nines of availability. This means the database can be unavailable for only about 31.56 seconds in an entire year. In such cases, it's important to simulate failures to see how quickly you can recover from node, zonal, or regional failures, or any other kind of failure that can disrupt your database service.
  • Multi-tenancy: There are a couple of different ways to implement multi-tenancy with CockroachDB. For example, you can have a dedicated small cluster for each customer. Although this is great from a security perspective, since workloads from other customers are completely isolated, it is usually not cost-effective. So, it would be ideal to just introduce a tenant identifier (tenant ID) and shard the data based on that. With this approach, however, it can become tricky to keep the service-level agreement (SLA) the same for all customers, as it is more prone to creating hotspots because tenants will not be equally active in consuming the data. In such cases, you can also benchmark resharding the ranges, or try resharding the data into multiple clusters and see how long that takes.
  • Migration: Before you migrate the real data in production, you can benchmark how much time the entire process takes, from exporting the schema and data from the source database to importing them into the target database. You might have to do this for various data volumes to get an idea of how long a complete migration might take in production (a timing sketch follows this list).
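
As a rough sketch of how you can time a trial migration end to end, you can wrap each phase with the time utility. The database name, file paths, and the insecure flags below are assumptions for illustration, and the import command itself is covered in the last section of this chapter:

# Time each phase of a trial migration (names and paths are illustrative)
$ time pg_dump my_database > my_database.sql
$ time gzip my_database.sql
$ time cockroach nodelocal upload my_database.sql.gz my_database.sql.gz --insecure
$ time cockroach sql --insecure -e "IMPORT PGDUMP 'nodelocal:///my_database.sql.gz' WITH ignore_unsupported_statements;"

Repeating this for different data volumes gives you a rough curve of how the total migration time grows with the size of the source database.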

In the next section, using an example, we will learn how to conduct performance benchmarking for CockroachDB.

Performance benchmarking for CockroachDB

In this section, we will go over the process of performing TPC-C benchmarking on CockroachDB. Here are the steps for running a TPC-C workload and getting the statistics for the run:

  1. Import the TPC-C dataset into your local CockroachDB cluster, as follows:

    $ cockroach workload fixtures import tpcc --warehouses=5 'postgresql://root@localhost:26257?sslmode=disable'

    I220127 06:46:52.189260 1 ccl/workloadccl/fixture.go:342  [-] 1  starting import of 9 tables

  2. After loading the tables, you should see the following statistics at the end:

    I220127 06:47:31.384005 1 workload/tpcc/tpcc.go:485  [-] 13  check 3.3.2.1 took 152.063ms

    I220127 06:47:31.664548 1 workload/tpcc/tpcc.go:485  [-] 14  check 3.3.2.2 took 280.468ms

    I220127 06:47:31.719888 1 workload/tpcc/tpcc.go:485  [-] 15  check 3.3.2.3 took 55.281ms

    I220127 06:47:32.796841 1 workload/tpcc/tpcc.go:485  [-] 16  check 3.3.2.4 took 1.076892s

    I220127 06:47:33.218333 1 workload/tpcc/tpcc.go:485  [-] 17  check 3.3.2.5 took 421.432ms

    I220127 06:47:34.706096 1 workload/tpcc/tpcc.go:485  [-] 18  check 3.3.2.7 took 1.487671s

    I220127 06:47:35.142591 1 workload/tpcc/tpcc.go:485  [-] 19  check 3.3.2.8 took 436.435ms

    I220127 06:47:35.503616 1 workload/tpcc/tpcc.go:485  [-] 20  check 3.3.2.9 took 360.963ms

You can verify whether the TPC-C data has been loaded by browsing the tpcc database and its tables, as shown in the following code snippet:

root@localhost:26257/defaultdb> show databases;

  database_name | owner | primary_region | regions | survival_goal

----------------+-------+----------------+---------+----------------

  defaultdb     | root  | NULL           | {}      | NULL

  postgres      | root  | NULL           | {}      | NULL

  system        | node  | NULL           | {}      | NULL

  tpcc          | root  | NULL           | {}      | NULL

(4 rows)

Here are the tables in the tpcc database:

root@localhost:26257/defaultdb> use tpcc;

SET

Time: 1ms total (execution 1ms / network 0ms)

root@localhost:26257/tpcc> show tables;

  schema_name | table_name | type  | owner | estimated_row_count | locality

--------------+------------+-------+-------+---------------------+-----------

  public      | customer   | table | root  |              150000 | NULL

  public      | district   | table | root  |                  50 | NULL

  public      | history    | table | root  |              150000 | NULL

  public      | item       | table | root  |              100000 | NULL

  public      | new_order  | table | root  |               45000 | NULL

  public      | order      | table | root  |              150072 | NULL

  public      | order_line | table | root  |             1500459 | NULL

  public      | stock      | table | root  |              500000 | NULL

  public      | warehouse  | table | root  |                   5 | NULL

(9 rows)
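
As an additional sanity check, you can run a quick aggregate query against one of the imported tables; the count of the warehouse table should match the --warehouses=5 value used during the import:

root@localhost:26257/tpcc> SELECT count(*) FROM warehouse;

  count
---------
      5
(1 row)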

Once this is done, you can run a sample workload with the following command:

$ cockroach workload run tpcc --warehouses=5 --ramp=3m --duration=3m 'postgresql://root@localhost:26257?sslmode=disable'

Here is a description of some of the options used in the preceding command:

  • --warehouses is the number of warehouses that will be used for loading the data.
  • --ramp is the duration for which the load will be ramped up, which will warm the cluster.
  • --duration is the total duration to run the workload.

For both --ramp and --duration, time units can be specified in h, m, s, ms, us, and ns.

You will see statistics getting emitted every second on the console. Here is an example snapshot of statistics you might see:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)

  145.0s        0            0.0            0.1      0.0       0.0      0.0      0.0 delivery

  145.0s        0            0.0            0.4      0.0       0.0      0.0      0.0 newOrder

  145.0s        0            0.0            0.1      0.0       0.0      0.0      0.0 orderStatus

  145.0s        0            1.0            0.5    176.2     176.2    176.2    176.2 payment

  145.0s        0            0.0            0.0      0.0       0.0      0.0      0.0 stockLevel

  146.0s        0            0.0            0.1      0.0       0.0      0.0      0.0 delivery

Once the workloads are done running, you will see a performance metrics summary at the end. Here is a sample summary:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)

  180.0s       62.0  96.4%    179.5    176.2    243.3    260.0    352.3    469.8

This summary reports tpmC (the number of new-order transactions processed per minute), the efficiency (efc) of the run relative to the theoretical maximum, and latency statistics. Similarly, you can run several other types of workloads that are provided out of the box. Please visit https://www.cockroachlabs.com/docs/stable/cockroach-workload.html to learn more.
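
For instance, the built-in bank workload models a set of accounts and moves money between them, and it can be run in exactly the same way. This is a minimal sketch with an illustrative connection string and duration:

# Initialize and run the built-in bank workload against a local insecure cluster
$ cockroach workload init bank 'postgresql://root@localhost:26257?sslmode=disable'
$ cockroach workload run bank --duration=1m 'postgresql://root@localhost:26257?sslmode=disable'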

In the next section, we will go over key aspects to consider before and during database migration.

Migration – Things to consider

In this section, we will go over some important factors that affect your migration strategy. Migrating between databases is always a tedious, error-prone, and complicated endeavor. It is also possible to get into nightmarish situations if you are not thorough and careful. Also, if your business is sensitive to downtime during this migration, you should find a mechanism for doing continuous migration. It is important to make sure you do your own benchmarking for the current schema, workload, and queries on CockroachDB before you decide to migrate your production data and workloads.

Here are some key aspects you should pay attention to before deciding on migration:

  • Cost: You should be asking yourself whether the overall cost is going to go down or increase once you move to CockroachDB. Since the cost of maintaining a database can be significant, you should do a back-of-the-envelope calculation to see the cost difference.
  • Learning curve: Although CockroachDB supports most SQL constructs, it's not 100% compatible with traditional databases. This requires modifying some existing queries and also doing benchmarking for those queries. If there are many features missing in CockroachDB that you are using in the current database, then all your application teams have to be educated about how to rewrite those queries.
  • Security: When you are moving to CockroachDB, if the resources allocated for CockroachDB are within the same infrastructure as that of the previous database, then you will have fewer things to worry about in terms of security. For example, migrating to CockroachDB within your on-premises cluster is very different from migrating to CockroachDB-as-a-Service hosted by some third-party vendor. In the first case, all your security configurations can remain the same, whereas in the latter case, you will just get an endpoint to talk to and there is always a chance that the third-party infrastructure gets compromised and potentially exposes your data to bad players. So, it's always a good idea to run the migration plan through the information security (InfoSec) team and get their opinion.
  • Vendor lock-in: Vendor lock-in happens when you are forced into using a specific product or service, either because no other vendor provides the same service or because you are already deeply invested in one. Vendor lock-in is a tricky subject, and it can happen in several ways. Here are some situations that can lead to vendor lock-in:
    1. Completely relying on an as-a-service offering by a third-party vendor
    2. Extensively using all the enterprise-only features
    3. Not having in-house expertise to deal with fires or to develop new features
    4. Getting into long-term contracts with vendors because of an initial discount

I am sure there are plenty more points to discuss here, but the intention is to only highlight the important ones.

Once you have finalized that you are definitely going to migrate to CockroachDB, here are some key items you should plan for:

  • Capacity planning: The number of nodes, memory, and storage space can all vary from what's allocated to a traditional database. So, make sure you provision enough resources upfront in order to avoid any issues arising from a lack of resources.
  • Downtime: Although you will have tried the migration several times in a pre-production environment before proceeding to production, you can always encounter new issues if the scale and data of production and pre-production are different. So, just be prepared for all sorts of failures, keep the stakeholders informed, and line up the necessary human resources before you start the migration.
  • Migrating data in smaller chunks: Instead of dumping the entire database at once, you can migrate one table at a time. Each chunk finishes relatively quickly compared to importing the entire database, and you can also avoid timeouts (see the sketch after this list).
  • Maintaining data in source and target databases for some time: It would be advisable to maintain the data in both source and target databases for several months till you are comfortable with CockroachDB. Maintaining the data in the source database will allow you to switch back to the source database if things don't go as expected after the migration. But this definitely creates additional complexity, as you now have to maintain the data in two different databases and make sure the data is still consistent between them.
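
For example, if a single table's data has been exported to a CSV file, a per-table import can look like the following sketch, where the table definition, column list, and file location are assumptions for illustration:

CREATE TABLE customers (id INT PRIMARY KEY, name STRING, email STRING);

IMPORT INTO customers (id, name, email) CSV DATA ('nodelocal:///customers.csv');

Importing table by table this way also makes it easier to retry just the tables that fail, instead of redoing the whole migration.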

In the next section, we will learn how to migrate from traditional databases into CockroachDB.

Migrating from traditional databases

In this section, we will outline a few generic steps that are involved in migrating data between any two databases. After that, we will go over a specific example of migrating data from PostgreSQL into CockroachDB.

Migrating from other databases into CockroachDB usually involves the following generic steps:

  1. Export the schema from the source database.
  2. Export the data from the source database.
  3. Transform the data into a desirable format. In most cases, comma-separated values (CSV) should work just fine.
  4. Compress and transfer the schema and the data to the desired location where it can be imported into CockroachDB.
  5. Do the data type mapping in the schema from the source to the CockroachDB database.
  6. Import the schema, along with the data, into CockroachDB.
  7. Manually create users and privileges (a short sketch of this step follows the list).
  8. All the application teams also have to map their current queries from the source database's SQL dialect to CockroachDB's.
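
For step 7, a minimal sketch of recreating an application user and its privileges in CockroachDB might look like the following, where the user name, password, and database name are assumptions for illustration:

CREATE USER app_user WITH PASSWORD 'ChangeMe_123';

GRANT ALL ON DATABASE my_database TO app_user;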

Now, let's look at a specific example of how this migration looks when we migrate from PostgreSQL into CockroachDB.

Migrating from PostgreSQL to CockroachDB

Here are the steps to take to migrate from PostgreSQL:

  1. Dump the database: pg_dump is a utility used for taking backups of a PostgreSQL database. You can use the following command to dump a given database:

    pg_dump my_database > my_database.sql

  2. Map the required data types in the schema. For example, custom data types created in PostgreSQL using CREATE TYPE may not be supported by the import, as they are not standard data types. These should be manually mapped to CockroachDB data types in the previously dumped file.
  3. Compress and move the dump to a place where it can be imported into a CockroachDB cluster.
  4. Import the schema and data using the following command:

    IMPORT PGDUMP 'nodelocal:///my_database.sql.gz' WITH ignore_unsupported_statements;

  5. Once this is done, you can list the databases and tables and run sample queries on the imported database to make sure it's imported correctly.
  6. The final step is to verify that all the data has been imported with nothing missing. For this, you can compare the tables and row counts of the imported tables with the source database. For critical tables, you can dump them from both the source and target databases and compare the byte sizes and contents (a quick sketch follows this list).
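
A minimal sketch of the row count comparison, with illustrative database, table, and connection settings, might look like this:

# Count rows in the source PostgreSQL database
$ psql -d my_database -c "SELECT count(*) FROM my_table;"

# Count rows in the target CockroachDB database
$ cockroach sql --insecure -d my_database -e "SELECT count(*) FROM my_table;"

Any mismatch between the two counts indicates that some rows were not imported and need to be investigated.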

Data migration is always tedious and can fail for various reasons. So, instead of making it a one-time activity, you can also explore the option of continuous migration, keeping the previous database running in tandem with CockroachDB until you are completely confident that CockroachDB is performing well for all your use cases and that all the data has been migrated to the CockroachDB cluster. For continuous migration, you can publish change data capture (CDC) events from the traditional database into Kafka and consume those events to make the corresponding updates in CockroachDB.
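
For example, if the consumer receives a change event for an account row, it can apply it idempotently with an UPSERT, so that replays of the same event do not cause errors. This is a minimal sketch, and the accounts table and its columns are assumptions for illustration:

UPSERT INTO accounts (id, balance) VALUES (1, 500.00);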

Summary

In this chapter, we discussed aspects that contribute to performance. We also looked at popular benchmark suites and went over the process of benchmarking a CockroachDB cluster. Next, we learned about important things we should consider before and during migration. Lastly, we familiarized ourselves with generic steps for migrating from some other database to CockroachDB and discussed a specific example of migrating from PostgreSQL to CockroachDB.

With that, we have come to the end of this book. All the code used in this book has been shared at https://github.com/PacktPublishing/Getting-Started-with-CockroachDB. Please make sure you are using the latest version of CockroachDB when trying out these examples. All the examples have been tried with v21.2.0. Please reach out to the author if you find any mistakes. Thank you for reading; I hope you enjoyed the book and are now ready to start exploring CockroachDB as a potential database for your use cases.
