MySQL allows us to analyze individual queries to see how long they take and exactly how they are executed on the database. By looking at this information and armed with an understanding of how MySQL tries to optimize your queries for you, you can sometimes improve performance. In this chapter we'll look at the following:
• Finding slow queries
• Benchmarking your queries
• Using the slow query log
• Using EXPLAIN
to see how queries are executed
• Understanding MySQL's built-in query optimization
• Optimization tips
To optimize any application, you must first find out which parts of it take up the most execution time. There are a few ways to do this:
• Observation: Frequently, the reason we notice that optimization is needed is that one particular query runs at the speed of continental drift.
• Benchmarking: Test your application to see which parts of it are slow.
• Slow query log: This log tracks slow queries, as you might expect from its name.
After a slow query has been identified, you can work out how it is being executed by MySQL by using the EXPLAIN
statement and can take steps to optimize the query.
It is not usually necessary to attempt to speed up all queries in an application. Generally, only a small portion of your code takes up most of the execution time. Unless your system is very, very busy, the programming time spent attempting to accelerate parts that do not take much time is wasted.
In any kind of programming optimization problem, benchmarking is useful. By benchmarking, we mean specifically timing how long your queries take. This is best done by running a query many times and seeing how long it takes on average. A single execution of the query will be subject to load issues, so it may give unpredictable results. You will also find that the second time you run a query, it should be quicker because the query is cached.
You can, of course, use external scripts or programs to run queries multiple times. For an example, download the source distribution of MySQL and look at the benchmarking code that is included in the sql-bench directory.
You can also test the speed of evaluation of any expression (including a query) in MySQL by using the built-in BENCHMARK()
function. For example:
select benchmark(1000000, 6*9);
This will produce something similar to the following results:
1 row in set (0.25 sec)
The function takes two parameters: the number of times to evaluate the expression (in this case, one million) and the expression we want to evaluate (in this case, six times nine).
We do not really want to see the result of the SELECT
query. The BENCHMARK()
function always returns zero. What we are interested here is the time the query took to execute. In this sample output, you can see that evaluating 6×9 one million times on my system took a quarter of a second.
You can also pass BENCHMARK()
a query, for example:
select benchmark(10000000,
'select employee.name, department.name
from employee, department
where employee.departmentID=department.departmentID'),
The slow query log can be used to track which of your queries is running slowly. You define what is meant by “slowly,” but it must be measured in whole seconds.
You can turn on slow query logging with the --log-slow-queries=
filename
option when starting the MySQL server or in your configuration file. If you also turn on the --log-long-format
option, all queries that run without using an index are also logged. This can help you to see where you should be targeting your optimization efforts.
You can define what is meant by a slow query by using the long_query_time
variable. You can set this in your configuration file or by using the SET
command. This variable is in seconds.
You can read the slow query log manually because it's just a text file. You may find it more useful to view a summary of the queries that are running slowly. You can do this by running the mysqldumpslow
script (in the scripts directory of your MySQL installation). This is a Perl script, so Windows users note that you will need to install Perl if you do not already have it. (Get it from www.activestate.com.)
One current limitation of MySQL is that you cannot configure it to log slow queries that take less than one second. On systems that process a large number of relatively simple queries, one second would be very, very slow. The administrator might want to be informed when queries take more than one-tenth of a second or some other fraction. This will be addressed in a future version.
The EXPLAIN
command tells MySQL to explain to you how a query will be executed. As a very simple example, we might type this:
explain
select e.name, d.name
from employee e, department d
where e.departmentID = d.departmentID;
As you can see, we have simply prefixed an unremarkable query with the word EXPLAIN
. This will not actually execute the query, but will instead return some information about how MySQL plans to execute the query. You should see some results similar to the following:
2 rows in set (0.00 sec)
What does this all mean? You can see at a glance that there is one row in the result set per table in the query. (You can see which table is being discussed in the index column.) The order of the rows shows the order in which the tables will be joined.
The columns in this table are as listed here:
• id
This is a sequence number. If there is more than one SELECT
in a query—for example, if you are using a subquery—each SELECT
is numbered.
• select_type
This is the type of SELECT
statement being performed. Most of the time this will be SIMPLE
as in the preceding example, meaning that it's a plain vanilla SELECT
. If you are using subqueries, the outer query will be marked PRIMARY
and the inner queries will be marked SUBSELECT
or DEPENDENT SUBSELECT
for correlated subqueries.
• table
This is the table this row is about.
• type
This is one of the most important columns for optimization. It tells you how the table is being joined to the other tables in the query.
• possible_keys
This tells you which indexes could have been used in the query. It will be NULL
if there are no relevant indexes.
• key
This tells you which index was selected for use in the query. It will be NULL
if no index was selected.
• key_len
This is the length of the index MySQL decided to use.
• ref
This is the value being compared to the key to decide whether to select rows.
• rows
This is an estimate of the number of rows from this table MySQL will read to generate the results of the query. You can work out how many rows will be read overall by multiplying the rows' values together. This gives a basic benchmark for how fast the query will run.
• Extra
Additional information may be given in this column. For example, the comment Using index
means that MySQL can retrieve the result of the query completely from an index without reading data from the table.
So in this case, what is the output telling us?
The join type ALL
for the employee table tells us that all rows will be scanned in this table. This is obviously going to be slow if the table has a lot of data in it. In fact, the join type ALL
is the worst possible result. You will get this typically if a table has no useful index. The obvious optimization here is to add an index. We'll look at this in a moment.
The row for department has join type eq_ref
, which means that a single row will be read from the department table for each row in the employee table. This is one of the best types. The only better values for type are system
and const
, meaning that the table has only one matching row and can effectively be treated as a constant. This part of the join we are pretty happy with.
These are the other possible values for type:
• ref
All rows with matching index values will be read from the table. This is the next best option after eq_ref
and represents situations in which you are dealing with non-unique keys.
• range
This is not as good as eq_ref
or even ref
. It means that all rows in a particular range will be read from the table.
• index
This is better than ALL
, but worse than the other types mentioned previously. Seeing index
means that the complete index will be scanned. This is preferable to scanning the complete table, but is far from ideal.
Next, let's look at the values of the possible_keys
and key
columns. The department table has one option—PRIMARY
, the primary key, which is the one used. The employee table has the value NULL
in both of these columns, meaning that there's no key to use and therefore no key will be used. Again, this is a pretty strong hint that we should be adding another index!
Based on this information and assuming that we're going to execute this query reasonably frequently and therefore would like it to be faster, we'll create another index as follows:
create index ename_did on employee(name, departmentID);
If we then rerun EXPLAIN
, we get the following output:
2 rows in set (0.00 sec)
You can see that there are some changes. The type for employee is now index
because we now have an appropriate index to refer to. The new index is listed as a possible key, but it is not actually being used. Under Extra
you will see that only the index for this table is used, rather than the table itself. This should be slightly faster.
The most basic use of EXPLAIN
is to see where you can make better use of indexes to speed up your queries, but you might discover other approaches to speed up a query.
MySQL applies many optimization rules to queries.
MySQL uses its estimated number of rows (as shown in EXPLAIN
) to work out the best order in which to join tables. If you notice that its estimate is off, you may want to experiment with using a STRAIGHT JOIN
to force the table order. Benchmarking before and after cases will tell you whether you are helping or hindering.
To choose an index, MySQL looks for the most relevant index that spans less than 30% of the rows. If it can't find an index fitting these criteria, the table will be scanned instead. (This was what happened in the EXPLAIN
query we looked at earlier, after we added the new index.)
Expressions in WHERE
clauses are optimized in a similar way to the way many programming compilers optimize expressions. For example, unnecessary parentheses in expressions are removed. This is one reason you should feel free to make your queries more readable with parentheses.
If a query can be resolved wholly from indexes, it will be done without any reference to the actual rows in the table. Evaluation of COUNT(*)
is also evaluated without reading or counting the rows in a table because this data is stored separately.
A more comprehensive list of the optimizations MySQL performs is in the manual, so we have not reproduced it here. But even that list is incomplete. You can read through the source code if you are interested in more information about how the query optimizer works.
There are three main things you can do to optimize your queries:
• Add indexes. If you have a frequent query on an unindexed column, add an index. For more information about how MySQL uses indexing, refer to Chapter 18, “Optimizing Your Database.” Remember though that although an appropriate index might speed up operations that need to find data in the table, keeping indexes up-to-date increases the time required to write data. Do not add indexes that will not be used.
• Use ANALYZE TABLE
. (See Chapter 18 for syntax.) This updates the information MySQL stores about key distribution. This information is used to decide the order in which tables are joined. If MySQL seems to be joining your tables in a strange order, try ANALYZE TABLE
.
• Use OPTIMIZE TABLE
. (See Chapter 18 for syntax.) This defragments the table storage, sorts the indexes, and updates the table statistics as used by the query optimizer.
• Find slow queries using BENCHMARK()
or the slow query log.
• See how queries are executed using EXPLAIN
.
• Speed things up by adding indexes and recheck with EXPLAIN
.
• ANALYZE TABLE
and OPTIMIZE TABLE
will help the MySQL query optimizer to do its job well.