18.
Optimizing Your Database

In Chapter 17, “Optimizing Your MySQL Server Configuration,” we discussed how you can set up your server for optimal performance. In this chapter, we'll look at how you can optimize your database tables.

Common database design guidelines and normalization usually improve performance. There are also other choices you can make when working with your database design that will make the end product faster or slower. In this chapter, we will review these choices.

We will cover the following:

• What's slow in MySQL databases?

• Making the right design choices

• Using indexes for optimization

• Using OPTIMIZE TABLE

What's Slow in MySQL Databases?

Having read the preceding chapter on server optimization, we can now turn our attention to database optimization. In the next chapter, we'll look at specific query optimization.

If you have optimized your server (and perhaps thrown some hardware at the problem), you can next consider whether any of the following issues apply to you:

Not using enough indexes. The number one cause of poor performance is using tables that have no indexes or that are without indexes on columns you are searching. This doesn't mean that you should have as many indexes as possible because that can cause the next problem on this list.

Using too many indexes. Updating a lot of indexes as data is inserted or updated takes time. If you are retrieving data, indexes are good. When you are inserting new rows, updating rows, or deleting rows, indexes are no longer your friends. When you update data, the indexes need updating too, increasing the amount of overhead you have to deal with.

Using table- and column-level privileges. If you use table-or column-level privileges for any resource, MySQL must check table and column privileges every time a user runs a query.

Making the wrong database design choices. There are ways to design the actual structure of your database to make it faster to query.

We will discuss database design for speed and indexing in the next two sections.

Making the Right Design Choices

There are various design choices you can make that will help to speed up your database performance. They are as listed here:

• Use the smallest type that data will fit in. For example, if you are storing numbers from 1 to 10, don't use INT; use TINYINT instead. The smaller your rows and tables are, the quicker they will be to search. The smaller your data is, the more rows can be cached in memory.

• Use fixed-length records where possible. If all rows in a table are the same length, it's faster for MySQL to access rows in the middle of the table. To get fixed-length rows, all the column types you use must be of fixed length. This means no VARCHAR, no TEXT, and no BLOB.

If you need to store TEXT and BLOB, you might consider denormalizing your schema to break the TEXT or BLOB fields out into a separate table.

If you are using only VARCHAR, you can consider replacing them all with CHAR. This is something of a trade-off because CHAR will occupy more space on disk, going against our first suggestion in this list.

• Declare as many columns NOT NULL as possible. If your data logically requires NULL values, then obviously you should use them. However, note that you are paying a small speed and storage space price for these, so use NOT NULL wherever possible.

• Choose the table type on a table-by-table basis. Non–transaction-safe tables (for example, MyISAM) involve a lot less overhead and are therefore faster than the transaction-safe types (InnoDB and BDB). MySQL allows you to have a mixture of table types in the one database. Choose the fastest one that can do each job. (You can find more information about the different types in Chapter 9, “Understanding MySQL's Table Types.”)

• Choose appropriate indexes. We will cover this topic in detail in the next section.

• In extreme cases, you may even consider denormalization of tables to reduce the number of joins made for common queries. Because this can make your database a nightmare to maintain, it is not generally recommended.

Indexing for Optimization

If you have been following along in this book, you should have a database that contains primary keys. This will ensure that you have at least one index per table because indexes are automatically created by MySQL for columns that are declared as PRIMARY, KEY, or UNIQUE.

If you are trying to optimize an existing database, it is worth checking whether the database has any indexes. It is a common design flaw to leave them out. The SQL command DESCRIBE will tell you what indexes a table already has.

So, what's an index, and what is it used for?

An index is like a lookup table that allows us to find specific rows in a table quickly. If an index is created on column X, we can search for particular values of column X in the fast-to-search index. The index will tell us where in the table the row containing that value can be found, so we can go directly to it.

If you do not have an index on a table, the entire table will be scanned to find rows you are looking for. Imagine trying to find a topic in this book by starting at the beginning and reading every word on every page. It is much faster to look up a topic in the index of this book and turn directly to the page you need.

Indexes in MySQL are stored as b-trees (binary trees), a data structure that is very fast for searching.

Indexes can be on a single column or can span multiple columns (just like keys). An index will be used when running a query, if the search is being performed on the following:

• A single column that has a single-column index—for example, if we index departments on departmentID and perform a query like SELECT...WHERE departmentID=n.

• A set of columns that forms a multicolumn index—for example, if we have created an index on the employee.assignment table on (clientID, employeeID, workdate) and we perform a query like SELECT...WHERE clientID=x AND employeeID=y AND workdate=z.

• A column or set of columns that forms a subset of a multicolumn index, as long as there is a leftmost prefix of the index columns—for example, with the assignment table as before, with an index on (clientID, employeeID, workdate), indexes would be used for these types of queries:

SELECT...WHERE clientID=x
SELECT...WHERE clientID=x AND employeeID=y


But, they would not be used for this type:

SELECT...WHERE employeeID=y AND workdate=z


The moral of the story is that is if you will be making frequent queries based on a column or set of columns that does not fit the preceding criteria, you should consider running a CREATE INDEX statement to create an appropriate index. Note that MySQL can use only one index per table in a single query. It cannot combine existing indexes automatically.

We will look further at this issue in Chapter 19, “Optimizing Your Queries,” when we look at the EXPLAIN statement.

ANALYZE TABLE

We can use the ANALYZE TABLE statement to review and store the key distribution in a table. MySQL stores this information and uses it to decide how to execute joins. You can run this statement by typing


analyze table tablename;


Using OPTIMIZE TABLE

Finally, we should mention the OPTIMIZE TABLE command. It has the following, very straightforward syntax:


OPTIMIZE TABLE tablename;


This is the MySQL equivalent of defragmenting your hard disk. As you work with the database, the files that store the data end up with fragments of space within the data where records were deleted or where records had to be moved because an update made them larger. This scattered space is inefficient.

You should use OPTIMIZE TABLE periodically when you have deleted a lot of data from a table or if you have inserted, deleted, or updated a lot of rows and your rows are of variable length. It will tidy up the table's disk storage, re-sort the index, and update statistics for the table.

This command works only on MyISAM and BDB tables at the time of writing.

Summary

• You can optimize your database structure by keeping data a small and fixed size, by indexing appropriately, and by choosing an appropriate table type.

• Indexes are used to find records quickly according to the index column value.

• An index will be used in a query when the query is based on the index column or columns or a leftmost prefix of the index columns.

OPTIMIZE TABLE tablename; performs housekeeping similar to defragging your disk.

Quiz

1.

Which of the following statements about indexes is true?

a) Indexes take up too much space on disk and should not be used.

b) Not having any indexes can make your queries run slowly.

c) You should index as many columns as possible.

d) None of the above.

2.

When choosing data types for columns

a) use the same ones throughout a database because this will make the database more efficient

b) use variable-sized ones wherever possible to optimize disk usage

c) use fixed-sized ones wherever possible to speed up access to data

d) none of the above

3.

When choosing storage engine types for tables

a) always use InnoDB or BDB because they are transaction safe

b) always use MyISAM because it is the fastest

c) use a mix of types depending on what you need in a table

d) none of the above

4.

An index will not be used if a query

a) uses a leftmost prefix of the index columns

b) uses all the index columns in a different order than in the index

c) uses all the index columns in the same order as in the index

d) uses a rightmost prefix of the index columns

5.

Use OPTIMIZE TABLE

a) when you have just created a table, to improve its structure

b) when using InnoDB tables, to reduce overhead

c) after a lot of SELECT statements have been run on a table

d) after a lot of DELETE statements have been run on a table

Exercises

Which of the following queries on the employee database would use indexes? (You can check Chapter 4, “Creating Databases, Tables, and Indexes,” for the exact structure of the database. Or, if you have it installed, use describe to see which columns are indexed.)

a)


select *
from employee
where departmentID=128;


b)


select employeeID
from assignment
where clientID=1;


c)


select skill, count(skill)
from employeeSkills
group by skill;


If these three queries were going to be run frequently, what additional indexes would you create?

Answers

Quiz

1.

b

2.

c

3.

c

4.

b

5.

d

Exercises

Only b) uses an index.

You could consider indexing employee.departmentID and employeeSkills.skill.

Next

In the next (and final) chapter, “Optimizing Your Queries,” we will discuss how to work out when particular queries are running slowly and why, as well as how to solve the problem.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset