9.
Understanding MySQL's Table Types

In this chapter, we'll review the different table types available to the MySQL database designer. We have generally used InnoDB or MyISAM tables in the examples in this book, but there are others.

Designing a database management system, like many other design tasks, involves many compromises. For example, many database tasks should be done in a transaction-safe way, but providing for this increases time, disk, and memory requirements. The creators of MySQL have deferred some of the compromises to you, the database designer, by offering you a choice of table types. You can choose one of the transaction-safe types, if needed, for your application, or you can choose a higher performance non–transaction-safe type. In any case, you need to know what compromises you are making.

You might hear table types referred to as storage engines. This reflects the fact that some of the table types rely on large amounts of separate source code that manage their own caching, indexing, locking, and disk access. It also reflects the core of a database's purpose: It stores things.

The term transaction or transaction safe will come up a lot in this chapter. It is an important criterion to understand when selecting a table type. Examples so far in the book have involved SQL queries being executed in isolation, but for many applications, this is not what really happens.

Imagine for a moment that you had a database that contained bank account details. If you wanted to transfer $1,000 from one account to another, you would need at least two SQL queries—one to deduct $1,000 from one account, and one to add $1,000 to the other. It would be a disaster if something (such as a power failure) caused one query to complete, but the other to fail. It would be far preferable in a case like this for the two related queries to either both happen or both fail because the database must be left in a consistent state.

Transaction-safe tables allow you to specify that a set of queries is one indivisible unit of work—a transaction. The whole transaction should complete, or the database must roll back or return to the same state it was in before the transaction started.

We will cover transactions and the MySQL syntax to use them in Chapter 10, “Using Transactions with InnoDB Tables.”

The table types available in MySQL are

• ISAM

• MyISAM

• InnoDB

• BerkeleyDB (BDB)

• MERGE

• HEAP

We will look at each table type in turn, but will devote most space to the most commonly used types: MyISAM and InnoDB. InnoDB and BerkeleyDB are transaction safe. The others (ISAM, MyISAM, MERGE, and HEAP) are not.

We will also cover the special features of MyISAM tables in this chapter, specifically compressed tables and full-text searching. We will spend the whole of Chapter 10 on the special features of the InnoDB storage engine.

ISAM Tables

ISAM tables are included in MySQL purely for legacy support. Their functionality has been entirely replaced by MyISAM tables, so they will not be examined in any detail here. They are scheduled to be removed from MySQL in version 5.0.

The following code will create an ISAM table, should you want to for some reason:


create table asset
(
  assetID int not null,
  description varchar(255)
) type=ISAM;


ISAM tables offer fast but not transaction-safe storage. Most of what we will say about MyISAM tables holds true for ISAM tables, but the older ISAM tables have several limitations.

Improvements offered by MyISAM include the following:

Table portability. Tables stored on disk or backup media can be loaded onto another system running MySQL, regardless of the platform. This is not true for ISAM tables.

Support for very large tables. ISAM tables have a hard limit of 4GB. MyISAM allows tables to be as large as the underlying operating system will allow. This will be important only to some users, and it means choosing your operating system (and file system) carefully. Many file systems limit files to being 2GB in size. (Note that we can actually get around this limitation using MERGE tables.)

More efficient use of disk space. Spaces and fragmentation are reduced.

Less restricted keys. ISAM tables allow 16 keys per table and a default maximum key length of 256 bytes. MyISAM tables allow 64 keys per table and a default maximum key length of 1024 bytes.

ISAM tables should be regarded as being deprecated. They still exist, but they should not be used for new development. If you have any existing ISAM tables, you should strongly consider converting them to MyISAM. It takes very little effort and offers significant advantages.

MyISAM Tables

Many people use MySQL for years without discovering that it offers different table types. These people are using MyISAM tables because this has been the default in all recent versions.

MyISAM tables offer very fast but not transaction-safe storage. They provide high performance in most situations, even if the designer makes mistakes, and in the hands of a skilled administrator, they can handle massive and/or busy databases.

The following code will create a MyISAM table:


create table article (
  articleID int not null auto_increment primary key,
  title varchar(255),
  body text
);


The final line could optionally have been


) type=MyISAM;


but would produce the same result.

MyISAM tables can be one of three types: dynamic, static, or compressed. A table automatically becomes dynamic or static depending on the definition of its columns. Compressed tables must be deliberately created with the myisampack tool.

Tables with fixed-length rows will be created as static tables, and tables with variable-length rows will be created as dynamic tables. How can we tell whether a table has fixed- or variable-length rows?

The char and numeric types all have a fixed size. The size of varchar, text, and blob columns can vary with the size of their contents. A table with only char and numeric columns will be created as a static table, but a table containing any varchar, text, or blob columns will be dynamic.

In the section “Full-Text Searching on MyISAM Tables,” we will create an example table called article. This will be created as a dynamic table because it contains a varchar column and a text column. The storage requirement for each row in the table will therefore vary depending on the amount of data in each of these fields.

There are a number of advantages to a static table. It is faster to search than a dynamic table or a compressed table. It is very easy for the database to retrieve a particular record based on an index when each record is at a particular offset from the start of the file. It is very easy to cache. It is less likely to suffer serious corruption in the event of a crash—the repair facility can usually recover all rows except the damaged one.

The disadvantage to static tables is that forcing real data to fit into fixed-size columns nearly always wastes disk space. This may be a price you are willing to pay for data that varies only a little in size, such as people's names, but are unwilling to pay for data that varies a great deal in size. If you decide that nearly all employee surnames will be less than 80 characters, you may or may not be willing to waste 75 bytes each time you store a Smith.

Dynamic tables need more complex management within MySQL. It is not as straightforward a task for the engine to cache, find, or repair records. This is partly just because they vary in size, but it is also because they can become fragmented. If a row is modified and becomes larger, part of its data will remain at the original location, and part will be stored as a new fragment elsewhere in the file. This means that a segment of a file that has been cached by the operating system cannot be guaranteed to contain all parts of a row. Corruption may also be harder to fix because if fragments or links become lost, it will not be obvious which parts belong to which rows.

To repair or defragment a MyISAM table, you can use the command-line tool myisamchk or the MySQL command REPAIR TABLE. (This is covered in more detail in Chapter 13, “Administering Your Database.”) To defragment but not repair, you can use the MySQL command OPTIMIZE TABLE. (This is covered in more detail in Chapter 18, “Optimizing Your Database.”)

Compressing MyISAM Tables

Although tables become static or dynamic without your specific request (but in response to your design decisions), tables are not automatically compressed. To compress a table, you need to use the command-line program myisampack. (There is a version of this for pure ISAM tables, should you be using them, which is called pack_isam.)

Compression sounds like a positive thing, but it makes sense only for some applications because compressed tables become read-only. If you need to alter, update, or insert data in the table, you need to uncompress the entire table, make your changes, and then recompress the table.

The compression performed by myisampack includes a mixture of true compression (Huffman coding) and a set of optimizations aimed at shrinking columns, such as converting types to smaller types and converting columns to enums. Because each record is compressed separately, there is only a small overhead to be paid when decompressing a record. This may even be counterbalanced on slow devices by the reduction in data that needs to be read from disk.

Full-Text Searching on MyISAM Tables

One feature that currently comes only with MyISAM tables is full-text searching and indexing. Normal indexes are very good at finding rows where a value in the table matches a given value, but it is common to want to search for words or strings within a block of text. This is where full-text searching comes in handy.

The following SQL will create a MyISAM table with a full-text index:


create table article (
  articleID int not null auto_increment primary key,
  title varchar(255),
  body text,
  fulltext (title, body)
);


The following query will retrieve any records containing the word 'merger':


select title
from article
where match (title, body) against ('merger'),


More complicated searches are supported. The following query will retrieve records containing any of the words 'merge', 'acquisition', 'acquire', or 'takeover'.


select title from article
where match (title, body) against ('merge acquisition acquire takeover'),


Note that we are matching any record that contains at least one of the words. We are not searching for the string or for a record containing every word listed. We can do both of these types of searches using the IN BOOLEAN MODE modifier, which we will come to later.

We needed to search for 'acquire' and 'acquisitions' separately because MySQL does not currently support stemming. Stemming is a technique implemented in many other full-text search systems that recognizes sets of words as having a common stem word. For example, 'acquire' is the stem of many words such as 'acquires', 'acquired', and 'acquisition'.

Each match found is assigned a relevance value, and the results are automatically sorted into relevance order. You may want to see the relevance scores for records. The following query will retrieve an unsorted list of scores for all records. Any records with a score of zero have no similarity and will not be retrieved.


select title, match (title, body)
against ('merge acquisition acquire takeover')
as relevance
from article;


Note that relevance is not a keyword. It is just an alias for match(title, body) against ('merge acquisition acquire takeover'). We have included it so that the output will be tidier.

More usefully, the following query will retrieve article titles and scores for matched documents. Because we have the MATCH condition in the WHERE clause, the results will be sorted, and unrelated rows will be ignored.


select title, match (title, body)
against ('merge acquisition acquire takeover')
as relevance
from article
where match (title, body) against ('merge acquisition acquire takeover'),


There are certain words you cannot search for. To improve performance, some words are excluded from indexes or are ignored when searching.

Short words are not indexed. By default, words with fewer than four characters are ignored. For some installations, most famously Slashdot.org, this is a problem because three-letter acronyms are often an important part of the content in technical material. You can change this limit by altering the variable ft_min_word_len, but you will need to regenerate your indexes.

Stop words are used by full-text indexes. A stop word is a word with no semantic value. Generally, these are common words that are important for sentence construction, but are unlikely to be important parts of the content. Words like 'the', 'and', 'then', and 'soon' are necessary, but are not usually useful to search for. As well as providing standard lists, MySQL allows you to specify your own list of stop words for each human language you are indexing.

One thing to note about full-text indexing is that because it is a complex task, it has some very real performance limits. When your MySQL tables get large (say more than 1,000,000 rows), full-text search performance slows down. For small applications this should not be a problem, but for larger applications you should keep this fact in mind.

Words that are common in your data are not used when searching. If your table contains company newsletter articles for Acme PTY Ltd., it is likely that many articles would contain the word 'Acme'. Searching for this would generate many results, which is not usually a good thing. If 50% or more of your records contain a word, that word is taken to have no value when calculating relevance.

Boolean Full-Text Search

You can exert greater control over the search by using the IN BOOLEAN MODE modifier.

The following query will match only records than contain the word 'linux' and the string "Open Source", but not the word 'desktop'. The words 'Java' and 'Oracle' are optional, but when deciding on relevance, finding 'Java' in a record will improve its ranking, whereas finding 'Oracle' will degrade the ranking. The order of words in the search string or the record is not important.


select title
from article
where match (title, body)
      against ('+linux +"Open Source" -desktop Java ~Oracle' IN BOOLEAN MODE);


The full set of operators is shown in Table 9.1.

Table 9.1. Boolean Mode Search Operators

Operator

Meaning

+

This word is compulsory.

-

This word must not appear.

<

This word is less important.

>

This word is more important.

( )

Group words together as a subexpression.

~

This word may appear, but it has a negative effect on ranking.

*

Wildcard suffix. For example, merge will not match merger, but merge* will match both merge and merger. May be used only at the end of a word.

" "

This is a phrase. Matches only exactly the same content in the same order.

It is not required that you have full-text indexes to do Boolean mode searches. You can search unindexed tables this way, but it will be very slow.

Another small difference between full-text searches and Boolean searches is that when the search is done in Boolean mode, words that are common in your data are not ignored. The 50% rule does not apply. If we were searching Acme PTY Ltd. newsletter articles, the next query would probably return nearly all rows, whereas the one following that would result in an empty result set.


select title from article
where match (title, body) against ('Acme' IN BOOLEAN MODE);

select title from article
where match (title, body) against ('Acme'),


InnoDB Tables

The next table type we will discuss is InnoDB. InnoDB is a fast, transaction-safe storage engine. Its transaction capabilities are so important that we will spend the entire next chapter discussing how to use them.

InnoDB tables offer

Transactions. See Chapter 10.

Row-level locking. This means that only the row we are using in a particular query is unavailable to other users. Most of the other storage engines (with the exception of BDB) use table-level locking—that is, while one process is updating a table, it is not available to other processes.

Support for foreign keys. The examples in earlier chapters that include foreign keys would not work with other table types.

Consistent nonlocking reads in SELECTs. (The idea for this is borrowed from Oracle.)

InnoDB has its own configuration options, its own directory, and its own way of storing data. Whereas MyISAM stores one table per file, InnoDB stores all its tables and indexes in a tablespace, which means that they may be stored over multiple files. This allows InnoDB to have very large tables that will be unaffected by any operating-system file size restrictions. Note, though, that in comparison to MyISAM, an InnoDB table uses a lot more disk space to store the same records.

We will spend the next chapter, Chapter 10, discussing how to use the special features in InnoDB.

InnoDB requires a certain amount of configuration effort. We supplied a basic configuration in Chapter 1, “Installing MySQL,” and we will discuss these options in more depth in Chapter 12, “Configuring MySQL.”

InnoDB, though developed separately from most of MySQL, is available under the same dual-licensing agreement.

Many of the large, high-profile Web sites that use MySQL, such as Slashdot (www.slashdot.org), Google (www.google.com), and Yahoo! Finance (http://finance.yahoo.com), use the InnoDB engine. It is especially good for dealing with large volumes of data at high speeds in a transaction-safe environment.

InnoDB is one of the fastest transaction-safe systems in the world, but providing this safety involves sacrifices. For most usage patterns, MyISAM will be faster, but the difference will not generally be huge.

InnoDB is made by a separate company, InnoBase Oy, and it has its own Web site. For the most up-to-date information on InnoDB, go to www.innodb.com.

BerkeleyDB (BDB) Tables

The other option you have for a transaction-safe storage engine for MySQL is BerkeleyDB (usually abbreviated to BDB). BDB is provided by Sleepycat software (www.sleepycat.com). It is not as widely used for transaction work in MySQL as InnoDB. The BDB engine has been around for a long time, but the MySQL integration is still relatively new, and the manual tells us the interface with MySQL is of “gamma” quality. That is, it's more reliable than a beta, but with the odd bug. Mind you, MySQL's idea of beta tends to be more reliable than many companies' gold releases.

BDB uses page-level locking and, as such, will give you slightly worse performance than InnoDB. It is also slow if you open a large number of tables at the same time.

All BDB tables must have a primary key (not that this is much of a problem because we would recommend this anyway). In fact, one will be created silently for you if you do not create one for yourself.

BDB tables are stored as b-trees. Compare this to most other table types that store their indexes as b-trees. This slows down table scanning (such as when you are retrieving all rows from the table or performing unindexed queries). BDB tables also take up more space on disk.

One important limitation of BDB tables is that you cannot move the stored data around on disk. This is because the data stored for a table includes the path to the data. You must also be careful when backing up BDB tables to remember to back up the log files because you will be unable to restart without them.

MERGE Tables

MERGE tables are a clever way to get around operating-system restrictions on maximum file sizes. Because each MyISAM table is stored in a single file, tables are limited in size by the maximum file size of the operating system. We get around this by creating a MERGE table, a construct that allows you to treat multiple MyISAM tables as a single table for the purpose of queries.

Let's look at an example. Code to create a MERGE table consisting of three log tables is shown in Listing 9.1.

Listing 9.1. A MERGE Table Example


create database logs; 
use logs;

create table log2003Jan
(logid int auto_increment primary key,
logts datetime,
entry char(255));
insert into log2003Jan values
(NULL, '2003-01-01', 'first jan entry'),

create table log2003Feb
(logid int auto_increment primary key,
logts datetime,
entry char(255));
insert into log2003Feb values
(NULL, '2003-02-01', 'first feb entry'), 

create table log2003Mar
(logid int auto_increment primary key,
logts datetime,
entry char(255));
insert into log2003Mar values
(NULL, '2003-03-01', 'first mar entry'),

create table logs
(logid int auto_increment primary key,
logts datetime,
entry char(255))
type = merge
union = (log2003Jan, log2003Feb, log2003Mar)
insert_method = last; 


What we have done in this listing is to create three tables that are identical in structure, log2003Jan, log2003Feb, and log2003Mar. Logging is a common application of MERGE tables, as you will see in a moment.

After inserting some test data into these three tables, we have created a MERGE table of the three. We have done this by creating a table called logs that has the same structure as the three tables to be merged and by specifying that it is of type MERGE and that it is the UNION of the three tables. We also specify the INSERT_METHOD as last. This means that if we insert data into the MERGE table, it will be added to the last table in the merge, in this case log2003Mar. The other options are FIRST (insert into the first table in the list) or NO (don't allow inserts into the MERGE table).

This gives us a table we can interact with that appears to contain all the data in the merged tables. If we run the query


select * from logs;


we will obtain the following output:

image

3 rows in set (0.01 sec)

As you can see, all the data from the three tables is represented. One really important thing to note is that although we specified logid as the primary key in the MERGE table, it is a bit different from the way primary keys normally work. Usually, they must be unique, but because the MERGE table manages three sets of primary keys, there may well be more than one row with the same primary key, as in the preceding output.

Even with the MERGE table, we can still query the component tables as usual. We cannot DROP, ALTER, DELETE FROM TABLE, REPAIR, TRUNCATE, OPTIMIZE, or ANALYZE any of the component tables. You will be able to do some of these things (DELETE FROM TABLE) if the MERGE table is not currently open. You can close it with FLUSH TABLES. The manual says that you should be able to do any of these things after a FLUSH, but we have found that this is sometimes not the case. For example, at the time of writing, dropping one of the component tables leads to the MERGE table also being silently dropped. If you need to make these sorts of changes, you may be better off dropping the MERGE table and re-creating it. Dropping the MERGE table does not affect the component tables or their data.

You can compress individual tables in the MERGE with myisampack. This is particularly useful for examples like ours, in which we are storing logfiles—we can compress the earlier months' log files because we are writing to only the most recent log.

HEAP Tables

HEAP tables are extremely fast tables that are stored wholly in memory. They use a hashed indexing scheme that is responsible for their speed.

The downside to having tables stored completely in memory is, of course, that if you have any power issues, your HEAP data is gone forever. They are, however, great for storing temporary tables.

You can create a HEAP table like this:


create table testHeap
(id int not null primary key, 
data char(100))
type=heap
max_rows = 100;


As you can see, we have specified the table type as HEAP. We have also followed a good practice guideline here by limiting the maximum number of rows in the table. If your HEAP tables grow large, you can easily run out of memory. The number of rows can also be limited by the configuration directive max_heap_table_size.

HEAP tables have a few limitations:

• They don't support AUTO_INCREMENT.

• They don't support BLOB or TEXT types.

• HEAP tables cannot use the leftmost prefix of an index to find rows. (If you would like more information about what this means, you can read more about indexing in Chapter 19.)

• Indexes will be used only to find rows with queries that use the = or <=> operators in the search clause.

Summary

• MySQL has six table types: ISAM, MyISAM, InnoDB, BDB, MERGE, and HEAP.

• Only InnoDB and BDB tables are transaction safe.

• Only MyISAM tables support full-text indexing and searching.

ISAM

• ISAM had been deprecated and superceded by MyISAM.

• ISAM tables have a hard size limit of 4GB.

• ISAM tables are not portable.

• You can have a maximum of 16 keys per table and a maximum key length of 256 bytes (characters).

MyISAM

• MyISAM is the default table type. It is very fast, but not transaction safe.

• MyISAM tables support table compression.

• The size of MyISAM tables is limited only by the operating system, and this can be worked around with MERGE tables.

• The data files that store MyISAM tables are portable from system to system.

• You can have a maximum of 64 keys per table and a maximum key length of 1024 bytes.

InnoDB

• InnoDB tables are transaction safe.

• InnoDB supports row-level locking.

• There is no theoretical maximum table size because tables may be stored in more than one file.

• InnoDB provides consistent nonlocking reads in SELECT.

• InnoDB tables are portable from system to system.

• InnoDB tables take more disk space than MyISAM tables.

• Foreign keys are supported between InnoDB tables.

BDB

• Like InnoDB tables, BDB tables are transaction safe. BDB tables are not as widely used with MySQL as InnoDB.

BDB supports page-level locking.

BDB tables are not portable.

MERGE

• MERGE tables are used to treat multiple MyISAM tables as a single table, and therefore, the maximum file size limitation is removed from MyISAM tables.

HEAP

• HEAP tables are stored only in memory and need to be limited in size to avoid running out of memory.

• Data stored in a HEAP table is volatile and will be lost in the event of a power failure.

• HEAP tables are super fast, as long as you have enough physical memory to keep them.

• HEAP tables do not support AUTO_INCREMENT, TEXT, or BLOB.

Quiz

1.

We need transactions and tables that can easily be ported to another system. We should use

a) ISAM

b) MyISAM

c) InnoDB

d) BDB

2.

We need to create a temporary table for fast lookups. We should use

a) ISAM

b) MyISAM

c) MERGE

d) HEAP

3.

We need to have extremely large tables. We should not use

a) ISAM

b) MERGE

c) InnoDB

d) any of the above

4.

We need to perform a full-text search. We should use

a) MyISAM

b) InnoDB

c) BDB

d) HEAP

5.

Consider the following full-text search query:


select title
from article
where match (title, body)
      against ('+php +pdf pdflib >tutorial ~reference' IN BOOLEAN MODE);


Which of the following statements about this query is false?

a) The importance of the search term 'tutorial' is increased.

b) The ranking of results containing 'reference' will be increased.

c) Search results must contain 'php'.

d) Search results that contain 'pdflib' will be ranked more highly than those that do not.

Exercises

For this chapter, there are no formal exercises, but you might like to experiment with creating and using tables of the different types.

Answers

Quiz

1.

c

2.

d

3.

a

4.

a

5.

b

Next

In the next chapter, “Using Transactions with InnoDB Tables,” we will discuss exactly what is meant by a transaction, why transactions are so important, and how to implement them with InnoDB tables.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset