Managing character types is a little more
complicated. Not only do you have to worry about the minimum and
maximum string lengths, but you also have to worry about the average
size and the amount of variation. For our current purposes, an
index
is a field or combination of fields
on which you plan to search—basically, the fields in your
WHERE
clause. Indexing is, however, much more
complicated, so we will provide further details later in the chapter.
What’s important to note here is that indexing on
character fields works best when the field is a fixed length. If
there is little or, preferably, no variation in the length of your
character-based fields, then a CHAR
type is
appropriate. An example of a good candidate for a
CHAR
field is a country code. The
ISO provides a
comprehensive list of standard two-character representations of
country codes (US for the U.S., FR for
France, etc.).[3] Because these codes are always exactly two characters, a
CHAR(2)
is the
best way to maintain the country code based on the ISO representation
A value does not need to be constant length to use a
CHAR
field. It should, however, have very little
variance. Phone numbers, for example, can be stored safely in a
CHAR(13)
field even though phone number lengths
vary from nation to nation. The variance is little enough that there
is no point in making a phone number field variable in length. Keep
in mind that with a CHAR
field, no matter how big
the actual string being stored is, the field always takes up exactly
the number of characters specified as the field’s
size—no more, no less. Any difference between the length of the
text being stored and the length of the field is made up by padding
the value with spaces. While the few potential extra characters being
wasted on a subset of the phone number data is not anything to worry
about, you do not want to be wasting much more.
Variable-length text fields are appropriate for text fields with widely varying lengths. A good, common example of a field that demands a variable-length data type is a web URL. Most web addresses are relatively short (e.g., http://www.ora.com, http://www.imaginary.com, http://www.mysql.com) and consequentially do not pose problems. Occasionally, however, you will run into web addresses such as:
http://www.winespectator.com/Wine/Spectator/ |
_notes|5527293926834323221480431354?Xv11=&Xr5=&Xv1=&type-region- |
search-code=&Xa14=flora+springs&Xv4= |
If you construct a CHAR
field large enough to hold
this URL, you will be wasting a significant amount of space for
almost every other URL being stored. A variable-length field lets you
define a field length that can store the odd, long-length value while
not wasting all that space for the common, short-length values.
Variable-length text fields in MySQL use only as much space as
necessary to store an individual value into the field. A
VARCHAR(255)
field
that holds the string “hello
world,” for example, takes up only 12 bytes (1 byte
for each character plus an extra byte to store the length).
MySQL varies from the ANSI standard by not padding
VARCHAR
fields. Any extra spaces are removed from
a value before it is stored.
You cannot store strings with lengths greater than the field length
you have specified. With a VARCHAR(4)
field, you
can store at most a string with four characters. If you attempt to
store the string “happy birthday,”
MySQL will truncate the string to
“happ.” The downside is that there
is no way to store the odd string that exceeds your designated field
size. Table 3-2 shows the
storage space required by the different
text data types to store the 144-character Wine Spectator URL shown
earlier, the space required to store an average-sized 30-character
URL, and the maximum string size for that data type.
In this table, note that storage requirements grow 1 byte at a time
for the variable-length types of MEDIUMTEXT
and
LONGTEXT
. This growth is due to the space required
to store the size in variable-length fields. TEXT
uses an extra byte to store the potentially greater length of the
text it contains. Similarly, MEDIUMTEXT
uses an
extra 2 bytes over VARCHAR
, and
LONGTEXT
uses an extra 3 bytes.
If after years of uptime with your database, you find that the world
has changed and a field that once comfortably existed as a
VARCHAR(25)
must now hold strings as long as 30
characters, you are not out of luck. MySQL provides a command called
ALTER TABLE
that enables you to redefine a field
type without losing any data:
ALTER TABLEmytable
MODIFYmycolumn
LONGTEXT
[3] States and provinces do not work the
same way in internationalized applications. If you want to write an
application that works in an international environment, make the
columns for state and province codes CHAR(3)
,
because Australia uses three-character state codes. Also note that
there is a three-character ISO country-code standard.