Additional character sets may be configured into MySQL if they don’t require multi-byte character support or string collating routines. Adding a character set through configuration requires the following steps:
Add the new character set to the file sql/share/charsets/Index. [1]
Create the configuration file for the new character set in sql/share/charsets.
Edit your configure.in file to include the character set in the next compile.
Recompile MySQL.
In this example, we will add a special character set called elvish. We first need to add it to the character set index file. The file looks like this:
$ cat sql/share/charsets/Index
# sql/share/charsets/Index
#
# This file lists all of the available character sets. Please keep this
# file sorted by character set number.
big5 1
czech 2
dec8 3
.
.
.
latin5 30
latin1_de 31
To add a new character set, simply add the character set to the end of the file with a unique index:
latin5 30
latin1_de 31
elvish 32
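Picking the next free number can be sketched as a small script. This is only an illustration in Python, not part of MySQL: the helper name next_charset_index is hypothetical, the sample text is an abbreviated copy of the listing above, and the script assumes that every non-comment line holds one "name number" pair.

```python
def next_charset_index(index_text):
    """Return one more than the highest charset number in Index-style text."""
    numbers = []
    for line in index_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Keep only lines that look like "name number".
        if len(fields) == 2 and fields[1].isdigit():
            numbers.append(int(fields[1]))
    return max(numbers) + 1


# Abbreviated sample of the Index file shown above (hypothetical subset).
sample = """# sql/share/charsets/Index
#
big5 1
czech 2
dec8 3
latin5 30
latin1_de 31
"""

new_entry = "elvish %d" % next_charset_index(sample)
print(new_entry)  # the line to append to the end of the Index file
```

Computing the number from the file rather than typing it by hand keeps the index unique and the file sorted by character set number, as its header comment asks.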
The next step is to create a configuration file in sql/share/charsets for your character set. You can base it on sql/share/charsets/latin1.conf.
$ cd sql/share/charsets
$ cp latin1.conf elvish.conf
$ vi elvish.conf
There are four array definitions in the configuration file. You need to edit each of these arrays to configure your character set. A # in the configuration file indicates a comment.
ctype
The ctype array [2] defines the features of each character in the character set. It consists of 257 hexadecimal words. Each word corresponds to a character in the character set, plus one additional entry for EOF (for legacy reasons), and is a bitmask that defines the features of its corresponding character. Table 14-2 shows the possible features. These are also defined in include/m_ctype.h. The ctype value for each character is the union of all the features that describe it. For example, “A” is an uppercase character (0001) and a hexadecimal digit (0200), so its ctype is 0001 + 0200 = 0201 in octal. In hexadecimal, this is 81, so ctype['A' + 1] should contain 0x81.
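The arithmetic for “A” can be checked with a short sketch. It is written in Python for illustration only and is not MySQL code. The two octal feature values are the ones quoted in the text; the names UPPER, HEX_DIGIT, and set_features are hypothetical, and the remaining feature bits should be taken from include/m_ctype.h.

```python
# Feature bits in octal, as quoted in the text above.
# The names are hypothetical; see include/m_ctype.h for the full list.
UPPER = 0o001      # uppercase character
HEX_DIGIT = 0o200  # hexadecimal digit

# 257 entries: character c is described by slot c + 1,
# which leaves slot 0 as the extra entry for EOF.
ctype = [0] * 257


def set_features(table, char, features):
    """Store a feature bitmask for one character, using the +1 offset."""
    table[ord(char) + 1] = features


# "A" is both an uppercase letter and a hexadecimal digit.
set_features(ctype, "A", UPPER | HEX_DIGIT)

value = ctype[ord("A") + 1]
print(oct(value), hex(value))  # 0o201 0x81
```

Combining the bits with a bitwise OR gives the same result as the addition in the text, because each feature occupies its own bit.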
to_lower and to_upper
The to_lower and to_upper arrays contain, for each character, the corresponding lowercase and uppercase character, respectively. So, for example, to_lower['A'] should contain 'a', and to_upper['a'] should contain 'A'.
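As a rough sketch of what these two tables hold, the following Python snippet builds 256-entry mappings. It is an illustration rather than MySQL code: the function name build_case_tables is hypothetical, and it assumes a character set in which only the ASCII letters A-Z and a-z have case. A real character set would also map its own accented or special letters.

```python
def build_case_tables():
    """Build 256-entry to_lower and to_upper tables (ASCII letters only)."""
    # Start from the identity mapping: characters without case map to themselves.
    to_lower = list(range(256))
    to_upper = list(range(256))
    offset = ord("a") - ord("A")
    for upper in range(ord("A"), ord("Z") + 1):
        lower = upper + offset
        to_lower[upper] = lower
        to_upper[lower] = upper
    return to_lower, to_upper


to_lower, to_upper = build_case_tables()
print(chr(to_lower[ord("A")]), chr(to_upper[ord("a")]))  # a A
```

Unlike ctype, these tables are indexed directly by character value, so each has exactly 256 entries.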
sort_order
MySQL uses the sort_order array to determine the sort order of characters in your character set. For character sets in which you want the sorting to be case insensitive, this will be the same as the to_upper array. If the sorting rules for your character set are too complicated to be handled by a simple table, you will need to compile in support for your character set.
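To see why a case-insensitive sort_order can simply reuse the to_upper mapping, consider this sketch. It is a Python illustration of table-driven comparison, not the routine MySQL actually uses: the sort_key helper is hypothetical, and the table covers only ASCII letters.

```python
# A case-insensitive sort_order for ASCII letters: the same mapping as to_upper.
sort_order = list(range(256))
for code in range(ord("a"), ord("z") + 1):
    sort_order[code] = code - (ord("a") - ord("A"))


def sort_key(text):
    """Translate each byte through sort_order before comparing (hypothetical helper)."""
    return [sort_order[byte] for byte in text.encode("latin-1")]


words = ["banana", "Apple", "cherry", "apple"]
print(sorted(words, key=sort_key))  # ['Apple', 'apple', 'banana', 'cherry']
```

Because "Apple" and "apple" translate to the same sequence of weights, they compare as equal, which is the behavior a case-insensitive character set wants. A rule that cannot be expressed as one weight per character is beyond what such a table can describe.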
Once you have configured your character set, you are ready to compile MySQL to include it. Before recompiling MySQL, you need to edit configure.in and add your new character set to CHARSETS_AVAILABLE:
CHARSETS_AVAILABLE="big5 cp1251 cp1257 croat czech danish dec8 dos estonia euc_kr gb2312 gbk german1 greek hebrew hp8 hungarian koi8_ru koi8_ukr latin1 latin1_de latin2 latin5 sjis swe7 tis620 ujis usa7 win1250 win1251ukr elvish"
The last step is to compile MySQL:
$ make
$ make install
[1] The charsets directory may be in a different location depending on your installation. It might be share/mysql/charsets, for example.
[2] Note that the ctype array contains 257 words while the to_lower, to_upper, and sort_order arrays all contain 256 words. The ctype array is indexed by character value +1, while the others are indexed by character value.