This chapter shows you how to create your own databases, add and remove structures such as tables and indexes, and make choices about column types in your tables. It focuses on the syntax and features of SQL, and not the semantics of conceiving, specifying, and refining a database design; you’ll find an introductory description of database design techniques in Chapter 2. To work through this chapter, you need to understand how to work with an existing database and its tables, as discussed in Chapter 3.
This chapter lists the structures in the sample sakila database; detail on how to load the database is presented in Chapter 1. If you’ve followed those instructions, you’ll already have the database available and know how to restore the database after you’ve modified its structures.
When you finish this chapter, you’ll have all the basics required to create, modify, and delete database structures. Together with the techniques you learned in Chapter 3, you’ll have the skills to carry out a wide range of basic operations. Chapters [Link to Come], [Link to Come], and [Link to Come] cover skills that allow you to do more advanced operations with MySQL.
When you’ve finished designing a database, the first practical step to take with MySQL is to create it. You do this with the ((("CREATE DATABASE statement")))CREATE DATABASE
statement. Suppose you want to create a database with the name lucy
. Here’s the statement you’d type in the monitor:
mysql> CREATE DATABASE lucy; Query OK, 1 row affected (0.10 sec)
We assume here that you know how to connect to and use the monitor, as described in [Link to Come]. We also assume that you’re able to connect as the root
user or as another user who can create, delete, and modify structures (you’ll find a detailed discussion on user privileges in [Link to Come]). Note that when you create the database, MySQL says that one row was affected. This isn’t in fact a normal row in any specific database—but a new entry added to the list that you see with SHOW DATABASES
.
Prior to 8.0, behind the scenes, MySQL created a new directory under the data directory for the new database and stored the text file db.opt that lists the database options; for example, the file might contain:
default-character-set=latin1 default-collation=latin1_swedish_ci
These particular two lines specify the default character set and collation of the new database. We’ll look at what these mean later, but you generally won’t need to know much about the db.opt file or access it directly.
MySQL 8.0 internalized this and other type of metadata files inside of the data dictionary, decreasing code complexity and performance overhead associated with a files-based approach. The most important feauter of the new approach for a DBA is the fact that DDL is now atomic: either going through or not. This wasn’t always true in earlier releases.
Once you’ve created the database, the next step is to use it—that is, choose it as the database you’re working with. You do this with the MySQL command:
mysql> USE lucy; Database changed
As discussed previously in Chapter 3, this command must be entered on one line and need not be terminated with a semicolon, though we usually do so automatically through habit. Once you’ve used the database, you can start creating tables, indexes, and other structures using the steps discussed next in “Creating Tables”.
Before we move on to creating other structures, let’s discuss a few features and limitations of creating databases. First, let’s see what happens if you create a database that already exists:
mysql> CREATE DATABASE lucy; ERROR 1007 (HY000): Can't create database lucy; database exists
You can avoid this error by adding the IF NOT EXISTS
keyword phrase to the statement:
mysql> CREATE DATABASE IF NOT EXISTS lucy; Query OK, 0 rows affected (0.00 sec)
You can see that MySQL didn’t complain, but it didn’t do anything either: the 0 rows affected
message indicates that no data was changed. This addition is useful when you’re adding SQL statements to a script: it prevents the script from aborting on error.
Let’s discuss how to choose database names and the use of character case. Database names define physical directory (or folder) names on disk. On some operating systems, directory names are case-sensitive; on others, case doesn’t matter. For example, Unix-like systems such as Linux and Mac OS X are typically case-sensitive, while Windows isn’t. The result is that database names have the same restrictions: when case matters to the operating system, it matters to MySQL. For example, on a Linux machine, LUCY
, lucy
, and Lucy
are different database names; on Windows, they refer to just one database. Using incorrect capitalization under Linux or Mac OS X will cause MySQL to complain:
mysql> select actOr.Actor_id from ACTor; ERROR 1146 (42S02): Table sakila.ACTor doesn't exist
but under Windows, this will normally work. To make your SQL machine-independent, we recommend that you consistently use lowercase names for databases (and for tables, columns, aliases, and indexes).
There are other restrictions on database names. They can be at most 64 characters in length. You also shouldn’t use MySQL reserved words—such as SELECT
, FROM
, and USE
—as names for structures; these can confuse the MySQL parser, making it impossible to interpret the meaning of your statements. There’s a way around this problem: you can enclose the reserved word with the backtick symbol (‵
) on either side, but it’s more trouble remembering to do so than it’s worth. In addition, you can’t use selected characters in the names: specifically, you can’t use the forward slash, backward slash, semicolon, and period characters, and a database name can’t end in whitespace. Again, the use of these characters confuses the MySQL parser and can result in unpredictable behavior. For example, here’s what happens when you insert a semicolon into a database name:
mysql> CREATE DATABASE IF NOT EXISTS lu;cy; Query OK, 1 row affected (0.00 sec) ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near cy at line 1
Since more than one SQL statement can be on a single line, the result is that a database lu
is created, and then an error is generated by the very short, unexpected SQL statement cy;
.
This section covers topics on table structures. We show you how to:
Understand and choose column types
Understand and choose keys and indexes
Use the proprietary MySQL AUTO_INCREMENT
feature
When you finish this section, you’ll have completed all of the basic material on creating database structures; the remainder of this chapter covers the sample sakila
database, and how to alter and remove existing structures.
For our examples in this section, we’ll assume that the database sakila
hasn’t been created. If you want to follow the examples, and you have already loaded the database, you can drop it for this section and reload it later; dropping it removes the database, tables, and all of the data, but the original is easy to restore by following the steps in Chapter 1. Here’s how you drop it temporarily:
mysql> DROP DATABASE sakila; Query OK, 4 rows affected (0.06 sec)
The DROP
statement is discussed further at the end of this chapter in “Deleting Structures”.
To begin, create the database sakila
using the statement:
mysql> CREATE DATABASE sakila; Query OK, 1 row affected (0.00 sec)
Then select the database with:
mysql> USE sakila; Database changed
We’re now ready to begin creating the tables that’ll hold our data. Let’s create a table to hold actor details. For now, we’re going to have a simplified structure, and talk about more complexity later. Here’s the statement that we use:
mysql> CREATE TABLE actor ( -> actor_id SMALLINT UNSIGNED NOT NULL DEFAULT 0, -> first_name VARCHAR(45) DEFAULT NULL, -> last_name VARCHAR(45), -> last_update TIMESTAMP, -> PRIMARY KEY (actor_id) -> ); Query OK, 0 rows affected (0.01 sec)
Don’t panic: even though MySQL reports that zero rows were affected, it’s definitely created the table:
mysql> SHOW TABLES;------------------
| Tables_in_sakila |------------------
| actor |------------------
1 row in set (0.01 sec)
Let’s consider all this in detail. The CREATE TABLE
statement has three major sections:
The CREATE TABLE
statement, which is followed by the table name to create. In this example, it’s actor
.
A list of one or more columns to add to the table. In this example, we’ve added quite a few: actor_id SMALLINT UNSIGNED NOT NULL DEFAULT 0
, first_name VARCHAR(45) DEFAULT NULL
, last_name VARCHAR(45)
, and last_update TIMESTAMP
. We’ll discuss these in a moment.
Optional key definitions. In this example, we’ve defined a single key: PRIMARY KEY (actor_id)
. We’ll discuss keys and indexes in detail later in this section.
Notice that the CREATE TABLE
component is followed by an opening parenthesis that’s matched by a closing parenthesis at the end of the statement. Notice also that the other components are separated by commas. There are other elements that you can add to a CREATE TABLE
statement, and we’ll discuss some in a moment.
Let’s discuss the column specifications. The basic syntax is as follows: __name type__ [NOT NULL | NULL] [DEFAULT __value__]
. The `name` field is the column name, and it has the same limitations as database names, as discussed in the previous section. It can be at most 64 characters in length, backward and forward slashes aren’t allowed, periods aren’t allowed, it can’t end in whitespace, and case sensitivity is dependent on the underlying operating system. The `type` defines how and what is stored in the column; for example, we’ve seen that it can be set to VARCHAR
for strings, SMALLINT
for numbers, or TIMESTAMP
for a date and time.
If you specify NOT NULL
, a row isn’t valid without a value for the column; if you specify NULL
or omit the clause, a row can exist without a value for the column. If you specify a `value` with the DEFAULT
clause, it’ll be used to populate the column when you don’t otherwise provide data; this is particularly useful when you frequently reuse a default value such as a country name. The `value` must be a constant (such as 0
, "cat"
, or 20060812045623
), except if the column is of the type TIMESTAMP
. Types are discussed in detail later in this section.
The NOT NULL
and DEFAULT
features can be used together. If you specify NOT NULL
and add a DEFAULT
value, the default is used when you don’t provide a value for the column. Sometimes, this works fine:
mysql> INSERT INTO actor(first_name) values ("John"); Query OK, 1 row affected (0.01 sec)
And sometimes it doesn’t:
mysql> INSERT INTO actor(first_name) values ("Elisabeth"); ERROR 1062 (23000): Duplicate entry 0 for key actor.PRIMARY
Whether it works or not is dependent on the underlying constraints and conditions of the database: in this example, actor_id
has a default value of 0
, but it’s also the primary key. Having two rows with the same primary-key value isn’t permitted, and so the second attempt to insert a row with no values (and a resulting primary-key value of 0
) fails. We discuss primary keys in detail later in this section.
Column names have fewer restrictions than database and table names. What’s more, they’re not dependent on the operating system: the names are case-insensitive and portable across all platforms. All characters are allowed in column names, though if you want terminate them with whitespace or include periods (or other special characters such as the semicolon), you’ll need to enclose the name with a backtick symbol (```) on either side. We recommend that you consistently choose lowercase names for developer-driven choices (such as database, alias, and table names) and avoid characters that require you to remember to use backticks. Naming the columns, as well as other database objects is something of a personal preference when starting anew, or a matter of following standards when working on an existing codebase. We recommend avoiding repeating yourself. Column name actor_first_name
is going to look redundant when table name precedes it (e.g. in a complex join query): actor.actor_first_name
or actor.first_name
. Usually, one exception is done to that: the ubiquitous id
column name should either not be used or have the table name prepended for clarity. You may use sakila
, world
, or employee
example databases to get an inspiration. Another good practice is to use the underscore character to separate words; you could use underscores or dashes, or omit the word-separating formatting altogether. However, “CamelCase” is harder to read. As with database and table names, the longest column name is 64 characters in length.
Because not everyone wants to store English strings, it’s important that a database server be able to manage non-English characters and different ways of sorting characters. When you’re comparing or sorting strings, how MySQL evaluates the result depends on the character set and collation used. Character sets define what characters can be stored; for example, you may need to store non-English characters such as ٱ or ü. A collation defines how strings are ordered, and there are different collations for different languages: for example, the position of the character ü in the alphabet is different in two German orderings, and different again in Swedish and Finnish.
In our previous string-comparison examples, we ignored the collation and character-set issue, and just let MySQL use its defaults; in versions of MySQL prior to 8.0, the default character set is latin1
, and the default collation is latin1_swedish_ci
. MySQL 8.0 changed the defaults, and now the default character set is utf8mb4
, and the default collation is utf8mb4_0900_ai_ci
. MySQL can be configured to use different character sets and collation orders at the connection, database, table, and column levels. Outputs below come from MySQL 8.0.
You can list the character sets available on your server with the SHOW CHARACTER SET
command. This shows a short description for each character set, its default collation, and the maximum number of bytes used for each character in that character set:
mysql> SHOW CHARACTER SET;----------
------------------------------------------------------
--------+ | Charset | Description | Default collation | Maxlen |----------
------------------------------------------------------
--------+ | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | binary | Binary pseudo charset | binary | 1 | | cp1250 | Windows Central European | cp1250_general_ci | 1 | | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 | | cp1256 | Windows Arabic | cp1256_general_ci | 1 | | cp1257 | Windows Baltic | cp1257_general_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | cp852 | DOS Central European | cp852_general_ci | 1 | | cp866 | DOS Russian | cp866_general_ci | 1 | | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | | macce | Mac Central European | macce_general_ci | 1 | | macroman | Mac West European | macroman_general_ci | 1 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | utf16 | UTF-16 Unicode | utf16_general_ci | 4 | | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 | | utf32 | UTF-32 Unicode | utf32_general_ci | 4 | | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |----------
------------------------------------------------------
--------+ 41 rows in set (0.00 sec)
For example, the latin1
character set is actually the Windows code page 1252 that supports West European languages. The default collation for this character set is latin1_swedish_ci
, which follows Swedish conventions to sort accented characters (English is handled as you’d expect). This collation is case-insensitive, as indicated by the letters ci
. Finally, each character takes up one byte. By comparison, if you use the default utf8mb4
character set, each character would take up to four bytes of storage. Sometimes, it makes sense to change that default. For example, there’s no reason to store base64-encoded data (which, by definition, is ASCII) in utf8mb4
. With a 128 character wide column, at million rows you’re looking at approximately 350MiB of overhead on charset alone in the worst case.
Similarly, you can list the collation orders and the character sets they apply to:
mysql> SHOW COLLATION;---------------------
---------------
-------------------
------------------------
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |---------------------
---------------
-------------------
------------------------
| armscii8_bin | armscii8 | 64 | | Yes | 1 | PAD SPACE | | armscii8_general_ci | armscii8 | 32 | Yes | Yes | 1 | PAD SPACE | | ascii_bin | ascii | 65 | | Yes | 1 | PAD SPACE | | ascii_general_ci | ascii | 11 | Yes | Yes | 1 | PAD SPACE | ... | greek_bin | greek | 70 | | Yes | 1 | PAD SPACE | | greek_general_ci | greek | 25 | Yes | Yes | 1 | PAD SPACE | | hebrew_bin | hebrew | 71 | | Yes | 1 | PAD SPACE | | hebrew_general_ci | hebrew | 16 | Yes | Yes | 1 | PAD SPACE | ... | utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE | | utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 | PAD SPACE |---------------------
---------------
-------------------
------------------------
272 rows in set (0.02 sec)
You can see the current defaults on your server as follows:
mysql> SHOW VARIABLES LIKE c%;--------------------------
--------------------------------+ | Variable_name | Value |--------------------------
--------------------------------+ ... | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql-8.0/charsets/ | ... | collation_connection | utf8mb4_0900_ai_ci | | collation_database | utf8mb4_0900_ai_ci | | collation_server | utf8mb4_0900_ai_ci | ...--------------------------
--------------------------------+ 21 rows in set (0.00 sec)
When you’re creating a database, you can set the default character set and sort order for the database and its tables. For example, if you want to use the utf8mb4
character set and the utf8mb4_ru_0900_as_cs
(case-sensitive) collation order, you would write:
mysql> CREATE DATABASE rose DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_ru_0900_as_cs; Query OK, 1 row affected (0.00 sec)
Usually, there’s no need to do this if you’ve installed your MySQL correctly for your language and region, and if you’re not planning on internationalizing your application. With utf8mb4
being the default since MySQL 8.0, there’s even less need to change the charset. You can also control the character set and collation for individual tables or columns, but we won’t go into the detail of how to do that here.
This section briefly describes other features of the MySQL CREATE TABLE
statement. It includes an example using the IF NOT EXISTS
feature, and a list of advanced features and where to find more about them in this book. The statement shown is the full representation of the table taken from sakila database, unlike a previous simplified example.
You can use the IF NOT EXISTS
keyword phrase when creating a table, and it works much as it does for databases. Here’s an example that won’t report an error even when the actor
table exists:
mysql> CREATE TABLE IF NOT EXISTS actor ( -> actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT, -> first_name VARCHAR(45) NOT NULL, -> last_name VARCHAR(45) NOT NULL, -> last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -> PRIMARY KEY (actor_id), -> KEY idx_actor_last_name (last_name) -> ); ) Query OK, 0 rows affected, 1 warning (0.00 sec)
You can notice that 0 rows are affected, and a warning is reported. Let’s take a look:
mysql> SHOW WARNINGS;-------
------------------------------------
| Level | Code | Message |-------
------------------------------------
| Note | 1050 | Table actor already exists |-------
------------------------------------
1 row in set (0.01 sec)
There are a wide range of additional features you can add to a CREATE TABLE
statement, only few of which are present in this more full statement. Many of these are advanced and aren’t discussed in this book, but you can find more information in the MySQL manual under the heading “CREATE TABLE Statement.” These additional features include:
AUTO_INCREMENT
feature for numeric columnsThis feature allows you to automatically create unique identifiers for a table. We discuss it in detail later in this chapter in “The AUTO_INCREMENT Feature”.
You can add a comment to a column; this is displayed when you use the SHOW CREATE TABLE
command that we discuss later in this section.
You can tell MySQL to check whether data in one or more columns matches data in another table. For example, sakila
database has a foreign key constraint on a city_id
column of the address
table, referring to the city
table’s city_id
column. That means, it’s impossible to have an address in a city not present in the city
table. We’ll take a deeper look at foreign key constraints in [Link to Come]. Not every storage engine in MySQL supports foreign keys.
If you create a table using the keyword phrase CREATE TEMPORARY TABLE
, it’ll be removed (dropped) when the monitor connection is closed. This is useful for copying and reformatting data because you don’t have to remember to clean up. Sometimes, they are also used as an optimization to hold some intermediate data.
You can control a wide range of features of the table using table options. These include the starting value of AUTO_INCREMENT
, the way indexes and rows are stored, and options to override the information that the MySQL query optimizer gathers from the table. It’s also possible to specify generated columns, containing data like sum of two other columns, as well as indexes on such columns.
Some storage engines in MySQL allow you to specify and control what type of internal structure—such as a B-tree or hash table—MySQL uses for its indexes. You can also tell MySQL that you want a full text or spatial data index on a column, allowing special types of search.
MySQL supports different partitioning strategies, which you can select at the table creation time, as well as at a later time. We will cover partitioning later in [Link to Come].
You can check the CREATE TABLE
statement for a table using the SHOW CREATE TABLE
statement introduced in Chapter 3. This often shows you output that includes some of the advanced features we’ve just discussed; the output rarely matches what you actually typed to create the table. Here’s an example for the actor
table:
mysql> SHOW CREATE TABLE actorG Table: actor Create Table: CREATE TABLE `actor` ( `actor_id` smallint unsigned NOT NULL AUTO_INCREMENT, `first_name` varchar(45) NOT NULL, `last_name` varchar(45) NOT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`actor_id`), KEY `idx_actor_last_name` (`last_name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec)
We’ve reformatted the output slightly to fit better in this book. You’ll notice that the output includes content added by MySQL that wasn’t in our original CREATE TABLE
statement:
The names of the table and columns are enclosed in backticks. This isn’t necessary, but it does avoid any parsing problems that can occur through using reserved words and special characters, as discussed previously
An additional default ENGINE
clause is included, which explicitly states the table type that should be used. The setting in a default installation of MySQL is InnoDB
, so it has no effect in this example
An additional DEFAULT CHARSET=utf8mb4
clause is included, which tells MySQL what character set is used by the columns in the table. Again, this has no effect in a default installation
This section describes the column types you can use in MySQL. It explains when each should be used and any limitations it has. The types are grouped by their purpose. We’ll cover widely-used datatypes, and mention more advanced or less used types in passing. That doesn’t mean they have no use, but consider learning about them as an exercise. Most likely, you will not remember each of the data types and its particular intricacies, and that’s okay. It’s worth re-reading this chapter later, and consulting with MySQL documentation on the topic to keep your knowledge up to date.
We will start with numeric data types, and more specifically with integer types, or the types holding specific whole number. First, the two most popular integer types.
INT[(__width__)] [UNSIGNED] [ZEROFILL]
The most commonly used numeric type. Stores integer (whole number) values in the range –2,147,483,648 to 2,147,483,647. If the optional UNSIGNED
keyword is added, the range is 0 to 4,294,967,295. The keyword INT
is short for INTEGER
, and they can be used interchangeably. An INT
column requires four bytes of storage space.
INT
, as well as other integer types, has two properties specific to MySQL: optional `width` and ZEROFILL
arguments. They are not part of an SQL standard, and as of MySQL 8.0 are deprecated. Still, you will surely notice them in a lot of codebases, so we will briefly cover both of them.
The `width` parameter specifies the display width, which can be read by applications as part of the column metadata. Contrary to parameters in a similar position for other data types, this parameter has no effect on the storage characteristics of a particular integer type, and does not constrain the usable range of values. INT(4)
and INT(32)
are same for the purpose of data storage.
ZEROFILL
is an additional argument, which is used to left-pad the values with zeros up to the length, specified by the `width`. If you use ZEROFILL
, MySQL automatically adds UNSIGNED
to the declaration (since zero filling makes sense only in the context of positive numbers).
In a few applications where ZEROFILL
and `width` are useful, LPAD()
function can be used, or numbers can be stored formatted in CHAR
column.
BIGINT[(__width__)] [UNSIGNED] [ZEROFILL]
In the world of growing data sizes, having tables with count of rows in the billions is getting common. Even simple id
-type columns might need a wider range than a regular INT
provides. BIGINT
solves that problem. It is a large integer type with a signed range of -9223372036854775808 to 9223372036854775807. Unsigned BIGINT
can store numbers from 0 to 18446744073709551615. Column of this type will require eight bytes of storage.
Internally, all calculations within MySQL are done using signed BIGINT
or DOUBLE
values. The important consequence of that is that you should be extremely careful when dealing with extremely large numbers. First, unsigned big integers larger than 9223372036854775807 should only be used with bit functions. Second, if a result of a arithmetical operation is larger than 9223372036854775807, unexpected results might be observed.
For example:
mysql> CREATE TABLE test_bigint (id BIGINT UNSIGNED); mysql> INSERT INTO test_bigint VALUES (18446744073709551615); Query OK, 1 row affected (0.01 sec) mysql> INSERT INTO test_bigint VALUES (18446744073709551615-1); Query OK, 1 row affected (0.01 sec) mysql> INSERT INTO test_bigint VALUES (184467440737095516*100); ERROR 1690 (22003): BIGINT value is out of range in (184467440737095516 * 100)
Even though 18446744073709551600 is less than 18446744073709551615, since signed BIGINT
is used for multiplication internally, the out of range error is observed.
Data type SERIAL
can be used as an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE
. Unless you must optimize for data size and performance, consider using SERIAL
for your id
-like columns. Even UNSIGNED INT
runs out of range much quicker than you’d expect, and in the worst possible time.
We will only touch other integer types briefly. However, you should keep in mind, that while it’s possible to store every integer as BIGINT
, that’s wasteful in terms of storage space. Moreover, since as we discussed the `width` parameter doesn’t constrain the range of values, different integer types can be used for that purpose.
SMALLINT[(__width__)] [UNSIGNED] [ZEROFILL]
As expected from its name, this type holds a small integer, with range from -32768 to 32767 signed, and from 0 to 65535 unsigned. It takes two bytes of storage.
TINYINT[(__width__)] [UNSIGNED] [ZEROFILL]
Even smaller integer, and the smallest numeric data type. Range of this type is -128 to 127 signed and 0 to 255 unsigned. It only takes one byte of storage.
BOOL[(__width__)]
Short for BOOLEAN
, and a synonym for TINYINT(1)
. Usually, boolean types only accept two values: true or false. However, since BOOL
in MySQL is an integer type, you can store values from -128 to 127 there. 0 will be treated as false, and all nonzero values as true. It’s also possible to use special true
and false
aliases for 1 and 0 respectively.
mysql> CREATE TABLE test_bool (i BOOL); Query OK, 0 rows affected (0.04 sec) mysql> INSERT INTO test_bool VALUES (true),(false); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> INSERT INTO test_bool VALUES (1),(0),(-128),(127); Query OK, 4 rows affected (0.02 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> SELECT i, IF(i,true,false) FROM test_bool;------
----------------------+ | i | IF(i,true,false) |------
----------------------+ | 1 | true | | 0 | false | | 1 | true | | 0 | false | | -128 | true | | 127 | true |------
----------------------+ 6 rows in set (0.01 sec)
MEDIUMINT[(__width__)] [UNSIGNED] [ZEROFILL]
Another type of integer. Takes 3 bytes of storage space. Stores values in the signed range of -8388608 to 8388607, and unsigned range of 0 to 16777215.
BIT[(__M__)]
This is a special type used to store bit values. `M` specifies number of bits per value and defaults to 1 if omitted. MySQL uses a b'__value__'
syntax for the binary values.
Both DECIMAL
and NUMERIC
data types in MySQL are the same. Thus we will only describe DECIMAL
here, and everything will apply to NUMERIC
. The main difference between fixed-point types (or type, in MySQL case) and floating-point types is precision. For fixed-point types, the value retrieved is identical to the value stored; this isn’t always the case with other types that contain decimal points, such as the FLOAT
and DOUBLE
types described later. That is the most important property of the DECIMAL
data type.
DECIMAL[(__width__[,__decimals__])] [UNSIGNED] [ZEROFILL]
A commonly used numeric type. Stores a fixed-point number such as a salary or distance, with a total of `width` digits of which some smaller number are `decimals` that follow a decimal point. The maximum value of `width` is 255. For example, a column declared as price DECIMAL(6,2)
should be used to store values in the range –9999.99 to 9999.99. price DECIMAL(10,4)
would allow values like 123456.1234.
Prior to MySQL version 5.7, if you tried to store a value that’s outside this range, it would be stored as the closest value in the allowed range. For example, 100 would be stored as 99.99, and –100 would be stored as –99.99. Starting with 5.7, however, special SQL mode is set, which prohibits this, and other unsafe behaviors: STRICT_TRANS_TABLES
. Using old behavior is possible, but could result in a data loss.
SQL modes are special settings that control behavior of MySQL when it comes to queries. For example, as you can see above, they can restrict “unsafe” behavior. Other modes might affect how queries are interpreted. We will discuss SQL modes in a little more depth in [Link to Come].
The `width` is optional, and a value of 10 is assumed when this is omitted. The number of `decimals` is optional and, when omitted, a value of 0 is assumed; the maximum value of `decimals` should be two less than the value of `width`. The maximum value of `width` is 65, and `decimals` is 30.
If you’re storing only positive values, you can use the UNSIGNED
keyword as described for INT
. If you want zero padding, use the ZEROFILL
keyword for the same behavior as described for INT
. The keyword DECIMAL
has three identical, interchangeable alternatives: DEC
, NUMERIC
, and FIXED
.
Values in DECIMAL
column are stored using a binary format. This format uses four bytes for every nine digits.
In “Fixed-point types”, we discussed the fixed-point DECIMAL
type. There are two other types that support decimal points: DOUBLE
(also known as REAL
) and FLOAT
. They’re designed to store approximate numeric values rather than the exact values stored by DECIMAL
.
Why would you want approximate values? The answer is that many numbers with a decimal point are approximations of real quantities. For example, suppose you earn $50,000 per annum and you want to store it as a monthly wage. When you convert it to a per-month amount, it’s $4,166 plus 66 and 2/3rds cents. If you store this as $4,166.67, it’s not exact enough to convert to a yearly wage (since 12 multiplied by $4,166.67 is $50,000.04). However, if you store 2/3rds with enough decimal places, it’s a closer approximation. You’ll find that it is accurate enough to correctly multiply to obtain the original value in a high-precision environment such as MySQL, using only a bit of rounding. That’s where DOUBLE
and FLOAT
are useful: they let you store values such as 2/3rds or pi with a large number of decimal places, allowing accurate approximate representations of exact quantities. You can later use ROUND()
function to restore results to a given precision.
Let’s continue the previous example using DOUBLE
. Suppose you create a table as follows:
mysql> CREATE TABLE wage (monthly DOUBLE); Query OK, 0 rows affected (0.09 sec)
You can now insert the monthly wage using:
mysql> INSERT INTO wage VALUES (50000/12); Query OK, 1 row affected (0.00 sec)
And see what’s stored:
mysql> SELECT * FROM wage;----------------
| monthly |----------------
| 4166.666666666 |----------------
1 row in set (0.00 sec)
However, when you multiply it to a yearly value, you get a high precision approximation:
mysql> SELECT monthly*12 FROM wage;--------------------
| monthly*12 |--------------------
| 49999.999999992004 |--------------------
1 row in set (0.00 sec)
To get the original value back, you still need to perform a rounding with a desired precision. For example, your business might require precision to five decimal places. In this case, you could restore the original:
mysql> SELECT ROUND(monthly*12,5) FROM wage;---------------------
| ROUND(monthly*12,5) |---------------------
| 50000.00000 |---------------------
1 row in set (0.00 sec)
But precision to eight decimal places would not result in the original value:
mysql> SELECT ROUND(monthly*12,8) FROM wage;---------------------
| ROUND(monthly*12,8) |---------------------
| 49999.99999999 |---------------------
1 row in set (0.00 sec)
It’s important to understand the imprecise and approximate value of floating-point data types.
Here are the details of the DOUBLE
and FLOAT
types:
FLOAT[(width, decimals)] [UNSIGNED] [ZEROFILL]`or`FLOAT[(precision)] [UNSIGNED] [ZEROFILL]
Stores floating-point numbers. It has two optional syntaxes: the first allows an optional number of `decimals` and an optional display `width`, and the second allows an optional `precision` that controls the accuracy of the approximation measured in bits. Without parameters, the type stores small, four-byte, single-precision floating-point values; usually, you use it without providing any parameters. When `precision` is between 0 and 24, the default behavior occurs. When `precision` is between 25 and 53, the type behaves as for DOUBLE
. The `width` has no effect on what is stored, only on what is displayed. The UNSIGNED
and ZEROFILL
options behave as for INT
.
DOUBLE[(__width__, __decimals__)] [UNSIGNED] [ZEROFILL]
Stores floating-point numbers. It has one optional syntax: it allows an optional number of `decimals` and an optional display `width`. Without parameters, the type stores normal, eight-byte, double-precision floating point values; usually, you use it without providing any parameters. The `width` has no effect on what is stored, only on what is displayed. The UNSIGNED
and ZEROFILL
options behave as for INT
. The DOUBLE
type has two identical synonyms: REAL
and DOUBLE PRECISION
.
String data types are used to store text, and, less obviously, binary data. In MySQL, the string data types are CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET.
[NATIONAL] VARCHAR(__width__) [CHARACTER SET __charset_name__] [COLLATE __collation_name__]
Probably, the single most commonly used string type. VARCHAR
stores variable-length strings up to a maximum `width`. The maximum value of `width` is 65,535 characters. Most of the information applicable to this type will apply to other string types as well.
CHAR
and VARCHAR
types are very similar, but have a few important distinctions. VARCHAR
incurs one or two extra bytes of overhead to store the value of the string, depending on whether the value is smaller than or larger than 255 bytes. Note that this is different from string length in characters, as certain character might require up to 4 bytes of space. It might seem obvious then, that VARCHAR
is less efficient. However, that is not always true. As VARCHAR
stores strings of arbitrary length (up to the `width`), shorter strings stored will require less storage than CHAR
of similar `width`.
Another difference betwee CHAR
and VARCHAR
is their handling of trailing spaces. VARCHAR
retains trailing spaces up to the specified column length, and will truncate the excess, producing a warning. As will be shown later, CHAR
values are right-padded to the column length, and the trailing spaces aren’t preserved. For VARCHAR
, trailing spaces are significant, unless they are trimmed, and will count as unique values. ' ' and ' ' are different values.
TEXT
and CHAR
exhibit the same behavior. You can use BLOB
type described below, which will accept the above statement.
Both data types disallow storage of values longer than `width`, unless strict SQL mode is disabled. We don’t recommend enabling legacy behavior, as it might result in an unaccounted data loss.
Sorting and comparison of VARCHAR
, CHAR
, and TEXT
types happens according to the collation of the character set assigned. You can see that it is possible to specify character set, as well as collation for each individual string type column. It’s also possible to specify binary
character set, which effectively converts VARCHAR
into VARBINARY
. Don’t mistake binary
charset for a BINARY
attribute for a charset. The latter is a MySQL-only shorthand to specify a binary (_bin
) collation.
NATIONAL
attribute is a standard SQL way to specify that a string type column must use a predefined character set. MySQL uses utf8
as such charset. It’s important to note that MySQL versions 5.7 and 8.0 disagree on what is utf8
exactly: former using it as an alias for utf8mb3
, and latter — for utf8mb4
. Thus, it is best to not use the NATIONAL
attribute, as well as ambiguous aliases. The best practice with any text-related columns and data is to be as unambiguous and specific as possible.
[NATIONAL] CHAR[(__width__)] [CHARACTER SET __charset_name__] [COLLATE __collation_name__]
CHAR
stores a fixed-length string (such as a name, address, or city) of length `width`. If a `width` is not provided, CHAR(1)
is assumed. The maximum value of `width` is 255. As we discussed in the VARCHAR
section above, values in CHAR
columns are always stored at the specified length. Single letter stored in a CHAR(255)
column will take 255 bytes (in latin1
charset), and will be padded with spaces. The padding is removed when reading the data, unless PAD_CHAR_TO_FULL_LENGTH
SQL mode is enabled. Worth mentioning again that it means that strings stored in CHAR
will lose all of their trailing spaces.
Note that earlier, `width` of a CHAR
column was often associated with bytes. That’s not always the case now, and it’s definitely not the case by default. Multi-byte character sets, such as default utf8mb4
can result in a much larger values. InnoDB will actually encode fixed-length columns as variable-length columns, if their maximum size exceeds 768 bytes. Thus, in MySQL 8, by default, InnoDB will store CHAR(255)
as it would VARCHAR
. Let’s see a small example:
mysql> CREATE TABLE test_char_length( -> * utf8char CHAR(10) CHARACTER SET utf8mb4* -> , asciichar CHAR(10) CHARACTER SET binary -> ); Query OK, 0 rows affected (0.04 sec) mysql> INSERT INTO test_char_length VALUES (Plain text, Plain text); Query OK, 1 row affected (0.01 sec) mysql> INSERT INTO test_char_length VALUES (的開源軟體, Plain text); Query OK, 1 row affected (0.00 sec) mysql> SELECT LENGTH(utf8char), LENGTH(asciichar) FROM test_char_length;------------------
-------------------+ | LENGTH(utf8char) | LENGTH(asciichar) |------------------
-------------------+ | 10 | 10 | | 15 | 10 |------------------
-------------------+ 2 rows in set (0.00 sec)
As the values are left-aligned and right-padded with spaces, and any trailing spaces aren’t considered for CHAR
at all, it’s impossible to compare strings consisting of spaces alone. If you find yourself in a situation when that’s important, VARCHAR
is the data type to use.
BINARY[(__width__)]
and VARBINARY(__width__)
These types are very similar to CHAR
and VARCHAR
but store binary strings. Binary strings have the special binary
character set and collaction, and sorting them is dependent on the numeric values of the bytes in values stored. Instead of character strings, byte strings are stored. Remember that in VARCHAR
we described the binary
charset and BINARY
attribute. Only the binary
charset “converts” VARCHAR
or CHAR
into a respective BINARY
form. BINARY
attribute to a charset will not change the fact that character strings are stored. Unlike VARCHAR
and CHAR
, `width` here is exactly the number of bytes. When `width` is omitted for BINARY
, it defaults to 1.
Similar to CHAR
, data in the BINARY
column is padded on the right. However, that being a binary data, it’s padded using zero bytes, usually written as 0x00 or