This last chapter will focus on the different tools the SQLite platform provides to help you analyze your app's databases. Except for the ANALYZE statement, which can be executed with the sqlite3_exec command, the other tools are external software programs. Specifically, in this chapter we will discuss and explore, through examples, the following technologies:
The ANALYZE statement
The sqldiff tool
The sqlite3_analyzer tool
Other than the ANALYZE statement, which we will explore in Swift, all the other tools are external tools that can help you analyze your databases for support and/or for development optimization.
The Analyze Statement
The role of the ANALYZE statement is to gather information on tables and indexes in a database through statistics. SQLite accomplishes this task by creating a sqlite_stats1 table in your app's database when the ANALYZE statement is executed. SQLite passes this information to the SQLite Query Optimizer, which in turn uses the collected information to use the best query algorithm for the best performance.
If you build the database or enable SQLITE3_ENABLE_STAT3 or SQLITE3_ENABLE_STAT4, additional histogram information is gathered and stored in the sqlite3_stat3 and sqlite3_stat4 tables respectively.
To run the ANALYZE functionality in a database, you simply need to execute the ANALYZE statement, as follows:
ANALYZE schema
This will create a stat1 table for the whole database. If you only want to target a table, you could issue a command like:
ANALYZE schema.tablename
Of course, you can also build a stat1 table for an index in a table using the following command:
ANALYZE schema with or without the schema prefix:
ANALYZE schema.indexname
or,
ANALYZE indexname
In Swift, the ANALYZE statement can be executed like any other query, as follows:
func analyzeDatabase(_ name:String){
let sql:String = "ANALYZE Chinook_Sqlite.sqlite"
if(sqlite3_open(dbPath.path!, &db)==SQLITE_OK){
if(sqlite3_prepare_v2(db, sql.cString(using: String.Encoding.utf8)!, -1, &sqlStatement, nil)==SQLITE_OK){
while(sqlite3_step(sqlStatement)==SQLITE_ROW){
let output = String(cString: UnsafePointer<Int8>(sqlite3_column_text(sqlStatement, 0)))
print(output)
}
}
}
sqlite3_close(db)
}
The sqldiff Tool
The sqldiff utility is an external application from SQLite. You can download it along with SQLite3_Analyzer—and SQLite, for that matter–from the download page at www.sqlite.org . Look for the zip package for OSX x86.
You only need to unzip the compressed file to a handy directory on your OSX machine. These tools are equally available for Windows and Linux. The instructions that follow are for OSX only; however, I am sure they work the same way on Windows and Linux.
The sqldiff utility is used to compare two SQLite databases and generates a SQL script to convert the source database (database1) into the target database (database2). The sqldiff tool is easy to use. A sample output is shown in Figure 12-1. From a Terminal window, navigate to the directory where you unzipped the utility, and sqldiff will issue the following command:
Figure 12-1. sqldiff output to standard output
sqldiff database1.sqlite databse2.sqlite
To illustrate, I will run the utility on the Chinook_SQLite.sqlite database that I used for the backups. The second database is DbToBackup.sqlite, which is another empty database I created for the backups.
The utility has a number of options that you can use as well. For instance:
--changeset FILE
--lib or L
--primaryKey
--schema
--summary
--table TABLENAME
--transaction
--vtab
The –changeset option re-directs the output to a file. The –lib option loads a user-defined library prior to comparing the databases, like collating_sequences, for instance. If you prefer using the primary key in a table instead of the rowed, then–primaryKey is the way to go. Using the –schema flag, you can show the differences in the schema only, excluding the content. To pinpoint the changes that have occurred in two tables, use –summary. However, the actual changes won’t be displayed. Using the –table option allows you to compare specific table content. The –transaction option allows you to generate one large transaction for the complete operation. Finally, the –vtab choice works with virtual tables like FTS3, FTS5, and rtree tables.
The sqldiff utility works by comparing pairwise rowids, unless you are using the –primaryKey option. The output is generated as updates if the content is in similar tables. If the two databases have distinct tables, then the source tables are dropped (DELETE) and new ones created, along with using INSERT to insert the content.
There are, of course, some limitations with the current version of the tool. For instance, the utility only works on tables, and rowids must be accessible, unless you are using the –primaryKey option. Also, the content of virtual tables isn’t compared unless it results in a physical table. However, using the tool in this capacity can create corrupted databases.
The sqlite3_analyzer tool
The sqlite3_analyzer is a handy tool used to measure the effective use of the space in the tables in the target database. Like the sqldiff tool before, the sqlite3_analyzer is a command-line tool that is included in the same download package as the sqldiff tool.
The utility generates a text (ASCII)-based report in a text file. It is a human-readable file. To demonstrate, I will run the tool with the Chinook_Sqlite.sqlite database file from before.
From a Terminal window, navigate to the directory where the sqlite3_analyzer tool is situated and issue the following command:
sqlite3_analyzer database.sqlite
For example, here is the output of the DbToBackup.sqlite database:
Last login: Mon Sep 5 17:04:01 on ttys001
Kevins-MacBook-Air:∼ kevinlanguedoc$ /Users/kevinlanguedoc/Documents/sqlitetools/sqlite3_analyzer /Users/kevinlanguedoc/Documents/sqlitetools/DbToBackup.sqlite
/** Disk-Space Utilization Report For /Users/kevinlanguedoc/Documents/sqlitetools/DbToBackup.sqlite
Page size in bytes................................ 32768
Pages in the whole file (measured)................ 4
Pages in the whole file (calculated).............. 4
Pages that store data............................. 4 100.0 %
Pages on the freelist (per header)................ 0 0.0 %
Pages on the freelist (calculated)................ 0 0.0 %
Pages of auto-vacuum overhead..................... 0 0.0 %
Number of tables in the database.................. 4
Number of indices................................. 0
Number of defined indices......................... 0
Number of implied indices......................... 0
Size of the file in bytes......................... 131072
Bytes of user payload stored...................... 0 0.0 %
*** Page counts for all tables with their indices *****************************
CARS.............................................. 1 25.0 %
SQLITE_MASTER..................................... 1 25.0 %
SQLITE_SEQUENCE................................... 1 25.0 %
SQLITE_STAT1...................................... 1 25.0 %
*** Page counts for all tables and indices separately *************************
CARS.............................................. 1 25.0 %
SQLITE_MASTER..................................... 1 25.0 %
SQLITE_SEQUENCE................................... 1 25.0 %
SQLITE_STAT1...................................... 1 25.0 %
*** All tables ****************************************************************
Percentage of total database...................... 100.0 %
Number of entries................................. 3
Bytes of storage consumed......................... 131072
Bytes of payload.................................. 296 0.23 %
Average payload per entry......................... 98.67
Average unused bytes per entry.................... 43543.67
Maximum payload per entry......................... 141
Entries that use overflow......................... 0 0.0 %
Primary pages used................................ 4
Overflow pages used............................... 0
Total pages used.................................. 4
Unused bytes on primary pages..................... 130631 99.66 %
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 130631 99.66 %
*** Table CARS ****************************************************************
Percentage of total database...................... 25.0 %
Number of entries................................. 0
Bytes of storage consumed......................... 32768
Bytes of payload.................................. 0 0.0 %
B-tree depth...................................... 1
Average payload per entry......................... 0.0
Average unused bytes per entry.................... 0.0
Maximum payload per entry......................... 0
Entries that use overflow......................... 0
Primary pages used................................ 1
Overflow pages used............................... 0
Total pages used.................................. 1
Unused bytes on primary pages..................... 32760 99.976 %
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 32760 99.976 %
*** Table SQLITE_MASTER *******************************************************
Percentage of total database...................... 25.0 %
Number of entries................................. 3
Bytes of storage consumed......................... 32768
Bytes of payload.................................. 296 0.90 %
B-tree depth...................................... 1
Average payload per entry......................... 98.67
Average unused bytes per entry.................... 10783.67
Maximum payload per entry......................... 141
Entries that use overflow......................... 0 0.0 %
Primary pages used................................ 1
Overflow pages used............................... 0
Total pages used.................................. 1
Unused bytes on primary pages..................... 32351 98.7 %
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 32351 98.7 %
*** Table SQLITE_SEQUENCE *****************************************************
Percentage of total database...................... 25.0 %
Number of entries................................. 0
Bytes of storage consumed......................... 32768
Bytes of payload.................................. 0 0.0 %
B-tree depth...................................... 1
Average payload per entry......................... 0.0
Average unused bytes per entry.................... 0.0
Maximum payload per entry......................... 0
Entries that use overflow......................... 0
Primary pages used................................ 1
Overflow pages used............................... 0
Total pages used.................................. 1
Unused bytes on primary pages..................... 32760 99.976 %
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 32760 99.976 %
*** Table SQLITE_STAT1 ********************************************************
Percentage of total database...................... 25.0 %
Number of entries................................. 0
Bytes of storage consumed......................... 32768
Bytes of payload.................................. 0 0.0 %
B-tree depth...................................... 1
Average payload per entry......................... 0.0
Average unused bytes per entry.................... 0.0
Maximum payload per entry......................... 0
Entries that use overflow......................... 0
Primary pages used................................ 1
Overflow pages used............................... 0
Total pages used.................................. 1
Unused bytes on primary pages..................... 32760 99.976 %
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 32760 99.976 %
*** Definitions ***************************************************************
Page size in bytes
The number of bytes in a single page of the database file.
Usually 1024.
Number of pages in the whole file
The number of 32768-byte pages that go into forming the complete
database
Pages that store data
The number of pages that store data, either as primary B*Tree pages or
as overflow pages. The number at the right is the data pages divided by
the total number of pages in the file.
Pages on the freelist
The number of pages that are not currently in use but are reserved for
future use. The percentage at the right is the number of freelist pages
divided by the total number of pages in the file.
Pages of auto-vacuum overhead
The number of pages that store data used by the database to facilitate
auto-vacuum. This is zero for databases that do not support auto-vacuum.
Number of tables in the database
The number of tables in the database, including the SQLITE_MASTER table
used to store schema information.
Number of indices
The total number of indices in the database.
Number of defined indices
The number of indices created using an explicit CREATE INDEX statement.
Number of implied indices
The number of indices used to implement PRIMARY KEY or UNIQUE constraints
on tables.
Size of the file in bytes
The total amount of disk space used by the entire database files.
Bytes of user payload stored
The total number of bytes of user payload stored in the database. The
schema information in the SQLITE_MASTER table is not counted when
computing this number. The percentage at the right shows the payload
divided by the total file size.
Percentage of total database
The amount of the complete database file that is devoted to storing
information described by this category.
Number of entries
The total number of B-Tree key/value pairs stored under this category.
Bytes of storage consumed
The total amount of disk space required to store all B-Tree entries
under this category. This is the total number of pages used times
the pages size.
Bytes of payload
The amount of payload stored under this category. Payload is the data
part of table entries and the key part of index entries. The percentage
at the right is the bytes of payload divided by the bytes of storage
consumed.
Average payload per entry
The average amount of payload on each entry. This is just the bytes of
payload divided by the number of entries.
Average unused bytes per entry
The average amount of free space remaining on all pages under this
category on a per-entry basis. This is the number of unused bytes on
all pages divided by the number of entries.
Non-sequential pages
The number of pages in the table or index that are out of sequence.
Many filesystems are optimized for sequential file access, so a small
number of non-sequential pages might result in faster queries,
especially for larger database files that do not fit in the disk cache.
Note that after running VACUUM, the root page of each table or index is
at the beginning of the database file and all other pages are in a
separate part of the database file, resulting in a single non-
sequential page.
Maximum payload per entry
The largest payload size of any entry.
Entries that use overflow
The number of entries that use one or more overflow pages.
Total pages used
This is the number of pages used to hold all information in the current
category. This is the sum of index, primary, and overflow pages.
Index pages used
This is the number of pages in a table B-tree that hold only key (rowid)
information and no data.
Primary pages used
This is the number of B-tree pages that hold both key information and data.
Overflow pages used
The total number of overflow pages used for this category.
Unused bytes on index pages
The total number of bytes of unused space on all index pages. The
percentage at the right is the number of unused bytes divided by the
total number of bytes on index pages.
Unused bytes on primary pages
The total number of bytes of unused space on all primary pages. The
percentage at the right is the number of unused bytes divided by the
total number of bytes on primary pages.
Unused bytes on overflow pages
The total number of bytes of unused space on all overflow pages. The
percentage at the right is the number of unused bytes divided by the
total number of bytes on overflow pages.
Unused bytes on all pages
The total number of bytes of unused space on all primary and overflow
pages. The percentage at the right is the number of unused bytes
divided by the total number of bytes.
*******************************************************************************
The entire text of this report can be sourced into any SQL database
engine for further analysis. All of the text above is an SQL comment.
The data used to generate this report follows:
*/
BEGIN;
CREATE TABLE space_used(
name clob, -- Name of a table or index in the database file
tblname clob, -- Name of associated table
is_index boolean, -- TRUE if it is an index, false for a table
is_without_rowid boolean, -- TRUE if WITHOUT ROWID table
nentry int, -- Number of entries in the BTree
leaf_entries int, -- Number of leaf entries
depth int, -- Depth of the b-tree
payload int, -- Total amount of data stored in this table or index
ovfl_payload int, -- Total amount of data stored on overflow pages
ovfl_cnt int, -- Number of entries that use overflow
mx_payload int, -- Maximum payload size
int_pages int, -- Number of interior pages used
leaf_pages int, -- Number of leaf pages used
ovfl_pages int, -- Number of overflow pages used
int_unused int, -- Number of unused bytes on interior pages
leaf_unused int, -- Number of unused bytes on primary pages
ovfl_unused int, -- Number of unused bytes on overflow pages
gap_cnt int, -- Number of gaps in the page layout
compressed_size int -- Total bytes stored on disk
);
INSERT INTO space_used VALUES('sqlite_master','sqlite_master',0,0,3,3,1,296,0,0,141,0,1,0,0,32351,0,0,32768);
INSERT INTO space_used VALUES('sqlite_stat1','sqlite_stat1',0,0,0,0,1,0,0,0,0,0,1,0,0,32760,0,0,32768);
INSERT INTO space_used VALUES('cars','cars',0,0,0,0,1,0,0,0,0,0,1,0,0,32760,0,0,32768);
INSERT INTO space_used VALUES('sqlite_sequence','sqlite_sequence',0,0,0,0,1,0,0,0,0,0,1,0,0,32760,0,0,32768);
COMMIT;
This analysis report provides very detailed information on the use of space in the pages that are in the database. What follows is the extract on the Cars table. The primary concern to me as a database developer is the amount of free space in the database’s table. In this case, it is at 99.976 %, which is fantastic. This means that there isn’t any bloat in the pages.
A page is a unit of storage in a database—any database, really. In this table, only one page is used. However, that database is free to use as many pages as is needed to efficiently manage itself. When there is a lot of activity from inserts, deletes, and updates, the database can be become bloated with unused space in the pages.
In order to regain that space and optimize the database, you can execute the VACUUM command on the database, as we have discussed before, which removes empty space. You can also re-index the tables to optimize table I/O efficiency.
*** Table CARS ****************************************************************
Percentage of total database...................... 25.0 %
Number of entries................................. 0
Bytes of storage consumed......................... 32768
Bytes of payload.................................. 0 0.0 %
B-tree depth...................................... 1
Average payload per entry......................... 0.0
Average unused bytes per entry.................... 0.0
Maximum payload per entry......................... 0
Entries that use overflow......................... 0
Primary pages used................................ 1
Overflow pages used............................... 0
Total pages used.................................. 1
Unused bytes on primary pages..................... 32760 99.976 %
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 32760 99.976 %
Summary
This chapter is different from the other chapters in that the focus was primarily outside of Xcode and Swift. Most of these tools that we looked at are run in the Terminal. The sqldiff lets us compare two databases and copy the schema and content from one to another. The sqlite3_analyzer tool generates a report on how the space is used in the pages in the database. We also looked at the ANALYZE statement, which creates a stat1 table to store statistics on the tables, which are then used in the SQLite Query Optimizer to select the best algorithm to use to perform various queries against a database.
I hope this book serves you well.