Understanding Large Data Types

Before we discuss large data types, let's take a brief look at the history of why they are useful. Prior to SQL Server Version 7.0, the largest character, text, or binary field allowable was 255 bytes. Of course, as applications became more sophisticated, this became a restriction. Since Version 7.0, the maximum size of these data types has expanded to 8,000 bytes. This was a large jump, but it is still a bit lacking for some application requirements. Many applications have the need to store more than 8,000 bytes of data. Some of the more common reasons are

  • Long comments

  • Detailed descriptions

  • Resumes

  • Digitized photographs or pictures

  • Audio

  • Video

These are only the most common uses for large data types (greater than 8,000 bytes). If you have a requirement for this type of storage in your database, SQL Server 2000 provides the capability for storing binary large objects (BLOBs) as large as 2 gigabytes per row. This can be done by using one of the following three data types described in Table 21.1.

Table 21.1. Large Object Data Types
Data Type Description
TEXT Variable-length (non-Unicode) data that can contain a maximum length of 2,147,483,647 characters
NTEXT A variable-length Unicode field that can contain a maximum length of 1,073,741,823
IMAGE Variable-length binary data field that can contain a maximum length of 2,147,483,647 bytes

Working with BLOBs and using BLOBs are two very different actions. What you will learn in this section is how large objects are stored and how they are indexed inside a database. In the next section, you will learn how to manipulate them within your SQL scripts and how to gather information about them.

How Large Data Is Stored

When storing data in a database table, you usually do not worry about the size of the data. However, individual TEXT, NTEXT, and IMAGE data type columns can be a maximum of 2GB, which is too large to store in a single data row. In SQL Server 2000, smaller values of these data types can be stored directly in the row, but the values that are too large to fit are stored in a separate area than where the data for the other columns in the row are stored.

The 'TEXT IN ROW' option (which is set within a SQL Script) is used to specify how data contained in a TEXT, NTEXT, or IMAGE column is stored in the table. Storing text directly in the row would reduce the work performed by the server whenever this data was accessed. The server administrator uses the system stored procedure sp_Tableoption to turn on or off this option. To turn on this option for the Employees table, you would use the following syntax:

Exec sp_Tableoption 'Employees', 'text in row', 'ON'

In addition to just turning on this option, you could also specify the maximum amount of data that can be stored in the row, from 24 up to 7,000 bytes as shown:

Exec sp_Tableoption 'Orders', 'Text In Row', '1000'

Note

If you set this option 'ON' without specifying a limit, the default limit of 256 bytes will be used.


You can turn off this option by using the statement shown here:

Exec sp_tableoption 'Employees', 'text in row', 'OFF'

Although you will always work with each of these columns as if it were a single string of bytes, the data is not stored in that format. The data will be stored in a collection of 8KB pages, which are organized logically in an index-like structure to provide faster access to the data.

Data Storage Options

Although SQL Server 2000 allows the storage of small- to medium-sized TEXT, NTEXT, and IMAGE values directly in a data row, it is best to use this 'in row' feature where the data is usually read or written in one unit, and most SQL statements that access the table use the data in these large object columns. Unless the 'text in row' option is specified, any data for these three data types will be stored outside the data row. What will be stored in the row is a 16-byte text pointer that points to the root node of internal pointers for the pages that contain the large object data.

Text Not in Row Data

When the 'text in row' option is not set, any large data strings will be stored outside the data row. The server will insert a 16-byte text pointer into the data row that will point to the actual location of the large object. Although you will not see any difference in how you process these strings, the internal structure that SQL Server uses will vary slightly if there is less than 32KB of data or if there is more than 32KB of data. If the data is smaller than 32KB, the 16-byte text pointer in the data row points to a root structure that is 84 bytes long. This structure forms the root node of the tree structure that will be used to point to the different blocks that hold the actual data. The structure shown in Figure 21.1 is how the actual data row points to the root structure, which in turn points to the data blocks.

Figure 21.1. Storing BLOBs that are less than 32KB in length.


Although the data for a BLOB column is arranged logically, both the root and individual blocks are spread throughout the chain pages for the table. They are placed anywhere space is available. The size of each block is determined by the size of the data written by the application BLOB. Smaller blocks of data will be combined to fill a page. If there is less than 64 bytes of data, the entire object is stored in the root structure.

If the amount of data in a BLOB exceeds 32KB, SQL Server will start building intermediate nodes between the data blocks and the root node, as shown in Figure 21.2.

Figure 21.2. Storing BLOBs that are greater than 32KB in length.


The root structure and the data blocks are stored throughout the TEXT, NTEXT, or IMAGE pages; however, the intermediate nodes are stored in pages not shared between occurrences of BLOB columns. A page that contains intermediate nodes is not used to store any other types of data.

Text in Row Data

When the 'text in row' option is set, TEXT, NTEXT, and IMAGE strings are stored directly in the data row if

  • The length of the string is shorter than the specified limit or the default of 256 bytes if a length is not specified.

  • There is enough space available in the data row to hold the string.

When these strings are stored in the data row, the server does not need to access a separate page or set of pages to read or write the string. This means that the access to the large data columns is about as fast the standard-size character columns. However, if the string is longer than the 'text in row' option limit or the space available in the row, the pointers to the large object is stored in the row itself. These pointers are stored in the row if

  • The amount of space needed to store the pointers is smaller than the 'text in row' limit.

Note

The techniques the server uses when pointers are required for large strings that are stored in the row are the same as those when the 'text in row' option is OFF.


You should be aware of the effects that enabling this option will have on your database and performance. The effects of how the option 'text in row' is set are

  • You cannot use READTEXT, UPDATETEXT, or WRITETEXT statements with any TEXT, NTEXT, or IMAGE stored in the table. This requires you to work with the entire string at all times. You will not be able to modify portions of the string.

  • If the option is turned on after the table contains data, any existing large object columns will not be converted to in-row strings until they are updated.

  • If the option is turned off after the table contains data, the table will be locked and any in-row TEXT, NTEXT, or IMAGE strings will be immediately converted to regular TEXT, NTEXT, or IMAGE strings. The amount of processing required to perform this conversion depends on how many strings must be converted.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset