An index consists of one or more columns used to uniquely identify each row within a table. Operating as a SAS object containing the values in one or more columns in a table, an index is composed of one or more columns and may be defined as numeric, character, or a combination of both.
There is no rule that says a table has to have an index, but they can often make information retrieval more efficient and considerably faster.
For example, if you know a specific part number and its location from a list of thousands, then you can look up the part and find its manufacturer, cost, and location far more efficiently than if you did not know this information.
When defining an index, you should first understand the purpose the index is to serve. The most important thing to keep in mind about indexes is that they should be created only when they are absolutely needed. Too many, or unnecessary, indexes use up computer resources. An index also takes up space and has to be updated any time a DELETE, INSERT, or UPDATE is performed on rows in a table. For this reason, care should be used when deciding when and what indexes to create.
To help determine when indexes are necessary, consider existing data as well as the way the base table(s) will be used. You also need to know what queries will be used and how they will access columns of data. If an index is used to specify some order within a table, such as manufacturer number or product number in the PRODUCTS table, you should fully assess what the impact of that index will be.
Sometimes the column(s) making up an index is obvious, and other times it is not. When determining whether an index provides any value, some very important rules should be kept in mind. An index should permit the greatest flexibility so every column in a table can be accessed and displayed. You can also improve query results by assigning indexes only to those columns that have many unique values or that you use regularly in joins.
When an index is specified for one or more tables, a join process may actually occur faster. The PROC SQL processor may use an index when certain conditions permit its use. Here are a few things to keep in mind before creating an index:
If the table is small, sequential processing may be just as fast, or faster, than processing with an index
If the page count as displayed in the CONTENTS procedure is less than 3 pages, avoid creating or using an index
Do not create more indexes than you absolutely need
If the data subset for the index is not small, sequential access may be more efficient than using the index
If the percentage of matches is approximately 15% or less then an index should be used
The costs associated with an index can outweigh its performance value – an index is updated each time when rows in a table are added, deleted, or modified.
Two types of indexes can be defined and used in PROC SQL: simple and composite. When a simple index is created, it references only a single column. In contrast, a composite index references two or more columns in a table.
A simple index is specifically defined for one column in a table and must be the same name as the column. Suppose you had to create an index consisting of product type (PRODTYPE) in the PRODUCTS table. Once created, the index becomes a separate object located in the SAS library.
PROC SQL; 1 2 3 CREATE INDEX PRODTYPE ON PRODUCTS(PRODTYPE); QUIT;
SAS Log Results
PROC SQL; 1 2 3 CREATE INDEX PRODTYPE ON PRODUCTS(PRODTYPE); NOTE: Simple index PRODTYPE has been defined. QUIT; NOTE: PROCEDURE SQL used: real time 0.37 seconds |
| The simple index is assigned a name of PRODTYPE, which must be the same as the column name. |
| The simple index is defined on the PRODUCTS table. |
| The PRODTYPE column in the PRODUCTS table is designated as the column to be used by the index. |
A composite index is specifically defined for two or more columns in a table and must have a different name from the columns. Suppose you had to create an index consisting of manufacturer number (MANUNUM) and product type (PRODTYPE) located in the PRODUCTS table. You should be aware that only one composite index is allowed per set of columns, but more than one composite index is allowed. The composite index, as with the simple index, becomes a separate object located in the SAS library.
SQL Code
PROC SQL; CREATE INDEX 1 2 3 MANUNUM_PRODTYPE ON PRODUCTS(MANUNUM,PRODTYPE); QUIT;
PROC SQL; CREATE INDEX 1 2 3 MANUNUM_PRODTYPE ON PRODUCTS (MANUNUM,PRODTYPE); NOTE: Composite index MANUNUM_PRODTYPE has been defined. QUIT; NOTE: PROCEDURE SQL used: real time 0.00 seconds |
| The composite index is assigned a name of MANUNUM_PRODTYPE, which is used to represent the MANUNUM and PRODTYPE column names. |
| The composite index is defined on the PRODUCTS table. |
| The MANUNUM and PRODTYPE columns in the PRODUCTS table are designated as the columns to be used by the index. |
The UNIQUE keyword prevents the entry of a duplicate value in an index. You should use this keyword with care because there may be times when more than one occurrence of a data value in a table is necessary. When multiple occurrences of the same value appear in a table, the UNIQUE keyword is rejected and the index is not created for that particular column.
Altering the attributes of a column that contains an associated index (simple or composite) does NOT prohibit the values in the altered column from using the index. But, if a column that contains an index is dropped, then the index is also dropped. Accordingly, when a column is dropped, any data in that index is also lost.
When one or more indexes are no longer needed, the DROP INDEX statement can be used to remove them. Suppose you determine that you no longer need the composite index MANUNUM_PRODTYPE (created earlier) because processing requirements have changed. The next example illustrates a single composite index being deleted from the SAS library.
SQL Code
PROC SQL; DROP INDEX MANUNUM_PRODTYPE FROM PRODUCTS; QUIT;
SAS Log Results
PROC SQL; DROP INDEX MANUNUM_PRODTYPE FROM PRODUCTS; NOTE: Index MANUNUM_PRODTYPE has been dropped. QUIT; NOTE: PROCEDURE SQL used: real time 0.00 seconds |
According to the ANSI SQL standard, two or more indexes can also be deleted in a DROP INDEX statement. The next example illustrates the MANUNUM and PRODTYPE indexes being deleted from the SAS library in a single DROP INDEX statement.
SQL Code
PROC SQL; DROP INDEX MANUNUM, PRODTYPE FROM PRODUCTS; QUIT;
PROC SQL; DROP INDEX MANUNUM, PRODTYPE FROM PRODUCTS; NOTE: Index MANUNUM has been dropped. NOTE: Index PRODTYPE has been dropped. QUIT; NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds cpu time 0.01 seconds |