An important element in table creation is table design. Table design incorporates how tables are structured — how rows and columns are defined, how indexes are created, and how columns refer to values in other columns. Readers seeking a greater understanding in this area are encouraged to review the many references identified at the end of this book. The following overview should be kept in mind during the table design process.
When building a table it is important to devote adequate time to planning its design as well as understanding the needs that each table is meant to satisfy. This process involves a number of activities such as requirements and feasibility analysis including cost/benefit of the proposed tables, the development of a logical description of the data sources, and physical implementation of the logical data model. Once these tasks are complete, you assess any special business requirements that each table is to provide. A business assessment helps by minimizing the number of changes required to a table once it has been created.
Next, determine what tables will be incorporated into your application’s database. This requires understanding the value that each table is expected to provide. It also prevents a table of little or no importance from being incorporated into a database. The final step and one of critical importance is to define each table’s columns, attributes, and contents.
Once the table design process is complete, each table is then ready to be created with the CREATE TABLE statement. The purpose of creating a table is to create an object that does not already exist. In the SAS implementation, three variations of the CREATE TABLE statement can be specified depending on your needs:
Creates new columns
Although part of the SQL standard, the column-definition list (like the LENGTH statement in the DATA step) is a laborious and not very elegant way to create a table. The disadvantage of creating a table this way is that it requires the definition of each column’s attributes including their data type, length, informat, and format. This method is frequently used to create columns when they are not present in another table. Using this method results in the creation of an empty table (without rows). The code used to create the CUSTOMERS table appears below. It illustrates the creation of a table with column-definition lists.
PROC SQL; CREATE TABLE CUSTOMERS (CUSTNUM NUM LABEL='Customer Number', CUSTNAME CHAR(25) LABEL='Customer Name', CUSTCITY CHAR(20) LABEL='Customer''s Home City'), QUIT;
SAS Log Results
PROC SQL; CREATE TABLE CUSTOMERS (CUSTNUM NUM LABEL='Customer Number', CUSTNAME CHAR(25) LABEL='Customer Name', CUSTCITY CHAR(20) LABEL='Customer''s Home City'), NOTE: Table CUSTOMERS created, with 0 rows and 3 columns. QUIT; NOTE: PROCEDURE SQL used: real time 0.81 seconds |
Readers should be aware that the SQL procedure ignores width specifications for numeric columns. When a numeric column is defined, it is created with a width of 8 bytes, which is the maximum precision allowed by the SAS System. PROC SQL ignores numeric length specifications when the value is less than 8 bytes. To illustrate this point, a partial CONTENTS procedure output is displayed for the CUSTOMERS table below.
Results
The CONTENTS Procedure
-----Alphabetic List of Variables and
Attributes-----
# Variable Type Len Pos Label
________________________________________________
_____________
3 CUSTCITY Char 20 33
Customer's Home City
2 CUSTNAME Char 25 8
Customer Name
1 CUSTNUM Num 8 0
Customer Number |
To conserve storage space (CUSTNUM only requires maximum precision provided in 3 bytes), a LENGTH statement could be used in a DATA step to define CUSTNUM as a 3-byte column rather than an 8-byte column. A DROP= data set option is specified to delete the original CUSTNUM column (created by the CREATE TABLE statement) in the Program Data Vector or PDV.
DATA Step Code
DATA CUSTOMERS; LENGTH CUSTNUM 3.; SET CUSTOMERS(DROP=CUSTNUM); LABEL CUSTNUM = 'Customer Number'; RUN;
Results
The CONTENTS Procedure
-----Alphabetic List of Variables and
Attributes-----
# Variable Type Len Pos Label
________________________________________________
_____________
3 CUSTCITY Char 20 25
Customer's Home City
2 CUSTNAME Char 25 0 Customer
Name
1 CUSTNUM Num 3 45 Customer
Number |
Let’s look at the column-definition list used to create the PRODUCTS table.
SQL Code
PROC SQL; CREATE TABLE PRODUCTS (PRODNUM NUM(3) LABEL='Product Number', PRODNAME CHAR(25) LABEL='Product Name', MANUNUM NUM(3) LABEL='Manufacturer Number', PRODTYPE CHAR(15) LABEL='Product Type', PRODCOST NUM(5,2) FORMAT=DOLLAR9.2 LABEL='Product Cost'), QUIT;
PROC SQL; CREATE TABLE PRODUCTS (PRODNUM NUM(3) LABEL='Product Number', PRODNAME CHAR(25) LABEL='Product Name', MANUNUM NUM(3) LABEL='Manufacturer Number', PRODTYPE CHAR(15) LABEL='Product Type', PRODCOST NUM(5,2) FORMAT=DOLLAR9.2 LABEL='Product Cost'), NOTE: Table PRODUCTS created, with 0 rows and 5 columns. QUIT; NOTE: PROCEDURE SQL used: real time 0.00 seconds |
The CONTENTS output for the PRODUCTS table shows once again that the SQL procedure ignores all width specifications for numeric columns.
Results
The CONTENTS Procedure -----Alphabetic List of Variables and Attributes----- # Variable Type Len Pos Format Label ________________________________________________ __________________ 3 MANUNUM Num 8 8 Manufacturer Number 5 PRODCOST Num 8 16 DOLLAR9.2 Product Cost 2 PRODNAME Char 25 24 Product Name 1 PRODNUM Num 8 0 Product Number 4 PRODTYPE Char 15 49 Product Type |
As before, to conserve storage space you can use a LENGTH statement in a DATA step to override the default 8-byte column definition for numeric columns.
DATA PRODUCTS; LENGTH PRODNUM MANUNUM 3. PRODCOST 5.; SET PRODUCTS(DROP=PRODNUM MANUNUM PRODCOST); LABEL PRODNUM = 'Product Number' MANUNUM = 'Manufacturer Number' PRODCOST = 'Product Cost'; FORMAT PRODCOST DOLLAR9.2; RUN;
Results
The CONTENTS Procedure -----Alphabetic List of Variables and Attributes----- # Variable Type Len Format Label _______________________________________________ _______________ 2 MANUNUM Num 3 Manufacturer Number 3 PRODCOST Num 5 DOLLAR9.2 Product Cost 1 PRODNUM Num 3 Product Number 4 PRODNAME Char 25 Product Name 5 PRODTYPE Char 15 Product Type |
Referencing an existing table in a CREATE TABLE statement is an effective way of creating a new table. In fact, it can be a great time-saver, because it prevents having to define each column one at a time as was shown with column-definition lists. The LIKE clause (in the CREATE TABLE statement) triggers the existing table’s structure to be copied to the new table minus any columns dropped with the KEEP= or DROP= data set (table) option. It copies the column names and attributes from the existing table structure to the new table structure. Using this method results in the creation of an empty table (without rows). To illustrate this method of creating a new table, a table called HOT_PRODUCTS will be created with the LIKE clause.
PROC SQL; CREATE TABLE HOT_PRODUCTS LIKE PRODUCTS; QUIT;
SAS Log Results
PROC SQL; CREATE TABLE HOT_PRODUCTS LIKE PRODUCTS; NOTE: Table HOT_PRODUCTS created, with 0 rows and 5 columns. QUIT; NOTE: PROCEDURE SQL used: real time 0.00 seconds |
Readers are reminded that, as a result of executing the LIKE clause in the CREATE TABLE statement, only those columns in the existing table are copied to the new table. What this means is that the new table has zero rows of data.
Our next example illustrates how to create a new table by selecting just the columns you have an interest in. This method is not supported by the SQL ANSI standard. Suppose you want three columns (PRODNAME, PRODTYPE, and PRODCOST) from the PRODUCTS table. The following code illustrates how the KEEP= data set (table) option can be used to accomplish this. (Note that data sets can also be called tables.)
SQL Code
PROC SQL; CREATE TABLE HOT_PRODUCTS(KEEP=PRODNAME PRODTYPE PRODCOST) LIKE PRODUCTS; QUIT;
SAS Log Results
PROC SQL; CREATE TABLE HOT_PRODUCTS(KEEP=PRODNAME PRODTYPE PRODCOST) LIKE PRODUCTS; NOTE: Table HOT_PRODUCTS created, with 0 rows and 3 columns. QUIT; NOTE: PROCEDURE SQL used: real time 0.00 seconds |
Deriving a new table from an existing table is by far the most popular and effective way to create a table. This method uses a query expression, and the results are stored in a new table instead of being displayed as SAS output. This method not only stores the column names and their attributes, but the rows of data that satisfies the query expression as well. The following example illustrates how a new table is created using a query expression.
SQL Code
PROC SQL; CREATE TABLE HOT_PRODUCTS AS SELECT * FROM PRODUCTS; QUIT;
SAS Log Results
PROC SQL; CREATE TABLE HOT_PRODUCTS AS SELECT * FROM PRODUCTS; NOTE: Table WORK.HOT_PRODUCTS created, with 10 rows and 5 columns. QUIT; NOTE: PROCEDURE SQL used: real time 0.00 seconds |
Readers may notice after examining the SAS log in the previous example that the SELECT statement extracted all rows from the existing table (PRODUCTS) and copied them to the new table (HOT_PRODUCTS). In the absence of a WHERE clause, the resulting table (HOT_PRODUCTS) contains the identical number of rows as the parent table PRODUCTS.
The power of the CREATE TABLE statement, then, is in its ability to create a new table from an existing table. What is often overlooked in this definition is the CREATE TABLE statement’s ability to form a subset of a parent table. More frequently than not, a new table represents a subset of its parent table. For this reason this method of creating a table is the most powerful and widely used. Suppose you want to create a new table called HOT_PRODUCTS containing a subset of “Software” and “Phones” product types. The following query-expression would accomplish this.
SQL Code
PROC SQL; CREATE TABLE HOT_PRODUCTS AS SELECT * FROM PRODUCTS WHERE UPCASE(PRODTYPE) IN ("SOFTWARE", "PHONE"); QUIT;
SAS Log Results
PROC SQL; CREATE TABLE HOT_PRODUCTS AS SELECT * FROM PRODUCTS WHERE UPCASE(PRODTYPE) IN ("SOFTWARE" , "PHONE"); NOTE: Table WORK.HOT_PRODUCTS created, with 7 rows and 5 columns. QUIT; NOTE: PROCEDURE SQL used: real time 0.38 seconds |
Let’s look at another example. Suppose you want to create another table called NOT_SO_HOT_PRODUCTS containing a subset of everything but “Software” and “Phones” product types. The following query-expression would accomplish this.
PROC SQL; CREATE TABLE NOT_SO_HOT_PRODUCTS AS SELECT * FROM PRODUCTS WHERE UPCASE(PRODTYPE) NOT IN ("SOFTWARE", "PHONE"); QUIT;
SAS Log Results
PROC SQL; CREATE TABLE NOT_SO_HOT_PRODUCTS AS SELECT * FROM sql.PRODUCTS WHERE UPCASE(PRODTYPE) NOT IN ("SOFTWARE", "PHONE"); NOTE: Table NOT_SO_HOT_PRODUCTS created, with 3 rows and 5 columns. QUIT; NOTE: PROCEDURE SQL used: real time 1.20 seconds |