The purpose of a database is to store data. A database contains one or more tables (and other components). Tables consist of columns and rows of data. In the SAS implementation of SQL, there are two available data types: 1) numeric and 2) character.
The SAS implementation of SQL provides programmers with numerous arithmetic, statistical, and summary functions. It offers one numeric data type to represent numeric data. Columns defined as a numeric data type with the NUMERIC or NUM column definition are assigned a default length of 8 bytes, even if the column is created with a numeric length less than 8 bytes. This provides the greatest degree of precision allowed by the SAS System. In the example below, a table called PURCHASES is created consisting of two character and two numeric columns. The resulting table contains no rows of data, as illustrated by the SAS log results. For more information about the CREATE TABLE statement, see Chapter 5, “Creating, Populating, and Deleting Tables.”
PROC SQL; CREATE TABLE PURCHASES (CUSTNUM CHAR(4), ITEM CHAR(10), UNITS NUM UNITCOST NUM(8,2)); QUIT;
SAS Log Results
PROC SQL; CREATE TABLE PURCHASES (CUSTNUM CHAR(4), ITEM CHAR(10), UNITS NUM, UNITCOST NUM(8,2)); NOTE: Table PURCHASES created, with 0 rows and 4 columns. QUIT; |
Results
The CONTENTS Procedure -----Alphabetic List of Variables and Attributes----- # Variable Type Len Pos ___________________________________ 1 CUSTNUM Char 4 16 2 ITEM Char 10 20 3 UNITS Num 8 0 4 UNITCOST Num 8 8 |
Use the DATA step LENGTH statement to create a column of fewer than 8 bytes. Although this action can cause precision issues, it allows for more efficient use of data storage resources. See the SAS Language Reference: Dictionary for more information. The example illustrates a DATA step that assigns smaller lengths to the two numeric variables, UNITS and UNITCOST, in the PURCHASES table. The CONTENTS output illustrates the creation of shorter length numeric variables.
DATA Step Code
DATA PURCHASES; LENGTH CUSTNUM $4. ITEM $10. UNITS 3. UNITCOST 4.; LABEL CUSTNUM = 'Customer Number' ITEM = 'Item Purchased' UNITS = '# Units Purchased' UNITCOST = 'Unit Cost'; FORMAT UNITCOST DOLLAR12.2; RUN; PROC CONTENTS DATA=PURCHASES; RUN;
SAS Log Results
DATA PURCHASES; LENGTH CUSTNUM $4. ITEM $10. UNITS 3. UNITCOST 4.; LABEL CUSTNUM = 'Customer Number' ITEM = 'Item Purchased' UNITS = '# Units Purchased' UNITCOST = 'Unit Cost'; FORMAT UNITCOST DOLLAR12.2; RUN; NOTE: Variable CUSTNUM is uninitialized. NOTE: Variable ITEM is uninitialized. NOTE: Variable UNITS is uninitialized. NOTE: Variable UNITCOST is uninitialized. NOTE: The data set WORK.PURCHASES has 1 observations and 4 variables. NOTE: DATA statement used: real time 2.80 seconds PROC CONTENTS DATA=PURCHASES; RUN; NOTE: PROCEDURE CONTENTS used: real time 1.82 seconds |
The CONTENTS Procedure -----Alphabetic List of Variables and Attributes----- # Variable Type Len Pos Format Label _______________________________________________ _________ 1 CUSTNUM Char 4 4 Customer Number 2 ITEM Char 10 8 Item Purchased 3 UNITS Num 3 18 # Units Purchased 4 UNITCOST Num 4 0 DOLLAR12.2 Unit Cost |
Database application processing stores date and time information in the form of a numeric data type. Date and time values are represented internally as an offset where a SAS date value is stored as the number of days from the fixed date value of 01/01/1960 (January 1, 1960). The SAS date value for January 1, 1960, is represented as 0 (zero). A date earlier than this is represented as a negative number, and a date later than this is represented as a positive number. This makes performing date calculations much easier.
The SAS System has integrated the vast array of date and time informats and formats with PROC SQL. The various informats and formats act as input and output templates and describe how date and time information is to be read or rendered on output. See the SAS Language Reference: Dictionary for detailed descriptions of the various informats and formats and their use. Numeric date and time columns, when combined with informats and/or formats, automatically validate values according to the following rules:
See Chapter 5, “Creating, Populating, and Deleting Tables” and Chapter 6, “Modifying and Updating Tables and Indexes,” for more information about date and time informats and formats.
PROC SQL provides tools to manipulate and store character data including words, text, and codes using the CHARACTER or CHAR data type. The characters allowed by this data type include the ASCII or EBCDIC character sets. The CHARACTER or CHAR data type stores fixed-length character strings consisting of a maximum of 32K characters. If a length is not specified, a CHAR column stores a default of 8 characters.
The SQL programmer has a vast array of SQL and Base SAS functions that can make the task of working with character data considerably easier. In this chapter you’ll learn how columns based on the character data type are defined and how string functions, pattern matching, phonetic matching techniques, and a variety of other techniques are used with character data.
Missing values are an important aspect of dealing with data. The concept of missing values is familiar to programmers, statisticians, researchers, and other SAS users. This section describes what null values are, what they aren’t, and how they are used.
Missing or unknown information is supported by PROC SQL in a form known as a null value. A null value is not the same as a zero value. In the SAS System, nulls are treated as a separate category from known values. A value consisting of zero has a known value. In contrast, a value of null has an unknown quantity and will never be known. For example, a patient who is given an eye exam does not have zero eyesight just because the results from the exam haven’t been received. The correct value to assign in a case like this is a missing or a null value.
In another example, say a person declines to provide their age on a survey. This person’s age is null, not zero. Essentially, this person has an age, but it is unknown. Whenever an unknown value occurs, you have no choice but to assign an unknown value – null.
Since the value of null is unknown, any arithmetic calculation using a null will return a null. This makes a lot of sense since the results of a calculation using a null are not determinable. This is sometimes referred to as the propagation of nulls because when a null value is used in a calculation or an expression, it propagates a null value. For example, if a null is added to a known value, the result is a null value.
In the SAS System, a numeric data type containing a null value (absence of any value) is represented with a period (.) This representation indicates that the column has not been assigned a value. A null value has no value and is not the same as zero. A value consisting of zero has a known quantity as opposed to a null value that is not known and never will be known.
If a null value is multiplied with a known value, the result is a null value represented with a period (.). In the next example, when UNITS and UNITCOST both have known values, their product will generate a known value, as is illustrated for the Markers purchase below.
SQL Code
PROC SQL; SELECT CUSTNUM, ITEM, UNITS, UNITCOST, UNITS * UNITCOST AS TOTAL FROM PURCHASES ORDER BY TOTAL; QUIT;
The SAS System custnum item units unitcost TOTAL 13 Markers . $0.99 . 1 Stapler 1 $8.95 8 .949997 1 Pens 12 $0.89 10.68 7 Mouse Pad 1 $11.79 11 .78999 1 Paper 4 $6.95 27 .79999 7 Pens 24 $1.59 38 .15998 1 Chair 1 $179.00 179 |
SQL provides three keywords: AS, DISTINCT, and UNIQUE to perform specific operations on the results. Each will be presented in order as follows.
In situations where data is computed using system functions, statistical functions, or arithmetic operations, a column name or header can be left blank. To prevent this from occurring, users may specify the AS keyword to provide a name to the column or heading itself. The next example illustrates using the AS keyword to prevent the name for the computed column from being assigned a temporary column name similar to: _TEMAxxx. The name assigned with the AS keyword is also used as the column header on output, as shown.
SQL Code
PROC SQL;
SELECT PRODNAME,
PRODTYPE,
PRODCOST * 0.80 AS Discount_Price
FROM PRODUCTS
ORDER BY 3;
QUIT;
The SAS System Product Name Product Type Discount_Price _____________________________________________ ______________ Analog Cell Phone Phone 28 Office Phone Phone 104 Digital Cell Phone Phone 140 Spreadsheet Software Software 239.2 Graphics Software Software 239.2 Wordprocessor Software Software 239.2 Database Software Software 319.2 Dream Machine Workstation 2560 Business Machine Workstation 2640 Travel Laptop Laptop 2720 |
In some situations, several rows in a table will contain identical column values. To select only one of each duplicate value, the DISTINCT keyword can be used in the SELECT statement as follows.
SQL Code
PROC SQL; SELECT DISTINCT MANUNUM FROM INVENTORY; QUIT;
Results
The SAS System Manufacturer Number ____________ 111 170 500 600 |
In some situations, several rows in a table will contain identical column values. To select each of these duplicate values only once, the UNIQUE keyword can be used in the SELECT statement.
SQL Code
PROC SQL; SELECT UNIQUE MANUNUM FROM INVENTORY; QUIT;
Results
The SAS System Manufacturer Number ____________ 111 170 500 600 |