SAS Introduction
Chapter 1 introduced SAS, one of the most widely used tools in the world of analytics. SAS is a software suite that can retrieve data from a variety of data sources. It can help you clean the data and perform statistical operations on it. For nontechnical users, it also provides a graphical user interface (GUI) to perform various analytics operations. The soul of SAS is its programming language, which is used by most analysts. It provides more advanced data handling and analytical capabilities than the GUI. The SAS programming language, also known as the SAS scripting language, is generally considered easier to learn than general-purpose programming languages such as FORTRAN, C, and Java.
In this book, you will work with the SAS programming language, not the GUI, but you are encouraged to explore the GUI on your own. This book is not intended to provide in-depth coverage of the SAS programming language; only Base SAS and the procedures required for analytics will be covered here.
SAS was originally used in statistics applications for agriculture projects. Now it’s used in the following industries: casinos, communications, education, financial services, government, health insurance, healthcare providers, hotels, insurance, life sciences, manufacturing, media, oil and gas, retail, travel, transportation, utilities, and many more.
Next you will learn some simple SAS programming steps.
In Windows, you can start SAS by selecting Start > Programs, as you would with any other software. The SAS folder in the Start menu programs list may show all the SAS-related software that is installed. Click Base SAS, which appears as SAS 9.1, SAS 9.2, or SAS 9.3, depending on the version of SAS you have installed. Figure 2-1 shows how to start SAS in Windows.
Figure 2-1. Starting SAS in Windows
Sometimes you may get the error shown in Figure 2-2. This error pops up when your SAS license is expired, and you may have to renew the license in that case. You can renew SAS by buying a renewal license from SAS. A new SAS installation data (SID) file will be provided by SAS to make it function normally.
Figure 2-2. License expiration error in SAS
When you open the SAS program in Windows, your screen, in most cases, should look like the SAS Windows environment shown in Figure 2-3. This may depend upon the operating system of the machine you are using; Figure 2-3 shows a Microsoft Windows SAS session. This is the most usual way your SAS screen will appear. But some windows or icons might be hidden on some systems depending upon the installation procedure and settings you have.
Figure 2-3. A typical SAS Windows environment
The Five Main Windows
After opening the SAS screen, you will see many icons and windows. There are five main windows in the SAS environment, as shown in Figure 2-4.
Figure 2-4. Windows in SAS environment
The five main SAS windows are the Editor window, Log window, Output window, Explorer window, and Results window. If you can’t find any of these windows, you can make them visible by using the View option from the top menu bar.
The Editor window is used for writing SAS scripts that will be used in data modeling and analysis. It’s like any other programming text editor. The editor is syntax sensitive and color codes your SAS scripts so that reading the program or identifying errors in the program is easy.
To execute the code, either click the Submit icon at the top or select Run > Submit (Figure 2-5) from the top menu bar.
Figure 2-5. Submitting a SAS program
Figure 2-6 shows the Editor window with some sample code. This code prints the prdsale table from the SAS help library. (This will be explained in detail later in this book.)
proc print data=sashelp.prdsale;
run;
Figure 2-6. The SAS Editor window
This SAS code is saved in .sas format, which is the usual extension for all the SAS programming script files. You can open these code files with SAS or any other text editor.
The Log window is used for debugging. Any observations or debugging suggestions from the SAS package appear in the Log window. Specifically, the programming statements, notes, errors, or warnings associated with your program will appear in this window. Usually errors in the code are highlighted in red along with an explanation.
Figure 2-7 shows the Log window when the following program is run. There is no syntax error in this code.
proc print data=sashelp.prdsale;
run;
Figure 2-7. SAS Log window
Figure 2-8 shows the Log window with a syntax error intentionally introduced in the code.
proc printing data=sashelp.prdsale;
run;
Figure 2-8. Syntax error in SAS code
Note that the error correctly indicates that the procedure name is misspelled.
Figure 2-9 shows the Log window with a non-syntax-related error in the code.
proc print data=sashelp.prdsales;
run;
Figure 2-9. A typical non-syntax-related error
The error here correctly identifies that the data file name is misspelled and it does not exist in the SAS help library.
If you save the contents of the Log window, the file is saved in .log format. The log is cumulative: messages from each submission are appended to those already there, so you might see all the previous log information in your current log. With the Log window active, you can press Ctrl+E to clear it.
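The log can also be written to a file from code rather than saved by hand. The following is a minimal sketch using the PRINTTO procedure; the file path is a hypothetical example, and the folder is assumed to exist and be writable on your machine.

```sas
/* Route the log to a file; the path is a hypothetical example.      */
/* NEW replaces the file's contents instead of appending to them.    */
proc printto log="C:\temp\prdsale_run.log" new;
run;

proc print data=sashelp.prdsale;
run;

/* Restore the default destination so later messages return to the Log window. */
proc printto;
run;
```

While the log is redirected, the notes for the PROC PRINT step go to the file and not to the Log window, so remember to submit the final PROC PRINTTO step.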
The actual program output, like print data or output data from any SAS procedure, is shown in the Output window. Only the printable output from your program will appear in this window. If the program doesn’t generate any output or if there is any syntax error in the code that caused the SAS system to stop abruptly, then the Output window might be blank.
Figure 2-10 shows the Output window when you run the following code:
proc print data=sashelp.prdsale;
run;
Figure 2-10. SAS Output window
I wrote some code for printing the prdsale data set, and the Output window shows the data records of the table. The output shows that the table contains some data for product sales with fields such as actual, predict, country, and division.
In versions before SAS 9.3, the default output is in listing format. If you save this output, it is saved in .lst format. In SAS 9.3 and newer, HTML is the default.
Where listing is the default, HTML output is a good option for viewing the output in a more readable and formatted way. Broadly speaking, HTML files are easier to navigate and understand than listing files. HTML output can easily be transferred into other applications such as Excel and PowerPoint by using a simple copy and paste command. You can use HTML options in the code to create HTML output, or you can set the SAS options to create HTML output for every program execution.
Here are the steps to set HTML output creation from the SAS menu environment:
Figure 2-11. Selecting Preferences
Figure 2-12. Selecting Create HTML
Figure 2-13 shows the Preferences dialog box after checking the Create HTML option.
Figure 2-13. After selecting Create HTML
The HTML output type gives you several themes to choose from. Depending on the style of your report and the theme of your business, you can pick the HTML style. The content of HTML and listing output is the same; only the format differs. In this book, you will be using HTML as the default output format. Figure 2-14 shows the HTML output for the print code from earlier.
Figure 2-14. Typical HTML output
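HTML output can also be requested directly in the code through the Output Delivery System (ODS). The following is a minimal sketch; the output file name and the style name Journal are assumptions chosen for illustration, and the styles available depend on your SAS version.

```sas
/* List the styles installed with this SAS version. */
proc template;
   list styles;
run;

/* Send output to an HTML file; file name and style are illustrative. */
ods html file="prdsale.html" style=Journal;

proc print data=sashelp.prdsale;
run;

/* Close the HTML destination so the file is completely written. */
ods html close;
```

A relative file name such as the one above is written to the current SAS working folder, which may not be writable on every installation; a full path is usually safer.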
The Explorer window serves as an easy access point for all your files and libraries (see Figure 2-15). Libraries are the objects where data sets and other SAS files are stored. You will be mostly visiting the Explorer window to see various libraries and the files inside.
Figure 2-15. SAS Explorer window
The Results window serves as a table of contents (TOC) for the Output window, listing each part of your results in an outline form. The Results window shows both listing and HTML output files as well as the cumulative tree of results that are run in the current session (see Figure 2-16).
Figure 2-16. SAS Results window
The Explorer and Results windows are shown on the left side of the GUI, one below the other. You can toggle between the Explorer and Results windows by clicking Explorer or Results on the bottom taskbar.
Important Menu Options and Icons
In this section we discuss some important SAS menu options, such as creating, closing, and saving SAS program files, which you will use in your day-to-day work with the SAS environment.
To create a new SAS program file, select File > New Program from the top menu bar, as shown in Figure 2-17.
Figure 2-17. Menu option to create a new program file
To open an existing SAS program file, select File > Open Program from the top menu bar, as shown in Figure 2-18.
Figure 2-18. Opening a program file
To save a SAS program file, select File > Save or File > Save As from the top menu bar, as shown in Figure 2-19.
Figure 2-19. Saving a program file
Sometimes you can’t find the window you are looking for or maybe you closed one of the windows by mistake. You can use the View menu option to bring them back. Just click the View option on the top menu bar and open the desired window, as shown in Figure 2-20.
Figure 2-20. View options
The Run menu is used for submitting the whole or a selected portion of the SAS program, as shown in Figure 2-21.
Figure 2-21. Submitting a SAS program
The Solutions menu gives you access to various customized SAS solutions depending on the software options you have installed, as shown in Figure 2-22. In this book, you will be using only the simple Base SAS scripts; no customized solutions are used.
Figure 2-22. SAS Solutions options
You have now seen a small SAS print program and all the important windows that will matter to you going forward with analytics using SAS.
Figure 2-23 shows some useful shortcut icons.
Figure 2-23. Convenient shortcut icons
Here are the most useful shortcuts:
Writing and Executing a SAS Program
To start writing a SAS program, open a new Editor window. SAS programming scripts are nothing but a sequence of statements. SAS code contains statements, expressions, functions and call routines, options, and formats. SAS code is easy to write when compared to other programming languages. It is a simplified programming language with built-in programs known as SAS procedures (prewritten code). These procedures have already been written and tested in SAS. You just need to call the right procedure at the right place in your SAS script. For example, if you want to find the average of a variable, then there is no need to write the code to compute the average. Just call the SAS procedure Proc Means, which calculates the average for you. There are a few rules and some user-friendly features in SAS programs that you should know.
Here is the code for finding the average of the actual (actual sales) variable in the prdsale data set of the sashelp library:
proc means data =sashelp.prdsale;
var actual;
run;
This same code can be written as shown here (all these code samples will execute without any error):
proc means data =sashelp.prdsale;
var actual;
run;
proc
means data =sashelp.prdsale;
var actual;
run;
proc means data =sashelp.prdsale;var actual;run;
PROC MEANS Data =sashelp.prdsale;
var actual;
Run;
In this book, you will be writing simple SAS code scripts to do all the important tasks in an analytics project, such as import the data, clean the data, analyze the data, and report the results.
If you have done even a bit of programming, you are aware that you write comments in the code for documentation to explain the logic or flow involved in the code or just to remind you of something later about the code. Similarly, while writing SAS scripts, you can write comments. Most SAS scripts are small compared to conventional COBOL or Java code, which can be pages long.
There are two styles of comments you can use.
* This script does logistic regression for the price table data;
/* This script does logistic regression for the price table data */
The following is some sample code with comments.
*program to find the mean of actual sales;
PROC MEANS Data =sashelp.prdsale;
var actual;
Run;
The previous is the same as the following:
/*The program below illustrates how to write SAS code that finds the average actual sales.
The data set used here is prdsale; it is in the sashelp library.
MEANS is the procedure that finds the averages.
The VAR statement is used to specify the variables.*/
PROC MEANS Data =sashelp.prdsale;
var actual;
Run;
Type the code shown next in your Editor window. This code prints the prdsale data set from the sashelp library. (Chapter 3 discusses libraries.) The name sashelp.prdsale can be read like a table reference in a database, where sashelp is the database and prdsale is a table in it.
proc print data=sashelp.prdsale;
run;
The run statement at the end tells SAS to start processing the previous statements. To submit the code, you can use either the Submit icon or the Run > Submit menu option. It is a good habit to first select the code and then execute it. If you submit without selecting anything, SAS executes all the code present in the program file or the Editor window. When you execute this code, you will see some information in the Output window and also in the log. In this case, the log shows no error, and the output shows the data values in the prdsale table (data set), as shown in Table 2-1.
Table 2-1. Data Values in the prdsale Table
Similarly, you can use the following code for finding the average of the actual variable:
proc means data =sashelp.prdsale;
var actual;
run;
When you execute the previous code, the log file shows no error, and you get the output shown in Table 2-2.
Table 2-2. Output of SAS Means Procedure for prdsale
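If only the average is of interest, PROC MEANS can be asked for that single statistic by naming it as an option. The sketch below is an illustration rather than part of the original example; it also adds the predict variable, which the prdsale output above lists next to actual.

```sas
/* Request only the mean; without a statistic keyword, PROC MEANS prints several statistics. */
proc means data=sashelp.prdsale mean;
   var actual predict;
run;
```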
Here is one more code example:
data income_data;
Input income expenses;
Cards;
1200 1000
9000 600
;
run;
Proc print data=income_data;
Run;
This generates the output shown in Table 2-3.
Table 2-3. Print of income_data
| Obs | income | expenses |
|---|---|---|
| 1 | 1200 | 1000 |
| 2 | 9000 | 600 |
Debugging SAS Code Using a Log File
Reading the log file is an important aspect of SAS program execution. Generally, new users write the code and tend to look at the output directly, but looking at the log file is equally important. The log file has mainly three notification types: errors, warnings, and notes.
The log file for the SAS code used to print prdsale gives the following information:
31 proc print data=sashelp.prdsale;
32 run;
NOTE: Writing HTML Body file: sashtml8.htm
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: PROCEDURE PRINT used (Total process time):
real time 3.85 seconds
cpu time 3.79 seconds
The previous message from the log file shows that there is no sign of an error in the code and that 1,440 observations were read from the data for printing.
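Printing all 1,440 observations is rarely necessary when you only want to check what the data looks like. As a minimal sketch, the OBS= data set option limits how many observations are read; the value 10 is an arbitrary choice for illustration.

```sas
/* Read and print only the first 10 observations of the data set. */
proc print data=sashelp.prdsale(obs=10);
run;
```

After submitting this, the observation count in the log note should drop accordingly, which is a quick way to confirm that the option took effect.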
The log file for the SAS code used to find the mean gives the following information. The note in the log file shows 1,440 observations were read from the data set.
33 proc means data =sashelp.prdsale;
34 var actual;
35 run;
NOTE: Writing HTML Body file: sashtml9.htm
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.25 seconds
cpu time 0.09 seconds
Again, there are no errors, and the means procedure ran successfully.
The following is the log file for the third code snippet, which was used to create and print income_data:
53 data income_data;
54 Input income expenses;
55 Cards;
NOTE: The data set WORK.INCOME_DATA has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
58 ;
59 run;
60 Proc print data=income_data;
61 Run;
NOTE: Writing HTML Body file: sashtml11.htm
NOTE: There were 2 observations read from the data set WORK.INCOME_DATA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.29 seconds
cpu time 0.18 seconds
The log file doesn’t show any errors. If you deliberately write some incorrect syntax and submit it, you see an error in the log file.
The following code generates the subsequent message in the log file.
proc dataprinting data=sashelp.prdsale;
run;
40 proc dataprinting data=sashelp.prdsale;
ERROR: Procedure DATAPRINTING not found.
41 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE DATAPRINTING used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
The ERROR line states the problem directly: SAS has no procedure named DATAPRINTING.
The following is one more example of an error:
proc print data=sashelp.prdsales;
run;
42 proc print data=sashelp.prdsales;
ERROR: File SASHELP.PRDSALES.DATA does not exist.
43 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
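When the log reports that a data set does not exist, it helps to check which data sets the library actually contains. One way to do this, sketched below, is to list the members of the sashelp library with PROC CONTENTS; the NODS option suppresses the per-data-set details so that only the directory listing is produced.

```sas
/* List the members of the sashelp library without describing each one. */
proc contents data=sashelp._all_ nods;
run;
```

The correct name, prdsale, appears in that listing, whereas prdsales does not.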
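Example of Warnings in the Log File
The following code attempts to copy records from the prdsale data set in the SAS help library into a new data set, keeping only the records with sales below 1000. Note that the variable is written as actuals, whereas the variable in prdsale is named actual: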
data new_data;
set sashelp.prdsale;
where actuals<1000;
run;
The following is the log file for the previous code:
14 data new_data;
15 set sashelp.prdsale;
16 where actuals<1000;
ERROR: Variable actuals is not on file SASHELP.PRDSALE.
17 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.NEW_DATA may be incomplete. When this step was stopped there were
0 observations and 10 variables.
WARNING: Data set WORK.NEW_DATA was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.03 seconds
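The error shows that the step failed because the variable actuals does not exist in prdsale. The warnings add that the step was stopped at 0 observations and 10 variables and that WORK.NEW_DATA was not replaced, so the new data set cannot be relied on until the error is fixed and the code is rerun.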
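A corrected version might look like the following sketch. It assumes that the intended variable is actual, the sales variable used in the earlier PROC MEANS examples. The PROC CONTENTS step is an optional check that lists the variable names in the data set before they are used.

```sas
/* Optional check: list the variables available in prdsale. */
proc contents data=sashelp.prdsale;
run;

/* Keep only the records with actual sales below 1000. */
data new_data;
   set sashelp.prdsale;
   where actual < 1000;
run;
```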
Tips for Writing, Reading the Log File, and Debugging
Here are a few tips for writing and debugging the SAS programs, meant for beginners:
Saving SAS Files
You can use the Save or Save as option in the File menu, which will prompt you to save the SAS program file in the desired location, as shown in Figure 2-24.
Figure 2-24. Saving SAS files
Exercise
Here are some exercises to help you become more familiar with reading log files.
proc print data=sashelp.airr;run;
proc print data=sashelp.buy;run;
Conclusion
This chapter introduced you to the SAS programming environment. It discussed navigation in the SAS Windows environment, various menu options, and some shortcut icons. It also discussed writing simple SAS code and reading log files. In the next few chapters, you will get into more details of SAS programming. Later, this programming knowledge will help you in analysis, where you interact with SAS by writing programs to analyze the data.