Chapter 7: Performance Tuning in DS2

7.1 Introduction

7.2 DS2_OPTIONS Statement

7.2.1 TRACE Option

7.3 Analyzing Performance with the SAS Log

7.3.1 Obtaining Performance Statistics

7.3.2 Analyzing Performance Statistics

7.3.3 Tuning Your Code

7.4 Learning and Troubleshooting Resources

7.4.1 SAS Learning Resources

7.4.2 SAS Support Communities

7.4.3 SAS Technical Support

7.5 Review of Key Concepts

7.6 Connecting with the Author

7.1 Introduction

In this chapter, we’ll discuss factors that can affect performance in our DS2 programs, tools to help determine the cause of performance issues, and programming techniques that we can use to tune the performance of our DS2 programs. Along the way, we’ll identify some best practices for DS2 programmers.  

Specifically, we will cover these points:

   using the DS2_OPTIONS statement with the TRACE option

   using the SAS log to gather and analyze performance data

   finding reference and troubleshooting resources:

      SAS product documentation

      Communities.sas.com

      SAS Technical Support

7.2 DS2_OPTIONS Statement

7.2.1 TRACE Option

If you are experienced with using SAS to access data in relational databases, you have probably used the SASTRACE system option. SASTRACE enables us to see exactly what SQL code is being sent to the database and to determine how processing is distributed between the database and the SAS compute platform.

If you are conducting in-database processing with DS2, you will notice that the SASTRACE option has no effect on the log output. However, when your DS2 program executes in the database, a DS2_OPTIONS statement specifying the TRACE option, placed just before the data program, writes a wealth of detail about the in-database processing to the SAS log:

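proc ds2 ds2accel=yes;
/* TRACE writes in-database processing details to the log; */
/* place it just before the data program.                  */
ds2_options trace;
/* Thread program T is assumed to exist already (for example, */
/* created earlier in this session with a THREAD statement).  */
data db_data.test/overwrite=yes;
   dcl thread t t;
   method run();
      set from t;
   end;
enddata;
run;
quit;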

SAS Log:

NOTE: Running THREAD program in-database
NOTE: Running DATA program in-database
NOTE: TKIOG Publish Create:
NOTE: TKIOG Publish Add:
NOTE: XOG: SQL query prior to symbol resolution
NOTE: order_fact
NOTE: XOG: SQL query after symbol resolution
NOTE: order_fact
NOTE: Invoking SASEP MapReduce
NOTE: Hadoop Job (HDP_JOB_ID), job_1455231954269_0010, SAS
      Map/Reduce Job,
      http://my.hadoop.server.com/proxy/application_1455231954269_0010/
NOTE: Hadoop Version           User
NOTE: 2.5.0-cdh5.3.4           student
NOTE:
NOTE: Started At               Finished At
NOTE: Feb 13, 2016 9:55:07 PM  Feb 13, 2016 9:55:42 PM
NOTE:
NOTE: Run Query: DROP TABLE test
NOTE: Run Query: CREATE TABLE test (`total_sales`
      double,`total_spend` double,`customer_id`
      bigint,`employee_id` bigint,`street_id`
      bigint,`order_date` date,`delivery_date` date,`order_id`
      bigint,`order_type` tinyint,`product_id`
      bigint,`quantity` int,`total_retail_price`
      double,`costprice_per_unit` double,`discount` double)
NOTE: Run Query: LOAD DATA INPATH
      '/tmp/sasds2_yi5zio47gtpc65y_/sasds2ip_1771037706output'
      OVERWRITE INTO TABLE test

7.3 Analyzing Performance with the SAS Log

7.3.1 Obtaining Performance Statistics

The default option settings in SAS provide a modicum of performance reporting in the SAS log. Here is an example of the default performance statistics reported under Linux (the operating system used by SAS University Edition):

Example of standard performance statistics in the SAS log:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           2.35 seconds
      cpu time            0.17 seconds

You will notice that SAS produces performance statistics only after the QUIT statement for a PROC DS2 session, so if you have included several DS2 steps within the PROC DS2 boundaries, you get only aggregate results for the entire process. For troubleshooting and tuning, try placing each DS2 program in its own PROC DS2 session to obtain more granular results.
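As a minimal sketch of that advice, the two DS2 steps below each run in their own PROC DS2 session, so the log reports a separate "PROCEDURE DS2 used" timing NOTE for each one. The table names WORK.STEP1 and WORK.STEP2 and the BMI calculation are illustrative assumptions; SASHELP.CLASS is simply a sample table that ships with SAS.

```sas
/* Session 1: its own PROC DS2 ... QUIT, so its own timing NOTE */
proc ds2;
data work.step1 / overwrite=yes;
   method run();
      set sashelp.class;
   end;
enddata;
run;
quit;

/* Session 2: a separate PROC DS2, so its statistics are reported separately */
proc ds2;
data work.step2 / overwrite=yes;
   dcl double bmi;
   method run();
      set work.step1;
      bmi = 703 * weight / (height * height);
   end;
enddata;
run;
quit;
```

Had both data programs been placed between a single PROC DS2 statement and its QUIT statement, the log would show one combined set of statistics instead.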

When your program is performing poorly, the problem usually boils down to one or more of these factors:

   CPU

   memory

   I/O

   network latency

When tracking down performance issues or tuning your code, setting OPTIONS FULLSTIMER can provide more detailed information to help you narrow down where the system might be having problems. You can then try writing the code in different ways to alleviate the problems that you have detected.
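A minimal sketch of turning the option on for the rest of a session follows. PROC OPTIONS simply echoes the current setting to the log, and the DS2 step (an illustrative copy of SASHELP.CLASS into a hypothetical WORK.TIMED table) is there only so that the log has a FULLSTIMER timing NOTE to show.

```sas
/* Request detailed step statistics for the rest of the session */
options fullstimer;

/* Echo the current setting of the option to the log */
proc options option=fullstimer;
run;

/* Any step run now reports the longer FULLSTIMER statistics */
proc ds2;
data work.timed / overwrite=yes;
   method run();
      set sashelp.class;
   end;
enddata;
run;
quit;
```

Submitting options nofullstimer; returns the log to the default, shorter statistics.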

Here is an example of the performance statistics reported using FULLSTIMER:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           2.35 seconds
      user cpu time       0.10 seconds
      system cpu time     0.07 seconds
      memory              5738.84k
      OS Memory           30268.00k
      Timestamp           02/14/2016 10:23:11 AM
      Step Count                        87  Switch Count  50
      Page Faults                       0
      Page Reclaims                     1925
      Page Swaps                        0
      Voluntary Context Switches        2648
      Involuntary Context Switches      191
      Block Input Operations            0
      Block Output Operations           0

As you can see, the information that is reported is much richer with FULLSTIMER turned on. Here is some descriptive information for selected performance statistics:

Table 7.1: Performance Statistics in the SAS Log

Statistic                      Significance
-----------------------------  ------------------------------------------------------------
Real Time                      This is the “wall clock” time elapsed while your job was executing; in other words, the time you spent waiting for the step to complete. If other programs besides SAS are running on your system, real time can fluctuate significantly for the same job, depending on how long SAS has to wait for system resources in use by other processes.
CPU Time                       When FULLSTIMER is not in effect, SAS reports the sum of user CPU time and system CPU time as CPU time. When FULLSTIMER is in effect, user CPU time and system CPU time are reported separately. When parallel processing is in effect, CPU time can be greater than real time.
User CPU Time                  This is the CPU time spent executing user-written code, including the SAS program that you wrote and the built-in SAS processes that support its execution.
System CPU Time                This is the CPU time spent by the operating system in support of user-written code.
Memory                         This is the memory allocated to this job or step, not including SAS system overhead.
Involuntary Context Switches   This is the number of times a process released its CPU time-slice involuntarily. Causes include using up its time-slice before the task is finished or being preempted by another task with higher priority.
Page Swaps                     This is the number of times that a process was swapped out of main memory.

When benchmarking performance, it is necessary to run each version of a process several times in separate SAS sessions and average the performance results before making comparisons. It is very helpful to be able to extract performance statistics from the SAS log and save them to a SAS data set for later analysis. The %LOGPARSE macro is available for download from the SAS Support website and can automate this process for you to some extent.
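If %LOGPARSE is not available to you, the idea can be sketched by hand. The example below is not %LOGPARSE; it is a hypothetical, simplified stand-in that routes one benchmark run's log to its own file with PROC PRINTTO, then reads that file back with a DATA step and keeps the real, user CPU, and system CPU times as numbers. The file name bench_run1.log and the table names are assumptions, and the sketch ignores times of a minute or more, which the log prints in a minutes:seconds form.

```sas
/* Hypothetical log file, written to the WORK library's directory */
%let benchlog = %sysfunc(pathname(work))/bench_run1.log;

options fullstimer;

/* Send the log for this benchmark run to its own file */
proc printto log="&benchlog" new;
run;

proc ds2;
data work.bench / overwrite=yes;
   method run();
      set sashelp.class;
   end;
enddata;
run;
quit;

/* Restore the default log destination */
proc printto;
run;

/* Keep only the timing lines and convert the values to numbers */
data work.bench_stats;
   infile "&benchlog" truncover;
   input line $char200.;
   length statistic $ 16;
   if index(line, 'real time') then statistic = 'real time';
   else if index(line, 'user cpu time') then statistic = 'user cpu time';
   else if index(line, 'system cpu time') then statistic = 'system cpu time';
   else delete;
   /* The value is the second-to-last word ("0.01 seconds"); values   */
   /* printed as mm:ss.ss do not convert and are left missing (??).   */
   seconds = input(scan(line, -2, ' '), ?? 32.);
   drop line;
run;
```

Repeating the run in several SAS sessions, each writing its own log file, and appending the resulting tables gives you the data to average before comparing versions.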

If these techniques do not identify your problem, configuring the SAS logging facility to use some of the DS2 loggers might be useful. Setting up SAS to use these loggers can be a bit intimidating at first, but a lot of detailed information can be gained using this technique. See the SAS DS2 Language Reference appendix titled “DS2 Loggers.” Also see SAS Logging: Configuration and Programming Reference.

7.3.2 Analyzing Performance Statistics

One of the first comparisons to make is real time to CPU time. If your task is performing poorly and real time and CPU time are consistently within 10% to 15% of each other, the limiting factor is probably CPU time, and your task is CPU bound. If there is consistently a large difference between real time and CPU time, then the limiting factor is probably I/O, and your process is probably I/O bound. Other valuable information from FULLSTIMER can be gained by analyzing the remaining statistics. A search of support.sas.com for “FULLSTIMER SAS option” will provide more in-depth guidance on how to proceed.
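To make the comparison concrete, the DATA step below applies the rule of thumb to the FULLSTIMER statistics shown earlier (2.35 seconds of real time; 0.10 + 0.07 seconds of CPU time). Measuring the gap relative to real time and using 15% as the cutoff are assumptions about how to read "within 10% to 15%". Because that sample step ran quickly, the numbers illustrate only the arithmetic, not a real performance problem.

```sas
data work.timing_check;
   length diagnosis $ 20;
   real_time = 2.35;          /* seconds, from the FULLSTIMER log   */
   cpu_time  = 0.10 + 0.07;   /* user CPU time + system CPU time    */
   threshold = 15;            /* percent: upper end of the rule     */

   /* Gap between real time and CPU time as a percentage of real time. */
   /* With parallel threads, CPU time can exceed real time; the gap is */
   /* then negative, which also points to a CPU-bound task.            */
   gap_pct = 100 * (real_time - cpu_time) / real_time;

   if gap_pct <= threshold then diagnosis = 'Probably CPU bound';
   else diagnosis = 'Probably I/O bound';

   put real_time= cpu_time= gap_pct= 5.1 diagnosis=;
run;
```

For these numbers the gap is roughly 93% of real time, far above the cutoff, so the step classifies the task as probably I/O bound.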

7.3.3 Tuning Your Code

As a rule, if your current traditional Base SAS DATA step process reads from a table in a DBMS that has the SAS In-Database Code Accelerator installed, it will run much faster if you rewrite the data program as a thread program and execute the thread from a DS2 data program using the SET FROM statement. Performance will be even better if the data program output is also a database table and your database is capable of running both the thread and data programs inside the database. As of this writing, only Teradata and Hadoop have that capability. If your task sources its data from a DBMS, using DS2_OPTIONS TRACE or OPTIONS SASTRACE or both can help identify how much processing is happening in the database and how much data movement to the SAS compute platform for processing was necessary. If you have SAS Viya, your DS2 programs can run distributed via CAS, and this can significantly improve performance.  
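The thread-plus-SET FROM structure described above can be sketched on the SAS compute platform as follows. This is a minimal illustration, not an in-database example: the thread program name CALC_TH, the output table WORK.CLASS_BMI, the BMI calculation, and THREADS=4 are all assumptions, and SASHELP.CLASS stands in for a larger table. In-database, the same structure would run with DS2ACCEL=YES and database tables, as in the TRACE example in Section 7.2.1.

```sas
proc ds2;
/* Thread program: the row-level work that can run in parallel. */
/* A one-level name stores it in the WORK library.              */
thread calc_th / overwrite=yes;
   dcl double bmi;
   method run();
      set sashelp.class;
      bmi = 703 * weight / (height * height);
   end;
endthread;
run;

/* Data program: starts the thread instances and collects their rows */
data work.class_bmi / overwrite=yes;
   dcl thread calc_th t;
   method run();
      /* Try different THREADS= values and compare the timing NOTEs */
      set from t threads=4;
   end;
enddata;
run;
quit;
```

Each input row is processed by one of the four thread instances, so WORK.CLASS_BMI should contain the same number of rows as SASHELP.CLASS, though not necessarily in the same order.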

In summary, consider the following:

   If your task is I/O bound, threading on the SAS compute platform will likely exacerbate the problem. If you have SAS Viya, consider loading the data into CAS and executing your code there to boost performance. If not, check to see whether your task sources data from a DBMS that has the SAS In-Database Code Accelerator installed. If so, you can significantly improve performance by running the process inside the database as a DS2 THREAD program.

   If your task is CPU bound, it can usually benefit from parallel processing using DS2 threads, even if you do not have SAS Viya or in-database processing capability available. Try rewriting the data program as a thread program and executing it in parallel from a data program using the SET FROM statement, experimenting with the THREADS= option values to fine-tune performance.

   When my DS2 program uses an SQL query as input to the SET statement and is suffering from poor performance, I’ve found it useful to create a temporary table from the SQL query result set, and then use the temporary table as input to the DS2 program for troubleshooting. This can help isolate the performance degradation to the DS2 or the SQL portion of the program.
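A minimal sketch of that troubleshooting approach follows, using SASHELP.CLASS in place of a real DBMS table. The query, its WHERE clause, and the table names WORK.FROM_QUERY, WORK.QUERY_RESULT, and WORK.FROM_TABLE are illustrative assumptions. Because PROC SQL and PROC DS2 each write their own timing NOTE, the second version shows how the elapsed time divides between the SQL and DS2 portions.

```sas
/* Version under investigation: the query is embedded in the DS2 SET statement */
proc ds2;
data work.from_query / overwrite=yes;
   method run();
      set {select name, age, height
              from sashelp.class
              where age > 12};
   end;
enddata;
run;
quit;

/* Troubleshooting version, step 1: materialize the query result. */
/* This timing NOTE covers only the SQL work.                     */
proc sql;
   create table work.query_result as
      select name, age, height
         from sashelp.class
         where age > 12;
quit;

/* Step 2: the same DS2 logic reading the temporary table. */
/* This timing NOTE covers only the DS2 work.              */
proc ds2;
data work.from_table / overwrite=yes;
   method run();
      set work.query_result;
   end;
enddata;
run;
quit;
```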

7.4 Learning and Troubleshooting Resources

7.4.1 SAS Learning Resources

SAS provides extensive documentation for the DS2 language, including many code examples that provide an excellent opportunity for learning more about DS2. A great place to begin is http://go.documentation.sas.com. Navigate to “SAS 9.4 and SAS Viya Programming” documentation and expand the DS2 Reference section. There you will find a new DS2 Programmer’s Guide along with the DS2 Language Reference.

SAS also provides many free how-to videos via their YouTube channel at https://www.youtube.com/user/SASsoftware. And, of course, a quick, well-worded search of the Internet will frequently yield instructive examples complete with source code.

Another great place to learn about SAS is the SAS Blogs site at http://blogs.sas.com. Its posts provide a wealth of information and often include data and source code, too. I write sporadically for The SAS Learning Post blog, and you can find a collection of my contributions at http://sasjedi.tips.

Formal training for DS2 is available in both classroom and Live Web format. Just go to http://support.sas.com/training and search for DS2.

7.4.2 SAS Support Communities

My favorite destination for seeking (and sometimes providing) answers is the SAS Support Communities website at http://communities.sas.com. There are many subcommunities available here, but most of the DS2 activity happens in the Base SAS Programming and General SAS Programming communities (part of the larger SAS Programming group of communities). The SAS Support Community forums are very active with seasoned SAS users, newcomers, and SAS staff all engaging to help each other make better use of SAS software.

When posing a question to an online community, you can get help much more quickly if you prepare properly. Pose your question clearly and succinctly. Provide a short program that demonstrates your problem as simply as possible. And, for a quicker response, include sample data as well as the code, so that your peers can quickly see the issue in their own SAS installation.

The best way to provide sample data is to include a little DATA step program that produces a sample data set. The folks on the forum can then run your program using the same data and be able to quickly reproduce your results. I wrote a blog about this titled “Jedi SAS Tricks: The DATA to DATA Step Macro,” which includes a program that creates the data2datastep macro. This macro accepts a data set as input and writes a DATA step for you that will reproduce the data set. I’ve included the code for this macro in the ZIP file containing the data for this book. Look for the program file named data2datastep.sas. The macro is self-documenting, but you can read more about how the macro works at http://goo.gl/spQjuc. Please note that these shortened URLs are case sensitive, so make sure you use a capital Q.
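For a small table, such a DATA step can also be written by hand. The sketch below is not output from the data2datastep macro; it is a hypothetical example whose column names are borrowed from the order_fact table in the TRACE log in Section 7.2.1 and whose values are invented purely for illustration.

```sas
/* Hypothetical sample data to post along with a question */
data work.sample_orders;
   infile datalines dsd truncover;
   input customer_id :8. order_date :date9. total_retail_price :8.;
   format order_date date9.;
   datalines;
101,13FEB2016,25.50
102,14FEB2016,119.99
103,14FEB2016,7.25
;
run;
```

Anyone on the forum can paste this program into a SAS session and get the same three-row table that you have.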

7.4.3 SAS Technical Support

Any licensed SAS user can contact SAS Technical Support for help with his or her SAS questions. Here is a little guide to help you optimize your interactions with SAS Technical Support:

   What can I expect from SAS Technical Support?

      See http://support.sas.com/techsup/support.html

   How can I contact SAS Technical Support?

      a.   Use the telephone for time-critical business issues: +1 (919) 677-8008 or +1 (800) 727-0025.

      b.   Use the web form for all other issues: http://support.sas.com/ctx/supportform/createForm.

   What information will I need when contacting SAS Technical Support?

      If you are using a commercial version of SAS, your company name.

      The country in which your SAS software is licensed.

      Your name, email address, and phone number.

      Site number, operating system, and software release. You can obtain this information from the top of a fresh SAS session’s log, or you can execute this macro to print the information that you need in the SAS log:

%macro siteinfo;
   /* NOTE- continues the note without repeating the NOTE: prefix */
   %PUT NOTE: Site ID is &SYSSITE;
   %PUT NOTE- SAS Version is &SYSVLONG;
   %PUT NOTE- Operating System is &SYSSCP (&SYSSCPL);
%mend siteinfo;

%siteinfo

      SAS product involved (Base SAS is the product for DS2 questions).

      Succinct, one-line description of the issue for the subject line of your technical support track.

      Detailed problem description, including error or warning messages.

      A clear, concise description of any troubleshooting or research that you’ve already done.

      Copies of your SAS program, SAS log, and any other pertinent files. If at all possible, supply some mock-up input data for your program to aid the Technical Support folks in troubleshooting your problem. You can use the data2datastep macro discussed in Section 7.4.2 to provide the data. To include attachments, wait for the first automatic email response to arrive after you submit your problem, and then reply to that email with the requisite attachments.

7.5 Review of Key Concepts

   The SAS system option FULLSTIMER can provide significantly more detailed performance information in the SAS log than is usually available.

   The DS2_OPTIONS statement with the TRACE option can provide extra insight into how DS2 programs running in-database accomplish their processing.

   For benchmarking, it is necessary to code several approaches to solving the same problem, run them on realistically sized and located data, and compare performance statistics in order to best tune your DS2 programs.

   There is a lot of help available online for SAS users, including free e-learning courses, how-to videos, SAS user community forums, and SAS Technical Support.

   Proper preparation before contacting SAS Technical Support will make the engagement smoother and result in faster answers to your questions.  

7.6 Connecting with the Author

When I finished the first edition of this book, I couldn’t believe it was finally done. Writing a book was an amazing experience for me, and one I wasn’t sure I’d be willing to tackle again. But I reasoned that updating a book would be easier than writing it from scratch, and there was so much great, new stuff in the SAS 9.4M5 release that I couldn’t help but want to share it with you. And so—I’ve done it again and can once again breathe a sigh of relief!

I thank you for taking the time to read my book. I hope you find DS2 as delightful and useful as I have found it to be in my own work. I’d love to hear from you, perhaps with feedback on what you did (and didn’t) find useful or how you’ve been using DS2 to solve your own real-world problems. Please feel free to connect with me on LinkedIn (http://linkedin.com/in/sasjedi), follow me on Twitter (http://twitter.com/sasjedi) and Facebook (https://www.facebook.com/sasjedi), read my sporadic “Jedi SAS Tricks” posts on The SAS Learning Post blog (http://sasjedi.tips) or contact me via my author page at support.sas.com/Jordan. Of course, your reviews on Amazon, Goodreads, etc. are always appreciated.

I wish you the best in all your SASy endeavors. May the SAS be with you!
