Chapter 4

Counting on Statistical Software

In This Chapter

Surveying statistical software (commercial and free) for personal computers

Performing statistics on handheld devices (like calculators, tablets, and smartphones)

Doing statistical calculations on the web

Using paper calculators (yes, there are such things!)

You may be surprised that throughout this book, I tell you not to do statistical calculations by hand. With computing power so readily available and with such an abundance of statistical software at your disposal — much of it free — there’s just no good reason to put yourself through the misery of mind-numbing calculations and waste your precious time only to (almost certainly) come up with the wrong answer because of some inadvertent error in arithmetic. Just as you would never seriously consider using long division to calculate your car’s miles per gallon, you should never consider calculating a correlation coefficient, a t test, or a chi-square test by hand.

In this chapter, I describe some of the many alternatives available to you for performing statistical calculations and analyses. I group them according to the devices on which they run:

Personal computers

Calculators and mobile devices

The web

Paper

Desk Job: Personal Computer Software

The first statistical software was developed for the mainframes and minicomputers of the 1960s and 1970s. As personal computers became popular, many of these programs were adapted to run on them. And many more statistical programs were developed from scratch to take advantage of the user-friendly graphical user interface (GUI) of Macintosh and Windows computers, including menus, drag-and-drop capability, the point-and-click feature of the mouse, and so forth. More than a hundred of these packages are listed on this web page: StatPages.info/javasta2.html.

I describe a few personal computer software products in the following sections. They come in two categories: commercial (the ones you pay for) and free.

Most statistical packages run on Windows; some also run on Mac and Unix or Linux systems. Any Windows package will run on a Mac that has Windows emulation capability (as most of the modern Macs do).

Checking out commercial software

Commercial statistical programs usually provide a wide range of capabilities, personal user support (such as a phone help-line), and some reason to believe (or at least to hope) that the software will be around and supported for many years to come.

Prices vary widely, and the array of pricing options may be bewildering, with single-user and site licenses, nonprofit and academic discounts, one-year and permanent licenses, “basic” and “pro” versions, and so on. Therefore I make only very general statements about relative prices for the commercial packages; check the vendors’ websites for details.

Many companies let you download a demo version of their software that’s limited in some way — some features may be disabled, the maximum number of cases or variables may be limited, or the software may run for only a certain number of days.

Tip: Demo versions are a great way to see whether a software package is easy to use and meets your needs before you shell out the cash for a full version.

In the following sections, I discuss several commercial software programs for you to consider, starting with the biggest, most general, most powerful, and most expensive.

SAS

SAS is one of the most comprehensive statistical packages on the market. It’s widely used in all branches of science and is especially pervasive in the pharmaceutical industry. The current versions run on Windows and some Linux systems.

SAS is designed to be run by user-written programs. A GUI module makes the programming task easier, but SAS isn’t designed like a typical personal computer program. It doesn’t use the familiar “document” paradigm that almost all other personal computer software uses. For example, you don’t create a new data file by going to a File menu and selecting New, nor do you open an existing data file by going to File and selecting Open. Most users need SAS training in order to use the program productively.

SAS is large-scale software, designed for large-scale operations. It comes with a wide variety of analyses built in, and its programming language lets you create modules to perform other, less-common kinds of analyses. Its scope has grown beyond just the statistical analysis of data; SAS is now a complete data acquisition, validation, management, analysis, and presentation system.

SAS is also expensive — depending on the optional modules you want to use, it can cost over $1,000 per year. If your organization has a site license, you may be able to use SAS for relatively little or no money for as long as you’re affiliated with that organization.

Tip: For most readers of this book, SAS is likely to be overkill, but if it's available at your school or organization, it may be worth your time to learn how to use it (especially if you plan to work in pharmaceutical research). See www.sas.com for more details about this program.

SPSS

SPSS is another comprehensive program that can perform all the analyses you’re likely to need while remaining quite intuitive and user-friendly. You create and edit data files the same way you’d create and edit word-processing documents and spreadsheets — using the File menu’s commands: New, Save, Open, and so forth. SPSS contains a programming language that can automate repetitive tasks and perform calculations and analyses beyond those built into the software. SPSS runs on Windows, Macintosh, and some Linux systems.

SPSS pricing is complicated. Depending on the modules you want to use, it can cost many hundreds of dollars per year. Check out www.spss.com for details on this software.

GraphPad Prism and InStat

Unlike most commercial stats packages, these two programs were designed by and for scientists, not by and for statisticians. GraphPad Prism focuses on the needs of biological and clinical researchers in laboratory settings, and it’s quite capable of handling non-laboratory research as well. It offers a powerful combination of parametric and nonparametric tests, extensive regression and curve-fitting (including nonlinear regression), survival analysis, and scientific graphing. It runs on Windows and Mac systems.

GraphPad InStat carries the “scientist, rather than statistician” theme even further, with a user-friendly interface that guides you through the process of selecting the right test based on the structure of your experiment, verifying that your data meets the assumptions of the test, and interpreting all parts of the output in “plain English” with a minimum of statistical jargon. It doesn’t have all the capabilities of Prism; its emphasis is on ease of use. If you don’t want to have to become a statistician but just want to get your data analyzed properly with minimal fuss, and without a long learning process, check out InStat. It runs on Windows and some Mac systems.

These programs are reasonably priced; academic and student discounts are available, and you can download trial versions to evaluate. They're definitely worth a close look; head to www.graphpad.com for details.

Excel and other spreadsheet programs

Warning: You can use Excel (and similar spreadsheet programs) to store, summarize, and analyze your raw data and to prepare graphs from your analysis. But using Excel for data storage and analysis has been controversial. Some have argued that Excel is too unstructured to serve as a respectable database (you can put anything into any cell, with no constraints on data types, ranges, and so forth), and you can easily destroy all or parts of your database (by sorting just some columns and not others). Others have said that Excel’s built-in mathematical and statistical functions are inaccurate and unreliable. Although some of those criticisms were valid years ago, today’s Excel is much improved and is satisfactory for most purposes.

Excel has built-in functions for summarizing data (means, standard deviations, medians, and so on), for evaluating the common probability distribution functions and their inverses (normal, Student t, Fisher F, and chi-square), and for performing Student t tests and calculating correlation coefficients and simple linear regression (slope and intercept). If you install the optional Analysis ToolPak add-in provided with Excel, Excel can do more extensive analyses, such as ANOVA and multiple regression.
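To make those summary calculations concrete, here's a minimal sketch in Python (my choice for illustration; it isn't one of the packages covered in this chapter) of what a spreadsheet does behind functions such as AVERAGE, STDEV, MEDIAN, CORREL, SLOPE, and INTERCEPT. The data values and the helper names pearson_r and simple_linear_regression are made up for this example.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical illustrative data (not taken from this book)
dose = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

print("mean:", statistics.mean(response))      # like AVERAGE
print("sd:", statistics.stdev(response))       # sample SD, like STDEV
print("median:", statistics.median(response))  # like MEDIAN
print("r:", pearson_r(dose, response))         # like CORREL
print("slope, intercept:", simple_linear_regression(dose, response))
```

The point isn't that you should write this yourself; it's that the arithmetic is tedious and error-prone, which is exactly why you let software do it.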

Excel runs on both Windows and Macintosh. You can buy it as part of the Microsoft Office suite, and prices vary depending on the version of the suite. For more information, see office.microsoft.com/en-us/excel.

Some other packages to consider

Among the many other commercial statistics packages, you may want to look into one or more of these:

Stata: This package provides a broad range of capabilities through user-written routines. It originally used a command-line interface, but recent versions have implemented a graphical shell. It runs on Windows, Mac, Unix, and Linux systems.

S-Plus: Based on the S programming language (similar to the R language I describe later in this chapter), S-Plus provides an extensive graphical user interface. It is highly extensible through user-written routines for almost every imaginable statistical procedure.

Minitab: With an emphasis on industrial quality control, this package contains many of the capabilities you need for biological research. It runs on Windows systems.

Focusing on free software

Over the years, many dedicated and talented people have developed statistical software packages and made them freely available worldwide. Although some of these programs may not have the scope of coverage or the polish of the commercial packages that I describe earlier in this chapter, they’re high-quality programs that can handle most, if not all, of what you probably need to do. The following sections describe several general-purpose statistical packages that perform a wide variety of analyses, an Excel add-in module, and two special-purpose packages that perform power and sample-size calculations.

OpenStat and LazStats

OpenStat, developed by Dr. Bill Miller, is an excellent free program that can perform almost all the statistical analyses described in this book. It has a very friendly user interface, with menus and dialogs that resemble those of SPSS. Dr. Miller provides several excellent manuals and textbooks that support OpenStat, and users can e-mail Dr. Miller directly to get answers to questions or problems they may have. OpenStat runs on Windows systems.

An alternative is LazStats, also from Dr. Miller, which has many of the same capabilities as OpenStat but can run directly (without emulation) on Macintosh and at least some Linux systems.

Technical Stuff: The “Laz” in “LazStats” doesn’t stand for lazy; it stands for the free Lazarus compiler that was used to create the software for Mac and Linux as well as Windows.

Get the scoop on both programs at www.statprograms4U.com.

R

R is a free statistical programming and graphical system that runs on Windows, Macintosh, and Linux systems. It’s one of the most powerful computing software packages available, with capabilities surpassing those of many commercial packages. It has built-in support for every kind of statistical analysis described in this book, and many hundreds of add-on packages (also free) extend its capabilities into every area of statistics. You can generate almost every imaginable kind of graph with complete control over every detail (all the technical graphs in this book were made with R).

Warning: R is not easy to use. Its user interface is very rudimentary, and all analyses have to be specified as commands or statements in R’s programming language (which is very similar to the S language used by the commercial S-Plus package). It may take you a while to become proficient in R, but once you do, you’ll have almost unlimited capability to carry out any kind of statistical analysis you can think up.

Check out www.r-project.org for more information.

Epi Info

Epi Info, developed by the Centers for Disease Control and Prevention (CDC), was designed to be a fairly complete system to acquire, manage, analyze, and display the results of epidemiological research, although it's useful in all kinds of biostatistical research. It contains modules for creating survey forms, collecting data, and performing a wide range of analyses: t tests, ANOVA, nonparametric statistics, cross tabulations, logistic regression (conditional and unconditional), survival analysis, and analysis of complex survey data. Epi Info runs under Windows. Find the details on this program at wwwn.cdc.gov/epiinfo.

PopTools

PopTools is a free add-in for Excel, written by Greg Hood, an ecologist from Australia. It provides some impressive extensions to Excel: several statistical tests (ANOVA, chi-square, and a few others), a variety of matrix operations, functions to generate random numbers from many different distributions, programs that let you easily perform several kinds of simulations (bootstrapping, Monte Carlo analysis, and so on), and several handy features for checking the quality of your data. It's definitely worth looking at if you're using Excel on a Windows PC (unfortunately, it doesn't work with Mac Excel). Discover more at poptools.org.

PS (Power and Sample Size Calculation) and G*Power

The PS program, from W.D. Dupont and W.D. Plummer of Vanderbilt University, does a few things, and it does them very well. It performs power and sample-size calculations for Student t tests, chi-square tests, several kinds of linear regression, and survival analysis. It has a simple, intuitive user interface and a good help feature, and it provides a verbal description of the analysis (describing the assumptions and interpreting the results) that you can copy and paste into a research proposal or grant application. You can create graphs of power versus sample size or effect size for various scenarios and tweak them until they're of publication quality. For more info, check out biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize.

Another excellent power/sample-size program is G*Power. This program handles many more types of statistical analyses than PS, such as multi-factor ANOVAs, ANCOVAs, multiple regression, logistic regression, Poisson regression, and several nonparametric tests. Like PS, it also provides excellent graphics. See www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3 for details.

Tip: G*Power can be more intimidating for the casual user than PS, but because both products are free, you should download and install both of them.
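To give a feel for what a sample-size calculation involves, here's a rough sketch in Python of the common normal-approximation formula for comparing two group means: n per group is about 2 × (z for alpha + z for power)² ÷ (effect size)², where the effect size is the difference between means divided by the standard deviation. This is an illustration under my own assumptions, not a description of how PS or G*Power work internally; dedicated programs use more exact methods and typically report a slightly larger n. The function name n_per_group is made up.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-group
    comparison of means, using the normal approximation.

    effect_size is (difference between means) / (common standard deviation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # value corresponding to desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical scenario: detect a half-standard-deviation difference
print(n_per_group(0.5))              # 80% power, alpha = 0.05
print(n_per_group(0.5, power=0.90))  # asking for more power requires more subjects
```

Notice how quickly the required sample size grows as the effect you want to detect shrinks; halving the effect size roughly quadruples n.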

On the Go: Calculators and Mobile Devices

Over the years, as computing has moved from mainframes to minicomputers to personal computers to hand-held devices (calculators, tablets, and smartphones), statistical software has undergone a similar migration. Today you can find statistical software for just about every intelligent (that is, computerized) device there is (with the possible exception of smart toasters).

Scientific and programmable calculators

Many scientific calculators claim to perform statistical calculations, although they may entail no more than calculating the mean and standard deviation of a set of numbers that you key in. Some of the newer scientific calculators also handle correlation and simple linear regression analysis.

Programmable calculators like the TI-83 and HP 35s aren’t limited to the calculations that are hard-wired into the device; they let you define your own special-purpose calculations, and therefore can perform almost any computation for which a suitable program has been written.

Mobile devices

Mobile devices (smartphones, tablets, and similar devices) are rapidly becoming the “computer of choice” for many people (according to Mashable.com, 6 billion cellphones were active worldwide in 2011). Indeed, for tasks like e-mail and web browsing, many people find them to be more convenient than desktop or even laptop computers. Perhaps the main reason for their incredible popularity is that they can run an astounding number of custom-written applications, or apps.

Statistics-related apps are available for all the major mobile platforms — Apple iOS, Android, Windows Mobile, and BlackBerry. These range from simple calculators that can do elementary statistical functions (such as means, standard deviations, and some probability functions) to apps that can do fairly sophisticated statistical analyses (such as ANOVAs, multiple regression, and so forth). Prices for these apps range from zero to several hundred dollars. One example is the free StatiCal (short for Statistical Calculator) app for Android systems, which can evaluate the common probability functions and their inverses; calculate confidence intervals; and perform t tests, simple ANOVAs, chi-square tests, and simple correlation and regression analyses.

Warning: As of this writing, a tablet or cellphone isn’t the ideal platform for maintaining large data files, but it can be very handy for quick calculations on summary data where you need to enter only a few numbers (like chi-square tests on cross-tab tables, or power calculations).
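As an illustration of a quick calculation on summary data, here's a minimal Python sketch of a chi-square test on a 2-by-2 cross-tab, where the only input is four cell counts. It uses the Pearson chi-square without a continuity correction, which is an assumption on my part; a given app may apply a correction or use a different test. The counts and the function name chi_square_2x2 are made up for the example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, two-sided p value) with 1 degree of freedom.
    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, chi-square is the square of a standard normal
    # deviate, so the p value is the two-sided normal tail area.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 30 of 50 responders on treatment A, 15 of 50 on treatment B
chi2, p = chi_square_2x2(30, 20, 15, 35)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Four numbers in, two numbers out: that's the kind of job a phone handles comfortably.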

Tip: The mobile environment is changing so rapidly that I’m reluctant to recommend specific apps. Go to the “app store” for your particular device (for example, Apple’s iTunes App Store or Android’s Play Store), and search using terms like statistics, statistical, anova, correlation, and so on.

Gone Surfin’: Web-Based Software

I define a web-based system as one that requires only that your device have a fairly modern web browser (like Microsoft’s Internet Explorer, Mozilla Firefox, Opera, Google Chrome, or Apple’s Safari) with JavaScript. All modern smartphones and tablets (iPhone, iPad, Android, and Windows) meet this criterion. Properly written web-based software is platform-independent — it doesn’t care whether you’re running a PC with Windows, a Macintosh, a computer with Linux, an iPhone, an iPad, or an Android phone or tablet. No special software has to be downloaded or installed on your device.

I define a cloud-based system as one in which the software and your data files (if any) are stored on servers in the cloud (that is, somewhere on the Internet, and you don’t care where). Cloud-based systems offer the prospect of letting you access your data, in its most up-to-date form, from any device, anywhere. The ideal web-based/cloud-based system would require only a browser, so it could be accessed from your personal computer, tablet, or smartphone. I’m not aware of any systems currently available that provide all of these capabilities, but they may be coming soon.

Tip: Less ambitious than a complete web-based statistics package is a web page that can do one specific statistical calculation or analysis using data that you enter into the page. Many such calculating pages exist, and the website StatPages.info lists hundreds of them, organized by type of calculation: descriptive statistics, single-group tests, confidence intervals, two-group comparisons, ANOVAs, cross-tab chi-square tests, correlation and regression analysis, power calculations, and more.

Taken together, all these pages can be thought of as a free, multiplatform, cloud-based statistical software package. Because they’ve been written by many different people, however, they don’t have a consistent look and feel, they don’t exchange data with each other, they don’t manage stored data files, and (like anything on the web) individual online calculators tend to come and go over the course of time. Still, you can access them from anywhere at any time; all you need is a device with a browser (a computer, tablet, or smartphone) and an Internet connection.

Tip: Besides software, other very useful statistics-related resources are freely available on the web. These include interactive textbooks, tutorials, and other educational materials. Many of these are listed on the StatPages.info website.

On Paper: Printed Calculators

I recommend using the options that I list earlier in this chapter for most of your statistical calculations. But there are still a few times when ancient (that is, pre-computer) techniques can be useful. Believe it or not, it’s possible to create a printed page that, when used with a ruler or a piece of string, actually becomes a working calculator.

Nomograms, also called alignment charts, look something like ordinary graphs, but they’re quite different. They usually have three or more straight or curved scales corresponding to three or more variables that are related by some mathematical formula (like height, weight, and body mass index). The scales are positioned on the paper in such a way that if you lay a ruler (or stretch a string) across them, it will intersect the scales at values that obey the mathematical expression. So if you know the values of any two of the three variables, you can easily find the corresponding value of the third variable.

Nomograms can’t be constructed for every possible three-variable expression, but when they can, they’re quite useful. Figure 4-1 shows a simple body-mass-index nomogram; several others appear in this book. The dotted line shows that someone who is 5 feet, 9 inches tall and weighs 160 pounds has a BMI of about 24 kilograms per square meter (kg/m²), near the high end of the normal range.

Illustration by Wiley, Composition Services Graphics

Figure 4-1: A simple nomogram for body mass index, calculated from height and weight.
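
The formula behind a BMI nomogram is weight in kilograms divided by the square of height in meters. As a cross-check on the value read from the chart, here's a small Python sketch of that calculation, along with the reverse lookup (finding the weight that goes with a given height and BMI), which mirrors how a nomogram lets you solve for any one of the three variables. The function names are made up for this illustration; the unit conversions are the standard definitions.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # meters per inch (exact by definition)


def bmi_from_us_units(feet, inches, pounds):
    """Body mass index (kg/m^2) from height in feet and inches and weight in pounds."""
    height_m = (feet * 12 + inches) * IN_TO_M
    weight_kg = pounds * LB_TO_KG
    return weight_kg / height_m ** 2


def pounds_for_bmi(feet, inches, bmi):
    """Weight in pounds that gives the specified BMI at the specified height."""
    height_m = (feet * 12 + inches) * IN_TO_M
    return bmi * height_m ** 2 / LB_TO_KG


# The example from the text: 5 feet 9 inches, 160 pounds
print(round(bmi_from_us_units(5, 9, 160), 1))
```

For the person in the example, the calculation gives roughly 23.6 kg/m², which agrees with the "about 24" you read off the nomogram.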
