IN THIS CHAPTER
Although Excel was designed primarily for business users, people in other disciplines, including education, research, statistics, and engineering, also use the software. One way Excel addresses these nonbusiness users is with its Analysis ToolPak add-in. However, many features in the Analysis ToolPak are valuable for business applications as well.
The Analysis ToolPak is an add-in that provides analytical capability that normally isn't available in Excel.
These analysis tools offer many features that may be useful to those in the scientific, engineering, and educational communities — not to mention business users whose needs extend beyond the normal spreadsheet fare.
This section provides a quick overview of the types of analyses that you can perform with the Analysis ToolPak. This chapter covers each of the following tools:
As you can see, the Analysis ToolPak add-in brings a great deal of functionality to Excel. These procedures have limitations, however. In some cases, you may prefer to create your own formulas to do some calculations.
The Analysis ToolPak is implemented as an add-in. Before you can use it, though, you need to make sure that the add-in is installed. Select the Data tab. If you see an Analyze group, showing Data Analysis, the Analysis ToolPak is installed. If you can't access Data Analyze Data Analysis, install the add-in by following these steps:
Using the procedures in the Analysis ToolPak add-in is relatively straightforward as long as you're familiar with the particular analysis type. To use any of these tools, choose Data Analyze Data Analysis, and the Data Analysis dialog box, shown in Figure 37.1, appears. Scroll through the list until you find the analysis tool that you want to use, and then click OK. A dialog box specific to the procedure that you select appears.
Usually, you need to specify one or more Input ranges, plus an Output range. (One cell is sufficient.) Alternatively, you can choose to place the results on a new worksheet or in a new workbook. The procedures vary in the amount of additional information required. In many dialog boxes, you may be able to indicate whether your Data range includes labels. If so, you can specify the entire range, including the labels, and indicate to Excel that the first column (or row) contains labels. Excel then uses these labels in the tables that it produces. Most tools also provide different output options that you can select, based on your needs.
This section describes each tool in the Analysis ToolPak and provides an example. I don't describe every available option in these procedures. If you need to use the advanced analysis tools, you probably already know how to use most of the options not covered here.
Before you use any of these tools, I suggest that you read the appropriate section in Excel's Help system.
Analysis of Variance (sometimes abbreviated as Anova) is a statistical test that determines whether two or more samples were drawn from the same population. Using tools in the Analysis ToolPak, you can perform three types of analysis of variance:
Figure 37.2 shows the dialog box for a single-factor analysis of variance. Alpha represents the statistical confidence level for the test.
The output for this test consists of the means and variances for each of the samples, the value of F, the critical value of F, and the significance of F (P-value).
Correlation is a widely used statistic that measures the degree to which two sets of values vary together. For example, if higher values in one data set are typically associated with higher values in the second data set, the two data sets have a positive correlation. The degree of correlation is expressed as a coefficient that ranges from –1.0 (a perfect negative correlation) to +1.0 (a perfect positive correlation). A correlation coefficient of 0 indicates that the two variables aren't correlated.
Figure 37.3 shows the Correlation dialog box. Specify the input range, which can include any number of variables, arranged in rows or columns.
The output consists of a correlation matrix that shows the correlation coefficient for each variable paired with every other variable.
The Covariance tool produces a matrix that is similar to the one generated by the Correlation tool. Covariance, like correlation, measures the degree to which two variables vary together. Specifically, covariance is the average of the product of the deviations of each data point pair from their respective means.
Because the Covariance tool does not generate formulas, you may prefer to calculate a covariance matrix using the COVAR
function.
The Descriptive Statistics tool produces a table that describes your data with some standard statistics. Figure 37.4 shows some sample output.
Because the output for this procedure consists of values (not formulas), you should use this procedure only when you're certain that your data isn't going to change; otherwise, you'll need to re-execute the procedure. You can generate all these statistics by using formulas.
Exponential Smoothing is a technique for predicting data that is based on the previous data point and the previously predicted data point. You can specify the damping factor (also known as a smoothing constant), which can range from 0 to 1. This factor determines the relative weighting of the previous data point and the previously predicted data point. You also can request standard errors and a chart.
The exponential smoothing procedure generates formulas that use the damping factor you specify. Therefore, if the data changes, Excel updates the formulas.
An F-test is a commonly used statistical test that enables you to compare two population variances. Figure 37.5 shows a small data set and F-test output.
The output for this test consists of the means and variances for each of the two samples, the value of F, the critical value of F, and the significance of F.
The Fourier Analysis tool performs a “fast Fourier” transformation of a range of data. Using the Fourier Analysis tool, you can transform a range limited to the following sizes: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 data points. This procedure accepts and generates complex numbers, which are represented as text string (not numerical values).
The Histogram tool is useful for producing data distributions and histogram charts. It accepts an Input range and a Bin range. A Bin range is a range of values that specifies the limits for each column of the histogram. If you omit the Bin range, Excel creates ten equal-interval bins for you. The size of each bin is determined by the following formula:
=(MAX(input_range)– MIN(input_range))/10
Output from the Histogram tool is shown in Figure 37.6. As an option, you can specify that the resulting histogram be sorted by frequency of occurrence in each bin.
If you specify the Pareto (Sorted Histogram) option, the Bin range must contain values and can't contain formulas. If formulas appear in the Bin range, Excel doesn't sort properly, and your worksheet displays error values. The Histogram tool doesn't use formulas, so if you change any of the input data, you need to repeat the histogram procedure to update the results.
The Moving Average tool helps you smooth out a data series that has a lot of variability. This procedure is often used in conjunction with a chart. Excel does the smoothing by computing a moving average of a specified number of values. In many cases, a moving average enables you to spot trends that otherwise would be obscured by noise in the data.
Figure 37.7 shows a chart generated by the Moving Average tool. You can, of course, specify the number of values that you want Excel to use for each average. If you select the Standard Errors check box in the Moving Average dialog box, Excel calculates standard errors and places formulas for these calculations next to the moving average formulas. The standard error values indicate the degree of variability between the actual values and the calculated moving averages.
The first few cells in the output are #N/A
because not enough data points exist to calculate the average for these initial values.
Although Excel contains built-in functions to calculate random numbers, the Random Number Generation tool is much more flexible because you can specify what type of distribution you want the random numbers to have. Figure 37.8 shows the Random Number Generation dialog box. The Parameters section varies, depending on the type of distribution that you select.
Number of Variables refers to the number of columns that you want, and Number of Random Numbers refers to the number of rows that you want. For example, if you want 200 random numbers arranged in 10 columns of 20 rows, you specify 10 and 20, respectively, in these fields.
In the Random Seed field, you can specify a starting value that Excel uses in its random-number-generating algorithm. Usually, you leave this field blank. If you want to generate the same random number sequence, however, you can specify a seed between 1 and 32,767 (integer values only). You can create the following types of distributions via the Distribution drop-down list in the Random Number Generation dialog box:
The Rank and Percentile tool creates a table that shows the ordinal and percentile ranking for each value in a range. You can also generate ranks and percentiles by using Excel functions (those that begin with RANK
and PERCENTILE
).
Use the Regression tool (see Figure 37.9) to calculate a regression analysis from worksheet data. You can use regression to analyze trends, forecast the future, build predictive models, and, often, to make sense out of a series of seemingly unrelated numbers.
Regression analysis enables you to determine the extent to which one range of data (the dependent variable) varies as a function of the values of one or more other ranges of data (the independent variables). This relationship is expressed mathematically, using values that Excel calculates. You can use these calculations to create a mathematical model of the data and predict the dependent variable by using different values of one or more independent variables. This tool can perform simple and multiple linear regressions and calculate and standardize residuals automatically.
As you can see, the Regression dialog box offers many options:
The Sampling tool generates a random sample from a range of input values. The Sampling tool can help you work with a large database by creating a subset of it.
This procedure has two options: periodic and random. If you choose a periodic sample, Excel selects every nth value from the Input range, where n equals the period that you specify. With a random sample, you simply specify the size of the sample you want Excel to select; every value has an equal probability of being chosen.
Use the T-Test tool to determine whether a statistically significant difference exists between two small samples. The Analysis ToolPak can perform three types of t-tests:
Figure 37.10 shows output for the Paired Two Sample for Means t-test. You specify the significance level (alpha) and the hypothesized difference between the two means (that is, the null hypothesis).
The T-Test tool is used for small samples; the Z-Test tool is used for larger samples or populations. You must know the variances for both input ranges.