Actual values, 211, 216
for life is vs opinion on death penalty, 214
of party affiliation by capital punishment, 213
table of, 212
Actual vs expected theory, 203–209
Alternative hypothesis, 72, 167, 187, 190, 203, 209
definition of, 165
difference of means test, 176–177
large sample size test, 168, 170
small sample size test, 173–174
testing hypothesis for proportion, 181, 183
Analysis of Variance (ANOVA) procedure, 238, 245
computations for, 250
definition of, 246
Excel demonstration, 253–254
one / single factor, 247–252
two factor, 252
used two means, 246
Analysis reporting, 5
Analysis ToolPak installation in Microsoft Excel, 39, 178
dialog to install, 16
output of, two-sample t-test with equal and unequal variance, 180
procedures of, 17
options for, 42
Auditor, 4
Axioms, 93
Bar charts, 27, 38–39, 41, 45. See also Microsoft Excel
applicable to categorical variables, 23
poverty data for New England states, 40
sample, 24–25
standard, money spent per student, 34
use of vertical or horizontal bars, 24
Bell curve, 99
Bernoulli, Jacob, 128
Bernoulli Trial, 128, 180
Bimodal variables, 53
Binomial distributions, 130
definition of, 128
Excel use for computation of, 131
formula for, 129
parameters for, 128
Binomial probabilities in Excel
rules to find, 133
Binomial random variable, 128, 158
Bins, 27, 30. See also Frequency histograms or table
boundaries, 42
computation for MLB data, 44
Blood pressure, 11
Box / box plot
definition of, 72
estimation of mean, 73–74
generated by Excel, 86
indicates shape of histogram, 73
for normal distribution, 77
skewed to the left, 76
skewed to the right, 77
Calculus, 2
Categorical variables
association between, 193
definition of, 10
groups of, 10–11
Census, 7
Central limit theorem, 168
applet, 124–127
colloquial version, 123
definition of, 122
distribution of means, 141
importance of, 123
for means, 122
uniform distribution for tossing one die, 123
Central tendency
measures of, 50–53
pros and cons of, 53–55
Certified public accountants (CPAs), 17
Charts, 22–25. See also Bar charts; Pie charts
use of, 50
Chi-Square Test for categorical crosstabs, 202–214, 226
Excel demonstration, 214–217
Chi-Square Test of independence, 209–210
Coefficients of least-squares regression line, 239
Company sales report, 14
Confidence intervals for difference of means
equal variances, 153–155
and hypothesis testing, relationship between, 186–187
unequal variances, 155–157
Confidence intervals for means, 140–152
Contingency tables, 198–202, 205
Continuous random variable, 114–118
Continuous variables, 10
Corporate presidents, 2
Correlation between numerical values, 219–226, 233
Correlation coefficient, 237
definition of, 222
Excel use for computation of, 223–226
Course evaluation, frequency table for, 66
Crosstabs table, 212
Cumulative frequency histograms, 33
Darts, 95
Data
analysis, 5
collected, 22
points, 60
spread, method to measure, 60
Data collection, 5
primary sources of, 4
secondary sources of, 4
Decimal numbers, 96
Dependent variable, 221, 226, 247
definition of, 203
Descriptive statistics, 91, 140
available parameters for procedure of, 88
definition of, 3
output procedure, 88, 150
parameters for procedure, 161
procedure options, 149
using Excel, 79–86
Difference of means test, 160
equal variances, 175–177
unequal variances, 178–180
Discrete random variable, 114–118, 129
Distribution, histogram
definition of, 74
skewed to the left, 74–75, 77
skewed to the right, 74–75, 77
Distributions of variable
definition of, 13
types of, 13–15
Drop Row Fields in Microsoft Excel, 46
Equal variances, 153–155, 175–177
Estimate / Estimation, 139
confidence intervals for difference of means, 153–157
confidence intervals for means, 140–152
Excel demonstration, 161–162
point, 159
proportions, 157–159
Expected value tables, 209, 216
computation of, 205
creation of, 206
for life is vs opinion on death penalty, 214
methods of computation, 205, 207
of party affiliation by capital punishment, 213
in sex versus income level, 211
table of, 212
F distribution, 119–121, 248–249, 252
definition of, 118
Fictitious data, 204
Frequency histograms or table, 25–33, 56–57, 121
with arbitrary bin boundaries, 101
computation of average, 58
computation of percentages for, 58
construction for height, 96
cumulative frequency histograms (see Cumulative frequency histograms)
with cumulative percent, 59
relative frequency histograms (see Relative frequency histograms)
for salary data, 58
with superimposed normal curve, 100
variance and standard deviation for, 64–66
Fulfillment time, definition of, 188–189
Gaussian normal distribution, 101, 103
General Social Science survey, 157, 182
Graphs, 92
Group variance, 248
Heterogeneous distribution, 13–14
Histogram for short, 25
Homogeneous distribution, 13, 15
Horizontal line, 72
Hypothesis testing, 163
confidence intervals and, relationship between, 186–187
difference of means test, 175–180, 188
Excel demonstration, 188–191
for mean, 188
large sample size, 168–172
small sample size, 172–175
one-tailed and two-tailed tests, 183–186
for proportion, 180–183, 188
as a trial by jury, 164–168
Inconclusive test, 170
Independent variable, 221, 226, 247
definition of, 203
Inferential statistics, 92, 140
definition of, 3
Interquartile range (IQR), 78–79. See also Outliers
definition of, 73
for life expectancy data, 73
Inverse normal problem
definition of, 110
function of, 110
Inverse probability problem, 121
JMP software, 69
Joint standard error, 153, 176–178
Kolmogorov axioms, 93
Large sample size
confidence interval, 143–147, 152
one-tailed test for mean, 185
testing hypothesis for mean, 168–172
Least-squares linear regression, 231–235
Least-squares regression line, 231, 235–240
Level of significance, 169
Likert scale
computation of mean and median through, 56
definition of, 56
Linear regression, 229–240
correlation between numerical values, 219–226
definition of, 219
dialog, 241
Excel demonstration, 240–243
least-squares linear regression (see Least-squares linear regression)
least-squares regression line (see Least-squares regression line)
parameters for, 236
results of analysis, 242
Lower quartiles, 70–71
computation of, 67–68
definition of, 66–67
Mac, 16
Management, 17
Margin of error, 159
definition of, 139
Mean, 66, 73–74, 81
advantages of, 54–55
applies to numerical variables, 52
computation of, 51
confidence intervals for, 140–152
for continuous random variables, 116–118
definition of, 51
for discrete random variables, 114–116
for frequency distributions, 56–59
impacted by large or small values, 75
influenced by extreme values, 54
letters used for, 51
for machine, 60
for MLB data, 80
for ordinal variables, 55–56
of salaries, 82
testing hypothesis for, 168–175
Median, 54, 66, 72
applies to numerical variables, 53
computation of, 52–53
definition of, 52
for frequency distributions, 56–59
measure of central tendency, 54
middle quartile, 67
for MLB data, 80
sorting of data before determination of, 52
Microsoft Excel, 2, 4, 6, 55, 69, 101, 105
charts creation, 38
computation of mean and standard deviation, 104
confidence interval for mean with, 143
confidence intervals computation by using, 148–150
data point without changing data, tricks to highlight, 37
demonstration of, 17–19, 86–89, 133–137
descriptive statistics using, 79–86
fictitious data in, 76
fulfillment data in, 189
histogram tool, 28–29, 39–44
installing Analysis ToolPak in, 15–17
output, descriptive statistics procedure, 83
pivot tool, 44–47
random sample selection in, 9
recommendation for pie charts, 39
selection of random sample by using, 18
special function to compute probabilities, 113
Microsoft Office CD-ROM, 15
Middle quartile, 67
MINITAB software, 69
Mode
advantage of, 53
definition of, 53
for frequency distributions, 56–59
for MLB data, 80
for ordinal variables, 55–56
problem with, 53
usefulness of, 53
National Aeronautics and Space Administration (NASA), 132
National error rate, 4
New York City (NYC), 7–8
Nominal variable, 11, 25
Normal distribution, 74, 77, 121
bell-shaped, 101
computation of normal probabilities with Excel, 101–110
converting to z-scores, 112–114
definition of, 99
dialog for functioning of, 136
with different means and standard deviations, 102
Gaussian normal distribution (see Gaussian normal distribution)
solving probability problems by using, 135–137
and standard deviation, 111–112
transformation formula for, 113
Null hypothesis, 171, 187, 190, 203, 209
definition of, 165
difference of means test, 176–177
large sample size test, 168, 170
small sample size test, 173–174
testing hypothesis for proportion, 181, 183
tried and true assumption, 167
type-1 error, 166
type-2 error, 166
Numeric variables, 10
One-tailed tests, 183–186
Ordinal variables, 10–11, 55–56
Outliers. See also Interquartile range (IQR)
definition of, 78
for life expectancy data, 79
Percentiles, 69–71
Excel use to find, 83
Percentile value, 71
Pie charts, 25–27, 92
3D, 38–39
apply to categorical variables, 22
divides circle into segments, 22
exploding, 23–24
poverty data for New England states, 40
visualize data, 22
Pivot table, 44–47, 199–200, 212
value field settings for, 201
Plot of least-squares regression line, 239
Pooled standard deviation, 153, 176–177
Population, 5, 10
definition of, 4
mean, 140, 154–155, 159–160
Population variance, 61
Potential frequency table, 46
Probability of success, 128
Probability space, definition of, 92
Probability theory, 91, 140. See also Normal distribution
categories of probability distribution, 98
computation of, 94
under normal distribution, 105
computed, 109
construction of frequency histogram for height, 96
definition of, 92
determination of, 129–130
Excel demonstration, 133–137
inverse probability problem (see Inverse probability problem)
of success, 160
weighted coin, 95
Problem definition, 5
Product-based organization, 18
Proportion, testing hypothesis for, 180–183
Quartiles
definition of, 66–67
distribution of values, useful in, 66, 69
method to find, 67
types of, 66–67
Random sample, 145, 158
definition of, 6, 78
selection procedure, 8–10
use of Excel to select, 18–19
Random sample size, 159
Random sampling, 2
Random variable, definition of, 114, 129
Range, 81
definition of, 60
measure of dispersion, 60
midpoints, 58
of salaries, 82
Recording variables, 12
Regression statistics, 237
Rejection region, 187, 209
definition of, 166, 171
difference of means, 176–177
large sample size test, 168
probability of committing an error, 167
small sample size test, 173–175
testing hypothesis for proportion, 181, 183
Relative frequencies for age, 96
Relative frequency distribution of height, 97
Relative frequency histograms, 30–33
Residual output, 238–239
Row vs column percentages, 196–198
Rule of thumb for a chi-square test, 210
Samples, 5–6
definition of, 4
Sample space, 92–94
Sample variance, 61
SAS software, 69
Scatter plots, 226–229, 235
of high school versus college scores, 237
with lines of best fit, 230
with regression line, 242
Small sample size, 172–175
confidence interval, 147–148, 152
one-tailed test for mean, 185
testing hypothesis for mean, 172–175
Special normal distribution, 99
Standard deviation, 64, 66, 77–79, 81, 111–112. See also Normal distribution
computation of, 63
for continuous random variables, 116–118
definition of, 63
for discrete random variables, 114–116
estimation of, 78
of salaries, 82
Standard error, 146, 149
Standard normal distribution, 99, 145
Standard normal probability, 136
Statistical analyses, 1
Statistical test, elements of, 165–166
Statistics
advantage of, 2
computation by using individual Excel commands, 89
definition of, 2–3, 22, 50
descriptive (see Descriptive statistics)
inferential (see Inferential statistics)
reliability on data, 2
study of, 193
Student t-distribution, definition of, 118
Sum of square (SS) differences to mean, 248
Tax auditor, 3, 5
t-distribution
degree of freedom for, 155
for large sample sizes, 155
one- and two-tailed, 120, 152
one- and two-tailed vs standard normal and F distribution, 120
vs F distributions, 119
vs standard normal distributions, 119
Test statistics, 187, 209
collection of evidence in random sample form, 167
definition of, 165
difference of means test, 176–177
large sample size test, 168, 171
small sample size test, 173–175
testing hypothesis for proportion, 181, 183
Total variance, 248
Transactional leadership style, 253
Transactional motivation technique, 253
Transformational leadership style, 253
Twain, Mark, 2
Two-dimensional data summary, 195–198
Two-tailed tests, 183–186
Type-1 error, 166
Type-2 error, 166
Unequal variances, 155–157, 178–180, 190
Uniform distribution, 117
Uniformly distributed random variable, 117
Upper quartiles, 71
computation of, 67–68
definition of, 67
U.S. House of Representatives, 7
Values in tabular form, 12
Variables
categories of, 10
definition of, 5
dependent (see Dependent variable)
distribution of, 13
independent (see Independent variable)
use of, 10
Variance(s), 81
computation of, 61–62, 64
definition of, 61
equal, 153–155, 175–177
estimation of, 65
for frequency tables, 64–66
of machine, 62
population, 61
sample, 61
sample, manual method to find, 61–62
shortcut for, 63–64
symbols for, 61
unequal, 155–157, 178–180
unit of, 63
Whiskers box, 72–79
Zoning law, 195–196
z-scores, 145, 168