Chapter 27
Statistical and Mathematical Analysis in a Healthcare Setting
Roque Perez-Velez
Contents
Introduction
Commonly Used Descriptive Statistics
Data Visualization
Mathematical Analysis
Exploratory Data Analysis
Conclusion

Introduction
This chapter will discuss the topic of statistical and mathematical analysis in a healthcare setting. The author will share his experience with handling, analyzing, and studying data from various healthcare systems, and with explaining trends to a nontechnically oriented audience. If the reader is interested in the basics of statistics or in learning more about this topic, the author recommends Kurtz,* Walpole and Myers,† or Montgomery and Runger.‡

* M. Kurtz, “Engineering Economics,” in Standard Handbook of Engineering Calculations, 2nd ed., ed. T. G. Hicks (New York: McGraw-Hill Book Co., 1985).
† R. E. Walpole and R. H. Myers, Probability and Statistics for Engineers and Scientists, 4th ed. (New York: Macmillan Publishing Company, Inc., 1989).
‡ D. C. Montgomery and G. C. Runger, Applied Statistics and Probability for Engineers, 2nd ed. (New York: John Wiley & Sons, Inc., 1999).

So, what is statistical and mathematical analysis? First, we need to define several terms. We, as engineers or managers, are concerned with two types of problems: summarizing, describing, and exploring data, or using data to draw inferences about its nature. Mendenhall and Sincich* define descriptive statistics “as the branch of statistics devoted to the organization, summarization, and description of data sets.” Furthermore, in our profession we need to understand the type of data we are working with. Vining† classifies statistical analysis as “either enumerative or analytic studies. Enumerative studies tend to assume that the data come from a static process. Analytic studies tend to assume that the data come from a dynamic process that changes over time.”
Also, Boslaugh‡ offers that “the practice of statistics usually involves analyzing data, and the validity of the statistical results depends in large part on the validity of the data analyzed.” She asserts that “this means that at some point between data collection and data analysis, someone has to get her hands dirty working directly with the data file, cleaning, organizing, and otherwise getting it ready for analysis.” Finally, Peck and Devore§ assert, “statistics involves collecting, summarizing, and analyzing data. All three tasks are critical. Without summarization and analysis, raw data are of little value, and even sophisticated analyses can’t produce meaningful information from data that were not collected in a sensible way.”
With this in mind, we can define statistical analysis as the collection, management, organization, summarization, analysis, and description of data sets by means of a statistical software program or other similar methods.
Perhaps the reader has heard the term structured data analysis. Is this a different analysis, or is it associated with the statistical analysis defined above? Structured data analysis is defined as the statistical analysis of structured data sets such as results from surveys, multiple-choice questionnaires, or other arranged data sets. By definition, structured data analysis is a subset of statistical analysis. Some examples of this methodology are regression, Bayesian, cluster, and algebraic analysis.
The author defines mathematical analysis as the study of stochastic, continuous probability, and Markov chain analyses, as a subdivision of the work performed during statistical analysis. The parameters calculated with statistical analysis are used as a foundation, in stochastic or Markov chain analyses, to further study any healthcare system, such as an emergency department’s patient flow.
Commonly Used Descriptive Statistics
In this section, the author defines and provides examples of the most commonly used descriptive statistics: mean, standard deviation, median, mode, minimum, and maximum. First, we will define the statistics that are used as measures of central tendency, followed by the measures of dispersion.
The mean, commonly called the arithmetic mean, is the average of a set of values. The mean is used as a measure of central tendency. Suppose we have a family medicine practice clinic with the weekly patient load shown in Table 27.1.
We can calculate the mean as:
(52 + 57 + 57 + 61 + 44)/5 = 54.2
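The same calculation can be sketched in a few lines of Python. This is only an illustration using the Table 27.1 values; the variable names are assumptions made for the example and are not part of the original chapter.

```python
from statistics import mean

# Weekly patient load from Table 27.1 (Monday through Friday).
patient_load = [52, 57, 57, 61, 44]

# Arithmetic mean: the sum of the values divided by the number of values.
mean_load = sum(patient_load) / len(patient_load)

print(mean_load)           # 54.2
print(mean(patient_load))  # same result from the standard library
```

Both lines print 54.2, matching the hand calculation above.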
* W. Mendenhall and T. Sincich, Statistics for Engineers and the Sciences, 3rd ed. (San Francisco, CA: Dellen Publishing Co., 1992).
† G. Geoffrey Vining, Statistical Methods for Engineers (Pacific Grove, CA: Brooks/Cole Publishing Co., 1998).
‡ S. Boslaugh, Statistics in a Nutshell (Sebastopol, CA: O’Reilly Media, Inc., 2012).
§ R. Peck and J. L. Devore, Statistics: The Exploration and Analysis of Data (Boston, MA: Brooks/Cole Publishing Co., 2012).
The arithmetic mean formula, as expressed in summation notation, is shown in Equation (27.1):
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{27.1}$$
When the values are ranked in ascending or descending order, the median, mode, minimum, and maximum are the middle value, the most frequently occurring value, the lowest value, and the highest value, respectively. The median is a better measure of central tendency than the mean for data that are asymmetrical or contain outliers, while the mode is most often useful in describing ordinal or categorical data. Continuing with the clinic example above, the patient load, ranked in ascending order, is 44, 52, 57, 57, and 61. The minimum is 44, the median is 57, the mode is 57, and the maximum is 61. Formally, the median is the value in position (n + 1)/2 of the ranked data when n is odd, or the average of the two middle values when n is even.
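As an illustration, the ranking and the odd/even rule for the median can be sketched in Python as follows. The names `ranked`, `manual_median`, and `summary` are assumptions made for this example only.

```python
from statistics import median, multimode

# Weekly patient load from Table 27.1, ranked in ascending order.
patient_load = [52, 57, 57, 61, 44]
ranked = sorted(patient_load)  # [44, 52, 57, 57, 61]

n = len(ranked)
if n % 2 == 1:
    # Odd n: the value in 1-based position (n + 1)/2.
    manual_median = ranked[(n + 1) // 2 - 1]
else:
    # Even n: the average of the two middle values.
    manual_median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2

summary = {
    "minimum": ranked[0],
    "median": manual_median,
    "mode": multimode(ranked),  # list of the most frequent value(s)
    "maximum": ranked[-1],
}
print(summary)
print(median(patient_load))  # standard-library check of the median
```

`multimode` returns a list because a data set can have more than one most frequent value; here it contains only 57.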
Please bear in mind that in a perfectly symmetrical, single-peaked distribution such as the normal distribution, the mean, median, and mode are identical, while in asymmetrical or skewed distributions these three measures will generally differ.
A common measure of dispersion for continuous data is the standard deviation. It describes how much the individual values in a data set vary from the mean. The formula for the sample standard deviation is shown in Equation (27.2), where $\bar{x}$ is the mean of the n observed values:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \tag{27.2}$$
So, what will the standard deviation be for our family practice clinic example? Let’s see:
$$s = \sqrt{\frac{1}{5-1}\left[(44-54.2)^2 + (52-54.2)^2 + (57-54.2)^2 + (57-54.2)^2 + (61-54.2)^2\right]} = \sqrt{42.7} \approx 6.53$$
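The steps of this calculation can be sketched in Python. This is a minimal illustration of the sample standard deviation (divisor n – 1) applied to the Table 27.1 values; the variable names are assumptions for the example.

```python
from math import sqrt
from statistics import stdev

# Weekly patient load from Table 27.1.
patient_load = [52, 57, 57, 61, 44]

n = len(patient_load)
x_bar = sum(patient_load) / n  # sample mean, 54.2

# Sum of squared deviations from the mean, divided by (n - 1).
squared_deviations = [(x - x_bar) ** 2 for x in patient_load]
sample_variance = sum(squared_deviations) / (n - 1)

# The standard deviation is the square root of the sample variance.
s = sqrt(sample_variance)

print(round(sample_variance, 2))       # 42.7
print(round(s, 2))                     # 6.53
print(round(stdev(patient_load), 2))   # standard-library check, 6.53
```

Note that the square root is taken last; without it, the result is the sample variance (42.7) rather than the standard deviation.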
Table 27.1 Weekly Patient Load

Weekday      Patient Load
Monday       52
Tuesday      57
Wednesday    57
Thursday     61
Friday       44

Dispersion can also be described with percentiles, of which quartiles are a subset. When an ordered set of data is divided into four equal parts, the division points are called quartiles. The first or lower quartile, q1, is a value that has approximately 25% of the observations below it and approximately 75% of the observations above. The second quartile, q2, has approximately 50% of the observations below its value. The second quartile is exactly equal to the median. The third quartile, q3, has approximately 75% of the observations below its value. The positions of the first and third quartiles in the ordered data can be calculated as (n + 1)/4 and 3(n + 1)/4, respectively, where n is the number of observations. The interquartile range (iqr) is calculated as (q3 – q1). Also, the lower and upper limits beyond which an observation is treated as a potential outlier are calculated as q1 – 1.5 (q3 – q1) and q3 + 1.5 (q3 – q1), respectively. These metrics are extensively used in the creation of box plots, commonly known as box-and-whisker plots. Tufféry* indicates that the box plot can also be used to compare two populations, or to detect the individual outliers that must be excluded from the analysis to avoid falsifying the results.
Suppose that we have a pediatric unit where the management engineer is conducting a staffing analysis. The engineer wants to know the estimated daily census for any given day. One way for the engineer to understand the census dispersion for a particular day is to calculate the sample’s percentiles and subdivide the data into quartiles. Table 27.2 shows the census for 24 days.
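For this example, the median, after sorting in ascending order, is 21.5 (the average of the 12th and 13th values, 21 and 22), or about 22 patients. The minimum and maximum values are 13 and 31, respectively. The first and third quartiles, using the formulas presented above, are approximately 19 and 25, respectively; the exact first-quartile value depends on how a position that falls between two observations is interpolated. These values give the engineer a good perspective on the spread, or dispersion, of the daily census.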
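The quartile calculation for the pediatric census can be sketched in Python. This example assumes the (n + 1)-based positions described above with linear interpolation between neighboring observations, which is what the standard library's "exclusive" method does. Quartile conventions differ between textbooks and software packages, so the values computed here may differ slightly from those quoted in the text.

```python
from statistics import quantiles

# Pediatric unit daily census for 24 days (Table 27.2).
census = [31, 20, 18, 30, 20, 27, 22, 15,
          13, 19, 17, 15, 24, 24, 27, 18,
          23, 21, 25, 22, 21, 19, 30, 25]

# Cut points at positions (n + 1)/4, 2(n + 1)/4, and 3(n + 1)/4 of the
# sorted data, interpolating when a position falls between two values.
q1, q2, q3 = quantiles(census, n=4, method="exclusive")

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr  # values below this are potential outliers
upper_limit = q3 + 1.5 * iqr  # values above this are potential outliers
outliers = [x for x in census if x < lower_limit or x > upper_limit]

print(min(census), q1, q2, q3, max(census))  # 13 18.25 21.5 25.0 31
print(iqr, lower_limit, upper_limit)         # 6.75 8.125 35.125
print(outliers)                              # [] (no census value is flagged)
```

Under this convention the first quartile is 18.25 and the median is 21.5, and no daily census value falls outside the outlier limits.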
These metrics are widely used to analyze any process or system within the healthcare environment, whether the data are nominal, ordinal, interval, continuous, or discrete. The use of metrics such as the mean and standard deviation is the foundation of statistical process control (SPC), Total Quality Management (TQM), and Six Sigma methodologies, which are discussed in another chapter of this book.
Now, the author has noticed that when statistical analysis is presented to an audience with a diverse background (nontechnical to technical), the audience is likely to mistakenly believe that the mean and standard deviation are equivalent to the quartiles. Figure 27.1 shows how these two sets of metrics compare.
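For a normal distribution, the relationship between standard deviations and quartiles can be checked numerically. The following Python sketch assumes a standard normal distribution (mean 0, standard deviation 1) and is only an illustration of the comparison; the variable names are assumptions for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0 and standard deviation 1, so
# every value below is expressed in units of sigma.
z = NormalDist()

q1 = z.inv_cdf(0.25)          # first quartile, about -0.6745 sigma
q3 = z.inv_cdf(0.75)          # third quartile, about +0.6745 sigma
iqr = q3 - q1                 # about 1.349 sigma
lower_limit = q1 - 1.5 * iqr  # about -2.698 sigma
upper_limit = q3 + 1.5 * iqr  # about +2.698 sigma

# Share of observations within one standard deviation of the mean,
# compared with the share between the quartiles.
within_one_sigma = z.cdf(1) - z.cdf(-1)   # about 68.27%
within_quartiles = z.cdf(q3) - z.cdf(q1)  # 50%

print(round(q3, 4), round(upper_limit, 3))
print(round(within_one_sigma, 4), round(within_quartiles, 4))
```

The quartiles enclose the middle 50% of the observations and sit at roughly two-thirds of a standard deviation from the mean, whereas one standard deviation on either side of the mean encloses about 68% of the observations. The two sets of metrics are therefore related but not interchangeable.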
Data Visualization
Statistical analysis results must be presented in meaningful ways, especially if the audience is diverse. They should be presented simply, clearly, and concisely. Care must be taken when visualizing data to present an unbiased picture. Ryan† stresses that “much care must be exercised in the use of graphical procedures, otherwise, the impressions that are conveyed could be very misleading.” There are methods that are appropriate for displaying essential information in large data sets and there are methods for displaying small data sets. Methods for displaying small data sets include, but are not limited to, tabular displays, stem-and-leaf displays, control charts, scatter plots, frequency tables, bar charts, pie charts, and dot plots. Common methods for displaying large data sets include, but are not limited to, histograms, box plots, Pareto charts, line and regression charts, and bivariate and multivariate charts. For extremely large data sets, the most recent analysis method is called data mining.

* S. Tufféry, Data Mining and Statistics for Decision Making (West Sussex, UK: John Wiley & Sons Ltd., 2011).
† T. P. Ryan, Modern Engineering Statistics (Hoboken, NJ: John Wiley & Sons, Inc., 2007).

Table 27.2 Weekly Patient Load

Pediatric Unit Daily Census
31  20  18  30  20  27  22  15
13  19  17  15  24  24  27  18
23  21  25  22  21  19  30  25
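As a small illustration of one of the displays listed above for small data sets, the following Python sketch builds a stem-and-leaf display of the pediatric census in Table 27.2. It assumes two-digit values, with the tens digit as the stem and the units digit as the leaf; the variable names are assumptions for the example.

```python
from collections import defaultdict

# Pediatric unit daily census for 24 days (Table 27.2).
census = [31, 20, 18, 30, 20, 27, 22, 15,
          13, 19, 17, 15, 24, 24, 27, 18,
          23, 21, 25, 22, 21, 19, 30, 25]

# Group the sorted values by tens digit (stem); keep the units digit (leaf).
stems = defaultdict(list)
for value in sorted(census):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

lines = [
    f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
# 1 | 35578899
# 2 | 0011223445577
# 3 | 001
```

The display keeps every individual value visible while showing that most daily census values fall in the twenties.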
The following is a discussion of several examples of displaying data sets of various sizes.
Pie charts are broadly used to display small data sets with a small number of workable groups. Pie charts are the simplest and most commonly used charts for depicting nominal data, such as responses to limited-option questionnaires. Ott and Longnecker* provide simple guidelines for constructing pie charts. They recommend choosing “a small number (five or six) of categories for the variable, and to, whenever possible, construct the pie chart so that percentages are in either ascending or descending order.” Figure 27.2 depicts a local hospital’s percentage of births by day of week in a pie chart.
Bar charts are widely used to display small to medium-size data sets. The chart consists of two axes, horizontal and vertical, arranged on a small number of workable groups, and visually represents magnitude. A simple example would be a clinical laboratory’s manager responding to a question about the length of time per transaction for a pneumatic tube transport system. Table 27.3 summarizes the number of transactions per time frame.
Figure 27.3 shows the same data plotted using a bar chart.
By using data similar to that presented in Table 27.2, the management engineer can plot the daily census, by day of the week, for a pediatric unit. This will enable the engineer to better visualize any patterns in daily census. This large data set is from the results of a dynamic simulation
* R. L. Ott and M. Longnecker, An Introduction to Statistical Methods and Data Analysis (Belmont, CA: Brooks/Cole, Cengage Learning, 2010).
Figure 27.1 Graphs illustrating mean and standard deviation. (Figure not reproduced: a box plot marking the median, Q1, Q3, the IQR, and the limits Q1 – 1.5 × IQR and Q3 + 1.5 × IQR is aligned with normal curves scaled from –4σ to 4σ. The quartiles fall at ±0.6745σ and the limits at ±2.698σ, whereas ±1σ encloses 68.27% of the area.)