Appendix B: Understanding Results

In the process of analyzing data, you will encounter terms in the results that reference vital information. Throughout Chapters 5 and 6, we discussed many of the relevant terms and results in the context of the application and analysis that we were performing. This appendix is designed to provide a basic description of the common terms used in each of those platforms to help you understand your own results. We also point out where you can find complete descriptions in the JMP documentation. In most cases, you can simply use the “?” help tool and select the term in question to obtain more information. In other cases, by holding your mouse cursor over statistics, you can see Hover Help, which provides context-specific assistance in interpreting a statistic.

While a graph can be worth a thousand words and is more easily interpreted, the statistical results generated from the Analyze menu contain numerical results and terms that might be less clear. It is important that you understand the basic ideas behind the results. This section offers a basic and general reference rather than a comprehensive one. To increase your confidence in interpreting statistical results, we encourage you to seek advice from an experienced data analyst or reference the books that ship with JMP (see the Help menu) or those in the bibliography.

This appendix covers key terms and related concepts in the book, in alphabetical order. For the purpose of this appendix, the terms column and variable are interchangeable in these definitions.

Bivariate Plot: Also known as a scatter diagram or scatterplot; each point in the plot expresses both an X and a Y value. These two-dimensional plots are used when comparing one continuous column with another continuous column. See Help > JMP Documentation Library > Basic Analysis > Chapter 5: Bivariate Analysis.

Box and Whisker Plot: Also called a box plot or an outlier box plot. A graphical presentation of the important characteristics of a continuous variable. Box plots display the interquartile range of the data (the “box”), the spread (the whiskers), and potential outliers (disconnected points). Box plots are useful for describing variables that have a skewed distribution and for comparing two or more distributions. See Help > JMP Documentation Library > Basic Analysis > Chapter 3: Distributions.

Confidence Interval: An interval estimate for an unknown population quantity, such as a mean. For a 95% confidence interval, we are 95% certain, or confident, that the interval contains the true value; that is, intervals constructed this way capture the true value 95% of the time.
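Although JMP reports confidence intervals for you, the arithmetic behind a 95% interval for a mean is simple. The sketch below uses made-up measurements and the large-sample normal multiplier 1.96; statistical software such as JMP uses the t distribution for small samples, so its interval would be slightly wider.

```python
import statistics as stats

# Hypothetical sample of 10 measurements (illustrative values only)
data = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]

n = len(data)
mean = stats.mean(data)
se = stats.stdev(data) / n ** 0.5          # standard error of the mean

# 1.96 is the normal-approximation multiplier for 95% confidence;
# for small samples, use the t distribution instead.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

The interval is centered at the sample mean and widens as the data become more variable or the sample becomes smaller.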

Contingency Table: A table showing the observed frequencies of two nominal variables, with the rows indicating one variable and the columns indicating another. The table reports a chi-square statistic. See Help > JMP Documentation Library > Basic Analysis > Chapter 7: Contingency Analysis.

Count: The total number of members in a group.

Correlation: A measure of relationship between two continuous variables. It is a relationship where changes in the value of one variable are accompanied by changes in another variable or variables. For example, a correlation of 1 indicates that as one variable increases, the other also increases by some constant proportional amount. See Help > JMP Documentation Library > Multivariate Methods > Chapter 3: Correlations and Multivariate Techniques.
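To make the "correlation of 1" case concrete, here is a small Python sketch (the columns are made up) that computes the Pearson correlation coefficient directly from its definition:

```python
import statistics as stats

# Two illustrative columns; y is an exact linear function of x,
# so the correlation is 1 (a perfect positive relationship),
# up to floating-point rounding.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

def pearson_r(a, b):
    """Pearson correlation: covariance of a and b scaled by their spreads."""
    ma, mb = stats.mean(a), stats.mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / (sum((ai - ma) ** 2 for ai in a) ** 0.5 *
                  sum((bi - mb) ** 2 for bi in b) ** 0.5)

print(pearson_r(x, y))
```

A correlation near -1 would mean one variable decreases as the other increases, and a value near 0 means no linear relationship.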

Degrees of Freedom: Also abbreviated DF, degrees of freedom are associated with many statistical estimates. Intuitively, degrees of freedom are the number of freely varying observations in a data-based calculation. The larger the sample size, the larger the degrees of freedom and the stronger the inferences we can draw about population parameters. See Help > JMP Documentation Library > Basic Analysis > Oneway Analysis.

Distribution: The values of a single column or variable in terms of frequency of occurrence, spread, and shape. A distribution can be observed (based on data) or theoretical. Some examples of theoretical distributions are normal (bell-shaped), binomial, and Poisson. Distribution is also the JMP univariate platform.

F Ratio: In an ANOVA, the ratio of the between-group variability to the within-group variability (see One-Way Analysis of Variance). The F ratio is used, in conjunction with Prob > F, to test the null hypothesis that the group means are equal (that there is no real difference between them). In general, larger F ratios indicate significant differences between at least two means. See Help > JMP Documentation Library > Fitting Linear Models > Standard Least Squares Report and Options.
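The "between over within" idea can be shown in a short pure-Python sketch (the three groups below are hypothetical). Because the group means are far apart relative to the scatter inside each group, the F ratio comes out large:

```python
import statistics as stats

# Three illustrative groups with clearly different means
groups = [[10, 12, 11], [15, 14, 16], [20, 19, 21]]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total observations
grand_mean = stats.mean([v for g in groups for v in g])

# Between-group mean square: variability of group means around the grand mean
ss_between = sum(len(g) * (stats.mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-group mean square: variability of observations around their own group mean
ss_within = sum((v - stats.mean(g)) ** 2 for g in groups for v in g)
ms_within = ss_within / (n - k)

f_ratio = ms_between / ms_within
print(f"F ratio = {f_ratio:.1f}")
```

If the group means were all similar, the numerator would shrink toward the denominator and the F ratio would fall toward 1.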

Frequency: The number of times a categorical value is observed, a type of event occurs, or the number of elements of a sample that belong to a specified group. It is also called count.

Interquartile Range: A measure of variability or dispersion for a continuous column calculated as the difference or distance between the 25th and 75th percentiles (the first and third quartiles, respectively).
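As a quick illustration of the definition above (using Python's standard library rather than JMP, and made-up data; note that different software packages use slightly different quantile conventions, so results can differ a little from JMP's):

```python
import statistics as stats

# Eleven ordered illustrative values; quartiles divide them into four parts
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

q1, q2, q3 = stats.quantiles(data, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1                             # interquartile range
print(q1, q3, iqr)  # → 3.0 9.0 6.0
```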

Logistic Regression: A type of regression technique where the Y (dependent) variable is nominal or ordinal and there is at least one X (independent) variable. Logistic models (sometimes called logit models) are used to predict the probability of occurrence of an event based on the values of one or more variables. See Help > JMP Documentation Library > Fitting Linear Models > Chapter 11: Logistic Regression Models.
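The heart of a logistic model is the logistic (logit) transform, which maps a linear combination of the predictors onto the 0-to-1 probability scale. The coefficients below are invented purely for illustration; in practice, JMP estimates them from your data:

```python
import math

# Hypothetical fitted coefficients (intercept and slope), for illustration only
b0, b1 = -4.0, 0.5

def predicted_probability(x):
    """Logistic transform: converts the linear predictor b0 + b1*x into a probability."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# At x = 8 the linear predictor is 0, so the predicted probability is exactly 0.5
print(predicted_probability(8))  # → 0.5
```

Larger linear predictors push the probability toward 1, and more negative ones push it toward 0, but the output can never leave the 0-1 range.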

Maximum: The largest value observed in the sample.

Mean: A measure of location or central tendency of a column of continuous data. It is the arithmetic average computed by summing all the values in a column and dividing by the number of non-missing rows.

Median: A measure of location or central tendency of a continuous column of data. It is the middle value in an ordered column, which divides a distribution exactly in half. Fifty percent of the values are higher than the median and 50% are lower.
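The two definitions above can be illustrated with a short Python sketch (the column values are made up). Note how, as in JMP, the missing row is excluded before the mean is computed over the non-missing rows:

```python
import statistics as stats

# Illustrative column with one missing value (None); exclude missing rows first
column = [3, 7, None, 5, 9, 6]
values = [v for v in column if v is not None]

mean = sum(values) / len(values)      # arithmetic average of non-missing rows
median = stats.median(values)         # middle value of the ordered data
print(mean, median)  # → 6.0 6
```

Here the mean and median agree; with a strongly skewed column or an extreme outlier, the mean would be pulled away from the median.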

Minimum: The smallest value observed in the sample.

Multiple Regression: An analysis involving two or more independent X variables as predictors to estimate the value of a single dependent variable. The dependent Y variable is usually continuous, but the independent X variables can be continuous or categorical. The model is usually estimated by the method of standard least squares. See Help > JMP Documentation Library > Fitting Linear Models > Chapter 2: Model Specification.

One-Way Analysis of Variance (or One-Way ANOVA): A procedure involving a categorical X (independent) variable and a continuous Y (dependent) variable. One-way ANOVA is used to test for differences in means among two or more independent groups (though it is typically used to test for differences among at least three groups because the two-group case can be analyzed with a t-test). When there are only two means to compare, the t-test and the F-test (see F Ratio) are equivalent. See Help > JMP Documentation Library > Basic Analysis > Chapter 6: Oneway Analysis.

Outlier: An observation that is so extreme that it stands apart from the rest of the observations; that is, it differs so greatly from the rest of the data that it gives rise to the question of whether it is from the same population or involves measurement error. One common rule of thumb flags as an outlier any value that falls more than 1.5 times the interquartile range below the first quartile or above the third quartile.
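The 1.5 × IQR rule of thumb can be sketched in a few lines of Python (the data are made up, and Python's quantile convention may differ slightly from JMP's):

```python
import statistics as stats

# Illustrative column with one extreme value; the 1.5 x IQR rule flags it
data = [10, 12, 11, 13, 12, 11, 14, 40]

q1, _, q3 = stats.quantiles(data, n=4)           # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # outlier "fences"
outliers = [v for v in data if v < lower or v > upper]
print(outliers)  # → [40]
```

These fences are the same ones JMP's outlier box plot uses to decide which points to draw as disconnected dots beyond the whiskers.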

Partition: The Partition platform iteratively separates data according to a predictive relationship between a Y and multiple X values, from strongest to weakest, forming a tree structure. Partition searches through the data table to find values within X columns that best predict the outcome of Y, your column of interest. Partition is a data mining or predictive modeling technique. See Help > JMP Documentation Library > Predictive and Specialized Modeling > Chapter 4: Partition Models.

Prob > F: In ANOVA, the probability of obtaining (by chance alone) an F-value greater than the one calculated if, in reality, the null hypothesis is true. Prob (or “p”) values of 0.05 or less are often considered evidence of a real difference between at least two group means. See Help > JMP Documentation Library > Basic Analysis > Chapter 6: Oneway Analysis.

Prob > t: A p-value or measure of significance for a t-test. For a one-sample test or for a test of differences between two means, it is the probability of obtaining a test statistic at least as extreme as the one observed, if the null hypothesis were true. Prob > t values of 0.05 or less are usually considered significant. See Help > JMP Documentation Library > Basic Analysis > Chapter 3: Distributions.

Quantiles: Values that divide an ordered set of continuous data (from smallest to largest) into equal proportions. Related terms are deciles (dividing data into 10 parts) and quartiles (dividing data into four parts, or quarters). Values in the 97th percentile, or quantile, are equal to or larger than 97% of all values in the distribution.

Quartiles: Values in a continuous column of data that are first ordered (from smallest to largest) and then divided into four quarters, each of which contains 25% of the observed values. The 25th, 50th, and 75th percentiles are the same as the first, second, and third quartiles, respectively. See also Quantiles.

Regression: A statistical procedure that shows how two or more variables are related, which is represented in the simple case by a fitted line in a bivariate scatterplot. The fitted line, along with its regression equation, allows one to predict values of Y based on observations of X. The simple form of this equation is expressed as y = mx + b, where the slope m measures the extent to which Y changes with X.
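The slope and intercept of the fitted line have simple closed-form formulas. The sketch below (illustrative data, plain Python rather than JMP) computes them directly:

```python
import statistics as stats

# Fit y = mx + b by least squares to illustrative, roughly linear data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mx, my = stats.mean(x), stats.mean(y)
# Slope: covariance of x and y divided by the variability of x
m = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) /
     sum((xi - mx) ** 2 for xi in x))
b = my - m * mx                      # the fitted line passes through (x-bar, y-bar)
print(f"y = {m:.2f}x + {b:.2f}")
```

With these numbers the slope comes out close to 2, meaning Y rises by about 2 units for each 1-unit increase in X.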

RSquare: A measure of the adequacy of a model, defined as the proportion of variability in Y that is accounted for by the statistical model. RSquare provides a measure of how well future outcomes are likely to be predicted by the model. See Help > JMP Documentation Library > Basic Analysis > Chapter 5: Bivariate Analysis.
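RSquare compares the model's errors to the errors you would make by simply predicting the mean every time. A short sketch with hypothetical observed values and model predictions:

```python
# Hypothetical observed values and model predictions, for illustration only
observed  = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

mean = sum(observed) / len(observed)
sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))   # error sum of squares
sst = sum((o - mean) ** 2 for o in observed)                   # total sum of squares
r_square = 1 - sse / sst
print(r_square)  # → 0.95
```

An RSquare of 0.95 means the model accounts for 95% of the variability in Y; an RSquare of 0 means the model predicts no better than the mean alone.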

Standard Deviation: A measure of variability or dispersion of a data set calculated by taking the square root of the variance. It can be interpreted as the average distance of the individual observations from the mean. The standard deviation is expressed in the same units as the measurement in question. It is usually employed in conjunction with the mean to summarize a continuous column.
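The "square root of the variance" relationship is one line of code. The values below are made up for illustration:

```python
import statistics as stats

# Illustrative column of continuous data
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = stats.variance(data)    # sample variance (divides by n - 1)
sd = variance ** 0.5               # standard deviation: square root of the variance
print(round(sd, 3))  # → 2.138
```

Because the squaring in the variance is undone by the square root, the standard deviation is back in the original measurement units, which is why it pairs naturally with the mean in summaries.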

Standard Least Squares: A method of fitting a line to data in a bivariate plot or multiple regression model. Least squares is used where there is one continuous Y and at least one X column in your model. Least squares fits a model that minimizes the sum of the squared errors (residuals), hence the name “least squares.” The least squares method is used extensively for prediction and for calculating the relationship between two or more variables. See Help > JMP Documentation Library > Fitting Linear Models > Chapter 3: Standard Least Squares Report and Options.

Sum of Squares: A measure of how far the observed data fall from the model. It is calculated by squaring the errors (the vertical distance between each data point and the model fit) and summing those values. We square the values to obtain a positive sum of the errors regardless of whether the observed values are above or below the fit. The best model fit is one where the total sum of squares of the errors is minimized.
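The calculation described above takes only a few lines. The observed values and fitted values below are hypothetical:

```python
# Errors are vertical distances between the data points and the model fit;
# squaring makes every contribution positive before summing
observed = [3.0, 5.0, 7.0, 9.0]
fitted   = [3.5, 4.5, 7.5, 8.5]

errors = [o - f for o, f in zip(observed, fitted)]   # residuals: +/-0.5 each
sum_of_squares = sum(e ** 2 for e in errors)
print(sum_of_squares)  # → 1.0
```

Notice that the positive and negative errors would cancel if simply added (they sum to 0 here), which is exactly why they are squared first.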
