Best practices for statistics

Statistics are an integral part of any predictive modelling assignment because they help us gauge how well a model performs. Each predictive model generates a set of statistics that indicate how good the model is and how it can be fine-tuned to perform better. The following table summarizes the most widely reported statistics, and their desired values, for the predictive models described in this book:

Algorithm	Statistics/parameters	Desired values
Linear regression	R², Adj. R², F-statistic, and p-values	High Adj. R², high F-statistic, and low p-values
Logistic regression	Sensitivity, specificity, Area Under the Curve (AUC), and KS statistic	High AUC (close to 1) and high KS statistic
Clustering	Intra-cluster distance and silhouette coefficient	Low intra-cluster distance and high silhouette coefficient (close to 1)
Decision trees (classification)	AUC and KS statistic	High AUC (close to 1) and high KS statistic
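To make a few of the statistics in the table concrete, here is a minimal sketch that computes R² and Adj. R² for a regression, and AUC and the KS statistic for a binary classifier, using only the Python standard library. The function names (`r2_scores`, `auc`, `ks_statistic`) and the toy numbers are assumptions made for illustration; they do not come from the book, and in practice a statistics library would normally report these values.

```python
def r2_scores(y_true, y_pred, n_predictors):
    """Return (R^2, adjusted R^2) for a fitted regression model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes the number of predictors used by the model.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj_r2


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


# Toy data, invented for this example.
r2, adj_r2 = r2_scores([3, 5, 7, 9], [2.8, 5.1, 7.2, 8.9], n_predictors=1)
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
print("R2:", r2, "Adj. R2:", adj_r2)
print("AUC:", auc(labels, scores), "KS:", ks_statistic(labels, scores))
```

The pairwise AUC computation is quadratic in the number of observations, which is fine for a small illustration but would likely be replaced by a rank-based calculation on real data.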

When reporting the results of a predictive model, state the values of these statistics and what they mean in the business context explicitly. A brief and lucid explanation of the relevance and significance of each statistic is appreciated. Report the best values attained for these statistics. The model should be fine-tuned based on these values until they can't be improved any further.

Apart from these statistics, various statistical tests can be performed on the dataset to test certain hypotheses about the data before fitting any predictive model to it. These tests include the Z-test, t-test, chi-square test, ANOVA, and so on. If such tests have been performed, the results (value and significance) and their implications should be clearly stated.
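As an illustration of reporting both the value and the significance of a test, the following sketch implements two of the tests named above with the Python standard library only: a two-sample Z-test (assuming the population standard deviations are known) and a chi-square test of independence on a 2x2 table (one degree of freedom, no continuity correction). The function names and the sample data are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist, mean


def two_sample_z_test(a, b, sigma_a, sigma_b):
    """Two-sided Z-test for equal means, with known population std devs."""
    se = math.sqrt(sigma_a ** 2 / len(a) + sigma_b ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical samples and counts, invented for this example.
z, p_z = two_sample_z_test([5.1, 4.9, 5.3, 5.0], [4.2, 4.4, 4.1, 4.3], 0.2, 0.2)
chi2, p_chi = chi_square_2x2([[30, 10], [20, 40]])
print(f"Z-test: z = {z:.3f}, p = {p_z:.2e}")
print(f"Chi-square test: chi2 = {chi2:.3f}, p = {p_chi:.2e}")
```

A report would then state each test statistic together with its p-value and the conclusion drawn at the chosen significance level, for example whether the two groups differ or whether the two categorical variables appear to be associated.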
