Interpreting the p-value

A p-value is a decimal between 0 and 1 that represents the probability of seeing data at least as extreme as ours if the null hypothesis of the test were true. Simply put, the lower the p-value, the stronger the evidence for rejecting the null hypothesis. For our purposes, the smaller a feature's p-value, the better the chance that the feature has some relevance to our response variable and that we should keep it.
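To make this concrete, here is a minimal sketch (not an example from this book) that runs a one-way ANOVA on two made-up groups of feature values using scipy.stats.f_oneway; the data, variable names, and the 0.05 cutoff are all assumptions for illustration only.

```python
from scipy.stats import f_oneway

# Hypothetical feature values, split by class label
group_a = [2.1, 2.5, 2.3, 2.7, 2.4]   # feature values where y == 0
group_b = [3.9, 4.2, 4.0, 4.4, 4.1]   # feature values where y == 1

# One-way ANOVA: the null hypothesis is that both groups share the same mean
f_stat, p_value = f_oneway(group_a, group_b)

# A small p-value means the group means differ more than chance alone
# would explain, suggesting the feature is related to the class label
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")
if p_value < 0.05:  # 0.05 is a common, though arbitrary, cutoff
    print("Reject the null hypothesis -- keep this feature")
```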

For a more in-depth treatment of statistical testing, check out Principles of Data Science, https://www.packtpub.com/big-data-and-business-intelligence/principles-data-science, from Packt Publishing.

The big takeaway from this is that the f_classif function will perform an ANOVA test (a type of hypothesis test) on each feature on its own (hence the name univariate testing) and assign that feature a p-value. SelectKBest will then rank the features by that p-value (the lower, the better) and keep only the k best features, where k is a number we supply. Let's try this out in Python.
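As a rough sketch of what this looks like in scikit-learn (the iris dataset, the choice of k=2, and the variable names below are placeholders, not the book's actual example):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Run an ANOVA F-test on each feature independently (univariate testing)
# and keep only the k features with the lowest p-values
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("p-values per feature:", selector.pvalues_)
print("features kept (mask):", selector.get_support())
print("original shape:", X.shape, "-> reduced shape:", X_selected.shape)
```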
