Appendix B. Important statistical concepts

Statistics is such a broad topic that we’ve only been able to pull pieces of it into our data science narrative. But it’s an important field that has a lot to say about what happens when you attempt to infer from data. We’ve assumed in this book that you already know some statistical ideas (in particular, summary statistics such as the mean, mode, median, variance, and standard deviation). In this appendix, we’ll demonstrate a few more important statistical concepts that relate to model fitting, characterizing uncertainty, and experimental design.

Statistics is math, so this appendix is a bit mathematical. It’s also intended to teach you the proper statistical nomenclature, so you can share your work with other data scientists. This appendix covers technical terms you will hear as part of “data science shop talk.” You’ve been doing the data science work; now, we’ll discuss tools to talk about and criticize the work.

A statistic is any sort of summary or measure of data. An example would be the number of people in a room. Statistics is the study of how observed summaries of samples relate to the (unobserved) true summaries of the entire population we hope to model. Statistics helps us describe and mitigate the variance (or variation) of estimates, uncertainty (ranges or estimated ranges of what we do not know), and bias (systematic errors our procedures unfortunately introduce).
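
To make the sample-versus-population distinction concrete, here is a minimal Python sketch. It is an illustration of ours, not a method from this book: the simulated population, the sample size of 100, and the helper name `sample_mean` are all assumptions chosen for the example. It draws many samples from a known population and measures how much the sample mean varies from sample to sample, and whether it is systematically off.

```python
import random
import statistics

random.seed(2024)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: 100,000 simulated values (mean 50, sd 10).
# In practice the full population is unobserved; we simulate one here
# only so that the "true" summary is available for comparison.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def sample_mean(n):
    """Mean of a simple random sample of size n (without replacement)."""
    return statistics.fmean(random.sample(population, n))


# Repeat the sampling procedure many times to see how the estimate behaves.
estimates = [sample_mean(100) for _ in range(1_000)]

# Variation: how much the estimate moves from one sample to the next.
spread = statistics.stdev(estimates)

# Bias: how far the estimates are, on average, from the true value.
bias = statistics.fmean(estimates) - true_mean

print(f"true mean of population:  {true_mean:.2f}")
print(f"spread of sample means:   {spread:.2f}")
print(f"average error (bias):     {bias:.2f}")
```

Under these assumptions the spread should come out close to 10 / sqrt(100) = 1, and the bias close to zero, because simple random sampling does not systematically favor any part of the population. A procedure that sampled only, say, the largest values would show a clearly nonzero bias.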

For example, even a database of all of our company's past marketing and sales is still at best a sample of all possible sales (including the future marketing and sales we are hoping to predict with our models). If we do not account for the uncertainty introduced by sampling (and by many other causes), we will draw incorrect inferences and conclusions.[1]

[1] We like to call machine learning the optimistic view of data and statistics the pessimistic view. In our opinion, you need to understand both of these viewpoints to work with data.
