Chapter 3. Statistical Data Analysis and Probability

We will cover the following recipes in this chapter:

  • Fitting data to the exponential distribution
  • Fitting aggregated data to the gamma distribution
  • Fitting aggregated counts to the Poisson distribution
  • Determining bias
  • Estimating kernel density
  • Determining confidence intervals for mean, variance, and standard deviation
  • Sampling with probability weights
  • Exploring extreme values
  • Correlating variables with Pearson's correlation
  • Correlating variables with the Spearman rank correlation
  • Correlating a binary and a continuous variable with the point-biserial correlation
  • Evaluating relationships between variables with ANOVA

Introduction

Many statistical distributions have been invented, and for data analysts they are the equivalent of the wheel. Just as whatever I think of comes out differently in print, data in our world doesn't follow strict mathematical laws. Nevertheless, after visualizing our data, we can often see that it follows a distribution, at least to a certain extent. Even without visualization, we can find a candidate distribution using rules of thumb. The next step is to try to fit the data to a known distribution. If the data is very complex, possibly due to a high number of variables, it is useful to estimate its kernel density (this is also useful with a single variable). In all scenarios, it is good practice to estimate the confidence intervals or p-values of our results. When we have at least two variables, it is sometimes appropriate to look at the correlation between them. In this chapter, we will apply three types of correlation: Pearson's, Spearman rank, and point-biserial.
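As a rough preview of that workflow, the following is a minimal sketch using only the Python standard library, not the code of the recipes themselves. It assumes synthetic exponential data, and the helper names (`fit_exponential`, `mean_confidence_interval`, `pearson`) are made up for illustration. It fits a rate by maximum likelihood, puts a normal-approximation confidence interval around the mean, and computes a Pearson correlation coefficient.

```python
import math
import random
import statistics


def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(samples)


def mean_confidence_interval(samples, confidence=0.95):
    """Confidence interval for the mean, using a normal approximation."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 5000 draws from an exponential distribution with rate 0.5.
rng = random.Random(42)
samples = [rng.expovariate(0.5) for _ in range(5000)]

rate_hat = fit_exponential(samples)
low, high = mean_confidence_interval(samples)
print(f"estimated rate: {rate_hat:.3f}")
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")

# A perfectly linear relationship gives a coefficient of (approximately) 1.
print(f"pearson: {pearson([1, 2, 3, 4], [2, 4, 6, 8]):.3f}")
```

With real data, a library such as SciPy would normally take the place of these hand-written helpers; the standard library is used here only to keep the sketch self-contained.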
