Chapter 3. Learning About Models

In the most generic sense, a model is an approximate description of a portion of reality. Models are essential to science and, in fact, any area of knowledge: it is only possible to comprehend the world by concentrating on a small part of it at a time and making suitable simplifications.

In this chapter, we will discuss the following topics:

  • Using basic models in data analysis
  • Using the cumulative distribution function and probability density function to characterize a variable
  • Using the preceding functions and various tools to make point estimates and generate random numbers with a given distribution
  • Discussing examples of discrete and continuous random variables and giving an overview of multivariate distributions

Models and experiments

Models can take many forms: a verbal description, a set of mathematical equations, or a segment of computer code. In this book, we are interested in a specific kind of model, the probabilistic or statistical model, which represents the variability that occurs in a nondeterministic experiment.

Note

We use the term experiment in this book in a somewhat non-technical sense. For us, an experiment is any observation of an event of interest. Examples of experiments are observing the number of visitors to a website, conducting an opinion poll, or running a clinical trial. The main characteristics of experiments, for us, are that they can be repeated and that they involve randomness; that is, each repetition of the same experiment may result in a different outcome.

The models that we will consider take the form of random variables. A random variable is an idealized representation of an experiment whose outcomes are numerical. It is important to realize that a random variable is an abstraction: it does not represent the outcome of a particular experiment; rather, it models the results we expect to get once the experiment is actually performed.
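To make the distinction between a random variable and the outcome of an experiment concrete, the following minimal sketch may help. It assumes a hypothetical experiment (counting heads in 10 tosses of a fair coin) that is not taken from the text, and it uses a "frozen" scipy.stats distribution to stand in for the random variable. The names X, rng, and outcomes are chosen for illustration only.

```python
import numpy as np
import scipy.stats as st

# Hypothetical model (an assumption for illustration):
# the number of heads in 10 tosses of a fair coin.
# The frozen distribution object plays the role of the random variable.
X = st.binom(n=10, p=0.5)

# Properties of the model are available before any experiment is performed.
expected_heads = X.mean()   # expected number of heads under the model
prob_five = X.pmf(5)        # probability of observing exactly 5 heads
print(expected_heads, prob_five)

# Drawing values corresponds to actually performing the experiment.
# Each draw is one outcome; repeated draws will generally differ.
rng = np.random.default_rng(seed=0)
outcomes = X.rvs(size=5, random_state=rng)
print(outcomes)
```

The object X itself never changes, whereas each call to rvs produces new outcomes. This mirrors the idea that the random variable describes what we expect, not what a particular repetition of the experiment yields.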

In the remainder of this chapter, we will discuss how statistical models are formulated and describe the most important models used in data analysis.

Before running the examples in this chapter, start the Jupyter Notebook. After the default imports, run the following commands in a cell:

from pandas import Series, DataFrame
import numpy.random as rnd
import scipy.stats as st

You are now ready to start running the code for this chapter.
