Monte Carlo estimation of the likelihood function and PyMC

Bayesian statistics isn't just another method; it is an entirely different paradigm for practicing statistics. It uses probability models to make inferences, given the data that has been collected. Its fundamental quantity can be expressed as P(H|D).

Here, H is our hypothesis, that is, the thing we're trying to prove, and D is our data or observations.

As a reminder of our previous discussion, the diachronic form of Bayes' theorem is as follows:

$$P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(D)}$$

Here, P(H) is an unconditional prior probability that represents what we know before we conduct our trial. P(D|H) is our likelihood function, or probability of obtaining the data we observe, given that our hypothesis is true.

P(D) is the probability of the data, also known as the normalizing constant. It can be obtained by integrating (or, for discrete hypotheses, summing) the numerator, P(H)P(D|H), over all hypotheses H.
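To make these pieces concrete, here is a minimal sketch (not from the text) that applies the diachronic form of Bayes' theorem to a discrete set of hypotheses about a coin's bias; the candidate biases, the flat prior, and the observed counts are illustrative assumptions.

```python
from math import comb

# Hypothetical candidate values for P(heads) and a flat prior P(H)
hypotheses = [0.25, 0.5, 0.75]
prior = {h: 1 / 3 for h in hypotheses}

def likelihood(heads, tosses, h):
    """P(D|H): probability of seeing `heads` heads in `tosses` tosses if the bias is h."""
    return comb(tosses, heads) * h**heads * (1 - h)**(tosses - heads)

def posterior(heads, tosses):
    # Numerator of Bayes' theorem, P(H) * P(D|H), for each hypothesis
    unnorm = {h: prior[h] * likelihood(heads, tosses, h) for h in hypotheses}
    # P(D): the normalizing constant, obtained by summing the numerator over H
    p_d = sum(unnorm.values())
    return {h: p / p_d for h, p in unnorm.items()}

post = posterior(heads=7, tosses=10)
```

After observing 7 heads in 10 tosses, the posterior shifts its mass toward the 0.75 hypothesis, while still summing to one over all hypotheses.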

The likelihood function is the most important piece of our Bayesian calculation and encapsulates all the information about the unknowns in the data. It resembles a probability mass function read in reverse: it is evaluated as a function of the hypothesis, with the observed data held fixed.

One argument against adopting a Bayesian approach is that the choice of prior can be subjective. There are many arguments in favor of the approach, one being that external prior information can be incorporated, as mentioned previously.

The normalizing constant, P(D), also called the marginal likelihood, represents an integral that in simple cases can be obtained by analytic integration.
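As an illustration of such a simple case (a sketch, assuming a uniform Beta(1, 1) prior on a coin's bias θ and k heads in n tosses), the integral has a closed form via the Beta function:

```latex
P(D) = \int_0^1 \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k}\,d\theta
     = \binom{n}{k}\,B(k+1,\,n-k+1)
     = \frac{1}{n+1}
```

With a uniform prior, every head count from 0 to n is equally likely a priori, which is why the marginal likelihood collapses to 1/(n+1).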

Monte Carlo (MC) integration is needed for more complicated use cases involving higher-dimensional integrals and can be used to compute the likelihood function.

MC integration can be computed through a variety of sampling methods, such as uniform sampling, stratified sampling, and importance sampling. In Monte Carlo integration, we write the integral as an expectation under a density g:

$$I = \int h(x)\,g(x)\,dx = E_g[h(X)]$$

This is approximated by the following finite sum:

$$\hat{I}_n = \frac{1}{n}\sum_{i=1}^{n} h(x_i)$$

Here, each x_i is a sample vector drawn from g. That this estimate is a good one follows from the law of large numbers, provided we also check that the simulation error is small.
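The simplest of these methods, uniform sampling, can be sketched in a few lines. Here we estimate the known integral I = ∫₀¹ x² dx = 1/3 by averaging h(x) over uniform draws; the integrand, sample size, and seed are illustrative choices, not from the text.

```python
import random

def mc_integrate(h, n=100_000, seed=42):
    """Plain Monte Carlo integration of h over [0, 1] with uniform sampling."""
    rng = random.Random(seed)
    # Average h over n uniform draws; by the law of large numbers
    # this converges to E[h(X)] = I as n grows.
    return sum(h(rng.random()) for _ in range(n)) / n

estimate = mc_integrate(lambda x: x * x)  # true value is 1/3
```

The simulation error shrinks like 1/sqrt(n), so quadrupling the sample size only halves the error; this slow rate is one motivation for the variance-reduction schemes (stratified and importance sampling) mentioned above.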

When conducting Bayesian analysis in Python, we will need a module that enables us to calculate the likelihood function using the Monte Carlo method. The PyMC library fulfills that need: it provides a family of Monte Carlo methods commonly known as Markov chain Monte Carlo (MCMC). We will not delve further into the technical details of MCMC here, but the interested reader can find out more about the MCMC implementation in PyMC from its documentation.

MCMC is not a panacea; the approach has some drawbacks, one of which is the slow convergence of the algorithm.
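To give a feel for what an MCMC sampler does, here is a minimal random-walk Metropolis sketch. This is not PyMC's implementation; the target density, step size, and iteration counts are illustrative assumptions, chosen so the chain samples a standard normal.

```python
import math
import random

def metropolis(log_post, start, step=1.0, n=20_000, burn=2_000, seed=0):
    """Random-walk Metropolis: draws samples from exp(log_post), known only up to a constant."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    samples = []
    for i in range(n):
        prop = x + rng.gauss(0.0, step)       # symmetric Gaussian proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        if i >= burn:                          # discard burn-in draws
            samples.append(x)
    return samples

# Target: an unnormalized standard normal log-density
draws = metropolis(lambda x: -0.5 * x * x, start=0.0)
```

Because successive draws are correlated, the chain needs many iterations (and a burn-in period) before its sample mean and variance settle near the target's; this is the slow convergence mentioned above.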
