Poisson distribution

The Poisson distribution has been explained in an earlier section of this chapter. In a nutshell, the Poisson distribution is the limiting case of a binomial distribution in which the number of trials grows very large while the probability of success in each trial shrinks, keeping the expected number of successes fixed. The Poisson distribution also deals with the probability of the occurrence of events, but rather than thinking in terms of the probability of the event occurring in each trial, we think in terms of time intervals and ask how many times the event of interest occurs in that interval. The parameter accordingly moves from the probability of success for each trial to the average number of successes in a given time interval.

Here's a summary:

  • Binomial: Probability of a number of successes in a given number of trials given a probability of success for each trial
  • Poisson: Probability of a number of successes in a given interval of time given the arrival or success rate—that is, the average number of successes in a given time interval
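
To make the connection concrete, here is a minimal sketch (using scipy.stats, which is not otherwise used in this section) comparing a binomial distribution with a very large number of trials against a Poisson distribution with the same expected number of successes; the two sets of probabilities should agree closely:

from scipy.stats import binom, poisson

lam = 6       # expected number of successes per interval
n = 100000    # number of trials; p = lam / n keeps the mean fixed
for k in range(10):
    # With n large and p small, Binomial(n, p) approaches Poisson(n * p)
    print(k, binom.pmf(k, n, lam / n), poisson.pmf(k, lam))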

The Poisson probability distribution is expressed by the following:

P(X = x) = (λ^x · e^(−λ)) / x!
Here, λ is the arrival or success rate.

This expression gives the probability of observing x successes in the given time interval (the same interval in which the arrival rate is defined).
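
For example, if the arrival rate is λ = 6 per unit time, the probability of observing exactly x = 4 arrivals in one unit of time is P(X = 4) = (6^4 · e^(−6)) / 4! ≈ 0.1339.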

We are interested in the MLE estimation of λ given a dataset that is assumed to follow a Poisson distribution:

For n independent observations x1, x2, ..., xn, the likelihood is the product of the individual Poisson probabilities:

L(λ) = Π (λ^(xi) · e^(−λ)) / xi!

Taking the log turns the product into a sum:

ln L(λ) = ln(λ) · Σ xi − nλ − Σ ln(xi!)

Differentiating with respect to λ and equating to 0:

d(ln L)/dλ = (Σ xi)/λ − n = 0, which gives λ̂ = (Σ xi)/n

Note how taking the log eases the calculation algebraically. It introduces some numerical challenges, though. For example, we must make sure that the likelihood is never 0, as the log of 0 is undefined, and numerical methods can return an invalid value when the likelihood underflows to 0 and the log-likelihood becomes infinitely negative.

The MLE finds that the estimate of the arrival rate is equal to the mean of the dataset, that is, the average number of arrivals observed per time interval in the past. The preceding calculation can be done using NumPy and other supporting packages in Python.

There are several steps that we need to take to perform this calculation in Python:

  1.  Write a function to calculate the Poisson probability for each point:
import numpy as np
import math as mh
np.seterr(divide='ignore', invalid='ignore')  # suppress divide-by-zero and invalid-value warnings

def poissonpdf(x, lbd):
    # Poisson PMF: lambda^x * e^(-lambda) / x!
    val = (np.power(lbd, x) * np.exp(-lbd)) / mh.factorial(x)
    return val
  2.  Write a function to calculate the log-likelihood over the data given a value for the arrival rate (a numerically stabler variant is sketched after this procedure):
def loglikelihood(data, lbd):
    # Multiply the per-point probabilities, then take the log
    lkhd = 1
    for i in range(len(data)):
        lkhd = lkhd * poissonpdf(data[i], lbd)
    if lkhd != 0:
        val = np.log(lkhd)
    else:
        val = 0  # guard against log(0) when the product underflows
    return val
  3.  Write a function to calculate the derivative of the log-likelihood with respect to the arrival rate λ:
def diffllhd(data, lbd):
    # d/d(lambda) of the log-likelihood: -n + (sum of x_i) / lambda
    diff = -len(data) + sum(data) / lbd
    return diff
  4.  Generate test data with 100 data points, each a random number of arrivals per unit time between 3 and 12:
from random import randint

data = [randint(3, 12) for p in range(100)]
  5.  Calculate the log-likelihood for different values of the arrival rate (1 to 9) and plot them to find the arrival rate that maximizes the log-likelihood:
import matplotlib.pyplot as plt

y = [loglikelihood(data, i) for i in range(1, 10)]
y = [num for num in y if num]  # drop the 0 placeholders left by underflow
x = [i for i in range(1, 10) if loglikelihood(data, i)]
plt.plot(x, y)
plt.axvline(x=np.mean(data), color='k')  # the MLE is the sample mean
plt.title('Log-likelihoods for different lambdas')
plt.xlabel('Lambda')
plt.ylabel('Log-likelihood')
plt.show()

From this, we get the following plot, which shows that the maximum value of the log-likelihood on the test data is obtained when the arrival rate is close to the sample mean (around 7.5 arrivals per unit time for data drawn uniformly between 3 and 12):

Log-likelihood values at different values of lambda (that is, arrival rate)
  6.  Use the Newton–Raphson method to find the maximum of the log-likelihood. The maximum lies where the derivative of the log-likelihood is 0, so we apply Newton–Raphson to the derivative, which requires the second derivative as well:
def ddiffllhd(data, lbd):
    # Second derivative of the log-likelihood: -(sum of x_i) / lambda^2
    return -sum(data) / (lbd ** 2)

def newtonRaphson(data, lbd):
    h = diffllhd(data, lbd) / ddiffllhd(data, lbd)
    while abs(h) >= 0.0001:
        # lambda(i+1) = lambda(i) - f'(lambda) / f''(lambda)
        h = diffllhd(data, lbd) / ddiffllhd(data, lbd)
        lbd = lbd - h
    return lbd
Note: The lbd parameter in the function definition is the initial value to start the search with.
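
With all the pieces in place, a few quick checks, shown here as a sketch, confirm that the functions behave as the maths predicts: the probability matches the worked example given earlier, the derivative of the log-likelihood vanishes at the sample mean, and the Newton–Raphson search lands on the sample mean:

print(poissonpdf(4, 6))               # approximately 0.1339, as calculated earlier
print(diffllhd(data, np.mean(data)))  # approximately 0: the derivative vanishes at the mean
print(newtonRaphson(data, 5))         # converges to approximately np.mean(data)
print(np.mean(data))                  # the analytical MLE, for comparison

As mentioned in step 2, a numerically stabler variant of loglikelihood sums the logs of the per-point probabilities instead of taking the log of their product, which avoids the underflow that the 0 guard works around:

def loglikelihood_stable(data, lbd):
    # log of a product equals the sum of the logs: no underflow-prone product
    return sum(np.log(poissonpdf(x, lbd)) for x in data)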

The Newton–Raphson method is a popular computational method used to find the roots of complex equations. It is an iterative process that repeatedly updates the independent variable until the value of the function reaches 0. More information can be found at http://www.math.ubc.ca/~anstee/math104/newtonmethod.pdf.

The results are greatly affected by the initial value of the parameter that is provided to start the search. The iterative search can go in very different directions if the start values are different, so be careful while using it.
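
For instance, with the test data used here (sample mean around 7.5), starting points on either side of the solution converge to the same estimate, while a start beyond roughly twice the sample mean overshoots to a negative lambda from which the iteration cannot recover; that threshold is specific to this particular likelihood:

print(newtonRaphson(data, 3))    # converges to the sample mean
print(newtonRaphson(data, 12))   # same result from the other side
# newtonRaphson(data, 20) would overshoot to a negative lambda and never terminate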

The concept of MLE can be extended to perform distribution-based regression. Suppose that we hypothesize that the arrival rate is a function of one or several parameters, say w0, w1, and w2. Then lambda would be defined by a function of those parameters:

λ = f(w0, w1, w2)
In this case, the arrival rate would be calculated as follows:

  1. Use the value of the arrival rate from the previous equation in the log-likelihood calculation.
  2. Find the partial derivative of the log-likelihood with respect to w0, w1, and w2.
  3. Equate all the partial derivatives to 0 and find the optimum values of w0, w1, and w2.
  4. Find the optimum value of the arrival rate based on these parameters.
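
The following is a minimal sketch of these steps, assuming a hypothetical log-linear form λ = exp(w0 + w1 · x) with two weights for brevity, and using scipy.optimize to maximize the log-likelihood numerically instead of solving the partial-derivative equations by hand:

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln  # log(x!) equals gammaln(x + 1)

def neg_loglik(w, x, y):
    lam = np.exp(w[0] + w[1] * x)  # the arrival rate as a function of x
    # Negative Poisson log-likelihood, summed over all observations
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

rng = np.random.default_rng(0)
x = np.linspace(0, 2, 200)
y = rng.poisson(np.exp(0.5 + 1.2 * x))  # synthetic data with known weights

res = minimize(neg_loglik, x0=[0.0, 0.0], args=(x, y))
print(res.x)  # the estimates should be close to [0.5, 1.2]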

MLE calculations are typically used to do the following:

  1. Estimate population parameters, such as the mean, standard deviation, arrival rate, and density, from sample data.
  2. Fit distribution-based regression on data where simple linear regression wouldn't work, such as the parameter-based arrival rate example discussed previously, or the weights of a logistic regression.

Its use in fitting regression models puts it in the same league as optimization methods such as OLS, gradient descent, Adam optimization, and RMSprop.
