Computing the chi-squared value

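The $\chi^2$ value is based on $\sum_i \frac{(e_i - o_i)^2}{e_i}$, where the $e$ values are the expected values and the $o$ values are the observed values. In our case, we have two dimensions, shift, $s$, and defect type, $t$, which leads to $\chi^2 = \sum_s \sum_t \frac{(e_{s,t} - o_{s,t})^2}{e_{s,t}}$.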

We can compute the specified formula's value as follows:

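# Contribution of one cell: squared deviation scaled by the expected count.
diff = lambda e, o: (e-o)**2/e

# Sum the contributions over every (shift, defect type) combination.
chi2 = sum(
    diff(expected[s, t], defects[s, t])
    for s in shift_totals
    for t in type_totals
)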

We've defined a small lambda to help us simplify the calculation. This allows us to evaluate the expected[s, t] and defects[s, t] lookups just once, even though the expected value is used in two places. For this dataset, the final $\chi^2$ value is 19.18.
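To make the calculation concrete, here is a minimal, self-contained sketch. The counts in defects are an assumption standing in for the data loaded earlier in the analysis; they were picked because they give a sum close to the 19.18 quoted above. The names shift_totals, type_totals, and expected follow the snippet, but the way they are built here (a plain dictionary and collections.Counter) is only one possible arrangement.

```python
from collections import Counter

# Assumed sample counts keyed by (shift, defect type); illustrative only.
defects = {
    ("1", "A"): 15, ("2", "A"): 26, ("3", "A"): 33,
    ("1", "B"): 21, ("2", "B"): 31, ("3", "B"): 17,
    ("1", "C"): 45, ("2", "C"): 34, ("3", "C"): 49,
    ("1", "D"): 13, ("2", "D"): 5, ("3", "D"): 20,
}

# Marginal totals for each shift and each defect type.
shift_totals = Counter()
type_totals = Counter()
for (s, t), count in defects.items():
    shift_totals[s] += count
    type_totals[t] += count
total = sum(defects.values())

# Expected count for a cell if shift and defect type were independent.
expected = {
    (s, t): shift_totals[s] * type_totals[t] / total
    for s in shift_totals
    for t in type_totals
}

# Contribution of one cell to the chi-squared sum.
diff = lambda e, o: (e - o) ** 2 / e

chi2 = sum(
    diff(expected[s, t], defects[s, t])
    for s in shift_totals
    for t in type_totals
)
print(f"chi-squared = {chi2:.2f}")
```

Floats are used here for brevity; exact rational arithmetic with fractions.Fraction would work the same way if rounding is a concern.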

There are a total of six degrees of freedom, based on three shifts and four defect types. Since we're considering them to be independent, we get $(3-1) \times (4-1) = 6$. A chi-squared table shows us that a value above 12.5916 would arise only 1 time in 20 if the data were truly random. Since our value is 19.18, which is above this threshold, the data is unlikely to be random.

The cumulative distribution function for $\chi^2$ shows that a value of 19.18 or more has a probability of the order of 0.00387: about 4 chances in 1,000 of arising from random variation alone. The next step in the overall analysis is to design a follow-up study to discover the details of the various defect types and shifts. We'll need to see which independent variable has the biggest correlation with defects and continue the analysis. This work is justified because the $\chi^2$ value indicates the effect is not simple random variation.
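The probability quoted above can be checked with a short sketch. This is not necessarily the approach the text goes on to develop: it relies on the fact that, for an even number of degrees of freedom $k = 2m$, the upper-tail probability has the closed form $e^{-x/2} \sum_{i=0}^{m-1} \frac{(x/2)^i}{i!}$. The function name chi2_survival_even is hypothetical.

```python
from math import exp

def chi2_survival_even(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable
    with an even number k of degrees of freedom (hypothetical helper)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    half = x / 2
    term = 1.0   # (x/2)**i / i!, starting at i = 0
    total = 0.0
    for i in range(k // 2):
        total += term
        term *= half / (i + 1)
    return exp(-half) * total

p = chi2_survival_even(19.18, 6)
print(f"P(chi-squared > 19.18) with 6 degrees of freedom: {p:.5f}")
```

The same function applied to 12.5916 should come out very close to 0.05, which is the 1-in-20 level mentioned earlier.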

A side-bar question is the threshold value of 12.5916. While we can find this in a table of statistical values, we can also compute this threshold directly. This leads to a number of interesting functional programming examples.
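As a rough illustration of computing the threshold rather than looking it up, the sketch below searches for the value whose upper-tail probability is 0.05. It assumes six degrees of freedom and uses the closed-form tail probability that holds only for an even number of degrees of freedom, combined with simple bisection; the names tail_probability and threshold are hypothetical, and a general solution would need a different way of evaluating the distribution.

```python
from math import exp

def tail_probability(x: float, k: int = 6) -> float:
    """P(X > x) for chi-squared with even k degrees of freedom."""
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(k // 2):
        total += term
        term *= half / (i + 1)
    return exp(-half) * total

def threshold(alpha: float, k: int = 6,
              lo: float = 0.0, hi: float = 100.0, steps: int = 60) -> float:
    """Bisection search for x such that tail_probability(x, k) == alpha.

    Works because the tail probability decreases steadily as x grows.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if tail_probability(mid, k) > alpha:
            lo = mid   # tail still too heavy: the threshold lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

critical = threshold(0.05)
print(f"5% threshold for 6 degrees of freedom: {critical:.4f}")
```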
