Computing expected values and displaying a contingency table

The expected defect production is a combined probability. We'll compute the shift defect probability multiplied by the probability based on defect type. This gives us all 12 probabilities, one for each combination of shift and defect type. We can weight these by the total number of observed defects to compute the expected number of defects for each combination.

The following code calculates expected values:

expected = {
    (s, t): P_shift[s]*P_type[t]*total
    for t in P_type
    for s in P_shift
}

We'll create a dictionary that parallels the initial defects Counter object. Each key is a two-tuple of shift and defect type, and each value is the expected number of defects for that combination. The dictionary is built by a dictionary comprehension that explicitly enumerates all combinations of keys from the P_shift and P_type dictionaries.
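The comprehension above depends on P_shift, P_type, and total, which are not defined in this section. The following is a minimal, self-contained sketch of one way they might be built. The observed counts are the ones that appear in this section's contingency table; the construction of the totals and probabilities is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction

# Observed counts keyed by (shift, defect type), taken from the
# contingency table in this section.
defects = Counter({
    ("1", "A"): 15, ("1", "B"): 21, ("1", "C"): 45, ("1", "D"): 13,
    ("2", "A"): 26, ("2", "B"): 31, ("2", "C"): 34, ("2", "D"): 5,
    ("3", "A"): 33, ("3", "B"): 17, ("3", "C"): 49, ("3", "D"): 20,
})

total = sum(defects.values())

# Marginal totals by shift and by defect type (assumed construction).
shift_totals = Counter()
type_totals = Counter()
for (s, t), count in defects.items():
    shift_totals[s] += count
    type_totals[t] += count

# Marginal probabilities kept as exact fractions.
P_shift = {s: Fraction(n, total) for s, n in shift_totals.items()}
P_type = {t: Fraction(n, total) for t, n in type_totals.items()}

# Expected count for every (shift, type) combination.
expected = {
    (s, t): P_shift[s] * P_type[t] * total
    for t in P_type
    for s in P_shift
}
```

Because every value is a Fraction, the 12 expected counts sum exactly to the grand total, which is a convenient sanity check.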

The value of the expected dictionary looks like this:

{('2', 'B'): Fraction(2208, 103), 
('2', 'D'): Fraction(1216, 103),
('3', 'D'): Fraction(4522, 309),
('2', 'A'): Fraction(2368, 103),
('1', 'A'): Fraction(6956, 309),
('1', 'B'): Fraction(2162, 103),
('3', 'B'): Fraction(2737, 103),
('1', 'C'): Fraction(12032, 309),
('3', 'C'): Fraction(15232, 309),
('2', 'C'): Fraction(4096, 103),
('3', 'A'): Fraction(8806, 309),
('1', 'D'): Fraction(3572, 309)}

Each item of the mapping has a key with shift and defect type. This is associated with a Fraction value: the probability of a defect based on shift, times the probability of a defect based on defect type, times the overall number of defects. Some of the fractions are reduced; for example, a value of 6624/309 is simplified to 2208/103.

Large numbers are awkward as proper fractions. Displaying large values as float values is often easier. Small values (such as probabilities) are sometimes easier to understand as fractions.
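As a small illustration of both points, the sketch below uses the standard library Fraction class. The variable names are hypothetical and chosen only for this example.

```python
from fractions import Fraction

# A Fraction is reduced to lowest terms when it is created.
value = Fraction(6624, 309)
print(value)                   # 2208/103

# A large value is easier to read as a float with two decimal places.
print(f"{float(value):.2f}")   # 21.44

# A small value, such as a probability, stays readable as a fraction.
small = Fraction(96, 309)
print(small)                   # 32/103
```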

We'll print the observed and expected values in pairs. This will help us visualize the data. We'll create something that looks like the following to help summarize what we've observed and what we expect:

obs   exp obs   exp obs   exp obs   exp
 15 22.51  21 20.99  45 38.94  13 11.56  94
 26 22.99  31 21.44  34 39.77   5 11.81  96
 33 28.50  17 26.57  49 49.29  20 14.63 119
 74        69       128        38       309

This shows 12 cells. Each cell pairs the observed number of defects with the expected number of defects. Each row ends with the shift total, and each column has a footer with the defect type total.

In some cases, we might export this data in CSV notation and build a spreadsheet. In other cases, we'll build an HTML version of the contingency table and leave the layout details to a browser. We've shown a pure text version here.
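A CSV export could be sketched with the standard library csv module as follows. This is one possible layout, not a prescribed one: the column names, the use of an in-memory buffer, and the way the totals are derived here are all assumptions for illustration. The expected count is computed as shift total times type total divided by the grand total, which is algebraically the same as the product of the two probabilities and the grand total.

```python
import csv
import io
from fractions import Fraction

# Observed counts keyed by (shift, defect type), as in the table above.
defects = {
    ("1", "A"): 15, ("1", "B"): 21, ("1", "C"): 45, ("1", "D"): 13,
    ("2", "A"): 26, ("2", "B"): 31, ("2", "C"): 34, ("2", "D"): 5,
    ("3", "A"): 33, ("3", "B"): 17, ("3", "C"): 49, ("3", "D"): 20,
}
shifts = sorted({s for s, _ in defects})
types = sorted({t for _, t in defects})
total = sum(defects.values())
shift_totals = {s: sum(defects[s, t] for t in types) for s in shifts}
type_totals = {t: sum(defects[s, t] for s in shifts) for t in types}
expected = {
    (s, t): Fraction(shift_totals[s] * type_totals[t], total)
    for s in shifts
    for t in types
}

# Write to an in-memory buffer; a real export would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)

header = ["shift"]
for t in types:
    header += [f"{t} obs", f"{t} exp"]
writer.writerow(header + ["total"])

for s in shifts:
    row = [s]
    for t in types:
        row += [defects[s, t], f"{float(expected[s, t]):.2f}"]
    writer.writerow(row + [shift_totals[s]])

# Footer row: type totals under the "obs" columns, blanks under "exp".
footer = ["total"]
for t in types:
    footer += [type_totals[t], ""]
writer.writerow(footer + [total])

csv_text = buffer.getvalue()
print(csv_text)
```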

The following code contains a sequence of statements to create the contingency table shown previously:

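# One "obs exp" title per defect type, padded to the 10-character column width.
print("obs   exp "*len(type_totals))
for s in sorted(shift_totals):
    # Each pair is 9 characters wide: a 3-digit count and a 5.2f float.
    pairs = [
        f"{defects[s,t]:3d} {float(expected[s,t]):5.2f}"
        for t in sorted(type_totals)
    ]
    print(f"{' '.join(pairs)} {shift_totals[s]:3d}")
# Footers are padded to 9 characters so they line up under the "obs" values.
footers = [
    f"{type_totals[t]:3d}      "
    for t in sorted(type_totals)
]
print(f"{' '.join(footers)} {total:3d}")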

This spreads the defect types across each line. We've written enough obs exp column titles to cover all defect types. For each shift, we'll emit a line of observed and expected pairs, followed by a shift total. At the bottom, we'll emit a line of footers with just the defect type totals and the grand total.

A contingency table such as this one helps us to visualize the comparison between observed and expected values. We can compute a chi-squared value for these two sets of values. This will help us decide whether the data is random or whether there's something that deserves further investigation.
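As a preview, the chi-squared statistic is the sum, over all cells, of the squared difference between observed and expected values divided by the expected value. The following sketch applies that standard formula to the data in the table. The way the totals are rebuilt here is an assumption made so that the example stands alone.

```python
from fractions import Fraction

# Observed counts keyed by (shift, defect type), as in the table above.
defects = {
    ("1", "A"): 15, ("1", "B"): 21, ("1", "C"): 45, ("1", "D"): 13,
    ("2", "A"): 26, ("2", "B"): 31, ("2", "C"): 34, ("2", "D"): 5,
    ("3", "A"): 33, ("3", "B"): 17, ("3", "C"): 49, ("3", "D"): 20,
}
shifts = sorted({s for s, _ in defects})
types = sorted({t for _, t in defects})
total = sum(defects.values())
shift_totals = {s: sum(defects[s, t] for t in types) for s in shifts}
type_totals = {t: sum(defects[s, t] for s in shifts) for t in types}

# Expected count: P_shift * P_type * total, kept as an exact fraction.
expected = {
    (s, t): Fraction(shift_totals[s], total)
    * Fraction(type_totals[t], total)
    * total
    for s in shifts
    for t in types
}

# Sum of (observed - expected)**2 / expected over all 12 cells.
chi2 = sum(
    (defects[key] - expected[key]) ** 2 / expected[key]
    for key in defects
)
print(f"{float(chi2):.2f}")
```

For this data the statistic comes out at roughly 19.2. Interpreting that number requires comparing it against the chi-squared distribution for the table's degrees of freedom.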
