Computing probabilities from Counter objects

We've read the data and computed summaries in two separate steps. In some cases, we may want to create the summaries while reading the initial data. This is an optimization that may save a little bit of processing time. We could write a more complex input reduction that emitted the grand total, the shift totals, and the defect type totals. These Counter objects would be built one item at a time.
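A minimal sketch of such a single-pass reduction follows. The row layout, a sequence of (shift, defect type, count) tuples, and the function name summarize are assumptions made for illustration; the sample rows are made-up values, not the data used in this section.

```python
from collections import Counter
from typing import Iterable, Tuple

def summarize(
    rows: Iterable[Tuple[str, str, int]]
) -> Tuple[int, Counter, Counter]:
    """Build the grand total and both Counter summaries in one pass.

    Each row is assumed to be (shift, defect_type, count).
    """
    total = 0
    shift_totals: Counter = Counter()
    type_totals: Counter = Counter()
    for shift, defect_type, count in rows:
        # Update all three summaries from the same row.
        total += count
        shift_totals[shift] += count
        type_totals[defect_type] += count
    return total, shift_totals, type_totals

# Hypothetical sample rows, for illustration only.
sample_rows = [
    ("1", "A", 15), ("1", "B", 21),
    ("2", "A", 26), ("2", "C", 31),
]
total, shift_totals, type_totals = summarize(sample_rows)
```

The source is traversed only once, and the caller still receives ordinary Counter objects, so any analysis written against Counter instances does not need to change.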

We've focused on using Counter instances because they give us flexibility. As long as any change to the data acquisition still creates Counter instances, the subsequent analysis won't need to change.

Here's how we can compute the probabilities of defect by shift and by defect type:

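from fractions import Fraction
# Each shift's share of the grand total of defects.
P_shift = {
    shift: Fraction(shift_totals[shift], total)
    for shift in sorted(shift_totals)
}
# Each defect type's share of the grand total of defects.
P_type = {
    type: Fraction(type_totals[type], total)
    for type in sorted(type_totals)
}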

We've created two mappings: P_shift and P_type. The P_shift dictionary maps a shift to a Fraction object that shows the shift's contribution to the overall number of defects. Similarly, the P_type dictionary maps a defect type to a Fraction object that shows the type's contribution to the overall number of defects.
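Because each mapping divides up the same grand total, its values should sum to exactly 1, and Fraction arithmetic lets us check this without rounding error. The sketch below uses a small hypothetical Counter rather than the data from this section.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical shift totals, for illustration only.
shift_totals = Counter({"1": 3, "2": 4, "3": 5})
total = sum(shift_totals.values())

P_shift = {
    shift: Fraction(shift_totals[shift], total)
    for shift in sorted(shift_totals)
}

# Fractions are reduced automatically: 3/12 becomes 1/4.
assert P_shift["1"] == Fraction(1, 4)
# Exact arithmetic: the probabilities sum to exactly 1.
assert sum(P_shift.values()) == 1
```

The same check with float values could fail by a tiny rounding error, which is one practical reason to keep the Fraction objects until the final display step.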

We've elected to use Fraction objects to preserve all of the precision of the input values. When working with counts like this, we may get probability values that make more intuitive sense to people reviewing the data.

The P_shift data looks like this:

{'1': Fraction(94, 309), '2': Fraction(32, 103), 
'3': Fraction(119, 309)}

The P_type data looks like this:

{'A': Fraction(74, 309), 'B': Fraction(23, 103), 
'C': Fraction(128, 309), 'D': Fraction(38, 309)}

A value such as 32/103 or 96/309 might be more meaningful to some people than 0.3107. We can easily get float values from Fraction objects, as we'll see later.
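As a brief illustration of the conversion, the following sketch uses only the standard float() and round() functions on one of the values shown above. The variable names p and approx are chosen here for illustration.

```python
from fractions import Fraction

p = Fraction(96, 309)

# Fraction reduces to lowest terms, so 96/309 and 32/103 are the same value.
assert p == Fraction(32, 103)

# float() gives the approximate decimal value; round() trims it for display.
approx = round(float(p), 4)
print(approx)
```

The Fraction keeps the exact counts, while the float is only an approximation suitable for display.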

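From Python 3.6 on, dictionaries keep their keys in insertion order. This was an implementation detail of CPython 3.6 and became a language guarantee in Python 3.7. In previous versions of Python, the order of the keys was less predictable. Because the comprehensions above iterate over sorted(shift_totals) and sorted(type_totals), the keys are inserted, and therefore displayed, in sorted order. In this case, the order doesn't matter, but it can help debugging efforts when the keys have a predictable order.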
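The effect of key order can be seen in a short sketch. It assumes Python 3.7 or later and uses hypothetical counts that arrive in an unsorted order.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical counts, inserted in an unsorted order.
type_totals = Counter()
for defect_type, count in [("C", 4), ("A", 1), ("B", 3)]:
    type_totals[defect_type] += count
total = sum(type_totals.values())

# The Counter remembers the order in which keys first appeared.
assert list(type_totals) == ["C", "A", "B"]

# Iterating over sorted(...) fixes the key order of the new dictionary.
P_type = {t: Fraction(type_totals[t], total) for t in sorted(type_totals)}
assert list(P_type) == ["A", "B", "C"]
```

Sorting the keys in the comprehension makes the displayed result independent of the order of the source data.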

The shifts all seem to be at approximately the same level of defect production. The defect types vary, which is typical. It appears that defect C is a relatively common problem, whereas defect B is much less common. Perhaps defect B requires a more complex situation to arise.
