Unknown unknowns

The term unknown unknowns was popularized largely by two sources. The first is a response that United States Secretary of Defense Donald Rumsfeld gave at a United States Department of Defense (DoD) news briefing on February 12, 2002, to a question about the lack of evidence linking the government of Iraq with the supply of weapons of mass destruction to terrorist groups. The second is the books of Nassim Taleb (The Black Swan: The Impact of the Highly Improbable, Random House, 2007).

Note

Turkey paradox

Arguably, the unknown unknown is better explained by the turkey paradox. Suppose you have a family of turkeys playing in the backyard and enjoying protection and free food. Across the fence, there is another family of turkeys. This goes on day after day and month after month, until Thanksgiving comes. Thanksgiving Day is a national holiday celebrated in Canada and the United States, when it is customary to roast a turkey in an oven. The turkeys are very likely to be harvested and consumed at this point, although from the turkeys' point of view there is no discernible signal that anything will happen on the second Monday of October in Canada or the fourth Thursday of November in the United States. No amount of modeling on within-the-year data can fix this prediction problem from the turkeys' point of view; only additional year-over-year information can.
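As a minimal sketch of why within-the-year data misleads the turkey, assume (this is an illustrative assumption, not something the text prescribes) that the turkey estimates the chance of being fed tomorrow with Laplace's rule of succession, (s + 1) / (n + 2), where every one of the n observed days was a safe one. The function name `confidence_of_being_fed` is hypothetical.

```python
def confidence_of_being_fed(days_fed: int) -> float:
    """Laplace's rule of succession with s = n = days_fed safe days observed.

    Hypothetical model of the turkey's belief; it only sees within-the-year data.
    """
    return (days_fed + 1) / (days_fed + 2)


# The estimate only grows with every safe day, so the turkey is most
# confident right before the holiday it has no data about.
for day in (1, 30, 180, 300):
    estimate = confidence_of_being_fed(day)
    print(f"after {day:3d} safe days: P(fed tomorrow) ~ {estimate:.4f}")
```

Under this assumed model the estimate never decreases, which is the point of the paradox: the signal that matters is simply absent from the data the model is fitted on.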

The unknown unknown is something that is not in the model and cannot be anticipated to be in the model. In reality, the only unknown unknowns of interest are the ones that affect the model so significantly that results that were previously virtually impossible, or possible only with infinitesimal probability, become the reality. Given that most distributions used in practice come from the exponential family and have very thin tails, a deviation from normal does not have to be more than a few sigmas to have devastating effects on the standard model assumptions. An actionable strategy for including the unknown factors in the model has yet to be found (a few ways have been proposed, including fractals, but few if any are actionable). In the meantime, practitioners have to be aware of the risks, and here the risk is defined as exactly the possibility of rendering the models useless. Of course, the difference between a known unknown and an unknown unknown is exactly that with a known unknown we understand the risks and what needs to be explored.
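To give a rough sense of how much a thin-tailed assumption can understate a deviation of a few sigmas, the following sketch compares the two-sided tail probability of a standard normal with that of a heavier-tailed alternative. The alternative, a Student's t distribution with 3 degrees of freedom rescaled to unit variance, is chosen here purely for illustration and is not taken from the text; the function names are hypothetical.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))


def t3_two_sided_tail(k: float) -> float:
    """P(|X| > k) for a Student's t with 3 degrees of freedom,
    rescaled to unit variance (closed form, valid for k > 0)."""
    return (2.0 / math.pi) * (math.atan(1.0 / k) - k / (1.0 + k * k))


# Compare how likely a k-sigma event is under each assumption.
for k in (3, 4, 5, 6):
    p_thin = normal_two_sided_tail(k)
    p_heavy = t3_two_sided_tail(k)
    print(
        f"{k} sigma: normal {p_thin:.2e}, "
        f"heavy-tailed {p_heavy:.2e}, ratio {p_heavy / p_thin:,.0f}x"
    )
```

Both distributions have the same variance, so the gap comes entirely from the shape of the tails. An event that the normal model treats as practically impossible can be merely rare under the heavier-tailed one.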

Now that we have looked at the basic scope of problems that decision-making systems face, let's look at data pipelines, the software systems that provide information for making decisions, and at the more practical aspects of designing a data pipeline for a data-driven system.
