Chapter 7

The Search for Moral Certainty

One winter night during one of the many German air raids on Moscow in World War II, a distinguished Soviet professor of statistics showed up in his local air-raid shelter. He had never appeared there before. “There are seven million people in Moscow,” he used to say. “Why should I expect them to hit me?” His friends were astonished to see him and asked what had happened to change his mind. “Look,” he explained, “there are seven million people in Moscow and one elephant. Last night they got the elephant.”

This story is a modern version of the thunderstorm phobias analyzed in the Port-Royal Logic, but it differs at a critical point from the moral of the example cited there. In this case, the individual involved was keenly aware of the mathematical probability of being hit by a bomb. What the professor’s experience really illuminates, therefore, is the dual character that runs throughout everything to do with probability: past frequencies can collide with degrees of belief when risky choices must be made.

The story has more to it than that. It echoes the concerns of Graunt, Petty, and Halley. When complete knowledge of the future—or even of the past—is an impossibility, how representative is the information we have in hand? Which counts for more, the seven million humans or the elephant? How should we evaluate new information and incorporate it into degrees of belief developed from prior information? Is the theory of probability a mathematical toy or a serious instrument for forecasting?

Probability theory is a serious instrument for forecasting, but the devil, as they say, is in the details—in the quality of information that forms the basis of probability estimates. This chapter describes a sequence of giant steps over the course of the eighteenth century that revolutionized the uses of information and the manner in which probability theory can be applied to decisions and choices in the modern world.

* * *

The first person to consider the linkages between probability and the quality of information was another and older Bernoulli, Daniel’s uncle Jacob, who lived from 1654 to 1705.1 Jacob was a child when Pascal and Fermat performed their mathematical feats, and he died when his nephew Daniel was only five years old. Talented like all the Bernoullis, he was a contemporary of Isaac Newton and had sufficient Bernoullian ill temper and hubris to consider himself a rival of that great English scientist.

Merely raising the questions that Jacob raised was an intellectual feat in itself, quite apart from offering answers as well. Jacob undertook this task, he tells us, after having meditated on it for twenty years; he completed his work only shortly before he died in 1705, at the age of 50.

Jacob was an exceptionally dour Bernoulli, especially toward the end of his life, though he lived during the bawdy and jolly age that followed the restoration of Charles II in 1660.a One of Jacob’s more distinguished contemporaries, for example, was John Arbuthnot, Queen Anne’s doctor, a Fellow of the Royal Society, and an amateur mathematician with an interest in probability that he pepped up with a generous supply of off-color examples to illustrate his points. In one of Arbuthnot’s papers, he considered the odds on whether “a woman of twenty has her maidenhead” or whether “a town-spark of that age ‘has not been clap’d.’”2

Jacob Bernoulli had first put the question of how to develop probabilities from sample data in 1703. In a letter to his friend Leibniz, he commented that he found it strange that we know the odds of throwing a seven instead of an eight with a pair of dice, but we do not know the probability that a man of twenty will outlive a man of sixty. Might we not, he asks, find the answer to this question by examining a large number of pairs of men of each age?

In responding to Bernoulli, Leibniz took a dim view of this approach. “[N]ature has established patterns originating in the return of events,” he wrote, “but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary.”3 Although Leibniz wrote this letter in Latin, he put the expression “but only for the most part” into Greek: ὡς ἐπὶ τὸ πολύ. Perhaps this was to emphasize his point that a finite number of experiments such as Jacob suggested would inevitably be too small a sample for an exact calculation of nature’s intentions.b

Jacob was not deterred by Leibniz’s response, but he did change the manner in which he went about solving the problem. Leibniz’s admonition in Greek would not be forgotten.

Jacob’s effort to uncover probabilities from sample data appears in his Ars Conjectandi (The Art of Conjecture), the work that his nephew Nicolaus finally published in 1713, eight years after Jacob’s death.4 His interest was in demonstrating where the art of thinking—objective analysis—ends and the art of conjecture begins. In a sense, conjecture is the process of estimating the whole from the parts.

Jacob’s analysis begins with the observation that probability theory had reached the point where, to arrive at a hypothesis about the likelihood of an event, “it is necessary only to calculate exactly the number of possible cases, and then to determine how much more likely it is that one case will occur than another.” The difficulty, as Jacob goes on to point out, is that the uses of probability are limited almost exclusively to games of chance. Up to that point, Pascal’s achievement had amounted to little more than an intellectual curiosity.

For Jacob, this limitation was extremely serious, as he reveals in a passage that echoes Leibniz’s concerns:

But what mortal . . . could ascertain the number of diseases, counting all possible cases, that afflict the human body . . . and how much more likely one disease is to be fatal than another—plague than dropsy . . . or dropsy than fever—and on that basis make a prediction about the relationship between life and death in future generations?

. . . [W]ho can pretend to have penetrated so deeply into the nature of the human mind or the wonderful structure of the body that in games which depend . . . on the mental acuteness or physical agility of the players he would venture to predict when this or that player would win or lose?

Jacob is drawing a crucial distinction between reality and abstraction in applying the laws of probability. For example, Paccioli’s incomplete game of balla and the unfinished hypothetical World Series that we analyzed in the discussion of Pascal’s Triangle bear no resemblance to real-world situations. In the real world, the contestants in a game of balla or in a World Series have differing “mental acuteness or physical agility,” qualities that I ignored in the oversimplified examples of how to use probability to forecast outcomes. Pascal’s Triangle can provide only hints about how such real-life games will turn out.

The theory of probability can define the probabilities at the gaming casino or in a lottery—there is no need to spin the roulette wheel or count the lottery tickets to estimate the nature of the outcome—but in real life relevant information is essential. And the bother is that we never have all the information we would like. Nature has established patterns, but only for the most part. Theory, which abstracts from nature, is kinder: we either have the information we need or else we have no need for information. As I quoted Fischer Black as saying in the Introduction, the world looks neater from the precincts of MIT on the Charles River than from the hurly-burly of Wall Street by the Hudson.

In our discussion of Paccioli’s hypothetical game of balla and our imaginary World Series, the long-term records, the physical capabilities, and the I.Q.s of the players were irrelevant. Even the nature of the game itself was irrelevant. Theory was a complete substitute for information.

Real-life baseball fans, like aficionados of the stock market, assemble reams of statistics precisely because they need that information in order to reach judgments about capabilities among the players and the teams—or the outlook for the earning power of the companies trading on the stock exchange. And even with thousands of facts, the track record of the experts, in both athletics and finance, proves that their estimates of the probabilities of the final outcomes are open to doubt and uncertainty.

Pascal’s Triangle and all the early work in probability answered only one question: what is the probability of such-and-such an outcome? The answer to that question has limited value in most cases, because it leaves us with no sense of generality. What do we really know when we reckon that Player A has a 60% chance of winning a particular game of balla? Can that likelihood tell us whether he is skillful enough to win 60% of the time against Player B? Victory in one set of games is insufficient to confirm that expectation. How many times do Messrs. A and B have to play before we can be confident that A is the superior player? What does the outcome of this year’s World Series tell us about the probability that the winning team is the best team all the time, not just in that particular series? What does the high proportion of deaths from lung cancer among smokers signify about the chances that smoking will kill you before your time? What does the death of an elephant reveal about the value of going to an air-raid shelter?

But real-life situations often require us to measure probability in precisely this fashion—from sample to universe. In only rare cases does life replicate games of chance, for which we can determine the probability of an outcome before an event even occurs—a priori, as Jacob Bernoulli puts it. In most instances, we have to estimate probabilities from what happened after the fact—a posteriori. The very notion of a posteriori implies experimentation and changing degrees of belief. There were seven million people in Moscow, but after one elephant was killed by a Nazi bomb, the professor decided the time had come to go to the air-raid shelter.

* * *

Jacob Bernoulli’s contribution to the problem of developing probabilities from limited amounts of real-life information was twofold. First, he defined the problem in this fashion before anyone else had even recognized the need for a definition. Second, he suggested a solution that demands only one requirement. We must assume that “under similar conditions, the occurrence (or non-occurrence) of an event in the future will follow the same pattern as was observed in the past.”5

This is a giant assumption. Jacob may have complained that in real life there are too few cases in which the information is so complete that we can use the simple rules of probability to predict the outcome. But he admits that an estimate of probabilities after the fact also is impossible unless we can assume that the past is a reliable guide to the future. The difficulty of that assignment requires no elaboration.

The past, or whatever data we choose to analyze, is only a fragment of reality. That fragmentary quality is crucial in going from data to a generalization. We never have all the information we need (or can afford to acquire) to achieve the same confidence with which we know, beyond a shadow of a doubt, that a die has six sides, each with a different number, or that a European roulette wheel has 37 slots (American wheels have 38 slots), again each with a different number. Reality is a series of connected events, each dependent on another, radically different from games of chance in which the outcome of any single throw has zero influence on the outcome of the next throw. Games of chance reduce everything to a hard number, but in real life we use such measures as “a little,” “a lot,” or “not too much, please” much more often than we use a precise quantitative measure.

Jacob Bernoulli unwittingly defined the agenda for the remainder of this book. From this point forward, the debate over managing risk will converge on the uses of his three requisite assumptions—full information, independent trials, and the relevance of quantitative valuation. The relevance of these assumptions is critical in determining how successfully we can apply measurement and information to predict the future. Indeed, Jacob’s assumptions shape the way we view the past itself: after the fact, can we explain what happened, or must we ascribe the event to just plain luck (which is merely another way of saying we are unable to explain what happened)?

* * *

Despite all the obstacles, practicality demands that we assume, sometimes explicitly but more often implicitly, that Jacob’s necessary conditions are met, even when we know full well that reality differs from the ideal case. Our answers may be sloppy, but the methodology developed by Jacob Bernoulli and the other mathematicians mentioned in this chapter provides us with a powerful set of tools for developing probabilities of future outcomes on the basis of the limited data provided by the past.

Jacob Bernoulli’s theorem for calculating probabilities a posteriori is known as the Law of Large Numbers. Contrary to the popular view, this law does not provide a method for validating observed facts, which are only an incomplete representation of the whole truth. Nor does it say that an increasing number of observations will increase the probability that what you see is what you are going to get. The law is not a design for improving the quality of empirical tests: Jacob took Leibniz’s advice to heart and rejected his original idea of finding firm answers by means of empirical tests.

Jacob was searching for a different probability. Suppose you toss a coin over and over. The Law of Large Numbers does not tell you that the average of your throws will approach 50% as you increase the number of throws; simple mathematics can tell you that, sparing you the tedious business of tossing the coin over and over. Rather, the law states that increasing the number of throws will correspondingly increase the probability that the ratio of heads thrown to total throws will vary from 50% by less than some stated amount, no matter how small. The word “vary” is what matters. The search is not for the true mean of 50% but for the probability that the error between the observed average and the true average will be less than, say, 2%—in other words, that increasing the number of throws will increase the probability that the observed average will fall within 2% of the true average.

That does not mean that there will be no error after an infinite number of throws; Jacob explicitly excludes that case. Nor does it mean that the errors will of necessity become small enough to ignore. All the law tells us is that the average of a large number of throws will be more likely than the average of a small number of throws to differ from the true average by less than some stated amount. And there will always be a possibility that the observed result will differ from the true average by a larger amount than the specified bound. Seven million people in Moscow were apparently not enough to satisfy the professor of statistics.
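The statement can be made concrete. The short computation below (a modern illustration in Python, not part of Bernoulli's own apparatus) uses the exact binomial distribution to show how the probability that the observed fraction of heads falls within two percentage points of 50% grows as the number of tosses increases:

```python
from math import comb

def prob_within(n, eps=0.02):
    """Exact probability that the fraction of heads in n fair tosses
    differs from 1/2 by less than eps, summed from the binomial distribution."""
    ks = [k for k in range(n + 1) if abs(k / n - 0.5) < eps]
    return sum(comb(n, k) for k in ks) / 2 ** n

for n in (100, 1_000, 10_000):
    print(n, round(prob_within(n), 4))
```

The probability climbs from roughly one chance in four at 100 tosses to near-certainty at 10,000, which is exactly what the law promises: not that the error vanishes, only that large deviations become ever less likely.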

The Law of Large Numbers is not the same thing as the Law of Averages. Mathematics tells us that the probability of heads coming up on any individual coin toss is 50%—but the outcome of each toss is independent of all the others. It is neither influenced by previous tosses nor does it influence future tosses. Consequently, the Law of Large Numbers cannot promise that the probability of heads will rise above 50% on any single toss if the first hundred, or million, tosses happen to come up only 40% heads. There is nothing in the Law of Large Numbers that promises to bail you out when you are caught in a losing streak.

To illustrate his Law of Large Numbers, Jacob hypothesized a jar filled with 3000 white pebbles and 2000 black pebbles, a device that has been a favorite of probability theorists and inventors of mind-twisting mathematical puzzles ever since. He stipulates that we must not know how many pebbles there are of each color. We draw an increasing number of pebbles from the jar, carefully noting the color of each pebble before returning it to the jar. If drawing more and more pebbles can finally give us “moral certainty”—that is, certainty as a practical matter rather than absolute certainty—that the ratio is 3:2, Jacob concludes that “we can determine the number of instances a posteriori with almost as great accuracy as if they were known to us a priori.”6 His calculations indicate that 25,550 drawings from the jar would suffice to show, with a chance exceeding 1000/1001, that the result would be within 2% of the true ratio of 3:2. That’s moral certainty for you.
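Jacob's figure can be checked with modern tools. The sketch below (in Python, working in logarithms so the individual terms do not underflow, and interpreting "within 2%" as a band of two percentage points around the true proportion of 0.6, which is an assumption on my part) computes the exact binomial probability for 25,550 draws:

```python
from math import lgamma, log, exp

def jar_prob(n, p=0.6, eps=0.02):
    """Exact binomial probability that the observed proportion of white
    pebbles in n draws (with replacement) lies within eps of the true p."""
    def log_pmf(k):
        # log of the binomial probability mass, computed via lgamma
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    lo, hi = round(n * (p - eps)), round(n * (p + eps))
    return sum(exp(log_pmf(k)) for k in range(lo, hi + 1))

print(jar_prob(25_550) > 1000 / 1001)   # True: the bound is comfortably met
```

The exact probability at 25,550 draws turns out to be far higher than 1000/1001, confirming that Jacob's bound was very conservative; a few thousand draws would already have sufficed.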

Jacob does not use the expression “moral certainty” lightly. He derives it from his definition of probability, which he draws from earlier work by Leibniz. “Probability,” he declares, “is degree of certainty and differs from absolute certainty as the part differs from the whole.”7

But Jacob moves beyond Leibniz in considering what “certainty” means. It is our individual judgments of certainty that attract Jacob’s attention, and a condition of moral certainty exists when we are almost completely certain. When Leibniz introduced the concept, he had defined it as “infinitely probable.” Jacob himself is satisfied that 1000/1001 is close enough, but he is willing to be flexible: “It would be useful if the magistrates set up fixed limits for moral certainty.”8

* * *

Jacob is triumphant. Now, he declares, we can make a prediction about any uncertain quantity that will be just as scientific as the predictions made in games of chance. He has elevated probability from the world of theory to the world of reality:

If, instead of the jar, for instance, we take the atmosphere or the human body, which conceal within themselves a multitude of the most varied processes or diseases, just as the jar conceals the pebbles, then for these also we shall be able to determine by observation how much more frequently one event will occur than another.9

Yet Jacob appears to have had trouble with his jar of pebbles. His calculation that 25,550 trials would be necessary to establish moral certainty must have struck him as an intolerably large number; the entire population of his home town of Basel at that time was less than 25,550. We must surmise that he was unable to figure out what to do next, for he ends his book right there. Nothing follows but a wistful comment about the difficulty of finding real-life cases in which all the observations meet the requirement that they be independent of one another:

If thus all events through all eternity could be repeated, one would find that everything in the world happens from definite causes and according to definite rules, and that we would be forced to assume amongst the most apparently fortuitous things a certain necessity, or, so to say, FATE.10

Nevertheless, Jacob’s jar of pebbles deserves the immortality it has earned. Those pebbles became the vehicle for the first attempt to measure uncertainty—indeed, to define it—and to calculate the probability that an empirically determined number is close to a true value even when the true value is an unknown.

* * *

Jacob Bernoulli died in 1705. His nephew Nicolaus—Nicolaus the Slow—continued to work on Uncle Jacob’s efforts to derive future probabilities from known observations even while he was inching along toward the completion of Ars Conjectandi. Nicolaus’s results were published in 1713, the same year in which Jacob’s book finally appeared.

Jacob had started with the probability that the error between an observed value and the true value would fall within some specified bound; he then went on to calculate the number of observations needed to raise the probability to that amount. Nicolaus tried to turn his uncle’s version of probability around. Taking the number of observations as given, he then calculated the probability that they would fall within the specified bound. He used an example in which he assumed that the ratio of male to female births was 18:17. With, say, a total of 14,000 births, the expected number of male births would be 7,200. He then calculated that the odds are at least 43.58-to-1 that the actual number of male births would fall between 7,200 + 163 and 7,200 − 163, or between 7,363 and 7,037.
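Nicolaus's arithmetic can be reproduced today. The sketch below (in Python; an illustration rather than Nicolaus's own method, since his 43.58-to-1 was a lower bound obtained without the normal curve) computes the exact binomial probability that the count of male births falls in the stated interval:

```python
from math import lgamma, log, exp

def binom_interval(n, p, lo, hi):
    """Exact probability that a binomial(n, p) count falls between lo and hi,
    computed in logarithms via lgamma to avoid underflow."""
    def log_pmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return sum(exp(log_pmf(k)) for k in range(lo, hi + 1))

n, p = 14_000, 18 / 35                       # male births, assumed ratio 18:17
prob = binom_interval(n, p, 7_037, 7_363)    # within 163 of the expected 7,200
odds = prob / (1 - prob)
print(round(prob, 4), round(odds, 1))
```

The exact odds come out well above Nicolaus's conservative figure, which is consistent with his claim that they are "at least" 43.58 to 1.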

In 1718, Nicolaus invited a French mathematician named Abraham de Moivre to join him in his research, but de Moivre turned him down: “I wish I were capable of . . . applying the Doctrine of Chances to Oeconomical and Political Uses [but] I willingly resign my share of that task to better Hands.”11 Nevertheless, de Moivre’s response to Nicolaus reveals that the uses of probability and forecasting had come a long way in just a few years.

De Moivre had been born in 1667—thirteen years after Jacob Bernoulli—as a Protestant in a France that was increasingly hostile to anyone who was not Catholic.12 In 1685, when de Moivre was 18 years old, King Louis XIV revoked the Edict of Nantes, which had been promulgated under the Protestant-born King Henri IV in 1598 to give Protestants, known as Huguenots, equal political rights with Catholics. After the revocation, exercise of the reformed religion was forbidden, children had to be educated as Catholics, and emigration was prohibited. De Moivre was imprisoned for over two years for his beliefs. Hating France and everything to do with it, he managed to flee to London in 1688, where the Glorious Revolution had just banished the last vestiges of official Catholicism. He never returned to his native country.

De Moivre led a gloomy, frustrating life in England. Despite many efforts, he never managed to land a proper academic position. He supported himself by tutoring in mathematics and by acting as a consultant to gamblers and insurance brokers on applications of probability theory. For that purpose, he maintained an informal office at Slaughter’s Coffee House in St. Martin’s Lane, where he went most afternoons after his tutoring chores were over. Although he and Newton were friends, and although he was elected to the Royal Society when he was only thirty, he remained a bitter, introspective, antisocial man. He died in 1754, blind and poverty-stricken, at the age of 87.

In 1725, de Moivre had published a work titled Annuities upon Lives, which included an analysis of Halley’s tables on life and death in Breslaw. Though the book was primarily a work in mathematics, it suggested important questions related to the puzzles that the Bernoullis were trying to resolve and that de Moivre would later explore in great detail.

Stephen Stigler, a historian of statistics, offers an interesting example of the possibilities raised by de Moivre’s work in annuities. Halley’s table showed that, of 346 men aged fifty in Breslaw, only 142, or 41%, survived to age seventy. That was only a small sample. To what extent could we use the result to generalize about the life expectancy of men fifty years old? De Moivre could not use these numbers to determine the probability that a man of fifty had a less than 50% chance of dying by age seventy, but he would be able to answer this question: “If the true chance were 1/2, what is the probability a ratio as small as 142/346 or smaller should occur?”

De Moivre’s first direct venture into the subject of probability was a work titled De Mensura Sortis (literally, On the Measurement of Lots). This work was first published in 1711 in an issue of Philosophical Transactions, the publication of the Royal Society. In 1718, de Moivre issued a greatly expanded English edition titled The Doctrine of Chances, which he dedicated to his good friend Isaac Newton. The book was a great success and went through two further editions in 1738 and 1756. Newton was sufficiently impressed to tell his students on occasion, “Go to Mr. de Moivre; he knows these things better than I do.” De Mensura Sortis is probably the first work that explicitly defines risk as chance of loss: “The Risk of losing any sum is the reverse of Expectation; and the true measure of it is, the product of the Sum adventured multiplied by the Probability of the Loss.”

In 1730, de Moivre finally turned to Nicolaus Bernoulli’s project to ascertain how well a sample of facts represented the true universe from which the sample was drawn. He published his complete solution in 1733 and included it in the second and third editions of Doctrine of Chances. He begins by acknowledging that Jacob and Nicolaus Bernoulli “have shewn very great skill . . . . [Y]et some things were farther required.” In particular, the approach taken by the Bernoullis appeared “so laborious, and of so great difficulty, that few people have undertaken the task.”

The need for 25,550 trials was clearly an obstacle. Even if, as James Newman has suggested, Jacob Bernoulli had been willing to settle for the “immoral certainty” of an even bet—probability of 50/100—that the result would be within 2% of the true ratio of 3:2, 8,400 drawings would be needed. Jacob’s selection of a probability of 1000/1001 is in itself a curiosity by today’s standards, when most statisticians accept odds of 1 in 20 as sufficient evidence that a result is significant (today’s lingo for moral certainty) rather than due to mere chance.

De Moivre’s advance in the resolution of these problems ranks among the most important achievements in mathematics. Drawing on both the calculus and on the underlying structure of Pascal’s Triangle, known as the binomial theorem, de Moivre demonstrated how a set of random drawings, as in Jacob Bernoulli’s jar experiment, would distribute themselves around their average value. For example, assume that you drew a hundred pebbles in succession from Jacob’s jar, always returning each pebble drawn, and noted the ratio of white to black. Then assume you made a series of successive drawings, each of a hundred balls. De Moivre would be able to tell you beforehand approximately how many of those ratios would be close to the average ratio of the total number of drawings and how those individual ratios would distribute themselves around the grand average.

De Moivre’s distribution is known today as a normal curve, or, because of its resemblance to a bell, as a bell curve. The distribution, when traced out as a curve, shows the largest number of observations clustered in the center, close to the average, or mean, of the total number of observations. The curve then slopes symmetrically downward, with an equal number of observations on either side of the mean, descending steeply at first and then exhibiting a flatter downward slope at each end. In other words, observations far from the mean are less frequent than observations close to the mean.

The shape of de Moivre’s curve enabled him to calculate a statistical measure of its dispersion around the mean. This measure, now known as the standard deviation, is critically important in judging whether a set of observations comprises a sufficiently representative sample of the universe of which they are just a part. In a normal distribution, approximately 68% of the observations will fall within one standard deviation of the mean of all the observations, and 95% of them will fall within two standard deviations of the mean.
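Those two percentages are properties of the normal curve itself, and they can be verified directly with the error function (a short Python check, offered only as an illustration):

```python
from math import erf, sqrt

def within_sigmas(z):
    """Probability that a normally distributed observation falls within
    z standard deviations of the mean."""
    return erf(z / sqrt(2))

print(round(within_sigmas(1), 4))   # 0.6827
print(round(within_sigmas(2), 4))   # 0.9545
```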

The standard deviation can tell us whether we are dealing with a case of the head-in-the-oven, feet-in-the-refrigerator, where the average condition of this poor man is meaningless in telling us how he feels. Most of the readings would be far from the average of how he felt around his middle. The standard deviation can also tell us that Jacob’s 25,550 draws of pebbles would provide an extremely accurate estimate of the division between the black and white pebbles inside the jar, because relatively few observations would be outliers, far from the average.

De Moivre was impressed with the orderliness that made its appearance as the numbers of random and unconnected observations increased; he ascribed that orderliness to the plans of the Almighty. It conveys the promise that, under the right conditions, measurement can indeed conquer uncertainty and tame risk. Using italics to emphasize the significance of what he had to say, de Moivre summarized his accomplishment: “[A]tho’ Chance produces Irregularities, still the Odds will be infinitely great, that in process of Time, those Irregularities will bear no proportion to recurrency of that Order which naturally results from ORIGINAL DESIGN.”13

* * *

De Moivre’s gift to mathematics was an instrument that made it possible to evaluate the probability that a given number of observations will fall within some specified bound around a true ratio. That gift has provided many practical applications.

For example, all manufacturers worry that defective products may slip through the assembly line and into the hands of customers. One hundred percent perfection is a practical impossibility in most instances—the world as we know it seems to have an incurable habit of denying us perfection.

Suppose the manager of a pin factory is trying to hold down the number of defective pins to no more than 10 out of every 100,000 produced, or 0.01% of the total.14 To see how things are going, he takes a random sample of 100,000 pins as they come off the assembly line and finds 12 pins without heads—two more than the average of 10 defectives that he had hoped to achieve. How important is that difference? What is the probability of finding 12 defective pins out of a sample of 100,000 if, on the average, the factory would be turning out 10 defective pins out of every 100,000 produced? De Moivre’s normal distribution and standard deviation provide the answer.
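A sketch of that calculation follows (in Python, using the Poisson approximation to the binomial, a convenience of later mathematics rather than de Moivre's own normal-curve method):

```python
from math import exp, factorial

# With 100,000 pins and a defect rate of 1 in 10,000, the number of
# defectives in the sample is very nearly Poisson-distributed with mean 10.
lam = 10

def pmf(k):
    """Poisson probability of exactly k defective pins in the sample."""
    return exp(-lam) * lam ** k / factorial(k)

exactly_12 = pmf(12)                              # exactly 12 defectives
at_least_12 = 1 - sum(pmf(k) for k in range(12))  # 12 or more defectives
print(round(exactly_12, 3), round(at_least_12, 3))   # roughly 0.095 and 0.303
```

Finding 12 or more defectives when 10 is the true average happens almost a third of the time, so the manager's sample gives him little reason for alarm.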

But that is not the sort of question that people usually want answered. More often, they do not know for certain before the fact how many defective units the factory is going to produce on the average. Despite good intentions, the true ratio of defectives could end up higher than 10 per 100,000 on the average. What does that sample of 100,000 pins reveal about the likelihood that the average ratio of defectives will exceed 0.01% of the total? How much more could we learn from a sample of 200,000? What is the probability that the average ratio of defectives will fall between 0.009% and 0.011%? Between 0.007% and 0.013%? What is the probability that any single pin I happen to pick up will be defective?

In this scenario, the data are given—10 pins, 12 pins, 1 pin—and the probability is the unknown. Questions put in this manner form the subject matter of what is known as inverse probability: with 12 defective pins out of 100,000, what is the probability that the true average ratio of defectives to the total is 0.01%?

* * *

One of the most effective treatments of such questions was proposed by a minister named Thomas Bayes, who was born in 1701 and lived in Kent.15 Bayes was a Nonconformist; he rejected most of the ceremonial rituals that the Church of England had retained from the Catholic Church after their separation in the time of Henry VIII.

Not much is known about Bayes, even though he was a Fellow of the Royal Society. One otherwise dry and impersonal textbook in statistics went so far as to characterize him as “enigmatic.”16 He published nothing in mathematics while he was alive and left only two works that were published after his death but received little attention when they appeared.

Yet one of those papers, Essay Towards Solving A Problem In The Doctrine Of Chances, was a strikingly original piece of work that immortalized Bayes among statisticians, economists, and other social scientists. This paper laid the foundation for the modern method of statistical inference, the great issue first posed by Jacob Bernoulli.

When Bayes died in 1761, his will, dated a year earlier, bequeathed the draft of this essay, plus one hundred pounds sterling, to “Richard Price, now I suppose a preacher at Newington Green.”17 It is odd that Bayes was so offhand about him, because Price was a good deal more than just a preacher at Newington Green.

Richard Price was a man with high moral standards and a passionate belief in human freedom in general and freedom of religion in particular. He was convinced that freedom was of divine origin and therefore was essential for moral behavior; he declared that it was better to be free and sin than to be someone’s slave. In the 1780s, he wrote a book on the American Revolution with the almost endless title of Observations on the Importance of the American Revolution and the Means of Making it a Benefit to the World in which he expressed his belief that the Revolution was ordained by God. At some personal risk, he cared for the American prisoners of war who had been transferred to camps in England. Benjamin Franklin was a good friend, and Adam Smith was an acquaintance. Price and Franklin read and criticized some of the draft chapters of The Wealth of Nations as Smith was writing it.

One freedom bothered Price: the freedom to borrow. He was deeply concerned about the burgeoning national debt, swollen by the wars against France and by the war against the colonies in North America. He complained that the debt was “funding for eternity” and dubbed it the “Grand National Evil.”18

But Price was not just a minister and a passionate defender of human freedom. He was also a mathematician whose work in the field of probability was impressive enough to win him membership in the Royal Society.

In 1765, three men from an insurance company named the Equitable Society called on Price for assistance in devising mortality tables on which to base their premiums for life insurance and annuities. After studying the work of Halley and de Moivre, among others, Price published two articles on the subject in Philosophical Transactions; his biographer, Carl Cone, reports that Price’s hair is alleged to have turned gray in one night of intense concentration on the second of these articles.

Price started by studying records kept in London, but the life expectancies implied by those records turned out to be well below those that actually prevailed.19 He then turned to the shire of Northampton, where records were more carefully kept than in London. He published the results of his study in 1771 in a book titled Observations on Reversionary Payments, which was regarded as the bible on the subject until well into the nineteenth century. This work has earned him the title of the founding father of actuarial science—the complex mathematical work in probability that is performed today in all insurance companies as the basis for calculating premiums.

And yet Price’s book contained serious, costly errors, in part because of an inadequate data base that omitted the large number of unregistered births. Moreover, he overestimated death rates at younger ages and underestimated them at later ages, and his estimates of migration into and out of Northampton were flawed. Most serious, he appears to have underestimated life expectancies, with the result that the life-insurance premiums were much higher than they needed to be. The Equitable Society flourished on this error; the British government, using the same tables to determine annuity payments to its pensioners, lost heavily.20


Two years after Bayes died, Price sent a copy of Bayes’s “very ingenious” paper to a certain John Canton, another member of the Royal Society, with a cover letter that tells us a good deal about Bayes’s intentions in writing it. The Royal Society published the essay in Philosophical Transactions in 1764, but even then Bayes’s innovative work languished in obscurity for another twenty years.

Here is how Bayes put the problem he was trying to solve:

PROBLEM

Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.21

The problem as set forth here is precisely the inverse of the problem as defined by Jacob Bernoulli some sixty years earlier (page 118). Bayes is asking how we can determine the probability that an event will occur under circumstances where we know nothing about it except that it has occurred a certain number of times and has failed to occur a certain number of other times. In other words, a pin can be either defective or perfect. If we identify ten defective pins out of a sample of a hundred, what is the probability that the total output of pins—not just any sample of a hundred—will contain between 9% and 11% defectives?
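In modern notation, a question of this kind has a tidy answer: if each pin is independently defective with unknown probability p, and we start from a uniform prior over p, then 10 defectives in a sample of 100 yield a Beta(11, 91) posterior distribution for p. A minimal sketch of the computation, using only simple numerical integration (the uniform prior and the Beta form are the standard modern treatment, not Bayes’s own notation):

```python
import math

def beta_pdf(x, a, b):
    # Density of the Beta(a, b) distribution, computed via log-gamma
    # to avoid overflow in the normalizing constant.
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)

# 10 defectives and 90 good pins, with a uniform prior, give Beta(11, 91)
a, b = 11, 91

# Probability that the true defect rate lies between 9% and 11%,
# by the trapezoid rule over the posterior density.
lo, hi, steps = 0.09, 0.11, 1000
h = (hi - lo) / steps
prob = sum(beta_pdf(lo + i * h, a, b) for i in range(1, steps)) * h
prob += (beta_pdf(lo, a, b) + beta_pdf(hi, a, b)) * h / 2
```

The posterior probability comes out at roughly one in four: even a sample of a hundred pins leaves substantial uncertainty about the true defect rate.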

Price’s cover letter to Canton reflects how far the analysis of probability had advanced into the real world of decision-making over just a hundred years. “Every judicious person,” Price writes, “will be sensible that the problem now mentioned is by no means a curious speculation in the doctrine of chances, but necessary to be solved in order to [provide] a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter.”22 He goes on to say that neither Jacob Bernoulli nor de Moivre had posed the question in precisely this fashion, though de Moivre had described the difficulty of reaching his own solution as “the hardest that can be proposed on the subject of chance.”

Bayes used an odd format to prove his point, especially for a dissenting minister: a billiard table. A ball is rolled across the table, free to stop anywhere and thereafter to remain at rest. Then a second ball is rolled repeatedly in the same fashion, and a count is taken of the number of times it stops to the right of the first ball. That number is “the number of times in which an unknown event has happened.” Failure—the number of times the event does not happen—occurs when the ball lands to the left. The probability of the location of the first ball—a single trial—is to be deduced from the “successes” and “failures” of the second.23

The primary application of the Bayesian system is in the use of new information to revise probabilities based on old information, or, in the language of the statisticians, to compare posterior probability with the priors. In the case of the billiard balls, the first ball represents the priors, and the continuous revision of estimates of its location as the second ball is repeatedly rolled represents the posterior probabilities.
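A small simulation of the billiard-table experiment makes the prior-to-posterior mechanics concrete (this is a modern restatement under a uniform prior; the posterior-mean formula used at the end is what later became known as Laplace’s rule of succession):

```python
import random

random.seed(1)
p_true = random.random()   # position of the first ball (unknown to the observer)

# Roll the second ball repeatedly; count stops to the right ("successes")
# and stops to the left ("failures") of the first ball.
successes = failures = 0
for _ in range(1000):
    if random.random() < p_true:
        successes += 1
    else:
        failures += 1

# With a uniform prior, the posterior for the first ball's position is
# Beta(successes + 1, failures + 1); its mean steadily closes in on the
# true position as the rolls accumulate.
estimate = (successes + 1) / (successes + failures + 2)
```

After a thousand rolls the posterior mean sits within a few hundredths of the first ball’s true position, which is exactly the convergence Bayes’s argument relies on.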

This procedure of revising inferences about old information as new information arrives springs from a philosophical viewpoint that makes Bayes’s contribution strikingly modern: in a dynamic world, there is no single answer under conditions of uncertainty. The mathematician A.F.M. Smith has summed it up well: “Any approach to scientific inference which seeks to legitimise an answer in response to complex uncertainty is, for me, a totalitarian parody of a would-be rational learning process.”24

Although the Bayesian system of inference is too complex to recite here in detail, an example of a typical application of Bayesian analysis appears in the appendix to this chapter.


The most exciting feature of all the achievements mentioned in this chapter is the daring idea that uncertainty can be measured. Uncertainty means unknown probabilities; to reverse Hacking’s description of certainty, we can say that something is uncertain when our information is correct and an event fails to happen, or when our information is incorrect and an event does happen.

Jacob Bernoulli, Abraham de Moivre, and Thomas Bayes showed how to infer previously unknown probabilities from the empirical facts of reality. These accomplishments are impressive for the sheer mental agility demanded, and audacious for their bold attack on the unknown. When de Moivre invoked ORIGINAL DESIGN, he made no secret of his wonderment at his own accomplishments. He liked to turn such phrases; at another point, he writes, “If we blind not ourselves with metaphysical dust we shall be led by a short and obvious way, to the acknowledgment of the great MAKER and GOUVERNOUR of all.”25

We are by now well into the eighteenth century, when the Enlightenment identified the search for knowledge as the highest form of human activity. It was a time for scientists to wipe the metaphysical dust from their eyes. There were no longer any inhibitions against exploring the unknown and creating the new. The great advances in the efforts to tame risk in the years before 1800 were to take on added momentum as the new century approached, and the Victorian era would provide further impulse.

APPENDIX: AN EXAMPLE OF THE BAYESIAN SYSTEM OF STATISTICAL INFERENCE IN ACTION

We return to the pin-manufacturing company. The company has two factories, the older of which produces 40% of the total output. This means that a pin picked up at random has a 40% probability of coming from the old factory, before we know whether it is defective or perfect; this is the prior probability. We find that the older factory’s defective rate is twice that found in the newer factory. If a customer calls and complains about finding a defective pin, which of the two factories should the manager call?

The prior probability would suggest that the defective pin was most likely to have come from the new plant, which produces 60% of the total. On the other hand, with half the older plant’s defect rate, the new plant produces a disproportionately small share of the company’s defective pins. When we revise the priors to reflect this additional information, the probability that the new plant made the defective pin turns out to be only 42.9%; there is a 57.1% probability that the older plant is the culprit. This new estimate becomes the posterior probability.
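The arithmetic behind these posterior figures is a direct application of Bayes’ rule; a minimal sketch (only the ratio of the two defect rates matters, so the new plant’s rate is normalized to 1):

```python
# Prior: each factory's share of total output
prior_old, prior_new = 0.40, 0.60

# The old factory's defect rate is twice the new factory's;
# the absolute rate cancels out of the calculation.
rate_new = 1.0
rate_old = 2.0 * rate_new

# Bayes' rule: P(factory | defective) is proportional to
# P(factory) * P(defective | factory)
evidence = prior_old * rate_old + prior_new * rate_new
posterior_old = prior_old * rate_old / evidence   # about 57% (4/7)
posterior_new = prior_new * rate_new / evidence   # about 43% (3/7)
```

The posterior for the old plant is exactly 4/7 and for the new plant 3/7: the complaint call should go to the older factory.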

a He did have sufficient poetry in his soul to request that the beautiful logarithmic spiral be engraved on his tombstone, claiming that the way it could grow without changing its form was “a symbol of fortitude and constancy in adversity: or even of the resurrection of our flesh.” He went on to ask that it be inscribed with the epitaph “Eadem mutata resurgo” [However changed it is always the same]. See David, 1962, p. 139.

b At a later point in the correspondence with Jacob, Leibniz observed, “It is certain that someone who tried to use modern observations from London and Paris to judge mortality rates of the Fathers before the flood would enormously deviate from the truth.” (Hacking, 1975, p. 164.)

Notes

1. Background material on Jacob Bernoulli is from Newman, 1988f.

2. Hacking, 1975, p. 166; see also Kendall, 1974.

3. Gesammelte Werke (ed. Pertz and Gerhardt), Halle 1855, vol. 3, pp. 71–97. I am grateful to Marta Steele and Doris Bullard for the translation into English. Chapter XXX of Keynes, 1921, has an illuminating discussion of this exchange between Leibniz and Bernoulli.

4. An excellent analysis of Ars Conjectandi may be found in David, 1962, pp. 133–139 and in Stigler, 1986, pp. 63–78.

5. Bernoulli, Jacob, 1713, p. 1430.

6. Ibid., p. 1431.

7. Hacking, 1975, p. 145.

8. Ibid., p. 146.

9. Ibid., p. 163.

10. David, 1962, p. 137.

11. Stigler, 1986, p. 71. This book was an invaluable source of information for this chapter.

12. The background material on de Moivre is from Stigler, 1986, Chapter 2, and David, 1962, Chapter XV.

13. Stigler, 1986, p. 85.

14. This example is freely adapted from Groebner and Shannon, 1993, Chapter 20.

15. The background material on Bayes is from Stigler, 1986, and Cone, 1952.

16. Groebner and Shannon, 1993, p. 1014.

17. Stigler, 1986, p. 123.

18. Cone, 1952, p. 50.

19. Ibid., p. 41.

20. Ibid., pp. 42–44.

21. Bayes, 1763.

22. Price’s letter of transmittal and Bayes’s essay are reprinted in Kendall and Plackett, 1977, pp. 134–141.

23. An excellent description of this experiment may be found in Stigler, 1986, pp. 124–130.

24. Smith, 1984. This paper contains an excellent analysis of the Bayesian approach.

25. David, 1962, p. 177.
