1

The History of Big Data

Of all the data in recorded human history, 90 percent has been created in the last two years. However, the need to use and interpret such Big Data has been around for much longer. In fact, the earliest examples of using data to track and control businesses date back 7,000 years, when Mesopotamians used rudimentary accounting to record the growth of crops and herds. Accounting principles continued to improve, and in 1663 John Graunt recorded and examined the information in London's mortality rolls. He wanted to understand the ongoing bubonic plague and build a warning system for it.1 In the first recorded example of statistical data analysis, he gathered his findings in the book Natural and Political Observations Made upon the Bills of Mortality, which provides great insights into the causes of death in the seventeenth century. Because of his work, Graunt can be considered the father of statistics.

The nineteenth century witnessed the start of the information age. Modern data was first gathered in 1887, when Herman Hollerith invented a computing machine that could read holes punched into paper cards to organize census data.2

THE TWENTIETH CENTURY

In 1937, during Franklin D. Roosevelt's administration, the United States created the first major data project to keep track of contributions by more than three million employers and 26 million employees under the new Social Security Act. IBM was awarded the contract to develop a punch card-reading machine for this immense bookkeeping task.3

The British developed the first data-processing machine in 1943 to decipher Nazi codes during World War II.4 The device, named Colossus, searched for patterns in intercepted messages at a rate of 5,000 characters per second. This reduced the time required to perform the task from weeks to merely hours. It was a huge step forward.

In 1952, the U.S. National Security Agency (NSA) was created and, within 10 years, it had contracts with more than 12,000 cryptologists.5 They were confronted with information overload during the Cold War, as they started collecting and processing intelligence signals automatically.

In 1965, the U.S. Government decided to build the first data center to store its more than 742 million tax returns and 175 million sets of fingerprints.6 Employees transferred all those records onto magnetic computer tape that was stored in a single location. The project was later dropped out of fear of “Big Brother,” but it represented the beginning of the electronic data storage era.

Then, in 1989, British computer scientist Tim Berners-Lee developed what eventually became the World Wide Web.7 He wanted to facilitate the sharing of information through a “hypertext” system. Little did he know at that moment what impact his invention would have on everyone.

Beginning in the 1990s, data was created at an amazing rate as more and more devices were connected to the Internet. In 1995, the first supercomputer was built; it performed as much work in a second as a calculator operated by a single person could do in 30,000 years.8

THE TWENTY-FIRST CENTURY

In 2005, Roger Mougalas of O'Reilly Media coined the term “Big Data,” a year after the company created the term Web 2.0.9 He used the term to refer to a large set of data that is almost impossible to manage and process using traditional business intelligence tools.

In that same year, Yahoo! created Hadoop on top of Google's MapReduce.10,11 Its goal was to index the entire World Wide Web; nowadays, many organizations around the world use the open-source Hadoop to crunch massive data sets.

As more and more social network sites appeared and Web 2.0 took flight, more and more data was created daily. Innovative startups slowly mined this vast amount of data and governments also began Big Data projects. In 2009, the Indian government decided to take an iris scan, fingerprint, and photograph of all of its 1.2 billion inhabitants.12 All this data is stored in the largest biometric database in the world.

In 2010, Eric Schmidt, the Executive Chairman of Google, spoke at the Techonomy forum in Lake Tahoe, California, and put the information revolution in perspective by stating that “every two days now we create as much information as we did from the dawn of civilization up until 2003…. That's something like five exabytes of data….”13

In 2011, the well-received McKinsey report “Big Data: The Next Frontier for Innovation, Competition, and Productivity” concluded that by 2018, the United States would face a shortage of 140,000 to 190,000 data scientists, as well as 1.5 million data managers.14 The role of Big Data Scientist is therefore often called the sexiest job of the twenty-first century.

In the past few years, there has been a massive increase in the number of Big Data startup companies. All are trying to help organizations manage and understand this explosion of Big Data. Companies are adopting Big Data only slowly, so, just as with the Internet in 1993, the Big Data revolution is still ahead of us, and a lot will change in the coming years.15

In fact, the amount of data is growing at such an explosive rate that the established unit names are no longer sufficient to describe it. Today, U.S. agencies, such as the NSA and the FBI, are talking about yottabytes when calculating the size of their files. In the (near) future, we will be talking about brontobytes when it comes to sensor data. Therefore, new terms have been created to describe the amount of data that is expected to be created in the coming years (see Figure 1-1).

Big Data will completely change organizations and societies around the world. The amount of data available worldwide is expected to double every two years.16 So, let's take a closer look at what exactly Big Data is.

Figure 1-1 Brontobytes Infographic
