CHAPTER 1

Introduction

The Internet, the World Wide Web, and the concept of service delivery have revolutionized the way commercial, academic, governmental, and nongovernmental organizations deal with their suppliers, clients, and customers. Individuals and organizations are overwhelmed with data produced by IT systems that are pervasive throughout society, government, and business. The variety and number of data sources, including sensors, cell phones, tablets, and other devices, are increasing at a seemingly exponential rate. Estimates from 2010 put all sources of data, including replicated data such as retweets and resent e-mail, at tens of exabytes per month; an exabyte is 10^18, or 1,000,000,000,000,000,000, bytes. The numbers are staggering, and, obviously, no one knows for sure. In 2012, the International Data Corporation (IDC) stated that there were 2.8 zettabytes (ZB) of data and forecasted that we will generate 40 ZB by 2020 (http://www.webopedia.com/quick_ref/just-how-much-data-is-out-there.html). Our data generation is growing exponentially.

Individuals and organizations do not actively collect, own, process, or analyze this much data themselves. However, many individuals and organizations acquire and deal with gigabytes of data, and many organizations utilize terabytes and petabytes of data per year. Senior executives and managers in government, academia, and business operations are grappling with the deluge of data available to them and trying to make sense of it, reaching decisions and conclusions based upon it. A critical question is how to collect, organize, process, store, and analyze this flood of data in order to deliver superior service to the client and customer base, both internal and external.

Much of this data is generated from the services sector of the economy: health, manufacturing, marketing, telecommunications, and so on. To address this wealth of data and the underlying technology and practices, IBM pioneered the term service science to encompass the broad spectrum of business, teaching, and research expertise to develop the capabilities to sustain and advance the services environment that we live in today. Advances in technology have made large volumes of data available to users and providers within the services environment. This volume of data has come to be called Big Data and has its own business, teaching, and research expertise associated with it.

This book will describe how coordinating and integrating the expertise of the services environment and the Big Data environment has led, and is leading, to enhanced service delivery to customers and clients and to increased revenue and profit in many industries. However, despite the attention given in the popular press and the blogosphere, many more opportunities exist, and even more will be invented, so that more organizations can derive benefits and value from the analysis of Big Data.

Defining Big Data

There are many definitions of Big Data. Our preferred definition, cited in Kaisler et al. (2013), is: Big Data is the volume of data that cannot be efficiently organized and processed with the storage and tools that we currently possess. Under certain circumstances, we can organize, process, and analyze Big Data, but we cannot do it very efficiently or effectively. For example, if we cannot process a real-time data stream fast enough, we cannot generate results that will enable decision making within a specified observe, orient, decide, and act (OODA) cycle.
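To make the timing issue concrete, the following minimal Python sketch (using hypothetical arrival and processing rates, not figures from any study) shows how a backlog builds whenever a stream arrives faster than it can be processed, pushing results past a fixed OODA decision window.

```python
# Minimal sketch with hypothetical rates: when records arrive faster than they
# can be processed, unprocessed work accumulates and results arrive too late
# to support decisions within the OODA cycle.
ARRIVAL_RATE = 1_000        # records arriving per second (assumed)
PROCESSING_RATE = 600       # records the pipeline can analyze per second (assumed)
OODA_CYCLE_SECONDS = 5      # window within which results must be ready (assumed)

backlog = 0
for second in range(1, 11):
    backlog += ARRIVAL_RATE - PROCESSING_RATE    # unprocessed records accumulate
    delay = backlog / PROCESSING_RATE            # time needed to clear the backlog
    status = "within cycle" if delay <= OODA_CYCLE_SECONDS else "too late for this OODA cycle"
    print(f"t={second:2d}s  backlog={backlog:6d}  catch-up delay={delay:5.1f}s  -> {status}")
```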

While Big Data often implies very large volumes of data, as Leslie Johnston (2013) noted, "Big Data can most definitely mean small data files but a lot of them." Both extremes present challenges to business and IT managers and owners on a continuing, daily basis.

How do you know when you are facing Big Data? Well, the transition from organizational data and databases to Big Data is not exact, but there are a number of characteristics that can be used to help one understand when the transition occurs.

Big Data has been characterized by several attributes. We have defined the five Vs as described in Table 1.1. The initial three Vs were first stated by Doug Laney (2001). Based on our research and experience, we added the last two Vs.

Table 1.1 Five Vs of Big Data

V

Description

Data volume

Data volume measures the amount of data collected by and available to an organization, which does not necessarily have to own all of it as long as it can access it. As data volume increases, the value of different data records will decrease in proportion to age, type, richness, and quantity, among other factors. It was estimated that over 2.5 exabytes (10^18 bytes) of data were created every day as of 2012 (Wikipedia 2013).

Data velocity

Data velocity measures the speed of data streaming, its aggregation, and its accumulation. Data velocity also has connotations of how quickly data gets purged, how frequently it changes, and how fast it becomes outdated. E-commerce has rapidly increased the speed and richness of data used for different business transactions (e.g., website clicks). Data velocity management is much more than a bandwidth issue; it is also an ingest issue (the extract-transform-load (ETL) problem).

Data variety

Data variety is a measure of the richness of the data representation: either structured, such as Resource Description Framework (RDF) files, databases, and Excel tables, or unstructured, such as text, audio files, and video. From an analytic perspective, it is probably the biggest obstacle to effectively using large volumes of data. Incompatible data formats, nonaligned data structures, and inconsistent data semantics represent significant challenges that can lead to analytic sprawl.

Data value

Data value measures the usefulness of data in making decisions. It has been noted that “the purpose of computing is insight, not numbers.” Data science is exploratory and useful in getting to know the data, but “analytic science” encompasses the predictive power of Big Data. A large amount of data may be valueless if it is perishable, late, imprecise, or has other weaknesses or flaws.

Data veracity

Data veracity is the accuracy, precision, and reliability of the data. A data set may have very accurate data with low precision and low reliability based on the collection methods and tools or the data generation methods. The information and results generated by processing this data may then be seriously flawed or compromised.

Big Data has often been used to represent a large volume of data of a single type, such as text, numbers, or pixels. More recently, many organizations have begun creating blended data from data sources of varied types through analysis. These data come from instruments, sensors, Internet transactions, e-mail, social media such as Twitter, YouTube, Reddit, Pinterest, and Tumblr, RFID devices, and clickstreams. New data types may also be derived by analyzing or joining different types of data.

Getting Started with Big Data

Research and practical analysis have shown that there are many areas where you can focus your attention. We will review four process areas where the tools best suited to Big Data can be applied immediately and fruitfully.

First, data categorization can aid in the analysis of Big Data because current tools permit machine-learning algorithms to explore large and varied data collections after being trained, or seeded, with previously known classifications, such as process outputs or groupings. The value of these groupings is that they provide classifications, or labeled analysis variables, that can then be used to discover a relationship or a predictor of the value or output of the process. The driver may be a hidden value within the process, that is, an input, subprocess, step, or activity, or a characteristic of the process input. This analysis is truly a directed discovery process and an organizational learning experience enabled by Big Data (Deng, Runger, and Tuv 2012; Hwang, Runger, and Tuv 2007). The data do not have to be fully classified or categorized, because specific techniques can be applied to group or cluster data or process outputs that do not fall within the previously known groupings (Zhang et al. 2010). The value is that this categorization develops the insights required to understand the root causes of underlying problems by discovering relations that were simply not visible before. This approach has been used in cancer prediction and diagnosis, and in understanding cancer development (Cruz and Wishart 2006).
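As a minimal sketch of this idea (in Python, assuming the scikit-learn library and synthetic data in place of real process measurements), a classifier is trained on previously known groupings, its feature importances point to the inputs or steps that best predict the process output, and records that do not fit the known groupings are clustered to surface candidate new categories.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Labeled history: process measurements with previously known output groupings.
X_labeled = rng.normal(size=(200, 4))
y_labeled = (X_labeled[:, 0] + X_labeled[:, 2] > 0).astype(int)

# Train on the seeded classifications; importances hint at which inputs or steps
# drive the process output.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_labeled, y_labeled)
print("predictor importance by input/step:", clf.feature_importances_.round(2))

# Score new, unlabeled process output.
X_new = rng.normal(size=(50, 4))
probs = clf.predict_proba(X_new)

# Records the model cannot assign confidently fall outside the known groupings;
# cluster them separately to surface possible new categories.
uncertain = X_new[probs.max(axis=1) < 0.6]
if len(uncertain) >= 2:
    new_groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(uncertain)
    print("candidate new groupings (sizes):", np.bincount(new_groups))
```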

Second, functional data analysis can be applied to production data because such data are often continuous in nature, and functional methods treat each run as a curve rather than as a set of isolated measurements (Ferraty and Romain 2011). This section will not describe the analysis in great detail, but the reader should be aware of the benefits of this form of Big Data analysis. It is closely related to profile monitoring and control charting, which are employed when the quality of a process or product can be characterized by a functional relationship between a measured output value and one or more explanatory variables. We can often "see" these relationships (though we may not fully understand their value) in the graphs, curves, and visual depictions prepared when data are logged and plotted. The business value is obvious: being able to predict the strength of relationships, detect changes in them, and locate where they begin and end.

The value of this can be recognized in the potential for assessing quality characteristics, such as size and quantity data, product shapes, geometric relationships, the appearance of faults and imperfections, patterns, and surface finish, as they occur, and relating them directly to the end results of processes. The data may be produced by visual sensors, by image observation in many medical, military, and scientific applications, or by other mechanisms (Megahed, Woodall, and Camelio 2011).
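The sketch below illustrates the profile-monitoring idea with synthetic data (it assumes only numpy and is not drawn from the cited studies): each production run yields a curve relating an output value to an explanatory variable, a quadratic fit summarizes each curve, and a run whose fitted coefficients fall outside limits set by baseline runs signals that the functional relationship has changed.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)               # explanatory variable (e.g., position or time)

def make_profile(shift=0.0):
    # One production run: an output curve over x, with measurement noise.
    return 2.0 + 1.5 * x - (0.8 + shift) * x**2 + rng.normal(0, 0.05, x.size)

# Fit a quadratic to each in-control baseline run and set 3-sigma coefficient limits.
baseline = [np.polyfit(x, make_profile(), deg=2) for _ in range(30)]
center = np.mean(baseline, axis=0)
spread = np.std(baseline, axis=0)

# Monitor new runs; the third run has a shifted curvature, i.e., a changed relationship.
for run, shift in enumerate([0.0, 0.0, 0.5], start=1):
    coeffs = np.polyfit(x, make_profile(shift), deg=2)
    out_of_control = np.any(np.abs(coeffs - center) > 3 * spread)
    verdict = "relationship changed" if out_of_control else "stable"
    print(f"run {run}: coefficients {coeffs.round(2)} -> {verdict}")
```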

Third, managers must now recognize that situations that can be "drawn" as graphs showing associations between various objects in the data can be analyzed as Big Data problems. In a graph representation (Cook and Holder 2006), two nodes are connected by an edge if they possess a relationship. Researchers have used such data to assess social networks (e.g., Facebook, LinkedIn), network intrusions, disease or product adoption, and ratings or rankings for consumer e-commerce actions (Chakrabarti and Faloutsos 2012). For example, one can identify potential changes in social network or communications network data sets; McCulloh et al. (2008) found variances in the Al-Qaeda network prior to September 11. For a business, it could be important to know that the sources from which clients or business customers acquire data and information are changing.
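A minimal sketch of graph-based change detection follows (in Python, assuming the networkx package and two hypothetical snapshots of a small communication network): comparing node centrality between snapshots highlights the actors whose position in the network has shifted most.

```python
import networkx as nx

# Two hypothetical snapshots of who communicates with whom.
before = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
after = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
                  ("d", "e"), ("d", "f"), ("e", "f")])   # node "d" gains new contacts

c_before = nx.degree_centrality(before)
c_after = nx.degree_centrality(after)

# Nodes whose centrality shifts most are candidates for a structural change
# in the network, ranked here from largest to smallest shift.
def shift(node):
    return c_after.get(node, 0.0) - c_before.get(node, 0.0)

for node in sorted(after.nodes, key=lambda n: abs(shift(n)), reverse=True):
    print(f"node {node}: centrality change {shift(node):+.2f}")
```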

Finally, in the era of Big Data, data may be available simultaneously from numerous, unrelated sources. Historically, a control chart was employed to observe multiple sources of periodic data (Boyd 1950). Automated tests can now detect the likelihood of a change in a single stream of data while concurrently monitoring multiple streams, taking into account the features of the streams, the correlations among them, and the scope and size of the change being sought (Jirasettapong and Rojanarowan 2011).
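The following sketch illustrates one common way to do this, a one-sided CUSUM statistic run in parallel over several streams (it assumes only numpy and uses synthetic data in which a single stream shifts); it is an illustration of the general approach rather than the specific method in the cited work.

```python
import numpy as np

rng = np.random.default_rng(2)
n_streams, n_obs = 4, 200
data = rng.normal(0, 1, size=(n_streams, n_obs))
data[2, 100:] += 1.0                      # stream 2 shifts upward halfway through

k, h = 0.5, 5.0                           # CUSUM reference value and decision limit
cusum = np.zeros(n_streams)               # one cumulative statistic per stream
alarm_at = [None] * n_streams

for t in range(n_obs):
    # Accumulate evidence of an upward shift in every stream simultaneously.
    cusum = np.maximum(0.0, cusum + data[:, t] - k)
    for s in range(n_streams):
        if alarm_at[s] is None and cusum[s] > h:
            alarm_at[s] = t               # first observation at which a change is signaled

for s in range(n_streams):
    print(f"stream {s}: change signaled at observation {alarm_at[s]}")
```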

Adding Value to Organizations

Big Data has an impact in every field of human endeavor, provided the data are available and can be processed. Impact is different from value. Impact helps to advance a field with new knowledge, whereas value reflects how useful the resulting actionable information is, whether in predicting events, making a profit, discovering a new particle, or improving the lot of our fellow humans. Big Data can add value in several ways: (1) it can make information transparent and usable at a higher frequency; (2) as more accurate data are collected, it allows organizations to conduct more controlled experiments to assess efficiency and refine business operations; (3) it can focus attention on narrower segments of the customer community so that products and services can be specified more precisely (market segmentation); and (4) given usage data, it can inform the specification of new products and services.

In a research study by IBM and the Said Business School, Oxford University (Turner, Schroeck, and Shockley 2012), four areas for employing Big Data were identified as described in Table 1.2.

Outline of This Book

This chapter has provided a brief introduction to some of the issues and challenges that senior executives and managers must consider in using Big Data to assess and enhance their service delivery operations. The remaining chapters provide additional information on each of the major topics presented earlier and present a framework for gaining value from this growing phenomenon.

Table 1.2 Areas for use of Big Data

Using Big Data

Brief description

Customer analytics

IBM and Said noted that 55 percent of the companies they surveyed focus their Big Data efforts on customer-centered objectives in order to improve service delivery to their diverse customer base. These companies want to improve their ability to anticipate varying market conditions and customer preferences so that they can seize market opportunities, improve customer service, and increase customer loyalty in an agile manner.

Build upon a scalable and extensible information foundation

IBM and Said noted that companies believe results can be obtained from Big Data only if the IT and information infrastructure can respond to evolving aspects of Big Data focused on the three Vs: variety, velocity, and volume. This means they must be able to evolve their IT and information infrastructure in an agile manner transparently to customer interactions. (Note: they only examined the original three Vs, but we believe that the information foundation must be focused on the five Vs.)

Initial focus is on gaining insights from existing and new sources of Big Data

IBM and Said found that most initial Big Data efforts are focused on analyzing existing data sets and stores in order to have a near-term effect on business operations. They suggest that this is a pragmatic approach to beginning to develop a Big Data usage capability. Most companies do not know what insights they will gain or how much useful and usable information they can extract from the information on hand. In many cases, the data has been collected, perhaps organized, and stored away for many years without ever being analyzed.

Requires strong analytics

Using Big Data requires a variety of analytics tools and the skills to use them. Typically, companies use such tools as data mining, online analytical processing (OLAP), statistical packages, and the like on structured data based on existing data stores, marts, and warehouses. However, as they accumulate unstructured data, the diversity of data types and structures requires new techniques for analysis and visualization. Existing tools can have trouble scaling to the volumes characteristic of Big Data and, often, cannot adequately analyze geospatial data, voice, video, or streaming data.

The remainder of this book is divided into six chapters as follows:
