After completing this chapter, you should be able to do the following:
• Distinguish the root reason for a Big Data strategy.
• Recall the goals of Big Data.
• Identify the strategic implications of Big Data.
All business endeavors must increase value for shareholders or stakeholders. In this chapter, we’re going to consider the strategic implications of Big Data to an organization. What goals should an organization have with regard to Big Data? What types of strategies should be created to achieve those goals? How does an organization begin the journey of implementing Big Data? The first step in answering all these questions is to determine the overall objective of the organization and how Big Data will be used to help achieve that objective.
For-profit organizations focus on increasing enterprise value as their main objective or purpose. The main objective or purpose for not-for-profits must be to enhance the lives, services, and opportunities for the customers or stakeholders. In figure 4-1, notice that the center purpose is money, time, or value.
The first circle outside money, time, or value represents business insights. For the organization to enhance the enterprise or stakeholder’s value, it must consider business insights that affect the customer, operations, or predictions as to future results. When acted upon, the insights should yield an increase in the organization’s value.
The second circle represents Big Data architecture. This circle indicates that, to achieve business insights, the organization will most likely need to invest in Big Data architecture. Big Data architecture has a myriad of hardware, software, and consultant options. It also involves software tools such as the Hadoop framework, the MapReduce programming model, and the R language. It is safe to assume that without an investment in Big Data architecture, it will be very difficult to generate business insights from Big Data that yield increased value.
The final circle represents Big Data sources. This level encompasses all of the structured and unstructured sources of Big Data that have been covered previously. If an organization is unwilling or unable to access and analyze Big Data sources, it will not be able to generate insights that lead to increased enterprise or stakeholder value.
1. A Big Data strategy, like all business endeavors, must have as its root objective the goal of
a. Increasing value for shareholders or stakeholders.
b. Increasing scalable architecture.
c. Increasing Big Data capacity.
d. Data aggregation.
The overarching goal of Big Data analytics is to inform businesses. By analyzing data and trends, businesses can gain valuable insights that can be applied in nearly every area of operations. Some specific ways that Big Data can contribute to achieving the goal of informing businesses include the following:
• Monetize data or interpret data to realize competitive advantage which can be monetized.
• Analyze operational effectiveness—machine sensors, product failures, and traffic patterns.
• Create a reliable, scalable, and capable infrastructure that aids the data gathering, analysis, and inferences.
• Access and use internal and external data that are structured, unstructured, and streaming.
• Predict business, social, political, economic, technological, and environmental trends.
• Take action based on prescribed scenarios.
The following list attempts to convey the wide variety of applications of Big Data. Many of the data components relate to multiple industries.
1. Customer analytics. As noted in a previous chapter, one of the main sources of Big Data comes from transactions. This kind of information from customers can be used to gain insights into many aspects of customer behavior, including the following:
• Dropping product or service
• Analyzing customer behavior while on company website
• Monitoring customer usage of products to detect manufacturing or design problems
• Identifying high-value customers
• Identifying cross-selling opportunities as well as up-selling opportunities
• Determining which customers not to engage
• Identifying, targeting, and retaining customers
• Combining clickstream data with transactional data to improve customer profile
• Limiting product offerings to those that interest the customer
• Determining any aspect of customer behavior or product preference
• Identifying customer segmentation
• Recording and analyzing customer service and support issues
• Engaging brand advocates and changing the perception of brand antagonists
• Empowering customers to sell your products
• Enabling customers to locate items more quickly
• Improving loyalty or net promoter scores
• Analyzing smartphone or mobile data—call detail record processing, social analysis, churn prediction, geo-mapping
• Analyzing point of sale data
• Creating forums—crowd creativity, crowd solutions
2. Manufacturing. Manufacturers now have access to real-time data from a variety of process activities that allow them to gain insight into many factors, including the following:
• Tracking of product quality or defects
• Supply chain management and planning
• Optimizing machines
• Engineering analytics
• Predictive maintenance
• Process and quality analysis
• Warranty claim potential (based on social media comments or complaints)
• Enterprise resource planning—operations, service delivery, supply chain management, and automation of routine decisions
• Continuous improvement to processes and procedures
3. Research and development. Recently, the federal government recognized the potential value that lies in Big Data for research and development. A new initiative was launched that intends to extract knowledge from collections of digital data to help solve challenges on a national level. On a business level, Big Data can help with the following:
• Monitoring product quality
• Identifying customer needs for potential new products
• Soliciting input from customers regarding products
• Improving products based on call center data
4. Distribution. As warehouses and distribution centers become increasingly high-tech, they now generate information that can be used to monitor and track labor, inventory, and equipment, including the following:
• Monitoring product shipments
• Identifying variances in logistic costs
• Determining inventory levels
• Using location data like GPS
• Using radio-frequency identification (RFID)
• Using distribution optimization
5. Logistics. The "Internet of Things" is a huge new source of data in logistics. It can be used to track goods and provide insight into the following:
• Demand forecasting
• Supply chain analytics
• Tracking
• Delivery forecasting
• Travel industry—searches, pricing, bundling (air, hotel, car, ship, entertainment)
6. Marketing. Marketing departments are no strangers to using data to determine customers’ habits. With greater access to data, they can apply their insights to the following:
• Determining marketing campaign effectiveness
• Determining channel effectiveness
• Monitoring and improving customer experience
• Tailoring marketing campaigns based on location and demographic data
• Providing advertising and public relations campaigns—demand signaling, targeted advertising, sentiment analysis, customer acquisition, promotions, and other advertising mediums
• Offering brand sentiment analysis
• Providing product placement optimization
• Providing response modeling
• Providing retention modeling
• Providing market-based analysis
• Providing net promoter scores
• Providing customer segmentation
7. Predictions. Big Data may enable predictions to be made in areas other than business. Some of these include the following:
• Crimes, threat analysis
• Weather
• Investments
• Mineral location
• Astrophysics
• Health
• Relationships
8. Operations analysis. Operations analysis leverages information from machine sensors to improve operations in many ways, including the following:
• More accurate and timely decision making
• Deviation analysis of logs and operational data
• Facility layout—either in manufacturing or retail
• Supply chain optimization
• Dynamic pricing
9. Human Resources. Some retail companies use wearable technology to track their employees’ communications and movements within stores. Although that’s an extreme example of using Big Data for human resources, many companies can use information about their employees to do some of the following:
• Identify employees at risk of leaving the company
• Monitor recruitment activities
• Identify and recruit external candidates
• Résumé data
• Employee search
• Employee future team
10. Accounting. We will discuss some of the following applications in other chapters of this course:
• Measuring risk
— Credit risk
— Market risk
— Operational risk
• Budgeting, forecasting, planning
• Fraud detection
— Detecting multiparty fraud
— Real-time fraud prevention
• Algorithmic trading
• Customer analysis
• Duplicate payments
• Pricing, business intelligence, and data mining
11. Competition
• Tracking competitors’ prices
• Tracking competitors’ sales
• Tracking competitors’ marketing initiatives
• Mapping out the competitive landscape
12. Media and telecommunications. Network optimization, customer scoring, churn prevention, fraud prevention
13. Energy. Smart grid analysis, exploration, operational modeling, power line sensors
14. Healthcare and life sciences. Bioinformatics, pharmaceutical research, clinical outcomes research, pharmacogenomics, neonatal ICU monitoring, epidemic early warning systems, remote healthcare monitoring, likelihood of hospital readmission
• Drug discovery
• Health cures
• Health diagnosis
15. Government
• Regulatory compliance
• Threat analysis
• Law enforcement, defense, and cyber security (for example, real-time surveillance, situational awareness, cyber security detection, license plate tracking, GPS tracking)
• Natural systems—wildfire management, water management, wildlife management
• Transportation—intelligent traffic management
• Tax avoidance, Social Security fraud, money laundering, terrorist detection, communication surveillance and monitoring, market governance, weapons systems and counterterrorism, econometrics, health informatics
16. Unstructured data. Related to many of the preceding sections
• Sensor data—automotive, appliance, machine, temperature, security, vending machine
• Social networking—sentiment data from user-generated comments on ratings, reviews, and blogs
• Text messaging (SMS)
• Software—application logs
• Internet search—text and documents, mining
• Digital images and videos
• Voice data
• Web—web analytics, social media analytics, multivariate testing (Multivariate testing is a technique for testing a hypothesis in which multiple variables are modified. The goal of multivariate testing is to determine which combination of variations performs the best out of all of the possible combinations. Websites and mobile apps are made of combinations of changeable elements.)1
• Other—text analytics, business process analytics
• Clickstream—a virtual trail that a user leaves behind while surfing the Internet. A clickstream is a record of a user’s activity on the Internet, including every website and every page of every website that the user visits, how long the user was on a page or site, in what order the pages were visited, any newsgroups that the user participates in, and even the email addresses of mail that the user sends and receives. Both Internet service providers and individual websites are capable of tracking a user’s clickstream.2
17. Stock market analysis. For example, the impact of weather on security prices or analysis of market data latencies
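The multivariate testing idea described under item 16 can be sketched in a few lines of Python. This is only an illustrative sketch: the page elements, their variations, and the helper names (`all_combinations`, `best_combination`) are hypothetical and do not come from any particular testing product.

```python
from itertools import product

# Hypothetical page elements and their variations (illustrative only).
ELEMENTS = {
    "headline": ["Save time", "Save money"],
    "button_color": ["green", "orange"],
    "hero_image": ["product", "people", "none"],
}


def all_combinations(elements):
    """Return every combination of variations, one dict per combination."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


def conversion_rate(visitors, conversions):
    """Share of visitors who converted; 0.0 when nobody saw the page."""
    return conversions / visitors if visitors else 0.0


def best_combination(results):
    """Pick the combination with the highest conversion rate.

    `results` maps a combination label to (visitors, conversions).
    """
    return max(results, key=lambda label: conversion_rate(*results[label]))


combos = all_combinations(ELEMENTS)
print(len(combos), "combinations to test")  # 2 * 2 * 3 variations
print(combos[0])
```

The number of combinations grows multiplicatively with each added element, which is one reason such tests need large volumes of traffic data before the best-performing combination can be identified with any confidence.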
2. Which of the following was not mentioned as a Big Data insight as it relates to research and development?
a. Monitoring product quality.
b. Identifying customer needs.
c. Creating third- and fourth-generation products.
d. Soliciting input from customers.
In order to provide all of the insights needed for businesses, a Big Data platform needs to accomplish many goals. Big Data architecture must be designed so that data are analyzed in the natural environment as opposed to recreating data in voluminous data tables. The architecture must allow for reading and accessing a variety of sources such as email, financial, audio, images, GPS, and the like. The architecture should be created to accomplish the four Vs—volume, velocity, variety, and veracity—as described in chapter 2. The architecture must also be economically scalable, have an adequate response time, have multiple hardware options (due to hardware failures), and have built-in security to prevent unauthorized access to confidential detailed data.
To pursue Big Data as a tool for the organization, there are some key strategic issues that first must be addressed. If the strategic issues are overlooked at the beginning of the process, it may be difficult to implement Big Data successfully. The following list can be used as a starting point to think about implementing a Big Data platform:
• Strategic challenges
— Establishing suitability for purpose
— Providing an overall system architectural plan
• Technological challenges
— Gaining access to data
— Gaining access to the associated methodology and metadata
— Establishing provenance and lineage of data sets
— Establishing data set quality with respect to veracity (accuracy, fidelity), uncertainty, error, bias, reliability, and calibration
— Addressing security concerns
— Technological feasibility
— Existing data warehouse architecture
— Immature new systems or reliability of selected data
— Lack of metadata and schema for the Big Data
— Lack of tooling
— Availability of enterprise-ready products and tools
— High latency (Hadoop)
— Running inside the cluster
• Resource or capacity challenges
— Ability to implement a wide-scale Big Data initiative
— Consolidation of disparate data
— Quality and cost of collecting data
— Budget constraints
— Cost too high
• Staff challenges
— Experiment and trial testing big analytics
— Integrity of network transmission
— Poor data quality
— Ability to deal with real-time data
• Project management challenges
— Reliance on multiple consultants that may not work in harmony
— Starting with the right project
• Change management challenges
— Institutional change management
— Ensuring inter-jurisdictional collaboration and common standards
— Different department systems that inhibit collection and organization of Big Data
— Acquiring technically competent staff
— Steep technical learning curve
— Hiring qualified people
— Barriers between departments that are cultural in nature
— Data that are not accepted or believed
— Data ownership, especially as it ties to organizational culture
— Lack of business sponsorship
— Lack of belief in a business case
• Partnership challenges
— Forming strategic alliances with Big Data producers
• Legal and regulatory issues
3. According to TDWI, what is the biggest challenge for Big Data?
a. Lack of business sponsorship.
b. Lack of skills for IT staff.
c. Data integration complexity.
d. Poor data quality.
In addition to the strategic challenges of using Big Data, there is also significant potential for an organization to use Big Data incorrectly. Many situations could jeopardize the integrity of the decision-making process if an organization uses Big Data without fully understanding statistical pitfalls. Any issues with small amounts of data will be magnified in larger quantities. Sampling error or bias could produce data that are not representative of the situation, a common inaccuracy found in polling. False assumptions could distort the collection of data from the very beginning. For example, a company may assume that a variable is a strong predictor of customer retention when, in reality, that variable is merely correlated with retention and does not drive it.
There’s a major danger that organizations become entranced with aggregating vast amounts of data only to draw improper conclusions regarding that data. This is especially important for manufacturing environments which may make crucial production decisions based on prescriptive analytics.
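The retention example can be made concrete with a small sketch. All counts below are invented for illustration, and the variable names (newsletter opens, tenure segments) are hypothetical. The sketch assumes a hidden third factor, customer tenure, that influences both the variable being studied and retention.

```python
# Hypothetical counts, invented for illustration only.
# Key: (tenure segment, opens newsletter?) -> (customers, customers retained)
COUNTS = {
    ("long_tenure", True): (800, 720),
    ("long_tenure", False): (200, 180),
    ("short_tenure", True): (200, 100),
    ("short_tenure", False): (800, 400),
}


def retention_rate(rows):
    """Retained customers divided by total customers across the given rows."""
    customers = sum(c for c, _ in rows)
    retained = sum(r for _, r in rows)
    return retained / customers


def overall_rate(counts, opens):
    """Retention rate for openers (or non-openers), ignoring tenure."""
    return retention_rate([v for (_, o), v in counts.items() if o == opens])


def rate_within(counts, segment, opens):
    """Retention rate for openers (or non-openers) inside one tenure segment."""
    return retention_rate([counts[(segment, opens)]])


print("All customers:", overall_rate(COUNTS, True), "vs", overall_rate(COUNTS, False))
for segment in ("long_tenure", "short_tenure"):
    print(segment, rate_within(COUNTS, segment, True), "vs",
          rate_within(COUNTS, segment, False))
```

Across all customers, newsletter openers appear far more likely to be retained. Within each tenure segment, however, the two groups retain at the same rate, so in this made-up data the newsletter is only a marker of tenure and not a lever the company could pull to improve retention.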
Let us consider an example that Ari Zoldan wrote about in Wired magazine, which discussed drawing conclusions from Twitter data collected during Hurricane Sandy.3
In an intriguing study from Rutgers University, scientists set out to understand people’s decision-making related to Hurricane Sandy. From October 27th to November 1st, over 20 million tweets were recorded that pertained to the super storm. Tweets concerning preparedness peaked the night before, and tweets about partying peaked after the storm subsided.
The majority of the tweets originated from Manhattan, largely because of the high concentration of smartphone and Twitter usage. Due to the high concentration of power outages, and diminishing cell phone batteries, very few tweets were made from the hardest hit areas, such as Seaside Heights and Midland Beach. From the data, one could infer that the Manhattan borough bared the brunt of the storm; however, we know that wasn’t the case. There was actually a lot going on in those outlying areas. Given the way the data was presented, there was a huge data gap from communities unrepresented in the Twittersphere.
This example illustrates several points. First, it refutes the myth that more data will create greater insights. It demonstrates the importance of not being overly influenced by volumes of data or statistics. As you look at data, be objective, critical, and independent of any outcome.
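A minimal sketch can show how this kind of coverage gap arises. The two areas and every figure below are hypothetical; none of the numbers are taken from the Rutgers study or the Wired article. The sketch assumes that tweet volume depends on how many residents are able to tweet, not on how badly an area was hit.

```python
# Hypothetical areas; every figure is invented for illustration.
AREAS = [
    {"name": "well_connected_center", "damage_score": 3,
     "residents": 1_000_000, "pct_able_to_tweet": 60},
    {"name": "hard_hit_shoreline", "damage_score": 9,
     "residents": 50_000, "pct_able_to_tweet": 2},
]

# Assumed tweeting rate, identical in every area.
TWEETS_PER_100_CONNECTED = 50


def expected_tweets(area):
    """Tweets expected from an area, given who still has power and signal."""
    connected = area["residents"] * area["pct_able_to_tweet"] // 100
    return connected * TWEETS_PER_100_CONNECTED // 100


def rank_by(areas, key):
    """Area names ordered from highest to lowest according to `key`."""
    return [a["name"] for a in sorted(areas, key=key, reverse=True)]


by_tweets = rank_by(AREAS, expected_tweets)
by_damage = rank_by(AREAS, lambda a: a["damage_score"])

print("Ranked by tweet volume:", by_tweets)
print("Ranked by actual damage:", by_damage)
```

In this made-up scenario, ranking areas by tweet volume puts the well-connected center first even though the shoreline suffered the most damage. The volume of data reflects who was able to report, not what happened.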
Statistics are not the same as facts. Big Data may appear to be factual when it is just more volume. When it is raw, Big Data is large and unorganized, and organizing data for analysis is difficult.
You should also be wary of biases and missing context. Confirmation bias is the tendency of people to search the data for confirmation of their preexisting viewpoint. Also, when data conflict with underlying assumptions, there is a tendency to ignore them. Just because the data can be charted or analyzed by an algorithm does not mean the interpretation is valid. Faster and more powerful systems mean that we can also make the wrong interpretation and prescription faster than ever.
When evaluating data, keep the following three cautions in mind:
1. People tend to find what they seek. More data and speed do not necessarily mean that the results will be improved.
2. There are two types of data—quantitative and qualitative. Qualitative analysis is necessary to explain the quantitative analysis. Consider the announcement of public earnings reports. The numbers are announced. Then they are put into context by explaining or "verbally" adjusting earnings to present in the best light.
3. Remember that the context of data is very important. Consider how global warming has been interpreted, reinterpreted, and reinterpreted again. For every data set, it is important to understand the analyst’s bias, such as data presented, data modified, and data excluded. For example, the first quarter gross domestic product in 2015 did not come in as high as hoped. The first quarter has been lower than expected each of the past couple of years. Analysts originally credited this performance to harsh winters. Unfortunately, the weather still didn’t account for all of the disappointing results, so analysts stated that there was a "first quarter residual seasonality" and soft readings in other variables. It should be noted that the economists of the Federal Reserve did not find significant statistical evidence for such distortions on the aggregate GDP.4
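The first caution, that people tend to find what they seek, can be illustrated with a small simulation. This is a sketch under stated assumptions: every series is pure random noise, the sample sizes are arbitrary, and the function names are hypothetical. It shows that searching enough unrelated variables will turn up one that appears to track the target.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_rows=30, n_candidates=200, seed=0):
    """Largest absolute correlation between a random target and random candidates.

    Nothing here is related to anything else: the target and all
    candidate variables are independent noise.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(candidate, target)))
    return best


print("Best of 1 candidate:   ", round(strongest_chance_correlation(n_candidates=1), 2))
print("Best of 200 candidates:", round(strongest_chance_correlation(n_candidates=200), 2))
```

With only 30 observations, the best of 200 meaningless variables will usually show a correlation that looks worth reporting. More data columns and faster searching make this kind of false discovery easier, not harder.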
How big is the problem of Big Data for the information technology manager? According to Infochimps:
55% of Big Data projects don’t get completed, and many others fall short of their objectives.
Though there may be many reasons for this, undoubtedly one of the biggest factors is a lack of communication between top managers, who provide the overall project vision, and those charged with implementing it. Far too frequently the opinions of the IT staff doing the heavy lifting necessary to develop a Big Data project are taken as an afterthought and consequently considered only when projects veer off-course.5
According to that quote, more than half of Big Data projects are never finished. Of those remaining, a large subset will not add value to the organization or stakeholders. In addition to potential mistakes with data selection and processing, IT can add complications. Subramanian Iyer of Oracle wrote about the five Big Data mistakes that IT makes:6
1. Too much emphasis on the technology needed rather than the business need.
2. Many times IT management focuses on the wrong business cases assuming that the payback will be the same as others in the industry.
3. Management may launch multiple initiatives in parallel as part of a big bang approach to implementing Big Data. This approach may lessen the chances of success with Big Data projects.
4. Many times IT management does not complete a proper cost-benefit analysis to determine what the payback on the Big Data project will be.
5. Placing a Big Data application under the same process requirements (mechanisms for authentication, access, data isolation, and management of environments) as traditional applications may jeopardize the project.
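The cost-benefit analysis mentioned in item 4 can be sketched with two common measures, payback period and net present value. The project cost, yearly benefits, and 10% discount rate below are hypothetical figures chosen only to illustrate the arithmetic; an actual analysis would use the organization's own estimates and cost of capital.

```python
def payback_year(initial_cost, yearly_net_benefits):
    """First year in which cumulative benefits cover the initial cost.

    Returns None if the cost is never recovered within the forecast.
    """
    cumulative = 0
    for year, benefit in enumerate(yearly_net_benefits, start=1):
        cumulative += benefit
        if cumulative >= initial_cost:
            return year
    return None


def net_present_value(rate, initial_cost, yearly_net_benefits):
    """Discounted benefits minus the up-front cost."""
    discounted = sum(
        benefit / (1 + rate) ** year
        for year, benefit in enumerate(yearly_net_benefits, start=1)
    )
    return discounted - initial_cost


# Hypothetical project: 500,000 up front, benefits ramping up over three years.
cost = 500_000
benefits = [100_000, 200_000, 300_000]

print("Payback year:", payback_year(cost, benefits))
print("NPV at 10%:", round(net_present_value(0.10, cost, benefits)))
```

In this made-up case the project recovers its cost by the third year on an undiscounted basis, yet its net present value at a 10% discount rate is negative. A payback figure alone can therefore overstate the attractiveness of a Big Data project.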
4. Which of these is NOT a Big Data mistake?
a. Using an iterative implementation strategy.
b. Focusing on technology instead of the business need.
c. Not executing a cost-benefit analysis.
d. Executing multiple initiatives in parallel as part of a "big bang" approach or pilot implementations.
1. What is the root reason for developing Big Data?
2. What are some of the change management issues confronting Big Data implementations?
3. A cautionary tale of Big Data and tweets was related to Hurricane Sandy. What occurred with related tweets that, taken out of context, might produce a false conclusion?