Chapter 3
IN THIS CHAPTER
Seeing the benefits of business-centric data science
Knowing business intelligence from business-centric data science
Finding the expert to call when you want the job done right
Seeing how a real-world business put data science to good use
To the nerds and geeks out there, data science is interesting in its own right, but to most people, it’s interesting only because of the benefits it can generate. Most business managers and organizational leaders couldn’t care less about coding and complex statistical algorithms. They are, on the other hand, extremely interested in finding new ways to increase business profits by increasing sales rates and decreasing inefficiencies. In this chapter, I introduce the concept of business-centric data science, discuss how it differs from traditional business intelligence, and talk about how you can use data-derived business insights to increase your business’s bottom line.
The modern business world is absolutely deluged with data. That’s because every line of business, every electronic system, every desktop computer, every laptop, every company-owned cellphone, and every employee is continually creating new business-related data as a natural and organic output of their work. This data is structured or unstructured; some of it is big and some of it is small, fast or slow; maybe it’s tabular data, or video data, or spatial data, or data that no one has come up with a name for yet. But though the datasets come in many varieties, the challenge is always the same: to extract data insights that add value to the organization when acted upon. In this chapter, I walk you through the challenges involved in deriving value from actionable insights that are generated from raw business data.
Business is complex. Data science is complex. At times, it’s easy to get so caught up looking at the trees that you forget to look for a way out of the forest. That’s why, in all areas of business, it’s extremely important to stay focused on the end goal. Ultimately, no matter what line of business you’re in, true north is always the same: business profit growth. Whether you achieve that by creating greater efficiencies or by increasing sales rates and customer loyalty, the end goal is to create a more stable, solid profit-growth rate for your business. The following list describes some of the ways that you can use business-centric data science and business intelligence to help increase profits:
Turning your raw data into actionable insights is the first step in the progression from the data you’ve collected to something that actually benefits you. Business-centric data scientists use data analytics to generate insights from raw data.
Listed here, in order of increasing complexity, are the four types of data analytics you’ll most likely encounter:
Analytics commonly pose at least two challenges in the business enterprise. First, organizations often have difficulty finding new hires with specific skill sets that include analytics. Second, even skilled analysts often have difficulty communicating complex insights in a way that’s understandable to management decision makers.
To overcome these challenges, the organization must create and nurture a culture that values and accepts analytics products. The business must work to educate all levels of the organization so that management has a basic grasp of analytics and the success that can be achieved by implementing them. Conversely, business-centric data scientists must have a solid working knowledge about business in general and, in particular, a solid understanding of the business at hand. Strong business knowledge is one of the three main requirements of any business-centric data scientist; the other two are strong coding acumen and strong quantitative analysis skills via math and statistical modeling.
Data wrangling is another important portion of the work that’s required in order to convert data to insights. To build analytics from raw data, you’ll almost always need to use data wrangling — the processes and procedures that you use to clean and convert data from one format and structure to another so that the data is accurate and in the format that analytics tools and scripts require for consumption. The following list highlights a few of the practices and issues I consider most relevant to data wrangling:
Data governance: Data governance standards are used as a quality control measure to ensure that manual and automated data sources conform to the data standards of the model at hand. Data governance standards must be applied so that the data is at the right granularity when it’s stored and made ready for use.
Granularity is a measure of a dataset’s level of detail. Data granularity is determined by the relative size of the subgroupings into which the data is divided.
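To make granularity concrete, here’s a minimal Python sketch that takes fine-grained daily sales records and rolls them up to coarser monthly totals. The records, the field layout, and the function name roll_up_to_month are all invented for illustration; your own data standards determine the right level of detail.

```python
from collections import defaultdict

# Hypothetical daily sales records: (ISO date, region, amount).
daily_sales = [
    ("2024-01-03", "East", 120.0),
    ("2024-01-17", "East", 80.0),
    ("2024-01-09", "West", 200.0),
    ("2024-02-01", "East", 50.0),
    ("2024-02-14", "West", 75.0),
]

def roll_up_to_month(records):
    """Coarsen daily records to one total per (month, region)."""
    totals = defaultdict(float)
    for date, region, amount in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(month, region)] += amount
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
for key in sorted(monthly_sales):
    print(key, monthly_sales[key])
```

The daily records can always be rolled up to months, but monthly totals can’t be split back into days, which is why the granularity you store matters.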
After wrangling your data down to actionable insights, the second step in the progression from raw data to added value is to take decisive actions based on those insights. In business, the only justifiable reason to spend time deriving insights from raw data is that acting on those insights should increase business profits. Failure to take action on data-driven insights results in a complete and total loss of the resources that were spent deriving them, at no benefit whatsoever to the organization. An organization absolutely must be ready and equipped to change, evolve, and progress when new business insights become available.
To best prepare your organization to take action on insights derived from business data, make sure you have the following people and systems in place and ready to go:
Business-centric data scientists and business analysts who do business intelligence are like cousins: They both use data to work toward the same business goal, but their approach, technology, and function differ by measurable degrees. In the following sections, I define, compare, and distinguish between business intelligence and business-centric data science.
The purpose of business intelligence is to convert raw data into business insights that business leaders and managers can use to make data-informed decisions. Business analysts use business intelligence tools to create decision-support products for business management decision making. If you want to build decision-support dashboards, visualizations, or reports from complete medium-size sets of structured business data, you can use business intelligence tools and methods to help you.
Business intelligence (BI) is composed of
Insights that are generated in business intelligence (BI) are derived from standard-size sets of structured business data. BI solutions are mostly built off of transactional data — data that’s generated during the course of a transaction event, like data generated during a sale or during a money transfer between bank accounts, for example. Transactional data is a natural byproduct of business activities that occur across an organization, and all sorts of inferences can be derived from it. The following list describes the possible questions you can answer by using BI to derive insights from these types of data:
To streamline BI functions, make sure that your data is organized for optimal ease of access and presentation. You can use multidimensional databases to help you. Unlike relational, or flat, databases, multidimensional databases organize data into cubes that are stored as multidimensional arrays. If you want your BI staff to be able to work with source data as quickly and easily as possible, you can use multidimensional databases to store data in a cube rather than store the data across several relational databases that may or may not be compatible with one another.
This cubic data structure enables Online Analytical Processing (OLAP) — a technology through which you can quickly and easily access and use your data for all sorts of different operations and analyses. To illustrate the concept of OLAP, imagine that you have a cube of sales data that has three dimensions: time, region, and business unit. You can slice the data to view only one rectangle — to view one sales region, for instance. You can dice the data to view a smaller cube made up of some subset of time, region(s), and business unit(s). You can drill down or drill up to view either highly detailed or highly summarized data, respectively. And you can roll up, or total, the numbers along one dimension — to total business unit numbers, for example, or to view sales across time and region only.
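The slice, dice, and roll-up operations just described can be sketched in a few lines of Python. This is only a toy model, not how an OLAP engine is built: the cube is a plain dictionary, the sales figures are made up, and the function names are my own.

```python
from collections import defaultdict

# Hypothetical sales cube: each cell is keyed by (quarter, region, business_unit).
cube = {
    ("Q1", "East", "Retail"): 100,
    ("Q1", "East", "Wholesale"): 40,
    ("Q1", "West", "Retail"): 70,
    ("Q1", "West", "Wholesale"): 30,
    ("Q2", "East", "Retail"): 110,
    ("Q2", "East", "Wholesale"): 50,
    ("Q2", "West", "Retail"): 90,
    ("Q2", "West", "Wholesale"): 20,
}

def slice_cube(cube, region):
    """Slice: fix one dimension (region) to a single value."""
    return {k: v for k, v in cube.items() if k[1] == region}

def dice_cube(cube, quarters, regions, units):
    """Dice: keep a smaller cube defined by a subset of each dimension."""
    return {
        k: v for k, v in cube.items()
        if k[0] in quarters and k[1] in regions and k[2] in units
    }

def roll_up_units(cube):
    """Roll up: total across business units, leaving time and region."""
    totals = defaultdict(int)
    for (quarter, region, _unit), value in cube.items():
        totals[(quarter, region)] += value
    return dict(totals)

east_only = slice_cube(cube, "East")
small_cube = dice_cube(cube, {"Q1"}, {"East", "West"}, {"Retail"})
by_time_and_region = roll_up_units(cube)
```

Here east_only is the one-region rectangle, small_cube is the smaller cube, and by_time_and_region shows sales across time and region only.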
OLAP is just one type of data warehousing system — a centralized data repository that you can use to store and access your data. A more traditional data warehouse system commonly employed in business intelligence solutions is a data mart — a data storage system that you can use to store one particular focus area of data, belonging to only one line of business in the enterprise. Extract, transform, and load (ETL) is the process that you’d use to extract data, transform it, and load it into your database or data warehouse. Business analysts generally have strong backgrounds and training in business and information technology. As a discipline, BI relies on traditional IT technologies and skills.
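If the extract, transform, and load steps feel abstract, the following minimal sketch may help. It uses only Python’s standard csv and sqlite3 modules. The sample rows, the sales table, and the cleaning rules are assumptions I made up for the example; a production ETL pipeline would be considerably more involved.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
raw_csv = """order_id,region,amount
1, east ,100.50
2,WEST,20
3,East,
4,west,79.50
"""

def extract(text):
    """Extract: read rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        region = row["region"].strip().title()
        cleaned.append((int(row["order_id"]), region, float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)

# A BI-style query against the loaded table.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping the three steps as separate functions mirrors the way the process is usually described and makes each step easy to check on its own.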
Within the business enterprise, data science serves the same purpose that business intelligence does — to convert raw data into business insights that business leaders and managers can use to make data-informed decisions. If you have large sets of structured and unstructured data sources that may or may not be complete and you want to convert those sources into valuable insights for decision support across the enterprise, call on a data scientist. Business-centric data science is multidisciplinary and incorporates the following elements:
Quantitative analysis: Can be in the form of mathematical modeling, multivariate statistical analysis, forecasting, and/or simulations.
The term multivariate refers to more than one variable. A multivariate statistical analysis is a simultaneous statistical analysis of more than one variable at a time.
Data science is a pioneering discipline. Data scientists often employ the scientific method for data exploration, hypotheses formation, and hypothesis testing (through simulation and statistical modeling). Business-centric data scientists generate valuable data insights, often by exploring patterns and anomalies in business data. Data science in a business context is commonly composed of
Like business analysts, business-centric data scientists produce decision-support products for business managers and organizational leaders to use. These products include analytics dashboards and data visualizations, but generally not tabular data reports and tables.
You can use data science to derive business insights from standard-size sets of structured business data (just like BI) or from structured, semi-structured, and unstructured sets of big data. Data science solutions are not confined to transactional data that sits in a relational database; you can use data science to create valuable insights from all available data sources. These data sources include
Machine data from business operations: Machines automatically generate this unstructured data, like SCADA data, machine data, or sensor data.
The acronym SCADA refers to Supervisory Control and Data Acquisition. SCADA systems are used to control remotely operating mechanical systems and equipment. They generate data that is used to monitor the operations of machines and equipment.
Since the products of data science are often generated from big data, cloud-based data platform solutions are common in the field. Data that’s used in data science is often derived from data-engineered big data solutions, like Hadoop, MapReduce, Spark, and massively parallel processing (MPP) platforms. (For more on these technologies, check out Chapter 2.) Data scientists are innovative forward-thinkers who must often think outside the box in order to devise solutions to the problems they face. Many data scientists tend toward open-source solutions, when available. From a cost perspective, this approach benefits the organizations that employ these scientists.
Business-centric data scientists often use machine learning techniques to find patterns in (and derive predictive insights from) huge datasets that are related to a line of business or the business at large. They’re skilled in math, statistics, and programming, and they often use these skills to generate predictive models. They generally know how to program in Python or R. Most of them know how to use SQL to query relevant data from structured databases. They are usually skilled at communicating data insights to end users; in business-centric data science, end users are business managers and organizational leaders. Data scientists must be skillful at using written, oral, and visual means to communicate valuable data insights.
A discussion of data science in business would be incomplete without a description of the popular machine learning methods being used to generate business value, as described in this list:
The similarities between BI and business-centric data science are glaringly obvious; it’s the differences that most people have a hard time discerning. The purpose of both BI and business-centric data science is to convert raw data into actionable insights that managers and leaders can use for support when making business decisions.
BI and business-centric data science differ with respect to approach. Although BI can use forward-looking methods like forecasting, these forecasts are generated by making simple inferences from historical or current data. In this way, BI extrapolates from the past and present to infer predictions about the future. It looks to present or past data for relevant information to help monitor business operations and to aid managers in short- to medium-term decision making.
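As one illustration of the kind of simple extrapolation described here, the following sketch fits a straight line to past sales and extends it one period forward. The sales figures and the function name linear_trend_forecast are hypothetical, and a linear trend is only one of many simple forecasting approaches a BI tool might use.

```python
# Hypothetical monthly sales figures (oldest first).
monthly_sales = [100.0, 110.0, 120.0, 130.0]

def linear_trend_forecast(history, steps_ahead=1):
    """Fit a least-squares line to the history and extend it forward."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # The last observed period is n - 1, so step forward from there.
    return intercept + slope * (n - 1 + steps_ahead)

next_month = linear_trend_forecast(monthly_sales)
print(next_month)
```

Notice that the forecast uses nothing but the historical numbers themselves, which is exactly the sense in which BI infers the future from the past and present.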
In contrast, business-centric data science practitioners seek to make new discoveries by using advanced mathematical or statistical methods to analyze and generate predictions from vast amounts of business data. These predictive insights are generally relevant to the long-term future of the business. The business-centric data scientist attempts to discover new paradigms and new ways of looking at the data to provide a new perspective on the organization, its operations, and its relations with customers, suppliers, and competitors. Therefore, the business-centric data scientist must know the business and its environment. She must have business knowledge to determine how a discovery is relevant to a line of business or to the organization at large.
Other prime differences between BI and business-centric data science are
Since most business managers don’t know how to do advanced data work themselves, it’s definitely beneficial for them to at least know which types of problems are best suited for a business analyst and which problems should be handled by a data scientist instead.
If you want to use enterprise data insights to streamline your business so that its processes function more efficiently and effectively, bring in a business analyst. Organizations employ business analysts so that they have someone to cover the responsibilities associated with requirements management, business process analysis, and improvements-planning for business processes, IT systems, organizational structures, and business strategies. Business analysts look at enterprise data and identify what processes need improvement. They then create written specifications that detail exactly what changes should be made for improved results. They produce interactive dashboards and tabular data reports to supplement their recommendations and to help business managers better understand what is happening in the business. Ultimately, business analysts use business data to further the organization’s strategic goals and to support them in providing guidance on any procedural improvements that need to be made.
In contrast, if you want to obtain answers to very specific questions on your data, and you can obtain those answers only via advanced analysis and modeling of business data, bring in a business-centric data scientist. Many times, a data scientist may support the work of a business analyst. In such cases, the data scientist might be asked to analyze very specific data-related problems and then report the results back to the business analyst to support him in making recommendations. Business analysts can use the findings of business-centric data scientists to help them determine how to best fulfill a requirement or build a business solution.
Southeast Telecommunications Company was losing many of its customers to customer churn: The customers were simply moving to other telecom service providers. Because it’s significantly more expensive to acquire new customers than it is to retain existing customers, Southeast’s management wanted to find a way to decrease the churn rates. So, Southeast Telecommunications engaged Analytic Solutions, Inc. (ASI), a business-analysis company. ASI interviewed Southeast’s employees, including regional managers, supervisors, frontline employees, and help desk employees. After consulting with personnel, they collected business data that was relevant to customer retention.
ASI began examining several years’ worth of Southeast’s customer data to develop a better understanding of customer behavior and why some people left after years of loyalty while others continued to stay on. The customer datasets contained records for the number of times a customer had contacted Southeast’s help desk, the number of customer complaints, and the number of minutes and megabytes of data each customer used per month. ASI also had demographic and personal data (credit score, age, and region, for example) that was contextually relevant to the evaluation.
By looking at this customer data, ASI discovered the following insights. Within the 1-year time interval before switching service providers
Based on these results, ASI fitted a logistic regression model to the historical data in order to identify the customers who were most likely to churn. With the aid of this model, Southeast could identify and direct retention efforts at the customers that it was most likely to lose. These efforts helped Southeast improve its services by identifying sources of dissatisfaction; increase returns on investment by restricting retention efforts to only those customers at risk of churn (rather than all customers); and, most importantly, decrease overall customer churn, thus preserving the profitability of the business at large.
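The case study doesn’t spell out how ASI built its model, so treat the following as a rough sketch of the general technique rather than a reconstruction of ASI’s work. It fits a logistic regression by plain gradient descent to a handful of made-up customers, using two of the variables mentioned earlier (help desk contacts and complaints), and then ranks customers by predicted churn risk. The data, the learning rate, and the function names are all assumptions.

```python
import math

# Hypothetical training data:
# (help desk contacts, complaints) -> churned (1) or stayed (0).
customers = [
    ((0, 0), 0), ((1, 0), 0), ((2, 1), 0), ((1, 1), 0),
    ((5, 3), 1), ((6, 2), 1), ((4, 4), 1), ((7, 5), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Predicted probability of churn for one customer."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic(data, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            error = predict_proba(features, weights, bias) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [
            w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias

weights, bias = fit_logistic(customers)

# Rank customers from highest to lowest predicted churn risk.
ranked = sorted(
    (features for features, _ in customers),
    key=lambda f: predict_proba(f, weights, bias),
    reverse=True,
)
print(ranked[:3])
```

A ranking like this is what lets a company aim its retention efforts at the customers most at risk instead of at everyone. In practice, a data scientist would more likely reach for an established statistics or machine learning library than hand-code the fitting loop.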
What’s more, Southeast didn’t make these retention efforts a one-time event: The company incorporated churn analysis into its regular operating procedures. By the end of that year, and in the years since, it has seen a dramatic reduction in overall customer churn rates.