As discussed in Chapter 2, collecting and maintaining trusted data is a critical aspect of any business. Knowing how and where to find data, store it efficiently, analyze it in new ways to increase the organization’s competitive advantage, and enable the right people to access it at the right time are all fundamental components of managing the ever-increasing amounts of corporate data. Indeed, data analytics is the primary differentiator when doing business in the 21st century. Transactional, social, mobile, cloud, Web, and sensor data offer enormous potential. But without tools to analyze these data types and volumes, there would not be much difference between business in the 20th century and business today—except for mobile access. High-quality data and human expertise are essential to the value of analytics.
Human expertise is necessary because analytics alone cannot explain the reasons for trends or relationships; know what action to take; or provide sufficient context to determine what the numbers represent and how to interpret them.
Database, data warehouse, data analytics, and business intelligence (BI) technologies interact to create a new biz-tech ecosystem. Data analytics and BI discover insights or relationships of interest that otherwise might not have been recognized. They make it possible for managers to make decisions and act with clarity, speed, and confidence. Data analytics is not just about managing more or varied data. Rather, it is about asking new questions, formulating new hypotheses, exploration and discovery, and making data-driven decisions. Ultimately, a big part of data analysis efforts is the use of new analytics techniques.
Mining data or text taken from day-to-day business operations reveals valuable information, such as customers’ desires, products that are most important, or processes that can be made more efficient. These insights expand the ability to take advantage of opportunities, minimize risks, and control costs.
While you might think that physical pieces of paper are a relic of the past, in most offices the opposite is true. Aberdeen Group’s survey of 176 organizations worldwide found that the volume of physical documents is growing by up to 30% per year. Document management technology archives digital and physical data to meet business needs, as well as regulatory and legal requirements (Eisenhauer, 2015).
Due to the incredible volume of data that the typical organization creates, effective data management is vital to prevent storage costs from spiraling out of control, to keep data growth under control, and to support greater performance. Data management oversees the end-to-end life cycle of data, from creation and initial storage to the time when the data become obsolete and are deleted.
The objectives of data management include the following:
Typically, newer data, and data that is accessed more frequently, is stored on faster, but more expensive storage media while less critical data is stored on cheaper, slower media.
The main benefits of data management include greater compliance, higher security, less legal liability, improved sales and marketing strategies, better product classification, and improved data governance to reduce risk. The following data management technologies keep users informed and support the various business demands:
Each of these database management technologies will be discussed in greater detail later in this chapter.
Data-processing techniques, processing power, and enterprise performance management capabilities have undergone revolutionary advances in recent years for reasons you are already familiar with—big data, mobility, and cloud computing. The last decade, however, has seen the emergence of new approaches, first in data warehousing and, more recently, for transaction processing. Given the huge number of transactions that occur daily in an organization, the data in databases are constantly in use or being updated. The volatility of databases makes it impossible to use them for complex decision-making and problem-solving tasks. For this reason, data are extracted from the database, transformed (processed to standardize the data), and then loaded into a data warehouse.
Database management systems (DBMSs) integrate with data collection systems such as TPS and business applications; store the data in an organized way; and provide facilities for accessing and managing that data. Factors to consider when evaluating the performance of a database management system are listed in Tech Note 3.1. Over the past 25 years, the relational database has been the standard database model adopted by most enterprises. Relational databases store data in tables consisting of columns and rows, similar to the format of a spreadsheet, as shown in Figure 3.3.
Relational database management systems (RDBMSs) provide access to data using a declarative language—structured query language (SQL). Declarative languages simplify data access by requiring that users specify only what data they want to access, without defining how that access will be achieved. The format of a basic SQL statement is

SELECT (specified fields)
FROM (specified table or tables)
WHERE (specified conditions are met)
Structured query language (SQL) is a standardized query language for accessing databases.
An example of an SQL query is shown in Figure 3.4.
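To make the idea of declarative access concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Customers table, its columns, and its rows are hypothetical illustrations, not data from the text.

```python
import sqlite3

# Build an in-memory relational table (rows and columns, as in Figure 3.3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (cust_id INTEGER, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Ava", "NY"), (2, "Ben", "CA"), (3, "Cruz", "NY")],
)

# Declarative access: the query states WHAT data we want (customers in NY),
# not HOW the DBMS should go about retrieving them.
rows = conn.execute(
    "SELECT name FROM Customers WHERE state = 'NY' ORDER BY name"
).fetchall()
print(rows)  # [('Ava',), ('Cruz',)]
```

The DBMS decides on its own how to scan or index the table; the user never specifies an access path.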
An accurate and consistent view of data throughout the enterprise is needed so one can make informed, actionable decisions that support the business strategy. Functions performed by a DBMS to help create such a view are shown in Figure 3.5.
Select the caption to view an interactive version of this figure online.
When most business transactions occur—for instance, an item is sold or returned, an order is sent or cancelled, a payment or deposit is made—changes are made immediately to the database. These online changes are additions, updates, or deletions. DBMSs record and process transactions in the database and support queries and reporting. Given these functions, DBMSs are referred to as online transaction processing (OLTP) systems. OLTP is a database design that breaks down complex information into simpler data tables to strike a balance between transaction-processing efficiency and query efficiency. OLTP databases can process millions of transactions per second. However, OLTP databases are not optimized for data mining, complex online analytical processing (OLAP), or decision support. These limitations led to the introduction of data warehouse technology. Data warehouses and data marts are optimized for OLAP, data mining, BI, and decision support. OLAP is a term used to describe the analysis of complex data from the data warehouse. In summary, databases are optimized for extremely fast transaction processing and query processing, whereas data warehouses are optimized for analysis.
Online transaction processing (OLTP) systems are designed to manage transaction data, which are volatile.
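The OLTP/OLAP distinction above can be illustrated with a small sketch: row-level writes as transactions occur (OLTP-style work) versus a read-only aggregate query over accumulated history (OLAP-style work). The sales table and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, region TEXT, amount REAL)")

# OLTP-style workload: many small, row-level writes as transactions occur.
for sale_id, region, amount in [(1, "East", 50.0), (2, "West", 80.0),
                                (3, "East", 20.0)]:
    conn.execute("INSERT INTO sales VALUES (?, ?, ?)", (sale_id, region, amount))
conn.commit()

# OLAP-style workload: a read-only aggregate query scanning the full history,
# the kind of analysis a data warehouse is optimized for.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
print(totals)  # {'East': 70.0, 'West': 80.0}
```

A production OLTP database handles the first pattern at enormous rates; the second pattern is typically offloaded to a warehouse so analysis does not compete with transaction processing.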
One of the major drivers of change in the data management market is the increased amount of data to be managed. Enterprises need powerful DBMSs and data warehousing solutions, analytics, and reporting. The four vendors that dominate this market—Oracle, IBM, Microsoft, and Teradata—continue to respond to evolving data management needs with more intelligent and advanced software and hardware. Advanced hardware technology enables scaling to much higher data volumes and workloads than previously possible, or it can handle specific workloads. Older general-purpose relational DBMSs lack the scalability or flexibility for specialized or very large workloads, but are very good at what they do.
RDBMSs are still the dominant database engines, but the trend toward NoSQL (short for “not only SQL”) systems is clear. NoSQL systems increased in popularity by 96% from 2014 to 2016. Although NoSQL systems have existed for as long as relational DBMSs, the term itself was not introduced until 2009, when many new systems were developed to cope with unfolding DBMS requirements—namely, handling big data, scalability, and fault tolerance for large Web applications. Scalability means the system can increase in size to handle data growth or the load of an increasing number of concurrent users; in other words, scalable systems efficiently meet the demands of high-performance computing. Fault tolerance means that no single failure results in any loss of service.
NoSQL systems are such a heterogeneous group of database systems that attempts to classify them are not very helpful. However, their general advantages are the following:
Starting in 2010 and continuing through 2016, Microsoft worked on the first rewrite of SQL Server’s query execution engine since Version 7 was released in 1998. The goal is to offer NoSQL-like speeds without sacrificing the capabilities of a relational database.
With most NoSQL offerings, the bulk of the cost does not lie in acquiring the database, but rather in implementing it. Data need to be selected and migrated (moved) to the new database. Microsoft hopes to reduce these costs by offering migration solutions.
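A minimal sketch can illustrate the schema flexibility that NoSQL document stores are often cited for. This is a toy in-memory stand-in, not the API of any real NoSQL product; the record contents are hypothetical.

```python
# Hypothetical in-memory document store: records are free-form documents
# keyed by ID, so each record can carry different fields. A relational table
# would force both records into one fixed set of columns.
store = {}

def put(doc_id, document):
    """Store a document under its key."""
    store[doc_id] = document

def get(doc_id):
    """Retrieve a document by key (None if absent)."""
    return store.get(doc_id)

# Unlike rows in a relational table, these two records need not share fields.
put("u1", {"name": "Ava", "email": "ava@example.com"})
put("u2", {"name": "Ben", "loyalty_tier": "gold", "visits": 12})

print(get("u2")["loyalty_tier"])  # gold
```

Real NoSQL systems add persistence, replication, and query facilities on top of this basic key-to-document model.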
The top five enterprise database systems of 2016 are Oracle’s 12c Database, Microsoft SQL Server, IBM DB2, SAP Sybase ASE, and PostgreSQL:
Databases can be centralized or distributed, as shown in Figure 3.6. Both types of databases need one or more backups and should be archived on- and offsite in case of a crash or security incident.
Centralized database stores all data on a single central computer, such as a mainframe or server.
Distributed database stores portions of the database on multiple computers within a network.
For decades the main database platform consisted of centralized database files on massive mainframe computers. Benefits of centralized database configurations include the following:
A major disadvantage of centralized databases, like all centralized systems, is transmission delay when users are geographically dispersed. More powerful hardware and networks compensate for this disadvantage.
In contrast, distributed databases use client/server architecture to process information requests. The databases are stored on servers that reside in the company’s data centers, a private cloud, or a public cloud (Figure 3.7). Advantages of a distributed database include reliability—if one site crashes, the system keeps running—and speed—it is faster to search part of a database than the whole. However, a problem with the network the distributed database relies on can cause availability issues, and the necessary hardware and software can be expensive to purchase.
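One common way a distributed database spreads data across sites is hash partitioning: a stable hash of each record's key decides which node stores it, so a lookup touches only one node's (smaller) portion of the data. The node names and keys below are hypothetical; this is a sketch of the placement idea, not a real distributed engine.

```python
import zlib

NODES = ["node_a", "node_b", "node_c"]

def node_for(key: str) -> str:
    """Pick the node responsible for a record key (stable CRC32 hash)."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

# Each node holds only its own partition of the database.
partitions = {n: {} for n in NODES}

def put(key, value):
    partitions[node_for(key)][key] = value

def get(key):
    # A lookup is routed to exactly one node's partition.
    return partitions[node_for(key)].get(key)

put("cust:1001", "Ava")
put("cust:1002", "Ben")
print(get("cust:1001"))  # Ava
```

Because records are spread across partitions, losing one node leaves the other partitions available, which is the reliability advantage noted above.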
Data collection is a highly complex process that can create problems concerning the quality of the data being collected. Therefore, regardless of how the data are collected, they need to be validated so users know they can trust them. Classic expressions that sum up the situation are “garbage in, garbage out” (GIGO) and the potentially riskier “garbage in, gospel out.” In the latter case, poor-quality data are trusted and used as the basis for planning. For example, you have probably encountered data safeguards, such as integrity checks, to help improve data quality when you fill in an online form, such as when the form will not accept an e-mail address or a credit card number that is not formatted correctly.
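The integrity checks mentioned above can be sketched briefly: a simple format check for an e-mail address, and the Luhn checksum that underlies most credit-card-number validation. The e-mail pattern here is deliberately simplified; production validators are stricter.

```python
import re

def valid_email(address: str) -> bool:
    """Very simple format check: something@something.something."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", address) is not None

def valid_card_number(number: str) -> bool:
    """Luhn checksum, the integrity check behind most card-number fields."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 13:
        return False
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # equivalent to summing the two digits
        checksum += d
    return checksum % 10 == 0

print(valid_email("user@example.com"))           # True
print(valid_card_number("4539 1488 0343 6467"))  # True (a common test number)
```

Checks like these stop malformed values at the point of entry—preventing garbage in, rather than cleaning garbage out later.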
Table 3.2 lists the characteristics typically associated with dirty or poor-quality data.
TABLE 3.2 Characteristics of Poor-Quality or Dirty Data
Characteristic of Dirty Data | Description |
Incomplete | Missing data |
Outdated or invalid | Too old to be valid or useful |
Incorrect | Too many errors |
Duplicated or in conflict | Too many copies or versions of the same data―and the versions are inconsistent or in conflict with each other |
Nonstandardized | Data are stored in incompatible formats―and cannot be compared or summarized |
Unusable | Data are not in context to be understood or interpreted correctly at the time of access |
As discussed in Chapter 2, too often managers and information workers are actually constrained by data that cannot be trusted because they are incomplete, out of context, outdated, inaccurate, inaccessible, or so overwhelming that they require weeks to analyze. In such situations, the decision-maker is facing too much uncertainty to make intelligent business decisions.
On average, an organization experiences 40% data growth annually, and 20% of that data is found to be dirty. Each dirty data point, or record, costs $100 if not resolved (RingLead, 2015). The costs of poor-quality data spread throughout a company, affecting systems from shipping and receiving to accounting and customer service. Data errors typically arise from the functions or departments that generate or create the data—and not within the IT department. When all costs are considered, the value of finding and fixing the causes of data errors becomes clear. In a time of decreased budgets, some organizations may not have the resources for such projects and may not even be aware of the problem. Others may be spending most of their time fixing problems, leaving them no time to work on preventing them. However, the benefits of acting preventatively against dirty data are substantial: it costs $1 to prevent a dirty record and $10 to correct one. While the short-run cost of cleaning and preventing dirty data may seem prohibitive for some companies, the long-run cost of not doing so is far greater (Kramer, 2015).
Dirty data is poor-quality data that lacks integrity and cannot be trusted.
Bad data are costing U.S. businesses hundreds of billions of dollars a year and affecting their ability to ride out the tough economic climate. Incorrect and outdated values, missing data, and inconsistent data formats can cause lost customers, sales, and revenue; misallocation of resources; and flawed pricing strategies.
Consider a corporation that follows the cost structure associated with clean/dirty data explained above with 100,000 data points. Over a three-year span, by cleaning the 20% of dirty data during the first year and using prevention methods for the following years, the corporation will save $8,495,000. Purely based on the quality of its data, a corporation with a large amount of data can hypothetically increase its revenue by 70% (RingLead, 2015).
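The cost figures cited above—$1 to prevent a dirty record, $10 to correct one, $100 if it is left unresolved—can be turned into a simple illustrative calculation. This sketch makes simplified assumptions (a single year, a flat 20% dirty rate) and is not an attempt to reproduce RingLead's exact three-year savings figure.

```python
# Per-record costs cited in the text: prevent / correct / leave unresolved.
PREVENT, CORRECT, IGNORE = 1, 10, 100

def yearly_cost(records: int, dirty_rate: float, strategy: str) -> int:
    """Annual data-quality cost under one of three simplified strategies."""
    dirty = int(records * dirty_rate)
    if strategy == "prevent":
        return records * PREVENT   # screen every record as it enters
    if strategy == "correct":
        return dirty * CORRECT     # clean up dirty records after the fact
    return dirty * IGNORE          # absorb the downstream damage

records = 100_000
cost_ignore  = yearly_cost(records, 0.20, "ignore")   # $2,000,000
cost_correct = yearly_cost(records, 0.20, "correct")  # $200,000
cost_prevent = yearly_cost(records, 0.20, "prevent")  # $100,000
print(cost_ignore, cost_correct, cost_prevent)
```

Even in this simplified form, the ordering is the point: ignoring dirty data costs an order of magnitude more than correcting it, and correction costs more than prevention.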
The cost of poor-quality data may be expressed as a formula:
Examples of these costs include the following:
For a particular company, it is difficult to calculate the full cost of poor-quality data and its long-term effects. Part of the difficulty is the time delay between the mistake and when it is detected. Errors can be very difficult to correct, especially when systems extend across the enterprise. Another concern is that the impacts of errors can be unpredictable, far-reaching, and serious.
Compliance with numerous federal and state regulations relies on rock-solid data and trusted metrics used for regulatory reporting. Data ownership, data quality, and formally managed data are high priorities on the agenda of CFOs and CEOs who are held personally accountable if their company is found to be in violation of regulations.
Despite the need for high-quality data, organizational politics and technical issues make that difficult to achieve. The source of the problem is data ownership—that is, who owns or is responsible for the data. Data ownership problems exist when there are no policies defining responsibility and accountability for managing data. Inconsistent data formats of various departments create an additional set of problems as organizations try to combine individual applications into integrated enterprise systems.
The tendency to delegate data-quality responsibilities to the technical teams who have no control over data quality, as opposed to business users who do have such control, is another common pitfall that stands in the way of accumulating high-quality data.
Those who manage a business or part of a business are tasked with trying to improve business performance and retain customers. Compensation is tied to improving profitability, driving revenue growth, and improving the quality of customer service. These key performance indicators (KPIs) are monitored closely by senior managers who want to find and eliminate defects that harm performance. It is strange then that so few managers take the time to understand how performance is impacted by poor-quality data. Two examples make a strong case for investment in high-quality data.
Retail banking: For retail bank executives, risk management is the number one issue. Disregard for risk contributed to the 2008 financial services meltdown. Despite risk management strategies, many banks still incur huge losses. Part of the problem in many banks is that their ISs enable them to monitor risk only at the product level—mortgages, loans, or credit cards. Product-level risk management ISs monitor a customer’s risk exposure for mortgages, or for loans, or for credit cards, and so forth—but not for a customer for all products. With product-level ISs, a bank cannot see the full risk exposure of a customer. The limitations of these siloed product-level risks have serious implications for business performance because bad-risk customers cannot be identified easily, and customer data in the various ISs may differ. However, banks are beginning to use big data to analyze risk more effectively. Although they are still very limited to credit card, loan, and mortgage risk data, cheaper and faster computing power allows them to keep better and more inclusive records of customer data. Portfolio monitoring offers earlier detection and predictive analytics for potential customers, and more advanced risk models show intricate patterns unseen by the naked eye in large data sets. Also, more fact-based inputs and standardized organizational methods are being implemented to reduce loan and credit officer bias to take risks on undesirable customers.
Marketing: Consider what happens when each product-level risk management IS feeds data to marketing ISs. Marketing may offer bad-risk customers incentives to take out another credit card or loan that they cannot repay. And since the bank cannot identify its best customers either, they may be ignored and enticed away by better deals offered by competitors. This scenario illustrates how data ownership and data-quality management are critical to risk management. Data defects and incomplete data can quickly trigger inaccurate marketing and mounting losses. Banks’ increasing dependence on business modeling requires that risk managers understand and manage model risk better. Although losses often go unreported, the consequences of errors in the model can be extreme. For instance, a large Asia–Pacific bank lost $4 billion when it applied interest-rate models that contained incorrect assumptions and data-entry errors. Risk mitigation will entail rigorous guidelines and processes for developing and validating models, as well as the constant monitoring and improvement of them (Harle et al., 2016).
Manufacturing: Many manufacturers are at the mercy of a powerful customer base—large retailers. Manufacturers want to align their processes with those of large retail customers to keep them happy. This alignment makes it possible for a retailer to order centrally for all stores or to order locally from a specific manufacturer. Supporting both central and local ordering makes it difficult to plan production runs. For example, each manufacturing site has to collect order data from central ordering and local ordering systems to get a complete picture of what to manufacture at each site. Without accurate, up-to-date data, orders may go unfilled, or manufacturers may have excess inventory. One manufacturer who tried to keep its key retailer happy by implementing central and local ordering could not process orders correctly at each manufacturing site. No data ownership and lack of control over how order data flowed throughout business operations had negative impacts. Conflicting and duplicate business processes at each manufacturing site caused data errors, leading to mistakes in manufacturing, packing, and shipments. Customers were very dissatisfied.
These examples demonstrate the consequences of a lack of data ownership and data quality. Understanding the impact mismanaged data can have on business performance highlights the need to make data ownership and data accuracy a high priority.
The data life cycle is a model that illustrates the way data travel through an organization, as shown in Figure 3.8. The data life cycle begins with data being stored in a database, then loaded into a data warehouse for analysis, and finally reported to knowledge workers or used in business apps. Supply chain management (SCM), customer relationship management (CRM), and e-commerce are enterprise applications that require up-to-date, readily accessible data to function properly.
Three general data principles relate to the data life cycle perspective and help to guide IT investment decisions:
As data become more complex and their volumes explode, database performance degrades. One solution is the use of master data and master data management (MDM) as introduced in Chapter 2. MDM processes integrate data from various sources or enterprise applications to create a more complete (unified) view of a customer, product, or other entity. Figure 3.9 shows how master data serve as a layer between transactional data in a database and analytical data in a data warehouse. Although vendors may claim that their MDM solution creates “a single version of the truth,” this claim is probably not true. In reality, MDM cannot create a single unified version of the data because constructing a completely unified view of all master data is simply not possible.
Select the caption to view an interactive version of this figure online.
Realistically, MDM consolidates data from various data sources into a master reference file, which then feeds data back to the applications, thereby creating accurate and consistent data across the enterprise. In IT at Work 3.1, participants in the health-care supply chain essentially developed a master reference file of its key data entities. A data entity is anything real or abstract about which a company wants to collect and store data. Master data entities are the main entities of a company, such as customers, products, suppliers, employees, and assets.
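The consolidation step described above can be sketched as a simple merge: records for the same entity arrive from several source systems, and a master reference record is built from them. The field names, source systems, and precedence rule (later, more trusted sources win) are hypothetical illustrations, not any vendor's MDM algorithm.

```python
from typing import Dict, List

def build_master(records: List[Dict]) -> Dict:
    """Merge source records into one master record; later sources win,
    and empty values never overwrite real ones."""
    master: Dict = {}
    for record in records:              # assume ordered by trustworthiness
        for field, value in record.items():
            if value not in (None, ""):
                master[field] = value
    return master

# Two source systems hold partial, slightly conflicting customer records.
sources = [
    {"cust_id": "C-17", "name": "A. Chen", "phone": ""},            # billing
    {"cust_id": "C-17", "name": "Amy Chen", "email": "amy@x.com"},  # CRM
]
print(build_master(sources))
# {'cust_id': 'C-17', 'name': 'Amy Chen', 'email': 'amy@x.com'}
```

The resulting master record is the "master reference file" in miniature: downstream applications read the consolidated view rather than each source's fragment.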
At an insurance company, the cost of processing each claim is $1, but the average downstream cost due to errors in a claim is $300. The $300 average downstream costs included manual handling of exceptions, customer support calls initiated due to errors in claims, and reissuing corrected documents for any claims processed incorrectly the first time. In addition, the company faced significant soft costs from regulatory risk, lost revenues due to customer dissatisfaction, and overpayment on claims due to claims-processing errors. These soft costs are not included in the hard cost of $300.
Every day health-care administrators and others throughout the health-care supply chain waste 24–30% of their time correcting data errors. Each transaction error costs $60 to $80 to correct. In addition, about 60% of all invoices among supply chain partners contain errors, and each invoice error costs $40 to $400 to reconcile. Altogether, errors and conflicting data increase supply costs by 3–5%. In other words, each year billions of dollars are wasted in the health-care supply chain because of supply chain data disconnects, which refer to one organization’s IS not understanding data from another’s IS.
Each department has distinct master data needs. Marketing, for example, is concerned with product pricing, brand, and product packaging, whereas production is concerned with product costs and schedules. A customer master reference file can feed data to all enterprise systems that have a customer relationship component, thereby providing a more unified picture of customers. Similarly, a product master reference file can feed data to all the production systems within the enterprise.
An MDM includes tools for cleaning and auditing the master data elements as well as tools for integrating and synchronizing data to make them more accessible. MDM offers a solution for managers who are frustrated with how fragmented and dispersed their data sources are.
Data warehouses are the primary source of cleansed data for analysis, reporting, and business intelligence (BI). Often the data are summarized in ways that enable quick responses to queries. For instance, query results can reveal changes in customer behavior and drive the decision to redevelop the advertising strategy.
Data warehouses that pull together data from disparate sources and databases across an entire enterprise are called enterprise data warehouses (EDWs).
An enterprise data warehouse (EDW) is a data warehouse that integrates data from databases across an entire enterprise.
Data warehouses store data from various source systems and databases across an enterprise in order to run analytical queries against huge datasets collected over long time periods.
The high cost of data warehouses can make them too expensive for a company to implement. Data marts are lower-cost, scaled-down versions of a data warehouse that can be implemented in a much shorter time, for example, in less than 90 days. Data marts serve a specific department or function, such as finance, marketing, or operations. Since they store smaller amounts of data, they are faster and easier to use and navigate.
Consider a bank’s database. Every deposit, withdrawal, loan payment, or other transaction adds or changes data. The volatility caused by constant transaction processing makes data analysis difficult—and the demands to process millions of transactions per second consume the database’s processing power. In contrast, data in warehouses are relatively stable, as needed for analysis. Therefore, select data are moved from databases to a warehouse. Specifically, data are as follows:
These three procedures—extract, transform, and load—are referred to by their initials ETL (Figure 3.10). In a warehouse, data are read-only; that is, they do not change until the next ETL.
Three technologies involved in preparing raw data for analytics include ETL, change data capture (CDC), and data deduplication (“deduping the data”). CDC processes capture the changes made at data sources and then apply those changes throughout enterprise data stores to keep data synchronized. CDC minimizes the resources required for ETL processes by only dealing with data changes. Deduping processes remove duplicates and standardize data formats, which helps to minimize storage and data synch.
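The three ETL steps, plus the deduplication and format standardization just described, can be sketched end to end. The source rows, field names, and rules here are hypothetical; real ETL tools handle far larger volumes and many more transformations.

```python
# Hypothetical operational source: note the duplicate order and the
# inconsistent state formats ("ny" vs. "NY").
source_rows = [
    {"order_id": "A1", "state": "ny", "amount": "100.00"},
    {"order_id": "A1", "state": "NY", "amount": "100.00"},   # duplicate
    {"order_id": "B2", "state": "ca", "amount": "80.50"},
]

def extract():
    """Extract: pull rows from the operational source."""
    return list(source_rows)

def transform(rows):
    """Transform: standardize formats and remove duplicates (deduping)."""
    seen, clean = set(), []
    for row in rows:
        std = {"order_id": row["order_id"],
               "state": row["state"].upper(),     # standardize the format
               "amount": float(row["amount"])}    # string -> numeric
        if std["order_id"] not in seen:           # drop duplicate orders
            seen.add(std["order_id"])
            clean.append(std)
    return clean

warehouse = []

def load(rows):
    """Load: append to the warehouse, read-only until the next ETL run."""
    warehouse.extend(rows)

load(transform(extract()))
print(warehouse)
```

After the run, the warehouse holds two clean, standardized rows; the duplicate never reaches the analytical store, which is exactly what deduping is meant to guarantee.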
Figure 3.11 diagrams the process of building and using a data warehouse. The organization’s data from operational transaction processing systems are stored in operational databases (left side of the figure). Not all data are transferred to the data warehouse. Frequently, only summary data are transferred. The warehouse organizes the data in multiple ways—by subject, functional area, vendor, and product. As shown, the data warehouse architecture defines the flow of data that starts when data are captured by transaction systems; the source data are stored in transactional (operational) databases; ETL processes move data from databases into data warehouses or data marts, where the data are available for access, reports, and analysis.
Early data warehouse technology primarily supported strategic applications that did not require instant response time, direct customer interaction, or integration with operational systems. ETL might have been done once per week or once per month. But demand for information to support real-time customer interaction and operations led to real-time data warehousing and analytics—known as an active data warehouse (ADW). Massive increases in computing power, processing speeds, and memory made ADWs possible. ADWs are designed not to support executives’ strategic decision-making, but rather to support operations. For example, shipping companies like DHL use huge fleets of trucks to move millions of packages. Every day and all day, operational managers make thousands of decisions that affect the bottom line, such as: “Do we need four trucks for this run?” “With two drivers delayed by bad weather, do we need to bring in extra help?” Traditional data warehousing is not suited for immediate operational support, but active data warehousing is. For example, companies with an ADW are able to:
Here are some examples of how two companies use ADW.
Capital One. Capital One uses its ADW to track each customer’s “profitability score” to determine the level of customer service to provide for that person. Higher-cost personalized service is only given to those with high scores. For instance, when a customer calls Capital One, he or she is asked to enter a credit card number, which is linked to a profitability score. Low-profit customers get a voice response unit only; high-profit customers are connected to a live customer service representative (CSR) because the company wants to minimize the risk of losing those customers.
Travelocity. If you use Travelocity, an ADW is finding the best travel deals especially for you. The goal is to use “today’s data today” instead of “yesterday’s data today.” The online travel agency’s ADW analyzes your search history and destinations of interest; then predicts travel offers that you would most likely purchase. Offers are both relevant and timely to enhance your experience, which helps close the sale in a very competitive market. For example, when a customer is searching flights and hotels in Las Vegas, Travelocity recognizes the interest—the customer wants to go to Vegas. The ADW searches for the best-priced flights from all carriers, builds a few package deals, and presents them in real time to the customer. When customers see a personalized offer they are already interested in, the ADW helps generate a better customer experience. The real-time data-driven experience increases the conversion rate and sales.
Data warehouse content can be delivered to decision-makers throughout the enterprise via the cloud or company-owned intranets. Users can view, query, and analyze the data and produce reports using Web browsers. These are extremely economical and effective data delivery methods.
Many organizations built data warehouses because they were frustrated with inconsistent data that could not support decisions or actions. Viewed from this perspective, data warehouses are infrastructure investments that companies make to support ongoing and future operations, including the following:
Table 3.3 summarizes several successful applications of data warehouses.
TABLE 3.3 Data Warehouse Applications by Industry
Industry | Applications |
Airline | Crew assignment, aircraft deployment, analysis of route profitability, and customer loyalty promotions |
Banking and financial services | Customer service, trend analysis, product and service promotions, and reduction of IS expenses |
Credit card | Customer service, new information service for a fee, fraud detection |
Defense contracts | Technology transfer, production of military applications |
E-business | Data warehouses with personalization capabilities, marketing/shopping preferences allowing for up-selling and cross-selling |
Government | Reporting on crime areas, homeland security |
Health care | Reduction of operational expenses |
Investment and insurance | Risk management, market movements analysis, customer tendencies analysis, and portfolio management |
Retail chain | Trend analysis, buying pattern analysis, pricing policy, inventory control, sales promotions, and optimal distribution channel decision |
Like mobile and cloud, big data and advanced data analytics are reshaping organizations and business processes to increase efficiency and improve performance. Research firm IDC forecasts that big data and analytics spending will reach $187 billion in 2019 (Olavsrud, 2016).
Big data is a data set so large or complex that it cannot be analyzed using traditional data processing techniques.
Data analytics is the qualitative or quantitative analysis of a data set to reveal patterns, trends, and associations, often relating to human behavior and interactions, in order to enhance productivity and business gain.
Data analytics is an important tool across organizations that helps users discover meaningful real-time insights to meet customer expectations, achieve better results, and stay competitive. These deeper insights, combined with human expertise, enable people to recognize meaningful relationships more quickly or easily and to realize the strategic implications of those situations. Imagine trying to make sense of the fast and vast data generated by social media campaigns on Facebook or by sensors attached to machines or objects. Low-cost sensors make it possible to monitor all types of physical things, while analytics makes it possible to understand those data in order to take action in real time. For example, sensor data can be analyzed in real time:
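The general idea of acting on sensor data in real time can be illustrated with a small sketch. The readings, window size, and threshold below are all hypothetical; a reading is flagged when it jumps well above the rolling average of recent values, the kind of rule a factory or wind farm might act on within seconds.

```python
from collections import deque
from statistics import mean

def monitor(readings, window=5, threshold=2.0):
    """Flag sensor readings that deviate sharply from the recent average.

    `readings` is a hypothetical stream of temperature values; a reading is
    flagged when it exceeds the rolling mean of the last `window` values by
    more than `threshold` degrees.
    """
    recent = deque(maxlen=window)   # only the last `window` readings are kept
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window and value - mean(recent) > threshold:
            alerts.append((i, value))   # in practice: trigger an action here
        recent.append(value)
    return alerts

# A sudden jump after five stable readings raises one alert.
monitor([20.0, 20.1, 20.2, 20.1, 20.0, 25.5])   # -> [(5, 25.5)]
```

Production systems apply the same windowed logic to millions of readings per second on stream-processing platforms rather than in a single loop.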
In this section, you will learn about the value, challenges, and technologies involved in putting data and analytics to use to support decisions and action, together with examples of skill sets currently in high demand by organizations expanding their efforts to train, hire and retain competent data professionals (Career Insight 3.1).
Concerns about the analytics skills gap have existed for years. It is increasingly clear that the shortage isn’t just in data scientists, but also data engineers, data analysts, and even the executives required to manage data initiatives. As a result, organizations and institutions are expanding their efforts to train, hire, and retain data professionals. Here are two of those skill sets that are in high demand.
Big data specialists manage and package big data collections, analyze and interpret trends, and present their findings in easy-to-understand ways to C-level executives. Those who can present the data through user-friendly data visualizations will be particularly sought after. Skills required of these big data professionals include big data visualization, statistical analysis, big data reporting and presentation, Apache Hadoop, NoSQL databases, and machine learning.
Business intelligence (BI) analysts use tools and techniques to go beyond the numbers of big data and take action based on the findings of the big data analyses. Successful BI professionals use self-service BI platforms, like Tableau, SAP, Oracle BI, Microsoft BI, and IBM Cognos, to create BI reports and visualizations to streamline the process and reduce reliance on additional staff. Additional skills of critical thinking, creative problem solving, effective communication, and presentations further enhance their attractiveness to employers (Hammond, 2015).
When a data set is too large or complex to be analyzed using traditional data processing applications, big data analytics tools are used. One of the biggest applications of big data in customer relations is customer value analytics (CVA). CVA studies the recent phenomenon that customers are more willing to use and purchase innovative products, services, and customer service channels while demanding increasingly high-quality, personalized products. Companies and producers use big data analytics to capture this combination and transform the information into usable data to track and predict trends. If companies know what customers like, what makes them spend more, and when they are happy, they can leverage that information to keep customers happy and provide better products and services.
Big data analytics is the process of examining large and varied data sets to identify hidden patterns and correlations, market trends, customer preferences, and other useful information that enables better business decisions.
Companies can also use big data analytics to store and use their data across the supply chain. To maximize the effectiveness of data analytics, companies usually complete these objectives throughout their input transformation process:
These big data programs enable them to pinpoint improvement opportunities across the supply chain, from purchasing to in-store availability management. Specifically, the companies are able to predict how customers will behave and use that knowledge to be prepared to respond quickly. According to Louis Columbus at Forbes, market demand for big data analytics is about to see its largest increase in history: spending on business analytics software will increase by more than 50% by 2019, and prescriptive analytics software will be worth $1.1 billion in 2019, compared to $415 million in 2014. Since increasing the focus on customer demand trends, entering new markets effectively, producing better business models, and enhancing organizational performance are the most important goals for 21st-century companies, business analytics will be needed in almost every instance. Taking advantage of the benefits of business intelligence is allowing sectors like health care to compete in areas they could not have entered before (Columbus, 2016).
To be effective in using data analysis, organizations must pay attention to the four Vs of analytics (variety, volume, velocity, and veracity) shown in Figure 3.12.
Big data can have a dramatic impact on the success of any enterprise, or it can be a major expense that contributes little. However, success is not achieved with technology alone. Many companies collect and capture huge amounts of data but spend very little effort ensuring the veracity and value of data captured at the transactional stage or point of origin. Emphasis in this direction will not only increase confidence in the data sets but also significantly reduce the effort required for analytics and enhance the quality of decision-making. Success also depends on avoiding invalid assumptions, which can be done by testing the assumptions during analysis.
Human expertise and judgment are needed to interpret the output of analytics (refer to Figure 3.13). Data are worthless if you cannot analyze, interpret, understand, and apply the results in context. This brings up several challenges:
IT at Work 3.2 describes how big data analytics, collaboration, and human expertise have transformed the new drug development process.
Machine-generated sensor data are becoming a larger proportion of big data (Figure 3.14), according to a research report by IDC (2015). It is predicted that these data will increase to two-thirds of all data by 2020, representing a significant increase from the 11% level of 2005. In addition to its growth as a portion of analyzed data, the market for sensor data will increase to $1.7 trillion in 2020.
On the consumer side, a significant factor in this market is the boom in wearable technology, products like Fitbit and the Apple Watch. Users no longer even have to input data to these devices because the data are gathered and tracked automatically in real time. On the public sector and enterprise side, sensor data and the Internet of Things (IoT) are being used to advance IT-enabled business processes, like automated factories and distribution centers, and IT-enabled products, like wearable tech (IDC, 2015). Federal health reform efforts have pushed health-care organizations toward big data and analytics. These organizations are planning to use big data analytics to support revenue cycle management, resource utilization, fraud prevention, health management, and quality improvement.
Big data volumes exceed the processing capacity of conventional database infrastructures. A widely used processing platform is Apache Hadoop, which places no conditions on the structure of the data it can process. Hadoop distributes computing problems across a number of servers, implementing MapReduce in two stages: a map stage that splits the work into independent tasks run in parallel across the servers, and a reduce stage that merges the intermediate results into a final answer.
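The two MapReduce stages can be sketched with the classic word-count example in plain Python. The function names `map_phase` and `reduce_phase` are illustrative only; real Hadoop jobs implement Mapper and Reducer classes that the framework runs in parallel across many servers.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map stage: emit (key, value) pairs; here, (word, 1) for each word.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Reduce stage: merge all values that share the same key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hadoop runs map tasks in parallel on separate servers; here we simulate
# that by mapping each document independently, then reducing all pairs.
docs = ["big data big insight", "big value"]
counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
# counts -> {'big': 3, 'data': 1, 'insight': 1, 'value': 1}
```

Because each map call depends only on its own document, the map stage scales out simply by adding servers, which is the essence of Hadoop's approach to big data volumes.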
Drug development is a high-risk business. Almost 90% of new drugs ultimately fail to reach the market. One of the challenges has been the amount, variety, and complexity of the data that need to be systematically analyzed. Big data technologies and private–public partnerships have made biomedical analytics feasible.
Biotechnology advances have produced massive data on the biological causes of disease. However, analyzing these data and converting discoveries into treatments are much more difficult. Not all biomedical insights lead to effective drug targets, and choosing the wrong target leads to failures late in the drug development process, costing time, money, and lives. Developing a new drug—from early discovery through Food and Drug Administration (FDA) approval—takes over a decade. As a consequence, each success ends up costing more than $1 billion. Sometimes much more! For example, by the time Pfizer Inc., Johnson & Johnson, and Eli Lilly & Co. announced their new drugs had only limited benefit for Alzheimer’s patients in late-stage testing, the industry had spent more than $30 billion researching amyloid plaque in the brain.
Drug makers, governments, and academic researchers have partnered to improve the odds of drug success, and after years of decline, the pharmaceutical industry is beginning to experience a greater rate of success with its clinical trials. Partnerships bring together the expertise of scientists from biology, chemistry, bioinformatics, genomics, and big data. They are using big data to identify biological targets for drugs and eliminate failures before they reach the human testing stage, and many anticipate that big data and the analytics that go with it could be a key element in further increasing success rates in pharmaceutical R&D (Cattell et al., 2016).
GlaxoSmithKline, the European Bioinformatics Institute (EBI), and the Wellcome Trust Sanger Institute established the Centre for Therapeutic Target Validation (CTTV) near Cambridge, England. CTTV partners combine cutting-edge genomics with the ability to collect and analyze massive amounts of biological data. By not developing drugs that target the wrong biological pathways, they avoid wasting billions of research dollars.
With biology now a data-driven discipline, collaborations such as CTTV are needed to improve efficiencies, cut costs, and provide the best opportunities for success. Other private–public partnerships that had formed to harness drug research and big data include the following:
“The big data opportunity is especially compelling in complex business environments experiencing an explosion in the types and volumes of available data. In the health-care and pharmaceutical industries, data growth is generated from several sources, including the R&D process itself, retailers, patients and caregivers. Effectively utilizing these data will help pharmaceutical companies better identify new potential drug candidates and develop them into effective, approved and reimbursed medicines more quickly” (Cattell et al., 2016).
Sources: Compiled from Cattell et al. (2016), HealthCanal (2014), Kitamura (2014), and NIH (2014).
To store data, Hadoop has its own distributed file system, Hadoop File System (HDFS), which functions in three stages:
Figure 3.15 diagrams how Facebook uses database technology and Hadoop. IT at Work 3.3 describes how First Wind has applied big data analytics to improve the operations of its wind farms and to support sustainability of the planet by reducing environmentally damaging carbon emissions.
Wind power can play a major role in meeting America’s rising demand for electricity—as much as 20% by 2030. Using more domestic wind power would reduce the nation’s dependence on foreign sources of natural gas and also decrease carbon dioxide (CO2) emissions that contribute to adverse climate change.
First Wind is an independent North American renewable energy company focused on the development, financing, construction, ownership, and operation of utility-scale power projects in the United States. Based in Boston, First Wind has developed and operates 980 megawatts (MW) of generating capacity at 16 wind energy projects in Maine, New York, Vermont, Utah, Washington, and Hawaii. First Wind has a large network of sensors embedded in its wind turbines, which generate huge volumes of data continuously. The data are transmitted and analyzed in real time, 24/7, to understand the performance of each wind turbine.
Sensors collect massive amounts of data on the temperature, wind speeds, location, and pitch of the blades. The data are analyzed to study the operation of each turbine and adjust it for maximum efficiency. By analyzing sensor data, highly refined measurements of wind speeds are possible. In wintry conditions, turbines can detect when they are icing up and speed up or change pitch to knock off the ice. In the past, when it was extremely windy, all turbines in a farm were turned off to prevent damage from rotating too fast. Now First Wind can identify the specific turbines that need to be shut down. Based on certain alerts, decisions often need to be made within a few seconds.
Upgrades on 123 turbines on two wind farms have improved energy output by 3%, or about 120 megawatt hours per turbine per year. That improvement translates to $1.2 million in additional revenue a year from these two farms.
Sources: Compiled from www.FirstWind.com (2014) and U.S. Department of Energy (2015).
Data and text mining are different from DBMS and data analytics. As you have read earlier in this chapter, a DBMS supports queries to extract data or get answers from huge databases. But in order to perform queries in a DBMS, you must first know the question you want answered. You have also read that data analytics describes the entire function of applying technologies, algorithms, human expertise, and judgment. Data and text mining are specific analytic techniques that allow users to discover knowledge they did not know existed in their databases.
Data mining software enables users to analyze data from various dimensions or angles, categorize them, and find correlations or patterns among fields in the data warehouse. Up to 75% of an organization's data are nonstructured: word-processing documents, social media, text messages, audio, video, images and diagrams, faxes and memos, call center or claims notes, and so on.
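One of the simplest patterns data mining looks for is a correlation between two numeric fields. The sketch below computes the Pearson correlation coefficient; the ad-spend and sales figures are hypothetical, and real mining tools test thousands of such field pairs automatically.

```python
from math import sqrt

def pearson(xs, ys):
    """Correlation coefficient between two equal-length numeric fields.

    Returns a value between -1 and +1; assumes neither field is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical warehouse fields: weekly ad spend vs. weekly sales.
ad_spend = [10, 20, 30, 40]
sales    = [110, 135, 170, 190]
r = pearson(ad_spend, sales)   # close to +1: a strong positive pattern
```

A coefficient near +1 or -1 signals a relationship worth investigating; human judgment is still needed to decide whether the correlation reflects a causal, actionable pattern.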
IT at Work 3.4 describes one example of how the U.S. government is using data mining software to continuously improve its detection and deterrence systems.
Text mining is a broad category that involves interpreting words and concepts in context. Any customer becomes a brand advocate or adversary by freely expressing opinions and attitudes that reach millions of other current or prospective customers on social media. Text mining helps companies tap into the explosion of customer opinions expressed online. Social commentary and social media are being mined for sentiment analysis or to understand consumer intent. Innovative companies know they could be more successful in meeting their customers’ needs, if they just understood them better. Tools and techniques for analyzing text, documents, and other nonstructured content are available from several vendors.
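The core of sentiment analysis can be shown with a toy lexicon-based scorer. Commercial tools use far richer language models that handle negation, sarcasm, and context; the word lists here are purely illustrative.

```python
def sentiment_score(text, positive, negative):
    """Naive lexicon-based sentiment: +1 per positive word, -1 per negative.

    A minimal sketch of scoring free-text customer opinions; real sentiment
    analysis tools account for grammar and context, not just word lookup.
    """
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

# Illustrative word lists only; production lexicons contain thousands of terms.
positive = {"love", "great", "fast"}
negative = {"hate", "slow", "broken"}
sentiment_score("I love this phone but shipping was slow", positive, negative)  # -> 0
```

Even this crude score, aggregated over millions of social posts, lets a company track whether opinion about a brand is trending up or down.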
Combining data and text mining can create even greater value. Burns (2016) pointed out that mining text or nonstructured data enables organizations to forecast the future instead of merely reporting the past. He also noted that forecasting methods using existing structured data and nonstructured text from both internal and external sources provide the best view of what lies ahead.
Enterprises invest in data mining tools to add business value. Business value falls into three categories, as shown in Figure 3.16.
Digital Reasoning, a large player in the field of big data analytics, has upgraded its software, which is currently contracted by the Department of Defense and Homeland Security. Synthesys 4, the brand-new software, allows the agencies to monitor threats in the homeland and gather data about potential attacks. Ironically, one of the main tactics employed with this software is to track and deter potential threats from employees or contractors who have access to it. Vice President of Federal Programs Eric Hansen says that the software excels at monitoring behavioral patterns, language, and data to act and respond to a potential threat the way a human detective would.
While Digital Reasoning also contracts with other organizations like Goldman Sachs, the U.S. government is probably its most interesting and important client. Using automated software to analyze data is much more effective at hindering attacks and much quicker at analyzing large amounts of data about potential threats domestically and abroad. For instance, the software knows exactly what to look for without being bogged down and distracted by superfluous data. As available data and analytical capabilities increase, the U.S. government is continuously aiming to improve its detection and deterrence systems using software like Synthesys 4 (Bing, 2016).
Sources: Compiled from Bing (2016) and syntheses.net (2017).
Here are some brief cases illustrating the types of business value created by data and text mining.
With text analytics, information is extracted from large quantities of many types of textual data. The basic steps involved in text analytics include the following:
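The earliest of those steps, tokenizing text, removing common stopwords, and counting term frequencies, can be sketched as follows. The stopword list is a deliberately tiny illustration; real text analytics pipelines use much larger lists plus stemming and entity extraction.

```python
import re
from collections import Counter

# Illustrative stopword list; production systems use hundreds of entries.
STOPWORDS = {"the", "a", "is", "to", "of", "and", "in"}

def extract_terms(document):
    """Tokenize a document, drop stopwords, and count term frequencies.

    A minimal version of the first text-analytics steps: the resulting
    counts feed later steps such as categorization and trend detection.
    """
    tokens = re.findall(r"[a-z']+", document.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

extract_terms("The claim notes refer to the damaged shipment and the claim date")
# Counter with 'claim' appearing twice and the stopwords removed
```

Once every document is reduced to weighted terms like these, the collection can be searched, clustered, and mined just like structured data.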
Analytics applications cover business intelligence functions sold as a standalone application for decision support or embedded in an integrated solution. The introduction of intuitive decision support tools, dashboards, and data visualization (discussed in detail in Chapter 11) have added some interesting interactive components to big data analytics to bring the data to life and enable nonexperts to use it.
Organizations invest in analytics, BI, and data/text mining applications based on new features and capabilities beyond those offered by their legacy systems. Analytics vendors offer everything from simple-to-use reporting tools to highly sophisticated software for tackling the most complex data analysis problems. A list of the top five analytics and BI application vendors is shown in Table 3.4.
TABLE 3.4 Top Analytics Vendors
Rank | Vendor | Focus | Products |
1 | SAP | Markets lines of analytics products that cover BI and reporting, predictive analysis, performance management, and governance, risk, and compliance applications | SAP Business Objects Predictive Analytics SAP Business Objects BI SAP Business Objects Planning and Consolidation |
2 | SAS | Offers products ranging from simple desktop solutions to high-performance distributed processing solutions | SAS Analytics Pro SAS Enterprise Miner SAS Visual Analytics SAS Customer Intelligence 360 |
3 | IBM | Allows users to quickly discover patterns and meanings in data with guided data discovery, automated predictive analytics, one-click analysis, self-service dashboards, and a natural language dialogue | Watson Analytics |
4 | Oracle | Offers a complete solution for connecting and collaborating with analytics in the cloud. Products allow users to aggregate, experiment, manage, and analyze/act | Oracle Data Integrator Oracle Big Data Cloud Service Oracle R Advanced Analytics for Hadoop BI Cloud Service Oracle Stream Explorer |
5 | Microsoft | Provides a broad range of products, from standalone solutions to integrated tools, that deliver data preparation, data discovery, and interactive dashboard capabilities in a single tool | Excel HDInsight Machine Learning Stream Analytics Power BI Embedded |
Continuing developments in data analytics and business intelligence (BI) make it increasingly necessary for organizations to understand the differences between these terms and the different ways in which each adds value to an organization. The field of BI started in the late 1980s and has been a key to competitive advantage across industries and in enterprises of all sizes. Unlike data analytics, which has predictive capabilities, BI is a comprehensive term for the analytics and reporting tools traditionally used to determine trends in historical data.
Business intelligence (BI) is a set of tools and techniques for acquiring and transforming raw data into meaningful and useful information for business analysis purposes in the forms of reports, dashboards, or interactive visualizations.
The key distinction between data analytics and BI is that analytics uses algorithms to statistically determine the relationships between data whereas BI presents data insights established by data analytics in reports, easy-to-use dashboards, and interactive visualizations. BI can also make it easier for users to ask data-related questions and obtain results that are presented in a way that they can easily understand.
What started as a tool to support sales, marketing, and customer service departments has evolved into an enterprise-wide strategic platform. While BI software is used in the operational management of divisions and business processes, it is also used to support strategic corporate decision-making. The dramatic change that has taken effect over the last few years is the growth in demand for operational intelligence across multiple systems and businesses, increasing the number of people who need access to increasing amounts of data. Complex and competitive business conditions do not leave much slack for mistakes.
Unfortunately, some companies are not able to use their data efficiently, creating a higher cost to gather information than the benefits it provides. Luckily, BI software brings decision-making information to businesses in as little as two clicks. Small businesses have a shared interest with large corporations to enlist BI to help with decision-making, but they are usually unequipped to build data centers and use funds to hire analysts and IT consultants. However, small business BI software is rapidly growing in the analytics field, and it is increasingly cheaper to implement it as a decision-making tool. Small businesses do not always have workers specialized in certain areas, but BI software makes it easy for all employees to analyze the data and make decisions (King, 2016).
BI provides data at the moment of value to decision-makers, enabling them to extract crucial facts from enterprise data in real time or near real time. A BI solution with a well-designed dashboard, for example, provides retailers with better visibility into inventory to make better decisions about what to order, how much, and when, in order to prevent stock-outs or minimize inventory that sits on warehouse shelves.
Companies use BI solutions to determine what questions to ask and find answers to them. BI tools integrate and consolidate data from various internal and external sources and then process them into information to make smart decisions. BI answers questions such as these: Which products have the highest repeat sales rate in the last six months? Do customer likes on Facebook relate to product purchase? How does the sales trend break down by product group over the last five years? What do daily sales look like in each of my sales regions?
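The first of those questions, repeat sales rate by product, can be expressed in a few lines of Python over hypothetical (customer, product) transaction records. A BI tool would answer the same question with SQL behind a dashboard; the sketch only shows the computation the tool performs.

```python
from collections import defaultdict

def repeat_sales_rate(transactions):
    """Share of a product's customers who bought that product more than once.

    `transactions` is a hypothetical list of (customer, product) records,
    standing in for the consolidated data a BI platform would query.
    """
    buys = defaultdict(lambda: defaultdict(int))   # product -> customer -> count
    for customer, product in transactions:
        buys[product][customer] += 1
    return {
        product: sum(1 for n in counts.values() if n > 1) / len(counts)
        for product, counts in buys.items()
    }

sales = [("ann", "tea"), ("ann", "tea"), ("bob", "tea"), ("bob", "mug")]
repeat_sales_rate(sales)   # -> {'tea': 0.5, 'mug': 0.0}
```

The value BI adds on top of this computation is presentation: the same figures surface in a dashboard where a manager can filter by period or region without writing any code.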
According to The Data Warehousing Institute, BI “unites data, technology, analytics, and human knowledge to optimize business decisions and ultimately drive an enterprise’s success. BI programs usually combine an enterprise data warehouse and a BI platform or tool set to transform data into usable, actionable business information” (The Data Warehousing Institute, 2014). For many years, managers have relied on business analytics to make better-informed decisions. Multiple surveys and studies agree on BI’s growing importance in analyzing past performance and identifying opportunities to improve future performance.
Companies cannot analyze all of their data—and much of them would not add value. Therefore, an unending challenge is how to determine which data to use for BI from what seems like unlimited options (Oliphant, 2016). One purpose of a BI strategy is to provide a framework for selecting the most relevant data without limiting options to integrate new data sources. Information overload is a major problem for executives and for employees. Another common challenge is data quality, particularly with regard to online information, because the source and accuracy might not be verifiable.
Reports and dashboards are delivery tools, but they may not be delivering business intelligence. To get the greatest value out of BI, the CIO needs to work with the CFO and other business leaders to create a BI governance program whose mission is to achieve the following (Ladley, 2016):
After completing these activities, BI analysts can identify the data to use in BI and the source systems. This is a business-driven development approach that starts with a business strategy and works backward to identify the data sources and the data that need to be acquired and analyzed.
Businesses want KPIs that can be utilized by both departmental users and management. In addition, users want real-time access to these data so that they can monitor processes with the smallest possible latency and take corrective action whenever KPIs deviate from their target values. To link strategic and operational perspectives, users must be able to drill down from highly consolidated or summarized figures into the detailed numbers from which they were derived to perform in-depth analyses.
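The drill-down idea, one consolidated figure per region backed by retrievable detail rows, can be sketched as follows. The sales rows, field names, and regions are hypothetical; BI tools wire the same two operations to a click on a dashboard figure.

```python
from collections import defaultdict

def summarize(rows, key):
    """Roll detailed rows up to one consolidated amount per `key` value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

def drill_down(rows, key, value):
    """Return the detailed rows behind one summarized figure."""
    return [row for row in rows if row[key] == value]

# Hypothetical detail records behind a regional sales KPI.
rows = [
    {"region": "east", "store": "e1", "amount": 120.0},
    {"region": "east", "store": "e2", "amount": 80.0},
    {"region": "west", "store": "w1", "amount": 95.0},
]
summarize(rows, "region")            # -> {'east': 200.0, 'west': 95.0}
drill_down(rows, "region", "east")   # the two east-region detail rows
```

Keeping the detail rows addressable is what lets a user move from a deviating KPI straight to the numbers from which it was derived.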
BI architecture is undergoing technological advances in response to big data and the performance demands of end-users (Wise, 2016). BI vendors face the challenges of social, sensor, and other newer data types that must be managed and analyzed. One technological advance that can help handle big data is BI in the cloud, which can be hosted on a public or private cloud; Figure 3.17 lists the key factors contributing to the increased use of BI. Although cloud services come with more upkeep, optimizing the service and customizing it for one's company brings undeniable benefits in data security. With a public cloud, a service provider hosts the data and/or software, which are accessed via an Internet connection. With a private cloud, the company hosts its own data and software but uses cloud-based technologies.
For cloud-based BI, a popular option offered by a growing number of BI tool vendors is software as a service (SaaS). MicroStrategy offers MicroStrategy Cloud, which provides fast deployment with reduced project risks and costs. This cloud approach appeals to small and midsized companies that have limited IT staff and want to carefully control costs. The potential downsides include slower response times, security risks, and backup risks.
CarMax, Inc. is the nation’s largest retailer of used cars and for a decade has remained one of FORTUNE Magazine’s “100 Best Companies to Work For.” CarMax was the fastest retailer in U.S. history to reach $1 billion in revenues. In 2016 the company had over $15 billion in net sales and operating revenues, representing a 6.2% increase over the prior year’s results. The company grew rapidly because of its compelling customer offer—no-haggle prices and quality guarantees backed by a 125-point inspection that became an industry benchmark—and auto financing. As of November 30, 2016, CarMax operated in 169 locations across 39 U.S. states and had more than 22,000 full- and part-time employees.
CarMax continues to enhance and refine its information systems, which it believes to be a core competitive advantage. CarMax’s IT includes the following:
Throughout CarMax, analytics are used as a strategic asset and insights gained from analytics are available to everyone who needs them.
All organizations create and retain business records. A record is documentation of a business event, action, decision, or transaction. Examples are contracts, research and development, accounting source documents, memos, customer/client communications, hiring and promotion decisions, meeting minutes, social posts, texts, e-mails, website content, database records, and paper and electronic files. Business documents such as spreadsheets, e-mail messages, and word-processing documents are a type of record. Most records are kept in electronic format and maintained throughout their life cycle—from creation to final archiving or destruction by an electronic records management system (ERMS).
Electronic records management system (ERMS) consists of hardware and software that manage and archive electronic documents and image (scan) paper documents, then index and store them according to company policy.
One application of an ERMS would be in a company that is required by law to retain financial documents for at least seven years, product designs for many decades, and e-mail messages about marketing promotions for a year. The major ERM tools are workflow software, authoring tools, scanners, and databases. ERM systems have query and search capabilities so documents can be identified and accessed like data in a database. These systems range from those designed to support a small workgroup to full-featured, Web-enabled enterprisewide systems.
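The query-and-search capability can be illustrated with a toy inverted index. A production ERMS adds metadata, retention policies, and access controls on top of the same idea; the record IDs and text below are made up.

```python
from collections import defaultdict

class RecordIndex:
    """Toy inverted index: store records, then look them up by keyword,
    the way an ERMS lets documents be identified like data in a database."""

    def __init__(self):
        self.docs = {}                  # record id -> full text
        self.index = defaultdict(set)   # word -> record ids containing it

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for word in text.lower().split():
            self.index[word].add(doc_id)

    def search(self, word):
        # Return matching record ids in a stable order.
        return sorted(self.index.get(word.lower(), set()))

erms = RecordIndex()
erms.add("inv-001", "Invoice for marketing promotion Q3")
erms.add("memo-07", "Memo about Q3 marketing budget")
erms.search("marketing")   # -> ['inv-001', 'memo-07']
```

Because every record is indexed at the moment it is captured, a compliance officer can later retrieve all documents matching a term in seconds, which is exactly what audits and legal discovery demand.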
Companies need to be prepared to respond to an audit, federal investigation, lawsuit, or any other legal action against them. Types of lawsuits against companies include patent violations, product safety negligence, theft of intellectual property, breach of contract, wrongful termination, harassment, discrimination, and many more.
Because senior management must ensure that their companies comply with legal and regulatory duties, managing electronic records (e-records) is a strategic issue for organizations in both the public and private sectors. The success of ERM depends greatly on a partnership of many key players, namely, senior management, users, records managers, archivists, administrators, and most importantly, IT personnel. Properly managed, records are strategic assets. Improperly managed or destroyed, they become liabilities.
Effective ERM systems capture all business data and documents at their first touchpoint—data centers, laptops, the mailroom, at customer sites, or remote offices. Records enter the enterprise in multiple ways—from online forms, bar codes, sensors, websites, social sites, copiers, e-mails, and more. In addition to capturing the entire document as a whole, important data from within a document can be captured and stored in a central, searchable repository. In this way, the data are accessible to support informed and timely business decisions.
In recent years, organizations such as the Association for Information and Image Management (AIIM), National Archives and Records Administration (NARA), and ARMA International (formerly the Association of Records Managers and Administrators) have created and published industry standards for document and records management. Numerous best practices articles, and links to valuable sources of information about document and records management, are available on their websites. The IT Toolbox describes ARMA’s eight generally accepted recordkeeping principles framework.
Departments or companies whose employees spend most of their day filing or retrieving documents or warehousing paper records can reduce costs significantly with ERM. These systems minimize the inefficiencies and frustration associated with managing paper documents and workflows. However, they do not create a paperless office as had been predicted.
An ERM can help a business to become more efficient and productive by the following:
When workflows are digital, productivity increases, costs decrease, compliance obligations are easier to verify, and green computing becomes possible. Green computing is an initiative to conserve our valuable natural resources by reducing the effects of our computer usage on the environment. You can read about green computing and the related topics of reducing an organization’s carbon footprint, sustainability, and ethical and social responsibilities in Chapter 14.
Businesses also rely on their ERM system for disaster recovery and business continuity, security, knowledge sharing and collaboration, and remote and controlled access to documents. Because ERM systems have multilayered access capabilities, employees can access and change only the documents they are authorized to handle.
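The multilayered access idea can be illustrated with a minimal permission check (the roles, document IDs, and permission structure below are hypothetical, not a specific ERM product's API): each document carries separate read and edit permissions, so an employee may be able to view a record without being authorized to change it.

```python
# Hypothetical permission table: each document lists which roles may
# read it and which may edit it. An auditor can read an invoice but
# cannot change it; HR documents are invisible to other departments.
PERMISSIONS = {
    "invoice-2023-001": {"read": {"accounting", "auditor"}, "edit": {"accounting"}},
    "hr-review-042":    {"read": {"hr"},                    "edit": {"hr"}},
}

def can_access(role, doc_id, action):
    """Return True if the role may perform 'read' or 'edit' on the document."""
    doc = PERMISSIONS.get(doc_id)
    return doc is not None and role in doc.get(action, set())

print(can_access("auditor", "invoice-2023-001", "read"))   # True
print(can_access("auditor", "invoice-2023-001", "edit"))   # False
```

Separating read and edit rights per document is what lets an ERM grant broad visibility for decision-making while restricting changes to the employees authorized to handle each record.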
When companies select an ERM to meet compliance requirements, they should ask the following questions:
IT at Work 3.5 describes how several companies currently use ERM. Simply creating backups of records is not sufficient because the content would not be organized and indexed so that records could be retrieved accurately and easily. The requirement to manage records—regardless of whether they are physical or digital—is not new.
Here are some examples of how companies use ERM in the health-care, finance, and education sectors:
Understand how businesses benefit from managing data, data architecture, data analytics, and business intelligence to sustain a competitive advantage in the marketplace.
active data warehouse (ADW)
big data
big data analytics
business analytics
business intelligence (BI)
business record
business-driven development approach
centralized database
change data capture (CDC)
data analytics
data entity
data management
data marts
data mining
data warehouse
database
database management system (DBMS)
decision model
declarative language
dirty data
distributed database
electronic records management system (ERMS)
extract, transform and load (ETL)
enterprise data warehouses (EDWs)
eventual consistency
fault tolerance
Hadoop
information overload
immediate consistency
latency
MapReduce
master data management (MDM)
NoSQL
online transaction processing (OLTP) systems
online analytical processing (OLAP) systems
petabyte
query
relational database
relational database management systems (RDBMSs)
sentiment analysis
scalability
structured query language (SQL)
text mining
Sam would prefer a system that lets employees find and work with business documents without leaving their desks. He is most concerned about the human resources and accounting departments. These personnel are traditionally heavy users of paper files and would benefit greatly from a modern document management system. At the same time, however, Sam is risk averse. He would rather invest in solutions that reduce the risk of higher costs in the future. He recognizes that the U.S. PATRIOT Act's requirement that organizations provide immediate government access to records applies to SSC. He has read that manufacturing and government organizations rely on efficient document management to meet these broader regulatory imperatives. Finally, Sam wants to implement a disaster recovery system.
Prepare a report that provides Sam with the data he needs to evaluate the company’s costly paper-intensive approach to managing documents. You will need to conduct research to provide data to prepare this report. Your report should include the following information:
With 62 million daily customers and annual revenue of $27 billion, McDonald's has a virtually unrivaled amount of data at its disposal to analyze. In order to dominate the market, retain its loyal customers, and attract new customers who are skeptical of McDonald's practices and quality, it turns to its data, becoming an "information-centric organization." What does it mean to be information centric? Instead of following a fixed, product-driven process of production and service, McDonald's uses customer data to dictate its next move as a customer-driven corporation. When McDonald's was founded in 1940, the McDonald brothers built a product-driven business centered on fast service and tasty food. While that method was successful before other restaurants entered the fast food market, growth was stunted by a lack of innovation and change. So the organization began to collect customer data as a means to monitor successful products, customer demands, and the results of marketing campaigns.
This venture led to McDonald’s becoming the premier fast food chain across the United States in the 1980s. Soon after becoming a customer-driven corporation, McDonald’s introduced the Happy Meal so families with small children could reduce costs and waste at dinner time, released the Egg McMuffin as the most successful breakfast item of all time, equipped professionals and teenagers with free Wi-Fi to expand its customer segmentation, and provided nutrition details to become the most transparent fast food chain at the time. All of these improvements derived from McDonald’s using its immense amount of data to set its chain apart from the rest.
In 2008, to further improve its ability to leverage big data, McDonald's made the transition from average-based metrics to trend analytics. The problem with average-based metrics is that they make it hard to compare regions and stores: a store whose sales and productivity are growing can have the same average metrics as a store that is declining. Trend analytics allowed McDonald's to combine multiple datasets from multiple data sources to visualize and understand cause-and-effect relationships in individual stores and regions. The correlations it found enabled its analysts to prescribe solutions to problems in sales, production, turnover, and supply chain management to reduce costs and save time. The variables it studies allow McDonald's to create a standardized experience across the world, while analyzing local data in each store produces minor variations around the organization. For example, most McDonald's locations look the same, but each restaurant is slightly different and optimized for the local market.
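The limitation of average-based metrics can be shown with a toy calculation (the monthly sales figures below are invented for illustration, not actual McDonald's data): two stores share exactly the same average, yet a simple least-squares slope over time reveals that one is growing while the other is declining.

```python
# Two hypothetical stores with identical average sales but opposite trends.
store_a = [80, 90, 100, 110, 120]   # growing month over month
store_b = [120, 110, 100, 90, 80]   # declining month over month

def mean(xs):
    return sum(xs) / len(xs)

def trend_slope(xs):
    """Least-squares slope of sales over time (sales units per month)."""
    n = len(xs)
    t_bar = (n - 1) / 2                      # mean of time indices 0..n-1
    x_bar = mean(xs)
    num = sum((t - t_bar) * (x - x_bar) for t, x in enumerate(xs))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

print(mean(store_a), mean(store_b))                 # 100.0 100.0 -- averages identical
print(trend_slope(store_a), trend_slope(store_b))   # 10.0 -10.0 -- trends opposite
```

Averaging collapses both stores to the same number, while the trend separates them cleanly, which is the point of McDonald's shift to trend analytics.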
A great example of McDonald's big data analysis in action is its updated drive-thru system. All fast food chains have bottlenecks in their drive-thru lanes, but McDonald's average customer wait time is about 3 minutes, close to the industry's longest wait time of 214 seconds. One of the most prominent issues in its drive-thru was that customers going through the line at dinner, ordering large meals and scanning the menu for an extended period, created a negative experience for every car in line behind them. In response, McDonald's optimized the drive-thru across three components: design, information, and people. Design focused on physical improvements to the drive-thru, including better speaker quality and higher-resolution digital menu boards. Information centered on what appeared on the menu board; to decrease order times, McDonald's removed about 40% of the items from the drive-thru menu board. The third component, people, aimed to reduce negative experiences for those in line by adding a second drive-thru lane with a designated order taker for each lane, a third drive-thru window, and two production lines.
Another example showing McDonald’s commitment to being a customer-driven corporation is its introduction of all day breakfast, which was the highest priority for customers across the United States. Being the corporation with by far the largest share of the fast food market, McDonald’s will continue to use its growing data sets to provide the best experience and food to its customers (van Rijmenam, 2016).
Sources: Compiled from Van Rijmenam (2016) and McDonald’s (2017).
Verizon leverages Teradata’s data analytics platform to shift its operations from qualitative decision-making to evidence-based and data-driven decision-making to improve the customer experience. Visit www.Teradata.com and search for the video “Verizon: Using Advanced Analytics to Deliver on Their Digital Promise to Help Customers Innovate Their Lifestyle.”
Research the concept of Big Data. Find at least one company that maintains a “Big Data” database.
Provide at least two hyperlinked references to back up your findings (one for the organization you chose to discuss and one for the concept of big data in general). Post your findings and respond to at least two comments posted by your fellow students.
The framework of Generally Accepted Recordkeeping Principles is a useful tool for managing business records to ensure that they support an enterprise's current and future regulatory, legal, risk mitigation, environmental, and operational requirements.
The framework consists of eight principles, or best practices, which also support data governance. These principles were created by ARMA International together with legal and IT professionals.