CHAPTER 3
Data Management, Data Analytics, and Business Intelligence

Introduction

As discussed in Chapter 2, collecting and maintaining trusted data is a critical aspect of any business. Knowing how and where to find data, store it efficiently, analyze it in new ways to increase the organization’s competitive advantage, and enable the right people to access it at the right time are all fundamental components of managing the ever-increasing amounts of corporate data. Indeed, data analytics is the primary differentiator when doing business in the 21st century. Transactional, social, mobile, cloud, Web, and sensor data offer enormous potential. But without tools to analyze these data types and volumes, there would not be much difference between business in the 20th century and business today—except for mobile access. High-quality data and human expertise are essential to the value of analytics.

Human expertise is necessary because analytics alone cannot explain the reasons for trends or relationships; know what action to take; or provide sufficient context to determine what the numbers represent and how to interpret them.

Database, data warehouse, data analytics, and business intelligence (BI) technologies interact to create a new biz-tech ecosystem. Data analytics and BI discover insights or relationships of interest that otherwise might not have been recognized. They make it possible for managers to make decisions and act with clarity, speed, and confidence. Data analytics is not just about managing more or varied data. Rather, it is about asking new questions, formulating new hypotheses, exploration and discovery, and making data-driven decisions. Ultimately, a big part of the data analysis effort lies in applying new analytics techniques.

Mining data or text taken from day-to-day business operations reveals valuable information, such as customers’ desires, products that are most important, or processes that can be made more efficient. These insights expand the ability to take advantage of opportunities, minimize risks, and control costs.

While you might think that physical pieces of paper are a relic of the past, in most offices the opposite is true. Aberdeen Group’s survey of 176 organizations worldwide found that the volume of physical documents is growing by up to 30% per year. Document management technology archives digital and physical data to meet business needs, as well as regulatory and legal requirements (Eisenhauer, 2015).

3.1 Data Management and Database Technologies

Due to the incredible volume of data that the typical organization creates, effective data management is vital to prevent storage costs from spiraling out of control, to keep data growth under control, and to support greater performance. Data management oversees the end-to-end life cycle of data from creation and initial storage to the time when it becomes obsolete and is deleted.

The objectives of data management include the following:

  1. Mitigating the risks and costs of complying with regulations.
  2. Ensuring legal requirements are met.
  3. Safeguarding data security.
  4. Maintaining the accuracy and availability of data.
  5. Certifying consistency in data that come from or go to multiple locations.
  6. Ensuring that data conform to organizational best practices for access, storage, backup, and disposal.

Typically, newer data, and data that is accessed more frequently, is stored on faster, but more expensive storage media while less critical data is stored on cheaper, slower media.

The main benefits of data management include greater compliance, higher security, less legal liability, improved sales and marketing strategies, better product classification, and improved data governance to reduce risk. The following data management technologies keep users informed and support the various business demands:

  • Databases store data generated by business apps, sensors, operations, and transaction-processing systems (TPS). Data in some databases can be extremely volatile. Medium and large enterprises typically have many databases of various types—centralized and distributed.
  • Data warehouses integrate data from multiple databases and data silos across the organization, and organize them for complex analysis, knowledge discovery, and to support decision-making. For example, data are extracted from a database, processed to standardize their format, and then loaded into data warehouses at specific times, such as weekly. As such, data in data warehouses are nonvolatile—and are ready for analysis.
  • Data marts are small-scale data warehouses that support a single function or one department. Enterprises that cannot afford to invest in data warehousing may start with one or more data marts.
  • Business intelligence (BI) tools and techniques process data and perform statistical analysis for insight and discovery—that is, to discover meaningful relationships in the data, keep users informed in real time, detect trends, and identify opportunities and risks.

Each of these database management technologies will be discussed in greater detail later in this chapter.

Database Management Systems and SQL

Data-processing techniques, processing power, and enterprise performance management capabilities have undergone revolutionary advances in recent years for reasons you are already familiar with—big data, mobility, and cloud computing. The last decade, however, has seen the emergence of new approaches, first in data warehousing and, more recently, for transaction processing. Given the huge number of transactions that occur daily in an organization, the data in databases are constantly in use or being updated. The volatility of databases makes it impossible to use them for complex decision-making and problem-solving tasks. For this reason, data are extracted from the database, transformed (processed to standardize the data), and then loaded into a data warehouse.

Database management systems (DBMSs) integrate with data collection systems such as TPS and business applications; store the data in an organized way; and provide facilities for accessing and managing that data. Factors to consider when evaluating the performance of a database management system are listed in Tech Note 3.1. Over the past 25 years, the relational database has been the standard database model adopted by most enterprises. Relational databases store data in tables consisting of columns and rows, similar to the format of a spreadsheet, as shown in Figure 3.3.


FIGURE 3.3 Illustration of structured data format. Numeric and alphanumeric data are arranged into rows and predefined columns similar to those in an Excel spreadsheet.

Relational database management systems (RDBMSs) provide access to data using a declarative language—structured query language (SQL). Declarative languages simplify data access by requiring that users specify only what data they want to access, without defining how access will be achieved. The format of a basic SQL statement is

  • SELECT column_name(s)
  • FROM table_name
  • WHERE condition

An instance of SQL is shown in Figure 3.4.


FIGURE 3.4 An instance of SQL to access employee information based on date of hire.
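
The following is a minimal sketch of how such a query might be run from an application program. It uses Python's built-in sqlite3 module, and the Employee table, its columns, and the sample rows are hypothetical stand-ins for illustration, not the actual data shown in Figure 3.4.

```python
import sqlite3

# Create a throwaway in-memory database for illustration
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical Employee table with a date-of-hire column
cur.execute("CREATE TABLE Employee (EmpID INTEGER, LastName TEXT, HireDate TEXT)")
cur.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Lopez", "2014-03-15"), (2, "Chen", "2016-07-01"), (3, "Patel", "2012-11-20")],
)

# SELECT column_name(s) FROM table_name WHERE condition:
# retrieve employees hired on or after January 1, 2014
cur.execute(
    "SELECT EmpID, LastName, HireDate FROM Employee WHERE HireDate >= '2014-01-01'"
)
for row in cur.fetchall():
    print(row)  # e.g., (1, 'Lopez', '2014-03-15')

conn.close()
```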

DBMS Functions

An accurate and consistent view of data throughout the enterprise is needed so one can make informed, actionable decisions that support the business strategy. Functions performed by a DBMS to help create such a view are shown in Figure 3.5.


FIGURE 3.5 DBMS functions.


Online Transaction Processing and Online Analytics Processing

When most business transactions occur—for instance, an item is sold or returned, an order is sent or cancelled, a payment or deposit is made—changes are made immediately to the database. These online changes are additions, updates, or deletions. DBMSs record and process transactions in the database, and support queries and reporting. Given their functions, DBMSs are referred to as online transaction processing (OLTP) systems. OLTP is a database design that breaks down complex information into simpler data tables to strike a balance between transaction-processing efficiency and query efficiency. OLTP databases process millions of transactions per second. However, databases cannot be optimized for data mining, complex online analytics processing (OLAP) systems, and decision support. These limitations led to the introduction of data warehouse technology. Data warehouses and data marts are optimized for OLAP, data mining, BI, and decision support. OLAP is a term used to describe the analysis of complex data from the data warehouse. In summary, databases are optimized for extremely fast transaction processing and query processing. Data warehouses are optimized for analysis.

DBMS and Data Warehousing Vendors Respond to Latest Data Demands

One of the major drivers of change in the data management market is the increased amount of data to be managed. Enterprises need powerful DBMSs and data warehousing solutions, analytics, and reporting. The four vendors that dominate this market—Oracle, IBM, Microsoft, and Teradata—continue to respond to evolving data management needs with more intelligent and advanced software and hardware. Advanced hardware technology enables scaling to much higher data volumes and workloads than previously possible, or it can handle specific workloads. Older general-purpose relational DBMSs lack the scalability or flexibility for specialized or very large workloads, but are very good at what they do.

Trend Toward NoSQL Systems

RDBMSs are still the dominant database engines, but the trend toward NoSQL (short for “not only SQL”) systems is clear. NoSQL systems increased in popularity by 96% from 2014 to 2016. Although NoSQL systems have existed for as long as relational DBMSs, the term itself was not introduced until 2009. That was when many new systems were developed to cope with the evolving requirements for DBMSs—namely, handling big data, scalability, and fault tolerance for large Web applications. Scalability means the system can increase in size to handle data growth or the load of an increasing number of concurrent users. To put it differently, scalable systems efficiently meet the demands of high-performance computing. Fault tolerance means that no single failure results in any loss of service.

NoSQL systems are such a heterogeneous group of database systems that attempts to classify them are not very helpful. However, their general advantages are the following:

  • higher performance
  • easy distribution of data on different nodes, which enables scalability and fault tolerance
  • greater flexibility
  • simpler administration

Starting in 2010 and continuing through 2016, Microsoft has been working on the first rewrite of SQL Server’s query execution since Version 7 was released in 1998. The goal is to offer NoSQL-like speeds without sacrificing the capabilities of a relational database.

With most NoSQL offerings, the bulk of the cost does not lie in acquiring the database, but rather in implementing it. Data need to be selected and migrated (moved) to the new database. Microsoft hopes to reduce these costs by offering migration solutions.

DBMS Vendor Rankings

The top five enterprise database systems of 2016 are Oracle’s 12c Database, Microsoft SQL Server, IBM DB2, SAP Sybase ASE, and PostgreSQL:

  1. Oracle 12c Database consolidates and manages databases as cloud services via Oracle’s multitenant architecture and in-memory data processing capabilities, and can be rapidly provisioned.
  2. Microsoft SQL Server’s ease of use, availability, and Windows operating system integration make it an easy choice for firms that choose Microsoft products for their enterprises.
  3. IBM DB2 is widely used in large data centers and runs on Linux, UNIX, Windows, IBM iSeries, and mainframes.
  4. SAP Sybase ASE is a major force after 25 years of success and improvements. It supports partition locking, relaxed query limits, query plan optimization, and dynamic thread assignment.
  5. PostgreSQL is the most advanced open source database, used by online gaming applications and by companies such as Skype, Yahoo!, and MySpace. This database runs on a wide variety of operating systems including Linux, Windows, FreeBSD, and Solaris.

Concept Check 3.1

  1. Data management involves all of the following, EXCEPT:
a. Mitigating risks and costs of complying with regulations
b. Ensuring legal requirements are met
c. Certifying consistency of data
d. Inputting day-to-day sales records

  2. A _________________________ is a collection of data sets or records stored in a systematic way.
a. Database
b. Data center
c. Data store
d. Data mart

  3. The functions of a database management system (DBMS) include:
a. Data filtering and profiling
b. Data integrity and maintenance
c. Data security
d. All of the above

3.2 Centralized and Distributed Database Architectures

Databases can be centralized or distributed, as shown in Figure 3.6. Both types of databases need one or more backups and should be archived on- and offsite in case of a crash or security incident.


FIGURE 3.6 Comparison of (a) centralized and (b) distributed databases.

For decades the main database platform consisted of centralized database files on massive mainframe computers. Benefits of centralized database configurations include the following:

  1. Better control of data quality Data consistency is easier when data are kept in one physical location because data additions, updates, and deletions can be made in a supervised and orderly fashion.
  2. Better IT security Data are accessed via the centralized host computer, where they can be protected more easily from unauthorized access or modification.

A major disadvantage of centralized databases, like all centralized systems, is transmission delay when users are geographically dispersed. More powerful hardware and networks compensate for this disadvantage.

In contrast, distributed databases use client/server architecture to process information requests. The databases are stored on servers that reside in the company’s data centers, a private cloud, or a public cloud (Figure 3.7). Advantages of a distributed database include reliability—if one site crashes, the system keeps running—and speed—it is faster to search part of a database than the whole. However, a problem with the network on which the distributed database relies can cause availability issues, and the appropriate hardware and software can be expensive to purchase.


FIGURE 3.7 Distributed database architecture for headquarters, manufacturing, and sales and marketing.

Garbage In, Garbage Out

Data collection is a highly complex process that can create problems concerning the quality of the data being collected. Therefore, regardless of how the data are collected, they need to be validated so users know they can trust them. Classic expressions that sum up the situation are “garbage in, garbage out” (GIGO) and the potentially riskier “garbage in, gospel out.” In the latter case, poor-quality data are trusted and used as the basis for planning. For example, you have probably encountered data safeguards, such as integrity checks, to help improve data quality when you fill in an online form, such as when the form will not accept an e-mail address or a credit card number that is not formatted correctly.

Table 3.2 lists the characteristics typically associated with dirty or poor-quality data.

TABLE 3.2 Characteristics of Poor-Quality or Dirty Data

Characteristic of Dirty Data Description
Incomplete Missing data
Outdated or invalid Too old to be valid or useful
Incorrect Too many errors
Duplicated or in conflict Too many copies or versions of the same data―and the versions are inconsistent or in conflict with each other
Nonstandardized Data are stored in incompatible formats―and cannot be compared or summarized
Unusable Data are not in context to be understood or interpreted correctly at the time of access

Dirty Data Costs and Consequences

As discussed in Chapter 2, too often managers and information workers are actually constrained by data that cannot be trusted because they are incomplete, out of context, outdated, inaccurate, inaccessible, or so overwhelming that they require weeks to analyze. In such situations, the decision-maker is facing too much uncertainty to make intelligent business decisions.

On average, an organization experiences 40% data growth annually, and 20% of that data is found to be dirty. Each dirty data point, or record, costs $100 if not resolved (RingLead, 2015). The costs of poor-quality data spread throughout a company, affecting systems from shipping and receiving to accounting and customer service. Data errors typically arise from the functions or departments that generate or create the data—and not within the IT department. When all costs are considered, the value of finding and fixing the causes of data errors becomes clear. In a time of decreased budgets, some organizations may not have the resources for such projects and may not even be aware of the problem. Others may be spending most of their time fixing problems, thus leaving them with no time to work on preventing them. However, the benefits of acting preventively against dirty data are substantial: it costs $1 to prevent a dirty data record and $10 to correct it. While the short-run cost of cleaning and preventing dirty data may seem unaffordable to some companies, ignoring the problem is far more expensive in the long run (Kramer, 2015).

Bad data are costing U.S. businesses hundreds of billions of dollars a year and affecting their ability to ride out the tough economic climate. Incorrect and outdated values, missing data, and inconsistent data formats can cause lost customers, sales, and revenue; misallocation of resources; and flawed pricing strategies.

Consider a corporation that follows the cost structure associated with clean/dirty data explained above with 100,000 data points. Over a three-year span, by cleaning the 20% of dirty data during the first year and using prevention methods for the following years, the corporation will save $8,495,000. Purely based on the quality of its data, a corporation with a large amount of data can hypothetically increase its revenue by 70% (RingLead, 2015).

The cost of poor-quality data may be expressed as a formula:

Cost of Poor-Quality Data = Lost Business + Cost to Prevent Errors + Cost to Correct Errors

Examples of these costs include the following:

  • Lost business Business is lost when sales opportunities are missed, orders are returned because wrong items were delivered, or errors frustrate and drive away customers.
  • Time spent preventing errors If data cannot be trusted, then employees need to spend more time and effort trying to verify information in order to avoid mistakes.
  • Time spent correcting errors Database staff need to process corrections to the database. For example, the costs of correcting errors at U-rent Corporation are estimated as follows:
    1. Two database staff members spend 25% of their workday processing and verifying data corrections each day:
      2 people * 25% of 8 hours/day = 4 hours/day correcting errors
    2. Hourly salaries are $50 per hour based on pay rate and benefits:
      $50/hour * 4 hours/day = $200/day correcting errors
    3. 250 workdays per year:
      $200/day * 250 days = $50,000/year to correct errors
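
This arithmetic can be expressed as a short calculation. The sketch below simply reproduces the U-rent figures from the example; the zero values for the other two cost components are placeholders, since only the correction cost is estimated here.

```python
# Annual cost of correcting data errors (U-rent example above)
staff = 2                  # database staff members
share_of_day = 0.25        # portion of the workday spent on corrections
hours_per_day = 8
hourly_rate = 50           # pay rate plus benefits, $/hour
workdays_per_year = 250

hours_correcting = staff * share_of_day * hours_per_day    # 4 hours/day
daily_cost = hours_correcting * hourly_rate                # $200/day
annual_correction_cost = daily_cost * workdays_per_year    # $50,000/year

# Cost of poor-quality data = lost business + cost to prevent errors + cost to correct errors
lost_business = 0          # placeholder: not estimated in the example
prevention_cost = 0        # placeholder: not estimated in the example
cost_of_poor_quality_data = lost_business + prevention_cost + annual_correction_cost

print(f"Correction cost per year: ${annual_correction_cost:,.0f}")
print(f"Cost of poor-quality data: ${cost_of_poor_quality_data:,.0f}")
```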

For a particular company, it is difficult to calculate the full cost of poor-quality data and its long-term effects. Part of the difficulty is the time delay between the mistake and when it is detected. Errors can be very difficult to correct, especially when systems extend across the enterprise. Another concern is that the impacts of errors can be unpredictable, far-reaching, and serious.

Data Ownership and Organizational Politics

Compliance with numerous federal and state regulations relies on rock-solid data and trusted metrics used for regulatory reporting. Data ownership, data quality, and formally managed data are high priorities on the agenda of CFOs and CEOs who are held personally accountable if their company is found to be in violation of regulations.

Despite the need for high-quality data, organizational politics and technical issues make that difficult to achieve. The source of the problem is data ownership—that is, who owns or is responsible for the data. Data ownership problems exist when there are no policies defining responsibility and accountability for managing data. Inconsistent data formats of various departments create an additional set of problems as organizations try to combine individual applications into integrated enterprise systems.

The tendency to delegate data-quality responsibilities to the technical teams who have no control over data quality, as opposed to business users who do have such control, is another common pitfall that stands in the way of accumulating high-quality data.

Those who manage a business or part of a business are tasked with trying to improve business performance and retain customers. Compensation is tied to improving profitability, driving revenue growth, and improving the quality of customer service. These key performance indicators (KPIs) are monitored closely by senior managers who want to find and eliminate defects that harm performance. It is strange then that so few managers take the time to understand how performance is impacted by poor-quality data. Two examples make a strong case for investment in high-quality data.

Retail banking: For retail bank executives, risk management is the number one issue. Disregard for risk contributed to the 2008 financial services meltdown. Despite risk management strategies, many banks still incur huge losses. Part of the problem in many banks is that their ISs enable them to monitor risk only at the product level—mortgages, loans, or credit cards. Product-level risk management ISs monitor a customer’s risk exposure for mortgages, or for loans, or for credit cards, and so forth—but not for a customer for all products. With product-level ISs, a bank cannot see the full risk exposure of a customer. The limitations of these siloed product-level risks have serious implications for business performance because bad-risk customers cannot be identified easily, and customer data in the various ISs may differ. However, banks are beginning to use big data to analyze risk more effectively. Although they are still very limited to credit card, loan, and mortgage risk data, cheaper and faster computing power allows them to keep better and more inclusive records of customer data. Portfolio monitoring offers earlier detection and predictive analytics for potential customers, and more advanced risk models show intricate patterns unseen by the naked eye in large data sets. Also, more fact-based inputs and standardized organizational methods are being implemented to reduce loan and credit officer bias to take risks on undesirable customers.

Marketing: Consider what happens when each product-level risk management IS feeds data to marketing ISs. Marketing may offer bad-risk customers incentives to take out another credit card or loan that they cannot repay. And since the bank cannot identify its best customers either, they may be ignored and enticed away by better deals offered by competitors. This scenario illustrates how data ownership and data-quality management are critical to risk management. Data defects and incomplete data can quickly trigger inaccurate marketing and mounting losses. Banks’ increasing dependence on business modeling requires that risk managers understand and manage model risk better. Although losses often go unreported, the consequences of errors in the model can be extreme. For instance, a large Asia–Pacific bank lost $4 billion when it applied interest-rate models that contained incorrect assumptions and data-entry errors. Risk mitigation will entail rigorous guidelines and processes for developing and validating models, as well as the constant monitoring and improvement of them (Harle et al., 2016).

Manufacturing: Many manufacturers are at the mercy of a powerful customer base—large retailers. Manufacturers want to align their processes with those of large retail customers to keep them happy. This alignment makes it possible for a retailer to order centrally for all stores or to order locally from a specific manufacturer. Supporting both central and local ordering makes it difficult to plan production runs. For example, each manufacturing site has to collect order data from central ordering and local ordering systems to get a complete picture of what to manufacture at each site. Without accurate, up-to-date data, orders may go unfilled, or manufacturers may have excess inventory. One manufacturer who tried to keep its key retailer happy by implementing central and local ordering could not process orders correctly at each manufacturing site. No data ownership and lack of control over how order data flowed throughout business operations had negative impacts. Conflicting and duplicate business processes at each manufacturing site caused data errors, leading to mistakes in manufacturing, packing, and shipments. Customers were very dissatisfied.

These examples demonstrate the consequences of a lack of data ownership and data quality. Understanding the impact mismanaged data can have on business performance highlights the need to make data ownership and data accuracy a high priority.

Data Life Cycle and Data Principles

The data life cycle is a model that illustrates the way data travel through an organization, as shown in Figure 3.8. The data life cycle begins when data are captured and stored in a database; the data are then loaded into a data warehouse for analysis and finally reported to knowledge workers or used in business apps. Supply chain management (SCM), customer relationship management (CRM), and e-commerce are enterprise applications that require up-to-date, readily accessible data to function properly.


FIGURE 3.8 Data life cycle.

Three general data principles relate to the data life cycle perspective and help to guide IT investment decisions:

  1. Principle of diminishing data value The value of data diminishes as they age. This is a simple, yet powerful principle. Most organizations cannot operate at peak performance with blind spots (lack of data availability) of 30 days or longer. Global financial services institutions rely on near real-time data for peak performance.
  2. Principle of 90/90 data use According to the 90/90 data-use principle, a majority of stored data, as high as 90%, is seldom accessed after 90 days (except for auditing purposes). That is, roughly 90% of data lose most of their value after three months.
  3. Principle of data in context The capability to capture, process, format, and distribute data in near real time or faster requires a huge investment in data architecture (Chapter 2) and infrastructure to link remote POS systems to data storage, data analysis systems, and reporting apps. The investment can be justified on the principle that data must be integrated, processed, analyzed, and formatted into “actionable information.”

Master Data and Master Data Management

As data become more complex and their volumes explode, database performance degrades. One solution is the use of master data and master data management (MDM), as introduced in Chapter 2. MDM processes integrate data from various sources or enterprise applications to create a more complete (unified) view of a customer, product, or other entity. Figure 3.9 shows how master data serve as a layer between transactional data in a database and analytical data in a data warehouse. Although vendors may claim that their MDM solution creates “a single version of the truth,” this claim is probably not true; in reality, constructing a completely unified view of all master data is simply not possible.


FIGURE 3.9 An enterprise has transactional, master, and analytical data.


Master Reference File and Data Entities

Realistically, MDM consolidates data from various data sources into a master reference file, which then feeds data back to the applications, thereby creating accurate and consistent data across the enterprise. In IT at Work 3.1, participants in the health-care supply chain essentially developed a master reference file of their key data entities. A data entity is anything real or abstract about which a company wants to collect and store data. Master data entities are the main entities of a company, such as customers, products, suppliers, employees, and assets.

Each department has distinct master data needs. Marketing, for example, is concerned with product pricing, brand, and product packaging, whereas production is concerned with product costs and schedules. A customer master reference file can feed data to all enterprise systems that have a customer relationship component, thereby providing a more unified picture of customers. Similarly, a product master reference file can feed data to all the production systems within the enterprise.

An MDM includes tools for cleaning and auditing the master data elements as well as tools for integrating and synchronizing data to make them more accessible. MDM offers a solution for managers who are frustrated with how fragmented and dispersed their data sources are.
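
A minimal sketch of the consolidation idea follows: customer records from two hypothetical source systems are matched on a shared key and merged into a single master reference record. The field names, the matching key, and the simple "first value wins" rule are assumptions made for illustration only.

```python
# Hypothetical customer records from two source systems (CRM and billing)
crm_records = [
    {"customer_id": "C100", "name": "Acme Corp", "phone": "555-0101"},
    {"customer_id": "C101", "name": "Blue Ridge LLC", "phone": None},
]
billing_records = [
    {"customer_id": "C100", "billing_address": "12 Main St", "phone": "555-0199"},
    {"customer_id": "C101", "billing_address": "77 Oak Ave", "phone": "555-0142"},
]

# Build a master reference file keyed by customer_id; later sources
# fill in missing attributes but do not overwrite values already recorded
master = {}
for source in (crm_records, billing_records):
    for record in source:
        entry = master.setdefault(record["customer_id"], {})
        for field, value in record.items():
            if value is not None and field not in entry:
                entry[field] = value

for customer_id, unified_view in master.items():
    print(customer_id, unified_view)
```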

Concept Check 3.2

  1. The main purpose of master data management (MDM) is to:
a. Consolidate data
b. Collect data
c. Categorize data
d. Distribute data

  2. A database can be:
a. Decentralized and integrated
b. Centralized and distributed
c. Distributed and integrated
d. Controlled and centralized

  3. Poor quality data that cannot be trusted is commonly referred to as:
a. Untrusted data
b. Poor quality data
c. Missing data
d. Dirty data

  4. The ___________ is a model that illustrates the way data travel through an organization.
a. Data model
b. Data chain
c. Data life cycle
d. Data rotation

  5. The quality of an organization's data can be difficult to maintain due to:
a. Strict policies defining responsibility and accountability for managing data
b. Consistent data formats across departments
c. Delegation of data-quality responsibilities to the technical team
d. Delegation of data-quality responsibility to business users

3.3 Data Warehouses

Data warehouses are the primary source of cleansed data for analysis, reporting, and business intelligence (BI). Often the data are summarized in ways that enable quick responses to queries. For instance, query results can reveal changes in customer behavior and drive the decision to redevelop the advertising strategy.

Data warehouses that pull together data from disparate sources and databases across an entire enterprise are called enterprise data warehouses (EDWs).

Data warehouses store data from various source systems and databases across an enterprise in order to run analytical queries against huge datasets collected over long time periods.

The high cost of data warehouses can make them too expensive for a company to implement. Data marts are lower-cost, scaled-down versions of a data warehouse that can be implemented in a much shorter time, for example, in less than 90 days. Data marts serve a specific department or function, such as finance, marketing, or operations. Since they store smaller amounts of data, they are faster and easier to use and navigate.

Procedures to Prepare EDW Data for Analytics

Consider a bank’s database. Every deposit, withdrawal, loan payment, or other transaction adds or changes data. The volatility caused by constant transaction processing makes data analysis difficult—and the demands to process millions of transactions per second consume the database’s processing power. In contrast, data in warehouses are relatively stable, as needed for analysis. Therefore, select data are moved from databases to a warehouse. Specifically, data are as follows:

  1. Extracted from designated databases.
  2. Transformed by standardizing formats, cleaning the data, and integrating them.
  3. Loaded into a data warehouse.

These three procedures—extract, transform, and load—are referred to by their initials ETL (Figure 3.10). In a warehouse, data are read-only; that is, they do not change until the next ETL.


FIGURE 3.10 Data enter databases from transaction systems. Data of interest are extracted from databases, transformed to clean and standardize them, and then loaded into a data warehouse. These three processes are called ETL.

Three technologies involved in preparing raw data for analytics are ETL, change data capture (CDC), and data deduplication (“deduping the data”). CDC processes capture the changes made at data sources and then apply those changes throughout enterprise data stores to keep data synchronized. CDC minimizes the resources required for ETL processes by dealing only with data changes. Deduping processes remove duplicates and standardize data formats, which helps to minimize storage requirements and data synchronization effort.
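
The sketch below shows the spirit of a simple ETL job: records are extracted from a source, transformed into a standard format, deduplicated, and loaded into a target store. The record layout, the cleaning rules, and the in-memory "warehouse" are hypothetical simplifications.

```python
from datetime import datetime

# Extract: raw transaction records as they might arrive from a source database
raw_records = [
    {"cust": " alice ", "amount": "120.50", "date": "03/15/2016"},
    {"cust": "BOB",     "amount": "75.00",  "date": "2016-03-16"},
    {"cust": "alice",   "amount": "120.50", "date": "2016-03-15"},  # duplicate
]

def transform(record):
    """Standardize name casing, convert amounts to numbers, and normalize dates."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            date = datetime.strptime(record["date"], fmt).date()
            break
        except ValueError:
            continue
    return {
        "cust": record["cust"].strip().title(),
        "amount": float(record["amount"]),
        "date": date.isoformat(),
    }

# Transform and dedupe ("deduping the data")
cleaned, seen = [], set()
for rec in map(transform, raw_records):
    key = (rec["cust"], rec["amount"], rec["date"])
    if key not in seen:
        seen.add(key)
        cleaned.append(rec)

# Load: here the "warehouse" is just a list; in practice this would be a bulk insert
warehouse = []
warehouse.extend(cleaned)
print(warehouse)  # two unique, standardized records
```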

Building a Data Warehouse

Figure 3.11 diagrams the process of building and using a data warehouse. The organization’s data from operational transaction processing systems are stored in operational databases (left side of the figure). Not all data are transferred to the data warehouse. Frequently, only summary data are transferred. The warehouse organizes the data in multiple ways—by subject, functional area, vendor, and product. As shown, the data warehouse architecture defines the flow of data that starts when data are captured by transaction systems; the source data are stored in transactional (operational) databases; ETL processes move data from databases into data warehouses or data marts, where the data are available for access, reports, and analysis.


FIGURE 3.11 Database, data warehouse and marts, and BI architecture.

Real-Time Support from an Active Data Warehouse

Early data warehouse technology primarily supported strategic applications that did not require instant response time, direct customer interaction, or integration with operational systems. ETL might have been done once per week or once per month. But demand for information to support real-time customer interaction and operations has led to real-time data warehousing and analytics—known as an active data warehouse (ADW). Massive increases in computing power, processing speeds, and memory made ADWs possible. ADWs are not designed to support executives’ strategic decision-making, but rather to support operations. For example, shipping companies like DHL use huge fleets of trucks to move millions of packages. Every day and all day, operational managers make thousands of decisions that affect the bottom line, such as: “Do we need four trucks for this run?” “With two drivers delayed by bad weather, do we need to bring in extra help?” Traditional data warehousing is not suited for immediate operational support, but active data warehousing is. For example, companies with an ADW are able to:

  • Interact with a customer to provide superior customer service.
  • Respond to business events in near real time.
  • Share up-to-date status data among merchants, vendors, customers, and associates.

Here are some examples of how two companies use ADW.

Capital One. Capital One uses its ADW to track each customer’s “profitability score” to determine the level of customer service to provide for that person. Higher-cost personalized service is only given to those with high scores. For instance, when a customer calls Capital One, he or she is asked to enter a credit card number, which is linked to a profitability score. Low-profit customers get a voice response unit only; high-profit customers are connected to a live customer service representative (CSR) because the company wants to minimize the risk of losing those customers.

Travelocity. If you use Travelocity, an ADW is finding the best travel deals especially for you. The goal is to use “today’s data today” instead of “yesterday’s data today.” The online travel agency’s ADW analyzes your search history and destinations of interest; then predicts travel offers that you would most likely purchase. Offers are both relevant and timely to enhance your experience, which helps close the sale in a very competitive market. For example, when a customer is searching flights and hotels in Las Vegas, Travelocity recognizes the interest—the customer wants to go to Vegas. The ADW searches for the best-priced flights from all carriers, builds a few package deals, and presents them in real time to the customer. When customers see a personalized offer they are already interested in, the ADW helps generate a better customer experience. The real-time data-driven experience increases the conversion rate and sales.

Data warehouse content can be delivered to decision-makers throughout the enterprise via the cloud or company-owned intranets. Users can view, query, and analyze the data and produce reports using Web browsers. These are extremely economical and effective data delivery methods.

Data Warehousing Supports Action as well as Decisions

Many organizations built data warehouses because they were frustrated with inconsistent data that could not support decisions or actions. Viewed from this perspective, data warehouses are infrastructure investments that companies make to support ongoing and future operations, including the following:

  • Marketing Keeps people informed of the status of products, marketing program effectiveness, and product line profitability; and allows them to take intelligent action to maximize per-customer profitability.
  • Pricing and contracts Calculates costs accurately in order to optimize pricing of a contract. Without accurate cost data, prices may be below or too near to cost; or prices may be uncompetitive because they are too high.
  • Forecasting Estimates customer demand for products and services.
  • Sales Calculates sales profitability and productivity for all territories and regions; analyzes results by geography, product, sales group, or individual.
  • Financial Provides real-time data for optimal credit terms, portfolio analysis, and actions that reduce risk or bad debt expense.

Table 3.3 summarizes several successful applications of data warehouses.

TABLE 3.3 Data Warehouse Applications by Industry

Industry Applications
Airline Crew assignment, aircraft deployment, analysis of route profitability, and customer loyalty promotions
Banking and financial services Customer service, trend analysis, product and service promotions, and reduction of IS expenses
Credit card Customer service, new information service for a fee, fraud detection
Defense contracts Technology transfer, production of military applications
E-business Data warehouses with personalization capabilities, marketing/shopping preferences allowing for up-selling and cross-selling
Government Reporting on crime areas, homeland security
Health care Reduction of operational expenses
Investment and insurance Risk management, market movements analysis, customer tendencies analysis, and portfolio management
Retail chain Trend analysis, buying pattern analysis, pricing policy, inventory control, sales promotions, and optimal distribution channel decision

Concept Check 3.3

  1. The four V’s of data analytics are:
a. Variety, volume, velocity and veracity
b. Variety, velocity, veracity and variability
c. Velocity, veracity, variability and validity
d. Variability, volume, velocity and versatility

  2. To obtain actionable information you need:
a. High quality data
b. Human expertise and judgment
c. Data analytics
d. All of the above

  3. Lower cost data warehouses that are easier to implement are referred to as:
a. Enterprise data warehouse
b. Company data warehouse
c. Data mart
d. Data store

  4. _______________________ is an important tool across organizations that helps users discover meaningful real-time insights to meet customer expectations, achieve better results and stay competitive.
a. Data analytics
b. Database management system
c. Spreadsheet
d. Relational database

  5. Which of the following is an advantage of collecting sensor data?
a. Sensors are embedded in equipment
b. Sensor data can be analyzed in real time
c. Sensor data is cheap
d. Sensors never fail

3.4 Big Data Analytics and Data Discovery

Like mobile and cloud, big data and advanced data analytics are reshaping organizations and business processes to increase efficiency and improve performance. Research firm IDC forecasts that big data and analytics spending will reach $187 billion in 2019 (Ovalsrud, 2016).

Data analytics is an important tool across organizations, which helps users discover meaningful real-time insights to meet customer expectations, achieve better results, and stay competitive. These deeper insights, combined with human expertise, enable people to recognize meaningful relationships more quickly or easily and, furthermore, to realize the strategic implications of these situations. Imagine trying to make sense of the fast and vast data generated by social media campaigns on Facebook or by sensors attached to machines or objects. Low-cost sensors make it possible to monitor all types of physical things—while analytics makes it possible to understand those data in order to take action in real time. For example, sensor data can be analyzed in real time (a simple monitoring sketch follows the list):

  • To monitor and regulate the temperature and climate conditions of perishable foods as they are transported from farm to supermarket.
  • To sniff for signs of spoilage of fruits and raw vegetables and detect the risk of E. coli contamination.
  • To track the condition of operating machinery and predict the probability of failure.
  • To track the wear of engines and determine when preventive maintenance is needed.
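
As an illustration of the first use case, this minimal sketch checks a stream of temperature readings from a refrigerated shipment against an acceptable range and raises an alert as soon as a reading falls outside it. The readings, the safe range, and the alerting logic are all assumptions chosen for illustration.

```python
# Hypothetical temperature readings (Celsius) from a sensor on a refrigerated truck
readings = [3.8, 4.1, 4.0, 6.7, 4.2]      # one reading per minute, for example

SAFE_MIN, SAFE_MAX = 2.0, 5.0             # assumed safe range for perishable foods

def monitor(stream):
    """Check each reading as it arrives and flag out-of-range temperatures."""
    for minute, temp in enumerate(stream, start=1):
        if SAFE_MIN <= temp <= SAFE_MAX:
            print(f"Minute {minute}: {temp} C OK")
        else:
            print(f"ALERT at minute {minute}: {temp} C is outside the safe range")

monitor(readings)
```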

In this section, you will learn about the value, challenges, and technologies involved in putting data and analytics to use to support decisions and action, together with examples of skill sets currently in high demand by organizations expanding their efforts to train, hire and retain competent data professionals (Career Insight 3.1).

When the data set is too large or complex to be analyzed using traditional data processing applications, big data analytics tools are used. One of the biggest sectors of customer relations relative to big data is customer value analytics (CVA). CVA studies the recent phenomenon that customers are more willing to use and purchase innovative products, services, and customer service channels while demanding an increasing amount of high-quality, personalized products. Companies and producers use big data analytics to capture this behavior and transform it into usable data for tracking and predicting trends. If companies know what customers like, what makes them spend more, and when they are happy, they can leverage the information to keep them happy and provide better products and services.

Companies can also use big data analytics to store and use their data across the supply chain. To maximize the effectiveness of data analytics, companies usually complete these objectives throughout their input transformation process:

  • Invest heavily in IT to collect, integrate, and analyze data from each store and sales unit.
  • Link these data to suppliers’ databases, making it possible to adjust prices in real time, to reorder hot-selling items automatically, and to shift items from store to store easily.
  • Constantly test, integrate, and report information instantly available across the organization—from the store floor to the CFO’s office.

These big data programs enable them to pinpoint improvement opportunities across the supply chain—from purchasing to in-store availability management. Specifically, the companies are able to predict how customers will behave and use that knowledge to be prepared to respond quickly. According to Louis Columbus at Forbes, the market demand for big data analytics is about to hit its largest increase in history. Software for business analytics will increase by more than 50% by 2019. Prescriptive analytics software will be worth $1.1B in 2019, compared to its value of $415M in 2014. Since increasing the focus on customer demand trends, effectively entering new markets, producing better business models, and enhancing organizational performance are the most important goals for 21st-century companies, business analytics will be needed in almost every instance. Taking advantage of the benefits of business intelligence is allowing sectors like health care to compete in areas they would not have been able to enter before (Columbus, 2016).

To be effective in using data analysis, organizations must pay attention to the four Vs of analytics—variety, volume, velocity, and veracity—shown in Figure 3.12.


FIGURE 3.12 The four Vs of data analytics.


Big data can have a dramatic impact on the success of any enterprise, or they can be a major expense that contributes little. However, success is not achieved with technology alone. Many companies are collecting and capturing huge amounts of data, but spending very little effort to ensure the veracity and value of data captured at the transactional stage or point of origin. Emphasis in this direction will not only increase confidence in the datasets, but also significantly reduce the effort required for analytics and enhance the quality of decision-making. Success also depends on avoiding invalid assumptions, which can be done by testing those assumptions during analysis.

Human Expertise and Judgment are Needed

Human expertise and judgment are needed to interpret the output of analytics (refer to Figure 3.13). Data are worthless if you cannot analyze, interpret, understand, and apply the results in context. This brings up several challenges:

  • Data need to be prepared for analysis For example, data that are incomplete or duplicated need to be fixed.
  • Dirty data degrade the value of analytics The “cleanliness” of data is very important to data mining and analysis projects. Analysts have complained that data analytics is like janitorial work because they spend so much time on manual, error-prone processes to clean the data. Large data volumes and variety mean more data that are dirty and harder to handle.
  • Data must be put into meaningful context If the wrong analysis or datasets are used, the output would be nonsense, as in the example of the Super Bowl winners and stock market performance. Stated in reverse, managers need context in order to understand how to interpret traditional and big data.

FIGURE 3.13 Data analytics, human expertise, and high-quality data are needed to obtain actionable information.

IT at Work 3.2 describes how big data analytics, collaboration, and human expertise have transformed the new drug development process.

Machine-generated sensor data are becoming a larger proportion of big data (Figure 3.14), according to a research report by IDC (2015). It is predicted that these data will increase to two-thirds of all data by 2020, representing a significant increase from the 11% level of 2005. In addition to its growth as a portion of analyzed data, the market for sensor data will increase to $1.7 trillion in 2020.


FIGURE 3.14 Machine-generated data from physical objects are becoming a much larger portion of big data and analytics.

On the consumer side, a significant factor in this market is the boom in wearable technology—products like FitBit and the Apple Watch. Users no longer even have to input data to these devices because data are automatically gathered and tracked in real time. On the public sector and enterprise side, sensor data and the Internet of Things (IoT) are being used in the advancement of IT-enabled business processes, like automated factories and distribution centers, and IT-enabled products, like wearable tech (IDC, 2015). Federal health reform efforts have pushed health-care organizations toward big data and analytics. These organizations are planning to use big data analytics to support revenue cycle management, resource utilization, fraud prevention, health management, and quality improvement.

Hadoop and MapReduce

Big data volumes exceed the processing capacity of conventional database infrastructures. A widely used processing platform is Apache Hadoop. It places no conditions on the structure of the data it can process. Hadoop distributes computing problems across a number of servers. Hadoop implements MapReduce in two stages:

  1. Map stage MapReduce breaks up the huge dataset into smaller subsets; then distributes the subsets among multiple servers where they are partially processed.
  2. Reduce stage The partial results from the map stage are then recombined and made available for analytic tools.

To store data, Hadoop has its own distributed file system, the Hadoop Distributed File System (HDFS). A typical Hadoop job proceeds in three stages:

  • Data are loaded into HDFS.
  • The MapReduce operations are performed.
  • Results are retrieved from HDFS.
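
To make the two stages concrete, here is a minimal, single-machine sketch of the MapReduce idea applied to a word count. Hadoop itself would distribute the map work across many servers and recombine the partial results in the reduce stage; this is an illustration of the programming model only, not Hadoop code, and the sample data are invented.

```python
from collections import Counter
from functools import reduce

# The dataset broken up into smaller subsets (in Hadoop, blocks stored in HDFS)
subsets = [
    "big data big insight",
    "big data big value",
]

# Map stage: each subset is processed independently, producing partial word counts
def map_stage(text):
    return Counter(text.split())

partial_results = [map_stage(subset) for subset in subsets]

# Reduce stage: the partial results are recombined into one final result
def reduce_stage(left, right):
    return left + right

word_counts = reduce(reduce_stage, partial_results)
print(word_counts)  # Counter({'big': 4, 'data': 2, 'insight': 1, 'value': 1})
```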

Figure 3.15 diagrams how Facebook uses database technology and Hadoop. IT at Work 3.3 describes how First Wind has applied big data analytics to improve the operations of its wind farms and to support sustainability of the planet by reducing environmentally damaging carbon emissions.


FIGURE 3.15 Facebook’s MySQL database and Hadoop technology provide customized pages for its members.

Data and Text Mining

Data and text mining are different from DBMS queries and data analytics. As you read earlier in this chapter, a DBMS supports queries to extract data or get answers from huge databases. But in order to perform queries in a DBMS, you must first know the question you want answered. You have also read that data analytics describes the entire function of applying technologies, algorithms, human expertise, and judgment. Data and text mining are specific analytic techniques that allow users to discover knowledge that they did not know existed in the databases.

Data mining software enables users to analyze data from various dimensions or angles, categorize them, and find correlations or patterns among fields in the data warehouse. Up to 75% of an organization’s data are nonstructured word-processing documents, social media, text messages, audio, video, images and diagrams, faxes and memos, call center or claims notes, and so on.

IT at Work 3.4 describes one example of how the U.S. government is using data mining software to continuously improve its detection and deterrence systems.

Text mining is a broad category that involves interpreting words and concepts in context. Any customer becomes a brand advocate or adversary by freely expressing opinions and attitudes that reach millions of other current or prospective customers on social media. Text mining helps companies tap into the explosion of customer opinions expressed online. Social commentary and social media are being mined for sentiment analysis or to understand consumer intent. Innovative companies know they could be more successful in meeting their customers’ needs, if they just understood them better. Tools and techniques for analyzing text, documents, and other nonstructured content are available from several vendors.

Combining data and text mining can create even greater value. Burns (2016) pointed out that mining text or nonstructured data enables organizations to forecast the future instead of merely reporting the past. He also noted that forecasting methods using existing structured data and nonstructured text from both internal and external sources provide the best view of what lies ahead.

Creating Business Value

Enterprises invest in data mining tools to add business value. Business value falls into three categories, as shown in Figure 3.16.


FIGURE 3.16 Business value falls into three buckets.

Here are some brief cases illustrating the types of business value created by data and text mining.

  1. Using pattern analysis, Argo Corporation, an agricultural equipment manufacturer based in Georgia, was able to optimize product configuration options for farm machinery and real-time customer demand to determine the optimal base configurations for its machines. As a result, Argo reduced product variety by 61% and cut days of inventory by 81% while still maintaining its service levels.
  2. The mega-retailer Walmart wanted its online shoppers to find what they were looking for faster. Walmart analyzed clickstream data from its 45 million monthly online shoppers; then combined that data with product- and category-related popularity scores. The popularity scores had been generated by text mining the retailer’s social media streams. Lessons learned from the analysis were integrated into the Polaris search engine used by customers on the company’s website. Polaris has yielded a 10% to 15% increase in online shoppers completing a purchase, which equals roughly $1 billion in incremental online sales.
  3. McDonald’s bakery operation replaced manual equipment with high-speed photo analyses to inspect thousands of buns per minute for color, size, and sesame seed distribution. Automatically, ovens and baking processes adjust instantly to create uniform buns and reduce thousands of pounds of waste each year. Another food products company also uses photo analyses to sort every french fry produced in order to optimize quality.
  4. Infinity Insurance discovered new insights that it applied to improve the performance of its fraud operation. The insurance company text mined years of adjuster reports to look for key drivers of fraudulent claims. As a result, the company reduced fraud by 75%, and eliminated marketing to customers with a high likelihood of fraudulent claims.

Text Analytics Procedure

With text analytics, information is extracted from large quantities of various types of textual information. The basic steps involved in text analytics include the following:

  1. Exploring First, documents are explored. This might occur in the form of simple word counts in a document collection, or by manually creating topic areas to categorize documents after reading a sample of them. For example, what are the major types of issues (brake or engine failure) that have been identified in recent automobile warranty claims? A challenge of the exploration effort is misspelled or abbreviated words, acronyms, or slang.
  2. Preprocessing Before analysis or the automated categorization of content, the text may need to be preprocessed to standardize it to the extent possible. As in traditional analysis, up to 80% of preprocessing time can be spent preparing and standardizing the data. Misspelled words, abbreviations, and slang may need to be transformed into consistent terms. For instance, BTW would be standardized to “by the way” and “left voice message” could be tagged as “lvm.”
  3. Categorizing and modeling Content is then ready to be categorized. Categorizing messages or documents from information contained within them can be achieved using statistical models and business rules. As with traditional model development, sample documents are examined to train the models. Additional documents are then processed to validate the accuracy and precision of the model, and finally new documents are evaluated using the final model (scored). Models can then be put into production for the automated processing of new documents as they arrive.
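
The following toy sketch illustrates the preprocessing and categorizing steps applied to short warranty-claim notes. The abbreviation table, the keyword rules, and the sample notes are invented for illustration and stand in for the statistical models and business rules described above.

```python
# Hypothetical warranty-claim notes
notes = [
    "brakes squeal when stopping btw car only 2 yrs old",
    "engine stalls at idle lvm for customer",
]

# Preprocessing: expand abbreviations and slang into consistent terms
replacements = {"btw": "by the way", "lvm": "left voice message", "yrs": "years"}

def preprocess(text):
    return " ".join(replacements.get(word, word) for word in text.lower().split())

# Categorizing: simple keyword rules stand in for trained statistical models
categories = {
    "brake failure": ["brake", "brakes"],
    "engine failure": ["engine", "stalls"],
}

def categorize(text):
    words = text.split()
    return [label for label, keywords in categories.items()
            if any(keyword in words for keyword in keywords)]

for note in notes:
    clean = preprocess(note)
    print(clean, "->", categorize(clean))
```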

Analytics Vendor Rankings

Analytics applications cover business intelligence functions sold as standalone applications for decision support or embedded in an integrated solution. The introduction of intuitive decision support tools, dashboards, and data visualization (discussed in detail in Chapter 11) has added interactive components to big data analytics that bring the data to life and enable nonexperts to use it.

Organizations invest in analytics, BI, and data/text mining applications based on new features and capabilities beyond those offered by their legacy systems. Analytics vendors offer everything from simple-to-use reporting tools to highly sophisticated software for tackling the most complex data analysis problems. The top five analytics and BI application vendors are listed in Table 3.4.

TABLE 3.4 Top Analytics Vendors

Rank | Vendor | Focus | Products
1 | SAP | Markets lines of analytics products that cover BI and reporting, predictive analysis, performance management, and governance, risk, and compliance applications | SAP Business Objects Predictive Analytics; SAP Business Objects BI; SAP Business Objects Planning and Consolidation
2 | SAS | Offers everything from simple desktop solutions to high-performance distributed processing solutions | SAS Analytics Pro; SAS Enterprise Miner; SAS Visual Analytics; SAS Customer Intelligence 360
3 | IBM | Allows users to quickly discover patterns and meanings in data with guided data discovery, automated predictive analytics, one-click analysis, self-service dashboards, and a natural language dialogue | Watson Analytics
4 | Oracle | Offers a complete solution for connecting and collaborating with analytics in the cloud; products allow users to aggregate, experiment, manage, and analyze/act | Oracle Data Integrator; Oracle Big Data Cloud Service; Oracle R Advanced Analytics for Hadoop; BI Cloud Service; Oracle Stream Explorer
5 | Microsoft | Provides a broad range of products, from standalone solutions to integrated tools, that deliver data preparation, data discovery, and interactive dashboard capabilities in a single tool | Excel; HDInsight; Machine Learning; Stream Analytics; Power BI Embedded

Concept Check 3.4

  1. __________________ enables users to analyze data from various dimensions and angles.
a. Text mining
b. Data mining
c. Gold mining
d. Information mining
Correct or Incorrect?

  2. Data mining tools add business value by:
a. Making more informed decisions at the time they need to be made
b. Discovering unknown insights, patterns, or relationships
c. Automating and streamlining or digitizing business processes
d. All of the above
Correct or Incorrect?

  3. The basic steps involved in the text analytics procedure include:
a. Exploring, preprocessing, categorizing and modeling
b. Input, process, output, store
c. Processing, analyzing, categorizing and reporting
d. Exploring, analyzing, modeling and reporting
Correct or Incorrect?

  4. ___________________________ is the growing trend in the market of analytics to assist corporations and small businesses alike in managing and analyzing data to predict outcomes, optimize inputs, and minimize costs and waste.
a. Organizational knowledge
b. Business intelligence
c. Corporate knowledge
d. Business information
Correct or Incorrect?

  5. A popular cloud-based option that a growing number of BI tool vendors are offering is:
a. IaaS
b. PaaS
c. DaaS
d. SaaS
Correct or Incorrect?

3.5 Business Intelligence and Electronic Records Management

Continuing developments in data analytics and business intelligence (BI) make it increasingly necessary for organizations to be aware of the differences between these terms and the different ways in which they add value in an organization. The field of BI started in the late 1980s and has been a key to competitive advantage across industries and in enterprises of all sizes. Unlike data analytics, which has predictive capabilities, BI is a comprehensive term that refers to analytics and reporting tools traditionally used to determine trends in historical data.

The key distinction between data analytics and BI is that analytics uses algorithms to statistically determine the relationships between data whereas BI presents data insights established by data analytics in reports, easy-to-use dashboards, and interactive visualizations. BI can also make it easier for users to ask data-related questions and obtain results that are presented in a way that they can easily understand.

What started as a tool to support sales, marketing, and customer service departments has evolved into an enterprisewide strategic platform. While BI software is used in the operational management of divisions and business processes, it is also used to support strategic corporate decision-making. The dramatic change over the last few years is the growth in demand for operational intelligence across multiple systems and businesses, which increases the number of people who need access to increasing amounts of data. Complex and competitive business conditions do not leave much slack for mistakes.

Unfortunately, some companies are not able to use their data efficiently, making the cost of gathering information higher than the benefits it provides. Fortunately, BI software can bring decision-making information to businesses in as little as two clicks. Small businesses share large corporations’ interest in enlisting BI to help with decision-making, but they usually lack the resources to build data centers and hire analysts and IT consultants. However, small-business BI software is a rapidly growing segment of the analytics field, and it is increasingly cheap to implement as a decision-making tool. Small businesses do not always have workers specialized in certain areas, but BI software makes it easy for all employees to analyze the data and make decisions (King, 2016).

Business Benefits of BI

BI provides data at the moment of value to decision-makers, enabling them to extract crucial facts from enterprise data in real time or near real time. A BI solution with a well-designed dashboard, for example, gives retailers better visibility into inventory so they can make better decisions about what to order, how much, and when, in order to prevent stock-outs or minimize inventory that sits on warehouse shelves.

Companies use BI solutions to determine what questions to ask and find answers to them. BI tools integrate and consolidate data from various internal and external sources and then process them into information to make smart decisions. BI answers questions such as these: Which products have the highest repeat sales rate in the last six months? Do customer likes on Facebook relate to product purchase? How does the sales trend break down by product group over the last five years? What do daily sales look like in each of my sales regions?
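
As a rough illustration of the aggregation a BI tool performs behind questions like these, the following Python/pandas sketch answers the product-group trend and regional sales questions against a hypothetical sales table; the column names and figures are assumptions, not data from any company mentioned in this chapter.

  # Illustrative sketch: answering BI-style questions from consolidated sales data.
  # Assumes Python with pandas; table, column names, and values are hypothetical.
  import pandas as pd

  sales = pd.DataFrame({
      "year": [2021, 2021, 2022, 2022, 2023, 2023],
      "product_group": ["Beverages", "Snacks", "Beverages", "Snacks", "Beverages", "Snacks"],
      "region": ["East", "West", "East", "West", "East", "West"],
      "revenue": [120_000, 95_000, 135_000, 90_000, 150_000, 110_000],
  })

  # "How does the sales trend break down by product group over recent years?"
  trend = sales.groupby(["year", "product_group"])["revenue"].sum().unstack()
  print(trend)

  # "What do sales look like in each of my sales regions?"
  print(sales.groupby("region")["revenue"].sum())

A BI dashboard would present the same aggregates as charts and tables rather than code, but the underlying consolidation of data from source systems is the same idea.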

According to The Data Warehousing Institute, BI “unites data, technology, analytics, and human knowledge to optimize business decisions and ultimately drive an enterprise’s success. BI programs usually combine an enterprise data warehouse and a BI platform or tool set to transform data into usable, actionable business information” (The Data Warehousing Institute, 2014). For many years, managers have relied on business analytics to make better-informed decisions. Multiple surveys and studies agree on BI’s growing importance in analyzing past performance and identifying opportunities to improve future performance.

Common Challenges: Data Selection and Quality

Companies cannot analyze all of their data, and much of it would not add value. Therefore, an unending challenge is how to determine which data to use for BI from what seems like unlimited options (Oliphant, 2016). One purpose of a BI strategy is to provide a framework for selecting the most relevant data without limiting options to integrate new data sources. Information overload is a major problem for executives and for employees. Another common challenge is data quality, particularly with regard to online information, because the source and accuracy might not be verifiable.

Aligning BI Strategy with Business Strategy

Reports and dashboards are delivery tools, but they may not be delivering business intelligence. To get the greatest value out of BI, the CIO needs to work with the CFO and other business leaders to create a BI governance program whose mission is to achieve the following (Ladley, 2016):

  1. Clearly articulate business strategies.
  2. Deconstruct the business strategies into a set of specific goals and objectives—the targets.
  3. Identify the key performance indicators (KPIs) that will be used to measure progress toward each target.
  4. Prioritize the list of KPIs.
  5. Create a plan to achieve goals and objectives based on the priorities.
  6. Estimate the costs needed to implement the BI plan.
  7. Assess and update the priorities based on business results and changes in business strategy.

After completing these activities, BI analysts can identify the data to use in BI and the source systems. This is a business-driven development approach that starts with a business strategy and works backward to identify the data sources and the data that need to be acquired and analyzed.

Businesses want KPIs that can be utilized by both departmental users and management. In addition, users want real-time access to these data so that they can monitor processes with the smallest possible latency and take corrective action whenever KPIs deviate from their target values. To link strategic and operational perspectives, users must be able to drill down from highly consolidated or summarized figures into the detailed numbers from which they were derived to perform in-depth analyses.
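
A simplified sketch of that drill-down idea, assuming Python with pandas and a hypothetical on-time-delivery KPI; the regions, stores, target value, and figures are invented for illustration.

  # Sketch of drilling down from a consolidated KPI to the detail rows behind it.
  # Assumes Python with pandas; the KPI (on-time delivery rate) and data are hypothetical.
  import pandas as pd

  deliveries = pd.DataFrame({
      "region": ["East", "East", "West", "West"],
      "store": ["E01", "E02", "W01", "W02"],
      "on_time": [940, 880, 720, 990],
      "total": [1000, 1000, 1000, 1000],
  })

  # Consolidated KPI per region (the summarized figure a manager sees first).
  summary = deliveries.groupby("region").sum(numeric_only=True)
  summary["on_time_rate"] = summary["on_time"] / summary["total"]
  print(summary[["on_time_rate"]])

  # Drill down into the stores behind any region whose KPI misses its target.
  target = 0.90
  for region in summary.index[summary["on_time_rate"] < target]:
      detail = deliveries[deliveries["region"] == region]
      print(detail.assign(on_time_rate=detail["on_time"] / detail["total"]))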

BI Architecture and Analytics

BI architecture is undergoing technological advances in response to big data and the performance demands of end-users (Wise, 2016). BI vendors face the challenges of social, sensor, and other newer data types that must be managed and analyzed. One technological advance that can help handle big data is BI in the cloud, which can be hosted on a public or private cloud. Figure 3.17 lists the key factors contributing to the increased use of BI. Although cloud services require ongoing upkeep, optimizing and customizing the service for one’s company brings undeniable benefits in data security. With a public cloud, a service provider hosts the data and/or software, which are accessed via an Internet connection. With a private cloud, the company hosts its own data and software but uses cloud-based technologies.

FIGURE 3.17 Four factors contributing to increased use of BI: Smart Devices Everywhere, Data Are Big Business, Advanced BI and Analytics, and Cloud-Enabled BI and Analytics.

For cloud-based BI, a popular option offered by a growing number of BI tool vendors is software as a service (SaaS). MicroStrategy offers MicroStrategy Cloud, which provides fast deployment with reduced project risks and costs. This cloud approach appeals to small and midsized companies that have limited IT staff and want to carefully control costs. The potential downsides include slower response times, security risks, and backup risks.

Competitive Analytics in Practice: CarMax

CarMax, Inc. is the nation’s largest retailer of used cars and for a decade has remained one of FORTUNE Magazine’s “100 Best Companies to Work For.” CarMax was the fastest retailer in U.S. history to reach $1 billion in revenues. In 2016 the company had over $15 billion in net sales and operating revenues, representing a 6.2% increase over the prior year’s results. The company grew rapidly because of its compelling customer offer—no-haggle prices and quality guarantees backed by a 125-point inspection that became an industry benchmark—and auto financing. As of November 30, 2016, CarMax operated in 169 locations across 39 U.S. states and had more than 22,000 full- and part-time employees.

CarMax continues to enhance and refine its information systems, which it believes to be a core competitive advantage. CarMax’s IT includes the following:

  • A proprietary IS that captures, analyzes, interprets, and distributes data about the cars CarMax sells and buys.
  • Data analytics applications that track every purchase, the number of test drives and credit applications per car, and color preferences in every demographic and region.
  • Proprietary store technology that provides management with real-time data about every aspect of store operations, such as inventory management, pricing, vehicle transfers, wholesale auctions, and sales consultant productivity.
  • An advanced inventory management system that helps management anticipate future inventory needs and manage pricing.

Throughout CarMax, analytics are used as a strategic asset and insights gained from analytics are available to everyone who needs them.

Electronic Records Management

All organizations create and retain business records. A record is documentation of a business event, action, decision, or transaction. Examples are contracts, research and development, accounting source documents, memos, customer/client communications, hiring and promotion decisions, meeting minutes, social posts, texts, e-mails, website content, database records, and paper and electronic files. Business documents such as spreadsheets, e-mail messages, and word-processing documents are a type of record. Most records are kept in electronic format and are maintained by an electronic records management system (ERMS) throughout their life cycle, from creation to final archiving or destruction.

One application of an ERMS would be in a company that is required by law to retain financial documents for at least seven years, product designs for many decades, and e-mail messages about marketing promotions for a year. The major ERM tools are workflow software, authoring tools, scanners, and databases. ERM systems have query and search capabilities so documents can be identified and accessed like data in a database. These systems range from those designed to support a small workgroup to full-featured, Web-enabled enterprisewide systems.
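
To make the query-and-search idea concrete, the following is a minimal sketch, in Python, of a document repository indexed by metadata so records can be queried like rows in a database. The record fields, retention periods, and sample entries are assumptions for illustration, not the schema of any specific ERMS product.

  # Minimal sketch of ERMS-style indexing and retrieval: each record carries metadata
  # so documents can be queried like rows in a database. Fields, retention periods,
  # and entries below are illustrative assumptions.
  from dataclasses import dataclass
  from datetime import date

  @dataclass
  class Record:
      record_id: str
      doc_type: str          # e.g., "financial", "product_design", "marketing_email"
      created: date
      retention_years: int   # how long the record must be kept
      content: str

  repository = [
      Record("R-001", "financial", date(2019, 3, 1), 7, "Q1 accounts payable summary"),
      Record("R-002", "marketing_email", date(2024, 6, 5), 1, "Spring promotion results"),
      Record("R-003", "marketing_email", date(2016, 2, 2), 1, "Winter promotion e-mail"),
  ]

  def search(repo, keyword=None, doc_type=None):
      """Query the repository by keyword and/or document type, like a database table."""
      hits = repo
      if doc_type:
          hits = [r for r in hits if r.doc_type == doc_type]
      if keyword:
          hits = [r for r in hits if keyword.lower() in r.content.lower()]
      return hits

  def past_retention(record, today=None):
      """Flag records whose retention period has expired and that may be destroyed."""
      today = today or date.today()
      return today.year - record.created.year > record.retention_years

  print([r.record_id for r in search(repository, keyword="promotion")])
  print([r.record_id for r in repository if past_retention(r)])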

Legal Duty to Retain Business Records

Companies need to be prepared to respond to an audit, federal investigation, lawsuit, or any other legal action against them. Types of lawsuits against companies include patent violations, product safety negligence, theft of intellectual property, breach of contract, wrongful termination, harassment, discrimination, and many more.

Because senior management must ensure that their companies comply with legal and regulatory duties, managing electronic records (e-records) is a strategic issue for organizations in both the public and private sectors. The success of ERM depends greatly on a partnership of many key players, namely, senior management, users, records managers, archivists, administrators, and most importantly, IT personnel. Properly managed, records are strategic assets. Improperly managed or destroyed, they become liabilities.

ERM Best Practices

Effective ERM systems capture all business data and documents at their first touchpoint, whether in data centers, on laptops, in the mailroom, at customer sites, or at remote offices. Records enter the enterprise in multiple ways: from online forms, bar codes, sensors, websites, social sites, copiers, e-mails, and more. In addition to capturing each document as a whole, important data from within a document can be captured and stored in a central, searchable repository. In this way, the data are accessible to support informed and timely business decisions.

In recent years, organizations such as the Association for Information and Image Management (AIIM), National Archives and Records Administration (NARA), and ARMA International (formerly the Association of Records Managers and Administrators) have created and published industry standards for document and records management. Numerous best practices articles, and links to valuable sources of information about document and records management, are available on their websites. The IT Toolbox describes ARMA’s eight generally accepted recordkeeping principles framework.

ERM Benefits

Departments or companies whose employees spend most of their day filing or retrieving documents or warehousing paper records can reduce costs significantly with ERM. These systems minimize the inefficiencies and frustration associated with managing paper documents and workflows. However, they do not create a paperless office as had been predicted.

An ERM can help a business to become more efficient and productive by the following:

  • Enabling the company to access and use the content contained in documents.
  • Cutting labor costs by automating business processes.
  • Reducing the time and effort required to locate information the business needs to support decision-making.
  • Improving the security of content, thereby reducing the risk of intellectual property theft.
  • Minimizing the costs associated with printing, storing, and searching for content.

When workflows are digital, productivity increases, costs decrease, compliance obligations are easier to verify, and green computing becomes possible. Green computing is an initiative to conserve our valuable natural resources by reducing the effects of our computer usage on the environment. You can read about green computing and the related topics of reducing an organization’s carbon footprint, sustainability, and ethical and social responsibilities in Chapter 14.

ERM for Disaster Recovery, Business Continuity, and Compliance

Businesses also rely on their ERM system for disaster recovery and business continuity, security, knowledge sharing and collaboration, and remote and controlled access to documents. Because ERM systems have multilayered access capabilities, employees can access and change only the documents they are authorized to handle.
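
A simplified sketch of that multilayered access idea, using role-based permissions in Python; the roles, document classes, and allowed actions are hypothetical.

  # Sketch of multilayered (role-based) access control for an ERM system.
  # Roles, permissions, and document classes are hypothetical assumptions.
  PERMISSIONS = {
      "clerk":           {"invoice": {"read"}},
      "records_manager": {"invoice": {"read", "update"}, "contract": {"read", "update"}},
      "auditor":         {"invoice": {"read"}, "contract": {"read"}},
  }

  def is_allowed(role, doc_class, action):
      """Return True only if the role is authorized to perform the action on that class."""
      return action in PERMISSIONS.get(role, {}).get(doc_class, set())

  print(is_allowed("clerk", "invoice", "read"))                # True
  print(is_allowed("clerk", "contract", "update"))             # False
  print(is_allowed("records_manager", "contract", "update"))   # True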

When companies select an ERM to meet compliance requirements, they should ask the following questions:

  1. Does the software meet the organization’s needs? For example, can it be installed on the existing network? Can it be purchased as a service?
  2. Is the software easy to use and accessible from Web browsers, office applications, and e-mail applications? If not, people will not use it.
  3. Does the software have lightweight, modern Web and graphical user interfaces that effectively support remote users?
  4. Before selecting a vendor, it is important to examine workflows and how data, documents, and communications flow throughout the company. For example, know which information on documents is used in business decisions. Once those needs and requirements are identified, they guide the selection of technology that can support the input types—that is, capture and index them so they can be archived consistently and retrieved on-demand.

IT at Work 3.5 describes how several companies currently use ERM. Simply creating backups of records is not sufficient because the content would not be organized and indexed to retrieve them accurately and easily. The requirement to manage records—regardless of whether they are physical or digital—is not new.

Concept Check 3.5

  1. Managing electronic records is a(n) ______________ issue for organizations in both the public and private sectors.
a. Operational
b. Managerial
c. Strategic
d. Statistical
Correct or Incorrect?

  2. An electronic records management system can help a business become more efficient and productive by:
a. Enabling the company to access and use content contained in documents
b. Raising labor costs by automating business processes
c. Limiting the security of content to reduce risk of intellectual property theft
d. Maximizing printing, storing, and search costs of accessing content
Correct or Incorrect?

  3. Electronic records management systems can assist with
a. Disaster recovery
b. Business continuity
c. Regulatory compliance
d. All of the above
Correct or Incorrect?

  4. Creating document backups is a(n) _________________ way to manage an organization’s documents.
a. Difficult
b. Efficient
c. Insufficient
d. Easy
Correct or Incorrect?

Key Terms

active data warehouse (ADW)

big data

big data analytics

business analytics

business intelligence (BI)

business record

business-driven development approach

centralized database

change data capture (CDC)

data analytics

data entity

data management

data marts

data mining

data warehouse

database

database management system (DBMS)

decision model

declarative language

dirty data

distributed database

electronic records management system (ERMS)

extract, transform and load (ETL)

enterprise data warehouses (EDWs)

eventual consistency

fault tolerance

Hadoop

information overload

immediate consistency

latency

MapReduce

master data management (MDM)

NoSQL

online transaction processing (OLTP) systems

online analytical processing (OLAP) systems

petabyte

query

relational database

relational database management systems (RDBMSs)

sentiment analysis

scalability

structured query language (SQL)

text mining

Assuring Your Learning

References

  1. Bing, C. “Data Mining Software Used by Spy Agencies just got more Powerful.” FedScoop, June 21, 2016.
  2. Burns, E. “Coca-Cola Overcomes Challenges to Seize BI Opportunities.” TechTarget.com. August 2013.
  3. Burns, E. “Text Analysis Tool Helps Lenovo Zero in on the Customer.” Business Analytics, April 8, 2016.
  4. BusinessIntelligence.com. “Coca-Cola’s Juicy Approach to Big Data.” July 29, 2013b. http://businessintelligence.com/bi-insights/coca-colas-juicy-approach-to-big-data
  5. Cattell, J., S. Chilukuri, and M. Levy. “How Big Data Can Revolutionize Pharmaceutical R&D.” 2016. http://www.mckinsey.com/industries/pharmaceuticals-and-medical-products/our-insights/how-big-data-can-revolutionize-pharmaceutical-r-and-d
  6. CNNMoney, The Coca-Cola Co (NYSE:KO) 2014.
  7. Columbus, L. “Ten Ways Big Data Is Revolutionizing Marketing and Sales.” Forbes, May 9, 2016.
  8. Eisenhauer, T. “The Undeniable Benefits of Having a Well-Designed Document Management System.” Axero Solutions, August 5, 2015.
  9. FirstWind website www.firstwind.com, 2017.
  10. Forbes. “Betting on Big Data.” 2015.
  11. Hammond, T. “Top IT Job Skills for 2014: Big Data, Mobile, Cloud, Security.” TechRepublic.com, January 31, 2014.
  12. Harle, P., A. Havas, and H. Samandari. “The Future of Bank Risk Management.” McKinsey & Company, July 2016.
  13. Harvard Business School. “How Coca-Cola Controls Nature’s Oranges.” November 22, 2015.
  14. HealthCanal. “Where Do You Start When Developing a New Medicine?” March 27, 2014.
  15. IDC. “Explosive Internet of Things Spending to Reach $1.7 Trillion in 2020, According to IDC.” June 02, 2015.
  16. King, L. “How Business Intelligence Helps Small Businesses Make Better Decisions.” Huffington Post, July 28, 2016.
  17. Kitamura, M. “Big Data Partnerships Tackle Drug Development Failures.” Bloomberg News, March 26, 2014.
  18. Kramer, S. “The High Costs of Dirty Data.” Digitalist, May 1, 2015.
  19. Ladley, J. “Business Alignment Techniques for Successful and Sustainable Analytics.” CIO, May 13, 2016.
  20. Liyakasa, K. “Coke Opens Data-Driven Happiness, Builds Out Marketing Decision Engine.” Ad Exchanger, October 14, 2015.
  21. McDonald’s website 2017. https://www.mcdonalds.com/us/en-us/about-us/our-history.html
  22. NIH (National Institute of Health). “Accelerating Medicines Partnership.” February 2014. http://www.nih.gov/science/amp/index.htm
  23. Oliphant, T. “How to Make Big Data Insights Work for You.” Business Intelligence, February 24, 2016.
  24. Ovalsrud, T. “Big Data and Analytics Spending to Hit $1.87 billion.” CIO, May 24, 2016.
  25. Ransbothom, S. “Coca-Cola’s Unique Challenge: Turning 250 Datasets into One.” MIT Sloan Management Review, May 27, 2015.
  26. RingLead, Inc. “The True Cost of Bad (And Clean) Data.” July 17, 2015.
  27. syntheses.net 2017.
  28. The Data Warehousing Institute (TDWI). tdwi.org/portals/business-intelligence.asp. 2014.
  29. U.S. Department of Energy. “Wind Vision: A New Era for Wind Power in the United States.” http://energy.gov/eere/wind/maps/wind-vision, March 12, 2015.
  30. Van Rijmenam, M. “From Big Data to Big Mac; how McDonalds leverages Big Data.” DataFloq.com, August 15, 2016.
  31. Van Rijmenam, M. “How Coca-Cola Takes a Refreshing Approach on Big Data.” DataFloq, July 18, 2016.
  32. Wise, L. “Evaluating Business Intelligence in the Cloud.” CIO, March 9, 2016.