Chapter 1

Data Mining in Business

Introduction

Data mining refers to the analysis of large quantities of data that are stored in computers. Bar coding has made checkout very convenient for us and provides retail establishments with masses of data. Grocery stores and other retail stores are able to quickly process our purchases and use computers to accurately determine the product prices. These same computers can help the stores with their inventory management by instantaneously determining the quantity of each product on hand. Computers allow the store’s accounting system to more accurately measure costs and determine the profit that store stockholders are concerned about. All of this information is available based on the bar coding information attached to each product. Along with many other sources of information, information gathered through bar coding can be used for data mining analysis.

The era of big data is here, with many sources pointing out that more data were created over the past year or two than were generated throughout all prior human history. Big data involves datasets so large that traditional data analytic methods no longer work due to data volume. Davenport1 gave the following features of big data:

  • Data too big to fit on a single server
  • Data too unstructured to fit in a row-and-column database
  • Data flowing too continuously to fit into a static data warehouse
  • Lack of structure is the most important aspect (even more than the size)
  • The point is to analyze, converting data into insights, innovation, and business value

Big data has been said to be more about analytics than about the data itself. The era of big data is expected to emphasize knowing what (based on correlation) rather than the traditional obsession with causality. The emphasis will be on discovering patterns offering novel and useful insights.2 Data will become a raw material for business, a vital economic input and source of value. Cukier and Mayer-Schoenberger3 cite the following impacts of big data on the body of statistical theory established in the 20th century: (1) There is so much data available that sampling is usually not needed (n = all). (2) Precise accuracy of data is thus less important, as inevitable errors are compensated for by the mass of data (any one observation is flooded by others). (3) Correlation is more important than causality: most data mining applications involving big data are interested in what is going to happen, and you don’t need to know why. Automatic trading programs need to detect trend changes, not figure out that the Greek economy collapsed or that the Chinese government will devalue the Renminbi (RMB). The programs in vehicles need to detect that an axle bearing is getting hot, the vehicle is vibrating, and the wheel should be replaced, not whether this is due to a bearing failure or a housing rusting out.

There are many sources of big data.4 Internal to the corporation, e-mails, blogs, enterprise systems, and automation lead to structured, unstructured, and semistructured information within the organization. External data is also widely available, much of it free over the Internet, but much also available from commercial vendors. Data is also obtainable from social media.

Data mining is not limited to business. Both major parties in the U.S. elections utilize data mining of potential voters.5 Data mining has been heavily used in the medical field, from diagnosis based on patient records to the identification of best practices.6 Business use of data mining is also impressive. Toyota used data mining of its data warehouse to determine more efficient transportation routes, reducing the time to deliver cars to their customers by an average of 19 days. Data warehouses are very large scale database systems capable of systematically storing all transactional data generated by a business organization, such as Walmart. Toyota also was able to identify sales trends faster and to identify the best locations for new dealerships.

Data mining is widely used by banking firms in soliciting credit card customers, by insurance and telecommunication companies in detecting fraud, by manufacturing firms in quality control, and in many other applications. Data mining is being applied to improve food product safety, criminal detection, and tourism. Micromarketing targets small groups of highly responsive customers. Consumer and lifestyle data is widely available, enabling customized individual marketing campaigns. This is enabled by customer profiling, identifying those subsets of customers most likely to be profitable to the business, as well as targeting, determining the characteristics of the most profitable customers.

Data mining involves statistical and artificial intelligence (AI) analysis, usually applied to large-scale datasets. There are two general types of data mining studies. Hypothesis testing involves expressing a theory about the relationship between actions and outcomes. This approach is referred to as supervised. In a simple form, it can be hypothesized that advertising will yield greater profit. This relationship has long been studied by retailing firms in the context of their specific operations. Data mining is applied to identifying relationships based on large quantities of data, which could include testing the response rates to various types of advertising on the sales and profitability of specific product lines. However, there is more to data mining than the technical tools used. The second form of data mining study is knowledge discovery. Data mining involves a spirit of knowledge discovery (learning new and useful things). Knowledge discovery is referred to as unsupervised. In this form of analysis, a preconceived notion may not be present, but rather relationships can be identified by looking at the data. This may be supported by visualization tools, which display data, or through fundamental statistical analysis, such as correlation analysis. Much of this can be accomplished through automatic means, as we will see in decision tree analysis, for example. But data mining is not limited to automated analysis. Knowledge discovery by humans can be enhanced by graphical tools and identification of unexpected patterns through a combination of human and computer interaction.

Requirements for Data Mining

Data mining requires identification of a problem, along with the collection of data that can lead to better understanding, and computer models to provide statistical or other means of analysis. A variety of analytic computer models have been used in data mining. In the later sections, we will discuss various types of these models. Also required is access to data. Quite often, systems including data warehouses and data marts are used to manage large quantities of data. Other data mining analyses are done with smaller sets of data, such as can be organized in online analytic processing systems.

Masses of data generated from cash registers, scanning, and topic-specific databases throughout the company are explored, analyzed, reduced, and reused. Searches are performed across different models proposed for predicting sales, marketing response, and profit. The classical statistical approaches are fundamental to data mining. Automated AI methods are also used. However, a systematic exploration through classical statistical methods is still the basis of data mining. Some of the tools developed by the field of statistical analysis are harnessed through automatic control (with some key human guidance) in dealing with data.

Data mining tools need to be versatile, scalable, capable of accurately predicting the responses between actions and results, and capable of automatic implementation. Versatile refers to the ability of the tool to apply a wide variety of models. Scalable implies that if a tool works on a small dataset, it should also work on a larger dataset. Automation is useful, but its application is relative. Some analytic functions are often automated, but human setup prior to implementing procedures is required. In fact, analyst judgment is critical to successful implementation of data mining. Proper selection of data to include in searches is critical. Data transformation also is often required. Too many variables produce too much output, while too few can overlook the key relationships in the data.

Data mining is expanding rapidly, with many benefits to business. Two of the most profitable application areas have been the use of customer segmentation by marketing organizations to identify those with marginally greater probabilities of responding to different forms of marketing media, and the use of data mining by banks to more accurately predict the likelihood that people will respond to offers of different services. Many companies are using this technology to identify their blue-chip customers, so that they can provide them with the service needed to retain them.

The casino business has also adopted data warehousing and data mining. Historically, casinos have wanted to know everything about their customers. A typical application for a casino is to issue special cards, which are used whenever the customer plays at the casino, or eats, or stays, or spends money in other ways. The points accumulated can be used for complimentary meals and lodging. More points are awarded for activities that provide Harrah’s more profit. The information obtained is sent to the firm’s corporate database, where it is retained for several years. Instead of advertising the loosest slots in town, Bellagio and Mandalay Bay have developed the strategy of promoting luxury visits. Data mining is used to identify high rollers, so that these valued customers can be cultivated. Data warehouses enable casinos to estimate the lifetime value of the players. Incentive travel programs, in-house promotions, corporate business, and customer follow-up are the tools used to maintain the most profitable customers. Casino gaming is one of the richest datasets available. Very specific individual profiles can be developed. Some customers are identified as those who should be encouraged to play longer. Other customers are identified as those who should be discouraged from playing.

Business Data Mining

Data mining has been very effective in many business venues. The key is to find actionable information or information that can be utilized in a concrete way to improve profitability. Some of the earliest applications were in retailing, especially in the form of market basket analysis. Table 1.1 shows the general application areas we will be discussing. Note that they are meant to be representative rather than comprehensive.


Table 1.1 Data mining application areas

| Application area | Applications | Specifics |
|---|---|---|
| Retailing | Affinity positioning; cross-selling; develop and maintain customer loyalty | Position products effectively; find more products for customers |
| Banking | Customer relationship management (CRM) | Identify customer value; develop programs to maximize the revenue |
| Credit card management | Lift; churn (loyalty) | Identify effective market segments; identify likely customer turnover |
| Insurance | Fraud detection | Identify claims meriting investigation |
| Telecommunications | Churn | Identify likely customer turnover |
| Telemarketing | Online information; recommender systems | Aid telemarketers with easy data access |
| Human resource management | Churn (retention) | Identify potential employee turnover |



Retailing

Data mining offers retailers in general, and grocery stores specifically, valuable predictive information from mountains of data. Affinity positioning is based on the identification of products that the same customer is likely to want. For instance, if you are interested in cold medicine, you probably are interested in tissues. Thus, it would make marketing sense to locate both items within easy reach of each other. Cross-selling is a related concept. The knowledge of products that go together can be used to market the complementary product. Grocery stores do that through product shelf positioning. Retail stores relying on advertising can send ads for sales on shirts and ties to those who have recently purchased suits. These strategies have long been employed by wise retailers. Recommender systems are effectively used by Amazon and other online retailers. Data mining provides the ability to identify less expected product affinities and cross-selling opportunities. These actions develop and maintain customer loyalty.
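As a minimal sketch of how product affinities might be identified, the Python below counts how often pairs of products appear in the same basket (support) and how often one product accompanies another (confidence). The baskets and product names are invented for illustration, and `pair_support` and `confidence` are hypothetical helper names rather than functions from any particular data mining package.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; each set is one customer's checkout.
baskets = [
    {"cold medicine", "tissues", "orange juice"},
    {"cold medicine", "tissues"},
    {"tissues", "bread"},
    {"cold medicine", "orange juice"},
    {"bread", "milk"},
]

def pair_support(baskets):
    """Return the share of baskets that contain each product pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(baskets)
# Show the three most frequent pairs, ties broken alphabetically.
for pair, share in sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(pair, round(share, 2))
print(round(confidence(baskets, "cold medicine", "tissues"), 2))
```

In this toy data, cold medicine and tissues share two of the five baskets, and two of the three cold medicine baskets also contain tissues. Real cash register data would call for far more baskets and a minimum-support cutoff before any shelf decision is made.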

Grocery stores generate mountains of cash register data that require automated tools for analysis. Software is marketed to service a spectrum of users. In the past, it was assumed that cash register data was so massive that it couldn’t be quickly analyzed. However, the current technology enables the grocers to look at customers who have defected from a store, their purchase history, and characteristics of other potential defectors.

Banking

The banking industry was one of the first users of data mining. Banks are turning to technology to find out what motivates their customers and what will keep their business (customer relationship management, or CRM). CRM involves the application of technology to monitor customer service, a function that is enhanced through data mining support. Understanding the value a customer provides the firm makes it possible to rationally evaluate if extra expenditure is appropriate in order to keep the customer. There are many opportunities for data mining in banking. Data mining applications in finance, such as predicting the prices of equities, involve a dynamic environment with surprise information, some of which might be inaccurate and some of which might be too complex to comprehend and reconcile with intuition.

Data mining provides a way for banks to identify patterns. This is valuable in assessing loan applications as well as in target marketing. Credit unions use data mining to track member profitability as well as to monitor the effectiveness of marketing programs and sales representatives. It is also used in member care, seeking to identify what credit union customers want in the way of services.

Credit Card Management

The credit card industry has proven very profitable. It has attracted many card issuers, and many customers carry four or five cards. Balance surfing is a common practice, where the card user pays an old balance with a new card. These are not considered attractive customers, and one of the uses of data warehousing and data mining is to identify balance surfers. The profitability of the industry has also attracted those who wish to push the edge of credit risk, both from the customer and the card issuer perspective. Bank credit card marketing promotions typically generate 1,000 responses to mailed solicitations, a response rate of about 1 percent. This rate is improved significantly through data mining analysis.

Data mining tools used by banks include credit scoring. Credit scoring is a quantified analysis of credit applicants with respect to the prediction of on-time loan repayment. A key is a consolidated data warehouse, covering all products, including demand deposits, savings, loans, credit cards, insurance, annuities, retirement programs, securities underwriting, and every other product banks provide. Credit scoring provides a number for each applicant by multiplying a set of weights determined by the data mining analysis by the ratings for that applicant. These credit scores can be used to make accept or reject recommendations, as well as to establish the size of a credit line. Credit scoring used to be conducted by bank loan officers, who considered a few tested variables, such as employment, income, age, assets, debt, and loan history. Data mining makes it possible to include many more variables, with greater accuracy.
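The weight-times-rating idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any bank's actual scorecard: the variable names, weights, 0-to-10 rating scale, and cutoff are all invented, and adding up the weighted ratings into one number is assumed here as the usual scorecard form.

```python
# Hypothetical weights, standing in for the output of a data mining analysis.
weights = {"income": 0.4, "employment_years": 0.25, "debt_ratio": -0.35}

def credit_score(ratings, weights):
    """Multiply each rating by its weight and add up the products."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)

def recommend(score, cutoff=2.0):
    """Turn a score into an accept/reject recommendation (cutoff is assumed)."""
    return "accept" if score >= cutoff else "reject"

# Hypothetical applicant rated on an assumed 0-to-10 scale per variable.
applicant = {"income": 8, "employment_years": 6, "debt_ratio": 3}
score = credit_score(applicant, weights)
print(round(score, 2), recommend(score))
```

The same score could also be mapped to a credit line size by using several cutoffs instead of one.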

The new wave of technology is broadening the application of database use and targeted marketing strategies. In the early 1990s, nearly all credit card issuers were mass-marketing to expand their card-holder bases. However, with so many cards available, broad-based marketing campaigns have not been as effective as they initially were. Card issuers are more carefully examining the expected net present value of each customer. Data warehouses provide the information, giving the issuers the ability to try to more accurately predict what the customer is interested in, as well as their potential value to the issuer. Desktop campaign management software is used by the more advanced credit card issuers, utilizing data mining tools, such as neural networks, to recognize customer behavior patterns to predict their future relationship with the bank.
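The text does not say how issuers compute expected net present value, so the sketch below simply applies the standard discounting formula to a stream of expected yearly profits from one customer. The function name `customer_npv`, the cash flows, the discount rate, and the acquisition cost are all hypothetical.

```python
def customer_npv(expected_cash_flows, discount_rate, acquisition_cost=0.0):
    """Discount expected yearly cash flows (received at the end of years 1, 2, ...)
    back to today and subtract the upfront acquisition cost."""
    present_value = sum(
        cash_flow / (1 + discount_rate) ** year
        for year, cash_flow in enumerate(expected_cash_flows, start=1)
    )
    return present_value - acquisition_cost

# Hypothetical customer: expected profit of 110 then 121, discounted at 10 percent,
# after spending 150 on the solicitation campaign.
npv = customer_npv([110, 121], discount_rate=0.10, acquisition_cost=150)
print(round(npv, 2))
```

A customer with a negative value under such a calculation would cost more to acquire than the issuer expects to earn back.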

Insurance

The insurance industry utilizes data mining for marketing, just as retailing and banking organizations do. But they also have specialty applications. Farmers Insurance Group has developed a system for underwriting, which generates millions of dollars in higher revenues and lower claims. The system allows the firm to better understand narrow market niches and to predict losses for specific lines of insurance. One discovery was that the firm could lower its rates on sports cars, which increased its market share for this product line significantly.

Unfortunately, our complex society leads to some inappropriate business operations, including insurance fraud. Specialists in this underground industry often use multiple personas to bilk insurance companies, especially in the automobile insurance environment. Fraud detection software uses a similarity search engine, analyzing information in company claims for similarities. By linking names, telephone numbers, streets, birthdays, and other information with slight variations, patterns can be identified, indicating a fraud. The similarity search engine has been found to be able to identify up to seven times more fraud than the exact-match systems.
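The contrast between exact matching and similarity search can be illustrated with Python's standard `difflib` module. This is only a sketch: commercial fraud detection engines are not described in enough detail here to reproduce, the claim records are invented, and the 0.8 threshold and the simple averaging across fields are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical claim records; all names, numbers, and addresses are invented.
claims = [
    {"id": 1, "name": "John A. Smith", "phone": "555-0142", "street": "12 Oak Street"},
    {"id": 2, "name": "Jon Smith", "phone": "555-0142", "street": "12 Oak St."},
    {"id": 3, "name": "Maria Lopez", "phone": "555-0199", "street": "80 Pine Avenue"},
]

FIELDS = ("name", "phone", "street")

def similarity(a, b):
    """Average character-level similarity (0 to 1) across the compared fields."""
    ratios = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in FIELDS]
    return sum(ratios) / len(ratios)

def similar_pairs(claims, threshold=0.8):
    """Return id pairs whose average similarity meets the assumed threshold."""
    pairs = []
    for i, a in enumerate(claims):
        for b in claims[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

# Exact matching requires every field to be identical.
exact_pairs = [
    (a["id"], b["id"])
    for i, a in enumerate(claims)
    for b in claims[i + 1:]
    if all(a[f] == b[f] for f in FIELDS)
]
print(exact_pairs)
print(similar_pairs(claims))
```

With these records, exact matching links nothing, while the similarity search pairs claims 1 and 2, whose name and street differ only slightly. Comparing every pair of claims grows quadratically, so a production system would need some form of indexing or blocking.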

Telecommunications

Deregulation of the telephone industry has led to widespread competition. Telephone service carriers fight hard for customers. The problem is that once a customer is obtained, that customer is targeted by competitors, and retention is very difficult. The phenomenon of a customer switching carriers is referred to as churn, a fundamental concept in telemarketing as well as in other fields.

A director of product marketing for a communications company considered that one-third of churn is due to poor call quality and up to one-half is due to poor equipment. That firm has a wireless telephone performance monitor that tracks telephones with poor performance. This system reduced churn by an estimated 61 percent, amounting to about 3 percent of the firm’s overall subscribers over the course of a year. When a telephone begins to go bad, the telemarketing personnel are alerted to contact the customer and suggest bringing in the equipment for service.

Another way to reduce churn is to protect customers from subscription and cloning fraud. Cloning has been estimated to have cost the wireless industry millions. A number of fraud prevention systems are marketed. These systems provide verification that is transparent to the legitimate subscribers. Subscription fraud has been estimated to have an economic impact of $1.1 billion. Deadbeat accounts and service shutoffs are used to screen potentially fraudulent applicants.

Churn is a concept that is used by many retail marketing operations. Banks widely use churn information to drive their promotions. Once data mining identifies customers by characteristic, direct mailing and telemarketing are used to present the bank’s promotional program. The mortgage market has seen massive refinancing in a number of periods. Banks were quick to recognize that they needed to keep their mortgage customers happy if they wanted to retain their business. This has led to banks contacting the current customers if those customers hold a mortgage at a rate significantly above the market rate. While they may cut their own lucrative financial packages, banks realize that if they don’t offer a better service to borrowers, a competitor will.

Human Resource Management

Business intelligence is a way to truly understand markets, competitors, and processes. Software technologies such as data warehouses, data marts, online analytical processing (OLAP), and data mining make it possible to sift through data in order to spot trends and patterns that can be used by the firm to improve profitability. In the human resources field, this analysis can lead to the identification of individuals who are liable to leave the company unless additional compensation or benefits are provided.

Data mining can be used to expand upon things that are already known. A firm might know that 20 percent of its employees use 80 percent of services offered, but may not know which particular individuals are in that 20 percent. Business intelligence provides a means of identifying segments, so that programs can be devised to cut costs and increase productivity. Data mining can also be used to examine the way in which an organization uses its people. The question might be whether the most talented people are working for those business units with the highest priority or where they will have the greatest impact on profit.
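Finding out which individuals make up the heavy-use segment is a simple ranking exercise once usage is recorded per employee. The Python below is a sketch with invented employee identifiers and usage counts; `heavy_users` is a hypothetical helper, and the 80 percent share is a parameter rather than a fixed rule.

```python
def heavy_users(usage, share=0.8):
    """Return the smallest group of top users that accounts for `share` of total usage."""
    total = sum(usage.values())
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0
    for employee, amount in ranked:
        if running >= share * total:
            break  # the group already covers the requested share
        selected.append(employee)
        running += amount
    return selected

# Hypothetical counts of services used per employee over one year.
usage = {"E01": 50, "E02": 32, "E03": 4, "E04": 4, "E05": 3,
         "E06": 2, "E07": 2, "E08": 1, "E09": 1, "E10": 1}

group = heavy_users(usage)
print(group, len(group) / len(usage))
```

In this made-up data, two of the ten employees account for more than 80 percent of usage, so programs aimed at cost or productivity could start with that segment.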

Companies are seeking to stay in business with fewer people. Sound human resource management would identify the right people, so that organizations could treat them well to retain them (reduce churn). This requires tracking key performance indicators and gathering data on talents, company needs, and competitor requirements.

Summary

The era of big data is here, flooding businesses with numbers, text, and often more complex data forms, such as videos or pictures. Some of this data is generated internally, through enterprise systems or other software tools to manage a business’s information. Data mining provides a tool to utilize this data. This chapter reviewed the basic applications of data mining in business, including customer profiling, fraud detection, and churn analysis. These will all be explored in greater depth in Chapter 2. Here, our intent is to provide an overview of what data mining is useful for in business.

The process of data mining relies heavily on information technology, in the form of data storage support (data warehouses, data marts, or OLAP tools) as well as software to analyze the data (data mining software). However, the process of data mining is far more than simply applying these data mining software tools to a firm’s data. Intelligence is required on the part of the analyst in selection of model types, in selection and transformation of the data relating to the specific problem, and in interpreting results.
