Data mining modeling in InfoSphere Warehouse Design Studio

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

380 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition

– The output column designates the column name in the output table into

which the results of the transformation of this feature is written. The output

table is the input into the data mining model.

10.2.3 Data mining modeling in InfoSphere Warehouse Design Studio

After the source data has been prepared, the data mining model can be

developed and tested. Recall that it is best to develop the mining model using a

subset of the historical data available, saving an equal portion of the historical

data (with known outcomes) for validation. InfoSphere Warehouse 10.1 Design

Studio has two primary routes to develop the data mining model:

???? The first route is to use the

mining flow editor in Design Studio to manually

build the flow of information from the source data tables into the data mining

method operators and finally into an output target. Outputs can be tables or

columns, or they can also be visualization methods. Mining flows are

conceptually and practically similar to the SQL Warehousing (SQW) data

flows described in Chapter 4, “Data modeling: End to end” on page 81.

???? The second route is to use the solution plan wizards that have been defined

for a subset of common data mining solutions. The wizard prompts the

developer for information that will be used to produce the same mining flows

as though they were developed manually, but the interface is more business

analyst-oriented, seeking input in terms of the business analysis being

performed instead of in terms of tables and columns.

For a more thorough discussion of the data mining modeling process and tooling

in InfoSphere Warehouse Design Studio, see InfoSphere Warehouse: A Robust

Infrastructure for Business Intelligence, SG24-7813. We consider each of these

approaches at summary level in this document.

Chapter 10. Techniques for data mining in an operational warehouse 381

Data mining flow editor

InfoSphere Warehouse 10.1 Design Studio provides the data mining flow editor

to design and test data mining models in a manner that is similar to SQL

Warehousing data flows. Figure 10-13 shows an example of a data mining flow.

Figure 10-13 Data mining flow in Design Studio

Notice several items of interest in this mining flow:

???? On the left side of the graphic there are several Table Source (extract)

operators, such as Customer.

???? These operators all feed into a multiway Join operator.

???? The output of the Join operator feeds into a Select operator. The Select

operator reduces from all of the columns in all the input tables down to only

those columns required in the Clusterer operator.

???? The Clusterer operator takes the input columns and performs the cluster

mining method. The output of the Cluster operator goes into three operators

in parallel.

???? One of the operators that follows the Clusterer operator is the Visualizer. The

Visualizer is most useful during model design and ad hoc mining operations.

There are distinct visualizers for each data mining method. Each one is

tailored to graphically display the results of the mining function. We saw

examples of the cluster visualization during the discussion of data exploration

and univariate and bivariate analysis.

???? Another operator that follows the Clusterer operator is the Cluster Extractor.

This operator extracts clusters from the Clusterer operator for insertion into a

table for further analysis. It is the equivalent of performing a SELECT

382 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition

statement on a table. These clusters are merely the descriptions of the

clusters that were found in the data set.

???? The output of the Cluster Extractor is a Table Target, which is a table that

collects all of the cluster detail information.

???? The third output of the Clusterer operator is the Scorer operator. The Scorer

operator takes the Cluster model as one input and the data set (table) as a

second input. The result is a mapping of customers in the dataset to their

respective cluster. The scoring operator is what enables the model to be

applied to new data records.

???? The final output of the Scorer operator is another Table Target operator that

collects the results of the Scorer operator, a mapping of customers to

clusters.

Data mining modeling revisited

We previously provided an example data mining flow for the clustering method.

In this section, we revisit the key aspects of the modeling process. This process

is repeated for each mining method. The general steps of the process are as

follows:

1. Extract the source data; this is shown in Figure 10-13 on page 381 as multiple

tables sources feeding a multiway join operation.

2. The model operator (Clusterer, for example) operates on the input source

data and generates output that can be processed in several ways.

3. The visualization operator can be used in two primary ways:

– As a means of evaluating and validating the data mining model results for

“correctness.” It can aid the developer in the process of tuning the mining

model for effectiveness.

– As a means of communicating the model results for use by users and

business analysts.

The visualizer can provide an easy-to-grasp view of the results that allow

important or anomalous results to stand out against the “noise” of extensive

results.

There are visualizers for each data mining method in Design Studio, and they

tend to flow from the mining operator.

4. The extractor operation simply distills the model results into a form suitable

for insertion into a database table for standard relational query and reporting.

This is an easy way to make the mining results available for integration into

wider BI applications and reporting solutions, integrating with other data

sources.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Data mining modeling in InfoSphere Warehouse Design Studio

Create new playlist

Sign In

Sign Up

Table of Contents for
Data mining modeling in InfoSphere Warehouse Design Studio