382 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
statement on a table. These clusters are merely the descriptions of the
clusters that were found in the data set.
• The output of the Cluster Extractor is a Table Target, which is a table that
collects all of the cluster detail information.
• The third output of the Clusterer operator feeds the Scorer operator. The Scorer
operator takes the cluster model as one input and the data set (table) as a
second input. The result is a mapping of customers in the data set to their
respective clusters. The Scorer operator is what enables the model to be
applied to new data records.
• The output of the Scorer operator goes to another Table Target operator, which
collects the scoring results: a mapping of customers to clusters.
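The scoring step can be illustrated outside Design Studio with a minimal sketch. It assumes a cluster model that is simply a set of centroids and uses nearest-centroid assignment as a stand-in for whatever similarity measure the actual mining model applies. The names CLUSTER_MODEL, score, and score_table, and the sample customer records, are hypothetical and are not part of InfoSphere Warehouse.

```python
import math

# Hypothetical cluster model: cluster ID -> centroid (age, annual_spend).
# In a mining flow, the model would come from the Clusterer operator.
CLUSTER_MODEL = {
    1: (25.0, 500.0),
    2: (55.0, 4000.0),
}

def score(record, model):
    """Return the ID of the cluster whose centroid is closest to the record."""
    return min(model, key=lambda cid: math.dist(record, model[cid]))

def score_table(rows, model):
    """Map each (customer_id, features) row to (customer_id, cluster_id)."""
    return [(cust_id, score(features, model)) for cust_id, features in rows]

# Hypothetical new customer records that were not used to build the model.
new_customers = [
    ("C001", (23.0, 450.0)),
    ("C002", (60.0, 3800.0)),
]

# The resulting list plays the role of the rows written to the Table Target.
mapping = score_table(new_customers, CLUSTER_MODEL)
```

The two inputs of score_table mirror the two inputs of the Scorer operator: the model and the table of records to be scored.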
Data mining modeling revisited
We previously provided an example data mining flow for the clustering method.
In this section, we revisit the key aspects of the modeling process. This process
is repeated for each mining method. The general steps of the process are as
follows:
1. Extract the source data; this is shown in Figure 10-13 on page 381 as multiple
table sources feeding a multiway join operation.
2. The model operator (Clusterer, for example) operates on the input source
data and generates output that can be processed in several ways.
3. The visualization operator can be used in two primary ways:
– As a means of evaluating and validating the data mining model results for
“correctness.” It can aid the developer in the process of tuning the mining
model for effectiveness.
– As a means of communicating the model results to users and
business analysts.
The visualizer can provide an easy-to-grasp view of the results that allows
important or anomalous results to stand out against the “noise” of extensive
results.
There are visualizers for each data mining method in Design Studio, and each
typically takes its input from the corresponding mining operator.
4. The extractor operation simply distills the model results into a form suitable
for insertion into a database table for standard relational query and reporting.
This is an easy way to make the mining results available for integration into
wider BI applications and reporting solutions, integrating with other data
sources.
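As a rough illustration of the extractor step, the following sketch flattens per-cluster model details into rows of a relational table and then queries them with ordinary SQL. It is not the Design Studio operator itself: an in-memory SQLite database stands in for the warehouse database, and the model_results structure, the cluster_details table, and its column names are all hypothetical.

```python
import sqlite3

# Hypothetical model results, one entry per cluster, standing in for
# the details that a mining operator would produce.
model_results = {
    1: {"size": 120, "avg_age": 25.0, "avg_spend": 500.0},
    2: {"size": 80, "avg_age": 55.0, "avg_spend": 4000.0},
}

def extract_to_table(conn, results):
    """Flatten per-cluster model details into one table row per cluster."""
    conn.execute(
        "CREATE TABLE cluster_details ("
        "cluster_id INTEGER PRIMARY KEY, size INTEGER, "
        "avg_age REAL, avg_spend REAL)"
    )
    conn.executemany(
        "INSERT INTO cluster_details VALUES (?, ?, ?, ?)",
        [(cid, d["size"], d["avg_age"], d["avg_spend"])
         for cid, d in sorted(results.items())],
    )

# SQLite in memory is an assumption made only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
extract_to_table(conn, model_results)

# Once the details are relational, standard SQL reporting applies.
largest = conn.execute(
    "SELECT cluster_id, size FROM cluster_details ORDER BY size DESC LIMIT 1"
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM cluster_details").fetchone()[0]
```

Because the extracted details live in an ordinary table, they can be joined with other tables in the same database, which is what makes them usable in wider reporting.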