Index

A

accuracy criterion 193–194

acquisition cost 417–418

activation functions

about 243, 322

output layer 247

target layer 270–272

Add value 306

adjusted frequencies 441

adjusted probabilities, expected profits using 236

AIC (Akaike Information Criterion) 350–352

Append node 48–50, 116

Arc Tangent function 243–244

Architecture property

about 316

MLP setting 247

Neural Network node 281, 283, 284, 293, 295–297, 298–300, 305

NRBFUN network 303–304

Regression node 389

architectures

alternative built-in 286–307

of neural networks 316

user-specified 305–307

Assessment Measure property 174, 187, 193, 198, 387–389, 396–399

attrition, predicting 384–392

auto insurance industry, predicting risk in 3–4

AutoNeural node 307–309, 314–315, 316

Average method 408

average profit, vs. total profit for comparing tree size 192–193

average squared error 174, 194

B

β, as vector of coefficients 322

Backward Elimination method

about 335

when target is binary 335–337

when target is continuous 338–340

bank deposit products, predicting rate sensitivity of 4–5

bin

See groups

binary split search, splitting nodes using 176–177

binary targets

Backward Elimination method with 335–337

Forward Selection method with 340–342

models for 384–392

with nominal-scaled categorical inputs 135–138

with numeric interval-scaled inputs 129–135

regression models with 321–324

stepwise selection method with 344–345

binning transformations 96–97

Bonferroni Adjustment property 183–184

Boolean retrieval method 427

branch 170

bucket 96

See also groups

business applications

logistic regression for predicting mail campaign response 359–371

of regression models 358–379

C

calculating

Chi-Square statistic for continuous input 113–115

cluster components 64

Cramer's V for continuous input 113–115

eigenvectors 111

misclassification rate/accuracy rate 193–194

principal components 112

residuals 403–404

validation profits 190–192

worth of a tree 173–175

worth of splits 177–182

categorical variables 1–2, 165

child nodes 170

Chi-Square

calculating for continuous input 113–115

criterion for 130–135, 137–138

selection method 73

statistic 52, 53

test for 234

Chi-Square property, StatExplore node 114

class inputs, transformations of 98

Class Inputs property, Transform Variables node 98, 99, 162, 376

class interval

See groups

Class Levels Count Threshold property 19, 20, 108, 109, 164, 250, 267

Cloglog 335

Cluster Algorithm property 451–458

Cluster node 50, 69–72, 451

Cluster Variable Role property, Cluster node 70, 72

Clustering Source property, Variable Clustering node 63

clusters and clustering

assigning variables to 64–65

EM (Expectation-Maximization) 452–458, 460–461

hierarchical 451–459

selecting components 148–150

selecting variables for 140–148

Code Editor property, SAS Code node 103–104

combination functions 243, 270–272

combining

groups 88–90

models 383–413

predictive models 402–411

comparing

alternative built-in architectures of neural networks 286–307

categorical variables with ungrouped variables 165

gradient boosting and ensemble methods 410–411

models 383–413

models generated by DMNeural, AutoNeural, and Dmine Regression nodes 314–315

samples and targets 8

Complementary Log-Log link (Cloglog) 335

continuous input, calculating Chi-Square and Cramer's V for 113–115

continuous targets

Backward Elimination method with 338–340

with Forward Selection method 342–343

with nominal-categorical inputs 124–129

with numeric interval-scaled inputs 119–124

regression for 371–379

regression models with 333

stepwise selection method with 345–347

Correlations property, StatExplore node 55

Cosine function 243

cost of default 418–419

Cramer's V 53–54, 113–115

Cross Validation Error 355

Cross Validation Misclassification rate 355

Cross Validation Profit/Loss criterion 357–358

customer attrition, predicting 6

customer lifetime value 422

customer profitability

about 415–417

acquisition cost 417–418

alternative scenarios of response and risk 422

cost of default 418–419

customer lifetime value 422

extending results 423

optimum cut-off point 421–422

profit 419–421

revenue 419

Cutoff Cumulative property, Principal Components node 92–93

Cutoff Value property

Replacement node 81

Transform Variables node 98

cut-off values 258

D

data

applying decision tree models to prospect 173

pre-processing 8–10

data cleaning 9

data matrix 427–428, 430–431

Data Mining the Web (Markov and Larose) 458

data modification, nodes for

Drop 10, 79–80

Impute 10, 83, 153–154, 360, 386

Interactive Binning 83–90

Principal Components 90–95

Replacement 80–83

Transform Variables (See Transform Variables node)

Data Options dialog box 58

Data Partition node

about 27–28, 29, 249, 250

loss frequency as an ordinal target 269

Partitioning Method property 28, 386

property settings 197

Regression node 360, 372

variable selection 139, 145

variable transformation 153–154

Data Set Allocations property, Data Partition node 28

data sets

applying decision tree models to score 205–208

creating from text files 433–435

scoring using Neural Network models 263–266

scoring with models 277–279

Data Source property, Input Data node 26–27

data sources

changing measurement scale of variables in 164–165

creating 16–25, 37–40, 436

creating for text mining 436

creating for transaction data 37–40

decision 171, 174

decision tree models

about 170–172

accuracy/misclassification criterion 193–194

adjusting predicted probabilities for over-sampling 235–236

applying to prospect data 173

assessing using Average Square Error 194

average profit vs. total profit 192–193

binary split searches 176–177

calculating worth of trees 173–175

compared with logistic regression models 172

controlling growth of trees 185

developing interactively 215–233

developing regression tree model to predict risk 208–215

exercises 236–237

impurity reduction 182–183

measuring worth of splits 177–182

Pearson's Chi-square test 234

for predicting attrition 387–389

predicting response to direct marketing with 195–208

for predicting risk in auto insurance 396–399

pruning trees 185–192

p-value adjustment options 183–185

regression tree 176

roles of training and validation data in 175

in SAS Enterprise Miner 176–195

selecting size of trees 194–195

Decision Tree node

See also decision trees

about 117–118, 163

bins in 97

building decision tree models 187, 195

Interactive property 222, 225, 231–233

Leaf Role property 151

logistic regression 366, 367, 368, 377

in process flow 134

regression models 387–389, 392–401

Regression node 359

variable selection in 121

variable selection using 150–153

decision trees

developing interactively 215–233

growing 233

Decision Weights tab 22–23

Decisions property 187

Decisions tab 23

Default Filtering Method property, Filter node 29

Default Input Method property, Impute node 83

Default Limits Method property, Replacement node 81

default methods 98–100

degree of separation 178–179

depth adjustment 184

depth multiplier 184

Diagram Workspace 15, 16

dimension reduction 429–431

direct mail, predicting response to 2–3

direct marketing, predicting response to 195–208

DMDB procedure 78–79

Dmine Regression node 312–313, 314–315, 316

DMNeural node 309–312, 314–315, 316

documents, retrieving from World Wide Web 432–433

document-term matrix 427–428

Drop from Tables property, Drop node 80

Drop node 10, 79–80

E

EHRadial value 307

Eigenvalue Source property 91

eigenvalues 64, 110–115

eigenvectors 110–115

Elliott function 243–244

EM (Expectation-Maximization) clustering 452–458, 460–461

Ensemble node 384, 402, 407–409, 410–411

Entropy 180–181

Entry Significance Level property, Regression node

Forward Selection method 340–343

regression models 372, 386

Stepwise Selection method 345–347

EQRadial value 307

EQSlopes value 307

error function 248

EVRadial value 307

EWRadial value 307

exercises

decision tree models 236–237

models, combining 412–413

neural network models 318–319

predictive modeling 115–116

regression models 382

textual data, predictive modeling with 461

variable selection 166–167

Expectation-Maximization (EM) clustering 452–458, 460–461

expected losses 423

expected lossfrq 394

explanatory variables 170, 241

Exported Data property

Input Data node 102

Time Series node 43–44

F

false positive fraction 258

File Import node 32–35

Filter node 10, 28–32, 445

Filter Viewer property 445

fine tuning 27

Forward Selection method

about 340

when target is binary 340–342

when target is continuous 342–343

frequency

about 8

adjusted 441

FREQ procedure 54

frequency weighting 440, 441

Frequency Weighting property 444

G

Gauss function 243

Gini Cutoff property, Interactive Binning node 84–85

Gini Impurity Index 180

gradient boosting 402–404

Gradient Boosting node 402, 404–406, 410–411

Graph Explore node 50, 51, 58–61

groups

See also leaf nodes

combining 88–90

splitting 85–88

H

Help Panel 15

Hidden Layer Activation Function property 241, 281, 283, 287–288, 305

Hidden Layer Combination Function property 241, 281, 283, 287–288, 305–306

hidden layers 242–246

Hide property

Regression node 363

Transform Variables node 101, 156, 159, 162

transforming variables 162

Hide Rejected Variables property, Variable Selection node 120, 122

hierarchical clustering 451–459

Huber-M Loss 411

Hyperbolic Tangent function 243–244

I

Identity link 335

Import File property, File Import node 33, 34

Imported Data property, SAS Code node 102–103

impurity reduction

about 53

as measure of goodness of splits 179–180

when target is continuous 182–183

Impute node 10, 83, 153–154, 360, 386

Include Class Variables property, Variable Clustering node 139–140

initial data exploration, nodes for

about 50–51

Cluster 50, 69–72, 451

Graph Explore 50, 51, 58–61

MultiPlot 50, 51, 56–58, 358

StatExplore 10, 50, 51–56, 79, 114, 358

Variable Clustering 50, 61–69, 117–118, 138–150, 163, 352

Variable Selection 50, 72–79, 153–154, 155–157, 163, 164, 359

input 111, 170

Input Data node

about 10, 249, 250

building decision tree models 195–196, 205

Data Source property 26–27

Exported Data property 102

loss frequency as an ordinal target 267

in process flow 91, 101

regression models 385–386, 407

scoring datasets 277

transforming variables 153–154

Input Data Source node 333, 360, 372

input layer 242

Input Standardization property 305

input variables, regression with large number of 11

inputs window 6

Interactive Binning node 83–90

Interactive Binning property, Interactive Binning node 85, 88–89

Interactive property 216

Interactive property, Decision Tree node 222, 225, 231–233

Interactive Selection property, Principal Components node 91, 93

intermediate nodes 130

Interval Criterion property 177–178, 182, 183

interval inputs, transformations for 95–98

Interval Inputs property

Merge node 45–46, 47

Regression node 364

Transform Variables node 95, 96, 98, 155, 159, 162

interval variables 2

Interval Variables property

Filter node 30

StatExplore node 52, 55, 114

inverse link function 322

K

KeepHierarchies property, Variable Clustering node 65

L

Larose, D.T.

Data Mining the Web 458

latent semantic indexing 429–431

leaf nodes 170, 233

See also terminal nodes

Leaf Role property

Decision Tree node 151

Regression node 377

Leaf Rule property 389–390

Leaf Size property 185, 215, 367

Leaf Variable property 389–390

Least Absolute Deviation Loss 411

Least Squares Loss 411

lift 174–175

lift charts 393–394

Linear Combination function 305–306

Linear Regression 333

Linear value 306

link function 321

Link Function property, Regression node 324, 333–335, 348

Logistic function 244

logistic regression

about 333

for predicting attrition 386–387

for predicting mail campaign response 359–371

with proportional odds 394–396

logistic regression models, vs. decision tree models 172

Logit link 322, 334

Logit Loss 412

logworth 178–179

loss frequency 208, 240, 266–279

M

marginal profit 420

marginal revenue 420

Markov, Z.

Data Mining the Web 458

maximal tree 175, 191–192

Maximum Clusters property, Variable Clustering node 62

Maximum Depth property 185, 215, 367

Maximum Eigenvalue property, Variable Clustering node 62, 64

Maximum method 408

Maximum Number of Steps property 345–347

maximum posterior probability/accuracy, classifying nodes by 193

measurement scale 1–2, 107–109

measurement scale, of variables 164–165

Menu Bar 15

Merge node 45–47, 48, 159–162

Merging property, Transform Variables node 161

Metadata Advisor Options window 19

Method property 84, 187, 367

methods

Average 408

Backward Elimination 335–340

Boolean retrieval 427

Chi-Square selection 73

default 98–100

frequency weighting 441

Maximum 408

R-Square selection 72–73

term weighting 441–445

Minimum Chi-Square property, Variable Selection node 73

Minimum property, Cluster node 70

Minimum R-Square property, Variable Selection node 73, 74, 120–121, 157

misclassification criterion 174, 193–194

MLP (Multilayer Perceptron) neural network 279–281, 287–288

Model Comparison node

assessing predictive performance of estimated models 254–258

building decision tree models 204–205, 211, 233

comparing alternative built-in architectures 286

in process flow 139, 149

regression models 391

Regression node 360, 368

variable selection 151

Model Selection Criterion property 248, 249, 250, 273, 389

Model Selection property 386

modeling data, sources of 8

modeling strategies, alternative 10–11

models

See also neural network models

for binary targets 384–392

combining 412–413

comparing and combining 383–413

for ordinal targets 392–401

Multilayer Perceptron (MLP) neural network 279–281, 287–288

Multiple Method property, Transform Variables node 162

MultiPlot node 50, 51, 56–58, 358

N

Network property, Neural Network node 251, 269, 283

neural network models

about 240–241

alternative specifications of 279–286

AutoNeural node 314–315

comparing alternative built-in architectures of Neural Network node 286–309

Dmine Regression node 312–313, 314–315

DMNeural node 309–312, 314–315

estimating weights in 247–249

exercises 318–319

general example of 241–247

nodes for 240–241

for predicting attrition 389–392

predicting loss frequency in auto insurance 266–279

for predicting risk in auto insurance 400–401

scoring data sets using 263–266

target variables for 240

Neural Network node

about 240–241, 316

Architecture property 281, 283, 284, 293, 295–297, 298–300, 305

loss frequency as an ordinal target 268–269

Model Selection Criterion property 273

Multilayer Perceptron (MLP) neural networks 278–281

Normalized Radial Basis Function with Equal Heights and Unequal Widths (NRBFEH) 295–297

Normalized Radial Basis Function with Equal Volumes (NRBFEV) 301–302

Normalized Radial Basis Function with Equal Widths and Heights (NRBFEQ) 292–294

Ordinary Radial Basis Function with Equal Heights and Unequal Widths (ORBFUN) 291–292

Radial Basis Function neural networks in 282–286

regression models 389–390, 392–401

score ranks in Results window 275

scoring datasets 277

selecting optimal weights 261–263

setting properties of 250–254

target layer combination and activation functions 270–272

neural networks

about 316

alternative specifications of 279–286

comparing alternative built-in architectures in 286–307

node definitions 175

Node (Tool) group tabs 15

Node ID property, Transform Variables node 159, 161

nodes

See also Data Partition node

See also Decision Tree node

See also Input Data node

See also Model Comparison node

See also Neural Network node

See also Regression node

See also SAS Code node

See also Transform Variables node

See also Variable Clustering node

See also Variable Selection node

Append 48–50, 116

AutoNeural node 307–309, 314–315, 316

child 170

classifying by maximum posterior probability/accuracy 193

Cluster 50, 69–72, 451

for data modification 79–101

Dmine Regression 312–313, 314–315, 316

DMNeural 309–312, 314–315, 316

Drop 10, 79–80

Ensemble 384, 402, 407–409, 410–411

File Import 32–35

Filter 10, 28–32, 445

Gradient Boosting 402, 404–406, 410–411

Graph Explore 50, 51, 58–61

Impute 10, 83, 153–154, 360, 386

for initial data exploration 50–79

Input Data Source 333, 360, 372

Interactive Binning 83–90

intermediate 130

leaf 170, 233 (See also terminal nodes)

Merge 45–47, 48, 159–162

MultiPlot 50, 51, 56–58, 358

for neural network models 240–241

parent 170

Principal Components 90–95

Replacement 80–83

responder 192

Root 130, 170, 225–231

sample 26–50

Score 205–207, 265, 266, 277

splitting using binary split search 176–177

StatExplore 10, 50, 51–56, 79, 114, 358

Stochastic Boosting 384

terminal 130, 170

Text Cluster 450–461

Text Filter 432, 440–450

Text Import 431, 436

Text Parsing 431, 436–440, 442, 445–450

Text Topic 445–450

Time Series 35–44

Transformation 376

utility 101–107

nominal categorical (unordered polychotomous) target, predicting 7

Nominal Criterion property 177, 180–181

nominal (unordered) target, regression models with 329–332

nominal-categorical inputs, continuous target with 124–129

nominal-scaled categorical inputs, binary target with 135–138

non-responders 234

NRBFEH (Normalized Radial Basis Function with Equal Heights and Unequal Widths) 295–297

NRBFEQ (Normalized Radial Basis Function with Equal Widths and Heights) 292–294

NRBFEV (Normalized Radial Basis Function with Equal Volumes) 300–302

NRBFEW (Normalized Radial Basis Function with Equal Widths and Unequal Heights) 297–300

NRBFUN (Normalized Radial Basis Function with Unequal Widths and Heights) 302–304

Number of Bins property

about 131

StatExplore node 52

Variable Selection node 73

Number of Hidden Units property 279–281, 286, 389, 400–401

number of levels, of variables 107–109

numeric interval-scaled inputs

binary target with 129–135

continuous target with 119–124

O

observation weights 8

observed proportions 170

Offset Value property, Transform Variables node 96

opening SAS Enterprise Miner 12.1 14

operational lag 6

optimal binning 45, 96–97

optimal tree 175

Optimization property 250

optimum cut-off point 421–422

ORBFEQ (Ordinary Radial Basis Function with Equal Heights and Widths) 288–290

ORBFUN (Ordinary Radial Basis Function with Equal Heights and Unequal Widths) 283, 291–292

ordered polychotomous targets

See ordinal targets

Ordinal Criterion property 177, 180–181

ordinal targets

loss frequency as 267–279

models for 392–401

regression models with 324–329

original segment 170

output data sets

created by Time Series node 43–44

developing predictive equations using output data set created by Text Topic node 449–450

output layer 246–247

overriding default methods 99–100

over-sampling, adjusting predicted probabilities for 235–236

P

p weights 284

parent nodes 170

Partitioning Method property, Data Partition node 28, 386

Pearson Correlations property, StatExplore node 55

Pearson's Chi-square test 234

percentage of ranked data (n%) 175

performance window 6, 7

posterior probability

about 170

for leaf nodes from training data 189

of non-response 203

of response 203

Posterior Probability property 408

Predicted Values property 408

predicting

attrition 384–392

customer attrition 6

loss frequency in auto insurance with Neural Network model 266–279

nominal categorical (unordered polychotomous) target 7

rate sensitivity of bank deposit products 4–5

response (See neural network models)

response to direct mail 2–3

response to direct marketing 195–208

risk (See neural network models)

risk in auto insurance industry 3–4

risk of accident 392–401

risk with regression tree models 208–215

predictive equations, developing using output data set created by Text Topic node 449–450

predictive modeling

See also textual data, predictive modeling with

about 14

boosting 402–411

combining 402–411

creating new projects in SAS Enterprise Miner 12.1 14–15

creating process flow diagrams 25–26

creating SAS data sources 16–25

eigenvalues 64, 110–115

eigenvectors 110–115

exercises 115–116

measurement scale 107–109

nodes for data modification 79–101

nodes for initial data exploration 50–79

number of levels of variable 107–109

opening SAS Enterprise Miner 12.1 14

principal components 110–115

sample nodes 26–50

SAS Enterprise Miner window 15–16

type of variable 107–109

utility nodes 101–107

Preliminary Maximum property, Cluster node 70

pre-processing data 8–10

principal components 110–115

Principal Components node 90–95

Prior Probabilities tab 22

probabilities, adjusted 236

Probit link 334

process flow diagrams 25–26, 40

profit 419–421

See also customer profitability

See also validation profit

average vs. total 192–193

marginal 420

Profit/Loss criterion 357

Project Panel 15

projects, creating in SAS Enterprise Miner 12.1 14–15

promotion window 4

properties

See also specific properties

of Neural Network node 250–254

of Regression node 333–358

Properties Panel 15

Proportional Odds model 394–396

pruning trees 175, 185

p-value 52

p-value adjustment options

Bonferroni Adjustment property 183–184

depth adjustment 184

Leaf Size property 185

Split Adjustment property 184

Threshold Significance Level property 184

Time of Kass Adjustment property 184

Q

quantifying textual data 426–428, 431–432

quantile 96

R

rate sensitivity, predicting of bank deposit products 4–5

RBF (Radial Basis Function) neural network 281–286

Receiver Operating Characteristic (ROC) charts 258–261

recursive partitioning 130, 170

regression

for continuous targets 371–379

with large number of input variables 11

regression models

about 321

with binary targets 321–324

business applications 358–379

exercises 382

Regression node properties 333–358

types of models developed using 321–333

Regression node

See also regression models

about 10, 11

Architecture property 389

Chi-Square criterion 138

Data Partition node 360, 372

Decision Tree node 359

Entry Significance Level property 340–343, 345–347, 372, 386

Hide property 363

Interval Inputs property 364

Leaf Role property 377

Link Function property 324, 333–335, 348

predictive modeling 449–450

in process flow 76, 90, 95, 96, 101, 123–124, 129, 133, 139, 143–144

properties of 333–358

regression models 386, 387–389, 392–401

Regression Type property 324, 333, 348

Reject property 363

R-Square criterion 130, 136–137

Selection Model property 144, 335–347, 348, 367, 394, 449–450

testing significance of dummy variables 98

testing variables and transformations 45, 47, 48

Transform Variables node 359

transforming variables 157, 159, 161–162

Variable Clustering node 352

variable selection 145, 146, 148–149, 150, 151, 152

variable selection in 121

Variable Selection property 377, 389–390

Variables property 95, 123–124, 133–134

regression tree 176, 208–215

Regression Type property, Regression node 324, 333, 348

Reject property

Regression node 363

Transform Variables node 101, 156, 159, 162

transforming variables 162

Replacement Editor property, Replacement node 81–82

Replacement node 80–83

research strategy

about 1

alternative modeling strategies 10–11

defining targets 2–8

measurement scales for variables 1–2

pre-processing data 8–10

residuals, calculating 403–404

responder node 192

responders 234

response

alternative scenarios of 422

predicting (See neural network models)

predicting to direct mail 2–3

revenue 419

risk

See also neural network models

alternative scenarios of 422

classifying for rate setting 279

predicting in auto insurance industry 3–4

predicting with regression tree models 208–215

risk rate 415–416

ROC (Receiver Operating Characteristic) charts 258–261

Role property 263

Root node 130, 170, 225–231

R-Square criterion 130, 136–137

R-Square selection method 72–73

S

sample nodes

Append 48–50, 116

Data Partition 27–28, 29, 139, 145, 153–154, 198, 249, 250, 269, 360, 372

File Import 32–35

Filter 10, 28–32, 445

Input Data 10, 26–27, 91, 101, 102, 153–154, 195–196, 205, 249, 250, 267, 277, 385–386, 407

Merge 45–47, 48, 159–162

Time Series 35–44

samples, compared with targets 8

SAS Code node

about 10, 101–107

building decision tree models 207–208

logistic regression 374

predictive modeling 438, 444

score ranks in Results window 275

SAS Enterprise Miner

creating projects in 14–15

data cleaning after launching 9

data cleaning before launching 9

developing decision trees in 176–195

opening 14

window 15–16

SAS Enterprise Miner: Reference Help 284, 437

SBC (Schwarz Bayesian Criterion) 352–353

Score node 205–207, 265, 266, 277

scoring

data sets using Neural Network models 263–266

datasets with models 277–279

showing ranks in Results window 273–276

segments 170

See also leaf nodes

Select an Analysis property, Time Series node 41

Selection Criterion property, Regression node

about 348–350

Akaike Information Criterion (AIC) 350–352

Backward Elimination method 335

cross validation error 355

cross validation misclassification rate 355

Cross Validation Profit/Loss Criterion 357–358

Forward Selection method 341–343

logistic regression 394

predictive modeling with textual data 449–450

Profit/Loss Criterion 357

regression models 365, 367, 376, 386

Schwarz Bayesian Criterion (SBC) 352–353

validation error 353–354

validation misclassification 354

Validation Profit/Loss Criterion 355–356

variable selection 144

Selection Default property 343

Selection Model property, Regression node 144, 335–347, 348, 367, 394, 449–450

Selection Options property 336, 344–345

sensitivity

See true positive fraction

separation, degree of 178–179

Significance Level property 184, 215, 348, 367

simple transformation 96

Sine function 243

Singular Value Decomposition (SVD) 429, 431

sources, of modeling data 8

Spearman Correlations property, StatExplore node 55

specificity

See false positive fraction

Split Adjustment property 184, 215

split point, changing of nominal variables 222–225

Split Size property 185

splits, measuring worth of 177–182

splitting

groups 85–88

nodes using binary split search 176–177

process of 62

Splitting Rule Criterion property 233, 387–389

Splitting Rule Interval Criterion property 209

splitting value 176

StatExplore node 10, 50, 51–56, 79, 114, 358

Status Bar 15, 16

Stay Significance Level property 335–337, 343, 345–347, 372

stepwise selection method

about 343

when target is binary 344–345

when target is continuous 345–347

Stochastic Boosting node 384

stochastic gradient boosting 404

Stop R-Square property, Variable Selection node 73, 74, 121, 157

sub-segments 170

Subtree Assessment Measure property 233

Subtree Method property 198, 377, 387–389, 396–399

SVD (Singular Value Decomposition) 429, 431

SVD Resolution property 451

synthetic variables 246–247

T

Tables to Filter property, Filter node 29

Target Activation Function property 393

target layer 246–247, 270–272

Target Layer Activation Function property 241, 281, 283, 305, 307

Target Layer Combination Function property 241, 270, 281, 283, 305, 307

Target Layer Error Function property 272–273, 281, 283

Target Model property, Variable Selection node 72, 77–78, 78–79, 121–122, 136, 137

target variables, for neural network models 240

targets

See also binary targets

See also continuous targets

See also ordinal targets

compared with samples 8

defining 2–8

maximizing relationship to 96–98

transformations of 98

Targets tab 21

Term Weight property 444

term weighting 440–445

term-document matrix 426–427

terminal nodes 130, 170

test data

roles of in development of decision trees 175

testing model performance with 204–205

Test property, Data Partition node 28, 360

Text Cluster node 450–461

text files, creating SAS data sets from 433–435

Text Filter node 432, 440–450

Text Import node 431, 436

text mining, creating data sources for 436

Text Parsing node 431, 436–440, 442, 445–450

Text Topic node 445–450

textual data, predictive modeling with

about 425–426

creating data sources for text mining 436

creating SAS data sets from text files 433–435

dimension reduction 429–431

exercises 461

latent semantic indexing 429–431

quantifying textual data 426–428

retrieving documents from World Wide Web 432–433

Text Cluster node 450–461

Text Filter node 432, 440–450

Text Import node 431, 436

Text Parsing node 431, 436–440, 442, 445–450

Text Topic node 445–450

Threshold Significance Level property 184

Time of Kass Adjustment property 184

Time Series node 35–44

%TMFILTER macro 432, 435

Toolbar Shortcut Buttons 15, 16

tools

See nodes

Tools Bar 15

total profit, vs. average profit for comparing tree size 192–193

training, of trees 172

training data

developing trees using 188–189

roles of in development of decision trees 175

training data set 233

Training property, Data Partition node 28, 360

transaction data

converting to time series 35–37

creating data sources for 37–40

transform variables, saving code generated by 163

Transform Variables node

See also variable selection

about 95–101, 117–118, 163, 164

Class Inputs property 376

Hide property 101, 156, 159, 162

Interval Inputs property 95, 96, 98, 155, 159, 162

Merging property 161

Multiple Method property 162

Node ID property 159, 161

Offset Value property 96

in process flow 101, 102, 116

Regression node 359

Reject property 101, 156, 159, 162

testing variables and transformations 45, 46–47, 48

transforming variables 155–157, 158, 159, 160–162

transforming variables with 153–155

Variables property 99

Transformation node 376

transformations

after variable selection 157–159

binning 96

of class inputs 98

for interval inputs 95–98

multiple, using Multiple Method property 162

passing more than one for each interval input 159–162

passing two types using Merge node 159–162

simple 96

of targets 98

before variable selection 155–157

of variables 153–163

TRANSPOSE procedure 439

Treat Missing as Level property

Interactive Binning node 84

Regression node 376

trees

about 170

assessing using Average Square Error 194

true positive fraction 258

U

unadjusted probabilities, expected profits using 236

ungrouped variables, compared with categorical variables 165

unordered (nominal) target, regression models with 329–332

Use AOV16 Variables property

Dmine Regression node 313

Variable Selection node 73, 74, 78, 120, 122, 157

Use Group Variables property, Variable Selection node 74–75, 122, 124–125

Use Selection Defaults property 335, 336, 340, 342–343, 386

user-defined networks 307

user-specified architectures 305–307

utility nodes 101–107

V

validation accuracy 193

validation data

pruning trees using 185–187

roles of in development of decision trees 175

validation error 353–354

Validation Error criterion 386

validation misclassification 354

validation profit 175, 190–192

Validation Profit/Loss criterion 355–356

Validation property, Data Partition node 28, 360

variable clustering, using example data set 65–69

Variable Clustering node

about 50, 61–69, 117–118, 163

Include Class Variables property 139–140

Maximum Clusters property 62

Maximum Eigenvalue property 62, 64

Regression node 352

Variable Selection property 139

variable selection using 138–150

variable selection

See also Transform Variables node

about 117–119

binary target with nominal-scaled categorical inputs 135–138

binary target with numeric interval-scaled inputs 129–135

continuous target with nominal-categorical inputs 124–129

continuous target with numeric interval-scaled inputs 119–124

exercises 166–167

transformation after 157–159

transformation before 155–157

using Decision Tree node 150–153

using Variable Clustering node 138–150

Variable Selection node

about 50, 72–79, 163, 164

Hide Rejected Variables property 120, 122

Minimum R-Square property 73, 74, 120–121, 157

regression models 359

Stop R-Square property 73, 74, 121, 157

transforming variables 153–154, 155–157

Variable Selection property

Regression node 377, 389–390

Variable Clustering node 139

variables

assigning to clusters 64–65

categorical 1–2, 165

changing measurement scale of in data sources 164–165

explanatory 241

interval 2

measurement scale of 1–2, 107–109

number of levels of 107–109

selecting for clusters 140–148

synthetic 246–247

transformation of 153–163

types of 107–109

Variables property

about 164

Drop node 80

File Import node 34

Impute node 83

Regression node 95, 123–124, 133–134

Transform Variables node 99

viewing properties 25

variance

of inputs 111

proportion explained by cluster component 65

proportion of explained by principal components 112–113

Variation Proportion property, Variable Clustering node 62

Voting Posterior Probabilities property 408–409

Voting...Average method 408–409

Voting...Proportion method 409

W

weights

estimating in neural network models 247–249

selecting for Neural Network node 261–263

windows

Metadata Advisor Options 19

SAS Enterprise Miner 15–16

World Wide Web, retrieving documents from 432–433

X

XRadial value 307
