- a priori algorithm
- association rules
- minimum confidence
- Modeler results
- one antecedent
- two antecedents
- two-step process
- frequent itemsets
- ADABoost algorithm
- final boosted classifier
- initial base classifier
- original dataset
- second base classifier
- third base classifier
- adjusted cost matrix
- bank loan
- equivalent cost
- false negative cost
- false positive cost
- retailer cost
- analysis of variance (ANOVA)
- Minitab results
- MSTR
- multiple regression model
- R code
- sample mean age
- sum of squares
- artificial neuron model
- association rules
- a priori property (see a priori algorithm)
- affinity analysis
- antecedent and consequent
- business and research
- categorical data
- confidence and support
- frequent itemsets
- J-measure
- lift ratio
- market basket analysis
- patterns and models
- R code
- strong rules
- supervised/unsupervised learning
- worst case scenario
- attribute-relation file format (ARFF) file
- back-propagation algorithm
- cross validation termination
- downstream node
- error propagation
- learning rate
- momentum term
- squared prediction error
- upstream node
- bagging model
- algorithm for
- bootstrap samples
- vs. CART model
- prediction method
- R code
- stable/unstable classification
- balanced iterative reducing and clustering using hierarchies (BIRCH) clustering
- bank loans data set
- cost matrix
- data sorting
- No Interest model
- With Interest model
- CF/CF tree
- Additivity Theorem
- algorithm
- building process
- clustering sub-clusters
- definition
- one-dimensional toy data set
- radius
- tree structure
- Modeler's two-step algorithm
- optimal number of clusters
- pseudo-F statistic method
- R code
- two-step clustering
- baseline model
- Captain Kirk's situation
- regression model
- Bayesian approach see also Naïve Bayes classifier
- balancing data set
- drawbacks
- frequentist/classical approach
- likelihood function
- MAP method (see maximum a posteriori (MAP))
- marginal distribution
- MCMC methods
- posterior distribution
- posterior odds ratio
- prior distribution
- R code
- Bayesian belief networks (BBNs)
- clothing purchase
- conditional probability
- directed acyclic graph
- joint probability distribution
- prior probabilities
- WEKA
- Explorer Panel
- positive and negative classification
- prior probabilities
- test set predictions
- bias–variance trade-off
- boosting model
- ADABoost algorithm
- final boosted classifier
- initial base classifier
- original dataset
- second base classifier
- third base classifier
- vs. CART model
- R code
- C4.5 algorithm
- adult data set
- candidate splits
- capital gains
- categorical variables
- decision node A
- entropy reduction
- initial split
- marital status
- numerical variables
- savings split
- threshold partition
- training data set
- churn data set
- account length
- adult data set
- age predictor
- area code field
- balanced data set
- categorical variables
- clustered bar chart
- comparative pie chart
- directed web graph
- International Plan
- marginal distribution
- non-churners
- row percentages
- software packages
- two-way interaction
- voice mail plan
- clustering analysis
- CART decision trees
- churn proportion
- contingency tables
- international plan people
- no-plan majority
- voice mail plan people
- conditional independence
- continuous predictor (see continuous predictor)
- correlation coefficient
- account length
- matrix plot
- Minitab regression tool
- optimal solution
- p-values
- thresholds
- customer service calls
- data preparation
- contingency table
- HighDayEveMins_Flag variable
- voice mail messages
- z-score standardization
- day minutes
- dichotomous predictor (see dichotomous predictor)
- education-num variable
- field values
- flag variables
- hours-per-week
- income overlay
- International Plan
- maximum a posteriori
- complement probabilities
- conditional probability
- International Plan
- joint conditional probabilities
- marginal and conditional probabilities
- posterior probabilities
- Voice Mail Plan
- multivariate graphics
- numerical predictors
- binning methods
- churn proportion
- churners vs. non-churners
- customer service call
- International Calls
- normalized and non-normalized histogram
- t-test
- numerical variables
- polychotomous predictor (see polychotomous predictor)
- posterior odds ratio
- vs. variables
- visualization
- voice mail plan
- VoiceMail Plan adopters
- classification and regression trees (CART)
- adult data set
- bank loans
- candidate splits
- capital gains
- categorical variables
- classification error
- components
- contingency table
- cost matrix
- data-driven misclassification costs
- decision node A
- decision node B
- decision tree output
- estimated revenue increase
- evaluation measures
- initial split
- lift chart
- marital status
- maximum value
- numerical variables
- optimal split
- scaled cost matrix
- training data set
- cluster feature (CF)
- Additivity Theorem
- building process
- clustering sub-clusters
- definition
- one-dimensional toy data set
- radius
- tree structure
- cluster validation
- cross-validation
- loans data sets
- methodology
- prediction strength
- R code
- loans data sets
- methodology
- prediction strength
- pseudo-F statistic method
- clustering model
- distribution
- Iris data set
- R code
- SSB and SSE
- R code
- silhouette method
- cohesion/separation
- Iris data set
- mean silhouette
- positive/negative values
- R code
- clustering analysis
- CART decision trees
- churn proportion
- contingency tables
- definition
- hierarchical clustering
- agglomerative clustering
- complete-linkage clustering
- divisive clustering methods
- single-linkage clustering
- international plan people
- k-means clustering algorithm
- data points
- definition
- MSE
- processing steps
- pseudo-F statistic method
- SAS Enterprise Miner (see churn data set)
- statistics behavior
- no-plan majority
- R code
- voice mail plan people
- confidence interval
- customer service call
- lower bound
- margin of error
- population proportion
- subgroup analyses
- t-interval
- upper bound
- continuous predictor
- categorical predictor
- confidence intervals
- day minute usage
- deviance
- p-value
- test statistics
- unit-increase interpretation
- Cook's distance
- correlation coefficient
- account length
- matrix plot
- Minitab regression tool
- optimal solution
- p-values
- PCA
- thresholds
- cost-benefit analysis
- CART model
- contingency table
- cost matrix
- estimated revenue increase
- evaluation measures
- scaled cost matrix
- cost matrix
- decision invariance
- binary classifier
- scaling
- direct cost
- k-nary classification
- accuracy
- contingency table
- Loans data sets
- overall error rate
- predicted/actual categories
- sensitivity
- Loans data set
- adjusted cost matrix
- assumptions
- CART model
- direct cost matrix
- simplified cost matrix
- strategies
- opportunity cost
- positive classification
- adjusted cost matrix
- C5.0 models
- R code
- rebalancing cost
- CART model
- confidence and positive confidence
- definition
- network models
- trinary classification
- accuracy
- assumptions
- contingency table
- cost calculation
- cost matrix
- false negative
- false positive
- number of customers
- number of records
- overall error rate
- predicted/actual categories
- principal and interest
- true negative
- true positive
- cross-industry standard process for data mining (CRISP-DM)
- adaptive process
- business understanding phase
- business/research phase
- clustering analysis
- BIRCH clustering algorithm
- cluster profiles
- cross-validation
- k-means clustering
- data phase
- data preparation phase
- deriving flag variable
- negative amounts
- product uniformity
- standardization
- data understanding phase
- absolute pairwise correlation
- continuous predictors
- dataset, fields
- de-transformation
- lifestyle cluster types
- missing values
- predictors and response
- zip code fields
- deployment phase
- evaluation phase
- modeling and evaluation strategy
- baseline model
- cost-benefit analysis
- high performance model
- input variables
- misclassification cost
- model voting
- processing steps
- profitable classification model
- propensity averaging
- rebalanced data set
- modeling phase
- principal components analysis
- data set partitioning
- input variables
- low communality predictors
- principal component profiles
- rotated component matrix
- cross-validation
- customer service calls (CSC) see polychotomous predictor
- data balancing
- data cleaning
- age field
- American zip code
- data set
- income field
- marital status field
- measures of center
- customer service calls
- measures of location
- measures of spread
- price/earnings ratio
- standard deviation
- missing data
- data imputation method
- field values
- frequency distribution
- random values
- replacement values
- variable brand
- outliers
- poverty
- R code
- transaction amount field
- data imputation method
- data preparation
- contingency table
- HighDayEveMins_Flag variable
- voice mail messages
- z-score standardization
- data summarization
- bivariate relationship
- boxplot
- discrete variable
- levels of measurement
- measures of center
- measures of position
- measures of variability
- qualitative/quantitative variable
- data transformation
- binning methods
- categorical variables
- reclassification
- region_num variable
- survey_response variable
- correlated variables
- decimal scaling
- donation_dollar field
- duplicate records
- flag variables
- ID fields
- index field
- min–max normalization
- R code
- unary variables
- Z-score standardization
- inverse_sqrt (weight) transformation
- natural log transformation
- negative standardization
- normal probability plot
- normal Z distribution
- outliers
- positive standardization
- skewness
- square root transformation
- weighted data
- data visualization
- bar chart
- bivariate relationship
- cumulative frequency distribution
- dotplot
- frequency distribution
- histogram
- pie chart
- skewness
- stem-and-leaf display
- data-driven misclassification costs see cost-benefit analysis
- decision tree
- C4.5 algorithm, information-gain
- adult data set
- candidate splits
- capital gains
- categorical variables
- decision node A
- entropy reduction
- initial split
- marital status
- numerical variables
- savings split
- threshold partition
- training data set
- CART (see classification and regression trees (CART))
- credit risk
- decision rules
- diverse attributes
- R code
- requirements
- dichotomous predictor
- reference cell coding
- voice mail plan
- dimension-reduction method
- applications
- factor analysis (see factor analysis)
- houses data set
- median income
- predictor variables
- multicollinearity
- PCA (see principal components analysis (PCA))
- R code
- user-defined composites
- definition
- houses data set
- measurement error
- summated scales
- direct cost matrix
- distance function
- age variable
- Euclidean distance
- min–max normalization
- properties
- Z-score standardization
- EDA see exploratory data analysis (EDA)
- ensemble methods
- bagging model
- algorithm for
- bootstrap samples
- vs. CART model
- prediction method
- R code
- stable/unstable classification
- bias-variance trade-off
- boosting model
- adaptive boosting (see ADABoost algorithm)
- algorithm for
- vs. CART model
- R code
- model voting
- alternative models
- contingency tables
- evaluative measures
- majority classification
- processing steps
- R code
- working test data set
- prediction error
- propensity averaging
- evaluative measures
- histogram model
- m base classifiers
- processing steps
- exploratory data analysis (EDA)
- churn data set (see churn data set)
- data understanding phase
- absolute pairwise correlation
- de-transformation
- predictors and response
- vs. hypothesis testing
- R code
- segmentation modeling
- capital gains/losses
- contingency tables
- overall error rate
- factor analysis model
- adult data set
- Bartlett's test
- correlation matrix
- factor loadings
- KMO statistics
- principal axis
- factor rotation
- oblique rotation method
- orthogonal rotation
- percentage of variance
- rotated vectors
- unrotated vectors
- varimax rotation
- flag variables
- GAs see genetic algorithms (GAs)
- gas mileage prediction
- backward elimination
- best subsets method
- forward selection method
- Mallows' Cp statistics
- predictors
- regression assumptions
- stepwise selection regression
- target variable MPG
- generalized rule induction (GRI) method
- genetic algorithms (GAs)
- crossover operator
- definition
- multi-point crossover
- real-valued data
- uniform crossover
- framework
- mutation operator
- neural networks
- backpropagation
- feed-forward nature
- learning method
- modified discrete crossover
- random shock mutation
- sum of squared errors
- topology and operation
- R code
- selection operator
- Boltzmann selection
- crowding phenomenon
- definition
- elitism
- fitness sharing
- rank selection
- sigma scaling
- tournament ranking
- terminologies
- WEKA
- AttributeSelectedClassifier
- class distribution
- initial population characteristics
- Preprocess tab
- WrapperSubsetEval evaluation method
- gradient-descent method
- graphical evaluation
- gains charts
- lift chart
- profits charts
- R code
- response charts
- return-on-investment charts
- hierarchical clustering
- agglomerative clustering
- complete-linkage clustering
- divisive clustering methods
- single-linkage clustering
- hypothesis testing
- confidence interval
- criminal trial, outcomes
- null hypothesis
- p-value
- population proportion
- standard error
- treatment
- indicator variable
- cereals, y-intercepts
- estimated nutritional rating
- p-values
- parallel planes
- reference category
- regression coefficient values
- relative estimation error
- shelf effect
- instance-based learning
- issues
- sodium/potassium ratio
- training data points
- voting
- k-means clustering algorithm
- data points
- definition
- MSE
- processing steps
- pseudo-F statistic method
- SAS Enterprise Miner (see churn data set)
- statistics behavior
- k-nary classification
- accuracy
- contingency table
- Loans data sets
- overall error rate
- predicted/actual categories
- sensitivity
- k-nearest neighbor (KNN) algorithm
- classification
- data set
- income bracket
- ClassifyRisk data set
- combination function
- simple unweighted voting
- weighted voting
- cross-validation approach
- database
- distance function
- age variable
- Euclidean distance
- min–max normalization
- properties
- Z-score standardization
- instance-based learning
- issues
- sodium/potassium ratio
- training data points
- voting
- locally weighted averaging
- modeler's results
- outliers/unusual observations
- R code
- Kaiser–Meyer–Olkin (KMO) statistics
- Kohonen networks
- age and income data set
- algorithm
- CART decision tree model
- cluster profiles
- flag variables
- International Plan adopters
- mean analysis
- numerical variables
- R code
- SOM
- architecture
- characteristic processes
- goal
- networks connection
- topology
- validation
- variables distribution
- VoiceMail Plan adoption
- logistic regression model
- conditional mean
- disease vs. age
- linear regression model
- logit transformation
- maximum-likelihood estimation
- confidence interval
- interpretation
- likelihood ratio test
- log-likelihood estimators
- mean square regression
- negative response
- parameters
- positive response
- saturated model
- Wald test, parameters
- odds ratio (see odds ratio (OR))
- R code
- sigmoidal curve
- training data set
- education variable
- marital status
- WEKA
- explorer panel
- RATING field
- regression coefficients
- test set prediction
- training file
- market basket analysis
- Markov chain Monte Carlo (MCMC) methods
- maximum a posteriori (MAP), churn data set
- complement probabilities
- conditional probability
- International Plan
- joint conditional probabilities
- marginal and conditional probabilities
- posterior probabilities
- Voice Mail Plan
- McKinsey Global Institute (MGI) report
- association task
- classification
- income bracket
- sodium/potassium ratio
- clustering
- continuous quality monitoring
- CRISP-DM
- adaptive process
- business/research phase
- data phase
- deployment phase
- evaluation phase
- modeling phase
- estimation model
- factors
- Forbes magazine
- HMO
- patterns and trends
- prediction
- problem solving, human process
- profitable results
- R code
- software packages
- tools
- mean absolute error (MAE)
- mean square error (MSE)
- mean square treatment (MSTR)
- missing data imputation
- CART model
- data weighting
- flag variable
- multiple regression model
- R code
- SEI formula
- model evaluation techniques
- classification task
- accuracy
- building and data model
- C5.0 model
- contingency table
- cost/benefit analysis
- error rate
- false negative
- false-negative rate
- false-positive
- false-positive rate
- financial lending firm
- gains chart
- income classification
- lift charts
- misclassification cost adjustment
- true negative
- true positive
- description task
- estimation and prediction tasks
- MAE
- MSE
- standard error of the estimate
- R code
- model voting process
- alternative models
- contingency tables
- evaluative measures
- majority classification
- processing steps
- R code
- working test data set
- multicollinearity
- correlation coefficients
- fiber variable
- matrix plot
- potassium variable
- stability coefficient
- user-defined composite
- variable coefficients
- variance inflation factor
- multinomial data
- chi-square test
- expected frequency
- observed frequency
- R code
- test statistics
- multiple regression model
- ANOVA table
- coefficient of determination, R²
- confidence interval
- mean value, y
- particular coefficient, βi
- estimation error
- indicator variable
- cereals, y-intercepts
- estimated nutritional rating
- p-values
- parallel planes
- reference category
- regression coefficient values
- relative estimation error
- shelf effect
- inference
- F-test
- t-test
- multicollinearity
- correlation coefficients
- fiber variable
- matrix plot
- potassium variable
- stability coefficient
- user-defined composite
- variable coefficients
- variance inflation factor
- nutritional rating vs. sugars
- population
- prediction interval
- predictor variables
- principal components
- Box–Cox transformation
- component values
- unrotated and rotated component weights
- varimax-rotated solution
- R code
- regression plane/hyperplane
- slope coefficients
- Spoon Size Shredded Wheat
- SSR
- three-dimensional scatter plot
- variable selection method (see variable selection method)
- Naïve Bayes classifier see also Bayesian approach
- conditional independence
- posterior odds ratio
- predictor variables
- WEKA
- ARFF
- conditional probabilities
- Explorer Panel
- load training file
- test set predictions
- zero-frequency cells
- neural network model
- adult data set
- artificial neuron model
- back-propagation algorithm
- cross validation termination
- downstream node
- error propagation
- learning rate
- momentum term
- squared prediction error
- upstream node
- combination function
- data preprocessing
- estimation and prediction
- gradient-descent method
- hidden layer
- input and output encoding
- categorical variables
- dichotomous classification
- drawback
- min–max normalization
- thresholds
- input layer
- output layer
- prediction accuracy
- R code
- real neuron
- sensitivity analysis
- sigmoid function
- neural networks
- backpropagation
- feed-forward nature
- learning method
- modified discrete crossover
- random shock mutation
- sum of squared errors
- topology and operation
- odds ratio (OR)
- assumptions
- capnet variable
- churn overlay
- customer service calls
- continuous predictor (see continuous predictor)
- dichotomous predictor (see dichotomous predictor)
- estrogen replacement therapy
- interpretation
- polychotomous predictor (see polychotomous predictor)
- relative risk
- response variable
- zero-count cell
- overfitting
- complexity model
- provisional model
- partitioning variable
- PCA see principal components analysis (PCA)
- polychotomous predictor
- confidence interval
- estimated probability
- medium customer service call
- reference cell encoding
- standard error
- Wald test
- principal components analysis (PCA)
- communality
- component matrix
- component size
- component weights
- coordinate system
- correlation coefficient
- correlation matrix
- covariance matrix
- data set partitioning
- eigenvalues
- eigenvectors
- geographical component
- housing median age
- input variables
- linear combination
- low communality predictors
- matrix plot
- median income
- multiple regression analysis
- orthogonal vectors
- principal component profiles
- rotated component matrix
- scree plot
- standard deviation matrix
- validation
- variance proportion
- profits charts
- propensity averaging process
- evaluative measures
- histogram model
- m base classifiers
- processing steps
- pseudo-F statistic method
- clustering model
- distribution
- Iris data set
- R code
- SSB and SSE
- regression modeling
- ANOVA table
- baseline model
- Box–Cox transformation
- cereals data set
- coefficient of determination, r²
- data points
- distance and time estimation
- estimation error
- maximum value
- minimum value
- predicted score column
- prediction error
- predictor and response variables
- predictor information
- residual error
- sample variance
- standard deviation
- sum of squares regression
- sum of squares total
- Cook's distance
- correlation coefficient, r
- confidence interval
- linear correlation
- negative correlation
- positive correlation
- quantitative variables
- dangers of extrapolation
- chocolate frosted sugar bombs
- observed and unobserved points
- policy recommendations
- prediction error
- predictor variable
- end-user
- confidence interval
- prediction interval
- field values
- high leverage point
- characteristics
- distance vs. time
- hard-core orienteer
- mild outlier
- observation
- regression results
- standard error
- inference
- least-squares estimation
- error term
- estimated nutritional rating
- nutritional rating vs. sugar content
- prediction error
- statistics
- sum of squared errors
- y-intercept b0
- linearity transformation
- bulging rule
- log transformation
- point value vs. letter frequency
- response variable
- Scrabble®
- square root transformation
- standardized residual
- normal probability plot
- Anderson–Darling (AD) statistics
- assumptions
- chi-square distribution
- distance vs. time
- horizontal zero line
- normal distribution
- p-value
- Rorschach effect
- uniform distribution
- outliers
- Minitab
- nutritional rating vs. sugars
- positive and negative values
- standardized residuals
- population regression equation
- assumptions
- bivariate observation
- constant variance
- true regression line
- R code
- regression equation
- standard error
- mean square error
- standard deviation, response variable
- sum of squares regression
- sum of squares total
- time and distance calculation
- t-test
- assumptions
- confidence interval
- null hypothesis
- nutritional rating vs. sugar content
- p-value method
- sampling distribution
- response charts
- return-on-investment (ROI) charts
- scatter plot
- segmentation modeling
- clustering analysis
- CART decision trees
- churn proportion
- contingency tables
- international plan people
- no-plan majority
- voice mail plan people
- exploratory analysis
- capital gains/losses
- contingency tables
- overall error rate
- performance enhancement
- processing steps
- R code
- SEI see standard error of the imputation (SEI)
- self-organizing map (SOM)
- architecture
- characteristic processes
- goal
- networks connection
- sigmoid function
- silhouette method
- cohesion/separation
- Iris data set
- mean silhouette
- positive/negative values
- R code
- simplified cost matrix
- squashing function
- standard error of the imputation (SEI)
- statistical inference
- confidence interval
- customer service call
- lower bound
- margin of error
- population proportion
- subgroup analyses
- t-interval
- upper bound
- crystal ball gazers
- definition
- hypothesis testing (see hypothesis testing)
- point estimation
- population parameters
- R code
- sample proportion
- sampling error
- statistical methods
- stem-and-leaf display
- sum of squares between (SSB)
- sum of squares error (SSE)
- sum of squares regression (SSR), multiple regression model
- supervised methods
- target variable
- unsupervised methods
- user-defined composites
- definition
- houses data set
- measurement error
- summated scales
- variable selection method
- all-possible-regression
- backward elimination
- best subsets method
- forward selection
- gas mileage data set (see gas mileage prediction)
- partial F-test
- stepwise regression
- Waikato Environment for Knowledge Analysis (WEKA)
- Bayesian belief networks
- Explorer Panel
- positive and negative classification
- prior probabilities
- test set predictions
- explorer panel
- genetic search algorithm
- AttributeSelectedClassifier
- class distribution
- initial population characteristics
- Preprocess tab
- WrapperSubsetEval
- Naïve Bayes
- ARFF
- conditional probabilities
- Explorer Panel
- load training file
- test set predictions
- RATING field
- regression coefficients
- test set prediction
- training file