; (semicolons)
:: notation
.. notation
.() notation
.BY list
.RData
.RDS
.tsv (tab-separated values)
[,] indexing operator, 2nd
[[]] operator, 2nd, 3rd
@ operator, 2nd
& operator, 2nd
&& operator
#' marks
%in% operation
<- operator, 2nd
= operator, 2nd
-> operator, 2nd
| operator, 2nd
|| operator
$ operator
A/B tests
evaluating
Fisher's test for independence
frequentist significance test
setting up
accuracy evaluation measure, 2nd
accuracyMeasures() function
additive processes
adjusted R-squared
AIC (Akaike information criterion)
Akismet, 2nd
all.equal() method
alpha parameter, 2nd, 3rd
Apgar test
Apriori algorithm
apriori() function
arcsinh
area under the curve (AUC), 2nd
arrange() command
arules package
apriori() function
examining data
inspecting and evaluating rules
reading in the data
restricting which items to mine
as.factor() function
as.numeric() command
assign_cluster() function
assignment operators
assignments, left-hand sides of
association rules, 2nd
example problem
mining with arules package
apriori() function
examining data
inspecting and evaluating rules
reading in the data
restricting which items to mine
overview
asw (average silhouette width), 2nd
AUC (area under the curve)
automatic printing
bagging classifiers
bar areas
bar charts
checking distributions for single variables
checking relationships between two variables
with faceting
base error rate
base package
basic analytics
baskets
Batch method
beta regression
betas
between sum of squares (BSS), 2nd, 3rd
bias
bias variance decomposition
bids
bimodal distribution
binary variables
BinaryYScatterPlot function
binomial classification
binomial distribution
bootstrap evaluation
BSS (between sum of squares), 2nd, 3rd
Buzz dataset
buzz scoring
by keyword
C hyperparameter, 2nd
c() operator, 2nd, 3rd
calibration set
Calinski-Harabasz (CH) index, 2nd, 3rd
call-by-value effect
CamelCase
CART trees
categorical variables, 2nd
bar charts for comparing
comparing continuous and
missing data in
using xgboost with
cbind operator, 2nd
cboot$bootmean
cdata::pivot_to_rowrecs()
cdata::unpivot_to_blocks
cdata::pivot_to_rowrecs() function
cdata::unpivot_to_blocks() function
center attribute, 2nd
centering and scaling
centroid
CH (Calinski-Harabasz) index, 2nd
char.freq.bang
character class
character types, 2nd
Character vector
Characterizing task
chi-squared test
chunks
class of interest
class() method, 2nd
classification, 2nd, 3rd
accuracy
classification problems
confusion matrix
data preparation for
building model
mkCrossFrameCExperiment()
properly using treatment plan
variable score frame
defined
F1
precision and recall
sensitivity and specificity
cleaning data
domain-specific data cleaning
missing values
in categorical variables
in numeric or logical variables
nature of
treating as information
vtreat package for automatic treatment of
ClevelandDotPlot function
client role
cluster analysis, 2nd
assigning new points to clusters
data preparation
distances
cosine similarity
Euclidean distance
Hamming distance
Manhattan (city block) distance
hierarchical clustering with hclust
bootstrap evaluation
picking number of clusters
principal components analysis
k-means algorithm
clusterboot()
kmeans() function
kmeansruns() function
cluster stability
clusterboot() function, 2nd, 3rd
clustering, 2nd, 3rd, 4th
coalescing
Codd-style operators
coding style
coef() function
coefficient of determination
coefficients, 2nd, 3rd, 4th
coefficients(model) function, 2nd
coefs vector
collinearity, 2nd
colnames() command
git help
comma-separated values (CSV)
comments
comment character (#)
writing effective
complete cases
complete.cases() function
Comprehensive R Archive Network (CRAN), 2nd, 3rd, 4th
concatenate operator c()
conditional transforms
confusion matrix, 2nd
continuous histograms
continuous variables, comparing categorical and
coord_flip() function
copy by value semantics
correlation
cos() function
cosine similarity
coverage
CRAN (Comprehensive R Archive Network), 2nd, 3rd, 4th
create_pruned_vocabulary() function, 2nd
CRISP-DM (cross-industry standard process for data mining)
cross-frame, 2nd
dangers of naively reusing data
safely reusing data
crossFrame element
cross-language linkage method, to deploy models
cryptographic hash, SHA
CSV (comma-separated values)
cumsum()
customer churn, 2nd
customer_data2 dataset
cutree() function
cv.glmnet() function, 2nd
cva.glmnet() function
data architect role
data collection and management stage
data coordinates
data dictionary, 2nd
data directory
data engineering
data selection
ordering rows
removing records with incomplete data
subsetting rows and columns
data transforms
adding new columns
aggregating
multitable transforms
combining data from multiple tables
combining two or more ordered data frames quickly
reshaping transforms
data coordinates
moving data from tall to wide form
moving data from wide to tall form
data frames
data provenance, 2nd
data range problems
data refresh
data science projects
roles in
client
data architect
data scientist
operations
project sponsor
setting expectations for
stages of
data collection and management
defining goal
model deployment and maintenance
model evaluation and critique
modeling
presentation and documentation
data scientists
data selection
ordering rows
removing records with incomplete data
subsetting rows and columns
data shaping, reshaping transforms
data coordinates
moving data from tall to wide form
moving data from wide to tall form
data transformations, 2nd
adding new columns
aggregating
centering and scaling
log transformations
multitable transforms
combining data from multiple tables
combining two or more ordered data frames quickly
normalization
reshaping
data coordinates
moving data from tall to wide form
moving data from wide to tall form
data tubing
data.frame class
data.frame() function
data.frames, 2nd, 3rd, 4th, 5th
data.table by argument
data.table class
adding new columns, 2nd
appending columns
appending rows
combining many rows into summary rows
full join
inner join
left join
ordering rows
removing records with incomplete data
splitting tables
subsetting rows and columns
data.table package, 2nd, 3rd, 4th, 5th, 6th
data.table::melt.data.table()
databases, using with R
running database queries using query generator
thinking relationally about data
datasets package, 2nd
data.table::dcast.data.table()
data.table::melt.data.table()
DBI package
dbplyr package, 2nd
dcast.data.table()
decision surface
decision trees
degrees of freedom
delayed class
delegation, to R
dendrograms
denormalized form
denormalized tables
density estimation
density plots
dependent variables, 2nd
deploying models, 2nd
as HTTP services
by export
using Shiny
derived columns
design*() methods
design_missingness_treatment() function, 2nd, 3rd
designTreatments*() function
designTreatmentsC() function, 2nd, 3rd, 4th
designTreatmentsN() function, 2nd, 3rd, 4th
designTreatmentsZ() function, 2nd, 3rd
deviance evaluation measure
df.null - df.model
dgCMatrix class, 2nd
dim() function, 2nd
disparity in units
dissimilarity
dist() function
distributions
binomial distribution
lognormal distribution
normal distribution
other R tools for
R's distribution naming conventions
documentation
comments
predicting popularity
R markdown
documenting data and producing model
example
purpose of
technical details
version control
to explore projects
to record history
to share work
document-term matrix
domain empathy
domain-specific data cleaning
dot arrow pipe, 2nd
dot notation, 2nd
dot pipe %.>%
dotplots
dot-product similarity
double density plot
double-precision floating-point
dplyr::bind_rows
dplyr
adding new columns, 2nd
appending columns
appending rows
combining many rows into summary rows
full join
inner join
left join
ordering rows
removing records with incomplete data
splitting tables
subsetting rows and columns
dplyr::bind_cols
dplyr::filter, 2nd
dplyr::full_join
dplyr::group_by
dplyr::select
dplyr::summarize
drop = FALSE argument, 2nd
dtest data frame
dummy variables, 2nd, 3rd
edf (effective degrees of freedom)
effects coding
efficiency, statistical
elastic net, 2nd
end user presentations
showing how model fits user workflow
showing how to use the model
summarizing project goals
end-of-statement markers
enrichment rate
ensemble learning
errors, 2nd
Euclidean distance
evalframe
evaluating models
classification models
accuracy
confusion matrix
F1 score
precision and recall
sensitivity and specificity
measures of model performance
overfitting
K-fold cross-validation
testing on held-out data
probability models
Akaike information criterion
deviance
double density plot
log likelihood
receiver operating characteristic curve
scoring models
root mean square error
R-squared
Excel Spreadsheet (XLS)
exchangeability
experimental design columns
explain() function, 2nd
explainers
explanatory variables, 2nd
explicit dot notation
exploring data for problems
summary statistics
data range
invalid values and outliers
missing values
units
visualization and graphics
checking distributions for single variables
checking relationships between two variables
exporting models
extend() method
eXtensible Markup Language (XML), 2nd
extrapolation
F1 score
facets, 2nd
facet_wrap layer
facet_wrap() command
factor class
factor coding
factor variables
Factor vector
factors, 2nd
false negatives (FN), 2nd
false positive rate, 2nd
false positives (FP)
family function
filled bar charts, 2nd
filter() function
Fisher scoring iterations, 2nd
Fisher's test for independence
Fisher’s exact test
fit_imdb_model() function
fit_iris_example() function, 2nd
fixed-width files (FWF)
floating-point format, 2nd
FN (false negatives), 2nd
forecasting
FP (false positives)
fpc package, 2nd
frequentist significance test
F-statistic
F-test, 2nd, 3rd
full joins
function arguments
FWF (fixed-width files)
gam package
gam() function, 2nd, 3rd, 4th, 5th
GAMs (generalized additive models)
extracting non-linear relationships
one-dimensional regression example
overview
using for logistic regression
using on actual data
gap statistic
Gaussian distributions
generalization error
generalized additive models. See GAMs.
generalized linear models
geom_hex layer
geom_histogram layer
geom_histogram() command
geom_line layer
geom_point layer, 2nd
geom_smooth function, 2nd
ggplot() function, 2nd
ggplot2 package, 2nd, 3rd, 4th, 5th
ggpubr package
ggstatsplot package
Git
installing, 2nd
starting project using command line
using git diff to compare files from different commits
using git log and git status to view progress
using git log to find last time file was around
using through RStudio
git blame command
git clone command, 2nd
git commit command
git diff command
git help log command
git log command
git pull command
git push command, 2nd
git rebase command
git remote add command
git remote command
git status, 2nd
git tag command
glm() function, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th
glmnet
elastic net
lasso regression
ridge regression
glmnet method
glmnet package
glmnet::cv.glmnet() function
glmnet::glmnet() function
glmnetUtils package, 2nd, 3rd
goal defining stage
gradient boosting models
gradient-boosted trees, 2nd
gradient boosting for text classification
iris example
using xgboost with categorical variables
GROUP BY queries
group() function
group_by() command, 2nd
grouped data
Hamming distance
hash mark (#)
hashes
hclust() function, 2nd, 3rd
bootstrap evaluation
picking number of clusters
Calinski-Harabasz index
total within sum of squares
principal components analysis
head() command, 2nd, 3rd
help() command, 2nd, 3rd, 4th
help(match) command
help(model_support) command
help(setwd) command, 2nd
hexbin plots, 2nd, 3rd
HexBinPlot function
hierarchical clustering, 2nd
bootstrap evaluation
picking number of clusters
Calinski-Harabasz index
total within sum of squares
principal components analysis
histograms, 2nd
Homebrew
homoscedastic errors
horizontal offset
HTML
HTTP services, deploying models as
hyperparameters, 2nd
hypothesis testing
IDE (integrated development environment)
identical() method
if statements
impact
impact coding
implicit printing
importance() function
imputed value
independent variables, 2nd, 3rd
indicator variables, 2nd, 3rd
infix scalar-valued operators
inner joins
input variables
install.packages() command
integrated development environment (IDE)
interestMeasure() function
intermediate values, organizing
introducing indicators
invalid values
iris dataset, 2nd, 3rd, 4th
itemFrequency() function
items
Jaccard coefficient
join command
JSON (JavaScript Object Notation)
k(,) function
kernel functions, 2nd
defined
support vectors
kernel trick
kernlab library
k-fold cross-validation
k-means algorithm, 2nd
clusterboot()
kmeans() function
kmeansruns() function
k-means clustering, 2nd
kmeans() function, 2nd
kmeansruns() function
knitr, 2nd, 3rd
documenting data and producing model
confirming data provenance
recording performance of naive analysis
using milestones to save time
technical details
block declaration format
chunk options
L1 distance
L1-regularized regression
L2 distance
L2-regularized regression
Laplace smoothing
lasso regression, 2nd
LaTeX
lattice package
layers
lazy copying
LearnR
least squares method
left joins
length() function
length-zero vector
lhs() function
library() command, 2nd, 3rd
library(pkgname) command
lift
LIME (local interpretable model-agnostic explanations)
automated sanity checking
example
for text classification
explaining predictions
representing documents for modeling
training text classifier
how LIME works
lime package, 2nd
LIME variable importances
lime() function
line breaks
line of perfect prediction
line plots, 2nd
line wrapping
linear combination
linear regression
building model
finding relations and extracting advice
making predictions
PUMS dataset
reading model summary and characterizing coefficient quality
coefficients table
original model call
overall model quality summaries
residuals summary
when assumptions of are violated
link function
lists, 2nd
lm() command, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th
load() command
local interpretable model-agnostic explanations. See LIME.
loess function
log likelihood, 2nd
log transformations
logarithmic scale
logical vectors
logistic regression, 2nd, 3rd, 4th, 5th
building model
finding relations and extracting advice
making predictions
overview
reading model summary and characterizing coefficients
deviance residuals summary
Fisher scoring iterations
original model call
overall model quality summaries
summary coefficients table
using generalized additive models for
logit function
logit link, 2nd
logit space (link space)
logit()
lognormal distribution
lognormally distributed monetary amounts
log-odds, of probabilities
log-scaled density plot
magrittr package
magrittr pipe operator %>%, 2nd, 3rd
Manhattan (city block) distance
mapping problems to machine learning tasks
classification problems
grouping
association rules
clustering
problem-to-method mapping
scoring problems
margins
Markdown
match() method, 2nd
matrices
MAX() function
m-dimensional linear model
mean squared error
measurement types
Mercer’s theorem
method chaining
mgcv package, 2nd
Microsoft Excel workbooks
Microsoft Word
missing values, 2nd
missing-value imputation
mixture of Gaussians
mkCrossFrame*Experiment() method
mkCrossFrame*Experiment() methods, vtreat’s
mkCrossFrame*Experiment()/$crossFrame pattern
mkCrossFrameCExperiment() function, 2nd, 3rd
mkCrossFrameNExperiment() function, 2nd, 3rd, 4th
mkExperiment*() methods
mk_formula() function, 2nd
mlogit package
model deployment and maintenance stage
model evaluation and critique stage
model matrix, 2nd
model object
model performance, determining lower bounds on
model.matrix() function, 2nd, 3rd, 4th
modeling
evaluating models
classification models
measures of model performance
overfitting
probability models
scoring models
local interpretable model-agnostic explanations
automated sanity checking
example
for text classification
how LIME works
mapping problems to machine learning tasks
classification problems
grouping
problem-to-method mapping
scoring problems
sampling for
trade-offs
modeling algorithm
model_ridge$lambda.1se
model_ridge$lambda.min
MongoDB
multicategory classification
multimodal data
multinomial classification
multiple comparison bias
multiple comparison problems
multiplicative process
multitable transforms
combining data from multiple tables
full joins
inner joins
left joins
right joins
rolling joins
combining two or more ordered data frames quickly
appending columns
appending rows
splitting tables
multivariate data matrix
mutable data types
mutate() function, 2nd, 3rd, 4th
NA (not available) values, 2nd, 3rd, 4th
in categorical variables
in numeric or logical variables
nature of
treating as information
vtreat package for automatic treatment of
vtreat variable treatment package
na.locf() function, 2nd
na.omit() function
na_if() function
naked repositories
named arguments
named lists, 2nd
named maps
NaN (not a number)
narrow data ranges
natural key columns
NB (nota bene or note well)
nchar() function
needsSplit
NEGATIVE (non-spam). See truth mark.
negative coefficients
negative R-squareds
newpt data point
n_features parameter
N-fold cross-validation
no operation (no-op)
non-linear combinations
nonsignaling
--no-restore command-line flag set
normal distribution
normalization (rescaling)
normalized form
not a number (NaN)
not available values. See NA (not available) values.
nota bene or note well (NB)
nu hyperparameter, 2nd
null deviance
null model, 2nd
NULL values, 2nd, 3rd
NULL vector
Numeric vector
one-hot encoding, 2nd, 3rd
one-versus-rest classifier
operations role
operator, 2nd, 3rd, 4th
order() function
outcome variable
outliers
out-of-bag samples
overfitting, 2nd, 3rd, 4th, 5th
K-fold cross-validation
testing on held-out data
overlaid density plot
package notation
payload columns
PCR (principal components regression)
peer presentations
discussing related work
discussing results and future work
discussing your approach
introducing problem
permTestAUC() function
phi() function, 2nd, 3rd
piped notation
pipe-separated (vertical bar) files
pivoting, 2nd
plot() function
plot(gammodel) function
plot_features() function, 2nd
plot_text_explanations() function
plumber package
PMML (Predictive Model Markup Language)
Poisson regression
POSITIVE (spam). See truth mark.
prcomp() function
precision, 2nd, 3rd, 4th, 5th
predict() function, 2nd, 3rd, 4th, 5th, 6th, 7th
predictions, 2nd
prepare() method, 2nd, 3rd, 4th, 5th, 6th, 7th
presentations
end user presentations
showing how model fits user workflow
showing how to use the model
summarizing project goals
peer presentations
discussing related work
discussing results and future work
discussing your approach
introducing problem
project sponsor presentations
filling in details
making recommendations and discussing future work
stating project results
summarizing project goals
principal components analysis
principal components regression (PCR)
print() function, 2nd
printing
probability models
Akaike information criterion
deviance
double density plot
log likelihood
receiver operating characteristic curve
problem-to-method mapping
project sponsor presentations
filling in details
making recommendations and discussing future work
stating project results
summarizing project goals
project sponsors
provenance columns
PRTPlot() function
pseudo distance
pseudo R-squared, 2nd
pseudo-random sample
PUMS (Public Use Microdata Sample) data
curating data
examining and conditioning data
factor coding
linear regression
working with
p-value (significance), 2nd, 3rd
quasi-separation
query generators
Quick-R
quotes
R, 2nd, 3rd
installing, 2nd
installing tools
book-support materials
R package system, 2nd
required packages
primary data types
data frames
factors
lists
matrices
NULL and NA
slots
vectors
primary features of
assignment
object system
share-by-value characteristics
vectorized operations
programming basics
assignment operators
comment character
data.frame class
delegating to R
factors
identifiers
left-hand sides of assignments
line breaks
lists
NA value
named arguments
NULL value
organizing intermediate values
package notation
printing
semicolons
value semantics
vectors
relational databases
curating data
examining and conditioning data
factor coding
production-size example
working with
resources for
installing R views
online
structured data
less-structured data
well-structured data
using databases with
running database queries using query generator
thinking relationally about data
R markdown
documenting data and producing model
confirming data provenance
recording performance of naive analysis
using milestones to save time
example
purpose of
technical details
block declaration format
chunk options
RAND command
random forests, 2nd, 3rd
exporting to SQL with tidypredict
variable importance, 2nd
randomForest package
randomForest() function, 2nd
ranking tasks
raw (unscaled) variables
rbind
R-bloggers
Rcpp package, 2nd
RDF triples
reactive programming
read.table() function, 2nd
read.transactions() function
reader package
readr package
readRDS() command
readxl package
rebase
recall, 2nd, 3rd, 4th, 5th, 6th
receiver operating characteristic curve (ROC)
record grouping
rect.hclust() function
reference level, 2nd, 3rd, 4th
reference semantics
regression modeling, data preparation for
regression testing
regression to the mean
regressions, 2nd
regularization, 2nd
example of quasi-separation
types of
elastic net
lasso regression
ridge regression
with glmnet
elastic net solution
lasso regression solution
ridge regression solution
relational databases
curating data
examining and conditioning data
factor coding
production-size example
working with
relations, finding
relevel() function
remote relation
residual deviance
residual standard error
residuals, 2nd
reversion to mediocrity
ridge regression, 2nd
right joins
rm.duplicates = TRUE argument
rm() command
RMSE (root mean square error), 2nd, 3rd, 4th, 5th
ROC (receiver operating characteristic curve)
rolling joins
root mean square error (RMSE), 2nd, 3rd, 4th, 5th
rownames() function
rows
roxygen2 R package
rquery package, 2nd, 3rd, 4th
R-squared, 2nd
adjusted
multiple
pseudo R-squared
RStudio, 2nd, 3rd, 4th
community for
installing, 2nd
using Git through
RStudio Desktop
RTools
runif() function
S programming language
s() function, 2nd, 3rd
sample_frac() function
sample_n() function
sampling
creating sample group columns
data provenance
record grouping
splitting data into training and test sets
save() command
saveRDS() function
scale attribute, 2nd
scale() function, 2nd, 3rd
scaling
scatter plots, 2nd, 3rd
schema documentation
scikit-learn, Python
score frame
scoring, 2nd
scoring models
root mean square error
R-squared
scoring problems
scoring residuals
sdata data frame, 2nd
se = FALSE argument
sensitivity, 2nd
sentinel values, 2nd
sepal measurements
separable data
separation
Services method
sessionInfo() command
set.seed() command
setorderv() function
setosa, 2nd
setwd() function, 2nd
shadow graphs
shadow histograms
shadow plots, 2nd
ShadowHist() function
ShadowPlot command
Shiny tool
side-by-side bar charts, 2nd
sigmoid function
signed logarithm
significance
lack of
testing
vs. goodness of fit
sigr package, 2nd
sigr::wrapFTest() function
silhouette clustering
similarity
sin() function
single-variable models
slotNames()
slots
smoothing
smoothing curves, 2nd
SOA (services oriented architecture)
soft margin
sort() function
spam filters
spam model
spam proportion
sparse matrix
specificity, 2nd
split() function
sponsor sign-off
SQL (Structured Query Language), 2nd
SQL WHEN clause
sqldf package
sqr_edist() function
Stack Overflow R section
stack() notation
stacked bar charts, 2nd, 3rd
standard error ribbon
statistical efficiency
statistical theory
A/B tests
evaluating
setting up
power of tests
specialized statistical tests
statistical philosophy
bias variance decomposition
exchangeability
statistical efficiency
statistical view of data, examples of
omitted variable bias
example of
overview
spoiled analysis
working around
sampling bias
statistically efficient estimators
statistically significant thresholds
stats package, 2nd, 3rd, 4th
str() function, 2nd, 3rd
str(dTrain) command
stringsAsFactors = FALSE argument
string-valued (categorical) variable
structured data
less-structured tabular data
examining
transforming
well-structured data in comma-separated values format
examining
loading
well-structured data in other data formats
structured values
subset() function, 2nd
sum(error_sq) (sum squared error)
summarize() function
summary statistics, 2nd
typical problems revealed by
data range issues
invalid values
missing values
outliers
units
summary() function, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th
summary(customer_data$marital.stat) command
summary(dpus) command
summary(dpus$COW) command
summary(dTrain) command
summary(model) command, 2nd, 3rd, 4th
summary(model)$coefficients
supervised learning
support vectors
Surrogate key columns
SVMs (support vector machines), 2nd, 3rd, 4th
kernel functions
defined
support vectors
overview
problem solving with
spiral example
with good kernels
with oversimple kernels
symbol names (identifiers)
syntax error
synthetic data points
table() command
tab-separated values (.tsv)
tall data form
tapply() command
target class
test (holdout) set, 2nd
test data
text classification
gradient boosting for
local interpretable model-agnostic explanations for
explaining predictions
representing documents for modeling
training text classifier
text2vec package
the kernel trick
The R Foundation
theta angle
thin data form
tidypredict package
tidyr solution
tidyr::gather() function
tidyr::spread() function
Times attribute
title column
TN (true negatives)
token column
topics
total variance
TP (true positives)
trades
training error
training set, 2nd
training_prepared data frame
train_treated
transactions object, 2nd
transform() function, 2nd
treat package for automatic treatment of
treatment plans, 2nd, 3rd, 4th
tree-based methods, 2nd
bagging
basic decision tree
gradient-boosted trees
gradient boosting for text classification
iris example
using xgboost with categorical variables
random forests
true negative rate
true outcome
true positive rate
truth mark
TSS (total sum of squares)
TSV (tab-separated values)
two-category classification
two-dimensional histograms
typeof() command
unbiased predictors
unconditioned transform
underscore notation
underscore style
ungroup() function
ungrouped data
unimodal distribution
units
cluster analysis
unit problems
unstack() notation
unsupervised learning
unsupervised methods
association rules
example problem
mining with arules package
overview
cluster analysis
assigning new points to clusters
data preparation
distances
hierarchical clustering
k-means algorithm
unsystematic errors
utils package, 2nd
utils::read.table() command
validation, sampling for
value semantics, 2nd
value.var argument
values
values in categorical variables
values in numeric or logical variables
values nature of
value_variables_C() method, 2nd
varImpPlot() function
vectorized operations
vectors, 2nd
version control
to explore projects
finding out who wrote what and when
using git diff to compare files from different commits
using git log to find last time file was around
to record history
choosing project directory structure
starting Git project using command line
using add/commit pairs to checkpoint work
using git log and git status to view progress
using Git through RStudio
to share work
setting up remote repository relations
using push and pull to synchronize work with remote repositories
versioning
vertical offset
View() command, 2nd
visualization and graphics
checking distributions for single variables
bar charts
density plots
dotplots
histograms
checking relationships between two variables
bar charts for two categorical variables
comparing continuous and categorical variables
hexbin plots
line plots
scatter plots and smoothing curves
overview
vtreat variable treatment package, 2nd
cross-frame
dangers of naively reusing data
safely reusing data
data preparation for classification
building model
properly using treatment plan
variable score frame
data preparation for regression modeling
dataset
bull-in-the-china-shop approach
characterizing outcome
impact coding
indicator variables
missing values
phases of
purpose of
treatment plan
with() function, 2nd
wrapr package, 2nd
wrapr pipe
wrapr::orderv()
WSS (within sum of squares), 2nd, 3rd
WVPlots library, 2nd
WVPlots package, 2nd
xcenter attribute
Xcode tools
xgb.cv() function, 2nd
xgboost package, 2nd, 3rd, 4th, 5th, 6th
xgboost() function, 2nd, 3rd
XLS (Excel Spreadsheet)
XLSX
XML (eXtensible Markup Language), 2nd
xscale attribute
YAML (yet another markup language)