Index

SYMBOL

; (semicolons)
:: notation
.. notation
.() notation
.BY list
.RData
.RDS
.tsv (tab-separated values)
[,] indexing operator
[[]] operator
@ operator
& operator
&& operator
#' marks
%in% operation
<- operator
= operator
-> operator
| operator
|| operator
$ operator

A

A/B tests
  evaluating
    Fisher's test for independence
    frequentist significance test
  setting up
accuracy evaluation measure
accuracyMeasures() function
additive processes
adjusted R-squared
AIC (Akaike information criterion)
Akismet
all.equal() method
alpha parameter
Apgar test
Apriori algorithm
apriori() function
arcsinh
area under the curve (AUC)
arrange() command
arules package
  apriori() function
  examining data
  inspecting and evaluating rules
  reading in the data
  restricting which items to mine
as.factor() function
as.numeric() command
assign_cluster() function
assignment operators
assignments, left-hand sides of
association rules
  example problem
  mining with arules package
    apriori() function
    examining data
    inspecting and evaluating rules
    reading in the data
    restricting which items to mine
  overview
asw (average silhouette width)
AUC (area under the curve)
automatic printing

B

bagging classifiers
bar areas

bar charts
  checking distributions for single variables
  checking relationships between two variables
  with faceting
base error rate
base package
basic analytics
baskets
Batch method
beta regression
betas
between sum of squares (BSS)
bias
bias variance decomposition
bids
bimodal distribution
binary variables
BinaryYScatterPlot function
binomial classification
binomial distribution
bootstrap evaluation
BSS (between sum of squares)
Buzz dataset
buzz scoring
by keyword

C

C hyperparameter
c() operator
calibration set
Calinski-Harabasz (CH) index
call-by-value effect
CamelCase
CART trees
categorical variables
  bar charts for comparing
  comparing continuous and
  missing data in
  using xgboost with
cbind operator
cboot$bootmean
cdata::pivot_to_rowrecs()
cdata::unpivot_to_blocks
cdata::pivot_to_rowrecs() function
cdata::unpivot_to_blocks() function
center attribute
centering and scaling
centroid
CH (Calinski-Harabasz) index
char.freq.bang
character class
character types
Character vector
Characterizing task
chi-squared test
chunks
class of interest
class() method
classification
  accuracy
  classification problems
  confusion matrix
  data preparation for
    building model
    mkCrossFrameCExperiment()
    properly using treatment plan
    variable score frame
  defined
  F1
  precision and recall
  sensitivity and specificity
cleaning data
  domain-specific data cleaning
  missing values
    in categorical variables
    in numeric or logical variables
    nature of
    treating as information
    vtreat package for automatic treatment of
ClevelandDotPlot function
client role
cluster analysis
  assigning new points to clusters
  data preparation
  distances
    cosine similarity
    Euclidean distance
    Hamming distance
    Manhattan (city block) distance
  hierarchical clustering with hclust
    bootstrap evaluation
    picking number of clusters
    principal components analysis
  k-means algorithm
    clusterboot()
    kmeans() function
    kmeansruns() function
cluster stability
clusterboot() function
clustering
coalescing
Codd-style operators
coding style
coef() function
coefficient of determination
coefficients
coefficients(model) function
coefs vector
collinearity
colnames() command
git help
comma-separated values (CSV)

comments
  comment character (#)
  writing effective
complete cases
complete.cases() function
Comprehensive R Archive Network (CRAN)
concatenate operator c()
conditional transforms
confusion matrix
continuous histograms
continuous variables, comparing categorical and
coord_flip() function
copy by value semantics
correlation
cos() function
cosine similarity
coverage
CRAN (Comprehensive R Archive Network)
create_pruned_vocabulary() function
CRISP-DM (Cross-Industry Standard Process for Data Mining)
cross-frame
  dangers of naively reusing data
  safely reusing data
crossFrame element
cross-language linkage method, to deploy models
cryptographic hash, SHA
CSV (comma-separated values)
cumsum()
customer churn
customer_data2 dataset
cutree() function
cv.glmnet() function
cva.glmnet() function

D

data architect role
data collection and management stage
data coordinates
data dictionary
data directory

data engineering
  data selection
    ordering rows
    removing records with incomplete data
    subsetting rows and columns
  data transforms
    adding new columns
    aggregating
  multitable transforms
    combining data from multiple tables
    combining two or more ordered data frames quickly
  reshaping transforms
    data coordinates
    moving data from tall to wide form
    moving data from wide to tall form
data frames
data provenance
data range problems
data refresh

data science projects
  roles in
    client
    data architect
    data scientist
    operations
    project sponsor
  setting expectations for
  stages of
    data collection and management
    defining goal
    model deployment and maintenance
    model evaluation and critique
    modeling
    presentation and documentation
data scientists
data selection
  ordering rows
  removing records with incomplete data
  subsetting rows and columns
data shaping, reshaping transforms
  data coordinates
  moving data from tall to wide form
  moving data from wide to tall form
data transformations
  adding new columns
  aggregating
  centering and scaling
  log transformations
  multitable transforms
    combining data from multiple tables
    combining two or more ordered data frames quickly
  normalization
  reshaping
    data coordinates
    moving data from tall to wide form
    moving data from wide to tall form
data tubing
data.frame class
data.frame() function
data.frames
data.table by argument

data.table class
  adding new columns
  appending columns
  appending rows
  combining many rows into summary rows
  full join
  inner join
  left join
  ordering rows
  removing records with incomplete data
  splitting tables
  subsetting rows and columns
data.table package
data.table::melt.data.table()
databases, using with R
  running database queries using query generator
  thinking relationally about data
datasets package
data.table::dcast.data.table()
data.table::melt.data.table()
DBI package
dbplyr package
dcast.data.table()
decision surface
decision trees
degrees of freedom
delayed class
delegation, to R
dendrograms
denormalized form
denormalized tables
density estimation
density plots
dependent variables
deploying models
  as HTTP services
  by export
  using Shiny
derived columns
design*() methods
design_missingness_treatment() function
designTreatments*() function
designTreatmentsC() function
designTreatmentsN() function
designTreatmentsZ() function
deviance evaluation measure
df.null - df.model
dgCMatrix class
dim() function
disparity in units
dissimilarity
dist() function
distributions
  binomial distribution
  lognormal distribution
  normal distribution
  other R tools for
  R's distribution naming conventions
documentation
  comments
  predicting popularity
  R markdown
    documenting data and producing model
    example
    purpose of
    technical details
  version control
    to explore projects
    to record history
    to share work
document-term matrix
domain empathy
domain-specific data cleaning
dot arrow pipe
dot notation
dot pipe %.>%
dotplots
dot-product similarity
double density plot
double-precision floating-point
dplyr::bind_rows

dplyr
  adding new columns
  appending columns
  appending rows
  combining many rows into summary rows
  full join
  inner join
  left join
  ordering rows
  removing records with incomplete data
  splitting tables
  subsetting rows and columns
dplyr::bind_cols
dplyr::filter
dplyr::full_join
dplyr::group_by
dplyr::select
dplyr::summarize
drop = FALSE argument
dtest data frame
dummy variables

E

edf (effective degrees of freedom)
effects coding
efficiency, statistical
elastic net
end user presentations
  showing how model fits user workflow
  showing how to use the model
  summarizing project goals
end-of-statement markers
enrichment rate
ensemble learning
errors
Euclidean distance
evalframe
evaluating models
  classification models
    accuracy
    confusion matrix
    F1 score
    precision and recall
    sensitivity and specificity
  measures of model performance
  overfitting
    K-fold cross-validation
    testing on held-out data
  probability models
    Akaike information criterion
    deviance
    double density plot
    log likelihood
    receiver operating characteristic curve
  scoring models
    root mean square error
    R-squared
Excel Spreadsheet (XLS)
exchangeability
experimental design columns
explain() function
explainers
explanatory variables
explicit dot notation

exploring data for problems
  summary statistics
    data range
    invalid values and outliers
    missing values
    units
  visualization and graphics
    checking distributions for single variables
    checking relationships between two variables
exporting models
extend() method
eXtensible Markup Language (XML)
extrapolation

F

F1 score
facets
facet_wrap layer
facet_wrap() command
factor class
factor coding
factor variables
Factor vector
factors
false negatives (FN)
false positive rate
false positives (FP)
family function
filled bar charts
filter() function
Fisher scoring iterations
Fisher's test for independence
Fisher’s exact test
fit_imdb_model() function
fit_iris_example() function
fixed-width files (FWF)
floating-point format
FN (false negatives)
forecasting
FP (false positives)
fpc package
frequentist significance test
F-statistic
F-test
full joins
function arguments
FWF (fixed-width files)

G

gam package
gam() function
GAMs (generalized additive models)
  extracting non-linear relationships
  one-dimensional regression example
  overview
  using for logistic regression
  using on actual data
gap statistic
Gaussian distributions
generalization error
generalized additive models.
    See GAMs.
generalized linear models
geom_hex layer
geom_histogram layer
geom_histogram() command
geom_line layer
geom_point layer
geom_smooth function
ggplot() function
ggplot2 package
ggpubr package
ggstatsplot package
Git
  installing
  starting project using command line
  using git diff to compare files from different commits
  using git log and git status to view progress
  using git log to find last time file was around
  using through RStudio
git blame command
git clone command
git commit command
git diff command
git help log command
git log command
git pull command
git push command
git rebase command
git remote add command
git remote command
git status
git tag command
glm() function
glmnet
  elastic net
  lasso regression
  ridge regression
glmnet method
glmnet package
glmnet::cv.glmnet() function
glmnet::glmnet() function
glmnetUtils package
goal defining stage
gradient boosting models
gradient-boosted trees
  gradient boosting for text classification
  iris example
  using xgboost with categorical variables
GROUP BY queries
group() function
group_by() command
grouped data

H

Hamming distance
hash mark (#)
hashes
hclust() function
  bootstrap evaluation
  picking number of clusters
    Calinski-Harabasz index
    total within sum of squares
  principal components analysis
head() command
help() command
help(match) command
help(model_support) command
help(setwd) command
hexbin plots
HexBinPlot function
hierarchical clustering
  bootstrap evaluation
  picking number of clusters
    Calinski-Harabasz index
    total within sum of squares
  principal components analysis
histograms
Homebrew
homoscedastic errors
horizontal offset
HTML
HTTP services, deploying models as
hyperparameters
hypothesis testing

I

IDE (integrated development environment)
identical() method
if statements
impact
impact coding
implicit printing
importance() function
imputed value
independent variables
indicator variables
infix scalar-valued operators
inner joins
input variables
install.packages() command
integrated development environment (IDE)
interestMeasure() function
intermediate values, organizing
introducing indicators
invalid values
iris dataset
itemFrequency() function
items

J

Jaccard coefficient
join command
JSON (JavaScript Object Notation)

K

k(,) function
kernel functions
  defined
  support vectors
kernel trick
kernlab library
k-fold cross-validation
k-means algorithm
  clusterboot()
  kmeans() function
  kmeansruns() function
k-means clustering
kmeans() function
kmeansruns() function
knitr
  documenting data and producing model
    confirming data provenance
    recording performance of naive analysis
    using milestones to save time
  technical details
    block declaration format
    chunk options

L

L1 distance
L1-regularized regression
L2 distance
L2-regularized regression
Laplace smoothing
lasso regression
LaTeX
lattice package
layers
lazy copying
LearnR
least squares method
left joins
length() function
length-zero vector
lhs() function
library() command
library(pkgname) command
lift
LIME (local interpretable model-agnostic explanations)
  automated sanity checking
  example
  for text classification
    explaining predictions
    representing documents for modeling
    training text classifier
  how LIME works
lime package
LIME variable importances
lime() function
line breaks
line of perfect prediction
line plots
line wrapping
linear combination
linear regression
  building model
  finding relations and extracting advice
  making predictions
  PUMS dataset
  reading model summary and characterizing coefficient quality
    coefficients table
    original model call
    overall model quality summaries
    residuals summary
  when assumptions of are violated
link function
lists
lm() command
load() command
local interpretable model-agnostic explanations.
    See LIME.
loess function
log likelihood
log transformations
logarithmic scale
logical vectors
logistic regression
  building model
  finding relations and extracting advice
  making predictions
  overview
  reading model summary and characterizing coefficients
    deviance residuals summary
    Fisher scoring iterations
    original model call
    overall model quality summaries
    summary coefficients table
  using generalized additive models for
logit function
logit link
logit space (link space)
logit()
lognormal distribution
lognormally distributed monetary amounts
log-odds, of probabilities
log-scaled density plot

M

magrittr package
magrittr pipe operator %>%
Manhattan (city block) distance
mapping problems to machine learning tasks
  classification problems
  grouping
    association rules
    clustering
  problem-to-method mapping
  scoring problems
margins
Markdown
match() method
matrices
MAX() function
m-dimensional linear model
mean squared error
measurement types
Mercer’s theorem
method chaining
mgcv package
Microsoft Excel workbooks
Microsoft Word
missing values
missing-value imputation
mixture of Gaussians
mkCrossFrame*Experiment() method
mkCrossFrame*Experiment() methods, vtreat’s
mkCrossFrame*Experiment()/$crossFrame pattern
mkCrossFrameCExperiment() function
mkCrossFrameNExperiment() function
mkExperiment*() methods
mk_formula() function
mlogit package
model deployment and maintenance stage
model evaluation and critique stage
model matrix
model object
model performance, determining lower bounds on
model.matrix() function
modeling
  evaluating models
    classification models
    measures of model performance
    overfitting
    probability models
    scoring models
  local interpretable model-agnostic explanations
    automated sanity checking
    example
    for text classification
    how LIME works
  mapping problems to machine learning tasks
    classification problems
    grouping
    problem-to-method mapping
    scoring problems
  sampling for
  trade-offs
modeling algorithm
model_ridge$lambda.1se
model_ridge$lambda.min
MongoDB
multicategory classification
multimodal data
multinomial classification
multiple comparison bias
multiple comparison problems
multiplicative process
multitable transforms
  combining data from multiple tables
    full joins
    inner joins
    left joins
    right joins
    rolling joins
  combining two or more ordered data frames quickly
    appending columns
    appending rows
    splitting tables
multivariate data matrix
mutable data types
mutate() function

N

NA (not available) values
  in categorical variables
  in numeric or logical variables
  nature of
  treating as information
  vtreat package for automatic treatment of
  vtreat variable treatment package
na.locf() function
na.omit() function
na_if() function
naked repositories
named arguments
named lists
named maps
NaN (not a number)
narrow data ranges
natural key columns
NB (nota bene or note well)
nchar() function
needsSplit
NEGATIVE (non-spam).
    See truth mark.
negative coefficients
negative R-squareds
newpt data point
n_features parameter
N-fold cross-validation
no operation (no-op)
non-linear combinations
nonsignaling
--no-restore command-line flag set
normal distribution
normalization (rescaling)
normalized form
not a number (NaN)
not available values.
    See NA (not available) values.
nota bene or note well (NB)
nu hyperparameter
null deviance
null model
NULL values
NULL vector
Numeric vector

O

one-hot encoding
one-versus-rest classifier
operations role
operator
order() function
outcome variable
outliers
out-of-bag samples
overfitting
  K-fold cross-validation
  testing on held-out data
overlaid density plot

P

package notation
payload columns
PCR (principal components regression)
peer presentations
  discussing related work
  discussing results and future work
  discussing your approach
  introducing problem
permTestAUC() function
phi() function
piped notation
pipe-separated (vertical bar) files
pivoting
plot() function
plot(gammodel) function
plot_features() function
plot_text_explanations() function
plumber package
PMML (Predictive Model Markup Language)
Poisson regression
POSITIVE (spam).
    See truth mark.
prcomp() function
precision
predict() function
predictions
prepare() method
presentations
  end user presentations
    showing how model fits user workflow
    showing how to use the model
    summarizing project goals
  peer presentations
    discussing related work
    discussing results and future work
    discussing your approach
    introducing problem
  project sponsor presentations
    filling in details
    making recommendations and discussing future work
    stating project results
    summarizing project goals
principal components analysis
principal components regression (PCR)
print() function
printing
probability models
  Akaike information criterion
  deviance
  double density plot
  log likelihood
  receiver operating characteristic curve
problem-to-method mapping
project sponsor presentations
  filling in details
  making recommendations and discussing future work
  stating project results
  summarizing project goals
project sponsors
provenance columns
PRTPlot() function
pseudo distance
pseudo R-squared
pseudo-random sample
PUMS (Public Use Microdata Sample) data
  curating data
  examining and conditioning data
  factor coding
  linear regression
  working with
p-value (significance)

Q

quasi-separation
query generators
Quick-R
quotes

R

R
  installing
  installing tools
    book-support materials
    R package system
    required packages
  primary data types
    data frames
    factors
    lists
    matrices
    NULL and NA
    slots
    vectors
  primary features of
    assignment
    object system
    share-by-value characteristics
    vectorized operations
  programming basics
    assignment operators
    comment character
    data.frame class
    delegating to R
    factors
    identifiers
    left-hand sides of assignments
    line breaks
    lists
    NA value
    named arguments
    NULL value
    organizing intermediate values
    package notation
    printing
    semicolons
    value semantics
    vectors
  relational databases
    curating data
    examining and conditioning data
    factor coding
    production-size example
    working with
  resources for
    installing R views
    online
  structured data
    less-structured data
    well-structured data
  using databases with
    running database queries using query generator
    thinking relationally about data
R markdown
  documenting data and producing model
    confirming data provenance
    recording performance of naive analysis
    using milestones to save time
  example
  purpose of
  technical details
    block declaration format
    chunk options
RAND command
random forests
  exporting to SQL with tidypredict
  variable importance
randomForest package
randomForest() function
ranking tasks
raw (unscaled) variables
rbind
R-bloggers
Rcpp package
RDF triples
reactive programming
read.table() function
read.transactions() function
reader package
readr package
readRDS() command
readxl package
rebase
recall
receiver operating characteristic curve (ROC)
record grouping
rect.hclust() function
reference level
reference semantics
regression modeling, data preparation for
regression testing
regression to the mean
regressions
regularization
  example of quasi-separation
  types of
    elastic net
    lasso regression
    ridge regression
  with glmnet
    elastic net solution
    lasso regression solution
    ridge regression solution
relational databases
  curating data
  examining and conditioning data
  factor coding
  production-size example
  working with
relations, finding
relevel() function
remote relation
residual deviance
residual standard error
residuals
reversion to mediocrity
ridge regression
right joins
rm.duplicates = TRUE argument
rm() command
RMSE (root mean square error)
ROC (receiver operating characteristic curve)
rolling joins
root mean square error (RMSE)
rownames() function
rows
roxygen2 R package
rquery package
R-squared
  adjusted
  multiple
  pseudo R-squared
RStudio
  community for
  installing
  using Git through
RStudio Desktop
RTools
runif() function

S

S programming language
s() function
sample_frac() function
sample_n() function
sampling
  creating sample group columns
  data provenance
  record grouping
  splitting data into training and test sets
save() command
saveRDS() function
scale attribute
scale() function
scaling
scatter plots
schema documentation
scikit-learn, Python
score frame
scoring
scoring models
  root mean square error
  R-squared
scoring problems
scoring residuals
sdata data frame
se = FALSE argument
sensitivity
sentinel values
sepal measurements
separable data
separation
Services method
sessionInfo() command
set.seed() command
setorderv() function
setosa
setwd() function
shadow graphs
shadow histograms
shadow plots
ShadowHist() function
ShadowPlot command
Shiny tool
side-by-side bar charts
sigmoid function
signed logarithm
significance
  lack of
  testing
  vs. goodness of fit
sigr package
sigr::wrapFTest() function
silhouette clustering
similarity
sin() function
single-variable models
slotNames()
slots
smoothing
smoothing curves
SOA (services oriented architecture)
soft margin
sort() function
spam filters
spam model
spam proportion
sparse matrix
specificity
split() function
sponsor sign-off
SQL (Structured Query Language)
SQL WHEN clause
sqldf package
sqr_edist() function
Stack Overflow R section
stack() notation
stacked bar charts
standard error ribbon
statistical efficiency
statistical theory
  A/B tests
    evaluating
    setting up
  power of tests
  specialized statistical tests
  statistical philosophy
    bias variance decomposition
    exchangeability
    statistical efficiency

statistical view of data, examples of
  omitted variable bias
    example of
    overview
    spoiled analysis
    working around
  sampling bias
statistically efficient estimators
statistically significant thresholds
stats package
str() function
str(dTrain) command
stringsAsFactors = FALSE argument
string-valued (categorical) variable
structured data
  less-structured tabular data
    examining
    transforming
  well-structured data in comma-separated values format
    examining
    loading
  well-structured data in other data formats
structured values
subset() function
sum(error_sq) (sum squared error)
summarize() function
summary statistics
  typical problems revealed by
    data range issues
    invalid values
    missing values
    outliers
    units
summary() function
summary(customer_data$marital.stat) command
summary(dpus) command
summary(dpus$COW) command
summary(dTrain) command
summary(model) command
summary(model)$coefficients
supervised learning
support vectors
Surrogate key columns
SVMs (support vector machines)
  kernel functions
    defined
    support vectors
  overview
  problem solving with
    spiral example
    with good kernels
    with oversimple kernels
symbol names (identifiers)
syntax error
synthetic data points

T

table() command
tab-separated values (.tsv)
tall data form
tapply() command
target class
test (holdout) set
test data

text classification
  gradient boosting for
  local interpretable model-agnostic explanations for
    explaining predictions
    representing documents for modeling
    training text classifier
text2vec package
the kernel trick
The R Foundation
theta angle
thin data form
tidypredict package
tidyr solution
tidyr::gather() function
tidyr::spread() function
Times attribute
title column
TN (true negatives)
token column
topics
total variance
TP (true positives)
trades
training error
training set
training_prepared data frame
train_treated
transactions object
transform() function
treat package for automatic treatment of
treatment plans
tree-based methods
  bagging
  basic decision tree
  gradient-boosted trees
    gradient boosting for text classification
    iris example
    using xgboost with categorical variables
  random forests
true negative rate
true outcome
true positive rate
truth mark
TSS (total sum of squares)
TSV (tab-separated values)
two-category classification
two-dimensional histograms
typeof() command

U

unbiased predictors
unconditioned transform
underscore notation
underscore style
ungroup() function
ungrouped data
unimodal distribution

units
  cluster analysis
  unit problems
unstack() notation
unsupervised learning

unsupervised methods
  association rules
    example problem
    mining with arules package
    overview
  cluster analysis
    assigning new points to clusters
    data preparation
    distances
    hierarchical clustering
    k-means algorithm
unsystematic errors
utils package
utils::read.table() command

V

validation, sampling for
value semantics
value.var argument
values
values in categorical variables
values in numeric or logical variables
values nature of
value_variables_C() method
varImpPlot() function
vectorized operations
vectors
version control
  to explore projects
    finding out who wrote what and when
    using git diff to compare files from different commits
    using git log to find last time file was around
  to record history
    choosing project directory structure
    starting Git project using command line
    using add/commit pairs to checkpoint work
    using git log and git status to view progress
    using Git through RStudio
  to share work
    setting up remote repository relations
    using push and pull to synchronize work with remote repositories
versioning
vertical offset
View() command
visualization and graphics
  checking distributions for single variables
    bar charts
    density plots
    dotplots
    histograms
  checking relationships between two variables
    bar charts for two categorical variables
    comparing continuous and categorical variables
    hexbin plots
    line plots
    scatter plots and smoothing curves
  overview
vtreat variable treatment package
  cross-frame
    dangers of naively reusing data
    safely reusing data
  data preparation for classification
    building model
    properly using treatment plan
    variable score frame
  data preparation for regression modeling
  dataset
    bull-in-the-china-shop approach
    characterizing outcome
  impact coding
  indicator variables
  missing values
  phases of
  purpose of
  treatment plan

W

with() function
wrapr package
wrapr pipe
wrapr::orderv()
WSS (within sum of squares)
WVPlots library
WVPlots package

X

xcenter attribute
Xcode tools
xgb.cv() function
xgboost package
xgboost() function
XLS (Excel Spreadsheet)
XLSX
XML (eXtensible Markup Language)
xscale attribute

Y

YAML (yet another markup language)

Z

zeallot package
zoo package
