Part 1. Introduction to data science
Chapter 1. The data science process
1.1. The roles in a data science project
1.2. Stages of a data science project
1.2.2. Data collection and management
1.2.4. Model evaluation and critique
Chapter 2. Starting with R and data
2.2. Working with data from files
2.3. Working with relational databases
3.1. Using summary statistics to spot problems
3.2. Spotting problems using graphics and visualization
3.2.1. Visually checking distributions for a single variable
3.2.2. Visually checking relationships between two variables
4.1.1. Domain-specific data cleaning
4.1.2. Treating missing values
4.1.3. The vtreat package for automatically treating missing variables
4.2.3. Log transformations for skewed and wide distributions
4.3. Sampling for modeling and validation
4.3.1. Test and training splits
Chapter 5. Data engineering and data shaping
5.1.1. Subsetting rows and columns
5.4. Multitable data transforms
5.4.1. Combining two or more ordered data frames quickly
5.4.2. Principal methods to combine data from multiple tables
5.5.1. Moving data from wide to tall form
Chapter 6. Choosing and evaluating models
6.1. Mapping problems to machine learning tasks
6.1.1. Classification problems
6.2.2. Measures of model performance
6.2.3. Evaluating classification models
6.3. Local interpretable model-agnostic explanations (LIME) for explaining model predictions
6.3.1. LIME: Automated sanity checking
6.3.2. Walking through LIME: A small example
6.3.3. LIME for text classification
Chapter 7. Linear and logistic regression
7.1.1. Understanding linear regression
7.1.2. Building a linear regression model
7.1.4. Finding relations and extracting advice
7.1.5. Reading the model summary and characterizing coefficient quality
7.2. Using logistic regression
7.2.1. Understanding logistic regression
7.2.2. Building a logistic regression model
7.2.4. Finding relations and extracting advice from logistic models
7.2.5. Reading the model summary and characterizing coefficients
7.3.1. An example of quasi-separation
Chapter 8. Advanced data preparation
8.1. The purpose of the vtreat package
8.3. Basic data preparation for classification
8.4. Advanced data preparation for classification
8.4.1. Using mkCrossFrameCExperiment()
8.5. Preparing data for regression modeling
8.6. Mastering the vtreat package
Chapter 9. Unsupervised methods
9.1.3. Hierarchical clustering with hclust
9.2.1. Overview of association rules
Chapter 10. Exploring advanced methods
10.1.2. Using bagging to improve prediction
10.1.3. Using random forests to further improve prediction
10.2. Using generalized additive models (GAMs) to learn non-monotone relationships
10.2.2. A one-dimensional regression example
10.2.3. Extracting the non-linear relationships
10.2.4. Using GAM on actual data
10.3. Solving “inseparable” problems using support vector machines
10.3.1. Using an SVM to solve a problem
10.3.2. Understanding support vector machines
Chapter 11. Documentation and deployment
11.2. Using R markdown to produce milestone documentation
11.2.2. knitr technical details
11.2.3. Using knitr to document the Buzz data and produce the model
11.3. Using comments and version control for running documentation
11.3.1. Writing effective comments
11.3.2. Using version control to record history
11.4.1. Deploying demonstrations using Shiny
11.4.2. Deploying models as HTTP services
Chapter 12. Producing effective presentations
12.1. Presenting your results to the project sponsor
12.1.1. Summarizing the project’s goals
12.1.2. Stating the project’s results
12.1.3. Filling in the details
12.2. Presenting your model to end users
12.2.1. Summarizing the project goals
12.2.2. Showing how the model fits user workflow
12.3. Presenting your work to other data scientists
12.3.1. Introducing the problem
12.3.2. Discussing related work
12.3.3. Discussing your approach
Appendix A. Starting with R and other tools
Appendix B. Important statistical concepts
B.3. Examples of the statistical view of data