by Brindha Priyadarshini Jeyaraman, Ludvig Renbo Olsen,
Practical Machine Learning with R
Preface
About the Book
About the Authors
Description
Learning Objectives
Audience
Approach
Minimum Hardware Requirements
Software Requirements
Conventions
Installation and Setup
Installing R
Installing RStudio
Installing Libraries
Installing the Code Bundle
Additional Resources
Chapter 1
An Introduction to Machine Learning
Introduction
The Machine Learning Process
Raw Data
Data Pre-Processing
The Data Splitting Process
The Training Process
Evaluation Process
Deployment Process
Process Flow for Making Predictions
Introduction to R
Exercise 1: Reading from a CSV File in RStudio
Exercise 2: Performing Operations on a Dataframe
Exploratory Data Analysis (EDA)
View Built-in Datasets in R
Exercise 3: Loading Built-in Datasets
Exercise 4: Viewing Summaries of Data
Visualizing the Data
Activity 1: Finding the Distribution of Diabetic Patients in the PimaIndiansDiabetes Dataset
Activity 2: Grouping the PimaIndiansDiabetes Data
Activity 3: Performing EDA on the PimaIndiansDiabetes Dataset
Machine Learning Models
Types of Prediction
Supervised Learning
Unsupervised Learning
Applications of Machine Learning
Regression
Exercise 5: Building a Linear Classifier in R
Activity 4: Building Linear Models for the GermanCredit Dataset
Activity 5: Using Multiple Variables for a Regression Model for the Boston Housing Dataset
Summary
Chapter 2
Data Cleaning and Pre-processing
Introduction
Advanced Operations on Data Frames
Exercise 6: Sorting the Data Frame
Join Operations
Pre-Processing of Data Frames
Exercise 7: Centering Variables
Exercise 8: Normalizing the Variables
Exercise 9: Scaling the Variables
Activity 6: Centering and Scaling the Variables
Extracting the Principal Components
Exercise 10: Extracting the Principal Components
Subsetting Data
Exercise 11: Subsetting a Data Frame
Data Transposes
Identifying the Input and Output Variables
Identifying the Category of Prediction
Handling Missing Values, Duplicates, and Outliers
Handling Missing Values
Exercise 12: Identifying the Missing Values
Techniques for Handling Missing Values
Exercise 13: Imputing Using the MICE Package
Exercise 14: Performing Predictive Mean Matching
Handling Duplicates
Exercise 15: Identifying Duplicates
Techniques Used to Handle Duplicate Values
Handling Outliers
Exercise 16: Identifying Outlier Values
Techniques Used to Handle Outliers
Exercise 17: Predicting Values to Handle Outliers
Handling Missing Data
Exercise 18: Handling Missing Values
Activity 7: Identifying Outliers
Pre-Processing Categorical Data
Handling Imbalanced Datasets
Undersampling
Exercise 19: Undersampling a Dataset
Oversampling
Exercise 20: Oversampling
ROSE
Exercise 21: Oversampling using ROSE
SMOTE
Exercise 22: Implementing the SMOTE Technique
Activity 8: Oversampling and Undersampling using SMOTE
Activity 9: Sampling and Oversampling using ROSE
Summary
Chapter 3
Feature Engineering
Introduction
Types of Features
Datatype-Based Features
Date and Time Features
Exercise 23: Creating Date Features
Exercise 24: Creating Time Features
Time Series Features
Exercise 25: Binning
Activity 10: Creating Time Series Features – Binning
Summary Statistics
Exercise 26: Finding Description of Features
Standardizing and Rescaling
Handling Categorical Variables
Skewness
Exercise 27: Computing Skewness
Activity 11: Identifying Skewness
Reducing Skewness Using Log Transform
Exercise 28: Using Log Transform
Derived Features or Domain-Specific Features
Adding Features to a Data Frame
Exercise 29: Adding a New Column to an R Data Frame
Handling Redundant Features
Exercise 30: Identifying Redundant Features
Text Features
Exercise 31: Automatically Generating Text Features
Feature Selection
Correlation Analysis
Exercise 32: Plotting Correlation between Two Variables
P-Value
Exercise 33: Calculating the P-Value
Recursive Feature Elimination
Exercise 34: Implementing Recursive Feature Elimination
PCA
Exercise 35: Implementing PCA
Activity 12: Generating PCA
Ranking Features
Variable Importance Approach with Learning Vector Quantization
Exercise 36: Implementing LVQ
Variable Importance Approach Using Random Forests
Exercise 37: Finding Variable Importance in the PimaIndiansDiabetes Dataset
Activity 13: Implementing the Random Forest Approach
Variable Importance Approach Using a Logistic Regression Model
Exercise 38: Implementing the Logistic Regression Model
Determining Variable Importance Using rpart
Exercise 39: Variable Importance Using rpart for the PimaIndiansDiabetes Data
Activity 14: Selecting Features Using Variable Importance
Summary
Chapter 4
Introduction to neuralnet and Evaluation Methods
Introduction
Classification
Binary Classification
Exercise 40: Preparing the Dataset
Balanced Partitioning Using the groupdata2 Package
Exercise 41: Partitioning the Dataset
Exercise 42: Creating Balanced Partitions
Leakage
Exercise 43: Ensuring an Equal Number of Observations Per Class
Standardizing
Neural Networks with neuralnet
Activity 15: Training a Neural Network
Model Selection
Evaluation Metrics
Accuracy
Precision
Recall
Exercise 44: Creating a Confusion Matrix
Exercise 45: Creating Baseline Evaluations
Over and Underfitting
Adding Layers and Nodes in neuralnet
Cross-Validation
Creating Folds
Exercise 46: Writing a Cross-Validation Training Loop
Activity 16: Training and Comparing Neural Network Architectures
Activity 17: Training and Comparing Neural Network Architectures with Cross-Validation
Multiclass Classification Overview
Summary
Chapter 5
Linear and Logistic Regression Models
Introduction
Regression
Linear Regression
Exercise 47: Training Linear Regression Models
R²
Exercise 48: Plotting Model Predictions
Exercise 49: Incrementally Adding Predictors
Comparing Linear Regression Models
Evaluation Metrics
MAE
RMSE
Differences between MAE and RMSE
Exercise 50: Comparing Models with the cvms Package
Interactions
Exercise 51: Adding Interaction Terms to Our Model
Should We Standardize Predictors?
Repeated Cross-Validation
Exercise 52: Running Repeated Cross-Validation
Exercise 53: Validating Models with validate()
Activity 18: Implementing Linear Regression
Log-Transforming Predictors
Exercise 54: Log-Transforming Predictors
Logistic Regression
Exercise 55: Training Logistic Regression Models
Exercise 56: Creating Binomial Baseline Evaluations with cvms
Exercise 57: Creating Gaussian Baseline Evaluations with cvms
Regression and Classification with Decision Trees
Exercise 58: Training Random Forest Models
Model Selection by Multiple Disagreeing Metrics
Pareto Dominance
Exercise 59: Plotting the Pareto Front
Activity 19: Classifying Room Types
Summary
Chapter 6
Unsupervised Learning
Introduction
Overview of Unsupervised Learning (Clustering)
Hard versus Soft Clusters
Flat versus Hierarchical Clustering
Monothetic versus Polythetic Clustering
Exercise 60: Monothetic and Hierarchical Clustering on a Binary Dataset
DIANA
Exercise 61: Implement Hierarchical Clustering Using DIANA
AGNES
Exercise 62: Agglomerative Clustering Using AGNES
Distance Metrics in Clustering
Exercise 63: Calculate Dissimilarity Matrices Using Euclidean and Manhattan Distance
Correlation-Based Distance Metrics
Exercise 64: Apply Correlation-Based Metrics
Applications of Clustering
k-means Clustering
Exploratory Data Analysis Using Scatter Plots
The Elbow Method
Exercise 65: Implementation of k-means Clustering in R
Activity 20: Perform DIANA, AGNES, and k-means on the Built-In Motor Car Dataset
Summary
Appendix
Chapter 1: An Introduction to Machine Learning
Activity 1: Finding the Distribution of Diabetic Patients in the PimaIndiansDiabetes Dataset
Activity 2: Grouping the PimaIndiansDiabetes Data
Activity 3: Performing EDA on the PimaIndiansDiabetes Dataset
Activity 4: Building Linear Models for the GermanCredit Dataset
Activity 5: Using Multiple Variables for a Regression Model for the Boston Housing Dataset
Chapter 2: Data Cleaning and Pre-processing
Activity 6: Pre-processing using Center and Scale
Activity 7: Identifying Outliers
Activity 8: Oversampling and Undersampling
Activity 9: Sampling and Oversampling using ROSE
Chapter 3: Feature Engineering
Activity 10: Calculating Time series Feature – Binning
Activity 11: Identifying Skewness
Activity 12: Generating PCA
Activity 13: Implementing the Random Forest Approach
Activity 14: Selecting Features Using Variable Importance
Chapter 4: Introduction to neuralnet and Evaluation Methods
Activity 15: Training a Neural Network
Activity 16: Training and Comparing Neural Network Architectures
Activity 17: Training and Comparing Neural Network Architectures with Cross-Validation
Chapter 5: Linear and Logistic Regression Models
Activity 18: Implementing Linear Regression
Activity 19: Classifying Room Types
Chapter 6: Unsupervised Learning
Activity 20: Perform DIANA, AGNES, and k-means on the Built-In Motor Car Dataset