1 Introduction to feature engineering
1.1 What is feature engineering, and why does it matter?
Who needs feature engineering?
What feature engineering cannot do
1.2 The feature engineering pipeline
1.3 How this book is organized
The five types of feature engineering
A brief overview of this book’s case studies
2 The basics of feature engineering
Qualitative data vs. quantitative data
2.3 The types of feature engineering
2.4 How to evaluate feature engineering efforts
Evaluation metric 1: Machine learning metrics
Evaluation metric 2: Interpretability
Evaluation metric 3: Fairness and bias
Evaluation metric 4: ML complexity and speed
3 Healthcare: Diagnosing COVID-19
3.1 The COVID flu diagnostic dataset
The problem statement and defining success
Imputing missing quantitative data
Imputing missing qualitative data
Numerical feature transformations
3.5 Building our feature engineering pipeline
4 Bias and fairness: Modeling recidivism
The problem statement and defining success
4.3 Measuring bias and fairness
Disparate treatment vs. disparate impact
Building our baseline pipeline
Measuring bias in our baseline model
4.6 Building a bias-aware model
Feature construction: Using the Yeo-Johnson transformer to treat the disparate impact
Feature extraction: Learning fair representations using AIF360
5 Natural language processing: Classifying social media sentiment
5.1 The tweet sentiment dataset
The problem statement and defining success
Feature construction: Bag of words
Training an autoencoder to learn features
Introduction to transfer learning
Using BERT’s pretrained features
6 Computer vision: Object recognition
The problem statement and defining success
6.2 Feature construction: Pixels as features
6.3 Feature extraction: Histogram of oriented gradients
Optimizing dimension reduction with PCA
6.4 Feature learning with VGG-11
Using a pretrained VGG-11 as a feature extractor
Using fine-tuned VGG-11 features with logistic regression
7 Time series analysis: Day trading with machine learning
Rolling/expanding window features
Benefits of using a feature store
Wikipedia, MLOps, and feature stores
8.2 Setting up a feature store with Hopsworks
Using feature groups to select data
8.3 Creating training data in Hopsworks
9.1 Revisiting the feature engineering pipeline
Feature engineering is as crucial as ML model choice
Feature engineering isn’t a one-size-fits-all solution
9.3 Recap of feature engineering
9.4 Data type-specific feature engineering techniques
9.5 Frequently asked questions
When should I dummify categorical variables vs. leaving them as a single column?
How do I know if I need to deal with bias in my data?
9.6 Other feature engineering techniques
Combining learned features with conventional features