Feature Improvement - Cleaning Datasets

In the last two chapters, we moved from a basic understanding of feature engineering, and how it can be used to enhance our machine learning pipelines, to getting our hands dirty with datasets and evaluating the different types of data that we can encounter in the wild.

In this chapter, we will use what we have learned and take things a step further by beginning to change the datasets that we work with. Specifically, we will start to clean and augment our datasets. By cleaning, we will generally mean altering the columns and rows already given to us. By augmenting, we will generally mean removing columns from and adding columns to datasets. As always, our goal in all of these processes is to enhance our machine learning pipelines.

In this chapter and the ones that follow, we will be:

Identifying missing values in data
Removing harmful data
Imputing (filling in) these missing values
Normalizing/standardizing data
Constructing brand new features
Selecting (removing) features manually and automatically
Using mathematical matrix computations to transform datasets to different dimensions

These methods will help us develop a better sense of which features are important within our data. In this chapter, we will dive deeper into the first four methods and leave the other three for future chapters.
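
To give a rough feel for the first four methods before we cover them properly, here is a minimal sketch using pandas. The toy DataFrame, its column names, and its values are all made up for illustration, and the specific choices (dropping incomplete rows, filling with the column mean, z-score scaling) are assumptions standing in for just one of several reasonable options in each case.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the columns and values are invented for illustration.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 48000.0],
})

# 1. Identify missing values: count the NaNs in each column.
missing_counts = df.isnull().sum()

# 2. Remove harmful data: one simple option is to drop every row
#    that contains at least one missing value.
dropped = df.dropna()

# 3. Impute missing values: fill each gap with the mean of its column.
imputed = df.fillna(df.mean())

# 4. Standardize: rescale each column to zero mean and unit variance (z-score).
standardized = (imputed - imputed.mean()) / imputed.std()

print(missing_counts)
print(dropped)
print(imputed)
print(standardized)
```

Note that steps 2 and 3 are alternatives applied to the same original data: dropping rows shrinks the dataset, while imputing keeps every row at the cost of introducing estimated values.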
