Putting it all together

After working through a variety of problems with our dataset, from identifying missing values disguised as zeros, to imputing those missing values, to normalizing features that sit on different scales, it's time to put all of our scores into a single table and see which combination of feature engineering steps did best:

| Pipeline description | Number of rows the model learned from | Cross-validated accuracy |
|---|---|---|
| Drop missing-valued rows | 392 | 0.7449 |
| Impute values with 0 | 768 | 0.7304 |
| Impute values with mean of column | 768 | 0.7318 |
| Impute values with median of column | 768 | 0.7357 |
| Z-score normalization with median imputing | 768 | 0.7422 |
| Min-max normalization with mean imputing | 768 | 0.7461 |
| Row-normalization with mean imputing | 768 | 0.6823 |

Mean imputing followed by min-max normalization gives the best cross-validated accuracy of the pipelines we tried: 0.7461, narrowly ahead of the 0.7449 we got by dropping rows with missing values, and it does so while still using all 768 available rows. Great!
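To make the winning combination concrete, here is a minimal sketch of mean imputing followed by min-max normalization, written in plain Python so it runs without any extra packages. The helper names `impute_mean` and `min_max_scale` and the toy data are assumptions made for illustration; they are not the code or the dataset used to produce the scores in the table.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    columns = list(zip(*rows))
    # Column means are computed from the non-missing values only.
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def min_max_scale(rows):
    """Rescale each column to the range [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (v - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j, v in enumerate(row)
        ]
        for row in rows
    ]


# Toy data, made up for illustration: None marks a missing value.
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 30.0],
]

filled = impute_mean(toy)       # missing entries become 3.0 and 20.0
scaled = min_max_scale(filled)  # every column now spans 0.0 to 1.0
print(scaled)
```

In practice, a library such as scikit-learn offers `SimpleImputer` and `MinMaxScaler` for these two steps. When cross-validating, the column means, minimums, and maximums should be learned from the training folds only and then applied to the held-out fold, so that no information leaks from the data used for scoring.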
