Chapter 13

What is machine learning?

Machine learning is a discipline (a branch of artificial intelligence) that focuses on automatic model building. Machine learning algorithms allow us to automatically find patterns or hierarchies in data (unsupervised learning), or even predict properties of a given sample after training on a prepared "training" dataset (supervised learning).

What is the difference between supervised and unsupervised learning?

Unsupervised learning algorithms operate on any given dataset with no special preparation required and aim to find patterns or structures without any prior knowledge. Supervised learning models are trained on a properly labeled "training" set, from which they build a generalized model; they can then infer values for new data samples they haven't seen before.
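As a minimal illustration (a sketch assuming scikit-learn and a made-up toy dataset, not code from the text), notice that an unsupervised estimator is fit on the features alone, while a supervised one also requires labels:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data: 6 samples with 2 features, plus labels for the supervised case
X = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, 2.1],
              [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]])
y = np.array([0, 0, 0, 1, 1, 1])

# Unsupervised: no labels, the algorithm finds structure on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)

# Supervised: trained on labeled data, then predicts for unseen samples
clf = LogisticRegression().fit(X, y)
print(clf.predict([[8.1, 9.0]]))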

What are the drawbacks of k-means clustering? Why do we need to use a scaler?

K-means has several drawbacks: it can't identify clusters of a non-convex shape, it requires the number of clusters to be defined in advance, and it needs proper scaling. Scaling is needed to bring features measured in different units onto one common scale, but doing so affects the interpretability of the resulting clusters.
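A minimal sketch of this, assuming scikit-learn and hypothetical income/age data, shows how a StandardScaler can be chained with KMeans so that the large-valued feature doesn't dominate the distance metric:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: income in dollars, age in years
X = np.array([[50_000, 25], [52_000, 27], [51_000, 62],
              [120_000, 30], [118_000, 58], [121_000, 60]])

# Without scaling, the income column dominates the distance metric;
# StandardScaler brings both columns to zero mean and unit variance
model = make_pipeline(StandardScaler(),
                      KMeans(n_clusters=2, n_init=10, random_state=0))
labels = model.fit_predict(X)
print(labels)

# The cluster centers are now expressed in scaled units, which is exactly
# the interpretability cost mentioned above
print(model.named_steps["kmeans"].cluster_centers_)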

How does the KNN model work? What are the benefits and limitations of such a model?

KNN predicts new records by finding the K (hence the name) nearest records in the "training" dataset and inferring a value from them (for example, by taking a weighted average). It is very simple, works relatively well on certain types of data, and needs no time to train; most of the computation is done in the prediction phase. The limitations of such a model are its limited scalability (as it needs access to the whole training set at prediction time), its inability to predict beyond the range of the training set, and its limited accuracy. Most importantly, though, the KNN model uses all of the provided features equally; if it is fed a useless, random feature, the performance of the model may decrease.
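A short sketch, again assuming scikit-learn and toy one-feature data, illustrates both the distance-weighted averaging and the inability to extrapolate:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny training set: one feature, one numeric target
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_train = np.array([1.5, 2.5, 3.5, 10.5, 11.5, 12.5])

# "Training" only stores the data; the real work happens at prediction time,
# when the 3 nearest neighbours are found and distance-weighted
knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn.fit(X_train, y_train)

print(knn.predict([[2.1]]))    # interpolates between nearby samples
print(knn.predict([[100.0]]))  # cannot predict beyond the training targets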

Why does linear regression give more interpretation than KNN? Do we need to scale data in this case?

In contrast to KNN, the linear model condenses all of the knowledge it gains from training into a simple one-dimensional array of coefficients: one per feature, plus a bias. This makes prediction simple and fast, and provides a bird's-eye, simple but relatable interpretation, putting a direct "price tag" on each feature. In this case, scaling is not strictly necessary, but it will obviously affect the interpretation of the coefficients.
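For example (a sketch with made-up housing-style numbers, assuming scikit-learn's LinearRegression), the fitted coefficients can be read directly as a "price tag" per feature:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: [area in square meters, number of rooms] -> price
X = np.array([[50, 2], [60, 2], [80, 3], [100, 3], [120, 4]])
y = np.array([150_000, 170_000, 220_000, 260_000, 310_000])

model = LinearRegression().fit(X, y)

# One coefficient per feature ("price tag") plus a bias (intercept)
print(model.coef_)       # value added per extra square meter and per extra room
print(model.intercept_)  # baseline price when all features are zero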

How do decision trees work?

Decision trees are yet another machine learning algorithm. To predict values, the DT model builds a binary tree of "questions," each asking whether a certain feature is greater or smaller than a certain threshold. On each iteration, the algorithm picks the feature and threshold that best separate the target values between the two resulting groups. The final "leaves" of this tree are associated with the average (for regression) or most frequent (for classification) target value of the training samples that fall into them. Decision trees offer good interpretability, are fast, and can perform relatively well, but they are extremely prone to overfitting.
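As an illustrative sketch (assuming scikit-learn and its built-in iris dataset), a shallow tree can be printed as a set of threshold questions; limiting its depth is one common way to curb overfitting:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Shallow tree on the built-in iris dataset; capping max_depth is one
# simple way to fight the overfitting mentioned above
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each internal node is a "feature <= threshold" question; each leaf
# stores the majority class of the training samples that reach it
print(export_text(tree, feature_names=list(iris.feature_names)))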
