In unsupervised learning, data points have no labels associated with them. Therefore, labels have to be assigned to them algorithmically, as shown in the following figure. In other words, the correct classes of the training dataset in unsupervised learning are unknown. Consequently, classes have to be inferred from the unstructured dataset, which implies that the goal of an unsupervised learning algorithm is to organize the data in some structured way by describing its underlying structure.
To overcome this obstacle in unsupervised learning, clustering techniques are commonly used to group unlabeled samples based on some similarity measure; this task therefore also involves mining hidden patterns for feature learning. Clustering is the process of intelligently categorizing the items in your dataset. The overall idea is that two items in the same cluster are “closer” to each other than items that belong to separate clusters. That is the general definition, leaving the interpretation of “closeness” open.
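The idea of grouping by “closeness” can be sketched with a minimal k-means implementation. This is an illustrative sketch, not a production algorithm: the toy 2-D points, the squared-Euclidean notion of distance, and the fixed number of iterations are all assumptions chosen for demonstration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means sketch: assign each point to its nearest
    centroid, then recompute each centroid as its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # "closeness" here is squared Euclidean distance
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # recompute centroids; keep the old one if a cluster went empty
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of 2-D points (hypothetical toy data)
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centroids, clusters = kmeans(data, k=2)
```

With this data, the two recovered centroids settle near (0.1, 0.1) and (5.1, 5.1), matching the two visually obvious groups.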
Techniques such as clustering, frequent pattern mining, and dimensionality reduction are used to solve unsupervised learning problems (they can be applied to supervised learning problems too). In this book, we will provide several examples of unsupervised learning algorithms, such as k-means, bisecting k-means, the Gaussian mixture model, Latent Dirichlet Allocation (LDA), and so on. We will also show how to use dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) in supervised learning through regression analysis. Dimensionality reduction offers several benefits:
- It reduces the time and storage space required in machine learning tasks
- It helps remove multicollinearity and improves the performance of the machine learning model
- Data visualization becomes easier when reduced to very low dimensions such as 2D or 3D
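As a brief illustration of the last point, PCA can be sketched via the SVD of the mean-centered data matrix. The toy dataset below is an assumption for demonstration: its third coordinate is a linear combination of the first two, so two principal components capture essentially all of the variance and the projection from 3-D to 2-D loses almost nothing.

```python
import numpy as np

# Hypothetical toy data: 100 samples in 3-D, where the third
# coordinate is a linear combination of the first two, so the
# data actually lies in a 2-D subspace.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
X = np.column_stack([X, X @ np.array([0.5, -0.3])])

# PCA via SVD of the mean-centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance explained by each component
explained = S**2 / np.sum(S**2)

# Project onto the top two principal components: 3-D -> 2-D
X2 = Xc @ Vt[:2].T
```

Here the first two entries of `explained` sum to (numerically) 1, confirming that the 2-D projection `X2` preserves the structure of the data and is easy to plot.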