What makes deep learning special?

Deep learning employs a stack of multiple hidden layers of non-linear processing units. The input of a hidden layer is the output of its previous layer. This can be easily observed from the examples of a shallow neural network and a deep neural network shown previously.

Features are extracted from each hidden layer. Features from different layers represent abstracts or patterns of different levels. Hence, higher-level features are derived from lower-level features, which are extracted from previous layers. All these together form a hierarchical representation learned from the data.

Take the cats and dogs image classification as an example, in traditional machine learning solutions, the classification step follows a feature extraction process, which is often based on:

Domain knowledge, such as color, shape, color of the animals, shape of the ears in this case, which are usually hand-crafted
Dimensionality reduction, such as principal component analysis (PCA), Latent Dirichlet Allocation (LDA)
Feature engineering techniques, such as histogram of oriented gradients transformation (HOG), Scale Invariant Feature Transform (SIFT), and Speeded up Robust Features (SURF)

The workflow of traditional machine learning solution to cats and dogs classification is displayed as follows:

However, in deep learning based solutions (such as CNNs, which we will be learning shortly), hierarchical representations are derived throughout the latent learning process and features of the highest level are then fed into the final classification step. These features capture the important and distinguishable details in the cat and dog images. Depending on the magic worked in hidden layers:

The low-level features can be edges, lines or dots of whiskers, nose or eyes, ears and so on
The higher-level features can be outlines or contours of the animals

The entire workflow of deep learning solution is shown as follows:

Deep learning removes those manual or explicit feature extraction steps, and instead relies on the training process to automatically discover useful patterns underneath the input data. And through tweaking the layout (number of layers, number of hidden units for a layer, activation function, and so on) of the networks, we can find the most efficient sets of features.

Recall the example of the shallow ANN and that of the deep learning model in the last section, data flow one-way from the input layer to the output. Besides feedforward architectures, deep learning models allow data to proceed in any direction, even to circle back to the input layer. Data looping back from the previous output becomes part of the next input data. Recurrent neural networks (RNNs) are great examples. We will be working on projects using RNNs later in this book. For now, we can still get a sense of what the recurrent or cycle-like architecture looks like from the diagram of RNNs as follows:

The recurrent architecture makes the models applicable to time series data and sequences of inputs. As data from previous time points goes into the training of the current time point, the deep learning recurrent model effectively solves a time series or sequence learning problem in a feedforward manner. In traditional machine learning solutions (read more in Machine Learning for Sequential Data: A Review by T. Dietterich) to time series problems, sliding windows of previous lags are usually provided as current inputs. This can be ineffective as the size of the sliding windows needs to be decided and so does the number of windows, while the recurrent models figure out timely or sequential relationships themselves.

Although we are discussing here all the advantages about deep learning over the other machine learning techniques, we did not make any claim or statement that the modern deep learning is superior to the traditional machine learning. That's right, there is no free lunch in this field, which was also emphasized in my last book, Python Machine Learning By Example. There is no single algorithm that can solve all machine learning problems more efficiently than others. It all depends on specific use cases - in some applications, the "traditional" ones are a better fit, or a deep learning setting makes no difference; in some cases, the "modern" ones yield better performance.

Next, we will see some typical applications of deep learning that will better motivate us to get started in deep learning projects.

Table of Contents for What makes deep learning special?

Create new playlist

Sign In

Sign Up

Table of Contents for
What makes deep learning special?