Leaky and other ReLUs

LeakyReLU adds an activation layer that gives negative inputs a small slope, rather than the flat 0 output of the standard ReLU activation function. The standard ReLU encourages sparsity in the network by only allowing neurons with positive activations to fire. However, this also creates dead neurons, where parts of the network effectively die off and become untrainable. To overcome this issue, we introduce a leaky form of ReLU activation called LeakyReLU. An example of how this activation works is shown here:



Example of a leaky and parametric ReLU

Also pictured in the preceding diagram is Parametric ReLU (PReLU), which is similar to LeakyReLU except that the slope for negative inputs is a parameter the network trains itself. This lets the network adjust the activation on its own, but it takes longer to train.
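
The following is a minimal sketch, assuming the tf.keras API, of how these two layers can be dropped into a model; the layer sizes and the slope value of 0.2 are placeholder assumptions for illustration, not values from this chapter:

```python
# A minimal sketch, assuming tf.keras: LeakyReLU uses a fixed negative
# slope (alpha), while PReLU learns that slope during training.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(64, input_shape=(128,)),
    layers.LeakyReLU(alpha=0.2),   # fixed slope of 0.2 for negative inputs
    layers.Dense(64),
    layers.PReLU(),                # negative slope is a trainable parameter
    layers.Dense(1, activation='sigmoid'),
])
model.summary()
```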

The other ReLU variants you can use are summarized here:

  • Exponential Linear (ELU, SELU): These forms of ReLU replace the hard zero for negative inputs with a smooth exponential curve, as shown in the following diagram:
ELU and SELU
  • Concatenated ReLU (CReLU): This concatenates two ReLU activations, one applied to the input and one to its negation, producing a function with two output values. For a positive input x, it generates [x, 0], while for a negative input it returns [0, -x]. One thing to note about this layer is that the output doubles in size, since two values are generated per neuron.
  • ReLU-6: This variant caps the activation at 6, so it outputs min(max(0, x), 6). The value of 6 is somewhat arbitrary, but the cap encourages the network to train sparse neurons. Sparsity is valuable because it encourages the network to build stronger individual weights, or bonds. The human brain has been shown to function in a sparse state, with only a few neurons activated at a time. You will often hear the myth that we use at most 10% of our brain at any one time. This may very well be true, but the reasons for it are mathematical rather than an inability to use the whole brain; we do use our entire brain, just not all of it at the same time. Stronger individual weights, encouraged by sparsity, allow the network to make better, stronger decisions. Fewer active weights also mean less overfitting or memorization of data, which can often happen in deep networks with thousands of neurons. A short sketch of how these variants behave is shown after this list.
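
The following minimal sketch, assuming TensorFlow's built-in ops for these variants (tf.nn.elu, tf.nn.selu, tf.nn.relu6, and tf.nn.crelu), shows how each behaves on a few sample values; the sample tensor is only for illustration:

```python
# A minimal sketch, assuming TensorFlow's built-in activation ops.
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 8.0])

print(tf.nn.elu(x))     # smooth exponential curve for negative inputs
print(tf.nn.selu(x))    # scaled ELU, used with self-normalizing networks
print(tf.nn.relu6(x))   # standard ReLU, but capped at 6
print(tf.nn.crelu(x))   # concatenates ReLU(x) and ReLU(-x): output is twice as wide
```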

Regularization is another technique we will often use to trim or reduce unneeded weights and create sparse networks. We will have a few opportunities to look at regularization and sparsity in the coming chapters.
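
As a small preview, the following is a hedged sketch of what weight regularization might look like in Keras; the use of an L1 penalty and its strength are illustrative assumptions, not settings we use in this chapter:

```python
# A minimal sketch, assuming tf.keras regularizers: an L1 penalty pushes
# unneeded weights toward zero, which encourages a sparser network.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

sparse_layer = layers.Dense(
    64,
    activation='relu',
    kernel_regularizer=regularizers.l1(1e-4),  # penalty strength is an assumed value
)
```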

In the next section, we use what we have learned to build a working music GAN that can generate game music.
