A GAN for creating music

In the final, grand example of this chapter, we are going to look at generating music with a GAN for games. Music generation is not especially difficult, but it lets us see a whole variation of the GAN that uses LSTM layers to identify sequences and patterns in music, and then attempts to rebuild that music from random noise into a passable sequence of notes and melodies. This sample becomes ethereal when you listen to the generated notes and realize the tune originated from a computer brain.

This sample comes from GitHub (https://github.com/megis7/musegen) and was developed by Michalis Megisoglou. The reason we look at code examples like this is to see the best of what others have produced and learn from it. In some cases, these samples stay close to the original, and in others not so much; we did have to tweak a few things here. Michalis also wrote a nice README on GitHub describing his implementation of museGAN (music generation with a GAN), so if you are interested in building on this example further, be sure to check out the GitHub site as well. There are a few implementations of museGAN available using various libraries; one of them uses TensorFlow.

We use Keras in this example to make it easier to understand. If you are serious about using TensorFlow, be sure to take a look at the TensorFlow version of museGAN as well.

This example trains the discriminator and generator separately, which means the discriminator needs to be trained first. For our first run, we will use the author's previously generated models, but we still need to do some setup; follow these steps:

  1. We first need to install a couple of dependencies. Open an Anaconda or Python window as an admin and run the following commands:
pip install music21
pip install h5py

  2. Music21 is a Python library for loading MIDI files. MIDI is a music interchange format used to describe, as you might have guessed, music/notes. The original models were trained on a collection of MIDI files that describe 300 chorales of Bach's music. (A short sketch of how Music21 reads a MIDI file follows the commands in the next step.)
  3. Navigate to the project folder and execute the script that runs the previously trained models, like so:
cd musegen
python musegen.py    # or: python3 musegen.py
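Here is a minimal sketch of how Music21 parses a MIDI file into notes and chords, the kind of preprocessing these scripts rely on. The file name and variable names below are placeholders, not code from the project:

from music21 import converter, note, chord

# Parse a MIDI file into a music21 stream (the path is a placeholder)
midi = converter.parse('bach_chorale.mid')

pitches, durations = [], []
for element in midi.flat.notes:
    if isinstance(element, note.Note):
        # A single note: record its pitch name, for example 'C4'
        pitches.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
        # A chord: record its pitches as a dot-separated string
        pitches.append('.'.join(str(p) for p in element.pitches))
    # Duration measured in quarter-note lengths
    durations.append(element.duration.quarterLength)

print(pitches[:10], durations[:10])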
  4. Running musegen.py will load the previously saved models and use them to train the generator and generate music. You could, of course, later train this GAN on other MIDI files of your choosing. There are plenty of free sources of MIDI files, from classical music to TV themes, game music, and modern pop. We use the author's original models in this example, but the possibilities are endless.
  5. Loading the music files and training can take a really long time, as training typically does, so take this opportunity to look at the code. Open the musegen.py file located in the project folder and take a look at around line 39, as follows:
print('loading networks...')
dir_path = os.path.dirname(os.path.realpath(__file__))
generator = loadModelAndWeights(os.path.join(dir_path, note_generator_dir, 'model.json'),
                                os.path.join(dir_path, note_generator_dir, 'weights-{:02d}.hdf5'.format(generator_epoch)))
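The loadModelAndWeights helper is defined elsewhere in the project. A helper like this typically combines two standard Keras calls: model_from_json for the architecture and load_weights for the HDF5 checkpoint. The following is a minimal sketch of that pattern, not the author's exact code:

from keras.models import model_from_json

def loadModelAndWeights(model_path, weights_path):
    # Rebuild the network architecture from the saved JSON description
    with open(model_path, 'r') as f:
        model = model_from_json(f.read())
    # Restore the trained parameters from the HDF5 checkpoint
    model.load_weights(weights_path)
    return model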
  6. The call to loadModelAndWeights loads the previously trained model from an HDF5, or hierarchical data format, file. The code earlier in the script sets up a number of variables that map notes to the vocabulary we will use to generate new notes.
  7. Locate the notegenerator.py file in the same project folder and take a look at the code that creates the model, as follows:
x_p = Input(shape=(sequence_length, pitch_dim,), name='pitches_input')
h = LSTM(256, return_sequences=True, name='h_lstm_p_1')(x_p)
h = LSTM(512, return_sequences=True, name='h_lstm_p_2')(h)
h = LSTM(256, return_sequences=True, name='h_lstm_p_3')(h)

# VAE for pitches
z_mean_p = TimeDistributed(Dense(latent_dim_p, kernel_initializer='uniform'))(h)
z_log_var_p = TimeDistributed(Dense(latent_dim_p, kernel_initializer='uniform'))(h)

z_p = Lambda(sampling)([z_mean_p, z_log_var_p])
z_p = TimeDistributed(Dense(pitch_dim, kernel_initializer='uniform', activation='softmax'))(z_p)

x_d = Input(shape=(sequence_length, duration_dim, ), name='durations_input')
h = LSTM(128, return_sequences=True)(x_d)
h = LSTM(256, return_sequences=True)(h)
h = LSTM(128, return_sequences=True)(h)

# VAE for durations
z_mean_d = TimeDistributed(Dense(latent_dim_d, kernel_initializer='uniform'))(h)
z_log_var_d = TimeDistributed(Dense(latent_dim_d, kernel_initializer='uniform'))(h)

z_d = Lambda(sampling)([z_mean_d, z_log_var_d])
z_d = TimeDistributed(Dense(duration_dim, kernel_initializer='uniform', activation='softmax'))(z_d)
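
# Merge the pitch and duration latent streams and decode them with a final LSTM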
conc = Concatenate(axis=-1)([z_p, z_d])
latent = TimeDistributed(Dense(pitch_dim + duration_dim, kernel_initializer='uniform'))(conc)
latent = LSTM(256, return_sequences=False)(latent)

o_p = Dense(pitch_dim, activation='softmax', name='pitches_output', kernel_initializer='uniform')(latent)
o_d = Dense(duration_dim, activation='softmax', name='durations_output', kernel_initializer='uniform')(latent)
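The Lambda layers in this listing call a sampling function defined elsewhere in the file. It implements the usual VAE reparameterization trick; a minimal sketch of such a function, assuming the standard Keras backend approach rather than the author's exact code, looks like this:

from keras import backend as K

def sampling(args):
    # Unpack the tensors passed in by the Lambda layer
    z_mean, z_log_var = args
    # Draw noise from a standard normal with the same shape as the mean
    epsilon = K.random_normal(shape=K.shape(z_mean))
    # Reparameterization trick: z = mu + sigma * epsilon
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

The two outputs, o_p and o_d, are then presumably wrapped into a single Keras Model that takes both the pitches_input and durations_input tensors; you can confirm this further down in notegenerator.py.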
  8. Note how we have changed from using Conv2D layers to LSTM layers, since we have gone from image recognition to recognizing sequences or patterns of notes. We have also gone from using more straightforward layers to a complex time-distributed architecture. In addition, the author uses a concept known as variational autoencoding to determine the distribution of notes in a sequence. This network is the most complex we have looked at so far, and there is a lot going on here. Don't fret too much about this example, except to see how the code flows. We will take a closer look at these types of advanced time-distributed networks in Chapter 4, Building a Deep Learning Gaming Chatbot.
  9. Let the sample run and generate some music samples into the samples/note-generator folder. As we get into more complex problems, training times will stretch from hours to days, or even longer for very complex problems. It is quite possible to build a network that you simply do not have the computing power to train in a reasonable time.
  10. Open the folder and double-click one of the sample files to listen to the generated MIDI file. Remember, this music was just generated by a computer brain. (A sketch of how those notes end up in a MIDI file follows these steps.)
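When the script finishes, the generated pitches and durations are converted back into MIDI with Music21. The following is a small sketch of that last step using placeholder note data, not the project's exact output code:

from music21 import stream, note

# Placeholder (pitch, quarter-length) pairs standing in for generated output
generated = [('C4', 1.0), ('E4', 0.5), ('G4', 0.5), ('C5', 2.0)]

song = stream.Stream()
for pitch_name, quarter_length in generated:
    n = note.Note(pitch_name)
    n.duration.quarterLength = quarter_length
    song.append(n)

# Write the stream out as a MIDI file that any player can open
song.write('midi', fp='my-generated-sample.mid')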

There is a lot of code that we did not cover in this example, so be sure to go back through the musegen.py file to get a better understanding of the flow and the types of layers used to build the network generator. In the next section, we explore how to train this GAN.
