The discrete cosine transform (DCT) is similar to the Fourier transform, but it represents a signal as a sum of cosine terms only (refer to equation 6.11). The DCT is used for signal compression and in the calculation of the mel frequency spectrum, which I mentioned in the Analyzing the frequency spectrum of audio recipe. We can convert an ordinary frequency f (in hertz) to the mel frequency m (a scale more appropriate for the analysis of speech and music) with the following equation:
$$m = 1127 \ln\left(1 + \frac{f}{700}\right)$$
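The conversion and its inverse can be sketched in plain Python as follows. The function names hz_to_mel and mel_to_hz are illustrative and are not part of dautil or ch6util; the constants 1127 and 700 are the ones used in this recipe's code.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels: m = 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Invert hz_to_mel: f = 700 * (exp(m / 1127) - 1)."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


# Equal steps in Hz shrink on the mel scale as the frequency rises.
for f in (100, 1000, 4000):
    print(f, round(hz_to_mel(f), 1))
```

With these constants, 1000 Hz maps to almost exactly 1000 mels, and the scale becomes increasingly compressed above that point. The same expression works element-wise on NumPy arrays if math.log is replaced by np.log, which is what the recipe does.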
The steps to create the mel frequency spectrum are not complicated, but there are quite a few of them. The relevant Wikipedia page is available at https://en.wikipedia.org/wiki/Mel-frequency_cepstrum (retrieved September 2015). If you do a quick web search, you can find a couple of Python libraries that implement the algorithm. I implemented a very simple version of the computation in this recipe.
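For orientation, the outline on the Wikipedia page can be sketched for a single audio frame as follows: window the frame and take its power spectrum, pool the power with triangular filters spaced evenly on the mel scale, take the logarithm, and apply a DCT. This is a minimal sketch under assumptions, not the simplified computation used in this recipe and not a reference implementation: the helper names, the Hamming window, the 20 filters, the 12 coefficients, and the synthetic test tone are all illustrative choices.

```python
import numpy as np


def hz_to_mel(f):
    return 1127.0 * np.log(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (np.exp(m / 1127.0) - 1.0)


def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    freqs = np.fft.rfftfreq(n_fft, 1.0 / rate)  # FFT bin frequencies in Hz
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] cos(pi * k * (2n + 1) / (2N))."""
    n = np.arange(x.size)
    k = n[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * x.size)) @ x


def mfcc_frame(frame, rate, n_filters=20, n_coeffs=12):
    windowed = frame * np.hamming(frame.size)        # taper the frame edges
    power = np.abs(np.fft.rfft(windowed)) ** 2       # power spectrum
    mel_power = mel_filterbank(n_filters, frame.size, rate) @ power
    log_mel = np.log(mel_power + 1e-10)              # small offset avoids log(0)
    return dct2(log_mel)[:n_coeffs]                  # keep the first coefficients


# Synthetic frame (an assumption for illustration): two sine tones at 8 kHz.
rate = 8000
t = np.arange(512) / rate
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t)
coeffs = mfcc_frame(frame, rate)
print(coeffs.shape)
```

A full implementation would slide this computation over overlapping frames of a recording; the sketch handles one frame only to keep the steps visible.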
import dautil as dl
from scipy.fftpack import dct
import matplotlib.pyplot as plt
import ch6util
import seaborn as sns
import numpy as np
from IPython.display import HTML

# Load the audio samples and compute their DCT
rate, audio = ch6util.read_wav()
transformed = dct(audio)

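# Plot the DCT amplitude spectrum on a logarithmic scale.
# NOTE: `context` is not defined in this excerpt; it must be created beforehand.
sp = dl.plotting.Subplotter(2, 2, context)
freqs = np.fft.fftfreq(audio.size, 1./rate)
indices = np.where(freqs > 0)[0]
sp.ax.semilogy(np.abs(transformed)[indices])
sp.label()
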
# Plot the distribution of the log amplitude
sns.distplot(np.log(np.abs(transformed)), ax=sp.next_ax())
sp.label()

# Plot the distribution of the phase (0 or pi for real-valued DCT coefficients)
sns.distplot(np.angle(transformed), ax=sp.next_ax())
sp.label()

# Plot the mel amplitude spectrum
magnitude = ch6util.amplitude(audio)
# The cepstrum is computed here but not used in the plot below
cepstrum = dl.ts.power(np.fft.ifft(np.log(magnitude ** 2)))
mel = 1127 * np.log(1 + freqs[indices]/700)
sp.next_ax().plot(mel, ch6util.amplitude(dct(np.log(magnitude[indices] ** 2))))
sp.label()
HTML(sp.exit())

Refer to the following screenshot for the end result:
The code for this recipe is in the analyzing_dct.ipynb file in this book's code bundle.
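The recipe depends on the book's ch6util and dautil helpers. As a dependency-free complement, the following sketch illustrates two points made at the start of the recipe: the DCT is a sum of cosine terms only, and it is useful for compression because a smooth signal is captured well by a few coefficients. The functions dct2 and idct2, the synthetic signal, and the choice of keeping 8 coefficients are assumptions made for illustration. SciPy's default dct uses the same type-II form with an extra factor of 2.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]


def idct2(X):
    """Inverse of dct2 (a DCT-III scaled by 2 / N)."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                            for k in range(1, N))) * 2 / N
            for n in range(N)]


# Synthetic smooth signal: one cosine component plus a slow ramp.
N = 64
signal = [math.cos(math.pi * 3 * (2 * n + 1) / (2 * N)) + 0.5 * n / N
          for n in range(N)]
coeffs = dct2(signal)

# Keep the `keep` largest-magnitude coefficients (ties may keep more) and zero the rest.
keep = 8
threshold = sorted((abs(c) for c in coeffs), reverse=True)[keep - 1]
compressed = [c if abs(c) >= threshold else 0.0 for c in coeffs]

approx = idct2(compressed)
rms_error = math.sqrt(sum((a - s) ** 2 for a, s in zip(approx, signal)) / N)
print(f"kept {keep} of {N} coefficients, RMS error = {rms_error:.4f}")
```

The naive double loop costs O(N^2) operations and is meant only to expose the formula; for real audio, a library routine such as the dct function used in the recipe is the practical choice.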