DSP of Billie Jean

Recently I struggled to convince an audience that spectral analysis of audible sounds matters. Thinking it over, I felt obliged to share what I consider the important ideas in audio processing. Specifically, I want to lay a foundation of spectral thinking in the reader’s mind. I believe that whenever an audible signal is at hand, spectral analysis is essential. I would go further: anyone who works with an audio signal while unaware of spectral analysis is handling it like a blind man trying to appreciate the hues and colors of life under bright sunshine.

When it comes to spectral analysis, I would also assert that Fourier analysis is sufficient in most cases, although one may use other methods, such as wavelet analysis, as well. But what do all of these terms mean? Let’s not worry about them right now. We shall start with the simple stuff.

To make life easier for my audience, I made a screencast of a famous Michael Jackson song, Billie Jean. It is a well-known pop song, full of drum beats, guitar riffs, and other instruments. I am not well versed in Western musical instruments, but I believe this song will serve my purpose. So, please click on the thumbnail below to open the song in a new tab of your browser window.

First, I would like to draw your attention to the video. It shows a spectral analyzer with four boxes, each containing a graph. As the video plays, you can see the graphs pulsing in time with the music. Let me first explain these graphs a little.

The graph in the upper left (UL) corner is a plot of the Fast Fourier Transform (FFT). Again, don’t be put off by the fancy term. In simple words, it plots the different frequencies present in the audio (the song in our case) at any given point in time, along with the magnitude of each of those frequencies. Notice the white box at the top of the graph: it has a number in it, and it keeps changing its place. Please also notice the way it moves. The number in the box is the frequency that has the highest magnitude at that moment. Since the strongest frequency changes from time to time, depending on which instrument or note is being played, the box keeps shifting its position along the x-axis.
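To make the white box less mysterious, here is a minimal sketch of how such a peak frequency could be found. It is not the analyzer's actual code. The 8000 Hz sample rate, the two test tones, and the helper names `magnitude_spectrum` and `peak_frequency` are all assumptions made for illustration, and a plain (slow) discrete Fourier transform stands in for the FFT, which computes the same values faster.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
N = 400             # 50 ms of audio at the assumed sample rate


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2; an FFT gives the same numbers faster."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def peak_frequency(samples, sample_rate):
    """Frequency in Hz of the bin with the largest magnitude."""
    mags = magnitude_spectrum(samples)
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / len(samples)


# Hypothetical test signal: a loud 440 Hz tone plus a quieter 1000 Hz tone.
signal = [
    1.0 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
    + 0.4 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
    for i in range(N)
]

peak = peak_frequency(signal, SAMPLE_RATE)
print(peak)  # 440.0, the louder of the two tones
```

If the 1000 Hz tone were made the louder one, the reported peak would jump to 1000 Hz, which is the kind of jump the white box makes along the x-axis.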

The way this box changes its position reminds me of the live dancing of the late Michael Jackson himself. It appears as if the box is dancing like Michael Jackson, or at least as if a dance is going on. Why is that? In my opinion, the reason is that Michael wrote the song, composed its music, and choreographed his dance routines by putting all his heart and soul into them. So when he performed live, he moved his body and limbs in perfect harmony with the changing notes. With every move, he captured a change in the notes of the song.

But the white box is no dancer. It is an inanimate object. So why does it move in harmony with the beat of the song? Well, it does not. It simply displays the most prominent frequency in the song at any given point in time. As that frequency changes, the box changes its position, so it appears to be like a shadow of Michael.

What is the point of saying all of this? Let’s review the rest of the boxes first. The box in the lower left (LL) corner is a similar plot, but on an octave scale. Most music lovers know what an octave scale is, so I am not going to go into the details. In brief, it is a logarithmic frequency scale on which each doubling of frequency (one octave) takes up the same distance. In essence, it shows what the UL box shows, but on a different scale.
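For readers who have not met an octave scale, a small sketch may help. The reference frequency of 55 Hz and the function name `octaves_above` are assumptions chosen for illustration; the only idea taken from the text is that the axis is logarithmic.

```python
import math


def octaves_above(freq_hz, ref_hz=55.0):
    """Position of freq_hz on an octave (log base 2) axis, relative to ref_hz."""
    return math.log2(freq_hz / ref_hz)


# Each frequency below is double the previous one.
frequencies = (55.0, 110.0, 220.0, 440.0, 880.0)
positions = [octaves_above(f) for f in frequencies]
print(positions)
```

On a linear axis these five frequencies would sit further and further apart. On the octave axis they land at evenly spaced positions, one octave per step.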

The box in the upper right (UR) corner is the spectrogram. It, too, displays the spectral (frequency) content of the signal, the song in our case, but it shows how that content evolves over time.

The last box is the one in the lower right (LR) corner. It contains a single thread-like trace that fluctuates along with the music. This is what the music is doing as a function of time. You can think of it as the voltage that fluctuates with the beat of the music. What is it, anyway? This box shows the overall loudness level of the song at any given moment in time. We shall come back to it later, as it will help us understand various concepts in signal processing.

The last thing is the bar to the left of the analyzer. This bar moves up and down in step with the ups and downs of the song. It, too, shows the loudness level of the song at any given moment in time.

The overall loudness, as the name suggests, is the sum total of the loudness of all the instruments being played at any given point in time. Several instruments are involved in creating the music: there is the sound of the drums, the sound of the guitar, and the voice of the vocalist. In all, there is a handful of them. Moreover, each one is possibly producing a sound of a different frequency from the others at every point in time. The sum total of the loudness of these sounds of different frequencies at any point in time is what we call the overall loudness of the audio. Well, it need not be the sum total at all. It can be a mean, or an aggregate of any kind. These aggregate loudness levels are shown in LR and in the bar to the left of the analyzer.

So the overall loudness lumps all the information in the audio together. We can see that its fluctuation is in sync with the rhythm of the song. However, a huge problem with the overall loudness level is that it hides a lot of information about the audio. In the case of music, we do not know which notes are being played at any point in time. As a result, we remain completely in the dark about the composition of the music.
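Here is a small sketch of why an aggregate level hides the notes. I use the root mean square (RMS) as the aggregate, which is only one possible choice and not necessarily what the analyzer in the video uses. The sample rate, the two test frequencies, and the helper names `tone` and `rms` are likewise assumptions for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
N = 400             # 50 ms of audio, a whole number of cycles for both tones


def tone(freq_hz, n, amplitude=1.0):
    """n samples of a sine wave at freq_hz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def rms(samples):
    """Root mean square: one number summarizing the level of a segment."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


low_note = tone(200, N)    # a low note
high_note = tone(1000, N)  # a much higher note

level_low = rms(low_note)
level_high = rms(high_note)
print(level_low, level_high)
```

Both levels come out essentially equal (about 0.707 for a sine of amplitude 1), even though the two notes sound nothing alike. A level meter alone cannot tell them apart.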

Stereo systems actually use the overall loudness in the form of time-varying voltages to render audible music through their speakers. The voltages are passed to the speakers, which turn them into audible sound.

But what if we wanted to know the overall composition of the music? What instruments were played at a particular point in time, what were the notes, and how loud were they? Can we answer these questions? Fortunately, the answer is yes. And we owe it to the genius of the French mathematician Jean-Baptiste Joseph Fourier (1768-1830), whose work gave us the famous Fourier transform. Fourier accompanied Napoleon on his Egyptian campaign as a scientific adviser. However, his invention of this transform is bigger than all the military campaigns of the conqueror combined. Actually, it would not be an exaggeration to say that the Fourier transform has had a greater impact than all the wars in history combined. We owe most of the digital communication we see around us today to Fourier.

According to Fourier, the different frequencies that have been added together are separable, and the Fourier transform is the tool that separates them. We see exactly this happening in the LL, UL and UR boxes of the analyzer. The Fourier transform takes as input what is shown in the LR box and creates spectra out of it, which is what LL, UL and UR show. This is a remarkable contribution of Fourier analysis. By applying it, we get all the frequencies that make up the music, along with their respective magnitudes, and we get them for small segments of time. To this end, the Fourier analyzer takes a small segment of music at a time (roughly 25 milliseconds) and reveals its spectral information.
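The segment-by-segment idea can be sketched in a few lines. This is not the analyzer's real implementation: the 8000 Hz sample rate, the two-note test melody, and the helper names are assumptions, and a plain discrete Fourier transform stands in for the FFT. Real analyzers usually also apply a window function and overlap the segments, which I leave out to keep the sketch short.

```python
import cmath
import math

SAMPLE_RATE = 8000                          # samples per second (assumed)
FRAME_MS = 25                               # segment length from the text
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 200 samples per segment


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for one segment."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def peak_per_frame(samples):
    """Strongest frequency (Hz) in each consecutive 25 ms segment."""
    peaks = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        mags = magnitude_spectrum(samples[start:start + FRAME_LEN])
        k = max(range(len(mags)), key=mags.__getitem__)
        peaks.append(k * SAMPLE_RATE / FRAME_LEN)
    return peaks


def tone(freq_hz, n):
    """n samples of a unit-amplitude sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical melody: 50 ms of a 400 Hz note, then 50 ms of an 800 Hz note.
melody = tone(400, 2 * FRAME_LEN) + tone(800, 2 * FRAME_LEN)

peaks = peak_per_frame(melody)
print(peaks)  # [400.0, 400.0, 800.0, 800.0]
```

The list of peaks tracks the change of note from one segment to the next. Keeping the full magnitude list of every segment, instead of only its peak, and stacking those lists side by side gives the kind of picture a spectrogram shows.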

So what is the big deal about performing Fourier analysis? The big deal is that by applying the transform we went from knowing nothing about a seemingly interesting signal to knowing everything about its spectral makeup. This is a tremendous achievement. With this information in hand, we can understand the mind of the musician, and we can train a machine to understand it as well. Not having this information is tantamount to dealing with random electrical glitches. These glitches are shown in LR.

Photo by frans16611