Recently I found myself struggling to convince an audience why spectral analysis of audible sounds is so important. As I thought further about it, I felt it was incumbent upon me to share what I consider the important ideas in audio processing. Specifically, I want to lay a foundation of spectral thinking in the reader’s mind. I personally think that whenever there are audible signals at hand, spectral analysis is quite important. In fact, I have come to claim that anyone who has to deal with an audio signal while oblivious to spectral analysis is simply playing with the signal like a blind man trying to appreciate the hues and colors of life under auspicious sunshine.
When it comes to spectral analysis, I would also assert that Fourier analysis is sufficient in most cases. One may, however, use other methods as well, such as wavelet analysis. But what do all of these terms mean? Let’s not worry about them right now. We shall start with the simple stuff.
To make life easier for my audience, I made a screencast of a famous song by Michael Jackson: Billie Jean. It is a well-known pop song, and it has all sorts of drum beats, guitar riffs, and other musical instruments. I am not really well versed in many Western musical instruments, but I believe that this pop song is going to serve my purpose. So, please click on the thumbnail below to open the song in a new tab of your browser window.
First of all, I would like to draw your attention to the video. There is a spectral analyzer shown in the video. The analyzer has four boxes with graphs in them. As the video plays, you can see the graphs beating in time with the music. Let me explain these graphs a little bit first.
The graph in the upper left (UL) corner is a plot of the Fast Fourier Transform (FFT). Again, don’t be confused by the fancy term. In really simple words, it plots the different frequencies present in the audio (the song in our case) at any given point in time. It also gives us an idea of what the magnitudes of those frequencies are. What is noticeable is that there is a white box at the top of the graph with a number in it, and it keeps changing its place. Please notice the way it changes its place. This box marks the frequency that has the highest magnitude at that moment, and the number in it is that magnitude. As the frequency with the highest magnitude can change from moment to moment, depending on what instrument or note is being played, this box keeps shifting its position along the x-axis.
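For readers who would like to see the idea behind the white box in code, here is a minimal sketch in Python. It is not how the analyzer in the video is implemented, which I do not know. It only assumes that the box marks the strongest bin of an FFT taken over a short frame of audio. The function names `fft` and `peak_frequency`, the 8000 Hz sample rate, and the two test tones are all made up for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def peak_frequency(frame, sample_rate):
    """Return (frequency in Hz, magnitude) of the strongest bin in one frame."""
    spectrum = fft(frame)
    half = len(frame) // 2  # for real input, the upper half mirrors the lower half
    mags = [abs(c) for c in spectrum[:half]]
    k = max(range(1, half), key=lambda i: mags[i])  # start at 1 to skip the DC bin
    return k * sample_rate / len(frame), mags[k]

# Hypothetical test signal: a loud 500 Hz tone plus a quieter 1500 Hz tone.
SAMPLE_RATE = 8000
N = 1024
frame = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
         + 0.3 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
         for t in range(N)]

peak_hz, peak_mag = peak_frequency(frame, SAMPLE_RATE)
print(f"strongest frequency: {peak_hz:.1f} Hz")
```

With this test signal the sketch picks out the 500 Hz tone, because it is the louder of the two. Run the same function on one short frame of a song after another and the reported frequency jumps around, which is the dancing you see in the video.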
The way this box changes its position from time to time reminds me of the live dancing of the late Michael Jackson himself. It appears as if the box is dancing like Michael Jackson. Or at least, it definitely appears as if a dance is going on. Why is that? In my opinion, the reason is that Michael wrote the song, composed its music, and choreographed his dance routines by putting all his heart and soul into them. So when he performed live, he was moving his body and limbs in perfect harmony with the changing notes. With every move of his body, he was capturing every change in the notes of the song.
But the white box is no dancer. It is an inanimate object. So why does it move in harmony with the beat of the song? Well, it does not. It simply displays the most prominent frequency in the song at any given point in time. As that frequency changes, the box changes its position. So it appears to be like a shadow of Michael.
What is the point of saying all of this? Let’s review the rest of the boxes first. The box in the lower left (LL) corner is a similar plot, but on an octave scale. Most music lovers know what an octave scale is, so I am not going to go into the details. To be brief, it is a logarithmic frequency scale. In essence, it shows what the UL box shows, but on a different scale.
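To make the phrase "logarithmic frequency scale" a little more concrete, here is a small Python sketch. On an octave scale, every doubling of frequency moves you the same distance along the axis. The helper names and the 55 Hz reference are my own assumptions for illustration, not something taken from the analyzer.

```python
import math

def octaves_above(freq_hz, ref_hz=55.0):
    """Position of a frequency on an octave axis, counted from ref_hz (assumed)."""
    return math.log2(freq_hz / ref_hz)

def octave_band_edges(low_hz, high_hz):
    """Band edges that double each time, starting at low_hz, up to high_hz."""
    edges = [low_hz]
    while edges[-1] * 2 <= high_hz:
        edges.append(edges[-1] * 2)
    return edges

# Each doubling of frequency is exactly one step on the octave axis.
positions = [octaves_above(f) for f in (55.0, 110.0, 220.0, 440.0)]

# Octave-wide bands that an LL-style plot could group frequencies into.
edges = octave_band_edges(55.0, 8000.0)
print(positions)
print(edges)
```

Notice that the bands get wider in Hz as you go up, even though each one covers exactly one octave. This is why the same spectrum looks stretched at the low end and squeezed at the high end when you compare the LL box with the UL box.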
The box in the upper right (UR) corner is the spectrogram. It is also a display of the spectral (frequency) content of the signal, the song in our case, but laid out against time so that you can see how that content changes as the song plays.
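A spectrogram can be sketched as nothing more than one spectrum per short frame, stacked side by side in time order. The Python below is a deliberately bare version under simplifying assumptions: it uses a plain DFT instead of an FFT to stay short, and it uses no window function and no overlap between frames, which a real analyzer would most likely have. The names, the sample rate, and the test signal are hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the lower half of a plain (slow) DFT of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectrogram(signal, frame_len):
    """One column of magnitudes per non-overlapping frame, in time order."""
    return [dft_magnitudes(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, frame_len)]

# Hypothetical test signal: 500 Hz for the first half, 1500 Hz for the second.
SAMPLE_RATE = 8000
FRAME_LEN = 64
signal = ([math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(256)]
          + [math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE) for t in range(256, 512)])

spec = spectrogram(signal, FRAME_LEN)

# The strongest frequency in each column, i.e. the brightest spot per time slice.
dominant_hz = [max(range(len(col)), key=col.__getitem__) * SAMPLE_RATE / FRAME_LEN
               for col in spec]
print(dominant_hz)
```

The first four columns peak at 500 Hz and the last four at 1500 Hz. In a picture, that would show up as a bright line that jumps upward halfway through.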
The last box is the one in the lower right (LR) corner. This box has a single thread-like trace in it that fluctuates along with the music. This is what the music is doing as a function of time. You can think of it as the voltage fluctuation that follows the beat of the music. What is it anyway? Put simply, this box shows the overall loudness level of the song at any given moment in time. We shall come back to it later, as it will help us in understanding various concepts in signal processing.
The last thing is the bar to the left of the analyzer. This bar goes up and down in perfect harmony with the ups and downs of the song. It, too, shows the loudness level of the song at any given moment in time.
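One common way to turn a stretch of audio into a single level reading is to take its root-mean-square (RMS) value and express it in decibels. I do not know whether the bar in the video works this way, so treat the Python below as a rough sketch of the idea. RMS level is only a crude stand-in for how loud something sounds to the ear. The function names, frame length, and test signal are assumptions for illustration.

```python
import math

def rms_level_db(frame):
    """RMS level of one frame in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence

def level_meter(signal, frame_len):
    """One level reading per non-overlapping frame, like a meter bar over time."""
    return [rms_level_db(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, frame_len)]

# Hypothetical test signal: a full-scale 500 Hz tone, then the same tone at 10%.
SAMPLE_RATE = 8000
loud = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(800)]
quiet = [0.1 * s for s in loud]

levels = level_meter(loud + quiet, 400)
print([round(level, 2) for level in levels])
```

The readings for the quiet half sit 20 dB below those for the loud half, since cutting the amplitude to one tenth is a 20 dB drop. A meter bar driven by such readings would fall accordingly.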