So, in this video, we're going to continue our discussion of computing the spectra of discrete-time signals, and go into some of the more practical aspects of how you compute them. This falls into what's known as spectral analysis, a technical term for the details of computing spectra that realistically reflect the signal and not some artifact of the signal processing. We'll have to talk about windowing, which refers to extracting sections of a longer signal for spectral analysis. You want to do that very carefully and correctly so you don't introduce artifacts. All of this put together is what's called short-time Fourier analysis: discovering how the spectrum of a signal changes with time. We've already encountered an example of this, the speech spectrogram, so in this video we're going to reveal how the speech spectrogram is computed.

OK. So here is the speech spectrogram I showed you in the previous video. In more detail, what we have now is a long signal, over 1.2 seconds long, sampled at a high rate. And you can tell by looking at the waveform in time that its characteristics are changing continually throughout the whole segment. So what we want to capture in the frequency domain is what's happening: what do the spectra look like as we go through the signal? The basic idea is that we extract small sections of the waveform and compute their transforms, and it turns out that extracting those pieces is very important. You've got to do it carefully, or else you're going to introduce artifacts. So before we go into the details of that, let me ask you a question. As you noted before, the highest frequency here is 5.5 kilohertz. What sampling rate did I use to digitize the analog speech signal and bring it into my computer? Alright. You should have gotten that it has to be twice the highest frequency, so the correct answer is 11 kilohertz.
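The answer follows from the sampling theorem: to avoid aliasing, the sampling rate must be at least twice the highest frequency present. Here is a minimal Python check of that arithmetic using the 5.5 kHz figure from the example; the function name `nyquist_rate` is just an illustrative label.

```python
def nyquist_rate(highest_freq_hz: float) -> float:
    """Minimum sampling rate (Hz) that avoids aliasing: twice the highest frequency."""
    return 2.0 * highest_freq_hz


highest_freq_hz = 5.5e3  # highest frequency in the speech example: 5.5 kHz
fs_min = nyquist_rate(highest_freq_hz)
print(fs_min)  # 11000.0
```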
Now, 11 kilohertz may seem like kind of an odd number until you look up the sampling rate of a compact disc, which is 44.1 kilohertz. The rate I used, 11 kilohertz (more precisely 11.025 kilohertz), is exactly one quarter of the CD rate, and many computers offer a sampling rate of 11 kHz, or about that.

Alright, let's go into the details of what we were just talking about. Here we have a long signal, and we're going to chop it up into pieces called sections. For each section, I'm going to compute its DFT and evaluate the spectrum. Well, it turns out there's a little problem with doing that directly, which we need to explore, and that means we need to be a bit more precise about what it really means to take out a section. What it really means is that you have a long signal which you've multiplied by what amounts to a rectangular pulse. In the spectral analysis world this pulse is known as a window, because it's through this pulse that you're viewing the signal. You're not seeing anything on either side; you're viewing the signal through the window. And of course the word rectangular follows from its shape. Let's look at an example in a bit more detail to see the effect of multiplying by this window. Suppose we have a signal that looks something like this, and we multiply it by a rectangular window. The result is going to look something like this, and the problem occurs at the edges: this jump. Not a very big jump here, but a very big jump here. Well, that jump was not in the original signal; the original signal was the smooth blue line. These jumps create edge effects, which show up as regions of the spectrum that don't look right, usually at the high frequencies. We know this is a speech spectrum, and this high-frequency part is clearly not indicative of speech.
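To make "taking out a section" concrete, here is a small pure-Python sketch that builds a rectangular window as a 0/1 pulse, multiplies it into a smooth signal, and measures the jumps that appear at the section edges. The test signal, the section position, and the helper names are hypothetical stand-ins chosen only for illustration.

```python
import math


def rectangular_window(signal_length, start, length):
    """Rectangular pulse: 1 on [start, start + length), 0 everywhere else."""
    return [1.0 if start <= n < start + length else 0.0 for n in range(signal_length)]


def apply_window(x, w):
    """Extracting a section is just sample-by-sample multiplication by the window."""
    return [xn * wn for xn, wn in zip(x, w)]


def edge_jumps(y, start, length):
    """Size of the discontinuity at the left and right edges of the section."""
    left = abs(y[start] - y[start - 1])
    right = abs(y[start + length - 1] - y[start + length])
    return left, right


# A smooth, hypothetical test signal made of two slow sinusoids.
N_TOTAL, START, LENGTH = 1000, 300, 256
x = [math.sin(2 * math.pi * 0.013 * n) + 0.5 * math.cos(2 * math.pi * 0.004 * n)
     for n in range(N_TOTAL)]

w = rectangular_window(N_TOTAL, START, LENGTH)
y = apply_window(x, w)

# The original signal changes only a little from one sample to the next...
max_step = max(abs(x[n + 1] - x[n]) for n in range(N_TOTAL - 1))
# ...but the windowed signal jumps abruptly to and from zero at the edges.
left_jump, right_jump = edge_jumps(y, START, LENGTH)
print(f"largest step in original: {max_step:.3f}")
print(f"edge jumps after windowing: {left_jump:.3f}, {right_jump:.3f}")
```

The two jumps differ in size because they depend only on where the section happens to start and stop, which is exactly why they are artifacts of the windowing rather than properties of the signal.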
Those spurious high-frequency components are entirely an artifact of using a rectangular window. They're all due to the edge effects, and we clearly want to minimize them. The way you do that is to select a window that goes gracefully to zero at the edges. So we're going to use what's called a Hanning window. It turns out it's one cycle of a sinusoid that has been raised up to be nonnegative, with a maximum amplitude of one, and it equals zero at the edges. Now the edge effects can't be there, and once we take the transform, we get a spectrum whose high-frequency region greatly resembles the speech spectrum we know is there. No artifacts; we've gotten rid of them just by using the Hanning window. Well, it turns out there's another little problem with the Hanning window, which we need to talk about.

Before I get too far along, I want to point out some other details here. Note that I used a length-256 section and a length-512 transform. So I'm using a longer transform than the length of the section, which makes sense because I'm interested in seeing spectral detail: padding the section with zeros samples its spectrum more finely. I could have taken an even longer transform if I wanted to, but for this example I only took one twice as long. Now, 512 is a power of 2. There's no reason why the original section has to be a power of 2; I just used powers of 2 because I'm used to doing it. I could have used 255 or 308 if I wanted to; it didn't really matter. But I do pick a power of 2 for the transform length, because I'm using the FFT, and the classic radix-2 FFT works on power-of-2 lengths. And believe me, when you're computing spectrograms, you want to use the FFT. So that's where the power of 2 matters, but not for the sections. So what's the problem with using the Hanning window? Well, look at what happens here. Here are the section boundaries again.
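The rectangular-versus-Hanning comparison, with a length-256 section zero-padded to a length-512 transform, can be sketched in pure Python. This is a minimal sketch under stated assumptions: a hand-written recursive radix-2 FFT stands in for a library FFT, the test tone (1030 Hz sampled at 8 kHz) is hypothetical, and "leakage" is measured here, by our own choice, as the strongest spectral level above 2.5 kHz relative to the spectral peak.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n & (n - 1):
        raise ValueError("radix-2 FFT needs a power-of-2 length")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hanning(length):
    """One raised cycle of a cosine: 0 at both ends, peak near 1 in the middle."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (length - 1))) for n in range(length)]


def spectrum_db(section, nfft):
    """Zero-pad the section to nfft points and return |X[k]| in dB for k = 0..nfft/2."""
    padded = list(section) + [0.0] * (nfft - len(section))
    X = fft(padded)
    return [20 * math.log10(abs(X[k]) + 1e-12) for k in range(nfft // 2 + 1)]


def leakage_db(spec, fs, nfft, above_hz):
    """Strongest level above `above_hz`, relative to the spectral peak (dB)."""
    first_bin = int(above_hz * nfft / fs)
    return max(spec[first_bin:]) - max(spec)


FS, SECTION_LEN, NFFT = 8000, 256, 512
section = [math.sin(2 * math.pi * 1030 * n / FS) for n in range(SECTION_LEN)]

rect_spec = spectrum_db(section, NFFT)  # plain section = rectangular window, no taper
hann_spec = spectrum_db([s * w for s, w in zip(section, hanning(SECTION_LEN))], NFFT)

rect_leak = leakage_db(rect_spec, FS, NFFT, 2500)
hann_leak = leakage_db(hann_spec, FS, NFFT, 2500)
print(f"rectangular: {rect_leak:.1f} dB, Hanning: {hann_leak:.1f} dB")
```

With these settings the Hanning spectrum's far-off level should come out tens of dB below the rectangular one; the exact figures depend on the tone frequency and on the leakage measure chosen.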
The problem is this: when you apply a Hanning window to each section, you're essentially ignoring large portions of the data that could be important, because the window goes to zero at the section boundaries. Whatever is happening in these regions essentially gets set to zero, so you never see it in the spectrum; it's gone. How do you fix that? The idea is to use overlapping windows. We overlap the windows one after another, producing a picture that looks more like this, and now all of the signal gets through. Here I've overlapped by a half: here's the original section, here's the next section, and they overlap by half a section length. You can overlap by more, so that the windows come more frequently, if you want to see more temporal detail in how the spectrum is changing. You may want less overlap, and you can shift the windows apart somewhat, but you definitely don't want to shift them too far, or else you'll be ignoring parts of the original signal. So now all the data come through, and we can compute the spectrogram.

So here's the big picture. You take a long signal. You use a Hanning window, or something like it that goes smoothly to zero at the edges. You overlap the sections so that you don't miss anything in the data. And then you take the Fourier transform of each section. Here's why you use the FFT: because of the overlap by half, I'm actually computing about twice as many Fourier transforms as I did in the original setup. So I'm doing lots and lots of transforms, but I'm getting very accurate answers. If it weren't for the speed and efficiency of the FFT, I couldn't do this; it would take way too long for me to be patient enough to wait for the answer. Once I get these transforms, I have spectra, and I can display them in all kinds of ways.
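Why overlapping by a half lets all the data through can be checked numerically: add up the Hanning windows at every sample and look at the smallest total weight in the interior of the signal. The pure-Python sketch below does that; the 2048-sample signal length and the helper names are hypothetical. It also counts the sections, showing that a half overlap roughly doubles the number of transforms.

```python
import math


def hanning(length):
    """Hanning window that is exactly 0 at both edges."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (length - 1))) for n in range(length)]


def section_starts(signal_length, window_length, hop):
    """Start index of every full section; hop = window_length - overlap."""
    return list(range(0, signal_length - window_length + 1, hop))


def window_coverage(signal_length, window, hop):
    """Total window weight applied to each sample, summed over all sections."""
    cover = [0.0] * signal_length
    for start in section_starts(signal_length, len(window), hop):
        for i, w in enumerate(window):
            cover[start + i] += w
    return cover


SIGNAL_LEN, WIN_LEN = 2048, 256
win = hanning(WIN_LEN)
# Skip the first and last half-window, which only one section can ever cover.
interior = slice(WIN_LEN // 2, SIGNAL_LEN - WIN_LEN // 2)

no_overlap_cover = window_coverage(SIGNAL_LEN, win, hop=WIN_LEN)
half_overlap_cover = window_coverage(SIGNAL_LEN, win, hop=WIN_LEN // 2)

print(len(section_starts(SIGNAL_LEN, WIN_LEN, WIN_LEN)))       # 8
print(len(section_starts(SIGNAL_LEN, WIN_LEN, WIN_LEN // 2)))  # 15
print("smallest interior weight, no overlap:", min(no_overlap_cover[interior]))
print("smallest interior weight, half overlap:", round(min(half_overlap_cover[interior]), 3))
```

Without overlap the total weight drops to zero at every section boundary, so those samples never reach any spectrum; with a half overlap every interior sample receives a total weight close to one.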
We're going to display the spectra as an image. You could display them other ways, but I do want to point out that now you can do things like track this peak through here and see how its location in frequency changes through time. We get a very good idea of the structure of the signal in the frequency domain. So here's our spectrogram. What the display really is: every column of this image is a spectrum computed using the FFT, and we display the value of that spectrum as a color, in a heat map. You can tell from the fact that you can't see any quantization in the image, no visible steps from one column to the next, that I'm computing lots and lots of transforms. And it turns out that, because of the FFT, I can compute a speech spectrogram in real time. What that means is I can compute the spectra just as fast as the data are being sampled by the computer. That's the efficiency and the value of using the FFT; it's really, really important.

On a more technical note, when you're computing a spectrogram you have to determine three things: the window length, how much the windows overlap, and the transform length. In most cases, the transform length is longer than the window length; how much longer depends on how much detail you want in the spectrum you're examining. The window length is determined by how rapidly things are changing in time in the signal, so that's where the temporal structure of the signal becomes important. For the overlap, a half is the normal default. You may want more overlap to get more detail on how the spectrum is changing. If you use much less than a half, you may not be happy with the results, because you'd tend to miss parts of the signal. With these kinds of details and a lot of experience, you too can compute a speech spectrogram that accurately reflects what's going on in the signal.
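The three choices (window length, overlap, transform length) map directly onto the parameters of a spectrogram routine. Below is a minimal pure-Python sketch, assuming a Hanning window and a hand-written radix-2 FFT; the function name `spectrogram`, its parameter names, and the two-tone test signal are hypothetical. Each returned column is the dB magnitude spectrum of one windowed, zero-padded section, matching the column-per-spectrum layout of the image.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n & (n - 1):
        raise ValueError("radix-2 FFT needs a power-of-2 length")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram(x, window_length=256, overlap=0.5, nfft=512):
    """Return a list of columns; column j is the dB magnitude spectrum of section j."""
    if nfft < window_length:
        raise ValueError("transform length should be at least the window length")
    hop = max(1, int(window_length * (1 - overlap)))  # distance between section starts
    window = [0.5 * (1 - math.cos(2 * math.pi * n / (window_length - 1)))
              for n in range(window_length)]
    columns = []
    for start in range(0, len(x) - window_length + 1, hop):
        section = [x[start + n] * window[n] for n in range(window_length)]
        X = fft(section + [0.0] * (nfft - window_length))  # zero-pad to nfft
        columns.append([20 * math.log10(abs(X[k]) + 1e-12) for k in range(nfft // 2 + 1)])
    return columns


def peak_frequency(column, fs, nfft):
    """Frequency (Hz) of the strongest bin in one spectrum column."""
    k = max(range(len(column)), key=column.__getitem__)
    return k * fs / nfft


# Hypothetical test signal: 1 kHz for the first half, 3 kHz for the second half.
FS, N = 8000, 4096
x = [math.sin(2 * math.pi * (1000 if n < N // 2 else 3000) * n / FS) for n in range(N)]

cols = spectrogram(x, window_length=256, overlap=0.5, nfft=512)
print(f"{len(cols)} columns, {len(cols[0])} frequency bins each")
print(f"peak at start: {peak_frequency(cols[0], FS, 512):.0f} Hz, "
      f"peak at end: {peak_frequency(cols[-1], FS, 512):.0f} Hz")
```

Tracking `peak_frequency` across the columns is the code analogue of following a peak through the image. In practice you would call an optimized library FFT rather than this recursive one; the point here is only the bookkeeping that turns a long signal into a grid of spectra.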