So, in this video, we're going to 
continue our discussion of computing the 
spectra of discrete time signals. 
We'll go into some more practical aspects 
of how you compute these spectra. 
This falls in the regime of what's known
as spectral analysis.
It's a technical term which means the
details of computing spectra
that are realistic and reflect the
signal, not some artifact of the signal
processing.
We'll have to talk about windowing which 
refers to extracting sections of a longer 
signal for spectral analysis. 
You want to do that very carefully and
correctly so you don't introduce
artifacts.
And this whole thing is put together in
what's called short-time Fourier analysis:
discovering how the spectrum of a signal
changes with time.
We've already encountered this.
This is the speech spectrogram, so we're
going to reveal in this video how the
speech spectrogram is computed.
OK. 
So here is the speech spectrogram I
showed you in the previous video, and in
more detail. What we have now is a long
signal. This is over 1.2 seconds long,
sampled at a very high rate.
And you can tell by looking at the
waveform in time
that its characteristics are changing
continually throughout the whole segment.
So what we want to capture now, in the
frequency domain, is what's happening: what
do those spectra look like as we go
through the signal? And basically the
idea is that we extract small sections of
the waveform and we're going to
compute their transforms. And it turns out
that extraction of those pieces
is very important, and you've got to do
it carefully or else you're going to
introduce artifacts.
So before we go into the details of that,
let me ask you a question. As you noted
before, the highest frequency here is 5.5
kilohertz.
What sampling rate did I use to digitize
the analog speech signal to bring it into
my computer?
Alright.
So, you should have gotten that it has to
be at least twice the highest frequency.
So the correct answer is 11 kilohertz.
Now, 11 kilohertz may seem like kind of
an odd number, until you look up what the
sampling rate is
for a compact disc, for CDs.
I think you'll quickly figure out why
11.
How it's related
to the CD sampling rate is kind of
interesting.
Most computers can sample at 11 kHz, or
about that.
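The arithmetic behind that answer can be sketched in a few lines. The 5.5 kHz figure is from the lecture; the 44.1 kHz CD rate is the standard CD audio sampling rate and is not stated in the lecture, so treat the comparison as an illustration.

```python
# Sketch: minimum sampling rate for the speech example, and how it
# compares with the CD rate (44.1 kHz is an outside fact, not from the lecture).
highest_freq_hz = 5_500.0             # highest frequency in the speech signal
min_rate_hz = 2 * highest_freq_hz     # at least twice the highest frequency

cd_rate_hz = 44_100.0                 # standard CD audio sampling rate
quarter_cd_rate_hz = cd_rate_hz / 4   # one quarter of the CD rate

print(min_rate_hz, quarter_cd_rate_hz)
```

A quarter of the CD rate sits just above the minimum rate, which is one plausible reading of "about 11 kHz".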
Alright, let's go into the details of 
what we were just talking about. 
So here we have a long signal and we're 
going to chop it up into pieces. 
And these pieces are called sections. 
And the idea is that, for each
section, I'm going to compute its DFT
and evaluate the spectrum.
OK, well, it turns out there's a little
problem with doing that directly, which we
need to explore, which means we need to be
a bit more precise: when we take
out a section, what does that really mean?
So what that really means is that you
have a long signal
which you have multiplied by what amounts
to a rectangular pulse, and in the
spectral analysis world this is known as
a window, because it's through this pulse
that you're viewing the signal.
You're not seeing anything else on either
side; you're viewing the signal through
the window.
And of course the word rectangular
follows from its shape.
Well, let's look at an example here in
a bit more detail to see what the effect
is of multiplying by this window.
So suppose we have a signal that looks
something like that and we multiply it by
a rectangular window, which occurs
wherever it occurs, and the result is
going to be something that looks like
this.
And the problem occurs at the edges:
this jump.
Not a very big jump here, but a very big
jump here.
Well, that was not in the original 
signal. 
The original signal was a smooth blue 
line. 
What these jumps, the edge
effects, create
are these sections in
the spectrum which don't look right,
usually at the high-frequency edges. And
so, we know this is a speech spectrum,
and this is clearly not indicative of the
speech spectrum.
It's entirely an artifact of using a
rectangular window.
It's all due to the edge effects, and so
we clearly want to minimize that.
How you do that is by selecting a window
which gracefully goes to zero at the
edges.
So, we're going to use what's called
a Hanning window.
It turns out it is one cycle of a sinusoid
that's been raised up to be
positive and has a maximum amplitude of
one.
But it equals zero at the edges.
So, we can see now that the edge effects 
can't be there, and now we get a 
spectrum, once we take the transform in 
the high frequency region, that greatly 
resembles the speech spectrum that we 
know is there. 
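Here is a minimal sketch of that comparison, assuming NumPy. The test signal is a made-up sinusoid, not the lecture's speech data, and the names `spectrum_db`, `rect_floor` and `hann_floor` are hypothetical. The section length of 256 and transform length of 512 are the values used in the lecture's example.

```python
import numpy as np

N = 256    # section length
K = 512    # transform length
n = np.arange(N)

# Hypothetical test signal: a low-frequency sinusoid that does not complete
# a whole number of cycles in the section, so cutting it out leaves a jump.
x = np.sin(2 * np.pi * 10.3 * n / N)

rect = np.ones(N)       # rectangular window: just cuts the section out
hann = np.hanning(N)    # raised cosine, maximum one, zero at both edges

def spectrum_db(section, window):
    """Magnitude spectrum in dB relative to its own peak."""
    mag = np.abs(np.fft.rfft(section * window, n=K))
    return 20 * np.log10(mag / mag.max() + 1e-12)

rect_db = spectrum_db(x, rect)
hann_db = spectrum_db(x, hann)

# Look only at the upper half of the frequency axis, far from the sinusoid,
# where anything that shows up is an artifact of the window.
high = slice(K // 4, K // 2 + 1)
rect_floor = rect_db[high].max()
hann_floor = hann_db[high].max()
print(rect_floor, hann_floor)
```

The signal has no high-frequency content, so whatever appears there comes from the window. The rectangular window should leave a much higher floor than the Hanning window does.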
So, no artifacts.
We've gotten rid of them
just by using the Hanning window. Well, it
turns out there's another little problem
with the Hanning window which we need to
talk about.
Before I get too far along I'm going to 
talk about some other details here. 
Note that I used a length 256 section and
I'm using a length 512 transform. So I am
using a longer transform than the length 
of the section, and we understand that 
I'm interested in seeing the spectral 
details, so that makes a lot of sense. 
I could have taken an even longer 
transform if I wanted to, but for this 
example, I only took one twice as long. 
Now, this one is a power of 2.
There's no reason why the original
section has to be a power of 2.
I just used powers of 2 because I'm used to
doing it.
I could have used 255 or 308 if I wanted
to; it didn't really matter.
But I have to pick a power of 2 for the
transform length, because I'm using the
FFT.
And believe me, when you're computing
spectrograms, you want to use the FFT.
So, this is where the power of 2 is
absolutely necessary, but not so for the
sections.
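A short sketch of what the longer transform does, assuming NumPy. The random data is a placeholder for a real section, and the variable names are illustrative only.

```python
import numpy as np

section_length = 256      # need not be a power of 2
transform_length = 512    # power of 2, twice the section length

rng = np.random.default_rng(0)
section = rng.standard_normal(section_length)   # placeholder data, not speech

# Passing n= pads the section with zeros up to the transform length.
short_dft = np.fft.fft(section)                       # 256 frequency samples
long_dft = np.fft.fft(section, n=transform_length)    # 512 frequency samples

# The longer transform samples the same spectrum twice as densely:
# every other sample of the 512-point result is the 256-point result.
same_samples = np.allclose(long_dft[::2], short_dft)
print(len(short_dft), len(long_dft), same_samples)
```

The longer transform adds no new information about the section. It fills in values between the frequency samples, which is what makes the spectral details easier to see. NumPy's `fft` also accepts lengths that are not powers of 2, so the power-of-2 choice here simply follows the lecture.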
Well, what's the problem with using the
Hanning window? Well, look at what
happens here. Here are the section
boundaries again.
And if you look at what you're
essentially doing when you apply a
Hanning window to each section, it's that
you're ignoring large portions
of the data that could be important,
because the window goes to 0 at the
boundaries of these sections.
What's happening in these
regions essentially gets set to 0.
So, you never see them in the spectrum;
they're going to be gone.
How do you fix that? The idea is to
use overlapping windows.
So, we overlap the
windows,
one after another, producing a picture
that looks more like this, and now all of
the signal gets through. I've
overlapped here by a half: here's the
original section length, here's the next
section length,
and I've overlapped by a half
of this section length.
You can overlap by more, so that the
spectra, the windows, come more
frequently,
if you want to see more temporal detail,
more time detail, in how the spectrum's
changing.
You may want less.
You can move it over some.
You definitely don't want to move it over
too much, or else you'd be ignoring parts
of the original signal.
So now we've got all the data coming
through, and now we can compute the
spectrogram.
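The overlap by half can be sketched as follows, assuming NumPy. The helper `overlapping_sections` and the placeholder signal are hypothetical. The coverage check uses the periodic form of the raised-cosine window, which is an assumption made here because it sums to exactly one at half overlap; the lecture's window is zero at both edges and behaves almost identically.

```python
import numpy as np

def overlapping_sections(signal, section_length, hop):
    """Rows are sections of the signal that start every `hop` samples."""
    count = 1 + (len(signal) - section_length) // hop
    starts = hop * np.arange(count)
    return np.stack([signal[s:s + section_length] for s in starts]), starts

N = 256
hop = N // 2                   # overlap by half the section length
signal = np.arange(1024.0)     # placeholder signal
sections, starts = overlapping_sections(signal, N, hop)
print(sections.shape)

# Add up the windows where they land to see how much of the signal
# "gets through" at each sample.
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)   # periodic raised cosine
coverage = np.zeros(len(signal))
for s in starts:
    coverage[s:s + N] += window

interior = coverage[hop:-hop]  # samples covered by two windows
print(interior.min(), interior.max())
```

Away from the two ends of the signal, the overlapped windows add up to a constant, so no stretch of the data is suppressed. Without the overlap, the coverage would dip to zero at every section boundary.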
So here's the big picture: you take a
long signal, you use Hanning windows or
something like them that goes smoothly to
zero at the edges, you overlap the sections
so that you don't miss anything in the
data, and now you can take a Fourier
transform of each section.
And here's why you use the FFT. 
Because of the overlap by half, I am 
actually computing twice as many Fourier 
transforms as I did in the original 
setup, and so I'm doing lots and lots of 
transforms, but I'm getting very accurate 
answers. 
If it wasn't for the speed and efficiency 
of the FFT, I couldn't do this. 
It would take way too long for me to
be patient enough to wait for the answer.
Once I get these transforms I now have 
spectra and I can display them in all 
kinds of ways. 
We're going to display them as an image, 
you could display them other ways, but I 
do want to point out that now you can do 
things like track this peak through here 
and see how it changes in time. 
Its location in frequency
changes through time.
We get a very good idea of what the 
structure of the signal is in the 
frequency domain. 
So here's our spectrogram. What the
display really is is
that every column of this image is a
spectrum, computed using the FFT.
We then display the value of that
spectrum as a color in a heat map.
And you can see, by the fact that you can't
see the quantization in the image, that I'm
computing lots and lots of transforms, and
that's just the way it is.
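Putting the pieces together, a spectrogram computation might look like the sketch below, assuming NumPy. The function name `spectrogram`, its defaults, and the chirp test signal are illustrative choices, not the lecture's actual code or data.

```python
import numpy as np

def spectrogram(signal, window_length=256, overlap=0.5, transform_length=512):
    """Magnitudes in dB; each column is the spectrum of one windowed section."""
    hop = max(1, int(round(window_length * (1 - overlap))))
    window = np.hanning(window_length)
    count = 1 + (len(signal) - window_length) // hop
    columns = []
    for i in range(count):
        section = signal[i * hop : i * hop + window_length] * window
        spectrum = np.fft.rfft(section, n=transform_length)   # zero-padded FFT
        columns.append(20 * np.log10(np.abs(spectrum) + 1e-12))
    return np.array(columns).T    # shape: (frequency samples, sections)

# Hypothetical test signal: one second of a tone whose frequency rises
# steadily from 500 Hz (instantaneous frequency 500 + 2000 t Hz).
fs = 11_000
t = np.arange(fs) / fs
chirp = np.sin(2 * np.pi * (500 * t + 1000 * t**2))

S = spectrogram(chirp)
peak_bins = S.argmax(axis=0)      # strongest frequency sample in each column
print(S.shape, peak_bins[0], peak_bins[-1])
```

Following `peak_bins` from column to column is the same idea as tracking a peak through time. To see the heat map, the array `S` could be passed to an image routine such as matplotlib's `imshow`.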
And it turns out, because of the FFT, I
can compute a speech spectrogram in real
time.
What that means is I can compute the 
spectra just as fast as the data are 
being sampled by the computer. 
That's the efficiency and the value of 
using the FFT. 
It's really really very important. 
On a more technical note, the thing you
have to do when you're using the
spectrogram is determine
three things.
You have to determine the window length,
how much the windows overlap,
and the transform length.
In most cases, the transform length is 
longer than the window length. 
It depends how much detail you want in
the spectrum that you're trying to
examine.
The window length is determined by how
rapidly things are changing in time in
the signal.
So that's where the temporal structure
of the signal becomes important.
For the overlap, a half is a normal
default kind of overlap.
You may want more overlap to get more
detail of how the spectrum is changing.
If you use much less than a half you may
not be happy with the results, because
then you'd tend to be missing parts of
the signal.
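To make the three choices concrete, here is a small sketch of what they imply in time and frequency. The window length, overlap and transform length are the lecture's example values; the 11 kHz sampling rate comes from the earlier question, and the variable names are illustrative.

```python
fs = 11_000               # sampling rate in Hz
window_length = 256       # samples per section
overlap = 0.5             # fraction of a section shared with the next one
transform_length = 512    # FFT length

hop = int(window_length * (1 - overlap))          # samples between section starts
window_duration_ms = 1000 * window_length / fs    # time span behind each spectrum
column_spacing_ms = 1000 * hop / fs               # time between spectrogram columns
bin_spacing_hz = fs / transform_length            # spacing of the frequency samples

print(hop, window_duration_ms, column_spacing_ms, bin_spacing_hz)
```

With these values each spectrum summarizes roughly 23 ms of speech, a new column appears about every 12 ms, and the frequency samples are about 21.5 Hz apart. A longer transform only places the frequency samples closer together. How finely two nearby frequencies can be told apart is still governed by the window length.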
With these kinds of details and a lot of
experience, you too can compute
a speech spectrogram that
accurately reflects what's going on in
the signal.