[MUSIC] So that was one idea of computing an inner product to compute a distance, but here's another really natural inner product measure that we could have. And this is simply to look at the similarity between one article, which might be our query article xq, and another article xi, as the inner product between these two vectors: we multiply the values element-wise, add them up, and call that the similarity. So it's the inner product between xi and xq, which is simply the sum over all of our dimensions of the product of the values in these two vectors. We can think of this as measuring how much these articles overlap in terms of the vocabularies used, and what the weight of that overlap is. Okay, so in this example it would give us a similarity of 13 between these two different articles about soccer. But if we looked at an article about soccer relative to an article about some world news event, then maybe there would be very little overlap. Actually, in this case, there's no overlap in the vocabularies of these articles, so the similarity would be 0.

And the similarity that we talked about on the previous slide, where we just summed up the products of the different features, is very related to a popular similarity metric called cosine similarity. It looks exactly the same as what we had before, but in cosine similarity you're going to divide by the following two terms, and I think it's a little bit more straightforward to write it as follows, where what we see is that each one of these terms is just normalizing the vector. So we're summing over the square of each element, just within vector xi and just within vector xq, and that, by definition, is the norm; that's what these bars denote, the norm or magnitude of the vector. And we can rewrite this further as xi divided by the norm of xi, transposed, times xq divided by the norm of xq. So relative to the example we had on the past two slides, instead of just computing the similarity as the inner product between these two vectors, we're first going to normalize the vectors. And this is a really, really critical difference, which we'll discuss more.

Okay, and so you can show that what we're doing here is equivalent to just looking at the angle between the two vectors, regardless of their magnitude. And the reason this normalized inner product of vectors is equivalent to the cosine of the angle between the vectors comes straightforwardly from the definition of an inner product: we know that A transpose B is equal to the magnitude of vector A, times the magnitude of vector B, times the cosine of the angle between the vectors A and B. Okay, so if we have two different points here, two different articles, let's say there's just a vocabulary with two words, word one and word two. So this is maybe the word count vector for one article and this is the word count vector for the other article, and what cosine similarity is doing is just looking at the cosine of the angle between these two vectors, regardless of their magnitudes.

So I want to highlight a couple of things about cosine similarity. One is the fact that it's not actually a proper distance metric, like Euclidean distance is, because the triangle inequality doesn't hold. But it's also important to know that it's extremely efficient to compute for sparse vectors, because you only need to consider the nonzero elements when performing this calculation.
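To make these two measures concrete, here is a minimal Python sketch (illustrative code, not from the course materials) that computes both the raw inner product similarity and the normalized cosine similarity for sparse word-count or TF-IDF vectors stored as dictionaries. The function names and the example articles and counts are made up for illustration, not the ones on the slide.

```python
import math

def dot_product(x_i, x_q):
    """Unnormalized inner product similarity between two sparse
    word-count (or TF-IDF) vectors stored as {word: weight} dicts.
    Only words that appear in both articles contribute."""
    # Iterate over the smaller dict, which is what makes this cheap for sparse vectors.
    if len(x_i) > len(x_q):
        x_i, x_q = x_q, x_i
    return sum(weight * x_q.get(word, 0.0) for word, weight in x_i.items())

def cosine_similarity(x_i, x_q):
    """Normalized inner product: divide by the norms of both vectors."""
    norm_i = math.sqrt(sum(w * w for w in x_i.values()))
    norm_q = math.sqrt(sum(w * w for w in x_q.values()))
    if norm_i == 0.0 or norm_q == 0.0:
        return 0.0  # an empty article has no defined direction
    return dot_product(x_i, x_q) / (norm_i * norm_q)

# Two soccer articles share vocabulary, so both measures are positive.
soccer_1 = {"soccer": 3, "goal": 2, "team": 1}
soccer_2 = {"soccer": 2, "goal": 1, "player": 4}
print(dot_product(soccer_1, soccer_2))        # 3*2 + 2*1 = 8
print(cosine_similarity(soccer_1, soccer_2))  # between 0 and 1

# An article with no overlapping vocabulary gives similarity 0.
world_news = {"election": 5, "minister": 2}
print(dot_product(soccer_1, world_news))        # 0
print(cosine_similarity(soccer_1, world_news))  # 0.0
```

The key difference between the two functions is exactly the one described above: cosine similarity first rescales each vector by its own norm, so only the direction of the vectors matters, not how long the articles are.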
Okay, but now I want to run through an example of what I mean by this normalization, just to make sure it's very clear to everyone. So here's our standard word count vector, or maybe TF-IDF vector, and when we think about normalizing it, we're simply dividing by the square root of the sum of the squares of the counts in this vector. And if we do out this calculation, then our normalized representation of this document would be as follows.

Okay, so let's talk a little bit more about this cosine similarity metric, and let's think about the values that it can take. So let's say there are two really, really similar articles, so they have a very small angle theta. Then the cosine of theta is going to be approximately equal to, you guys remember, 1, as you would hope: high similarity if they're very close together. So let's just remember how we think about cosine. We can think about drawing this unit circle. That does not look at all like a circle. Let's try again. Okay, I'm not sure that is much better, but imagine our circle with radius 1. Well, if we're looking at some angle theta, then this length here is sine of theta and this length here is cosine of theta. Okay, so as we're walking around the circle, when the angle is zero, we see that cosine of theta is one. When we get up to 90 degrees, so vertical, we see that the cosine drops down to zero. Maybe I can switch colors and make this clear instead of waving my hands around. So let's see. When we're at angle 0, you see cosine is 1. When we're at angle pi over 2, or 90 degrees, we see that this distance along the x axis has dropped to 0. And when we shift over here to an angle of pi, or 180 degrees, we see that cosine of theta is -1.

Okay, so now that we've reviewed cosine a little bit, let's go back to these drawings here. This one was supposed to be roughly 90 degrees (it's a little bit more), and when we look at cosine of theta here, we know that it's approximately 0. And in this last case, cosine of theta is going to be approximately -1. Okay, so in general cosine similarity can range from -1 to 1. But if we restrict ourselves to having just positive features, like we would if we were looking at a TF-IDF vector for a document, then we could never have this last example. We're always going to be living in the positive quadrant, so our angles are going to range from 0 to 90 degrees, and our cosine similarity is going to range from 0 to 1. Okay, so this is going to be our focus. And in these cases, the way we're going to define a distance is simply one minus the similarity. And remember, it's not a proper distance according to formal definitions of a distance metric, but we can use it as a measure of distance between articles. [MUSIC]
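Here is a small, self-contained sketch of that normalization step and of defining distance as one minus cosine similarity, again as illustrative Python rather than anything from the course materials; the example word counts are hypothetical. Because word counts and TF-IDF weights are nonnegative, the resulting distance stays in the range 0 to 1.

```python
import math

def normalize(x):
    """Rescale a sparse word-count / TF-IDF vector ({word: weight})
    to unit length by dividing each entry by the vector's norm."""
    norm = math.sqrt(sum(w * w for w in x.values()))
    return {word: w / norm for word, w in x.items()} if norm > 0 else dict(x)

def cosine_distance(x_i, x_q):
    """Cosine 'distance' = 1 - cosine similarity. With nonnegative
    features this lies in [0, 1]; it is not a true metric because
    the triangle inequality can fail."""
    xi_n, xq_n = normalize(x_i), normalize(x_q)
    similarity = sum(w * xq_n.get(word, 0.0) for word, w in xi_n.items())
    return 1.0 - similarity

article = {"soccer": 3, "goal": 2, "team": 1}
print(normalize(article))                         # each count divided by sqrt(3^2 + 2^2 + 1^2)
print(cosine_distance(article, article))          # same direction -> essentially 0 (up to floating point)
print(cosine_distance(article, {"election": 5}))  # no vocabulary overlap -> 1.0
```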