[MUSIC] So this leads us straight into a discussion of how we're going to compute this distance between two given articles. Well, when we're in 1D, one really simple measure that we can use is just Euclidean distance. And hopefully this should be fairly familiar to you, but it really isn't going to be of much interest to us, because it would assume that, in our example, we have just one word in our vocabulary. In almost all the scenarios we're going to think about in this specialization, we're going to assume that we have multiple features, or multiple different dimensions, that we want to consider. And in this case, things get really interesting. There are lots of different distance functions we can think about using. Just one example of the interesting things we can do in multiple dimensions is that we can weight the different dimensions differently. So we could think about putting different weights on different words in the vocabulary, or on the different features that we might have. For example, if you go back to Course 2, when we were talking about regression and predicting the price of a house, we looked at using nearest neighbor regression to predict that house value, and we said we can put different weights on different attributes of the house. So if we're thinking about what features are really important for predicting house value, it's things like the number of bedrooms, the number of bathrooms, and the square footage of the house. Those are really important, but maybe other things, like the number of floors or the year renovated, are less important to assessing that value. Well, in our document example there's a very similar analogy. Maybe when we compute the similarity between two different articles, we want to weight the title more heavily; maybe the title is really, really informative.
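To make the baseline concrete before adding weights, here is a minimal sketch of plain (unweighted) Euclidean distance between two documents represented as word-count vectors. The vocabulary and the counts are made up purely for illustration:

```python
from math import sqrt

def euclidean_distance(x, y):
    """Standard Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Hypothetical word-count vectors for two articles over a 3-word vocabulary.
doc_a = [3, 0, 1]
doc_b = [1, 2, 1]
print(euclidean_distance(doc_a, doc_b))  # sqrt((3-1)^2 + (0-2)^2 + 0^2) = sqrt(8)
```

Every dimension contributes equally here; the rest of this discussion is about relaxing exactly that assumption.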
And much more so than the body of the article, which can have a lot of noise in words that are kind of hard to account for. Likewise, if an article has an abstract, like a scientific article does, that might also be more informative than the main body of the article. So these are both examples where you might want to specify weights that differ across the different features that you have. Another case where you might want to weight different features differently is in scenarios where one of your features varies just a little bit across the different observations you have, but another feature varies widely. This can be because one of the features is in a different unit than the other, or it could just be that there's a lot of variance in that dimension. In these cases, if you went to compute something like Euclidean distance while weighting both of these features equally, the feature with the big changes might dominate the one with the little changes. But in practice, it might be that the little changes in Feature 1 that we have here are as important as a larger change in Feature 2, the feature that varies more widely across the different observations. So in this case, there are a couple of things that people tend to do, and they both relate to scaling the feature by some measure of the spread of the observations. One way that you can account for the spread is, for feature j, to take every observation in that column. Remember, each row is a different observation and each column is a different feature. So you take an entire column of your data matrix and you scale it by the maximum over all values in that column minus the minimum over all values in that column, and you do that for every observation in that column. An alternative is to scale by one over the variance of all observations of that feature.
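The two scaling options just described can be sketched as follows. This is a simplified illustration on made-up feature columns; in particular, dividing by the range here does not subtract the minimum first (a common variant, min-max normalization, does):

```python
def range_scale(column):
    """Scale a feature column by 1 / (max - min) over all observations."""
    spread = max(column) - min(column)
    return [v / spread for v in column]

def variance_scale(column):
    """Scale a feature column by 1 / variance over all observations."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    return [v / var for v in column]

# Hypothetical columns: Feature 1 varies a little, Feature 2 varies widely.
feature_1 = [1.0, 1.1, 0.9, 1.05]
feature_2 = [10.0, 200.0, 150.0, 80.0]
print(range_scale(feature_2))  # after scaling, max and min differ by exactly 1
```

After either transformation, a small change in Feature 1 and a large change in Feature 2 contribute on a comparable scale to the distance computation.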
So all of these are cases where we introduce weights across our different features when we go to compute a distance. Formally, we can think about computing what's called scaled Euclidean distance, which looks very much like standard Euclidean distance in multiple dimensions, but now each one of our different dimensions has a different weight on it, which I'm indicating here with a_j. So we have a_1 all the way to a_d, and these are weights on the different features; what they represent is the relative importance of those different features. One example of how you could think about setting the weights is just as binary weights, 0s and 1s. That would be a special case of the scaled Euclidean distance computation, and it's equivalent to feature selection, because if you set a weight equal to 0, you're knocking out that feature altogether. It doesn't get incorporated into the computation of the distance, so you're saying that feature just doesn't matter for the sake of assessing similarity or distance between two different articles. But remember, here, in contrast to when we talked about things like lasso or other notions of feature selection, we're pre-specifying what these weights are, or in this binary case, which features are included and which are excluded. But overall, the thing that I really want to emphasize here is the fact that how we specify our data representation and compute this distance is really, really, really important. And it's a very challenging thing to do. So this idea of feature engineering, or feature selection, is very important, but it's also a fundamentally hard task, and it's a task for which there is literature on how to think about going about this feature engineering.
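The scaled Euclidean distance described above, sqrt(sum_j a_j (x_j - y_j)^2), can be sketched in a few lines. The documents and weights are hypothetical; the binary weight vector shows how setting a_j = 0 acts as feature selection:

```python
from math import sqrt

def scaled_euclidean(x, y, a):
    """Scaled Euclidean distance: sqrt(sum_j a_j * (x_j - y_j)^2)."""
    return sqrt(sum(aj * (xj - yj) ** 2 for aj, xj, yj in zip(a, x, y)))

# Hypothetical word-count vectors over a 4-word vocabulary.
doc_a = [3, 0, 1, 5]
doc_b = [1, 2, 1, 2]

# Binary weights knock out features 3 and 4 entirely (feature selection).
weights = [1, 1, 0, 0]
print(scaled_euclidean(doc_a, doc_b, weights))  # only the first two features count
```

With all weights set to 1 this reduces to standard Euclidean distance, so the unweighted case is just the special case a_1 = ... = a_d = 1.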
But it really is an area in machine learning where a lot of tweaking comes in, and this is one of the places where there's a knob to turn, and a lot of domain knowledge often comes in when thinking about how to set these weights or define these distances. So I just want to emphasize that it really matters. There is no one solution for how to go about this. But think about it; don't just compute some distance and assume that it represents a distance that's of importance in the application, without thinking about what the data is and what's happening when you compute that distance. [MUSIC]