[MUSIC] So far we've been focusing on a document retrieval task where somebody is reading an article, and we want to search over all the other articles in the corpus to find the one that's most similar, or at least very similar, to the article that person is reading. But in a lot of cases, we're actually interested in discovering a structured representation of our data, of all the articles out there. One example of such a structured representation is a clustering of articles, that is, groups of articles that are related to one another. We're going to discuss both the reasons why we would want to perform such clustering and the algorithms for performing it.

So let's dig into the objective of clustering, as well as some motivating applications for clustering within the context of our document application. The goal of clustering is to discover groups of related articles. For example, maybe there's a group where all the articles are about sports, and maybe there's another group where all the articles are about world news. If we can discover this type of structure in our data, then for something like the document retrieval task that we talked about before, when a person is reading an article we can look at which group that article falls in. Then, when we go to retrieve another article for that reader, we can simply search over the articles within that group. So if a person's reading a sports article, maybe we can recommend another sports article just by searching for the nearest neighbor within this group of sports articles.

But we can actually use clustering to do even fancier things. As an example, maybe we're interested in learning the preferences of a user over a set of different topics. In this case, we're assuming that the person isn't just interested in sports. They might have other interests as well.
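To make the retrieval idea from a moment ago concrete, here is a minimal sketch of searching for a nearest neighbor only within the reader's cluster. It assumes the cluster assignments have already been computed somehow; the article ids, the two-dimensional feature vectors, the Euclidean distance, and the function name `nearest_in_cluster` are all made up for illustration and are not something the lecture specifies.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_in_cluster(query_id, vectors, cluster_of):
    """Return the id of the article closest to query_id, searching only
    articles assigned to the same cluster. Returns None if the query
    article is alone in its cluster."""
    target = cluster_of[query_id]
    candidates = [
        article_id
        for article_id in vectors
        if article_id != query_id and cluster_of[article_id] == target
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda i: euclidean(vectors[query_id], vectors[i]))


# Hypothetical toy corpus: feature vectors and cluster labels are assumed.
vectors = {
    "sports_a": [1.0, 0.0],
    "sports_b": [0.8, 0.1],
    "world_a": [0.0, 1.0],
    "world_b": [0.1, 0.9],
    "lonely": [5.0, 5.0],
}
cluster_of = {
    "sports_a": "sports",
    "sports_b": "sports",
    "world_a": "world news",
    "world_b": "world news",
    "lonely": "other",
}

print(nearest_in_cluster("sports_a", vectors, cluster_of))
```

Restricting the search this way only compares the query against articles in its own group, so it can miss a close article that happened to land in a different cluster.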
And when we go to present an article to that user, maybe we'd want to explore some of these other preferences. So imagine that we have a clustering of our articles into these groups of related articles. A user is going to read some subset of the articles in the corpus, and these are the articles that are not grayed out here. We can then imagine that the user gives us some feedback about each article they read, whether they liked it or not. A plus sign says, yep, they liked the article, and a minus sign says they did not like it. So this user liked those two articles in Cluster 1, the green cluster, as well as this article in Cluster 4, the orange cluster. They didn't like the article that they read in the blue cluster, Cluster 2, and did not like this other article they read in Cluster 4, but liked this article in the blue cluster, Cluster 2, and so on. After we get this feedback, or as we're getting it over time, we can use it to learn this type of preference factor over topics.

One thing that we're going to discuss is that we don't actually have topic labels like sports, world news, and so on. But we do know that there are groups of articles, and we can use that to select articles from these groups. Or we can go in after the fact, dig into these groups, and put labels on them ourselves. [MUSIC]
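As a rough illustration of turning the plus and minus feedback into preferences over clusters, the sketch below scores each cluster by the fraction of read articles the user liked. The lecture does not say how the preferences are actually computed, so this scoring rule is an assumption, and the article ids and the function name `cluster_preferences` are hypothetical. The feedback itself mirrors the example above: two likes in Cluster 1, one like and one dislike in Cluster 2, and one like and one dislike in Cluster 4.

```python
from collections import defaultdict


def cluster_preferences(feedback, cluster_of):
    """Score each cluster by the fraction of the user's read articles in
    that cluster that were liked. Clusters with no feedback are omitted.

    feedback: iterable of (article_id, liked) pairs, liked is True/False.
    cluster_of: dict mapping article_id to its cluster number.
    """
    likes = defaultdict(int)
    totals = defaultdict(int)
    for article_id, liked in feedback:
        cluster = cluster_of[article_id]
        totals[cluster] += 1
        if liked:
            likes[cluster] += 1
    return {cluster: likes[cluster] / totals[cluster] for cluster in totals}


# Hypothetical article ids; cluster numbers follow the lecture's example.
cluster_of = {
    "a1": 1, "a2": 1,   # green cluster
    "b1": 2, "b2": 2,   # blue cluster
    "c1": 3,            # a cluster the user never read from
    "d1": 4, "d2": 4,   # orange cluster
}
feedback = [
    ("a1", True), ("a2", True),    # liked both articles in Cluster 1
    ("d1", True), ("d2", False),   # one like, one dislike in Cluster 4
    ("b1", False), ("b2", True),   # one dislike, one like in Cluster 2
]

prefs = cluster_preferences(feedback, cluster_of)
print(prefs)
```

Note that the scores are attached to cluster numbers, not topic names, which matches the point that we don't have labels like sports or world news unless we add them ourselves.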