[MUSIC] Welcome to the Clustering and Retrieval course, part of the Machine Learning Specialization. As we're going to see, clustering and retrieval are both widely used and broadly applicable tools. They can be used for anything from finding products similar to the one that you're currently browsing to discovering groups of patients with related medical conditions. In the Foundations course, we provided a high-level introduction to the concepts of clustering and retrieval. But now, in this course, we're going to take a deep dive into the models and algorithms. This course is part of a specialization that was designed to be taken in a specific sequence. So, although you can take this course as a standalone course, to get the experience that we intended, we strongly suggest that you take the entire sequence. In particular, in the Foundations course, we provided a high-level overview of the concepts that we're going to cover throughout the entire specialization. Then we went into much more detailed courses on regression and classification. And in those courses, we learned general machine learning concepts that are going to appear again in this course on clustering and retrieval. We'll go through a specific list of the concepts that we expect you to know later on in this introduction to the course. But first, let's provide an overview of what this course is about. Remember that in the Machine Learning Foundations course, we said that machine learning was about extracting intelligence from data. Well, in the first part of this course, the machine learning method that we're going to be going through is nearest neighbor search for retrieval tasks. In particular, the inputs here are going to be the features of a specific query point, as well as the features of all other datapoints in a specific dataset. And then the output is going to be the nearest neighbor to the query point. 
This is the point within the entire dataset that's most similar to the query point. So, what's needed for our nearest neighbor search is to search over all the other datapoints, compute the similarity or the distance between the query and each of them, and then return the one that's closest. As a running example that we're going to use throughout this course, let's imagine that we had a whole bunch of documents and somebody's reading a specific document. So this is the query article in gray here, and let's say that the user likes this article. We want to retrieve another article that's similar that they might also like. This would be the task of finding the nearest neighbor. But maybe we also want to retrieve a set of nearest neighbors, so a set of similar documents to display to this user. And we're going to perform this search over all the other documents, retrieving these nearest neighbors just based on the text of the documents, so just based on the features of our datapoints. We see retrieval applications almost everywhere. Perhaps we have some image and want to find a set of similar images. Or maybe we're shopping for some product and we want to be provided with a set of other similar items that we might want to consider instead. Or, like we've discussed, maybe we're reading some article and want to read a related article on the same topic. This kind of idea also appears in a lot of streaming contexts, where you might be listening to a song, watching a movie, or watching a TV show, and you want to be provided with a set of other songs or movies or TV shows that you might also be interested in. Or perhaps you're a user on some social network and, based on features of you as a user, you want to be presented with other people that you might want to connect with. So in general, this idea of retrieving similar items can be a really, really powerful tool. And throughout this course we're going to discuss ways to provide this type of search. 
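To make the search described above a little more concrete, here is a minimal sketch of brute-force nearest neighbor retrieval over documents. It is only an illustration under assumptions that are not from the lecture: the word-count features, the cosine distance, the tiny example corpus, and the function names (`word_counts`, `cosine_distance`, `nearest_neighbors`) are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def word_counts(text):
    """Turn a document into a bag-of-words feature vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, corpus, k=1):
    """Brute-force search: score every document, return the k closest."""
    query_vec = word_counts(query)
    scored = [(cosine_distance(query_vec, word_counts(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0])
    return [doc for _, doc in scored[:k]]


# Hypothetical toy corpus, made up for illustration only.
corpus = [
    "the team won the soccer match",
    "stock markets fell sharply today",
    "the soccer team lost the final match",
]
query = "soccer match report"
top = nearest_neighbors(query, corpus, k=2)
```

With `k=1` this returns the single nearest neighbor, and with a larger `k` it returns a set of similar documents. Note that this sketch compares the query against every document in the corpus, so its cost grows with the size of the dataset.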
Then, in the second part of the course, we're going to turn to clustering as our machine learning method. Here, our inputs are the features associated with each of the datapoints within a dataset, and the outputs are going to be cluster labels associated with each of these datapoints, where the goal of the clustering method is to discover these disjoint sets of datapoints. We're going to look at a couple of case studies when we're talking about clustering, but one of them is going to be the same idea of a document retrieval task, where we have an entire corpus of documents. Here though, as opposed to our simple retrieval setting where we're just performing nearest neighbor search, our goal is to discover a structured representation of this input space. So a structured grouping of our corpus into groups that might relate to things like the topics present in this corpus. And just like we saw with retrieval, clustering has applications almost everywhere as well. So maybe when we're doing image search, we don't just want to take a given image and provide a set of related images, but we might want to actually structure our dataset by discovering groups of related images. Or maybe we want to discover groups of Coursera learners like yourselves, where, based on features of the user and past behavior on Coursera, we can find users who are related to one another and might have similar interests, and use this to better target courses to the learners. So these very simple ideas of retrieval and clustering actually have a very, very significant impact in the world and are very much taken for granted. You just go on to your app or your device and you search for something, and you just assume, okay, of course it's going to be able to provide a set of other things that might be related to the query that you've made. Or that we can very easily just discover, from data that we've seen, what the related groups of people or items are. Well, how do we actually do this? 
So this is what this course is about, and we're also going to touch upon how we do this at scale. And throughout this course, like always, we're going to learn general machine learning concepts that are very useful. For example, in this course we're going to talk about the task of unsupervised learning. And we're also going to talk about things like the MapReduce framework for scaling up algorithms by parallelizing their computations. [MUSIC]