[MUSIC] So this leads directly into this discussion that clustering is an unsupervised task. So let's talk a little bit about what this means. Because so far in this specialization, we've assumed that we've been presented with a set of outputs associated with our inputs. So for example, in the regression course, when we were talking about predicting house prices, we assumed that in our training data set we had not only the features of the house, which were the input, but also the sales price of the house, which was the output. In the classification course, we assumed that we had restaurant reviews, where the text of those reviews was our input, and we also had the output, whether it was a positive or a negative review, for the reviews in our training set. So, if we were to imagine that we had labels in this application, maybe we would have labels of articles in terms of topics. So for these articles, the input again would be the text of the article, just like with reviews, but the output would be a topic like sports or world news or entertainment or science. And so in this case, we can think of the problem as a multiclass classification problem, where you imagine you get some new article and you want to label what topic that article is about. Well, then, based on the classifier that you've learned from that training set, where labels were provided, you can think about labeling this document as one of these five different topics presented here. So this is an example of a supervised learning problem, because outputs, these class labels, are provided on our training examples. But when we're thinking about doing clustering, we're assuming there are no labels provided, or that the labels are really, really unreliable. And the goal here is to uncover this grouping structure directly from the data, from the input values alone.
So in particular, here we might imagine that we have documents with just two words in the vocabulary. So let's say this axis is the count of word one and this axis is the count of word two, and each one of these points is a given document, document xi. And so in particular, our input to the algorithm is going to be documents represented as vectors, xi. But then the outputs are going to be cluster labels. So in particular, maybe we'll label this red cluster as cluster 1, this green cluster as cluster 2, and this blue cluster as cluster 3. Then, associated with xi, the output might be zi = 1, meaning that it's associated with cluster 1, and we are going to output this zi for every observation in our data set. Just to be clear, maybe this red cluster corresponds to articles about world news, this green cluster corresponds to articles about science, and this blue cluster corresponds to articles about entertainment. So we're not provided with those labels, but we're going to group articles, color them as we're doing in this picture, based on similarities in the observed input space. So in this simple 2D example, this would be similarities between articles based on a simple two-word vocabulary, counting how many times those words appear in each of these articles. Okay, well, this is an example of an unsupervised learning task, because all we're presenting our algorithm with are the inputs, and the algorithm is supposed to produce a set of labels as the output. Well, how can we think about learning such labels? A key component of that question is thinking about what defines a cluster. And once we've defined what a cluster means, then we need to define algorithms for inferring these clusterings. Okay, well, a cluster is defined by its center, which is often called a centroid, as well as the shape of the cluster. And in this application here, the clusters are defined by ellipses.
So here, this red circle is an example of an ellipse, and this green cluster has this elongated ellipse that's at one angle, and the blue cluster has another, smaller ellipse at a different rotation. So this is the family of shapes that we're exploring: ellipses, with different rotations and stretchings of these ellipses. That's what we're using to define a cluster. And then, when we have a given observation, again, observation xi, which is a specific document, the way we think about assigning that document to a given cluster is based on something we'll call the score under each cluster. So often we just think about defining this score as the distance to the cluster center. Let me choose another color, just for fun, to make this stand out a little bit more on this slide. So the distances to the cluster centers are these distances here, and what we see is that the smallest distance is the distance to this red cluster. So if the distance to the cluster center alone is what defines the score, then this document xi would be assigned to this red cluster. [MUSIC]
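
To make the distance-based score concrete, here is a minimal sketch in Python of assigning one document to its nearest cluster center. The center coordinates, the document's word counts, and the name `assign_cluster` are all made up for illustration; they are not values from the slide. The sketch also assumes plain Euclidean distance as the score, so it ignores the elliptical shape of the clusters entirely.

```python
import math

# Hypothetical cluster centers in (count of word 1, count of word 2) space.
# These coordinates are illustrative assumptions, not values from the lecture.
centers = {
    1: (2.0, 8.0),  # stands in for the red cluster
    2: (7.0, 7.0),  # stands in for the green cluster
    3: (8.0, 2.0),  # stands in for the blue cluster
}

def assign_cluster(x, centers):
    """Score x under each cluster as the Euclidean distance to its center,
    and return the label with the smallest distance plus all the scores."""
    scores = {label: math.dist(x, center) for label, center in centers.items()}
    nearest = min(scores, key=scores.get)
    return nearest, scores

# A hypothetical document x_i with word counts (3, 7).
x_i = (3.0, 7.0)
z_i, scores = assign_cluster(x_i, centers)
print(z_i, scores)
```

With these made-up numbers, the distance to center 1 is the smallest, so `z_i` comes out as 1, which mirrors the document being assigned to the red cluster when distance alone defines the score.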