So far in this module, we've discussed clustering in the context of document analysis, but we see applications of clustering everywhere. For example, maybe you want to cluster images. In the case we're showing here, you might want to group together all images of the ocean, or of a pink flower, a dog, a sunset, clouds, and so on. This can be useful for things like Google Image search, where a person types a word into Google Images, that query is run against the image dataset, and Google wants to display a set of images for that word. But maybe not all of the images in the dataset have been labeled by a human. So if we can discover groups of related images, this can really help in image search. There are other reasons you might want to cluster images, too. For example, again using Google Images, suppose somebody types in the term cardinal. That term has multiple meanings. So if we take all images that have already been labeled as cardinal (again, maybe by a human), we might want to cluster just those images, because some of them correspond to a bird, others to a baseball team, and others to religious figures. When a person searches for the word cardinal, we don't actually know which of these meanings they intended. So when we display results, instead of showing only images of birds that are just slight variants of one another, we might want to show a diverse set of images that covers the possible meanings. By clustering the images that share the same label, we can present results that better explore what the person is actually looking for when they do a Google Image search. So in this case, we're using clustering to structure our output. We can also think about applying clustering in applications that are very different from retrieval.
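One simple way to turn clusters into a diverse result list, like the cardinal example, is a round-robin over clusters: show the best image from each cluster before showing a second image from any of them. This is a minimal sketch that assumes the clusters have already been computed (for example, by clustering only the images that share the query label); the function `diversify_results` and its inputs are hypothetical and not part of any Google API.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # sentinel used to pad clusters of unequal size


def diversify_results(ranked_ids, cluster_of):
    """Reorder ranked results so that consecutive items come from different clusters.

    ranked_ids: image ids in their original relevance order (hypothetical input).
    cluster_of: dict mapping each image id to a cluster label, e.g. produced by
                clustering only the images that share the query label.
    """
    # Group images by cluster, keeping relevance order inside each cluster.
    # Dicts preserve insertion order, so clusters are ordered by their best-ranked image.
    groups = defaultdict(list)
    for image_id in ranked_ids:
        groups[cluster_of[image_id]].append(image_id)

    # Round-robin: take the next-best image from each cluster in turn.
    result = []
    for round_items in zip_longest(*groups.values(), fillvalue=_MISSING):
        result.extend(item for item in round_items if item is not _MISSING)
    return result


ranked = ["bird1", "bird2", "bird3", "team1", "bishop1", "team2"]
cluster_of = {
    "bird1": "bird", "bird2": "bird", "bird3": "bird",
    "team1": "baseball", "team2": "baseball",
    "bishop1": "religion",
}
print(diversify_results(ranked, cluster_of))
# → ['bird1', 'team1', 'bishop1', 'bird2', 'team2', 'bird3']
```

Even though the three bird images ranked highest, the reordered list surfaces one image per meaning first.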
For example, maybe we can think about grouping patients based on the medical conditions they exhibit. If we can form these groupings, then we can do things like better study subpopulations as well as diseases. As an example of this, imagine we have intracranial EEG recordings from a set of patients. Here, we're showing three different brains with electrodes placed on them. From each of these patients, we get a set of recordings of different seizure events. We can then take each of these seizure events and cluster them based on similarities between the events. Based on these clusterings, we can learn more about different types of seizures: we can characterize how many different types of seizures we think there are, as well as the properties of each seizure type. We can also think about clustering patients based on the types of seizures they exhibit, to better understand different subpopulations within the class of people who have seizure disorders. Another very different application is looking at products on Amazon. Here, we might like to group products that have similar purchase histories. By the purchase history of a product, I mean the following: we look at all users who purchase a given item, and at the other items they purchase, either in that same shopping session or in their recent history, and we use that as a description of the product. This description is aggregated over all users who purchased the product, and then we can think about clustering products with similar descriptions. So, for example, maybe when people buy this crib, we also see that they are buying a car seat and other baby items.
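The product description just described can be sketched as a co-purchase count vector: for each product, count how often every other item appears in the same shopping session, pooled over all users, and then compare products with cosine similarity. This is a minimal sketch assuming purchase data arrives as hypothetical per-session baskets of item ids; the helper names are illustrative, and any of the clustering methods from this module could then be run on these vectors.

```python
import math
from collections import Counter, defaultdict


def copurchase_vectors(baskets):
    """For each item, count how often every other item was bought in the same basket.

    baskets: iterable of sets of item ids, one set per shopping session, pooled
             over all users (hypothetical data format).
    Returns a dict mapping item id -> Counter of co-purchased item id -> count.
    """
    vectors = defaultdict(Counter)
    for basket in baskets:
        for item in basket:
            for other in basket:
                if other != item:
                    vectors[item][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


baskets = [
    {"crib", "car_seat", "diapers"},
    {"crib", "car_seat", "stroller"},
    {"stroller", "car_seat", "diapers"},
    {"sofa", "lamp", "coffee_table"},
    {"sofa", "lamp", "rug"},
]
vectors = copurchase_vectors(baskets)
print(round(cosine(vectors["crib"], vectors["stroller"]), 3), cosine(vectors["crib"], vectors["sofa"]))
# → 0.833 0.0
```

In this toy data the crib's co-purchase profile looks much more like the stroller's than the sofa's, so a clustering on these vectors would tend to group it with the baby items.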
From this, maybe we can lump this crib in with the set of baby items available on Amazon. Even if, let's say, a seller listed it under the category furniture, one thing we can do with this clustering is either correct that label or add the label baby, indicating that this is a baby product. Of course, we could also use this type of structure for product recommendation: somebody has certain items in their shopping cart, and you want to recommend other products they might be interested in. We can also think about discovering groups of users who have similar purchase or viewing behaviors. So there are lots of interesting things we can do with clustering structure extracted from both users and products. A less obvious example where clustering can be useful is the following task. Imagine that we want to estimate housing values at very small spatial scales. In this image, we're showing the City of Seattle broken into census tracts. These are geographically small regions, and we want to assess value within each of these neighborhoods. An issue here is that there are very few house sales per region at any given point in time, and we still want to be able to assess the value in that region. So how do we do it if there isn't much data? Well, one thing we can do is discover clusters of regions that have historically behaved similarly. Once we discover these clusters of regions, we can share information between the regions in a cluster, pooling our observations to form better estimates of value locally. A structurally related task, but with a very different application, is forecasting crime. This image shows Washington, DC, again broken down into census tracts, and the goal is to forecast how many crimes are going to occur in each of these census tracts at the next point in time.
Again, here we can think about clustering regions to share information, and you can show that forming predictions based on discovered groups of regions that behave similarly leads to better forecasts of crime rates than if we were to treat each region independently. So as we see, there's a wide range of applications for clustering, and the methods that we've described in this module extend to any one of these applications, obviously with some application-specific tweaks to capture what a data object is and what notion of distance between objects to use.
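The idea of pooling observations within a cluster of regions, used for both the housing and the crime examples, can be sketched with a simple weighted average that shrinks each region's own mean toward its cluster's mean. This is one basic form of partial pooling chosen for illustration, not necessarily the model behind the results mentioned above; the `prior_strength` pseudo-count, the function name, and the data layout are all assumptions.

```python
def pooled_estimates(values_by_region, cluster_of, prior_strength=5.0):
    """Shrink each region's average toward the average of its cluster of regions.

    values_by_region: dict region -> list of observations, e.g. sale prices or
                      crime counts (hypothetical data; assumes each cluster has
                      at least one observation overall).
    cluster_of: dict region -> cluster label, e.g. clusters of regions that
                behaved similarly historically.
    prior_strength: hypothetical pseudo-count; larger values pool more heavily.
    """
    # Pool every observation from all regions in the same cluster.
    pooled = {}
    for region, values in values_by_region.items():
        pooled.setdefault(cluster_of[region], []).extend(values)
    cluster_mean = {label: sum(vals) / len(vals) for label, vals in pooled.items() if vals}

    estimates = {}
    for region, values in values_by_region.items():
        prior = cluster_mean[cluster_of[region]]
        # Treat the cluster mean as prior_strength pseudo-observations and
        # average them with the region's own observations.
        estimates[region] = (sum(values) + prior_strength * prior) / (len(values) + prior_strength)
    return estimates


sales = {"A": [500], "B": [300, 320, 310, 290], "C": [800, 820]}
clusters = {"A": "c1", "B": "c1", "C": "c2"}
print(pooled_estimates(sales, clusters, prior_strength=2.0))
# → {'A': 396.0, 'B': 318.0, 'C': 810.0}
```

Region A has only one sale, so its estimate is pulled strongly toward the mean of its cluster (344.0), while region B, with four sales, stays closer to its own average. A region with no sales at all would simply receive its cluster's mean.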