[MUSIC] So far in this module,
we've discussed clustering in the context of document analysis, but we
see applications of clustering everywhere. As an example, maybe you want to cluster
images. In the case we're showing here, you might want to
group all the images that are pictures of the ocean, or of a pink flower,
a dog, a sunset, clouds, and so on. And this can be useful for
things like Google Image search, where a person types a word into
Google Images, performing a query on that image dataset, and Google wants to
display a set of images for that word. But maybe not all of the images in
the dataset have been labeled by a human. So if we can discover
groups of related images, this can really help in image search, and there are other reasons you might want to
perform clustering on images, too. For example, again using
Google Images, somebody types in a term like cardinal. Well,
that term might have multiple meanings. So if we take all the images that
have already been labeled as cardinal, again maybe by a human, then maybe we want to perform
a clustering on the images associated with this word because maybe some
of the images correspond to a bird, but others to a baseball team and
others to religious figures. So when a person is going and
searching for the word cardinal, we don't actually know which of
these meanings that person meant. So when we're displaying images,
instead of having all the images be images of birds,
just different variants of the same image, maybe we want to have
a diverse set of images that covers this set of possible meanings. So if we can cluster images all
labeled with the same label, we can present results that might
better explore what the person is actually intending when they
do a Google Image search. So in this case we're using
clustering to structure our output. We can also think about applying
clustering in applications that are very different than just for
the sake of retrieval. For example, maybe we can think about grouping patients
based on exhibited medical conditions. If we can form this grouping then
we can do things like better study subpopulations as well as diseases. So as an example of this, imagine we have intracranial EEG
recordings from a set of patients. So here,
we're showing three different brains and some electrodes placed on these brains. And from each one of these patients, we get a set of recordings
of different seizure events. Then, we can take each one
of these seizure events and think about clustering based on
similarities between the events. And then,
based on these types of clusterings, we can learn more about
different types of seizures. We can characterize how many different
types of seizures we think there are, as well as the properties of
each one of these seizure types. We can also think about clustering
patients based on the types of seizures that they exhibit, to better understand different
subpopulations within the class of people who have seizure disorders. Another very different application
is looking at products on Amazon. And maybe here we would like to group
products based on having similar purchase histories. And when I talk about the purchase
history of a product, what I mean is this: we look at all users that go and
purchase a given item, and look at the other items that they purchase,
either at the same time or in their recent history, and we treat
that as a description of the product. This is aggregated over all
users that purchased the product. Then we can think about
clustering similar products. So for example,
maybe when people go and buy this crib, we also see that they are buying
a car seat and other baby items. And so, from this,
maybe we can lump this crib in with the set of baby items
that are available on Amazon. And even though, let's say, a seller labeled
this under the category furniture, one thing we can do with this clustering
is either correct the label or add a label, baby,
to say that this is a baby product. And of course, we could also use this type
of structure to do product recommendation: somebody has certain
things in their shopping cart, and you want to recommend other products
they might be interested in. And we can also think about
discovering groups of users that have similar
purchase or viewing behaviors. So there are lots of interesting things we
can do with the clustering structure extracted for both users and products. A less obvious example where clustering
can be useful is in the following task. So imagine that we want to estimate
housing value at very small spatial locations. So here, in this image, we're showing the
City of Seattle broken into census tracts. So these are geographically
small regions and we want to assess value within
each one of these neighborhoods. But an issue here is the fact that
there are very few house sales per region at any given point in time. And we want to be able to assess
the value in that region. So how do we do it if
there isn't much data? Well, one thing we can think about
doing is discovering clusters of regions that historically have behaved similarly. And if we discover this
cluster of regions, we can think about sharing
information between these regions, pooling our observations to form
better estimates of value locally. A structurally related task but with a very different application
is in forecasting crimes. So now what this image is
showing is Washington, DC, again broken down into census tracts, and the goal is to be able to forecast
how many crimes are going to occur in each one of these census
tracts at the next point in time. And again, here, we can think about
clustering regions to share information, and you can show that forming
predictions based on a discovered group of regions that behave similarly
leads to better forecasts of crime rates than if we were to treat
each region independently. So as we see,
there's a wide range of applications for clustering, and the methods that
we described in this module extend to any one of these applications,
obviously with some application-specific tweaks to capture
what it means to be a data object within a cluster and
what the notions of distance between these objects are. [MUSIC]
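
As a rough sketch of the product example above, the snippet below describes each product by counts of the other items bought alongside it, aggregated over all baskets, and then compares products with cosine similarity. The baskets, the product names, and the helper functions co_purchase_vectors and cosine_similarity are all made up for illustration. Cosine similarity is just one possible notion of distance here, not necessarily the one used in this module.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Hypothetical shopping baskets; every product and purchase here is invented.
baskets = [
    ["crib", "car seat", "diapers"],
    ["crib", "diapers", "stroller"],
    ["car seat", "stroller", "diapers"],
    ["desk", "office chair", "lamp"],
    ["desk", "lamp"],
]


def co_purchase_vectors(baskets):
    """Describe each product by counts of the other items bought with it."""
    vectors = defaultdict(Counter)
    for basket in baskets:
        # Every ordered pair of distinct items in a basket counts once.
        for item, other in permutations(set(basket), 2):
            vectors[item][other] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = co_purchase_vectors(baskets)
sim_baby = cosine_similarity(vectors["crib"], vectors["car seat"])
sim_desk = cosine_similarity(vectors["crib"], vectors["desk"])
print(f"crib vs car seat: {sim_baby:.3f}")
print(f"crib vs desk:     {sim_desk:.3f}")
```

In this toy data the crib ends up much closer to the car seat than to the desk, which is the kind of signal a clustering method could use to group the crib with baby items even if a seller filed it under furniture.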