In this video I'd like to
talk about one last detail
of K-means clustering which is
how to choose the number of
clusters, or how to choose
the value of the parameter
capital K. To be
honest, there actually isn't a
great way of answering this
or doing this automatically and
by far the most common way
of choosing the number of clusters,
is still choosing it manually
by looking at visualizations or by
looking at the output of the clustering algorithm or something else.
But I do get asked
this question quite a lot of
how do you choose the number of
clusters, and so I just want
to tell you what
people's current thinking is on
it, although the most
common thing is actually to
choose the number of clusters by hand.
A large part of
why it might not always
be easy to choose the
number of clusters is that
it is often genuinely ambiguous how many clusters there are in the data.
Looking at this data set
some of you may see
four clusters and that
would suggest using K equals 4.
Or some of you may
see two clusters and
that would suggest K equals
2, and others may see three clusters.
And so, looking at the
data set like this, the
true number of clusters, it actually
seems genuinely ambiguous to me,
and I don't think there is one right answer.
And this is part of unsupervised learning.
We aren't given labels, and
so there isn't always a clear cut answer.
And this is one of the
things that makes it more difficult
to say, have an automatic
algorithm for choosing how many clusters to have.
When people talk about ways of
choosing the number of clusters,
one method that people sometimes
talk about is something called the Elbow Method.
Let me just tell you a little bit about that,
and then mention some of its advantages but also shortcomings.
So the Elbow Method,
what we're going to do is vary
K, which is the total number of clusters.
So, we're going to run K-means
with one cluster, that means really,
everything gets grouped into a
single cluster and compute the
cost function or compute the distortion
J and plot that here.
And then we're going to run K-means
with two clusters, maybe
with multiple random initializations, maybe not.
But then, you know,
with two clusters we should
get, hopefully, a smaller distortion,
and so plot that there.
And then run K-means with three
clusters, hopefully, you get even
smaller distortion and plot that there.
I'm gonna run K-means with four, five and so on.
And so we end up with
a curve showing how the
distortion, you know, goes
down as we increase the number of clusters.
And so we get a curve that maybe looks like this.
And if you look at this
curve, what the Elbow Method does
is it says "Well, let's look at this plot.
Looks like there's a clear elbow there".
Right, this would be by
analogy to the human arm where,
you know, if you imagine that
you reach out your arm,
then, this is your
shoulder joint, this is your
elbow joint and I guess, your hand is at the end over here.
And so this is the Elbow Method.
If you find this sort of pattern
where the distortion goes down rapidly
from 1 to 2, and 2 to
3, and then you reach an
elbow at 3, and then
the distortion goes down very slowly after that.
And then it looks like, you
know what, maybe using three
clusters is the right
number of clusters, because that's
the elbow of this curve, right?
That it goes down, distortion goes
down rapidly until K equals
3, and then goes down very slowly after that.
So let's pick K equals 3.
If you apply the Elbow Method,
and if you get a plot
that actually looks like this,
then, that's pretty good, and
this would be a reasonable way
of choosing the number of clusters.
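
Here is a minimal sketch of the procedure just described, in Python with NumPy. The data set (three well-separated synthetic blobs), the helper name `kmeans`, and the choice of 50 random initializations are all assumptions made for illustration; they are not from the lecture. The distortion J here is the average squared distance from each point to its assigned centroid.

```python
import numpy as np

def kmeans(X, k, n_init=50, n_iter=100, seed=0):
    """Run K-means from several random initializations.

    Returns (centroids, labels, distortion) for the run with the lowest distortion.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Initialize centroids at k distinct, randomly chosen examples.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Cluster assignment step: closest centroid for each point.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Move-centroid step: mean of the points assigned to each cluster
            # (an empty cluster keeps its old centroid).
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Distortion J: average squared distance to the assigned centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        J = dists.min(axis=1).mean()
        if best is None or J < best[2]:
            best = (centroids, labels, J)
    return best

# Hypothetical data: three blobs, so an elbow at K = 3 is plausible.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])

# Vary K, record the distortion for each value, and inspect the curve.
distortions = {k: kmeans(X, k)[2] for k in range(1, 7)}
for k, J in distortions.items():
    print(f"K={k}  J={J:.3f}")
```

On data this clean, the printed distortions should fall steeply up to K = 3 and then flatten out. Real data often gives a much smoother curve.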
It turns out the Elbow Method
isn't used that often, and one
reason is that, if you
actually use this on
a clustering problem, it turns out that
fairly often, you know,
you end up with a curve
that looks much more ambiguous, maybe something like this.
And if you look at this,
I don't know, maybe there's
no clear elbow, but it
looks like distortion continuously goes down,
maybe 3 is a
good number, maybe 4 is
a good number, maybe 5 is also not bad.
And so, if you actually
do this in practice, you know,
if your plot looks like the one on the left, then that's great.
It gives you a clear answer, but
just as often, you end
up with a plot that looks
like the one on the right and
it's not clear where the
right location of the elbow
is. That makes it harder to
choose a number of clusters using this method.
So maybe the quick summary
of the Elbow Method is that it's worth a shot
but I wouldn't necessarily,
you know, have a very high
expectation of it working for any particular problem.
Finally, here's one other way
of thinking about how
you choose the value of K.
Very often people are running
K-means in order to
get clusters for some later
purpose, or for some sort of downstream purpose.
Maybe you want to use K-means
in order to do market segmentation,
like in the T-shirt sizing example that we talked about.
Maybe you want K-means to organize
a computer cluster better, or
maybe to learn clusters for some
different purpose. And so,
if that later, downstream purpose,
such as market segmentation,
gives you an evaluation metric,
then often, a better
way to determine the number of
clusters, is to see
how well different numbers of
clusters serve that later downstream purpose.
Let me step through a specific example.
Let me go through the T-shirt
size example again, and I'm
trying to decide, do I want three T-shirt sizes?
So, I choose K equals 3, then
I might have small, medium and large T-shirts.
Or maybe, I want to choose
K equals 5, and then I
might have, you know, extra
small, small, medium, large
and extra large T-shirt sizes.
So, you can have like 3 T-shirt sizes or four or five T-shirt sizes.
We could also have four T-shirt
sizes, but I'm just
showing three and five here,
just to simplify this slide for now.
So, if I run K-means with
K equals 3, maybe I end
up with, that's my small
and that's my
medium and that's my large.
Whereas, if I run K-means with
5 clusters, maybe I
end up with, those are
my extra small T-shirts, these
are my small, these are
my medium, these are my
large and these are my extra large.
And the nice thing about this
example is that, this then
maybe gives us another way
to choose whether we want
3 or 4 or 5 clusters,
and in particular, what you can
do is, you know, think
about this from the perspective
of the T-shirt business and
ask: "Well if I have
five segments, then how well
will my T-shirts fit my
customers and so, how many T-shirts can I sell?
How happy will my customers be?"
What really makes sense, from the
perspective of the T-shirt business,
in terms of whether I
want to have more T-shirt sizes
so that my T-shirts fit my customers better,
or do I want to have fewer
T-shirt sizes so that
I make fewer sizes of T-shirts
and I can sell them to the customers more cheaply?
And so, the T-shirt selling
business, that might give you
a way to decide, between three clusters versus five clusters.
So, that gives you an
example of how a
later downstream purpose like
the problem of deciding what
T-shirts to manufacture, how that
can give you an evaluation metric for choosing the number of clusters.
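
To make the idea of a downstream evaluation metric concrete, here is a rough sketch in Python. Everything about the business model is hypothetical and invented for illustration: the synthetic height and weight data, the `tolerance` that decides whether a shirt "fits", the `margin` earned per customer who fits, and the `cost_per_size` charged for each extra size. The compact `kmeans` helper also uses a single deterministic initialization to keep the example short, which is a simplification rather than the usual random initialization.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Compact K-means sketch with one deterministic initialization."""
    # Assumption: start from k points evenly spaced along the height-sorted data.
    order = np.argsort(X[:, 0])
    centroids = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def profit(X, k, tolerance=5.0, margin=4.0, cost_per_size=60.0):
    """Hypothetical downstream metric: revenue from customers who fit, minus per-size cost."""
    centroids = kmeans(X, k)
    # Distance from each customer to the nearest T-shirt size (centroid).
    d = np.sqrt(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
    fits = d <= tolerance
    return margin * fits.sum() - cost_per_size * k, fits.mean()

# Hypothetical customers: correlated height (cm) and weight (kg).
rng = np.random.default_rng(0)
height = rng.normal(170.0, 10.0, 500)
weight = 70.0 + 0.9 * (height - 170.0) + rng.normal(0.0, 5.0, 500)
X = np.column_stack([height, weight])

# Score each candidate number of sizes by the downstream metric, not by distortion.
results = {k: profit(X, k) for k in (3, 4, 5)}
best_k = max(results, key=lambda k: results[k][0])
for k, (p, frac) in results.items():
    print(f"K={k}  profit={p:.0f}  fraction of customers who fit={frac:.2f}")
print("K with the highest hypothetical profit:", best_k)
```

The point is only the shape of the comparison: each candidate K is judged by what the business cares about, and a different cost model could easily favor a different K.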
For those of you that are
doing the programming exercises, if
you look at this week's
programming exercise associated with K-means, there's
an example there of using K-means for image compression.
And so if you were trying to
choose how many clusters
to use for that problem, you could
also, again use the
evaluation metric of image compression
to choose the number of clusters, K.
So, how good do you want the
image to look versus, how much
do you want to compress the file
size of the image, and,
you know, if you do the
programming exercise, what I've just
said will make more sense at that time.
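
As a rough illustration of the file-size half of that trade-off, here is a small Python sketch. It assumes a 128 by 128 image with 24-bit RGB color, and it assumes the compressed image stores a palette of K colors plus one palette index per pixel. Those specifics are assumptions for the example, not details given in this lecture. The image-quality half of the trade-off is not computed here.

```python
import math

def compressed_bits(n_pixels, k, bits_per_channel=8, channels=3):
    """Bits needed for a K-color palette plus one palette index per pixel (k >= 2)."""
    color_bits = bits_per_channel * channels      # bits to store one full color
    palette_bits = k * color_bits                 # the K cluster centroids
    index_bits = math.ceil(math.log2(k))          # bits to name one of K colors
    return palette_bits + n_pixels * index_bits

n_pixels = 128 * 128            # assumed image size
original_bits = n_pixels * 24   # every pixel stored as a full 24-bit color

for k in (2, 4, 8, 16, 32, 64):
    bits = compressed_bits(n_pixels, k)
    print(f"K={k:2d}  compressed={bits} bits  ratio={original_bits / bits:.2f}x")
```

Larger K means more bits per pixel and so less compression, while the image generally looks closer to the original. Choosing K is then a matter of how much of each you want.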
So, just to summarize, for the
most part, the number of
clusters K is still chosen
by hand, by human input or human insight.
One way to try to
do so is to use
the Elbow Method, but I
wouldn't always expect that to
work well, but I think
the better way to think about
how to choose the number of
clusters is to ask, for
what purpose are you running K-means?
And then to think, what is
the number of clusters K that
serves that, you know, whatever
later purpose that you're actually running K-means for.