In earlier videos, I have said over and over that, when you are developing a machine learning system, one of the most valuable resources is your time as the developer, in terms of picking what to work on next. Or, if you have a team of developers or a team of engineers working together on a machine learning system, again one of the most valuable resources is the time of the engineers or developers working on the system. And what you really want to avoid is that you or your colleagues or your friends spend a lot of time working on some component, only to realize after weeks or months of time spent that all that work just doesn't make a huge difference to the performance of the final system. In this video, what I'd like to do is talk about something called ceiling analysis. When you or your team are working on a pipelined machine learning system, this can sometimes give you a very strong signal, a very strong guidance, on what parts of the pipeline might be the best use of your time to work on.

To explain the idea of ceiling analysis, I'm going to keep using the example of our photo OCR pipeline. As I mentioned earlier, each of these boxes, text detection, character segmentation, character recognition, each of these machine learning components could be the work of even a small team of engineers, or maybe the whole system could be built by just one person. Either way, the question is, where should you allocate scarce resources? Which of these components, or which one or two or maybe all three of these components, is most worth your time to try to improve the performance of?

So here's the idea of ceiling analysis. As in the development process for other machine learning systems as well, in order to make decisions on what to do for developing the system, it is going to be very helpful to have a single real number evaluation metric for this learning system. So let's say we pick character-level accuracy: given a test set image, what fraction of the characters in the test set image do we recognize correctly? Or you can pick some other single real number evaluation metric if you want. But let's say that whatever evaluation metric we pick, we find that the overall system currently has 72% accuracy. So, in other words, we have some set of test set images, and for each test set image we run it through text detection, then character segmentation, then character recognition, and we find that on our test set the overall accuracy of the entire system was 72% on whatever metric we chose.

Now here's the idea behind ceiling analysis, which is that we're going to go to, let's say, the first module of our machine learning pipeline, text detection. And what we're going to do is monkey around with the test set. We're going to go to the test set and, for every test example, we're just going to provide it the correct text detection outputs. In other words, we're going to go to the test set and just manually tell the algorithm where the text is in each of the test examples.
So in other words, we are going to simulate what happens if we have a text detection system with 100% accuracy, for the purpose of detecting text in an image. And the way you do that is very simple: instead of letting your learning algorithm detect the text in the images, you instead go to the images and just manually label the location of the text in your test set images. You would then let these correct, ground truth labels of where the text is be part of your test set, and use these ground truth labels as what you feed in to the next stage of the pipeline, the character segmentation component. So to say that again: by putting a checkmark over here, what I mean is I'm going to go to my test set and just give it the correct answers, give it the correct labels, for the text detection part of the pipeline, so that it's as if I have a perfect text detection system on my test set. Having done that, I run this data through the rest of the pipeline, through character segmentation and character recognition, and then use the same evaluation metric as before to measure the overall accuracy of the entire system. And with perfect text detection, hopefully the performance goes up. Let's say it goes up to 89%.

Then we're going to keep going. Next let's go to the next stage of the pipeline, to character segmentation. Again I go to my test set, and now I give it the correct text detection outputs and the correct character segmentation outputs, so I manually label the correct segmentations of the text into individual characters, and see how much that helps. Let's say it goes up to 90% accuracy for the overall system. And as always, the accuracy is the accuracy of the overall system: whatever the final output of the character recognition system is, whatever the final output of the overall pipeline is, we measure the accuracy of that. And then finally I go to the character recognition system and give that the correct labels as well, and if I do that too, then, no surprise, I should get 100% accuracy.

Now, the nice thing about having done this ceiling analysis is that we can now understand the upside potential for improving each of these components. We see that if we get perfect text detection, our performance went up from 72 to 89 percent, so that's a 17 percent performance gain. This means that if we take our current system and spend a lot of time improving text detection, we could potentially improve our system's performance by up to 17 percent. That seems well worth our while. Whereas in contrast, when we additionally gave it perfect character segmentation, performance went up by only one percent. That's a more sobering message: it means that no matter how much time you spend on character segmentation, the upside potential is going to be pretty small, and maybe you do not want to have a large team of engineers working on character segmentation, because this sort of analysis shows that even when you give it perfect character segmentation, your performance goes up by only one percent. So right there, this really estimates what is the ceiling, or what's an upper bound, on how much you can improve the performance of your system by working on one of these components. And finally, when we gave it perfect character recognition, the performance went up by ten percent. So again, you can decide whether a ten percent improvement is worth the effort; it tells you that maybe with more effort spent on the last stage of the pipeline, you could improve the performance of the system as well.
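To make the procedure concrete, here is a minimal sketch, in Python, of what a ceiling analysis loop for the photo OCR pipeline might look like. The stage functions (`detect_text`, `segment_characters`, `recognize_characters`), the `character_accuracy` metric, and the ground truth fields are all hypothetical placeholders standing in for your own pipeline components and hand-labeled test set; the point is only the pattern of substituting ground truth outputs one stage at a time and re-measuring the same overall metric.

```python
# detect_text, segment_characters, recognize_characters and character_accuracy
# are placeholders for your own pipeline components and evaluation metric.

def run_pipeline(image, truth, perfect_stages=0):
    """Run the photo OCR pipeline, substituting ground-truth outputs
    for the first `perfect_stages` stages (0 = run the real system)."""
    regions = truth["text_regions"] if perfect_stages >= 1 else detect_text(image)
    segments = truth["char_segments"] if perfect_stages >= 2 else segment_characters(image, regions)
    text = truth["characters"] if perfect_stages >= 3 else recognize_characters(image, segments)
    return text

def ceiling_analysis(test_set):
    """Print overall character-level accuracy as each successive stage
    of the pipeline is replaced by its ground-truth ('perfect') output."""
    stages = ["current system", "perfect text detection",
              "perfect character segmentation", "perfect character recognition"]
    prev = None
    for k, label in enumerate(stages):
        acc = sum(character_accuracy(run_pipeline(img, truth, k), truth["characters"])
                  for img, truth in test_set) / len(test_set)
        delta = "" if prev is None else f"  (+{100 * (acc - prev):.1f} pts)"
        print(f"{label}: {100 * acc:.1f}%{delta}")
        prev = acc
```

The gain printed at each step is exactly the "ceiling" discussed above: an upper bound on how much improving that one component could help the overall system.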
Another way of thinking about this is that, by going through this sort of analysis, you're trying to figure out the upside potential of improving each of these components, or how much you could possibly gain if one of these components became absolutely perfect. That really places an upper bound on how much working on that component can improve the performance of the system.

So the idea of ceiling analysis is pretty important. Let me illustrate this idea again, but with a different, more complex example. Let's say that you want to do face recognition from images: you want to look at a picture and recognize whether or not the person in the picture is a particular friend of yours, trying to recognize the person shown in the image. This is a slightly artificial example; this isn't actually how face recognition is done in practice. But I want to step through an example of what a pipeline might look like, to give you another example of how a ceiling analysis process might go.

So we have a camera image, and let's say we design a pipeline as follows. Let's say the first thing we want to do is pre-processing of the image. So take the image like the one I've shown on the upper right, and let's say we want to remove the background, so through pre-processing the background disappears. Next we want to detect the face of the person; that's usually done with a learning algorithm, so we'll run a sliding windows classifier to draw a box around the person's face. Having detected the face, it turns out that if you want to recognize people, the eyes are a highly useful cue. In terms of recognizing your friends, the appearance of their eyes is actually one of the most important cues you use. So let's run another classifier to detect and segment out the eyes of the person, since this will give us useful features to recognize a person. Then find other parts of the face of interest: maybe segment out the nose, segment out the mouth. And then, having found the eyes, the nose and the mouth, all of these give us useful features to feed into, say, a logistic regression classifier, and it's the job of that classifier to give us the overall label, to find the label for who we think this person is.

So this is a kind of complicated pipeline. It's actually probably more complicated than you should use if you actually want to recognize people, but it's an illustrative example that's useful to think about for ceiling analysis. So how do you go through ceiling analysis for this pipeline? Well, we'll step through these pieces one at a time. Let's say your overall system has 85 percent accuracy. The first thing I do is go to my test set and manually give it the ground truth foreground-background segmentation: go to the test set and use Photoshop or something to just mark where the background is and manually remove it, so ground truth background removal, and see how much the accuracy changes. In this example, the accuracy goes up by only 0.1%, so that's a strong sign that even if you had perfect background removal, the performance of your system isn't going to go up that much. So it's maybe not worth a huge effort to work on pre-processing, on background removal.
Then I'm going to go to the test set and give it the correct face detection outputs, and then again step through the eye, nose, and mouth segmentations in some order; just pick one order. So give it the correct location of the eyes, the correct location of the nose, the correct location of the mouth, and then finally, if I also give it the correct overall label, I get 100% accuracy. And so, as I go through the system and give more and more components the correct labels in the test set, the performance of the overall system goes up, and you can look at how much the performance went up at each step. From giving it perfect face detection, it looks like the overall performance of the system went up by 5.9 percent. That's a pretty big jump, and it means that maybe it's worth quite a bit of effort on better face detection. It went up four percent there, one percent there, one percent there, and three percent there. So it looks like the components most worth our while are face detection, where giving it the perfect output made performance go up by 5.9 percent, eye segmentation, where performance went up by 4 percent, and then my final logistic regression classifier, where there's another 3 percent gap. And so this tells us which of the components are maybe the most worth our while working on.
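As a rough illustration of how you might tabulate results like these, here is a small sketch in Python. The component names and ordering follow the hypothetical face recognition pipeline above, and the accuracy numbers are just the ones quoted in this example; in practice you would plug in whatever your own ceiling analysis measured.

```python
# Ceiling analysis results for the (hypothetical) face recognition pipeline:
# overall test-set accuracy after making each successive component perfect.
results = [
    ("current system",      0.850),
    ("background removal",  0.851),   # pre-processing
    ("face detection",      0.910),
    ("eye segmentation",    0.950),
    ("nose segmentation",   0.960),
    ("mouth segmentation",  0.970),
    ("logistic regression", 1.000),   # final classifier given correct labels
]

# Each component's upside potential: the ceiling on how much improving
# that one component alone could raise overall accuracy.
for (name, acc), (_, prev) in zip(results[1:], results[:-1]):
    print(f"{name:<20} upside ~ {100 * (acc - prev):.1f} percentage points")
```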
And by the way, I want to tell you a true cautionary story. The reason I put this pre-processing, background removal step into the pipeline is because I actually know of a true story where a research team literally had two people spend about a year and a half, 18 months, working on better background removal. I'm obscuring the details for obvious reasons, but there was a computer vision application where a team of two engineers literally spent, I think, about a year and a half working on better background removal. They actually worked out really complicated algorithms and ended up publishing, I think, one research paper. But after all that work they found that it just did not make a huge difference to the overall performance of the actual application they were working on. And if only someone had done a ceiling analysis beforehand, maybe they could have realized this. One of them said to me afterward that if they had done this sort of analysis, maybe they could have realized, before those 18 months of work, that they should have spent their effort focusing on some different component, rather than literally spending 18 months working on background removal.

So to summarize, pipelines are pretty pervasive in complex machine learning applications. And when you are working on a big machine learning application, your time as a developer is so valuable, so just don't waste your time working on something that ultimately isn't going to matter. In this video, we talked about this idea of ceiling analysis, which I've often found to be a very good tool for identifying which component, if you put a focused effort into it, could actually have a huge effect on the overall performance of your final system.

Over the years of working with machine learning, I've actually learned not to trust my own gut feelings about which component to work on. Very often, even having worked with machine learning for a long time, I'll look at a machine learning problem and have some gut feeling of, oh, let's jump on that component and just spend more time on it. But over the years I've come to know not to trust those gut feelings that much. Instead, if you have a machine learning problem where it's possible to structure things into a pipeline and do a ceiling analysis, that is often a much better and much more reliable way of deciding where to put a focused effort, and you can be reasonably sure that when you do, it will actually have a big effect on the final performance of your overall system.