By now you have seen a lot of different learning algorithms.
And if you've been following along
these videos you should consider
yourself an expert on many state-of-the-art machine learning techniques.
But even among
people that know a certain learning algorithm.
There's often a huge difference between
someone that really knows how
to powerfully and effectively apply
that algorithm, versus someone that's
less familiar with some of
the material that I'm about
to teach and who doesn't really
understand how to apply these
algorithms and can end up
wasting a lot of
their time trying things out that don't really make sense.
What I would like to do is
make sure that if you
are developing machine learning systems,
that you know how to choose
one of the most promising avenues to spend your time pursuing.
And on this and the next
few videos I'm going to
give a number of practical
suggestions, advice, guidelines on how to do that.
And concretely what we'd
focus on is the problem
of, suppose you are
developing a machine learning system
or trying to improve the performance
of a machine learning system, how
do you go about deciding what are
the proxy avenues to try
next?
To explain this, let's continue using
our example of learning to
predict housing prices.
And let's say you've implement and regularize linear regression.
Thus minimizing that cost function
j.  Now suppose that
after you take your learn parameters,
if you test your hypothesis on
the new set of houses, suppose you
find that this is making huge
errors in this prediction of the housing prices.
The question is what should
you then try mixing in order to improve the learning algorithm?
There are many things that one
can think of that could improve
the performance of the learning algorithm.
One thing they could try, is to get more training examples.
And concretely, you can imagine, maybe, you
know, setting up phone surveys, going
door to door, to try to
get more data on how much
different houses sell for.
And the sad thing is I've seen
a lot of people spend a
lot of time collecting more training
examples, thinking oh, if we have
twice as much or ten times
as much training data, that is certainly going to help, right?
But sometimes getting more training
data doesn't actually help
and in the next few videos
we will see why, and we
will see how you
can avoid spending a lot
of time collecting more training data
in settings where it is just not going to help.
Other things you might try are
to well maybe try a smaller set of features.
So if you have some set
of features such as x1,
x2, x3 and so on,
maybe a large number of features.
Maybe you want to spend time
carefully selecting some small
subset of them to prevent overfitting.
Or maybe you need to get additional features.
Maybe the current set of features
aren't informative enough and you
want to collect more data in the sense of getting more features.
And once again this is the
sort of project that can scale
up the huge projects can you
imagine getting phone
surveys to find out more
houses, or extra land
surveys to find out more
about the pieces of land and so on, so a huge project.
And once again it would be
nice to know in advance if
this is going to help before we
spend a lot of time doing something like this.
We can also try
adding polynomial features things like
x2 square x2 square and product
features x1, x2.
We can still spend quite a
lot of time thinking about that and
we can also try other things like
decreasing lambda, the regularization parameter or increasing lambda.
Given a menu of options
like these, some of which
can easily scale up to
six month or longer projects.
Unfortunately, the most common
method that people use to
pick one of these is to go by gut feeling.
In which what many people
will do is sort of randomly
pick one of these options and
maybe say, "Oh, lets go and get more training data."
And easily spend six months collecting
more training data or maybe someone
else would rather be saying, "Well,
let's go collect a lot more features on these houses in our data set."
And I have a lot
of times, sadly seen people spend, you know,
literally 6 months doing one
of these avenues that they have
sort of at random only to
discover six months later that
that really wasn't a promising avenue to pursue.
Fortunately, there is a
pretty simple technique that can
let you very quickly rule
out half of the things
on this list as being potentially
promising things to pursue.
And there is a very simple technique,
that if you run, can easily
rule out many of these options,
and potentially save you
a lot of time pursuing something that's just is not going to work.
In the next two videos
after this, I'm going to
first talk about how to evaluate learning algorithms.
And in the next few
videos after that, I'm
going to talk about these techniques,
which are called the machine learning diagnostics.
And what a diagnostic is, is
a test you can run,
to get insight into what
is or isn't working with
an algorithm, and which will
often give you insight as to
what are promising things to try
to improve a learning algorithm's
performance.
We'll talk about specific diagnostics later in this video sequence.
But I should mention in advance
that diagnostics can take
time to implement and can sometimes,
you know, take quite a
lot of time to implement and
understand but doing so
can be a very good use
of your time when you are
developing learning algorithms because they
can often save you from
spending many months pursuing an
avenue that you could
have found out much earlier just was not going to be fruitful.
So in the next few
videos, I'm going to first
talk about how evaluate your
learning algorithms and after
that I'm going to talk
about some of these diagnostics which will hopefully
let you much more
effectively select more of the
useful things to try mixing
if your goal to improve
the machine learning system.