If you run a learning algorithm and it doesn't do as well as you were hoping, almost all the time it will be because you have either a high bias problem or a high variance problem; in other words, either an underfitting problem or an overfitting problem. In this case it's very important to figure out which of these two problems you have, bias or variance or a bit of both, because knowing which of them is happening gives a very strong indicator of which ways of trying to improve your algorithm are likely to be useful and promising. In this video, I'd like to delve more deeply into this bias and variance issue, understand it better, and also figure out how to evaluate whether we might have a bias problem or a variance problem, since this is critical to figuring out how to improve the performance of the learning algorithms that you implement. You've already seen this figure a few times: if you fit too simple a hypothesis, like a straight line, it underfits the data; if you fit too complex a hypothesis, it might fit the training set perfectly but overfit the data; and a hypothesis of some intermediate level of complexity, maybe a degree-two polynomial, not too low and not too high a degree, is just right and gives you the best generalization error out of these options. Now that we're armed with the notion of training, validation, and test sets, we can understand the concepts of bias and variance a little bit better. Concretely, let our training error and cross-validation error be defined as in the previous videos, that is, the average squared error as measured on the training set or as measured on the cross-validation set. Now let's plot the following figure. On the horizontal axis I'm going to plot the degree of the polynomial d, so as I go to the right I'm going to be fitting higher and higher order polynomials. Toward the left of this figure, where maybe d equals 1, we're going to be fitting very simple functions, whereas toward the right, where d equals 4 or maybe an even larger number, we're going to be fitting very high order polynomials, so larger values of d on the horizontal axis correspond to fitting much more complex functions to your training set. Let's look at the training error and the cross-validation error and plot them on this figure. Let's start with the training error. As we increase the degree of the polynomial, we're going to fit the training set better and better, so if d equals 1 that corresponds to a high training error, whereas if we have a very high degree polynomial, our training error is going to be really low, maybe even zero, because it will fit the training set really well. So as we increase the degree of the polynomial, we find typically that the training error decreases, and I'm going to write J subscript train of theta there, because our training error tends to decrease with the degree of the polynomial that we fit to the data. Next, let's look at the cross-validation error. For that matter, if we look at the test set error, we'll get a pretty similar result as if we were to plot the cross-validation error.
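To make this plot concrete, here is a minimal sketch in Python of the computation being described: fit polynomials of increasing degree d on the training set only, then measure the average squared error on both the training set and the cross-validation set. The toy data, the range of degrees, and the helper name are made up for illustration; they are not from the lecture.

```python
import numpy as np

def average_squared_error(theta, x, y):
    """Average squared error, (1/(2m)) * sum((h(x) - y)^2), for a polynomial hypothesis."""
    predictions = np.polyval(theta, x)
    return np.mean((predictions - y) ** 2) / 2.0

# Hypothetical toy data split into a training set and a cross-validation set.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 3.0, 30))
y_train = np.sin(x_train) + 0.1 * rng.normal(size=30)
x_cv = np.sort(rng.uniform(0.0, 3.0, 20))
y_cv = np.sin(x_cv) + 0.1 * rng.normal(size=20)

for d in range(1, 7):                        # degree of polynomial, the horizontal axis of the plot
    theta = np.polyfit(x_train, y_train, d)  # fit the degree-d polynomial to the training set only
    j_train = average_squared_error(theta, x_train, y_train)
    j_cv = average_squared_error(theta, x_cv, y_cv)
    print(f"d={d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

If you tabulate or plot these numbers against d, you should see the behavior described in the lecture: J train keeps decreasing as d grows, while J cv first falls and then rises again once the polynomial starts to overfit.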
So we know that if d equals 1, we're fitting a very simple function, and so we may be underfitting the training set, and we're going to get a very high cross-validation error. If we fit an intermediate degree polynomial, like d equals 2 in our example from the previous slide, we're going to have a much lower cross-validation error, because we're finding a much better fit to the data. Conversely, if d were too high, say a value of four, then we're again overfitting, and so we end up with a high value for the cross-validation error. So if you were to vary d smoothly and plot a curve, you might end up with a curve like that for J subscript cv of theta, and again if you plot J subscript test of theta you get something very similar. This sort of plot also helps us to better understand the notions of bias and variance. Concretely, suppose you have applied a learning algorithm and it's not performing as well as you were hoping, so your cross-validation set error or your test set error is high. How can we figure out if the learning algorithm is suffering from high bias or if it's suffering from high variance? The setting of the cross-validation error being high corresponds to either the regime on the left of this plot or the regime on the right. The regime on the left corresponds to a high bias problem, that is, fitting an overly low order polynomial, such as d equals 1, when we really needed a higher order polynomial to fit the data. In contrast, the regime on the right corresponds to a high variance problem, that is, when d, the degree of the polynomial, was too large for the data set that we have. And this figure gives us a clue for how to distinguish between these two cases. Concretely, in the high bias case, that is, the case of underfitting, what we find is that both the cross-validation error and the training error are going to be high. So if your algorithm is suffering from a bias problem, the training set error will be high, and you may find that the cross-validation error is also high: it might be close to, maybe just slightly higher than, the training error. If you see this combination, that's a sign that your algorithm may be suffering from high bias. In contrast, if your algorithm is suffering from high variance, then J train, that is the training error, is going to be low; that is, you're fitting the training set very well. Whereas your error on the cross-validation set, the cost measured on the cross-validation set, will be much bigger than your training set error. The double greater-than sign here is the math symbol for "much greater than," denoted by two greater-than signs. So if you see this combination of values, that is a clue that your learning algorithm may be suffering from high variance and might be overfitting. And the key that distinguishes these two cases is that if you have a high bias problem, your training set error will also be high, as your hypothesis is just not fitting the training set well.
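As a rough illustration of this decision rule, and only as an illustration, here is one way you might encode the two regimes in code. The threshold values `high_error` and `gap_ratio` are arbitrary assumptions for this sketch; in practice "high" has to be judged relative to the error you would consider acceptable for your problem.

```python
def diagnose(j_train, j_cv, high_error=0.5, gap_ratio=2.0):
    """Heuristic reading of the two regimes: both errors high -> high bias;
    training error low but cross-validation error much larger -> high variance.
    Thresholds are made up for this example."""
    if j_train >= high_error and j_cv >= high_error:
        return "high bias (underfitting): J_train and J_cv are both high and close together"
    if j_train < high_error and j_cv >= gap_ratio * j_train:
        return "high variance (overfitting): J_train is low but J_cv >> J_train"
    return "neither regime clearly; the fit may be about right"

print(diagnose(j_train=0.90, j_cv=1.00))   # both high -> high bias
print(diagnose(j_train=0.05, j_cv=0.80))   # low train, much higher cv -> high variance
```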
And if you have a high variance problem, your training set error will usually be low, that is, much lower than your cross-validation error. So hopefully that gives you a somewhat better understanding of the two problems of bias and variance. I still have a lot more to say about bias and variance in the next few videos, and I'll show you even more details on how to diagnose them there. But what we'll see later is that by figuring out whether a learning algorithm may be suffering from high bias, high variance, or a combination of both, we get much better guidance for what might be promising things to try in order to improve the performance of the learning algorithm.