So, in particular, we can also face this
issue of overfitting when we have lots and lots of inputs. That represents a very flexible model that
can run into the same issues that we saw in our demo for polynomial regression. Or, more generally, we can say
the same thing if we just have lots of features. So we'll say that capital D is very large. And these features could be different
functions of our inputs. But when you include lots and
lots of these functions of our inputs in our regression model, then again
we're in this place where the model has a lot of flexibility to explain the data,
and we're prone to overfitting. But this issue of overfitting with
respect to increasing model complexity is really relative to how
much data we have. So let's talk about overfitting
as a function of the number of observations that we have, as well as a function of
the number of inputs, or the complexity of the model. So in particular, if we have
very few observations, so N is small, then our models can
rapidly become overfit to the data. Because we have only a few points,
as we increase our model complexity, like
the order of the polynomial, it becomes very easy to hit
all of our observations, but in between those
observations, things can go very wild. On the other hand, if we have lots and
lots and lots of observations, even with really, really complex models,
we're not going to become overfit as quickly, because we have dense
observations across our input, so the function, in this example
a function of square feet, is pinned
down basically everywhere. It's not able to
hit every observation, and it's not able to do these
really crazy wiggly things. Okay. So, when
we have just one input, like the number of square feet of a house,
then in order to avoid overfitting, we need to have observations that are very
dense across square footage. So we need lots of representative
examples of square feet and house value pairs. This is actually pretty hard to do,
to have lots of examples of houses of every possible
square footage that you might see. So this is already a hard problem, but it becomes even harder when I increase
the number of inputs in my model. So, for example, just think of
a model where I have square feet and number of bathrooms. And I want to cover all possible
combinations of those two inputs in order to provide representative
examples and avoid overfitting. Well that's really really hard.
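
To get a feel for why covering all combinations of inputs is so hard, here is a minimal Python sketch. It is an illustration, not something from the lecture: it assumes every input is scaled to [0, 1) and drawn uniformly at random, splits each input into 20 bins, and reports the fraction of grid cells that contain at least one observation. The function name coverage_fraction, the bin count, and the 200 synthetic observations are all hypothetical choices.

```python
import random


def coverage_fraction(n_obs, n_inputs, bins_per_input=20, seed=0):
    """Fraction of grid cells that contain at least one observation.

    Assumption: each input is scaled to [0, 1) and split into
    `bins_per_input` equal bins, so the grid has
    bins_per_input ** n_inputs cells in total.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_obs):
        # One synthetic observation: a uniform random value per input,
        # mapped to the index of the bin it falls into.
        cell = tuple(int(rng.random() * bins_per_input) for _ in range(n_inputs))
        occupied.add(cell)
    return len(occupied) / bins_per_input ** n_inputs


if __name__ == "__main__":
    n_obs = 200  # same number of synthetic "houses" in every scenario
    for n_inputs in (1, 2, 3):
        frac = coverage_fraction(n_obs, n_inputs)
        print(f"{n_inputs} input(s): {frac:.1%} of cells contain an observation")
```

With the number of observations held fixed, the covered fraction can never exceed n_obs / bins_per_input ** n_inputs, and that bound drops quickly as inputs are added. This is the sense in which observations that look dense for square feet alone become sparse once number of bathrooms is included.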