You now know about linear regression and gradient descent. The plan from here on is to tell you about a couple of important extensions of these ideas. Concretely, here they are. First, it turns out that in order to solve this minimization problem, there is an algorithm for solving for theta zero and theta one exactly, without needing an iterative algorithm like gradient descent that we had to run over multiple iterations. There are advantages and disadvantages to this algorithm that lets you solve for theta zero and theta one basically in one shot. One advantage is that there is no longer a learning rate alpha that you need to worry about and set, and so it can be much faster for some problems. We'll talk about its advantages and disadvantages later.

Second, we'll also talk about algorithms for learning with a larger number of features. So far we've been learning with just one feature, the size of the house, and using that to predict the price; we're trying to take x and use it to predict y. But for other learning problems we may have a larger number of features. For example, suppose you know not only the size, but also the number of bedrooms, the number of floors, and the age of these houses, and you want to use all of that to predict the price. In that case we'll call these features x1, x2, x3, and x4. So now we have four features, and we want to use these four features to predict y, the price of the house.

It turns out that with multiple features, four of them in this case, it becomes harder to plot or visualize the data. For example, if we try to plot this type of data set, maybe we would have the vertical axis be the price, with one horizontal axis being the size of the house and the other the number of bedrooms. But that is only plotting my first two features, size and number of bedrooms. With the additional features I just don't know how to plot all of this data, because I would need a 4-dimensional or 5-dimensional figure, and I don't really know how to plot anything more than a 3-dimensional figure like what I have over here. Also, as you can tell, the notation starts to get a little more complicated: rather than just having x as our feature, we now have x1 through x4, and we're using these subscripts to denote my four different features.

It turns out that the best notation to keep all of this straight, and to understand what's going on with the data even when we don't quite know how to plot it, is the notation of linear algebra. Linear algebra gives us a notation and a set of operations that we can do with matrices and vectors. For example, here's a matrix where the first column is the sizes of the four houses, the second column is the number of bedrooms, the third is the number of floors, and the fourth is the age of the homes. A matrix is a block of numbers that lets me take all of my data, all of my x's, all of my features, and organize them efficiently into one big block of numbers like that. And here is what we call a vector in linear algebra, where the four numbers are the prices of the four houses that we saw on the previous slide.
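To make this concrete, here is a minimal NumPy sketch (not from the lecture itself) of organizing four hypothetical houses' features into a matrix X and their prices into a vector y, and then solving for the parameters in one shot with a direct least-squares solve, the kind of non-iterative approach mentioned above. The feature and price values, and the use of np.linalg.lstsq, are illustrative assumptions rather than the lecture's own example.

```python
import numpy as np

# Hypothetical data for four houses (the actual slide values aren't shown here):
# columns are size, number of bedrooms, number of floors, age of home.
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852,  2, 1, 36],
], dtype=float)

# Vector of prices for the same four houses (illustrative numbers).
y = np.array([460.0, 232.0, 315.0, 178.0])

# Prepend a column of ones so theta_0 acts as the intercept term.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# One-shot solve for theta: no learning rate alpha, no iterations.
# np.linalg.lstsq solves the least-squares problem directly.
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta)  # [theta_0, theta_1, theta_2, theta_3, theta_4]
```

The point of the sketch is only the data layout: each row of X is one house, each column is one feature, and y lines up row for row with the prices, which is exactly the matrix-and-vector organization described above.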
So, in the next set of videos what I'm going to do is a quick review of linear algebra. If you haven't seen matrices and vectors before, so that everything on this slide is brand new to you, or if you've seen linear algebra before but it's been a while and you aren't completely familiar with it anymore, then please watch the next set of videos, and I'll quickly review the linear algebra you need in order to implement and use the more powerful versions of linear regression. It turns out linear algebra isn't just useful for linear regression models; these ideas of matrices and vectors will also be useful for helping us implement, and actually get computationally efficient implementations of, many later machine learning models. And as you can tell, these sorts of matrices and vectors give us an efficient way to organize large amounts of data when we work with larger training sets. So, in case you're not familiar with linear algebra, or in case linear algebra seems like a complicated, scary concept for those of you who've never seen it before, don't worry about it. It turns out that in order to implement machine learning algorithms we need only the very, very basics of linear algebra, and you'll be able to very quickly pick up everything you need to know in the next few videos. Concretely, to decide whether you should watch the next set of videos, here are the topics I'm going to cover: what matrices and vectors are; how to add, subtract, and multiply matrices and vectors; and the ideas of matrix inverses and transposes. And if you are still not sure whether you should watch the next set of videos, take a look at these two things: whether you know how to compute this quantity, a matrix transpose times another matrix, and whether you know how to compute the inverse of a matrix times a vector, minus a number times another vector. If these two things look completely familiar to you, then you can safely skip the optional set of videos on linear algebra. But if you're slightly uncertain what these blocks of numbers, these matrices of numbers, mean, then please take a look at the next set of videos, and they'll very quickly teach you what you need to know about linear algebra in order to program machine learning algorithms and deal with large amounts of data.
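If it helps to see those self-check quantities written out, here is a small NumPy sketch of the two expressions just mentioned, along with the basic add, subtract, and multiply operations the review covers. The matrices and numbers below are made-up examples, not values from the videos.

```python
import numpy as np

# Small illustrative matrices and vectors (arbitrary example values).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])

# First self-check quantity: a matrix transpose times another matrix.
print(A.T @ B)

# Second self-check quantity: the inverse of a matrix times a vector,
# minus a number times another vector.
print(np.linalg.inv(A) @ u - 2.0 * v)

# Basic operations the review videos cover: add, subtract, multiply.
print(A + B)
print(A - B)
print(A @ B)
```

If expressions like A.T @ B and np.linalg.inv(A) @ u read as routine to you, that roughly corresponds to being able to skip the optional review.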