So, to start with, let's look at the machine learning model: the multiple regression model. And let's just recall where we left off in the last module, where we were talking about simple linear regression, where our goal was just to fit a line to the data.

There we had a single input. In our example, we always talked about square feet, trying to model the relationship between the square feet of a house and the output, which was the value of the house.

But as the name implies, this simple linear regression model is really simple, and in a lot of cases we're going to be interested in more complex functions of our input. One example of this is something called polynomial regression, which we actually saw back in the first course of the specialization. In that case, what we did was take our simple linear regression model and fit it to the data. Of course, at that time we didn't have all the terminology that we learned in the last module, but now we know that this is a simple linear regression model.

We take this fit and show it to our friend and say, hey, look, this is so cool: I have this line that I fit to my data, and now I can predict the value of my house. And your friend is a little bit skeptical and says, dude, it's not a linear relationship between square feet and the value of a house. He's looking at the data and he just doesn't believe it.

Instead, he thinks it's a quadratic fit. So what your friend is saying is that he doesn't believe the model you used. He doesn't believe that it's just this linear relationship, of course plus error. He thinks there's a quadratic function, y_i = w0 + w1 x_i + w2 x_i^2 + epsilon_i, underlying the relationship between square feet and house value. And again, our regression model is going to assume that there's some noise around that.

But of course, you could consider even higher-order polynomials. For example, here I'm showing some pth-order polynomial that you might choose as your model of the relationship between square feet and the value of the house.
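To make the friend's objection concrete, here is a minimal sketch, not from the lecture, that compares a linear fit against a quadratic fit on synthetic house data. The square-foot values and the data-generating coefficients are made up purely for illustration:

```python
import numpy as np

# Synthetic house data (hypothetical numbers, for illustration only):
# value is roughly quadratic in square feet, plus noise.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=100)
value = 50_000 + 20 * sqft + 0.05 * sqft**2 + rng.normal(0, 30_000, size=100)

# Fit a line (simple linear regression) and a quadratic (polynomial regression).
linear_fit = np.polyfit(sqft, value, deg=1)      # [w1, w0]
quadratic_fit = np.polyfit(sqft, value, deg=2)   # [w2, w1, w0]

# Compare residual sums of squares: the quadratic should track this data better.
for name, coeffs in [("linear", linear_fit), ("quadratic", quadratic_fit)]:
    residuals = value - np.polyval(coeffs, sqft)
    print(f"{name}: RSS = {np.sum(residuals**2):.3e}")
```

On data generated with a quadratic trend like this, the quadratic model's residual sum of squares comes out noticeably lower, which is exactly the pattern the skeptical friend is pointing at.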
So here's our generic polynomial regression model, where we take our observation y_i and model it as a polynomial in terms of, for example, the square feet of our house, which is just some input x:

y_i = w0 + w1 x_i + w2 x_i^2 + ... + wp x_i^p + epsilon_i

And then we assume that there's some error, epsilon_i: that's the error associated with the ith observation. What we see is that in this model, in contrast to our simple linear regression model, we have all these powers of x now appearing.

And what we can do is treat these different powers of x as features. Okay, so now we're introducing this new word, features. Features are just some function of your input. In this case in particular, to be very explicit, the first feature of the model is just the number 1, which is called the constant feature. The second feature of our model is x, so that's just the linear term, just like we had in simple linear regression. Our third feature is x squared. And we keep going up to our (p+1)st feature, which is x to the power p.

Associated with each one of these features in our model is a parameter. So we have p+1 parameters: w0, which is just the intercept term, all the way up to wp, the coefficient associated with the pth power of our input.
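As a sketch of the feature construction just described (my own illustration, not code from the course), you can build the p+1 polynomial features explicitly, one column per feature, and then solve for the p+1 parameters w0 through wp with ordinary least squares. The data values here are hypothetical:

```python
import numpy as np

def polynomial_features(x, p):
    """Build the feature matrix [1, x, x**2, ..., x**p], one column per feature."""
    return np.column_stack([x**j for j in range(p + 1)])

# Hypothetical data: x is square feet, y is house value.
x = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([300_000.0, 380_000.0, 490_000.0, 630_000.0, 800_000.0])

p = 2                                   # quadratic model
H = polynomial_features(x, p)           # shape (5, p + 1)
w, *_ = np.linalg.lstsq(H, y, rcond=None)
print(w)                                # [w0, w1, w2]: intercept, linear, quadratic
```

Note how the constant feature (the column of 1s) is what gives the model its intercept w0, matching the feature list above.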