Okay, so here's a summary of the large set of topics that we've covered in this course.

We talked about a bunch of models, including different models of linear regression, from simple regression to multiple regression. We talked about ridge regression and lasso, and then nearest neighbor and kernel regression. We also talked about some very important optimization algorithms, like gradient descent and coordinate descent, and really just this notion of what optimization is and how you go about doing it.

Then we talked about concepts that generalize well beyond regression. These include things like loss functions, the very important concept of the bias-variance tradeoff, cross-validation, sparsity, overfitting, feature selection, and model selection. And these are ideas that we're going to see in most of the courses in this specialization.

So, we spent a lot of time teaching the methods of this module, and now I've spent a lot of time summarizing what we learned, but I want to take a minute to talk about what we didn't cover in this course. There are actually a few important topics that, unfortunately, we didn't have time to go through, and I want to highlight them here.

One is the fact that in this course, we focused on having just a univariate output, which, for example, was the value of a house, the sales price of a house. But of course you could have a multivariate output. In cases where the dimensions of that multivariate output are correlated, you need to do slightly more complicated things. But in contrast, if you assume that each of these outputs is independent of the others, then you can just apply the methods we described independently for each dimension, as in the sketch below.
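Here's a minimal sketch of that point; the synthetic data, dimensions, and use of NumPy are illustrative assumptions, not course code. With independent output dimensions, fitting each dimension's regression separately gives the same answer as a joint least-squares solve.

```python
# A minimal sketch (illustration, not course material): with a multivariate
# output whose dimensions are modeled as independent, fitting one regression
# per output dimension is all you need.
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 3, 2          # observations, input features, output dimensions
X = rng.normal(size=(N, D))
W_true = rng.normal(size=(D, K))
Y = X @ W_true + 0.1 * rng.normal(size=(N, K))   # noisy 2-dimensional output

# Fit each output dimension with its own least-squares regression...
W_per_dim = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(K)]
)
# ...which matches solving all dimensions in one call: least squares with
# independent outputs decouples column by column.
W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(W_per_dim, W_joint)
```

The design point is exactly the decoupling described above: nothing ties the columns together, so the per-dimension fits and the joint solve coincide.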
The other thing that we haven't covered yet is the idea of what's called maximum likelihood estimation. We're going to go through that in the classification course, but I want to mention it in the context of regression: if you've heard of maximum likelihood estimation, it results in exactly the same objective we had when minimizing our residual sum of squares, assuming that your model has what are called normal, or Gaussian, errors. That's the epsilon term we've talked about; remember, y = wx + ε. Well, if we assume that epsilon is normally distributed, or as people sometimes say, Gaussian distributed, then maximum likelihood estimation is exactly equivalent to what we've talked about in this course; there's a quick numerical check of this below. And like I said, we'll learn more about maximum likelihood estimation in the classification course.

But one really, really important thing that we didn't talk about in this course, which truthfully pains me as a statistician, is statistical inference. We focused only on what are called point estimates: we just returned a w-hat value, our estimated coefficients, but we didn't talk about any measure of uncertainty about those estimated coefficients or our predictions. There's noise inherent to the data, so we can think of attaching measures of uncertainty to our predictions or our estimated coefficients. This is referred to as inference, and it's a really important topic that we did not go through here.

Another cool set of methods are what are called generalized linear models, and we're actually going to see an example of a generalized linear model in the classification course, so you will get to see this, but I want to bring it up here. What generalized linear models allow you to do is form regressions when you have certain restrictions on your output: the output is always positive, or bounded, or positive and bounded, or it's a discrete value, like the yes-or-no response we're going to talk about in the classification course. In this course, we assumed our errors are Gaussian, just like in the maximum likelihood discussion: zero mean, so observations were equally likely to be above or below the true function, and unbounded in how far above or below they could fall. The regression models we've talked about so far are therefore inappropriate for forming predictions when the predicted values have these types of constraints or specific structures to them. Generalized linear models allow us to cope with certain types of these structures very efficiently; a small example follows as well.
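To make the maximum likelihood point above concrete, here's a small numerical check, again a sketch with made-up data rather than course material: maximizing the Gaussian log-likelihood in w is, up to constants, minimizing the residual sum of squares, so both routes should land on the same coefficients.

```python
# A numerical check (illustration, not course code) that maximum likelihood
# under Gaussian errors recovers the least-squares solution.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.3, size=N)   # y = Xw + eps, eps Gaussian

# Negative Gaussian log-likelihood in w (noise variance held fixed); up to
# additive and multiplicative constants this is the residual sum of squares.
def neg_log_lik(w):
    resid = y - X @ w
    return 0.5 * resid @ resid

w_mle = minimize(neg_log_lik, x0=np.zeros(D)).x
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_mle, w_ls, atol=1e-4))       # True: same estimate
```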
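And for the generalized linear models just mentioned, here's a hedged sketch of one instance: Poisson regression for a count-valued, nonnegative output, which the Gaussian-error models above handle poorly. The use of scikit-learn's PoissonRegressor and the synthetic data are my choices for illustration, not something prescribed in the course.

```python
# A sketch of a generalized linear model: Poisson regression for counts.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(2)
N, D = 500, 2
X = rng.normal(size=(N, D))
# Counts generated with a log link: rate = exp(Xw), y ~ Poisson(rate).
rate = np.exp(X @ np.array([0.8, -0.4]))
y = rng.poisson(rate)

model = PoissonRegressor(alpha=0.0).fit(X, y)    # alpha=0: no regularization
print(model.coef_)               # roughly recovers [0.8, -0.4]
print(model.predict(X[:5]))      # predictions are always nonnegative
```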
Another really powerful tool that we didn't describe in this course is something called the regression tree, and that's because we're going to cover it in the classification course. Actually, more generally, these methods are referred to as CART, which stands for Classification And Regression Trees, because what you do is form a tree, and that structure is the same whether we're looking at classification or regression. We're going to focus on describing these structures in the context of classification, because they're a lot simpler to understand in that context, but I want to emphasize that the same tools we're going to learn in the next course can be used in regression as well; there's a small sketch at the end of this summary.

And of course, there are lots and lots of other methods that we haven't described in this course. Regression has an extremely long history in statistics, so there are lots of things that are potentially of interest. But in this course, we really tried to focus in on the main concepts that are useful in modern machine learning applications of regression.
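Finally, the regression tree sketch promised above, again an illustration with synthetic data and scikit-learn rather than course code: the same CART machinery taught in the classification course, applied to a continuous target.

```python
# A minimal regression-tree sketch (illustration, not course material).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
x = rng.uniform(0, 6, size=(200, 1))
y = np.sin(x).ravel() + 0.1 * rng.normal(size=200)   # noisy nonlinear target

# A depth-limited tree fits a piecewise-constant function to the data.
tree = DecisionTreeRegressor(max_depth=3).fit(x, y)
print(tree.predict([[1.0], [4.0]]))
```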