You've heard about orthogonalization, how to set up your dev and test sets, human-level performance as a proxy for Bayes error, and how to estimate your avoidable bias and variance. Let's pull it all together into a set of guidelines for how to improve the performance of your learning algorithm.

I think getting a supervised learning algorithm to work well means fundamentally hoping, or assuming, that you can do two things. First, that you can fit the training set pretty well; you can think of this as roughly saying that you can achieve low avoidable bias. And second, that doing well on the training set generalizes pretty well to the dev set or the test set; this is sort of saying that the variance is not too bad. In the spirit of orthogonalization, there is one set of knobs to fix the avoidable bias issues, such as training a bigger network or training longer, and there is a separate set of things you can use to address variance problems, such as regularization or getting more training data.

So, to summarize the process seen in the last several videos: if you want to improve the performance of your machine learning system, I would recommend looking at the difference between your training error and your proxy for Bayes error. This gives you a sense of the avoidable bias, in other words, how much better you should be trying to do on your training set. Then look at the difference between your dev error and your training error as an estimate of how much of a variance problem you have, in other words, how much harder you should be working to make your performance generalize from the training set to the dev set, which the algorithm wasn't trained on explicitly.
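To make those two measurements concrete, here is a minimal Python sketch of the diagnostic just described; it is not part of the course materials, and the function name and the example error numbers are illustrative assumptions only.

```python
# Minimal sketch of the bias/variance diagnostic described above.
# Human-level error stands in as a proxy for Bayes error.

def diagnose(human_level_error, training_error, dev_error):
    """Estimate avoidable bias and variance from scalar error rates."""
    avoidable_bias = training_error - human_level_error  # gap you could still close on the training set
    variance = dev_error - training_error                # gap between training and dev performance
    return avoidable_bias, variance

# Illustrative numbers: 1% human-level error, 5% training error, 6% dev error.
bias, var = diagnose(0.01, 0.05, 0.06)
focus = "avoidable bias" if bias > var else "variance"
print(f"avoidable bias = {bias:.2%}, variance = {var:.2%} -> focus on {focus}")
```

In this illustrative case the avoidable bias (4%) is much larger than the variance (1%), so the bias-reduction knobs below would be the ones to turn first.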
To whatever extent you want to reduce avoidable bias, I would try to apply tactics like training a bigger model, so that you can just do better on your training set, or training longer, or using a better optimization algorithm, such as adding momentum or RMSprop, or using a better algorithm like Adam. Another thing you could try is to find a better neural network architecture or a better set of hyperparameters. This could include everything from changing the activation functions, or changing the number of layers or hidden units (although if you do that, it would be in the direction of increasing the model size), to trying other model architectures, such as recurrent neural networks and convolutional neural networks, which we'll see in later courses. Whether or not a new neural network architecture will fit your training set better is sometimes hard to tell in advance, but sometimes you can get much better results with a better architecture.

Next, to the extent that you find that variance is a problem, some of the many techniques you could try include the following. You can try to get more data, because getting more data to train on could help you generalize better to dev set data that you didn't see. You could try regularization, which includes things like L2 regularization, dropout, or data augmentation, which we talked about in the previous course. Or, once again, you can try a neural network architecture or hyperparameter search to see if that can help you find an architecture that is better suited for your problem; there is a short sketch of how a couple of these knobs fit together at the end of this section.

I think this notion of bias, or avoidable bias, and variance is one of those things that is easy to learn but tough to master. If you're able to systematically apply the concepts from this week's videos, you'll actually be much more efficient, much more systematic, and much more strategic than a lot of machine learning teams in terms of how to go about improving the performance of a machine learning system. This week's homework will allow you to practice and exercise your understanding of these concepts. Best of luck with the homework, and I look forward to seeing you in next week's videos.
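As a companion to the tactics above, here is a minimal sketch, assuming a TensorFlow/Keras setup (the video itself doesn't prescribe a framework), of how a couple of these knobs might be wired together: the Adam optimizer on the bias-reduction side, and L2 regularization plus dropout on the variance-reduction side. The layer sizes, regularization coefficient, and learning rate are illustrative assumptions, not recommended values.

```python
# Minimal sketch (assuming TensorFlow/Keras) of combining the knobs above:
# Adam to fit the training set better, L2 regularization and dropout to reduce variance.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),  # 20 input features, chosen only for illustration
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # L2 weight penalty
    tf.keras.layers.Dropout(0.5),                             # dropout regularization
    tf.keras.layers.Dense(1, activation="sigmoid"),           # binary classification head
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),   # Adam optimizer
    loss="binary_crossentropy",
    metrics=["accuracy"])
```

In the spirit of orthogonalization, you would reach for the optimizer and model-size knobs when the avoidable bias is large, and for the regularization and more-data knobs when the training-to-dev gap is large.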