1
00:00:00,000 --> 00:00:01,470
In the previous video,

2
00:00:01,470 --> 00:00:04,778
you saw how looking at training error and depth error can help you

3
00:00:04,778 --> 00:00:09,280
diagnose whether your algorithm has a bias or a variance problem, or maybe both.

4
00:00:09,280 --> 00:00:11,880
It turns out that this information that lets you much more

5
00:00:11,880 --> 00:00:15,030
systematically using what they call a basic

6
00:00:15,030 --> 00:00:18,165
recipe for machine learning and lets you much more systematically

7
00:00:18,165 --> 00:00:21,510
go about improving your algorithms' performance. Let's take a look.

8
00:00:21,510 --> 00:00:22,900
When training a neural network,

9
00:00:22,900 --> 00:00:24,975
here's a basic recipe I will use.

10
00:00:24,975 --> 00:00:26,920
After having trained an initial model,

11
00:00:26,920 --> 00:00:28,185
I will first ask,

12
00:00:28,185 --> 00:00:30,570
does your algorithm have high bias?

13
00:00:30,570 --> 00:00:33,709
And so to try and evaluate if there is high bias,

14
00:00:33,709 --> 00:00:35,820
you should look at, really,

15
00:00:35,820 --> 00:00:40,260
the training set or the training data performance.

16
00:00:40,260 --> 00:00:43,260
Right. And so, if it does have high bias,

17
00:00:43,260 --> 00:00:45,735
does not even fit in the training set that well,

18
00:00:45,735 --> 00:00:49,695
some things you could try would be to try pick a network,

19
00:00:49,695 --> 00:00:52,680
such as more hidden layers or more hidden units,

20
00:00:52,680 --> 00:00:54,825
or you could train it longer.

21
00:00:54,825 --> 00:00:58,953
Maybe run trains longer or try some more advanced optimization algorithms,

22
00:00:58,953 --> 00:01:00,795
which we'll talk about later in this course.

23
00:01:00,795 --> 00:01:03,030
Or you can also try,

24
00:01:03,030 --> 00:01:06,285
this is kind of a, maybe it work, maybe it won't.

25
00:01:06,285 --> 00:01:10,680
But we'll see later that there are a lot of different neural network architectures

26
00:01:10,680 --> 00:01:15,450
and maybe you can find a new network architecture that's better suited for this problem.

27
00:01:15,450 --> 00:01:17,760
Putting this in parentheses because one of those things that,

28
00:01:17,760 --> 00:01:19,380
you just have to try.

29
00:01:19,380 --> 00:01:20,925
Maybe you can make it work, maybe not.

30
00:01:20,925 --> 00:01:24,170
Whereas getting a bigger network almost always helps.

31
00:01:24,170 --> 00:01:26,761
And training longer doesn't always help,

32
00:01:26,761 --> 00:01:28,450
but it certainly never hurts.

33
00:01:28,450 --> 00:01:29,793
So when training a learning algorithm,

34
00:01:29,793 --> 00:01:34,100
I would try these things until I can at least get rid of the bias problems,

35
00:01:34,100 --> 00:01:39,945
as in go back after I've tried this and keep doing that until I can fit,

36
00:01:39,945 --> 00:01:42,460
at least, fit the training set pretty well.

37
00:01:42,460 --> 00:01:44,760
And usually if you have a big enough network,

38
00:01:44,760 --> 00:01:49,470
you should usually be able to fit the training data well so long

39
00:01:49,470 --> 00:01:54,150
as it's a problem that is possible for someone to do, alright?

40
00:01:54,150 --> 00:01:55,787
If the image is very blurry,

41
00:01:55,787 --> 00:01:57,300
it may be impossible to fit it.

42
00:01:57,300 --> 00:01:59,531
But if at least a human can do well on the task,

43
00:01:59,531 --> 00:02:01,540
if you think base error is not too high,

44
00:02:01,540 --> 00:02:04,244
then by training a big enough network you should be able to,

45
00:02:04,244 --> 00:02:07,275
hopefully, do well, at least on the training set.

46
00:02:07,275 --> 00:02:09,970
To at least fit or overfit the training set.

47
00:02:09,970 --> 00:02:14,743
Once you reduce bias to acceptable amounts then ask,

48
00:02:14,743 --> 00:02:17,040
do you have a variance problem?

49
00:02:17,040 --> 00:02:21,410
And so to evaluate that I would look at dev set performance.

50
00:02:21,410 --> 00:02:24,310
Are you able to generalize from a pretty good training

51
00:02:24,310 --> 00:02:28,595
set performance to having a pretty good dev set performance?

52
00:02:28,595 --> 00:02:30,915
And if you have high variance, well,

53
00:02:30,915 --> 00:02:34,015
best way to solve a high variance problem is to get more data.

54
00:02:34,015 --> 00:02:35,199
If you can get it this,

55
00:02:35,199 --> 00:02:36,875
you know, can only help.

56
00:02:36,875 --> 00:02:40,490
But sometimes you can't get more data.

57
00:02:40,490 --> 00:02:43,300
Or you could try regularization,

58
00:02:43,300 --> 00:02:45,078
which we'll talk about in the next video,

59
00:02:45,078 --> 00:02:46,630
to try to reduce overfitting.

60
00:02:46,630 --> 00:02:50,930
And then also, again, sometimes you just have to try it.

61
00:02:50,930 --> 00:02:54,310
But if you can find a more appropriate neural network architecture,

62
00:02:54,310 --> 00:02:57,335
sometimes that can reduce your variance problem as well,

63
00:02:57,335 --> 00:03:00,785
as well as reduce your bias problem. But how to do that?

64
00:03:00,785 --> 00:03:04,045
It's harder to be totally systematic how you do that.

65
00:03:04,045 --> 00:03:06,175
But so I try these things and I kind of keep going back,

66
00:03:06,175 --> 00:03:11,785
until hopefully you find something with both low bias and low variance,

67
00:03:11,785 --> 00:03:14,594
whereupon you would be done.

68
00:03:14,594 --> 00:03:16,390
So a couple of points to notice.

69
00:03:16,390 --> 00:03:19,736
First is that, depending on whether you have high bias or high variance,

70
00:03:19,736 --> 00:03:24,405
the set of things you should try could be quite different.

71
00:03:24,405 --> 00:03:26,860
So I'll usually use the training dev set to try to

72
00:03:26,860 --> 00:03:29,920
diagnose if you have a bias or variance problem,

73
00:03:29,920 --> 00:03:33,920
and then use that to select the appropriate subset of things to try.

74
00:03:33,920 --> 00:03:37,270
So for example, if you actually have a high bias problem,

75
00:03:37,270 --> 00:03:40,300
getting more training data is actually not going to help.

76
00:03:40,300 --> 00:03:44,140
Or at least it's not the most efficient thing to do.

77
00:03:44,140 --> 00:03:47,770
So being clear on how much of a bias problem or variance problem or

78
00:03:47,770 --> 00:03:52,563
both can help you focus on selecting the most useful things to try.

79
00:03:52,563 --> 00:03:56,725
Second, in the earlier era of machine learning,

80
00:03:56,725 --> 00:04:02,465
there used to be a lot of discussion on what is called the bias variance tradeoff.

81
00:04:02,465 --> 00:04:04,604
And the reason for that was that,

82
00:04:04,604 --> 00:04:06,385
for a lot of the things you could try,

83
00:04:06,385 --> 00:04:09,340
you could increase bias and reduce variance,

84
00:04:09,340 --> 00:04:11,920
or reduce bias and increase variance.

85
00:04:11,920 --> 00:04:15,400
But back in the pre-deep learning era,

86
00:04:15,400 --> 00:04:17,165
we didn't have many tools,

87
00:04:17,165 --> 00:04:19,755
we didn't have as many tools that just reduce

88
00:04:19,755 --> 00:04:24,380
bias or that just reduce variance without hurting the other one.

89
00:04:24,380 --> 00:04:28,750
But in the modern deep learning, big data era,

90
00:04:28,750 --> 00:04:31,705
so long as you can keep training a bigger network,

91
00:04:31,705 --> 00:04:34,200
and so long as you can keep getting more data,

92
00:04:34,200 --> 00:04:36,360
which isn't always the case for either of these,

93
00:04:36,360 --> 00:04:37,950
but if that's the case,

94
00:04:37,950 --> 00:04:40,590
then getting a bigger network almost always just

95
00:04:40,590 --> 00:04:43,625
reduces your bias without necessarily hurting your variance,

96
00:04:43,625 --> 00:04:46,157
so long as you regularize appropriately.

97
00:04:46,157 --> 00:04:48,810
And getting more data pretty much always

98
00:04:48,810 --> 00:04:52,370
reduces your variance and doesn't hurt your bias much.

99
00:04:52,370 --> 00:04:54,195
So what's really happened is that,

100
00:04:54,195 --> 00:04:55,845
with these two steps,

101
00:04:55,845 --> 00:04:57,405
the ability to train, pick a network,

102
00:04:57,405 --> 00:04:58,560
or get more data,

103
00:04:58,560 --> 00:05:03,375
we now have tools to drive down bias and just drive down bias,

104
00:05:03,375 --> 00:05:05,700
or drive down variance and just drive down variance,

105
00:05:05,700 --> 00:05:09,655
without really hurting the other thing that much.

106
00:05:09,655 --> 00:05:12,240
And I think this has been one of the big reasons

107
00:05:12,240 --> 00:05:16,348
that deep learning has been so useful for supervised learning,

108
00:05:16,348 --> 00:05:18,840
that there's much less of this tradeoff where you

109
00:05:18,840 --> 00:05:21,345
have to carefully balance bias and variance,

110
00:05:21,345 --> 00:05:25,053
but sometimes you just have more options for reducing bias

111
00:05:25,053 --> 00:05:30,315
or reducing variance without necessarily increasing the other one.

112
00:05:30,315 --> 00:05:33,698
And, in fact, [inaudible] you have a well regularized network.

113
00:05:33,698 --> 00:05:36,795
We'll talk about regularization starting from the next video.

114
00:05:36,795 --> 00:05:40,110
Training a bigger network almost never hurts.

115
00:05:40,110 --> 00:05:44,634
And the main cost of training a neural network that's too big is just computational time,

116
00:05:44,634 --> 00:05:46,490
so long as you're regularizing.

117
00:05:46,490 --> 00:05:49,440
So I hope this gives you a sense of the basic structure of how to

118
00:05:49,440 --> 00:05:53,255
organize your machine learning problem to diagnose bias and variance,

119
00:05:53,255 --> 00:05:57,325
and then try to select the right operation for you to make progress on your problem.

120
00:05:57,325 --> 00:06:01,367
One of the things I mentioned several times in the video is regularization,

121
00:06:01,367 --> 00:06:03,825
is a very useful technique for reducing variance.

122
00:06:03,825 --> 00:06:07,130
There is a little bit of a bias variance tradeoff when you use regularization.

123
00:06:07,130 --> 00:06:09,045
It might increase the bias a little bit,

124
00:06:09,045 --> 00:06:13,090
although often not too much if you have a huge enough network.

125
00:06:13,090 --> 00:06:16,735
But let's dive into more details in the next video so you can

126
00:06:16,735 --> 00:06:21,000
better understand how to apply regularization to your neural network.