We talked about how you want your learning algorithm to do well on the training set, but sometimes you don't actually want to do too well, and knowing what human-level performance is can tell you exactly how well, but not too well, you want your algorithm to do on the training set. Let me show you what I mean.

We have used cat classification a lot. Given a picture, let's say humans have near-perfect accuracy, so human-level error is one percent. In that case, if your learning algorithm achieves 8 percent training error and 10 percent dev error, then maybe you want to do better on the training set. The fact that there's a huge gap between how well your algorithm does on your training set versus how humans do shows that your algorithm isn't even fitting the training set well. So in terms of tools to reduce bias or variance, in this case I would say focus on reducing bias. You want to do things like train a bigger neural network or run training longer, and just try to do better on the training set.

But now let's look at the same training error and dev error, and imagine that human-level performance is not 1%. The same numbers are copied over, but in a different application, or maybe on a different data set, let's say that human-level error is actually 7.5%. Maybe the images in your data set are so blurry that even humans can't tell whether there's a cat in the picture. This example is maybe slightly contrived, because humans are actually very good at looking at pictures and telling whether there's a cat in them or not. But for the sake of this example, let's say your data set's images are so blurry or so low resolution that even humans get 7.5% error. In this case, even though your training error and dev error are the same as in the other example, you're actually doing just fine on the training set; it's only a little bit worse than human-level performance. In this second example, you would want to focus on reducing this component, reducing the variance in your learning algorithm. So you might try regularization to bring your dev error closer to your training error, for example.
So in the earlier course's discussion of bias and variance, we were mainly assuming tasks where Bayes error is nearly zero. To explain what just happened here, for our cat classification example, think of human-level error as a proxy, or an estimate, for Bayes error, or Bayes optimal error. For computer vision tasks, this is a pretty reasonable proxy, because humans are actually very good at computer vision, so whatever a human can do is maybe not too far from Bayes error. By definition, human-level error is worse than Bayes error, because nothing can be better than Bayes error, but human-level error might not be too far from it.

So the surprising thing we saw here is that depending on what human-level error is, or really what we assume Bayes error to be approximately, or what we think is achievable, with the same training error and dev error in these two cases we decided to focus on bias reduction tactics or on variance reduction tactics. What happened is that in the example on the left, 8% training error is really high when you think you could get it down to 1%, and so bias reduction tactics could help you do that. Whereas in the example on the right, if you think that Bayes error is 7.5%, and here we're using human-level error as an estimate, or a proxy, for Bayes error, then there's not much headroom for reducing your training error further. You don't really want it to be much better than 7.5%, because you could achieve that only by starting to overfit the training set. Instead, there's much more room for improvement in taking this 2% gap and trying to reduce it with variance reduction techniques, such as regularization or maybe getting more training data.

So to give these things a couple of names, and this is not widely used terminology, but I found it a useful way of thinking about it: I'm going to call the difference between Bayes error, or our approximation of Bayes error, and the training error the avoidable bias.
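To make that bookkeeping concrete, here is a minimal sketch in Python, assuming human-level error is used directly as the stand-in for Bayes error; the function name and the simple comparison rule for picking a focus are illustrative choices, not something prescribed in the lecture.

```python
def diagnose(human_error, train_error, dev_error):
    """Estimate avoidable bias and variance, treating human-level error
    as a proxy for Bayes error (all values in percentage points)."""
    avoidable_bias = train_error - human_error  # gap you could still close on the training set
    variance = dev_error - train_error          # gap between training and dev performance
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus
```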
So what you want is to keep improving your training performance until you get down to Bayes error, but you don't actually want to do better than Bayes error; you can't actually do better than Bayes error unless you're overfitting. And the difference between your training error and the dev error is still a measure of the variance problem of your algorithm. The term avoidable bias acknowledges that there's some bias, or some minimum level of error, that you just cannot get below: if Bayes error is 7.5%, you don't actually want to get below that level of error. So rather than saying that if your training error is 8%, then that 8% is a measure of bias in this example, you're saying that the avoidable bias is maybe 0.5%; 0.5% is a measure of the avoidable bias, whereas 2% is a measure of the variance, and so there's much more room in reducing this 2% than in reducing this 0.5%. Whereas in contrast, in the example on the left, this 7% is a measure of the avoidable bias, whereas 2% is a measure of how much variance you have. And so in the example on the left, there's much more potential in focusing on reducing that avoidable bias.

So in this example, understanding human-level error, understanding your estimate of Bayes error, really causes you in different scenarios to focus on different tactics, whether bias reduction tactics or variance reduction tactics. There's quite a lot more nuance in how you factor human-level performance into how you make decisions about what to focus on, so in the next video, let's go deeper into understanding what human-level performance really means.
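As a quick recap, running the earlier sketch on the two scenarios from this video reproduces the same conclusions (the numbers are the lecture's; the code is just illustrative):

```python
# Left example: humans get 1% error, the network gets 8% training / 10% dev error.
print(diagnose(1.0, 8.0, 10.0))   # (7.0, 2.0, 'bias') -> fit the training set better

# Right example: blurry images, even humans get 7.5% error.
print(diagnose(7.5, 8.0, 10.0))   # (0.5, 2.0, 'variance') -> regularize or get more data
```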