In the last few years, a lot more machine learning teams have been talking about comparing their machine learning systems to human-level performance. Why is this? I think there are two main reasons. First, because of advances in deep learning, machine learning algorithms are suddenly working much better, so it has become much more feasible in a lot of application areas for machine learning algorithms to actually become competitive with human-level performance. Second, it turns out that the workflow of designing and building a machine learning system is much more efficient when you're trying to do something that humans can also do. So in those settings, it becomes natural to talk about comparing with, or trying to mimic, human-level performance.

Let's see a couple of examples of what this means. I've seen on a lot of machine learning tasks that as you work on a problem over time (so the x-axis is time, which could be many months or even many years over which some team or some research community works on a problem), progress tends to be relatively rapid as you approach human-level performance. But after a while, the algorithm surpasses human-level performance, and then progress in accuracy actually slows down. It may keep getting better after surpassing human-level performance, but the slope of how rapidly accuracy goes up often flattens. And the hope is that it approaches some theoretical optimum level of performance.

Over time, as you keep training the algorithm, maybe with bigger and bigger models on more and more data, performance approaches but never surpasses some theoretical limit, which is called the Bayes optimal error. Think of Bayes optimal error as the best possible error: there is simply no way for any function mapping from x to y to surpass that level of accuracy. So, for example, for speech recognition, if x is audio clips, some audio is just so noisy that it is impossible to tell what the correct transcription is, so perfect accuracy may not be 100%. Or for cat recognition, maybe some images are so blurry that it is just impossible for anyone or anything to tell whether or not there's a cat in that picture. So the perfect level of accuracy may not be 100%.
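One standard way to make this precise (this formalization is my addition, consistent with the lecture's description): for a classification task, the Bayes error is the smallest error achievable by any function mapping from x to y,

\[
\varepsilon_{\text{Bayes}} \;=\; \min_{f:\,X \to Y} P\big(f(x) \neq y\big) \;=\; \mathbb{E}_{x}\Big[\, 1 - \max_{y} P(y \mid x) \,\Big].
\]

It is above 0% exactly when the label y is not fully determined by the input x, as with very noisy audio clips or very blurry cat images.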
Bayes optimal error, or Bayesian optimal error, or sometimes Bayes error for short, is the error achieved by the very best theoretical function for mapping from x to y, and it can never be surpassed. So it should be no surprise that this purple line never surpasses the Bayes error, the Bayes optimal error, no matter how many years you work on a problem.

It turns out that progress is often quite fast until you surpass human-level performance, and it sometimes slows down after you surpass human-level performance. I think there are two reasons why progress often slows down once you surpass human-level performance. One reason is that human-level performance is, for many tasks, not that far from Bayes optimal error. People are very good at looking at images and telling whether there's a cat, or listening to audio and transcribing it. So by the time you surpass human-level performance, maybe there's not that much headroom left to improve.

The second reason is that so long as your performance is worse than human-level performance, there are certain tools you can use to improve performance that are harder to use once you've surpassed human-level performance. Here's what I mean. For tasks that humans are quite good at, and this includes looking at pictures and recognizing things, listening to audio, or reading language (really, natural-data tasks that humans tend to be very good at), so long as your machine learning algorithm is still worse than a human, you can get labeled data from humans. That is, you can hire people to label examples for you, so that you have more data to feed your learning algorithm.

Something we'll talk about next week is manual error analysis. So long as humans are still performing better than your algorithm, you can ask people to look at examples your algorithm is getting wrong and try to gain insight into why a person got it right but the algorithm got it wrong. We'll see next week that this helps improve your algorithm's performance. And you can also get a better analysis of bias and variance, which we'll talk about in a little bit. So long as your algorithm is still doing worse than humans, you have these important tactics for improving it; a small sketch of the error-analysis tactic follows.
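Here is a minimal sketch of that error-analysis workflow in Python; the names (predict, dev_examples, dev_labels) are hypothetical placeholders, not anything defined in the lecture.

import random

def sample_errors_for_human_review(dev_examples, dev_labels, predict, n=100):
    """Gather a random sample of dev-set examples the model gets wrong,
    to hand to human reviewers for categorizing failure modes."""
    mistakes = []
    for x, y in zip(dev_examples, dev_labels):
        y_hat = predict(x)  # hypothetical model prediction function
        if y_hat != y:
            mistakes.append((x, y, y_hat))
    random.shuffle(mistakes)
    return mistakes[:n]  # give these to human reviewers

Counting how many sampled mistakes fall into each failure category (blurry images, mislabeled examples, and so on) then tells you where improvement effort is best spent.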
Whereas once your algorithm is doing better than humans, these three tactics are harder to apply. So this is maybe another reason why comparing to human-level performance is helpful, especially on tasks that humans do well, and why machine learning algorithms tend to be really good at replicating tasks that people can do, catching up to and maybe slightly surpassing human-level performance. In particular, even if you know what bias and variance are, it turns out that knowing how well humans can do on a task helps you understand how much you should try to reduce bias and how much you should try to reduce variance. I want to show you an example of this in the next video.
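As a preview of that idea, here is a minimal sketch with made-up error rates: it uses human-level error as a rough proxy for Bayes error to decide whether bias or variance is the bigger problem.

# Hypothetical error rates; human-level error stands in for Bayes error.
human_error = 0.01
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - human_error  # gap the model could still close on the training set
variance = dev_error - train_error          # gap between training and dev performance

if avoidable_bias > variance:
    print("Focus on reducing bias, e.g. a bigger model or longer training.")
else:
    print("Focus on reducing variance, e.g. more data or regularization.")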