In the last few years, a lot more machine learning teams have been talking about comparing their machine learning systems to human-level performance. Why is this? I think there are two main reasons. First, because of advances in deep learning, machine learning algorithms are suddenly working much better, so it has become much more feasible in a lot of application areas for machine learning algorithms to actually become competitive with human-level performance. Second, it turns out that the workflow of designing and building a machine learning system is much more efficient when you're trying to do something that humans can also do. So in those settings, it becomes natural to talk about comparing with, or trying to mimic, human-level performance.

Let's see a couple of examples of what this means. I've seen on a lot of machine learning tasks that as you work on a problem over time (so the x-axis is time, which could be many months or even many years over which some team or some research community works on a problem), progress tends to be relatively rapid as you approach human-level performance. But after a while, the algorithm surpasses human-level performance, and then progress in accuracy actually slows down. It may keep getting better after surpassing human-level performance, but the slope of how rapidly accuracy goes up often flattens. And the hope is that it approaches some theoretical optimum level of performance.

Over time, as you keep training the algorithm, maybe with bigger and bigger models on more and more data, performance approaches but never surpasses some theoretical limit, which is called the Bayes optimal error. Think of Bayes optimal error as the best possible error: there is simply no way for any function mapping from x to y to surpass that level of accuracy. So, for example, for speech recognition, if x is audio clips, some audio is just so noisy that it is impossible to tell what the correct transcription is, so perfect accuracy may not be 100%. Or for cat recognition, maybe some images are so blurry that it is just impossible for anyone or anything to tell whether or not there's a cat in that picture. So the perfect level of accuracy may not be 100%.
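One standard way to make this precise (this formalization is my addition, consistent with the lecture's description): for a classification task, the Bayes error is the smallest error achievable by any function mapping from x to y,

\[
\varepsilon_{\text{Bayes}} \;=\; \min_{f:\,X \to Y} P\big(f(x) \neq y\big) \;=\; \mathbb{E}_{x}\Big[\, 1 - \max_{y} P(y \mid x) \,\Big].
\]

It is above 0% exactly when the label y is not fully determined by the input x, as with very noisy audio clips or very blurry cat images.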
Bayes optimal error, or Bayesian optimal error, or sometimes Bayes error for short, is the error achieved by the very best theoretical function for mapping from x to y, and it can never be surpassed. So it should be no surprise that this purple line never surpasses the Bayes error, the Bayes optimal error, no matter how many years you work on a problem.

It turns out that progress is often quite fast until you surpass human-level performance, and it sometimes slows down after you surpass human-level performance. I think there are two reasons why progress often slows down once you surpass human-level performance. One reason is that human-level performance is, for many tasks, not that far from Bayes optimal error. People are very good at looking at images and telling whether there's a cat, or listening to audio and transcribing it. So by the time you surpass human-level performance, maybe there's not that much headroom left to improve.

The second reason is that so long as your performance is worse than human-level performance, there are certain tools you can use to improve performance that are harder to use once you've surpassed human-level performance. Here's what I mean. For tasks that humans are quite good at, and this includes looking at pictures and recognizing things, listening to audio, or reading language (really, natural-data tasks that humans tend to be very good at), so long as your machine learning algorithm is still worse than a human, you can get labeled data from humans. That is, you can hire people to label examples for you, so that you have more data to feed your learning algorithm.

Something we'll talk about next week is manual error analysis. So long as humans are still performing better than your algorithm, you can ask people to look at examples your algorithm is getting wrong and try to gain insight into why a person got it right but the algorithm got it wrong. We'll see next week that this helps improve your algorithm's performance. And you can also get a better analysis of bias and variance, which we'll talk about in a little bit. So long as your algorithm is still doing worse than humans, you have these important tactics for improving it; a small sketch of the error-analysis tactic follows.
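Here is a minimal sketch of that error-analysis workflow in Python; the names (predict, dev_examples, dev_labels) are hypothetical placeholders, not anything defined in the lecture.

import random

def sample_errors_for_human_review(dev_examples, dev_labels, predict, n=100):
    """Gather a random sample of dev-set examples the model gets wrong,
    to hand to human reviewers for categorizing failure modes."""
    mistakes = []
    for x, y in zip(dev_examples, dev_labels):
        y_hat = predict(x)  # hypothetical model prediction function
        if y_hat != y:
            mistakes.append((x, y, y_hat))
    random.shuffle(mistakes)
    return mistakes[:n]  # give these to human reviewers

Counting how many sampled mistakes fall into each failure category (blurry images, mislabeled examples, and so on) then tells you where improvement effort is best spent.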
Whereas once your algorithm is doing better than humans, these three tactics are harder to apply. So this is maybe another reason why comparing to human-level performance is helpful, especially on tasks that humans do well, and why machine learning algorithms tend to be really good at replicating tasks that people can do, catching up to and maybe slightly surpassing human-level performance. In particular, even if you know what bias and variance are, it turns out that knowing how well humans can do on a task helps you understand how much you should try to reduce bias and how much you should try to reduce variance. I want to show you an example of this in the next video.
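As a preview of that idea, here is a minimal sketch with made-up error rates: it uses human-level error as a rough proxy for Bayes error to decide whether bias or variance is the bigger problem.

# Hypothetical error rates; human-level error stands in for Bayes error.
human_error = 0.01
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - human_error  # gap the model could still close on the training set
variance = dev_error - train_error          # gap between training and dev performance

if avoidable_bias > variance:
    print("Focus on reducing bias, e.g. a bigger model or longer training.")
else:
    print("Focus on reducing variance, e.g. more data or regularization.")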