1
00:00:04,180 --> 00:00:08,250
Deep learning is exciting because it learns these complex features of images.

2
00:00:08,250 --> 00:00:09,940
And as we discussed earlier,

3
00:00:09,940 --> 00:00:13,660
it has had tremendous impact over recent years in a variety of

4
00:00:13,660 --> 00:00:15,280
computer vision applications.

5
00:00:15,280 --> 00:00:17,720
Let me show you a couple of early examples.

6
00:00:17,720 --> 00:00:23,466
So, on the top of the slide here, what you see is an example of identifying

7
00:00:23,466 --> 00:00:27,230
traffic signs using neural networks.

8
00:00:27,230 --> 00:00:30,540
So this is a dataset of German traffic signs, and

9
00:00:30,540 --> 00:00:34,370
the idea is, for every image, to identify what sign it is.

10
00:00:34,370 --> 00:00:38,970
And they were able to get 99.5% accuracy using a deep neural network,

11
00:00:38,970 --> 00:00:40,690
which is pretty cool.

12
00:00:40,690 --> 00:00:46,500
On the bottom there, you see an example that came out of some work from Google on

13
00:00:46,500 --> 00:00:50,210
identifying house numbers based on what's called Street View data.

14
00:00:50,210 --> 00:00:53,970
This is the data that Google gathers by driving cars around and

15
00:00:53,970 --> 00:00:57,270
photographing all sorts of streets around the world.

16
00:00:57,270 --> 00:00:59,760
And you see the images are pretty complex, and

17
00:00:59,760 --> 00:01:04,940
still they're able to get 97.8% accuracy at the per-character level.

18
00:01:06,520 --> 00:01:08,100
Now, these were exciting results.

19
00:01:08,100 --> 00:01:09,831
But the one that changed everything,

20
00:01:09,831 --> 00:01:16,106
the one that really excited the field, happened in 2012.

21
00:01:16,106 --> 00:01:21,060
So for many years, there was an image competition called ImageNet.

22
00:01:21,060 --> 00:01:25,857
And in 2012, the ImageNet competition included 1.2 million

23
00:01:25,857 --> 00:01:30,340
training images from about 1,000 different categories.
24
00:01:30,340 --> 00:01:34,390
And the idea was: can you classify this image?

25
00:01:34,390 --> 00:01:38,560
Not just, is it a dog, but is it a golden retriever or a Labrador?

26
00:01:38,560 --> 00:01:40,660
Very, very fine-level detail.

27
00:01:42,770 --> 00:01:45,170
Now, there were many teams competing.

28
00:01:45,170 --> 00:01:47,430
These are the top three teams.

29
00:01:47,430 --> 00:01:53,580
A team called OXFORD_VGG, which got pretty decent accuracy.

30
00:01:53,580 --> 00:01:57,164
So if you look at their top five guesses, can you get the right thing

31
00:01:57,164 --> 00:01:58,460
within those five guesses?

32
00:01:58,460 --> 00:02:02,791
They were getting about 25% error.

33
00:02:02,791 --> 00:02:05,230
There was a team called ISI that did a little bit better.

34
00:02:05,230 --> 00:02:08,240
And those were using traditional techniques like SIFT,

35
00:02:08,240 --> 00:02:10,760
a little bit more elaborate, but along those lines.

36
00:02:10,760 --> 00:02:14,157
Now, that year there was a team called SuperVision.

37
00:02:14,157 --> 00:02:19,752
That team used a deep neural network and had a huge gain over the competitors, and

38
00:02:19,752 --> 00:02:25,432
that performance really sparked a lot of excitement about using deep neural networks

39
00:02:25,432 --> 00:02:30,451
in computer vision, because instead of having to use hand-coded features,

40
00:02:30,451 --> 00:02:32,950
you could learn them automatically.
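[Editor's aside: the "top five guesses" criterion the lecture describes is the standard ImageNet top-5 error: a prediction counts as correct if the true label appears anywhere among the model's five highest-scoring classes. A minimal sketch of how that metric is computed; the function name and the toy score arrays below are illustrative, not from the lecture.]

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k
    highest-scoring classes (so top-5 error uses k=5)."""
    # Column indices of the k largest scores in each row.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    # A row is a "hit" if the true label appears among those k indices.
    hits = np.any(top_k == labels[:, None], axis=1)
    return 1.0 - hits.mean()

# Toy example: 2 images, 7 classes, true label is class 6 for both.
scores = np.array([[9, 1, 8, 7, 6, 5, 0],   # class 6 ranks last: a top-5 miss
                   [0, 1, 2, 3, 4, 5, 9]])  # class 6 ranks first: a hit
labels = np.array([6, 6])
print(top_k_error(scores, labels))  # 0.5: one of the two labels missed the top five
```

So OXFORD_VGG's roughly 25% figure means: for about a quarter of test images, the correct label was not even in the model's five best guesses.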
41
00:02:34,826 --> 00:02:39,807
Now, the neural network that won the competition for the SuperVision team was

42
00:02:39,807 --> 00:02:45,088
called AlexNet, and I'm showing here an image from their paper.

43
00:02:45,088 --> 00:02:49,918
That neural network involved 8 layers and 60 million parameters, and

44
00:02:49,918 --> 00:02:53,917
was only possible because of new training algorithms that could

45
00:02:53,917 --> 00:02:57,087
deal with lots of images and lots of parameters, and

46
00:02:57,087 --> 00:03:01,418
a GPU implementation that could really scale to large datasets.

47
00:03:01,418 --> 00:03:05,229
[MUSIC]