1 00:00:00,590 --> 00:00:05,387 One of the challenges of face recognition is that you need to solve the one-shot 2 00:00:05,387 --> 00:00:07,010 learning problem. 3 00:00:07,010 --> 00:00:10,380 What that means is that for most face recognition applications 4 00:00:10,380 --> 00:00:14,695 you need to be able to recognize a person given just one single image, or 5 00:00:14,695 --> 00:00:17,640 given just one example of that person's face. 6 00:00:17,640 --> 00:00:18,922 And, historically, 7 00:00:18,922 --> 00:00:23,580 deep learning algorithms don't work well if you have only one training example. 8 00:00:23,580 --> 00:00:26,163 Let's see an example of what this means and 9 00:00:26,163 --> 00:00:29,270 talk about how to address this problem. 10 00:00:29,270 --> 00:00:33,270 Let's say you have a database of four pictures of employees in you're 11 00:00:33,270 --> 00:00:34,310 organization. 12 00:00:34,310 --> 00:00:38,680 These are actually some of my colleagues at Deeplearning.AI, Khan, Danielle, 13 00:00:38,680 --> 00:00:40,350 Younes and Thian. 14 00:00:40,350 --> 00:00:43,340 Now let's say someone shows up at the office and 15 00:00:43,340 --> 00:00:46,880 they want to be let through the turnstile. 16 00:00:46,880 --> 00:00:52,710 What the system has to do is, despite ever having seen only one image of Danielle, 17 00:00:52,710 --> 00:00:56,170 to recognize that this is actually the same person. 18 00:00:56,170 --> 00:00:59,640 And, in contrast, if it sees someone that's not in this database, 19 00:00:59,640 --> 00:01:04,810 then it should recognize that this is not any of the four persons in the database. 20 00:01:04,810 --> 00:01:06,560 So in the one shot learning problem, 21 00:01:06,560 --> 00:01:11,860 you have to learn from just one example to recognize the person again. 22 00:01:11,860 --> 00:01:12,780 And you need this for 23 00:01:12,780 --> 00:01:17,520 most face recognition systems because you might have only one picture 24 00:01:17,520 --> 00:01:22,450 of each of your employees or of your team members in your employee database. 25 00:01:22,450 --> 00:01:27,990 So one approach you could try is to input the image of the person, 26 00:01:27,990 --> 00:01:30,000 feed it too a ConvNet. 27 00:01:30,000 --> 00:01:36,560 And have it output a label, y, using a softmax unit with four outputs or maybe 28 00:01:36,560 --> 00:01:41,600 five outputs corresponding to each of these four persons or none of the above. 29 00:01:41,600 --> 00:01:44,530 So that would be 5 outputs in the softmax. 30 00:01:44,530 --> 00:01:46,140 But this really doesn't work well. 31 00:01:46,140 --> 00:01:50,410 Because if you have such a small training set it is really not enough 32 00:01:50,410 --> 00:01:54,400 to train a robust neural network for this task. 33 00:01:54,400 --> 00:01:57,660 And also what if a new person joins your team? 34 00:01:57,660 --> 00:02:01,040 So now you have 5 persons you need to recognize, so 35 00:02:01,040 --> 00:02:03,520 there should now be six outputs. 36 00:02:03,520 --> 00:02:06,070 Do you have to retrain the ConvNet every time? 37 00:02:06,070 --> 00:02:08,110 That just doesn't seem like a good approach. 38 00:02:08,110 --> 00:02:12,480 So to carry out face recognition, to carry out one-shot learning. 39 00:02:12,480 --> 00:02:14,390 So instead, to make this work, 40 00:02:14,390 --> 00:02:18,350 what you're going to do instead is learn a similarity function. 41 00:02:18,350 --> 00:02:22,190 In particular, you want a neural network to learn a function 42 00:02:22,190 --> 00:02:26,930 which going to denote d, which inputs two images and 43 00:02:26,930 --> 00:02:30,780 outputs the degree of difference between the two images. 44 00:02:30,780 --> 00:02:34,780 So if the two images are of the same person, 45 00:02:34,780 --> 00:02:37,660 you want this to output a small number. 46 00:02:37,660 --> 00:02:42,550 And if the two images are of two very different people you want it to output 47 00:02:42,550 --> 00:02:43,610 a large number. 48 00:02:43,610 --> 00:02:48,710 So during recognition time, if the degree of difference between them is less than 49 00:02:48,710 --> 00:02:54,560 some threshold called tau, which is a hyperparameter. 50 00:02:54,560 --> 00:02:59,290 Then you would predict that these two pictures are the same person. 51 00:02:59,290 --> 00:03:05,000 And if it is greater than tau, you would predict that these are different persons. 52 00:03:06,580 --> 00:03:12,510 And so this is how you address the face verification problem. 53 00:03:12,510 --> 00:03:17,370 To use this for a recognition task, what you do is, 54 00:03:17,370 --> 00:03:23,200 given this new picture, you will use this function d to compare these two images. 55 00:03:23,200 --> 00:03:28,450 And maybe I'll output a very large number, let's say 10, for this example. 56 00:03:28,450 --> 00:03:32,500 And then you compare this with the second image in your database. 57 00:03:32,500 --> 00:03:37,580 And because these two are the same person, hopefully you output a very small number. 58 00:03:37,580 --> 00:03:42,110 You do this for the other images in your database and so on. 59 00:03:43,140 --> 00:03:48,630 And based on this, you would figure out that this is actually that person, 60 00:03:48,630 --> 00:03:50,130 which is Danielle. 61 00:03:50,130 --> 00:03:53,634 And in contrast, if someone not in your database shows up, 62 00:03:53,634 --> 00:03:57,629 as you use the function d to make all of these pairwise comparisons, 63 00:03:57,629 --> 00:04:03,050 hopefully d will output have a very large number for all four pairwise comparisons. 64 00:04:03,050 --> 00:04:07,610 And then you say that this is not any one of the four persons in the database. 65 00:04:07,610 --> 00:04:11,860 Notice how this allows you to solve the one-shot learning problem. 66 00:04:11,860 --> 00:04:16,970 So long as you can learn this function d, which inputs a pair of images and 67 00:04:16,970 --> 00:04:20,600 tells you, basically, if they're the same person or different persons. 68 00:04:20,600 --> 00:04:23,790 Then if you have someone new join your team, 69 00:04:23,790 --> 00:04:28,850 you can add a fifth person to your database, and it just works fine. 70 00:04:30,320 --> 00:04:34,452 So you've seen how learning this function d, which inputs two images, 71 00:04:34,452 --> 00:04:38,110 allows you to address the one-shot learning problem. 72 00:04:38,110 --> 00:04:41,720 In the next video, let's take a look at how you can actually 73 00:04:41,720 --> 00:04:44,550 train the neural network to learn dysfunction d.