One of the challenges of face recognition is that you need to solve the one-shot learning problem. What that means is that for most face recognition applications you need to be able to recognize a person given just one single image, or given just one example of that person's face. And, historically, deep learning algorithms don't work well if you have only one training example. Let's see an example of what this means and talk about how to address this problem.

Let's say you have a database of four pictures of employees in your organization. These are actually some of my colleagues at Deeplearning.AI: Khan, Danielle, Younes and Thian. Now let's say someone shows up at the office and they want to be let through the turnstile. What the system has to do is, despite having seen only one image of Danielle, recognize that this is actually the same person. And, in contrast, if it sees someone that's not in this database, then it should recognize that this is not any of the four persons in the database. So in the one-shot learning problem, you have to learn from just one example to recognize the person again. And you need this for most face recognition systems, because you might have only one picture of each of your employees or of your team members in your employee database.

So one approach you could try is to input the image of the person, feed it to a ConvNet, and have it output a label, y, using a softmax unit with four outputs, or maybe five outputs corresponding to each of these four persons or none of the above. So that would be 5 outputs in the softmax. But this really doesn't work well, because if you have such a small training set, it is really not enough to train a robust neural network for this task. And also, what if a new person joins your team? Now you have 5 persons you need to recognize, so there should now be six outputs. Do you have to retrain the ConvNet every time? That just doesn't seem like a good approach.

So to carry out face recognition, to carry out one-shot learning, what you're going to do instead is learn a similarity function. In particular, you want a neural network to learn a function, which I'm going to denote d, which inputs two images and outputs the degree of difference between the two images. So if the two images are of the same person, you want this to output a small number. And if the two images are of two very different people, you want it to output a large number. So at recognition time, if the degree of difference between them is less than some threshold called tau, which is a hyperparameter, then you would predict that these two pictures are of the same person. And if it is greater than tau, you would predict that these are different persons. And so this is how you address the face verification problem.

To use this for a recognition task, what you do is, given this new picture, you use the function d to compare these two images, and maybe it will output a very large number, let's say 10, for this example. And then you compare this with the second image in your database. And because these two are the same person, hopefully it will output a very small number. You do this for the other images in your database and so on. And based on this, you would figure out that this is actually that person, which is Danielle. And in contrast, if someone not in your database shows up, as you use the function d to make all of these pairwise comparisons, hopefully d will output a very large number for all four pairwise comparisons.
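To make the verification and recognition logic concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the actual implementation from this course: the embed function stands in for a trained ConvNet (here it is just a fixed random projection so the code runs on its own), and the 64x64 image size, the 128-dimensional embedding, the Euclidean distance, the helper names verify and recognize, and the threshold value TAU are all made up for the sketch.

```python
import numpy as np

# Stand-in for a trained ConvNet embedding: a fixed random projection from a
# flattened 64x64 image to a 128-d vector. In a real system this would be a
# network trained so that embeddings of the same person land close together.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64 * 64))

def embed(image):
    """Map a (64, 64) face image to a 128-d embedding vector."""
    return W @ image.reshape(-1)

def d(img1, img2):
    """Degree of difference between two face images:
    small for the same person, large for different people."""
    return float(np.linalg.norm(embed(img1) - embed(img2)))

TAU = 10.0  # threshold hyperparameter, tuned on held-out pairs

def verify(img1, img2, tau=TAU):
    """Face verification: predict 'same person' if the difference is below tau."""
    return d(img1, img2) < tau

def recognize(new_image, database, tau=TAU):
    """Face recognition: compare the new image pairwise against every stored
    image and return the closest match, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, stored_image in database.items():
        dist = d(new_image, stored_image)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < tau else None
```

The point of this structure is that the people themselves live only in the database; the network's only job is to learn d.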
And in that case, you would say that this is not any one of the four persons in the database. Notice how this allows you to solve the one-shot learning problem. So long as you can learn this function d, which inputs a pair of images and tells you, basically, whether they're the same person or different persons, then if someone new joins your team, you can add a fifth person to your database and it just works fine, as the short continuation of the sketch below shows. So you've seen how learning this function d, which inputs two images, allows you to address the one-shot learning problem. In the next video, let's take a look at how you can actually train the neural network to learn this function d.
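Continuing the same hypothetical sketch (and reusing its rng and recognize names), enrolling a new team member is just one more entry in the database; the function d itself never needs retraining. The names and the random "images" below are placeholders, not real data.

```python
# The database maps each name to the single stored face image we have for
# that person; random arrays stand in for images so the example runs.
database = {
    "Khan": rng.standard_normal((64, 64)),
    "Danielle": rng.standard_normal((64, 64)),
    "Younes": rng.standard_normal((64, 64)),
    "Thian": rng.standard_normal((64, 64)),
}

# A fifth person joins the team: one dictionary insert, no retraining of d.
database["new_hire"] = rng.standard_normal((64, 64))

# Recognition still works by pairwise comparison against every entry.
print(recognize(database["Danielle"], database))  # prints 'Danielle'
```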