1
00:00:00,590 --> 00:00:05,387
One of the challenges of face recognition
is that you need to solve the one-shot

2
00:00:05,387 --> 00:00:07,010
learning problem.

3
00:00:07,010 --> 00:00:10,380
What that means is that for
most face recognition applications

4
00:00:10,380 --> 00:00:14,695
you need to be able to recognize
a person given just one single image, or

5
00:00:14,695 --> 00:00:17,640
given just one example
of that person's face.

6
00:00:17,640 --> 00:00:18,922
And, historically,

7
00:00:18,922 --> 00:00:23,580
deep learning algorithms don't work well
if you have only one training example.

8
00:00:23,580 --> 00:00:26,163
Let's see an example
of what this means and

9
00:00:26,163 --> 00:00:29,270
talk about how to address this problem.

10
00:00:29,270 --> 00:00:33,270
Let's say you have a database of
four pictures of employees in you're

11
00:00:33,270 --> 00:00:34,310
organization.

12
00:00:34,310 --> 00:00:38,680
These are actually some of my colleagues
at Deeplearning.AI, Khan, Danielle,

13
00:00:38,680 --> 00:00:40,350
Younes and Thian.

14
00:00:40,350 --> 00:00:43,340
Now let's say someone
shows up at the office and

15
00:00:43,340 --> 00:00:46,880
they want to be let through the turnstile.

16
00:00:46,880 --> 00:00:52,710
What the system has to do is, despite ever
having seen only one image of Danielle,

17
00:00:52,710 --> 00:00:56,170
to recognize that this is
actually the same person.

18
00:00:56,170 --> 00:00:59,640
And, in contrast, if it sees someone
that's not in this database,

19
00:00:59,640 --> 00:01:04,810
then it should recognize that this is not
any of the four persons in the database.

20
00:01:04,810 --> 00:01:06,560
So in the one shot learning problem,

21
00:01:06,560 --> 00:01:11,860
you have to learn from just one
example to recognize the person again.

22
00:01:11,860 --> 00:01:12,780
And you need this for

23
00:01:12,780 --> 00:01:17,520
most face recognition systems because
you might have only one picture

24
00:01:17,520 --> 00:01:22,450
of each of your employees or of your
team members in your employee database.

25
00:01:22,450 --> 00:01:27,990
So one approach you could try is
to input the image of the person,

26
00:01:27,990 --> 00:01:30,000
feed it too a ConvNet.

27
00:01:30,000 --> 00:01:36,560
And have it output a label, y, using
a softmax unit with four outputs or maybe

28
00:01:36,560 --> 00:01:41,600
five outputs corresponding to each of
these four persons or none of the above.

29
00:01:41,600 --> 00:01:44,530
So that would be 5 outputs in the softmax.

30
00:01:44,530 --> 00:01:46,140
But this really doesn't work well.

31
00:01:46,140 --> 00:01:50,410
Because if you have such a small
training set it is really not enough

32
00:01:50,410 --> 00:01:54,400
to train a robust neural network for
this task.

33
00:01:54,400 --> 00:01:57,660
And also what if a new
person joins your team?

34
00:01:57,660 --> 00:02:01,040
So now you have 5 persons
you need to recognize, so

35
00:02:01,040 --> 00:02:03,520
there should now be six outputs.

36
00:02:03,520 --> 00:02:06,070
Do you have to retrain
the ConvNet every time?

37
00:02:06,070 --> 00:02:08,110
That just doesn't seem
like a good approach.

38
00:02:08,110 --> 00:02:12,480
So to carry out face recognition,
to carry out one-shot learning.

39
00:02:12,480 --> 00:02:14,390
So instead, to make this work,

40
00:02:14,390 --> 00:02:18,350
what you're going to do instead
is learn a similarity function.

41
00:02:18,350 --> 00:02:22,190
In particular, you want a neural
network to learn a function

42
00:02:22,190 --> 00:02:26,930
which going to denote d,
which inputs two images and

43
00:02:26,930 --> 00:02:30,780
outputs the degree of difference
between the two images.

44
00:02:30,780 --> 00:02:34,780
So if the two images
are of the same person,

45
00:02:34,780 --> 00:02:37,660
you want this to output a small number.

46
00:02:37,660 --> 00:02:42,550
And if the two images are of two very
different people you want it to output

47
00:02:42,550 --> 00:02:43,610
a large number.

48
00:02:43,610 --> 00:02:48,710
So during recognition time, if the degree
of difference between them is less than

49
00:02:48,710 --> 00:02:54,560
some threshold called tau,
which is a hyperparameter.

50
00:02:54,560 --> 00:02:59,290
Then you would predict that these
two pictures are the same person.

51
00:02:59,290 --> 00:03:05,000
And if it is greater than tau, you would
predict that these are different persons.

52
00:03:06,580 --> 00:03:12,510
And so this is how you address
the face verification problem.

53
00:03:12,510 --> 00:03:17,370
To use this for a recognition task,
what you do is,

54
00:03:17,370 --> 00:03:23,200
given this new picture, you will use this
function d to compare these two images.

55
00:03:23,200 --> 00:03:28,450
And maybe I'll output a very large number,
let's say 10, for this example.

56
00:03:28,450 --> 00:03:32,500
And then you compare this with
the second image in your database.

57
00:03:32,500 --> 00:03:37,580
And because these two are the same person,
hopefully you output a very small number.

58
00:03:37,580 --> 00:03:42,110
You do this for the other images
in your database and so on.

59
00:03:43,140 --> 00:03:48,630
And based on this, you would figure
out that this is actually that person,

60
00:03:48,630 --> 00:03:50,130
which is Danielle.

61
00:03:50,130 --> 00:03:53,634
And in contrast,
if someone not in your database shows up,

62
00:03:53,634 --> 00:03:57,629
as you use the function d to make
all of these pairwise comparisons,

63
00:03:57,629 --> 00:04:03,050
hopefully d will output have a very large
number for all four pairwise comparisons.

64
00:04:03,050 --> 00:04:07,610
And then you say that this is not any
one of the four persons in the database.

65
00:04:07,610 --> 00:04:11,860
Notice how this allows you to solve
the one-shot learning problem.

66
00:04:11,860 --> 00:04:16,970
So long as you can learn this function d,
which inputs a pair of images and

67
00:04:16,970 --> 00:04:20,600
tells you, basically, if they're
the same person or different persons.

68
00:04:20,600 --> 00:04:23,790
Then if you have someone
new join your team,

69
00:04:23,790 --> 00:04:28,850
you can add a fifth person to your
database, and it just works fine.

70
00:04:30,320 --> 00:04:34,452
So you've seen how learning this
function d, which inputs two images,

71
00:04:34,452 --> 00:04:38,110
allows you to address
the one-shot learning problem.

72
00:04:38,110 --> 00:04:41,720
In the next video,
let's take a look at how you can actually

73
00:04:41,720 --> 00:04:44,550
train the neural network
to learn dysfunction d.