[MUSIC]

So, we've learned that deep neural networks are a really cool, high-accuracy tool, but they can be really hard to build and train, and they require lots and lots of data. So next, we're gonna talk about something really exciting called deep features, which allow you to build neural networks even when you don't have a lot of data.

So, if you go back to our image classification pipeline, we start with an image, we detect some features or other representations, and we feed those to a simple classifier, like a linear classifier. The question here is: can we somehow use the features learned by a neural network, those cool ones that capture corners, edges, and even faces, to feed that classifier? Can we do something a little different?

The idea here, that you can reuse deep features, is called transfer learning. Transfer learning is a pretty old idea that's been around for quite a while, but it's had a lot of impact in recent years in the area of deep neural networks. The idea is this: I train a neural network in a case where I have lots and lots of data, for example, on the task of differentiating cats versus dogs. I learn that eight-layer, 60-million-parameter complex neural network, and I get great accuracy on the cat-versus-dog task. Now, the cool thing is to ask: what if I have a little bit of data, not tons of data, for a new task? Let's say I'm detecting chairs and elephants and cars and cameras, a hundred or so categories. Can we somehow take the features we learned on cats versus dogs, combine them with a simple classifier, and get great accuracy on these 101 new categories? That's the idea of transfer learning: the features I learned from cats versus dogs get transferred to provide accuracy on the new task, which is detecting elephants, cameras, and so on.

To understand transfer learning with deep neural networks, let's revisit what a deep neural network might learn. So here's a deep neural network for cats versus dogs, and let's say we have really good accuracy on that task, task one, cats versus dogs. If you look at the last few layers, they really focus on the cat-versus-dog task. They're very specific. It's kind of like I showed you earlier, where there was an example of color detectors inside that last layer.
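To make the reuse idea concrete, here's a minimal sketch of extracting deep features from a pretrained network. It assumes PyTorch and a recent torchvision; since the lecture's cat-versus-dog network isn't publicly available, an ImageNet-pretrained AlexNet stands in, and the image filename is a hypothetical placeholder:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load a network pretrained on a large dataset (ImageNet here, standing in
# for the cat-vs-dog network from the lecture).
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
net.eval()  # we reuse the learned weights as-is; nothing gets retrained

# Chop off the final, task-specific layer by replacing it with an identity,
# so the network outputs its 4096-dimensional penultimate activations:
# the "deep features".
net.classifier[6] = torch.nn.Identity()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("chair.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():      # keep the weights fixed
    features = net(img)    # shape: (1, 4096), one deep-feature vector
```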
Now, the layers in the middle are much more general. They can represent things like corners and edges and circles and squiggly patterns, things that can really generalize from the cat-versus-dog task to this more general 101-categories task.

So let's talk about how we can deal with that second task, the 101 categories. We learned a deep neural network for cats versus dogs; can we apply it to task 2? If you think about it, the end piece of the neural network is very specific to cats versus dogs, so it's not that useful for detecting chairs, perhaps. So what we can do is chop off the last few layers of the network and keep the weights fixed for the first several layers, because those learned good features. Then we replace that last layer with a simple classifier, like a linear classifier, which I can train on the little bit of data that I have about chairs, cars, elephants, and cameras.

So going back to the example we described earlier, where we had three layers: the first layer detected diagonal edges, the second one detected squiggly patterns and corners, while the third one was about colors and faces. We can now use those layers for the new task, but we need to be a little careful. Layer 3 might be too specific, but layers 1 and 2 can be quite useful.

So now that we've learned the concept of transfer learning, let's review the deep learning pipeline using these deep features. I'm gonna start with some labeled data, not tons, just a little bit is enough. Then I'm gonna extract features using that deep neural network, just like we described. I'm gonna split this data set into a training set and a validation set. Then I'm gonna learn a simple classifier, like a linear classifier or a support vector machine, simple things. And as I validate it, since it's a simple classifier, there aren't many parameters to tune, so it's pretty easy to do. It can be learned with little data and do quite well.
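That pipeline, a small labeled set, deep features, a train/validation split, and a simple classifier, is easy to sketch with scikit-learn. A minimal version follows; the feature and label files are hypothetical placeholders produced by a feature extractor like the one above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# X: one 4096-dimensional deep-feature row per image; y: labels for the
# new task (e.g. chair, elephant, car, camera). Hypothetical files.
X = np.load("deep_features.npy")
y = np.load("labels.npy")

# Split the small labeled set into training and validation portions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A simple linear classifier on top of the frozen deep features; there are
# few hyperparameters to tune, so little data goes a long way.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print("validation accuracy:", clf.score(X_val, y_val))
```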
And in fact, we've seen an application where this idea works extremely well, and it's exactly the idea I showed you in the demo at the beginning of the module, when I showed you how to buy new dresses. We didn't have lots of labeled data describing the visual appearance of dresses, but we used a network that was trained on ImageNet to provide you with a good dress shopping experience.

Now, you may ask, how general are these deep features? Can they really be used for interesting, extremely unusual tasks? Well, actually, you'd be surprised. In fact, let's talk about trash. [LAUGH] There's a company called Compology. It's a pretty interesting company. They're trying to reinvent how trash collection is done. Normally, the trash truck goes from house to house, from business to business, and collects trash on a regular basis, every day, once a week, and so on. They wanna change that, and optimize the paths of the trucks and how trash is collected, to minimize the amount of time spent. The way they do that is by installing cameras on trash cans to figure out what's in there and how full they are. Well, there aren't tons of labeled images of what full trash cans look like, but they used deep features and a little bit of training data from humans, marking the depth of trash in the cans, to learn a trash detector and optimize the paths of the trucks, in order to serve better, decreasing the amount of time trucks need to collect garbage. So deep features are useful even for garbage.

[MUSIC]