1
00:00:00,270 --> 00:00:04,351
You have learned a lot about ConvNets,
everything ranging from

2
00:00:04,351 --> 00:00:08,888
the architecture of the ConvNet to
how to use it for image recognition,

3
00:00:08,888 --> 00:00:13,590
to object detection, to face
recognition and neural style transfer.

4
00:00:13,590 --> 00:00:17,626
And even though most of
the discussion has focused on images,

5
00:00:17,626 --> 00:00:21,205
on sort of 2D data,
because images are so pervasive,

6
00:00:21,205 --> 00:00:26,135
it turns out that many of the ideas
you've learned about also apply,

7
00:00:26,135 --> 00:00:30,640
not just to 2D images but
also to 1D data as well as to 3D data.

8
00:00:30,640 --> 00:00:33,048
Let's take a look.

9
00:00:33,048 --> 00:00:38,506
In the first week of this course,
you learned about the 2D convolution,

10
00:00:38,506 --> 00:00:44,340
where you might input a 14 x 14 image and
convolve that with a 5 x 5 filter.

11
00:00:44,340 --> 00:00:49,097
And you saw how 14 x 14
convolved with 5 x 5,

12
00:00:49,097 --> 00:00:52,590
this gives you a 10 x 10 output.

13
00:00:52,590 --> 00:00:58,662
And if you have multiple channels,
maybe that's 14 x 14 x 3,

14
00:00:58,662 --> 00:01:03,170
then the filter would be 5 x 5 x 3,
so that the channels match.

15
00:01:03,170 --> 00:01:08,460
And then if you have multiple filters, say
16 filters, you end up with 10 x 10 x 16.

16
00:01:08,460 --> 00:01:14,430
It turns out that a similar idea
can be applied to 1D data as well.

17
00:01:14,430 --> 00:01:21,328
For example, on the left is an EKG signal,
also called an electrocardiogram.

18
00:01:21,328 --> 00:01:25,577
Basically if you place an electrode
over your chest, this measures

19
00:01:25,577 --> 00:01:29,910
the little voltages that vary across
your chest as your heart beats.

20
00:01:29,910 --> 00:01:34,562
Because the little electric waves
generated by your heart's beating can be

21
00:01:34,562 --> 00:01:36,823
measured with a pair of electrodes.

22
00:01:36,823 --> 00:01:40,490
And so
this is an EKG of someone's heart beating.

23
00:01:40,490 --> 00:01:45,930
And so each of these peaks
corresponds to one heartbeat.

24
00:01:45,930 --> 00:01:49,970
So if you want to use EKG signals
to make medical diagnoses, for

25
00:01:49,970 --> 00:01:55,062
example, then you would have 1D
data, because EKG data

26
00:01:55,062 --> 00:02:01,610
is a time series showing
the voltage at each instant in time.

27
00:02:01,610 --> 00:02:04,500
So rather than a 14 x
14 dimensional input,

28
00:02:04,500 --> 00:02:08,160
maybe you just have
a 14 dimensional input.

29
00:02:08,160 --> 00:02:11,770
And in that case, you might want to
convolve this with a 1 dimensional filter.

30
00:02:11,770 --> 00:02:16,420
So rather than the 5 by 5,
you just have a 5 dimensional filter.

31
00:02:16,420 --> 00:02:21,481
So with 2D data, what a convolution
allowed you to do was to take the same 5 x 5

32
00:02:21,481 --> 00:02:26,950
feature detector and apply it at
different positions throughout the image.

33
00:02:26,950 --> 00:02:31,110
And that's how you wound up
with your 10 x 10 output.

34
00:02:31,110 --> 00:02:36,258
What a 1D filter allows you to do is
take your 5 dimensional filter and

35
00:02:36,258 --> 00:02:42,860
similarly apply that in lots of different
positions throughout this 1D signal.

36
00:02:42,860 --> 00:02:45,510
And so if you apply this convolution,

37
00:02:45,510 --> 00:02:50,270
what you find is that a 14
dimensional thing convolved with

38
00:02:50,270 --> 00:02:55,370
this 5 dimensional thing, this would
give you a 10 dimensional output.
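As a rough illustration of the 14 and 5 giving 10 arithmetic, here is a minimal sketch in plain Python. It assumes stride 1 and no padding, and the signal values, the filter values, and the name conv1d_valid are made up for the example, not taken from the lecture.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel along the signal (stride 1, no padding)
    and return the dot product at each position."""
    n, f = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(f))
        for i in range(n - f + 1)
    ]


# Made-up 14-sample signal and 5-tap filter (illustrative values only).
signal = [float(t % 7) for t in range(14)]
kernel = [1.0, 0.0, -1.0, 0.0, 1.0]

output = conv1d_valid(signal, kernel)
print(len(output))  # 14 - 5 + 1 = 10 positions
```

The same filter weights are reused at every one of the 10 positions, which is the sense in which one feature detector is applied across the whole signal.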

39
00:02:55,370 --> 00:03:00,496
And again, you could have multiple channels,
but in this case you

40
00:03:00,496 --> 00:03:06,381
can use just 1 channel, if you have 1 lead
or 1 electrode for EKG, so the filter is 5 x 1.

41
00:03:06,381 --> 00:03:12,468
And if you have 16 filters,
you end up with 10 x 16 over there,

42
00:03:12,468 --> 00:03:16,300
and this could be one
layer of your ConvNet.

43
00:03:16,300 --> 00:03:20,257
And then for the next layer of your
ConvNet, if you input a 10 x 16

44
00:03:20,257 --> 00:03:25,560
dimensional input, you might convolve
that with a 5 dimensional filter again.

45
00:03:25,560 --> 00:03:29,583
This has 16 channels,
so that has to match.

46
00:03:29,583 --> 00:03:34,585
And if we have 32 filters,
then the output of another layer

47
00:03:34,585 --> 00:03:39,190
would be 6 x 32,
if you have 32 filters, right?

48
00:03:39,190 --> 00:03:42,268
And by analogy to the 2D data,

49
00:03:42,268 --> 00:03:46,779
this is similar to taking
the 10 x 10 x 16 data and

50
00:03:46,779 --> 00:03:51,860
convolving it with a 5 x 5 x 16 filter,
and that has to match.

51
00:03:51,860 --> 00:03:54,568
That will give you a 6
by 6 dimensional output,

52
00:03:54,568 --> 00:03:58,080
and if you have 32 filters,
that's where the 32 comes from.
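The shape bookkeeping for the two stacked layers can be traced with a small helper. This is only a sketch, assuming stride 1, no padding, and the same filter size along every spatial dimension. The function name conv_output_shape is hypothetical.

```python
def conv_output_shape(input_shape, filter_size, num_filters):
    """input_shape is (*spatial_dims, channels).
    Each spatial dim shrinks to n - f + 1; the channel dim becomes num_filters.
    Each filter is assumed to span all input channels."""
    *spatial, _channels = input_shape
    out_spatial = [n - filter_size + 1 for n in spatial]
    return (*out_spatial, num_filters)


# 1D case: 14 x 1 input, 16 filters of size 5, then 32 filters of size 5.
layer1_1d = conv_output_shape((14, 1), 5, 16)
layer2_1d = conv_output_shape(layer1_1d, 5, 32)
print(layer1_1d, layer2_1d)  # (10, 16) (6, 32)

# 2D analogue: 10 x 10 x 16 input, 32 filters of size 5 x 5 x 16.
layer2_2d = conv_output_shape((10, 10, 16), 5, 32)
print(layer2_2d)  # (6, 6, 32)
```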

53
00:03:58,080 --> 00:04:03,567
So all of these ideas apply also to
1D data, where you can have the same

54
00:04:03,567 --> 00:04:08,884
feature detector, such as this,
applied to a variety of positions.

55
00:04:08,884 --> 00:04:13,430
For example, to detect the different
heartbeats in an EKG signal.

56
00:04:13,430 --> 00:04:18,505
You use the same set of features to
detect the heartbeats even at different

57
00:04:18,505 --> 00:04:23,836
positions along this time series, and
so ConvNets can be used even on 1D data.

58
00:04:23,836 --> 00:04:28,501
For a lot of 1D data applications, you
actually use a recurrent neural network,

59
00:04:28,501 --> 00:04:30,790
which you'll learn about in the next course.

60
00:04:30,790 --> 00:04:36,520
But some people can also try
using ConvNets in these problems.

61
00:04:36,520 --> 00:04:39,990
And in the next course on sequence models,
in which we will talk about

62
00:04:39,990 --> 00:04:43,310
recurrent neural networks and
LSTMs and other models like that,

63
00:04:43,310 --> 00:04:47,545
we'll talk about the pros and cons of
using 1D ConvNets versus some of those

64
00:04:47,545 --> 00:04:51,070
other models that are explicitly
designed for sequence data.

65
00:04:51,070 --> 00:04:54,290
So that's the generalization
from 2D to 1D.

66
00:04:54,290 --> 00:04:56,510
How about 3D data?

67
00:04:56,510 --> 00:04:58,900
Well, what is three dimensional data?

68
00:04:58,900 --> 00:05:04,720
It is that, instead of having a 1D list
of numbers or a 2D matrix of numbers,

69
00:05:04,720 --> 00:05:11,060
you now have a 3D block, a three
dimensional input volume of numbers.

70
00:05:11,060 --> 00:05:15,123
So here's an example of that:
if you take a CT scan,

71
00:05:15,123 --> 00:05:20,510
this is a type of X-ray scan that gives
a three dimensional model of your body.

72
00:05:20,510 --> 00:05:24,746
But what a CT scan does is it takes
different slices through your body.

73
00:05:24,746 --> 00:05:28,465
So as you scan through a CT
scan which I'm doing here,

74
00:05:28,465 --> 00:05:33,507
you can look at different slices of
the human torso to see how they look and

75
00:05:33,507 --> 00:05:37,090
so this data is fundamentally
three dimensional.

76
00:05:37,090 --> 00:05:43,039
And one way to think of this data is
that your data now has some height,

77
00:05:43,039 --> 00:05:46,558
some width, and then also some depth,

78
00:05:46,558 --> 00:05:50,359
where the different
slices through this volume

79
00:05:50,359 --> 00:05:53,840
are the different slices
through the torso.

80
00:05:53,840 --> 00:05:57,660
So if you want to apply a ConvNet
to detect features in this

81
00:05:57,660 --> 00:06:02,470
three dimensional CAT scan or CT scan,
then you can generalize the ideas from

82
00:06:02,470 --> 00:06:07,020
the first slide to three
dimensional convolutions as well.

83
00:06:07,020 --> 00:06:10,356
So if you have a 3D volume, and for

84
00:06:10,356 --> 00:06:15,764
the sake of simplicity let's
say it is 14 x 14 x 14 and

85
00:06:15,764 --> 00:06:21,770
so this is the height, width,
and depth of the input CT scan.

86
00:06:21,770 --> 00:06:25,735
And again, just like images
don't all have to be square,

87
00:06:25,735 --> 00:06:29,450
a 3D volume doesn't have to
be a perfect cube either.

88
00:06:29,450 --> 00:06:32,210
So the height and
width of an image can be different, and

89
00:06:32,210 --> 00:06:36,118
in the same way the height and width and
the depth of a CT scan can be different.

90
00:06:36,118 --> 00:06:40,560
But I'm just using 14 x 14 x 14
here to simplify the discussion.

91
00:06:40,560 --> 00:06:45,849
And if you convolve this with
now a 5 x 5 x 5 filter,

92
00:06:45,849 --> 00:06:50,788
so your filters now
are also three dimensional,

93
00:06:50,788 --> 00:06:55,863
then this would give you
a 10 x 10 x 10 volume.
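To make the 3D case concrete, here is a naive single-channel 3D convolution in plain Python. It is a sketch assuming stride 1, no padding, and a cubic filter. The all-ones volume and filter are placeholder data rather than a real CT scan, and the name conv3d_valid is made up for the example.

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution (stride 1, no padding).
    volume is a nested list indexed [height][width][depth];
    kernel is a cubic nested list of side f."""
    h, w, d = len(volume), len(volume[0]), len(volume[0][0])
    f = len(kernel)
    out = []
    for i in range(h - f + 1):
        plane = []
        for j in range(w - f + 1):
            row = []
            for k in range(d - f + 1):
                total = 0.0
                # Sum over the f x f x f block anchored at (i, j, k).
                for a in range(f):
                    for b in range(f):
                        for c in range(f):
                            total += volume[i + a][j + b][k + c] * kernel[a][b][c]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# Placeholder 14 x 14 x 14 volume and 5 x 5 x 5 filter, all ones.
volume = [[[1.0] * 14 for _ in range(14)] for _ in range(14)]
kernel = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]

out = conv3d_valid(volume, kernel)
out_shape = (len(out), len(out[0]), len(out[0][0]))
print(out_shape)  # (10, 10, 10)
```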

94
00:06:55,863 --> 00:07:01,366
And technically, you could also have by 1,
if this is the number of channels.

95
00:07:01,366 --> 00:07:06,715
So this is just a 3D volume, but
your data can also have different

96
00:07:06,715 --> 00:07:11,489
numbers of channels,
then this would be times 1 as well.

97
00:07:11,489 --> 00:07:17,472
Because the number of channels here and
the number of channels here has to match.

98
00:07:17,472 --> 00:07:22,371
And then if you have 16 filters that
are 5 x 5 x 5 x 1, then the next output

99
00:07:22,371 --> 00:07:24,790
will be a 10 x 10 x 10 x 16.

100
00:07:24,790 --> 00:07:30,129
So this could be one layer of your
ConvNet over 3D data. And if the next

101
00:07:30,129 --> 00:07:36,660
layer of the ConvNet convolves this again
with a 5 x 5 x 5 x 16 dimensional filter,

102
00:07:36,660 --> 00:07:40,666
so that this number of channels
matches the data as usual, and

103
00:07:40,666 --> 00:07:46,190
if you have 32 filters, then similar to
what you saw with ConvNets on images,

104
00:07:46,190 --> 00:07:54,350
you'll end up with a 6 x 6
x 6 volume across 32 channels.
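The two 3D layers can be traced the same way, along with a count of filter weights per layer. This is a sketch assuming stride 1, no padding, and cubic filters that span all input channels. Biases are left out of the count, and the helper name conv3d_layer_summary is hypothetical.

```python
def conv3d_layer_summary(input_shape, f, num_filters):
    """input_shape is (height, width, depth, channels).
    Returns (output_shape, weight_count) for stride 1, no padding.
    Each filter is f x f x f x channels; biases are not counted."""
    h, w, d, c = input_shape
    output_shape = (h - f + 1, w - f + 1, d - f + 1, num_filters)
    weight_count = f * f * f * c * num_filters
    return output_shape, weight_count


shape1, weights1 = conv3d_layer_summary((14, 14, 14, 1), 5, 16)
shape2, weights2 = conv3d_layer_summary(shape1, 5, 32)
print(shape1, weights1)  # (10, 10, 10, 16) 2000
print(shape2, weights2)  # (6, 6, 6, 32) 64000
```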

105
00:07:54,350 --> 00:07:57,992
So 3D data can also be learned on,

106
00:07:57,992 --> 00:08:02,020
sort of directly using
a three dimensional ConvNet.

107
00:08:02,020 --> 00:08:07,500
And what these filters do is really
detect features across your 3D data.

108
00:08:08,730 --> 00:08:13,180
CAT scans, medical scans, are
one example of 3D volumes.

109
00:08:13,180 --> 00:08:18,450
But another example of data you could
treat as a 3D volume would be movie data,

110
00:08:18,450 --> 00:08:23,410
where the different slices could be
different slices in time through a movie.

111
00:08:23,410 --> 00:08:28,171
And you could use this to detect motion or
people taking actions in movies.

112
00:08:28,171 --> 00:08:31,868
So that's it on generalization
of ConvNets from

113
00:08:31,868 --> 00:08:35,520
2D data to also 1D as well as 3D data.

114
00:08:35,520 --> 00:08:40,395
Image data is so pervasive that the vast
majority of ConvNets are on 2D data,

115
00:08:40,395 --> 00:08:45,420
on image data, but I hope that these other
models will be helpful to you as well.

116
00:08:45,420 --> 00:08:48,588
So this is it,
this is the last video of this week and

117
00:08:48,588 --> 00:08:51,570
the last video of this course on ConvNets.

118
00:08:51,570 --> 00:08:53,810
You've learned a lot about ConvNets and

119
00:08:53,810 --> 00:08:58,380
I hope you find many of these
ideas useful for your future work.

120
00:08:58,380 --> 00:09:01,600
So congratulations on
finishing these videos.

121
00:09:01,600 --> 00:09:04,150
I hope you enjoyed this
week's exercise and

122
00:09:04,150 --> 00:09:07,850
I look forward also to seeing you in
the next course on sequence models.