The term deep learning refers to training neural networks, sometimes very large neural networks. So what exactly is a neural network? In this video, let's try to give you some of the basic intuitions.

Let's start with a housing price prediction example. Let's say you have a dataset with six houses, so you know the size of the houses in square feet or square meters and you know the price of each house, and you want to fit a function to predict the price of a house as a function of its size. If you are familiar with linear regression, you might say, well, let's fit a straight line to this data, and we get a straight line like that. But to be fancier, you might say, well, we know that prices can never be negative. So instead of the straight-line fit, which eventually would become negative, let's bend the curve so it just ends up at zero here. This thick blue line ends up being your function for predicting the price of the house as a function of its size: it is zero here, and then there is a straight-line fit to the right.

You can think of this function that you've just fit to the housing prices as a very simple neural network, almost the simplest possible neural network. Let me draw it here. We have as the input to the neural network the size of a house, which we call x. It goes into this node, this little circle, and then it outputs the price, which we call y. So this little circle, which is a single neuron in a neural network, implements the function that we drew on the left. And all the neuron does is take the size as input, compute this linear function, take a max with zero, and then output the estimated price.

By the way, in the neural network literature you will see this function a lot: the function that is zero for a while and then takes off as a straight line. This function is called a ReLU function, which stands for rectified linear unit, so R-E-L-U. "Rectified" just means taking a max with zero, which is why you get a function shaped like this. You don't need to worry about ReLU units for now, but it's something you will see again later in this course.
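To make this concrete, here is a minimal sketch in Python of the single-neuron predictor just described (the video itself shows no code): the weight w and bias b are made-up illustrative values, not parameters fitted to any data.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: zero for negative inputs, identity otherwise.
    return np.maximum(0.0, z)

def predict_price(size, w=0.1, b=-20.0):
    # A single neuron: a linear function of the size, rectified at zero,
    # so the predicted price can never be negative.
    return relu(w * size + b)

sizes = np.array([100, 500, 1000, 2000, 3000, 4000])  # square feet
print(predict_price(sizes))  # small houses map to 0, larger ones to a line
```

With these made-up values (w = 0.1, b = -20), any house under 200 square feet gets a predicted price of zero, and above that the price grows linearly, which is exactly the bent-line shape on the slide.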
So if this is a single-neuron neural network, really a tiny little neural network, a larger neural network is then formed by taking many of these single neurons and stacking them together. If you think of this neuron as being like a single Lego brick, you get a bigger neural network by stacking together many of these Lego bricks.

Let's see an example. Let's say that instead of predicting the price of a house just from the size, you now have other features. You know other things about the house, such as the number of bedrooms, and you might think that one of the things that really affects the price of a house is family size: can this house fit your family of three, or family of four, or family of five? And it's really the size in square feet or square meters, together with the number of bedrooms, that determines whether or not a house can fit your family's size. Then maybe you know the zip code, in other countries called the postal code, of the house. And the zip code, as a feature, might tell you walkability: is this neighborhood highly walkable? Can you just walk to the grocery store or to school, or do you need to drive? Some people prefer highly walkable neighborhoods. And then the zip code, as well as the wealth of the neighborhood, maybe tells you, certainly in the United States but in some other countries as well, how good the school quality is.

Each of these little circles I'm drawing can be one of those ReLU, rectified linear units, or some other slightly nonlinear function. So based on the size and the number of bedrooms, you can estimate the family size; based on the zip code, the walkability; and based on the zip code and wealth, the school quality. And then finally you might think that the way people decide how much they are willing to pay for a house is by looking at the things that really matter to them, in this case family size, walkability, and school quality, and that helps you predict the price. So in this example, x is all of these four inputs, and y is the price you're trying to predict.
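As a rough sketch of this stacking intuition (again not from the video; all of the weights below are made-up illustrative numbers, and the zip code is reduced to a numeric walkability score for simplicity), you could hand-wire the stacked neurons like this:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def predict_price(size, bedrooms, walk_score, wealth):
    # Three intermediate neurons, each a ReLU of a few of the inputs:
    family_size    = relu(0.001 * size + 0.5 * bedrooms - 1.0)  # size, bedrooms
    walkability    = relu(walk_score)                           # zip code
    school_quality = relu(0.3 * walk_score + 0.7 * wealth)      # zip code, wealth
    # A final neuron combines the three quantities into the price y:
    return relu(100.0 * family_size + 30.0 * walkability + 80.0 * school_quality)

print(predict_price(size=2000, bedrooms=3, walk_score=0.8, wealth=0.6))
```

In a real neural network you would never wire these meanings in by hand; as the next part of the video explains, the network figures out its own intermediate quantities from data.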
And so by stacking together a few of the single neurons, or the simple predictors we had from the previous slide, we now have a slightly larger neural network. Now, part of the magic of a neural network is that when you implement it, you need to give it just the input x and the output y for a number of examples in your training set, and all of these things in the middle it will figure out by itself.

So what you actually implement is this. Here you have a neural network with four inputs. The input features might be the size, the number of bedrooms, the zip code or postal code, and the wealth of the neighborhood. And given these input features, the job of the neural network is to predict the price y. Notice also that each of these circles, which are called hidden units in the neural network, takes as input all four input features. So, for example, rather than saying that the first node represents family size, and that family size depends only on the features x1 and x2, we're instead going to say: well, neural network, you decide whatever you want this node to be, and we'll give you all four of the features to compute whatever you want. So we say that this, the input layer, and this layer in the middle of the neural network are densely connected, because every input feature is connected to every one of these circles in the middle.

And the remarkable thing about neural networks is that, given enough data about x and y, given enough training examples with both x and y, neural networks are remarkably good at figuring out functions that accurately map from x to y.

So that's a basic neural network. It turns out that as you build out your own neural networks, you'll probably find them to be most useful, most powerful, in supervised learning settings, meaning that you're trying to take an input x and map it to some output y, like we just saw in the housing price prediction example. In the next video, let's go over some more examples of supervised learning, and some examples of where you might find neural networks to be incredibly helpful for your applications as well.
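Here is a minimal sketch, with assumed layer sizes and untrained random weights, of what this densely connected computation looks like; in practice the weight matrices W1 and W2 would be learned from the (x, y) training examples rather than drawn at random.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# x: the four input features for one house
# (size, bedrooms, zip-code-derived score, wealth), as on the slide.
x = np.array([2000.0, 3.0, 0.8, 0.6])

# Densely connected hidden layer: each of the 3 hidden units
# receives all 4 input features (one row of weights per hidden unit).
W1 = rng.normal(size=(3, 4))
b1 = np.zeros(3)
hidden = relu(W1 @ x + b1)

# Output layer: a single unit mapping the 3 hidden activations to the price y.
W2 = rng.normal(size=(1, 3))
b2 = np.zeros(1)
y_hat = W2 @ hidden + b2
print(y_hat)
```

Every entry of W1 corresponds to one of the lines in the diagram connecting an input feature to a hidden unit, which is exactly what "densely connected" means.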