In the last video, we gave a mathematical definition of how to represent, or how to compute, the hypothesis used by a neural network. In this video, I'd like to show you how to actually carry out that computation efficiently, that is, show you a vectorized implementation. And second, and more importantly, I want to start giving you intuition about why these neural network representations might be a good idea and how they can help us learn complex nonlinear hypotheses.

Consider this neural network. Previously we said that the sequence of steps we need in order to compute the output of a hypothesis is these equations given on the left, where we compute the activation values of the three hidden units and then use those to compute the final output of our hypothesis, h(x). Now, I'm going to define a few extra terms. This term that I'm underlining here, I'm going to define to be z superscript (2) subscript 1, so that we have that a(2)1, which is this term, is equal to g of z(2)1. And by the way, these superscript 2's, what they mean is that z(2), and this a(2) as well, are values associated with layer 2, that is, with the hidden layer in the neural network. Now this term here I'm going to similarly define as z(2)2. And finally, this last term that I'm underlining, let me define as z(2)3, so that similarly we have a(2)3 equals g of z(2)3. So these z values are just a weighted linear combination of the input values x0, x1, x2, x3 that go into a particular neuron.
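Written out explicitly (using the Θ^{(1)} notation for the layer-1 parameters from the previous video), the definitions being introduced here are, roughly:

$$
z^{(2)}_k = \Theta^{(1)}_{k0} x_0 + \Theta^{(1)}_{k1} x_1 + \Theta^{(1)}_{k2} x_2 + \Theta^{(1)}_{k3} x_3, \qquad a^{(2)}_k = g\!\left(z^{(2)}_k\right), \quad k = 1, 2, 3.
$$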
Now if you look at this block of numbers, you may notice that it corresponds suspiciously closely to a matrix-vector operation: the matrix-vector multiplication of Theta(1) times the vector x. Using this observation, we're going to be able to vectorize this computation of the neural network.

Concretely, let's define the feature vector x as usual to be the vector of x0, x1, x2, x3, where x0 as usual is always equal to 1, and let's define z(2) to be the vector of these z values, that is, of z(2)1, z(2)2, z(2)3. And notice that z(2) is a three-dimensional vector. We can now vectorize the computation of a(2)1, a(2)2, a(2)3 in just two steps. We compute z(2) as Theta(1) times x, and that gives us this vector z(2); and then a(2) is g of z(2). And just to be clear, z(2) here is a three-dimensional vector, and a(2) is also a three-dimensional vector, so this activation function g applies the sigmoid function element-wise to each of z(2)'s elements. And by the way, to make our notation a little more consistent with what we'll do later, in this input layer we have the inputs x, but we can also think of these as the activations of the first layer. So if I define a(1) to be equal to x, so that a(1) is a vector, I can now take this x here and replace it, writing z(2) equals Theta(1) times a(1), just by defining a(1) to be the activations of my input layer. Now, with what I've written so far, I've gotten myself the values for a1, a2, a3, and really I should put the superscripts there as well.
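In compact form, the vectorized steps just described (with Θ^{(1)} denoting the 3×4 matrix of layer-1 parameters) are:

$$
a^{(1)} = x, \qquad z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = g\!\left(z^{(2)}\right),
$$

where g is applied element-wise.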
But I need one more value: I also want this a(2)0, which corresponds to a bias unit in the hidden layer that feeds into the output there. Of course, there was a bias unit here too; I just didn't draw it. To take care of this extra bias unit, what we're going to do is add an extra a0 superscript (2) that's equal to 1, and after taking this step we now have that a(2) is going to be a four-dimensional feature vector, because we just added this extra a(2)0, equal to 1, corresponding to the bias unit in the hidden layer. And finally, to compute the actual output value of our hypothesis, we then simply need to compute z(3). So z(3) is equal to this term here that I'm underlining; this inner term there is z(3). And z(3) is Theta(2) times a(2), and finally my hypothesis output h(x), which is a(3), that is, the activation of my one and only unit in the output layer, is just a real number. You can write it as a(3) or as a(3)1, and that's g of z(3).

This process of computing h(x) is also called forward propagation. It's called that because we start off with the activations of the input units, then we forward-propagate that to the hidden layer and compute the activations of the hidden layer, and then we forward-propagate that again and compute the activations of the output layer. This process of computing the activations from the input layer to the hidden layer to the output layer is called forward propagation, and what we just did is work out a vectorized implementation of this procedure.
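As a concrete illustration, here is a minimal NumPy sketch of this vectorized forward propagation for the 3-input, 3-hidden-unit, 1-output network in the example. The course itself works in Octave/MATLAB, and the parameter values below are made up purely so the sketch runs:

```python
import numpy as np

def sigmoid(z):
    # element-wise sigmoid activation g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Made-up parameters, just to make the sketch runnable:
# Theta1 maps layer 1 (3 inputs + bias) to layer 2 (3 hidden units) -> shape 3 x 4
# Theta2 maps layer 2 (3 hidden units + bias) to layer 3 (1 output) -> shape 1 x 4
Theta1 = np.random.randn(3, 4)
Theta2 = np.random.randn(1, 4)

def forward_propagate(x, Theta1, Theta2):
    a1 = np.concatenate(([1.0], x))   # add bias unit x0 = 1, so a1 is 4-dimensional
    z2 = Theta1 @ a1                  # z(2) = Theta(1) * a(1), a 3-dimensional vector
    a2 = sigmoid(z2)                  # a(2) = g(z(2)), applied element-wise
    a2 = np.concatenate(([1.0], a2))  # add bias unit a(2)0 = 1, so a2 is 4-dimensional
    z3 = Theta2 @ a2                  # z(3) = Theta(2) * a(2)
    a3 = sigmoid(z3)                  # h(x) = a(3) = g(z(3)), a single real number
    return a3[0]

h = forward_propagate(np.array([0.5, -1.2, 3.0]), Theta1, Theta2)
print(h)
```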
So if you implement it using these equations that we have on the right, this gives you an efficient way of computing h(x).

This forward propagation view also helps us understand what neural networks might be doing and why they might help us learn interesting nonlinear hypotheses. Consider the following neural network, and let's say I cover up the left part of this picture for now. If you look at what's left in this picture, it looks a lot like logistic regression, where what we're doing is using that node, which is just a logistic regression unit, to make a prediction h(x). Concretely, what the hypothesis is outputting is h(x) equals g, my sigmoid activation function, applied to Theta0 times a0 (which is equal to 1), plus Theta1 times a1, plus Theta2 times a2, plus Theta3 times a3, where the values a1, a2, a3 are those given by these three hidden units. Now, to be consistent with my earlier notation, we actually need to fill in these superscript 2's here everywhere, and I also have these subscript 1's there, because I have only one output unit. But if you focus on the blue parts of the notation, this looks awfully like the standard logistic regression model, except that I now have a capital Theta instead of a lowercase theta. And what this is doing is just logistic regression, but where the features fed into logistic regression are these values computed by the hidden layer. Just to say that again: what this neural network is doing is just like logistic regression, except that rather than using the original features x1, x2, x3, it is using these new features a1, a2, a3.
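With the superscripts and subscripts filled in, the output-layer computation being described is:

$$
h_\Theta(x) = g\!\left(\Theta^{(2)}_{10} a^{(2)}_0 + \Theta^{(2)}_{11} a^{(2)}_1 + \Theta^{(2)}_{12} a^{(2)}_2 + \Theta^{(2)}_{13} a^{(2)}_3\right).
$$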
Again, we fill in the superscripts there to be consistent with the notation. And the cool thing about this is that the features a1, a2, a3 are themselves learned as functions of the input. Concretely, the function mapping from layer 1 to layer 2 is determined by some other set of parameters, Theta(1). So it's as if the neural network, instead of being constrained to feed the features x1, x2, x3 into logistic regression, gets to learn its own features a1, a2, a3 to feed into logistic regression. And as you can imagine, depending on what parameters it chooses for Theta(1), it can learn some pretty interesting and complex features, and therefore you can end up with a better hypothesis than if you were constrained to use the raw features x1, x2, x3, or constrained to choose, say, polynomial terms of x1, x2, x3, and so on. Instead, this algorithm has the flexibility to try to learn whatever features it wants, using these a1, a2, a3, in order to feed into this last unit, which is essentially logistic regression. I realize this example is described at a somewhat high level, so I'm not sure if this intuition of the neural network having more complex features will quite make sense yet. But if it doesn't yet, in the next two videos I'm going to go through a specific example of how a neural network can use this hidden layer to compute more complex features to feed into the final output layer, and how that can learn more complex hypotheses.
So, in case what I'm saying here doesn't quite make sense, stick with me for the next two videos, and hopefully, after working through those examples, this explanation will make a little bit more sense. One more point: you can have neural networks with other types of diagrams as well, and the way the neurons in a neural network are connected is called the architecture. So the term architecture refers to how the different neurons are connected to each other. This is an example of a different neural network architecture, and once again you may be able to get the intuition of how the second layer, where we have three hidden units, computes some complex function, maybe, of the input layer, and then the third layer can take the second layer's features and compute even more complex features in layer three, so that by the time you get to the output layer, layer four, you can have even more complex features than what you were able to compute in layer three, and so get very interesting nonlinear hypotheses. By the way, in a network like this, layer one is still called the input layer, layer four is still our output layer, and this network has two hidden layers. So anything that's not an input layer or an output layer is called a hidden layer.

So, hopefully from this video you've gotten a sense of how the forward propagation step in a neural network works, where you start from the activations of the input layer and forward propagate that to the first hidden layer, then the second hidden layer, and then finally the output layer. And you also saw how we can vectorize that computation.
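To make the multi-layer picture concrete, here is a small sketch, with made-up layer sizes, of how the same vectorized forward propagation step simply repeats once per layer in a deeper architecture like the four-layer network shown:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up architecture: 3 inputs, two hidden layers of 5 units each, 1 output.
# Thetas[l] maps layer l+1 to layer l+2 and has shape (units_out, units_in + 1).
layer_sizes = [3, 5, 5, 1]
Thetas = [np.random.randn(layer_sizes[l + 1], layer_sizes[l] + 1)
          for l in range(len(layer_sizes) - 1)]

def forward_propagate(x, Thetas):
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))  # prepend the bias unit for this layer
        z = Theta @ a                   # z(l+1) = Theta(l) * a(l)
        a = sigmoid(z)                  # a(l+1) = g(z(l+1)), element-wise
    return a                            # activations of the output layer

print(forward_propagate(np.array([0.5, -1.2, 3.0]), Thetas))
```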
I realize that some of the intuitions in this video, about how later layers compute complex features of the earlier layers, may still be slightly abstract and kind of high level. So what I would like to do in the next two videos is work through a detailed example of how a neural network can be used to compute nonlinear functions of the input, and I hope that will give you a good sense of the sorts of complex nonlinear hypotheses we can get out of neural networks.