1
00:00:00,280 --> 00:00:04,479
In this video, I'd like to tell you about the idea of vectorization.

2
00:00:04,480 --> 00:00:06,471
So, whether you're using Octave

3
00:00:06,471 --> 00:00:08,277
or a similar language like MATLAB

4
00:00:08,277 --> 00:00:09,604
or whether you're using Python

5
00:00:09,604 --> 00:00:12,520
and NumPy or Java CC++.

6
00:00:12,520 --> 00:00:14,850
All of these languages have either

7
00:00:14,850 --> 00:00:16,708
built into them or have

8
00:00:16,720 --> 00:00:19,439
readily and easily accessible, different

9
00:00:19,439 --> 00:00:21,806
numerical linear algebra libraries.

10
00:00:21,820 --> 00:00:23,335
They're usually very well written,

11
00:00:23,335 --> 00:00:25,695
highly optimized, often so that developed by

12
00:00:25,695 --> 00:00:29,181
people that, you know, have PhDs in numerical computing or

13
00:00:29,181 --> 00:00:32,075
they are really specializing numerical computing.

14
00:00:32,075 --> 00:00:33,944
And when you're implementing machine

15
00:00:33,960 --> 00:00:35,904
learning algorithms, if you're able

16
00:00:35,930 --> 00:00:37,797
to take advantage of these

17
00:00:37,810 --> 00:00:39,296
linear algebra libraries or these

18
00:00:39,310 --> 00:00:41,600
numerical linear algebra libraries and

19
00:00:41,620 --> 00:00:43,387
mix the routine calls to them

20
00:00:43,387 --> 00:00:45,172
rather than sort of right call

21
00:00:45,180 --> 00:00:48,029
yourself to do things that these libraries could be doing.

22
00:00:48,040 --> 00:00:49,612
If you do that then

23
00:00:49,612 --> 00:00:51,872
often you get that "first is more efficient".

24
00:00:51,880 --> 00:00:53,179
So, just run more quickly and

25
00:00:53,179 --> 00:00:54,891
take better advantage of

26
00:00:54,891 --> 00:00:56,631
any parallel hardware your computer

27
00:00:56,631 --> 00:00:58,254
may have and so on.

28
00:00:58,270 --> 00:01:00,533
And second, it also means

29
00:01:00,540 --> 00:01:03,075
that you end up with less code that you need to write.

30
00:01:03,075 --> 00:01:04,962
So have a simpler implementation

31
00:01:04,962 --> 00:01:08,532
that is, therefore, maybe also more likely to be bug free.

32
00:01:08,550 --> 00:01:10,534
And as a concrete example.

33
00:01:10,570 --> 00:01:12,726
Rather than writing code

34
00:01:12,726 --> 00:01:15,061
yourself to multiply matrices, if

35
00:01:15,061 --> 00:01:16,300
you let Octave do it by

36
00:01:16,300 --> 00:01:18,145
typing a times b,

37
00:01:18,145 --> 00:01:19,833
that will use a very efficient

38
00:01:19,833 --> 00:01:22,318
routine to multiply the 2 matrices.

39
00:01:22,340 --> 00:01:23,985
And there's a bunch of examples like

40
00:01:24,010 --> 00:01:27,220
these where you use appropriate vectorized implementations.

41
00:01:27,220 --> 00:01:30,062
You get much simpler code, and much more efficient code.

42
00:01:30,280 --> 00:01:33,071
Let's look at some examples.

43
00:01:33,071 --> 00:01:34,937
Here's a usual hypothesis of linear

44
00:01:34,937 --> 00:01:36,415
regression and if you

45
00:01:36,415 --> 00:01:37,348
want to compute H of

46
00:01:37,348 --> 00:01:40,032
X, notice that there is a sum on the right.

47
00:01:40,032 --> 00:01:41,130
And so one thing you could

48
00:01:41,130 --> 00:01:42,775
do is compute the sum

49
00:01:42,775 --> 00:01:46,611
from J equals 0 to J equals N yourself.

50
00:01:46,620 --> 00:01:48,000
Another way to think of this

51
00:01:48,000 --> 00:01:49,210
is to think of h

52
00:01:49,210 --> 00:01:52,029
of x as theta transpose x

53
00:01:52,029 --> 00:01:53,262
and what you can do is

54
00:01:53,262 --> 00:01:55,654
think of this as you know, computing this

55
00:01:55,660 --> 00:01:57,823
in a product between 2 vectors

56
00:01:57,840 --> 00:02:00,135
where theta is, you know, your

57
00:02:00,135 --> 00:02:01,784
vector say theta 0, theta 1,

58
00:02:01,800 --> 00:02:04,812
theta 2 if you have 2 features.

59
00:02:04,812 --> 00:02:06,410
If n equals 2 and if

60
00:02:06,450 --> 00:02:08,133
you think of x as this

61
00:02:08,133 --> 00:02:11,810
vector, x0, x1, x2

62
00:02:11,884 --> 00:02:13,952
and these 2 views can

63
00:02:13,952 --> 00:02:17,539
give you 2 different implementations.

64
00:02:17,560 --> 00:02:18,909
Here's what I mean.

65
00:02:18,909 --> 00:02:21,012
Here's an unvectorized implementation for

66
00:02:21,040 --> 00:02:22,454
how to compute h of

67
00:02:22,454 --> 00:02:26,120
x and by unvectorized I mean, without vectorization.

68
00:02:26,130 --> 00:02:29,479
We might first initialize, you know, prediction to be 0.0.

69
00:02:29,479 --> 00:02:32,383
This is going to eventually, the

70
00:02:32,383 --> 00:02:34,287
prediction is going to be

71
00:02:34,300 --> 00:02:36,090
h of x and then

72
00:02:36,090 --> 00:02:37,258
I'm going to have a for loop for

73
00:02:37,270 --> 00:02:38,354
j equals one through n+1

74
00:02:38,354 --> 00:02:40,792
prediction gets incremented by

75
00:02:40,792 --> 00:02:41,822
theta j times xj.

76
00:02:41,822 --> 00:02:44,737
So, it's kind of this expression over here.

77
00:02:44,737 --> 00:02:47,223
By the way, I should mention in these

78
00:02:47,223 --> 00:02:48,894
vectors right over here, I

79
00:02:48,900 --> 00:02:51,102
had these vectors being 0 index.

80
00:02:51,110 --> 00:02:52,600
So, I had theta 0 theta 1,

81
00:02:52,600 --> 00:02:54,390
theta 2, but because MATLAB

82
00:02:54,390 --> 00:02:56,713
is one index, theta 0

83
00:02:56,713 --> 00:02:58,019
in MATLAB, we might

84
00:02:58,019 --> 00:03:00,204
end up representing as theta

85
00:03:00,204 --> 00:03:02,042
1 and this second element

86
00:03:02,042 --> 00:03:04,392
ends up as theta

87
00:03:04,392 --> 00:03:05,862
2 and this third element

88
00:03:05,880 --> 00:03:08,002
may end up as theta

89
00:03:08,002 --> 00:03:09,952
3 just because vectors in

90
00:03:09,960 --> 00:03:11,998
MATLAB are indexed starting

91
00:03:11,998 --> 00:03:13,525
from 1 even though our real

92
00:03:13,525 --> 00:03:15,436
theta and x here starting,

93
00:03:15,450 --> 00:03:17,002
indexing from 0, which

94
00:03:17,002 --> 00:03:18,785
is why here I have a for loop

95
00:03:18,785 --> 00:03:20,498
j goes from 1 through n+1

96
00:03:20,498 --> 00:03:22,225
rather than j go through

97
00:03:22,225 --> 00:03:26,243
0 up to n, right? But

98
00:03:26,300 --> 00:03:27,870
so, this is an

99
00:03:27,870 --> 00:03:29,571
unvectorized implementation in that we

100
00:03:29,571 --> 00:03:31,373
have a for loop that summing up

101
00:03:31,373 --> 00:03:34,018
the n elements of the sum.

102
00:03:34,050 --> 00:03:35,646
In contrast, here's how you

103
00:03:35,646 --> 00:03:38,400
write a vectorized implementation which

104
00:03:38,410 --> 00:03:39,959
is that you would think

105
00:03:39,959 --> 00:03:42,618
of x and theta

106
00:03:42,618 --> 00:03:43,955
as vectors, and you just set

107
00:03:43,955 --> 00:03:46,039
prediction equals theta transpose

108
00:03:46,039 --> 00:03:48,347
times x. You're just computing like so.

109
00:03:48,360 --> 00:03:51,011
Instead of writing all these

110
00:03:51,011 --> 00:03:52,966
lines of code with the for loop,

111
00:03:52,966 --> 00:03:54,242
you instead have one line

112
00:03:54,242 --> 00:03:56,648
of code and what this

113
00:03:56,648 --> 00:03:57,555
line of code on the right

114
00:03:57,555 --> 00:03:59,237
will do is it use

115
00:03:59,237 --> 00:04:01,829
Octaves highly optimized numerical

116
00:04:01,840 --> 00:04:03,859
linear algebra routines to compute

117
00:04:03,859 --> 00:04:06,245
this inner product between the

118
00:04:06,245 --> 00:04:08,186
two vectors, theta and X.
And not

119
00:04:08,190 --> 00:04:10,182
only is the vectorized implementation

120
00:04:10,182 --> 00:04:14,664
simpler, it will also run more efficiently.

121
00:04:15,820 --> 00:04:17,792
So, that was Octave, but

122
00:04:17,792 --> 00:04:19,912
issue of vectorization applies to

123
00:04:19,920 --> 00:04:22,020
other programming languages as well.

124
00:04:22,040 --> 00:04:24,947
Let's look at an example in C++.

125
00:04:24,947 --> 00:04:27,965
Here's what an unvectorized implementation might look like.

126
00:04:27,965 --> 00:04:31,395
We again initialize prediction, you know, to

127
00:04:31,395 --> 00:04:32,518
0.0 and then we now have a full

128
00:04:32,518 --> 00:04:34,508
loop for J0 up to

129
00:04:34,508 --> 00:04:36,819
n.  Prediction + equals

130
00:04:36,830 --> 00:04:38,546
theta j times x j where

131
00:04:38,560 --> 00:04:42,777
again, you have this x + for loop that you write yourself.

132
00:04:42,777 --> 00:04:44,843
In contrast, using a good

133
00:04:44,850 --> 00:04:46,498
numerical linear algebra library in

134
00:04:46,498 --> 00:04:48,965
C++, you could use

135
00:04:48,990 --> 00:04:54,440
write the function like or rather.

136
00:04:54,560 --> 00:04:56,533
In contrast, using a good

137
00:04:56,533 --> 00:04:58,152
numerical linear algebra library in

138
00:04:58,152 --> 00:05:00,686
C++, you can instead

139
00:05:00,686 --> 00:05:02,470
write code that might look like this.

140
00:05:02,470 --> 00:05:03,985
So, depending on the details

141
00:05:03,985 --> 00:05:05,595
of your numerical linear algebra

142
00:05:05,595 --> 00:05:06,790
library, you might be

143
00:05:06,830 --> 00:05:08,580
able to have an object that

144
00:05:08,580 --> 00:05:09,918
is a C++ object which is

145
00:05:09,918 --> 00:05:11,328
vector theta and a C++

146
00:05:11,350 --> 00:05:13,436
object which is a vector X,

147
00:05:13,436 --> 00:05:15,552
and you just take theta dot

148
00:05:15,552 --> 00:05:18,115
transpose times x where

149
00:05:18,120 --> 00:05:20,092
this times becomes C++ to

150
00:05:20,092 --> 00:05:22,028
overload the operator so

151
00:05:22,028 --> 00:05:26,156
that you can just multiply these two vectors in C++.

152
00:05:26,156 --> 00:05:28,091
And depending on, you know,  the details

153
00:05:28,110 --> 00:05:29,515
of your numerical and linear algebra

154
00:05:29,515 --> 00:05:30,855
library, you might end

155
00:05:30,855 --> 00:05:31,894
up using a slightly different and

156
00:05:31,894 --> 00:05:33,636
syntax, but by relying

157
00:05:33,636 --> 00:05:35,758
on a library to do this in a product.

158
00:05:35,760 --> 00:05:37,064
You can get a much simpler piece

159
00:05:37,064 --> 00:05:40,623
of code and a much more efficient one.

160
00:05:40,623 --> 00:05:43,582
Let's now look at a more sophisticated example.

161
00:05:43,582 --> 00:05:45,015
Just to remind you here's our

162
00:05:45,015 --> 00:05:46,792
update rule for gradient descent

163
00:05:46,792 --> 00:05:48,794
for linear regression and so,

164
00:05:48,794 --> 00:05:50,488
we update theta j using this

165
00:05:50,488 --> 00:05:53,672
rule for all values of J equals 0, 1, 2, and so on.

166
00:05:53,672 --> 00:05:56,259
And if I just write

167
00:05:56,260 --> 00:05:58,206
out these equations for

168
00:05:58,206 --> 00:06:00,048
theta 0 Theta one, theta two.

169
00:06:00,048 --> 00:06:02,173
Assuming we have two features.

170
00:06:02,173 --> 00:06:03,469
So N equals 2.

171
00:06:03,469 --> 00:06:04,607
Then these are the updates we

172
00:06:04,610 --> 00:06:07,388
perform to theta zero, theta one, theta two.

173
00:06:07,410 --> 00:06:08,982
where you might remember my

174
00:06:08,982 --> 00:06:10,825
saying in an earlier video

175
00:06:10,825 --> 00:06:14,783
that these should be simultaneous updates.

176
00:06:14,783 --> 00:06:16,268
So let's see if

177
00:06:16,268 --> 00:06:17,725
we can come up with a

178
00:06:17,725 --> 00:06:20,723
vectorized implementation of this.

179
00:06:20,740 --> 00:06:22,598
Here are my same 3 equations written

180
00:06:22,598 --> 00:06:24,182
on a slightly smaller font and you

181
00:06:24,182 --> 00:06:25,517
can imagine that 1 wait

182
00:06:25,520 --> 00:06:26,716
to implement this three lines

183
00:06:26,720 --> 00:06:27,798
of code is to have a

184
00:06:27,798 --> 00:06:28,968
for loop that says, you

185
00:06:28,968 --> 00:06:31,682
know, for j equals 0,

186
00:06:31,682 --> 00:06:33,305
1 through 2 the update

187
00:06:33,305 --> 00:06:35,603
theta J or something like that.

188
00:06:35,603 --> 00:06:36,760
But instead, let's come up

189
00:06:36,760 --> 00:06:40,975
with a vectorized implementation and see if we can have a simpler way.

190
00:06:40,975 --> 00:06:42,711
So, basically compress these three

191
00:06:42,757 --> 00:06:44,314
lines of code or a

192
00:06:44,314 --> 00:06:48,518
for loop that, you know, effectively does these 3 sets, 1 set at a time.

193
00:06:48,518 --> 00:06:49,688
Let's see who can these 3

194
00:06:49,688 --> 00:06:51,402
steps and compress them into

195
00:06:51,402 --> 00:06:53,972
1 line of vectorized code.

196
00:06:53,976 --> 00:06:55,476
Here's the idea.

197
00:06:55,480 --> 00:06:56,462
What I'm going to do is I'm

198
00:06:56,462 --> 00:06:59,131
going to think of theta

199
00:06:59,131 --> 00:07:00,633
as a vector and I'm

200
00:07:00,633 --> 00:07:04,214
going to update theta as theta

201
00:07:04,270 --> 00:07:07,468
minus alpha times some

202
00:07:07,468 --> 00:07:11,650
other vector, delta, where

203
00:07:11,650 --> 00:07:13,689
delta is going to be

204
00:07:13,700 --> 00:07:15,876
equal to 1 over

205
00:07:15,876 --> 00:07:18,408
m, sum from I equals

206
00:07:18,450 --> 00:07:22,151
one through m and then

207
00:07:22,180 --> 00:07:25,570
this term on the

208
00:07:25,720 --> 00:07:28,118
right, okay?

209
00:07:28,118 --> 00:07:31,205
So, let me explain what's going on here.

210
00:07:31,220 --> 00:07:32,666
Here, I'm going to treat

211
00:07:32,666 --> 00:07:35,322
theta as a vector

212
00:07:35,350 --> 00:07:38,106
so, there's an N+1 dimensional vector.

213
00:07:38,110 --> 00:07:40,291
I'm saying that theta gets, you know, updated

214
00:07:40,310 --> 00:07:43,922
as--that's the vector, our N+1.

215
00:07:43,922 --> 00:07:45,319
Alpha is a real

216
00:07:45,319 --> 00:07:47,395
number and delta

217
00:07:47,410 --> 00:07:49,941
here is a vector.

218
00:07:49,960 --> 00:07:54,278
So, this subtraction operation, that's a vector subtraction.

219
00:07:54,278 --> 00:07:55,255
Okay?

220
00:07:55,255 --> 00:07:56,977
Because alpha times delta

221
00:07:56,977 --> 00:07:58,385
is a vector and so

222
00:07:58,385 --> 00:08:00,369
I'm saying if theta gets, you know, this

223
00:08:00,369 --> 00:08:04,217
vector, alpha times delta subtracted from it.

224
00:08:04,240 --> 00:08:06,563
So, what is the vector delta?

225
00:08:06,563 --> 00:08:10,220
Well, this vector delta looks like this.

226
00:08:10,256 --> 00:08:12,092
And what this meant to

227
00:08:12,092 --> 00:08:14,595
be is really meant to be

228
00:08:14,620 --> 00:08:17,102
this thing over here.

229
00:08:17,140 --> 00:08:19,200
Concretely, delta will be

230
00:08:19,220 --> 00:08:22,165
a N+1 dimensional vector and

231
00:08:22,165 --> 00:08:23,978
the very first element of

232
00:08:23,978 --> 00:08:27,767
the vector delta is going to be equal to that.

233
00:08:27,770 --> 00:08:29,513
So, if we have

234
00:08:29,513 --> 00:08:31,565
the delta, you know, if we index it

235
00:08:31,565 --> 00:08:34,469
from 0--this is delta 0, delta 1, delta 2.

236
00:08:34,469 --> 00:08:36,541
What I want is that

237
00:08:36,560 --> 00:08:39,033
delta 0 is equal

238
00:08:39,040 --> 00:08:41,267
to, you know, this

239
00:08:41,267 --> 00:08:42,359
first box also green up

240
00:08:42,360 --> 00:08:45,306
above and indeed, you might

241
00:08:45,306 --> 00:08:47,108
be able to convince yourself that delta

242
00:08:47,108 --> 00:08:48,681
0 is this 1 of m,

243
00:08:48,681 --> 00:08:50,102
sum of, you know, h of

244
00:08:50,102 --> 00:08:53,356
x.   xi minus

245
00:08:53,400 --> 00:08:58,315
yi times xi0.

246
00:08:58,315 --> 00:08:59,748
So, let's just make

247
00:08:59,748 --> 00:09:01,064
sure that we're on the

248
00:09:01,064 --> 00:09:03,998
same page about how delta really is computed.

249
00:09:03,998 --> 00:09:05,488
Delta is one of m

250
00:09:05,488 --> 00:09:08,284
times the sum over here

251
00:09:08,284 --> 00:09:09,871
and, you know, what is this sum?

252
00:09:09,871 --> 00:09:11,426
Well, this term over

253
00:09:11,426 --> 00:09:17,115
here, that's a real number.

254
00:09:17,150 --> 00:09:21,219
And the second term over here, xi.

255
00:09:21,219 --> 00:09:23,892
This term over there is a

256
00:09:23,910 --> 00:09:26,109
vector, right? Because xi might

257
00:09:26,109 --> 00:09:26,982
be a vector.

258
00:09:26,990 --> 00:09:29,630
That would be

259
00:09:29,975 --> 00:09:36,115
xi0, xi1, xi2 right?

260
00:09:36,130 --> 00:09:38,246
And what is the summation?

261
00:09:38,246 --> 00:09:40,241
Well, what does summation say

262
00:09:40,250 --> 00:09:43,292
is that this term

263
00:09:43,502 --> 00:09:46,555
over here.

264
00:09:47,280 --> 00:09:54,801
This is equal to h+x1-y1 times

265
00:09:54,870 --> 00:09:59,099
x1 + h of

266
00:09:59,115 --> 00:10:02,778
x2-y2 times x2

267
00:10:02,778 --> 00:10:05,396
+ you know, and so on.

268
00:10:05,396 --> 00:10:06,404
Okay?

269
00:10:06,404 --> 00:10:07,420
Because this is a summation of

270
00:10:07,420 --> 00:10:09,013
the I. So, as I

271
00:10:09,013 --> 00:10:11,345
ranges from I1 through m,

272
00:10:11,345 --> 00:10:15,144
you get these different terms and you're summing up these terms.

273
00:10:15,160 --> 00:10:16,221
And the meaning of each of these

274
00:10:16,221 --> 00:10:18,262
terms is a lot like

275
00:10:18,262 --> 00:10:19,807
- if you remember actually from

276
00:10:19,807 --> 00:10:24,100
the earlier quiz in this, if you solve this equation.

277
00:10:24,110 --> 00:10:25,560
We said that in order to

278
00:10:25,560 --> 00:10:27,250
vectorize this code, we

279
00:10:27,250 --> 00:10:30,755
will instead set u2v+5w. So,

280
00:10:30,770 --> 00:10:32,391
we're saying that the vector u

281
00:10:32,391 --> 00:10:33,706
is equal to 2 times

282
00:10:33,706 --> 00:10:35,568
the vector v plus 5 times

283
00:10:35,570 --> 00:10:37,198
the vector w. So, just an

284
00:10:37,198 --> 00:10:39,023
example of how to

285
00:10:39,023 --> 00:10:42,453
add different vectors and this summation is the same thing.

286
00:10:42,453 --> 00:10:44,919
It's a saying that this

287
00:10:44,950 --> 00:10:49,766
summation over here is just some real number right?

288
00:10:49,840 --> 00:10:50,996
That's kind of like the number

289
00:10:51,010 --> 00:10:52,698
2 and some other number

290
00:10:52,711 --> 00:10:54,085
times the vector x1.

291
00:10:54,085 --> 00:10:56,792
This is like 2 times v instead

292
00:10:56,792 --> 00:10:59,177
with some other number times x1

293
00:10:59,177 --> 00:11:01,712
and then plus, you know, instead of

294
00:11:01,712 --> 00:11:03,475
5xw, we instead have some

295
00:11:03,475 --> 00:11:05,212
other real number plus some

296
00:11:05,212 --> 00:11:06,850
other vector and then you

297
00:11:06,860 --> 00:11:08,909
add on other vectors, you know,

298
00:11:08,909 --> 00:11:10,528
plus ... plus the other

299
00:11:10,540 --> 00:11:12,234
vectors, which is why

300
00:11:12,234 --> 00:11:15,178
overall, this thing

301
00:11:15,178 --> 00:11:17,015
over here, that whole

302
00:11:17,015 --> 00:11:19,745
quantity, that delta is

303
00:11:19,770 --> 00:11:23,685
just some vector, and concretely, the

304
00:11:23,685 --> 00:11:26,373
3 elements of delta correspond

305
00:11:26,373 --> 00:11:28,813
if n2, the 3 elements

306
00:11:28,820 --> 00:11:31,512
of delta correspond exactly to

307
00:11:31,512 --> 00:11:33,349
this thing to the second

308
00:11:33,349 --> 00:11:35,075
thing and this third

309
00:11:35,075 --> 00:11:36,401
thing, which is why

310
00:11:36,410 --> 00:11:38,299
when you update theta, according to

311
00:11:38,299 --> 00:11:40,979
theta minus alpha delta,

312
00:11:41,010 --> 00:11:42,760
we end up having exactly the

313
00:11:42,830 --> 00:11:44,948
same simultaneous updates as the

314
00:11:44,960 --> 00:11:47,825
update rules that we have on top.

315
00:11:47,840 --> 00:11:48,960
So, I know that there

316
00:11:48,960 --> 00:11:50,466
was a lot that happened on

317
00:11:50,500 --> 00:11:52,608
the slides, but again, feel

318
00:11:52,650 --> 00:11:54,489
free to pause the video and

319
00:11:54,510 --> 00:11:56,592
I either encourage you to

320
00:11:56,592 --> 00:11:58,247
step through the difference. If

321
00:11:58,247 --> 00:11:59,451
you're unsure of what just happen,

322
00:11:59,451 --> 00:12:01,719
I encourage you to step through

323
00:12:01,719 --> 00:12:02,940
the slide to make sure you

324
00:12:02,940 --> 00:12:04,578
understand why is it

325
00:12:04,580 --> 00:12:07,048
that this update here with

326
00:12:07,060 --> 00:12:09,612
this definition of delta, right?

327
00:12:09,612 --> 00:12:10,943
Why is it that that equal

328
00:12:10,943 --> 00:12:13,714
to this update on top and

329
00:12:13,714 --> 00:12:15,033
it's still not clear when insight is

330
00:12:15,033 --> 00:12:18,395
that, you know, this thing over here.

331
00:12:18,400 --> 00:12:20,628
That's exactly the vector

332
00:12:20,628 --> 00:12:22,109
x and so, we're

333
00:12:22,109 --> 00:12:23,342
just taking, you know, all

334
00:12:23,342 --> 00:12:25,516
3 of these computations and compressing

335
00:12:25,516 --> 00:12:27,106
them into one step

336
00:12:27,106 --> 00:12:29,778
with the this vector delta,

337
00:12:29,778 --> 00:12:31,292
which is why we can come

338
00:12:31,292 --> 00:12:33,465
up with a vectorized implementation of

339
00:12:33,490 --> 00:12:36,942
this step of linear regression this way.

340
00:12:36,942 --> 00:12:38,639
So I hope this

341
00:12:38,660 --> 00:12:40,660
step makes sense, and do

342
00:12:40,660 --> 00:12:41,791
look at the video and make

343
00:12:41,791 --> 00:12:44,013
sure and see if you can understand it.

344
00:12:44,013 --> 00:12:46,058
In case you don't understand The

345
00:12:46,058 --> 00:12:48,029
equivalence of this math if

346
00:12:48,029 --> 00:12:49,435
you implement this, this turns

347
00:12:49,435 --> 00:12:50,944
out to be the right answer anyway,

348
00:12:50,944 --> 00:12:52,224
so even if you didn't

349
00:12:52,224 --> 00:12:56,403
quite understand the equivalence, if you just implement it this way,

350
00:12:56,410 --> 00:12:58,992
you'll be able to get linear regressions to work.

351
00:12:58,992 --> 00:13:00,663
So, if you're able to

352
00:13:00,663 --> 00:13:02,216
figure out why these 2 steps

353
00:13:02,216 --> 00:13:04,122
are equivalent then hopefully that

354
00:13:04,122 --> 00:13:06,239
would give you a better understanding of vectorization

355
00:13:06,239 --> 00:13:10,121
as well, and finally,

356
00:13:10,121 --> 00:13:12,355
if you're implementing linear

357
00:13:12,370 --> 00:13:14,872
regression using more than one or two features.

358
00:13:14,872 --> 00:13:16,548
So, sometimes we use linear

359
00:13:16,550 --> 00:13:18,078
regression with tens or hundreds

360
00:13:18,078 --> 00:13:19,968
thousands of features, but if

361
00:13:19,980 --> 00:13:21,853
you use the vectorized implementation

362
00:13:21,853 --> 00:13:23,735
of linear regression, usually that

363
00:13:23,735 --> 00:13:25,605
will run much faster than if

364
00:13:25,605 --> 00:13:26,892
you had say your old

365
00:13:26,892 --> 00:13:28,163
for loop that was you

366
00:13:28,163 --> 00:13:31,485
know, updating theta 0 then theta 1 then theta 2 yourself.

367
00:13:31,500 --> 00:13:33,769
So, using a vectorized implementation, you

368
00:13:33,769 --> 00:13:34,688
should be able to get a

369
00:13:34,688 --> 00:13:37,762
much more efficient implementation of linear regression.

370
00:13:37,790 --> 00:13:39,347
And when you vectorize later

371
00:13:39,347 --> 00:13:40,430
algorithms that we'll see in

372
00:13:40,430 --> 00:13:41,554
this class is a good

373
00:13:41,554 --> 00:13:43,367
trick whether an octave

374
00:13:43,367 --> 00:13:44,767
or some of the language, the C++

375
00:13:44,767 --> 00:13:48,474
Java for getting your code to run more efficiently.