1 00:00:00,280 --> 00:00:04,479 In this video, I'd like to tell you about the idea of vectorization. 2 00:00:04,480 --> 00:00:06,471 So, whether you're using Octave 3 00:00:06,471 --> 00:00:08,277 or a similar language like MATLAB, 4 00:00:08,277 --> 00:00:09,604 or whether you're using Python 5 00:00:09,604 --> 00:00:12,520 and NumPy, or Java, C, or C++, 6 00:00:12,520 --> 00:00:14,850 all of these languages have either 7 00:00:14,850 --> 00:00:16,708 built into them or have 8 00:00:16,720 --> 00:00:19,439 readily and easily accessible, different 9 00:00:19,439 --> 00:00:21,806 numerical linear algebra libraries. 10 00:00:21,820 --> 00:00:23,335 They're usually very well written, 11 00:00:23,335 --> 00:00:25,695 highly optimized, often developed by 12 00:00:25,695 --> 00:00:29,181 people that, you know, have PhDs in numerical computing or 13 00:00:29,181 --> 00:00:32,075 who really specialize in numerical computing. 14 00:00:32,075 --> 00:00:33,944 And when you're implementing machine 15 00:00:33,960 --> 00:00:35,904 learning algorithms, if you're able 16 00:00:35,930 --> 00:00:37,797 to take advantage of these 17 00:00:37,810 --> 00:00:39,296 linear algebra libraries or these 18 00:00:39,310 --> 00:00:41,600 numerical linear algebra libraries and 19 00:00:41,620 --> 00:00:43,387 make routine calls to them 20 00:00:43,387 --> 00:00:45,172 rather than sort of write code 21 00:00:45,180 --> 00:00:48,029 yourself to do things that these libraries could be doing, 22 00:00:48,040 --> 00:00:49,612 if you do that, then 23 00:00:49,612 --> 00:00:51,872 often you get code that, first, is more efficient. 24 00:00:51,880 --> 00:00:53,179 So, it will just run more quickly and 25 00:00:53,179 --> 00:00:54,891 take better advantage of 26 00:00:54,891 --> 00:00:56,631 any parallel hardware your computer 27 00:00:56,631 --> 00:00:58,254 may have and so on. 
28 00:00:58,270 --> 00:01:00,533 And second, it also means 29 00:01:00,540 --> 00:01:03,075 that you end up with less code that you need to write. 30 00:01:03,075 --> 00:01:04,962 So you have a simpler implementation 31 00:01:04,962 --> 00:01:08,532 that is, therefore, maybe also more likely to be bug free. 32 00:01:08,550 --> 00:01:10,534 And as a concrete example, 33 00:01:10,570 --> 00:01:12,726 rather than writing code 34 00:01:12,726 --> 00:01:15,061 yourself to multiply matrices, if 35 00:01:15,061 --> 00:01:16,300 you let Octave do it by 36 00:01:16,300 --> 00:01:18,145 typing A times B, 37 00:01:18,145 --> 00:01:19,833 that will use a very efficient 38 00:01:19,833 --> 00:01:22,318 routine to multiply the 2 matrices. 39 00:01:22,340 --> 00:01:23,985 And there's a bunch of examples like 40 00:01:24,010 --> 00:01:27,220 these where, if you use appropriate vectorized implementations, 41 00:01:27,220 --> 00:01:30,062 you get much simpler code, and much more efficient code. 42 00:01:30,280 --> 00:01:33,071 Let's look at some examples. 43 00:01:33,071 --> 00:01:34,937 Here's our usual hypothesis for linear 44 00:01:34,937 --> 00:01:36,415 regression, and if you 45 00:01:36,415 --> 00:01:37,348 want to compute h of 46 00:01:37,348 --> 00:01:40,032 x, notice that there is a sum on the right. 47 00:01:40,032 --> 00:01:41,130 And so one thing you could 48 00:01:41,130 --> 00:01:42,775 do is compute the sum 49 00:01:42,775 --> 00:01:46,611 from j equals 0 to j equals n yourself. 
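The matrix-multiplication example mentioned above can be sketched as follows. The lecture's slides use Octave; this sketch assumes Python with NumPy instead, one of the other options named at the start of the video. The function name `matmul_loops` and the 2x2 matrices are hypothetical, chosen only for illustration.

```python
import numpy as np


def matmul_loops(a, b):
    """Multiply two matrices with hand-written loops (the unvectorized way)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
    return result


# Hypothetical 2x2 matrices, chosen only for illustration.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

product_loops = matmul_loops(A.tolist(), B.tolist())
product_library = A @ B  # one call into NumPy's matrix-multiplication routine
```

Both versions give the same product, but the second is a single expression and leaves the looping to the library.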
50 00:01:46,620 --> 00:01:48,000 Another way to think of this 51 00:01:48,000 --> 00:01:49,210 is to think of h 52 00:01:49,210 --> 00:01:52,029 of x as theta transpose x, 53 00:01:52,029 --> 00:01:53,262 and what you can do is 54 00:01:53,262 --> 00:01:55,654 think of this as, you know, computing this 55 00:01:55,660 --> 00:01:57,823 inner product between 2 vectors, 56 00:01:57,840 --> 00:02:00,135 where theta is, you know, your 57 00:02:00,135 --> 00:02:01,784 vector, say theta 0, theta 1, 58 00:02:01,800 --> 00:02:04,812 theta 2 if you have 2 features, 59 00:02:04,812 --> 00:02:06,410 if n equals 2, and if 60 00:02:06,450 --> 00:02:08,133 you think of x as this 61 00:02:08,133 --> 00:02:11,810 vector, x0, x1, x2, 62 00:02:11,884 --> 00:02:13,952 and these 2 views can 63 00:02:13,952 --> 00:02:17,539 give you 2 different implementations. 64 00:02:17,560 --> 00:02:18,909 Here's what I mean. 65 00:02:18,909 --> 00:02:21,012 Here's an unvectorized implementation for 66 00:02:21,040 --> 00:02:22,454 how to compute h of 67 00:02:22,454 --> 00:02:26,120 x, and by unvectorized I mean without vectorization. 68 00:02:26,130 --> 00:02:29,479 We might first initialize, you know, prediction to be 0.0. 69 00:02:29,479 --> 00:02:32,383 Eventually, the 70 00:02:32,383 --> 00:02:34,287 prediction is going to be 71 00:02:34,300 --> 00:02:36,090 h of x, and then 72 00:02:36,090 --> 00:02:37,258 I'm going to have a for loop for 73 00:02:37,270 --> 00:02:38,354 j equals 1 through n+1, where 74 00:02:38,354 --> 00:02:40,792 prediction gets incremented by 75 00:02:40,792 --> 00:02:41,822 theta j times x j. 76 00:02:41,822 --> 00:02:44,737 So, it's kind of this expression over here. 77 00:02:44,737 --> 00:02:47,223 By the way, I should mention in these 78 00:02:47,223 --> 00:02:48,894 vectors right over here, I 79 00:02:48,900 --> 00:02:51,102 had these vectors being 0-indexed. 
80 00:02:51,110 --> 00:02:52,600 So, I had theta 0, theta 1, 81 00:02:52,600 --> 00:02:54,390 theta 2, but because MATLAB 82 00:02:54,390 --> 00:02:56,713 is 1-indexed, theta 0 83 00:02:56,713 --> 00:02:58,019 in MATLAB, we might 84 00:02:58,019 --> 00:03:00,204 end up representing as theta 85 00:03:00,204 --> 00:03:02,042 1, and this second element 86 00:03:02,042 --> 00:03:04,392 ends up as theta 87 00:03:04,392 --> 00:03:05,862 2, and this third element 88 00:03:05,880 --> 00:03:08,002 may end up as theta 89 00:03:08,002 --> 00:03:09,952 3, just because vectors in 90 00:03:09,960 --> 00:03:11,998 MATLAB are indexed starting 91 00:03:11,998 --> 00:03:13,525 from 1 even though our real 92 00:03:13,525 --> 00:03:15,436 theta and x here are 93 00:03:15,450 --> 00:03:17,002 indexed starting from 0, which 94 00:03:17,002 --> 00:03:18,785 is why here I have a for loop 95 00:03:18,785 --> 00:03:20,498 where j goes from 1 through n+1 96 00:03:20,498 --> 00:03:22,225 rather than j going from 97 00:03:22,225 --> 00:03:26,243 0 up to n, right? But 98 00:03:26,300 --> 00:03:27,870 so, this is an 99 00:03:27,870 --> 00:03:29,571 unvectorized implementation in that we 100 00:03:29,571 --> 00:03:31,373 have a for loop that is summing up 101 00:03:31,373 --> 00:03:34,018 the n+1 terms of the sum. 102 00:03:34,050 --> 00:03:35,646 In contrast, here's how you 103 00:03:35,646 --> 00:03:38,400 write a vectorized implementation, which 104 00:03:38,410 --> 00:03:39,959 is that you would think 105 00:03:39,959 --> 00:03:42,618 of x and theta 106 00:03:42,618 --> 00:03:43,955 as vectors, and you just set 107 00:03:43,955 --> 00:03:46,039 prediction equals theta transpose 108 00:03:46,039 --> 00:03:48,347 times x. You're just computing it like so. 
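The two implementations of h of x described above can be sketched side by side. The slides show Octave; this is an analogous sketch that assumes Python with NumPy, where vectors are 0-indexed, so the loop runs over j = 0 through n directly. The function names and the sample values of theta and x are hypothetical.

```python
import numpy as np


def predict_loop(theta, x):
    """Unvectorized h(x): accumulate theta_j * x_j one term at a time."""
    prediction = 0.0
    for j in range(len(theta)):  # j = 0..n, since Python is 0-indexed
        prediction += theta[j] * x[j]
    return prediction


def predict_vectorized(theta, x):
    """Vectorized h(x): a single inner product (theta' * x in Octave)."""
    return float(theta @ x)


# Hypothetical values with n = 2 features, so both vectors have 3 elements.
theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 4.0, 5.0])

print(predict_loop(theta, x), predict_vectorized(theta, x))
```

The loop and the inner product compute the same sum; the vectorized version simply hands the summation to the library.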
109 00:03:48,360 --> 00:03:51,011 Instead of writing all these 110 00:03:51,011 --> 00:03:52,966 lines of code with the for loop, 111 00:03:52,966 --> 00:03:54,242 you instead have one line 112 00:03:54,242 --> 00:03:56,648 of code, and what this 113 00:03:56,648 --> 00:03:57,555 line of code on the right 114 00:03:57,555 --> 00:03:59,237 will do is use 115 00:03:59,237 --> 00:04:01,829 Octave's highly optimized numerical 116 00:04:01,840 --> 00:04:03,859 linear algebra routines to compute 117 00:04:03,859 --> 00:04:06,245 this inner product between the 118 00:04:06,245 --> 00:04:08,186 two vectors, theta and x. And not 119 00:04:08,190 --> 00:04:10,182 only is the vectorized implementation 120 00:04:10,182 --> 00:04:14,664 simpler, it will also run more efficiently. 121 00:04:15,820 --> 00:04:17,792 So, that was Octave, but the 122 00:04:17,792 --> 00:04:19,912 idea of vectorization applies to 123 00:04:19,920 --> 00:04:22,020 other programming languages as well. 124 00:04:22,040 --> 00:04:24,947 Let's look at an example in C++. 125 00:04:24,947 --> 00:04:27,965 Here's what an unvectorized implementation might look like. 126 00:04:27,965 --> 00:04:31,395 We again initialize prediction, you know, to 127 00:04:31,395 --> 00:04:32,518 0.0, and then we now have a for 128 00:04:32,518 --> 00:04:34,508 loop for j equals 0 up to 129 00:04:34,508 --> 00:04:36,819 n, with prediction plus-equals 130 00:04:36,830 --> 00:04:38,546 theta j times x j, where 131 00:04:38,560 --> 00:04:42,777 again, you have this explicit for loop that you write yourself. 132 00:04:42,777 --> 00:04:44,843 In contrast, using a good 133 00:04:44,850 --> 00:04:46,498 numerical linear algebra library in 134 00:04:46,498 --> 00:04:48,965 C++, you could 135 00:04:48,990 --> 00:04:54,440 write the function like... or rather, 
136 00:04:54,560 --> 00:04:56,533 in contrast, using a good 137 00:04:56,533 --> 00:04:58,152 numerical linear algebra library in 138 00:04:58,152 --> 00:05:00,686 C++, you can instead 139 00:05:00,686 --> 00:05:02,470 write code that might look like this. 140 00:05:02,470 --> 00:05:03,985 So, depending on the details 141 00:05:03,985 --> 00:05:05,595 of your numerical linear algebra 142 00:05:05,595 --> 00:05:06,790 library, you might be 143 00:05:06,830 --> 00:05:08,580 able to have an object that 144 00:05:08,580 --> 00:05:09,918 is a C++ object which is a 145 00:05:09,918 --> 00:05:11,328 vector theta and a C++ 146 00:05:11,350 --> 00:05:13,436 object which is a vector x, 147 00:05:13,436 --> 00:05:15,552 and you just take theta dot 148 00:05:15,552 --> 00:05:18,115 transpose times x, where 149 00:05:18,120 --> 00:05:20,092 this times is a C++ 150 00:05:20,092 --> 00:05:22,028 overloaded operator, so 151 00:05:22,028 --> 00:05:26,156 that you can just multiply these two vectors in C++. 152 00:05:26,156 --> 00:05:28,091 And depending on, you know, the details 153 00:05:28,110 --> 00:05:29,515 of your numerical linear algebra 154 00:05:29,515 --> 00:05:30,855 library, you might end 155 00:05:30,855 --> 00:05:31,894 up using a slightly different 156 00:05:31,894 --> 00:05:33,636 syntax, but by relying 157 00:05:33,636 --> 00:05:35,758 on a library to do this inner product, 158 00:05:35,760 --> 00:05:37,064 you can get a much simpler piece 159 00:05:37,064 --> 00:05:40,623 of code and a much more efficient one. 160 00:05:40,623 --> 00:05:43,582 Let's now look at a more sophisticated example. 161 00:05:43,582 --> 00:05:45,015 Just to remind you, here's our 162 00:05:45,015 --> 00:05:46,792 update rule for gradient descent 163 00:05:46,792 --> 00:05:48,794 for linear regression, and so 164 00:05:48,794 --> 00:05:50,488 we update theta j using this 165 00:05:50,488 --> 00:05:53,672 rule for all values of j equals 0, 1, 2, and so on. 
166 00:05:53,672 --> 00:05:56,259 And if I just write 167 00:05:56,260 --> 00:05:58,206 out these equations for 168 00:05:58,206 --> 00:06:00,048 theta 0, theta 1, theta 2, 169 00:06:00,048 --> 00:06:02,173 assuming we have two features, 170 00:06:02,173 --> 00:06:03,469 so n equals 2, 171 00:06:03,469 --> 00:06:04,607 then these are the updates we 172 00:06:04,610 --> 00:06:07,388 perform to theta 0, theta 1, theta 2, 173 00:06:07,410 --> 00:06:08,982 where you might remember my 174 00:06:08,982 --> 00:06:10,825 saying in an earlier video 175 00:06:10,825 --> 00:06:14,783 that these should be simultaneous updates. 176 00:06:14,783 --> 00:06:16,268 So let's see if 177 00:06:16,268 --> 00:06:17,725 we can come up with a 178 00:06:17,725 --> 00:06:20,723 vectorized implementation of this. 179 00:06:20,740 --> 00:06:22,598 Here are my same 3 equations written 180 00:06:22,598 --> 00:06:24,182 in a slightly smaller font, and you 181 00:06:24,182 --> 00:06:25,517 can imagine that one way 182 00:06:25,520 --> 00:06:26,716 to implement these three lines 183 00:06:26,720 --> 00:06:27,798 of code is to have a 184 00:06:27,798 --> 00:06:28,968 for loop that says, you 185 00:06:28,968 --> 00:06:31,682 know, for j equals 0, 186 00:06:31,682 --> 00:06:33,305 1, 2, update 187 00:06:33,305 --> 00:06:35,603 theta j, or something like that. 188 00:06:35,603 --> 00:06:36,760 But instead, let's come up 189 00:06:36,760 --> 00:06:40,975 with a vectorized implementation and see if we can have a simpler way. 190 00:06:40,975 --> 00:06:42,711 So, basically compress these three 191 00:06:42,757 --> 00:06:44,314 lines of code or a 192 00:06:44,314 --> 00:06:48,518 for loop that, you know, effectively does these 3 steps, 1 step at a time. 193 00:06:48,518 --> 00:06:49,688 Let's see if we can take these 3 194 00:06:49,688 --> 00:06:51,402 steps and compress them into 195 00:06:51,402 --> 00:06:53,972 1 line of vectorized code. 196 00:06:53,976 --> 00:06:55,476 Here's the idea. 
197 00:06:55,480 --> 00:06:56,462 What I'm going to do is I'm 198 00:06:56,462 --> 00:06:59,131 going to think of theta 199 00:06:59,131 --> 00:07:00,633 as a vector, and I'm 200 00:07:00,633 --> 00:07:04,214 going to update theta as theta 201 00:07:04,270 --> 00:07:07,468 minus alpha times some 202 00:07:07,468 --> 00:07:11,650 other vector, delta, where 203 00:07:11,650 --> 00:07:13,689 delta is going to be 204 00:07:13,700 --> 00:07:15,876 equal to 1 over 205 00:07:15,876 --> 00:07:18,408 m, sum from i equals 206 00:07:18,450 --> 00:07:22,151 1 through m, and then 207 00:07:22,180 --> 00:07:25,570 this term on the 208 00:07:25,720 --> 00:07:28,118 right, okay? 209 00:07:28,118 --> 00:07:31,205 So, let me explain what's going on here. 210 00:07:31,220 --> 00:07:32,666 Here, I'm going to treat 211 00:07:32,666 --> 00:07:35,322 theta as a vector, 212 00:07:35,350 --> 00:07:38,106 so this is an n+1 dimensional vector. 213 00:07:38,110 --> 00:07:40,291 I'm saying that theta gets, you know, updated 214 00:07:40,310 --> 00:07:43,922 as... that's a vector in R^(n+1). 215 00:07:43,922 --> 00:07:45,319 Alpha is a real 216 00:07:45,319 --> 00:07:47,395 number, and delta 217 00:07:47,410 --> 00:07:49,941 here is a vector. 218 00:07:49,960 --> 00:07:54,278 So, this subtraction operation, that's a vector subtraction. 219 00:07:54,278 --> 00:07:55,255 Okay? 220 00:07:55,255 --> 00:07:56,977 Because alpha times delta 221 00:07:56,977 --> 00:07:58,385 is a vector, and so 222 00:07:58,385 --> 00:08:00,369 I'm saying that theta gets, you know, this 223 00:08:00,369 --> 00:08:04,217 vector, alpha times delta, subtracted from it. 224 00:08:04,240 --> 00:08:06,563 So, what is the vector delta? 225 00:08:06,563 --> 00:08:10,220 Well, this vector delta looks like this. 226 00:08:10,256 --> 00:08:12,092 And what this is meant to 227 00:08:12,092 --> 00:08:14,595 be is really 228 00:08:14,620 --> 00:08:17,102 this thing over here. 
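Written out, the update described above and the definition of delta are as follows. This is a reconstruction from the spoken description, since the slide itself is not part of the transcript; the superscript notation for the i-th training example is an assumption about how the slide writes it.

```latex
\theta := \theta - \alpha\,\delta,
\qquad
\delta = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x^{(i)},
\qquad
\theta,\ \delta \in \mathbb{R}^{n+1},\quad \alpha \in \mathbb{R}
```

Each term of the sum is a real number, $h_\theta(x^{(i)}) - y^{(i)}$, multiplying a vector, $x^{(i)}$, so the sum itself is a vector with $n+1$ elements.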
229 00:08:17,140 --> 00:08:19,200 Concretely, delta will be 230 00:08:19,220 --> 00:08:22,165 an n+1 dimensional vector, and 231 00:08:22,165 --> 00:08:23,978 the very first element of 232 00:08:23,978 --> 00:08:27,767 the vector delta is going to be equal to that. 233 00:08:27,770 --> 00:08:29,513 So, if we have 234 00:08:29,513 --> 00:08:31,565 the delta, you know, if we index it 235 00:08:31,565 --> 00:08:34,469 from 0, this is delta 0, delta 1, delta 2. 236 00:08:34,469 --> 00:08:36,541 What I want is that 237 00:08:36,560 --> 00:08:39,033 delta 0 is equal 238 00:08:39,040 --> 00:08:41,267 to, you know, this 239 00:08:41,267 --> 00:08:42,359 first box in green up 240 00:08:42,360 --> 00:08:45,306 above, and indeed, you might 241 00:08:45,306 --> 00:08:47,108 be able to convince yourself that delta 242 00:08:47,108 --> 00:08:48,681 0 is this 1 over m, 243 00:08:48,681 --> 00:08:50,102 sum of, you know, h of 244 00:08:50,102 --> 00:08:53,356 x(i) minus 245 00:08:53,400 --> 00:08:58,315 y(i), times x(i)_0. 246 00:08:58,315 --> 00:08:59,748 So, let's just make 247 00:08:59,748 --> 00:09:01,064 sure that we're on the 248 00:09:01,064 --> 00:09:03,998 same page about how delta really is computed. 249 00:09:03,998 --> 00:09:05,488 Delta is 1 over m 250 00:09:05,488 --> 00:09:08,284 times the sum over here, 251 00:09:08,284 --> 00:09:09,871 and, you know, what is this sum? 252 00:09:09,871 --> 00:09:11,426 Well, this term over 253 00:09:11,426 --> 00:09:17,115 here, that's a real number. 254 00:09:17,150 --> 00:09:21,219 And the second term over here, x(i), 255 00:09:21,219 --> 00:09:23,892 this term over there is a 256 00:09:23,910 --> 00:09:26,109 vector, right? Because x(i) might 257 00:09:26,109 --> 00:09:26,982 be a vector. 258 00:09:26,990 --> 00:09:29,630 That would be 259 00:09:29,975 --> 00:09:36,115 x(i)_0, x(i)_1, x(i)_2, right? 260 00:09:36,130 --> 00:09:38,246 And what is the summation? 
261 00:09:38,246 --> 00:09:40,241 Well, what the summation says 262 00:09:40,250 --> 00:09:43,292 is that this term 263 00:09:43,502 --> 00:09:46,555 over here, 264 00:09:47,280 --> 00:09:54,801 this is equal to (h of x(1) minus y(1)) times 265 00:09:54,870 --> 00:09:59,099 x(1), plus (h of 266 00:09:59,115 --> 00:10:02,778 x(2) minus y(2)) times x(2), 267 00:10:02,778 --> 00:10:05,396 plus, you know, and so on. 268 00:10:05,396 --> 00:10:06,404 Okay? 269 00:10:06,404 --> 00:10:07,420 Because this is a summation over 270 00:10:07,420 --> 00:10:09,013 i. So, as i 271 00:10:09,013 --> 00:10:11,345 ranges from i equals 1 through m, 272 00:10:11,345 --> 00:10:15,144 you get these different terms and you're summing up these terms. 273 00:10:15,160 --> 00:10:16,221 And the meaning of each of these 274 00:10:16,221 --> 00:10:18,262 terms is a lot like, 275 00:10:18,262 --> 00:10:19,807 if you remember, actually, from 276 00:10:19,807 --> 00:10:24,100 the earlier quiz, if you saw this equation. 277 00:10:24,110 --> 00:10:25,560 We said that in order to 278 00:10:25,560 --> 00:10:27,250 vectorize this code, we 279 00:10:27,250 --> 00:10:30,755 will instead set u = 2v + 5w. So, 280 00:10:30,770 --> 00:10:32,391 we're saying that the vector u 281 00:10:32,391 --> 00:10:33,706 is equal to 2 times 282 00:10:33,706 --> 00:10:35,568 the vector v plus 5 times 283 00:10:35,570 --> 00:10:37,198 the vector w. So, that's just an 284 00:10:37,198 --> 00:10:39,023 example of how to 285 00:10:39,023 --> 00:10:42,453 add different vectors, and this summation is the same thing. 286 00:10:42,453 --> 00:10:44,919 It's saying that this 287 00:10:44,950 --> 00:10:49,766 term over here is just some real number, right? 288 00:10:49,840 --> 00:10:50,996 That's kind of like the number 289 00:10:51,010 --> 00:10:52,698 2, some other number, 290 00:10:52,711 --> 00:10:54,085 times the vector x(1). 
291 00:10:54,085 --> 00:10:56,792 This is like 2 times v, but instead 292 00:10:56,792 --> 00:10:59,177 with some other number times x(1), 293 00:10:59,177 --> 00:11:01,712 and then plus, you know, instead of 294 00:11:01,712 --> 00:11:03,475 5 times w, we instead have some 295 00:11:03,475 --> 00:11:05,212 other real number times some 296 00:11:05,212 --> 00:11:06,850 other vector, and then you 297 00:11:06,860 --> 00:11:08,909 add on other vectors, you know, 298 00:11:08,909 --> 00:11:10,528 plus ... plus the other 299 00:11:10,540 --> 00:11:12,234 vectors, which is why 300 00:11:12,234 --> 00:11:15,178 overall, this thing 301 00:11:15,178 --> 00:11:17,015 over here, that whole 302 00:11:17,015 --> 00:11:19,745 quantity, that delta, is 303 00:11:19,770 --> 00:11:23,685 just some vector, and concretely, the 304 00:11:23,685 --> 00:11:26,373 3 elements of delta correspond, 305 00:11:26,373 --> 00:11:28,813 if n equals 2, the 3 elements 306 00:11:28,820 --> 00:11:31,512 of delta correspond exactly to 307 00:11:31,512 --> 00:11:33,349 this thing, this second 308 00:11:33,349 --> 00:11:35,075 thing, and this third 309 00:11:35,075 --> 00:11:36,401 thing, which is why 310 00:11:36,410 --> 00:11:38,299 when you update theta according to 311 00:11:38,299 --> 00:11:40,979 theta minus alpha delta, 312 00:11:41,010 --> 00:11:42,760 we end up having exactly the 313 00:11:42,830 --> 00:11:44,948 same simultaneous updates as the 314 00:11:44,960 --> 00:11:47,825 update rules that we have on top. 315 00:11:47,840 --> 00:11:48,960 So, I know that there 316 00:11:48,960 --> 00:11:50,466 was a lot that happened on 317 00:11:50,500 --> 00:11:52,608 the slides, but again, feel 318 00:11:52,650 --> 00:11:54,489 free to pause the video, and 319 00:11:54,510 --> 00:11:56,592 I encourage you to 320 00:11:56,592 --> 00:11:58,247 step through the differences. 
If 321 00:11:58,247 --> 00:11:59,451 you're unsure of what just happened, 322 00:11:59,451 --> 00:12:01,719 I encourage you to step through 323 00:12:01,719 --> 00:12:02,940 the slide to make sure you 324 00:12:02,940 --> 00:12:04,578 understand why it is 325 00:12:04,580 --> 00:12:07,048 that this update here with 326 00:12:07,060 --> 00:12:09,612 this definition of delta, right? 327 00:12:09,612 --> 00:12:10,943 Why is it that that's equal 328 00:12:10,943 --> 00:12:13,714 to this update on top? And 329 00:12:13,714 --> 00:12:15,033 if it's still not clear, one insight is 330 00:12:15,033 --> 00:12:18,395 that, you know, this thing over here, 331 00:12:18,400 --> 00:12:20,628 that's exactly the vector 332 00:12:20,628 --> 00:12:22,109 x, and so we're 333 00:12:22,109 --> 00:12:23,342 just taking, you know, all 334 00:12:23,342 --> 00:12:25,516 3 of these computations and compressing 335 00:12:25,516 --> 00:12:27,106 them into one step 336 00:12:27,106 --> 00:12:29,778 with this vector delta, 337 00:12:29,778 --> 00:12:31,292 which is why we can come 338 00:12:31,292 --> 00:12:33,465 up with a vectorized implementation of 339 00:12:33,490 --> 00:12:36,942 this step of linear regression this way. 340 00:12:36,942 --> 00:12:38,639 So I hope this 341 00:12:38,660 --> 00:12:40,660 step makes sense, and do 342 00:12:40,660 --> 00:12:41,791 look at the video and make 343 00:12:41,791 --> 00:12:44,013 sure and see if you can understand it. 344 00:12:44,013 --> 00:12:46,058 In case you don't understand the 345 00:12:46,058 --> 00:12:48,029 equivalence of this math, if 346 00:12:48,029 --> 00:12:49,435 you implement this, this turns 347 00:12:49,435 --> 00:12:50,944 out to be the right answer anyway, 348 00:12:50,944 --> 00:12:52,224 so even if you didn't 349 00:12:52,224 --> 00:12:56,403 quite understand the equivalence, if you just implement it this way, 350 00:12:56,410 --> 00:12:58,992 you'll be able to get linear regression to work. 
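The equivalence described above, between updating each theta j separately and updating the whole vector as theta minus alpha times delta, can be checked numerically. The following is a minimal sketch assuming Python with NumPy rather than the Octave used on the slides. The function names, the layout of `X` with one training example per row, and the tiny data set are all hypothetical.

```python
import numpy as np


def gradient_step_loop(theta, X, y, alpha):
    """Unvectorized: update each theta_j separately, but simultaneously."""
    m = len(y)
    new_theta = theta.copy()  # write into a copy so every update uses the old theta
    for j in range(len(theta)):
        total = 0.0
        for i in range(m):
            total += (X[i] @ theta - y[i]) * X[i, j]
        new_theta[j] = theta[j] - alpha * total / m
    return new_theta


def gradient_step_vectorized(theta, X, y, alpha):
    """Vectorized: build delta as a sum of (real number) * (vector) terms."""
    m = len(y)
    delta = np.zeros_like(theta)
    for i in range(m):
        error = X[i] @ theta - y[i]  # h(x_i) - y_i, a real number
        delta += error * X[i]        # that number times the vector x_i
    delta /= m
    return theta - alpha * delta     # one vector subtraction updates every theta_j


# Hypothetical data: m = 3 examples, one feature plus a leading column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta0 = np.zeros(2)

print(gradient_step_loop(theta0, X, y, 0.1))
print(gradient_step_vectorized(theta0, X, y, 0.1))
```

The two functions return the same vector. The vectorized one never needs a temporary copy to keep the updates simultaneous, because the whole of delta is computed before theta changes.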
351 00:12:58,992 --> 00:13:00,663 So, if you're able to 352 00:13:00,663 --> 00:13:02,216 figure out why these 2 steps 353 00:13:02,216 --> 00:13:04,122 are equivalent, then hopefully that 354 00:13:04,122 --> 00:13:06,239 will give you a better understanding of vectorization 355 00:13:06,239 --> 00:13:10,121 as well. And finally, 356 00:13:10,121 --> 00:13:12,355 if you're implementing linear 357 00:13:12,370 --> 00:13:14,872 regression using more than one or two features, 358 00:13:14,872 --> 00:13:16,548 and sometimes we use linear 359 00:13:16,550 --> 00:13:18,078 regression with tens or hundreds 360 00:13:18,078 --> 00:13:19,968 or thousands of features, then if 361 00:13:19,980 --> 00:13:21,853 you use the vectorized implementation 362 00:13:21,853 --> 00:13:23,735 of linear regression, usually that 363 00:13:23,735 --> 00:13:25,605 will run much faster than if 364 00:13:25,605 --> 00:13:26,892 you had, say, your old 365 00:13:26,892 --> 00:13:28,163 for loop that was, you 366 00:13:28,163 --> 00:13:31,485 know, updating theta 0, then theta 1, then theta 2 yourself. 367 00:13:31,500 --> 00:13:33,769 So, using a vectorized implementation, you 368 00:13:33,769 --> 00:13:34,688 should be able to get a 369 00:13:34,688 --> 00:13:37,762 much more efficient implementation of linear regression. 370 00:13:37,790 --> 00:13:39,347 And vectorizing the later 371 00:13:39,347 --> 00:13:40,430 algorithms that we'll see in 372 00:13:40,430 --> 00:13:41,554 this class is a good 373 00:13:41,554 --> 00:13:43,367 trick, whether in Octave 374 00:13:43,367 --> 00:13:44,767 or some other language like C++ or 375 00:13:44,767 --> 00:13:48,474 Java, for getting your code to run more efficiently.