In this video, I'd like to tell you about the idea of vectorization. Whether you're using Octave or a similar language like MATLAB, or whether you're using Python and NumPy, or Java, C, or C++, all of these languages have numerical linear algebra libraries either built in or readily and easily available. These libraries are usually very well written and highly optimized, often developed by people with PhDs in numerical computing, or who really specialize in numerical computing. When you're implementing machine learning algorithms, if you're able to take advantage of these numerical linear algebra libraries and make routine calls to them, rather than writing code yourself to do things these libraries could be doing, you get two benefits. First, the code is more efficient: it runs more quickly and takes better advantage of any parallel hardware your computer may have. Second, you end up with less code to write, so you have a simpler implementation that is therefore also more likely to be bug-free. As a concrete example, rather than writing code yourself to multiply matrices, if you let Octave do it by typing A * B, it will use a very efficient routine to multiply the two matrices. There are a bunch of examples like this where using appropriate vectorized implementations gets you much simpler and much more efficient code.

Let's look at some examples. Here's the usual hypothesis for linear regression: h(x) is the sum from j = 0 to n of theta_j * x_j. If you want to compute h(x), notice that there is a sum on the right, so one thing you could do is compute that sum from j = 0 to j = n yourself. Another way to think of this is to view h(x) as theta transpose x, that is, as an inner product between two vectors, where theta is the vector (theta_0, theta_1, theta_2) if you have two features, so n = 2, and x is the vector (x_0, x_1, x_2). These two views give you two different implementations.

Here's what I mean. Here's an unvectorized implementation for computing h(x); by unvectorized, I mean without vectorization. We might first initialize prediction to 0.0; this is eventually going to be h(x). Then I have a for loop, for j = 1 through n+1, in which prediction gets incremented by theta(j) * x(j), so it computes the expression above. By the way, I should mention that I wrote these vectors as zero-indexed, theta_0, theta_1, theta_2, but because Octave is one-indexed, theta_0 ends up represented as theta(1), the second element as theta(2), and the third element as theta(3), since vectors in Octave are indexed starting from 1 even though our real theta and x are indexed starting from 0. That's why the for loop here runs j from 1 through n+1 rather than from 0 up to n. So this is an unvectorized implementation, in that we have a for loop summing up the n+1 terms of the sum one by one. In contrast, here's how you write a vectorized implementation: you think of x and theta as vectors, and you just set prediction = theta' * x, computing it in one step.
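As a concrete sketch, here is roughly what those two implementations look like in Octave, assuming theta and x are both (n+1)-by-1 column vectors, one-indexed as just discussed:

```octave
% Unvectorized: accumulate the sum theta_j * x_j term by term.
prediction = 0.0;
for j = 1:n+1
    prediction = prediction + theta(j) * x(j);
end

% Vectorized: a single inner product, handled by Octave's optimized routines.
prediction = theta' * x;
```

Both compute the same number; the one-line version is simpler and lets the library do the work.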
Instead of writing all those lines of code with the for loop, you have one line of code, and what this line on the right does is use Octave's highly optimized numerical linear algebra routines to compute the inner product between the two vectors theta and x. Not only is the vectorized implementation simpler, it will also run more efficiently.

So, that was Octave, but the issue of vectorization applies to other programming languages as well. Let's look at an example in C++. Here's what an unvectorized implementation might look like: we again initialize prediction to 0.0, and then we have a for loop, for j = 0 up to n, with prediction += theta[j] * x[j], where again you have this for loop that you write yourself. In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this. Depending on the details of your library, you might have a C++ object for the vector theta and a C++ object for the vector x, and you just take theta.transpose() * x, where the multiplication uses C++ operator overloading so that you can just multiply the two vectors. Depending on the details of your numerical linear algebra library, you might end up using slightly different syntax, but by relying on a library to do this inner product, you get a much simpler and much more efficient piece of code.

Let's now look at a more sophisticated example. Just to remind you, here's our update rule for gradient descent for linear regression: theta_j := theta_j - alpha * (1/m) * sum from i = 1 to m of (h(x^(i)) - y^(i)) * x_j^(i), and we update theta_j using this rule for all values of j = 0, 1, 2, and so on. If I just write out these equations for theta_0, theta_1, theta_2, assuming we have two features, so n = 2, then these are the updates we perform to theta_0, theta_1, theta_2, where you might remember my saying in an earlier video that these should be simultaneous updates.

So, let's see if we can come up with a vectorized implementation. Here are my same three equations written in a slightly smaller font, and you can imagine that one way to implement these three lines of code is to have a for loop that says, for j = 0 through 2, update theta_j, or something like that; a sketch of that for-loop version appears below. But instead, let's come up with a vectorized implementation and see if we can find a simpler way, compressing these three lines of code, or that for loop that effectively does these three steps one at a time, into one line of vectorized code.

Here's the idea. I'm going to think of theta as a vector, and I'm going to update theta as theta minus alpha times some other vector, delta, where delta is equal to 1/m times the sum from i = 1 to m of (h(x^(i)) - y^(i)) * x^(i). Let me explain what's going on here. I'm treating theta as a vector, an (n+1)-dimensional vector, and saying that theta gets updated as theta minus alpha times delta. Alpha is a real number and delta is a vector, so alpha times delta is a vector, and this subtraction is a vector subtraction: theta gets the vector alpha times delta subtracted from it.
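As promised above, here is a minimal Octave sketch of that for-loop version of the simultaneous updates. The names X (the m-by-(n+1) design matrix whose first column is all ones) and y (the m-by-1 vector of targets) are my own assumptions, not from the slide, and the sum over the m training examples is written with elementwise operations for brevity:

```octave
% Unvectorized (per-parameter) update: compute every new theta_j into a
% temporary vector first, so no update sees a half-updated theta.
temp = zeros(n+1, 1);
for j = 1:n+1
    grad_j = (1/m) * sum((X * theta - y) .* X(:, j));  % partial derivative for theta_j
    temp(j) = theta(j) - alpha * grad_j;
end
theta = temp;
```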
So, what is this vector delta? Well, delta is meant to be exactly this quantity over here. Concretely, delta is an (n+1)-dimensional vector, and if we index it from 0, its elements are delta_0, delta_1, delta_2. What I want is for delta_0 to be equal to this first term up above, and indeed you can convince yourself that delta_0 = (1/m) * sum from i = 1 to m of (h(x^(i)) - y^(i)) * x_0^(i).

Let's just make sure we're on the same page about how delta is really computed. Delta is 1/m times this sum over here, and what is this sum? Well, the term h(x^(i)) - y^(i) is a real number, and the second term, x^(i), is a vector, namely (x_0^(i), x_1^(i), x_2^(i)). So what does the summation say? It says that this quantity is equal to (h(x^(1)) - y^(1)) * x^(1) + (h(x^(2)) - y^(2)) * x^(2) + and so on, because this is a summation over i, so as i ranges from 1 through m you get these different terms and you sum them up. The meaning of each of these terms is a lot like the earlier quiz in this course, if you remember, where we said that in order to vectorize a piece of code we would instead set u = 2v + 5w, meaning the vector u equals 2 times the vector v plus 5 times the vector w. That was an example of how to add up differently scaled vectors, and this summation is the same thing. It's saying that this sum is some real number, which is like the number 2, times the vector x^(1), which is like 2 times v, plus some other real number times x^(2), like 5 times w, and then you add on further scaled vectors, which is why overall this whole quantity, delta, is just some vector.

Concretely, if n = 2, the three elements of delta correspond exactly to the first, second, and third of these expressions, which is why when you update theta according to theta := theta - alpha * delta, you end up with exactly the same simultaneous updates as the update rules up top. I know a lot happened on this slide, so feel free to pause the video, and if you're unsure of what just happened, I encourage you to step through it to make sure you understand why this update, with this definition of delta, is equal to the update on top. If it's still not clear, one insight is that this quantity over here is exactly the vector x^(i), so we're taking all three of these computations and compressing them into one step with this vector delta, which is why we can come up with a vectorized implementation of this step of linear regression this way. I hope this step makes sense; do look at the video again and see if you can understand it. And in case you don't understand the equivalence of this math, if you implement it this way, it turns out to be the right answer anyway, so even if you didn't quite follow the equivalence, you'll still be able to get linear regression to work.
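Putting that together, here is a minimal sketch of the vectorized update in Octave, under the same assumptions as the sketch above (X is the m-by-(n+1) design matrix, y the m-by-1 target vector):

```octave
% Vectorized gradient descent step: all parameters updated simultaneously.
delta = (1/m) * X' * (X * theta - y);  % the (n+1)-dimensional vector delta
theta = theta - alpha * delta;
```

Here X' * (X * theta - y) computes the sum over i of (h(x^(i)) - y^(i)) * x^(i) as a single matrix-vector product, which is what lets the three separate updates collapse into one line.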
So, if you're able to figure out why these two steps are equivalent, then hopefully that will give you a better understanding of vectorization as well. Finally, if you're implementing linear regression with more than one or two features (sometimes we use linear regression with tens or hundreds of thousands of features), the vectorized implementation will usually run much faster than the old for loop that updates theta_0, then theta_1, then theta_2 yourself. So, using a vectorized implementation, you should be able to get a much more efficient implementation of linear regression. And vectorizing the later algorithms we'll see in this class, whether in Octave or some other language like C++ or Java, is a good trick for getting your code to run more efficiently.