>> So the simple alternative approach is gradient descent. Remember that in the gradient descent algorithm, we just initialize our vector of parameters somewhere and take these gradient steps, and eventually we converge to the optimum of this problem. Okay, so what does this algorithm look like for multiple regression? Well, it looks very similar to our simple linear regression case, where we say: while not converged, we take our w parameters and update them by subtracting some step size eta times the gradient of our residual sum of squares, evaluated at our previous set of parameters, w at iteration t. So what is the gradient of the residual sum of squares? I'm writing it right here: the gradient is -2 H transpose (y - Hw), so this update becomes w at iteration t, and because the minus sign in the update and the minus sign in the gradient turn into a plus sign, plus 2 eta times H transpose times (y - Hw at iteration t). And what is this here? Well, H times w at iteration t is my predicted set of observations, the whole vector of them, assuming that I use w at iteration t in forming those predictions. Okay, so what this version of the algorithm is doing is taking our entire w vector, all the regression coefficients in our model, and updating them all at once using the matrix notation shown here.
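
As a rough illustration of this whole-vector update, here is a minimal NumPy sketch. The function name, the zero initialization, and the specific step size, tolerance, and iteration cap are placeholder choices for the example, not values given in the lecture; it assumes H already contains the feature columns (including a constant column for the intercept).

```python
import numpy as np

def multiple_regression_gd(H, y, step_size=1e-5, tolerance=1e-3, max_iter=10000):
    """Gradient descent for multiple regression (least squares).

    H : (N, D) feature matrix, one row per observation, one column per feature
    y : (N,) vector of observed outputs
    Returns the estimated coefficient vector w.
    """
    w = np.zeros(H.shape[1])            # initialize the parameter vector somewhere (here, all zeros)
    for _ in range(max_iter):
        predictions = H @ w             # H w^(t): predictions for all N observations at once
        errors = y - predictions        # residuals under the current coefficients
        gradient = -2 * H.T @ errors    # gradient of RSS(w) = -2 H^T (y - H w)
        if np.linalg.norm(gradient) < tolerance:
            break                       # "while not converged": stop once the gradient is small
        w = w - step_size * gradient    # equivalently w^(t) + 2 * eta * H^T (y - H w^(t))
    return w
```

Note that the line `w = w - step_size * gradient` updates every regression coefficient simultaneously, which is exactly the matrix-form update described above.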