[MUSIC] Well, in our coordinate descent algorithm for lasso, and in fact in all of the coordinate descent algorithms we've presented, we have this line that says "while not converged." And the question is, how are we assessing convergence? When should I stop in coordinate descent? In gradient descent, remember, we looked at the magnitude of the gradient vector and stopped when that magnitude fell below some tolerance epsilon. Here we aren't computing gradients, so we have to do something else.

One thing we do know is that, for convex objectives, the steps we take as we move through this algorithm become smaller and smaller as we approach the optimum; at least for strongly convex objectives, we know we're converging to the optimal solution. So one thing we can do is measure the size of the steps taken over a full cycle through our coordinates. I want to emphasize that we have to cycle through all of our coordinates, 0 to d, before judging whether to stop, because it's possible that one coordinate, or a few coordinates, take small steps, but then you get to another coordinate and still take a large step. But if, over an entire sweep of all the coordinates, the maximum step you take is less than your tolerance epsilon, that's one way to conclude that your algorithm has converged (a short code sketch of this stopping rule appears below).

I also want to mention that this coordinate descent algorithm is just one of many possible ways of solving the lasso objective. Classically, lasso was solved using what's called LARS, least angle regression and shrinkage. That was popular until roughly 2008, when an older algorithm was rediscovered and popularized: this coordinate descent approach for lasso. More recently, there has been a lot of activity in developing efficient parallel and distributed implementations of lasso solvers. These include a parallel version of coordinate descent, as well as other parallel learning approaches like parallel stochastic gradient descent, or the kind of distribute-and-average approach that's fairly popular as well. And one of the most popular approaches specifically for lasso is something called the alternating direction method of multipliers, or ADMM, which has been really popular within the community of people using lasso. [MUSIC]
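To make the stopping rule described above concrete, here is a minimal sketch of cyclic coordinate descent for lasso that stops when the largest step over a full sweep of the coordinates falls below a tolerance epsilon. It is not the course's reference implementation: it assumes normalized feature columns (so each z_j = sum_i h_j(x_i)^2 = 1), the objective RSS(w) + lambda * ||w||_1, and, for simplicity, regularizes every coefficient rather than treating an unpenalized intercept separately. The names lasso_coordinate_descent and soft_threshold are just illustrative.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if rho < -lam / 2.0:
        return rho + lam / 2.0
    elif rho > lam / 2.0:
        return rho - lam / 2.0
    else:
        return 0.0

def lasso_coordinate_descent(X, y, lam, epsilon=1e-6, max_sweeps=1000):
    """Cyclic coordinate descent for lasso with normalized feature columns.

    Stops when the largest coefficient change over a full sweep of all
    coordinates drops below the tolerance epsilon.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(d):
            # Residual with feature j's contribution removed
            residual = y - X @ w + X[:, j] * w[j]
            rho_j = X[:, j] @ residual
            w_new = soft_threshold(rho_j, lam)  # valid when sum_i x_ij^2 = 1
            # Track the largest single-coordinate step in this sweep
            max_step = max(max_step, abs(w_new - w[j]))
            w[j] = w_new
        if max_step < epsilon:  # no coordinate moved much in a full cycle
            break
    return w
```

Note that the convergence check sits outside the inner loop over coordinates: even if several coordinates barely move, the algorithm keeps going until an entire cycle produces only tiny steps, which is exactly the criterion discussed above.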