Well, in our coordinate descent algorithm for lasso, and actually in all of the coordinate descent algorithms we've presented, we have this line that says "while not converged." And the question is, how are we assessing convergence?

Well, when should I stop in coordinate descent? In gradient descent, remember, we were looking at the magnitude of the gradient vector and stopping when that magnitude fell below some tolerance epsilon. Here, we aren't computing gradients, so we have to do something else. One thing we do know is that for convex objectives, the steps we take as we go through this algorithm become smaller and smaller as we move toward the optimum. At least for strongly convex functions, we know we're converging to the optimal solution. So one thing we can do is measure the size of the steps we take through a full cycle over our different coordinates. Because I want to emphasize: we have to cycle through all of our coordinates, 0 to D, before judging whether to stop. It's possible that one coordinate, or a few coordinates, take small steps, but then you get to another coordinate and still take a large step. But if, over an entire sweep of all the coordinates, the maximum step you take in that cycle is less than your tolerance epsilon, then that's one way you can assess that your algorithm has converged (a minimal sketch of this stopping rule appears at the end of this section).

I also want to mention that this coordinate descent algorithm is just one of many possible ways of solving the lasso objective. Classically, lasso was solved using what's called LARS, least angle regression and shrinkage. That was popular up until roughly 2008, when an older algorithm was rediscovered and popularized: the coordinate descent approach for lasso. More recently, there's been a lot of activity in coming up with efficient parallel and distributed implementations of lasso solvers. These include a parallel version of coordinate descent, as well as other parallel learning approaches like parallel stochastic gradient descent, or the distribute-and-average approach, which is fairly popular as well.
And one of the most popular approaches specifically for lasso is something called the alternating direction method of multipliers, or ADMM, and that's been really popular within the community of people using lasso.
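Below is a minimal sketch (not from the lecture) of the stopping rule described above: cycle through every coordinate, record the largest step taken over the full sweep, and stop only when that maximum step falls below the tolerance epsilon. It assumes the objective RSS(w) + lambda*||w||_1 with an unpenalized intercept stored as an all-ones column 0 of X; the function and variable names are illustrative.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, epsilon=1e-4, max_sweeps=1000):
    """Coordinate descent for lasso, assuming column 0 of X is the
    all-ones intercept feature (coordinates 0 to D, intercept unpenalized).

    Convergence test: stop when the largest coordinate step taken over a
    FULL sweep of all coordinates is below the tolerance epsilon.
    """
    n, d = X.shape
    w = np.zeros(d)
    z = (X ** 2).sum(axis=0)              # per-coordinate normalizers z_j

    for sweep in range(max_sweeps):
        max_step = 0.0
        for j in range(d):                # cycle through ALL coordinates
            w_old = w[j]
            # rho_j: correlation of feature j with the residual that
            # excludes coordinate j's own contribution
            rho = X[:, j] @ (y - X @ w + X[:, j] * w[j])
            if j == 0:
                # intercept: ordinary least-squares update, no shrinkage
                w[j] = rho / z[j]
            else:
                # soft-thresholding update; threshold lam/2 matches the
                # RSS(w) + lam*||w||_1 scaling of the objective
                w[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / z[j]
            max_step = max(max_step, abs(w[j] - w_old))
        # only judge convergence after sweeping coordinates 0..D
        if max_step < epsilon:
            break
    return w
```

Checking the maximum step only after a complete sweep is what the lecture emphasizes: a few coordinates may barely move while a later coordinate in the same cycle still takes a large step, so stopping early on a per-coordinate basis could terminate the algorithm prematurely.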