Well, in our coordinate descent algorithm for lasso, and actually in all of the coordinate descent algorithms we've presented, we have this line that says "while not converged." And the question is, how are we assessing convergence?

Well, when should I stop in coordinate descent? In gradient descent, remember, we were looking at the magnitude of the gradient vector and stopping when that magnitude fell below some tolerance epsilon. Here, we aren't computing gradients, so we have to do something else. One thing we do know is that for convex objectives, the steps we take as we go through this algorithm become smaller and smaller as we move toward the optimum. At least for strongly convex functions, we know we're converging to the optimal solution. So one thing we can do is measure the size of the steps we take through a full cycle over our different coordinates. Because I want to emphasize: we have to cycle through all of our coordinates, 0 to D, before judging whether to stop. It's possible that one coordinate, or a few coordinates, take small steps, but then you get to another coordinate and still take a large step. But if, over an entire sweep of all the coordinates, the maximum step you take in that cycle is less than your tolerance epsilon, then that's one way you can assess that your algorithm has converged (a minimal sketch of this stopping rule appears at the end of this section).

I also want to mention that this coordinate descent algorithm is just one of many possible ways of solving the lasso objective. Classically, lasso was solved using what's called LARS, least angle regression and shrinkage. That was popular up until roughly 2008, when an older algorithm was rediscovered and popularized: the coordinate descent approach for lasso. More recently, there's been a lot of activity in coming up with efficient parallel and distributed implementations of lasso solvers. These include a parallel version of coordinate descent, as well as other parallel learning approaches like parallel stochastic gradient descent, or the distribute-and-average approach, which is fairly popular as well.
And one of the most popular approaches specifically for lasso is something called the alternating direction method of multipliers, or ADMM, and that's been really popular within the community of people using lasso.
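Below is a minimal sketch (not from the lecture) of the stopping rule described above: cycle through every coordinate, record the largest step taken over the full sweep, and stop only when that maximum step falls below the tolerance epsilon. It assumes the objective RSS(w) + lambda*||w||_1 with an unpenalized intercept stored as an all-ones column 0 of X; the function and variable names are illustrative.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, epsilon=1e-4, max_sweeps=1000):
    """Coordinate descent for lasso, assuming column 0 of X is the
    all-ones intercept feature (coordinates 0 to D, intercept unpenalized).

    Convergence test: stop when the largest coordinate step taken over a
    FULL sweep of all coordinates is below the tolerance epsilon.
    """
    n, d = X.shape
    w = np.zeros(d)
    z = (X ** 2).sum(axis=0)              # per-coordinate normalizers z_j

    for sweep in range(max_sweeps):
        max_step = 0.0
        for j in range(d):                # cycle through ALL coordinates
            w_old = w[j]
            # rho_j: correlation of feature j with the residual that
            # excludes coordinate j's own contribution
            rho = X[:, j] @ (y - X @ w + X[:, j] * w[j])
            if j == 0:
                # intercept: ordinary least-squares update, no shrinkage
                w[j] = rho / z[j]
            else:
                # soft-thresholding update; threshold lam/2 matches the
                # RSS(w) + lam*||w||_1 scaling of the objective
                w[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / z[j]
            max_step = max(max_step, abs(w[j] - w_old))
        # only judge convergence after sweeping coordinates 0..D
        if max_step < epsilon:
            break
    return w
```

Checking the maximum step only after a complete sweep is what the lecture emphasizes: a few coordinates may barely move while a later coordinate in the same cycle still takes a large step, so stopping early on a per-coordinate basis could terminate the algorithm prematurely.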