Let's move on to the second sense in which the generic local search algorithm is under-determined. The main while loop reads: if there is a superior neighboring solution y, then reset the current solution to be y. But there may of course be many legitimate choices for y. For example, in the maximum cut problem, given a cut, there may be many vertices whose switch to the opposite group yields a cut with strictly more crossing edges. So when you have multiple superior neighboring solutions, which one should you choose? Well, this is a tricky question, and the existing theory does not give us a clear-cut answer. Indeed, it seems like the right answer is highly domain-dependent. Answering the question is probably going to require some serious experimentation on your part, with the data that you're interested in.

One general point I can make: recall that when you're using local search to generate from scratch an approximate solution to an optimization problem, you want to inject randomness into the algorithm to coax it to explore the solution space and return as many different locally optimal solutions as possible over a bunch of independent runs; you then remember the best of those locally optimal solutions at the end of the day. So this is a second opportunity, in addition to the starting point, where you can inject randomness into local search: if you have many possible improving values of y, choose one at random.

Alternatively, you could try to be clever about which y you choose. For example, if there are multiple superior neighboring y's, you could go to the best one. In the maximum cut problem, if there are lots of different vertices whose switch to the other group would yield a superior cut, pick the vertex that increases the number of crossing edges by the largest amount. In the traveling salesman problem, amongst all neighboring tours with smaller cost, pick the one with the minimum cost overall. This is a perfectly sensible rule for how to choose y. It is myopic, and you could imagine doing much more sophisticated things to decide which y to go to next. And indeed, if this is a problem you really care about, you might want to work hard to figure out the right heuristics for choosing y that work well in your application.
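To make these two rules concrete, here is a minimal Python sketch for the maximum cut problem. The adjacency-dictionary representation and the function names are my own illustrative choices, not anything fixed by the lecture.

```python
import random

def gain(adj, side, v):
    """Increase in crossing edges if vertex v switches to the other group.

    adj maps each vertex to its set of neighbors; side[v] is 0 or 1.
    Edges to same-side neighbors start crossing, edges to other-side
    neighbors stop crossing, so the net gain is (same - crossing).
    """
    same = sum(1 for w in adj[v] if side[w] == side[v])
    return same - (len(adj[v]) - same)

def random_improving_vertex(adj, side):
    """Rule 1: pick an improving vertex uniformly at random."""
    improving = [v for v in adj if gain(adj, side, v) > 0]
    return random.choice(improving) if improving else None

def best_improving_vertex(adj, side):
    """Rule 2 (the myopic greedy rule): pick the vertex with the largest gain."""
    best = max(adj, key=lambda v: gain(adj, side, v))
    return best if gain(adj, side, best) > 0 else None
```

Both rules return None exactly when the current cut is locally optimal, and either costs O(n + m) time per iteration when the gains are recomputed from scratch; maintaining the gains incrementally after each flip is a natural speedup.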
A third question that you have to answer, in order to have a precisely defined local search algorithm, is: what are the neighborhoods? For many problems, there is significant flexibility in how to define the neighborhoods, and theory again does not give clear-cut answers about how they should be designed. Once again, this seems like a highly domain-dependent issue. If you want to use local search on a problem of interest, it's probably going to be up to you to empirically explore which neighborhood choices lead to the best performance of the local search algorithm.

One issue you're likely to grapple with is figuring out how big to make your neighborhoods. To illustrate this point, let's return to the maximum cut problem. There we defined the neighbors of a cut to be the other cuts you can reach by taking a single vertex and moving it into the other group. This means each cut has a linear number, O(n), of neighboring cuts. But it's easy to make the neighborhood bigger if we want. For example, we could define the neighbors of a cut to be those cuts you can reach by taking two vertices or fewer and switching them to the opposite groups. Now each cut is going to have a quadratic number of neighbors. More generally, of course, we could allow a single local move to switch up to k vertices between the two sides, and then the neighborhood size would be O(n^k).

So what are the pros and cons of enlarging your neighborhood size? Generally speaking, the bigger you make your neighborhoods, the more time you're going to have to invest searching for an improving solution in each step. For example, in the maximum cut problem, if we implement things in a straightforward way and only allow one vertex to be switched in an iteration, then we only have to search through a linear number of options to figure out whether there's an improving solution. On the other hand, if we allow two vertices to be switched in a given iteration, we have to search through a quadratic number of possibilities to check whether or not we're currently locally optimal. So the bigger the neighborhood, generally speaking, the longer it's going to take to check whether you're currently locally optimal or whether there's some better solution you're supposed to move to, as the sketch below illustrates. The good news about enlarging your neighborhood size is that you can only have fewer local optima.
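Here's one way the neighborhood size shows up in code, reusing the side-assignment representation from the previous sketch (again just an illustrative choice):

```python
from itertools import combinations

def neighbors_1flip(side):
    """Linear-size neighborhood: every cut reachable by flipping one vertex."""
    for v in side:
        new = dict(side)
        new[v] = 1 - new[v]
        yield new

def neighbors_up_to_2flips(side):
    """Quadratic-size neighborhood: flip any one or two vertices."""
    yield from neighbors_1flip(side)
    for u, v in combinations(side, 2):
        new = dict(side)
        new[u], new[v] = 1 - new[u], 1 - new[v]
        yield new
```

Enumerated this way, the single-flip neighborhood has n candidate cuts per iteration, the two-flip version has on the order of n^2, and the k-flip generalization costs O(n^k). The payoff for that extra search cost is the pruning of local optima just mentioned.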
And in general, some of the local optima that you're pruning are going to be bad solutions that you're happy to be rid of. If you look back at the example from the previous video, which demonstrated that the simple local search for maximum cut can be off by 50%, you'll see that if we just enlarge the neighborhoods to allow two vertices to be swapped in the same iteration, then all of a sudden, on that four-vertex example, local search is guaranteed to produce the globally maximum cut; the bad locally optimal cuts have been pruned away. Now, even if you allow two vertices to be swapped in a given iteration, there are going to be more elaborate examples showing that local search might be off by 50%. But on many instances, allowing this larger neighborhood will give you better performance from local search.
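For concreteness, here's a quick check of that effect, assuming the four-vertex example can be taken to be the 4-cycle; that's my assumption, and the example in the previous video may differ in detail:

```python
def cut_value(edges, side):
    """Number of edges crossing the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])

# The 4-cycle a-b-c-d-a, assumed (up to relabeling) to be the 50%-off example.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
stuck = {"a": 0, "b": 0, "c": 1, "d": 1}  # value 2; the global maximum is 4

# No single flip improves on 2 crossing edges, so 1-flip local search halts here...
assert all(
    cut_value(edges, {**stuck, v: 1 - stuck[v]}) <= cut_value(edges, stuck)
    for v in stuck
)

# ...but flipping b and c together reaches the global maximum of 4.
assert cut_value(edges, {**stuck, "b": 1, "c": 0}) == 4
```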
Summarizing: one high-level design decision you should be clear on in your head, before you apply the local search heuristic design paradigm to a computational problem, is how much you care about solution quality versus how much you care about the computational resources required. If you care about solution quality a lot, and you're willing to either wait or throw a lot of hardware at the problem, that suggests using bigger neighborhoods: they're slower to search, but you're likely to get a better solution at the end of the day. If you really want something quick and dirty, fast, without caring that much about solution quality, that suggests using simpler, smaller neighborhoods, neighborhoods that are fast to search, knowing that some of the local optima you get might not be very good. Let me reiterate: these are just guidelines, not gospel. In all computational problems, but especially with local search, the way you proceed has to be guided by the particulars of your application. So make sure you code up a bunch of different approaches, see what works, and go with that.

For our final two questions, let's suppose you've resolved the initial three and you have a fully specified local search algorithm. So you've made a decision about exactly what your neighborhoods are; you've figured out the sweet spot for you between efficient searchability and the solution quality you're likely to get at the end of the day. You've made a decision about exactly how you're going to generate the initial solution. And you've made a decision about which superior neighboring solution you're going to go to next when there are several.

Now let's talk about what kind of performance guarantees you can expect from a local search algorithm. Let's first just talk about running time, and let's begin with the modest question: is it at least the case that this local search algorithm is guaranteed to converge eventually? In many of the scenarios you're likely to come across, the answer is yes. Here's what I mean. Suppose you're dealing with a computational problem where the set of possible solutions is finite, and moreover your local search is governed by an objective function, with a superior neighboring solution defined as one with a better objective function value. This is exactly what we were doing in the maximum cut problem: there the space was finite, just the set of exponentially many cuts of the graph, and our objective function was the number of crossing edges. Similarly for the traveling salesman problem: the space is finite, just the roughly n! possible tours, and you decide which tour to go to next by looking for one that decreases the objective function value, the total cost of the tour. Whenever you have those two properties, finiteness and strict improvement in the objective function, local search is guaranteed to terminate. You can't cycle, because every iteration produces something with a strictly better objective function value, and you can't go on forever, because eventually you'll run out of the finitely many possible things to try.
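Here's a minimal sketch of the generic loop, with hook names of my own choosing, just to make that termination argument concrete:

```python
def local_search(initial, neighbors, objective):
    """Generic local search. On a finite solution space with strict
    improvement in every iteration, this must terminate: no solution
    can ever be revisited, so we eventually run out of solutions.

    neighbors(x) yields candidate solutions; objective is to be maximized.
    (These hook names are illustrative, not from the lecture.)
    """
    current = initial
    while True:
        # Look for any superior neighboring solution y.
        improving = [y for y in neighbors(current)
                     if objective(y) > objective(current)]
        if not improving:
            return current      # locally optimal: no superior neighbor
        current = improving[0]  # any tie-breaking rule from before works
```

The loop executes at most as many iterations as there are distinct objective values, which is finite whenever the solution space is; as we're about to see, though, that bound can still be exponential in the input size.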
There is, of course, no reason to be impressed by the finite convergence of local search; after all, brute-force search equally well terminates in finite time, and this class is all about efficient algorithms that run quickly. So the real question is: is local search guaranteed to converge in, say, polynomial time? Here, the answer is generally negative. When we studied the unweighted maximum cut problem, that was the exception that proves the rule: there, we only needed a quadratic number of iterations before finding a locally optimal solution. But, as we mentioned in passing, if you just pass to the weighted version of the maximum cut problem, local search might already need, in the worst case, an exponential number of iterations before halting with a locally optimal solution. In practice, however, the situation is rather different, with local search heuristics often finding locally optimal solutions quite quickly. We do not at present have a very satisfactory theory that explains, or predicts, the running time of local search algorithms; if you want to read more about this, you might search for the keyword "smoothed analysis."

Another nice feature of local search algorithms is that even if you're in the unlucky case where your algorithm is going to take a long time to find a locally optimal solution, you can always just stop it early. So when you start the search, you can just say: look, if after 24 hours you haven't found me a locally optimal solution, just tell me the best solution you've found thus far.

In addition to running time, we want to measure the performance of a local search algorithm in terms of its solution quality. So, is that going to be any good? Here the answer is definitely no, and again, the maximum cut problem was the exception that proves the rule: it's a very unusual problem in that you can prove at least some kind of performance guarantee about its locally optimal solutions. For most of the optimization problems where you might be tempted to apply the local search design paradigm, there will be locally optimal solutions quite far from globally optimal ones. Moreover, this is not just a theoretical pathology; local search algorithms in practice will sometimes generate extremely lousy locally optimal solutions. This ties back into a point we made earlier: if you're using local search not just as a post-processing improvement step, but to generate from scratch a hopefully near-optimal solution to an optimization problem, you don't want to run it just once, because you don't know what you're going to get. Rather, you want to run it many times, making random decisions along the way, either from a random starting point or by choosing random improving solutions to move to, so that you explore the space of solutions as best you can. Over many executions of local search, you want to get your hands on as many different locally optimal solutions as you can; then you can remember the best one.
Hopefully the best one, at least, will be pretty close to a globally optimal solution.
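As a closing sketch, here's what that outer loop might look like, combining random restarts with the early-stopping idea from a moment ago; all the hooks (random_solution, improve, objective) are assumed, problem-specific plug-ins, not names from the lecture:

```python
import time

def restarted_local_search(random_solution, improve, objective, budget_seconds):
    """Run local search from many random starting points within a time
    budget, and remember the best locally optimal solution found.

    improve(x) should return a superior neighbor of x (chosen randomly,
    greedily, etc.) or None if x is locally optimal.
    """
    deadline = time.monotonic() + budget_seconds
    best = None
    while time.monotonic() < deadline:
        current = random_solution()         # random starting point
        while time.monotonic() < deadline:  # stop early if out of time
            nxt = improve(current)
            if nxt is None:                 # locally optimal
                break
            current = nxt
        if best is None or objective(current) > objective(best):
            best = current                  # remember the best one seen
    return best
```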