Let's move on to the second sense in which the generic local search algorithm is under-determined. The main while loop reads: if there is a superior neighboring solution y, then reset the current solution to be y. But there may of course be many legitimate choices for y. For example, in the maximum cut problem, given a cut, there may be many vertices whose switch to the opposite group yields a cut with strictly more crossing edges. So when you have multiple superior neighboring solutions, which one should you choose? Well, this is a tricky question, and the existing theory does not give us a clear-cut answer. Indeed, it seems like the right answer is highly domain-dependent. Answering the question is probably going to require some serious experimentation on your part, with the data that you're interested in.

One general point I can make: recall that when you're using local search to generate from scratch an approximate solution to an optimization problem, you want to inject randomness into the algorithm to coax it to explore the solution space and return as many different locally optimal solutions as possible over a bunch of independent runs; you then remember the best of those locally optimal solutions at the end of the day. So this is a second opportunity, in addition to the starting point, where you can inject randomness into local search: if you have many possible improving values of y, choose one at random.

Alternatively, you could try to be clever about which y you choose. For example, if there are multiple superior neighboring y's, you could go to the best one. In the maximum cut problem, if there are lots of different vertices whose switch to the other group would yield a superior cut, pick the vertex that increases the number of crossing edges by the largest amount. In the traveling salesman problem, amongst all neighboring tours with smaller cost, pick the one with the minimum cost overall. This is a perfectly sensible rule for how to choose y. It is myopic, and you could imagine doing much more sophisticated things to decide which y to go to next. And indeed, if this is a problem you really care about, you might want to work hard to figure out the right heuristics for choosing y that work well in your application.
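To make these two rules concrete, here is a minimal Python sketch for the maximum cut problem. The adjacency-dictionary representation and the function names are my own illustrative choices, not anything fixed by the lecture.

```python
import random

def gain(adj, side, v):
    """Increase in crossing edges if vertex v switches to the other group.

    adj maps each vertex to its set of neighbors; side[v] is 0 or 1.
    Edges to same-side neighbors start crossing, edges to other-side
    neighbors stop crossing, so the net gain is (same - crossing).
    """
    same = sum(1 for w in adj[v] if side[w] == side[v])
    return same - (len(adj[v]) - same)

def random_improving_vertex(adj, side):
    """Rule 1: pick an improving vertex uniformly at random."""
    improving = [v for v in adj if gain(adj, side, v) > 0]
    return random.choice(improving) if improving else None

def best_improving_vertex(adj, side):
    """Rule 2 (the myopic greedy rule): pick the vertex with the largest gain."""
    best = max(adj, key=lambda v: gain(adj, side, v))
    return best if gain(adj, side, best) > 0 else None
```

Both rules return None exactly when the current cut is locally optimal, and either costs O(n + m) time per iteration when the gains are recomputed from scratch; maintaining the gains incrementally after each flip is a natural speedup.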
A third question that you have to answer, in order to have a precisely defined local search algorithm, is: what are the neighborhoods? For many problems, there is significant flexibility in how to define the neighborhoods, and theory again does not give clear-cut answers about how they should be designed. Once again, this seems like a highly domain-dependent issue. If you want to use local search on a problem of interest, it's probably going to be up to you to empirically explore which neighborhood choices lead to the best performance of the local search algorithm.

One issue you're likely to grapple with is figuring out how big to make your neighborhoods. To illustrate this point, let's return to the maximum cut problem. There we defined the neighbors of a cut to be the other cuts you can reach by taking a single vertex and moving it into the other group. This means each cut has a linear number, O(n), of neighboring cuts. But it's easy to make the neighborhood bigger if we want. For example, we could define the neighbors of a cut to be those cuts you can reach by taking two vertices or fewer and switching them to the opposite groups. Now each cut is going to have a quadratic number of neighbors. More generally, of course, we could allow a single local move to switch up to k vertices between the two sides, and then the neighborhood size would be O(n^k).

So what are the pros and cons of enlarging your neighborhood size? Generally speaking, the bigger you make your neighborhoods, the more time you're going to have to invest searching for an improving solution in each step. For example, in the maximum cut problem, if we implement things in a straightforward way and only allow one vertex to be switched in an iteration, then we only have to search through a linear number of options to figure out whether there's an improving solution. On the other hand, if we allow two vertices to be switched in a given iteration, we have to search through a quadratic number of possibilities to check whether or not we're currently locally optimal. So the bigger the neighborhood, generally speaking, the longer it's going to take to check whether you're currently locally optimal or whether there's some better solution you're supposed to move to, as the sketch below illustrates. The good news about enlarging your neighborhood size is that you can only have fewer local optima.
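Here's one way the neighborhood size shows up in code, reusing the side-assignment representation from the previous sketch (again just an illustrative choice):

```python
from itertools import combinations

def neighbors_1flip(side):
    """Linear-size neighborhood: every cut reachable by flipping one vertex."""
    for v in side:
        new = dict(side)
        new[v] = 1 - new[v]
        yield new

def neighbors_up_to_2flips(side):
    """Quadratic-size neighborhood: flip any one or two vertices."""
    yield from neighbors_1flip(side)
    for u, v in combinations(side, 2):
        new = dict(side)
        new[u], new[v] = 1 - new[u], 1 - new[v]
        yield new
```

Enumerated this way, the single-flip neighborhood has n candidate cuts per iteration, the two-flip version has on the order of n^2, and the k-flip generalization costs O(n^k). The payoff for that extra search cost is the pruning of local optima just mentioned.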
And in general, some of the local optima that you're pruning are going to be bad solutions that you're happy to be rid of. If you look back at the example from the previous video, which demonstrated that the simple local search for maximum cut can be off by 50%, you'll see that if we just enlarge the neighborhoods to allow two vertices to be swapped in the same iteration, then all of a sudden, on that four-vertex example, local search is guaranteed to produce the globally maximum cut; the bad locally optimal cuts have been pruned away. Now, even if you allow two vertices to be swapped in a given iteration, there are going to be more elaborate examples showing that local search might be off by 50%. But on many instances, allowing this larger neighborhood will give you better performance from local search.
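For concreteness, here's a quick check of that effect, assuming the four-vertex example can be taken to be the 4-cycle; that's my assumption, and the example in the previous video may differ in detail:

```python
def cut_value(edges, side):
    """Number of edges crossing the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])

# The 4-cycle a-b-c-d-a, assumed (up to relabeling) to be the 50%-off example.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
stuck = {"a": 0, "b": 0, "c": 1, "d": 1}  # value 2; the global maximum is 4

# No single flip improves on 2 crossing edges, so 1-flip local search halts here...
assert all(
    cut_value(edges, {**stuck, v: 1 - stuck[v]}) <= cut_value(edges, stuck)
    for v in stuck
)

# ...but flipping b and c together reaches the global maximum of 4.
assert cut_value(edges, {**stuck, "b": 1, "c": 0}) == 4
```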
Summarizing: one high-level design decision you should be clear on in your head, before you apply the local search heuristic design paradigm to a computational problem, is how much you care about solution quality versus how much you care about the computational resources required. If you care about solution quality a lot, and you're willing to either wait or throw a lot of hardware at the problem, that suggests using bigger neighborhoods: they're slower to search, but you're likely to get a better solution at the end of the day. If you really want something quick and dirty, fast, without caring that much about solution quality, that suggests using simpler, smaller neighborhoods, neighborhoods that are fast to search, knowing that some of the local optima you get might not be very good. Let me reiterate: these are just guidelines, not gospel. In all computational problems, but especially with local search, the way you proceed has to be guided by the particulars of your application. So make sure you code up a bunch of different approaches, see what works, and go with that.

For our final two questions, let's suppose you've resolved the initial three and you have a fully specified local search algorithm. So you've made a decision about exactly what your neighborhoods are; you've figured out the sweet spot for you between efficient searchability and the solution quality you're likely to get at the end of the day. You've made a decision about exactly how you're going to generate the initial solution. And you've made a decision about which superior neighboring solution you're going to go to next when there are several.

Now let's talk about what kind of performance guarantees you can expect from a local search algorithm. Let's first just talk about running time, and let's begin with the modest question: is it at least the case that this local search algorithm is guaranteed to converge eventually? In many of the scenarios you're likely to come across, the answer is yes. Here's what I mean. Suppose you're dealing with a computational problem where the set of possible solutions is finite, and moreover your local search is governed by an objective function, with a superior neighboring solution defined as one with a better objective function value. This is exactly what we were doing in the maximum cut problem: there the space was finite, just the set of exponentially many cuts of the graph, and our objective function was the number of crossing edges. Similarly for the traveling salesman problem: the space is finite, just the roughly n! possible tours, and you decide which tour to go to next by looking for one that decreases the objective function value, the total cost of the tour. Whenever you have those two properties, finiteness and strict improvement in the objective function, local search is guaranteed to terminate. You can't cycle, because every iteration produces something with a strictly better objective function value, and you can't go on forever, because eventually you'll run out of the finitely many possible things to try.
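Here's a minimal sketch of the generic loop, with hook names of my own choosing, just to make that termination argument concrete:

```python
def local_search(initial, neighbors, objective):
    """Generic local search. On a finite solution space with strict
    improvement in every iteration, this must terminate: no solution
    can ever be revisited, so we eventually run out of solutions.

    neighbors(x) yields candidate solutions; objective is to be maximized.
    (These hook names are illustrative, not from the lecture.)
    """
    current = initial
    while True:
        # Look for any superior neighboring solution y.
        improving = [y for y in neighbors(current)
                     if objective(y) > objective(current)]
        if not improving:
            return current      # locally optimal: no superior neighbor
        current = improving[0]  # any tie-breaking rule from before works
```

The loop executes at most as many iterations as there are distinct objective values, which is finite whenever the solution space is; as we're about to see, though, that bound can still be exponential in the input size.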
There is, of course, no reason to be impressed by the finite convergence of local search; after all, brute-force search equally well terminates in finite time, and this class is all about efficient algorithms that run quickly. So the real question is: is local search guaranteed to converge in, say, polynomial time? Here, the answer is generally negative. When we studied the unweighted maximum cut problem, that was the exception that proves the rule: there, we only needed a quadratic number of iterations before finding a locally optimal solution. But, as we mentioned in passing, if you just pass to the weighted version of the maximum cut problem, local search might already need, in the worst case, an exponential number of iterations before halting with a locally optimal solution. In practice, however, the situation is rather different, with local search heuristics often finding locally optimal solutions quite quickly. We do not at present have a very satisfactory theory that explains, or predicts, the running time of local search algorithms; if you want to read more about this, you might search for the keyword "smoothed analysis."

Another nice feature of local search algorithms is that even if you're in the unlucky case where your algorithm is going to take a long time to find a locally optimal solution, you can always just stop it early. So when you start the search, you can just say: look, if after 24 hours you haven't found me a locally optimal solution, just tell me the best solution you've found thus far.

In addition to running time, we want to measure the performance of a local search algorithm in terms of its solution quality. So, is that going to be any good? Here the answer is definitely no, and again, the maximum cut problem was the exception that proves the rule: it's a very unusual problem in that you can prove at least some kind of performance guarantee about its locally optimal solutions. For most of the optimization problems where you might be tempted to apply the local search design paradigm, there will be locally optimal solutions quite far from globally optimal ones. Moreover, this is not just a theoretical pathology; local search algorithms in practice will sometimes generate extremely lousy locally optimal solutions. This ties back into a point we made earlier: if you're using local search not just as a post-processing improvement step, but to generate from scratch a hopefully near-optimal solution to an optimization problem, you don't want to run it just once, because you don't know what you're going to get. Rather, you want to run it many times, making random decisions along the way, either from a random starting point or by choosing random improving solutions to move to, so that you explore the space of solutions as best you can. Over many executions of local search, you want to get your hands on as many different locally optimal solutions as you can; then you can remember the best one.
Hopefully the best one, at least, will be pretty close to a globally optimal solution.
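As a closing sketch, here's what that outer loop might look like, combining random restarts with the early-stopping idea from a moment ago; all the hooks (random_solution, improve, objective) are assumed, problem-specific plug-ins, not names from the lecture:

```python
import time

def restarted_local_search(random_solution, improve, objective, budget_seconds):
    """Run local search from many random starting points within a time
    budget, and remember the best locally optimal solution found.

    improve(x) should return a superior neighbor of x (chosen randomly,
    greedily, etc.) or None if x is locally optimal.
    """
    deadline = time.monotonic() + budget_seconds
    best = None
    while time.monotonic() < deadline:
        current = random_solution()         # random starting point
        while time.monotonic() < deadline:  # stop early if out of time
            nxt = improve(current)
            if nxt is None:                 # locally optimal
                break
            current = nxt
        if best is None or objective(current) > objective(best):
            best = current                  # remember the best one seen
    return best
```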