1 00:00:00,250 --> 00:00:02,840 Okay, welcome back. This is supplemental material for the 2 00:00:02,840 --> 00:00:05,050 assignments. I'm really happy to see you here. 3 00:00:05,050 --> 00:00:07,302 Okay? we are going to talk about the traveling 4 00:00:07,302 --> 00:00:10,182 salesman problem, which is a really challenging, you know optimization 5 00:00:10,182 --> 00:00:14,240 problem. So this is a challenging assignment. 6 00:00:14,240 --> 00:00:17,410 and we're going to review some of the concept here. 7 00:00:17,410 --> 00:00:20,010 And some of the things that we do in these assignments is actually simplifying 8 00:00:20,010 --> 00:00:22,570 your life and I want to go over some of them because you can actually use some of 9 00:00:22,570 --> 00:00:25,830 these things on other assignments as well. 10 00:00:25,830 --> 00:00:29,015 So remember, you know, what a traveling salesman problem is, is that you have a 11 00:00:29,015 --> 00:00:32,151 bunch of points and what you want to do is to find a tool which is visiting every 12 00:00:32,151 --> 00:00:37,384 one of these points exactly once. Okay, and minimize the overall travel 13 00:00:37,384 --> 00:00:40,070 distance. Or cost, whatever. 14 00:00:40,070 --> 00:00:42,962 Okay? So, this is a beautiful tool that you see 15 00:00:42,962 --> 00:00:48,378 on that side, you know? It's a tool doing 0, 4, 8, 9, 5 and so 16 00:00:48,378 --> 00:00:52,796 on, okay? So an interesting point here is that you 17 00:00:52,796 --> 00:00:58,470 can essentially see a tool as a permutation of all the vertices, okay? 18 00:00:58,470 --> 00:01:01,638 And this is nice because, you know, it basically tells you how you can actually, 19 00:01:01,638 --> 00:01:05,596 you know, output a solution. And you can also think that this is good 20 00:01:05,596 --> 00:01:07,680 representation, but it may not be. Right? 21 00:01:07,680 --> 00:01:10,380 So, this is one of the things we'll need to discuss at some point. 22 00:01:10,380 --> 00:01:12,515 Okay? So, so in this, in this particular 23 00:01:12,515 --> 00:01:16,595 assignment, we are going to focus on a very specific class of traveling salesman 24 00:01:16,595 --> 00:01:19,580 problem. They are going to be Euclidean, it's a 25 00:01:19,580 --> 00:01:22,650 metric space, so computationally these problems are easier. 26 00:01:22,650 --> 00:01:25,752 And also they have a big, you know, very nice features that, you know, I will, you 27 00:01:25,752 --> 00:01:29,090 know, I will tell you about in a in a moment. 28 00:01:29,090 --> 00:01:31,106 Okay? So, it will help you tremendously 29 00:01:31,106 --> 00:01:35,280 actually, the fact that we're looking only at Euclidean TSPs. 30 00:01:35,280 --> 00:01:37,442 Okay? So, in Euclidean TSPs, essentially, 31 00:01:37,442 --> 00:01:42,090 everyone of the points is the coordinates, you know, is a coordinate. 32 00:01:42,090 --> 00:01:46,120 Okay, so you see for instance, you know, that point 0, vertex 0, as you know, 33 00:01:46,120 --> 00:01:50,150 coordinates 17 and 27. And this is essentially going to be the 34 00:01:50,150 --> 00:01:51,870 input of your problem. Okay? 35 00:01:51,870 --> 00:01:54,665 And once you have that, we'll be able to compute a distance between any of these 36 00:01:54,665 --> 00:01:57,369 two points. And that's going to be the distance that 37 00:01:57,369 --> 00:02:00,407 we will be using when we are trying to evaluate the, the, the, the length of a 38 00:02:00,407 --> 00:02:03,870 particular tour that we are trying to compute. 39 00:02:03,870 --> 00:02:05,600 Okay. So in a sense, the only thing that the 40 00:02:05,600 --> 00:02:08,768 input will be concerned with is giving you these coordinates for the point, and 41 00:02:08,768 --> 00:02:11,744 from that, we, we don't have to compute this big, you know, give you this big 42 00:02:11,744 --> 00:02:16,128 distance matrix. You can actually compute it yourself. 43 00:02:16,128 --> 00:02:19,029 Okay? So, so essentially once you want to find 44 00:02:19,029 --> 00:02:23,431 out what is the distance between, you know, 0 and 4, you will use the formula 45 00:02:23,431 --> 00:02:28,822 that you see here, okay? Which is basically, actually you, you, 46 00:02:28,822 --> 00:02:32,350 and you see, you know, the actual values on the bottom. 47 00:02:32,350 --> 00:02:34,920 You'll take the x coordinates of the two points. 48 00:02:34,920 --> 00:02:36,950 Okay? Subtract, you know, one from the other. 49 00:02:36,950 --> 00:02:39,826 Square it. Do the same thing for the y coordinate, 50 00:02:39,826 --> 00:02:42,210 you know, square it. Sum them. 51 00:02:42,210 --> 00:02:44,400 and then take the square root of the whole thing. 52 00:02:44,400 --> 00:02:47,130 And that basically is giving you the distance. 53 00:02:47,130 --> 00:02:49,423 Between, let's say .0 and 4. Okay? 54 00:02:49,423 --> 00:02:53,111 So what you do is you take 17 subtract 42, square that. 55 00:02:53,111 --> 00:02:57,131 You take, you know, 27 and, and, and subtract 56, square that and take the 56 00:02:57,131 --> 00:03:02,285 overall square root. And that gives you the distance of 38.2 57 00:03:02,285 --> 00:03:06,680 288 which is the distance between these two points. 58 00:03:06,680 --> 00:03:08,950 the Euclidean distance between these two points. 59 00:03:08,950 --> 00:03:10,808 Okay? So, what is important here is that we 60 00:03:10,808 --> 00:03:14,419 will only give you the coordinate. you know, of the, of the various 61 00:03:14,419 --> 00:03:16,943 vertices. And you can compute easily using this 62 00:03:16,943 --> 00:03:20,070 formula, the distance between any pair of vertices. 63 00:03:20,070 --> 00:03:22,411 Okay? And so essentially, this is summarizing 64 00:03:22,411 --> 00:03:25,507 what I told you. The only thing that we really need to 65 00:03:25,507 --> 00:03:29,106 tell you is what is the,you know, what are the coordinates between all the 66 00:03:29,106 --> 00:03:33,920 vertices and that is going to be the input of your problem. 67 00:03:33,920 --> 00:03:35,635 From there you can deduce everything else. 68 00:03:35,635 --> 00:03:40,171 Okay, so this is a formulation of the travelling salesmen problem where you see 69 00:03:40,171 --> 00:03:44,518 the objective function computing all these distances between all the various 70 00:03:44,518 --> 00:03:50,330 cities in the tour, okay. So we are basically modelling here, okay, 71 00:03:50,330 --> 00:03:53,470 the tour as a permutation of the vertices. 72 00:03:53,470 --> 00:03:54,760 Okay. this doesn't mean that you have to model 73 00:03:54,760 --> 00:03:57,410 your problem like this, okay. So in these supplemental materials 74 00:03:57,410 --> 00:04:00,594 sometimes we use model. You know, we make no guarantee that these 75 00:04:00,594 --> 00:04:01,810 are good models. Okay? 76 00:04:01,810 --> 00:04:06,150 It's probably not a very good idea to recompute all these things all the time. 77 00:04:06,150 --> 00:04:08,650 Okay? So you may think of other representation. 78 00:04:08,650 --> 00:04:12,470 It may also not be a very good idea to model the problem as a permutation. 79 00:04:12,470 --> 00:04:15,480 Okay? So some technology may not like that. 80 00:04:15,480 --> 00:04:16,430 Okay? And some may. 81 00:04:16,430 --> 00:04:18,817 Okay? So, in a sense, you know, every time we 82 00:04:18,817 --> 00:04:22,636 put a description of the problem we, we typically make no guarantees whatsoever 83 00:04:22,636 --> 00:04:26,916 on whether it's good or not. Its just a particular formalization of 84 00:04:26,916 --> 00:04:29,473 the problem. Typically the Python script may use that 85 00:04:29,473 --> 00:04:32,645 formalization to give you a basic solution, but, you know, don't make any 86 00:04:32,645 --> 00:04:35,940 assumption that this is a good formulation. 87 00:04:35,940 --> 00:04:40,450 Don't be, you know, misled by the fact that we are putting that formulation. 88 00:04:40,450 --> 00:04:43,550 It may be completely crappy from a computational standpoint. 89 00:04:43,550 --> 00:04:45,795 It's just very convenient for stating the problem. 90 00:04:45,795 --> 00:04:48,132 Okay? So, in a sense what I need to tell you 91 00:04:48,132 --> 00:04:51,108 now is essentially what the input and output should be for this particular 92 00:04:51,108 --> 00:04:53,090 assignment. Okay? 93 00:04:53,090 --> 00:04:55,400 So the input is very simple. Okay? 94 00:04:55,400 --> 00:04:57,300 So I'm telling you how many points there are. 95 00:04:57,300 --> 00:04:58,942 Okay? And some of these problems, you know, may 96 00:04:58,942 --> 00:05:02,040 have, you know, 50 points, some other ones will have many more. 97 00:05:02,040 --> 00:05:04,225 Okay? And then for every one of these points, 98 00:05:04,225 --> 00:05:07,620 we'll basically give you the x and y coordinates. 99 00:05:07,620 --> 00:05:09,373 Okay? And as I told you, using the formula that 100 00:05:09,373 --> 00:05:12,225 you've seen before, we can compute a distance between any of these, any two 101 00:05:12,225 --> 00:05:14,000 points. Okay? 102 00:05:14,000 --> 00:05:17,685 The output, you know, is basically the format that we are using systematically, 103 00:05:17,685 --> 00:05:21,095 okay, so you have to specify the objective function, whether you have, you 104 00:05:21,095 --> 00:05:26,269 know, proved optimality or not. And then, in this particular case, we ask 105 00:05:26,269 --> 00:05:29,482 you to put, you know, a sequence, which is a permutation of all the vertices, 106 00:05:29,482 --> 00:05:32,504 okay? So this is basically describing the tour 107 00:05:32,504 --> 00:05:35,870 that you have if you connect the last one to the first one, okay? 108 00:05:35,870 --> 00:05:38,558 So in a sense here we ask you the permutation of the end, once again it 109 00:05:38,558 --> 00:05:41,960 doesn't mean that your modeling is to use that, okay? 110 00:05:41,960 --> 00:05:45,445 You can use a modeling where the decision variables are very different. 111 00:05:45,445 --> 00:05:48,965 Okay, but at the end you will have to, you will have to transform this decision 112 00:05:48,965 --> 00:05:52,320 variables into something which is a sequence. 113 00:05:52,320 --> 00:05:57,180 Okay. this is a particular example, okay. 114 00:05:57,180 --> 00:06:02,350 So you basically see here inside a input five vertices and the various coordinate. 115 00:06:02,350 --> 00:06:06,256 The first point is 0 0, the second one is 0 you know and 0.5 okay and so on and so 116 00:06:06,256 --> 00:06:09,698 forth. And then you see a solution here which 117 00:06:09,698 --> 00:06:12,722 has a length of 5.2 and the tour is, you know, 0 4 1 3 2, and so back to back to 0 118 00:06:12,722 --> 00:06:18,904 at the end, okay. And so essentially you see a permutation 119 00:06:18,904 --> 00:06:23,732 of the vertices here, okay. And you see the value of the objective 120 00:06:23,732 --> 00:06:25,440 function. Okay? 121 00:06:25,440 --> 00:06:27,710 So that's what you'll need to output. Okay? 122 00:06:27,710 --> 00:06:30,614 Obviously once again and I will repeat that ever and ever, you know, forever, is 123 00:06:30,614 --> 00:06:33,342 that we can obviously compute the value of the objective function for this 124 00:06:33,342 --> 00:06:37,610 permutation, and that's one of the things we'll check in the script. 125 00:06:37,610 --> 00:06:40,022 Okay? So, now one of the really important 126 00:06:40,022 --> 00:06:43,620 features of the traveling salesman problem, okay. 127 00:06:43,620 --> 00:06:46,392 And especially the Euclidean traveling salesman problem, and the fact that we 128 00:06:46,392 --> 00:06:48,870 are actually giving you these coordinates, is the fact that you will be 129 00:06:48,870 --> 00:06:52,020 able to visualize what you're doing. Okay? 130 00:06:52,020 --> 00:06:55,284 So, you can use the location of these points you know, in the plane to actually 131 00:06:55,284 --> 00:06:58,905 see you know, to actually visualize your tour, see where they are, visualize your 132 00:06:58,905 --> 00:07:02,193 tour. So, this beautiful output that you 133 00:07:02,193 --> 00:07:06,500 thought was very great, if you try to visualize it, this is what it gives you. 134 00:07:06,500 --> 00:07:10,733 And this is pretty, pretty crappy, right? So as soon as you have edges that are 135 00:07:10,733 --> 00:07:14,876 crossing each other, bad news, okay? So that means that you are very far from 136 00:07:14,876 --> 00:07:17,429 optimality, okay? So one of the things that you want to do 137 00:07:17,429 --> 00:07:20,159 in this particular, in this particular assignment is visualize how good your 138 00:07:20,159 --> 00:07:23,680 tour is when you are using different techniques. 139 00:07:23,680 --> 00:07:26,584 And that will help you a lot, because it will identify bottlenecks, things that 140 00:07:26,584 --> 00:07:30,220 your algorithm is not doing well, things that you should focus on. 141 00:07:30,220 --> 00:07:31,290 And things like this. Okay. 142 00:07:31,290 --> 00:07:35,619 Now, the TSP is very nice for doing this. But in most of the assignments this is 143 00:07:35,619 --> 00:07:39,416 something that I highly recommend. If you can build a visualization to see 144 00:07:39,416 --> 00:07:41,440 how your algorithm is doing. You know? 145 00:07:41,440 --> 00:07:44,550 Modeling various aspects of the problems. The objective functions. 146 00:07:44,550 --> 00:07:46,550 You know? The various aspects of your algorithm. 147 00:07:46,550 --> 00:07:49,235 If it's a constraint programming algorithm iii. 148 00:07:49,235 --> 00:07:52,320 If it's a, if it's a local search algorithm, you know? 149 00:07:52,320 --> 00:07:55,160 What is the sequence of solutions that it gets your fast. 150 00:07:55,160 --> 00:07:55,960 You know? It goes down. 151 00:07:55,960 --> 00:07:58,540 Whether it stays at the same place many times, okay? 152 00:07:58,540 --> 00:08:00,515 Things like that. If you are using mathematical 153 00:08:00,515 --> 00:08:03,630 programming, what is the relaxation? How good is it? 154 00:08:03,630 --> 00:08:05,203 Okay. So these things you want to visualize 155 00:08:05,203 --> 00:08:07,740 because there will give you a lot of insight on your algorithms, and I'm 156 00:08:07,740 --> 00:08:12,456 speaking from experience, you know. I spent, you know hours and hours looking 157 00:08:12,456 --> 00:08:16,858 at a screen of you know, data. And sometimes the simple visualization 158 00:08:16,858 --> 00:08:20,620 will give you the haha, the wow factor that you need to actually understand what 159 00:08:20,620 --> 00:08:25,037 your algorithm is not doing well. So this is a good example here, it's 160 00:08:25,037 --> 00:08:28,558 actually pretty easy to do. Build this visualization, it will help 161 00:08:28,558 --> 00:08:31,382 you. And this is of use if you think oh, this 162 00:08:31,382 --> 00:08:35,850 is a pretty nice tool, right. And so assignment tips. 163 00:08:35,850 --> 00:08:37,424 Okay? So there are a lot of assignment tips on 164 00:08:37,424 --> 00:08:40,015 this particular problem. There are a lot of things you can do. 165 00:08:40,015 --> 00:08:42,150 Okay? Here are a couple of ones. 166 00:08:42,150 --> 00:08:44,650 Okay? So first, if you, if you implement a 167 00:08:44,650 --> 00:08:48,280 local search, one of the things you need to focus on is, how quickly can I explore 168 00:08:48,280 --> 00:08:51,030 the neighborhood? Okay? 169 00:08:51,030 --> 00:08:53,590 So, we can have, you know, a lot of very sophisticated neighborhood or a lot of 170 00:08:53,590 --> 00:08:56,110 very simple neighborhood but we may want to explore them and they may be very 171 00:08:56,110 --> 00:08:59,780 large. How you do that very quickly, okay? 172 00:08:59,780 --> 00:09:02,904 So this is a key aspect of, you know, the TSP. 173 00:09:02,904 --> 00:09:06,330 Symmetries and dominance, there are plenty in these problems as well. 174 00:09:06,330 --> 00:09:09,282 You have to think about them, okay? Can you actually avoid, you know, av-, 175 00:09:09,282 --> 00:09:12,699 you know, exploring sub paths and things like that that are dominated by others, 176 00:09:12,699 --> 00:09:17,290 or they are symmetrical to others, you know, try to think about that. 177 00:09:17,290 --> 00:09:20,392 Now, one of the things in this problem as well is that essentially, every city if 178 00:09:20,392 --> 00:09:24,610 you are you know, visiting cities can be connected to every other city. 179 00:09:24,610 --> 00:09:27,250 Okay, every vertex can be connected to every other vertex. 180 00:09:27,250 --> 00:09:29,980 But do you need all of them? Okay, so think about it you know. 181 00:09:29,980 --> 00:09:33,321 You are trying to do a tour of you know, the United States. 182 00:09:33,321 --> 00:09:35,700 Are you going to connect Miami to Seattle? 183 00:09:35,700 --> 00:09:38,628 You are trying to do a tour of Australia, are you going to connect Perth and 184 00:09:38,628 --> 00:09:40,980 Brisbane? Okay, think about that. 185 00:09:40,980 --> 00:09:44,449 Okay, so are you going to do that? And also, can you prove that you don't 186 00:09:44,449 --> 00:09:48,296 need these edges, okay. So, that's, that's one of the things that 187 00:09:48,296 --> 00:09:51,584 you can think about. if you do a complete search, think about 188 00:09:51,584 --> 00:09:54,410 the various ways you can actually compute lower bounds. 189 00:09:54,410 --> 00:09:56,700 There are many, many ways of computing lower bounds. 190 00:09:56,700 --> 00:10:00,495 Different trade offs between you know, speed of computation and quality of the 191 00:10:00,495 --> 00:10:03,538 lower bound. And then, as I said, you know, visualize, 192 00:10:03,538 --> 00:10:06,390 look at the solution. Look at what your algorithm is doing. 193 00:10:06,390 --> 00:10:08,510 What about the algorithm that you are going to do. 194 00:10:08,510 --> 00:10:11,340 Sometimes you will be surprised at what it does. 195 00:10:11,340 --> 00:10:14,400 And it may be useful to actually visualize, to see. 196 00:10:14,400 --> 00:10:16,527 Some of the aspects that you can improve. Okay? 197 00:10:16,527 --> 00:10:18,868 I find this is an amazing assignment, okay? 198 00:10:18,868 --> 00:10:22,207 And people have, you know as I told you this is one of the most studied problem 199 00:10:22,207 --> 00:10:25,901 in optimization. Its very interesting that you actually 200 00:10:25,901 --> 00:10:28,490 try you know some of the techniques on this one. 201 00:10:28,490 --> 00:10:31,196 So look its very specific, you know a lot of different techniques and different 202 00:10:31,196 --> 00:10:33,840 fields you know have been applied to this problem. 203 00:10:33,840 --> 00:10:36,090 Okay, good luck.