1 00:00:00,012 --> 00:00:02,095 Okay. So this is Discrete Optimization, Greedy 2 00:00:02,095 --> 00:00:05,040 Algorithm Part 2. So what we want to do here today, okay, 3 00:00:05,040 --> 00:00:08,802 is basically looking again at greedy algorithm, okay, on a very simple problem 4 00:00:08,802 --> 00:00:12,606 called set covering. Of course it's NP-hard, so it's not that 5 00:00:12,606 --> 00:00:14,518 simple. Okay? 6 00:00:14,518 --> 00:00:18,088 And then demonstrating, essentially, the essence of the class, which is building, 7 00:00:18,088 --> 00:00:21,421 you know solutions in an incremental fashion. 8 00:00:21,421 --> 00:00:24,125 every one of them improving on the previous one. 9 00:00:24,125 --> 00:00:27,134 We're going to illustrate on the greedy algorithm, but it applies all the 10 00:00:27,134 --> 00:00:31,445 technologies that we cover in our class. You start from the basis, you see how the 11 00:00:31,445 --> 00:00:34,695 basis behave, you [INAUDIBLE] identify the bottlenecks, and then, you try to 12 00:00:34,695 --> 00:00:37,954 improve it. Whether it is constant programming, local 13 00:00:37,954 --> 00:00:41,351 search, mixed integer programming. You know, it's always, you know, always 14 00:00:41,351 --> 00:00:44,471 starts with some models and try to see how you can improve it. 15 00:00:44,471 --> 00:00:47,218 And, typically, while looking at what, you know, the algorithm is doing, you can 16 00:00:47,218 --> 00:00:50,197 identify these bottle necks. But what we do today is a very simple 17 00:00:50,197 --> 00:00:52,630 illustration of that on the set covering problem. 18 00:00:52,630 --> 00:00:54,190 Okay. So what is a set covering problem? 19 00:00:54,190 --> 00:00:56,159 Okay. You have a set of item, okay, and you 20 00:00:56,159 --> 00:00:58,440 want to collect them all. Okay? 21 00:00:58,440 --> 00:01:01,976 the way you can collect them all is that by looking at sets that you can actually 22 00:01:01,976 --> 00:01:05,558 get, okay? Everyone of these set is a sub, you know 23 00:01:05,558 --> 00:01:10,263 contains a subset of the items, you know, that, you know, from 1 to N. 24 00:01:10,263 --> 00:01:12,017 Okay? So in a sense you have these set, these 25 00:01:12,017 --> 00:01:16,110 subset of these, of these items. And you have to choose as few of them, 26 00:01:16,110 --> 00:01:19,270 okay? So as that you take all the items, okay? 27 00:01:19,270 --> 00:01:22,768 So in a sense, what is the smallest of, [INAUDIBLE] smallest set of subset which 28 00:01:22,768 --> 00:01:27,840 contains all the all the, all the items. That's what you are trying to do, okay? 29 00:01:27,840 --> 00:01:31,090 So let me give you an example because this is always easier to explain on, on 30 00:01:31,090 --> 00:01:34,458 an example. Here, you have all the items from 1 to 9, 31 00:01:34,458 --> 00:01:36,910 okay? That's what you see over there. 32 00:01:36,910 --> 00:01:40,130 And then, you have the various sets that you can select. 33 00:01:40,130 --> 00:01:42,540 Okay? So the first set, S1 is covering item 1 34 00:01:42,540 --> 00:01:44,297 and item 4. Okay? 35 00:01:44,297 --> 00:01:47,530 So if you pick up that set, you, you know, you, you would have item 1 and item 36 00:01:47,530 --> 00:01:50,758 4. If you look at set S2, you would have 37 00:01:50,758 --> 00:01:56,650 item 1 through 3, and then item 6 and 7 here, and then you have item 9. 38 00:01:56,650 --> 00:01:58,207 Okay. So you cover six items here, but you 39 00:01:58,207 --> 00:02:01,566 don't cover the same one, exactly the same one as the first time. 40 00:02:01,566 --> 00:02:03,528 Okay? And so on and so forth, for all the 41 00:02:03,528 --> 00:02:07,757 other, you know, all the other subset. So your goal is that how many of these 42 00:02:07,757 --> 00:02:11,981 guys do I have to select, okay? such that I cover all the items. 43 00:02:11,981 --> 00:02:13,908 Okay? So you can look at this and try to find 44 00:02:13,908 --> 00:02:17,509 out how many you need, okay. So what we going to do is build a couple 45 00:02:17,509 --> 00:02:20,926 of greedy algorithm that will, you know, be more and more refined in how we can 46 00:02:20,926 --> 00:02:25,643 find you know in, in actually, you know, optimizing this problem. 47 00:02:25,643 --> 00:02:27,785 Okay? So one of the things that you have to 48 00:02:27,785 --> 00:02:31,945 understand here is, obviously, if you want to select item 1, you know that you 49 00:02:31,945 --> 00:02:37,190 will have to select either Set 1 or Set 2 or Set 5. 50 00:02:37,190 --> 00:02:39,443 Okay? So these are the only three [INAUDIBLE], 51 00:02:39,443 --> 00:02:42,440 the only three sets that are covering item 1. 52 00:02:42,440 --> 00:02:44,365 So you know for sure that one of them has to be selected. 53 00:02:44,365 --> 00:02:48,148 Okay? So, same thing for of use before item 2, 54 00:02:48,148 --> 00:02:52,910 now it's you know, set 2, set 3 or set 6. Okay? 55 00:02:52,910 --> 00:02:55,690 And this is the same for every one of these items. 56 00:02:55,690 --> 00:02:57,128 Okay? So you know exactly which of these 57 00:02:57,128 --> 00:03:00,342 subsets, you know, can be selected for covering that particular item. 58 00:03:00,342 --> 00:03:02,620 Okay? So, we're going to do a first idea which 59 00:03:02,620 --> 00:03:04,680 is trivial. We're going to look at the set in 60 00:03:04,680 --> 00:03:07,410 sequence, okay? Until we have covered all the items, 61 00:03:07,410 --> 00:03:09,880 okay? This is amazingly simple, okay? 62 00:03:09,880 --> 00:03:11,790 So, the first idea that comes to mind, okay? 63 00:03:11,790 --> 00:03:14,962 So, we look at this, but once we have done that, we have a sense of, you know, 64 00:03:14,962 --> 00:03:19,343 of how, you know, how easy this problem is solved, how hard it is. 65 00:03:19,343 --> 00:03:21,268 Alright? So we start with the S1 you know, you 66 00:03:21,268 --> 00:03:24,813 know we cover two items that what you see over there, okay. 67 00:03:24,813 --> 00:03:27,889 So this is all we going to, you know, illustrate the algorithm. 68 00:03:27,889 --> 00:03:30,610 You take the next one. Then, you cover all lot of the items so 69 00:03:30,610 --> 00:03:33,925 there are only two missing, you take the next one you don't cover anything, you 70 00:03:33,925 --> 00:03:37,087 take the second one you cover one more and then you have this one which cover 71 00:03:37,087 --> 00:03:41,541 everything. So you have a Greedy Algorithm here that 72 00:03:41,541 --> 00:03:45,520 will select five of these things, right? Okay, so that's what you see. 73 00:03:45,520 --> 00:03:50,100 Okay, better than six but barely, right? So, item 2 is to say okay, okay. 74 00:03:50,100 --> 00:03:54,003 So, you know, I was doing [INAUDIBLE] something which is completely stupid. 75 00:03:54,003 --> 00:03:57,267 So what I can do now is look at the, this sequence of subset but take the one that 76 00:03:57,267 --> 00:03:59,990 covered the most item first. Okay? 77 00:03:59,990 --> 00:04:02,550 So, in a sense, you know, this is one of the things that this class is, is always 78 00:04:02,550 --> 00:04:06,230 going to, is one of the things that this class is focusing on. 79 00:04:06,230 --> 00:04:08,852 When you start choosing phase, you have to choose carefully. 80 00:04:08,852 --> 00:04:10,720 Okay. So, you basically think, okay, so if, if 81 00:04:10,720 --> 00:04:13,970 I am choosing a subset, I have to try to choose that subset, you know, that, that 82 00:04:13,970 --> 00:04:16,920 makes sense for the, you know, the problem, you know, for the, you know, 83 00:04:16,920 --> 00:04:20,989 using the nature of the problem. Okay? 84 00:04:20,989 --> 00:04:25,021 So in the sense, the intuition here is that we would cover many more elements, 85 00:04:25,021 --> 00:04:28,590 so you hope that you will do better. Okay? 86 00:04:28,590 --> 00:04:31,717 Once again, we look at these sets, we compute how many elements they are 87 00:04:31,717 --> 00:04:36,723 actually they are actually storing, okay? That's what you see over here. 88 00:04:36,723 --> 00:04:39,432 So, some of them ask/g, you know, are covering 6, some of them are covering 5, 89 00:04:39,432 --> 00:04:41,986 and so on. You take the one that covers the most, 90 00:04:41,986 --> 00:04:44,140 okay? And then, you see all these items, which 91 00:04:44,140 --> 00:04:45,610 have been covered. Okay? 92 00:04:45,610 --> 00:04:48,186 You take the next one, you know, which is 5 here. 93 00:04:48,186 --> 00:04:51,525 You take the subsequent one, which is also five items covered, and then, you 94 00:04:51,525 --> 00:04:55,417 take the one that cover 4. And now, basically, with four sets, you 95 00:04:55,417 --> 00:04:57,369 can cover all the items. Okay? 96 00:04:57,369 --> 00:05:01,285 It's a little bit better, okay, 4. Okay, not, you know, not much but at 97 00:05:01,285 --> 00:05:04,701 least you know this give you a better solution. 98 00:05:04,701 --> 00:05:06,738 Okay? Now, obviously, at that point you have 99 00:05:06,738 --> 00:05:11,537 realized that, well, you know, the first one I select probably, you know, nicely. 100 00:05:11,537 --> 00:05:15,101 But when I look at the second one that I selected, okay, it may already cover many 101 00:05:15,101 --> 00:05:18,404 of the elements that I have already covered. 102 00:05:18,404 --> 00:05:21,690 And that should not be contact, So, what you want is to basically look at the 103 00:05:21,690 --> 00:05:25,188 greedy algorithm which, at every step, is not only looking at all the items, but 104 00:05:25,188 --> 00:05:29,880 only the one that haven't been covered so far, okay? 105 00:05:29,880 --> 00:05:33,408 So, we take the very similar ideas that we have done for, you know, the greedy 106 00:05:33,408 --> 00:05:37,216 algorithm number 2, but now we upgrade it by updating the various elements, which 107 00:05:37,216 --> 00:05:41,650 are actually covered by the sets. Okay? 108 00:05:41,650 --> 00:05:44,770 So, we start out with [INAUDIBLE], you know, the biggest set here because it's 109 00:05:44,770 --> 00:05:48,779 the only one covering six items, okay? Now, you see all these items which are 110 00:05:48,779 --> 00:05:51,600 being covered. The only thing that we should really care 111 00:05:51,600 --> 00:05:54,660 at this point are items which have not been covered. 112 00:05:54,660 --> 00:05:56,430 So, we update the number here. Okay? 113 00:05:56,430 --> 00:06:00,462 In such a way that [UNKNOWN] is represent the number of item, which are not yet 114 00:06:00,462 --> 00:06:04,022 covered, that the sets are covering. Okay? 115 00:06:04,022 --> 00:06:07,550 Another, the item which is the most valuable is item S5 because S5 is 116 00:06:07,550 --> 00:06:12,418 basically covering two items that have not been covered so far. 117 00:06:12,418 --> 00:06:16,513 So that's the, that's the second set that we going to cover and in this particular 118 00:06:16,513 --> 00:06:20,923 case almost all the item has been selected except one. 119 00:06:20,923 --> 00:06:23,382 Okay? So, usually, we update this table again, 120 00:06:23,382 --> 00:06:25,896 okay? And there are basically only two sets 121 00:06:25,896 --> 00:06:29,290 that are covering this particular item. Okay? 122 00:06:29,290 --> 00:06:31,030 And therefore, we select one of these two. 123 00:06:31,030 --> 00:06:34,050 And we have a solution with three items, okay? 124 00:06:34,050 --> 00:06:37,100 So what you have here is a greedy algorithm which covered, you know, three 125 00:06:37,100 --> 00:06:39,674 sets. And this greedy algorithm, in fact, has 126 00:06:39,674 --> 00:06:42,820 some performance guarantee. You can prove that it's never completely, 127 00:06:42,820 --> 00:06:45,630 you know, outrageous in the quality of the solution. 128 00:06:45,630 --> 00:06:48,705 I won't go into the detail, okay? But this is a greedy algorithm that 129 00:06:48,705 --> 00:06:52,870 actually has performance guarantee, okay? But, once again, you know, the principle 130 00:06:52,870 --> 00:06:56,070 here was saying, okay, so we move and we try to design this greedy algorithm in 131 00:06:56,070 --> 00:07:01,087 step wise refinement using more and more of this structure of the problem. 132 00:07:01,087 --> 00:07:02,987 Okay. Now so this are the various steps, of 133 00:07:02,987 --> 00:07:07,298 course, every one of these algorithm is more complex than the previous one. 134 00:07:07,298 --> 00:07:09,017 Okay? They are typically, now we are doing 135 00:07:09,017 --> 00:07:12,251 things which are dynamic recomputing, what is the best set dynamically, so the 136 00:07:12,251 --> 00:07:15,844 complexity increases but the quality also increases. 137 00:07:15,844 --> 00:07:18,916 And so, you know, one of the things that you try to do in optimization is always 138 00:07:18,916 --> 00:07:21,940 trying to find the good trader between the time it takes and then the quality 139 00:07:21,940 --> 00:07:24,760 that it gives you. Okay? 140 00:07:24,760 --> 00:07:26,760 So, can we do better? Is three, is optimal? 141 00:07:26,760 --> 00:07:28,590 Okay. In this particular case three is not 142 00:07:28,590 --> 00:07:31,038 optimal. You know, I'm sure you have noticed that, 143 00:07:31,038 --> 00:07:35,050 you know, the sets you know, S5 and S6 were really complementary. 144 00:07:35,050 --> 00:07:37,170 If you're setting the two, you cover everything. 145 00:07:37,170 --> 00:07:39,890 So, that was the optimal solution. So, a greedy algorithm is typically is 146 00:07:39,890 --> 00:07:42,130 not going to give you the optimal solution, but you can start you know, 147 00:07:42,130 --> 00:07:44,270 looking at how good it is. Okay? 148 00:07:44,270 --> 00:07:46,130 And understanding that as well. Okay? 149 00:07:46,130 --> 00:07:49,434 Now, so once you have these greedy algorithms, one of the things you can 150 00:07:49,434 --> 00:07:52,906 start to do is to say, okay, so can I actually move to another technology and 151 00:07:52,906 --> 00:07:57,865 which technology, you know, is going to be appropriate? 152 00:07:57,865 --> 00:07:59,638 Okay? In this particular case you can start 153 00:07:59,638 --> 00:08:03,980 looking at this problem and say oh, but you know, there are dominance. 154 00:08:03,980 --> 00:08:06,362 There are some set are really dominated by other ones. 155 00:08:06,362 --> 00:08:08,705 Okay? So, for instance, in this particular 156 00:08:08,705 --> 00:08:13,104 problem here set S2 and S3, okay, are really closely related. 157 00:08:13,104 --> 00:08:18,145 You know, S3, okay, is covering only a subset of the item that S2 is covering. 158 00:08:18,145 --> 00:08:22,113 So there is never, never, never any reason to consider S3 because taking S2 159 00:08:22,113 --> 00:08:26,300 is always better. So you could say, okay, so S2 dominates 160 00:08:26,300 --> 00:08:28,950 S3, we can get rid of this. Okay? 161 00:08:28,950 --> 00:08:32,793 So symmetries and dominates are very important, you know, optimization 162 00:08:32,793 --> 00:08:36,970 properties, that you want to, you know, use in practice. 163 00:08:36,970 --> 00:08:38,640 Okay? And in this particular case, we can get 164 00:08:38,640 --> 00:08:41,175 rid of one set. We'll never have to consider that set. 165 00:08:41,175 --> 00:08:43,205 Okay? Now there is another kinds of properties 166 00:08:43,205 --> 00:08:46,140 that we are looking at, okay? So look at this, okay? 167 00:08:46,140 --> 00:08:50,591 So if we look at item 5, there is only one set which covers item 5. 168 00:08:50,591 --> 00:08:53,944 So what do you know? You immediately know that you have to 169 00:08:53,944 --> 00:08:56,590 select, you know, set, you know, S5. Okay? 170 00:08:56,590 --> 00:09:00,248 As soon as you select S5, okay, you're going to select not only item 5, but item 171 00:09:00,248 --> 00:09:03,550 4, item 1, and so on and so forth. Okay? 172 00:09:03,550 --> 00:09:07,242 And in a sense, you reduce the size of the problem tremendously. 173 00:09:07,242 --> 00:09:10,400 Okay? So, once again, you have to do this. 174 00:09:10,400 --> 00:09:13,938 Because you do this, you reduce the size of the problem tremendously. 175 00:09:13,938 --> 00:09:16,450 And so, this is a really nice property to exploit. 176 00:09:17,460 --> 00:09:19,749 Okay? So in a sense, so what I'm showing you 177 00:09:19,749 --> 00:09:23,810 here is try, is exploiting properties of the solution. 178 00:09:23,810 --> 00:09:25,757 Okay? And this is one of the typical thing that 179 00:09:25,757 --> 00:09:28,753 you do, for instance, on Constraint Programming. 180 00:09:28,753 --> 00:09:31,660 You look at how you can actually eliminate many of the possibilities 181 00:09:31,660 --> 00:09:34,720 quickly, kay, using, let's say, symmetries or dominence or things that 182 00:09:34,720 --> 00:09:39,090 you necessarily have to do. This is a property that every solution 183 00:09:39,090 --> 00:09:42,070 will have. Every solution has to pick up item 5, 184 00:09:42,070 --> 00:09:45,068 therefore, will contain set S5. Okay? 185 00:09:45,068 --> 00:09:49,284 Now the other way that you typically can turn a greedy algorithm into a constraint 186 00:09:49,284 --> 00:09:53,314 program, for instance, is by saying, you know, I made a choice of the first set, 187 00:09:53,314 --> 00:09:58,250 what if I had taken another set, okay, instead? 188 00:09:58,250 --> 00:10:00,680 And now, you can start exploring other alternatives. 189 00:10:00,680 --> 00:10:04,035 So in the sense of a constraint program, you can think of it as like a greedy 190 00:10:04,035 --> 00:10:09,020 algorithm where you explore many things at every one of these steps. 191 00:10:09,020 --> 00:10:13,046 So you will basically prune the search base using various different kind of 192 00:10:13,046 --> 00:10:16,768 techniques. So these things are not as completely 193 00:10:16,768 --> 00:10:19,610 separated as you may think of. Okay? 194 00:10:19,610 --> 00:10:23,640 So this is essentially, you know the way you can build, you know, greedy 195 00:10:23,640 --> 00:10:26,854 algorithm. You can build, you know, optimization 196 00:10:26,854 --> 00:10:29,300 programs step by step. Okay? 197 00:10:29,300 --> 00:10:33,326 And, and this is a little bit the kind of metholodologies that you have to follow 198 00:10:33,326 --> 00:10:38,220 when you are trying to tackle a new optimization problem. 199 00:10:38,220 --> 00:10:40,860 Okay. Thank you and good luck.