1 00:00:00,012 --> 00:00:04,608 In the Design and Analysis of Algorithms, both this course and its predecessor, 2 00:00:04,608 --> 00:00:09,267 we've covered a lot of famous algorithms. We've covered a lot of fundamental data 3 00:00:09,267 --> 00:00:11,647 structures. We've covered lots of killer 4 00:00:11,647 --> 00:00:14,732 applications. And despite the fact that I've tried to 5 00:00:14,732 --> 00:00:19,277 use our time together in the optimal way, giving you the most bang for the buck, 6 00:00:19,277 --> 00:00:22,632 there are fundamental topics we have not had a chance. 7 00:00:22,632 --> 00:00:25,786 To cover. In these final few videos, I want to talk 8 00:00:25,786 --> 00:00:29,147 about things we didn't talk about. I have a few goals. 9 00:00:29,147 --> 00:00:33,988 The first goal is to give you a little bit of literacy and a few more techniques 10 00:00:33,988 --> 00:00:39,011 so at least you've heard of things like bypar type matching, and the maximum flow 11 00:00:39,011 --> 00:00:43,570 problem, and your aware that there are good algorithms for those sorts of 12 00:00:43,570 --> 00:00:46,701 problems. Secondly, I want to get you excited to do 13 00:00:46,701 --> 00:00:50,937 a little bit more study either through further courses or on your own. 14 00:00:50,937 --> 00:00:55,709 And indeed generally the topics that I'll mention in these final few videos are 15 00:00:55,709 --> 00:00:58,742 covered in your more comprehensive textbooks. 16 00:00:58,742 --> 00:01:02,683 On algorithms. Finally I hope to convey something that I 17 00:01:02,683 --> 00:01:07,997 mention in passing, in the past, is that algorithms is not a oscified field. 18 00:01:07,997 --> 00:01:13,261 Rather, it's a vibrant area of research. There's still lot's of things about 19 00:01:13,261 --> 00:01:16,922 fundamental problems of models that we don't get. 20 00:01:16,922 --> 00:01:20,658 Understand. Let's get the ball rolling with a very 21 00:01:20,658 --> 00:01:24,369 fun, classical problem known as stable matching. 22 00:01:24,369 --> 00:01:30,139 We're going to think of stable matching as a graph problem and there's going to 23 00:01:30,139 --> 00:01:33,917 be two sets of nodes, two sets of vertices, U and V. 24 00:01:33,917 --> 00:01:39,106 For historical reasons, we will call these sets the men and the women, 25 00:01:39,106 --> 00:01:43,636 respectively. A non-essential assumption that will make 26 00:01:43,636 --> 00:01:48,749 for simplicity is that these 2 sets of nodes have equal cardinality. 27 00:01:48,749 --> 00:01:53,433 Call that common size N. So for example, maybe N equals 3 and we 28 00:01:53,433 --> 00:01:56,530 can call the first set of nodes A, B and C. 29 00:01:56,530 --> 00:02:01,906 And the second set of nodes D, E and F. Now what's really important about the 30 00:02:01,906 --> 00:02:06,911 stable matching problem is that every node has preferences, there are things 31 00:02:06,911 --> 00:02:11,365 that advance more than others. Specifically each node in U has a rank 32 00:02:11,365 --> 00:02:16,012 list of the nodes in V, each node in V has a rank list of the nodes in U. 33 00:02:16,012 --> 00:02:20,110 So in this example maybe A, B, and C are all on the same page. 34 00:02:20,110 --> 00:02:24,968 They all agree that D is really the best node, it's better than E, but E is 35 00:02:24,968 --> 00:02:30,334 definitely better than F. That is maybe every single node on the 36 00:02:30,334 --> 00:02:35,167 left hand side has the rank listed D, followed by E, followed by F. 37 00:02:35,167 --> 00:02:43,467 By contrast, maybe there's heterogeneous preferences amongst vertices in capital 38 00:02:43,467 --> 00:02:47,192 V. Maybe D thinks a is better than B is 39 00:02:47,192 --> 00:02:52,692 better than C. While E prefers B to C to A. 40 00:02:52,692 --> 00:02:58,538 And finally, F prefers C to A to B. So what might these nodes and these 41 00:02:58,538 --> 00:03:04,389 ranked lists represent? Well, imagine for example that one set is colleges and the 42 00:03:04,389 --> 00:03:08,352 other set is applicants, recent high school graduates. 43 00:03:08,352 --> 00:03:13,259 Each college has a ranking of the applicants based on things like test 44 00:03:13,259 --> 00:03:17,455 scores, grades and the fit that student is for the college. 45 00:03:17,455 --> 00:03:23,177 And each student similarly has a ranking over the colleges of which one's prefers 46 00:03:23,177 --> 00:03:24,870 to go to. Do over others. 47 00:03:24,870 --> 00:03:29,084 As another example you might think about medical residents that is people who just 48 00:03:29,084 --> 00:03:32,719 finished medical school looking for residency and the other side being 49 00:03:32,719 --> 00:03:35,128 hospitals. Again hospitals are going to have a 50 00:03:35,128 --> 00:03:39,124 preference over which recent graduates they accept as residents, residents, 51 00:03:39,124 --> 00:03:43,039 potential residents of course have preferences over which hospital they get 52 00:03:43,039 --> 00:03:46,727 assigned to. So given this data, 2 sets of nodes, and 53 00:03:46,727 --> 00:03:51,482 these rank lists we're interested in computing a stable matching. 54 00:03:51,482 --> 00:03:56,972 So what's a stable matching? Well, first of all, it better give you a perfect 55 00:03:56,972 --> 00:04:00,372 matching. And a perfect matching means that each 56 00:04:00,372 --> 00:04:06,292 node of the left, each node of U is matched to exactly one node of V and vice 57 00:04:06,292 --> 00:04:10,382 versa. In addition to being a perfect matching, 58 00:04:10,382 --> 00:04:16,440 it should also be stable in the sense that there should be no Blocking pair. 59 00:04:16,440 --> 00:04:21,459 What that means, is that if you look at any pair of nodes, little u from capital 60 00:04:21,459 --> 00:04:26,194 u, little v from capital v, if little u and little v are not matched, there 61 00:04:26,194 --> 00:04:29,451 better be a good reason for them not being matched. 62 00:04:29,451 --> 00:04:34,591 Specifically little u better strictly prefer Who it is matched to, call that V 63 00:04:34,591 --> 00:04:39,901 prime, to this alternate V, or if that's not the case, it better be that little v 64 00:04:39,901 --> 00:04:44,889 strictly prefers the node it's matched to, call it U prime, compared to the 65 00:04:44,889 --> 00:04:46,609 alternative. Little u. 66 00:04:46,609 --> 00:04:51,596 So what's the motivation for this definition? Well think about any perfect 67 00:04:51,596 --> 00:04:54,657 matching that fails this stability property. 68 00:04:54,657 --> 00:04:58,533 You're really asking for trouble for that kind of matching. 69 00:04:58,533 --> 00:05:03,620 Specifically u with its wandering eye will spy v and it prefers v over v node v 70 00:05:03,620 --> 00:05:07,782 prime to which It was a sign. Similiary, V is going to return that 71 00:05:07,782 --> 00:05:10,387 gaze. V prefers U by assumption that this 72 00:05:10,387 --> 00:05:15,152 stability property fails over the node U prime to which it was matched, so U and V 73 00:05:15,152 --> 00:05:19,562 are going to have an incentive to essentially elope and disrupt this 74 00:05:19,562 --> 00:05:23,061 Imagine for example that u is a student and V is a college. 75 00:05:23,061 --> 00:05:26,619 And you have a matching that fails the stability property. 76 00:05:26,619 --> 00:05:31,265 Then, this pair this student and this college has an incentive to do a sort of 77 00:05:31,265 --> 00:05:35,409 back room deal after the fact. U might want to withdraw its commitment 78 00:05:35,409 --> 00:05:38,950 from its assigned college, V-prime to try and get into V instead. 79 00:05:38,950 --> 00:05:43,523 And V to college actually wants to sort of withdraw its offer from its 80 00:05:43,523 --> 00:05:46,038 student, U prime, they give it to U instead. 81 00:05:46,038 --> 00:05:50,598 So that's an unstable matching, but as long as there's no blocking pair of this 82 00:05:50,598 --> 00:05:55,005 form, and for every pair which is not matched together there's a good reason 83 00:05:55,005 --> 00:05:58,002 for it, then we deem it a stable Stable matching. 84 00:05:58,002 --> 00:06:02,729 Let me tell you about an extremely elegant and justly famous algorithm for 85 00:06:02,729 --> 00:06:07,097 computing a stable matching, the Gale-Shapley Proposal Algorithm. 86 00:06:07,097 --> 00:06:12,183 Lloyd Shapely was one of the recipients of the 2012 Nobel Prize in economics, in 87 00:06:12,183 --> 00:06:16,438 large part for this algorithm. Had David Gale still been alive, he 88 00:06:16,438 --> 00:06:20,560 surely would have been a co-recipient of that Nobel Prize as well. 89 00:06:21,669 --> 00:06:26,459 Let me explain the algorithm in an example, then I'll give you the general 90 00:06:26,459 --> 00:06:29,891 psudeocode. We start with the empty matching, with 91 00:06:29,891 --> 00:06:34,632 nobody matched with anybody else. And as long as there's some man, some 92 00:06:34,632 --> 00:06:38,657 node on the left hand side which isn't matched to some woman. 93 00:06:38,657 --> 00:06:43,497 We pick an arbitrary unattached man and we let it propose to a woman of its 94 00:06:43,497 --> 00:06:46,690 choice. So we might start for example with The 95 00:06:46,690 --> 00:06:50,432 man C. C says, well, my favorite woman is D, so 96 00:06:50,432 --> 00:06:56,838 let me start by proposing to the woman D. Now D isn't necessarily very impressed 97 00:06:56,838 --> 00:07:03,410 with C, you've noticed C is last in these preference list, but with a lack of other 98 00:07:03,410 --> 00:07:09,922 proposals at the moment, the node D tentatively accepts this proposal from C. 99 00:07:09,922 --> 00:07:15,667 Now we proceed to the next iteration of the wow loop, we look for a man that is 100 00:07:15,667 --> 00:07:20,937 unattached, so maybe we pick B. B now gets its shot at proposing to its 101 00:07:20,937 --> 00:07:24,206 favorite woman. In this case it is also D. 102 00:07:24,206 --> 00:07:29,851 So now the node D has a bit of leverage it has two competing offers from the 103 00:07:29,851 --> 00:07:35,645 nodes B and C and of course it retains the offer from the preferred suitor in 104 00:07:35,645 --> 00:07:41,576 this case D prefers D to C so the edge between C and D gets deleted and replaced 105 00:07:41,576 --> 00:07:46,507 with the edge between B and D. So now both of them at A and C are 106 00:07:46,507 --> 00:07:50,673 unattached. Perhaps in the next iteration we let the 107 00:07:50,673 --> 00:07:55,459 node A gets its shot. It again proposes to the node D, and 108 00:07:55,459 --> 00:08:01,096 again D prefers A to B, so the edge between A and D is going to replace the 109 00:08:01,096 --> 00:08:05,762 edge between B and D. So now we again have 2 on hatched men B 110 00:08:05,762 --> 00:08:11,569 and C, lets say we pick C next so C gets to propose to what ever women it wants 111 00:08:11,569 --> 00:08:17,562 and D is its favorite but it is already being rejected by D and in the algorithm 112 00:08:17,562 --> 00:08:22,933 men will never re-propose to a women who has previously rejected him. 113 00:08:22,933 --> 00:08:28,972 So next on C's list is the women E and so E has that not had any other offer so E. 114 00:08:28,972 --> 00:08:32,613 He tentatively accepts the proposal from C. 115 00:08:32,613 --> 00:08:38,786 But now, once B gets a chance, B will also propose next to E, that's B's next 116 00:08:38,786 --> 00:08:42,545 favorite and it's already been rejected by D. 117 00:08:42,545 --> 00:08:48,718 And E prefers B to C, so we will replace the edge between C and E with the edge 118 00:08:48,718 --> 00:08:52,816 between B and E. And so now, finally, the man C is the 119 00:08:52,816 --> 00:08:58,958 only one that remains unattached and it has no choice but to propose to its final 120 00:08:58,958 --> 00:09:03,608 option the node F. F actually happily accepts that because c 121 00:09:03,608 --> 00:09:08,799 is on the top of f's list. And that leaves us with the matching That 122 00:09:08,799 --> 00:09:14,082 matches A with D, B with E, and C with F. This is clearly a perfect matching. 123 00:09:14,082 --> 00:09:17,711 I'll let you check that's is also a stable matching. 124 00:09:17,711 --> 00:09:23,209 Having traced through this example, I hope that the pseudocode for the general 125 00:09:23,209 --> 00:09:27,593 algorithm will not surprise you. So we have a main y loop. 126 00:09:27,593 --> 00:09:33,828 In each iteration we pick some man who is not yet engaged to any woman, call that 127 00:09:33,828 --> 00:09:37,396 man u. U then proposes to his favorite woman, 128 00:09:37,396 --> 00:09:43,759 the top ranked woman with the constraint that he will not propose a second time to 129 00:09:43,759 --> 00:09:49,693 any woman who has already rejected him. Each woman then behaves in the obvious 130 00:09:49,693 --> 00:09:54,667 way which is to keep track of all of the suitors, all of the men that have 131 00:09:54,667 --> 00:09:59,750 proposed to her and to retain as a tentative engagement only the proposal 132 00:09:59,750 --> 00:10:03,317 from their favorite, that is highest ranked suitor. 133 00:10:04,631 --> 00:10:09,508 Notice this algorithm maintains the invariant that the current set of 134 00:10:09,508 --> 00:10:14,357 engagements is a matching. Not necessarily a perfect matching, but a 135 00:10:14,357 --> 00:10:17,465 matching. That is, at any given time a man is 136 00:10:17,465 --> 00:10:20,601 engaged tentatively to, at most, one woman. 137 00:10:20,601 --> 00:10:24,752 And every woman is tentatively engaged to, at most, one. 138 00:10:24,752 --> 00:10:30,398 The Gale-shapley theorem states that, not only does this algorithm terminate 139 00:10:30,398 --> 00:10:35,421 quickly, not only does it terminate quickly with a perfect matching. 140 00:10:35,421 --> 00:10:40,413 But in fact, it terminates quickly, namely, in a quadratic number of 141 00:10:40,413 --> 00:10:43,592 iterations at most. With a stable match. 142 00:10:43,592 --> 00:10:50,687 In particular, this theory, this algorithm provides constructive proof 143 00:10:50,687 --> 00:10:58,322 that a stable matching always exists, no matter what the preference of the node 144 00:10:58,322 --> 00:11:01,482 are. A highly non-obvious fact. 145 00:11:01,482 --> 00:11:06,969 The proof of the Gale-Shapley theorem is as elegant as the algorithm. 146 00:11:06,969 --> 00:11:12,022 Let me show it to you now. So first of all, why does the algorithm 147 00:11:12,022 --> 00:11:17,737 converge in a quadratic number of iterations? Well, no man proposes to a 148 00:11:17,737 --> 00:11:21,572 woman twice. Therefore, each man makes at most n 149 00:11:21,572 --> 00:11:25,496 proposals. There are only n men so that's the most 150 00:11:25,496 --> 00:11:31,246 n^2 proposals over all. Secondly, when the Gale-Shapley algorthym 151 00:11:31,246 --> 00:11:34,628 terminates it does show with a perfect matching. 152 00:11:34,628 --> 00:11:39,956 Indeed it's a stable matching, but let's just for the moment prove that it's a 153 00:11:39,956 --> 00:11:44,568 perfect matching. To, to see this let's suppose the 154 00:11:44,568 --> 00:11:49,452 contrary. Let's suppose that when it halts, it does not halt with a perfect 155 00:11:49,452 --> 00:11:52,273 match. That means, in particular, some man is 156 00:11:52,273 --> 00:11:56,015 assigned to know woman. How could that have happened? Well it 157 00:11:56,015 --> 00:11:59,355 must have been that the man fell off of his ranked list. 158 00:11:59,355 --> 00:12:04,102 He proposed to every single one of the n women and was rejected by all of them . 159 00:12:04,102 --> 00:12:08,741 So how could this have happened? Well, notice that for a man to be 160 00:12:08,741 --> 00:12:14,143 rejected by a woman, it must be that the woman has some other offer from some 161 00:12:14,143 --> 00:12:19,192 other man that she likes better. That is, rejections happen only by a 162 00:12:19,192 --> 00:12:24,665 woman who is engaged to somebody else. So by virtue of this man being rejected 163 00:12:24,665 --> 00:12:28,909 by all n women, all n women got engaged at some point in the algorithm. 164 00:12:28,909 --> 00:12:33,438 Moreover, if you inspect the algorithm, you'll notice that once a woman is 165 00:12:33,438 --> 00:12:37,063 engaged to somebody, she will remain engaged forever more. 166 00:12:37,063 --> 00:12:41,910 The only thing that changes, is the woman is engaged to men that she likes better 167 00:12:41,910 --> 00:12:46,859 and better as the alogorithm proceeds. So a man rejected by everybody implies 168 00:12:46,859 --> 00:12:51,681 that all women get engaged at some point in the algorithm which means they are all 169 00:12:51,681 --> 00:12:56,107 engaged at the end of the algorithm. But there's an equal number of men and 170 00:12:56,107 --> 00:12:58,865 women. So there's no way all women are engaged 171 00:12:58,865 --> 00:13:03,589 at the end and some man is not engaged. If all women are engaged, all men must be 172 00:13:03,589 --> 00:13:05,024 engaged. As well. 173 00:13:05,024 --> 00:13:11,264 Finally, why is the perfect matching, computed by the Gale-Shapley proposal 174 00:13:11,264 --> 00:13:17,422 algorithm, not just perfect, but more generally, stable? To see why this is 175 00:13:17,422 --> 00:13:21,403 true, let's prove that there's no blocking pair. 176 00:13:21,403 --> 00:13:26,934 That is, let's consider an arbitrary unmatched man u and woman v. 177 00:13:26,934 --> 00:13:33,923 Now either, the man, u, proposed to the woman, v, at some point in the proposal 178 00:13:33,923 --> 00:13:39,852 algorithm, or he didn't. Let's treat those two cases separately. 179 00:13:39,852 --> 00:13:44,885 So let's first suppose that at no point in the algorithm did the man u, propose 180 00:13:44,885 --> 00:13:48,356 to the woman v. So, what's the behavior of a man in the 181 00:13:48,356 --> 00:13:53,364 proposal algorithm? Well a man simply proposes to the women on his list one at 182 00:13:53,364 --> 00:13:57,461 a time in turn as necessary. And the man, the woman to whom a man 183 00:13:57,461 --> 00:14:02,469 winds up matched at the end of the algorithm is the last woman to whom he 184 00:14:02,469 --> 00:14:04,940 proposed. So if the man u never got around to 185 00:14:04,940 --> 00:14:09,391 proposing to the woman v, that just means that he never searched that far down in 186 00:14:09,391 --> 00:14:12,852 his preference list. And he wound up, at the conclusion of the 187 00:14:12,852 --> 00:14:16,227 algorithm, matched with the woman he prefers to the woman v. 188 00:14:16,227 --> 00:14:20,931 And so that's sufficient to say, to claim that uv is not a blocking pair. 189 00:14:20,931 --> 00:14:26,167 Suppose, on the other hand, that you did propose the woman, v, at some point in 190 00:14:26,167 --> 00:14:29,822 the algorithm. Why did they not wind up matched? Well, 191 00:14:29,822 --> 00:14:34,959 it must be because the woman, v, got a preferred offer, an offer from some man 192 00:14:34,959 --> 00:14:39,207 that she prefers To the man you. Now something we mentioned earlier, is 193 00:14:39,207 --> 00:14:43,790 over the course of the algorithm, a women just winds up being engaged to men that 194 00:14:43,790 --> 00:14:47,630 she prefers more and more. So that any point she winds up engaged to 195 00:14:47,630 --> 00:14:52,366 a man she prefers to you, then the end of the algorithm she will And be engaged, 196 00:14:52,366 --> 00:14:57,286 matched with a man that she prefers to u. And again, that implies that uv is not a 197 00:14:57,286 --> 00:15:00,714 blocking pair. Since uv was arbitrary, this is a stable 198 00:15:00,714 --> 00:15:03,294 matching. That completes the proof of the 199 00:15:03,294 --> 00:15:04,624 Gale-Shapley Theorem.