1 00:00:00,000 --> 00:00:04,033 So we now have an algorithm, a very elegant one, that solves the weighted 2 00:00:04,033 --> 00:00:08,458 independent set problem in path graphs. Moreover the algorithm runs in linear O 3 00:00:08,458 --> 00:00:12,659 of n time, clearly the best possible. But before we do a victory lap I want to 4 00:00:12,659 --> 00:00:17,084 point out that there's something this algorithm doesn't do that you might want 5 00:00:17,084 --> 00:00:21,397 it to do, mainly hand you the actual optimal solution, not merely the value of 6 00:00:21,397 --> 00:00:24,982 that optimal solution. So the point of this video is to show you 7 00:00:24,982 --> 00:00:28,735 how we can reconstruct an optimal solution given the table-filling 8 00:00:28,735 --> 00:00:33,863 algorithm from the previous video. So we just write that algorithm back 9 00:00:33,863 --> 00:00:36,640 down. It's so short, it won't take me too long. 10 00:00:40,220 --> 00:00:43,935 Now when this algorithm completes, what do we have in our hands? 11 00:00:43,935 --> 00:00:47,178 We have an array, and this array is filled with numbers. 12 00:00:47,178 --> 00:00:51,659 In particular, in the last entry in the array is a number like 184 so that's 13 00:00:51,659 --> 00:00:54,254 great. That tells us that the maximum weight 14 00:00:54,254 --> 00:00:56,790 that any independence set possesses is 184. 15 00:00:56,790 --> 00:01:01,448 But, in many applications, we're going to want to know not just that information 16 00:01:01,448 --> 00:01:06,107 but actually which vertices constitute that independence set with total weight 17 00:01:06,107 --> 00:01:10,576 of 184. So, perhaps the first hack that comes to 18 00:01:10,576 --> 00:01:15,226 mind to address this issue is to augment the array, so that each entry stores not 19 00:01:15,226 --> 00:01:20,048 just the value of an optimal solution to the sub-problem produced by the graph G 20 00:01:20,048 --> 00:01:23,607 sub I, but also an actual set of vertices achieving that value. 21 00:01:23,607 --> 00:01:28,143 And I will leave it for you to if you want, rework the previous pseudo codes so 22 00:01:28,143 --> 00:01:32,678 that when you fill in a new entry, you fill in not just the value of an optimal 23 00:01:32,678 --> 00:01:37,558 solution given solution to the previous sub-problems but infact also the solution 24 00:01:37,558 --> 00:01:40,474 itself. This hack however is not generally how 25 00:01:40,474 --> 00:01:45,440 things are done in dynamic programming. It unnecessarily wastes both time and 26 00:01:45,440 --> 00:01:50,598 space and a much smarter approach is to reconstruct from the filled in table an 27 00:01:50,598 --> 00:01:56,968 optimal solution as needed. So if you think about it, it's kind of 28 00:01:56,968 --> 00:02:01,338 cool that this is even possible, that our one line algorithm doesn't cover its 29 00:02:01,338 --> 00:02:05,765 tracks, that it leaves enough clues for us as detectives to go back and examine 30 00:02:05,765 --> 00:02:08,286 and reconstruct what the optimal solution is. 31 00:02:08,286 --> 00:02:12,320 The following key point articulates exactly why this is indeed possible. 32 00:02:16,440 --> 00:02:20,945 So the starting point of this observation is the correctness of our algorithm, our 33 00:02:20,945 --> 00:02:24,077 one line algorithm. Which of course we established in the 34 00:02:24,077 --> 00:02:26,714 previous video. And by correctness I mean what's 35 00:02:26,714 --> 00:02:30,560 guaranteed that this algorithm will populate each entry of your array 36 00:02:30,560 --> 00:02:33,417 correctly. The number in the ith entry is indeed the 37 00:02:33,417 --> 00:02:36,220 maximum weight of independent set in the graph G I. 38 00:02:36,220 --> 00:02:40,203 So remember our thought experiment about what the optimal solution could possibly 39 00:02:40,203 --> 00:02:42,680 look like. We concluded it could only be one of two 40 00:02:42,680 --> 00:02:46,567 things, and we really wound up wishing we had a little birdie that could tell us 41 00:02:46,567 --> 00:02:50,210 whether or not the rightmost vertex v.sub.n was in the optimal solution or 42 00:02:50,210 --> 00:02:52,444 not. If we knew which case we were in we could 43 00:02:52,444 --> 00:02:56,525 just recursively compute the remainder of the solution from a graph that has either 44 00:02:56,525 --> 00:02:58,760 one or two fewer vertices. So here's the point, 45 00:02:58,760 --> 00:03:02,620 this filled in table, that's our little birdy. 46 00:03:02,620 --> 00:03:05,574 Here's what I mean. But what, what's, what's the reasoning 47 00:03:05,574 --> 00:03:09,121 that our algorithm goes through to fill in this last entry of this array, and 48 00:03:09,121 --> 00:03:12,715 don't forget, we've already proven that our algorithm's correct, we did that in 49 00:03:12,715 --> 00:03:15,156 the last video? Well, it does a comparison between the 50 00:03:15,156 --> 00:03:17,460 two possible candidates vying for the optimal one. 51 00:03:17,460 --> 00:03:22,339 On the one hand it commutes the case one solution that looks up the optimal value 52 00:03:22,339 --> 00:03:26,981 of a solution for the graph with one fewer vertex and it compares that to the 53 00:03:26,981 --> 00:03:31,623 case two solution including V N the last vertex and adding that to an optimal 54 00:03:31,623 --> 00:03:35,967 solution with two fewer vertices. And in this max operator in the line of 55 00:03:35,967 --> 00:03:39,300 code it's explicitly comparing which is better, case one. 56 00:03:39,300 --> 00:03:43,507 The solution which excludes B sub N, or case two, the solution which includes B 57 00:03:43,507 --> 00:03:45,484 sub N. So, whichever one of those was the 58 00:03:45,484 --> 00:03:49,134 winner, whichever one of those cases was used to fill in that last entry. 59 00:03:49,134 --> 00:03:52,885 That exactly tells us whether or not B sub N is in the optimal solution. 60 00:03:52,885 --> 00:03:56,940 If we use the first case, that means B sub N is not in the optimal solution, it 61 00:03:56,940 --> 00:03:59,627 gets excluded. If the second case was the winner, then 62 00:03:59,627 --> 00:04:03,124 we know B sub N is in the optimal solution, because that was the winner. 63 00:04:03,124 --> 00:04:06,217 If we have a tie, then there's an optimal solution either way. 64 00:04:06,217 --> 00:04:09,360 There's one that includes BN and there's one that excludes BN, 65 00:04:09,360 --> 00:04:13,742 So those are the tracks in the mud left for us by the forward direction of the 66 00:04:13,742 --> 00:04:16,349 for-loop. We can just go back and look at which 67 00:04:16,349 --> 00:04:19,067 case was used to fill in each entry of the array. 68 00:04:19,067 --> 00:04:23,504 Again, for the ones that used case one, that corresponds to excluding the current 69 00:04:23,504 --> 00:04:27,775 vertex; for those that used case two to fill in the entry, that corresponds to 70 00:04:27,775 --> 00:04:32,027 including that vertex, in the solution. So the reconnect structure algorithm will 71 00:04:32,027 --> 00:04:35,746 take as input the filled in array that was generated by our one line algorithm 72 00:04:35,746 --> 00:04:38,382 on the previous slide. And what it's going to do, it's going to 73 00:04:38,382 --> 00:04:40,454 trace through this array from right to left. 74 00:04:40,454 --> 00:04:43,985 And at each step of the main loop, it's going to say, it's going to look at the 75 00:04:43,985 --> 00:04:46,621 current entry. And it's just going to compute explicitly 76 00:04:46,621 --> 00:04:49,728 which of the two candidates were used to fill in this array entry. 77 00:04:49,728 --> 00:04:53,118 If you want, you can also cache the results of these comparisions on the 78 00:04:53,118 --> 00:04:55,425 forward pass. That's an optimization that will be 79 00:04:55,425 --> 00:04:58,720 useful later for harder problems. But for now if you want, you can just 80 00:04:58,720 --> 00:05:00,698 think about redoing the comparision thing. 81 00:05:00,698 --> 00:05:04,099 Hey, you know, their were two possible. More ways that we could have filled in 82 00:05:04,099 --> 00:05:06,440 this entry, let's just check which of the two were used. 83 00:05:06,440 --> 00:05:11,896 So if in fact the preceding array entry is at least as large as the one from two 84 00:05:11,896 --> 00:05:16,746 back plus the weight of the current vertex, that corresponds to case one 85 00:05:16,746 --> 00:05:21,731 winning, to the sub-solution that excludes the current vertex being better 86 00:05:21,731 --> 00:05:26,514 than the one that includes it. So in that case we just skip the current 87 00:05:26,514 --> 00:05:31,400 vertex V sub I and we decrease the array index by one in our scan. 88 00:05:31,400 --> 00:05:35,576 If the other case wins, that is, if we fill in the current entry, if an optimal 89 00:05:35,576 --> 00:05:39,807 solution to the current graph G sub I comprises the current vertex of E sub I 90 00:05:39,807 --> 00:05:43,224 plus the optimal solution to the graph with two fewer vertices. 91 00:05:43,224 --> 00:05:45,882 In this case we know we'd better include V sub I. 92 00:05:45,882 --> 00:05:49,244 That's part of an optimal solution to the current sub-problem. 93 00:05:49,244 --> 00:05:53,584 Moreover, that's the case where we need to look up the optimal solution with two 94 00:05:53,584 --> 00:05:56,567 fewer vertices. So, we include the current vertex and we 95 00:05:56,567 --> 00:06:02,088 decrease the array and index by two. So formally we have a correctness claim 96 00:06:02,088 --> 00:06:06,122 which is that the final output S return by the reconstructing algorithm is, as 97 00:06:06,122 --> 00:06:09,483 desired, a maximum weight independent set of the original graph g. 98 00:06:09,483 --> 00:06:13,414 We've already talked about all the ingredients necessary for a formal proof, 99 00:06:13,414 --> 00:06:17,293 for those of you who are interested. Of course, it precedes by induction and, 100 00:06:17,293 --> 00:06:21,378 in the inductive step, you use the same case analysis we've been using over and 101 00:06:21,378 --> 00:06:24,016 over again. The optimal solution at any given point, 102 00:06:24,016 --> 00:06:27,791 it has only two possible candidates. The algorithm explicitly figures out 103 00:06:27,791 --> 00:06:32,020 which of the two it is and that is what. Triggers whether or not to include or 104 00:06:32,020 --> 00:06:35,488 exclude the current vertex. The running time, it's even easier. 105 00:06:35,488 --> 00:06:38,445 We have a wild loop that runs in most an iterations. 106 00:06:38,445 --> 00:06:41,060 We do constant work in each of the iterations. 107 00:06:41,060 --> 00:06:45,267 So just like the forward pass, this backwards pass is really just a single 108 00:06:45,267 --> 00:06:48,735 scan through the away. It's going to be lightning fast linear 109 00:06:48,735 --> 00:06:49,020 time.