1
00:00:00,000 --> 00:00:04,033
So we now have an algorithm, a very 
elegant one, that solves the weighted 

2
00:00:04,033 --> 00:00:08,458
independent set problem in path graphs. 
Moreover the algorithm runs in linear O 

3
00:00:08,458 --> 00:00:12,659
of n time, clearly the best possible. 
But before we do a victory lap I want to 

4
00:00:12,659 --> 00:00:17,084
point out that there's something this 
algorithm doesn't do that you might want 

5
00:00:17,084 --> 00:00:21,397
it to do, mainly hand you the actual 
optimal solution, not merely the value of 

6
00:00:21,397 --> 00:00:24,982
that optimal solution. 
So the point of this video is to show you 

7
00:00:24,982 --> 00:00:28,735
how we can reconstruct an optimal 
solution given the table-filling 

8
00:00:28,735 --> 00:00:33,863
algorithm from the previous video. 
So we just write that algorithm back 

9
00:00:33,863 --> 00:00:36,640
down. 
It's so short, it won't take me too long. 

10
00:00:40,220 --> 00:00:43,935
Now when this algorithm completes, what 
do we have in our hands? 

11
00:00:43,935 --> 00:00:47,178
We have an array, and this array is 
filled with numbers. 

12
00:00:47,178 --> 00:00:51,659
In particular, in the last entry in the 
array is a number like 184 so that's 

13
00:00:51,659 --> 00:00:54,254
great. 
That tells us that the maximum weight 

14
00:00:54,254 --> 00:00:56,790
that any independence set possesses is 
184. 

15
00:00:56,790 --> 00:01:01,448
But, in many applications, we're going to 
want to know not just that information 

16
00:01:01,448 --> 00:01:06,107
but actually which vertices constitute 
that independence set with total weight 

17
00:01:06,107 --> 00:01:10,576
of 184. 
So, perhaps the first hack that comes to 

18
00:01:10,576 --> 00:01:15,226
mind to address this issue is to augment 
the array, so that each entry stores not 

19
00:01:15,226 --> 00:01:20,048
just the value of an optimal solution to 
the sub-problem produced by the graph G 

20
00:01:20,048 --> 00:01:23,607
sub I, but also an actual set of vertices 
achieving that value. 

21
00:01:23,607 --> 00:01:28,143
And I will leave it for you to if you 
want, rework the previous pseudo codes so 

22
00:01:28,143 --> 00:01:32,678
that when you fill in a new entry, you 
fill in not just the value of an optimal 

23
00:01:32,678 --> 00:01:37,558
solution given solution to the previous 
sub-problems but infact also the solution 

24
00:01:37,558 --> 00:01:40,474
itself. 
This hack however is not generally how 

25
00:01:40,474 --> 00:01:45,440
things are done in dynamic programming. 
It unnecessarily wastes both time and 

26
00:01:45,440 --> 00:01:50,598
space and a much smarter approach is to 
reconstruct from the filled in table an 

27
00:01:50,598 --> 00:01:56,968
optimal solution as needed. 
So if you think about it, it's kind of 

28
00:01:56,968 --> 00:02:01,338
cool that this is even possible, that our 
one line algorithm doesn't cover its 

29
00:02:01,338 --> 00:02:05,765
tracks, that it leaves enough clues for 
us as detectives to go back and examine 

30
00:02:05,765 --> 00:02:08,286
and reconstruct what the optimal solution 
is. 

31
00:02:08,286 --> 00:02:12,320
The following key point articulates 
exactly why this is indeed possible. 

32
00:02:16,440 --> 00:02:20,945
So the starting point of this observation 
is the correctness of our algorithm, our 

33
00:02:20,945 --> 00:02:24,077
one line algorithm. 
Which of course we established in the 

34
00:02:24,077 --> 00:02:26,714
previous video. 
And by correctness I mean what's 

35
00:02:26,714 --> 00:02:30,560
guaranteed that this algorithm will 
populate each entry of your array 

36
00:02:30,560 --> 00:02:33,417
correctly. 
The number in the ith entry is indeed the 

37
00:02:33,417 --> 00:02:36,220
maximum weight of independent set in the 
graph G I. 

38
00:02:36,220 --> 00:02:40,203
So remember our thought experiment about 
what the optimal solution could possibly 

39
00:02:40,203 --> 00:02:42,680
look like. 
We concluded it could only be one of two 

40
00:02:42,680 --> 00:02:46,567
things, and we really wound up wishing we 
had a little birdie that could tell us 

41
00:02:46,567 --> 00:02:50,210
whether or not the rightmost vertex 
v.sub.n was in the optimal solution or 

42
00:02:50,210 --> 00:02:52,444
not. 
If we knew which case we were in we could 

43
00:02:52,444 --> 00:02:56,525
just recursively compute the remainder of 
the solution from a graph that has either 

44
00:02:56,525 --> 00:02:58,760
one or two fewer vertices. 
So here's the point, 

45
00:02:58,760 --> 00:03:02,620
this filled in table, 
that's our little birdy. 

46
00:03:02,620 --> 00:03:05,574
Here's what I mean. 
But what, what's, what's the reasoning 

47
00:03:05,574 --> 00:03:09,121
that our algorithm goes through to fill 
in this last entry of this array, and 

48
00:03:09,121 --> 00:03:12,715
don't forget, we've already proven that 
our algorithm's correct, we did that in 

49
00:03:12,715 --> 00:03:15,156
the last video? 
Well, it does a comparison between the 

50
00:03:15,156 --> 00:03:17,460
two possible candidates vying for the 
optimal one. 

51
00:03:17,460 --> 00:03:22,339
On the one hand it commutes the case one 
solution that looks up the optimal value 

52
00:03:22,339 --> 00:03:26,981
of a solution for the graph with one 
fewer vertex and it compares that to the 

53
00:03:26,981 --> 00:03:31,623
case two solution including V N the last 
vertex and adding that to an optimal 

54
00:03:31,623 --> 00:03:35,967
solution with two fewer vertices. 
And in this max operator in the line of 

55
00:03:35,967 --> 00:03:39,300
code it's explicitly comparing which is 
better, case one. 

56
00:03:39,300 --> 00:03:43,507
The solution which excludes B sub N, or 
case two, the solution which includes B 

57
00:03:43,507 --> 00:03:45,484
sub N. 
So, whichever one of those was the 

58
00:03:45,484 --> 00:03:49,134
winner, whichever one of those cases was 
used to fill in that last entry. 

59
00:03:49,134 --> 00:03:52,885
That exactly tells us whether or not B 
sub N is in the optimal solution. 

60
00:03:52,885 --> 00:03:56,940
If we use the first case, that means B 
sub N is not in the optimal solution, it 

61
00:03:56,940 --> 00:03:59,627
gets excluded. 
If the second case was the winner, then 

62
00:03:59,627 --> 00:04:03,124
we know B sub N is in the optimal 
solution, because that was the winner. 

63
00:04:03,124 --> 00:04:06,217
If we have a tie, then there's an optimal 
solution either way. 

64
00:04:06,217 --> 00:04:09,360
There's one that includes BN and there's 
one that excludes BN, 

65
00:04:09,360 --> 00:04:13,742
So those are the tracks in the mud left 
for us by the forward direction of the 

66
00:04:13,742 --> 00:04:16,349
for-loop. 
We can just go back and look at which 

67
00:04:16,349 --> 00:04:19,067
case was used to fill in each entry of 
the array. 

68
00:04:19,067 --> 00:04:23,504
Again, for the ones that used case one, 
that corresponds to excluding the current 

69
00:04:23,504 --> 00:04:27,775
vertex; for those that used case two to 
fill in the entry, that corresponds to 

70
00:04:27,775 --> 00:04:32,027
including that vertex, in the solution. 
So the reconnect structure algorithm will 

71
00:04:32,027 --> 00:04:35,746
take as input the filled in array that 
was generated by our one line algorithm 

72
00:04:35,746 --> 00:04:38,382
on the previous slide. 
And what it's going to do, it's going to 

73
00:04:38,382 --> 00:04:40,454
trace through this array from right to 
left. 

74
00:04:40,454 --> 00:04:43,985
And at each step of the main loop, it's 
going to say, it's going to look at the 

75
00:04:43,985 --> 00:04:46,621
current entry. 
And it's just going to compute explicitly 

76
00:04:46,621 --> 00:04:49,728
which of the two candidates were used to 
fill in this array entry. 

77
00:04:49,728 --> 00:04:53,118
If you want, you can also cache the 
results of these comparisions on the 

78
00:04:53,118 --> 00:04:55,425
forward pass. 
That's an optimization that will be 

79
00:04:55,425 --> 00:04:58,720
useful later for harder problems. 
But for now if you want, you can just 

80
00:04:58,720 --> 00:05:00,698
think about redoing the comparision 
thing. 

81
00:05:00,698 --> 00:05:04,099
Hey, you know, their were two possible. 
More ways that we could have filled in 

82
00:05:04,099 --> 00:05:06,440
this entry, let's just check which of the 
two were used. 

83
00:05:06,440 --> 00:05:11,896
So if in fact the preceding array entry 
is at least as large as the one from two 

84
00:05:11,896 --> 00:05:16,746
back plus the weight of the current 
vertex, that corresponds to case one 

85
00:05:16,746 --> 00:05:21,731
winning, to the sub-solution that 
excludes the current vertex being better 

86
00:05:21,731 --> 00:05:26,514
than the one that includes it. 
So in that case we just skip the current 

87
00:05:26,514 --> 00:05:31,400
vertex V sub I and we decrease the array 
index by one in our scan. 

88
00:05:31,400 --> 00:05:35,576
If the other case wins, that is, if we 
fill in the current entry, if an optimal 

89
00:05:35,576 --> 00:05:39,807
solution to the current graph G sub I 
comprises the current vertex of E sub I 

90
00:05:39,807 --> 00:05:43,224
plus the optimal solution to the graph 
with two fewer vertices. 

91
00:05:43,224 --> 00:05:45,882
In this case we know we'd better include 
V sub I. 

92
00:05:45,882 --> 00:05:49,244
That's part of an optimal solution to the 
current sub-problem. 

93
00:05:49,244 --> 00:05:53,584
Moreover, that's the case where we need 
to look up the optimal solution with two 

94
00:05:53,584 --> 00:05:56,567
fewer vertices. 
So, we include the current vertex and we 

95
00:05:56,567 --> 00:06:02,088
decrease the array and index by two. 
So formally we have a correctness claim 

96
00:06:02,088 --> 00:06:06,122
which is that the final output S return 
by the reconstructing algorithm is, as 

97
00:06:06,122 --> 00:06:09,483
desired, a maximum weight independent set 
of the original graph g. 

98
00:06:09,483 --> 00:06:13,414
We've already talked about all the 
ingredients necessary for a formal proof, 

99
00:06:13,414 --> 00:06:17,293
for those of you who are interested. 
Of course, it precedes by induction and, 

100
00:06:17,293 --> 00:06:21,378
in the inductive step, you use the same 
case analysis we've been using over and 

101
00:06:21,378 --> 00:06:24,016
over again. 
The optimal solution at any given point, 

102
00:06:24,016 --> 00:06:27,791
it has only two possible candidates. 
The algorithm explicitly figures out 

103
00:06:27,791 --> 00:06:32,020
which of the two it is and that is what. 
Triggers whether or not to include or 

104
00:06:32,020 --> 00:06:35,488
exclude the current vertex. 
The running time, it's even easier. 

105
00:06:35,488 --> 00:06:38,445
We have a wild loop that runs in most an 
iterations. 

106
00:06:38,445 --> 00:06:41,060
We do constant work in each of the 
iterations. 

107
00:06:41,060 --> 00:06:45,267
So just like the forward pass, this 
backwards pass is really just a single 

108
00:06:45,267 --> 00:06:48,735
scan through the away. 
It's going to be lightning fast linear 

109
00:06:48,735 --> 00:06:49,020
time.