1 00:00:00,000 --> 00:00:05,468 So now, let's talk about the final and most serious problem, which is that stuff 2 00:00:05,468 --> 00:00:10,340 in the Internet, nodes, links, etc., is failing all the time. 3 00:00:10,340 --> 00:00:14,322 You know its true that we just argued that the asynchronous bellman ford 4 00:00:14,322 --> 00:00:18,087 shortest path rounding protocol is guaranteed to converge to correct 5 00:00:18,087 --> 00:00:20,979 shortest path. that was assuming that the network was 6 00:00:20,979 --> 00:00:24,220 static that no legs were coming up and down. 7 00:00:24,220 --> 00:00:29,425 So one particularly simple problem in a presence of failure is what's known as 8 00:00:29,425 --> 00:00:33,922 the counting to infinity problem. So I'm going to show you a particularly 9 00:00:33,922 --> 00:00:38,103 contrived version of this problem just to kind of illustrate the form in the 10 00:00:38,103 --> 00:00:42,015 simplest way, but it's quite easy to come up with more realistic examples. 11 00:00:42,015 --> 00:00:46,088 And this problem is not just some kind of theoretical flaw in the distributed 12 00:00:46,088 --> 00:00:49,465 Bellmanu2013Ford algorithm. This problem really was observed in 13 00:00:49,465 --> 00:00:53,380 practice with those running protocols from the 1970's. 14 00:00:53,380 --> 00:00:58,719 So imagine we just have a super simple network verticies s, v, and t by directed 15 00:00:58,719 --> 00:01:04,194 arch from s to v and an arc from v to t everything has unit cost and we're doing 16 00:01:04,194 --> 00:01:09,250 it routing to the destination t. You might if you want a more realistic 17 00:01:09,250 --> 00:01:13,535 example, think about the arcs between S and V as standing in for longer paths 18 00:01:13,535 --> 00:01:16,762 that have multiple hops. In any case let's imagine that we 19 00:01:16,762 --> 00:01:21,269 successfully computed shortest paths on this basic network so that distance form 20 00:01:21,269 --> 00:01:25,400 T to itself is zero, from V to T is one and from S to T is two. 21 00:01:25,400 --> 00:01:30,157 Now, again, the issue is that links are failing all the time on the internet. 22 00:01:30,157 --> 00:01:34,120 So, at some point, this link from V to T might go down. 23 00:01:34,120 --> 00:01:38,669 So, at some point, V is going to notice that its link to T has failed and it's 24 00:01:38,669 --> 00:01:43,960 going to have to reset its shortest path estimate to T to be plus infinity. 25 00:01:43,960 --> 00:01:48,681 And in an effort to recover connectivity to t, it makes sense for v to ask it's 26 00:01:48,681 --> 00:01:52,387 neighbors, do they have paths to t? And if so, how long are they? 27 00:01:52,387 --> 00:01:56,929 So in particular, v will pull s, and s will say, oh yeah, I have a path to t of 28 00:01:56,929 --> 00:01:59,918 distance only two. And v says, oh well that's great. 29 00:01:59,918 --> 00:02:02,667 I, currently my estimate to t is plus infinity. 30 00:02:02,667 --> 00:02:07,150 I can get to s with length one, s says it can get back to t with length two. 31 00:02:07,150 --> 00:02:10,160 So that gives me a path of length three to t. 32 00:02:10,160 --> 00:02:15,807 Now, of course the flaw in this reasoning is that s was depending on v to get to t 33 00:02:15,807 --> 00:02:21,455 and now all of a sudden, v circularly is going to use s in its mind to get all the 34 00:02:21,455 --> 00:02:24,865 way back to t. So this already illustrates the dangers 35 00:02:24,865 --> 00:02:29,233 of naively implementing the asynchronous Bellman Ford algorithm in the presence of 36 00:02:29,233 --> 00:02:32,128 link failures. Where does the name counting to infinity 37 00:02:32,128 --> 00:02:34,496 come from. Well you can imagine that for some 38 00:02:34,496 --> 00:02:38,549 implementations of this protocol, S would then say, oh boy, okay, V just revised 39 00:02:38,549 --> 00:02:40,917 it's shortest path estimate to T to be three. 40 00:02:40,917 --> 00:02:44,233 My next hop was to V. So if V takes two longer to get to T then 41 00:02:44,233 --> 00:02:46,812 I'm going to take two longer to get to T as well. 42 00:02:46,812 --> 00:02:50,286 So at that point S updates it's shortest path distance to be four. 43 00:02:50,286 --> 00:02:53,707 And now this keeps going on. At this point V says, oh well, if S is 44 00:02:53,707 --> 00:02:56,760 going to take two longer to get to T, I was depending on S. 45 00:02:56,760 --> 00:03:01,076 So, I have, have to go up by two. And then, S goes up by two and B goes up 46 00:03:01,076 --> 00:03:05,241 by two and so on, forevermore. So failures cause problems for the 47 00:03:05,241 --> 00:03:08,463 convergence of the distributive shortest-path routing protocols. 48 00:03:08,463 --> 00:03:11,835 Not just the counting-to-infinity problem, there are others as well. 49 00:03:11,835 --> 00:03:15,208 And this is a tough issue. Much blood and ink has been spilled over 50 00:03:15,208 --> 00:03:19,134 the relative merits of different possible solutions for dealing with failures. 51 00:03:19,134 --> 00:03:22,910 Back in the 1980s people were largely proposing sort of hacks to the basic 52 00:03:22,910 --> 00:03:25,376 protocol. I will the, give them credit, they had 53 00:03:25,376 --> 00:03:29,252 some pretty ridiculous and fun names for their hacks, like split horizon with 54 00:03:29,252 --> 00:03:33,229 poison reverse." but what eventually people converged on is moving from these 55 00:03:33,229 --> 00:03:38,840 so-called distance-vector protocols to what are called path vector protocols. 56 00:03:40,040 --> 00:03:44,883 And what happens in a path vector protocol is, each vertex maintains not 57 00:03:44,883 --> 00:03:49,658 just the next top to every possible destination, but they actually keep 58 00:03:49,658 --> 00:03:55,580 locally the entire path, which is going to get used from v to each destination t. 59 00:03:55,580 --> 00:03:59,600 So there's definitely some irony here because this is essentially the solution 60 00:03:59,600 --> 00:04:03,418 to reconstructing optimal solutions in dynamic programming that we've been 61 00:04:03,418 --> 00:04:07,337 studiously avoiding this whole time. You might, you might recall when we first 62 00:04:07,337 --> 00:04:11,103 started discussing reconstruction algorithms, this was back independent 63 00:04:11,103 --> 00:04:14,056 sets of path graphs. We first said, well, you know, you could 64 00:04:14,056 --> 00:04:18,127 in the forward pass through the array store not just the optimal solution value 65 00:04:18,127 --> 00:04:21,639 but the optimal solution itself. But we argued, well, that's, that's not 66 00:04:21,639 --> 00:04:23,472 really the best idea. It wastes time. 67 00:04:23,472 --> 00:04:26,271 It wastes space. Much smarter is to just reconstruct an 68 00:04:26,271 --> 00:04:29,580 optimal solution via a backward pass through the filled in table. 69 00:04:29,580 --> 00:04:33,393 But what's happening here is this optimized version of the reconstruction 70 00:04:33,393 --> 00:04:37,310 algorithm that doesn't use extra time or space, it's just not robust enough to 71 00:04:37,310 --> 00:04:39,526 withstand the vagaries of internet routing. 72 00:04:39,526 --> 00:04:43,546 So we are going to the crudest solution, where we literally just store optimal 73 00:04:43,546 --> 00:04:46,020 solutions, not just the solution value it's self. 74 00:04:46,020 --> 00:04:49,106 So that explains why this is called a path vector protocol. 75 00:04:49,106 --> 00:04:53,082 Distance vector protocol you store a distance for each possible destination. 76 00:04:53,082 --> 00:04:56,320 Here were storing a path for each possible destination. 77 00:04:56,320 --> 00:04:59,298 So the drawback is exactly what we've been saying all along. 78 00:04:59,298 --> 00:05:03,270 If you store the optimal solutions, and not just the optimal solution value, it's 79 00:05:03,270 --> 00:05:06,845 going to take quite a bit more space. And indeed, doing this blows up the 80 00:05:06,845 --> 00:05:09,820 routing tables in routers, that's just a fact. 81 00:05:09,820 --> 00:05:13,353 There are two quite important benefits. The first we've already discussed, 82 00:05:13,353 --> 00:05:16,741 they're more robust to failures. I am not going to discuss the details 83 00:05:16,741 --> 00:05:20,226 about how you use this extra path information to handle failures better. 84 00:05:20,226 --> 00:05:23,953 But certainly in our contrived motivating example you can see how if S and V 85 00:05:23,953 --> 00:05:27,777 actually knew the entire path they were storing to T, they could make sure that 86 00:05:27,777 --> 00:05:30,760 they didn't get into this counting to infinity problem. 87 00:05:30,760 --> 00:05:35,656 But moving to a path vector protocol doesn't just add robustness, it also 88 00:05:35,656 --> 00:05:41,392 enables new functionality. So in particular, passing around entire 89 00:05:41,392 --> 00:05:44,263 paths allows. Vertices to express more complicated 90 00:05:44,263 --> 00:05:47,250 preferences over routes than merely what their lengths are. 91 00:05:47,250 --> 00:05:50,034 Imagine, for example, you were a small internet provider. 92 00:05:50,034 --> 00:05:53,832 And you have a much more favorable contract with AT&T than you do with 93 00:05:53,832 --> 00:05:56,161 Sprint. So it cost you much less money to send 94 00:05:56,161 --> 00:05:59,806 traffic through AT&T as an intermediary than it does through Sprint. 95 00:05:59,806 --> 00:06:03,654 Well, now imagine you're running this, distributive shortest-path protocol. 96 00:06:03,654 --> 00:06:07,451 And two of your neighbors offer you two paths to get to some destination T. 97 00:06:07,451 --> 00:06:11,147 The first path goes through AT&T, the second path goes through Sprint. 98 00:06:11,147 --> 00:06:15,060 And, of course, the only reason you know this information is because it's a path 99 00:06:15,060 --> 00:06:17,982 vector protocol, and you're passing around the actual path and you can see 100 00:06:17,982 --> 00:06:21,577 what the intermediate stops are. Well then your free to say oh well I'm 101 00:06:21,577 --> 00:06:25,527 going to use as my path to t and this is what I'm going to advertise to other 102 00:06:25,527 --> 00:06:29,376 people I'm going to advertise and store the path that goes through AT&T. 103 00:06:29,376 --> 00:06:32,360 I'm going to ignore the one that goes through Sprint. 104 00:06:32,360 --> 00:06:36,807 So that is more or less how the border gateway protocol, aka the BGP protocol, 105 00:06:36,807 --> 00:06:41,254 works, and that is the protocol which governs how traffic gets routed through 106 00:06:41,254 --> 00:06:44,950 the core of the Internet. So that wraps up our discussion of how 107 00:06:44,950 --> 00:06:49,628 shortest path algorithms dating back to the 1950's has played a fundamental role 108 00:06:49,628 --> 00:06:54,017 in how shortest path routing has evolved in the Internet over the past half 109 00:06:54,017 --> 00:06:54,480 century.