Having completed our first analysis of an algorithm, namely an upper bound on the running time of the merge sort algorithm, what I want to do next is take a step back and be explicit about three assumptions, three biases, that we made when we did this analysis of merge sort and interpreted the results. These three assumptions we will adopt as guiding principles for how to reason about algorithms, and for how to define a so-called fast algorithm, for the rest of the course.

So the first guiding principle is that we used what's often called worst-case analysis. By worst-case analysis I simply mean that our upper bound of 6N log N plus 6N applies to the number of lines executed for every single input array of length N. We made absolutely no assumptions about the input, where it comes from, or what it looks like, beyond its length N. Put differently, if hypothetically we had some adversary whose sole purpose in life was to concoct a malevolent input designed to make our algorithm as slow as possible, the worst this adversary could do is upper bounded by this same number, 6N log N + 6N.

Now, this sort of worst-case guarantee popped out so naturally from our analysis of merge sort that you might well be wondering what else you could do. Well, two other methods of analysis, which do have their place although we won't really discuss them in this course, are average-case analysis and the use of a set of pre-specified benchmarks. By average-case analysis I mean that you analyze the average running time of an algorithm under some assumption about the relative frequencies of different inputs. For example, in the sorting problem, one thing you could do, although it's not what we did here, is assume that every possible input array is equally likely and then analyze the average running time of the algorithm. By benchmarks I just mean that one agrees up front on some set of, say, ten or twenty benchmark inputs which are thought to represent practical or typical inputs for the algorithm.

Now, both average-case analysis and benchmarks are useful in certain settings, but for them to make sense you really have to have domain knowledge about your problem: you need some understanding of which inputs are more common than others, which inputs better represent typical inputs than others. By contrast, in worst-case analysis, by definition you're making absolutely no assumptions about where the input comes from, so worst-case analysis is particularly appropriate for general-purpose subroutines, subroutines that you design without any knowledge of how they will be used or what kinds of inputs they will be used on. And happily, another bonus of doing worst-case analysis, as we will in this course, is that it's usually much more tractable mathematically than trying to analyze the average performance of an algorithm under some distribution over inputs, or to understand the detailed behavior of an algorithm on a particular set of benchmark inputs. This mathematical tractability was reflected in our merge sort analysis, where we had no a priori goal of analyzing the worst case per se; it's simply what popped out naturally from our reasoning about the algorithm's running time.
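Just to make the worst-case guarantee concrete, here is one way to write it down formally; the shorthand T(A) for the number of lines executed on a specific input array A is my own notation, introduced only for this illustration:

```latex
% Worst-case guarantee from the merge sort analysis.
% T(A) denotes the number of lines of code executed on input A
% (shorthand introduced here for convenience).
\[
  \text{for every input array } A \text{ of length } n:
  \qquad T(A) \;\le\; 6\,n\log_2 n + 6\,n .
\]
```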
The second and third guiding principles are closely related. The second one is that, in this course, when we analyze algorithms we won't worry unduly about small constant factors or lower-order terms. We saw this philosophy at work very early on in our analysis of merge sort, when we discussed the number of lines of code that the merge subroutine requires. We first upper-bounded it by 4m plus 2 for an array of length m, and then we said: eh, let's just think of it as 6m instead, let's have a simpler, sloppier upper bound and work with that. So that was already an example of not worrying about small changes in the constant factor. Now, the question you should be wondering about is: why do we do this, and can we really get away with it? So let me tell you about the justifications for this kind of principle.

The first motivation is clear, and we used it already in our merge sort analysis: it's simply way easier mathematically if we don't have to precisely pin down the leading constant factors and lower-order terms.

The second justification is a little less obvious, but it's extremely important. I claim that, given the level at which we're describing algorithms in this course, it would be totally inappropriate to obsess unduly over what the constant factors are. Recall our discussion of the merge subroutine: we wrote that subroutine down in pseudocode, and we gave a bound of 4m + 2 on the number of lines of code executed given an input of length m. We also noted that it was somewhat ambiguous exactly how many lines of code we should count, depending on how you count loop increments and so on. So even there, small constant factors could creep in, given the under-specification of the pseudocode. Depending on how that pseudocode gets translated into an actual programming language, like C or Java, you'll see the number of lines of code deviate even further, not by a lot, but again by small constant factors. And when such a program is then compiled down into machine code, you'll see even greater variance, depending on the exact processor, the compiler, the compiler optimizations, the programming language implementation, and so on. So, to summarize: because we're going to describe algorithms at a level that transcends any particular programming language, it would be inappropriate to specify precise constants. The precise constants will ultimately be determined by machine-dependent aspects, like who the programmer is, what the compiler is, what the processor is, and so on.

The third justification is, frankly, that we're just going to be able to get away with it. That is, one might be concerned that ignoring things like small constant factors leads us astray, that we wind up deriving results which suggest an algorithm is fast when it's really slow in practice, or vice versa. But for the problems we discuss in this course, we'll get extremely accurate predictive power even though we won't be keeping track of lower-order terms and constant factors. When the mathematical analysis we do suggests that an algorithm is fast, indeed it will be; when it suggests that it's not fast, indeed that will be the case. So we lose a little bit of granularity of information, but we don't lose what we really care about, which is accurate guidance about which algorithms are going to be faster than others. The first two justifications, I think, are pretty self-evident; this third justification is more of an assertion, but it's one we'll be backing up over and over again as we proceed through this course.
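To make it a bit more tangible where these small constants come from, here is a minimal Python sketch of a merge subroutine in the spirit of the one from the earlier lecture; the variable names and the counting convention described in the comments are my own choices, not the exact pseudocode from that lecture:

```python
def merge(left, right):
    """Merge two sorted lists into a single sorted list."""
    result = []
    i, j = 0, 0
    # Each loop iteration does a comparison, an append, and an index increment.
    # Whether you count that as two, three, or four "lines" per element copied
    # is exactly the ambiguity that makes the leading constant (4m + 2 vs. 6m) fuzzy.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Copy over whatever remains in the half that wasn't exhausted.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

print(merge([1, 4, 7], [2, 3, 9]))  # [1, 2, 3, 4, 7, 9]
```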
Now, don't get me wrong: I'm not saying constant factors aren't important in practice. Obviously, for crucial programs the constant factors are hugely important. If you're running the sort of crucial loop that your startup's survival depends on, by all means optimize the constant like crazy. The point is just that understanding tiny constant factors in the analysis is an inappropriate level of granularity for the kind of algorithm analysis we're going to be doing in this course.

Okay, let's move on to the third and final guiding principle. The third principle is that we're going to use what's called asymptotic analysis, by which I mean we will focus on the case of large input sizes: the performance of an algorithm as the size N of the input grows large, that is, tends to infinity. Now, this focus on large input sizes was already evident when we interpreted our bound on merge sort. How did we describe that bound? We said: oh, well, it needs a number of operations proportional, up to a constant factor, to N log N. And we very cavalierly declared that this was better than any algorithm whose running time has a quadratic dependence on the input size. So, for example, we argued that merge sort is a better, faster algorithm than something like insertion sort, without actually discussing the constant factors at all. Mathematically, we were saying that the running time of merge sort, which we can represent as the function 6N log base 2 of N plus 6N, is better than any function with a quadratic dependence on N, even one with a small constant, like, say, one-half N squared, which might be roughly the running time of insertion sort. And this is a mathematical statement that is true if and only if N is sufficiently large. Once N grows large, it's certainly true that the expression on the left is smaller than the expression on the right, but for small N the expression on the right is actually going to be smaller, because of its smaller leading constant. So, in saying that merge sort is superior to insertion sort, the bias is that we're focusing on problems with large N.

So the question you should have is: is that reasonable? Is it a justified assumption to focus on large input sizes? And the answer is certainly yes. The reason we focus on large input sizes is that, frankly, those are the only problems which are at all interesting. If all you need to do is sort 100 numbers, use whatever method you want and it's going to happen instantaneously on modern computers. You don't need to know, say, the divide-and-conquer paradigm if all you need to do is sort 100 numbers.

One thing you might be wondering is whether, with computers getting faster all the time according to Moore's Law, it even matters to think about algorithmic analysis, whether eventually all problem sizes will just be trivially solvable on super-fast computers. But in fact the opposite is true. Moore's Law, with computers getting faster, means that our computational ambitions naturally grow: we naturally focus on ever larger problem sizes, and the gulf between an N squared algorithm and an N log N algorithm becomes ever wider. A different way to think about it is in terms of how much bigger a problem you can solve as computers get faster. If you're using an algorithm with a running time proportional to the input size, then when computers get faster by a factor of four, you can solve problems that are a factor of four larger. Whereas if you're using an algorithm whose running time is proportional to the square of the input size, then when computers get faster by a factor of four, you can only solve problems that are twice as large.
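Here is the quick calculation behind that last comparison, written out; the constant c is just a placeholder for the cost per operation on a given machine:

```latex
% Linear time, T(n) = c*n: a 4x faster machine performs 4x the operations in the
% same wall-clock time, so the new solvable size n' satisfies c*n' = 4*c*n.
% Quadratic time, T(n) = c*n^2: now c*(n')^2 = 4*c*n^2.
\[
  c\,n' = 4\,c\,n \;\Rightarrow\; n' = 4n
  \qquad\text{versus}\qquad
  c\,(n')^{2} = 4\,c\,n^{2} \;\Rightarrow\; n' = 2n .
\]
```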
And you'll see even starker examples of this gulf between different algorithmic approaches as time goes on.

So, to drive this point home, let me show you a couple of graphs. What we're looking at here is a graph of two functions. The solid curve is the upper bound that we proved on merge sort, so this is going to be 6N log base 2 of N plus 6N. And the dotted curve is an estimate, a rather generous estimate, of the running time of insertion sort, namely one-half times N squared. And we see in the graph exactly the behavior we discussed earlier: for small N, down here, because one-half N squared has the smaller leading constant, it's actually the smaller function, and this is true up to a crossing point of maybe 90 or so. But beyond N equal to 90, the quadratic growth of the N squared term overwhelms the fact that it had the smaller constant, and it starts getting bigger than the other function, 6N log N plus 6N. So in the regime below 90 the graph predicts that insertion sort will be better, and in the regime above 90 it predicts that merge sort will be faster.

Now here's what's interesting. Let's rescale the x-axis and look well beyond this crossing point of 90; let's increase it by an order of magnitude, up to arrays of size 1500. And I want to emphasize that these are still very small problem sizes: if all you need to do is sort arrays of size 1500, you really don't need to know divide and conquer or anything else we'll talk about; that's a pretty trivial problem on modern computers. What we're seeing is that even for very modest problem sizes, here an array of, say, size 1500, the quadratic dependence in the insertion sort bound more than dwarfs the fact that it had a lower constant factor. So in this larger regime, the gulf between the two algorithms is growing, and of course, if I increased the scale by another 10x or 100x or 1000x, to get to genuinely interesting problem sizes, the gap between these two algorithms would be even bigger; it would be huge.

That said, I'm not saying you should be completely ignorant of constant factors when you implement algorithms. It's still good to have a general sense of what these constant factors are. For example, in highly tuned versions of merge sort, which you'll find in many programming libraries, precisely because of the difference in constant factors, the algorithm will actually switch from merge sort over to insertion sort once the problem size drops below some particular threshold, say seven elements or something like that. So for small problem sizes you use the algorithm with the smaller constant factors, namely insertion sort, and for larger problem sizes you use the algorithm with the better rate of growth, namely merge sort.
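To illustrate that last remark about tuned library implementations, here is a minimal Python sketch of a merge sort that hands small subproblems to insertion sort; the cutoff of seven elements is just the illustrative threshold mentioned above, and real libraries choose their own empirically tuned values:

```python
CUTOFF = 7  # illustrative threshold; real libraries tune this empirically

def insertion_sort(a):
    """Sort a small list in place: low constant factors, but quadratic growth."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_merge_sort(a):
    """Merge sort that switches to insertion sort below the cutoff."""
    if len(a) <= CUTOFF:
        return insertion_sort(list(a))
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid])
    right = hybrid_merge_sort(a[mid:])
    # Standard merge of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(hybrid_merge_sort([5, 3, 8, 1, 9, 2, 7, 6, 4, 0]))  # [0, 1, ..., 9]
```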
So, to review: our first guiding principle is that we're going to pursue worst-case analysis. We're going to look for bounds on the performance, on the running time, of an algorithm which make no domain assumptions, no assumptions about which input of a given length the algorithm is provided. The second guiding principle is that we're not going to focus on constant factors or lower-order terms; that would be inappropriate given the level of granularity at which we're describing algorithms. And the third is that we're going to focus on the rate of growth of running times for large problem sizes.

Putting these three principles together, we get a mathematical definition of a fast algorithm: namely, we're going to pursue algorithms whose worst-case running time grows slowly as a function of the input size. So let me tell you how you should interpret what I just wrote down in this box. On the left-hand side is clearly what we want: algorithms that run quickly when we implement them. On the right-hand side is a proposed mathematical surrogate for a fast algorithm. The left-hand side is not a mathematical definition; the right-hand side is, as will become clearer in the next set of lectures. So we're identifying fast algorithms with those that have good asymptotic running time, running time which grows slowly with the input size.

Now, what would we want from a mathematical definition? We'd want a sweet spot. On the one hand, we want something we can actually reason about; this is why we zoom out and squint and ignore things like constant factors and lower-order terms. We can't keep track of everything; otherwise we'd never be able to analyze anything. On the other hand, we don't want to throw out the baby with the bathwater: we want to retain predictive power. And this definition, worst-case analysis using the asymptotic running time, turns out, for the problems we're going to talk about in this course, to be the sweet spot for reasoning about algorithms. We'll be able to prove lots of theorems and establish a lot of performance guarantees for fundamental algorithms, but at the same time we'll have good predictive power: what the theory advocates will in fact be algorithms that are well known to be fast in practice.

So the final explanation I owe you is: what do I mean by the running time growing slowly with respect to the input size? Well, the answer depends a little bit on the context, but for almost all of the problems we're going to discuss, the holy grail will be what's called a linear-time algorithm, an algorithm whose number of instructions grows proportionally to the input size. We won't always be able to achieve linear time, but that's, in some sense, the best-case scenario. Notice that linear time is even better than what we achieved with our merge sort algorithm: merge sort runs a little bit superlinearly, at N times log N, where N is the input size. If possible, we'd love to be linear time; it's not always going to be possible, but that is what we will aspire toward for most of the problems we'll discuss in this course.

Looking ahead, the next series of videos has two goals. First, on the analysis side, I'll describe formally what I mean by asymptotic running time: I'll introduce big-oh notation and its variants, explain its mathematical definition, and give a number of examples. On the design side, we'll get more experience applying the divide-and-conquer paradigm to further problems. See you then.