Having completed our first analysis of an algorithm, namely an upper bound on the running time of the merge sort algorithm, what I want to do next is take a step back and be explicit about three assumptions, three biases, that we made when we did this analysis of merge sort and interpreted the results. These three assumptions we will adopt as guiding principles for how to reason about algorithms, and how to define a so-called fast algorithm, for the rest of the course.

So, the first guiding principle is that we used what's often called worst-case analysis. By worst-case analysis, I simply mean that our upper bound of 6n log₂ n + 6n applies to the number of lines of code executed for every single input array of length n. We made absolutely no assumptions about the input, where it comes from, or what it looks like, beyond the input length n. Put differently, if, hypothetically, we had some adversary whose sole purpose in life was to concoct a malevolent input designed to make our algorithm run as slowly as possible, the worst this adversary could do is still upper-bounded by this same number, 6n log₂ n + 6n.

Now, this sort of worst-case guarantee popped out so naturally from our analysis of merge sort that you might well be wondering: what else could you do? Well, two other methods of analysis, which do have their place although we won't really discuss them in this course, are so-called average-case analysis and the use of a set of prespecified benchmarks. By average-case analysis, I mean that you analyze the average running time of an algorithm under some assumption about the relative frequencies of different inputs. So, for example, in the sorting problem, one thing you could do, although it's not what we do here, is assume that every possible input array is equally likely and then analyze the average running time of the algorithm. By benchmarks, I just mean that one agrees up front on some set of, say, ten or twenty benchmark inputs, which are thought to represent practical or typical inputs for the algorithm.

Now, both average-case analysis and benchmarks are useful in certain settings, but for them to make sense, you really have to have domain knowledge about your problem. You need to have some understanding of which inputs are more common than others, and which inputs better represent typical inputs than others. By contrast, in worst-case analysis, by definition you're making absolutely no assumptions about where the input comes from. As a result, worst-case analysis is particularly appropriate for general-purpose subroutines, subroutines that you design without having any knowledge of how they will be used or what kinds of inputs they will be used on.

And happily, another bonus of doing worst-case analysis, as we will in this course, is that it's usually much more mathematically tractable than trying to analyze the average performance of an algorithm under some distribution over inputs, or to understand the detailed behavior of an algorithm on a particular set of benchmark inputs. This mathematical tractability was reflected in our merge sort analysis, where we had no a priori goal of analyzing the worst case per se; it's simply what popped out naturally from our reasoning about the algorithm's running time.
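To make the contrast between worst-case and average-case behavior concrete, here is a small, purely illustrative Python sketch; it is not part of the lecture's analysis. It counts the comparisons that insertion sort (an algorithm we'll contrast with merge sort shortly) performs on a deliberately adversarial, reverse-sorted input versus a handful of random inputs. The function name, the input size, and the number of trials are all made up for the illustration.

```python
import random

def insertion_sort_comparisons(a):
    """Sort a copy of `a` with insertion sort and return the number of element comparisons made."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1                  # count each comparison of adjacent elements
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return comparisons

n = 1000
worst_case = list(range(n, 0, -1))            # reverse-sorted: an adversarial input
random_avg = sum(insertion_sort_comparisons(random.sample(range(n), n))
                 for _ in range(20)) / 20     # average over 20 random permutations

print("adversarial input:  ", insertion_sort_comparisons(worst_case))  # roughly n^2 / 2
print("random inputs (avg):", random_avg)                              # roughly n^2 / 4
```

The first number is a guarantee that holds no matter who picks the input; the second is only meaningful under the assumption that all permutations are equally likely, which is exactly the kind of domain assumption worst-case analysis avoids.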
The second and third guiding principles are closely related. The second one is that, in this course, when we analyze algorithms, we won't worry unduly about small constant factors or lower-order terms. We saw this philosophy at work very early on in our analysis of merge sort, when we discussed the number of lines of code that the merge subroutine requires. We first upper-bounded it by 4m + 2 for an array of length m, and then we said, eh, let's just think about it as 6m instead; let's have a simpler, sloppier upper bound and work with that (which is safe, since 4m + 2 is at most 6m whenever m is at least 1). So that was already an example of not worrying about small changes in the constant factor.

Now, the question you should be wondering about is: why do we do this, and can we really get away with it? So let me tell you about the justifications for this guiding principle. The first motivation is clear, and we used it already in our merge sort analysis: it's simply way easier mathematically if we don't have to precisely pin down what the constant factors and lower-order terms are.

The second justification is a little less obvious, but it is extremely important. I claim that, given the level at which we're describing and analyzing algorithms in this course, it would be totally inappropriate to obsess unduly about exactly what the constant factors are. Recall our discussion of the merge subroutine. We wrote that subroutine down in pseudocode, and we gave an upper bound of 4m + 2 on the number of lines of code executed, given an input of length m. We also noted that it was somewhat ambiguous exactly how many lines of code we should count it as, depending on how you count loop increments and so on. So even there, small constant factors could creep in, given the underspecification of the pseudocode. Depending on how that pseudocode gets translated into an actual programming language like C or Java, you'll see the number of lines of code deviate even further, not by a lot, but again by small constant factors. And when such a program is then compiled down into machine code, you'll see even greater variance depending on the exact processor, the compiler, the compiler optimizations, the programmer's implementation, and so on. So, to summarize, because we're going to describe algorithms at a level that transcends any particular programming language, it would be inappropriate to specify precise constants; the precise constants are ultimately determined by more machine-dependent aspects, like who the programmer is, what the compiler is, what the processor is, and so on.

And the third justification is, frankly, that we're just going to be able to get away with it. That is, one might be concerned that ignoring things like small constant factors would lead us astray, that we'd wind up deriving results which suggest that an algorithm is fast when it's really slow in practice, or vice versa. But for the problems we discuss in this course, we'll get extremely accurate predictive power, even though we won't be keeping track of lower-order terms and constant factors. When the mathematical analysis we do suggests that an algorithm is fast, indeed it will be; when it suggests that it's not fast, indeed that will be the case. So we lose a little bit of granularity of information, but we don't lose what we really care about, which is accurate guidance about which algorithms are going to be faster than others. The first two justifications, I think, are pretty self-evident. This third justification is more of an assertion, but it's one we'll be backing up over and over again as we proceed through this course.

Now, don't get me wrong: I'm not saying constant factors aren't important in practice.
Obviously, for crucial programs the constant factors are hugely important. If you're running the sort of crucial inner loop that, you know, your startup's survival depends on, by all means optimize the constants like crazy. The point is just that understanding tiny constant factors in the analysis is an inappropriate level of granularity for the kind of algorithm analysis we're going to be doing in this course.

Okay, let's move on to the third and final guiding principle. The third principle is that we're going to use what's called asymptotic analysis, by which I mean we will focus on large input sizes, that is, on the performance of an algorithm as the input size n grows large, as it tends to infinity.

Now, this focus on large input sizes was already evident when we interpreted our bound on merge sort. How did we describe that bound? We said, oh, well, it needs a number of operations proportional to n log n, a constant factor times n log n. And we very cavalierly declared that this was better than any algorithm whose running time has a quadratic dependence on the input size. So, for example, we argued that merge sort is a better, faster algorithm than something like insertion sort, without actually discussing the constant factors at all.

Mathematically, we were saying that the running time of merge sort, which we can represent as the function 6n log₂ n + 6n, is better than any function which has a quadratic dependence on n, even one with a small constant, like, say, (1/2)n², which might roughly be the running time of insertion sort. And this is a mathematical statement that is true if and only if n is sufficiently large. Once n grows large, it is certainly true that the expression on the left is smaller than the expression on the right, but for small n the expression on the right is actually going to be smaller, because of its smaller leading constant. So in saying that merge sort is superior to insertion sort, the bias is that we're focusing on problems with a large n.

So the question you should be asking is: is that reasonable? Is it a justified assumption to focus on large input sizes? And the answer is certainly yes. The reason we focus on large input sizes is because, frankly, those are the only problems which are at all interesting. If all you need to do is sort 100 numbers, use whatever method you want, and it's going to happen instantaneously on modern computers. You don't need to know, say, the divide-and-conquer paradigm if all you need to do is sort 100 numbers.

One thing you might be wondering is whether, with computers getting faster all the time according to Moore's Law, it even matters to think about algorithmic analysis, if eventually all problem sizes will just be trivially solvable on super-fast computers. But in fact, the opposite is true. Moore's Law, with computers getting faster, actually means that our computational ambitions will naturally grow. We naturally focus on ever larger problem sizes, and the gulf between an n² algorithm and an n log n algorithm will become ever wider.

A different way to think about it is in terms of how much bigger a problem you can solve as computers get faster. If you're using an algorithm with a running time proportional to the input size, then when computers get faster by a factor of four, you can solve problems that are a factor of four larger. Whereas if you're using an algorithm whose running time is proportional to the square of the input size, then when a computer gets faster by a factor of four, you can only solve problems that are twice as large.
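As a quick sanity check on that arithmetic, here is a back-of-the-envelope Python sketch. The operation budget is a made-up number, and the constant of proportionality is taken to be 1 purely for illustration.

```python
import math

# Suppose the old machine can execute `budget` primitive operations in the time
# we're willing to wait, and a 4x faster machine can execute 4 * budget.
budget = 10**9  # made-up operation budget

# Linear-time algorithm, T(n) = n: the largest solvable n equals the budget.
print((4 * budget) / budget)                         # -> 4.0: a 4x larger problem

# Quadratic-time algorithm, T(n) = n^2: the largest solvable n is sqrt(budget).
print(math.isqrt(4 * budget) / math.isqrt(budget))   # -> ~2.0: only a 2x larger problem
```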
We'll see even starker examples of this gulf between different algorithmic approaches as the course goes on.

So, to drive this point home, let me show you a couple of graphs. What we're looking at here is a graph of two functions. The solid curve is the upper bound that we proved on merge sort, 6n log₂ n + 6n. The dotted curve is an estimate, a rather generous estimate, of the running time of insertion sort, namely (1/2)n². And we see in the graph exactly the behavior that we discussed earlier: for small n, down here, because (1/2)n² has the smaller leading constant, it's actually the smaller function. This is true up to a crossing point of maybe 90 or so. Beyond n = 90, the quadratic growth of the n² term overwhelms the fact that it had a smaller constant, and it starts being bigger than the other function, 6n log₂ n + 6n. So in the regime below 90, the graph predicts that insertion sort will be better, and in the regime above 90, it predicts that merge sort will be faster.

Now here's what's interesting. Let's rescale the x-axis and look well beyond this crossing point of 90, increasing it by an order of magnitude, up to arrays of size 1500. And I want to emphasize that these are still very small problem sizes; if all you need to do is sort arrays of size 1500, you really don't need to know divide-and-conquer or anything else we'll talk about. That's a pretty trivial problem on modern computers. So what we're seeing is that even for very modest problem sizes, arrays of size, say, 1500, the quadratic dependence in the insertion sort bound more than dwarfs the fact that it had a smaller constant factor. In this larger regime, the gulf between the two algorithms keeps growing. And of course, if I increased the scale another 10x or 100x or 1000x, to get to genuinely interesting problem sizes, the gap between these two algorithms would be even bigger; it would be huge.

That said, I'm not saying you should be completely ignorant of constant factors when you implement algorithms. It's still good to have a general sense of what they are. For example, in highly tuned versions of merge sort, which you'll find in many programming libraries, the algorithm will actually switch from merge sort over to insertion sort once the problem size drops below some particular threshold, say seven elements or something like that, precisely because of the difference in constant factors. So for small problem sizes, you use the algorithm with the smaller constant factors, namely insertion sort, and for larger problem sizes, you use the algorithm with the better rate of growth, namely merge sort.
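Here is a minimal Python sketch of that hybrid idea, assuming a simple recursive, out-of-place merge sort and using the seven-element figure mentioned above as a made-up cutoff; real library implementations tune this threshold, and many other details, empirically.

```python
CUTOFF = 7  # hypothetical threshold below which insertion sort's small constants win

def insertion_sort(a):
    """Sort the list a in place and return it; fast on very small inputs."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]       # shift larger elements one slot to the right
            j -= 1
        a[j + 1] = key
    return a

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])          # append whatever remains of either half
    out.extend(right[j:])
    return out

def hybrid_merge_sort(a):
    """Merge sort that falls back to insertion sort below the cutoff."""
    if len(a) <= CUTOFF:
        return insertion_sort(list(a))
    mid = len(a) // 2
    return merge(hybrid_merge_sort(a[:mid]), hybrid_merge_sort(a[mid:]))

print(hybrid_merge_sort([5, 3, 8, 1, 9, 2, 7, 6, 4, 0]))  # [0, 1, 2, ..., 9]
```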
So, to review: our first guiding principle is that we're going to pursue worst-case analysis. We're going to look for bounds on the performance, on the running time, of an algorithm which make no domain assumptions, which make no assumptions about which input of a given length the algorithm is provided. The second guiding principle is that we're not going to focus on constant factors or lower-order terms; that would be inappropriate, given the level of granularity at which we're describing algorithms. And the third is that we're going to focus on the rate of growth of algorithms for large problem sizes.

Putting these three principles together, we get a mathematical definition of a fast algorithm. Namely, we're going to pursue algorithms whose worst-case running time grows slowly as a function of the input size. So let me tell you how you should interpret what I just wrote down in this box. On the left-hand side is clearly what we want: we want algorithms which run quickly if we implement them. And on the right-hand side is a proposed mathematical surrogate for a fast algorithm. The left-hand side is not a mathematical definition; the right-hand side is, as will become clear in the next set of lectures. So we're identifying fast algorithms with those that have good asymptotic running time, running time which grows slowly with the input size.

Now, what would we want from a mathematical definition? We'd want a sweet spot. On one hand, we want something we can actually reason about; this is why we zoom out and squint and ignore things like constant factors and lower-order terms. We can't keep track of everything, otherwise we'd never be able to analyze anything. On the other hand, we don't want to throw out the baby with the bathwater; we want to retain predictive power. And this definition, worst-case analysis using the asymptotic running time, turns out to be the sweet spot for reasoning about the problems we're going to talk about in this course. We'll be able to prove lots of theorems and establish a lot of performance guarantees for fundamental algorithms, but at the same time we'll have good predictive power: what the theory advocates will in fact be the algorithms that are known to be best in practice.

So, the final explanation I owe you is: what do I mean by the running time growing slowly with respect to the input size? Well, the answer depends a little bit on the context, but for almost all of the problems we're going to discuss, the holy grail will be to have what's called a linear-time algorithm, an algorithm whose number of instructions grows proportionally to the input size. We won't always be able to achieve linear time, but that's, in some sense, the best-case scenario. Notice that linear time is even better than what we achieved with our merge sort algorithm. Merge sort runs a little bit superlinearly; its running time is n log n in the input size, rather than linear. If possible, we'd like to be linear time. It's not always going to be possible, but that is what we will aspire to for most of the problems we'll discuss in this course.

Looking ahead, the next series of videos is going to have two goals. First, on the analysis side, I'll describe formally what I mean by asymptotic running time. I'll introduce "Big Oh" notation and its variants, explain its mathematical definitions, and give a number of examples. On the design side, we'll get more experience applying the divide-and-conquer paradigm to further problems. See you then.