Having completed our first analysis of an algorithm, namely an upper bound on the running time of the merge sort algorithm, what I want to do next is take a step back and be explicit about three assumptions, three biases, that we made when we did this analysis of merge sort and interpreted the results. These three assumptions we will adopt as guiding principles for how to reason about algorithms, and for how to define a so-called fast algorithm, for the rest of the course.

So the first guiding principle is that we used what's often called worst-case analysis. By worst-case analysis I simply mean that our upper bound of 6N log N plus 6N applies to the number of lines executed for every single input array of length N. We made absolutely no assumptions about the input, where it comes from, or what it looks like, beyond its length N. Put differently, if hypothetically we had some adversary whose sole purpose in life was to concoct a malevolent input designed to make our algorithm as slow as possible, the worst this adversary could do is upper bounded by this same number, 6N log N + 6N.

Now, this sort of worst-case guarantee popped out so naturally from our analysis of merge sort that you might well be wondering what else you could do. Well, two other methods of analysis, which do have their place although we won't really discuss them in this course, are average-case analysis and the use of a set of pre-specified benchmarks. By average-case analysis I mean that you analyze the average running time of an algorithm under some assumption about the relative frequencies of different inputs. For example, in the sorting problem, one thing you could do, although it's not what we did here, is assume that every possible input array is equally likely and then analyze the average running time of the algorithm. By benchmarks I just mean that one agrees up front on some set of, say, ten or twenty benchmark inputs which are thought to represent practical or typical inputs for the algorithm.

Now, both average-case analysis and benchmarks are useful in certain settings, but for them to make sense you really have to have domain knowledge about your problem: you need some understanding of which inputs are more common than others, which inputs better represent typical inputs than others. By contrast, in worst-case analysis, by definition you're making absolutely no assumptions about where the input comes from, so worst-case analysis is particularly appropriate for general-purpose subroutines, subroutines that you design without any knowledge of how they will be used or what kinds of inputs they will be used on. And happily, another bonus of doing worst-case analysis, as we will in this course, is that it's usually much more tractable mathematically than trying to analyze the average performance of an algorithm under some distribution over inputs, or to understand the detailed behavior of an algorithm on a particular set of benchmark inputs. This mathematical tractability was reflected in our merge sort analysis, where we had no a priori goal of analyzing the worst case per se; it's simply what popped out naturally from our reasoning about the algorithm's running time.
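Just to make the worst-case guarantee concrete, here is one way to write it down formally; the shorthand T(A) for the number of lines executed on a specific input array A is my own notation, introduced only for this illustration:

```latex
% Worst-case guarantee from the merge sort analysis.
% T(A) denotes the number of lines of code executed on input A
% (shorthand introduced here for convenience).
\[
  \text{for every input array } A \text{ of length } n:
  \qquad T(A) \;\le\; 6\,n\log_2 n + 6\,n .
\]
```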
The second and third guiding principles are closely related. The second one is that, in this course, when we analyze algorithms we won't worry unduly about small constant factors or lower-order terms. We saw this philosophy at work very early on in our analysis of merge sort, when we discussed the number of lines of code that the merge subroutine requires. We first upper-bounded it by 4m plus 2 for an array of length m, and then we said: eh, let's just think of it as 6m instead, let's have a simpler, sloppier upper bound and work with that. So that was already an example of not worrying about small changes in the constant factor. Now, the question you should be wondering about is: why do we do this, and can we really get away with it? So let me tell you about the justifications for this kind of principle.

The first motivation is clear, and we used it already in our merge sort analysis: it's simply way easier mathematically if we don't have to precisely pin down the leading constant factors and lower-order terms.

The second justification is a little less obvious, but it's extremely important. I claim that, given the level at which we're describing algorithms in this course, it would be totally inappropriate to obsess unduly over what the constant factors are. Recall our discussion of the merge subroutine: we wrote that subroutine down in pseudocode, and we gave a bound of 4m + 2 on the number of lines of code executed given an input of length m. We also noted that it was somewhat ambiguous exactly how many lines of code we should count, depending on how you count loop increments and so on. So even there, small constant factors could creep in, given the under-specification of the pseudocode. Depending on how that pseudocode gets translated into an actual programming language, like C or Java, you'll see the number of lines of code deviate even further, not by a lot, but again by small constant factors. And when such a program is then compiled down into machine code, you'll see even greater variance, depending on the exact processor, the compiler, the compiler optimizations, the programming language implementation, and so on. So, to summarize: because we're going to describe algorithms at a level that transcends any particular programming language, it would be inappropriate to specify precise constants. The precise constants will ultimately be determined by machine-dependent aspects, like who the programmer is, what the compiler is, what the processor is, and so on.

The third justification is, frankly, that we're just going to be able to get away with it. That is, one might be concerned that ignoring things like small constant factors leads us astray, that we wind up deriving results which suggest an algorithm is fast when it's really slow in practice, or vice versa. But for the problems we discuss in this course, we'll get extremely accurate predictive power even though we won't be keeping track of lower-order terms and constant factors. When the mathematical analysis we do suggests that an algorithm is fast, indeed it will be; when it suggests that it's not fast, indeed that will be the case. So we lose a little bit of granularity of information, but we don't lose what we really care about, which is accurate guidance about which algorithms are going to be faster than others. The first two justifications, I think, are pretty self-evident; this third justification is more of an assertion, but it's one we'll be backing up over and over again as we proceed through this course.
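To make it a bit more tangible where these small constants come from, here is a minimal Python sketch of a merge subroutine in the spirit of the one from the earlier lecture; the variable names and the counting convention described in the comments are my own choices, not the exact pseudocode from that lecture:

```python
def merge(left, right):
    """Merge two sorted lists into a single sorted list."""
    result = []
    i, j = 0, 0
    # Each loop iteration does a comparison, an append, and an index increment.
    # Whether you count that as two, three, or four "lines" per element copied
    # is exactly the ambiguity that makes the leading constant (4m + 2 vs. 6m) fuzzy.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Copy over whatever remains in the half that wasn't exhausted.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

print(merge([1, 4, 7], [2, 3, 9]))  # [1, 2, 3, 4, 7, 9]
```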
Now, don't get me wrong: I'm not saying constant factors aren't important in practice. Obviously, for crucial programs the constant factors are hugely important. If you're running the sort of crucial loop that your startup's survival depends on, by all means optimize the constant like crazy. The point is just that understanding tiny constant factors in the analysis is an inappropriate level of granularity for the kind of algorithm analysis we're going to be doing in this course.

Okay, let's move on to the third and final guiding principle. The third principle is that we're going to use what's called asymptotic analysis, by which I mean we will focus on the case of large input sizes: the performance of an algorithm as the size N of the input grows large, that is, tends to infinity. Now, this focus on large input sizes was already evident when we interpreted our bound on merge sort. How did we describe that bound? We said: oh, well, it needs a number of operations proportional, up to a constant factor, to N log N. And we very cavalierly declared that this was better than any algorithm whose running time has a quadratic dependence on the input size. So, for example, we argued that merge sort is a better, faster algorithm than something like insertion sort, without actually discussing the constant factors at all. Mathematically, we were saying that the running time of merge sort, which we can represent as the function 6N log base 2 of N plus 6N, is better than any function with a quadratic dependence on N, even one with a small constant, like, say, one-half N squared, which might be roughly the running time of insertion sort. And this is a mathematical statement that is true if and only if N is sufficiently large. Once N grows large, it's certainly true that the expression on the left is smaller than the expression on the right, but for small N the expression on the right is actually going to be smaller, because of its smaller leading constant. So, in saying that merge sort is superior to insertion sort, the bias is that we're focusing on problems with large N.

So the question you should have is: is that reasonable? Is it a justified assumption to focus on large input sizes? And the answer is certainly yes. The reason we focus on large input sizes is that, frankly, those are the only problems which are at all interesting. If all you need to do is sort 100 numbers, use whatever method you want and it's going to happen instantaneously on modern computers. You don't need to know, say, the divide-and-conquer paradigm if all you need to do is sort 100 numbers.

One thing you might be wondering is whether, with computers getting faster all the time according to Moore's Law, it even matters to think about algorithmic analysis, whether eventually all problem sizes will just be trivially solvable on super-fast computers. But in fact the opposite is true. Moore's Law, with computers getting faster, means that our computational ambitions naturally grow: we naturally focus on ever larger problem sizes, and the gulf between an N squared algorithm and an N log N algorithm becomes ever wider. A different way to think about it is in terms of how much bigger a problem you can solve as computers get faster. If you're using an algorithm with a running time proportional to the input size, then when computers get faster by a factor of four, you can solve problems that are a factor of four larger. Whereas if you're using an algorithm whose running time is proportional to the square of the input size, then when computers get faster by a factor of four, you can only solve problems that are twice as large.
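Here is the quick calculation behind that last comparison, written out; the constant c is just a placeholder for the cost per operation on a given machine:

```latex
% Linear time, T(n) = c*n: a 4x faster machine performs 4x the operations in the
% same wall-clock time, so the new solvable size n' satisfies c*n' = 4*c*n.
% Quadratic time, T(n) = c*n^2: now c*(n')^2 = 4*c*n^2.
\[
  c\,n' = 4\,c\,n \;\Rightarrow\; n' = 4n
  \qquad\text{versus}\qquad
  c\,(n')^{2} = 4\,c\,n^{2} \;\Rightarrow\; n' = 2n .
\]
```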
And you'll see even starker examples of this gulf between different algorithmic approaches as time goes on.

So, to drive this point home, let me show you a couple of graphs. What we're looking at here is a graph of two functions. The solid curve is the upper bound that we proved on merge sort, so this is going to be 6N log base 2 of N plus 6N. And the dotted curve is an estimate, a rather generous estimate, of the running time of insertion sort, namely one-half times N squared. And we see in the graph exactly the behavior we discussed earlier: for small N, down here, because one-half N squared has the smaller leading constant, it's actually the smaller function, and this is true up to a crossing point of maybe 90 or so. But beyond N equal to 90, the quadratic growth of the N squared term overwhelms the fact that it had the smaller constant, and it starts getting bigger than the other function, 6N log N plus 6N. So in the regime below 90 the graph predicts that insertion sort will be better, and in the regime above 90 it predicts that merge sort will be faster.

Now here's what's interesting. Let's rescale the x-axis and look well beyond this crossing point of 90; let's increase it by an order of magnitude, up to arrays of size 1500. And I want to emphasize that these are still very small problem sizes: if all you need to do is sort arrays of size 1500, you really don't need to know divide and conquer or anything else we'll talk about; that's a pretty trivial problem on modern computers. What we're seeing is that even for very modest problem sizes, here an array of, say, size 1500, the quadratic dependence in the insertion sort bound more than dwarfs the fact that it had a lower constant factor. So in this larger regime, the gulf between the two algorithms is growing, and of course, if I increased the scale by another 10x or 100x or 1000x, to get to genuinely interesting problem sizes, the gap between these two algorithms would be even bigger; it would be huge.

That said, I'm not saying you should be completely ignorant of constant factors when you implement algorithms. It's still good to have a general sense of what these constant factors are. For example, in highly tuned versions of merge sort, which you'll find in many programming libraries, precisely because of the difference in constant factors, the algorithm will actually switch from merge sort over to insertion sort once the problem size drops below some particular threshold, say seven elements or something like that. So for small problem sizes you use the algorithm with the smaller constant factors, namely insertion sort, and for larger problem sizes you use the algorithm with the better rate of growth, namely merge sort.
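To illustrate that last remark about tuned library implementations, here is a minimal Python sketch of a merge sort that hands small subproblems to insertion sort; the cutoff of seven elements is just the illustrative threshold mentioned above, and real libraries choose their own empirically tuned values:

```python
CUTOFF = 7  # illustrative threshold; real libraries tune this empirically

def insertion_sort(a):
    """Sort a small list in place: low constant factors, but quadratic growth."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_merge_sort(a):
    """Merge sort that switches to insertion sort below the cutoff."""
    if len(a) <= CUTOFF:
        return insertion_sort(list(a))
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid])
    right = hybrid_merge_sort(a[mid:])
    # Standard merge of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(hybrid_merge_sort([5, 3, 8, 1, 9, 2, 7, 6, 4, 0]))  # [0, 1, ..., 9]
```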
So, to review: our first guiding principle is that we're going to pursue worst-case analysis. We're going to look for bounds on the performance, on the running time, of an algorithm which make no domain assumptions, no assumptions about which input of a given length the algorithm is provided. The second guiding principle is that we're not going to focus on constant factors or lower-order terms; that would be inappropriate given the level of granularity at which we're describing algorithms. And the third is that we're going to focus on the rate of growth of running times for large problem sizes.

Putting these three principles together, we get a mathematical definition of a fast algorithm: namely, we're going to pursue algorithms whose worst-case running time grows slowly as a function of the input size. So let me tell you how you should interpret what I just wrote down in this box. On the left-hand side is clearly what we want: algorithms that run quickly when we implement them. On the right-hand side is a proposed mathematical surrogate for a fast algorithm. The left-hand side is not a mathematical definition; the right-hand side is, as will become clearer in the next set of lectures. So we're identifying fast algorithms with those that have good asymptotic running time, running time which grows slowly with the input size.

Now, what would we want from a mathematical definition? We'd want a sweet spot. On the one hand, we want something we can actually reason about; this is why we zoom out and squint and ignore things like constant factors and lower-order terms. We can't keep track of everything; otherwise we'd never be able to analyze anything. On the other hand, we don't want to throw out the baby with the bathwater: we want to retain predictive power. And this definition, worst-case analysis using the asymptotic running time, turns out, for the problems we're going to talk about in this course, to be the sweet spot for reasoning about algorithms. We'll be able to prove lots of theorems and establish a lot of performance guarantees for fundamental algorithms, but at the same time we'll have good predictive power: what the theory advocates will in fact be algorithms that are well known to be fast in practice.

So the final explanation I owe you is: what do I mean by the running time growing slowly with respect to the input size? Well, the answer depends a little bit on the context, but for almost all of the problems we're going to discuss, the holy grail will be what's called a linear-time algorithm, an algorithm whose number of instructions grows proportionally to the input size. We won't always be able to achieve linear time, but that's, in some sense, the best-case scenario. Notice that linear time is even better than what we achieved with our merge sort algorithm: merge sort runs a little bit superlinearly, at N times log N, where N is the input size. If possible, we'd love to be linear time; it's not always going to be possible, but that is what we will aspire toward for most of the problems we'll discuss in this course.

Looking ahead, the next series of videos has two goals. First, on the analysis side, I'll describe formally what I mean by asymptotic running time: I'll introduce big-oh notation and its variants, explain its mathematical definition, and give a number of examples. On the design side, we'll get more experience applying the divide-and-conquer paradigm to further problems. See you then.