I just got the number of divide and conquer algorithms and, so far, I've been short shrift to proofs of correctness. This has been a conscience decision on my part. Coming up with the right divide and conquer algorithm for a problem can definitely be difficult, but once you have that eureka moment and you figure out the right algorithm you tend to, also, have a good understanding of why it's correct, why it actually solves the problem on every possible input. Similarly when I present to you a divide and conquer algorithm like, say, merge sort or quicksort, I expect that many of you have a good and accurate intuition about why the algorithm is correct. In contrast the running time of these developed [inaudible] algorithms is often highly non-obvious. So, correctness proofs for divide-and-conquer algorithms tend to simply formalize the intuition that you have via a proof by induction. That's why I haven't been spending much time on them. But nevertheless, I do feel like I owe you at least one rigorous correctness proof for a divide-and-conquer algorithm, and we may as well do it for quicksort. So in this optional video, we'll briefly review proofs by induction, and then we'll show how such a proof can be used to rigorously establish the correctness of quicksort. The correctness proofs for most of the other divide-and-conquer algorithms that we discuss can be formalized in a similar way. So let's begin by reviewing the format for proofs by induction. So, the canonical proofs by induction and the kind that we'll be using here, is when you want to establish an assertion for all of the positive integers in. So now it's some assertion which is parameterized by n, where n is a positive integer. I know this is a little abstract, so let me just be concrete about the assertion that we actually care about for quicksort. So for us, the assertion P(n) is the statement that cor, quicksort is always correct on inputs of length n, arrays that have n elements. So an induction proof has two parts. The first part is a base case and the second part is an inductive step. For the base case you have to get started so you show that at the very least your assertion is true when n equals one. This is often a trivial matter and that'll be the case when we establish the correctness of quick sort. Just on our rays with only one element. So, the non-trivial part of a proof by induction is usually the inductive step. And in the inductive step, you look at a value of n not covered by the base case, so a value of n bigger than one. And you show that if the assertion holds for all smaller values, small integers, then it also holds for the integer n. That is, you show that for every positive integer N that's two or greater, you assume that P of K holds for all K strictly less than N. And under that assumption, which is called the inductive hypothesis. Under the assumption that P of K holds for all K strictly less than N, you then establish that P of N holds as well. So if you manage to complete both of these steps, if you prove both the base case that P(1) holds, you argue that directly, and then also you argue that assuming the inductive hypothesis, that the assertion holds for all smaller integers, it also holds for an arbitrary integer n. Then you're done. Then in fact you have proven that the assertion P then holds for every single positive integer N. Right? So for any given N that you care about, the way you can derive that from one and two is you just start from the base case, P of one holds. Then you apply the inductive step N minus one times. And boom, you've got it. So you know that P holds for the integer N that you care about as well. And that's true for arbitrarily large values of N. So those are proofs by induction in general. Now let's instantiate this proof format, this type of proof for establishing the correctness of quicksort. So let me write again what is the assertion we care about. Our definition of P(n) is gonna be that quicksort is always correct on arrays of length n. And of course what we want to prove is that quicksort is correct no matter what size array that you give it, that is, we want to prove that P(n) holds for every single n at least one. So this is right in the wheelhouse of proofs by induction. ?Kay, so that's how we're going to establish it. Now depending on the order in which you're watching the videos, you may or may not have seen our discussion about how you actually choose the pivot, recall that the first thing Quick Sort does is choose a pivot, then it partitions the array around the pivot. So, we're going establish the correctness of Quick Sort, no matter how the choose pivot sub-routine gets implemented. Okay, so now matter how you choose pivots, you'll always have correctness. As we, as we'll see in a different video, the choice of pivot definitely has an influence on the running of Quick Sort, the correctness of Quick Sort, there's no matter how you choose the pivot. So it's perceived by a proof by induction. So for the base case when n equals one, this is a fairly trivial statement. Right? So, then we're just talking about inputs that have only one element. Every such array is already sorted. Quicksorts, in the bai, when n equals one just returns the input array. It doesn't do anything, and that is indeed the sort of array that it returns. So, by the rather trivial argument we had directly proven that p of one holds. We've proven the rather unimpressive statement that quicksort always correctly sorts one element arrays. Okay? No big deal. So, let's move on to the inductive step. So in the inductive step we have to fix an arbitrary value of N that's at least two. A value of N not covered by the base case. So let's fix some value of N, that leaves two. Now what are we trying to prove? We're trying to prove that Quick Sort always correctly sorts every input array of length N. So we also have to fix an arbitrary such input. So let's make sure we're all clear on what it is we need to show, what do you show in an inductive step. Assuming that PFK holds. For all smaller values, all smaller integers, then P of N holds as well. And remember this is the inductive hypothesis. So in the context of quicksort, we're assuming that quicksort never makes a mistake on any input array that has length strictly smaller than n. And now we just have to show it never makes a mistake on array, input arrays that have size exactly n. So this is the point in the proof where we actually delve into how Quick Sort is implemented to argue correctness. So recall what the first step of Quick Sort is, it picks some pivot arbitrarily, we don't know how, we don't care how. And then it partitions the array around this pivot element p. Now as we argued in the video where we discussed the partition sub routine, at the conclusion of that sub routine, the array has been rearranged into the following format. The pipit is wherever it is, everything to the left of the pipit is less than the pipit, and everything bigger than the pipit is greater than the pipit. Alright, this is where how things stand at the conclusion of the partitioning sub routine. So let's call this stuff less than the pipit the first part of the partition array, and the stuff bigger than the pipit, the second part of the partition array. And recall our observation from the overview video that the pivot winds up in its correct position. Right, where would the pivot be? Where is any element suppose to be in the final sorted array? What's suppose to be to the right of everything less than it, and to the left of everything bigger than it? And that's exactly where this partitioning subroutine deposits the pivot element peak. So now to imply the inductive hypothesis, which you'll recall is a hypothesis about how quick sort operates on smaller sub arrays. Let's call the length of the first part in the second part of the partition [inaudible] K1 and K2 respectively. Now, crucially, both k1 and k2 are strictly less than n. Both of these two parts have lengths strictly less than that of the given input array a. That's because the pivot in particular is excluded from both of those two parts. So, their gonna have, at most n minus one [inaudible]. That means that we can apply the inductive hypothesis, which says that the quicksort never makes a mistake on an array that has size strictly less than n. That implies that our two recursive calls to quickstart, the one to the first part and the one to the second part don't make mistakes. They're guaranteed to sort those sub arrays correctly by the inductive hypothesis. And to be very precise, what we're using to argue that the [inaudible] are correct, are P of K1 and P of K2. Or P is the assertion that [inaudible] is always correct on a [inaudible]. K1 and K2. And we know that both of these statements are true because k1 and k2 are less th, are both less than n and because of the inductive hypothesis. So what's the upshot? The upshot is, quicksort's gonna be correct. And so the first recursive call puts all of the elements that are less than the pivot in the correct relative order. Next comes the pivot, which is bigger than all of that stuff in the first part and less than all the stuff in the second part, and then the second recursive call correctly orders all of the elements in the second part. So with those three things pasted together, we have a sorted version of the input array and since this array was an arbitrary one, of link N. That establishes the assertion P of N and since n was arbitrary, that establishes the inductive and completes the proof of correctness of quick sort for an arbitrary method of choosing the pivot element. [sound]