In this next series of videos, we'll get some more practice applying the divide and conquer algorithm and design paradigm to various problems. This will also give us a glimpse of the kinds of application [inaudible] to which it's been successfully applied. We're gonna start by extending the merge sort algorithm to solve a problem involving counting the number of inversions of an array. Before we tackle the specific problem of counting the number of inversions in an array, let me say a few words about the divide and conquer paradigm in general. So again, you've already seen the totally canonical example of divide and conquer, namely merge sort. So the following three conceptual steps will be familiar to you. The first step, no prizes for guessing is you divide. The problem. Into smaller sub-problems. Sometimes this division happens only in your mind. It's really more of a conceptual step than part of your code. Sometimes you really do copy over parts of the input into say new arrays to pass on to your recursive calls. The second step, again no prizes here, is you conquer the sub-problems just using recursion. So for example, in Merge Sort, you conceptually divide the array into two different pieces. And then you [inaudible] with the conquer or sort to the first half of the array. And you, you do the same thing with the second half of the array. Now, of course, it's not quite as easy as just doing these two steps. Dividing the problem, and then solving the sub problems recursively. Usually, you have some extra cleanup work after the recursive calls, and to stitch together the solutions to the sub problems into one for the big problem, the problem that you actually care about. Recall, for example, in Merge Sort, after our recursive calls, the left half of the array was sorted, the right half of the array was sorted. But we still had to stitch those together. Merge into a sorted version of the entire array. So the [inaudible] step is to combine. The solutions to the subproblem into one problem. Generally the largest amount of ingenuity happens in the third step. How do you actually quickly combine solutions to subproblems into one to the original problem? Sometimes you also get some cleverness in the first step with division. Sometimes it's as simple as just spliting a ray in two. But there are cases where the division step also has some ingenuity. Now let's move on to the specific problem of counting inversions and see how to apply this divide and conquer paradygm. So let begin by defining the problem formally now. We're given as input an array A with a length N. And you can define the problem so that the array a contains any ole distinct numbers. But, let's just keep thing simple and assume that it contains the numbers one through n. The integers in that range in some order. That captures the essence of the problem. And the goal is to compute the number of inversions of this array so what's an inversion you may ask well an inversion is just a pair of array [inaudible] I and J with I smaller than J so that earlier array entry the I entry is bigger than the latter one the Jake one so one thing that should be evident is that if the array contains these numbers in sorted order if the array is simply one two three four all the way up to N then the number of inversions is zero. The converse you might also want to think through if the array has any other ordering of the numbers between one and N other than the assorted one, then it's going to have a non. Of zero number of inversions. Let's look at another example. So'spose we have an array of six entries. So the numbers one thru six in the following order. One, three, five followed by two, four, six. So how many inversions does this array have? So again what we need to look for are pairs of array entries so that the earlier or left entry is bigger than the later or right entry. So one example which we see right here would five and two. Those are right next to each other and out of order, the earlier entry is bigger than the other one. But there's others, there's three and two for example Those are out of order. And, five and four are also out of order. And I'll leave it to you to check that those are the only three pairs that are out of order. So summarizing the inversions in this array of length six are 3-2, 5-2, and 5-4. Corresponding to the array entries, 2-4, 3-4, and 3-5. Pictorially, we can think of it thusly, we can first. Write down the numbers in order, one up to six. And then we can write down the numbers again but, ordered in the way that their given in the input array. So, one three five two four six. And then we can connect the dots, meaning we connect one to one. Reconnect two to two, and so on. It turns out, and I'll leave to for you to, to think this through, that the number of crossing pairs of line segments prescisely correspond to the number of inversions. So we see that there are one, two, three crossing line segments. And these are exactly in correspondence with the three inversions, we found earlier. Five and two, three and two, and five and four. Now, [inaudible] wanna solve this problem you might ask. Well there's few reasons that come up. One would be to have a numerical similarity measure that quantifies how close to [inaudible] lists are to each other. So for example, suppose I took you and a friend, and I took, identified ten movies that both of you had seen. And I asked each of you to order, or to rank these movies from your most favorite to your least favorite. Now I can form an array, and compute inversions. And it quantifies, in some sense, how dissimilar your two rankings are to each other. So in more detail, in the first entry of this array, I would put down the ranking that your friend gave to your favorite movie. So if you had your favorite movie, Star Wars or whatever. And your friend only thought it was the fifth best out of the ten, then I would write down a five in the first entry of this array. Generally, I would take your second favorite movie. I would look at how your friend ranked that. I would put that in the second entry of the array and so on, all the way up to the tenth entry of the array, where I would put your friend's ranking of your least favorite movie. Now, if you have exactly identical preferences, if you rank them exactly the same way, the number of inversions of this array would be zero. And in general, the more inversions this array has, it quantifies that your lists look more and more different from each other. Now why might you want to do this why might you want to know whether two different people ranked things in the similar way had similar preferences well one reason might be what's called collaborative filtering, probably many of you have had the experience of going to a website and if you've made a few purchases through this website it starts recommending further purchases for you, so one way to solve this problem under the hood, is to look at your purchases look at what you seem to like, find other people who have similar preferences similar history look at things they've bought that you haven't, and then recommend. New products to you based on what similar customers seemed to have bought. So this problem captures some of the essence of identifying which customers or which people are similar based on data about what they prefer. So just to make sure we're all on the same page, let me pause for a brief quiz. We've already noticed that a given array will have zero inversions, if and only if it's in sorted order. If it only contains the numbers of one through N in order. So, on the other side, what is the largest number of inversions an array could possibly have? Let's say, just for an array of size six, like the one in this example here. So the answer to this question is the first one. Fifteen. Or in general in an N. Element array the largest number of inversions is N. Choose two. Also known as N times N minus one over two. Which, again, in the case of a [inaudible] is going to evaluate to fifteen. The reason is, the worst case is when the array is in backwards order, reverse [inaudible] order, and every single pair of [inaudible] indices is inverted. And so the number of indices IJ, with I less than J is precisely [inaudible] too. Let's now turn our attention to the problem of computing the number of inversions of an array as quickly as possible. So one option that is certainly available to us is the brute force algorithm. And by brute force I just mean we could set up a double four loop. One which goes through I, one which goes through J bigger than I, and we just check each pair IJ individually with I less than J whether that particular pair of array entities AI and AJ is inverted and if it is then we add it to our running count. And then we return the final count at the end of the double four loop. That's certainly correct. The only problem is, as we just observed, there's N [inaudible] two or a quadratic number of potential inversions so this algorithm's almost going to run in time quadratic in the array link. Now remember the mantra of any good algorithm designer. Can we do better? And the answer is yes. And the method we'll be using, divide and conquer. The way in which we'll divide will be motivated directly by merge sort where we recurs e separately on the left and the right half's of the array. We're gonna do the same thing here. To understand how much progress we can make purely using recursion let's classify the inversions of array into one of three types. So suppose we have an inversion of an array I, J, and remember in an inversion you always have I less than J. We're gonna call it a left inversion. If both of the array indices are at most N over two, where N is the array length. We're gonna call it a right inversion if they're both strictly greater than N over two. And we're gonna call it a split inversion if the smaller index is at most N over two and the larger index is bigger than N over two. We can discuss the progress made by recursion in these terms. When we recurse on the left-half of an array, if we implement our algorithm correctly, we'll successfully be able to count all of the inversions located purely in that first half. Those are precisely the left inversions. Similarly, a second recursive call on just the right half of an array, the second half of an [inaudible] array will successfully count all of the right inversions. There remains the questions of how to count the split inversions. But we shouldn't be surprised there's some residual work left over, even after the recursive calls do their job. That, of course, was the case at Merge Short, where [inaudible] magically took care of sorting the left half of the array, sorting the right half of the array. But there was still, after their return, the matter of merging those two sorted lists into one. And here again, after the recursion is gonna be the matter of cleaning up and counting the number of split inversions. So for example if you go back to the six element array we worked through before, 135246, you'll notice that there, in fact, all of the inversions are split. So the recursive calls will both come back counting zero inversions. And all of the work for that example will be done by the count split inversions subroutine. So let's summarize where things stand given underspecified high level description of the algorithm as we envision it. There is a base case. I'll go ahead and write it down for completeness, which is if we're given a one element array, then there's certainly no inversion so we can just immediately return the answer zero. For any bigger array, we're going to divde and conquer. So we'll count the left inversions with a recursive call. The right inversions with a recursive call. And then we'll have some currently unimplemented subroutine that counts the split inversions. Since every inversion is either left or right, or split, and can't be any more than one of those three, then, having done these three things, we can simply return their sum. So that's our high level attack on how we're gonna count up the number of inversions. And of course, we need to specify how we're gonna count the number of split inversions. And moreover, we lack that subroutine to run quickly. An analogy to emerge short, where, outside the recursive calls, we did merely linear work. Outs-, in the merge subroutine. Here, we'd like to do only linear work in counting up the number of split inversions. If we succeed in this goal, if we produce a correct and linear time of limitation to count up the number of split incursions, then this entire recursive algorithm will run in big O. Of N. Log in time. The reason the overall out rhythm will run in O. Of N. Log in time is exactly the same reason that merge short ran in N. Log in time. There's two recursive calls. Each on a problem of one-half the size. And outside of the recursive calls we would be doing linear work. So you could copy down exactly the same recursion tree argument we used for merge short. It would apply equally well here. Alternatively, very soon we will cover the master method, and as one very special case it will prove that this algorithm, if we can implement it thusly, will run in O. Of N. Log in time. Now one thing to realize, is this is a fairly ambitious goal, to count up the number of split inversions in linear time. It's not that there can't be too many split inversions. There can actually be a lot of them. If you have an array where the first half of the array contains the numbers N over two plus one, up to N. Whereas the second part of the array contains the numbers one up to N over two, that has a quadratic number of inversions, all of which are split. So, what we're attempting to do here is count up a quadratic number of things using only linear time. Can it really be done? Yes is can, as we'll see in the next video.