In this next series of videos, we'll get
some more practice applying the divide and
conquer algorithm and design paradigm to
various problems. This will also give us a
glimpse of the kinds of application
[inaudible] to which it's been
successfully applied. We're gonna start by
extending the merge sort algorithm to
solve a problem involving counting the
number of inversions of an array. Before
we tackle the specific problem of counting
the number of inversions in an array, let
me say a few words about the divide and
conquer paradigm in general. So again,
you've already seen the totally canonical
example of divide and conquer, namely
merge sort. So the following three
conceptual steps will be familiar to you.
The first step, no prizes for guessing is
you divide. The problem. Into smaller
sub-problems. Sometimes this division
happens only in your mind. It's really
more of a conceptual step than part of
your code. Sometimes you really do copy
over parts of the input into say new
arrays to pass on to your recursive calls.
The second step, again no prizes here, is
you conquer the sub-problems just using
recursion. So for example, in Merge Sort,
you conceptually divide the array into two
different pieces. And then you [inaudible]
with the conquer or sort to the first half
of the array. And you, you do the same
thing with the second half of the array.
Now, of course, it's not quite as easy as
just doing these two steps. Dividing the
problem, and then solving the sub problems
recursively. Usually, you have some extra
cleanup work after the recursive calls,
and to stitch together the solutions to
the sub problems into one for the big
problem, the problem that you actually
care about. Recall, for example, in Merge
Sort, after our recursive calls, the left
half of the array was sorted, the right
half of the array was sorted. But we still
had to stitch those together. Merge into a
sorted version of the entire array. So the
[inaudible] step is to combine. The
solutions to the subproblem into one
problem. Generally the largest amount of
ingenuity happens in the third step. How
do you actually quickly combine solutions
to subproblems into one to the original
problem? Sometimes you also get some
cleverness in the first step with
division. Sometimes it's as simple as just
spliting a ray in two. But there are cases
where the division step also has some
ingenuity. Now let's move on to the
specific problem of counting inversions
and see how to apply this divide and
conquer paradygm. So let begin by defining
the problem formally now. We're given as
input an array A with a length N. And you
can define the problem so that the array a
contains any ole distinct numbers. But,
let's just keep thing simple and assume
that it contains the numbers one through
n. The integers in that range in some
order. That captures the essence of the
problem. And the goal is to compute the
number of inversions of this array so
what's an inversion you may ask well an
inversion is just a pair of array
[inaudible] I and J with I smaller than J
so that earlier array entry the I entry is
bigger than the latter one the Jake one so
one thing that should be evident is that
if the array contains these numbers in
sorted order if the array is simply one
two three four all the way up to N then
the number of inversions is zero. The
converse you might also want to think
through if the array has any other
ordering of the numbers between one and N
other than the assorted one, then it's
going to have a non. Of zero number of
inversions. Let's look at another example.
So'spose we have an array of six entries.
So the numbers one thru six in the
following order. One, three, five followed
by two, four, six. So how many inversions
does this array have? So again what we
need to look for are pairs of array
entries so that the earlier or left entry
is bigger than the later or right entry.
So one example which we see right here
would five and two. Those are right next
to each other and out of order, the
earlier entry is bigger than the other
one. But there's others, there's three and
two for example Those are out of order.
And, five and four are also out of order.
And I'll leave it to you to check that
those are the only three pairs that are
out of order. So summarizing the
inversions in this array of length six are
3-2, 5-2, and 5-4. Corresponding to the
array entries, 2-4, 3-4, and 3-5.
Pictorially, we can think of it thusly, we
can first. Write down the numbers in
order, one up to six. And then we can
write down the numbers again but, ordered
in the way that their given in the input
array. So, one three five two four six.
And then we can connect the dots, meaning
we connect one to one. Reconnect two to
two, and so on. It turns out, and I'll
leave to for you to, to think this
through, that the number of crossing pairs
of line segments prescisely correspond to
the number of inversions. So we see that
there are one, two, three crossing line
segments. And these are exactly in
correspondence with the three inversions,
we found earlier. Five and two, three and
two, and five and four. Now, [inaudible]
wanna solve this problem you might ask.
Well there's few reasons that come up. One
would be to have a numerical similarity
measure that quantifies how close to
[inaudible] lists are to each other. So
for example, suppose I took you and a
friend, and I took, identified ten movies
that both of you had seen. And I asked
each of you to order, or to rank these
movies from your most favorite to your
least favorite. Now I can form an array,
and compute inversions. And it quantifies,
in some sense, how dissimilar your two
rankings are to each other. So in more
detail, in the first entry of this array,
I would put down the ranking that your
friend gave to your favorite movie. So if
you had your favorite movie, Star Wars or
whatever. And your friend only thought it
was the fifth best out of the ten, then I
would write down a five in the first entry
of this array. Generally, I would take
your second favorite movie. I would look
at how your friend ranked that. I would
put that in the second entry of the array
and so on, all the way up to the tenth
entry of the array, where I would put your
friend's ranking of your least favorite
movie. Now, if you have exactly identical
preferences, if you rank them exactly the
same way, the number of inversions of this
array would be zero. And in general, the
more inversions this array has, it
quantifies that your lists look more and
more different from each other. Now why
might you want to do this why might you
want to know whether two different people
ranked things in the similar way had
similar preferences well one reason might
be what's called collaborative filtering,
probably many of you have had the
experience of going to a website and if
you've made a few purchases through this
website it starts recommending further
purchases for you, so one way to solve
this problem under the hood, is to look at
your purchases look at what you seem to
like, find other people who have similar
preferences similar history look at things
they've bought that you haven't, and then
recommend. New products to you based on
what similar customers seemed to have
bought. So this problem captures some of
the essence of identifying which customers
or which people are similar based on data
about what they prefer. So just to make
sure we're all on the same page, let me
pause for a brief quiz. We've already
noticed that a given array will have zero
inversions, if and only if it's in sorted
order. If it only contains the numbers of
one through N in order. So, on the other
side, what is the largest number of
inversions an array could possibly have?
Let's say, just for an array of size six,
like the one in this example here. So the
answer to this question is the first one.
Fifteen. Or in general in an N. Element
array the largest number of inversions is
N. Choose two. Also known as N times N
minus one over two. Which, again, in the
case of a [inaudible] is going to evaluate
to fifteen. The reason is, the worst case
is when the array is in backwards order,
reverse [inaudible] order, and every
single pair of [inaudible] indices is
inverted. And so the number of indices IJ,
with I less than J is precisely
[inaudible] too. Let's now turn our
attention to the problem of computing the
number of inversions of an array as
quickly as possible. So one option that is
certainly available to us is the brute
force algorithm. And by brute force I just
mean we could set up a double four loop.
One which goes through I, one which goes
through J bigger than I, and we just check
each pair IJ individually with I less than
J whether that particular pair of array
entities AI and AJ is inverted and if it
is then we add it to our running count.
And then we return the final count at the
end of the double four loop. That's
certainly correct. The only problem is, as
we just observed, there's N [inaudible]
two or a quadratic number of potential
inversions so this algorithm's almost
going to run in time quadratic in the
array link. Now remember the mantra of any
good algorithm designer. Can we do better?
And the answer is yes. And the method
we'll be using, divide and conquer. The
way in which we'll divide will be
motivated directly by merge sort where we
recurs e separately on the left and the
right half's of the array. We're gonna do
the same thing here. To understand how
much progress we can make purely using
recursion let's classify the inversions of
array into one of three types. So suppose
we have an inversion of an array I, J, and
remember in an inversion you always have I
less than J. We're gonna call it a left
inversion. If both of the array indices
are at most N over two, where N is the
array length. We're gonna call it a right
inversion if they're both strictly greater
than N over two. And we're gonna call it a
split inversion if the smaller index is at
most N over two and the larger index is
bigger than N over two. We can discuss the
progress made by recursion in these terms.
When we recurse on the left-half of an
array, if we implement our algorithm
correctly, we'll successfully be able to
count all of the inversions located purely
in that first half. Those are precisely
the left inversions. Similarly, a second
recursive call on just the right half of
an array, the second half of an
[inaudible] array will successfully count
all of the right inversions. There remains
the questions of how to count the split
inversions. But we shouldn't be surprised
there's some residual work left over, even
after the recursive calls do their job.
That, of course, was the case at Merge
Short, where [inaudible] magically took
care of sorting the left half of the
array, sorting the right half of the
array. But there was still, after their
return, the matter of merging those two
sorted lists into one. And here again,
after the recursion is gonna be the matter
of cleaning up and counting the number of
split inversions. So for example if you go
back to the six element array we worked
through before, 135246, you'll notice that
there, in fact, all of the inversions are
split. So the recursive calls will both
come back counting zero inversions. And
all of the work for that example will be
done by the count split inversions
subroutine. So let's summarize where
things stand given underspecified high
level description of the algorithm as we
envision it. There is a base case. I'll go
ahead and write it down for completeness,
which is if we're given a one element
array, then there's certainly no inversion
so we can just immediately return the
answer zero. For any bigger array, we're
going to divde and conquer. So we'll count
the left inversions with a recursive call.
The right inversions with a recursive
call. And then we'll have some currently
unimplemented subroutine that counts the
split inversions. Since every inversion is
either left or right, or split, and can't
be any more than one of those three, then,
having done these three things, we can
simply return their sum. So that's our
high level attack on how we're gonna count
up the number of inversions. And of
course, we need to specify how we're gonna
count the number of split inversions. And
moreover, we lack that subroutine to run
quickly. An analogy to emerge short,
where, outside the recursive calls, we did
merely linear work. Outs-, in the merge
subroutine. Here, we'd like to do only
linear work in counting up the number of
split inversions. If we succeed in this
goal, if we produce a correct and linear
time of limitation to count up the number
of split incursions, then this entire
recursive algorithm will run in big O. Of
N. Log in time. The reason the overall out
rhythm will run in O. Of N. Log in time is
exactly the same reason that merge short
ran in N. Log in time. There's two
recursive calls. Each on a problem of
one-half the size. And outside of the
recursive calls we would be doing linear
work. So you could copy down exactly the
same recursion tree argument we used for
merge short. It would apply equally well
here. Alternatively, very soon we will
cover the master method, and as one very
special case it will prove that this
algorithm, if we can implement it thusly,
will run in O. Of N. Log in time. Now one
thing to realize, is this is a fairly
ambitious goal, to count up the number of
split inversions in linear time. It's not
that there can't be too many split
inversions. There can actually be a lot of
them. If you have an array where the first
half of the array contains the numbers N
over two plus one, up to N. Whereas the
second part of the array contains the
numbers one up to N over two, that has a
quadratic number of inversions, all of
which are split. So, what we're attempting
to do here is count up a quadratic number
of things using only linear time. Can it
really be done? Yes is can, as we'll see
in the next video.