1
00:00:03,407 --> 00:00:10,166
next we're gonna talk about 3-way radix
quicksort which is that algorithm that

2
00:00:10,393 --> 00:00:14,722
combines the benefits of quicksort and MSD
radix sort.

3
00:00:14,950 --> 00:00:21,936
This algorithm actually came to me into
development during the development of this

4
00:00:21,936 --> 00:00:25,656
course.
As it turns out, radix sorting, while it

5
00:00:25,656 --> 00:00:29,762
was really well known to everyone in the
60's and70's.

6
00:00:29,957 --> 00:00:35,589
It was kind of lost sight of for a while
in the'80's as people moved to a higher

7
00:00:35,589 --> 00:00:40,509
level of programming languages.
And it wasn't so easy or effecient to get

8
00:00:40,509 --> 00:00:44,523
characters out of keys.
You had to either work with numbers.

9
00:00:44,717 --> 00:00:47,631
You had to use abstractions like compared
to.

10
00:00:47,825 --> 00:00:50,868
And radix sort got kind of lost for a
while.

11
00:00:51,062 --> 00:00:55,400
But then string processing became more and
more important as.

12
00:00:55,400 --> 00:01:01,120
Computers became bigger and faster and so
we needed to talk about how to sort

13
00:01:01,120 --> 00:01:07,182
strings and this algorithm is a really
simple algorithm that comes out when you

14
00:01:07,182 --> 00:01:13,491
start to address that problem.
So, the idea is that what we're gonna do

15
00:01:13,491 --> 00:01:20,480
is use 3-way partitioning for quicksort.
But just by character.

16
00:01:20,480 --> 00:01:26,668
So since there's a, we're gonna do one
character at a time, there's gonna be a

17
00:01:26,668 --> 00:01:32,697
lot of equal keys for that character.
And the idea is that when you do have

18
00:01:32,697 --> 00:01:38,966
equal keys, you'll pull'em together with
those leading characters, and then you

19
00:01:38,966 --> 00:01:45,637
could recursively sort the subarrays, but
you can do the normal quicksort thing for

20
00:01:45,637 --> 00:01:48,612
those.
So for this example, when we partition,

21
00:01:48,809 --> 00:01:54,058
first partitioning item, we're going to
use it's first character, and we're going

22
00:01:54,058 --> 00:01:58,873
to divide up into less than that.
Equal to that and greater than that.

23
00:01:58,873 --> 00:02:04,345
And then we re-curse three times, one for
the greater one for the ones that start

24
00:02:04,345 --> 00:02:07,364
with the same character and one for the
less.

25
00:02:07,364 --> 00:02:12,890
That's overview of the algorithm.
So, let's do just a, a quick trace of the

26
00:02:12,890 --> 00:02:17,465
first few recursive calls.
So, partitioning item, again, is S.

27
00:02:17,690 --> 00:02:22,265
We divide it up that way.
And so now, we're gonna sort the top

28
00:02:22,265 --> 00:02:28,448
subarray, the ones that are less than S.
So we partition that out in B and then we

29
00:02:28,448 --> 00:02:33,808
get down to subarrays, size one.
And the next thing that we need to do is

30
00:02:33,808 --> 00:02:37,936
the big middle sub-file.
We need to sort it on its second

31
00:02:37,936 --> 00:02:41,340
character, so the partition item is E this
time.

32
00:02:41,340 --> 00:02:45,612
So we rearrange those to have the ones
that start with S, E.

33
00:02:45,612 --> 00:02:48,799
The ones that are less and there aren't
any,

34
00:02:48,799 --> 00:02:55,869
And the ones that are bigger.
And so next recursion is on the third

35
00:02:55,869 --> 00:03:03,564
letter in that middle subfile.
In this case, it's A, and so We have, the

36
00:03:03,564 --> 00:03:10,026
ones that are equal, so that's the ones
that start with SEA and the ones that are

37
00:03:10,026 --> 00:03:14,872
bigger of the two cells and then move on
to the next character.

38
00:03:14,872 --> 00:03:19,950
So it, it's of the ones that are equal in
kind of a controlled way.

39
00:03:20,180 --> 00:03:24,631
That's a example of a trace for 3-way
string quicksort.

40
00:03:24,631 --> 00:03:28,315
Now it's a program that almost writes
itself.

41
00:03:28,545 --> 00:03:34,378
Once you've seen the idea and you've seen
an implementation of Quicksort,

42
00:03:34,609 --> 00:03:41,363
It's a very minor modification into the
3-way quicksort that we discussed when we

43
00:03:41,593 --> 00:03:47,656
were talking about Quicksort.
Instead, so we pick out the Dth character.

44
00:03:47,656 --> 00:03:53,719
We take d as an argument, we pick out the
dth character we do the regular

45
00:03:53,950 --> 00:03:58,330
Partitioning element.
Again, picking out.

46
00:03:58,967 --> 00:04:03,113
Just the Dth character for each key that
we look at.

47
00:04:03,113 --> 00:04:06,939
And this is the standard three way
partitioning.

48
00:04:06,939 --> 00:04:12,998
And then when we're done with the
partitioning, we do the array of ones that

49
00:04:12,998 --> 00:04:18,657
are less chara-, that Dth characters less
than the ones that are bigger.

50
00:04:18,657 --> 00:04:25,035
And then in the middle, if we had any,
we'd sort the middle sub-ray and we'd move

51
00:04:25,035 --> 00:04:30,201
over one character.
That's a, a very compact string sorting

52
00:04:30,201 --> 00:04:35,982
algorithm that performs very well.
Now, what about the performance of that

53
00:04:35,982 --> 00:04:38,340
algorithm?
Well, we talked about.

54
00:04:38,559 --> 00:04:44,279
Standard quicksort performance.
And, if we randomly order the keys ahead

55
00:04:44,279 --> 00:04:47,578
of time.
You use 2N natural log int string

56
00:04:47,578 --> 00:04:51,904
compares, on average.
And, the thing is, though, if you have

57
00:04:51,904 --> 00:04:55,131
keys that have lots of long common
prefixes.

58
00:04:55,131 --> 00:05:01,583
And this happens in lots of applications.
Then, those compares are gonna go through

59
00:05:01,583 --> 00:05:09,493
all those keys every time.
That 3-way string quicksort Uses, although

60
00:05:09,493 --> 00:05:17,990
the analysis is pretty complicated, it
uses only 2N ln N character compares, so

61
00:05:17,990 --> 00:05:23,799
that's an amazing, a huge savings for
typical common cases.

62
00:05:23,799 --> 00:05:31,082
It doesn't have to, go through the long
common prefixes that usually cause the

63
00:05:31,082 --> 00:05:38,276
problem. And what about versus MSD's or,.
Well, MSD sort has this caching problem

64
00:05:38,276 --> 00:05:44,479
and it's got the count arrays and it's got
all this overhead involved in maintaining

65
00:05:44,479 --> 00:05:48,127
the counts.
Whereas three way string quick-sort is

66
00:05:48,127 --> 00:05:53,893
still cash friendly, 'cause most of the
time, it's partitioning the same way that

67
00:05:53,893 --> 00:05:58,417
normal quick sort is.
It's still got that short inner loop and

68
00:05:58,417 --> 00:06:04,117
it doesn't use any extra space.
And there's all kinds of applications that

69
00:06:04,341 --> 00:06:10,170
for example library call numbers or
account numbers where long string keys

70
00:06:10,170 --> 00:06:16,447
that have non randomness in them.
This algorithm adapts really well to such

71
00:06:16,447 --> 00:06:20,706
situations.
The bottom line is if you've got string

72
00:06:20,706 --> 00:06:25,040
keys, 3-way string quicksort is the method
of choice.

73
00:06:25,040 --> 00:06:30,047
It's simple to implement,
And it's gonna perform well so I'll

74
00:06:30,047 --> 00:06:34,680
probably leave sorting algorithms with
this bottom line.

75
00:06:34,954 --> 00:06:43,817
And the idea is that now we've got a quite
fast algorithm that, does a linear

76
00:06:43,817 --> 00:06:51,298
rhythmic number of character compares,
And that's in randomly, random keys in

77
00:06:51,298 --> 00:06:54,929
some way.
But even if they're not random, it's

78
00:06:54,929 --> 00:06:58,560
difficult to characterize really the worst
case.

79
00:06:58,787 --> 00:07:03,779
But it's more, when data is not or this
algorithm won't perform well,

80
00:07:04,006 --> 00:07:10,435
And even better chance of getting
sub-linear performance then all the other

81
00:07:10,435 --> 00:07:14,141
algorithms.
That's 3-way radix quicksort.