So what I want to do next is test your understanding of the search and insertion procedures by asking you about their running time. Of the following four parameters of a search tree that contains n different keys, which one governs the worst-case time of a search or insertion? The correct answer is the third one: the height of a search tree governs the worst-case time of a search or an insertion. Notice this means that merely knowing the number of keys n is not enough to deduce the worst-case search time; you also have to know something about the structure of the tree. To see that, let's think about the two examples we've been running so far, one of which is nice and balanced, and the other of which, containing exactly the same five keys, is super unbalanced; it's this crazy linked list, in effect. In any search tree, the worst-case time to do a search or insertion is proportional to the largest number of pointers, left and right child pointers, that you might have to follow to get from the root all the way to a null pointer. Of course, in a successful search you'll terminate before you encounter a null pointer, but in the worst case, and in an insertion, you go all the way to a null pointer. In the tree on the left, you're going to follow at most three such pointers. For example, if you're searching for 2.5, you're going to follow a left pointer, followed by a right pointer, followed by another pointer, and that one is going to be null, so you follow three pointers. On the other hand, in the right tree, you might follow as many as five pointers before that fifth pointer is null. For example, if you search for the key zero, you're going to traverse five left pointers in a row, and then you finally encounter the null at the end. So it is certainly not constant time; you have to get to the bottom of the tree. It's going to be proportional to the logarithm of the number of keys if you have a nicely balanced binary search tree like the one on the left. It's going to be proportional to the number of keys n, as in the fourth answer, if you have a really lousy search tree like the one on the right. And in general, the search time or the insertion time is going to be proportional to the height, the largest number of hops you need to take to get from the root to a leaf of the tree.

Let's move on to some more operations that search trees support but that, for example, the dynamic data structures of heaps and hash tables do not. Let's start with the minimum and the maximum. By contrast, in a heap, remember, you have to choose one of the two: you can either find the minimum easily or the maximum easily, but not both. In a search tree, it's really easy to find either the min or the max. So let's start with the minimum. One way to think of it is that you do a search for negative infinity in the search tree. You start at the root, and you just keep following left child pointers until you run out, until you hit a null. Whatever the last key you visit is has to be the smallest key in the tree, right? Because, think about it: suppose you start at the root, and suppose the root is not the minimum. Then where has the minimum got to be? It's got to be in the left subtree, so you follow the left child pointer, and then you just repeat the argument.
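To make the pointer-chasing concrete, here's a minimal sketch in Python of the search and insertion procedures discussed above. The names Node, search, and insert are mine, not from the lecture; parent pointers are included because the predecessor operation below will need them. This is an illustrative sketch, not the lecture's official pseudocode.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None    # left child: keys smaller than self.key
        self.right = None   # right child: keys larger than self.key
        self.parent = None  # parent pointer (used later for predecessor)

def search(root, k):
    """Follow left/right child pointers from the root; the number
    of hops is at most the height of the tree."""
    node = root
    while node is not None:
        if k == node.key:
            return node
        node = node.left if k < node.key else node.right
    return None  # unsuccessful search: we fell off at a null pointer

def insert(root, k):
    """Walk down to a null pointer, then attach a new node there."""
    parent, node = None, root
    while node is not None:
        parent = node
        node = node.left if k < node.key else node.right
    new = Node(k)
    new.parent = parent
    if parent is None:
        return new  # tree was empty; the new node is the root
    if k < parent.key:
        parent.left = new
    else:
        parent.right = new
    return root
```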
If you haven't already found the minimum, where has it got to be with respect to your current position? It's got to be in the left subtree, and you just iterate until you can't go left any further. For example, in our running search tree, if we keep following left child pointers, we start at the three, we go to the one, we try to go left from the one, we hit a null pointer, and we return one. And one is indeed the minimum key in this tree. Now, given that we've gone over how to compute the minimum, no prizes for guessing how we compute the maximum. Of course, to compute the maximum, instead of following left child pointers we follow right child pointers; by symmetric reasoning, we're guaranteed to wind up at the largest key in the tree. It's like searching for the key plus infinity.

All right, so what about computing the predecessor? Remember, this means you're given a key in the tree, an element of the tree, and you want to find the next-smallest element. So, for example, the predecessor of the three is the two, the predecessor of the two in this tree is the one, the predecessor of the five is the four, and the predecessor of the four is the three. Here I'll be a little hand-wavy, just in the interest of getting through all of the operations in a reasonable amount of time, but let me point out that there is one really easy case and one slightly trickier case. The easy case is when the node with the key k has a non-empty left subtree. If that's the case, then what you want is simply the biggest element in this node's left subtree. I'll leave it to you to prove formally that this is indeed the correct way to compute predecessors for keys that have a non-empty left subtree; let's just verify it in our example by going through the nodes that have a left subtree and checking that this is in fact what we want. If you look at it, there are actually only two nodes with a non-empty left subtree. The three has a non-empty left subtree, and indeed the largest key in three's left subtree is the two, and that is the predecessor of the three, so that worked out fine. The other node with a non-empty left subtree is the five, and its left subtree is simply the element four; of course the maximum of that subtree is also four, and you'll notice that four is indeed the predecessor of five in this entire search tree.

The trickier case is what happens when the node with key k has no left subtree at all. Okay, so what are you going to do if you're not in the easy case? Well, at this node with key k, you only have three pointers, and by assumption the left one is null, so that's not going to get you anywhere. Now, the right child pointer, if you think about it, is totally pointless for computing the predecessor. Remember, the predecessor is going to be a key less than the given key k, and the right subtree, by definition of a search tree, only has keys that are bigger than k. So it stands to reason that to find the predecessor, we've got to follow the parent pointer, maybe, in fact, more than one parent pointer. To motivate exactly how we're going to follow parent pointers, let's look at a couple of examples in our favorite search tree here on the right. Let's start with the node two. We know we've got to follow a parent pointer. When we follow this parent pointer, we get to one, and boom, one is in fact two's predecessor in this tree. So it was really easy to compute two's predecessor; it seemed all we had to do was follow the parent pointer.
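Here's a minimal sketch of the minimum and maximum procedures just described, continuing with the hypothetical Python Node class from above:

```python
def minimum(node):
    """Follow left child pointers until hitting a null; the last node
    visited holds the smallest key (like searching for -infinity)."""
    while node.left is not None:
        node = node.left
    return node

def maximum(node):
    """Symmetrically, follow right child pointers until hitting a null
    (like searching for +infinity)."""
    while node.right is not None:
        node = node.right
    return node
```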
For another example, though, think about the node four. When we follow four's parent pointer, we get to five, and five is not four's predecessor; it's four's successor. We wanted a key less than where we started, we followed the parent pointer, and we got something bigger. But if we follow one more parent pointer, then we get to the three. So from the two we needed to follow one parent pointer; from the four we needed to follow two parent pointers. But the point is, you just follow parent pointers until you get to a node with a key smaller than your own, and at that point you can stop; that node is guaranteed to be the predecessor. So hopefully you find this intuitive. I should say, I have definitely not formally proved that this works, and that is a good exercise for those of you who want a deeper understanding of search trees, this magical search tree property, and all of the structure it grants you.

The other thing I should mention is another way to interpret the terminating criterion. What I've said is that you stop your traversal of parent pointers as soon as you get to a key smaller than yours. If you think about it a little bit, you'll realize that you get to a key smaller than yours the very first time you take a left turn, that is, the very first time you go from a right child to its parent. Look at the example: when we started from two, we took a left turn, right? We followed an upward link going leftward, since two is the right child of one, and that's when we got to the predecessor in just one step. By contrast, when we started from the four, our first step was to the right, so we got to a node bigger than where we started, for four is five's left child, meaning four is smaller than five. But the first time we took a left turn, on the next step, we got to a node that is not only smaller than five but actually smaller than four, smaller than the starting point. So, in fact, you're going to see a key smaller than your starting point the very first time you take a left turn, the very first time you go from a node to its parent where that node is the parent's right child. This is another statement which I think is intuitive but which is not totally obvious formally, and again I encourage you to think carefully about why these two descriptions of the terminating criterion are exactly the same. It doesn't matter whether you stop when you first find a key smaller than your starting point, or whether you stop when you first follow a parent pointer out of a node that is a right child; either way you're going to stop at exactly the same place, so I encourage you to think about why those are the exact same stopping condition.

A couple of other details: if you start from the unique node that has no predecessor at all, you're never going to trigger this terminating condition. For example, if you start from the node one in this search tree, not only is the left subtree empty, which says you're supposed to start traversing parent pointers, but when you traverse parent pointers, you only ever go to the right; you never turn left. That's because there is no predecessor, and that's how you detect that you're at the minimum of the search tree. And then, of course, if you want the successor of a key instead of the predecessor, you just flip left and right throughout this entire description.
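Here's a minimal sketch of the predecessor operation as just described, again in the hypothetical Python setup from above, covering both the easy case and the parent-pointer case:

```python
def predecessor(node):
    """Return the node holding the next-smallest key, or None if
    node holds the minimum key in the tree."""
    # Easy case: a non-empty left subtree; the predecessor is the
    # maximum of that subtree.
    if node.left is not None:
        return maximum(node.left)
    # Trickier case: follow parent pointers until the first "left
    # turn," i.e., until we come up from a right child.
    child, parent = node, node.parent
    while parent is not None and child is parent.left:
        child, parent = parent, parent.parent
    return parent  # None exactly when node holds the minimum
```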
So that's the high-level explanation of how all of these ordering operations, minimum, maximum, predecessor, and successor, work in a search tree. Let me ask you the same question I asked when we talked about search and insertion: how long do these operations take in the worst case? Well, the answer is the same as before: proportional to the height of the tree, and the explanation is exactly the same as before. To understand the dependence on the height, let's just focus on the maximum operation; for the other three operations, the running time is proportional to the height in the worst case for exactly the same reasons. So what does the max operation do? You start at the root and you just follow right child pointers until you run out of them, until you hit a null. So you know the running time is no worse than the longest such path, a path from the root to, essentially, a leaf, and hence no more than the height of the tree. On the other hand, for all you know, the path from the root to the maximum key might well be the longest one in the tree; it might be the path that actually determines the height of the search tree. For example, our running unbalanced example would be a bad tree for the minimum operation: if you look for the minimum in this tree, you have to traverse every single pointer from the five all the way down to the one. Of course, there's an analogous bad search tree for the maximum operation, where the one is the root and the five is all the way down to the right.

Another thing you can do with search trees, mimicking what you can do with sorted arrays, is print out all of the keys in sorted order in linear time, with constant time per element. In a sorted array, this is trivial: you use a for loop starting at the beginning of the array, printing the keys one at a time. And there's a very elegant recursive implementation that does the exact same thing in a search tree, known as an in-order traversal of a binary search tree. As always, you begin at the beginning, namely at the root of the search tree. A little bit of notation: we'll call the search tree rooted at r's left child T_l and the search tree rooted at r's right child T_r. In our running example, of course, the root is three; T_l would correspond to the subtree comprising only the elements one and two, and T_r would correspond to the subtree comprising only the elements five and four. Now, remember we want to print out the keys in increasing order, so in particular, the first key we want to print out is the smallest of them all. So something we definitely don't want to do is print out the key at the root first. For example, in our search tree, the root's key is three; we don't want to print that out first, we want to print out the one first. And where does the minimum lie? Well, by the search tree property, it's got to lie in the left subtree T_l, so we're just going to recurse on T_l. By the magic of recursion, or if you prefer, induction, what recursing on T_l accomplishes is printing out all of the keys in T_l in increasing order, from smallest to largest. Now that's pretty cool, because T_l contains exactly the keys that are smaller than the root's key. Remember, that's the search tree property.
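Here's a minimal sketch of the in-order traversal in the same hypothetical Python setup as above; as explained next, the root's key gets printed in between the two recursive calls:

```python
def in_order(node):
    """Print all keys in the subtree rooted at node in increasing
    order: recurse left, print the root's key, recurse right."""
    if node is None:
        return
    in_order(node.left)   # everything smaller than node.key
    print(node.key)       # the "in-order" step
    in_order(node.right)  # everything bigger than node.key
```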
Everything smaller than the root's key has to be in its left subtree, and everything bigger than the root's key has to be in its right subtree. So in our concrete example, what this first recursive call does is print the key one and then the key two. And now, if you think about it, this is the perfect time to print out the key at the root, right? We want to print out all the keys in increasing order; we've already taken care of everything less than the root's key, and recursing on the right-hand side will take care of everything bigger than it. So in between the two recursive calls, and this is why it's called an in-order traversal, that's when we print out r's key. And clearly this works in our concrete example: the first recursive call prints out one and two, then it's the perfect time to print out three, and then the second recursive call prints out four and five. More generally, the recursive call on the right subtree prints out all of the keys bigger than the root's key in increasing order, again by the magic of recursion or induction. The fact that this pseudocode is correct, that the so-called in-order traversal indeed prints out the keys in increasing order, is a fairly straightforward proof by induction, very much in the spirit of the correctness proofs by induction for the divide-and-conquer algorithms we discussed earlier in the course.

So what about the running time of an in-order traversal? The claim is that the running time of this procedure is linear: O(n), where n is the number of keys in the search tree. The reason is that there's exactly one recursive call for each node of the tree, and constant work is done in each of those recursive calls. In a little more detail: what does the in-order traversal do? It prints out the keys in increasing order, and in particular, it prints out each key exactly once; each recursive call prints out exactly one key's value. So there are exactly n recursive calls, and beyond recursing, all a recursive call does is print one thing. So, n recursive calls, constant time for each, gives us a running time of O(n) overall.

In most data structures, deletion is the most difficult operation, and search trees are no exception. So let's get into it and talk about how deletion works; there are three different cases. The first order of business is to locate the node that has the key k, to locate the node we want to get rid of. For starters, say we're trying to delete the key two from our running example search tree; the first thing we need to do is figure out where it is. Now, there are three possibilities for the number of children a node in a search tree might have: it might have zero children, it might have one child, or it might have two children, and corresponding to those three cases, the deletion pseudocode will also have three cases. So let's start with the happy case, where the node has zero children, as when deleting the key two from our search tree. Then, of course, we can, without any reservations, just delete the node directly from the search tree. Nothing can go wrong; there are no children depending on that node. Then there's the medium-difficulty case, where the node containing k has one child. An example here would be if we wanted to delete the five from the search tree. The medium case is also not too bad: all you've got to do is splice out the node you want to delete.
Splicing out the node creates a hole in the tree, but then the deleted node's unique child assumes the previous position of the deleted node. I could make a nerdy joke about Shakespeare right here, but I'll refrain. For example, in our five-node search tree, let's say we haven't actually deleted the two, and we want to delete the five. When we take the five out of the tree, that leaves a hole, but then we just fill the position previously held by the five with its unique child, four. And if you think about it, that works just fine, in the sense that it preserves the search tree property. Remember, the search tree property says that everything in, say, a node's right subtree has to be bigger than that node's key. We've made four the new right child of three, but four and any children it might have were always part of three's right subtree, so all that stuff has got to be bigger than three, and there's no problem installing four, and possibly all of its descendants, as the right child of three. The search tree property is in fact retained.

The final, difficult case, then, is when the node being deleted has both of its children, has two children. In our running example with five nodes, this would only transpire if you wanted to delete the root, if you wanted to delete the key three from the tree. The problem, of course, is that you can try ripping this node out of the tree, but then there's this hole, and it's not clear that it will work to promote either child into that spot. You might stare at our example search tree and try to understand what would happen if you tried to bring the one up to be the root, or the five up to be the root. Problems would happen, that's what would happen. This is an interesting contrast to when we faced the same issue with heaps; because the heap property is, in some sense, less stringent, there we didn't have an issue. When we wanted to delete something with two children, we just promoted the smaller of the two children, assuming we were doing an extract-min operation. Here, we're going to have to work a little harder. In fact, there's going to be a really neat trick: we're going to do something that reduces the case of two children to the previously solved cases of zero or one children. So here's the sneaky way we identify a node to which we can apply either the zero-child or the one-child operation. What we're going to do is start from k's node and compute k's predecessor; remember, this is the next-smallest key in the tree. So, for example, the predecessor of the key three is two, the next-smallest key in the tree. In general, let's call k's predecessor l. Now, this might seem complicated: we're trying to implement one tree operation, deletion, and all of a sudden we're invoking a different tree operation, predecessor, which we covered a couple of slides ago. And to some extent you're right; deletion is a nontrivial operation. But it's not quite as bad as you might think, for the following reason: when we compute this predecessor, we're actually in the easy case of the predecessor operation. Remember, how do you compute a predecessor? Well, it depends. On what? On whether you have a non-empty left subtree or not. If you don't have a non-empty left subtree, that's when you have to do the harder thing and follow parent pointers upward until you find a key smaller than the one you started with.
But if you've got a left subtree, then it's easy: you just find the maximum of the left subtree, and that's got to be the predecessor. And remember, finding maxima is easy; all you have to do is follow right child pointers until you can't anymore. Now, what's cool is that we only bother with this predecessor computation in the case where k's node has both children, so we only have to do it in the case where it has a non-empty left subtree. So really, when we say compute k's predecessor l, all you've got to do is follow k's left child pointer, which is not null because the node has both children, and then follow right child pointers until you can't anymore; that's the predecessor.

Now, here's the fairly brilliant part of the way you implement deletion in a search tree: you swap these two keys, k and l. So, for example, in our running search tree, instead of the three at the root we would put a two there, and instead of the two at the leaf, we would put a three there. The first time you see this, it just strikes you as a little crazy, maybe even cheating, or simply disregarding the rules of search trees. And actually, it is: check out what happens to our example search tree. We swap the three and the two, and this is not a search tree anymore, right? We have this three which is in the two's left subtree, and three is bigger than two; that is not allowed, that is a violation of the search tree property. Oops. So how can we get away with this? The way we get away with it is that we're going to delete the three anyway, so we're going to wind up with a search tree at the end of the day. We may have messed up the search tree property a little bit, but we've swapped k into a position where it's really easy to get rid of. Why? Well, how did we compute k's predecessor l? Ultimately, that was the result of a maximum computation, which involves following right child pointers until you get stuck, and l was the place we got stuck. What does it mean to get stuck? It means l's right child pointer is null; l does not have two children, and in particular, it does not have a right child. Once we swap k into l's old position, k now does not have a right child. It may or may not have a left child; in the example on the right, it does not have a left child in its new position, but in general it might. It definitely doesn't have a right child, though, because that was a position at which a maximum computation got stuck. And if we want to delete a node that has only zero or one children, well, that we know how to do; we covered that on the last slide. Either you just delete it, which is what we do in the running example here, or, in the case where k's new node does have a left child, you do the splice-out operation: you rip out the node containing k, and the unique child of that node assumes its previous position. Now, an exercise which I'm not going to do here, but which I strongly encourage you to think through in the privacy of your own home, is that this deletion operation does in fact retain the search tree property. Roughly speaking, when you do the swap, you can violate the search tree property, as we saw in this example, but all of the violations involve the node you're about to delete. So once you delete that node, there are no other violations of the search tree property, and bingo, you're left with a search tree.
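Putting the three cases together, here's a minimal sketch of deletion, continuing with the hypothetical Python sketches above (search, maximum, and the Node class). Swapping the two nodes' key values is one concrete way to realize the swap trick just described:

```python
def delete(root, k):
    """Delete the node with key k; returns the (possibly new) root."""
    node = search(root, k)
    if node is None:
        return root  # key not present
    # Hard case: two children. Swap k with its predecessor l (the
    # maximum of the left subtree), then delete from l's old position,
    # which is guaranteed to have no right child.
    if node.left is not None and node.right is not None:
        pred = maximum(node.left)
        node.key, pred.key = pred.key, node.key
        node = pred
    # Now node has zero or one children: splice it out.
    child = node.left if node.left is not None else node.right
    if child is not None:
        child.parent = node.parent
    if node.parent is None:
        return child  # we deleted the root
    if node is node.parent.left:
        node.parent.left = child
    else:
        node.parent.right = child
    return root
```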
As for the running time, this time no prizes for guessing what it is: deletion is basically just one of these predecessor computations plus some pointer rewiring, so just like predecessor and search, it's governed by the height of the tree.

So let me say a little bit about the final two operations mentioned earlier: select and rank. Remember, select is just the selection problem: I give you an order statistic, like seventeen, and I want you to return the seventeenth-smallest key in the tree. Rank is: I give you a key value, and I want to know how many keys in the tree are less than or equal to that value. To implement these operations efficiently, we actually need one small new idea, which is to augment the binary search tree with additional information at each node. So now each node will contain not just a key, but also information about the tree itself. This idea is often called augmenting your data structure, and perhaps the most canonical augmentation of a search tree like this is to keep track, at each node, of not just the key value but also the population of the subtree rooted there. Let's call this size(x): the number of tree nodes in the subtree rooted at x. To make sure you know what I mean, let me just tell you what the size field should be for each of the five nodes in our running search tree example. Again, for each node we're asking how many nodes are in the subtree rooted at that node, or, equivalently, following child pointers from that node, how many different tree nodes can you reach? From the root, of course, you can reach everybody; everybody is in the subtree rooted at the root, so the size there is five. By contrast, if you start at the node one, you can get to the one itself or you can follow the right child pointer to get to the two, so at the one, the size is two; at the node with key value five, for the same reason, the size is two. At the two leaves, the subtree rooted at a leaf is just the leaf itself, so there, the size is one. There's an easy way to compute the size at a given node once you know the sizes of its two subtrees: if the node x has children y and z, then the nodes in the subtree rooted at x are those in the subtree rooted at y, those in the subtree rooted at z, and x itself, so size(x) = size(y) + size(z) + 1.

Now, in general, whenever you augment a data structure, and this is something we'll talk about again when we discuss red-black trees, you've got to pay the piper. The extra data you maintain might be useful for speeding up certain operations, but whenever you have operations that modify the tree, specifically insertion and deletion, you have to take care to keep that extra data valid, to keep it maintained. In the case of subtree sizes, they're quite straightforward to maintain under insertion and deletion without affecting the running time of those operations very much, but that's something you should always think about offline. For example, when you perform an insertion, remember how that works: you do, essentially, a search, following left and right child pointers down to the bottom of the tree until you hit a null pointer, and that's where you stick the new node.
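As a small sketch of this augmentation, here is the size formula and a size-maintaining insertion, assuming the hypothetical Node class from earlier is extended with a size field; the bookkeeping follows the trace-back-up idea described next:

```python
def size(node):
    """Size of the subtree rooted at node; a null subtree has size 0."""
    return node.size if node is not None else 0

# Invariant at every node x with children y and z (possibly null):
#   x.size == size(y) + size(z) + 1

def insert_augmented(root, k):
    """Insertion that also maintains subtree sizes: after attaching
    the new node, walk back up the insertion path and increment each
    ancestor's size by one."""
    parent, node = None, root
    while node is not None:
        parent = node
        node = node.left if k < node.key else node.right
    new = Node(k)
    new.size = 1        # a brand-new leaf is a subtree of one node
    new.parent = parent
    if parent is None:
        return new
    if k < parent.key:
        parent.left = new
    else:
        parent.right = new
    up = parent
    while up is not None:   # trace back up the insertion path
        up.size += 1        # each ancestor gained one new descendant
        up = up.parent
    return root
```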
Now, what you have to do is trace back up that path, through all of the ancestors of the node you just inserted, and increment their subtree sizes by one.

So let's wrap up this video by showing how to implement the selection procedure, given an order statistic i, in a search tree that's been augmented so that every node knows the size of the subtree rooted at it. Well, as always, you start at the beginning, which in a search tree is the root. Let's say the root x has children y and z; y or z could be null, and that's no problem, we just think of the size of a null node as being zero. Now, what does the search tree property say? It says the keys that are less than the key stored at x are precisely the ones in x's left subtree, and the keys in the tree that are bigger than the key at x are precisely the ones you'll find in x's right subtree. So suppose we're asked to find the seventeenth order statistic in the search tree, the seventeenth-smallest key stored in the tree. Where is it going to be? Where should we look? Well, it's going to depend on the structure of the tree, and in fact, it's going to depend on the subtree sizes. This is exactly why we're keeping track of them: so we can quickly make decisions about how to navigate through the tree. For a simple example, suppose that x's left subtree contains, say, 25 keys. Remember, y locally knows exactly the population of its subtree, so in constant time from x, we can figure out how many keys are in y's subtree; let's say it's 25. Now, by the defining property of search trees, these are the 25 smallest keys anywhere in the tree. Right? x is bigger than all of them, and everything in x's right subtree is bigger than all of them. So the 25 smallest order statistics are all in the subtree rooted at y; clearly that's where we should recurse, clearly that's where the answer lies. So we recurse on the subtree rooted at y, and we're again looking for the seventeenth order statistic in this new, smaller search tree. On the other hand, suppose that when we start at x and ask y how many nodes are in its subtree, y has locally stored the number twelve, so there are only twelve things in x's left subtree. Well, x itself is bigger than all of them, so x is going to be the thirteenth-smallest element in the tree, and everything else is in the right subtree. So, in particular, the seventeenth order statistic is going to be in the right subtree, and we're going to recurse on the right subtree. But now, what are we looking for? We're not looking for the seventeenth order statistic anymore: the twelve smallest things are all in x's left subtree, x itself is the thirteenth smallest, so we're looking for the fourth smallest of what remains. The recursion is very much along the lines of what we did in the divide-and-conquer selection algorithms earlier in the course. To fill in some more details, let a denote the subtree size at y; if x happens to have no left child, the point is that a is zero. The super-lucky case is when there are exactly i - 1 nodes in the left subtree: that means the root x is itself the ith order statistic, since it's bigger than everything in its left subtree and smaller than everything in its right subtree. But in the general case, we're going to recurse on either the left subtree or the right subtree.
We recurse on the left subtree when its population is large enough to guarantee that it encompasses the ith order statistic, and that happens exactly when its size a is at least i; that's because the left subtree contains the smallest keys anywhere in the search tree. And in the final case, when the left subtree is so small that not only does it not contain the ith order statistic, but x itself is also too small to be the ith order statistic, we recurse on the right subtree, knowing that we've thrown away the a + 1 smallest key values anywhere in the original tree, so there we look for the (i - a - 1)th order statistic. The correctness of this procedure follows pretty much exactly the same inductive argument as for the selection algorithms we discussed earlier: in effect, the root of the search tree acts as a pivot element, with everything in the left subtree less than the root and everything in the right subtree greater than the root, and that's why the recursion is correct. As for the running time, I hope it's evident from the pseudocode that we do constant work each time we recurse. And how many times can we recurse? We keep moving down the tree, and the maximum number of times we can move down the tree is proportional to the height of the tree, so the running time, once again, is proportional to the height. So that's the select operation. There's an analogous way to implement the rank operation. Remember, this is where you're given a key value and you want to count up the number of stored keys that are less than or equal to that target value. Again, you use these augmented search trees, and again, you can get running time proportional to the height, and I encourage you to think through the details of how to implement rank offline.
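To make the recursion concrete, here's a minimal sketch of select on the size-augmented tree, consistent with the hypothetical Python sketches above (rank is left as the offline exercise the lecture suggests):

```python
def select(x, i):
    """Return the node holding the ith-smallest key in the
    size-augmented subtree rooted at x; assumes 1 <= i <= size(x)."""
    a = size(x.left)  # population of the left subtree (0 if null)
    if a == i - 1:
        return x      # super-lucky case: x is the ith order statistic
    elif a >= i:
        return select(x.left, i)           # answer is among the a smallest
    else:
        return select(x.right, i - a - 1)  # discard the a + 1 smallest
```

For instance, with 25 keys in the root's left subtree, select(root, 17) recurses left still looking for the seventeenth smallest; with twelve keys there, it recurses right looking for the fourth smallest, matching the walkthrough above.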