So, in this video, we'll go over the
basics behind implementing binary search
trees. We are not going to focus on the
balance aspect in this video that will be
discussed in later videos and we are going
to talk about things which are true for
binary search trees on general, balanced
or otherwise. But let's just recall, you
know, why are we doing this, you know,
what is the raison d'être of this data
structure, the balance version of the
binary search tree and basically, its a
dynamic version of a sorted array. So,
that's pretty much everything you can do
on a sorted array, maybe in slightly more
expensive time. They are still
really fast but in addition to this
dynamic, it accommodates insertions and
deletions. So, remember, if you want to
keep a sorted array data structure, every
time you insert, every time you delete,
you're probably going to wind up paying a
linear factor which is way too
expensive in most applications. By
contrast with the search tree, a balanced
version, you can insert and delete a
logarithmic time in the number of keys in
the tree. And moreover, you can do stuff
like search in logarithmic time, no more
expensive than binary search on a sorted
array and also you can sort of say the
selection problem in the special cases,
the minimum or maximum. Okay, it's not
constant time like in a sorted array but
still logarithmic pretty good and in
addition, you can print out all of the
keys from smallest to largest and in
linear time, constant time per element
just like you could with the linear scan
through a sorted array. So, that's what
they're good for. Everything a sorted
array can do more or less plus insertions
and deletions everything in logarithmic
time. So, how are search trees organized?
And again, what I'm going to say in the
rest of this video is true both for
balanced and unbalanced search trees.
We're going to worry about the balancing
aspect in the later videos. Alright, so,
let me tell you the key ingredients in a
binary search tree. Let me also just draw
a simple cartoon example in the upper
right part of the slide. So, this one to
one correspondence between nodes of the
tree and keys that are being stored. And
as usual in our data structure discussions
we're going to act as if the only thing
that we care about, the only thing that
exists at each node is this key when
generally, this associated data that you
really care about. So, each node in the
tree will generally contain both the key
plus a pointer to some data structure that
has more information. Maybe the key is the
employee ID number, and then there's a
pointer to lots of other information about
that employee. Now, in addition to the
nodes, you have to have links amongst the
nodes and there's a lot of different ways
to do the exact implementation of the
pointers that connect the node of the tree
together but the video I'm just going to
keep is straightforward as possible and
we're just going to assume that in each
node, there's three pointers. One to a
left child, another one to the right child
and then the third pointer which points to
the parent. Now, of course, some of these
pointers can be null and in fact in the
five node binary search tree I've drawn on
the right for each of the five nodes, at
least one of these three pointers is null.
So, for example, for the node with key one
it has a null left child pointer, there was
no left child. It's the right child
pointer going to point to the node with
key two and the parent pointer was going
to a node that has key three. Similarly
three is going to have a null parent pointer
and the root node in this case, three is a
unique node but has a null parent pointer.
Here the node with key value three, of
course, has a left child pointer points to
one and has a right child pointer that
points to five. Now, here is the most
fundamental property of search trees.
Let's just go ahead and call it the Search
Tree Property. So, the search tree
property asserts the following condition
at every single node of the search tree.
If the node has some key value then all of
the keys stored in the left subtree should
be less than that key. And similarly, all
of the keys stored in the right subtree
should be bigger than that key. So, if we
have some node who's stored key value is x
and this is somewhere, you know, say deep
in the middle of the tree so upward we
think of as being toward the root. And
then if we think about all the nodes that
are reachable, after following the left
child pointer from x, that's the left
subtree. And similarly, the right subtree
being everything reachable via the right
child pointer from x, it should be the
case that all keys in the left subtree are
less than x and all keys in the right
subtree are bigger than x. And again, I
want to emphasize this property holds not
just to the root but at every single node
in the tree. I've defined the search to a
property assuming that all of the keys are
distinct, that's why I wrote strictly less
than in the left sub tree and strictly
bigger than in the right subtree. But
search trees can easily accommodate
duplicate keys as well. We just have to
have some convention about how you handle
ties. So, for example, you could say that
everything in the left subtree is less
than or equal to the key at that node and
then everything in the right subtree
should be strictly bigger than that node.
That works fine as well. So, if this is
the first time you've ever heard of the
search tree property, maybe at first blush
it seems a little arbitrary. It seems like
I pulled it out of thin air but actually,
you would have reversed engineer this property
if you sat down and thought about what
property would make search really easy in
a data structure. The point is, the search
tree property tells you exactly where to
look for some given key. So, looking ahead
a little bit, stealing my fire from a
slide to come, suppose you were looking
for say, a key 23, and you started the
root and the root is seventeen. The point
of the search tree property is you know
where 23 has to be. If the root is
seventeen, you're looking to 23, if it's
in the tree, no way is it in the left
subtree, it's got to be in the right
subtree. So, you can just follow the right
child pointer and forget about the left
subtree for the rest of the search. This
is very much in the spirit of binary
search where you start in the middle of
the array and again, you compare what
you're looking for to what's in the middle
and either way, you can recurse on one of
the two sides forgetting forevermore about
the other half of the array and that's
exactly the point of the search tree
property. We're going to have to search
from root on down, the search tree
property guarantees we have a unique
direction to go next and we never have to
worry about any of the stuff that we don't
see. We can also draw a very loose analogy
with our discussion of heaps and may
recall heaps were also logically, we
thought of them as a tree even though they
are implemented as an array. And heaps
have some heap property and if you go back
to review the heap property, you'll find
that this is not the same thing as the
search three property. These are two
different properties and that's going to
trying to make different things easy. Back
when we talk about heaps, the property was
that this is for the extract min version.
Parents always have to be smaller than
their children. That's different than the
search tree property which says stuff to the
left, that's smaller than you, stuff to
the right is bigger than you. And heaps,
we have the heap property so that
identifying the minimum value was trivial.
It was guaranteed to be at the root. Heaps
are designed so that you can find the
minimum easily. Search trees are, are
defined so that you can search easily
that's why, you have this different search
tree property. If you want to get smaller,
you go left. If you want to get bigger you
go right. One point that's important to
understand early, and this will be
particularly relevant once we did, once we
try to enforce balancing in our subsequent
videos is that, for a given set of keys,
you can have a lot of different search
trees. On the previous slide , I drew one
search tree containing the key values one,
two, three, four, five. Let me redraw that
exact same search tree here. If you stare
to this tree a little while you'll agree
that in fact that every single node of
this tree, all of the things in the left
subtree are smaller, all of the things in
the right subtree are bigger. However, let
me show you another valid binary search
tree with the exact same sets of keys. So,
in the second search three, the root is
five, the maximum value. And everybody has
no right children, only the left children
are populated and that goes five, four,
three, two, one in descending order. If
you check here again, it has the property
that at every node, everything in the left
subtree is smaller. Everything in the
right subtree, in this case, empty, is
bigger. So, extrapolating from these two
cartoon examples, we surmised that for
a given set of n keys, search trees that
contain these keys could vary in height
anywhere from the best case scenario of a
perfectly balance binary tree which just
going to have logarithmic height to the
worst case of one of these link list like
chain which is going to be linear in the
number of keys n. And so just to remind
you the height of a search tree which is
also sometimes called the depth is just
the longest number of hops it ever take to
get to from a root to a leaf. So, in the
first search tree, here the height is two
and then the second search tree, the
height is four. If the search tree is
perfectly arranged with the number of
nodes essentially doubling at every level,
then the depth is you're going to run out
of nodes around the depth of log2n. And in
general, if you have a chain of n keys
that that's going to be n - 1 but we'll
just call it n amongst friends. So, now
that we understand the basic structure of
binary search trees, we can actually talk
about how to implement all of the
operations that they support. So, as we go
through most of the supported operations
one at a time, I'm just going to give you
a really high level description. It should
be enough for you to code up on
implementation if you want or as usual, if
you want more details or actual working
code, you can check on the web or in one
of the number of good programming or
algorithms textbooks. So, let's start with
really the primary operation which is
search. Searching, we've really already
discussed how it's done when we discuss
the search tree property. Again, the
search tree property makes it obvious how
to look for something in a search tree.
Pretty much you just follow your nose
you have no other choice. So, you started
the root, it's the obvious place to start.
If you're lucky, the root is what you are
looking for and then you stop and then you
return to root. More likely, the root is
either bigger than or less than the key
that you're looking for. Now, if the key
is smaller, the key you are looking for is
smaller than the key of the root, where
you're going to look? Well, the search
tree property says, if it's in the tree,
it's got to be in the left subtree so you
follow the left sub child pointer. If the
key you're looking for is bigger than the
key at the root, where is it got to be?
Got to be in the right subtree. So, you're
just going to recurse on the right
subtree. So, in this example, if you're
searching for, say the key two, obviously
you're going to go left from the root. If
you're searching for the key four,
obviously you're going to go right from
the root. So, how can the search
terminate? Well, it can terminate in one
of two ways. First of all, you might find
what you're looking for so in this
example, if you search for four, you're
going to traverse to right child pointer
then a left child pointer and then boom,
you're at the four and you return
successfully. The other way the search can
terminate is with a null pointer. So, in
this example, suppose you were looking for
a node with key six, what would happen?
Well, you start at the root, three is too
small so you go to the right. You get to
five, five is still too small cuz you're
looking for six so you try to go right but
the right child pointer is null. And that
means six is not in the tree. If there was
anywhere in the tree, it had to be to the
right of the three, it had to be to the
right of the five but you tried it and you
ran on the pointers so the six isn't
there. And you return correctly with an
unsuccessful search. Next, let's discuss
the insert operation which really is just
a simple piggy backing on the search that
we just described. So, for simplicity the
first think about the case where there are
no duplicate keys. The first thing to do
on this insertion is search for the key k.
Now, because there are no duplicates, this
search will not succeed. This key k is not
yet in the tree. So, for example, in the
picture on the right, we might think about
trying to insert the key six. What's going
to happen when we search for six, we
follow a right child pointer. We go from
three to five and then we try to spot
another one and make it stuck. There's a
null pointer going to the right of five.
Then when this unsuccessful search
terminates at a null pointer, we just rewire
that pointer to point to a node with this
new key k. So, if you want to permit
duplicates from the data structure, you
got to tweak the code and insert a little
bit but really barely at all. You just
need some convention for handling the case
when you do in counter the key that we are
about to insert. So, for example, if the
current note has the key equal to the one
you're inserting, you could have the
convention that you always continue on the
left subtree and then you continue the
search as usual again, eventually
terminating at a null pointer and you stick
the new inserted node you rewire to null
pointer to point to it. One good exercise
for you to think through which I'm not
going to say more about here is that when
you insert a new node, you retain the
search tree property. That is if you start
with the search tree, you start within
tree where at every node stuff to the left
is smaller, the stuff to the right is
bigger. You insert something and you
follow this procedure. You will still have
the search tree property after this new
node has been inserted. That's something
for you to think through.