Sunday, March 20, 2016
On 7:36:00 AM by your education in Algorithm
The Balanced Binary Search Tree. Like our discussion of
other data structures, we'll begin with the what. That is, we'll take the
client's perspective and ask what operations are supported by this data structure
and what you can actually use it for. Then we'll move on to the how and the why.
We'll peer under the hood of the data structure and look at how it's actually
implemented, and then use that understanding of the implementation to see why the
operations have the running times that they do. So what is a Balanced Binary
Search Tree good for? Well, I recommend thinking about it as a dynamic version
of a sorted array. That is, if you have data stored in a Balanced Binary Search
Tree, you can do pretty much anything with the data that you could if it were in
a static sorted array. But in addition, the data structure can accommodate
insertions and deletions, so it can handle a set of data that changes
over time. So to motivate the operations that a Balanced Binary Search Tree
supports, let's just start with the sorted array and look at some of the things
you can easily do with data that happens to be stored in such a way. So let's
think about an array that holds numerical data, although, as we've said,
in data structures there is usually other data associated with each key that you
actually care about, and the numbers are just unique identifiers for the
records. So these might be employee ID numbers, social security numbers,
packet ID numbers in a network context, etcetera. So what are some things that
are easy to do given that your data is stored as a sorted array? Well, a bunch
of things. First of all, you can search, and recall that searching in a sorted
array is generally done using binary search. This is how we used to look up
phone numbers when we had physical phone books. You'd start in the middle of
the phone book; if the name you were looking for was less than the midpoint,
you'd recurse on the left-hand side, otherwise you'd recurse on the right-hand
side. As we discussed back in the Master Method lectures long ago, this is going
to run in logarithmic time. Roughly speaking, every time you recurse, you've thrown
out half of the array, so you're guaranteed to terminate within a logarithmic
number of iterations. So binary search gives logarithmic search time. Something
else we discussed in previous lectures is the selection problem. Previously,
we discussed this in the much harder context of unsorted arrays. Remember, in the
selection problem, in addition to an array you're given an order statistic. So, if
your target order statistic is seventeen, that means you're looking
for the seventeenth smallest number that's stored in the array. In previous
lectures, we worked very hard to get a linear time algorithm for this problem
in unsorted arrays. Now, in a sorted array, say you want to know the seventeenth
smallest element. Pretty easy problem: just return whatever element
happens to be in the seventeenth position of the array. Since the array is
sorted, that's where it is. So with the array already sorted, you can solve
the selection problem in constant time. Of course, two special cases of the
selection problem are finding the minimum element of the array, which is just
the order statistic problem with i = 1, and finding the maximum element, which
is just i = n. These correspond to returning the element that's in the first
position and the last position of the array, respectively. Well, let's do some
more brainstorming. What other operations could we implement on a sorted array?
Well, here are a couple more. There are operations called the Predecessor and
Successor operations. The way these work is, you start with one element.
So, say you start with a pointer to the 23, and you want to know where in this
array the next smaller element is. That's the predecessor query, and the successor
operation returns the next larger element in the array. So the predecessor of
the 23 is the 17, and the successor of the 23 would be the 30. And again, in
a sorted array, these are trivial, right? You just know that the predecessor is
one position back in the array and the successor is one position forward. So given
a pointer to the 23, you can return the 17 or the 30 in constant time. What
else? Well, how about the rank operation? We haven't discussed this
operation in the past. What rank does is ask how many keys stored in the
data structure are less than or equal to a given key. So for example, the rank
of 23 would be equal to 6, because 6 of the 8 elements in the array are less
than or equal to 23. And if you think about
it, implementing the rank operation is really no harder than implementing
search. All you do is search for the given key, and wherever the search
terminates in the array, you just look at the position and boom,
that's the rank of that element. So for example, if you do a binary search for
23 and, when it terminates, you discover that it's there in position number
six, then you know the rank is six. If you do an unsuccessful search, say you search
for 21, well then you get stuck in between the 17 and the 23, and at that point
you can conclude that the rank of 21 in this array is five. Let me just wrap up
the list with a final operation which is trivial to implement in the sorted
array. Namely, you can output, or print, the stored keys in sorted order,
let's say from smallest to largest. And naturally, all you do here is a single scan
from left to right through the array, outputting whatever element you see next.
The time required is constant per element, or linear overall. So that's quite an impressive
list of supported operations. Could you really be so greedy as to want still
more from our data structure? Well yeah, certainly. We definitely want more than
just what we have on the slide. The reason is that these are operations that operate
on a static data set, one which is not changing over time. But the world in general is
dynamic. For example, if you are running a company and keeping track of the employees,
sometimes you get new employees, sometimes employees leave. That is, we want a
data structure that supports not only these kinds of operations but also
insertions and deletions. Now of course it's not that it's impossible to
implement insert or delete in a sorted array, it's just that they're going to
run way too slowly. In general, you have to copy over a linear amount of stuff on
an insertion or deletion if you want to maintain the sorted array property. So
this linear time performance for insertion and deletion is unacceptable unless
you barely ever do those operations.
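To make the list above concrete, here is a minimal sketch of the sorted array operations in Python, using the standard bisect module. The function names are made up for illustration, and so are the array contents: only the 17, the 23, the 30 and the size of eight come from the lecture. The "pointer" to an element is modeled as a 0-based index.

```python
from bisect import bisect_left, bisect_right, insort

# Hypothetical sorted array: only 17, 23, 30 and the length 8 come from the text.
keys = [3, 6, 10, 11, 17, 23, 30, 36]

def search(a, k):
    """Binary search: 0-based index of k, or None if k is absent. O(log n)."""
    i = bisect_left(a, k)
    return i if i < len(a) and a[i] == k else None

def select(a, i):
    """Return the i-th smallest key, counting from 1. O(1)."""
    return a[i - 1]

def predecessor(a, pos):
    """Key one position before index pos, or None at the left end. O(1)."""
    return a[pos - 1] if pos > 0 else None

def successor(a, pos):
    """Key one position after index pos, or None at the right end. O(1)."""
    return a[pos + 1] if pos + 1 < len(a) else None

def rank(a, k):
    """Number of stored keys less than or equal to k. O(log n)."""
    return bisect_right(a, k)

def output_sorted(a):
    """Single left-to-right scan. O(n)."""
    return list(a)

# Insertion keeps the array sorted, but shifting the later elements
# takes linear time in the worst case.
updated = list(keys)
insort(updated, 21)
```

With this array, search(keys, 23) finds index 5, which is position six counting from 1, and rank(keys, 21) is 5 even though 21 is not stored. The insort call is the slow part: finding the slot is logarithmic, but making room for the new key is not.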
So, the raison d'être of the Balanced Binary Search Tree is to
implement this exact same set of operations, just as rich as the one supported by
a sorted array, but in addition, insertions and deletions. Now, a few of these
operations won't be quite as fast. We have to give up a little bit: instead of
constant time, some will run in logarithmic time. But we still get logarithmic time
for all of these operations and linear time
for outputting the elements in sorted order. Plus, we'll be able to insert and
delete in logarithmic time. So let me just spell that out in a little more
detail. A Balanced Binary Search Tree will act like a sorted array, plus it
will have fast, meaning logarithmic time,
inserts and deletes. So let's go ahead and spell out all of those operations.
So search is going to run in O(log n) time, just like before. Select runs in
constant time in a sorted array and here it's going to take logarithmic time, so we'll
give up a little bit on the selection problem, but we'll still be able to do it
quite quickly. Even in the special cases of finding the minimum or finding the
maximum in our data structure, we're going to need logarithmic time in
general. Same thing for finding predecessors and successors: they're
no longer constant time, they go to logarithmic. Rank took logarithmic time
even in the sorted array version, and that will remain logarithmic here.
As we'll see, we lose essentially nothing over the sorted array if we want to
output the key values in sorted order, say from smallest to largest. And crucially,
we have two more fast operations compared to the sorted array data
structure. We can insert stuff, so if you hire a new employee, you can insert them
into your data structure. If an employee decides to leave, you can remove them
from the data structure. You do not have to spend linear time like you did for the sorted
array; you only have to spend logarithmic time, where as always n is the number
of keys being stored in the data structure. So the key takeaway here is that,
if you have data and it has keys which come from a totally ordered set like,
say, numeric keys, then a Balanced Binary Search Tree supports a very rich collection
of operations. So if you anticipate doing a lot of different processing using
the ordering information of all of these keys, then you really might want to
consider a Balanced Binary Search Tree to maintain them. One thing to keep in
mind, though, is that we have seen a couple of other data structures which don't
do quite as much as balanced binary search trees, but what they do, they do better.
We just discussed one on the last slide: the sorted array. So, if you
have a static data set, you don't need inserts and deletes. Well then, by all means,
don't bother with a Balanced Binary Search Tree; just use a sorted array, because
it will do everything super fast. But we also saw two dynamic data structures
which don't do as much, but what they do, they do very well. So, we
saw the heap. What the heap is good for is that it's just as dynamic as a search
tree. It allows insertions and deletions, both in logarithmic time. And in addition,
it keeps track of the minimum element or the maximum element. Remember, in a
heap, you can choose whether you want to keep track of the minimum or keep
track of the maximum, but unlike a search tree, a heap does not
simultaneously keep track of the minimum and the maximum. So if you just need
those three operations, insertions, deletions and remembering the smallest, as
would be the case, for example, in a priority queue or scheduling application
as discussed in the heap videos, then a Balanced Binary Search Tree is overkill. You
might want to consider a heap instead. In fact, the benefits of a heap don't
show up in the big O notation; here both have logarithmic operation time, but the
constant factors, both in space and time, are going to be better with a heap than
with a Balanced Binary Search Tree. The other dynamic data structure that we discussed
is a hash table. And what hash tables are really, really good at is handling
insertions and searches, that is, lookups. Sometimes, depending on the
implementation, they handle deletions really
well also. So, if you don't actually need to remember things like minima,
maxima or ordering information on the keys, and you just have to remember
what's there and what's not, then the data structure of choice is definitely the
hash table, not the Balanced Binary Search Tree. Again, the Balanced Binary
Search Tree would be fine and would give you logarithmic lookup time, but it's
kind of overkill for the problem. All you need is fast lookups. A hash table,
recall, will give you constant time lookups. So that will be a noticeable win
over the Balanced Binary Search Tree. But if you want a very rich set of
operations for processing your data, then the Balanced Binary Search Tree could
be the optimal data structure for your needs.
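As a rough illustration of the two more specialized alternatives, here is a small sketch that uses Python's standard heapq module as the heap and a built-in set as the hash table. The keys are made-up values, not anything from the lecture, and this only shows which questions each structure can answer quickly.

```python
import heapq

# Heap used as a priority queue; the keys are made-up illustration values.
heap = []
for key in [23, 3, 30, 17]:
    heapq.heappush(heap, key)      # insert, O(log n)

smallest = heap[0]                 # the minimum always sits at index 0
removed = heapq.heappop(heap)      # delete the minimum, O(log n)
# heapq tracks only the minimum; the maximum is not at any fixed index.

# Hash table used for membership only; a Python set is hash-based.
employee_ids = {23, 3, 30, 17}
employee_ids.add(21)               # insert
employee_ids.discard(30)           # delete
has_21 = 21 in employee_ids        # lookup, expected constant time
has_30 = 30 in employee_ids
```

Neither structure can answer a rank, predecessor or successor query without scanning everything, which is exactly the ordering information a Balanced Binary Search Tree keeps available.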