home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
C/C++ Interactive Guide
/
c-cplusplus-interactive-guide.iso
/
c_ref
/
csource3
/
155_01
/
readme
< prev
next >
Wrap
Text File
|
1987-10-09
|
16KB
|
396 lines
B-Tree Library
Developed by
Ray Swartz
The programming routines that make up this b-tree library
are hereby put into the public domain. Any use, either
private or commercial, is allowed upon the condition that
a notice appear within the program documentation or on the terminal
screen (during execution of the program)
that these b-tree routines were developed by Ray Swartz. Such notice
must be clear and obvious, appearing at the beginning of written documentation
or displayed on the terminal screen during the initial phases of the program.
Use of these routines without proper attribution is not allowed.
This violates the spirit within which these routines were
developed and put into the public domain. The author reserves the right
to charge royalties of 10% of the gross revenue of any computer program
using these routines WITHOUT the required notice.
These routines create and manage a balanced binary tree.
The advantages of these trees are well-known and will not be described
here. However, for an excellent discussion of this, and a number
of related topics, see D. Knuth, The Art of Computer Programming -
Volume 3 (Sorting and Searching) pages 422 - 471. This book is
published by Addison-Wesley, ISBN: 0-201-03803-X.
The keyfile created to do this has two distinct parts.
The first part holds the header information on the keyfile itself.
This takes up the first 19 (DATA_LENGTH) bytes of the file and
are arranged as follows:
1) A Tilde (~) - used to denote a keyfile
2) The length of this file's key field (2-byte integer (ascii))
3) Node number of next available key node (5-byte integer (ascii))
4) Node number of first node in list (5-byte integer (ascii))
5) Number of non-deleted nodes in the list (5-byte integer (ascii))
6) A Tilde
This information is stored in the structure KEYINFO which is declared
in the file BTREE.H.
The second part of a keyfile is the key nodes. Each node holds:
1) A delete flag (YES if deleted, NO if active) (2 bytes)
2) A balance factor (-1 if unbalanced to left, 0 if balanced,
1 if unbalanced to right). (2 bytes)
3) The node number just to the left of this one. (5 bytes)
4) The node number just to the right of this one. (5 bytes)
5) The record number of the data containing this key. (5 bytes)
6) The key this node references (length depends on keyfile initialization)
This information is stored in the structure NODE which is declared in the
file BTREE.H.
Before information can be stored in a keyfile, the file must
be initialized by the TREEINIT program included with this library.
The user must declare the length of the keys this file will hold.
This length is a maximum that must remain constant. Changing the
size of the key requires rebuilding the tree stored in the keyfile.
No provisions are made for managing the data records referenced by
these keys. This is the responsibility of the programmer using
these routines.
The heart of this system is the INSERT routine which
not only inserts new keys into the tree but maintains the tree's
balance as well. This routine, in it entirety, was implemented by
following the algorithm in Knuth, The Art of Computer Programming
Volume 3, Pages 455 - 457. No attempt should be made to modify
the INSERT function without a thorough understanding of this material.
The b-tree library consists of 3 program files:
BTREE1.C
BTREE0.C
TERMCTRL.C.
The file BTREE1.C contains the INSERT, DELETE_KEY, GET_NEXT,
GET_PREVIOUS, and FIND_KEY functions. BTREE0.C contains support
functions called by BTREE1.C. TERMCTRL.C contains all the routines
that use functions that directly manipulate the terminal.
Generally, these terminal control functions are not required for
proper usage, however the routines will not work without them.
They are included for 2 reasons. First, when the
routines were written, I integrated them into the program. Removing them
would be a pain. Second, once the appropriate changes are made to the
CURSOR routine (addressing the cursor) and the CLEAR_END_OF_PAGE macro
(in BTREE.H), an elementary screen driver is enabled for whatever terminal
is used.
To create a linking library, these files should be organized so
that BTREE1 can reference routines in BTREE0 and TERMCTRL and those
in BTREE0 can reference routines in TERMCTRL.
Inserting keys into a balanced binary tree is a straight
forward process. Deleting keys from such a tree and maintaining
the balance is another thing entirely. In fact, deleting tree nodes
may require several adjustments to maintain tree balance. In an
attempt to avoid all the complexities introduced by performing
true node deletion, I opted for a conceptually simpler approach.
Instead of actually deleting nodes from the tree, I mark them as
deleted without removing them from the tree. In this way, the tree's
structure never changes except during insertion. A "deleted" node
remains in the tree except it can't be "found."
Like all shortcuts, this one has a cost. As more and more
nodes are deleted, the search to find an "active" node takes longer
since an increase number of nodes are being read but "ignored."
However, searching a binary tree is quite efficient. Even if 10%
of a tree was deleted, the search would not take much longer in
real time. The obvious solution is to re-build the tree when the
performance slows unacceptability due to an overabundance of deleted nodes.
This set of routines uses three tricks. The first one is relying on
someone else's algorithm to design the insertion function. The second
one is to mark deleted nodes in place of actually removing them from the
tree. The last one is
the use of a stack to find the next and previous nodes from the current
one. I call this a trick because how it works is not obvious.
The problem is this: Given that we have searched for and found
a specific node, how do we get to the key that is just below
(previous) or just above (next) this one? Consider the balanced
binary tree shown below that has 15 nodes - all active:
8
/ \
/ \
4 12
/ \ / \
2 6 10 14
/ \ / \ / \ / \
1 3 5 7 9 11 13 15
Suppose node 12 is the current node and we want to find the next larger
key. That key is the one at the end of the left branch of the node
one to the right of the current node - in this case, node 13. The
exact opposite is true of the key that is the next smaller or previous
node. It is the one at the end of the right branch of the node
one to the left of the current node - in this case node 11. This is the
easy possibility. What happens if we go the the next one - making
node 13 the current node - and now want to know what the next node is?
The problem is that node 13 is the end of a tree branch and we
can't go further "down" the tree. Instead, we must go "up" a node
to 14. How do we keep treck of which node is up one from here?
We do this by building a stack as we look through the tree. In
reality, we use two identical stacks - a right one and a left one.
In the right stack we "push" values each time we move down and to the
right in the tree. We do the same on the left stack when we move
down and to the left. As an example, to search for node 13, we begin
at the head of the list (node 8) and move to the right (since 13 > 8).
After moving right, we push 8 onto the right stack. At 12 we again move
right (13 > 12) to node 14. We then push 12 onto the right stack.
At 14 we move left (13 < 14). This time we push 14 onto the left stack
since we moved left here. The search now terminates since we are at node
13. The stacks look like:
Left Right
14 8
12