DBZ(3Z) MISC. REFERENCE MANUAL PAGES DBZ(3Z)
NAME
dbminit, fetch, store, dbmclose - somewhat dbm-compatible
database routines
dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
dbzsize, dbzincore, dbzdebug - database routines
SYNOPSIS
#include <dbz.h>

dbminit(base)
char *base;

datum
fetch(key)
datum key;

store(key, value)
datum key;
datum value;

dbmclose()

dbzfresh(base, size, fieldsep, cmap, tagmask)
char *base;
long size;
int fieldsep;
int cmap;
long tagmask;

dbzagain(base, oldbase)
char *base;
char *oldbase;

datum
dbzfetch(key)
datum key;

dbzstore(key, value)
datum key;
datum value;

dbzsync()

long
dbzsize(nentries)
long nentries;

dbzincore(newvalue)

dbzdebug(newvalue)
Sun Microsystems Last change: 13 Oct 1990 1
DESCRIPTION
These functions provide an indexing system for rapid random
access to a text file (the base file). Subject to certain
constraints, they are call-compatible with dbm(3), although
they also provide some extensions. (Note that they are not
file-compatible with dbm or any variant thereof.)
In principle, dbz stores key-value pairs, where both key and
value are arbitrary sequences of bytes, specified to the
functions by values of type datum, typedefed in the header
file to be a structure with members dptr (a value of type
char * pointing to the bytes) and dsize (a value of type int
indicating how long the byte sequence is).
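As a rough illustration of the datum layout just described, the sketch below declares a stand-in structure and wraps a C string as a key. The names example_datum and datum_from_string are hypothetical and exist only for this example; real code would include dbz.h and use its datum typedef. The sketch uses ANSI prototypes rather than the older declaration style of the synopsis.

```c
#include <string.h>

/* Stand-in for the datum type described above; the real one comes
 * from dbz.h.  The name example_datum is hypothetical. */
typedef struct {
    char *dptr;   /* points to the bytes */
    int   dsize;  /* number of bytes in the sequence */
} example_datum;

/* Wrap a C string as a datum.  The terminating NUL is counted as part
 * of the sequence, so the key carries its own terminator. */
static example_datum datum_from_string(char *s)
{
    example_datum d;

    d.dptr = s;
    d.dsize = (int)strlen(s) + 1;
    return d;
}
```

Counting the NUL is a choice made for this sketch; it ties in with the rule on key terminators discussed further on.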
In practice, dbz is more restricted than dbm. A dbz database
must be an index into a base file, with the database
values being fseek(3) offsets into the base file. Each such
value must ``point to'' a place in the base file where the
corresponding key sequence is found. A key can be no longer
than DBZMAXKEY (a constant defined in the header file)
bytes. No key can be an initial subsequence of another,
which in most applications requires that keys be either
bracketed or terminated in some way (see the discussion of
the fieldsep parameter of dbzfresh, below, for a fine point
on terminators).
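The initial-subsequence rule can be checked with a few lines of C. The helper below is a hypothetical illustration, not part of dbz; it shows why a terminated or bracketed key avoids the problem.

```c
#include <string.h>

/* Return 1 if the alen bytes at a are an initial subsequence (prefix)
 * of the blen bytes at b, 0 otherwise.  Equal sequences count as
 * prefixes of each other. */
static int is_initial_subsequence(const char *a, int alen,
                                  const char *b, int blen)
{
    if (alen > blen)
        return 0;
    return memcmp(a, b, (size_t)alen) == 0;
}
```

Taken as bare bytes, "abc" (3 bytes) is a prefix of "abcd" (4 bytes), so the two could not both be keys. With the terminating NUL included (4 and 5 bytes), or with brackets as in "<a>" and "<ab>", neither key is a prefix of the other.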
Dbminit opens a database, an index into the base file base,
consisting of files base.dir and base.pag which must already
exist. (If the database is new, they should be zero-length
files.) Subsequent accesses go to that database until
dbmclose is called to close the database. The base file
need not exist at the time of the dbminit, but it must exist
before accesses are attempted.
Fetch searches the database for the specified key, returning
the corresponding value if any. Store stores the key-value
pair in the database. Store will fail unless the database
files are writeable. See below for a complication arising
from case mapping.
Dbzfresh is a variant of dbminit for creating a new database
with more control over details. Unlike for dbminit, the
database files need not exist: they will be created if
necessary, and truncated in any case.
Dbzfresh's size parameter specifies the size of the first
hash table within the database, in key-value pairs. Performance
will be best if size is a prime number and the number
of key-value pairs stored in the database does not exceed
about 75% of size. (The dbzsize function, given the
expected number of key-value pairs, will suggest a database
size that meets these criteria.) Assuming that an fseek
offset is 4 bytes, the .pag file will be 4*size bytes (the
.dir file is tiny and roughly constant in size) until the
number of key-value pairs exceeds about 80% of size. (Nothing
awful will happen if the database grows beyond 100% of
size, but accesses will slow down somewhat and the .pag file
will grow somewhat.)
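The sizing criteria above (a prime size, filled to at most about 75%) can be sketched as follows. This is not the algorithm dbzsize uses, which this page does not describe; suggest_size, is_prime and pag_bytes are hypothetical names, and the 4-byte offset is the same assumption the text makes.

```c
/* Return 1 if n is prime, using trial division (adequate for a sketch). */
static int is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Suggest a table size: the smallest prime p such that nentries is at
 * most 75% of p.  Illustration only; very large inputs could overflow. */
static long suggest_size(long nentries)
{
    long p;

    if (nentries < 1)
        nentries = 1;
    p = (nentries * 4 + 2) / 3;     /* ceiling of nentries / 0.75 */
    while (!is_prime(p))
        p++;
    return p;
}

/* Approximate size of the .pag file in bytes, assuming 4-byte offsets. */
static long pag_bytes(long size)
{
    return 4 * size;
}
```

For 75 expected pairs this sketch suggests a size of 101, giving a .pag file of about 404 bytes.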
Dbzfresh's fieldsep parameter specifies the field separator
in the base file. If this is not NUL (0), and the last
character of a key argument is NUL, that NUL compares equal
to either a NUL or a fieldsep in the base file. This permits
use of NUL to terminate key strings without requiring
that NULs appear in the base file. The fieldsep of a database
created with dbminit is the horizontal-tab character.
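The comparison rule for a trailing NUL might look like the sketch below. The function key_matches is hypothetical and is not taken from the dbz source; it only restates the rule in code.

```c
/* Compare a key of keylen bytes against the bytes at a position in the
 * base file.  The caller must guarantee that at least keylen bytes are
 * readable at base.  A NUL in the last position of the key also matches
 * fieldsep, provided fieldsep itself is not NUL. */
static int key_matches(const char *key, int keylen,
                       const char *base, int fieldsep)
{
    int i;

    for (i = 0; i < keylen; i++) {
        if (key[i] == base[i])
            continue;
        if (i == keylen - 1 && key[i] == '\0' &&
            fieldsep != '\0' && base[i] == (char)fieldsep)
            continue;               /* trailing NUL matched the separator */
        return 0;
    }
    return 1;
}
```

With a tab as fieldsep, the 6-byte key "<1@a>" plus NUL matches a base-file line beginning "<1@a>" followed by a tab; with a fieldsep of NUL it does not.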
For use in news systems, various forms of case mapping (e.g.
uppercase to lowercase) in keys are available. The cmap
parameter to dbzfresh is a single character specifying which
of several mapping algorithms to use. Available algorithms
are:
0    case-sensitive: no case mapping
B    same as 0
NUL  same as 0
=    case-insensitive: uppercase and lowercase
     equivalent
b    same as =
C    RFC822 message-ID rules, case-sensitive before `@'
     (with certain exceptions) and case-insensitive
     after
?    whatever the local default is, normally C
Mapping algorithm 0 (no mapping) is faster than the others
and is overwhelmingly the correct choice for most applications.
Unless compatibility constraints interfere, it is
more efficient to pre-map the keys, storing mapped keys in
the base file, than to have dbz do the mapping on every
search.
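Pre-mapping keys before they reach the base file could be done along the lines of the sketch below. It is a simplified reading of the 0, = and C algorithms, not the dbz implementation: map_key is a hypothetical helper, the ``certain exceptions'' of the C algorithm are ignored, and treating the first `@' as the dividing point is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Map a key in place.
 *   '0'  leave the key unchanged
 *   '='  lowercase every byte
 *   'C'  leave everything before the first '@' alone and lowercase the
 *        rest (simplified; the documented exceptions are not handled)
 * Any other cmap value leaves the key unchanged in this sketch. */
static void map_key(char *key, size_t len, int cmap)
{
    size_t i = 0;

    if (cmap != '=' && cmap != 'C')
        return;
    if (cmap == 'C') {
        /* skip the case-sensitive part before the first '@' */
        while (i < len && key[i] != '@')
            i++;
    }
    for (; i < len; i++)
        key[i] = (char)tolower((unsigned char)key[i]);
}
```

Under these assumptions, "<Ab@Host.COM>" becomes "<Ab@host.com>" with C and "<ab@host.com>" with =.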
For historical reasons, fetch and store expect their key
arguments to be pre-mapped, but expect unmapped keys in the
base file. Dbzfetch and dbzstore do the same jobs but handle
all case mapping internally, so the customer need not
worry about it.
Dbz stores only the database values in its files, relying on
reference to the base file to confirm a hit on a key.
References to the base file can be minimized, greatly speeding
up searches, if a little bit of information about the
keys can be stored in the dbz files. This is ``free'' if
there are some unused bits in an fseek offset, so that the
offset can be tagged with some information about the key.
The tagmask parameter of dbzfresh allows specifying the
location of unused bits. Tagmask should be a mask with one
group of contiguous 1 bits. The bits in the mask should be
unused (0) in most offsets. The bit immediately above the
mask (the flag bit) should be unused (0) in all offsets;
(dbz)store will reject attempts to store a key-value pair in
which the value has the flag bit on. Apart from this restriction,
tagging is invisible to the user. As a special
case, a tagmask of 1 means ``no tagging'', for use with
enormous base files or on systems with unusual offset
representations.
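The relationship between the tag mask, the flag bit and an offset can be made concrete with a little bit arithmetic. The helpers below are hypothetical and only restate the rules in the paragraph above; they do not reproduce how dbz encodes tags, and the special ``no tagging'' mask of 1 is not modelled.

```c
/* Return the flag bit: the bit immediately above a mask consisting of
 * one group of contiguous 1 bits. */
static unsigned long flag_bit(unsigned long tagmask)
{
    unsigned long top = tagmask;

    /* clear the lowest set bit until only the highest one remains */
    while (top & (top - 1))
        top &= top - 1;
    return top << 1;
}

/* Return 1 if the mask bits are all unused (0) in this offset, so that
 * the offset has room to carry a tag. */
static int offset_taggable(unsigned long offset, unsigned long tagmask)
{
    return (offset & tagmask) == 0;
}

/* Return 1 if the offset has the flag bit on, which is the case in
 * which a store is documented to be rejected. */
static int offset_rejected(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) != 0;
}
```

For a mask of 0x7f000000 the flag bit is 0x80000000; offsets below 0x01000000 leave all seven mask bits free.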
A size of 0 given to dbzfresh is synonymous with the local
default; the normal default is suitable for tables of
90-100,000 key-value pairs. A cmap of 0 (NUL) is synonymous
with the character 0, signifying no case mapping (note that
the character ? specifies the local default mapping, normally
C). A tagmask of 0 is synonymous with the local
default tag mask, normally 0x7f000000 (specifying the top
bit in a 32-bit offset as the flag bit, and the next 7 bits
as the mask, which is suitable for base files up to circa
24MB). Calling dbminit(name) with the database files empty
is equivalent to calling dbzfresh(name,0,'\t','?',0).
When databases are regenerated periodically, as in news, it
is simplest to pick the parameters for a new database based
on the old one. This also permits some memory of past sizes
of the old database, so that a new database size can be
chosen to cover expected fluctuations. Dbzagain is a variant
of dbminit for creating a new database as a new generation
of an old database. The database files for oldbase
must exist. Dbzagain is equivalent to calling dbzfresh with
the same field separator, case mapping, and tag mask as the
old database, and a size equal to the result of applying
dbzsize to the largest number of entries in the oldbase
database and its previous 10 generations.
When many accesses are being done by the same program, dbz
is massively faster if its first hash table is in memory.
If an internal flag is 1, an attempt is made to read the
table in when the database is opened, and dbmclose writes it
out to disk again (if it was read successfully and has been
modified). Dbzincore sets the flag to newvalue (which
should be 0 or 1) and returns the previous value; this does
not affect the status of a database that has already been
opened. The default is 0. The attempt to read the table in
may fail due to memory shortage; in this case dbz quietly
falls back on its default behavior. Stores to an in-memory
database are not (in general) written out to the file until
dbmclose or dbzsync, so if robustness in the presence of
crashes or concurrent accesses is crucial, in-memory databases
should probably be avoided.
Dbzsync causes all buffers etc. to be flushed out to the
files. It is typically used as a precaution against crashes
or concurrent accesses when a dbz-using process will be running
for a long time. It is a somewhat expensive operation,
especially for an in-memory database.
If dbz has been compiled with debugging facilities available
(which makes it bigger and a bit slower), dbzdebug alters
the value (and returns the previous value) of an internal
flag which (when 1; default is 0) causes verbose and cryptic
debugging output on standard output.
Concurrent reading of databases is fairly safe, but there is
no (inter)locking, so concurrent updating is not.
The database files include a record of the byte order of the
processor creating the database, and accesses by processors
with different byte order will work, although they will be
slightly slower. Byte order is preserved by dbzagain. However,
agreement on the size and internal structure of an
fseek offset is necessary, as is consensus on the character
set.
An open database occupies three stdio streams and their
corresponding file descriptors; a fourth is needed for an
in-memory database. Apart from stdio buffers, memory
consumption is negligible except for in-memory databases.
SEE ALSO
dbz(1), dbm(3)
DIAGNOSTICS
Functions returning int values return 0 for success, -1 for
failure. Functions returning datum values return a value
with dptr set to NULL for failure. Dbminit attempts to have
errno set plausibly on return, but otherwise this is not
guaranteed. An errno of EDOM from dbminit indicates that
the database did not appear to be in dbz format.
HISTORY
The original dbz was written by Jon Zeeff
(zeeff@b-tech.ann-arbor.mi.us). Later contributions by David Butler
and Mark Moraes. Extensive reworking, including this documentation,
by Henry Spencer (henry@zoo.toronto.edu) as part
of the C News project. Hashing function by Peter Honeyman.
BUGS
The dptr members of returned datum values point to static
storage which is overwritten by later calls.
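A caller that needs a returned value to survive later calls has to copy the bytes first. The sketch below shows one way to do that; copy_bytes is a hypothetical helper, not part of dbz.

```c
#include <stdlib.h>
#include <string.h>

/* Copy dsize bytes out of storage that a later call may overwrite.
 * Returns a malloc'ed buffer that the caller must free, or NULL if the
 * input is invalid or memory is exhausted. */
static char *copy_bytes(const char *dptr, int dsize)
{
    char *copy;

    if (dptr == NULL || dsize < 0)
        return NULL;
    copy = malloc(dsize > 0 ? (size_t)dsize : 1);
    if (copy != NULL && dsize > 0)
        memcpy(copy, dptr, (size_t)dsize);
    return copy;
}
```

The copy would be taken from the dptr and dsize members of the returned datum before the next fetch or store.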
Unlike dbm, dbz will misbehave if an existing key-value pair
is `overwritten' by a new (dbz)store with the same key. The
user is responsible for avoiding this by using (dbz)fetch
first to check for duplicates; an internal optimization
remembers the result of the first search so there is minimal
overhead in this.
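The fetch-before-store discipline can be wrapped in a small helper. So that the sketch compiles without dbz.h, it takes the fetch and store routines as function pointers and uses a stand-in datum type; example_datum, store_if_absent and the toy_ functions are all hypothetical, and the toy functions merely imitate a database holding at most one entry.

```c
#include <stddef.h>

/* Stand-in for the datum type from dbz.h. */
typedef struct {
    char *dptr;
    int   dsize;
} example_datum;

typedef example_datum (*fetch_fn)(example_datum key);
typedef int (*store_fn)(example_datum key, example_datum value);

/* Store only if the key is not already present.
 * Returns 1 if stored, 0 if the key was a duplicate, -1 if the store
 * itself failed. */
static int store_if_absent(fetch_fn fetch, store_fn store,
                           example_datum key, example_datum value)
{
    example_datum found = fetch(key);

    if (found.dptr != NULL)
        return 0;                   /* duplicate: do not overwrite */
    return store(key, value) == 0 ? 1 : -1;
}

/* Toy stand-ins, used only to exercise the pattern. */
static int toy_present = 0;
static char toy_byte = 'x';

static example_datum toy_fetch(example_datum key)
{
    example_datum r;

    (void)key;
    r.dptr = toy_present ? &toy_byte : NULL;
    r.dsize = toy_present ? 1 : 0;
    return r;
}

static int toy_store(example_datum key, example_datum value)
{
    (void)key;
    (void)value;
    toy_present = 1;
    return 0;
}
```

In real use the function pointers would presumably be dbzfetch and dbzstore, or fetch and store, with the datum type taken from dbz.h.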
Waiting until after dbminit to bring the base file into
existence will fail if chdir(2) has been used meanwhile.
The RFC822 case mapper implements only a first approximation
to the hideously-complex RFC822 case rules.
The prime finder in dbzsize is not particularly quick.
Should implement the dbm functions delete, firstkey, and
nextkey.
On C implementations which trap integer overflow, dbz will
refuse to (dbz)store an fseek offset equal to the greatest
representable positive number, as this would cause overflow
in the biased representation used.
Dbzagain perhaps ought to notice when many offsets in the
old database were too big for tagging, and shrink the tag
mask to match.