Text File | 1988-04-04 | 23KB | 367 lines
TRILOGY - A Note Retrieval System
B. J. Ball
3304 Glen Rose Dr.
Austin, Texas 78731
This program was initially developed as an aid to authors
who, during the preliminary stages of writing a book, may
accumulate several hundred individual notes dealing with
particular aspects of the general subject to be written about.
During the actual writing phase, the question inevitably arises
of locating, as quickly as possible, all notes dealing with the
specific topics currently under consideration. If the notes
are kept in a computer file with each note having a header
indicating the kinds of things it deals with, the program will
allow the rapid retrieval of all notes dealing with a
specified Boolean combination of topics. Since the keywords
used in the note headers are likely to be dependent on the
particular project, and not easily characterized in advance,
use of a standard database program might be inconvenient. Also,
the fact that the notes may range in length from a paragraph or
two to several pages would be a nuisance in a program requiring
a fixed (or maximum) record length to be specified at the
outset.
Another obvious use for such a program as this is keeping
track of business or personal correspondence. Since most
letters fall into several different classifications, it is
sometimes difficult to know exactly where a particular one was
filed. More importantly, when all letters relating to one or
more specified topics need to be collected together, it would
be convenient to have a better procedure than an exhaustive
search of all the likely places. Using TRILOGY, it would only
be necessary to give each letter a header consisting of
keywords describing its relevant categories, after which all
letters having a given combination of keywords in their headers
could easily be retrieved.
The program requires an IBM PC or compatible with at
least 128K of available memory. A hard disk, while helpful, is by
no means necessary.
To use TRILOGY, one starts with a text file consisting of
a collection of "notes"; each note must be immediately preceded
by an "identifier", which is just a sequence of keywords
suggesting the content of the note. In order for the program to
recognize identifiers as opposed to the text of notes, it is
required that each identifier be enclosed in distinctive
symbols which are not used anywhere else. (The file also should
not use a tilde (~) since the program uses this symbol as an
"end-of-text" marker.) By changing a configuration file, the
user may select the header markers to be used, choose the
delimiters for separating keywords in headers, specify which
symbols other than letters and numbers will be allowed in
keywords and determine whether keywords will be automatically
capitalized or allowed to contain lower case letters. Color
monitor users may also choose the foreground, background and
border colors desired. (In default mode, headers are to be
enclosed in curly braces {...} and keywords in headers may be
separated by spaces, commas, semicolons, or any other symbol
not eligible to be part of a keyword. Also, in default mode,
all letters in keywords, whether originally entered as capitals
or lower case, are converted to capitals by the program;
keywords may contain apostrophes, hyphens or the underline
character (as a space substitute), but may not contain spaces.
For color monitor users, the default settings are white text on
a blue background. The accessory program, TRIL-SET, allows the
creation of a configuration file with different default values
from these.) It is very important that the program be told
whether or not a color/graphics card is being used, since there
is a machine-language subroutine for fast screen printing which
uses this information, and which will hang up the machine if
this data is wrong. If there is no configuration file available
to the program, the default values listed above will be used,
but you will first be asked whether you are using a color card.
Keywords must contain at least four characters, but there
is no formal limitation on the maximum length - only the first
eighteen characters of each keyword will be printed in the
keyword list, but all characters are relevant in a search.
There is a limitation (160 characters) on the length of the
total identifier for a note, however, in order to help detect
the fairly common error of failing to put a closing } at the
end of an identifier. (If this happens, the program will try to
read all the rest of that note as part of its identifier,
probably exceeding the 160 character limit in the attempt,
whereupon processing will cease, with the suggestion that the
source file be checked for a missing `}'. Although exceeding
the maximum header length is a "Fatal Error", in the sense that
the program will not continue after encountering it, nothing
that has been done up to this point will be lost.)
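The header rules above can be illustrated with a short Python sketch. It
is not TRILOGY's own code. It assumes the default settings (curly-brace
markers, automatic capitalization, and apostrophe, hyphen and underline
allowed in keywords); the names parse_source and MissingBrace are
hypothetical. Two points are guesses, since the text does not settle
them: tokens shorter than four characters are simply dropped, and the
160-character limit is counted between the braces.

```python
import re

MAX_HEADER = 160      # limit on identifier length stated in the text
MIN_KEYWORD = 4       # minimum keyword length stated in the text
# Default keyword characters: letters, digits, apostrophe, hyphen, underline.
KEYWORD_RE = re.compile(r"[A-Za-z0-9'_-]+")


class MissingBrace(Exception):
    """Raised when a header runs past MAX_HEADER without a closing brace."""


def parse_source(text):
    """Split source text into (keywords, note_body) pairs.

    Assumes the default markers: headers enclosed in { } and a note body
    running up to the next header or the end of the text.
    """
    notes = []
    pos = text.find("{")
    while pos != -1:
        close = text.find("}", pos + 1)
        # A header longer than the limit most likely lacks its closing brace.
        if close == -1 or close - pos - 1 > MAX_HEADER:
            raise MissingBrace(f"check for a missing '}}' near offset {pos}")
        header = text[pos + 1:close]
        # Assumption: tokens shorter than MIN_KEYWORD are simply dropped.
        keywords = [k.upper() for k in KEYWORD_RE.findall(header)
                    if len(k) >= MIN_KEYWORD]
        nxt = text.find("{", close + 1)
        end = nxt if nxt != -1 else len(text)
        notes.append((keywords, text[close + 1:end].strip()))
        pos = nxt
    return notes


sample = "{paris, hotels; 1987}\nStayed near the Seine.\n{Paris food}\nGood bread."
for keywords, body in parse_source(sample):
    print(keywords, "->", body)
```

With a header whose closing brace is missing, the sketch stops with a
MissingBrace error rather than swallowing the note text, which mirrors
the behavior described above.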
Operation of the Program
Given a Source file satisfying the conditions stated
above, TRILOGY will create three additional files - an
Identifiers file containing a list of all the note headers from
the source file, a Keywords file containing an alphabetized
list of all keywords appearing in the headers, and a Records
file, containing the individual notes. These three files may
then be used to isolate quickly all notes whose headers contain
a specified combination of keywords. The retrieved notes may
be displayed on the screen, printed out on the printer, deleted
altogether, or sent to a file. If a note is sent to a file, it
can then be edited (using a word processor, after exiting
TRILOGY) and can later be replaced in the Records file from
which it came. In fact, it is possible to replace any note with
an arbitrary one contained on the default disk, and thus
information can be added to or removed from a note at will.
Provision is also made for "updating" an existing collection of
Trilogy files; i.e., adding additional notes, headers and
keywords from a new Source File.
The TRILOGY program and the desired Source File must
initially be accessible but need not be on the
default drive/directory; the Identifiers, Keywords and Records
files, however, are always placed on the default drive when
created by TRILOGY, and must be there when used. As usual,
"filespec" denotes the filename, preceded by the drive
identification if different from the default drive, and
followed by the filename extension, if any is used. For
convenience, the program adopts a single "generic" name for all
files related to a given source file - if the original file is
A:SOURCE.BAS, the generic name is "SOURCE" and the Identifiers,
Keywords and Records files are automatically named SOURCE.ID,
SOURCE.KW and SOURCE.REC.
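The naming rule can be sketched in Python as follows. The function name
trilogy_filenames is hypothetical, and the sketch assumes the generic
name is the filename with any drive, directory and extension stripped.

```python
import ntpath


def trilogy_filenames(filespec):
    """Return the generic name and the three derived file names."""
    # ntpath understands DOS-style drive letters and backslashes on any OS.
    _drive, tail = ntpath.splitdrive(filespec)
    generic = ntpath.splitext(ntpath.basename(tail))[0].upper()
    return generic, [generic + ext for ext in (".ID", ".KW", ".REC")]


print(trilogy_filenames("A:SOURCE.BAS"))
```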
When the program starts, you are presented with an
abbreviated menu, since there is not much you can do until some
Trilogy files have been created. The creation process will take
about one minute for every 10K bytes in the Source File - 5
minutes for a 50K file, 10 minutes for a 100K file, etc. (These
times are considerably less with a hard disk and are
approximately halved if a Ramdisk is used. The time may also
vary considerably depending on the average length of the notes
and other factors.) Once a set of Trilogy files has been
created - or selected, if previously created - the menu is
expanded and the "Active" files are listed at the bottom. You
may now choose to have the keywords or the identifiers printed,
on the screen or on the printer. And you may also now use the
"Findnote" feature, which is the primary purpose of the
program.
In Findnote, you will first be asked to enter either a
"Search String" or a "Direct Command", both of which require
some explanation.
A Search String is a Boolean combination of OR's and
AND's of Keywords, with OR taking precedence over AND, contrary
to the usual convention. The only construction recognized is
(xxx OR xxx OR ...) AND (xxx OR xxx OR ...) AND ... AND ... ;
here the xxx's represent keywords, of which at least one must
be present. Parentheses are optional, and punctuation (except
for symbols which are allowed in keywords) will be ignored. The
Boolean operators "AND" and "OR" must be set off by spaces,
but need not be capitalized. (The program will supply
parentheses to enforce the prescribed logical form - e.g., an
entry such as "(x AND y) OR z" will be changed to "x AND (y OR
z)", not at all what was wanted; to search for the former
combination, use "(x OR z) AND (y OR z)". It would, of course,
be possible to include a complete Boolean parsing routine, so
that all combinations would be directly recognized, but that
seems to be more trouble than it is worth. Giving OR
precedence over AND and using the simple parsing scheme
indicated above appears to handle the majority of desired
searches, and the remainder present the user with interesting
logical problems. The Boolean operator NOT has not been
implemented, simply because it seems unnecessary.)
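A minimal Python sketch of such a simple parsing scheme is given here
for illustration; it is not the program's actual routine. It assumes the
default keyword characters plus the * wildcard, treats everything else
(parentheses included) as a separator, and capitalizes the entry. The
names parse_search and show are hypothetical. Each AND opens a new group
and each OR continues the current one, which is all that is needed when
OR binds more tightly than AND.

```python
import re

# Keyword characters under the default settings, plus the * wildcard.
TOKEN_RE = re.compile(r"[A-Za-z0-9'_*-]+")


def parse_search(search):
    """Return a list of OR-groups; the groups are ANDed together."""
    groups = [[]]
    for token in TOKEN_RE.findall(search.upper()):
        if token == "AND":
            groups.append([])         # AND starts a new OR-group
        elif token != "OR":
            groups[-1].append(token)  # OR just continues the current group
    return [g for g in groups if g]


def show(groups):
    """Render the groups with parentheses around multi-keyword groups."""
    parts = []
    for group in groups:
        joined = " OR ".join(group)
        parts.append("(" + joined + ")" if len(group) > 1 else joined)
    return " AND ".join(parts)


print(show(parse_search("(x AND y) OR z")))         # X AND (Y OR Z)
print(show(parse_search("(x OR z) AND (y OR z)")))  # (X OR Z) AND (Y OR Z)
```

The first call shows the reinterpretation described above; the second
shows the form that must be typed to get the intended search.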
Rather than entering a Search String, you may choose to
give a Direct Command, identifying the desired notes by their
positions in the original file (which may be read from an
Identifiers list printed out by the program). You may enter a
list of note numbers, a range of numbers, or a combination of
these and may optionally end the command with one of the
letters D,P,F,X,R to have the notes immediately Displayed,
Printed, Filed, Deleted or Replaced. Direct Commands must start
with the symbol "#" to distinguish them from Search Strings.
Examples of Direct Commands are:
    #2,5,7-10,23 P    to print out notes 2,5,7,8,9,10,23
    #6-8x             to delete notes 6,7,8
    #5d               to display note 5
In a Direct Command, spaces are irrelevant and invalid entries
(including any letter not at the end of the command) will be
ignored. Numbers larger than the total number of notes will be
replaced by that total number. (This gives an easy way to
operate on all notes in the current file, by using in a direct
command a number known to be larger than the number of notes
present.) The letter suffix, if used, may be upper or lower
case; if a Direct Command has no letter suffix, the specified
notes will be treated just as if they had been discovered
through a search. Entering a question mark at the "Enter Search
String or Direct Command" prompt will call up a help screen,
which briefly describes the formats for Search Strings and
Direct Commands. It should perhaps be noted that sending a note
to a disk file does not automatically delete that note from the
active Trilogy files; the X suffix on a separate command is
necessary to delete notes. Since there is no provision for
recovering deleted notes, deletion should be done with care and
only on backed-up files.
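The Direct Command rules can be sketched in Python as below. This is an
illustration under stated assumptions, not the program's code: the name
parse_direct is hypothetical, an invalid entry is taken to mean a whole
comma-separated item that is not a number or a range, and repeated note
numbers are listed only once.

```python
import re

ACTIONS = {"D": "display", "P": "print", "F": "file",
           "X": "delete", "R": "replace"}


def parse_direct(command, total_notes):
    """Return (note_numbers, action) for a Direct Command such as '#6-8x'."""
    text = command.replace(" ", "").upper()   # spaces are irrelevant
    if not text.startswith("#"):
        raise ValueError("Direct Commands must start with '#'")
    text = text[1:]
    action = None
    if text and text[-1] in ACTIONS:          # optional letter suffix
        action, text = ACTIONS[text[-1]], text[:-1]
    numbers = []
    for item in text.split(","):
        match = re.fullmatch(r"(\d+)(?:-(\d+))?", item)
        if not match:
            continue                          # invalid entries are ignored
        # Numbers beyond the last note are replaced by the total.
        first = min(int(match.group(1)), total_notes)
        last = min(int(match.group(2) or first), total_notes)
        for n in range(first, last + 1):
            if n >= 1 and n not in numbers:
                numbers.append(n)
    return numbers, action


print(parse_direct("#2,5,7-10,23 P", 40))
print(parse_direct("#1-999", 3))
```

The second call shows the shortcut mentioned above: an upper bound known
to be too large selects every note in the file.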
The strength of this program is not in its handling of
Direct Commands, since any database program can do things to
records whose numbers are specified by the user, but in its
ability to quickly find all notes of a specified kind, through
a Search String entry. In the Findnote routine the program will
locate those notes whose headers contain a specified
combination of keywords. Normally an exact match with a keyword
is required in any search, but this may be changed by the use
of the wildcard symbol * : a search for MAN will not find
MANKIND, but a search for MAN* will; similarly, *MAN would find
HUMAN and *MAN* would find both HUMAN and MANKIND (as well as
MISMANAGEMENT,...).
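The wildcard behavior can be sketched in Python as follows. The sketch
handles only a leading and/or trailing *, since those are the only forms
shown above; whether TRILOGY accepts a * elsewhere is not stated. The
names keyword_matches and note_matches are hypothetical, and a search is
represented as a list of OR-groups that are ANDed together.

```python
def keyword_matches(pattern, keyword):
    """Match one search term against one header keyword."""
    core = pattern.strip("*")
    if len(pattern) > 1 and pattern.startswith("*") and pattern.endswith("*"):
        return core in keyword            # *MAN* : anywhere in the keyword
    if pattern.startswith("*"):
        return keyword.endswith(core)     # *MAN  : at the end
    if pattern.endswith("*"):
        return keyword.startswith(core)   # MAN*  : at the start
    return keyword == pattern             # MAN   : exact match only


def note_matches(groups, header_keywords):
    """True if every OR-group has a term matching some header keyword."""
    return all(
        any(keyword_matches(term, kw)
            for term in group for kw in header_keywords)
        for group in groups
    )


header = ["MANKIND", "SCIENCE"]
print(note_matches([["MAN*"], ["HISTORY", "SCIENCE"]], header))  # True
print(note_matches([["MAN"], ["SCIENCE"]], header))              # False
```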
If you have chosen to allow keywords to contain lower
case letters, all searches will be case-sensitive; otherwise
(with the default settings, for example) you may use mixed case
freely in entering search strings, since all letters will
automatically be converted to upper case.
Once a search string has been entered, all notes whose
headers contain the specified combination of keywords will be
located in a very few seconds. The screen will be cleared and
the Search String used by the program will be displayed,
followed by the identifiers of the located notes. You now have
the option of choosing a note and either displaying it on the
screen, printing it out on the printer, sending it to a file,
deleting it from the Active files or replacing it with a note
from the default drive. (If a Direct Command had been entered
without a letter suffix, the identifiers for the specified
notes would have been listed as above, and the same options
would be available.) It is possible to display, print, delete
or file several of the listed notes as one operation, by
entering the analog of a Direct Command. Here the "#" prefix
indicates that the numbers of the notes in the original file
are used, just as before; if a list or range of numbers is
entered here without using the "#" prefix, the numbers will be
interpreted as the positions of the notes in the list of
located notes (the result of the previous search). Both note
numbers appear on the screen, with the original file numbers in
parentheses. If the # prefix is not used and a number is
entered which is greater than the number of notes in the
current list, no action is taken and you are asked to reenter
your choice; otherwise the Direct Command rules apply, with
numbers that are too large being replaced by the total number
of notes in the active files. Also, with the # prefix, you are
not limited to only those notes in the current list, but may
enter an arbitrary Direct Command.
Before printing any notes, you should make sure the paper
is located with the printer's "top-of-form" at the top of a
page; the program counts lines to avoid printing on
perforations, but this does no good unless the printer knows
where the top of the page is, and starts there. When a sequence
of notes is being printed, new pages are begun only when
necessary, not at the beginning of each note. You will be asked
at the start of each printout whether you wish to begin a new
page with this note or sequence of notes. There are no error
traps for an off-line printer, so to avoid a bomb-out, be sure
the printer is ready when the program wants it. (You are always
given an opportunity to check that the printer is ready, but if
you make a mistake about this, you will have the fun of
starting all over.)
There are two reasons for the File option - you may wish
to write a note to a disk file in order to correct or modify it
and then replace it in its original Trilogy file, or you may
wish to separate off a subgroup of notes from the original file
into a separate reference file, perhaps to be used later as the
Source File for another application of TRILOGY.
The "modify and replace" procedure - saving a note,
exiting TRILOGY, using a word processor on the offending note,
then reloading TRILOGY and replacing the note - is obviously
rather cumbersome. Including even simple editing
facilities, however, would make the program unacceptably large.
(A resident word processor like Sidekick, or any of several
commercially available multi-tasking or memory-partitioning
programs will greatly simplify the modify and replace process
by making both TRILOGY and your word processing program
immediately available.) It should be noted that the "R" option
can be applied to only one note at a time. When this option is
used, however, the replacement note can be a virtually
arbitrary file, not necessarily one which was previously
extracted from the current Trilogy files. The only restriction
is that any file used as a replacement for a TRILOGY note must
include a header in the appropriate form.
As mentioned, you may create a subcollection of the
original note collection by sending a sequence of notes to a
single disk file. This may be done all at one time or, since
saved notes are appended to the target file rather than
overwriting it, notes may be added to the subcollection
individually or in small groups, just by specifying the same
file name for each of them. If you have deleted a significant
number of notes, you may wish to save all the remaining ones to
use as a Source File for a later application of TRILOGY, since
the deletion process does not automatically reduce the size of
the Records file.
When a note is deleted from the Active files by using the
X command, all keywords which occur only in the header of the
deleted note will be removed from the keyword list. Similarly,
if the header of a replacement note is different from that of
the original, the keyword list will be updated appropriately.
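The keyword bookkeeping on deletion can be sketched in Python as
follows. The layout is hypothetical (a dictionary from note number to
the keywords in its header), as are the names delete_note and
keyword_list; the format of the actual Keywords file is not described
here.

```python
def keyword_list(headers):
    """Alphabetized list of all keywords appearing in any header."""
    return sorted({kw for header in headers.values() for kw in header})


def delete_note(headers, number):
    """Delete a note and return the keywords that leave the keyword list."""
    removed = set(headers.pop(number))
    still_used = {kw for header in headers.values() for kw in header}
    # Only keywords used by no remaining header disappear.
    return sorted(removed - still_used)


headers = {
    1: ["PARIS", "HOTELS"],
    2: ["PARIS", "FOOD"],
    3: ["LONDON", "FOOD"],
}
print(delete_note(headers, 1))   # ['HOTELS']
print(keyword_list(headers))     # ['FOOD', 'LONDON', 'PARIS']
```

PARIS survives the deletion because note 2 still uses it; HOTELS
occurred only in the deleted header and is dropped.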
It might be convenient not to make the entire Source File
at once, and the Update facility was included to allow creation
of large Trilogy files from a series of separately composed,
smaller, source files. Only the currently Active files can be
updated, and it should be borne in mind that the updating
process adds all the notes from the "Update File" (i.e. the
Source File for this operation) to the existing Trilogy files;
therefore the Update File should contain only new information.
Also, of course, the Update File must be in the same form as an
original Source File - a sequence of notes each preceded by an
identifier delimited by the chosen header markers.
Whenever the program is waiting for input from you, you
may return to the menu by pressing the Esc key. This might be
useful if, for example, the program wants you to enter a
filename and you have forgotten what's available and want to
use Option 1 to find out, or if you simply change your mind and
want to cancel the operation you've started. Most single-
character entries do not require that the Enter key be pressed,
but all multiple entries do; generally, pressing Enter alone
when a multiple-character entry is expected will return you to
the menu.
Overall, error trapping is pretty good, but as mentioned
earlier, there are no error traps for an off-line printer;
these were originally included, but turned out to be so
unreliable (on my machine, at least) that eliminating them
entirely seemed best. (The problem was that the error would be
trapped the first time it occurred, and possibly the second
time, but not every time.)
The arrays used in the program are dimensioned to allow a
total of 500 notes, 1500 distinct keywords and 2000 records
(corresponding to a Source File of about 500K). Search Strings
are arbitrarily limited to one screen line (79 bytes), and to a
maximum of 10 AND phrases, each containing at most 10 OR's.
(The program will not allow you to exceed the specified limits,
but no information will be lost by an inadvertent attempt to do
so.)
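The Search String limits can be expressed as a small check, sketched
here in Python. The name check_search_limits is hypothetical, a search
is assumed to be already split into a list of OR-groups, and "10 OR's"
is read as at most ten keywords per AND phrase, which is one possible
interpretation.

```python
MAX_LINE = 79          # one screen line, per the text
MAX_AND_PHRASES = 10   # per the text
MAX_OR_TERMS = 10      # assumption: "10 OR's" read as 10 keywords per phrase


def check_search_limits(search, groups):
    """Return a list of problems; an empty list means the search fits."""
    problems = []
    if len(search) > MAX_LINE:
        problems.append("search string longer than one screen line")
    if len(groups) > MAX_AND_PHRASES:
        problems.append("too many AND phrases")
    if any(len(group) > MAX_OR_TERMS for group in groups):
        problems.append("too many keywords in one OR phrase")
    return problems


print(check_search_limits("PARIS AND FOOD", [["PARIS"], ["FOOD"]]))  # []
```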
This program should be useful to anyone involved in
research of almost any kind, since in any research, a great
deal depends on having readily at hand a large variety of
information on particular aspects of the subject. Often,
perhaps usually, creating the original Source File is a natural
part of the procedure, especially if a computer file is
ordinarily used. Everything else is done automatically by the
program.