Newsgroups: alt.sources
From: goer@ellis.uchicago.edu (Richard L. Goerwitz)
Subject: kjv browser, part 11 of 11
Message-ID: <1991Jul3.065346.28525@midway.uchicago.edu>
Date: Wed, 3 Jul 1991 06:53:46 GMT
---- Cut Here and feed the following to sh ----
#!/bin/sh
# this is bibleref.11 (part 11 of a multipart archive)
# do not concatenate these parts, unpack them in order with /bin/sh
# file README.rtv continued
#
if test ! -r _shar_seq_.tmp; then
echo 'Please unpack part 1 first!'
exit 1
fi
(read Scheck
if test "$Scheck" != 11; then
echo Please unpack part "$Scheck" next!
exit 1
else
exit 0
fi
) < _shar_seq_.tmp || exit 1
if test ! -f _shar_wnt_.tmp; then
echo 'x - still skipping README.rtv'
else
echo 'x - continuing file README.rtv'
sed 's/^X//' << 'SHAR_EOF' >> 'README.rtv' &&
Xvalues for. In our hypothesized scenario, you would want makeind to
Xstore the max value for the verse field for every chapter of every
Xbook in the Bible. The verse field (field #3), in other words, is
Xyour "rollover" field, and would be passed to makeind using the -l
Xoption. Assuming "kjv" to be the name of your indexable biblical
Xtext, this set of circumstances would imply the following invocation
Xfor makeind:
X
X makeind -f kjv -m 176 -n 3 -l 3
X
XIf you were to want a case-sensitive index (not a good idea), you
Xwould add "-s" to the argument list above (the only disadvantage a
Xcase-insensitive index would bring is that it would obscure the
XLord/lord, and other similar, distinctions).
X Actual English Bible texts usually take up 4-5 megabytes.
XIndexing one would require over twice that much core memory, and
Xwould take at least an hour on a fast machine. The end result would
Xbe a set of data files occupying about 2 megabytes plus the 4-5
Xmegabytes of the original file. The Bible is hardly a small book.
XOnce these data files were created, they could be moved, along with
Xthe original source file, to any platform you desired.
X Having indexed, and having moved the files to wherever you
Xwanted them, you would then be ready for step 3.
X
X
X--------
X
X
XStep 3: Writing a Program to Access Indexed Files
X
X When accessing text files such as the Bible, the most useful
Xunit for searches is normally the word. Let us suppose you are a
Xzealous lay-speaker preparing a talk on fire imagery and divine wrath
Xin the Bible. You would probably want to look for every passage in
Xthe text that contained words like
X
X fire, firy
X burn
X furnace
X etc.
X
XTo refine the search, let us say that you want every instance of one
Xof these fire words that occurs within one verse of a biblical title
Xfor God:
X
X God
X LORD
X etc.
X
XThe searches for fire, firy, burn, etc. would be accomplished by
Xcalling a routine called retrieve(). Retrieve takes three arguments:
X
X retrieve(pattern, filename, invert_search)
X
XThe first argument should be a string containing a regular expression
Xbased pattern, such as
X
X fir(y|e|iness)|flam(e|ing)|burn.*?
X
XNote that the pattern must match words IN THEIR ENTIRETY. So, for
Xinstance, "fir[ie]" would not catch "firiness," but rather only
X"fire." Likewise, if you want every string beginning with the
Xsequence "burn," the string "burn" will not work. Use "burn.*"
Xinstead. The filename argument supplies retrieve() with the name of
Xthe original text file. The last argument, if nonnull, inverts the
Xsense of the search (a la egrep -v). In the case of the fire words
Xmentioned above, one would invoke retrieve() as follows:
X
X hits1 := retrieve("fir(y|e|iness)|flam(e|ing)|burn.*?", "kjv")
X
XFor the divine names, one would do something along these lines:
X
X hits2 := retrieve("god|lord", "kjv")
X
X Having finished the basic word searches, one would then
Xperform a set intersection on them. If we are looking for fire words
Xwhich occur at most one verse away from a divine name, then we would
Xspecify 1 as our range (as opposed to, say, zero), and the verse as
Xour unit. The utility you would use to carry out the search is
Xr_and(). R_and() would be invoked as follows:
X
X hits3 := r_and(hits1, hits2, "kjv", 3, 1)
X
XThe last two arguments, 3 and 1, specify field three (the "verse"
Xfield) and a range of 1 (at most one verse away).
X To display the text for your "hit list" (hits3 above), you
Xwould call bitmap_2_text():
X
X every write(!bitmap_2_text(hits3, "kjv"))
X
XBitmap_2_text converts the location designators contained in hits3
Xinto actual text.
X The three basic functions mentioned above - retrieve(),
Xr_and(), and bitmap_2_text() - are contained in three distinct
Xfiles (retrieve.icn, retrops.icn, and bmp2text.icn, respectively).
XOther useful routines are included in these files, and also in
Xwhatnext.icn. If you are planning on writing a retrieval engine for
Xserious work of some kind, you would probably want to construct a mini
Xinterpreter, which would convert strings typed in by the user at
Xrun-time into internal search and retrieval operations.
X Note that I have included no routine to parse or expand
Xhuman-readable input (the nature of which will naturally vary from
Xtext to text). Again, using the Bible as our hypothetical case, it
Xwould be very useful to be able to ask for every passage in, say,
XGenesis chapters 2 through 4, and to be able to print these to the
Xscreen. Doing this would require a parsing routine to break down the
Xreferences, and map them to retrieve-internal format. The routine
Xwould then have to generate all valid locations from the minimum value
Xin chapter 2 above to the max in chapter 4. See the file whatnext.icn
Xfor some aids in accomplishing this sort of task.
X
X
X--------
X
X
XStep 4: Compiling and Running Your Program
X
X Assuming you have written a search/retrieval program using the
Xroutines contained in retrieve.icn, retrops.icn, bmp2text.icn, and
Xwhatnext.icn, you would now be ready to compile it. In order to
Xfunction properly, these routines would need to be linked with
Xinitfile.icn and indexutl.icn. Specific dependencies are noted in the
Xindividual files in case there is any confusion.
X If you have made significant use of this package, you probably
Xshould not worry about the exact dependencies, though. Just link
Xeverything in together, and worry about what isn't needed after you
Xhave fully tested your program:
X
X icont -o yourprog yourprog.icn initfile.icn indexutl.icn \
X retrieve.icn retrops.icn bmp2text.icn binsrch.icn
X
X
X--------
X
X
XProblems, bugs:
X
X This is really an early beta release of the retrieve package.
XI use it for various things. For instance, I recently retrieved a
Xtext file containing informal reviews of a number of Science Fiction
Xworks. My father likes SciFi, and it was close to Father's Day, so I
Xindexed the file, and performed cross-referenced searches for words
Xlike "very good," "brilliant," and "excellent," omitting authors my
Xfather has certainly read (e.g. Herbert, Asimov, etc.). I also had
Xoccasion to write a retrieval engine for the King James Bible (hence
Xthe many examples from this text), and to construct a retrieval
Xpackage for the Hebrew Bible, which I am now using to gather data for
Xvarious chapters of my dissertation. I'm happy, incidentally, to hand
Xout copies of my KJV retrieval program. It's a clean little program
Xthat doubtless many would find useful. The Hebrew Bible retrieval
Xpackage I'll hand out as well, but only to fully competent Icon
Xprogrammers who feel comfortable with Hebrew and Aramaic. This latter
Xretrieval package is a much less finished product, and would almost
Xcertainly need to be hacked to work on platforms other than what I
Xhave here at home (a Xenix/386 setup with a VGA).
X In general, I hope that someone out there will find these
Xroutines useful, if for no other reason than that it will mean that I
Xget some offsite testing. Obviously, the whole package could have
Xbeen written/maintained in C or something that might offer much better
Xperformance. Doing so would, however, have entailed a considerable
Xloss of flexibility, and would have required a lot more time on my
Xpart. Right now, the retrieve package occupies about 60k of basic
Xsource files, probably half of which consists of comments. When
Xcompiled together with a moderate-size user interface, the total
Xpackage typically comes to about 150k. In-core size typically runs
Xabout 350k on my home machine here (a Xenix/386 box), with the basic
Xrun-time interpreter taking up a good chunk of that space all on its
Xown. It's not a small package, but I've found it a nice base for
Xrapid prototyping and development of small to medium-size search and
Xretrieval engines.
X
X -Richard L. Goerwitz goer%sophist@uchicago.bitnet
X goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
SHAR_EOF
echo 'File README.rtv is complete' &&
true || echo 'restore of README.rtv failed'
rm -f _shar_wnt_.tmp
fi
rm -f _shar_seq_.tmp
echo You have unpacked the last part
exit 0
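
Editorial sketch (not part of the archive): Step 3 of README.rtv above
describes two ideas that may be easier to follow in miniature: patterns
must match words IN THEIR ENTIRETY, and r_and() intersects two hit lists
within a range on a given field. The Python below is only an illustration
under stated assumptions. It is not the Icon package's implementation,
which works from index files and bitmaps. The names retrieve_sketch,
r_and_sketch, and VERSES are hypothetical, the sample text is invented,
and the sketch ignores the chapter "rollover" that makeind -l exists to
handle. Which hits the real r_and() returns is not stated in the README;
here it is assumed to keep the members of the first list.

```python
import re

# Invented toy text keyed by (book, chapter, verse); not real KJV data.
VERSES = {
    (1, 1, 1): "the lord spoke",
    (1, 1, 2): "a fire burned in the furnace",
    (1, 1, 3): "the people rested",
    (1, 1, 4): "the coals were burning",
    (1, 2, 1): "god called",
}


def retrieve_sketch(pattern, verses, invert=False):
    """Return locations containing a word that the pattern matches entirely.

    With invert=True, return the locations that contain no such word
    (in the spirit of egrep -v).
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = set()
    for loc, text in verses.items():
        # fullmatch() enforces the whole-word rule: "burn" will not
        # match "burned", but "burn.*" will.
        found = any(rx.fullmatch(word) for word in text.split())
        if found != bool(invert):
            hits.add(loc)
    return hits


def r_and_sketch(hits1, hits2, field, rng):
    """Keep members of hits1 lying within rng units of a member of hits2.

    field is numbered from 1, as in the README (3 means the verse field).
    Assumption: all higher-order fields must be equal, so proximity never
    crosses a chapter boundary in this sketch.
    """
    idx = field - 1
    near = set()
    for a in hits1:
        for b in hits2:
            if a[:idx] == b[:idx] and abs(a[idx] - b[idx]) <= rng:
                near.add(a)
                break
    return near


if __name__ == "__main__":
    fire = retrieve_sketch("fir(y|e|iness)|flam(e|ing)|burn.*", VERSES)
    names = retrieve_sketch("god|lord", VERSES)
    for loc in sorted(r_and_sketch(fire, names, 3, 1)):
        print(loc, VERSES[loc])
```

In this toy data the bare pattern "burn" finds nothing, because no word
is exactly "burn", which is the behaviour the README warns about.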
--
-Richard L. Goerwitz goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer