Source Code 1992 March

Text File | 1991-07-02 | 9KB | 215 lines

Newsgroups: alt.sources
From: goer@ellis.uchicago.edu (Richard L. Goerwitz)
Subject: kjv browser, part 11 of 11
Message-ID: <1991Jul3.065346.28525@midway.uchicago.edu>
Date: Wed, 3 Jul 1991 06:53:46 GMT

---- Cut Here and feed the following to sh ----
#!/bin/sh
# this is bibleref.11 (part 11 of a multipart archive)
# do not concatenate these parts, unpack them in order with /bin/sh
# file README.rtv continued
#
if test ! -r _shar_seq_.tmp; then
        echo 'Please unpack part 1 first!'
        exit 1
fi
(read Scheck
 if test "$Scheck" != 11; then
        echo Please unpack part "$Scheck" next!
        exit 1
 else
        exit 0
 fi
) < _shar_seq_.tmp || exit 1
if test ! -f _shar_wnt_.tmp; then
        echo 'x - still skipping README.rtv'
else
echo 'x - continuing file README.rtv'
sed 's/^X//' << 'SHAR_EOF' >> 'README.rtv' &&
values for.  In our hypothesized scenario, you would want makeind to
store the max value for the verse field for every chapter of every
book in the Bible.  The verse field (field #3), in other words, is
your "rollover" field, and would be passed to makeind using the -l
option.  Assuming "kjv" to be the name of your indexable biblical
text, this set of circumstances would imply the following invocation
for makeind:

        makeind -f kjv -m 176 -n 3 -l 3

If you wanted a case-sensitive index (not a good idea), you would add
"-s" to the argument list above (the only disadvantage a
case-insensitive index would bring is that it would obscure the
Lord/lord, and other similar, distinctions).

        Actual English Bible texts usually take up 4-5 megabytes.
Indexing one would require over twice that much core memory, and
would take at least an hour on a fast machine.  The end result would
be a set of data files occupying about 2 megabytes plus the 4-5
megabytes of the original file.  The Bible is hardly a small book.
Once these data files were created, they could be moved, along with
the original source file, to any platform you desired.
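To make the "rollover" idea more concrete, here is a minimal Python sketch. It is not part of the Icon package and does not reproduce makeind's file format. The sketch records the highest verse number in every chapter of every book, which is the per-chapter maximum described above, and then uses that table to step forward one verse, rolling over into the next chapter. The location tuples and the names rollover_maxima() and next_location() are hypothetical. It also assumes that chapters are numbered consecutively and that verses start at 1.

```python
from collections import defaultdict


def rollover_maxima(locations):
    """Map each (book, chapter) pair to the highest verse number seen in it."""
    maxima = defaultdict(int)
    for book, chapter, verse in locations:
        maxima[(book, chapter)] = max(maxima[(book, chapter)], verse)
    return dict(maxima)


def next_location(loc, maxima):
    """Step one verse forward from loc, rolling over into verse 1 of the
    next chapter after a chapter's last verse; None past the book's end.
    Assumes consecutive chapter numbers and verses numbered from 1."""
    book, chapter, verse = loc
    if verse < maxima[(book, chapter)]:
        return (book, chapter, verse + 1)
    if (book, chapter + 1) in maxima:
        return (book, chapter + 1, 1)
    return None


# Hypothetical toy data: one book with a 3-verse and a 2-verse chapter.
toy = [("Toy", 1, v) for v in (1, 2, 3)] + [("Toy", 2, v) for v in (1, 2)]
maxima = rollover_maxima(toy)
print(maxima)                                # {('Toy', 1): 3, ('Toy', 2): 2}
print(next_location(("Toy", 1, 3), maxima))  # ('Toy', 2, 1)
print(next_location(("Toy", 2, 2), maxima))  # None
```

Without the per-chapter maxima, a program could not tell that verse 3 is the last verse of chapter 1. A table like this is what lets it decide when a verse number should roll over into the next chapter.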
        Having indexed, and having moved the files to wherever you
wanted them, you would then be ready for step 3.


--------


Step 3: Writing a Program to Access Indexed Files

        When accessing text files such as the Bible, the most useful
unit for searches is normally the word.  Let us suppose you are a
zealous lay-speaker preparing a talk on fire imagery and divine wrath
in the Bible.  You would probably want to look for every passage in
the text that contained words like

        fire, firy
        burn
        furnace
        etc.

To refine the search, let us say that you want every instance of one
of these fire words that occurs within one verse of a biblical title
for God:

        God
        LORD
        etc.

The searches for fire, firy, burn, etc. would be accomplished by
calling a routine called retrieve().  Retrieve takes three arguments,
the last of which may be omitted:

        retrieve(pattern, filename, invert_search)

The first argument should be a string containing a
regular-expression-based pattern, such as

        fir(y|e|iness)|flam(e|ing)|burn.*?

Note that the pattern must match words IN THEIR ENTIRETY.  So, for
instance, "fir[ie]" would not catch "firiness," but rather only
"fire."  Likewise, if you want every string beginning with the
sequence "burn," the string "burn" will not work.  Use "burn.*"
instead.  The filename argument supplies retrieve() with the name of
the original text file.  The last argument, if nonnull, inverts the
sense of the search (a la egrep -v).  In the case of the fire words
mentioned above, one would invoke retrieve() as follows:

        hits1 := retrieve("fir(y|e|iness)|flam(e|ing)|burn.*?", "kjv")

For the divine names, one would do something along these lines:

        hits2 := retrieve("god|lord", "kjv")

        Having finished the basic word searches, one would then
perform a set intersection on them.  If we are looking for fire words
that occur at most one verse away from a divine name, then we would
specify 1 as our range (as opposed to, say, zero), and the verse as
our unit.
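The two word searches and the within-one-verse intersection described above can be modeled in a few lines of Python. This is a minimal sketch, not the Icon package, and it makes several simplifying assumptions. It searches a hypothetical in-memory list of (location, text) pairs instead of an indexed file. The names toy_retrieve() and toy_range_and() are illustrative stand-ins for retrieve() and r_and(), and the exact return value of r_and() is not described here. Finally, Python's re syntax is assumed to behave like the package's regular expressions for these particular patterns.

```python
import re

# Hypothetical toy text: (location, verse text) pairs in reading order,
# where a location is a (book, chapter, verse) tuple.
VERSES = [
    (("Toy", 1, 1), "The lamp was lit at evening."),
    (("Toy", 1, 2), "And the LORD spake unto him."),
    (("Toy", 1, 3), "A fire was kindled in the camp."),
    (("Toy", 2, 1), "The people went out to the field."),
    (("Toy", 2, 2), "The bush was burning, yet not consumed."),
]

WORD = re.compile(r"[A-Za-z]+")


def toy_retrieve(pattern, verses, invert=False):
    """Return locations containing a word that the pattern matches in its
    entirety (case-insensitively); invert=True returns the other
    locations, a la egrep -v."""
    hits = set()
    for loc, text in verses:
        found = any(re.fullmatch(pattern, word, re.IGNORECASE)
                    for word in WORD.findall(text))
        if found != invert:
            hits.add(loc)
    return hits


def toy_range_and(hits1, hits2, verses, distance):
    """Return the locations from either hit set that lie within `distance`
    verses (in reading order, same book) of a location in the other set."""
    position = {loc: i for i, (loc, _) in enumerate(verses)}
    result = set()
    for a in hits1:
        for b in hits2:
            if a[0] == b[0] and abs(position[a] - position[b]) <= distance:
                result.update((a, b))
    return result


hits1 = toy_retrieve("fir(y|e|iness)|flam(e|ing)|burn.*?", VERSES)
hits2 = toy_retrieve("god|lord", VERSES)
hits3 = toy_range_and(hits1, hits2, VERSES, 1)
print(sorted(hits3))                 # [('Toy', 1, 2), ('Toy', 1, 3)]
print(toy_retrieve("burn", VERSES))  # set(): "burn" alone misses "burning"
```

re.fullmatch() gives the whole-word behavior described above, which is why "burn" finds nothing while "burn.*?" catches "burning". The sketch measures distance by position in reading order, so the last verse of one chapter and the first verse of the next count as one verse apart.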
The utility you would use to carry out the search is r_and().
R_and() would be invoked as follows:

        hits3 := r_and(hits1, hits2, "kjv", 3, 1)

The last two arguments, 3 and 1, specify the unit, field 3 (the
"verse" field), and the range, 1 verse.

        To display the text for your "hit list" (hits3 above), you
would call bitmap_2_text():

        every write(!bitmap_2_text(hits3, "kjv"))

Bitmap_2_text converts the location designators contained in hits3
into actual text.

        The three basic functions mentioned above, retrieve(), r_and(),
and bitmap_2_text(), are contained in three distinct files:
retrieve.icn, retrops.icn, and bmp2text.icn, respectively.  Other
useful routines are included in these files, and also in
whatnext.icn.  If you are planning on writing a retrieval engine for
serious work of some kind, you would probably want to construct a mini
interpreter, which would convert strings typed in by the user at
run-time into internal search and retrieval operations.

        Note that I have included no routine to parse or expand
human-readable input (the nature of which will naturally vary from
text to text).  Again, using the Bible as our hypothetical case, it
would be very useful to be able to ask for every passage in, say,
Genesis chapters 2 through 4, and to be able to print these to the
screen.  Doing this would require a parsing routine to break down the
references and map them to retrieve-internal format.  The routine
would then have to generate all valid locations from the minimum value
in chapter 2 to the maximum in chapter 4.  See the file whatnext.icn
for some aids in accomplishing this sort of task.


--------


Step 4: Compiling and Running Your Program

        Assuming you have written a search/retrieval program using the
routines contained in retrieve.icn, retrops.icn, bmp2text.icn, and
whatnext.icn, you would now be ready to compile it.
In order to function properly, these routines would need to be linked
with initfile.icn and indexutl.icn.  Specific dependencies are noted
in the individual files in case there is any confusion.

        If your program makes significant use of this package, you
probably should not worry about the exact dependencies, though.  Just
link everything in together, and worry about what isn't needed after
you have fully tested your program:

        icont -o yourprog yourprog.icn initfile.icn indexutl.icn \
                retrieve.icn retrops.icn bmp2text.icn binsrch.icn


--------


Problems, bugs:

        This is really an early beta release of the retrieve package.
I use it for various things.  For instance, I recently retrieved a
text file containing informal reviews of a number of Science Fiction
works.  My father likes SciFi, and it was close to Father's Day, so I
indexed the file and performed cross-referenced searches for words
like "very good," "brilliant," and "excellent," omitting authors my
father has certainly read (e.g. Herbert, Asimov, etc.).  I also had
occasion to write a retrieval engine for the King James Bible (hence
the many examples from this text), and to construct a retrieval
package for the Hebrew Bible, which I am now using to gather data for
various chapters of my dissertation.  I'm happy, incidentally, to hand
out copies of my KJV retrieval program.  It's a clean little program
that doubtless many would find useful.  The Hebrew Bible retrieval
package I'll hand out as well, but only to fully competent Icon
programmers who feel comfortable with Hebrew and Aramaic.  This latter
retrieval package is a much less finished product, and would almost
certainly need to be hacked to work on platforms other than what I
have here at home (a Xenix/386 setup with a VGA).

        In general, I hope that someone out there will find these
routines useful, if for no other reason than that it will mean that I
get some offsite testing.
Obviously, the whole package could have been written/maintained in C
or something that might offer much better performance.  Doing so
would, however, have entailed a considerable loss of flexibility, and
would have required a lot more time on my part.  Right now, the
retrieve package occupies about 60k of basic source files, probably
half of which consists of comments.  When compiled together with a
moderate-size user interface, the total package typically comes to
about 150k.  In-core size typically runs about 350k on my home
machine here (a Xenix/386 box), with the basic run-time interpreter
taking up a good chunk of that space all on its own.  It's not a small
package, but I've found it a nice base for rapid prototyping and
development of small to medium-size search and retrieval engines.

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer
SHAR_EOF
echo 'File README.rtv is complete' &&
true || echo 'restore of README.rtv failed'
rm -f _shar_wnt_.tmp
fi
rm -f _shar_seq_.tmp
echo You have unpacked the last part
exit 0
-- 
   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer