Text File | 1995-09-26 | 6KB | 128 lines
Patricia
---===================================================---
Practical Algorithm To Retrieve Information Coded In Alphanumeric
Overview
--------
This implementation is an enhanced version of D.R. Morrison's Patricia as
described in [1]. Some of the features are:
- Search for arbitrary text in AmigaGuide and ASCII files.
- The result is an AmigaGuide file with links to all files or AmigaGuide
  nodes in which the search text is found, with occurrence numbers for each
  entry.
- The text passages found can be highlighted.
- Text search is based on a precalculated database, so search is very fast.
- With a suitable database all occurrences (even as subwords) of every word
  are found. However, the range of words added to the database can be
  reduced on database creation to decrease the database size.
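
To illustrate how a precalculated index can find a word even when it occurs
as a subword, here is a minimal sketch in Python. It is not the author's data
structure: a sorted list of word suffixes stands in for the Patricia tree,
and the names build_index, find and min_len are hypothetical. The min_len
filter only mimics the idea of reducing the range of indexed words.

```python
import bisect
import re

def build_index(text, min_len=1):
    """Return a sorted list of (suffix, offset) pairs for every word suffix."""
    entries = []
    for match in re.finditer(r"\w+", text.lower()):
        word, start = match.group(), match.start()
        if len(word) < min_len:
            continue  # hypothetical filter: short words are left out to save space
        for i in range(len(word)):
            # Each suffix is stored with the text offset where it begins.
            entries.append((word[i:], start + i))
    entries.sort()
    return entries

def find(index, needle):
    """Return sorted text offsets where needle occurs inside an indexed word."""
    needle = needle.lower()
    # (needle,) sorts before every (needle..., offset) pair, so this is the
    # first entry whose suffix could start with needle.
    pos = bisect.bisect_left(index, (needle,))
    hits = []
    for suffix, offset in index[pos:]:
        if not suffix.startswith(needle):
            break  # matching suffixes are contiguous in the sorted list
        hits.append(offset)
    return sorted(hits)
```

With the text "Meeting Pearls pearl", find(build_index(text), "earl") reports
both occurrences, although "earl" is never a word of its own. Building the
index with min_len=6 drops "pearl", so only the occurrence inside "Pearls"
remains, which is the kind of incompleteness a reduced database trades for
size.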
PSearch documentation
---------------------
PSearch can be started from Workbench or from the Shell and supports all OS
versions >= 1.2. From OS 2.04 on, the Shell arguments are:
  DataBases/A/M,Search/K,P=FilePattern/K,NOH=NoHighlight/S,Profile/S,
  BaseDir/K,TempDir/K,GuideViewer/K,TextViewer/K,PHighPath/K
DataBases   - a list of Patricia databases to search in
Search      - text(s) that PSearch will search for (see below)
FilePattern - only files in the database matching this pattern are searched
NoHighlight - don't highlight found text passages
Profile     - print some statistics while searching
BaseDir     - path the searched files are located in;
              usually saved in the database itself on creation
TempDir     - used for temporary files and for highlighted texts,
              default: "T:"
              Should you run out of RAM, use a directory on a hard drive or
              search with "NoHighlight".
GuideViewer - AmigaGuide viewer, default: "AmigaGuide []"
              Attention: To show the right node in a highlighted document,
              "AmigaGuide xxx Document yyy" is always used!
TextViewer  - used to show ASCII texts, default: "More []"
PHighPath   - the path PHighlight is located in
              Can be omitted if PHighlight is in the search path or in the
              current directory when starting PSearch.
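
The argument template above uses the usual AmigaDOS modifiers: /A marks a
required argument, /M one that accepts several values, /K one whose keyword
must be given, and /S a switch. As a reading aid, the following Python sketch
splits such a template into its parts. The function name parse_template and
the dictionary layout are hypothetical; this is not how PSearch or the OS
parses arguments internally.

```python
TEMPLATE = ("DataBases/A/M,Search/K,P=FilePattern/K,NOH=NoHighlight/S,Profile/S,"
            "BaseDir/K,TempDir/K,GuideViewer/K,TextViewer/K,PHighPath/K")

def parse_template(template):
    """Split an AmigaDOS-style argument template into option descriptions."""
    options = []
    for item in template.split(","):
        names, *modifiers = item.strip().split("/")
        aliases = names.split("=")
        options.append({
            "name": aliases[-1],           # full keyword, e.g. "FilePattern"
            "aliases": aliases[:-1],       # abbreviations, e.g. ["P"]
            "required": "A" in modifiers,  # /A: argument must be given
            "multiple": "M" in modifiers,  # /M: accepts several values
            "keyword": "K" in modifiers,   # /K: keyword must precede the value
            "switch": "S" in modifiers,    # /S: boolean switch
        })
    return options

for option in parse_template(TEMPLATE):
    flags = [key for key in ("required", "multiple", "keyword", "switch")
             if option[key]]
    print(option["name"], option["aliases"], flags)
```

Read this way, DataBases is the only required argument and may list several
databases, while NoHighlight and Profile are plain switches.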
With OS <= 1.3, Unix-style argument parsing is used and some features are
disabled: GUI, profiling and pattern matching.
  PSearch -s Search -n -b BaseDir -t TempDir -g GuideViewer
          -v TextViewer -h PHighPath - DataBase1 DataBase2 ...
                                    ^^^
                                    Please mind the dash!
  -sbtgvh - the same as above
  -n      - NoHighlight
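
The role of the lone dash can be sketched as follows: everything before it
is an option, everything after it is a database name. This Python sketch only
models the command line shown above. The function name parse_unix_args and
its error handling are assumptions, not the behaviour of the real PSearch.

```python
# Options that take a value, mapped to the argument names used above.
VALUE_OPTIONS = {"-s": "Search", "-b": "BaseDir", "-t": "TempDir",
                 "-g": "GuideViewer", "-v": "TextViewer", "-h": "PHighPath"}

def parse_unix_args(argv):
    """Parse Unix-style PSearch arguments (without the program name)."""
    settings = {"NoHighlight": False, "DataBases": []}
    args = iter(argv)
    for arg in args:
        if arg == "-":
            # The lone dash ends the options; the rest are database names.
            settings["DataBases"] = list(args)
            break
        if arg == "-n":
            settings["NoHighlight"] = True
        elif arg in VALUE_OPTIONS:
            value = next(args, None)
            if value is None:
                raise ValueError("missing value for " + arg)
            settings[VALUE_OPTIONS[arg]] = value
        else:
            # Assumption: anything else before the dash is rejected.
            raise ValueError("unknown option: " + arg)
    return settings

print(parse_unix_args(["-s", "Meeting Pearls", "-n", "-", "DB1", "DB2"]))
```

Without the dash, a parser of this kind has no way to tell where the options
end and the list of databases begins.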
PHighlight is used by PSearch to highlight text passages and should not be
called manually. It will be found if it is in the search path, if it is in
the current directory when starting PSearch, or if you give its full path.
Search Text
-----------
PSearch can search for an arbitrary number of text passages separated by
" | ". The spaces around the "|" are necessary; without them the "|" is
treated as part of the adjacent word. Each passage can consist of several
words. A passage is only found if its words are separated by the same number
of whitespace characters in the search text and in the files being searched,
i.e. "Meeting Pearls" (one space) is different from "Meeting  Pearls" (two
spaces). A carriage return, a space and a tab each count as one whitespace
character.
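
The matching rules above can be modelled with a short Python sketch. It
splits the search text at " | " and turns each passage into a regular
expression in which every whitespace character matches exactly one whitespace
character of any kind. The function names are hypothetical, and PSearch
itself works on its database rather than on regular expressions. Treating
both "\n" and "\r" as line ends is also an assumption.

```python
import re

WHITESPACE = " \t\n\r"

def split_alternatives(search):
    """Split a search text into passages; only ' | ' with spaces separates."""
    return search.split(" | ")

def passage_regex(passage):
    """Build a regex in which each whitespace matches one whitespace of any kind."""
    parts = []
    for ch in passage:
        if ch in WHITESPACE:
            parts.append(r"[ \t\n\r]")    # exactly one whitespace character
        else:
            parts.append(re.escape(ch))   # everything else matches literally
    return re.compile("".join(parts))

for passage in split_alternatives("Meeting Pearls | a|b"):
    print(repr(passage))
```

In this model the passage "Meeting Pearls" matches "Meeting" and "Pearls"
separated by a single tab or line break, but not when they are separated by
two spaces, and "a|b" stays a single passage because the "|" has no spaces
around it.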
A text might not be found if it contains a word that is either too short
or occurs too often in the text. You are warned about this in the result,
but there is also a workaround. Assume you search for "dummy-name has
dummy-verb". "has" is probably not stored in the database, so this text
might not be found. Instead, you can search for "dummy-name     dummy-verb"
(three spaces instead of "has"). This will find more occurrences, including
unwanted ones such as "dummy-name had dummy-verb".
Limitations
-----------
If the range of words stored in the database is intentionally limited or an
error occurred while searching, the search might not be complete. You are
warned about this in the resulting AmigaGuide document, which also gives a
detailed explanation. All inexact occurrence numbers have a ">=" in front of
them.
If a word in an AmigaGuide document is partly written in another text style,
then you can only search for each part, not for the whole word. If you
search for a sequence of words and in the text there are format commands
between some of these words, then PSearch will not find this sequence,
because it cannot distinguish between spaces and format commands in the
file.
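
The effect of format commands can be illustrated with a small Python sketch.
It assumes that every AmigaGuide command of the form @{...} acts like a
single whitespace character when a file is indexed. This is only one
plausible reading of the limitation described above, and the function name
searchable_text is hypothetical.

```python
import re

# Matches inline AmigaGuide commands such as @{b}, @{ub}, @{i} or @{ui}.
FORMAT_COMMAND = re.compile(r"@\{[^}]*\}")

def searchable_text(guide_line):
    """Replace each format command by one space (assumed indexing behaviour)."""
    return FORMAT_COMMAND.sub(" ", guide_line)

word = searchable_text("Pat@{b}ricia@{ub}")
print(repr(word))    # the word is split into two parts

phrase = searchable_text("Meeting @{i}Pearls@{ui}")
print(repr(phrase))  # an extra separator appears between the two words
```

Under this assumption "Patricia" can no longer be found as a whole word,
only "Pat" and "ricia", and the phrase "Meeting Pearls" with one space no
longer matches because the format command adds a second separator.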
Distribution
------------
This distribution includes only the programs necessary to search in an
existing database (PSearch and PHighlight). They can be distributed freely.
The program PCreateDB, which is used to create a Patricia database, is not
included and must not be distributed without the author's permission. All
programs and text files are © 1995 by Patrick Ohly.
If you want to use Patricia to create and distribute your own databases ask
the author for permission and conditions. Angela Schmidt is herewith given
the permission to create databases for the Meeting Pearls III and to include
PSearch, PHighlight and this documentation on that CD.
The Author
----------
Patrick Ohly
Weechstr. 1, WG E0/1
76131 Karlsruhe
Germany
Tel.: +49 721 615662
eMail: patrick.ohly@stud.uni-karlsruhe.de
IRC: Irish@AmigaGer
References
----------
[1] R. Sedgewick, Algorithmen, Addison-Wesley 1992
[2] D. E. Knuth, The Art of Computer Programming,
    Volume 3: Sorting and Searching, Addison-Wesley 1975