════════════════════════════════════════════════════════════════════════
WordStat v.3.1 1 June 1994
════════════════════════════════════════════════════════════════════════
Word Statistician - ver.3.1
Copyright (c)1994, Bob Rinker - All Rights Reserved
by Bob Rinker
Bitnet: rrinker@fccj
Internet: rrinker%fccj.bitnet@uga.cc.uga.edu
WORDSTAT is a special-purpose text processing program. It reads an
ASCII file and produces certain statistics about the file. Specifically,
it can produce a list of all the unique words in the file and the number
of times each occurs. This list can be presented in lexicographic
order, in order of frequency of occurrence (most frequent first), and in
order by length of word (longest first). The program can also produce a
count of the number of times each letter in the alphabet occurs in the
file.
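The three word listings and the letter count described above can be sketched in Python. This is only an illustration of the idea; WORDSTAT's own source is not part of the distribution. The function names, and the rule that ties are broken alphabetically, are assumptions.

```python
from collections import Counter

def letter_counts(text):
    """Count how often each letter a-z occurs, ignoring case."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")

def word_listings(counts):
    """Return the three listings for a mapping of word -> occurrences.

    Ties in the frequency and length listings are broken alphabetically;
    that tie-breaking rule is an assumption, not documented WORDSTAT behavior.
    """
    alphabetical = sorted(counts.items())
    by_frequency = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))  # most frequent first
    by_length = sorted(counts.items(), key=lambda wc: (-len(wc[0]), wc[0]))  # longest first
    return alphabetical, by_frequency, by_length

counts = Counter(["the", "cat", "the", "giraffe"])
alphabetical, by_frequency, by_length = word_listings(counts)
print(alphabetical)   # [('cat', 1), ('giraffe', 1), ('the', 2)]
print(by_frequency)   # [('the', 2), ('cat', 1), ('giraffe', 1)]
print(by_length)      # [('giraffe', 1), ('cat', 1), ('the', 2)]
```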
When looking at words, case is ignored. Strings of characters that
start with numerals or with most non-alphabetic characters are ignored.
However, words immediately preceded by a left parenthesis [(] or a
double quote mark ["] WILL be recognized. Hyphenated words that appear
entirely on one line are counted as a single word. If a word is
hyphenated across a line break, its two parts are treated as two
separate words.
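The word-recognition rules above can be approximated as follows. This is a sketch under stated assumptions: words are separated by whitespace, a single leading "(" or double quote is dropped, every other non-letter start causes the token to be ignored (the documentation says only "most" such characters), and trailing punctuation is stripped. The function name words_in_line is hypothetical.

```python
import string

def words_in_line(line):
    """Split one line of text into lower-case words using the rules above."""
    words = []
    for token in line.lower().split():
        # A word may be immediately preceded by "(" or a double quote.
        if token[0] in '("':
            token = token[1:]
        # Tokens starting with a digit or other non-letter are ignored.
        if not token or not ("a" <= token[0] <= "z"):
            continue
        # Strip trailing punctuation such as , . ) " and a line-final hyphen;
        # hyphens inside the token (e.g. "well-known") are kept.
        words.append(token.rstrip(string.punctuation))
    return words

print(words_in_line('A well-known (example), "quoted" 42 times: hyphen-'))
# ['a', 'well-known', 'example', 'quoted', 'times', 'hyphen']
print(words_in_line("ated words."))
# ['ated', 'words']
```

Because each line is processed on its own, "hyphen-" at the end of one line and "ated" at the start of the next come out as two separate words, matching the behavior described for hyphenation over a line break.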
The number of unique words WORDSTAT can handle is not limited by
available DOS memory. As it processes your file, it creates temporary
files on your hard disk and swaps data out to them, so the number of
unique words is limited only by the amount of free space on your drive.
For this reason, WORDSTAT is NOT designed to be run on systems without
a hard drive.
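The documentation does not say how WORDSTAT organizes its temporary files. One common way to count words while holding only a bounded number of them in memory is an external sort-merge: fill a table, write it out sorted as a temporary "run" file, and finally merge the runs while adding up counts for equal words. The sketch below shows that general technique, not WORDSTAT's actual algorithm; the function names, the max_in_memory limit, and the "word count" output format are assumptions.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby

def read_run(path):
    """Yield (word, count) pairs from one sorted temporary run file."""
    with open(path) as f:
        for line in f:
            word, n = line.rstrip("\n").split("\t")
            yield word, int(n)

def count_words_on_disk(words, out_path, max_in_memory=10_000):
    """Write 'word count' lines, alphabetically, for an iterable of words.

    At most max_in_memory distinct words are held in RAM. When the table is
    full it is sorted and written to a temporary "run" file; afterwards the
    runs are merged and the counts for equal words are added together.
    """
    run_paths, runs = [], []
    table = Counter()

    def spill():
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        run_paths.append(path)
        with os.fdopen(fd, "w") as f:
            for word, n in sorted(table.items()):
                f.write(f"{word}\t{n}\n")
        table.clear()

    try:
        for w in words:
            table[w] += 1
            if len(table) >= max_in_memory:
                spill()
        if table:
            spill()
        runs = [read_run(p) for p in run_paths]
        with open(out_path, "w") as out:
            # heapq.merge streams the runs in word order without loading them whole.
            merged = heapq.merge(*runs)
            for word, group in groupby(merged, key=lambda wn: wn[0]):
                out.write(f"{word} {sum(n for _, n in group)}\n")
    finally:
        for r in runs:
            r.close()  # close any run file still open, e.g. after an error
        for p in run_paths:
            os.remove(p)  # delete the temporary files
```

With max_in_memory=2, the words b a b c a b are spilled as three small runs and merged back into "a 2", "b 3", "c 1". Disk space, not memory, then bounds how many distinct words can be counted, which is the trade-off described above.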
No particular installation is required to use WORDSTAT. The program
supports all video display modes and all text printers. The syntax for
invoking the program is the following:
WORDSTAT [-s | /s] [inputfile.ext] [outputfile.ext]
Typing WORDSTAT alone and pressing ENTER will start the program.
However, the command line may have up to three optional parameters.
The command "WORDSTAT" may be followed by at least one space and the
name of the input file to be analyzed. That may in turn be followed by
at least one space and the name to be used for the output file. After
the program starts, a menu screen will allow you to specify or change
filenames, as well as select desired options.
The -s or /s switch for "sound" (if selected) must precede any input or
output filenames. This switch will activate a series of "alert tones"
once WORDSTAT has completed all of its statistical searches. The
activation of this switch can be helpful when an extremely long document
is being analyzed. The default is no sound whatsoever.
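The command-line rules above (an optional sound switch that must come first, then an optional input filename, then an optional output filename) can be sketched as a small parser. This is a hypothetical helper, not part of WORDSTAT; accepting the switch in either case and rejecting extra parameters are assumptions.

```python
def parse_command_line(args):
    """Split WORDSTAT-style arguments into (sound, input_name, output_name).

    args excludes the program name itself, like sys.argv[1:].
    """
    sound = False
    # The sound switch is only recognized as the first parameter.
    if args and args[0].lower() in ("-s", "/s"):
        sound = True
        args = args[1:]
    if len(args) > 2:
        raise ValueError("at most an input and an output filename may follow")
    input_name = args[0] if args else None  # None: the menu must prompt for it
    output_name = args[1] if len(args) == 2 else "WORDSTAT.OUT"  # suggested default
    return sound, input_name, output_name

print(parse_command_line(["/s", "GENESIS.TXT"]))
# (True, 'GENESIS.TXT', 'WORDSTAT.OUT')
```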
If input or input-and-output filenames were specified on the command
line, these will already be entered for you in the menu. Otherwise, you
will be prompted for at least an input filename, without which the
program will not be able to function.
If no output filename was specified, the output name WORDSTAT.OUT will
be suggested. Should you desire a different output filename, use the
arrow keys or the ENTER key to highlight the filename, then type your
desired filename in place of WORDSTAT.OUT. Should you prefer no file
output in this category, change this entry to NONE.
Other options selectable on the menu screen are the following:
(a) echoing program output to the screen
(not recommended for lengthy files);
(b) producing the frequency count for individual letters
(trivial, but an option nevertheless);
(c) sorting the words by frequency; and
(d) sorting the words by length.
Any or all of these can be selected or deselected by moving the
highlight to the line describing the option and pressing the space bar
to toggle the option. When toggled to ON, the ( ) will appear as (X).
If you select the "sort by frequency" or "sort by word length" options,
you also will be able to select an output filename to receive the
specific output of those selections. If no output filename was
specified on the command line, the default filenames for these options
are WORDSTAT.FRQ and WORDSTAT.SIZ; otherwise, the specified output
filename will appear, followed by the extensions .FRQ and .SIZ in each
case.
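A sketch of how the suggested .FRQ and .SIZ names could be derived. It assumes that the extension of the specified output filename is replaced (so REPORT.OUT suggests REPORT.FRQ); the documentation does not spell out that detail. The standard ntpath module is used so DOS-style paths split correctly on any platform; the function name is hypothetical.

```python
import ntpath

def sort_output_names(output_name=None):
    """Suggest filenames for the "sort by frequency" and "sort by length" output."""
    if output_name is None:
        # No output filename on the command line: fixed defaults.
        return "WORDSTAT.FRQ", "WORDSTAT.SIZ"
    # Assumption: the original extension is replaced, not appended to.
    base, _ext = ntpath.splitext(output_name)
    return base + ".FRQ", base + ".SIZ"

print(sort_output_names("REPORT.OUT"))
# ('REPORT.FRQ', 'REPORT.SIZ')
```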
Should you desire to change any of the suggested filenames, these can be
edited as noted above: first use the arrow keys or the ENTER key to
highlight the name to be changed, then use the left and right arrow keys
as well as the backspace and delete keys to make the necessary changes.
If you do not desire any output disk files to be created, but would like
to view the output on-screen, select the desired options as instructed,
but type "NONE" in each of the filename entry sections. On-screen
display of the output from the selected categories requires toggling the
"Echo Output to Screen" menu selection to ON (X). Without such a
toggle, there will be NO output whatsoever from any category that has
"NONE" selected as the output filename.
Should the DEL or backspace key be used to completely blank out the
filename, "NONE" will be inserted automatically in that category. If
you do not want ANY sorting or display of the frequency or size options,
simply use the spacebar to deselect the menu option entirely rather than
change its filename entry to "NONE". Whenever an option is left ON with
"NONE" as its filename, its sort will ALWAYS be performed, whether or not
on-screen display or an output filename has been selected.
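The interaction between an option's checkbox, its filename entry, and the "Echo Output to Screen" setting can be summarized as a small decision function. This is a hypothetical restatement of the rules above, not WORDSTAT code; the function and parameter names are illustrative.

```python
def plan_category(option_on, filename, echo_on):
    """Return (sort_performed, file_written, shown_on_screen) for one option.

    option_on: whether "sort by frequency" or "sort by word length" is (X).
    filename:  the entry in that option's filename field.
    echo_on:   whether "Echo Output to Screen" is (X).
    """
    if not option_on:
        # Deselected with the spacebar: no sorting and no output at all.
        return False, False, False
    # A blanked-out entry is treated as "NONE".
    name = filename.strip().upper()
    is_none = name in ("", "NONE")
    # With the option ON the sort always runs, even if nothing is output.
    return True, not is_none, echo_on
```

For example, an option left ON with "NONE" and echo OFF gives (True, False, False): the sort runs, but its results go nowhere.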
All output files will be in ASCII form, viewable with any display
utility or word processor, as well as printable on any text printer.
NOTE: for lengthy files, it is NOT recommended that you view the
results on-screen, since there is no page-pause feature built in.
On-screen viewing is only a viable option when dealing with very
short text files.
When you have finished selecting filenames and options, press the F10
key to continue. The program will first check that the filenames are
in order, i.e., that the input file exists and that no two output files
have been given the same name. Should there be a problem with the files
as named, the program will indicate the nature of the problem and return
you to the options menu. You must then change the items on the menu and
try again, or press the escape key to exit the program.
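The two checks described above (the input file must exist, and no two output files may share a name) can be sketched as follows. Skipping "NONE" entries and comparing names case-insensitively, as DOS filenames are, are assumptions; the function name is hypothetical.

```python
import os

def check_filenames(input_name, output_names):
    """Return an error message, or None if the filenames pass both checks."""
    if not os.path.isfile(input_name):
        return f"Input file {input_name} not found"
    seen = set()
    for name in output_names:
        key = name.upper()  # DOS filenames are case-insensitive
        if key == "NONE":
            continue  # "NONE" means no file, so it cannot collide
        if key in seen:
            return f"Output file {name} is specified more than once"
        seen.add(key)
    return None
```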
As the program runs, a display will show the progress of its work. If
the -s or /s option has been selected, a series of musical tones will
announce the end of the process. A summary total will be displayed once
the program has finished.
WARNINGS AND CAUTIONS
WARNING: If the file to be analyzed is in the form of a list ALREADY in
alphabetical order, WORDSTAT may not function properly. The initial
function of the program is to create an alphabetically sorted list with
frequency counts. A list already sorted in alphabetical order may wreak
havoc on WORDSTAT, possibly locking up your computer and leaving lost
clusters on your disk after you are forced to reboot.
CAUTION: When naming output files, should you accidentally give the same
filename to two or more of the categories, an error message will result.
You must manually rename or eliminate one or more of the conflicting
filenames to resolve the problem.
COMMENTS AND SUGGESTIONS
Comments, bug reports, and suggestions for improvement are welcomed.
Please contact the author at the above Bitnet or Internet address.
════════════════════════════════════════════════════════════════════════
LICENSE AND WARRANTY
════════════════════════════════════════════════════════════════════════
Word Statistician - ver. 3.1
Copyright (c)1994, Bob Rinker - All Rights Reserved
WORDSTAT was authored and is copyrighted by Bob Rinker with all rights
being reserved.
This program is distributed as FREEWARE to individuals, and may be
freely copied and distributed to individuals and electronic bulletin
board systems by any means so long as the complete distribution package
is included without alteration or change.
The complete distribution package consists of the executable file
WORDSTAT.EXE as well as the file WORDSTAT.DOC, containing the program
documentation as well as this license agreement.
WORDSTAT may be distributed as part of another program or package, so
long as that program or package is also distributed as Freeware.
Shareware and commercial program authors who wish to include WORDSTAT as
part of their product must make prior arrangements with the author.
Shareware authors can expect the granting of a free license. Commercial
authors must expect a small royalty arrangement. For network, business,
organizational, or governmental use, contact the author for site license
rates. The author may be contacted via BITNET or the INTERNET at the
addresses given below.
Note that the bundled distribution of this package as part of a
shareware or commercial product does not preclude the distribution
of this package as a separate Freeware product.
WORDSTAT is offered as-is, with no actual or implied warranty. Users
run the program at their own risk. The author will not be responsible
for any damages or loss incurred by users of this program. Use of the
program constitutes acceptance by the user of these terms.
Bob Rinker
Bitnet: rrinker@fccj
Internet: rrinker%fccj.bitnet@uga.cc.uga.edu