SPELL V2.0 DOCUMENTATION
Michael C. Adler
December 22, 1982
(C) 1982 Michael C. Adler
This program has been released into the public domain by
the author. It may neither be sold for profit nor
included in a sold software package without permission of
the author.
The first SPELL using this dictionary was probably written
by Ralph Gorin at Stanford. It was transported to MIT by
Wayne Mattson. Both the program at MIT and the dictionary
were most recently revised by William Ackerman at MIT.
Section 5 of this document was copied from portions of Mr.
Ackerman's documentation.
Thanks to all for the effort spent designing the
dictionary!
Spell is a program, written for Z80 processors running
CP/M, designed to detect misspellings in a document.
1. USING SPELL
The minimum configuration of SPELL requires the files
SPELL.COM and DICT.DIC (the main dictionary). At the time of
execution, DICT.DIC must be on either the default drive or drive
A:.
The name of the file to be corrected must be included on
the command line that is used to invoke spell. If a drive name
is specified as a second file name, output is directed to the
specified drive. Thus,
SPELL useless.doc
will check the file "useless.doc" and direct output to the
default drive and
SPELL b:useless.doc c:
will check the file "b:useless.doc" and direct output to disk c.
Spell will check the input file for errors by comparing
each word in the file to the dictionary. If a word is not
found, a null (ascii 0) is placed before the word. To change
this marking character, see section 4, PATCHING SPELL. If a
backup version (.BAK file type) of the input file exists, it
will be deleted. The input file will be renamed to a backup
file and the checked file will replace the input file.
2. USER DICTIONARIES
A user dictionary is a list of correct words that can be
loaded by SPELL to augment the main dictionary. Words such as
proper nouns can be placed in user dictionaries to inhibit error
marking. User dictionary files may be formatted in any way that
the user desires, as long as words are delimited by non-alphabe-
tic characters.
SPELL will automatically search for the user dictionary
SPELL.DIC on the default drive and on drive A: if it is not on
the default one. Its contents are then loaded and temporarily
added to the dictionary. It must be loaded again to be included
in subsequent executions of SPELL.
SPELL will also automatically search for d:file.UDC, where
file is the name of the file being corrected and d: is the drive
on which file is found. If found, it is also loaded and tempo-
rarily augments the dictionary. Thus, users may create separate
dictionaries for each text file being corrected. After locating
d:file.UDC, SPELL will search for the file d:file.ADD. This file is
created by WordStar's ^QL command (see section 3) and is not an
ASCII file. d:file.ADD contains commands generated by WordStar
to include specific words in the user dictionary associated with
d:file. SPELL will temporarily place all of the words in it in
the dictionary and will also save the words by copying them into
d:file.UDC.
It is possible to load additional user dictionaries by
specifying them on the SPELL command line. A list of user dic-
tionaries must be preceded by a dollar sign. A dictionary is
specified by a file name and an optional drive name. If no
drive is specified, the default drive is searched and then
drive A: is checked. Extensions are ignored and default to
.DIC. Hence, the command line:
SPELL useless.doc b: $dict1 c:dict2 dict3.fun
would correct useless.doc and direct output to drive B:. User
dictionary DICT1.DIC would be loaded from the default drive or
drive A:, dictionary DICT2.DIC would be loaded from drive C:,
and DICT3.DIC would be loaded from the default drive or drive
A:. Notice that the extension .fun was ignored.
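The command line rules above can be illustrated with a short
Python sketch. It is not taken from SPELL itself; the function
name parse_spell_args and the returned tuple layout are
hypothetical, and it assumes the dollar sign may either touch the
first dictionary name or stand alone. A drive of None means
"search the default drive, then drive A:".

```python
def parse_spell_args(args):
    """Split a SPELL-style argument string into its parts (illustrative only)."""
    tokens = args.split()
    input_file = tokens[0]
    output_drive = None
    dictionaries = []
    in_dict_list = False
    for token in tokens[1:]:
        if token.startswith("$"):
            # A dollar sign starts the list of user dictionaries.
            in_dict_list = True
            token = token[1:]
            if not token:
                continue
        if in_dict_list:
            drive, sep, name = token.rpartition(":")
            base = name.split(".")[0]  # any extension is ignored
            dictionaries.append(
                (drive.upper() + ":" if sep else None, base.upper() + ".DIC")
            )
        else:
            # A drive name given as the second file name selects the output drive.
            output_drive = token.upper()
    return input_file, output_drive, dictionaries
```

Applied to the example command line, the sketch reports output
drive B: and the dictionaries DICT1.DIC, DICT2.DIC (on C:) and
DICT3.DIC.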
3. WordStar's ^QL COMMAND
Files checked by SPELL can be corrected using WordStar. In
response to ^QL, the user is asked which portions of the file
should be searched. WordStar will then position the cursor on
the first marked word and print a menu offering F (Fix word), B
(Bypass word), I (Ignore word), D (Add to dictionary), and S
(Add to supplemental dictionary). The F option deletes the
error marker and returns to the WordStar main menu, allowing
the user to correct the word. B will leave the marker in place
and will search for the next marked word. In this
implementation of SPELL, the I, D and S options all perform the
same function (although I is easier to use because WordStar asks
no question). If any of these options (I, D, S) is chosen, the
mark will be removed and the word will be added to file.ADD.
Thus, choosing these options informs SPELL that the word is cor-
rect and should not be marked again. The D and S options do not
add the word to SPELL's main dictionary because the compression
method used to store the dictionary is too complicated to allow
such modification efficiently. After any option except F is
chosen, WordStar will automatically search for the next marked
word.
4. PATCHING SPELL
It is not necessary to recompile SPELL to change the
character that marks misspelled words. The byte at 0103H
contains the marking character. Byte 0104H contains the
"default disk" [1 for A:, 2 for B:, etc.]. In the distribution
version of SPELL, the bytes are 0 and 1 [default is NULL and
A:]. EDFILE, PATCH, DDT, or another debugger can be used to
change the bytes at 0103H and 0104H. Hex 23 ('#') is a tolerable
marking character for FinalWord.
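For readers who prefer to patch a copy of SPELL.COM on another
machine, the following Python sketch shows the arithmetic
involved. It relies on the fact that CP/M loads .COM files at
0100H, so address 0103H is byte 3 of the file and 0104H is byte
4. The function name patch_spell is hypothetical and the sketch
works on an in-memory copy of the file rather than on disk.

```python
COM_LOAD_ADDRESS = 0x0100  # CP/M loads .COM files at 0100H


def patch_spell(image, marker, default_disk):
    """Return a copy of a SPELL.COM image with the two option bytes changed."""
    if not 0 <= marker <= 0xFF:
        raise ValueError("marker must fit in one byte")
    if not 1 <= default_disk <= 16:
        raise ValueError("default disk is 1 for A:, 2 for B:, and so on")
    patched = bytearray(image)
    patched[0x0103 - COM_LOAD_ADDRESS] = marker        # marking character
    patched[0x0104 - COM_LOAD_ADDRESS] = default_disk  # "default disk"
    return bytes(patched)
```

A typical use would read SPELL.COM in binary mode, call
patch_spell(data, ord("#"), 1), and write the result to a new
file, keeping the original as a backup.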
5. PROGRAM AND DICTIONARY CHARACTERISTICS
5.1 Word identification algorithm
A word is any uninterrupted sequence of letters and
apostrophes, which does not begin or end with an apostrophe.
Any punctuation, digit, or control character separates words.
Any word consisting of a single letter, or any word more than
40 letters long, is considered to be correctly spelled.
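The word rule above can be approximated with a regular
expression. The Python sketch below is only an illustration; the
names WORD_RE and words_to_check are hypothetical, the length
limit is assumed to count every character of the word, and the
conversion to upper case is an assumption based on the upper-case
examples in this document.

```python
import re

# Letters, optionally joined by apostrophes, never starting or ending with one.
WORD_RE = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")


def words_to_check(text):
    """Return the words that would actually be looked up in the dictionary."""
    result = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        # Single letters and very long words are assumed correct.
        if len(word) == 1 or len(word) > 40:
            continue
        result.append(word.upper())
    return result
```

For example, in the text "Don't panic: a 'quoted' word, x2y."
the sketch looks up DON'T, PANIC, QUOTED and WORD, and skips the
single letters a, x and y.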
5.2 Dictionary policy
It is the policy of this program to contain only one
spelling of a word, even if ordinary dictionaries show two or
more "acceptable" spellings. Hence, the dictionary contains
LABELED and LABELING, but not LABELLED or LABELLING, even
though all four are actually acceptable. The intention is to
enforce uniformity within each document. The author apologizes
for the restriction on creativity and diversity that this
necessitates, but believes that it is the best policy for this
program.
The dictionary contains many technical and computer terms
such as MICROPROGRAM and DEBUGGER, but does not contain extreme
jargon words such as CONTROLIFY or VALRET. The dictionary
contains no proper names other than names of countries and
states of the United States. The reason is that it would be
virtually impossible to contain all of the proper names that
commonly arise in normal use. Users should keep proper names
(and other correctly spelled words) that arise in their own work
in private dictionaries to avoid having to repeatedly tell SPELL
to accept them.
The dictionary is significantly smaller than that found in
other spelling checkers, such as the DEC TOPS-20 program. The
author believes that the larger dictionary would not reduce the
number of false misspelling indications by very much.
[Note: I believe that this dictionary is actually MUCH larger
than any dictionaries currently available for microcomputers.
-Michael]
5.3 Dictionary flags
Words in SPELL's main dictionary (but not the other dictio-
naries) may have flags associated with them to indicate the
legality of suffixes without the need to keep the full suffixed
words in the dictionary. The flags have "names" consisting of
single letters. Their meaning is as follows:
Let # and @ be "variables" that can stand for any letter.
Upper case letters are constants. "..." stands for any string
of zero or more letters, but note that no word may exist in the
dictionary which is not at least 2 letters long, so, for
example, FLY may not be produced by placing the "Y" flag on
"F". Also, no flag is effective unless the word that it
creates is at least 4 letters long, so, for example, WED may
not be produced by placing the "D" flag on "WE".
"V" flag:
...E --> ...IVE as in CREATE --> CREATIVE
if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
"N" flag:
...E --> ...ION as in CREATE --> CREATION
...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
"X" flag:
...E --> ...IONS as in CREATE --> CREATIONS
...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
"H" flag:
...Y --> ...IETH as in TWENTY --> TWENTIETH
if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
"Y" FLAG:
... --> ...LY as in QUICK --> QUICKLY
"G" FLAG:
...E --> ...ING as in FILE --> FILING
if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
"J" FLAG:
...E --> ...INGS as in FILE --> FILINGS
if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
"D" FLAG:
...E --> ...ED as in CREATE --> CREATED
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IED as in IMPLY --> IMPLIED
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ED as in CROSS --> CROSSED
or CONVEY --> CONVEYED
"T" FLAG:
...E --> ...EST as in LATE --> LATEST
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IEST as in DIRTY --> DIRTIEST
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#EST as in SMALL --> SMALLEST
or GRAY --> GRAYEST
"R" FLAG:
...E --> ...ER as in SKATE --> SKATER
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ER as in BUILD --> BUILDER
or CONVEY --> CONVEYER
"Z" FLAG:
...E --> ...ERS as in SKATE --> SKATERS
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ERS as in BUILD --> BUILDERS
or SLAY --> SLAYERS
"S" FLAG:
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IES as in IMPLY --> IMPLIES
if # .eq. S, X, Z, or H,
...# --> ...#ES as in FIX --> FIXES
if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
...# --> ...#S as in BAT --> BATS
or CONVEY --> CONVEYS
"P" FLAG:
if @ .ne. A, E, I, O, or U,
...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
if # .ne. Y, or @ = A, E, I, O, or U,
...@# --> ...@#NESS as in LATE --> LATENESS
or GRAY --> GRAYNESS
"M" FLAG:
... --> ...'S as in DOG --> DOG'S
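As an illustration of how the rules above read when written as
code, the following Python sketch expands a root word under the
"D" and "S" flags only. The function names are hypothetical and
the sketch is not derived from SPELL's source; it simply follows
the rules as stated, including the limits that a root must have
at least 2 letters and the result at least 4.

```python
VOWELS = "AEIOU"


def apply_d(root):
    """"D" flag: CREATE -> CREATED, IMPLY -> IMPLIED, CROSS -> CROSSED."""
    if root.endswith("E"):
        return root + "D"
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IED"
    return root + "ED"


def apply_s(root):
    """"S" flag: IMPLY -> IMPLIES, FIX -> FIXES, BAT -> BATS."""
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IES"
    if root[-1] in "SXZH":
        return root + "ES"
    return root + "S"


RULES = {"D": apply_d, "S": apply_s}


def expand(root, flag):
    """Return the suffixed word, or None when the flag has no effect."""
    if len(root) < 2:
        return None  # roots shorter than 2 letters cannot be in the dictionary
    word = RULES[flag](root)
    return word if len(word) >= 4 else None
```

With these definitions, expand("CONVEY", "D") gives CONVEYED and
expand("WE", "D") gives None, matching the WED example above.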
Note: The existence of a flag on a root word in the dictionary
is not by itself sufficient to cause SPELL to recognize the
indicated word ending. If there is more than one root for
which a flag will indicate a given word, only one of the roots
is the correct one for which the flag is effective; generally it
is the longest root. For example, the "D" rule implies that
either PASS or PASSE, with a "D" flag, will yield PASSED. The
flag must be on PASSE; it will be ineffective on PASS. This is
because, when SPELL encounters the word PASSED and fails to
find it in its dictionary, it strips off the "D" and looks up
PASSE. Upon finding PASSE, it then accepts PASSED if and only
if PASSE has the "D" flag. Only if the word PASSE is not in
the main dictionary at all does the program strip off the "E"
and search for PASS. Furthermore, some combinations of flags
are forbidden to allow for dense flag encoding to save space.
For example, only one of the "P", "J", or "V" flags may be on in
any one word.
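The lookup order described in this note can be sketched in
Python as follows. This is a simplified illustration limited to
words ending in ED; the function name accepts_ed_word and the
representation of the dictionary as a mapping from root word to
a set of flag letters are assumptions made for the example.

```python
def accepts_ed_word(word, dictionary):
    """Decide whether a word ending in ED is accepted via the "D" flag."""
    if word in dictionary:
        return True
    if not word.endswith("ED"):
        return False
    # Try the longest root first: strip "D", then strip "ED".
    for root in (word[:-1], word[:-2]):
        if root in dictionary:
            # The first root found decides; shorter roots are not consulted.
            return "D" in dictionary[root]
    return False
```

With PASSE and PASS both present, PASSED is accepted only when
the "D" flag is on PASSE; a "D" flag on PASS alone has no effect.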
6. SPELL INTERNALS
SPELL uses a number of temporary files during execution.
The file file.D$$ is the union of file.UDC and file.ADD. At the
end of execution, file.UDC and file.ADD are deleted and file.D$$
is renamed to file.UDC. The file file.$$$ is the output file.
At the end of execution, file.BAK is deleted, the input file is
renamed to file.BAK, and file.$$$ is renamed to the input file
name. Warning: if you do not have room on your disk for
file.BAK, file.DOC and file.$$$ at the same time, either use two
drives or delete file.BAK before you start.
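The end-of-run renaming of the text files can be pictured with
the Python sketch below, which also shows why the old backup, the
input file and file.$$$ may all exist on disk at the same time
during a run. The function name finish_run is hypothetical, and
the handling of file.UDC, file.ADD and file.D$$ is left out.

```python
import os


def finish_run(directory, name, ext):
    """Rotate the files at the end of a run: .BAK deleted, input kept as .BAK."""
    base = os.path.join(directory, name)
    source = base + "." + ext
    backup = base + ".BAK"
    output = base + ".$$$"
    if os.path.exists(backup):
        os.remove(backup)       # the old backup is deleted
    os.rename(source, backup)   # the input file becomes the new backup
    os.rename(output, source)   # the checked file takes the input file's name
```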
SPELL checks files in two passes over the input file. On
the first pass, the words in the file are sorted alphabetically
and duplicate words are eliminated. An attempt is then made to
search for the words in the dictionary. Words that are found
are flagged in memory. On the second pass of the input file,
SPELL determines whether each word was found by locating it in
memory. This method makes the operation of SPELL more efficient
because common words must be looked up only once and because the
dictionary can be searched sequentially, minimizing disk head
travel. If the whole file does not fit in memory on the first
pass, the input file is partitioned into sections small enough
to fit into memory and is then checked in a series of two-pass
operations until the entire file has been checked. On large
systems, even large text files are unlikely to fill memory, as
3000 distinct words should fit easily.
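The two-pass method can be sketched in Python as follows. This
is an illustration of the idea only, not SPELL's code: the
function name check_text is hypothetical, the dictionary is
assumed to be an already sorted list of upper-case words, suffix
flags are ignored, and the whole text is assumed to fit in
memory.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")


def check_text(text, sorted_dictionary, marker="\0"):
    """Return text with the marker placed before each word not found."""
    # Pass 1: collect the distinct words and sort them.
    unique = sorted({m.group().upper() for m in WORD_RE.finditer(text)})

    # Walk the sorted words and the sorted dictionary together, so the
    # dictionary is read once from start to end.
    found = set()
    entries = iter(sorted_dictionary)
    entry = next(entries, None)
    for word in unique:
        while entry is not None and entry < word:
            entry = next(entries, None)
        if entry == word:
            found.add(word)

    # Pass 2: re-read the text and mark every word that was not found.
    def mark(match):
        word = match.group()
        if len(word) == 1 or len(word) > 40 or word.upper() in found:
            return word
        return marker + word

    return WORD_RE.sub(mark, text)
```

For example, check_text("The quik brown fox", ["BROWN", "FOX",
"THE"], marker="#") returns "The #quik brown fox".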
7. DICTIONARY INTERNALS
The dictionary has been compressed, significantly, in order
to save space. Dictionary records are all 256 bytes long and
each record contains as many words as will fit. Individual
words are stored in the following code:
4 bits -- Number of characters to copy from the previous
word. Because the dictionary is stored in
alphabetical order, this saves a large number of
characters. This field is 0 at the beginning of
each record.
x * 5 bits -- Characters are stored in 5 bit code. There may be
any number of 5 bit characters. A character
string is terminated by the following field.
3 bits -- Set to 111 binary to indicate the end of the word.
Since 11100 binary is greater than 26, all
alphabetic characters can be stored without using
this combination.
4 bits -- Number of bits of flag data following the word.
The bit position of the flags has been ordered so
that the flags most frequently used are earliest.
Flags not stored are assumed to be off.
x bits -- Flag data. x is determined by the previous field.
Each bit represents one of the 14 suffix flags.
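To make the field layout above concrete, here is a small Python
sketch that packs and unpacks entries in that layout. Several
details are assumptions, because this document does not state
them: the 5 bit codes (A=1 through Z=26, apostrophe=27), the
order of the flag bits (FLAG_ORDER is invented for the example),
and the use of a string of '0' and '1' characters in place of
packed bytes. The 256 byte record boundaries are ignored.

```python
FLAG_ORDER = "SDGMRZYJNXTPVH"  # hypothetical bit order, most common first
CODES = {ch: i + 1 for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ'")}
LETTERS = {code: ch for ch, code in CODES.items()}


def encode(entries):
    """entries: sorted list of (word, set_of_flags) -> string of bits."""
    bits, prev = [], ""
    for word, flags in entries:
        # 4 bits: number of characters copied from the previous word.
        shared = 0
        while shared < min(len(prev), len(word), 15) and prev[shared] == word[shared]:
            shared += 1
        bits.append(format(shared, "04b"))
        # 5 bits per remaining character, then the 3 bit end mark.
        bits.extend(format(CODES[ch], "05b") for ch in word[shared:])
        bits.append("111")
        # 4 bit flag length, then the flag bits; trailing "off" flags are dropped.
        mask = "".join("1" if f in flags else "0" for f in FLAG_ORDER).rstrip("0")
        bits.append(format(len(mask), "04b"))
        bits.append(mask)
        prev = word
    return "".join(bits)


def decode(bits):
    """Inverse of encode: string of bits -> list of (word, set_of_flags)."""
    entries, pos, prev = [], 0, ""
    while pos < len(bits):
        shared = int(bits[pos:pos + 4], 2)
        pos += 4
        word = prev[:shared]
        # Codes 1..27 never begin with 111, so 111 can only be the end mark.
        while bits[pos:pos + 3] != "111":
            word += LETTERS[int(bits[pos:pos + 5], 2)]
            pos += 5
        pos += 3
        count = int(bits[pos:pos + 4], 2)
        pos += 4
        flags = {FLAG_ORDER[i] for i in range(count) if bits[pos + i] == "1"}
        pos += count
        entries.append((word, flags))
        prev = word
    return entries
```

In this sketch CREATOR stored after CREATE costs 4 + 2*5 + 3 + 4
bits plus its flag bits, because the first five letters are
copied from the previous word.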
8. MODIFYING THE MAIN DICTIONARY
The source for the main dictionary can currently be found
in the file "[MIT-XX]SRC:<WBA>SPELL.DCT". In order to make it
compatible with SPELL, all of the "/" characters that delimit
flags must be converted to "%" characters so that flags will be
considered earlier in the alphabet than apostrophes (DOG%S should be
before DOG'S). The file must then be sorted alphabetically. No
utilities are provided with SPELL to accomplish either of these
tasks. Without high capacity disk drives, you may find it
necessary to perform the above steps on a larger computer.
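On a larger computer, the two preparation steps might be done
with a few lines of Python such as the sketch below. The
function name prepare_source is hypothetical, and the sketch
assumes one entry per line and that the required order is plain
ASCII order, in which "%" sorts before the apostrophe and before
all letters.

```python
def prepare_source(lines):
    """Convert flag delimiters from "/" to "%" and sort the entries."""
    converted = [line.strip().replace("/", "%") for line in lines if line.strip()]
    # Python compares strings by character code, which is ASCII order here.
    return sorted(converted)
```

For example, the entries DOG'S, DOG/S, CAT/S and DOG come out as
CAT%S, DOG, DOG%S, DOG'S.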
Once a copy of the main dictionary has been placed on the
microcomputer, use the program DICCRE to create a dictionary.
Include the name of the source file on the DICCRE command line.
DICCRE will create the files DICT.DIC (compressed dictionary)
and SPELL0.MAC (pointer file to dictionary) ON THE DEFAULT DISK
DRIVE. When it has finished converting the input file to the
dictionary file, it will execute a warm boot if the output file
is on the same drive as the input file. However, if the output
file is not on the same disk, it will ask whether another input
file exists. This feature allows the user to put the source
file on two disks in case it does not fit on one. DICCRE will
combine them into one dictionary file. If no more files exist,
answer N to the question. If another file does exist, put the
disk with the new file in the input drive and type Y.
After the dictionary file has been created, it is necessary
to recompile SPELL with the new pointer file, SPELL0.MAC. If
your assembler does not support the INCLUDE statement, you will
have to replace the line INCLUDE SPELL0.MAC in the file
SPELL.MAC with the contents of SPELL0.MAC. After SPELL is
recompiled, be sure to use the correct copy of DICT.DIC with it
or you will obtain unpredictable results.
For more information about dictionaries, see the file:
[MIT-XX]SS:<WBA>DICT.LETTER
Good luck and happy hacking!
Michael Adler (MADLER@MIT-ML)
3 Sunny Knoll Terrace
Lexington, MA 02173