ISPELL(4)                                               ISPELL(4)


NNAAMMEE
       ispell - format of ispell dictionaries and affix files

DDEESSCCRRIIPPTTIIOONN
       _I_s_p_e_l_l(1)  requires  two files to define the language that
       it is spell-checking.  The first file is a dictionary con-
       taining  words  for  the  language,  and  the second is an
       "affix" file that defines the meaning of special flags  in
       the  dictionary.   The two files are combined by _b_u_i_l_d_h_a_s_h
       (see _i_s_p_e_l_l(1)) and written to a hash file  which  is  not
       described here.

       A raw ispell dictionary (either the main dictionary or your
       own personal dictionary) contains a list of words, one per
       line.  Each word may optionally be followed by a slash ("/")
       and one or more flags, which modify the root word as
       explained below.  Depending on the options with which ispell
       was built, case may or may not be significant in either the
       root word or the flags, independently.  Specifically, if the
       compile-time option CAPITALIZATION is defined, case is
       significant in the root word; if not, case is ignored in the
       root word.  If the compile-time option MASKBITS is set to a
       value of 32, case is ignored in the flags; otherwise case is
       significant in the flags.  Contact your system administrator
       or ispell maintainer for more information (or use the -vv
       flag to find out).  The dictionary should be sorted with the
       -f flag of sort(1) before the hash file is built; this is
       done automatically by munchlist(1), which is the normal way
       of building dictionaries.

       If the dictionary contains words that have string characters
       (see the affix-file documentation below), they must be
       written in the format given by the defstringtype statement in
       the affix file.  This will be the case for most non-English
       languages.  Be careful to use this format, rather than that
       of your favorite formatter, when adding words to a
       dictionary.  If you add words to your personal dictionary
       during an ispell session, they will automatically be
       converted to the correct format.  This feature can be used to
       convert an entire dictionary if necessary:

                   echo qqqqq > dummy.dict
                   buildhash dummy.dict affix-file dummy.hash
                   # Send each old word as an "*word" (add) command,
                   # then "#" to save the personal dictionary.
                   awk '{print "*" $0} END {print "#"}' old-dict-file \
                   | ispell -a -T old-dict-string-type \
                     -d ./dummy.hash -p ./new-dict-file \
                     > /dev/null
                   rm dummy.*

       The case of the root word controls the case of words accepted
       by ispell, as follows:

       (1)    If the root word appears only in lower case (e.g.,
              _b_o_b), it will be accepted in lower  case,  capital-
              ized, or all capitals.

       (2)    If   the   root  word  appears  capitalized  (e.g.,
              _R_o_b_e_r_t), it will not be accepted in all-lower case,
              but  will  be  accepted capitalized or all in capi-
              tals.

       (3)    If the root word appears  all  in  capitals  (e.g.,
              _U_N_I_X), it will only be accepted all in capitals.

       (4)    If the root word appears with a "funny" capitaliza-
              tion (e.g., _I_T_C_o_r_p), a word will be  accepted  only
              if it follows that capitalization, or if it appears
              all in capitals.

       (5)    More than one capitalization of  a  root  word  may
              appear  in  the  dictionary.   Flags from different
              capitalizations  are  combined   by   OR-ing   them
              together.

       Redundant capitalizations (e.g., _b_o_b and _B_o_b) will be com-
       bined by _b_u_i_l_d_h_a_s_h and by _i_s_p_e_l_l (for personal  dictionar-
       ies),  and  can be removed from a raw dictionary by _m_u_n_c_h_-
       _l_i_s_t.

       For example, the dictionary:

              bob
              Robert
              UNIX
              ITcorp
              ITCorp

       will accept bob, Bob, BOB, Robert, ROBERT, UNIX, ITcorp,
       ITCorp, and ITCORP, and will reject all others.  Some of the
       unacceptable forms are bOb, robert, Unix, and ItCorp.
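
       The capitalization rules above can be sketched in Python.
       This is only an illustration of the rules as stated here,
       assuming plain ASCII words with no flags or string
       characters; the function names accepts and
       dictionary_accepts are hypothetical and are not part of
       ispell.

```python
def accepts(root: str, word: str) -> bool:
    """Return True if `word` is an acceptable capitalization of `root`."""
    if word.lower() != root.lower():
        return False                      # not the same word at all
    if word == word.upper():
        return True                       # all capitals is always accepted
    if root == root.lower():
        # Rule 1: a lowercase root also accepts the capitalized form.
        return word == root or word == root.capitalize()
    # Rules 2-4: capitalized, all-caps, and "funny" roots must otherwise
    # be matched exactly.
    return word == root


def dictionary_accepts(roots, word: str) -> bool:
    """Rule 5: several capitalizations of one root may coexist."""
    return any(accepts(root, word) for root in roots)


ROOTS = ["bob", "Robert", "UNIX", "ITcorp", "ITCorp"]
```

       With ROOTS as above, dictionary_accepts(ROOTS, "ITCORP") is
       true and dictionary_accepts(ROOTS, "Unix") is false.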

       As mentioned above, root words in any dictionary may be
       extended by flags.  Each flag is a single alphabetic
       character, which represents a prefix or suffix that may be
       added to the root to form a new word.  For example, in an
       English dictionary the D flag can be added to bathe to make
       bathed.  Since each flag is represented as a single bit in
       the hashed dictionary, this results in significant space
       savings.  The munchlist script will reduce an existing raw
       dictionary by adding flags when possible.

       When a word is extended with an affix, the affix will be
       accepted only if it appears in the same case as the initial
       (prefix) or final (suffix) letter of the word.  Thus, for
       example, the entry UNIX/M in the main dictionary (M means
       add an apostrophe and an "s" to make a possessive) would
       accept UNIX'S but would reject UNIX's.  If UNIX's is legal,
       it must appear as a separate dictionary entry, and it will
       not be combined by munchlist.  (In general, you don't need
       to worry about these things; munchlist guarantees that its
       output dictionary will accept the same set of words as its
       input, so all you have to do is add words to the dictionary
       and occasionally run munchlist to reduce its size.)
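
       The case rule for suffixes can be illustrated with a small
       Python check.  This is a sketch of the rule as described
       above, not of ispell's actual implementation; the function
       name suffix_case_ok is hypothetical, and only suffixes are
       handled.

```python
def suffix_case_ok(stem: str, suffix: str) -> bool:
    """Check that the letters of `suffix` share the case of the stem's
    final letter.  Non-letters such as the apostrophe are ignored."""
    letters = [c for c in suffix if c.isalpha()]
    if stem[-1].isupper():
        return all(c.isupper() for c in letters)
    return all(c.islower() for c in letters)
```

       Under this reading, suffix_case_ok("UNIX", "'S") holds while
       suffix_case_ok("UNIX", "'s") does not.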

       As  mentioned,  the  affix  definition  file describes the
       affixes  associated  with  particular  flags.    It   also
       describes the character set used by the language.

       Although the affix-definition grammar is designed for a
       line-oriented layout, it is actually a free-format yacc
       grammar and can be laid out weirdly if you want.  Comments
       are started by a pound (sharp) sign (#), and continue to the
       end of the line.  Backslashes are supported in the usual
       fashion (\nnn, plus specials \n, \r, \t, \v, \f, \b, and the
       new hex format \xnn).  Any character with special meaning to
       the parser can be changed to an uninterpreted token by
       backslashing it; for example, you can declare a flag named
       "*" or ":" by writing flag \*: or flag \::.

       The grammar will be presented in a top-down fashion,  with
       discussion of each element.  An affix-definition file must
       contain exactly one table:

              _t_a_b_l_e     :    [_h_e_a_d_e_r_s] [_p_r_e_f_i_x_e_s] [_s_u_f_f_i_x_e_s]

       At least one of _p_r_e_f_i_x_e_s and _s_u_f_f_i_x_e_s is  required.   They
       can appear in either order.

              _h_e_a_d_e_r_s   :    [ _o_p_t_i_o_n_s ] _c_h_a_r_-_s_e_t_s

       The headers describe options global to this dictionary and
       language.  These include the character sets to be used and
       the  formatter, and the defaults for certain _i_s_p_e_l_l flags.

              _o_p_t_i_o_n_s : { _f_m_t_r_-_s_t_m_t | _o_p_t_-_s_t_m_t | _f_l_a_g_-_s_t_m_t | _n_u_m_-_s_t_m_t }

       The options statements define  the  defaults  for  certain
       ispell  flags  and for the character sets used by the for-
       matters.

              _f_m_t_r_-_s_t_m_t :    { _n_r_o_f_f_-_s_t_m_t | _t_e_x_-_s_t_m_t }

       A _f_m_t_r_-_s_t_m_t describes characters that have special meaning
       to  a  formatter.   Normally, this statement is not neces-
       sary, but some languages  may  have  preempted  the  usual
       defaults for use as language-specific characters.  In this
       case, these statements may be used to redefine the special
       characters expected by the formatter.

              _n_r_o_f_f_-_s_t_m_t     :    { nnrrooffffcchhaarrss | ttrrooffffcchhaarrss } _s_t_r_i_n_g


                              local                             3


ISPELL(4)                                               ISPELL(4)


       The  nnrrooffffcchhaarrss  statement  allows redefinition of certain
       _n_r_o_f_f  control  characters.   The  string  given  must  be
       exactly  five characters long, and must list substitutions
       for the left and right parentheses  ("()")  ,  the  period
       ("."),  the backslash ("\"), and the asterisk ("*").  (The
       right parenthesis is not currently used, but  is  included
       for completeness.)  For example, the statement:

              nnrrooffffcchhaarrss {}.\\*

       would replace the left and right parentheses with left and
       right curly braces for  purposes  of  parsing  _n_r_o_f_f/_t_r_o_f_f
       strings,  with  no effect on the others (admittedly a con-
       trived example).  Note that the backslash is escaped  with
       a backslash.

              _t_e_x_-_s_t_m_t  :    { TTeeXXcchhaarrss | tteexxcchhaarrss } _s_t_r_i_n_g

       The  TTeeXXcchhaarrss  statement  allows  redefinition  of certain
       TeX/LaTeX control characters.  The string  given  must  be
       exactly  thirteen characters long, and must list substitu-
       tions for the left and right parentheses ("()") , the left
       and right square brackets ("[]"), the left and right curly
       braces ("{}"), the left and right angle  brackets  ("<>"),
       the  backslash  ("\"), the dollar sign ("$"), the asterisk
       ("*"), the period or  dot  ("."),  and  the  percent  sign
       ("%").  For example, the statement:

              tteexxcchhaarrss ()\[]<\><\>\\$*.%

       would  replace  the  functions of the left and right curly
       braces with the left and right angle brackets for purposes
       of  parsing  TeX/LaTeX  constructs,  while retaining their
       functions for the _t_i_b  bibliographic  preprocessor.   Note
       that the backslash, the left square bracket, and the right
       angle bracket must be escaped with a backslash.
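
       The required lengths (five characters for nroffchars,
       thirteen for texchars) are counted after backslash escapes
       are removed.  The following Python sketch checks the two
       example strings from this page.  It is a simplification: the
       hypothetical helper unescape handles only a backslash
       followed by one literal character, not the \nnn or \xnn
       forms.

```python
def unescape(s: str) -> str:
    """Remove one level of backslash escaping (simple escapes only)."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])   # keep the escaped character literally
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


NROFF_EXAMPLE = r"{}.\\*"              # as written in the affix file
TEX_EXAMPLE = r"()\[]<\><\>\\$*.%"     # as written in the affix file

nroff_len = len(unescape(NROFF_EXAMPLE))
tex_len = len(unescape(TEX_EXAMPLE))
```

       Here nroff_len and tex_len come out as 5 and 13, matching
       the lengths the two statements require.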

              _o_p_t_-_s_t_m_t  :    { _c_m_p_n_d_-_s_t_m_t | _a_f_f_-_s_t_m_t }

              _c_m_p_n_d_-_s_t_m_t     :    ccoommppoouunnddwwoorrddss _c_o_m_p_o_u_n_d_-_o_p_t

              _a_f_f_-_s_t_m_t       :    aallllaaffffiixxeess _o_n_-_o_r_-_o_f_f

              _o_n_-_o_r_-_o_f_f :    { oonn | ooffff }

              _c_o_m_p_o_u_n_d_-_o_p_t : { _o_n_-_o_r_-_o_f_f | ccoonnttrroolllleedd _c_h_a_r_a_c_t_e_r }

       An _o_p_t_-_s_t_m_t controls certain ispell defaults that are best
       made language-specific.  The aallllaaffffiixxeess statement controls
       the default for the --PP  and  --mm  options  to  _i_s_p_e_l_l_.   If
       aallllaaffffiixxeess  is  turned  ooffff  (the  default),  _i_s_p_e_l_l  will
       default to the behavior of the _-_P flag: root/affix sugges-
       tions will only be made if there are no "near misses".  If
       aallllaaffffiixxeess is  turned  oonn,  _i_s_p_e_l_l  will  default  to  the


                              local                             4


ISPELL(4)                                               ISPELL(4)


       behavior  of  the  _-_m  flag:  root/affix  suggestions will
       always be made.  The ccoommppoouunnddwwoorrddss statement controls  the
       default for the --BB and --CC options to _i_s_p_e_l_l_.  If ccoommppoouunndd--
       wwoorrddss is turned ooffff (the default), _i_s_p_e_l_l will default  to
       the  behavior  of  the _-_B flag: run-together words will be
       reported as errors.  If ccoommppoouunnddwwoorrddss is turned oonn, _i_s_p_e_l_l
       will  default to the behavior of the _-_C flag: run-together
       words will be considered as compounds if both are  in  the
       dictionary.   This  is useful for languages such as German
       and Norwegian, which form large numbers of compound words.
       Finally, if ccoommppoouunnddwwoorrddss is set to _c_o_n_t_r_o_l_l_e_d, only words
       marked with the flag indicated by _c_h_a_r_a_c_t_e_r (which  should
       not  be  otherwise used) will be allowed to participate in
       compound formation.   Because  this  option  requires  the
       flags  to be specified in the dictionary, it is not avail-
       able from the command line.

              _f_l_a_g_-_s_t_m_t :    ffllaaggmmaarrkkeerr _c_h_a_r_a_c_t_e_r

       The ffllaaggmmaarrkkeerr statement describes the character which  is
       used  to  separate affix flags from the root word in a raw
       dictionary file.  This must be a character  which  is  not
       found  in  any  word  (including in string characters; see
       below).  The default is "/" because this character is  not
       normally  used to represent special characters in any lan-
       guage.

              _n_u_m_-_s_t_m_t  :    ccoommppoouunnddmmiinn _d_i_g_i_t

       The ccoommppoouunnddmmiinn statement controls the length of  the  two
       components of a compound word.  This only has an effect if
       ccoommppoouunnddwwoorrddss is turned oonn or if the --CC flag is  given  to
       _i_s_p_e_l_l.   In that case, only words at least as long as the
       given minimum will be accepted as  components  of  a  com-
       pound.  The default is 3 characters.
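
       The interaction of compoundwords and compoundmin can be
       sketched as follows.  This is a rough Python illustration
       under simplifying assumptions: the dictionary is a plain set
       of lowercase words, affixes and the controlled mode are
       ignored, and the function name is_compound is hypothetical.

```python
def is_compound(word: str, dictionary: set, compoundmin: int = 3) -> bool:
    """Return True if `word` splits into two dictionary words, each at
    least `compoundmin` characters long."""
    for i in range(compoundmin, len(word) - compoundmin + 1):
        if word[:i] in dictionary and word[i:] in dictionary:
            return True
    return False


WORDS = {"book", "shelf", "a"}
```

       With the default minimum of 3, is_compound("bookshelf",
       WORDS) succeeds, while is_compound("abook", WORDS) fails
       because the component "a" is too short.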

              _c_h_a_r_-_s_e_t_s :    _n_o_r_m_-_s_e_t_s [ _a_l_t_-_s_e_t_s ]

       The  character-set  section  describes the characters that
       can be part of a word, and defines their collating  order.
       There  must  always  be a definition of "normal" character
       sets;  in addition, there may be one or more partial defi-
       nitions  of  "alternate"  sets which are used with various
       text formatters.

              _n_o_r_m_-_s_e_t_s :    [ _d_e_f_t_y_p_e ] charset-group

       A "normal" character set may optionally begin with a defi-
       nition  of  the  file  suffixes that make use of this set.
       Following this are one or more character-set declarations.

              _d_e_f_t_y_p_e : ddeeffssttrriinnggttyyppee _n_a_m_e _d_e_f_o_r_m_a_t_t_e_r _s_u_f_f_i_x*

       The   ddeeffssttrriinnggttyyppee  declaration  gives  a  list  of  file


                              local                             5


ISPELL(4)                                               ISPELL(4)


       suffixes which should make use of the default string char-
       acters  defined  as  part of the base character set; it is
       only necessary if string  characters  are  being  defined.
       The  _n_a_m_e  parameter  is  a  string giving the unique name
       associated with these suffixes; often it  is  a  formatter
       name.   If  the formatter is a member of the troff family,
       "nroff" should be used for the name  associated  with  the
       most  popular  macro  package;  members  of the TeX family
       should use "tex".  Other names may be chosen  freely,  but
       they  should be kept simple, as they are used in _i_s_p_e_l_l _'_s
       --TT switch to specify a formatter  type.   The  _d_e_f_o_r_m_a_t_t_e_r
       parameter  specifies  the  deformatting  style to use when
       processing files with the given suffixes.  Currently, this
       must  be either tteexx or nnrrooffff.  The _s_u_f_f_i_x parameters are a
       whitespace-separated list of strings which, if present  at
       the end of a filename, indicate that the associated set of
       string characters should be used by default for this file.
       For  example,  the  suffix list for the troff family typi-
       cally includes suffixes such as ".ms", ".me", ".mm",  etc.

              _c_h_a_r_s_e_t_-_g_r_o_u_p :     { _c_h_a_r_-_s_t_m_t | _s_t_r_i_n_g_-_s_t_m_t | _d_u_p_-_s_t_m_t}*

       A  _c_h_a_r_-_s_t_m_t  describes  single  characters; a _s_t_r_i_n_g_-_s_t_m_t
       describes  characters  that  must  appear  together  as  a
       string,  and which usually represent a single character in
       the target language.  Either may also describe  conversion
       between  upper  and  lower  case.   A  _d_u_p_-_s_t_m_t is used to
       describe alternate forms of string characters, so  that  a
       single dictionary may be used with several formatting pro-
       grams that use different conventions for representing non-
       ASCII characters.

              _c_h_a_r_-_s_t_m_t :    wwoorrddcchhaarrss _c_h_a_r_a_c_t_e_r_-_r_a_n_g_e
                        |    wwoorrddcchhaarrss _l_o_w_e_r_c_a_s_e_-_r_a_n_g_e _u_p_p_e_r_c_a_s_e_-_r_a_n_g_e
                        |    bboouunnddaarryycchhaarrss _c_h_a_r_a_c_t_e_r_-_r_a_n_g_e
                        |    bboouunnddaarryycchhaarrss _l_o_w_e_r_c_a_s_e_-_r_a_n_g_e _u_p_p_e_r_c_a_s_e_-_r_a_n_g_e
              _s_t_r_i_n_g_-_s_t_m_t    :    ssttrriinnggcchhaarr _s_t_r_i_n_g
                        |    ssttrriinnggcchhaarr _l_o_w_e_r_c_a_s_e_-_s_t_r_i_n_g _u_p_p_e_r_c_a_s_e_-_s_t_r_i_n_g

       Characters  described with the bboouunnddaarryycchhaarrss statement are
       considered part of a word  only  if  they  appear  singly,
       embedded between characters declared with the wwoorrddcchhaarrss or
       ssttrriinnggcchhaarr statements.  For example, if the  hyphen  is  a
       boundary  character  (useful  in French), the string "foo-
       bar" would be a single word, but "-foo" would be the  same
       as  "foo",  and "foo--bar" would be two words separated by
       non-word characters.
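
       The boundary-character rule can be approximated with a
       regular expression.  The Python sketch below is only a model
       of the behavior described above, assuming ASCII letters as
       wordchars and the hyphen as the sole boundary character; the
       function name split_words and its parameters are
       hypothetical.

```python
import re


def split_words(text: str, wordchars: str = "a-zA-Z",
                boundarychars: str = "-") -> list:
    """Split `text` into words.  A boundary character is kept only when
    it appears singly between two word characters."""
    w = f"[{wordchars}]"
    b = f"[{re.escape(boundarychars)}]"
    pattern = re.compile(f"{w}+(?:{b}{w}+)*")
    return pattern.findall(text)
```

       For instance, split_words("foo-bar") yields one word, and
       split_words("foo--bar") yields the two words "foo" and
       "bar".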

       If two ranges or strings are given in a char-stmt or
       string-stmt, the first describes characters that are
       interpreted as lowercase and the second describes uppercase.
       In the case of a stringchar statement, the two strings must
       be of the same length.  Also, in a stringchar statement, the
       actual strings may contain both uppercase and lowercase
       characters themselves without difficulty; for instance, the
       statement

              stringchar     "\\*(sS"  "\\*(Ss"

       is legal and will not interfere with (or be interfered with
       by) other declarations of "s" and "S" as lower and upper
       case, respectively.

       A  final note on string characters: some languages collate
       certain special characters as if they were  strings.   For
       example,  the German "a-umlaut" is traditionally sorted as
       if it were "ae".  Ispell is  not  capable  of  this;  each
       character  must be treated as an individual entity.  So in
       certain cases, ispell will sort a list  of  words  into  a
       different  order  than the standard "dictionary" order for
       the target language.

              _a_l_t_-_s_e_t_s  :    _a_l_t_t_y_p_e [ _a_l_t_-_s_t_m_t* ]

       Because different formatters use  different  notations  to
       represent  non-ASCII  characters,  _i_s_p_e_l_l must be aware of
       the representations used by these formatters.   These  are
       declared as alternate sets of string characters.

              _a_l_t_t_y_p_e   :    aallttssttrriinnggttyyppee _n_a_m_e _s_u_f_f_i_x*

       The aallttssttrriinnggttyyppee statement introduces each set by declar-
       ing the associated  formatter  name  and  filename  suffix
       list.   This  name  and list are interpreted exactly as in
       the ddeeffssttrriinnggttyyppee statement above.  Following this  header
       are  one  or  more  _a_l_t_-_s_t_m_ts  which declare the alternate
       string characters used by this formatter.

              _a_l_t_-_s_t_m_t       :    aallttssttrriinnggcchhaarr _a_l_t_-_s_t_r_i_n_g _s_t_d_-_s_t_r_i_n_g

       The _a_l_t_s_t_r_i_n_g_c_h_a_r statement describes alternate  represen-
       tations for string characters.  For example, the -mm macro
       package of _t_r_o_f_f represents the German "a-umlaut" as _a_\_*_:,
       while  _T_e_X  uses  the sequence _\_"_a.  If the _t_r_o_f_f versions
       are declared as the standard  versions  using  ssttrriinnggcchhaarr,
       the  _T_e_X  versions  may be declared as alternates by using
       the statement

              altstringchar  \\\"a     a\\*:

       When the aallttssttrriinnggcchhaarr statement is used to specify alter-
       nate  forms,  all forms for a particular formatter must be
       declared together as a group.   Also,  each  formatter  or
       macro  package  must provide a complete set of characters,
       both upper- and lower-case, and  the  character  sequences
       used  for  each  formatter  must  be  completely distinct.
       Character sequences which describe upper-  and  lower-case
       versions  of the same printable character must also be the


                              local                             7


ISPELL(4)                                               ISPELL(4)


       same length.  It may  be  necessary  to  define  some  new
       macros  for  a  given  formatter to satisfy these restric-
       tions.  (The current version of _b_u_i_l_d_h_a_s_h does not enforce
       these restrictions, but failure to obey them may result in
       errors being introduced into files that are processed with
       _i_s_p_e_l_l.)

       An important minor point is that ispell assumes that all
       characters declared as wordchars or boundarychars will
       occupy exactly one position on the terminal screen.

       A single character-set statement can declare either a sin-
       gle character or a  contiguous  range  of  characters.   A
       range is given as in egrep and the shell: [a-z] means low-
       ercase alphabetics; [^a-z] means all but  lowercase,  etc.
       All  character-set  statements  are  combined (unioned) to
       produce the final list of characters that may be part of a
       word.  The collating order of the characters is defined by
       the order of their declaration; if a range  is  used,  the
       characters  are  considered to have been declared in ASCII
       order.  Characters that have case  are  collated  next  to
       each other, with the uppercase character first.

       The character-declaration statements have a rather strange
       behavior caused by their need to match each lowercase
       character with its uppercase equivalent.  In any given
       wordchars or boundarychars statement, the characters in each
       range are first sorted into ASCII collating sequence, then
       matched one-for-one with the other range.  (The two ranges
       must have the same number of characters.)  Thus, for
       example, the two statements:

              wordchars [aeiou] [AEIOU]
              wordchars [aeiou] [UOIEA]

       would produce exactly the same effect.  To get the vowels to
       match up "wrong", you would have to use separate statements:

              wwoorrddcchhaarrss a U
              wwoorrddcchhaarrss e O
              wwoorrddcchhaarrss i I
              wwoorrddcchhaarrss o E
              wwoorrddcchhaarrss u A

       which would cause uppercase 'e' to be 'O',  and  lowercase
       'O'  to  be  'e'.   This should normally be a problem only
       with languages which have been forced  to  use  a  strange
       ASCII collating sequence.  If your uppercase and lowercase
       letters both collate in the same order, you shouldn't have
       to worry about this "feature".
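
       The sort-then-match behavior can be modeled in a few lines
       of Python.  This is a sketch of the pairing rule as
       described above, not ispell's code; the ranges are passed
       as already-expanded strings and the function name pair_cases
       is hypothetical.

```python
def pair_cases(lower: str, upper: str) -> dict:
    """Pair lowercase with uppercase characters the way a two-range
    wordchars statement is described to: sort both, match one-for-one."""
    if len(lower) != len(upper):
        raise ValueError("ranges must have the same number of characters")
    return dict(zip(sorted(lower), sorted(upper)))


# One statement with ranges: the order inside the ranges is irrelevant.
same_a = pair_cases("aeiou", "AEIOU")
same_b = pair_cases("aeiou", "UOIEA")

# Separate single-character statements: the "wrong" pairing survives.
wrong = {}
for lo, up in [("a", "U"), ("e", "O"), ("i", "I"), ("o", "E"), ("u", "A")]:
    wrong.update(pair_cases(lo, up))
```

       In this model same_a and same_b are identical mappings,
       while wrong maps "e" to "O".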

       The  prefixes  and suffixes sections have exactly the same
       syntax, except for the introductory keyword.

              _p_r_e_f_i_x_e_s  :    pprreeffiixxeess _f_l_a_g_d_e_f*
              _s_u_f_f_i_x_e_s  :    ssuuffffiixxeess _f_l_a_g_d_e_f*
              _f_l_a_g_d_e_f   :    ffllaagg [**|~~] _c_h_a_r :: _r_e_p_l*

       A prefix or suffix table consists of an introductory  key-
       word and a list of flag definitions.  Flags can be defined
       more than once, in which case  the  definitions  are  com-
       bined.   Each  flag  controls  one or more _r_e_p_ls (replace-
       ments) which are conditionally applied to  the  beginnings
       or endings of various words.

       Flags are named by a single character char.  Depending on a
       configuration option, this character can be either any
       uppercase letter (the default configuration) or any 7-bit
       ASCII character.  Most languages should be able to get along
       with just 26 flags.

       A  flag  character may be prefixed with one or more option
       characters.  (If you wish to use one of the option charac-
       ters  as  a  flag  character,  simply enclose it in double
       quotes.)

       The asterisk (**) option means that this flag  participates
       in _c_r_o_s_s_-_p_r_o_d_u_c_t formation.  This only matters if the file
       contains both prefix and suffix tables.  If so,  all  pre-
       fixes and suffixes marked with an asterisk will be applied
       in all cross-combinations to the root word.  For  example,
       consider  the  root _f_i_x with prefixes _p_r_e and _i_n, and suf-
       fixes _e_s and _e_d.  If all flags controlling these  prefixes
       and  suffixes are marked with an asterisk, then the single
       root _f_i_x would also generate _p_r_e_f_i_x,  _p_r_e_f_i_x_e_s,  _p_r_e_f_i_x_e_d,
       _i_n_f_i_x,  _i_n_f_i_x_e_s,  _i_n_f_i_x_e_d,  _f_i_x, _f_i_x_e_s, and _f_i_x_e_d.  Cross-
       product formation can produce  a  large  number  of  words
       quickly,  some  of which may be illegal, so watch out.  If
       cross-products produce illegal words, _m_u_n_c_h_l_i_s_t  will  not
       produce  those flag combinations, and the flag will not be
       useful.
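
       The fix example can be reproduced with a short Python
       sketch.  It treats every affix as a plain string that is
       simply concatenated with the root, ignoring conditions and
       strip-strings; the function name cross_products is
       hypothetical.

```python
def cross_products(root: str, prefixes, suffixes) -> set:
    """All words formed from `root` when every prefix and suffix flag
    takes part in cross-product formation."""
    words = {root}
    words.update(p + root for p in prefixes)
    words.update(root + s for s in suffixes)
    words.update(p + root + s for p in prefixes for s in suffixes)
    return words


FIX_WORDS = cross_products("fix", ["pre", "in"], ["es", "ed"])
```

       Two prefixes and two suffixes already give nine words from a
       single root, which is why the number of generated words
       grows quickly.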

              _r_e_p_l :    _c_o_n_d_i_t_i_o_n* >> [ -- _s_t_r_i_p_-_s_t_r_i_n_g ,, ] _a_p_p_e_n_d_-_s_t_r_i_n_g

       The ~~ option specifies that the associated  flag  is  only
       active when a compound word is being formed.  This is use-
       ful in a language like German, where the form  of  a  word
       sometimes changes inside a compound.

       A  _r_e_p_l  is  a conditional rule for modifying a root word.
       Up to 8 _c_o_n_d_i_t_i_o_n_s may be specified.   If  the  _c_o_n_d_i_t_i_o_n_s
       are  satisfied,  the  rules  on the right-hand side of the
       _r_e_p_l are applied, as follows:

       (1)    If a strip-string is given, it  is  first  stripped
              from  the  beginning  or ending (as appropriate) of
              the root word.

       (2)    Then the append-string is added at that point.

       For example, the _c_o_n_d_i_t_i_o_n ..  means "any  word",  and  the
       _c_o_n_d_i_t_i_o_n  YY  means "any word ending in Y".  The following
       (suffix) replacements:

              .    >    MENT
              Y    >    -Y,IES

       would change _i_n_d_u_c_e to _i_n_d_u_c_e_m_e_n_t and _f_l_y to  _f_l_i_e_s.   (If
       they  were  controlled  by  the same flag, they would also
       change _f_l_y to _f_l_y_m_e_n_t, which might not be what was wanted.
       _M_u_n_c_h_l_i_s_t  can  be  used  to  protect against this sort of
       problem; see the command sequence given below.)
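
       The strip-then-append steps for a suffix rule can be
       sketched in Python.  This is an illustration only: the
       condition is limited to "." or a literal ending, words are
       folded to upper case for simplicity, and the function name
       apply_suffix_rule is hypothetical.

```python
def apply_suffix_rule(word: str, condition: str, strip: str, append: str):
    """Apply one suffix repl to `word`.  Returns the new word, or None
    if the condition (or the strip-string) does not match."""
    word = word.upper()
    if condition != "." and not word.endswith(condition):
        return None
    if strip:
        if not word.endswith(strip):
            return None
        word = word[:-len(strip)]      # step 1: remove the strip-string
    return word + append               # step 2: add the append-string
```

       For example, apply_suffix_rule("fly", "Y", "Y", "IES")
       produces FLIES, and apply_suffix_rule("fly", ".", "",
       "MENT") produces the unwanted FLYMENT.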

       No matter how much you might wish it, the strings on the
       right must be strings of specific characters, not ranges.
       The reasons are rooted deeply in the way ispell works, and
       it would be difficult or impossible to provide for more
       flexibility.  For example, you might wish to write:

              [EY] >    -[EY],IES

       This will not work.  Instead, you must  use  two  separate
       rules:

              E    >    -E,IES
              Y    >    -Y,IES

       The application of repls can be restricted to certain words
       with conditions:

              condition :    { . | character | range }

       A condition is a restriction on the characters that adjoin,
       and/or are replaced by, the right-hand side of the repl.  Up
       to 8 conditions may be given, which should be enough context
       for anyone.  The right-hand side will be applied only if the
       conditions in the repl are satisfied.  The conditions also
       implicitly define a length; roots shorter than the number of
       conditions will not pass the test.  (As a special case, a
       condition of a single dot "." defines a length of zero, so
       that the rule applies to all words indiscriminately.)  This
       length is independent of the separate test that insists that
       all flags produce an output word length of at least four.

       _C_o_n_d_i_t_i_o_n_s  that are single characters should be separated
       by white space.  For example, to specify words  ending  in
       "ED", write:

              E D  >    -ED,ING        # As in covered > covering

       If you write:

              ED   >    -ED,ING

       the effect will be the same as:

              [ED] >    -ED,ING
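
       The difference between "E D" and "ED" can be illustrated
       with a small Python sketch.  It is not taken from ispell;
       the names parse_conditions and matches_root are
       hypothetical, and a range is treated as a plain list of
       characters.  It only models the rules described above:
       white space separates conditions, a bare run of
       characters behaves like a range, and a lone dot imposes
       no restriction.

```python
import re

# One condition is a pair (negated, characters).
# Assumption: a range is a plain list of characters, optionally
# preceded by "^"; nothing more elaborate is handled here.
TOKEN = re.compile(r"\[(\^?)([^\]]+)\]|([^\[\]]+)")

def parse_conditions(text):
    """Turn the left-hand side of a rule into a list of conditions."""
    conditions = []
    for field in text.upper().split():
        if field == ".":
            continue  # a lone dot defines a length of zero
        for match in TOKEN.finditer(field):
            negated, in_brackets, bare = match.groups()
            chars = in_brackets if in_brackets is not None else bare
            conditions.append((negated == "^", frozenset(chars)))
    return conditions

def matches_root(root, conditions):
    """Check the last len(conditions) characters of root."""
    if len(root) < len(conditions):
        return False  # roots shorter than the conditions never pass
    tail = root.upper()[len(root) - len(conditions):]
    return all((ch in chars) != negated
               for ch, (negated, chars) in zip(tail, conditions))

print(len(parse_conditions("E D")))                     # two conditions
print(parse_conditions("ED") == parse_conditions("[ED]"))
print(matches_root("skate", parse_conditions("E D")))   # needs "...ED"
print(matches_root("skate", parse_conditions("ED")))    # needs E or D
```

       Under these assumptions "E D" only accepts roots ending
       in the two letters "ED", whereas "ED" accepts any root
       whose last letter is either E or D.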

       As a final minor but important point, it is sometimes
       useful to rebuild a dictionary file using an incompatible
       suffix file.  For example, suppose you expanded the "R"
       flag to generate "er" and "ers" (thus making the Z flag
       somewhat obsolete).  To build a new dictionary newdict
       that, using newaffixes, will accept exactly the same list
       of words as the old list olddict did using oldaffixes, the
       -c switch of munchlist is useful, as in the following
       example:

              $ munchlist -c oldaffixes -l newaffixes olddict > newdict

       If you use this procedure, your new dictionary will always
       accept the same list the original did, even if you badly
       screwed up the affix file.  This is because munchlist
       compares the words generated by a flag with the original
       word list, and refuses to use any flags that generate
       illegal words.  (But don't forget that the munchlist step
       takes a long time and eats up temporary file space.)
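
       The idea behind that check can be sketched in a few lines
       of Python.  This is not munchlist's own code, only an
       illustration of the comparison described above; the
       function usable_flags, the flag names "S" and "M", and the
       sample data are all hypothetical.

```python
def usable_flags(expansions, legal_words):
    """Keep only the flags whose every generated word is legal.

    expansions maps a candidate flag to the words it would generate
    for one root; legal_words is the original word list.
    """
    return sorted(
        flag for flag, words in expansions.items()
        if all(word in legal_words for word in words)
    )

# Hypothetical data: the original list never contained "FLYMENT".
legal = {"FLY", "FLIES", "INDUCE", "INDUCEMENT"}
fly_expansions = {"S": ["FLIES"], "M": ["FLYMENT"]}

print(usable_flags(fly_expansions, legal))
```

       In this sketch the "M" flag is rejected for the root
       "FLY" because it would produce a word that is not in the
       original list, so only "S" survives.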

EEXXAAMMPPLLEESS
       As  an example of conditional suffixes, here is the speci-
       fication of the SS flag from the English affix file:

              flag *S:
                  [^AEIOU]Y  >    -Y,IES    # As in imply > implies
                  [AEIOU]Y   >    S         # As in convey > conveys
                  [SXZH]     >    ES        # As in fix > fixes
                  [^SXZHY]   >    S         # As in bat > bats

       The first line applies to words ending in Y,  but  not  in
       vowel-Y.  The second takes care of the vowel-Y words.  The
       third then handles those words that end in a  sibilant  or
       near-sibilant, and the last picks up everything else.

       Note that the conditions are written very carefully so
       that they apply to disjoint sets of words.  In particular,
       note  that  the  fourth line excludes words ending in Y as
       well as the obvious SXZH.   Otherwise,  it  would  convert
       "imply" into "implys".
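
       The effect of disjoint conditions can be checked with a
       short Python sketch.  It is not how ispell is implemented:
       each condition is hand-translated into a regular
       expression anchored at the end of the word, the names
       S_RULES, LOOSE_RULES, and apply_flag are hypothetical,
       words are simply upper-cased, and the minimum-length test
       is omitted.

```python
import re

# (condition as an end-anchored regex, text to strip, text to append)
S_RULES = [
    (r"[^AEIOU]Y$", "Y", "IES"),   # imply > implies
    (r"[AEIOU]Y$",  "",  "S"),     # convey > conveys
    (r"[SXZH]$",    "",  "ES"),    # fix > fixes
    (r"[^SXZHY]$",  "",  "S"),     # bat > bats
]

# Same rules, but the last condition forgets to exclude Y.
LOOSE_RULES = S_RULES[:3] + [(r"[^SXZH]$", "", "S")]

def apply_flag(root, rules):
    """Return every word the rules generate for root."""
    word = root.upper()
    generated = []
    for pattern, strip, append in rules:
        if re.search(pattern, word):
            stem = word[:len(word) - len(strip)]
            generated.append(stem + append)
    return generated

for root in ("imply", "convey", "fix", "bat"):
    print(root, apply_flag(root, S_RULES))

print("imply", apply_flag("imply", LOOSE_RULES))
```

       With S_RULES each root yields exactly one word.  With
       LOOSE_RULES, "imply" matches both the first and the last
       rule, so the incorrect "IMPLYS" is generated alongside
       "IMPLIES".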

       Although  the  English  affix file does not do so, you can
       also have a flag generate more than  one  variation  on  a
       root  word.   For example, we could extend the English "R"
       flag as follows:

              flag *R:
                 E           >    R         # As in skate > skater
                 E           >    RS        # As in skate > skaters
                 [^AEIOU]Y   >    -Y,IER    # As in multiply > multiplier
                 [^AEIOU]Y   >    -Y,IERS   # As in multiply > multipliers
                 [AEIOU]Y    >    ER        # As in convey > conveyer
                 [AEIOU]Y    >    ERS       # As in convey > conveyers
                 [^EY]       >    ER        # As in build > builder
                 [^EY]       >    ERS       # As in build > builders

       This flag would generate both "skater" and "skaters"  from
       "skate".   This capability can be very useful in languages
       that make use of noun, verb, and adjective  endings.   For
       instance,  one  could  define a single flag that generated
       all of the German "weak" verb endings.

SSEEEE AALLSSOO
       ispell(1)

