-
-
-
- FLEX(1) FLEX(1)
-
-
- NAME
- flex - fast lexical analyzer generator
-
- SYNOPSIS
- flex [-bcdfinpstvFILT8 -C[efmF] -Sskeleton] [filename ...]
-
- DESCRIPTION
- flex is a tool for generating scanners: programs which
- recognize lexical patterns in text. flex reads the given
- input files, or its standard input if no file names are
- given, for a description of a scanner to generate. The
- description is in the form of pairs of regular expressions
- and C code, called rules. flex generates as output a C
- source file, lex.yy.c, which defines a routine yylex().
- This file is compiled and linked with the -lfl library to
- produce an executable. When the executable is run, it
- analyzes its input for occurrences of the regular
- expressions. Whenever it finds one, it executes the
- corresponding C code.
-
- For full documentation, see flexdoc(1). This manual entry
- is intended for use as a quick reference.
-
- OPTIONS
- flex has the following options:
-
- -b Generate backtracking information to lex.backtrack.
- This is a list of scanner states which require
- backtracking and the input characters on which they
- do so. By adding rules one can remove backtracking
- states. If all backtracking states are eliminated
- and -f or -F is used, the generated scanner will
- run faster.
-
- -c is a do-nothing, deprecated option included for
- POSIX compliance.
-
- NOTE: in previous releases of flex, -c specified
- table-compression options. This functionality is
- now given by the -C flag. To ease the impact
- of this change, when flex encounters -c, it
- currently issues a warning message and assumes that -C
- was desired instead. In the future this "promotion"
- of -c to -C will go away in the name of full
- POSIX compliance (unless the POSIX meaning is
- removed first).
-
- -d makes the generated scanner run in debug mode.
- Whenever a pattern is recognized and the global
- yy_flex_debug is non-zero (which is the default),
- the scanner will write to stderr a line of the
- form:
-
- --accepting rule at line 53 ("the matched text")
-
-
-
- Version 2.3 26 May 1990 1
-
- The line number refers to the location of the rule
- in the file defining the scanner (i.e., the file
- that was fed to flex). Messages are also generated
- when the scanner backtracks, accepts the default
- rule, reaches the end of its input buffer (or
- encounters a NUL; the two look the same as far as
- the scanner's concerned), or reaches an
- end-of-file.
-
- -f specifies (take your pick) full table or fast
- scanner. No table compression is done. The result is
- large but fast. This option is equivalent to -Cf
- (see below).
-
- -i instructs flex to generate a case-insensitive
- scanner. The case of letters given in the flex input
- patterns will be ignored, and tokens in the input
- will be matched regardless of case. The matched
- text given in yytext will have the preserved case
- (i.e., it will not be folded).
-
- -n is another do-nothing, deprecated option included
- only for POSIX compliance.
-
- -p generates a performance report to stderr. The
- report consists of comments regarding features of
- the flex input file which will cause a loss of
- performance in the resulting scanner.
-
- -s causes the default rule (that unmatched scanner
- input is echoed to stdout) to be suppressed. If
- the scanner encounters input that does not match
- any of its rules, it aborts with an error.
-
- -t instructs flex to write the scanner it generates to
- standard output instead of lex.yy.c.
-
- -v specifies that flex should write to stderr a
- summary of statistics regarding the scanner it
- generates.
-
- -F specifies that the fast scanner table
- representation should be used. This representation is about
- as fast as the full table representation (-f), and
- for some sets of patterns will be considerably
- smaller (and for others, larger). See flexdoc(1)
- for details.
-
- This option is equivalent to -CF (see below).
-
- -I instructs flex to generate an interactive scanner,
- that is, a scanner which stops immediately rather
- than looking ahead if it knows that the currently
- scanned text cannot be part of a longer rule's
- match. Again, see flexdoc(1) for details.
-
- Note, -I cannot be used in conjunction with full or
- fast tables, i.e., the -f, -F, -Cf, or -CF flags.
-
- -L instructs flex not to generate #line directives in
- lex.yy.c. The default is to generate such
- directives so error messages in the actions will be
- correctly located with respect to the original flex
- input file, and not to the fairly meaningless line
- numbers of lex.yy.c.
-
- -T makes flex run in trace mode. It will generate a
- lot of messages to stdout concerning the form of
- the input and the resultant non-deterministic and
- deterministic finite automata. This option is
- mostly for use in maintaining flex.
-
- -8 instructs flex to generate an 8-bit scanner. On
- some sites, this is the default. On others, the
- default is 7-bit characters. To see which is the
- case, check the verbose (-v) output for
- "equivalence classes created". If the denominator of the
- number shown is 128, then by default flex is
- generating 7-bit characters. If it is 256, then the
- default is 8-bit characters.
-
- -C[efmF]
- controls the degree of table compression.
-
- -Ce directs flex to construct equivalence classes,
- i.e., sets of characters which have identical
- lexical properties. Equivalence classes usually give
- dramatic reductions in the final table/object file
- sizes (typically a factor of 2-5) and are pretty
- cheap performance-wise (one array look-up per
- character scanned).
-
- -Cf specifies that the full scanner tables should
- be generated - flex should not compress the tables
- by taking advantage of similar transition
- functions for different states.
-
- -CF specifies that the alternate fast scanner
- representation (described in flexdoc(1)) should be
- used.
-
- -Cm directs flex to construct meta-equivalence
- classes, which are sets of equivalence classes (or
- characters, if equivalence classes are not being
- used) that are commonly used together.
- Meta-equivalence classes are often a big win when using
- compressed tables, but they have a moderate
- performance impact (one or two "if" tests and one array
- look-up per character scanned).
-
- A lone -C specifies that the scanner tables should
- be compressed but neither equivalence classes nor
- meta-equivalence classes should be used.
-
- The options -Cf or -CF and -Cm do not make sense
- together - there is no opportunity for
- meta-equivalence classes if the table is not being
- compressed. Otherwise the options may be freely
- mixed.
-
- The default setting is -Cem, which specifies that
- flex should generate equivalence classes and
- meta-equivalence classes. This setting provides the
- highest degree of table compression. You can trade
- off faster-executing scanners at the cost of larger
- tables with the following generally being true:
-
- slowest & smallest
- -Cem
- -Cm
- -Ce
- -C
- -C{f,F}e
- -C{f,F}
- fastest & largest
-
-
- -C options are not cumulative; whenever the flag is
- encountered, the previous -C settings are forgotten.
-
- -Sskeleton_file
- overrides the default skeleton file from which flex
- constructs its scanners. You'll never need this
- option unless you are doing flex maintenance or
- development.
-
- SUMMARY OF FLEX REGULAR EXPRESSIONS
- The patterns in the input are written using an extended
- set of regular expressions. These are:
-
- x          match the character 'x'
- .          any character except newline
- [xyz]      a "character class"; in this case, the pattern
-            matches either an 'x', a 'y', or a 'z'
- [abj-oZ]   a "character class" with a range in it; matches
-            an 'a', a 'b', any letter from 'j' through 'o',
-            or a 'Z'
- [^A-Z]     a "negated character class", i.e., any character
-            but those in the class. In this case, any
-            character EXCEPT an uppercase letter.
- [^A-Z\n]   any character EXCEPT an uppercase letter or
-            a newline
- r*         zero or more r's, where r is any regular expression
- r+         one or more r's
- r?         zero or one r's (that is, "an optional r")
- r{2,5}     anywhere from two to five r's
- r{2,}      two or more r's
- r{4}       exactly 4 r's
- {name}     the expansion of the "name" definition
-            (see above)
- "[xyz]\"foo"
-            the literal string: [xyz]"foo
- \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
-            then the ANSI-C interpretation of \X.
-            Otherwise, a literal 'X' (used to escape
-            operators such as '*')
- \123       the character with octal value 123
- \x2a       the character with hexadecimal value 2a
- (r)        match an r; parentheses are used to override
-            precedence (see below)
-
-
- rs         the regular expression r followed by the
-            regular expression s; called "concatenation"
-
-
- r|s        either an r or an s
-
-
- r/s        an r but only if it is followed by an s. The
-            s is not part of the matched text. This type
-            of pattern is called "trailing context".
- ^r         an r, but only at the beginning of a line
- r$         an r, but only at the end of a line. Equivalent
-            to "r/\n".
-
-
- <s>r       an r, but only in start condition s (see
-            below for discussion of start conditions)
- <s1,s2,s3>r
-            same, but in any of start conditions s1,
-            s2, or s3
-
-
- <<EOF>>    an end-of-file
- <s1,s2><<EOF>>
-            an end-of-file when in start condition s1 or s2
-
- The regular expressions listed above are grouped according
- to precedence, from highest precedence at the top to
- lowest at the bottom. Those grouped together have equal
- precedence.
-
- Some notes on patterns:
-
- - Negated character classes match newlines unless
- "\n" (or an equivalent escape sequence) is one of
- the characters explicitly present in the negated
- character class (e.g., "[^A-Z\n]").
-
- - A rule can have at most one instance of trailing
- context (the '/' operator or the '$' operator).
- The start condition, '^', and "<<EOF>>" patterns
- can only occur at the beginning of a pattern, and,
- like '/' and '$', cannot be grouped
- inside parentheses. The following are all illegal:
-
- foo/bar$
- foo|(bar$)
- foo|^bar
- <sc1>foo<sc2>bar
-
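- The newline behaviour of negated character classes noted
- above can be tried out without building a scanner. The
- sketch below does not use flex; it uses the POSIX
- regcomp()/regexec() routines, which (when REG_NEWLINE is
- not set) also let a negated bracket expression match a
- newline. whole_match() is a hypothetical helper name
- introduced only for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the POSIX extended regular expression `pattern`
 * matches `text`, 0 if it does not, and -1 if the pattern fails
 * to compile.  REG_NEWLINE is deliberately left unset, so a
 * newline is treated as an ordinary character.
 */
static int whole_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

- With the anchored pattern "^[^A-Z]+$" the text "ab\ncd"
- should be accepted, because the class matches the newline;
- adding a newline to the class, as in "^[^A-Z\n]+$" (written
- as a C string literal), should reject the same text.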
-
- SUMMARY OF SPECIAL ACTIONS
- In addition to arbitrary C code, the following can appear
- in actions:
-
- - ECHO copies yytext to the scanner's output.
-
- - BEGIN followed by the name of a start condition
- places the scanner in the corresponding start
- condition.
-
- - REJECT directs the scanner to proceed on to the
- "second best" rule which matched the input (or a
- prefix of the input). yytext and yyleng are set up
- appropriately. Note that REJECT is a particularly
- expensive feature in terms of scanner performance; if
- it is used in any of the scanner's actions it will
- slow down all of the scanner's matching.
- Furthermore, REJECT cannot be used with the -f or -F
- options.
-
- Note also that unlike the other special actions,
- REJECT is a branch; code immediately following it
- in the action will not be executed.
-
- - yymore() tells the scanner that the next time it
- matches a rule, the corresponding token should be
- appended onto the current value of yytext rather
- than replacing it.
-
- - yyless(n) returns all but the first n characters of
- the current token back to the input stream, where
- they will be rescanned when the scanner looks for
- the next match. yytext and yyleng are adjusted
- appropriately (e.g., yyleng will now be equal to
- n).
-
- - unput(c) puts the character c back onto the input
- stream. It will be the next character scanned.
-
- - input() reads the next character from the input
- stream (this routine is called yyinput() if the
- scanner is compiled using C++).
-
- - yyterminate() can be used in lieu of a return
- statement in an action. It terminates the scanner
- and returns a 0 to the scanner's caller, indicating
- "all done".
-
- By default, yyterminate() is also called when an
- end-of-file is encountered. It is a macro and may
- be redefined.
-
- - YY_NEW_FILE is an action available only in <<EOF>>
- rules. It means "Okay, I've set up a new input
- file, continue scanning".
-
- - yy_create_buffer( file, size ) takes a FILE pointer
- and an integer size. It returns a YY_BUFFER_STATE
- handle to a new input buffer large enough to
- accommodate size characters and associated with the
- given file. When in doubt, use YY_BUF_SIZE for the
- size.
-
- - yy_switch_to_buffer( new_buffer ) switches the
- scanner's processing to scan for tokens from the
- given buffer, which must be a YY_BUFFER_STATE.
-
- - yy_delete_buffer( buffer ) deletes the given
- buffer.
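-
- As a rough illustration of the yyless(n) bookkeeping
- described above, the following C sketch models a scanner's
- token state outside of flex. The struct toy_scanner and the
- helpers toy_match() and toy_yyless() are hypothetical names
- invented for this sketch; they are not part of flex, and
- the real implementation differs.

```c
#include <string.h>

/* Toy model of a scanner's view of its input (not flex's real data). */
struct toy_scanner {
    const char *input;  /* the whole input text                    */
    size_t pos;         /* index of the next unscanned character   */
    const char *text;   /* start of current token (models yytext)  */
    size_t leng;        /* length of current token (models yyleng) */
};

/* Model matching a token of `len` characters at the current position. */
static void toy_match(struct toy_scanner *s, size_t len)
{
    s->text = s->input + s->pos;
    s->leng = len;
    s->pos += len;
}

/*
 * Model yyless(n): keep the first n characters of the current token
 * and hand the remainder back to the input so it is scanned again.
 * A request to keep more than the token holds is ignored here.
 */
static void toy_yyless(struct toy_scanner *s, size_t n)
{
    if (n > s->leng)
        return;
    s->pos -= s->leng - n;
    s->leng = n;
}
```

- After toy_match() consumes "foobar" from the input
- "foobar baz", calling toy_yyless(&s, 3) leaves a token of
- length 3 ("foo") and makes "bar baz" the text to be scanned
- next.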
-
- VALUES AVAILABLE TO THE USER
- - char *yytext holds the text of the current token.
- It may not be modified.
-
- - int yyleng holds the length of the current token.
- It may not be modified.
-
- - FILE *yyin is the file which by default flex reads
- from. It may be redefined but doing so only makes
- sense before scanning begins. Changing it in the
- middle of scanning will have unexpected results
- since flex buffers its input. Once scanning
- terminates because an end-of-file has been seen, void
- yyrestart( FILE *new_file ) may be called to point
- yyin at the new input file.
-
- - FILE *yyout is the file to which ECHO actions are
- done. It can be reassigned by the user.
-
- - YY_CURRENT_BUFFER returns a YY_BUFFER_STATE handle
- to the current buffer.
-
- MACROS THE USER CAN REDEFINE
- - YY_DECL controls how the scanning routine is
- declared. By default, it is "int yylex()", or, if
- prototypes are being used, "int yylex(void)". This
- definition may be changed by redefining the
- "YY_DECL" macro. Note that if you give arguments
- to the scanning routine using a K&R-style/non-prototyped
- function declaration, you must terminate
- the definition with a semi-colon (;).
-
- - The nature of how the scanner gets its input can be
- controlled by redefining the YY_INPUT macro.
- YY_INPUT's calling sequence is
- "YY_INPUT(buf,result,max_size)". Its action is to
- place up to max_size characters in the character
- array buf and return in the integer variable result
- either the number of characters read or the
- constant YY_NULL (0 on Unix systems) to indicate EOF.
- The default YY_INPUT reads from the global
- file-pointer "yyin". A sample redefinition of YY_INPUT
- (in the definitions section of the input file):
-
- %{
- #undef YY_INPUT
- #define YY_INPUT(buf,result,max_size) \
- { \
- int c = getchar(); \
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
- }
- %}
-
-
- - When the scanner receives an end-of-file indication
- from YY_INPUT, it then checks the yywrap()
- function. If yywrap() returns false (zero), then it is
- assumed that the function has gone ahead and set up
- yyin to point to another input file, and scanning
- continues. If it returns true (non-zero), then the
- scanner terminates, returning 0 to its caller.
-
- The default yywrap() always returns 1. Presently,
- to redefine it you must first "#undef yywrap", as
- it is currently implemented as a macro. It is
- likely that yywrap() will soon be defined to be a
- function rather than a macro.
-
- - YY_USER_ACTION can be redefined to provide an
- action which is always executed prior to the
- matched rule's action.
-
- - The macro YY_USER_INIT may be redefined to provide
- an action which is always executed before the first
- scan.
-
- - In the generated scanner, the actions are all
- gathered in one large switch statement and separated
- using YY_BREAK, which may be redefined. By
- default, it is simply a "break", to separate each
- rule's action from the following rule's.
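-
- The YY_INPUT calling sequence described above can also be
- used to feed a scanner from memory rather than from a file.
- The following is only a sketch under stated assumptions: it
- is written so that it compiles outside a generated scanner,
- so YY_NULL is defined locally as 0, and the names mem_input
- and read_chunk() are hypothetical, not part of flex.

```c
#include <stddef.h>
#include <string.h>

#define YY_NULL 0  /* end-of-file value; 0 on Unix systems per the text */

/* Hypothetical: the text that still has to be handed to the scanner. */
static const char *mem_input = "";

/*
 * Sketch of a YY_INPUT replacement: copy up to max_size characters
 * from mem_input into buf and store in result either the number of
 * characters copied or YY_NULL once nothing is left.
 */
#define YY_INPUT(buf, result, max_size) \
    { \
        size_t n_ = strlen(mem_input); \
        if (n_ > (size_t)(max_size)) \
            n_ = (size_t)(max_size); \
        memcpy((buf), mem_input, n_); \
        mem_input += n_; \
        (result) = n_ ? (int)n_ : YY_NULL; \
    }

/* Hypothetical helper so the macro can be exercised on its own. */
static int read_chunk(char *buf, int max_size)
{
    int result;
    YY_INPUT(buf, result, max_size);
    return result;
}
```

- Inside a real flex input file the macro would go in the
- definitions section after an "#undef YY_INPUT", as in the
- sample above, and the local YY_NULL definition and
- read_chunk() would be dropped.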
-
- FILES
- flex.skel
- skeleton scanner.
-
- lex.yy.c
- generated scanner (called lexyy.c on some systems).
-
- lex.backtrack
- backtracking information for -b flag (called
- lex.bck on some systems).
-
- -lfl library with which to link the scanners.
-
- SEE ALSO
- flexdoc(1), lex(1), yacc(1), sed(1), awk(1).
-
- M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
-
- DIAGNOSTICS
- reject_used_but_not_detected undefined or
-
- yymore_used_but_not_detected undefined - These errors can
- occur at compile time. They indicate that the scanner
- uses REJECT or yymore() but that flex failed to notice the
- fact, meaning that flex scanned the first two sections
- looking for occurrences of these actions and failed to
- find any, but somehow you snuck some in (via a #include
- file, for example). Make an explicit reference to the
- action in your flex input file. (Note that previously
- flex supported a %used/%unused mechanism for dealing with
- this problem; this feature is still supported but now
- deprecated, and will go away soon unless the author hears
- from people who can argue compellingly that they need it.)
-
- flex scanner jammed - a scanner compiled with -s has
- encountered an input string which wasn't matched by any of
- its rules.
-
- flex input buffer overflowed - a scanner rule matched a
- string long enough to overflow the scanner's internal
- input buffer (16K bytes - controlled by YY_BUF_MAX in
- "flex.skel").
-
- scanner requires -8 flag - Your scanner specification
- includes recognizing 8-bit characters and you did not
- specify the -8 flag (and your site has not installed flex
- with -8 as the default).
-
- fatal flex scanner internal error--end of buffer missed -
- This can occur in a scanner which is reentered after a
- long-jump has jumped out (or over) the scanner's
- activation frame. Before reentering the scanner, use:
-
- yyrestart( yyin );
-
-
- too many %t classes! - You managed to put every single
- character into its own %t class. flex requires that at
- least one of the classes share characters.
-
- AUTHOR
- Vern Paxson, with the help of many ideas and much
- inspiration from Van Jacobson. Original version by Jef
- Poskanzer.
-
- See flexdoc(1) for additional credits and the address to
- send comments to.
-
- DEFICIENCIES / BUGS
- Some trailing context patterns cannot be properly matched
- and generate warning messages ("Dangerous trailing
- context"). These are patterns where the ending of the first
- part of the rule matches the beginning of the second part,
- such as "zx*/xy*", where the 'x*' matches the 'x' at the
- beginning of the trailing context. (Note that the POSIX
- draft states that the text matched by such patterns is
- undefined.)
-
- For some trailing context rules, parts which are actually
- fixed-length are not recognized as such, leading to the
- performance loss associated with variable-length trailing
- context. In particular, parts using '|' or {n} (such as
- "foo{3}") are always considered variable-length.
-
- Combining trailing context with the special '|' action can
- result in fixed trailing context being turned into the
- more expensive variable trailing context. This happens,
- for instance, in the following rules:
-
- %%
- abc |
- xyz/def
-
-
- Use of unput() invalidates yytext and yyleng.
-
- Use of unput() to push back more text than was matched can
- result in the pushed-back text matching a beginning-of-
- line ('^') rule even though it didn't come at the
- beginning of the line (though this is rare!).
-
- Pattern-matching of NUL's is substantially slower than
- matching other characters.
-
- flex does not generate correct #line directives for code
- internal to the scanner; thus, bugs in flex.skel yield
- bogus line numbers.
-
- Due to both buffering of input and read-ahead, you cannot
- intermix calls to <stdio.h> routines, such as, for
- example, getchar(), with flex rules and expect it to work.
- Call input() instead.
-
- The total table entries listed by the -v flag exclude the
- number of table entries needed to determine what rule has
- been matched. The number of entries is equal to the
- number of DFA states if the scanner does not use REJECT, and
- somewhat greater than the number of states if it does.
-
- REJECT cannot be used with the -f or -F options.
-
- Some of the macros, such as yywrap(), may in the future
- become functions which live in the -lfl library. This
- will doubtless break a lot of code, but may be required
- for POSIX-compliance.
-
- The flex internal algorithms need documentation.