FLEX
Section: User Commands (1)
Updated: 13 May 1987
NAME
flex - fast lexical analyzer generator
SYNOPSIS
flex
[
-dfirstvFILT -c[efmF] -Sskeleton_file
] [
filename
]
DESCRIPTION
flex
is a rewrite of
lex
intended to right some of that tool's deficiencies: in particular,
flex
generates lexical analyzers much faster, and the analyzers use
smaller tables and run faster.
OPTIONS
In addition to lex's
-t
flag, flex has the following options:
- -d
-
makes the generated scanner run in
debug
mode. Whenever a pattern is recognized the scanner will
write to
stderr
a line of the form:
--accepting rule #n
Rules are numbered sequentially with the first one being 1.
- -f
-
has the same effect as lex's -f flag (do not compress the scanner
tables); the mnemonic changes from
fast compilation
to (take your pick)
full table
or
fast scanner.
The actual compilation takes
longer,
since flex is I/O bound writing out the big table.
-
This option is equivalent to
-cf
(see below).
- -i
-
instructs flex to generate a
case-insensitive
scanner. The case of letters given in the flex input patterns will
be ignored, and the rules will be matched regardless of case. The
matched text given in
yytext
will have the preserved case (i.e., it will not be folded).
- -r
-
specifies that the scanner uses the
REJECT
action.
- -s
-
causes the
default rule
(that unmatched scanner input is echoed to
stdout)
to be suppressed. If the scanner encounters input that does not
match any of its rules, it aborts with an error. This option is
useful for finding holes in a scanner's rule set.
- -v
-
has the same meaning as for lex (print to
stderr
a summary of statistics of the generated scanner). Many more statistics
are printed, though, and the summary spans several lines. Most
of the statistics are meaningless to the casual flex user.
- -F
-
specifies that the
fast
scanner table representation should be used. This representation is
about as fast as the full table representation
(-f),
and for some sets of patterns will be considerably smaller (and for
others, larger). In general, if the pattern set contains both "keywords"
and a catch-all, "identifier" rule, such as in the set:
"case" return ( TOK_CASE );
"switch" return ( TOK_SWITCH );
...
"default" return ( TOK_DEFAULT );
[a-z]+ return ( TOK_ID );
then you're better off using the full table representation. If only
the "identifier" rule is present and you then use a hash table or some such
to detect the keywords, you're better off using
-F.
-
This option is equivalent to
-cF
(see below).
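When only the "identifier" rule is kept, the rule's action has to decide whether the matched text is a keyword. The following is a minimal sketch of such a lookup, assuming a small sorted keyword table searched with the standard bsearch() rather than a hash table. The token values and the names classify_word and compare_keyword are hypothetical and are not part of flex.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would share these with its parser. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, because bsearch() requires sorted input. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* Comparison callback for bsearch(): key is the text, entry a table slot. */
static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *)key, ((const struct keyword *)entry)->name);
}

/* Return the keyword's token code, or TOK_ID if text is not a keyword. */
int classify_word(const char *text)
{
    const struct keyword *hit = bsearch(text, keywords,
        sizeof keywords / sizeof keywords[0], sizeof keywords[0],
        compare_keyword);

    return hit ? hit->token : TOK_ID;
}
```

The single catch-all rule could then hand its match to the lookup, for example with an action along the lines of "return ( classify_word( yytext ) );".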
- -I
-
instructs flex to generate an
interactive
scanner. Normally, scanners generated by flex always look ahead one character
before deciding that a rule has been matched. At the possible cost of some
scanning overhead (it's not clear that more overhead is involved), flex will
generate a scanner which only looks ahead when needed. Such scanners are
called
interactive
because if you want to write a scanner for an interactive system such
as a command shell, you will probably want the user's input to be terminated
with a newline, and without
-I
the user will have to type a character in addition to the newline in order
to have the newline recognized. This leads to dreadful interactive performance.
-
If all this seems too confusing, here's the general rule: if a human will
be typing in input to your scanner, use
-I,
otherwise don't; if you don't care about how fast your scanners run and
don't want to make any assumptions about the input to your scanner,
always use
-I.
-
Note,
-I
cannot be used in conjunction with
full
or
fast tables,
i.e., the
-f, -F, -cf,
or
-cF
flags.
- -L
-
instructs flex to not generate
#line
directives (see below).
- -T
-
makes flex run in
trace
mode. It will generate a lot of messages to standard output concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata. This option is mostly for use in maintaining flex.
- -c[efmF]
-
controls the degree of table compression.
-ce
directs flex to construct
equivalence classes,
i.e., sets of characters
which have identical lexical properties (for example, if the only
appearance of digits in the flex input is in the character class
"[0-9]" then the digits '0', '1', ..., '9' will all be put
in the same equivalence class).
-cf
specifies that the
full
scanner tables should be generated - flex should not compress the
tables by taking advantage of similar transition functions for
different states.
-cF
specifies that the alternate fast scanner representation (described
above under the
-F
flag)
should be used.
-cm
directs flex to construct
meta-equivalence classes,
which are sets of equivalence classes (or characters, if equivalence
classes are not being used) that are commonly used together.
A lone
-c
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
-
The options
-cf
or
-cF
and
-cm
do not make sense together - there is no opportunity for meta-equivalence
classes if the table is not being compressed. Otherwise the options
may be freely mixed.
-
The default setting is
-cem
which specifies that flex should generate equivalence classes
and meta-equivalence classes. This setting provides the highest
degree of table compression. You can trade larger tables for
faster-executing scanners, with
the following generally being true:
slowest                smallest
        -cem
        -ce
        -cm
        -c
        -c{f,F}e
        -c{f,F}
fastest                largest
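The idea behind the equivalence classes built by -ce can be sketched in a few lines of C. This is an illustration only and is not the algorithm flex itself uses: it assumes 7-bit ASCII, at most 32 character classes, and that each class is given as a string of its member characters. The name build_equiv_classes is hypothetical.

```c
#include <string.h>

#define NCHARS 128   /* 7-bit ASCII, to keep the sketch small */

/*
 * Assign an equivalence class number to every character. sets[] holds
 * nsets (at most 32) character classes, each written as a string of its
 * members. Two characters land in the same class exactly when they
 * belong to the same sets. Returns the number of classes found.
 */
int build_equiv_classes(const char *const sets[], int nsets, int ec[NCHARS])
{
    unsigned long signature[NCHARS];
    int nclasses = 0;
    int c, s, prev;

    /* Record, as a bitmask, which sets each character belongs to. */
    for (c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (s = 0; s < nsets; s++)
            if (c != 0 && strchr(sets[s], c) != NULL)
                signature[c] |= 1UL << s;
    }

    for (c = 0; c < NCHARS; c++) {
        /* Reuse the class of an earlier character with the same signature. */
        for (prev = 0; prev < c; prev++)
            if (signature[prev] == signature[c])
                break;
        ec[c] = (prev < c) ? ec[prev] : nclasses++;
    }
    return nclasses;
}
```

With the single class "0123456789" as input, all ten digits share one class and every other character shares another, so a transition table needs two columns instead of 128.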
- -Sskeleton_file
-
overrides the default skeleton file from which flex constructs
its scanners. You'll never need this option unless you are doing
flex maintenance or development.
INCOMPATIBILITIES WITH LEX
flex
is fully compatible with
lex
with the following exceptions:
- -
-
There is no run-time library to link with. You needn't
specify
-ll
when linking, and you must supply a main program. (Hacker's note: since
the lex library contains a main() which simply calls yylex(), you actually
can
be lazy and not supply your own main program and link with
-ll.)
- -
-
lex's
%r
(Ratfor scanners) and
%t
(translation table) options
are not supported.
- -
-
The do-nothing
-n
flag is not supported.
- -
-
When definitions are expanded, flex encloses them in parentheses.
With lex, the following
NAME [A-Z][A-Z0-9]*
%%
foo{NAME}? printf( "Found it\n" );
%%
will not match the string "foo" because when the macro
is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
and the precedence is such that the '?' is associated with
"[A-Z0-9]*". With flex, the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
- -
-
yymore()
is not supported.
- -
-
The undocumented lex-scanner internal variable
yylineno
is not supported.
- -
-
If your input uses
REJECT,
you must run flex with the
-r
flag. If you leave out the flag, the scanner will abort at run-time
with a message that the scanner was compiled without the flag being
specified.
- -
-
The
input()
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule. If
input()
encounters an end-of-file, the normal
yywrap()
processing is done. A ``real'' end-of-file is returned as
EOF.
-
Input can be controlled by redefining the
YY_INPUT
macro.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
action is to place up to max_size characters in the character buffer "buf"
and return in the integer variable "result" either the
number of characters read or the constant YY_NULL (0 on Unix
systems) to indicate EOF. The default YY_INPUT reads from the
file-pointer "yyin" (which is by default
stdin),
so if you
just want to change the input file, you needn't redefine
YY_INPUT - just point yyin at the input file.
-
A sample redefinition of YY_INPUT (in the first section of the input
file):
%{
#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
{ int c = getchar(); result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); }
%}
You can also keep track of things like the
input line number this way, but don't expect your scanner to
go very fast.
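As a further illustration, here is a sketch of a YY_INPUT that takes its characters from a string in memory and counts newlines as they pass. The names input_text, input_lineno and read_from_string are hypothetical, and YY_NULL is repeated only so that the fragment compiles without the flex skeleton.

```c
#ifndef YY_NULL
#define YY_NULL 0   /* normally supplied by the flex skeleton */
#endif

static const char *input_text = "";  /* hypothetical: text still to be scanned */
static int input_lineno = 1;         /* hypothetical: current input line */

/* Copy up to max_size characters into buf, counting newlines as they pass. */
static int read_from_string(char *buf, int max_size)
{
    int n = 0;

    while (n < max_size && *input_text != '\0') {
        if (*input_text == '\n')
            input_lineno++;
        buf[n++] = *input_text++;
    }
    return n;
}

#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
    { int n_read = read_from_string(buf, max_size); \
      result = (n_read == 0) ? YY_NULL : n_read; }
```

Because this version hands over as many as max_size characters at a time, input_lineno counts the lines given to the scanner so far, which may run ahead of the text currently being matched.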
- -
-
output()
is not supported.
Output from the ECHO macro is done to the file-pointer
"yyout" (default
stdout).
- -
-
Trailing context is restricted to patterns which have either
a fixed-sized leading part or a fixed-sized trailing part.
For example, "a*/b" and "a/b*" are okay, but not "a*/b*".
This restriction is due to a bug in the trailing context
algorithm given in
Principles of Compiler Design
(and
Compilers - Principles, Techniques, and Tools)
which can result in mismatches. Try the following lex program
%%
x+/xy printf( "I found \"%s\"\n", yytext );
on the input "xxy". (If anyone knows of a fast algorithm for
finding the beginning of trailing context for an arbitrary
pair of regular expressions, please let me know!)
If you must have arbitrary trailing context, you can use
yyless()
to effect it.
- -
-
flex reads only one input file, while lex's input is made
up of the concatenation of its input files.
ENHANCEMENTS
- -
-
Exclusive start-conditions
can be declared by using
%x
instead of
%s.
These start-conditions have the property that when they are active,
no other rules are active.
Thus a set of rules governed by the same exclusive start condition
describe a scanner which is independent of any of the other rules in
the flex input. This feature makes it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different
from the rest (e.g., comments).
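The effect of an exclusive start condition can be pictured with a hand-written C analogue. This is not flex output, and the names strip_comments, INITIAL_STATE and COMMENT_STATE are hypothetical. While the comment state is active, only the comment "rules" are consulted, and the ordinary copying "rule" is ignored.

```c
enum scan_state { INITIAL_STATE, COMMENT_STATE };

/*
 * Copy src to dst, dropping C-style comments. dst must have room for
 * strlen(src) + 1 characters.
 */
void strip_comments(const char *src, char *dst)
{
    enum scan_state state = INITIAL_STATE;

    while (*src != '\0') {
        if (state == INITIAL_STATE) {
            if (src[0] == '/' && src[1] == '*') {
                state = COMMENT_STATE;   /* enter the mini-scanner */
                src += 2;
            } else {
                *dst++ = *src++;         /* ordinary rule: copy the text */
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                state = INITIAL_STATE;   /* leave the mini-scanner */
                src += 2;
            } else {
                src++;                   /* comment rule: discard */
            }
        }
    }
    *dst = '\0';
}
```

In a flex input the same separation would come from declaring the comment state with %x, so that the rules for ordinary text need no guard of their own.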
- -
-
flex dynamically resizes its internal tables, so directives like "%a 3000"
are not needed when specifying large scanners.
- -
-
The scanning routine generated by flex is declared using the macro
YY_DECL.
By redefining this macro you can change the routine's name and
its calling sequence. For example, you could use:
#undef YY_DECL
#define YY_DECL float lexscan( a, b ) float a, b;
to give it the name
lexscan,
returning a float, and taking two floats as arguments.
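The mechanism can be tried without a generated scanner. The sketch below assumes an ANSI C compiler and therefore uses a prototype in place of the K&R-style parameter list shown above. The function body is only a stand-in for the scanner body that flex would supply.

```c
#undef YY_DECL
#define YY_DECL float lexscan(float a, float b)

/*
 * The scanning routine is declared as YY_DECL followed by its body.
 * This body is a placeholder so the expansion can be compiled and called.
 */
YY_DECL
{
    return a + b;
}
```

After preprocessing, the definition reads "float lexscan(float a, float b) { ... }", so callers use lexscan( x, y ) where they would otherwise call yylex().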
- -
-
flex generates
#line
directives mapping lines in the output to
their origin in the input file.
- -
-
You can put multiple actions on the same line, separated with
semi-colons. With lex, the following
foo handle_foo(); return 1;
is truncated to
foo handle_foo();
flex does not truncate the action. Actions that are not enclosed in
braces are terminated at the end of the line.
- -
-
Actions can be begun with
%{
and terminated with
%}.
In this case, flex does not count braces to figure out where the
action ends - actions are terminated by the closing
%}.
This feature is useful when the enclosed action has extraneous
braces in it (usually in comments or inside inactive #ifdef's)
that throw off the brace-count.
- -
-
All of the scanner actions (e.g.,
ECHO, yywrap ...)
except the
unput()
and
input()
routines,
are written as macros, so they can be redefined if necessary
without requiring a separate library to link to.
FILES
- flex.skel
-
skeleton scanner
- flex.fastskel
-
skeleton scanner for -f and -F
- flexskelcom.h
-
common definitions for skeleton files
- flexskeldef.h
-
definitions for compressed skeleton file
- fastskeldef.h
-
definitions for -f, -F skeleton file
SEE ALSO
lex(1)
M. E. Lesk and E. Schmidt,
LEX - Lexical Analyzer Generator
AUTHOR
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson. Original version by Jef Poskanzer. Fast table
representation is a partial implementation of a design done by Van
Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
Thanks to the many flex beta-testers, especially Casey Leedom,
Nick Christopher, Chris Faylor, Eric Goldman, Craig Leres, Mohamed el Lozy,
Esmond Pitt, Jef Poskanzer, and Dave Tallman. Thanks to John Gilmore,
Bob Mulcahy,
Rich Salz, and Richard Stallman for help with various distribution headaches.
Send comments to:
Vern Paxson
Real Time Systems
Bldg. 46A
Lawrence Berkeley Laboratory
1 Cyclotron Rd.
Berkeley, CA 94720
(415) 486-6411
vern@lbl-{csam,rtsg}.arpa
ucbvax!lbl-csam.arpa!vern
DIAGNOSTICS
flex scanner jammed -
a scanner compiled with
-s
has encountered an input string which wasn't matched by
any of its rules.
flex input buffer overflowed -
a scanner rule matched a string long enough to overflow the
scanner's internal input buffer (as large as
BUFSIZ
in "/usr/include/stdio.h"). You can edit
flexskelcom.h
and increase
YY_BUF_SIZE
and
YY_MAX_LINE
to increase this limit.
REJECT used and scanner was
not generated using -r -
just like it sounds. Your scanner uses
REJECT.
You must run flex on the scanner description using the
-r
flag.
old-style lex command ignored -
the flex input contains a lex command (e.g., "%n 1000") which
is being ignored.
BUGS
Use of unput() or input() trashes the current yytext and yyleng.
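One way to live with this is to copy the matched text before calling either routine. The helper below is a sketch and is not part of flex; the name save_match is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a freshly allocated, NUL-terminated copy of the first len
 * characters of text, or NULL if memory runs out. The caller frees it.
 */
char *save_match(const char *text, int len)
{
    char *copy = malloc((size_t)len + 1);

    if (copy != NULL) {
        memcpy(copy, text, (size_t)len);
        copy[len] = '\0';
    }
    return copy;
}
```

An action would call save_match( yytext, yyleng ) first and work from the copy after any use of unput() or input().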
Use of unput() to push back more text than was matched can
result in the pushed-back text matching a beginning-of-line ('^')
rule even though it didn't come at the beginning of the line.
Nulls are not allowed in flex inputs or in the inputs to
scanners generated by flex. Their presence generates fatal
errors.
Do not mix trailing context with the '|' operator used to
specify that multiple rules use the same action. That is,
avoid constructs like:
foo/bar |
bletch |
bugprone { ... }
They can result in subtle mismatches. This is actually not
a problem if there is only one rule
using trailing context and it is the first in the list (so the
above example will actually work okay). The
problem is due to fall-through in the action switch statement,
causing non-trailing-context rules to execute the
trailing-context code of their fellow rules. This should
be fixed, as it's a nasty bug and not obvious. The proper fix is
for flex to spit out a FLEX_TRAILING_CONTEXT_USED #define and then
have the backup logic in a separate table which is consulted for
each rule-match, rather than as part of the rule action. The
place to do the tweaking is in add_accept() - any kind soul want
to be a hero?
The pattern:
x{3}
is considered to be variable-length for the purposes of trailing
context, even though it has a clear fixed length.
Due to both buffering of input and read-ahead, you cannot intermix
calls to, for example,
getchar()
with flex rules and expect it to work. Call
input()
instead.
The total number of table entries listed by the
-v
flag excludes the number of table entries needed to determine
what rule has been matched. The number of entries is equal
to the number of DFA states if the scanner was not compiled
with
-r,
and greater than the number of states if it was.
The scanner run-time speeds have not been optimized as much
as they deserve. Van Jacobson's work shows that they can go quite
a bit faster still.