home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Simtel MSDOS 1992 September
/
Simtel20_Sept92.cdr
/
msdos
/
pascal
/
qparser.arc
/
READ.ME
< prev
next >
Wrap
Text File
|
1985-09-30
|
53KB
|
1,454 lines
QPARSER Demonstration System
Version 2.3, September 3, 1985
Copyright (C) 1985, QCAD Systems, Inc.
Introduction
QPARSER provides a strategy and a set of software tools for compiler and
translator development that we and other professionals have found to be
highly productive. The strategy is one of writing semantic actions that
respond to production rule ``triggers''. The tool set includes an LR parser
generator and several fully worked-out example programs.
QPARSER provides the grammar-directed ``front-end'', as well as several
software tools useful in developing the semantics ``back-end'' of a
compiler. A comprehensive user manual is part of the full system.
The front-end is automatically generated, solving several difficult software
problems. This is the half that scans and analyzes the input language forms,
and is capable of extracting meaning from them by relating them back to the
rules of the defining grammar.
The back-end, which must be specially written for each application, is the
operational side of the translator. It performs actions in response to the
extracted meaning of input forms. Since translator actions are very general
in nature, it's necessary to write these as Pascal or C program fragments
for your application.
QPARSER will generate a parser automatically, but not a complete translator.
The parser will contain the necessary source file access, a lexical scanner,
symbol table functions, debugging tools and semantics tools.
QPARSER Features
The principal features of QPARSER are:
* Correctness. The language accepted by the translator will be exactly
that specified by the grammar.
* Detection of Ambiguity. If your grammar is ambiguous, or if the
parser is unable to assure a correct translation, a diagnostic message
will be printed.
* Automatic generation of two key components. The lexical scanner and
the LR parser are automatically generated.
* Efficient table-driven parser operations. For a large grammar, most
compiler experts agree that a table-driven LR1 approach is superior in
both performance and space requirements to any other. Several
optimizations are made to minimize table size.
* Modularity of semantic operations. A translator programming task
will become neatly partitioned into a number of relatively water-tight
compartments.
page 1
QPARSER Demonstration System
* Efficient lexical scanner. The lexical scanner operates by a CASE
statement transfer on the next character to be scanned from the
source. No backtracking, table searching or waste motion is involved.
* Efficient and powerful symbol table mechanism. Symbol searching is
built into the lexical scanner, which is automatically generated.
* Error recovery. A reasonable error recovery system is built into
each translator. No extra parser table space is required.
* Ease of extending or changing the language. Once a translator is
written following the guidelines in the full user manual, it'll be
easy to extend or modify.
* Ease of extending or changing the lexical scanner. The scanner is
provided in well-documented source form. Its underlying assumptions
are described in the user manual.
What's on the Demo Disk
The demo system diskette contains the following files:
ShowQP.COM -- a quick demo program
QPDEMO.TXT -- needed by ShowQP
LR1.COM -- the parser generator program
LR1P.COM -- skeleton expander
PMACS.TXT -- Pascal form macro file
LR1SKEL.PAS -- a parser ``skeleton'' file
CALCDBUG.PAS, SKELRTBL.PAS, SKELNUM.PAS,
SKELSYMS.PAS, SKELDTBL.PAS -- skeleton source files
CALC.GRM -- grammar definition for calculator
CALCSKEL.PAS -- skeleton file customized for calculator
CALCSEM.PAS, CALCUTIL.PAS -- calculator-specific semantics
CALC.COM -- executable calculator file
READ.ME -- this text file
The Quick Demo
Select the floppy drive containing the demo diskette as the default drive.
For example, if your floppy drive is A, then enter A: at the MS-DOS command
level. Then run ShowQP by typing its name. Youll be able to step through a
simple demonstration of some of Qparsers features. (If you see some peculiar
characters, you probably need to adjust your CONFIG.SYS file according to
the directions at the beginning of the demo.)
Configuration
We're going to assume that your microcomputer system is an IBM PC/XT with
the PC-DOS operating system, though you can also run the demos on an
IBM-compatible machine. You'll also be using some of the standard DOS
functions in the demonstrations. A Pascal compiler and screen-oriented
editor would make this demo system more meaningful, but aren't necessary.
The full QPARSER system requires a Pascal or C compiler and a text editor.
We recommend a system with at least 256 Kbytes of memory and a hard disk.
Turbo Pascal (which includes a screen-oriented text editor) is provided with
page 2
QPARSER Demonstration System
the full system, but not with the demo system. Order a copy of Turbo Pascal
from QCAD Systems, and use it for additional experiments with the demo
system.
Installation
If your computer has a hard disk: Copy the files from the QPARSER demo
diskette to a new directory using the following commands, which assume that
the hard disk is the default drive:
mkdir \qparser
copy a:*.* \qparser
cd \qparser
You will now be able to run the calculator examples from the \qparser
directory.
If your computer does NOT have a hard disk: The QPARSER demo diskette is
double-sided, which may not be readable from certain older PC versions. Call
us if this is a problem -- we can supply the demo system on two single-sided
diskettes. In any case, we recommend copying some of the demo diskettes
files -- for example, its .TXT and .PAS files -- to a blank diskette. The
following formats a new diskette, then copies all .PAS files from disk drive
a to drive b:
format b:
copy a:*.pas b:
Getting Started
QPARSER is easy to try out. The following operations will analyze the
``pocket calculator'' grammar and prepare a source Pascal file for a working
interactive calculator.
Start by entering the two commands
LR1 CALC
LR1P CALC -s CALCSKEL
CALC.GRM contains the calculator's grammar and is read by the first command
(the default extension .GRM is assumed). LR1 produces a grammar table
CALC.TBL. The second command reads CALC.TBL and CALCSKEL.PAS and produces a
full Pascal program file CALC.PAS. A short report file will appear on the
console.
Ordinarily, the CALC.PAS file would then need to be compiled, but we've
supplied a compiled version of this file (CALC.COM) so that you can execute
the calculator example and try out the parsing diagnostic tools without a
compiler.
If you have a Pascal compiler, you might try compiling CALC.PAS instead. It
follows Turbo Pascal conventions, and should compile without any complaints
with that compiler. If you are using another Pascal compiler, you may have
to edit CALC.PAS or CALCSKEL.PAS in order to accommodate compiler
differences.
page 3
QPARSER Demonstration System
Executing CALC.COM
When you execute CALC.COM, you should see a prompt (>). You'll then be able
to type arithmetic expressions, whose values will be reported on the
console. For example,
5 + 6.5*(9 - 3)
yields the result 44. An assignment statement, e.g.,
x := 5 + 15
causes variable x to be declared and receive the value 20. Then an
assignment such as
y := x + 5
can be evaluated, associating value 25 with variable y.
The point of this exercise is not to run a calculator, but to demonstrate
that a simple grammar and a short semantics file is sufficient to produce an
interpreter for a simple language. We're next going to look at the grammar
file (CALC.GRM) and the semantics files (CALCSEM.PAS and CALCUTIL.PAS) to
see what QPARSER does and what is required of you as the programmer.
The Calculator Grammar
Any of the source files can be examined by using a text editor of your
choice, or the EDLIN or MORE command -- see the DOS reference manuals for
instructions.
The calculator grammar file CALC.GRM is very short, so we'll reproduce it
below:
\ CALC -- A simple four-function calculator grammar
\ which allows variable assignments.
Goal -> Stmts QUIT #quit
Stmts -> Stmts Stmt
-> Stmt
Stmt -> Expr <eol> #prtval
-> <identifier> := Expr <eol> #assign
-> <eol> \ allow an empty line
Expr -> Expr + Term #plus
-> Expr - Term #minus
-> Term
Term -> Term * Fact #mpy
-> Term / Fact #divide
-> Fact
Fact -> Primary
-> - Primary #uminus
Primary -> ( Expr ) #parens
-> <identifier> #variable
-> <real> #realval
-> <integer> #intval
What does all this mean? First, anything starting with a backslash (\) is
page 4
QPARSER Demonstration System
just a comment. Secondly, a symbol preceded by a sharp sign (#) is a
production rule tag, whose purpose will be explained later. What remains is
a list of production rules, whose general form is like this:
<left-member> -> <right-member>
Whenever a left-member is missing, it is assumed to be the same as for the
preceding rule. Thus the third rule, written
-> Stmt
above, is actually
Stmts -> Stmt
The primary purpose of a grammar is to define a language -- in this case,
the algebraic formulas and assignment statements that can be legally entered
when the calculator is executed. A grammar is a set of production rules, or
just rules for short. Each rule has exactly one left member element and one
or more right member elements, each of which we shall call a token. The
tokens are separated by one or more spaces.
For example,
Term -> Term * Fact
is a rule in this grammar. Its left member token is Term, and the three
right member tokens are Term, *, and Fact. It says, in essence, that any
input string form that is to be considered a Term must consist of a Term,
followed by an asterisk (*), followed by a Fact.
A Term and a Fact can each be expanded into any of a large number of token
sequences, while the asterisk (*) stands for itself. Tokens of the former
kind are called nonterminals, while tokens of the latter kind are called
terminals.
Another rule is
Term -> Term / Fact
The left member token Term has been omitted in the grammar file, and is
therefore inherited from the previous rule in the grammars rule list. This
rule says that a Term consists of a Term followed by a slash (/) followed by
a Fact.
The first rule in the list
Goal -> Stmts QUIT
will expand into the entire language. The left member token Goal of this
rule is not permitted anywhere else in the grammar. This rule says that any
legal language form must consist of a member of Stmts followed by the token
QUIT.
page 5
QPARSER Demonstration System
How a GOAL Symbol Expands into Forms
Lets start with the first rule:
Goal -> Stmts QUIT
We see that Stmts appears on the left of two rules:
Stmts -> Stmts Stmt
Stmts -> Stmt
This means that we can choose either one of these two rules, and then
substitute the right members for one appearance of the left member. Let's
choose the first rule, and make the substitution:
Goal -> Stmts QUIT -> Stmts Stmt QUIT
Well, we don't seem to be getting anywhere, because the token Stmts appears
in our new, expanded form. However, we can close this cycle by choosing the
second Stmts rule for a substitution:
Goal -> Stmts QUIT -> Stmts Stmt QUIT -> Stmt Stmt QUIT
It should be clear that we can get a sequence of Stmt forms by choosing the
first rule several times. Each substitution step is called a derivation
step; we are deriving a sentence in the language from the Goal token.
Now let's look at the Stmt rules. There are three of them, as follows:
Stmt -> Expr <eol>
Stmt -> <identifier> := Expr <eol>
Stmt -> <eol>
The special token <eol> is a predefined terminal token that stands for an
enter. The special token <identifier> stands for any Pascal-like user name;
for example, L1, SAM_SMITH, MARY_JONES are all legal <identifier>s. The Expr
is associated with some more rules, permitting more substitutions.
Let's choose the first of these Stmt rules for the first substitution. We
then have, starting from the Goal again:
Goal -> Stmts QUIT -> Stmt QUIT -> Expr <eol> QUIT
More substitutions are now possible. There are three Expr rules; two of
these include an Expr and a Term. There are also three Term rules, etc.
Notice that the algebraic tokens +, -, *, / appear in these.
We'll continue the expansion by making various choices along the way. We
suggest trying one yourself on paper, to see just how many different ways a
Goal can be expanded into sequences of algebraic formulas.
page 6
QPARSER Demonstration System
Goal -> Stmts QUIT
-> Stmt QUIT
-> Expr <eol> QUIT
-> Expr + Term <eol> QUIT
-> Term + Term <eol> QUIT
-> Fact + Term <eol> QUIT
-> Primary + Term <eol> QUIT
-> <integer> + Term <eol> QUIT
-> <integer> + Fact <eol> QUIT
-> <integer> + Primary <eol> QUIT
-> <integer> + ( Expr ) <eol> QUIT
-> <integer> + ( Term - Term ) <eol> QUIT
-> <integer> + ( Term * Fact - Term ) <eol> QUIT
-> <integer> + ( Fact * Fact - Term ) <eol> QUIT
-> <integer> + ( Primary * Fact - Term ) <eol> QUIT
-> <integer> + ( <real> * Fact - Term ) <eol> QUIT
-> <integer> + ( <real> * Primary - Term ) <eol> QUIT
-> <integer> + ( <real> * <identifier> - Term ) <eol> QUIT
-> <integer> + ( <real> * <identifier> - Fact ) <eol> QUIT
-> <integer> + ( <real> * <identifier> - Primary ) <eol> QUIT
-> <integer> + ( <real> * <identifier> - <integer> ) <eol> QUIT
Whew! Fortunately, that's as far as we can go, because all the tokens are
terminal. Now, <integer> stands for any decimal integer, and <real> for a
floating-point number. So our derived sentence might look like this:
15 + (17.65 * SAM - 45) <eol> QUIT <eol>
The <eol> will be interpreted as an `enter' when the parser is executed. The
last enter will cause an exit from the calculator program.
Parsing
We've seen how a grammar can be written that defines some language we're
interested in. At this point, QPARSER can take over and produce a program
that can invert these derivation steps. When a derivation step is inverted,
we say that a reduce step or a reduction has occurred. Such a program is
called a bottom-up parser.
The program LR1 performs an analysis of the grammar and produces a set of
parser tables. Then program LR1P produces a parser program in Pascal source,
using a Pascal-based macro file PMACS.
The actions performed by LR1 and LR1P are roughly as follows:
1. The nonterminal and terminal tokens are identified. The terminal
tokens are used to construct a lexical scanner, which when executed
reads source strings and breaks them into the terminal token pieces.
2. The grammar rules are then used to construct a so-called LR state
machine, which is an abstract description of a parser.
3. The abstract state machine description is turned into a specific
Pascal or C program. The state machine is carried in the program in
tabular form, and the parser is a table interpreter. The lexical
scanner is expressed directly as Pascal or C statements.
page 7
QPARSER Demonstration System
Following the Parser's Steps
You can follow the parsing steps of the CALC.GRM grammar by running CALC.COM
again. This time, type the character ! before entering an expression, for
example,
! 15 + (17.65 * 22 - 45)
The ! character will bring up a debugging prompt before any of the rest of
the line is scanned. The debug prompt looks like this:
I(dentifier, D(ebug level, A(ll symbols, C(ontinue?
Type D, then 1 (enter), to choose debug level 1. You'll get the debug prompt
again, so type C to continue. You should see a production printed:
Primary -> <integer>
This indicates that the 15 in the input string has been scanned (the
<integer>) and has been reduced to a Primary. The debug prompt will again
appear, so type C. The next production appears:
Expr -> Term
This indicates that Primary has been reduced to Term and now Term is about
to be reduced to Expr. The production rule Term -> Primary will always be
skipped over during parsing due to an optimization in the parsing tables.
Typing the next C will bring up
Primary -> <real>
This says that the 17.65 in the input string has been scanned and reduced to
a Primary. Keep typing C, and you'll see the rest of the productions. Here
they are, as they will appear during the parsing process; weve added
comments to show what's happening:
Primary -> <integer> \ 15
Term -> Term * Fact \ 17.65 * 22 = 388.3
Expr -> Term
Primary -> <integer> \ 45
Expr -> Expr - Term \ 388.3 - 45 = 343.3
Primary -> ( Expr )
Expr -> Expr + Term \ 15 + 343.3 = 358.3
Stmt -> Expr <eol> \ the whole line
At this point, C will print a result and produce another prompt:
= 358.3
>
Lets type in QUIT. We'll then see one more production:
page 8
QPARSER Demonstration System
> QUIT
Goal -> Stmts QUIT
One more C and the CALC program is done.
Looking at the Semantics Stack
In the above example, you've surely noticed that the first number (15) was
scanned, then set aside temporarily until the parenthesized expression
(17.65*22-45) could be scanned. Where was the value 15 kept while this extra
work was going on?
The answer is that it was held on a semantics stack. The semantics stack is
a fundamental data structure of QPARSER. Every time a token is read from the
input stream, state information is pushed on the stack associated with the
token. A token like <integer> causes the integer value to be saved on the
stack. The token <real> causes a floating-point number to be saved, and the
token <identifier> causes a name to be saved.
When a production rule is applied, the top of the semantics stack will
always carry information that corresponds to the right members of the
production. Let's do another trace and see how that information is carried.
Execute CALC.COM again with the same expression:
> ! 15 + (17.65*22-45)
This time, choose D(ebug level 2, then C(ontinue. Youll see the following
display:
<integer> FIXED: 15
Primary -> <integer>
This says that the semantics stack contains just one item, the first integer
in the input stream. FIXED refers to its type, and the 15 is of course its
value as held in the stack.
The next C(ontinue produces the following:
Primary FLOAT: 1.500E+01
Expr -> Term
This announces that the fixed point number 15 has been converted into a
floating point number. That action was caused by a semantics action in
response to the production Primary -> <integer>. We'll see where that
happened in the source program later.
The next C(ontinue produces the following:
Expr FLOAT: 1.500E+01
+ OTHER:
( OTHER:
<real> FLOAT: 1.765E+01 \ TOS
Primary -> <real>
Here we've scanned the +, the ( and the real number 17.65. The stack has
page 9
QPARSER Demonstration System
four items -- its top (noted as TOS -- ``top-of-stack'' -- above) is shown
at the end of the stack report. The production Primary -> <real> is about to
be applied, and its right member refers to the <real> on the top of the
stack.
Now we see where the 15 went while the remainder of the expression is being
worked on -- it's held in the stack beneath the remaining material. It will
reappear later as the other material is reduced and popped from the stack.
Push C twice -- well skip one step involving the production Primary ->
<integer>. You should see the following stack:
Expr FLOAT: 1.500E+01
+ OTHER:
( OTHER:
Primary FLOAT: 1.765E+01 \ TOS-2
* OTHER: \ TOS-1
Primary FLOAT: 2.200E+01 \ TOS
Term -> Term * Fact
The productions right member (Term * Fact) seems not to correspond with the
three top stack members, but it does. The parser generator has optimized
parsing by eliminating some unnecessary reduction steps. The Term
corresponds to the Primary at TOS-2, and the Fact corresponds to the Primary
at TOS.
This production indicates that a multiplication is supposed to take place
between the numbers in TOS and TOS-2. The product is 388.3 and will appear
after you hit C again:
Expr FLOAT: 1.500E+01
+ OTHER:
( OTHER:
Primary FLOAT: 3.883E+02 \ TOS
Expr -> Term
Notice that the top three stack elements have been removed and replaced by a
single element -- the floating point number 388.3. Hit C twice. You should
get the following display:
Expr FLOAT: 1.500E+01
+ OTHER:
( OTHER:
Expr FLOAT: 3.883E+02 \ TOS-2
- OTHER:
Primary FLOAT: 4.500E+01 \ TOS
Expr -> Expr - Term
Here, the last number (45) has been scanned. The stack is set up for a
subtraction. Since the Expr in the production corresponds to TOS-2, and the
Term to TOS, the calculator program is expected to subtract 45 from 388.3,
yielding 343.3. That's shown in the next display, after hitting C:
page 10
QPARSER Demonstration System
Expr FLOAT: 1.500E+01
+ OTHER:
( OTHER:
Expr FLOAT: 3.433E+02
) OTHER:
Primary -> ( Expr )
This production won't do any arithmetic, but must make sure that the Expr
value 343.3 will appear on the stack top, associated with the Primary. The
reduce action causes the parentheses to disappear, yielding:
Expr FLOAT: 1.500E+01
+ OTHER:
Primary FLOAT: 3.433E+02
Expr -> Expr + Term
At last, the 15 reappears at the right place to perform an addition. That
operation yields 358.3, and the smaller stack:
Expr FLOAT: 3.583E+02
<eol> OTHER:
Stmt -> Expr <eol>
This production prints the resulting Expr value on the console, so after
hitting C, you should see:
= 358.3
>
Type in ``QUIT'', then another C to end the program.
Some Observations
We've run through a complete trace for the pocket calculator example. What
should be apparent is how the form of the grammar rules determine when
operations are needed, and also how the semantics stack mechanism ensures
that information is available when it is needed.
The parser generator system has solved several algorithmic tasks for you:
* The derivation of an input string has been uncovered and
reproduced in backward order, as a parser.
* Information is carried on the semantics stack until needed later.
* Names and numbers are scanned and put into the stack at the right
place.
* When each production rule pops up, the top of the semantics stack
carries just what is needed for the operation it implies.
You can do any number of different things when a production rule pops up.
We'll look at how that works next. We'll also take a look at some of the
type declarations and procedures that implement the parser.
page 11
QPARSER Demonstration System
Semantics Coding
Most of what you've seen in the semantics trace comes through the parser
generator tools and the skeleton files. However, one important part isn't
automatic -- you must write a semantics procedure called APPLY.
Procedure APPLY is the interface between the automatically generated parser
and the semantics operations that you must design and write. Please look at
APPLY, founnd in file CALCSEM.PAS, while you read on.
APPLY is essentially a large CASE statement, whose tag names come from the
tags attached to the production rules in the grammar. We've deferred
discussion of these tags until now. Recall that a production rule tag is the
name following a sharp sign in a production. For example, in CALC.GRM, the
production
Expr -> Expr + Term #plus
carries the tag plus, which you will find in the CALCSEM.PAS source. The
APPLY action for this production rule is essentially to collapse the
information in the top three elements of SEMSTACK (corresponding to the
right member Expr + Term) into a single element (corresponding to the left
member Expr), as shown in figure 1.
-----------------------------------------------------
TOS
+---+---+---+---+---+
| | | E | + | T | semstack: BEFORE apply action
+---+---+---+---+---+ production E -> E + T
^ ^ ^
| | +---- stackx
| +-------- stackx-1
+------------ stackx-2
. . . . . . . . . . . . . . . . . . . . . . . . . . .
TOS
+---+---+---+
| | | E | semstack: AFTER apply action
+---+---+---+
^
+------------ stackx
Figure 1. Before and after APPLY operations on production
E -> E + T
----------------------------------------------------------
Here's what procedure APPLY looks like, with some of the material removed
for the sake of discussion:
page 12
QPARSER Demonstration System
procedure APPLY(PFLAG, PRODLEN: int; TSEMP: semrecp);
{ Apply customized for the calculator. All operations are kept
track of on the semantic stack. }
begin
case pflag of
{ omitted material }
DIVIDE, {Term -> Term / Fact}
MINUS, {Expr -> Expr - Term}
MPY, {Term -> Term * Fact}
PLUS: {Expr -> Expr + Term}
eval_binop(pflag, semstack[stackx-2],
semstack[stackx], tsemp);
{ omitted material }
end { apply case }
end;
Notice the PLUS tag. When the production Expr -> Expr + Term is ready to be
applied, the parser system will call APPLY with a value of PFLAG that will
direct control to the PLUS statement. This statement happens to be shared
with DIVIDE, MINUS, and MPY, but it doesn't have to be. You can -- and in
fact you must -- distinguish production rules with distinct tag names. You
can also omit the tag on any production that requires no special operation.
Notice that the function EVAL_BINOP passes the flag parameter PFLAG, and
three more parameters. The second parameter, SEMSTACK[STACKX-2], refers to
the TOS-2 element of the semantics stack. The third parameter,
SEMSTACK[STACKX], refers to the TOS element. As we've seen in the trace,
these carry information associated with the token.
The result of a stack operation is passed back through the variable TSEMP.
Procedure EVAL_BINOP
This procedure is in file CALCUTIL.PAS. Note that the two operands are
passed as OPND1 and OPND2, respectively, and these are of type SEMRECP,
which is the type of SEMSTACK[xxx].
EVAL_BINOP just sorts out which operation it is to perform, checking for the
possibility that the entry might be OTHER. (OTHER will creep in through a
syntax error recovery patch.) The RESULT value will always be a real number,
and SEM_TYPE will therefore always be FLOAT. EVAL_BINOP also attempts to
produce a reasonable result, even though either of the operands might be the
default type OTHER.
The procedure WRITE_VALUE, also in CALCUTIL.PAS, essentially decides how to
print a number. All numbers are stored in floating-point form, including
integers, but a real number might be sufficiently close to an integer to
justify printing it that way. That's most of WRITE_VALUE. There are clearly
other strategies that can be adopted.
The last two procedures, INIT_SEM and END_SEM, are called before any parsing
action is taken, and after all actions are complete, respectively. You can
use these for any initializations required at these times. Usually none is
required.
page 13
QPARSER Demonstration System
The Types of SEMSTACK and TSEMP
You may wish to look at the Pascal type of the variables SEMSTACK and TSEMP.
They are part of the global declarations in file CALCSKEL.PAS. SEMSTACK is
an array of pointers to the record type SEMREC. Its top of stack will be the
integer index STACKX.
Record SEMREC carries different cases for the different kinds of things to
be carried in the semantic stack -- IDENT for identifiers, FIXED for
integers, FLOAT for real numbers, etc. The record variable SEMT indicates
which kind of object is in place, making it possible for a general function
like EVAL_BINOP to decide how to carry out an operation.
The default SEMT type OTHER is used for semantic objects that carry no
special information, such as the operation tokens +, *, etc.
TSEMP is a pointer to a SEMREC record. In passing information back through
an APPLY call, the record field pointed to by TSEMP should be filled with
something that will appear on the stack. You can choose to do nothing, in
which case TSEMP will point to the default case OTHER. There's also a
special case that's handy to know about: any single production with no tag
causes its right member to be copied into its left member. A single
production is a production rule with exactly one right-member token. For
example, the rule
Fact -> Primary
qualifies as a single production. When its tag is absent, the Primary
information is just carried along on the stack without change.
Symbols and the Symbol Table
QPARSER provides a complete set of symbol table access functions. You may
wish to look at their source, in file SKELSYMS.PAS. The global declarations
associated with these functions are in CALCSKEL.PAS -- look at the record
SYMTABTYPE, and the other types and variables associated with it.
A symbol is picked up by the lexical scanner whenever the grammar token
<identifier> is scanned. The lexical scanner in QPARSER assumes that names
and other tokens follow Pascal rules -- but you can modify or rewrite the
scanner to suit other language disciplines.
When an <identifier> is scanned, a default symbol table entry is built for
it, and a pointer to that entry appears in the semantics stack. Symbol table
entries have the form of the SYMTABTYPE record, and are allocated from the
Pascal heap. An efficient name search algorithm is used, based on a hash
table. These search algorithms are expressed in the functions in file
SKELSYMS.PAS.
The skeleton file symbol table functions are suitable for flat or
block-scoped names. The user manual and the mini-Pascal compiler (included
in the full QPARSER package) describe how they can be used in considerable
detail to cover a variety of special situations. Names can be unlinked from
the table at the end of a block, and can also be deallocated from the heap.
page 14
QPARSER Demonstration System
Tracing Symbols
You can see how the symbol tracing system works by running CALC.COM again.
This time, keep the debug level at 0, and enter each of the following
formulas:
X := 15
Y := X - 40
Z := X + Y
The calculator should report that X = 15, Y = -25, and Z = -10. Now enter a
! character:
!
You will get the usual debug menu:
I(dentifier, D(ebug level, A(ll symbols, C(ontinue?
When you type A, you will be shown all the names known in the symbol table:
Z Y X
(The order of these names is an accident of the hash table.) When you type
I, you'll be prompted for a name, and then its attributes as found in the
symbol table will be reported:
I(dentifier, D(ebug level, A(ll symbols, C(ontinue? I
What symbol? y
Y REAL_VAR 0: -2.5000E+01
Some of these attributes will always be provided. Others will require
writing a short program segment to produce. In the above line, Y, REAL_VAR
and the 0 (scope level) are essentially automatic. Printing the numeric
value (-2.5000E+01) requires adding a write to the standard debugging file.
To see what's needed, look at CALCDBUG.PAS, which contains the trace and
interactive dump procedures. Most of this is generic to any grammar or
language. The line needed to support the calculator may be found in
procedure DUMP_SYM, as follows:
{---------- following line is specific to CALCDBUG -----------}
real_variable: write(rfile, rval);
Error Recovery
The calculator program illustrates the generic error recovery system
provided with every generated translator. Error recovery operates from the
parsing tables used by the parser. It attempts to find a local patch in the
vicinity of the error. The patch must be compatible with the parsing
operations and is searched for in a special procedure that does not
interfere with the mainstream of parsing operations. When a patch is found,
the machine state and stack are returned to the main parser such that
parsing can continue as though there were no error.
The error recovery system does not depend on line endings or special
page 15
QPARSER Demonstration System
delimiter tokens. It will operate as effectively for a block-structured
language as for a line-oriented language. It will usually find a reasonable
patch in one step. It does not require scanning ahead through more tokens in
order to find a patch, although that would help make a more effective
decision.
Try running CALC.COM with some invalid formulas. The following illustrates
two cases and the recovery report. Note that the syntax error report points
to the position of the error in the input line:
> x y + 15
^
ERROR: syntax error
Type any character to continue:
= 5.0000000001
Here the patch yielded some new expression; the program nevertheless
produced a partial result.
> x := y ++ 15
^
ERROR: syntax error
Type any character to continue:
Undefined variable ERR#AA
X := -10.000000001
Here, the patch amounted to creating and inserting a new identifier between
the two + signs, ERR#AA. This will be tagged as an ERRSYM in the symbol
table, but our semantics routine didn't pay attention to that and complained
about the undefined variable. It's possible to continue semantics checking,
working around the possible error insertions, but it would be easier to just
suppress all semantics operations upon finding the first syntax error.
SUMMARY: What's Automatic and What Isn't
We can now review what QPARSER provides automatically and whats required of
you as a programmer in order to generate a complete translator or compiler.
In general, you must design and implement the grammar and the semantic
functions. You will need to expand the number of categories of the SEMREC
and SYMTABTYPE record structures, since these contain only a few simple
types.
When these are expanded, some additions will be needed (e.g. to
SKELDBUG.PAS) to support symbolic tracing and debugging, as weve seen.
Designing and writing the semantic functions start with the APPLY procedure.
When you are starting from scratch with a grammar, the parser generator will
produce a dummy APPLY procedure for you automatically. It will look like
this, using the CALC.GRM grammar as an example:
page 16
QPARSER Demonstration System
procedure APPLY(PFLAG, PRODLEN: int; TSEMP: semrecp);
begin
case pflag of
ASSIGN:
begin { Stmt -> <identifier> := Expr <eol> }
writeln(rfile, ASSIGN applied.);
end;
DIVIDE:
begin { Term -> Term / Fact }
writeln(rfile, DIVIDE applied.);
end;
{ ... material omitted ... }
UMINUS:
begin { Fact -> - Primary }
writeln(rfile, UMINUS applied.);
end;
VARIABLE:
begin { Primary -> <identifier> }
writeln(rfile, VARIABLE applied.);
end;
end
end;
The generator has provided a separate CASE item for each tagged production,
in alphabetic order, and has included the production for reference purposes
as a comment. It has also provided a writeln to the report file rfile.
You can compile and execute a generated program based on nothing more than a
grammar. Tagged productions will announce themselves during its execution as
indicated above. You can then start to fill in the separate slots in the
APPLY case table with ``real semantics operations, testing the result as you
add various modules.
The generic tracing and debugging tools will also operate with no special
additions, although they cannot report your SEMREC or SYMTABTYPE extensions
without some help.
The Full QPARSER System
This demo guide is essentially the introductory chapter of the full QPARSER
user manual. The full system contains, in addition to an unrestricted parser
generator and the calculator example discussed above, all of the following:
1. A comprehensive user manual containing a language tutorial, a
detailed discussion of the lexical scanner, symbol table, semantics
stack, use of the heap, the example assembler, simulator and
mini-Pascal compiler, and a technical summary. The user manual also
contains listings of all supplied source programs.
2. A C skeleton and a companion macro file CMACS.TXT that will
produce a complete, ready-to-compile parser in Kernighan and Ritchie
C.
3. A grammar checking and analysis program. This can produce a
cross-reference table of the grammar and report all terminal and
nonterminal tokens. It looks for useless nonterminals and other
page 17
QPARSER Demonstration System
grammar problems, such as a single production cycle. It can report
FIRST, FOLLOW and PREDECESSOR sets, which are valuable in
understanding LR conflicts and in designing semantics structures.
4. A simulator and assembler for a simple Pcode-style stack machine
is provided, in Pascal source form. The assembler makes use of the LR
translator system -- its grammar is also provided. If you want to
write an assembler for any purpose or machine, this provides a good
starting point.
5. A Pascal subset compiler is provided in Pascal source form, along
with its grammar and a detailed discussion of its internals. This
illustrates the design of a user type declaration scheme, full
expression evaluation, constant folding, several control structures
with some optimizations, compound name access evaluation with
optimizations, and a procedure call/exit mechanism. If you want to
write a compiler or interpreter for a language using any or all of
these Pascal features, this grammar and source material is a good
starting point.
6. A copy of Turbo Pascal is provided. All of the Pascal source
materials are compatible with this compiler. It contains a full screen
editor as well. You can order this product separately from us if you
wish to explore the demo system more fully. Turbo Pascal has certain
limitations, as summarized below. For a large project, you will likely
prefer a more powerful Pascal or C compiler.
Educational Applications
Are you an instructor of a course in compiler construction or language
theory? If so, you will find QPARSER a valuable laboratory addition to your
course. The system has been adopted by several educational institutions
(write or call for more information). The second edition of the textbook
written by Dr. William Barrett supplements QPARSER, and is scheduled for
release by SRA (Chicago) by December, 1985.
In addition to the obvious grammar and semantics experiments that can be
performed, the parser generator can produce detailed item sets that
eventually form the LR state machine. The effects of various optimizations
can also be followed, and the item-sets traced through to the final state
machine.
You can observe the development of the item-sets by selecting a fairly high
report level when executing LR1 -- these range from 0 to 6 and provide an
increasingly detailed set of reports. For example, here's a command line
that will provide such a report file:
LR1 CALC -l3 CALCRPT.TXT
There are no hidden object files or undocumented features in QPARSER. LR1
produces a set of parser tables that are clearly defined by the source
access procedures and the parser itself. Many of the experiments can be
performed without writing any Pascal programs.
page 18
QPARSER Demonstration System
Copy Protection
Only one program is copy protected in the full system -- thats the parser
generator LR1. Its provided in object form only. The IBM versions use
SoftGuard. After installing LR1 to a hard disk or floppy diskette, you can
then freely execute LR1 without having the distribution diskette physically
present. You can also install a second copy for backup purposes.
The university editions enable multiple copies to be made from a single
distribution diskette. The instructor must submit a signed qualification
form to us.
LR1 Specifications
The parser generator LR1 constructs an item-set state machine with an LR(0)
construction. Inadequate states are identified and converted into lookahead
states. The lookahead token sets for reduce actions are computed by the
DeRemer-Penello LALR(1) method, as reported in ACM Transactions on Program
Languages & Systems, Vol 4, No. 4, October 1982.
A failure to resolve a state by LALR results in a resolution report and a
default resolution. Special tags may be added to production rules to guide
the resolution process.
Resolution may result in one or more useless lookahead states; these are
removed along with any unreachable states found. If the halt state cannot be
reached from the goal state, an error report is generated.
Untagged single productions are bypassed if the LR state machine so permits.
The resulting state tables are folded by identifying common subsequences.
These tables are of a sparse-matrix type, resulting in great economy of
representation of the final machine, without sacrificing performance.
Several tables are produced to facilitate symbolic debugging -- these
include production and token tables. These may easily be suppressed along
with the debugging procedures for a production version of the translator.
The automatically-generated lexical scanner assumes that Pascal-style
lexical analysis is appropriate. These assumptions are explained in the user
manual and are also manifest in the skeleton source files. A choice of
Pascal or C is provided for automatic generation. For other host languages,
you will have to write your own skeletons, perhaps by editing a Pascal or C
skeleton.
All the generated material and all the skeleton files are provided in
machine-readable source form, permitting their adaptation to other host
languages.
Limitations of LR1
The demo version of LR1 is limited to 25 production rules.
The full system is limited to 255 terminal tokens and 255 nonterminal
tokens. There are no other hard-bound limits other than maximum available
heap memory size. LR1s data structures are allocated from memory in just
sufficient quantities as needed. Space no longer needed is disposed or
page 19
QPARSER Demonstration System
released. The program requires about 65 Kbytes, and the stack another 65
Kbytes maximum. MS-DOS uses another 30 Kbytes. The rest of the memory is
available for heap.
Weve found that the mini-Pascal grammar, with 140 productions, requires
about 200 Kbytes of heap space. LR1 completes in less than two minutes on an
IBM XT.
More About Turbo Pascal
All of the examples supplied with QPARSER will compile and execute under
Turbo Pascal. Turbo has a number of features that make it attractive for
software development. Its screen-oriented editor is nicely integrated with
the compiler. Syntax errors are reported through the editor by the screen
cursor. Run-time errors can be traced to a single line in the program in the
editor.
Turbo may be ordered separated at the regular retail price from QCAD
Systems, Inc. One copy is included with the full QPARSER system.
However, you will find that Turbo has a number of size limitations that may
interfere with the development of a large compiler system, as follows:
* The compiled Turbo program space is limited to 64 Kilobytes. This limit
can be exceeded somewhat by using overlays. The limit does not depend on the
total memory of your machine.
* The Turbo compiler does not support independently compiled modules. Thus
the entire Pascal program, with all include files, must fit within the
compilers space limitations.
* The Turbo editor is limited to 63 Kilobytes of source data. It is
possible that the LR1 generator will produce a file too large to be accepted
by this editor. Note that EDLIN (on the IBM) can accept an arbitrarily large
source file.
* There are certain other limitations explained in the Turbo manual that
may affect your development.
The LR1 parser generator will make full use of the maximum memory of your
system, however, and can generate an arbitrarily large program file.
If you are impacted by the Turbo memory limitations, we recommend the use of
a different editor and/or compiler for the compilation of the generated
translators.
Other Systems and Services
QPARSER can be provided on the VAX/VMS, MacIntosh 512K, and the
Hewlett-Packard series 200. Quantity and dealer discounts available. Site
and source licenses available. Consultation with our expert staff on
specific language or translator issues can be arranged. Call for details and
prices.
page 20
QPARSER Demonstration System
For Further Study
We hope weve given you some notion of whats required to make effective use
QPARSER user manual is itself a condensed course in language
theory, and is available separately. The following is a short bibliography
of books on compiler theory.
Compiler Construction: Theory and Practice, Barrett and Couch, Science
Research Associates, Inc, second edition, 1985.
Principles of Compiler Design, Aho and Ullman, Addison-Wesley, 1979.
The Theory of Parsing, Translation, and Compiling, two volumes, Aho and
Ullman, Prentice-Hall, 1973.
Turbo Pascal is a registered trademark of Borland International.
QPARSER is a trademark of QCAD Systems, Inc.
page 21
QPARSER Demonstration System
PRODUCT ORDER FORM
QCAD Systems, Inc.
1164 Hyde Avenue, San Jose, CA 95129 (408) 727-6671
TOLL-FREE: (800) 538-9787 (outside California)
Quantity Item Unit Price Total
-----------------------------------------------------------------------------
Qparser Industrial system $400.00
Qparser University system* 400.00
Qparser Demo Package 10.00
Qparser User Manual 25.00
Turbo Pascal 3.0 69.95
Qtools -- Unix-like tool kit 49.50
Qroff -- Laserjet typesetting system 79.95
SUB-TOTAL
TAX (CA residents)
SHIPPING (overseas)
GRAND TOTAL
* Please enclose educational agreement
Name __________________________________________________________________
Company _______________________________________________________________
Address _______________________________________________________________
City _________________________________ State ____________ ZIP _________
Computer System _______________________________
Method of payment:
___ check enclosed
___ charge card #: __________________________ Exp: ____________
Name on card: _____________________________________________
Visa ___ Mastercard ___
___ purchase order #: _______________________
(prices subject to change without notice)
OVERSEAS SHIPPING: $5.00 each for Qroff, Qtools, Qparser demo;
$20.00 for full Qparser system; $10.00 for Qparser manual or Turbo.
page 22