Simtel MSDOS 1992 September

home *** CD-ROM | disk | FTP | other *** search

/ Simtel MSDOS 1992 September / Simtel20_Sept92.cdr / msdos / pascal / qparser.arc / READ.ME < prev next >

Wrap

Text File | 1985-09-30 | 53KB | 1,454 lines

QPARSER Demonstration System Version 2.3, September 3, 1985 Copyright (C) 1985, QCAD Systems, Inc. Introduction QPARSER provides a strategy and a set of software tools for compiler and translator development that we and other professionals have found to be highly productive. The strategy is one of writing semantic actions that respond to production rule ``triggers''. The tool set includes an LR parser generator and several fully worked-out example programs. QPARSER provides the grammar-directed ``front-end'', as well as several software tools useful in developing the semantics ``back-end'' of a compiler. A comprehensive user manual is part of the full system. The front-end is automatically generated, solving several difficult software problems. This is the half that scans and analyzes the input language forms, and is capable of extracting meaning from them by relating them back to the rules of the defining grammar. The back-end, which must be specially written for each application, is the operational side of the translator. It performs actions in response to the extracted meaning of input forms. Since translator actions are very general in nature, it's necessary to write these as Pascal or C program fragments for your application. QPARSER will generate a parser automatically, but not a complete translator. The parser will contain the necessary source file access, a lexical scanner, symbol table functions, debugging tools and semantics tools. QPARSER Features The principal features of QPARSER are: * Correctness. The language accepted by the translator will be exactly that specified by the grammar. * Detection of Ambiguity. If your grammar is ambiguous, or if the parser is unable to assure a correct translation, a diagnostic message will be printed. * Automatic generation of two key components. The lexical scanner and the LR parser are automatically generated. * Efficient table-driven parser operations. For a large grammar, most compiler experts agree that a table-driven LR1 approach is superior in both performance and space requirements to any other. Several optimizations are made to minimize table size. * Modularity of semantic operations. A translator programming task will become neatly partitioned into a number of relatively water-tight compartments. page 1 QPARSER Demonstration System * Efficient lexical scanner. The lexical scanner operates by a CASE statement transfer on the next character to be scanned from the source. No backtracking, table searching or waste motion is involved. * Efficient and powerful symbol table mechanism. Symbol searching is built into the lexical scanner, which is automatically generated. * Error recovery. A reasonable error recovery system is built into each translator. No extra parser table space is required. * Ease of extending or changing the language. Once a translator is written following the guidelines in the full user manual, it'll be easy to extend or modify. * Ease of extending or changing the lexical scanner. The scanner is provided in well-documented source form. Its underlying assumptions are described in the user manual. What's on the Demo Disk The demo system diskette contains the following files: ShowQP.COM -- a quick demo program QPDEMO.TXT -- needed by ShowQP LR1.COM -- the parser generator program LR1P.COM -- skeleton expander PMACS.TXT -- Pascal form macro file LR1SKEL.PAS -- a parser ``skeleton'' file CALCDBUG.PAS, SKELRTBL.PAS, SKELNUM.PAS, SKELSYMS.PAS, SKELDTBL.PAS -- skeleton source files CALC.GRM -- grammar definition for calculator CALCSKEL.PAS -- skeleton file customized for calculator CALCSEM.PAS, CALCUTIL.PAS -- calculator-specific semantics CALC.COM -- executable calculator file READ.ME -- this text file The Quick Demo Select the floppy drive containing the demo diskette as the default drive. For example, if your floppy drive is A, then enter A: at the MS-DOS command level. Then run ShowQP by typing its name. Youll be able to step through a simple demonstration of some of Qparsers features. (If you see some peculiar characters, you probably need to adjust your CONFIG.SYS file according to the directions at the beginning of the demo.) Configuration We're going to assume that your microcomputer system is an IBM PC/XT with the PC-DOS operating system, though you can also run the demos on an IBM-compatible machine. You'll also be using some of the standard DOS functions in the demonstrations. A Pascal compiler and screen-oriented editor would make this demo system more meaningful, but aren't necessary. The full QPARSER system requires a Pascal or C compiler and a text editor. We recommend a system with at least 256 Kbytes of memory and a hard disk. Turbo Pascal (which includes a screen-oriented text editor) is provided with page 2 QPARSER Demonstration System the full system, but not with the demo system. Order a copy of Turbo Pascal from QCAD Systems, and use it for additional experiments with the demo system. Installation If your computer has a hard disk: Copy the files from the QPARSER demo diskette to a new directory using the following commands, which assume that the hard disk is the default drive: mkdir \qparser copy a:*.* \qparser cd \qparser You will now be able to run the calculator examples from the \qparser directory. If your computer does NOT have a hard disk: The QPARSER demo diskette is double-sided, which may not be readable from certain older PC versions. Call us if this is a problem -- we can supply the demo system on two single-sided diskettes. In any case, we recommend copying some of the demo diskettes files -- for example, its .TXT and .PAS files -- to a blank diskette. The following formats a new diskette, then copies all .PAS files from disk drive a to drive b: format b: copy a:*.pas b: Getting Started QPARSER is easy to try out. The following operations will analyze the ``pocket calculator'' grammar and prepare a source Pascal file for a working interactive calculator. Start by entering the two commands LR1 CALC LR1P CALC -s CALCSKEL CALC.GRM contains the calculator's grammar and is read by the first command (the default extension .GRM is assumed). LR1 produces a grammar table CALC.TBL. The second command reads CALC.TBL and CALCSKEL.PAS and produces a full Pascal program file CALC.PAS. A short report file will appear on the console. Ordinarily, the CALC.PAS file would then need to be compiled, but we've supplied a compiled version of this file (CALC.COM) so that you can execute the calculator example and try out the parsing diagnostic tools without a compiler. If you have a Pascal compiler, you might try compiling CALC.PAS instead. It follows Turbo Pascal conventions, and should compile without any complaints with that compiler. If you are using another Pascal compiler, you may have to edit CALC.PAS or CALCSKEL.PAS in order to accommodate compiler differences. page 3 QPARSER Demonstration System Executing CALC.COM When you execute CALC.COM, you should see a prompt (>). You'll then be able to type arithmetic expressions, whose values will be reported on the console. For example, 5 + 6.5*(9 - 3) yields the result 44. An assignment statement, e.g., x := 5 + 15 causes variable x to be declared and receive the value 20. Then an assignment such as y := x + 5 can be evaluated, associating value 25 with variable y. The point of this exercise is not to run a calculator, but to demonstrate that a simple grammar and a short semantics file is sufficient to produce an interpreter for a simple language. We're next going to look at the grammar file (CALC.GRM) and the semantics files (CALCSEM.PAS and CALCUTIL.PAS) to see what QPARSER does and what is required of you as the programmer. The Calculator Grammar Any of the source files can be examined by using a text editor of your choice, or the EDLIN or MORE command -- see the DOS reference manuals for instructions. The calculator grammar file CALC.GRM is very short, so we'll reproduce it below: \ CALC -- A simple four-function calculator grammar \ which allows variable assignments. Goal -> Stmts QUIT #quit Stmts -> Stmts Stmt -> Stmt Stmt -> Expr <eol> #prtval -> <identifier> := Expr <eol> #assign -> <eol> \ allow an empty line Expr -> Expr + Term #plus -> Expr - Term #minus -> Term Term -> Term * Fact #mpy -> Term / Fact #divide -> Fact Fact -> Primary -> - Primary #uminus Primary -> ( Expr ) #parens -> <identifier> #variable -> <real> #realval -> <integer> #intval What does all this mean? First, anything starting with a backslash (\) is page 4 QPARSER Demonstration System just a comment. Secondly, a symbol preceded by a sharp sign (#) is a production rule tag, whose purpose will be explained later. What remains is a list of production rules, whose general form is like this: <left-member> -> <right-member> Whenever a left-member is missing, it is assumed to be the same as for the preceding rule. Thus the third rule, written -> Stmt above, is actually Stmts -> Stmt The primary purpose of a grammar is to define a language -- in this case, the algebraic formulas and assignment statements that can be legally entered when the calculator is executed. A grammar is a set of production rules, or just rules for short. Each rule has exactly one left member element and one or more right member elements, each of which we shall call a token. The tokens are separated by one or more spaces. For example, Term -> Term * Fact is a rule in this grammar. Its left member token is Term, and the three right member tokens are Term, *, and Fact. It says, in essence, that any input string form that is to be considered a Term must consist of a Term, followed by an asterisk (*), followed by a Fact. A Term and a Fact can each be expanded into any of a large number of token sequences, while the asterisk (*) stands for itself. Tokens of the former kind are called nonterminals, while tokens of the latter kind are called terminals. Another rule is Term -> Term / Fact The left member token Term has been omitted in the grammar file, and is therefore inherited from the previous rule in the grammars rule list. This rule says that a Term consists of a Term followed by a slash (/) followed by a Fact. The first rule in the list Goal -> Stmts QUIT will expand into the entire language. The left member token Goal of this rule is not permitted anywhere else in the grammar. This rule says that any legal language form must consist of a member of Stmts followed by the token QUIT. page 5 QPARSER Demonstration System How a GOAL Symbol Expands into Forms Lets start with the first rule: Goal -> Stmts QUIT We see that Stmts appears on the left of two rules: Stmts -> Stmts Stmt Stmts -> Stmt This means that we can choose either one of these two rules, and then substitute the right members for one appearance of the left member. Let's choose the first rule, and make the substitution: Goal -> Stmts QUIT -> Stmts Stmt QUIT Well, we don't seem to be getting anywhere, because the token Stmts appears in our new, expanded form. However, we can close this cycle by choosing the second Stmts rule for a substitution: Goal -> Stmts QUIT -> Stmts Stmt QUIT -> Stmt Stmt QUIT It should be clear that we can get a sequence of Stmt forms by choosing the first rule several times. Each substitution step is called a derivation step; we are deriving a sentence in the language from the Goal token. Now let's look at the Stmt rules. There are three of them, as follows: Stmt -> Expr <eol> Stmt -> <identifier> := Expr <eol> Stmt -> <eol> The special token <eol> is a predefined terminal token that stands for an enter. The special token <identifier> stands for any Pascal-like user name; for example, L1, SAM_SMITH, MARY_JONES are all legal <identifier>s. The Expr is associated with some more rules, permitting more substitutions. Let's choose the first of these Stmt rules for the first substitution. We then have, starting from the Goal again: Goal -> Stmts QUIT -> Stmt QUIT -> Expr <eol> QUIT More substitutions are now possible. There are three Expr rules; two of these include an Expr and a Term. There are also three Term rules, etc. Notice that the algebraic tokens +, -, *, / appear in these. We'll continue the expansion by making various choices along the way. We suggest trying one yourself on paper, to see just how many different ways a Goal can be expanded into sequences of algebraic formulas. page 6 QPARSER Demonstration System Goal -> Stmts QUIT -> Stmt QUIT -> Expr <eol> QUIT -> Expr + Term <eol> QUIT -> Term + Term <eol> QUIT -> Fact + Term <eol> QUIT -> Primary + Term <eol> QUIT -> <integer> + Term <eol> QUIT -> <integer> + Fact <eol> QUIT -> <integer> + Primary <eol> QUIT -> <integer> + ( Expr ) <eol> QUIT -> <integer> + ( Term - Term ) <eol> QUIT -> <integer> + ( Term * Fact - Term ) <eol> QUIT -> <integer> + ( Fact * Fact - Term ) <eol> QUIT -> <integer> + ( Primary * Fact - Term ) <eol> QUIT -> <integer> + ( <real> * Fact - Term ) <eol> QUIT -> <integer> + ( <real> * Primary - Term ) <eol> QUIT -> <integer> + ( <real> * <identifier> - Term ) <eol> QUIT -> <integer> + ( <real> * <identifier> - Fact ) <eol> QUIT -> <integer> + ( <real> * <identifier> - Primary ) <eol> QUIT -> <integer> + ( <real> * <identifier> - <integer> ) <eol> QUIT Whew! Fortunately, that's as far as we can go, because all the tokens are terminal. Now, <integer> stands for any decimal integer, and <real> for a floating-point number. So our derived sentence might look like this: 15 + (17.65 * SAM - 45) <eol> QUIT <eol> The <eol> will be interpreted as an `enter' when the parser is executed. The last enter will cause an exit from the calculator program. Parsing We've seen how a grammar can be written that defines some language we're interested in. At this point, QPARSER can take over and produce a program that can invert these derivation steps. When a derivation step is inverted, we say that a reduce step or a reduction has occurred. Such a program is called a bottom-up parser. The program LR1 performs an analysis of the grammar and produces a set of parser tables. Then program LR1P produces a parser program in Pascal source, using a Pascal-based macro file PMACS. The actions performed by LR1 and LR1P are roughly as follows: 1. The nonterminal and terminal tokens are identified. The terminal tokens are used to construct a lexical scanner, which when executed reads source strings and breaks them into the terminal token pieces. 2. The grammar rules are then used to construct a so-called LR state machine, which is an abstract description of a parser. 3. The abstract state machine description is turned into a specific Pascal or C program. The state machine is carried in the program in tabular form, and the parser is a table interpreter. The lexical scanner is expressed directly as Pascal or C statements. page 7 QPARSER Demonstration System Following the Parser's Steps You can follow the parsing steps of the CALC.GRM grammar by running CALC.COM again. This time, type the character ! before entering an expression, for example, ! 15 + (17.65 * 22 - 45) The ! character will bring up a debugging prompt before any of the rest of the line is scanned. The debug prompt looks like this: I(dentifier, D(ebug level, A(ll symbols, C(ontinue? Type D, then 1 (enter), to choose debug level 1. You'll get the debug prompt again, so type C to continue. You should see a production printed: Primary -> <integer> This indicates that the 15 in the input string has been scanned (the <integer>) and has been reduced to a Primary. The debug prompt will again appear, so type C. The next production appears: Expr -> Term This indicates that Primary has been reduced to Term and now Term is about to be reduced to Expr. The production rule Term -> Primary will always be skipped over during parsing due to an optimization in the parsing tables. Typing the next C will bring up Primary -> <real> This says that the 17.65 in the input string has been scanned and reduced to a Primary. Keep typing C, and you'll see the rest of the productions. Here they are, as they will appear during the parsing process; weve added comments to show what's happening: Primary -> <integer> \ 15 Term -> Term * Fact \ 17.65 * 22 = 388.3 Expr -> Term Primary -> <integer> \ 45 Expr -> Expr - Term \ 388.3 - 45 = 343.3 Primary -> ( Expr ) Expr -> Expr + Term \ 15 + 343.3 = 358.3 Stmt -> Expr <eol> \ the whole line At this point, C will print a result and produce another prompt: = 358.3 > Lets type in QUIT. We'll then see one more production: page 8 QPARSER Demonstration System > QUIT Goal -> Stmts QUIT One more C and the CALC program is done. Looking at the Semantics Stack In the above example, you've surely noticed that the first number (15) was scanned, then set aside temporarily until the parenthesized expression (17.65*22-45) could be scanned. Where was the value 15 kept while this extra work was going on? The answer is that it was held on a semantics stack. The semantics stack is a fundamental data structure of QPARSER. Every time a token is read from the input stream, state information is pushed on the stack associated with the token. A token like <integer> causes the integer value to be saved on the stack. The token <real> causes a floating-point number to be saved, and the token <identifier> causes a name to be saved. When a production rule is applied, the top of the semantics stack will always carry information that corresponds to the right members of the production. Let's do another trace and see how that information is carried. Execute CALC.COM again with the same expression: > ! 15 + (17.65*22-45) This time, choose D(ebug level 2, then C(ontinue. Youll see the following display: <integer> FIXED: 15 Primary -> <integer> This says that the semantics stack contains just one item, the first integer in the input stream. FIXED refers to its type, and the 15 is of course its value as held in the stack. The next C(ontinue produces the following: Primary FLOAT: 1.500E+01 Expr -> Term This announces that the fixed point number 15 has been converted into a floating point number. That action was caused by a semantics action in response to the production Primary -> <integer>. We'll see where that happened in the source program later. The next C(ontinue produces the following: Expr FLOAT: 1.500E+01 + OTHER: ( OTHER: <real> FLOAT: 1.765E+01 \ TOS Primary -> <real> Here we've scanned the +, the ( and the real number 17.65. The stack has page 9 QPARSER Demonstration System four items -- its top (noted as TOS -- ``top-of-stack'' -- above) is shown at the end of the stack report. The production Primary -> <real> is about to be applied, and its right member refers to the <real> on the top of the stack. Now we see where the 15 went while the remainder of the expression is being worked on -- it's held in the stack beneath the remaining material. It will reappear later as the other material is reduced and popped from the stack. Push C twice -- well skip one step involving the production Primary -> <integer>. You should see the following stack: Expr FLOAT: 1.500E+01 + OTHER: ( OTHER: Primary FLOAT: 1.765E+01 \ TOS-2 * OTHER: \ TOS-1 Primary FLOAT: 2.200E+01 \ TOS Term -> Term * Fact The productions right member (Term * Fact) seems not to correspond with the three top stack members, but it does. The parser generator has optimized parsing by eliminating some unnecessary reduction steps. The Term corresponds to the Primary at TOS-2, and the Fact corresponds to the Primary at TOS. This production indicates that a multiplication is supposed to take place between the numbers in TOS and TOS-2. The product is 388.3 and will appear after you hit C again: Expr FLOAT: 1.500E+01 + OTHER: ( OTHER: Primary FLOAT: 3.883E+02 \ TOS Expr -> Term Notice that the top three stack elements have been removed and replaced by a single element -- the floating point number 388.3. Hit C twice. You should get the following display: Expr FLOAT: 1.500E+01 + OTHER: ( OTHER: Expr FLOAT: 3.883E+02 \ TOS-2 - OTHER: Primary FLOAT: 4.500E+01 \ TOS Expr -> Expr - Term Here, the last number (45) has been scanned. The stack is set up for a subtraction. Since the Expr in the production corresponds to TOS-2, and the Term to TOS, the calculator program is expected to subtract 45 from 388.3, yielding 343.3. That's shown in the next display, after hitting C: page 10 QPARSER Demonstration System Expr FLOAT: 1.500E+01 + OTHER: ( OTHER: Expr FLOAT: 3.433E+02 ) OTHER: Primary -> ( Expr ) This production won't do any arithmetic, but must make sure that the Expr value 343.3 will appear on the stack top, associated with the Primary. The reduce action causes the parentheses to disappear, yielding: Expr FLOAT: 1.500E+01 + OTHER: Primary FLOAT: 3.433E+02 Expr -> Expr + Term At last, the 15 reappears at the right place to perform an addition. That operation yields 358.3, and the smaller stack: Expr FLOAT: 3.583E+02 <eol> OTHER: Stmt -> Expr <eol> This production prints the resulting Expr value on the console, so after hitting C, you should see: = 358.3 > Type in ``QUIT'', then another C to end the program. Some Observations We've run through a complete trace for the pocket calculator example. What should be apparent is how the form of the grammar rules determine when operations are needed, and also how the semantics stack mechanism ensures that information is available when it is needed. The parser generator system has solved several algorithmic tasks for you: * The derivation of an input string has been uncovered and reproduced in backward order, as a parser. * Information is carried on the semantics stack until needed later. * Names and numbers are scanned and put into the stack at the right place. * When each production rule pops up, the top of the semantics stack carries just what is needed for the operation it implies. You can do any number of different things when a production rule pops up. We'll look at how that works next. We'll also take a look at some of the type declarations and procedures that implement the parser. page 11 QPARSER Demonstration System Semantics Coding Most of what you've seen in the semantics trace comes through the parser generator tools and the skeleton files. However, one important part isn't automatic -- you must write a semantics procedure called APPLY. Procedure APPLY is the interface between the automatically generated parser and the semantics operations that you must design and write. Please look at APPLY, founnd in file CALCSEM.PAS, while you read on. APPLY is essentially a large CASE statement, whose tag names come from the tags attached to the production rules in the grammar. We've deferred discussion of these tags until now. Recall that a production rule tag is the name following a sharp sign in a production. For example, in CALC.GRM, the production Expr -> Expr + Term #plus carries the tag plus, which you will find in the CALCSEM.PAS source. The APPLY action for this production rule is essentially to collapse the information in the top three elements of SEMSTACK (corresponding to the right member Expr + Term) into a single element (corresponding to the left member Expr), as shown in figure 1. ----------------------------------------------------- TOS +---+---+---+---+---+ | | | E | + | T | semstack: BEFORE apply action +---+---+---+---+---+ production E -> E + T ^ ^ ^ | | +---- stackx | +-------- stackx-1 +------------ stackx-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . TOS +---+---+---+ | | | E | semstack: AFTER apply action +---+---+---+ ^ +------------ stackx Figure 1. Before and after APPLY operations on production E -> E + T ---------------------------------------------------------- Here's what procedure APPLY looks like, with some of the material removed for the sake of discussion: page 12 QPARSER Demonstration System procedure APPLY(PFLAG, PRODLEN: int; TSEMP: semrecp); { Apply customized for the calculator. All operations are kept track of on the semantic stack. } begin case pflag of { omitted material } DIVIDE, {Term -> Term / Fact} MINUS, {Expr -> Expr - Term} MPY, {Term -> Term * Fact} PLUS: {Expr -> Expr + Term} eval_binop(pflag, semstack[stackx-2], semstack[stackx], tsemp); { omitted material } end { apply case } end; Notice the PLUS tag. When the production Expr -> Expr + Term is ready to be applied, the parser system will call APPLY with a value of PFLAG that will direct control to the PLUS statement. This statement happens to be shared with DIVIDE, MINUS, and MPY, but it doesn't have to be. You can -- and in fact you must -- distinguish production rules with distinct tag names. You can also omit the tag on any production that requires no special operation. Notice that the function EVAL_BINOP passes the flag parameter PFLAG, and three more parameters. The second parameter, SEMSTACK[STACKX-2], refers to the TOS-2 element of the semantics stack. The third parameter, SEMSTACK[STACKX], refers to the TOS element. As we've seen in the trace, these carry information associated with the token. The result of a stack operation is passed back through the variable TSEMP. Procedure EVAL_BINOP This procedure is in file CALCUTIL.PAS. Note that the two operands are passed as OPND1 and OPND2, respectively, and these are of type SEMRECP, which is the type of SEMSTACK[xxx]. EVAL_BINOP just sorts out which operation it is to perform, checking for the possibility that the entry might be OTHER. (OTHER will creep in through a syntax error recovery patch.) The RESULT value will always be a real number, and SEM_TYPE will therefore always be FLOAT. EVAL_BINOP also attempts to produce a reasonable result, even though either of the operands might be the default type OTHER. The procedure WRITE_VALUE, also in CALCUTIL.PAS, essentially decides how to print a number. All numbers are stored in floating-point form, including integers, but a real number might be sufficiently close to an integer to justify printing it that way. That's most of WRITE_VALUE. There are clearly other strategies that can be adopted. The last two procedures, INIT_SEM and END_SEM, are called before any parsing action is taken, and after all actions are complete, respectively. You can use these for any initializations required at these times. Usually none is required. page 13 QPARSER Demonstration System The Types of SEMSTACK and TSEMP You may wish to look at the Pascal type of the variables SEMSTACK and TSEMP. They are part of the global declarations in file CALCSKEL.PAS. SEMSTACK is an array of pointers to the record type SEMREC. Its top of stack will be the integer index STACKX. Record SEMREC carries different cases for the different kinds of things to be carried in the semantic stack -- IDENT for identifiers, FIXED for integers, FLOAT for real numbers, etc. The record variable SEMT indicates which kind of object is in place, making it possible for a general function like EVAL_BINOP to decide how to carry out an operation. The default SEMT type OTHER is used for semantic objects that carry no special information, such as the operation tokens +, *, etc. TSEMP is a pointer to a SEMREC record. In passing information back through an APPLY call, the record field pointed to by TSEMP should be filled with something that will appear on the stack. You can choose to do nothing, in which case TSEMP will point to the default case OTHER. There's also a special case that's handy to know about: any single production with no tag causes its right member to be copied into its left member. A single production is a production rule with exactly one right-member token. For example, the rule Fact -> Primary qualifies as a single production. When its tag is absent, the Primary information is just carried along on the stack without change. Symbols and the Symbol Table QPARSER provides a complete set of symbol table access functions. You may wish to look at their source, in file SKELSYMS.PAS. The global declarations associated with these functions are in CALCSKEL.PAS -- look at the record SYMTABTYPE, and the other types and variables associated with it. A symbol is picked up by the lexical scanner whenever the grammar token <identifier> is scanned. The lexical scanner in QPARSER assumes that names and other tokens follow Pascal rules -- but you can modify or rewrite the scanner to suit other language disciplines. When an <identifier> is scanned, a default symbol table entry is built for it, and a pointer to that entry appears in the semantics stack. Symbol table entries have the form of the SYMTABTYPE record, and are allocated from the Pascal heap. An efficient name search algorithm is used, based on a hash table. These search algorithms are expressed in the functions in file SKELSYMS.PAS. The skeleton file symbol table functions are suitable for flat or block-scoped names. The user manual and the mini-Pascal compiler (included in the full QPARSER package) describe how they can be used in considerable detail to cover a variety of special situations. Names can be unlinked from the table at the end of a block, and can also be deallocated from the heap. page 14 QPARSER Demonstration System Tracing Symbols You can see how the symbol tracing system works by running CALC.COM again. This time, keep the debug level at 0, and enter each of the following formulas: X := 15 Y := X - 40 Z := X + Y The calculator should report that X = 15, Y = -25, and Z = -10. Now enter a ! character: ! You will get the usual debug menu: I(dentifier, D(ebug level, A(ll symbols, C(ontinue? When you type A, you will be shown all the names known in the symbol table: Z Y X (The order of these names is an accident of the hash table.) When you type I, you'll be prompted for a name, and then its attributes as found in the symbol table will be reported: I(dentifier, D(ebug level, A(ll symbols, C(ontinue? I What symbol? y Y REAL_VAR 0: -2.5000E+01 Some of these attributes will always be provided. Others will require writing a short program segment to produce. In the above line, Y, REAL_VAR and the 0 (scope level) are essentially automatic. Printing the numeric value (-2.5000E+01) requires adding a write to the standard debugging file. To see what's needed, look at CALCDBUG.PAS, which contains the trace and interactive dump procedures. Most of this is generic to any grammar or language. The line needed to support the calculator may be found in procedure DUMP_SYM, as follows: {---------- following line is specific to CALCDBUG -----------} real_variable: write(rfile, rval); Error Recovery The calculator program illustrates the generic error recovery system provided with every generated translator. Error recovery operates from the parsing tables used by the parser. It attempts to find a local patch in the vicinity of the error. The patch must be compatible with the parsing operations and is searched for in a special procedure that does not interfere with the mainstream of parsing operations. When a patch is found, the machine state and stack are returned to the main parser such that parsing can continue as though there were no error. The error recovery system does not depend on line endings or special page 15 QPARSER Demonstration System delimiter tokens. It will operate as effectively for a block-structured language as for a line-oriented language. It will usually find a reasonable patch in one step. It does not require scanning ahead through more tokens in order to find a patch, although that would help make a more effective decision. Try running CALC.COM with some invalid formulas. The following illustrates two cases and the recovery report. Note that the syntax error report points to the position of the error in the input line: > x y + 15 ^ ERROR: syntax error Type any character to continue: = 5.0000000001 Here the patch yielded some new expression; the program nevertheless produced a partial result. > x := y ++ 15 ^ ERROR: syntax error Type any character to continue: Undefined variable ERR#AA X := -10.000000001 Here, the patch amounted to creating and inserting a new identifier between the two + signs, ERR#AA. This will be tagged as an ERRSYM in the symbol table, but our semantics routine didn't pay attention to that and complained about the undefined variable. It's possible to continue semantics checking, working around the possible error insertions, but it would be easier to just suppress all semantics operations upon finding the first syntax error. SUMMARY: What's Automatic and What Isn't We can now review what QPARSER provides automatically and whats required of you as a programmer in order to generate a complete translator or compiler. In general, you must design and implement the grammar and the semantic functions. You will need to expand the number of categories of the SEMREC and SYMTABTYPE record structures, since these contain only a few simple types. When these are expanded, some additions will be needed (e.g. to SKELDBUG.PAS) to support symbolic tracing and debugging, as weve seen. Designing and writing the semantic functions start with the APPLY procedure. When you are starting from scratch with a grammar, the parser generator will produce a dummy APPLY procedure for you automatically. It will look like this, using the CALC.GRM grammar as an example: page 16 QPARSER Demonstration System procedure APPLY(PFLAG, PRODLEN: int; TSEMP: semrecp); begin case pflag of ASSIGN: begin { Stmt -> <identifier> := Expr <eol> } writeln(rfile, ASSIGN applied.); end; DIVIDE: begin { Term -> Term / Fact } writeln(rfile, DIVIDE applied.); end; { ... material omitted ... } UMINUS: begin { Fact -> - Primary } writeln(rfile, UMINUS applied.); end; VARIABLE: begin { Primary -> <identifier> } writeln(rfile, VARIABLE applied.); end; end end; The generator has provided a separate CASE item for each tagged production, in alphabetic order, and has included the production for reference purposes as a comment. It has also provided a writeln to the report file rfile. You can compile and execute a generated program based on nothing more than a grammar. Tagged productions will announce themselves during its execution as indicated above. You can then start to fill in the separate slots in the APPLY case table with ``real semantics operations, testing the result as you add various modules. The generic tracing and debugging tools will also operate with no special additions, although they cannot report your SEMREC or SYMTABTYPE extensions without some help. The Full QPARSER System This demo guide is essentially the introductory chapter of the full QPARSER user manual. The full system contains, in addition to an unrestricted parser generator and the calculator example discussed above, all of the following: 1. A comprehensive user manual containing a language tutorial, a detailed discussion of the lexical scanner, symbol table, semantics stack, use of the heap, the example assembler, simulator and mini-Pascal compiler, and a technical summary. The user manual also contains listings of all supplied source programs. 2. A C skeleton and a companion macro file CMACS.TXT that will produce a complete, ready-to-compile parser in Kernighan and Ritchie C. 3. A grammar checking and analysis program. This can produce a cross-reference table of the grammar and report all terminal and nonterminal tokens. It looks for useless nonterminals and other page 17 QPARSER Demonstration System grammar problems, such as a single production cycle. It can report FIRST, FOLLOW and PREDECESSOR sets, which are valuable in understanding LR conflicts and in designing semantics structures. 4. A simulator and assembler for a simple Pcode-style stack machine is provided, in Pascal source form. The assembler makes use of the LR translator system -- its grammar is also provided. If you want to write an assembler for any purpose or machine, this provides a good starting point. 5. A Pascal subset compiler is provided in Pascal source form, along with its grammar and a detailed discussion of its internals. This illustrates the design of a user type declaration scheme, full expression evaluation, constant folding, several control structures with some optimizations, compound name access evaluation with optimizations, and a procedure call/exit mechanism. If you want to write a compiler or interpreter for a language using any or all of these Pascal features, this grammar and source material is a good starting point. 6. A copy of Turbo Pascal is provided. All of the Pascal source materials are compatible with this compiler. It contains a full screen editor as well. You can order this product separately from us if you wish to explore the demo system more fully. Turbo Pascal has certain limitations, as summarized below. For a large project, you will likely prefer a more powerful Pascal or C compiler. Educational Applications Are you an instructor of a course in compiler construction or language theory? If so, you will find QPARSER a valuable laboratory addition to your course. The system has been adopted by several educational institutions (write or call for more information). The second edition of the textbook written by Dr. William Barrett supplements QPARSER, and is scheduled for release by SRA (Chicago) by December, 1985. In addition to the obvious grammar and semantics experiments that can be performed, the parser generator can produce detailed item sets that eventually form the LR state machine. The effects of various optimizations can also be followed, and the item-sets traced through to the final state machine. You can observe the development of the item-sets by selecting a fairly high report level when executing LR1 -- these range from 0 to 6 and provide an increasingly detailed set of reports. For example, here's a command line that will provide such a report file: LR1 CALC -l3 CALCRPT.TXT There are no hidden object files or undocumented features in QPARSER. LR1 produces a set of parser tables that are clearly defined by the source access procedures and the parser itself. Many of the experiments can be performed without writing any Pascal programs. page 18 QPARSER Demonstration System Copy Protection Only one program is copy protected in the full system -- thats the parser generator LR1. Its provided in object form only. The IBM versions use SoftGuard. After installing LR1 to a hard disk or floppy diskette, you can then freely execute LR1 without having the distribution diskette physically present. You can also install a second copy for backup purposes. The university editions enable multiple copies to be made from a single distribution diskette. The instructor must submit a signed qualification form to us. LR1 Specifications The parser generator LR1 constructs an item-set state machine with an LR(0) construction. Inadequate states are identified and converted into lookahead states. The lookahead token sets for reduce actions are computed by the DeRemer-Penello LALR(1) method, as reported in ACM Transactions on Program Languages & Systems, Vol 4, No. 4, October 1982. A failure to resolve a state by LALR results in a resolution report and a default resolution. Special tags may be added to production rules to guide the resolution process. Resolution may result in one or more useless lookahead states; these are removed along with any unreachable states found. If the halt state cannot be reached from the goal state, an error report is generated. Untagged single productions are bypassed if the LR state machine so permits. The resulting state tables are folded by identifying common subsequences. These tables are of a sparse-matrix type, resulting in great economy of representation of the final machine, without sacrificing performance. Several tables are produced to facilitate symbolic debugging -- these include production and token tables. These may easily be suppressed along with the debugging procedures for a production version of the translator. The automatically-generated lexical scanner assumes that Pascal-style lexical analysis is appropriate. These assumptions are explained in the user manual and are also manifest in the skeleton source files. A choice of Pascal or C is provided for automatic generation. For other host languages, you will have to write your own skeletons, perhaps by editing a Pascal or C skeleton. All the generated material and all the skeleton files are provided in machine-readable source form, permitting their adaptation to other host languages. Limitations of LR1 The demo version of LR1 is limited to 25 production rules. The full system is limited to 255 terminal tokens and 255 nonterminal tokens. There are no other hard-bound limits other than maximum available heap memory size. LR1s data structures are allocated from memory in just sufficient quantities as needed. Space no longer needed is disposed or page 19 QPARSER Demonstration System released. The program requires about 65 Kbytes, and the stack another 65 Kbytes maximum. MS-DOS uses another 30 Kbytes. The rest of the memory is available for heap. Weve found that the mini-Pascal grammar, with 140 productions, requires about 200 Kbytes of heap space. LR1 completes in less than two minutes on an IBM XT. More About Turbo Pascal All of the examples supplied with QPARSER will compile and execute under Turbo Pascal. Turbo has a number of features that make it attractive for software development. Its screen-oriented editor is nicely integrated with the compiler. Syntax errors are reported through the editor by the screen cursor. Run-time errors can be traced to a single line in the program in the editor. Turbo may be ordered separated at the regular retail price from QCAD Systems, Inc. One copy is included with the full QPARSER system. However, you will find that Turbo has a number of size limitations that may interfere with the development of a large compiler system, as follows: * The compiled Turbo program space is limited to 64 Kilobytes. This limit can be exceeded somewhat by using overlays. The limit does not depend on the total memory of your machine. * The Turbo compiler does not support independently compiled modules. Thus the entire Pascal program, with all include files, must fit within the compilers space limitations. * The Turbo editor is limited to 63 Kilobytes of source data. It is possible that the LR1 generator will produce a file too large to be accepted by this editor. Note that EDLIN (on the IBM) can accept an arbitrarily large source file. * There are certain other limitations explained in the Turbo manual that may affect your development. The LR1 parser generator will make full use of the maximum memory of your system, however, and can generate an arbitrarily large program file. If you are impacted by the Turbo memory limitations, we recommend the use of a different editor and/or compiler for the compilation of the generated translators. Other Systems and Services QPARSER can be provided on the VAX/VMS, MacIntosh 512K, and the Hewlett-Packard series 200. Quantity and dealer discounts available. Site and source licenses available. Consultation with our expert staff on specific language or translator issues can be arranged. Call for details and prices. page 20 QPARSER Demonstration System For Further Study We hope weve given you some notion of whats required to make effective use QPARSER user manual is itself a condensed course in language theory, and is available separately. The following is a short bibliography of books on compiler theory. Compiler Construction: Theory and Practice, Barrett and Couch, Science Research Associates, Inc, second edition, 1985. Principles of Compiler Design, Aho and Ullman, Addison-Wesley, 1979. The Theory of Parsing, Translation, and Compiling, two volumes, Aho and Ullman, Prentice-Hall, 1973. Turbo Pascal is a registered trademark of Borland International. QPARSER is a trademark of QCAD Systems, Inc. page 21 QPARSER Demonstration System PRODUCT ORDER FORM QCAD Systems, Inc. 1164 Hyde Avenue, San Jose, CA 95129 (408) 727-6671 TOLL-FREE: (800) 538-9787 (outside California) Quantity Item Unit Price Total ----------------------------------------------------------------------------- Qparser Industrial system $400.00 Qparser University system* 400.00 Qparser Demo Package 10.00 Qparser User Manual 25.00 Turbo Pascal 3.0 69.95 Qtools -- Unix-like tool kit 49.50 Qroff -- Laserjet typesetting system 79.95 SUB-TOTAL TAX (CA residents) SHIPPING (overseas) GRAND TOTAL * Please enclose educational agreement Name __________________________________________________________________ Company _______________________________________________________________ Address _______________________________________________________________ City _________________________________ State ____________ ZIP _________ Computer System _______________________________ Method of payment: ___ check enclosed ___ charge card #: __________________________ Exp: ____________ Name on card: _____________________________________________ Visa ___ Mastercard ___ ___ purchase order #: _______________________ (prices subject to change without notice) OVERSEAS SHIPPING: $5.00 each for Qroff, Qtools, Qparser demo; $20.00 for full Qparser system; $10.00 for Qparser manual or Turbo. page 22