home *** CD-ROM | disk | FTP | other *** search
- .\" troff -ms
- .\" ...if no CW font available, change CW to B globally
- .de ZB
- .DS
- .ft CW
- .nf
- ..
- .de ZE
- .DE
- .ft R
- .fi
- .br
- ..
- .TL
- A LISP interpreter in Awk
- .AU
- Roger Rohrbach
- .AI
- 1592 Union St., #94
- San Francisco, CA 94123
- .ND
- January 3, 1989
- .AB ABSTRACT
- .PP
- This note describes a simple interpreter for the LISP programming
- language, written in \fBawk\fR. It provides
- intrinsic versions of the basic functions on s-expressions, and many others
- written in LISP. It is compatible with the commonly
- available version of \fBawk\fR that is
- supplied with most UNIX systems. The interpreter serves to illustrate
- the use of \fBawk\fR for prototyping or implementing language
- translators, as well as providing a simple example of LISP
- implementation techniques.
- .AE
- .SH
- Intrinsic functions.
- .PP
- The interpreter has thirteen built-in functions.
- These include the five elementary functions on s-expressions defined
- by McCarthy [1]; some conditional expression operators; an
- assignment operator, and some functions to control the evaluation process.
- .PP
- The intrinsic functions are summarized below.
- Familiarity with existing LISP systems is assumed in the
- descriptions of these functions.
- .IP "\f(CW(car \fIl\fP)\fR"
- .br
- Returns the first element of the list \fIl\fR.
- An error occurs if \fIl\fR is atomic.
- .IP "\f(CW(cdr \fIl\fP)\fR"
- .br
- Returns the remainder of the list \fIl\fR, \fIi.e.\fR,
- the sublist containing the second through the last
- elements. If \fIl\fR has only one element, \fBnil\fR is
- returned. \fBcdr\fR is undefined on atoms.
- .IP "\f(CW(cons \fIe l\fP)\fR"
- .br
- Constructs a new list whose \fBcar\fR is \fIe\fR
- and whose \fBcdr\fR is \fIl\fR.
- .IP "\f(CW(eq \fIx y\fP)\fR"
- .br
- Returns \fBt\fR if \fIx\fR and \fIy\fR
- are the same LISP object, \fIi.e.\fR, are either atomic and
- have the same print name, or evaluate to the same
- list cell. Otherwise, \fBnil\fR is returned.
- .IP "\f(CW(atom \fIs\fP)\fR"
- .br
- Returns \fBt\fR if \fIs\fR is an atom, otherwise \fBnil\fR.
- .IP "\f(CW(set \fIx y\fP)\fR"
- .br
- Assigns the value \fIy\fR to \fIx\fR and returns \fIy\fR.
- \fIx\fR must be atomic, and may not be a constant
- or name an intrinsic function.
- .IP "\f(CW(eval \fIs\fP)\fR"
- .br
- Evaluates \fIs\fR and returns the result.
- .IP "\f(CW(error \fIs\fP)\fR"
- .br
- Halts evaluation and returns \fBnil\fR.
- The atom \fIs\fR is printed.
- .IP "\f(CW(quote \fIs\fP)\fR"
- .br
- Returns \fIs\fR, unevaluated. The form
- .ZB
- \'\fIexpr\fP
- .ZE
- is an abbreviation for
- .ZB
- (quote \fIexpr\fP)
- .ZE
- .IP "\f(CW(cond (\fIp1 \f(CW[\fIe1\f(CW]) \fI...\fP (\fIpN \f(CW[\fIeN\f(CW]))\fR"
- .br
- Evaluates each \fIp\fR from left to right. If any
- evaluate to a value other than \fBnil\fR, the
- corresponding \fIe\fR is evaluated and the result is returned.
- If there is no corresponding \fIe\fR, the value of the \fIp\fR
- itself is returned instead.
- If no \fIp\fR has a non-null value, \fBnil\fR is returned.
- .IP "\f(CW(and \fIe1 ... eN\fP)\fR"
- .br
- Evaluates each \fIe\fR and returns \fBnil\fR if any evaluate to \fBnil\fR.
- Otherwise the value of the last \fIe\fR is returned.
- .IP "\f(CW(or \fIe1 ... eN\fP)\fR"
- .br
- Evaluates each \fIe\fR and returns the first whose
- value is non-null. If no such \fIe\fR is found, \fBnil\fR is returned.
- .IP "\f(CW(list \fIe1 ... eN\fP)\fR"
- .br
- Constructs a new list with elements \fIe1 ... eN\fR.
- Equivalent to
- .br
- \f(CW(cons \fIe1\fP (cons \fI...\fP (cons \fIeN\fP nil)\fR.
- .SH
- Lambda functions.
- .PP
- The following functions are written in LISP and are
- defined in the file \fBwalk.w\fR. Most of
- these are commonly supplied with LISP systems.
- .IP "\f(CW(cadr \fIs\fP)\fR"
- .IP "\f(CW(cddr \fIs\fP)\fR"
- .IP "\f(CW(caar \fIs\fP)\fR"
- .IP "\f(CW(cdar \fIs\fP)\fR"
- .IP "\f(CW(cadar \fIs\fP)\fR"
- .IP "\f(CW(caddr \fIs\fP)\fR"
- .IP "\f(CW(cddar \fIs\fP)\fR"
- .IP "\f(CW(cdadr \fIs\fP)\fR"
- .br
- These correspond to various compositions of
- \fBcar\fR and \fBcdr\fR, \fIe.g.\fR,
- .br
- \f(CW(cadr \fIs\fP)\fR \(-> \f(CW(car (cdr \fIs\fP))\fR.
- .IP "\f(CW(null \fIs\fP)\fR"
- .br
- Equivalent to \f(CW(eq \fIs\fP nil)\fR.
- .IP "\f(CW(not \fIs\fP)\fR"
- .br
- Same as \fBnull\fR.
- .IP "\f(CW(ff \fIs\fP)\fR"
- .br
- Returns the first atomic symbol in \fIs\fR.
- .IP "\f(CW(subst \fIx y z\fP)\fR"
- .br
- Substitutes \fIx\fR for all occurrences of the atom \fIy\fR in \fIz\fR.
- \fIx\fR and \fIz\fR are arbitrary s-expressions.
- .IP "\f(CW(equal \fIx y\fP)\fR"
- .br
- Returns \fBt\fR if \fIx\fR and \fIy\fR are
- the same s-expression, otherwise \fBnil\fR.
- .IP "\f(CW(append \fIx y\fP)\fR"
- .br
- Creates a new list containing the elements of x and y,
- which must both be lists.
- .IP "\f(CW(member \fIx y\fP)\fR"
- .br
- Returns \fBt\fR if \fIx\fR is an element of the list \fIy\fR,
- otherwise \fBnil\fR.
- .IP "\f(CW(pair \fIx y\fP)\fR"
- .br
- Pairs each element of the lists \fIx\fR and \fIy\fR,
- and returns a list of the resulting pairs. The number
- of pairs in the result will equal the length of the
- shorter of the two input lists.
- .IP "\f(CW(assoc \fIx y\fP)\fR"
- .br
- Association list selector function.
- \fIy\fR is a list of the
- form \f(CW((\fIu1\fP \fIv1\fP) \fI...\fP (\fIuN\fP \fIvN\fP))\fR
- where the \fIu\fR's are atomic. If \fIx\fR is
- one of these, the corresponding pair \f(CW(\fIu\fP \fIv\fP)\fR
- is returned, otherwise \fBnil\fR.
- .IP "\f(CW(sublis \fIx y\fP)\fR"
- .br
- \fIx\fR is an association list.
- Substitutes the values in \fIx\fR for the keys in \fIy\fR.
- .IP "\f(CW(last \fIl\fP)\fR"
- .br
- Returns the last element of \fIl\fR.
- .IP "\f(CW(reverse \fIl\fP)\fR"
- .br
- Returns a list that contains the elements in \fIl\fR,
- in reverse order.
- .IP "\f(CW(remove \fIe l\fP)\fR"
- .br
- Returns a copy of \fIl\fR with all
- occurrences of the element \fIe\fR removed.
- .IP "\f(CW(succ \fIx y\fP)\fR"
- .br
- Returns the element that immediately follows the atom \fIx\fR
- in the list \fIy\fR. If \fIx\fR does not occur in \fIy\fR
- or is the last element, \fBnil\fR is returned.
- .IP "\f(CW(pred \fIx y\fP)\fR"
- .br
- Returns the element that immediately precedes the atom \fIx\fR
- in the list \fIy\fR. If \fIx\fR does not occur in \fIy\fR
- or is the first element, \fBnil\fR is returned.
- .IP "\f(CW(before \fIx y\fP)\fR"
- .br
- Returns the list of elements occurring before y in x.
- If \fIy\fR does not occur in \fIx\fR
- or is the first element, \fBnil\fR is returned.
- .IP "\f(CW(after \fIx y\fP)\fR"
- .br
- Returns the list of elements occurring after y in x.
- If \fIy\fR does not occur in \fIx\fR
- or is the last element, \fBnil\fR is returned.
- .IP "\f(CW(plist \fIx\fP)\fR"
- .br
- Returns the property list for the atom \fIx\fR.
- .IP "\f(CW(get \fIx i\fP)\fR"
- .br
- Returns the value stored on \fIx\fR's property list
- under the indicator \fIi\fR.
- .IP "\f(CW(putprop \fIx v i\fP)\fR"
- .br
- Stores the value \fIv\fR on \fIx\fR's property list under
- the indicator \fIi\fR.
- .IP "\f(CW(remprop \fIx i\fP)\fR"
- .br
- Remove the indicator \fIi\fR
- and any associated value from \fIx\fR's property list.
- .IP "\f(CW(mapcar \fIf l\fP)\fR"
- .br
- Applies the function \fIf\fR to each element of \fIl\fR and returns
- the list of results.
- .IP "\f(CW(apply \fIf args\fP)\fR"
- .br
- Calls \fIf\fR with the arguments \fIargs\fR, \fIe.g.\fR,
- .ZB
- (apply 'cons '(a (b)))
- .ZE
- is equivalent to
- .ZB
- (cons 'a '(b))
- .ZE
- .SH
- Syntactic conventions.
- .LP
- Atoms take the following forms:
- .IP "\fIRegular identifiers\fR"
- .br
- Atoms matching the regular expression
- .br
- .sp
- .nf
- \f(CW[_A-Za-z][-A-Za-z_0-9]*\fR
- .fi
- .sp
- The initial value of an identifier is \fBnil\fR.
- .IP "\fIIntegers\fR"
- .br
- Atoms matching the regular expression \f(CW[0-9][0-9]*\fR.
- Integers are constants, \fIi.e.\fR, evaluate to themselves.
- .IP "\fIWeird atoms\fR"
- .br
- Identifiers matching the regular expression \f(CW".*"\fR. Weird
- atoms are not constants.
- .LP
- A semicolon introduces a comment, which continues for the rest
- of the line.
- .SH
- Usage.
- .LP
- The command for running the interpreter is
- .ZB
- walk [ \fIfiles\fP ]
- .ZE
- on BSD UNIX and derivative systems, or
- .ZB
- awk -f walk [ \fIfiles\fP ]
- .ZE
- on UNIX System V.
- The file name \f(CW\-\fR represents the standard input.
- This can be omitted if no other files are being read in, or
- if the interpreter is being run non-interactively.
- .PP
- Normally, the interpreter is used interactively, augmented
- with the functions defined in \fBwalk.w\fR, and, perhaps, other files.
- The command line to use for this purpose is
- .ZB
- walk walk.w [ \fIother files\fP ] p -
- .ZE
- The interpreter will first read \fBwalk.w\fR, printing
- the results of evaluating the function definitions
- therein. Then it will read \fBp\fR. This file
- contains no LISP definitions; the interpreter recognizes
- it by name and prints a prompt to signal the user that all
- the prerequisite files have been read and that the interpreter
- is waiting for input. (This is the only way to get \fBawk\fR
- to do this; this can be hidden from the user
- with a shell program that invokes the interpreter if desired.)
- Thereafter, it will evaluate expressions typed in by
- the user, printing a prompt after each one.
- Normally the prompt is \f(CW\->\fR; the first character of the
- prompt changes when appropriate to an integer that represents the number
- of unmatched left parentheses read in so far.
- .PP
- The interpreter exits when it encounters the end of its last input file.
- If this file is the standard input, the number of LISP objects
- created is reported.
- .PP
- Several files defining auxiliary functions are provided.
- .SH
- Implementation.
- .PP
- So that it can run on any UNIX system, The
- LISP interpreter has been written using the UNIX V7 version
- of \fBawk\fR, which
- predates the version described in \fIThe Awk Programming Language\fR [2].
- The only complex data type provided by this language is the array.
- Data that in C might be stored in structures is
- represented, therefore, using
- multiple arrays, one for each field. For example, the C code
- .ZB
- p = allocate_cell();
- p->car = s;
- p->cdr = NIL;
- .ZE
- can be approximated with:
- .ZB
- p = ++cell;
- car[p] = s;
- cdr[p] = nil;
- .ZE
- Lists (using nested array references)
- and stacks are also simulated with arrays. The most important data
- structures are explained in the program and in the following description.
- .PP
- As is usual for LISP implementations, the interpreter is constructed as a
- loop that reads an s-expression, evaluates it, and prints the
- result. The reader collects an s-expression, reading multiple
- input lines if necessary. Like the other two phases of the
- interpreter, this is a recursive procedure and
- in \fBawk\fR this must be managed explicitly.
- When an s-expression is read, its internal representation in
- list structure is formed using the stack \fBread[]\fR. Atoms and
- \fBcons\fR operators are pushed onto the read stack and periodically
- `reduced' or replaced
- with list cells when a complete list has been read; the reader returns
- an atom or list on the top of the stack.
- The reader must
- be able to return an s-expression in the middle of an input line, so the
- entire interpreter is enclosed in a loop that allows the
- current input line to be completely scanned before the next
- input record is read. The general outline is:
- .ZB
- BEGIN {
- .ft I
- initialize interpreter
- say hello if interactive
- .ft R
- .ft CW
- }
-
- {
- .ft I
- initialize reader variables
- .ft R
- .ft CW
-
- while (\fIchars left on this line\fR\f(CW)
- {
- .ft I
- read
- .ft R
- .ft CW
-
- if (\fIhave read an s-expression\fR\f(CW)
- {
- .ft I
- eval
-
- print
- .ft R
- .ft CW
- }
- }
-
- .ft I
- prompt if interactive
- .ft R
- .ft CW
- }
-
- END {
- .ft I
- say goodbye if interactive
- exit
- .ft R
- .ft CW
- }
- .ZE
- .PP
- The evaluator maintains two stacks, one for input and one for output.
- The result returned by the reader is copied onto the input stack
- (\fBeval[]\fR), and evaluated according to the usual LISP rules.
- Evaluated s-expressions are placed on the output stack, \fBarg[]\fR.
- When an intrinsic function that takes evaluated arguments appears on
- the top of the evaluation stack, its arguments are popped from the
- argument stack. Functions (like \fBcond\fR) that take unevaluated
- arguments are handled as special forms before their arguments have been
- pushed onto \fBeval[]\fR. The arguments are handled differently
- depending on the
- semantics of the function. Lambda (user-defined) functions are evaluated
- by temporarily binding the formal parameters in the function
- definition to the results of evaluating the actual arguments with which
- the function was called, and then evaluating the body of the function.
- Temporary bindings only are kept on a special
- pushdown list (the \fIalist\fR). Atoms have a global value that is
- stored separately; this keeps the alist small.
- .LP
- The evaluation procedure is sketched below:
- .ZB
- .ft I
- atom?
- .ft CW
- lambda
- .ft R
- restore previous environment (lambda function
- body has been evaluated already)
- .ft I
- constant?
- .ft R
- return
- .ft I
- bound?
- .ft R
- look up local value
- .ft I
- otherwise
- .ft R
- return global value
-
- .ft I
- intrinsic function?
- .ft R
- apply to already evaluated arguments
-
- .ft I
- lambda function?
- .ft R
- bind formal parameters to already evaluated arguments
- evaluate function body
-
- .ft I
- form?
- intrinsic function application?
- .ft R
- .ft CW
- quote
- .ft R
- return unevaluated argument
- .ft CW
- cond
- and
- or
- .ft R
- begin evaluating arguments according to operator semantics
- .ft CW
- list
- .ft R
- expand to repeated applications of cons
- .ft I
- other?
- .ft R
- push function variable, arguments
-
- .ft I
- lambda function application?
- .ft R
- push lambda function, body
- .ft I
- other?
- .ft R
- eval \fBcar\fR, \fBcdr\fR
- .ZE
- .PP
- When the evaluation stack is emptied, the result
- is popped from the argument stack and printed. A stack is again
- used to manage recursion.
- .SH
- Conclusion.
- .PP
- The goal of writing a small LISP interpreter and extending it
- in LISP has been realized. Though it was not my original intention, it
- would be easy to incorporate the
- LISP functions as intrinsics, and many other extensions (such as
- numeric functions) could be made, in which case the interpreter
- might fulfill more than a pedagogic function.
- Even so, it can be used as is for an introduction to LISP programming
- and implementation concepts. I hope it also inspires more of us to
- learn how to program in \fBawk\fR!
- .SH
- References.
- .IP "[1]"
- .br
- McCarthy, J. Recursive Functions of Symbolic Expressions
- and their Computation by Machine, Part
- I. \fIComm. ACM\fR, 3, 4, pp. 185-195
- April 1960
- .IP "[2]"
- .br
- Aho, A., Weinberger, P., & Kernighan, B.W. \fIThe
- Awk Programming Language\fR. Addison-Wesley, Reading, MA 1988
-