Text File | 1986-11-12 | 66KB | 1,287 lines

Draco Quick Reference Guide

Copyright 1983 by Chris Gray


I. Using the compiler under CP/M
   (CP/M is a trademark of Digital Research Incorporated)

    draco f1[.drc] f2[.drc] ... fn[.drc]

Each file is a separate compilation; they need not be related. If no extension is given, then .DRC is assumed. For each file, if the compilation is successful, a corresponding .REL file is produced. Standard CP/M ambiguous file specifications are accepted - all matching files will be compiled.


II. Using the assembler under CP/M

    das f1[.das] f2[.das] ... fn[.das]

Each file is a separate assembly. If no extension is given, then .DAS is assumed. For each file, if the assembly is successful, a corresponding .REL file is produced.


III. Using the link editor under CP/M

    link f1[.rel] f2[.rel] ... fn[.rel] fa.lib fb.lib ... fz.lib

Each file is a .REL file produced by DAS or DRACO, a .LIB file produced by DLIB, or a .PLD file produced by LINK. If no extension is given on a file name, then .REL is assumed; thus libraries must have the .LIB given explicitly.

Flags can be interspersed with the file names. Each flag starts with a minus sign (UNIX convention) and consists of several flag letters, and perhaps one flag value. The recognized flags and their meanings are:

    m - produce a map of the load address of the various procedures and the addresses of all local and global variable groups. This map is sent to a file whose base name is the same as that of the resulting .COM file (see later) and whose extension is ".MAP". The symbols will be sorted alphabetically.

    a - produce a map file as above, but sort the symbols by their load addresses. This is useful when debugging.

    i - suppress the normal Draco initialization code. This option should only be used by assembler programmers. Note: LINK allocates data areas before code areas if -d is not specified.
        Thus, if neither -s nor -d is specified, the first .REL file must not have any file variables, and the first procedure in it must not have any local variables. If either set of variables exists, they will be first in the .COM file, and will be at the entry point to the program. This option also prevents the standard libraries TRRUN.LIB and TRCPM.LIB from being automatically searched. (The 'TR' is from a previous name of the language.)

    q - produce a program which will return to CP/M quickly. This is done by using an alternate initialization section which leaves CP/M's CCP untouched in memory, and which will simply return to CP/M without doing a warm start. This flag should not be given when linking programs which use CP/M's location 6 (the address of the BDOS entry point) to determine the top of available memory. The pointer so returned does not take the CCP into account, and so the resulting program will probably not run. A very smart program could determine if it had been compiled with '-q' and subtract the size of the CCP from the top-of-memory pointer. The standard storage allocator does this by referencing the special symbols '_DataEnd' and '_CodeEnd' which point to the ends of the data and code portions of the final object file.

    o - specifies the name for the resulting .COM file. The name must immediately follow the 'o', with no intervening spaces or other flags. If no explicit name is given, then the name is derived from the name of the first .REL file in the parameter list.

    c - specifies the first address to be used for program (code). The value must follow immediately and is in hexadecimal. This option is normally only useful for people who wish to produce .COM files suitable for PROM burning. The default program start address is 0x100, which is the standard CP/M entry point.

    d - specifies the first address to be used for data (variables). The value must follow immediately and is in hexadecimal.
        This option should be used for programs which have large data areas, or for programs which are to be burned into PROMs. If no value is given, then data areas are intermixed with code areas.

    s - the linker is forced to take two passes for its operation. The first pass determines the total code size of the resulting program, and the second pass does the actual linking, using a data start address (as with the '-d' flag) just past the address of the last byte of code. The 's' stands for small - the resulting .COM file will be as small as possible (it will contain only the code of the program) and the total space occupied will be a minimum, since no gaps will exist. In linking with no flags given, code and data will be intermixed, thus the data space will occupy disk space in the .COM file.

    v - verbose. The linker prints out the names of the .REL and .LIB files it is processing. This gives you something to watch when linking a large program on slow disks.

    p - requests that a .PLD file be produced representing the entire program. This file is a machine readable map, in address order, of all symbols loaded. The format is a 4 character hexadecimal address, a space, the symbol name, and CR/LF for each symbol, and an extra CR/LF at the end of the file. When such a PLD file is given to LINK as input, the named symbols are assumed to pre-exist at the given addresses. This could be because they are in ROM or because they are in a program which has dynamically loaded the program that is referencing them.

The Draco linker provides partial support for a type of module, i.e. for a fully independent package of routines with its own local variables, its own initialization and termination code, and a set of procedures which are exported to its 'clients' or users. Most of this is provided by the normal features of the Draco language. A module is written as a single Draco source file, with its own local variables.
Clients import procedures from it in the normal way, using 'extern' declarations.

The additional support provided by the linker works as follows. If a procedure in a library is called, and the file which that procedure came from contains another procedure called "_initialize", then the linker will load "_initialize" and will generate code at the beginning of the program to call it. Similarly, a routine called "_terminate" will be automatically loaded and called at the end of the program (directly returning to the system via "exit" or "SystemReset" will bypass the termination call). There can be multiple occurrences of these special symbols, so long as there is only one per source file.

In the interests of portability, all versions of Draco will have available a routine called "exit", which has a single integer parameter. This routine will return directly to the host operating system. The parameter passed is an error indicator, and should be 0 to indicate successful execution. CP/M cannot make use of this returned error code, but other systems can, so this facility is provided to simplify the transportation of Draco programs among different systems.

LINK's operation can involve one or two passes, each pass consisting of a read of the .REL files and one or more reads of the .LIB files. The second pass is necessary only if the '-s' flag is given or if the program is too large to fit into the available memory. When operating in two-pass mode, LINK can produce a final .COM file larger than the amount of memory on the machine on which LINK is running. LINK will automatically switch to two-pass mode whenever the available memory runs out.

Libraries produced by DLIB have a directory at the front which indicates where in the library all of the individual procedures can be found. LINK loads only those procedures which have been referenced, and it will scan the libraries several times if needed to resolve all references.
If a procedure is loaded which references file-level global variables, then space for those variables is allocated, and all procedures from that original source file will reference them. When running under CP/M 1.0, random access is not supported, so the entire library is actually read in, but under later versions of CP/M, random access is used to reduce the amount of actual disk I/O done.

If a given symbol is present in more than one of the libraries being scanned, then it is loaded from the first library encountered after the first reference to the symbol. All .REL files are scanned first, in the order they appear on the LINK command, then all .LIB files are scanned, in the order they appear on the LINK command. The entire set of .LIB files is rescanned if further unresolved references occur. If the first reference to a symbol comes from a library member, then the search for that symbol starts with the remainder of that library and continues on with later ones. Because of this strictly forward searching, the order of placement of symbols in libraries can be important. If a procedure in a given source file which is to be part of a library references another procedure in that source file, then the referenced procedure should be forward declared and appear LATER in the source file than its referencer. This approach minimizes the number of library scans needed to resolve all references. All of the standard libraries are set up this way.

Unless the '-i' flag is given, LINK will automatically add the libraries 'TRRUN.LIB' and 'TRCPM.LIB' to the end of the set of libraries searched. TRRUN.LIB contains the run-time system, including support needed by the compiler, the I/O library and the utility library described later. TRCPM.LIB is an interface library which provides interface routines to all of the CP/M entry points. Entry point names are exactly as given in the CP/M manuals - e.g. SetDMAAddress, ReadSequential, etc.
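The forward-only library search described above can be illustrated with a small model. This is only a sketch in Python, not LINK's actual algorithm: the function name `link_scan`, the input layout (each library as an ordered list of symbol/reference pairs) and the sample symbols `Sort` and `Swap` are all assumptions made for the illustration. It shows why placing a referenced procedure after its referencer saves a library scan.

```python
def link_scan(rel_refs, libraries):
    """Simplified model of a forward library search (not the real linker).

    rel_refs:  symbols referenced by the .REL files (hypothetical input form)
    libraries: libraries in command-line order; each library is an ordered
               list of (symbol, [symbols it references]) members
    Returns (load order, number of scans of the library set, undefined symbols).
    """
    unresolved = set(rel_refs)
    loaded = []
    scans = 0
    while unresolved:
        scans += 1
        progress = False
        for library in libraries:            # libraries in command order
            for name, refs in library:       # members in file order, forward only
                if name in unresolved:
                    unresolved.discard(name)
                    loaded.append(name)
                    # New references can only be satisfied by later members,
                    # or by another scan of the whole library set.
                    unresolved.update(r for r in refs if r not in loaded)
                    progress = True
        if not progress:
            break                            # whatever is left is undefined
    return loaded, scans, unresolved


# Referencer first, referenced procedure later: one scan is enough.
well_ordered = [[("Sort", ["Swap"]), ("Swap", [])]]
# Referenced procedure first: a second scan is needed to pick it up.
badly_ordered = [[("Swap", []), ("Sort", ["Swap"])]]

print(link_scan(["Sort"], well_ordered)[1])
print(link_scan(["Sort"], badly_ordered)[1])
```

The model ignores symbols defined in the .REL files themselves and the on-disk library directory; it only captures the scan order.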
Most simple programs will not need any other libraries, and thus can be linked by simply giving all of the .REL files. A program with only one source file xxx.drc can thus be compiled and linked by:

    draco xxx
    link xxx

A program with source files p1.drc, p2.drc and p3.drc, which references the CRT library, can be fully compiled and linked by:

    draco p?
    link p1 p2 p3 crt.lib

For a final version, the '-s' flag should probably be given.


IV. Using the disassembler under CP/M

    ddis [-r] f1[.rel] f2[.rel] ... fn[.rel]

Each disassembly is separate. The files being disassembled can be produced by either the compiler or the assembler. The disassembler knows about all of the conventions and special code sequences produced by the compiler. If no extension is given on a file name, then .REL is used. Each disassembly produces a .DIS file corresponding to the .REL file. The contents of the .DIS file are assembler source, suitable for assembling with DAS. The disassembler does not generate correct declarations for file variables (it uses only the information passed in the relocation information, which may not completely identify them without processing the entire program). Also, since the assembler does not handle global variables, no declarations for them are produced.

The '-r' flag requests that code labels not appear and branches use a position-relative form (*-n or *+n). This is useful when working with a printed, out-of-date disassembly listing, since most of the branches will still be correct.


V. Using the librarian under CP/M

    dlib f[.lib]
    dlib f[.lib] f1[.rel] f2[.rel] ... fn[.rel]

In the first form, the already existing library file is read in, and a listing of its contents is produced on the console. In the second form, the (1 or more) .REL files are read, and a .LIB library is constructed from them. This is a two pass operation, and the name of the .REL file currently being read is printed on each pass.


VI. Using the cross-referencer under CP/M

    xref [-supo<file>] x1[.rel] ... xn[.rel]

A cross reference listing of the procedures in the given .REL files is produced. If no flags are given, the listing is produced on the console. If '-p' is given, the listing is sent to the printer. If '-o' followed by a filename is given, the listing is sent to that file. Flag '-s' tells the cross-referencer to include procedures whose names start with an underscore ('_') - the default is to omit such procedures, since they are usually part of the run-time system or private to a library. Flag '-u' tells the cross-referencer to include procedures whose names start with an upper-case letter - the default is to omit such procedures, since they are usually library routines, not part of the current program. Note that the cross referencer works on the .REL files, NOT the .DRC source files, thus it cross references only procedure calls, and only those that are not conditionally compiled out.


VII. Draco source files

Source files for the Draco compiler are either normal source files, usually with an extension of .DRC, or are declaration include files, usually with an extension of .G. Declaration include files can contain only declarations (constant, type, external, variable), and the symbols declared in them are called 'global', and are available to all procedures in all source files which include that particular include file. Normal source files contain, in the order given:

- 0 or more include file references. These must start in the first column of the first line, and consist of a backslash (\) or number sign (#) followed by the name of the include file being referenced. Several such references may occur, one per line, with no intervening spaces or comments. When a program consists of more than one source file, each of the source files will usually have the same set of include file references. The link editor requires that the 'global' variables be consistent among all .REL files being linked.
  .REL files being put into libraries may not have any 'global' variables.

- declarations. These declarations are called 'file' declarations, and are available to all procedures in that particular source file. Short Draco programs, consisting of only one source file, will not have any 'global' declarations - the 'file' declarations will play that role. In larger programs, 'file' declarations are still useful, in that procedures associated with a given portion of a program can be assembled in one file, and any declarations private to those procedures can be 'file' declarations in that file, and thus will not be accessible to any other procedures. Also, 'file' declarations are common in files which are to be used as part of a library, since any 'file' variables will be allocated (assigned real memory addresses) by the link-editor whenever a routine from that file is referenced.

- procedures. Each procedure can have its own set of local declarations (its parameters are assumed to be part of that set). These declarations are accessible only within that procedure, and the values of any variables are not preserved between activations of the procedure.

Procedures defined in a source file are considered to be declared for the remainder of that source file. If circular referencing of procedures is required, then a 'file' level extern declaration of the procedure can be given before the procedure is used, and, so long as that declaration is consistent with the final definition, the compiler will not complain.

Because of the compiler's requirement that all symbols be declared before they are used, the simplest arrangement of source files is that which puts the lowest-level routines first, and the highest-level routines (those which call the lower-level ones) last. Thus, short programs consisting of only one source file will normally put procedure 'main' last.
(All Draco programs must have a procedure 'main', since the initialization code starts program execution by calling 'main'.)

In keeping with a highly readable convention used by some UNIX programmers, it is suggested that all 'global' and 'file' symbols begin with a capital letter, and all local symbols begin with a small letter (other than constants). In keeping with this convention the names of all routines in all libraries supplied with Draco begin with a capital letter. Even though procedure definitions are technically 'file' declarations, such procedures, unless they are to be part of an externally available library, should have names beginning with small letters.

Draco takes a different approach to scope than many standard algorithmic languages. Languages such as Pascal, Algol68, etc. allow the declaration of an identifier, at an inner scope level, which is already declared at an outer scope level. For the duration of that scope, the new, inner, declaration masks the outer declaration so that the outer meaning of the identifier is not available. In Draco, it is illegal to attempt to declare an identifier which already exists, regardless of the scope levels of the declarations (Draco only has 3 levels anyway - global, file and local). Several procedures can declare the same identifier locally (names such as 'i', 'p', 'n', etc. are quite common), but they cannot declare a name which exists at either the global or file level. Similarly, at the file level, a name cannot be used which is already in use at the global level. This approach is used so as to eliminate the problems arising from accidentally masking an outer meaning in a situation which uses that outer meaning, but also declares a similar, inner meaning. This situation can result in bugs which are very difficult to detect. If the naming convention suggested above is used, little inconvenience will occur.
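The no-masking rule can be pictured as a symbol table that refuses any name already visible at any of the three levels. The following Python sketch is only a model of that rule, not of the compiler's real data structures; the class name `Scopes`, the exception `Redeclared` and the method names are assumptions made for the illustration.

```python
class Redeclared(Exception):
    """Raised when a name is declared while it is already visible."""


class Scopes:
    """Toy model of Draco's three scope levels: global, file and local."""

    def __init__(self):
        self.levels = {"global": set(), "file": set(), "local": set()}

    def declare(self, level, name):
        # Any name already visible is rejected, whatever level it came from.
        for existing_level, names in self.levels.items():
            if name in names:
                raise Redeclared(
                    f"{name!r} is already declared at the {existing_level} level")
        self.levels[level].add(name)

    def end_procedure(self):
        # Locals vanish at the end of a procedure, so the next procedure
        # is free to reuse names such as 'i' or 'n'.
        self.levels["local"].clear()


scopes = Scopes()
scopes.declare("global", "Count")
scopes.declare("local", "i")
scopes.end_procedure()
scopes.declare("local", "i")          # fine: a different procedure
try:
    scopes.declare("local", "Count")  # would mask a global name
except Redeclared as error:
    print(error)
```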
Also, since Draco imposes no limit on the length of identifiers, there should be no problem in choosing meaningful ones.


VIII. Declarations

Declarations in Draco can be of constants, types, variables or external procedures. Constant declarations consist of the type, followed by the identifiers, along with an '=' and their value. Constants can be numeric (signed or unsigned), single character, or 'chars' values. In keeping with a highly readable convention used on UNIX systems, it is suggested that the names of constants be fully capitalized. Example constant declarations:

    word MAX_LENGTH = 1000, ENTRY_COUNT = 10;
    char BEL = '\(7)', BS = '\b', LETTER_F = 'f';
    *char AUTHOR = "Sam Spade";
    unsigned 255 LIMIT = 255 - 1;

The values used for constants can be any expression whose value can be determined at compile time by the compiler. This can include conditional expressions, so long as the conditions are known at compile time.

Type declarations consist of the word 'type', followed by the names of the new types, each followed by '=' and the type's definition. The various kinds of types in Draco are as follows:

- signed numeric types. These are specified by the word 'signed', followed by a constant expression giving the upper bound on the positive values allowed. The negative values have a similar limit (one less for 2's complement machines). The maximum value of the limit will vary from machine to machine and possibly from version to version of the compiler. All versions will allow at least 'signed 32767'. In programs where execution time must be minimized, the programmer should use numeric types with the smallest possible range. This allows the compiler to generate more efficient code for CPUs which do some types of arithmetic better than others.

- unsigned numeric types. These are specified by the word 'unsigned', followed by a constant expression giving the upper bound on the values allowed (the lower bound is 0).
  Signed and unsigned values can be mixed in arithmetic operations. The two kinds of values can be compared for equality, but not for magnitude (they use the same bit pattern for different values).

- enumerated types. These are specified by the word 'enum', followed by an open brace ({), followed by a list of the named values of this type, followed by a close brace (}). The values named in the list are the only allowed values of this type. They can be compared (all comparisons are meaningful), subtracted (the result is an unsigned numeric), and can have a numeric added to or subtracted from them (the result is another value of the same enumerated type). This kind of type is usually used for flag values, where the flag can take on a limited set of values. A sample enumerated type:

    enum {c_red, c_yellow, c_blue, c_black, c_white}

- pointer types. These are specified by an '*' preceding the type which is to be pointed to. Thus, '*int' is a type specification meaning 'pointer to integer'. In the language's single exception to the requirement that all identifiers be declared before they can be used, the pointed-to type can be an undeclared symbol, which is assumed to be an as-yet undeclared type, which must be declared before the end of the current set of declarations. This rule relaxation is used to allow the construction of circularly referencing structures. Pointer values can be compared, subtracted (as long as the values are of the same pointer type), and can have a numeric value added to or subtracted from them. Unlike similar operations in the 'C' language, the value being added or subtracted is not multiplied by the size of the pointed-to type. Pointer values are usually generated via the '&' address-of operator, or by using a 'chars' constant, which is of type *char. The predefined value 'nil' is compatible with all pointer types.

- array types.
  These are specified by a left square bracket ([), followed by a list of constant expressions giving the size of the array in each dimension, followed by a right square bracket (]), followed by the type of the array elements. There is no limit on the number of dimensions of an array, but the user must keep in mind the amount of memory occupied by the array as compared to the amount of memory available on the computer system. Arrays are stored in row-major order, i.e. when scanning along an array in memory, the last index varies most frequently. Array values can be assigned. Sample array types:

    [MAX_NAME_LENGTH + 1] char
    [M, N, P] int
    [BLOCK_COUNT] [BLOCK_LEN] unsigned 32

- structure types. These are specified by the word 'struct', followed by a left brace bracket ({), followed by the types and names of the fields of the structure, followed by a right brace bracket (}). Unlike some other languages, Draco does not allow field names to be re-used; all must be unique. The easiest way to do this is to follow yet another highly readable UNIX convention which names all fields of a structure as a short abbreviation of the structure name (1 - 3 letters), followed by an underscore (_) and the mnemonic name of the field. Like array types, structure types can be assigned. Some structure declarations:

    type
        ProcessState_t = struct {
            word st_programCounter, st_stackPointer;
            [8] word st_registers;
            byte st_statusRegister;
        },
        Process_t = struct {
            int pr_priority;
            *Process_t pr_parent, pr_children, pr_nextSibling;
            ProcessState_t pr_state;
            *ProcessQueue_t pr_waitQueue;
        },
        ProcessQueue_t = struct {
            *ProcessQueue_t pq_next;
            *Process_t pq_this;
        };

- union types. Union types are declared exactly like structure types except that the word 'union' replaces the word 'struct'. Union types are similar to unions in 'C', in that they specify a type which is a set of types. The space allocated for a value of a union type is the maximum of the spaces needed for the various member types in the union.
  The programmer informs the compiler which of the member types is currently active by selecting the member type, exactly as a field of a structure is selected, when the union value is referenced or assigned to as other than the union type. Union types are useful when constructing networks of nodes, and the nodes are of differing natures, but all are pointed to by other nodes. The alternative of having separate pointers for each possible node type is very wasteful of memory. Sample union type: (this one from a railroad simulation)

    type Track_t = union {
        int tr_straight;            /* straight track option */
        struct {                    /* turnout option */
            int trn_length;
            bool trn_open;
            bool trn_isRight;
        } tr_turnout;
    };

- procedure types. These types are declared similarly to actual procedure headers, except that no procedure name occurs, no machine specific options (e.g. 'nonrec') can occur, and the names of the parameters are required, but are irrelevant. Procedure values can be compared for equality, assigned, and called. Sample procedure types:

    proc (int a, b)int
    proc (proc (char c)void putChar; *char charsPtr)void
    proc ([12]**int x, y)[12]**int

- operator types. This kind of type is Draco's (somewhat limited) way of being an extensible language. Syntactically an operator type consists of an open parenthesis, a string constant, a comma, a base type, a comma, a numeric constant and a close parenthesis. The string constant is a prefix which is used to build the names of the procedures that the compiler will generate calls to in order to do operations on values of this new type. The base type is the type that is the underlying representation of this new type, and the numeric constant is a set of 16 bits, indicating which operations are enabled for this type. Operator types will be explained in a later section.

Types in Draco can be combined in arbitrary ways. The only limitations imposed by the compiler are those inherent in the sizes of the type table and the type information table.
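The row-major layout described under array types, combined with the fact that pointer arithmetic is not scaled by the element size, means that the byte offset of an element has to include the element size explicitly. The Python sketch below works this out; the function name `row_major_offset` and the 2-byte element size used in the example are assumptions for illustration only, since the actual size of 'int' depends on the target machine.

```python
def row_major_offset(dims, index, elem_size):
    """Byte offset of element `index` in a row-major array of shape `dims`.

    The last index varies fastest, so each step folds in one more dimension:
    offset = (...((i0 * d1) + i1) * d2 + i2 ...) * elem_size
    """
    if len(dims) != len(index):
        raise ValueError("one index is needed per dimension")
    offset = 0
    for size, i in zip(dims, index):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        offset = offset * size + i
    return offset * elem_size


# For an array shaped like [2, 3, 4] int, assuming 2-byte ints:
print(row_major_offset((2, 3, 4), (0, 0, 1), 2))   # neighbour along the last index
print(row_major_offset((2, 3, 4), (0, 1, 0), 2))   # one full row of 4 elements later
print(row_major_offset((2, 3, 4), (1, 2, 3), 2))   # the final element
```

Because adding a number to a Draco pointer is not multiplied by the pointed-to size, a value computed this way is what would have to be added to the address of the first element.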
The question of type equivalence is answered in Draco in the following way: two types are equivalent if they are equivalently constructed from equivalent component types. The determination of type equivalence is done while the compiler is parsing the type specification. Thus, in the following:

    [12] int a;
    [10 + 4 / 2] int b;

'a' and 'b' will have the same type. The type of 'b' is equivalent to the type of 'a', and so will BE the type of 'a'. If a type is given a name via a type declaration, however, then that type is unique and is not equivalent to any other named type. Thus, if we declare:

    type T1 = [10] int, T2 = [10] int;
    T1 a;
    T2 b;
    [10] int c;

Types T1 and T2 are not equivalent, and 'a' and 'b' cannot be assigned to one another. Both can be assigned to or from 'c', however, else there would be no way to generate values of named types. This scheme is an attempted compromise between the need for usability of named types, and the desire to have the compiler protect us from mistakes when two named types just happen to have equivalent definitions. Signed or unsigned numeric types which are named are always compatible with other named or unnamed numeric types, whether equivalent or not.

The following types are supplied predefined:

    int    - signed numeric using the standard fully supported word size on the host processor
    short  - smaller sized signed value (often 8 bit)
    word   - unsigned numeric, same size as int
    ushort - unsigned numeric, same size as short
    byte   - unsigned numeric, one byte long (8 bits)
    char   - enumeration type of all 256 character values
    bool   - enumeration type consisting of 'false' and 'true'

Most programs can safely use types 'int' and 'word', since they will always be at least 16 bits long.
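The equivalence rule given earlier in this section (unnamed types compare by construction, named types are unique) can be modelled in a few lines. The Python sketch below is an illustration under assumptions, not the compiler's algorithm: the classes `Array` and `Named` and the function `assignable` are invented for the example, and the special rule that named numeric types are always mutually compatible is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Array:
    """An unnamed array type; equality is structural (size and element type)."""
    size: int
    elem: object


@dataclass(frozen=True)
class Named:
    """A type introduced by a 'type' declaration; the name makes it unique."""
    name: str
    definition: object


def assignable(dst, src):
    if dst == src:
        return True                       # same type
    if isinstance(dst, Named) and not isinstance(src, Named):
        return dst.definition == src      # named <- equivalent unnamed
    if isinstance(src, Named) and not isinstance(dst, Named):
        return src.definition == dst      # unnamed <- equivalent named
    return False                          # two different named types


a_type = Array(12, "int")
b_type = Array(10 + 4 // 2, "int")        # the size is folded to 12
T1 = Named("T1", Array(10, "int"))
T2 = Named("T2", Array(10, "int"))
c_type = Array(10, "int")

print(a_type == b_type)                   # the two unnamed types coincide
print(assignable(T1, T2))                 # distinct named types do not mix
print(assignable(T1, c_type), assignable(c_type, T2))
```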
The careful programmer will usually use his/her own signed and unsigned types, however, so that the reader is always aware of the range of possible values, and so that compilers can optimally decide the implemented size of variables (which may vary from processor to processor).

Variable declarations consist of the type (either named or explicit) followed by a comma separated list of identifiers. This format is similar to that used for constants, but combining the two is not advised, since doing so can be confusing.

External procedure declarations consist of the word 'extern' followed by a list of procedure headers, complete with procedure name, parameter types and names, and result type.


IX. Procedures

Each Draco procedure definition begins with the word 'proc', followed by any special machine dependent modifiers, followed by the name of the procedure, followed by a procedure header, followed by a colon, followed by the body of the procedure and a final terminating word, 'corp'. A procedure header consists of '(', optional parameter declarations, ')', and the result type (or 'void' for procedures which don't return a result). Note that the parentheses are required, even if no parameters are declared. Parameter declarations are just like variable declarations.

Unlike Pascal and C, Draco provides a way for arrays of differing sizes to be passed to a common procedure. An array parameter can use an asterisk (*) for the size of one or more of its dimensions, instead of the normal constant expression. When such a procedure is called, the compiler will automatically pass the true size of the dimensions of the passed array along with the array. These true sizes can be determined inside the procedure via the 'dim' construct. Note that this method can only be used for parameter arrays, and can only be used for top level arrays (e.g. if the parameter is an array of arrays, then only the top-level array can have '*' sizes).
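One way to picture a '*' array parameter is as the array plus a hidden record of its true dimension sizes, which 'dim' then reads back. The Python sketch below models that idea only; it says nothing about how the Draco compiler really passes the sizes. The names `OpenArray`, `pass_open` and `total` are invented for the illustration, and the 1-based dimension numbering of `dim` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OpenArray:
    """An array parameter together with its hidden dimension sizes."""
    data: list      # the elements (one-dimensional in this sketch)
    sizes: tuple    # one true size per dimension, supplied by the caller


def pass_open(data):
    # Stands in for what the caller does at a call site: pass the array
    # and, alongside it, the true size of each '*' dimension.
    return OpenArray(data, (len(data),))


def dim(a, n):
    # Assumed 1-based: dim(a, 1) is the size of the first dimension.
    return a.sizes[n - 1]


def total(a):
    # A procedure written once that works for arrays of any length.
    result = 0
    for i in range(dim(a, 1)):
        result += a.data[i]
    return result


print(total(pass_open([3, 1, 4])))
print(total(pass_open([10, 20, 30, 40, 50])))
```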
If the procedure is to return a result, the type placed between the closing ')' of its header and the following ':' is the type expected by the compiler. Conversions among various numeric types are allowed here as elsewhere. The result is returned by placing it at the end of the procedure's body, just before the closing 'corp'. There must not be a semicolon after the result, since the compiler uses the semicolon as a signal that the previous unit should have been a statement.

As a sample procedure, here is the old standard, "Towers of Hanoi":

    proc hanoi(int n; *char from, to, using)void:

        if n > 0 then
            hanoi(n - 1, from, using, to);
            writeln("Move disk ", n, " from peg ", from, " to peg ", to);
            hanoi(n - 1, using, to, from);
        fi;
    corp;

A standard procedure with a result:

    proc minimum([*] int a)int:
        int i, min;

        min := a[0];
        for i from 1 upto dim(a, 1) - 1 do
            if a[i] < min then
                min := a[i];
            fi;
        od;
        min
    corp;


X. Statements in Draco

Draco is a fairly standard programming language, along the lines of Pascal, C, and Algol. Where several statements are allowed, they are separated by semicolons (the semicolon is a separator, not a terminator). The standard statement forms in Draco are:

- assignment statement. This is the usual, consisting of the destination, a ':=', and the source expression.

- procedure call statement. This consists of the procedure's name (or an expression yielding a procedure), followed by the procedure's parameters, enclosed in parentheses. The parentheses must be present, even if the procedure has no parameters (this makes it very clear when something is being called - useful for procedures such as random number generators which have no parameters but return a result). The parameters passed must be compatible with those specified in the defining procedure header, in terms of both type and number.

- if statements. If statements in Draco are syntactically identical to if statements in Algol68.
  The simplest form consists of the word 'if', followed by an expression of type 'bool', followed by the word 'then', followed by a sequence of statements to be executed when the bool yields 'true', followed by the word 'fi'. An 'else' clause, which is executed when the bool yields 'false', can be placed between the 'true' statements and the 'fi'. An 'else' clause consists of the word 'else' and a sequence of statements. As in Algol68, alternate conditions, consisting of the word 'elif', a bool expression, the word 'then', and a statement sequence, can be placed between the first statement sequence and the 'else' (or 'fi' if there is no 'else'). In that case, the conditions are evaluated one at a time, until one is found that yields 'true'. The corresponding statement sequence is then executed. Only if no condition yields 'true' will the 'else' statements be executed. When a condition has yielded 'true', no more conditions will be evaluated. As an example,

    if a then
        b
    elif c then
        d
    elif e then
        f
    else
        g
    fi

  is equivalent to

    if a then
        b
    else
        if c then
            d
        else
            if e then
                f
            else
                g
            fi
        fi
    fi

  The advantage of the 'elif' form is fairly obvious - it has far less indentation for the same logic.

  The standard if statement is the basis for the conditional compilation feature of the Draco compiler. If the condition for an if statement can be evaluated at compile time, then no code is generated for the if statement, and code for only one of the branches is generated. This feature is not as flexible as that provided by full macro preprocessors, but it has the advantage that the compiler always checks all branches for correct syntax and semantics, thus the programmer can be sure that changing the flag value controlling the conditional compilation will not cause the program to stop compiling. (With macro pre-processors, and conditional inclusion, as supported by most C compilers, the compiler does not even see the code which has been conditioned out.)
By including conditional compilation in the compiler, rather than requiring a separate pre-processor, compilation times for Draco programs can be significantly less. One common use for conditional compilation is that of including debugging statements, dependent on a global debugging flag. E.g.

    bool DEBUG = false;
    ...
    if DEBUG then
        writeln(DebugOut; "We got to this point, key values are:");
        ...
    fi;

In situations like this, the Draco compiler will generate no code at all for the entire if statement. If the DEBUG flag is set to 'true' instead, then the debugging code will appear, but there will still be no code to actually test DEBUG (DEBUG doesn't even exist). For more complex debugging, the DEBUG flag can be a number, specifying the level of debugging required. Another common use of conditional compilation is to have one source file which can produce two or more different versions of a program, depending on one or more flags.

- while statements. The standard while statement consists of the word 'while', followed by a bool expression, followed by the word 'do', followed by a sequence of statements (the loop body), followed by the word 'od'. Draco allows an extension of this form, in which a sequence of statements can be placed between the 'while' and the bool expression. This extension allows the same 'while' construct to serve as beginning, middle and end exit loops. E.g.

    while
        write("Enter command: ");
        command := getCommand();
        command ~= HALT
    do
        processCommand(command);
    od;

- for statements. The for statement is the standard way in Draco of iterating over a fixed sequence of values. It is similar to the for statements in most programming languages.
It consists of the word 'for', followed by the name of the variable to use as an index variable, followed by the word 'from' and an expression giving the start of the range, optionally followed by the word 'by' and an expression giving the step amount, followed by either the word 'upto' or the word 'downto' and an expression giving the end of the range, followed by the word 'do', followed by a statement sequence, and finally, the word 'od'. In Draco, the direction of the loop (increasing or decreasing) is set at compile time, by the selection of 'upto' or 'downto'. If the 'by' part is omitted, then either +1 or -1, whichever is appropriate, is used. The for loop terminates when the index variable attains the last possible value between the two limits (inclusive). Thus we have the loop

    for i from 1 by 5 upto 13 do
        ...
    od;

stepping 'i' through the values 1, 6, and 11. The index variable can be numeric (signed or unsigned), an enumeration value, or a pointer value. The limits must be compatible with the index variable, and the 'by' value, if present, must be numeric. Thus we can have a loop which steps through every second letter of the alphabet:

    for ch from 'a' by 2 upto 'z' do
        ...
    od;

Most programs which do a lot of computation have a lot of for loops in them (fancy compilers are an exception). Thus, it is beneficial if the compiler can generate fairly fast code for for loops. The Draco compiler does a number of fancy tricks with for loops. Because of this, it is important that none of the assumptions made by the compiler are broken. Thus, the program should never attempt to assign a value to the for index variable within the for loop. (A later version of the compiler may be able to flag such usages as errors.)

- case statements. Case statements in Draco are similar to those in many languages; they are of the variety where the individual alternatives being selected among are an explicit part of the case statement. A default alternative is also available.
The syntax is as follows: the word 'case'; followed by the expression being used as a selector; followed by several alternatives, each consisting of 1 or more alternative index values given as the word 'incase', a constant expression, and a colon. Each alternative then has a body, which is a sequence of statements to be executed when that alternative is selected. The entire case statement is terminated by the word 'esac'. The default case, if present, can occur anywhere among the alternatives, and consists of the word 'default' and a colon, followed by the statements of the default case. The alternative index values can be a pair of values separated by '..', in which case all values between the two (inclusive) are used. The index expression can be of any numeric or enumerated type. The alternative index values must be compatible with the index expression. A sample case statement:

    case ch
    incase 'a':
    incase 'A':
        writeln("It was an A.");
    incase 'b' .. 'd':
    incase 'B' .. 'D':
        x := y;
        y := z;
    default:
        flag := true;
    esac;

The various Draco compilers will use different code sequences to handle case statements. At least two forms will probably be supported - one form which uses the index expression as a direct index in a (perhaps sparse) table of code addresses, and one form which uses a binary search through a sorted table of the alternative index values. The appropriate form will be selected by the compiler, based on the range and number of alternative index values.

- I/O statements are discussed in a separate section later.

- the 'free' construct, which can be applied to any value of a pointer type, returns the storage pointed to to the storage allocator. That storage must have been previously allocated by using 'new'. 'free' is a statement since it returns no result.

- the 'pretend' type-cheating construct can be used as a statement if the type being forced is 'void'. This form is used to throw away a value, usually from a procedure, which is not needed.
- the 'error' construct, which accepts a parenthesized string constant as its argument, simply uses that string as the text of an error message to print AT COMPILE TIME. This construct is useful for putting consistency checks into code. For example, if a program has been written with the assumption that "IDENTIFIER"s fit in one byte, then the following check, done somewhere in the program, would be appropriate:

    if range(IDENTIFIER) > 255 then
        error("IDENTIFIER range must be <= 255");
    fi;

Then, when someone comes along later and changes the definition of the IDENTIFIER numeric type, if the type is made bigger than 'unsigned 255', a compile time error message will be produced when compiling the file containing the above check. Near the check would be a good place to put comments saying why the limitation exists.

- some machine dependent constructs are formulated as statements.

XI. Expressions in Draco

Most small processors are more efficient at doing 8 bit operations than they are at doing 16 or 32 bit operations. Because of this, the Draco compiler will normally attempt to use the smallest possible size for a given numeric type. One result of this is that the operands to an operator may not be of the same size. In such cases, the compiler will expand the smaller value (doing sign-extension on signed values) and do the operation in the larger size. The one exception to this rule involves the shift operators - the operation is always done in the size of the value being shifted (the left operand). Also, the type of numeric constants will be overridden by any non-constant operand, so long as their value will fit in that size. If both operands are constants, the larger type will be used as the result type.

Similarly, the result of an operation can depend on whether that operation is done using signed or unsigned arithmetic.
In cases where one operand to an arithmetic operator is signed and the other is unsigned, the operation is done as a signed operation, and the result is considered to be signed. This only affects the result for the division and remainder operations. Note that this rule is opposite to that of C, which would yield an unsigned result. This can be thought of as follows: in C, the normal numeric type is signed, while in Draco, the normal numeric type is unsigned. In either language, any occurrence of a non-normal value forces non-normal operation and result. This choice in Draco is likely to be contentious - the reasoning is that most numbers used in most programs are unsigned. I personally find C's habit of reserving '-1' as an error flag to be quite disgusting. As with size, the signedness of a constant is ignored unless both operands are constants.

Draco has a fairly large set of operators. These include the familiar arithmetic operators of addition, subtraction, etc., along with a full set of bit operators (and, xor, etc.), and a few special operators. The operators are at various levels of priority, meaning that a higher priority operator will be evaluated before a lower priority one, unless there are parentheses explicitly governing the order of evaluation. This reflects the usual view that multiplication comes before addition, etc. Draco also has the usual constructs for calling functions, indexing arrays, selecting fields of structures, etc. These are included in the following table, to indicate their position in the precedence scheme. The operators and constructs, in order of decreasing precedence are:

----------

* - postfix dereferencing operator. This operator is postfix in Draco, rather than prefix as in C, so that there is never any ambiguity about the order in which the various constructs are to be applied (consider *a[i] in C, which is either a[i]* or a*[i] in Draco (I can never remember how C evaluates these)).

[] - postfix array indexing.
Array indexing is 0-origin in Draco, i.e. the first element of an array has index 0. The compiler will attempt to be efficient with indexing, but most microprocessors have little direct support for array indexing, so if the application is critical in terms of CPU time or program size, it may be necessary to use pointer arithmetic instead of array indexing. Values used for indexing can be of any numeric or enumeration type.

. - field selection. Field selection in Draco is fairly efficient, usually requiring little, if any, extra machine code. The same notation (structure '.' field-name) is used to select the current form from a union type value.

() - function calling. Function calls are identical to procedure calls, except that they return a value. The function to be called can be the result of an expression. (E.g. many versions of UNIX contain an array of structures of procedures, which is used to direct I/O calls based on the device being accessed (the array index), and the particular function requested.)

----------

& - prefix address-of operator. This operator takes the address of its operand. The type of the value generated is 'pointer-to-X', where 'X' is the type of the operand. This operator cannot be applied to expressions which do not have an inherent address, e.g. '&(a + 1)' will not work, but '&a[i].name[j]' will. In general, these constructs are arranged in Draco in such a way that if you need brackets to express it, it's probably illegal.

----------

~ - prefix bitwise complement operator. This and the other bit operators can only be applied to numeric values.

----------

& - bitwise and operator.

>< - bitwise exclusive-or operator.

<< - logical left shift operator. In both shift operators, the left operand must be an unsigned numeric, while the right operand can be any numeric. The operation and result are done using the size of the left operand.

>> - logical right shift operator.

----------

| - bitwise inclusive-or operator.
----------

| - prefix numeric absolute value operator. This, and other arithmetic operators, can only be applied to numeric values. (Exceptions for binary + and - are listed there.) Both the absolute value and negation unary operators always yield a signed type, regardless of the signedness of their operand.

- - prefix numeric negation operator.

+ - prefix numeric do-nothing operator. This operator is included so that forms like '+0' can be allowed.

----------

* - multiplication operator.

/ - division operator.

% - remainder operator.

----------

+ - addition operator. In addition to numeric operands, one operand can be of an enumeration or pointer type. The resulting value will be of the same type, incremented by the other, numeric, operand. Unlike C, which pre-multiplies the numeric value by the size of the pointed-to type, Draco doesn't modify the numeric value at all.

- - subtraction operator. Similar to incrementing a pointer or enumeration value, these values can be decremented by using them as the left-hand operand in subtraction. Two enumeration or pointer values of the same type can also be subtracted, yielding an unsigned numeric value.

----------

>, <, >=, <=, =, ~= - comparison operators. Most values in Draco can be compared. For some comparisons, only the equality comparisons (= and ~=) are meaningful. For example, comparing a signed numeric with an unsigned numeric can yield two different results, depending on whether a signed or unsigned comparison is used. Because of this, the compiler will not allow a signed value to be compared with an unsigned value with other than = or ~=. The values being compared must be of compatible types. Structure and array types cannot be compared, since these types might contain internal gaps due to alignment requirements, and the contents of these gaps is undefined.
----------

Along with the capability of conditional compilation provided by the if statement, the Draco compiler attempts to evaluate expressions at compile-time, so that they need not be evaluated at run-time. If both operands to an operator can be evaluated at compile time, then the operation is done at compile time, producing a constant. The evaluation is done using the highest precision supported by the compiler. The nature of the evaluation will be the same as if it was done at run-time, i.e. mixing signed and unsigned values will yield a signed result, etc. This facility is used in all places where constants appear, e.g. in array declarations, signed/unsigned declarations, case statement alternative index values, etc.

There are several forms of expressions in Draco which do not involve actual operators. These include the boolean 'and', 'or' and 'not' operations. These are not classed with the normal operators, since they are actually language constructs instead. Both 'and' and 'or' will not evaluate their right-hand operand if the value of the left-hand operand is sufficient to determine the result. This is known as 'short-circuit evaluation', or the McCarthy form of the 'and' and 'or' operators. There is no exclusive-or operation for bools, but the same result can be achieved using the ~= operator, which can be applied to bool values.

Draco also allows conditional expressions - the if expression and the case expression. These forms are identical to their statement forms, except that the various statement sequences used as their alternatives must end with an expression, which is the result for that alternative. Also, if expressions must have an else part, since they must yield a result in all cases. The same feature which allows if statements to be used for conditional compilation allows the use of if expressions in constant expressions, so long as the conditions and all alternative values are themselves constant expressions.
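As a sketch of short-circuit evaluation and of if expressions, the fragment below uses only constructs described in this guide. The names BIG_MODEL, TABLE_SIZE and safeRatio are hypothetical, chosen purely for illustration, and the fragment has not been run through a Draco compiler.

```draco
/* Hypothetical names throughout; an untested sketch. */

/* An if expression used in a constant expression: the condition and  */
/* both alternatives are constants, so TABLE_SIZE is a constant too.  */
bool BIG_MODEL = true;
int TABLE_SIZE = if BIG_MODEL then 1000 else 100 fi;

/* 'and' does not evaluate its right-hand operand when the left-hand  */
/* operand is false, so the division is never attempted with b = 0.   */
/* The if expression has an else part, since it must always yield a   */
/* result; there is no semicolon after the result.                    */
proc safeRatio(int a, b)int:
    if b ~= 0 and a / b > 10 then
        a / b
    else
        0
    fi
corp;
```

Since BIG_MODEL is a constant, one would expect the compiler to fold TABLE_SIZE at compile time in the same way it folds a conditional compilation test.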
The unwise programmer can 'type cheat' (convince the compiler to allow him to do things which he would not normally be allowed to do) by misusing union types. In the hope of preventing this, Draco has an explicit construct for type cheating. It uses the word 'pretend'. The form 'pretend(expr, type)' instructs the compiler to consider 'expr' to be of type 'type', regardless of what it thinks the type must be. As a special case, 'type' can be 'void', in which case the value of 'expr' is simply discarded (this action, called voiding, is done automatically by most C compilers, often resulting in programming errors, since it is easy to do it unintentionally). The pretend construct should be used with great care, since some values cannot possibly be of some types. For example, what is supposed to happen in something like 'pretend(x + y, [10] int)'?

A more innocuous form of the pretend construct uses the word 'make' instead of the word 'pretend'. This form requests that the compiler convert the given expression to the given type. This form will only allow those conversions which make sense. 'make' is normally used to expand a short value to a longer form, to force an operation to be in a longer form (e.g. to force 16 bit arithmetic on 8 bit values).

The form 'dim(arrayname, number)' will be replaced by the size of the named array in the given dimension (the first dimension is dimension number 1). If the array is a parameter array and the selected dimension was declared as '*', then the value will be obtained at run-time from a hidden parameter passed along with the normal parameters; otherwise, the value is a compile-time constant and can be used in constant expressions. Note that the value is the size of the array in that dimension, which is one greater than the maximum legal index in that dimension.

The form 'sizeof(type)' yields a numeric constant which is the number of bytes needed to store an object of the given type.
The type can be the name of a declared type, or can be a more complex type description. Proper use of this construct is needed to allow some programs to be portable among machines which have, say, different sized integers. Most programmers will not have to use it, however.

The construct 'new(type)' creates a call to the standard storage allocator to allocate a new object of type 'type'. It can be thought of as equivalent to 'pretend(malloc(sizeof(type)), *type)'. Note that the value returned is a pointer to the newly allocated storage, and thus its type is *type.

The form 'range(type)' can be applied to signed or unsigned numeric types to return the upper limit of that type (the value given when type was declared); or to an enumeration type to return the number of values in that type. Thus 'range(bool)' is equivalent to '2', and 'range(int)' returns the maximum signed numeric value allowed with the normal integer values supported by that version of the compiler. Note that 'range(byte)' is not legal since 'byte' is not considered to be a normal numeric type, since it is forced to be exactly 1 byte long, regardless of whether that is efficient for the target machine.

XII. Basic components of Draco programs

Identifiers in Draco can be any length. This applies to variables, constants, types and procedure names. The link editor maintains the lack of a limit - the full name of an external procedure is used when searching for it in other files and libraries. Draco treats upper and lower case letters as distinct, thus the identifiers 'A' and 'a' are not the same. The programmer can use any convention he wishes with regard to capitalization, but the conventions mentioned previously are highly recommended. Note also that keywords in Draco are recognized only in the exact case in which they are specified. Identifiers in Draco must start with a letter or an underscore (_), and must consist of letters, digits and underscores.
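The identifier rules above can be illustrated with a few declarations. All of the names are hypothetical and the fragment is an untested sketch.

```draco
/* Hypothetical names; an untested sketch. */
int total, Total, TOTAL;    /* three distinct variables: case matters  */
int _scratch, line_2, x9;   /* underscores and digits are allowed      */
int aVeryLongIdentifierThatIsStillSignificantInFull;
/* int 2ndLine;                not legal: starts with a digit          */
```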
Comments in Draco consist of the delimiters '/*' and '*/' around the portion of the source to be commented out. Comments can span several input lines. Comments can be nested, i.e. a comment entirely within an outer comment is recognized and handled properly by the compiler. Thus, a section of code can be commented out by enclosing it in /* and */, regardless of whether it has any comments in it or not. Comments, along with 'whitespace' (blanks, tabs, carriage-returns and linefeeds) can occur between any two tokens, as well as in string breaks (see below).

Numeric constants in Draco can be in decimal, octal, hexadecimal or binary. Simple numbers like '10' and '6348' are treated as decimal. Other bases are selected by preceding the number by a prefix consisting of a '0' and a base indicator. The base indicators, which can be in upper or lower case, are 'x' for hexadecimal, 'o' for octal, and 'b' for binary. Hexadecimal digits 'a' - 'f' can be in upper or lower case. The compiler checks for proper digits for a given base and for numeric overflow in constants.

Character constants in Draco come in two forms. The apostrophe (') is used to delimit single character constants, as in 'a', '.', etc. Quotes (") are used to delimit C-style strings, consisting of a sequence of characters terminated by a 0 character. In both forms, an escape convention is available. The escapes consist of a backslash followed either by a single character, or by a numeric expression enclosed in parentheses. The single character forms are:

    \b - the ASCII backspace character
    \t - the ASCII tab character
    \r - the ASCII carriage return character
    \n - the ASCII linefeed (newline) character
    \e - the C-style string termination character (0)

Any other character used this way will be passed through unchanged. This can be used to put backslashes and quotation marks of the same type as the delimiter into the string. The convention of doubling a quote mark to produce a single one is also supported.
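A short sketch of the constant forms described above. The constant names are hypothetical and the fragment has not been compiled; it relies only on the base prefixes and escape rules given in this section.

```draco
/* Hypothetical constant names; an untested sketch. */
int  TEN_DEC = 10;          /* plain digits are decimal                */
int  TEN_HEX = 0xA;         /* the same value in hexadecimal           */
int  TEN_OCT = 0o12;        /* the same value in octal                 */
int  TEN_BIN = 0b1010;      /* the same value in binary                */
char TAB     = '\t';        /* single character constant with escape   */
...
writeln("Name\tValue");         /* \t embeds a tab                     */
writeln("She said \"yes\".");   /* backslash passes the quote through  */
writeln("She said ""yes"".");   /* doubled quote gives the same text   */
```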
The escape form consisting of a numeric value in parentheses must yield a constant between 0 and 255. This form can be used for special named characters, as in:

    write('\(BEEP)');   /* ring terminal's bell */

The multi-character form of character constants ('chars' values using ") supports the 'string break'. This is a convention which allows a long string to be split up over several input lines, and to be indented nicely. If the last thing (other than spaces, comments, etc.) on an input line is a portion of a chars constant, and the first thing (other than spaces, comments, etc.) on the next input line is a similar constant, then the two are concatenated at compile time to yield a single, longer constant. This can be carried on for as many input lines as are needed to nicely format the constant.

Many CP/M systems in use today do not have full ASCII keyboards (e.g. CP/M on the Apple-II or Apple-II+). In such systems, it could be difficult to use Draco, since the language uses characters not found on the keyboards. To help alleviate this problem, the compiler recognizes the following alternate forms for some operators and characters:

    standard    alternate
       \           #
       [           (:
       ]           :)
       {           ($
       }           $)
       ~=          /=
       ~           $-
       |           $/
       _           ^

Draco allows the construction of array and structure constants for named array and structure types. The form is that of a parenthesized list of values. Such constants can be arbitrarily complex. If one is used in a constant declaration, it simply appears after the '='. If one is desired inside executable code, it must be preceded by the name of the type in question, so that the compiler has some clue as to what is going on. For example:

    type type1 = struct {int field1, field2; char field3};
    type type2 = [2] type1;
    type2 CONST = ((1, 2, 'a'), (3, 4, 'a' - FRED / 2));
    type2 var;
    ...
    var := type2((-26, 13 + 2 / 7, 'a' + 2), (+1, -1, '\e'));

XIII. Machine specific constructs

The 8080 (CP/M) version of the compiler has several additional features, which can make certain types of programming easier. When a variable (global, file or local) is declared, it can be followed by an '@' and a numeric constant. This informs the compiler that that variable is to be located at that address. This is useful for things like memory-mapped displays and memory-mapped I/O. This same modifier can be appended to 'extern' procedures, enabling Draco programs to call routines at absolute addresses in ROMs.

When declaring variables, the value given after the '@' can also be the name of some other variable. In this case, the second named variable must occupy at least as many bytes of storage as the first, and the two will then occupy the same storage. This technique can be used to "type-cheat", but the programmer is strongly advised to use 'pretend' instead, unless unreadable code is desired. This feature of the compiler is intended to be used to conserve storage space as used for variables.

The 8080 processor has no really efficient way to access variables stored on the stack. This tends to make recursive programs quite inefficient. Draco sidesteps the problem by not storing any variables on the stack - all can be directly addressed at a fixed location. Recursion is allowed by using special code at the beginning and end of procedures, which saves and restores that procedure's local variables on the stack. This can be slightly time-consuming if the procedure is called often. If the word 'nonrec' is placed between the word 'proc' and the name of the procedure, then this special code is omitted. Such a procedure must not be used recursively.
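The '@' modifier and 'nonrec' might be combined as in the sketch below. The display address 0xf000, the array sizes and all of the names are hypothetical, and the fragment has not been compiled; it only follows the declaration forms described above.

```draco
/* Hypothetical names and address; an untested sketch. */
[1024] char Screen @ 0xf000;        /* memory-mapped display at a     */
                                    /* fixed address                  */

[80] char LineBuffer;
[64] char NameBuffer @ LineBuffer;  /* shares LineBuffer's storage;   */
                                    /* it is no larger than it        */

/* Never called recursively, so the code that saves and restores the */
/* local variables on the stack can be left out.                      */
proc nonrec clearScreen()void:
    word i;
    for i from 0 upto dim(Screen, 1) - 1 do
        Screen[i] := ' ';
    od;
corp;
```

Sharing storage this way is only safe if LineBuffer and NameBuffer are never needed at the same time.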
The scheme used has one slight flaw - taking the address of a local variable of a routine used recursively may not work as expected, since the value originally pointed to will be moved onto the stack when the procedure is called recursively, and the pointer will be left pointing to the new version of the variable. This flaw will not affect many programs (it did affect the compiler, however).

Several provisions were added to the CP/M compiler to allow nearly all types of programming to be done directly in Draco, rather than having to write assembler language subroutines. The form 'input(port)' will return a 'byte' value obtained from input port 'port' ('port' must be a compile time expression whose value is between 0 and 255). Similarly, the form 'output(port, value)' will output an 8 bit value 'value' to the specified output port. If the 'value' expression is omitted, then an indeterminate value is output. (This is useful with hardware configurations in which the output instruction itself causes the desired external action.)

The statement form 'halt' will generate a HLT instruction. The statement form 'ion' will generate an EI instruction. The statement form 'ioff' will generate a DI instruction.

If the word 'vector' is used instead of 'nonrec' when defining a procedure, ('vector' also implies 'nonrec') then that procedure is assumed to be an interrupt handler, and will start with code to stack all of the processor's registers, and will end with code to unstack the registers, enable the interrupts, and return. Remember that the 8080's EI instruction will not enable interrupts until after the NEXT instruction. 'vector' procedures must not have any parameters (who would supply them?), and cannot yield any result (where would it go?).
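A sketch of how the port and interrupt-control forms above might be used together. The port numbers, the variable Ticks and the procedure names are all hypothetical, and the fragment has not been compiled.

```draco
/* Hypothetical port numbers and names; an untested sketch. */
byte SWITCH_PORT = 0x10,
     LAMP_PORT   = 0x11;
word Ticks;

proc nonrec readSwitches()byte:
    input(SWITCH_PORT) & 0x0f       /* keep only the low four bits */
corp;

proc nonrec setLamps(byte pattern)void:
    output(LAMP_PORT, pattern);
corp;

/* Update a 16 bit counter with interrupts held off, on the          */
/* assumption that an interrupt handler also reads Ticks.            */
proc nonrec bumpTicks()void:
    ioff;                           /* DI */
    Ticks := Ticks + 1;
    ion;                            /* EI */
corp;
```

The port arguments are constants here because 'port' must be a compile time expression.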
The cleanest way to set up interrupt vectors would be something like:

    type VECTOR = struct {
        byte v_jmp;
        proc()void v_handler;
        [5] byte v_padding;         /* pad to 8 bytes each */
    };
    byte JMP = 0xc3;                /* 8080 JUMP instruction */
    [8] VECTOR Vector @ 0x0000;     /* array of vectors at absolute */
                                    /* address 0x0000 */
    proc vector handle0()void:
        ...
    corp;
    ...
    Vector[0].v_jmp := JMP;         /* set up the machine's vectors */
    Vector[0].v_handler := handle0;
    Vector[1].v_jmp := JMP;
    Vector[1].v_handler := handle1;
    ...
    ion;                            /* enable interrupts */

Since the Draco compiler directly emits object code, rather than assembler source code, it is not possible to allow in-line assembler language statements. Instead, Draco has the 'code' construct, which consists of the keyword 'code' followed by a parenthesized list of constant expressions and symbol references. The values of constant expressions are emitted directly into the code stream. The type of each constant controls its size as emitted. Variable and procedure references yield 16 bit words which will be relocated at link time to contain the required address. For example (an 8080 example):

    byte
        OP_CALL = 0o315,
        OP_MOV = 0o100,
        OP_ADD = 0o200,
        OP_SUB = 0o220,
        OP_DAA = 0o047,
        OP_LXI = 0o001,
        R_A = 0o7,
        R_B = 0o0,
        R_C = 0o1,
        R_H = 0o4;                  /* used by the LXI below */
    int x;
    word CALL_ADDRESS = 0x1234;
    ...
    code (
        OP_MOV | R_A << 3 | R_B,
        OP_ADD | R_C,
        OP_DAA,
        OP_CALL, CALL_ADDRESS,
        OP_LXI | R_H << 3, x        /* load address of x into HL */
    );