MacFormat 1997 January

/ MacFormat 1997 January / macformat-046.iso / Shareware Plus / Developers / EnterAct / EnterAct Stuff / Documentation / hAWK_notes.h

C/C++ Source or Header | 1992-11-22 | 43.7 KB | 940 lines | [TEXT/KEEN]

/* Add this file to an EnterAct project (and update the dictionary) for a bit of help with hAWK programming. */ /* This help is general, and defines the following terms: term what’s in the note ---- -------------------- variables hAWK’s built–in variables array general discussion in the “in” operator BEGIN about BEGIN blocks END ditto END blocks regular a discussion of regular expressions, with examples patterns general discussion operators table, precedence and definition numeric (functions) table string (functions) table control for while do etc function how to define one, etc print printf redirect redirecting input and output To look up one of the above defined terms, type its name (or click at the end of the name) and press the <Enter> key. If you have the AutoLook window open, the definition will appear there if you type the name or double–click on it (or click at the end of it, or paste it...) */ struct variables {/* Built–in variables hAWK's built-in variables are: ARGC the number of input files plus one ARGV array of command line arguments. The array is indexed from 0 to ARGC - 1, the input file names being ARGV[1] through ARGV[ARGC-1]. Dynamically changing the contents of ARGV can control the files used for data. FILENAME the name of the current input file. If no files are specified on the command line, the value of FILENAME is "-". A hAWK program may do all of its work in a BEGIN block, with no need for input (generating a list of random numbers for example). FNR the input record number in the current input file. Reset to 1 when starting a new input file. Hence the pattern “FNR == 1” detects the start of each file. FS the input field separator, a blank by default. If the default FS is used then leading blanks and tabs are trimmed from $1. IGNORECASE controls the case-sensitivity of all regular expression operations. 
If IGNORECASE has a non-zero value, then pattern matching in rules, field splitting with FS , regular expression matching with ~ and !~ , and the gsub() , index() , match() , split() , and sub() pre-defined functions will all ignore case when doing regular expression operations. Thus, if IGNORECASE is not equal to zero, /aB/ matches all of the strings "ab", "aB", "Ab", and "AB". The initial value of IGNORECASE is zero, so all regular expression operations are normally case-sensitive. NF the number of fields in the current input record. NR the total number of input records seen so far. OFMT the output format for numbers, %.6g by default. OFS the output field separator, a blank by default. ORS the output record separator, by default a newline. RS the input record separator, by default a newline. RS is exceptional in that only the first character of its string value is used for separating records. If RS is set to the null string, then records are separated by blank lines. When RS is set to the null string, then the newline character always acts as a field separator, in addition to whatever value FS may have. RSTART the index of the first character matched by match(); 0 if no match. RLENGTH the length of the string matched by match(); -1 if no match. SUBSEP the character used to separate multiple subscripts in array elements, by default "\034", some kinda up arrow very rare in text. (and three added for the Macintosh version) RUNERR short for "run error", a file name that you can use to print your own error messages to, as in print "Error during run" > RUNERR. Default name is $tempRunErr, and you'll find the file in the same folder as $tempStdOut. STDPATH path name that can be prefixed to any file name you wish to be written to the same folder as stdout ($tempStdOut). 
Typically looks like "Disk:folder1...:folderN:" and typical use looks like
    outname = "MyOutFile"
    fullOutName = STDPATH outname;
    print "something" > fullOutName;
TIME        at start of run, eg "Sunday, October 13, 1991 07:58 AM"
*/};
struct array {/*
Arrays are subscripted with an expression between square brackets, arr[expr]. Array values can be numbers or strings, but the index is always interpreted as a string. For example, when you write arr[1] the 1 is converted to the string "1" for use as the array index, so arr[1] is the same as arr["1"]. This sort of array is called “associative” since it can associate one string of text with any other, eg
    arr["John Henry"] = "was a log-drivin man"
If the index expression is an expression list (expr1, expr2, ...) then the array subscript is a string consisting of the concatenation of the (string) value of each expression, separated by the value of the SUBSEP variable, which is by default “\034” (decimal 28, an up arrow). This facility is used to simulate multiply–dimensioned arrays. For example:
    i = "A"; j = "B"; k = "C"
    x[i, j, k] = "hello, world"
assigns the string "hello, world" to the element of the array x which is indexed by the string "A\034B\034C".
*/};
struct in {/*
The special operator "in" may be used in an if statement to see if an array has an index consisting of a particular value:
    if (val in array) print array[val]
If the array has multiple subscripts i j k, use
    if ((i, j, k) in array)
instead. The alternative
    if (array[val] != "")
actually creates the element array[val] if it does not exist, so using “in” is usually better.
The "in" construct may also be used in a for loop to iterate over all the elements of an array:
    for (i in arr) delete arr[i]
An element may be deleted from an array using the delete statement. New elements should not be added to an array while looping over it with the "in" for-loop, since hAWK isn’t quite smart enough to handle that very well.
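The difference between “in” and a comparison against "" can be seen in a small sketch. It is written for a standard awk run from a POSIX shell rather than for hAWK itself, so treat it as an illustration under that assumption; the names in_demo, seen and x are made up for the example.

```shell
# Sketch only: assumes a POSIX shell with a standard awk on the PATH (not hAWK).
# in_demo is a hypothetical wrapper name used just for this illustration.
in_demo() {
    awk 'BEGIN {
        seen["a"] = 1

        # "in" tests for the index without creating anything.
        if (!("b" in seen)) print "b absent"
        n = 0; for (k in seen) n++
        print "elements after in-test: " n

        # Merely referring to seen["b"] creates it as an empty element.
        if (seen["b"] == "") print "b looks empty"
        n = 0; for (k in seen) n++
        print "elements after comparison: " n

        # Multiple subscripts are tested with a parenthesized list.
        x["A", "B", "C"] = "hello, world"
        if (("A", "B", "C") in x) print "found: " x["A", "B", "C"]

        # Empty the array with the for-in loop and delete.
        for (k in seen) delete seen[k]
        n = 0; for (k in seen) n++
        print "elements after delete: " n
    }'
}
in_demo
```

The element count goes from 1 to 2 only after the comparison, which is why the “in” test is the safer way to ask whether an index exists.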
*/}; struct BEGIN {/* BEGIN and END are two special kinds of patterns which are not tested against the input. The action parts of all BEGIN patterns are merged as if all the statements had been written in a single BEGIN block. They are executed before any of the input is read. Similarly, all the END blocks are merged, and executed when all the input is exhausted (or when an exit statement is executed). BEGIN and END patterns cannot be combined with other patterns in pattern expressions. BEGIN and END patterns cannot have missing action parts. BEGIN {FS = ",[ \t]*|[ \t]+"} sets the field separator to either a comma followed by optional blanks and tabs or one or more blanks and tabs—a common field separator in a real database. END blocks are often used to finish up after all the input has been seen, as in this little program: {out[++n] = $0} END {for (i = n; i >= 1; --i) print out[i]} which accumulates all input records in the array “out”, and then at the end prints out the records in reverse order. */}; struct END {/* BEGIN and END are two special kinds of patterns which are not tested against the input. The action parts of all BEGIN patterns are merged as if all the statements had been written in a single BEGIN block. They are executed before any of the input is read. Similarly, all the END blocks are merged, and executed when all the input is exhausted (or when an exit statement is executed). BEGIN and END patterns cannot be combined with other patterns in pattern expressions. BEGIN and END patterns cannot have missing action parts. BEGIN {FS = ",[ \t]*|[ \t]+"} sets the field separator to either a comma followed by optional blanks and tabs or one or more blanks and tabs—a common field separator in a real database. 
END blocks are often used to finish up after all the input has been seen, as in this little program: {out[++n] = $0} END {for (i = n; i >= 1; --i) print out[i]} which accumulates all input records in the array “out”, and then at the end prints out the records in reverse order. */}; struct regular /*expression */ {/* A regular expression is nothing more than a string of text with optional special “metacharacters”, and in most cases the string to be used can result from the evaluation of a variable, or the concatenation of several strings or variables. This means you can build the regular expressions for your program during the execution of your program, modifying them on the fly to suit changing circumstances. Parts of a regular expression can be grouped (with ordinary parentheses), and later in the regular expression or in a replacement string can be referred to by the group “tags” \1, \2, ... \9 where \1 refers to the group started by the first left parenthesis, \2 to the second, etc. These allow you to match a small pattern within the context of a larger one, detect duplicate expressions, change the order of the groups and so on. Note that parentheses have the highest precedence of all regular expression “operators”, so they serve two purposes; changing the order in which the metacharacters apply, and marking the boundaries of a group, for later reference via \1..\9. More on this in a bit. Regular expressions are built from ordinary characters, the escape sequences \t \n \b \B \w \W \< \> \1 \2 \3 \4 \5 \6 \7 \8 \9 and from the metacharacters \ ^ $ . [ ] | ( ) * + ? which are the ones with the special powers mentioned above. As you saw in the above section, if a regular expression contains no metacharacters then it behaves like an ordinary “find” string in that each character in the regular expression must match a character in the string being searched. 
The following table summarizes all character usage in a regular expression (where a b c are ordinary characters, m is a metacharacter, r is a regular expression, and d is a digit): c matches the non-metacharacter c itself \m matches the literal character m, eg \$ matches the dollar sign. . matches any single character except newline. ^ matches the beginning of a line or a string. $ matches the end of a line or a string. [ abc... ] character class, matches any one of the characters a or b or c etc... . [^ abc... ] negated character class, matches any character except abc... and newline. (Ranges of characters may be abbreviated in character classes, as in [0-9] which matches any digit, [A-Za-z] which matches any letter, [^0-9] which matches anything but a digit) \w matches a “word” character, exactly equivalent to [0-9A-Za-z_] \W matches a non-word character, ie [^0-9A-Za-z_] \< matches the beginning of a word. \> matches the end of a word. \b matches the beginning or end of a word (a word boundary). \B matches the boundary (beginning or end) of a set of non-word characters. \t matches a tab. \n matches a newline (the Return key). r1 | r2 alternation: matches either r1 or r2, eg "blue|green" r1r2 concatenation: matches r1 followed by r2 . r + matches one or more r 's. r * matches zero or more r 's. (Note that zero r’s can be anywhere in the text) r ? matches zero or one r 's. ( r ) grouping: matches r. Parentheses have two distinct uses; to override default precedence of metacharacter operators, and to tag a subexpression for subsequent reference. \1...\9 stand for whatever text the first through ninth set of parentheses currently match, counting opening parentheses from left to right. Note that if the pair of parentheses has a + or * or ? operator after it, then all of the matches are included, eg /(foo)+bar/ applied to "foofoofoobar" will set \1 to "foofoofoo". To get just the first foo, use /(foo)\1*bar/ - then \1 is set to "foo". 
(Perl users note this is the opposite of what you are used to).

\ddd is interpreted as an octal number, as in C. The digits exclude 8 and 9, needless to say, and there can be from 1 to 3 digits in the number. Note that \1 through \7 are interpreted as subexpression tags unless followed immediately by another octal digit (eg \23 is not tag 2 followed by a 3, it is octal 23, which is 19 decimal). \8 and \9 are always tags, since 8 and 9 are not octal digits. To refer to octal numbers 1 to 7, use \01 to \07. To follow a tag with a low number (eg \2 followed by 3), use the octal representation of the number (eg \2\063 -- \063 equals 51 decimal, the ASCII code for 3).

The metacharacters ^ and $ to match the beginning and end of strings, and \b \B \< \> to match various boundaries don’t actually match any characters; rather they force alignment to a particular text position. For example, /\brun\b/ will always match just “run” if it matches anything, but will not match "runner" or "brunt". By comparison, /\Wrun\W/ won’t match “runner” or “brunt” either, but it requires a non–word character on each side of “run” and includes both of them in the match. Normally you won’t want to include leading or trailing spaces etc in the match.

Parentheses () have the highest precedence, allowing you to override default precedence when needed. The “repetition” operators * + ? have the next–highest precedence, followed by concatenation, with alternation having the lowest precedence of all. For example, in abc*d the * applies only to the c since the repetition operator acts before concatenation, and in abd|def the | applies to abd and def since concatenation binds them together into little groups of three before alternation can play.

Regular expressions can be used to just locate an instance of a pattern, as in
    $0 ~ /extern/
but they can also be used to specify text for replacement, by using the “sub” and “gsub” functions.
Looking ahead just a bit, these functions take a regular expression as the first argument, the string to use for replacement as the second argument, and the string to do the search and replace in as the third argument, with $0 used by default if there is no third argument. “sub” does a single substitution on the text, and “gsub” does all possible non-overlapping substitutions. Within the replacement strings of these functions, you can use \1 through \9 to refer to text currently matched by tagged subexpressions, and the ampersand “&” stands for all of the text that was matched. To put a plain ampersand in the replacement, use “\&”. At this point some considerable exampling usually helps: The quick brown matches just that, "The quick brown". Note it would match "The quick brown" in "The quick brownie". red fox\. matches "red fox." (the period must be escaped for a literal match). [ \t] matches a single space or tab (that’s a space before the \). [ \t]+ matches any consecutive run of spaces and tabs in any mix. [0-9]+ matches an integer (read “one or more digits”) [+-]?[0-9]+ matches an integer, together with any preceding sign. [A-Za-z]+ matches an English word (unhyphenated). houses? matches "house" or "houses". m(iss)*ippi matches "mippi", "missippi", "mississippi", "missississippi", etc. ar*g matches "ag", "arg", "arrg", "arrrg", etc. MyFunction$ matches "MyFunction(". array\[index\] matches "array[index]". array\[.+\] matches "array[i]", "array[j]", "array[2*q-1]", etc. \\([0-7]|[0-7][0-7]) matches "\d" or "\dd" where d is an octal digit. ([^\\]?|(\\\$+)" (horrors, be brave) matches an unescaped quote or a quote preceded by an even number of backslashes—in other words a true quote in C. The backslash is a metacharacter, so matching one literally requires a backslash before the backslash. The[ \t]+quick[ \t]+brown matches "The quick brown" with variable spaces and tabs between the words. \/\* matches the start of a C comment, "/ *". 
The forward slash is escaped so that you can place the whole regular expression inside forward slashes. The escape before '/' would not be needed if you placed the expression inside quotes, but then you would need two escapes before the '*', ie "/\\*". \/\*.*\*\/ matches all of a one–line C comment, "/ * - anything - * /". ^Z matches a 'Z' at the beginning of a string. ^. matches the first character of a string. .$ matches the last character of a string. ^.*$ matches any string completely (and is therefore useless). ^A..$ matches any string which is three characters long, the first being an 'A'. ^(A|B).* matches all of any string that begins with 'A' or 'B'. ^[AB].* does likewise. \w+ matches a C term, or integer constant. ((->)|(\.))(mem\b) matches “mem” when it is immediately preceded by “->” or “.”, and is not the beginning of a longer word. For replacement purposes in a “sub” or “gsub”, the part before “mem” is given by \1, and mem itself is \4. gsub(/((->)|(\.))(mem\b)/, "\1\4ber") will turn “->mem” into “->member” and “.mem” into “.member” everywhere in the current input line $0, ignoring things like “remember” or “->memories”. gsub(/\bFuncName([ \t]*\()/, "FunctionName\1") will replace “FuncName” by “FunctionName” everywhere in the current input line $0, provided it is followed on the same line by an opening parenthesis, with optional spaces or tabs between the name and “(”. The match extends from the “F” of “FuncName” up to and including the “(”, so the “(” and any intervening white space must be put back into the replacement string by tagging them in parentheses and using \1 after “FuncName” to refer to what was matched by the first set of parentheses in the pattern. Within a character class most metacharacters are taken literally. The exceptions are the escaping backslash \, the negating ^ (only at the beginning), and the range hyphen - (only between two characters). 
For example, [A-Za-z-] matches an English word, hyphens included [-A-Za-z] does the same [\-A-Za-z] also does the same (the '\' is unnecessary but harmless) ^[^^] matches any single character that is not a '^' at the beginning of a string [\^] matches a '^'. The toughest metacharacter to remember is the '^' which has three meanings: at the beginning of a character class it signals a negated character class; outside of a character class it matches the beginning of a string; and when escaped or not the first character in a character class it matches a literal '^'. Regular expressions are “left greedy”; where there could be more than one match in a string, a regular expression matches the leftmost one, and extends the match as far as possible. Now that we’re starting to get the hang of things, more examples using the replacement functions “sub” and “gsub” mentioned above. The format is sub(r,s,t) where r is a regular expression, s is the replacement string, and t is the string in which the search and replace is to be done. The contents of t before and after the sub are spelled out below. using t = "Don’t run that prune over, runt!": sub(/run/, "fly", t) turns t into "Don’t fly that prune over, runt!" gsub(/run/, "fly", t) turns t into "Don’t fly that pflye over, flyt!" gsub(/\brun\b/, "fly", t) turns t into "Don’t fly that prune over, runt!" gsub(/run/, "t&k", t) turns t into "Don’t trunk that ptrunke over, trunkt!" using t = "#define FOO 1": sub(/#define\W+(\w+)\W+([0-9]+)/, "int \1 = \2;",t) turns t into "int FOO = 1;" (\W+ means one or more non-word characters, \w+ means one or more word characters, [0-9]+ means one or more digits; two groups are tagged). 
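The first few replacements above can be tried outside hAWK with a short sketch. It assumes a POSIX shell and a standard awk, so the hAWK extras \b, \w and the \1..\9 tags are left out because a standard awk need not support them; only sub, gsub and “&” are shown. The wrapper name sub_demo is made up, and the test string drops the apostrophe to keep the shell quoting simple.

```shell
# Sketch only: assumes a POSIX shell with a standard awk on the PATH (not hAWK).
# sub_demo is a hypothetical wrapper name used just for this illustration.
sub_demo() {
    awk 'BEGIN {
        t = "Do not run that prune over, runt!"
        n = sub(/run/, "fly", t)        # leftmost match only
        print n ": " t

        t = "Do not run that prune over, runt!"
        n = gsub(/run/, "fly", t)       # every non-overlapping match
        print n ": " t

        t = "Do not run that prune over, runt!"
        n = gsub(/run/, "t&k", t)       # & stands for the matched text
        print n ": " t
    }'
}
sub_demo
```

Each printed line starts with the value returned by sub or gsub, which is the number of substitutions made.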
Three programs are supplied to help you do general–purpose listing of matches or search–and–replace; $MFSLister searches for either plain text or a regular expression with “Set variables” in the setup dialog, and lists file name/ line number of all single–line matches to stdout; $MFS_SuperLister does much the same, but finds matches that span a variable number of lines; and $MFS_SuperReplace does the ultimate search and replace, matching either plain text or full–blown regular expressions over a variable number of lines, handling any number of files at once, documenting the (post–change) locations of all changes to stdout. Heck, it even prints the fragments of original text before the changes, so that if you mess up you can at least (manually) undo the damage.
*/};
struct patterns {/*
Summary of patterns
A list of beasts in the pattern zoo (regex stands for regular expression, pat stands for pattern, str stands for string variable):
Pattern                        Example
----------------               -------------------------------
BEGIN                          BEGIN blocks are done before all input
END                            END blocks are done after all input
/regex/                        /Mary[ \t]+had/
str ~ /regex/ (or !~)          $1 ~ /(\-)?[0-9]+/
str ~ "regex" (or !~)          $1 ~ "(\\-)?[0-9]+"
relational expression          NF > 4
pattern && pattern             FNR == 1 && /File title:/
pattern || pattern             /Vermont/ || /Maine/
pattern ? pattern : pattern    $3 != 0 ? $2 / $3 > 25 : $2 < 0
( pattern )                    - see next line
! pattern                      !($0 == "" || $0 ~/^The end$/)
pattern1 , pattern2            FNR == 5, FNR == 8
*/};
struct operators {/*
The operators in hAWK, in order of increasing precedence, are:
-------------------------------------------------------------
= += -= *= /= %= ^=
    Assignment. Both absolute assignment (var = value) and operator-assignment (the other forms) are supported. “a += b” is equivalent to “a = a + b”.
?:
    The C conditional expression. This has the form
        expr1 ? expr2 : expr3
    If expr1 is true, the value of the expression is expr2, otherwise it is expr3.
Only one of expr2 and expr3 is evaluated. || logical OR. In “a || b” if a is true then b is not evaluated. && logical AND. In “a && b” if a is false then b is not evaluated. ~ !~ regular expression match, negated match. See “String-matching patterns”. < <= > >= != == the regular relational operators. Note especially that strings can be compared, eg if ($3 == "cat"). In “a <= b” or the like, if both arguments are numbers the comparison is done numerically, otherwise they are compared as ASCII strings. blank string concatenation; if a = "John" and b = "Henry" then c = a b; produces c = "JohnHenry". + - addition and subtraction. * / % multiplication, division, and modulus (x%y produces the remainder of x divided by y, equivalent to x - int(x/y)*y). + - ! unary plus, unary minus, and logical negation. ^ exponentiation. ++ -- increment and decrement, both prefix and postfix. $ field reference. $0 is the entire current record, $1 the first field, and $NF the last field. Fields may be changed or added. */}; struct numeric /*functions */ {/* Built–in numeric functions hAWK has the following pre-defined arithmetic functions, with x and y as arbitrary expressions: atan2( y , x ) returns the arctangent of y/x in radians. cos( x ) returns the cosine of x in radians. exp( x ) the exponential function "e to the x" int( x ) truncates to integer (eg int(7.325) gives 7); to round, use int(x + .5). log( x ) the natural logarithm function, base e. For log base 10, use log(x)/log(10). rand() returns a random number, 0 <= rand() < 1. sin( x ) returns the sine of x in radians. sqrt( x ) the square root function. srand( x ) use x as a new seed for the random number generator. If no x is provided, the time of day will be used. The return value is the previous seed for the random number generator. */}; struct string /*functions*/ {/* Built–in string functions There is only one string operator, the concatenation operator, invoked when two variables or constants are separated by a space. 
Other useful string manipulations in hAWK are carried out by built–in functions. In the following table, r is a regular expression, s and t are strings, a and b are arrays, and i and n are integers.
gsub(r, s, t)
    for each substring matching the regular expression r in the string t, substitutes the string s, and returns the number of substitutions. If t is not supplied, uses $0.
index(s, t)
    returns the index of the string t in the string s, or 0 if t is not present.
length(s)
    returns the length of the string s.
match(s, r)
    returns the position in s where the regular expression r occurs, or 0 if r is not present, and sets the values of RSTART and RLENGTH.
split(s, a, r)
    splits the string s into the array a on the regular expression r, and returns the number of fields. If r is omitted, FS is used instead.
sprintf(fmt, expr-list)
    prints expr-list according to fmt, and returns the resulting string. See the discussion of “printf” for details.
sub(r, s, t)
    this is just like gsub, but only the leftmost matching substring is replaced. Returns number of substitutions.
substr(s, i, n)
    returns the n-character substring of s starting at i. If n is omitted, the rest of s is used.
tolower(s)
    returns a copy of the string s, with all the upper-case characters in s translated to their corresponding lower-case counterparts. Non-alphabetic characters are left unchanged.
toupper(s)
    returns a copy of the string s, with all the lower-case characters in s translated to their corresponding upper-case counterparts. Non-alphabetic characters are left unchanged.
lookup(s)
    returns integer–coded C type of s (s should be a word). At present this function is supported only by EnterAct. Types are taken from whatever EnterAct project is open at the time. See “$LookupTest” for an example.
    Type                         integer returned
    ----                         ----------------
    defined constant or macro    1
    file–scope variable          2
    function                     4
    enum constant                8
    typedef                      16
    struct tag                   32
    union tag                    64
    enum tag                     128
    other                        0
sort(a, b, s)
    produces an index in the array “b” that can be used to access the elements of “a” in sorted order. The string “s” specifies the kind of sort; "a" for ASCII, "n" for numeric, "d" for dictionary order, and "ra", "rn", "rd" for reverse of the same. Returns the number of elements in the array “b”, which is indexed numerically from 1 upwards. The elements of “b” are the indexes of “a” in sorted order provided “b” is accessed in the sequence b[1], b[2], b[3] etc. Typical use is
        maxIndex = sort(a, b, "d")
        for (i = 1; i <= maxIndex; ++i)
            print a[b[i]]
    which will print the elements of a in sorted dictionary order. See “$WordFrequency” and “$XRef_Full” for examples, and “$SortTest_Nums” for a simple numeric example.
time()
    returns the current time, eg "Sunday, October 27, 1991 09:03:30 AM". Note this is the time when the function is called, down to the second, whereas the TIME variable holds the time at which your program run starts, down to the minute. See “$TIME” for an example.
prompt(s)
    displays an OK/Cancel dialog. The string “s” appears at the top of the dialog, and you can type in a string in an edit field. Returns what you type in, as though it were a string constant. Both the string “s” and what you type in are limited to 255 characters. For an example of usage see “$PromptTest” and “$YoungMath”. Typical use is
        x = prompt("Enter the number of lines to print:")
        if (x+0 > 0)
            {
            while (getline lne > 0 && ++i <= x)
                print lne
            }
    If you cancel the dialog or hit <Return> without typing in any text, prompt returns the null string "".
progress(s)
    displays the string “s” in a dialog on your screen (the message stays on the screen). You can change the message with another “progress” call.
“progress” returns the number of times it has been called, and the dialog goes away by itself at the end of your program run. For a test sample, see “$ProgressTest”. Within the replacement string 's' of gsub(r,s,t) and sub(r,s,t), a '&' is taken to stand for the entire string of text that was matched by the regular expression 'r'. For example, gsub(/cat/, "&s", t) with t = "cat and dogs" produces t = "cats and dogs" after the substitution. Use “\&” if you want a literal '&' in the replacement string. --and added for hAWK version 2 (mainly file functions): Note in the functions below where a file or directory name is required it must be a full pathname, of the form “disk:folder1:folder2:...:folderN:filename” for a file, or “disk:folder1:...:folderN” or “disk:folder1:..:folderN:” for a directory (the second version has a colon at the end). For a disk name, use “disk:” rather than “disk”. beep( n ) does a SysBeep(n); if the duration "n" is <= 0, the menu bar will flash instead. Durations of 0,1,2,5 work best. copy( s, t ) copies the file named “s” to the file named “t”. Both file names must be full pathnames (disk:folder:...folder:filename). Either the location or name or both can be changed. If file “t” already exists, it must be closed and unlocked. Both creator and type are preserved, and the resource fork is copied as well as the data fork. Any kind of file can be copied. To move or rename a file, use if (copy(s,t)) remove(s) (note this is an efficient way to move a file, but not a very fast way to rename one). Returns 1 if successful, 0 if the copy could not be done. exists( s ) returns 1 if the file named “s” exists, 0 if it does not. Any kind of file can be tested. fdate( s ) returns date/time of last modification of file named “s”, format “yr:mo:day:hr:min:sec” where yr is 4 digits, and the rest are 2 (eg always 01 rather than just 1). 
The length of the string is always 19 (or 0 if no date could be extracted) and the colons and digits always occupy the same positions. fsize( s ) returns size in bytes of the data fork only of the file named “s” getclip( n ) returns the calling application’s current clipboard text, up to a maximum of the first “n” bytes. Use n = 0 or omit it entirely if you want the entire clipboard. For example, if the current clip is “Some text here” then getclip(6) returns “Some t” whereas getclip(0) or getclip() returns the entire clip. At present this function is supported by: EnterAct. list( s, a ) given file or directory full pathname in “s”, produces list of full pathnames for all TEXT files in the directory (either the directory named or the directory holding the file), as elements indexed 1,2,3... in the array “a”. Note subdirectories are also excluded. Returns the number of files in the list. nested( s, a ) given a file full pathname in “s”, generates list of full pathnames for directories at the same level ("sibling folders"); given directory name, generates list of subdirectories at the top level in the named directory (“offspring directories”). The list is returned as elements indexed 1,2,3... in the array “a”. In other words, the same as “list” but for folders rather than TEXT files. Note neither “list” nor “nested” look beneath the top level of the folder in question. Returns the number of directories in the list. remove( s ) deletes the file named “s”, provided it is closed and unlocked. Use with caution, this is not undoable unless you get lucky using your favourite file recovery tool. Returns 1 if the file was deleted, 0 otherwise. rename( s, t ) takes the file with full pathname “s”, and renames it “t”. 
The new name “t” can be a full pathname, or just the new file name proper, as in
        rename("Disk:dir1:aardvark", "Disk:dir1:fruitbat")
    or equivalently
        rename("Disk:dir1:aardvark", "fruitbat")
    This function works only with files, not directories or volumes, returning 1 if the rename was carried out, 0 if not.
*/};
struct control {/*
In the following list of control statements, any instance of “statement” can be replaced by a group of statements enclosed in curly braces {}:
{ statements }
    Simple grouping of several statements together, so that conditional or repeated execution can be applied to the group.
if (condition) statement1 [ else statement2 ]
    If the condition evaluates to true then statement1 is carried out; the “else” clause is optional, and statement2 is executed if the condition is false.
while (condition) statement
    The condition is first evaluated, and if it is false then the statement is skipped. If it is true then the statement is executed; the condition is again evaluated, and the statement again executed if the condition is true, and this process continues until the condition is false. Note that if the condition is false the first time then the statement will not be executed at all. “while” loops are affected by break and continue statements.
do statement while (condition)
    The statement is always executed at least once; then the condition is evaluated, and if it is true then the statement is executed again. This process continues until the condition is false. Unlike the “while” loop, the “do” loop always executes its statement at least once.
for (expr1; expr2; expr3) statement
    eg “for (i = 1; i <= 6; ++i) {print i}”
    Mnemonically, “for it’s (a jolly good fellow)” helps: the “i” stands for initialization, the “t” for “test”, and the “s” for “step”. expr1 is the initialization, executed only once, just before the “for” loop proper is entered.
    Next expr2, the test, is evaluated, and if it is true then the
    statement is executed, otherwise the for loop ends and control passes
    to the next statement beyond it. If the statement is executed then
    expr3, the step, is carried out, and then it’s back to the top of the
    loop: no more initialization, but the sequence test, execute, step
    continues until the test produces false.

for (var in array) statement
    Indexes for the array are retrieved one–by–one to the variable “var”,
    though not in a readily predictable order, and the statement is
    executed for each index.

break
    For use only among the statements that make up the body of a while,
    do, or for loop. Usually found in the form “if (condition) break;”.
    When the break is executed, control immediately passes to the next
    statement after the loop.

continue
    Also for use only in a while, do, or for loop, and also usually
    executed only when the condition of some if–statement is true. When
    encountered, control passes to the very end of the statements making
    up the body of the loop, and the next iteration of the loop begins.

next
    Stop processing the current input record. The next input record is
    read and processing starts over with the first pattern in the hAWK
    program. If the end of the input data is reached, the END block(s),
    if any, are executed.

exit [ expression ]
    In an END action, exit truly causes the hAWK program to terminate.
    Anywhere else, the exit statement causes the program to jump to the
    END actions, and only if none are present does the program
    immediately terminate. The “expression” is provided for compatibility
    with standard AWK programs, and won’t be of any use to you.
*/};

struct function {/*
Functions in hAWK take the form:

    "function" name(parameter1, parameter2,... local1, local2...)
        { statements }

They are executed when called from within an action statement. hAWK
function definitions begin with the keyword “function”, and no return
type is declared, though a value may optionally be returned.
Local variables are listed after the parameters for the function, more to
simplify the grammar of the language than anything else. Scalar
parameters are passed by value (ie a local copy is made for the function,
and the original variable in the function call is not touched by the
function) whereas array parameters are passed by reference (the parameter
array name refers to the same array that is provided as the argument).
Function definitions must be placed at the top level of your program
outside any pattern–action blocks, and you generally end up with a
readable program if you put all of your function definitions at the end
of your program.

Here’s a typical function:

    function Swap(a, i, j,    temp)
    {
        temp = a[i]
        a[i] = a[j]
        a[j] = temp
    }

When called, it appears for example as

    arr[1] = 7; arr[4] = 9;
    Swap(arr, 1, 4)

which results in arr[1] = 9, arr[4] = 7. Note that the “temp” variable is
intended for use only within the Swap function, and is a local variable
rather than a parameter of the function. Local variables are initialized
to 0/"" each time the function is called.

No space should be put between the function name and the '(' of the
argument list when calling one of your own functions, to avoid invoking
the simple–minded concatenation operator.

Functions may return an expression, as in

    function SumArraySquared(a,    sum, i)
    {
        for (i in a)        #unlike C, array size need not be known separately
            sum += a[i]     #note sum and i are local, sum automatically inited to zero
        return sum*sum
    }

or

    function StringUpTo(str, upto)
    {
        return substr(str, 1, index(str, upto) - 1)
    }

(eg StringUpTo("This is: a test", ":") would return "This is").

Some details about functions:
    Newlines are optional after the left curly brace of the function body
    and before the closing right brace.
    Functions may call each other and may be recursive.
    The word func may be used in place of function. For tired typers only.
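
The by–value and by–reference rules above can be seen in a short sketch.
It is written in ordinary AWK on the assumption that hAWK behaves the same
way here; the names Bump, BumpAll, n, result and arr are made up for
illustration.

```awk
# Scalars are copied into the function; arrays are shared with the caller.
function Bump(x)
{
    x = x + 1            # changes only the function's local copy
    return x
}

function BumpAll(a,    k)    # k is a local variable, not a parameter
{
    for (k in a)
        a[k] = a[k] + 1  # changes the caller's array
}

BEGIN {
    n = 5
    result = Bump(n)     # n is still 5 afterwards, result is 6
    arr[1] = 7; arr[4] = 9
    BumpAll(arr)         # arr[1] is now 8, arr[4] is now 10
    print n, result, arr[1], arr[4]    # prints: 5 6 8 10
}
```

If a function needs to hand back a changed scalar, return it (as Bump
does) and assign the result in the caller.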
*/};

struct print {/*
The “print” statement

“print” sends simply–formatted strings to a file, stdout by default. The
expressions supplied to the print statement are separated from one another
by commas, and may also be entirely surrounded by parentheses. The
variations are

    print
    print expression1, expression2, ..., expressionN
    print (expression1, expression2, ..., expressionN)

A “print” with no expressions is an abbreviation for

    print $0

Each expression is converted to a string and printed in turn, with each
comma being replaced by the built–in variable OFS, by default a single
blank. Each print statement is terminated with the built–in ORS, by
default a newline.

The parenthesized version of “print” is necessary if relational operators
are present in the expressions, since the '>' operator can mean “greater
than” or “redirect output to the file...”; see “Output into files”.

The print statement is used in virtually every sample program provided,
and the more–sophisticated “printf” is seldom seen since fancy formatting
is not often needed.

    print ""    #prints just a blank line
*/};

struct printf {/*
The “printf” statement

This statement also has a parenthesized and unparenthesized form,

    printf format, expression1, expression2, ..., expressionN
    printf(format, expression1, expression2, ..., expressionN)

and, as with “print”, the parentheses are needed only if a relational
operator is contained in one of the expressions.

The “format” argument is interpreted as a string, and may contain either
literal text to be printed or format specifications for strings or numbers
to be printed. Format specs are indicated in the format string by a '%',
and there should be one expression following the format for each format
specification; eg if you specify that a string, a number, and a string be
printed, then you list the string, number, and string after the format, in
the same order, separated by commas.
The hAWK versions of the printf and sprintf functions accept the following
conversion specification formats, entirely borrowed from C:

    %c    an ASCII character. If the argument used for %c is numeric, it
          is treated as a character and printed. Otherwise, the argument
          is assumed to be a string, and only the first character of that
          string is printed.
    %d    a decimal number (the integer part).
    %i    just like %d.
    %e    a floating point number of the form [-]d.ddddddE[+-]dd.
    %f    a floating point number of the form [-]ddd.dddddd.
    %g    use e or f conversion, whichever is shorter, with
          nonsignificant zeros suppressed.
    %o    an unsigned octal number (again, an integer).
    %s    a character string.
    %x    an unsigned hexadecimal number (an integer).
    %X    like %x, but using ABCDEF instead of abcdef.
    %%    a single % character; no argument is converted.

There are optional, additional parameters that may lie between the % and
the control letter (also from C):

    -       the expression should be left justified within its field
            (note if the '-' is absent then the expression is right
            justified)
    width   the field should be padded to this width. If the number has a
            leading zero, then the field will be padded with zeros.
            Otherwise it is padded with blanks.
    .prec   a number indicating the maximum width of strings or digits to
            the right of the decimal point.

For example, %-23.14s prints strings in a field 23 characters wide, left
justified, printing at most 14 characters from the string. And %8.4f will
print a floating point number in a field 8 characters wide, right
justified, with 4 digits to the right of the decimal point.

The dynamic width and prec capabilities of the C library printf routines
are not supported. However, they may be simulated by using the hAWK
concatenation operation to build up a format specification dynamically.
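
To make the justification, width and precision rules above concrete, here
is a sketch using sprintf, which the text says accepts the same formats.
The variable names are invented, the square brackets are only there to
make the padding visible, and the results shown in the comments are what
standard AWK produces; hAWK is assumed to match.

```awk
BEGIN {
    a = sprintf("[%5d]", 42)               # [   42]    right justified
    b = sprintf("[%-5d]", 42)              # [42   ]    left justified
    c = sprintf("[%05d]", 42)              # [00042]    padded with zeros
    d = sprintf("[%8.4f]", 3.14159265)     # [  3.1416] 4 digits after the point
    e = sprintf("[%-8.3s]", "aardvark")    # [aar     ] at most 3 characters
    f = sprintf("[%x %X %o]", 255, 255, 8) # [ff FF 10]
    print a; print b; print c; print d; print e; print f
}
```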

Some examples:

“print var” always appends the value of ORS (by default a newline); to
avoid this, use
    printf("%s ", var)
and when a newline is needed, supply one yourself with something like
    print ""
or
    printf("%s\n", var).

Given strings of variable width in fields $1 and $2, reformat to print
these strings right–justified in two nicely–lined–up columns:

    {
        one[++n] = $1
        two[n] = $2
        if (w1 < length($1)) w1 = length($1)
        if (w2 < length($2)) w2 = length($2)
    }
    END {
        w1 += 2; w2 += 2    #a couple of spaces between columns
        for (i = 1; i <= n; ++i)
            printf "%" w1 "s" "%" w2 "s\n", one[i], two[i]
    }

This illustrates using the hAWK concatenation operation “to build up a
format specification dynamically”; for example, if w1 = 9 and w2 = 15
(after adding 2) then we get
    printf "%9s%15s\n", one[i], two[i]
as the effective printf statement.
*/};

struct redirect {/*
OUTPUT:
By default, “print” and “printf” send all of their output to stdout.
However, the redirection operators '>' and '>>' allow you to send output
to any text file. Redirecting output takes one of the forms

    print expression–list > outfile
    print(expression–list) > outfile
    printf format, expression–list > outfile
    printf(format, expression–list) > outfile
    print > outfile

or any of those with '>>' instead of '>'. The '>' operator will erase the
contents of outfile before beginning to write to it, whereas '>>' will
append what is being printed to outfile without clearing the file first.
Both operators open the file “outfile” the first time it is encountered in
the program, and keep it open. The file will be closed for you at the end
of your program, but if you have many files to write to you should close
each output file yourself when you are done with it, with
“close(outfile)”.

INPUT:
“getline” is a built–in function that allows you to retrieve input records
from the current input file or from any other file.
As you know, the default behaviour of a hAWK program is to retrieve input
from your input files one record at a time, marching through the records
and files from beginning to end. Often, however, one needs to read in a
group of lines until some condition is met, or interrupt regular input to
retrieve records from some other file, and these are the special
capabilities that “getline” provides. It can be used in the following
ways:

    getline                sets $0 from next input record; sets NF, NR, FNR.
    getline < file         sets $0 from next record of file; sets NF.
    getline var            sets var from next input record; sets NR, FNR.
    getline var < file     sets var from next record of file.

and in all cases “getline” returns 1 if a record was successfully
retrieved, 0 if the end of file was encountered, and -1 if some problem
occurred, such as failure to find the file.

The effect of “getline” by itself is to dump the current string in $0 and
replace it with the next input record, setting all the usual built–in
variables. Program execution then continues with the statement following
“getline”. By comparison, the “next” statement does everything that
“getline” by itself does, but in addition processing starts over with the
first pattern in your hAWK program.

If a variable name is present immediately after “getline”, then the input
record is retrieved to the variable instead of to $0.

The '<' symbol is the input redirection operator meaning “get input from
the file...”, and is followed by the name of the input file to use. Note
that file names must be full path names, as is always the case in hAWK.
*/};
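
/*
A sketch of the “getline var < file” form and its three return values,
written in ordinary AWK. The function name CountLines and its parameter
names are made up for illustration, and using close() on a file opened for
input is an assumption carried over from standard AWK; the notes above
mention close() only for output files.

```awk
# Count the records in the file named by "path" without disturbing $0,
# NR or FNR. Returns 0 if the file is empty or cannot be read.
function CountLines(path,    line, n)
{
    n = 0
    # getline returns 1 for a record, 0 at end of file, -1 on error,
    # so test for "> 0" to stop on either end of file or failure
    while ((getline line < path) > 0)
        n++
    close(path)    # assumed to work for input files, as in standard AWK
    return n
}
```

It might be called as CountLines("Disk:dir1:aardvark"), the full pathname
here being only an example. Testing the result with “> 0” matters: a bare
“while (getline line < path)” would loop forever on a missing file, since
-1 counts as true.
*/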