- /* Add this file to an EnterAct project (and update the dictionary)
- for a bit of help with hAWK programming.
- */
- /* This help is general, and defines the following terms:
-   term        what’s in the note
-   ----        --------------------
-   variables   hAWK’s built–in variables
-   array       general discussion
-   in          the “in” operator
-   BEGIN       about BEGIN blocks
-   END         ditto END blocks
-   regular     a discussion of regular expressions,
-               with examples
-   patterns    general discussion
-   operators   table, precedence and definition
-   numeric     (functions) table
-   string      (functions) table
-   control     for while do etc
-   function    how to define one, etc
-   print
-   printf
-   redirect    redirecting input and output
-
- To look up one of the above defined terms, type its name (or click at the end
- of the name) and press the <Enter> key. If you have the AutoLook window open,
- the definition will appear there if you type the name or double–click
- on it (or click at the end of it, or paste it...)
- */
- struct variables
- {/*
- Built–in variables
- hAWK's built-in variables are:
- ARGC the number of input files plus one
- ARGV array of command line arguments. The array is indexed from
- 0 to ARGC - 1, the input file names being ARGV[1] through
- ARGV[ARGC-1]. Dynamically changing the contents of ARGV
- can control the files used for data.
- FILENAME the name of the current input file. If no files are
- specified on the command line, the value of FILENAME is
- "-". A hAWK program may do all of its work in a BEGIN
- block, with no need for input (generating a list of random
- numbers for example).
- FNR the input record number in the current input file. Reset
- to 1 when starting a new input file. Hence the pattern
- “FNR == 1” detects the start of each file.
- FS the input field separator, a blank by default. If the
- default FS is used then leading blanks and tabs are
- trimmed from $1.
- IGNORECASE controls the case-sensitivity of all regular expression
- operations. If IGNORECASE has a non-zero value, then
- pattern matching in rules, field splitting with FS,
- regular expression matching with ~ and !~, and the gsub(),
- index(), match(), split(), and sub() pre-defined
- functions will all ignore case when doing regular
- expression operations. Thus, if IGNORECASE is not equal to
- zero, /aB/ matches all of the strings "ab", "aB", "Ab",
- and "AB". The initial value of IGNORECASE is zero, so all
- regular expression operations are normally case-sensitive.
- NF the number of fields in the current input record.
- NR the total number of input records seen so far.
- OFMT the output format for numbers, %.6g by default.
- OFS the output field separator, a blank by default.
- ORS the output record separator, by default a newline.
- RS the input record separator, by default a newline. RS is
- exceptional in that only the first character of its string
- value is used for separating records. If RS is set to the
- null string, then records are separated by blank lines.
- When RS is set to the null string, then the newline
- character always acts as a field separator, in addition to
- whatever value FS may have.
- RSTART the index of the first character matched by match(); 0 if no match.
- RLENGTH the length of the string matched by match(); -1 if no match.
- SUBSEP the character used to separate multiple subscripts in array
- elements, by default "\034" (decimal 28), shown as an up arrow and very rare in text.
- (and three added for the Macintosh version)
- RUNERR short for "run error", a file name that you can use to print
- your own error messages to, as in
- print "Error during run" > RUNERR.
- Default name is $tempRunErr, and you'll find the file
- in the same folder as $tempStdOut.
- STDPATH path name that can be prefixed to any file name you wish to be
- written to the same folder as stdout ($tempStdOut). Typically
- looks like
- "Disk:folder1...:folderN:" and typical use looks like
- outname = "MyOutFile"
- fullOutName = STDPATH outname;
- print "something" > fullOutName;
- TIME at start of run, eg "Sunday, October 13, 1991 07:58 AM"
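- As a rough illustration of ARGC and ARGV, here is a minimal sketch that lists the
- input file names from a BEGIN block, before any input is read. It assumes the usual
- awk function syntax; the name listInputs is made up for this note and is not a
- hAWK built-in.

```awk
# Prints ARGV[1] through ARGV[ARGC-1], one per line, and returns how many there were.
# listInputs is a hypothetical helper, not a hAWK built-in.
function listInputs(    i) {
    for (i = 1; i < ARGC; ++i)
        print i ": " ARGV[i]
    return ARGC - 1
}

BEGIN {
    n = listInputs()
    print n " input file(s) named"
}
```

- The loop starts at 1 because ARGV[0] is not an input file name.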
- */};
-
- struct array
- {/*
- Arrays are subscripted with an expression between square brackets,
- arr[expr]. Array values can be numbers or strings, but the index is
- always interpreted as a string. For example, when you write
- arr[1]
- the 1 is converted to the string "1" for use as the array index, so arr[1]
- is the same as arr["1"]. This sort of array is called “associative” since
- it can associate one string of text with any other, eg
- arr["John Henry"] = "was a log-drivin man"
-
- If the index expression is an expression list (expr1, expr2, ...)
- then the array subscript is a string consisting of the concatenation of
- the (string) value of each expression, separated by the value of the
- SUBSEP variable, which is by default “\034” (decimal 28, an up arrow).
- This facility is used to simulate multiply–dimensioned arrays. For
- example:
- i = "A"; j = "B"; k = "C"
- x[i, j, k] = "hello, world"
- assigns the string "hello, world" to the element of the array x
- which is indexed by the string "A\034B\034C".
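- A combined subscript can be taken apart again by splitting it on SUBSEP. The
- following is only a sketch of that idea; splitKey and the variable names are
- invented for the example.

```awk
# Splits a combined subscript into parts[1], parts[2], ... and returns the count.
# splitKey is a hypothetical helper name.
function splitKey(key, parts) {
    return split(key, parts, SUBSEP)
}

BEGIN {
    x["A", "B", "C"] = "hello, world"
    for (key in x) {
        n = splitKey(key, parts)
        print n, parts[1], parts[2], parts[3], x[key]   # 3 A B C hello, world
    }
}
```

- This only works cleanly if the subscripts themselves never contain the SUBSEP
- character, which is why such a rare character is the default.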
- */};
-
- struct in
- {/*
- The special operator "in" may be used in an if statement to see if an
- array has an index consisting of a particular value:
- if (val in array)
- print array[val]
- If the array has multiple subscripts i, j, k, use
- if ((i, j, k) in array)
- instead. The alternative
- if (array[val] != "")
- actually creates the element array[val] if it does not exist, so
- using “in” is usually better.
-
- The "in" construct may also be used in a for loop to iterate over all the
- elements of an array:
- for (i in arr)
- delete arr[i]
- An element may be deleted from an array using the delete statement. New
- elements should not be added to an array while looping over it with the
- "in" for-loop, since hAWK isn’t quite smart enough to handle that very
- well.
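- The difference between testing with “in” and testing the element itself can be
- seen in a short sketch. The helper names hasKey and count are made up for this
- example.

```awk
# Returns 1 if key is an index of arr, without creating arr[key].
function hasKey(arr, key) {
    return (key in arr) ? 1 : 0
}

# Counts the indexes of arr with a for-in loop.
function count(arr,    k, n) {
    n = 0
    for (k in arr) ++n
    return n
}

BEGIN {
    seen["apple"] = 1
    print hasKey(seen, "pear"), count(seen)    # 0 1: "in" leaves the array alone
    if (seen["pear"] != "") print "never printed"
    print hasKey(seen, "pear"), count(seen)    # 1 2: the comparison created seen["pear"]
}
```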
- */};
-
- struct BEGIN
- {/*
- BEGIN and END are two special kinds of patterns which are not tested
- against the input. The action parts of all BEGIN patterns are merged as if
- all the statements had been written in a single BEGIN block. They are
- executed before any of the input is read. Similarly, all the END blocks
- are merged, and executed when all the input is exhausted (or when an exit
- statement is executed). BEGIN and END patterns cannot be combined with
- other patterns in pattern expressions. BEGIN and END patterns cannot have
- missing action parts.
-
- BEGIN {FS = ",[ \t]*|[ \t]+"}
- sets the field separator to either a comma followed by optional blanks and
- tabs or one or more blanks and tabs—a common field separator in a real
- database.
-
- END blocks are often used to finish up after all the input has been seen,
- as in this little program:
- {out[++n] = $0}
- END {for (i = n; i >= 1; --i) print out[i]}
- which accumulates all input records in the array “out”, and then at the
- end prints out the records in reverse order.
- */};
-
- struct END
- {/*
- BEGIN and END are two special kinds of patterns which are not tested
- against the input. The action parts of all BEGIN patterns are merged as if
- all the statements had been written in a single BEGIN block. They are
- executed before any of the input is read. Similarly, all the END blocks
- are merged, and executed when all the input is exhausted (or when an exit
- statement is executed). BEGIN and END patterns cannot be combined with
- other patterns in pattern expressions. BEGIN and END patterns cannot have
- missing action parts.
-
- BEGIN {FS = ",[ \t]*|[ \t]+"}
- sets the field separator to either a comma followed by optional blanks and
- tabs or one or more blanks and tabs—a common field separator in a real
- database.
-
- END blocks are often used to finish up after all the input has been seen,
- as in this little program:
- {out[++n] = $0}
- END {for (i = n; i >= 1; --i) print out[i]}
- which accumulates all input records in the array “out”, and then at the
- end prints out the records in reverse order.
- */};
-
- struct regular /*expression */
- {/*
- A regular expression is nothing more than a string of text with optional
- special “metacharacters”, and in most cases the string to be used can
- result from the evaluation of a variable, or the concatenation of several
- strings or variables. This means you can build the regular expressions for
- your program during the execution of your program, modifying them on the
- fly to suit changing circumstances.
-
- Parts of a regular expression can be grouped (with ordinary parentheses),
- and later in the regular expression or in a replacement string can be
- referred to by the group “tags” \1, \2, ... \9 where \1 refers to the
- group started by the first left parenthesis, \2 to the second, etc. These
- allow you to match a small pattern within the context of a larger one,
- detect duplicate expressions, change the order of the groups and so on.
- Note that parentheses have the highest precedence of all regular
- expression “operators”, so they serve two purposes; changing the order in
- which the metacharacters apply, and marking the boundaries of a group, for
- later reference via \1..\9. More on this in a bit.
-
- Regular expressions are built from ordinary characters, the escape
- sequences
- \t \n \b \B \w \W \< \> \1 \2 \3 \4 \5 \6 \7 \8 \9
- and from the metacharacters
- \ ^ $ . [ ] | ( ) * + ?
- which are the ones with the special powers mentioned above. As you saw in
- the above section, if a regular expression contains no metacharacters then
- it behaves like an ordinary “find” string in that each character in the
- regular expression must match a character in the string being searched.
- The following table summarizes all character usage in a regular expression
- (where a b c are ordinary characters, m is a metacharacter, r is a regular
- expression, and d is a digit):
-
- c matches the non-metacharacter c itself
- \m matches the literal character m, eg \$ matches the dollar sign.
- . matches any single character except newline.
- ^ matches the beginning of a line or a string.
- $ matches the end of a line or a string.
- [ abc... ] character class, matches any one of the characters a or b or c etc... .
- [^ abc... ] negated character class, matches any character except abc...
- and newline. (Ranges of characters may be abbreviated in
- character classes, as in [0-9] which matches any digit,
- [A-Za-z] which matches any letter, [^0-9] which matches
- anything but a digit)
- \w matches a “word” character, exactly equivalent to [0-9A-Za-z_]
- \W matches a non-word character, ie [^0-9A-Za-z_]
- \< matches the beginning of a word.
- \> matches the end of a word.
- \b matches the beginning or end of a word (a word boundary).
- \B matches the boundary (beginning or end) of a set
- of non-word characters.
- \t matches a tab.
- \n matches a newline (the Return key).
- r1 | r2 alternation: matches either r1 or r2, eg "blue|green"
- r1r2 concatenation: matches r1 followed by r2 .
- r + matches one or more r 's.
- r * matches zero or more r 's. (Note that zero r’s can
- be anywhere in the text)
- r ? matches zero or one r 's.
- ( r ) grouping: matches r. Parentheses have two distinct uses;
- to override default precedence of metacharacter operators, and
- to tag a subexpression for subsequent reference.
- \1...\9 stand for whatever text the first through ninth set of
- parentheses currently match, counting opening parentheses from
- left to right. Note that if the pair of parentheses has a + or
- * or ? operator after it, then all of the matches are
- included, eg /(foo)+bar/ applied to "foofoofoobar" will set \1
- to "foofoofoo". To get just the first foo, use /(foo)\1*bar/ -
- then \1 is set to "foo". (Perl users note this is the opposite
- of what you are used to).
- \ddd is interpreted as an octal number, as in C. The digits
- exclude 8 and 9, needless to say, and there can be from 1 to 3
- digits in the number. Note that \1 through \7 are interpreted
- as subexpression tags unless followed immediately by another
- octal digit (eg \23 is not tag 2 followed by a 3, it is
- octal 23, which is 19 decimal). \8 and \9 are always tags, since 8
- and 9 are not octal digits. To refer to octal numbers 1 to 7,
- use \01 to \07. To follow a tag with a low number (eg \2
- followed by 3), use the octal representation of the number (eg
- \2\063 -- \063 equals 51 decimal, the ASCII code for 3).
-
- The metacharacters ^ and $ to match the beginning and end of strings, and
- \b \B \< \> to match various boundaries don’t actually match any
- characters; rather they force alignment to a particular text position. For
- example, /\brun\b/ will always match just “run” if it matches anything,
- but will not match "runner" or "brunt". By comparison, /\Wrun\W/ won’t
- match “runner” or “brunt” either, but it will include any non–word
- character that happens to come before or after the word “run”. Normally
- you won’t want to include leading or trailing spaces etc in the match.
-
- Parentheses () have the highest precedence, allowing you to override
- default precedence when needed. The “repetition” operators * + ? have the
- next–highest precedence, followed by concatenation, with alternation
- having the lowest precedence of all. For example, in abc*d the * applies
- only to the c since the repetition operator acts before concatenation, and
- in abd|def the | applies to abd and def since concatenation binds them
- together into little groups of three before alternation can play.
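- These precedence rules can be checked with a few matches. This is a sketch only:
- the helper name matches is invented, and the patterns are anchored with ^ and $
- so that the whole string has to match.

```awk
# Returns 1 if s matches the regular expression held in the string re, else 0.
function matches(s, re) {
    return (s ~ re) ? 1 : 0
}

BEGIN {
    print matches("abcccd", "^abc*d$")       # 1: the * repeats only the c
    print matches("abcabcd", "^abc*d$")      # 0: abc as a whole is not repeated
    print matches("abcabcd", "^(abc)*d$")    # 1: parentheses make * apply to abc
    print matches("def", "^(abd|def)$")      # 1: either three-letter group
    print matches("abdef", "^(abd|def)$")    # 0: the | does not split at the d alone
}
```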
-
- Regular expressions can be used to just locate an instance of a
- pattern, as in
- $0 ~ /extern/
- but they can also be used to specify text for replacement, by using the
- “sub” and “gsub” functions. Looking ahead just a bit, these functions take
- a regular expression as the first argument, the string to use for
- replacement as the second argument, and the string to do the search and
- replace in as the third argument, with $0 used by default if there is no
- third argument. “sub” does a single substitution on the text, and “gsub”
- does all possible non-overlapping substitutions. Within the replacement
- strings of these functions, you can use \1 through \9 to refer to text
- currently matched by tagged subexpressions, and the ampersand “&” stands
- for all of the text that was matched. To put a plain ampersand in the
- replacement, use “\&”.
-
- At this point some considerable exampling usually helps:
- The quick brown matches just that, "The quick brown". Note it
- would match "The quick brown" in "The quick brownie".
- red fox\. matches "red fox." (the period must be
- escaped for a literal match).
- [ \t] matches a single space or tab
- (that’s a space before the \).
- [ \t]+ matches any consecutive run of spaces and tabs
- in any mix.
- [0-9]+ matches an integer (read “one or more digits”)
- [+-]?[0-9]+ matches an integer, together with any preceding sign.
- [A-Za-z]+ matches an English word (unhyphenated).
- houses? matches "house" or "houses".
- m(iss)*ippi matches "mippi", "missippi", "mississippi",
- "missississippi", etc.
- ar*g matches "ag", "arg", "arrg", "arrrg", etc.
- MyFunction\( matches "MyFunction(".
- array\[index\] matches "array[index]".
- array\[.+\] matches "array[i]", "array[j]", "array[2*q-1]", etc.
- \\([0-7]|[0-7][0-7]) matches "\d" or "\dd" where d is an octal digit.
- ([^\\]?|(\\\\)+)" (horrors, be brave) matches an unescaped quote or
- a quote preceded by an even number of
- backslashes—in other words a true quote in C. The
- backslash is a metacharacter, so matching one
- literally requires a backslash before the
- backslash.
- The[ \t]+quick[ \t]+brown matches "The quick brown" with variable
- spaces and tabs between the words.
- \/\* matches the start of a C comment, "/ *". The
- forward slash is escaped so that you can place the
- whole regular expression inside forward slashes.
- The escape before '/' would not be needed if you
- placed the expression inside quotes, but then you
- would need two escapes before the '*', ie "/\\*".
- \/\*.*\*\/ matches all of a one–line C comment,
- "/ * - anything - * /".
- ^Z matches a 'Z' at the beginning of a string.
- ^. matches the first character of a string.
- .$ matches the last character of a string.
- ^.*$ matches any string completely (and is therefore useless).
- ^A..$ matches any string which is three characters long,
- the first being an 'A'.
- ^(A|B).* matches all of any string that begins with 'A' or 'B'.
- ^[AB].* does likewise.
- \w+ matches a C term, or integer constant.
- ((->)|(\.))(mem\b) matches “mem” when it is immediately preceded by “->”
- or “.”, and is not the beginning of a longer word.
- For replacement purposes in a “sub” or “gsub”, the
- part before “mem” is given by \1, and mem itself
- is \4.
- gsub(/((->)|(\.))(mem\b)/, "\1\4ber") will turn “->mem” into “->member”
- and “.mem” into “.member” everywhere in the
- current input line $0, ignoring things like
- “remember” or “->memories”.
- gsub(/\bFuncName([ \t]*\()/, "FunctionName\1") will replace “FuncName” by
- “FunctionName” everywhere in the current input
- line $0, provided it is followed on the same line
- by an opening parenthesis, with optional spaces or
- tabs between the name and “(”. The match extends
- from the “F” of “FuncName” up to and including the
- “(”, so the “(” and any intervening white space
- must be put back into the replacement string by
- tagging them in parentheses and using \1 after
- “FunctionName” to refer to what was matched by the
- first set of parentheses in the pattern.
-
-
- Within a character class most metacharacters are taken literally. The
- exceptions are the escaping backslash \, the negating ^ (only at the
- beginning), and the range hyphen - (only between two characters). For
- example,
- [A-Za-z-]+ matches an English word, hyphens included
- [-A-Za-z]+ does the same
- [\-A-Za-z]+ also does the same (the '\' is unnecessary but harmless)
- ^[^^] matches any single character that is not a '^' at
- the beginning of a string
- [\^] matches a '^'.
- The toughest metacharacter to remember is the '^' which has three
- meanings: at the beginning of a character class it signals a negated
- character class; outside of a character class it matches the beginning of
- a string; and when escaped or not the first character in a character class
- it matches a literal '^'.
-
- Regular expressions are “left greedy”; where there could be more than one
- match in a string, a regular expression matches the leftmost one, and
- extends the match as far as possible.
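- One way to see “leftmost, then longest” in action is with match(), which sets
- RSTART and RLENGTH. The sketch below uses an invented helper, firstMatch, to pull
- out the matched text.

```awk
# Returns the text of the leftmost, longest match of re in s ("" if none).
# firstMatch is a hypothetical helper name.
function firstMatch(s, re) {
    if (match(s, re)) return substr(s, RSTART, RLENGTH)
    return ""
}

BEGIN {
    print firstMatch("xaay aaaa", "a+")            # aa: leftmost wins over the longer run
    print firstMatch("a <b> and <c>", "<.*>")      # <b> and <c>: extended as far as possible
    print firstMatch("a <b> and <c>", "<[^>]*>")   # <b>: the class stops at the first >
}
```

- The last pattern shows the usual cure when a match runs on too far: exclude
- the closing character from what may be repeated.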
-
- Now that we’re starting to get the hang of things, more examples using the
- replacement functions “sub” and “gsub” mentioned above. The format is
- sub(r,s,t) where r is a regular expression, s is the replacement string,
- and t is the string in which the search and replace is to be done. The
- contents of t before and after the sub are spelled out below.
-
- using t = "Don’t run that prune over, runt!":
- sub(/run/, "fly", t) turns t into "Don’t fly that prune over, runt!"
- gsub(/run/, "fly", t) turns t into "Don’t fly that pflye over, flyt!"
- gsub(/\brun\b/, "fly", t) turns t into "Don’t fly that prune over, runt!"
- gsub(/run/, "t&k", t) turns t into "Don’t trunk that ptrunke over, trunkt!"
- using t = "#define FOO 1":
- sub(/#define\W+(\w+)\W+([0-9]+)/, "int \1 = \2;",t) turns t into
- "int FOO = 1;" (\W+ means one or more non-word characters, \w+
- means one or more word characters, [0-9]+ means one or more digits;
- two groups are tagged).
-
- Three programs are supplied to help you do general–purpose listing of
- matches or search–and–replace; $MFSLister searches for either plain text
- or a regular expression with “Set variables” in the setup dialog, and
- lists file name/ line number of all single–line matches to stdout;
- $MFS_SuperLister does much the same, but finds matches that span a
- variable number of lines; and $MFS_SuperReplace does the ultimate search
- and replace, matching either plain text or full–blown regular expressions
- over a variable number of lines, handling any number of files at once,
- documenting the (post–change) locations of all changes to stdout. Heck, it
- even prints the fragments of original text before the changes, so that if
- you mess up you can at least (manually) undo the damage.
- */};
-
- struct patterns
- {/*
- Summary of patterns
- A list of beasts in the pattern zoo (regex stands for regular expression,
- pat stands for pattern, str stands for string variable):
-   Pattern                         Example
-   ----------------                -------------------------------
-   BEGIN                           BEGIN blocks are done before all input
-   END                             END blocks are done after all input
-   /regex/                         /Mary[ \t]+had/
-   str ~ /regex/ (or !~)           $1 ~ /(\-)?[0-9]+/
-   str ~ "regex" (or !~)           $1 ~ "(\\-)?[0-9]+"
-   relational expression           NF > 4
-   pattern && pattern              FNR == 1 && /File title:/
-   pattern || pattern              /Vermont/ || /Maine/
-   pattern ? pattern : pattern     $3 != 0 ? $2 / $3 > 25 : $2 < 0
-   ( pattern )                     - see next line
-   ! pattern                       !($0 == "" || $0 ~ /^The end$/)
-   pattern1 , pattern2             FNR == 5, FNR == 8
- */};
-
- struct operators
- {/*
- The operators in hAWK, in order of increasing precedence, are:
- -------------------------------------------------------------
- = += -= *= /= %= ^=
- Assignment. Both absolute assignment (var = value) and
- operator-assignment (the other forms) are supported. “a += b” is
- equivalent to “a = a + b”.
-
- ?: The C conditional expression. This has the form
- expr1 ? expr2 : expr3
- If expr1 is true, the value of the expression is expr2, otherwise it is
- expr3. Only one of expr2 and expr3 is evaluated.
-
- || logical OR. In “a || b” if a is true then b is not evaluated.
-
- && logical AND. In “a && b” if a is false then b is not evaluated.
-
- ~ !~ regular expression match, negated match. See “String-matching patterns”.
-
- < <= > >= != ==
- the regular relational operators. Note especially that strings
- can be compared, eg if ($3 == "cat"). In “a <= b” or the like,
- if both arguments are numbers the comparison is done
- numerically, otherwise they are compared as ASCII strings.
-
- blank string concatenation; if a = "John" and b = "Henry" then
- c = a b; produces c = "JohnHenry".
-
- + - addition and subtraction.
-
- * / % multiplication, division, and modulus (x%y produces the
- remainder of x divided by y, equivalent to x - int(x/y)*y).
-
- + - ! unary plus, unary minus, and logical negation.
-
- ^ exponentiation.
-
- ++ -- increment and decrement, both prefix and postfix.
-
- $ field reference. $0 is the entire current record, $1 the first
- field, and $NF the last field. Fields may be changed or added.
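- A few of the precedence rules in this table, sketched as a small program. The
- function rem is a made-up name that spells out the remainder formula quoted
- above; results are assigned to variables first so that no “>” appears in a
- print statement, where it could be read as a redirection.

```awk
# Remainder written out longhand; the table gives this as equivalent to x % y.
function rem(x, y) {
    return x - int(x / y) * y
}

BEGIN {
    a = 2 + 3 * 4;     print a     # 14: * before +
    b = -2 ^ 2;        print b     # -4: ^ before unary minus
    c = 1 " " 2 + 3;   print c     # 1 5: + before concatenation
    d = (10 < 9);      print d     # 0: both operands are numbers
    e = ("10" < "9");  print e     # 1: compared as strings
    print 7 % 3, rem(7, 3)         # 1 1
}
```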
- */};
-
- struct numeric /*functions */
- {/*
- Built–in numeric functions
- hAWK has the following pre-defined arithmetic functions, with x and y as
- arbitrary expressions:
- atan2( y , x ) returns the arctangent of y/x in radians.
- cos( x ) returns the cosine of x, where x is in radians.
- exp( x ) the exponential function "e to the x"
- int( x ) truncates to integer (eg int(7.325) gives 7); to round a
- non-negative x, use int(x + .5).
- log( x ) the natural logarithm function, base e. For log base 10, use
- log(x)/log(10).
- rand() returns a random number, 0 <= rand() < 1.
- sin( x ) returns the sine of x, where x is in radians.
- sqrt( x ) the square root function.
- srand( x ) use x as a new seed for the random number generator.
- If no x is provided, the time of day will be used. The
- return value is the previous seed for the random
- number generator.
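- The rounding and log-base-10 recipes above can be wrapped up as small functions.
- This is a sketch under the assumption that user-defined functions behave as in
- ordinary awk; round, logTen and randint are invented names, not built-ins.

```awk
function round(x)   { return int(x + 0.5) }           # rounds a non-negative x
function logTen(x)  { return log(x) / log(10) }       # log base 10
function randint(n) { return 1 + int(rand() * n) }    # integer from 1 to n

BEGIN {
    pi = atan2(0, -1)
    print round(7.325), round(7.5)    # 7 8
    print logTen(1000)                # 3
    print sin(pi / 6)                 # 0.5
    srand(1)
    print randint(6)                  # some integer from 1 to 6
}
```

- Since rand() is always less than 1, int(rand() * n) runs from 0 to n-1, and
- adding 1 shifts that to 1 through n.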
- */};
-
- struct string /*functions*/
- {/*
- Built–in string functions
- There is only one string operator, the concatenation operator, invoked
- when two variables or constants are separated by a space. Other useful
- string manipulations in hAWK are carried out by built–in functions. In
- the following table, r is a regular expression, s and t are strings, a
- and b are arrays, and i and n are integers.
-
- gsub(r, s, t) for each substring matching the regular expression r
- in the string t, substitutes the string s, and
- returns the number of substitutions. If t is not
- supplied, uses $0.
- index(s, t) returns the index of the string t in the string s,
- or 0 if t is not present.
- length(s) returns the length of the string s.
- match(s, r) returns the position in s where the regular expression r
- occurs, or 0 if r is not present, and sets the values of
- RSTART and RLENGTH.
- split(s, a, r) splits the string s into the array a on the regular
- expression r, and returns the number of fields. If r is
- omitted, FS is used instead.
- sprintf(fmt, expr-list) formats expr-list according to fmt, and returns the
- resulting string. See the discussion of “printf” for details.
- sub(r, s, t) this is just like gsub, but only the leftmost matching
- substring is replaced. Returns the number of substitutions.
- substr(s, i, n) returns the n-character substring of s starting at i. If n
- is omitted, the rest of s is used.
- tolower( s ) returns a copy of the string s , with all the upper-case
- characters in s translated to their corresponding
- lower-case counterparts. Non-alphabetic characters are
- left unchanged.
- toupper( s ) returns a copy of the string s , with all the lower-case
- characters in s translated to their corresponding
- upper-case counterparts. Non-alphabetic characters are
- left unchanged.
- lookup ( s ) returns integer–coded C type of s (s should be a word).
- At present this function is supported only by EnterAct.
- Types are taken from whatever EnterAct project is open
- at the time. See “$LookupTest” for an example.
-   Type                         integer returned
-   ----                         ----------------
-   defined constant or macro    1
-   file–scope variable          2
-   function                     4
-   enum constant                8
-   typedef                      16
-   struct tag                   32
-   union tag                    64
-   enum tag                     128
-   other                        0
- sort(a,b,s) produces an index in the array “b” that can be used to
- access the elements of “a” in sorted order. The string
- “s” specifies the kind of sort; "a" for ASCII, "n" for
- numeric, "d" for dictionary order, and "ra", "rn",
- "rd" for reverse of the same. Returns the number of
- elements in the array “b”, which is indexed
- numerically from 1 upwards. The elements of “b” are
- the indexes of “a” in sorted order provided “b” is
- accessed in the sequence b[1], b[2], b[3] etc. Typical
- use is
- maxIndex = sort(a, b, "d")
- for (i = 1; i <= maxIndex; ++i)
- print a[b[i]]
- which will print the elements of a in sorted
- dictionary order. See “$WordFrequency” and
- “$XRef_Full” for examples, and “$SortTest_Nums” for a
- simple numeric example.
- time ( ) returns the current time, eg
- "Sunday, October 27, 1991 09:03:30 AM"
- —note this is the time when the function
- is called, down to the second, whereas the TIME
- variable holds the time at which your program run
- starts, down to the minute. See “$TIME” for an example.
- prompt ( s ) displays an OK/Cancel dialog. The string “s” appears
- at the top of the dialog, and you can type in a string
- in an edit field. Returns what you type in, as though
- it was a string constant. Both the string “s” and what
- you type in are limited to 255 characters. For an
- example of usage see “$PromptTest” and “$YoungMath”.
- Typical use is
- x = prompt("Enter the number of lines to print:")
- if (x+0 > 0) {
- while (getline lne > 0 && ++i <= x) print lne }
- If you cancel the dialog or hit <Return> without
- typing in any text, prompt returns the null string "".
- progress (s) displays the string “s” in a dialog on your screen
- (the message stays on the screen). You can change the
- message with another “progress” call. “progress”
- returns the number of times it has been called, and
- the dialog goes away by itself at the end of your
- program run. For a test sample, see “$ProgressTest”.
- Within the replacement string 's' of gsub(r,s,t) and sub(r,s,t), a '&' is
- taken to stand for the entire string of text that was matched by the
- regular expression 'r'. For example, gsub(/cat/, "&s", t) with t = "cat
- and dogs" produces t = "cats and dogs" after the substitution. Use “\&” if
- you want a literal '&' in the replacement string.
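- Several of the standard string functions from the table, used together in one
- small sketch. The helper name before and the sample path are invented for the
- example.

```awk
# Returns the part of s before the first occurrence of sep (all of s if none).
# before is a hypothetical helper name.
function before(s, sep,    i) {
    i = index(s, sep)
    return (i > 0) ? substr(s, 1, i - 1) : s
}

BEGIN {
    path = "Disk:folder1:folder2:notes"      # made-up sample path
    n = split(path, part, ":")
    print n, part[1], part[n]                # 4 Disk notes
    print before(path, ":")                  # Disk
    print toupper(substr(path, 1, 4))        # DISK
    t = "cat and dogs"
    n = gsub(/cat/, "&s", t)                 # & stands for the matched text
    print n, t                               # 1 cats and dogs
}
```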
-
- --and added for hAWK version 2 (mainly file functions):
- Note in the functions below where a file or directory name is required it must
- be a full pathname, of the form “disk:folder1:folder2:...:folderN:filename”
- for a file, or “disk:folder1:...:folderN” or “disk:folder1:..:folderN:”
- for a directory (the second version has a colon at the end). For a disk name,
- use “disk:” rather than “disk”.
- beep( n ) does a SysBeep(n); if the duration "n" is <= 0, the menu bar will
- flash instead. Durations of 0,1,2,5 work best.
- copy( s, t ) copies the file named “s” to the file named “t”. Both file names
- must be full pathnames (disk:folder:...folder:filename). Either
- the location or name or both can be changed. If file “t” already
- exists, it must be closed and unlocked. Both creator and type are
- preserved, and the resource fork is copied as well as the data
- fork. Any kind of file can be copied. To move or rename a file, use
- if (copy(s,t)) remove(s)
- (note this is an efficient way to move a file, but not a very fast
- way to rename one).
- Returns 1 if successful, 0 if the copy could not be done.
- exists( s ) returns 1 if the file named “s” exists, 0 if it does not. Any kind
- of file can be tested.
- fdate( s ) returns date/time of last modification of file named “s”, format
- “yr:mo:day:hr:min:sec” where yr is 4 digits, and the rest are 2
- (eg always 01 rather than just 1). The length of the string is
- always 19 (or 0 if no date could be extracted) and the colons
- and digits always occupy the same positions.
- fsize( s ) returns size in bytes of the data fork only of the file named “s”
- getclip( n ) returns the calling application’s current clipboard text, up to
- a maximum of the first “n” bytes. Use n = 0 or omit it entirely
- if you want the entire clipboard. For example, if the current
- clip is “Some text here” then getclip(6) returns “Some t”
- whereas getclip(0) or getclip() returns the entire clip. At
- present this function is supported by: EnterAct.
- list( s, a ) given file or directory full pathname in “s”, produces list of
- full pathnames for all TEXT files in the directory (either the
- directory named or the directory holding the file), as elements
- indexed 1,2,3... in the array “a”. Note subdirectories are also
- excluded. Returns the number of files in the list.
- nested( s, a ) given a file full pathname in “s”, generates list of full pathnames
- for directories at the same level ("sibling folders"); given directory
- name, generates list of subdirectories at the top level in the named
- directory (“offspring directories”). The list is returned as elements
- indexed 1,2,3... in the array “a”. In other words, the same as
- “list” but for folders rather than TEXT files. Note neither “list”
- nor “nested” look beneath the top level of the folder in question.
- Returns the number of directories in the list.
- remove( s ) deletes the file named “s”, provided it is closed and unlocked. Use
- with caution, this is not undoable unless you get lucky using your
- favourite file recovery tool. Returns 1 if the file was deleted,
- 0 otherwise.
- rename( s, t ) takes the file with full pathname “s”, and renames it “t”. The
- new name “t” can be a full pathname, or just the new file name
- proper, as in
- rename("Disk:dir1:aardvark", "Disk:dir1:fruitbat")
- or equivalently
- rename("Disk:dir1:aardvark", "fruitbat")
- This function works only with files, not directories or volumes,
- returning 1 if the rename was carried out, 0 if not.
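- Because fdate() always returns a 19-character string with the colons and digits
- in fixed positions, its fields can be picked out with substr() and two stamps
- can be compared as plain strings. The sketch below works on made-up sample
- stamps rather than calling fdate() itself; stampPart and newer are invented
- names.

```awk
# Returns one field of an fdate()-style stamp, or "" if the stamp is not 19 characters.
# which: 1 = year, 2 = month, 3 = day, 4 = hour, 5 = minute, 6 = second
function stampPart(stamp, which) {
    if (length(stamp) != 19) return ""
    if (which == 1) return substr(stamp, 1, 4)
    return substr(stamp, 3 * which, 2)
}

# Returns 1 if stamp a is later than stamp b. Plain string comparison is enough
# because every field has a fixed width and the year comes first.
function newer(a, b) {
    return ((a "") > (b "")) ? 1 : 0
}

BEGIN {
    older  = "1991:10:13:07:58:00"     # made-up sample stamps
    recent = "1991:10:27:09:03:30"
    print stampPart(recent, 1), stampPart(recent, 2) + 0, stampPart(recent, 6)   # 1991 10 30
    print newer(recent, older)                                                   # 1
}
```

- Adding 0 to a field, as with the month above, turns "10" or "01" into a plain
- number when one is wanted.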
- */};
-
- struct control
- {/*
- In the following list of control statements, any instance of “statement”
- can be replaced by a group of statements enclosed in curly braces {}:
-
- { statements }
- Simple grouping of several statements together, so that conditional or
- repeated execution can be applied to the group.
- if (condition) statement1 [ else statement2 ]
- If the condition evaluates to true then statement1 is carried out; the
- “else” clause is optional, and statement2 will be executed if the
- condition is false.
- while (condition) statement
- The condition is first evaluated, and if it is false then the
- statement is skipped. If it is true then the statement is executed;
- the condition is again evaluated, and the statement again executed if
- the condition is true, and this process continues until the condition
- is false. Note that if the condition is false the first time then the
- statement will not be executed at all. “while” loops are affected by
- break and continue statements.
- do statement while (condition)
- The statement is always executed at least once; then the condition is
- evaluated, and if it is true then the statement is executed again. This
- process continues until the condition is false. Unlike the “while”
- loop, the “do” loop always executes its statement at least once.
- for (expr1; expr2; expr3) statement
- eg “for (i = 1; i <= 6; ++i) {print i}”
- Mnemonically, “for it’s (a jolly good fellow)” helps: the “i” stands
- for initialization, the “t” for “test”, and the “s” for “step”. expr1
- is the initialization, executed only once, just before the “for” loop
- proper is entered. Next expr2, the test, is evaluated, and if it is
- true then the statement is executed, otherwise the for loop ends and
- control passes to the next statement beyond it. If the statement is
- executed then expr3, the step, is carried out, and then it’s back to
- the top of the loop: no more initialization, but the sequence test,
- execute, step, continues until the test produces false.
- for (var in array) statement
- Indexes for the array are retrieved one–by–one to the variable “var”,
- though not in a readily predictable order, and the statement is
- executed for each index.
- break
- For use only among the statements that make up the body of a while,
- do, or for loop. Usually found in the form “if (condition) break;”.
- When the break is executed, control immediately passes to the next
- statement after the loop.
- continue
- Also for use only in a while, do, or for loop, and also usually
- executed only when the condition of some if–statement is true. When
- encountered, control passes to the very end of the statements making
- up the body of the loop, and the next iteration of the loop begins.
- next
- Stop processing the current input record. The next input record is
- read and processing starts over with the first pattern in the hAWK
- program. If the end of the input data is reached, the END block(s), if
- any, are executed.
- exit [ expression ]
- In an END action, exit truly causes the hAWK program to terminate.
- Anywhere else, the exit statement causes the program to jump to the
- END actions, and only if none are present does the program immediately
- terminate. The “expression” is provided for compatibility with
- standard AWK programs, and won’t be of any use to you.
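- 
- As a rough sketch of how the loop statements above combine with break
- and continue, here are two small helper functions. The names SumPositives
- and CountDigits are made up for illustration and are not part of hAWK:

```awk
# Sum the positive numbers in a[1..n], stopping at the first zero.
# (i and sum are locals, listed after the real parameters.)
function SumPositives(a, n,    i, sum)
{
    for (i = 1; i <= n; ++i) {
        if (a[i] == 0)
            break        # leave the loop entirely
        if (a[i] < 0)
            continue     # skip the rest of the body, go on to the step ++i
        sum += a[i]
    }
    return sum
}

# Count decimal digits with a do loop: the body always runs once,
# so 0 is correctly reported as having one digit.
function CountDigits(n,    count)
{
    do {
        n = int(n / 10)
        ++count
    } while (n > 0)
    return count
}

BEGIN {
    v[1] = 4; v[2] = -3; v[3] = 5; v[4] = 0; v[5] = 9
    print SumPositives(v, 5)    # prints 9
    print CountDigits(0)        # prints 1
    print CountDigits(1234)     # prints 4
}
```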
- */};
-
- struct function
- {/*
- Functions in hAWK take the form:
- "function" name(parameter1, parameter2,... local1, local2...)
- {
- statements
- }
- They are executed when called from within an action statement.
-
- hAWK function definitions begin with the keyword “function”, and no return
- type is declared, though a value may optionally be returned. Local
- variables are listed after the parameters for the function, more to
- simplify the grammar of the language than anything else. Scalar parameters
- are passed by value (ie a local copy is made for the function, and the
- original variable in the function call is not touched by the function)
- whereas array parameters are passed by reference (the parameter array name
- refers to the same array that is provided as the argument). Function
- definitions must be placed at the top level of your program outside any
- pattern–action blocks, and you generally end up with a readable program if
- you put all of your function definitions at the end of your program.
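- 
- The difference between passing by value and passing by reference can be
- seen in a minimal sketch like the following, where Bump is a hypothetical
- function name chosen for the example:

```awk
function Bump(n, arr)
{
    n = n + 1          # changes only the function's local copy
    arr["count"]++     # changes the caller's array
}

BEGIN {
    x = 5
    tally["count"] = 5
    Bump(x, tally)
    print x, tally["count"]    # prints 5 6
}
```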
-
- Here’s a typical function:
- function Swap(a, i, j temp)
- {
- temp = a[i]
- a[i] = a[j]
- a[j] = temp
- }
- When called, it appears for example as
- arr[1] = 7; arr[4] = 9; Swap(arr, 1, 4)
- which results in arr[1] = 9, arr[4] = 7. Note that the “temp” variable is
- intended for use only within the Swap function, and is a local variable
- rather than a parameter of the function.
-
- Local variables are initialized to 0/"" each time the function is called.
- No space should be put between the function name and the '(' of the
- argument list when calling one of your own functions, to avoid invoking
- the simple–minded concatenation operator.
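- 
- To see that locals really do start out empty on every call, consider this
- small sketch (RunningTotal is an illustrative name only; the extra blanks
- before “total” are just a common convention for marking where the locals
- begin):

```awk
function RunningTotal(x,    total)
{
    total += x       # total is empty (zero) at the start of every call
    return total
}

BEGIN {
    # note no space between the function name and the '('
    print RunningTotal(5)    # prints 5
    print RunningTotal(7)    # prints 7, not 12: total was reset
}
```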
-
- Functions may return an expression, as in
- function SumArraySquared(a, sum, i)
- {
- for (i in a) #unlike C, array size need not be known separately
- sum += a[i] #note sum and i are local; sum is automatically inited to zero
- return sum*sum
- }
- or
- function StringUpTo(str, upto)
- {
- return substr(str, 1, index(str, upto) - 1)
- }
- (eg StringUpTo("This is: a test", ":") would return "This is").
-
- Some details about functions:
- Newlines are optional after the left curly brace of the function body and
- before the closing right brace.
- Functions may call each other and may be recursive.
- The word func may be used in place of function. For tired typers only.
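- 
- As a small illustration of recursion, here is the usual factorial
- example; Factorial is a name invented for this sketch:

```awk
function Factorial(n)
{
    if (n <= 1)
        return 1                     # base case ends the recursion
    return n * Factorial(n - 1)      # the function calls itself
}

BEGIN {
    print Factorial(5)    # prints 120
}
```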
- */};
-
- struct print
- {/*
- The “print” statement
- “print” sends simply–formatted strings to a file, stdout by default. The
- expressions supplied to the print statement are separated from one another
- by commas, and may also be entirely surrounded by parentheses. The
- variations are
- print
- print expression1, expression2, ..., expressionN
- print (expression1, expression2, ..., expressionN)
- A “print” with no expressions is an abbreviation for
- print $0
- Each expression is converted to a string and printed in turn, with each
- comma being replaced by the built–in variable OFS, by default a single
- blank. Each print statement is terminated with the built–in ORS, by
- default a newline.
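- 
- A short sketch of how OFS and ORS shape the output of “print”; the
- values assigned here are arbitrary and chosen only to make the effect
- visible:

```awk
BEGIN {
    print "a", "b", "c"      # prints a b c
    OFS = "-"
    print "a", "b", "c"      # prints a-b-c
    ORS = "!\n"
    print "done"             # prints done!
    print "x" "y", "z"       # prints xy-z!  (no comma between x and y, so no OFS)
}
```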
-
- The parenthesized version of “print” is necessary if relational operators
- are present in the expressions, since the '>' operator can mean “greater
- than” or “redirect output to the file...”—see “Output into files”.
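- 
- The following sketch shows the parenthesized form keeping '>' as a
- comparison. Max is a hypothetical helper, and the commented-out line is
- there only to show what to avoid:

```awk
function Max(a, b)
{
    return (a > b) ? a : b
}

BEGIN {
    x = 7; y = 3
    print (x > y)          # prints 1, since 7 > 3 is true
    print(x, y, x > y)     # prints 7 3 1
    print Max(x, y)        # prints 7
    # print x > y          # would instead write 7 to a file named "3"
}
```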
-
- The print statement is used in virtually every sample program provided,
- and the more–sophisticated “printf” is seldom seen since fancy formatting
- is not often needed.
-
- print "" #prints just a blank line
- */};
-
- struct printf
- {/*
- The “printf” statement
- This statement also has parenthesized and unparenthesized forms,
- printf format, expression1, expression2, ..., expressionN
- printf(format, expression1, expression2, ..., expressionN)
- and, as with “print”, the parentheses are needed only if a relational
- operator is contained in one of the expressions. The “format” argument is
- interpreted as a string, and may contain either literal text to be printed
- or format specifications for strings or numbers to be printed. Format
- specs are indicated in the format string by a '%', and there should be one
- expression following the format for each format specification—eg if you
- specify that a string, a number, and a string be printed, then you list
- the string, number, and string after the format, in the same order,
- separated by commas.
-
- The hAWK versions of the printf and sprintf functions accept the following
- conversion specification formats, entirely borrowed from C:
- %c an ASCII character. If the argument used for %c is numeric, it is
- treated as a character and printed. Otherwise, the argument is
- assumed to be a string, and only the first character of that
- string is printed.
- %d a decimal number (the integer part).
- %i just like %d .
- %e a floating point number of the form [-]d.ddddddE[+-]dd .
- %f a floating point number of the form [-]ddd.dddddd .
- %g use e or f conversion, whichever is shorter, with nonsignificant zeros
- suppressed.
- %o an unsigned octal number (again, an integer).
- %s a character string.
- %x an unsigned hexadecimal number (an integer).
- %X like %x , but using ABCDEF instead of abcdef .
- %% a single % character; no argument is converted.
-
- There are optional, additional parameters that may lie between the % and
- the control letter (also from C):
- - the expression should be left justified within its field
- (note if the '-' is absent then the expression is right
- justified)
- width the field should be padded to this width. If the number
- has a leading zero, then the field will be padded with
- zeros. Otherwise it is padded with blanks.
- .prec a number indicating the maximum width of strings or digits
- to the right of the decimal point.
- For example, %-23.14s prints strings in a field 23 characters wide, left
- justified, printing at most 14 characters from the string. And %8.4f will
- print a floating point number in a field 8 characters wide, right
- justified, with 4 digits to the right of the decimal point.
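- 
- A few of the conversions and modifiers described above, tried out in a
- BEGIN block. The square brackets are there only to make the field widths
- visible, and the values are arbitrary:

```awk
BEGIN {
    # "left justified" (the first 14 characters) followed by 9 blanks
    printf("[%-23.14s]\n", "left justified and truncated")
    printf("[%8.4f]\n", 3.14159265)        # prints [  3.1416]
    printf("[%05d]\n", 42)                 # prints [00042]
    printf("[%c%c]\n", 72, "index")        # prints [Hi]
    printf("[%x %X %o]\n", 255, 255, 8)    # prints [ff FF 10]
    printf("100%%\n")                      # prints 100%
}
```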
-
- The dynamic width and prec capabilities of the C library printf routines
- are not supported. However, they may be simulated by using the hAWK
- concatenation operation to build up a format specification dynamically.
-
- Some examples:
- “print var” always appends the value of ORS (by default a newline);
- to avoid this, use
- printf("%s ", var)
- and when a newline is needed, supply one yourself with something like
- print "" or printf("%s\n", var).
-
- Given strings of variable width in fields $1 and $2, reformat to print
- these strings right–justified in two nicely–lined–up columns:
- { one[++n] = $1
- two[n] = $2
- if (w1 < length($1))
- w1 = length($1)
- if (w2 < length($2))
- w2 = length($2)
-
- }
- END {w1 += 2; w2 += 2;#a couple of spaces between columns
- for (i = 1; i <= n; ++i)
- printf "%" w1 "s" "%" w2 "s\n", one[i], two[i]
- }
- —this illustrates using the hAWK concatenation operation “to build up a
- format specification dynamically”; for example, if w1 = 9 and w2 = 15
- (after adding 2) then we get
- printf "%9s%15s\n", one[i], two[i]
- as the effective printf statement.
- */};
-
- struct redirect
- {/*
- OUTPUT:
- By default, “print” and “printf” send all of their output to stdout.
- However, the redirection operators '>' and '>>' allow you to send output
- to any text file. Redirecting output takes one of the forms
- print expression–list > outfile
- print(expression–list) > outfile
- printf format, expression–list > outfile
- printf(format, expression–list) > outfile
- print > outfile
- or any of those with '>>' instead of '>'. The '>' operator will erase the
- contents of outfile before beginning to write to it, whereas '>>' will
- append what is being printed to outfile without clearing the file first.
- Both operators open the file “outfile” the first time it is encountered in
- the program, and keep it open. The file will be closed for you at the end
- of your program, but if you have many files to write to you should close
- each output file yourself when you are done with it, with
- “close(outfile)”.
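- 
- The interplay of '>', '>>' and close() might be sketched as follows.
- WriteList is a hypothetical function, and any file name passed to it
- (for example "Disk:dir1:report") is likewise just an illustration:

```awk
# Write a heading and n numbered items to outfile, then append a total.
function WriteList(outfile, items, n,    i)
{
    print "Items:" > outfile           # first use opens outfile and erases it
    for (i = 1; i <= n; ++i)
        print i, items[i] > outfile    # file is still open, so lines accumulate
    close(outfile)                     # finished with it for the moment
    print "Total: " n >> outfile       # reopens outfile and appends
    close(outfile)
}
```

- Note that repeating '>' inside the loop does not erase the file again,
- since the file is only erased when it is first opened.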
-
- INPUT:
- “getline” is a built–in function that allows you to retrieve input records
- from the current input file or from any other file. As you know, the
- default behaviour of a hAWK program is to retrieve input from your input
- files one record at a time, marching through the records and files from
- beginning to end. Often, however, one needs to read in a group of lines
- until some condition is met, or interrupt regular input to retrieve
- records from some other file, and these are the special capabilities that
- “getline” provides. It can be used in the following ways:
- getline              sets $0 from next input record; sets NF, NR, FNR.
- getline < file       sets $0 from next record of file; sets NF.
- getline var          sets var from next input record; sets NR, FNR.
- getline var < file   sets var from next record of file.
- and in all cases “getline” returns 1 if a record was successfully
- retrieved, 0 if the end of file was encountered, and -1 if some problem
- occurred, such as failure to find the file.
-
- The effect of “getline” by itself is to dump the current string in $0 and
- replace it with the next input record, setting all the usual built–in
- variables. Program execution then continues with the statement following
- “getline”. By comparison, the “next” statement does everything that
- “getline” by itself does, but in addition processing starts over with the
- first pattern in your hAWK program.
-
- If a variable name is present immediately after “getline”, then the input
- record is retrieved to the variable instead of to $0. The '<' symbol is
- the input redirection operator meaning “get input from the file...”, and
- is followed by the name of the input file to use. Note that file names
- must be full path names, as is always the case in hAWK.
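- 
- A common use of “getline var < file” is to pull a whole side file into
- an array. The sketch below makes some assumptions: ReadLines is a made-up
- name, and it assumes close() may be applied to an input file in the same
- way as described for output files:

```awk
# Read every record of file into arr[1], arr[2], ...
# Returns the number of records read, or -1 if the file could not be read.
function ReadLines(file, arr,    line, n, status)
{
    # getline returns 1 for each record, 0 at end of file, -1 on error
    while ((status = (getline line < file)) > 0)
        arr[++n] = line
    close(file)    # assumed to work for input files as well
    return (status < 0) ? -1 : n + 0
}
```

- It might be called as n = ReadLines("Disk:dir1:names", names), where the
- pathname is of course only an example.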
- */};
-
-