Usenet 1994 October

Text File | 1990-06-07 | 50KB | 1,390 lines

             printf format, "----", "------" @}
     @{ printf format, $1, $2 @}' BBS-list
@end example

See if you can use the @code{printf} statement to line up the headings and
table data for our @file{inventory-shipped} example covered earlier in the
section on the @code{print} statement (@pxref{Print}).

@node Redirection, Special Files, Printf, Printing
@section Redirecting Output of @code{print} and @code{printf}

@cindex output redirection
@cindex redirection of output
So far we have been dealing only with output that prints to the standard
output, usually your terminal.  Both @code{print} and @code{printf} can
be told to send their output to other places.  This is called
@dfn{redirection}.@refill

A redirection appears after the @code{print} or @code{printf} statement.
Redirections in @code{awk} are written just like redirections in shell
commands, except that they are written inside the @code{awk} program.

@menu
* File/Pipe Redirection::       Redirecting Output to Files and Pipes.
* Close Output::                How to close output files and pipes.
@end menu

@node File/Pipe Redirection, Close Output, Redirection, Redirection
@subsection Redirecting Output to Files and Pipes

Here are the three forms of output redirection.  They are all shown for
the @code{print} statement, but they work identically for @code{printf}
also.

@table @code
@item print @var{items} > @var{output-file}
This type of redirection prints the items onto the output file
@var{output-file}.  The file name @var{output-file} can be any
expression.  Its value is changed to a string and then used as a
file name (@pxref{Expressions}).@refill

When this type of redirection is used, the @var{output-file} is erased
before the first output is written to it.  Subsequent writes do not
erase @var{output-file}, but append to it.
If @var{output-file} does not exist, then it is created.@refill

For example, here is how one @code{awk} program can write a list of
BBS names to a file @file{name-list} and a list of phone numbers to a
file @file{phone-list}.  Each output file contains one name or number
per line.

@example
awk '@{ print $2 > "phone-list"
       print $1 > "name-list" @}' BBS-list
@end example

@item print @var{items} >> @var{output-file}
This type of redirection prints the items onto the output file
@var{output-file}.  The difference between this and the
single-@samp{>} redirection is that the old contents (if any) of
@var{output-file} are not erased.  Instead, the @code{awk} output is
appended to the file.

@cindex pipes for output
@cindex output, piping
@item print @var{items} | @var{command}
It is also possible to send output through a @dfn{pipe} instead of into a
file.  This type of redirection opens a pipe to @var{command} and writes
the values of @var{items} through this pipe, to another process created
to execute @var{command}.@refill

The redirection argument @var{command} is actually an @code{awk}
expression.  Its value is converted to a string, whose contents give the
shell command to be run.

For example, this produces two files, one unsorted list of BBS names
and one list sorted in reverse alphabetical order:

@example
awk '@{ print $1 > "names.unsorted"
       print $1 | "sort -r > names.sorted" @}' BBS-list
@end example

Here the unsorted list is written with an ordinary redirection while
the sorted list is written by piping through the @code{sort} utility.

Here is an example that uses redirection to mail a message to a mailing
list @samp{bug-system}.  This might be useful when trouble is encountered
in an @code{awk} script run periodically for system maintenance.
@example
print "Awk script failed:", $0 | "mail bug-system"
print "at record number", FNR, "of", FILENAME  | "mail bug-system"
close("mail bug-system")
@end example

We call the @code{close} function here because it's a good idea to close
the pipe as soon as all the intended output has been sent to it.
@xref{Close Output}, for more information on this.
@end table

Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system
to open a file or pipe only if the particular @var{file} or @var{command}
you've specified has not already been written to by your program.@refill

@node Close Output, , File/Pipe Redirection, Redirection
@subsection Closing Output Files and Pipes
@cindex closing output files and pipes
@findex close

When a file or pipe is opened, the file name or command associated with
it is remembered by @code{awk} and subsequent writes to the same file or
command are appended to the previous writes.  The file or pipe stays
open until @code{awk} exits.  This is usually convenient.

Sometimes there is a reason to close an output file or pipe earlier
than that.  To do this, use the @code{close} function, as follows:

@example
close(@var{filename})
@end example

@noindent
or

@example
close(@var{command})
@end example

The argument @var{filename} or @var{command} can be any expression.
Its value must exactly equal the string used to open the file or pipe
to begin with.  For example, if you open a pipe with this:

@example
print $1 | "sort -r > names.sorted"
@end example

@noindent
then you must close it with this:

@example
close("sort -r > names.sorted")
@end example

Here are some reasons why you might need to close an output file:

@itemize @bullet
@item
To write a file and read it back later on in the same @code{awk}
program.  Close the file when you are finished writing it; then
you can start reading it with @code{getline} (@pxref{Getline}).

@item
To write numerous files, successively, in the same @code{awk}
program.
If you don't close the files, eventually you will exceed the system
limit on the number of open files in one process.  So close each one
when you are finished writing it.

@item
To make a command finish.  When you redirect output through a pipe,
the command reading the pipe normally continues to try to read input
as long as the pipe is open.  Often this means the command cannot
really do its work until the pipe is closed.  For example, if you
redirect output to the @code{mail} program, the message is not
actually sent until the pipe is closed.

@item
To run the same program a second time, with the same arguments.
This is not the same thing as giving more input to the first run!

For example, suppose you pipe output to the @code{mail} program.  If you
output several lines redirected to this pipe without closing it, they make
a single message of several lines.  By contrast, if you close the pipe
after each line of output, then each line makes a separate message.
@end itemize

@node Special Files, , Redirection, Printing
@section Standard I/O Streams
@cindex standard input
@cindex standard output
@cindex standard error output
@cindex file descriptors

Running programs conventionally have three input and output streams
already available to them for reading and writing.  These are known as
the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
output}.  These streams are, by default, terminal input and output, but
they are often redirected with the shell, via the @samp{<}, @samp{<<},
@samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators.  Standard error
is used only for writing error messages; the reason we have two separate
streams, standard output and standard error, is so that they can be
redirected separately.
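As a small illustration of redirecting the two output streams separately, here is a sketch that assumes a POSIX shell and any @code{awk}.  The file names @file{stdout.txt} and @file{stderr.txt} and the message texts are invented for the example.  The program sends one line to standard output and one line to standard error, the latter through the portable @samp{cat 1>&2} pipe discussed just below, and the shell then sends each stream to its own file.

```shell
# Write one line to standard output and one line to standard error,
# then let the shell redirect the two streams into separate files.
# stdout.txt and stderr.txt are illustrative names only.
awk 'BEGIN {
    print "report line"                           # goes to standard output
    print "warning: something odd" | "cat 1>&2"   # child cat inherits awk stderr
}' >stdout.txt 2>stderr.txt

cat stdout.txt    # holds only the report line
cat stderr.txt    # holds only the warning
```

Because @code{awk} closes its pipes (and waits for @code{cat}) before exiting, both files are complete by the time the shell runs the two @code{cat} commands.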
@c @cindex differences between @code{gawk} and @code{awk}
In other implementations of @code{awk}, the only way to write an error
message to standard error in an @code{awk} program is as follows:

@example
print "Serious error detected!" | "cat 1>&2"
@end example

@noindent
This works by opening a pipeline to a shell command that inherits the
standard error stream from the @code{awk} process and can therefore
write to it.  This is far from elegant, and is also inefficient, since
it requires a separate process.  So people writing @code{awk} programs
have often neglected to do this.  Instead, they have sent the error
messages to the terminal, like this:

@example
NF != 4 @{
   printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
@}
@end example

@noindent
This has the same effect most of the time, but not always: although the
standard error stream is usually the terminal, it can be redirected, and
when that happens, writing to the terminal is not correct.  In fact, if
@code{awk} is run from a background job, it may not have a terminal at
all.  Then opening @file{/dev/tty} will fail.

@code{gawk} provides special file names for accessing the three standard
streams.  When you redirect input or output in @code{gawk}, if the file
name matches one of these special names, then @code{gawk} directly uses
the stream it stands for.

@cindex @file{/dev/stdin}
@cindex @file{/dev/stdout}
@cindex @file{/dev/stderr}
@cindex @file{/dev/fd/}
@table @file
@item /dev/stdin
The standard input (file descriptor 0).

@item /dev/stdout
The standard output (file descriptor 1).

@item /dev/stderr
The standard error output (file descriptor 2).

@item /dev/fd/@var{n}
The file associated with file descriptor @var{n}.  Such a file must have
been opened by the program initiating the @code{awk} execution (typically
the shell).  Unless you take special pains, only descriptors 0, 1 and 2
are available.
@end table

The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
respectively, but they are more self-explanatory.

The proper way to write an error message in a @code{gawk} program
is to use @file{/dev/stderr}, like this:

@example
NF != 4 @{
  printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
@}
@end example

Recognition of these special file names is disabled if @code{gawk} is in
compatibility mode (@pxref{Command Line}).

@node One-liners, Patterns, Printing, Top
@chapter Useful ``One-liners''

@cindex one-liners
Useful @code{awk} programs are often short, just a line or two.  Here is a
collection of useful, short programs to get you started.  Some of these
programs contain constructs that haven't been covered yet.  The description
of the program will give you a good idea of what is going on, but please
read the rest of the manual to become an @code{awk} expert!

@table @code
@item awk '@{ num_fields = num_fields + NF @}
@itemx @ @ @ @ @ END @{ print num_fields @}'
This program prints the total number of fields in all input lines.

@item awk 'length($0) > 80'
This program prints every line longer than 80 characters.  The sole
rule has a relational expression as its pattern, and has no action (so the
default action, printing the record, is used).

@item awk 'NF > 0'
This program prints every line that has at least one field.  This is an
easy way to delete blank lines from a file (or rather, to create a new
file similar to the old file but from which the blank lines have been
deleted).

@item awk '@{ if (NF > 0) print @}'
This program also prints every line that has at least one field.  Here we
allow the rule to match every line, then decide in the action whether
to print.

@item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
This program prints 7 random numbers from 0 to 100, inclusive.
@item ls -l @var{files} | awk '@{ x += $5 @} ; END @{ print "total bytes: " x @}'
This program prints the total number of bytes used by @var{files}.  The
file size is the fifth field of @code{ls -l} output on most systems; on
older systems whose @code{ls -l} omits the group name, it is the fourth.

@item expand@ @var{file}@ |@ awk@ '@{ if (x < length()) x = length() @}
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
This program prints the maximum line length of @var{file}.  The input
is piped through the @code{expand} program to change tabs into spaces,
so the widths compared are actually the right-margin columns.
@end table

@node Patterns, Actions, One-liners, Top
@chapter Patterns

@cindex pattern, definition of
Patterns in @code{awk} control the execution of rules: a rule is
executed when its pattern matches the current input record.  This
chapter tells all about how to write patterns.

@menu
* Kinds of Patterns::    A list of all kinds of patterns.
                         The following subsections describe them in detail.
* Empty::                The empty pattern, which matches every record.
* Regexp::               Regular expressions such as @samp{/foo/}.
* Comparison Patterns::  Comparison expressions such as @code{$1 > 10}.
* Boolean Patterns::     Combining comparison expressions.
* Expression Patterns::  Any expression can be used as a pattern.
* Ranges::               Using pairs of patterns to specify record ranges.
* BEGIN/END::            Specifying initialization and cleanup rules.
@end menu

@node Kinds of Patterns, Empty, Patterns, Patterns
@section Kinds of Patterns
@cindex patterns, types of

Here is a summary of the types of patterns supported in @code{awk}.

@table @code
@item /@var{regular expression}/
A regular expression as a pattern.  It matches when the text of the
input record fits the regular expression.
(@xref{Regexp, , Regular Expressions as Patterns}.)

@item @var{expression}
A single expression.  It matches when its value is nonzero (if a
number) or nonnull (if a string).
(@xref{Expression Patterns}.)

@item @var{pat1}, @var{pat2}
A pair of patterns separated by a comma, specifying a range of records.
(@xref{Ranges, , Specifying Record Ranges With Patterns}.)

@item BEGIN
@itemx END
Special patterns to supply start-up or clean-up information to
@code{awk}.  (@xref{BEGIN/END}.)

@item @var{null}
The empty pattern matches every input record.
(@xref{Empty, , The Empty Pattern}.)
@end table

@node Empty, Regexp, Kinds of Patterns, Patterns
@section The Empty Pattern

@cindex empty pattern
@cindex pattern, empty
An empty pattern is considered to match @emph{every} input record.  For
example, the program:@refill

@example
awk '@{ print $1 @}' BBS-list
@end example

@noindent
prints just the first field of every record.

@node Regexp, Comparison Patterns, Empty, Patterns
@section Regular Expressions as Patterns
@cindex pattern, regular expressions
@cindex regexp
@cindex regular expressions as patterns

A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
class of strings.  A regular expression enclosed in slashes (@samp{/})
is an @code{awk} pattern that matches every input record whose text
belongs to that class.

The simplest regular expression is a sequence of letters, numbers, or
both.  Such a regexp matches any string that contains that sequence.
Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
Therefore, the pattern @code{/foo/} matches any input record containing
@samp{foo}.  Other kinds of regexps let you specify more complicated
classes of strings.

@menu
* Usage: Regexp Usage.          How regexps are used in patterns.
* Operators: Regexp Operators.  How to write a regexp.
* Case-sensitivity::            How to do case-insensitive matching.
@end menu

@node Regexp Usage, Regexp Operators, Regexp, Regexp
@subsection How to Use Regular Expressions

A regular expression can be used as a pattern by enclosing it in
slashes.  Then the regular expression is tested against the entire
text of each record.  (Normally, it needs to match only some part of
the text in order to succeed.)
For example, this prints the second field of each record that contains
@samp{foo} anywhere:

@example
awk '/foo/ @{ print $2 @}' BBS-list
@end example

@cindex regular expression matching operators
@cindex string-matching operators
@cindex operators, string-matching
@cindex operators, regular expression matching
@cindex regexp search operators
Regular expressions can also be used in comparison expressions.  Then
you can specify the string to match against; it need not be the entire
current input record.  These comparison expressions can be used as
patterns or in @code{if} and @code{while} statements.

@table @code
@item @var{exp} ~ /@var{regexp}/
This is true if the expression @var{exp} (taken as a character string)
is matched by @var{regexp}.  The following example matches, or selects,
all input records with the upper-case letter @samp{J} somewhere in the
first field:@refill

@example
awk '$1 ~ /J/' inventory-shipped
@end example

So does this:

@example
awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
@end example

@item @var{exp} !~ /@var{regexp}/
This is true if the expression @var{exp} (taken as a character string)
is @emph{not} matched by @var{regexp}.  The following example matches,
or selects, all input records whose first field @emph{does not} contain
the upper-case letter @samp{J}:@refill

@example
awk '$1 !~ /J/' inventory-shipped
@end example
@end table

@cindex computed regular expressions
@cindex regular expressions, computed
@cindex dynamic regular expressions
The right hand side of a @samp{~} or @samp{!~} operator need not be a
constant regexp (i.e., a string of characters between slashes).  It may
be any expression.  The expression is evaluated, and converted if
necessary to a string; the contents of the string are used as the
regexp.  A regexp that is computed in this way is called a @dfn{dynamic
regexp}.
For example:

@example
identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*"
$0 ~ identifier_regexp
@end example

@noindent
sets @code{identifier_regexp} to a regexp that describes @code{awk}
variable names (including one-letter names such as @code{x}), and tests
if the input record matches this regexp.

@node Regexp Operators, Case-sensitivity, Regexp Usage, Regexp
@subsection Regular Expression Operators
@cindex metacharacters
@cindex regular expression metacharacters

You can combine regular expressions with the following characters,
called @dfn{regular expression operators}, or @dfn{metacharacters}, to
increase the power and versatility of regular expressions.

Here is a table of metacharacters.  All characters not listed in the
table stand for themselves.

@table @code
@item ^
This matches the beginning of the string or the beginning of a line
within the string.  For example:

@example
^@@chapter
@end example

@noindent
matches the @samp{@@chapter} at the beginning of a string, and can be used
to identify chapter beginnings in Texinfo source files.

@item $
This is similar to @samp{^}, but it matches only at the end of a string
or the end of a line within the string.  For example:

@example
p$
@end example

@noindent
matches a record that ends with a @samp{p}.

@item .
This matches any single character except a newline.  For example:

@example
.P
@end example

@noindent
matches any single character followed by a @samp{P} in a string.  Using
concatenation we can make regular expressions like @samp{U.A}, which
matches any three-character sequence that begins with @samp{U} and ends
with @samp{A}.

@item [@dots{}]
This is called a @dfn{character set}.  It matches any one of the
characters that are enclosed in the square brackets.  For example:

@example
[MVX]
@end example

@noindent
matches any of the characters @samp{M}, @samp{V}, or @samp{X} in a
string.@refill

Ranges of characters are indicated by using a hyphen between the beginning
and ending characters, and enclosing the whole thing in brackets.
For example:@refill

@example
[0-9]
@end example

@noindent
matches any digit.

To include the character @samp{\}, @samp{]}, @samp{-} or @samp{^} in a
character set, put a @samp{\} in front of it.  For example:

@example
[d\]]
@end example

@noindent
matches either @samp{]} or @samp{d}.@refill

This treatment of @samp{\} is compatible with other @code{awk}
implementations but incompatible with the proposed POSIX specification
for @code{awk}.  The current draft specifies the use of the same syntax
used in @code{egrep}.

We may change @code{gawk} to fit the standard, once we are sure it will
no longer change.  In the meantime, the @samp{-a} option specifies the
traditional @code{awk} syntax described above (which is also the
default), while the @samp{-e} option specifies @code{egrep} syntax.
@xref{Options}.

In @code{egrep} syntax, backslash is not syntactically special within
square brackets.  This means that special tricks have to be used to
represent the characters @samp{]}, @samp{-} and @samp{^} as members of a
character set.

To match @samp{-}, write it as @samp{---}, which is a range containing only
@samp{-}.  You may also give @samp{-} as the first or last character in
the set.  To match @samp{^}, put it anywhere except as the first character
of a set.  To match a @samp{]}, make it the first character in the set.
For example:

@example
[]d^]
@end example

@noindent
matches either @samp{]}, @samp{d} or @samp{^}.@refill

@item [^ @dots{}]
This is a @dfn{complemented character set}.  The first character after
the @samp{[} @emph{must} be a @samp{^}.  It matches any characters
@emph{except} those in the square brackets.  For example:

@example
[^0-9]
@end example

@noindent
matches any character that is not a digit.

@item |
This is the @dfn{alternation operator} and it is used to specify
alternatives.  For example:

@example
^P|[0-9]
@end example

@noindent
matches any string that matches either @samp{^P} or @samp{[0-9]}.
This means it matches any string that contains a digit or starts with
@samp{P}.  The alternation applies to the largest possible regexps on
either side.

@item (@dots{})
Parentheses are used for grouping in regular expressions as in
arithmetic.  They can be used to concatenate regular expressions
containing the alternation operator, @samp{|}.

@item *
This symbol means that the preceding regular expression is to be
repeated as many times as possible to find a match.  For example:

@example
ph*
@end example

@noindent
applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
to one @samp{p} followed by any number of @samp{h}s.  This will also match
just @samp{p} if no @samp{h}s are present.

The @samp{*} repeats the @emph{smallest} possible preceding expression.
(Use parentheses if you wish to repeat a larger expression.)  It finds
as many repetitions as possible.  For example:

@example
awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample
@end example

@noindent
prints every record in the input containing a string of the form
@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.@refill

@item +
This symbol is similar to @samp{*}, but the preceding expression must be
matched at least once.  This means that:

@example
wh+y
@end example

@noindent
would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas
@samp{wh*y} would match all three of these strings.  This is a simpler
way of writing the last @samp{*} example:

@example
awk '/\(c[ad]+r x\)/ @{ print @}' sample
@end example

@item ?
This symbol is similar to @samp{*}, but the preceding expression can be
matched once or not at all.  For example:

@example
fe?d
@end example

@noindent
will match @samp{fed} or @samp{fd}, but nothing else.@refill

@item \
This is used to suppress the special meaning of a character when
matching.  For example:

@example
\$
@end example

@noindent
matches the character @samp{$}.
The escape sequences used for string constants (@pxref{Constants}) are
valid in regular expressions as well; they are also introduced by a
@samp{\}.
@end table

In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators have
the highest precedence, followed by concatenation, and finally by @samp{|}.
As in arithmetic, parentheses can change how operators are grouped.@refill

@node Case-sensitivity,, Regexp Operators, Regexp
@subsection Case-sensitivity in Matching

Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters), and inside character
sets.  Thus a @samp{w} in a regular expression matches only a lower case
@samp{w} and not an upper case @samp{W}.

The simplest way to do a case-independent match is to use a character
set: @samp{[Ww]}.  However, this can be cumbersome if you need it often,
and it can make the regular expressions harder for humans to read.
There are two other alternatives that you might prefer.

One way to do a case-insensitive match at a particular point in the
program is to convert the data to a single case, using the
@code{tolower} or @code{toupper} built-in string functions (which we
haven't discussed yet; @pxref{String Functions}).  For example:

@example
tolower($1) ~ /foo/  @{ @dots{} @}
@end example

@noindent
converts the first field to lower case before matching against it.

Another method is to set the variable @code{IGNORECASE} to a nonzero
value (@pxref{Built-in Variables}).  When @code{IGNORECASE} is not zero,
@emph{all} regexp operations ignore case.  Changing the value of
@code{IGNORECASE} dynamically controls the case sensitivity of your
program as it runs.  Case is significant by default because
@code{IGNORECASE} (like most variables) is initialized to zero.
@example
x = "aB"
if (x ~ /ab/) @dots{}   # this test will fail

IGNORECASE = 1
if (x ~ /ab/) @dots{}   # now it will succeed
@end example

You cannot generally use @code{IGNORECASE} to make certain rules
case-insensitive and other rules case-sensitive, because there is no way
to set @code{IGNORECASE} just for the pattern of a particular rule.  To
do this, you must use character sets or @code{tolower}.  However, one
thing you can do only with @code{IGNORECASE} is turn case-sensitivity on
or off dynamically for all the rules at once.

@code{IGNORECASE} can be set on the command line, or in a @code{BEGIN}
rule.  Setting @code{IGNORECASE} from the command line is a way to make
a program case-insensitive without having to edit it.

The value of @code{IGNORECASE} has no effect if @code{gawk} is in
compatibility mode (@pxref{Command Line}).  Case is always significant
in compatibility mode.

@node Comparison Patterns, Boolean Patterns, Regexp, Patterns
@section Comparison Expressions as Patterns
@cindex comparison expressions as patterns
@cindex pattern, comparison expressions
@cindex relational operators
@cindex operators, relational

@dfn{Comparison patterns} test relationships such as equality between
two strings or numbers.  They are a special case of expression patterns
(@pxref{Expression Patterns}).  They are written with @dfn{relational
operators}, which are a superset of those in C.  Here is a table of
them:

@table @code
@item @var{x} < @var{y}
True if @var{x} is less than @var{y}.

@item @var{x} <= @var{y}
True if @var{x} is less than or equal to @var{y}.

@item @var{x} > @var{y}
True if @var{x} is greater than @var{y}.

@item @var{x} >= @var{y}
True if @var{x} is greater than or equal to @var{y}.

@item @var{x} == @var{y}
True if @var{x} is equal to @var{y}.

@item @var{x} != @var{y}
True if @var{x} is not equal to @var{y}.

@item @var{x} ~ @var{y}
True if @var{x} matches the regular expression described by @var{y}.
@item @var{x} !~ @var{y}
True if @var{x} does not match the regular expression described by @var{y}.
@end table

The operands of a relational operator are compared as numbers if they
are both numbers.  Otherwise they are converted to, and compared as,
strings (@pxref{Conversion}).  Strings are compared by comparing the
first character of each, then the second character of each, and so on,
until there is a difference.  Thus, @code{"10"} is less than @code{"9"},
because the first characters already differ and @samp{1} comes before
@samp{9}.  If the two strings are equal until the shorter one runs out,
the shorter one is considered to be less than the longer one.

The left operand of the @samp{~} and @samp{!~} operators is a string.
The right operand is either a constant regular expression enclosed in
slashes (@code{/@var{regexp}/}), or any expression, whose string value
is used as a dynamic regular expression (@pxref{Regexp Usage}).

The following example prints the second field of each input record
whose first field is precisely @samp{foo}.

@example
awk '$1 == "foo" @{ print $2 @}' BBS-list
@end example

@noindent
Contrast this with the following regular expression match, which would
accept any record with a first field that contains @samp{foo}:

@example
awk '$1 ~ "foo" @{ print $2 @}' BBS-list
@end example

@noindent
or, equivalently, this one:

@example
awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
@end example

@node Boolean Patterns, Expression Patterns, Comparison Patterns, Patterns
@section Boolean Operators and Patterns
@cindex patterns, boolean
@cindex boolean patterns

A @dfn{boolean pattern} is an expression which combines other patterns
using the @dfn{boolean operators} ``or'' (@samp{||}), ``and''
(@samp{&&}), and ``not'' (@samp{!}).  Whether the boolean pattern
matches an input record depends on whether its subpatterns match.
For example, the following command prints all records in the input file
@file{BBS-list} that contain both @samp{2400} and @samp{foo}.@refill

@example
awk '/2400/ && /foo/' BBS-list
@end example

The following command prints all records in the input file
@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or
both.@refill

@example
awk '/2400/ || /foo/' BBS-list
@end example

The following command prints all records in the input file
@file{BBS-list} that do @emph{not} contain the string @samp{foo}.

@example
awk '! /foo/' BBS-list
@end example

Note that boolean patterns are a special case of expression patterns
(@pxref{Expression Patterns}); they are expressions that use the boolean
operators.  For complete information on the boolean operators, see
@ref{Boolean Ops}.

The subpatterns of a boolean pattern can be constant regular
expressions, comparisons, or any other @code{gawk} expressions.  Range
patterns are not expressions, so they cannot appear inside boolean
patterns.  Likewise, the special patterns @code{BEGIN} and @code{END},
which never match any input record, are not expressions and cannot
appear inside boolean patterns.

@node Expression Patterns, Ranges, Boolean Patterns, Patterns
@section Expressions as Patterns

Any @code{awk} expression is also valid as a pattern in @code{gawk}.
In that case the pattern ``matches'' if the expression's value is
nonzero (if a number) or nonnull (if a string).

The expression is reevaluated each time the rule is tested against a new
input record.  If the expression uses fields such as @code{$1}, the
value depends directly on the new input record's text; otherwise, it
depends only on what has happened so far in the execution of the
@code{awk} program, but that may still be useful.

Comparison patterns are actually a special case of this.
For example, the expression @code{$5 == "foo"} has the value 1 when the
value of @code{$5} equals @code{"foo"}, and 0 otherwise; therefore, this
expression as a pattern matches when the two values are equal.

Boolean patterns are also special cases of expression patterns.

A constant regexp as a pattern is also a special case of an expression
pattern.  @code{/foo/} as an expression has the value 1 if @samp{foo}
appears in the current input record; thus, as a pattern, @code{/foo/}
matches any record containing @samp{foo}.

Other implementations of @code{awk} are less general than @code{gawk}:
they allow comparison expressions, and boolean combinations thereof
(optionally with parentheses), but not necessarily other kinds of
expressions.

@node Ranges, BEGIN/END, Expression Patterns, Patterns
@section Specifying Record Ranges With Patterns

@cindex range pattern
@cindex patterns, range
A @dfn{range pattern} is made of two patterns separated by a comma, of
the form @code{@var{begpat}, @var{endpat}}.  It matches ranges of
consecutive input records.  The first pattern @var{begpat} controls
where the range begins, and the second one @var{endpat} controls where
it ends.  For example,@refill

@example
awk '$1 == "on", $1 == "off"'
@end example

@noindent
prints every record between @samp{on}/@samp{off} pairs, inclusive.

In more detail, a range pattern starts out by matching @var{begpat}
against every input record; when a record matches @var{begpat}, the
range pattern is @dfn{turned on}.  The range pattern matches this
record.  As long as it stays turned on, it automatically matches every
input record read.  But meanwhile, it also matches @var{endpat} against
every input record, and when that succeeds, the range pattern is turned
off again for the following record.  Now it goes back to checking
@var{begpat} against each record.

The record that turns on the range pattern and the one that turns it
off both match the range pattern.
If you don't want to operate on these records, you can write @code{if}
statements in the rule's action to distinguish them.

It is possible for a pattern to be turned both on and off by the same
record, if both conditions are satisfied by that record. Then the action
is executed for just that record.

@node BEGIN/END,, Ranges, Patterns
@section @code{BEGIN} and @code{END} Special Patterns

@cindex @code{BEGIN} special pattern
@cindex patterns, @code{BEGIN}
@cindex @code{END} special pattern
@cindex patterns, @code{END}
@code{BEGIN} and @code{END} are special patterns. They are not used to
match input records. Rather, they are used for supplying start-up or
clean-up information to your @code{awk} script. A @code{BEGIN} rule is
executed, once, before the first input record has been read. An
@code{END} rule is executed, once, after all the input has been read.
For example:@refill

@example
@group
awk 'BEGIN @{ print "Analysis of \"foo\"" @}
     /foo/ @{ ++foobar @}
     END   @{ print "\"foo\" appears " foobar " times." @}' BBS-list
@end group
@end example

This program finds out how many times the string @samp{foo} appears in
the input file @file{BBS-list}. The @code{BEGIN} rule prints a title for
the report. There is no need to use the @code{BEGIN} rule to initialize
the counter @code{foobar} to zero, as @code{awk} does this for us
automatically (@pxref{Variables}).

The second rule increments the variable @code{foobar} every time a
record containing the string @samp{foo} is read. The @code{END} rule
prints the value of @code{foobar} at the end of the run.@refill

The special patterns @code{BEGIN} and @code{END} cannot be used in
ranges or with boolean operators.

An @code{awk} program may have multiple @code{BEGIN} and/or @code{END}
rules. They are executed in the order they appear, all the @code{BEGIN}
rules at start-up and all the @code{END} rules at termination.
Multiple @code{BEGIN} and @code{END} sections are useful for writing
library functions, since each library can have its own @code{BEGIN} or
@code{END} rule to do its own initialization and/or cleanup. Note that
the order in which library files are named on the command line controls
the order in which their @code{BEGIN} and @code{END} rules are executed.
Therefore you have to be careful to write such rules in library files so
that it doesn't matter what order they are executed in.
@xref{Command Line}, for more information on using library functions.

If an @code{awk} program has only a @code{BEGIN} rule, and no other
rules, then the program exits after the @code{BEGIN} rule has been run.
(Older versions of @code{awk} used to keep reading and ignoring input
until end of file was seen.) However, if an @code{END} rule exists as
well, then the input will be read, even if there are no other rules in
the program. This is necessary in case the @code{END} rule checks the
@code{NR} variable.

@code{BEGIN} and @code{END} rules must have actions; there is no default
action for these rules since there is no current record when they run.

@node Actions, Expressions, Patterns, Top
@chapter Actions: Overview

@cindex action, definition of
@cindex curly braces
@cindex action, curly braces
@cindex action, separating statements
An @code{awk} @dfn{program} or @dfn{script} consists of a series of
@dfn{rules} and function definitions, interspersed. (Functions are
described later; see @ref{User-defined}.)

A rule contains a pattern and an @dfn{action}, either of which may be
omitted. The purpose of the action is to tell @code{awk} what to do once
a match for the pattern is found.
Thus, the entire program looks somewhat like this:

@example
@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
@dots{}
function @var{name} (@var{args}) @{ @dots{} @}
@dots{}
@end example

An action consists of one or more @code{awk} @dfn{statements}, enclosed
in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one
thing to be done. The statements are separated by newlines or
semicolons.

The curly braces around an action must be used even if the action
contains only one statement, or even if it contains no statements at
all. However, if you omit the action entirely, omit the curly braces as
well. (An omitted action is equivalent to @samp{@{ print $0 @}}.)

Here are the kinds of statement supported in @code{awk}:

@itemize @bullet
@item
Expressions, which can call functions or assign values to variables
(@pxref{Expressions}). Executing this kind of statement simply computes
the value of the expression and then ignores it. This is useful when the
expression has side effects (@pxref{Assignment Ops}).

@item
Control statements, which specify the control flow of @code{awk}
programs. The @code{awk} language gives you C-like constructs
(@code{if}, @code{for}, @code{while}, and so on) as well as a few
special ones (@pxref{Statements}).@refill

@item
Compound statements, which consist of one or more statements enclosed in
curly braces. A compound statement is used in order to put several
statements together in the body of an @code{if}, @code{while}, @code{do}
or @code{for} statement.

@item
Input control, using the @code{getline} function (@pxref{Getline}), and
the @code{next} statement (@pxref{Next Statement}).

@item
Output statements, @code{print} and @code{printf}. @xref{Printing}.

@item
Deletion statements, for deleting array elements. @xref{Delete}.
@end itemize

@iftex
The next two chapters cover in detail expressions and control
statements, respectively.
We go on to treat arrays and built-in functions, both of which are used
in expressions. Then we proceed to discuss how to define your own
functions.
@end iftex

@node Expressions, Statements, Actions, Top
@chapter Actions: Expressions
@cindex expression

Expressions are the basic building block of @code{awk} actions. An
expression evaluates to a value, which you can print, test, store in a
variable or pass to a function. Beyond that, an expression can assign a
new value to a variable or a field, with an assignment operator.

An expression can serve as a statement on its own. Most other kinds of
statement contain one or more expressions which specify data to be
operated on. As in other languages, expressions in @code{awk} include
variables, array references, constants, and function calls, as well as
combinations of these with various operators.

@menu
* Constants::           String, numeric, and regexp constants.
* Variables::           Variables give names to values for later use.
* Arithmetic Ops::      Arithmetic operations (@samp{+}, @samp{-}, etc.)
* Concatenation::       Concatenating strings.
* Comparison Ops::      Comparison of numbers and strings with @samp{<}, etc.
* Boolean Ops::         Combining comparison expressions using boolean operators
                        @samp{||} (``or''), @samp{&&} (``and'') and @samp{!} (``not'').
* Assignment Ops::      Changing the value of a variable or a field.
* Increment Ops::       Incrementing the numeric value of a variable.
* Conversion::          The conversion of strings to numbers and vice versa.
* Conditional Exp::     Conditional expressions select between two subexpressions
                        under control of a third subexpression.
* Function Calls::      A function call is an expression.
* Precedence::          How various operators nest.
@end menu

@node Constants, Variables, Expressions, Expressions
@section Constant Expressions
@cindex constants, types of
@cindex string constants

The simplest type of expression is the @dfn{constant}, which always has
the same value.
There are three types of constant: numeric constants, string constants,
and regular expression constants.

@cindex numeric constant
@cindex numeric value
A @dfn{numeric constant} stands for a number. This number can be an
integer, a decimal fraction, or a number in scientific (exponential)
notation. Note that all numeric values are represented within
@code{awk} in double-precision floating point. Here are some examples
of numeric constants, which all have the same value:

@example
105
1.05e+2
1050e-1
@end example

A string constant consists of a sequence of characters enclosed in
double-quote marks. For example:

@example
"parrot"
@end example

@noindent
@c @cindex differences between @code{gawk} and @code{awk}
represents the string whose contents are @samp{parrot}. Strings in
@code{gawk} can be of any length and they can contain all possible
8-bit character values, including ASCII NUL. Other @code{awk}
implementations may have difficulty with some character codes.@refill

@cindex escape sequence notation
Some characters cannot be included literally in a string constant. You
represent them instead with @dfn{escape sequences}, which are character
sequences beginning with a backslash (@samp{\}).

One use of an escape sequence is to include a double-quote character in
a string constant. Since a plain double-quote would end the string, you
must use @samp{\"} to represent a single double-quote character as a
part of the string. Backslash itself is another character that can't be
included normally; you write @samp{\\} to put one backslash in the
string. Thus, the string whose contents are the two characters
@samp{"\} must be written @code{"\"\\"}.

Another use of backslash is to represent unprintable characters such as
newline. While there is nothing to stop you from writing most of these
characters directly in a string constant, they may look ugly.

Here is a table of all the escape sequences used in @code{awk}:

@table @code
@item \\
Represents a literal backslash, @samp{\}.
@item \a
Represents the ``alert'' character, control-g, ASCII code 7.

@item \b
Represents a backspace, control-h, ASCII code 8.

@item \f
Represents a formfeed, control-l, ASCII code 12.

@item \n
Represents a newline, control-j, ASCII code 10.

@item \r
Represents a carriage return, control-m, ASCII code 13.

@item \t
Represents a horizontal tab, control-i, ASCII code 9.

@item \v
Represents a vertical tab, control-k, ASCII code 11.

@item \@var{nnn}
Represents the octal value @var{nnn}, where @var{nnn} are one to three
digits between 0 and 7. For example, the code for the ASCII ESC
(escape) character is @samp{\033}.@refill

@item \x@var{hh@dots{}}
Represents the hexadecimal value @var{hh}, where @var{hh} are
hexadecimal digits (@samp{0} through @samp{9} and either @samp{A}
through @samp{F} or @samp{a} through @samp{f}). Like the same construct
in ANSI C, the escape sequence continues until the first non-hexadecimal
digit is seen. However, using more than two hexadecimal digits produces
undefined results.@refill
@end table

A constant regexp is a regular expression description enclosed in
slashes, such as @code{/^beginning and end$/}. Most regexps used in
@code{awk} programs are constant, but the @samp{~} and @samp{!~}
operators can also match computed or ``dynamic'' regexps
(@pxref{Regexp Usage}).

Aside from their use as patterns (@pxref{Expression Patterns}), constant
regexps are used mainly with the @samp{~} and @samp{!~} operators; you
cannot assign a regexp itself to a variable or print it. They are not
truly expressions in the usual sense.

@node Variables, Arithmetic Ops, Constants, Expressions
@section Variables
@cindex variables, user-defined
@cindex user-defined variables

Variables let you give names to values and refer to them later. You have
already seen variables in many of the examples. The name of a variable
must be a sequence of letters, digits and underscores, but it may not
begin with a digit. Case is significant in variable names; @code{a} and
@code{A} are distinct variables.
A variable name is a valid expression by itself; it represents the
variable's current value. Variables are given new values with
@dfn{assignment operators} and @dfn{increment operators}.
@xref{Assignment Ops}.

A few variables have special built-in meanings, such as @code{FS}, the
field separator, and @code{NF}, the number of fields in the current
input record. @xref{Built-in Variables}, for a list of them. These
built-in variables can be used and assigned just like all other
variables, but their values are also used or changed automatically by
@code{awk}. Each built-in variable's name is made entirely of upper
case letters.

Variables in @code{awk} can be assigned either numeric values or string
values. By default, variables are initialized to the null string, which
is effectively zero if converted to a number. So there is no need to
``initialize'' each variable explicitly in @code{awk}, the way you would
need to do in C or most other traditional programming languages.

@menu
* Assignment Options::  Setting variables on the command line and a summary
                        of command line syntax. This is an advanced method
                        of input.
@end menu

@node Assignment Options,, Variables, Variables
@subsection Assigning Variables on the Command Line

You can set any @code{awk} variable by including a
@dfn{variable assignment} among the arguments on the command line when
you invoke @code{awk} (@pxref{Command Line}). Such an assignment has
this form:

@example
@var{variable}=@var{text}
@end example

@noindent
With it, you can set a variable either at the beginning of the
@code{awk} run or in between input files.

If you precede the assignment with the @samp{-v} option, like this:

@example
-v @var{variable}=@var{text}
@end example

@noindent
then the variable is set at the very beginning, before even the
@code{BEGIN} rules are run. The @samp{-v} option and its assignment
must precede all the file name arguments.
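As a small illustration with made-up input (the single line
@samp{x y z}, supplied by @code{echo}), the following command uses
@samp{-v} to set @code{n} before any input is read, so that the pattern
@code{$n == "y"} tests the second field of each record:

@example
echo 'x y z' | awk -v n=2 '$n == "y"'
@end example

@noindent
This prints @samp{x y z}, since @code{n} already holds 2 when the first
record is read.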
Otherwise, the variable assignment is performed at a time determined by
its position among the input file arguments: after the processing of the
preceding input file argument. For example:

@example
awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
@end example

@noindent
prints the value of field number @code{n} for all input records. Before
the first file is read, the command line sets the variable @code{n}
equal to 4. This causes the fourth field to be printed in lines from
the file @file{inventory-shipped}. After the first file has finished,
but before the second file is started, @code{n} is set to 2, so that the
second field is printed in lines from @file{BBS-list}.

Command line arguments are made available for explicit examination by
the @code{awk} program in an array named @code{ARGV}
(@pxref{Built-in Variables}).

@node Arithmetic Ops, Concatenation, Variables, Expressions
@section Arithmetic Operators
@cindex arithmetic operators
@cindex operators, arithmetic
@cindex addition
@cindex subtraction
@cindex multiplication
@cindex division
@cindex remainder
@cindex quotient
@cindex exponentiation

The @code{awk} language uses the common arithmetic operators when
evaluating expressions. All of these arithmetic operators follow normal
precedence rules, and work as you would expect them to.

This example divides field three by field four, adds field two, stores
the result into field one, and prints the resulting altered input
record:

@example
awk '@{ $1 = $2 + $3 / $4; print @}' inventory-shipped
@end example

The arithmetic operators in @code{awk} are:

@table @code
@item @var{x} + @var{y}
Addition.

@item @var{x} - @var{y}
Subtraction.

@item - @var{x}
Negation.

@item @var{x} * @var{y}
Multiplication.

@item @var{x} / @var{y}
Division. Since all numbers in @code{awk} are double-precision floating
point, the result is not rounded to an integer: @code{3 / 4} has the
value 0.75.
@item @var{x} % @var{y}
@c @cindex differences between @code{gawk} and @code{awk}
Remainder. The quotient is rounded toward zero to an integer,
multiplied by @var{y} and this result is subtracted from @var{x}.
This operation is sometimes known as ``trunc-mod''. The following
relation always holds:

@example
b * int(a / b) + (a % b) == a
@end example

One undesirable effect of this definition of remainder is that
@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus,

@example
-17 % 8 = -1
@end example

In other @code{awk} implementations, the signedness of the remainder
may be machine dependent.

@item @var{x} ^ @var{y}
@itemx @var{x} ** @var{y}
Exponentiation: @var{x} raised to the @var{y} power. @code{2 ^ 3} has
the value 8. The character sequence @samp{**} is equivalent to
@samp{^}.
@end table

@node Concatenation, Comparison Ops, Arithmetic Ops, Expressions
@section String Concatenation

@cindex string operators
@cindex operators, string
@cindex concatenation
There is only one string operation: concatenation. It does not have a
specific operator to represent it. Instead, concatenation is performed
by writing expressions next to one another, with no operator. For
example:

@example
awk '@{ print "Field number one: " $1 @}' BBS-list
@end example

@noindent
produces, for the first record in @file{BBS-list}:

@example
Field number one: aardvark
@end example

Without the space in the string constant after the @samp{:}, the line
would run together. For example:

@example
awk '@{ print "Field number one:" $1 @}' BBS-list
@end example

@noindent
produces, for the first record in @file{BBS-list}:

@example
Field number one:aardvark
@end example

Since string concatenation does not have an explicit operator, it is