printf format, "----", "------" @}
@{ printf format, $1, $2 @}' BBS-list
@end example
See if you can use the @code{printf} statement to line up the headings and
table data for our @file{inventory-shipped} example covered earlier in the
section on the @code{print} statement (@pxref{Print}).
@node Redirection, Special Files, Printf, Printing
@section Redirecting Output of @code{print} and @code{printf}
@cindex output redirection
@cindex redirection of output
So far we have been dealing only with output that prints to the standard
output, usually your terminal. Both @code{print} and @code{printf} can be
told to send their output to other places. This is called
@dfn{redirection}.@refill
A redirection appears after the @code{print} or @code{printf} statement.
Redirections in @code{awk} are written just like redirections in shell
commands, except that they are written inside the @code{awk} program.
@menu
* File/Pipe Redirection:: Redirecting Output to Files and Pipes.
* Close Output:: How to close output files and pipes.
@end menu
@node File/Pipe Redirection, Close Output, Redirection, Redirection
@subsection Redirecting Output to Files and Pipes
Here are the three forms of output redirection.  They are all shown for
the @code{print} statement, but they work identically for @code{printf}.
@table @code
@item print @var{items} > @var{output-file}
This type of redirection prints the items onto the output file
@var{output-file}. The file name @var{output-file} can be any
expression. Its value is changed to a string and then used as a
file name (@pxref{Expressions}).@refill
When this type of redirection is used, the @var{output-file} is erased
before the first output is written to it. Subsequent writes do not
erase @var{output-file}, but append to it. If @var{output-file} does
not exist, then it is created.@refill
For example, here is how one @code{awk} program can write a list of
BBS names to a file @file{name-list} and a list of phone numbers to a
file @file{phone-list}. Each output file contains one name or number
per line.
@example
awk '@{ print $2 > "phone-list"
print $1 > "name-list" @}' BBS-list
@end example
@item print @var{items} >> @var{output-file}
This type of redirection prints the items onto the output file
@var{output-file}. The difference between this and the
single-@samp{>} redirection is that the old contents (if any) of
@var{output-file} are not erased. Instead, the @code{awk} output is
appended to the file.
@cindex pipes for output
@cindex output, piping
@item print @var{items} | @var{command}
It is also possible to send output through a @dfn{pipe} instead of into a
file. This type of redirection opens a pipe to @var{command} and writes
the values of @var{items} through this pipe, to another process created
to execute @var{command}.@refill
The redirection argument @var{command} is actually an @code{awk}
expression. Its value is converted to a string, whose contents give the
shell command to be run.
For example, this produces two files, one unsorted list of BBS names
and one list sorted in reverse alphabetical order:
@example
awk '@{ print $1 > "names.unsorted"
print $1 | "sort -r > names.sorted" @}' BBS-list
@end example
Here the unsorted list is written with an ordinary redirection while
the sorted list is written by piping through the @code{sort} utility.
Here is an example that uses redirection to mail a message to a mailing
list @samp{bug-system}. This might be useful when trouble is encountered
in an @code{awk} script run periodically for system maintenance.
@example
print "Awk script failed:", $0 | "mail bug-system"
print "at record number", FNR, "of", FILENAME | "mail bug-system"
close("mail bug-system")
@end example
We call the @code{close} function here because it's a good idea to close
the pipe as soon as all the intended output has been sent to it.
@xref{Close Output}, for more information on this.
@end table
Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system
to open a file or pipe only if the particular @var{file} or @var{command}
you've specified has not already been written to by your program.@refill
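
As a further sketch of the first form, the output file name can be
computed from the data itself.  The following program (the
@samp{.list} suffix is only an illustrative choice, not something used
elsewhere in this manual) writes each record of @file{BBS-list} to a
file named after the record's last field:

@example
awk '@{ print $0 > ($NF ".list") @}' BBS-list
@end example

@noindent
The parentheses around the file name expression avoid any ambiguity
about where the operand of the redirection ends.  Each distinct file is
erased the first time it is written to, and appended to after that.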
@node Close Output, , File/Pipe Redirection, Redirection
@subsection Closing Output Files and Pipes
@cindex closing output files and pipes
@findex close
When a file or pipe is opened, the file name or command associated with
it is remembered by @code{awk} and subsequent writes to the same file or
command are appended to the previous writes. The file or pipe stays
open until @code{awk} exits. This is usually convenient.
Sometimes there is a reason to close an output file or pipe earlier
than that. To do this, use the @code{close} function, as follows:
@example
close(@var{filename})
@end example
@noindent
or
@example
close(@var{command})
@end example
The argument @var{filename} or @var{command} can be any expression.
Its value must exactly equal the string used to open the file or pipe
to begin with---for example, if you open a pipe with this:
@example
print $1 | "sort -r > names.sorted"
@end example
@noindent
then you must close it with this:
@example
close("sort -r > names.sorted")
@end example
Here are some reasons why you might need to close an output file:
@itemize @bullet
@item
To write a file and read it back later on in the same @code{awk}
program. Close the file when you are finished writing it; then
you can start reading it with @code{getline} (@pxref{Getline}).
@item
To write numerous files, successively, in the same @code{awk}
program. If you don't close the files, eventually you will exceed the
system limit on the number of open files in one process. So close
each one when you are finished writing it.
@item
To make a command finish. When you redirect output through a pipe,
the command reading the pipe normally continues to try to read input
as long as the pipe is open. Often this means the command cannot
really do its work until the pipe is closed. For example, if you
redirect output to the @code{mail} program, the message is not
actually sent until the pipe is closed.
@item
To run the same program a second time, with the same arguments.
This is not the same thing as giving more input to the first run!
For example, suppose you pipe output to the @code{mail} program. If you
output several lines redirected to this pipe without closing it, they make
a single message of several lines. By contrast, if you close the pipe
after each line of output, then each line makes a separate message.
@end itemize
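
The first reason in the list above can be sketched as follows.  This
is only an illustration: the file name @file{scratch.tmp} and the
variable names @code{tmp} and @code{line} are hypothetical.

@example
awk 'BEGIN @{
    tmp = "scratch.tmp"                 # hypothetical file name
    print "first line"  > tmp
    print "second line" > tmp
    close(tmp)                          # finish writing before reading
    while ((getline line < tmp) > 0)    # read the file back
        print "read back:", line
    close(tmp)
@}'
@end example

@noindent
Without the first call to @code{close}, the output might still be
buffered when @code{getline} starts reading, so the file could appear
empty or incomplete.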
@node Special Files, , Redirection, Printing
@section Standard I/O Streams
@cindex standard input
@cindex standard output
@cindex standard error output
@cindex file descriptors
Running programs conventionally have three input and output streams
already available to them for reading and writing. These are known as
the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
output}. These streams are, by default, terminal input and output, but
they are often redirected with the shell, via the @samp{<}, @samp{<<},
@samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators. Standard error
is used only for writing error messages; the reason we have two separate
streams, standard output and standard error, is so that they can be
redirected separately.
@c @cindex differences between @code{gawk} and @code{awk}
In other implementations of @code{awk}, the only way to write an error
message to standard error in an @code{awk} program is as follows:
@example
print "Serious error detected!\n" | "cat 1>&2"
@end example
@noindent
This works by opening a pipeline to a shell command which can access the
standard error stream which it inherits from the @code{awk} process.
This is far from elegant, and is also inefficient, since it requires a
separate process. So people writing @code{awk} programs have often
neglected to do this. Instead, they have sent the error messages to the
terminal, like this:
@example
NF != 4 @{
printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
@}
@end example
@noindent
This has the same effect most of the time, but not always: although the
standard error stream is usually the terminal, it can be redirected, and
when that happens, writing to the terminal is not correct. In fact, if
@code{awk} is run from a background job, it may not have a terminal at all.
Then opening @file{/dev/tty} will fail.
@code{gawk} provides special file names for accessing the three standard
streams. When you redirect input or output in @code{gawk}, if the file name
matches one of these special names, then @code{gawk} directly uses the
stream it stands for.
@cindex @file{/dev/stdin}
@cindex @file{/dev/stdout}
@cindex @file{/dev/stderr}
@cindex @file{/dev/fd/}
@table @file
@item /dev/stdin
The standard input (file descriptor 0).
@item /dev/stdout
The standard output (file descriptor 1).
@item /dev/stderr
The standard error output (file descriptor 2).
@item /dev/fd/@var{n}
The file associated with file descriptor @var{n}. Such a file must have
been opened by the program initiating the @code{awk} execution (typically
the shell). Unless you take special pains, only descriptors 0, 1 and 2
are available.
@end table
The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
respectively, but they are more self-explanatory.
The proper way to write an error message in a @code{gawk} program
is to use @file{/dev/stderr}, like this:
@example
NF != 4 @{
printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
@}
@end example
Recognition of these special file names is disabled if @code{gawk} is in
compatibility mode (@pxref{Command Line}).
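
The benefit of a separate error stream shows up when the shell
redirects it.  In the following sketch (the file names @file{data} and
@file{errors.log} are hypothetical), complaints about malformed records
go to @file{errors.log} while the good records still appear on the
standard output:

@example
awk 'NF != 4 @{ print "line " FNR " skipped" > "/dev/stderr"; next @}
             @{ print @}' data 2> errors.log
@end example

@noindent
This assumes @code{gawk} is not running in compatibility mode, and a
shell that understands the @samp{2>} redirection.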
@node One-liners, Patterns, Printing, Top
@chapter Useful ``One-liners''
@cindex one-liners
Useful @code{awk} programs are often short, just a line or two. Here is a
collection of useful, short programs to get you started. Some of these
programs contain constructs that haven't been covered yet. The description
of the program will give you a good idea of what is going on, but please
read the rest of the manual to become an @code{awk} expert!
@table @code
@item awk '@{ num_fields = num_fields + NF @}
@itemx @ @ @ @ @ END @{ print num_fields @}'
This program prints the total number of fields in all input lines.
@item awk 'length($0) > 80'
This program prints every line longer than 80 characters. The sole
rule has a relational expression as its pattern, and has no action (so the
default action, printing the record, is used).
@item awk 'NF > 0'
This program prints every line that has at least one field. This is an
easy way to delete blank lines from a file (or rather, to create a new
file similar to the old file but from which the blank lines have been
deleted).
@item awk '@{ if (NF > 0) print @}'
This program also prints every line that has at least one field. Here we
allow the rule to match every line, then decide in the action whether
to print.
@item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
This program prints 7 random numbers from 0 to 100, inclusive.
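@item ls -l @var{files} | awk '@{ x += $5 @} ; END @{ print "total bytes: " x @}'
This program prints the total number of bytes used by @var{files}.
(The file size is the fifth field of @samp{ls -l} output on most
systems; on older systems whose listing omits the group column, use
@code{$4} instead.)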
@item expand@ @var{file}@ |@ awk@ '@{ if (x < length()) x = length() @}
@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
This program prints the maximum line length of @var{file}. The input
is piped through the @code{expand} program to change tabs into spaces,
so the widths compared are actually the right-margin columns.
@end table
@node Patterns, Actions, One-liners, Top
@chapter Patterns
@cindex pattern, definition of
Patterns in @code{awk} control the execution of rules: a rule is
executed when its pattern matches the current input record. This
chapter tells all about how to write patterns.
@menu
* Kinds of Patterns:: A list of all kinds of patterns.
The following subsections describe them in detail.
* Empty:: The empty pattern, which matches every record.
* Regexp:: Regular expressions such as @samp{/foo/}.
* Comparison Patterns:: Comparison expressions such as @code{$1 > 10}.
* Boolean Patterns:: Combining comparison expressions.
* Expression Patterns:: Any expression can be used as a pattern.
* Ranges:: Using pairs of patterns to specify record ranges.
* BEGIN/END:: Specifying initialization and cleanup rules.
@end menu
@node Kinds of Patterns, Empty, Patterns, Patterns
@section Kinds of Patterns
@cindex patterns, types of
Here is a summary of the types of patterns supported in @code{awk}.
@table @code
@item /@var{regular expression}/
A regular expression as a pattern. It matches when the text of the
input record fits the regular expression. (@xref{Regexp, , Regular
Expressions as Patterns}.)
@item @var{expression}
A single expression.  It matches when its value is nonzero (if a
number) or nonnull (if a string).  (@xref{Expression
Patterns}.)
@item @var{pat1}, @var{pat2}
A pair of patterns separated by a comma, specifying a range of records.
(@xref{Ranges, , Specifying Record Ranges With Patterns}.)
@item BEGIN
@itemx END
Special patterns to supply start-up or clean-up information to
@code{awk}. (@xref{BEGIN/END}.)
@item @var{null}
The empty pattern matches every input record. (@xref{Empty, , The Empty
Pattern}.)
@end table
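
As a rough sketch, one program can contain a rule of each kind.  The
messages and the variable @code{count} below are purely illustrative:

@example
awk 'BEGIN          @{ print "start of input" @}
     /foo/          @{ print "regexp:    ", $0 @}
     NF > 3         @{ print "expression:", $0 @}
     /on/, /off/    @{ print "range:     ", $0 @}
                    @{ count++ @}
     END            @{ print count + 0, "records read" @}' BBS-list
@end example

@noindent
Every rule is tested against every record, so a single record may cause
several of these actions to run.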
@node Empty, Regexp, Kinds of Patterns, Patterns
@section The Empty Pattern
@cindex empty pattern
@cindex pattern, empty
An empty pattern is considered to match @emph{every} input record. For
example, the program:@refill
@example
awk '@{ print $1 @}' BBS-list
@end example
@noindent
prints just the first field of every record.
@node Regexp, Comparison Patterns, Empty, Patterns
@section Regular Expressions as Patterns
@cindex pattern, regular expressions
@cindex regexp
@cindex regular expressions as patterns
A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
class of strings. A regular expression enclosed in slashes (@samp{/})
is an @code{awk} pattern that matches every input record whose text
belongs to that class.
The simplest regular expression is a sequence of letters, numbers, or
both. Such a regexp matches any string that contains that sequence.
Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
Therefore, the pattern @code{/foo/} matches any input record containing
@samp{foo}. Other kinds of regexps let you specify more complicated
classes of strings.
@menu
* Usage: Regexp Usage. How regexps are used in patterns.
* Operators: Regexp Operators. How to write a regexp.
* Case-sensitivity:: How to do case-insensitive matching.
@end menu
@node Regexp Usage, Regexp Operators, Regexp, Regexp
@subsection How to Use Regular Expressions
A regular expression can be used as a pattern by enclosing it in
slashes.  Then the regular expression is tested against the entire text
of each record.  (Normally, it only needs to match some part of the text
in order to succeed.)  For example, this prints the second field of each
record that contains @samp{foo} anywhere:
@example
awk '/foo/ @{ print $2 @}' BBS-list
@end example
@cindex regular expression matching operators
@cindex string-matching operators
@cindex operators, string-matching
@cindex operators, regular expression matching
@cindex regexp search operators
Regular expressions can also be used in comparison expressions. Then
you can specify the string to match against; it need not be the entire
current input record. These comparison expressions can be used as
patterns or in @code{if} and @code{while} statements.
@table @code
@item @var{exp} ~ /@var{regexp}/
This is true if the expression @var{exp} (taken as a character string)
is matched by @var{regexp}. The following example matches, or selects,
all input records with the upper-case letter @samp{J} somewhere in the
first field:@refill
@example
awk '$1 ~ /J/' inventory-shipped
@end example
So does this:
@example
awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
@end example
@item @var{exp} !~ /@var{regexp}/
This is true if the expression @var{exp} (taken as a character string)
is @emph{not} matched by @var{regexp}. The following example matches,
or selects, all input records whose first field @emph{does not} contain
the upper-case letter @samp{J}:@refill
@example
awk '$1 !~ /J/' inventory-shipped
@end example
@end table
@cindex computed regular expressions
@cindex regular expressions, computed
@cindex dynamic regular expressions
The right hand side of a @samp{~} or @samp{!~} operator need not be a
constant regexp (i.e., a string of characters between slashes). It may
be any expression. The expression is evaluated, and converted if
necessary to a string; the contents of the string are used as the
regexp. A regexp that is computed in this way is called a @dfn{dynamic
regexp}. For example:
@example
identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*"
$0 ~ identifier_regexp
@end example
@noindent
sets @code{identifier_regexp} to a regexp that describes @code{awk}
variable names, and tests if the input record matches this regexp.
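
Because a dynamic regexp is an ordinary string, it can be built once
and reused.  The following sketch (the variable name
@code{name_regexp} is hypothetical) anchors the regexp with @samp{^}
and @samp{$} so that the whole first field, rather than just part of
it, must have the form of a variable name:

@example
awk 'BEGIN @{ name_regexp = "^[A-Za-z_][A-Za-z_0-9]*$" @}
     $1 ~ name_regexp @{ print $1, "looks like a variable name" @}' BBS-list
@end example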
@node Regexp Operators, Case-sensitivity, Regexp Usage, Regexp
@subsection Regular Expression Operators
@cindex metacharacters
@cindex regular expression metacharacters
You can combine regular expressions with the following characters,
called @dfn{regular expression operators}, or @dfn{metacharacters}, to
increase the power and versatility of regular expressions.
Here is a table of metacharacters. All characters not listed in the
table stand for themselves.
@table @code
@item ^
This matches the beginning of the string or the beginning of a line
within the string. For example:
@example
^@@chapter
@end example
@noindent
matches the @samp{@@chapter} at the beginning of a string, and can be used
to identify chapter beginnings in Texinfo source files.
@item $
This is similar to @samp{^}, but it matches only at the end of a string
or the end of a line within the string. For example:
@example
p$
@end example
@noindent
matches a record that ends with a @samp{p}.
@item .
This matches any single character except a newline. For example:
@example
.P
@end example
@noindent
matches any single character followed by a @samp{P} in a string. Using
concatenation we can make regular expressions like @samp{U.A}, which
matches any three-character sequence that begins with @samp{U} and ends
with @samp{A}.
@item [@dots{}]
This is called a @dfn{character set}. It matches any one of the
characters that are enclosed in the square brackets. For example:
@example
[MVX]
@end example
@noindent
matches any of the characters @samp{M}, @samp{V}, or @samp{X} in a
string.@refill
Ranges of characters are indicated by using a hyphen between the beginning
and ending characters, and enclosing the whole thing in brackets. For
example:@refill
@example
[0-9]
@end example
@noindent
matches any digit.
To include the character @samp{\}, @samp{]}, @samp{-} or @samp{^} in a
character set, put a @samp{\} in front of it. For example:
@example
[d\]]
@end example
@noindent
matches either @samp{]}, or @samp{d}.@refill
This treatment of @samp{\} is compatible with other @code{awk}
implementations but incompatible with the proposed POSIX specification
for @code{awk}. The current draft specifies the use of the same syntax
used in @code{egrep}.
We may change @code{gawk} to fit the standard, once we are sure it will
no longer change. For the meanwhile, the @samp{-a} option specifies the
traditional @code{awk} syntax described above (which is also the
default), while the @samp{-e} option specifies @code{egrep} syntax.
@xref{Options}.
In @code{egrep} syntax, backslash is not syntactically special within
square brackets. This means that special tricks have to be used to
represent the characters @samp{]}, @samp{-} and @samp{^} as members of a
character set.
To match @samp{-}, write it as @samp{---}, which is a range containing
only @samp{-}. You may also give @samp{-} as the first or last
character in the set. To match @samp{^}, put it anywhere except as the
first character of a set. To match a @samp{]}, make it the first
character in the set. For example:
@example
[]d^]
@end example
@noindent
matches either @samp{]}, @samp{d} or @samp{^}.@refill
@item [^ @dots{}]
This is a @dfn{complemented character set}.  The first character after
the @samp{[} @emph{must} be a @samp{^}.  It matches any character
@emph{except} those in the square brackets.  For example:
@example
[^0-9]
@end example
@noindent
matches any character that is not a digit.
@item |
This is the @dfn{alternation operator} and it is used to specify
alternatives. For example:
@example
^P|[0-9]
@end example
@noindent
matches any string that matches either @samp{^P} or @samp{[0-9]}. This
means it matches any string that contains a digit or starts with @samp{P}.
The alternation applies to the largest possible regexps on either side.
@item (@dots{})
Parentheses are used for grouping in regular expressions as in
arithmetic. They can be used to concatenate regular expressions
containing the alternation operator, @samp{|}.
@item *
This symbol means that the preceding regular expression is to be
repeated as many times as possible to find a match. For example:
@example
ph*
@end example
@noindent
applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
to one @samp{p} followed by any number of @samp{h}s. This will also match
just @samp{p} if no @samp{h}s are present.
The @samp{*} repeats the @emph{smallest} possible preceding expression.
(Use parentheses if you wish to repeat a larger expression.) It finds
as many repetitions as possible. For example:
@example
awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample
@end example
@noindent
prints every record in the input containing a string of the form
@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.@refill
@item +
This symbol is similar to @samp{*}, but the preceding expression must be
matched at least once. This means that:
@example
wh+y
@end example
@noindent
would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas
@samp{wh*y} would match all three of these strings. This is a simpler
way of writing the last @samp{*} example:
@example
awk '/\(c[ad]+r x\)/ @{ print @}' sample
@end example
@item ?
This symbol is similar to @samp{*}, but the preceding expression can be
matched once or not at all. For example:
@example
fe?d
@end example
@noindent
will match @samp{fed} or @samp{fd}, but nothing else.@refill
@item \
This is used to suppress the special meaning of a character when
matching. For example:
@example
\$
@end example
@noindent
matches the character @samp{$}.
The escape sequences used for string constants (@pxref{Constants}) are
valid in regular expressions as well; they are also introduced by a
@samp{\}.
@end table
In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators have
the highest precedence, followed by concatenation, and finally by @samp{|}.
As in arithmetic, parentheses can change how operators are grouped.@refill
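
A small experiment, offered only as a sketch, shows the effect of
these precedence rules.  The input string @samp{a1} is an arbitrary
choice:

@example
echo "a1" | awk '/^P|[0-9]/'     # prints a1: a digit anywhere suffices
echo "a1" | awk '/^(P|[0-9])/'   # prints nothing: must start with P or a digit
@end example

@noindent
In the first regexp the @samp{^} belongs only to the @samp{P}
alternative, because concatenation binds more tightly than @samp{|}.
The parentheses in the second regexp make the @samp{^} apply to both
alternatives.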
@node Case-sensitivity,, Regexp Operators, Regexp
@subsection Case-sensitivity in Matching
Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters), and inside character
sets. Thus a @samp{w} in a regular expression matches only a lower case
@samp{w} and not an upper case @samp{W}.
The simplest way to do a case-independent match is to use a character
set: @samp{[Ww]}. However, this can be cumbersome if you need to use it
often; and it can make the regular expressions harder for humans to
read. There are two other alternatives that you might prefer.
One way to do a case-insensitive match at a particular point in the
program is to convert the data to a single case, using the
@code{tolower} or @code{toupper} built-in string functions (which we
haven't discussed yet; @pxref{String Functions}). For example:
@example
tolower($1) ~ /foo/ @{ @dots{} @}
@end example
@noindent
converts the first field to lower case before matching against it.
Another method is to set the variable @code{IGNORECASE} to a nonzero
value (@pxref{Built-in Variables}). When @code{IGNORECASE} is not zero,
@emph{all} regexp operations ignore case. Changing the value of
@code{IGNORECASE} dynamically controls the case sensitivity of your
program as it runs. Case is significant by default because
@code{IGNORECASE} (like most variables) is initialized to zero.
@example
x = "aB"
if (x ~ /ab/) @dots{} # this test will fail
IGNORECASE = 1
if (x ~ /ab/) @dots{} # now it will succeed
@end example
You cannot generally use @code{IGNORECASE} to make certain rules
case-insensitive and other rules case-sensitive, because there is no way
to set @code{IGNORECASE} just for the pattern of a particular rule. To
do this, you must use character sets or @code{tolower}. However, one
thing you can do only with @code{IGNORECASE} is turn case-sensitivity on
or off dynamically for all the rules at once.
@code{IGNORECASE} can be set on the command line, or in a @code{BEGIN}
rule. Setting @code{IGNORECASE} from the command line is a way to make
a program case-insensitive without having to edit it.
The value of @code{IGNORECASE} has no effect if @code{gawk} is in
compatibility mode (@pxref{Command Line}). Case is always significant
in compatibility mode.
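
The two approaches can be compared side by side.  Both commands below
are sketches that print the records of @file{BBS-list} containing
@samp{foo} in any mixture of upper and lower case.  The second relies
on @code{gawk} not being in compatibility mode:

@example
awk 'tolower($0) ~ /foo/' BBS-list
awk '/foo/' IGNORECASE=1 BBS-list
@end example

@noindent
In the second command, the assignment appears among the command line
arguments before the input file name, so it takes effect before the
first record of @file{BBS-list} is read.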
@node Comparison Patterns, Boolean Patterns, Regexp, Patterns
@section Comparison Expressions as Patterns
@cindex comparison expressions as patterns
@cindex pattern, comparison expressions
@cindex relational operators
@cindex operators, relational
@dfn{Comparison patterns} test relationships such as equality between
two strings or numbers. They are a special case of expression patterns
(@pxref{Expression Patterns}). They are written with @dfn{relational
operators}, which are a superset of those in C. Here is a table of
them:
@table @code
@item @var{x} < @var{y}
True if @var{x} is less than @var{y}.
@item @var{x} <= @var{y}
True if @var{x} is less than or equal to @var{y}.
@item @var{x} > @var{y}
True if @var{x} is greater than @var{y}.
@item @var{x} >= @var{y}
True if @var{x} is greater than or equal to @var{y}.
@item @var{x} == @var{y}
True if @var{x} is equal to @var{y}.
@item @var{x} != @var{y}
True if @var{x} is not equal to @var{y}.
@item @var{x} ~ @var{y}
True if @var{x} matches the regular expression described by @var{y}.
@item @var{x} !~ @var{y}
True if @var{x} does not match the regular expression described by @var{y}.
@end table
The operands of a relational operator are compared as numbers if they
are both numbers. Otherwise they are converted to, and compared as,
strings (@pxref{Conversion}). Strings are compared by comparing the
first character of each, then the second character of each, and so on,
until there is a difference. If the two strings are equal until the
shorter one runs out, the shorter one is considered to be less than the
longer one. Thus, @code{"10"} is less than @code{"9"}.
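
The difference between the two kinds of comparison can be seen in this
short sketch, which uses only constants:

@example
awk 'BEGIN @{
    if (10 < 9)     print "numeric: 10 is less than 9"
    if ("10" < "9") print "string: \"10\" is less than \"9\""
@}'
@end example

@noindent
Only the second message is printed.  The numeric comparison is false,
but the string comparison is true because the character @samp{1}
precedes the character @samp{9}.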
The left operand of the @samp{~} and @samp{!~} operators is a string.
The right operand is either a constant regular expression enclosed in
slashes (@code{/@var{regexp}/}), or any expression, whose string value
is used as a dynamic regular expression (@pxref{Regexp Usage}).
The following example prints the second field of each input record
whose first field is precisely @samp{foo}.
@example
awk '$1 == "foo" @{ print $2 @}' BBS-list
@end example
@noindent
Contrast this with the following regular expression match, which would
accept any record with a first field that contains @samp{foo}:
@example
awk '$1 ~ "foo" @{ print $2 @}' BBS-list
@end example
@noindent
or, equivalently, this one:
@example
awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
@end example
@node Boolean Patterns, Expression Patterns, Comparison Patterns, Patterns
@section Boolean Operators and Patterns
@cindex patterns, boolean
@cindex boolean patterns
A @dfn{boolean pattern} is an expression which combines other patterns
using the @dfn{boolean operators} ``or'' (@samp{||}), ``and''
(@samp{&&}), and ``not'' (@samp{!}). Whether the boolean pattern
matches an input record depends on whether its subpatterns match.
For example, the following command prints all records in the input file
@file{BBS-list} that contain both @samp{2400} and @samp{foo}.@refill
@example
awk '/2400/ && /foo/' BBS-list
@end example
The following command prints all records in the input file
@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or
both.@refill
@example
awk '/2400/ || /foo/' BBS-list
@end example
The following command prints all records in the input file
@file{BBS-list} that do @emph{not} contain the string @samp{foo}.
@example
awk '! /foo/' BBS-list
@end example
Note that boolean patterns are a special case of expression patterns
(@pxref{Expression Patterns}); they are expressions that use the boolean
operators. For complete information on the boolean operators, see
@ref{Boolean Ops}.
The subpatterns of a boolean pattern can be constant regular
expressions, comparisons, or any other @code{gawk} expressions. Range
patterns are not expressions, so they cannot appear inside boolean
patterns. Likewise, the special patterns @code{BEGIN} and @code{END},
which never match any input record, are not expressions and cannot
appear inside boolean patterns.
@node Expression Patterns, Ranges, Boolean Patterns, Patterns
@section Expressions as Patterns
Any @code{awk} expression is also valid as a pattern in @code{gawk}.
The pattern ``matches'' if the expression's value is nonzero (if a
number) or nonnull (if a string).
The expression is reevaluated each time the rule is tested against a new
input record. If the expression uses fields such as @code{$1}, the
value depends directly on the new input record's text; otherwise, it
depends only on what has happened so far in the execution of the
@code{awk} program, but that may still be useful.
Comparison patterns are actually a special case of this. For
example, the expression @code{$5 == "foo"} has the value 1 when the
value of @code{$5} equals @code{"foo"}, and 0 otherwise; therefore, this
expression as a pattern matches when the two values are equal.
Boolean patterns are also special cases of expression patterns.
A constant regexp as a pattern is also a special case of an expression
pattern. @code{/foo/} as an expression has the value 1 if @samp{foo}
appears in the current input record; thus, as a pattern, @code{/foo/}
matches any record containing @samp{foo}.
Other implementations of @code{awk} are less general than @code{gawk}:
they allow comparison expressions, and boolean combinations thereof
(optionally with parentheses), but not necessarily other kinds of
expressions.
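
As a sketch of an expression pattern that is neither a comparison nor
a regexp, the following program prints the odd-numbered records of its
input.  It assumes the built-in variable @code{NR}, which counts the
records read so far:

@example
awk 'NR % 2' BBS-list
@end example

@noindent
The value of @code{NR % 2} is 1 for records 1, 3, 5, and so on, and 0
for the others, so the pattern matches every other record.  To run the
same program under an @code{awk} that accepts only comparisons, write
the pattern as @code{NR % 2 == 1}.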
@node Ranges, BEGIN/END, Expression Patterns, Patterns
@section Specifying Record Ranges With Patterns
@cindex range pattern
@cindex patterns, range
A @dfn{range pattern} is made of two patterns separated by a comma, of
the form @code{@var{begpat}, @var{endpat}}. It matches ranges of
consecutive input records. The first pattern @var{begpat} controls
where the range begins, and the second one @var{endpat} controls where
it ends. For example,@refill
@example
awk '$1 == "on", $1 == "off"'
@end example
@noindent
prints every record between @samp{on}/@samp{off} pairs, inclusive.
In more detail, a range pattern starts out by matching @var{begpat}
against every input record; when a record matches @var{begpat}, the
range pattern becomes @dfn{turned on}. The range pattern matches this
record. As long as it stays turned on, it automatically matches every
input record read. But meanwhile, it also matches @var{endpat} against
every input record, and when that succeeds, the range pattern is turned
off again for the following record. Now it goes back to checking
@var{begpat} against each record.
The record that turns on the range pattern and the one that turns it
off both match the range pattern. If you don't want to operate on
these records, you can write @code{if} statements in the rule's action
to distinguish them.
It is possible for a range pattern to be turned both on and off by the
same record, if both conditions are satisfied by that record. Then the
action is executed for just that record.
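The following sketch illustrates both points. The sample input lines
are invented for the purpose and are supplied with the shell's
@code{printf} command; the first command uses an @code{if} statement to
skip the records that turn the range on and off, and the second shows a
single record that satisfies both patterns:

@example
# Print only the records strictly between the delimiters.
printf 'a\non\nb\noff\nc\n' |
awk '$1 == "on", $1 == "off" @{
       if ($1 != "on" && $1 != "off")
           print
     @}'

# One record satisfies both patterns, so the range covers it alone.
printf 'on off\nx\n' | awk '/on/, /off/'
@end example

@noindent
The first command should print just @samp{b}, and the second just
@samp{on off}.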
@node BEGIN/END,, Ranges, Patterns
@section @code{BEGIN} and @code{END} Special Patterns
@cindex @code{BEGIN} special pattern
@cindex patterns, @code{BEGIN}
@cindex @code{END} special pattern
@cindex patterns, @code{END}
@code{BEGIN} and @code{END} are special patterns. They are not used to
match input records. Rather, they are used for supplying start-up or
clean-up information to your @code{awk} script. A @code{BEGIN} rule is
executed, once, before the first input record has been read. An @code{END}
rule is executed, once, after all the input has been read. For
example:@refill
@group
@example
awk 'BEGIN @{ print "Analysis of \"foo\"" @}
/foo/ @{ ++foobar @}
END @{ print "\"foo\" appears " foobar " times." @}' BBS-list
@end example
@end group
This program finds out how many records in the input file
@file{BBS-list} contain the string @samp{foo}. The @code{BEGIN} rule prints a title
for the report. There is no need to use the @code{BEGIN} rule to
initialize the counter @code{foobar} to zero, as @code{awk} does this
for us automatically (@pxref{Variables}).
The second rule increments the variable @code{foobar} every time a
record containing the pattern @samp{foo} is read. The @code{END} rule
prints the value of @code{foobar} at the end of the run.@refill
The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
or with boolean operators.
An @code{awk} program may have multiple @code{BEGIN} and/or @code{END}
rules. They are executed in the order they appear, all the @code{BEGIN}
rules at start-up and all the @code{END} rules at termination.
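Here is a small sketch of that ordering. The messages are arbitrary,
and @file{/dev/null} is used as an empty input file so that the
@code{END} rules run immediately:

@example
awk 'BEGIN @{ print "first BEGIN" @}
     END   @{ print "first END" @}
     BEGIN @{ print "second BEGIN" @}
     END   @{ print "second END" @}' /dev/null
@end example

@noindent
This should print the two @code{BEGIN} messages in the order written,
followed by the two @code{END} messages in the order written.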
Multiple @code{BEGIN} and @code{END} sections are useful for writing
library functions, since each library can have its own @code{BEGIN} or
@code{END} rule to do its own initialization and/or cleanup. Note that
the order in which library functions are named on the command line
controls the order in which their @code{BEGIN} and @code{END} rules are
executed. Therefore you have to be careful to write such rules in
library files so that it doesn't matter what order they are executed in.
@xref{Command Line}, for more information on using library functions.
If an @code{awk} program only has a @code{BEGIN} rule, and no other
rules, then the program exits after the @code{BEGIN} rule has been run.
(Older versions of @code{awk} used to keep reading and ignoring input
until end of file was seen.) However, if an @code{END} rule exists as
well, then the input will be read, even if there are no other rules in
the program. This is necessary in case the @code{END} rule checks the
@code{NR} variable.
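For instance, a program consisting of nothing but an @code{END} rule
can count the records in its input. This is only a sketch; the
three-line input is made up and supplied with the shell's
@code{printf} command:

@example
printf 'one\ntwo\nthree\n' | awk 'END @{ print NR @}'
@end example

@noindent
Since all three records are read before the @code{END} rule runs, this
prints @samp{3}.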
@code{BEGIN} and @code{END} rules must have actions; there is no default
action for these rules since there is no current record when they run.
@node Actions, Expressions, Patterns, Top
@chapter Actions: Overview
@cindex action, definition of
@cindex curly braces
@cindex action, curly braces
@cindex action, separating statements
An @code{awk} @dfn{program} or @dfn{script} consists of a series of
@dfn{rules} and function definitions, interspersed. (Functions are
described later; see @ref{User-defined}.)
A rule contains a pattern and an @dfn{action}, either of which may be
omitted. The purpose of the action is to tell @code{awk} what to do
once a match for the pattern is found. Thus, the entire program
looks somewhat like this:
@example
@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
@dots{}
function @var{name} (@var{args}) @{ @dots{} @}
@dots{}
@end example
An action consists of zero or more @code{awk} @dfn{statements}, enclosed
in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one
thing to be done. The statements are separated by newlines or
semicolons.
The curly braces around an action must be used even if the action
contains only one statement, or even if it contains no statements at
all. However, if you omit the action entirely, omit the curly braces as
well. (An omitted action is equivalent to @samp{@{ print $0 @}}.)
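The difference between an omitted action and an empty one can be seen
in this sketch, which uses a made-up two-line input supplied with the
shell's @code{printf} command:

@example
printf 'foo\nbar\n' | awk '/foo/'
printf 'foo\nbar\n' | awk '/foo/ @{ @}'
@end example

@noindent
The first command prints @samp{foo}, because the omitted action
defaults to printing the record. The second prints nothing, because
its action is present but contains no statements.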
Here are the kinds of statement supported in @code{awk}:
@itemize @bullet
@item
Expressions, which can call functions or assign values to variables
(@pxref{Expressions}). Executing this kind of statement simply computes
the value of the expression and then ignores it. This is useful when
the expression has side effects (@pxref{Assignment Ops}).
@item
Control statements, which specify the control flow of @code{awk}
programs. The @code{awk} language gives you C-like constructs
(@code{if}, @code{for}, @code{while}, and so on) as well as a few
special ones (@pxref{Statements}).@refill
@item
Compound statements, which consist of one or more statements enclosed in
curly braces. A compound statement is used in order to put several
statements together in the body of an @code{if}, @code{while}, @code{do}
or @code{for} statement.
@item
Input control, using the @code{getline} function (@pxref{Getline}),
and the @code{next} statement (@pxref{Next Statement}).
@item
Output statements, @code{print} and @code{printf}. @xref{Printing}.
@item
Deletion statements, for deleting array elements. @xref{Delete}.
@end itemize
@iftex
The next two chapters cover in detail expressions and control
statements, respectively. We go on to treat arrays, and built-in
functions, both of which are used in expressions. Then we proceed
to discuss how to define your own functions.
@end iftex
@node Expressions, Statements, Actions, Top
@chapter Actions: Expressions
@cindex expression
Expressions are the basic building blocks of @code{awk} actions. An
expression evaluates to a value, which you can print, test, store in a
variable or pass to a function.
But, beyond that, an expression can assign a new value to a variable
or a field, with an assignment operator.
An expression can serve as a statement on its own. Most other kinds of
statement contain one or more expressions which specify data to be
operated on. As in other languages, expressions in @code{awk} include
variables, array references, constants, and function calls, as well as
combinations of these with various operators.
@menu
* Constants:: String, numeric, and regexp constants.
* Variables:: Variables give names to values for later use.
* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, etc.)
* Concatenation:: Concatenating strings.
* Comparison Ops:: Comparison of numbers and strings with @samp{<}, etc.
* Boolean Ops:: Combining comparison expressions using boolean operators
@samp{||} (``or''), @samp{&&} (``and'') and @samp{!} (``not'').
* Assignment Ops:: Changing the value of a variable or a field.
* Increment Ops:: Incrementing the numeric value of a variable.
* Conversion:: The conversion of strings to numbers and vice versa.
* Conditional Exp:: Conditional expressions select between two subexpressions
under control of a third subexpression.
* Function Calls:: A function call is an expression.
* Precedence:: How various operators nest.
@end menu
@node Constants, Variables, Expressions, Expressions
@section Constant Expressions
@cindex constants, types of
@cindex string constants
The simplest type of expression is the @dfn{constant}, which always has
the same value. There are three types of constant: numeric constants,
string constants, and regular expression constants.
@cindex numeric constant
@cindex numeric value
A @dfn{numeric constant} stands for a number. This number can be an
integer, a decimal fraction, or a number in scientific (exponential)
notation. Note that all numeric values are represented within
@code{awk} in double-precision floating point. Here are some examples
of numeric constants, which all have the same value:
@example
105
1.05e+2
1050e-1
@end example
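One way to check this for yourself is to print the three constants
from a @code{BEGIN} rule, as in this small sketch:

@example
awk 'BEGIN @{ print 105, 1.05e+2, 1050e-1 @}'
@end example

@noindent
All three values should be printed as @samp{105}.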
A string constant consists of a sequence of characters enclosed in
double-quote marks. For example:
@example
"parrot"
@end example
@noindent
@c @cindex differences between @code{gawk} and @code{awk}
represents the string whose contents are @samp{parrot}. Strings in
@code{gawk} can be of any length and they can contain all the possible
8-bit character codes, including ASCII NUL. Other @code{awk}
implementations may have difficulty with some character codes.@refill
@cindex escape sequence notation
Some characters cannot be included literally in a string constant. You
represent them instead with @dfn{escape sequences}, which are character
sequences beginning with a backslash (@samp{\}).
One use of an escape sequence is to include a double-quote character in
a string constant. Since a plain double-quote would end the string, you
must use @samp{\"} to represent a single double-quote character as a
part of the string. Backslash itself is another character that can't be
included normally; you write @samp{\\} to put one backslash in the
string. Thus, the string whose contents are the two characters
@samp{"\} must be written @code{"\"\\"}.
Another use of backslash is to represent unprintable characters
such as newline. While there is nothing to stop you from writing most
of these characters directly in a string constant, they may look ugly.
Here is a table of all the escape sequences used in @code{awk}:
@table @code
@item \\
Represents a literal backslash, @samp{\}.
@item \a
Represents the ``alert'' character, control-g, ASCII code 7.
@item \b
Represents a backspace, control-h, ASCII code 8.
@item \f
Represents a formfeed, control-l, ASCII code 12.
@item \n
Represents a newline, control-j, ASCII code 10.
@item \r
Represents a carriage return, control-m, ASCII code 13.
@item \t
Represents a horizontal tab, control-i, ASCII code 9.
@item \v
Represents a vertical tab, control-k, ASCII code 11.
@item \@var{nnn}
Represents the octal value @var{nnn}, where @var{nnn} are one to three
digits between 0 and 7. For example, the code for the ASCII ESC
(escape) character is @samp{\033}.@refill
@item \x@var{hh@dots{}}
Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
@samp{a} through @samp{f}). Like the same construct in ANSI C, the escape
sequence continues until the first non-hexadecimal digit is seen. However,
using more than two hexadecimal digits produces undefined results.@refill
@end table
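Here is a short sketch that exercises a few of these escape sequences.
The strings are arbitrary; the octal form is used for the last one
because the hexadecimal form may not be available in other @code{awk}
implementations:

@example
awk 'BEGIN @{
       print "\"\\"          # a double-quote followed by a backslash
       print "one\ttwo"      # the two words separated by a tab
       print "\101"          # octal 101 is the letter A in ASCII
     @}'
@end example

@noindent
The first line of output should be the two characters @samp{"\}, the
second should contain a tab between the words, and the third should be
@samp{A}.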
A constant regexp is a regular expression description enclosed in
slashes, such as @code{/^beginning and end$/}. Most regexps used in
@code{awk} programs are constant, but the @samp{~} and @samp{!~}
operators can also match computed or ``dynamic'' regexps (@pxref{Regexp
Usage}).
Apart from their use as patterns, constant regexps are useful only with
the @samp{~} and @samp{!~} operators; you cannot assign them to
variables or print them. They are not truly expressions in the usual
sense.
@node Variables, Arithmetic Ops, Constants, Expressions
@section Variables
@cindex variables, user-defined
@cindex user-defined variables
Variables let you give names to values and refer to them later. You have
already seen variables in many of the examples. The name of a variable
must be a sequence of letters, digits and underscores, but it may not begin
with a digit. Case is significant in variable names; @code{a} and @code{A}
are distinct variables.
A variable name is a valid expression by itself; it represents the
variable's current value. Variables are given new values with
@dfn{assignment operators} and @dfn{increment operators}.
@xref{Assignment Ops}.
A few variables have special built-in meanings, such as @code{FS}, the
field separator, and @code{NF}, the number of fields in the current
input record. @xref{Built-in Variables}, for a list of them. These
built-in variables can be used and assigned just like all other
variables, but their values are also used or changed automatically by
@code{awk}. Each built-in variable's name is made entirely of upper case
letters.
Variables in @code{awk} can be assigned either numeric values or string
values. By default, variables are initialized to the null string, which
is effectively zero if converted to a number. So there is no need to
``initialize'' each variable explicitly in @code{awk}, the way you would
need to do in C or most other traditional programming languages.
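As a brief sketch of this behavior, the program below uses a variable
named @code{x} (an arbitrary name) that is never assigned:

@example
awk 'BEGIN @{
       print "[" x "]"    # x is the null string here
       print x + 0        # and zero when used as a number
     @}'
@end example

@noindent
This prints @samp{[]} followed by @samp{0}.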
@menu
* Assignment Options:: Setting variables on the command line and a summary
of command line syntax. This is an advanced method
of input.
@end menu
@node Assignment Options,, Variables, Variables
@subsection Assigning Variables on the Command Line
You can set any @code{awk} variable by including a @dfn{variable assignment}
among the arguments on the command line when you invoke @code{awk}
(@pxref{Command Line}). Such an assignment has this form:
@example
@var{variable}=@var{text}
@end example
@noindent
With it, you can set a variable either at the beginning of the
@code{awk} run or in between input files.
If you precede the assignment with the @samp{-v} option, like this:
@example
-v @var{variable}=@var{text}
@end example
@noindent
then the variable is set at the very beginning, before even the
@code{BEGIN} rules are run. The @samp{-v} option and its assignment
must precede all the file name arguments.
Otherwise, the variable assignment is performed at a time determined by
its position among the input file arguments: after the processing of the
preceding input file argument. For example:
@example
awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
@end example
@noindent
prints the value of field number @code{n} for all input records. Before
the first file is read, the command line sets the variable @code{n}
equal to 4. This causes the fourth field to be printed in lines from
the file @file{inventory-shipped}. After the first file has finished,
but before the second file is started, @code{n} is set to 2, so that the
second field is printed in lines from @file{BBS-list}.
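The difference between the two kinds of assignment shows up most
clearly in a @code{BEGIN} rule. The following sketch uses a variable
named @code{n} and prints it between brackets so that an empty value
is visible:

@example
awk -v n=2 'BEGIN @{ print "[" n "]" @}'
awk 'BEGIN @{ print "[" n "]" @}' n=2
@end example

@noindent
The first command should print @samp{[2]}, since @samp{-v} sets
@code{n} before the @code{BEGIN} rule runs. The second should print
@samp{[]}, since the plain assignment has not yet been performed when
the @code{BEGIN} rule runs.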
Command line arguments are made available for explicit examination by
the @code{awk} program in an array named @code{ARGV} (@pxref{Built-in
Variables}).
@node Arithmetic Ops, Concatenation, Variables, Expressions
@section Arithmetic Operators
@cindex arithmetic operators
@cindex operators, arithmetic
@cindex addition
@cindex subtraction
@cindex multiplication
@cindex division
@cindex remainder
@cindex quotient
@cindex exponentiation
The @code{awk} language uses the common arithmetic operators when
evaluating expressions. All of these arithmetic operators follow normal
precedence rules, and work as you would expect them to. This example
divides field three by field four, adds field two, stores the result
into field one, and prints the resulting altered input record:
@example
awk '@{ $1 = $2 + $3 / $4; print @}' inventory-shipped
@end example
The arithmetic operators in @code{awk} are:
@table @code
@item @var{x} + @var{y}
Addition.
@item @var{x} - @var{y}
Subtraction.
@item - @var{x}
Negation.
@item @var{x} * @var{y}
Multiplication.
@item @var{x} / @var{y}
Division. Since all numbers in @code{awk} are double-precision
floating point, the result is not rounded to an integer: @code{3 / 4}
has the value 0.75.
@item @var{x} % @var{y}
@c @cindex differences between @code{gawk} and @code{awk}
Remainder. The quotient is rounded toward zero to an integer,
multiplied by @var{y} and this result is subtracted from @var{x}.
This operation is sometimes known as ``trunc-mod''. The following
relation always holds:
@example
b * int(a / b) + (a % b) == a
@end example
One undesirable effect of this definition of remainder is that
@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus,
@example
-17 % 8 = -1
@end example
In other @code{awk} implementations, the signedness of the remainder
may be machine dependent.
@item @var{x} ^ @var{y}
@itemx @var{x} ** @var{y}
Exponentiation: @var{x} raised to the @var{y} power. @code{2 ^ 3} has
the value 8. The character sequence @samp{**} is equivalent to
@samp{^}.
@end table
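The values quoted in the table can be checked with a small sketch such
as the following. The last statement tests the remainder relation
given above for @code{a} equal to @minus{}17 and @code{b} equal to 8:

@example
awk 'BEGIN @{
       print 3 / 4
       print -17 % 8
       print 2 ^ 3
       print (8 * int(-17 / 8) + (-17 % 8) == -17)
     @}'
@end example

@noindent
With the definitions above, this should print @samp{0.75}, @samp{-1},
@samp{8} and @samp{1}, each on its own line.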
@node Concatenation, Comparison Ops, Arithmetic Ops, Expressions
@section String Concatenation
@cindex string operators
@cindex operators, string
@cindex concatenation
There is only one string operation: concatenation. It does not have a
specific operator to represent it. Instead, concatenation is performed by
writing expressions next to one another, with no operator. For example:
@example
awk '@{ print "Field number one: " $1 @}' BBS-list
@end example
@noindent
produces, for the first record in @file{BBS-list}:
@example
Field number one: aardvark
@end example
Without the space in the string constant after the @samp{:}, the line
would run together. For example:
@example
awk '@{ print "Field number one:" $1 @}' BBS-list
@end example
@noindent
produces, for the first record in @file{BBS-list}:
@example
Field number one:aardvark
@end example
Since string concatenation does not have an explicit operator, it is