This is Info file gawk.info, produced by Makeinfo-1.55 from the input
file /gnu-src/gawk-2.15.6/gawk.texi.
This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.
This is Edition 0.15 of `The GAWK Manual',
for the 2.15 version of the GNU implementation
of AWK.
Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
File: gawk.info, Node: I/O Functions, Next: Time Functions, Prev: String Functions, Up: Built-in
Built-in Functions for Input/Output
===================================
`close(FILENAME)'
Close the file FILENAME, for input or output. The argument may
alternatively be a shell command that was used for redirecting to
or from a pipe; then the pipe is closed.
*Note Closing Input Files and Pipes: Close Input, regarding closing
input files and pipes. *Note Closing Output Files and Pipes:
Close Output, regarding closing output files and pipes.
`system(COMMAND)'
The `system' function allows the user to execute operating system
commands and then return to the `awk' program. The `system'
function executes the command given by the string COMMAND. It
returns, as its value, the status returned by the command that was
executed.
For example, if the following fragment of code is put in your `awk'
program:
END {
    system("mail -s 'awk run done' operator < /dev/null")
}
the system operator will be sent mail when the `awk' program
finishes processing input and begins its end-of-input processing.
Note that much the same result can be obtained by redirecting
`print' or `printf' into a pipe. However, if your `awk' program
is interactive, `system' is useful for cranking up large
self-contained programs, such as a shell or an editor.
Some operating systems cannot implement the `system' function.
`system' causes a fatal error if it is not supported.
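As a small sketch of the return value described above, the following
captures the status of a command run through `system'. The command
`exit 3' and the shell variable `status' are chosen only for
illustration; the value reported for unusual terminations (such as
signals) may differ between `awk' implementations.

```shell
status=$(awk 'BEGIN {
    rc = system("exit 3")    # run a shell command that exits with status 3
    print rc                 # system returns that status
}')
echo "$status"
```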
Controlling Output Buffering with `system'
------------------------------------------
Many utility programs will "buffer" their output; they save
information to be written to a disk file or terminal in memory, until
there is enough to be written in one operation. This is often more
efficient than writing every little bit of information as soon as it is
ready. However, sometimes it is necessary to force a program to
"flush" its buffers; that is, write the information to its destination,
even if a buffer is not full. You can do this from your `awk' program
by calling `system' with a null string as its argument:
system("") # flush output
`gawk' treats this use of the `system' function as a special case, and
is smart enough not to run a shell (or other command interpreter) with
the empty command. Therefore, with `gawk', this idiom is not only
useful, it is efficient. While this idiom should work with other `awk'
implementations, it will not necessarily avoid starting an unnecessary
shell.
File: gawk.info, Node: Time Functions, Prev: I/O Functions, Up: Built-in
Functions for Dealing with Time Stamps
======================================
A common use for `awk' programs is the processing of log files. Log
files often contain time stamp information, indicating when a
particular log record was written. Many programs log their time stamp
in the form returned by the `time' system call, which is the number of
seconds since a particular epoch. On POSIX systems, it is the number
of seconds since Midnight, January 1, 1970, UTC.
In order to make it easier to process such log files, and to easily
produce useful reports, `gawk' provides two functions for working with
time stamps. Both of these are `gawk' extensions; they are not
specified in the POSIX standard, nor are they in any other known version
of `awk'.
`systime()'
This function returns the current time as the number of seconds
since the system epoch. On POSIX systems, this is the number of
seconds since Midnight, January 1, 1970, UTC. It may be a
different number on other systems.
`strftime(FORMAT, TIMESTAMP)'
This function returns a string. It is similar to the function of
the same name in the ANSI C standard library. The time specified
by TIMESTAMP is used to produce a string, based on the contents of
the FORMAT string.
The `systime' function allows you to compare a time stamp from a log
file with the current time of day. In particular, it is easy to
determine how long ago a particular record was logged. It also allows
you to produce log records using the "seconds since the epoch" format.
The `strftime' function allows you to easily turn a time stamp into
human-readable information. It is similar in nature to the `sprintf'
function, copying non-format specification characters verbatim to the
returned string, and substituting date and time values for format
specifications in the FORMAT string. If no TIMESTAMP argument is
supplied, `gawk' will use the current time of day as the time stamp.
`strftime' is guaranteed by the ANSI C standard to support the
following date format specifications:
`%a'
The locale's abbreviated weekday name.
`%A'
The locale's full weekday name.
`%b'
The locale's abbreviated month name.
`%B'
The locale's full month name.
`%c'
The locale's "appropriate" date and time representation.
`%d'
The day of the month as a decimal number (01-31).
`%H'
The hour (24-hour clock) as a decimal number (00-23).
`%I'
The hour (12-hour clock) as a decimal number (01-12).
`%j'
The day of the year as a decimal number (001-366).
`%m'
The month as a decimal number (01-12).
`%M'
The minute as a decimal number (00-59).
`%p'
The locale's equivalent of the AM/PM designations associated with
a 12-hour clock.
`%S'
The second as a decimal number (00-61). (Occasionally there are
minutes in a year with one or two leap seconds, which is why the
seconds can go from 0 all the way to 61.)
`%U'
The week number of the year (the first Sunday as the first day of
week 1) as a decimal number (00-53).
`%w'
The weekday as a decimal number (0-6). Sunday is day 0.
`%W'
The week number of the year (the first Monday as the first day of
week 1) as a decimal number (00-53).
`%x'
The locale's "appropriate" date representation.
`%X'
The locale's "appropriate" time representation.
`%y'
The year without century as a decimal number (00-99).
`%Y'
The year with century as a decimal number.
`%Z'
The time zone name or abbreviation, or no characters if no time
zone is determinable.
`%%'
A literal `%'.
If a conversion specifier is not one of the above, the behavior is
undefined. (This is because the ANSI standard for C leaves the
behavior of the C version of `strftime' undefined, and `gawk' will use
the system's version of `strftime' if it's there. Typically, the
conversion specifier will either not appear in the returned string, or
it will appear literally.)
Informally, a "locale" is the geographic place in which a program is
meant to run. For example, a common way to abbreviate the date
September 4, 1991 in the United States would be "9/4/91". In many
countries in Europe, however, it would be abbreviated "4.9.91". Thus,
the `%x' specification in a `"US"' locale might produce `9/4/91', while
in a `"EUROPE"' locale, it might produce `4.9.91'. The ANSI C standard
defines a default `"C"' locale, which is an environment that is typical
of what most C programmers are used to.
A public-domain C version of `strftime' is shipped with `gawk' for
systems that are not yet fully ANSI-compliant. If that version is used
to compile `gawk' (*note Installing `gawk': Installation.), then the
following additional format specifications are available:
`%D'
Equivalent to specifying `%m/%d/%y'.
`%e'
The day of the month, padded with a blank if it is only one digit.
`%h'
Equivalent to `%b', above.
`%n'
A newline character (ASCII LF).
`%r'
Equivalent to specifying `%I:%M:%S %p'.
`%R'
Equivalent to specifying `%H:%M'.
`%T'
Equivalent to specifying `%H:%M:%S'.
`%t'
A TAB character.
`%k'
The hour (24-hour clock) as a decimal number (0-23).
Single digit numbers are padded with a blank.
`%l'
The hour (12-hour clock) as a decimal number (1-12).
Single digit numbers are padded with a blank.
`%C'
The century, as a number between 00 and 99.
`%u'
The weekday as a decimal number [1 (Monday)-7].
`%V'
The week number of the year (the first Monday as
the first day of week 1) as a decimal number (01-53). The method
for determining the week number is as specified by ISO 8601 (to
wit: if the week containing January 1 has four or more days in the
new year, then it is week 1, otherwise it is week 53 of the
previous year and the next week is week 1).
`%Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI'
`%Om %OM %OS %Ou %OU %OV %Ow %OW %Oy'
These are "alternate representations" for the specifications that
use only the second letter (`%c', `%C', and so on). They are
recognized, but their normal representations are used. (These
facilitate compliance with the POSIX `date' utility.)
`%v'
The date in VMS format (e.g. 20-JUN-1991).
Here are two examples that use `strftime'. The first is an `awk'
version of the C `ctime' function. (This is a user defined function,
which we have not discussed yet. *Note User-defined Functions:
User-defined, for more information.)
# ctime.awk
#
# awk version of C ctime(3) function
function ctime(ts,    format)
{
    format = "%a %b %e %H:%M:%S %Z %Y"
    if (ts == 0)
        ts = systime()       # use current time as default
    return strftime(format, ts)
}
This next example is an `awk' implementation of the POSIX `date'
utility. Normally, the `date' utility prints the current date and time
of day in a well known format. However, if you provide an argument to
it that begins with a `+', `date' will copy non-format specifier
characters to the standard output, and will interpret the current time
according to the format specifiers in the string. For example:
date '+Today is %A, %B %d, %Y.'
might print
Today is Thursday, July 11, 1991.
Here is the `awk' version of the `date' utility.
#! /bin/gawk -f
#
# date --- implement the P1003.2 Draft 11 'date' command
#
# Bug: does not recognize the -u argument.
BEGIN \
{
    format = "%a %b %e %H:%M:%S %Z %Y"
    exitval = 0
    if (ARGC > 2)
        exitval = 1
    else if (ARGC == 2) {
        format = ARGV[1]
        if (format ~ /^\+/)
            format = substr(format, 2)    # remove leading +
    }
    print strftime(format)
    exit exitval
}
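For a result that does not depend on when or where it is run, a fixed
time stamp can be passed as the second argument to `strftime'. The
following is only a sketch: it assumes that the `awk' on your system is
`gawk' (or another version that provides `strftime'), and it sets `TZ'
to `UTC' so that the local time zone does not affect the output. The
shell variable name `stamp' is arbitrary.

```shell
stamp=$(TZ=UTC awk 'BEGIN {
    # 86400 seconds after the POSIX epoch is midnight, January 2, 1970, UTC
    print strftime("%Y-%m-%d %H:%M:%S", 86400)
}')
echo "$stamp"
```

With those assumptions, this should print `1970-01-02 00:00:00'.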
File: gawk.info, Node: User-defined, Next: Built-in Variables, Prev: Built-in, Up: Top
User-defined Functions
**********************
Complicated `awk' programs can often be simplified by defining your
own functions. User-defined functions can be called just like built-in
ones (*note Function Calls::.), but it is up to you to define them--to
tell `awk' what they should do.
* Menu:
* Definition Syntax:: How to write definitions and what they mean.
* Function Example:: An example function definition and
what it does.
* Function Caveats:: Things to watch out for.
* Return Statement:: Specifying the value a function returns.
File: gawk.info, Node: Definition Syntax, Next: Function Example, Prev: User-defined, Up: User-defined
Syntax of Function Definitions
==============================
Definitions of functions can appear anywhere between the rules of the
`awk' program. Thus, the general form of an `awk' program is extended
to include sequences of rules *and* user-defined function definitions.
The definition of a function named NAME looks like this:
function NAME (PARAMETER-LIST) {
    BODY-OF-FUNCTION
}
NAME is the name of the function to be defined. A valid function name
is like a valid variable name: a sequence of letters, digits and
underscores, not starting with a digit. Functions share the same pool
of names as variables and arrays.
PARAMETER-LIST is a list of the function's arguments and local
variable names, separated by commas. When the function is called, the
argument names are used to hold the argument values given in the call.
The local variables are initialized to the null string.
The BODY-OF-FUNCTION consists of `awk' statements. It is the most
important part of the definition, because it says what the function
should actually *do*. The argument names exist to give the body a way
to talk about the arguments; local variables, to give the body places
to keep temporary values.
Argument names are not distinguished syntactically from local
variable names; instead, the number of arguments supplied when the
function is called determines how many argument variables there are.
Thus, if three argument values are given, the first three names in
PARAMETER-LIST are arguments, and the rest are local variables.
It follows that if the number of arguments is not the same in all
calls to the function, some of the names in PARAMETER-LIST may be
arguments on some occasions and local variables on others. Another way
to think of this is that omitted arguments default to the null string.
Usually when you write a function you know how many names you intend
to use for arguments and how many you intend to use as locals. By
convention, you should write an extra space between the arguments and
the locals, so other people can follow how your function is supposed to
be used.
During execution of the function body, the arguments and local
variable values hide or "shadow" any variables of the same names used
in the rest of the program. The shadowed variables are not accessible
in the function definition, because there is no way to name them while
their names have been taken away for the local variables. All other
variables used in the `awk' program can be referenced or set normally
in the function definition.
The arguments and local variables last only as long as the function
body is executing. Once the body finishes, the shadowed variables come
back.
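The following sketch illustrates local variables and shadowing. The
function name `digits' and the variable names are hypothetical, chosen
only for this illustration. The function is called with one argument,
so `i' and `s' act as local variables that start out as the null
string, and the global `i' is untouched by the call.

```shell
result=$(awk '
function digits(n,    i, s) {    # n is the argument; i and s are locals
    for (i = 1; i <= n; i++)
        s = s i                  # s starts out as the null string
    return s
}
BEGIN {
    i = "outer"                  # a global that shares a name with a local
    print digits(3)
    print i                      # the global is unchanged by the call
}')
echo "$result"
```

This should print `123' followed by `outer'.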
The function body can contain expressions which call functions. They
can even call this function, either directly or by way of another
function. When this happens, we say the function is "recursive".
There is no need in `awk' to put the definition of a function before
all uses of the function. This is because `awk' reads the entire
program before starting to execute any of it.
In many `awk' implementations, the keyword `function' may be
abbreviated `func'. However, POSIX only specifies the use of the
keyword `function'. This actually has some practical implications. If
`gawk' is in POSIX-compatibility mode (*note Invoking `awk': Command
Line.), then the following statement will *not* define a function:
func foo() { a = sqrt($1) ; print a }
Instead it defines a rule that, for each record, concatenates the value
of the variable `func' with the return value of the function `foo', and
based on the truth value of the result, executes the corresponding
action. This is probably not what was desired. (`awk' accepts this
input as syntactically valid, since functions may be used before they
are defined in `awk' programs.)
File: gawk.info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined
Function Definition Example
===========================
Here is an example of a user-defined function, called `myprint', that
takes a number and prints it in a specific format.
function myprint(num)
{
    printf "%6.3g\n", num
}
To illustrate, here is an `awk' rule which uses our `myprint' function:
$3 > 0 { myprint($3) }
This program prints, in our special format, all the third fields that
contain a positive number in our input. Therefore, when given:
1.2 3.4 5.6 7.8
9.10 11.12 -13.14 15.16
17.18 19.20 21.22 23.24
this program, using our function to format the results, prints:
   5.6
  21.2
Here is a rather contrived example of a recursive function. It
prints a string backwards:
function rev (str, len) {
    if (len == 0) {
        printf "\n"
        return
    }
    printf "%c", substr(str, len, 1)
    rev(str, len - 1)
}
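One way to try `rev' is to call it from a `BEGIN' rule, passing the
string and its length. This is only a sketch; the test string
`"hello"' and the shell variable `result' are arbitrary choices.

```shell
result=$(awk '
function rev (str, len) {
    if (len == 0) {
        printf "\n"
        return
    }
    printf "%c", substr(str, len, 1)    # print the last remaining character
    rev(str, len - 1)                   # then reverse the rest
}
BEGIN { rev("hello", length("hello")) }')
echo "$result"
```

This should print `olleh'.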
File: gawk.info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined
Calling User-defined Functions
==============================
"Calling a function" means causing the function to run and do its
job. A function call is an expression, and its value is the value
returned by the function.
A function call consists of the function name followed by the
arguments in parentheses. What you write in the call for the arguments
are `awk' expressions; each time the call is executed, these
expressions are evaluated, and the values are the actual arguments. For
example, here is a call to `foo' with three arguments (the first being
a string concatenation):
foo(x y, "lose", 4 * z)
*Caution:* whitespace characters (spaces and tabs) are not allowed
between the function name and the open-parenthesis of the argument
list. If you write whitespace by mistake, `awk' might think that
you mean to concatenate a variable with an expression in
parentheses. However, it notices that you used a function name
and not a variable name, and reports an error.
When a function is called, it is given a *copy* of the values of its
arguments. This is called "call by value". The caller may use a
variable as the expression for the argument, but the called function
does not know this: it only knows what value the argument had. For
example, if you write this code:
foo = "bar"
z = myfunc(foo)
then you should not think of the argument to `myfunc' as being "the
variable `foo'." Instead, think of the argument as the string value,
`"bar"'.
If the function `myfunc' alters the values of its local variables,
this has no effect on any other variables. In particular, if `myfunc'
does this:
function myfunc (win) {
    print win
    win = "zzz"
    print win
}
to change its first argument variable `win', this *does not* change the
value of `foo' in the caller. The role of `foo' in calling `myfunc'
ended when its value, `"bar"', was computed. If `win' also exists
outside of `myfunc', the function body cannot alter this outer value,
because it is shadowed during the execution of `myfunc' and cannot be
seen or changed from there.
However, when arrays are the parameters to functions, they are *not*
copied. Instead, the array itself is made available for direct
manipulation by the function. This is usually called "call by
reference". Changes made to an array parameter inside the body of a
function *are* visible outside that function. This can be *very*
dangerous if you do not watch what you are doing. For example:
function changeit (array, ind, nvalue) {
    array[ind] = nvalue
}
BEGIN {
    a[1] = 1 ; a[2] = 2 ; a[3] = 3
    changeit(a, 2, "two")
    printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
}
prints `a[1] = 1, a[2] = two, a[3] = 3', because calling `changeit'
stores `"two"' in the second element of `a'.
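The two behaviors can be seen side by side in one short program. In
this sketch the function name `bump' and the variable names are
hypothetical. The scalar argument is passed by value, so the caller's
`x' keeps its value, while the array argument is passed by reference,
so the caller's array is changed.

```shell
result=$(awk '
function bump(scalar, arr) {
    scalar = scalar + 1          # changes only the local copy
    arr["n"] = arr["n"] + 1      # changes the array of the caller
}
BEGIN {
    x = 1
    a["n"] = 1
    bump(x, a)
    print x, a["n"]
}')
echo "$result"
```

This should print `1 2'.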
File: gawk.info, Node: Return Statement, Prev: Function Caveats, Up: User-defined
The `return' Statement
======================
The body of a user-defined function can contain a `return' statement.
This statement returns control to the rest of the `awk' program. It
can also be used to return a value for use in the rest of the `awk'
program. It looks like this:
return EXPRESSION
The EXPRESSION part is optional. If it is omitted, then the returned
value is undefined and, therefore, unpredictable.
A `return' statement with no value expression is assumed at the end
of every function definition. So if control reaches the end of the
function body, then the function returns an unpredictable value. `awk'
will not warn you if you use the return value of such a function; you
will simply get unpredictable or unexpected results.
Here is an example of a user-defined function that returns a value
for the largest number among the elements of an array:
function maxelt (vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
You call `maxelt' with one argument, which is an array name. The local
variables `i' and `ret' are not intended to be arguments; while there
is nothing to stop you from passing two or three arguments to `maxelt',
the results would be strange. The extra space before `i' in the
function parameter list is to indicate that `i' and `ret' are not
supposed to be arguments. This is a convention which you should follow
when you define functions.
Here is a program that uses our `maxelt' function. It loads an
array, calls `maxelt', and then reports the maximum number in that
array:
awk '
function maxelt (vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
# Load all fields of each record into nums.
{
    for (i = 1; i <= NF; i++)
        nums[NR, i] = $i
}
END {
    print maxelt(nums)
}'
Given the following input:
1 5 23 8 16
44 3 5 2 8 26
256 291 1396 2962 100
-6 467 998 1101
99385 11 0 225
our program tells us (predictably) that:
99385
is the largest number in our array.
File: gawk.info, Node: Built-in Variables, Next: Command Line, Prev: User-defined, Up: Top
Built-in Variables
******************
Most `awk' variables are available for you to use for your own
purposes; they never change except when your program assigns values to
them, and never affect anything except when your program examines them.
A few variables have special built-in meanings. Some of them `awk'
examines automatically, so that they enable you to tell `awk' how to do
certain things. Others are set automatically by `awk', so that they
carry information from the internal workings of `awk' to your program.
This chapter documents all the built-in variables of `gawk'. Most
of them are also documented in the chapters where their areas of
activity are described.
* Menu:
* User-modified:: Built-in variables that you change
to control `awk'.
* Auto-set:: Built-in variables where `awk'
gives you information.
File: gawk.info, Node: User-modified, Next: Auto-set, Prev: Built-in Variables, Up: Built-in Variables
Built-in Variables that Control `awk'
=====================================
This is a list of the variables which you can change to control how
`awk' does certain things.
`CONVFMT'
This string is used by `awk' to control conversion of numbers to
strings (*note Conversion of Strings and Numbers: Conversion.).
It works by being passed, in effect, as the first argument to the
`sprintf' function. Its default value is `"%.6g"'. `CONVFMT' was
introduced by the POSIX standard.
`FIELDWIDTHS'
This is a space-separated list of columns that tells `gawk' how to
manage input with fixed, columnar boundaries. It is an
experimental feature that is still evolving. Assigning to
`FIELDWIDTHS' overrides the use of `FS' for field splitting.
*Note Reading Fixed-width Data: Constant Size, for more
information.
If `gawk' is in compatibility mode (*note Invoking `awk': Command
Line.), then `FIELDWIDTHS' has no special meaning, and field
splitting operations are done based exclusively on the value of
`FS'.
`FS'
`FS' is the input field separator (*note Specifying how Fields are
Separated: Field Separators.). The value is a single-character
string or a multi-character regular expression that matches the
separations between fields in an input record.
The default value is `" "', a string consisting of a single space.
As a special exception, this value actually means that any
sequence of spaces and tabs is a single separator. It also causes
spaces and tabs at the beginning or end of a line to be ignored.
You can set the value of `FS' on the command line using the `-F'
option:
awk -F, 'PROGRAM' INPUT-FILES
If `gawk' is using `FIELDWIDTHS' for field-splitting, assigning a
value to `FS' will cause `gawk' to return to the normal,
regexp-based, field splitting.
`IGNORECASE'
If `IGNORECASE' is nonzero, then *all* regular expression matching
is done in a case-independent fashion. In particular, regexp
matching with `~' and `!~', and the `gsub', `index', `match',
`split' and `sub' functions all ignore case when doing their
particular regexp operations. *Note:* since field splitting with
the value of the `FS' variable is also a regular expression
operation, that too is done with case ignored. *Note
Case-sensitivity in Matching: Case-sensitivity.
If `gawk' is in compatibility mode (*note Invoking `awk': Command
Line.), then `IGNORECASE' has no special meaning, and regexp
operations are always case-sensitive.
`OFMT'
This string is used by `awk' to control conversion of numbers to
strings (*note Conversion of Strings and Numbers: Conversion.) for
printing with the `print' statement. It works by being passed, in
effect, as the first argument to the `sprintf' function. Its
default value is `"%.6g"'. Earlier versions of `awk' also used
`OFMT' to specify the format for converting numbers to strings in
general expressions; this has been taken over by `CONVFMT'.
`OFS'
This is the output field separator (*note Output Separators::.).
It is output between the fields output by a `print' statement. Its
default value is `" "', a string consisting of a single space.
`ORS'
This is the output record separator. It is output at the end of
every `print' statement. Its default value is a string containing
a single newline character, which could be written as `"\n"'.
(*Note Output Separators::.)
`RS'
This is `awk''s input record separator. Its default value is a
string containing a single newline character, which means that an
input record consists of a single line of text. (*Note How Input
is Split into Records: Records.)
`SUBSEP'
`SUBSEP' is the subscript separator. It has the default value of
`"\034"', and is used to separate the parts of the name of a
multi-dimensional array. Thus, if you access `foo[12,3]', it
really accesses `foo["12\0343"]' (*note Multi-dimensional Arrays:
Multi-dimensional.).
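As a brief sketch of how some of these variables interact, the first
command below splits a record on commas with `FS' and rejoins it with
`OFS', and the second shows that a two-part subscript is really a
single string joined by `SUBSEP'. The input line, the array name `foo',
and the shell variable names are chosen only for illustration.

```shell
joined=$(printf 'a,b,c\n' | awk '
BEGIN { FS = ","; OFS = "-" }
{
    $1 = $1      # assigning to a field rebuilds the record using OFS
    print
}')
echo "$joined"

parts=$(awk 'BEGIN {
    foo[12, 3] = "x"
    for (key in foo) {
        n = split(key, piece, SUBSEP)   # undo the SUBSEP join
        print n, piece[1], piece[2]
    }
}')
echo "$parts"
```

The first command should print `a-b-c', and the second `2 12 3'.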
File: gawk.info, Node: Auto-set, Prev: User-modified, Up: Built-in Variables
Built-in Variables that Convey Information
==========================================
This is a list of the variables that are set automatically by `awk'
on certain occasions so as to provide information to your program.
`ARGC'
`ARGV'
The command-line arguments available to `awk' programs are stored
in an array called `ARGV'. `ARGC' is the number of command-line
arguments present. *Note Invoking `awk': Command Line. `ARGV' is
indexed from zero to `ARGC - 1'. For example:
awk 'BEGIN {
    for (i = 0; i < ARGC; i++)
        print ARGV[i]
}' inventory-shipped BBS-list
In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
`"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'. The
value of `ARGC' is 3, one more than the index of the last element
in `ARGV' since the elements are numbered from zero.
The names `ARGC' and `ARGV', as well as the convention of indexing
the array from 0 to `ARGC - 1', are derived from the C language's
method of accessing command line arguments.
Notice that the `awk' program is not entered in `ARGV'. The other
special command line options, with their arguments, are also not
entered. But variable assignments on the command line *are*
treated as arguments, and do show up in the `ARGV' array.
Your program can alter `ARGC' and the elements of `ARGV'. Each
time `awk' reaches the end of an input file, it uses the next
element of `ARGV' as the name of the next input file. By storing a
different string there, your program can change which files are
read. You can use `"-"' to represent the standard input. By
storing additional elements and incrementing `ARGC' you can cause
additional files to be read.
If you decrease the value of `ARGC', that eliminates input files
from the end of the list. By recording the old value of `ARGC'
elsewhere, your program can treat the eliminated arguments as
something other than file names.
To eliminate a file from the middle of the list, store the null
string (`""') into `ARGV' in place of the file's name. As a
special feature, `awk' ignores file names that have been replaced
with the null string.
`ARGIND'
The index in `ARGV' of the current file being processed. Every
time `gawk' opens a new data file for processing, it sets `ARGIND'
to the index in `ARGV' of the file name. Thus, the condition
`FILENAME == ARGV[ARGIND]' is always true.
This variable is useful in file processing; it allows you to tell
how far along you are in the list of data files, and to
distinguish between multiple successive instances of the same
filename on the command line.
While you can change the value of `ARGIND' within your `awk'
program, `gawk' will automatically set it to a new value when the
next file is opened.
This variable is a `gawk' extension; in other `awk' implementations
it is not special.
`ENVIRON'
This is an array that contains the values of the environment. The
array indices are the environment variable names; the values are
the values of the particular environment variables. For example,
`ENVIRON["HOME"]' might be `/u/close'. Changing this array does
not affect the environment passed on to any programs that `awk'
may spawn via redirection or the `system' function. (In a future
version of `gawk', it may do so.)
Some operating systems may not have environment variables. On
such systems, the array `ENVIRON' is empty.
`ERRNO'
If a system error occurs during a redirection for `getline',
during a read for `getline', or during a `close' operation, then
`ERRNO' will contain a string describing the error.
This variable is a `gawk' extension; in other `awk' implementations
it is not special.
`FILENAME'
This is the name of the file that `awk' is currently reading. If
`awk' is reading from the standard input (in other words, there
are no files listed on the command line), `FILENAME' is set to
`"-"'. `FILENAME' is changed each time a new file is read (*note
Reading Input Files: Reading Files.).
`FNR'
`FNR' is the current record number in the current file. `FNR' is
incremented each time a new record is read (*note Explicit Input
with `getline': Getline.). It is reinitialized to 0 each time a
new input file is started.
`NF'
`NF' is the number of fields in the current input record. `NF' is
set each time a new record is read, when a new field is created,
or when `$0' changes (*note Examining Fields: Fields.).
`NR'
This is the number of input records `awk' has processed since the
beginning of the program's execution (*note How Input is Split
into Records: Records.). `NR' is set each time a new record is
read.
`RLENGTH'
`RLENGTH' is the length of the substring matched by the `match'
function (*note Built-in Functions for String Manipulation: String
Functions.). `RLENGTH' is set by invoking the `match' function.
Its value is the length of the matched string, or -1 if no match
was found.
`RSTART'
`RSTART' is the start-index in characters of the substring matched
by the `match' function (*note Built-in Functions for String
Manipulation: String Functions.). `RSTART' is set by invoking the
`match' function. Its value is the position of the string where
the matched substring starts, or 0 if no match was found.
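The rules for altering `ARGV' described above can be sketched as
follows. The two command-line arguments are hypothetical placeholder
names, not real files. The `BEGIN' rule replaces the first with the
null string, so it is skipped, and the second with `"-"', so the
standard input is read in its place.

```shell
result=$(printf 'from stdin\n' | awk '
BEGIN {
    ARGV[1] = ""     # drop the first file name from the list
    ARGV[2] = "-"    # read the standard input in place of the second
}
{ print NR ": " $0 }' first-placeholder second-placeholder)
echo "$result"
```

This should print `1: from stdin'.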
File: gawk.info, Node: Command Line, Next: Language History, Prev: Built-in Variables, Up: Top
Invoking `awk'
**************
There are two ways to run `awk': with an explicit program, or with
one or more program files. Here are templates for both of them; items
enclosed in `[...]' in these templates are optional.
Besides traditional one-letter POSIX-style options, `gawk' also
supports GNU long named options.
awk [POSIX OR GNU STYLE OPTIONS] -f progfile [`--'] FILE ...
awk [POSIX OR GNU STYLE OPTIONS] [`--'] 'PROGRAM' FILE ...
* Menu:
* Options:: Command line options and their meanings.
* Other Arguments:: Input file names and variable assignments.
* AWKPATH Variable:: Searching directories for `awk' programs.
* Obsolete:: Obsolete Options and/or features.
* Undocumented:: Undocumented Options and Features.
File: gawk.info, Node: Options, Next: Other Arguments, Prev: Command Line, Up: Command Line
Command Line Options
====================
Options begin with a minus sign, and consist of a single character.
GNU style long named options consist of two minus signs and a keyword
that can be abbreviated if the abbreviation allows the option to be
uniquely identified. If the option takes an argument, then the keyword
is immediately followed by an equals sign (`=') and the argument's
value. For brevity, the discussion below only refers to the
traditional short options; however the long and short options are
interchangeable in all contexts.
Each long named option for `gawk' has a corresponding POSIX-style
option. The options and their meanings are as follows:
`-F FS'
`--field-separator=FS'
Sets the `FS' variable to FS (*note Specifying how Fields are
Separated: Field Separators.).
`-f SOURCE-FILE'
`--file=SOURCE-FILE'
Indicates that the `awk' program is to be found in SOURCE-FILE
instead of in the first non-option argument.
`-v VAR=VAL'
`--assign=VAR=VAL'
Sets the variable VAR to the value VAL *before* execution of the
program begins. Such variable values are available inside the
`BEGIN' rule (see below for a fuller explanation).
The `-v' option can only set one variable, but you can use it more
than once, setting another variable each time, like this:
`-v foo=1 -v bar=2'.
`-W GAWK-OPT'
Following the POSIX standard, options that are implementation
specific are supplied as arguments to the `-W' option. With
`gawk', these arguments may be separated by commas, or quoted and
separated by whitespace. Case is ignored when processing these
options. These options also have corresponding GNU style long
named options. The following `gawk'-specific options are
available:
`-W compat'
`--compat'
Specifies "compatibility mode", in which the GNU extensions in
`gawk' are disabled, so that `gawk' behaves just like Unix
`awk'. *Note Extensions in `gawk' not in POSIX `awk':
POSIX/GNU, which summarizes the extensions. Also see *Note
Downward Compatibility and Debugging: Compatibility Mode.
`-W copyleft'
`-W copyright'
`--copyleft'
`--copyright'
Print the short version of the General Public License. This
option may disappear in a future version of `gawk'.
`-W help'
`-W usage'
`--help'
`--usage'
Print a "usage" message summarizing the short and long style
options that `gawk' accepts, and then exit.
`-W lint'
`--lint'
Provide warnings about constructs that are dubious or
non-portable to other `awk' implementations. Some warnings
are issued when `gawk' first reads your program. Others are
issued at run-time, as your program executes.
`-W posix'
`--posix'
Operate in strict POSIX mode. This disables all `gawk'
extensions (just like `-W compat'), and adds the following
additional restrictions:
* `\x' escape sequences are not recognized (*note Constant
Expressions: Constants.).
* The synonym `func' for the keyword `function' is not
recognized (*note Syntax of Function Definitions:
Definition Syntax.).
* The operators `**' and `**=' cannot be used in place of
`^' and `^=' (*note Arithmetic Operators: Arithmetic
Ops., and also *note Assignment Expressions: Assignment
Ops.).
* Specifying `-Ft' on the command line does not set the
value of `FS' to be a single tab character (*note
Specifying how Fields are Separated: Field Separators.).
Although you can supply both `-W compat' and `-W posix' on the
command line, `-W posix' will take precedence.
`-W source=PROGRAM-TEXT'
`--source=PROGRAM-TEXT'
Program source code is taken from the PROGRAM-TEXT. This
option allows you to mix `awk' source code in files with
program source code that you would enter on the command line.
This is particularly useful when you have library functions
that you wish to use from your command line programs (*note
The `AWKPATH' Environment Variable: AWKPATH Variable.).
`-W version'
`--version'
Prints version information for this particular copy of `gawk'.
This is so you can determine if your copy of `gawk' is up to
date with respect to whatever the Free Software Foundation is
currently distributing. This option may disappear in a
future version of `gawk'.
`--'
Signals the end of the command line options. The following
arguments are not treated as options even if they begin with `-'.
This interpretation of `--' follows the POSIX argument parsing
conventions.
This is useful if you have file names that start with `-', or in
shell scripts, if you have file names that will be specified by
the user which could start with `-'.
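A short sketch of the two most commonly used short options from the
list above, `-F' and `-v'. The sample input line is made up; only the
traditional one-letter forms are used, so the commands do not depend on
`gawk'.

```shell
# -F sets FS; here the fields are separated by colons.
fields=$(echo "root:x:0:0" | awk -F: '{ print $1, $3 }')
echo "$fields"

# Each -v sets one variable before the program starts, BEGIN included.
sum=$(awk -v foo=1 -v bar=2 'BEGIN { print foo + bar }')
echo "$sum"
```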
Any other options are flagged as invalid with a warning message, but
are otherwise ignored.
In compatibility mode, as a special case, if the value of FS supplied
to the `-F' option is `t', then `FS' is set to the tab character
(`"\t"'). This is only true for `-W compat', and not for `-W posix'
(*note Specifying how Fields are Separated: Field Separators.).
If the `-f' option is *not* used, then the first non-option command
line argument is expected to be the program text.
The `-f' option may be used more than once on the command line. If
it is, `awk' reads its program source from all of the named files, as
if they had been concatenated together into one big file. This is
useful for creating libraries of `awk' functions. Useful functions can
be written once, and then retrieved from a standard place, instead of
having to be included into each individual program. You can still type
in a program at the terminal and use library functions, by specifying
`-f /dev/tty'. `awk' will read a file from the terminal to use as part
of the `awk' program. After typing your program, type `Control-d' (the
end-of-file character) to terminate it. (You may also use `-f -' to
read program source from the standard input, but then you will not be
able to also use the standard input as a source of data.)
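The library technique described above might look like the following
sketch. The file names `lib.awk' and `main.awk' and the function
`twice' are hypothetical, chosen only for this example.

```shell
# Two scratch program files; all names here are illustrative only.
tmp=$(mktemp -d)
printf 'function twice(x) { return 2 * x }\n' > "$tmp/lib.awk"
printf '{ print twice($1) }\n' > "$tmp/main.awk"

# awk reads both files as if they were one concatenated program.
doubled=$(echo 21 | awk -f "$tmp/lib.awk" -f "$tmp/main.awk")
echo "$doubled"

rm -rf "$tmp"
```

Because the files are treated as one program, the rule in `main.awk'
can call the function defined in `lib.awk'.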
Because mixing source files and command line `awk' programs is clumsy
with the standard `awk' mechanisms, `gawk' provides the `--source'
option. This does not require you to pre-empt the standard
input for your source code, and allows you to easily mix command line
and library source code (*note The `AWKPATH' Environment Variable:
AWKPATH Variable.).
If no `-f' or `--source' option is specified, then `gawk' will use
the first non-option command line argument as the text of the program
source code.
File: gawk.info, Node: Other Arguments, Next: AWKPATH Variable, Prev: Options, Up: Command Line
Other Command Line Arguments
============================
Any additional arguments on the command line are normally treated as
input files to be processed in the order specified. However, an
argument of the form `VAR=VALUE' assigns the value VALUE to the
variable VAR; it does not specify a file at all.
All these arguments are made available to your `awk' program in the
`ARGV' array (*note Built-in Variables::.). Command line options and
the program text (if present) are omitted from the `ARGV' array. All
other arguments, including variable assignments, are included.
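To see what ends up in `ARGV', a small sketch such as the following can
help. The arguments `x=1' and `datafile' are placeholders; since the
program has only a `BEGIN' rule, `datafile' is never opened and need
not exist.

```shell
# The -v option and the program text are not part of ARGV;
# the assignment x=1 and the file name are.
args=$(awk -v y=2 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' x=1 datafile)
echo "$args"
```

The loop starts at 1 because element 0 holds the name of the `awk'
program itself, which varies between installations.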
The distinction between file name arguments and variable-assignment
arguments is made when `awk' is about to open the next input file. At
that point in execution, it checks the "file name" to see whether it is
really a variable assignment; if so, `awk' sets the variable instead of
reading a file.
Therefore, the variables actually receive the specified values after
all previously specified files have been read. In particular, the
values of variables assigned in this fashion are *not* available inside
a `BEGIN' rule (*note `BEGIN' and `END' Special Patterns: BEGIN/END.),
since such rules are run before `awk' begins scanning the argument list.
The values given on the command line are processed for escape sequences
(*note Constant Expressions: Constants.).
In some earlier implementations of `awk', when a variable assignment
occurred before any file names, the assignment would happen *before*
the `BEGIN' rule was executed. Some applications came to depend upon
this "feature." When `awk' was changed to be more consistent, the `-v'
option was added to accommodate applications that depended upon this
old behavior.
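The difference between the two kinds of assignment can be sketched as
follows. The variable name `x' is arbitrary.

```shell
# A command line assignment has not yet happened when BEGIN runs,
# so x is still empty here.
late=$(awk 'BEGIN { print "[" x "]" }' x=5)
echo "$late"

# With -v the value is already set inside the BEGIN rule.
early=$(awk -v x=5 'BEGIN { print "[" x "]" }')
echo "$early"
```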
The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before scanning the data files. It is also useful for
controlling state if multiple passes are needed over a data file. For
example:
awk 'pass == 1 { PASS 1 STUFF }
pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile
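One possible way to fill in that template is sketched below: the first
pass sums a column and the second pass prints each value with its share
of the total. The data file and its contents are made up for
illustration.

```shell
# Scratch data file; contents are illustrative only.
tmp=$(mktemp -d)
printf '1\n3\n' > "$tmp/data"

# pass=1 is assigned before the first read of the file,
# pass=2 before the second read of the same file.
shares=$(awk 'pass == 1 { total += $1 }
pass == 2 { print $1, $1 / total }' pass=1 "$tmp/data" pass=2 "$tmp/data")
echo "$shares"

rm -rf "$tmp"
```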
Given the variable assignment feature, the `-F' option is not
strictly necessary. It remains for historical compatibility.
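As a brief sketch of that equivalence, the two commands below split the
same made-up input line on colons, once with `-F' and once with an
`FS' assignment placed before the file name.

```shell
# Scratch data file; contents are illustrative only.
tmp=$(mktemp -d)
printf 'root:x:0\n' > "$tmp/data"

with_option=$(awk -F: '{ print $1 }' "$tmp/data")
with_assignment=$(awk '{ print $1 }' FS=: "$tmp/data")

echo "$with_option"
echo "$with_assignment"
rm -rf "$tmp"
```

The assignment form only takes effect when `awk' reaches it in the
argument list, so it must come before the file it should apply to.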
File: gawk.info, Node: AWKPATH Variable, Next: Obsolete, Prev: Other Arguments, Up: Command Line
The `AWKPATH' Environment Variable
==================================
The previous section described how `awk' program files can be named
on the command line with the `-f' option. In some `awk'
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
But in `gawk', if the file name supplied in the `-f' option does not
contain a `/', then `gawk' searches a list of directories (called the
"search path"), one by one, looking for a file with the specified name.
The search path is actually a string consisting of directory names
separated by colons. `gawk' gets its search path from the `AWKPATH'
environment variable. If that variable does not exist, `gawk' uses the
default path, which is `.:/local/lib/awk:/ade/lib/awk'. (Programs
written by system administrators should use an `AWKPATH' variable that
does not include the current directory, `.'.)
The search path feature is particularly useful for building up
libraries of useful `awk' functions. The library files can be placed
in a standard directory that is in the default path, and then specified
on the command line with a short file name. Otherwise, the full file
name would have to be typed for each file.
By combining the `--source' and `-f' options, your command line
`awk' programs can use facilities in `awk' library files.
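A sketch of this combination follows. It assumes a `gawk' that accepts
the `--source' long option and honors `AWKPATH'; the library directory,
the file `twice.awk' and the function `twice' are hypothetical. When no
`gawk' is installed the sketch only records that fact.

```shell
# A scratch library directory; all names here are illustrative only.
tmp=$(mktemp -d)
mkdir "$tmp/lib"
printf 'function twice(x) { return 2 * x }\n' > "$tmp/lib/twice.awk"

if command -v gawk >/dev/null 2>&1; then
    # twice.awk contains no `/', so gawk looks for it along AWKPATH.
    found=$(echo 21 | AWKPATH="$tmp/lib" gawk -f twice.awk --source='{ print twice($1) }')
else
    found="gawk not installed"
fi
echo "$found"

rm -rf "$tmp"
```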
Path searching is not done if `gawk' is in compatibility mode. This
is true for both `-W compat' and `-W posix'. *Note Command Line
Options: Options.
*Note:* if you want files in the current directory to be found, you
must include the current directory in the path, either by writing `.'
as an entry in the path, or by writing a null entry in the path. (A
null entry is indicated by starting or ending the path with a colon, or
by placing two colons next to each other (`::').) If the current
directory is not included in the path, then files cannot be found in
the current directory. This path search mechanism is identical to the
shell's.
File: gawk.info, Node: Obsolete, Next: Undocumented, Prev: AWKPATH Variable, Up: Command Line
Obsolete Options and/or Features
================================
This section describes features and/or command line options from the
previous release of `gawk' that are either not available in the current
version, or that are still supported but deprecated (meaning that they
will *not* be in the next release).
For version 2.15 of `gawk', the following command line options from
version 2.11.1 are no longer recognized.
`-c'
Use `-W compat' instead.
`-V'
Use `-W version' instead.
`-C'
Use `-W copyright' instead.
`-a'
`-e'
These options produce an "unrecognized option" error message but
have no effect on the execution of `gawk'. The POSIX standard now
specifies traditional `awk' regular expressions for the `awk'
utility.
The public-domain version of `strftime' that is distributed with
`gawk' changed for the 2.14 release. The `%V' conversion specifier
that used to generate the date in VMS format was changed to `%v'. This
is because the POSIX standard for the `date' utility now specifies a
`%V' conversion specifier. *Note Functions for Dealing with Time
Stamps: Time Functions, for details.
File: gawk.info, Node: Undocumented, Prev: Obsolete, Up: Command Line
Undocumented Options and Features
=================================
This section intentionally left blank.
File: gawk.info, Node: Language History, Next: Installation, Prev: Command Line, Up: Top
The Evolution of the `awk' Language
***********************************
This manual describes the GNU implementation of `awk', which is
patterned after the POSIX specification. Many `awk' users are only
familiar with the original `awk' implementation in Version 7 Unix,
which is also the basis for the version in Berkeley Unix (through
4.3-Reno). This chapter briefly describes the evolution of the `awk'
language.
* Menu:
* V7/S5R3.1:: The major changes between V7 and
System V Release 3.1.
* S5R4:: Minor changes between System V
Releases 3.1 and 4.
* POSIX:: New features from the POSIX standard.
* POSIX/GNU:: The extensions in `gawk'
not in POSIX `awk'.