- REGEXP(3) C LIBRARY FUNCTIONS REGEXP(3)
-
-
-
- NAME
- regcomp, regexec, regsub, regerror - regular expression
- handler
-
- SYNOPSIS
- #include <regexp.h>
-
- regexp *regcomp(exp)
- char *exp;
-
- int regexec(prog, string)
- regexp *prog;
- char *string;
-
- regsub(prog, source, dest)
- regexp *prog;
- char *source;
- char *dest;
-
- regerror(msg)
- char *msg;
-
- DESCRIPTION
- These functions implement egrep(1)-style regular expressions
- and supporting facilities.
-
- Regcomp compiles a regular expression into a structure of
- type regexp, and returns a pointer to it. The space has
- been allocated using malloc(3) and may be released by free.
-
- Regexec matches a NUL-terminated string against the compiled
- regular expression in prog. It returns 1 for success and 0
- for failure, and adjusts the contents of prog's startp and
- endp (see below) accordingly.
-
- The members of a regexp structure include at least the
- following (not necessarily in order):
-
- char *startp[NSUBEXP];
- char *endp[NSUBEXP];
-
- where NSUBEXP is defined (as 10) in the header file. Once a
- successful regexec has been done using the regexp, each
- startp-endp pair describes one substring within the string,
- with the startp pointing to the first character of the
- substring and the endp pointing to the first character
- following the substring. The 0th substring is the substring
- of string that matched the whole regular expression. The
- others are those substrings that matched parenthesized
- expressions within the regular expression, with parenthesized
- expressions numbered in left-to-right order of their opening
- parentheses.
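-
- As a rough illustration of how a startp/endp pair is used, the
- sketch below copies substring n into a caller's buffer; the
- length is simply endp[n] - startp[n]. The names demo_match,
- DEMO_NSUBEXP, and demo_copy_submatch are hypothetical stand-ins
- (not part of regexp.h), and the sketch assumes that an unused
- pair holds NULL, which this page does not promise.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NSUBEXP 10   /* mirrors the documented value of NSUBEXP */

/* Hypothetical stand-in holding only the two documented members. */
struct demo_match {
    const char *startp[DEMO_NSUBEXP];
    const char *endp[DEMO_NSUBEXP];
};

/*
 * Copy substring n into buf as a NUL-terminated string.
 * Returns its length, or -1 if n is out of range, the pair is
 * unset (assumed NULL), or buf is too small.
 */
long demo_copy_submatch(const struct demo_match *m, int n,
                        char *buf, size_t bufsize)
{
    size_t len;

    assert(m != NULL && buf != NULL);
    if (n < 0 || n >= DEMO_NSUBEXP)
        return -1;
    if (m->startp[n] == NULL || m->endp[n] == NULL)
        return -1;

    /* endp points one past the last matched character. */
    len = (size_t)(m->endp[n] - m->startp[n]);
    if (len >= bufsize)
        return -1;

    memcpy(buf, m->startp[n], len);
    buf[len] = '\0';
    return (long)len;
}
```
-
- With a real regexp, the same arithmetic would apply to
- prog->startp[n] and prog->endp[n] after regexec returns 1.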
-
-
-
- Sun Release 4.1 Last change: local 1
-
- Regsub copies source to dest, making substitutions according
- to the most recent regexec performed using prog. Each
- instance of `&' in source is replaced by the substring
- indicated by startp[0] and endp[0]. Each instance of `\n',
- where n is a digit, is replaced by the substring indicated
- by startp[n] and endp[n]. To get a literal `&' or `\n' into
- dest, prefix it with `\'; to get a literal `\' preceding `&'
- or `\n', prefix it with another `\'.
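-
- The substitution rules above can be sketched as follows. This
- is an illustrative reimplementation, not the library's code:
- demo_match and demo_regsub are hypothetical names, and the
- destsize parameter is an addition (the documented regsub takes
- no size argument). It also assumes that a pair which took no
- part in the match holds NULL, and substitutes nothing for it.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NSUBEXP 10   /* mirrors the documented value of NSUBEXP */

/* Hypothetical stand-in holding only the two documented members. */
struct demo_match {
    const char *startp[DEMO_NSUBEXP];
    const char *endp[DEMO_NSUBEXP];
};

/* Returns 0 on success, -1 if dest is too small. */
int demo_regsub(const struct demo_match *m, const char *source,
                char *dest, size_t destsize)
{
    size_t out = 0;

    assert(m != NULL && source != NULL && dest != NULL);
    while (*source != '\0') {
        char c = *source++;
        int no = -1;            /* substring number, or -1 for none */

        if (c == '&')
            no = 0;
        else if (c == '\\' && *source >= '0' && *source <= '9')
            no = *source++ - '0';

        if (no < 0) {
            /* Ordinary character; `\\' and `\&' drop the first `\'. */
            if (c == '\\' && (*source == '\\' || *source == '&'))
                c = *source++;
            if (out + 1 >= destsize)
                return -1;
            dest[out++] = c;
        } else if (m->startp[no] != NULL && m->endp[no] != NULL) {
            size_t len = (size_t)(m->endp[no] - m->startp[no]);

            if (out + len >= destsize)
                return -1;
            memcpy(dest + out, m->startp[no], len);
            out += len;
        }
    }
    if (out >= destsize)
        return -1;
    dest[out] = '\0';
    return 0;
}
```
-
- Passing the buffer size is a defensive choice for the sketch;
- with the documented three-argument regsub the caller has to
- make sure dest is large enough beforehand.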
-
- Regerror is called whenever an error is detected in regcomp,
- regexec, or regsub. The default regerror writes the string
- msg, with a suitable indicator of origin, on the standard
- error output and invokes exit(2). Regerror can be replaced
- by the user if other actions are desirable.
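-
- Because regerror is an ordinary function, a program can supply
- its own definition. A minimal sketch, assuming the program
- would rather record the message than exit; last_regexp_error
- and demo_last_regexp_error are hypothetical names. Note that
- POSIX <regex.h> declares an unrelated regerror with a different
- signature, so a definition like this one should not be combined
- with that header.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical storage for the most recent message. */
static char last_regexp_error[128];

/* Replacement with the signature given in the SYNOPSIS: record the
 * message (truncating if necessary) and return instead of exiting. */
void regerror(char *msg)
{
    assert(msg != NULL);
    snprintf(last_regexp_error, sizeof last_regexp_error, "%s", msg);
}

/* Hypothetical accessor; returns "" if no error has been recorded. */
const char *demo_last_regexp_error(void)
{
    return last_regexp_error;
}
```
-
- With a replacement that returns, a failing regcomp can hand
- NULL back to its caller, as DIAGNOSTICS describes.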
-
- REGULAR EXPRESSION SYNTAX
- A regular expression is zero or more branches, separated by
- `|'. It matches anything that matches one of the branches.
-
- A branch is zero or more pieces, concatenated. It matches a
- match for the first, followed by a match for the second,
- etc.
-
- A piece is an atom possibly followed by `*', `+', or `?'.
- An atom followed by `*' matches a sequence of 0 or more
- matches of the atom. An atom followed by `+' matches a
- sequence of 1 or more matches of the atom. An atom followed
- by `?' matches a match of the atom, or the null string.
-
- An atom is a regular expression in parentheses (matching a
- match for the regular expression), a range (see below), `.'
- (matching any single character), `^' (matching the null
- string at the beginning of the input string), `$' (matching
- the null string at the end of the input string), a `\'
- followed by a single character (matching that character), or
- a single character with no other significance (matching that
- character).
-
- A range is a sequence of characters enclosed in `[]'. It
- normally matches any single character from the sequence. If
- the sequence begins with `^', it matches any single character
- not from the rest of the sequence. If two characters in
- the sequence are separated by `-', this is shorthand for the
- full list of ASCII characters between them (e.g. `[0-9]'
- matches any decimal digit). To include a literal `]' in the
- sequence, make it the first character (following a possible
- `^'). To include a literal `-', make it the first or last
- character.
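-
- The pieces, atoms, and ranges described above can be tried out
- with the sketch below. It is an assumption-laden stand-in: it
- uses the POSIX <regex.h> interface with REG_EXTENDED, not the
- regexp.h interface this page documents, because that is what
- most current systems provide. The POSIX regcomp and regexec
- share names with the functions here but take different
- arguments. demo_matches is a hypothetical helper, and the
- patterns are limited to constructs both syntaxes treat alike.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if text contains a match for pattern, 0 if it does not,
 * and -1 if the pattern fails to compile.
 */
int demo_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```
-
- For instance, demo_matches("^ab+c$", "ac") reports no match
- because `+' needs at least one `b', while
- demo_matches("^[]a]$", "]") reports a match because a leading
- `]' in a range is literal.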
-
- AMBIGUITY
- If a regular expression could match two different parts of
- the input string, it will match the one which begins
- earliest. If both begin in the same place but match different
- lengths, or match the same length in different ways,
- life gets messier, as follows.
-
- In general, the possibilities in a list of branches are
- considered in left-to-right order, the possibilities for `*',
- `+', and `?' are considered longest-first, nested constructs
- are considered from the outermost in, and concatenated
- constructs are considered leftmost-first. The match that will
- be chosen is the one that uses the earliest possibility in
- the first choice that has to be made. If there is more than
- one choice, the next will be made in the same manner
- (earliest possibility) subject to the decision on the first
- choice. And so forth.
-
- For example, `(ab|a)b*c' could match `abc' in one of two
- ways. The first choice is between `ab' and `a'; since `ab'
- is earlier, and does lead to a successful overall match, it
- is chosen. Since the `b' is already spoken for, the `b*'
- must match its last possibility (the empty string), since it
- must respect the earlier choice.
-
- In the particular case where no `|'s are present and there
- is only one `*', `+', or `?', the net effect is that the
- longest possible match will be chosen. So `ab*', presented
- with `xabbbby', will match `abbbb'. Note that if `ab*' is
- tried against `xabyabbbz', it will match `ab' just after
- `x', due to the begins-earliest rule. (In effect, the
- decision on where to start the match is the first choice to
- be made, hence subsequent choices must respect it even if
- this leads them to less-preferred alternatives.)
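-
- The two `ab*' cases above can be checked with the sketch
- below. As an explicit assumption, it uses the POSIX <regex.h>
- interface (REG_EXTENDED) rather than regexp.h, and demo_find
- is a hypothetical helper. POSIX resolves ambiguity by
- leftmost-longest rather than by the ordered choices described
- here, so the two can disagree when branches are involved (for
- `a|ab' against `ab', the rules on this page pick `a', whereas
- POSIX picks `ab'). The cases used below are ones where both
- rules give the same answer.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Finds the match for pattern in text. On success stores the offset
 * of the first matched character in *start and the offset just past
 * the last one in *end, and returns 1. Returns 0 if nothing matches
 * and -1 if the pattern fails to compile.
 */
int demo_find(const char *pattern, const char *text,
              size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    assert(pattern != NULL && text != NULL);
    assert(start != NULL && end != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    *start = (size_t)m[0].rm_so;
    *end = (size_t)m[0].rm_eo;
    return 1;
}
```
-
- The returned offsets play the same role as startp[0] and
- endp[0]: the first is where the match begins, the second is
- one past where it ends.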
-
- SEE ALSO
- egrep(1), expr(1)
-
- DIAGNOSTICS
- Regcomp returns NULL for a failure (regerror permitting),
- where failures are syntax errors, exceeding implementation
- limits, or applying `+' or `*' to a possibly-null operand.
-
- HISTORY
- Both code and manual page were written at the University of
- Toronto. They are intended to be compatible with the Bell V8
- regexp(3), but are not derived from Bell code.
-
- BUGS
- Empty branches and empty regular expressions are not
- portable to V8.
-
- The restriction against applying `*' or `+' to a
- possibly-null operand is an artifact of the simplistic
- implementation.
-
- Does not support egrep's newline-separated branches; neither
- does the V8 regexp(3), though.
-
- Due to emphasis on compactness and simplicity, it's not
- strikingly fast. It does give special attention to handling
- simple cases quickly.