Text File | 1992-07-21 | 5KB | 124 lines
Regular Expression matching functions.
Seven functions are supplied.
1. re_create(buffer,len,translate)
Creates a regular expression buffer, re, of type (REGEX *).
All three parameters are optional. BUFFER and LEN define a
user-supplied (fixed) buffer to hold the compiled regular
expression. If they are not supplied, the package automatically
allocates a buffer of the required size from the heap.
TRANSLATE is a 256-character translation table for characters.
If it is supplied, the package treats character c in all cases
as if it were translate[c].
2. re_free(re)
Frees all storage associated with the regular expression
buffer RE.
3. re_compile(patt,re)
Compiles the pattern string PATT into a regular expression
buffer RE. Special characters (defined in the header file
H.Chars) are as described below. The return value is 0 if the
compile succeeds, or a pointer to an error message if it
fails.
4. re_anchored_match(re,str,len,start,mem)
Matches string STR against the regular expression buffer RE.
The match starts at position START in STR, and must match
starting from there. The value returned is the number of
characters matched, or RE_FAIL if the match fails, or RE_ERROR
if an error occurs. The optional argument LEN is the length of
STR, and will be calculated if it is omitted (-1). The optional
argument MEM is a (REGMEM *) which will be filled with details
of the matched portions of the string. mem[0] holds the whole
matched string, and mem[1] to mem[9] hold the values matched
by the regular expression memories \1 to \9.
5. re_match(re,str,len,mem)
Matches string STR against the regular expression buffer RE.
The match may occur at any position in STR, and the value
returned is the position of the start of the match (0 to
strlen(str)), or RE_FAIL if the match fails, or RE_ERROR if an
error occurs. The optional argument LEN is the length of STR,
and will be calculated if it is omitted (-1). The optional
argument MEM is a (REGMEM *) which will be filled with details
of the matched portions of the string. mem[0] holds the whole
matched string, and mem[1] to mem[9] hold the values matched by
the regular expression memories \1 to \9. Because the return value
gives only the start of the match, mem[0] is the only way of
finding the matched string itself.
6. re_magic(string)
Tests STRING to see if it has any regular expression operators
in it. Returns 1 if so, 0 otherwise. Note that brackets, '(' and
')', are not treated as operators, as they are so common in
normal strings.
7. re_dump(re)
This is a debugging tool, and prints a formatted listing of the
regular expression in buffer RE.
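The TRANSLATE argument of re_create (function 1 above) is described
only as a 256-character table applied to every character. One
plausible use, shown here purely as an illustration, is
case-insensitive matching: if the table maps every letter to its
lower-case form, then 'A' and 'a' are treated alike. The sketch below
builds such a table. The name make_fold_table and the unsigned char
element type are assumptions for this example and are not part of the
package.

```c
#include <ctype.h>

/* Hypothetical helper (not part of the regex package): fill TABLE so
   that table[c] is the lower-case form of c for every character code.
   The element type is an assumption; the package documentation only
   says "256-character translation table". */
void make_fold_table(unsigned char table[256])
{
    int c;

    for (c = 0; c < 256; c++)
        table[c] = (unsigned char)tolower(c);
}
```

A table built this way would then be passed as the third argument of
re_create, presumably with the pattern written in lower case.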
Special characters.
The special characters which can be used in a regular expression
pattern are as follows:
Character expressions (ce's)
^      Start of a line
$      End of a line
\`     Start of the string
\'     End of the string
\<     Start of a word
\>     End of a word
\@     A word boundary
.      Any character
\w     A word character
[...]  A set of characters
\c     Where `c' is a special character. Matches c.
c      Where `c' is any non-special character. Matches itself.
Operators
~      Not. `~ce' matches anything but ce.
|      Or. `re1 | re2' matches either re1 or re2.
*      Repeat. `re*' matches re repeated 0 or more times.
+      Many. `re+' matches re repeated 1 or more times.
?      Optional. `re?' matches re or nothing.
(...)  Bracketing. `(re)' matches the same as re.
Memory.
\{     Start memory.
\}     End memory.
\n     Match the characters in memory 'n' (n is 1-9).
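The operators \{...\} do not affect matching, but the
characters matched by the expression between the n'th \{ and
its corresponding \} are saved in memory n, for use in \n
matches, and to return in the MEM array. For example,
`\{a+\}b\1' can match `aabaa': memory 1 saves `aa', which \1
must then match again after the `b'.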
Words.
Word characters are digits and letters (i.e. \w is the same as
the expression [a-zA-Z0-9]). A word boundary is the position
between a word character and a non-word character (in either
order).
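The word rules above can be sketched in C as follows. This is only an
illustration of the definition, not the package's own code. The names
is_word_char and is_word_boundary are hypothetical, and the treatment
of the two ends of the string (counted here as if a non-word character
lay beyond them) is an assumption, since the text does not say how
string edges are handled.

```c
#include <stddef.h>

/* Hypothetical helper: true for the characters in [a-zA-Z0-9].
   The range tests assume a character set, such as ASCII, in which
   letters and digits are contiguous. */
static int is_word_char(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical helper: position POS lies between str[pos-1] and
   str[pos]. It is a word boundary when exactly one of those two
   neighbours is a word character. Assumption: anything outside the
   string counts as a non-word character. */
int is_word_boundary(const char *str, size_t len, size_t pos)
{
    int before = pos > 0   && is_word_char((unsigned char)str[pos - 1]);
    int after  = pos < len && is_word_char((unsigned char)str[pos]);

    return before != after;
}
```

In the string "ab cd", for instance, this reports boundaries at
positions 0, 2, 3 and 5, and none at positions 1 and 4.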
Character sets.
Within sets, [...], all characters are taken literally, except \,
], and -. The '-' character indicates a range, unless it is the
first or last character, when it indicates a literal '-'. The '\'
character causes the following character to be taken literally
(as in \-, \\, \]). In addition the normal C escape sequences \b,
\f, \n, \r, \t, \v are available.
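As a rough illustration of these rules, the sketch below reads the
body of a set into a table of 256 membership flags. It is not the
package's parser. The names set_char and parse_set are hypothetical,
and two behaviours are assumptions: a range whose end is lower than
its start adds nothing, and an unterminated set is reported by
returning NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode one set member at *pp, honouring '\'.
   Advances *pp past the member. Returns -1 at the end of the string. */
static int set_char(const char **pp)
{
    const char *p = *pp;
    int c = (unsigned char)*p;

    if (c == '\0')
        return -1;
    p++;
    if (c == '\\') {
        c = (unsigned char)*p;
        if (c == '\0')
            return -1;
        p++;
        switch (c) {
        case 'b': c = '\b'; break;
        case 'f': c = '\f'; break;
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
        case 't': c = '\t'; break;
        case 'v': c = '\v'; break;
        default:  break;        /* \-, \\, \] and the rest: literal */
        }
    }
    *pp = p;
    return c;
}

/* Hypothetical helper: P points just after the opening '['. Fills SET
   with one flag per character code and returns a pointer just past
   the closing ']', or NULL if the set is never closed. */
const char *parse_set(const char *p, unsigned char set[256])
{
    memset(set, 0, 256);
    while (*p != ']') {
        int lo, hi, c;

        lo = set_char(&p);
        if (lo < 0)
            return NULL;
        hi = lo;
        /* '-' forms a range only when another member follows it.
           A '-' in first position is consumed above as a literal,
           and a '-' just before ']' is left for the next pass. */
        if (p[0] == '-' && p[1] != ']' && p[1] != '\0') {
            p++;
            hi = set_char(&p);
            if (hi < 0)
                return NULL;
        }
        for (c = lo; c <= hi; c++)
            set[c] = 1;
    }
    return p + 1;
}
```

Given the set body a-c\]-] (that is, the pattern [a-c\]-]), this marks
a, b, c, ] and - as members: the first '-' forms a range, the ']' is
escaped, and the final '-' is literal because it comes last.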