home *** CD-ROM | disk | FTP | other *** search
- REGULAR EXPRESSION SYNTAX
- =========================
- A regular expression is zero or more branches, separated by |. It matches
- anything that matches one of the branches. A branch is zero or more pieces,
- concatenated. It matches a match for the first, followed by a match for the
- second, etc.
-
- A piece is an atom possibly followed by *, +, or ?.
- An atom followed by * matches a sequence of 0 or more matches of the atom.
- An atom followed by + matches a sequence of 1 or more matches of the atom.
- An atom followed by ? matches a match of the atom, or the null string.
-
- An atom is a regular expression in parentheses (matching a match for the
- regular expression), a range (see below),
- . (matching any single character),
- ^ (matching the null string at the beginning of the input string),
- $ (matching the null string at the end of the input string),
- a \ followed by a single character (matching that character),
- or a single character with no other significance (matching that character).
-
- A range is a sequence of characters enclosed in []. It normally matches any
- single character from the sequence. If the sequence begins with ^, it matches
- any single character not from the rest of the sequence. If two characters in
- the sequence are separated by -, this is shorthand for the full list of ASCII
- characters between them (e.g. [0-9] matches any decimal digit). To include a
- literal ] in the sequence, make it the first character (following a possible
- ^). To include a literal -, make it the first or last character. AMBIGUITY If
- a regular expression could match two different parts of the input string, it
- will match the one which begins earliest. If both begin in the same place but
- match different lengths, or match the same length in different ways, life gets
- messier, as follows.
-
- In general, the possibilities in a list of branches are considered in
- left-to-right order, the possibilities for *, +, and ? are considered
- longest-first, nested constructs are considered from the outermost in, and
- concatenated constructs are considered leftmost-first. The match that will be
- chosen is the one that uses the earliest possibility in the first choice that
- has to be made. If there is more than one choice, the next will be made in the
- same manner (earliest possibility) subject to the decision on the first
- choice. And so forth.
-
- For example, '(ab|a)b*c' could match 'abc' in one of two ways. The first
- choice is between 'ab' and 'a'; since 'ab' is earlier, and does lead to a
- successful overall match, it is chosen. Since the 'b' is already spoken for,
- the 'b*' must match its last possibility, the empty string, since it must
- respect the earlier choice.
-
- In the particular case where no |'s are present and there is only one *, +, or
- ?, the net effect is that the longest possible match will be chosen. So 'ab*',
- presented with 'xabbbby', will match 'abbbb'. Note that if 'ab*' is tried
- against 'xabyabbbz', it will match 'ab' just after 'x', due to the
- begins-earliest rule. (In effect, the decision on where to start the match is
- the first choice to be made, hence subsequent choices must respect it even if
- this leads them to less-preferred alternatives.
-
- REGULAR EXPRESSION SUBSTITUTION
- ===============================
- Substitutions are made according to the most recent RE search. Each instance
- of '$&' in the paste buffer is replaced by the string that matched the whole
- regular expression. Each instance of '$n', where n is a digit between 1 and 9,
- is replaced by the substrings that matched parenthesized expressions within
- the regular expression, with parenthesized expressions numbered in
- left-to-right order of their opening parentheses. To get a literal '$' or '\'
- into dest, prefix it with '\'.
-
-