# rxb, a simple regular expression builder (by Ka-Ping Yee, 20 Sept 1996)
#
# 1996-10-22: changed to backslash-parens everywhere as a workaround for the
#             regex.symcomp() bug pointed out by William S. Lear
#
# 1996-11-08: bug reported by Jonathan Giddy
#             literal parentheses no longer escaped
#
# 2000-01-26: converted for re module; added sub, split, followedby

"""rxb, a simple regular expression builder (by Ka-Ping Yee, 20 Sept 1996)

From an idea by Greg Ewing on comp.lang.python.

This module encapsulates the construction and functionality of regular
expressions in a class named 'Pattern'.  To build 'Pattern's, use the
functions and constants in this module; you should not need to instantiate
the 'Pattern' class directly unless you are actually supplying a real
(awk-style) regular expression.

You can concatenate 'Pattern' instances using the '+' operator or repeat
them using the '*' operator with a number.

The available functions are:

    exactly()           :: exactly the given string
    anybut()            :: text not containing the string
    member(, , ...)     :: any single char mentioned
    nonmember(, , ...)  :: any single char not mentioned
    maybe()             :: zero or one occurrence
    some()              :: one or more occurrences
    any()               :: zero or more occurrences
    either(, , ...)     :: one of the alternatives
    label(, )           :: label a subgroup for later
    followedby()        :: positive lookahead assertion
    notfollowedby()     :: negative lookahead assertion

For 'label' you can also use the alternate, more concise syntax

    label.()

The 'followedby' and 'notfollowedby' functions indicate that you want to
look for a match after a particular point, or make sure that there is
*not* a match after a particular point, without actually consuming any of
the string being matched.

The first four functions only accept literal strings.  The rest all accept
either literals or 'Pattern's otherwise created by this module.
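As a rough illustration of the building functions above, here is a minimal
sketch of how a few of them could map onto 're' syntax.  It is not this
module's actual implementation: the simplified 'Pattern' class, the helpers
'_to_pattern' and '_group', and the name 'any_' (chosen to avoid shadowing
the builtin) are assumptions made for the example.

```python
import re


class Pattern:
    # Simplified stand-in (an assumption): holds only the regex source text.
    def __init__(self, regex):
        self.regex = regex

    def __add__(self, other):
        # Pattern + Pattern, or Pattern + literal string
        return Pattern(self.regex + _to_pattern(other).regex)

    def __radd__(self, other):
        # literal string + Pattern
        return Pattern(_to_pattern(other).regex + self.regex)

    def __mul__(self, count):
        # Pattern * n repeats the pattern exactly n times
        return Pattern("(?:%s){%d}" % (self.regex, count))


def _to_pattern(item):
    # Literal strings are escaped; Patterns pass through unchanged.
    if isinstance(item, Pattern):
        return item
    return Pattern(re.escape(item))


def _group(item):
    # Non-capturing group, so a following operator applies to the whole item.
    return "(?:%s)" % _to_pattern(item).regex


def exactly(text):
    return _to_pattern(text)

def maybe(item):
    return Pattern(_group(item) + "?")

def some(item):
    return Pattern(_group(item) + "+")

def any_(item):
    return Pattern(_group(item) + "*")

def either(*items):
    return Pattern("(?:" + "|".join(_group(i) for i in items) + ")")

def label(name, item):
    return Pattern("(?P<%s>%s)" % (name, _to_pattern(item).regex))

def followedby(item):
    return Pattern("(?=%s)" % _to_pattern(item).regex)

def notfollowedby(item):
    return Pattern("(?!%s)" % _to_pattern(item).regex)


# Usage: literals are converted automatically by '+' and by the functions.
assignment = (label("key", some(either("a", "b"))) + "=" + maybe("-")
              + label("value", some(either("0", "1"))))
m = re.match(assignment.regex, "ab=-10")
print(m.group("key"), m.group("value"))

# Lookahead checks what follows without consuming it.
foo_not_bar = exactly("foo") + notfollowedby("bar")
print(re.search(foo_not_bar.regex, "foobar foobaz").start())
```

In this sketch every operand is wrapped in a non-capturing group before an
operator is applied, so 'some("ab")' repeats the whole string "ab" rather
than only its last character.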
Note that 'exactly()' is necessary only if used alone, since any string
will be converted from a literal to a 'Pattern' by any of the other
operations (including '+').

'member()' and 'nonmember()' accept any literal characters or strings of
characters among their arguments, as well as the special constants
'letters', 'digits', 'hexdigits', 'wordchars', and 'whitespace' from this
module.  (The corresponding constants starting with 'non' do not work
here.)  You can also pass 'member()' or 'nonmember()' a sequence created
using 'chrange(, )'.

For your convenience, the following 'Pattern' constants are also available:

    letter, letters         :: any small or capital letter
    digit, digits           :: any digit
    wordchar, wordchars     :: letter, digit, or underscore
    hexdigit, hexdigits     :: any hexadecimal digit
    whitespace              :: space, return, newline, tab
    anychar, anychars       :: any single character
    nonletter, nondigit, nonwordchar, nonhexdigit, or nonwhitespace
                            :: any char other than the indicated type
    begline, endline        :: beginning or end of line
    anything                :: any number of non-newlines
    something               :: one or more non-newlines
    anyspace                :: any amount of whitespace
    somespace               :: one or more whitespace chars

When you're done constructing, use these 'Pattern' methods to do real work:

    match([, ])             :: match at beginning of string or at index
    search([, ])            :: find anywhere in string or after index
    sub(repl, string[, ])   :: substitute (at most 'count' times)
    subn(repl, string[, ])  :: substitute and also return count of hits
    split(string[, ])       :: split (into at most given # of pieces)
    imatch([, ])            :: case-insensitive match
    isearch([, ])           :: case-insensitive search

Each 'Pattern' will manage its own compilation.  If for some reason you
must get the compiled regular expression (compiled using Python's built-in
're' module) you can use the 'compile()' and 'icompile()' methods.
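To make the character-set helpers and the matching methods above more
concrete, the following sketch wraps Python's 're' module in a similar way.
It is an approximation under stated assumptions, not this module's code:
the constants are modeled as plain strings, 'member()' and 'nonmember()'
return regex text rather than 'Pattern's, the '_compiled' cache is a
hypothetical detail, and 'split' hands its count straight to 're' as a
maximum number of splits (which may differ from a limit on pieces).

```python
import re
import string

# Assumed stand-ins for the character-set constants, as plain strings.
letters = string.ascii_letters
digits = string.digits
hexdigits = string.hexdigits
wordchars = letters + digits + "_"
whitespace = " " + chr(13) + chr(10) + chr(9)  # space, return, newline, tab


def chrange(first, last):
    # All characters from first to last, inclusive.
    return "".join(chr(c) for c in range(ord(first), ord(last) + 1))


def _charclass(chars, negate):
    # Escape every character so it is taken literally inside the brackets.
    body = "".join(re.escape(c) for c in "".join(chars))
    return ("[^" if negate else "[") + body + "]"


def member(*chars):
    return _charclass(chars, negate=False)


def nonmember(*chars):
    return _charclass(chars, negate=True)


class Pattern:
    def __init__(self, regex):
        self.regex = regex
        self._compiled = {}  # flags -> compiled regex, filled on demand

    def compile(self, flags=0):
        # Compile lazily and remember the result for later calls.
        if flags not in self._compiled:
            self._compiled[flags] = re.compile(self.regex, flags)
        return self._compiled[flags]

    def icompile(self):
        return self.compile(re.IGNORECASE)

    def match(self, text, index=0):
        return self.compile().match(text, index)

    def search(self, text, index=0):
        return self.compile().search(text, index)

    def imatch(self, text, index=0):
        return self.icompile().match(text, index)

    def isearch(self, text, index=0):
        return self.icompile().search(text, index)

    def sub(self, repl, text, count=0):
        return self.compile().sub(repl, text, count=count)

    def subn(self, repl, text, count=0):
        return self.compile().subn(repl, text, count=count)

    def split(self, text, count=0):
        # count is passed through as re's maxsplit (0 means no limit).
        return self.compile().split(text, maxsplit=count)


# Usage
number = Pattern(member(digits) + "+")
print(number.sub("N", "a1b22c333", 2))
print(number.subn("N", "a1b22c333"))
print(Pattern(member(whitespace) + "+").split("a b  c"))
print(Pattern("cafe").isearch("run #CAFE42 now").group())
```

Caching one compiled object per flag value is one simple way to let each
pattern manage its own compilation while still supporting both the
case-sensitive and case-insensitive methods.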
The following 'group' method and attributes work either on the 'Match'
object returned by 'match', 'search', 'imatch', or 'isearch', or on the
'Pattern' object itself, where they refer to the last match or search
attempt.

    found       :: the entire string that matched
    before      :: everything before what matched
    after       :: everything after what matched
    group(