Regular expressions

Regular expressions are a concise and flexible notation for finding and replacing patterns of text. The Alias Map component uses the Regex++ Regular Expression Engine (Copyright Dr John Maddock 1998-2001). For further information about the syntax, see the Regex++ documentation.

When a regular expression contains characters, it usually means that the text being searched must match those characters. However, regular expressions use a number of characters that have a special meaning. The special characters are case sensitive: for example, \S is not the same as \s.

The ^ character indicates that the match is to start at the beginning of the input text.

The $ character indicates that the match is to be applied to the end of the input text. It does not match line breaks in the Alias Map component. Generally, use it only in the top-level rule to match the end of the input text for the alias name. Use $ elsewhere only with the utmost care, because it might match the end of a subtext passed to a subrule instead of matching only the end of the entire input text, and this can lead to unexpected results. For example, the same counter might be inserted multiple times, once for each subtext passed to the counter-matching rule.

The [^...] construct matches any character that is not specified in the set of characters after the ^. For example, [^A-Z] matches any character that is not an uppercase English letter.

The + character repeats the preceding expression one or more times.

The * character repeats the preceding expression zero or more times.

The \ character is the escape character that you use to match characters that have a special meaning in regular expressions, such as the following characters: , . ? { } [ ] ( ) $ ^ *. For example, to match the { character, you would specify \{.

The \s sequence matches any whitespace character, such as a space or a tab.

The ( and ) characters group a subexpression.
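
The special characters described above can be tried out in a short sketch. It uses Python's re module rather than Regex++ (an assumption made for illustration; the two engines agree on these basic constructs but can differ elsewhere). The functions strip_trailing_counter and apply_to_subtexts are hypothetical stand-ins for a subrule and a top-level rule, and are not part of the Alias Map API. The second half shows why $ needs care: once the input has been split into subtexts, $ matches the end of each subtext.

```python
import re

# Python's re module stands in for Regex++ here (an assumption).

# ^ anchors the match at the start of the input text.
print(bool(re.search(r"^Q", "Q1_age")))      # True
print(bool(re.search(r"^Q", "myQ1")))        # False

# + requires at least one repetition; * also accepts none.
print(bool(re.fullmatch(r"ab+", "a")))       # False
print(bool(re.fullmatch(r"ab*", "a")))       # True

# \ escapes a special character, \s matches whitespace, ( ) groups.
print(re.sub(r"\{", "(", "a{b"))             # a(b
print(re.sub(r"\s", "_", "first name"))      # first_name
print(bool(re.fullmatch(r"(ab)+", "abab")))  # True


def strip_trailing_counter(text):
    """Hypothetical subrule: remove digits at the end of `text`."""
    return re.sub(r"[0-9]+$", "", text)


def apply_to_subtexts(name):
    """Hypothetical top-level rule: pass each '_'-separated subtext to the subrule."""
    return "_".join(strip_trailing_counter(part) for part in name.split("_"))


# Applied to the whole input, $ matches once, at the very end.
print(strip_trailing_counter("Q12_item7"))   # Q12_item

# Applied per subtext, $ matches at the end of every subtext.
print(apply_to_subtexts("Q12_item7"))        # Q_item
```

In the last call the digits in Q12 are removed as well, because each subtext has its own end. This is the kind of repeated match that the description of $ warns about.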

The expression [^A-Z] matches any character that is not an uppercase English letter.

The expression [0-9]+$ matches one or more digits in the range 0 to 9 that appear at the end of the input text.

The expression ^\$ matches a dollar sign at the start of the input text.

The expression \[|\] matches either the [ character or the ] character.
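
The four descriptions above can be checked with patterns such as the following. This is a minimal sketch that assumes Python's re module in place of Regex++, and the patterns are ones that fit the descriptions rather than text taken from the product. The sample inputs are made up for illustration.

```python
import re

# Illustrative patterns run with Python's re module (an assumption;
# the Alias Map component itself uses Regex++).
not_uppercase = re.compile(r"[^A-Z]")     # any character except A to Z
trailing_digits = re.compile(r"[0-9]+$")  # digits at the end of the text
leading_dollar = re.compile(r"^\$")       # a literal $ at the start
brackets = re.compile(r"\[|\]")           # a literal [ or a literal ]

print(not_uppercase.findall("Q1a"))              # ['1', 'a']
print(trailing_digits.search("item42").group())  # 42
print(bool(leading_dollar.search("$total")))     # True
print(bool(leading_dollar.search("total$")))     # False
print(brackets.sub("", "grid[3]"))               # grid3
```

Note that the dollar sign and the brackets must be escaped with \ to be matched literally; unescaped, they keep their special meaning.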

Regular expressions are specified in the Alias Map component using the IRule object, which has a Locale property. This property controls the locale settings used to parse the regular expression. The exact behavior of localized regular expressions tends to differ between operating systems. The Alias Map component does not support locale-independent Unicode operations. You are therefore advised to avoid using the word (\w) syntax and to use Unicode ranges instead.

When you cannot specify a range as a sequence, specify the characters individually or as combinations of ranges. For example, [0-9a-fA-F] specifies a combination of ranges to recognize hexadecimal values and is not locale-specific.
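
The difference between an explicit combination of ranges and the word syntax can be sketched as follows. Python's re module is used here as an assumption for illustration, and is_hex is a hypothetical helper name. The exact behavior of \w in Regex++ depends on the locale settings described above, so the second half only shows that \w is engine-dependent and setting-dependent in general.

```python
import re

# A combination of explicit ranges: the same characters match everywhere.
HEX = re.compile(r"[0-9a-fA-F]+")


def is_hex(text):
    """Return True if `text` consists only of hexadecimal digits."""
    return HEX.fullmatch(text) is not None


print(is_hex("1aF9"))  # True
print(is_hex("1aG9"))  # False

# By contrast, what \w accepts depends on the engine and its settings.
# In Python, \w is Unicode-aware for str patterns unless re.ASCII is set.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
print(bool(re.fullmatch(r"\w", e_acute)))            # True
print(bool(re.fullmatch(r"\w", e_acute, re.ASCII)))  # False
```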

To handle more complex problems, such as deciding what is a valid letter or a valid variable name, you often need to know what counts as a letter. The regular expression escape codes that are supported for answering such questions are locale-specific and are therefore best avoided.

The following example avoids using locale-specific escape codes to approximate the Unicode variable names defined in section 7.6 of the ECMAScript specification, on which the current definition of valid UNICOM Intelligence Data Model names is based. Although this example is reasonably successful at defining useful variable names, note that it is an approximation and is subject to change.