Text pattern expressions perform operations on the sets of possible text values that one or more terms recognize.
A primary expression can be:
A text literal
A reference to a syntax or token rule
A primary expression repeated a specified number of times
Any character in a contiguous range of characters
An inline sequence of pattern declarations
The any wildcard, which matches any text value of length 1
The following grammar reflects this structure.
Primary:
  ReferencePrimary
  TextLiteral
  RepetitionPrimary
  CharacterClassPrimary
  InlineRulePrimary
  AnyPrimary
A character class is a compact syntax for a contiguous range of characters. It requires that both text literals be of length 1 and that the Unicode code point of the right operand be greater than that of the left.
CharacterClassPrimary:
  TextLiteral .. TextLiteral
The expression "0".."9" is equivalent to:
"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
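The expansion above can be sketched in Python by walking the code points between the two operands. This is an illustrative model only: the function name `char_class` is hypothetical, and the language processor is not assumed to expand classes this way internally.

```python
def char_class(lo: str, hi: str) -> set:
    """Expand the character class lo..hi into the set of length-1 texts it recognizes."""
    # Both operands must be text literals of length 1.
    if len(lo) != 1 or len(hi) != 1:
        raise ValueError("character class operands must be of length 1")
    # The right operand's code point must be greater than the left's.
    if ord(hi) <= ord(lo):
        raise ValueError("right operand must have a greater code point than the left")
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


digits = char_class("0", "9")  # same set as "0" | "1" | ... | "9"
```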
A reference primary is the name of another rule, possibly with arguments if that rule is parameterized. All rules defined within the same language can be accessed without qualification. The protocol for accessing rules defined in a different language within the same module is defined in Section 12.2; the protocol for accessing rules defined in a different module is defined in Section 13.3.
ReferencePrimary:
  GrammarReference
GrammarReference:
  Identifier
  GrammarReference . Identifier
  GrammarReference . Identifier ( TypeArguments )
  Identifier ( TypeArguments )
TypeArguments:
  PrimaryExpression
  TypeArguments , PrimaryExpression
Whitespace between a rule name and its argument list is significant: it distinguishes a reference to a parameterized rule from a reference to a rule without parameters followed by an inline rule. In a reference to a parameterized rule, no whitespace is permitted between the identifier and the arguments.
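The whitespace rule can be illustrated with a small Python classifier that looks only at what follows a (possibly dotted) rule name. It is a sketch under simplifying assumptions: the identifier syntax is approximated by `[A-Za-z_]\w*`, and `classify_reference` is a hypothetical helper, not part of any M tooling.

```python
import re

# Simplified identifier syntax (an assumption): letters, digits, underscores, optionally dotted.
_REFERENCE = re.compile(
    r"(?P<name>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)(?P<space>\s*)(?P<paren>\()?"
)


def classify_reference(src: str) -> str:
    """Decide how the text after a rule name is read, based on whitespace before '('."""
    m = _REFERENCE.match(src)
    if m is None:
        raise ValueError(f"not a rule reference: {src!r}")
    if m.group("paren") is None:
        return "reference"                      # no argument list at all
    if m.group("space") == "":
        return "parameterized reference"        # Name(Args): no whitespace allowed
    return "reference followed by inline rule"  # Name (Decls): whitespace separates them
```

For example, `List(Item)` is read as a parameterized reference, while `List (Item)` is read as a reference to `List` followed by an inline rule.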
The repetition operators recognize a primary expression repeated a specified number of times. The number of repetitions can be stated as a (possibly open) integer range or using one of the Kleene operators: ?, +, or *.
RepetitionPrimary:
  Primary Range
  Primary CollectionRanges
Range:
  ?
  *
  +
CollectionRanges:
  # IntegerLiteral
  # IntegerLiteral .. IntegerLiteral(opt)
The left operand of .. must be greater than or equal to zero and less than the right operand of .., if present.
"A"#5 recognizes exactly 5 "A"s: "AAAAA"
"A"#2..4 recognizes from 2 to 4 "A"s: "AA", "AAA", "AAAA"
"A"#3.. recognizes 3 or more "A"s: "AAA", "AAAA", "AAAAA", ...
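The three forms of collection range can be modeled in Python by parsing the range into a lower bound and an optional upper bound, then counting repetitions. This sketch assumes the repeated primary is a single non-empty text literal (`unit`); `parse_collection_range` and `matches_repetition` are hypothetical names chosen for illustration.

```python
import re
from typing import Optional, Tuple

# Mirrors CollectionRanges: "#n", "#n..m", or "#n.." (open upper bound).
_COLLECTION_RANGE = re.compile(r"#(\d+)(\.\.(\d+)?)?")


def parse_collection_range(spec: str) -> Tuple[int, Optional[int]]:
    """Return (low, high) for a collection range; high is None when the range is open."""
    m = _COLLECTION_RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a collection range: {spec!r}")
    low = int(m.group(1))
    if m.group(2) is None:      # "#n" means exactly n
        return low, low
    if m.group(3) is None:      # "#n.." has no upper bound
        return low, None
    high = int(m.group(3))
    if low >= high:             # the left operand must be less than the right
        raise ValueError(f"left operand must be less than right: {spec!r}")
    return low, high


def matches_repetition(text: str, unit: str, spec: str) -> bool:
    """True if text is unit repeated a number of times allowed by spec."""
    if not unit:
        raise ValueError("unit must be non-empty")
    low, high = parse_collection_range(spec)
    count, remainder = divmod(len(text), len(unit))
    if remainder or unit * count != text:
        return False
    return count >= low and (high is None or count <= high)
```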
The Kleene operators can be defined in terms of the collection range operator:
"A"? is equivalent to "A"#0..1
"A"+ is equivalent to "A"#1..
"A"* is equivalent to "A"#0..
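Python regular expressions happen to use the same Kleene operators and a similar bounded quantifier `{m,n}`, so the equivalences can be spot-checked by translating each side into a regex and comparing them on sample inputs. This is an analogy for illustration, not a claim that M patterns compile to Python regexes; `KLEENE`, `range_to_regex`, and `kleene_matches_range` are hypothetical names.

```python
import re
from typing import Optional

# Collection range equivalent of each Kleene operator as (low, high); None = no upper bound.
KLEENE = {"?": (0, 1), "+": (1, None), "*": (0, None)}


def range_to_regex(unit: str, low: int, high: Optional[int]) -> str:
    """Translate unit#low..high into a Python regular expression."""
    group = "(?:" + re.escape(unit) + ")"
    if high is None:
        return group + "{%d,}" % low
    return group + "{%d,%d}" % (low, high)


def kleene_matches_range(op: str, unit: str = "A", max_len: int = 6) -> bool:
    """Compare op with its range equivalent on unit repeated 0..max_len times."""
    low, high = KLEENE[op]
    via_range = re.compile(range_to_regex(unit, low, high))
    # In Python regexes, ?, + and * carry the same meaning as the Kleene operators.
    via_kleene = re.compile("(?:" + re.escape(unit) + ")" + op)
    samples = [unit * n for n in range(max_len + 1)]
    return all(
        (via_range.fullmatch(s) is None) == (via_kleene.fullmatch(s) is None)
        for s in samples
    )
```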
An inline rule is a means to group pattern declarations together as a term.
InlineRulePrimary:
  ( ProductionDeclarations )
An inline rule is typically used in conjunction with a range operator:
"A" ("," "A")*
recognizes one or more "A"s separated by commas.
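As a rough analogy, the inline rule ("," "A") behaves like a non-capturing group in a Python regular expression, with the repetition operator applied to the whole group. The sketch below is illustrative only; `COMMA_SEPARATED_A` and `recognizes` are hypothetical names.

```python
import re

# "A" ("," "A")*  ~  A(?:,A)*   (the inline rule becomes a non-capturing group)
COMMA_SEPARATED_A = re.compile(r"A(?:,A)*")


def recognizes(text: str) -> bool:
    """True if the whole text is one or more "A"s separated by commas."""
    return COMMA_SEPARATED_A.fullmatch(text) is not None
```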
Although syntactically legal, variable bindings within inline rules are not accessible within the constructor of the containing production. Inline rules are described further in Section 11.4.
The any term is a wildcard that matches any text value of length 1.
AnyPrimary:
  any
For example, "1", "z", and "*" all match any.
The error production enables error recovery. Consider the following example:
module HelloWorld {
    language HelloWorld {
        syntax Main = HelloList;
        token Hello = "Hello";
        checkpoint syntax HelloList
            = Hello
            | HelloList "," Hello
            | HelloList "," error;
    }
}
The language recognizes the text "Hello,Hello,Hello" as expected and produces the following default output:
Main[
  HelloList[
    HelloList[
      HelloList[
        Hello
      ],
      ,,
      Hello
    ],
    ,,
    Hello
  ]
]
The text "Hello,hello,Hello" is not in the language because the second "h" is not capitalized (and case sensitivity is true). However, rather than stopping at "h", the language processor matches "h" to the error token, then matches "e" to the error token, and so forth, until it reaches the comma. At that point the text conforms to the language again and normal processing can continue. The language processor reports the position of the errors and produces the following output:
Main[
  HelloList[
    HelloList[
      HelloList[
        Hello
      ],
      error["hello"],
    ],
    ,,
    Hello
  ]
]
Hello occurs twice instead of three times as above, and the text that the error token matched is returned as error["hello"].
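The recovery behavior for the HelloList grammar can be approximated in Python by a parser that, after each comma, either recognizes Hello or absorbs everything up to the next comma as an error. This is a deliberately simplified sketch: it assumes recovery always resumes at the next comma, and `Error` and `parse_hello_list` are hypothetical names; it does not describe how the M language processor implements checkpoints.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Error:
    text: str       # the unrecognized text absorbed by the error token
    position: int   # offset in the input where that text starts


def parse_hello_list(text: str) -> Tuple[List[Union[str, Error]], List[Error]]:
    """Simplified model of: HelloList = Hello | HelloList "," Hello | HelloList "," error."""
    segments = text.split(",")
    # The grammar has no error alternative for the first element.
    if segments[0] != "Hello":
        raise ValueError("input must start with Hello")
    items: List[Union[str, Error]] = ["Hello"]
    errors: List[Error] = []
    pos = len(segments[0]) + 1  # offset just past the first comma
    for segment in segments[1:]:
        if segment == "Hello":
            items.append("Hello")
        else:
            # Assumed strategy: absorb everything up to the next comma as one error.
            err = Error(segment, pos)
            items.append(err)
            errors.append(err)
        pos += len(segment) + 1  # skip the segment and its trailing comma
    return items, errors


items, errors = parse_hello_list("Hello,hello,Hello")
```

As in the output above, `items` holds two Hello values and one error carrying the text "hello", and `errors` records where that text starts (offset 6).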
A primary term expression can be thought of as the set of possible text values that it recognizes. The term operators perform the standard set difference, intersection, and negation operations on these sets. (Pattern declarations perform the union operation with |.)
TextPatternExpression:
  Difference
Difference:
  Intersect
  Difference - Intersect
Intersect:
  Inverse
  Intersect & Inverse
Inverse:
  Primary
  ^ Primary
The inverse operator ^ requires every value in the set of possible text values of its operand to be of length 1.
("11" | "12") - ("12" | "13") recognizes "11".
("11" | "12") & ("12" | "13") recognizes "12".
^("11" | "12") is an error.
^("1" | "2") recognizes any text value of length 1 other than "1" or "2".
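For finite sets of text values, the term operators map directly onto Python set operations. The sketch below assumes finite operand sets, which is a simplification since patterns such as "A"* recognize infinitely many values. Because the complement over all characters is very large, `inverse` returns a membership test rather than a set. All function names are hypothetical.

```python
from typing import Callable, Set


def union(a: Set[str], b: Set[str]) -> Set[str]:
    return a | b  # what | does in a pattern declaration


def difference(a: Set[str], b: Set[str]) -> Set[str]:
    return a - b  # the - operator


def intersect(a: Set[str], b: Set[str]) -> Set[str]:
    return a & b  # the & operator


def inverse(a: Set[str]) -> Callable[[str], bool]:
    """The ^ operator: every length-1 text value not in a."""
    if any(len(v) != 1 for v in a):
        raise ValueError("^ requires every value to be of length 1")
    excluded = frozenset(a)
    return lambda s: len(s) == 1 and s not in excluded
```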