A rule is a named collection of alternative productions. There are three kinds of rules: syntax, token, and interleave. A text value conforms to a rule if it conforms to any one of the productions in the rule. If a text value conforms to more than one production in the rule, then the rule is ambiguous. The three kinds of rules differ in how they treat ambiguity and how they handle their output.
    RuleDeclaration:
        Attributes_opt MemberModifiers_opt Kind Name RuleParameters_opt RuleBody_opt ;
    Kind:
        token
        syntax
        interleave
    MemberModifiers:
        MemberModifier
        MemberModifiers MemberModifier
    MemberModifier:
        final
        identifier
    RuleBody:
        = ProductionDeclarations
    ProductionDeclarations:
        ProductionDeclaration
        ProductionDeclarations | ProductionDeclaration
The rule Main below recognizes the two text values "Hello" and "Goodbye".
    module HelloGoodbye {
        language HelloGoodbye {
            syntax Main
                = "Hello"
                | "Goodbye";
        }
    }
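The definitions of conformance and ambiguity given earlier can be illustrated with a small Python sketch. It is not part of Mg: a rule is modeled here as a plain list of predicates, one per production, and the names `matching_productions`, `conforms`, and `is_ambiguous_for` are hypothetical helpers introduced only for this illustration.

```python
def matching_productions(productions, text):
    """Return the indices of the productions that `text` conforms to."""
    return [i for i, accepts in enumerate(productions) if accepts(text)]


def conforms(productions, text):
    """A text value conforms to a rule if it conforms to any one production."""
    return len(matching_productions(productions, text)) >= 1


def is_ambiguous_for(productions, text):
    """The rule is ambiguous for `text` if more than one production accepts it."""
    return len(matching_productions(productions, text)) > 1


# Hypothetical model of: syntax Main = "Hello" | "Goodbye";
main = [
    lambda text: text == "Hello",
    lambda text: text == "Goodbye",
]
```

With this model, `conforms(main, "Hello")` holds and no text value is ambiguous, because the two productions never accept the same text.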
Token rules recognize a restricted family of languages. However, token rules can be negated, intersected, and subtracted, which is not the case for syntax rules. Attempting to perform these operations on a syntax rule results in an error. The output from a token rule is the text matched by the token. No constructor may be defined.
Token rules do not permit precedence directives in the rule body. They have a built-in protocol to deal with ambiguous productions. A language processor attempts to match all tokens in the language against a text value starting with the first character, then the first two, and so on. If two or more productions, within the same token or in different tokens, can match the beginning of a text value, the language processor chooses the production with the longest match. If all matches are exactly the same length, the language processor chooses a token rule marked final, if present. If no token rule is marked final, all the matches succeed and the language processor evaluates whether each alternative is recognized in a larger context. The language processor retains all of the matches and then attempts to match a new token starting with the first character that has not already been matched.
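The token disambiguation protocol described above (longest match first, then a rule marked final, otherwise keep every match) can be sketched in Python. This is a simplified model, not the Mg implementation: `TokenRule` is a hypothetical stand-in whose productions are plain string literals, and the sketch assumes that if several tied rules are marked final, all of them are kept, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRule:
    """Hypothetical stand-in for a token rule with literal productions."""
    name: str
    literals: tuple
    final: bool = False


def longest_prefix(rule, text, pos):
    """Length of the longest production of `rule` matching `text` at `pos`, or 0."""
    return max(
        (len(lit) for lit in rule.literals if text.startswith(lit, pos)),
        default=0,
    )


def match_tokens(rules, text, pos=0):
    """Return (names of the surviving rules, matched length) at `pos`."""
    lengths = {rule.name: longest_prefix(rule, text, pos) for rule in rules}
    best = max(lengths.values(), default=0)
    if best == 0:
        return [], 0  # no token matches here
    # Step 1: keep only the rules that produced the longest match.
    winners = [rule for rule in rules if lengths[rule.name] == best]
    # Step 2: among equal-length matches, prefer rules marked final.
    finals = [rule for rule in winners if rule.final]
    # Step 3: with no final rule, every match is retained for later resolution.
    chosen = finals if finals else winners
    return [rule.name for rule in chosen], best


# Hypothetical rules used only to exercise the tie-break.
rules = [
    TokenRule("Keyword", ("if",), final=True),
    TokenRule("Identifier", ("if", "iffy")),
]
```

Here `match_tokens(rules, "if x")` selects only `Keyword`, because both rules match two characters and `Keyword` is marked final, while `match_tokens(rules, "iffy")` selects `Identifier`, because its match is longer.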
Syntax rules recognize all languages that Mg is capable of defining. The Main start rule must be a syntax rule. Syntax rules allow all precedence directives and may have constructors.
An interleave rule recognizes the same family of languages as a token rule and also cannot have constructors. Further, interleave rules cannot have parameters, and the name of an interleave rule cannot be referenced.
Text that matches an interleave rule is excluded from further processing.
The following example demonstrates whitespace handling with an interleave rule:
    module HelloWorld {
        language HelloWorld {
            syntax Main = Hello World;
            token Hello = "Hello";
            token World = "World";
            interleave Whitespace = " ";
        }
    }
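This language recognizes the text value "Hello World". Because every space is consumed by the interleave rule, it also recognizes variants that differ only in spacing, such as " Hello World" (leading space), "Hello  World" (two spaces), "Hello World " (trailing space), and "HelloWorld" (no space). It does not recognize "He llo World" because "He" does not match any token.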
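The effect of the interleave rule can be sketched as a small Python tokenizer that discards interleave matches before the syntax rule sees the token stream. The function names are hypothetical, tokens are modeled as plain literals, and checking the interleave rule before the token rules is an assumption of this sketch rather than something the text states.

```python
def tokenize(text, tokens, interleave):
    """Return the list of token names in `text`, or None if some text matches nothing.

    `tokens` maps a token name to its literal; `interleave` is a list of
    literals whose matches are excluded from further processing.
    """
    output = []
    pos = 0
    while pos < len(text):
        # Text matching the interleave rule is skipped and produces no output.
        skipped = next((lit for lit in interleave if text.startswith(lit, pos)), None)
        if skipped:
            pos += len(skipped)
            continue
        # Otherwise pick the longest token literal that matches at this position.
        best_name, best_len = None, 0
        for name, lit in tokens.items():
            if text.startswith(lit, pos) and len(lit) > best_len:
                best_name, best_len = name, len(lit)
        if best_name is None:
            return None  # e.g. "He" matches neither a token nor the interleave
        output.append(best_name)
        pos += best_len
    return output


def recognizes(text):
    """Model of: syntax Main = Hello World; with interleave Whitespace = " "."""
    tokens = {"Hello": "Hello", "World": "World"}
    return tokenize(text, tokens, [" "]) == ["Hello", "World"]
```

Under these assumptions, `recognizes("Hello World")` and `recognizes("HelloWorld")` both hold, while `recognizes("He llo World")` does not, because tokenization fails at "He".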
An inline rule is an anonymous rule embedded within the pattern of a production. The inline rule is processed as any other rule; however, it cannot be reused since it does not have a name. Variables defined within an inline rule are scoped to their productions as usual. A variable may be bound to the output of an inline rule as with any pattern.
In the following, Example1 and Example2 recognize the same language and produce the same output. Example1 uses a named rule AppleOrOrange while Example2 states the same rule inline.
    module Example {
        language Example1 {
            syntax Main = aos:AppleOrOrange* => aos;
            syntax AppleOrOrange
                = "Apple" => Apple{}
                | "Orange" => Orange{};
        }
        language Example2 {
            syntax Main
                = aos:("Apple" => Apple{} | "Orange" => Orange{})* => aos;
        }
    }
A rule may define parameters that can be used within the body of the rule.
    RuleParameters:
        ( RuleParameterList )
    RuleParameterList:
        RuleParameter
        RuleParameterList , RuleParameter
    RuleParameter:
        Identifier
A single rule identifier may have multiple definitions with different numbers of parameters. The following example uses List(Content,Separator) to define List(Content) with a default separator of ",".
    module HelloWorld {
        language HelloWorld {
            syntax Main = List(Hello);
            token Hello = "Hello";
            syntax List(Content, Separator)
                = Content
                | List(Content,Separator) Separator Content;
            syntax List(Content) = List(Content, ",");
        }
    }
This language will recognize "Hello", "Hello,Hello", "Hello,Hello,Hello", and so on.
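One way to picture parameterized rules is as functions that take recognizers as arguments and return a recognizer. The Python sketch below models the List example under that assumption. The names `literal`, `list_of`, and `conforms` are hypothetical, the left-recursive definition is rewritten as the equivalent iteration "Content, then zero or more Separator Content pairs", and the one-parameter overload is modeled with a default argument.

```python
def literal(value):
    """Recognizer for a fixed string: returns the end position, or None."""
    def recognize(text, pos):
        return pos + len(value) if text.startswith(value, pos) else None
    return recognize


def list_of(content, separator=literal(",")):
    """Model of List(Content, Separator); the default models List(Content)."""
    def recognize(text, pos):
        end = content(text, pos)
        if end is None:
            return None
        while True:
            after_separator = separator(text, end)
            if after_separator is None:
                return end
            after_item = content(text, after_separator)
            # Stop on failure, or on an empty match to avoid looping forever.
            if after_item is None or after_item == end:
                return end
            end = after_item
    return recognize


def conforms(rule, text):
    """The whole text must be consumed for it to conform to the rule."""
    return rule(text, 0) == len(text)


# Model of: syntax Main = List(Hello); token Hello = "Hello";
main = list_of(literal("Hello"))
```

With this model, `conforms(main, "Hello,Hello")` holds while `conforms(main, "Hello,")` does not, since the trailing separator is not followed by another item. Passing a second argument, as in `list_of(literal("Hello"), literal(";"))`, corresponds to calling the two-parameter definition directly.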