Chapter 11. Rules

A rule is a named collection of alternative productions. There are three kinds of rules: syntax, token, and interleave. A text value conforms to a rule if it conforms to any one of the productions in the rule. If a text value conforms to more than one production in the rule, then the rule is ambiguous. The three different kinds of rules differ in how they treat ambiguity and how they handle their output.

RuleDeclaration:
  Attributes_opt MemberModifiers_opt Kind Name RuleParameters_opt RuleBody_opt ;
Kind:
  token
  syntax
  interleave
MemberModifiers:
  MemberModifier
  MemberModifiers MemberModifier
MemberModifier:
  final
  identifier
RuleBody:
  = ProductionDeclarations
ProductionDeclarations:
  ProductionDeclaration
  ProductionDeclarations | ProductionDeclaration

The rule Main below recognizes the two text values "Hello" and "Goodbye".

module HelloGoodbye {
    language HelloGoodbye {
        syntax Main
          = "Hello"
          | "Goodbye";
    }
}
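The definitions of conformance and ambiguity given at the start of this chapter can be illustrated outside M_g. The following is a minimal Python sketch, not part of the language: each production is modeled as a hypothetical predicate over text, and the helper names (matching_productions, conforms, is_ambiguous_for) are assumptions made for illustration.

```python
def matching_productions(productions, text):
    """Return the indices of the productions that `text` conforms to."""
    return [i for i, matches in enumerate(productions) if matches(text)]


def conforms(productions, text):
    """A text value conforms to a rule if it conforms to any production."""
    return len(matching_productions(productions, text)) >= 1


def is_ambiguous_for(productions, text):
    """The rule is ambiguous if the text conforms to more than one production."""
    return len(matching_productions(productions, text)) > 1


# The two alternative productions of Main: "Hello" | "Goodbye".
main = [lambda t: t == "Hello", lambda t: t == "Goodbye"]

# A hypothetical rule whose two productions overlap on "Hello".
overlapping = [lambda t: t.startswith("H"), lambda t: t == "Hello"]
```

Here conforms(main, "Hello") and conforms(main, "Goodbye") hold, conforms(main, "Hi") does not, and is_ambiguous_for(overlapping, "Hello") holds because both productions accept the text.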

Token Rules

Token rules recognize a restricted family of languages. However, token rules can be negated, intersected, and subtracted, which is not the case for syntax rules. Attempting to perform these operations on a syntax rule results in an error. The output from a token rule is the text matched by the token. No constructor may be defined.

Final Modifier

Token rules do not permit precedence directives in the rule body. They have a built-in protocol to deal with ambiguous productions. A language processor attempts to match all tokens in the language against a text value starting with the first character, then the first two, etc. If two or more productions within the same token or two different tokens can match the beginning of a text value, a token rule will choose the production with the longest match. If all matches are exactly the same length, the language processor will choose a token rule marked final if present. If no token rule is marked final, all the matches succeed and the language processor evaluates whether each alternative is recognized in a larger context. The language processor retains all of the matches and begins attempting to match a new token starting with the first character that has not already been matched.
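The selection protocol described above (longest match first, then the final modifier as a tie-breaker, otherwise all matches retained) can be sketched in Python. This is an illustration under assumptions, not the language processor's implementation: tokens are modeled as regular expressions, and the Token class and the longest_match and next_tokens helpers are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    name: str
    pattern: str         # regular expression standing in for the token's productions
    final: bool = False  # models the `final` modifier


def longest_match(pattern, text, pos):
    """Try the first character, then the first two, etc.; keep the longest match."""
    best = 0
    for end in range(pos + 1, len(text) + 1):
        if re.fullmatch(pattern, text[pos:end]):
            best = end - pos
    return best


def next_tokens(tokens, text, pos=0):
    """Return (names, length) for the token(s) selected at `pos`."""
    lengths = [(tok, longest_match(tok.pattern, text, pos)) for tok in tokens]
    longest = max((n for _, n in lengths), default=0)
    if longest == 0:
        return [], 0                     # nothing matches here
    best = [tok for tok, n in lengths if n == longest]
    finals = [tok for tok in best if tok.final]
    chosen = finals or best              # with no final token, every tied match is kept
    return [tok.name for tok in chosen], longest


with_final = [Token("Keyword", "if", final=True), Token("Identifier", "[a-z]+")]
without_final = [Token("Keyword", "if"), Token("Identifier", "[a-z]+")]

print(next_tokens(with_final, "if"))     # (['Keyword'], 2)
print(next_tokens(with_final, "iffy"))   # (['Identifier'], 4)
print(next_tokens(without_final, "if"))  # (['Keyword', 'Identifier'], 2)
```

In the last call neither token is final, so both matches are returned; deciding between them from the larger context is left to the caller in this sketch.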

Identifier Modifier

The identifier modifier applies only to tokens. It is used to lower the precedence of language identifiers so they do not conflict with language keywords.

Syntax Rules

Syntax rules recognize all languages that M_g is capable of defining. The Main start rule must be a syntax rule. Syntax rules allow all precedence directives and may have constructors.

Interleave Rules

An interleave rule recognizes the same family of languages as a token rule and also cannot have constructors. Further, interleave rules cannot have parameters, and the name of an interleave rule cannot be referenced.

Text that matches an interleave rule is excluded from further processing.

The following example demonstrates whitespace handling with an interleave rule:

module HelloWorld {
    language HelloWorld {
        syntax Main
          = Hello World;
        token Hello
          = "Hello";
        token World
          = "World";
        interleave Whitespace
          = " ";
    }
}

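This language recognizes the text value "Hello World". It also recognizes the same text with additional spaces before, between, or after the two words (for example " Hello World", "Hello  World", and "Hello World "), as well as "HelloWorld" with no space at all. It does not recognize "He llo World" because "He" does not match any token.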
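The effect of the interleave rule can be modeled with a small Python sketch. It is an approximation under assumptions, not how an M_g processor is built: the tokens are plain string literals, and the tokenize and recognizes functions are hypothetical names. Text matching the interleave rule is dropped between tokens, but it never splits a token.

```python
TOKENS = {"Hello": "Hello", "World": "World"}  # token Hello, token World
INTERLEAVE = " "                               # interleave Whitespace = " "


def tokenize(text):
    """Return the token names found in `text`, or None if some text matches nothing."""
    names, pos = [], 0
    while pos < len(text):
        if text.startswith(INTERLEAVE, pos):
            pos += len(INTERLEAVE)  # interleaved text is excluded from further processing
            continue
        for name, literal in TOKENS.items():
            if text.startswith(literal, pos):
                names.append(name)
                pos += len(literal)
                break
        else:
            return None             # no token and no interleave matches at `pos`
    return names


def recognizes(text):
    """syntax Main = Hello World;"""
    return tokenize(text) == ["Hello", "World"]


print(recognizes("Hello World"))     # True
print(recognizes(" Hello  World "))  # True
print(recognizes("HelloWorld"))      # True
print(recognizes("He llo World"))    # False
```

The last call fails at the first character: "He" followed by a space is neither a token nor interleaved text, so the input is rejected.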

Inline Rules

An inline rule is an anonymous rule embedded within the pattern of a production. The inline rule is processed as any other rule; however, it cannot be reused since it does not have a name. Variables defined within an inline rule are scoped to their productions as usual. A variable may be bound to the output of an inline rule as with any pattern.

In the following, Example1 and Example2 recognize the same language and produce the same output. Example1 uses a named rule AppleOrOrange while Example2 states the same rule inline.

module Example {
    language Example1 {
        syntax Main
          = aos:AppleOrOrange*
            => aos;

        syntax AppleOrOrange
          = "Apple" => Apple{}
          | "Orange" => Orange{};
    }

    language Example2 {
        syntax Main
          = aos:("Apple" => Apple{} | "Orange" => Orange{})*
            => aos;
    }
}

Rule Parameters

A rule may define parameters that can be used within the body of the rule.

RuleParameters:
  ( RuleParameterList )
RuleParameterList:
  RuleParameter
  RuleParameterList , RuleParameter
RuleParameter:
  Identifier

A single rule identifier may have multiple definitions with different numbers of parameters. The following example uses List(Content,Separator) to define List(Content) with a default separator of ",".

module HelloWorld {
    language HelloWorld {
        syntax Main
          = List(Hello);
        token Hello
          = "Hello";
        syntax List(Content, Separator)
          = Content
          | List(Content,Separator) Separator Content;

        syntax List(Content) = List(Content, ",");
    }
}

This language will recognize "Hello", "Hello,Hello", "Hello,Hello,Hello", and so on.
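Rule parameters behave much like function parameters. The Python sketch below models the two List definitions as one function with an optional separator argument. This is an assumption-laden illustration rather than M_g semantics: Python has no overloading by parameter count, the left-recursive production is replaced by a loop, and the literal, list_of, and recognizes helpers are hypothetical names.

```python
def literal(s):
    """Recognizer for a fixed string: returns the end position, or None on failure."""
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def list_of(content, separator=None):
    """Models List(Content, Separator); List(Content) defaults the separator to ","."""
    if separator is None:
        separator = literal(",")

    def match(text, pos):
        end = content(text, pos)
        if end is None:
            return None
        while True:
            after_sep = separator(text, end)
            if after_sep is None:
                return end
            next_end = content(text, after_sep)
            if next_end is None:
                return end  # a trailing separator is not consumed
            end = next_end
    return match


def recognizes(rule, text):
    """The whole text must be consumed by the rule."""
    return rule(text, 0) == len(text)


main = list_of(literal("Hello"))                # syntax Main = List(Hello);
semicolons = list_of(literal("Hello"), literal(";"))

print(recognizes(main, "Hello,Hello,Hello"))    # True
print(recognizes(main, "Hello,"))               # False
print(recognizes(semicolons, "Hello;Hello"))    # True
```

Passing the separator explicitly corresponds to using the two-parameter definition directly, while omitting it corresponds to the one-parameter definition that supplies ",".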
