Chapter 27. The Roslyn Compiler

C# 6.0 has a brand-new compiler, written entirely in C#. The new compiler is modular, so you can utilize its functionality in many ways besides compiling source code to an executable or library. Known as “Roslyn”, the new compiler makes it easier to write static code analysis and refactoring tools, editors with syntax highlighting and code completion, and Visual Studio plugins that understand C# code.

The Roslyn libraries can be downloaded from NuGet, and there are packages for both C# and VB. As both languages share some architecture, there are common dependencies. The NuGet package ID for the C# compiler libraries is Microsoft.CodeAnalysis.CSharp.

The source code for Roslyn is publicly available under the Apache 2 open source license. This opens up further possibilities, including morphing C# into a custom or domain-specific language. The source code is available on GitHub, at https://github.com/dotnet/roslyn.

The GitHub site also hosts documentation, examples, and walkthroughs that demonstrate code analysis and refactoring.

Warning

.NET Framework 4.6 does not ship with the Roslyn assemblies, and its version of csc.exe invokes the old C# 5 compiler. Installing Visual Studio 2015 remaps csc.exe to the C# 6 (Roslyn) compiler.

Without Visual Studio 2015, you can still programmatically invoke the compiler (and its services) if you download and reference the Roslyn assemblies. But the csc.exe tool that ships with the .NET Framework will remain pointed at C# 5 until you install Visual Studio 2015.

The assemblies that comprise the C# compiler library are as follows:

Microsoft.CodeAnalysis.dll
Microsoft.CodeAnalysis.CSharp.dll
System.Collections.Immutable.dll
System.Reflection.Metadata.dll

The first assembly (Microsoft.CodeAnalysis.dll) is also used by the VB compiler and contains common base types for trees, symbols, compilations, and so on.

Note

All code listings in this chapter are available as interactive samples in LINQPad 5. Go to LINQPad’s Samples tab at the bottom left, click “Download more samples,” and choose “C# 6.0 in a Nutshell.”

Roslyn Architecture

The Roslyn architecture separates compilation into three phases:

  1. Parsing code into syntax trees (the syntactic layer)

  2. Binding identifiers to symbols (the semantic layer)

  3. Emitting IL

In the first phase, a parser reads C# code and outputs syntax trees. A syntax tree is a DOM (Document Object Model) that describes source code in tree structure.

The second phase is where C#’s static binding takes place. Assembly references are read, and the compiler figures out, for instance, that “Console” refers to System.Console in mscorlib.dll. Overload resolution and type inference are a part of this, too.

The third phase produces the output assembly. If you plan to use Roslyn for code analysis or refactoring, you won’t use this functionality.

Visual Studio’s editor uses the output of the syntactic layer to color keywords, strings, comments, and disabled code (in blue, red, green, and gray, respectively), whereas it uses the output of the semantic layer to color resolved type names (in turquoise).

Workspaces

In this chapter, we describe the compiler and the features it exposes. It’s worth keeping in mind that there’s an additional “layer” above the compiler called workspaces. It’s also available on NuGet; the package ID is Microsoft.CodeAnalysis.CSharp.Workspaces.

The workspaces layer understands Visual Studio solutions, projects, and documents, and includes additional services, such as code refactoring, not strictly related to the compilation processes.

The workspaces layer is open source, and by looking at the source code, it’s possible to learn more about the compilation layer.

Syntax Trees

A syntax tree is a DOM for source code. The syntax tree API is completely separate from the System.Linq.Expressions API we discussed in “Expression Trees” in Chapter 8, although the two have conceptual similarities. Both APIs can represent C# expressions in a DOM; however, a Roslyn syntax tree has the following unique features:

  • It can represent the entire C# language, not just expressions.

  • It can include comments, whitespace, and other “trivia,” and can round-trip with full fidelity back to the original source code.

  • It comes with a ParseText method that parses source code into a syntax tree.

Conversely, the System.Linq.Expressions API has the following unique features:

  • It’s built into the .NET Framework, and the C# compiler itself is programmed to emit System.Linq.Expressions types when it encounters a lambda expression with an assignment conversion to Expression<T>.

  • It has a fast and lightweight Compile method that emits a delegate. In contrast, the semantic layer that compiles Roslyn syntax trees offers only the heavyweight option of compiling a complete program into an assembly.

Something that both APIs have in common is that their trees are immutable, so none of a tree’s elements can be altered once created. This means that applications such as Visual Studio and LINQPad must create a new syntax tree each time you press a key in the editor in order to update syntax highlighting and autocompletion services. This is less expensive than it sounds because the new syntax tree is able to reuse most of the elements of the old (see “Transforming a Syntax Tree”). And knowing that an object cannot change makes the API simpler to work with. It also allows for easier and faster parallelization, since multithreaded code can safely access all parts of a syntax tree without locks.

SyntaxTree Structure

A SyntaxTree comprises three main elements:

Nodes
(Abstract SyntaxNode class.) Represents C# constructs such as expressions, statements, and method declarations. Nodes always have at least one child, so a node can never be a leaf in the tree. Nodes can have both nodes and tokens as children.
Tokens
(SyntaxToken struct.) Represents the identifiers, keywords, operators, and punctuation that make up your source code. The only kind of children that tokens can have is optional leading and trailing trivia. A token’s parent is always a node.
Trivia
(SyntaxTrivia struct.) Trivia is for whitespace, comments, preprocessor directives, and code that’s inactive due to conditional compilation. Trivia is always associated with the token that’s immediately to its left or right and is exposed via that token’s TrailingTrivia and LeadingTrivia properties, respectively.

Figure 27-1 shows the structure of the following code, with nodes in black, tokens in gray, and trivia in white:

Console.WriteLine ("Hello");
Figure 27-1. Syntax trees

SyntaxNode is abstract and has a C#-specific subclass for each kind of syntactic element, such as VariableDeclarationSyntax or TryStatementSyntax.

SyntaxToken and SyntaxTrivia are structs, so a single type represents every kind of token and a single type represents every kind of trivia. To distinguish different kinds of token or trivia, you must use the RawKind property or the Kind extension method (which we’ll explain in the following section).

Note

The best way to explore a syntax tree is with a visualizer. Visual Studio has a downloadable visualizer for use with its debugger, and LINQPad has one built in. LINQPad displays the visualizer automatically for the code in the text editor when you click the Tree button in the output window. You can also ask LINQPad to display a visualizer for a syntax tree that you’ve created programmatically by calling DumpSyntaxTree on the tree (or DumpSyntaxNode on a node).

Common properties and methods

Nodes, tokens, and trivia have a number of important common properties and methods:

SyntaxTree property
Returns the syntax tree to which the object belongs.
Span property
Returns the object’s position in source code (see “Finding a child by its offset”).
Kind extension method
Returns a SyntaxKind enum that classifies the node, token, or trivia into one of several hundred values (e.g., IntKeyword, CommaToken, and WhitespaceTrivia). The same SyntaxKind enum covers nodes, tokens, and trivia.
ToString method
Returns the text (source code) for the node, token, or trivia. For tokens, the Text property is equivalent.
GetDiagnostics method
Returns errors or warnings generated during parsing.
IsEquivalentTo method
Returns true if the object is identical to another node, token, or trivia instance. Whitespace differences are significant (to ignore whitespace, call NormalizeWhitespace before comparing).

Note

Nodes and tokens also have a FullSpan property and ToFullString method. These take into account trivia, whereas Span and ToString do not.

The Kind extension method is a shortcut for casting the RawKind property, which is of type int, to Microsoft.CodeAnalysis.CSharp.SyntaxKind. The reason for not simply having a Kind property of type SyntaxKind is that the token and trivia types are also used in VB syntax trees, which have a different enum type for SyntaxKind.
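
The following sketch illustrates the difference between the trivia-excluding and trivia-including members, and the relationship between Kind and RawKind. The SpanDemo class and its GetOpenBrace helper are names we made up for illustration; it assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class SpanDemo   // Hypothetical name, for illustration only
{
  // Returns the open-brace token of a small sample program. The comment
  // that follows the brace on the same line becomes its trailing trivia.
  public static SyntaxToken GetOpenBrace()
  {
    var tree = CSharpSyntaxTree.ParseText ("class Test { /* note */ }");
    return tree.GetRoot().DescendantTokens()
               .First (t => t.Kind() == SyntaxKind.OpenBraceToken);
  }

  static void Main()
  {
    SyntaxToken brace = GetOpenBrace();

    // Kind() is a typed view over the language-neutral RawKind integer:
    Console.WriteLine (brace.Kind() == (SyntaxKind) brace.RawKind);  // True

    // ToString and Span exclude trivia:
    Console.WriteLine (brace.ToString());        // {
    Console.WriteLine (brace.Span.Length);       // 1

    // ToFullString and FullSpan include the trailing comment and spaces:
    Console.WriteLine (brace.ToFullString());
    Console.WriteLine (brace.FullSpan.Length > brace.Span.Length);   // True
  }
}
```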

Obtaining a Syntax Tree

The static ParseText method on CSharpSyntaxTree parses C# code into a SyntaxTree:

SyntaxTree tree = CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}");

Console.WriteLine (tree.ToString());

tree.DumpSyntaxTree();    // Displays Syntax Tree Visualizer in LINQPad

To run this in a Visual Studio project, install the Microsoft.CodeAnalysis.CSharp NuGet package, and import the following namespaces:

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

You can optionally pass in a CSharpParseOptions object to specify a C# language version, preprocessor symbols, and a DocumentationMode to indicate whether XML comments should be parsed (see “Structured trivia”). There’s also an option to specify a SourceCodeKind. Choosing Interactive or Script instructs the parser to accept a single expression or statement(s) instead of requiring an entire program, although doing so currently throws a NotSupportedException.
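
As a minimal sketch of passing parse options, the following supplies preprocessor symbols and shows how they change which code the parser treats as active. ParseOptionsDemo, ActiveClasses, and the sample source are hypothetical; the named arguments correspond to the optional constructor parameters described above:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class ParseOptionsDemo   // Hypothetical name, for illustration only
{
  const string Source = @"
#if FOO
class WithFoo {}
#else
class WithoutFoo {}
#endif";

  // Parses Source with the given preprocessor symbols defined, and
  // returns the names of the classes that the parser treats as active.
  public static string[] ActiveClasses (params string[] symbols)
  {
    var options = new CSharpParseOptions (
      languageVersion: LanguageVersion.CSharp6,
      documentationMode: DocumentationMode.Parse,
      kind: SourceCodeKind.Regular,
      preprocessorSymbols: symbols);

    SyntaxTree tree = CSharpSyntaxTree.ParseText (Source, options);

    return tree.GetRoot().DescendantNodes()
               .OfType<ClassDeclarationSyntax>()
               .Select (c => c.Identifier.Text)
               .ToArray();
  }

  static void Main()
  {
    Console.WriteLine (string.Join (",", ActiveClasses ("FOO")));  // WithFoo
    Console.WriteLine (string.Join (",", ActiveClasses ()));       // WithoutFoo
  }
}
```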

Another way to obtain a syntax tree is to call CSharpSyntaxTree.Create, passing in an object graph of nodes and tokens. We describe how to create these objects in “Transforming a Syntax Tree”.

After parsing a tree, you can obtain errors and warnings by calling GetDiagnostics. (You can also call this method on a specific node or token.)

Warning

If the parse resulted in unexpected errors, the tree’s structure may not be as you expect. For this reason, it’s worth calling GetDiagnostics before proceeding further.

A nice feature is that a tree with errors will round-trip back to the original text (with the same errors). In such cases, the parser does its best to provide a syntax tree that’s useful to the semantic layer, creating “phantom nodes” if necessary. This allows tools such as code completion to work with incomplete code. (You can determine if a node is phantom by checking the IsMissing property.)

Calling GetDiagnostics on the syntax tree we created in the last section indicates no errors, despite having called Console.WriteLine without importing the System namespace. This is a good example of syntactic versus semantic parsing: our program is syntactically correct, and our error will not manifest until we create a compilation, add assembly references, and query the semantic model, where binding takes place.
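
To see a syntactic error (as opposed to a semantic one), we can parse code that is deliberately missing a semicolon. The following is a sketch under that assumption; DiagnosticsDemo is a made-up name, and the exact diagnostic message text may vary between Roslyn versions:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class DiagnosticsDemo   // Hypothetical name, for illustration only
{
  // Deliberately missing the semicolon after "int x = 1":
  public const string Source = "class C { void M() { int x = 1 } }";

  public static SyntaxTree Parse() => CSharpSyntaxTree.ParseText (Source);

  static void Main()
  {
    SyntaxTree tree = Parse();

    // Report each error or warning found during parsing:
    foreach (Diagnostic d in tree.GetDiagnostics())
      Console.WriteLine ($"{d.Id}: {d.GetMessage()}");

    // The parser inserts a zero-width "phantom" token where the
    // semicolon should have been:
    var missing = tree.GetRoot().DescendantTokens().Where (tok => tok.IsMissing);
    foreach (SyntaxToken token in missing)
      Console.WriteLine ($"Missing {token.Kind()} at offset {token.SpanStart}");

    // The tree still round-trips to the original (erroneous) text:
    Console.WriteLine (tree.ToString() == Source);   // True
  }
}
```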

Traversing and Searching a Tree

A SyntaxTree acts as a wrapper for the tree structure. It has a reference to a single root node, which you obtain by calling GetRoot:

var tree = CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}");

SyntaxNode root = tree.GetRoot();

The root node of a C# program is a CompilationUnitSyntax:

Console.WriteLine (root.GetType().Name);   // CompilationUnitSyntax

Traversing children

SyntaxNode exposes LINQ-friendly methods to traverse its child nodes and tokens. The simplest are:

IEnumerable<SyntaxNode> ChildNodes()
IEnumerable<SyntaxToken> ChildTokens()

Following on from our previous example, our root node has a single child node of type ClassDeclarationSyntax:

var cds = (ClassDeclarationSyntax) root.ChildNodes().Single();

We can enumerate the members of cds via either its ChildNodes method or the Members property of ClassDeclarationSyntax:

foreach (MemberDeclarationSyntax member in cds.Members)
  Console.WriteLine (member.ToString());

with the following result:

static void Main() => Console.WriteLine ("Hello");

There are also Descendant* methods which descend recursively into children. We can enumerate the tokens that make up our program as follows:

foreach (var token in root.DescendantTokens())
  Console.WriteLine ($"{token.Kind(),-30} {token.Text}");

Here’s the result:

ClassKeyword                   class
IdentifierToken                Test
OpenBraceToken                 {
StaticKeyword                  static
VoidKeyword                    void
IdentifierToken                Main
OpenParenToken                 (
CloseParenToken                )
EqualsGreaterThanToken         =>
IdentifierToken                Console
DotToken                       .
IdentifierToken                WriteLine
OpenParenToken                 (
StringLiteralToken             "Hello"
CloseParenToken                )
SemicolonToken                 ;
CloseBraceToken                }
EndOfFileToken

Notice that there’s no whitespace in the result. Replacing token.Text with token.ToFullString() would give us whitespace (and any other trivia).

The following uses the DescendantNodes method to locate the syntax node for our method declaration:

var ourMethod = root.DescendantNodes()
                    .First (m => m.Kind() == SyntaxKind.MethodDeclaration);

or alternatively:

var ourMethod = root.DescendantNodes()
                    .OfType<MethodDeclarationSyntax>()
                    .Single();

With the latter example, ourMethod is of type MethodDeclarationSyntax, which exposes useful properties specific to method declarations. For instance, if our example contained more than one method definition, and we wanted to find just the method whose name is “Main”, we could do this:

var mainMethod = root.DescendantNodes()
                     .OfType<MethodDeclarationSyntax>()
                     .Single (m => m.Identifier.Text == "Main");

Identifier is a property on MethodDeclarationSyntax that returns the token corresponding to the method’s identifier (i.e., its name). We could get the same result with more effort, as follows:

root.DescendantNodes().First (m =>
  m.Kind() == SyntaxKind.MethodDeclaration &&
  m.ChildTokens().Any (t =>
    t.Kind() == SyntaxKind.IdentifierToken && t.Text == "Main"));

SyntaxNode also has GetFirstToken and GetLastToken methods which are equivalent to calling DescendantTokens().First() and DescendantTokens().Last().

Note

GetLastToken() is faster than DescendantTokens().Last() because it returns a direct link rather than enumerating through all descendants.

As nodes can contain both child nodes and tokens whose relative order is significant, there are also methods to enumerate both together:

ChildSyntaxList ChildNodesAndTokens()
IEnumerable<SyntaxNodeOrToken> DescendantNodesAndTokens()
IEnumerable<SyntaxNodeOrToken> DescendantNodesAndTokensAndSelf()

(ChildSyntaxList implements IEnumerable<SyntaxNodeOrToken> while also exposing a Count property and an indexer to access an element by position.)

You can traverse trivia directly from a node with the GetLeadingTrivia, GetTrailingTrivia, and DescendantTrivia methods. More commonly, though, you’d access trivia through the token to which it’s attached, via the token’s LeadingTrivia and TrailingTrivia properties. Or to convert to text, you’d use the ToFullString method, which includes trivia in the result.
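
To illustrate enumerating nodes and tokens together, the following sketch lists the direct children of a class declaration in source order. ChildrenDemo and DescribeChildren are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class ChildrenDemo   // Hypothetical name, for illustration only
{
  // Describes each direct child of the class declaration, in source order.
  public static List<string> DescribeChildren()
  {
    var tree = CSharpSyntaxTree.ParseText ("class Test { void Main() { } }");
    var cds = tree.GetRoot().DescendantNodes()
                  .OfType<ClassDeclarationSyntax>().Single();

    var result = new List<string>();
    foreach (SyntaxNodeOrToken child in cds.ChildNodesAndTokens())
      result.Add ((child.IsNode ? "node  " : "token ") + child.Kind());
    return result;
  }

  static void Main()
  {
    foreach (string line in DescribeChildren())
      Console.WriteLine (line);

    // token ClassKeyword
    // token IdentifierToken
    // token OpenBraceToken
    // node  MethodDeclaration
    // token CloseBraceToken
  }
}
```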

Traversing parents

Nodes and tokens have a Parent property of type SyntaxNode.

For SyntaxTrivia, the “parent” is its token, accessible via the Token property.

Nodes also have methods which ascend back up the tree, which are prefixed with “Ancestor”.
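
As a brief sketch of ascending the tree, the following starts at the invocation of Console.WriteLine in our earlier sample and walks back up to the root. AncestorsDemo and ParseRoot are hypothetical names:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class AncestorsDemo   // Hypothetical name, for illustration only
{
  public static SyntaxNode ParseRoot() =>
    CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}").GetRoot();

  static void Main()
  {
    SyntaxNode root = ParseRoot();
    var invocation = root.DescendantNodes()
                         .OfType<InvocationExpressionSyntax>().Single();

    // Walk upward, printing the kind of each enclosing node:
    foreach (SyntaxNode ancestor in invocation.Ancestors())
      Console.WriteLine (ancestor.Kind());

    // Jump straight to the nearest ancestor of a given type:
    var method = invocation.FirstAncestorOrSelf<MethodDeclarationSyntax>();
    Console.WriteLine (method.Identifier.Text);              // Main

    // A token's Parent is the node that contains it:
    Console.WriteLine (method.Identifier.Parent.Kind());     // MethodDeclaration
  }
}
```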

Finding a child by its offset

All nodes, tokens, and trivia have a Span property of type TextSpan to indicate starting and ending offsets in the source code. Nodes and tokens also have a FullSpan property which includes leading and trailing trivia (whereas Span does not). A node’s Span does, however, include child nodes and tokens.

You can find a descendant object by position with the FindNode, FindToken, and FindTrivia methods on SyntaxNode. These methods return the descendant object with the smallest span that fully contains the span that you specify. There’s also a ChildThatContainsPosition method which searches both child nodes and tokens.

Should a search result in two nodes with an identical span (typically a child and grandchild), the FindNode method will return the outer (parent) node. You can change this behavior by passing true to the optional argument getInnermostNodeForTie.

The Find* methods also have an optional findInsideTrivia bool parameter. If true, this also searches for nodes or tokens within structured trivia (see “Trivia”).
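
The following sketch locates a token and a node from a character offset. FindDemo and ParseRoot are hypothetical names, and the offset is computed with IndexOf purely for convenience:

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

static class FindDemo   // Hypothetical name, for illustration only
{
  public const string Source = "class Test { void Main() { } }";

  public static SyntaxNode ParseRoot() =>
    CSharpSyntaxTree.ParseText (Source).GetRoot();

  static void Main()
  {
    SyntaxNode root = ParseRoot();
    int offset = Source.IndexOf ("Main");

    // The token whose span contains the offset:
    SyntaxToken token = root.FindToken (offset);
    Console.WriteLine ($"{token.Kind()} {token.Text} {token.Span}");

    // The smallest node whose span fully contains the identifier:
    SyntaxNode node = root.FindNode (new TextSpan (offset, "Main".Length));
    Console.WriteLine (node.Kind());     // MethodDeclaration
  }
}
```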

CSharpSyntaxWalker

Another way to traverse a tree is by subclassing CSharpSyntaxWalker, overriding one or more of its hundreds of virtual methods. The following class counts the number of if statements:

class IfCounter : CSharpSyntaxWalker
{
  public int IfCount { get; private set; }

  public override void VisitIfStatement (IfStatementSyntax node)
  {
    IfCount++;
    // Call the base method if you want to descend into children.
    base.VisitIfStatement (node);
  }
}

Here’s how to invoke it:

var ifCounter = new IfCounter ();
ifCounter.Visit (root);
Console.WriteLine ($"I found {ifCounter.IfCount} if statements");

The result is equivalent to:

root.DescendantNodes().OfType<IfStatementSyntax>().Count()

Writing a syntax walker can be easier than using the Descendant* methods in more complex cases when you need to override multiple methods (in part, because C# has no F#-like pattern matching ability).

By default, CSharpSyntaxWalker visits just nodes. To visit tokens or trivia, you must call the base constructor with a SyntaxWalkerDepth, indicating the desired depth (node→token→trivia). Then you can override VisitToken and VisitTrivia:

class WhiteWalker : CSharpSyntaxWalker   // Counts space characters
{
  public int SpaceCount { get; private set; }

  public WhiteWalker() : base (SyntaxWalkerDepth.Trivia) { }

  public override void VisitTrivia (SyntaxTrivia trivia)
  {
    SpaceCount += trivia.ToString().Count (char.IsWhiteSpace);
    base.VisitTrivia (trivia);
  }
}

If you remove WhiteWalker’s call to the base constructor, VisitTrivia will not fire.
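
For the intermediate depth, the following sketch stops at tokens and counts keywords. KeywordCounter and its Count helper are names we invented; SyntaxFacts.IsKeywordKind is used here to classify each token's kind:

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class KeywordCounter : CSharpSyntaxWalker   // Hypothetical name
{
  public int KeywordCount { get; private set; }

  // Token depth: nodes and tokens are visited, but trivia is not.
  public KeywordCounter() : base (SyntaxWalkerDepth.Token) { }

  public override void VisitToken (SyntaxToken token)
  {
    if (SyntaxFacts.IsKeywordKind (token.Kind())) KeywordCount++;
    base.VisitToken (token);
  }

  // Convenience helper: parses the source and returns its keyword count.
  public static int Count (string source)
  {
    var counter = new KeywordCounter();
    counter.Visit (CSharpSyntaxTree.ParseText (source).GetRoot());
    return counter.KeywordCount;
  }

  static void Main()
  {
    // The keywords are class, static, and void:
    Console.WriteLine (Count ("class Test { static void Main() { } }"));  // 3
  }
}
```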

Trivia

Trivia is for code that, after parsing, the compiler can almost entirely ignore in terms of producing an output assembly. This comprises whitespace, comments, XML documentation, preprocessor directives, and code that’s inactive by virtue of conditional compilation.

The mandatory whitespace in your code is also considered trivia. Although essential for parsing, it’s not needed once the syntax tree has been produced (at least by the compiler). Trivia is still important for round-tripping back to the original source code.

Trivia belongs to the token to which it’s adjacent. By convention, the parser puts whitespace and comments that follow a token, up to the end of the line, into the token’s trailing trivia. Anything after that, it treats as leading trivia for the next token. (There are exceptions for the very start/end of the file.) If you’re creating tokens programmatically (see “Transforming a Syntax Tree”), you can put the whitespace in either place (or not at all, if you’re not going to convert back to source code):

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
    static /*comment*/ void Main() {}
}");

SyntaxNode root = tree.GetRoot();

// Find the static keyword token:
var staticKeyword = root.DescendantTokens().Single (t =>
  t.Kind() == SyntaxKind.StaticKeyword);

// Print out the trivia around the static keyword token:
foreach (SyntaxTrivia t in staticKeyword.LeadingTrivia)
  Console.WriteLine (new { Kind = "Leading " + t.Kind(), t.Span.Length });

foreach (SyntaxTrivia t in staticKeyword.TrailingTrivia)
  Console.WriteLine (new { Kind = "Trailing " + t.Kind(), t.Span.Length });

Here’s the output:

{ Kind = Leading WhitespaceTrivia, Length = 4 }
{ Kind = Trailing WhitespaceTrivia, Length = 1 }
{ Kind = Trailing MultiLineCommentTrivia, Length = 11 }
{ Kind = Trailing WhitespaceTrivia, Length = 1 }

Preprocessor directives

It might seem odd that preprocessor directives are considered trivia, given that some directives (in particular, conditional compilation directives) have a nontrivial effect on the output.

The reason is that preprocessor directives are processed semantically by the parser itself; that is, it’s the parser’s job to do the preprocessing. After that, there’s nothing left that the compiler needs to consider explicitly (except for #pragma). To illustrate, let’s examine how the parser handles conditional compilation directives:

#define FOO

#if FOO
    Console.WriteLine ("FOO is defined");
#else
    Console.WriteLine ("FOO is not defined");
#endif

Upon reading the #if FOO directive, the parser knows that FOO is defined, and so the line that follows is parsed normally (as nodes and tokens), whereas the line of code following the #else directive is parsed into DisabledTextTrivia.

Note

When calling CSharpSyntaxTree.ParseText, you can supply additional preprocessor symbols by constructing and passing in a CSharpParseOptions instance.

Hence, with conditional compilation, it is precisely the text that can be ignored that ends up in trivia (i.e., the inactive code and the preprocessor directives themselves).
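
We can confirm this with a short sketch that parses a program containing both branches and then looks for DisabledTextTrivia. DisabledTextDemo, ParseRoot, and the class names in the sample source are hypothetical:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class DisabledTextDemo   // Hypothetical name, for illustration only
{
  const string Source = @"#define FOO
#if FOO
class Active {}
#else
class Inactive {}
#endif";

  public static SyntaxNode ParseRoot() =>
    CSharpSyntaxTree.ParseText (Source).GetRoot();

  static void Main()
  {
    SyntaxNode root = ParseRoot();

    // Only the active branch produces nodes and tokens:
    foreach (var cds in root.DescendantNodes().OfType<ClassDeclarationSyntax>())
      Console.WriteLine (cds.Identifier.Text);                  // Active

    // The inactive branch survives only as trivia:
    var disabled = root.DescendantTrivia()
                       .Where (tr => tr.Kind() == SyntaxKind.DisabledTextTrivia);
    foreach (SyntaxTrivia trivia in disabled)
      Console.WriteLine (trivia.ToString().Trim());             // class Inactive {}
  }
}
```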

The #line directive is handled similarly, in that the parser reads and interprets the directive. The information that it harvests is used when you call GetMappedLineSpan on the syntax tree.

The #region directive is semantically empty: the only role of the parser is to check that #region directives are matched with #endregion directives. The #error and #warning directives are also processed by the parser, which generates errors and warnings that you can see by calling GetDiagnostics on the tree or node.

It can still be useful to examine the content of preprocessor directives for purposes other than producing the output assembly (syntax highlighting, for instance). This is made easier through structured trivia.

Structured trivia

There are two kinds of trivia:

Unstructured trivia
Comments, whitespace, and code that’s inactive due to conditional compilation
Structured trivia
Preprocessor directives and XML documentation

Unstructured trivia is treated purely as text, whereas structured trivia also has its content parsed into a miniature syntax tree.

The HasStructure property on SyntaxTrivia indicates whether structured trivia is present, and the GetStructure method returns the root node for the miniature syntax tree:

var tree = CSharpSyntaxTree.ParseText (@"#define FOO");

// In LINQPad:
tree.DumpSyntaxTree();  // LINQPad displays structured trivia in Visualizer

SyntaxNode root = tree.GetRoot();

var trivia = root.DescendantTrivia().First();
Console.WriteLine (trivia.HasStructure);           // True
Console.WriteLine (trivia.GetStructure().Kind());  // DefineDirectiveTrivia

In the case of preprocessor directives, you can navigate directly to the structured trivia by calling GetFirstDirective on a SyntaxNode. There’s also a ContainsDirectives property to indicate whether preprocessor trivia is present:

var tree = CSharpSyntaxTree.ParseText (@"#define FOO");
SyntaxNode root = tree.GetRoot();

Console.WriteLine (root.ContainsDirectives);      // True

// directive is the root node of the structured trivia:
var directive = root.GetFirstDirective();
Console.WriteLine (directive.Kind());             // DefineDirectiveTrivia
Console.WriteLine (directive.ToString());         // #define FOO

// If there were more directives, we could get to them as follows:
Console.WriteLine (directive.GetNextDirective());    // (null)

Once we’ve got a trivia node, we can cast it to a specific type and query its properties, just as we would with any other node:

var hashDefine = (DefineDirectiveTriviaSyntax) root.GetFirstDirective();
Console.WriteLine (hashDefine.Name.Text);     // FOO

Note

All nodes, tokens, and trivia have an IsPartOfStructuredTrivia method to indicate whether the object in question is part of a structured trivia tree (i.e., descends from a trivia object).
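
XML documentation is the other kind of structured trivia. The following sketch parses a documentation comment and navigates its miniature syntax tree. DocCommentDemo and GetDocComment are hypothetical names, and we pass DocumentationMode.Parse explicitly rather than rely on the default:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class DocCommentDemo   // Hypothetical name, for illustration only
{
  const string Source = @"/// <summary>Says hello.</summary>
class Greeter {}";

  // Returns the documentation comment trivia attached to the class.
  public static SyntaxTrivia GetDocComment()
  {
    var options = new CSharpParseOptions (
      documentationMode: DocumentationMode.Parse);

    SyntaxNode root = CSharpSyntaxTree.ParseText (Source, options).GetRoot();

    return root.DescendantTrivia().First (t =>
      t.Kind() == SyntaxKind.SingleLineDocumentationCommentTrivia);
  }

  static void Main()
  {
    SyntaxTrivia doc = GetDocComment();
    Console.WriteLine (doc.HasStructure);                         // True

    // The structure is itself a small tree of XML nodes:
    var structure = (DocumentationCommentTriviaSyntax) doc.GetStructure();
    foreach (var element in structure.DescendantNodes().OfType<XmlElementSyntax>())
      Console.WriteLine (element.StartTag.Name.LocalName.Text);   // summary
  }
}
```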

Transforming a Syntax Tree

You can “modify” nodes, tokens, and trivia via a set of methods with the following prefixes (most of which are extension methods):

Add*
Insert*
Remove*
Replace*
With*
Without*

Because syntax trees are immutable, all of these methods return a new object with the desired modifications, leaving the original untouched.
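
As a small sketch of this behavior, the following “renames” a class by combining a With* method with ReplaceNode, and then shows that the original root is unchanged. WithDemo, OriginalRoot, and Rename are hypothetical names:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class WithDemo   // Hypothetical name, for illustration only
{
  public static SyntaxNode OriginalRoot() =>
    CSharpSyntaxTree.ParseText ("class Test { }").GetRoot();

  // Returns a new root in which the (single) class has a new name.
  public static SyntaxNode Rename (SyntaxNode root, string newName)
  {
    var cds = root.DescendantNodes().OfType<ClassDeclarationSyntax>().Single();
    SyntaxToken oldId = cds.Identifier;

    // Build a replacement identifier, carrying over the old trivia:
    ClassDeclarationSyntax renamed = cds.WithIdentifier (
      SyntaxFactory.Identifier (
        oldId.LeadingTrivia, newName, oldId.TrailingTrivia));

    // ReplaceNode returns a new root; the old one is left untouched:
    return root.ReplaceNode (cds, renamed);
  }

  static void Main()
  {
    SyntaxNode root = OriginalRoot();
    SyntaxNode newRoot = Rename (root, "Renamed");

    Console.WriteLine (root.ToFullString());      // class Test { }
    Console.WriteLine (newRoot.ToFullString());   // class Renamed { }
  }
}
```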

Handling changes to the source code

If you’re writing a C# editor, for instance, you’ll need to update a syntax tree based on changes to the source code. The SyntaxTree class has a WithChangedText method which does exactly this: it partially reparses the source code based on modifications that you describe with a SourceText instance (in Microsoft.CodeAnalysis.Text).

To create a SourceText, use its static From method, giving it the complete source code. You can then use this to create a syntax tree:

SourceText sourceText = SourceText.From ("class Program {}");
var tree = CSharpSyntaxTree.ParseText (sourceText);

Alternatively, you can obtain the SourceText for an existing tree by calling GetText.

You can now “update” sourceText by calling Replace or WithChanges. For example, we could replace the first five characters (“class”) with “struct,” as follows:

var newSource = sourceText.Replace (0, 5, "struct");

Finally, we can call WithChangedText on the tree to update it:

var newTree = tree.WithChangedText (newSource);
Console.WriteLine (newTree.ToString());         // struct Program {}
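
The same update can be expressed with WithChanges, starting from the SourceText of an existing tree. This is a sketch only; ChangeDemo and ApplyChange are hypothetical names:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

static class ChangeDemo   // Hypothetical name, for illustration only
{
  public static SyntaxTree ApplyChange()
  {
    SyntaxTree tree = CSharpSyntaxTree.ParseText ("class Program {}");

    // Obtain the text of the existing tree:
    SourceText text = tree.GetText();

    // Describe the edit: replace the five characters at offset 0.
    var change = new TextChange (new TextSpan (0, 5), "struct");

    return tree.WithChangedText (text.WithChanges (change));
  }

  static void Main()
  {
    SyntaxTree newTree = ApplyChange();
    Console.WriteLine (newTree.ToString());     // struct Program {}

    // The reparsed tree now contains a struct declaration:
    Console.WriteLine (newTree.GetRoot().DescendantNodes().First().Kind());
                                                // StructDeclaration
  }
}
```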

Creating new nodes, tokens, and trivia with SyntaxFactory

The static methods on SyntaxFactory programmatically create nodes, tokens, and trivia, which you can use to “transform” existing syntax trees or to create new trees from scratch.

The hardest part of doing this is figuring out exactly what kind of nodes and tokens to create. The solution is to first parse a sample of the code you want, examining the result in a syntax visualizer. For instance, suppose we want to create a syntax node for the following:

using System.Text;

We can visualize the syntax tree for this in LINQPad as follows:

CSharpSyntaxTree.ParseText ("using System.Text;").DumpSyntaxTree();

(We can parse “using System.Text;” without error because it’s valid as a complete program, albeit a functionally empty one. For most other code snippets, you’ll need to wrap the snippet in a method and/or type definition so that it will parse.)

The result has the following structure, of which we are interested in the second node (i.e., UsingDirective and its descendants):

Kind                               Token Text
=================================  ==========
CompilationUnit (node)
  UsingDirective (node)
    UsingKeyword (token)           using
      WhitespaceTrivia (trailing)
    QualifiedName (node)
      IdentifierName (node)
        IdentifierToken (token)    System
      DotToken (token)             .
      IdentifierName (node)
        IdentifierToken (token)    Text
    SemicolonToken (token)         ;
  EndOfFileToken (token)

Starting from the inside, we have two IdentifierName nodes, whose parent is a QualifiedName. We can create that as follows:

QualifiedNameSyntax qualifiedName = SyntaxFactory.QualifiedName (
  SyntaxFactory.IdentifierName ("System"),
  SyntaxFactory.IdentifierName ("Text"));

We used the overload of QualifiedName that accepts two identifiers. This overload inserts the dot token for us automatically.

We now need to wrap this in a UsingDirective:

UsingDirectiveSyntax usingDirective =
  SyntaxFactory.UsingDirective (qualifiedName);

Because we didn’t specify tokens for the “using” keyword or the trailing semicolon, tokens for each were created and added automatically. However, the automatically created tokens don’t include whitespace. This wouldn’t prevent compilation, but converting the tree to a string would result in syntactically incorrect code:

Console.WriteLine (usingDirective.ToFullString());  // usingSystem.Text;

We can fix this by calling NormalizeWhitespace on the node (or one of its ancestors); doing so automatically adds whitespace trivia (for both syntactic correctness and readability). Or for more control, we could add whitespace explicitly:

usingDirective = usingDirective.WithUsingKeyword (
  usingDirective.UsingKeyword.WithTrailingTrivia (
    SyntaxFactory.Whitespace (" ")));

Console.WriteLine (usingDirective.ToFullString());  // using System.Text;

For brevity, we “harvested” the node’s existing UsingKeyword, to which we added trailing trivia. We could have created an equivalent token with more effort by calling SyntaxFactory.Token(SyntaxKind.UsingKeyword).
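
For comparison, here is a sketch of the NormalizeWhitespace approach applied to the same node. NormalizeDemo and CreateUsing are hypothetical names; we trim the result in case the normalizer appends a line break:

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class NormalizeDemo   // Hypothetical name, for illustration only
{
  // Creates "using System.Text;" without any whitespace trivia.
  public static UsingDirectiveSyntax CreateUsing() =>
    SyntaxFactory.UsingDirective (
      SyntaxFactory.QualifiedName (
        SyntaxFactory.IdentifierName ("System"),
        SyntaxFactory.IdentifierName ("Text")));

  static void Main()
  {
    UsingDirectiveSyntax raw = CreateUsing();
    Console.WriteLine (raw.ToFullString());             // usingSystem.Text;

    // NormalizeWhitespace returns a new node with whitespace trivia added:
    UsingDirectiveSyntax normalized = raw.NormalizeWhitespace();
    Console.WriteLine (normalized.ToFullString().Trim());   // using System.Text;
  }
}
```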

The final step is to add our UsingDirective node to an existing or new syntax tree (or more precisely, the root node of a tree). To do the former, we cast the existing tree’s root to a CompilationUnitSyntax and call the AddUsings method. We can then create a new tree from the transformed compilation unit:

var existingTree = CSharpSyntaxTree.ParseText ("class Program {}");
var existingUnit = (CompilationUnitSyntax) existingTree.GetRoot();

var unitWithUsing = existingUnit.AddUsings (usingDirective);

var treeWithUsing = CSharpSyntaxTree.Create (
  unitWithUsing.NormalizeWhitespace());

Warning

Remember that all parts of a syntax tree are immutable. Calling AddUsings returns a new node, leaving the original untouched. Ignoring the return value is an easy mistake to make!

We called NormalizeWhitespace on our compilation unit so that calling ToString on the tree will yield syntactically correct and readable code. Alternatively, we could have added explicit new-line trivia to usingDirective as follows:

.WithTrailingTrivia (SyntaxFactory.EndOfLine("\r\n\r\n"))

Creating a compilation unit and syntax tree from scratch is a similar process. The easiest approach is to start with an empty compilation unit and call AddUsings on the unit as we did before:

var unit = SyntaxFactory.CompilationUnit().AddUsings (usingDirective);

We can add type definitions to our compilation unit by creating them in a similar fashion, and then calling AddMembers:

// Create a simple empty class definition:
unit = unit.AddMembers (SyntaxFactory.ClassDeclaration ("Program"));

The final step is to create the tree:

var tree = CSharpSyntaxTree.Create (unit.NormalizeWhitespace());
Console.WriteLine (tree.ToString());

// Output:
using System.Text;

class Program
{
}
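
Taking this a step further, the following sketch builds a class containing an empty static method entirely from SyntaxFactory calls. FromScratchDemo and Build are hypothetical names, and the particular factory overloads shown are just one of several ways to arrive at the same tree:

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class FromScratchDemo   // Hypothetical name, for illustration only
{
  public static SyntaxTree Build()
  {
    // static void Main() { }
    MethodDeclarationSyntax main = SyntaxFactory.MethodDeclaration (
        SyntaxFactory.PredefinedType (
          SyntaxFactory.Token (SyntaxKind.VoidKeyword)),
        "Main")
      .AddModifiers (SyntaxFactory.Token (SyntaxKind.StaticKeyword))
      .WithBody (SyntaxFactory.Block());

    // class Program { ... }
    ClassDeclarationSyntax program =
      SyntaxFactory.ClassDeclaration ("Program").AddMembers (main);

    CompilationUnitSyntax unit =
      SyntaxFactory.CompilationUnit().AddMembers (program);

    // NormalizeWhitespace supplies the spaces, line breaks, and indentation:
    return CSharpSyntaxTree.Create (unit.NormalizeWhitespace());
  }

  static void Main()
  {
    Console.WriteLine (Build().ToString());
  }
}
```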

CSharpSyntaxRewriter

For more complex syntax tree transformations, you can subclass CSharpSyntaxRewriter.

CSharpSyntaxRewriter is similar to the CSharpSyntaxWalker class that we looked at previously (see “CSharpSyntaxWalker”), except that each Visit* method accepts and returns a syntax node. By returning something other than was passed in, you can “rewrite” the syntax tree.

For instance, the following rewriter changes method declaration names to uppercase:

class MyRewriter : CSharpSyntaxRewriter
{
  public override SyntaxNode VisitMethodDeclaration
    (MethodDeclarationSyntax node)
  {
    // "Replace" the method's identifier with an uppercase version:
    return node.WithIdentifier (
      SyntaxFactory.Identifier (
        node.Identifier.LeadingTrivia,            // Preserve old trivia
        node.Identifier.Text.ToUpperInvariant(),
        node.Identifier.TrailingTrivia));         // Preserve old trivia
  }
}

Here’s how to use it:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() { Test(); }
  static void Test() {         }
}");

var rewriter = new MyRewriter();
var newRoot = rewriter.Visit (tree.GetRoot());
Console.WriteLine (newRoot.ToFullString());

// Output:
class Program
{
  static void MAIN() { Test(); }
  static void TEST() {         }
}

Notice that our call to Test() in the Main method did not get renamed, because we visited just member declarations and ignored invocations. To rename invocations reliably, we must be able to determine whether calls to Main() or Test() refer to members of the Program type, and not some other type. To do this, a syntax tree is not enough on its own; we also need a semantic model.
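One detail of rewriters is easy to miss: an overridden Visit* method that does not call its base implementation stops the traversal at that node, so nothing beneath it gets rewritten. The following sketch (UpperCaseRewriter and RewriterDemo are names we made up for illustration) uppercases class names as well as method names. It calls base.VisitClassDeclaration first so that the methods inside the class are still visited:

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class UpperCaseRewriter : CSharpSyntaxRewriter   // hypothetical name
{
  public override SyntaxNode VisitClassDeclaration (ClassDeclarationSyntax node)
  {
    // Let the base class visit the members first; skipping this call
    // would leave the methods inside the class untouched.
    var visited = (ClassDeclarationSyntax) base.VisitClassDeclaration (node);
    return visited.WithIdentifier (Upper (visited.Identifier));
  }

  public override SyntaxNode VisitMethodDeclaration (MethodDeclarationSyntax node)
    => node.WithIdentifier (Upper (node.Identifier));

  // Builds an uppercase identifier token, preserving the original trivia.
  static SyntaxToken Upper (SyntaxToken id) =>
    SyntaxFactory.Identifier (
      id.LeadingTrivia, id.Text.ToUpperInvariant(), id.TrailingTrivia);
}

static class RewriterDemo
{
  public static string Rewrite (string source)
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText (source).GetRoot();
    return new UpperCaseRewriter().Visit (root).ToFullString();
  }

  static void Main() =>
    Console.WriteLine (Rewrite ("class Program { static void Test() { } }"));
    // class PROGRAM { static void TEST() { } }
}
```

As in MyRewriter, the method override here does not call its base, which is fine as long as we don't need to rewrite anything inside method bodies.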

Compilations and Semantic Models

A compilation comprises syntax trees, references, and compilation options. It serves two purposes:

  • Allows compilation to a library or executable (the emit phase)

  • Exposes a semantic model that provides symbol information (obtained from binding)

The semantic model is essential in implementing features such as symbol renaming or offering code completion listings in an editor.

Creating a Compilation

Whether you’re interested in querying the semantic model or performing a full compilation, the first step is to create a CSharpCompilation, passing in the (simple) name of the assembly you wish to create:

var compilation = CSharpCompilation.Create ("test");

An assembly’s simple name is important even if you don’t plan to emit an assembly, because it forms part of the identity of the types inside the compilation.

By default, the compilation assumes that you want to create a library. You can specify a different kind of output (Windows executable, console executable, etc.) as follows:

compilation = compilation.WithOptions (
  new CSharpCompilationOptions (OutputKind.ConsoleApplication));

The CSharpCompilationOptions class has more than a dozen optional constructor parameters that correspond to the command-line options of the csc.exe tool. So, for instance, if you want to enable compiler optimizations and give your assembly a strong name, you would do this:

compilation = compilation.WithOptions (
  new CSharpCompilationOptions (OutputKind.ConsoleApplication,
    cryptoKeyFile:"myKeyFile.snk",
    optimizationLevel:OptimizationLevel.Release));

Next, we’ll add syntax trees. Each syntax tree corresponds to a “file” to be included in the compilation:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() => System.Console.WriteLine (""Hello"");
}");

compilation = compilation.AddSyntaxTrees (tree);

Finally, we need to add references. The simplest program will require a single reference to mscorlib.dll, which we can add as follows:

compilation = compilation.AddReferences (
  MetadataReference.CreateFromFile (typeof (int).Assembly.Location));

The call to MetadataReference.CreateFromFile reads the content of an assembly into memory, but not using ordinary reflection. Instead, it uses a high-performance portable assembly reader (available on NuGet) called System.Reflection.Metadata. The reader is side-effect free and does not load the assembly into the current application domain.

Warning

The PortableExecutableReference that you get back from MetadataReference.CreateFromFile can have a significant memory footprint, so be careful about holding onto references that you don’t need. Also, if you find yourself repeatedly creating references to the same assembly, a cache is worth considering (one that holds weak references is ideal).
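A minimal sketch of such a cache follows. ReferenceCache is a hypothetical helper of our own, not part of Roslyn; it keys on the file path exactly as given and never prunes dead entries, both of which a production version would probably want to revisit:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;

static class ReferenceCache   // hypothetical helper, not part of Roslyn
{
  static readonly Dictionary<string, WeakReference<MetadataReference>> _cache =
    new Dictionary<string, WeakReference<MetadataReference>>();

  public static MetadataReference Get (string path)
  {
    lock (_cache)
    {
      WeakReference<MetadataReference> weak;
      MetadataReference reference;

      // Reuse the cached reference if it hasn't been garbage-collected:
      if (_cache.TryGetValue (path, out weak) &&
          weak.TryGetTarget (out reference))
        return reference;

      reference = MetadataReference.CreateFromFile (path);
      _cache [path] = new WeakReference<MetadataReference> (reference);
      return reference;
    }
  }
}

static class ReferenceCacheDemo
{
  static void Main()
  {
    string path = typeof (int).Assembly.Location;
    MetadataReference first = ReferenceCache.Get (path);
    MetadataReference second = ReferenceCache.Get (path);
    Console.WriteLine (ReferenceEquals (first, second));   // True
  }
}
```

Because the dictionary holds only weak references, the cache itself never keeps a reference alive; the memory is reclaimed once no compilation is using it.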

You can do everything in a single step by calling the overload of CSharpCompilation.Create that takes syntax trees, references, and options. Or you can do it fluently in a single expression, too:

var compilation = CSharpCompilation.Create ("...")
  .WithOptions (...)
  .AddSyntaxTrees (...)
  .AddReferences (...);

Diagnostics

A compilation may generate errors and warnings, even if the syntax trees are error-free. Examples include forgetting to import a namespace, a typo when referring to a type or member name, and type parameter inference failing. You can get the errors and warnings by calling GetDiagnostics on the compilation object. Any syntax errors will be included, too.
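To illustrate, here is a short sketch that compiles code which parses cleanly but fails to bind, and then lists the IDs of the resulting errors. The DiagnosticsDemo class and its GetErrorIds method are names of our own choosing, and we request a library explicitly so that the absence of a Main method isn't reported:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class DiagnosticsDemo   // hypothetical name
{
  public static string[] GetErrorIds (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var compilation = CSharpCompilation.Create ("test")
      .WithOptions (new CSharpCompilationOptions (
         OutputKind.DynamicallyLinkedLibrary))
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    // Keep just the errors, ignoring warnings and informational messages:
    return compilation.GetDiagnostics()
      .Where (d => d.Severity == DiagnosticSeverity.Error)
      .Select (d => d.Id)
      .ToArray();
  }

  static void Main()
  {
    // Syntactically valid, but a string cannot be assigned to an int:
    string source = "class C { void M() { int x = \"hello\"; } }";

    foreach (string id in GetErrorIds (source))
      Console.WriteLine (id);                  // CS0029
  }
}
```

Each Diagnostic also exposes a Location and a GetMessage method, which an editor can use to underline the offending span and show a tooltip.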

Emitting an Assembly

Creating an output assembly is simply a matter of calling Emit:

EmitResult result = compilation.Emit (@"c:\temp\test.exe");
Console.WriteLine (result.Success);

If result.Success is false, EmitResult also has a Diagnostics property to indicate the errors that occurred during emission (this also includes diagnostics from the previous stages). If Emit fails due to a file I/O error, it will throw an exception rather than generate error codes.

The Emit method also lets you specify a .pdb file path (for debug information) and an XML documentation file path.
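Emit can also write to a stream rather than a file, which is handy if you want to compile and run code without touching the disk. The following sketch emits a small library into a MemoryStream, loads it, and invokes a method via Reflection. The Calc class and the EmitDemo wrapper are made up for this illustration:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;

static class EmitDemo   // hypothetical name
{
  public static int CompileAndAdd (int a, int b)
  {
    var tree = CSharpSyntaxTree.ParseText (
      "public static class Calc " +
      "{ public static int Add (int x, int y) => x + y; }");

    var compilation = CSharpCompilation.Create ("calc")
      .WithOptions (new CSharpCompilationOptions (
         OutputKind.DynamicallyLinkedLibrary))
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    using (var stream = new MemoryStream())
    {
      EmitResult result = compilation.Emit (stream);

      if (!result.Success)
        throw new InvalidOperationException (string.Join (
          Environment.NewLine,
          result.Diagnostics.Select (d => d.ToString())));

      // Load the emitted assembly from memory and call Calc.Add:
      Assembly assembly = Assembly.Load (stream.ToArray());
      MethodInfo add = assembly.GetType ("Calc").GetMethod ("Add");
      return (int) add.Invoke (null, new object[] { a, b });
    }
  }

  static void Main() => Console.WriteLine (CompileAndAdd (2, 3));   // 5
}
```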

Querying the Semantic Model

Calling GetSemanticModel on a compilation returns the semantic model for a syntax tree:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() => System.Console.WriteLine (123);
}");

var compilation = CSharpCompilation.Create ("test")
  .AddReferences (
     MetadataReference.CreateFromFile (typeof(int).Assembly.Location))
  .AddSyntaxTrees (tree);

SemanticModel model = compilation.GetSemanticModel (tree);

(The reason for needing to specify a tree is that a compilation can contain multiple trees.)

You might expect a semantic model to be similar to a syntax tree, but with more properties and methods and a more detailed structure. This is not the case, and there is no overarching DOM associated with the semantic model. Instead, you’re given a set of methods to call to obtain semantic information about a particular position or node in the syntax tree.

This means that you can’t “explore” a semantic model like you would a syntax tree, and using it is rather like playing 20 Questions: the challenge is figuring out the right questions to ask. There are nearly 50 methods and extension methods; in this section, we’ll cover some of the most commonly used methods, in particular, those that demonstrate the principles of using the semantic model.

Following on from our previous example, we could ask for symbol information on the “WriteLine” identifier as follows:

var writeLineNode = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "WriteLine").Parent;

SymbolInfo symbolInfo = model.GetSymbolInfo (writeLineNode);
Console.WriteLine (symbolInfo.Symbol);   // System.Console.WriteLine(int)

SymbolInfo is a wrapper for symbols, whose nuances we’ll discuss shortly. We’ll start first with symbols.

Symbols

In the syntax tree, names such as “System”, “Console”, and “WriteLine” are parsed as identifiers (IdentifierNameSyntax node). Identifiers have little meaning, and the syntactic parser does no work on “understanding” them other than to distinguish them from contextual keywords.

The semantic model is able to transform identifiers into symbols, which have type information (the output of the binding phase).

All symbols implement the ISymbol interface, although there are more specific interfaces for each kind of symbol. In our example, “System”, “Console”, and “WriteLine” map to symbols of the following types:

"System"      INamespaceSymbol
"Console"     INamedTypeSymbol
"WriteLine"   IMethodSymbol

Some symbol types, such as IMethodSymbol, have a conceptual analog in the System.Reflection namespace (MethodInfo, in this case), whereas other symbol types, such as INamespaceSymbol, do not. This is because the Roslyn type system exists for the benefit of the compiler, whereas the Reflection type system exists for the benefit of the CLR (after the source code has melted away).

Nonetheless, working with ISymbol types is similar in many ways to using the Reflection API we described in Chapter 19. Extending our previous example:

ISymbol symbol = model.GetSymbolInfo (writeLineNode).Symbol;

Console.WriteLine (symbol.Name);                   // WriteLine
Console.WriteLine (symbol.Kind);                   // Method
Console.WriteLine (symbol.IsStatic);               // True
Console.WriteLine (symbol.ContainingType.Name);    // Console

var method = (IMethodSymbol) symbol;
Console.WriteLine (method.ReturnType.ToString());  // void

The output of the last line illustrates a subtle difference with Reflection. Notice that “void” is in lowercase, which is C# nomenclature (Reflection is language-agnostic). Similarly, calling ToString() on the INamedTypeSymbol for System.Int32 returns “int”. Here’s something else you can’t do with Reflection:

Console.WriteLine (symbol.Language);                // C#

Note

With the syntax trees API, the classes for syntax nodes differ for C# and VB (although they share an abstract SyntaxNode base type). This makes sense because the languages have a different lexical structure. In contrast, ISymbol and its derived interfaces are shared between C# and VB. However, their internal concrete implementations are specific to each language, and the output from their methods and properties reflects language-specific differences.

We can also ask the symbol where it came from:

var location = symbol.Locations.First();
Console.WriteLine (location.Kind);                     // MetadataFile
Console.WriteLine (location.MetadataModule
                   == compilation.References.Single());  // True

If the symbol was defined in our own source code (i.e., a syntax tree), the SourceTree property will return that tree, and SourceSpan will return its location in the tree. Our WriteLine symbol came from metadata, so neither applies:

Console.WriteLine (location.SourceTree == null);    // True
Console.WriteLine (location.SourceSpan);            // [0..0)

A partial type may have multiple definitions, in which case it will have multiple Locations.

The following query returns all the overloads of WriteLine:

symbol.ContainingType.GetMembers ("WriteLine").OfType<IMethodSymbol>()

You can also call ToDisplayParts on a symbol. This returns a collection of “parts” that make up the full name; in our case, System.Console.WriteLine(int) is comprised of four symbols interspersed with punctuation.
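Here's a brief sketch of what the parts look like. To keep the example independent of which assembly defines Console, we use a method declared in our own source (the names N, C, M, and DisplayPartsDemo are arbitrary):

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class DisplayPartsDemo   // hypothetical name
{
  public static ISymbol GetMethodSymbol()
  {
    var tree = CSharpSyntaxTree.ParseText (
      "namespace N { class C { void M (int x) { } } }");

    var compilation = CSharpCompilation.Create ("test")
      .WithOptions (new CSharpCompilationOptions (
         OutputKind.DynamicallyLinkedLibrary))
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    var method = tree.GetRoot().DescendantNodes()
      .OfType<MethodDeclarationSyntax>().Single();

    return model.GetDeclaredSymbol (method);
  }

  static void Main()
  {
    ISymbol symbol = GetMethodSymbol();

    // Each part carries its text and a kind (namespace name, class name,
    // punctuation, keyword, and so on):
    foreach (SymbolDisplayPart part in symbol.ToDisplayParts())
      Console.WriteLine (part.Kind + ": " + part);
  }
}
```

The kind of each part is what lets an editor colorize a tooltip: a class name can be rendered differently from a keyword or a punctuation mark.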

SymbolInfo

If you’re writing code completion for an editor, you’ll need to obtain symbols for code that’s incomplete or incorrect. For instance, consider the following incomplete code:

System.Console.WriteLine(

Because the WriteLine method is overloaded, it’s impossible to match to a single ISymbol. Instead, we want to present options to the user. To deal with this, the semantic model’s GetSymbolInfo method returns a SymbolInfo struct, which has the following properties:

ISymbol Symbol
ImmutableArray<ISymbol> CandidateSymbols
CandidateReason CandidateReason

If there’s an error or ambiguity, the Symbol property returns null, and CandidateSymbols returns a collection comprising the best matches. The CandidateReason property returns an enum telling you what went wrong.
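The following sketch shows these properties in action. We call an overloaded method F with no arguments, so that overload resolution fails (the class C, its F overloads, and SymbolInfoDemo are invented for this example):

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class SymbolInfoDemo   // hypothetical name
{
  public static SymbolInfo GetInfo()
  {
    var tree = CSharpSyntaxTree.ParseText (@"class C
{
  static void F (int x) { }
  static void F (string s) { }
  static void M() { F(); }      // No overload takes zero arguments
}");

    var compilation = CSharpCompilation.Create ("test")
      .WithOptions (new CSharpCompilationOptions (
         OutputKind.DynamicallyLinkedLibrary))
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    var call = tree.GetRoot().DescendantNodes()
      .OfType<InvocationExpressionSyntax>().Single();

    return model.GetSymbolInfo (call);
  }

  static void Main()
  {
    SymbolInfo info = GetInfo();
    Console.WriteLine (info.Symbol == null);      // True
    Console.WriteLine (info.CandidateReason);     // OverloadResolutionFailure

    foreach (ISymbol candidate in info.CandidateSymbols)
      Console.WriteLine (candidate);              // The two F overloads
  }
}
```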

Note

To obtain error and warning information for a section of code, you can also call GetDiagnostics on a semantic model, specifying a TextSpan. Calling GetDiagnostics with no argument is equivalent to calling the same method on the CSharpCompilation object.

Symbol accessibility

ISymbol has a DeclaredAccessibility property that indicates whether the symbol is public, protected, internal, and so on. However, this isn’t sufficient to determine whether a given symbol is accessible at a particular position in your source code. Local variables, for instance, have a lexically limited scope, and a protected class member is accessible from source code positions within its type or a derived type. To help with this, SemanticModel has an IsAccessible method:

bool canAccess = model.IsAccessible (42, someSymbol);

This returns true if someSymbol can be accessed at offset 42 in the source code.
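In practice, you'd derive the offset from a node or token rather than hardcode it. The sketch below asks whether a private field of class A is accessible from inside a method of A, and then from inside a method of an unrelated class B. All of the names here (A, B, secret, AccessibilityDemo) are made up for illustration:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class AccessibilityDemo   // hypothetical name
{
  public static bool[] Check()
  {
    var tree = CSharpSyntaxTree.ParseText (@"class A
{
  private int secret;
  void InA() { }
}
class B
{
  void InB() { }
}");

    var compilation = CSharpCompilation.Create ("test")
      .WithOptions (new CSharpCompilationOptions (
         OutputKind.DynamicallyLinkedLibrary))
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    ISymbol secret = compilation.GetTypeByMetadataName ("A")
      .GetMembers ("secret").Single();

    // Method declarations in document order: InA, then InB.
    var methods = tree.GetRoot().DescendantNodes()
      .OfType<MethodDeclarationSyntax>().ToArray();

    // Positions just after each method body's opening brace:
    int insideA = methods [0].Body.OpenBraceToken.Span.End;
    int insideB = methods [1].Body.OpenBraceToken.Span.End;

    return new[]
    {
      model.IsAccessible (insideA, secret),
      model.IsAccessible (insideB, secret)
    };
  }

  static void Main()
  {
    bool[] results = Check();
    Console.WriteLine (results [0]);   // True
    Console.WriteLine (results [1]);   // False
  }
}
```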

Declared symbols

If you call GetSymbolInfo on a type or member declaration, you’ll get no symbols back. For instance, suppose we want the symbol for our Main method:

var mainMethod = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "Main").Parent;

SymbolInfo symbolInfo = model.GetSymbolInfo (mainMethod);
Console.WriteLine (symbolInfo.Symbol == null);              // True
Console.WriteLine (symbolInfo.CandidateSymbols.Length);     // 0

Note

This applies not just to type/member declarations, but any node where you’re introducing a new symbol rather than consuming an existing symbol.

To obtain the symbol, we must instead call GetDeclaredSymbol:

ISymbol symbol = model.GetDeclaredSymbol (mainMethod);

Unlike GetSymbolInfo, GetDeclaredSymbol either succeeds or it doesn’t. (If it fails, it will be because it can’t find a valid declaration node.)

To give another example, suppose our Main method is as follows:

static void Main()
{
  int xyz = 123;
}

We can determine the type of xyz as follows:

SyntaxNode variableDecl = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "xyz").Parent;

var local = (ILocalSymbol) model.GetDeclaredSymbol (variableDecl);
Console.WriteLine (local.Type.ToString());             // int
Console.WriteLine (local.Type.BaseType.ToString());    // System.ValueType

TypeInfo

Sometimes you need type information about an expression or literal for which there’s no explicit symbol. Consider the following:

var now = System.DateTime.Now;
System.Console.WriteLine (now - now);

To determine the type of now - now, we call GetTypeInfo on the semantic model:

SyntaxNode binaryExpr = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "-").Parent;

TypeInfo typeInfo = model.GetTypeInfo (binaryExpr);

TypeInfo has two properties, Type and ConvertedType. The latter indicates the type after any implicit conversions:

Console.WriteLine (typeInfo.Type);             // System.TimeSpan
Console.WriteLine (typeInfo.ConvertedType);    // object

Because Console.WriteLine is overloaded to accept an object but not a TimeSpan, an implicit conversion to object took place, which manifested in typeInfo.ConvertedType.
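The same distinction shows up with numeric conversions. In the following sketch (the class names are arbitrary), the expression 1 + 2 has type int, but it initializes a double, so ConvertedType reports the implicit conversion:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class TypeInfoDemo   // hypothetical name
{
  public static TypeInfo GetInfo()
  {
    var tree = CSharpSyntaxTree.ParseText (
      "class C { void M() { double d = 1 + 2; } }");

    var compilation = CSharpCompilation.Create ("test")
      .WithOptions (new CSharpCompilationOptions (
         OutputKind.DynamicallyLinkedLibrary))
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    // Find the "1 + 2" expression:
    var binaryExpr = tree.GetRoot().DescendantNodes()
      .OfType<BinaryExpressionSyntax>().Single();

    return model.GetTypeInfo (binaryExpr);
  }

  static void Main()
  {
    TypeInfo typeInfo = GetInfo();
    Console.WriteLine (typeInfo.Type);            // int
    Console.WriteLine (typeInfo.ConvertedType);   // double
  }
}
```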

Looking up symbols

A powerful feature of the semantic model is the ability to ask for all symbols in scope at a particular point in the source code. The result is the basis for IntelliSense listings, when the user requests a list of available symbols.

To obtain the listing, simply call LookupSymbols, with the desired source code offset. To give a complete example:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main()
  {
    int x = 123, y = 234;

  }
}");

CSharpCompilation compilation = CSharpCompilation.Create ("test")
  .AddReferences (
    MetadataReference.CreateFromFile (typeof(int).Assembly.Location))
  .AddSyntaxTrees (tree);

SemanticModel model = compilation.GetSemanticModel (tree);

// Look for available symbols at start of 6th line:
int index = tree.GetText().Lines[5].Start;

foreach (ISymbol symbol in model.LookupSymbols (index))
  Console.WriteLine (symbol.ToString());

Here’s the result:

y
x
Program.Main()
object.ToString()
object.Equals(object)
object.Equals(object, object)
object.ReferenceEquals(object, object)
object.GetHashCode()
object.GetType()
object.~Object()
object.MemberwiseClone()
Program
Microsoft
System
Windows

(If we imported the System namespace, we’d see hundreds more symbols for types in that namespace.)
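An editor rarely wants the whole list at once. The sketch below narrows the result in two ways: by filtering on the kind of symbol (here, just local variables), and by passing LookupSymbols a name so that only symbols with that name come back. The LookupDemo wrapper is our own invention:

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class LookupDemo   // hypothetical name
{
  public static string[] LocalsInScope()
  {
    var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main()
  {
    int x = 123, y = 234;
  }
}");

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (
         MetadataReference.CreateFromFile (typeof (int).Assembly.Location))
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    // Use the position just after the local declaration statement:
    int position = tree.GetRoot().DescendantNodes()
      .OfType<LocalDeclarationStatementSyntax>().Single().Span.End;

    // Restricting by name returns just the symbols called "x":
    Console.WriteLine (model.LookupSymbols (position, name: "x").Length);

    // Filtering by symbol kind keeps only the local variables:
    return model.LookupSymbols (position)
      .OfType<ILocalSymbol>()
      .Select (s => s.Name)
      .OrderBy (n => n, StringComparer.Ordinal)
      .ToArray();
  }

  static void Main() =>
    Console.WriteLine (string.Join (", ", LocalsInScope()));   // x, y
}
```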

Example: Renaming a Symbol

To illustrate the features we’ve covered, we’ll write a method to rename a symbol, which is robust to the most common use-cases. In particular:

  • The symbol can be a type, member, local variable, range variable, or loop variable.

  • You can specify the symbol from either its use or declaration.

  • (In the case of a class or struct), it will rename the static & instance constructors.

  • (In the case of a class), it will rename the finalizer (destructor).

For brevity, we’ll omit some checks, such as ensuring that the new name is not already in use and that the symbol isn’t an edge-case for which the rename will fail. Our method will consider just a single syntax tree and so will have the following signature:

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)

One obvious way to implement this is to subclass CSharpSyntaxRewriter. However, a more elegant and flexible approach is to have RenameSymbol call a lower-level method that returns the text spans to be renamed:

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)

This allows an editor to call GetRenameSpans directly and apply just the changes (within an Undo transaction), avoiding the loss of editor state that might otherwise result in replacing the entire text.

This makes RenameSymbol a relatively simple wrapper around GetRenameSpans. We can use SourceText’s WithChanges method to apply a sequence of text changes:

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)
{
  IEnumerable<TextSpan> renameSpans = GetRenameSpans (model, token);

  SourceText newSourceText = model.SyntaxTree.GetText().WithChanges (
    renameSpans.OrderBy (span => span)
               .Select (span => new TextChange (span, newName)));

  return model.SyntaxTree.WithChangedText (newSourceText);
}

WithChanges throws an exception unless the changes are in order; this is why we call OrderBy. We sort the spans rather than the TextChange values because TextSpan is comparable, whereas TextChange is not.
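To see WithChanges in isolation, consider the following sketch, which replaces both occurrences of x in a one-line text. The text and replacement name are arbitrary; the point to notice is that every span refers to a position in the original text, so we don't need to adjust later spans to account for earlier replacements:

```csharp
using System;
using Microsoft.CodeAnalysis.Text;

static class WithChangesDemo   // hypothetical name
{
  public static string Apply()
  {
    SourceText text = SourceText.From ("x = x + 1;");

    // Both spans are expressed as positions in the ORIGINAL text,
    // and are listed in ascending order:
    var changes = new[]
    {
      new TextChange (new TextSpan (0, 1), "total"),
      new TextChange (new TextSpan (4, 1), "total")
    };

    return text.WithChanges (changes).ToString();
  }

  static void Main() => Console.WriteLine (Apply());   // total = total + 1;
}
```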

Now we must write GetRenameSpans. The first step is to find the symbol corresponding to the token we want to rename. The token may be part of either a declaration or usage, so we’ll first call GetSymbolInfo, and if the result is null, call GetDeclaredSymbol:

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)
{
  var node = token.Parent;

  ISymbol symbol = model.GetSymbolInfo (node).Symbol
                ?? model.GetDeclaredSymbol (node);

  if (symbol == null) return null;   // No symbol to rename.

Next, we need to find the symbol definitions. We can get this from the symbol’s Locations property. (Our consideration of multiple locations makes us robust to the scenario of partial classes and methods, although for the former to be useful, we would need to expand the example to work with multiple syntax trees.)

var definitions =
  from location in symbol.Locations
  where location.SourceTree == node.SyntaxTree
  select location.SourceSpan;

Now we need to find usages of the symbol. For this, we start by looking for descendant tokens whose name matches the symbol’s name, as this is a fast way to weed out most tokens. Then we can call GetSymbolInfo on the token’s parent node and see whether it matches the symbol we want to rename:

var usages =
  from t in model.SyntaxTree.GetRoot().DescendantTokens()
  where t.Text == symbol.Name
  let s = model.GetSymbolInfo (t.Parent).Symbol
  where s == symbol
  select t.Span;

Note

Binding-related operations, such as asking for symbol information, tend to be slower than operations that consider just text or syntax trees. This is because the process of binding may require searching for types in assemblies, applying type inference rules, and checking for extension methods.

If the symbol is something other than a named type (local variable, range variable, etc.), our job is done and we can return the definitions plus usages:

if (symbol.Kind != SymbolKind.NamedType)
  return definitions.Concat (usages);

If the symbol is a named type, we need to rename its constructors and destructor, if present. To do so, we enumerate the descendant nodes, looking for type declarations whose name matches the one we want to rename. Then we get its declared symbol, and if it matches the one we’re renaming, we locate its constructor and destructor methods, returning the spans of their identifiers if present:

  var structors =
    from type in model.SyntaxTree.GetRoot().DescendantNodes()
                                           .OfType<TypeDeclarationSyntax>()
    where type.Identifier.Text == symbol.Name
    let declaredSymbol = model.GetDeclaredSymbol (type)
    where declaredSymbol == symbol
    from method in type.Members
    let constructor = method as ConstructorDeclarationSyntax
    let destructor = method as DestructorDeclarationSyntax
    where constructor != null || destructor != null
    let identifier = constructor?.Identifier ?? destructor.Identifier
    select identifier.Span;

  return definitions.Concat (usages).Concat (structors);
}

Here’s the complete listing, along with an example of how to use it:

void Demo()
{
  var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static Program() {}
  public Program() {}

  static void Main()
  {
    Program p = new Program();
    p.Foo();
  }

  void Foo() => Bar();   // Instance methods, so that p.Foo() binds
  void Bar() => Foo();
}
");

  var compilation = CSharpCompilation.Create ("test")
    .AddReferences (
       MetadataReference.CreateFromFile (typeof(int).Assembly.Location))
    .AddSyntaxTrees (tree);

  var model = compilation.GetSemanticModel (tree);

  var tokens = tree.GetRoot().DescendantTokens();

  // Rename the Program class to Program2:
  SyntaxToken program = tokens.First (t => t.Text == "Program");
  Console.WriteLine (RenameSymbol (model, program, "Program2").ToString());

  // Rename the Foo method to Foo2:
  SyntaxToken foo = tokens.Last (t => t.Text == "Foo");
  Console.WriteLine (RenameSymbol (model, foo, "Foo2").ToString());

  // Rename the p local variable to p2:
  SyntaxToken p = tokens.Last (t => t.Text == "p");
  Console.WriteLine (RenameSymbol (model, p, "p2").ToString());
}

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)
{
  IEnumerable<TextSpan> renameSpans =
    GetRenameSpans (model, token).OrderBy (s => s);

  SourceText newSourceText = model.SyntaxTree.GetText().WithChanges (
    renameSpans.Select (s => new TextChange (s, newName)));

  return model.SyntaxTree.WithChangedText (newSourceText);
}

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)
{
  var node = token.Parent;

  ISymbol symbol =
    model.GetSymbolInfo (node).Symbol ??
    model.GetDeclaredSymbol (node);

  if (symbol == null) return null;   // No symbol to rename.

  var definitions =
    from location in symbol.Locations
    where location.SourceTree == node.SyntaxTree
    select location.SourceSpan;

  var usages =
    from t in model.SyntaxTree.GetRoot().DescendantTokens ()
    where t.Text == symbol.Name
    let s = model.GetSymbolInfo (t.Parent).Symbol
    where s == symbol
    select t.Span;

  if (symbol.Kind != SymbolKind.NamedType)
    return definitions.Concat (usages);

  var structors =
    from type in model.SyntaxTree.GetRoot().DescendantNodes()
                                           .OfType<TypeDeclarationSyntax>()
    where type.Identifier.Text == symbol.Name
    let declaredSymbol = model.GetDeclaredSymbol (type)
    where declaredSymbol == symbol
    from method in type.Members
    let constructor = method as ConstructorDeclarationSyntax
    let destructor = method as DestructorDeclarationSyntax
    where constructor != null || destructor != null
    let identifier = constructor?.Identifier ?? destructor.Identifier
    select identifier.Span;

  return definitions.Concat (usages).Concat (structors);
}