Chapter 27. The Roslyn Compiler

The C# compiler is itself written in C# and available as a set of modular libraries known as Roslyn. By referencing these libraries, you can utilize the compiler’s functionality in many ways besides compiling source code to an assembly. For example, you can write static code analysis and refactoring tools, editors with syntax highlighting and code completion, and Visual Studio plug-ins that understand C# code.

You can download the Roslyn libraries from NuGet, and there are packages for both C# and Visual Basic. Because both languages share some architecture, there are common dependencies. The NuGet package ID for the C# compiler libraries is Microsoft.CodeAnalysis.CSharp.

Roslyn’s GitHub site also includes documentation, examples, and walkthroughs that demonstrate code analysis and refactoring.

Roslyn Architecture

The Roslyn architecture separates compilation into three phases:

  1. Parsing code into syntax trees (the syntactic layer)

  2. Binding identifiers to symbols (the semantic layer)

  3. Emitting Intermediate Language (IL)

In the first phase, a parser reads C# code and outputs syntax trees. A syntax tree is a Document Object Model (DOM) that describes source code in tree structure.

The second phase is the one in which C#’s static binding takes place. Assembly references are read, and the compiler determines, for instance, that “Console” refers to System.Console in System.Console.dll. Overload resolution and type inference are a part of this, too.

The third phase produces the output assembly. If you plan to use Roslyn for code analysis or refactoring, you won’t use this functionality.

Visual Studio’s editor uses the output of the syntactic layer to color keywords, strings, comments, and disabled code (in blue, red, green, and gray, respectively), whereas it uses the output of the semantic layer to color resolved type names (in turquoise).

Workspaces

In this chapter, we describe the compiler and the features it exposes. It’s worth keeping in mind that there are additional “layers” above the compiler, including workspaces and features.

The workspaces layer is shipped in the Microsoft.CodeAnalysis.CSharp.Workspaces NuGet package and provides APIs to work with solutions, projects, and documents.

The features layer is shipped in Microsoft.CodeAnalysis.CSharp.Features and includes numerous APIs for code analysis and refactoring.

Scripting

With the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package, you can write code such as the following:

int result = (int) await CSharpScript.EvaluateAsync ("1 + 2");

Behind the scenes, the scripting API compiles “1 + 2” into a program that it then executes, so it’s less efficient than the solution that we described in Chapter 20 (see “Interoperating with Dynamic Languages”). There are more examples on how to use the Roslyn scripting API at https://github.com/dotnet/roslyn/wiki/Scripting-API-Samples.

Syntax Trees

A syntax tree is a DOM for source code. The syntax tree API is completely separate from the System.Linq.Expressions API we discussed in “Expression Trees” in Chapter 8, although the two have conceptual similarities. Both APIs can represent C# expressions in a DOM; however, a Roslyn syntax tree has the following unique features:

  • It can represent the entire C# language, not just expressions.

  • It can include comments, whitespace, and other “trivia” and can round-trip with full fidelity back to the original source code.

  • It comes with a ParseText method that parses source code into a syntax tree.

Conversely, the System.Linq.Expressions API has the following unique features:

  • It’s built into .NET Core, and the C# compiler itself is programmed to emit System.Linq.Expressions types when it encounters a lambda expression with an assignment conversion to Expression<T>.

  • It has a fast and lightweight Compile method that emits a delegate. In contrast, the semantic layer that compiles Roslyn syntax trees offers only the heavyweight option of compiling a complete program into an assembly.

Something that both APIs have in common is that their trees are immutable: none of a tree’s elements can be altered after it’s created. This means that applications such as Visual Studio and LINQPad must create a new syntax tree each time you press a key in the editor in order to update syntax highlighting and autocompletion services. This is less expensive than it sounds because the new syntax tree is able to reuse most of the elements of the old (see “Transforming a Syntax Tree”). And knowing that an object cannot change makes the API simpler to work with. It also allows for easier and faster parallelization because multithreaded code can safely access all parts of a syntax tree without locks.

SyntaxTree Structure

A SyntaxTree comprises three main elements:

Nodes
(Abstract SyntaxNode class) Represents C# constructs such as expressions, statements, and method declarations. Nodes always have at least one child, so a node can never be a leaf in the tree. Nodes can have both nodes and tokens as children.
Tokens
(SyntaxToken struct) Represents the identifiers, keywords, operators, and punctuation that make up your source code. The only kind of children that tokens can have is optional leading and trailing trivia. A token’s parent is always a node.
Trivia
(SyntaxTrivia struct) Trivia is for whitespace, comments, preprocessor directives, and code that’s inactive due to conditional compilation. Trivia is always associated with the token that’s immediately to its left or right, and is exposed via that token’s TrailingTrivia and LeadingTrivia properties, respectively.

Figure 27-1 shows the structure of the following code, with nodes in black, tokens in gray, and trivia in white:

Console.WriteLine ("Hello");
Figure 27-1. Syntax trees

SyntaxNode is abstract and has a C#-specific subclass for each kind of syntactic element, such as VariableDeclarationSyntax or TryStatementSyntax.

SyntaxToken/SyntaxTrivia are structs, and so a single type represents every kind of token/trivia. To distinguish different kinds of token or trivia, you must use the RawKind property or Kind extension method (which we explain in the following section).

Note

The best way to explore a syntax tree is with a visualizer. Visual Studio has a downloadable visualizer for use with its debugger, and LINQPad has one built in. LINQPad displays the visualizer automatically for the code in the text editor when you click the Tree button in the output window. You can also ask LINQPad to display a visualizer for a syntax tree that you’ve created programmatically by calling DumpSyntaxTree on the tree (or DumpSyntaxNode on a node).

Common properties and methods

Nodes, tokens, and trivia have a number of important common properties and methods:

SyntaxTree property
Returns the syntax tree to which the object belongs.
Span property
Returns the object’s position in source code (see “Finding a child by its offset”).
Kind extension method
Returns a SyntaxKind enum that classifies the node, token, or trivia into one of several hundred values (e.g., IntKeyword, CommaToken, and WhitespaceTrivia). The same SyntaxKind enum covers nodes, tokens, and trivia.
ToString method
Returns the text (source code) for the node, token, or trivia. For tokens, the Text property is equivalent.
GetDiagnostics method
Returns errors or warnings generated during parsing.
IsEquivalentTo method
Returns true if the object is identical to another node, token, or trivia instance. Whitespace differences are significant (to ignore whitespace, call NormalizeWhitespace before comparing).
Note

Nodes and tokens also have a FullSpan property and ToFullString method. These take into account trivia, whereas Span and ToString do not.

The Kind extension method is a shortcut for casting the RawKind property, which is of type int, to Microsoft.CodeAnalysis.CSharp.SyntaxKind. The reason for not simply having a Kind property of type SyntaxKind is that the token and trivia types are also used in Visual Basic syntax trees, which have a different enum type for SyntaxKind.
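To make the relationship between RawKind and Kind concrete, the following sketch obtains the kind of a token both ways. The KindDemo class and its method names are our own and purely illustrative:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class KindDemo
{
  // Returns the kind of the first token in the given source code,
  // obtained by casting the language-neutral RawKind by hand.
  public static SyntaxKind FirstTokenKindViaRawKind (string code)
  {
    SyntaxToken token = CSharpSyntaxTree.ParseText (code)
                                        .GetRoot().GetFirstToken();
    return (SyntaxKind) token.RawKind;
  }

  // The same query, using the C#-specific Kind extension method.
  public static SyntaxKind FirstTokenKind (string code) =>
    CSharpSyntaxTree.ParseText (code).GetRoot().GetFirstToken().Kind();
}
```

For the input "class Test {}", both methods return SyntaxKind.ClassKeyword.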

Obtaining a Syntax Tree

The static ParseText method on CSharpSyntaxTree parses C# code into a SyntaxTree:

SyntaxTree tree = CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}");

Console.WriteLine (tree.ToString());

tree.DumpSyntaxTree();    // Displays Syntax Tree Visualizer in LINQPad

To run this in a Visual Studio project, install the Microsoft.CodeAnalysis.CSharp NuGet package, and import the following namespaces:

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

You can optionally pass in a CSharpParseOptions object to specify a C# language version, preprocessor symbols, and a DocumentationMode to indicate whether XML comments should be parsed (see “Structured trivia”). There’s also an option to specify a SourceCodeKind. Choosing Script instructs the parser to accept a single expression or statement(s) instead of requiring an entire program (supported in Roslyn version 2 and later).
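As a rough illustration of parse options, the following sketch parses the same text with and without a preprocessor symbol defined. The ParseOptionsDemo class and the TRACE_CALLS symbol are hypothetical names that we chose for the example:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class ParseOptionsDemo
{
  const string Code = @"
#if TRACE_CALLS
class Tracer {}
#endif
class Program {}";

  // Parses Code with the given preprocessor symbols defined and
  // returns the names of the classes that the parser actually sees.
  public static string[] ActiveClassNames (params string[] symbols)
  {
    var options = CSharpParseOptions.Default
      .WithLanguageVersion (LanguageVersion.Latest)
      .WithDocumentationMode (DocumentationMode.Parse)
      .WithPreprocessorSymbols (symbols);

    SyntaxTree tree = CSharpSyntaxTree.ParseText (Code, options);

    return tree.GetRoot().DescendantNodes()
               .OfType<ClassDeclarationSyntax>()
               .Select (c => c.Identifier.Text)
               .ToArray();
  }
}
```

Calling ActiveClassNames() yields just Program, whereas ActiveClassNames ("TRACE_CALLS") yields Tracer and Program. Each With* call returns a new options object rather than modifying CSharpParseOptions.Default.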

Another way to obtain a syntax tree is to call CSharpSyntaxTree.Create, passing in an object graph of nodes and tokens. We describe how to create these objects in “Transforming a Syntax Tree”.

After parsing a tree, you can obtain errors and warnings by calling GetDiagnostics. (You can also call this method on a specific node or token.)

Note

If the parse resulted in unexpected errors, the tree’s structure may not be as you expect. For this reason, it’s worth calling GetDiagnostics before proceeding further.

A nice feature is that a tree with errors will round-trip back to the original text (with the same errors). In such cases, the parser does its best to provide a syntax tree that’s useful to the semantic layer, creating “phantom nodes” if necessary. This allows tools such as code completion to work with incomplete code. (You can determine whether a node is phantom by checking the IsMissing property.)

Calling GetDiagnostics on the syntax tree we created in the last section indicates no errors, despite having called Console.WriteLine without importing the System namespace. This is a good example of syntactic versus semantic parsing: our program is syntactically correct, and our error will not manifest until we create a compilation, add assembly references, and query the semantic model, where binding takes place.
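The following sketch shows what this looks like for code that is syntactically broken. We parse a class declaration whose closing brace is missing, and then look for errors and for tokens that the parser invented. The BrokenCodeDemo class is a hypothetical helper:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class BrokenCodeDemo
{
  // The closing brace is deliberately missing.
  public const string Code = "class Test {";

  public static SyntaxTree Parse() => CSharpSyntaxTree.ParseText (Code);

  // True if parsing reported at least one error.
  public static bool HasErrors (SyntaxTree tree) =>
    tree.GetDiagnostics().Any (d => d.Severity == DiagnosticSeverity.Error);

  // The zero-width tokens that the parser added to complete the tree.
  public static SyntaxToken[] MissingTokens (SyntaxTree tree) =>
    tree.GetRoot().DescendantTokens().Where (t => t.IsMissing).ToArray();
}
```

Here, HasErrors returns true and MissingTokens includes a CloseBraceToken, yet calling ToString on the tree still returns the original text unchanged.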

Traversing and Searching a Tree

A SyntaxTree acts as a wrapper for the tree structure. It has a reference to a single root node, which you obtain by calling GetRoot:

var tree = CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}");

SyntaxNode root = tree.GetRoot();

The root node of a C# program is a CompilationUnitSyntax:

Console.WriteLine (root.GetType().Name);   // CompilationUnitSyntax

Traversing children

SyntaxNode exposes LINQ-friendly methods to traverse its child nodes and tokens. Here are the simplest:

IEnumerable<SyntaxNode> ChildNodes()
IEnumerable<SyntaxToken> ChildTokens()

Following on from our previous example, our root node has a single child node of type ClassDeclarationSyntax:

var cds = (ClassDeclarationSyntax) root.ChildNodes().Single();

We can enumerate the members of cds via either its ChildNodes method or the Members property of ClassDeclarationSyntax:

foreach (MemberDeclarationSyntax member in cds.Members)
  Console.WriteLine (member.ToString());

with the following result:

static void Main() => Console.WriteLine ("Hello");

There are also Descendant* methods that descend recursively into children. We can enumerate the tokens that make up our program as follows:

foreach (var token in root.DescendantTokens())
  Console.WriteLine ($"{token.Kind(),-30} {token.Text}");

Here’s the result:

ClassKeyword                   class
IdentifierToken                Test
OpenBraceToken                 {
StaticKeyword                  static
VoidKeyword                    void
IdentifierToken                Main
OpenParenToken                 (
CloseParenToken                )
EqualsGreaterThanToken         =>
IdentifierToken                Console
DotToken                       .
IdentifierToken                WriteLine
OpenParenToken                 (
StringLiteralToken             "Hello"
CloseParenToken                )
SemicolonToken                 ;
CloseBraceToken                }
EndOfFileToken

Notice that there’s no whitespace in the result. Replacing token.Text with token.ToFullString() would give us whitespace (and any other trivia).

The following uses the DescendantNodes method to locate the syntax node for our method declaration:

var ourMethod = root.DescendantNodes()
                    .First (m => m.Kind() == SyntaxKind.MethodDeclaration);

Or, alternatively:

var ourMethod = root.DescendantNodes()
                    .OfType<MethodDeclarationSyntax>()
                    .Single();

With the latter example, ourMethod is of type MethodDeclarationSyntax, which exposes useful properties specific to method declarations. For instance, if our example contained more than one method definition and we wanted to find just the method whose name is “Main,” we could do this:

var mainMethod = root.DescendantNodes()
                     .OfType<MethodDeclarationSyntax>()
                     .Single (m => m.Identifier.Text == "Main");

Identifier is a property on MethodDeclarationSyntax that returns the token corresponding to the method’s identifier (i.e., its name). We could get the same result with more effort, as follows:

root.DescendantNodes().First (m =>
  m.Kind() == SyntaxKind.MethodDeclaration &&
  m.ChildTokens().Any (t =>
    t.Kind() == SyntaxKind.IdentifierToken && t.Text == "Main"));

SyntaxNode also has GetFirstToken and GetLastToken methods, which are equivalent to calling DescendantTokens().First() and DescendantTokens().Last().

Note

GetLastToken() is faster than DescendantTokens().Last() because it returns a direct link rather than enumerating through all descendants.

As nodes can contain both child nodes and tokens whose relative order is significant, there are also methods to enumerate both together:

ChildSyntaxList ChildNodesAndTokens()
IEnumerable<SyntaxNodeOrToken> DescendantNodesAndTokens()
IEnumerable<SyntaxNodeOrToken> DescendantNodesAndTokensAndSelf()

(ChildSyntaxList implements IEnumerable<SyntaxNodeOrToken> while also exposing a Count property and an indexer to access an element by position.)

You can traverse trivia directly from a node with the GetLeadingTrivia, GetTrailingTrivia, and DescendantTrivia methods. More commonly, though, you’d access trivia through the token to which it’s attached via the token’s LeadingTrivia and TrailingTrivia properties. Or, to convert to text, you’d use the ToFullString method, which includes trivia in the result.

Traversing parents

Nodes and tokens have a Parent property of type SyntaxNode.

For SyntaxTrivia, the “parent” is its token, accessible via the Token property.

Nodes also have methods that ascend back up the tree; these are prefixed with Ancestor.
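As a brief sketch of ascending the tree, the following starts at an invocation and finds the method and class that enclose it. AncestorDemo is a hypothetical helper class:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class AncestorDemo
{
  const string Code = @"class Test
{
  static void Main() => System.Console.WriteLine (""Hello"");
}";

  // Finds the first invocation in Code and walks up to the method
  // and class that enclose it.
  public static (string Method, string Class) EnclosingNames()
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText (Code).GetRoot();

    var invocation = root.DescendantNodes()
                         .OfType<InvocationExpressionSyntax>()
                         .First();

    // Ancestors enumerates parent, grandparent, and so on up to the root:
    var method = invocation.Ancestors()
                           .OfType<MethodDeclarationSyntax>()
                           .First();

    // FirstAncestorOrSelf is a shortcut for the same kind of search:
    var type = invocation.FirstAncestorOrSelf<ClassDeclarationSyntax>();

    return (method.Identifier.Text, type.Identifier.Text);
  }
}
```

EnclosingNames returns ("Main", "Test").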

Finding a child by its offset

All nodes, tokens, and trivia have a Span property of type TextSpan to indicate starting and ending offsets in the source code. Nodes and tokens also have a FullSpan property that includes leading and trailing trivia (whereas Span does not). A node’s Span does, however, include child nodes and tokens.
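To see the difference between Span and FullSpan, consider the identifier token in the following sketch (SpanDemo is a hypothetical helper class):

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class SpanDemo
{
  public const string Code = "class Test { }";

  // Returns the identifier token of the (single) class in Code.
  public static SyntaxToken ClassIdentifier() =>
    CSharpSyntaxTree.ParseText (Code).GetRoot()
      .DescendantNodes().OfType<ClassDeclarationSyntax>()
      .Single().Identifier;
}
```

The token’s Span starts at offset 6 and has a length of 4 (covering Test), whereas its FullSpan has a length of 5 because it also takes in the trailing space. Correspondingly, ToString returns "Test" and ToFullString returns "Test " (with the space).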

You can find a descendant object by position by calling the FindNode, FindToken, and FindTrivia methods on SyntaxNode. These methods return the descendant object with the smallest span that fully contains the span that you specify. There’s also a ChildThatContainsPosition method that searches a node’s immediate children (both nodes and tokens).

Should a search result in two nodes with an identical span (typically a child and grandchild), the FindNode method will return the outer (parent) node. You can change this behavior by passing true to the optional argument getInnermostNodeForTie.

The Find* methods also have an optional findInsideTrivia bool parameter. If true, this also searches for nodes or tokens within structured trivia (see “Trivia”).
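The following sketch locates a token from a character offset, and then the node that surrounds it. FindDemo and its method names are hypothetical:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class FindDemo
{
  public const string Code = @"class Test
{
  static void Main() {}
}";

  public static SyntaxNode Root() =>
    CSharpSyntaxTree.ParseText (Code).GetRoot();

  // Returns the token that covers the given character offset.
  public static SyntaxToken TokenAt (SyntaxNode root, int position) =>
    root.FindToken (position);

  // Returns the smallest node that contains the given token's span.
  public static SyntaxNode NodeAround (SyntaxNode root, SyntaxToken token) =>
    root.FindNode (token.Span);
}
```

If we pass Code.IndexOf ("Main") as the position, TokenAt returns the identifier token whose text is Main, and NodeAround returns the MethodDeclarationSyntax node that owns it.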

CSharpSyntaxWalker

Another way to traverse a tree is by subclassing CSharpSyntaxWalker, overriding one or more of its hundreds of virtual methods. The following class counts the number of if statements:

class IfCounter : CSharpSyntaxWalker
{
  public int IfCount { get; private set; }

  public override void VisitIfStatement (IfStatementSyntax node)
  {
    IfCount++;
    // Call the base method if you want to descend into children.
    base.VisitIfStatement (node);
  }
}

Here’s how to invoke it:

var ifCounter = new IfCounter ();
ifCounter.Visit (root);
Console.WriteLine ($"I found {ifCounter.IfCount} if statements");

The result is equivalent to the following:

root.DescendantNodes().OfType<IfStatementSyntax>().Count()

Writing a syntax walker can be easier than using the Descendant* methods in more complex cases when you need to override multiple methods (in part, because C# has no F#-like pattern matching ability).

By default, CSharpSyntaxWalker visits just nodes. To visit tokens or trivia, you must call the base constructor with a SyntaxWalkerDepth, indicating the desired depth (node, then token, then trivia). Then, you can override VisitToken and VisitTrivia:

class WhiteWalker : CSharpSyntaxWalker   // Counts space characters
{
  public int SpaceCount { get; private set; }

  public WhiteWalker() : base (SyntaxWalkerDepth.Trivia) { }

  public override void VisitTrivia (SyntaxTrivia trivia)
  {
    SpaceCount += trivia.ToString().Count (char.IsWhiteSpace);
    base.VisitTrivia (trivia);
  }
}

If you remove WhiteWalker’s call to the base constructor, VisitTrivia will not fire.

Trivia

Trivia is for code that, after parsing, the compiler can almost entirely ignore in terms of producing an output assembly. This comprises whitespace, comments, XML documentation, preprocessor directives, and code that’s inactive by virtue of conditional compilation.

The mandatory whitespace in your code is also considered trivia. Although essential for parsing, it’s not needed once the syntax tree has been produced (at least by the compiler). Trivia is still important for round-tripping back to the original source code.

Trivia belongs to the token to which it’s adjacent. By convention, the parser puts whitespace and comments that follow a token, up to the end of the line, into the token’s trailing trivia. Anything after that, it treats as leading trivia for the next token. (There are exceptions for the very start/end of the file.) If you’re creating tokens programmatically (see “Transforming a Syntax Tree”), you can put the whitespace in either place (or not at all, if you’re not going to convert back to source code):

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
    static /*comment*/ void Main() {}
}");

SyntaxNode root = tree.GetRoot();

// Find the static keyword token:
var staticKeyword = root.DescendantTokens().Single (t =>
  t.Kind() == SyntaxKind.StaticKeyword);

// Print out the trivia around the static keyword token:
foreach (SyntaxTrivia t in staticKeyword.LeadingTrivia)
  Console.WriteLine (new { Kind = "Leading " + t.Kind(), t.Span.Length });

foreach (SyntaxTrivia t in staticKeyword.TrailingTrivia)
  Console.WriteLine (new { Kind = "Trailing " + t.Kind(), t.Span.Length });

Here’s the output:

{ Kind = Leading WhitespaceTrivia, Length = 4 }
{ Kind = Trailing WhitespaceTrivia, Length = 1 }
{ Kind = Trailing MultiLineCommentTrivia, Length = 11 }
{ Kind = Trailing WhitespaceTrivia, Length = 1 }

Preprocessor directives

It might seem odd that preprocessor directives are considered trivia given that some directives (in particular, conditional compilation directives) have a nontrivial effect on the output.

The reason is that preprocessor directives are processed semantically by the parser itself; that is, it’s the parser’s job to do the preprocessing. After that, there’s nothing left that the compiler needs to consider explicitly (except for #pragma). To illustrate, let’s examine how the parser handles conditional compilation directives:

#define FOO

#if FOO
    Console.WriteLine ("FOO is defined");
#else
    Console.WriteLine ("FOO is not defined");
#endif

Upon reading the #if FOO directive, the parser knows that FOO is defined, and so the line that follows is parsed normally (as nodes and tokens), whereas the line of code following the #else directive is parsed into DisabledTextTrivia.

Note

When calling CSharpSyntaxTree.ParseText, you can supply additional preprocessor symbols by constructing and passing in a CSharpParseOptions instance.

Hence, with conditional compilation, it is precisely the text that can be ignored that ends up in trivia (i.e., the inactive code and the preprocessor directives themselves).
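We can check this with a short sketch that parses a class containing a conditional block, and then lists both the methods that were parsed normally and the text that ended up in DisabledTextTrivia. ConditionalDemo and its member names are hypothetical:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class ConditionalDemo
{
  const string Code = @"#define FOO
class Program
{
#if FOO
  void Active() {}
#else
  void Inactive() {}
#endif
}";

  static SyntaxNode Root() => CSharpSyntaxTree.ParseText (Code).GetRoot();

  // Names of the methods that were parsed into nodes and tokens.
  public static string[] ParsedMethodNames() =>
    Root().DescendantNodes().OfType<MethodDeclarationSyntax>()
          .Select (m => m.Identifier.Text).ToArray();

  // The raw text of every DisabledTextTrivia in the tree.
  public static string[] DisabledText() =>
    Root().DescendantTrivia()
          .Where (t => t.Kind() == SyntaxKind.DisabledTextTrivia)
          .Select (t => t.ToString()).ToArray();
}
```

ParsedMethodNames returns only Active, and DisabledText returns a single string that contains the declaration of Inactive.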

The #line directive is handled similarly, in that the parser reads and interprets the directive. The information that it harvests is used when you call GetMappedLineSpan on the syntax tree.

The #region directive is semantically empty: the only role of the parser is to check that #region directives are matched with #endregion directives. The #error and #warning directives are also processed by the parser, which generates errors and warnings that you can see by calling GetDiagnostics on the tree or node.

It can still be useful to examine the content of preprocessor directives for purposes other than producing the output assembly (syntax highlighting, for instance). This is made easier through structured trivia.

Structured trivia

There are two kinds of trivia:

Unstructured trivia
Comments, whitespace, and code that’s inactive due to conditional compilation
Structured trivia
Preprocessor directives and XML documentation

Unstructured trivia is treated purely as text, whereas structured trivia also has its content parsed into a miniature syntax tree.

The HasStructure property on SyntaxTrivia indicates whether structured trivia is present, and the GetStructure method returns the root node for the miniature syntax tree:

var tree = CSharpSyntaxTree.ParseText (@"#define FOO");

// In LINQPad:
tree.DumpSyntaxTree();  // LINQPad displays structured trivia in Visualizer

SyntaxNode root = tree.GetRoot();

var trivia = root.DescendantTrivia().First();
Console.WriteLine (trivia.HasStructure);           // True
Console.WriteLine (trivia.GetStructure().Kind());  // DefineDirectiveTrivia

In the case of preprocessor directives, you can navigate directly to the structured trivia by calling GetFirstDirective on a SyntaxNode. There’s also a ContainsDirectives property to indicate whether preprocessor trivia is present:

var tree = CSharpSyntaxTree.ParseText (@"#define FOO");
SyntaxNode root = tree.GetRoot();

Console.WriteLine (root.ContainsDirectives);      // True

// directive is the root node of the structured trivia:
var directive = root.GetFirstDirective();
Console.WriteLine (directive.Kind());             // DefineDirectiveTrivia
Console.WriteLine (directive.ToString());         // #define FOO

// If there were more directives, we could get to them as follows:
Console.WriteLine (directive.GetNextDirective());    // (null)

After we have a trivia node, we can cast it to a specific type and query its properties, just as we would with any other node:

var hashDefine = (DefineDirectiveTriviaSyntax) root.GetFirstDirective();
Console.WriteLine (hashDefine.Name.Text);     // FOO
Note

All nodes, tokens, and trivia have the IsPartOfStructuredTrivia property to indicate whether the object in question is part of a structured trivia tree (i.e., descends from a trivia object).

Transforming a Syntax Tree

You can “modify” nodes, tokens, and trivia via a set of methods with the following prefixes (most of which are extension methods):

Add*
Insert*
Remove*
Replace*
With*
Without*

Because syntax trees are immutable, all of these methods return a new object with the desired modifications, leaving the original untouched.
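The following sketch illustrates the pattern by renaming a class with WithIdentifier and ReplaceNode, and then returning the text of both the old and new roots. RenameDemo is a hypothetical helper class:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class RenameDemo
{
  public static (string Before, string After) RenameClass (
    string code, string newName)
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText (code).GetRoot();
    var oldClass = root.DescendantNodes()
                       .OfType<ClassDeclarationSyntax>().First();

    // With* returns a modified copy; WithTriviaFrom carries over the
    // whitespace that surrounded the old identifier.
    var newClass = oldClass.WithIdentifier (
      SyntaxFactory.Identifier (newName)
                   .WithTriviaFrom (oldClass.Identifier));

    // Replace* returns a new root; the old root is unaffected.
    SyntaxNode newRoot = root.ReplaceNode (oldClass, newClass);

    return (root.ToFullString(), newRoot.ToFullString());
  }
}
```

Calling RenameClass ("class Test { }", "Renamed") returns "class Test { }" for the original root and "class Renamed { }" for the new one.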

Handling changes to the source code

If you’re writing a C# editor, for instance, you’ll need to update a syntax tree based on changes to the source code. The SyntaxTree class has a WithChangedText method that does exactly this: it partially reparses the source code based on modifications that you describe with a SourceText instance (in Microsoft.CodeAnalysis.Text).

To create a SourceText, use its static From method, giving it the complete source code. You then can use this to create a syntax tree:

SourceText sourceText = SourceText.From ("class Program {}");
var tree = CSharpSyntaxTree.ParseText (sourceText);

Alternatively, you can obtain the SourceText for an existing tree by calling GetText.

You now can “update” sourceText by calling Replace or WithChanges. For example, we could replace the first five characters (class) with struct, as follows:

var newSource = sourceText.Replace (0, 5, "struct");

Finally, we can call WithChangedText on the tree to update it:

var newTree = tree.WithChangedText (newSource);
Console.WriteLine (newTree.ToString());         // struct Program {}
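The same edit can be expressed with WithChanges, which accepts one or more TextChange values. The following is a minimal sketch; IncrementalDemo and ApplyEdit are hypothetical names:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

static class IncrementalDemo
{
  // Applies a single edit to the code and reparses incrementally.
  public static SyntaxTree ApplyEdit (
    string code, int start, int length, string newText)
  {
    SourceText oldText = SourceText.From (code);
    SyntaxTree oldTree = CSharpSyntaxTree.ParseText (oldText);

    // A TextChange pairs the span to replace with its replacement text.
    var change = new TextChange (new TextSpan (start, length), newText);
    SourceText changedText = oldText.WithChanges (change);

    return oldTree.WithChangedText (changedText);
  }
}
```

Calling ApplyEdit ("class Program {}", 0, 5, "struct") returns a tree whose text is struct Program {}.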

Creating new nodes, tokens, and trivia with SyntaxFactory

The static methods on SyntaxFactory programmatically create nodes, tokens, and trivia, which you can use to “transform” existing syntax trees or to create new trees from scratch.

The most difficult part of doing this is establishing exactly what kind of nodes and tokens to create. The solution is to first parse a sample of the code you want, examining the result in a syntax visualizer. For instance, suppose that we want to create a syntax node for the following:

using System.Text;

We can visualize the syntax tree for this in LINQPad, as follows:

CSharpSyntaxTree.ParseText ("using System.Text;").DumpSyntaxTree();

(We can parse using System.Text; without error because it’s valid as a complete program, albeit a functionally empty one. For most other code snippets, you’ll need to wrap the snippet in a method and/or type definition so that it will parse.)

The result has the following structure, of which we are interested in the second node—UsingDirective and its descendants:

Kind                               Token Text
=================================  ==========
CompilationUnit (node)
  UsingDirective (node)
    UsingKeyword (token)           using
      WhitespaceTrivia (trailing)
    QualifiedName (node)
      IdentifierName (node)
        IdentifierToken (token)    System
      DotToken (token)             .
      IdentifierName (node)
        IdentifierToken (token)    Text
    SemicolonToken (token)         ;
  EndOfFileToken (token)

Starting from the inside, we have two IdentifierName nodes, whose parent is a QualifiedName. We can create that as follows:

QualifiedNameSyntax qualifiedName = SyntaxFactory.QualifiedName (
  SyntaxFactory.IdentifierName ("System"),
  SyntaxFactory.IdentifierName ("Text"));

We used the overload of QualifiedName that accepts two identifiers. This overload inserts the dot token for us automatically.

We now need to wrap this in a UsingDirective:

UsingDirectiveSyntax usingDirective =
  SyntaxFactory.UsingDirective (qualifiedName);

Because we didn’t specify tokens for the using keyword or the trailing semicolon, tokens for each were automatically created and added. However, the automatically created tokens don’t include whitespace. This wouldn’t prevent compilation, but converting the tree to a string would result in syntactically incorrect code:

Console.WriteLine (usingDirective.ToFullString());  // usingSystem.Text;

We can fix this by calling NormalizeWhitespace on the node (or one of its ancestors); doing so automatically adds whitespace trivia (for both syntactic correctness and readability). Or for more control, we could add whitespace explicitly:

usingDirective = usingDirective.WithUsingKeyword (
  usingDirective.UsingKeyword.WithTrailingTrivia (
    SyntaxFactory.Whitespace (" ")));

Console.WriteLine (usingDirective.ToFullString());  // using System.Text;

For brevity, we “harvested” the node’s existing UsingKeyword to which we added trailing trivia. We could have created an equivalent token with more effort by calling SyntaxFactory.Token(SyntaxKind.UsingKeyword).
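For comparison, here is a sketch of that longer route, creating the using keyword from scratch with its trailing space. It also uses SyntaxFactory.ParseName as an alternative way to build the name node. UsingDemo is a hypothetical helper class:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class UsingDemo
{
  public static UsingDirectiveSyntax Create (string namespaceName)
  {
    // A "using" keyword with no leading trivia and one trailing space:
    SyntaxToken usingKeyword = SyntaxFactory.Token (
      SyntaxFactory.TriviaList(),
      SyntaxKind.UsingKeyword,
      SyntaxFactory.TriviaList (SyntaxFactory.Space));

    // ParseName builds the (possibly qualified) name node from text.
    NameSyntax name = SyntaxFactory.ParseName (namespaceName);

    return SyntaxFactory.UsingDirective (name)
                        .WithUsingKeyword (usingKeyword);
  }
}
```

Calling ToFullString on the result of Create ("System.Text") gives using System.Text;.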

The final step is to add our UsingDirective node to an existing or new syntax tree (or more precisely, the root node of a tree). To do the former, we cast the existing tree’s root to a CompilationUnitSyntax and call the AddUsings method. We then can create a new tree from the transformed compilation unit:

var existingTree = CSharpSyntaxTree.ParseText ("class Program {}");
var existingUnit = (CompilationUnitSyntax) existingTree.GetRoot();

var unitWithUsing = existingUnit.AddUsings (usingDirective);

var treeWithUsing = CSharpSyntaxTree.Create (
  unitWithUsing.NormalizeWhitespace());
Note

Remember that all parts of a syntax tree are immutable. Calling AddUsings returns a new node, leaving the original untouched. Ignoring the return value is an easy mistake to make!

We called NormalizeWhitespace on our compilation unit so that calling ToString on the tree will yield syntactically correct and readable code. Alternatively, we could have added explicit newline trivia to usingDirective, as follows:

.WithTrailingTrivia (SyntaxFactory.EndOfLine("\r\n\r\n"))

Creating a compilation unit and syntax tree from scratch is a similar process. The easiest approach is to start with an empty compilation unit and call AddUsings on the unit as we did before:

var unit = SyntaxFactory.CompilationUnit().AddUsings (usingDirective);

We can add type definitions to our compilation unit by creating them in a similar fashion, and then calling AddMembers:

// Create a simple empty class definition:
unit = unit.AddMembers (SyntaxFactory.ClassDeclaration ("Program"));

The final step is to create the tree:

var tree = CSharpSyntaxTree.Create (unit.NormalizeWhitespace());
Console.WriteLine (tree.ToString());

// Output:
using System.Text;

class Program
{
}

CSharpSyntaxRewriter

For more complex syntax tree transformations, you can subclass CSharpSyntaxRewriter.

CSharpSyntaxRewriter is similar to the CSharpSyntaxWalker class that we looked at previously (see “CSharpSyntaxWalker”) except that each Visit* method accepts and returns a syntax node. By returning something other than was passed in, you can “rewrite” the syntax tree.

For instance, the following rewriter changes method declaration names to uppercase:

class MyRewriter : CSharpSyntaxRewriter
{
  public override SyntaxNode VisitMethodDeclaration
    (MethodDeclarationSyntax node)
  {
    // "Replace" the method's identifier with an uppercase version:
    return node.WithIdentifier (
      SyntaxFactory.Identifier (
        node.Identifier.LeadingTrivia,            // Preserve old trivia
        node.Identifier.Text.ToUpperInvariant(),
        node.Identifier.TrailingTrivia));         // Preserve old trivia
  }
}

Here’s how to use it:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() { Test(); }
  static void Test() {         }
}");

var rewriter = new MyRewriter();
var newRoot = rewriter.Visit (tree.GetRoot());
Console.WriteLine (newRoot.ToFullString());

// Output:
class Program
{
  static void MAIN() { Test(); }
  static void TEST() {         }
}

Notice that our call to Test() in the main method did not get renamed, because we visited just member declarations and ignored invocations. To reliably rename invocations, however, we must be able to determine whether calls to Main() or Test() refer to the Program type, and not some other type. To do this, a syntax tree is not enough on its own; we also need a semantic model.

Compilations and Semantic Models

A compilation comprises syntax trees, references, and compilation options. It serves two purposes:

  • Allows compilation to a library or executable (the emit phase).

  • Exposes a semantic model that provides symbol information (obtained from binding).

The semantic model is essential in implementing features such as symbol renaming, or offering code completion listings in an editor.

Creating a Compilation

Whether you’re interested in querying the semantic model or performing a full compilation, the first step is to create a CSharpCompilation, passing in the (simple) name of the assembly that you want to create:

var compilation = CSharpCompilation.Create ("test");

An assembly’s simple name is important even if you don’t plan to emit an assembly, because it forms part of the identity of the types inside the compilation.

By default, the compilation assumes that you want to create a library. You can specify a different kind of output (Windows executable, console executable, etc.) as follows:

compilation = compilation.WithOptions (
  new CSharpCompilationOptions (OutputKind.ConsoleApplication));

The CSharpCompilationOptions class has more than a dozen optional constructor parameters for options that you can pass to the compiler. For example, to enable compiler optimizations, you would do this:

compilation = compilation.WithOptions (
  new CSharpCompilationOptions (OutputKind.ConsoleApplication,
    optimizationLevel:OptimizationLevel.Release));

Next, let’s add syntax trees. Each syntax tree corresponds to a “file” to be included in the compilation:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() => System.Console.WriteLine (""Hello"");
}");

compilation = compilation.AddSyntaxTrees (tree);

Finally, we need to reference the .NET Core assemblies. Because it’s difficult to know exactly which combination of assemblies is required, it’s easiest to reference them all. The following code returns the paths of all the .NET Core assemblies (plus any that the calling application references):

string trustedAssemblies = (string)AppContext.GetData
                           ("TRUSTED_PLATFORM_ASSEMBLIES");
string[] trustedAssemblyPaths = trustedAssemblies.Split(Path.PathSeparator);

Note

This returns runtime assemblies, which are specific to the current platform and .NET Core version. If you’re planning to use Roslyn to compile libraries that will work correctly across different platforms and .NET Core versions, you should use reference assemblies instead. The reference assemblies are available in the NuGet packages Microsoft.NETCore.App.Ref (for .NET Core), Microsoft.AspNetCore.App.Ref (for ASP.NET Core), and Microsoft.WindowsDesktop.App.Ref (for Windows Forms/WPF).

We then can add the references to the compilation, as follows:

var references = trustedAssemblyPaths.Select
  (path => MetadataReference.CreateFromFile (path));

compilation = compilation.AddReferences (references);

The call to MetadataReference.CreateFromFile reads the content of an assembly into memory, but not using ordinary reflection. Instead, it uses a high-performance assembly reader (System.Reflection.Metadata), which avoids creating an Assembly object. (Creating an Assembly object would be slow and result in the assembly file being locked until the process exited.)

Note

The PortableExecutableReference that you get back from MetadataReference.CreateFromFile can end up with a significant memory footprint, so be careful about holding on to references that you don’t need. Also, if you find yourself repeatedly creating references to the same assembly, a cache is worth considering (one that holds weak references is ideal).
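The caching idea in the preceding note can be sketched as follows. This is a minimal illustration rather than a recommended implementation: ReferenceCache and its Get method are hypothetical names (they are not part of Roslyn), and the sketch assumes that a file path is a good enough cache key.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;

// Hypothetical helper (not part of Roslyn): caches metadata references by path.
public static class ReferenceCache
{
  // Weak references let the garbage collector reclaim unused entries.
  static readonly Dictionary<string, WeakReference<PortableExecutableReference>>
    _cache = new Dictionary<string, WeakReference<PortableExecutableReference>>();

  public static PortableExecutableReference Get (string path)
  {
    lock (_cache)
    {
      // Reuse the existing reference if it hasn't been collected:
      if (_cache.TryGetValue (path, out var weak) &&
          weak.TryGetTarget (out var existing))
        return existing;

      // Otherwise, read the assembly and remember it (weakly):
      var reference = MetadataReference.CreateFromFile (path);
      _cache [path] = new WeakReference<PortableExecutableReference> (reference);
      return reference;
    }
  }
}
```

With this in place, trustedAssemblyPaths.Select (ReferenceCache.Get) could stand in for the direct calls to MetadataReference.CreateFromFile. Note that the sketch never removes dead entries from the dictionary; a longer-lived tool would want to prune them.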

You can do everything in a single step by calling the overload of CSharpCompilation.Create that takes syntax trees, references, and options. Or you can do it fluently in a single expression, too:

var compilation = CSharpCompilation.Create ("...")
  .WithOptions (...)
  .AddSyntaxTrees (...)
  .AddReferences (...);

Diagnostics

A compilation can generate errors and warnings even if the syntax trees are error free. Examples include forgetting to import a namespace, a typo when referring to a type or member name, and type parameter inference failing. You can get the errors and warnings by calling GetDiagnostics on the compilation object. Any syntax errors will be included, too.
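As a rough sketch of how this might look in practice, the following helper collects the IDs of all error-level diagnostics for a piece of source code. DiagnosticsSketch and GetErrorIds are hypothetical names chosen for this illustration; the rest follows the compilation setup shown earlier.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper for illustration.
public static class DiagnosticsSketch
{
  public static string[] GetErrorIds (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var references =
      ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    // Keep just the errors, ignoring warnings and informational messages:
    return compilation.GetDiagnostics()
      .Where (d => d.Severity == DiagnosticSeverity.Error)
      .Select (d => d.Id)
      .ToArray();
  }
}
```

For example, passing in a program that calls Console.WriteLine without importing the System namespace should report CS0103 (a name that doesn't exist in the current context), even though the syntax tree itself is error free. Each Diagnostic also has a Location property and a GetMessage method, should you need more than the ID.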

Emitting an Assembly

Creating an output assembly is simply a matter of calling Emit:

EmitResult result = compilation.Emit (@"c:\temp\test.dll");
Console.WriteLine (result.Success);

If result.Success is false, EmitResult also has a Diagnostics property to indicate the errors that occurred during emission (this also includes diagnostics from the previous stages). If Emit fails due to a file I/O error, it will throw an exception rather than generate error codes.

With .NET Core, you must specify a .dll extension even for Console or Windows applications. To run the application, you then call dotnet.exe with the path to your .dll.

The Emit method also lets you specify a .pdb file path (for debug information), and an XML documentation file path.
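Emit also has an overload that writes to a Stream rather than a file, which can be handy if you don't want to touch the filesystem. The following is a minimal sketch under that assumption; EmitSketch and CompileToBytes are hypothetical names for this illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;

// Hypothetical helper for illustration.
public static class EmitSketch
{
  // Returns the bytes of the emitted assembly, or null if emission failed.
  public static byte[] CompileToBytes (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var references =
      ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    using (var peStream = new MemoryStream())
    {
      EmitResult result = compilation.Emit (peStream);

      if (!result.Success)
      {
        // Report what went wrong (this includes errors from earlier stages):
        foreach (Diagnostic d in result.Diagnostics)
          Console.Error.WriteLine (d);
        return null;
      }
      return peStream.ToArray();
    }
  }
}
```

The resulting byte array could then be written to disk or loaded for execution, depending on your needs.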

Querying the Semantic Model

Calling GetSemanticModel on a compilation returns the semantic model for a syntax tree:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() => System.Console.WriteLine (123);
}");

var references = ((string)AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES"))
  .Split (Path.PathSeparator)
  .Select (path => MetadataReference.CreateFromFile (path));

var compilation = CSharpCompilation.Create ("test")
  .AddReferences (references)
  .AddSyntaxTrees (tree);

SemanticModel model = compilation.GetSemanticModel (tree);

(The reason for needing to specify a tree is that a compilation can contain multiple trees.)

You might expect a semantic model to be similar to a syntax tree, but with more properties and methods and a more detailed structure. This is not the case and there is no overarching DOM associated with the semantic model. Instead, you’re given a set of methods to call to obtain semantic information about a particular position or node in the syntax tree.

This means that you can’t “explore” a semantic model like you would a syntax tree, and using it is rather like playing “20 Questions”: the challenge is figuring out the right questions to ask. There are nearly 50 methods and extension methods; in this section, we’ll cover some of the most commonly used methods, in particular, those that demonstrate the principles of using the semantic model.

Following on from our previous example, we could ask for symbol information on the WriteLine identifier, as follows:

var writeLineNode = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "WriteLine").Parent;

SymbolInfo symbolInfo = model.GetSymbolInfo (writeLineNode);
Console.WriteLine (symbolInfo.Symbol);   // System.Console.WriteLine(int)

SymbolInfo is a wrapper for symbols, whose nuances we discuss shortly. First, let’s look at symbols themselves.

Symbols

In the syntax tree, names such as System, Console, and WriteLine are parsed as identifiers (IdentifierNameSyntax nodes). Identifiers have little meaning, and the syntactic parser does no work on “understanding” them other than to distinguish them from contextual keywords.

The semantic model is able to transform identifiers into symbols, which have type information (the output of the binding phase).

All symbols implement the ISymbol interface, although there are more specific interfaces for each kind of symbol. In our example, System, Console, and WriteLine map to symbols of the following types:

System      INamespaceSymbol
Console     INamedTypeSymbol
WriteLine   IMethodSymbol

Some symbol types, such as IMethodSymbol, have a conceptual analog in the System.Reflection namespace (MethodInfo, in this case), whereas some other symbol types, such as INamespaceSymbol, do not. This is because the Roslyn type system exists for the benefit of the compiler, whereas the Reflection type system exists for the benefit of the CLR (after the source code has melted away).

Nonetheless, working with ISymbol types is similar in many ways to using the Reflection API we described in Chapter 19. Let’s extend our previous example:

ISymbol symbol = model.GetSymbolInfo (writeLineNode).Symbol;

Console.WriteLine (symbol.Name);                   // WriteLine
Console.WriteLine (symbol.Kind);                   // Method
Console.WriteLine (symbol.IsStatic);               // True
Console.WriteLine (symbol.ContainingType.Name);    // Console

var method = (IMethodSymbol) symbol;
Console.WriteLine (method.ReturnType.ToString());  // void

The output of the last line illustrates a subtle difference with Reflection. Notice that void is in lowercase, which is C# nomenclature (Reflection is language-agnostic). Similarly, calling ToString() on the INamedTypeSymbol for System.Int32 returns int. Here’s something else you can’t do with Reflection:

Console.WriteLine (symbol.Language);                // C#

Note

With the syntax trees API, the classes for syntax nodes differ for C# and Visual Basic (although they share an abstract SyntaxNode base type). This makes sense because the languages have a different lexical structure. In contrast, ISymbol and its derived interfaces are shared between C# and Visual Basic. However, their internal concrete implementations are specific to each language, and the output from their methods and properties reflects language-specific differences.

We can also ask the symbol where it came from:

var location = symbol.Locations.First();
Console.WriteLine (location.Kind);                     // MetadataFile

If the symbol was defined in our own source code (i.e., a syntax tree), the SourceTree property returns that tree, and SourceSpan returns its location in the tree. In our case, WriteLine comes from metadata, so neither is populated:

Console.WriteLine (location.SourceTree == null);    // True
Console.WriteLine (location.SourceSpan);            // [0..0)

A partial type can have multiple definitions, in which case it will have multiple Locations.

The following query returns all the overloads of WriteLine:

symbol.ContainingType.GetMembers ("WriteLine").OfType<IMethodSymbol>()

You can also call ToDisplayParts on a symbol. This returns a collection of parts that make up the full name; in our case System.Console.WriteLine(int) comprises four symbols interspersed with punctuation.

SymbolInfo

If you’re writing code completion for an editor, you’ll need to obtain symbols for code that’s incomplete or incorrect. For instance, consider the following incomplete code:

System.Console.WriteLine(

Because the WriteLine method is overloaded, it’s impossible to match to a single ISymbol. Instead, we want to present options to the user. To deal with this, the semantic model’s GetSymbolInfo method returns a SymbolInfo struct, which has the following properties:

ISymbol Symbol
ImmutableArray<ISymbol> CandidateSymbols
CandidateReason CandidateReason

If there’s an error or ambiguity, the Symbol property returns null, and CandidateSymbols returns a collection comprising the best matches. The CandidateReason property returns an enum telling you what went wrong.
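To see these properties in action, here is a small sketch that asks for symbol information on a WriteLine call. SymbolInfoSketch and GetWriteLineInfo are hypothetical names for this illustration, and the sketch assumes that the source contains exactly one WriteLine token.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper for illustration.
public static class SymbolInfoSketch
{
  public static SymbolInfo GetWriteLineInfo (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var references =
      ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    // Assumes exactly one WriteLine token in the source:
    var writeLineNode = tree.GetRoot().DescendantTokens().Single (
      t => t.Text == "WriteLine").Parent;

    return model.GetSymbolInfo (writeLineNode);
  }
}
```

Calling it with arguments that match no overload leaves Symbol null and fills CandidateSymbols instead:

```csharp
SymbolInfo info = SymbolInfoSketch.GetWriteLineInfo (
  "class Program { static void Main() => System.Console.WriteLine (1, 2); }");

Console.WriteLine (info.Symbol == null);
Console.WriteLine (info.CandidateReason);

foreach (ISymbol candidate in info.CandidateSymbols)
  Console.WriteLine (candidate);
```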

Note

To obtain error and warning information for a section of code, you can also call GetDiagnostics on a semantic model, specifying a TextSpan. Calling GetDiagnostics with no argument covers the model’s entire syntax tree, which is much like calling the same method on the CSharpCompilation object when the compilation contains just that tree.

Symbol accessibility

ISymbol has a DeclaredAccessibility property that indicates whether the symbol is public, protected, internal, and so on. However, this isn’t sufficient to determine whether a given symbol is accessible at a particular position in your source code. Local variables, for instance, have a lexically limited scope, and a protected class member is accessible from source code positions within its type or a derived type. To help with this, SemanticModel has an IsAccessible method:

bool canAccess = model.IsAccessible (42, someSymbol);

This returns true if someSymbol can be accessed at offset 42 in the source code.
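The following sketch illustrates the difference between declared accessibility and accessibility at a position. All the names in it (AccessibilitySketch, Check, and the classes A and B with their members) are invented for this illustration. It asks whether a private field of A is accessible from inside a method of A, and then from inside a method of B.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper for illustration.
public static class AccessibilitySketch
{
  public static (bool insideA, bool insideB) Check()
  {
    var tree = CSharpSyntaxTree.ParseText (@"class A
{
  private int secret;
  void InA() { }
}
class B
{
  void InB() { }
}");
    // The core library is enough to bind this small example:
    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (
        MetadataReference.CreateFromFile (typeof (object).Assembly.Location))
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);
    SyntaxNode root = tree.GetRoot();

    // The symbol for the private field A.secret:
    var fieldNode = root.DescendantNodes()
      .OfType<VariableDeclaratorSyntax>().Single();
    ISymbol field = model.GetDeclaredSymbol (fieldNode);

    // A source code offset just inside the body of the named method:
    int PositionIn (string methodName) =>
      root.DescendantNodes().OfType<MethodDeclarationSyntax>()
          .Single (m => m.Identifier.Text == methodName)
          .Body.OpenBraceToken.Span.End;

    return (model.IsAccessible (PositionIn ("InA"), field),
            model.IsAccessible (PositionIn ("InB"), field));
  }
}
```

The field’s DeclaredAccessibility is Private in both cases; only the position differs. We would expect Check to return true for the position inside A and false for the position inside B.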

Declared symbols

If you call GetSymbolInfo on a type or member declaration, you’ll get no symbols back. For instance, suppose that we want the symbol for our Main method:

var mainMethod = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "Main").Parent;

SymbolInfo symbolInfo = model.GetSymbolInfo (mainMethod);
Console.WriteLine (symbolInfo.Symbol == null);              // True
Console.WriteLine (symbolInfo.CandidateSymbols.Length);     // 0

Note

This applies not just to type/member declarations, but any node where you’re introducing a new symbol rather than consuming an existing symbol.

To obtain the symbol, we must instead call GetDeclaredSymbol:

ISymbol symbol = model.GetDeclaredSymbol (mainMethod);

Unlike GetSymbolInfo, GetDeclaredSymbol either succeeds or it doesn’t. (If it fails, it will be because it can’t find a valid declaration node.)

To give another example, suppose that our Main method is as follows:

static void Main()
{
  int xyz = 123;
}

We can determine the type of xyz as follows:

SyntaxNode variableDecl = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "xyz").Parent;

var local = (ILocalSymbol) model.GetDeclaredSymbol (variableDecl);
Console.WriteLine (local.Type.ToString());             // int
Console.WriteLine (local.Type.BaseType.ToString());    // System.ValueType

TypeInfo

Sometimes, you need type information about an expression or literal for which there’s no explicit symbol. Consider the following:

var now = System.DateTime.Now;
System.Console.WriteLine (now - now);

To determine the type of now - now, we call GetTypeInfo on the semantic model:

SyntaxNode binaryExpr = tree.GetRoot().DescendantTokens().Single (
  t => t.Text == "-").Parent;

TypeInfo typeInfo = model.GetTypeInfo (binaryExpr);

TypeInfo has two properties, Type and ConvertedType. The latter indicates the type after any implicit conversions:

Console.WriteLine (typeInfo.Type);             // System.TimeSpan
Console.WriteLine (typeInfo.ConvertedType);    // object

Because Console.WriteLine is overloaded to accept an object but not a TimeSpan, an implicit conversion to object took place, which manifested in typeInfo.ConvertedType.

Looking up symbols

A powerful feature of the semantic model is the ability to ask for all symbols in scope at a particular point in the source code. The result is the basis for IntelliSense listings, when the user requests a list of available symbols.

To obtain the listing, simply call LookupSymbols, with the desired source code offset. Here’s a complete example:

var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main()
  {
    int x = 123, y = 234;

  }
}");

var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
  .Split (Path.PathSeparator)
  .Select (path => MetadataReference.CreateFromFile (path));

var compilation = CSharpCompilation.Create ("test")
  .AddReferences (references)
  .AddSyntaxTrees (tree);

SemanticModel model = compilation.GetSemanticModel (tree);

// Look for available symbols at start of 6th line:
int index = tree.GetText().Lines[5].Start;

foreach (ISymbol symbol in model.LookupSymbols (index))
  Console.WriteLine (symbol.ToString());

Here’s the result:

y
x
Program.Main()
object.ToString()
object.Equals(object)
object.Equals(object, object)
object.ReferenceEquals(object, object)
object.GetHashCode()
object.GetType()
object.~Object()
object.MemberwiseClone()
Program
Microsoft
System
Windows

(If we imported the System namespace, we’d see hundreds more symbols, for types in that namespace.)
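An editor rarely wants the whole list at once. As a sketch of one way to narrow it down, the following variation on the preceding example keeps only the local variables in scope. LookupSketch and LocalsInScope are hypothetical names for this illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper for illustration.
public static class LookupSketch
{
  public static string[] LocalsInScope()
  {
    var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main()
  {
    int x = 123, y = 234;

  }
}");
    var references =
      ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    // Start of the 6th line (the blank line inside Main):
    int index = tree.GetText().Lines[5].Start;

    // Filter the symbols in scope down to local variables:
    return model.LookupSymbols (index)
      .OfType<ILocalSymbol>()
      .Select (local => local.Name)
      .OrderBy (name => name)
      .ToArray();
  }
}
```

LookupSymbols also accepts optional container and name arguments, so a call such as model.LookupSymbols (index, name: "x") restricts the lookup to symbols with that name.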

Example: Renaming a Symbol

To illustrate the features we’ve covered, let’s write a method to rename a symbol, which is robust to the most common use cases; in particular:

  • The symbol can be a type, member, local variable, range variable, or loop variable.

  • You can specify the symbol from either its use or declaration.

  • With a class or struct, it will rename the static and instance constructors.

  • In the case of a class, it will rename the finalizer (destructor).

For brevity, we omit some checks, such as ensuring that the new name is not already in use, and that the symbol isn’t an edge case for which the rename will fail. Our method will consider just a single syntax tree, and so will have the following signature:

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)

One obvious way to implement this is to subclass CSharpSyntaxRewriter. However, a more elegant and flexible approach is to have RenameSymbol call a lower-level method that returns the text spans to be renamed:

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)

This allows an editor to call GetRenameSpans directly and apply just the changes (within an Undo transaction), avoiding the loss of editor state that might otherwise result from replacing the entire text.

This makes RenameSymbol a relatively simple wrapper around GetRenameSpans. We can use SourceText’s WithChanges method to apply a sequence of text changes:

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)
{
  // Sort the spans first: TextSpan is comparable, whereas TextChange is not.
  IEnumerable<TextSpan> renameSpans =
    GetRenameSpans (model, token).OrderBy (s => s);

  SourceText newSourceText = model.SyntaxTree.GetText().WithChanges (
    renameSpans.Select (span => new TextChange (span, newName)));

  return model.SyntaxTree.WithChangedText (newSourceText);
}

WithChanges throws an exception unless the changes are in order of position, which is why we call OrderBy before applying them.
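To see WithChanges in isolation, here is a small sketch that works on plain text without any syntax tree or semantic model. WithChangesSketch and ReplaceSpans are hypothetical names for this illustration. Note that every span refers to a position in the original text, even when the replacement has a different length.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.Text;

// Hypothetical helper for illustration.
public static class WithChangesSketch
{
  public static string ReplaceSpans (string text, string newText,
                                     params TextSpan[] spans)
  {
    SourceText source = SourceText.From (text);

    // Sort by position, because WithChanges requires ordered changes:
    var changes = spans
      .OrderBy (span => span)
      .Select (span => new TextChange (span, newText));

    return source.WithChanges (changes).ToString();
  }
}
```

Here’s how to use it:

```csharp
// Both occurrences of foo, deliberately given out of order:
string result = WithChangesSketch.ReplaceSpans (
  "int foo = foo + 1;", "total",
  new TextSpan (10, 3), new TextSpan (4, 3));

Console.WriteLine (result);   // int total = total + 1;
```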

Now we must write GetRenameSpans. The first step is to find the symbol corresponding to the token that we want to rename. The token can be part of either a declaration or usage, so we first call GetSymbolInfo, and if the result is null, we call GetDeclaredSymbol:

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)
{
  var node = token.Parent;

  ISymbol symbol = model.GetSymbolInfo (node).Symbol
                ?? model.GetDeclaredSymbol (node);

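  // No symbol to rename: return an empty sequence so that callers can
  // safely enumerate the result.
  if (symbol == null) return Enumerable.Empty<TextSpan>();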

Next, we need to find the symbol’s definitions, which we can get from its Locations property. Considering multiple locations makes us robust to partial classes and methods (although for partial classes to be useful, we would need to expand the example to work with multiple syntax trees):

  var definitions =
    from location in symbol.Locations
    where location.SourceTree == node.SyntaxTree
    select location.SourceSpan;

Now we need to find usages of the symbol. For this, we begin by looking for descendant tokens whose names match the symbol’s name because this is a fast way to weed out most tokens. Then, we can call GetSymbolInfo on the token’s parent node and see whether it matches the symbol we want to rename:

  var usages =
    from t in model.SyntaxTree.GetRoot().DescendantTokens()
    where t.Text == symbol.Name
    let s = model.GetSymbolInfo (t.Parent).Symbol
    where s == symbol
    select t.Span;

Note

Binding-related operations such as asking for symbol information tend to be slower than operations that consider just text or syntax trees. This is because the process of binding can require searching for types in assemblies, applying type inference rules, and checking for extension methods.

If the symbol is something other than a named type (local variable, range variable, etc.), our job is done and we can return the definitions plus usages:

  if (symbol.Kind != SymbolKind.NamedType)
    return definitions.Concat (usages);

If the symbol is a named type, we need to rename its constructors and destructor, if present. To do so, we enumerate the descendant nodes, looking for type declarations whose names match the one we want to rename. Then, we get its declared symbol, and if it matches the one we’re renaming, we locate its constructor and destructor methods, returning the spans of their identifiers if present:

  var structors =
    from type in model.SyntaxTree.GetRoot().DescendantNodes()
                                           .OfType<TypeDeclarationSyntax>()
    where type.Identifier.Text == symbol.Name
    let declaredSymbol = model.GetDeclaredSymbol (type)
    where declaredSymbol == symbol
    from method in type.Members
    let constructor = method as ConstructorDeclarationSyntax
    let destructor = method as DestructorDeclarationSyntax
    where constructor != null || destructor != null
    let identifier = constructor?.Identifier ?? destructor.Identifier
    select identifier.Span;

  return definitions.Concat (usages).Concat (structors);
}

Here’s the complete listing, along with an example of how to use it:

void Demo()
{
  var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static Program() {}
  public Program() {}

  static void Main()
  {
    Program p = new Program();
    p.Foo();
  }

  void Foo() => Bar();
  void Bar() => Foo();
}
");

  var references = ((string)AppContext.GetData 
                    ("TRUSTED_PLATFORM_ASSEMBLIES"))
    .Split (Path.PathSeparator)
    .Select (path => MetadataReference.CreateFromFile (path));

  var compilation = CSharpCompilation.Create ("test")
    .AddReferences (references)
    .AddSyntaxTrees (tree);

  var model = compilation.GetSemanticModel (tree);

  var tokens = tree.GetRoot().DescendantTokens();

  // Rename the Program class to Program2:
  SyntaxToken program = tokens.First (t => t.Text == "Program");
  Console.WriteLine (RenameSymbol (model, program, "Program2").ToString());

  // Rename the Foo method to Foo2:
  SyntaxToken foo = tokens.Last (t => t.Text == "Foo");
  Console.WriteLine (RenameSymbol (model, foo, "Foo2").ToString());

  // Rename the p local variable to p2:
  SyntaxToken p = tokens.Last (t => t.Text == "p");
  Console.WriteLine (RenameSymbol (model, p, "p2").ToString());
}

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)
{
  IEnumerable<TextSpan> renameSpans =
    GetRenameSpans (model, token).OrderBy (s => s);

  SourceText newSourceText = model.SyntaxTree.GetText().WithChanges (
    renameSpans.Select (s => new TextChange (s, newName)));

  return model.SyntaxTree.WithChangedText (newSourceText);
}

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)
{
  var node = token.Parent;

  ISymbol symbol =
    model.GetSymbolInfo (node).Symbol ??
    model.GetDeclaredSymbol (node);

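  // No symbol to rename: return an empty sequence so that callers can
  // safely enumerate the result.
  if (symbol == null) return Enumerable.Empty<TextSpan>();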

  var definitions =
    from location in symbol.Locations
    where location.SourceTree == node.SyntaxTree
    select location.SourceSpan;

  var usages =
    from t in model.SyntaxTree.GetRoot().DescendantTokens ()
    where t.Text == symbol.Name
    let s = model.GetSymbolInfo (t.Parent).Symbol
    where s == symbol
    select t.Span;

  if (symbol.Kind != SymbolKind.NamedType)
    return definitions.Concat (usages);

  var structors =
    from type in model.SyntaxTree.GetRoot().DescendantNodes()
                                           .OfType<TypeDeclarationSyntax>()
    where type.Identifier.Text == symbol.Name
    let declaredSymbol = model.GetDeclaredSymbol (type)
    where declaredSymbol == symbol
    from method in type.Members
    let constructor = method as ConstructorDeclarationSyntax
    let destructor = method as DestructorDeclarationSyntax
    where constructor != null || destructor != null
    let identifier = constructor?.Identifier ?? destructor.Identifier
    select identifier.Span;

  return definitions.Concat (usages).Concat (structors);
}