Chapter 3. C# 3: LINQ and everything that comes with it

This chapter covers

  • Implementing trivial properties simply
  • Initializing objects and collections more concisely
  • Creating anonymous types for local data
  • Using lambda expressions to build delegates and expression trees
  • Expressing complex queries simply with query expressions

The new features of C# 2 were mostly independent of each other. Nullable value types depended on generics, but they were still separate features that didn’t build toward a common goal.

C# 3 was different. It consisted of many new features, each of which was useful in its own right, but almost all of which built toward the larger goal of LINQ. This chapter shows each feature individually and then demonstrates how they fit together. The first feature we’ll look at is the only one that has no direct relationship with LINQ.

3.1. Automatically implemented properties

Prior to C# 3, every property had to be implemented manually with bodies for the get and/or set accessors. The compiler was happy to provide an implementation for field-like events but not properties. That meant there were a lot of properties like this:

private string name;
public string Name
{
    get { return name; }
    set { name = value; }
}

Formatting would vary by code style, but whether the property was one long line, 11 short ones, or five lines in between (as in the preceding example), it was always just noise. It was a very long-winded way of expressing the intention to have a field and expose its value to callers via a property.

C# 3 made this much simpler by using automatically implemented properties (often referred to as automatic properties or even autoprops). These are properties with no accessor bodies; the compiler provides the implementation. The whole of the preceding code can be replaced with a single line:

public string Name { get; set; }

Note that there’s no field declaration in the source code now. There’s still a field, but it’s created for you automatically by the compiler and given a name that can’t be referred to anywhere in the C# code.

In C# 3, you can’t declare read-only automatically implemented properties, and you can’t provide an initial value at the point of declaration. Both of those features were introduced (finally!) in C# 6 and are described in section 8.2. Before C# 6, it was a reasonably common practice to fake read-only properties by giving them a private set accessor like this:

public string Name { get; private set; }

The introduction of automatically implemented properties in C# 3 had a huge effect in reducing boilerplate code. They’re useful only when the property simply fetches and sets the field value, but that accounts for a large proportion of properties in my experience.
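
As a minimal sketch of both forms shown above, the following program uses a read/write automatic property and a "fake" read-only one. The Person class and its members are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical class for illustration only.
public class Person
{
    // Read/write automatic property: the compiler generates the backing field.
    public string Name { get; set; }

    // "Fake" read-only property: anyone can read it, but only Person can write it.
    public int Age { get; private set; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    public void HaveBirthday()
    {
        Age++; // Allowed: the private set accessor is accessible inside the class.
    }
}

public static class Program
{
    public static void Main()
    {
        Person person = new Person("Jon", 40);
        person.Name = "Jonathan";   // Uses the public set accessor
        person.HaveBirthday();
        // person.Age = 20;         // Would not compile: the set accessor is private
        Console.WriteLine("{0} is {1}", person.Name, person.Age);
    }
}
```

The private setter stops other classes from changing Age, but nothing stops Person itself from doing so, which is why this is only an approximation of a read-only property.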

As I mentioned, automatically implemented properties don’t directly contribute to LINQ. Let’s move on to the first feature that does: implicit typing for arrays and local variables.

3.2. Implicit typing

In order to be as clear as possible about the features introduced in C# 3, I need to define a few terms first.

3.2.1. Typing terminology

Many terms are used to describe the way programming languages interact with their type system. Some people use the terms weakly typed and strongly typed, but I try to avoid those because they’re not clearly defined and mean different things to different developers. Two other aspects have more consensus: static/dynamic typing and explicit/implicit typing. Let’s look at each of those in turn.

Static and dynamic typing

Languages that are statically typed are typically compiled languages; the compiler is able to determine the type of each expression and check that it’s used correctly. For example, if you make a method call on an object, the compiler can use the type information to check that there’s a suitable method to call based on the type of the expression the method is called on, the name of the method, and the number and types of the arguments. Determining the meaning of something like a method call or field access is called binding. Languages that are dynamically typed leave all or most of the binding to execution time.

Note

As you’ll see in various places, some expressions in C# don’t have a type when considered in source code, such as the null literal. But the compiler always works out a type based on the context in which the expression is used, at which point that type can be used for checking how the expression is used.

Aside from the dynamic binding introduced in C# 4 (and described in chapter 4), C# is a statically typed language. Even though the choice of which implementation of a virtual method should be executed depends on the execution-time type of the object it’s called on, the binding process of determining the method signature all happens at compile time.
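
The distinction between compile-time binding and execution-time dispatch can be sketched as follows. The Animal and Dog classes and the Greet overloads are hypothetical names chosen for this example.

```csharp
using System;

public class Animal
{
    public virtual string Describe() { return "Animal"; }
}

public class Dog : Animal
{
    public override string Describe() { return "Dog"; }
}

public static class Program
{
    static string Greet(Animal animal) { return "Greet(Animal)"; }
    static string Greet(Dog dog) { return "Greet(Dog)"; }

    public static void Main()
    {
        Animal pet = new Dog();

        // Binding: the overload is chosen at compile time from the
        // compile-time type of the argument, which is Animal.
        Console.WriteLine(Greet(pet));       // Greet(Animal)

        // Virtual dispatch: the signature Describe() is bound at compile time,
        // but the implementation that runs depends on the actual object (a Dog).
        Console.WriteLine(pet.Describe());   // Dog
    }
}
```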

Explicit and implicit typing

In a language that’s explicitly typed, the source code specifies all the types involved. This could be for local variables, fields, method parameters, or method return types, for example. A language that’s implicitly typed allows the developer to omit the types from the source code so some other mechanism (whether it’s a compiler or something at execution time) can infer which type is meant based on other context.

C# is mostly explicitly typed. Even before C# 3, there was some implicit typing, such as type inference for generic type arguments as you saw in section 2.1.4. Arguably, the presence of implicit conversions (such as int to long) makes the language less explicitly typed, too.

With those different aspects of typing separated, you can look at the C# 3 features around implicit typing. We’ll start with implicitly typed local variables.

3.2.2. Implicitly typed local variables (var)

Implicitly typed local variables are variables declared with the contextual keyword var instead of the name of a type, such as the following:

var language = "C#";

The result of declaring a local variable with var instead of with the name of a type is still a local variable with a known type; the only difference is that the type is inferred by the compiler from the compile-time type of the value assigned to it. The preceding code will generate the exact same result as this:

string language = "C#";

Tip

When C# 3 first came out, a lot of developers avoided var because they thought it would remove a lot of compile-time checks or lead to execution-time performance problems. It doesn’t do that at all; it only infers the type of the local variable. After the declaration, the variable acts exactly as if it had been declared with an explicit type name.

The way the type is inferred leads to two important rules for implicitly typed local variables:

  • The variable must be initialized at the point of declaration.
  • The expression used to initialize the variable must have a type.

Here’s some invalid code to demonstrate these rules:

var x;           // Invalid: no initial value provided
x = 10;

var y = null;    // Invalid: initial value has no type

It would’ve been possible to avoid these rules in some cases by analyzing all the assignments performed to the variable and inferring the type from those. Some languages do that, but the C# language designers preferred to keep the rules as simple as possible.
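
Here is a small sketch of the two rules in practice. The variable names are arbitrary, and the cast in the fourth declaration is one possible way to give a null initializer a type; it isn't something the text above prescribes.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        var language = "C#";              // Inferred as string
        var count = 10;                   // Inferred as int
        var names = new List<string>();   // Inferred as List<string>

        // var nothing = null;            // Would not compile: null has no type
        var nothing = (string) null;      // Compiles: the cast gives the expression a type

        // language = 42;                 // Would not compile: language is a string
        names.Add(language);

        Console.WriteLine(language.GetType());  // System.String
        Console.WriteLine(count.GetType());     // System.Int32
        Console.WriteLine(nothing == null);     // True
    }
}
```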

Another restriction is that var can be used only for local variables. Many times I’ve longed for implicitly typed fields, but they’re still not available (as of C# 7.3, anyway).

In the preceding example, there was little benefit, if any, in using var. The explicit declaration is feasible and just as readable. There are generally three reasons for using var:

  • When the type of the variable can’t be named because it’s anonymous. You’ll look at anonymous types in section 3.4. This is the LINQ-related part of the feature.
  • When the type of the variable has a long name and can easily be inferred by a human reader based on the expression used to initialize it.
  • When the precise type of the variable isn’t particularly important, and the expression used to initialize it gives enough information to anyone reading the code.

I’ll save examples of the first bullet point for section 3.4, but it’s easy to show the second. Suppose you want to create a dictionary that maps a name to a list of decimal values. You can do that with an explicitly typed variable:

Dictionary<string, List<decimal>> mapping =
    new Dictionary<string, List<decimal>>();

That’s really ugly. I had to wrap it on two lines just to make it fit on the page, and there’s a lot of duplication. That duplication can be entirely avoided by using var:

var mapping = new Dictionary<string, List<decimal>>();

This expresses the same amount of information in less text, so there’s less to distract you from other code. Of course, this works only when you want the type of the variable to be exactly the type of the initialization expression. If you wanted the type of the mapping variable to be IDictionary<string, List<decimal>>—the interface instead of the class—then var wouldn’t help. But for local variables, that sort of separation between interface and implementation is usually less important.
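
To make the interface point concrete, here is a hedged sketch. The variable names are made up, and ContainsValue is used only because it is a member of Dictionary<TKey, TValue> that isn't declared on IDictionary<TKey, TValue>.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        // Compile-time type is the concrete class Dictionary<string, List<decimal>>.
        var concrete = new Dictionary<string, List<decimal>>();

        // Compile-time type is the interface, so the type has to be written out.
        IDictionary<string, List<decimal>> viaInterface =
            new Dictionary<string, List<decimal>>();

        concrete.Add("prices", new List<decimal>());
        viaInterface.Add("prices", new List<decimal>());

        // ContainsValue is declared on Dictionary<,> but not on IDictionary<,>.
        Console.WriteLine(concrete.ContainsValue(null));   // False
        // viaInterface.ContainsValue(null);               // Would not compile
    }
}
```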

When I wrote the first edition of C# in Depth, I was wary of implicitly typed local variables. I rarely used them outside LINQ, apart from when I was calling a constructor directly, as in the preceding example. I was worried that I wouldn’t be able to easily work out the type of the variable when just reading the code.

Ten years later, that caution has mostly gone. I use var for almost all my local variables in test code and extensively in production code, too. My fears weren’t realized; in almost every case, I’m easily able to infer what the type should be just by inspection. Where that isn’t the case, I’ll happily use an explicit declaration instead.

I don’t claim to be entirely consistent about this, and I’m certainly not dogmatic. Because explicitly typed variables generate the exact same code as implicitly typed variables, it’s fine to change your mind later in either direction. I suggest you discuss this with the other people who’ll work with your code the most (whether those are colleagues or open source collaborators), get a sense of everyone’s comfort level, and try to abide by that.

The other aspect of implicit typing in C# 3 is somewhat different. It’s not directly related to var, but it has the same aspect of removing a type name to let the compiler infer it.

3.2.3. Implicitly typed arrays

Sometimes you need to create an array without populating it and keep all the elements with their default values. The syntax for that hasn’t changed since C# 1; it’s always something like this:

int[] array = new int[10];

But you often want to create an array with specific initial content. Before C# 3, there were two ways of doing this:

int[] array1 = { 1, 2, 3, 4, 5 };
int[] array2 = new int[] { 1, 2, 3, 4, 5 };

The first form of this is valid only when it’s part of a variable declaration that specifies the array type. This is invalid, for example:

int[] array;
array = { 1, 2, 3, 4, 5 };      // Invalid

The second form is always valid, so the second line in the preceding example could’ve been as follows:

array = new int[] { 1, 2, 3, 4, 5 };

C# 3 introduced a third form in which the type of the array is implicit based on the content:

array = new[] { 1, 2, 3, 4, 5 };

This can be used anywhere, so long as the compiler is able to infer the array element type from the array elements specified. It also works with multidimensional arrays, as in the following example:

var array = new[,] { { 1, 2, 3 }, { 4, 5, 6 } };

The next obvious question is how the compiler infers that type. As is so often the case, the precise details are complex in order to handle all kinds of corner cases, but the simplified sequence of steps is as follows:

  1. Find a set of candidate types by considering the type of each array element that has a type.
  2. For each candidate type, check whether every array element has an implicit conversion to that type. Remove any candidate type that doesn’t meet this condition.
  3. If there’s exactly one type left, that’s the inferred element type, and the compiler creates an appropriate array. Otherwise (if there are no types or more than one type left), a compile-time error occurs.

The array element type must be the type of one of the expressions in the array initializer. There’s no attempt to find a common base class or a commonly implemented interface. Table 3.1 gives some examples that illustrate the rules.

Table 3.1. Examples of type inference for implicitly typed arrays

| Expression | Result | Notes |
|---|---|---|
| new[] { 10, 20 } | int[] | All elements are of type int. |
| new[] { null, null } | Error | No elements have types. |
| new[] { "xyz", null } | string[] | Only candidate type is string, and the null literal can be converted to string. |
| new[] { "abc", new object() } | object[] | Candidate types of string and object; implicit conversion from string to object but not vice versa. |
| new[] { 10, new DateTime() } | Error | Candidate types of int and DateTime but no conversion from either to the other. |
| new[] { 10, null } | Error | Only candidate type is int, but there’s no conversion from null to int. |

Implicitly typed arrays are mostly a convenience to reduce the amount of source code required. The exception is anonymous types, where the array type can’t be stated explicitly even if you want to. Even so, they’re a convenience I’d definitely miss now if I had to work without them.
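
The inference rules can be checked with a short program like the following sketch. The last declaration is an addition of mine rather than something from table 3.1: casting null to int? introduces a second candidate type that int converts to implicitly, so inference succeeds.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        var ints = new[] { 10, 20 };                  // int[]
        var strings = new[] { "xyz", null };          // string[]
        var objects = new[] { "abc", new object() };  // object[]

        // var bad1 = new[] { null, null };           // Error: no candidate types
        // var bad2 = new[] { 10, new DateTime() };   // Error: no conversion either way
        // var bad3 = new[] { 10, null };             // Error: null can't convert to int

        // Candidate types are int and int?; only int? accepts every element.
        var nullableInts = new[] { 10, (int?) null }; // int?[]

        Console.WriteLine(ints.GetType());            // System.Int32[]
        Console.WriteLine(strings.GetType());         // System.String[]
        Console.WriteLine(objects.GetType());         // System.Object[]
        Console.WriteLine(nullableInts.Length);       // 2
    }
}
```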

The next feature continues the theme of making it simpler to create and initialize objects, but in a different way.

3.3. Object and collection initializers

Object initializers and collection initializers make it easy to create new objects or collections with initial values, just as you can create and populate an array in a single expression. This functionality is important for LINQ because of the way queries are translated, but it turns out to be extremely useful elsewhere, too. It does require types to be mutable, which can be annoying if you’re trying to write code in a functional style, but where you can apply it, it’s great. Let’s look at a simple example before diving into the details.

3.3.1. Introduction to object and collection initializers

As a massively oversimplified example, let’s consider what an order in an e-commerce system might look like. The following listing shows three classes to model an order, a customer, and a single item within an order.

Listing 3.1. Modeling an order in an e-commerce system
public class Order
{
    private readonly List<OrderItem> items = new List<OrderItem>();

    public string OrderId { get; set; }
    public Customer Customer { get; set; }
    public List<OrderItem> Items { get { return items; } }
}

public class Customer
{
    public string Name { get; set; }
    public string Address { get; set; }
}

public class OrderItem
{
    public string ItemId { get; set; }
    public int Quantity { get; set; }
}

How do you create an order? Well, you need to create an instance of Order and assign to its OrderId and Customer properties. You can’t assign to the Items property, because it’s read-only. Instead, you can add items to the list it returns. The following listing shows how you might do this if you didn’t have object and collection initializers and couldn’t change the classes to make things simpler.

Listing 3.2. Creating and populating an order without object and collection initializers
// Creates the Customer
var customer = new Customer();
customer.Name = "Jon";
customer.Address = "UK";

// Creates the first OrderItem
var item1 = new OrderItem();
item1.ItemId = "abcd123";
item1.Quantity = 1;

// Creates the second OrderItem
var item2 = new OrderItem();
item2.ItemId = "fghi456";
item2.Quantity = 2;

// Creates the order
var order = new Order();
order.OrderId = "xyz";
order.Customer = customer;
order.Items.Add(item1);
order.Items.Add(item2);

This code could be simplified by adding constructors to the various classes to initialize properties based on the parameters. Even with object and collection initializers available, that’s what I’d do. But for the sake of brevity, I’m going to ask you to trust me that it’s not always feasible, for all kinds of reasons. Aside from anything else, you don’t always control the code for the classes you’re using. Object and collection initializers make it much simpler to create and populate our order, as shown in the following listing.

Listing 3.3. Creating and populating an order with object and collection initializers
var order = new Order
{
    OrderId = "xyz",
    Customer = new Customer { Name = "Jon", Address = "UK" },
    Items =
    {
        new OrderItem { ItemId = "abcd123", Quantity = 1 },
        new OrderItem { ItemId = "fghi456", Quantity = 2 }
    }
};

I can’t speak for everyone, but I find listing 3.3 much more readable than listing 3.2. The structure of the object becomes apparent in the indentation, and less repetition occurs. Let’s look more closely at each part of the code.

3.3.2. Object initializers

Syntactically, an object initializer is a sequence of member initializers within braces. Each member initializer is of the form property = initializer-value, where property is the name of the field or property being initialized and initializer-value is an expression, a collection initializer, or another object initializer.

Note

Object initializers are most commonly used with properties, and that’s how I’ve described them in this chapter. Fields don’t have accessors, but the obvious equivalents apply: reading the field instead of calling a get accessor and writing the field instead of calling a set accessor.

Object initializers can be used only as part of a constructor call or another object initializer. The constructor call can specify arguments as usual, but if you don’t want to specify any arguments, you don’t need an argument list at all, so you can omit the (). A constructor call without an argument list is equivalent to supplying an empty argument list. For example, these two lines are equivalent:

Order order = new Order() { OrderId = "xyz" };
Order order = new Order { OrderId = "xyz" };

You can omit the constructor argument list only if you provide an object or collection initializer. This is invalid:

Order order = new Order;       // Invalid

An object initializer simply says how to initialize each of the properties it mentions in its member initializers. If the initializer-value part (the part to the right of the = sign) is a normal expression, that expression is evaluated, and the value is passed to the property set accessor. That’s how most of the object initializers in listing 3.3 work. The Items property uses a collection initializer, which you’ll see shortly.

If initializer-value is another object initializer, the set accessor is never called. Instead, the get accessor is called, and then the nested object initializer is applied to the value returned by the property. As an example, listing 3.4 creates an HttpClient and modifies the set of default headers that are sent with each request. The code sets the From and Date headers, which I chose only because they’re the simplest ones to set.

Listing 3.4. Modifying default headers on a new HttpClient with a nested object initializer
HttpClient client = new HttpClient
{
    DefaultRequestHeaders =              // Property get accessor called for DefaultRequestHeaders
    {
        From = "user@example.com",       // Property set accessor called for From
        Date = DateTimeOffset.UtcNow     // Property set accessor called for Date
    }
};

The code in listing 3.4 is equivalent to the following code:

HttpClient client = new HttpClient();
var headers = client.DefaultRequestHeaders;
headers.From = "user@example.com";
headers.Date = DateTimeOffset.UtcNow;

A single object initializer can include a mixture of nested object initializers, collection initializers, and normal expressions in the sequence of member initializers. Speaking of collection initializers, let’s look at those now.
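
To see which accessors an object initializer calls, you can add logging to them. The following is a sketch under that assumption; the Window and WindowSettings classes are hypothetical and not part of the order model.

```csharp
using System;

public class WindowSettings
{
    public string Theme { get; set; }
}

public class Window
{
    private readonly WindowSettings settings = new WindowSettings();
    private string title;

    public string Title
    {
        get { return title; }
        set { Console.WriteLine("Title set accessor"); title = value; }
    }

    // Read-only property: there is no set accessor to call.
    public WindowSettings Settings
    {
        get { Console.WriteLine("Settings get accessor"); return settings; }
    }
}

public static class Program
{
    public static void Main()
    {
        Window window = new Window
        {
            Title = "Main",                 // Normal expression: calls the set accessor
            Settings = { Theme = "Dark" }   // Nested initializer: calls the get accessor
        };

        // Reading Settings here calls the get accessor once more.
        Console.WriteLine(window.Settings.Theme);   // Dark
    }
}
```

Because the nested initializer only reads Settings, it works even though the property has no set accessor, just as Items does in listing 3.3.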

3.3.3. Collection initializers

Syntactically, a collection initializer is a comma-separated list of element initializers in curly braces. Each element initializer is either a single expression or a comma-separated list of expressions also in curly braces. Collection initializers can be used only as part of a constructor call or part of an object initializer. Further restrictions exist on the types they can be used with, which we’ll come to shortly. In listing 3.3, you saw a collection initializer being used as part of an object initializer. Here’s the listing again; the collection initializer is the braced list of OrderItem expressions assigned to Items:

var order = new Order
{
    OrderId = "xyz",
    Customer = new Customer { Name = "Jon", Address = "UK" },
    Items =
    {
        new OrderItem { ItemId = "abcd123", Quantity = 1 },
        new OrderItem { ItemId = "fghi456", Quantity = 2 }
    }
};

Collection initializers might be more commonly used when creating new collections, though. For example, this line declares a new variable for a list of strings and populates the list:

var beatles = new List<string> { "John", "Paul", "Ringo", "George" };

The compiler compiles that into a constructor call followed by a sequence of calls to an Add method:

var beatles = new List<string>();
beatles.Add("John");
beatles.Add("Paul");
beatles.Add("Ringo");
beatles.Add("George");

But what if the collection type you’re using doesn’t have an Add method with a single parameter? That’s where element initializers with braces come in. After List<T>, the second most common generic collection is probably Dictionary<TKey, TValue> with an Add(key, value) method. A dictionary can be populated with a collection initializer like this:

var releaseYears = new Dictionary<string, int>
{
    { "Please Please Me", 1963 },
    { "Revolver", 1966 },
    { "Sgt. Pepper's Lonely Hearts Club Band", 1967 },
    { "Abbey Road", 1969 }
};

The compiler treats each element initializer as a separate Add call. If the element initializer is a simple one without braces, the value is passed as a single argument to Add. That’s what happened for the elements in our List<string> collection initializer.

If the element initializer uses braces, it’s still treated as a single call to Add, but with one argument for each expression within the braces. The preceding dictionary example is effectively equivalent to this:

var releaseYears = new Dictionary<string, int>();
releaseYears.Add("Please Please Me", 1963);
releaseYears.Add("Revolver", 1966);
releaseYears.Add("Sgt. Pepper's Lonely Hearts Club Band", 1967);
releaseYears.Add("Abbey Road", 1969);

Overload resolution then proceeds as normal to find the most appropriate Add method, including performing type inference if there are any generic Add methods.

Collection initializers are valid only for types that implement IEnumerable, although they don’t have to implement IEnumerable<T>. The language designers looked at the types in the framework that had Add methods and determined that the best way of separating them into collections and noncollections was to look at whether they implemented IEnumerable. As an example of why that’s important, consider the DateTime.Add(TimeSpan) method. The DateTime type clearly isn’t a collection, so it’d be odd to be able to write this:

DateTime invalid = new DateTime(2020, 1, 1) { TimeSpan.FromDays(10) };   // Invalid

The compiler never uses the implementation of IEnumerable when compiling a collection initializer. I’ve sometimes found it convenient to create types in test projects with Add methods and an implementation of IEnumerable that just throws a NotImplementedException. This can be useful for constructing test data, but I don’t advise doing it in production code. I’d appreciate an attribute that let me express the idea that this type should be usable for collection initializers without implementing IEnumerable, but I doubt that’ll ever happen.
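
Here is a sketch of the kind of test-only type described above. PriceList and its members are hypothetical. The type implements IEnumerable only so that collection initializers are permitted, and it has two Add overloads so you can see overload resolution pick between them.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical test-data builder; not recommended for production code.
public class PriceList : IEnumerable
{
    private readonly Dictionary<string, decimal> prices =
        new Dictionary<string, decimal>();

    // Chosen for element initializers of the form { name, price }.
    public void Add(string name, decimal price)
    {
        prices.Add(name, price);
    }

    // Chosen for element initializers that are a single expression.
    public void Add(string name)
    {
        prices.Add(name, 0m);
    }

    public decimal this[string name]
    {
        get { return prices[name]; }
    }

    // Present only to satisfy the compiler; collection initializers never call it.
    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotImplementedException();
    }
}

public static class Program
{
    public static void Main()
    {
        PriceList list = new PriceList
        {
            { "apple", 0.5m },   // Add(string, decimal)
            "sample",            // Add(string)
            { "pear", 1 }        // Add(string, decimal); the int is converted implicitly
        };

        Console.WriteLine(list["pear"]);   // 1
    }
}
```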

3.3.4. The benefits of single expressions for initialization

You may be wondering what all of this has to do with LINQ. I said that almost all the features in C# 3 built up to LINQ, so how do object and collection initializers fit into the picture? The answer is that other LINQ features require code to be expressible as a single expression. (For example, in a query expression, you can’t write a select clause that requires multiple statements to produce the output for a given input.)

The ability to initialize new objects in a single expression isn’t useful only for LINQ, however. It can also be important to simplify field initializers, method arguments, or even the operands in a conditional ?: operator. I find it particularly useful for static field initializers to build up useful lookup tables, for example. Of course, the larger the initialization expression becomes, the more you may want to consider separating it out.
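
As an illustration of the static field case, the following sketch builds a small lookup table in a single expression. The StatusText class and its contents are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table, populated by a collection initializer
// in a static field initializer.
public static class StatusText
{
    private static readonly Dictionary<int, string> Descriptions =
        new Dictionary<int, string>
        {
            { 200, "OK" },
            { 404, "Not Found" },
            { 500, "Internal Server Error" }
        };

    public static string Describe(int code)
    {
        string text;
        return Descriptions.TryGetValue(code, out text) ? text : "Unknown";
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(StatusText.Describe(404));   // Not Found
        Console.WriteLine(StatusText.Describe(418));   // Unknown
    }
}
```

Without collection initializers, the same table would need a static constructor or a helper method to populate the dictionary.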

It’s even recursively important to the feature itself. For example, if we couldn’t use an object initializer to create our OrderItem objects, the collection initializer wouldn’t be nearly as convenient to populate the Order.Items property.

In the rest of this book, whenever I refer to a new or improved feature as having a special case for a single expression (such as lambda expressions in section 3.5 or expression-bodied members in section 8.3), it’s worth remembering that object and collection initializers immediately make that feature more useful than it’d be otherwise.

Object and collection initializers allow for more concise code to create an instance of a type and populate it, but they do require that you already have an appropriate type to construct. Our next feature, anonymous types, allows you to create objects without even declaring the type of the object beforehand. It’s not quite as strange as it sounds.

3.4. Anonymous types

Anonymous types allow you to build objects that you can refer to in a statically typed way without having to declare a type beforehand. This sounds like types might be created dynamically at execution time, but the reality is a little more subtle than that. We’ll look at what anonymous types look like in source code, how the compiler handles them, and a few of their limitations.

3.4.1. Syntax and basic behavior

The simplest way to explain anonymous types is to start with an example. The following listing shows a simple piece of code to create an object with Name and Score properties.

Listing 3.5. Anonymous type with Name and Score properties
var player = new                                       1
{                                                      1
    Name = "Rajesh",                                   1
    Score = 3500                                       1
};                                                     1

Console.WriteLine("Player name: {0}", player.Name);    2
Console.WriteLine("Player score: {0}", player.Score);  2

  • 1 Creates an object of an anonymous type with Name and Score properties
  • 2 Displays the property values

This brief example demonstrates important points about anonymous types:

  • The syntax is a little like object initializers but without specifying a type name; it’s just new, open brace, properties, close brace. This is called an anonymous object creation expression. The property values can be nested anonymous object creation expressions.
  • You’re using var for the declaration of the player variable, because the type has no name for you to use instead of var. (The declaration would work if you used object instead, but it wouldn’t be nearly as useful.)
  • This code is still statically typed. Visual Studio can autocomplete the Name and Score properties of the player variable. If you ignore that and try to access a property that doesn’t exist (if you try to use player.Points, for example), the compiler will raise an error. The property types are inferred from the values assigned to them; player.Name is a string property, and player.Score is an int property.

That’s what anonymous types look like, but what are they used for? This is where LINQ comes in. When performing a query, whether that’s using an SQL database as the underlying data store or using a collection of objects, it’s common to want a specific shape of data that isn’t the original type and may not have much meaning outside the query.

For example, suppose you’re building a query using a set of people, each of whom has expressed a favorite color. You might want the result to be a histogram: each entry in the resulting collection is a color and the number of people who chose it as their favorite. The type representing a favorite color and a count isn’t likely to be useful anywhere else, but it is useful in this specific context. Anonymous types allow us to express those one-off cases concisely without losing the benefits of static typing.
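
The histogram scenario might look something like the following sketch using LINQ to Objects. The Person class and the sample data are assumptions made for illustration, and the code relies on lambda expressions and extension methods, which are covered later in this chapter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
public class Person
{
    public string Name { get; set; }
    public string FavoriteColor { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Ana", FavoriteColor = "Blue" },
            new Person { Name = "Ben", FavoriteColor = "Red" },
            new Person { Name = "Cho", FavoriteColor = "Blue" }
        };

        // Each result is an instance of an anonymous type with Color and Count properties.
        var histogram = people
            .GroupBy(person => person.FavoriteColor)
            .Select(group => new { Color = group.Key, Count = group.Count() });

        foreach (var entry in histogram)
        {
            // entry.Color is a string and entry.Count is an int, both checked at compile time.
            Console.WriteLine("{0}: {1}", entry.Color, entry.Count);
        }
    }
}
```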

Comparison with Java anonymous classes

If you’re familiar with Java, you may be wondering about the relationship between C#’s anonymous types and Java’s anonymous classes. They sound like they’d be similar, but they differ greatly both in syntax and purpose.

Historically, the principal use for anonymous classes in Java was to implement interfaces or extend abstract classes to override just one or two methods. C#’s anonymous types don’t allow you to implement an interface or derive from any class other than System.Object; their purpose is much more about data than executable code.

C# provides one extra piece of shorthand in anonymous object creation expressions where you’re effectively copying a property or field from somewhere else and you’re happy to use the same name. This syntax is called a projection initializer. To give an example, let’s go back to our simplified e-commerce data model. You have three classes:

  • Order: OrderId, Customer, Items
  • Customer: Name, Address
  • OrderItem: ItemId, Quantity

At some point in your code, you may want an object with all this information for a specific order item. If you have variables of the relevant types called order, customer, and item, you can easily use an anonymous type to represent the flattened information:

var flattenedItem = new
{
    order.OrderId,
    CustomerName = customer.Name,
    customer.Address,
    item.ItemId,
    item.Quantity
};

In this example, every property except CustomerName uses a projection initializer. The result is identical to this code, which specifies the property names in the anonymous type explicitly:

var flattenedItem = new
{
    OrderId = order.OrderId,
    CustomerName = customer.Name,
    Address = customer.Address,
    ItemId = item.ItemId,
    Quantity = item.Quantity
};

Projection initializers are most useful when you’re performing a query and want either to select only a subset of properties or to combine properties from multiple objects into one. If the name you want to give the property in the anonymous type is the same as the name of the field or property you’re copying from, the compiler can infer that name for you. So instead of writing this

SomeProperty = variable.SomeProperty

you can just write this:

variable.SomeProperty

Projection initializers can significantly reduce the amount of duplication in your source code if you’re copying multiple properties. They can easily make the difference between an expression being short enough to keep on one line or long enough to merit a separate line per property.

Refactoring and projection initializers

Although it’s accurate to say that the results of the two preceding listings are the same, that doesn’t mean they behave identically in other ways. Consider a rename of the Address property to CustomerAddress.

In the version with projection initializers, the property name in the anonymous type would change too. In the version with the explicit property name, it wouldn’t. That’s rarely an issue in my experience, but it’s worth being aware of as a difference.

I’ve described the syntax of anonymous types, and you know the resulting objects have properties you can use as if they were normal types. But what’s going on behind the scenes?

3.4.2. The compiler-generated type

Although the type never appears in source code, the compiler does generate a type. There’s no magic for the runtime to contend with; it just sees a type that happens to have a name that would be invalid in C#. That type has a few interesting aspects to it. Some are guaranteed by the specification; others aren’t. When using the Microsoft C# compiler, an anonymous type has the following characteristics:

  • It’s a class (guaranteed).
  • Its base class is object (guaranteed).
  • It’s sealed (not guaranteed, although it would be hard to see how it would be useful to make it unsealed).
  • The properties are all read-only (guaranteed).
  • The constructor parameters have the same names as the properties (not guaranteed; can be useful for reflection occasionally).
  • It’s internal to the assembly (not guaranteed; can be irritating when working with dynamic typing).
  • It overrides GetHashCode() and Equals() so that two instances are equal only if all their properties are equal. (It handles properties being null.) The fact that these methods are overridden is guaranteed, but the precise way of computing the hash code isn’t.
  • It overrides ToString() in a helpful way and lists the property names and their values. This isn’t guaranteed, but it is super helpful when diagnosing issues.
  • The type is generic with one type parameter for each property. Multiple anonymous types with the same property names but different property types will use different type arguments for the same generic type. This isn’t guaranteed and could easily vary by compiler.
  • If two anonymous object creation expressions use the same property names in the same order with the same property types in the same assembly, the result is guaranteed to be two objects of the same type.

The last point is important for variable reassignment and for implicitly typed arrays using anonymous types. In my experience, it’s relatively rare that you want to reassign a variable initialized with an anonymous type, but it’s nice that it’s feasible. For example, this is entirely valid:

var player = new { Name = "Pam", Score = 4000 };
player = new { Name = "James", Score = 5000 };

Likewise, it’s fine to create an array of anonymous type instances by using the implicitly typed array syntax described in section 3.2.3:

var players = new[]
{
    new { Name = "Priti", Score = 6000 },
    new { Name = "Chris", Score = 7000 },
    new { Name = "Amanda", Score = 8000 },
};

Note that the properties must have the same names and types and be in the same order for two anonymous object creation expressions to use the same type. For example, this would be invalid because the order of properties in the second array element is different from the others:

var players = new[]
{
    new { Name = "Priti", Score = 6000 },
    new { Score = 7000, Name = "Chris" },
    new { Name = "Amanda", Score = 8000 },
};

Although each array element is valid individually, the type of the second element stops the compiler from inferring the array type. The same would be true if you added an extra property or changed the type of one of the properties.
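Several of the characteristics listed earlier in this section can be observed directly. The following sketch assumes the Microsoft C# compiler; the variable names and values are arbitrary. The exact text produced by ToString() isn’t guaranteed, so no expected output is shown for the final line.

```csharp
using System;

static class AnonymousTypeCharacteristicsDemo
{
    public static void Main()
    {
        var first = new { Name = "Pam", Score = 4000 };
        var second = new { Name = "Pam", Score = 4000 };
        var reordered = new { Score = 4000, Name = "Pam" };

        // Same property names, types, and order: the same type is used.
        Console.WriteLine(first.GetType() == second.GetType());         // True

        // Same properties in a different order: a different type.
        Console.WriteLine(first.GetType() == reordered.GetType());      // False

        // Equals and GetHashCode are overridden to use the property values.
        Console.WriteLine(first.Equals(second));                        // True
        Console.WriteLine(first.GetHashCode() == second.GetHashCode()); // True

        // They're still two separate objects.
        Console.WriteLine(ReferenceEquals(first, second));              // False

        // ToString lists the property names and values (format not guaranteed).
        Console.WriteLine(first);
    }
}
```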

Although anonymous types are useful within LINQ, that doesn’t make this feature the right tool for every problem. Let’s look briefly at places you may not want to use them.

3.4.3. Limitations

Anonymous types are great when you want a localized representation of just data. By localized, I mean that the data shape you’re interested in is relevant only within that specific method. As soon as you want to represent the same shape in multiple places, you need to look for a different solution. Although it’s possible to return instances of anonymous types from methods or accept them as parameters, you can do so only by using either generics or the object type. The fact that the types are anonymous prevents you from expressing them in method signatures.
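To make the generics and object options concrete, here’s a minimal sketch. The CreateList and Describe methods are hypothetical helpers written for this example, not part of any library. With the generic method, type inference supplies the anonymous type as the type argument, so the calling code keeps full static typing. With object, the caller can do little beyond calling members of object.

```csharp
using System;
using System.Collections.Generic;

static class AnonymousTypeGenericsDemo
{
    // Hypothetical helper: T is inferred as the anonymous type,
    // so the signature never has to name it.
    static List<T> CreateList<T>(params T[] items)
    {
        return new List<T>(items);
    }

    // Hypothetical helper: accepting object works, but the method can't
    // access Name or Score without reflection or dynamic typing.
    static string Describe(object value)
    {
        return value.ToString();
    }

    public static void Main()
    {
        var players = CreateList(
            new { Name = "Priti", Score = 6000 },
            new { Name = "Chris", Score = 7000 });

        // Same property names, types, and order, so this is the same type.
        players.Add(new { Name = "Amanda", Score = 8000 });

        foreach (var player in players)
        {
            Console.WriteLine("{0}: {1}", player.Name, player.Score);
        }

        // The exact ToString format isn't guaranteed.
        Console.WriteLine(Describe(players[0]));
    }
}
```

The first three lines printed are Priti: 6000, Chris: 7000, and Amanda: 8000. Even here, the anonymous type is usable only because everything that needs the property names lives in the method that created the instances.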

Until C# 7, if you wanted to use a common data structure in more than one method, you’d normally declare your own class or struct for it. C# 7 has introduced tuples, as you’ll see in chapter 11, which can work as an alternative solution, depending on how much encapsulation you desire.

Speaking of encapsulation, anonymous types basically don’t provide any. You can’t place any validation in the type or add extra behavior to it. If you find yourself wanting to do so, that’s a good indication that you should probably be creating your own type instead.

Finally, I mentioned earlier that using anonymous types across assemblies via C# 4’s dynamic typing is made more difficult because the types are internal. I’ve usually seen this attempted in MVC web applications where the model for a page may be built using anonymous types and then accessed in the view using the dynamic type (which you’ll look at in chapter 4). This works if either the two pieces of code are in the same assembly or the assembly containing the model code has made its internal members visible to the assembly containing the view code using [InternalsVisibleTo]. Depending on the framework you’re using, it may be awkward to arrange for either of these to be true. Given the benefits of static typing anyway, I generally recommend declaring the model as a regular type instead. It’s more up-front work than using an anonymous type but is likely to save you time in the long term.

Note

Visual Basic has anonymous types too, but they don’t behave in quite the same way. In C#, all properties are used in determining equality and hash codes, and they’re all read-only. In VB, only properties declared with the Key modifier behave like that. Nonkey properties are read/write and don’t affect equality or hash codes.

We’re about halfway through the C# 3 features, and so far they’ve all had to do with data. The next features focus more on executable code, first with lambda expressions and then extension methods.

3.5. Lambda expressions

In chapter 2, you saw how anonymous methods made it much easier to create delegate instances by including their code inline like this:

Action<string> action = delegate(string message)      1
{                                                     1
    Console.WriteLine("In delegate: {0}", message);   1
};                                                    1
action("Message");                                    2

  • 1 Creates delegate using an anonymous method
  • 2 Invokes the delegate

Lambda expressions were introduced in C# 3 to make this even more concise. The term anonymous function is used to refer to both anonymous methods and lambda expressions. I’ll use it at various points in the rest of this book, and it’s widely used in the C# specification.

Note

The name lambda expressions comes from lambda calculus, a field of mathematics and computer science started by Alonzo Church in the 1930s. Church used the Greek lambda character (λ) in his notation for functions, and the name stuck.

There are various reasons that it was useful for the language designers to put so much effort into streamlining delegate instance creation, but LINQ is the most important one. When you look at query expressions in section 3.7, you’ll see that they’re effectively translated into code that uses lambda expressions. You can use LINQ without using query expressions, though, and that almost always involves using lambda expressions directly in your source code.

First, we’ll look at the syntax for lambda expressions and then some of the details of how they behave. Finally, we’ll talk about expression trees that represent code as data.

3.5.1. Lambda expression syntax

The basic syntax for lambda expressions is always of this form:

parameter-list => body

Both the parameter list and the body, however, have multiple representations. In its most explicit form, the parameter list for a lambda expression looks like a normal method or anonymous method parameter list. Likewise, the body of a lambda expression can be a block: a sequence of statements all within a pair of curly braces. In this form, the lambda expression looks similar to an anonymous method:

Action<string> action = (string message) =>
{
    Console.WriteLine("In delegate: {0}", message);
};
action("Message");

So far, this doesn’t look much better; you’ve traded the delegate keyword for =>, but that’s all. But special cases allow the lambda expression to become shorter.

Let’s start by making the body more concise. A body that consists of just a return statement or a single expression can be reduced to that single expression. The return keyword is removed if there was one. In the preceding example, the body of our lambda expression was just a method invocation, so you can simplify it:

Action<string> action =
    (string message) => Console.WriteLine("In delegate: {0}", message);

You’ll look at an example returning a value shortly. Lambda expressions shortened like this are said to have expression bodies, whereas lambda expressions using braces are said to have statement bodies.

Next, you can make the parameter list shorter if the compiler can infer the parameter types based on the type you’re attempting to convert the lambda expression to. Lambda expressions don’t have a type but are convertible to compatible delegate types, and the compiler can often infer the parameter type as part of that conversion.

For example, in the preceding code, the compiler knows that an Action<string> has a single parameter of type string, so it’s capable of inferring that parameter type. When the compiler can infer the parameter type, you can omit it. Therefore, our example can be shortened:

Action<string> action =
    (message) => Console.WriteLine("In delegate: {0}", message);

Finally, if the lambda expression has exactly one parameter, and that parameter’s type is inferred, the parentheses can be dropped from the parameter list:

Action<string> action =
    message => Console.WriteLine("In delegate: {0}", message);

Now let’s look at a couple of examples that return values. In each case, you’ll apply every step you can to make it shorter. First, you’ll construct a delegate to multiply two integers together and return the result:

Func<int, int, int> multiply =                            1
    (int x, int y) => { return x * y; };                  1

Func<int, int, int> multiply = (int x, int y) => x * y;   2

Func<int, int, int> multiply = (x, y) => x * y;           3

  • 1 Longest form
  • 2 Uses an expression body
  • 3 Infers parameter types (two parameters, so you can’t remove the parentheses)

Next, you’ll use a delegate to take the length of a string, multiply that length by itself, and return the result:

Func<string, int> squareLength = (string text) =>  1
{                                                  
    int length = text.Length;                      
    return length * length;                        
};                                                 

Func<string, int> squareLength = (text) =>         2
{
    int length = text.Length;
    return length * length;
};

Func<string, int> squareLength = text =>           3
{
    int length = text.Length;
    return length * length;
};

  • 1 Longest form
  • 2 Infers parameter type
  • 3 Removes parentheses for single parameter (you can’t do anything else immediately, because the body has two statements)

If you were happy to evaluate the Length property twice, you could reduce this second example:

Func<string, int> squareLength = text => text.Length * text.Length;

That’s not the same kind of change as the others, though; that’s changing the behavior (however slightly) rather than just the syntax. It may seem odd to have all of these special cases, but in practice all of them apply in a large number of cases, particularly within LINQ. Now that you understand the syntax, you can start looking at the behavior of the delegate instance, particularly in terms of any variables it has captured.
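The behavioral difference between reading a property once and reading it twice is invisible with string, but a sketch with a purpose-built type makes it observable. CountingText is a hypothetical class invented for this example; its Length property counts how many times it has been read.

```csharp
using System;

// Hypothetical type: Length always returns 5 and counts each read.
class CountingText
{
    public int Reads { get; private set; }

    public int Length
    {
        get { Reads++; return 5; }
    }
}

static class ExpressionBodyBehaviorDemo
{
    public static void Main()
    {
        // Statement body: the property is read once and stored in a local.
        Func<CountingText, int> statementBody = text =>
        {
            int length = text.Length;
            return length * length;
        };

        // Expression body: shorter, but the property is read twice.
        Func<CountingText, int> expressionBody = text => text.Length * text.Length;

        CountingText first = new CountingText();
        int firstResult = statementBody(first);
        // Prints "Statement body: 25, reads: 1"
        Console.WriteLine("Statement body: {0}, reads: {1}", firstResult, first.Reads);

        CountingText second = new CountingText();
        int secondResult = expressionBody(second);
        // Prints "Expression body: 25, reads: 2"
        Console.WriteLine("Expression body: {0}, reads: {1}", secondResult, second.Reads);
    }
}
```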

3.5.2. Capturing variables

In section 2.3.2, when I described captured variables in anonymous methods, I promised that we’d return to the topic in the context of lambda expressions. This is probably the most confusing part of lambda expressions. It’s certainly been the cause of lots of Stack Overflow questions.

To create a delegate instance from a lambda expression, the compiler converts the code in the lambda expression to a method somewhere. The delegate can then be created at execution time exactly as if you had a method group. This section shows the kind of transformation the compiler performs. I’ve written this as if the compiler translates the source code into more source code that doesn’t contain lambda expressions, but of course the compiler never needs that translated source code. It can just emit the appropriate IL.

Let’s start with a recap of what counts as a captured variable. Within a lambda expression, you can use any variable that you’d be able to use in regular code at that point. That could be a static field, an instance field (if you’re writing the lambda expression within an instance method[1]), the this variable, method parameters, or local variables. All of these are captured variables, because they’re variables declared outside the immediate context of the lambda expression. Compare that with parameters to the lambda expression or local variables declared within the lambda expression; those aren’t captured variables. The following listing shows a lambda expression that captures various variables. You’ll then look at how the compiler handles that code.

1

You can write lambda expressions in constructors, property accessors, and so on as well, but for the sake of simplicity, I’ll assume you’re writing them in methods.

Listing 3.6. Capturing variables in a lambda expression
class CapturedVariablesDemo
{
    private string instanceField = "instance field";

    public Action<string> CreateAction(string methodParameter)
    {
        string methodLocal = "method local";
        string uncaptured = "uncaptured local";

        Action<string> action = lambdaParameter =>
        {
            string lambdaLocal = "lambda local";
            Console.WriteLine("Instance field: {0}", instanceField);
            Console.WriteLine("Method parameter: {0}", methodParameter);
            Console.WriteLine("Method local: {0}", methodLocal);
            Console.WriteLine("Lambda parameter: {0}", lambdaParameter);
            Console.WriteLine("Lambda local: {0}", lambdaLocal);
        };
        methodLocal = "modified method local";
        return action;
    }
}

In other code
var demo = new CapturedVariablesDemo();
Action<string> action = demo.CreateAction("method argument");
action("lambda argument");

Lots of variables are involved here:

  • instanceField is an instance field in the CapturedVariablesDemo class and is captured by the lambda expression.
  • methodParameter is a parameter in the CreateAction method and is captured by the lambda expression.
  • methodLocal is a local variable in the CreateAction method and is captured by the lambda expression.
  • uncaptured is a local variable in the CreateAction method, but it’s never used by the lambda expression, so it’s not captured by it.
  • lambdaParameter is a parameter in the lambda expression itself, so it isn’t a captured variable.
  • lambdaLocal is a local variable in the lambda expression, so it isn’t a captured variable.

It’s important to understand that the lambda expression captures the variables themselves, not the values of the variables at the point when the delegate is created.[2] If you modified any of the captured variables between the time at which the delegate is created and when it’s invoked, the output would reflect those changes. Likewise, the lambda expression can change the value of the captured variables. How does the compiler make all of that work? How does it make sure all those variables are still available to the delegate when it’s invoked?

2

I will repeat this multiple times, for which I make no apology. If you’re new to captured variables, this can take a while to get used to.
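A short sketch shows the difference between capturing a variable and capturing its value. The counter variable and the two delegates are arbitrary names chosen for this example. Changes made outside the lambda expressions are visible inside them, and changes made inside are visible outside.

```csharp
using System;

static class CapturedVariableDemo
{
    public static void Main()
    {
        int counter = 10;

        // Both lambda expressions capture the counter variable itself.
        Action report = () => Console.WriteLine("counter = {0}", counter);
        Action increment = () => counter++;

        report();                   // counter = 10

        counter = 20;               // Modified after the delegates were created
        report();                   // counter = 20

        increment();                // Modified from inside a lambda expression
        report();                   // counter = 21
        Console.WriteLine(counter); // 21
    }
}
```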

Implementing captured variables with a generated class

There are three broad cases to consider:

  • If no variables are captured at all, the compiler can create a static method. No extra context is required.
  • If the only variables captured are instance fields, the compiler can create an instance method. Capturing one instance field is equivalent to capturing 100 of them, because you need access only to this.
  • If local variables or parameters are captured, the compiler creates a private nested class to contain that context and then an instance method in that class containing the lambda expression code. The method containing the lambda expression is changed to use that nested class for every access to the captured variables.

Implementation details may vary

You may see some variation in what I’ve described. For example, with a lambda expression with no captured variables, the compiler may create a nested class with a single instance instead of a static method. There can be subtle differences in the efficiency of executing delegates based on exactly how they’re created. In this section, I’ve described the minimum work that the compiler must do in order to make captured variables available. It can introduce more complexity if it wants to.

The last case is obviously the most complex one, so we’ll focus on that. Let’s start with listing 3.6. As a reminder, here’s the method that creates the lambda expression; I’ve omitted the class declaration for brevity:

public Action<string> CreateAction(string methodParameter)
{
    string methodLocal = "method local";
    string uncaptured = "uncaptured local";

    Action<string> action = lambdaParameter =>
    {
        string lambdaLocal = "lambda local";
        Console.WriteLine("Instance field: {0}", instanceField);
        Console.WriteLine("Method parameter: {0}", methodParameter);
        Console.WriteLine("Method local: {0}", methodLocal);
        Console.WriteLine("Lambda parameter: {0}", lambdaParameter);
        Console.WriteLine("Lambda local: {0}", lambdaLocal);
    };
    methodLocal = "modified method local";
    return action;
}

As I described before, the compiler creates a private nested class for the extra context it’ll need and then an instance method in that class for the code in the lambda expression. The context is stored in instance variables of the nested class. In our case, that means the following:

  • A reference to the original instance of CapturedVariablesDemo so that you can access instanceField later
  • A string variable for the captured method parameter
  • A string variable for the captured local variable

The following listing shows the nested class and how it’s used by the CreateAction method.

Listing 3.7. Translation of a lambda expression with captured variables
private class LambdaContext                           1
{
    public CapturedVariablesDemo originalThis;        2
    public string methodParameter;                    2
    public string methodLocal;                        2

    public void Method(string lambdaParameter)        3
    {
        string lambdaLocal = "lambda local";
        Console.WriteLine("Instance field: {0}",
            originalThis.instanceField);
        Console.WriteLine("Method parameter: {0}", methodParameter);
        Console.WriteLine("Method local: {0}", methodLocal);
        Console.WriteLine("Lambda parameter: {0}", lambdaParameter);
        Console.WriteLine("Lambda local: {0}", lambdaLocal);
    }
}

public Action<string> CreateAction(string methodParameter)
{
    LambdaContext context = new LambdaContext();      4
    context.originalThis = this;                      4
    context.methodParameter = methodParameter;        4
    context.methodLocal = "method local";             4
    string uncaptured = "uncaptured local";

    Action<string> action = context.Method;           4
    context.methodLocal = "modified method local";    4
    return action;
}

  • 1 Generated class to hold the captured variables
  • 2 Captured variables
  • 3 Body of lambda expression becomes an instance method.
  • 4 Generated class is used for all captured variables.

Note how the context.methodLocal is modified near the end of the CreateAction method. When the delegate is finally invoked, it’ll “see” that modification. Likewise, if the delegate modified any of the captured variables, each invocation would see the results of the previous invocations. This is just reinforcing that the compiler ensures that the variable is captured rather than a snapshot of its value.
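The same equivalence can be checked with a smaller, runnable sketch in which the delegate modifies the captured variable. CounterContext is a handwritten stand-in for the kind of class a compiler might generate; its name and shape are assumptions made for illustration, and a real compiler is free to do something different.

```csharp
using System;

static class CounterTranslationDemo
{
    // Handwritten stand-in for a compiler-generated context class.
    private class CounterContext
    {
        public int count;

        public int Next()
        {
            count++;
            return count;
        }
    }

    // The version you'd actually write, using a lambda expression.
    static Func<int> CreateCounterWithLambda(int start)
    {
        int count = start;
        Func<int> next = () => { count++; return count; };
        count += 100;           // Visible to the delegate when it's invoked
        return next;
    }

    // A possible translation: every use of count goes via the context.
    static Func<int> CreateCounterByHand(int start)
    {
        CounterContext context = new CounterContext();
        context.count = start;
        Func<int> next = context.Next;
        context.count += 100;
        return next;
    }

    public static void Main()
    {
        Func<int> lambdaCounter = CreateCounterWithLambda(0);
        Func<int> handCounter = CreateCounterByHand(0);

        Console.WriteLine("{0} {1}", lambdaCounter(), handCounter()); // 101 101
        Console.WriteLine("{0} {1}", lambdaCounter(), handCounter()); // 102 102
    }
}
```

Both delegates see the modification made after they were created, and each invocation sees the result of the previous one.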

In listings 3.6 and 3.7, you had to create only a single context for the captured variables. In the terminology of the specification, each of the local variables was instantiated only once. Let’s make things a little more complicated.

Multiple instantiations of local variables

To make things a little simpler, you’ll capture one local variable this time and no parameters or instance fields. The following listing shows a method to create a list of actions and then execute them one at a time. Each action captures a text variable.

Listing 3.8. Instantiating a local variable multiple times
static List<Action> CreateActions()
{
    List<Action> actions = new List<Action>();
    for (int i = 0; i < 5; i++)
    {
        string text = string.Format("message {0}", i);   1
        actions.Add(() => Console.WriteLine(text));      2
    }
    return actions;
}

In other code
List<Action> actions = CreateActions();
foreach (Action action in actions)
{
    action();
}

  • 1 Declares a local variable within the loop
  • 2 Captures the variable in a lambda expression

The fact that text is declared inside the loop is very important indeed. Each time you reach that declaration, the variable is instantiated. Each lambda expression captures a different instantiation of the variable. There are effectively five different text variables, each of which has been captured separately. They’re completely independent variables. Although this code happens not to modify them after the initial assignment, it certainly could do so either inside the lambda expression or elsewhere within the loop. Modifying one variable would have no effect on the others.
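To see why the position of the declaration matters, compare capturing a variable declared inside the loop body with capturing the for loop’s own variable, which is instantiated only once for the whole loop. This is a sketch for contrast; the names and messages are arbitrary.

```csharp
using System;
using System.Collections.Generic;

static class LoopCaptureDemo
{
    public static void Main()
    {
        List<Action> actions = new List<Action>();

        for (int i = 0; i < 3; i++)
        {
            // i is instantiated once for the whole loop, so all three
            // delegates capture the same variable.
            actions.Add(() => Console.WriteLine("shared i: {0}", i));
        }

        for (int i = 0; i < 3; i++)
        {
            // copy is instantiated on every iteration, so each delegate
            // captures a different variable.
            int copy = i;
            actions.Add(() => Console.WriteLine("copy: {0}", copy));
        }

        foreach (Action action in actions)
        {
            action();
        }
    }
}
```

The first three delegates all print shared i: 3, because the single i variable has reached 3 by the time they’re invoked. The last three print copy: 0, copy: 1, and copy: 2, one value per instantiation.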

The compiler models this behavior by creating a different instance of the generated type for each instantiation. Therefore, the CreateActions method of listing 3.8 could be translated into the following listing.

Listing 3.9. Creating multiple context instances, one for each instantiation
private class LambdaContext
{
    public string text;

    public void Method()
    {
        Console.WriteLine(text);
    }
}

static List<Action> CreateActions()
{
    List<Action> actions = new List<Action>();
    for (int i = 0; i < 5; i++)
    {
        LambdaContext context = new LambdaContext();      1
        context.text = string.Format("message {0}", i);
        actions.Add(context.Method);                      2
    }
    return actions;
}

  • 1 Creates a new context for each loop iteration
  • 2 Uses the context to create an action

Hopefully, that still makes sense. You’ve gone from having a single context for the lambda expression to one for each iteration of the loop. I’m going to finish this discussion of captured variables with an even more complicated example, which is a mixture of the two.

Capturing variables from multiple scopes

It was the scope of the text variable that meant it was instantiated once for each iteration of the loop. But multiple scopes can exist within a single method, and each scope can contain local variable declarations, and a single lambda expression can capture variables from multiple scopes. Listing 3.10 gives an example. You create two delegate instances, each of which captures two variables. They both capture the same outerCounter variable, but each captures a separate innerCounter variable. The delegates simply print out the current values of the counters and increment them. You execute each delegate twice, which makes the difference between the captured variables clear.

Listing 3.10. Capturing variables from multiple scopes
static List<Action> CreateCountingActions()
{
    List<Action> actions = new List<Action>();
    int outerCounter = 0;                        1
    for (int i = 0; i < 2; i++)
    {
        int innerCounter = 0;                    2
        Action action = () =>
        {
            Console.WriteLine(                   3
                "Outer: {0}; Inner: {1}",        3
                outerCounter, innerCounter);     3
            outerCounter++;                      3
            innerCounter++;                      3
        };
        actions.Add(action);
    }
    return actions;
}

In other code
List<Action> actions = CreateCountingActions();  
actions[0]();                                    4
actions[0]();                                    4
actions[1]();                                    4
actions[1]();                                    4

  • 1 One variable captured by both delegates
  • 2 New variable for each loop iteration
  • 3 Displays and increments counters
  • 4 Calls each delegate twice

The output of listing 3.10 is as follows:

Outer: 0; Inner: 0
Outer: 1; Inner: 1
Outer: 2; Inner: 0
Outer: 3; Inner: 1

The first two lines are printed by the first delegate. The last two lines are printed by the second delegate. As I described before the listing, the same outer counter is used by both delegates, but they have independent inner counters.

What does the compiler do with this? Each delegate needs its own context, but that context needs to also refer to a shared context. The compiler creates two private nested classes instead of one. The following listing shows an example of how the compiler could treat listing 3.10.

Listing 3.11. Capturing variables from multiple scopes leads to multiple classes
private class OuterContext                               1
{                                                        1
    public int outerCounter;                             1
}                                                        1

private class InnerContext                               2
{                                                        2
    public OuterContext outerContext;                    2
    public int innerCounter;                             2

    public void Method()                                 3
    {
        Console.WriteLine(
            "Outer: {0}; Inner: {1}",
            outerContext.outerCounter, innerCounter);
        outerContext.outerCounter++;
        innerCounter++;
    }
}

static List<Action> CreateCountingActions()
{
    List<Action> actions = new List<Action>();
    OuterContext outerContext = new OuterContext();      4
    outerContext.outerCounter = 0;
    for (int i = 0; i < 2; i++)
    {
        InnerContext innerContext = new InnerContext();  5
        innerContext.outerContext = outerContext;        5
        innerContext.innerCounter = 0;                   5
        Action action = innerContext.Method;
        actions.Add(action);
    }
    return actions;
}

  • 1 Context for the outer scope
  • 2 Context for the inner scope with reference to outer context
  • 3 Method used to create delegate
  • 4 Creates a single outer context
  • 5 Creates an inner context per loop iteration

You’ll rarely need to look at the generated code like this, but it can make a difference in terms of performance. If you use a lambda expression in a performance-critical piece of code, you should be aware of how many objects will be created to support the variables it captures.

I could give even more examples with multiple lambda expressions in the same scope capturing different sets of variables or lambda expressions in methods of value types. I find it fascinating to explore compiler-generated code, but you probably wouldn’t want a whole book of it. If you ever find yourself wondering how the compiler treats a particular lambda expression, it’s easy enough to run a decompiler or ildasm over the result.

So far, you’ve looked only at converting lambda expressions to delegates, which you could already do with anonymous methods. Lambda expressions have another superpower, however: they can be converted to expression trees.

3.5.3. Expression trees

Expression trees are representations of code as data. This is the heart of how LINQ is able to work efficiently with data providers such as SQL databases. The code you write in C# can be analyzed at execution time and converted into SQL.

Whereas delegates provide code you can run, expression trees provide code you can inspect, a little like reflection. Although you can build up expression trees directly in code, it’s more common to ask the compiler to do this for you by converting a lambda expression into an expression tree. The following listing gives a trivial example of this by creating an expression tree just to add two numbers together.

Listing 3.12. A simple expression tree to add two integers
Expression<Func<int, int, int>> adder = (x, y) => x + y;
Console.WriteLine(adder);

Considering it’s only two lines of code, there is a lot going on. Let’s start with the output. If you try to print out a regular delegate, the result will be just the type with no indication of the behavior. The output of listing 3.12 shows exactly what the expression tree does, though:

(x, y) => (x + y)

The compiler isn’t cheating by hardcoding a string somewhere. That string representation is constructed from the expression tree. This demonstrates that the code is available for examination at execution time, which is the whole point of expression trees.

Let’s look at the type of adder: Expression<Func<int, int, int>>. It’s simplest to split it into two parts: Expression<TDelegate> and Func<int, int, int>. The second part is used as a type argument to the first. The second part is a delegate type with two integer parameters and an integer return type. (The return type is expressed by the last type parameter, so a Func<string, double, int> would accept a string and a double as inputs and return an int.)

Expression<TDelegate> is the expression tree type associated with TDelegate, which must be a delegate type. (That’s not expressed as a type constraint, but it’s enforced at execution time.) This is only one of the many types involved in expression trees. They’re all in the System.Linq.Expressions namespace. The nongeneric Expression class is the abstract base class for all the other expression types, and it’s also used as a convenient container for factory methods to create instances of the concrete subclasses.
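Printing the tree is only one way of examining it. The following sketch walks the same adder tree by using types from System.Linq.Expressions and prints a few of its pieces. It assumes the default unchecked arithmetic context; in a checked context, the node type of the body would be AddChecked instead.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionInspectionDemo
{
    public static void Main()
    {
        Expression<Func<int, int, int>> adder = (x, y) => x + y;

        // The root of the tree is the lambda expression itself.
        Console.WriteLine(adder.NodeType);            // Lambda
        Console.WriteLine(adder.Parameters.Count);    // 2
        Console.WriteLine(adder.Parameters[0].Name);  // x
        Console.WriteLine(adder.Parameters[1].Name);  // y

        // The body is a binary expression with two operands.
        BinaryExpression body = (BinaryExpression) adder.Body;
        Console.WriteLine(body.NodeType);             // Add
        Console.WriteLine(body.Left);                 // x
        Console.WriteLine(body.Right);                // y
        Console.WriteLine(body.Type);                 // System.Int32
    }
}
```

A LINQ provider performs the same kind of inspection, just on a much larger scale, when it converts a query into another form such as SQL.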

Our adder variable type is an expression tree representation of a function accepting two integers and returning an integer. You then use a lambda expression to assign a value to that variable. The compiler generates code to build the appropriate expression tree at execution time. In this case, it’s reasonably simple. You can write the same code yourself, as shown in the following listing.

Listing 3.13. Handwritten code to create an expression tree to add two integers
ParameterExpression xParameter = Expression.Parameter(typeof(int), "x");
ParameterExpression yParameter = Expression.Parameter(typeof(int), "y");
Expression body = Expression.Add(xParameter, yParameter);
ParameterExpression[] parameters = new[] { xParameter, yParameter };

Expression<Func<int, int, int>> adder =
    Expression.Lambda<Func<int, int, int>>(body, parameters);
Console.WriteLine(adder);

This is a small example, and it’s still significantly more long-winded than the lambda expression. By the time you add method calls, property accesses, object initializers, and so on, it gets complex and error prone. That’s why it’s so important that the compiler can do the work for you by converting lambda expressions into expression trees. There are a few rules around this, though.
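To give a feel for how quickly the handwritten form grows, here’s a sketch that builds the equivalent of text => text.Length * text.Length by hand. It uses the Expression.Parameter, Expression.Property, Expression.Multiply, and Expression.Lambda factory methods; the variable names are arbitrary, and the tree the compiler would build for the lambda expression may differ in minor details.

```csharp
using System;
using System.Linq.Expressions;

static class HandwrittenTreeDemo
{
    public static void Main()
    {
        // Handwritten equivalent of: text => text.Length * text.Length
        ParameterExpression text = Expression.Parameter(typeof(string), "text");
        MemberExpression length = Expression.Property(text, "Length");
        BinaryExpression body = Expression.Multiply(length, length);

        Expression<Func<string, int>> squareLength =
            Expression.Lambda<Func<string, int>>(body, text);

        Console.WriteLine(squareLength.Body.NodeType);       // Multiply
        Console.WriteLine(length.Member.Name);               // Length
        Console.WriteLine(squareLength.Parameters[0].Type);  // System.String
    }
}
```

The property is looked up by name as a string, so a typo in "Length" would be found only at execution time. That’s one of the ways in which handwritten trees are more error prone than converted lambda expressions.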

Limitations of conversions to expression trees

The most important restriction is that only expression-bodied lambda expressions can be converted to expression trees. Although our earlier lambda expression of (x, y) => x + y was fine, the following code would cause a compilation error:

Expression<Func<int, int, int>> adder = (x, y) => { return x + y; };

The expression tree API has expanded since .NET 3.5 to include blocks and other constructs, but the C# compiler still has this restriction, and it’s consistent with the use of expression trees for LINQ. This is one reason that object and collection initializers are so important: they allow initialization to be captured in a single expression, which means it can be used in an expression tree.

Additionally, the lambda expression can’t use the assignment operator, or use C# 4’s dynamic typing, or use C# 5’s asynchrony. (Although object and collection initializers do use the = symbol, that’s not the assignment operator in that context.)
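The following sketch puts those rules side by side. It assumes a hypothetical Point class with two automatically implemented properties; the lines that would fail to compile are left as comments so the rest of the example still runs.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical type used only for this illustration.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class ExpressionTreeLimitsDemo
{
    // Allowed: the body is a single expression. The = symbols inside the
    // object initializer are member initializers, not assignment operators.
    public static readonly Expression<Func<int, Point>> MakePoint =
        x => new Point { X = x, Y = x * 2 };

    // Not allowed (each line is a compile-time error if uncommented):
    // Expression<Func<int, int>> block = x => { return x + 1; };  // statement body
    // Expression<Func<Point, int>> assign = p => p.X = 10;        // assignment operator

    public static void Main()
    {
        Point point = MakePoint.Compile()(3);
        Console.WriteLine("{0}, {1}", point.X, point.Y);   // 3, 6
        Console.WriteLine(MakePoint.Body.NodeType);        // MemberInit
    }
}
```

The body of the tree is a single MemberInit node, which is how the whole initialization is captured as one expression.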

Compiling expression trees to delegates

The ability to execute queries against remote data sources, as I referred to earlier, isn’t the only use for expression trees. They can be a powerful way of constructing efficient delegates dynamically at execution time, although this is typically an area where at least part of the expression tree is built with handwritten code rather than converted from a lambda expression.

Expression<TDelegate> has a Compile() method that returns a delegate of type TDelegate. You can then handle this delegate as you do any other. As a trivial example, the following listing takes our earlier adder expression tree, compiles that to a delegate, and then invokes it, producing an output of 5.

Listing 3.14. Compiling an expression tree to a delegate and invoking the result
Expression<Func<int, int, int>> adder = (x, y) => x + y;
Func<int, int, int> executableAdder = adder.Compile();     // 1
Console.WriteLine(executableAdder(2, 3));                  // 2

  • 1 Compiles the expression tree to a delegate
  • 2 Invokes the delegate as normal

This approach can be used in conjunction with reflection for property access and method invocation to produce delegates and then cache them. The result is as efficient as if you’d written the equivalent code by hand. For a single method call or property access, there are already methods to create delegates directly, but sometimes you need additional conversion or manipulation steps, which are easily represented in expression trees.
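As a rough sketch of that idea, the following code builds a property getter by hand, including a conversion step that a directly created delegate couldn’t express. PropertyGetterFactory and its Create method are hypothetical names; the Expression factory methods are the real ones from System.Linq.Expressions.

```csharp
using System;
using System.Linq.Expressions;

public static class PropertyGetterFactory
{
    // Builds the equivalent of: (T target) => (object) target.<propertyName>
    public static Func<T, object> Create<T>(string propertyName)
    {
        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        // Expression.Property finds the property by name using reflection.
        MemberExpression property = Expression.Property(target, propertyName);
        // The extra conversion boxes value types so the delegate can return object.
        UnaryExpression converted = Expression.Convert(property, typeof(object));
        return Expression.Lambda<Func<T, object>>(converted, target).Compile();
    }

    public static void Main()
    {
        // Compile once and keep the delegate (for example, in a dictionary
        // keyed by property name) rather than rebuilding it for every call.
        Func<string, object> getLength = Create<string>("Length");
        Console.WriteLine(getLength("laptop"));   // 6
    }
}
```

Compilation is the expensive part, which is why caching the resulting delegate matters; invoking it afterward is an ordinary delegate call.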

We’ll come back to why expression trees are so important in LINQ when we tie everything together. You have only two more language features to look at. Extension methods come next.

3.6. Extension methods

Extension methods sound pointless when they’re first described. They’re static methods that can be called as if they’re instance methods, based on their first parameter. Suppose you have a static method call like this:

ExampleClass.Method(x, y);

If you turn ExampleClass.Method into an extension method, you can call it like this instead:

x.Method(y);

That’s all extension methods do. It’s one of the simplest transformations the C# compiler does. It makes all the difference in terms of code readability when it comes to chaining method calls together, however. You’ll look at that later, finally using real examples from LINQ, but first let’s look at the syntax.

3.6.1. Declaring an extension method

Extension methods are declared by adding the keyword this before the first parameter. The method must be declared in a non-nested, nongeneric static class, and before C# 7.2, the first parameter couldn’t be a ref parameter. (You’ll see more about that in section 13.5.) Although the class containing the method can’t be generic, the extension method itself can be.

The type of the first parameter is sometimes called the target of the extension method and sometimes called the extended type. (The specification doesn’t give this concept a name, unfortunately.)

As an example from Noda Time, we have an extension method to convert from DateTimeOffset to Instant. There’s already a static method within the Instant struct to do this, but it’s useful to have as an extension method, too. Listing 3.15 shows the code for the method. For once, I’ve included the namespace declaration, as that’s going to be important when you see how the C# compiler finds extension methods.

Listing 3.15. ToInstant extension method targeting DateTimeOffset from Noda Time
using System;

namespace NodaTime.Extensions
{
    public static class DateTimeOffsetExtensions
    {
        public static Instant ToInstant(this DateTimeOffset dateTimeOffset)
        {
            return Instant.FromDateTimeOffset(dateTimeOffset);
        }
    }
}

The compiler adds the [Extension] attribute to both the method and the class declaring it, and that’s all. This attribute is in the System.Runtime.CompilerServices namespace. It’s a marker indicating the intent that a developer should be able to call ToInstant() as if it were declared as an instance method in DateTimeOffset.

3.6.2. Invoking an extension method

You’ve already seen the syntax to invoke an extension method: you call it as if it were an instance method on the type of the first parameter. But you need to make sure that the compiler can find the method as well.

First, there’s a matter of priority: if there’s a regular instance method that’s valid for the method invocation, the compiler will always prefer that over an extension method. It doesn’t matter whether the extension method has “better” parameters; if the compiler can use an instance method, it won’t even look for extension methods.
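Here’s a small sketch of that priority rule. Greeter and GreeterExtensions are hypothetical types: the instance method needs an implicit int-to-long conversion, whereas the extension method matches the argument type exactly, yet the instance method still wins.

```csharp
using System;

public class Greeter
{
    // Applicable to a call with an int argument via the implicit
    // conversion from int to long.
    public string Greet(long value)
    {
        return "instance";
    }
}

public static class GreeterExtensions
{
    // An exact match for an int argument, but only an extension method.
    public static string Greet(this Greeter greeter, int value)
    {
        return "extension";
    }
}

public static class InstancePriorityDemo
{
    public static void Main()
    {
        Greeter greeter = new Greeter();
        // An applicable instance method exists, so extension methods aren't considered.
        Console.WriteLine(greeter.Greet(5));                     // instance
        // The extension method is still reachable as a regular static method.
        Console.WriteLine(GreeterExtensions.Greet(greeter, 5));  // extension
    }
}
```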

After it has exhausted its search for instance methods, the compiler will look for extension methods based on the namespace the calling code is in and any using directives present. Suppose you’re making a call from the ExtensionMethodInvocation class in the CSharpInDepth.Chapter03 namespace.[3] The following listing shows how to do that, giving the compiler all the information it needs to find the extension method.

[3] If you’re following along with the downloaded code, you may have noticed that the samples are in namespaces of Chapter01, Chapter02, and so on, for simplicity. I’ve made an exception here for the sake of showing the hierarchical nature of the namespace checks.

Listing 3.16. Invoking the ToInstant() extension method outside Noda Time
using NodaTime.Extensions;                          // 1
using System;

namespace CSharpInDepth.Chapter03
{
    class ExtensionMethodInvocation
    {
        static void Main()
        {
            var currentInstant =
                DateTimeOffset.UtcNow.ToInstant();  // 2
            Console.WriteLine(currentInstant);
        }
    }
}

  • 1 Imports the NodaTime.Extensions namespace
  • 2 Calls the extension method

The compiler will check for extension methods in the following:

  • Static classes in the CSharpInDepth.Chapter03 namespace.
  • Static classes in the CSharpInDepth namespace.
  • Static classes in the global namespace.
  • Static classes in namespaces specified with using namespace directives. (Those are the using directives that just specify a namespace, like using System.)
  • In C# 6 and later only, static classes specified with using static directives. We’ll come back to that in section 10.1.

The compiler effectively works its way outward from the deepest namespace toward the global namespace, and at each step it looks for static classes either in that namespace or made available by using directives in that namespace declaration. The details of the ordering are almost never important. If you find yourself in a situation where moving a using directive changes which extension method is used, it’s probably best to rename one of them. But it’s important to understand that within each step, multiple extension methods can be found that would be valid for the call. In that situation, the compiler performs normal overload resolution between all the extension methods it found in that step. After the compiler has located the right method to invoke, the IL it generates for the call is exactly the same as if you’d written a regular static method call instead of using its capabilities as an extension method.

Extension methods can be called on null values

Extension methods differ from instance methods in terms of their null handling. Let’s look back at our initial example:

x.Method(y);

If Method were an instance method and x were a null reference, that would throw a NullReferenceException. Instead, if Method is an extension method, it’ll be called with x as the first argument even if x is null. Sometimes the method will specify that the first argument must not be null, in which case it should validate it and throw an ArgumentNullException. In other cases, the extension method may have been explicitly designed to handle a null first argument gracefully.
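Both approaches to null handling can be sketched as follows. The extension methods IsNullOrEmpty and Shout are hypothetical examples written for this illustration, not part of any library.

```csharp
using System;

public static class NullHandlingExtensions
{
    // Explicitly designed to accept a null first argument.
    public static bool IsNullOrEmpty(this string text)
    {
        return string.IsNullOrEmpty(text);
    }

    // Requires a non-null first argument, so it validates it up front.
    public static string Shout(this string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }
        return text.ToUpper() + "!";
    }
}

public static class NullHandlingDemo
{
    public static void Main()
    {
        string missing = null;
        // No NullReferenceException: this is really a static method call.
        Console.WriteLine(missing.IsNullOrEmpty());   // True
        try
        {
            missing.Shout();
        }
        catch (ArgumentNullException e)
        {
            Console.WriteLine(e.ParamName);           // text
        }
    }
}
```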

Let’s get back to why extension methods are important to LINQ. It’s time for our first query.

3.6.3. Chaining method calls

Listing 3.17 shows a simple query. It takes a sequence of words, filters them by length, orders them in the natural way, and then converts them to uppercase. It uses lambda expressions and extension methods but no other C# 3 features. We’ll put everything else together at the end of the chapter. For the moment, I want to focus on the readability of this simple code.

Listing 3.17. A simple query on strings
string[] words = { "keys", "coat", "laptop", "bottle" };  // 1
IEnumerable<string> query = words
    .Where(word => word.Length > 4)                       // 2
    .OrderBy(word => word)                                // 2
    .Select(word => word.ToUpper());                      // 2

foreach (string word in query)                            // 3
{                                                         // 3
    Console.WriteLine(word);                              // 3
}                                                         // 3

  • 1 A simple data source
  • 2 Filters, orders, transforms
  • 3 Displays the results

Notice the ordering of the Where, OrderBy, and Select calls in our code. That’s the order in which the operations happen. The lazy and streaming-where-possible nature of LINQ makes it complicated to talk about exactly what happens when, but the query reads in the same order as it executes. The following listing is the same query but without taking advantage of the fact that these methods are extension methods.

Listing 3.18. A simple query without using extension methods
string[] words = { "keys", "coat", "laptop", "bottle" };
IEnumerable<string> query =
    Enumerable.Select(
        Enumerable.OrderBy(
            Enumerable.Where(words, word => word.Length > 4),
            word => word),
        word => word.ToUpper());

I’ve formatted listing 3.18 as readably as I can, but it’s still awful. The calls are laid out in the opposite order in the source code to how they’ll execute: Where is the first thing to execute but the last method call in the listing. Next, it’s not obvious which lambda expression goes with which call: word => word.ToUpper() is part of the Select call, but a huge amount of code is between those two pieces of text.

You can tackle this in another way by assigning the result of each method call to a local variable and then making the method call via that. Listing 3.19 shows one option for doing this. (In this case, you could’ve just declared the query to start with and reassigned it on each line, but that wouldn’t always be the case.) This time, I’ve also used var, just for brevity.

Listing 3.19. A simple query in multiple statements
string[] words = { "keys", "coat", "laptop", "bottle" };
var tmp1 = Enumerable.Where(words, word => word.Length > 4);
var tmp2 = Enumerable.OrderBy(tmp1, word => word);
var query = Enumerable.Select(tmp2, word => word.ToUpper());

This is better than listing 3.18; the operations are back in the right order, and it’s obvious which lambda expression is used for which operation. But the extra local variable declarations are a distraction, and it’s easy to end up using the wrong one.

The benefits of method chaining aren’t limited to LINQ, of course. Using the result of one call as the starting point of another call is common. But extension methods allow you to do this in a readable way for any type, rather than the type itself declaring the methods that support chaining. IEnumerable<T> doesn’t know anything about LINQ; its sole responsibility is to represent a general sequence. It’s the System.Linq.Enumerable class that adds all the operations for filtering, grouping, joining, and so on.
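To illustrate adding a chainable operation to a type you don’t own, here’s a minimal sketch of a custom sequence operator that slots into a chain next to the standard LINQ ones. EveryOther is a hypothetical method, not part of System.Linq; note that the method is generic even though its containing static class isn’t.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Hypothetical operator: yields the items at indexes 0, 2, 4, and so on.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        return EveryOtherImpl(source);
    }

    // Separate iterator method so the argument check above runs eagerly.
    private static IEnumerable<T> EveryOtherImpl<T>(IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take)
            {
                yield return item;
            }
            take = !take;
        }
    }
}

public static class ChainingDemo
{
    public static void Main()
    {
        string[] words = { "keys", "coat", "laptop", "bottle" };
        IEnumerable<string> query = words
            .EveryOther()                      // keys, laptop
            .Select(word => word.ToUpper());
        Console.WriteLine(string.Join(", ", query));   // KEYS, LAPTOP
    }
}
```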

C# 3 could’ve stopped here. The features described so far would already have added a lot of power to the language and enabled many LINQ queries to be written in a perfectly readable form. But when queries get more complex, particularly when they include joins and groupings, using the extension methods directly can get complicated. Enter query expressions.

3.7. Query expressions

Although almost all features in C# 3 contribute to LINQ, only query expressions are specific to LINQ. Query expressions allow you to write concise code by using query-specific clauses (select, where, let, group by, and so on). The query is then translated into a nonquery form by the compiler and compiled as normal.[4] Let’s start with a brief example to make this clearer. As a reminder, in listing 3.17 you had this query:

IEnumerable<string> query = words
    .Where(word => word.Length > 4)
    .OrderBy(word => word)
    .Select(word => word.ToUpper());

[4] This sounds like macros in C, but it’s a little more involved than that. C# still doesn’t have macros.

The following listing shows the same query written as a query expression.

Listing 3.20. Introductory query expression with filtering, ordering, and projection
IEnumerable<string> query = from word in words
                            where word.Length > 4
                            orderby word
                            select word.ToUpper();

Everything to the right of the = sign in listing 3.20 (apart from the closing semicolon) is the query expression, and it’s very concise indeed. The repetitive use of word as a parameter to lambda expressions has been replaced by specifying the name of a range variable once in the from clause, and then using it in each of the other clauses. What happens to the query expression in listing 3.20?

3.7.1. Query expressions translate from C# to C#

In this book, I’ve expressed many language features in terms of more C# source code. For example, when looking at captured variables in section 3.5.2, I showed C# code that you could’ve written to achieve the same result as using a lambda expression. That’s just for the purpose of explaining the code generated by the compiler. I wouldn’t expect the compiler to generate any C#. The specification describes the effects of capturing variables rather than a source code translation.

Query expressions work differently. The specification describes them as a syntactic translation that occurs before any overload resolution or binding. The code in listing 3.20 doesn’t just have the same eventual effect as listing 3.17; it’s really translated into the code in listing 3.17 before further processing. The language has no specific expectation about what the result of that further processing will be. In many cases, the result of the translation will be calls to extension methods, but that’s not required by the language specification. They could be instance method calls or invocations of delegates returned by properties named Select, Where, and so on.

The specification of query expressions puts in place an expectation of certain methods being available, but there’s no specific requirement for them all to be present. For example, if you write an API with suitable Select, OrderBy, and Where methods, you could use the kind of query shown in listing 3.20 even though you couldn’t use a query expression that includes a join clause.
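As a sketch of how little the language demands, the following hypothetical Box<T> type has nothing to do with IEnumerable<T> or System.Linq. It declares only instance methods named Where and Select, which is enough for query expressions that use just where and select clauses.

```csharp
using System;

// Hypothetical container holding zero or one value.
public sealed class Box<T>
{
    public bool HasValue { get; }
    public T Value { get; }

    public Box(T value)
    {
        HasValue = true;
        Value = value;
    }

    // Creates an empty box.
    private Box()
    {
    }

    public Box<T> Where(Func<T, bool> predicate)
    {
        return HasValue && predicate(Value) ? this : new Box<T>();
    }

    public Box<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return HasValue ? new Box<TResult>(selector(Value)) : new Box<TResult>();
    }
}

public static class QueryPatternDemo
{
    public static void Main()
    {
        Box<int> box = new Box<int>(10);
        // Translated to box.Where(n => n > 4).Select(n => "value " + n),
        // which binds to the instance methods above.
        Box<string> result = from n in box
                             where n > 4
                             select "value " + n;
        Console.WriteLine(result.Value);   // value 10
    }
}
```

Adding a join or orderby clause to this query would fail to compile, because Box<T> has no Join or OrderBy method for the translated code to call.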

Although we’re not going to look at every clause available in query expressions in detail, I need to draw your attention to two related concepts. In part, these provide greater justification for the language designers introducing query expressions into the language.

3.7.2. Range variables and transparent identifiers

Query expressions introduce range variables, which aren’t like any other regular variables. They act as the per-item input within each clause of the query. You’ve already seen how the from clause at the start of a query expression introduces a range variable. Here’s the query expression from listing 3.20 again with the uses of the range variable annotated:

from word in words       // 1
where word.Length > 4    // 2
orderby word             // 2
select word.ToUpper()    // 2

  • 1 Introduces range variable in a from clause
  • 2 Uses the range variable in the following clauses

That’s simple to understand when there’s only one range variable, but that initial from clause isn’t the only way a range variable can be introduced. The simplest example of a clause that introduces a new range variable is probably let. Suppose you want to refer to the length of the word multiple times in your query without having to call the Length property every time. For example, you could orderby it and include it in the output. The let clause allows you to write the query as shown in the following listing.

Listing 3.21. A let clause introducing a new range variable
from word in words
let length = word.Length
where length > 4
orderby length
select string.Format("{0}: {1}", length, word.ToUpper());

You now have two range variables in scope at the same time, as you can see from the use of both length and word in the select clause. That raises the question of how this can be represented in the query translation. You need a way of taking our original sequence of words and creating a sequence of word/length pairs, effectively. Then within the clauses that can use those range variables, you need to access the relevant item within the pair. The following listing shows how listing 3.21 is translated by the compiler using an anonymous type to represent the pair of values.

Listing 3.22. Query translation using a transparent identifier
words.Select(word => new { word, length = word.Length })
     .Where(tmp => tmp.length > 4)
     .OrderBy(tmp => tmp.length)
     .Select(tmp =>
         string.Format("{0}: {1}", tmp.length, tmp.word.ToUpper()));

The name tmp here isn’t part of the query translation. The specification uses * instead, and there’s no indication of what name should be given to the parameter when building an expression tree representation of the query. The name doesn’t matter because you don’t see it when you write the query. This is called a transparent identifier.
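If you want to convince yourself that the two forms behave the same way, the following sketch runs the let-based query expression and a handwritten equivalent side by side. The class and method names are hypothetical, and pair stands in for the transparent identifier, just as tmp did earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetTranslationDemo
{
    public static IEnumerable<string> QueryExpressionForm(IEnumerable<string> words)
    {
        return from word in words
               let length = word.Length
               where length > 4
               orderby length
               select string.Format("{0}: {1}", length, word.ToUpper());
    }

    public static IEnumerable<string> MethodSyntaxForm(IEnumerable<string> words)
    {
        // The anonymous type carries both range variables through the query.
        return words.Select(word => new { word, length = word.Length })
                    .Where(pair => pair.length > 4)
                    .OrderBy(pair => pair.length)
                    .Select(pair =>
                        string.Format("{0}: {1}", pair.length, pair.word.ToUpper()));
    }

    public static void Main()
    {
        string[] words = { "keys", "coat", "laptop", "bottle" };
        Console.WriteLine(string.Join(", ", QueryExpressionForm(words)));   // 6: LAPTOP, 6: BOTTLE
        Console.WriteLine(string.Join(", ", MethodSyntaxForm(words)));      // 6: LAPTOP, 6: BOTTLE
    }
}
```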

I’m not going into all the details of query translation. That could be a whole chapter on its own. But I wanted to bring up transparent identifiers for two reasons. First, if you’re aware of how extra range variables are introduced, you won’t be surprised when you see them if you ever decompile a query expression. Second, they provide the biggest motivation for using query expressions, in my experience.

3.7.3. Deciding when to use which syntax for LINQ

Query expressions can be appealing, but they’re not always the simplest way of representing a query. They always require a from clause to start with and either a select or group by clause to end with. That sounds reasonable, but it means that if you want a query that performs a single filtering operation, for example, you end up with quite a lot of baggage. For example, if you take just the filtering part of our word-based query, you’d have the following query expression:

from word in words
where word.Length > 4
select word

Compare that with the method syntax version of the query:

words.Where(word => word.Length > 4)

They both compile to the same code,[5] but I’d use the second syntax for such a simple query.

[5] The compiler has special handling for select clauses that select just the current query item.

Note

There’s no single ubiquitous term for not using query expression syntax. I’ve seen it called method syntax, dot syntax, fluent syntax, and lambda syntax, to name just four. I’ll call it method syntax consistently, but if you hear other terms for it, don’t try to look for a subtle difference in meaning.

Even when the query gets a little more complicated, method syntax can be more flexible. Many methods are available within LINQ that have no corresponding query expression syntax, including overloads of Select and Where that present the index of the item within the sequence as well as the item itself. Additionally, if you want a method call at the end of the query (for example, ToList() to materialize the result as a List<T>), you have to put the whole query expression in parentheses, whereas with method syntax you add the call on the end.
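Both points can be sketched with the word list from earlier. The method names here are hypothetical; the indexed Select overload and ToList() are the standard ones from System.Linq.Enumerable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MethodSyntaxDemo
{
    // The indexed overload of Select has no query expression equivalent.
    public static List<string> NumberedLongWords(IEnumerable<string> words)
    {
        return words
            .Where(word => word.Length > 4)
            .Select((word, index) => index + ": " + word)
            .ToList();
    }

    // With a query expression, the trailing ToList() call needs parentheses
    // around the whole query.
    public static List<string> LongWords(IEnumerable<string> words)
    {
        return (from word in words
                where word.Length > 4
                select word).ToList();
    }

    public static void Main()
    {
        string[] words = { "keys", "coat", "laptop", "bottle" };
        Console.WriteLine(string.Join(", ", NumberedLongWords(words)));   // 0: laptop, 1: bottle
        Console.WriteLine(string.Join(", ", LongWords(words)));           // laptop, bottle
    }
}
```

The index passed to the selector is the position within the filtered sequence, not within the original array, because Select sees only what Where lets through.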

I’m not as down on query expressions as that may sound. In many cases, there’s no clear winner between the two syntax options, and I’d probably include our earlier filter, order, project example in that set. Query expressions really shine when the compiler is doing more work for you by handling all those transparent identifiers. You can do it all by hand, of course, but I’ve found that building up anonymous types as results and deconstructing them in each subsequent step gets annoying quickly. Query expressions make all of that much easier.

The upshot of all of this is that I strongly recommend that you become comfortable in both styles of query. If you tie yourself to always using query expressions or never using query expressions, you’ll be missing out on opportunities to make your code more readable. We’ve covered all the features in C# 3, but I’m going to take a moment to step back and show how they fit together to form LINQ.

3.8. The end result: LINQ

I’m not going to attempt to cover the various LINQ providers available these days. The LINQ technology I use most (by far) is LINQ to Objects, using the Enumerable static class and delegates. But in order to show how all the pieces come into play, let’s imagine that you have a query from something like Entity Framework. This isn’t real code that you can test, but it would be fine if you had a suitable database structure:

var products = from product in dbContext.Products
               where product.StockCount > 0
               orderby product.Price descending
               select new { product.Name, product.Price };

In this single example of a mere four lines, all of these features are used:

  • Anonymous types, including projection initializers (to select just the name and price of the product)
  • Implicit typing using var, because otherwise you couldn’t declare the type of the products variable in a useful way
  • Query expressions, which you could do without in this case, but which make life a lot simpler for more-complicated queries
  • Lambda expressions, which are the result of the query expression translation
  • Extension methods, which allow the translated query to be expressed via the Queryable class because of dbContext.Products implementing IQueryable<Product>
  • Expression trees, which allow the logic in the query to be passed to the LINQ provider as data, so it can be converted into SQL and executed efficiently at the database

Take away any one of these features, and LINQ would be significantly less useful. Sure, you could have in-memory collection processing without expression trees. You could write readable simple queries without query expressions. You could have dedicated classes with all the relevant methods without using extension methods. But it all fits together beautifully.
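You can watch several of those pieces cooperate without a database. In the following sketch, AsQueryable() over an in-memory array stands in for a real provider such as Entity Framework (that substitution is an assumption made purely so the code is runnable). The query expression binds to the Queryable extension methods, and the resulting query exposes its logic as an expression tree.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDemo
{
    public static IQueryable<string> BuildQuery(IQueryable<string> source)
    {
        // Because source is an IQueryable<string>, the translated calls bind to
        // Queryable.Where and Queryable.Select, and the lambda expressions are
        // converted to expression trees rather than delegates.
        return from word in source
               where word.Length > 4
               select word.ToUpper();
    }

    public static void Main()
    {
        IQueryable<string> source =
            new[] { "keys", "coat", "laptop", "bottle" }.AsQueryable();
        IQueryable<string> query = BuildQuery(source);

        // The outermost node of the tree is the call to Queryable.Select.
        MethodCallExpression call = (MethodCallExpression) query.Expression;
        Console.WriteLine(call.Method.DeclaringType.Name + "." + call.Method.Name);   // Queryable.Select

        // This in-memory provider executes the query locally; a database
        // provider would translate the tree to SQL instead.
        Console.WriteLine(string.Join(", ", query));   // LAPTOP, BOTTLE
    }
}
```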

Summary

  • All the features in C# 3 are related to working with data in some form or other, and most are critical parts of LINQ.
  • Automatically implemented properties provide a concise way of exposing state that doesn’t need any extra behavior.
  • Implicit typing with the var keyword (and for arrays) is necessary for working with anonymous types but also convenient to avoid long-winded repetition.
  • Object and collection initializers make initialization simpler and more readable. They also allow initialization to occur as a single expression, which is crucial for working with other aspects of LINQ.
  • Anonymous types allow you to effectively create a type just for a single local purpose in a lightweight way.
  • Lambda expressions provide an even simpler way of constructing delegates than anonymous methods. They also allow code to be expressed as data via expression trees, which can be used by LINQ providers to convert C# queries into other forms such as SQL.
  • Extension methods are static methods that can be called as if they were instance methods elsewhere. This allows for fluent interfaces to be written even for types that weren’t originally designed that way.
  • Query expressions are translated into more C# that uses lambda expressions to express the query. Although these are great for complex queries, simpler ones are often easier to write using method syntax.