The new features of C# 2 were mostly independent of each other. Nullable value types depended on generics, but they were still separate features that didn’t build toward a common goal.
C# 3 was different. It consisted of many new features, each of which was useful in its own right, but almost all of which built toward the larger goal of LINQ. This chapter shows each feature individually and then demonstrates how they fit together. The first feature we’ll look at is the only one that has no direct relationship with LINQ.
Prior to C# 3, every property had to be implemented manually with bodies for the get and/or set accessors. The compiler was happy to provide an implementation for field-like events but not properties. That meant there were a lot of properties like this:
private string name;
public string Name
{
    get { return name; }
    set { name = value; }
}
Formatting would vary by code style, but whether the property was one long line, 11 short ones, or five lines in between (as in the preceding example), it was always just noise. It was a very long-winded way of expressing the intention to have a field and expose its value to callers via a property.
C# 3 made this much simpler by using automatically implemented properties (often referred to as automatic properties or even autoprops). These are properties with no accessor bodies; the compiler provides the implementation. The whole of the preceding code can be replaced with a single line:
public string Name { get; set; }
Note that there’s no field declaration in the source code now. There’s still a field, but it’s created for you automatically by the compiler and given a name that can’t be referred to anywhere in the C# code.
In C# 3, you can’t declare read-only automatically implemented properties, and you can’t provide an initial value at the point of declaration. Both of those features were introduced (finally!) in C# 6 and are described in section 8.2. Before C# 6, it was a reasonably common practice to fake read-only properties by giving them a private set accessor like this:
public string Name { get; private set; }
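As a sketch of that practice in use (the Person class and its member names are my own invention, not from the chapter), the private set accessor is usable inside the declaring class but blocks assignment from outside:

```csharp
using System;

public class Person
{
    // The compiler still generates a hidden backing field for this property.
    public string Name { get; private set; }

    public Person(string name)
    {
        Name = name;   // The private set accessor is accessible inside the class
    }
}

class Program
{
    static void Main()
    {
        var person = new Person("Jon");
        Console.WriteLine(person.Name);   // Prints Jon
        // person.Name = "Other";         // Error: the set accessor is inaccessible here
    }
}
```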
The introduction of automatically implemented properties in C# 3 had a huge effect in reducing boilerplate code. They’re useful only when the property simply fetches and sets the field value, but that accounts for a large proportion of properties in my experience.
As I mentioned, automatically implemented properties don’t directly contribute to LINQ. Let’s move on to the first feature that does: implicit typing for arrays and local variables.
In order to be as clear as possible about the features introduced in C# 3, I need to define a few terms first.
Many terms are used to describe the way programming languages interact with their type system. Some people use the terms weakly typed and strongly typed, but I try to avoid those because they’re not clearly defined and mean different things to different developers. Two other aspects have more consensus: static/dynamic typing and explicit/implicit typing. Let’s look at each of those in turn.
Languages that are statically typed are typically compiled languages; the compiler is able to determine the type of each expression and check that it’s used correctly. For example, if you make a method call on an object, the compiler can use the type information to check that there’s a suitable method to call based on the type of the expression the method is called on, the name of the method, and the number and types of the arguments. Determining the meaning of something like a method call or field access is called binding. Languages that are dynamically typed leave all or most of the binding to execution time.
As you’ll see in various places, some expressions in C# don’t have a type when considered in source code, such as the null literal. But the compiler always works out a type based on the context in which the expression is used, at which point that type can be used for checking how the expression is used.
Aside from the dynamic binding introduced in C# 4 (and described in chapter 4), C# is a statically typed language. Even though the choice of which implementation of a virtual method should be executed depends on the execution-time type of the object it’s called on, the binding process of determining the method signature all happens at compile time.
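A minimal sketch of that distinction: the compiler binds members using the compile-time type of the expression, while the runtime merely chooses which override executes:

```csharp
using System;

class Program
{
    static void Main()
    {
        object o = "some text";

        // Binding is compile time, against the type of the expression:
        // o.Length;   // Error CS1061: 'object' has no 'Length', even though the object is a string

        // Which override of ToString runs is chosen at execution time,
        // but the existence and signature of ToString were checked at compile time.
        Console.WriteLine(o.ToString());   // Prints some text
    }
}
```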
In a language that’s explicitly typed, the source code specifies all the types involved. This could be for local variables, fields, method parameters, or method return types, for example. A language that’s implicitly typed allows the developer to omit the types from the source code so some other mechanism (whether it’s a compiler or something at execution time) can infer which type is meant based on other context.
C# is mostly explicitly typed. Even before C# 3, there was some implicit typing, such as type inference for generic type arguments as you saw in section 2.1.4. Arguably, the presence of implicit conversions (such as int to long) makes the language less explicitly typed, too.
With those different aspects of typing separated, you can look at the C# 3 features around implicit typing. We’ll start with implicitly typed local variables.
Implicitly typed local variables are variables declared with the contextual keyword var instead of the name of a type, such as the following:
var language = "C#";
The result of declaring a local variable with var instead of with the name of a type is still a local variable with a known type; the only difference is that the type is inferred by the compiler from the compile-time type of the value assigned to it. The preceding code will generate the exact same result as this:
string language = "C#";
When C# 3 first came out, a lot of developers avoided var because they thought it would remove a lot of compile-time checks or lead to execution-time performance problems. It doesn’t do that at all; it only infers the type of the local variable. After the declaration, the variable acts exactly as if it had been declared with an explicit type name.
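A quick sketch of that point: the compile-time checks are identical whether or not var is used:

```csharp
using System;

class Program
{
    static void Main()
    {
        var language = "C#";            // Inferred as string, exactly as if declared 'string language'
        int length = language.Length;   // string members are available and checked at compile time
        // language = 10;               // Error CS0029: cannot implicitly convert 'int' to 'string'
        Console.WriteLine(length);      // Prints 2
    }
}
```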
The way the type is inferred leads to two important rules for implicitly typed local variables:

- The variable must be initialized as part of its declaration.
- The expression used to initialize the variable must have a type.
Here’s some invalid code to demonstrate these rules:
var x;          // Error: the variable must be initialized in the declaration
x = 10;
var y = null;   // Error: null has no type to infer from
It would’ve been possible to avoid these rules in some cases by analyzing all the assignments performed to the variable and inferring the type from those. Some languages do that, but the C# language designers preferred to keep the rules as simple as possible.
Another restriction is that var can be used only for local variables. Many times I’ve longed for implicitly typed fields, but they’re still not available (as of C# 7.3, anyway).
In the preceding example, there was little benefit, if any, in using var. The explicit declaration is feasible and just as readable. There are generally three reasons for using var:

- The variable is initialized with an expression of an anonymous type, so there’s no way of writing the type explicitly.
- The type name is long, and writing it out would duplicate information that’s already present in the initialization expression.
- The precise type of the variable is unimportant to the reader, and omitting it leaves less to distract from the rest of the code.
I’ll save examples of the first bullet point for section 3.4, but it’s easy to show the second. Suppose you want to create a dictionary that maps a name to a list of decimal values. You can do that with an explicitly typed variable:
Dictionary<string, List<decimal>> mapping =
    new Dictionary<string, List<decimal>>();
That’s really ugly. I had to wrap it on two lines just to make it fit on the page, and there’s a lot of duplication. That duplication can be entirely avoided by using var:
var mapping = new Dictionary<string, List<decimal>>();
This expresses the same amount of information in less text, so there’s less to distract you from other code. Of course, this works only when you want the type of the variable to be exactly the type of the initialization expression. If you wanted the type of the mapping variable to be IDictionary<string, List<decimal>>—the interface instead of the class—then var wouldn’t help. But for local variables, that sort of separation between interface and implementation is usually less important.
When I wrote the first edition of C# in Depth, I was wary of implicitly typed local variables. I rarely used them outside LINQ, apart from when I was calling a constructor directly, as in the preceding example. I was worried that I wouldn’t be able to easily work out the type of the variable when just reading the code.
Ten years later, that caution has mostly gone. I use var for almost all my local variables in test code and extensively in production code, too. My fears weren’t realized; in almost every case, I’m easily able to infer what the type should be just by inspection. Where that isn’t the case, I’ll happily use an explicit declaration instead.
I don’t claim to be entirely consistent about this, and I’m certainly not dogmatic. Because explicitly typed variables generate the exact same code as implicitly typed variables, it’s fine to change your mind later in either direction. I suggest you discuss this with the other people who’ll work with your code the most (whether those are colleagues or open source collaborators), get a sense of everyone’s comfort level, and try to abide by that. The other aspect of implicit typing in C# 3 is somewhat different. It’s not directly related to var, but it has the same aspect of removing a type name to let the compiler infer it.
Sometimes you need to create an array without populating it and keep all the elements with their default values. The syntax for that hasn’t changed since C# 1; it’s always something like this:
int[] array = new int[10];
But you often want to create an array with specific initial content. Before C# 3, there were two ways of doing this:
int[] array1 = { 1, 2, 3, 4, 5 };
int[] array2 = new int[] { 1, 2, 3, 4, 5 };
The first form of this is valid only when it’s part of a variable declaration that specifies the array type. This is invalid, for example:
int[] array;
array = { 1, 2, 3, 4, 5 };   // Error: this form is valid only as part of a variable declaration
The second form is always valid, so the second line in the preceding example could’ve been as follows:
array = new int[] { 1, 2, 3, 4, 5 };
C# 3 introduced a third form in which the type of the array is implicit based on the content:
array = new[] { 1, 2, 3, 4, 5 };
This can be used anywhere, so long as the compiler is able to infer the array element type from the array elements specified. It also works with multidimensional arrays, as in the following example:
var array = new[,] { { 1, 2, 3 }, { 4, 5, 6 } };
The next obvious question is how the compiler infers that type. As is so often the case, the precise details are complex in order to handle all kinds of corner cases, but the simplified sequence of steps is as follows:

- The compiler forms a set of candidate types from the compile-time types of the expressions in the array initializer.
- Any candidate type that all the expressions can’t be implicitly converted to is removed from the set.
- If exactly one candidate type remains, that’s the array element type; if none (or more than one) remain, the code doesn’t compile.
The array element type must be the type of one of the expressions in the array initializer. There’s no attempt to find a common base class or a commonly implemented interface. Table 3.1 gives some examples that illustrate the rules.
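As a sketch of these rules in action (the inferred types are noted in comments):

```csharp
using System;

class Program
{
    static void Main()
    {
        var doubles = new[] { 1, 2, 3.5 };              // double[]: each int converts implicitly to double
        var objects = new[] { "text", new object() };   // object[]: string converts to object, not vice versa
        // var invalid = new[] { 1, "text" };           // Error: neither int nor string converts to the other

        Console.WriteLine(doubles.GetType());           // Prints System.Double[]
        Console.WriteLine(objects.GetType());           // Prints System.Object[]
    }
}
```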
Implicitly typed arrays are mostly a convenience that reduces the source code required; the exception is arrays of anonymous types, where the element type can’t be stated explicitly even if you want to. Even so, they’re a convenience I’d definitely miss now if I had to work without them.
The next feature continues the theme of making it simpler to create and initialize objects, but in a different way.
Object initializers and collection initializers make it easy to create new objects or collections with initial values, just as you can create and populate an array in a single expression. This functionality is important for LINQ because of the way queries are translated, but it turns out to be extremely useful elsewhere, too. It does require types to be mutable, which can be annoying if you’re trying to write code in a functional style, but where you can apply it, it’s great. Let’s look at a simple example before diving into the details.
As a massively oversimplified example, let’s consider what an order in an e-commerce system might look like. The following listing shows three classes to model an order, a customer, and a single item within an order.
public class Order
{
    private readonly List<OrderItem> items = new List<OrderItem>();

    public string OrderId { get; set; }
    public Customer Customer { get; set; }
    public List<OrderItem> Items { get { return items; } }
}

public class Customer
{
    public string Name { get; set; }
    public string Address { get; set; }
}

public class OrderItem
{
    public string ItemId { get; set; }
    public int Quantity { get; set; }
}
How do you create an order? Well, you need to create an instance of Order and assign to its OrderId and Customer properties. You can’t assign to the Items property, because it’s read-only. Instead, you can add items to the list it returns. The following listing shows how you might do this if you didn’t have object and collection initializers and couldn’t change the classes to make things simpler.
var customer = new Customer();   // Creates and populates the customer
customer.Name = "Jon";
customer.Address = "UK";
var item1 = new OrderItem();     // Creates and populates the first item
item1.ItemId = "abcd123";
item1.Quantity = 1;
var item2 = new OrderItem();     // Creates and populates the second item
item2.ItemId = "fghi456";
item2.Quantity = 2;
var order = new Order();         // Creates and populates the order itself
order.OrderId = "xyz";
order.Customer = customer;
order.Items.Add(item1);
order.Items.Add(item2);
This code could be simplified by adding constructors to the various classes to initialize properties based on the parameters. Even with object and collection initializers available, that’s what I’d do. But for the sake of brevity, I’m going to ask you to trust me that it’s not always feasible, for all kinds of reasons. Aside from anything else, you don’t always control the code for the classes you’re using. Object and collection initializers make it much simpler to create and populate our order, as shown in the following listing.
var order = new Order
{
    OrderId = "xyz",
    Customer = new Customer { Name = "Jon", Address = "UK" },
    Items =
    {
        new OrderItem { ItemId = "abcd123", Quantity = 1 },
        new OrderItem { ItemId = "fghi456", Quantity = 2 }
    }
};
I can’t speak for everyone, but I find listing 3.3 much more readable than listing 3.2. The structure of the object becomes apparent in the indentation, and less repetition occurs. Let’s look more closely at each part of the code.
Syntactically, an object initializer is a sequence of member initializers within braces. Each member initializer is of the form property = initializer-value, where property is the name of the field or property being initialized and initializer-value is an expression, a collection initializer, or another object initializer.
Object initializers are most commonly used with properties, and that’s how I’ve described them in this chapter. Fields don’t have accessors, but the obvious equivalents apply: reading the field instead of calling a get accessor and writing the field instead of calling a set accessor.
Object initializers can be used only as part of a constructor call or another object initializer. The constructor call can specify arguments as usual, but if you don’t want to specify any arguments, you don’t need an argument list at all, so you can omit the (). A constructor call without an argument list is equivalent to supplying an empty argument list. For example, these two lines are equivalent:
Order order = new Order() { OrderId = "xyz" };
Order order = new Order { OrderId = "xyz" };
You can omit the constructor argument list only if you provide an object or collection initializer. This is invalid:
Order order = new Order;   // Error: an argument list or an initializer is required
An object initializer simply says how to initialize each of the properties it mentions in its member initializers. If the initializer-value part (the part to the right of the = sign) is a normal expression, that expression is evaluated, and the value is passed to the property set accessor. That’s how most of the object initializers in listing 3.3 work. The Items property uses a collection initializer, which you’ll see shortly.
If initializer-value is another object initializer, the set accessor is never called. Instead, the get accessor is called, and then the nested object initializer is applied to the value returned by the property. As an example, listing 3.4 creates an HttpClient and modifies the set of default headers that are sent with each request. The code sets the From and Date headers, which I chose only because they’re the simplest ones to set.
HttpClient client = new HttpClient
{
    DefaultRequestHeaders =               // Nested object initializer: no new, so the get accessor is used
    {
        From = "[email protected]",
        Date = DateTimeOffset.UtcNow
    }
};
The code in listing 3.4 is equivalent to the following code:
HttpClient client = new HttpClient();
var headers = client.DefaultRequestHeaders;
headers.From = "[email protected]";
headers.Date = DateTimeOffset.UtcNow;
A single object initializer can include a mixture of nested object initializers, collection initializers, and normal expressions in the sequence of member initializers. Speaking of collection initializers, let’s look at those now.
Syntactically, a collection initializer is a comma-separated list of element initializers in curly braces. Each element initializer is either a single expression or a comma-separated list of expressions also in curly braces. Collection initializers can be used only as part of a constructor call or part of an object initializer. Further restrictions exist on the types they can be used with, which we’ll come to shortly. In listing 3.3, you saw a collection initializer being used as part of an object initializer. Here’s the listing again; the collection initializer is the part assigned to Items:
var order = new Order
{
    OrderId = "xyz",
    Customer = new Customer { Name = "Jon", Address = "UK" },
    Items =
    {
        new OrderItem { ItemId = "abcd123", Quantity = 1 },
        new OrderItem { ItemId = "fghi456", Quantity = 2 }
    }
};
Collection initializers might be more commonly used when creating new collections, though. For example, this line declares a new variable for a list of strings and populates the list:
var beatles = new List<string> { "John", "Paul", "Ringo", "George" };
The compiler compiles that into a constructor call followed by a sequence of calls to an Add method:
var beatles = new List<string>();
beatles.Add("John");
beatles.Add("Paul");
beatles.Add("Ringo");
beatles.Add("George");
But what if the collection type you’re using doesn’t have an Add method with a single parameter? That’s where element initializers with braces come in. After List<T>, the second most common generic collection is probably Dictionary<TKey, TValue> with an Add(key, value) method. A dictionary can be populated with a collection initializer like this:
var releaseYears = new Dictionary<string, int>
{
    { "Please Please Me", 1963 },
    { "Revolver", 1966 },
    { "Sgt. Pepper's Lonely Hearts Club Band", 1967 },
    { "Abbey Road", 1969 }
};
The compiler treats each element initializer as a separate Add call. If the element initializer is a simple one without braces, the value is passed as a single argument to Add. That’s what happened for the elements in our List<string> collection initializer.
If the element initializer uses braces, it’s still treated as a single call to Add, but with one argument for each expression within the braces. The preceding dictionary example is effectively equivalent to this:
var releaseYears = new Dictionary<string, int>();
releaseYears.Add("Please Please Me", 1963);
releaseYears.Add("Revolver", 1966);
releaseYears.Add("Sgt. Pepper's Lonely Hearts Club Band", 1967);
releaseYears.Add("Abbey Road", 1969);
Overload resolution then proceeds as normal to find the most appropriate Add method, including performing type inference if there are any generic Add methods.
Collection initializers are valid only for types that implement IEnumerable, although they don’t have to implement IEnumerable<T>. The language designers looked at the types in the framework that had Add methods and determined that the best way of separating them into collections and noncollections was to look at whether they implemented IEnumerable. As an example of why that’s important, consider the DateTime.Add(TimeSpan) method. The DateTime type clearly isn’t a collection, so it’d be odd to be able to write this:
DateTime invalid = new DateTime(2020, 1, 1) { TimeSpan.FromDays(10) };   // Error: DateTime doesn't implement IEnumerable
The compiler never uses the implementation of IEnumerable when compiling a collection initializer. I’ve sometimes found it convenient to create types in test projects with Add methods and an implementation of IEnumerable that just throws a NotImplementedException. This can be useful for constructing test data, but I don’t advise doing it in production code. I’d appreciate an attribute that let me express the idea that this type should be usable for collection initializers without implementing IEnumerable, but I doubt that’ll ever happen.
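As a sketch of that testing trick (TestNames is my own hypothetical type, not one from the chapter), the IEnumerable implementation exists only to satisfy the compiler’s rule and is never invoked by the generated code:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical test helper: implements IEnumerable purely so that
// collection initializers are permitted; iteration is never needed.
public class TestNames : IEnumerable
{
    private readonly List<string> names = new List<string>();

    public List<string> Names { get { return names; } }

    public void Add(string name)
    {
        names.Add(name);
    }

    public IEnumerator GetEnumerator()
    {
        // Never called by the code the compiler generates for an initializer.
        throw new NotImplementedException();
    }
}

class Program
{
    static void Main()
    {
        var data = new TestNames { "alpha", "beta" };   // Compiles into two Add calls
        Console.WriteLine(data.Names.Count);            // Prints 2
    }
}
```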
You may be wondering what all of this has to do with LINQ. I said that almost all the features in C# 3 built up to LINQ, so how do object and collection initializers fit into the picture? The answer is that other LINQ features require code to be expressible as a single expression. (For example, in a query expression, you can’t write a select clause that requires multiple statements to produce the output for a given input.)
The ability to initialize new objects in a single expression isn’t useful only for LINQ, however. It can also be important to simplify field initializers, method arguments, or even the operands in a conditional ?: operator. I find it particularly useful for static field initializers to build up useful lookup tables, for example. Of course, the larger the initialization expression becomes, the more you may want to consider separating it out.
It’s even recursively important to the feature itself. For example, if we couldn’t use an object initializer to create our OrderItem objects, the collection initializer wouldn’t be nearly as convenient to populate the Order.Items property.
In the rest of this book, whenever I refer to a new or improved feature as having a special case for a single expression (such as lambda expressions in section 3.5 or expression-bodied members in section 8.3), it’s worth remembering that object and collection initializers immediately make that feature more useful than it’d be otherwise.
Object and collection initializers allow for more concise code to create an instance of a type and populate it, but they do require that you already have an appropriate type to construct. Our next feature, anonymous types, allows you to create objects without even declaring the type of the object beforehand. It’s not quite as strange as it sounds.
Anonymous types allow you to build objects that you can refer to in a statically typed way without having to declare a type beforehand. This sounds like types might be created dynamically at execution time, but the reality is a little more subtle than that. We’ll look at what anonymous types look like in source code, how the compiler handles them, and a few of their limitations.
The simplest way to explain anonymous types is to start with an example. The following listing shows a simple piece of code to create an object with Name and Score properties.
var player = new        // Anonymous object creation expression; note the absence of a type name
{
    Name = "Rajesh",
    Score = 3500
};
Console.WriteLine("Player name: {0}", player.Name);     // The properties can be used as normal
Console.WriteLine("Player score: {0}", player.Score);
This brief example demonstrates important points about anonymous types:

- The syntax is like an object initializer, but with no type name between new and the opening brace.
- The variable has to be implicitly typed with var, because there’s no way of naming the type in source code.
- The types of the Name and Score properties are inferred from the expressions assigned to them (string and int here).
- The resulting object can still be used in a statically typed way: a typo in a property name would cause a compile-time error.
That’s what anonymous types look like, but what are they used for? This is where LINQ comes in. When performing a query, whether that’s using an SQL database as the underlying data store or using a collection of objects, it’s common to want a specific shape of data that isn’t the original type and may not have much meaning outside the query.
For example, suppose you’re building a query using a set of people, each of which has expressed a favorite color. You might want the result to be a histogram: each entry in the resulting collection is the color and the number of people who chose that as their favorite. That type representing a favorite color and type isn’t likely to be useful anywhere else, but it is useful in this specific context. Anonymous types allow us to express those one-off cases concisely without losing the benefits of static typing.
If you’re familiar with Java, you may be wondering about the relationship between C#’s anonymous types and Java’s anonymous classes. They sound like they’d be similar, but they differ greatly both in syntax and purpose.
Historically, the principal use for anonymous classes in Java was to implement interfaces or extend abstract classes to override just one or two methods. C#’s anonymous types don’t allow you to implement an interface or derive from any class other than System.Object; their purpose is much more about data than executable code.
C# provides one extra piece of shorthand in anonymous object creation expressions where you’re effectively copying a property or field from somewhere else and you’re happy to use the same name. This syntax is called a projection initializer. To give an example, let’s go back to our simplified e-commerce data model. You have three classes:

- Order, with an order ID, a customer, and a list of items
- Customer, with a name and an address
- OrderItem, with an item ID and a quantity
At some point in your code, you may want an object with all this information for a specific order item. If you have variables of the relevant types called order, customer, and item, you can easily use an anonymous type to represent the flattened information:
var flattenedItem = new
{
    order.OrderId,
    CustomerName = customer.Name,
    customer.Address,
    item.ItemId,
    item.Quantity
};
In this example, every property except CustomerName uses a projection initializer. The result is identical to this code, which specifies the property names in the anonymous type explicitly:
var flattenedItem = new
{
    OrderId = order.OrderId,
    CustomerName = customer.Name,
    Address = customer.Address,
    ItemId = item.ItemId,
    Quantity = item.Quantity
};
Projection initializers are most useful when you’re performing a query and want either to select only a subset of properties or to combine properties from multiple objects into one. If the name you want to give the property in the anonymous type is the same as the name of the field or property you’re copying from, the compiler can infer that name for you. So instead of writing this
SomeProperty = variable.SomeProperty
you can just write this:
variable.SomeProperty
Projection initializers can significantly reduce the amount of duplication in your source code if you’re copying multiple properties. It can easily make the difference between an expression being short enough to keep on one line or long enough to merit a separate line per property.
Although it’s accurate to say that the results of the two preceding listings are the same, that doesn’t mean they behave identically in other ways. Consider a rename of the Address property to CustomerAddress.
In the version with projection initializers, the property name in the anonymous type would change too. In the version with the explicit property name, it wouldn’t. That’s rarely an issue in my experience, but it’s worth being aware of as a difference.
I’ve described the syntax of anonymous types, and you know the resulting objects have properties you can use as if they were normal types. But what’s going on behind the scenes?
Although the type never appears in source code, the compiler does generate a type. There’s no magic for the runtime to contend with; it just sees a type that happens to have a name that would be invalid in C#. That type has a few interesting aspects to it. Some are guaranteed by the specification; others aren’t. When using the Microsoft C# compiler, an anonymous type has the following characteristics:

- It’s a class that derives directly from object.
- Its properties are read-only.
- It overrides Equals and GetHashCode so that two instances are equal if and only if all their properties are equal.
- It overrides ToString to list the property names and values, which is handy for diagnostics.
- The type is internal to the assembly it’s created in.
- Two anonymous object creation expressions within the same assembly that use the same property names in the same order with the same property types will use the same type.
The last point is important for variable reassignment and for implicitly typed arrays using anonymous types. In my experience, it’s relatively rare that you want to reassign a variable initialized with an anonymous type, but it’s nice that it’s feasible. For example, this is entirely valid:
var player = new { Name = "Pam", Score = 4000 };
player = new { Name = "James", Score = 5000 };
Likewise, it’s fine to create an array by using anonymous types using the implicitly typed array syntax described in section 3.2.3:
var players = new[]
{
    new { Name = "Priti", Score = 6000 },
    new { Name = "Chris", Score = 7000 },
    new { Name = "Amanda", Score = 8000 },
};
Note that the properties must have the same names and types and be in the same order for two anonymous object creation expressions to use the same type. For example, this would be invalid because the order of properties in the second array element is different from the others:
var players = new[]
{
    new { Name = "Priti", Score = 6000 },
    new { Score = 7000, Name = "Chris" },   // Different property order: a different anonymous type
    new { Name = "Amanda", Score = 8000 },
};
Although each array element is valid individually, the type of the second element stops the compiler from inferring the array type. The same would be true if you added an extra property or changed the type of one of the properties.
Although anonymous types are useful within LINQ, that doesn’t make this feature the right tool for every problem. Let’s look briefly at places you may not want to use them.
Anonymous types are great when you want a localized representation of just data. By localized, I mean that the data shape you’re interested in is relevant only within that specific method. As soon as you want to represent the same shape in multiple places, you need to look for a different solution. Although it’s possible to return instances of anonymous types from methods or accept them as parameters, you can do so only by using either generics or the object type. The fact that the types are anonymous prevents you from expressing them in method signatures.
Until C# 7, if you wanted to use a common data structure in more than one method, you’d normally declare your own class or struct for it. C# 7 has introduced tuples, as you’ll see in chapter 11, which can work as an alternative solution, depending on how much encapsulation you desire.
Speaking of encapsulation, anonymous types basically don’t provide any. You can’t place any validation in the type or add extra behavior to it. If you find yourself wanting to do so, that’s a good indication that you should probably be creating your own type instead.
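As a sketch of that escape hatch, the color-histogram entry mentioned earlier could become a small named type once validation matters. ColorCount is my own invention here, written in the field-backed style of the era rather than anything from the chapter:

```csharp
using System;

// Hypothetical named type replacing an anonymous one once validation is needed.
public class ColorCount
{
    private readonly string color;
    private readonly int count;

    public string Color { get { return color; } }
    public int Count { get { return count; } }

    public ColorCount(string color, int count)
    {
        // An anonymous type couldn't enforce this invariant.
        if (count < 0)
        {
            throw new ArgumentOutOfRangeException("count");
        }
        this.color = color;
        this.count = count;
    }
}

class Program
{
    static void Main()
    {
        var entry = new ColorCount("blue", 7);
        Console.WriteLine("{0}: {1}", entry.Color, entry.Count);   // Prints blue: 7
    }
}
```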
Finally, I mentioned earlier that using anonymous types across assemblies via C# 4’s dynamic typing is made more difficult because the types are internal. I’ve usually seen this attempted in MVC web applications where the model for a page may be built using anonymous types and then accessed in the view using the dynamic type (which you’ll look at in chapter 4). This works if either the two pieces of code are in the same assembly or the assembly containing the model code has made its internal members visible to the assembly containing the view code using [InternalsVisibleTo]. Depending on the framework you’re using, it may be awkward to arrange for either of these to be true. Given the benefits of static typing anyway, I generally recommend declaring the model as a regular type instead. It’s more up-front work than using an anonymous type but is likely to save you time in the long term.
Visual Basic has anonymous types too, but they don’t behave in quite the same way. In C#, all properties are used in determining equality and hash codes, and they’re all read-only. In VB, only properties declared with the Key modifier behave like that. Nonkey properties are read/write and don’t affect equality or hash codes.
We’re about halfway through the C# 3 features, and so far they’ve all had to do with data. The next features focus more on executable code, first with lambda expressions and then extension methods.
In chapter 2, you saw how anonymous methods made it much easier to create delegate instances by including their code inline like this:
Action<string> action = delegate(string message)   // Anonymous method with its code inline
{
    Console.WriteLine("In delegate: {0}", message);
};
action("Message");                                  // Invokes the delegate
Lambda expressions were introduced in C# 3 to make this even more concise. The term anonymous function is used to refer to both anonymous methods and lambda expressions. I’ll use it at various points in the rest of this book, and it’s widely used in the C# specification.
The name lambda expressions comes from lambda calculus, a field of mathematics and computer science started by Alonzo Church in the 1930s. Church used the Greek lambda character (λ) in his notation for functions, and the name stuck.
There are various reasons that it was useful for the language designers to put so much effort into streamlining delegate instance creation, but LINQ is the most important one. When you look at query expressions in section 3.7, you’ll see that they’re effectively translated into code that uses lambda expressions. You can use LINQ without using query expressions, though, and that almost always involves using lambda expressions directly in your source code.
First, we’ll look at the syntax for lambda expressions and then some of the details of how they behave. Finally, we’ll talk about expression trees that represent code as data.
The basic syntax for lambda expressions is always of this form:
parameter-list => body
Both the parameter list and the body, however, have multiple representations. In its most explicit form, the parameter list for a lambda expression looks like a normal method or anonymous method parameter list. Likewise, the body of a lambda expression can be a block: a sequence of statements all within a pair of curly braces. In this form, the lambda expression looks similar to an anonymous method:
Action<string> action = (string message) =>
{
    Console.WriteLine("In delegate: {0}", message);
};
action("Message");
So far, this doesn’t look much better; you’ve traded the delegate keyword for =>, but that’s all. But special cases allow the lambda expression to become shorter.
Let’s start by making the body more concise. A body that consists of just a return statement or a single expression can be reduced to that single expression. The return keyword is removed if there was one. In the preceding example, the body of our lambda expression was just a method invocation, so you can simplify it:
Action<string> action = (string message) => Console.WriteLine("In delegate: {0}", message);
You’ll look at an example returning a value shortly. Lambda expressions shortened like this are said to have expression bodies, whereas lambda expressions using braces are said to have statement bodies.
Next, you can make the parameter list shorter if the compiler can infer the parameter types based on the type you’re attempting to convert the lambda expression to. Lambda expressions don’t have a type but are convertible to compatible delegate types, and the compiler can often infer the parameter type as part of that conversion.
For example, in the preceding code, the compiler knows that an Action<string> has a single parameter of type string, so it’s capable of inferring that parameter type. When the compiler can infer the parameter type, you can omit it. Therefore, our example can be shortened:
Action<string> action = (message) => Console.WriteLine("In delegate: {0}", message);
Finally, if the lambda expression has exactly one parameter, and that parameter’s type is inferred, the parentheses can be dropped from the parameter list:
Action<string> action = message => Console.WriteLine("In delegate: {0}", message);
Now let’s look at a couple of examples that return values. In each case, you’ll apply every step you can to make it shorter. First, you’ll construct a delegate to multiply two integers together and return the result:
Func<int, int, int> multiply =
    (int x, int y) => { return x * y; };                    // Statement body
Func<int, int, int> multiply = (int x, int y) => x * y;     // Expression body
Func<int, int, int> multiply = (x, y) => x * y;             // Parameter types inferred
// (Two parameters, so you can't remove the parentheses)
Next, you’ll use a delegate to take the length of a string, multiply that length by itself, and return the result:
Func<string, int> squareLength = (string text) =>   // Explicit parameter type
{
    int length = text.Length;
    return length * length;
};
Func<string, int> squareLength = (text) =>          // Parameter type inferred
{
    int length = text.Length;
    return length * length;
};
Func<string, int> squareLength = text =>            // Parentheses removed
{
    int length = text.Length;
    return length * length;
};
// (Can't do anything else immediately; body has two statements)
If you were happy to evaluate the Length property twice, you could reduce this second example:
Func<string, int> squareLength = text => text.Length * text.Length;
That’s not the same kind of change as the others, though; that’s changing the behavior (however slightly) rather than just the syntax. It may seem odd to have all of these special cases, but in practice all of them apply in a large number of cases, particularly within LINQ. Now that you understand the syntax, you can start looking at the behavior of the delegate instance, particularly in terms of any variables it has captured.
In section 2.3.2, when I described captured variables in anonymous methods, I promised that we’d return to the topic in the context of lambda expressions. This is probably the most confusing part of lambda expressions. It’s certainly been the cause of lots of Stack Overflow questions.
To create a delegate instance from a lambda expression, the compiler converts the code in the lambda expression to a method somewhere. The delegate can then be created at execution time exactly as if you had a method group. This section shows the kind of transformation the compiler performs. I’ve written this as if the compiler translates the source code into more source code that doesn’t contain lambda expressions, but of course the compiler never needs that translated source code. It can just emit the appropriate IL.
Let’s start with a recap of what counts as a captured variable. Within a lambda expression, you can use any variable that you’d be able to use in regular code at that point. That could be a static field, an instance field (if you’re writing the lambda expression within an instance method[1]), the this variable, method parameters, or local variables. All of these are captured variables, because they’re variables declared outside the immediate context of the lambda expression. Compare that with parameters to the lambda expression or local variables declared within the lambda expression; those aren’t captured variables. The following listing shows a lambda expression that captures various variables. You’ll then look at how the compiler handles that code.
[1] You can write lambda expressions in constructors, property accessors, and so on as well, but for the sake of simplicity, I'll assume you're writing them in methods.
class CapturedVariablesDemo
{
private string instanceField = "instance field";
public Action<string> CreateAction(string methodParameter)
{
string methodLocal = "method local";
string uncaptured = "uncaptured local";
Action<string> action = lambdaParameter =>
{
string lambdaLocal = "lambda local";
Console.WriteLine("Instance field: {0}", instanceField);
Console.WriteLine("Method parameter: {0}", methodParameter);
Console.WriteLine("Method local: {0}", methodLocal);
Console.WriteLine("Lambda parameter: {0}", lambdaParameter);
Console.WriteLine("Lambda local: {0}", lambdaLocal);
};
methodLocal = "modified method local";
return action;
}
}
In other code
var demo = new CapturedVariablesDemo();
Action<string> action = demo.CreateAction("method argument");
action("lambda argument");
Lots of variables are involved here:
It’s important to understand that the lambda expression captures the variables themselves, not the values of the variables at the point when the delegate is created.[2] If you modified any of the captured variables between the time at which the delegate is created and when it’s invoked, the output would reflect those changes. Likewise, the lambda expression can change the value of the captured variables. How does the compiler make all of that work? How does it make sure all those variables are still available to the delegate when it’s invoked?
[2] I will repeat this multiple times, for which I make no apology. If you're new to captured variables, this can take a while to get used to.
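To see the difference between capturing a variable and capturing its value, consider this small sketch (my own example, not one of the book's listings): the variable is modified after the delegate is created, and the delegate observes the change:

```csharp
using System;

class CaptureDemo
{
    static void Main()
    {
        int counter = 0;
        Action print = () => Console.WriteLine("Counter: {0}", counter);

        print();        // Prints "Counter: 0"
        counter = 10;   // Modify the captured variable after the delegate is created
        print();        // Prints "Counter: 10" - the variable was captured, not its value

        // The lambda expression can modify the captured variable too
        Action increment = () => counter++;
        increment();
        Console.WriteLine(counter);   // Prints 11
    }
}
```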
There are three broad cases to consider:
You may see some variation in what I’ve described. For example, with a lambda expression with no captured variables, the compiler may create a nested class with a single instance instead of a static method. There can be subtle differences in the efficiency of executing delegates based on exactly how they’re created. In this section, I’ve described the minimum work that the compiler must do in order to make captured variables available. It can introduce more complexity if it wants to.
The last case is obviously the most complex one, so we’ll focus on that. Let’s start with listing 3.6. As a reminder, here’s the method that creates the lambda expression; I’ve omitted the class declaration for brevity:
public Action<string> CreateAction(string methodParameter)
{
    string methodLocal = "method local";
    string uncaptured = "uncaptured local";
    Action<string> action = lambdaParameter =>
    {
        string lambdaLocal = "lambda local";
        Console.WriteLine("Instance field: {0}", instanceField);
        Console.WriteLine("Method parameter: {0}", methodParameter);
        Console.WriteLine("Method local: {0}", methodLocal);
        Console.WriteLine("Lambda parameter: {0}", lambdaParameter);
        Console.WriteLine("Lambda local: {0}", lambdaLocal);
    };
    methodLocal = "modified method local";
    return action;
}
As I described before, the compiler creates a private nested class for the extra context it’ll need and then an instance method in that class for the code in the lambda expression. The context is stored in instance variables of the nested class. In our case, that means the following:
The following listing shows the nested class and how it’s used by the CreateAction method.
private class LambdaContext                        // Generated nested class
{
    public CapturedVariablesDemoImpl originalThis; // Captured this
    public string methodParameter;                 // Captured parameter
    public string methodLocal;                     // Captured local variable

    public void Method(string lambdaParameter)     // Lambda expression body
    {
        string lambdaLocal = "lambda local";
        Console.WriteLine("Instance field: {0}", originalThis.instanceField);
        Console.WriteLine("Method parameter: {0}", methodParameter);
        Console.WriteLine("Method local: {0}", methodLocal);
        Console.WriteLine("Lambda parameter: {0}", lambdaParameter);
        Console.WriteLine("Lambda local: {0}", lambdaLocal);
    }
}

public Action<string> CreateAction(string methodParameter)
{
    LambdaContext context = new LambdaContext();   // Context holds the captured variables
    context.originalThis = this;
    context.methodParameter = methodParameter;
    context.methodLocal = "method local";
    string uncaptured = "uncaptured local";

    Action<string> action = context.Method;
    context.methodLocal = "modified method local";
    return action;
}
Note how context.methodLocal is modified near the end of the CreateAction method. When the delegate is finally invoked, it'll "see" that modification. Likewise, if the delegate modified any of the captured variables, each invocation would see the results of the previous invocations. This reinforces that the compiler ensures the variable itself is captured rather than a snapshot of its value.
In listings 3.6 and 3.7, you had to create only a single context for the captured variables. In the terminology of the specification, each of the local variables was instantiated only once. Let’s make things a little more complicated.
To make things a little simpler, you’ll capture one local variable this time and no parameters or instance fields. The following listing shows a method to create a list of actions and then execute them one at a time. Each action captures a text variable.
static List<Action> CreateActions()
{
List<Action> actions = new List<Action>();
for (int i = 0; i < 5; i++)
{
string text = string.Format("message {0}", i); // Variable instantiated on each iteration
actions.Add(() => Console.WriteLine(text)); // Lambda captures this iteration's variable
}
return actions;
}
In other code
List<Action> actions = CreateActions();
foreach (Action action in actions)
{
action();
}
The fact that text is declared inside the loop is very important indeed. Each time you reach that declaration, the variable is instantiated. Each lambda expression captures a different instantiation of the variable. There are effectively five different text variables, each of which has been captured separately. They’re completely independent variables. Although this code happens not to modify them after the initial assignment, it certainly could do so either inside the lambda expression or elsewhere within the loop. Modifying one variable would have no effect on the others.
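To see why the declaration position matters so much, contrast this with capturing the for loop's own variable i, which is declared (and instantiated) once for the whole loop. In the following sketch (my own example, not one of the book's listings), every delegate captures the same single variable, so they all see its final value:

```csharp
using System;
using System.Collections.Generic;

class LoopCaptureDemo
{
    static void Main()
    {
        var actions = new List<Action>();
        for (int i = 0; i < 5; i++)
        {
            // All five lambda expressions capture the single 'i' variable
            actions.Add(() => Console.Write(i + " "));
        }
        foreach (Action action in actions)
        {
            action();   // Prints "5 5 5 5 5" - each delegate sees the final value
        }
    }
}
```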
The compiler models this behavior by creating a different instance of the generated type for each instantiation. Therefore, the CreateAction method of listing 3.8 could be translated into the following listing.
private class LambdaContext
{
    public string text;

    public void Method()
    {
        Console.WriteLine(text);
    }
}

static List<Action> CreateActions()
{
    List<Action> actions = new List<Action>();
    for (int i = 0; i < 5; i++)
    {
        LambdaContext context = new LambdaContext();     // New context for each iteration
        context.text = string.Format("message {0}", i);
        actions.Add(context.Method);                     // Delegate targets this iteration's context
    }
    return actions;
}
Hopefully, that still makes sense. You’ve gone from having a single context for the lambda expression to one for each iteration of the loop. I’m going to finish this discussion of captured variables with an even more complicated example, which is a mixture of the two.
It was the scope of the text variable that meant it was instantiated once for each iteration of the loop. But multiple scopes can exist within a single method, and each scope can contain local variable declarations, and a single lambda expression can capture variables from multiple scopes. Listing 3.10 gives an example. You create two delegate instances, each of which captures two variables. They both capture the same outerCounter variable, but each captures a separate innerCounter variable. The delegates simply print out the current values of the counters and increment them. You execute each delegate twice, which makes the difference between the captured variables clear.
static List<Action> CreateCountingActions()
{
List<Action> actions = new List<Action>();
int outerCounter = 0; // Instantiated once; shared by both delegates
for (int i = 0; i < 2; i++)
{
int innerCounter = 0; // Instantiated on each iteration; one per delegate
Action action = () =>
{
Console.WriteLine(
    "Outer: {0}; Inner: {1}",
    outerCounter, innerCounter);
outerCounter++;
innerCounter++;
};
actions.Add(action);
}
return actions;
}
In other code
List<Action> actions = CreateCountingActions();
actions[0](); // Execute each delegate twice
actions[0]();
actions[1]();
actions[1]();
The output of listing 3.10 is as follows:
Outer: 0; Inner: 0
Outer: 1; Inner: 1
Outer: 2; Inner: 0
Outer: 3; Inner: 1
The first two lines are printed by the first delegate. The last two lines are printed by the second delegate. As I described before the listing, the same outer counter is used by both delegates, but they have independent inner counters.
What does the compiler do with this? Each delegate needs its own context, but that context needs to also refer to a shared context. The compiler creates two private nested classes instead of one. The following listing shows an example of how the compiler could treat listing 3.10.
private class OuterContext                          // Shared context
{
    public int outerCounter;
}

private class InnerContext                          // Per-iteration context
{
    public OuterContext outerContext;
    public int innerCounter;

    public void Method()                            // Lambda expression body
    {
        Console.WriteLine(
            "Outer: {0}; Inner: {1}",
            outerContext.outerCounter, innerCounter);
        outerContext.outerCounter++;
        innerCounter++;
    }
}

static List<Action> CreateCountingActions()
{
    List<Action> actions = new List<Action>();
    OuterContext outerContext = new OuterContext();        // Created once per method call
    outerContext.outerCounter = 0;
    for (int i = 0; i < 2; i++)
    {
        InnerContext innerContext = new InnerContext();    // Created once per iteration
        innerContext.outerContext = outerContext;
        innerContext.innerCounter = 0;
        Action action = innerContext.Method;
        actions.Add(action);
    }
    return actions;
}
You’ll rarely need to look at the generated code like this, but it can make a difference in terms of performance. If you use a lambda expression in a performance-critical piece of code, you should be aware of how many objects will be created to support the variables it captures.
I could give even more examples with multiple lambda expressions in the same scope capturing different sets of variables or lambda expressions in methods of value types. I find it fascinating to explore compiler-generated code, but you probably wouldn’t want a whole book of it. If you ever find yourself wondering how the compiler treats a particular lambda expression, it’s easy enough to run a decompiler or ildasm over the result.
So far, you’ve looked only at converting lambda expressions to delegates, which you could already do with anonymous methods. Lambda expressions have another superpower, however: they can be converted to expression trees.
Expression trees are representations of code as data. This is the heart of how LINQ is able to work efficiently with data providers such as SQL databases. The code you write in C# can be analyzed at execution time and converted into SQL.
Whereas delegates provide code you can run, expression trees provide code you can inspect, a little like reflection. Although you can build up expression trees directly in code, it’s more common to ask the compiler to do this for you by converting a lambda expression into an expression tree. The following listing gives a trivial example of this by creating an expression tree just to add two numbers together.
Expression<Func<int, int, int>> adder = (x, y) => x + y;
Console.WriteLine(adder);
Considering it’s only two lines of code, there is a lot going on. Let’s start with the output. If you try to print out a regular delegate, the result will be just the type with no indication of the behavior. The output of listing 3.12 shows exactly what the expression tree does, though:
(x, y) => x + y
The compiler isn’t cheating by hardcoding a string somewhere. That string representation is constructed from the expression tree. This demonstrates that the code is available for examination at execution time, which is the whole point of expression trees.
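To make that examination concrete, here's a small sketch (my own example, not one of the book's listings) that walks the tree programmatically rather than just printing it:

```csharp
using System;
using System.Linq.Expressions;

class InspectDemo
{
    static void Main()
    {
        Expression<Func<int, int, int>> adder = (x, y) => x + y;

        // The Body property exposes the root of the expression tree
        var body = (BinaryExpression)adder.Body;
        Console.WriteLine(body.NodeType);           // Add
        Console.WriteLine(adder.Parameters.Count);  // 2

        // The operands of the addition are the two parameter expressions
        Console.WriteLine(((ParameterExpression)body.Left).Name);   // x
        Console.WriteLine(((ParameterExpression)body.Right).Name);  // y
    }
}
```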
Let’s look at the type of adder: Expression<Func<int, int, int>>. It’s simplest to split it into two parts: Expression<TDelegate> and Func<int, int, int>. The second part is used as a type argument to the first. The second part is a delegate type with two integer parameters and an integer return type. (The return type is expressed by the last type parameter, so a Func<string, double, int> would accept a string and a double as inputs and return an int.)
Expression<TDelegate> is the expression tree type associated with TDelegate, which must be a delegate type. (That’s not expressed as a type constraint, but it’s enforced at execution time.) This is only one of the many types involved in expression trees. They’re all in the System.Linq.Expressions namespace. The nongeneric Expression class is the abstract base class for all the other expression types, and it’s also used as a convenient container for factory methods to create instances of the concrete subclasses.
Our adder variable type is an expression tree representation of a function accepting two integers and returning an integer. You then use a lambda expression to assign a value to that variable. The compiler generates code to build the appropriate expression tree at execution time. In this case, it’s reasonably simple. You can write the same code yourself, as shown in the following listing.
ParameterExpression xParameter = Expression.Parameter(typeof(int), "x");
ParameterExpression yParameter = Expression.Parameter(typeof(int), "y");
Expression body = Expression.Add(xParameter, yParameter);
ParameterExpression[] parameters = new[] { xParameter, yParameter };
Expression<Func<int, int, int>> adder =
    Expression.Lambda<Func<int, int, int>>(body, parameters);
Console.WriteLine(adder);
This is a small example, and it’s still significantly more long-winded than the lambda expression. By the time you add method calls, property accesses, object initializers, and so on, it gets complex and error prone. That’s why it’s so important that the compiler can do the work for you by converting lambda expressions into expression trees. There are a few rules around this, though.
The most important restriction is that only expression-bodied lambda expressions can be converted to expression trees. Although our earlier lambda expression of (x, y) => x + y was fine, the following code would cause a compilation error:
Expression<Func<int, int, int>> adder = (x, y) => { return x + y; };
The expression tree API has expanded since .NET 3.5 to include blocks and other constructs, but the C# compiler still has this restriction, and it’s consistent with the use of expression trees for LINQ. This is one reason that object and collection initializers are so important: they allow initialization to be captured in a single expression, which means it can be used in an expression tree.
Additionally, the lambda expression can't use the assignment operator, C# 4's dynamic typing, or C# 5's asynchrony. (Although object and collection initializers do use the = symbol, that's not the assignment operator in that context.)
The ability to execute queries against remote data sources, as I referred to earlier, isn’t the only use for expression trees. They can be a powerful way of constructing efficient delegates dynamically at execution time, although this is typically an area where at least part of the expression tree is built with handwritten code rather than converted from a lambda expression.
Expression<TDelegate> has a Compile() method that returns a delegate of type TDelegate. You can then handle this delegate as you would any other. As a trivial example, the following listing takes our earlier adder expression tree, compiles it to a delegate, and then invokes it, producing an output of 5.
Expression<Func<int, int, int>> adder = (x, y) => x + y;
Func<int, int, int> executableAdder = adder.Compile();   // Compile the tree to a delegate
Console.WriteLine(executableAdder(2, 3));                // Invoke the delegate; prints 5
This approach can be used in conjunction with reflection for property access and method invocation to produce delegates and then cache them. The result is as efficient as if you’d written the equivalent code by hand. For a single method call or property access, there are already methods to create delegates directly, but sometimes you need additional conversion or manipulation steps, which are easily represented in expression trees.
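As a sketch of that reflection-based technique (my own example, with hypothetical names), the following code builds a delegate equivalent to s => s.Length from a PropertyInfo obtained at execution time. The resulting delegate could then be cached and reused:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

class PropertyAccessDemo
{
    static void Main()
    {
        // Obtain the property via reflection, as a dynamic scenario might
        PropertyInfo lengthProperty = typeof(string).GetProperty("Length");

        // Build an expression tree for: s => s.Length
        ParameterExpression parameter = Expression.Parameter(typeof(string), "s");
        Expression body = Expression.Property(parameter, lengthProperty);
        Func<string, int> getLength =
            Expression.Lambda<Func<string, int>>(body, parameter).Compile();

        // The compiled delegate runs as fast as one written by hand
        Console.WriteLine(getLength("hello"));   // 5
    }
}
```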
We’ll come back to why expression trees are so important in LINQ when we tie everything together. You have only two more language features to look at. Extension methods come next.
Extension methods sound pointless when they’re first described. They’re static methods that can be called as if they’re instance methods, based on their first parameter. Suppose you have a static method call like this:
ExampleClass.Method(x, y);
If you turn ExampleClass.Method into an extension method, you can call it like this instead:
x.Method(y);
That’s all extension methods do. It’s one of the simplest transformations the C# compiler does. It makes all the difference in terms of code readability when it comes to chaining method calls together, however. You’ll look at that later, finally using real examples from LINQ, but first let’s look at the syntax.
Extension methods are declared by adding the keyword this before the first parameter. The method must be declared in a non-nested, nongeneric static class, and until C# 7.2, the first parameter can’t be a ref parameter. (You’ll see more about that in section 13.5.) Although the class containing the method can’t be generic, the extension method itself can be.
The type of the first parameter is sometimes called the target of the extension method and sometimes called the extended type. (The specification doesn’t give this concept a name, unfortunately.)
As an example from Noda Time, we have an extension method to convert from DateTimeOffset to Instant. There’s already a static method within the Instant struct to do this, but it’s useful to have as an extension method, too. Listing 3.15 shows the code for the method. For once, I’ve included the namespace declaration, as that’s going to be important when you see how the C# compiler finds extension methods.
using System;

namespace NodaTime.Extensions
{
    public static class DateTimeOffsetExtensions
    {
        public static Instant ToInstant(this DateTimeOffset dateTimeOffset)
        {
            return Instant.FromDateTimeOffset(dateTimeOffset);
        }
    }
}
The compiler adds the [Extension] attribute to both the method and the class declaring it, and that’s all. This attribute is in the System.Runtime.CompilerServices namespace. It’s a marker indicating the intent that a developer should be able to call ToInstant() as if it were declared as an instance method in DateTimeOffset.
You’ve already seen the syntax to invoke an extension method: you call it as if it were an instance method on the type of the first parameter. But you need to make sure that the compiler can find the method as well.
First, there’s a matter of priority: if there’s a regular instance method that’s valid for the method invocation, the compiler will always prefer that over an extension method. It doesn’t matter whether the extension method has “better” parameters; if the compiler can use an instance method, it won’t even look for extension methods.
After it has exhausted its search for instance methods, the compiler will look for extension methods based on the namespace the calling code is in and any using directives present. Suppose you’re making a call from the ExtensionMethodInvocation class in the CSharpInDepth.Chapter03 namespace.[3] The following listing shows how to do that, giving the compiler all the information it needs to find the extension method.
[3] If you're following along with the downloaded code, you may have noticed that the samples are in namespaces of Chapter01, Chapter02, and so on, for simplicity. I've made an exception here for the sake of showing the hierarchical nature of the namespace checks.
using NodaTime.Extensions;   // Makes the extension method visible
using System;

namespace CSharpInDepth.Chapter03
{
    class ExtensionMethodInvocation
    {
        static void Main()
        {
            var currentInstant = DateTimeOffset.UtcNow.ToInstant();   // Extension method call
            Console.WriteLine(currentInstant);
        }
    }
}
The compiler will check for extension methods in the following:
The compiler effectively works its way outward from the deepest namespace out toward the global namespace and looks at each step for static classes either in that namespace or provided by classes made available by using directives in the namespace declaration. The details of the ordering are almost never important. If you find yourself in a situation where moving a using directive changes which extension method is used, it’s probably best to rename one of them. But it’s important to understand that within each step, multiple extension methods can be found that would be valid for the call. In that situation, the compiler performs normal overload resolution between all the extension methods it found in that step. After the compiler has located the right method to invoke, the IL it generates for the call is exactly the same as if you’d written a regular static method call instead of using its capabilities as an extension method.
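The instance-method priority described earlier is easy to demonstrate. In this sketch (my own example, with hypothetical names), a class declares an instance method with the same signature as an extension method; the instance method always wins when you use instance-call syntax:

```csharp
using System;

class Widget
{
    public void Describe() { Console.WriteLine("Instance method"); }
}

static class WidgetExtensions
{
    // Never chosen for widget.Describe(): the instance method takes priority
    public static void Describe(this Widget widget)
    {
        Console.WriteLine("Extension method");
    }
}

class Demo
{
    static void Main()
    {
        var widget = new Widget();
        widget.Describe();                  // Prints "Instance method"
        WidgetExtensions.Describe(widget);  // The extension method can still be called directly
    }
}
```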
Extension methods differ from instance methods in terms of their null handling. Let’s look back at our initial example:
x.Method(y);
If Method were an instance method and x were a null reference, that would throw a NullReferenceException. Instead, if Method is an extension method, it’ll be called with x as the first argument even if x is null. Sometimes the method will specify that the first argument must not be null, in which case it should validate it and throw an ArgumentNullException. In other cases, the extension method may have been explicitly designed to handle a null first argument gracefully.
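The following sketch (my own example, with a hypothetical method name) shows an extension method that deliberately handles a null first argument. An equivalent instance call on a null reference would throw, but the extension method call succeeds:

```csharp
using System;

static class StringExtensions
{
    // Deliberately designed to accept a null "this" argument
    public static bool IsNullOrEmptyExt(this string text)
    {
        return string.IsNullOrEmpty(text);
    }
}

class NullDemo
{
    static void Main()
    {
        string value = null;
        // value.Length would throw NullReferenceException here, but an
        // extension method call with a null target is perfectly legal:
        Console.WriteLine(value.IsNullOrEmptyExt());   // True
    }
}
```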
Let’s get back to why extension methods are important to LINQ. It’s time for our first query.
Listing 3.17 shows a simple query. It takes a sequence of words, filters them by length, orders them in the natural way, and then converts them to uppercase. It uses lambda expressions and extension methods but no other C# 3 features. We’ll put everything else together at the end of the chapter. For the moment, I want to focus on the readability of this simple code.
string[] words = { "keys", "coat", "laptop", "bottle" }; 1 IEnumerable<string> query = words .Where(word => word.Length > 4) 2 .OrderBy(word => word) 2 .Select(word => word.ToUpper()); 2 foreach (string word in query) 3 { 3 Console.WriteLine(word); 3 } 3
Notice the ordering of the Where, OrderBy, and Select calls in our code. That’s the order in which the operations happen. The lazy and streaming-where-possible nature of LINQ makes it complicated to talk about exactly what happens when, but the query reads in the same order as it executes. The following listing is the same query but without taking advantage of the fact that these methods are extension methods.
string[] words = { "keys", "coat", "laptop", "bottle" }; IEnumerable<string> query = Enumerable.Select( Enumerable.OrderBy( Enumerable.Where(words, word => word.Length > 4), word => word), word => word.ToUpper());
I’ve formatted listing 3.18 as readably as I can, but it’s still awful. The calls are laid out in the opposite order in the source code to how they’ll execute: Where is the first thing to execute but the last method call in the listing. Next, it’s not obvious which lambda expression goes with which call: word => word.ToUpper() is part of the Select call, but a huge amount of code is between those two pieces of text.
You can tackle this in another way by assigning the result of each method call to a local variable and then making the method call via that. Listing 3.19 shows one option for doing this. (In this case, you could’ve just declared the query to start with and reassigned it on each line, but that wouldn’t always be the case.) This time, I’ve also used var, just for brevity.
string[] words = { "keys", "coat", "laptop", "bottle" }; var tmp1 = Enumerable.Where(words, word => word.Length > 4); var tmp2 = Enumerable.OrderBy(tmp1, word => word); var query = Enumerable.Select(tmp2, word => word.ToUpper());
This is better than listing 3.18; the operations are back in the right order, and it’s obvious which lambda expression is used for which operation. But the extra local variable declarations are a distraction, and it’s easy to end up using the wrong one.
The benefits of method chaining aren’t limited to LINQ, of course. Using the result of one call as the starting point of another call is common. But extension methods allow you to do this in a readable way for any type, rather than the type itself declaring the methods that support chaining. IEnumerable<T> doesn’t know anything about LINQ; its sole responsibility is to represent a general sequence. It’s the System.Linq.Enumerable class that adds all the operations for filtering, grouping, joining, and so on.
C# 3 could’ve stopped here. The features described so far would already have added a lot of power to the language and enabled many LINQ queries to be written in a perfectly readable form. But when queries get more complex, particularly when they include joins and groupings, using the extension methods directly can get complicated. Enter query expressions.
Although almost all features in C# 3 contribute to LINQ, only query expressions are specific to LINQ. Query expressions allow you to write concise code by using query-specific clauses (select, where, let, group by, and so on). The query is then translated into a nonquery form by the compiler and compiled as normal.[4] Let’s start with a brief example to make this clearer. As a reminder, in listing 3.17 you had this query:
[4] This sounds like macros in C, but it's a little more involved than that. C# still doesn't have macros.
IEnumerable<string> query = words
    .Where(word => word.Length > 4)
    .OrderBy(word => word)
    .Select(word => word.ToUpper());
The following listing shows the same query written as a query expression.
IEnumerable<string> query = from word in words
                            where word.Length > 4
                            orderby word
                            select word.ToUpper();
The query expression in listing 3.20 (everything from the from clause through the select clause) is very concise indeed. The repetitive use of word as a parameter to lambda expressions has been replaced by specifying the name of a range variable once in the from clause and then using it in each of the other clauses. What happens to the query expression in listing 3.20?
In this book, I’ve expressed many language features in terms of more C# source code. For example, when looking at captured variables in section 3.5.2, I showed C# code that you could’ve written to achieve the same result as using a lambda expression. That’s just for the purpose of explaining the code generated by the compiler. I wouldn’t expect the compiler to generate any C#. The specification describes the effects of capturing variables rather than a source code translation.
Query expressions work differently. The specification describes them as a syntactic translation that occurs before any overload resolution or binding. The code in listing 3.20 doesn’t just have the same eventual effect as listing 3.17; it’s really translated into the code in listing 3.17 before further processing. The language has no specific expectation about what the result of that further processing will be. In many cases, the result of the translation will be calls to extension methods, but that’s not required by the language specification. They could be instance method calls or invocations of delegates returned by properties named Select, Where, and so on.
The specification of query expressions puts in place an expectation of certain methods being available, but there’s no specific requirement for them all to be present. For example, if you write an API with suitable Select, OrderBy, and Where methods, you could use the kind of query shown in listing 3.20 even though you couldn’t use a query expression that includes a join clause.
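This method-name-based binding is easy to demonstrate with a toy type. The Maybe<T> class below is purely hypothetical (it's not a BCL type), but because it declares instance methods named Where and Select with suitable shapes, a from…where…select query over it compiles; a query with a join clause wouldn't, as there's no Join method to bind to. A minimal sketch:

```csharp
using System;

// Hypothetical single-value container; not a BCL type. Query expressions are
// translated by method name before binding, so from/where/select compile
// against these instance methods even though no LINQ interface is involved.
public class Maybe<T>
{
    public static readonly Maybe<T> None = new Maybe<T>();

    private readonly bool hasValue;
    private readonly T value;

    private Maybe() { }

    public Maybe(T value)
    {
        this.hasValue = true;
        this.value = value;
    }

    public bool HasValue { get { return hasValue; } }
    public T Value { get { return value; } }

    public Maybe<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return hasValue ? new Maybe<TResult>(selector(value)) : Maybe<TResult>.None;
    }

    public Maybe<T> Where(Func<T, bool> predicate)
    {
        return hasValue && predicate(value) ? this : None;
    }
}

public class Demo
{
    static void Main()
    {
        // Translated into new Maybe<int>(10).Where(x => x > 4).Select(x => x * 2)
        var result = from x in new Maybe<int>(10)
                     where x > 4
                     select x * 2;

        Console.WriteLine(result.HasValue ? result.Value.ToString() : "(none)");
    }
}
```

The compiler never checks for an interface here; it simply performs the syntactic translation and then lets normal overload resolution find the methods.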
Although we're not going to look at every clause available in query expressions in detail, I need to draw your attention to two related concepts. In part, these provide greater justification for the language designers' decision to introduce query expressions into the language.
Query expressions introduce range variables, which aren't quite like any other variables. They act as the per-item input within each clause of the query. You've already seen how the from clause at the start of a query expression introduces a range variable. Here's the query expression from listing 3.20 again with the range variable highlighted:
from word in words          // introduces the range variable word
where word.Length > 4       // uses word
orderby word                // uses word
select word.ToUpper()       // uses word
That's simple to understand when there's only one range variable, but that initial from clause isn't the only way a range variable can be introduced. The simplest example of a clause that introduces a new range variable is probably let. Suppose you want to refer to the length of the word multiple times in your query without having to call the Length property every time. For example, you could order by it and include it in the output. The let clause allows you to write the query as shown in the following listing.
from word in words
let length = word.Length
where length > 4
orderby length
select string.Format("{0}: {1}", length, word.ToUpper())
You now have two range variables in scope at the same time, as you can see from the use of both length and word in the select clause. That raises the question of how this can be represented in the query translation. You need a way of taking our original sequence of words and creating a sequence of word/length pairs, effectively. Then within the clauses that can use those range variables, you need to access the relevant item within the pair. The following listing shows how listing 3.21 is translated by the compiler using an anonymous type to represent the pair of values.
words.Select(word => new { word, length = word.Length })
     .Where(tmp => tmp.length > 4)
     .OrderBy(tmp => tmp.length)
     .Select(tmp => string.Format("{0}: {1}", tmp.length, tmp.word.ToUpper()));
The name tmp here isn’t part of the query translation. The specification uses * instead, and there’s no indication of what name should be given to the parameter when building an expression tree representation of the query. The name doesn’t matter because you don’t see it when you write the query. This is called a transparent identifier.
I’m not going into all the details of query translation. That could be a whole chapter on its own. But I wanted to bring up transparent identifiers for two reasons. First, if you’re aware of how extra range variables are introduced, you won’t be surprised when you see them if you ever decompile a query expression. Second, they provide the biggest motivation for using query expressions, in my experience.
Query expressions can be appealing, but they're not always the simplest way of representing a query. They always have to start with a from clause and end with either a select or a group…by clause. That sounds reasonable, but it means that if you want a query that performs a single filtering operation, for example, you end up with quite a lot of baggage. If you take just the filtering part of our word-based query, you'd have the following query expression:
from word in words
where word.Length > 4
select word
Compare that with the method syntax version of the query:
words.Where(word => word.Length > 4)
They both compile to the same code,[5] but I’d use the second syntax for such a simple query.
[5] The compiler has special handling for select clauses that select just the current query item.
There’s no single ubiquitous term for not using query expression syntax. I’ve seen it called method syntax, dot syntax, fluent syntax, and lambda syntax, to name just four. I’ll call it method syntax consistently, but if you hear other terms for it, don’t try to look for a subtle difference in meaning.
Even when the query gets a little more complicated, method syntax can be more flexible. Many methods are available within LINQ that have no corresponding query expression syntax, including overloads of Select and Where that present the index of the item within the sequence as well as the item itself. Additionally, if you want a method call at the end of the query (for example, ToList() to materialize the result as a List<T>), you have to put the whole query expression in parentheses, whereas with method syntax you add the call on the end.
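To make those last two points concrete, here's a small sketch (the word list is invented for illustration). The (item, index) overload of Where has no query expression counterpart, and the query expression needs wrapping parentheses before a trailing ToList call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Demo
{
    static void Main()
    {
        List<string> words = new List<string> { "keys", "coat", "laptop", "bottle" };

        // The (item, index) overload of Where has no query expression
        // equivalent: this keeps the words at even positions (0, 2, ...).
        List<string> atEvenPositions = words
            .Where((word, index) => index % 2 == 0)
            .ToList();

        // With query expression syntax, a trailing method call forces
        // parentheses around the whole query.
        List<string> longWords = (from word in words
                                  where word.Length > 4
                                  select word.ToUpper()).ToList();

        Console.WriteLine(string.Join(", ", atEvenPositions)); // keys, laptop
        Console.WriteLine(string.Join(", ", longWords));       // LAPTOP, BOTTLE
    }
}
```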
I’m not as down on query expressions as that may sound. In many cases, there’s no clear winner between the two syntax options, and I’d probably include our earlier filter, order, project example in that set. Query expressions really shine when the compiler is doing more work for you by handling all those transparent identifiers. You can do it all by hand, of course, but I’ve found that building up anonymous types as results and deconstructing them in each subsequent step gets annoying quickly. Query expressions make all of that much easier.
The upshot of all of this is that I strongly recommend that you become comfortable in both styles of query. If you tie yourself to always using query expressions or never using query expressions, you’ll be missing out on opportunities to make your code more readable. We’ve covered all the features in C# 3, but I’m going to take a moment to step back and show how they fit together to form LINQ.
I’m not going to attempt to cover the various LINQ providers available these days. The LINQ technology I use most (by far) is LINQ to Objects, using the Enumerable static class and delegates. But in order to show how all the pieces come into play, let’s imagine that you have a query from something like Entity Framework. This isn’t real code that you can test, but it would be fine if you had a suitable database structure:
var products = from product in dbContext.Products
               where product.StockCount > 0
               orderby product.Price descending
               select new { product.Name, product.Price };
In this single example of a mere four lines, all of these features are used:

- Implicitly typed local variables (var), required because the result is a sequence of an anonymous type
- Query expressions, translated by the compiler into method calls
- Lambda expressions, produced by that translation and converted into expression trees for the provider to inspect
- Extension methods, which let those Where, OrderByDescending, and Select calls resolve even though the query source doesn't declare them
- Anonymous types, used in the select clause to project just the name and price
Take away any one of these features, and LINQ would be significantly less useful. Sure, you could have in-memory collection processing without expression trees. You could write readable simple queries without query expressions. You could have dedicated classes with all the relevant methods without using extension methods. But it all fits together beautifully.