Chapter 2. C# 2

This chapter covers

  • Using generic types and methods for flexible, safe code
  • Expressing the absence of information with nullable value types
  • Constructing delegates relatively easily
  • Implementing iterators without writing boilerplate code

If your experience with C# goes far enough back, this chapter will be a reminder of just how far we’ve come and a prompt to be grateful for a dedicated and smart language design team. If you’ve never programmed C# without generics, you may end up wondering how C# ever took off without these features.[1] Either way, you may still find features here that you weren’t aware of, or details you’d never considered.

1

For me, the answer to this one is simple: C# 1 was a more productive language for many developers than Java was at the time.

It’s been more than 10 years since C# 2 was released (with Visual Studio 2005), so it can be hard to get excited about features in the rearview mirror. You shouldn’t underestimate how important its release was at the time. It was also painful: the upgrade from C# 1 and .NET 1.x to C# 2 and .NET 2.0 took a long time to roll through the industry. Subsequent evolutions have been much quicker. The first feature from C# 2 is the one almost all developers consider to be the most important: generics.

2.1. Generics

Generics allow you to write general-purpose code that’s type safe at compile time using the same type in multiple places without knowing what that type is beforehand. When generics were first introduced, their primary use was for collections, but in modern C# code, they crop up everywhere. They’re probably most heavily used for the following:

  • Collections (they’re just as useful in collections as they ever were)
  • Delegates, particularly in LINQ
  • Asynchronous code, where a Task<T> is a promise of a future value of type T
  • Nullable value types, which I’ll talk about more in section 2.2

This isn’t the limit of their usefulness by any means, but even those four bullets mean that C# programmers use generics on a daily basis. Collections provide the simplest way of explaining the benefits of generics, because you can look at collections in .NET 1 and compare them with the generic collections in .NET 2.

2.1.1. Introduction by example: Collections before generics

.NET 1 had three broad kinds of collections:

  • Arrays—These have direct language and runtime support. The size is fixed at initialization.
  • Object-based collections—Values (and keys where relevant) are described in the API by using System.Object. These have no collection-specific language or runtime support, although language features such as indexers and foreach statements can be used with them. ArrayList and Hashtable are the most commonly used examples.
  • Specialized collections—Values are described in the API with a specific type, and the collection can be used for only that type. StringCollection is a collection of strings, for example; its API looks like ArrayList’s but uses String instead of Object for anything referring to a value.

Arrays and specialized collections are statically typed, by which I mean that the API prevents you from putting the wrong kind of value in a collection, and when you fetch a value from the collection, you don’t need to cast the result back to the type you expect it to be.

Note

Reference type arrays are only mostly safe when storing values because of array covariance. I view array covariance as an early design mistake that’s beyond the scope of this book. Eric Lippert wrote about this at http://mng.bz/gYPv as part of his series of blog posts on covariance and contravariance.

Let’s make this concrete: suppose you want to create a collection of strings in one method (GenerateNames) and print those strings out in another method (PrintNames). You’ll look at three options to keep the collection of names—arrays, ArrayList, and StringCollection—and weigh the pros and cons of each. The code looks similar in each case (particularly for PrintNames), but bear with me. We’ll start with arrays.

Listing 2.1. Generating and printing names by using arrays
static string[] GenerateNames()
{
    string[] names = new string[4];      1
    names[0] = "Gamma";
    names[1] = "Vlissides";
    names[2] = "Johnson";
    names[3] = "Helm";
    return names;
}

static void PrintNames(string[] names)
{
    foreach (string name in names)
    {
        Console.WriteLine(name);
    }
}

  • 1 Size of array needs to be known at creation time

I haven’t used an array initializer here, because I want to mimic the situation where the names are discovered only one at a time, such as when reading them from a file. Notice that you need to allocate the array to be the right size to start with, though. If you really were reading from a file, you’d either need to find out how many names there were before you started, or you’d need to write more-complicated code. For example, you could allocate one array to start with, copy the contents to a larger array if the first one filled up, and so on. You’d then need to consider creating a final array of just the right size if you ended up with an array larger than the exact number of names.

The code needed to keep track of the collection’s size so far, reallocate the array when it fills up, and trim the final result is repetitive, and it can be encapsulated in a type. As it happens, that’s just what ArrayList does.

Listing 2.2. Generating and printing names by using ArrayList
static ArrayList GenerateNames()
{
    ArrayList names = new ArrayList();
    names.Add("Gamma");
    names.Add("Vlissides");
    names.Add("Johnson");
    names.Add("Helm");
    return names;
}
static void PrintNames(ArrayList names)
{
    foreach (string name in names)       1
    {
        Console.WriteLine(name);
    }
}

  • 1 What happens if the ArrayList contains a nonstring?

That’s cleaner in terms of our GenerateNames method: you don’t need to know how many names you have before you start adding to the collection. But equally, there’s nothing to stop you from adding a nonstring to the collection; the type of the ArrayList.Add parameter is just Object.

Furthermore, although the PrintNames method looks safe in terms of types, it’s not. The collection can contain any kind of object reference. What would you expect to happen if you added a completely different type (a WebRequest, as an odd example) to the collection, and then tried to print it? The foreach loop hides an implicit cast, from object to string, because of the type of the name variable. That cast can fail in the normal way with an InvalidCastException. Therefore, you’ve fixed one problem but caused another. Is there anything that solves both of these?

Listing 2.3. Generating and printing names by using StringCollection
static StringCollection GenerateNames()
{
    StringCollection names = new StringCollection();
    names.Add("Gamma");
    names.Add("Vlissides");
    names.Add("Johnson");
    names.Add("Helm");
    return names;
}

static void PrintNames(StringCollection names)
{
    foreach (string name in names)
    {
        Console.WriteLine(name);
    }
}

Listing 2.3 is identical to listing 2.2 except for replacing ArrayList with StringCollection everywhere. That’s the whole point of StringCollection: it should feel like a pleasant general-purpose collection but specialized to only handle strings. The parameter type of StringCollection.Add is String, so you can’t add a WebRequest to it through some odd bug in our code. The resulting effect is that when you print the names, you can be confident that the foreach loop won’t encounter any nonstring references. (You could still see a null reference, admittedly.)

That’s great if you always need only strings. But if you need a collection of some other type, you have to either hope that there’s already a suitable collection type in the framework or write one yourself. This was such a common task that there’s a System.Collections.CollectionBase abstract class to make the work somewhat less repetitive. There are also code generators to avoid having to write it all by hand.

That solves both problems from the previous solution, but the cost of having all these extra types around is way too high. There’s a maintenance cost in keeping them up-to-date as the code generator changes. There are efficiency costs in terms of compilation time, assembly size, JITting time, and keeping the code in memory. Most important, there’s a human cost in keeping track of all the collection classes available.

Even if those costs weren’t too high, you’d be missing the ability to write a method that can work on any collection type in a statically typed way, potentially using the collection’s element type in another parameter or in the return type. For example, say you want to write a method to create a copy of the first N elements of a collection into a new one, which is then returned. You could write a method that returns an ArrayList, but that loses the goodness of static typing. If you pass in a StringCollection, you’d want a StringCollection back. The string aspect is part of the input to the method, which then needs to be propagated to the output as well. You had no way of expressing that in the language when using C# 1. Enter generics.
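
Before looking at the generic solution, it’s worth sketching roughly the best you could do in C# 1. The method name is mine, and notice that the element type has disappeared from the signature entirely:

static ArrayList CopyAtMost(ArrayList input, int maxElements)
{
    int actualCount = Math.Min(input.Count, maxElements);
    ArrayList ret = new ArrayList(actualCount);
    for (int i = 0; i < actualCount; i++)
    {
        ret.Add(input[i]);       // Everything is just Object; callers must cast on the way out
    }
    return ret;
}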

2.1.2. Generics save the day

Let’s get straight to the solution for our GenerateNames/PrintNames code and use the List<T> generic type. List<T> is a collection in which T is the element type of the collection—string, in our case. You can replace StringCollection with List<string> everywhere.[2]

2

I’m deliberately not going into the possibility of using interfaces for return types and parameters. That’s an interesting topic, but I don’t want to distract you from generics.

Listing 2.4. Generating and printing names with List<T>
static List<string> GenerateNames()
{
    List<string> names = new List<string>();
    names.Add("Gamma");
    names.Add("Vlissides");
    names.Add("Johnson");
    names.Add("Helm");
    return names;
}

static void PrintNames(List<string> names)
{
    foreach (string name in names)
    {
        Console.WriteLine(name);
    }
}

List<T> solves all the problems we talked about before:

  • You don’t need to know the size of the collection beforehand, unlike with arrays.
  • The exposed API uses T everywhere it needs to refer to the element type, so you know that a List<string> will contain only string references. You’ll get a compile-time error if you try to add anything else, unlike with ArrayList.
  • You can use it with any element type without worrying about generating code and managing the result, unlike with StringCollection and similar types.

Generics also solve the problem of expressing an element type as an input to a method. To delve into that aspect more deeply, you’ll need more terminology.

Type parameters and type arguments

The terms parameter and argument predate generics in C# and have been used in other languages for decades. A method declares its inputs as parameters, and they’re provided by calling code in the form of arguments. Figure 2.1 shows how the two relate to each other.

Figure 2.1. Relationship between method parameters and arguments

The values of the arguments are used as the initial values for the parameters within the method. In generics, you have type parameters and type arguments, which are the same idea but applied to types. The declaration of a generic type or method includes type parameters in angle brackets after the name. Within the body of the declaration, the code can use the type parameter as a normal type (just one it doesn’t know much about).

The code using the generic type or method then specifies the type arguments in angle brackets after the name as well. Figure 2.2 shows this relationship in the context of List<T>.

Figure 2.2. Relationship between type parameters and type arguments

Now imagine the complete API of List<T>: all the method signatures, properties, and so on. If you’re using the list variable shown in the figure, any T that appears in the API becomes string. For example, the Add method in List<T> has the following signature:

public void Add(T item)

But if you type list.Add( into Visual Studio, IntelliSense will prompt you as if the item parameter had been declared with a type of string. If you try to pass in an argument of another type, it will result in a compile-time error.
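
For example, given a list variable of type List<string>, the compiler enforces the element type directly:

List<string> list = new List<string>();
list.Add("text");          // Fine: the argument matches the type argument of string
// list.Add(new object()); // Compile-time error: cannot convert from object to string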

Although figure 2.2 refers to a generic class, methods can be generic as well. The method declares type parameters, and those type parameters can be used within other parts of the method signature. Method type parameters are often used as type arguments to other types within the signature. The following listing shows a solution to the method you couldn’t implement earlier: something to create a new collection containing the first N elements of an existing one but in a statically typed way.

Listing 2.5. Copying elements from one collection to another
public static List<T> CopyAtMost<T>(                         1
    List<T> input, int maxElements)                          1
{
    int actualCount = Math.Min(input.Count, maxElements);
    List<T> ret = new List<T>(actualCount);                  2
    for (int i = 0; i < actualCount; i++)
    {
        ret.Add(input[i]);
    }
    return ret;
}

static void Main()
{
    List<int> numbers = new List<int>();
    numbers.Add(5);
    numbers.Add(10);
    numbers.Add(20);

    List<int> firstTwo = CopyAtMost<int>(numbers, 2);        3
    Console.WriteLine(firstTwo.Count);
}

  • 1 Method declares a type parameter T and uses it in parameters and return type.
  • 2 Type parameter used in method body
  • 3 Call to method using int as the type parameter

Plenty of generic methods use the type parameter only once in the signature[3] and without it being a type argument to any generic types. But the ability to use a type parameter to express a relationship between the types of regular parameters and the return type is a huge part of the power of generics.

3

Although it’s valid to write a generic method that doesn’t use the type parameter anywhere else in the signature, that’s rarely useful.

Likewise, generic types can use their type parameters as type arguments when declaring a base class or an implemented interface. For example, the List<T> type implements the IEnumerable<T> interface, so the class declaration could be written like this:

public class List<T> : IEnumerable<T>
Note

In reality, List<T> implements multiple interfaces; this is a simplified form.

Arity of generic types and methods

Generic types or methods can declare multiple type parameters by separating them with commas within the angle brackets. For example, the generic equivalent of the .NET 1 Hashtable class is declared like this:

public class Dictionary<TKey, TValue>

The generic arity of a declaration is the number of type parameters it has. To be honest, this term is more useful to authors writing about generics than to developers in everyday coding, but I’d argue it’s still worth knowing. You can think of a nongeneric declaration as one with generic arity 0.

The generic arity of a declaration is effectively part of what makes it unique. As an example, I’ve already referred to the IEnumerable<T> interface introduced in .NET 2.0, but that’s a distinct type from the nongeneric IEnumerable interface that was already part of .NET 1.0. Likewise, you can write methods with the same name but a different generic arity, even if their signatures are otherwise the same:

public void Method() {}            1
public void Method<T>() {}         2
public void Method<T1, T2>() {}    3

  • 1 Nongeneric method (generic arity 0)
  • 2 Method with generic arity 1
  • 3 Method with generic arity 2

When declaring types with different generic arity, the types don’t have to be of the same kind, although they usually are. As an extreme example, consider these type declarations that can all coexist in one highly confusing assembly:

public enum IAmConfusing {}
public class IAmConfusing<T> {}
public struct IAmConfusing<T1, T2> {}
public delegate void IAmConfusing<T1, T2, T3>();
public interface IAmConfusing<T1, T2, T3, T4> {}

Although I’d strongly discourage code like the above, one reasonably common pattern is to have a nongeneric static class providing helper methods that refer to other generic types with the same name (see section 2.5.2 for more about static classes). For example, you’ll see the Tuple class in section 2.1.4, which is used to create instances of the various generic Tuple classes.

Just as multiple types can have the same name but different generic arity, so can generic methods. It’s like creating overloads based on the parameters, except this is overloading based on the number of type parameters. Note that although the generic arity keeps declarations separate, type parameter names don’t. For example, you can’t declare two methods like this:

public void Method<TFirst>()  {}
public void Method<TSecond>() {}      1

  • 1 Compile-time error; can’t overload solely by type parameter name

These are deemed to have equivalent signatures, so they aren’t permitted under the normal rules of method overloading. You can write method overloads that use different type parameter names so long as the methods differ in other ways (such as the number of regular parameters), although I can’t remember ever wanting to do so.

While we’re on the subject of multiple type parameters, you can’t give two type parameters in the same declaration the same name just like you can’t declare two regular parameters the same name. For example, you can’t declare a method like this:

public void Method<T, T>() {}         1

  • 1 Compile-time error; duplicate type parameter T

It’s fine for two type arguments to be the same, though, and that’s often what you want. For example, to create a string-to-string mapping, you might use a Dictionary<string, string>.

The earlier example of IAmConfusing used an enum as the nongeneric type. That was no coincidence, because I wanted to use it to demonstrate my next point.

2.1.3. What can be generic?

Not all types or type members can be generic. For types, it’s reasonably simple, partly because relatively few kinds of types can be declared. Enums can’t be generic, but classes, structs, interfaces, and delegates all can be.

For type members, it’s slightly more confusing; some members may look like they’re generic because they use other generic types. Remember that a declaration is generic only if it introduces new type parameters.

Methods and nested types can be generic, but all of the following have to be nongeneric:

  • Fields
  • Properties
  • Indexers
  • Constructors
  • Events
  • Finalizers

As an example of how you might be tempted to think of a field as being generic even though it’s not, consider this generic class:

public class ValidatingList<TItem>
{
    private readonly List<TItem> items = new List<TItem>();     1
}

  • 1 Lots of other members

I’ve named the type parameter TItem simply to differentiate it from the T type parameter of List<T>. Here, the items field is of type List<TItem>. It uses the type parameter TItem as a type argument for List<T>, but that’s a type parameter introduced by the class declaration, not by the field declaration.

For most of these, it’s hard to conceive how the member could be generic. Occasionally, I’ve wanted to write a generic constructor or indexer, though, and the answer is almost always to write a generic method instead.
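
For example, you can’t declare a constructor that introduces its own type parameter, but a generic static factory method gets you the same effect (and benefits from the type inference described in the next section). The Wrapped class here is purely illustrative, not a framework type:

public sealed class Wrapped
{
    private readonly string text;

    private Wrapped(string text)
    {
        this.text = text;
    }

    public string Text { get { return text; } }

    // A "generic constructor" such as Wrapped<T>(T value) isn't legal,
    // but a generic static method is.
    public static Wrapped FromValue<T>(T value)
    {
        return new Wrapped(value.ToString());
    }
}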

Speaking of generic methods, I gave only a simplified description of type arguments earlier when I was describing the way generic methods are called. In some cases, the compiler can determine the type arguments for a call without you having to provide them in the source code.

2.1.4. Type inference for type arguments to methods

Let’s look back at the crucial parts of listing 2.5. You have a generic method declared like this:

public static List<T> CopyAtMost<T>(List<T> input, int maxElements)

Then, in the Main method, you declare a variable of type List<int> and later use that as an argument to the method:

List<int> numbers = new List<int>();
...
List<int> firstTwo = CopyAtMost<int>(numbers, 2);

I’ve highlighted the method call here. You need a type argument to the CopyAtMost call, because it has a type parameter. But you don’t have to specify that type argument in the source code. You can rewrite that code as follows:

List<int> numbers = new List<int>();
...
List<int> firstTwo = CopyAtMost(numbers, 2);

This is exactly the same method call in terms of the IL the compiler will generate. But you haven’t had to specify the type argument of int; the compiler inferred that for you. It did that based on your argument for the first parameter in the method. You’re using an argument of type List<int> as the value for a parameter of type List<T>, so T has to be int.

Type inference can use only the arguments you pass to a method, not what you do with the result. It also has to be complete; you either explicitly specify all the type arguments or none of them.
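
As a small illustration of the first point, the compiler won’t infer a type argument from the variable you assign the result to; the information has to come from the arguments themselves. The CreateEmptyList method here is just an example of mine:

static List<T> CreateEmptyList<T>()
{
    return new List<T>();
}

static void Demo()
{
    // List<int> numbers = CreateEmptyList();    // Compile-time error: no arguments to infer T from
    List<int> numbers = CreateEmptyList<int>();  // Fine: the type argument is specified explicitly
    Console.WriteLine(numbers.Count);
}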

Although type inference applies only to methods, it can be used to more easily construct instances of generic types. For example, consider the Tuple family of types introduced in .NET 4.0. This consists of a nongeneric static Tuple class and multiple generic classes: Tuple<T1>, Tuple<T1, T2>, Tuple<T1, T2, T3>, and so forth. The static class has a set of overloaded Create factory methods like this:

public static Tuple<T1> Create<T1>(T1 item1)
{
    return new Tuple<T1>(item1);
}

public static Tuple<T1, T2> Create<T1, T2>(T1 item1, T2 item2)
{
    return new Tuple<T1, T2>(item1, item2);
}

These look pointlessly trivial, but they allow type inference to be used where otherwise the type arguments would have to be explicitly specified when creating tuples. Instead of this

new Tuple<int, string, int>(10, "x", 20)

you can write this:

Tuple.Create(10, "x", 20)

This is a powerful technique to be aware of; it’s generally simple to implement and can make working with generic code a lot more pleasant.
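
The same pattern is easy to apply to your own generic types. As a sketch (the Pair types here are hypothetical rather than framework classes), a nongeneric static class can exist purely to host factory methods:

public static class Pair
{
    // The nongeneric class exists so that callers can rely on type inference.
    public static Pair<TFirst, TSecond> Create<TFirst, TSecond>(TFirst first, TSecond second)
    {
        return new Pair<TFirst, TSecond>(first, second);
    }
}

public class Pair<TFirst, TSecond>
{
    private readonly TFirst first;
    private readonly TSecond second;

    public Pair(TFirst first, TSecond second)
    {
        this.first = first;
        this.second = second;
    }

    public TFirst First { get { return first; } }
    public TSecond Second { get { return second; } }
}

Callers can then write Pair.Create(10, "x") instead of new Pair<int, string>(10, "x").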

I’m not going to go into the details of how generic type inference works. It’s changed a lot over time as the language designers figure out ways of making it work in more cases. Overload resolution and type inference are closely tied together, and they intersect with all kinds of other features (such as inheritance, conversions, and optional parameters in C# 4). This is the area of the specification I find the most complex,[4] and I couldn’t do it justice here.

4

I’m not alone in this. At the time of this writing, the spec for overload resolution is broken. Efforts to fix it for the C# 5 ECMA standard failed; we’re going to try again for the next edition.

Fortunately, this is one area where understanding the details wouldn’t help very much in day-to-day coding. In any particular situation, three possibilities exist:

  • Type inference succeeds and gives you the result you want. Hooray.
  • Type inference succeeds but gives you a result you didn’t want. Just explicitly specify type arguments or cast some of the arguments. For example, if you wanted a Tuple<int, object, int> from the preceding Tuple.Create call, you could specify the type arguments to Tuple.Create explicitly or just call new Tuple<int, object, int>(...) or call Tuple.Create(10, (object) "x", 20).
  • Type inference fails at compile time. Sometimes this can be fixed by casting some of your arguments. For example, the null literal doesn’t have a type, so type inference will fail for Tuple.Create(null, 50) but succeed for Tuple.Create((string) null, 50). Other times you just need to explicitly specify the type arguments.

For the last two cases, the option you pick rarely makes much difference to readability in my experience. Understanding the details of type inference can make it easier to predict what will work and what won’t, but it’s unlikely to repay the time invested in studying the specification. If you’re curious, I’d never actively discourage anyone from reading the specification. Just don’t be surprised when you find it alternates between feeling like a maze of twisty little passages, all alike, and a maze of twisty little passages, all different.

This alarmist talk of complicated language details shouldn’t detract from the convenience of type inference, though. C# is considerably easier to use because of its presence.

So far, all the type parameters we’ve talked about have been unconstrained. They could stand in for any type. That’s not always what you want, though; sometimes, you want only certain types to be used as type arguments for a particular type parameter. That’s where type constraints come in.

2.1.5. Type constraints

When a type parameter is declared by a generic type or method, it can also specify type constraints that restrict which types can be provided as type arguments. Suppose you want to write a method that formats a list of items and ensures that you format them in a particular culture instead of the default culture of the thread. The IFormattable interface provides a suitable ToString(string, IFormatProvider) method, but how can you make sure you have an appropriate list? You might expect a signature like this:

static void PrintItems(List<IFormattable> items)

But that would hardly ever be useful. You couldn’t pass a List<decimal> to it, for example, even though decimal implements IFormattable; a List<decimal> isn’t convertible to List<IFormattable>.

Note

We’ll go into the reasons for this more deeply in chapter 4, when we consider generic variance. For the moment, just treat this as a simple example for constraints.

What you need to express is that the parameter is a list of some element type, where the element type implements the IFormattable interface. The “some element type” part suggests that you might want to make the method generic, and “where the element type implements the IFormattable interface” is precisely the ability that type constraints give us. You add a where clause at the end of the method declaration, like this:

static void PrintItems<T>(List<T> items) where T : IFormattable

The way you’ve constrained T here doesn’t just change which values can be passed to the method; it also changes what you can do with a value of type T within the method. The compiler knows that T implements IFormattable, so it allows the IFormattable.ToString(string, IFormatProvider) method to be called on any T value.

Listing 2.6. Printing items in the invariant culture by using type constraints
static void PrintItems<T>(List<T> items) where T : IFormattable
{
    CultureInfo culture = CultureInfo.InvariantCulture;
    foreach (T item in items)
    {
        Console.WriteLine(item.ToString(null, culture));
    }
}

Without the type constraints, that ToString call wouldn’t compile; the only ToString method the compiler would know about for T is the one declared in System.Object.

Type constraints aren’t limited to interfaces. The following type constraints are available:

  • Reference type constraint—where T : class. The type argument must be a reference type. (Don’t be fooled by the use of the class keyword; it can be any reference type, including interfaces and delegates.)
  • Value type constraint—where T : struct. The type argument must be a non-nullable value type (either a struct or an enum). Nullable value types (described in section 2.2) don’t meet this constraint.
  • Constructor constraint—where T : new(). The type argument must have a public parameterless constructor. This enables the use of new T() within the body of the code to construct a new instance of T.
  • Conversion constraint—where T : SomeType. Here, SomeType can be a class, an interface, or another type parameter as shown here:

    • where T : Control
    • where T : IFormattable
    • where T1 : T2

Moderately complex rules indicate how constraints can be combined. In general, the compiler error message makes it obvious what’s wrong when you break these rules.
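
Among those rules: a class or struct constraint has to come first for any given type parameter, and new() has to come last. As an illustrative example of a valid combination (the method itself is mine, not from the framework):

static string CreateAndFormat<T>() where T : class, IFormattable, new()
{
    // new T() is permitted because of the new() constraint;
    // ToString(string, IFormatProvider) because of IFormattable.
    T value = new T();
    return value.ToString(null, CultureInfo.InvariantCulture);
}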

One interesting and reasonably common form of constraint uses the type parameter in the constraint itself:

public void Sort<T>(List<T> items) where T : IComparable<T>

The constraint uses T as the type argument to the generic IComparable<T> interface. This allows our sorting method to compare elements from the items parameter pairwise using the CompareTo method from IComparable<T>:

T first = ...;
T second = ...;
int comparison = first.CompareTo(second);
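
As a more complete sketch of how such a constraint might be used, here’s a simple insertion sort of my own (not how List<T>.Sort is really implemented):

static void SortInPlace<T>(List<T> items) where T : IComparable<T>
{
    for (int i = 1; i < items.Count; i++)
    {
        T current = items[i];
        int j = i - 1;
        // CompareTo is available on T because of the IComparable<T> constraint.
        while (j >= 0 && items[j].CompareTo(current) > 0)
        {
            items[j + 1] = items[j];
            j--;
        }
        items[j + 1] = current;
    }
}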

I’ve used interface-based type constraints more than any other kind, although I suspect what you use depends greatly on the kind of code you’re writing.

When multiple type parameters exist in a generic declaration, each type parameter can have an entirely different set of constraints as in the following example:

TResult Method<TArg, TResult>(TArg input)    1
    where TArg : IComparable<TArg>           2
    where TResult : class, new()             3

  • 1 Generic method with two type parameters, TArg and TResult
  • 2 TArg must implement IComparable<TArg>.
  • 3 TResult must be a reference type with a parameterless constructor.

We’ve nearly finished our whirlwind tour of generics, but I have a couple of topics left to describe. I’ll start with the two type-related operators available in C# 2.

2.1.6. The default and typeof operators

C# 1 already had the typeof() operator accepting a type name as its only operand. C# 2 added the default() operator and expanded the use of typeof slightly.

The default operator is easily described. The operand is the name of a type or type parameter, and the result is the default value for that type—the same value you’d get if you declared a field and didn’t immediately assign a value to it. For reference types, that’s a null reference; for non-nullable value types, it’s the “all zeroes” value (0, 0.0, 0.0m, false, the UTF-16 code unit with a numerical value of 0, and so on); and for nullable value types, it’s the null value for the type.

The default operator can be used with type parameters and with generic types with appropriate type arguments supplied (where those arguments can be type parameters, too). For example, in a generic method declaring a type parameter T, all of these are valid:

  • default(T)
  • default(int)
  • default(string)
  • default(List<T>)
  • default(List<List<string>>)

The type of the default operator is the type that’s named inside it. It’s most frequently used with generic type parameters, because otherwise you can usually specify the default value in a different way. For example, you might want to use the default value as the initial value for a local variable that may or may not be assigned a different value later. To make this concrete, here’s a simplistic implementation of a method that may be familiar to you:

public static T LastOrDefault<T>(IEnumerable<T> source)
{
    T ret = default(T);             1
    foreach (T item in source)
    {
        ret = item;                 2
    }
    return ret;                     3
}

  • 1 Declare a local variable and assign the default value of T to it.
  • 2 Replace the local variable value with the current one in the sequence.
  • 3 Return the last-assigned value.

The typeof operator is slightly more complex. There are five broad cases to consider:

  • No generics involved at all; for example, typeof(string)
  • Generics involved but no type parameters; for example, typeof(List<int>)
  • Just a type parameter; for example, typeof(T)
  • Generics involved using a type parameter in the operand; for example, typeof(List<TItem>) within a generic method declaring a type parameter called TItem
  • Generics involved but no type arguments specified in the operand; for example, typeof(List<>)

The first of these is simple and hasn’t changed at all. All the others need a little more care, and the last introduces a new kind of syntax. The typeof operator is still defined to return a Type value, so what should it return in each of these cases? The Type class was augmented to know about generics. There are multiple situations to be considered; the following are a few examples:

  • If you list the types within the assembly containing List<T>, for example, you’d expect to get List<T> without any specific type argument for T. It’s a generic type definition.
  • If you call GetType() on a List<int> object, you’d want to get a type that has the information about the type argument.
  • If you ask for the base type of the generic type definition of a class declared as
    class StringDictionary<T> : Dictionary<string, T>
    you’d end up with a type with one “concrete” type argument (string, for the TKey type parameter of Dictionary<TKey, TValue>) and one type argument that’s still a type parameter (T, for the TValue type parameter).

Frankly, it’s all very confusing, but that’s inherent in the problem domain. Lots of methods and properties in Type let you go from a generic type definition to a type with all the type arguments provided, or vice versa, for example.

Let’s come back to the typeof operator. The simplest example to understand is typeof(List<int>). That returns the Type representing List<T> with a type argument of int just as if you’d called new List<int>().GetType().

The next case, typeof(T), returns whatever the type argument for T is at that point in the code. This will always be a closed, constructed type, which is the specification’s way of saying it’s a real type with no type parameters involved anywhere. Although in most places I try to explain terminology thoroughly, the terminology around generics (open, closed, constructed, bound, unbound) is confusing and almost never useful in real life. We’ll need to talk about closed, constructed types later, but I won’t touch on the rest.

It’s easiest to demonstrate what I mean about typeof(T), and you can look at typeof(List<T>) in the same example. The following listing declares a generic method that prints the result of both typeof(T) and typeof(List<T>) to the console and then calls that method with two different type arguments.

Listing 2.7. Printing the result of the typeof operator
static void PrintType<T>()
{
    Console.WriteLine("typeof(T) = {0}", typeof(T));               1
    Console.WriteLine("typeof(List<T>) = {0}", typeof(List<T>));
}

static void Main()
{
    PrintType<string>();                                           2
    PrintType<int>();                                              3
}

  • 1 Prints both typeof(T) and typeof(List<T>)
  • 2 Calls the method with a type argument of string
  • 3 Calls the method with a type argument of int

The result of listing 2.7 is shown here:

typeof(T) = System.String
typeof(List<T>) = System.Collections.Generic.List`1[System.String]
typeof(T) = System.Int32
typeof(List<T>) = System.Collections.Generic.List`1[System.Int32]

The important point is that when you’re running in a context where the type argument for T is string (during the first call), the result of typeof(T) is the same as typeof(string). Likewise, the result of typeof(List<T>) is the same as the result of typeof(List<string>). When you call the method again with int as the type argument, you get the same results as for typeof(int) and typeof(List<int>). Whenever code is executing within a generic type or method, the type parameter always refers to a closed, constructed type.

Another takeaway from this output is the format of the name of a generic type when you’re using reflection. The List`1 indicates that this is a generic type called List with generic arity 1 (one type parameter), and the type arguments are shown in square brackets afterward.

The final bullet in our earlier list was typeof(List<>). That appears to be missing a type argument altogether. This syntax is valid only in the typeof operator and refers to the generic type definition. The syntax for types with generic arity 1 is just TypeName<>; for each additional type parameter, you add a comma within the angle brackets. To get the generic type definition for Dictionary<TKey, TValue>, you’d use typeof(Dictionary<,>). To get the definition for Tuple<T1, T2, T3>, you’d use typeof(Tuple<,,>).

Understanding the difference between a generic type definition and a closed, constructed type is crucial for our final topic: how types are initialized and how type-wide (static) state is handled.

2.1.7. Generic type initialization and state

As you saw when using the typeof operator, List<int> and List<string> are effectively different types that are constructed from the same generic type definition. That’s not only true for how you use the types but also true for how types are initialized and how static fields are handled. Each closed, constructed type is initialized separately and has its own independent set of static fields. The following listing demonstrates this with a simple (and not thread-safe) generic counter.

Listing 2.8. Exploring static fields in generic types
class GenericCounter<T>
{
    private static int value;                    1

    static GenericCounter()
    {
        Console.WriteLine("Initializing counter for {0}", typeof(T));
    }

    public static void Increment()
    {
        value++;
    }

    public static void Display()
    {
        Console.WriteLine("Counter for {0}: {1}", typeof(T), value);
    }
}

class GenericCounterDemo
{
    static void Main()
    {
        GenericCounter<string>.Increment();      2
        GenericCounter<string>.Increment();
        GenericCounter<string>.Display();
        GenericCounter<int>.Display();           3
        GenericCounter<int>.Increment();
        GenericCounter<int>.Display();
    }
}

  • 1 One field per closed, constructed type
  • 2 Triggers initialization for GenericCounter<string>
  • 3 Triggers initialization for GenericCounter<int>

The output of listing 2.8 is as follows:

Initializing counter for System.String
Counter for System.String: 2
Initializing counter for System.Int32
Counter for System.Int32: 0
Counter for System.Int32: 1

There are two results to focus on in that output. First, the GenericCounter<string> value is independent of GenericCounter<int>. Second, the static constructor is run twice: once for each closed, constructed type. If you didn’t have a static constructor, there would be fewer timing guarantees for exactly when each type would be initialized, but essentially you can regard GenericCounter<string> and GenericCounter<int> as independent types.

To complicate things further, generic types can be nested within other generic types. When that occurs, there’s a separate type for each combination of type arguments. For example, consider classes like this:

class Outer<TOuter>
{
    class Inner<TInner>
    {
        static int value;
    }    
}

Using int and string as type arguments, the following types are independent and each has its own value field:

  • Outer<string>.Inner<string>
  • Outer<string>.Inner<int>
  • Outer<int>.Inner<string>
  • Outer<int>.Inner<int>

In most code this occurs relatively rarely, and it’s simple enough to handle when you’re aware that what’s important is the fully specified type, including any type arguments for both the leaf type and any enclosing types.

That’s it for generics, which is by far the biggest single feature in C# 2 and a huge improvement over C# 1. Our next topic is nullable value types, which are firmly based on generics.

2.2. Nullable value types

Tony Hoare introduced the null reference into Algol W in 1965 and has subsequently called it his “billion-dollar mistake.” Countless developers have become frustrated when their code throws NullReferenceException (.NET), NullPointerException (Java), or other equivalents. There are canonical Stack Overflow questions with hundreds of other questions pointing at them because it’s such a common problem. If nullity is so bad, why was more of it introduced in C# 2 and .NET 2.0 in the form of nullable value types? Before we look at the implementation of the feature, let’s consider the problem it’s trying to solve and the previous workarounds.

2.2.1. Aim: Expressing an absence of information

Sometimes it’s useful to have a variable to represent some information, but that information won’t be present in every situation. Here are a few simple examples:

  • You’re modeling a customer order, including the company’s details, but the customer may not be ordering on behalf of a company.
  • You’re modeling a person, including their date of birth and date of death, but the person may still be alive.
  • You’re modeling a filter for products, including a price range, but the customer may not have specified a maximum price.

These are all one specific form of wanting to represent the absence of a value; you can have complete information but still need to model the absence. In other situations, you may have incomplete information. In the second example, you may not know the person’s date of birth not because they weren’t born, but because your system doesn’t have that information. Sometimes you need to represent the difference between “known to be absent” and “unknown” within your data, but often just the absence of information is enough.

For reference types, you already have a way of representing an absence of information: a null reference. If you have a Company class and your Order class has a reference to the company associated with the order, you can set it to null if the customer doesn’t specify a company.

For value types in C# 1, there was no equivalent. There were two common ways of representing this:

  • Use a reserved value to represent missing data. For example, you might use decimal.MaxValue in a price filter to represent “no maximum price specified.”
  • Keep a separate Boolean flag to indicate whether another field has a real value or the value should be ignored. So long as you check the flag before using the other field, its value is irrelevant in the absent case.

Neither of these is ideal. The first approach reduces the set of valid values (not so bad for decimal but more of a problem for byte, where it’s more likely that you need the full range). The second approach leads to a lot of tedious and repetitive logic.
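
As a sketch of what the second approach tends to look like in practice (the type and member names here are mine), consider a price filter written in C# 1 style:

public class ProductFilter
{
    private decimal maxPrice;
    private bool hasMaxPrice;                 // The flag that says whether maxPrice is meaningful

    public void SetMaxPrice(decimal value)
    {
        maxPrice = value;
        hasMaxPrice = true;
    }

    public bool Matches(decimal price)
    {
        // Forgetting to check hasMaxPrice would silently treat 0.0m as the maximum price.
        return !hasMaxPrice || price <= maxPrice;
    }
}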

More important, both are error prone. Both require you to perform a check before using the value that might or might not be valid. If you don’t perform that check, your code will proceed using inappropriate data. It’ll silently do the wrong thing and quite possibly propagate the mistake to other parts of the system. Silent failure is the worst kind, because it can be hard to track down and hard to undo. I prefer nice loud exceptions that stop the broken code in its tracks.

Nullable value types encapsulate the second approach shown previously: they keep an extra flag along with the value to say whether it should be used. The encapsulation is key here; the simplest way of using the value is also a safe one because it throws an exception if you try to use it inappropriately. The consistent use of a single type to represent a possibly missing value enables the language to make our lives easier, and library authors have an idiomatic way of representing it in their API surface, too.

With that conceptual introduction out of the way, let’s look at what the framework and the CLR provide in terms of nullable value types. After you’ve built that foundation, I’ll show you the extra features C# has adopted to make it easy to work with them.

2.2.2. CLR and framework support: The Nullable<T> struct

The core of nullable value type support is the Nullable<T> struct. A primitive version of Nullable<T> would look like this:

public struct Nullable<T> where T : struct               1
{
    private readonly T value;
    private readonly bool hasValue;

    public Nullable(T value)                             2
    {
        this.value = value;
        this.hasValue = true;
    }

    public bool HasValue { get { return hasValue; } }    3
    
    public T Value                                       4
    {                                                    4
        get                                              4
        {                                                4
            if (!hasValue)                               4
            {                                            4
                throw new InvalidOperationException();   4
            }                                            4
            return value;                                4
        }                                                4
    }                                                    4
}

  • 1 Generic struct with T constrained to be a non-nullable value type
  • 2 Constructor to provide a value
  • 3 Property to check whether there’s a real value
  • 4 Access to the value, throwing an exception if it’s missing

As you can see, the only declared constructor sets hasValue to true, but like all structs, there’s an implicit parameterless constructor that will leave hasValue as false and value as the default value of T:

Nullable<int> nullable = new Nullable<int>();
Console.WriteLine(nullable.HasValue);           1

  • 1 Prints False

The where T : struct constraint on Nullable<T> allows T to be any value type except another Nullable<T>. It works with primitive types, enums, system-provided structs, and user-defined structs. All of the following are valid:

  • Nullable<int>
  • Nullable<FileMode>
  • Nullable<Guid>
  • Nullable<LocalDate> (from Noda Time)

But the following are invalid:

  • Nullable<string> (string is a reference type)
  • Nullable<int[]> (arrays are reference types, even if the element type is a value type)
  • Nullable<ValueType> (ValueType itself isn’t a value type)
  • Nullable<Enum> (Enum itself isn’t a value type)
  • Nullable<Nullable<int>> (Nullable<int> is nullable)
  • Nullable<Nullable<Nullable<int>>> (trying to nest the nullability further doesn’t help)

The type T is also known as the underlying type of Nullable<T>. For example, the underlying type of Nullable<int> is int.

With just this part in place and no extra CLR, framework, or language support, you can already use the type safely to display the maximum price filter described earlier:

public void DisplayMaxPrice(Nullable<decimal> maxPriceFilter)
{
    if (maxPriceFilter.HasValue)
    {
        Console.WriteLine("Maximum price: {0}", maxPriceFilter.Value);
    }
    else
    {
        Console.WriteLine("No maximum price set.");
    }
}

That’s well-behaved code that checks before using the value, but what about poorly written code that forgets to check first or checks the wrong thing? You can’t accidentally use an inappropriate value; if you try to access maxPriceFilter.Value when its HasValue property is false, an exception will be thrown.

Note

I know I made this point earlier, but I think it’s important enough to restate: progress doesn’t come just from making it easier to write correct code; it also comes from making it harder to write broken code or making the consequences less severe.

The Nullable<T> struct has methods and operators available, too:

  • The parameterless GetValueOrDefault() method will return the value in the struct or the default value for the type if HasValue is false.
  • The parameterized GetValueOrDefault(T defaultValue) method will return the value in the struct or the specified default value if HasValue is false.
  • The Equals(object) and GetHashCode() methods declared in object are overridden in a reasonably obvious way, first comparing the HasValue properties and then comparing the Value properties for equality if HasValue is true for both values.
  • There’s an implicit conversion from T to Nullable<T>, which always succeeds and returns a value where HasValue is true. This is equivalent to calling the parameterized constructor.
  • There’s an explicit conversion from Nullable<T> to T, which either returns the encapsulated value (if HasValue is true) or throws an InvalidOperationException (if HasValue is false). This is equivalent to using the Value property.

I’ll return to the topic of conversions when I talk about language support. So far, the only place you’ve seen where the CLR needs to understand Nullable<T> is to enforce the struct type constraint. Another aspect of CLR behavior is nullable-specific, though: boxing.

Boxing behavior

Nullable value types behave differently than non-nullable value types when it comes to boxing. When a value of a non-nullable value type is boxed, the result is a reference to an object of a type that’s the boxed form of the original type. Say, for example, you write this:

int x = 5;
object o = x;

The value of o is a reference to an object of type “boxed int.” The difference between boxed int and int isn’t normally visible via C#. If you call o.GetType(), the Type returned will be equal to typeof(int), for example. Some other languages (such as C++/CLI) allow developers to differentiate between the original value type and its boxed equivalent.

Nullable value types have no boxed equivalent, however. The result of boxing a value of type Nullable<T> depends on the HasValue property:

  • If HasValue is false, the result is a null reference.
  • If HasValue is true, the result is a reference to an object of type “boxed T.”

The following listing demonstrates both of these points.

Listing 2.9. The effects of boxing nullable value type values
Nullable<int> noValue = new Nullable<int>();
object noValueBoxed = noValue;                    1
Console.WriteLine(noValueBoxed == null);          2

Nullable<int> someValue = new Nullable<int>(5);
object someValueBoxed = someValue;                3
Console.WriteLine(someValueBoxed.GetType());      4

  • 1 Boxes a value where HasValue is false
  • 2 Prints True: the result of boxing is a null reference.
  • 3 Boxes a value where HasValue is true
  • 4 Prints System.Int32: the result is a boxed int.

When you’re aware of this behavior, it’s almost always what you want. This has one bizarre side effect, however. The GetType() method declared on System.Object is nonvirtual, and the somewhat complex rules around when boxing occurs mean that if you call GetType() on a value type value, it always needs to be boxed first. Normally, that’s a little inefficient but doesn’t cause any confusion. With nullable value types, it’ll either cause a NullReferenceException or return the underlying non-nullable value type. The following listing shows examples of these.

Listing 2.10. Calling GetType on nullable values leads to surprising results
Nullable<int> noValue = new Nullable<int>();
// Console.WriteLine(noValue.GetType());           1

Nullable<int> someValue = new Nullable<int>(5);
Console.WriteLine(someValue.GetType());            2

  • 1 Would throw NullReferenceException
  • 2 Prints System.Int32, the same as if you’d used typeof(int)

You’ve seen framework support and CLR support, but the C# language goes even further to make nullable value types easier to work with.

2.2.3. Language support

It would’ve been possible for C# 2 to have shipped with the compiler knowing only about nullable value types when enforcing the struct type constraint. It would’ve been awful, but it’s useful to consider the absolute minimum support required in order to appreciate all the features that have been added to make nullable value types fit into the language more idiomatically. Let’s start with the simplest part: simplifying nullable value type names.

The ? type suffix

If you add a ? to the end of the name of a non-nullable value type, that’s precisely equivalent to using Nullable<T> for the same type. It works for the keyword shortcuts for the simple types (int, double, and so forth) as well as full type names. For example, these four declarations are precisely equivalent:

  • Nullable<int> x;
  • Nullable<Int32> x;
  • int? x;
  • Int32? x;

You can mix and match them however you like. The generated IL won’t change at all. In practice, I end up using the ? suffix everywhere, but other teams may have different conventions. For clarity, I’ve used Nullable<T> within the remainder of the text here, because the ? can become confusing when used in prose, but in code that’s rarely an issue.

That’s the simplest language enhancement, but the theme of allowing you to write concise code continues through the rest of this section. The ? suffix is about expressing a type easily; the next feature focuses on expressing a value easily.

The null literal

In C# 1, the expression null always referred to a null reference. In C# 2, that meaning is expanded to a null value: either a null reference or a value of a nullable value type where HasValue is false. This can be used for assignments, method arguments, comparisons—any manner of places. It’s important to understand that when it’s used for a nullable value type, it really does represent the value of that type where HasValue is false rather than being a null reference; if you try to work null references into your mental model of nullable value types, it’ll get confusing quickly. The following two lines are equivalent:

int? x = new int?();

int? x = null;

I typically prefer to use the null literal over explicitly calling the parameterless constructor (I’d write the second of the preceding lines rather than the first), but when it comes to comparisons, I’m ambivalent about the two options. For example, these two lines are equivalent:

if (x != null)

if (x.HasValue)

I suspect I’m not even consistent about which I use. I’m not advocating for inconsistency, but this is an area where it doesn’t hurt very much. You can always change your mind later with no compatibility concerns.

Conversions

You’ve already seen that Nullable<T> provides an implicit conversion from T to Nullable<T> and an explicit conversion from Nullable<T> to T. The language takes that set of conversions further by allowing certain conversions to chain together. Where there are two non-nullable value types S and T and there’s a conversion from S to T (for example, the conversion from int to decimal), the following conversions are also available:

  • Nullable<S> to Nullable<T> (implicit or explicit, depending on the original conversion)
  • S to Nullable<T> (implicit or explicit, depending on the original conversion)
  • Nullable<S> to T (always explicit)

These work in a reasonably obvious way by propagating null values and using the S to T conversion as required. This process of extending an operation to propagate nulls appropriately is called lifting.
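
For example, because there’s an implicit conversion from int to decimal, all of the following compile; this is a small demonstration of my own:

int? maybeInt = 5;
int plainInt = 10;
int? noInt = null;

decimal? fromNullable = maybeInt;        // int? to decimal?: implicit, the value is converted
decimal? fromPlain = plainInt;           // int to decimal?: implicit
decimal? fromNull = noInt;               // int? to decimal?: null in, null out

decimal backAgain = (decimal) maybeInt;  // int? to decimal: explicit; throws if maybeInt is null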

One point to note: it’s possible to explicitly provide conversions to both nullable and non-nullable types. LINQ to XML uses this to great effect. For example, there are explicit conversions from XElement to both int and Nullable<int>. Many operations in LINQ to XML will return a null reference if you ask them to find an element that doesn’t exist, and the conversion to Nullable<int> converts a null reference to a null value and propagates the nullity without throwing an exception. If you try to convert a null XElement reference to the non-nullable int type, however, an exception will be thrown. The existence of both conversions makes it easy to handle optional and required elements safely.

Conversions are one form of operator that can be built into C# or user-defined. Other operators defined on non-nullable types receive a similar sort of treatment in their nullable counterparts.

Lifted operators

C# allows the following operators to be overloaded:

  • Unary: + ++ - -- ! ~ true false
  • Binary:[5] + - * / % & | ^ << >>

    5

    The equality and relational operators are also binary operators, but they behave slightly differently from the others, hence their separation in this list.

  • Equality: == !=
  • Relational: < > <= >=

When these operators are overloaded for a non-nullable value type T, the Nullable<T> type has the same operators with slightly different operand and result types. These are called lifted operators whether they’re predefined operators, such as addition on numeric types, or user-defined operators, such as adding a TimeSpan to a DateTime. A few restrictions apply:

  • The true and false operators are never lifted. They’re incredibly rare in the first place, though, so this is no great loss.
  • Only operators with non-nullable value types for the operands are lifted.
  • For the unary and binary operators (other than equality and relational operators), the return type of the original operator has to be a non-nullable value type.
  • For the equality and relational operators, the return type of the original operator has to be bool.
  • The & and | operators on Nullable<bool> have separately defined behaviors, which we’ll consider presently.

For all the operators, the operand types become their nullable equivalents. For the unary and binary operators, the return type also becomes nullable, and a null value is returned if any of the operands is a null value. The equality and relational operators keep their non-nullable Boolean return types. For equality, two null values are considered equal, and a null value and any non-null value are considered different. The relational operators always return false if either operand is a null value. When neither of the operands is a null value, the operator of the non-nullable type is invoked in the obvious way.

All these rules sound more complicated than they are; for the most part, everything works as you probably expect it to. It’s easiest to see what happens with a few examples, and because int has so many predefined operators (and integers can be so easily expressed), it’s the natural demonstration type. Table 2.1 shows a number of expressions, the lifted operator signature, and the result. It’s assumed that there are variables four, five, and nullInt, each with type Nullable<int> and with the obvious values.

Table 2.1. Examples of lifted operators applied to nullable integers

Expression            Lifted operator            Result
-nullInt              int? -(int? x)             null
-five                 int? -(int? x)             -5
five + nullInt        int? +(int? x, int? y)     null
five + five           int? +(int? x, int? y)     10
four & nullInt        int? &(int? x, int? y)     null
four & five           int? &(int? x, int? y)     4
nullInt == nullInt    bool ==(int? x, int? y)    true
five == five          bool ==(int? x, int? y)    true
five == nullInt       bool ==(int? x, int? y)    false
five == four          bool ==(int? x, int? y)    false
four < five           bool <(int? x, int? y)     true
nullInt < five        bool <(int? x, int? y)     false
five < nullInt        bool <(int? x, int? y)     false
nullInt < nullInt     bool <(int? x, int? y)     false
nullInt <= nullInt    bool <=(int? x, int? y)    false

Possibly the most surprising line of the table is the last one: a null value isn't deemed less than or equal to another null value, even though two null values are deemed equal to each other (as per the seventh row)! This is odd, but in my experience it's unlikely to cause problems in real life. In the list of restrictions regarding operator lifting, I mentioned that Nullable<bool> works slightly differently from the other types; before we get to that, the short snippet below shows the surprising pair of comparisons in isolation.
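This is a trivial sketch of my own; the variable names are arbitrary:

int? a = null;
int? b = null;

Console.WriteLine(a == b);    // True: two null values are considered equal
Console.WriteLine(a <= b);    // False: relational operators return false if either operand is null
Console.WriteLine(a < b);     // False, for the same reason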

Nullable logic

Truth tables are often used to demonstrate Boolean logic with all possible input combinations and the result. Although the same approach can be used for Nullable<Boolean> logic, we have three values to consider (true, false, and null) for each input instead of just true and false. There are no conditional logical operators (the short-circuiting && and || operators) defined for Nullable<bool>, which makes life simpler.

Only the logical AND and inclusive OR operators (& and |, respectively) have special behavior. The other operators—unary logical negation (!) and exclusive OR (^)—follow the same rules as other lifted operators. For the sake of completeness, table 2.2 gives the truth table for all four valid Nullable<bool> logical operators. The results that would differ if the extra rules didn't exist are the ones where one operand is null but the result is still definite: false & null and null & false are false, and true | null and null | true are true.

Table 2.2. Truth table for Nullable<bool> operators

x        y        x & y    x | y    x ^ y    !x
true     true     true     true     false    false
true     false    false    true     true     false
true     null     null     true     null     false
false    true     false    true     true     true
false    false    false    false    false    true
false    null     false    null     null     true
null     true     null     true     null     null
null     false    false    null     null     null
null     null     null     null     null     null

If you find reasoning about rules easier to understand than looking up values in tables, the idea is that a null bool? value is in some senses a maybe. If you imagine that each null entry in the input side of the table is a variable instead, you’ll always get a null value on the output side of the table if the result depends on the value of that variable. For instance, looking at the third line of the table, the expression true & y will be true only if y is true, but the expression true | y will always be true whatever the value of y is, so the nullable results are null and true, respectively.
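A few concrete cases (my own snippet, not one of the chapter's listings) show that reasoning in action:

bool? maybe = null;

bool? and1 = true & maybe;     // null: the result depends on the unknown value
bool? or1 = true | maybe;      // true: the result is true whatever maybe turns out to be
bool? and2 = false & maybe;    // false: the result is false whatever maybe turns out to be
bool? or2 = false | maybe;     // null: the result depends on the unknown value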

When considering the lifted operators and particularly how nullable logic works, the language designers had two slightly contradictory sets of existing behavior: C# 1 null references and SQL NULL values. In many cases, these don’t conflict at all; C# 1 had no concept of applying logical operators to null references, so there was no problem in using the SQL-like results given earlier. The definitions you’ve seen may surprise some SQL developers, though, when it comes to comparisons. In standard SQL, the result of comparing two values (in terms of equality or greater than/less than) is always unknown if either value is NULL. The result in C# 2 is never null, and two null values are considered to be equal to each other.

Results of lifted operators are specific to C#

The lifted operators and conversions, along with the Nullable<bool> logic described in this section, are all provided by the C# compiler and not by the CLR or the framework itself. If you use ildasm on code that evaluates any of these nullable operators, you’ll find that the compiler has created all the appropriate IL to test for null values and dealt with them accordingly.

Different languages can behave differently on these matters, and this is definitely something to look out for if you need to port code between different .NET-based languages. For example, VB treats lifted operators far more like SQL, so the result of x < y is Nothing if x or y is Nothing.

Another familiar operator is now available with nullable value types, and it behaves as you’d probably expect it to if you consider your existing knowledge of null references and just tweak it to be in terms of null values.

The as operator and nullable value types

Prior to C# 2, the as operator was available only for reference types. As of C# 2, it can now be applied to nullable value types as well. The result is a value of that nullable type: the null value if the original reference was the wrong type or null or a meaningful value otherwise. Here’s a short example:

static void PrintValueAsInt32(object o)
{
    int? nullable = o as int?;
    Console.WriteLine(nullable.HasValue ?
                      nullable.Value.ToString() : "null");
}
...
PrintValueAsInt32(5);                   1
PrintValueAsInt32("some string");       2

  • 1 Prints 5
  • 2 Prints null

This allows you to safely convert from an arbitrary reference to a value in a single step, although you’d normally check whether the result is null afterward. In C# 1, you’d have had to use the is operator followed by a cast, which is inelegant; it’s essentially asking the CLR to perform the same type check twice.

Note

Using the as operator with nullable types is surprisingly slow. In most code, this is unlikely to matter (it’s not going to be slow compared with any I/O, for example), but it’s slower than is and then a cast in all the framework and compiler combinations I’ve tried.

C# 7 has an even better solution for most cases where I've used the as operator with nullable value types: pattern matching, which is described in chapter 12 and previewed briefly below. If your intended result type really is a Nullable<T>, though, the as operator is handy.
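As a quick peek ahead (this is C# 7 syntax, not C# 2), the earlier method could be written with a type pattern along these lines:

static void PrintValueAsInt32(object o)
{
    if (o is int value)
    {
        Console.WriteLine(value);
    }
    else
    {
        Console.WriteLine("null");
    }
}

Finally, C# 2 introduced an entirely new operator specifically for handling null values elegantly.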

The null-coalescing ?? operator

It’s reasonably common to want to use nullable value types—or indeed, reference types—and provide a sort of default value if a particular expression evaluates to null. C# 2 introduced the ?? operator, also known as the null-coalescing operator, for precisely this purpose.

?? is a binary operator that evaluates an expression of the form first ?? second by going through the following steps (roughly speaking):

  1. Evaluate first.
  2. If the result is non-null, that’s the result of the whole expression.
  3. Otherwise, evaluate second, and use that as the result of the whole expression.

I say roughly speaking because the formal rules in the specification have to deal with situations involving conversions between the types of first and second. These aren’t important in most uses of the operator, and I don’t intend to go through them. They’re easy to find in the specification if you need them.

One aspect of those rules is worth highlighting. If the type of the first operand is a nullable value type and the type of the second operand is the underlying type of the first operand, the type of the whole expression is that (non-nullable) underlying type. For example, this code is perfectly valid:

int? a = 5;
int b = 10;
int c = a ?? b;

Note that you’re assigning directly to c even though its type is the non-nullable int type. You can do this only because b is non-nullable, so you know that the overall result can’t be null. The ?? operator composes well with itself; an expression such as x ?? y ?? z will evaluate y only if x evaluates to null and will evaluate z only if both x and y evaluate to null.

Null values become even easier to work with—and more likely as expression results—in C# 6 with the ?. null conditional operator, as you’ll see in section 10.3. Combining ?. and ?? can be a powerful way of handling possible nulls at various points of execution. Like all techniques, this is best used in moderation. If you find your code’s readability going downhill, you might want to consider using multiple statements to avoid trying to do too much in one go.

That’s it for nullable value types in C# 2. We’ve now covered the two most important features of C# 2, but we have a couple of fairly large features still to talk about, along with a raft of smaller ones. Next up is delegates.

2.3. Simplified delegate creation

The basic purpose of delegates hasn’t changed since they were first introduced: to encapsulate a piece of code so that it can be passed around and executed as necessary in a type-safe fashion in terms of the return type and parameters. Back in the days of C# 1, that was almost always used for event handling or starting threads. This was mostly still the case when C# 2 was introduced in 2005. It was only in 2008 that LINQ helped C# developers feel comfortable with the idea of passing a function around for all kinds of reasons.

C# 2 brought three new ways of creating delegate instances as well as the ability to declare generic delegates, such as EventHandler<TEventArgs> and Action<T>. We’ll start with method group conversions.

2.3.1. Method group conversions

A method group refers to one or more methods with the same name. Every C# developer has been using them forever without necessarily thinking about it, because every method invocation uses one. For example, consider this trivial code:

Console.WriteLine("hello");

The expression Console.WriteLine is a method group; the compiler then looks at the arguments to work out which overload within that group should be invoked. Beyond method invocations, the only other place C# 1 used method groups was in delegate creation expressions, which were the only way the language provided to create a delegate instance. For example, say you have a method like this:

private void HandleButtonClick(object sender, EventArgs e)

Then you could create an EventHandler[6] instance like this:

6

For reference, EventHandler has a signature of public delegate void EventHandler(object sender, EventArgs e).

EventHandler handler = new EventHandler(HandleButtonClick);

C# 2 introduced method group conversions as a sort of shorthand: a method group is implicitly convertible to any delegate type with a signature that’s compatible with one of the overloads. You’ll explore the notion of compatibility further in section 2.3.3, but for the moment you’ll look at methods that exactly match the signature of the delegate you’re trying to convert to.

In the case of our preceding EventHandler code, C# 2 allows you to simplify the creation of the delegate to this:

EventHandler handler = HandleButtonClick;

This works for event subscription and removal, too:

button.Click += HandleButtonClick;

The same code is generated as for the delegate creation expression, but it’s much more concise. These days, I rarely see delegate creation expressions in idiomatic code. Method group conversions save a few characters when creating a delegate instance, but anonymous methods achieve a lot more.

2.3.2. Anonymous methods

You might reasonably expect a lot of detail on anonymous methods here. I’m going to save most of that information for the successor of anonymous methods: lambda expressions. They were introduced in C# 3, and I expect that if they’d existed before anonymous methods, the latter would never have been introduced at all.

Even so, their introduction in C# 2 made me think about delegates in a whole different way. Anonymous methods allow you to create a delegate instance without having a real method to refer to[7] just by writing some code inline wherever you want to create the instance. You just use the delegate keyword, optionally include some parameters, and then write some code in braces. For example, if you wanted an event handler that just logged to the console when it was fired, you could do that very simply:

7

In your source code, anyway. The method still exists in the IL.

EventHandler handler = delegate
{
    Console.WriteLine("Event raised");
};

That doesn’t call Console.WriteLine immediately; instead it creates a delegate that’ll call Console.WriteLine when it’s invoked. To see the type of the sender and event arguments, you need appropriate parameters:

EventHandler handler = delegate(object sender, EventArgs args)
{
    Console.WriteLine("Event raised. sender={0}; args={1}",
        sender.GetType(), args.GetType());
};

The real power comes when you use an anonymous method as a closure. A closure is able to access all the variables that are in scope at the point of its declaration, even if those variables normally wouldn’t be available anymore when the delegate is executed. You’ll look at closures in a lot more detail (including how the compiler treats them) when you look at lambda expressions. For now, here’s a single brief example; it’s an AddClickLogger method that adds a Click handler to any control with a custom message that’s passed into AddClickLogger:

void AddClickLogger(Control control, string message)
{
    control.Click += delegate
    {
        Console.WriteLine("Control clicked: {0}", message);
    };
}

Here the message variable is a parameter to the method, but it’s captured by the anonymous method. The AddClickLogger method doesn’t execute the event handler itself; it just adds it as a handler for the Click event. By the time the code in the anonymous method executes, AddClickLogger will have returned. How does the parameter still exist? In short, the compiler handles it all for you to avoid you having to write boring code. Section 3.5.2 provides more details when you look at capturing variables in lambda expressions. There’s nothing special about EventHandler here; it’s just a well-known delegate type that’s been part of the framework forever. For the final part of our whirlwind tour of C# 2 delegate improvements, let’s come back to the idea of compatibility, which I mentioned when talking about method group conversions.

2.3.3. Delegate compatibility

In C# 1, you needed a method with a signature with exactly the same return type and parameter types (and ref/out modifiers) to create a delegate instance. For example, suppose you had this delegate declaration and method:

public delegate void Printer(string message);

public void PrintAnything(object obj)
{
    Console.WriteLine(obj);
}

Now imagine you wanted to create an instance of Printer to effectively wrap the PrintAnything method. It feels like it should be okay; a Printer will always be given a string reference, and that's convertible to an object reference via an implicit reference conversion. C# 1 wouldn't allow that, though, because the parameter types don't match. C# 2 allows this for delegate creation expressions and for method group conversions:

Printer p1 = new Printer(PrintAnything);
Printer p2 = PrintAnything;

Additionally, you can create one delegate to wrap another one with a compatible signature. Suppose you had a second delegate type that coincidentally did match the PrintAnything method:

public delegate void GeneralPrinter(object obj);

If you already have a GeneralPrinter, you can create a Printer from it:

GeneralPrinter generalPrinter = ...;               1
Printer printer = new Printer(generalPrinter);     2

  • 1 Any way you might create a GeneralPrinter delegate
  • 2 Constructs a Printer to wrap the GeneralPrinter

The compiler lets you do that because it’s safe; any argument that can be passed to a Printer can safely be passed to a GeneralPrinter. The compiler is happy to do the same in the other direction for return types, as shown in the following example:

public delegate object ObjectProvider();    1
public delegate string StringProvider();    1

StringProvider stringProvider = ...;        2
ObjectProvider objectProvider =             3
    new ObjectProvider(stringProvider);     3

  • 1 Parameterless delegates returning values
  • 2 Any way you might create a StringProvider
  • 3 Creates an ObjectProvider to wrap the StringProvider

Again, this is safe because any value that StringProvider can return would definitely be fine to return from an ObjectProvider.

It doesn’t always work the way you might want it to, though. The compatibility between different parameter or return types has to be in terms of an identity conversion that doesn’t change the representation of the value at execution time. For example, this code doesn’t compile:

public delegate void Int32Printer(int x);  1
public delegate void Int64Printer(long x); 1

Int64Printer int64Printer = ...;           2
Int32Printer int32Printer =                3
    new Int32Printer(int64Printer);        3

  • 1 Delegates accepting 32- and 64-bit integers
  • 2 Any way you might create an Int64Printer
  • 3 Error! Can’t wrap the Int64Printer in an Int32Printer

The two delegate signatures here aren't compatible; although there's an implicit conversion from int to long, it's not a reference conversion, because it changes the representation of the value. You might argue that the compiler could've silently created a method that performed the conversion for you, but it doesn't do so. In a way, that's helpful, because this behavior fits in with the generic variance feature you'll see in chapter 4.

It’s important to understand that although this feature looks a bit like generic variance, they are different features. Aside from anything else, this wrapping really does create a new instance of the delegate instead of just treating the existing delegate as an instance of a different type. I’ll go into more detail when you look at the feature fully, but I wanted to highlight as early as possible that they’re not the same.

That’s it for delegates in C# 2. Method group conversions are still widely used, and often the compatibility aspect will be used without anyone even thinking about it. Anonymous methods aren’t seen much these days, because lambda expressions can do almost anything anonymous methods can, but I still look on them fondly as my first taste of the power of closures. Speaking of one feature that led to another, let’s look at the forerunner of C# 5’s asynchrony: iterator blocks.

2.4. Iterators

Relatively few interfaces have specific language support in C# 2. IDisposable has support via the using statement, and the language makes guarantees about the interfaces that arrays implement, but apart from that, only the enumerable interfaces have direct support. IEnumerable has always had support for consumption in the form of the foreach statement, and C# 2 extended that to its new-to-.NET-2 generic counterpart IEnumerable<T> in a reasonably obvious way.

The enumerable interfaces represent sequences of items, and although consuming them is extremely common, it’s also entirely reasonable to want to produce a sequence. Implementing either the generic or nongeneric interfaces manually can be tedious and error prone, so C# 2 introduced a new feature called iterators to make it simpler.

2.4.1. Introduction to iterators

An iterator is a method or property implemented with an iterator block, which is in turn just a block of code using the yield return or yield break statements. Iterator blocks can be used only to implement methods or properties with one of the following return types:

  • IEnumerable
  • IEnumerable<T> (where T can be a type parameter or a regular type)
  • IEnumerator
  • IEnumerator<T> (where T can be a type parameter or a regular type)

Each iterator has a yield type based on its return type. If the return type is one of the nongeneric interfaces, the yield type is object. Otherwise, it’s the type argument provided to the interface. For example, the yield type of a method returning IEnumerator<string> is string.

The yield return statements provide values for the returned sequence, and a yield break statement will terminate a sequence. Similar constructs, sometimes called generators, exist in some other languages, such as Python.

The following listing shows a simple iterator method that you can analyze further. Pay particular attention to the yield return statements in the method.

Listing 2.11. A simple iterator yielding integers
static IEnumerable<int> CreateSimpleIterator()
{
    yield return 10;
    for (int i = 0; i < 3; i++)
    {
        yield return i;
    }
    yield return 20;
}

With that method in place, you can call the method and iterate over the results with a regular foreach loop:

foreach (int value in CreateSimpleIterator())
{
    Console.WriteLine(value);
}

That loop will print the following output:

10
0
1
2
20

So far, this isn’t terribly exciting. You could change the method to create a List<int>, replace each yield return statement with a call to Add(), and then return the list at the end of the method. The loop output would be exactly the same, but it wouldn’t execute in the same way at all. The huge difference is that iterators are executed lazily.

2.4.2. Lazy execution

Lazy execution, or lazy evaluation, was invented as part of lambda calculus in the 1930s. The basic idea of it is simple: execute code only when you need the value that it’ll compute. There are uses of it well beyond iterators, but they’re all we need it for right now.

To explain how the code executes, the following listing expands the foreach loop into mostly equivalent code that uses a while loop instead. I’ve still used the syntactic sugar of a using statement that will call Dispose automatically, just for simplicity.

Listing 2.12. The expansion of a foreach loop
IEnumerable<int> enumerable = CreateSimpleIterator();   1
using (IEnumerator<int> enumerator =                    2
    enumerable.GetEnumerator())                         2
{
    while (enumerator.MoveNext())                       3
    {
        int value = enumerator.Current;                 4
        Console.WriteLine(value);
    }
}

  • 1 Calls the iterator method
  • 2 Gets an IEnumerator<T> from an IEnumerable<T>
  • 3 Moves to the next value, if there is one
  • 4 Fetches the current value

If you’ve never looked at the IEnumerable/IEnumerator pair of interfaces (and their generic equivalents) before, now is a good time to make sure you understand the difference between them. An IEnumerable is a sequence that can be iterated over, whereas an IEnumerator is like a cursor within a sequence. Multiple IEnumerator instances can probably iterate over the same IEnumerable without changing its state at all. Compare that with an IEnumerator, which naturally does have mutable state: each time you call MoveNext(), you’re asking it to move the cursor to the next element of the sequence it’s iterating over.

If that didn’t make much sense, you might want to think about an IEnumerable as a book and an IEnumerator as a bookmark. There can be multiple bookmarks within a book at any one time. Moving a bookmark to the next page doesn’t change the book or any of the other bookmarks, but it does change that bookmark’s state: its position within the book. The IEnumerable.GetEnumerator() method is a sort of bootstrapping: it asks the sequence to create an IEnumerator that’s set up to iterate over that sequence, just like putting a new bookmark at the start of a book.

After you have an IEnumerator, you repeatedly call MoveNext(); if it returns true, that means you’ve moved to another value that you can access with the Current property. If MoveNext() returns false, you’ve reached the end of the sequence.
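To stretch the bookmark analogy into code (a small sketch of my own):

int[] numbers = { 1, 2, 3 };

IEnumerator<int> first = ((IEnumerable<int>) numbers).GetEnumerator();
IEnumerator<int> second = ((IEnumerable<int>) numbers).GetEnumerator();

first.MoveNext();
first.MoveNext();
second.MoveNext();

Console.WriteLine(first.Current);     // 2: this bookmark has been moved twice
Console.WriteLine(second.Current);    // 1: this one has been moved only once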

What does this have to do with lazy evaluation? Well, now that you know exactly what the code using the iterator will call, you can look at when the method body starts executing. Just as a reminder, here’s the method from listing 2.11:

static IEnumerable<int> CreateSimpleIterator()
{
    yield return 10;
    for (int i = 0; i < 3; i++)
    {
        yield return i;
    }
    yield return 20;
}

When CreateSimpleIterator() is called, none of the method body is executed.

If you put a breakpoint on the first line (yield return 10) and step through the code, you won’t hit the breakpoint when you call the method. You won’t hit the breakpoint when you call GetEnumerator(), either. The method body starts executing only when MoveNext() is called. But what happens then?

2.4.3. Evaluation of yield statements

Even when the method starts executing, it goes only as far as it needs to. It stops executing when any of the following occurs:

  • An exception is thrown.
  • It reaches the end of the method.
  • It reaches a yield break statement.
  • It has evaluated the operand to a yield return statement, so it is ready to yield the value.

If an exception is thrown, that exception is propagated as normal. If the end of the method is reached or it hits a yield break statement, the MoveNext() method returns false to indicate that you’ve reached the end of the sequence. If you reach a yield return statement, the Current property is set to the value you’re yielding, and MoveNext() returns true.

Note

To clarify the preceding paragraph, the exception is propagated as normal, assuming you’re already executing the iterator code. Don’t forget that until the calling code iterates over the returned sequence, you won’t start executing the iterator code. It’s the MoveNext() call that will throw the exception, not the initial call to the iterator method.

In our simple example, as soon as MoveNext() starts iterating, it reaches the yield return 10; statement, sets Current to 10, and then returns true.

That all sounds simple for the first call to MoveNext(), but what about subsequent ones? You can’t start again from scratch; otherwise, the sequence would be 10 repeated an infinite number of times. Instead, when MoveNext() returns, it’s as if the method is paused. The generated code keeps track of the point you’ve reached in the method along with any other state, such as the local variable i in your loop. When MoveNext() is called again, execution picks up from the point you’ve reached and keeps going. That’s what makes it lazy, and that’s the part that’s difficult to get right when you’re writing the code yourself.

2.4.4. The importance of being lazy

To give you an idea of why this is important, let’s write some code to print out the Fibonacci sequence until you hit the first value over 1,000. The following listing shows a Fibonacci() method that returns an infinite sequence and then a method that iterates over that sequence until it hits a limit.

Listing 2.13. Iterating over the Fibonacci sequence
static IEnumerable<int> Fibonacci()
{
    int current = 0;
    int next = 1;
    while (true)                         1
    {
        yield return current;            2
        int oldCurrent = current;
        current = next;
        next = next + oldCurrent;
    }
}

static void Main()
{
    foreach (var value in Fibonacci())   3
    {
        Console.WriteLine(value);        4
        if (value > 1000)                5
        {
            break;
        }
    }
}

  • 1 Infinite loop? Only if you keep asking for more
  • 2 Yields the current Fibonacci value
  • 3 Calls the method to obtain the sequence
  • 4 Prints the current value
  • 5 Break condition

How would you do something like this without iterators? You could change the method to create a List<int> and populate it until you hit the limit. But that list could be big if the limit is large, and why should the method that knows the details of the Fibonacci sequence also know how you want to stop? Suppose you sometimes want to stop based on how long you’ve been printing out values, sometimes based on how many values you’ve printed, and sometimes based on the current value. You don’t want to implement the method three times.

You could avoid creating the list by printing the value in the loop, but that makes your Fibonacci() method even more tightly coupled to the one thing you happen to want to do with the values right now. What if you wanted to add the values together instead of printing them? Would you write a second method? It’s all a ghastly violation of the separation of concerns.

The iterator solution is exactly what you want: a representation of an infinite sequence, and that’s all. The calling code can iterate over it as far as it wants[8] and use the values however it wants.

8

At least until it overflows the range of int. At that point, it will either throw an exception or wrap around to a large negative number, depending on whether the code is executing in a checked context.
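To make that concrete, here's a hypothetical second consumer of the same Fibonacci() method that sums values instead of printing them; the iterator itself doesn't change at all:

static long SumFibonacciBelow(int limit)
{
    long sum = 0;
    foreach (int value in Fibonacci())
    {
        if (value >= limit)
        {
            break;
        }
        sum += value;
    }
    return sum;
}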

Implementing the Fibonacci sequence manually wouldn’t be terribly hard. There’s little state to maintain between calls, and the flow control is simple. (The fact that there’s only one yield return statement helps there.) But as soon as the code gets more complicated, you don’t want to be writing this code yourself. The compiler not only generates code that keeps track of where the code has reached, but it’s also smart about how to handle finally blocks, which aren’t quite as simple as you might think.

2.4.5. Evaluation of finally blocks

It may seem odd that I’d focus on finally blocks out of all the syntax that C# has for managing execution flow, but the way that they’re handled in iterators is both interesting and important for the usefulness of the feature. In reality, it’s far more likely that you’ll use using statements than the raw finally blocks, but you can view using statements as effectively built with finally blocks, so the same behavior holds.

To demonstrate how the execution flow works, the following listing shows a trivial iterator block that yields two items within a try block and writes its progress to the console. You’ll then use the method in a couple of ways.

Listing 2.14. An iterator that logs its progress
static IEnumerable<string> Iterator()
{
    try
    {
        Console.WriteLine("Before first yield");
        yield return "first";
        Console.WriteLine("Between yields");
        yield return "second";
        Console.WriteLine("After second yield");
    }
    finally
    {
        Console.WriteLine("In finally block");
    }
}

Before you run it, think about what you’d expect this to print if you just iterate over the sequence returned by the method. In particular, would you expect to see In finally block in the console when first is returned? There are two ways of thinking about it:

  • If you consider execution to be paused by the yield return statement, then logically it’s still inside the try block, and there’s no need to execute the finally block.
  • If you think about the code having to actually return to the MoveNext() caller when it hits the yield return statement, then it feels like you’re exiting the try block and should execute the finally block as normal.

Without wanting to spoil the surprise, the pause model wins. It’s much more useful and avoids other aspects that seem counterintuitive. It would be odd to execute each statement in a try block just once but execute its finally block three times, for example—once for each time you yield a value and then when you execute the rest of the method.

Let’s prove that it works that way. The following listing calls the method and iterates over the values in the sequence and prints them as it goes.

Listing 2.15. A simple foreach loop to iterate and log
static void Main()
{
    foreach (string value in Iterator())
    {
        Console.WriteLine("Received value: {0}", value);
    }
}

The output of listing 2.15 shows that the finally block is executed only once at the end:

Before first yield
Received value: first
Between yields
Received value: second
After second yield
In finally block

This also proves that lazy evaluation is working: the output from the Main() method is interleaved with the output from the Iterator() method, because the iterator is repeatedly paused and resumed.

So far, so simple, but that relied on you iterating through the whole of the sequence. What if you want to stop halfway through? If the code that’s fetching items from an iterator calls MoveNext() only once (if it needs only the first value from the sequence, for example), does that leave the iterator paused in the try block forever without ever executing the finally block?

The answer is yes and no. If you write all the calls to the IEnumerator<T> manually and call MoveNext() just once, the finally block will indeed never get executed. But if you write a foreach loop and happen to exit it without looping over the whole sequence, the finally block will get executed. The following listing demonstrates that by breaking out of the loop as soon as it sees a non-null value (which it will do immediately, of course). It's the same as listing 2.15 but with the addition of the check and break statement.

Listing 2.16. Breaking out of a foreach loop by using an iterator
static void Main()
{
    foreach (string value in Iterator())
    {
        Console.WriteLine("Received value: {0}", value);
        if (value != null)
        {
            break;
        }
    }
}

The output of listing 2.16 is as follows:

Before first yield
Received value: first
In finally block

The last line is the important one: you’re still executing the finally block. That happens automatically when you exit the foreach loop, because that has a hidden using statement. Listing 2.17 shows what listing 2.16 would look like if you couldn’t use a foreach loop and had to write the equivalent code by hand. If this looks familiar, it’s because you did the same thing in listing 2.12, but this time you’re paying more attention to the using statement.

Listing 2.17. Expansion of listing 2.16 to not use a foreach loop
static void Main()
{
    IEnumerable<string> enumerable = Iterator();
    using (IEnumerator<string> enumerator = enumerable.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            string value = enumerator.Current;
            Console.WriteLine("Received value: {0}", value);
            if (value != null)
            {
                break;
            }
        }
    }
}

The important part is the using statement. That makes sure that however you leave it, you’ll call Dispose on the IEnumerator<string>. If the iterator method is “paused” within the try block at that point, the Dispose method ends up executing the finally block. Isn’t it clever?

2.4.6. The importance of finally handling

This may sound like a minor detail, but it makes a huge difference in how applicable iterators are. It means they can be used for methods that acquire resources that need disposing of, such as file handles. It also means that they can be used to chain to other iterators with the same requirement. You’ll see in chapter 3 that LINQ to Objects uses sequences a lot, and reliable disposal is crucial to being able to work with files and other resources.

All of this requires the caller to dispose of the iterator

If you don’t call Dispose on an iterator (and you haven’t iterated to the end of the sequence), you can leak resources or at least delay cleanup. This should be avoided.

The nongeneric IEnumerator interface doesn’t extend IDisposable, but the foreach loop checks whether the runtime implementation also implements IDisposable, and calls Dispose if necessary. The generic IEnumerator<T> interface does extend IDisposable, making things simpler.

If you’re iterating by calling MoveNext() manually (which can definitely have its place), you should do the same thing. If you’re iterating over a generic IEnumerable<T>, you can just use a using statement as I have in my expanded foreach loop listings. If you’re in the unfortunate position of iterating over a nongeneric sequence, you should perform the same interface check that the compiler does in foreach.

As an example of how useful it can be to acquire resources in iterator blocks, consider the following listing of a method that returns a sequence of lines read from a file.

Listing 2.18. Reading lines from a file
static IEnumerable<string> ReadLines(string path)
{
    using (TextReader reader = File.OpenText(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

A method like this was introduced in .NET 4.0 (File.ReadLines), but the framework method doesn’t work well if you call the method once but iterate over the result multiple times; it opens the file only once. The method in listing 2.18 opens the file each time you iterate, making it simpler to reason about. This has the downside, however, of delaying any exception due to the file not existing or not being readable. Tricky trade-offs always exist in API design.
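For instance, assuming a file path I've made up, each foreach over the result of listing 2.18's method opens the file afresh, and any exception for a missing or unreadable file is thrown only when iteration starts:

IEnumerable<string> lines = ReadLines("app.log");    // No file access happens here

foreach (string line in lines)                       // Opens and reads the file
{
    Console.WriteLine(line);
}
foreach (string line in lines)                       // Opens and reads it again, from the start
{
    Console.WriteLine(line);
}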

The point of showing you this method is to demonstrate how important it is that iterator disposal is handled properly. If a foreach loop that threw an exception or returned early resulted in a dangling open file handle, the method would be close to useless. Before we leave iterators, let’s peek behind the curtain briefly and see how they’re implemented.

2.4.7. Implementation sketch

I always find it useful to see roughly what the compiler does with code, particularly for complicated situations such as iterators, async/await, and anonymous functions. This section provides only a taste; an article at http://csharpindepth.com provides far more detail. Please be aware that the exact details are implementation specific; you may find different compilers take slightly different approaches. I’d expect most to have the same basic strategy, though.

The first thing to understand is that even though you’ve written a method,[9] the compiler generates a whole new type for you to implement the relevant interfaces. Your method body is moved into a MoveNext() method in this generated type and adjusted for the execution semantics of iterators. To demonstrate the generated code, we’ll look at the code that the compiler generates for the following listing.

9

You can use iterators to write property accessors as well, but I’ll just talk about iterator methods for the rest of this section, just to be concise. The implementation is the same for property accessors.

Listing 2.19. Sample iterator method to decompile
public static IEnumerable<int> GenerateIntegers(int count)
{
    try
    {
        for (int i = 0; i < count; i++)
        {
            Console.WriteLine("Yielding {0}", i);
            yield return i;
            int doubled = i * 2;
            Console.WriteLine("Yielding {0}", doubled);
            yield return doubled;
        }
    }
    finally
    {
        Console.WriteLine("In finally block");
    }
}

Listing 2.19 shows a relatively simple method in its original form, but I’ve deliberately included five aspects that may not seem obvious:

  • A parameter
  • A local variable that needs to be preserved across yield return statements
  • A local variable that doesn’t need to be preserved across yield return statements
  • Two yield return statements
  • A finally block

The method iterates over its loop count times and yields two integers on each iteration: the iteration number and double the same value. For example, if you pass in 5, it will yield 0, 0, 1, 2, 2, 4, 3, 6, 4, 8.

The downloadable source code contains a full, manually tweaked, decompiled form of the generated code. It’s pretty long, so I haven’t included it in its entirety here. Instead, I want to give you a flavor of what’s generated. The following listing shows most of the infrastructure but none of the implementation details. I’ll explain that, and then you’ll look at the MoveNext() method, which does most of the real work.

Listing 2.20. Infrastructure of the generated code for an iterator
public static IEnumerable<int> GenerateIntegers(             1
    int count)                                               1
{
    GeneratedClass ret = new GeneratedClass(-2);
    ret.count = count;
    return ret;
}

private class GeneratedClass                                 2
    : IEnumerable<int>, IEnumerator<int>                     2
{
    public int count;                                        3
    private int state;                                       3
    private int current;                                     3
    private int initialThreadId;                             3
    private int i;                                           3

    public GeneratedClass(int state)                         4
    {
        this.state = state;
        initialThreadId = Environment.CurrentManagedThreadId;
    }

    public bool MoveNext() { ... }                           5

    public IEnumerator<int> GetEnumerator() { ... }          6

    public void Reset()
    {
        throw new NotSupportedException();                   7
    }
    public void Dispose() { ... }                            8

    public int Current { get { return current; } }           9

    private void Finally1() { ... }                          10

    IEnumerator IEnumerable.GetEnumerator()                  11
    {                                                        11
        return GetEnumerator();                              11
    }                                                        11

    object IEnumerator.Current { get { return current; } }   11
}

  • 1 Stub method with the original declared signature
  • 2 Generated class to represent the state machine
  • 3 All the fields in the state machine with varying purposes
  • 4 Constructor called by both the stub method and GetEnumerator
  • 5 Main body of state machine code
  • 6 Creates a new state machine if necessary
  • 7 Generated iterators never support Reset
  • 8 Executes any finally blocks, if required
  • 9 Current property to return last-yielded value
  • 10 Body of a finally block for use in MoveNext and Dispose
  • 11 Explicit implementation of nongeneric interface members

Yes, that’s the simplified version. The important point to understand is that the compiler generates a state machine for you, as a private nested class. A lot of the names generated by the compiler aren’t valid C# identifiers, but I’ve provided valid ones for simplicity. The compiler still emits a method with the signature declared in the original source code, and that’s what any callers will use. All that does is create an instance of the state machine, copy any parameter to it, and return the state machine to the caller. None of the original source code is called, which corresponds to the lazy behavior you’ve already seen.

The state machine contains everything it needs to implement the iterator:

  • An indicator of where you are within the method. This is similar to an instruction counter in a CPU but simpler because you need to distinguish between only a few states
  • A copy of all the parameters, so you can obtain their values when you need them
  • Local variables within the method
  • The last-yielded value, so the caller can obtain it with the Current property

You’d expect the caller to perform the following sequence of operations:

  1. Call GetEnumerator() to obtain an IEnumerator<int>.
  2. Repeatedly call MoveNext() and then Current on the IEnumerator<int>, until MoveNext() returns false.
  3. Call Dispose for any cleanup that’s required, whether an exception was thrown or not.

In almost all cases, the state machine is used only once and only on the same thread it was created on. The compiler generates code to optimize for this case; the GetEnumerator() method checks for it and returns this if the state machine is still in its original state and is on the same thread. That’s why the state machine implements both IEnumerable<int> and IEnumerator<int>, which would be unusual to see in normal code.[10] If GetEnumerator() is called from a different thread or multiple times, those calls create a new instance of the state machine with the initial parameter values copied in.

10

If the original method returns only IEnumerator<T>, the state machine implements only that.

The MoveNext() method is the complicated bit. The first time it's called, it just needs to start executing the code written in the method as normal; on subsequent calls, however, it needs to effectively jump to the right point in the method. The local variables need to be preserved between calls as well, so they're stored in fields in the state machine.

In an optimized build, some local variables don’t have to be copied into fields. The point of using a field is so you can keep track of the value you set in one MoveNext() call when you come back in the next MoveNext() call. If you look at the doubled local variable from listing 2.19, it’s never used like that:

for (int i = 0; i < count; i++)
{
    Console.WriteLine("Yielding {0}", i);
    yield return i;
    int doubled = i * 2;
    Console.WriteLine("Yielding {0}", doubled);
    yield return doubled;
}

All you do is initialize the variable, print it out, and then yield it. When you return to the method, that value is irrelevant, so the compiler can optimize it into a real local variable in a release build. In a debug build, it may still be present to improve the debugging experience. Notice that if you swapped the last two statements in the loop body so that you yielded the value and then printed it, the optimization wouldn't be possible.

What does a MoveNext() method look like? It’s difficult to give real code without getting stuck in too much detail, so the following listing gives a sketch of the structure.

Listing 2.21. Simplified MoveNext() method
public bool MoveNext()
{
    try
    {
        switch (state)
        {
                          1
        }
                          2
    }
    fault                 3
    {
        Dispose();        4
    }
}

  • 1 Jump table to get to the right part of the rest of the method
  • 2 Method code returning at each yield return
  • 3 Fault block executed only on exceptions
  • 4 Clean up on exceptions

The state machine contains a variable (in our case, called state) that remembers where it reached. The precise values used depend on the implementation, but in the version of Roslyn I happened to use, the states were effectively as follows:

  • –3—MoveNext() currently executing
  • –2—GetEnumerator() not yet called
  • –1—Completed (whether successfully or not)
  • 0—GetEnumerator() called but MoveNext() not yet called (start of method)
  • 1—At the first yield return statement
  • 2—At the second yield return statement

When MoveNext() is called, it uses this state to jump to the right place in the method to either start executing for the first time or resume from the previous yield return statement. Notice that there aren’t any states for positions in the code such as “just assigned a value to the doubled variable,” because you never need to resume from there; you need to resume only from where you previously paused.

The fault block near the end of listing 2.21 is an IL construct with no direct equivalent in C#. It’s like a finally block that executes when an exception is thrown but without catching the exception. This is used to perform any cleanup operations required; in our case, that’s the finally block. The code in that finally block is moved into a separate method that’s called from Dispose() (if an exception has been thrown) and MoveNext() (if you reach it without an exception). The Dispose() method checks the state to see what cleanup is required. That becomes more complicated the more finally blocks there are.

Looking at the implementation isn’t enlightening in terms of teaching you more C# coding techniques, but it’s great for building an appreciation of how much the compiler is capable of doing on your behalf. The same ideas come into play again in C# 5 with async/await, where instead of pausing until the MoveNext() is called again, asynchronous methods effectively pause until an asynchronous operation has completed.

We’ve now covered the biggest features of C# 2, but several smaller features were introduced at the same time. These features are reasonably simple to describe, which is why I’ve lumped them all together here. They’re not otherwise related, but sometimes that’s just the way language design happens.

2.5. Minor features

Some of the features described in this section are rarely used in my experience, but others are common in any modern C# codebase. The time it takes to describe a feature doesn’t always correlate with how useful it is. In this section, you’ll look at the following:

  • Partial types that allow code for a single type to be split across multiple source files
  • Static classes for utility types
  • Separate accessibility (public, private, and so on) for get and set accessors in properties
  • Improvements to namespace aliases to make it easier to work with code that uses the same names in multiple namespaces or assemblies
  • Pragma directives that allow additional compiler-specific features such as temporarily disabling warnings
  • Fixed-size buffers for inline data in unsafe code
  • The [InternalsVisibleTo] attribute, which makes testing simpler

Each feature is independent of the others, and the order in which I’ve described them is unimportant. If you know just enough about one of these sections to know it’s irrelevant to you, you can safely skip it without that becoming a problem later.

2.5.1. Partial types

Partial types allow a single class, struct, or interface to be declared in multiple parts and usually across multiple source files. This is typically used with code generators. Multiple code generators can contribute different parts to the same type, and these can be further augmented by manually written code. The various parts are combined by the compiler and act as if they were all declared together.

Partial types are declared by adding the partial modifier to the type declaration. This must be present in every part. The following listing shows an example with two parts and demonstrates how a method declared in one part can be used in a different part.

Listing 2.22. A simple partial class
partial class PartialDemo
{
    public static void MethodInPart1()
    {
        MethodInPart2();                      1
    }
}

partial class PartialDemo
{
    private static void MethodInPart2()       2
    {
        Console.WriteLine("In MethodInPart2");
    }
}

  • 1 Uses method declared in second part
  • 2 Method used by first part

If the type is generic, every part has to declare the same set of type parameters with the same names, although if multiple declarations constrain the same type parameter, those constraints must be the same. Different parts can contribute different interfaces that a type implements, and the implementation doesn’t need to be in the part that specifies the interface.

Partial methods (C# 3)

C# 3 introduced an extra feature to partial types called partial methods. These are methods declared without a body in one part and then optionally implemented in another part. Partial methods are implicitly private and must be void with no out parameters. (It’s fine to use ref parameters.) At compile time, only partial methods that have implementations are retained; if a partial method hasn’t been implemented, all calls to it are removed. This sounds odd, but it allows generated code to provide optional hooks for manually written code to add extra behavior. It turns out to be useful indeed. The following listing provides an example with two partial methods, one of which is implemented and one of which isn’t.

Listing 2.23. Two partial methods—one implemented, one not
partial class PartialMethodsDemo
{
    public PartialMethodsDemo()
    {
        OnConstruction();                            1
    }

    public override string ToString()
    {
        string ret = "Original return value";
        CustomizeToString(ref ret);                  2
        return ret;
    }

    partial void OnConstruction();                   3
    partial void CustomizeToString(ref string text); 3
}

partial class PartialMethodsDemo
{
    partial void CustomizeToString(ref string text)  4
    {
        text += " - customized!";
    }
}

  • 1 Call to unimplemented partial method
  • 2 Call to implemented partial method
  • 3 Partial method declarations
  • 4 Partial method implementation

In listing 2.23, the first part would most likely be generated code, thereby allowing for additional behavior on construction and when obtaining a string representation of the object. The second part corresponds to manually written code that doesn’t need to customize construction but does want to change the string representation returned by ToString(). Even though the CustomizeToString method can’t return a value directly, it can effectively pass information back to its caller with a ref parameter.

Because OnConstruction is never implemented, it’s completely removed by the compiler. If a partial method with parameters is called, the arguments are never even evaluated when there’s no implementation.
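Here's a small illustration of that point (a sketch of my own; the names are hypothetical). If LogCreation is never given an implementing declaration, the call to it is removed entirely, and BuildDiagnostics() is never invoked:

partial class ArgumentEvaluationDemo
{
    partial void LogCreation(string details);          // Declared but (in this sketch) never implemented

    public ArgumentEvaluationDemo()
    {
        LogCreation(BuildDiagnostics());                // Removed by the compiler, argument and all
    }

    private static string BuildDiagnostics()
    {
        Console.WriteLine("Building diagnostics...");   // Never printed when LogCreation is unimplemented
        return "details";
    }
}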

If you ever find yourself writing a code generator, I strongly encourage you to make it generate partial classes. You may also find it useful to create partial classes in purely handwritten code; I’ve used this to split tests for large classes into multiple source files for easy organization, for example.

2.5.2. Static classes

Static classes are classes declared with the static modifier. If you’ve ever found yourself writing utility classes composed entirely of static methods, those are prime candidates to be static classes. Static classes can’t declare instance methods, properties, events, or constructors, but they can contain regular nested types.

Although it’s perfectly valid to declare a regular class with only static members, adding the static modifier signals your intent in terms of how you expect the class to be used. The compiler knows that static classes can never be instantiated, so it prevents them from being used as either variable types or type arguments. The following listing gives a brief example of what’s allowed and what’s not.

Listing 2.24. Demonstration of static classes
static class StaticClassDemo
{
    public static void StaticMethod() { }   1

    public void InstanceMethod() { }        2

    public class RegularNestedClass         3
    {
        public void InstanceMethod() { }    4
    }
}
...
StaticClassDemo.StaticMethod();             5

StaticClassDemo localVariable = null;       6
List<StaticClassDemo> list =                7
    new List<StaticClassDemo>();            7

  • 1 Fine: static classes can declare static methods.
  • 2 Invalid: static classes can’t declare instance methods.
  • 3 Fine: static classes can declare regular nested types.
  • 4 Fine: a regular type nested in a static class can declare an instance method.
  • 5 Fine: calling a static method from a static class
  • 6 Invalid: can’t declare a variable of a static class
  • 7 Invalid: can’t use a static class as a type argument

Static classes have additional special behavior in that extension methods (introduced in C# 3) can be declared only in non-nested, nongeneric, static classes.

2.5.3. Separate getter/setter access for properties

It’s hard to believe, but in C# 1, a property had only a single access modifier that was used for both the getter and the setter, assuming both were present. C# 2 introduced the ability to make one accessor more private than the other by adding a modifier to that more-private accessor. This is almost always used to make the setter more private than the getter, and by far the most common combination is to have a public getter and a private setter, like this:

private string text;

public string Text
{
    get { return text; }
    private set { text = value; }
}

In this example, any code that has access to the property setter could just set the field value directly, but in more complex situations, you may want to add validation or change notification. Using a property allows behavior like this to be encapsulated nicely. Although this could be put in a method instead, using a property feels more idiomatic in C#.
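As a sketch of my own showing where the more-private setter earns its keep, here's a property whose setter validates its value but is callable only from inside the class:

public class Order
{
    private decimal total;

    public decimal Total
    {
        get { return total; }
        private set
        {
            if (value < 0)
            {
                throw new ArgumentOutOfRangeException("value");
            }
            total = value;
        }
    }
}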

2.5.4. Namespace aliases

Namespaces are used to allow multiple types with the same name to be declared but in different namespaces. This avoids long and convoluted type names just for the sake of uniqueness. C# 1 already supported namespaces and even namespace aliases so you could make it clear which type you meant if you had a single piece of code that needed to use types with the same name from different namespaces. The following listing shows how one method can refer to the Button classes from both Windows Forms and ASP.NET Web Forms.

Listing 2.25. Namespace aliases in C# 1
using System;
using WinForms = System.Windows.Forms;              1
using WebForms = System.Web.UI.WebControls;         1

class Test
{
    static void Main()
    {
        Console.WriteLine(typeof(WinForms.Button)); 2
        Console.WriteLine(typeof(WebForms.Button)); 2
    }
}

  • 1 Introduces namespace aliases
  • 2 Uses the aliases to qualify a name

C# 2 extends the support for namespace aliases in three important ways.

Namespace alias qualifier syntax

The WinForms.Button syntax in listing 2.25 works fine so long as there isn’t a type called WinForms as well. At that point, the compiler would treat WinForms.Button as an attempt to use a member called Button within the type WinForms instead of using the namespace alias. C# 2 solves this by introducing a new piece of syntax called a namespace alias qualifier, which is just a pair of colons. This is used only for namespace aliases, thereby removing any ambiguity. Using namespace alias qualifiers, the Main method in listing 2.25 would become the following:

static void Main()
{
    Console.WriteLine(typeof(WinForms::Button));
    Console.WriteLine(typeof(WebForms::Button));
}

Resolving ambiguity is useful for more than just helping the compiler. More important, it helps anyone reading your code understand that the identifier before the :: is expected to be a namespace alias, not a type name. I suggest using :: anywhere you use a namespace alias.

The global namespace alias

Although it’s unusual to declare types in the global namespace in production code, it can happen. Prior to C# 2, there was no way of fully qualifying a reference to a type in the global namespace. C# 2 introduces global as a namespace alias that always refers to the global namespace. In addition to referring to types in the global namespace, the global namespace alias can be used as a sort of “root” for fully qualified names, and this is how I’ve used it most often.

As an example, I was recently dealing with some code containing a lot of methods with DateTime parameters. When another type called DateTime was introduced into the same namespace, it broke those method declarations. Although I could’ve introduced a namespace alias for the System namespace, it was simpler to replace each parameter type with global::System.DateTime. I find that namespace aliases in general, and the global namespace alias in particular, are especially useful when writing code generators or working with generated code, where collisions are more likely to occur.
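
The following sketch shows the kind of collision I mean; the MyCompany.Scheduling namespace and its DateTime type are hypothetical:

namespace MyCompany.Scheduling
{
    // A locally declared type that clashes with System.DateTime
    public class DateTime { }

    public class Appointment
    {
        // Unqualified, DateTime would mean MyCompany.Scheduling.DateTime here.
        // global::System.DateTime is unambiguous regardless of what else is in scope.
        public void Schedule(global::System.DateTime start)
        {
        }
    }
}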

Extern aliases

So far I’ve been talking about naming collisions between multiple types with the same name but in different namespaces. What about a more worrying collision: two types with the same name in the same namespace but provided by different assemblies?

This is definitely a corner case, but it can come up, and C# 2 introduced extern aliases to handle it. Extern aliases are declared in source code without any specified association, like this:

extern alias FirstAlias;
extern alias SecondAlias;

Within the same source code, you can then use the alias in using directives or when writing fully qualified type names. For example, if you were using Json.NET but also had another assembly that declared its own Newtonsoft.Json.Linq.JObject, you could write code like this:

extern alias JsonNet;
extern alias JsonNetAlternative;

using JsonNet::Newtonsoft.Json.Linq;
using AltJObject = JsonNetAlternative::Newtonsoft.Json.Linq.JObject;
...
JObject obj = new JObject();           1
AltJObject alt = new AltJObject();     2

  • 1 Uses the regular Json.NET JObject type
  • 2 Uses the JObject type in the alternative assembly

That leaves one problem: associating each extern alias with an assembly. The mechanism for doing this is implementation specific. For example, it could be specified in project options or on the compiler command line.

I can’t remember ever having to use extern aliases myself, and I’d normally expect them to serve as a stopgap while an alternative approach is found to avoid the naming collision in the first place. But I’m glad they exist to allow those temporary solutions.

2.5.5. Pragma directives

Pragma directives are implementation-specific directives that give extra information to the compiler. A pragma directive can’t change the behavior of the program to contravene anything within the C# language specification, but it can do anything outside the scope of the specification. If the compiler doesn’t understand a particular pragma directive, it can issue a warning but not an error. The syntax for pragma directives is simple: it’s just #pragma as the first nonwhitespace part of a line followed by the text of the pragma directive.

The Microsoft C# compiler supports pragma directives for warnings and checksums. I’ve only ever seen checksum pragmas in generated code, but warning pragmas are useful for disabling and re-enabling specific warnings. For example, to disable warning CS0219 (“variable is assigned but its value is never used”) for a specific piece of code, you might write this:

#pragma warning disable CS0219
int variable = CallSomeMethod();
#pragma warning restore CS0219

Until C# 6, warnings could be specified only using numbers. Roslyn makes the compiler pipeline more extensible, thereby allowing other packages to contribute warnings as part of the build. To accommodate this, the language was changed to allow a prefix (for example, CS for the C# compiler) to be specified as part of the warning identifier as well. I recommend always including the prefix (CS0219 rather than just 0219 in the preceding example) for clarity.

If you omit a specific warning identifier, all warnings will be disabled or restored. I’ve never used this facility, and I recommend against it in general. Usually, you want to fix warnings instead of disabling them, and disabling them on a blanket basis hides information about problems that might be lurking in your code.
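
For completeness, the blanket form just omits the identifiers. I'm showing it here only to illustrate the syntax:

// Disables every warning from this point until the matching restore (not recommended)
#pragma warning disable
int unused = 5;
#pragma warning restore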

2.5.6. Fixed-size buffers

Fixed-size buffers are another feature I’ve never used in production code. That doesn’t mean you won’t find them useful, particularly if you use interop with native code a lot.

Fixed-size buffers can be used only in unsafe code and only within structs. They effectively allocate a chunk of memory inline within the struct using the fixed modifier. The following listing shows a trivial example of a struct that represents 16 bytes of arbitrary data and two 32-bit integers to represent the major and minor versions of that data.

Listing 2.26. Using fixed-size buffers for a versioned chunk of binary data
unsafe struct VersionedData
{
    public int Major;
    public int Minor;
    public fixed byte Data[16];
}

unsafe static void Main()
{
    VersionedData versioned = new VersionedData();
    versioned.Major = 2;
    versioned.Minor = 1;
    versioned.Data[10] = 20;
}

I’d expect the size of a value of this struct type to be 24 bytes or possibly 32 bytes if the runtime aligned the fields to 8-byte boundaries. The important point is that all of the data is directly within the value; there’s no reference to a separate byte array. This struct could be used for interoperability with native code or just used within regular managed code.

Warning

Although I provide a general warning about using sample code in this book, I feel compelled to give a more specific one for this example. To keep the code short, I haven’t attempted to provide any encapsulation in this struct. It should be used only to get an impression of the syntax for fixed-size buffers.

Improved access to fixed-size buffers in fields in C# 7.3

Listing 2.26 demonstrated accessing a fixed-size buffer via a local variable. If the versioned variable had been a field instead, accessing elements of versioned.Data would’ve required a fixed statement to create a pointer prior to C# 7.3. As of C# 7.3, you can access fixed-size buffers in fields directly, although the code still needs to be in an unsafe context.
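
As a sketch of the difference, reusing the VersionedData struct from listing 2.26 (the Holder class here is purely illustrative):

unsafe class Holder
{
    private VersionedData versioned;

    public void SetTenthByte()
    {
        // Before C# 7.3, accessing the buffer through a field required
        // pinning it with a fixed statement first.
        fixed (byte* data = versioned.Data)
        {
            data[10] = 20;
        }

        // From C# 7.3 onward, this direct access compiles too
        // (still within an unsafe context):
        // versioned.Data[10] = 20;
    }
}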

2.5.7. InternalsVisibleTo

The final feature for C# 2 is as much a framework and runtime feature as anything else. It isn’t even mentioned in the language specification, although I’d expect any modern C# compiler to be aware of it. The framework exposes InternalsVisibleToAttribute, an assembly-level attribute whose single parameter names another assembly. It allows internal members of the assembly containing the attribute to be used by the assembly named in the attribute, as shown in the following example:

[assembly:InternalsVisibleTo("MyProduct.Test")]

When the assembly is signed, you need to include the public key in the assembly name. For example, in Noda Time I have this:

[assembly: InternalsVisibleTo("NodaTime.Test,PublicKey=0024...4669")]

The real public key is much longer than that, of course. Using this attribute with signed assemblies is never pretty, but you don’t need to look at the code often. I’ve used the attribute in three kinds of situations, one of which I later regretted:

  • Allowing a test assembly access to internal members to make testing easier
  • Allowing tools (which are never published) access to internal members to avoid code duplication
  • Allowing one library access to internal members in another closely related library

The last of these was a mistake. We’re used to expecting that we can change internal code without worrying about versioning, but when internal code is exposed to another library that’s versioned independently, it takes on the same versioning characteristics as public code. I don’t intend to do that again.

For testing and tools, however, I’m a big fan of making the internals visible. I know there’s testing dogma around testing only the public API surface, but if you’re trying to keep the public surface small, giving your tests access to the internal code lets you write much simpler tests, which means you’re likely to write more of them.
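
As a rough sketch of the testing scenario (the assembly, namespace, and type names here are hypothetical), the production assembly might expose an internal type to its test assembly like this:

// In the production assembly (for example, MyProduct.dll)
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("MyProduct.Test")]

namespace MyProduct
{
    internal class PriceCalculator
    {
        internal decimal ApplyDiscount(decimal price)
        {
            return price * 0.9m;
        }
    }
}

// In MyProduct.Test, the internal type can be used as if it were public:
// PriceCalculator calculator = new PriceCalculator();
// decimal discounted = calculator.ApplyDiscount(100m);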

Summary

  • The changes in C# 2 made an enormous difference to the look and feel of idiomatic C#. Working without generics or nullable types is frankly horrible.
  • Generics allow both types and methods to say more about the types in their API signatures. This promotes compile-time type safety without a lot of code duplication.
  • Reference types have always had the ability to use a null value to express an absence of information. Nullable value types apply that idea to value types with support in the language, runtime, and framework to make them easy to work with.
  • Delegates became easier to create in C# 2: method group conversions reduce the ceremony for regular methods, and anonymous methods provide even more power and brevity.
  • Iterators allow code to produce sequences that are lazily evaluated, which effectively pauses a method until the next value is requested.
  • Not all features are huge. Small features such as partial types and static classes can still have a significant impact. Some of these won’t affect every developer but will be vital for niche use cases.