Chapter 13. Improving efficiency with more pass by reference

This chapter covers

  • Aliasing variables with the ref keyword
  • Returning variables by reference with ref returns
  • Efficient argument passing with in parameters
  • Preventing data changes with read-only ref returns, read-only ref locals, and read-only struct declarations
  • Extension methods with in or ref targets
  • Ref-like structs and Span<T>

When C# 7.0 came out, it had a couple of features that struck me as slightly odd: ref local variables and ref returns. I was skeptical about how many developers would need them, as they seemed to be targeted at situations involving large value types, which are rare. My expectation was that only near-real-time services and games would find these useful.

C# 7.2 brought another raft of ref-related features: in parameters, read-only ref locals and returns, read-only structs, and ref-like structs. These were complementary to the 7.0 features but still appeared to be making the language more complicated for the benefit of a small set of users.

I’m now convinced that although many developers may not directly see more ref-based code in their projects, they’ll still reap the benefits of these features, because the features allow more-efficient facilities to be made available in the framework. At the time of writing, it’s too early to say for sure how revolutionary this will prove, but I think it’s likely to be significant.

Often performance comes at the expense of readability. I still believe that’s the case with many of the features described in this chapter; I’m expecting them to be used sparingly in cases where performance is known to be important enough to justify the cost. The framework changes enabled by all of this are a different matter, though. They should make it reasonably easy to reduce object allocations and save both memory and garbage collector work without making your code harder to read.

I bring all of this up because you may have similar reactions. While reading this chapter, it’s entirely reasonable to decide that you’ll try to avoid most of the language features here. I urge you to plow on to the end, though, to see the framework-related benefits. The final section, on ref-like structs, introduces Span<T>. Far more can be said about spans than I have room to write in this book, but I expect spans and related types to be important parts of the developer toolbox in the future.

Throughout this chapter, I’ll mention when a feature is available only in a point release of C# 7. As with other point release features, that means if you’re using a C# 7 compiler, you’ll be able to take advantage of those features only with appropriate project settings to specify the language version. I suggest you take an all-or-nothing approach to ref-related features: either use them all, with appropriate settings to allow this, or use none of them. Using only the features in C# 7.0 is likely to be less satisfying. With all of that said, let’s start by revisiting the use of the ref keyword in earlier versions of C#.

13.1. Recap: What do you know about ref?

You need a firm grasp of how ref parameters work in C# 6 and earlier in order to understand the ref-related features in C# 7. This, in turn, requires a firm grasp of the difference between a variable and its value.

Different developers have different ways of thinking about variables, but my mental model is always that of a piece of paper, as shown in figure 13.1. The piece of paper has three items of information:

  • The name of the variable
  • The compile-time type
  • The current value

Figure 13.1. Representing a variable as a piece of paper

Assigning a new value to the variable is just a matter of erasing the current value and writing a new one instead. When the type of the variable is a reference type, the value written on the piece of paper is never an object; it’s always an object reference. An object reference is just a way of navigating to an object in the same way that a street address is a way of navigating to a building. Two pieces of paper with the same address written on them refer to the same building, just as two variables with the same reference value refer to the same object.

Tip

The ref keyword and object references are different concepts. Similarities certainly exist, but you need to distinguish between them. Passing an object reference by value isn’t the same thing as passing a variable by reference, for example. In this section, I’ve emphasized the difference by using object reference instead of just reference.

Importantly, when an assignment copies one variable’s value into another variable, it really is just the value that’s copied; the two pieces of paper remain independent, and a later change to either variable doesn’t change the other. Figure 13.2 illustrates this concept.

Figure 13.2. Assignment copying a value into a new variable
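To make the paper model concrete, here’s a small sketch contrasting a value-type variable with a reference-type variable. The Building class and the AssignmentDemo names are hypothetical, chosen only for illustration. In both cases, assignment copies whatever is written on the paper; for a class, that’s an object reference rather than the object itself.

```csharp
using System;

// Hypothetical mutable class, used only to show object references being copied.
class Building
{
    public string Name = "";
}

static class AssignmentDemo
{
    public static (int first, int second, string viaOriginal, bool sameObject) Run()
    {
        // Value type: assignment copies the int value itself.
        int first = 10;
        int second = first;
        second = 20;                   // first is unaffected and still 10

        // Reference type: assignment copies the object reference (the "address").
        Building original = new Building { Name = "Library" };
        Building copy = original;
        copy.Name = "Museum";          // same object, so the change is visible via original
        copy = new Building();         // rewrites only the copy variable's paper
        bool sameObject = object.ReferenceEquals(original, copy);   // now false

        return (first, second, original.Name, sameObject);
    }

    static void Main()
    {
        var (first, second, viaOriginal, sameObject) = Run();
        Console.WriteLine($"{first} {second} {viaOriginal} {sameObject}");   // 10 20 Museum False
    }
}
```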

This sort of value copying is exactly what happens with a value parameter when you call a method; the value of the method argument is copied onto a fresh piece of paper—the parameter—as shown in figure 13.3. The argument doesn’t have to be a variable; it can be any expression of an appropriate type.

Figure 13.3. Calling a method with value parameters: the parameters are new variables that start with the values of the arguments.

A ref parameter behaves differently, as shown in figure 13.4. Instead of acting as a new piece of paper, a reference parameter requires the caller to provide an existing piece of paper, not just an initial value. You can think of it as a piece of paper with two names written on it: the one the calling code uses to identify it and the parameter name.

Figure 13.4. A ref parameter uses the same piece of paper rather than creating a new one with a copy of the value.

If the method modifies the value of the ref parameter, thereby changing what’s written on the paper, then when the method returns, that change is visible to the caller because it’s on the original piece of paper.
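The difference between the two kinds of parameter can be sketched side by side. The method names here are hypothetical; the only thing that differs between them is the ref modifier.

```csharp
using System;

static class ParameterDemo
{
    // Value parameter: a fresh piece of paper holding a copy of the argument.
    static void IncrementCopy(int value)
    {
        value++;   // changes only the parameter's own paper
    }

    // ref parameter: the caller's own piece of paper, under a second name.
    static void IncrementShared(ref int value)
    {
        value++;   // the caller sees this change
    }

    public static (int afterCopy, int afterShared) Run()
    {
        int x = 5;
        IncrementCopy(x);
        int afterCopy = x;          // still 5
        IncrementShared(ref x);
        return (afterCopy, x);      // x is now 6
    }

    static void Main() => Console.WriteLine(Run());   // (5, 6)
}
```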

Note

There are different ways of thinking about ref parameters and variables. You may read other authors who treat ref parameters as entirely separate variables that just have an automatic layer of indirection so that any access to the ref parameter follows the indirection first. That’s closer to what the IL represents, but I find it less helpful.

There’s no requirement that each ref parameter uses a different piece of paper. The following listing provides a somewhat extreme example, but it’s good for checking your understanding before moving on to ref locals.

Listing 13.1. Using the same variable for multiple ref parameters
static void Main()
{
    int x = 5;
    IncrementAndDouble(ref x, ref x);
    Console.WriteLine(x);
}

static void IncrementAndDouble(ref int p1, ref int p2)
{
    p1++;
    p2 *= 2;
}

The output here is 12: x, p1, and p2 all represent the same piece of paper. It starts with a value of 5; p1++ increments it to 6, and p2 *= 2 doubles it to 12. Figure 13.5 shows a graphical representation of the variables involved.

Figure 13.5. Two ref parameters referring to the same piece of paper

A common way of talking about this is aliasing: in the preceding example, the variables x, p1, and p2 are all aliases for the same storage location. They’re different ways of getting to the same piece of memory.
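Aliasing can have real consequences for code that silently assumes its ref parameters are distinct. The XOR swap below isn’t from this chapter; it’s a classic illustration chosen here because it works for two different variables but zeroes the value when both parameters alias the same piece of paper.

```csharp
using System;

static class AliasingDemo
{
    // XOR swap: correct for two distinct variables. If a and b alias the same
    // storage location, the first statement sets it to 0, and it stays 0.
    public static void XorSwap(ref int a, ref int b)
    {
        a ^= b;
        b ^= a;
        a ^= b;
    }

    static void Main()
    {
        int x = 5, y = 9;
        XorSwap(ref x, ref y);
        Console.WriteLine($"x={x}; y={y}");   // x=9; y=5

        int z = 5;
        XorSwap(ref z, ref z);                // a and b are the same piece of paper
        Console.WriteLine($"z={z}");          // z=0
    }
}
```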

Apologies if this seems long-winded and old hat. You’re now ready to move on to the genuinely new features of C# 7. With the mental model of variables as pieces of paper, understanding the new features will be much easier.

13.2. Ref locals and ref returns

Many of the ref-related C# 7 features are interconnected, which makes it harder to understand the benefits when you see them one at a time. While I’m describing the features, the examples will be even more contrived than normal, as they try to demonstrate just a single point at a time. The first two features you’ll look at are the ones introduced in C# 7.0, although even they were enhanced in C# 7.2. First up, ref locals.

13.2.1. Ref locals

Let’s continue our earlier analogy: ref parameters allow a piece of paper to be shared between variables in two methods. The same piece of paper used by the caller is the one that the method uses for the parameter. Ref locals take that idea one step further by allowing you to declare a new local variable that shares the same piece of paper as an existing variable.

The following listing shows a trivial example of this, incrementing twice via different variables and then showing the result. Note that you have to use the ref keyword in both the declaration and the initializer.

Listing 13.2. Incrementing twice via two variables
int x = 10;
ref int y = ref x;
x++;
y++;
Console.WriteLine(x);

This prints 12, just as if you’d incremented x twice.

Any expression of the appropriate type that’s classified as a variable can be used to initialize a ref local, including array elements. If you have an array of large mutable value types, this lets you make multiple changes to an element in place, without unnecessary copy operations. The following listing creates an array of tuples and then modifies both items within each array element without copying.

Listing 13.3. Modifying array elements using ref local
var array = new (int x, int y)[10];

for (int i = 0; i < array.Length; i++)         1
{                                              1
    array[i] = (i, i);                         1
}                                              1

for (int i = 0; i < array.Length; i++)         2
{                                              2
    ref var element = ref array[i];            2
    element.x++;                               2
    element.y *= 2;                            2
}                                              2

  • 1 Initializes the array with (0, 0), (1, 1), and so on
  • 2 For each element of the array, increments x and doubles y

Before ref locals, there would’ve been two alternatives to modify the array. You could use either multiple array access expressions such as the following:

for (int i = 0; i < array.Length; i++)
{
    array[i].x++;
    array[i].y *= 2;
}

Or you could copy the whole tuple out of the array, modify it, and then copy it back:

for (int i = 0; i < array.Length; i++)
{
    var tuple = array[i];
    tuple.x++;
    tuple.y *= 2;
    array[i] = tuple;
}

Neither of these is particularly appealing. The ref local approach expresses our aim of working with an array element as a normal variable for the body of the loop.

Ref locals can also be used with fields. The behavior for a static field is predictable, but the behavior for instance fields may surprise you. Consider the following listing, which creates a ref local to alias a field in one instance via a variable (obj) and then changes the value of obj to refer to a different instance.

Listing 13.4. Aliasing the field of a specific object by using ref local
class RefLocalField
{
    private int value;

    static void Main()
    {
        var obj = new RefLocalField();         1
        ref int tmp = ref obj.value;           2
        tmp = 10;                              3
        Console.WriteLine(obj.value);          4

        obj = new RefLocalField();             5
        Console.WriteLine(tmp);                6
        Console.WriteLine(obj.value);          7
    }
}

  • 1 Creates an instance of RefLocalField
  • 2 Declares a ref local variable referring to the field of the first instance
  • 3 Assigns a new value to ref local
  • 4 Demonstrates that this has modified the field
  • 5 Reassigns the obj variable to refer to a second instance of RefLocalField
  • 6 Demonstrates that tmp still uses the field in the first instance
  • 7 Demonstrates that the value of the field in the second instance really is 0

The output is shown here:

10
10
0

The possibly surprising line is the middle one. It demonstrates that using tmp isn’t the same as using obj.value each time. Instead, tmp acts as an alias for the field expressed as obj.value at the point of initialization. Figure 13.6 shows a snapshot of the variables and objects involved at the end of the Main method.

Figure 13.6. At the end of listing 13.4, the tmp variable refers to a field in the first instance created, whereas the value of obj refers to a different instance.

As a corollary of this, the tmp variable will prevent the first instance from being garbage collected until after the last use of tmp in the method. Similarly, using a ref local for an array element stops the array containing that element from being garbage collected.

Note

A ref variable that refers to a field within an object or an array element makes life harder for the garbage collector. It has to work out which object the variable is part of and keep that object alive. Regular object references are simpler because they directly identify the object involved. Each ref variable that refers to a field in an object introduces an interior pointer into a data structure maintained by the garbage collector. It’d be expensive to have a lot of these present concurrently, but ref variables can occur only on the stack, which makes it less likely that there’ll be enough to cause performance issues.

Ref locals do have a few restrictions around their use. Most of them are obvious and won’t get in your way, but it’s worth knowing them just so you don’t experiment to try to work around them.

Initialization: Once, only once, and at declaration (before C# 7.3)

Ref locals always have to be initialized at the point of declaration. For example, the following code is invalid:

int x = 10;
ref int invalid;
invalid = ref x;

Likewise, there’s no way to change a ref local to alias a different variable. (In our model terms, you can’t rub the name off and then write it on a different piece of paper.) Of course, the same variable can effectively be declared several times; for example, in listing 13.3, you declared the element variable in a loop:

for (int i = 0; i < array.Length; i++)
{
    ref var element = ref array[i];
    ...
}

On each iteration of the loop, element will alias a different array element. But that’s okay, because it’s effectively a new variable on each iteration.

The variable used to initialize the ref local has to be definitely assigned, too. You might expect the variables to share definite assignment status, but rather than making the definite assignment rules even more complicated, the language designers ensured that ref locals are always definitely assigned. Here’s an example:

int x;
ref int y = ref x;      1
x = 10;
Console.WriteLine(y);

  • 1 Invalid, as x isn’t definitely assigned

This code doesn’t try to read from any variable until everything is definitely assigned, but it’s still invalid.

C# 7.3 lifts the restriction on reassignment, but ref locals still have to be initialized at the point of declaration using a definitely assigned variable. For example:

int x = 10;
int y = 20;
ref int r = ref x;
r++;
r = ref y;                             1
r++;
Console.WriteLine($"x={x}; y={y}");    2

  • 1 Valid only in C# 7.3
  • 2 Prints x=11; y=21

I urge a degree of caution around using this feature anyway. If you need the same ref variable to refer to different variables over the course of a method, I suggest at least trying to refactor the method to be simpler.

No ref fields, or local variables that would live beyond the method call

Although a ref local can be initialized using a field, you can’t declare a field using ref. This is one aspect of protecting against having a ref variable that acts like an alias for another variable with a shorter lifetime. It’d be problematic if you could create an object with a field that aliased a local variable in a method; what would happen to that field after the method had returned?

The same concern around lifetimes extends to local variables in three cases:

  • Iterator blocks can’t contain ref locals.
  • Async methods can’t contain ref locals.
  • Ref locals can’t be captured by anonymous methods or local methods. (Local methods are described in chapter 14.)

These are all cases where local variables can live beyond the original method call. At times, the compiler could potentially prove that it wouldn’t cause a problem, but the language rules have been chosen for simplicity. (One simple example of this is a local method that’s only called by the containing method rather than being used in a method group conversion.)

No references to read-only variables

Any ref local variable introduced in C# 7.0 is writable; you can write a new value on the piece of paper. That causes a problem if you try to initialize the ref local by using a piece of paper that isn’t writable. Consider this attempt to violate the readonly modifier:

class MixedVariables
{
    private int writableField;
    private readonly int readonlyField;

    public void TryIncrementBoth()
    {
        ref int x = ref writableField;     1
        ref int y = ref readonlyField;     2

        x++;                               3
        y++;                               3
    }
}

  • 1 Aliases a writable field
  • 2 Attempts to alias a readonly field
  • 3 Increments both variables

If this were valid, all the reasoning we’ve built up over the years about read-only fields would be lost. Fortunately, that isn’t the case; the compiler rejects the initialization of y just as it would reject any direct modification of readonlyField. But this code would be valid in the constructor for the MixedVariables class, because in that situation you’d be able to write directly to readonlyField as well. In short, you can initialize a ref local only in a way that aliases a variable you’d be able to write to in other situations. This matches the behavior from C# 1.0 onward for using fields as arguments for ref parameters.
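The constructor exception can be sketched as follows. The ReadonlyInit class is hypothetical; the point is that inside its own constructor, a readonly field is writable, so a writable ref local is allowed to alias it.

```csharp
using System;

class ReadonlyInit
{
    private readonly int readonlyField;

    public ReadonlyInit(int start)
    {
        // Valid here: inside the constructor, readonlyField is writable,
        // so a writable ref local may alias it.
        ref int alias = ref readonlyField;
        alias = start;
        alias++;
    }

    public int Value => readonlyField;

    // Outside a constructor, the same declaration is a compile-time error:
    // public void Bump() { ref int alias = ref readonlyField; }

    static void Main() => Console.WriteLine(new ReadonlyInit(10).Value);   // 11
}
```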

This restriction can be frustrating if you want to take advantage of the sharing aspect of ref locals without the writable aspect. In C# 7.0, that’s a problem; but you’ll see in section 13.2.4 that C# 7.2 offers a solution.

Types: Only identity conversions are permitted

The type of the ref local either has to be the same as the type of the variable it’s being initialized with or there has to be an identity conversion between the two types. No other conversion is enough, not even the reference conversions that are allowed in many other scenarios. The following listing shows an example of a ref local declaration using a tuple-based identity conversion that you learned about in chapter 11.

Note

See section 11.3.3 for a reminder on identity conversions.

Listing 13.5. Identity conversion in ref local declaration
(int x, int y) tuple1 = (10, 20);
ref (int a, int b) tuple2 = ref tuple1;
tuple2.a = 30;
Console.WriteLine(tuple1.x);

This prints 30, as tuple1 and tuple2 share the same storage location; tuple1.x and tuple2.a are equivalent to each other, as are tuple1.y and tuple2.b.

In this section, you’ve looked at initializing ref locals from local variables, fields, and array elements. C# 7 also adds a new kind of expression that’s classified as a variable: the result of calling a method that returns by reference.

13.2.2. Ref returns

In some ways, it should be easy to understand ref returns. Using our previous model, it’s the idea that a method can return a piece of paper instead of a value. You need to add the ref keyword to the return type and to any return statement. The calling code will often declare a ref local to receive the return value, too. This means you have to sprinkle the ref keyword pretty liberally in your code to make it very clear what you’re trying to do. The following listing shows about the simplest possible use of ref return; the RefReturn method returns whatever variable was passed into it.

Listing 13.6. Simplest possible ref return demonstration
static void Main()
{
    int x = 10;
    ref int y = ref RefReturn(ref x);
    y++;
    Console.WriteLine(x);
}

static ref int RefReturn(ref int p)
{
    return ref p;
}

This prints 11, because x and y are on the same piece of paper, just as if you’d written

ref int y = ref x;

The method is essentially an identity function just to show the syntax. It could’ve been written as an expression-bodied method, but I wanted to make the return part clear.

So far, so simple, but a lot of details get in the way, mostly because the compiler makes sure that any piece of paper that’s returned is still going to exist when the method has finished returning. It can’t be a piece of paper that was created in the method.

To put this in implementation terms, a method can’t return a storage location that it’s just created on the stack, because when the stack is popped, the storage location won’t be valid anymore. When describing how the C# language works, Eric Lippert is fond of saying that the stack is an implementation detail (see http://mng.bz/oVvZ). In this case, it’s an implementation detail that leaks into the language. The restrictions are for the same reasons that ref fields are prohibited, so if you feel you understand one of these, you can apply the same logic to the other.

I won’t go into an exhaustive list of every kind of variable that can and can’t be returned using ref return, but here are the most common examples:

Valid

  • ref or out parameters
  • Fields of objects (instances of reference types)
  • Fields of structs where the struct variable is a ref or out parameter
  • Array elements

Invalid

  • Local variables declared in the method (including value parameters)
  • Fields of struct variables declared in the method
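The valid cases in the list above can be sketched in a single class. Counter, Point, and RefReturnSources are hypothetical names for illustration; the invalid case is left commented out because it wouldn’t compile.

```csharp
using System;

class Counter { public int Count; }   // reference type
struct Point { public int X; }        // value type

static class RefReturnSources
{
    // Valid: the field lives in a heap object that outlives the call.
    public static ref int CountOf(Counter counter) => ref counter.Count;

    // Valid: an array element lives as long as the array itself.
    public static ref int FirstOf(int[] values) => ref values[0];

    // Valid: the struct variable belongs to the caller (it's a ref parameter).
    public static ref int XOf(ref Point point) => ref point.X;

    // Invalid, so commented out: the local would cease to exist on return.
    // public static ref int Dangling() { int local = 0; return ref local; }

    static void Main()
    {
        var counter = new Counter();
        CountOf(counter) = 5;            // assigns to counter.Count

        int[] values = { 1, 2, 3 };
        FirstOf(values) += 10;           // values[0] becomes 11

        var point = new Point();
        XOf(ref point) = 7;              // point.X becomes 7

        Console.WriteLine($"{counter.Count} {values[0]} {point.X}");   // 5 11 7
    }
}
```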

In addition to these restrictions on what can and can’t be returned, ref return is entirely invalid in async methods and iterator blocks. Similar to pointer types, you can’t use the ref modifier in a type argument, although it can appear in interface and delegate declarations. For example, this is entirely valid:

delegate ref int RefFuncInt32();

But you couldn’t get the same result by trying to refer to Func<ref int>.
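Here’s a sketch of that delegate in use, assuming a hypothetical RefDelegateDemo class with a static array. A method group conversion keeps the ref return, so invoking the delegate yields a variable that can be written to directly.

```csharp
using System;

// The delegate declaration from the text: parameterless, returning int by reference.
delegate ref int RefFuncInt32();

static class RefDelegateDemo
{
    private static readonly int[] slots = new int[3];

    // Returns the second element of the array by reference.
    static ref int SecondSlot() => ref slots[1];

    public static int Run()
    {
        RefFuncInt32 getter = SecondSlot;   // method group conversion keeps the ref return
        getter() = 42;                      // writes through the returned variable
        getter()++;                         // increments the same array element
        return slots[1];
    }

    static void Main() => Console.WriteLine(Run());   // 43
}
```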

Ref return doesn’t have to be used with a ref local. If you want to perform a single operation on the result, you can do that directly. The following listing shows this using the same code as listing 13.6 but without the ref local.

Listing 13.7. Incrementing the result of a ref return directly
static void Main()
{
    int x = 10;
    RefReturn(ref x)++;          1
    Console.WriteLine(x);
}

static ref int RefReturn(ref int p)
{
    return ref p;
}

  • 1 Increments the returned variable directly

Again, this is equivalent to incrementing x, so the output is 11. In addition to modifying the resulting variable, you can use it as a ref argument to another method. To make our already purely demonstrative example even sillier, you could call RefReturn with the result of itself (twice):

RefReturn(ref RefReturn(ref RefReturn(ref x)))++;

Ref returns are valid for indexers as well as methods. This is most commonly useful to return an array element by reference, as shown in the following listing.

Listing 13.8. A ref return indexer exposing array elements
class ArrayHolder
{
    private readonly int[] array = new int[10];
    public ref int this[int index] => ref array[index];    1
}

static void Main()
{
    ArrayHolder holder = new ArrayHolder();
    ref int x = ref holder[0];                             2
    ref int y = ref holder[0];                             2

    x = 20;                                                3
    Console.WriteLine(y);                                  4
}

  • 1 Indexer returns an array element by reference
  • 2 Declares two ref locals referring to the same array element
  • 3 Changes the array element value via x
  • 4 Observes the change via y

You’ve now covered all the new features in C# 7.0, but later point releases expanded the set of ref-related features. The first feature was one I was quite frustrated by when writing my initial draft of this chapter: lack of conditional ?: operator support.

13.2.3. The conditional ?: operator and ref values (C# 7.2)

The conditional ?: operator has been present since C# 1.0 and is familiar from other languages:

condition ? expression1 : expression2

It evaluates its first operand (the condition), and then evaluates either the second or third operand to provide the overall result. It feels natural to want to achieve the same thing with ref values, picking one or another variable based on a condition.

With C# 7.0, this wasn’t feasible, but it is in C# 7.2. A conditional operator can use ref values for the second and third operands, at which point the result of the conditional operator is also a variable that can be used with the ref modifier. As an example, the following listing shows a method that counts the even and odd values in a sequence, returning the result as a tuple.

Listing 13.9. Counting even and odd elements in a sequence
static (int even, int odd) CountEvenAndOdd(IEnumerable<int> values)
{
    var result = (even: 0, odd: 0);
    foreach (var value in values)
    {
        ref int counter = ref (value & 1) == 0 ?    1
            ref result.even : ref result.odd;       1
        counter++;                                  2
    }
    return result;
}

  • 1 Picks the appropriate variable to increment
  • 2 Increments it

The use of a tuple here is somewhat coincidental, although it serves to demonstrate how useful it is for tuples to be mutable. This addition makes the language feel much more consistent. The result of the conditional operator can be used as an argument to a ref parameter, assigned to a ref local, or used in a ref return. It all just drops out nicely. The next C# 7.2 feature addresses an issue you looked at in section 13.2.1 when discussing the restrictions on ref locals: how do you take a reference to a read-only variable?
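As a sketch of the ref return case mentioned above (C# 7.2 or later assumed), the hypothetical Larger method below returns whichever of its two ref parameters holds the larger value; ties pick the first. The caller can then assign straight through the result.

```csharp
using System;

static class RefConditionalDemo
{
    // A ref conditional used in a ref return.
    public static ref int Larger(ref int a, ref int b) =>
        ref (a >= b ? ref a : ref b);

    public static (int x, int y) ResetLarger(int x, int y)
    {
        Larger(ref x, ref y) = 0;   // assigns to whichever of x or y was larger
        return (x, y);
    }

    static void Main()
    {
        Console.WriteLine(ResetLarger(3, 7));   // (3, 0)
        Console.WriteLine(ResetLarger(9, 2));   // (0, 2)
    }
}
```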

13.2.4. Ref readonly (C# 7.2)

So far, all the variables you’ve been aliasing have been writable. In C# 7.0, that’s all that’s available. But that’s insufficient in two parallel scenarios:

  • You may want to alias a read-only field for the sake of efficiency to avoid copying.
  • You may want to allow only read-only access via the ref variable.

The introduction of ref readonly in C# 7.2 addresses both scenarios. Both ref locals and ref returns can be declared with the readonly modifier, and the result is read-only, just like a read-only field. You can’t assign a new value to the variable, and if it’s a struct type, you can’t modify any fields or call property setters.

Tip

Given that one of the reasons for using ref readonly is to avoid copying, you could be surprised to hear that sometimes it has the opposite effect. You’ll look at this in detail in section 13.4. Don’t start using ref readonly in your production code without reading that section!

The two places you can put the modifier work together: if you call a method or indexer with a ref readonly return and want to store the result in a local variable, it has to be a ref readonly local variable, too. The following listing shows how the read-only aspects chain together.

Listing 13.10. ref readonly return and local
static readonly int field = DateTime.UtcNow.Second;      1

static ref readonly int GetFieldAlias() => ref field;    2

static void Main()
{
    ref readonly int local = ref GetFieldAlias();        3
    Console.WriteLine(local);
}

  • 1 Initializes a read-only field with an arbitrary value
  • 2 Returns a read-only alias to the field
  • 3 Initializes a read-only ref local using the method

This works with indexers, too, and it allows immutable collections to expose their data directly, without any copying and without any risk of the memory being mutated. Note that you can return a ref readonly without the underlying variable being read-only, which provides a read-only view over an array, much like ReadOnlyCollection does for arbitrary collections but with copy-free read access. The following listing shows a simple implementation of this idea.

Listing 13.11. A read-only view over an array with copy-free reads
class ReadOnlyArrayView<T>
{
    private readonly T[] values;

    public ReadOnlyArrayView(T[] values) =>     1
        this.values = values;                   1

    public ref readonly T this[int index] =>    2
        ref values[index];                      2
}
...
static void Main()
{
    var array = new int[] { 10, 20, 30 };
    var view = new ReadOnlyArrayView<int>(array);

    ref readonly int element = ref view[0];
    Console.WriteLine(element);                  3
    array[0] = 100;                              3
    Console.WriteLine(element);                  3
}

  • 1 Copies the array reference without cloning contents
  • 2 Returns a read-only alias to the array element
  • 3 Modification to the array is visible via the local.

This example isn’t compelling in terms of efficiency gains because int is already a small type, but in scenarios using larger structs to avoid excessive heap allocation and garbage collection, the benefits can be significant.
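Here’s a sketch of the same idea with a larger element type. Matrix2x2, MatrixStore, and MatrixDemo are hypothetical names. The loop reads only fields through the ref readonly local; calling methods or properties on a non-readonly struct this way can introduce hidden copies, as section 13.4 explains.

```csharp
using System;

// Hypothetical 32-byte struct: big enough that copying it on every read adds up.
struct Matrix2x2
{
    public double M11, M12, M21, M22;

    public Matrix2x2(double m11, double m12, double m21, double m22)
    {
        M11 = m11; M12 = m12; M21 = m21; M22 = m22;
    }
}

class MatrixStore
{
    private readonly Matrix2x2[] items;

    public MatrixStore(Matrix2x2[] items) => this.items = items;

    public int Count => items.Length;

    // Read-only alias to an element: callers read it in place but can't modify it.
    public ref readonly Matrix2x2 this[int index] => ref items[index];
}

static class MatrixDemo
{
    public static double SumOfDeterminants(MatrixStore store)
    {
        double total = 0;
        for (int i = 0; i < store.Count; i++)
        {
            ref readonly Matrix2x2 m = ref store[i];   // no copy of the struct
            total += m.M11 * m.M22 - m.M12 * m.M21;    // field reads only
            // m.M11 = 0;   // compile-time error: m is read-only
        }
        return total;
    }

    static void Main()
    {
        var store = new MatrixStore(new[]
        {
            new Matrix2x2(1, 2, 3, 4),   // determinant -2
            new Matrix2x2(2, 0, 0, 2),   // determinant 4
        });
        Console.WriteLine(SumOfDeterminants(store));   // 2
    }
}
```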

Implementation details

In IL, a ref readonly method is implemented as a regular ref-returning method (the return type is a by-ref type) but with [InAttribute] from the System.Runtime.InteropServices namespace applied to it. This attribute is, in turn, specified with the modreq modifier in IL: if a compiler isn’t aware of InAttribute, it should reject any call to the method. This is a safety mechanism to prevent misuse of the method’s return value. Imagine a C# 7.0 compiler (one that’s aware of ref returns but not ref readonly returns) trying to call a ref readonly returning method from another assembly. It could allow the caller to store the result in a writable ref local and then modify it, thereby violating the intention of the ref readonly return.

You can’t declare ref readonly returning methods unless InAttribute is available to the compiler. That’s rarely an issue, because it’s been in the desktop framework since .NET 1.1 and in .NET Standard 1.1. If you absolutely have to, you can declare your own attribute in the right namespace, and the compiler will use that.

The readonly modifier can be applied to local variables and return types as you’ve seen, but what about parameters? If you have a ref readonly local and want to pass it into a method without just copying the value, what are your options? You might expect the answer to be the readonly modifier again, just applied to parameters, but reality is slightly different, as you’ll see in the next section.

13.3. in parameters (C# 7.2)

C# 7.2 adds in as a new modifier for parameters in the same style as ref or out but with a different intention. When a parameter has the in modifier, the intention is that the method won’t change the parameter value, so a variable can be passed by reference to avoid copying. Within the method, an in parameter acts like a ref readonly local variable. It’s still an alias for a storage location passed by the caller, so it’s important that the method doesn’t modify the value; the caller would see that change, which goes against the point of it being an in parameter.

There’s a big difference between an in parameter and a ref or out parameter: the caller doesn’t have to specify the in modifier for the argument. If the in modifier is missing, the compiler passes the argument by reference if it’s a variable; otherwise, it copies the value into a hidden local variable and passes that by reference. If the caller specifies the in modifier explicitly, the call is valid only if the argument can be passed by reference directly. The following listing shows all the possibilities.

Listing 13.12. Valid and invalid possibilities for passing arguments for in parameters
static void PrintDateTime(in DateTime value)    1
{
    string text = value.ToString(
        "yyyy-MM-dd'T'HH:mm:ss",
        CultureInfo.InvariantCulture);
    Console.WriteLine(text);
}

static void Main()
{
    DateTime start = DateTime.UtcNow;
    PrintDateTime(start);                       2
    PrintDateTime(in start);                    3
    PrintDateTime(start.AddMinutes(1));         4
    PrintDateTime(in start.AddMinutes(1));      5
}

  • 1 Declares method with in parameter
  • 2 Variable is passed by reference implicitly.
  • 3 Variable is passed by reference explicitly (due to in modifier).
  • 4 Result is copied to hidden local variable, which is passed by reference.
  • 5 Compile-time error: argument can’t be passed by reference.

In the generated IL, the parameter is equivalent to a ref parameter decorated with [IsReadOnlyAttribute] from the System.Runtime.CompilerServices namespace. This attribute was introduced much more recently than InAttribute; it’s in .NET Framework 4.7.1, but it’s not even in .NET Standard 2.0. It’d be annoying to have to either add a dependency or declare the attribute yourself, so the compiler generates the attribute in your assembly automatically if it’s not otherwise available.
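If you’d like to see this without reading IL, reflection is enough. The following sketch declares two hypothetical methods, TakesIn and TakesRef, and checks that both parameters have by-reference types but only the in parameter carries IsReadOnlyAttribute. It matches the attribute by its full name rather than by type, on the assumption that the compiler may have embedded its own copy of the attribute instead of using the framework’s.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class InParameterInspection
{
    // Hypothetical methods, declared only so that reflection can inspect them.
    public static void TakesIn(in DateTime value) { }
    public static void TakesRef(ref DateTime value) { }

    // Returns the first (here, the only) parameter of the named method in this class.
    public static ParameterInfo FirstParameter(string methodName) =>
        typeof(InParameterInspection).GetMethod(methodName).GetParameters()[0];

    // True if the parameter carries System.Runtime.CompilerServices.IsReadOnlyAttribute.
    // Matched by name, because the compiler may embed its own copy of the attribute.
    public static bool HasIsReadOnlyAttribute(ParameterInfo parameter) =>
        parameter.GetCustomAttributes(inherit: false)
                 .Any(attribute => attribute.GetType().FullName ==
                      "System.Runtime.CompilerServices.IsReadOnlyAttribute");
}
```

Both parameters report ParameterType.IsByRef as true; the attribute is the only thing that distinguishes the in parameter from the ref parameter.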

The parameter isn’t marked with a modreq modifier in IL, so any C# compiler that doesn’t understand IsReadOnlyAttribute will treat it as a regular ref parameter. (The CLR doesn’t need to know about the attribute either.) That means code built with an older compiler can call the method by passing an argument with the ref modifier, but as soon as that calling code is recompiled with a newer compiler, it will fail to compile, because the newer compiler requires the in modifier (or no modifier) instead of ref. That leads us to a bigger topic of backward compatibility.

13.3.1. Compatibility considerations

The way that the in modifier is optional at the call site leads to an interesting backward-compatibility situation. Changing a method parameter from being a value parameter (the default, with no modifiers) to an in parameter is always source compatible (you should always be able to recompile without changing calling code) but is never binary compatible (any existing compiled assemblies calling the method will fail at execution time). Exactly what that means will depend on your situation. Suppose you want to change a method parameter to be an in parameter for an assembly that has already been released:

  • If your method is accessible to callers outside your control (if you’re publishing a library to NuGet, for example), this is a breaking change and should be treated like any other breaking change.
  • If your code is accessible only to callers that will definitely be recompiled when they use the new version of your assembly (even if you can’t change that calling code), then this won’t break those callers.
  • If your method is only internal to your assembly,[1] you don’t need to worry about binary compatibility because all the callers will be recompiled anyway.

    1

    If your assembly uses InternalsVisibleTo, the situation is more nuanced; that level of detail is beyond the scope of this book.

Another slightly less likely scenario exists: if you have a method with a ref parameter purely for the sake of avoiding copying (you never modify the parameter in the method), changing that to an in parameter is always binary compatible, but never source compatible. That’s the exact opposite of changing a value parameter to an in parameter.

All of this assumes that the act of using an in parameter doesn’t break the semantics of the method itself. That’s not always a valid assumption; let’s see why.

13.3.2. The surprising mutability of in parameters: External changes

So far, it sounds like if you don’t modify a parameter within a method, it’s safe to make it an in parameter. That’s not the case, and it’s a dangerous expectation. The compiler stops the method from modifying the parameter, but it can’t do anything about other code modifying it. You need to remember that an in parameter is an alias for a storage location that other code may be able to modify. Let’s look at a simple example first, which may seem utterly obvious.

Listing 13.13. in parameter and value parameter differences in the face of side effects
static void InParameter(in int p, Action action)
{
    Console.WriteLine("Start of InParameter method");
    Console.WriteLine($"p = {p}");
    action();
    Console.WriteLine($"p = {p}");
}

static void ValueParameter(int p, Action action)
{
    Console.WriteLine("Start of ValueParameter method");
    Console.WriteLine($"p = {p}");
    action();
    Console.WriteLine($"p = {p}");
}

static void Main()
{
    int x = 10;
    InParameter(x, () => x++);
    ValueParameter(x, () => x++);
}

The first two methods are identical except for the log message displayed and the nature of the parameter. In the Main method, you call the two methods in the same way, passing in a local variable with an initial value of 10 as the argument and an action that increments the variable. The output shows the difference in semantics:

Start of InParameter method
p = 10
p = 11
Start of ValueParameter method
p = 10
p = 10

As you can see, the InParameter method is able to observe the change caused by calling action(); the ValueParameter method isn’t. This isn’t surprising; in parameters are intended to share a storage location, whereas value parameters are intended to take a copy.

The problem is that although it’s obvious in this particular case because there’s so little code, in other examples it might not be. For example, the in parameter could happen to be an alias for a field in the same class. In that case, any modifications to the field, either directly in the method or by other code that the method calls, will be visible via the parameter. That isn’t obvious either in the calling code or the method itself. It gets even harder to predict what will happen when multiple threads are involved.
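Here’s a sketch of the field case, using a hypothetical FieldAliasDemo class. The Observe method never writes to its parameter, but it calls a helper that increments the field the parameter is aliasing, so two reads of the same parameter give different answers. Nothing at the call site or in the method signature hints at this.

```csharp
using System;

public class FieldAliasDemo
{
    private int counter = 10;

    // Hypothetical helper standing in for "other code that the method calls".
    private void IncrementCounter() => counter++;

    // Never modifies value directly...
    private (int before, int after) Observe(in int value)
    {
        int before = value;
        IncrementCounter();   // ...but this changes the field that value aliases.
        int after = value;
        return (before, after);
    }

    // Passes the field itself by reference.
    public (int before, int after) Run() => Observe(in counter);
}
```

Calling new FieldAliasDemo().Run() returns (10, 11). With a value parameter instead, both reads would return 10.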

I’m deliberately being somewhat alarmist here, but I think this is a real problem. We’re used to highlighting the possibility of this sort of behavior[2] with ref parameters by specifying the modifier on the parameter and the argument. Additionally, the ref modifier feels like it’s implicitly concerned with how changes in a parameter are visible, whereas the in modifier is about not changing the parameter. In section 13.3.4, I’ll give more guidance on using in parameters, but for the moment you should just be aware of the potential risk of the parameter changing its value unexpectedly.

2

I like to think of it as being similar to the quantum entanglement phenomenon known as “spooky action at a distance.”

13.3.3. Overloading with in parameters

One aspect I haven’t touched on yet is method overloading: what happens if you want two methods with the same name and the same parameter type, but in one case the parameter is an in parameter and in the second method it’s not?

Remember that as far as the CLR is concerned, this is just another ref parameter. You can’t overload the method by just changing between ref, out, and in modifiers; they all look the same to the CLR. But you can overload an in parameter with a regular value parameter:

void Method(int x) { ... }
void Method(in int x) { ... }

New tiebreaker rules in overload resolution make the method with the value parameter better with respect to an argument that doesn’t have an in modifier:

int x = 5;
Method(5);         1
Method(x);         2
Method(in x);      3

  • 1 Call to first method
  • 2 Call to first method
  • 3 Call to second method because of in modifier

These rules allow you to add overloads for existing method names without too many compatibility concerns if the existing methods have value parameters and the new methods have in parameters.
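To make the overload choice observable, the following sketch (OverloadDemo is a hypothetical name) gives each overload a body that reports which one was chosen and returns the results of the three calls shown above.

```csharp
using System;

public static class OverloadDemo
{
    // Same name and parameter type; only the in modifier differs.
    public static string Method(int x) => "value parameter";
    public static string Method(in int x) => "in parameter";

    public static string[] Run()
    {
        int x = 5;
        return new[]
        {
            Method(5),      // Constant argument: value parameter is the better match
            Method(x),      // Variable without modifier: value parameter still wins
            Method(in x)    // Explicit in modifier: only the in overload applies
        };
    }
}
```

OverloadDemo.Run() returns "value parameter", "value parameter", and "in parameter", in that order.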

13.3.4. Guidance for in parameters

Full disclosure: I haven’t used in parameters in real code yet. The guidance here is speculative.

The first thing to note is that in parameters are intended to improve performance. As a general principle, I wouldn’t start making any changes to your code to improve performance before you’ve measured performance in a meaningful and repeatable way and set goals for it. If you’re not careful, you can complicate your code in the name of optimization, only to find out that even if you massively improved the performance of one or two methods, those methods weren’t on a critical path for the application anyway. The exact goals you have will depend on the kind of code you’re writing (games, web applications, libraries, IoT applications, or something else), but careful measurement is important. For microbenchmarks, I recommend the BenchmarkDotNet project.

The benefit of in parameters lies in reducing the amount of data that needs to be copied. If you’re using only reference types or small structs, no improvement may occur at all; logically, the storage location still needs to be passed to the method, even if the value at that storage location isn’t being copied. I won’t make too many claims here because of the black box of JIT compilation and optimization. Reasoning about performance without testing it is a bad idea: enough complex factors are involved to turn that reasoning into an educated guess at best. I’d expect the benefits of in parameters to increase as the size of the structs involved increases, however.

My main concern about in parameters is that they can make reasoning about your code much harder. You can read the value of the same parameter twice and get different results, despite your method not changing anything, as you saw in section 13.3.2. That makes it harder to write correct code and easy to write code that appears to be correct but isn’t.

There’s a way to avoid this while still getting many of the benefits of in parameters, though: by carefully reducing or removing the possibilities of them changing. If you have a public API that’s implemented via a deep stack of private method calls, you can use a value parameter for that public API and then use in parameters in the private methods. The following listing provides an example, although it’s not doing any meaningful computations.

Listing 13.14. Using in parameters safely
public static double PublicMethod(                         1
    LargeStruct first,                                     1
    LargeStruct second)                                    1
{
    double firstResult = PrivateMethod(in first);
    double secondResult = PrivateMethod(in second);
    return firstResult + secondResult;
}

private static double PrivateMethod(                       2
    in LargeStruct input)                                 2
{
    double scale = GetScale(in input);
    return (input.X + input.Y + input.Z) * scale;
}

private static double GetScale(in LargeStruct input) =>    3
    input.Weight * input.Score;

  • 1 Public method using value parameters
  • 2 Private method using an in parameter
  • 3 Another method with an in parameter

With this approach, you can guard against unexpected change; because all the methods are private, you can inspect all the callers to make sure they won’t be passing in values that could change while your method is executing. A single copy of each struct will be made when PublicMethod is called, but those copies are then aliased for use in the private methods, isolating your code from any changes the caller may be making in other threads or as side effects of the other methods. In some cases, you may want the parameter to be changeable, but in a way that you carefully document and control.
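Listing 13.14 doesn’t show LargeStruct itself. The following self-contained version assumes that LargeStruct has the five double properties the listing uses; its definition, constructor, and the InParameterBarrier class name are hypothetical. It’s declared readonly so that the in parameters don’t trigger the hidden copies described in section 13.4. At 40 bytes it’s only standing in for a genuinely large struct.

```csharp
using System;

// Hypothetical definition of the struct used by listing 13.14.
public readonly struct LargeStruct
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }
    public double Weight { get; }
    public double Score { get; }

    public LargeStruct(double x, double y, double z, double weight, double score) =>
        (X, Y, Z, Weight, Score) = (x, y, z, weight, score);
}

public static class InParameterBarrier
{
    // Value parameters: the caller's values are copied once, here.
    public static double PublicMethod(LargeStruct first, LargeStruct second)
    {
        double firstResult = PrivateMethod(in first);
        double secondResult = PrivateMethod(in second);
        return firstResult + secondResult;
    }

    // in parameters alias the copies above, which no other code can see.
    private static double PrivateMethod(in LargeStruct input)
    {
        double scale = GetScale(in input);
        return (input.X + input.Y + input.Z) * scale;
    }

    private static double GetScale(in LargeStruct input) =>
        input.Weight * input.Score;
}
```

For example, PublicMethod(new LargeStruct(1, 2, 3, 0.5, 2), new LargeStruct(2, 2, 2, 1, 1.5)) returns 6 + 9 = 15.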

Applying the same logic to internal calls is also reasonable but requires more discipline because there’s more code that can call the method. As a matter of personal preference, I’ve explicitly used the in modifier at the call site as well as in the parameter declaration to make it obvious what’s going on when reading the code.

I’ve summed all of this up in a short list of recommendations:

  • Use in parameters only when there’s a measurable and significant performance benefit. This is most likely to be true when large structs are involved.
  • Avoid using in parameters in public APIs unless your method can function correctly even if the parameter values change arbitrarily during the method.
  • Consider using a public method as a barrier against change and then using in parameters within the private implementation to avoid copying.
  • Consider explicitly using the in modifier when calling a method that takes an in parameter unless you’re deliberately using the compiler’s ability to pass a hidden local variable by reference.

Many of these guidelines could be easily checked by a Roslyn analyzer. Although I don’t know of such an analyzer at the time of this writing, I wouldn’t be surprised to see a NuGet package become available.

Note

If you detect an implicit challenge here, you’re right. If you let me know about an analyzer like this, I’ll add a note on the website.

All of this depends on the amount of copying genuinely being reduced, and that’s not as straightforward as it sounds. I alluded to this earlier, but now it’s time to look much more closely at when the compiler implicitly copies structs and how you can avoid that.

13.4. Declaring structs as readonly (C# 7.2)

The point of in parameters is to improve performance by reducing copying for structs. That sounds great, but an obscure aspect of C# gets in our way unless we’re careful. We’ll look at the problem first and then at how C# 7.2 solves it.

13.4.1. Background: Implicit copying with read-only variables

C# has been implicitly copying structs for a long time. It’s all documented in the specification, but I wasn’t aware of it until I spotted a mysterious performance boost in Noda Time when I’d accidentally forgotten to make a field read-only.

Let’s take a look at a simple example. You’re going to declare a YearMonthDay struct with three read-only properties: Year, Month, and Day. You’re not using the built-in DateTime type for reasons that will become clear later. The following listing shows the code for YearMonthDay; it’s really simple. (There’s no validation; it’s purely for demonstration in this section.)

Listing 13.15. A trivial year/month/day struct
public struct YearMonthDay
{
    public int Year { get; }
    public int Month { get; }
    public int Day { get; }

    public YearMonthDay(int year, int month, int day) =>
        (Year, Month, Day) = (year, month, day);
}

Now let’s create a class with two YearMonthDay fields: one read-only and one read-write. You’ll then access the Year property in both fields.

Listing 13.16. Accessing properties via a read-only or read-write field
class ImplicitFieldCopy
{
    private readonly YearMonthDay readOnlyField =
        new YearMonthDay(2018, 3, 1);
    private YearMonthDay readWriteField =
        new YearMonthDay(2018, 3, 1);

    public void CheckYear()
    {
        int readOnlyFieldYear = readOnlyField.Year;
        int readWriteFieldYear = readWriteField.Year;
    }
}

The IL generated for the two property accesses is different in a subtle but important way. Here’s the IL for the read-only field; I’ve removed the namespaces from the IL for simplicity:

ldfld valuetype YearMonthDay ImplicitFieldCopy::readOnlyField
stloc.0
ldloca.s V_0
call instance int32 YearMonthDay::get_Year()

It loads the value of the field, thereby copying it to the stack. Only then can it call the get_Year() member, which is the getter for the Year property. Compare that with the code using the read-write field:

ldflda valuetype YearMonthDay ImplicitFieldCopy::readWriteField
call instance int32 YearMonthDay::get_Year()

This uses the ldflda instruction to load the address of the field onto the stack rather than ldfld, which loads the value of the field. This is only IL, which isn’t what your computer executes directly. It’s entirely possible that in some cases the JIT compiler is able to optimize this away, but in Noda Time I found that making fields read-write (with an attribute purely to explain why they weren’t read-only) made a significant difference in performance.

The reason the compiler takes this copy is to avoid a read-only field being mutated by the code within the property (or method, if you’re calling one). The intention of a read-only field is that nothing can change its value. It’d be odd if readOnlyField.SomeMethod() was able to modify the field. C# is designed to expect that any property setters will mutate the data, so they’re prohibited entirely for read-only fields. But even a property getter could try to mutate the value. Taking a copy is a safety measure, effectively.
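The copy isn’t only a performance issue; with a mutable struct, it’s observable. In the following sketch, MutableCounter is a deliberately mutable, hypothetical struct whose Increment method modifies its own field. Calling Increment through the read-only field increments a hidden copy, so the field keeps its original value; calling it through the read-write field modifies the field itself.

```csharp
using System;

// Hypothetical mutable struct, used only to make the defensive copy visible.
public struct MutableCounter
{
    public int Value;

    // Mutates the value it's called on.
    public void Increment() => Value++;
}

public class DefensiveCopyDemo
{
    private readonly MutableCounter readOnlyField = new MutableCounter();
    private MutableCounter readWriteField = new MutableCounter();

    public (int readOnlyValue, int readWriteValue) IncrementBoth()
    {
        readOnlyField.Increment();   // Operates on a hidden copy; the field is unchanged
        readWriteField.Increment();  // Operates on the field itself
        return (readOnlyField.Value, readWriteField.Value);
    }
}
```

new DefensiveCopyDemo().IncrementBoth() returns (0, 1).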

This affects only value types

Just as a reminder, it’s fine to have a read-only field that’s a reference type and for methods to mutate the data in the objects they refer to. For example, you could have a read-only StringBuilder field, and you’d still be able to append to that StringBuilder. The value of the field is only the reference, and that’s what can’t change.

In this section, we’re focusing on the field type being a value type like decimal or DateTime. It doesn’t matter whether the type that contains the field is a class or a struct.

Until C# 7.2, only fields could be read-only. Now we have ref readonly local variables and in parameters to worry about. Let’s write a method that prints out the year, month, and day from a value parameter:

private void PrintYearMonthDay(YearMonthDay input) =>
    Console.WriteLine($"{input.Year} {input.Month} {input.Day}");

The IL for this uses the address of the value that’s already on the stack. Each property access looks as simple as this:

ldarga.s input
call instance int32 YearMonthDay::get_Year()

This doesn’t create any additional copies. The assumption is that if the property mutates the value, it’s okay for your input variable to be changed; it’s just a read-write variable, after all. But if you decide to change input to an in parameter like this, things change:

private void PrintYearMonthDay(in YearMonthDay input) =>
    Console.WriteLine($"{input.Year} {input.Month} {input.Day}");

Now in the IL for the method, each property access has code like this:

ldarg.1
ldobj YearMonthDay
stloc.0
ldloca.s V_0
call instance int32 YearMonthDay::get_Year()

The ldobj instruction copies the value from the address (the parameter) onto the stack. You were trying to avoid one copy being made by the caller, but in doing so you’ve introduced three copies within the method, one for each property access. You’d see the exact same behavior with ref readonly local variables, too. That’s not good! As you’ve probably guessed, C# 7.2 has a solution to this: read-only structs to the rescue!
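The same kind of hidden copy is observable with in parameters and ref readonly locals when the struct isn’t read-only. This sketch uses a deliberately mutable, hypothetical struct called Tally. Each call to Increment through a read-only variable operates on a copy, so neither the in parameter nor the alias ever sees a change, and the caller’s variable keeps its initial value.

```csharp
using System;

// Hypothetical mutable struct (not declared readonly).
public struct Tally
{
    public int Count;

    // A mutating member: the compiler can't assume it leaves the value alone.
    public void Increment() => Count++;
}

public static class ReadOnlyVariableCopyDemo
{
    // The argument is passed by reference, but calling a mutating member
    // on the in parameter operates on a hidden copy.
    public static int IncrementViaIn(in Tally tally)
    {
        tally.Increment();
        return tally.Count;
    }

    public static (int viaIn, int viaAlias, int original) Run()
    {
        var tally = new Tally { Count = 5 };
        int viaIn = IncrementViaIn(in tally);

        // A ref readonly local behaves the same way.
        ref readonly Tally alias = ref tally;
        alias.Increment();

        return (viaIn, alias.Count, tally.Count);
    }
}
```

ReadOnlyVariableCopyDemo.Run() returns (5, 5, 5).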

13.4.2. The readonly modifier for structs

To recap, the reason the C# compiler needs to make copies for read-only value type variables is to avoid code within those types changing the value of the variable. What if the struct could promise that it didn’t do that? After all, most structs are designed to be immutable. In C# 7.2, you can apply the readonly modifier to a struct declaration to do exactly that.

Let’s modify our year/month/day struct to be read-only. It’s already obeying the semantics within the implementation, so you just need to add the readonly modifier:

public readonly struct YearMonthDay
{
    public int Year { get; }
    public int Month { get; }
    public int Day { get; }

    public YearMonthDay(int year, int month, int day) =>
        (Year, Month, Day) = (year, month, day);
}

After that simple change to the declaration, and without any changes to the code using the struct, the IL generated for PrintYearMonthDay(in YearMonthDay input) becomes more efficient. Each property access now looks like this:

ldarg.1
call instance int32 YearMonthDay::get_Year()

Finally, you’ve managed to avoid copying the whole struct even once.

If you look in the downloadable source code that accompanies the book, you’ll see this in a separate struct declaration: ReadOnlyYearMonthDay. That was necessary so I could have samples with before and after, but in your own code you can just make an existing struct read-only without breaking source or binary compatibility. Going in the opposite direction is an insidious breaking change, however; if you decide to remove the modifier and modify an existing member to mutate the state of the value, code that was previously compiled expecting the struct to be read-only could end up mutating read-only variables in an alarming way.

You can apply the modifier only if your struct is genuinely read-only and therefore meets the following conditions:

  • Every instance field and automatically implemented instance property must be read-only. Static fields and properties can still be read-write.
  • You can assign to this only within constructors. In specification terms, this is treated as an out parameter in constructors, a ref parameter in members of regular structs, and an in parameter in members of read-only structs.
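Because members of a read-only struct can’t assign to this or to fields, the usual way to offer a “modified” value is to return a new one. The following sketch adds a hypothetical WithYear method to the read-only YearMonthDay struct; the commented-out SetYear method shows the kind of member the compiler would reject.

```csharp
using System;

public readonly struct YearMonthDay
{
    public int Year { get; }
    public int Month { get; }
    public int Day { get; }

    public YearMonthDay(int year, int month, int day) =>
        (Year, Month, Day) = (year, month, day);

    // Hypothetical helper: returns a new value instead of changing this one.
    public YearMonthDay WithYear(int year) => new YearMonthDay(year, Month, Day);

    // Rejected: in a read-only struct, this can be assigned only in constructors.
    // public void SetYear(int year) => this = new YearMonthDay(year, Month, Day);
}
```

new YearMonthDay(2018, 3, 1).WithYear(2019) produces 2019-03-01 and leaves the original value untouched.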

Assuming you already intended your structs to be read-only, adding the readonly modifier allows the compiler to help you by checking that you aren’t violating that. I’d expect most user-defined structs to work right away. Unfortunately, there’s a slight wrinkle when it comes to Noda Time, which may affect you, too.

13.4.3. XML serialization is implicitly read-write

Currently, most of the structs in Noda Time implement IXmlSerializable. Unfortunately, XML serialization is defined in a way that’s actively hostile to writing read-only structs. My implementation in Noda Time typically looks like this:

void IXmlSerializable.ReadXml(XmlReader reader)
{
    var pattern = /* some suitable text parsing pattern for the type */;
    var text = /* extract text from the XmlReader */;
    this = pattern.Parse(text).Value;
}

Can you see the problem? It assigns to this in the last line. That prevents me from declaring these structs with the readonly modifier, which saddens me. I have three options at the moment:

  • Leave the structs as they are, which means in parameters and ref readonly locals are inefficient.
  • Remove XML serialization from the next major version of Noda Time.
  • Use unsafe code in ReadXml to violate the readonly modifier. The System.Runtime.CompilerServices.Unsafe package makes this simpler.

None of these options is pleasant, and there’s no twist as I reveal a cunning way of satisfying all the concerns. At the moment, I believe that structs implementing IXmlSerializable can’t be genuinely read-only. No doubt there are other interfaces that are implicitly mutable in the same way that you might want to implement in a struct, but I suspect that IXmlSerializable will be the most common one.

The good news is that most readers probably aren’t facing this issue. Where you can make your user-defined structs read-only, I encourage you to do so. Just bear in mind that it’s a one-way change for public code; you can safely remove the modifier later only if you’re in the privileged position of being able to recompile all the code that uses the struct. Our next feature is effectively tidying up consistency: providing the same functionality to extension methods that’s already present in struct instance methods.

13.5. Extension methods with ref or in parameters (C# 7.2)

Prior to C# 7.2, the first parameter in any extension method had to be a value parameter. This restriction is partially lifted in C# 7.2 to embrace the new ref-like semantics more thoroughly.

13.5.1. Using ref/in parameters in extension methods to avoid copying

Suppose you have a large struct that you’d like to avoid copying around and a method that computes a result based on the values of properties in that struct—the magnitude of a 3D vector, for example. If the struct provides the method (or property) itself, you’re fine, particularly if the struct is declared with the readonly modifier. You can avoid copying with no problems. But maybe you’re doing something more complex that the authors of the struct hadn’t considered. The samples in this section use a trivial read-only Vector3D struct introduced in the following listing. The struct just exposes X, Y, and Z properties.

Listing 13.17. A trivial Vector3D struct
public readonly struct Vector3D
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3D(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

If you write your own method accepting the struct with an in parameter, you’re fine. You can avoid copying, but it may be slightly awkward to call. For example, you might end up having to write something like this:

double magnitude = VectorUtilities.Magnitude(vector);

That would be ugly. You have extension methods, but a regular extension method like this would copy the vector on each call:

public static double Magnitude(this Vector3D vector)

It’s unpleasant to have to choose between performance and readability. C# 7.2 comes to the rescue in a reasonably predictable way: you can write extension methods with a ref or in modifier on the first parameter. The modifier can appear before or after the this modifier. If you’re only computing a value, you should use an in parameter, but you can also use ref if you want to be able to modify the value in the original storage location without having to create a new value and copy it in. The following listing provides two sample extension methods on a Vector3D.

Listing 13.18. Extension methods using ref and in
public static double Magnitude(this in Vector3D vec) =>
    Math.Sqrt(vec.X * vec.X + vec.Y * vec.Y + vec.Z * vec.Z);

public static void OffsetBy(this ref Vector3D orig, in Vector3D off) =>
    orig = new Vector3D(orig.X + off.X, orig.Y + off.Y, orig.Z + off.Z);

The parameter names are abbreviated more than I’m normally comfortable with to avoid long-winded formatting in the book. Note that the second parameter in the OffsetBy method is an in parameter; you’re trying to avoid copying to as great an extent as you can.
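Listing 13.18 shows the methods on their own. In a real project, extension methods have to be declared in a non-generic static class. The following self-contained sketch puts them in a class called Vector3DExtensions (the class name is an assumption) alongside the Vector3D struct from listing 13.17.

```csharp
using System;

public readonly struct Vector3D
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3D(double x, double y, double z) => (X, Y, Z) = (x, y, z);
}

public static class Vector3DExtensions
{
    // in target: reads the caller's vector without copying it.
    public static double Magnitude(this in Vector3D vec) =>
        Math.Sqrt(vec.X * vec.X + vec.Y * vec.Y + vec.Z * vec.Z);

    // ref target: replaces the value in the caller's storage location.
    public static void OffsetBy(this ref Vector3D orig, in Vector3D off) =>
        orig = new Vector3D(orig.X + off.X, orig.Y + off.Y, orig.Z + off.Z);
}
```

With these declarations in place, the code in listing 13.19 compiles unchanged.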

It’s simple to use the extension methods. The only possibly surprising aspect is that unlike regular ref parameters, there’s no sign of the ref modifier when calling ref extension methods. The following listing uses both of the extension methods I’ve shown to create two vectors, offset the first vector by the second vector, and then display the resulting vector and its magnitude.

Listing 13.19. Calling ref and in extension methods
var vector = new Vector3D(1.5, 2.0, 3.0);
var offset = new Vector3D(5.0, 2.5, -1.0);

vector.OffsetBy(offset);

Console.WriteLine($"({vector.X}, {vector.Y}, {vector.Z})");
Console.WriteLine(vector.Magnitude());

The output is as follows:

(6.5, 4.5, 2)
8.15475321515004

This shows that the call to OffsetBy modified the vector variable as you intended it to.

Note

The OffsetBy method makes our immutable Vector3D struct feel somewhat mutable. This feature is still in its early days, but I suspect I’ll feel much more comfortable writing extension methods with initial in parameters than with ref parameters.

An extension method with an initial in parameter can be called on a read-write variable (as you’ve seen by calling vector.Magnitude()), but an extension method with an initial ref parameter can’t be called on a read-only variable. For example, if you create a read-only alias for vector, you can’t call OffsetBy:

ref readonly var alias = ref vector;
alias.OffsetBy(offset);                   1

  • 1 Error: trying to use a read-only variable as ref
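Both rules can be checked with a smaller, hypothetical struct. In the following sketch, Point2D, LengthSquared, and Translate stand in for Vector3D, Magnitude, and OffsetBy. The in extension method accepts a read-only alias and even a value that isn’t a variable at all (the compiler copies it into a hidden local, as it does for ordinary in arguments), whereas the ref extension method accepts neither. Those calls are commented out because they don’t compile.

```csharp
using System;

// Hypothetical minimal struct standing in for Vector3D.
public readonly struct Point2D
{
    public double X { get; }
    public double Y { get; }

    public Point2D(double x, double y) => (X, Y) = (x, y);
}

public static class Point2DExtensions
{
    public static double LengthSquared(this in Point2D p) => p.X * p.X + p.Y * p.Y;

    public static void Translate(this ref Point2D p, double dx, double dy) =>
        p = new Point2D(p.X + dx, p.Y + dy);
}

public static class ReceiverRulesDemo
{
    public static (double viaAlias, double viaTemporary) Run()
    {
        var point = new Point2D(3, 4);
        ref readonly var alias = ref point;

        double viaAlias = alias.LengthSquared();                  // in target: read-only alias is fine
        // alias.Translate(1, 1);                                 // ref target: error, alias is read-only

        double viaTemporary = new Point2D(1, 2).LengthSquared();  // in target: copied to a hidden local
        // new Point2D(1, 2).Translate(1, 1);                     // ref target: error, not a variable

        return (viaAlias, viaTemporary);
    }
}
```

ReceiverRulesDemo.Run() returns (25, 5).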

Unlike regular extension methods, ref and in extension methods come with restrictions on the extended type (the type of the first parameter).

13.5.2. Restrictions on ref and in extension methods

Normal extension methods can be declared to extend any type. They can use either a regular type or a type parameter with or without constraints:

static void Method(this string target)
static void Method(this IDisposable target)
static void Method<T>(this T target)
static void Method<T>(this T target) where T : IComparable<T>
static void Method<T>(this T target) where T : struct

In contrast, ref and in extension methods always have to extend value types. In the case of in extension methods, that value type can’t be a type parameter either. These are valid:

static void Method(this ref int target)
static void Method<T>(this ref T target) where T : struct
static void Method<T>(this ref T target) where T : struct, IComparable<T>
static void Method<T>(this ref int target, T other)
static void Method(this in int target)
static void Method(this in Guid target)
static void Method<T>(this in Guid target, T other)

But these are invalid:

static void Method(this ref string target)        1
static void Method<T>(this ref T target)          2
    where T : IComparable<T>                      2
static void Method<T>(this in string target)      3
static void Method<T>(this in T target)           4
    where T : struct                              4

  • 1 Reference type target for ref parameter
  • 2 Type parameter target for ref parameter without struct constraint
  • 3 Reference type target for in parameter
  • 4 Type parameter target for in parameter

Note the difference between in and ref, where a ref parameter can be a type parameter so long as it has the struct constraint. An in extension method can still be generic (as per the final valid example), but the extended type can’t be a type parameter. At the moment, there’s no constraint that can require that T is a readonly struct, which would be required for a generic in parameter to be useful. That may change in future versions of C#.
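As a sketch of the valid forms, the following hypothetical extension methods (ResetToDefault, SwapWith, and IsEmpty are illustrative names, not framework APIs) use a struct-constrained type parameter for ref targets and a concrete value type, Guid, for an in target.

```csharp
using System;

public static class ValueTypeExtensions
{
    // Valid: a ref target may be a type parameter with the struct constraint.
    public static void ResetToDefault<T>(this ref T target) where T : struct =>
        target = default;

    // Valid: another generic ref extension. The second parameter is an
    // ordinary ref parameter, so the caller still writes ref for it.
    public static void SwapWith<T>(this ref T first, ref T second) where T : struct
    {
        T temp = first;
        first = second;
        second = temp;
    }

    // Valid: an in target must be a specific value type, not a type parameter.
    public static bool IsEmpty(this in Guid target) => target == Guid.Empty;
}
```

Only the extended target has its modifier implied at the call site; a call to SwapWith looks like a.SwapWith(ref b).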

You may wonder why the extended type is constrained to be a value type at all. There are two primary reasons for this:

  • The feature is designed to avoid expensive copying of value types, so there’s no benefit for reference types.
  • If a ref parameter could be a reference type, it could be set to a null reference within the method. That would disrupt an assumption C# developers and tooling can always make at the moment: that calling x.Method() (where x is a variable of some reference type) can never make x null.

I don’t expect to use ref and in extension methods very much, but they do provide a pleasant consistency to the language.

The features in the remainder of the chapter are somewhat different from the ones you’ve examined so far. Just to recap, so far you’ve looked at these:

  • Ref locals
  • Ref returns
  • Read-only versions of ref locals and ref returns
  • in parameters: read-only versions of ref parameters
  • Read-only structs, which allow in parameters and read-only ref locals and returns to avoid copying
  • Extension methods targeting ref or in parameters

If you started with ref parameters and wondered how to extend the concept further, you might have come up with something similar to this list. We’re now going to move on to ref-like structs, which are related to all of these but also feel like a whole new kind of type.

13.6. Ref-like structs (C# 7.2)

C# 7.2 introduces the notion of a ref-like struct: one that’s intended to exist only on the stack. Just as with custom task types, it’s likely that you’ll never need to declare your own ref-like struct, but I expect C# code written against up-to-date frameworks in the next few years to use the ones built into the framework quite a lot.

First, you’ll look at the basic rules for ref-like structs and then see how they’re used and the framework support for them. I should note that these are a simplified form of the rules; consult the language specification for the gory details. I suspect that relatively few developers will need to know exactly how the compiler enforces the stack safety of ref-like structs, but it’s important to understand the principle of what it’s trying to achieve:

A ref-like struct value must stay on the stack, always.

Let’s start by creating a ref-like struct. The declaration is the same as a normal struct declaration with the addition of the ref modifier:

public ref struct RefLikeStruct
{
          1
}

  • 1 Struct members as normal

13.6.1. Rules for ref-like structs

Rather than say what you can do with it, here are some of the things you can’t do with RefLikeStruct and a brief explanation:

  • You can’t include a RefLikeStruct as a field of any type that isn’t also a ref-like struct. Even a regular struct can easily end up on the heap either via boxing or by being a field in a class. Even within another ref-like struct, you can use RefLikeStruct only as the type of an instance field—never a static field.
  • You can’t box a RefLikeStruct. Boxing is precisely designed to create an object on the heap, which is exactly what you don’t want.
  • You can’t use RefLikeStruct as a type argument (either explicitly or by type inference) for any generic method or type, including as a type argument for a generic ref-like struct type. Generic code can use generic type arguments in all kinds of ways that put values on the heap, such as creating a List<T>.
  • You can’t use RefLikeStruct[] or any similar array type as the operand for the typeof operator.
  • Local variables of type RefLikeStruct can’t be used anywhere the compiler might need to capture them on the heap in a special generated type. That includes the following:

    • Async methods, although this could potentially be relaxed so a variable could be declared and used between await expressions, so long as it was never used across an await expression (with a declaration before the await and a usage after it). Parameters for async methods can’t be ref-like struct types.
    • Iterator blocks, which already appear to have the “only using RefLikeStruct between two yield expressions is okay” rules. Parameters for iterator blocks can’t be ref-like struct types.
    • Any local variable captured by a local method, LINQ query expression, anonymous method, or lambda expression.
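Span<T>, covered in the next section, is itself a ref-like struct, so it can stand in for RefLikeStruct to show these rules in action. The following sketch is written as a top-level program for brevity, which needs a C# 9 or later compiler even though ref-like structs date from C# 7.2. The local function Sum is purely illustrative, and the commented-out lines are ones the compiler rejects.

```csharp
using System;

// Span<int> is a ref-like struct that ships with .NET Core 2.1 and later.
int[] data = { 1, 2, 3, 4 };
Span<int> span = data;      // allowed: an ordinary local variable on the stack

int total = Sum(span);      // allowed: passing it as a parameter

// Each of these lines fails to compile if uncommented:
// object boxed = span;                                          // boxing
// var list = new System.Collections.Generic.List<Span<int>>(); // type argument
// Func<int> f = () => span.Length;                              // captured by a lambda

Console.WriteLine(total);   // 10

// A local function may accept a span as a parameter; it just can't capture one.
int Sum(Span<int> values)
{
    int result = 0;
    foreach (int value in values)
    {
        result += value;
    }
    return result;
}
```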

Additionally, complicated rules[3] indicate how ref local variables of ref-like types can be used. I suggest trusting the compiler here; if your code fails to compile because of ref-like structs, you’re likely trying to make something available at a point where it will no longer be alive on the stack. With this set of rules keeping values on the stack, you can finally look at using the poster child for ref-like structs: Span<T>.

[3] Translation: I’m finding them hard to understand. I understand the general purpose, but the complexity required to prevent bad things from happening is beyond my current level of interest in going over the rules line by line.
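As a rough illustration of what those rules protect against, this sketch (again a C# 9+ top-level program; Middle is a hypothetical helper) returns a span over part of a heap-allocated array, which compiles. The commented-out Broken method tries to return a span over stackalloc memory, and the compiler rejects it because that memory disappears when the method returns.

```csharp
using System;

int[] numbers = { 10, 20, 30, 40 };
Span<int> middle = Middle(numbers);
middle[0] = 99;                  // writes through to numbers[1]
Console.WriteLine(numbers[1]);   // 99

// Fine: the array lives on the heap, so a span over part of it may be returned.
Span<int> Middle(int[] array) => array.AsSpan(1, 2);

// Rejected if uncommented: the span would outlive the stack memory it refers to.
// Span<int> Broken()
// {
//     Span<int> local = stackalloc int[4];
//     return local;
// }
```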

13.6.2. Span<T> and stackalloc

There are several ways of accessing chunks of memory in .NET. Arrays are the most common, but ArraySegment<T> and pointers are also used. One large downside of using arrays directly is that the array effectively owns all its memory; an array is never just part of a larger piece of memory. That doesn’t sound too bad until you think of how many method signatures you’ve seen like this:

int ReadData(byte[] buffer, int offset, int length)

This “buffer, offset, length” set of parameters occurs all over the place in .NET, and it’s effectively a code smell suggesting that we haven’t had the right abstraction in place. Span<T> and the related types aim to fix this.
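Here’s a sketch of how a span-based signature might replace those three parameters. Fill is a hypothetical method, not a framework API: the caller slices the array with AsSpan(start, length), and the callee can see and modify only that slice. It’s written as a C# 9+ top-level program.

```csharp
using System;

byte[] buffer = new byte[8];

// Old style might have been Fill(buffer, offset: 2, length: 3).
// Here the caller slices, and the span carries both location and length.
int written = Fill(buffer.AsSpan(2, 3), 0xFF);
Console.WriteLine(written);                        // 3
Console.WriteLine(BitConverter.ToString(buffer));  // 00-00-FF-FF-FF-00-00-00

// Hypothetical span-based method: fills whatever section it's given.
int Fill(Span<byte> destination, byte value)
{
    destination.Fill(value);
    return destination.Length;
}
```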

Note

Some uses of Span<T> will work just by adding a reference to the System.Memory NuGet package. Others require framework support. The code presented in this section has been built against .NET Core 2.1. Some listings will build against earlier versions of the framework as well.

Span<T> is a ref-like struct that provides read/write, indexed access to a section of memory just like an array but without any concept of owning that memory. A span is always created from something else (maybe a pointer, maybe an array, even data created directly on the stack). When you use a Span<T>, you don’t need to care where the memory has been allocated. Spans can be sliced: you can create one span as a subsection of another without copying any data. In new versions of the framework, the JIT compiler will be aware of Span<T> and handle it in a heavily optimized manner.
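The following sketch shows those properties: a span over an array, a slice of that span that shares the same memory, and a span over stack memory that needs no unsafe code. It’s a C# 9+ top-level program for brevity.

```csharp
using System;

char[] letters = { 'a', 'b', 'c', 'd', 'e' };
Span<char> all = letters;          // span over the whole array
Span<char> tail = all.Slice(2);    // 'c', 'd', 'e': no data is copied
tail[0] = 'X';                     // writes through to the underlying array
Console.WriteLine(new string(letters));  // abXde

Span<int> onStack = stackalloc int[3];   // stack memory, no unsafe modifier needed
onStack[2] = 42;
Console.WriteLine(onStack.Length);       // 3
```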

The ref-like nature of Span<T> sounds irrelevant, but it has two significant benefits:

  • It allows a span to refer to memory with a tightly controlled lifecycle, as the span can’t escape from the stack. The code that allocates the memory can pass a span to other code and then free the memory afterward with confidence that there won’t be any spans left to refer to that now-deallocated memory.
  • It allows custom one-time initialization of data in a span without any copying and without the risk of code being able to change the data afterward.
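The first benefit can be sketched with a pooled array, where handing the array back to ArrayPool<T>.Shared stands in for freeing the memory. Process is a hypothetical method; because it receives only a Span<byte>, it can’t store that span in a field or capture it, so nothing it does can keep a span over the buffer alive after the call. This is a C# 9+ top-level program.

```csharp
using System;
using System.Buffers;

int count = 0;
byte[] rented = ArrayPool<byte>.Shared.Rent(16);   // may be longer than 16
try
{
    // The callee sees only the first 16 bytes, as a span.
    count = Process(rented.AsSpan(0, 16));
}
finally
{
    // Returning the array to the pool plays the role of "freeing" the memory.
    ArrayPool<byte>.Shared.Return(rented);
}
Console.WriteLine(count);   // 16

// Hypothetical worker: fills the span and adds up its contents.
int Process(Span<byte> data)
{
    data.Fill(1);
    int sum = 0;
    foreach (byte b in data)
    {
        sum += b;
    }
    return sum;
}
```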

Let’s demonstrate both of these points in a simple way by writing a method to generate a random string. Although Guid.NewGuid often can be used for this purpose, sometimes you may want a more customized approach using a different set of characters and length. The following listing shows the traditional code you might have used in the past.

Listing 13.20. Generating a random string by using a char[]
static string Generate(string alphabet, Random random, int length)
{
    char[] chars = new char[length];
    for (int i = 0; i < length; i++)
    {
        chars[i] = alphabet[random.Next(alphabet.Length)];
    }
    return new string(chars);
}

Here’s an example of calling the method to generate a string of 10 lowercase letters:

string alphabet = "abcdefghijklmnopqrstuvwxyz";
Random random = new Random();
Console.WriteLine(Generate(alphabet, random, 10));

Listing 13.20 performs two heap allocations: one for the char array and one for the string. The data needs to be copied from one place to the other when constructing the string. You can improve this slightly if you know you’ll always be generating reasonably small strings, and if you’re in a position to use unsafe code. In that situation, you can use stackalloc, as shown in the following listing.

Listing 13.21. Generating a random string by using stackalloc and a pointer
unsafe static string Generate(string alphabet, Random random, int length)
{
    char* chars = stackalloc char[length];
    for (int i = 0; i < length; i++)
    {
        chars[i] = alphabet[random.Next(alphabet.Length)];
    }
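    // string(char*) on its own expects a null-terminated string, so pass the length explicitly
    return new string(chars, 0, length);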
}

This performs only one heap allocation: the string. The temporary buffer is stack allocated, but you need to use the unsafe modifier because you’re using a pointer. Unsafe code takes me out of my comfort zone; although I’m reasonably confident that this code is okay, I wouldn’t want to do anything much more complicated with pointers. There’s still the copy from the stack-allocated buffer to the string, too.

The good news is that Span<T> also supports stackalloc without any need for the unsafe modifier, as shown in the following listing. You don’t need the unsafe modifier because you’re relying on the rules for ref-like structs to keep everything safe.

Listing 13.22. Generating a random string by using stackalloc and a Span<char>
static string Generate(string alphabet, Random random, int length)
{
    Span<char> chars = stackalloc char[length];
    for (int i = 0; i < length; i++)
    {
        chars[i] = alphabet[random.Next(alphabet.Length)];
    }
    return new string(chars);
}

That makes me more confident, but it’s no more efficient; you’re still copying data in a way that feels redundant. You can do better. All you need is this factory method in System.String:

public static string Create<TState>(
    int length, TState state, SpanAction<char, TState> action)

That uses SpanAction<T, TArg>, which is a new delegate with this signature:

delegate void SpanAction<T, in TArg>(Span<T> span, TArg arg);

These two signatures may look a little odd to start with, so let’s unpack what the implementation of Create does. It takes the following steps:

  1. Allocates a string with the requested length
  2. Creates a span that refers to the memory inside the string
  3. Calls the action delegate, passing in whatever state the method was given and the span
  4. Returns the string
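Here’s a minimal sketch of those steps in isolation. The state is just the first character to write (TState is inferred as char), and the delegate writes successive characters straight into the new string’s memory. It’s a C# 9+ top-level program.

```csharp
using System;

// The lambda uses only its parameters: the span and the state.
string letters = string.Create(4, 'a', (span, first) =>
{
    for (int i = 0; i < span.Length; i++)
    {
        span[i] = (char) (first + i);   // writes directly into the string being created
    }
});
Console.WriteLine(letters);   // abcd
```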

The first thing to note is that our delegate is able to write to the content of a string. That sounds like it defies everything you know about the immutability of strings, but the Create method is in control here. Yes, you can write whatever you like to the string, just as you can create a new string with whatever content you want. But by the time the string is returned, the content is effectively baked into the string. You can’t try to cheat by holding onto the Span<char> that’s passed to the delegate, because the compiler makes sure it doesn’t escape the stack.

That still leaves the odd part about the state. Why do you need to pass in state that’s then passed back to our delegate? It’s easiest to show you an example; the following listing uses the Create method to implement our random string generator.

Listing 13.23. Generating a random string with string.Create
static string Generate(string alphabet, Random random, int length) =>
    string.Create(length, (alphabet, random), (span, state) =>
    {
        var alphabet2 = state.alphabet;
        var random2 = state.random;
        for (int i = 0; i < span.Length; i++)
        {
            span[i] = alphabet2[random2.Next(alphabet2.Length)];
        }
    });

At first, it looks like a lot of pointless repetition occurs. The second argument to string.Create is (alphabet, random), which puts the alphabet and random parameters into a tuple to act as the state. You then unpack these values from the tuple again in the lambda expression:

var alphabet2 = state.alphabet;
var random2 = state.random;

Why not just capture the parameters in the lambda expression? Using alphabet and random within the lambda expression would compile and behave correctly, so why bother using the extra state parameter?

Remember the point of using spans: you’re trying to reduce heap allocations as well as copying. When a lambda expression captures a parameter or local variable, it has to create an instance of a generated class so that the delegate has access to those variables. The lambda expression in listing 13.23 doesn’t need to capture anything, so the compiler can generate a static method and cache a single delegate instance to use every time Generate is called. All the state is passed via the parameters to string.Create, and because C# 7 tuples are value types, there’s no allocation for that state.
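For comparison, here’s a sketch of both approaches side by side; WithState and WithCapture are illustrative names, and the code is a C# 9+ top-level program. Given the same seed, they produce identical strings. The difference is invisible in the output: because the capturing lambda uses alphabet and random, each call to WithCapture allocates a compiler-generated closure object and a new delegate.

```csharp
using System;

string letters = "abcdefghijklmnopqrstuvwxyz";
string viaState = WithState(letters, new Random(42), 10);
string viaCapture = WithCapture(letters, new Random(42), 10);
Console.WriteLine(viaState == viaCapture);   // True

// As in listing 13.23: the state travels through string.Create, nothing is captured.
string WithState(string alphabet, Random random, int length) =>
    string.Create(length, (alphabet, random), (span, state) =>
    {
        for (int i = 0; i < span.Length; i++)
        {
            span[i] = state.alphabet[state.random.Next(state.alphabet.Length)];
        }
    });

// Same output, but the lambda captures the parameters, forcing a closure allocation per call.
string WithCapture(string alphabet, Random random, int length) =>
    string.Create(length, 0, (span, _) =>
    {
        for (int i = 0; i < span.Length; i++)
        {
            span[i] = alphabet[random.Next(alphabet.Length)];
        }
    });
```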

At this point, your simple string generation method is as good as it’s going to get: it requires a single heap allocation and no extra data copying. Your code just writes straight into the string data.

This is just one example of the kind of thing that Span<T> makes possible. Related types exist; ReadOnlySpan<T>, Memory<T>, and ReadOnlyMemory<T> are the most important ones. A full deep-dive into them is beyond the scope of this book.
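A brief sketch of two of those related types: ReadOnlySpan<char> gives a read-only view into part of a string without allocating a new string, and Memory<T> refers to a section of memory much like a span but isn’t ref-like, so it can be captured by a lambda or stored in a field. This is a C# 9+ top-level program.

```csharp
using System;

string text = "hello world";
ReadOnlySpan<char> word = text.AsSpan(6);   // a view of "world"; no new string yet
Console.WriteLine(word.Length);             // 5
// word[0] = 'W';   // doesn't compile: the indexer is read-only

int[] numbers = { 1, 2, 3, 4, 5 };
Memory<int> window = numbers.AsMemory(1, 3);     // not ref-like
Func<int> firstOfWindow = () => window.Span[0];  // capturing a Memory<T> is fine
Console.WriteLine(firstOfWindow());              // 2
```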

Importantly, our optimization of the Generate method didn’t need to change its signature at all. It was a pure implementation change isolated from anything else, and that’s what makes me excited. Although passing large structs by reference throughout your codebase would help avoid excessive copying, that’s an invasive optimization. I far prefer optimizations that I can perform in a piecemeal, targeted fashion.

Just as string gains extra methods to make use of spans, so will many other types. We now take it for granted that any I/O-based operation will have an async option available in the framework, and I expect the same to be true for spans over time; wherever they’d be useful, they’ll be available. I expect third-party libraries will offer overloads accepting spans, too.

Stackalloc with initializers (C# 7.3)

While we’re on the subject of stack allocation, C# 7.3 adds one extra twist: initializers. Whereas with previous versions you could use stackalloc only with a size you wanted to allocate, with C# 7.3 you can specify the content of the allocated space as well. This is valid for both pointers and spans (the pointer form still requires an unsafe context):

Span<int> span = stackalloc int[] { 1, 2, 3 };
int* pointer = stackalloc int[] { 4, 5, 6 };

I don’t believe this has any significant efficiency gains over allocating and then manually populating the space, but it’s certainly simpler code to read.
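A small sketch comparing the two forms with spans, written as a C# 9+ top-level program: both spans end up with the same contents, but the initializer version says so in a single expression.

```csharp
using System;

// C# 7.3: allocate and initialize in one expression.
Span<int> primes = stackalloc int[] { 2, 3, 5, 7 };

// The earlier equivalent: allocate, then populate by hand.
Span<int> manual = stackalloc int[4];
manual[0] = 2;
manual[1] = 3;
manual[2] = 5;
manual[3] = 7;

// Compare element by element.
bool same = primes.Length == manual.Length;
for (int i = 0; i < primes.Length; i++)
{
    same &= primes[i] == manual[i];
}
Console.WriteLine(same);   // True
```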

Pattern-based fixed statements (C# 7.3)

As a reminder, the fixed statement is used to obtain a pointer to memory, temporarily preventing the garbage collector from moving that data. Before C# 7.3, this could be used only with arrays, strings, and taking the address of a variable. C# 7.3 allows it to be used with any type that has an accessible method called GetPinnableReference that returns a reference to an unmanaged type. For example, if value is of a type whose GetPinnableReference method returns a ref int, you can use it in a fixed statement like this:

fixed (int* ptr = value)      1
{
                              2
}

  • 1 Calls value.GetPinnableReference
  • 2 Code using the pointer

This isn’t something most developers would normally implement themselves, even within the small proportion of developers who use unsafe code on a regular basis. As you might expect, the types you’re most likely to use this with are Span<T> and ReadOnlySpan<T>, allowing them to interoperate with code that already uses pointers.

13.6.3. IL representation of ref-like structs

Ref-like structs are decorated with an [IsByRefLikeAttribute] attribute, which again comes from the System.Runtime.CompilerServices namespace. If you’re targeting a version of the framework that doesn’t have the attribute available, it’ll be generated in your assembly.

Unlike with in parameters, the compiler doesn’t use the modreq modifier to require any tools consuming the type to be aware of it; instead, it also adds an [ObsoleteAttribute] to the type with a fixed message. Older compilers that know nothing about ref-like structs therefore see an obsolete type and refuse to let code use it. Any compiler that understands [IsByRefLikeAttribute] can ignore the [ObsoleteAttribute] if it has the right text. If the type author wants to make the type obsolete, they just use [ObsoleteAttribute] as normal, and the compiler will treat it as any other obsolete type.
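At execution time, the runtime reports whether a type is ref-like through the Type.IsByRefLike property (available from .NET Core 2.1). This sketch, a C# 9+ top-level program, checks it for a few framework types.

```csharp
using System;

bool spanIsRefLike = typeof(Span<int>).IsByRefLike;
bool readOnlySpanIsRefLike = typeof(ReadOnlySpan<char>).IsByRefLike;
bool memoryIsRefLike = typeof(Memory<int>).IsByRefLike;   // Memory<T> is an ordinary struct

Console.WriteLine($"{spanIsRefLike} {readOnlySpanIsRefLike} {memoryIsRefLike}");   // True True False
```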

Summary

  • C# 7 adds support for pass-by-reference semantics in many areas of the language.
  • C# 7.0 included only the first few features; use C# 7.3 for the full range.
  • The primary aim of the ref-related features is for performance. If you’re not writing performance-critical code, you may not need to use many of these features.
  • Ref-like structs allow the introduction of new abstractions in the framework, starting with Span<T>. These abstractions aren’t just for high-performance scenarios; they’re likely to affect a large proportion of .NET developers over time.