Chapter 2. What Can We Do Already?

Some of the code and concepts discussed in this chapter may seem trivial to some, but bear with me. I don’t want to introduce too much too soon. More experienced developers might like to skip ahead to Chapter 3, in which I talk about the more recent developments in C# for functional programmers, or Chapter 4, where I demonstrate some novel ways to use features you might already be familiar with to achieve some functional features.

This chapter presents the FP features that are possible in just about every C# codebase in use in production today. I’m going to assume at least .NET Framework 3.5, and with some minor alterations, all the code samples provided in this chapter will work in that environment. Even if you work in a more recent version of .NET but are unfamiliar with FP, I still recommend reading this chapter, as it should give you a decent starting point in programming with the functional paradigm.

Those of you familiar with functional code, who just want to see what’s available in the latest versions of .NET, might benefit from skipping ahead to the next chapter.

Getting Started

FP is easy, really it is! Despite what many people think, it’s easier to learn than OOP. FP has fewer new concepts to learn.

If you don’t believe me, try explaining polymorphism to a nontechnical member of your family! Those of us who are comfortable with object orientation have often been doing it so long that we’ve forgotten how hard it may have been to get our heads around it at the beginning.

FP isn’t hard to understand at all, just different. I’ve spoken to plenty of students coming out of higher education who embrace it with enthusiasm. So, if they can manage it…​

The myth does seem to persist, though, that getting into FP requires learning a whole load of stuff first. But what if I told you that if you’ve been doing C# for any length of time, you’ve already most likely been writing functional code for a while? In the next section, I’ll show you what I mean.

Writing Your First Functional Code

Before we start with some functional code, let’s look at a bit of nonfunctional code. This is a style you most likely learned somewhere near the beginning of your C# career.

A Nonfunctional Film Query

In this quick, made-up example, we’re getting a list of all films from an imaginary data store and creating a new list, copied from the first, but only those items in the Action genre:1

public IEnumerable<Film> GetFilmsByGenre(string genre)
{
    var allFilms = GetAllFilms();
    var chosenFilms = new List<Film>();

    foreach (var f in allFilms)
    {
        if (f.Genre == genre)
        {
            chosenFilms.Add(f);
        }
    }
    return chosenFilms;
}

var actionFilms = GetFilmsByGenre("Action");

What’s wrong with this code? At the very least, it’s not elegant. We’ve written a lot to do something fairly simple.

We’ve also instantiated a new object that’s going to stay in scope for as long as this function is running. If there’s nothing more to the whole function than this, we don’t have much to worry about. But what if this were just a short excerpt from a long function? In that instance, the allFilms and chosenFilms variables would both remain in scope, and thus in memory, all that time, even if they aren’t in use.

Copies of all the data may not necessarily be held within the item that’s being replicated, depending on whether it’s a class, a struct, or something else. At the very least, though, a duplicate set of references is being held unnecessarily in memory for as long as both items are in scope. That’s still more memory than we strictly need to hold.

We’re also forcing the order of operations. We’ve specified when to loop, when to add—both where and when each step should be carried out. If any intermediate steps in the data transformations needed to be carried out, we’d be specifying them too and holding them in yet more potentially long-life variables.

We could solve a few problems with a yield return like this:

public IEnumerable<Film> GetFilmsByGenre(string genre)
{
    var allFilms = GetAllFilms();

    foreach (var f in allFilms)
    {
        if (f.Genre == genre)
        {
            yield return f;
        }
    }
}

var actionFilms = GetFilmsByGenre("Action");

This hasn’t done more than shave a few lines off, however.

What if there were a more optimal order of operations than the one we’ve decided on? What if a later bit of code actually meant that we don’t end up returning the contents of actionFilms? We’d have done the work unnecessarily.

This is the eternal problem of procedural code. Everything has to be spelled out. One of the major aims of FP is to move away from all that. Stop being so specific about every little thing. Relax a little and embrace declarative code.

A Functional Film Query

So what would the preceding code sample look like written in a functional style? I hope many of you might already guess at how we could rewrite it:

public IEnumerable<Film> GetFilmsByGenre(
    IEnumerable<Film> source,
    string genre) =>
    source.Where(x => x.Genre == genre);

var allFilms = GetAllFilms();
var actionFilms = GetFilmsByGenre(allFilms, "Action");

You might at this point be saying, “Isn’t that just LINQ?” And yes, yes, it is. I’ll let you in on a little secret—LINQ follows the functional paradigm.

Just quickly, for anyone not yet familiar with the awesomeness of LINQ: it’s a library that’s been part of C# since version 3.0, and it provides a rich set of functions for filtering, transforming, and aggregating collections of data. Functions like Select(), Where(), and All() come from LINQ and are in everyday use all around the world.
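To make that concrete, here’s a quick tour of those three functions, using an invented array of film titles:

```csharp
using System;
using System.Linq;

// A made-up array of film titles to experiment on
var films = new[] { "Alien", "Aliens", "Amelie", "Delicatessen" };

// Where: filter down to the items matching a condition
var startingWithA = films.Where(x => x.StartsWith("A"));

// Select: transform every item into something new
var lengths = films.Select(x => x.Length);

// All: ask whether every item satisfies a condition
var noneEmpty = films.All(x => x.Length > 0);

Console.WriteLine(string.Join(", ", startingWithA)); // Alien, Aliens, Amelie
Console.WriteLine(string.Join(", ", lengths));       // 5, 6, 6, 12
Console.WriteLine(noneEmpty);                        // True
```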

Think back for a moment to the list of features of FP and see how many LINQ implements:

Higher-order functions

The lambda expressions passed to LINQ functions are all functions being passed in as parameter variables.

Immutability

LINQ doesn’t change the source array; it returns a new enumerable based on the old one.

Expressions instead of statements

We’ve eliminated the use of a foreach and an if.

Referential transparency

The preceding lambda expression does conform to referential transparency (i.e., no side effects), though nothing enforces that. We could easily have referenced a string variable outside the lambda. By requiring the source data to be passed in as a parameter, we’re also making it easier to test without requiring the creation and setup of a mock of some kind to represent the data store connection. Everything the function needs is provided by its own parameters.
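As a minimal sketch of the distinction, with invented data, compare a lambda that depends only on its own parameter with one that reads and mutates state outside itself:

```csharp
using System;
using System.Linq;

var numbers = new[] { 1, 2, 3 };

// Referentially transparent: depends only on its parameter
var doubled = numbers.Select(x => x * 2).ToArray();

// Not referentially transparent: reads and mutates outside state,
// so enumerating it twice would give different results
var counter = 0;
var tainted = numbers.Select(x => x + counter++).ToArray();

Console.WriteLine(string.Join(", ", doubled)); // 2, 4, 6
Console.WriteLine(string.Join(", ", tainted)); // 1, 3, 5
```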

The iteration could well be done by recursion too, for all I know, but I have no idea what the source code of the Where() function looks like. In the absence of evidence to the contrary, I’m just going to go on believing that it does.

This tiny one-line code sample is a perfect example of the functional approach in many ways. We’re passing around functions to perform operations against a collection of data, creating a new collection based on the old one. What we’ve ended up with by following the functional paradigm is something more concise, easier to read, and therefore far easier to maintain.

Focusing on Results-Oriented Programming

A common feature of functional code is that it focuses much more heavily on the end result rather than on the process of getting there. An entirely procedural approach to building a complex object would be to instantiate it empty at the beginning of the code block, and then fill in each property as we go along:

var sourceData = GetSourceData();
var obj = new ComplexCustomObject();

obj.PropertyA = sourceData.Something + sourceData.SomethingElse;
obj.PropertyB = sourceData.Ping * sourceData.Pong;

if(sourceData.AlternateTuesday)
{
    obj.PropertyC = sourceData.CaptainKirk;
    obj.PropertyD = sourceData.MrSpock;
}
else
{
    obj.PropertyC = sourceData.CaptainPicard;
    obj.PropertyD = sourceData.NumberOne;
}

return obj;

The problem with this approach is that it’s open to abuse. This silly little imaginary code block is short and easy to maintain. What often happens with production code, however, is that the code can end up becoming incredibly long, with multiple data sources that all have to be preprocessed, joined, and reprocessed. We can end up with long blocks of if statements nested in if statements, to the point that the code starts resembling the shape of a family tree.

For each nested if statement, the complexity effectively doubles. This is especially true if multiple return statements are scattered around the codebase. The risk increases of inadvertently ending up with a null or other unexpected value if the increasingly complex codebase isn’t thought through in detail. FP discourages structures like this and isn’t prone to this level of complexity, or to the potential unexpected consequences.

Our preceding code sample defines PropertyC and PropertyD in two places. The code is not too hard to work with here, but I’ve seen examples that define the same property in around half a dozen places across multiple classes and subclasses.2 I don’t know whether you’ve ever had to work with code like this, but it has happened to me an awful lot.

These sorts of large, unwieldy codebases get harder to work with over time. With each addition, the speed at which the developers can do the work goes down, and the business leaders can end up getting frustrated because they don’t understand why their “simple” update is taking so long.

Functional code should ideally be written into small, concise blocks, focusing entirely on the end product. The expressions it prefers are modeled on the individual steps required to solve a mathematical problem, so you really want to write functional code like small formulas, each precisely defining a value and all the variables that make it up. There shouldn’t be any hunting up and down the codebase to work out where a value comes from.

Here’s an example:

public ComplexCustomObject MakeObject(SourceData source) =>
    new ComplexCustomObject
    {
       PropertyA = source.Something + source.SomethingElse,
       PropertyB = source.Ping * source.Pong,
       PropertyC = source.AlternateTuesday
            ? source.CaptainKirk
            : source.CaptainPicard,
       PropertyD = source.AlternateTuesday
            ? source.MrSpock
            : source.NumberOne
    };

I know we’re repeating the AlternateTuesday flag, but now all the variables that determine a returned property are defined in a single place. This approach makes the code much simpler to work with in the future.

If a property is so complicated that it will need either multiple lines of code or a series of LINQ operations that take up a lot of space, I’d create a break-out function to contain that complex logic. I’d still have my central, result-based return at the heart of it all, though.
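Here’s a hedged sketch of that break-out pattern, using tuples and invented names in place of the imaginary SourceData and ComplexCustomObject types:

```csharp
using System;

// Invented stand-in for the SourceData type from the example above
var source = (Ping: 7m, Pong: 3m, AlternateTuesday: true);

// Any logic too complex for a single expression gets its own small,
// pure break-out function...
decimal CalculatePropertyB((decimal Ping, decimal Pong, bool AlternateTuesday) s) =>
    s.AlternateTuesday ? s.Ping * s.Pong : s.Ping - s.Pong;

// ...while the central, result-based return stays small and readable
var result = (
    PropertyA: source.Ping + source.Pong,
    PropertyB: CalculatePropertyB(source));

Console.WriteLine($"{result.PropertyA}, {result.PropertyB}"); // 10, 21
```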

Understanding Enumerables

I sometimes think enumerables are one of the most underused and least understood features of C#. An enumerable is the most abstract representation of a collection of data—so abstract that it doesn’t contain any data itself, but just a description held in memory of how to go about getting the data. An enumerable doesn’t even know the number of items available until it iterates through everything—all it knows is where the current item is and how to iterate to the next.

This process is called lazy evaluation, or deferred execution. Being lazy is a good thing in development. Don’t let anyone tell you otherwise.3

In fact, we can even write our own customized behavior for an enumerable. Under the surface is an object called an enumerator. By interacting with that, we can either get the current item or iterate on to the next. We can’t use the enumerable or the enumerator to determine the length of the list, and the iteration works in only a single direction.
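To see what that enumerator interaction looks like, here’s a small sketch that drives one by hand; this is roughly what a foreach loop compiles down to behind the scenes:

```csharp
using System;
using System.Collections.Generic;

IEnumerable<int> numbers = new[] { 10, 20, 30 };

var seen = new List<int>();
using (var enumerator = numbers.GetEnumerator())
{
    while (enumerator.MoveNext())     // advance to the next item...
    {
        seen.Add(enumerator.Current); // ...then read the current one
    }
}

Console.WriteLine(string.Join(", ", seen)); // 10, 20, 30
```

Notice there’s no Count or reverse iteration available here; all the enumerator offers is “current item” and “move next.”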

Have a look at the following code sample. First, a set of simple logging functions pop a message in a List of strings:

private IList<string> c = new List<string>();


public int DoSomethingOne(int x)
{
    c.Add(DateTime.Now + " - DoSomethingOne (" + x + ")");
    return x;
}

public int DoSomethingTwo(int x)
{
    c.Add(DateTime.Now + " - DoSomethingTwo (" + x + ")");
    return x;
}

public int DoSomethingThree(int x)
{
    c.Add(DateTime.Now + " - DoSomethingThree (" + x + ")");
    return x;
}

Then a bit of code calls each of those DoSomething() functions in turn with different data:

var input = new[]
{
    75,
    22,
    36
};

var output = input.Select(x => DoSomethingOne(x))
    .Select(x => DoSomethingTwo(x))
    .Select(x => DoSomethingThree(x))
    .ToArray();

What do you think the order of operations is? You might think that the runtime would take the original input array, apply DoSomethingOne() to all three elements to create a second array, then again with all three elements into DoSomethingTwo(), and so on.

As it turns out, though, that’s not what happens. If we were to examine the content of that List of strings, we’d find something like this:

18/08/1982 11:24:00 - DoSomethingOne(75)
18/08/1982 11:24:01 - DoSomethingTwo(75)
18/08/1982 11:24:02 - DoSomethingThree(75)
18/08/1982 11:24:03 - DoSomethingOne(22)
18/08/1982 11:24:04 - DoSomethingTwo(22)
18/08/1982 11:24:05 - DoSomethingThree(22)
18/08/1982 11:24:06 - DoSomethingOne(36)
18/08/1982 11:24:07 - DoSomethingTwo(36)
18/08/1982 11:24:08 - DoSomethingThree(36)

It’s almost exactly what we might get if we were running this through a for or foreach loop, but we’ve effectively handed over control of the order of operations to the runtime. We’re not concerned with the nitty-gritty of temporary holding variables, or what goes where and when. Instead, we’re just describing the operations we want and expecting a single answer back at the end.

The resulting list of strings might not always look exactly like this; it depends on what the code that interacts with the enumerable (via LINQ or a foreach) looks like. But the intent always remains that enumerables actually produce their data only at the precise moment it’s needed. It doesn’t matter where they’re defined; it’s when they’re used that makes a difference.

By using enumerables instead of solid arrays, we’ve managed to implement some of the behaviors we need to write declarative code.

Incredibly, the preceding logfile would still look the same if we were to rewrite the code like this:

var input = new[]
{
    75,
    22,
    36
};

var temp1 = input.Select(x => DoSomethingOne(x));
var temp2 = temp1.Select(x => DoSomethingTwo(x));
var finalAnswer = temp2.Select(x => DoSomethingThree(x));

temp1, temp2, and finalAnswer are all enumerables, and none of them will contain any data until iterated.

Here’s an experiment to try. Write some code like this sample; don’t copy it exactly, maybe use something simpler, like a series of Selects amending an integer value somehow. Put in a breakpoint and step through until the line assigning finalAnswer has been passed, then hover over finalAnswer in Visual Studio. You’ll most likely find that it can’t display any data to you, even though the line has been executed. That’s because the enumerable hasn’t performed any of the operations yet.
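A minimal version of that experiment might look like the following; the counter is invented purely to make the laziness visible:

```csharp
using System;
using System.Linq;

var lambdaCalls = 0;

// A chain of Selects amending an integer value
var finalAnswer = new[] { 1, 2, 3 }
    .Select(x => { lambdaCalls++; return x * 2; })
    .Select(x => x + 1);

// The line above has been "passed", yet nothing has executed:
Console.WriteLine(lambdaCalls); // 0

// Only when we enumerate does the work actually happen:
Console.WriteLine(string.Join(", ", finalAnswer)); // 3, 5, 7
Console.WriteLine(lambdaCalls); // 3
```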

Things would change if we did something like this:

var input = new[]
{
    75,
    22,
    36
};

var temp1 = input.Select(x => DoSomethingOne(x)).ToArray();
var temp2 = temp1.Select(x => DoSomethingTwo(x)).ToArray();
var finalAnswer = temp2.Select(x => DoSomethingThree(x)).ToArray();

Because we’re now specifically calling ToArray() to force an enumeration of each intermediate step, we really will call DoSomethingOne() for each item in input before moving on to the next step.

The logfile will look something like this now:

18/08/1982 11:24:00 - DoSomethingOne(75)
18/08/1982 11:24:01 - DoSomethingOne(22)
18/08/1982 11:24:02 - DoSomethingOne(36)
18/08/1982 11:24:03 - DoSomethingTwo(75)
18/08/1982 11:24:04 - DoSomethingTwo(22)
18/08/1982 11:24:05 - DoSomethingTwo(36)
18/08/1982 11:24:06 - DoSomethingThree(75)
18/08/1982 11:24:07 - DoSomethingThree(22)
18/08/1982 11:24:08 - DoSomethingThree(36)

For this reason, I nearly always advocate for waiting as long as possible before using ToArray() or ToList(),4 because this way we can leave the operations unperformed for as long as possible. And potentially, the operations may never be performed if later logic prevents the enumeration from occurring at all.
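Here’s a small, invented illustration of that saving: First() stops the chain as soon as it finds a match, so the later elements are never processed at all:

```csharp
using System;
using System.Linq;

var processedCount = 0;

// Only as many elements as First() needs are ever transformed
var firstBigNumber = new[] { 5, 500, 50, 5000 }
    .Select(x => { processedCount++; return x * 10; })
    .First(x => x > 1000);

Console.WriteLine(firstBigNumber); // 5000
Console.WriteLine(processedCount); // 2 - the last two items were never touched
```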

Exceptions exist for performance or for avoiding multiple iterations. While the enumerable remains un-enumerated, it doesn’t have any data, but the operation itself remains in memory. If you pile too many enumerables on top of one another—especially if you start performing recursive operations—you might find that you fill up far too much memory and performance takes a hit, and possibly even end up with a stack overflow.

Preferring Expressions to Statements

In the rest of this chapter, I’m going to give more examples of how you can use LINQ more effectively to avoid the need for statements like if, while, and for, or to mutate state (i.e., change the value of a variable). In some situations, it won’t be possible to replace these statements with out-of-the-box C#. But that’s what the rest of this book is for.

The Humble Select

If you’ve read this far in the book, you’re most likely aware of Select() statements and how to use them. However, most people I speak to don’t seem to be aware of a few of their features, and they’re all things that can be used to make code a little more functional.

The first feature is something I’ve already shown in the previous section: we can chain them. We can either create a series of Select() function calls—literally one after the other, or in a single code line—or we can store the results of each Select() in a different local variable. Functionally, these two approaches are identical. It doesn’t even matter if we call ToArray() after each one. So long as we don’t modify any resulting arrays or the object contained within them, we’re following the functional paradigm.

The important approach to get away from is the imperative practice of defining a List, looping through the source objects with a foreach, and then adding each new item to the List. This is long-winded, harder to read, and honestly quite tedious. Why do things the hard way? Just use a nice, simple Select() statement.

Iterator value is required

So what if we’re Selecting an enumerable into a new form and need the iterator as part of the transformation? Say we have something like this:

var films = GetAllFilmsForDirector("Jean-Pierre Jeunet")
    .OrderByDescending(x => x.BoxOfficeRevenue);

var i = 1;

Console.WriteLine("The films of visionary French director");
Console.WriteLine("Jean-Pierre Jeunet in descending order");
Console.WriteLine("of financial success are as follows:");

foreach (var f in films)
{
    Console.WriteLine($"{i} - {f.Title}");
    i++;
 }

Console.WriteLine("But his best by far is Amelie");

We can use a feature of Select() statements that surprisingly few people know about: they have an overload that allows access to the iterator as part of the Select(). All we have to do is provide a lambda expression with two parameters, the second being an integer that represents the index position of the current item.

This is how our functional version of the code looks:

var films = GetAllFilmsForDirector("Jean-Pierre Jeunet")
    .OrderByDescending(x => x.BoxOfficeRevenue);

Console.WriteLine("The films of visionary French director");
Console.WriteLine("Jean-Pierre Jeunet in descending order");
Console.WriteLine("of financial success are as follows:");

var formattedFilms = films.Select((x, i) => $"{i} - {x.Title}");
Console.WriteLine(string.Join(Environment.NewLine, formattedFilms));

Console.WriteLine("But his best by far is Amelie");

Using these techniques, there are very few circumstances in which you’d ever need to use a foreach loop with a List. Thanks to C#’s support for the functional paradigm, declarative methods are nearly always available to solve problems.

The two techniques for getting the i index position are a great example of imperative versus declarative code. The imperative, object-oriented approach has the developer manually creating a variable to hold the value of i and explicitly choosing where it should be incremented. The declarative code isn’t concerned with where the variable is defined or with how each index value is determined.

Note

Notice that we use string.Join to link the strings together. This is not only another one of those hidden gems of the C# language, but also an example of aggregation—that is, converting a list of things into a single thing. That’s what we’ll walk through in the next few sections.
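As a one-line illustration, with invented data:

```csharp
using System;

// string.Join aggregates many strings into a single string
var crew = new[] { "Kirk", "Spock", "McCoy" };
var oneLine = string.Join(" & ", crew);
Console.WriteLine(oneLine); // Kirk & Spock & McCoy
```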

No starting array

The last trick for getting the value of i for each iteration is great if there’s an array—or collection of another kind—available in the first place. What if there isn’t an array? What if we need to arbitrarily iterate for a set number of times?

For these somewhat rare situations, the procedural answer would be a good, old-fashioned for loop instead of a foreach. But how do we create an array from nothing, declaratively? Our two best friends in this case are two static methods: Enumerable.Range and Enumerable.Repeat.

Range takes a starting integer value and the number of elements our collection should have, and generates a sequence of integers to those specifications. Here’s an example:

var a = Enumerable.Range(8, 5);
var s = string.Join(", ", a);
// s = "8, 9, 10, 11, 12"
// That's 5 elements, each 1 higher than the last,
// starting with 8.

Having whipped up an array, we can then apply LINQ operations to get our final result. Let’s imagine I am preparing a description of the 9 times table for one of my daughters:5

var nineTimesTable = Enumerable.Range(1,10)
    .Select(x => x + " times 9 is " + (x * 9));

var message = string.Join(Environment.NewLine, nineTimesTable);

Here’s another example: what if we want to get all the values from a grid, where an x and a y coordinate are required to fetch each value? Imagine there’s a grid repository that we can use to get values.

Imagining that the grid is a 5 × 5, this is how we’d get every value:

var coords = Enumerable.Range(1, 5)
    .SelectMany(x => Enumerable.Range(1, 5)
        .Select(y => (X: x, Y: y))
);

var values = coords.Select(c => this.gridRepo.GetVal(c.X, c.Y));

The first line is generating an array of integers with the values [1, 2, 3, 4, 5]. We then use another Select() to convert each of these integers into another array using another call to Enumerable.Range. We now have an array of five elements, each of which is itself an array of five integers. Using a Select() on that nested array, we convert each of those subelements into a tuple that takes one value from the parent array (x) and one from the subarray (y). SelectMany() is used to flatten the whole thing out to a simple list of all the possible coordinates, which would look something like this: (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2)…​and so on.

Values can be obtained by Selecting this array of coordinates into a set of calls to the repository’s GetVal() function, passing in the values of X and Y from the tuple of coordinates created on the previous line.

In another situation, we might need the same starting value in each case, but need to transform it in different ways, depending on the position within the array. This is where Enumerable.Repeat comes in. Enumerable.Repeat creates an array—referenced as an enumerable—of the requested size where every element has the same user-supplied value.
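For instance, a quick sketch of what Repeat produces:

```csharp
using System;
using System.Linq;

// Enumerable.Repeat: the same value, as many times as requested
var fives = Enumerable.Repeat(5, 5);
Console.WriteLine(string.Join(", ", fives)); // 5, 5, 5, 5, 5
```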

We can’t use Enumerable.Range to count backward. Say we want to do the previous example, but start at (5,5) and move backward to (1,1). Here’s an example of how to do it:

var gridCoords = Enumerable.Repeat(5, 5).Select((x, i) => x - i)
    .SelectMany(x => Enumerable.Repeat(5, 5)
        .Select((y, i) => (X: x, Y: y - i))
);

var values = gridCoords.Select(c => this.gridRepo.GetVal(c.X, c.Y));

This looks a lot more complicated but isn’t really. What we’ve done is to swap out the Enumerable.Range call for a two-step operation.

First, a call to Enumerable.Repeat is repeating the integer value of 5 five times. This results in an array like this: [5, 5, 5, 5, 5].

We then select using the overloaded version of Select() that includes the value of i, and deduct that i value from the current value in the array. In the first iteration, the return value is the current value from the array (5) minus the index i (0, in this case), which gives back simply 5. In the next iteration, the value of i is 1, so 5 – 1 means 4 is returned. And so on.

At the end, we get back an array that looks something like this: (5, 5), (5, 4), (5, 3), (5, 2), (5, 1), (4, 5), (4, 4)…​etc.

We can take this further still, but for this chapter we’re sticking to the relatively simple cases, ones that don’t require hacking around with C#. This is all out-of-the-box functionality that anyone can use right away.

Many to One: The Subtle Art of Aggregation

We’ve looked at loops for converting one thing into another, X items in → X new items out. Now I’d like to cover another use case for loops: reducing many items into a single value.

This could be making a total count; calculating averages, means, or other statistical data; or other more complex aggregations. In procedural code, we’d have a loop and a state-tracking value, and inside the loop we’d update the state constantly, based on each item from our array. Here’s a simple example of what I’m talking about:

var total = 0;
foreach(var x in listOfIntegers)
{
    total += x;
}

LINQ has a built-in method for doing this:

var total = listOfIntegers.Sum();

We really shouldn’t ever need to do this sort of operation longhand. Even if we’re creating the sum of a particular property from an array of objects, LINQ still has us covered:

var films = GetAllFilmsForDirector("Alfred Hitchcock");
var totalRevenue = films.Sum(x => x.BoxOfficeRevenue);

Another function for calculating means in the same manner is called Average(). There’s nothing for calculating median, so far as I’m aware.
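For completeness, a quick sketch of Average() with made-up numbers:

```csharp
using System;
using System.Linq;

// Average() works in just the same way as Sum()
var runtimes = new[] { 109, 116, 138 };
Console.WriteLine(runtimes.Average()); // 121
```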

We could calculate the median with a quick bit of functional-style code, however. It would look like this:

var numbers = new []
{
    83,
    27,
    11,
    98
};

bool IsEvenNumber(int number) => number % 2 == 0;

var sortedList = numbers.OrderBy(x => x).ToArray();
var sortedListCount = sortedList.Length;

var median = IsEvenNumber(sortedListCount)
    ? sortedList.Skip((sortedListCount / 2) - 1).Take(2).Average()
    : sortedList.Skip(sortedListCount / 2).First();

// median = 55.

More complex aggregations are required sometimes. What if we want, for example, a sum of two values from an enumerable of complex objects? Procedural code might look like this:

var films = GetAllFilmsForDirector("Christopher Nolan");

var totalBudget = 0.0M;
var totalRevenue = 0.0M;

foreach (var f in films)
{
    totalBudget += f.Budget;
    totalRevenue += f.BoxOfficeRevenue;
}

We could use two separate Sum() function calls, but then we’d be iterating twice through the enumerable, hardly an efficient way to get our information. Instead, we can use another strangely little-known feature of LINQ: the Aggregate() function. This consists of the following components:

Seed

A starting value for the running total.

Aggregator function

This has two parameters: the current running total, and the current item from the enumerable we’re aggregating down.

The seed doesn’t have to be a primitive type, like an integer; it can just as easily be a complex object. To rewrite the preceding code sample in a functional style, however, we just need a simple tuple:

var films = GetAllFilmsForDirector("Christopher Nolan");

var (totalBudget, totalRevenue) = films.Aggregate(
    (Budget: 0.0M, Revenue: 0.0M),
    (runningTotals, x) => (
            runningTotals.Budget + x.Budget,
            runningTotals.Revenue + x.BoxOfficeRevenue
        )
);

In the right place, Aggregate() is an incredibly powerful feature of C#, and one worth taking the time to explore and understand properly. It’s also an example of another concept important to FP: recursion.
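As one further hedged sketch of its power, here’s a single-pass mean calculation, carrying both a count and a sum in the seed tuple (the data is invented):

```csharp
using System;
using System.Linq;

var revenues = new[] { 100m, 250m, 400m };

// One pass through the data: the seed tuple tracks a count and a sum
var (count, total) = revenues.Aggregate(
    (Count: 0, Total: 0m),
    (acc, x) => (acc.Count + 1, acc.Total + x));

Console.WriteLine(total / count); // 250
```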

Customized Iteration Behavior

Recursion sits behind a lot of functional methods of iteration. For the benefit of anyone who doesn’t know, recursion is a programming technique that involves a function that calls itself repeatedly until a certain condition is met.

Recursion is a powerful technique but has some limitations to bear in mind in C#. The most important two are as follows:

  • If developed improperly, code using recursion can lead to infinite loops, which will run until the user terminates the application or all available space on the stack is consumed. As Treguard, the legendary dungeon master of the popular British fantasy RPG game show Knightmare would put it: “Oooh, nasty.”6

  • In C#, recursion tends to consume a lot of memory compared to other forms of iteration. There are ways around this, but that’s a topic for another chapter.

I have a lot more to say about recursion, and we’ll get to that shortly, but for the purposes of this chapter, I’ll give the simplest example I can think of.

Let’s say we want to iterate through an enumerable but don’t know for how long. We have a list of delta values for an integer (i.e., the number to add or subtract each time) and want to find the number of steps required to get from the starting value (whatever that might be) to 0.

We could quite easily get the final value with an Aggregate() call, but we don’t want the final value. We’re interested in all the intermediate values and want to stop prematurely at some point mid-iteration. This is a simple arithmetic operation, but if complex objects were involved in a real-world scenario, performance might significantly improve because of the ability to terminate the process early.

In procedural code, we’d probably write something like this:

var deltas = GetDeltas().ToArray();
var startingValue = 10;
var currentValue = startingValue;
var i = -1;

foreach(var d in deltas)
{
    if(currentValue == 0)
    {
        break;
    }
    i++;
    currentValue += d;
}

return i;

In this example, we’re returning -1 to say that the starting value is already the one we’re looking for; otherwise, we’re returning the zero-based index of the array that resulted in 0 being reached.

This is how we’d do it recursively:

var deltas = GetDeltas().ToArray();

int GetFirstPositionWithValueZero(int currentValue, int i = -1) =>
    currentValue == 0
        ? i
        : GetFirstPositionWithValueZero(currentValue + deltas[i + 1], i + 1);

return GetFirstPositionWithValueZero(10);

This is functional now, but it’s not really ideal. Nested functions have their place, but I don’t personally find this code as readable as it could be. The code is delightfully recursive, but it could be made clearer.

The other major problem is that this won’t scale up well if the list of deltas is large. I’ll show you what I mean.

Let’s imagine that the deltas have only three values: 2, -12, and 9. We’d expect the answer to come back as 1, because the second position (i.e., index = 1) of the array results in 0 (10 + 2 – 12). We would also expect that the 9 will never be evaluated. That’s the efficiency saving we’re looking for from our code here.

What is actually happening with the recursive code, though?

First, it calls GetFirstPositionWithValueZero() with a current value of 10 (i.e., the starting value), and i is given the default of -1.

The body of the function is a ternary conditional expression. If 0 has been reached, the function returns i; otherwise, the code calls the function again with updated values for currentValue and i. That’s what happens with the first delta (2): GetFirstPositionWithValueZero() is called again with currentValue updated to 12, and i as 0.

The new value is not 0, so the second call to GetFirstPositionWithValueZero() will call itself again, this time with the current value updated with deltas[1] and i incremented to 1. Because deltas[1] is -12, the third call results in a 0, which means that i can simply be returned.

Here’s the problem, though. The third call gets an answer, but the first two calls are still open in memory and stored on the stack. The third call returns 1, which is passed up a level to the second call to GetFirstPositionWithValueZero(), which now also returns 1, and so on, until finally the original first call to GetFirstPositionWithValueZero() returns the 1.

If you want to see that a little graphically, imagine it looking something like this:

GetFirstPositionWithValueZero(10, -1)
    GetFirstPositionWithValueZero(12, 0)
        GetFirstPositionWithValueZero(0, 1)
        return 1;
    return 1;
return 1;

That’s fine with three items in our array, but what if there are hundreds? Recursion, as I’ve said, is a powerful tool, but it comes with a cost in C#. Purer functional languages (including F#) have a feature called tail call optimized recursion that allows the use of recursion without this memory usage problem.

Tail recursion is an important concept, and one I’m going to return to in Chapter 9, so I’m not going to dwell on it in any further detail here. As it stands, out-of-the-box C# doesn’t permit tail recursion, even though it’s available in the .NET common language runtime (CLR). We can try a few tricks to make it available to us, but they’re a little too complex for this chapter, so I’ll talk about them in Chapter 9 instead (be there, or be square). For now, consider recursion as it’s described here, and keep in mind that you might want to be careful about where and when you use it.
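In the meantime, there is a stack-safe, lazily evaluated way to get the same answer. LINQ has no built-in Scan (a running version of Aggregate that yields every intermediate accumulator), so the helper below is my own minimal sketch of one; with it, the search short-circuits at the first zero, the 9 is never read, and no call stack builds up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var deltas = new[] { 2, -12, 9 };

// Lazily produce the running totals from the starting value of 10,
// then find the index of the first total that hits 0. Deferred
// execution means later deltas are never even enumerated.
var answer = deltas
    .Scan(10, (acc, d) => acc + d)
    .Select((value, index) => (value, index))
    .First(x => x.value == 0)
    .index;
// answer is 1

static class Extensions
{
    // Like Aggregate, but yields each intermediate accumulator
    // instead of only the final one
    public static IEnumerable<T> Scan<T>(
        this IEnumerable<T> source, T seed, Func<T, T, T> f)
    {
        var acc = seed;
        foreach (var x in source)
        {
            acc = f(acc, x);
            yield return acc;
        }
    }
}
```

This doesn't handle the "starting value is already 0" case that the earlier versions signal with -1, but it shows the shape of the technique.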

Making Your Code Immutable

There’s more to FP in C# than just LINQ. Another important feature I’d like to discuss is immutability (i.e., a variable may not change value once declared). To what extent is immutability possible in C#?

First, C# 8 and upward provide some newer developments with regards to immutability. See Chapter 3 for that. For this chapter, I’m restricting myself to what is true of just about any version of .NET.

To begin, let’s consider this little C# snippet:

public class ClassA
{
    public string PropA { get; set; }
    public int PropB { get; set; }
    public DateTime PropC { get; set; }
    public IEnumerable<double> PropD { get; set; }
    public IList<string> PropE { get; set; }
}

Is this immutable? It very much is not. Any of those properties can be replaced with a new value via its setter. The IList also exposes methods that allow items to be added to, removed from, or replaced in its underlying collection.

We could make the setters private, meaning we’d have to instantiate the class via a detailed constructor:

public class ClassA
{
    public string PropA { get; private set; }
    public int PropB { get; private set; }
    public DateTime PropC { get; private set; }
    public IEnumerable<double> PropD { get; private set; }
    public IList<string> PropE { get; private set; }

    public ClassA(
        string propA,
        int propB,
        DateTime propC,
        IEnumerable<double> propD,
        IList<string> propE)
    {
        this.PropA = propA;
        this.PropB = propB;
        this.PropC = propC;
        this.PropD = propD;
        this.PropE = propE;
    }

}

Is it immutable now? No, honestly it’s not. It’s true that we can’t outright replace any of the properties with new objects outside ClassA, which is great. The properties can be replaced inside the class, but the developer can ensure that no such code is ever added. We should hopefully have some sort of code review system to ensure that as well.

PropA and PropC are fine; strings and DateTime are both immutable in C#. The int value of PropB is fine too; ints don’t have anything we can change except their value. Several problems still remain, however.
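To see what I mean about strings being safe, try this quick experiment: every string method hands back a brand-new string and leaves the original alone.

```csharp
using System;

var original = "Doctor Who";
var replaced = original.Replace("Who", "Strange");

// Replace didn't touch the original; it returned a new string
Console.WriteLine(original);  // Doctor Who
Console.WriteLine(replaced);  // Doctor Strange
```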

PropE is a List, which can still have values added, removed, and replaced, even though we can’t replace the entire object. If we didn’t need to hold a mutable copy of PropE, we could easily replace it with an IEnumerable or IReadOnlyList.

The IEnumerable<double> value of PropD seems fine at first glance, but what if it was passed to the constructor as a List<double>, which is still referenced by that type in the outside world? It would still be possible to alter its contents that way.
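One common defensive fix is to snapshot the incoming sequence in the constructor. Here's a sketch using a cut-down ClassA with just PropD: ToArray() copies the contents, so no outside reference to a mutable List can reach inside the class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<double> { 1.5, 2.5 };
var a = new ClassA(source);

// Mutating the caller's list after construction...
source.Add(99.9);

// ...doesn't touch the private snapshot inside ClassA
Console.WriteLine(a.PropD.Count());  // 2

public class ClassA
{
    public IEnumerable<double> PropD { get; }

    // ToArray() takes a snapshot, breaking the link to the
    // caller's (possibly mutable) collection
    public ClassA(IEnumerable<double> propD) =>
        PropD = propD.ToArray();
}
```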

There’s also the possibility of introducing something like this:

public class ClassA
{
    public string PropA { get; private set; }
    public int PropB { get; private set; }
    public DateTime PropC { get; private set; }
    public IEnumerable<double> PropD { get; private set; }
    public IList<string> PropE { get; private set; }
    public SubClassB PropF { get; private set; }

    public ClassA(
        string propA,
        int propB,
        DateTime propC,
        IEnumerable<double> propD,
        IList<string> propE,
        SubClassB propF)
    {
        this.PropA = propA;
        this.PropB = propB;
        this.PropC = propC;
        this.PropD = propD;
        this.PropE = propE;
        this.PropF = propF;
    }

}

All properties of PropF are also potentially going to be mutable, unless this same structure with private setters is followed there too.

What about classes from outside our codebase? What about Microsoft classes or those from a third-party NuGet package? There’s no way to enforce immutability.

Unfortunately, C# doesn’t provide any way to enforce universal immutability, not even in its most recent versions. Having a native C# method of ensuring immutability by default would be lovely, but there isn’t one—and isn’t ever likely to be for reasons of backward compatibility. My own solution is that when coding, I simply pretend that immutability exists in the project and never change any object. Nothing in C# provides any level of enforcement whatsoever, so you’d simply have to make a decision for yourself, or within your team, to act as if it does.
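In practice, "pretending" looks something like this (the class and method names here are my own invention): never set a property after construction; when you need a change, write a small copy method that returns a new instance with the change applied.

```csharp
using System;

var before = new Score("Sarah", 10);
var after = before.WithPoints(20);

// The original instance is unchanged; we got a new one instead
Console.WriteLine(before.Points);  // 10
Console.WriteLine(after.Points);   // 20

public class Score
{
    public string Player { get; }
    public int Points { get; }

    public Score(string player, int points) =>
        (Player, Points) = (player, points);

    // Instead of mutating, hand back a fresh copy with the change applied
    public Score WithPoints(int points) => new Score(Player, points);
}
```

Nothing stops a teammate from adding a setter later, of course; this is a convention, not an enforcement.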

Putting It All Together: A Complete Functional Flow

I’ve talked a lot about simple techniques we can use to make our code more functional right away. Now, I’d like to show a complete, if minute, application written to demonstrate an end-to-end functional process.

We’re going to write a simple CSV parser. In this example, we want to read in the complete text of a CSV file containing data about the first few series of Doctor Who.7 We want to read the data, parse it into a plain old C# object (POCO, a class containing only data and no logic), and then aggregate it into a report that counts the number of episodes, and the number of episodes known to be lost for each season.8 I’m simplifying CSV parsing for the purposes of this example; don’t worry about quotes around string fields, commas in field values, or any values requiring additional parsing. Third-party libraries are available for all of that! I’m just proving a point.

This complete process represents a nice, typical functional flow. Take a single item, break it into a list, apply list operations, and then aggregate back down into a single value again.
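Here's that shape in miniature, with a toy example of my own: one string in, one number out, with a list in the middle.

```csharp
using System;
using System.Linq;

// A single item in...
var text = "the quick brown fox";

var totalLetters = text
    .Split(' ')              // ...broken into a list...
    .Select(w => w.Length)   // ...a list operation applied...
    .Sum();                  // ...aggregated back down to a single value

Console.WriteLine(totalLetters);  // 16
```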

Table 2-1 shows the structure of our CSV file.

Table 2-1. CSV file structure
  • [0] Season Number: Integer value between 1 and 39. I’m running the risk of dating this book now, but there are 39 seasons to date.

  • [1] Story Name: A string field I don’t care about.

  • [2] Writer: Ditto.

  • [3] Director: Ditto.

  • [4] Number of Episodes: Until 1989, all Doctor Who stories were multipart serials comprising 1 to 14 episodes.

  • [5] Number of Missing Episodes: The number of episodes of this serial not known to exist. Any nonzero number is too many for me, but such is life.

We want to end up with a report that has just these fields:

  • Season Number

  • Total Episodes

  • Total Missing Episodes

  • Percentage Missing

Let’s get rolling with some code:

var text = File.ReadAllText(filePath);

// Split the string containing the whole contents of the
// file into an array, where each line of the original file
// (i.e., each record) is an array element
var splitLines = text.Split(Environment.NewLine);

// Split each line into an array of fields, splitting the
// source string by the ',' character. Convert to an array
// for easy indexed access.
var splitLinesAndFields = splitLines.Select(x => x.Split(",").ToArray());

// Convert each string array of fields into a data class.
// Parse any nonstring fields into the correct type.
// Not strictly necessary, based on the final aggregation
// that follows, but I believe in leaving behind easily
// extendible code
var parsedData = splitLinesAndFields.Select(x => new Story
{
    SeasonNumber = int.Parse(x[0]),
    StoryName = x[1],
    Writer = x[2],
    Director = x[3],
    NumberOfEpisodes = int.Parse(x[4]),
    NumberOfMissingEpisodes = int.Parse(x[5])
});

// group by SeasonNumber, this gives us an array of Story
// objects for each season of the TV series
var groupedBySeason = parsedData.GroupBy(x => x.SeasonNumber);

// Use a 3-field tuple as the aggregate state:
// S (int) = the season number.  Not required for
//                the aggregation, but we need a way
//                to pin each set of aggregated totals
//                to a season
// NumEps (int) = the total number of episodes in all
//                serials in the season
// NumMisEps (int) = The total number of missing episodes
//                from the season
var aggregatedReportLines = groupedBySeason.Select(x =>
    x.Aggregate((S: x.Key, NumEps: 0, NumMisEps: 0),
        (acc, val) => (acc.S,
            acc.NumEps + val.NumberOfEpisodes,
            acc.NumMisEps + val.NumberOfMissingEpisodes)
    )
);

// convert the tuple-based results set to a proper
// object and add in the calculated field PercentageMissing
// not strictly necessary, but makes for more readable
// and extendible code
var report = aggregatedReportLines.Select(x => new ReportLine
{
    SeasonNumber = x.S,
    NumberOfEpisodes = x.NumEps,
    NumberOfMissingEpisodes = x.NumMisEps,
    PercentageMissing = x.NumMisEps * 100.0 / x.NumEps
});

// format the report lines to a list of strings
var reportTextLines = report.Select(x =>
    $"{x.SeasonNumber}\t{x.NumberOfEpisodes}\t" +
    $"{x.NumberOfMissingEpisodes}\t{x.PercentageMissing}");

// join the lines into a large single string with New Line
// characters between each line
var reportBody = string.Join(Environment.NewLine, reportTextLines);
var reportHeader = "Season\tNo Episodes\tNo Missing Eps\tPercentage Missing";

// the final report consists of the header, a new line, then the report body
var finalReport = $"{reportHeader}{Environment.NewLine}{reportBody}";

In case you’re curious, the results would look something like this (the \t characters are tabs, which make the output a bit more readable):

Season   No Episodes   No Missing Eps   Percentage Missing
1         42               9                  21.4
2         39               2                  5.1
3         45               28                 62.2
4         43               33                 76.7
5         40               18                 45
6         44               8                  18.2
7         25               0                  0
8         25               0                  0
9         26               0                  0

...

We could have made the code sample more concise and written just about all of this together in one long, continuous fluent expression like this:

var reportTextLines = File.ReadAllText(filePath)
    .Split(Environment.NewLine)
    .Select(x => x.Split(",").ToArray())
    .GroupBy(x => x[0])
    .Select(x =>
        x.Aggregate((S: x.Key, NumEps: 0, NumMisEps: 0),
            (acc, val) => (acc.S,
                acc.NumEps + int.Parse(val[4]),
                acc.NumMisEps + int.Parse(val[5]))))
    .Select(x => $"{x.S},{x.NumEps},{x.NumMisEps},{x.NumMisEps * 100.0 / x.NumEps}");

var reportBody = string.Join(Environment.NewLine, reportTextLines);
var reportHeader = "Season,No Episodes,No Missing Eps,Percentage Missing";

var finalReport = $"{reportHeader}{Environment.NewLine}{reportBody}";

Nothing is wrong with this sort of approach, but I like splitting it into individual lines for a couple of reasons:

  • The variable names provide some insight into what our code is doing. We’re sort of semi-enforcing a form of code commenting.

  • It’s possible to inspect the intermediate variables, to see what’s in them at each step. This makes debugging easier because, as I said in the previous chapter, it’s like being able to look back on your work in a mathematics problem to see at which step you went wrong.

The two approaches don’t have any ultimate functional difference, nothing that would be noticed by the end user, so which style you adopt is more a matter of personal taste. Write in whatever way seems best to you. Do try to keep the code readable and easy for everyone to follow.

Taking It Further: Develop Your Functional Skills

Here’s a challenge for you. If some or all of the techniques described here are new to you, go off and have fun with them for a bit. Challenge yourself to write code following these rules:

  • Treat all variables as immutable: do not change any variable value after it’s set. Basically, treat everything as if it were a constant.

  • None of the following statements is permitted: if, for, foreach, while. Conditional logic is acceptable only as a ternary expression, i.e., the single-line expression in the style someBoolean ? valueOne : valueTwo.

  • Where possible, write your functions as small, concise arrow functions (aka lambda expressions).
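
To make the rules concrete, here's a tiny example of my own, summing the even squares from 1 to 10: Enumerable.Range stands in for the for loop, a ternary stands in for the if, and no variable is ever reassigned.

```csharp
using System;
using System.Linq;

// Squares of 1..10, keeping only the even ones in the sum
var sumOfEvenSquares = Enumerable.Range(1, 10)
    .Select(x => x * x)
    .Sum(x => x % 2 == 0 ? x : 0);

Console.WriteLine(sumOfEvenSquares);  // 220
```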

Either do this as part of your production code or go out and look for a code challenge site. Try Advent of Code or Project Euler, something you can get your teeth into.

If you don’t want to go through the bother of creating an entire solution for these exercises in Visual Studio, LINQPad can always provide a quick and easy way to rattle off some C# code.

After you have the hang of this, you’ll be ready to move on to the next step. I hope you’re having fun so far!

Summary

In this chapter, we looked at a variety of simple LINQ-based techniques for writing functional-style code immediately in any C# codebase running on at least .NET Framework 3.5. These features are evergreen: they’ve remained in place, unchanged, in every subsequent version of .NET.

We discussed the more advanced features of Select() statements, some of the less well-known features of LINQ, and techniques for aggregation and recursion.

In the next chapter, we’ll look at some of the most recent developments in C# that can be used in more up-to-date codebases.

1 I’m more of an SF (or sci-fi, if you prefer) fan, truth be told.

2 And in one example, a couple of definitions were also outside the codebase in database stored procedures.

3 Except your employer. They pay your bills. They hopefully send you birthday cards once a year too, if they’re nice.

4 As a functional programmer and a believer in exposing the most abstract interface possible, I never use ToList(). I always use ToArray(), even if ToList() is ever so slightly faster.

5 No, Sophie. It’s not good enough to just use your fingers!

6 Check out the man himself on YouTube.

7 For those of you unacquainted, this is a British SF series that has been running on and off since 1963. It is, in my opinion, the greatest TV series ever made. I’m not taking any arguments on that.

8 Sad to say, the BBC junked many episodes of the series in the 1970s. If you have any of those, please do hand them back.
