4. Data Structures and LINQ

Overview

In this chapter, you will learn about the main collections and their primary usage in C#. You will then see how Language-Integrated Query (LINQ) can be used to query collections in memory using code that is efficient and succinct. By the end of this chapter, you will be well versed in using LINQ for operations such as sorting, filtering, and aggregating data.

Introduction

Throughout the previous chapters, you have used variables that refer to a single value, such as the string and double system types, system class instances, and your own class instances. .NET has a variety of data structures that can be used to store multiple values. These structures are generally referred to as collections. This chapter builds on this concept by introducing collection types from the System.Collections.Generic namespace.

You can create variables that can store multiple object references using collection types. Such collections include lists that resize to accommodate the number of elements and dictionaries that offer access to the elements using a unique key as an identifier. For example, you may need to store a list of international dialing codes using the codes as unique identifiers. In this case, you need to be certain that the same dialing code is not added to the collection twice.

These collections are instantiated like any other class and are used extensively in most applications. Choosing the correct type of collection depends primarily on how you intend to add items and how you would like to access them once they are in the collection. Commonly used collection types include List, Dictionary, and HashSet, which you will cover in detail shortly.

LINQ is a technology that offers an expressive and concise syntax for querying objects. Much of the complexity around filtering, sorting, and grouping objects can be removed using its SQL-like query syntax or, if you prefer, a set of extension methods that can be chained together to produce collections that can be enumerated with ease.

Data Structures

.NET provides various built-in data structures, such as the Array, List, and Dictionary types. At the heart of all data structures are the IEnumerable and ICollection interfaces. Classes that implement these interfaces offer a way to enumerate through the individual elements and to manipulate their items. There is rarely a need to create your own classes that implement these interfaces directly, as the built-in collection types cover the required functionality, but it is worth knowing their key members, as they are used heavily throughout .NET.

The generic version of each collection type requires a single type parameter, which defines the type of elements that can be added to a collection, using the standard <T> syntax of the generic types.

The IEnumerable<T> interface has a single method, IEnumerator<T> GetEnumerator(). This method returns an enumerator, an object whose members allow the caller to iterate through the elements in the collection. You do not need to call GetEnumerator() directly, as the compiler calls it whenever you use a foreach statement, such as foreach(var book in books). You will learn more about using this in the upcoming sections.
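To make the compiler's work visible, the following sketch approximates what a foreach loop over a List<string> does: it calls GetEnumerator(), then repeatedly calls MoveNext() and reads Current until MoveNext() returns false. The code the compiler actually generates differs in detail (for example, List<T> hands back a struct enumerator), so treat this as an illustration rather than the literal expansion. The book titles are illustrative values, and the sketch uses top-level statements (C# 9 or later) so it runs on its own.

```csharp
using System;
using System.Collections.Generic;

var books = new List<string> { "Dune", "Emma", "Ulysses" };

// The familiar form: the compiler calls GetEnumerator() for you.
var viaForeach = new List<string>();
foreach (var book in books)
    viaForeach.Add(book);

// Roughly what foreach does behind the scenes.
var viaEnumerator = new List<string>();
using (IEnumerator<string> enumerator = books.GetEnumerator())
{
    // MoveNext() advances to the next element and returns false at the end.
    while (enumerator.MoveNext())
    {
        viaEnumerator.Add(enumerator.Current);
    }
}

Console.WriteLine(string.Join(", ", viaEnumerator)); // Dune, Emma, Ulysses
```

Both loops visit the same elements in the same order; in everyday code, foreach is the idiomatic choice.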

The ICollection<T> interface has the following members:

int Count { get; }: Returns the number of items in the collection.
bool IsReadOnly { get; }: Indicates if the collection is read-only. Certain collections can be marked as read-only to prevent callers from adding, deleting, or moving elements in the collection. C# will not prevent you from amending the properties of individual items in a read-only collection.
void Add(T item): Adds an item of type <T> to the collection.
void Clear(): Removes all items from the collection.
bool Contains(T item): Returns true if the collection contains the specified value. Depending on the type of item in the collection, this can be value equality, where objects are considered equal based on their members, or reference equality, where both references point to the same object in memory.
void CopyTo(T[] array, int arrayIndex): Copies every element of the collection into the target array, placing the first element at the specified index of the target array. This can be useful if you need to leave a number of slots free at the beginning of the array.
bool Remove(T item): Removes the specified item from the collection. If there are multiple occurrences of the instance, then only the first instance is removed. This returns true if an item was successfully removed.
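The following sketch exercises these ICollection<T> members through the interface type, so only the members listed above are available. It assumes dialing codes stored as strings such as "+44", which are purely illustrative values, and uses top-level statements so it runs on its own. Note in particular how the arrayIndex argument of CopyTo controls where copying starts in the target array, not which source elements are copied.

```csharp
using System;
using System.Collections.Generic;

// Work through the interface so only ICollection<T> members are available.
ICollection<string> codes = new List<string> { "+44", "+1" };

codes.Add("+33");                        // adds an item
bool hasUk = codes.Contains("+44");     // string value equality: true
bool removed = codes.Remove("+1");      // true: the item existed
bool removedAgain = codes.Remove("+1"); // false: nothing left to remove

// CopyTo places the first element at index 2 of the target array,
// leaving the first two slots untouched (null for a string[]).
var target = new string[4];
codes.CopyTo(target, 2);

Console.WriteLine($"Count={codes.Count}, IsReadOnly={codes.IsReadOnly}"); // Count=2, IsReadOnly=False
Console.WriteLine(string.Join(",", target)); // ,,+44,+33
```

string.Join treats the two null slots as empty strings, which is why the second line starts with two commas.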

IEnumerable and ICollection are interfaces that most collections implement:

Figure 4.1: ICollection and IEnumerable class diagram

There are further interfaces that some collections implement, depending on how elements are accessed within a collection.

The IList interface is used for collections that can be accessed by index position, starting from zero. So, for a list that contains two items, Red and Blue, the element at index zero is Red and the element at index one is Blue.

Figure 4.2: IList class diagram

The IList<T> interface adds index-based members to those inherited from ICollection<T>. The members you will use most often are as follows:

T this[int index] { get; set; }: Gets or sets the element at the specified index position.
void Add(T item): Adds the specified item to the list (inherited from ICollection<T>).
void Clear(): Removes all items from the list.
bool Contains(T item): Returns true if the list contains the specified item.
int IndexOf(T item): Returns the index position of the item, or -1 if not found.
void Insert(int index, T item): Inserts the item at the index position specified.
bool Remove(T item): Removes the first occurrence of the item, if it exists, and returns true if an item was removed.
void RemoveAt(int index): Removes the item at the specified index position.
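As a quick illustration of the index-based members, the following sketch reuses the Red and Blue list from the earlier description and works through the IList<string> interface. The color values are illustrative, and the sketch uses top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;

IList<string> colors = new List<string> { "Red", "Blue" };

string first = colors[0];                  // indexer get: "Red"
colors[1] = "Navy";                        // indexer set replaces "Blue"
colors.Insert(1, "Green");                 // Red, Green, Navy
int navyIndex = colors.IndexOf("Navy");    // 2
int missingIndex = colors.IndexOf("Pink"); // -1 when not found
colors.RemoveAt(0);                        // Green, Navy

Console.WriteLine(string.Join(", ", colors)); // Green, Navy
```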

You have now seen the primary interfaces common to collections, so you will now take a look at the main collection types that are available and how they are used.

Lists

The List<T> type is one of the most extensively used collections in C#. It is used where you have a collection of items and want to control the order of items using their index position. It implements the IList interface, which allows items to be inserted, accessed, or removed using an index position:

Figure 4.3: List class diagram

Lists have the following behavior:

Items can be inserted at any position within the collection. Any trailing items will have their index position incremented.
Items can be removed, either by index or value. This will also cause trailing items to have their index position updated.
Items can be set using their index value.
Items can be added to the end of the collection.
Items can be duplicated within the collection.
Items can be sorted using the various Sort methods.

One example of a list might be the tabs in a web browser application. Typically, a user may want to drag a browser tab amongst other tabs, open new tabs at the end, or close tabs anywhere in a list of tabs. The code to control these actions can be implemented using List.

Internally, List maintains an array to store its objects. This makes adding items to the end efficient, but inserting items can be inefficient, particularly near the beginning of the list, as every subsequent item must be shifted one position along in the array.
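The following sketch makes the internal array visible through the Capacity property (the size of that array) alongside Count (the number of items actually stored), and shows how Insert at index 0 shifts the existing items. The exact growth steps of Capacity are an implementation detail, so the sketch only relies on Capacity being at least Count. It uses illustrative integer values and top-level statements.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int>();
for (int i = 1; i <= 5; i++)
{
    numbers.Add(i); // appending: no existing elements move
}

// Capacity is the size of the internal array; it grows in steps,
// so it is always at least Count.
Console.WriteLine($"Count={numbers.Count}, Capacity={numbers.Capacity}");

// Inserting at index 0 shifts every existing element one slot to the right.
numbers.Insert(0, 0);
Console.WriteLine(string.Join(",", numbers)); // 0,1,2,3,4,5
```

If you know roughly how many items a list will hold, passing that number to the constructor, as in new List<int>(1000), sizes the internal array up front and avoids repeated reallocations as the list grows.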

The following example shows how the generic List class is used. The code uses the List<string> type parameter, which allows string types to be added to the list. Attempts to add any other type will result in a compiler error. This will show the various commonly used methods of the List class.

Create a new folder called Chapter04 in your source code folder.
Change to the Chapter04 folder and create a new console app, called Chapter04, using the following .NET command:
source\Chapter04>dotnet new console -o Chapter04
The template "Console Application" was created successfully.
Delete the Program.cs file that the template generates.
Add a new folder called Examples.
Add a new class file called ListExamples.cs.
Add the System.Collections.Generic namespace to access the List<T> class and declare a new variable called colors:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
    class ListExamples
    {
        public static void Main()
        {
            var colors = new List<string> {"red", "green"};
            colors.Add("orange");

The code declares the new colors variable, which can store multiple color names as strings. Here, the collection initialization syntax is used so that red and green are added as part of the initialization of the variable. The Add method is called, adding orange to the list.

Similarly, AddRange adds yellow and pink to the end of the list:
colors.AddRange(new [] {"yellow", "pink"});
At this point, there are five colors in the list, with red at index position 0 and green at position 1. You can verify this using the following code:
Console.WriteLine($"Colors has {colors.Count} items");
Console.WriteLine($"Item at index 1 is {colors[1]}");

Running the code produces the following output:

Colors has 5 items

Item at index 1 is green

Using Insert, blue can be inserted at the beginning of the list, that is, at index 0, as shown in the following code. Note that this moves red from index 0 to 1 and all other colors will have their index incremented by one:
            Console.WriteLine("Inserting blue at 0");
            colors.Insert(0, "blue");
            Console.WriteLine($"Item at index 1 is now {colors[1]}");

You should see the following output on running this code:

Inserting blue at 0

Item at index 1 is now red

Using foreach you can iterate through the strings in the list, writing each string to the console, as follows:
            Console.WriteLine("foreach");
            foreach (var color in colors)
                Console.Write($"{color}|");
            Console.WriteLine();

You should get the following output:

foreach

blue|red|green|orange|yellow|pink|

Now, add the following code to write each color with its characters reversed. Here, each color string is converted into an array of char using ToCharArray, and Array.Reverse then reverses that array in place:
            Console.WriteLine("ForEach Action:");
            colors.ForEach(color =>
            {
                var characters = color.ToCharArray();
                Array.Reverse(characters);
                var reversed = new string(characters);
                Console.Write($"{reversed}|");
            });
            Console.WriteLine();

This does not affect any of the values in the colors list, as characters refers to a separate char array. Note that foreach iterates through each string, whereas the List.ForEach method accepts an Action<T> delegate that is invoked for each string (recall that in Chapter 3, Delegates, Events, and Lambdas, you saw how lambda statements can be used to create Action delegates).

Running the code leads to this output:
ForEach Action:
eulb|der|neerg|egnaro|wolley|knip|
In the next snippet, the List constructor is passed a source collection. This creates a new list containing the same strings as colors, which is then sorted using the default Sort implementation:
var backupColors = new List<string>(colors);
backupColors.Sort();

Although string is a reference type, it is immutable, so it behaves much like a value type: the backupColors list holds references to the same string instances, but because a string can never be changed in place, assigning a different string to an element of one list does not affect the other list. Mutable class instances behave differently. Passing a list of class instances to the constructor still creates a new list, with independent element indexes, but each element points to the same shared object in memory rather than an independent copy, so changes made to an object through one list are visible through the other.
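The difference is easy to see side by side. The following sketch copies a list of strings and a list of StringBuilder objects (StringBuilder being a convenient mutable class from System.Text) using the same List constructor, then changes the copy. The values are illustrative, and the sketch uses top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Strings are immutable: replacing an element in the copy leaves the original alone.
var colors = new List<string> { "red", "green" };
var backup = new List<string>(colors);
backup[0] = "blue";
Console.WriteLine($"{colors[0]} / {backup[0]}"); // red / blue

// StringBuilder is a mutable class: both lists reference the same instance.
var builders = new List<StringBuilder> { new StringBuilder("red") };
var builderCopy = new List<StringBuilder>(builders);
builderCopy[0].Append("dish");
Console.WriteLine(builders[0]); // reddish
Console.WriteLine(ReferenceEquals(builders[0], builderCopy[0])); // True
```

Replacing builderCopy[0] with a new StringBuilder would, like the string case, leave builders untouched; only mutating the shared object is visible through both lists.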

In the following snippet, prior to removing all colors (using colors.Clear), each value is written to the console (the list will be repopulated shortly):
            Console.WriteLine("Foreach before clearing:");
            foreach (var color in colors)
                Console.Write($"{color}|");
            Console.WriteLine();
            colors.Clear();
            Console.WriteLine($"Colors has {colors.Count} items");

Running the code produces this output:

Foreach before clearing:

blue|red|green|orange|yellow|pink|

Colors has 0 items

Then, AddRange is used again, to add the full list of colors back to the colors list, using the sorted backupColors items as a source:
            colors.AddRange(backupColors);
            Console.WriteLine("foreach after addrange (sorted items):");
            foreach (var color in colors)
                Console.Write($"{color}|");
            Console.WriteLine();

You should see the following output:

foreach after addrange (sorted items):

blue|green|orange|pink|red|yellow|

The ConvertAll method is passed a Converter delegate, which is used to produce a new list whose elements can be of any type:
            var indexes = colors.ConvertAll(color =>
                $"{color} is at index {colors.IndexOf(color)}");
            Console.WriteLine("ConvertAll:");
            Console.WriteLine(string.Join(Environment.NewLine, indexes));

Here, a new List<string> is returned with each item being formatted using its value and the item's index in the list. As expected, running the code produces this output:

ConvertAll:

blue is at index 0

green is at index 1

orange is at index 2

pink is at index 3

red is at index 4

yellow is at index 5
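One caveat is worth knowing: IndexOf returns the index of the first matching item, so the approach above reports correct positions only while the list contains no duplicates. The following sketch, using illustrative values and top-level statements, shows ConvertAll changing the element type and contrasts IndexOf with tracking the position in a for loop.

```csharp
using System;
using System.Collections.Generic;

var colors = new List<string> { "red", "blue", "red" };

// ConvertAll can change the element type: List<string> -> List<int>.
List<int> lengths = colors.ConvertAll(color => color.Length); // 3, 4, 3

// IndexOf returns the FIRST match, so duplicates report the same index.
var viaIndexOf = colors.ConvertAll(color => $"{color}@{colors.IndexOf(color)}");
// red@0, blue@1, red@0

// Tracking the position directly gives each element its own index.
var viaLoop = new List<string>();
for (int i = 0; i < colors.Count; i++)
{
    viaLoop.Add($"{colors[i]}@{i}");
}
// red@0, blue@1, red@2

Console.WriteLine(string.Join(" ", viaIndexOf));
Console.WriteLine(string.Join(" ", viaLoop));
```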

In the next snippet, two calls to Contains are used to show string value equality in action:
Console.WriteLine($"Contains RED: {colors.Contains("RED")}");
Console.WriteLine($"Contains red: {colors.Contains("red")}");

Note that the uppercase RED is not in the list, but the lowercase red is. Running the code produces this output:

Contains RED: False

Contains red: True
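List<T>.Contains compares strings using their default, case-sensitive equality. If a case-insensitive check is what you need, one option is the LINQ Contains overload that accepts an IEqualityComparer<T>, such as StringComparer.OrdinalIgnoreCase; another is Exists with string.Equals. The following sketch, with illustrative values and top-level statements, compares the three approaches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var colors = new List<string> { "blue", "green", "red" };

// List<T>.Contains uses the default, case-sensitive string equality.
bool exact = colors.Contains("RED"); // False

// The LINQ Contains overload accepts an IEqualityComparer<T>.
bool ignoringCase = colors.Contains("RED", StringComparer.OrdinalIgnoreCase); // True

// Exists with string.Equals achieves the same without LINQ.
bool viaExists = colors.Exists(c =>
    string.Equals(c, "RED", StringComparison.OrdinalIgnoreCase)); // True

Console.WriteLine($"{exact} {ignoringCase} {viaExists}"); // False True True
```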

Now, add the following snippet:
var existsInk = colors.Exists(color => color.EndsWith("ink"));
Console.WriteLine($"Exists *ink: {existsInk}");

Here, the Exists method is passed a Predicate delegate, which returns true if the test condition is met and false otherwise. Predicate<T> is a built-in delegate type that returns a bool value. In this case, true is returned if any item exists whose string value ends with the letters ink (pink, for example).
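Because Exists, FindAll, FindIndex, FindLastIndex, and RemoveAll on List<T> all accept a Predicate<T>, a single predicate can be stored in a variable and reused. The following sketch, with illustrative color values and top-level statements, applies one isRed predicate to each of these methods.

```csharp
using System;
using System.Collections.Generic;

var colors = new List<string> { "pink", "red", "blue", "green", "red" };

// A Predicate<string> takes a string and returns a bool.
Predicate<string> isRed = color => color == "red";

bool anyRed = colors.Exists(isRed);           // True
List<string> allReds = colors.FindAll(isRed); // 2 items
int firstRed = colors.FindIndex(isRed);       // 1
int lastRed = colors.FindLastIndex(isRed);    // 4
int removed = colors.RemoveAll(isRed);        // 2; list is now pink, blue, green

Console.WriteLine($"{anyRed} {allReds.Count} {firstRed} {lastRed} {removed} {colors.Count}"); // True 2 1 4 2 3
```

Unlike Remove, which deletes only the first match, RemoveAll deletes every matching item and returns how many it removed.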

You should see the following output:

Exists *ink: True

You know there is already a red color, but it will be interesting to see what happens if you insert red again, twice, at the very beginning of the list:
            Console.WriteLine("Inserting reds");
            colors.InsertRange(0, new [] {"red", "red"});
            foreach (var color in colors)
                Console.Write($"{color}|");
            Console.WriteLine();

You will get the following output:

Inserting reds

red|red|blue|green|orange|pink|red|yellow|

This shows that it is possible to insert the same item more than once into a list.

The next snippet shows you how to use the FindAll method. FindAll is similar to the Exists method, in that it is passed a Predicate condition. All items that match that rule will be returned. Add the following code:
var allReds = colors.FindAll(color => color == "red");
Console.WriteLine($"Found {allReds.Count} red");

You should get an output as follows. As expected, there are three red items returned:

Found 3 red

Finishing the example, the Remove method is used to remove the first red from the list. There are still two reds left. You can use FindLastIndex to get the index of the last red item:
            colors.Remove("red");
            var lastRedIndex = colors.FindLastIndex(color => color == "red");
            Console.WriteLine($"Last red found at index {lastRedIndex}");
            Console.ReadLine();
        }
    }
}

Running the code produces this output:

Last red found at index 5

Note

You can find the code used for this example at https://packt.link/dLbK6.

With the knowledge of how the generic List class is used, it is time for you to work on an exercise.

Exercise 4.01: Maintaining Order within a List

At the beginning of the chapter, web browser tabs were described as an ideal example of lists. In this exercise, you will put this idea into action, and create a class that controls the navigation of the tabs within an app that mimics a web browser.

For this, you will create a Tab class and a TabController app that allows new tabs to be opened and existing tabs to be closed or moved. The following steps will help you complete this exercise:

In VSCode, select your Chapter04 project.
Add a new folder called Exercises.
Inside the Exercises folder, add a folder called Exercise01 and add a file called Exercise01.cs.
Open Exercise01.cs and define a Tab class with a string URL constructor parameter as follows:
using System;
using System.Collections;
using System.Collections.Generic;
namespace Chapter04.Exercises.Exercise01
{
    public class Tab
    {
        public Tab()
        {}
        public Tab(string url) => Url = url;
        public string Url { get; set; }
        public override string ToString() => Url;
    }

Here, the ToString method has been overridden to return the current URL to help when logging details to the console.

Create the TabController class as follows:
    public class TabController : IEnumerable<Tab>
    {
        private readonly List<Tab> _tabs = new();

The TabController class contains a List of tabs. Notice how the class implements the IEnumerable<Tab> interface. This interface gives callers a way to iterate through the class's items using a foreach statement. In the next steps, you will provide methods to open, move, and close tabs, which directly control the order of items in the _tabs list. Note that you could have exposed the _tabs list directly to callers, but it is preferable to limit access to the tabs through your own methods. Hence, the list is declared private; the readonly modifier additionally ensures that the field cannot be reassigned to a different list once it has been initialized.

Next, define the OpenNew method, which adds a new tab to the end of the list:
        public Tab OpenNew(string url)
        {
            var tab = new Tab(url);
            _tabs.Add(tab);
            Console.WriteLine($"OpenNew {tab}");
            return tab;
        }
Define another method, Close, which removes the tab from the list if it exists. Add the following code for this:
        public void Close(Tab tab)
        {
            if (_tabs.Remove(tab))
            {
                Console.WriteLine($"Removed {tab}");
            }
        }
To move a tab to the start of the list, add the following code:
        public void MoveToStart(Tab tab)
        {
            if (_tabs.Remove(tab))
            {
                _tabs.Insert(0, tab);
                Console.WriteLine($"Moved {tab} to start");
            }
        }

Here, MoveToStart will try to remove the tab and then insert it at index 0.

Similarly, add the following code to move a tab to the end:
        public void MoveToEnd(Tab tab)
        {
            if (_tabs.Remove(tab))
            {
                _tabs.Add(tab);
                Console.WriteLine($"Moved {tab} to end. Index={_tabs.IndexOf(tab)}");
            }
        }

Here, calling MoveToEnd removes the tab first, and then adds it to the end, logging the new index position to the console.

Finally, the IEnumerable<Tab> interface requires that you implement two methods: IEnumerator<Tab> GetEnumerator() and IEnumerator IEnumerable.GetEnumerator(). The first allows the caller to iterate through the collection as Tab instances, while the second iterates via the object-based, non-generic IEnumerator. The second method comes from the non-generic IEnumerable interface, which predates generics in C# and which IEnumerable<T> inherits, so it is still needed for compatibility.

For the actual results of both methods, you can use the GetEnumerator method of the _tabs list, as that contains the tabs in list form. Add the following code to do so:
        public IEnumerator<Tab> GetEnumerator() => _tabs.GetEnumerator();
        IEnumerator IEnumerable.GetEnumerator() => _tabs.GetEnumerator();
    }
You can now create a console app that tests the controller's behavior. Start by opening three new tabs and logging the tab details via LogTabs (this will be defined shortly):
    static class Program
    {
        public static void Main()
        {
            var controller = new TabController();
            Console.WriteLine("Opening tabs...");
            var packt = controller.OpenNew("packtpub.com");
            var msoft = controller.OpenNew("microsoft.com");
            var amazon = controller.OpenNew("amazon.com");
            controller.LogTabs();
Now, move amazon to the start and packt to the end, and log the tab details:
            Console.WriteLine("Moving...");
            controller.MoveToStart(amazon);
            controller.MoveToEnd(packt);
            controller.LogTabs();
Close the msoft tab and log details once more:
            Console.WriteLine("Closing tab...");
            controller.Close(msoft);
            controller.LogTabs();
            Console.ReadLine();
        }
Finally, add an extension method that helps log the URL of each tab in TabController. Define this as an extension method for IEnumerable<Tab>, rather than TabController, as you simply need an iterator to iterate through the tabs using a foreach loop.
Use PadRight to left-align each URL, as follows:
        private static void LogTabs(this IEnumerable<Tab> tabs)
        {
            Console.Write("TABS: |");
            foreach(var tab in tabs)
                Console.Write($"{tab.Url.PadRight(15)}|");
            Console.WriteLine();
        }
    }
}
Running the code produces the following output:
Opening tabs...
OpenNew packtpub.com
OpenNew microsoft.com
OpenNew amazon.com
TABS: |packtpub.com   |microsoft.com  |amazon.com     |
Moving...
Moved amazon.com to start
Moved packtpub.com to end. Index=2
TABS: |amazon.com     |microsoft.com  |packtpub.com   |
Closing tab...
Removed microsoft.com
TABS: |amazon.com     |packtpub.com   |
Note
Sometimes Visual Studio might report a non-nullable property warning the first time you build and run the program. This is a helpful reminder that you are using a string property that may be null at runtime.

The three tabs are opened. amazon.com and packtpub.com are then moved before microsoft.com is finally closed and removed from the tab list.

Note

You can find the code used for this exercise at https://packt.link/iUcIs.

In this exercise, you have seen how lists can be used to store multiple items of the same type while maintaining the order of items. The next section covers the Queue and Stack classes, which allow items to be added and removed in a predefined sequence.

Queues

The Queue class provides a first-in, first-out mechanism. Items are added to the end of the queue using the Enqueue method and are removed from the front of the queue using the Dequeue method. Items in the queue cannot be accessed by index position.
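Before the full example, the following minimal sketch shows the first-in, first-out order in isolation, along with Peek, which reads the front item without removing it, and TryDequeue, which returns false once the queue is empty. The item names are illustrative, and the sketch uses top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;

var queue = new Queue<string>();
queue.Enqueue("first");
queue.Enqueue("second");
queue.Enqueue("third");

string next = queue.Peek();      // "first": reads the front without removing it
string served = queue.Dequeue(); // "first": removes from the front

var remaining = new List<string>();
while (queue.TryDequeue(out var item)) // false once the queue is empty
{
    remaining.Add(item);
}

Console.WriteLine($"{next} {served} {string.Join(",", remaining)} {queue.Count}"); // first first second,third 0
```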

Queues are typically used when you need a workflow that ensures items are processed in the order in which they are added to the queue. A typical example might be a busy online ticketing system selling a limited number of concert tickets to customers. To ensure fairness, customers are added to a queuing system as soon as they log on. The system would then dequeue each customer and process each order, in full, either until all tickets have been sold or the customer queue is empty.

The following example creates a queue containing five CustomerOrder records. When it is time to process the orders, each order is dequeued using the TryDequeue method, which returns true until all orders have been processed. The customer orders are processed in the order in which they were added. If the number of tickets requested is less than or equal to the number of tickets remaining, the customer is shown a success message. An apology message is shown if the number of tickets remaining is less than the requested amount.

Figure 4.4: The Queue's Enqueue() and Dequeue() workflow

Perform the following steps to complete this example:

In the Examples folder of your Chapter04 source folder, add a new class called QueueExamples.cs and edit it as follows:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
    class QueueExamples
    {
        record CustomerOrder(string Name, int TicketsRequested);
        public static void Main()
        {
            var ticketsAvailable = 10;
            var customers = new Queue<CustomerOrder>();
Add five orders to the queue using the Enqueue method as follows:
            customers.Enqueue(new CustomerOrder("Dave", 2));
            customers.Enqueue(new CustomerOrder("Siva", 4));
            customers.Enqueue(new CustomerOrder("Julien", 3));
            customers.Enqueue(new CustomerOrder("Kane", 2));
            customers.Enqueue(new CustomerOrder("Ann", 1));
Now, use a while loop that repeats until TryDequeue returns false, meaning all current orders have been processed:
            // Start processing orders...
            while(customers.TryDequeue(out CustomerOrder nextOrder))
            {
                if (nextOrder.TicketsRequested <= ticketsAvailable)
                {
                    ticketsAvailable -= nextOrder.TicketsRequested;
                    Console.WriteLine($"Congratulations {nextOrder.Name}, you've purchased {nextOrder.TicketsRequested} ticket(s)");
                }
                else
                {
                    Console.WriteLine($"Sorry {nextOrder.Name}, cannot fulfil {nextOrder.TicketsRequested} ticket(s)");
                }
            }
            Console.WriteLine($"Finished. Available={ticketsAvailable}");
            Console.ReadLine();
        }
    }
}
Running the example code produces the following output:
Congratulations Dave, you've purchased 2 ticket(s)
Congratulations Siva, you've purchased 4 ticket(s)
Congratulations Julien, you've purchased 3 ticket(s)
Sorry Kane, cannot fulfil 2 ticket(s)
Congratulations Ann, you've purchased 1 ticket(s)
Finished. Available=0
Note
The first time you run this program, Visual Studio might show a non-nullable type warning. This warning is a reminder that you are using a variable that could be null.

The output shows that Dave requested two tickets. As there are two or more tickets available, he was successful. Both Siva and Julien were also successful, but by the time Kane placed his order of two tickets, there was only one ticket available, so he was shown the apology message. Finally, Ann requested one ticket and was successful in her order.

Note

You can find the code used for this example at https://packt.link/Zb524.

Stacks

The Stack class provides the opposite mechanism to the Queue class; items are processed in last-in, first-out order. As with the Queue class, you cannot access elements via their index position. Items are added to the stack using the Push method and removed using the Pop method.
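The following minimal sketch shows the last-in, first-out order: Peek reads the most recently pushed item without removing it, Pop removes it, and enumerating a Stack<T> also runs from the top down. The item names are illustrative, and the sketch uses top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;

var stack = new Stack<string>();
stack.Push("first");
stack.Push("second");
stack.Push("third");

string top = stack.Peek();   // "third": the most recently pushed item
string popped = stack.Pop(); // "third" is removed first

// Enumerating a Stack<T> runs from top to bottom.
Console.WriteLine(string.Join(",", stack));         // second,first
Console.WriteLine($"{top} {popped} {stack.Count}"); // third third 2
```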

An application's Undo menu can be implemented using a stack. For example, in a word processor, as the user edits a document, an Action delegate is created, which can reverse the most recent change whenever the user presses Ctrl + Z. The most recent action is popped off the stack and the change is undone. This allows multiple steps to be undone.

Figure 4.5: The Stack's Push() and Pop() workflow

The following example shows this in practice.

You will start by creating an UndoStack class that supports multiple undo operations. The caller decides what action should run each time the Undo request is called.

A typical undoable operation would be storing a copy of text prior to the user adding a word. Another undoable operation would be storing a copy of the current font prior to a new font being applied. You can start by adding the following code, where you are creating the UndoStack class and defining a readonly Stack of Action delegates, named _undoStack:

In your Chapter04\Examples folder, add a new class called StackExamples.cs and edit it as follows:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
    class UndoStack
    {
        private readonly Stack<Action> _undoStack = new Stack<Action>();
Each time the user does something, the caller supplies an Action that can reverse it. The Do method pushes that Action onto the top of _undoStack:
        public void Do(Action action)
        {
            _undoStack.Push(action);
        }
The Undo method checks to see if there are any items to undo, then calls Pop to remove the most recent Action and invoke it, thus undoing the change that was just applied. The code for this can be added as follows:
        public void Undo()
        {
            if (_undoStack.Count > 0)
            {
                var undo = _undoStack.Pop();
                undo?.Invoke();
            }
        }
    }
Now, you can create a TextEditor class that allows edits to be added to UndoStack. Its constructor is passed an UndoStack instance, as there could be multiple editors that need to add various Action delegates to the same stack:
    class TextEditor
    {
        private readonly UndoStack _undoStack;
        public TextEditor(UndoStack undoStack)
        {
            _undoStack = undoStack;
        }
        public string Text {get; private set; }
Next, add the EditText command, which stores the current Text value in previousText and creates an Action delegate that can revert the text to that value if invoked:
        public void EditText(string newText)
        {
            var previousText = Text;
            _undoStack.Do( () =>
            {
                Text = previousText;
                Console.Write($"Undo:'{newText}'".PadRight(40));
                Console.WriteLine($"Text='{Text}'");
            });
Now, the newText value should be appended to the Text property, using the += operator. The details for this are logged to the console, using PadRight to improve the format:
            Text += newText;
            Console.Write($"Edit:'{newText}'".PadRight(40));
            Console.WriteLine($"Text='{Text}'");
        }
    }
Finally, it is time to create a console app that tests TextEditor and UndoStack. Four edits are initially made, followed by two undo operations, and finally two more text edits:
    class StackExamples
    {

        public static void Main()
        {
            var undoStack = new UndoStack();
            var editor = new TextEditor(undoStack);
            editor.EditText("One day, ");
            editor.EditText("in a ");
            editor.EditText("city ");
            editor.EditText("near by ");
            undoStack.Undo(); // remove 'near by'
            undoStack.Undo(); // remove 'city'
            editor.EditText("land ");
            editor.EditText("far far away ");
            Console.ReadLine();
        }
    }
}
Running the console app produces the following output:
Edit:'One day, '                        Text='One day, '
Edit:'in a '                            Text='One day, in a '
Edit:'city '                            Text='One day, in a city '
Edit:'near by '                         Text='One day, in a city near by '
Undo:'near by '                         Text='One day, in a city '
Undo:'city '                            Text='One day, in a '
Edit:'land '                            Text='One day, in a land '
Edit:'far far away '                    Text='One day, in a land far far away '
Note
Visual Studio may show a non-nullable property warning the first time the code is built. This is because Visual Studio notices that the Text property can be null at runtime, so it suggests a way to improve the code.

The left-hand output shows the text edits and undo operations as they are applied, and the right-hand side shows the resulting Text value. The two Undo calls result in near by and city being removed from the Text value, before land and far far away are finally added.

Note

You can find the code used for this example at https://packt.link/tLVyf.

HashSets

The HashSet class provides mathematical set operations on collections of objects in an efficient, high-performance manner. HashSet does not allow duplicate elements, and items are not stored in any particular order. The HashSet class is ideal for high-performance operations, such as quickly finding where two collections of objects overlap.
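Returning to the international dialing codes mentioned in the introduction, a HashSet is one way to guarantee that the same code is never stored twice. In the following sketch, the codes are illustrative string values; Add returns false, and leaves the set unchanged, when the item is already present. It uses top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;

var dialingCodes = new HashSet<string> { "+44", "+1" };

bool addedFrance = dialingCodes.Add("+33");  // True: new item
bool addedUkAgain = dialingCodes.Add("+44"); // False: duplicate ignored

Console.WriteLine($"{addedFrance} {addedUkAgain} {dialingCodes.Count}"); // True False 3
Console.WriteLine(dialingCodes.Contains("+1")); // True: hash-based lookup
```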

Typically, HashSet is used with the following operations:

public void UnionWith(IEnumerable<T> other): Produces a set union. This modifies the HashSet to contain the items present in the current HashSet instance, the other collection, or both.
public void IntersectWith(IEnumerable<T> other): Produces a set intersection. This modifies the HashSet to contain only the items present in both the current HashSet instance and the other collection.
public void ExceptWith(IEnumerable<T> other): Produces a set difference. This removes from the current HashSet instance every item that is also present in the other collection.
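As a small illustration of the "no duplicates" rule described above, the following sketch (not part of the book's sample code; the dialing codes are hypothetical values) shows that Add returns false, rather than throwing an exception, when an item is already in the set:

```csharp
using System;
using System.Collections.Generic;

static class HashSetBasics
{
    // Adds the same code twice and reports what happened each time.
    public static (bool FirstAdd, bool SecondAdd, int Count) AddTwice(string code)
    {
        var dialingCodes = new HashSet<string>();
        bool firstAdd = dialingCodes.Add(code);   // true: the code was not yet present
        bool secondAdd = dialingCodes.Add(code);  // false: the duplicate is ignored
        return (firstAdd, secondAdd, dialingCodes.Count);
    }

    public static void Main()
    {
        var (firstAdd, secondAdd, count) = AddTwice("+44");
        // Prints: First Add: True, second Add: False, Count: 1
        Console.WriteLine($"First Add: {firstAdd}, second Add: {secondAdd}, Count: {count}");

        // Contains is a hash-based lookup, so on average it does not slow down as the set grows.
        var codes = new HashSet<string> { "+1", "+33", "+44" };
        Console.WriteLine($"Contains +33: {codes.Contains("+33")}"); // Contains +33: True
    }
}
```

Returning a bool from Add lets you detect duplicates without a separate Contains check.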

HashSet is useful when you need to include or exclude certain elements from collections. As an example, consider that an agent manages various celebrities and has been asked to find three sets of stars:

Those that can act or sing.
Those that can act and sing.
Those that can act only (no singers allowed).

In the following snippet, a list of actors' and singers' names is created:

In your Chapter04Examples folder, add a new class called HashSetExamples.cs and edit it as follows:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
    class HashSetExamples
    {
        public static void Main()
        {
            var actors = new List<string> {"Harrison Ford", "Will Smith",
                                           "Sigourney Weaver"};
            var singers = new List<string> {"Will Smith", "Adele"};
Now, create a new HashSet instance that initially contains singers only and then use UnionWith to modify the set to contain a distinct set of those that can act or sing:
            var actingOrSinging = new HashSet<string>(singers);
            actingOrSinging.UnionWith(actors);
            Console.WriteLine($"Acting or Singing: {string.Join(", ",
                              actingOrSinging)}");
For those that can act and sing, start with a HashSet instance of singers, and modify the HashSet instance using IntersectWith to contain a distinct list of those that are in both collections:
            var actingAndSinging = new HashSet<string>(singers);
            actingAndSinging.IntersectWith(actors);
            Console.WriteLine($"Acting and Singing: {string.Join(", ",
                              actingAndSinging)}");
Finally, for those that can act only, start with the actor collection, and use ExceptWith to remove those from the HashSet instance that can also sing:
            var actingOnly = new HashSet<string>(actors);
            actingOnly.ExceptWith(singers);
            Console.WriteLine($"Acting Only: {string.Join(", ", actingOnly)}");
            Console.ReadLine();
        }
    }
}
Running the console app produces the following output:
Acting or Singing: Will Smith, Adele, Harrison Ford, Sigourney Weaver
Acting and Singing: Will Smith
Acting Only: Harrison Ford, Sigourney Weaver

From the output, you can see that out of the given list of actors and singers, only Will Smith can act and sing.

Note

You can find the code used for this example at https://packt.link/ZdNbS.

Dictionaries

Another commonly used collection type is the generic Dictionary<TKey, TValue>. This allows multiple items to be added, but each item must be identified by a unique key.

Dictionaries are commonly used to look up values using known keys. The key and value type parameters can be of any type. A value can exist in a Dictionary more than once, provided that its key is unique. Attempting to add a key that already exists will result in a runtime exception being thrown.

A common example of a Dictionary might be a registry of known countries that are keyed by their ISO country code. A customer service application may load customer details from a database and then use the ISO code to look up the customer's country from the country list, rather than having the extra overhead of creating a new country instance for each customer.

Note

You can find more information on standard ISO country codes at https://www.iso.org/iso-3166-country-codes.html.

The main methods used in the Dictionary class are as follows:

public TValue this[TKey key] {get; set;}: Gets or sets the value associated with the key. Getting a key that does not exist throws a KeyNotFoundException; setting a key that does not exist adds a new entry.
Dictionary<TKey, TValue>.KeyCollection Keys { get; }: Returns a KeyCollection instance that contains all keys.
Dictionary<TKey, TValue>.ValueCollection Values { get; }: Returns a ValueCollection instance that contains all values.
public int Count { get; }: Returns the number of elements in the Dictionary.
void Add(TKey key, TValue value): Adds the key and associated value. If the key already exists, an exception is thrown.
void Clear(): Clears all keys and values from the Dictionary.
bool ContainsKey(TKey key): Returns true if the specified key exists.
bool ContainsValue(TValue value): Returns true if the specified value exists.
bool Remove(TKey key): Removes the value with the associated key. Returns true if the key was found and removed.
bool TryAdd(TKey key, TValue value): Attempts to add the key and value. Unlike Add, no exception is thrown if the key already exists. Returns true if the key and value were added.
bool TryGetValue(TKey key, out TValue value): Gets the value associated with the key, if it is available. Returns true if it was found.
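To make the indexer's get and set behavior concrete, the following sketch uses a hypothetical capitals Dictionary (not part of the book's example). It shows that setting a missing key adds it, getting a missing key throws KeyNotFoundException, and Remove reports whether anything was removed:

```csharp
using System;
using System.Collections.Generic;

static class DictionaryIndexerDemo
{
    public static Dictionary<string, string> Build()
    {
        var capitals = new Dictionary<string, string>();
        capitals["FRA"] = "Paris";   // setter on a missing key adds a new entry
        capitals["FRA"] = "PARIS";   // setter on an existing key replaces the value
        capitals.Add("ITA", "Rome");
        return capitals;
    }

    public static void Main()
    {
        var capitals = Build();
        Console.WriteLine($"Count={capitals.Count}, FRA={capitals["FRA"]}"); // Count=2, FRA=PARIS

        try
        {
            // The getter throws when the key is missing.
            Console.WriteLine(capitals["ESP"]);
        }
        catch (KeyNotFoundException)
        {
            Console.WriteLine("ESP not found");
        }

        // TryGetValue avoids the exception; Remove returns false once the key has gone.
        Console.WriteLine($"TryGetValue ESP: {capitals.TryGetValue("ESP", out _)}"); // False
        Console.WriteLine($"Remove ITA: {capitals.Remove("ITA")}, again: {capitals.Remove("ITA")}"); // True, False
    }
}
```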

The following code shows how a Dictionary can be used to add and navigate Country records:

In your Chapter04Examples folder, add a new class called DictionaryExamples.cs.
Start by defining a Country record, which is passed a Name parameter:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
    public record Country(string Name)
    {}
    class DictionaryExamples
    {
        public static void Main()
        {
Use the Dictionary initialization syntax to create a Dictionary with five countries, as follows:
            var countries = new Dictionary<string, Country>
            {
                {"AFG", new Country("Afghanistan")},
                {"ALB", new Country("Albania")},
                {"DZA", new Country("Algeria")},
                {"ASM", new Country("American Samoa")},
                {"AND", new Country("Andorra")}
            };
Dictionary implements the IEnumerable<KeyValuePair<TKey, TValue>> interface, so in the next code snippet, a foreach loop retrieves each entry as a KeyValuePair whose Key and Value properties represent the key and value items in the Dictionary:
            Console.WriteLine("Enumerate foreach KeyValuePair");
            foreach (var kvp in countries)
            {
                Console.WriteLine($" {kvp.Key} = {kvp.Value.Name}");
            }
Running the example code produces the following output. By iterating through each item in countries, you can see the five country codes and their names:
Enumerate foreach KeyValuePair
        AFG = Afghanistan
        ALB = Albania
        DZA = Algeria
        ASM = American Samoa
        AND = Andorra
There is already an entry with the AFG key, so using the set indexer with AFG as the key replaces the existing Country record for that key with a new one. You can add the following code for this:
            Console.WriteLine("set indexer AFG to new value");
            countries["AFG"] = new Country("AFGHANISTAN");
            Console.WriteLine($"get indexer AFG: {countries["AFG"].Name}");
When you run the code, getting the value for the AFG key returns the replacement record:
set indexer AFG to new value
get indexer AFG: AFGHANISTAN
Key comparisons are case-sensitive with string keys. AGO is not found because no country with that code was added, and and is not found even though Andorra is present, because Andorra is defined with the uppercase AND key. You can add the following code to check this:
            Console.WriteLine($"ContainsKey AGO: {countries.ContainsKey("AGO")}");
            Console.WriteLine($"ContainsKey and: {countries.ContainsKey("and")}"); // Case-sensitive
Running this code produces the following output:
ContainsKey AGO: False
ContainsKey and: False
Using Add to add a new entry will throw an exception if the key already exists. This can be seen by adding the following code:
            var anguilla = new Country("Anguilla");
            Console.WriteLine($"Add {anguilla}...");
            countries.Add("AIA", anguilla);
            try
            {
                var anguillaCopy = new Country("Anguilla");
                Console.WriteLine($"Adding {anguillaCopy}...");
                countries.Add("AIA", anguillaCopy);
            }
            catch (Exception e)
            {
                Console.WriteLine($"Caught {e.Message}");
            }
Conversely, TryAdd does not throw an exception if you attempt to add a duplicate key. There already exists an entry with the AIA key, so using TryAdd simply returns a false value rather than throwing an exception:
            var addedAIA = countries.TryAdd("AIA", new Country("Anguilla"));
            Console.WriteLine($"TryAdd AIA: {addedAIA}");
As the following output shows, adding Anguilla once using the AIA key is valid but attempting to add it again using the AIA key results in an exception being caught:
Add Country { Name = Anguilla }...
Adding Country { Name = Anguilla }...
Caught An item with the same key has already been added. Key: AIA
TryAdd AIA: False
TryGetValue, as the name suggests, allows you to try to get a value by key. You can pass in a key that may be missing from the Dictionary: if the key is not found, no exception is thrown. Instead, the method returns false and sets the out parameter to the default value for its type (null for the Country record). This is useful if you are unsure whether a value has been added for the specified key:
            var tryGet = countries.TryGetValue("ALB", out Country albania1);
            Console.WriteLine($"TryGetValue for ALB: {albania1} Result={tryGet}");
            countries.TryGetValue("alb", out Country albania2);
            Console.WriteLine($"TryGetValue for alb: {albania2}");
        }
    }
}
You should see the following output upon running this code:
TryGetValue for ALB: Country { Name = Albania } Result=True
TryGetValue for alb:
Note
Visual Studio might report the following warning: Warning CS8600: Converting null literal or possible null value to non-nullable type. This is a reminder from Visual Studio that a variable may have a null value at runtime.

You have seen how the Dictionary class is used to ensure that only unique keys are associated with values. Even if you do not know which keys are in the Dictionary until runtime, you can use the TryGetValue and TryAdd methods to prevent runtime exceptions.

Note

You can find the code used for this example at https://packt.link/vzHUb.

In this example, a string key was used for the Dictionary. However, any type can be used as a key. You will often find that an integer value is used as a key when source data is retrieved from relational databases, as integers can often be more efficient in memory than strings. Now it is time to use this feature through an exercise.
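As a sketch of other key types, the following example (with hypothetical data) uses int keys, as you might with database primary keys, and a hypothetical CustomerKey record as a composite key. Records get compiler-generated value-based equality, which is what the Dictionary relies on to match keys:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical composite key. The compiler generates value-based Equals and
// GetHashCode for records, so two instances with equal members match the same entry.
record CustomerKey(int RegionId, int CustomerId);

static class KeyTypesDemo
{
    public static void Main()
    {
        // Integer keys, such as primary-key values read from a database table.
        var customerNames = new Dictionary<int, string>
        {
            {101, "Acme Ltd"},
            {102, "Globex"}
        };
        Console.WriteLine(customerNames[102]); // Globex

        // A record as the key: a new but equal instance finds the existing entry.
        var orderCounts = new Dictionary<CustomerKey, int>
        {
            {new CustomerKey(1, 101), 3}
        };
        Console.WriteLine(orderCounts[new CustomerKey(1, 101)]);           // 3
        Console.WriteLine(orderCounts.ContainsKey(new CustomerKey(2, 101))); // False
    }
}
```

By contrast, a class that does not override Equals and GetHashCode compares by reference, so only the very same instance would find its entry.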

Exercise 4.02: Using a Dictionary to Count the Words in a Sentence

You have been asked to create a console app that asks the user to enter a sentence. The console should then split the input into individual words (using a space character as a word delimiter) and count the number of times that each word occurs. If possible, simple forms of punctuation should be removed from the output, and capitalization should be ignored so that, for example, Apple and apple are counted as the same word.

This is an ideal use of a Dictionary. The Dictionary will use a string as the key (a unique entry for each word) with an int value to count the words. You will use string.Split() to split a sentence into words, and char.IsPunctuation to detect any trailing punctuation mark so that it can be removed.

Perform the following steps to do so:

In your Chapter04Exercises folder, create a new folder called Exercise02.
Inside the Exercise02 folder, add a new class called Program.cs.
Start by defining a new class called WordCounter. This can be marked as static so that it can be used without needing to create an instance:
using System;
using System.Collections.Generic;
namespace Chapter04.Exercises.Exercise02
{
static class WordCounter
{
Define a static method called Process:
        public static IEnumerable<KeyValuePair<string, int>> Process(string phrase)
        {
            var wordCounts = new Dictionary<string, int>();

This method is passed a phrase and returns IEnumerable<KeyValuePair<string, int>>, which allows the caller to enumerate through the results. The wordCounts Dictionary is keyed using a string (each word found), and its int value holds the number of times that the word occurs.

Because capitalization should be ignored, convert the phrase to its lowercase equivalent before using the string.Split method to split it into words.
Use the RemoveEmptyEntries option to discard any empty strings, such as those produced by consecutive spaces. Add the following code for this:
            var words = phrase.ToLower().Split(' ', StringSplitOptions.RemoveEmptyEntries);
Use a simple foreach loop to iterate through the individual words found in the phrase:
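            foreach(var word in words)
            {
                var key = word;
                // Strip trailing punctuation, such as the comma in "forth,"
                while (key.Length > 0 && char.IsPunctuation(key[^1]))
                {
                    key = key[..^1];
                }
                if (key.Length == 0)
                {
                    continue; // Skip "words" made up entirely of punctuation
                }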

The char.IsPunctuation method is used to remove punctuation marks from the end of the word.

Use the TryGetValue method to check if there is a Dictionary entry with the current word. If so, update the count by one:
                if (wordCounts.TryGetValue(key, out var count))
                {
                    wordCounts[key] = count + 1;
                }
                else
                {
                    wordCounts.Add(key, 1);
                }
            }

If the word does not exist, add a new word key with a starting value of 1.

Once all the words in the phrase have been processed, return the wordCounts Dictionary:
            return wordCounts;
        }
    }
Now, write the console app that allows the user to enter a phrase:
    class Program
    {
        public static void Main()
        {
            string input;
            do
            {
                Console.Write("Enter a phrase:");
                input = Console.ReadLine();

The do loop will end once the user enters an empty string; you will add the code for this in an upcoming step.

Call the WordCounter.Process method to return a sequence of key-value pairs that can be enumerated through.
For each key and value, write the word and its count. PadLeft(20) adds leading spaces to each word so that the words are right-aligned in columns 20 characters wide:
                if (!string.IsNullOrEmpty(input))
                {
                    var countsByWord = WordCounter.Process(input);
                    var i = 0;
                    foreach (var (key, value) in countsByWord)
                    {
                        Console.Write($"{key.PadLeft(20)}={value} ");
                        i++;
                        if (i % 3 == 0)
                        {
                            Console.WriteLine();
                        }
                    }
                    Console.WriteLine();

A new line is started after every third word (when i % 3 == 0) for improved output formatting.

Finish off the do-while loop:
                    }
            } while (input != string.Empty);
        }
    }
}
Running the console app with the opening text of the Gettysburg Address of 1863 produces this output:
Enter a phrase: Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived, and so dedicated, can long endure.
                four=1                 score=1                 and=3
               seven=1                 years=1                 ago=1
                 our=1               fathers=1             brought=1
               forth=1                  upon=1                this=1
           continent=1                     a=2                 new=1
             nation=3             conceived=2                  in=2
             liberty=1             dedicated=2                  to=1
                 the=1           proposition=1                that=2
                 all=1                   men=1                 are=2
             created=1                 equal=1                 now=1
                  we=1               engaged=1               great=1
               civil=1                   war=1             testing=1
             whether=1                    or=1                 any=1
                  so=2                   can=1                 long=1
             endure=1
Note
You can search online for The Gettysburg Address or visit https://rmc.library.cornell.edu/gettysburg/good_cause/transcript.htm.

From the results, you can see that each word is displayed only once and that certain words, such as and and that, appear more than once in the speech. Here, the words happen to be listed in the order in which they appear in the text, but the Dictionary class does not guarantee any particular enumeration order. You should not rely on the order remaining fixed; instead, access a Dictionary's values using their keys.
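If you do need a predictable order, one option is SortedDictionary<TKey, TValue>, also in System.Collections.Generic, which enumerates its entries in key order. The following sketch is not part of the exercise, and the word list is hypothetical:

```csharp
using System;
using System.Collections.Generic;

static class OrderedCounts
{
    // Counts words in a SortedDictionary, which keeps its keys in sorted order.
    public static SortedDictionary<string, int> Count(string[] words)
    {
        var counts = new SortedDictionary<string, int>();
        foreach (var word in words)
        {
            counts.TryGetValue(word, out var count); // count is 0 when the word is new
            counts[word] = count + 1;                // the setter adds or replaces
        }
        return counts;
    }

    public static void Main()
    {
        var words = new[] { "so", "and", "nation", "and", "so", "and" };
        // Prints and=3, nation=1, so=2, in alphabetical key order.
        foreach (var (word, count) in Count(words))
        {
            Console.WriteLine($"{word}={count}");
        }
    }
}
```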

Note

You can find the code used for this exercise at https://packt.link/Dnw4a.

So far, you have learned about the main collections commonly used in .NET. It is now time to look at LINQ, which makes extensive use of collections based on the IEnumerable interface.

LINQ

LINQ (pronounced link) is short for Language Integrated Query. LINQ is a general-purpose query language that can be used to query objects in memory with a syntax similar to Structured Query Language (SQL), the language used to query relational databases. It is an enhancement of the C# language that makes it easier to interact with objects in memory using SQL-like Query Expressions or Query Operators (implemented through a series of extension methods).

Microsoft's original idea for LINQ was to bridge the gap between .NET code and data sources, such as relational databases and XML, using LINQ providers. LINQ providers form a set of building blocks that can be used to query various sources of data, using a similar set of Query Operators, without the caller needing to know the intricacies of how each data source works. The following is a list of providers and how they are used:

LINQ to Objects: Queries applied to objects in memory, such as those defined in a list.
LINQ to SQL: Queries applied to relational databases. LINQ to SQL itself targets SQL Server; other databases, such as Sybase or Oracle, are usually queried through other LINQ providers.
LINQ to XML: Queries applied to XML documents.

This chapter will cover LINQ to Objects. This is, by far, the most common use of LINQ providers and offers a flexible way to query collections in memory. In fact, when people talk about LINQ, they usually mean LINQ to Objects, mainly due to its ubiquitous use throughout C# applications.

At the heart of LINQ is the way that collections can be converted, filtered, and aggregated into new forms using a concise and easy-to-use syntax. LINQ can be used in two interchangeable styles:

Query Operators
Query Expressions

Each style offers a different syntax to achieve the same result, and which one you use often comes down to personal preference. The two styles can also easily be mixed in the same piece of code.

Query Operators

These are based on a series of core extension methods. The result of one method can be passed straight to the next by chaining the calls together, a programming style that is often easier to grasp than its expression-based counterpart.

The extension methods typically take an IEnumerable<T> or IQueryable<T> input source, such as a list, and allow a lambda, such as a Func<T, bool> predicate, to be applied to that source. Because the methods are generic, Query Operators work with any element type. It is just as easy to work with List<string> as it is with List<Customer>, for example.

In the following snippet, .Where, .OrderBy, and .Select are the extension methods being called:

books.Where(book => book.Price > 10)
     .OrderBy(book => book.Price)
     .Select(book => book.Name)

Here, you take the results from the .Where extension method, which finds all books with a unit price greater than 10, and sort them using the .OrderBy extension method. Finally, the name of each book is extracted using the .Select method. The whole chain could have been written on a single line, but placing each call on its own line makes the sequence of operations easier to follow. This will be covered in more detail in the upcoming sections.
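The chained snippet is not complete on its own. The following sketch makes it runnable, assuming a hypothetical Book record with Name and Price properties and made-up prices:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record; the snippet only assumes Name and Price properties.
record Book(string Name, decimal Price);

static class BookQueries
{
    public static IEnumerable<string> NamesOver10(IEnumerable<Book> books) =>
        books.Where(book => book.Price > 10)   // keep books costing more than 10
             .OrderBy(book => book.Price)      // cheapest first
             .Select(book => book.Name);       // project each book to its name

    public static void Main()
    {
        var books = new List<Book>
        {
            new Book("C# Basics", 25m),
            new Book("Pocket Guide", 8m),
            new Book("LINQ in Depth", 12m)
        };
        // Prints: LINQ in Depth, C# Basics
        Console.WriteLine(string.Join(", ", NamesOver10(books)));
    }
}
```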

Query Expressions

Query Expressions are an enhancement of the C# language and resemble SQL syntax. The C# compiler translates Query Expressions into a sequence of Query Operator extension method calls. Note that not every Query Operator has an equivalent Query Expression clause.

Query Expressions have the following rules:

They start with a from clause.
They can contain zero or more where, orderby, join, let, and additional from clauses.
They end with either a select or a group clause.

The following snippet is functionally equivalent to the Query Operator style defined in the previous section:

from book in books
where book.Price > 10
orderby book.Price
select book.Name
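To confirm that the two styles are interchangeable, the following sketch (again assuming a hypothetical Book record and made-up prices) runs both forms and compares the results with SequenceEqual:

```csharp
using System;
using System.Linq;

// Hypothetical record, redeclared so this sketch compiles on its own.
record Book(string Name, decimal Price);

static class QueryStyles
{
    static readonly Book[] Books =
    {
        new Book("C# Basics", 25m),
        new Book("Pocket Guide", 8m),
        new Book("LINQ in Depth", 12m)
    };

    // Query Expression style: the compiler translates this into Where/OrderBy/Select calls.
    public static string[] ExpressionStyle() =>
        (from book in Books
         where book.Price > 10
         orderby book.Price
         select book.Name).ToArray();

    // Query Operator style: the equivalent chain of extension methods.
    public static string[] OperatorStyle() =>
        Books.Where(book => book.Price > 10)
             .OrderBy(book => book.Price)
             .Select(book => book.Name)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ExpressionStyle()));          // LINQ in Depth, C# Basics
        Console.WriteLine(ExpressionStyle().SequenceEqual(OperatorStyle())); // True
    }
}
```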

You will take a more in-depth look at both styles as you learn about the standard Query Operators shortly.

Deferred Execution

Whether you choose to use Query Operators, Query Expressions, or a mixture of the two, it is important to remember that for many operators, the query that you define is not executed when it is defined, but only when it is enumerated over. This means that it is not until a foreach statement or a ToList, ToArray, ToDictionary, ToLookup, or ToHashSet method is called that the actual query is executed.

This allows queries to be constructed elsewhere in code with additional criteria included, and then used or even reused with a different collection of data. Recall that in Chapter 3, Delegates, Lambdas, and Events, you saw similar behavior with delegates. Delegates are not executed where they are defined, but only when they are invoked.

In the following short Query Operator example, the output will be ABZ, even though z is added after the query is defined but before it is enumerated through. This demonstrates that LINQ queries are evaluated on demand, rather than at the point where they are declared:

var letters = new List<string> { "a", "b" };
var query = letters.Select(w => w.ToUpper());
letters.Add("z");
foreach (var l in query)
    Console.Write(l);
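Deferred execution is also why the conversion methods matter. In the following sketch, an extension of the letters example, ToList executes the query immediately and stores the results, so a later change to the source list is not reflected in that snapshot:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredVsImmediate
{
    public static (string Deferred, string Snapshot) Run()
    {
        var letters = new List<string> { "a", "b" };

        var query = letters.Select(w => w.ToUpper());             // deferred: nothing runs yet
        var snapshot = letters.Select(w => w.ToUpper()).ToList(); // ToList executes the query now

        letters.Add("z");

        // The deferred query reads letters again when enumerated; the snapshot does not.
        return (string.Concat(query), string.Concat(snapshot));
    }

    public static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine($"Deferred: {deferred}"); // Deferred: ABZ
        Console.WriteLine($"Snapshot: {snapshot}"); // Snapshot: AB
    }
}
```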

Standard Query Operators

LINQ is driven by a core set of extension methods, referred to as standard Query Operators. These are grouped into operations based on their functionality. There are many standard Query Operators available, so this introduction focuses on the main operators that you are likely to use regularly.

Projection Operations

Projection operations allow you to convert an object into a new structure using only the properties that you need. You can create a new type, apply mathematical operations, or return the original object:

Select: Projects each item in the source into a new form.
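SelectMany: Projects all items in the source, flattens the result, and optionally projects them to a new form. There is no dedicated Query Expression keyword for SelectMany, but the compiler translates a Query Expression that contains more than one from clause into a SelectMany call.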

Select

Consider the following snippet, which iterates through a List<string> containing the values Mon, Tues, and Wednes, outputting each with the word day appended.

In your Chapter04Examples folder, add a new file called LinqSelectExamples.cs and edit it as follows:

using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
    class LinqSelectExamples
    {
        public static void Main()
        {
            var days = new List<string> { "Mon", "Tues", "Wednes" };
            var query1 = days.Select(d => d + "day");
            foreach(var day in query1)
                Console.WriteLine($"Query1: {day}");

Looking at the Query Operator syntax first, you can see that query1 uses the Select extension method and defines a Func<string, string> like this:

d => d + "day"

When the query is enumerated, each string in the days list ("Mon", "Tues", "Wednes") is passed to the lambda expression as d, and the word day is appended to it. Select returns a new IEnumerable<string> instance; the original values in the source variable, days, remain unchanged.

Next, query2 projects each day together with its position in the list:

            var query2 = days.Select((d, i) => $"{i} : {d}day");
            foreach (var day in query2)
                Console.WriteLine($"Query2: {day}");

Note that the Select method has another overload that gives access to the index position of each item in the source as well as its value. Here, d (the string value) and i (its index) are passed using the (d, i) => syntax and joined into a new string. The output will be displayed as 0 : Monday, 1 : Tuesday, and so on.

Anonymous Types

Before you continue looking at Select projections, it is worth noting that C# does not limit you to just creating new strings from existing strings. You can project into any type.

You can also create anonymous types, which are types created by the compiler from the properties that you name and specify. For example, consider the following example, which results in a new type being created that represents the results of the Select method:

            var query3 = days.Select((d, i) => new
            {
                Index = i,
                UpperCaseName = $"{d.ToUpper()}DAY"
            });
            foreach (var day in query3)
                Console.WriteLine($"Query3: Index={day.Index}, UpperCaseDay={day.UpperCaseName}");

Here, query3 results in a new type that has an Index and UpperCaseName property; the values are assigned using Index = i and UpperCaseName = $"{d.ToUpper()}DAY".

Anonymous types have no name that you can refer to in your code, so they are normally used only within the method that creates them, in local statements such as the previous foreach block. This saves you from having to create classes to temporarily store values from a Select method.

Running the code produces output in this format:

Query3: Index=0, UpperCaseDay=MONDAY

As an alternative, consider how the equivalent Query Expression looks. In the following example, you start with the from day in days expression. This assigns the name day to the string values in the days list. You then use select to project that to a new string, appending "day" to each.

This is functionally equivalent to the example in query1. The only difference is the code readability:

            var query4 = from day in days
                         select day + "day";
            foreach (var day in query4)
                Console.WriteLine($"Query4: {day}");

The following example snippet mixes a Query Operator with a Query Expression. The select clause of a Query Expression cannot access the index of each item, so the Select extension method is used to create an anonymous type with Name and Index properties:

            var query5 = from dayIndex in
                             days.Select((d, i) => new {Name = d, Index = i})
                         select dayIndex;
            foreach (var day in query5)
                Console.WriteLine($"Query5: Index={day.Index} : {day.Name}");
            Console.ReadLine();
        }
    }
}

Running the full example produces this output:

Query1: Monday
Query1: Tuesday
Query1: Wednesday
Query2: 0 : Monday
Query2: 1 : Tuesday
Query2: 2 : Wednesday
Query3: Index=0, UpperCaseDay=MONDAY
Query3: Index=1, UpperCaseDay=TUESDAY
Query3: Index=2, UpperCaseDay=WEDNESDAY
Query4: Monday
Query4: Tuesday
Query4: Wednesday
Query5: Index=0 : Mon
Query5: Index=1 : Tues
Query5: Index=2 : Wednes

Again, it largely comes down to personal choice as to which you prefer using. As queries become longer, one form may require less code than the other.
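To make the delegate behind query1 explicit, the following sketch stores the lambda in a Func<string, string> variable and passes it to Select. AppendDay and Project are illustrative names, not part of the book's example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectWithFunc
{
    // d => d + "day" is a Func<string, string>: it takes a string and returns a string.
    static readonly Func<string, string> AppendDay = d => d + "day";

    // Equivalent to days.Select(d => d + "day"), materialized with ToList.
    public static List<string> Project(List<string> days) =>
        days.Select(AppendDay).ToList();

    public static void Main()
    {
        var days = new List<string> { "Mon", "Tues", "Wednes" };
        Console.WriteLine(string.Join(", ", Project(days))); // Monday, Tuesday, Wednesday
        Console.WriteLine(string.Join(", ", days));          // Mon, Tues, Wednes (source unchanged)
    }
}
```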

Note

You can find the code used for this example at https://packt.link/wKye0.

SelectMany

You have seen how Select can be used to project values from each item in a source collection. When each item in the source has an enumerable property, the SelectMany extension method can flatten the items from all of those properties into a single sequence, which can then be optionally projected into a new form.

The following example creates two City records, each with multiple Station names, and uses SelectMany to extract all stations from both cities:

In your Chapter04Examples folder, add a new file called LinqSelectManyExamples.cs and edit it as follows:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
    record City (string Name, IEnumerable<string> Stations);
    class LinqSelectManyExamples
    {
        public static void Main()
        {
            var cities = new List<City>
            {
                new City("London", new[] {"Kings Cross KGX",
                                          "Liverpool Street LVS",
                                          "Euston EUS"}),
                new City("Birmingham", new[] {"New Street NST"})
            };
            Console.WriteLine("All Stations: ");
            foreach (var station in cities.SelectMany(city => city.Stations))
            {
                Console.WriteLine(station);
            }

The lambda passed to SelectMany must return an enumerable value, in this case, the City record's Stations property, which contains a list of station names.

Notice how a shortcut is used here, by directly integrating the query into a foreach statement. You are not altering or reusing the query variable, so there is no benefit in defining it separately, as done earlier.

SelectMany extracts all the station names from all of the items in the List<City> variable. Starting with the City record at element 0, which has the name London, it will extract the three station names ("Kings Cross KGX", "Liverpool Street LVS", and "Euston EUS"). It will then move on to the second City element, named Birmingham, and extract the single station, named "New Street NST".

Running the example produces the following output:
All Stations:
Kings Cross KGX
Liverpool Street LVS
Euston EUS
New Street NST
As an alternative, consider the following snippet. Here, you revert to using a query variable, stations, to make the code easier to follow:
            Console.Write("All Station Codes: ");
            var stations = cities
                .SelectMany(city => city.Stations.Select(s => s[^3..]));
            foreach (var station in stations)
            {
                Console.Write($"{station} ");
            }
            Console.WriteLine();
            Console.ReadLine();
        }
    }
}

Rather than just returning each Station string, this example uses a nested Select method and a range to extract the last three characters of each station name using s[^3..]. Here, s is the string for each station name, ^3 is an index counted from the end of the string (three characters before the end), and .. with no upper bound continues to the end of the string, so s[^3..] returns the last three characters.

Running the example produces the following output:
All Station Codes: KGX LVS EUS NST

You can see the last three characters of each station name are shown in the output.
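Although C# has no selectmany keyword, a Query Expression with two from clauses is translated by the compiler into a SelectMany call. The following sketch, which redeclares the City record so that it compiles on its own, produces the same station codes both ways. It also shows a range with an end index, [..^4], which drops the last four characters:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record City(string Name, IEnumerable<string> Stations);

static class StationCodes
{
    static readonly List<City> Cities = new List<City>
    {
        new City("London", new[] { "Kings Cross KGX", "Liverpool Street LVS", "Euston EUS" }),
        new City("Birmingham", new[] { "New Street NST" })
    };

    // Two from clauses: the compiler turns this query into a SelectMany call.
    public static List<string> CodesFromQuery() =>
        (from city in Cities
         from station in city.Stations
         select station[^3..]).ToList();   // ^3 counts three characters back from the end

    // The equivalent Query Operator form used in the example.
    public static List<string> CodesFromOperator() =>
        Cities.SelectMany(city => city.Stations.Select(s => s[^3..])).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", CodesFromQuery()));                   // KGX LVS EUS NST
        Console.WriteLine(CodesFromQuery().SequenceEqual(CodesFromOperator()));  // True
        Console.WriteLine("Kings Cross KGX"[..^4]);                              // Kings Cross
    }
}
```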

Note

You can find the code used for this example at https://packt.link/g8dXZ.

In the next section, you will read about filtering operations, which filter a result according to a condition.

Filtering Operations

Filtering operations allow you to filter a result to return only those items that match a condition. For example, consider the following snippet, which contains a list of orders:

In your Chapter04Examples folder, add a new file called LinqWhereExamples.cs and edit it as follows:
LinqWhereExamples.cs
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
    record Order (string Product, int Quantity, double Price);
    class LinqWhereExamples
    {
        public static void Main()
        {
            var orders = new List<Order>
            {
                new Order("Pen", 2, 1.99),
                new Order("Pencil", 5, 1.50),
                new Order("Note Pad", 1, 2.99),

You can find the complete code here: https://packt.link/ZJpb5.

Here, some order items are defined for various stationery products. Suppose you want to output all orders that have a quantity greater than five (this should output the Ruler and USB Memory Stick orders from the source).

For this, you can add the following code:
            Console.WriteLine("Orders with quantity over 5:");
            foreach (var order in orders.Where(o => o.Quantity > 5))
            {
                Console.WriteLine(order);
            }
Now, suppose you change the criteria to find all orders where the product is Pen or Pencil. You can chain that result into a Select method, which returns each order's total value; remember that Select can project anything from a source, even a simple calculation like this:
            Console.WriteLine("Pens or Pencils:");
            foreach (var orderValue in orders
                .Where(o => o.Product == "Pen" || o.Product == "Pencil")
                .Select( o => o.Quantity * o.Price))
            {
                Console.WriteLine(orderValue);
            }
Next, the Query Expression in the following snippet uses a where clause to find the orders with a price less than or equal to 3.99. This projects them into an anonymous type that has Name and Value properties, which you enumerate through using a foreach statement:
            var query = from order in orders
               where order.Price <= 3.99
               select new {Name=order.Product, Value=order.Quantity*order.Price};
            Console.WriteLine("Cheapest Orders:");
            foreach(var order in query)
            {
                Console.WriteLine($"{order.Name}: {order.Value}");
            }
        }
    }
}
Running the full example produces this result:
Orders with quantity over 5:
Order { Product = Ruler, Quantity = 10, Price = 0.5 }
Order { Product = USB Memory Stick, Quantity = 6, Price = 20 }
Pens or Pencils:
3.98
7.5
Cheapest Orders:
Pen: 3.98
Pencil: 7.5
Note Pad: 2.99
Stapler: 3.99
Ruler: 5
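A query expression is translated by the compiler into the equivalent method calls, so the where/select query above corresponds to a Where/Select chain. The following sketch illustrates this under a few assumptions: OrderSketch mirrors the Order record with a hypothetical subset of the orders, and the results are projected into value tuples instead of anonymous types so that they can be returned from a method:

```csharp
using System.Collections.Generic;
using System.Linq;

public record OrderSketch(string Product, int Quantity, double Price);

public static class QuerySyntaxSketch
{
    // A hypothetical subset of the orders used in the example above.
    public static List<OrderSketch> SampleOrders() => new List<OrderSketch>
    {
        new OrderSketch("Pen", 2, 1.99),
        new OrderSketch("Pencil", 5, 1.50),
        new OrderSketch("USB Memory Stick", 6, 20.00)
    };

    // Query Expression form: a where clause followed by a select clause.
    public static List<(string Name, double Value)> QueryForm(IEnumerable<OrderSketch> orders) =>
        (from order in orders
         where order.Price <= 3.99
         select (Name: order.Product, Value: order.Quantity * order.Price))
        .ToList();

    // Method-syntax form that the compiler translates the query into.
    public static List<(string Name, double Value)> MethodForm(IEnumerable<OrderSketch> orders) =>
        orders
            .Where(order => order.Price <= 3.99)
            .Select(order => (Name: order.Product, Value: order.Quantity * order.Price))
            .ToList();
}
```

Both methods return the Pen and Pencil orders in the same order; which form you use is largely a matter of readability.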

Now that you have seen Query Operators in action, it is worth returning to deferred execution to see how it affects a query that is enumerated more than once.

In this next example, you have a collection of journeys made by a vehicle, each represented by a TravelLog record. The TravelLog record contains an AverageSpeed method that logs a console message each time it is executed and, as the name suggests, returns the average speed of the vehicle during that journey:

In your Chapter04Examples folder, add a new file called LinqMultipleEnumerationExample.cs and edit it as follows:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
    record TravelLog (string Name, int Distance, int Duration)
    {
        public double AverageSpeed()
        {
            Console.WriteLine($"AverageSpeed() called for '{Name}'");
            return (double)Distance / Duration; // cast to double to avoid integer division
        }
    }
    class LinqMultipleEnumerationExample
    {
Next, define the console app's Main method, which populates a travelLogs list with four TravelLog records. You will add the following code for this:
        public static void Main()
        {
            var travelLogs = new List<TravelLog>
            {
                new TravelLog("London to Brighton", 50, 4),
                new TravelLog("Newcastle to London", 300, 24),
                new TravelLog("New York to Florida", 1146, 19),
                new TravelLog("Paris to Berlin", 546, 10)
            };
You will now create a fastestJourneys query variable, which includes a Where clause. This Where clause will call each journey's AverageSpeed method when enumerated.
Then, using a foreach loop, you enumerate through the items in fastestJourneys and write the name and distance to the console (note that you do not access the AverageSpeed method inside the foreach loop):
            var fastestJourneys = travelLogs.Where(tl => tl.AverageSpeed() > 50);
            Console.WriteLine("Fastest Distances:");
            foreach (var item in fastestJourneys)
            {
                Console.WriteLine($"{item.Name}: {item.Distance} miles");
            }
            Console.WriteLine();
Running the code block produces the following output. Only the journeys that match the condition are listed, and each AverageSpeed log message appears as the Where clause evaluates that journey:
Fastest Distances:
AverageSpeed() called for 'London to Brighton'
AverageSpeed() called for 'Newcastle to London'
AverageSpeed() called for 'New York to Florida'
New York to Florida: 1146 miles
AverageSpeed() called for 'Paris to Berlin'
Paris to Berlin: 546 miles
You can see that AverageSpeed was called four times, once for each journey, as part of the Where condition. This is as expected so far. Now, reuse the same query to output the Name and, this time, the Duration:
            Console.WriteLine("Fastest Duration:");
            foreach (var item in fastestJourneys)
            {
                Console.WriteLine($"{item.Name}: {item.Duration} hours");
            }
            Console.WriteLine();
Running this block produces the same four calls to the AverageSpeed method:
Fastest Duration:
AverageSpeed() called for 'London to Brighton'
AverageSpeed() called for 'Newcastle to London'
AverageSpeed() called for 'New York to Florida'
New York to Florida: 19 hours
AverageSpeed() called for 'Paris to Berlin'
Paris to Berlin: 10 hours

This shows that the full query is re-evaluated every time it is enumerated. This might not be a problem for a fast method such as AverageSpeed, but what if a method needs to access a database to extract some data? That would result in multiple database calls and, possibly, a very slow application.
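You can observe this re-evaluation directly by counting how often the Where predicate runs. In this sketch, the class name, method name, and speed values are hypothetical; the predicate increments a counter, and the query is enumerated twice, just as the two foreach loops above do. Without ToList, the predicate runs twice per item; with ToList, it runs once per item:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Returns how many times the predicate ran when the results were enumerated twice.
    public static int CountPredicateCalls(int[] speeds, bool materialize)
    {
        var calls = 0;
        IEnumerable<int> fast = speeds.Where(s =>
        {
            calls++;            // side effect that makes each evaluation visible
            return s > 50;
        });

        if (materialize)
        {
            fast = fast.ToList(); // executes the query now, once per item
        }

        // Enumerate twice, as the TravelLog example does with two foreach loops.
        for (var pass = 0; pass < 2; pass++)
        {
            foreach (var _ in fast) { }
        }
        return calls;
    }
}
```

With four speeds, CountPredicateCalls returns 8 when materialize is false and 4 when it is true.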

You can use methods such as ToList, ToArray, ToDictionary, ToLookup, or ToHashSet to ensure that a query that could be enumerated many times is executed only once rather than being re-evaluated repeatedly. Continuing with this example, the following block uses the same Where clause but adds a ToList call to execute the query immediately and ensure it is not re-evaluated:
            Console.WriteLine("Fastest Duration Multiple loops:");
            var fastestJourneysList = travelLogs
                  .Where(tl => tl.AverageSpeed() > 50)
                  .ToList();
            for (var i = 0; i < 2; i++)
            {
                Console.WriteLine($"Fastest Duration Multiple loop iteration {i+1}:");
                foreach (var item in fastestJourneysList)
                {
                    Console.WriteLine($"{item.Name}: {item.Distance} in {item.Duration} hours");
                }
            }
        }
    }
}
Running the block produces the following output. Notice how AverageSpeed is called only four times, and all four calls happen before either of the two Fastest Duration Multiple loop iteration messages:
Fastest Duration Multiple loops:
AverageSpeed() called for 'London to Brighton'
AverageSpeed() called for 'Newcastle to London'
AverageSpeed() called for 'New York to Florida'
AverageSpeed() called for 'Paris to Berlin'
Fastest Duration Multiple loop iteration 1:
New York to Florida: 1146 in 19 hours
Paris to Berlin: 546 in 10 hours
Fastest Duration Multiple loop iteration 2:
New York to Florida: 1146 in 19 hours
Paris to Berlin: 546 in 10 hours

Notice that, although the results are enumerated twice, the AverageSpeed condition is evaluated only once for each journey, because ToList executed the query before the loops began.
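Materializing a query has a second consequence worth knowing: a list produced by ToList is a snapshot of the results, whereas a deferred query reads its source again each time it is enumerated. This sketch (hypothetical names and values) adds an item to the source after both have been created:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SnapshotSketch
{
    public static (int DeferredCount, int SnapshotCount) CountAfterAdd()
    {
        var speeds = new List<int> { 12, 60, 54 };

        var deferred = speeds.Where(s => s > 50);          // not executed yet
        var snapshot = speeds.Where(s => s > 50).ToList(); // executed now: 60, 54

        speeds.Add(75); // modify the source after both were created

        // The deferred query re-reads speeds when counted; the list does not change.
        return (deferred.Count(), snapshot.Count);
    }
}
```

Here, DeferredCount is 3 (60, 54, and 75), while SnapshotCount remains 2.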

Note

You can find the code used for this example at https://packt.link/CIZJE.

Sorting Operations

There are five operations for sorting items in a source. Items are given a primary sort, which can be followed by an optional secondary sort that orders items within each group of equal primary values. For example, you can use a primary sort to sort a list of people by the City property and then use a secondary sort to sort the people within each city by the Surname property:

OrderBy: Sorts values into ascending order.
OrderByDescending: Sorts values into descending order.
ThenBy: Sorts values that have been primarily sorted into a secondary ascending order.
ThenByDescending: Sorts values that have been primarily sorted into a secondary descending order.
Reverse: Returns a collection in which the order of the elements in the source is reversed. There is no Query Expression equivalent.
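The City and Surname example described above could be sketched as follows. The Person record and the sample data are hypothetical; the method and Query Expression forms show that ThenBy, or a second key in an orderby clause, only orders items that share the same City:

```csharp
using System.Collections.Generic;
using System.Linq;

public record Person(string FirstName, string Surname, string City);

public static class SortSketch
{
    public static List<string> SortByCityThenSurname(IEnumerable<Person> people) =>
        people
            .OrderBy(p => p.City)      // primary sort
            .ThenBy(p => p.Surname)    // secondary sort within each City
            .Select(p => $"{p.City}: {p.Surname}")
            .ToList();

    // Query Expression equivalent: the second orderby key maps to ThenBy.
    public static List<string> SortWithQueryExpression(IEnumerable<Person> people) =>
        (from p in people
         orderby p.City, p.Surname
         select $"{p.City}: {p.Surname}")
        .ToList();
}
```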

OrderBy and OrderByDescending

In this example, you will use the System.IO namespace to query files in the host machine's temp folder, rather than creating small objects from lists.

The static Directory class offers methods that can query the filesystem. FileInfo retrieves details about a specific file, such as its size or creation date. The Path.GetTempPath method returns the system's temp folder. In the Windows operating system, for example, this is typically C:\Users\username\AppData\Local\Temp, where username is a specific Windows login name. The location will be different for other users and other systems:

In your Chapter04Examples folder, add a new file called LinqOrderByExamples.cs and edit it as follows:
using System;
using System.IO;
using System.Linq;
namespace Chapter04.Examples
{
    class LinqOrderByExamples
    {
        public static void Main()
        {
Use the Directory.EnumerateFiles method to find all filenames with the .tmp extension in the temp folder:
            var fileInfos = Directory.EnumerateFiles(Path.GetTempPath(), "*.tmp")
                .Select(filename => new FileInfo(filename))
                .ToList();

Here, each filename is projected into a FileInfo instance, and ToList executes the query immediately to produce a populated collection. Because fileInfos is a list, you can query it several times in the following steps without enumerating the temp folder again.

Next, the OrderBy method sorts the files by their CreationTime property so that the earliest files are listed first:
            Console.WriteLine("Earliest Files");
            foreach (var fileInfo in fileInfos.OrderBy(fi => fi.CreationTime))
            {
                Console.WriteLine($"{fileInfo.CreationTime:dd MMM yy}: {fileInfo.Name}");
            }
To find the largest files, re-query fileInfos and sort each file by its Length property using OrderByDescending:
            Console.WriteLine("Largest Files");
            foreach (var fileInfo in fileInfos
                .OrderByDescending(fi => fi.Length))
            {
                Console.WriteLine($"{fileInfo.Length:N0} bytes: {fileInfo.Name}");
            }
Finally, use where and orderby descending expressions to find the largest files that are less than 1,000 bytes in length:
            Console.WriteLine("Largest smaller files");
            foreach (var fileInfo in
                from fi in fileInfos
                where fi.Length < 1000
                orderby fi.Length descending
                select fi)
            {
                Console.WriteLine($"{fileInfo.Length:N0} bytes: {fileInfo.Name}");
            }
            Console.ReadLine();
        }
    }
}
Depending on the files in your temp folder, you should see an output like this:
Earliest Files
05 Jan 21: wct63C3.tmp
05 Jan 21: wctD308.tmp
05 Jan 21: wctFE7.tmp
04 Feb 21: wctE092.tmp
Largest Files
38,997,896 bytes: wctE092.tmp
4,824,572 bytes: cb6dfb76-4dc9-494d-9683-ce31eab43612.tmp
4,014,036 bytes: 492f224c-c811-41d6-8c5d-371359d520db.tmp
Largest smaller files
726 bytes: wct38BC.tmp
726 bytes: wctE239.tmp
512 bytes: ~DF8CE3ED20D298A9EC.TMP
416 bytes: TFR14D8.tmp

With this example, you have queried files in the host machine's temp folder, rather than creating small objects from lists.

Note

You can find the code used for this example at https://packt.link/mWeVC.

ThenBy and ThenByDescending

The following example sorts popular quotes, based on the number of words found in each.

In your Chapter04Examples folder, add a new file called LinqThenByExamples.cs and edit it as follows:

using System;
using System.IO;
using System.Linq;
namespace Chapter04.Examples
{
    class LinqThenByExamples
    {
        public static void Main()
        {

You start by declaring a string array of quotes as follows:

            var quotes = new[]
            {
                "Love for all hatred for none",
                "Change the world by being yourself",
                "Every moment is a fresh beginning",
                "Never regret anything that made you smile",
                "Die with memories not dreams",
                "Aspire to inspire before we expire"
            };

In the next snippet, each quote is projected into a new anonymous type that holds the quote and its number of words (found using String.Split()). The items are first sorted in descending order of word count, to show those with the most words first, and then sorted alphabetically:

            foreach (var item in quotes
                .Select(q => new {Quote = q, Words = q.Split(" ").Length})
                .OrderByDescending(q => q.Words)
                .ThenBy(q => q.Quote))
            {
                Console.WriteLine($"{item.Words}: {item.Quote}");
            }
            Console.ReadLine();
        }

Running the code lists the quotes in word count order as follows:

7: Never regret anything that made you smile
6: Aspire to inspire before we expire
6: Change the world by being yourself
6: Every moment is a fresh beginning
6: Love for all hatred for none
5: Die with memories not dreams

Note how the quotes with six words are shown alphabetically.

The following is the equivalent Query Expression, with an orderby quote.Words descending clause followed by a quote.Quote ascending clause:

            var query = from quote in
                            (quotes.Select(q => new {Quote = q, Words = q.Split(" ").Length}))
                        orderby quote.Words descending, quote.Quote ascending
                        select quote;
            foreach (var item in query)
            {
                Console.WriteLine($"{item.Words}: {item.Quote}");
            }
            Console.ReadLine();
        }
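A common mistake is to chain a second OrderBy instead of ThenBy. Each OrderBy starts a new primary sort, so the earlier word-count ordering only survives as a tie-breaker for identical quotes. This sketch uses a hypothetical subset of the quotes above to compare the two approaches:

```csharp
using System.Linq;

public static class ThenBySketch
{
    static readonly string[] Quotes =
    {
        "Die with memories not dreams",
        "Change the world by being yourself",
        "Never regret anything that made you smile",
        "Aspire to inspire before we expire"
    };

    // Correct: word count descending, then alphabetical within equal counts.
    public static string[] WithThenBy() => Quotes
        .OrderByDescending(q => q.Split(' ').Length)
        .ThenBy(q => q)
        .ToArray();

    // Pitfall: the second OrderBy re-sorts everything alphabetically,
    // so the word-count ordering is effectively lost.
    public static string[] WithTwoOrderBys() => Quotes
        .OrderByDescending(q => q.Split(' ').Length)
        .OrderBy(q => q)
        .ToArray();
}
```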

Note

You can find the code used for this example at https://packt.link/YWJRz.

Now that you have sorted popular quotes based on the number of words in each, it is time to apply these skills in the next exercise.

Exercise 4.03: Filtering a List of Countries by Continent and Sorting by Area

In the preceding examples, you have looked at code that can select, filter, and sort a collection source. You will now combine these into an exercise that filters a small list of countries for two continents (South America and Africa) and sorts the results by geographical size.

Perform the following steps to do so:

In your Chapter04Exercises folder, create a new Exercise03 folder.
Add a new file called Program.cs to the Exercise03 folder.
Start by adding a Country record that will be passed the Name of a country, the Continent to which it belongs, and its Area in square miles:
using System;
using System.Linq;
namespace Chapter04.Exercises.Exercise03
{
    class Program
    {
        record Country (string Name, string Continent, int Area);
        public static void Main()
        {
Now create a small subset of country data defined in an array, as follows:
            var countries = new[]
            {
                new Country("Seychelles", "Africa", 176),
                new Country("India", "Asia", 1_269_219),
                new Country("Brazil", "South America", 3_287_956),
                new Country("Argentina", "South America", 1_073_500),
                new Country("Mexico", "South America", 750_561),
                new Country("Peru", "South America", 494_209),
                new Country("Algeria", "Africa", 919_595),
                new Country("Sudan", "Africa", 668_602)
            };

The array contains the name of a country, the continent it belongs to, and its geographical size in square miles.

The search criteria must match either South America or Africa, so define these continents in an array rather than hardcoding two specific strings in the Where clause:
            var requiredContinents = new[] {"South America", "Africa"};

This offers extra code flexibility should you need to alter it.

Build up a query that filters by continent, sorts by Continent and then by descending Area, and finally uses the overload of the Select extension method that also supplies each item's zero-based index:
            var filteredCountries = countries
                .Where(c => requiredContinents.Contains(c.Continent))
                .OrderBy(c => c.Continent)
                .ThenByDescending(c => c.Area)
                .Select( (cty, i) => new {Index = i, Country = cty});

            foreach(var item in filteredCountries)
                Console.WriteLine($"{item.Index+1}: {item.Country.Continent}, {item.Country.Name} = {item.Country.Area:N0} sq mi");
        }
    }
}

Each country is projected into a new anonymous type, holding its index and the Country record, which is then written to the console.

Running the code block produces the following result:
1: Africa, Algeria = 919,595 sq mi
2: Africa, Sudan = 668,602 sq mi
3: Africa, Seychelles = 176 sq mi
4: South America, Brazil = 3,287,956 sq mi
5: South America, Argentina = 1,073,500 sq mi
6: South America, Mexico = 750,561 sq mi
7: South America, Peru = 494,209 sq mi

Notice that Algeria has the largest area in Africa, and Brazil has the largest area in South America (based on this small subset of data). Also notice how 1 is added to each Index for readability, since numbering that starts at zero is less user-friendly.
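The position of the indexed Select in the chain matters: the index is the item's position in the sequence that reaches Select, not its position in the original array. Because Select comes after OrderBy and ThenByDescending in this exercise, the indexes follow the sorted order. The following sketch uses hypothetical names and a few of the areas above to compare the two placements:

```csharp
using System.Linq;

public static class IndexedSelectSketch
{
    static readonly int[] Areas = { 176, 919_595, 668_602 };

    // Index assigned after sorting: index 0 goes to the largest area.
    public static (int Index, int Area)[] IndexAfterSort() => Areas
        .OrderByDescending(a => a)
        .Select((a, i) => (Index: i, Area: a))
        .ToArray();

    // Index assigned before sorting: each item keeps its original array position.
    public static (int Index, int Area)[] IndexBeforeSort() => Areas
        .Select((a, i) => (Index: i, Area: a))
        .OrderByDescending(t => t.Area)
        .ToArray();
}
```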

Note

You can find the code used for this exercise at https://packt.link/Djddw.

You have seen how LINQ extension methods can be used to access items in a data source. Now, you will learn about partitioning data, which can be used to extract subsets of items.

Partitioning Operations

So far, you have looked at filtering the items in a data source that match a defined condition. Partitioning is used when you need to divide a data source into two distinct sections and return either of those two sections for subsequent processing.

For example, consider that you have a list of vehicles sorted by value and want to process the five least expensive vehicles using some method. If the list is sorted in ascending order, you could partition the data using the Take(5) method (described in the following list), which extracts the first five items and discards the rest.

There are six partitioning operations that are used to split a source, with either of the two sections being returned. There are no partitioning Query Expressions:

Skip: Returns a collection that skips the first N items in the source sequence and returns the rest. Used when you need to ignore a fixed number of items at the start of a source collection.
SkipLast: Returns a collection that skips the last N items in the source sequence.
SkipWhile: Returns a collection that skips items while they match a specified condition; once an item fails the condition, that item and all remaining items are returned.
Take: Returns a collection that contains the first N items in the sequence.
TakeLast: Returns a collection that contains the last N items in the sequence.
TakeWhile: Returns a collection that contains items from the start of the sequence while they match the specified condition, stopping at the first item that does not.
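Unlike Where, SkipWhile and TakeWhile stop testing their condition at the first item that fails it. The following sketch uses a hypothetical, deliberately unsorted array to make the difference visible, and also shows TakeLast and SkipLast:

```csharp
using System.Linq;

public static class PartitionSketch
{
    static readonly int[] Values = { 1, 2, 9, 3, 4 };

    // Where tests every item: 1, 2, 3, 4
    public static int[] WhereSmall() => Values.Where(v => v < 5).ToArray();

    // TakeWhile stops at 9, the first item that fails: 1, 2
    public static int[] TakeWhileSmall() => Values.TakeWhile(v => v < 5).ToArray();

    // SkipWhile skips 1 and 2, then returns everything from 9 onward: 9, 3, 4
    public static int[] SkipWhileSmall() => Values.SkipWhile(v => v < 5).ToArray();

    // TakeLast and SkipLast partition by position, counting from the end.
    public static int[] TakeLastTwo() => Values.TakeLast(2).ToArray(); // 3, 4
    public static int[] SkipLastTwo() => Values.SkipLast(2).ToArray(); // 1, 2, 9
}
```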

The following example demonstrates various Skip and Take operations on an unsorted list of exam grades. Here, you use Skip(1) to ignore the highest grade in a sorted list.

In your Chapter04Examples folder, add a new file called LinqSkipTakeExamples.cs and edit it as follows:
using System;
using System.Linq;
namespace Chapter04.Examples
{
    class LinqSkipTakeExamples
    {
        public static void Main()
        {
            var grades = new[] {25, 95, 75, 40, 54, 9, 99};
            Console.Write("Skip: Highest Grades (skipping first):");
            foreach (var grade in grades
                .OrderByDescending(g => g)
                .Skip(1))
            {
                Console.Write($"{grade} ");
            }
            Console.WriteLine();
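Next, SkipWhile is combined with the is operator and relational patterns (<= 25 or >= 75). Because the grades are sorted in descending order, SkipWhile skips 99, 95, and 75, then stops skipping at 54, the first grade that does not match. The remaining grades, including 25 and 9, are returned because SkipWhile does not test items after it stops skipping:
            Console.Write("SkipWhile Middle Grades (excluding 25 or 75):");
            foreach (var grade in grades
                .OrderByDescending(g => g)
                .SkipWhile(g => g is <= 25 or >= 75))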
            {
                Console.Write($"{grade} ");
            }
            Console.WriteLine();
By using SkipLast, you can show the bottom half of the results. Add the code for this as follows:
            Console.Write("SkipLast: Bottom Half Grades:");
            foreach (var grade in grades
                .OrderBy(g => g)
                .SkipLast(grades.Length / 2))
            {
                Console.Write($"{grade} ");
            }
            Console.WriteLine();
Finally, Take(2) is used here to show the two highest grades:
            Console.Write("Take: Two Highest Grades:");
            foreach (var grade in grades
                .OrderByDescending(g => g)
                .Take(2))
            {
                Console.Write($"{grade} ");
            }
        }
    }
}
Running the example produces this output, which is as expected:
Skip: Highest Grades (skipping first):95 75 54 40 25 9
SkipWhile Middle Grades (excluding 25 or 75):54 40 25 9
SkipLast: Bottom Half Grades:9 25 40 54
Take: Two Highest Grades:99 95

This example demonstrated various Skip and Take operations on a list of exam grades, which was sorted before each partitioning operation.

Note

You can find the code used for this example at https://packt.link/TsDFk.

Grouping Operations

GroupBy groups elements that share the same attribute. It is often used to group data or to count items that share a common attribute. The result is an IEnumerable<IGrouping<TKey, TElement>>, where TKey is the type of the key and TElement is the type of the grouped elements. Each IGrouping is itself enumerable, as it contains all the items that match its key.

For example, consider the next snippet, which groups a List of customer orders by name. In your Chapter04Examples folder, add a new file called LinqGroupByExamples.cs and edit it as follows:

LinqGroupByExamples.cs

using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
    record CustomerOrder(string Name, string Product, int Quantity);
    class LinqGroupByExamples
    {
        public static void Main()
        {
            var orders = new List<CustomerOrder>
            {
                new CustomerOrder("Mr Green", "LED TV", 4),
                new CustomerOrder("Mr Smith", "iPhone", 2),
                new CustomerOrder("Mrs Jones", "Printer", 1),

You can find the complete code here: https://packt.link/GbwF2.

In this example, you have a list of CustomerOrder objects and want to group them by the Name property. For this, the GroupBy method is passed a Func delegate, which selects the Name property from each CustomerOrder instance.

Each item in the GroupBy result is a grouping whose Key is the customer's Name. Because each grouping is itself enumerable, you can sort its CustomerOrder items by Quantity, as follows:

                foreach (var item in grouping.OrderByDescending(i => i.Quantity))
                {
                    Console.WriteLine($" {item.Product} * {item.Quantity}");
                }
            Console.ReadLine();
        }

Running the code produces the following output:

Customer Mr Green:
 LED TV * 4
 MP3 Player * 1
 Microwave Oven * 1
Customer Mr Smith:
 PC * 5
 iPhone * 2
 Printer * 2
Customer Mrs Jones:
 Printer * 1

You can see the data is first grouped by customer Name and then ordered by order Quantity within each customer grouping. The equivalent Query Expression is written like this:

            var query = from order in orders
                        group order by order.Name;
            foreach (var grouping in query)
            {
                Console.WriteLine($"Customer {grouping.Key}:");
                foreach (var item in from item in grouping
                                     orderby item.Quantity descending
                                     select item)
                {
                    Console.WriteLine($" {item.Product} * {item.Quantity}");
                }
            }
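Because the method syntax listing above is abbreviated, here is a self-contained sketch of the same grouping. CustomerOrderSketch and the sample orders are hypothetical stand-ins for the CustomerOrder record and its data, and the method returns the formatted lines rather than writing them to the console:

```csharp
using System.Collections.Generic;
using System.Linq;

public record CustomerOrderSketch(string Name, string Product, int Quantity);

public static class GroupBySketch
{
    // Hypothetical subset of the orders from the example above.
    public static List<CustomerOrderSketch> SampleOrders() => new List<CustomerOrderSketch>
    {
        new CustomerOrderSketch("Mr Green", "LED TV", 4),
        new CustomerOrderSketch("Mr Smith", "iPhone", 2),
        new CustomerOrderSketch("Mrs Jones", "Printer", 1),
        new CustomerOrderSketch("Mr Green", "MP3 Player", 1),
        new CustomerOrderSketch("Mr Smith", "PC", 5)
    };

    public static List<string> Describe(IEnumerable<CustomerOrderSketch> orders)
    {
        var lines = new List<string>();
        // GroupBy yields IGrouping<string, CustomerOrderSketch> items, in the
        // order in which each key is first encountered in the source.
        foreach (var grouping in orders.GroupBy(o => o.Name))
        {
            lines.Add($"Customer {grouping.Key}:");
            // A grouping is enumerable, so it can be sorted like any other sequence.
            foreach (var item in grouping.OrderByDescending(i => i.Quantity))
            {
                lines.Add($" {item.Product} * {item.Quantity}");
            }
        }
        return lines;
    }
}
```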

You have now seen some of the commonly used LINQ operators. You will now bring them together in an exercise.

Exercise 4.04: Finding the Most Commonly Used Words in a Book

In Chapter 3, Delegates, Events, and Lambdas, you used the WebClient class to download data from a website. In this exercise, you will use data downloaded from Project Gutenberg.

Note

Project Gutenberg is a library of 60,000 free eBooks. You can search online for Project Gutenberg or visit https://www.gutenberg.org/.

You will create a console app that allows the user to enter a URL. Then, you will download the book's text from the Project Gutenberg URL and use various LINQ statements to find the most frequent words in the book's text.

Additionally, you want to exclude some common stop-words; these are words such as and, or, and the that appear regularly in English, but add little to the meaning of a sentence. You will use the Regex.Split method to help split words more accurately than a simple space delimiter. Perform the following steps to do so:

Note

You can find more information on Regex at https://packt.link/v4hGN.

In your Chapter04Exercises folder, create a new Exercise04 folder.
Add a new file called Program.cs to the Exercise04 folder.
First, define the TextCounter class. Its constructor will be passed the path to a file containing common English stop-words, which you will add shortly:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
namespace Chapter04.Exercises.Exercise04
{
    class TextCounter
    {
        private readonly HashSet<string> _stopWords;
        public TextCounter(string stopWordPath)
        {
            Console.WriteLine($"Reading stop word file: {stopWordPath}");
Using File.ReadAllLines, add each word into the _stopWords HashSet.
            _stopWords = new HashSet<string>(File.ReadAllLines(stopWordPath));
        }

A HashSet is used because each stop-word only needs to be stored once, and its Contains method offers fast lookups.

Next, the Process method is passed a string that contains the book's text and the maximum number of words to show.
Return the result as a Tuple<string, int> collection, which saves you from having to create a class or record to hold the results:
        public IEnumerable<Tuple<string, int>> Process(string text, int maximumWords)
        {
Now perform the query part. Use Regex.Split with the pattern @"\s+" to split the text into words.

The \s+ pattern matches one or more whitespace characters (spaces, tabs, or line breaks), so the text is split wherever whitespace occurs. For example, the string Hello Goodbye would be split into an array that contains two elements, Hello and Goodbye. Punctuation is not treated as a boundary, so it stays attached to the neighboring word. The returned string items are filtered via Where to ensure all stop-words are ignored using the Contains method. The words are then grouped by value, GroupBy(t=>t), and projected to a Tuple containing the word (the group's Key) and the number of times it occurs, using grp.Count.

Finally, you sort by Item2, which for this Tuple is the word count, and then take only the required number of words:
            var words = Regex.Split(text.ToLower(), @"\s+")
                .Where(t => !_stopWords.Contains(t))
                .GroupBy(t => t)
                .Select(grp => Tuple.Create(grp.Key, grp.Count()))
                .OrderByDescending(tup => tup.Item2) //int
                .Take(maximumWords);
            return words;
        }
    }
Now start creating the main console app:
    class Program
    {
        public static void Main()
        {
Include a text file called StopWords.txt in the Chapter04 source folder:
            const string StopWordFile = "StopWords.txt";
            var counter = new TextCounter(StopWordFile);
Note
You can find StopWords.txt on GitHub at https://packt.link/Vi8JH, or you can download any standard stop-word file, such as NLTK's https://packt.link/ZF1Tf. This file should be saved in the Chapter04Exercises folder.
Once TextCounter has been created, prompt the user for a URL:
            string address;
            do
            {
                //https://www.gutenberg.org/files/64333/64333-0.txt
                Console.Write("Enter a Gutenberg book URL: ");
                address = Console.ReadLine();
                if (string.IsNullOrEmpty(address))
                    continue;
If an address has been entered, create a new WebClient instance and download the data into a temporary file.
You will then perform extra processing on the text file before passing its contents to TextCounter:
                using var client = new WebClient();
                var tempFile = Path.GetTempFileName();
                Console.WriteLine("Downloading...");
                client.DownloadFile(address, tempFile);

The Gutenberg text files contain extra details, such as the author and title, which you can find by reading each line in the file. The actual text of the book doesn't begin until a line that starts with *** START OF THE PROJECT GUTENBERG EBOOK, so you also need to look for this start message as you read each line:

                Console.WriteLine($"Processing file {tempFile}");
                const string StartIndicator = "*** START OF THE PROJECT GUTENBERG EBOOK";
                //Title: The Little Review, October 1914(Vol. 1, No. 7)
                //Author: Various
                var title = string.Empty;
                var author = string.Empty;

Next, append each line read into a StringBuilder instance, which is efficient for such string operations:
                var bookText = new StringBuilder();
                var isReadingBookText = false;
                var bookTextLineCount = 0;
Now parse each line inside tempFile, looking for the Author, Title, or the StartIndicator:
                foreach (var line in File.ReadAllLines(tempFile))
                {
                    if (line.StartsWith("Title"))
                    {
                        title = line;
                    }
                    else if (line.StartsWith("Author"))
                    {
                        author = line;
                    }
                    else if (line.StartsWith(StartIndicator))
                    {
                        isReadingBookText = true;
                    }
                    else if (isReadingBookText)
                    {
                        bookText.AppendLine(line); // keep a line break so words at line ends are not merged
                        bookTextLineCount++;
                    }
                }
If the book text is found, provide a summary of lines and characters read before calling the counter.Process method. Here, you want the top 50 words:
                if (bookTextLineCount > 0)
                {
                    Console.WriteLine($"Processing {bookTextLineCount:N0} lines ({bookText.Length:N0} characters)..");
                    var wordCounts = counter.Process(bookText.ToString(), 50);
                    Console.WriteLine(title);
                    Console.WriteLine(author);
Once you have the results, use a foreach loop to output the word count details, starting a new line after every third word:
                    var i = 0;
                    // each Tuple<string, int> is deconstructed into word and count
                    foreach (var (word, count) in wordCounts)
                    {
                        Console.Write($"'{word}'={count} ");
                        i++;
                        if (i % 3 == 0)
                        {
                            Console.WriteLine();
                        }
                    }
                    Console.WriteLine();
                }
                else
                {
Running the console app with https://www.gutenberg.org/files/64333/64333-0.txt as an example URL produces the following output:
Reading stop word file: StopWords.txt
Enter a Gutenberg book URL: https://www.gutenberg.org/files/64333/64333-0.txt
Downloading...
Processing file C:\Temp\tmpB0A3.tmp
Processing 4,063 lines (201,216 characters)..
Title: The Little Review, October 1914 (Vol. 1, No. 7)
Author: Various
'one'=108               'new'=95                'project'=62
'man'=56                'little'=54             'life'=52
'would'=51             'work'=50               'book'=42
'must'=42               'people'=39             'great'=37
'love'=37               'like'=36               'gutenberg-tm'=36
'may'=35                'men'=35                'us'=32
'could'=30             'every'=30             'first'=29
'full'=29               'world'=28             'mr.'=28
'old'=27                'never'=26             'without'=26
'make'=26               'young'=24             'among'=24
'modern'=23             'good'=23               'it.'=23
'even'=22               'war'=22                'might'=22
'long'=22               'cannot'=22             '_the'=22
'many'=21               'works'=21             'electronic'=21
'always'=20             'way'=20                'thing'=20
'day'=20                'upon'=20               'art'=20
'terms'=20             'made'=19
Note
Visual Studio might show the following when the code is run for the first time: warning SYSLIB0014: 'WebClient.WebClient()' is obsolete: 'WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'
This is a recommendation to use the newer HttpClient class instead of the WebClient class. For the simple download in this exercise, however, either class will do the job.

The output shows a list of words found among the 4,063 lines of downloaded text. The counts show that one, new, and project are the most frequent words. Notice how mr., gutenberg-tm, it., and _the appear as words. This shows that the regular expression used to split the text into words is not completely accurate.
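One way to avoid tokens such as mr., it., and _the is to match words directly instead of splitting on separators. The sketch below is illustrative only and is not the exercise's own code. It assumes that a word is a run of ASCII letters, optionally joined to further letters by a single apostrophe or hyphen. The Tokenize helper and its pattern are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical word pattern: a run of letters, optionally followed by further letter
// runs joined with a single apostrophe or hyphen (e.g. "don't", "gutenberg-tm").
var wordPattern = new Regex(@"[a-z]+(?:['-][a-z]+)*", RegexOptions.Compiled);

// Lower-case the text first so the pattern only needs to handle a-z.
List<string> Tokenize(string text) =>
    wordPattern.Matches(text.ToLowerInvariant())
               .Select(match => match.Value)
               .ToList();

var words = Tokenize("Mr. Smith said it. _The Project Gutenberg-tm eBook");
Console.WriteLine(string.Join(" | ", words));
// prints: mr | smith | said | it | the | project | gutenberg-tm | ebook
```

Accented letters are not matched by [a-z], so text in other languages would need a Unicode letter class such as \p{L} instead. Tokens such as mr would still need to be treated as stop words.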

Note

You can find the code used for this exercise at https://packt.link/Q7Pf8.

An interesting enhancement to this exercise would be to sort the words by count, include a count of the stop words found, or find the average word length.
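A minimal sketch of those enhancements follows. It uses a small, hypothetical dictionary and word list in place of the exercise's data. Words are sorted with OrderByDescending and ThenBy, stop-word occurrences are counted with a HashSet lookup, and the length of the distinct counted words is averaged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs standing in for the exercise's word counts, stop words,
// and the full list of words read from the book.
var wordCounts = new Dictionary<string, int>
{
    ["little"] = 54, ["one"] = 108, ["project"] = 62, ["new"] = 95
};
var stopWords = new HashSet<string> { "the", "and", "of" };
string[] allWords = { "the", "one", "new", "and", "project", "of", "the" };

// 1. Sort by descending count, breaking ties alphabetically.
var byCount = wordCounts
    .OrderByDescending(pair => pair.Value)
    .ThenBy(pair => pair.Key)
    .ToList();

// 2. Count stop-word occurrences (not distinct stop words) among all words read.
var stopWordHits = allWords.Count(word => stopWords.Contains(word));

// 3. Average length of the distinct words that were counted.
var averageLength = wordCounts.Keys.Average(word => word.Length);

foreach (var (word, count) in byCount)
{
    Console.WriteLine($"'{word}'={count}");
}
Console.WriteLine($"Stop words found: {stopWordHits}");
Console.WriteLine($"Average word length: {averageLength:N2}");
```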

Aggregation Operations

Aggregation operations are used to compute a single value from a collection of values in a data source. An example could be the maximum, minimum, and average rainfall from data collected over a month:

Average: Calculates the average value in a collection.
Count: Counts the items in a collection, or only those that match a predicate.
Max: Calculates the maximum value.
Min: Calculates the minimum value.
Sum: Calculates the sum of values.
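The five operators can be tried out on the rainfall scenario mentioned above. The daily readings below are made-up values used only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical daily rainfall readings, in millimetres.
double[] rainfall = { 0.0, 4.2, 12.5, 0.0, 1.8, 7.1, 0.0 };

var wetDays = rainfall.Count(mm => mm > 0);   // Count with a predicate
var total   = rainfall.Sum();
var average = rainfall.Average();
var wettest = rainfall.Max();
var driest  = rainfall.Min();

Console.WriteLine($"Wet days: {wetDays} of {rainfall.Length}");
Console.WriteLine($"Total: {total:N1} mm, Average: {average:N2} mm");
Console.WriteLine($"Max: {wettest} mm, Min: {driest} mm");
```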

The following example uses the Process.GetProcesses method from the System.Diagnostics namespace to retrieve a list of processes currently running on the system:

In your Chapter04Examples folder, add a new file called LinqAggregationExamples.cs and edit it as follows:

using System;
using System.Diagnostics;
using System.Linq;
namespace Chapter04.Examples
{
    class LinqAggregationExamples
    {
        public static void Main()
        {

First, Process.GetProcesses().ToList() is called to retrieve a list of the active processes running on the system:

var processes = Process.GetProcesses().ToList();

Then, the list's Count property gives the total number of processes. To count only some of them, use the overload of the Count extension method that accepts a Func<Process, bool> predicate, which is applied to each item. The Process class has a PrivateMemorySize64 property, which returns the amount of private memory, in bytes, allocated to the process. You can use it to count the small processes, that is, those using less than 1,000,000 bytes of memory:

var allProcesses = processes.Count;
var smallProcesses = processes.Count(proc => proc.PrivateMemorySize64 < 1_000_000);

Next, the Average extension method returns the overall average of a specific value for all items in the processes list. In this case, you use it to calculate the average memory consumption, using the PrivateMemorySize64 property again:

var average = processes.Average(p => p.PrivateMemorySize64);

The PrivateMemorySize64 property is also used to calculate the maximum and minimum memory used for all processes, along with the total memory, as follows:

var max = processes.Max(p => p.PrivateMemorySize64);
var min = processes.Min(p => p.PrivateMemorySize64);
var sum = processes.Sum(p => p.PrivateMemorySize64);

Once you have calculated the statistics, each value is written to the console:

Console.WriteLine("Process Memory Details");
Console.WriteLine($" All Count: {allProcesses}");
Console.WriteLine($"Small Count: {smallProcesses}");
Console.WriteLine($" Average: {FormatBytes(average)}");
Console.WriteLine($" Maximum: {FormatBytes(max)}");
Console.WriteLine($" Minimum: {FormatBytes(min)}");
Console.WriteLine($" Total: {FormatBytes(sum)}");
}

In the preceding snippet, the Count property returns the number of all processes. The predicate overload of the Count extension method then counts those using less than 1,000,000 bytes of memory, by examining each process's PrivateMemorySize64 property. Average, Max, Min, and Sum are used to calculate statistics for process memory usage on the system.

Note

Average, Min, and Max throw InvalidOperationException with the error Sequence contains no elements if the source collection of non-nullable numeric values is empty, whereas Count and Sum simply return 0. Check the collection with Any (or Count) before calling those operators.

Finally, FormatBytes converts the byte amounts into megabytes (dividing by 1,024 × 1,024, so 1 MB here is 1,048,576 bytes):

private static string FormatBytes(double bytes)
{
    return $"{bytes / Math.Pow(1024, 2):N2} MB";
}

Running the example produces results similar to this:

Process Memory Details
All Count: 305
Small Count: 5
Average: 38.10 MB
Maximum: 1,320.16 MB
Minimum: 0.06 MB
Total: 11,620.03 MB

The figures depend on the processes running on your system at the time, so your results will differ.

Note

You can find the code used for this example at https://packt.link/HI2eV.
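You can see how the aggregation operators behave on an empty source by running them against an empty list. The sketch below uses a hypothetical empty List<long> in place of a filtered process list. Count and Sum return 0, while Average throws. The sketch then shows three common guards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var noReadings = new List<long>();   // e.g. no processes matched a filter

// Count and Sum are safe on an empty sequence: both return 0.
var count = noReadings.Count();
var sum = noReadings.Sum();

// Average (like Min and Max) throws for an empty sequence of non-nullable values.
bool averageThrew = false;
try
{
    noReadings.Average();
}
catch (InvalidOperationException)
{
    averageThrew = true;
}

// Guard 1: check with Any before aggregating.
double safeAverage = noReadings.Any() ? noReadings.Average() : 0;

// Guard 2: substitute a default element when the sequence is empty.
long safeMax = noReadings.DefaultIfEmpty(0L).Max();

// Guard 3: project to a nullable type; the nullable overloads return null instead of throwing.
long? nullableMin = noReadings.Min(value => (long?)value);

Console.WriteLine($"Count={count}, Sum={sum}, Average threw={averageThrew}");
Console.WriteLine($"Safe average={safeAverage}, safe max={safeMax}, nullable min={(nullableMin?.ToString() ?? "null")}");
```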

Quantifier Operations

Quantifier operations return a bool that indicates whether all or some elements in a sequence match a predicate. They are often used to check whether any element in a collection matches some criteria. Any stops as soon as it finds a match, whereas Count enumerates every item in the collection, even if you need just one result.

Quantifier operations are accessed using the following extension methods:

All: Returns true if all elements in the source sequence match a condition.
Any: Returns true if any element in the source sequence matches a condition.
Contains: Returns true if the source sequence contains the specified item.
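The sketch below exercises all three quantifiers on a small hand of cards. It uses (Number, Suit) tuples as a stand-in for the PlayingCard record, so the suits are plain strings here. A counter shows that Any stops at the first match. The sketch also shows that All returns true for an empty sequence, because no element fails the condition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A hand as (Number, Suit) tuples; like records, tuples compare by value.
var hand = new List<(int Number, string Suit)> { (8, "Spades"), (7, "Diamonds"), (6, "Diamonds") };

// Any stops at the first match; the counter records how many cards were examined.
int examined = 0;
bool anyEight = hand.Any(card => { examined++; return card.Number == 8; });

bool anyClubs = hand.Any(card => card.Suit == "Clubs");
bool allRed = hand.All(card => card.Suit is "Hearts" or "Diamonds");   // stops at the first non-red card

// On a List this binds to List<T>.Contains; Enumerable.Contains gives the same value-based result.
bool hasSevenOfDiamonds = hand.Contains((7, "Diamonds"));

// On an empty sequence, Any() is false, but All(...) is true because no element fails.
var empty = Array.Empty<(int Number, string Suit)>();
bool emptyAny = empty.Any();
bool emptyAll = empty.All(card => card.Number > 100);

Console.WriteLine($"Any eight: {anyEight} (cards examined: {examined})");
Console.WriteLine($"Any clubs: {anyClubs}, All red: {allRed}, Has 7 of Diamonds: {hasSevenOfDiamonds}");
Console.WriteLine($"Empty sequence: Any={emptyAny}, All={emptyAll}");
```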

The following card-dealing example selects three cards at random and returns a summary of those selected. The summary uses the All and Any extension methods to determine whether any of the cards were clubs or red and whether all cards were diamonds or an even number:

In your Chapter04Examples folder, add a new file called LinqAllAnyExamples.cs.
Start by declaring an enum that represents each of the four suits in a pack of playing cards and a record class that defines a playing card:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
    enum PlayingCardSuit
    {
        Hearts,
        Clubs,
        Spades,
        Diamonds
    }
    record PlayingCard (int Number, PlayingCardSuit Suit)
    {
It is common practice to override the ToString method to provide a user-friendly way to describe an object's state at runtime. Here, the card's number and suit are returned as a string:
        public override string ToString()
        {
            return $"{Number} of {Suit}";
        }
    }
Now create a class to represent a deck of cards (for ease, only create cards numbered one to 10). The deck's constructor will populate the _cards collection with 10 cards for each of the suits:
    class Deck
    {
        private readonly List<PlayingCard> _cards = new();
        private readonly Random _random = new();
        public Deck()
        {
            for (var i = 1; i <= 10; i++)
            {
                _cards.Add(new PlayingCard(i, PlayingCardSuit.Hearts));
                _cards.Add(new PlayingCard(i, PlayingCardSuit.Clubs));
                _cards.Add(new PlayingCard(i, PlayingCardSuit.Spades));
                _cards.Add(new PlayingCard(i, PlayingCardSuit.Diamonds));
            }
        }
Next, the Draw method randomly selects a card from the _cards List, which it removes before returning to the caller:
        public PlayingCard Draw()
        {
            var index = _random.Next(_cards.Count);
            var drawnCard = _cards.ElementAt(index);
            _cards.Remove(drawnCard);
            return drawnCard;
        }
    }
The console app selects three cards using the deck's Draw method. Add the code for this as follows:
    class LinqAllAnyExamples
    {
        public static void Main()
        {
            var deck = new Deck();
            var hand = new List<PlayingCard>();

            for (var i = 0; i < 3; i++)
            {
                hand.Add(deck.Draw());
            }
To show a summary, use the OrderByDescending and Select operations to extract the user-friendly ToString description for each PlayingCard. This is then joined into a single delimited string as follows:
            var summary = string.Join(" | ",
                hand.OrderByDescending(c => c.Number)
                    .Select(c => c.ToString()));
            Console.WriteLine($"Hand: {summary}");
Using Any and All, you can summarize the cards in the hand, and the Sum of the card numbers gives the hand's score. By using Any, you determine whether any of the cards in the hand are a club (the suit is equal to PlayingCardSuit.Clubs):
            Console.WriteLine($"Any Clubs: {hand.Any(card => card.Suit == PlayingCardSuit.Clubs)}");
Similarly, Any is used to see whether any of the cards belong to the Hearts or Diamonds suits and are therefore red:
            Console.WriteLine($"Any Red: {hand.Any(card => card.Suit == PlayingCardSuit.Hearts || card.Suit == PlayingCardSuit.Diamonds)}");
In the next snippet, the All extension method returns true only if every card in the hand is a Diamond. It stops checking as soon as it finds a card that is not:
            Console.WriteLine($"All Diamonds: {hand.All(card => card.Suit == PlayingCardSuit.Diamonds)}");
All is used again to see whether all card numbers can be divided by two without a remainder, that is, whether they are even:
            Console.WriteLine($"All Even: {hand.All(card => card.Number % 2 == 0)}");
Conclude by using the Sum aggregation method to calculate the value of the cards in the hand:
            Console.WriteLine($"Score :{hand.Sum(card => card.Number)}");
        }
    }
}
Running the console app produces output like this:
Hand: 8 of Spades | 7 of Diamonds | 6 of Diamonds
Any Clubs: False
Any Red: True
All Diamonds: False
All Even: False
Score :21

The cards are randomly selected, so you will get a different hand each time you run the program. In this example, the score was 21, which would be a winning total in a game such as blackjack.

Note

You can find the code used for this example at https://packt.link/xPuTc.

Join Operations

Join operations are used to join two sources based on the association of objects in one data source with those that share a common attribute in a second data source. If you are familiar with database design, this can be thought of as a primary and foreign key relationship between tables.

A common example of a join involves a one-way relationship. For instance, an Order class might have a Product property, while the Product class has no collection property representing the Orders that reference it. By using a Join operator, you can create that backward relationship and show the Orders for each Product.

The two join extension methods are the following:

Join: Joins two sequences using a key selector to extract pairs of values.
GroupJoin: Joins two sequences using a key selector and groups the resulting items.

The following example contains three Manufacturer records, each with a unique ManufacturerId. These numeric IDs are used to define various Car records, but to save memory, you will not have a direct memory reference from Manufacturer back to Car. You will use the Join method to create an association between the Manufacturer and Car instances:

In your Chapter04Examples folder, add a new file called LinqJoinExamples.cs.
First, declare the Manufacturer and Car records as follows:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
record Manufacturer(int ManufacturerId, string Name);
record Car (string Name, int ManufacturerId);
Inside the Main entry point, create two lists, one for the manufacturers and the other to represent the cars:
LinqJoinExamples.cs
    class LinqJoinExamples
    {
        public static void Main()
        {
            var manufacturers = new List<Manufacturer>
            {
                new(1, "Ford"),
                new(2, "BMW"),
                new(3, "VW")
            };
            var cars = new List<Car>
            {
                new("Focus", 1),
                new("Galaxy", 1),
                new("GT40", 1),

You can find the complete code here: https://packt.link/Ue7Fj.

At this point, there is no direct reference, but as you know, you can use ManufacturerId to link the two together using the int IDs. You can add the following code for this:
            var joinedQuery = manufacturers.Join(
                cars,
                manufacturer => manufacturer.ManufacturerId,
                car => car.ManufacturerId,
                (manufacturer, car) => new
                {
                    ManufacturerName = manufacturer.Name,
                    CarName = car.Name
                });
            foreach (var item in joinedQuery)
            {
                Console.WriteLine($"{item}");
            }
        }
    }
}

In the preceding snippet, Join is called on the manufacturers list and passed the cars list. It is followed by two key selectors that define which property of each Manufacturer and Car record is used to match them. In this case, a manufacturer and a car are paired when manufacturer.ManufacturerId equals car.ManufacturerId.

Finally, the result selector receives each matching manufacturer and car pair and returns a new anonymous type that contains the manufacturer.Name and car.Name values.

Running the console app produces the following output:
{ ManufacturerName = Ford, CarName = Focus }
{ ManufacturerName = Ford, CarName = Galaxy }
{ ManufacturerName = Ford, CarName = GT40 }
{ ManufacturerName = BMW, CarName = 1 Series }
{ ManufacturerName = BMW, CarName = 2 Series }
{ ManufacturerName = VW, CarName = Golf }
{ ManufacturerName = VW, CarName = Polo }

As you can see, each of the Car and Manufacturer instances has been joined correctly using ManufacturerId.

The equivalent Query Expression would be as follows (note that in this case, it is a more concise format than the Query Operator syntax):
var query = from manufacturer in manufacturers
            join car in cars
              on manufacturer.ManufacturerId equals car.ManufacturerId
              select new
              {
                ManufacturerName = manufacturer.Name, CarName = car.Name
              };
foreach (var item in query)
{
  Console.WriteLine($"{item}");
}
Note
You can find the code used for this example at http://packt.link/Wh8jK.
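GroupJoin, listed earlier, is not used in this example. Unlike Join, it pairs each outer item with a sequence of all its matching inner items, including an empty sequence when nothing matches. The sketch below uses tuples in place of the Manufacturer and Car records and a trimmed car list in which VW has no cars. It shows both the method syntax and the equivalent join ... into query expression.

```csharp
using System;
using System.Linq;

// Tuples stand in for the Manufacturer and Car records; VW has no cars in this trimmed list.
var manufacturers = new[] { (Id: 1, Name: "Ford"), (Id: 2, Name: "BMW"), (Id: 3, Name: "VW") };
var cars = new[]
{
    (Name: "Focus", ManufacturerId: 1),
    (Name: "Galaxy", ManufacturerId: 1),
    (Name: "1 Series", ManufacturerId: 2)
};

// GroupJoin: each manufacturer is paired with the (possibly empty) sequence of its cars.
var carsByManufacturer = manufacturers
    .GroupJoin(
        cars,
        manufacturer => manufacturer.Id,
        car => car.ManufacturerId,
        (manufacturer, matchingCars) => new
        {
            Manufacturer = manufacturer.Name,
            Cars = matchingCars.Select(car => car.Name).ToList()
        })
    .ToList();

// prints: Ford: Focus, Galaxy / BMW: 1 Series / VW: (none), one per line
foreach (var entry in carsByManufacturer)
{
    var names = entry.Cars.Count == 0 ? "(none)" : string.Join(", ", entry.Cars);
    Console.WriteLine($"{entry.Manufacturer}: {names}");
}

// The equivalent query expression uses join ... into.
var counts = from manufacturer in manufacturers
             join car in cars on manufacturer.Id equals car.ManufacturerId into matchingCars
             select new { manufacturer.Name, Count = matchingCars.Count() };
```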

Before you finish exploring LINQ, there is one more area related to LINQ Query Expressions—the let clause.

Using a let Clause in Query Expressions

In the earlier Query Expressions, you may have noticed similar-looking code repeated in various clauses. Using a let clause, you can introduce a new range variable inside a Query Expression and reuse its value throughout the rest of the query. For example, consider the following query:

var stations = new List<string>
{
    "Kings Cross KGX",
    "Liverpool Street LVS",
    "Euston EUS",
    "New Street NST"
};
var query1 = from station in stations
             where station[^3..] == "LVS" || station[^3..] == "EUS" ||
                   station[0..^3].Trim().ToUpper().EndsWith("CROSS")
             select new { code = station[^3..], name = station[0..^3].Trim().ToUpper() };

Here, you are searching for a station with the LVS or EUS code, or a name ending in CROSS. To do this, you must extract the last three characters using a range, station[^3..]. That expression appears twice in the where clause and again in the final projection, and the expression that extracts the name is duplicated in the same way.

The station code and station names could both be converted into local variables using the let clause:

var query2 = from station in stations
             let code = station[^3..]
             let name = station[0..^3].Trim().ToUpper()
             where code == "LVS" || code == "EUS" ||
                   name.EndsWith("CROSS")
             select new { code, name };

Here, you have defined code and name using a let clause and reused them throughout the query. This code looks much neater and is also easier to follow and maintain.

Running the code produces the following output:

Station Codes:
KGX : KINGS CROSS
LVS : LIVERPOOL STREET
EUS : EUSTON
Station Codes (2):
KGX : KINGS CROSS
LVS : LIVERPOOL STREET
EUS : EUSTON

Note

You can find the code used for this example at https://packt.link/b2KiG.
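The sketch below is a self-contained version of the let query that can be run and checked. It also includes a hand-written method-syntax equivalent. That equivalent is similar in spirit to how the compiler translates each let into an intermediate Select, although the compiler's actual translation carries the values differently. ToUpperInvariant is swapped in for ToUpper here to make the upper-casing culture-independent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var stations = new List<string>
{
    "Kings Cross KGX",
    "Liverpool Street LVS",
    "Euston EUS",
    "New Street NST"
};

// Each let introduces a range variable that the where and select clauses reuse.
var withLet = from station in stations
              let code = station[^3..]                                // last three characters
              let name = station[0..^3].Trim().ToUpperInvariant()     // text before the code
              where code == "LVS" || code == "EUS" || name.EndsWith("CROSS")
              select new { code, name };

// Method-syntax equivalent: compute both values once in a Select, then filter.
var withMethods = stations
    .Select(station => new { code = station[^3..], name = station[0..^3].Trim().ToUpperInvariant() })
    .Where(s => s.code == "LVS" || s.code == "EUS" || s.name.EndsWith("CROSS"));

var results = withLet.ToList();
foreach (var item in results)
{
    Console.WriteLine($"{item.code} : {item.name}");
}
```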

By now you have seen the main parts of LINQ. You will now bring these together in an activity that filters a set of flight records based on a user's criteria and provides various statistics on the subset of flights found.

Activity 4.01: Treasury Flight Data Analysis

You have been asked to create a console app that allows the user to download publicly available flight data files and apply statistical analysis to the files. This analysis should be used to calculate a count of the total records found, along with the average, minimum, and maximum fare paid within that subset.

The user should be able to enter a number of commands and each command should add a specific filter based on the flight's class, origin, or destination properties. Once the user has entered the required criteria, the go command must be entered, and the console should run a query and output the results.

The data file you will use for this activity contains details of flights made by the UK's HM Treasury department between January 1 and December 31, 2011 (there are 714 records). You will need to use WebClient.DownloadFile to download the data from the following URL: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/245855/HMT_-_2011_Air_Data.csv

Note

The website might open differently in Internet Explorer or Google Chrome, depending on how your browser is configured. Using WebClient.DownloadFile, you can download the data as suggested.

Ideally, the program should download data once and then reread it from the local filesystem each time it is started.
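A minimal sketch of the download-once behavior follows. The EnsureLocalCopy helper is hypothetical, and the download step is passed in as a delegate. In the activity, that delegate would be WebClient.DownloadFile or an HttpClient call. Here it is replaced by a fake that writes placeholder text, so the sketch runs without a network connection.

```csharp
using System;
using System.IO;

// Hypothetical helper: downloads only when no local copy exists yet.
// Returns true if a download happened, false if an existing file was reused.
bool EnsureLocalCopy(string path, Action<string> download)
{
    if (File.Exists(path))
    {
        return false;            // reuse the file saved on an earlier run
    }
    download(path);
    return true;
}

var localPath = Path.Combine(Path.GetTempPath(), "flights-sketch.csv");
File.Delete(localPath);          // start clean for the demonstration; no error if missing

// Fake download standing in for the real one; it writes placeholder text.
int downloads = 0;
void FakeDownload(string path)
{
    downloads++;
    File.WriteAllText(path, "placeholder data" + Environment.NewLine);
}

var first = EnsureLocalCopy(localPath, FakeDownload);    // simulates the first run
var second = EnsureLocalCopy(localPath, FakeDownload);   // simulates a later run
Console.WriteLine($"First run downloaded: {first}, second run downloaded: {second}");
```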

Figure 4.6: Preview of HM Treasury traffic data in Excel

Once downloaded, the data should then be read into a suitable record structure before being added to a collection, which allows various queries to be applied. The output should show the following aggregate values for all rows that match the user's criteria:

Record count
Average fare
Minimum fare
Maximum fare

The user should be able to enter the following console commands:

Class c: Adds a class filter, where c is a flight class to search for, such as economy or Business class.
Origin o: Adds an origin filter, where o is the flight origin, such as dublin, london, or basel.
Destination d: Adds a destination filter, where d is the flight destination, such as delhi.
Clear: Clears all filters.
go: Applies the current filters.

If a user enters multiple filters of the same type, then these should be treated as an OR filter.

An enum can be used to identify the filter criteria type entered, as shown in the following line of code:

enum FilterCriteriaType {Class, Origin, Destination}

Similarly, a record can be used to store each filter type and comparison operand, as follows:

record FilterCriteria(FilterCriteriaType Filter, string Operand);

Each filter specified should be added to a List<FilterCriteria> instance. For example, if the user enters two origin filters, one for dublin and another for london, then the list should contain two objects, each representing an origin type filter.

When the user enters the go command, a query should be built that performs the following steps:

Extracts all class filter values into a list of strings (List<string>).
Extracts all origin filter values into List<string>.
Extracts all destination filter values into List<string>.
Uses the Where extension method to filter the flight records for each criteria type specified, using that type's List<string>. Each Where call uses the Contains method with a case-insensitive comparer to match the record's property against the filter values.
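The filtering rule can be sketched independently of the activity's classes. Values of the same type are ORed, because Contains matches any value in the list. Different types are ANDed, because each non-empty type adds another chained Where call. The flight rows and filter values below are hypothetical tuples, not the Treasury data or the activity's Flight record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flight rows; the real activity reads these from the Treasury CSV file.
var flights = new List<(string Class, string Origin, string Destination, decimal Fare)>
{
    ("Economy", "London", "Zurich", 250.00m),
    ("Business Class", "London", "Zurich", 440.00m),
    ("Economy", "Dublin", "Zurich", 180.00m),
    ("Economy", "London", "Delhi", 610.00m),
};

// Filter values grouped by type, as GetFiltersByType might return them.
var classes = new List<string> { "economy", "business class" };
var origins = new List<string> { "london" };
var destinations = new List<string>();   // no destination filter entered

var ignoreCase = StringComparer.InvariantCultureIgnoreCase;
IEnumerable<(string Class, string Origin, string Destination, decimal Fare)> query = flights;

// One Where per non-empty filter type: Contains ORs the values within a type,
// and the chained Where calls AND the different types together.
if (classes.Count > 0)
{
    query = query.Where(f => classes.Contains(f.Class, ignoreCase));
}
if (origins.Count > 0)
{
    query = query.Where(f => origins.Contains(f.Origin, ignoreCase));
}
if (destinations.Count > 0)
{
    query = query.Where(f => destinations.Contains(f.Destination, ignoreCase));
}

var results = query.ToList();
if (results.Count > 0)   // Average, Min, and Max throw on an empty sequence
{
    Console.WriteLine($"Count={results.Count}, Avg={results.Average(f => f.Fare):N2}, " +
                      $"Min={results.Min(f => f.Fare):N2}, Max={results.Max(f => f.Fare):N2}");
}
else
{
    Console.WriteLine("No flights match the current filters.");
}
```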

The following steps will help you complete this activity:

Create a new folder called Activities in the Chapter04 folder.
Add a new folder called Activity01 to that new folder.
Add a new class file called Flight.cs. This will be a Record class with fields that match those in the flight data. A Record class should be used as it offers a simple type purely to hold data rather than any form of behavior.
Add a new class file called FlightLoader.cs. This class will be used for downloading or importing data. FlightLoader should include a list of the field index positions within the data file, to be used when reading each line of data and splitting the contents into a string array, for example:
public const int Agency = 0;
public const int PaidFare = 1;
Now for the FlightLoader implementation, use a static class to define the index of known field positions in the data file. This will make it easier to handle any future changes in the layout of the data.
Next, a Download method should be passed a URL and destination file. Use WebClient.DownloadFile to download the data file and then defer to Import to process the downloaded file.
An Import method is to be added. This is passed the name of the local file to import (downloaded using the Download method) and will return a list of Flight records.
Add a class file called FilterCriteria.cs. This should contain a FilterCriteriaType enum definition. You will offer filters based on the flight's class, origin, and destination properties, so FilterCriteriaType should represent each of these.
Now, for the main filtering class, add a new class file called FlightQuery.cs. The constructor will be passed a FlightLoader instance. Within it, create a list named _flights to contain the data imported via FlightLoader. Create a List<FilterCriteria> instance named _filters that represents each of the criteria items added each time the user specifies a new filter condition.
At startup, the console should call FlightLoader's Import method (or Download, if there is no local copy yet) via the _loader instance, so that previously downloaded data can be processed without downloading it again.
Create a Count property that returns the number of flight records that have been imported.
When the user specifies a filter to add, the console will call AddFilter, passing an enum to define the criteria type and the string value being filtered for.
RunQuery is the main method that returns those flights that match the user's criteria. You need to use the built-in StringComparer.InvariantCultureIgnoreCase comparer to ensure string comparison ignores any case differences. Define a query variable that calls Select on the flights; at this point, it represents the full, unfiltered set of flights.
Each of the types of filter available is string-based, so you need to extract all the string items. If there are any items to filter, you add an extra Where call to the query for each type (Class, Destination, or Origin). Each Where clause uses a Contains predicate, which examines the associated property.
Next, add the two helper methods used by RunQuery. GetFiltersByType is passed a FilterCriteriaType value, uses the Where method to find the filters of that type in the list of filters, and returns their operands. For example, if the user added two Destination criteria such as India and Germany, this would result in the two strings India and Germany being returned.
FormatFilters simply joins a list of filterValues strings into a user-friendly string with the word OR between each item, such as London OR Dublin.
Now create the main console app. Add a new class file called Program.cs, which will allow the user to input requests and process their commands.
Hardcode the download URL and destination filename.
Create the main FlightQuery class, passing in a FlightLoader instance. If the app has been run before, you can Import the local flight data, or use Download if not.
Show a summary of the records imported and the available commands.
When the user enters a command, there might also be an argument, such as destination united kingdom, where destination is the command and united kingdom is the argument. To determine this, use the IndexOf method to find the location of the first space character in the input, if any.
For the go command, call RunQuery and use various aggregation operators on the results returned.
For the remaining commands, clear or add filters as requested. If the Clear command is specified, call the query's ClearFilters method, which will clear the list of criteria items.
If a class filter command is specified, call AddFilter specifying the FilterCriteriaType.Class enum and the string Argument.
The same pattern should be used for Origin and Destination commands. Call AddFilter, passing in the required enum value and the argument.
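The command-parsing step described above can be sketched as a small helper. ParseCommand is hypothetical. It splits the input at the first space using IndexOf and lower-cases the command. The switch at the end only returns descriptions that stand in for the real FlightQuery calls.

```csharp
using System;

// Hypothetical helper: splits input such as "destination united kingdom" at the
// first space into a lower-cased command and the remaining argument text.
(string Command, string Argument) ParseCommand(string input)
{
    var text = input.Trim();
    var space = text.IndexOf(' ');
    if (space < 0)
    {
        return (text.ToLowerInvariant(), string.Empty);   // e.g. "go" or "clear"
    }
    return (text[..space].ToLowerInvariant(), text[(space + 1)..].Trim());
}

foreach (var input in new[] { "go", "destination united kingdom", "class  Business Class" })
{
    var (command, argument) = ParseCommand(input);
    Console.WriteLine($"Command='{command}', Argument='{argument}'");
}

// Dispatching on the command; the descriptions stand in for the real FlightQuery calls.
var (cmd, arg) = ParseCommand("origin London");
var action = cmd switch
{
    "go" => "run query",
    "clear" => "clear filters",
    "class" or "origin" or "destination" => $"add {cmd} filter '{arg}'",
    _ => "unknown command"
};
Console.WriteLine(action);
```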

The console output should be similar to the following, here listing the commands available to the user:

Commands: go | clear | class value | origin value | destination value

The user should be able to add two class filters, for economy or Business Class (all string comparisons should be case-insensitive), as shown in the following snippet:
Enter a command:class economy
Added filter: Class=economy
Enter a command:class Business Class
Added filter: Class=business class
Similarly, the user should be able to add an origin filter as follows (this example is for london):
Enter a command:origin london
Added filter: Origin=london
Adding the destination filter should look like this (this example is for zurich):
Enter a command:destination zurich
Added filter: Destination=zurich
Entering go should show a summary of all filters specified, followed by the results for flights that match the filters:
Enter a command:go
Classes: economy OR business class
Destinations: zurich
Origins: london
Results: Count=16, Avg=266.92, Min=-74.71, Max=443.49
Note
The solution to this activity can be found at https://packt.link/qclbF.

Summary

In this chapter, you saw how the IEnumerable and ICollection interfaces form the basis of .NET data structures, and how they can be used to store multiple items. You created different types of collections depending on how each collection is meant to be used. You learned that the List collection is the most widely used for storing items, particularly if the number of elements is not known at compile time. You saw that the Stack and Queue types allow the order of items to be handled in a controlled manner, that HashSet offers set-based processing, and that Dictionary stores values that are looked up by a unique key.

You then further explored data structures by using LINQ Query Expressions and Query Operators to apply queries to data, showing how queries can be altered at runtime depending on filtering requirements. You sorted and partitioned data and saw how similar operations can be achieved using both Query Operators and Query Expressions, each offering different advantages depending on the context.

In the next chapter, you will see how parallel and asynchronous code can be used to run complex or long-running operations concurrently.
