Overview
In this chapter, you will learn about the main collections and their primary usage in C#. You will then see how Language-Integrated Query (LINQ) can be used to query collections in memory using code that is efficient and succinct. By the end of this chapter, you will be well versed in using LINQ for operations such as sorting, filtering, and aggregating data.
Throughout the previous chapters, you have used variables that refer to a single value, such as the string and double system types, system class instances, and your own class instances. .NET has a variety of data structures that can be used to store multiple values. These structures are generally referred to as collections. This chapter builds on this concept by introducing collection types from the System.Collections.Generic namespace.
You can create variables that can store multiple object references using collection types. Such collections include lists that resize to accommodate the number of elements and dictionaries that offer access to the elements using a unique key as an identifier. For example, you may need to store a list of international dialing codes using the codes as unique identifiers. In this case, you need to be certain that the same dialing code is not added to the collection twice.
These collections are instantiated like any other classes and are used extensively in most applications. Choosing the correct type of collection depends primarily on how you intend to add items and the way you would like to access such items once they are in a collection. The commonly used collection types include List, Queue, Stack, HashSet, and Dictionary, which you will cover in detail shortly.
LINQ is a technology that offers an expressive and concise syntax for querying objects. Much of the complexity around filtering, sorting, and grouping objects can be removed using the SQL-like language or, if you prefer, a set of extension methods that can be chained together to produce collections that can be enumerated with ease.
.NET provides various types of built-in data structures, such as the Array, List, and Dictionary types. At the heart of all data structures are the IEnumerable and ICollection interfaces. Classes that implement these interfaces offer a way to enumerate through the individual elements and to manipulate their items. There is rarely a need to create your own classes that implement these interfaces directly, as all the required functionality is covered by the built-in collection types, but it is worth knowing the key members as they are heavily used throughout .NET.
The generic version of each collection type requires a single type parameter, which defines the type of elements that can be added to a collection, using the standard <T> syntax of the generic types.
The IEnumerable<T> interface has a single method, IEnumerator<T> GetEnumerator(). This method returns a type that provides members that allow the caller to iterate through the elements in the collection. You rarely need to call the GetEnumerator() method directly, as the compiler will call it whenever you use a foreach statement, such as foreach(var book in books). You will learn more about using this in the upcoming sections.
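To make the relationship between foreach and GetEnumerator() concrete, the following is a minimal sketch written with top-level statements. The books list and its contents are hypothetical sample data, and the second loop is only an approximation of what the compiler generates for a foreach statement.

```csharp
using System;
using System.Collections.Generic;

var books = new List<string> { "Dune", "Emma", "Ulysses" };

// The usual form: the compiler obtains and drives the enumerator for you.
var viaForeach = new List<string>();
foreach (var book in books)
{
    viaForeach.Add(book);
}

// Roughly what the foreach statement above expands to.
var viaEnumerator = new List<string>();
using (IEnumerator<string> enumerator = books.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        viaEnumerator.Add(enumerator.Current);
    }
}

Console.WriteLine(string.Join(", ", viaForeach));     // Dune, Emma, Ulysses
Console.WriteLine(string.Join(", ", viaEnumerator));  // Dune, Emma, Ulysses
```

Both loops visit the same elements in the same order, which is why the foreach form is almost always preferred.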
The ICollection interface has the following properties:
IEnumerable and ICollection are interfaces that all collections implement:
There are further interfaces that some collections implement, depending on how elements are accessed within a collection.
The IList interface is used for collections that can be accessed by index position, starting from zero. So, for a list that contains two items, Red and Blue, the element at index zero is Red and the element at index one is Blue.
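The index-based access described above can be sketched as follows. This is an illustrative snippet only; the colors variable is hypothetical, and a List<string> is used as the concrete IList<string> implementation.

```csharp
using System;
using System.Collections.Generic;

// Any IList<T> implementation supports zero-based index access.
IList<string> colors = new List<string> { "Red", "Blue" };

var first = colors[0];                    // "Red"
var second = colors[1];                   // "Blue"

colors.Insert(1, "Green");                // Red, Green, Blue
var blueIndex = colors.IndexOf("Blue");   // 2, as Blue moved along by one
colors.RemoveAt(0);                       // Green, Blue
colors[0] = "Yellow";                     // Yellow, Blue

Console.WriteLine($"{first}, {second}, Blue was at index {blueIndex}");
Console.WriteLine(string.Join(", ", colors));   // Yellow, Blue
```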
The IList interface has the following properties:
You have now seen the primary interfaces common to collections. Next, you will take a look at the main collection types that are available and how they are used.
The List<T> type is one of the most extensively used collections in C#. It is used where you have a collection of items and want to control the order of items using their index position. It implements the IList interface, which allows items to be inserted, accessed, or removed using an index position:
Lists have the following behavior:
One example of a list might be the tabs in a web browser application. Typically, a user may want to drag a browser tab amongst other tabs, open new tabs at the end, or close tabs anywhere in a list of tabs. The code to control these actions can be implemented using List.
Internally, List maintains an array to store its objects. This can be efficient when adding items to the end, but it may be inefficient when inserting items, particularly near the beginning of the list, as every item after the insertion point must be shifted along by one position.
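The effect of the internal array can be observed through the Capacity property. The following is a rough sketch rather than a benchmark: the numbers list is hypothetical, and the exact capacity values are an implementation detail, so the code only counts how often the capacity changes.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int>();
var capacityChanges = 0;
var lastCapacity = numbers.Capacity;

for (var i = 0; i < 100; i++)
{
    numbers.Add(i);   // appending only reallocates when the internal array is full
    if (numbers.Capacity != lastCapacity)
    {
        capacityChanges++;
        lastCapacity = numbers.Capacity;
    }
}

Console.WriteLine($"Count={numbers.Count}, Capacity={numbers.Capacity}, resized {capacityChanges} time(s)");

// Inserting at the start moves every existing element along by one slot.
numbers.Insert(0, -1);
Console.WriteLine($"First={numbers[0]}, second={numbers[1]}, last={numbers[numbers.Count - 1]}");
```

If the final size is known in advance, passing it to the List<T>(int capacity) constructor avoids the intermediate resizes.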
The following example shows how the generic List class is used. The code uses List<string>, with string as the type argument, so only string values can be added to the list. Attempts to add any other type will result in a compiler error. The example shows various commonly used methods of the List class.
sourceChapter04>dotnet new console -o Chapter04
The template "Console Application" was created successfully.
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
class ListExamples
{
public static void Main()
{
var colors = new List<string> {"red", "green"};
colors.Add("orange");
The code declares the new colors variable, which can store multiple color names as strings. Here, the collection initialization syntax is used so that red and green are added as part of the initialization of the variable. The Add method is called, adding orange to the list.
colors.AddRange(new [] {"yellow", "pink"});
Console.WriteLine($"Colors has {colors.Count} items");
Console.WriteLine($"Item at index 1 is {colors[1]}");
Running the code produces the following output:
Colors has 5 items
Item at index 1 is green
Console.WriteLine("Inserting blue at 0");
colors.Insert(0, "blue");
Console.WriteLine($"Item at index 1 is now {colors[1]}");
You should see the following output on running this code:
Inserting blue at 0
Item at index 1 is now red
Console.WriteLine("foreach");
foreach (var color in colors)
Console.Write($"{color}|");
Console.WriteLine();
You should get the following output:
foreach
blue|red|green|orange|yellow|pink|
Console.WriteLine("ForEach Action:");
colors.ForEach(color =>
{
var characters = color.ToCharArray();
Array.Reverse(characters);
var reversed = new string(characters);
Console.Write($"{reversed}|");
});
Console.WriteLine();
This does not affect any of the values in the colors List, as characters refers to a different object. Note that foreach iterates through each string, whereas ForEach defines an Action delegate to be invoked using each string (recall that in Chapter 3, Delegates, Events, and Lambdas, you saw how lambda statements can be used to create Action delegates).
ForEach Action:
eulb|der|neerg|egnaro|wolley|knip|
var backupColors = new List<string>(colors);
backupColors.Sort();
Although string is a reference type, strings are immutable, so they behave much like values here: the backupColors list has its own set of elements, and replacing a string in one list will not affect the other list. For mutable classes, passing a list of instances to the constructor will still create a new list, with independent element indexes, but each element will point to the same shared object in memory rather than an independent copy.
Console.WriteLine("Foreach before clearing:");
foreach (var color in colors)
Console.Write($"{color}|");
Console.WriteLine();
colors.Clear();
Console.WriteLine($"Colors has {colors.Count} items");
Running the code produces this output:
Foreach before clearing:
blue|red|green|orange|yellow|pink|
Colors has 0 items
colors.AddRange(backupColors);
Console.WriteLine("foreach after addrange (sorted items):");
foreach (var color in colors)
Console.Write($"{color}|");
Console.WriteLine();
You should see the following output:
foreach after addrange (sorted items):
blue|green|orange|pink|red|yellow|
var indexes = colors.ConvertAll(color => $"{color} is at index {colors.IndexOf(color)}");
Console.WriteLine("ConvertAll:");
Console.WriteLine(string.Join(Environment.NewLine, indexes));
Here, a new List<string> is returned with each item being formatted using its value and the item's index in the list. As expected, running the code produces this output:
ConvertAll:
blue is at index 0
green is at index 1
orange is at index 2
pink is at index 3
red is at index 4
yellow is at index 5
Console.WriteLine($"Contains RED: {colors.Contains("RED")}");
Console.WriteLine($"Contains red: {colors.Contains("red")}");
Note that the comparison is case-sensitive: the uppercase RED is not in the list, but the lowercase red is. Running the code produces this output:
Contains RED: False
Contains red: True
var existsInk = colors.Exists(color => color.EndsWith("ink"));
Console.WriteLine($"Exists *ink: {existsInk}");
Here, the Exists method is passed a Predicate delegate, a built-in delegate type that takes an item and returns a Boolean value: true if the test condition is met and false otherwise. In this case, true will be returned if any item exists where the string value ends with the letters ink (pink, for example).
You should see the following output:
Exists *ink: True
Console.WriteLine("Inserting reds");
colors.InsertRange(0, new [] {"red", "red"});
foreach (var color in colors)
Console.Write($"{color}|");
Console.WriteLine();
You will get the following output:
Inserting reds
red|red|blue|green|orange|pink|red|yellow|
This shows that it is possible to insert the same item more than once into a list.
var allReds = colors.FindAll(color => color == "red");
Console.WriteLine($"Found {allReds.Count} red");
You should get an output as follows. As expected, there are three red items returned:
Found 3 red
colors.Remove("red");
var lastRedIndex = colors.FindLastIndex(color => color == "red");
Console.WriteLine($"Last red found at index {lastRedIndex}");
Console.ReadLine();
}
}
}
Running the code produces this output:
Last red found at index 5
Note
You can find the code used for this example at https://packt.link/dLbK6.
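The difference between copying a list of strings and copying a list of mutable objects, as discussed for the backupColors list, can be sketched as follows. StringBuilder is used here purely as a convenient mutable reference type; the variable names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Strings: replacing an element in the copy leaves the source list untouched.
var names = new List<string> { "red", "green" };
var namesCopy = new List<string>(names);
namesCopy[0] = "crimson";
Console.WriteLine(names[0]);          // red

// Mutable objects: both lists refer to the same instance.
var builders = new List<StringBuilder> { new StringBuilder("red") };
var buildersCopy = new List<StringBuilder>(builders);
buildersCopy[0].Append("dish");       // changes the one shared object
Console.WriteLine(builders[0]);       // reddish

// The lists themselves remain independent of each other.
buildersCopy.Add(new StringBuilder("blue"));
Console.WriteLine(builders.Count);    // 1
```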
With the knowledge of how the generic List class is used, it is time for you to work on an exercise.
At the beginning of the chapter, web browser tabs were described as an ideal example of lists. In this exercise, you will put this idea into action, and create a class that controls the navigation of the tabs within an app that mimics a web browser.
For this, you will create a Tab class and a TabController app that allows new tabs to be opened and existing tabs to be closed or moved. The following steps will help you complete this exercise:
using System;
using System.Collections;
using System.Collections.Generic;
namespace Chapter04.Exercises.Exercise01
{
public class Tab
{
public Tab()
{}
public Tab(string url) => (Url) = (url);
public string Url { get; set; }
public override string ToString() => Url;
}
Here, the ToString method has been overridden to return the current URL to help when logging details to the console.
public class TabController : IEnumerable<Tab>
{
private readonly List<Tab> _tabs = new();
The TabController class contains a List of tabs. Notice how the class implements the IEnumerable<Tab> interface. This interface is used so that the class provides a way to iterate through its items using a foreach statement. In the next steps, you will provide methods to open, move, and close tabs, which will directly control the order of items in the _tabs list. Note that you could have exposed the _tabs list directly to callers, but it is preferable to limit access to the tabs through your own methods. Hence, it is defined as a private field; the readonly modifier only prevents the field from being reassigned to a different list.
public Tab OpenNew(string url)
{
var tab = new Tab(url);
_tabs.Add(tab);
Console.WriteLine($"OpenNew {tab}");
return tab;
}
public void Close(Tab tab)
{
if (_tabs.Remove(tab))
{
Console.WriteLine($"Removed {tab}");
}
}
public void MoveToStart(Tab tab)
{
if (_tabs.Remove(tab))
{
_tabs.Insert(0, tab);
Console.WriteLine($"Moved {tab} to start");
}
}
Here, MoveToStart will try to remove the tab and then insert it at index 0.
public void MoveToEnd(Tab tab)
{
if (_tabs.Remove(tab))
{
_tabs.Add(tab);
Console.WriteLine($"Moved {tab} to end. Index={_tabs.IndexOf(tab)}");
}
}
Here, calling MoveToEnd removes the tab first, and then adds it to the end, logging the new index position to the console.
Finally, the IEnumerable<Tab> interface requires that you implement two methods, IEnumerator<Tab> GetEnumerator() and the non-generic IEnumerable.GetEnumerator(). The first allows the caller to iterate through the collection as strongly typed Tab items, and the second iterates via the object type. The second method is a throwback to earlier versions of C# but is needed for compatibility.
public IEnumerator<Tab> GetEnumerator() => _tabs.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => _tabs.GetEnumerator();
}
static class Program
{
public static void Main()
{
var controller = new TabController();
Console.WriteLine("Opening tabs...");
var packt = controller.OpenNew("packtpub.com");
var msoft = controller.OpenNew("microsoft.com");
var amazon = controller.OpenNew("amazon.com");
controller.LogTabs();
Console.WriteLine("Moving...");
controller.MoveToStart(amazon);
controller.MoveToEnd(packt);
controller.LogTabs();
Console.WriteLine("Closing tab...");
controller.Close(msoft);
controller.LogTabs();
Console.ReadLine();
}
private static void LogTabs(this IEnumerable<Tab> tabs)
{
Console.Write("TABS: |");
foreach(var tab in tabs)
Console.Write($"{tab.Url.PadRight(15)}|");
Console.WriteLine();
}
}
}
Opening tabs...
OpenNew packtpub.com
OpenNew microsoft.com
OpenNew amazon.com
TABS: |packtpub.com   |microsoft.com  |amazon.com     |
Moving...
Moved amazon.com to start
Moved packtpub.com to end. Index=2
TABS: |amazon.com     |microsoft.com  |packtpub.com   |
Closing tab...
Removed microsoft.com
TABS: |amazon.com     |packtpub.com   |
Note
Sometimes Visual Studio might report a non-nullable property error the first time you execute the program. This is a helpful reminder that you are attempting to use a string value that may have a null value at runtime.
The three tabs are opened. amazon.com and packtpub.com are then moved before microsoft.com is finally closed and removed from the tab list.
Note
You can find the code used for this exercise at https://packt.link/iUcIs.
In this exercise, you have seen how lists can be used to store multiple items of the same type while maintaining the order of items. The next section covers the Queue and Stack classes, which allow items to be added and removed in a predefined sequence.
The Queue class provides a first-in, first-out mechanism. Items are added to the end of the queue using the Enqueue method and are removed from the front of the queue using the Dequeue method. Items in the queue cannot be accessed via an index position.
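The first-in, first-out behavior can be seen in a few lines. This is a minimal sketch with hypothetical job names; it also uses Peek, which returns the item at the front of the queue without removing it.

```csharp
using System;
using System.Collections.Generic;

var jobs = new Queue<string>();
jobs.Enqueue("first");
jobs.Enqueue("second");
jobs.Enqueue("third");

var next = jobs.Peek();        // "first", which stays in the queue
var served = jobs.Dequeue();   // "first", now removed
var remaining = jobs.Count;    // 2

// Drain the rest; TryDequeue returns false once the queue is empty.
var order = new List<string>();
while (jobs.TryDequeue(out var job))
{
    order.Add(job);
}

Console.WriteLine($"Peeked {next}, served {served}, {remaining} left");
Console.WriteLine(string.Join(", ", order));   // second, third
```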
Queues are typically used when you need a workflow that ensures items are processed in the order in which they are added to the queue. A typical example might be a busy online ticketing system selling a limited number of concert tickets to customers. To ensure fairness, customers are added to a queuing system as soon as they log on. The system would then dequeue each customer and process each order, in full, either until all tickets have been sold or the customer queue is empty.
The following example creates a queue containing five CustomerOrder records. When it is time to process the orders, each order is dequeued using the TryDequeue method, which will return true until all orders have been processed. The customer orders are processed in the order that they were added. If the number of tickets requested is less than or equal to the number of tickets remaining, then the customer is shown a success message. An apology message is shown if the number of tickets remaining is less than the requested amount.
Perform the following steps to complete this example:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
class QueueExamples
{
record CustomerOrder (string Name, int TicketsRequested)
{}
public static void Main()
{
var ticketsAvailable = 10;
var customers = new Queue<CustomerOrder>();
customers.Enqueue(new CustomerOrder("Dave", 2));
customers.Enqueue(new CustomerOrder("Siva", 4));
customers.Enqueue(new CustomerOrder("Julien", 3));
customers.Enqueue(new CustomerOrder("Kane", 2));
customers.Enqueue(new CustomerOrder("Ann", 1));
// Start processing orders...
while(customers.TryDequeue(out CustomerOrder nextOrder))
{
if (nextOrder.TicketsRequested <= ticketsAvailable)
{
ticketsAvailable -= nextOrder.TicketsRequested;
Console.WriteLine($"Congratulations {nextOrder.Name}, you've purchased {nextOrder.TicketsRequested} ticket(s)");
}
else
{
Console.WriteLine($"Sorry {nextOrder.Name}, cannot fulfil {nextOrder.TicketsRequested} ticket(s)");
}
}
Console.WriteLine($"Finished. Available={ticketsAvailable}");
Console.ReadLine();
}
}
}
Congratulations Dave, you've purchased 2 ticket(s)
Congratulations Siva, you've purchased 4 ticket(s)
Congratulations Julien, you've purchased 3 ticket(s)
Sorry Kane, cannot fulfil 2 ticket(s)
Congratulations Ann, you've purchased 1 ticket(s)
Finished. Available=0
Note
The first time you run this program, Visual Studio might show a non-nullable type error. This error is a reminder that you are using a variable that could be a null value.
The output shows that Dave requested two tickets. As there are two or more tickets available, he was successful. Both Siva and Julien were also successful, but by the time Kane placed his order of two tickets, there was only one ticket available, so he was shown the apology message. Finally, Ann requested one ticket and was successful in her order.
Note
You can find the code used for this example at https://packt.link/Zb524.
The Stack class provides the opposite mechanism to the Queue class; items are processed in last-in, first-out order. As with the Queue class, you cannot access elements via their index position. Items are added to the stack using the Push method and removed using the Pop method.
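As a quick illustration of last-in, first-out order, consider the following sketch. The history variable and page names are hypothetical; Peek is used to look at the top item without removing it.

```csharp
using System;
using System.Collections.Generic;

var history = new Stack<string>();
history.Push("page1");
history.Push("page2");
history.Push("page3");

var top = history.Peek();     // "page3", which stays on the stack
var popped = history.Pop();   // "page3", now removed

// Enumerating a stack starts at the most recently pushed item.
var remaining = string.Join(", ", history);

Console.WriteLine($"Peeked {top}, popped {popped}");
Console.WriteLine(remaining);   // page2, page1
```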
An application's Undo menu can be implemented using a stack. For example, in a word processor, each time the user edits a document, an Action delegate that can reverse that change is pushed onto the stack. Whenever the user presses Ctrl + Z, the most recent action is popped off the stack and invoked, and the change is undone. This allows multiple steps to be undone.
The following example shows this in practice.
You will start by creating an UndoStack class that supports multiple undo operations. The caller decides what action should run each time the Undo request is called.
A typical undoable operation would be storing a copy of text prior to the user adding a word. Another undoable operation would be storing a copy of the current font prior to a new font being applied. You can start by adding the following code, where you are creating the UndoStack class and defining a readonly Stack of Action delegates, named _undoStack:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
class UndoStack
{
private readonly Stack<Action> _undoStack = new Stack<Action>();
public void Do(Action action)
{
_undoStack.Push(action);
}
public void Undo()
{
if (_undoStack.Count > 0)
{
var undo = _undoStack.Pop();
undo?.Invoke();
}
}
}
class TextEditor
{
private readonly UndoStack _undoStack;
public TextEditor(UndoStack undoStack)
{
_undoStack = undoStack;
}
public string Text {get; private set; }
public void EditText(string newText)
{
var previousText = Text;
_undoStack.Do( () =>
{
Text = previousText;
Console.Write($"Undo:'{newText}'".PadRight(40));
Console.WriteLine($"Text='{Text}'");
});
Text += newText;
Console.Write($"Edit:'{newText}'".PadRight(40));
Console.WriteLine($"Text='{Text}'");
}
}
class StackExamples
{
public static void Main()
{
var undoStack = new UndoStack();
var editor = new TextEditor(undoStack);
editor.EditText("One day, ");
editor.EditText("in a ");
editor.EditText("city ");
editor.EditText("near by ");
undoStack.Undo(); // remove 'near by'
undoStack.Undo(); // remove 'city'
editor.EditText("land ");
editor.EditText("far far away ");
Console.ReadLine();
}
}
}
Edit:'One day, '                        Text='One day, '
Edit:'in a '                            Text='One day, in a '
Edit:'city '                            Text='One day, in a city '
Edit:'near by '                         Text='One day, in a city near by '
Undo:'near by '                         Text='One day, in a city '
Undo:'city '                            Text='One day, in a '
Edit:'land '                            Text='One day, in a land '
Edit:'far far away '                    Text='One day, in a land far far away '
Note
Visual Studio may show a non-nullable property error the first time the code is executed. This is because Visual Studio notices that the Text property can be a null value at runtime, so it offers a suggestion to improve the code.
The left-hand side of the output shows the text edits and undo operations as they are applied, with the resulting Text value on the right-hand side. The two Undo calls result in near by and city being removed from the Text value, before land and far far away are finally added to the Text value.
Note
You can find the code used for this example at https://packt.link/tLVyf.
The HashSet class provides mathematical set operations with collections of objects in an efficient and highly performant manner. HashSet does not allow duplicate elements and items are not stored in any particular order. Using the HashSet class is ideal for high-performance operations, such as needing to quickly find where two collections of objects overlap.
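The no-duplicates rule and a couple of the set tests can be sketched as follows. The country codes are hypothetical sample data; note that Add returns false, rather than throwing, when the item is already present.

```csharp
using System;
using System.Collections.Generic;

var codes = new HashSet<string>();
var addedFirst = codes.Add("GBR");   // true: newly added
var addedAgain = codes.Add("GBR");   // false: already present, set unchanged
codes.Add("FRA");

var overlaps = codes.Overlaps(new[] { "FRA", "DEU" });           // true: FRA is shared
var isSubset = codes.IsSubsetOf(new[] { "GBR", "FRA", "DEU" });  // true
var hasLower = codes.Contains("gbr");                            // false: case-sensitive by default

Console.WriteLine($"Count={codes.Count}, addedFirst={addedFirst}, addedAgain={addedAgain}");
Console.WriteLine($"Overlaps={overlaps}, IsSubsetOf={isSubset}, Contains gbr={hasLower}");
```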
Typically, HashSet is used with the following operations:
HashSet is useful when you need to include or exclude certain elements from collections. As an example, consider that an agent manages various celebrities and has been asked to find three sets of stars: those who can act or sing, those who can act and sing, and those who can act only.
In the following snippet, lists of actors' and singers' names are created:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
class HashSetExamples
{
public static void Main()
{
var actors = new List<string> {"Harrison Ford", "Will Smith",
"Sigourney Weaver"};
var singers = new List<string> {"Will Smith", "Adele"};
var actingOrSinging = new HashSet<string>(singers);
actingOrSinging.UnionWith(actors);
Console.WriteLine($"Acting or Singing: {string.Join(", ", actingOrSinging)}");
var actingAndSinging = new HashSet<string>(singers);
actingAndSinging.IntersectWith(actors);
Console.WriteLine($"Acting and Singing: {string.Join(", ", actingAndSinging)}");
var actingOnly = new HashSet<string>(actors);
actingOnly.ExceptWith(singers);
Console.WriteLine($"Acting Only: {string.Join(", ", actingOnly)}");
Console.ReadLine();
}
}
}
Acting or Singing: Will Smith, Adele, Harrison Ford, Sigourney Weaver
Acting and Singing: Will Smith
Acting Only: Harrison Ford, Sigourney Weaver
From the output, you can see that out of the given list of actors and singers, only Will Smith can act and sing.
Note
You can find the code used for this example at https://packt.link/ZdNbS.
Another commonly used collection type is the generic Dictionary<TKey, TValue>. This allows multiple items to be added, but a unique key is needed to identify an item instance.
Dictionaries are commonly used to look up values using known keys. The key and value type parameters can be of any type. A value can exist in a Dictionary more than once, provided that its key is unique. Attempting to add a key that already exists using the Add method will result in a runtime exception (an ArgumentException) being thrown.
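The key rules described above can be sketched in a few lines. The dialingCodes dictionary and its entries are hypothetical sample data chosen to show a repeated value under different keys.

```csharp
using System;
using System.Collections.Generic;

var dialingCodes = new Dictionary<string, string>
{
    ["USA"] = "+1",
    ["CAN"] = "+1",    // the same value under a different key is allowed
    ["GBR"] = "+44"
};

// Add throws if the key is already present.
var duplicateRejected = false;
try
{
    dialingCodes.Add("GBR", "+44");
}
catch (ArgumentException)
{
    duplicateRejected = true;
}

var added = dialingCodes.TryAdd("GBR", "+99");                 // false: existing value is kept
var found = dialingCodes.TryGetValue("FRA", out var france);   // false: no such key
dialingCodes["GBR"] = "+44 (UK)";                              // the indexer overwrites silently

Console.WriteLine($"Rejected={duplicateRejected}, TryAdd={added}, Found FRA={found}");
Console.WriteLine($"GBR={dialingCodes["GBR"]}, Count={dialingCodes.Count}");
```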
A common example of a Dictionary might be a registry of known countries that are keyed by their ISO country code. A customer service application may load customer details from a database and then use the ISO code to look up the customer's country from the country list, rather than having the extra overhead of creating a new country instance for each customer.
Note
You can find more information on standard ISO country codes at https://www.iso.org/iso-3166-country-codes.html.
The main methods used in the Dictionary class are as follows:
The following code shows how a Dictionary can be used to add and navigate Country records:
using System;
using System.Collections.Generic;
namespace Chapter04.Examples
{
public record Country(string Name)
{}
class DictionaryExamples
{
public static void Main()
{
var countries = new Dictionary<string, Country>
{
{"AFG", new Country("Afghanistan")},
{"ALB", new Country("Albania")},
{"DZA", new Country("Algeria")},
{"ASM", new Country("American Samoa")},
{"AND", new Country("Andorra")}
};
Console.WriteLine("Enumerate foreach KeyValuePair");
foreach (var kvp in countries)
{
Console.WriteLine($" {kvp.Key} = {kvp.Value.Name}");
}
Enumerate foreach KeyValuePair
AFG = Afghanistan
ALB = Albania
DZA = Algeria
ASM = American Samoa
AND = Andorra
Console.WriteLine("set indexer AFG to new value");
countries["AFG"] = new Country("AFGHANISTAN");
Console.WriteLine($"get indexer AFG: {countries["AFG"].Name}");
set indexer AFG to new value
get indexer AFG: AFGHANISTAN
Console.WriteLine($"ContainsKey {"AGO"}: {countries.ContainsKey("AGO")}");
Console.WriteLine($"ContainsKey {"and"}: {countries.ContainsKey("and")}"); // Case sensitive
ContainsKey AGO: False
ContainsKey and: False
var anguilla = new Country("Anguilla");
Console.WriteLine($"Add {anguilla}...");
countries.Add("AIA", anguilla);
try
{
var anguillaCopy = new Country("Anguilla");
Console.WriteLine($"Adding {anguillaCopy}...");
countries.Add("AIA", anguillaCopy);
}
catch (Exception e)
{
Console.WriteLine($"Caught {e.Message}");
}
var addedAIA = countries.TryAdd("AIA", new Country("Anguilla"));
Console.WriteLine($"TryAdd AIA: {addedAIA}");
Add Country { Name = Anguilla }...
Adding Country { Name = Anguilla }...
Caught An item with the same key has already been added. Key: AIA
TryAdd AIA: False
var tryGet = countries.TryGetValue("ALB", out Country albania1);
Console.WriteLine($"TryGetValue for ALB: {albania1} Result={tryGet}");
countries.TryGetValue("alb", out Country albania2);
Console.WriteLine($"TryGetValue for ALB: {albania2}");
}
}
}
TryGetValue for ALB: Country { Name = Albania } Result=True
TryGetValue for ALB:
Note
Visual Studio might report the following warning: Warning CS8600: Converting null literal or possible null value to non-nullable type. This is a reminder from Visual Studio that a variable may have a null value at runtime.
You have seen how the Dictionary class is used to ensure that only unique identities are associated with values. Even if you do not know which keys are in the Dictionary until runtime, you can use the TryGetValue and TryAdd methods to prevent runtime exceptions.
Note
You can find the code used for this example at https://packt.link/vzHUb.
In this example, a string key was used for the Dictionary. However, any type can be used as a key. You will often find that an integer value is used as a key when source data is retrieved from relational databases, as integers can often be more efficient in memory than strings. Now it is time to use this feature through an exercise.
You have been asked to create a console app that asks the user to enter a sentence. The console should then split the input into individual words (using a space character as a word delimiter) and count the number of times that each word occurs. If possible, simple forms of punctuation should be removed from the output, and you are to ignore capitalization so that, for example, Apple and apple both appear as a single word.
This is an ideal use of a Dictionary. The Dictionary will use a string as the key (a unique entry for each word) with an int value to count the words. You will use string.Split() to split a sentence into words, and char.IsPunctuation to remove any trailing punctuation marks.
Perform the following steps to do so:
using System;
using System.Collections.Generic;
namespace Chapter04.Exercises.Exercise02
{
static class WordCounter
{
public static IEnumerable<KeyValuePair<string, int>> Process( string phrase)
{
var wordCounts = new Dictionary<string, int>();
The method is passed a phrase and returns IEnumerable<KeyValuePair<string, int>>, which allows the caller to enumerate through a Dictionary of results. After this definition, the wordCounts Dictionary is created, keyed by a string (each word found) with an int value (the number of times that the word occurs).
var words = phrase.ToLower().Split(' ', StringSplitOptions.RemoveEmptyEntries);
foreach(var word in words)
{
var key = word;
if (char.IsPunctuation(key[key.Length-1]))
{
key = key.Remove(key.Length-1);
}
The char.IsPunctuation method is used to remove punctuation marks from the end of the word.
if (wordCounts.TryGetValue(key, out var count))
{
wordCounts[key] = count + 1;
}
else
{
wordCounts.Add(key, 1);
}
}
If the word already exists in the Dictionary, its count is incremented by one. If the word does not exist, a new word key is added with a starting value of 1.
return wordCounts;
}
}
class Program
{
public static void Main()
{
string input;
do
{
Console.Write("Enter a phrase:");
input = Console.ReadLine();
The do loop will end once the user enters an empty string; you will add the code for this in an upcoming step.
if (!string.IsNullOrEmpty(input))
{
var countsByWord = WordCounter.Process(input);
var i = 0;
foreach (var (key, value) in countsByWord)
{
Console.Write($"{key.PadLeft(20)}={value} ");
i++;
if (i % 3 == 0)
{
Console.WriteLine();
}
}
Console.WriteLine();
A new line is started after every third word (using i % 3 == 0) for improved output formatting.
}
} while (!string.IsNullOrEmpty(input));
}
}
}
Enter a phrase: Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived, and so dedicated, can long endure.
four=1 score=1 and=3
seven=1 years=1 ago=1
our=1 fathers=1 brought=1
forth=1 upon=1 this=1
continent=1 a=2 new=1
nation=3 conceived=2 in=2
liberty=1 dedicated=2 to=1
the=1 proposition=1 that=2
all=1 men=1 are=2
created=1 equal=1 now=1
we=1 engaged=1 great=1
civil=1 war=1 testing=1
whether=1 or=1 any=1
so=2 can=1 long=1
endure=1
Note
You can search online for The Gettysburg Address or visit https://rmc.library.cornell.edu/gettysburg/good_cause/transcript.htm.
From the results, you can see that each word is displayed only once and that certain words, such as and and that, appear more than once in the speech. The words are listed in the order they appear in the text, but this is not always the case with the Dictionary class. It should be assumed that the order will not remain fixed this way; dictionaries' values should be accessed using a key.
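Because the enumeration order of a Dictionary should not be relied upon, results that need a particular order can be sorted explicitly. The following is a minimal sketch using LINQ extension methods, which are introduced later in this chapter; the wordCounts contents are a hypothetical subset of the output above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var wordCounts = new Dictionary<string, int>
{
    ["four"] = 1,
    ["nation"] = 3,
    ["a"] = 2,
    ["and"] = 3
};

// Highest count first, then alphabetically, regardless of insertion order.
var ranked = wordCounts
    .OrderByDescending(pair => pair.Value)
    .ThenBy(pair => pair.Key, StringComparer.Ordinal)
    .Select(pair => $"{pair.Key}={pair.Value}")
    .ToList();

Console.WriteLine(string.Join(", ", ranked));   // and=3, nation=3, a=2, four=1
```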
Note
You can find the code used for this exercise at https://packt.link/Dnw4a.
So far, you have learned about the main collections commonly used in .NET. It is now time to look at LINQ, which makes extensive use of collections based on the IEnumerable interface.
LINQ (pronounced link) is short for Language Integrated Query. LINQ is a general-purpose query facility that can be used to query objects in memory by using a syntax that is similar to Structured Query Language (SQL), the language used to query databases. It is an enhancement of the C# language that makes it easier to interact with objects in memory using SQL-like Query Expressions or Query Operators (implemented through a series of extension methods).
Microsoft's original idea for LINQ was to bridge the gap between .NET code and data sources, such as relational databases and XML, using LINQ providers. LINQ providers form a set of building blocks that can be used to query various sources of data, using a similar set of Query Operators, without the caller needing to know the intricacies of how each data source works. The following is a list of providers and how they are used:
This chapter will cover LINQ to Objects. This is, by far, the most common use of LINQ providers and offers a flexible way to query collections in memory. In fact, when talking about LINQ, most people refer to LINQ to Objects, mainly due to its ubiquitous use throughout C# applications.
At the heart of LINQ is the way that collections can be converted, filtered, and aggregated into new forms using a concise and easy-to-use syntax. LINQ can be used in two interchangeable styles: Query Operators and Query Expressions.
Each style offers a different syntax to achieve the same result, and which one you use often comes down to personal preference. Each style can be interwoven in code easily.
Query Operators are based on a series of core extension methods. The result of one method can be passed straight into the next, and this chained style is often easier to grasp than the expression-based counterpart.
The extension methods typically take an IEnumerable<T> or IQueryable<T> input source, such as a list, and allow a delegate, such as a Func<T, bool> predicate, to be applied to that source. The source is generic-based, so Query Operators work with all types. It is just as easy to work with List<string> as it is with List<Customer>, for example.
In the following snippet, .Where, .OrderBy, and .Select are the extension methods being called:
books.Where(book => book.Price > 10)
.OrderBy(book => book.Price)
.Select(book => book.Name)
Here, you are taking the results from a .Where extension method to find all books with a unit price greater than 10, which is then sorted using the .OrderBy extension method. Finally, the name of each book is extracted using the .Select method. These methods could have been declared as single lines of code, but chaining in this way provides a more intuitive syntax. This will be covered in more detail in the upcoming sections.
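To illustrate the point that Query Operators work with any element type, here is a minimal sketch that applies Where to both a List<string> and a List<Customer>. The Customer record, the two predicates, and the sample values are hypothetical and exist only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
record Customer(string Name, int Orders);

static class GenericSourceSketch
{
    // A predicate is simply a Func<T, bool>; it can be stored and reused.
    public static readonly Func<string, bool> IsShort = s => s.Length <= 3;
    public static readonly Func<Customer, bool> IsFrequent = c => c.Orders >= 5;

    public static void Main()
    {
        var words = new List<string> { "pen", "pencil", "ink" };
        var customers = new List<Customer>
        {
            new Customer("Green", 7),
            new Customer("Smith", 2)
        };

        // The same Where method works on both element types.
        Console.WriteLine(string.Join(", ", words.Where(IsShort)));
        // pen, ink
        Console.WriteLine(string.Join(", ", customers.Where(IsFrequent).Select(c => c.Name)));
        // Green
    }
}
```

Storing the predicates in fields is only a convenience here; passing the lambdas inline to Where behaves the same way.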
Query Expressions are an enhancement of the C# language and resemble SQL syntax. The C# compiler compiles Query Expressions into a sequence of Query Operator extension method calls. Note that not all Query Operators are available with an equivalent Query Expression implementation.
Query Expressions have the following rules:
The following snippet is functionally equivalent to the Query Operator style defined in the previous section:
from book in books where book.Price > 10 orderby book.Price select book.Name
You will take a more in-depth look at both styles as you learn about the standard Query Operators shortly.
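The equivalence of the two styles can be checked directly. The following sketch, which assumes a hypothetical Book record and sample data, runs the same query in both styles and compares the resulting sequences with SequenceEqual:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
record Book(string Name, double Price);

static class QueryStyleSketch
{
    // Query Operator style.
    public static IEnumerable<string> WithOperators(IEnumerable<Book> books) =>
        books.Where(book => book.Price > 10)
             .OrderBy(book => book.Price)
             .Select(book => book.Name);

    // Query Expression style.
    public static IEnumerable<string> WithExpression(IEnumerable<Book> books) =>
        from book in books
        where book.Price > 10
        orderby book.Price
        select book.Name;

    public static void Main()
    {
        var books = new[]
        {
            new Book("LINQ Basics", 25.00),
            new Book("Pocket Guide", 8.50),
            new Book("C# Patterns", 12.99)
        };

        // Both styles yield the same names in the same order.
        Console.WriteLine(WithOperators(books).SequenceEqual(WithExpression(books)));
        // True
    }
}
```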
Whether you choose to use Query Operators, Query Expressions, or a mixture of the two, it is important to remember that for many operators, the query that you define is not executed when it is defined, but only when it is enumerated over. This means that it is not until a foreach statement or a ToList, ToArray, ToDictionary, ToLookup, or ToHashSet method is called that the actual query is executed.
This allows queries to be constructed elsewhere in code with additional criteria included, and then used or even reused with a different collection of data. Recall that in Chapter 3, Delegates, Events, and Lambdas, you saw similar behavior with delegates. Delegates are not executed where they are defined, but only when they are invoked.
In the following short Query Operator example, the output will be ABZ even though z is added after the query is defined but before it is enumerated through. This demonstrates that LINQ queries are evaluated on demand, rather than at the point where they are declared:
var letters = new List<string> { "a", "b" };
var query = letters.Select(w => w.ToUpper());
letters.Add("z");
foreach(var l in query)
Console.Write(l);
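For contrast, the following sketch places a deferred query next to one that is executed immediately with ToList. The class and method names are made up for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredSketch
{
    // Returns the deferred and the snapshot results as joined strings.
    public static (string Deferred, string Snapshot) Run()
    {
        var letters = new List<string> { "a", "b" };

        var deferred = letters.Select(w => w.ToUpper());          // not executed yet
        var snapshot = letters.Select(w => w.ToUpper()).ToList(); // executed immediately

        letters.Add("z"); // added after both queries were defined

        return (string.Join("", deferred), string.Join("", snapshot));
    }

    public static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine(deferred); // ABZ
        Console.WriteLine(snapshot); // AB
    }
}
```

The deferred query sees z because it reads the list only when it is enumerated, whereas the snapshot was taken before z was added.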
LINQ is driven by a core set of extension methods, referred to as standard Query Operators. These are grouped into operations based on their functionality. There are many standard Query Operators available, so for this introduction, you will explore all the main operators that you are likely to use regularly.
Projection operations allow you to convert an object into a new structure using only the properties that you need. You can create a new type, apply mathematical operations, or return the original object:
Consider the following snippet, which iterates through a List<string> containing the values Mon, Tues, and Wednes, outputting each with the word day appended.
In your Chapter04\Examples folder, add a new file called LinqSelectExamples.cs and edit it as follows:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
class LinqSelectExamples
{
public static void Main()
{
var days = new List<string> { "Mon", "Tues", "Wednes" };
var query1 = days.Select(d => d + "day");
foreach(var day in query1)
Console.WriteLine($"Query1: {day}");
Looking at the Query Operator syntax first, you can see that query1 uses the Select extension method and defines a Func<string, string> like this:
d => d + "day"
When executed, each string in the days list ("Mon", "Tues", "Wednes") is passed to the lambda expression as d, and the word day is appended to it. This returns a new IEnumerable<string> instance, with the original values inside the source variable, days, remaining unchanged.
You can now enumerate through the new IEnumerable instance using foreach, as follows:
var query2 = days.Select((d, i) => $"{i} : {d}day");
foreach (var day in query2)
Console.WriteLine($"Query2: {day}");
Note that the Select method has another overload that allows the index position in the source and value to be accessed, rather than just the value itself. Here, d (the string value) and i (its index) are passed, using the (d, i) => syntax and joined into a new string. The output will be displayed as 0 : Monday, 1 : Tuesday, and so on.
Before you continue looking at Select projections, it is worth noting that C# does not limit you to just creating new strings from existing strings. You can project into any type.
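As a small illustration of projecting into other types, the following sketch converts the same day names into integers and into tuples. The Lengths and Initials helper names are assumptions made for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProjectionTypesSketch
{
    // string -> int
    public static List<int> Lengths(IEnumerable<string> source) =>
        source.Select(s => s.Length).ToList();

    // string -> tuple of (name, first letter); assumes no empty strings
    public static List<(string Name, char Initial)> Initials(IEnumerable<string> source) =>
        source.Select(s => (Name: s, Initial: s[0])).ToList();

    public static void Main()
    {
        var days = new List<string> { "Mon", "Tues", "Wednes" };

        Console.WriteLine(string.Join(", ", Lengths(days)));
        // 3, 4, 6

        foreach (var (name, initial) in Initials(days))
            Console.WriteLine($"{initial}: {name}");
    }
}
```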
You can also create anonymous types, which are types created by the compiler from the property names and values that you specify. For example, the following snippet results in a new type being created that represents the results of the Select method:
var query3 = days.Select((d, i) => new
{
Index = i,
UpperCaseName = $"{d.ToUpper()}DAY"
});
foreach (var day in query3)
Console.WriteLine($"Query3: Index={day.Index}, UpperCaseDay={day.UpperCaseName}");
Here, query3 results in a new type that has an Index and UpperCaseName property; the values are assigned using Index = i and UpperCaseName = $"{d.ToUpper()}DAY".
These types are scoped to be available within your local method and can then be used in any local statements, such as in the previous foreach block. This saves you from having to create classes to temporarily store values from a Select method.
Running the code produces output in this format:
Index=0, UpperCaseDay=MONDAY
As an alternative, consider how the equivalent Query Expression looks. In the following example, you start with the from day in days expression. This assigns the name day to the string values in the days list. You then use select to project that to a new string, appending "day" to each.
This is functionally equivalent to the example in query1. The only difference is the code readability:
var query4 = from day in days
select day + "day";
foreach (var day in query4)
Console.WriteLine($"Query4: {day}");
The following example snippet mixes a Query Operator and Query Expressions. The select Query Expression cannot be used to select a value and index, so the Select extension method is used to create an anonymous type with a Name and Index property:
var query5 = from dayIndex in
days.Select( (d, i) => new {Name = d, Index = i})
select dayIndex;
foreach (var day in query5)
Console.WriteLine($"Query5: Index={day.Index} : {day.Name}");
Console.ReadLine();
}
}
}
Running the full example produces this output:
Query1: Monday
Query1: Tuesday
Query1: Wednesday
Query2: 0 : Monday
Query2: 1 : Tuesday
Query2: 2 : Wednesday
Query3: Index=0, UpperCaseDay=MONDAY
Query3: Index=1, UpperCaseDay=TUESDAY
Query3: Index=2, UpperCaseDay=WEDNESDAY
Query4: Monday
Query4: Tuesday
Query4: Wednesday
Query5: Index=0 : Mon
Query5: Index=1 : Tues
Query5: Index=2 : Wednes
Again, it largely comes down to personal choice as to which you prefer using. As queries become longer, one form may require less code than the other.
Note
You can find the code used for this example at https://packt.link/wKye0.
You have seen how Select can be used to project values from each item in a source collection. In the case of a source that has enumerable properties, the SelectMany extension method can extract the multiple items into a single list, which can then be optionally projected into a new form.
The following example creates two City records, each with multiple Station names, and uses SelectMany to extract all stations from both cities:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
record City (string Name, IEnumerable<string> Stations);
class LinqSelectManyExamples
{
public static void Main()
{
var cities = new List<City>
{
new City("London", new[] {"Kings Cross KGX", "Liverpool Street LVS", "Euston EUS"}),
new City("Birmingham", new[] {"New Street NST"})
};
Console.WriteLine("All Stations: ");
foreach (var station in cities.SelectMany(city => city.Stations))
{
Console.WriteLine(station);
}
The Func parameter, which is passed to SelectMany, requires you to specify an enumerable property, in this case, the City record's Stations property, which contains the station names.
Notice how a shortcut is used here, by directly integrating the query into a foreach statement. You are not altering or reusing the query variable, so there is no benefit in defining it separately, as done earlier.
SelectMany extracts all the station names from all of the items in the List<City> variable. Starting with the City record at element 0, which has the name London, it will extract the three station names ("Kings Cross KGX", "Liverpool Street LVS", and "Euston EUS"). It will then move on to the second City element, named Birmingham, and extract the single station, named "New Street NST".
All Stations:
Kings Cross KGX
Liverpool Street LVS
Euston EUS
New Street NST
Console.Write("All Station Codes: ");
var stations = cities
.SelectMany(city => city.Stations.Select(s => s[^3..]));
foreach (var station in stations)
{
Console.Write($"{station} ");
}
Console.WriteLine();
Console.ReadLine();
}
}
}
Rather than just returning each Station string, this example uses a nested Select method and a range to extract the last three characters from each station name using s[^3..]. Here, s is the station name, ^3 is an index counted three characters back from the end of the string, and the open-ended .. runs to the end of the string.
All Station Codes: KGX LVS EUS NST
You can see the last three characters of each station name are shown in the output.
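SelectMany also has an overload that takes a second delegate, which receives both the parent item and each extracted child item. The following sketch uses it to keep the city name alongside each station; the Describe helper and the shortened station lists are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record City(string Name, IEnumerable<string> Stations);

static class SelectManyPairSketch
{
    // Flattens the stations and keeps the owning city's name with each one.
    public static List<string> Describe(IEnumerable<City> cities) =>
        cities.SelectMany(
                  city => city.Stations,                        // collection selector
                  (city, station) => $"{city.Name}: {station}") // result selector
              .ToList();

    public static void Main()
    {
        var cities = new List<City>
        {
            new City("London", new[] { "Kings Cross KGX", "Euston EUS" }),
            new City("Birmingham", new[] { "New Street NST" })
        };

        foreach (var line in Describe(cities))
            Console.WriteLine(line);
        // London: Kings Cross KGX
        // London: Euston EUS
        // Birmingham: New Street NST
    }
}
```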
Note
You can find the code used for this example at https://packt.link/g8dXZ.
In the next section you will read about the filtering operations that filter a result as per a condition.
Filtering operations allow you to filter a result to return only those items that match a condition. For example, consider the following snippet, which contains a list of orders:
LinqWhereExamples.cs
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
record Order (string Product, int Quantity, double Price);
class LinqWhereExamples
{
public static void Main()
{
var orders = new List<Order>
{
new Order("Pen", 2, 1.99),
new Order("Pencil", 5, 1.50),
new Order("Note Pad", 1, 2.99),
You can find the complete code here: https://packt.link/ZJpb5.
Here, some order items are defined for various stationery products. Suppose you want to output all orders that have a quantity greater than five (this should output the Ruler and USB Memory Stick orders from the source).
Console.WriteLine("Orders with quantity over 5:");
foreach (var order in orders.Where(o => o.Quantity > 5))
{
Console.WriteLine(order);
}
Console.WriteLine("Pens or Pencils:");
foreach (var orderValue in orders
.Where(o => o.Product == "Pen" || o.Product == "Pencil")
.Select( o => o.Quantity * o.Price))
{
Console.WriteLine(orderValue);
}
var query = from order in orders
where order.Price <= 3.99
select new {Name=order.Product, Value=order.Quantity*order.Price};
Console.WriteLine("Cheapest Orders:");
foreach(var order in query)
{
Console.WriteLine($"{order.Name}: {order.Value}");
}
}
}
}
Orders with quantity over 5:
Order { Product = Ruler, Quantity = 10, Price = 0.5 }
Order { Product = USB Memory Stick, Quantity = 6, Price = 20 }
Pens or Pencils:
3.98
7.5
Cheapest Orders:
Pen: 3.98
Pencil: 7.5
Note Pad: 2.99
Stapler: 3.99
Ruler: 5
Now you have seen Query Operators in action, it is worth returning to deferred execution to see how this affects a query that is enumerated multiple times over.
In this next example, you have a collection of journeys made by a vehicle, which are populated via a TravelLog record. The TravelLog record contains an AverageSpeed method that logs a console message each time it is executed, and, as the name suggests, returns the average speed of the vehicle during that journey:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
record TravelLog (string Name, int Distance, int Duration)
{
public double AverageSpeed()
{
Console.WriteLine($"AverageSpeed() called for '{Name}'");
return (double)Distance / Duration; // cast avoids integer division truncating the result
}
}
class LinqMultipleEnumerationExample
{
public static void Main()
{
var travelLogs = new List<TravelLog>
{
new TravelLog("London to Brighton", 50, 4),
new TravelLog("Newcastle to London", 300, 24),
new TravelLog("New York to Florida", 1146, 19),
new TravelLog("Paris to Berlin", 546, 10)
};
var fastestJourneys = travelLogs.Where(tl => tl.AverageSpeed() > 50);
Console.WriteLine("Fastest Distances:");
foreach (var item in fastestJourneys)
{
Console.WriteLine($"{item.Name}: {item.Distance} miles");
}
Console.WriteLine();
Fastest Distances:
AverageSpeed() called for 'London to Brighton'
AverageSpeed() called for 'Newcastle to London'
AverageSpeed() called for 'New York to Florida'
New York to Florida: 1146 miles
AverageSpeed() called for 'Paris to Berlin'
Paris to Berlin: 546 miles
Console.WriteLine("Fastest Duration:");
foreach (var item in fastestJourneys)
{
Console.WriteLine($"{item.Name}: {item.Duration} hours");
}
Console.WriteLine();
Fastest Duration:
AverageSpeed() called for 'London to Brighton'
AverageSpeed() called for 'Newcastle to London'
AverageSpeed() called for 'New York to Florida'
New York to Florida: 19 hours
AverageSpeed() called for 'Paris to Berlin'
Paris to Berlin: 10 hours
This shows that each time a query is enumerated, the full query is evaluated again. This might not be a problem for a fast method such as AverageSpeed, but what if a method needs to access a database to extract some data? That would result in multiple database calls and, possibly, a very slow application. Calling ToList executes the query once and stores the results, which can then be enumerated repeatedly without re-running the query:
Console.WriteLine("Fastest Duration Multiple loops:");
var fastestJourneysList = travelLogs
.Where(tl => tl.AverageSpeed() > 50)
.ToList();
for (var i = 0; i < 2; i++)
{
Console.WriteLine($"Fastest Duration Multiple loop iteration {i+1}:");
foreach (var item in fastestJourneysList)
{
Console.WriteLine($"{item.Name}: {item.Distance} in {item.Duration} hours");
}
}
}
}
}
Fastest Duration Multiple loops:
AverageSpeed() called for 'London to Brighton'
AverageSpeed() called for 'Newcastle to London'
AverageSpeed() called for 'New York to Florida'
AverageSpeed() called for 'Paris to Berlin'
Fastest Duration Multiple loop iteration 1:
New York to Florida: 1146 in 19 hours
Paris to Berlin: 546 in 10 hours
Fastest Duration Multiple loop iteration 2:
New York to Florida: 1146 in 19 hours
Paris to Berlin: 546 in 10 hours
Notice that AverageSpeed is called only once for each journey, at the point where ToList executes the query. Both loop iterations then read the stored results without re-evaluating the query.
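The cost of repeated enumeration can also be measured rather than read from console messages. This sketch counts how often a predicate runs when a query is enumerated twice, with and without ToList. The speeds and the CountCalls helper are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerationCountSketch
{
    // Counts predicate calls when a query is enumerated twice,
    // either directly (deferred) or after materializing with ToList.
    public static int CountCalls(bool materialize)
    {
        var calls = 0;
        var speeds = new[] { 12.5, 60.3, 54.6 };

        IEnumerable<double> fast = speeds.Where(s => { calls++; return s > 50; });
        if (materialize)
            fast = fast.ToList(); // runs the predicate once per item, right now

        foreach (var speed in fast) { } // first enumeration
        foreach (var speed in fast) { } // second enumeration
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountCalls(materialize: false)); // 6
        Console.WriteLine(CountCalls(materialize: true));  // 3
    }
}
```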
Note
You can find the code used for this example at https://packt.link/CIZJE.
There are five operations to sort items in a source: OrderBy, OrderByDescending, ThenBy, ThenByDescending, and Reverse. Items are first sorted by a primary key, optionally followed by a secondary sort that orders the items within each primary group. For example, you can use a primary sort to order a list of people by the City property and then a secondary sort to order them further by the Surname property.
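The City and Surname scenario can be sketched as follows. The Person record and the sample names are hypothetical, chosen only to show a primary sort followed by a secondary one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
record Person(string Surname, string City);

static class PrimarySecondarySortSketch
{
    public static List<Person> ByCityThenSurname(IEnumerable<Person> people) =>
        people.OrderBy(p => p.City)    // primary sort
              .ThenBy(p => p.Surname)  // secondary sort within each city
              .ToList();

    public static void Main()
    {
        var people = new[]
        {
            new Person("Smith", "Leeds"),
            new Person("Jones", "Bath"),
            new Person("Adams", "Leeds"),
            new Person("Green", "Bath")
        };

        foreach (var p in ByCityThenSurname(people))
            Console.WriteLine($"{p.City}: {p.Surname}");
        // Bath: Green
        // Bath: Jones
        // Leeds: Adams
        // Leeds: Smith
    }
}
```

Calling OrderBy a second time instead of ThenBy would discard the primary sort, which is why the secondary key goes through ThenBy.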
In this example, you will use the System.IO namespace to query files in the host machine's temp folder, rather than creating small objects from lists.
The static Directory class offers methods that can query the filesystem. FileInfo retrieves details about a specific file, such as its size or creation date. The Path.GetTempPath method returns the system's temp folder. To illustrate the point, in the Windows operating system, this can typically be found at C:\Users\username\AppData\Local\Temp, where username is a specific Windows login name. This will be different for other users and other systems:
using System;
using System.IO;
using System.Linq;
namespace Chapter04.Examples
{
class LinqOrderByExamples
{
public static void Main()
{
var fileInfos = Directory.EnumerateFiles(Path.GetTempPath(), "*.tmp")
.Select(filename => new FileInfo(filename))
.ToList();
Here, each filename is projected into a FileInfo instance and chained into a populated collection using ToList, which allows you to further query the resulting fileInfos details.
Console.WriteLine("Earliest Files");
foreach (var fileInfo in fileInfos.OrderBy(fi => fi.CreationTime))
{
Console.WriteLine($"{fileInfo.CreationTime:dd MMM yy}: {fileInfo.Name}");
}
Console.WriteLine("Largest Files");
foreach (var fileInfo in fileInfos.OrderByDescending(fi => fi.Length))
{
Console.WriteLine($"{fileInfo.Length:N0} bytes: {fileInfo.Name}");
}
Console.WriteLine("Largest smaller files");
foreach (var fileInfo in
from fi in fileInfos
where fi.Length < 1000
orderby fi.Length descending
select fi)
{
Console.WriteLine($"{fileInfo.Length:N0} bytes: {fileInfo.Name}");
}
Console.ReadLine();
}
}
}
Earliest Files
05 Jan 21: wct63C3.tmp
05 Jan 21: wctD308.tmp
05 Jan 21: wctFE7.tmp
04 Feb 21: wctE092.tmp
Largest Files
38,997,896 bytes: wctE092.tmp
4,824,572 bytes: cb6dfb76-4dc9-494d-9683-ce31eab43612.tmp
4,014,036 bytes: 492f224c-c811-41d6-8c5d-371359d520db.tmp
Largest smaller files
726 bytes: wct38BC.tmp
726 bytes: wctE239.tmp
512 bytes: ~DF8CE3ED20D298A9EC.TMP
416 bytes: TFR14D8.tmp
With this example, you have queried files in the host machine's temp folder, rather than creating small objects from lists.
Note
You can find the code used for this example at https://packt.link/mWeVC.
The following example sorts popular quotes, based on the number of words found in each.
In your Chapter04\Examples folder, add a new file called LinqThenByExamples.cs and edit it as follows:
using System;
using System.IO;
using System.Linq;
namespace Chapter04.Examples
{
class LinqThenByExamples
{
public static void Main()
{
You start by declaring a string array of quotes as follows:
var quotes = new[]
{
"Love for all hatred for none",
"Change the world by being yourself",
"Every moment is a fresh beginning",
"Never regret anything that made you smile",
"Die with memories not dreams",
"Aspire to inspire before we expire"
};
In the next snippet, each of these string quotes is projected into a new anonymous type based on the number of words in the quote (found using String.Split()). The items are first sorted in descending order to show those with the most words and then sorted in alphabetical order:
foreach (var item in quotes
.Select(q => new {Quote = q, Words = q.Split(" ").Length})
.OrderByDescending(q => q.Words)
.ThenBy(q => q.Quote))
{
Console.WriteLine($"{item.Words}: {item.Quote}");
}
Console.ReadLine();
}
}
}
Running the code lists the quotes in word count order as follows:
7: Never regret anything that made you smile
6: Aspire to inspire before we expire
6: Change the world by being yourself
6: Every moment is a fresh beginning
6: Love for all hatred for none
5: Die with memories not dreams
Note how the quotes with six words are shown alphabetically.
The following is the equivalent Query Expression, with the orderby quote.Words descending clause followed by quote.Quote ascending:
var query = from quote in
(quotes.Select(q => new {Quote = q, Words = q.Split(" ").Length}))
orderby quote.Words descending, quote.Quote ascending
select quote;
foreach(var item in query)
{
Console.WriteLine($"{item.Words}: {item.Quote}");
}
Console.ReadLine();
}
}
}
Note
You can find the code used for this example at https://packt.link/YWJRz.
Now you have sorted popular quotes based on the number of words found in each. It is time to apply these skills in the next exercise.
In the preceding examples, you have looked at code that can select, filter, and sort a collection source. You will now combine these into an exercise that filters a small list of countries for two continents (South America and Africa) and sorts the results by geographical size.
Perform the following steps to do so:
using System;
using System.Linq;
namespace Chapter04.Exercises.Exercise03
{
class Program
{
record Country (string Name, string Continent, int Area);
public static void Main()
{
var countries = new[]
{
new Country("Seychelles", "Africa", 176),
new Country("India", "Asia", 1_269_219),
new Country("Brazil", "South America",3_287_956),
new Country("Argentina", "South America", 1_073_500),
new Country("Mexico", "South America",750_561),
new Country("Peru", "South America",494_209),
new Country("Algeria", "Africa", 919_595),
new Country("Sudan", "Africa", 668_602)
};
The array contains the name of a country, the continent it belongs to, and its geographical size in square miles.
var requiredContinents = new[] {"South America", "Africa"};
This offers extra code flexibility should you need to alter it.
var filteredCountries = countries
.Where(c => requiredContinents.Contains(c.Continent))
.OrderBy(c => c.Continent)
.ThenByDescending(c => c.Area)
.Select( (cty, i) => new {Index = i, Country = cty});
foreach(var item in filteredCountries)
Console.WriteLine($"{item.Index+1}: {item.Country.Continent}, {item.Country.Name} = {item.Country.Area:N0} sq mi");
}
}
}
You finally project each into a new anonymous type to be written to the console.
1: Africa, Algeria = 919,595 sq mi
2: Africa, Sudan = 668,602 sq mi
3: Africa, Seychelles = 176 sq mi
4: South America, Brazil = 3,287,956 sq mi
5: South America, Argentina = 1,073,500 sq mi
6: South America, Mexico = 750,561 sq mi
7: South America, Peru = 494,209 sq mi
Notice that Algeria has the largest area in Africa, and Brazil has the largest area in South America (based on this small subset of data). Notice how you add 1 to each Index for readability (since starting at zero is less user-friendly).
Note
You can find the code used for this exercise at https://packt.link/Djddw.
You have seen how LINQ extension methods can be used to access items in a data source. Now, you will learn about partitioning data, which can be used to extract subsets of items.
So far, you have looked at filtering the items in a data source that match a defined condition. Partitioning is used when you need to divide a data source into two distinct sections and return either of those two sections for subsequent processing.
For example, consider that you have a list of vehicles sorted by value and want to process the five least expensive vehicles using some method. If the list is sorted in ascending order, then you could partition the data using the Take(5) method (defined in the following paragraphs), which will extract the first five items and discard the remaining.
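The vehicle scenario can be sketched like this, using Take for the first section and Skip for the remainder. The Vehicle record, the Split helper, and the generated sample data are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
record Vehicle(string Model, int Value);

static class PartitionSketch
{
    // Splits a source into its 'count' least expensive vehicles and the remainder.
    public static (List<Vehicle> Cheapest, List<Vehicle> Rest) Split(
        IEnumerable<Vehicle> vehicles, int count)
    {
        var sorted = vehicles.OrderBy(v => v.Value).ToList();
        return (sorted.Take(count).ToList(), sorted.Skip(count).ToList());
    }

    public static void Main()
    {
        // Eight made-up vehicles valued from 1,000 to 8,000.
        var vehicles = Enumerable.Range(1, 8)
            .Select(i => new Vehicle($"Model {i}", i * 1_000));

        var (cheapest, rest) = Split(vehicles, 5);
        Console.WriteLine($"Cheapest: {cheapest.Count}, remaining: {rest.Count}");
        // Cheapest: 5, remaining: 3
    }
}
```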
There are six partitioning operations that are used to split a source, with either of the two sections being returned: Skip, SkipLast, SkipWhile, Take, TakeLast, and TakeWhile. There are no partitioning Query Expressions.
The following example demonstrates various Skip and Take operations on an unsorted list of exam grades. Here, you use Skip(1) to ignore the highest grade in a sorted list.
using System;
using System.Linq;
namespace Chapter04.Examples
{
class LinqSkipTakeExamples
{
public static void Main()
{
var grades = new[] {25, 95, 75, 40, 54, 9, 99};
Console.Write("Skip: Highest Grades (skipping first):");
foreach (var grade in grades
.OrderByDescending(g => g)
.Skip(1))
{
Console.Write($"{grade} ");
}
Console.WriteLine();
Console.Write("SkipWhile Middle Grades (excluding 25 or 75):");
foreach (var grade in grades
.OrderByDescending(g => g)
.SkipWhile(g => g is <= 25 or >=75))
{
Console.Write($"{grade} ");
}
Console.WriteLine();
Console.Write("SkipLast: Bottom Half Grades:");
foreach (var grade in grades
.OrderBy(g => g)
.SkipLast(grades.Length / 2))
{
Console.Write($"{grade} ");
}
Console.WriteLine();
Console.Write("Take: Two Highest Grades:");
foreach (var grade in grades
.OrderByDescending(g => g)
.Take(2))
{
Console.Write($"{grade} ");
}
}
}
}
Skip: Highest Grades (skipping first):95 75 54 40 25 9
SkipWhile Middle Grades (excluding 25 or 75):54 40 25 9
SkipLast: Bottom Half Grades:9 25 40 54
Take: Two Highest Grades:99 95
This example demonstrated the various Skip and Take operations on an unsorted list of exam grades.
Note
You can find the code used for this example at https://packt.link/TsDFk.
GroupBy groups elements that share the same attribute. It is often used to group data or provide a count of items grouped by a common attribute. The result is an enumerable IGrouping<K, V> type collection, where K is the key type and V is the value type specified. IGrouping itself is enumerable as it contains all items that match the specified key.
For example, consider the next snippet, which groups a List of customer orders by name. In your Chapter04\Examples folder, add a new file called LinqGroupByExamples.cs and edit it as follows:
LinqGroupByExamples.cs
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
record CustomerOrder(string Name, string Product, int Quantity);
class LinqGroupByExamples
{
public static void Main()
{
var orders = new List<CustomerOrder>
{
new CustomerOrder("Mr Green", "LED TV", 4),
new CustomerOrder("Mr Smith", "iPhone", 2),
new CustomerOrder("Mrs Jones", "Printer", 1),
You can find the complete code here: https://packt.link/GbwF2.
In this example, you have a list of CustomerOrder objects and want to group them by the Name property. For this, the GroupBy method is passed a Func delegate, which selects the Name property from each CustomerOrder instance.
Each item in the GroupBy result contains a Key (in this case, the customer's Name). You can then sort the grouping item to show the CustomerOrders items sorted by Quantity, as follows:
foreach (var item in grouping.OrderByDescending(i => i.Quantity))
{
Console.WriteLine($" {item.Product} * {item.Quantity}");
}
}
Console.ReadLine();
}
}
}
Running the code produces the following output:
Customer Mr Green:
LED TV * 4
MP3 Player * 1
Microwave Oven * 1
Customer Mr Smith:
PC * 5
iPhone * 2
Printer * 2
Customer Mrs Jones:
Printer * 1
You can see the data is first grouped by customer Name and then ordered by order Quantity within each customer grouping. The equivalent Query Expression is written like this:
var query = from order in orders
group order by order.Name;
foreach (var grouping in query)
{
Console.WriteLine($"Customer {grouping.Key}:");
foreach (var item in from item in grouping
orderby item.Quantity descending
select item)
{
Console.WriteLine($" {item.Product} * {item.Quantity}");
}
}
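GroupBy is also a convenient way to count the items that share a key. The following sketch counts the orders placed by each customer. The OrdersPerCustomer helper and the reduced order list are assumptions for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record CustomerOrder(string Name, string Product, int Quantity);

static class GroupCountSketch
{
    // Maps each customer name to the number of orders in its group.
    public static Dictionary<string, int> OrdersPerCustomer(IEnumerable<CustomerOrder> orders) =>
        orders.GroupBy(o => o.Name)
              .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        var orders = new List<CustomerOrder>
        {
            new CustomerOrder("Mr Green", "LED TV", 4),
            new CustomerOrder("Mr Smith", "iPhone", 2),
            new CustomerOrder("Mr Green", "MP3 Player", 1)
        };

        foreach (var pair in OrdersPerCustomer(orders))
            Console.WriteLine($"{pair.Key}: {pair.Value} order(s)");
    }
}
```

As noted earlier in the chapter, the order in which a Dictionary returns its items should not be relied upon, so the counts are best read by key.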
You have now seen some of the commonly used LINQ operators. You will now bring them together in an exercise.
In Chapter 3, Delegates, Events, and Lambdas, you used the WebClient class to download data from a website. In this exercise, you will use data downloaded from Project Gutenberg.
Note
Project Gutenberg is a library of 60,000 free eBooks. You can search online for Project Gutenberg or visit https://www.gutenberg.org/.
You will create a console app that allows the user to enter a URL. Then, you will download the book's text from the Project Gutenberg URL and use various LINQ statements to find the most frequent words in the book's text.
Additionally, you want to exclude some common stop-words; these are words such as and, or, and the that appear regularly in English, but add little to the meaning of a sentence. You will use the Regex.Split method to help split words more accurately than a simple space delimiter. Perform the following steps to do so:
Note
You can find more information on Regex at https://packt.link/v4hGN.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
namespace Chapter04.Exercises.Exercise04
{
class TextCounter
{
private readonly HashSet<string> _stopWords;
public TextCounter(string stopWordPath)
{
Console.WriteLine($"Reading stop word file: {stopWordPath}");
_stopWords = new HashSet<string>(File.ReadAllLines(stopWordPath));
}
You have used a HashSet, as each stop-word is unique.
public IEnumerable<Tuple<string, int>> Process(string text, int maximumWords)
{
In its simplest form, this pattern splits a string into a list of words, using runs of whitespace to identify word boundaries. For example, the string Hello Goodbye would be split into an array that contains two elements, Hello and Goodbye. The returned string items are filtered via Where to ensure all stop-words are ignored using the Contains method. The words are then grouped by value, GroupBy(t=>t), and projected to a Tuple using the word as a Key and the number of times it occurs using grp.Count().
var words = Regex.Split(text.ToLower(), @"\s+")
.Where(t => !_stopWords.Contains(t))
.GroupBy(t => t)
.Select(grp => Tuple.Create(grp.Key, grp.Count()))
.OrderByDescending(tup => tup.Item2) //int
.Take(maximumWords);
return words;
}
}
class Program
{
public static void Main()
{
const string StopWordFile = "StopWords.txt";
var counter = new TextCounter(StopWordFile);
Note
You can find StopWords.txt on GitHub at https://packt.link/Vi8JH, or you can download any standard stop-word file, such as NLTK's https://packt.link/ZF1Tf. This file should be saved in the Chapter04\Exercises folder.
string address;
do
{
//https://www.gutenberg.org/files/64333/64333-0.txt
Console.Write("Enter a Gutenberg book URL: ");
address = Console.ReadLine();
if (string.IsNullOrEmpty(address))
continue;
using var client = new WebClient();
var tempFile = Path.GetTempFileName();
Console.WriteLine("Downloading...");
client.DownloadFile(address, tempFile);
The Gutenberg text files contain extra details such as the author and title, which can be found by reading each line in the file. The actual text of the book does not begin until a line that starts with *** START OF THE PROJECT GUTENBERG EBOOK, so you need to look for this start message too:
Console.WriteLine($"Processing file {tempFile}");
const string StartIndicator = "*** START OF THE PROJECT GUTENBERG EBOOK";
//Title: The Little Review, October 1914(Vol. 1, No. 7)
//Author: Various
var title = string.Empty;
var author = string.Empty;
var bookText = new StringBuilder();
var isReadingBookText = false;
var bookTextLineCount = 0;
foreach (var line in File.ReadAllLines(tempFile))
{
if (line.StartsWith("Title"))
{
title = line;
}
else if (line.StartsWith("Author"))
{
author = line;
}
else if (line.StartsWith(StartIndicator))
{
isReadingBookText = true;
}
else if (isReadingBookText)
{
bookText.Append(line);
bookTextLineCount++;
}
}
if (bookTextLineCount > 0)
{
Console.WriteLine($"Processing {bookTextLineCount:N0} lines ({bookText.Length:N0} characters)..");
var wordCounts = counter.Process(bookText.ToString(), 50);
Console.WriteLine(title);
Console.WriteLine(author);
var i = 0;
//deconstruction
foreach (var (word, count) in wordCounts)
{
Console.Write($"'{word}'={count} ");
i++;
if (i % 3 == 0)
{
Console.WriteLine();
}
}
Console.WriteLine();
}
else
{
Reading stop word file: StopWords.txt
Enter a Gutenberg book URL: https://www.gutenberg.org/files/64333/64333-0.txt
Downloading...
Processing file C:\Temp\tmpB0A3.tmp
Processing 4,063 lines (201,216 characters)..
Title: The Little Review, October 1914 (Vol. 1, No. 7)
Author: Various
'one'=108 'new'=95 'project'=62
'man'=56 'little'=54 'life'=52
'would'=51 'work'=50 'book'=42
'must'=42 'people'=39 'great'=37
'love'=37 'like'=36 'gutenberg-tm'=36
'may'=35 'men'=35 'us'=32
'could'=30 'every'=30 'first'=29
'full'=29 'world'=28 'mr.'=28
'old'=27 'never'=26 'without'=26
'make'=26 'young'=24 'among'=24
'modern'=23 'good'=23 'it.'=23
'even'=22 'war'=22 'might'=22
'long'=22 'cannot'=22 '_the'=22
'many'=21 'works'=21 'electronic'=21
'always'=20 'way'=20 'thing'=20
'day'=20 'upon'=20 'art'=20
'terms'=20 'made'=19
Note
Visual Studio might show the following when the code is run for the first time: warning SYSLIB0014: 'WebClient.WebClient()' is obsolete: 'WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'
This is a recommendation to use the newer HttpClient class instead of the WebClient class. WebClient still works for this exercise, so the warning can be ignored here.
The output shows a list of words found among the 4,063 lines of text downloaded. The counter shows that one, new, and project are the most frequent words. Notice how mr., gutenberg-tm, it., and _the appear as words. This shows that the regular expression used is not completely accurate when splitting words.
Note
You can find the code used for this exercise at https://packt.link/Q7Pf8.
An interesting enhancement to this exercise would be to sort the words by count, include a count of the stop words found, or find the average word length.
Aggregation operations are used to compute a single value from a collection of values in a data source. An example could be the maximum, minimum, and average rainfall from data collected over a month.
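The rainfall case can be sketched in a few lines. This is a minimal illustration rather than code from the chapter's download: the RainfallAggregationSketch class, the Summarize helper, and the readings are all invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class RainfallAggregationSketch
    {
        // Reduces the readings to three single values.
        // Max, Min, and Average throw InvalidOperationException if readings is empty.
        public static (double Max, double Min, double Average) Summarize(
            IReadOnlyCollection<double> readings)
        {
            return (readings.Max(), readings.Min(), readings.Average());
        }

        public static void Main()
        {
            // Invented daily rainfall readings in millimetres.
            var rainfall = new List<double> { 0.0, 2.5, 10.0, 7.5 };

            var (max, min, average) = Summarize(rainfall);

            // Prints: Max=10 Min=0 Average=5
            Console.WriteLine($"Max={max} Min={min} Average={average}");
        }
    }
}
```

Each operator walks the whole collection once and returns a single value, which is what distinguishes aggregation from filtering or projection.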
The following example uses the Process.GetProcesses method from the System.Diagnostics namespace to retrieve a list of processes currently running on the system:
In your Chapter04Examples folder, add a new file called LinqAggregationExamples.cs and edit it as follows:
using System;
using System.Diagnostics;
using System.Linq;
namespace Chapter04.Examples
{
class LinqAggregationExamples
{
public static void Main()
{
First, Process.GetProcesses().ToList() is called to retrieve a list of the active processes running on the system:
var processes = Process.GetProcesses().ToList();
Then, the Count extension method obtains a count of the items returned. Count has an additional overload, which accepts a Func delegate used to filter each of the items to be counted. The Process class has a PrivateMemorySize64 property, which returns the number of bytes of memory the process is currently consuming, so you can use that to count the small processes, that is, those using less than 1,000,000 bytes of memory:
var allProcesses = processes.Count;
var smallProcesses = processes.Count(proc => proc.PrivateMemorySize64 < 1_000_000);
Next, the Average extension method returns the overall average of a specific value for all items in the processes list. In this case, you use it to calculate the average memory consumption, using the PrivateMemorySize64 property again:
var average = processes.Average(p => p.PrivateMemorySize64);
The PrivateMemorySize64 property is also used to calculate the maximum and minimum memory used for all processes, along with the total memory, as follows:
var max = processes.Max(p => p.PrivateMemorySize64);
var min = processes.Min(p => p.PrivateMemorySize64);
var sum = processes.Sum(p => p.PrivateMemorySize64);
Once you have calculated the statistics, each value is written to the console:
Console.WriteLine("Process Memory Details");
Console.WriteLine($" All Count: {allProcesses}");
Console.WriteLine($"Small Count: {smallProcesses}");
Console.WriteLine($" Average: {FormatBytes(average)}");
Console.WriteLine($" Maximum: {FormatBytes(max)}");
Console.WriteLine($" Minimum: {FormatBytes(min)}");
Console.WriteLine($" Total: {FormatBytes(sum)}");
}
In the preceding snippet, the Count property returns the number of all processes and, using the predicate overload of the Count method, you count those where the memory is less than 1,000,000 bytes (by examining the proc.PrivateMemorySize64 property). You can also see that Average, Max, Min, and Sum are used to calculate statistics for process memory usage on the system.
Note
The Average, Max, and Min operators will throw InvalidOperationException with the error Sequence contains no elements if you attempt to calculate using a source collection of non-nullable values that contains no elements. Count and Sum simply return 0 in that case. You should check the Count or Any methods prior to calling Average, Max, or Min.
Finally, FormatBytes formats the amounts of memory into their megabyte equivalents:
private static string FormatBytes(double bytes)
{
return $"{bytes / Math.Pow(1024, 2):N2} MB";
}
}
}
Running the example produces results similar to this:
Process Memory Details
All Count: 305
Small Count: 5
Average: 38.10 MB
Maximum: 1,320.16 MB
Minimum: 0.06 MB
Total: 11,620.03 MB
The output summarizes the memory used by the processes currently running on the system. Your figures will differ, depending on what is running at the time.
Note
You can find the code used for this example at https://packt.link/HI2eV.
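The earlier note about empty collections can be illustrated with a short sketch. The EmptySequenceSketch class and its SafeAverage helper are hypothetical names, not part of the chapter's code. The sketch assumes a list of long values, the same type as PrivateMemorySize64.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class EmptySequenceSketch
    {
        // Returns the average, or null when there is nothing to average.
        public static double? SafeAverage(IEnumerable<long> values)
        {
            var list = values.ToList();
            return list.Any() ? list.Average() : (double?)null;
        }

        public static void Main()
        {
            var none = new List<long>();

            // Count and Sum are safe on an empty list; both print 0.
            Console.WriteLine(none.Count);
            Console.WriteLine(none.Sum());

            try
            {
                // Max has no sensible result for an empty list of long values.
                none.Max();
            }
            catch (InvalidOperationException ex)
            {
                // The message reported is "Sequence contains no elements".
                Console.WriteLine(ex.Message);
            }

            // Guarding with Any avoids the exception.
            Console.WriteLine(SafeAverage(none).HasValue);           // False
            Console.WriteLine(SafeAverage(new List<long> { 2, 4 })); // 3
        }
    }
}
```

Returning a nullable value is only one option. Depending on the application, a default value or an explicit error message may be a better fit.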
Quantifier operations return a bool that indicates whether all or some elements in a sequence match a Predicate condition. This is often used to verify any elements in a collection match some criteria, rather than relying on Count, which enumerates all items in the collection, even if you need just one result.
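The difference between Any and Count can be made visible with a sequence that records how many items it has produced. The following sketch uses invented names (AnyVersusCountSketch, Numbers, ItemsProduced) and is only meant to show the short-circuiting behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class AnyVersusCountSketch
    {
        // Number of items the Numbers iterator has produced so far.
        public static int ItemsProduced;

        // Lazily yields 1..max, counting each item as it is produced.
        public static IEnumerable<int> Numbers(int max)
        {
            for (var i = 1; i <= max; i++)
            {
                ItemsProduced++;
                yield return i;
            }
        }

        public static void Main()
        {
            ItemsProduced = 0;
            var anyLarge = Numbers(1000).Any(n => n > 2);
            // Any stops at the first match, so only 3 items are produced.
            Console.WriteLine($"Any: {anyLarge}, items produced: {ItemsProduced}");

            ItemsProduced = 0;
            var countLarge = Numbers(1000).Count(n => n > 2) > 0;
            // Count must visit every item, so all 1000 are produced.
            Console.WriteLine($"Count > 0: {countLarge}, items produced: {ItemsProduced}");
        }
    }
}
```

Both calls give the same answer, but Any stops as soon as the answer is known, which matters for large or expensive sequences.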
Quantifier operations are accessed using extension methods such as All, Any, and Contains.
The following card-dealing example selects three cards at random and returns a summary of those selected. The summary uses the All and Any extension methods to determine whether any of the cards were clubs or red and whether all cards were diamonds or an even number:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
enum PlayingCardSuit
{
Hearts,
Clubs,
Spades,
Diamonds
}
record PlayingCard (int Number, PlayingCardSuit Suit)
{
public override string ToString()
{
return $"{Number} of {Suit}";
}
}
class Deck
{
private readonly List<PlayingCard> _cards = new();
private readonly Random _random = new();
public Deck()
{
for (var i = 1; i <= 10; i++)
{
_cards.Add(new PlayingCard(i, PlayingCardSuit.Hearts));
_cards.Add(new PlayingCard(i, PlayingCardSuit.Clubs));
_cards.Add(new PlayingCard(i, PlayingCardSuit.Spades));
_cards.Add(new PlayingCard(i, PlayingCardSuit.Diamonds));
}
}
public PlayingCard Draw()
{
var index = _random.Next(_cards.Count);
var drawnCard = _cards.ElementAt(index);
_cards.Remove(drawnCard);
return drawnCard;
}
}
class LinqAllAnyExamples
{
public static void Main()
{
var deck = new Deck();
var hand = new List<PlayingCard>();
for (var i = 0; i < 3; i++)
{
hand.Add(deck.Draw());
}
var summary = string.Join(" | ",
hand.OrderByDescending(c => c.Number)
.Select(c => c.ToString()));
Console.WriteLine($"Hand: {summary}");
Console.WriteLine($"Any Clubs: {hand.Any(card => card.Suit == PlayingCardSuit.Clubs)}");
Console.WriteLine($"Any Red: {hand.Any(card => card.Suit == PlayingCardSuit.Hearts || card.Suit == PlayingCardSuit.Diamonds)}");
Console.WriteLine($"All Diamonds: {hand.All(card => card.Suit == PlayingCardSuit.Diamonds)}");
Console.WriteLine($"All Even: {hand.All(card => card.Number % 2 == 0)}");
Console.WriteLine($"Score :{hand.Sum(card => card.Number)}");
}
}
}
Running the example produces results similar to this:
Hand: 8 of Spades | 7 of Diamonds | 6 of Diamonds
Any Clubs: False
Any Red: True
All Diamonds: False
All Even: False
Score :21
The cards are randomly selected, so you will see a different hand each time you run the program. In this example, the score was 21, which is often a winning hand in card games.
Note
You can find the code used for this example at https://packt.link/xPuTc.
Join operations are used to join two sources based on the association of objects in one data source with those that share a common attribute in a second data source. If you are familiar with database design, this can be thought of as a primary and foreign key relationship between tables.
A common example of a join is one where you have a one-way relationship, such as Orders, which has a property of type Products, but the Products class does not have a collection property that represents a backward relationship to a collection of Orders. By using a Join operator, you can create a backward relationship to show Orders for Products.
The two join extension methods are Join and GroupJoin.
The following example contains three Manufacturer records, each with a unique ManufacturerId. These numeric IDs are used to define various Car records, but to save memory, you will not have a direct memory reference from Manufacturer back to Car. You will use the Join method to create an association between the Manufacturer and Car instances:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Chapter04.Examples
{
record Manufacturer(int ManufacturerId, string Name);
record Car (string Name, int ManufacturerId);
LinqJoinExamples.cs
class LinqJoinExamples
{
public static void Main()
{
var manufacturers = new List<Manufacturer>
{
new(1, "Ford"),
new(2, "BMW"),
new(3, "VW")
};
var cars = new List<Car>
{
new("Focus", 1),
new("Galaxy", 1),
new("GT40", 1),
You can find the complete code here: https://packt.link/Ue7Fj.
var joinedQuery = manufacturers.Join(
cars,
manufacturer => manufacturer.ManufacturerId,
car => car.ManufacturerId,
(manufacturer, car) => new {ManufacturerName = manufacturer.Name, CarName = car.Name});
foreach (var item in joinedQuery)
{
Console.WriteLine($"{item}");
}
}
}
}
In the preceding snippet, the Join operation has various parameters. You pass in the cars list and define which properties in the Manufacturer and Car records should be used to create the join. In this case, manufacturer.ManufacturerId = car.ManufacturerId determines the correct join.
Finally, the last lambda takes the manufacturer and car arguments and returns a new anonymous type that contains the manufacturer.Name and car.Name properties.
Running the example produces the following output:
{ ManufacturerName = Ford, CarName = Focus }
{ ManufacturerName = Ford, CarName = Galaxy }
{ ManufacturerName = Ford, CarName = GT40 }
{ ManufacturerName = BMW, CarName = 1 Series }
{ ManufacturerName = BMW, CarName = 2 Series }
{ ManufacturerName = VW, CarName = Golf }
{ ManufacturerName = VW, CarName = Polo }
As you can see, each of the Car and Manufacturer instances has been joined correctly using ManufacturerId.
The same join can also be written as a Query Expression:
var query = from manufacturer in manufacturers
join car in cars
on manufacturer.ManufacturerId equals car.ManufacturerId
select new
{
ManufacturerName = manufacturer.Name, CarName = car.Name
};
foreach (var item in query)
{
Console.WriteLine($"{item}");
}
Note
You can find the code used for this example at http://packt.link/Wh8jK.
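Where Join produces one result per matching pair, the GroupJoin extension method produces one result per item in the first source, together with all of its matches. The following is a minimal sketch, not taken from the chapter's code download. The LinqGroupJoinSketch class and its Summarize helper are invented names, and the records are nested inside the class only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class LinqGroupJoinSketch
    {
        public record Manufacturer(int ManufacturerId, string Name);
        public record Car(string Name, int ManufacturerId);

        // Builds one summary line per manufacturer, listing its cars.
        public static List<string> Summarize(
            IEnumerable<Manufacturer> manufacturers, IEnumerable<Car> cars)
        {
            return manufacturers.GroupJoin(
                    cars,
                    manufacturer => manufacturer.ManufacturerId,
                    car => car.ManufacturerId,
                    // matchingCars holds every car for this manufacturer.
                    (manufacturer, matchingCars) =>
                        $"{manufacturer.Name}: {string.Join(", ", matchingCars.Select(c => c.Name))}")
                .ToList();
        }

        public static void Main()
        {
            var manufacturers = new List<Manufacturer>
            {
                new(1, "Ford"),
                new(2, "BMW"),
                new(3, "VW")
            };
            var cars = new List<Car>
            {
                new("Focus", 1),
                new("Galaxy", 1),
                new("GT40", 1),
                new("1 Series", 2),
                new("2 Series", 2),
                new("Golf", 3),
                new("Polo", 3)
            };

            // Prints:
            // Ford: Focus, Galaxy, GT40
            // BMW: 1 Series, 2 Series
            // VW: Golf, Polo
            foreach (var line in Summarize(manufacturers, cars))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

A manufacturer with no cars would still appear in the results, with an empty group, which is the main practical difference from Join.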
Before you finish exploring LINQ, there is one more area related to LINQ Query Expressions: the let clause.
In earlier Query Expressions, you were often required to repeat similar-looking code in various clauses. Using a let clause, you can introduce new variables inside a Query Expression and reuse the variable's value throughout the rest of the query. For example, consider the following query:
var stations = new List<string>
{
"Kings Cross KGX",
"Liverpool Street LVS",
"Euston EUS",
"New Street NST"
};
var query1 = from station in stations
where station[^3..] == "LVS" || station[^3..] == "EUS" ||
station[0..^3].Trim().ToUpper().EndsWith("CROSS")
select new { code= station[^3..], name= station[0..^3].Trim().ToUpper()};
Here, you are searching for stations with the LVS or EUS code or a name ending in CROSS. To do this, you must extract the last three characters using a range, station[^3..], but you have duplicated that in two conditions of the where clause and again in the final projection.
The station code and station names could both be converted into local variables using the let clause:
var query2 = from station in stations
let code = station[^3..]
let name = station[0..^3].Trim().ToUpper()
where code == "LVS" || code == "EUS" ||
name.EndsWith("CROSS")
select new {code, name};
Here, you have defined code and name using a let clause and reused them throughout the query. This code looks much neater and is also easier to follow and maintain.
Running the code produces the following output:
Station Codes:
KGX : KINGS CROSS
LVS : LIVERPOOL STREET
EUS : EUSTON
Station Codes (2):
KGX : KINGS CROSS
LVS : LIVERPOOL STREET
EUS : EUSTON
Note
You can find the code used for this example at https://packt.link/b2KiG.
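If you prefer Query Operators, a let clause corresponds roughly to an extra Select that projects each item into a shape carrying the new values. The following sketch assumes the same station strings as before. The LetAsSelectSketch class and its Find helper are invented for illustration, and a tuple is used instead of an anonymous type so that the result can be returned from a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class LetAsSelectSketch
    {
        // Computes code and name once per station, then filters on them.
        public static List<(string Code, string Name)> Find(IEnumerable<string> stations)
        {
            return stations
                .Select(station => (
                    Code: station[^3..],
                    Name: station[0..^3].Trim().ToUpper()))
                .Where(s => s.Code == "LVS" || s.Code == "EUS" ||
                            s.Name.EndsWith("CROSS"))
                .ToList();
        }

        public static void Main()
        {
            var stations = new List<string>
            {
                "Kings Cross KGX",
                "Liverpool Street LVS",
                "Euston EUS",
                "New Street NST"
            };

            // Prints:
            // KGX : KINGS CROSS
            // LVS : LIVERPOOL STREET
            // EUS : EUSTON
            foreach (var (code, name) in Find(stations))
            {
                Console.WriteLine($"{code} : {name}");
            }
        }
    }
}
```

As with the let version, the range expressions are evaluated once per station rather than once per condition.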
By now you have seen the main parts of LINQ. You will now bring these together into an activity that filters a set of flight records based on a user's criteria and provides various statistics on the subset of flights found.
You have been asked to create a console app that allows the user to download publicly available flight data files and apply statistical analysis to the files. This analysis should be used to calculate a count of the total records found, along with the average, minimum, and maximum fare paid within that subset.
The user should be able to enter a number of commands and each command should add a specific filter based on the flight's class, origin, or destination properties. Once the user has entered the required criteria, the go command must be entered, and the console should run a query and output the results.
The data file you will use for this activity contains details of flights made by the UK's HM Treasury department between January 1 and December 31, 2011 (there are 714 records). You will need to use WebClient.DownloadFile to download the data from the following URL: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/245855/HMT_-_2011_Air_Data.csv
Note
The website might open differently for Internet Explorer or Google Chrome. This depends on how IE or Chrome are configured on your machine. Using WebClient.DownloadFile, you can download the data as suggested.
Ideally, the program should download data once and then reread it from the local filesystem each time it is started.
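One way to download only once is to check for a local copy before calling WebClient.DownloadFile. The following is a minimal sketch under stated assumptions, not the activity's solution: the CachedDownloadSketch class, the EnsureLocalCopy helper, and the choice of the temp folder as the storage location are all invented for illustration.

```csharp
using System;
using System.IO;
using System.Net;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class CachedDownloadSketch
    {
        // Calls download only when localPath does not exist yet.
        // Returns true if a download was performed.
        public static bool EnsureLocalCopy(string localPath, Action<string> download)
        {
            if (File.Exists(localPath))
            {
                return false;
            }
            download(localPath);
            return true;
        }

        public static void Main()
        {
            const string url =
                "https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/245855/HMT_-_2011_Air_Data.csv";

            // Assumed location for the local copy; any writable folder would do.
            var localPath = Path.Combine(Path.GetTempPath(), "HMT_-_2011_Air_Data.csv");

            var downloaded = EnsureLocalCopy(localPath, path =>
            {
                // WebClient raises the SYSLIB0014 obsolete warning on newer .NET versions.
                using var client = new WebClient();
                client.DownloadFile(url, path);
            });

            Console.WriteLine(downloaded ? "Downloaded data file" : "Using local copy");
            Console.WriteLine($"Read {File.ReadAllLines(localPath).Length:N0} lines");
        }
    }
}
```

Passing the download step in as a delegate keeps the file check separate from the network call, so the caching logic can be exercised without an internet connection.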
Once downloaded, the data should then be read into a suitable record structure before being added to a collection, which allows various queries to be applied. The output should show the following aggregate values for all rows that match the user's criteria: the record count, and the average, minimum, and maximum fare.
The user should be able to enter the following console commands: class, origin, and destination, each followed by the value to filter on, plus clear and go.
If a user enters multiple filters of the same type, then these should be treated as an OR filter.
An enum can be used to identify the filter criteria type entered, as shown in the following line of code:
enum FilterCriteriaType {Class, Origin, Destination}
Similarly, a record can be used to store each filter type and comparison operand, as follows:
record FilterCriteria(FilterCriteriaType Filter, string Operand)
Each filter specified should be added to a List<FilterCriteria> instance. For example, if the user enters two origin filters, one for dublin and another for london, then the list should contain two objects, each representing an origin type filter.
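One possible way to turn such a list into a query is sketched next. It is not the activity's solution. The Flight record, the FlightFilterSketch class, and the Apply and ValueOf helpers are hypothetical, and the sketch assumes that filters of different types are combined with AND and that comparisons ignore case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Chapter04.Examples
{
    // Hypothetical class used only for this sketch.
    static class FlightFilterSketch
    {
        public enum FilterCriteriaType { Class, Origin, Destination }
        public record FilterCriteria(FilterCriteriaType Filter, string Operand);

        // Assumed shape of a flight row; the real file has more columns.
        public record Flight(string Class, string Origin, string Destination, double PaidFare);

        // Filters of the same type are ORed; different types are ANDed (assumption).
        public static IEnumerable<Flight> Apply(
            IEnumerable<Flight> flights, IReadOnlyCollection<FilterCriteria> filters)
        {
            var query = flights;
            foreach (var group in filters.GroupBy(f => f.Filter))
            {
                var type = group.Key;
                var operands = group.Select(f => f.Operand).ToList();
                // Each Where narrows the previous query; Any gives the OR within a type.
                query = query.Where(flight => operands.Any(operand =>
                    string.Equals(ValueOf(flight, type), operand,
                        StringComparison.OrdinalIgnoreCase)));
            }
            return query;
        }

        private static string ValueOf(Flight flight, FilterCriteriaType type) => type switch
        {
            FilterCriteriaType.Class => flight.Class,
            FilterCriteriaType.Origin => flight.Origin,
            FilterCriteriaType.Destination => flight.Destination,
            _ => throw new ArgumentOutOfRangeException(nameof(type))
        };

        public static void Main()
        {
            // Invented rows, not taken from the real data file.
            var flights = new List<Flight>
            {
                new("Economy", "London", "Zurich", 100.0),
                new("Business Class", "London", "Zurich", 300.0),
                new("Economy", "Dublin", "Zurich", 50.0)
            };
            var filters = new List<FilterCriteria>
            {
                new(FilterCriteriaType.Class, "economy"),
                new(FilterCriteriaType.Class, "business class"),
                new(FilterCriteriaType.Origin, "london")
            };

            var matches = Apply(flights, filters).ToList();
            if (matches.Any())
            {
                Console.WriteLine($"Count={matches.Count}, " +
                    $"Avg={matches.Average(f => f.PaidFare)}, " +
                    $"Min={matches.Min(f => f.PaidFare)}, " +
                    $"Max={matches.Max(f => f.PaidFare)}");
            }
        }
    }
}
```

Because each Where call is only added to the query, nothing is evaluated until ToList runs, which is what allows the query to be assembled at runtime from whatever filters the user entered.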
When the user enters the go command, a query should be built that performs the following steps:
The following steps will help you complete this activity:
public const int Agency = 0;
public const int PaidFare = 1;
The console output should be similar to the following, here listing the commands available to the user:
Commands: go | clear | class value | origin value | destination value
Enter a command:class economy
Added filter: Class=economy
Enter a command:class Business Class
Added filter: Class=business class
Enter a command:origin london
Added filter: Origin=london
Enter a command:destination zurich
Added filter: Destination=zurich
Enter a command:go
Classes: economy OR business class
Destinations: zurich
Origins: london
Results: Count=16, Avg=266.92, Min=-74.71, Max=443.49
Note
The solution to this activity can be found at https://packt.link/qclbF.
In this chapter, you saw how the IEnumerable and ICollection interfaces form the basis of .NET data structures, and how they can be used to store multiple items. You created different types of collections depending on how each collection is meant to be used. You learned that the List collection is the most widely used way to store collections of items, particularly if the number of elements is not known at compile time. You saw that the Stack and Queue types allow the order of items to be handled in a controlled manner, and how the HashSet offers set-based processing, while the Dictionary stores values against a unique key.
You then further explored data structures by using LINQ Query Expressions and Query Operators to apply queries to data, showing how queries can be altered at runtime depending on filtering requirements. You sorted and partitioned data and saw how similar operations can be achieved using both Query Operators and Query Expressions, so you can choose whichever suits the context.
In the next chapter, you will see how parallel and asynchronous code can be used to run complex or long-running operations together.