The end of Chapter 14 showed a query using standard query operators for GroupJoin(), SelectMany(), and Distinct(), in addition to the creation of two anonymous types. The result was a statement that spanned multiple lines and was considerably more complex and harder to comprehend than statements typically written using only features of earlier versions of C#. Modern programs that manipulate rich data sets often require such complex queries; it would therefore be nice if the language made them easier to read. Domain-specific query languages such as SQL make it much easier to read and understand a query, but lack the full power of the C# language. That is why the C# language designers added query expression syntax to C# 3.0. With query expressions, many standard query operator expressions are transformed into more readable code, much like SQL.
In this chapter, we introduce query expressions and use them to express many of the queries from the preceding chapter.
Two of the operations that developers most frequently perform are filtering the collection to eliminate unwanted items and projecting the collection so that the items take a different form. For example, given a collection of files, we could filter it to create a new collection of only the files with a “.cs” extension, or only the files larger than 1 million bytes. We could also project the file collection to create a new collection of paths to the directories where the files are located and the corresponding directory size. Query expressions provide straightforward syntaxes for both of these common operations. Listing 15.1 shows a query expression that filters a collection of strings; Output 15.1 shows the results.
using System;
using System.Collections.Generic;
using System.Linq;
// ...
static string[] Keywords = {
    "abstract", "add*", "alias*", "as", "ascending*",
    "async*", "await*", "base", "bool", "break",
    "by*", "byte", "case", "catch", "char", "checked",
    "class", "const", "continue", "decimal", "default",
    "delegate", "descending*", "do", "double",
    "dynamic*", "else", "enum", "event", "equals*",
    "explicit", "extern", "false", "finally", "fixed",
    "from*", "float", "for", "foreach", "get*", "global*",
    "group*", "goto", "if", "implicit", "in", "int",
    "into*", "interface", "internal", "is", "lock", "long",
    "join*", "let*", "nameof*", "namespace", "new", "null",
    "object", "on*", "operator", "orderby*", "out",
    "override", "params", "partial*", "private", "protected",
    "public", "readonly", "ref", "remove*", "return", "sbyte",
    "sealed", "select*", "set*", "short", "sizeof",
    "stackalloc", "static", "string", "struct", "switch",
    "this", "throw", "true", "try", "typeof", "uint", "ulong",
    "unsafe", "ushort", "using", "value*", "var*", "virtual",
    "unchecked", "void", "volatile", "where*", "while", "yield*"};

private static void ShowContextualKeywords1()
{
    IEnumerable<string> selection =
        from word in Keywords
        where !word.Contains('*')
        select word;

    foreach (string keyword in selection)
    {
        Console.Write(keyword + " ");
    }
}
// ...
abstract as base bool break byte case catch char checked class const
continue decimal default delegate do double else enum event explicit
extern false finally fixed float for foreach goto if implicit in int
interface internal is lock long namespace new null object operator out
override params private protected public readonly ref return sbyte
sealed short sizeof stackalloc static string struct switch this throw
true try typeof uint ulong unsafe ushort using virtual unchecked void
volatile while
In this query expression, selection is assigned the collection of C# reserved keywords. The query expression in this example includes a where clause that filters out the contextual keywords (those marked with an asterisk).
Query expressions always begin with a “from clause” and end with a “select clause” or a “group clause,” identified by the from, select, or group contextual keyword, respectively. The identifier word in the from clause is called a range variable; it represents each item in the collection, much as the loop variable in a foreach loop represents each item in a collection.
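To make the comparison with foreach concrete, the following sketch filters a small array both ways. The sample array and the variable names are illustrative assumptions rather than part of Listing 15.1, and the code uses top-level statements (C# 9 or later) only to stay short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; '*' marks a contextual keyword.
string[] sampleKeywords = { "abstract", "add*", "as", "async*", "base" };

// Query expression: 'word' is the range variable.
IEnumerable<string> fromQuery =
    from word in sampleKeywords
    where !word.Contains('*')
    select word;

// Equivalent loop: 'candidate' plays the same role as the range variable.
List<string> fromLoop = new List<string>();
foreach (string candidate in sampleKeywords)
{
    if (!candidate.Contains('*'))
    {
        fromLoop.Add(candidate);
    }
}

Console.WriteLine(string.Join(" ", fromQuery)); // abstract as base
Console.WriteLine(string.Join(" ", fromLoop));  // abstract as base
```

Both forms visit each item once and keep the same items in the same order; the query expression simply states the intent more directly.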
Developers familiar with SQL will notice that query expressions have a syntax that is similar to that of SQL. This design was deliberate: LINQ was intended to be easy to learn for programmers who already know SQL. However, there are some obvious differences. The first difference that most SQL-experienced developers will notice is that the C# query expression shown here has the clauses in the following order: from, then where, then select. The equivalent SQL query puts the SELECT clause first, then the FROM clause, and finally the WHERE clause.
One reason for this change in sequence is to enable use of IntelliSense, the feature of the IDE whereby the editor produces helpful user interface elements such as drop-down lists that describe the members of a given object. Because from appears first and identifies the string array Keywords as the data source, the code editor can deduce that the range variable word is of type string. When you are entering the code into the editor and reach the dot following word, the editor will display only the members of string.
If the from clause appeared after the select, as it does in SQL, the editor would not know the data type of word as you typed the query, so it would not be able to display a list of word’s members. In Listing 15.1, for example, it wouldn’t be possible to predict that Contains() was a possible member of word.
The C# query expression order also more closely matches the order in which operations are logically performed. When evaluating the query, you begin by identifying the collection (described by the from clause), then filter out the unwanted items (with the where clause), and finally describe the desired result (with the select clause).
Finally, the C# query expression order ensures that the scoping rules for range variables are mostly consistent with the scoping rules for local variables. For example, a range variable must be declared by a clause (typically a from clause) before the variable can be used, much as a local variable must always be declared before it can be used.
The result of a query expression is a collection of type IEnumerable<T> or IQueryable<T>.¹ The actual type T is inferred from the select or group by clause. In Listing 15.1, for example, the compiler knows that Keywords is of type string[], which is convertible to IEnumerable<string>, and deduces that word is therefore of type string. The query ends with select word, which means the result of the query expression must be a collection of strings, so the type of the query expression is IEnumerable<string>.
1. The result of a query expression is, as a practical matter, almost always IEnumerable<T> or a type derived from it. It is legal, though somewhat perverse, to create an implementation of the query methods that returns other types; there is no requirement in the language that the result of a query expression be convertible to IEnumerable<T>.
In this case, the “input” and the “output” of the query are both a collection of strings. However, the “output” type can be quite different from the “input” type if the expression in the select clause is of an entirely different type. Consider the query expression in Listing 15.2, and its corresponding output in Output 15.2.
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
// ...
static void List1(string rootDirectory, string searchPattern)
{
    IEnumerable<string> fileNames = Directory.GetFiles(
        rootDirectory, searchPattern);
    IEnumerable<FileInfo> fileInfos =
        from fileName in fileNames
        select new FileInfo(fileName);

    foreach (FileInfo fileInfo in fileInfos)
    {
        Console.WriteLine(
            $@"{ fileInfo.Name } ({
                fileInfo.LastWriteTime })");
    }
}
// ...
Account.cs (11/22/2011 11:56:11 AM)
Bill.cs (8/10/2011 9:33:55 PM)
Contact.cs (8/19/2011 11:40:30 PM)
Customer.cs (11/17/2011 2:02:52 AM)
Employee.cs (8/17/2011 1:33:22 AM)
Person.cs (10/22/2011 10:00:03 PM)
This query expression results in an IEnumerable<FileInfo> rather than the IEnumerable<string> that holds the result of Directory.GetFiles(). The select clause of the query expression can project out a data type that is different from what was collected by the from clause expression.
In this example, the type FileInfo was chosen because it has the two relevant properties needed for the desired output: the filename and the last write time. There might not be such a convenient type if you needed other information not captured in the FileInfo object. Anonymous types provide a convenient and concise way to project the exact data you need without having to find or create an explicit type. (In fact, this scenario was the key motivator for adding anonymous types to the language.) Listing 15.3 provides output similar to that in Listing 15.2, but via anonymous types rather than FileInfo.
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
// ...
static void List2(string rootDirectory, string searchPattern)
{
    var fileNames = Directory.EnumerateFiles(
        rootDirectory, searchPattern);
    var fileResults =
        from fileName in fileNames
        select new
        {
            Name = fileName,
            LastWriteTime = File.GetLastWriteTime(fileName)
        };

    foreach (var fileResult in fileResults)
    {
        Console.WriteLine(
            $@"{ fileResult.Name } ({
                fileResult.LastWriteTime })");
    }
}
// ...
In this example, the query projects out only the filename and its last write time. A projection such as the one in Listing 15.3 makes little difference when working with something small, such as FileInfo. However, “horizontal” projection that filters down the amount of data associated with each item in the collection is extremely powerful when the amount of data is significant and retrieving it (perhaps from a different computer over the Internet) is expensive. Rather than retrieving all the data when a query executes, anonymous types make it possible to retrieve and store only the required data in the collection.
Imagine, for example, a large database that has tables with 30 or more columns. If there were no anonymous types, developers would be required either to use objects containing unnecessary information or to define small, specialized classes useful only for storing the specific data required. Instead, anonymous types let the compiler define types that contain only the data needed for the immediate scenario. Other scenarios can have a different projection of only the properties needed for that scenario.
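As a rough illustration of such a horizontal projection, the sketch below narrows a five-property item down to the two properties one scenario needs. The customer data and property names are invented for the example, and the source is an in-memory array, so nothing is actually saved here; the benefit described above arises when the source is remote. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;

// Hypothetical "wide" items standing in for rows with many columns.
var customers = new[]
{
    new { Id = 1, Name = "Inigo", Email = "inigo@example.com",
          City = "Florin", Balance = 120.50m },
    new { Id = 2, Name = "Fezzik", Email = "fezzik@example.com",
          City = "Guilder", Balance = 0m },
};

// Project each item down to only the properties this scenario needs.
// The compiler defines an anonymous type with Name and Email properties.
var contacts =
    from customer in customers
    select new { customer.Name, customer.Email };

foreach (var contact in contacts)
{
    Console.WriteLine($"{contact.Name}: {contact.Email}");
}
```

A different scenario could project the same source into another shape, for example only Id and Balance, without any additional class being declared.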
In Listing 15.1, we include a where clause that keeps the reserved keywords and filters out the contextual keywords. This where clause filters the collection “vertically”; if you think of the collection as a vertical list of items, the where clause makes that vertical list shorter so that the collection holds fewer items. The filter criteria are expressed with a predicate: an expression that evaluates to a bool, such as !word.Contains('*') (as in Listing 15.1) or File.GetLastWriteTime(fileName) < DateTime.Now.AddMonths(-1). The latter is shown in Listing 15.6, whose output appears in Output 15.5.
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
// ...
static void FindMonthOldFiles(
    string rootDirectory, string searchPattern)
{
    IEnumerable<FileInfo> files =
        from fileName in Directory.EnumerateFiles(
            rootDirectory, searchPattern)
        where File.GetLastWriteTime(fileName) <
            DateTime.Now.AddMonths(-1)
        select new FileInfo(fileName);

    foreach (FileInfo file in files)
    {
        // As a simplification, rootDirectory is
        // assumed to be the current directory or
        // one of its subdirectories
        string relativePath = file.FullName.Substring(
            Environment.CurrentDirectory.Length);
        Console.WriteLine(
            $".{ relativePath } ({ file.LastWriteTime })");
    }
}
// ...
.\TestData\Bill.cs (8/10/2011 9:33:55 PM)
.\TestData\Contact.cs (8/19/2011 11:40:30 PM)
.\TestData\Employee.cs (8/17/2011 1:33:22 AM)
.\TestData\Person.cs (10/22/2011 10:00:03 PM)
To order the items using a query expression, you can use the orderby clause, as shown in Listing 15.7.
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
// ...
static void ListByFileSize1(
    string rootDirectory, string searchPattern)
{
    IEnumerable<string> fileNames =
        from fileName in Directory.EnumerateFiles(
            rootDirectory, searchPattern)
        orderby (new FileInfo(fileName)).Length descending,
            fileName
        select fileName;

    foreach (string fileName in fileNames)
    {
        Console.WriteLine(fileName);
    }
}
// ...
Listing 15.7 uses the orderby clause to sort the files returned by Directory.EnumerateFiles() first by file size in descending order, and then by filename in ascending order. Multiple sort criteria are separated by commas, such that first the items are ordered by size, and then, if the size is the same, they are ordered by filename. ascending and descending are contextual keywords indicating the sort order direction. Specifying the order as ascending or descending is optional; if the direction is omitted (as it is here on fileName), the default is ascending.
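For comparison with the method call form from Chapter 14, the sketch below applies the same two-level ordering both ways. To keep it runnable without touching the file system, it assumes a small in-memory array of hypothetical name and size pairs in place of real files, and it uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (name, size) pairs standing in for files on disk.
var files = new[]
{
    new { Name = "b.cs", Length = 200L },
    new { Name = "a.cs", Length = 200L },
    new { Name = "c.cs", Length = 900L },
};

// Query expression: size descending, then name ascending (the default).
IEnumerable<string> byQuery =
    from file in files
    orderby file.Length descending, file.Name
    select file.Name;

// The same ordering written with standard query operators.
IEnumerable<string> byMethods = files
    .OrderByDescending(file => file.Length)
    .ThenBy(file => file.Name)
    .Select(file => file.Name);

Console.WriteLine(string.Join(", ", byQuery));   // c.cs, a.cs, b.cs
Console.WriteLine(string.Join(", ", byMethods)); // c.cs, a.cs, b.cs
```

The second sort key only matters for the two items of equal size, which is why a.cs precedes b.cs even though b.cs comes first in the source.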
Listing 15.8 includes a query that is very similar to the query in Listing 15.7, except that the type argument of IEnumerable<T> is FileInfo. Notice that there is a problem with this query: We have to create a FileInfo twice, in both the orderby clause and the select clause.
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
// ...
static void ListByFileSize2(
    string rootDirectory, string searchPattern)
{
    IEnumerable<FileInfo> files =
        from fileName in Directory.EnumerateFiles(
            rootDirectory, searchPattern)
        orderby new FileInfo(fileName).Length, fileName
        select new FileInfo(fileName);

    foreach (FileInfo file in files)
    {
        // As a simplification, rootDirectory is
        // assumed to be the current directory or
        // one of its subdirectories
        string relativePath = file.FullName.Substring(
            Environment.CurrentDirectory.Length);
        Console.WriteLine(
            $".{ relativePath }({ file.Length })");
    }
}
// ...
Unfortunately, although the end result is correct, Listing 15.8 ends up instantiating a FileInfo object twice for each item in the source collection, which is wasteful. To avoid this kind of unnecessary and potentially expensive overhead, you can use a let clause, as demonstrated in Listing 15.9.
// ...
IEnumerable<FileInfo> files =
    from fileName in Directory.EnumerateFiles(
        rootDirectory, searchPattern)
    let file = new FileInfo(fileName)
    orderby file.Length, fileName
    select file;
// ...
The let clause introduces a new range variable that can hold the value of an expression that is used throughout the remainder of the query expression. You can add as many let clauses as you like; simply add each as an additional clause to the query after the first from clause but before the final select/group by clause.
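As a small sketch of several let clauses in one query, the example below computes two intermediate values per item and then uses both in later clauses. The sample array and the names isContextual and name are assumptions made for illustration; top-level statements (C# 9 or later) are assumed as well.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; '*' marks a contextual keyword.
string[] sampleKeywords = { "abstract", "add*", "as", "async*" };

var results =
    from word in sampleKeywords
    // Each let clause adds a range variable usable in all later clauses.
    let isContextual = word.EndsWith("*", StringComparison.Ordinal)
    let name = word.TrimEnd('*')
    orderby name.Length, name
    select new { Name = name, IsContextual = isContextual };

foreach (var result in results)
{
    Console.WriteLine(
        $"{result.Name} (contextual: {result.IsContextual})");
}
```

The items come out ordered by the length of the trimmed name (as, add, async, abstract), and each expression is evaluated once per item rather than once per clause that uses it.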
A common data manipulation scenario is the grouping of related items. In SQL, this generally involves aggregating the items to produce a summary or total or other aggregate value. LINQ, however, is notably more expressive. LINQ expressions allow for individual items to be grouped into a series of subcollections, and those groups can then be associated with items in the collection being queried. For example, Listing 15.10 and Output 15.6 demonstrate how to group together the contextual keywords and the regular keywords.
using System;
using System.Collections.Generic;
using System.Linq;
// ...
private static void GroupKeywords1()
{
    IEnumerable<IGrouping<bool, string>> selection =
        from word in Keywords
        group word by word.Contains('*');

    foreach (IGrouping<bool, string> wordGroup
        in selection)
    {
        Console.WriteLine(Environment.NewLine + "{0}:",
            wordGroup.Key ?
                "Contextual Keywords" : "Keywords");
        foreach (string keyword in wordGroup)
        {
            Console.Write(" " +
                (wordGroup.Key ?
                    keyword.Replace("*", null) : keyword));
        }
    }
}
// ...
Keywords:
abstract as base bool break byte case catch char checked class
const continue decimal default delegate do double else enum event
explicit extern false finally fixed float for foreach goto if
implicit in int interface internal is lock long namespace new null
object operator out override params private protected public
readonly ref return sbyte sealed short sizeof stackalloc static
string struct switch this throw true try typeof uint ulong unsafe
ushort using virtual unchecked void volatile while
Contextual Keywords:
add alias ascending async await by descending dynamic equals from
get global group into join let nameof on orderby partial remove
select set value var where yield
There are several things to note in this listing. First, the query result is a sequence of elements of type IGrouping<bool, string>. The first type argument indicates that the “group key” expression following by was of type bool, and the second type argument indicates that the “group element” expression following group was of type string. That is, the query produces a sequence of groups where the Boolean key is the same for each string in the group.
Because a query with a group by clause produces a sequence of collections, the common pattern for iterating over the results is to create nested foreach loops. In Listing 15.10, the outer loop iterates over the groupings and prints out the type of keyword as a header. The nested foreach loop prints each keyword in the group as an item below the header.
The result of this query expression is itself a sequence, which you can then query like any other sequence. Listing 15.11 and Output 15.7 show how to create an additional query that adds a projection onto a query that produces a sequence of groups. (The next section, on query continuations, shows a preferable syntax for adding more query clauses to a complete query.)
using System;
using System.Collections.Generic;
using System.Linq;
// ...
private static void GroupKeywords1()
{
    IEnumerable<IGrouping<bool, string>> keywordGroups =
        from word in Keywords
        group word by word.Contains('*');

    var selection =
        from groups in keywordGroups
        select new
        {
            IsContextualKeyword = groups.Key,
            Items = groups
        };

    foreach (var wordGroup in selection)
    {
        Console.WriteLine(Environment.NewLine + "{0}:",
            wordGroup.IsContextualKeyword ?
                "Contextual Keywords" : "Keywords");
        foreach (var keyword in wordGroup.Items)
        {
            Console.Write(" " +
                keyword.Replace("*", null));
        }
    }
}
// ...
Keywords:
abstract as base bool break byte case catch char checked class
const continue decimal default delegate do double else enum
event explicit extern false finally fixed float for foreach goto if
implicit in int interface internal is lock long namespace new null
object operator out override params private protected public
readonly ref return sbyte sealed short sizeof stackalloc static
string struct switch this throw true try typeof uint ulong unsafe
ushort using virtual unchecked void volatile while
Contextual Keywords:
add alias ascending async await by descending dynamic equals from
get global group into join let nameof on orderby partial remove
select set value var where yield
The group clause results in a query that produces a collection of IGrouping<TKey, TElement> objects, just as the GroupBy() standard query operator did (see Chapter 14). The select clause in the subsequent query uses an anonymous type to effectively rename IGrouping<TKey, TElement>.Key to IsContextualKeyword and to name the subcollection property Items. With this change, the nested foreach loop uses wordGroup.Items rather than wordGroup directly, as it did in Listing 15.10. Another potential property to add to the anonymous type would be a count of the items within the subcollection. This functionality is already available through wordGroup.Items.Count(), so the benefit of adding it to the anonymous type directly is questionable.
As we saw in Listing 15.11, you can use an existing query as the input to a second query. However, it is not necessary to write an entirely new query expression when you want to use the results of one query as the input to another. You can extend any query with a query continuation clause using the contextual keyword into. A query continuation is nothing more than syntactic sugar for creating two queries and using the first as the input to the second. The range variable introduced by the into clause (groups in Listing 15.12) becomes the range variable for the remainder of the query; any previous range variables are logically a part of the earlier query and cannot be used in the query continuation. Listing 15.12 shows how to rewrite the code of Listing 15.11 to use a query continuation instead of two queries.
using System;
using System.Collections.Generic;
using System.Linq;
// ...
private static void GroupKeywords1()
{
    var selection =
        from word in Keywords
        group word by word.Contains('*')
        into groups
        select new
        {
            IsContextualKeyword = groups.Key,
            Items = groups
        };
    // ...
}
// ...
The ability to run additional queries on the results of an existing query using into is not specific to queries ending with group clauses, but rather can be applied to all query expressions. Query continuation is simply a shorthand for writing query expressions that consume the results of other query expressions. You can think of into as a “pipeline operator,” because it “pipes” the results of the first query into the second query. You can arbitrarily chain together many queries in this way.
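A continuation after a select clause might look like the following sketch. The sample array and the range variable name are illustrative assumptions, and top-level statements (C# 9 or later) are used for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; '*' marks a contextual keyword.
string[] sampleKeywords = { "abstract", "add*", "as", "async*", "await*" };

// The first query (everything before 'into') selects the contextual
// keywords without their '*' marker. The continuation then filters and
// orders those names; 'word' is no longer in scope after 'into'.
IEnumerable<string> selection =
    from word in sampleKeywords
    where word.Contains('*')
    select word.TrimEnd('*')
    into name
    where name.Length > 3
    orderby name descending
    select name;

Console.WriteLine(string.Join(" ", selection)); // await async
```

The same result could be produced by storing the first query in a variable and writing a second query over it; into merely removes the intermediate variable.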
It is often desirable to “flatten” a sequence of sequences into a single sequence. For example, each member of a sequence of customers might have an associated sequence of orders, or each member of a sequence of directories might have an associated sequence of files. The SelectMany sequence operator (discussed in Chapter 14) concatenates together all the subsequences; to do the same thing with query expression syntax, you can use multiple from clauses, as shown in Listing 15.13.
var selection =
    from word in Keywords
    from character in word
    select character;
The preceding query will produce the sequence of characters a, b, s, t, r, a, c, t, a, d, d, *, a, l, i, a, ....
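The correspondence with SelectMany() can be checked on a shorter input. In the sketch below, the two-element array is an assumption chosen so that the whole result fits on one line, and the method call shown is one that yields the same sequence, not necessarily the exact call the compiler generates. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical short input so the flattened result is easy to read.
string[] sampleKeywords = { "as", "by*" };

// Two from clauses flatten each word into its characters.
IEnumerable<char> byQuery =
    from word in sampleKeywords
    from character in word
    select character;

// The same flattening with the SelectMany() standard query operator;
// a string is itself a sequence of char.
IEnumerable<char> byMethod = sampleKeywords.SelectMany(word => word);

Console.WriteLine(string.Join(", ", byQuery));  // a, s, b, y, *
Console.WriteLine(string.Join(", ", byMethod)); // a, s, b, y, *
```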
Multiple from clauses can also be used to produce the Cartesian product (the set of all possible combinations of several sequences), as shown in Listing 15.14.
var numbers = new[] { 1, 2, 3 };
var product =
    from word in Keywords
    from number in numbers
    select new { word, number };
This would produce a sequence of pairs (abstract, 1), (abstract, 2), (abstract, 3), (add*, 1), (add*, 2), ....
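To see the size and order of such a product on a small scale, the following sketch pairs two words with three numbers and compares the result with a SelectMany() call that takes a result selector. The two-word array is an assumption made to keep the output short; top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;

// Hypothetical small inputs: 2 words x 3 numbers = 6 combinations.
string[] words = { "as", "by" };
int[] numbers = { 1, 2, 3 };

var product =
    from word in words
    from number in numbers
    select new { word, number };

// Method call form: the second lambda builds each result pair.
var byMethod = words.SelectMany(
    word => numbers,
    (word, number) => new { word, number });

foreach (var pair in product)
{
    Console.WriteLine($"({pair.word}, {pair.number})");
}
```

The first sequence varies slowest: all three numbers are paired with as before by is visited.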
Somewhat surprisingly, adding query expressions to C# 3.0 required no changes to the CLR or to the CIL language. Rather, the C# compiler simply translates query expressions into a series of method calls. Consider, for example, the query expression from Listing 15.1, a portion of which appears in Listing 15.16.
private static void ShowContextualKeywords1()
{
    IEnumerable<string> selection =
        from word in Keywords
        where !word.Contains('*')
        select word;
    // ...
}
// ...
After compilation, the expression from Listing 15.16 is converted to an IEnumerable<T> extension method call from System.Linq.Enumerable, as shown in Listing 15.17. Because select word merely returns the range variable unchanged, the compiler emits no Select() call for it.
private static void ShowContextualKeywords3()
{
    IEnumerable<string> selection =
        Keywords.Where(word => !word.Contains('*'));
    // ...
}
// ...
As discussed in Chapter 14, the lambda expression is then itself translated by the compiler to emit a method with the body of the lambda, and the usage of it becomes allocation of a delegate to that method.
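The following sketch approximates that second translation step by hand for a predicate in the style of Listing 15.17. The method name IsReserved and the sample array are assumptions; the method the compiler actually generates has a compiler-chosen name, and the compiler may also cache the delegate, which this sketch does not attempt to show. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; '*' marks a contextual keyword.
string[] sampleKeywords = { "abstract", "add*", "as", "async*" };

// Hand-written stand-in for the method emitted from the lambda body.
static bool IsReserved(string word) => !word.Contains('*');

// The lambda's use site becomes a delegate that refers to that method.
Func<string, bool> predicate = new Func<string, bool>(IsReserved);

IEnumerable<string> selection = sampleKeywords.Where(predicate);

Console.WriteLine(string.Join(" ", selection)); // abstract as
```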
Every query expression can (and must) be translated into method calls, but not every sequence of method calls has a corresponding query expression. For example, there is no query expression equivalent for the extension method TakeWhile<T>(Func<T, bool> predicate), which repeatedly returns items from the collection as long as the predicate returns true.
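When such an operator is needed, one option is to call it on a parenthesized query expression, as in this sketch. The sample array and variable names are illustrative assumptions, and top-level statements (C# 9 or later) are used for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; '*' marks a contextual keyword.
string[] sampleKeywords = { "abstract", "as", "base", "add*", "bool" };

// TakeWhile() has no query expression keyword, so it is called as a
// method on the parenthesized query expression.
IEnumerable<string> leading =
    (from word in sampleKeywords
     select word.ToUpperInvariant())
    .TakeWhile(upper => !upper.Contains('*'));

Console.WriteLine(string.Join(" ", leading)); // ABSTRACT AS BASE
```

Unlike a where clause, TakeWhile() stops at the first item that fails the predicate, so BOOL is not returned even though it would pass.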
For those queries that do have both a method call form and a query expression form, which is better? This is a judgment call; some queries are better suited for query expressions, whereas others are more readable as method invocations.
Guidelines
DO use query expression syntax to make queries easier to read, particularly if they involve complex from, let, join, or group clauses.
CONSIDER using the standard query operators (method call form) if the query involves operations that do not have a query expression syntax, such as Count(), TakeWhile(), or Distinct().
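The two forms can also be combined in a single statement. The sketch below, built on an assumed sample array, lets a query expression do the filtering and projection and then applies Distinct() and Count() as ordinary method calls. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; '*' marks a contextual keyword.
string[] sampleKeywords = { "abstract", "add*", "as", "async*", "base" };

// The query expression handles filtering and projection...
var firstLetters =
    (from word in sampleKeywords
     where !word.Contains('*')
     select word[0])
    // ...and a method call supplies what has no query keyword.
    .Distinct();

Console.WriteLine(string.Join(" ", firstLetters)); // a b
Console.WriteLine(firstLetters.Count());           // 2
```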
This chapter introduced a new syntax—namely, query expressions. Readers familiar with SQL will immediately see the similarities between query expressions and SQL. However, query expressions also introduce additional functionality, such as grouping into a hierarchical set of new objects, which is unavailable with SQL. All of the functionality of query expressions was already available via standard query operators, but query expressions frequently provide a simpler syntax for expressing such a query. Whether through standard query operators or query expression syntax, however, the end result is a significant improvement in the way developers can code against collection APIs—an improvement that ultimately provides a paradigm shift in the way object-oriented languages are able to interface with relational databases.
In the next chapter, we continue our discussion of collections, by investigating some of the .NET Framework collection types and exploring how to define custom collections.