LINQ, or Language Integrated Query, allows you to write structured type-safe queries over local object collections and remote data sources.
LINQ lets you query any collection implementing IEnumerable<>, whether an array, list, XML DOM, or remote data source (such as a table in SQL Server). LINQ offers the benefits of both compile-time type checking and dynamic query composition.
A good way to experiment with LINQ is to download LINQPad at http://www.linqpad.net. LINQPad lets you interactively query local collections and SQL databases in LINQ without any setup and is preloaded with numerous examples.
The basic units of data in LINQ are sequences and elements. A sequence is any object that implements the generic IEnumerable<T> interface, and an element is each item in the sequence. In the following example, names is a sequence, and Tom, Dick, and Harry are elements:
string[] names = { "Tom", "Dick", "Harry" };
We call a sequence such as this a local sequence because it represents a local collection of objects in memory.
A query operator is a method that transforms a sequence. A typical query operator accepts an input sequence and emits a transformed output sequence. In the Enumerable class in System.Linq, there are around 40 query operators, all implemented as static extension methods. These are called standard query operators.
LINQ also supports sequences that can be dynamically fed from a remote data source such as SQL Server. These sequences additionally implement the IQueryable<> interface and are supported through a matching set of standard query operators in the Queryable class.
A query is an expression that transforms sequences with one or more query operators. The simplest query comprises one input sequence and one operator. For instance, we can apply the Where operator on a simple array to extract those names whose length is at least four characters as follows:
string[] names = { "Tom", "Dick", "Harry" };
IEnumerable<string> filteredNames =
  System.Linq.Enumerable.Where (names, n => n.Length >= 4);
foreach (string n in filteredNames)
  Console.Write (n + "|");            // Dick|Harry|
Because the standard query operators are implemented as extension methods, we can call Where directly on names, as though it were an instance method:
IEnumerable<string> filteredNames = names.Where (n => n.Length >= 4);
(For this to compile, you must import the System.Linq namespace with a using directive.) The Where method in System.Linq.Enumerable has the following signature:
static IEnumerable<TSource> Where<TSource> (
  this IEnumerable<TSource> source,
  Func<TSource,bool> predicate)
source is the input sequence; predicate is a delegate that is invoked on each input element. The Where method includes in the output sequence all elements for which the delegate returns true. Internally, it’s implemented with an iterator. Here’s the essence of its source code:
foreach (TSource element in source)
  if (predicate (element))
    yield return element;
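To see the iterator pattern in a form you can compile, here is a minimal sketch of a Where-like extension method. The name MyWhere and the class IteratorSketch are hypothetical, chosen to avoid clashing with the real operator, and the real Enumerable.Where does more work (argument checking, for instance) than this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorSketch
{
    // Hypothetical stand-in for Enumerable.Where (not the library's actual code).
    public static IEnumerable<TSource> MyWhere<TSource> (
        this IEnumerable<TSource> source, Func<TSource,bool> predicate)
    {
        foreach (TSource element in source)
            if (predicate (element))
                yield return element;   // Produced only as the caller enumerates
    }

    public static void Main()
    {
        string[] names = { "Tom", "Dick", "Harry" };
        foreach (string name in names.MyWhere (n => n.Length >= 4))
            Console.Write (name + "|");  // Dick|Harry|
    }
}
```

Because the method body contains yield return, the compiler turns it into an iterator, so nothing runs until the result is enumerated.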
Another fundamental query operator is the Select method. This transforms (projects) each element in the input sequence with a given lambda expression:
string[] names = { "Tom", "Dick", "Harry" };
IEnumerable<string> upperNames = names.Select (n => n.ToUpper());
foreach (string n in upperNames)
  Console.Write (n + "|");            // TOM|DICK|HARRY|
A query can project into an anonymous type:
var query = names.Select (n => new { Name = n, Length = n.Length });
foreach (var row in query)
  Console.WriteLine (row);
Here’s the result:
{ Name = Tom, Length = 3 }
{ Name = Dick, Length = 4 }
{ Name = Harry, Length = 5 }
The original ordering of elements within an input sequence is significant in LINQ. Some query operators rely on this behavior, such as Take, Skip, and Reverse. The Take operator outputs the first x elements, discarding the rest:
int[] numbers = { 10, 9, 8, 7, 6 };
IEnumerable<int> firstThree = numbers.Take (3);   // firstThree is { 10, 9, 8 }
The Skip operator ignores the first x elements, and outputs the rest:
IEnumerable<int> lastTwo = numbers.Skip (3);      // lastTwo is { 7, 6 }
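Take and Skip are often combined to extract one "page" of a sequence. The following is a sketch of that idea; the helper name Page and its parameters are hypothetical, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Hypothetical helper: returns the zero-based page of the given size.
    public static IEnumerable<T> Page<T> (IEnumerable<T> source, int pageIndex, int pageSize)
        => source.Skip (pageIndex * pageSize).Take (pageSize);

    public static void Main()
    {
        int[] numbers = { 10, 9, 8, 7, 6 };
        Console.WriteLine (string.Join (",", Page (numbers, 0, 2)));   // 10,9
        Console.WriteLine (string.Join (",", Page (numbers, 2, 2)));   // 6
    }
}
```

The last page is simply shorter: Take outputs however many elements remain.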
Not all query operators return a sequence. The element operators extract one element from the input sequence; examples are First, Last, Single, and ElementAt:
int[] numbers = { 10, 9, 8, 7, 6 };
int firstNumber  = numbers.First();                  // 10
int lastNumber   = numbers.Last();                   // 6
int secondNumber = numbers.ElementAt (1);            // 9
int firstOddNum  = numbers.First (n => n%2 == 1);    // 9
All of these operators throw an exception if no elements are present. To get a default value (null for reference types, or zero for numeric types such as int) instead of an exception, use FirstOrDefault, LastOrDefault, SingleOrDefault, or ElementAtOrDefault.
The Single and SingleOrDefault methods are equivalent to First and FirstOrDefault except that they throw an exception if there’s more than one match. This behavior is useful when querying a database table for a row by primary key.
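The difference between these variants is easiest to see side by side. The sketch below assumes nothing beyond the standard operators; the class name and variable names are illustrative only.

```csharp
using System;
using System.Linq;

public static class ElementOperatorSketch
{
    public static void Main()
    {
        int[] numbers = { 10, 9, 8, 7, 6 };
        string[] noNames = { };

        int firstOver20 = numbers.FirstOrDefault (n => n > 20);   // 0 (default for int)
        string firstName = noNames.FirstOrDefault();                // null
        int onlyNine = numbers.Single (n => n == 9);                // 9

        Console.WriteLine (firstOver20);
        Console.WriteLine (firstName == null);
        Console.WriteLine (onlyNine);

        try
        {
            numbers.Single (n => n > 8);     // Two matches: 10 and 9
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine ("Single threw: more than one match");
        }
    }
}
```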
The aggregation operators return a scalar value, usually of numeric type. The most commonly used aggregation operators are Count, Min, Max, and Average:
int[] numbers = { 10, 9, 8, 7, 6 };
int count  = numbers.Count();       // 5
int min    = numbers.Min();         // 6
int max    = numbers.Max();         // 10
double avg = numbers.Average();     // 8
Count accepts an optional predicate, which indicates whether to include a given element. The following counts all even numbers:
int evenNums = numbers.Count (n => n % 2 == 0); // 3
The Min, Max, and Average operators accept an optional argument that transforms each element prior to it being aggregated:
int maxRemainderAfterDivBy5 = numbers.Max (n => n % 5); // 4
The following calculates the root-mean-square of numbers:
double rms = Math.Sqrt (numbers.Average (n => n * n));
The quantifiers return a bool value. The quantifiers are as follows: Contains, Any, All, and SequenceEqual (which compares two sequences):
int[] numbers = { 10, 9, 8, 7, 6 };
bool hasTheNumberNine = numbers.Contains (9);              // true
bool hasMoreThanZeroElements = numbers.Any();              // true
bool hasOddNum = numbers.Any (n => n % 2 == 1);            // true
bool allOddNums = numbers.All (n => n % 2 == 1);           // false
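The quantifier that compares two sequences is not shown in the listing above, so here is a small sketch of it. It assumes the default equality comparer for int; the array names are illustrative.

```csharp
using System;
using System.Linq;

public static class SequenceEqualSketch
{
    public static void Main()
    {
        int[] numbers   = { 10, 9, 8, 7, 6 };
        int[] sameOrder = { 10, 9, 8, 7, 6 };
        int[] reversed  = { 6, 7, 8, 9, 10 };

        // True only if both sequences have equal elements in the same order.
        Console.WriteLine (numbers.SequenceEqual (sameOrder));                      // True
        Console.WriteLine (numbers.SequenceEqual (reversed));                       // False
        Console.WriteLine (numbers.SequenceEqual (Enumerable.Reverse (reversed)));  // True
    }
}
```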
The set operators accept two same-typed input sequences. Concat appends one sequence to another; Union does the same but with duplicates removed:
int[] seq1 = { 1, 2, 3 }, seq2 = { 3, 4, 5 };
IEnumerable<int>
  concat = seq1.Concat (seq2),      // { 1, 2, 3, 3, 4, 5 }
  union  = seq1.Union (seq2);       // { 1, 2, 3, 4, 5 }
The other two operators in this category are Intersect and Except:
IEnumerable<int>
  commonality = seq1.Intersect (seq2),   // { 3 }
  difference1 = seq1.Except (seq2),      // { 1, 2 }
  difference2 = seq2.Except (seq1);      // { 4, 5 }
An important feature of many query operators is that they execute not when constructed, but when enumerated (in other words, when MoveNext is called on the enumerator). Consider the following query:
var numbers = new List<int>();
numbers.Add (1);
IEnumerable<int> query = numbers.Select (n => n * 10);
numbers.Add (2);                      // Sneak in an extra element
foreach (int n in query)
  Console.Write (n + "|");            // 10|20|
The extra number that we sneaked into the list after constructing the query is included in the result, because it’s not until the foreach statement runs that any projection, filtering, or sorting takes place. This is called deferred or lazy evaluation. Deferred execution decouples query construction from query execution, allowing you to construct a query in several steps, as well as making it possible to query a database without retrieving all the rows to the client. All standard query operators provide deferred execution, with the following exceptions: operators that return a single element or scalar value (the element operators, aggregation operators, and quantifiers), and the conversion operators ToArray, ToList, ToDictionary, and ToLookup.
The conversion operators are handy, in part, because they defeat lazy evaluation. This can be useful to “freeze” or cache the results at a certain point in time, to avoid reexecuting a computationally intensive or remotely sourced query such as a LINQ to SQL table. (A side effect of lazy evaluation is that the query gets reevaluated should you later reenumerate it.)
The following example illustrates the ToList operator:
var numbers = new List<int>() { 1, 2 };
List<int> timesTen = numbers
.Select (n => n * 10)
.ToList(); // Executes immediately into a List<int>
numbers.Clear();
Console.WriteLine (timesTen.Count); // Still 2
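The reevaluation side effect mentioned above can be observed by counting how often the projection lambda runs. This is a sketch only; the counter and the method name CountCalls are hypothetical instrumentation, not something LINQ provides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReevaluationSketch
{
    // Returns the lambda call count observed at three points in time.
    public static int[] CountCalls()
    {
        int calls = 0;
        var numbers = new List<int> { 1, 2 };

        // Deferred: constructing the query does not run the lambda.
        IEnumerable<int> query = numbers.Select (n => { calls++; return n * 10; });
        int afterConstruction = calls;

        foreach (int x in query) { }         // First enumeration: two calls
        foreach (int x in query) { }         // Second enumeration: two more
        int afterTwoEnumerations = calls;

        List<int> cached = query.ToList();   // Executes once more, then caches
        foreach (int x in cached) { }        // Reading the list runs no lambda
        foreach (int x in cached) { }
        int afterCaching = calls;

        return new[] { afterConstruction, afterTwoEnumerations, afterCaching };
    }

    public static void Main()
    {
        Console.WriteLine (string.Join (",", CountCalls()));   // 0,4,6
    }
}
```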
Subqueries provide another level of indirection. Everything in a subquery is subject to deferred execution, including aggregation and conversion methods, because the subquery is itself executed only lazily upon demand. Assuming names is a string array, a subquery looks like this:
names.Where (
  n => n.Length == names.Min (n2 => n2.Length)
)
It is possible to divide the standard query operators (as implemented in the System.Linq.Enumerable class) into 12 categories, summarized in Table 1-1.
Table 1-1. Query operator categories
Category | Description | Deferred execution? |
---|---|---|
Filtering | Returns a subset of elements that satisfy a given condition | Yes |
Projecting | Transforms each element with a lambda function, optionally expanding subsequences | Yes |
Joining | Meshes elements of one collection with another, using a time-efficient lookup strategy | Yes |
Ordering | Returns a reordering of a sequence | Yes |
Grouping | Groups a sequence into subsequences | Yes |
Set | Accepts two same-typed sequences, and returns their commonality, sum, or difference | Yes |
Element | Picks a single element from a sequence | No |
Aggregation | Performs a computation over a sequence, returning a scalar value (typically a number) | No |
Quantification | Performs a computation over a sequence, returning true or false | No |
Conversion: Import | Converts a nongeneric sequence to a (queryable) generic sequence | Yes |
Conversion: Export | Converts a sequence to an array, list, dictionary, or lookup, forcing immediate evaluation | No |
Generation | Manufactures a simple sequence | Yes |
Table 1-2 through Table 1-13 summarize each query operator. The operators shown in bold have special support in C# (see Query Expressions).
Table 1-2. Filtering operators
Method | Description |
---|---|
Where | Returns a subset of elements that satisfy a given condition |
Take | Returns the first x elements, and discards the rest |
Skip | Ignores the first x elements, and returns the rest |
TakeWhile | Emits elements from the input sequence until the given predicate is false |
SkipWhile | Ignores elements from the input sequence until the given predicate is false, and then emits the rest |
Distinct | Returns a collection that excludes duplicates |
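The last three filtering operators are not demonstrated elsewhere in this section, so here is a brief sketch. The input array is made up for illustration.

```csharp
using System;
using System.Linq;

public static class FilteringSketch
{
    public static readonly int[] Numbers = { 3, 5, 2, 234, 4, 1, 5 };

    public static void Main()
    {
        // Emits elements while the predicate holds, stopping at the first failure.
        Console.WriteLine (string.Join (",", Numbers.TakeWhile (n => n < 100)));  // 3,5,2

        // Skips elements while the predicate holds, then emits everything else.
        Console.WriteLine (string.Join (",", Numbers.SkipWhile (n => n < 100)));  // 234,4,1,5

        // Removes duplicates (the second 5 is dropped).
        Console.WriteLine (string.Join (",", Numbers.Distinct()));                // 3,5,2,234,4,1
    }
}
```

Note that TakeWhile and SkipWhile stop testing once the predicate first fails: the 4, 1, and 5 after 234 are not reconsidered.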
Table 1-8. Element operators
Method | Description |
---|---|
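First, FirstOrDefault | Returns the first element in the sequence, or the first element satisfying a given predicate |
Last, LastOrDefault | Returns the last element in the sequence, or the last element satisfying a given predicate |
Single, SingleOrDefault | Equivalent to First/FirstOrDefault, but throws an exception if there is more than one match |
ElementAt, ElementAtOrDefault | Returns the element at the specified position |
DefaultIfEmpty | Returns a single-value sequence whose value is null or default(TSource) if the sequence has no elements |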
Table 1-9. Aggregation operators
Method | Description |
---|---|
Count, LongCount | Returns the total number of elements in the input sequence, or the number of elements satisfying a given predicate |
Min, Max | Returns the smallest or largest element in the sequence |
Sum, Average | Calculates a numeric sum or average over elements in the sequence |
Aggregate | Performs a custom aggregation |
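Sum, LongCount, and Aggregate do not appear in the earlier examples. The sketch below shows them on the same numbers array; the Product helper is hypothetical and simply wraps an Aggregate call with a seed of 1.

```csharp
using System;
using System.Linq;

public static class AggregationSketch
{
    // Hypothetical helper: multiplies all elements via a custom aggregation.
    public static int Product (int[] numbers)
        => numbers.Aggregate (1, (runningTotal, n) => runningTotal * n);

    public static void Main()
    {
        int[] numbers = { 10, 9, 8, 7, 6 };

        Console.WriteLine (numbers.Sum());         // 40
        Console.WriteLine (numbers.LongCount());   // 5 (returned as a long)
        Console.WriteLine (Product (numbers));     // 30240
    }
}
```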
To build more complex queries, you chain query operators together. For example, the following query extracts all strings containing the letter a, sorts them by length, and then converts the results to uppercase:
string[] names = { "Tom","Dick","Harry","Mary","Jay" };
IEnumerable<string> query = names
  .Where   (n => n.Contains ("a"))
  .OrderBy (n => n.Length)
  .Select  (n => n.ToUpper());
foreach (string name in query)
  Console.Write (name + "|");         // RESULT: JAY|MARY|HARRY|
Where, OrderBy, and Select are all standard query operators that resolve to extension methods in the Enumerable class. The Where operator emits a filtered version of the input sequence; OrderBy emits a sorted version of its input sequence; Select emits a sequence where each input element is transformed or projected with a given lambda expression (n.ToUpper(), in this case). Data flows from left to right through the chain of operators, so the data is first filtered, then sorted, then projected. The end result resembles a production line of conveyor belts, as illustrated in Figure 1-6.
Deferred execution is honored throughout the chain of operators, so no filtering, sorting, or projecting takes place until the query is actually enumerated.
So far, we’ve written queries by calling extension methods in the Enumerable class. In this book, we describe this as fluent syntax. C# also provides special language support for writing queries, called query expressions. Here’s the preceding query expressed as a query expression:
IEnumerable<string> query =
  from n in names
  where n.Contains ("a")
  orderby n.Length
  select n.ToUpper();
A query expression always starts with a from clause, and ends with either a select or group clause. The from clause declares a range variable (in this case, n), which you can think of as traversing the input collection, rather like foreach. Figure 1-7 illustrates the complete syntax.
If you’re familiar with SQL, LINQ’s query expression syntax (with the from clause first and the select clause last) might look bizarre. Query expression syntax is actually more logical because the clauses appear in the order they’re executed. This allows Visual Studio to prompt you with IntelliSense as you type, as well as simplifying the scoping rules for subqueries.
The compiler processes query expressions by translating them to fluent syntax. It does this in a fairly mechanical fashion, much like it translates foreach statements into calls to GetEnumerator and MoveNext. The preceding query expression translates to:
IEnumerable<string> query = names
  .Where   (n => n.Contains ("a"))
  .OrderBy (n => n.Length)
  .Select  (n => n.ToUpper());
The Where, OrderBy, and Select operators then resolve using the same rules that would apply if the query were written in fluent syntax. In this case, they bind to extension methods in the Enumerable class (assuming you’ve imported the System.Linq namespace) because names implements IEnumerable<string>. The compiler doesn’t specifically favor the Enumerable class, however, when translating query syntax. You can think of the compiler as mechanically injecting the words Where, OrderBy, and Select into the statement, and then compiling it as though you’d typed the method names yourself. This offers flexibility in how they resolve: the operators in LINQ to SQL and Entity Framework queries, for instance, bind instead to the extension methods in the Queryable class.
Query expressions and fluent queries each have advantages.
Query expressions support only a small subset of query operators, namely:
Where, Select, SelectMany
OrderBy, ThenBy, OrderByDescending, ThenByDescending
GroupBy, Join, GroupJoin
For queries that use other operators, you must either write entirely in fluent syntax or construct mixed-syntax queries, for instance:
string[] names = { "Tom","Dick","Harry","Mary","Jay" };
IEnumerable<string> query =
from n in names
where n.Length == names.Min (n2 => n2.Length)
select n;
This query returns names whose length matches that of the shortest (“Tom” and “Jay”). The subquery, names.Min (n2 => n2.Length), calculates the minimum length across all names, and evaluates to 3. We have to use fluent syntax for the subquery, because the Min operator has no support in query expression syntax. We can, however, still use query syntax for the outer query.
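Another common way to mix the two syntaxes is to wrap a whole query expression in parentheses and apply a fluent operator to it. The sketch below illustrates this with Count and First; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class MixedSyntaxSketch
{
    static readonly string[] names = { "Tom","Dick","Harry","Mary","Jay" };

    // Query expression in parentheses, followed by a fluent operator.
    public static int CountWithA() =>
        (from n in names where n.Contains ("a") select n).Count();

    public static string FirstAlphabetically() =>
        (from n in names orderby n select n).First();

    public static void Main()
    {
        Console.WriteLine (CountWithA());            // 3
        Console.WriteLine (FirstAlphabetically());   // Dick
    }
}
```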
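The main advantage of query syntax is that it can radically simplify queries that involve the following:
A let clause for introducing a new variable alongside the range variable
SelectMany, Join, or GroupJoin, followed by an outer range variable reference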
The let keyword introduces a new variable alongside the range variable. For instance, suppose we want to list all names whose length, without vowels, is greater than two characters:
string[] names = { "Tom","Dick","Harry","Mary","Jay" };
IEnumerable<string> query =
from n in names
let vowelless = Regex.Replace (n, "[aeiou]", "")
where vowelless.Length > 2
orderby vowelless
select n + " - " + vowelless;
The output from enumerating this query is:
Dick - Dck
Harry - Hrry
Mary - Mry
The let clause performs a calculation on each element, without losing the original element. In our query, the subsequent clauses (where, orderby, and select) have access to both n and vowelless. A query can include multiple let clauses, and they can be interspersed with additional where and join clauses.
The compiler translates the let keyword by projecting into a temporary anonymous type that contains both the original and transformed elements:
IEnumerable<string> query = names
  .Select (n => new
    {
      n = n,
      vowelless = Regex.Replace (n, "[aeiou]", "")
    })
  .Where (temp0 => (temp0.vowelless.Length > 2))
  .OrderBy (temp0 => temp0.vowelless)
  .Select (temp0 => ((temp0.n + " - ") + temp0.vowelless));
If you want to add clauses after a select or group clause, you must use the into keyword to “continue” the query. For instance:
from c in "The quick brown tiger".Split()
select c.ToUpper() into upper
where upper.StartsWith ("T")
select upper
// RESULT: "THE", "TIGER"
Following an into clause, the previous range variable is out of scope.
The compiler translates queries with an into keyword simply into a longer chain of operators:
"The quick brown tiger".Split()
  .Select (c => c.ToUpper())
  .Where (upper => upper.StartsWith ("T"))
(It omits the final Select(upper=>upper) because it’s redundant.)
A query can include multiple generators (from clauses). For example:
int[] numbers = { 1, 2, 3 };
string[] letters = { "a", "b" };
IEnumerable<string> query =
  from n in numbers
  from l in letters
  select n.ToString() + l;
The result is a cross product, rather like you’d get with nested foreach loops:
"1a", "1b", "2a", "2b", "3a", "3b"
When there’s more than one from clause in a query, the compiler emits a call to SelectMany:
IEnumerable<string> query = numbers.SelectMany (
  n => letters,
  (n, l) => (n.ToString() + l));
SelectMany performs nested looping. It enumerates every element in the source collection (numbers), transforming each element with the first lambda expression (letters). This generates a sequence of subsequences, which it then enumerates. The final output elements are determined by the second lambda expression (n.ToString()+l).
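To make the nested-looping description concrete, the following sketch builds the same cross product twice, once with explicit foreach loops and once with SelectMany, and checks that the two agree. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManySketch
{
    static readonly int[] numbers = { 1, 2, 3 };
    static readonly string[] letters = { "a", "b" };

    public static List<string> ViaLoops()
    {
        var result = new List<string>();
        foreach (int n in numbers)           // outer loop: source collection
            foreach (string l in letters)    // inner loop: subsequence per element
                result.Add (n.ToString() + l);
        return result;
    }

    public static List<string> ViaSelectMany() =>
        numbers.SelectMany (n => letters, (n, l) => n.ToString() + l).ToList();

    public static void Main()
    {
        Console.WriteLine (string.Join (",", ViaSelectMany()));          // 1a,1b,2a,2b,3a,3b
        Console.WriteLine (ViaLoops().SequenceEqual (ViaSelectMany()));  // True
    }
}
```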
If you subsequently apply a where clause, you can filter the cross product and project a result akin to a join:
string[] players = { "Tom", "Jay", "Mary" };
IEnumerable<string> query =
from name1 in players
from name2 in players
where name1.CompareTo (name2) < 0
orderby name1, name2
select name1 + " vs " + name2;
// RESULT: { "Jay vs Mary", "Jay vs Tom", "Mary vs Tom" }
The translation of this query into fluent syntax is more complex, requiring a temporary anonymous projection. The ability to perform this translation automatically is one of the key benefits of query expressions.
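For comparison, here is roughly what a hand-written fluent equivalent looks like. This is a sketch of the idea rather than the compiler's exact output: the anonymous type and the lambda parameter name pair are illustrative, and the compiler generates its own names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FluentCrossJoinSketch
{
    public static IEnumerable<string> Pairings (string[] players) =>
        players
            // Carry both range variables forward in a temporary projection.
            .SelectMany (name1 => players, (name1, name2) => new { name1, name2 })
            .Where (pair => pair.name1.CompareTo (pair.name2) < 0)
            .OrderBy (pair => pair.name1)
            .ThenBy (pair => pair.name2)
            .Select (pair => pair.name1 + " vs " + pair.name2);

    public static void Main()
    {
        string[] players = { "Tom", "Jay", "Mary" };
        foreach (string pairing in Pairings (players))
            Console.WriteLine (pairing);
        // Jay vs Mary
        // Jay vs Tom
        // Mary vs Tom
    }
}
```

The temporary projection is needed because Where, OrderBy, and Select each see only one input element, yet the query refers to both name1 and name2 after the second generator.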
The expression in the second generator is allowed to use the first range variable:
string[] fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };
IEnumerable<string> query =
  from fullName in fullNames
  from name in fullName.Split()
  select name + " came from " + fullName;
Anne came from Anne Williams
Williams came from Anne Williams
John came from John Fred Smith
Fred came from John Fred Smith
Smith came from John Fred Smith
Sue came from Sue Green
Green came from Sue Green
This works because the expression fullName.Split() emits a sequence (an array of strings).
Multiple generators are used extensively in database queries, to flatten parent-child relationships and to perform manual joins.
LINQ provides three joining operators, the main ones being Join and GroupJoin, which perform keyed lookup-based joins. Join and GroupJoin support only a subset of the functionality you get with multiple generators/SelectMany, but are more performant with local queries because they use a hashtable-based lookup strategy rather than performing nested loops. (With LINQ to SQL and Entity Framework queries, the joining operators have no advantage over multiple generators.)
Join and GroupJoin support equi-joins only (i.e., the joining condition must use the equality operator). Join emits a flat result set whereas GroupJoin emits a hierarchical result set.
The query expression syntax for a flat join is:
from outer-var in outer-sequence
join inner-var in inner-sequence
  on outer-key-expr equals inner-key-expr
For example, given the following collections:
var customers = new[]
{
  new { ID = 1, Name = "Tom" },
  new { ID = 2, Name = "Dick" },
  new { ID = 3, Name = "Harry" }
};
var purchases = new[]
{
  new { CustomerID = 1, Product = "House" },
  new { CustomerID = 2, Product = "Boat" },
  new { CustomerID = 2, Product = "Car" },
  new { CustomerID = 3, Product = "Holiday" }
};
we could perform a join as follows:
IEnumerable<string> query =
  from c in customers
  join p in purchases on c.ID equals p.CustomerID
  select c.Name + " bought a " + p.Product;
The compiler translates this to:
customers.Join (                // outer collection
  purchases,                    // inner collection
  c => c.ID,                    // outer key selector
  p => p.CustomerID,            // inner key selector
  (c, p) =>                     // result selector
    c.Name + " bought a " + p.Product
);
Here’s the result:
Tom bought a House
Dick bought a Boat
Dick bought a Car
Harry bought a Holiday
With local sequences, Join and GroupJoin are more efficient at processing large collections than SelectMany because they first preload the inner sequence into a keyed hashtable-based lookup. With a database query, however, you could achieve the same result equally efficiently as follows:
from c in customers
from p in purchases
where c.ID == p.CustomerID
select c.Name + " bought a " + p.Product;
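The lookup strategy that makes Join efficient on local sequences can be imitated by hand with ToLookup. The following is a sketch of the idea, not the library's implementation; the class and method names and the variable purchasesByCustomer are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupJoinSketch
{
    public static List<string> JoinViaLookup()
    {
        var customers = new[]
        {
            new { ID = 1, Name = "Tom" },
            new { ID = 2, Name = "Dick" },
            new { ID = 3, Name = "Harry" }
        };
        var purchases = new[]
        {
            new { CustomerID = 1, Product = "House" },
            new { CustomerID = 2, Product = "Boat" },
            new { CustomerID = 2, Product = "Car" },
            new { CustomerID = 3, Product = "Holiday" }
        };

        // Load the inner sequence once into a keyed lookup...
        var purchasesByCustomer = purchases.ToLookup (p => p.CustomerID);

        // ...then probe it per customer instead of rescanning purchases.
        var query =
            from c in customers
            from p in purchasesByCustomer [c.ID]   // empty sequence if no match
            select c.Name + " bought a " + p.Product;

        return query.ToList();
    }

    public static void Main()
    {
        foreach (string line in JoinViaLookup())
            Console.WriteLine (line);
    }
}
```

With nested loops over the raw purchases array, every customer would trigger a full scan of purchases; the lookup replaces each scan with a keyed probe.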
GroupJoin does the same work as Join, but instead of yielding a flat result, it yields a hierarchical result, grouped by each outer element.
The query expression syntax for GroupJoin is the same as for Join, but is followed by the into keyword. Here’s a basic example, using the customers and purchases collections we set up in the previous section:
var query =               // var, because the purchase elements are anonymous types
  from c in customers
  join p in purchases on c.ID equals p.CustomerID
  into custPurchases
  select custPurchases;   // custPurchases is a sequence
An into clause translates to GroupJoin only when it appears directly after a join clause. After a select or group clause it means query continuation. The two uses of the into keyword are quite different, although they have one feature in common: they both introduce a new query variable.
The result is a sequence of sequences, which we could enumerate as follows:
foreach (var purchaseSequence in query)
  foreach (var p in purchaseSequence)
    Console.WriteLine (p.Product);
This isn’t very useful, however, because purchaseSequence has no reference to the outer customer. More commonly, you’d reference the outer range variable in the projection:
from c in customers
join p in purchases on c.ID equals p.CustomerID
into custPurchases
select new { CustName = c.Name, custPurchases };
We could obtain the same result (but less efficiently, for local queries) by projecting into an anonymous type that included a subquery:
from c in customers
select new
{
  CustName = c.Name,
  custPurchases = purchases.Where (p => c.ID == p.CustomerID)
}
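In fluent syntax, the join-into query corresponds to a direct call to GroupJoin. The sketch below shows one way to write it and flattens each group into a summary string for display; the Summaries method and its output format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupJoinSketch
{
    public static List<string> Summaries()
    {
        var customers = new[]
        {
            new { ID = 1, Name = "Tom" },
            new { ID = 2, Name = "Dick" },
            new { ID = 3, Name = "Harry" }
        };
        var purchases = new[]
        {
            new { CustomerID = 1, Product = "House" },
            new { CustomerID = 2, Product = "Boat" },
            new { CustomerID = 2, Product = "Car" },
            new { CustomerID = 3, Product = "Holiday" }
        };

        var query = customers.GroupJoin (
            purchases,                 // inner collection
            c => c.ID,                 // outer key selector
            p => p.CustomerID,         // inner key selector
            (c, custPurchases) =>      // one result per outer element
                new { CustName = c.Name, custPurchases });

        return query
            .Select (row => row.CustName + ": " +
                            string.Join (", ", row.custPurchases.Select (p => p.Product)))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in Summaries())
            Console.WriteLine (line);
        // Tom: House
        // Dick: Boat, Car
        // Harry: Holiday
    }
}
```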
Zip is the simplest joining operator. It enumerates two sequences in step (like a zipper), returning a sequence based on applying a function over each element pair. For example:
int[] numbers = { 3, 5, 7 };
string[] words = { "three", "five", "seven", "ignored" };
IEnumerable<string> zip =
  numbers.Zip (words, (n, w) => n + "=" + w);
produces a sequence with the following elements:
3=three
5=five
7=seven
Extra elements in either input sequence are ignored. Zip is not supported when querying a database.
The orderby keyword sorts a sequence. You can specify any number of expressions upon which to sort:
string[] names = { "Tom","Dick","Harry","Mary","Jay" };
IEnumerable<string> query = from n in names
orderby n.Length, n
select n;
This sorts first by length, and then by name, so the result is:
Jay, Tom, Dick, Mary, Harry
The compiler translates the first orderby expression to a call to OrderBy, and subsequent expressions to a call to ThenBy:
IEnumerable<string> query = names
  .OrderBy (n => n.Length)
  .ThenBy (n => n);
The ThenBy operator refines rather than replaces the previous sorting.
You can include the descending keyword after any of the orderby expressions:
orderby n.Length descending, n
This translates to:
.OrderByDescending (n => n.Length).ThenBy (n => n)
GroupBy organizes a flat input sequence into sequences of groups. For example, the following groups a sequence of names by their length:
string[] names = { "Tom","Dick","Harry","Mary","Jay" };
var query = from name in names
            group name by name.Length;
The compiler translates this query into:
IEnumerable<IGrouping<int,string>> query =
  names.GroupBy (name => name.Length);
Here’s how to enumerate the result:
foreach (IGrouping<int,string> grouping in query)
{
  Console.Write (" Length=" + grouping.Key + ":");
  foreach (string name in grouping)
    Console.Write (" " + name);
}
 Length=3: Tom Jay Length=4: Dick Mary Length=5: Harry
Enumerable.GroupBy works by reading the input elements into a temporary dictionary of lists so that all elements with the same key end up in the same sublist. It then emits a sequence of groupings. A grouping is a sequence with a Key property:
public interface IGrouping <TKey,TElement>
  : IEnumerable<TElement>, IEnumerable
{
  // Key applies to the subsequence as a whole
  TKey Key { get; }
}
By default, the elements in each grouping are untransformed input elements, unless you specify an elementSelector argument. The following projects each input element to uppercase:
from name in names group name.ToUpper() by name.Length
which translates to this:
names.GroupBy ( name => name.Length, name => name.ToUpper() )
The subcollections are not emitted in order of key. GroupBy does no sorting (in fact, it preserves the original ordering). To sort, you must add an OrderBy operator (which means first adding an into clause, because group by ordinarily ends a query):
from name in names
group name.ToUpper() by name.Length into grouping
orderby grouping.Key
select grouping
Query continuations are often used in a group by query. The next query keeps only those groups that have exactly two matches in them:
from name in names
group name.ToUpper() by name.Length into grouping
where grouping.Count() == 2
select grouping
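A query continuation also lets you project an aggregate per group. The following sketch, which assumes the same names array as above, reports how many names fall into each length group; the method name GroupSizes and the output format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupCountSketch
{
    public static List<string> GroupSizes()
    {
        string[] names = { "Tom","Dick","Harry","Mary","Jay" };

        var query =
            from name in names
            group name by name.Length into grouping    // continue after the group clause
            orderby grouping.Key
            select new { Length = grouping.Key, Count = grouping.Count() };

        return query.Select (g => g.Length + ": " + g.Count).ToList();
    }

    public static void Main()
    {
        foreach (string line in GroupSizes())
            Console.WriteLine (line);
        // 3: 2
        // 4: 2
        // 5: 1
    }
}
```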
OfType and Cast accept a nongeneric IEnumerable collection and emit a generic IEnumerable<T> sequence that you can subsequently query:
var classicList = new System.Collections.ArrayList();
classicList.AddRange ( new int[] { 3, 4, 5 } );
IEnumerable<int> sequence1 = classicList.Cast<int>();
This is useful because it allows you to query collections written prior to C# 2.0 (when IEnumerable<T> was introduced), such as ControlCollection in System.Windows.Forms.
Cast and OfType differ in their behavior when encountering an input element that’s of an incompatible type: Cast throws an exception whereas OfType ignores the incompatible element.
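The difference shows up as soon as the list contains an element of another type. The sketch below adds a string to the classicList from the earlier example (the extra "six" element and the MixedList helper are made up for illustration).

```csharp
using System;
using System.Collections;
using System.Linq;

public static class CastVersusOfTypeSketch
{
    public static ArrayList MixedList()
    {
        var classicList = new ArrayList();
        classicList.AddRange (new int[] { 3, 4, 5 });
        classicList.Add ("six");              // Incompatible with int
        return classicList;
    }

    public static void Main()
    {
        ArrayList classicList = MixedList();

        // OfType silently skips the string element.
        Console.WriteLine (string.Join (",", classicList.OfType<int>()));   // 3,4,5

        try
        {
            // Cast is deferred, so the exception surfaces during enumeration.
            foreach (int n in classicList.Cast<int>()) { }
        }
        catch (InvalidCastException)
        {
            Console.WriteLine ("Cast threw InvalidCastException");
        }
    }
}
```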
The rules for element compatibility follow those of C#’s is operator. Here’s the internal implementation of Cast:
public static IEnumerable<TSource> Cast <TSource> (IEnumerable source)
{
  foreach (object element in source)
    yield return (TSource)element;
}
C# supports the Cast operator in query expressions. Simply insert the element type immediately after the from keyword:
from int x in classicList ...
This translates to:
from x in classicList.Cast <int>() ...