With the introduction of LINQ, the entire language for how we interact with collections of objects, data, XML files, and any other IEnumerable<T>
structure has been updated. LINQ provides a unified language—a single syntax for dealing with diverse structures. It opens many exciting possibilities for reducing the volume of source code.
In this chapter, I will show you how to take advantage of the features in LINQ to CSLA in order to reduce the amount of source code you write dealing with CSLA .NET collections. I will then explain the implementation of features in LINQ to CSLA, including indexing and making the results of LINQ queries bindable.
Consider the following code, written without using LINQ:
List<Project> lateProjects = new List<Project>();
foreach(Project project in allProjects)
{
if (project.DueDate < DateTime.Now)
{
lateProjects.Add(project);
}
}
LINQ replaces the previous eight lines of code with four lines of code that reveal the intent more clearly:
var lateProjects =
from project in allProjects
where project.DueDate < DateTime.Now
select project;
Since a goal of CSLA .NET is to reduce the amount of code that you have to write, making CSLA .NET work well with LINQ is a requirement for CSLA .NET 3.5 and higher.
In the first example, not only are you stating what you want, but you're also writing exactly how those objects are going to be found. Using LINQ, by contrast, you're writing code at a higher level and specifying what you want (projects where the due date is earlier than now). However, you leave to the collection how you want the query to be handled. By having business code focus on the what rather than on the how, you can allow for further optimizations by the framework.
In fact, one optimization in CSLA .NET is a new Indexable
attribute on properties of a child class. The Indexable
attribute provides hints to the CSLA .NET collection classes about where they might create indexes. The collection classes, in turn, create and maintain indexes on properties in their child classes that are marked Indexable
. Any code written the old way—because it specified not only what but how—would not be able to take advantage of that optimization without recoding. The latter example, by contrast, requires no updates when you decide to use indexes with your collection. I'll discuss indexes more in the next section.
While LINQ provides some great opportunities, it also presents special challenges to the implementers of frameworks. One of the chief reasons for implementing a business object framework is the reduction of UI code you need to write to implement a solution. Therefore, the ability to bind collections that inherit from BusinessListBase
is a critical component of a framework like CSLA .NET.
Just as collections that inherit from BusinessListBase
should work properly with binding, so should the result of LINQ queries that are run over collections of BusinessListBase
objects. Unfortunately, the result of a standard query from LINQ to Objects is a type called Sequence<U>
. As shown in Figure 14-1, this new collection doesn't have awareness of the collection from which it is generated. This means that operations on the result of the query, such as change tracking and collection-level validation, will fail to work as expected, due to the fact that the new collection has no awareness of the old one.
The overall goal in the development of LINQ to CSLA .NET is to make it possible to use LINQ with CSLA .NET collections and have data binding behavior work as expected. Namely, the goal is to make it so that you can bind to the result of a LINQ query performed against a CSLA .NET collection and have all the behavior you would expect when binding to the entire collection itself. Later in this chapter, I'll provide more detail about how this works.
Unlike most standard collections in the .NET Framework, such as List<T>
and Dictionary<K,V>
, collections in CSLA .NET are aware of changes that occur in the objects they contain. Because of this, my colleague Aaron Erickson and I have added the ability for business collections to optimize queries through the implementation of indexes. Let's say you're using LINQ to query against CSLA .NET collections under CSLA .NET 3.5 or higher. If you've marked fields as indexable using the Indexable
attribute, then the implementation of Where()
—the method that makes filtering work in the query—will take advantage of indices internal to the collection.
Running a query against any IEnumerable<T>
and generating a Sequence<U>
of results using LINQ to Objects is called projection. The execution of a simple LINQ to Objects query has two main parts: a where
clause, which applies criteria (in the form of an anonymous delegate or lambda) to find items, and a select
clause, which transforms each result into something else. Let's start with a typical LINQ query, such as this one that selects numbers equal to 42 from an array of numbers and returns the square of the result:
var coolNumbers = from n in allNumbers where n == 42 select n * n;
The LINQ query syntax is merely syntactic sugar that you can rewrite in more traditional form as follows:
var coolNumbers = allNumbers.Where(n => n == 42).Select(n => n*n);
This core mechanism—filter and project—is at the heart of every LINQ query. The filter (where
) determines which items from the collection matter, and the projection (select
) transforms each result from one thing into another thing.
The term projection refers to the collection you can generate from the result of the Select()
method.
Not all projections are equivalent. For example, let's say you call Select()
like this: .Select(n => n*0)
. In this case, it is impossible to determine how a change in the projected object should be reflected in the original. Since each result of n*0
is always 0
, there is no information in the resulting object that would allow you to know which object it maps back to in the source collection. Because of the lack of a definitive mapping, synchronization of any individual item is impossible.
That said, in certain cases—such as the projection of the same object type you started with in the original collection (an identity projection) where you're operating on exactly the same object type you started with—you don't have to worry about trying to figure out how changes in the projection reflect back to changes in the original. You can map an object in the projected collection back to the source collection in a way that is deterministic. This fact makes it possible for CSLA .NET to allow for binding to this type of projection.
An identity projection is a query in which the result of the query contains items of the same type as the source collection. In CSLA .NET 3.5 and higher, identity projections are handled in a special way that allows for binding to the projection. CSLA .NET changes the behavior of queries such that if you run an identity projection over BusinessListBase
, you will get a result known as a LinqBindingList<T>
.
The best way to understand LinqBindingList
is to think about it as a filter over BusinessListBase
. When you remove an item from a LinqBindingList
, it not only removes the item from the list, but it removes the item from the source list, and it removes any other LinqBindingList
instances from queries generated earlier from the same source BusinessListBase
, as shown in Figure 14-2. If you change a property in an item from within the LinqBindingList
, the n-level undo mechanism in BusinessListBase
will track the change.
Adding an item to a LinqBindingList
is possible, but only items that meet the criteria that were present upon creating the list using LINQ will appear in the LinqBindingList
. On the other hand, adding an item to a LinqBindingList
adds the item to the original list as well as any other LinqBindingList
instances that resulted from queries against the original collection of business objects.
In the event that you decide to do a query from a LinqBindingList
, another list of type LinqBindingList
will be generated. It will act as though it were generated from the original list—not the sublist—and will exhibit all the behaviors described previously.
LinqBindingList
is a class that is only ever meant to be produced from an identity projection performed over a class that derives from BusinessListBase
. In this respect, it is similar to the Sequence
class that LINQ to Objects provides, in that it is strictly the result of a LINQ query and not a structure that you would ever create outside of this context.
When searching for items in a BusinessListBase
- or ReadOnlyListBase
-based list, CSLA .NET is capable of doing an optimized search using an internal index if your child class marks the properties you are searching on as Indexable
.
When either of the main collection classes in CSLA .NET encounters a child class with an Indexable
attribute on a property, an internal index is built the first time a search using LINQ is conducted on that class. In cases where the property that is considered Indexable
is IComparable
, the indexing mechanism is a red-black tree. In cases where the property to be indexed is not IComparable
(such as Color
), the indexing mechanism is based on Dictionary<T>
. Common IComparable
property types that tend to be indexed are String, int
, and DateTime
, although others with custom IComparable
implementations are possible.
CSLA .NET supports the ability to plug in your own indexing provider via the IndexingProvider
property on BusinessListBase
and ReadOnlyListBase
. Note that doing so is not for the faint of heart. I provide more information about how to implement such a provider at the end of this chapter.
In either case, it is important to understand that all optimizations come at a cost. By implementing indexing on your objects, you are making add, remove, and update operations slower on your objects, and, in exchange, gaining tremendous benefits for optimization of searches. When you use indexing properly on large collections of business objects in memory on fields where comparison is a costly operation (such as strings), you can gain performance improvements that are several orders of magnitude faster. However, using indexing on fields that are not searched frequently and/or are updated frequently can actually cause performance to degrade. It is critical to have a sense for how you're going to use your child class when you start to use the Indexable
attribute on properties within it, just like you would be careful about how you apply an index to a database table.
When a CSLA .NET collection is serialized, the indices within the collection do not pass the serialization boundary. This would be possible in theory, but the performance implications of having not only the objects themselves but all the indices on those objects get passed over the wire are not insignificant. In addition, the indexing mechanism depends on hash-code generation of objects that is only consistent at the scope of a physical machine—that is, hash codes are not guaranteed to be equivalent on, say, a 64-bit operating system and a 32-bit operating system—so it would be impractical to translate the index values during serialization in the absence of a generic hash code generator that could guarantee durability across machine boundaries.
While the index itself is not passed over the serialization boundary, the index is re-created according to the options specified on the child class. This typically happens upon the first instance of a query that utilizes the index on the other side of the serialization boundary.
When marking a child class using the Indexable
attribute, you can further specify the behavior of the index. In some cases, you might think it is desirable to force the creation of an index for a child class. For example, if you know for sure that you are going to use the index and you don't want to take the performance hit the first time you use it, you would use the IndexModeAlways
option on the attribute. This way, when items are added, they're indexed immediately rather than the first time a query needs it. On the other hand, if it is important to optimize for load time on the collection, and if it is acceptable to take the hit on index creation the first time you do a query that may take advantage of the index, you would use IndexModeOnDemand
. The IndexModeNever
option is provided to mark cases in which you want to ensure that indexes are never created on a field.
Table 14-1 summarizes the parameters of the Indexable
attribute.
Table 14.1. Index Modes
Parameter | Description |
---|---|
| Load an index only when needed. |
| Always load and maintain an index for this property. |
| Never create an index for this property. |
The following example illustrates the use of the Indexable
attribute on a child class:
class Student : BusinessBase {
[Indexable(IndexModeEnum.IndexModeNever)]
public byte[] Photo {get; set; } //we never search by photo
[Indexable(IndexModeEnum.IndexModeAlways)]
public string SSN {get; set; } //we search by SSN all the time
[Indexable(IndexModeEnum.IndexModeOnDemand)]
public string FirstName {get; set; } //not as common of a search
[Indexable]
public string LastName {get; set; } // default is IndexModeOnDemand
}
When fully implemented, the IQueryable
interface is the mechanism by which anyone can provide his own implementation of LINQ. LINQ to XML, LINQ to SQL, and dozens of other LINQ providers all implement this interface. Thus, in order to provide the expected behavior when you run a LINQ query against BusinessListBase, IQueryable
has been implemented for BusinessListBase
.
The IQueryable
interface mandates only three members be implemented. Table 14-2 shows which members are required for an IQueryable
implementation.
Table 14.2. IQueryable Members
Member | Description |
---|---|
| Provides the source type for the implementation |
| The most recent expression tree associated with the collection |
| The |
ElementType
and Expression
both have reasonably simple implementations. In CSLA .NET, ElementType
simply returns an instance of the child type as follows:
public Type ElementType
{
get { return typeof(C); }
}
The Expression
property is also fairly simple. It returns the expression tree associated with the collection (I'll cover expression trees in the next section). In BusinessListBase
, this represents the entire collection, because you use LinqBindingList
rather than BusinessListBase
to represent the result of a query. In BusinessListBase, Expression
returns a private backing field where the current expression is held.
public Expression Expression
{
get {
if (_expression == null)
_expression = Expression.Constant(this);
return _expression;
}
}
The last member in any IQueryable
implementation is the Provider
property. The Provider
is where the core of any IQueryable
implementation really occurs. The property itself is fairly simple, returning a new instance of the CslaQueryProvider
:
public IQueryProvider Provider
{
get {
return new Linq.CslaQueryProvider<T, C>(this);
}
}
CslaQueryProvider
is the CSLA .NET implementation of the IQueryProvider
interface, which defines a custom implementation of LINQ. Of course, the devil is in the details, and understanding how an IQueryProvider
works is key to understanding what LINQ to CSLA .NET is doing behind the scenes.
The key to understanding LINQ is to understand the role of expression trees in the implementation of LINQ. Expression trees are representations of source code as data. Consider the following code passed to the Where()
method in LINQ:
var result = someCollection.Where( x => x.SomeVal == 42);
The parameter provided to the Where()
method is a lambda function—a more compact syntax for an anonymous method. However, when Where()
is invoked, the lambda function is converted into a data structure in the form of an expression tree that looks like Figure 14-3.
Expression trees allow for code to be executed by alternative means—that is, the code can be translated to other forms, such as SQL. For example, in LINQ to SQL, the previous expression will get converted into the following SQL query:
SELECT * FROM someCollection WHERE SomeVal = 42
What the expression gets converted to in other LINQ providers will vary according to the provider. For example, LINQ to XML could convert the expression to an XPath expression that ordinary code in the System.Xml
namespace could execute.
In CSLA .NET, the key differences in how expressions are handled occur in two areas. First, the optimization of Where takes advantage of indexing (which I cover in detail later in this chapter). Second, CSLA .NET makes sure that LINQ queries against BusinessListBase
return a LinqBindingList
that synchronizes with the collection from which it was originally queried, rather than a Sequence
that features no such synchronization. The indexing code analyzes the expression passed to the Where
clause to determine if indexing is possible. The synchronization code analyzes the expression passed to the Select
clause to determine if the projection is an identity projection, and thus, able to be synchronized.
The IQueryProvider
interface contains, surprisingly, only four members, as shown in Table 14-3.
Table 14.3. IQueryProvider Members
Member | Description |
---|---|
| Handles LINQ methods that return an |
|
|
| Handles LINQ methods that return an |
|
|
Of course, within this somewhat simple interface lies a great deal of complexity. The parameter provided to CreateQuery()
or Execute()
is an object of type Expression
, which we must evaluate in a manner that returns an expected result. To further complicate this, when you choose to implement your own IQueryProvider
, you give up the default LINQ to Objects implementation and are required to implement the entire range of possible Expression
objects that might be passed in.
Evaluating Expression for Synchronization Support
To support synchronization, you must first determine whether you're handling an identity projection. That test, which is in CreateQuery()
, is surprisingly simple. This only applies to the generic overload of CreateQuery() (CreateQuery<TResult>())
, because in that version, it is known what type of IQueryable
needs to be returned.
typeof(TElement) == typeof(C)
If this test passes, then you have an identity projection, because you'll be returning a collection that is of the same type as the source collection. In this case, you can return a LinqBindingList<C>
object that will have all the appropriate information to allow for further query execution.
_filter = new LinqBindingList<C>(_parent, this, expression);
_filter.BuildFilterIndex();
return (IQueryable<TElement>)_filter;
Because there are corner cases in which the LinqBindingList
needs to be referenced by the CslaQueryProvider
object later on, you store it as a private field. The LinqBindingList
needs to store three critical values, so the constructor takes three parameters: the original collection, the IQueryProvider
instance, and the expression. The original collection is the _parent
parameter. The IQueryProvider
instance, which holds query execution state, is the this
parameter. The expression, which allows the LinqBindingList
to continue query execution when query results are requested, is the expression
parameter.
Handling Non-Identity Projections
When the query does not return an identity projection, the CslaQueryProvider
still needs to return the expected result. Also, non-identity projections still should be able to take advantage of the indexing features of CSLA .NET. In such cases, control of the query, outside of handling for the Where()
method, is passed to the default implementation provided in LINQ to Objects.
Passing Control to the Default Handler
CreateQuery()
and Execute()
are called not only when a query is being established or filtered, but also when other LINQ extension methods are invoked, such as Count(), First()
, and many others. To understand why the implementation of IQueryProvider
works the way it does, you need to understand how LINQ to Objects injects itself into any IEnumerable
.
All the methods in LINQ to Objects are implemented as static extension methods on the IEnumerable interface, defined in the Enumerable
and Queryable
classes in the System.Linq
namespace. In fact, this is the reason why only upon including the System.Linq
namespace do the LINQ extension methods come into scope. Implementation of IQueryable
on a custom class changes this relationship. It is, by default, an all-or-nothing relationship, which means that once you implement your own IQueryProvider
, you're expected to handle the entire set of extension methods defined on Enumerable
(or at least, pass back a NonSupportedException
).
However, in a case like CSLA .NET, a special problem is presented. Because you're overriding only part of LINQ, you have no desire to re-implement the other more than 100 members that constitute LINQ. To help with this, CSLA .NET matches the Expression
passed to CreateQuery()
or Execute()
against one of the static methods on the Enumerable
class. The following snippet from the CreateQuery()
implementation shows an example of this:
MethodCallExpression mex = (MethodCallExpression) expression;
Type listType = typeof(Enumerable);
MethodInfo[] listMethods = listType.GetMethods();
foreach (MethodInfo method in listMethods)
if (MethodsEquivalent(mex,method))
{
Type[] genericArguments = mex.Method.GetGenericArguments();
MethodInfo genericMethodInfo = method.MakeGenericMethod(genericArguments);
var testObject = genericMethodInfo.Invoke(null, paramList.ToArray());
IQueryable<TElement> objectQuery =
((IEnumerable<TElement>)testObject).AsQueryable<TElement>();
return objectQuery;
}
The code is fairly straightforward, with MethodsEquivalent()
doing most of the work to determine whether a given MethodCallExpression
(which all calls to CreateQuery()
or Execute()
pass) is logically equivalent to a given reflected method definition that comes from the Enumerable
static class.
Once the correct MethodInfo
from Enumerable
is found, it is converted into a generic method call, which is then invoked dynamically via reflection and returned to the caller. The process is similar regardless of whether CreateQuery()
or Execute()
is being called.
This technique that CSLA .NET uses to match the Expression
to the correct implementation using the MethodsEquivalent()
function should continue to work even in the event that Microsoft adds more extension methods to IEnumerable<T>
and implements them in the static Enumerable
class. However, if Microsoft were to update the class name that holds the LINQ to Objects extension methods, a change to CSLA .NET would be required to support it.
The LinqBindingList<C>
holds the details of the query generated by it (the expression and the IQueryProvider
) and a reference to the parent list, so that changes, additions, and deletions in LinqBindingList<C>
can be propagated to the parent list, and in turn, to other instances of LinqBindingList<C>
that are derived from the same source.
LinqBindingList<C>
itself implements the IQueryable<T>
interface, allowing for subqueries against it that generate further LinqBindingList<C>
instances. These subsequent LinqBindingList<C>
instances are not children of the source LinqBindingList<C>
. rather, they all have the same relationship to the original BusinessListBase<T, C>
.
Having the ability to analyze the expression tree that is passed to the Where()
method in CSLA .NET provides an opportunity to optimize the query. Any indexing implementation is going to require
Building and maintaining an index
Intercepting the normal query operation such that the query can utilize the index
Indices are maintained on fields that are marked with the Indexable
property under circumstances described earlier in this chapter. These indices are managed through the _indexSet
field, which is an object that conforms to the IIndexSet
interface.
The index set hooks into the collection by being made aware of all changes, additions, and removals that can affect the index. Calls to methods on the _indexSet
are made throughout BusinessListBase
and ReadOnlyListBase
, making it possible for the _indexSet
to maintain itself.
Implementations of IIndex
Depending on the nature of your child object, there are several ways you can find items in a list quickly. As mentioned previously, you can organize those child objects that are comparable—that is, where someObj.SomeVal < x
has meaning and is implemented—in a way that makes that query faster. CSLA .NET uses a balanced tree structure—a red-black tree—to perform this function. Other child objects that are not IComparable
(such as Color
) use a dictionary-based index instead.
One of the responsibilities of the index set is to choose an appropriate implementation of IIndex
based on the type of object being indexed. An ideal indexing strategy allows not only for equality-based queries to use the index, but also for range-based queries to use the index.
Indices on IComparable Properties: Red-Black Tree
The red-black tree structure allows you to infer the order of the items. All items to the left of a given node in the tree are guaranteed to be lower. All items to the right of a given node in the tree are guaranteed to be higher. In the less-than operation, the key to making a ranged query work becomes enumerating through the left subtree of the node that either matches or is the lowest value that is higher than the value being compared. In a greater-than operation, it is the exact reverse: enumerating through the right subtree of the node that either matches or is the highest value that is lower than the value being compared.
The performance of lookup over balanced trees, such as red-black trees, is typically log(N) + 2, while retaining a good deal of memory efficiency and the ability to quickly find those items higher or lower than a given item. The performance characteristics of this lookup are good. Because CSLA .NET uses a red-black tree—a tree that guarantees that the depth is no deeper than log(N) + 2—you know that finding one item out of one million will take, at most, 22 comparison operations. Depending on where the item is, that same item without an index can take up to a million comparison operations, since without an index, each object has to be compared.
Indices on Other Properties: Dictionary
When the property you're building an index on does not implement IComparable
, you can't use those structures in any kind of tree implementation, since placement of an item in a tree depends on item comparison. Therefore, to implement an index on a non-IComparable
property, you need to use a Dictionary
to enable fast lookup for equality-based queries.
The performance characteristics of the Dictionary
are actually a little bit faster than a red-black tree most of the time, assuming that the GetHashCode()
implementation generates a high probability that calls on different objects of your child type generate unique values. However, because there is a chance, however small, that two different objects can generate the same hash code, you may require more than the single compare operation that is typical with Dictionary
. If you were to override GetHashCode()
in a manner where you return the same number for all instances of your object, you would end up with zero benefit to using indexing at all.
Implementation of GetHashCode()
on your own is almost always a bad idea unless you really know what you're doing.
The ability to evaluate the expression passed to the Where()
extension method is the key that makes any indexing implementation in LINQ possible. LINQ to CSLA .NET evaluates the query to see if it can possibly use an index. If it is unable to use an index, it searches for the item using the same method that LINQ does by itself—namely, it compares each item and returns those items that match.
LINQ does not allow the Where()
method to have a LambdaExpression
parameter whose Body
is anything other than a BinaryExpression
—that is, something that eventually returns a true
or false
result. The nature of the Where()
operation itself is the application of a test of an object for set inclusion. All BinaryExpression
objects have Left
and Right
properties that represent either side of an expression. For example, consider the following expression:
x.MyProperty == 42
In this case, the Left
of the BinaryExpression
is x.MyProperty
, and the Right
of the BinaryExpression
is 42
. You can also evaluate the NodeType
of the BinaryExpression
to learn that it is ExpressionType.Equal
, indicating that this is an equality test.
Left Evaluation
The left side of the expression is evaluated to determine whether you're doing a query on a child property that has an index. Typically, this means that the Left
expression is a property lookup, such as x.MyProperty
. In other words, if the expression is a MemberExpression
, CSLA .NET will know that you're possibly doing a property lookup on an indexed property. It can then check the Name
of the MemberExpression
and validate whether there is an index on a child property that matches the Name
from the MemberExpression
. If that test passes—that is, if there is a property lookup on the Left
side—then you can continue doing an indexed query.
Right Evaluation
After CSLA .NET has determined that you're doing a query on an Indexable
field, it evaluates the Right
side to determine the value that you're comparing to. The contents of the right side can be much more variable, including everything from a ConstantExpression
that represents a value (such as 42
) to something more complex, such as another method call.
In the case of a ConstantExpression
, CSLA .NET simply casts it back to an object that is comparable to the property on the Left
side. Other expressions require a dynamic invocation of the expression. You do this by calling the Compile()
method on the expression, compiling the expression, and subsequently calling DynamicInvoke()
and executing the expression. The result is an object that should at least be able to be tested for equality, and possibly other operations depending on the IComparable
status of the property.
Indexed Search
Now that CSLA .NET knows it has an Indexable
property and a value to compare, it proceeds to utilize the index and uses the Dictionary
or red-black tree-based implementation of IIndex
to search the index rather than the collection itself,. If the property is not Indexable
, CSLA .NET will simply run the search in the default manner that a typical IEnumerable
would use.
While the current implementation of indexing allows you to do a great deal of optimization, you can add even more optimization if you want to put forth the effort to do so. The model for indexing, as illustrated in Figure 14-4, is designed to support the provision of your own indexing mechanism, if you wish to extend the indexing capabilities of CSLA .NET.
In the next section, I'll cover the details of this interface and classes. Should you decide to implement your own indexing provider, you'll have an understanding of the default one provided with CSLA .NET.
IIndexSet
IIndexSet<T>
is the initial interface you must implement if you wish to implement your own CSLA .NET indexing provider. Table 14-4 shows the required members for IIndexSet
. CSLA .NET base collection classes such as BusinessListBase<C, T>
and ReadOnlyListBase<C, T>
know how to leverage this interface to take advantage of indexing.
Table 14.4. IIndexSet Members
Member | Description |
---|---|
| The index item on all indexable properties |
| The index item only on |
| Removes the item from all indices |
| Removes the item only from the index specified by |
| Removes and adds the item from all indices |
| Removes and adds the item from a specific index |
| Clears all the indices for all properties |
| Clears a specific index |
| Performs a search, filtering by |
| Returns the |
| Creates an index on |
IIndexSet
needs to have a reference to the IEnumerable
collection that it is indexing, such as BusinessListBase
and other CSLA .NET collection classes. When you write a class that supports IIndexSet
, you are typically managing the set of IIndex
objects (individual indices, described in the following sections) that constitute the index set, and providing the logic for performing the search to determine whether the query can use an index.
You can think of the classes that implement this interface is coordinators for all indexing activity. Classes should evaluate the expression tree passed to Search()
, determine the correct index to use (if applicable), and then either use the index (if it is an indexed property) or perform the search on the original collection (if there is no available index).
IndexSet
The IndexSet
concrete class represents the default index provider implementation for CSLA .NET. The class maintains a set of indices appropriate for the property types that it is indexing. It provides an implementation of Search()
that determines the IComparable
status of the property being searched, determines the operation being used (such as equality and less-than operations), and assures that the appropriate operation is called on the index.
This class provides an example of most of the expression evaluation techniques I have talked about previously in this chapter. For example, the following method in the IndexSet
class determines the value on the right side of the expression passed to Search()
:
private object GetRightValue(Expression rightSide)
{
//rightside is where I get the value
...switch (rightSide.NodeType)
{
//shortcut constants, don't eval these, it will be faster
case ExpressionType.Constant:
ConstantExpression constExp = (ConstantExpression)rightSide;
return (constExp.Value);
// convert back to lambda and eval to get the value
.default:
//Lambdas can be created from expressions
...LambdaExpression evalRight = Expression.Lambda(rightSide, null);
//Compile it, invoke it, and get the resulting value
return (evalRight.Compile().DynamicInvoke(null));
}
}
The key to this code is looking at the type of node you're dealing with. All Expression
objects implement the NodeType
property, which gives you clues about the nature of the Expression
you're dealing with. You now care about only two real cases: whether you have a ConstantExpression
, allowing you to bypass a more expensive call to Compile()
and DynamicInvoke()
, or whether you do not have a constant, in which case you'll have to compile the expression, invoke it, and get the value that it will ultimately return. In either case, you get back an object that boxes the value you want to use in the Search()
operation.
Compile()
and DynamicInvoke()
are expensive operations to conduct. You don't want to be doing these operations on a per-item basis as you search through a data structure.
Determining whether you have an index or not is much simpler.
bool IIndexSet<T>.HasIndexFor(string property)
{
return _internalIndexSet.ContainsKey(property);
}
private bool HasIndexablePropertyOnLeft(Expression leftSide)
{
if (leftSide.NodeType == ExpressionType.MemberAccess)
return (
this as IIndexSet<T>).HasIndexFor(((MemberExpression)leftSide).Member.Name
);
else
return false;
}
Determining whether you have an index for a given property specified by a string is relatively simple, as you have an internal field _internalIndexSet
that is really just a Dictionary
of indices. You can call ContainsKey()
on the _internalIndexSet
to determine whether you have the index.
The examination of the expression that represents the left side of the total expression is slightly more complex. You need for this left-side expression to be a MemberExpression
in order to make any sense of it. Thankfully, this is common, given that it rarely makes sense to have anything other than a MemberExpression
on the left side. Note that this may be possible technically (i.e., x => 1 == 1
can be passed, which, in theory, would return all results).
The first thing you do is test the NodeType
to ensure that it's a MemberAccess
expression. If that is in fact the case, you know you can cast it to a MemberExpression
, which then gives you access to the name of the property you're looking at (such as the name SomeProp
on the left side, which is in an expression such as x => x.SomeProp == 42
). With this information, you have everything you need to determine whether or not you have an index on the property.
Once you have established that you have a property that is Indexable
and a value to check against the index, you can test the operation between both sides of the expression to determine which index operation to perform.
Func<T, bool> exprCompiled = expr.Compile();
BinaryExpression binExp = (BinaryExpression)expr.Body;
object val = GetRightValue(binExp.Right);
IRangeTestableIndex<T> rangedIndex;
if (_internalIndexSet[property] is IRangeTestableIndex<T>)
{
rangedIndex = (IRangeTestableIndex<T>)_internalIndexSet[property];
switch (binExp.NodeType)
{
case ExpressionType.Equal:
foreach (T item in
_internalIndexSet[property].WhereEqual(val, exprCompiled))
yield return item;
break;
case ExpressionType.LessThan:
foreach (T item in rangedIndex.WhereLessThan(val))
yield return item;
break;
//and so forth, for other operations
...
Based on the NodeType
of the BinaryExpression
object (shown as binExp
), you can determine whether you have been passed an equality test, a less-than operation, or any other logical test. Of course, you have to make sure the index is IRangeTestableIndex
before you try to use it in the context of a ranged logical operator such as less-than or greater-than. However, once you know that, you can handle each operation appropriately. If you can't handle the operation, pass it back to LINQ to Objects to handle in the default manner.
IIndex
IIndex
represents the expected methods and behavior that an index should provide. Most implementations of IIndexSet
contain one or more IIndex
objects per property that is being indexed on T
. IIndex
derives from ICollection
and adds the members shown in Table 14-5.
Table 14.5. IIndex Members
Member | Description |
---|---|
| Returns the |
| Returns the items that match the value passed to it |
| Removes and adds back the given item from the index |
| Returns whether the index has been established |
| Sets the index as not loaded anymore |
| Sets the index as loaded |
Writing an IIndex
implementation involves implementing all the methods you typically would for ICollection
and then adding implementation for the members shown in Table 14-5. A typical implementation wraps an appropriate data structure, such as a red-black tree or dictionary, or possibly even derives from it, and then adds the implementation of the methods, which make indexed operations able to work with an IIndexSet
.
The only interesting part of building an IIndex
implementation is the handling of WhereEqual()
. Typically, WhereEqual()
takes the object passed to it—which represents a value from a property of the child object—and uses some mechanism based on that value to find the item in a manner somehow faster than looking through every single one in the index. Exactly how this is done depends on the index structure. In a Dictionary
-based structure, a hash code is generated from the property, and then the Where
test is performed on each item with a matching hash code. On other structures, such as a red-black tree, the normal search mechanism for such a tree would be used to locate the appropriate key matches.
Index
The Index
concrete class represents the default index implementation for child objects in CSLA .NET that don't implement IComparable
. Like most IIndex
implementations in CSLA .NET, the Index
class implements ICollection
methods, which then handle the collection operations that are ultimately handled by a private field where the actual index is kept. The following is typical for most ICollection
methods in the Index
implementation:
void ICollection<T>.Add(T item)
{
DoAdd(item);
}
private void DoAdd(T item)
{
//_theProp is the PropertyInfo this index is indexing
if (_theProp != null)
{
int hashCode = _theProp.GetValue(item, null).GetHashCode();
if (_index.ContainsKey(hashCode))
_index[hashCode].Add(item);
else
{
List<T> newList = new List<T>(1);
newList.Add(item);
_index.Add(hashCode, newList);
}
}
}
For the most part, this aspect of index design—the implementation of the ICollection
members—is pretty straightforward. More interesting is the implementation of WhereEqual()
.
IEnumerable<T> IIndex<T>.WhereEqual(object pivotVal, Func<T, bool> expr)
{
var hashCode = pivotVal.GetHashCode();
LoadOnDemandIndex();
if (_index.ContainsKey(hashCode))
foreach (T item in _index[hashCode])
if (expr(item))
yield return item;
}
This method is a little more interesting. Because it uses a Dictionary
to implement the index behind the scenes, it needs to determine the hash code of the value it is comparing against. Since the index may not be loaded, the method attempts to load the index on demand—a call that will do nothing if the index is already loaded. Once it knows the index is loaded, it can retrieve the list of items that match the hash code, which is returned by _index[hashCode]
. Only within this list, which is almost always one or two items unless there is a serious problem with the hash code, does it need to test using expr
to make sure that the test passes.
This works because of the relationship between the result of GetHashCode()
and Equals()
. As a rule, any two objects that are equal must generate the same hash code. The reverse, however, isn't true, as two different objects can generate the same hash code.
As you would expect with an index, the result is that the number of comparisons required to find an item using an index is vastly lower than it would be without an index. In fact, the larger the collection you're using an index on, the more benefit there is to using an index.
IRangeTestableIndex<T>
extends the IIndex<T>
interface by adding support for indices that are capable of optimizing ranged query options, such as those backed by an ordered data structure like a red-black tree. Table 14-6 shows its members.
Table 14.6. IRangeTestableIndex Members
Member | Description |
---|---|
| Returns the items less than the value passed |
| Returns the items less than or equal to the value passed |
| Returns the items greater than the value passed |
| Returns the items greater than or equal to the value passed |
While it is not required to use this interface in any given indexing implementation, the built-in indexing provider, IndexSet<T>
, utilizes this interface to specify behavior for those cases where T
is IComparable
.
BalancedTreeIndex
The BalancedTreeIndex<T>
concrete class implementing IRangeTestableIndex
represents the default index implementation for child objects in CSLA .NET that implement IComparable
. It is fundamentally similar to Index
in how it manages its internal collection, but it uses a red-black tree rather than a Dictionary
for storage.
One of the methods that IRangeTestableIndex
requires is the WhereGreaterThan()
method. The implementation of the index passes control to the internal data structure as follows:
public IEnumerable<T> WhereGreaterThan(object pivotVal)
{
var balancedIndex = (IBalancedSearch<T>)_index;
return balancedIndex.ItemsGreaterThan(pivotVal);
}
The red-black tree implementation implements an interface called IBalancedSearch<T>
, which takes advantage of the data structure of a balanced tree in order to efficiently iterate through those items greater than the value passed to it.
Understanding the internals of how a red-black tree works is best addressed through others who have written extensively on the subject, notably Robert Sedgewick in his classic, Algorithms in C++ (Addison-Wesley Professional, 2002), which covers the implementation of red-black trees as well as numerous other useful low-level structures.
In this chapter, I covered how CSLA .NET works with LINQ to Objects. I showed you how to optimize the LINQ to Objects implementation to make sure it provides the expected behavior, and I explained how to use the capabilities of CSLA .NET to increase the performance of your in-memory queries. You should now understand the options for how you can index your collections in CSLA .NET, how projections work against CSLA .NET collections, and how CSLA .NET implements these features. Furthermore, you should have a reasonable idea of how you might get started to write your own indexing provider for CSLA .NET.
I'll provide information about how to use LINQ to SQL—or other LINQ providers in relation to data, for that matter—in Chapter 18, where I discuss the data portal and the various techniques you can use to implement it.