Chapter 14. LINQ to CSLA

With the introduction of LINQ, the entire language for how we interact with collections of objects, data, XML files, and any other IEnumerable<T> structure has been updated. LINQ provides a unified language—a single syntax for dealing with diverse structures. It opens many exciting possibilities for reducing the volume of source code.

In this chapter, I will show you how to take advantage of the features in LINQ to CSLA in order to reduce the amount of source code you write dealing with CSLA .NET collections. I will then explain the implementation of features in LINQ to CSLA, including indexing and making the results of LINQ queries bindable.

Reducing Code with LINQ

Consider the following code, written without using LINQ:

List<Project> lateProjects = new List<Project>();
foreach(Project project in allProjects)
{
  if (project.DueDate < DateTime.Now)
  {
    lateProjects.Add(project);
  }
}

LINQ replaces the previous eight lines of code with four lines of code that reveal the intent more clearly:

var lateProjects =
from project in allProjects
where project.DueDate < DateTime.Now
  select project;

Since a goal of CSLA .NET is to reduce the amount of code that you have to write, making CSLA .NET work well with LINQ is a requirement for CSLA .NET 3.5 and higher.

In the first example, not only are you stating what you want, but you're also writing exactly how those objects are going to be found. Using LINQ, by contrast, you're writing code at a higher level and specifying what you want (projects where the due date is earlier than now). However, you leave to the collection how you want the query to be handled. By having business code focus on the what rather than on the how, you can allow for further optimizations by the framework.

In fact, one optimization in CSLA .NET is a new Indexable attribute on properties of a child class. The Indexable attribute provides hints to the CSLA .NET collection classes about where they might create indexes. The collection classes, in turn, create and maintain indexes on properties in their child classes that are marked Indexable. Any code written the old way—because it specified not only what but how—would not be able to take advantage of that optimization without recoding. The latter example, by contrast, requires no updates when you decide to use indexes with your collection. I'll discuss indexes more in the next section.

Overview of LINQ to CSLA .NET

While LINQ provides some great opportunities, it also presents special challenges to the implementers of frameworks. One of the chief reasons for implementing a business object framework is the reduction of UI code you need to write to implement a solution. Therefore, the ability to bind collections that inherit from BusinessListBase is a critical component of a framework like CSLA .NET.

Binding to Results from LINQ to Objects

Just as collections that inherit from BusinessListBase should work properly with binding, so should the result of LINQ queries that are run over collections of BusinessListBase objects. Unfortunately, the result of a standard query from LINQ to Objects is a type called Sequence<U>. As shown in Figure 14-1, this new collection doesn't have awareness of the collection from which it is generated. This means that operations on the result of the query, such as change tracking and collection-level validation, will fail to work as expected, due to the fact that the new collection has no awareness of the old one.

Binding from a projection

Figure 14.1. Binding from a projection

The overall goal in the development of LINQ to CSLA .NET is to make it possible to use LINQ with CSLA .NET collections and have data binding behavior work as expected. Namely, the goal is to make it so that you can bind to the result of a LINQ query performed against a CSLA .NET collection and have all the behavior you would expect when binding to the entire collection itself. Later in this chapter, I'll provide more detail about how this works.

Indexed LINQ Queries

Unlike most standard collections in the .NET Framework, such as List<T> and Dictionary<K,V>, collections in CSLA .NET are aware of changes that occur in the objects they contain. Because of this, my colleague Aaron Erickson and I have added the ability for business collections to optimize queries through the implementation of indexes. Let's say you're using LINQ to query against CSLA .NET collections under CSLA .NET 3.5 or higher. If you've marked fields as indexable using the Indexable attribute, then the implementation of Where()—the method that makes filtering work in the query—will take advantage of indices internal to the collection.

LINQ and Projection

Running a query against any IEnumerable<T> and generating a Sequence<U> of results using LINQ to Objects is called projection. The execution of a simple LINQ to Objects query has two main parts: a where clause, which applies criteria (in the form of an anonymous delegate or lambda) to find items, and a select clause, which transforms each result into something else. Let's start with a typical LINQ query, such as this one that selects numbers equal to 42 from an array of numbers and returns the square of the result:

var coolNumbers = from n in allNumbers where n == 42 select n * n;

The LINQ query syntax is merely syntactic sugar that you can rewrite in more traditional form as follows:

var coolNumbers = allNumbers.Where(n => n == 42).Select(n => n*n);

This core mechanism—filter and project—is at the heart of every LINQ query. The filter (where) determines which items from the collection matter, and the projection (select) transforms each result from one thing into another thing.

Note

The term projection refers to the collection you can generate from the result of the Select() method.

Not all projections are equivalent. For example, let's say you call Select() like this: .Select(n => n*0). In this case, it is impossible to determine how a change in the projected object should be reflected in the original. Since each result of n*0 is always 0, there is no information in the resulting object that would allow you to know which object it maps back to in the source collection. Because of the lack of a definitive mapping, synchronization of any individual item is impossible.

That said, in certain cases—such as the projection of the same object type you started with in the original collection (an identity projection) where you're operating on exactly the same object type you started with—you don't have to worry about trying to figure out how changes in the projection reflect back to changes in the original. You can map an object in the projected collection back to the source collection in a way that is deterministic. This fact makes it possible for CSLA .NET to allow for binding to this type of projection.

Identity Projections and LinqBindingList<T>

An identity projection is a query in which the result of the query contains items of the same type as the source collection. In CSLA .NET 3.5 and higher, identity projections are handled in a special way that allows for binding to the projection. CSLA .NET changes the behavior of queries such that if you run an identity projection over BusinessListBase, you will get a result known as a LinqBindingList<T>.

Understanding LinqBindingList

The best way to understand LinqBindingList is to think about it as a filter over BusinessListBase. When you remove an item from a LinqBindingList, it not only removes the item from the list, but it removes the item from the source list, and it removes any other LinqBindingList instances from queries generated earlier from the same source BusinessListBase, as shown in Figure 14-2. If you change a property in an item from within the LinqBindingList, the n-level undo mechanism in BusinessListBase will track the change.

Relationships between BusinessListBase and derived instances of LinqBindingList

Figure 14.2. Relationships between BusinessListBase and derived instances of LinqBindingList

Adding an item to a LinqBindingList is possible, but only items that meet the criteria that were present upon creating the list using LINQ will appear in the LinqBindingList. On the other hand, adding an item to a LinqBindingList adds the item to the original list as well as any other LinqBindingList instances that resulted from queries against the original collection of business objects.

In the event that you decide to do a query from a LinqBindingList, another list of type LinqBindingList will be generated. It will act as though it were generated from the original list—not the sublist—and will exhibit all the behaviors described previously.

LinqBindingList is a class that is only ever meant to be produced from an identity projection performed over a class that derives from BusinessListBase. In this respect, it is similar to the Sequence class that LINQ to Objects provides, in that it is strictly the result of a LINQ query and not a structure that you would ever create outside of this context.

Overview of Indexed Search Using CSLA .NET

When searching for items in a BusinessListBase- or ReadOnlyListBase-based list, CSLA .NET is capable of doing an optimized search using an internal index if your child class marks the properties you are searching on as Indexable.

When either of the main collection classes in CSLA .NET encounters a child class with an Indexable attribute on a property, an internal index is built the first time a search using LINQ is conducted on that class. In cases where the property that is considered Indexable is IComparable, the indexing mechanism is a red-black tree. In cases where the property to be indexed is not IComparable (such as Color), the indexing mechanism is based on Dictionary<T>. Common IComparable property types that tend to be indexed are String, int, and DateTime, although others with custom IComparable implementations are possible.

Note

CSLA .NET supports the ability to plug in your own indexing provider via the IndexingProvider property on BusinessListBase and ReadOnlyListBase. Note that doing so is not for the faint of heart. I provide more information about how to implement such a provider at the end of this chapter.

In either case, it is important to understand that all optimizations come at a cost. By implementing indexing on your objects, you are making add, remove, and update operations slower on your objects, and, in exchange, gaining tremendous benefits for optimization of searches. When you use indexing properly on large collections of business objects in memory on fields where comparison is a costly operation (such as strings), you can gain performance improvements that are several orders of magnitude faster. However, using indexing on fields that are not searched frequently and/or are updated frequently can actually cause performance to degrade. It is critical to have a sense for how you're going to use your child class when you start to use the Indexable attribute on properties within it, just like you would be careful about how you apply an index to a database table.

Serialization and Indexing

When a CSLA .NET collection is serialized, the indices within the collection do not pass the serialization boundary. This would be possible in theory, but the performance implications of having not only the objects themselves but all the indices on those objects get passed over the wire are not insignificant. In addition, the indexing mechanism depends on hash-code generation of objects that is only consistent at the scope of a physical machine—that is, hash codes are not guaranteed to be equivalent on, say, a 64-bit operating system and a 32-bit operating system—so it would be impractical to translate the index values during serialization in the absence of a generic hash code generator that could guarantee durability across machine boundaries.

While the index itself is not passed over the serialization boundary, the index is re-created according to the options specified on the child class. This typically happens upon the first instance of a query that utilizes the index on the other side of the serialization boundary.

Index Mode

When marking a child class using the Indexable attribute, you can further specify the behavior of the index. In some cases, you might think it is desirable to force the creation of an index for a child class. For example, if you know for sure that you are going to use the index and you don't want to take the performance hit the first time you use it, you would use the IndexModeAlways option on the attribute. This way, when items are added, they're indexed immediately rather than the first time a query needs it. On the other hand, if it is important to optimize for load time on the collection, and if it is acceptable to take the hit on index creation the first time you do a query that may take advantage of the index, you would use IndexModeOnDemand. The IndexModeNever option is provided to mark cases in which you want to ensure that indexes are never created on a field.

Table 14-1 summarizes the parameters of the Indexable attribute.

Table 14.1. Index Modes

Parameter

Description

IndexModeEnum.IndexModeOnDemand

Load an index only when needed.

IndexModeEnum.IndexModeAlways

Always load and maintain an index for this property.

IndexModeEnum.IndexModeNever

Never create an index for this property.

The following example illustrates the use of the Indexable attribute on a child class:

class Student : BusinessBase {
  [Indexable(IndexModeEnum.IndexModeNever)]
  public byte[] Photo {get; set; } //we never search by photo
  [Indexable(IndexModeEnum.IndexModeAlways)]
  public string SSN {get; set; } //we search by SSN all the time
  [Indexable(IndexModeEnum.IndexModeOnDemand)]
  public string FirstName {get; set; } //not as common of a search
  [Indexable]
  public string LastName {get; set; } // default is IndexModeOnDemand
}

The IQueryable Implementation for CSLA .NET

When fully implemented, the IQueryable interface is the mechanism by which anyone can provide his own implementation of LINQ. LINQ to XML, LINQ to SQL, and dozens of other LINQ providers all implement this interface. Thus, in order to provide the expected behavior when you run a LINQ query against BusinessListBase, IQueryable has been implemented for BusinessListBase.

The IQueryable interface mandates only three members be implemented. Table 14-2 shows which members are required for an IQueryable implementation.

Table 14.2. IQueryable Members

Member

Description

ElementType

Provides the source type for the implementation

Expression

The most recent expression tree associated with the collection

Provider

The IQueryProvider implementation used by IQueryable

ElementType and Expression both have reasonably simple implementations. In CSLA .NET, ElementType simply returns an instance of the child type as follows:

public Type ElementType
  {
    get { return typeof(C); }
  }

The Expression property is also fairly simple. It returns the expression tree associated with the collection (I'll cover expression trees in the next section). In BusinessListBase, this represents the entire collection, because you use LinqBindingList rather than BusinessListBase to represent the result of a query. In BusinessListBase, Expression returns a private backing field where the current expression is held.

public Expression Expression
  {
    get {
        if (_expression == null)
            _expression = Expression.Constant(this);
        return _expression;
    }
  }

The last member in any IQueryable implementation is the Provider property. The Provider is where the core of any IQueryable implementation really occurs. The property itself is fairly simple, returning a new instance of the CslaQueryProvider:

public IQueryProvider Provider
  {
    get {
      return new Linq.CslaQueryProvider<T, C>(this);
    }
  }

CslaQueryProvider is the CSLA .NET implementation of the IQueryProvider interface, which defines a custom implementation of LINQ. Of course, the devil is in the details, and understanding how an IQueryProvider works is key to understanding what LINQ to CSLA .NET is doing behind the scenes.

Understanding Expression Trees

The key to understanding LINQ is to understand the role of expression trees in the implementation of LINQ. Expression trees are representations of source code as data. Consider the following code passed to the Where() method in LINQ:

var result = someCollection.Where( x => x.SomeVal == 42);

The parameter provided to the Where() method is a lambda function—a more compact syntax for an anonymous method. However, when Where() is invoked, the lambda function is converted into a data structure in the form of an expression tree that looks like Figure 14-3.

A simple expression tree

Figure 14.3. A simple expression tree

Expression trees allow for code to be executed by alternative means—that is, the code can be translated to other forms, such as SQL. For example, in LINQ to SQL, the previous expression will get converted into the following SQL query:

SELECT * FROM someCollection WHERE SomeVal = 42

What the expression gets converted to in other LINQ providers will vary according to the provider. For example, LINQ to XML could convert the expression to an XPath expression that ordinary code in the System.Xml namespace could execute.

In CSLA .NET, the key differences in how expressions are handled occur in two areas. First, the optimization of Where takes advantage of indexing (which I cover in detail later in this chapter). Second, CSLA .NET makes sure that LINQ queries against BusinessListBase return a LinqBindingList that synchronizes with the collection from which it was originally queried, rather than a Sequence that features no such synchronization. The indexing code analyzes the expression passed to the Where clause to determine if indexing is possible. The synchronization code analyzes the expression passed to the Select clause to determine if the projection is an identity projection, and thus, able to be synchronized.

Digging into IQueryProvider

The IQueryProvider interface contains, surprisingly, only four members, as shown in Table 14-3.

Table 14.3. IQueryProvider Members

Member

Description

CreateQuery()

Handles LINQ methods that return an IQueryable result

CreateQuery<TResult>()

CreateQuery(), but with a known return type

Execute()

Handles LINQ methods that return an IQueryable result

Execute<TResult>()

Execute(), but with a known return type

Of course, within this somewhat simple interface lies a great deal of complexity. The parameter provided to CreateQuery() or Execute() is an object of type Expression, which we must evaluate in a manner that returns an expected result. To further complicate this, when you choose to implement your own IQueryProvider, you give up the default LINQ to Objects implementation and are required to implement the entire range of possible Expression objects that might be passed in.

Evaluating Expression for Synchronization Support

To support synchronization, you must first determine whether you're handling an identity projection. That test, which is in CreateQuery(), is surprisingly simple. This only applies to the generic overload of CreateQuery() (CreateQuery<TResult>()), because in that version, it is known what type of IQueryable needs to be returned.

typeof(TElement) == typeof(C)

If this test passes, then you have an identity projection, because you'll be returning a collection that is of the same type as the source collection. In this case, you can return a LinqBindingList<C> object that will have all the appropriate information to allow for further query execution.

_filter = new LinqBindingList<C>(_parent, this, expression);
_filter.BuildFilterIndex();
return (IQueryable<TElement>)_filter;

Because there are corner cases in which the LinqBindingList needs to be referenced by the CslaQueryProvider object later on, you store it as a private field. The LinqBindingList needs to store three critical values, so the constructor takes three parameters: the original collection, the IQueryProvider instance, and the expression. The original collection is the _parent parameter. The IQueryProvider instance, which holds query execution state, is the this parameter. The expression, which allows the LinqBindingList to continue query execution when query results are requested, is the expression parameter.

Handling Non-Identity Projections

When the query does not return an identity projection, the CslaQueryProvider still needs to return the expected result. Also, non-identity projections still should be able to take advantage of the indexing features of CSLA .NET. In such cases, control of the query, outside of handling for the Where() method, is passed to the default implementation provided in LINQ to Objects.

Passing Control to the Default Handler

CreateQuery() and Execute() are called not only when a query is being established or filtered, but also when other LINQ extension methods are invoked, such as Count(), First(), and many others. To understand why the implementation of IQueryProvider works the way it does, you need to understand how LINQ to Objects injects itself into any IEnumerable.

All the methods in LINQ to Objects are implemented as static extension methods on the IEnumerable interface, defined in the Enumerable and Queryable classes in the System.Linq namespace. In fact, this is the reason why only upon including the System.Linq namespace do the LINQ extension methods come into scope. Implementation of IQueryable on a custom class changes this relationship. It is, by default, an all-or-nothing relationship, which means that once you implement your own IQueryProvider, you're expected to handle the entire set of extension methods defined on Enumerable (or at least, pass back a NonSupportedException).

However, in a case like CSLA .NET, a special problem is presented. Because you're overriding only part of LINQ, you have no desire to re-implement the other more than 100 members that constitute LINQ. To help with this, CSLA .NET matches the Expression passed to CreateQuery() or Execute() against one of the static methods on the Enumerable class. The following snippet from the CreateQuery() implementation shows an example of this:

MethodCallExpression mex = (MethodCallExpression) expression;
Type listType = typeof(Enumerable);
MethodInfo[] listMethods = listType.GetMethods();
foreach (MethodInfo method in listMethods)
if (MethodsEquivalent(mex,method))
{
  Type[] genericArguments = mex.Method.GetGenericArguments();
  MethodInfo genericMethodInfo = method.MakeGenericMethod(genericArguments);
  var testObject = genericMethodInfo.Invoke(null, paramList.ToArray());
  IQueryable<TElement> objectQuery =
    ((IEnumerable<TElement>)testObject).AsQueryable<TElement>();
  return objectQuery;
}

The code is fairly straightforward, with MethodsEquivalent() doing most of the work to determine whether a given MethodCallExpression (which all calls to CreateQuery() or Execute() pass) is logically equivalent to a given reflected method definition that comes from the Enumerable static class.

Once the correct MethodInfo from Enumerable is found, it is converted into a generic method call, which is then invoked dynamically via reflection and returned to the caller. The process is similar regardless of whether CreateQuery() or Execute() is being called.

This technique that CSLA .NET uses to match the Expression to the correct implementation using the MethodsEquivalent() function should continue to work even in the event that Microsoft adds more extension methods to IEnumerable<T> and implements them in the static Enumerable class. However, if Microsoft were to update the class name that holds the LINQ to Objects extension methods, a change to CSLA .NET would be required to support it.

LinqBindingList

The LinqBindingList<C> holds the details of the query generated by it (the expression and the IQueryProvider) and a reference to the parent list, so that changes, additions, and deletions in LinqBindingList<C> can be propagated to the parent list, and in turn, to other instances of LinqBindingList<C> that are derived from the same source.

LinqBindingList<C> itself implements the IQueryable<T> interface, allowing for subqueries against it that generate further LinqBindingList<C> instances. These subsequent LinqBindingList<C> instances are not children of the source LinqBindingList<C>. rather, they all have the same relationship to the original BusinessListBase<T, C>.

Indexed LINQ and CSLA .NET

Having the ability to analyze the expression tree that is passed to the Where() method in CSLA .NET provides an opportunity to optimize the query. Any indexing implementation is going to require

  • Building and maintaining an index

  • Intercepting the normal query operation such that the query can utilize the index

Managing the Index Set

Indices are maintained on fields that are marked with the Indexable property under circumstances described earlier in this chapter. These indices are managed through the _indexSet field, which is an object that conforms to the IIndexSet interface.

The index set hooks into the collection by being made aware of all changes, additions, and removals that can affect the index. Calls to methods on the _indexSet are made throughout BusinessListBase and ReadOnlyListBase, making it possible for the _indexSet to maintain itself.

Implementations of IIndex

Depending on the nature of your child object, there are several ways you can find items in a list quickly. As mentioned previously, you can organize those child objects that are comparable—that is, where someObj.SomeVal < x has meaning and is implemented—in a way that makes that query faster. CSLA .NET uses a balanced tree structure—a red-black tree—to perform this function. Other child objects that are not IComparable (such as Color) use a dictionary-based index instead.

One of the responsibilities of the index set is to choose an appropriate implementation of IIndex based on the type of object being indexed. An ideal indexing strategy allows not only for equality-based queries to use the index, but also for range-based queries to use the index.

Indices on IComparable Properties: Red-Black Tree

The red-black tree structure allows you to infer the order of the items. All items to the left of a given node in the tree are guaranteed to be lower. All items to the right of a given node in the tree are guaranteed to be higher. In the less-than operation, the key to making a ranged query work becomes enumerating through the left subtree of the node that either matches or is the lowest value that is higher than the value being compared. In a greater-than operation, it is the exact reverse: enumerating through the right subtree of the node that either matches or is the highest value that is lower than the value being compared.

The performance of lookup over balanced trees, such as red-black trees, is typically log(N) + 2, while retaining a good deal of memory efficiency and the ability to quickly find those items higher or lower than a given item. The performance characteristics of this lookup are good. Because CSLA .NET uses a red-black tree—a tree that guarantees that the depth is no deeper than log(N) + 2—you know that finding one item out of one million will take, at most, 22 comparison operations. Depending on where the item is, that same item without an index can take up to a million comparison operations, since without an index, each object has to be compared.

Indices on Other Properties: Dictionary

When the property you're building an index on does not implement IComparable, you can't use those structures in any kind of tree implementation, since placement of an item in a tree depends on item comparison. Therefore, to implement an index on a non-IComparable property, you need to use a Dictionary to enable fast lookup for equality-based queries.

The performance characteristics of the Dictionary are actually a little bit faster than a red-black tree most of the time, assuming that the GetHashCode() implementation generates a high probability that calls on different objects of your child type generate unique values. However, because there is a chance, however small, that two different objects can generate the same hash code, you may require more than the single compare operation that is typical with Dictionary. If you were to override GetHashCode() in a manner where you return the same number for all instances of your object, you would end up with zero benefit to using indexing at all.

Warning

Implementation of GetHashCode() on your own is almost always a bad idea unless you really know what you're doing.

Expression Evaluation

The ability to evaluate the expression passed to the Where() extension method is the key that makes any indexing implementation in LINQ possible. LINQ to CSLA .NET evaluates the query to see if it can possibly use an index. If it is unable to use an index, it searches for the item using the same method that LINQ does by itself—namely, it compares each item and returns those items that match.

LINQ does not allow the Where() method to have a LambdaExpression parameter whose Body is anything other than a BinaryExpression—that is, something that eventually returns a true or false result. The nature of the Where() operation itself is the application of a test of an object for set inclusion. All BinaryExpression objects have Left and Right properties that represent either side of an expression. For example, consider the following expression:

x.MyProperty == 42

In this case, the Left of the BinaryExpression is x.MyProperty, and the Right of the BinaryExpression is 42. You can also evaluate the NodeType of the BinaryExpression to learn that it is ExpressionType.Equal, indicating that this is an equality test.

Left Evaluation

The left side of the expression is evaluated to determine whether you're doing a query on a child property that has an index. Typically, this means that the Left expression is a property lookup, such as x.MyProperty. In other words, if the expression is a MemberExpression, CSLA .NET will know that you're possibly doing a property lookup on an indexed property. It can then check the Name of the MemberExpression and validate whether there is an index on a child property that matches the Name from the MemberExpression. If that test passes—that is, if there is a property lookup on the Left side—then you can continue doing an indexed query.

Right Evaluation

After CSLA .NET has determined that you're doing a query on an Indexable field, it evaluates the Right side to determine the value that you're comparing to. The contents of the right side can be much more variable, including everything from a ConstantExpression that represents a value (such as 42) to something more complex, such as another method call.

In the case of a ConstantExpression, CSLA .NET simply casts it back to an object that is comparable to the property on the Left side. Other expressions require a dynamic invocation of the expression. You do this by calling the Compile() method on the expression, compiling the expression, and subsequently calling DynamicInvoke() and executing the expression. The result is an object that should at least be able to be tested for equality, and possibly other operations depending on the IComparable status of the property.

Indexed Search

Now that CSLA .NET knows it has an Indexable property and a value to compare, it proceeds to utilize the index and uses the Dictionary or red-black tree-based implementation of IIndex to search the index rather than the collection itself,. If the property is not Indexable, CSLA .NET will simply run the search in the default manner that a typical IEnumerable would use.

The Indexing Object Model

While the current implementation of indexing allows you to do a great deal of optimization, you can add even more optimization if you want to put forth the effort to do so. The model for indexing, as illustrated in Figure 14-4, is designed to support the provision of your own indexing mechanism, if you wish to extend the indexing capabilities of CSLA .NET.

The indexing object model

Figure 14.4. The indexing object model

In the next section, I'll cover the details of this interface and classes. Should you decide to implement your own indexing provider, you'll have an understanding of the default one provided with CSLA .NET.

IIndexSet

IIndexSet<T> is the initial interface you must implement if you wish to implement your own CSLA .NET indexing provider. Table 14-4 shows the required members for IIndexSet. CSLA .NET base collection classes such as BusinessListBase<C, T> and ReadOnlyListBase<C, T> know how to leverage this interface to take advantage of indexing.

Table 14.4. IIndexSet Members

Member

Description

InsertItem(T item)

The index item on all indexable properties

InsertItem(T item, string property)

The index item only on property if it is indexable

RemoveItem(T item)

Removes the item from all indices

RemoveItem(T item, string property)

Removes the item only from the index specified by property

ReindexItem(T item)

Removes and adds the item from all indices

ReindexItem(T item, string property)

Removes and adds the item from a specific index

ClearIndexes()

Clears all the indices for all properties

ClearIndex(string property)

Clears a specific index

Search(Expression expr)

Performs a search, filtering by expr

this [string property]

Returns the IIndex<T> specified by property

Load(string property)

Creates an index on property over the source

IIndexSet needs to have a reference to the IEnumerable collection that it is indexing, such as BusinessListBase and other CSLA .NET collection classes. When you write a class that supports IIndexSet, you are typically managing the set of IIndex objects (individual indices, described in the following sections) that constitute the index set, and providing the logic for performing the search to determine whether the query can use an index.

You can think of the classes that implement this interface is coordinators for all indexing activity. Classes should evaluate the expression tree passed to Search(), determine the correct index to use (if applicable), and then either use the index (if it is an indexed property) or perform the search on the original collection (if there is no available index).

IndexSet

The IndexSet concrete class represents the default index provider implementation for CSLA .NET. The class maintains a set of indices appropriate for the property types that it is indexing. It provides an implementation of Search() that determines the IComparable status of the property being searched, determines the operation being used (such as equality and less-than operations), and assures that the appropriate operation is called on the index.

This class provides an example of most of the expression evaluation techniques I have talked about previously in this chapter. For example, the following method in the IndexSet class determines the value on the right side of the expression passed to Search():

private object GetRightValue(Expression rightSide)
  {
    //rightside is where I get the value...
    switch (rightSide.NodeType)
    {
      //shortcut constants, don't eval these, it will be faster
      case ExpressionType.Constant:
        ConstantExpression constExp = (ConstantExpression)rightSide;
        return (constExp.Value);
      // convert back to lambda and eval to get the value.
      default:
        //Lambdas can be created from expressions...
        LambdaExpression evalRight = Expression.Lambda(rightSide, null);
        //Compile it, invoke it, and get the resulting value
        return (evalRight.Compile().DynamicInvoke(null));
    }
  }

The key to this code is looking at the type of node you're dealing with. All Expression objects implement the NodeType property, which gives you clues about the nature of the Expression you're dealing with. You now care about only two real cases: whether you have a ConstantExpression, allowing you to bypass a more expensive call to Compile() and DynamicInvoke(), or whether you do not have a constant, in which case you'll have to compile the expression, invoke it, and get the value that it will ultimately return. In either case, you get back an object that boxes the value you want to use in the Search() operation.

Note

Compile() and DynamicInvoke() are expensive operations to conduct. You don't want to be doing these operations on a per-item basis as you search through a data structure.

Determining whether you have an index or not is much simpler.

bool IIndexSet<T>.HasIndexFor(string property)
  {
    return _internalIndexSet.ContainsKey(property);
  }
  private bool HasIndexablePropertyOnLeft(Expression leftSide)
  {
    if (leftSide.NodeType == ExpressionType.MemberAccess)
      return (
        this as IIndexSet<T>).HasIndexFor(((MemberExpression)leftSide).Member.Name
      );
    else
      return false;
  }

Determining whether you have an index for a given property specified by a string is relatively simple, as you have an internal field _internalIndexSet that is really just a Dictionary of indices. You can call ContainsKey() on the _internalIndexSet to determine whether you have the index.

The examination of the expression that represents the left side of the total expression is slightly more complex. You need for this left-side expression to be a MemberExpression in order to make any sense of it. Thankfully, this is common, given that it rarely makes sense to have anything other than a MemberExpression on the left side. Note that this may be possible technically (i.e., x => 1 == 1 can be passed, which, in theory, would return all results).

The first thing you do is test the NodeType to ensure that it's a MemberAccess expression. If that is in fact the case, you know you can cast it to a MemberExpression, which then gives you access to the name of the property you're looking at (such as the name SomeProp on the left side, which is in an expression such as x => x.SomeProp == 42). With this information, you have everything you need to determine whether or not you have an index on the property.

Once you have established that you have a property that is Indexable and a value to check against the index, you can test the operation between both sides of the expression to determine which index operation to perform.

Func<T, bool> exprCompiled = expr.Compile();
      BinaryExpression binExp = (BinaryExpression)expr.Body;
      object val = GetRightValue(binExp.Right);
      IRangeTestableIndex<T> rangedIndex;
      if (_internalIndexSet[property] is IRangeTestableIndex<T>)
      {
        rangedIndex = (IRangeTestableIndex<T>)_internalIndexSet[property];
switch (binExp.NodeType)
        {
          case ExpressionType.Equal:

            foreach (T item in
                       _internalIndexSet[property].WhereEqual(val, exprCompiled))
              yield return item;
            break;

          case ExpressionType.LessThan:

            foreach (T item in rangedIndex.WhereLessThan(val))
              yield return item;
            break;
            //and so forth, for other operations...

Based on the NodeType of the BinaryExpression object (shown as binExp), you can determine whether you have been passed an equality test, a less-than operation, or any other logical test. Of course, you have to make sure the index is IRangeTestableIndex before you try to use it in the context of a ranged logical operator such as less-than or greater-than. However, once you know that, you can handle each operation appropriately. If you can't handle the operation, pass it back to LINQ to Objects to handle in the default manner.

IIndex

IIndex represents the expected methods and behavior that an index should provide. Most implementations of IIndexSet contain one or more IIndex objects per property that is being indexed on T. IIndex derives from ICollection and adds the members shown in Table 14-5.

Table 14.5. IIndex Members

Member

Description

IndexField

Returns the PropertyInfo for the property that this index is indexing

WhereEqual()

Returns the items that match the value passed to it

ReIndex()

Removes and adds back the given item from the index

Loaded

Returns whether the index has been established

InvalidateIndex()

Sets the index as not loaded anymore

LoadComplete()

Sets the index as loaded

Writing an IIndex implementation involves implementing all the methods you typically would for ICollection and then adding implementation for the members shown in Table 14-5. A typical implementation wraps an appropriate data structure, such as a red-black tree or dictionary, or possibly even derives from it, and then adds the implementation of the methods, which make indexed operations able to work with an IIndexSet.

The only interesting part of building an IIndex implementation is the handling of WhereEqual(). Typically, WhereEqual() takes the object passed to it—which represents a value from a property of the child object—and uses some mechanism based on that value to find the item in a manner somehow faster than looking through every single one in the index. Exactly how this is done depends on the index structure. In a Dictionary-based structure, a hash code is generated from the property, and then the Where test is performed on each item with a matching hash code. On other structures, such as a red-black tree, the normal search mechanism for such a tree would be used to locate the appropriate key matches.

Index

The Index concrete class represents the default index implementation for child objects in CSLA .NET that don't implement IComparable. Like most IIndex implementations in CSLA .NET, the Index class implements ICollection methods, which then handle the collection operations that are ultimately handled by a private field where the actual index is kept. The following is typical for most ICollection methods in the Index implementation:

void ICollection<T>.Add(T item)
  {
    DoAdd(item);
  }

  private void DoAdd(T item)
  {
    //_theProp is the PropertyInfo this index is indexing
    if (_theProp != null)
    {
      int hashCode = _theProp.GetValue(item, null).GetHashCode();
      if (_index.ContainsKey(hashCode))
        _index[hashCode].Add(item);
      else
      {
        List<T> newList = new List<T>(1);
        newList.Add(item);
        _index.Add(hashCode, newList);
      }
    }
  }

For the most part, this aspect of index design—the implementation of the ICollection members—is pretty straightforward. More interesting is the implementation of WhereEqual().

IEnumerable<T> IIndex<T>.WhereEqual(object pivotVal, Func<T, bool> expr)
  {
    var hashCode = pivotVal.GetHashCode();
    LoadOnDemandIndex();
    if (_index.ContainsKey(hashCode))
      foreach (T item in _index[hashCode])
        if (expr(item))
          yield return item;
  }

This method is a little more interesting. Because it uses a Dictionary to implement the index behind the scenes, it needs to determine the hash code of the value it is comparing against. Since the index may not be loaded, the method attempts to load the index on demand—a call that will do nothing if the index is already loaded. Once it knows the index is loaded, it can retrieve the list of items that match the hash code, which is returned by _index[hashCode]. Only within this list, which is almost always one or two items unless there is a serious problem with the hash code, does it need to test using expr to make sure that the test passes.

This works because of the relationship between the result of GetHashCode() and Equals(). As a rule, any two objects that are equal must generate the same hash code. The reverse, however, isn't true, as two different objects can generate the same hash code.

As you would expect with an index, the result is that the number of comparisons required to find an item using an index is vastly lower than it would be without an index. In fact, the larger the collection you're using an index on, the more benefit there is to using an index.

IRangeTestableIndex

IRangeTestableIndex<T> extends the IIndex<T> interface by adding support for indices that are capable of optimizing ranged query options, such as those backed by an ordered data structure like a red-black tree. Table 14-6 shows its members.

Table 14.6. IRangeTestableIndex Members

Member

Description

WhereLessThan()

Returns the items less than the value passed

WhereLessThanOrEqualTo()

Returns the items less than or equal to the value passed

WhereGreaterThan()

Returns the items greater than the value passed

WhereGreaterThanOrEqualTo()

Returns the items greater than or equal to the value passed

While it is not required to use this interface in any given indexing implementation, the built-in indexing provider, IndexSet<T>, utilizes this interface to specify behavior for those cases where T is IComparable.

BalancedTreeIndex

The BalancedTreeIndex<T> concrete class implementing IRangeTestableIndex represents the default index implementation for child objects in CSLA .NET that implement IComparable. It is fundamentally similar to Index in how it manages its internal collection, but it uses a red-black tree rather than a Dictionary for storage.

One of the methods that IRangeTestableIndex requires is the WhereGreaterThan() method. The implementation of the index passes control to the internal data structure as follows:

public IEnumerable<T> WhereGreaterThan(object pivotVal)
  {
    var balancedIndex = (IBalancedSearch<T>)_index;
    return balancedIndex.ItemsGreaterThan(pivotVal);
  }

The red-black tree implementation implements an interface called IBalancedSearch<T>, which takes advantage of the data structure of a balanced tree in order to efficiently iterate through those items greater than the value passed to it.

Understanding the internals of how a red-black tree works is best addressed through others who have written extensively on the subject, notably Robert Sedgewick in his classic, Algorithms in C++ (Addison-Wesley Professional, 2002), which covers the implementation of red-black trees as well as numerous other useful low-level structures.

Conclusion

In this chapter, I covered how CSLA .NET works with LINQ to Objects. I showed you how to optimize the LINQ to Objects implementation to make sure it provides the expected behavior, and I explained how to use the capabilities of CSLA .NET to increase the performance of your in-memory queries. You should now understand the options for how you can index your collections in CSLA .NET, how projections work against CSLA .NET collections, and how CSLA .NET implements these features. Furthermore, you should have a reasonable idea of how you might get started to write your own indexing provider for CSLA .NET.

I'll provide information about how to use LINQ to SQL—or other LINQ providers in relation to data, for that matter—in Chapter 18, where I discuss the data portal and the various techniques you can use to implement it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset