Chapter 3. Programming the .NET Framework

Most modern programming languages include some form of runtime that provides common services and access to the underlying operating system and hardware. Examples range from a simple functional library, such as the ANSI C Runtime used by C and C++, to the rich object-oriented class libraries provided by the Java Runtime Environment.

Similar to the way that Java programs depend on the Java class libraries and virtual machine, C# programs depend on the services in the .NET Framework such as the framework class library (FCL) and the Common Language Runtime (CLR).

For a high-level overview of the FCL, see Chapter 4.

This chapter addresses the most common tasks you need to perform when building C# programs. These topics generally fall into one of two categories: leveraging functionality included in the FCL and interacting with elements of the CLR.

Common Types

Certain types in the FCL are ubiquitous, in that they are fundamental to the way the FCL and CLR work and provide common functionality used throughout the entire FCL.

This section identifies some of the most common of these types and provides guidelines to their usage. The types mentioned in this section all exist in the System namespace.

Object Class

The System.Object class is the root of the class hierarchy and serves as the base class for every other class. The C# object type aliases System.Object. System.Object provides a handful of useful methods that are present on all objects, and whose signatures are listed in the following fragment of the System.Object class definition:

public class Object {
   public Object(  ) {...}
   public virtual bool Equals(object o) {...}
   public virtual int GetHashCode(  ){...}
   public Type GetType(  ){...}
   public virtual string ToString(  ) {...}
   protected virtual void Finalize(  ) {...}
   protected object MemberwiseClone(  ) {...}
   public static bool Equals (object a, object b) {...}
   public static bool ReferenceEquals (object a, object b) {...}
}
Object( )

The constructor for the Object base class.

Equals(object o)

This method evaluates whether two objects are equivalent.

The default implementation of this method on reference types compares the objects by reference, so classes are expected to override this method to compare two objects by value.

In C#, you can also overload the == and != operators. For more information see Section 2.9.8.1.

GetHashCode( )

This method allows types to hash their instances. A hashcode is an integer value that provides a “pretty good” unique ID for an object. If two objects hash to the same value, there’s a good chance that they are also equal. If they don’t hash to the same value, they are definitely not equal.

The hashcode is used when you store a key in a dictionary or hashtable collection, but you can also use it to optimize Equals( ) by comparing hash codes and skipping comparisons of values that are obviously not equal. This is a gain only when it is cheaper to create the hashcode than perform an equality comparison. If your hashcode is based on immutable data members, you can make this a no-brainer by caching the hash code the first time it is computed.

The return value from this function should pass the following tests: (1) two objects representing the same value should return the same hashcode, and (2) the returned values should generate a random distribution at runtime.

The default implementation of GetHashCode doesn’t actually meet these criteria because it merely returns a number based on the object reference. For this reason, you should usually override this method in your own types.

An implementation of GetHashCode could simply add or multiply all the data members together. You can achieve a more random distribution of hash codes by combining each member with a prime number (see Example 3-1).

To learn more about how the hashcode is used by the predefined collection classes, see Section 3.4 later in this chapter.
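The caching idea mentioned above can be sketched as follows. This is only an illustration: the PersonName type, its fields, and the choice of 37 as the multiplier are assumptions made for the example, not part of the FCL.

```csharp
using System;

// Hypothetical immutable type: its fields never change after
// construction, so the hash code can be computed once and reused.
public sealed class PersonName {
  private readonly string first;
  private readonly string last;
  private int hash;          // cached hash code
  private bool hashComputed; // false until the first GetHashCode call

  public PersonName(string first, string last) {
    if (first == null || last == null)
      throw new ArgumentNullException();
    this.first = first;
    this.last = last;
  }

  public override int GetHashCode() {
    if (!hashComputed) {
      // unchecked: let the arithmetic wrap around on overflow
      unchecked {
        hash = (37 + first.GetHashCode()) * 37 + last.GetHashCode();
      }
      hashComputed = true;
    }
    return hash;
  }

  public override bool Equals(object o) {
    PersonName other = o as PersonName;
    if (other == null) return false;
    // Cheap rejection: different hash codes mean the values differ
    if (GetHashCode() != other.GetHashCode()) return false;
    return first == other.first && last == other.last;
  }
}

class TestPersonName {
  static void Main() {
    PersonName a = new PersonName("Arthur", "Dent");
    PersonName b = new PersonName("Arthur", "Dent");
    PersonName c = new PersonName("Ford", "Prefect");
    Console.WriteLine(a.Equals(b)); // Prints "True"
    Console.WriteLine(a.Equals(c)); // Prints "False"
  }
}
```

The hash comparison in Equals pays off only because the cached value makes GetHashCode cheaper than comparing the fields. This sketch is not thread-safe.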

GetType( )

This method provides access to the Type object representing the type of the object and should never be implemented by your types. To learn more about the Type object and reflection in general, see Section 3.10.

ToString( )

This method provides a string representation of the object and is generally intended for use when debugging or producing human-readable output.

The default implementation of this method merely returns the name of the type and should be overridden in your own types to return a meaningful string representation of the object. The predefined types such as int and string all override this method to return the value, as follows:

using System;
class Beeblebrox {}
class Test {
  static void Main(  ) {
    string s = "Zaphod";
    Beeblebrox b = new Beeblebrox(  );
    Console.WriteLine(s); // Prints "Zaphod"
    Console.WriteLine(b); // Prints "Beeblebrox"
  }
}
Finalize( )

The Finalize method cleans up nonmemory resources and is usually called by the garbage collector before reclaiming the memory for the object. The Finalize method can be overridden on any reference type, but this should be done only in a very few cases. For a discussion of finalizers and the garbage collector, see Section 3.12.

MemberwiseClone( )

This method creates shallow copies of the object and should never be implemented by your types. To learn how to control shallow/deep copy semantics on your own types, see Section 3.1.2 later in this chapter.

Equals/ReferenceEquals (object a, object b)

Equals tests for value equality, and ReferenceEquals tests for reference equality. Equals calls the instance Equals method on the first object, using the second object as a parameter. If both object references are null, it returns true; if only one reference is null, it returns false. The ReferenceEquals method returns true if both object references point to the same object or if both object references are null.
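The null handling and the difference between the two static methods can be seen in a short sketch. The strings are built from character arrays at runtime so that they are two distinct objects holding the same characters.

```csharp
using System;

class TestEquality {
  static void Main() {
    // Two distinct string objects with the same contents
    string a = new string(new char[] {'h', 'i'});
    string b = new string(new char[] {'h', 'i'});
    object n1 = null;
    object n2 = null;

    Console.WriteLine(Object.Equals(a, b));            // Prints "True"
    Console.WriteLine(Object.ReferenceEquals(a, b));   // Prints "False"
    Console.WriteLine(Object.Equals(n1, n2));          // Prints "True"
    Console.WriteLine(Object.Equals(a, n1));           // Prints "False"
    Console.WriteLine(Object.ReferenceEquals(n1, n2)); // Prints "True"
  }
}
```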

Creating FCL-friendly types

When defining new types that work well with the rest of the FCL, you should override several of these methods as appropriate.

Example 3-1 shows a new type with value-based equality that is intended to be a good citizen in the FCL:

Example 3-1. Defining new value type
// Point3D - a 3D point
// Compile with: csc /t:library Point3D.cs
using System;
public sealed class Point3D {
  int x, y, z;
  public Point3D(int x, int y, int z) {
    this.x=x; this.y=y; this.z=z; // Initialize data
  }
  public override bool Equals(object o) {
    if (o == (object) this) return true; // Identity test
    if (o == null) return false; // Safety test
    if (!(o is Point3D)) // Type equivalence test
      return false;
    Point3D p = (Point3D) o;
    return ((this.x==p.x) && (this.y==p.y) && (this.z==p.z));
  }
  public override int GetHashCode(  ){
    return ((((37+x)*37)+y)*37)+z; // :-)
  }
  public override string ToString(  ) {
    return String.Format("[{0},{1},{2}]", x, y, z);
  }
}

This class overrides Equals to provide value-based equality semantics, creates a hashcode that follows the rules described in the preceding section, and overrides ToString for easy debugging. It can be used as follows:

// TestPoint3D - test the 3D point type
// Compile with: csc /r:Point3D.dll TestPoint3D.cs
using System;
using System.Collections;
class TestPoint3D {
  static void Main(  ) {
    // Uses ToString, prints "p1=[1,1,1] p2=[2,2,2] p3=[2,2,2]"
    Point3D p1 = new Point3D(1,1,1);
    Point3D p2 = new Point3D(2,2,2);
    Point3D p3 = new Point3D(2,2,2);
    Console.WriteLine("p1={0} p2={1} p3={2}", p1, p2, p3);

    // Tests for equality to demonstrate Equals
    Console.WriteLine(Equals(p1, p2)); // Prints "False"
    Console.WriteLine(Equals(p2, p3)); // Prints "True"

    // Use a hashtable to cache each point's variable name
    // (uses GetHashCode).
    Hashtable ht = new Hashtable(  );
    ht[p1] = "p1"; 
    ht[p2] = "p2";
    ht[p3] = "p3"; // replaces ht[p2], since p2.Equals(p3)

    // Prints:
    //    p1 is at [1,1,1]
    //    p3 is at [2,2,2] 
    foreach (DictionaryEntry de in ht)
      Console.WriteLine("{0} is at {1} ", de.Value, de.Key);
  }
}

ICloneable Interface

public interface ICloneable {
  object Clone(  );
}

ICloneable allows class or struct instances to be cloned. It contains a single method named Clone that returns a copy of the instance. When implementing this interface, your Clone method can simply return this.MemberwiseClone( ), which performs a shallow copy (the fields are copied directly), or you can perform a custom deep copy, in which you clone individual fields in the class or struct. The following example is the simplest implementation of ICloneable:

public class Foo : ICloneable {
  public object Clone(  ) {
    return this.MemberwiseClone(  );
  }
}
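To contrast the two copy strategies, here is a minimal sketch of a deep copy. The Polygon type, its Sides field, and the ShallowCopy helper are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical type holding a reference-type field (an array)
public class Polygon : ICloneable {
  public int[] Sides;
  public Polygon(int[] sides) { Sides = sides; }

  // Shallow copy: the clone shares the same Sides array
  public Polygon ShallowCopy() {
    return (Polygon) this.MemberwiseClone();
  }

  // Deep copy: the clone gets its own copy of the Sides array
  public object Clone() {
    Polygon copy = (Polygon) this.MemberwiseClone();
    copy.Sides = (int[]) Sides.Clone();
    return copy;
  }
}

class TestClone {
  static void Main() {
    Polygon original = new Polygon(new int[] {3, 4, 5});
    Polygon shallow = original.ShallowCopy();
    Polygon deep = (Polygon) original.Clone();

    original.Sides[0] = 99;
    Console.WriteLine(shallow.Sides[0]); // Prints "99" (shared array)
    Console.WriteLine(deep.Sides[0]);    // Prints "3" (independent array)
  }
}
```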

IComparable Interface

public interface IComparable {
  int CompareTo(object o);
}

IComparable is implemented by types that have instances that can be ordered (see Section 3.4). It contains a single method named CompareTo that:

  • Returns a negative value if instance < o

  • Returns a positive value if instance > o

  • Returns 0 if instance == o

This interface is implemented by all numeric types, as well as by string, DateTime, etc. It may also be implemented by custom classes or structs to provide comparison semantics. For example:

using System;
using System.Collections;
class MyType : IComparable {
  public int x;
  public MyType(int x) {
    this.x = x;
  }
  public int CompareTo(object o) {
    return x -((MyType)o).x;
  }
}
class Test {
  static void Main(  ) {
    ArrayList a = new ArrayList(  );
    a.Add(new MyType(42));
    a.Add(new MyType(17));
    a.Sort(  );
    foreach(MyType t in a)
      Console.WriteLine(((MyType)t).x);
  }
}

IFormattable Interface

public interface IFormattable {
  string ToString(string format, IFormatProvider formatProvider);
}

The IFormattable interface is implemented by types that have formatting options for converting their value to a string representation. For instance, a decimal may be converted to a string representing currency or to a string that uses a comma for a decimal point. The formatting options are specified by the format string (see Section 3.3.4 later in this chapter). If an IFormatProvider interface is supplied, it specifies the specific culture to be used for the conversion.

IFormattable is commonly used when calling one of the String class Format methods (see Section 3.3).

Most of the common value types (int, decimal, DateTime, etc.) implement this interface, and you should implement it on your own types if you want them to be fully supported by the String class when formatting. Note that string itself does not implement IFormattable.
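A custom implementation might look like the following sketch. The Temperature type and its two format strings ("C" for Celsius, "F" for Fahrenheit) are invented for this example; they are not standard format specifiers.

```csharp
using System;
using System.Globalization;

// Hypothetical type with two made-up format strings:
// "C" for Celsius (the default) and "F" for Fahrenheit.
public struct Temperature : IFormattable {
  private double celsius;
  public Temperature(double celsius) { this.celsius = celsius; }

  public override string ToString() {
    return ToString(null, null);
  }

  public string ToString(string format, IFormatProvider formatProvider) {
    if (format == null || format == "") format = "C";
    if (formatProvider == null)
      formatProvider = CultureInfo.CurrentCulture;
    switch (format) {
      case "C":
        return celsius.ToString("F1", formatProvider) + " C";
      case "F":
        return (celsius * 9 / 5 + 32).ToString("F1", formatProvider) + " F";
      default:
        throw new FormatException("Unknown format: " + format);
    }
  }
}

class TestTemperature {
  static void Main() {
    Temperature t = new Temperature(21.5);
    // InvariantCulture is passed so the decimal separator is always "."
    Console.WriteLine(String.Format(CultureInfo.InvariantCulture,
                                    "{0} = {0:F}", t));
    // Prints "21.5 C = 70.7 F"
  }
}
```

String.Format passes the text after the colon in each format specifier (or null when there is none) as the format argument, which is why the implementation treats null as the default format.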

Math

C# and the FCL provide a rich set of features that make math-oriented programming easy and efficient.

This section identifies some of the most common types applicable to math programming and demonstrates how to build new math types. The types mentioned in this section exist in the System namespace.

Language Support for Math

C# has many useful features for math and can even build custom mathematical types. Operator overloading allows custom mathematical types, such as complex numbers and vectors, to be used in a natural way. Rectangular arrays provide a fast and easy way to express matrices. Finally, structs allow the efficient creation of low-overhead objects. For example:

struct Vector {
  float direction;
  float magnitude;
  public Vector(float direction, float magnitude) {
    this.direction = direction;
    this.magnitude = magnitude;
  }
  public static Vector operator *(Vector v, float scale) {
    return new Vector(v.direction, v.magnitude * scale);
  }
  public static Vector operator /(Vector v, float scale) {
    return new Vector(v.direction, v.magnitude / scale);
  }
  // ...
}
class Test {
  static void Main(  ) {
    Vector [,] matrix = {{new Vector(1f,2f), new Vector(6f,2f)},
                         {new Vector(7f,3f), new Vector(4f,9f)}};
    for (int i=0; i<matrix.GetLength(0); i++)
      for (int j=0; j<matrix.GetLength(1); j++)
        matrix[i, j] *= 2f;
  }
}

Special Types and Operators

The decimal datatype is useful for financial calculations, since it is a base-10 number that can store 28 to 29 significant figures (see Section 2.2.5.3).

The checked operator allows integral operations to be checked for overflow (see Section 2.4.2).
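Both features can be illustrated with a brief sketch. The values are arbitrary and chosen only to trigger the behaviors being described.

```csharp
using System;

class TestCheckedAndDecimal {
  static void Main() {
    // decimal stores base-10 fractions exactly; double does not
    decimal dm = 0.1m + 0.2m;
    double dd = 0.1 + 0.2;
    Console.WriteLine(dm == 0.3m); // Prints "True"
    Console.WriteLine(dd == 0.3);  // Prints "False"

    int big = int.MaxValue;
    try {
      int overflow = checked(big + 1); // throws OverflowException
      Console.WriteLine(overflow);
    }
    catch (OverflowException) {
      Console.WriteLine("Overflow detected");
    }
    // unchecked arithmetic silently wraps around
    Console.WriteLine(unchecked(big + 1) == int.MinValue); // Prints "True"
  }
}
```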

Math Class

The Math class provides static methods and constants for basic mathematical purposes. All trigonometric and exponential functions use the double type, and all angles use radians. For example:

using System;
class Test {
  static void Main(  ) {
    double a = 3;
    double b = 4;
    double C = Math.PI / 2;
    double c = Math.Sqrt (a*a+b*b-2*a*b*Math.Cos(C));
    Console.WriteLine("The length of side c is "+c);
  }
}

Random Class

The Random class produces pseudo-random numbers and may be extended if you require greater randomness (for cryptographically strong random numbers, see the System.Security.Cryptography.RandomNumberGenerator class). The random values returned are always between a minimum (inclusive) value and a maximum (exclusive) value. By default, the Random class uses the current time as its seed, but a custom seed can also be supplied to the constructor. Here’s a simple example:

Random r = new Random(  );
Console.WriteLine(r.Next(50)); // returns a value from 0 to 49
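Supplying a seed makes the sequence repeatable, which can be handy in tests. In this sketch the seed 12345 is an arbitrary value chosen for illustration.

```csharp
using System;

class TestRandom {
  static void Main() {
    // Two generators built from the same seed yield the same sequence
    Random r1 = new Random(12345);
    Random r2 = new Random(12345);
    for (int i = 0; i < 3; i++) {
      int a = r1.Next(1, 7); // 1 (inclusive) to 7 (exclusive): a die roll
      int b = r2.Next(1, 7);
      Console.WriteLine("{0} {1} {2}", a, b, a == b); // a == b is True
    }
    Console.WriteLine(r1.NextDouble() < 1.0); // Prints "True"
  }
}
```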

Strings

C# offers a wide range of string-handling features. Support is provided for both mutable and immutable strings, extensible string formatting, locale-aware string comparisons, and multiple string encoding systems.

This section introduces and demonstrates the most common types you’ll use when working with strings. Unless otherwise stated, the types mentioned in this section all exist in the System or System.Text namespaces.

String Class

A C# string represents an immutable sequence of characters and aliases the System.String class. Strings have comparison, appending, inserting, conversion, copying, formatting, indexing, joining, splitting, padding, trimming, removing, replacing, and searching methods. The compiler converts + operations in which either operand is a string into calls to the Concat methods, and it preevaluates and interns string constants wherever possible.

Immutability of Strings

Strings are immutable, which means they can’t be modified after creation. Consequently, many of the methods that initially appear to modify a string actually create a new string:

string a = "Heat";
string b = a.Insert(3, "r");
Console.WriteLine(b); // Prints Heart

If you need a mutable string, see the StringBuilder class.

String Interning

In addition, the immutability of strings enables all strings in an application to be interned. Interning describes the process whereby all the constant strings in an application are stored in a common place, and any duplicate strings are eliminated. This saves space at runtime but creates the possibility that multiple string references will point at the same spot in memory. This can be the source of unexpected results when comparing two constant strings, as follows:

string a = "hello";
string b = "hello";
Console.WriteLine(a == b); // True for String only
Console.WriteLine(a.Equals(b)); // True for all objects
Console.WriteLine(Object.ReferenceEquals(a, b)); // True!!

Formatting Strings

The Format method provides a convenient way to build strings that embed string representations of a variable number of parameters. Each parameter can be of any type, including both predefined types and user-defined types.

The Format method takes a format-specification string and a variable number of parameters. The format-specification string defines the template for the string and includes format specifications for each of the parameters. The syntax of a format specifier looks like this:

{ParamIndex[,MinWidth][:FormatString]}
ParamIndex

The zero-based index of the parameter to be formatted.

MinWidth

The minimum number of characters for the string representation of the parameter, to be padded by spaces if necessary (negative is left-justified, positive is right-justified).

FormatString

If the parameter represents an object that implements IFormattable, the FormatString is passed to the ToString method on IFormattable to construct the string. If not, the ToString method on Object is used to construct the string.

Tip

Most of the common value types (int, decimal, DateTime, etc.) implement IFormattable. A table of the numeric and picture format specifiers supported by the common predefined types is provided in Appendix C.

In the following example, we embed a basic string representation of the account number i (param 0) and a monetary string representation of the cash amount m (param 1, C=Currency):

using System;
class TestFormatting {
  static void Main(  ) {
    int i = 2;
    decimal m = 42.73m;
    string s = String.Format("Account {0} has {1:C}.", i, m);
    Console.WriteLine(s); // Prints "Account 2 has $42.73." (en-US culture)
  }
}
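The MinWidth part of a format specifier can be seen in the following sketch, which lines up two columns. The sample values are arbitrary, and the invariant culture is passed so that the decimal separator does not vary by locale.

```csharp
using System;
using System.Globalization;

class TestAlignment {
  static void Main() {
    // {0,-8}:   left-justified in 8 columns
    // {1,6:F2}: right-justified in 6 columns, two decimal places
    string row = String.Format(CultureInfo.InvariantCulture,
                               "[{0,-8}|{1,6:F2}]", "Tea", 3.5);
    Console.WriteLine(row); // Prints "[Tea     |  3.50]"
  }
}
```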

Indexing Strings

Consistent with all other indexing in the CLR, the characters in a string are accessed with a zero-based index:

using System;
class TestIndexing {
  static void Main(  ) {
    string s = "Going down?";
    for (int i=0; i<s.Length; i++)
      Console.WriteLine(s[i]); // Prints s vertically
  }
}

Encoding Strings

Strings can be converted between different character encodings using the Encoding type. The Encoding type can’t be created directly, but the ASCII, Unicode, UTF7, UTF8, and BigEndianUnicode static properties on the Encoding type return correctly constructed instances.

Here is an example that converts an array of bytes into a string using the ASCII encoding:

using System;
using System.Text;
class TestEncoding {
  static void Main(  ) {
    byte[] ba = new byte[] { 67, 35, 32, 105, 115, 
                             32, 67, 79, 79, 76, 33 };
    string s = Encoding.ASCII.GetString(ba);
    Console.WriteLine(s); // Prints "C# is COOL!"
  }
}

StringBuilder Class

The StringBuilder class is used to represent mutable strings. It starts at a predefined size (16 characters by default) and grows dynamically as more string data is added. It can either grow unbounded or up to a configurable maximum. For example:

using System;
using System.Text;
class TestStringBuilder {
  static void Main(  ) {
    StringBuilder sb = new StringBuilder("Hello, ");
    sb.Append("World?");
    sb[12] = '!';
    Console.WriteLine(sb); // Hello, World!
  }
}

Collections

Collections are standard data structures that supplement arrays, the only built-in data structures in C#. This differs from languages such as Perl and Python, which incorporate key-value data structures and dynamically sized arrays into the language itself.

The FCL includes a set of types that provide commonly required data structures and support for creating your own. These types are typically broken down into two categories: interfaces that define a standardized set of design patterns for collection classes in general, and concrete classes that implement these interfaces and provide a usable range of data structures.

This section introduces all the concrete collection classes and abstract collection interfaces and provides examples of their use. Unless otherwise stated, the types mentioned in this section all exist in the System.Collections namespace.

Concrete Collection Classes

The FCL includes the concrete implementations of the collection design patterns that are described in this section.

Unlike C++, C# doesn’t yet support templates, so these implementations work generically by accepting elements of type System.Object.

ArrayList class

ArrayList is a dynamically sized array of objects that implements the IList interface (see Section 3.4.2.5 later in this chapter). An ArrayList works by maintaining an internal array of objects that is replaced with a larger array when it reaches its capacity of elements. It is very efficient at adding elements (since there is usually a free slot at the end) but is inefficient at inserting elements (since all elements have to be shifted to make a free slot). Searching can be efficient if the BinarySearch method is used, but you must Sort( ) the ArrayList first. You could use the Contains( ) method, but it performs a linear search in O(n) time.

ArrayList a = new ArrayList(  );
a.Add("Vernon");
a.Add("Corey");
a.Add("William");
a.Add("Muzz");
a.Sort(  );
for(int i = 0; i < a.Count; i++)
   Console.WriteLine(a [i]);
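The two search approaches can be compared in a short sketch that reuses the same names. "Zaphod" is simply a value that is not in the list.

```csharp
using System;
using System.Collections;

class TestBinarySearch {
  static void Main() {
    ArrayList a = new ArrayList();
    a.Add("Vernon");
    a.Add("Corey");
    a.Add("William");
    a.Add("Muzz");
    a.Sort(); // BinarySearch requires sorted contents

    // Sorted order: Corey, Muzz, Vernon, William
    Console.WriteLine(a.BinarySearch("Muzz"));       // Prints "1"
    Console.WriteLine(a.BinarySearch("Zaphod") < 0); // Prints "True" (not found)
    Console.WriteLine(a.Contains("Vernon"));         // Prints "True" (linear search)
  }
}
```

BinarySearch returns the index of the element when it is found and a negative number otherwise.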

BitArray class

A BitArray is a dynamically sized array of Boolean values. It is more memory-efficient than a simple array of bools because it uses only one bit for each value, whereas a bool array uses one byte for each value. Here is an example of its use:

BitArray bits = new BitArray( 0 ); // initialize a zero-length bit array
bits.Length = 2;
bits[1] = true;
bits.Xor(bits); // Xor the array with itself

Hashtable class

A Hashtable is a standard dictionary (key/value) data structure that uses a hashing algorithm to store and index values efficiently. This hashing algorithm is performed using the hashcode returned by the GetHashCode method on System.Object. Types used as keys in a Hashtable should therefore override GetHashCode to return a good hash of the object’s internal value.

Hashtable ht = new Hashtable(  );
ht["One"] = 1;
ht["Two"] = 2;
ht["Three"] = 3;
Console.WriteLine(ht["Two"]); // Prints "2"

Hashtable also implements IDictionary (see Section 3.4.2.6 later in this chapter), and therefore, can be manipulated as a normal dictionary data structure.

Queue class

A Queue is a standard first-in, first-out (FIFO) data structure, providing simple operations to enqueue, dequeue, peek, etc. Here is an example:

Queue q = new Queue(  );
q.Enqueue(1);
q.Enqueue(2);
Console.WriteLine(q.Dequeue(  )); // Prints "1"
Console.WriteLine(q.Dequeue(  )); // Prints "2"

SortedList class

A SortedList is a standard dictionary data structure that uses a binary-chop search to index efficiently. SortedList implements IDictionary (see Section 3.4.2.6):

SortedList s = new SortedList(  );
s["Zebra"] = 1;
s["Antelope"] = 2;
s["Eland"] = 3;
s["Giraffe"] = 4;
s["Meerkat"] = 5;
s["Dassie"] = 6;
s["Tokoloshe"] = 7;
Console.WriteLine(s["Meerkat"]); // Prints "5" in 3 lookups

Stack class

A Stack is a standard last-in first-out (LIFO) data structure:

Stack s = new Stack(  );
s.Push(1); // Stack = 1
s.Push(2); // Stack = 1,2
s.Push(3); // Stack = 1,2,3
Console.WriteLine(s.Pop(  )); // Prints 3, Stack=1,2
Console.WriteLine(s.Pop(  )); // Prints 2, Stack=1
Console.WriteLine(s.Pop(  )); // Prints 1, Stack=

StringCollection class

A StringCollection is a standard collection data structure for storing strings. StringCollection lives in the System.Collections.Specialized namespace and implements ICollection. It can be manipulated like a normal collection (see Section 3.4.2.3):

StringCollection sc = new StringCollection(  );
sc.Add("s1");
string[] sarr =  {"s2", "s3", "s4"};
sc.AddRange(sarr);
foreach (string s in sc)
  Console.Write("{0} ", s); // s1 s2 s3 s4

Collection Interfaces

The collection interfaces provide standard ways to enumerate, populate, and author collections. The FCL defines the interfaces in this section to support the standard collection design patterns.

IEnumerable interface

public interface IEnumerable {
  IEnumerator GetEnumerator(  );
}

The C# foreach statement works on any collection that implements the IEnumerable interface. The IEnumerable interface has a single method that returns an IEnumerator object.

IEnumerator interface

public interface IEnumerator {
   bool MoveNext(  );
   object Current {get;}
   void Reset(  );
}

The IEnumerator interface provides a standard way to iterate over collections. Internally, an IEnumerator maintains the current position of an item in the collection. If the items are numbered 0 (inclusive) to n (exclusive), the current position starts off as -1 and finishes at n.

IEnumerator is typically implemented as a nested type and is initialized by passing the collection to the constructor of the IEnumerator:

using System;
using System.Collections;
public class MyCollection : IEnumerable {
  // Contents of this collection.
  private string[] items = {"hello", "world"};
  // Accessors for this collection.
  public string this[int index] { 
    get { return items[index]; }
  }
  public int Count {
    get { return items.Length; }
  }
  // Implement IEnumerable.
  public virtual IEnumerator GetEnumerator (  ) {
    return new MyCollection.Enumerator(this);
  }
  // Define a custom enumerator.
  private class Enumerator : IEnumerator { 
    private MyCollection collection;
    private int currentIndex = -1;
    internal Enumerator (MyCollection collection) {
      this.collection = collection;
    }
    public object Current {
      get {
        if (currentIndex==-1 || currentIndex == collection.Count)
          throw new InvalidOperationException(  );
        return collection [currentIndex];
      }
    }
    public bool MoveNext (  ) {
      // Advance only while still inside the collection, so repeated
      // calls past the end keep returning false
      if (currentIndex < collection.Count)
        currentIndex++;
      return currentIndex < collection.Count;
    }
    public void Reset (  ) {
      currentIndex = -1;
    }
  }
}

The collection can then be enumerated in either of these two ways:

MyCollection mcoll = new MyCollection(  );
// Using foreach: substitute your typename for string
foreach (string item in mcoll) {
  Console.WriteLine(item);
}
// Using IEnumerator: substitute your typename for string
IEnumerator ie = mcoll.GetEnumerator(  );
while (ie.MoveNext(  )) {
  string item = (string) ie.Current;
  Console.WriteLine(item);
}

ICollection interface

public interface ICollection : IEnumerable {
   void CopyTo(Array array, int index);
   int Count {get;}
   bool IsSynchronized {get;}
   object SyncRoot {get;}
}

ICollection is the interface implemented by all collections, including arrays, and provides the following members:

CopyTo(Array array, int index)

This method copies all the elements of the collection into the array, starting at the specified index in the target array.

Count

This property returns the number of elements in the collection.

IsSynchronized

This property indicates whether or not a collection is thread-safe. The collections provided in the FCL are not themselves thread-safe, but several of them (such as ArrayList and Hashtable) include a static Synchronized method that returns a thread-safe wrapper of the collection.

SyncRoot

This property returns an object (usually the collection itself) that can be locked to provide basic thread-safe support for the collection.
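The following sketch shows these members in use with an ArrayList. It runs on a single thread, so it only demonstrates the calls involved, not an actual race.

```csharp
using System;
using System.Collections;

class TestSynchronization {
  static void Main() {
    ArrayList plain = new ArrayList();
    Console.WriteLine(plain.IsSynchronized); // Prints "False"

    // Synchronized returns a wrapper whose individual operations
    // are thread-safe
    ArrayList safe = ArrayList.Synchronized(plain);
    Console.WriteLine(safe.IsSynchronized);  // Prints "True"
    safe.Add("one");
    safe.Add("two");

    // Enumeration is a multistep operation, so lock SyncRoot for
    // its whole duration
    lock (safe.SyncRoot) {
      foreach (string s in safe)
        Console.WriteLine(s);
    }

    // CopyTo fills a target array starting at the given index
    string[] target = new string[3];
    safe.CopyTo(target, 1);
    Console.WriteLine(target[1]); // Prints "one"
  }
}
```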

IComparer interface

IComparer is a standard interface that compares two objects for sorting in Arrays. You generally don’t need to implement this interface, since a default implementation that uses the IComparable interface is already provided by the Comparer type, which is used by the Array type.

public interface IComparer {
   int Compare(object x, object y);
}
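When the default ordering is not what you want, a custom comparer can be passed to the sort routine. In this sketch, LengthComparer is a hypothetical class written for the example.

```csharp
using System;
using System.Collections;

// Hypothetical comparer that orders strings by length, shortest first
class LengthComparer : IComparer {
  public int Compare(object x, object y) {
    string a = (string) x;
    string b = (string) y;
    return a.Length.CompareTo(b.Length);
  }
}

class TestComparer {
  static void Main() {
    string[] names = {"William", "Muzz", "Vernon", "Corey"};
    Array.Sort(names, new LengthComparer());
    Console.WriteLine(String.Join(",", names));
    // Prints "Muzz,Corey,Vernon,William"
  }
}
```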

IList interface

IList is an interface for array-indexable collections, such as ArrayList.

public interface IList : ICollection, IEnumerable {
   object this [int index] {get; set;}
   bool IsFixedSize { get; }
   bool IsReadOnly { get; }
   int Add(object o);
   void Clear(  );
   bool Contains(object value);
   int IndexOf(object value);
   void Insert(int index, object value);
   void Remove(object value);
   void RemoveAt(int index);
}

IDictionary interface

IDictionary is an interface for key/value-based collections, such as Hashtable and SortedList.

public interface IDictionary : ICollection, IEnumerable {
   object this [object key] {get; set;}
   bool IsFixedSize { get; }
   bool IsReadOnly { get; }
   ICollection Keys {get;}
   ICollection Values {get;}
   void Add(object key, object value);
   void Clear(  );
   bool Contains(object key);
   IDictionaryEnumerator GetEnumerator(  );
   void Remove(object key);
}

IDictionaryEnumerator interface

IDictionaryEnumerator is a standardized interface that enumerates over the contents of a dictionary.

public interface IDictionaryEnumerator : IEnumerator {
   DictionaryEntry Entry {get;}
   object Key {get;}
   object Value {get;}
}
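A dictionary enumerator can also be driven by hand, as in this sketch. A SortedList is used so that the entries come back in a predictable key order.

```csharp
using System;
using System.Collections;

class TestDictionaryEnumerator {
  static void Main() {
    SortedList list = new SortedList();
    list["b"] = 2;
    list["a"] = 1;

    IDictionaryEnumerator e = list.GetEnumerator();
    while (e.MoveNext()) {
      // Key and Value are shortcuts for Entry.Key and Entry.Value
      Console.WriteLine("{0}={1}", e.Key, e.Entry.Value);
    }
    // Prints:
    //    a=1
    //    b=2
  }
}
```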

IHashCodeProvider interface

IHashCodeProvider is a standard interface used by the Hashtable collection to hash its objects for storage.

public interface IHashCodeProvider {
   int GetHashCode(object o);
}

Regular Expressions

The FCL includes support for regular expression matching and replacement. The expressions are based on Perl 5 regular expressions and include lazy quantifiers (e.g., ??, *?, +?, and {n,m}?), positive and negative lookahead, and conditional evaluation.

The types mentioned in this section all exist in the System.Text.RegularExpressions namespace.

Regex Class

The Regex class is the heart of the FCL regular expression support. Used both as an object instance and a static type, the Regex class represents an immutable, compiled instance of a regular expression that can be applied to a string via a matching process.

Internally, the regular expression is stored as either a sequence of internal regular expression bytecodes that are interpreted at match time or as compiled MSIL opcodes that are JIT-compiled by the CLR at runtime. This allows you to make a tradeoff between worsened regular expression startup time and memory utilization versus higher raw match performance at runtime.

For more information on the regular expression options, supported character escapes, substitution patterns, character sets, positioning assertions, quantifiers, grouping constructs, backreferences, and alternation, see Appendix B.

Match and MatchCollection Classes

The Match class represents the result of applying a regular expression to a string, looking for the first successful match. The MatchCollection class contains a collection of Match instances that represent the result of applying a regular expression to a string repeatedly until the first unsuccessful match occurs.
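The difference between a single Match and a MatchCollection can be sketched as follows. The sample text and pattern are arbitrary; RegexOptions.Compiled is included only to show how the compiled form is requested.

```csharp
using System;
using System.Text.RegularExpressions;

class TestMatches {
  static void Main() {
    string text = "one 22 three 4444";
    // RegexOptions.Compiled trades startup time for match speed
    Regex digits = new Regex(@"\d+", RegexOptions.Compiled);

    Match first = digits.Match(text);           // first successful match only
    Console.WriteLine(first.Value);             // Prints "22"

    MatchCollection all = digits.Matches(text); // every match
    Console.WriteLine(all.Count);               // Prints "2"
    foreach (Match m in all)
      Console.WriteLine("{0} at {1}", m.Value, m.Index);
    // Prints:
    //    22 at 4
    //    4444 at 13
  }
}
```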

Group Class

The Group class represents the results from a single grouping expression. From this class, it is possible to drill down to the individual subexpression matches with the Captures property.

Capture and CaptureCollection Classes

The CaptureCollection class contains a collection of Capture instances, each representing the results of a single subexpression match.

Using Regular Expressions

Combining these classes, you can create the following example:

/*
 * Sample showing multiple groups
 * and groups with multiple captures
 */
using System;
using System.Text.RegularExpressions;
class Test {
  static void Main(  ) {
    string text = "abracadabra1abracadabra2abracadabra3";
    string pat = @"
      (       # start the first group
        abra  # match the literal 'abra'
        (     # start the second (inner) group
        cad   # match the literal 'cad'
        )?    # end the second (optional) group
      )       # end the first group
     +        # match one or more occurrences
     ";
    Console.WriteLine("Original text = [{0}]", text);
    // Create the Regex. IgnorePatternWhitespace permits 
    // whitespace and comments.
    Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
    int[] gnums = r.GetGroupNumbers(  ); // get the list of group numbers
    Match m = r.Match(text); // get first match
    while (m.Success) {
      Console.WriteLine("Match found:");
      // start at group 1
      for (int i = 1; i < gnums.Length; i++) {
        Group g = m.Groups[gnums[i]]; // get the group for this match
        Console.WriteLine("\tGroup{0}=[{1}]", gnums[i], g);
        CaptureCollection cc = g.Captures; // get caps for this group
        for (int j = 0; j < cc.Count; j++) {
          Capture c = cc[j];
          Console.WriteLine("\t\tCapture{0}=[{1}] Index={2} Length={3}",
                            j, c, c.Index, c.Length);
        }
      }
      m = m.NextMatch(  ); // get next match
    } // end while
  }
}

The preceding example produces the following output:

Original text = [abracadabra1abracadabra2abracadabra3]
Match found:
        Group1=[abra]
                Capture0=[abracad] Index=0 Length=7
                Capture1=[abra] Index=7 Length=4
        Group2=[cad]
                Capture0=[cad] Index=4 Length=3
Match found:
        Group1=[abra]
                Capture0=[abracad] Index=12 Length=7
                Capture1=[abra] Index=19 Length=4
        Group2=[cad]
                Capture0=[cad] Index=16 Length=3
Match found:
        Group1=[abra]
                Capture0=[abracad] Index=24 Length=7
                Capture1=[abra] Index=31 Length=4
        Group2=[cad]
                Capture0=[cad] Index=28 Length=3

Input/Output

The FCL provides a streams-based I/O framework that can handle a wide range of stream and backing store types. This support for streams also infuses the rest of the FCL, with the pattern repeating in non-I/O areas such as cryptography, HTTP support, and more.

This section describes the core stream types and provides examples. The types mentioned in this section all exist in the System.IO namespace.

Streams and Backing Stores

A stream represents the flow of data coming in and out of a backing store. A backing store represents the endpoint of a stream. Although a backing store is often a file or network connection, in reality it can represent any medium capable of reading or writing raw data.

A simple example would be to use a stream to read and write to a file on disk. However, streams and backing stores are not limited to disk and network I/O. A more sophisticated example would be to use the cryptography support in the FCL to encrypt or decrypt a stream of bytes as they move around in memory.

Abstract Stream class

Stream is an abstract class that defines operations for reading and writing a stream of raw, typeless data as bytes. Once a stream has been opened, it stays open and can be read from or written to until it is closed. Flushing a stream pushes any buffered writes through to the backing store; closing a stream first flushes it and then releases it.

Stream has the properties CanRead, CanWrite, Length, CanSeek, and Position. CanSeek is true if the stream supports random access and false if it only supports sequential access. If a stream supports random access, set the Position property to move to a linear position on that stream.
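
As a minimal sketch of random access, the following checks CanSeek and then sets Position before reading. It uses a MemoryStream only so that no file on disk is needed; the names SeekDemo and ByteAt are illustrative and not part of the FCL:

```csharp
using System;
using System.IO;
class SeekDemo {
  // Writes three bytes, then moves to 'offset' and reads one byte back.
  public static int ByteAt(long offset) {
    using (Stream s = new MemoryStream()) {
      s.WriteByte(10);
      s.WriteByte(20);
      s.WriteByte(30);
      if (!s.CanSeek)
        throw new NotSupportedException("stream is sequential-only");
      s.Position = offset;  // move to a linear position on the stream
      return s.ReadByte();  // returns -1 at the end of the stream
    }
  }
  static void Main() {
    Console.WriteLine(ByteAt(1)); // prints 20
  }
}
```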

The Stream class provides synchronous and asynchronous read and write operations. By default, an asynchronous method calls the stream’s corresponding synchronous method by wrapping the synchronous method in a delegate type and starting a new thread. Similarly, by default, a synchronous method calls the stream’s corresponding asynchronous method and waits until the thread has completed its operation. Classes that derive from Stream must override either the synchronous or asynchronous methods but may override both sets of methods if the need arises.

Concrete Stream-derived classes

The FCL includes a number of different concrete implementations of the abstract base class Stream. Each implementation represents a different storage medium and allows a raw stream of bytes to be read from and written to the backing store.

Examples of this include the FileStream class (which reads and writes bytes to and from a file) and the System.Net.Sockets.NetworkStream class (which sends and receives bytes over the network).

In addition, a stream may act as the frontend to another stream, performing additional processing on the underlying stream as needed. Examples of this include stream encryption/decryption and stream buffering.
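
To illustrate one stream acting as the frontend to another, this sketch wraps a MemoryStream in a BufferedStream and reports the length of the backing store before and after the wrapper is flushed. The class and method names are hypothetical, and the MemoryStream stands in for any backing store:

```csharp
using System;
using System.IO;
class BufferDemo {
  // Returns the backing store's length before and after Flush.
  public static long[] Lengths() {
    MemoryStream store = new MemoryStream();
    Stream front = new BufferedStream(store); // frontend over 'store'
    front.WriteByte(67);
    front.WriteByte(35);
    long before = store.Length; // writes are still held in the buffer
    front.Flush();              // push buffered bytes to the backing store
    long after = store.Length;
    front.Close();              // closing the wrapper also closes 'store'
    return new long[] { before, after };
  }
  static void Main() {
    long[] len = Lengths();
    Console.WriteLine("before={0} after={1}", len[0], len[1]);
  }
}
```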

Here is an example that creates a text file on disk and uses the abstract Stream type to write data to it:

using System.IO;
class Test {
  static void Main(  ) {
    Stream s = new FileStream("foo.txt", FileMode.Create);
    s.WriteByte(67); // ASCII 'C'
    s.WriteByte(35); // ASCII '#'
    s.Close(  );     // flushes, then closes the file
  }
}

Encapsulating raw streams

The Stream class defines operations for reading and writing raw, typeless data in the form of bytes. Typically, however, you need to work with a stream of characters, not a stream of bytes. To solve this problem, the FCL provides the abstract base classes TextReader and TextWriter, which define a contract to read and write a stream of characters, as well as a set of concrete implementations.

Abstract TextReader/TextWriter classes

TextReader and TextWriter are abstract base classes that define operations for reading and writing a stream of characters. The most fundamental operations of the TextReader and TextWriter classes are the methods that read and write a single character to or from a stream.

The TextReader class provides default implementations for methods that read in an array of characters or a string representing a line of characters. The TextWriter class provides default implementations for methods that write an array of characters, as well as methods that convert common types (optionally with formatting options) to a sequence of characters.

The FCL includes a number of different concrete implementations of the abstract base classes TextReader and TextWriter. Some of the most prominent include StreamReader and StreamWriter, and StringReader and StringWriter.

StreamReader and StreamWriter classes

StreamReader and StreamWriter are concrete classes that derive from TextReader and TextWriter, respectively, and operate on a Stream (passed as a constructor parameter).

These classes allow you to combine a Stream (which can have a backing store but only knows about raw data) with a TextReader/TextWriter (which knows about character data, but doesn’t have a backing store).

In addition, StreamReader and StreamWriter can perform special translations between characters and raw bytes. Such translations include converting Unicode characters to ANSI characters, or to Unicode bytes in either big- or little-endian format.

Here is an example that uses a StreamWriter wrapped around a FileStream class to write to a file:

using System.Text;
using System.IO;
class Test {
  static void Main(  ) {
    Stream fs = new FileStream ("foo.txt", FileMode.Create);
    StreamWriter sw = new StreamWriter(fs, Encoding.ASCII);
    sw.Write("Hello!");
    sw.Close(  );
  }
}
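
The byte-order translation mentioned earlier can be observed by writing the same character through two StreamWriter instances that differ only in their encoding. This is a sketch under stated assumptions: the MemoryStream backing store and the names EndianDemo and Encode are chosen for illustration, and the byte order mark is suppressed so that only the character's own bytes appear:

```csharp
using System;
using System.IO;
using System.Text;
class EndianDemo {
  // Encodes 'text' through a StreamWriter and returns the raw bytes as hex.
  public static string Encode(string text, bool bigEndian) {
    // Second argument false: do not emit a byte order mark.
    Encoding enc = new UnicodeEncoding(bigEndian, false);
    MemoryStream ms = new MemoryStream();
    StreamWriter sw = new StreamWriter(ms, enc);
    sw.Write(text);
    sw.Flush(); // make sure the bytes reach the MemoryStream
    string hex = BitConverter.ToString(ms.ToArray());
    sw.Close();
    return hex;
  }
  static void Main() {
    Console.WriteLine(Encode("A", false)); // prints 41-00
    Console.WriteLine(Encode("A", true));  // prints 00-41
  }
}
```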

StringReader and StringWriter classes

StringReader and StringWriter are concrete classes that derive from TextReader and TextWriter, respectively. A StringReader operates on a string and a StringWriter operates on a StringBuilder (each passed as a constructor parameter).

The StringReader class can be thought of as the simplest possible read-only backing store because it simply performs read operations on that string. The StringWriter class can be thought of as the simplest possible write-only backing store because it simply performs write operations on that StringBuilder.

Here is an example that uses a StringWriter wrapped around an underlying StringBuilder backing store to write to a string:

using System;
using System.IO;
using System.Text;
class Test {
  static void Main(  ) {
    StringBuilder sb = new StringBuilder(  );
    StringWriter sw = new StringWriter(sb);
    WriteHello(sw);
    Console.WriteLine(sb);
  }
  static void WriteHello(TextWriter tw) {
    tw.Write("Hello, String I/O!");
  }
}
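
For the reading side, here is a comparable sketch that uses a StringReader as the backing store. The CountLines helper is hypothetical; because it accepts any TextReader, the same code would also work with a StreamReader:

```csharp
using System;
using System.IO;
class ReadLines {
  // Counts lines by calling ReadLine until it returns null.
  public static int CountLines(TextReader tr) {
    int n = 0;
    while (tr.ReadLine() != null)
      n++;
    return n;
  }
  static void Main() {
    StringReader sr = new StringReader("first\nsecond\nthird");
    Console.WriteLine(CountLines(sr)); // prints 3
  }
}
```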

Directories and Files

The File and Directory classes encapsulate the operations typically associated with file I/O, such as copying, moving, deleting, renaming, and enumerating files and directories.

The actual manipulation of the contents of a file is done with a FileStream. The File class has methods that return a FileStream, though you may directly instantiate a FileStream.

In this example, you read in and print out the first line of a text file specified on the command line:

using System;
using System.IO;
class Test {
   static void Main(string[] args) {
      Stream s = File.OpenRead(args[0]);
      StreamReader sr = new StreamReader(s);
      Console.WriteLine(sr.ReadLine(  ));
      sr.Close(  );
   }
}
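
The copy, rename, enumerate, and delete operations described at the start of this section might be combined as follows. This is only a sketch: the scratch directory name and the file names are invented for the example, and everything it creates is removed before it returns:

```csharp
using System;
using System.IO;
class FileOps {
  // Creates a scratch directory, copies and renames a file inside it,
  // and returns the sorted names of the files that remain.
  public static string[] Run() {
    string dir = Path.Combine(Path.GetTempPath(),
                              "fileops-" + Guid.NewGuid().ToString("N"));
    Directory.CreateDirectory(dir);
    try {
      string a = Path.Combine(dir, "a.txt");
      File.WriteAllText(a, "Hello");
      File.Copy(a, Path.Combine(dir, "b.txt")); // copy a.txt to b.txt
      File.Move(a, Path.Combine(dir, "c.txt")); // rename a.txt to c.txt
      string[] names = Directory.GetFiles(dir); // enumerate the directory
      for (int i = 0; i < names.Length; i++)
        names[i] = Path.GetFileName(names[i]);
      Array.Sort(names);
      return names;
    }
    finally {
      Directory.Delete(dir, true); // delete the directory and its contents
    }
  }
  static void Main() {
    foreach (string name in Run())
      Console.WriteLine(name);
  }
}
```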

Networking

The FCL includes a number of types that make accessing networked resources easy. Offering different levels of abstraction, these types allow an application to ignore much of the detail normally required to access networked resources, while retaining a high degree of control.

This section describes the core networking support in the FCL and provides numerous examples leveraging the predefined classes. The types mentioned in this section all exist in the System.Net and System.Net.Sockets namespaces.

Network Programming Models

High-level access is performed using a set of types that implement a generic request/response architecture that is extensible to support new protocols. The implementation of this architecture in the FCL also includes HTTP-specific extensions to make interacting with web servers easy.

Should the application require lower-level access to the network, types exist to support the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). Finally, in situations in which direct transport-level access is required, there are types that provide raw socket access.

Generic Request/Response Architecture

The request/response architecture is based on Uniform Resource Identifier (URI) and stream I/O, follows the factory design pattern, and makes good use of abstract types and interfaces.

A factory type (WebRequest) parses the URI and creates the appropriate protocol handler to fulfill the request.

Protocol handlers share a common abstract base type (WebRequest) that exposes properties that configure the request and methods used to retrieve the response.

Responses are also represented as types and share a common abstract base type (WebResponse) that exposes a Stream, providing simple streams-based I/O and easy integration into the rest of the FCL.

This example is a simple implementation of the popular Unix snarf utility. It demonstrates the use of the WebRequest and WebResponse classes to retrieve the contents of a URI and print them to the console:

// Snarf.cs
// Run Snarf.exe <http-uri> to retrieve a web page
using System;
using System.IO;
using System.Net;
using System.Text;
class Snarf {
  static void Main(string[] args) {

    // Retrieve the data at the URI via the WebRequest abstract base class
    WebRequest req = WebRequest.Create(args[0]);
    WebResponse resp = req.GetResponse(  );

    // Read in the data, performing ASCII->Unicode encoding
    Stream s = resp.GetResponseStream(  );
    StreamReader sr = new StreamReader(s, Encoding.ASCII);
    string doc = sr.ReadToEnd(  );

    Console.WriteLine(doc); // Print result to console
  }
}
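
The factory behavior can be seen without touching the network, because WebRequest.Create only selects and constructs a handler; nothing is sent until the response is requested. In this sketch the FactoryDemo and HandlerFor names and the sample URIs are illustrative. Note that current .NET releases mark these types as obsolete in favor of HttpClient, so the compiler may emit warnings:

```csharp
using System;
using System.Net;
class FactoryDemo {
  // Returns the name of the concrete handler type chosen for a URI.
  public static string HandlerFor(string uri) {
    WebRequest req = WebRequest.Create(uri); // no network traffic yet
    return req.GetType().Name;
  }
  static void Main() {
    Console.WriteLine(HandlerFor("http://example.com/"));  // HttpWebRequest
    Console.WriteLine(HandlerFor("file:///tmp/foo.txt"));  // FileWebRequest
  }
}
```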

HTTP-Specific Support

The request/response architecture inherently supports protocol-specific extensions via the use of subtyping.

Since the WebRequest creates and returns the appropriate handler type based on the URI, accessing protocol-specific features is as easy as downcasting the returned WebRequest object to the appropriate protocol-specific handler and accessing the extended functionality.

The FCL includes specific support for the HTTP protocol, including the ability to easily access and control elements of an interactive web session, such as the HTTP headers, user-agent strings, proxy support, user credentials, authentication, keep-alives, pipelining, and more.

This example demonstrates the use of the HTTP-specific request/response classes to control the user-agent string for the request and retrieve the server type:

// ProbeSvr.cs
// Run ProbeSvr.exe <servername> to retrieve the server type
using System;
using System.Net;
class ProbeSvr {
  static void Main(string[] args) {

    // Get instance of WebRequest ABC, convert to HttpWebRequest
    WebRequest req = WebRequest.Create(args[0]);
    HttpWebRequest httpReq = (HttpWebRequest)req;

    // Access HTTP-specific features such as User-Agent
    httpReq.UserAgent = "CSPRProbe/1.0";

    // Retrieve response and print to console
    WebResponse resp = req.GetResponse(  );
    HttpWebResponse httpResp = (HttpWebResponse)resp;
    Console.WriteLine(httpResp.Server);
  }
}

Adding New Protocol Handlers

Adding handlers to support new protocols is trivial: simply implement a new set of derived types based on WebRequest and WebResponse, implement the IWebRequestCreate interface on your WebRequest-derived type, and register it as a new protocol handler with WebRequest.RegisterPrefix( ) at runtime. Once this is done, any code that uses the request/response architecture can access networked resources using the new URI format (and underlying protocol).
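
The registration step might look like the following sketch. The "echo" scheme and the EchoRequest and EchoRequestCreator types are invented for illustration, and the request type is deliberately incomplete: a usable handler would also override GetResponse and supply a matching WebResponse-derived type. For simplicity, the sketch implements IWebRequestCreate on a small separate class rather than on the request type itself:

```csharp
using System;
using System.Net;

// Hypothetical request type for an invented "echo" scheme.
class EchoRequest : WebRequest {
  private readonly Uri uri;
  public EchoRequest(Uri uri) { this.uri = uri; }
  public override Uri RequestUri { get { return uri; } }
  // A real handler would also override GetResponse() and related members.
}

// Factory object that WebRequest.Create calls for registered prefixes.
class EchoRequestCreator : IWebRequestCreate {
  public WebRequest Create(Uri uri) { return new EchoRequest(uri); }
}

class RegisterDemo {
  public static WebRequest CreateEcho(string uri) {
    // The prefix is matched against the start of the URI string.
    // RegisterPrefix returns false if the prefix is already registered.
    WebRequest.RegisterPrefix("echo", new EchoRequestCreator());
    return WebRequest.Create(uri);
  }
  static void Main() {
    WebRequest req = CreateEcho("echo://localhost/hello");
    Console.WriteLine(req.GetType().Name); // EchoRequest
  }
}
```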

Using TCP, UDP, and Sockets

The System.Net.Sockets namespace includes types that provide protocol-level support for TCP and UDP. These types are built on the underlying Socket type, which is itself directly accessible for transport-level access to the network.

Two classes provide the TCP support: TcpListener and TcpClient. TcpListener listens for incoming connections, creating Socket instances that respond to the connection request. TcpClient connects to a remote host, hiding the details of the underlying socket in a Stream-derived type that allows stream I/O over the network.

A class called UdpClient provides the UDP support. UdpClient serves as both a client and a listener and includes multicast support, allowing individual datagrams to be sent and received as byte arrays.

Both the TCP and the UDP classes help access the underlying network socket (represented by the Socket class). The Socket class is a thin wrapper over the native Windows sockets functionality and is the lowest level of networking accessible to managed code.

The following example is a simple implementation of the Quote of the Day (QOTD) protocol, as defined by the IETF in RFC 865. It demonstrates the use of a TCP listener to accept incoming requests and the use of the lower-level Socket type to fulfill the request:

// QOTDListener.cs 
// Run QOTDListener.exe to service incoming QOTD requests
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
class QOTDListener {
  static string[] quotes = {
    @"Sufficiently advanced magic is indistinguishable from technology
         -- Terry Pratchett",
    @"Sufficiently advanced technology is indistinguishable from magic
         -- Arthur C. Clarke" };
  static void Main(  ) {

    // Start a TCP listener on port 17
    TcpListener l = new TcpListener(17);
    l.Start(  );
    Console.WriteLine("Waiting for clients to connect");
    Console.WriteLine("Press Ctrl+C to quit...");
    int numServed = 1;
    while (true) {

      // Block waiting for an incoming socket connect request
      Socket s = l.AcceptSocket(  );

      // Encode alternating quotes as bytes for sending 
      Char[] carr = quotes[numServed%2].ToCharArray(  );
      Byte[] barr = Encoding.ASCII.GetBytes(carr);

      // Return data to client, then clean up socket and repeat
      s.Send(barr, barr.Length, 0);
      s.Shutdown(SocketShutdown.Both);
      s.Close(  );
      Console.WriteLine("{0} quotes served...", numServed++);
    }
  }
}

To test this example, run the listener and try connecting to port 17 on localhost using a telnet client. (Under Windows, this can be done from the command line by entering telnet localhost 17.)

Notice the use of Socket.Shutdown and Socket.Close at the end of the while loop. This is required to flush and close the socket immediately, rather than wait for the garbage collector to finalize and collect unreachable Socket objects later.
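
For comparison with the raw Socket approach, here is a sketch that uses TcpClient on both ends and performs stream I/O over the connection. It runs the listener and the client in one process on the loopback interface, and it asks the operating system for any free port instead of depending on the well-known port 17; both choices, along with the LoopbackDemo and RoundTrip names, are assumptions made to keep the example self-contained:

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
class LoopbackDemo {
  // Sends one line from a TcpClient to a TcpListener on the loopback
  // interface and returns what the listening side received.
  public static string RoundTrip(string line) {
    TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
    listener.Start();
    int port = ((IPEndPoint)listener.LocalEndpoint).Port;
    TcpClient client = new TcpClient();
    client.Connect(IPAddress.Loopback, port);      // queued by the listener
    TcpClient server = listener.AcceptTcpClient(); // accept that connection
    try {
      StreamWriter sw = new StreamWriter(client.GetStream(), Encoding.ASCII);
      sw.WriteLine(line);
      sw.Flush();
      StreamReader sr = new StreamReader(server.GetStream(), Encoding.ASCII);
      return sr.ReadLine();
    }
    finally {
      client.Close();
      server.Close();
      listener.Stop();
    }
  }
  static void Main() {
    Console.WriteLine(RoundTrip("Hello over TCP"));
  }
}
```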

Using DNS

The networking types in the FCL also support normal and reverse Domain Name System (DNS) resolution. Here’s an example using these types:

// DNSLookup.cs
// Run DNSLookup.exe <servername> to determine IP addresses
using System;
using System.Net;
class DNSLookup {
  static void Main(string[] args) {
    IPHostEntry he = Dns.GetHostByName(args[0]);
    IPAddress[] addrs = he.AddressList;
    foreach (IPAddress addr in addrs)
      Console.WriteLine(addr);
  }
}

Threading

A C# application runs in one or more threads that effectively execute in parallel within the same application. Here is a simple multithreaded application:

using System;
using System.Threading;
class ThreadTest {
  static void Main(  ) {
    Thread t = new Thread(new ThreadStart(Go));
    t.Start(  );
    Go(  );
  }
  static void Go(  ) {
    for (char c='a'; c<='z'; c++ )
      Console.Write(c);
  }
}

In this example, a new thread object is constructed by passing it a ThreadStart delegate that wraps the method that specifies where to start execution for that thread. You then start the thread and call Go, so two separate threads are running Go in parallel. However, there’s a problem: both threads share a common resource—the console. If you run ThreadTest, you could get output like this:

abcdabcdefghijklmnopqrsefghijklmnopqrstuvwxyztuvwxyz

Thread Synchronization

Thread synchronization comprises techniques for ensuring that multiple threads coordinate their access to shared resources.

The lock statement

C# provides the lock statement to ensure that only one thread at a time can access a block of code. Consider the following example:

using System;
using System.Threading;
class LockTest {
  static void Main(  ) {
    LockTest lt = new LockTest (  );
    Thread t = new Thread(new ThreadStart(lt.Go));
    t.Start(  );
    lt.Go(  );
  }
  void Go(  ) {
    lock(this)
      for ( char c='a'; c<='z'; c++)
        Console.Write(c);
  }
}

Running LockTest produces the following output:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz

The lock statement acquires a lock on any reference-type instance. If another thread has already acquired the lock, the thread doesn’t continue until the other thread relinquishes its lock on that instance.

The lock statement is actually a syntactic shortcut for calling the Enter and Exit methods of the FCL Monitor class (see Section 3.8.3):

System.Threading.Monitor.Enter(expression);
try {
  ...
}
finally {
  System.Threading.Monitor.Exit(expression);
}

Pulse and Wait operations

In combination with locks, the next most common threading operations are Pulse and Wait. These operations let threads communicate with each other via a monitor that maintains a list of threads waiting to grab an object’s lock:

using System;
using System.Threading;
class MonitorTest {
  static void Main(  ) {
    MonitorTest mt = new MonitorTest(  );
    Thread t = new Thread(new ThreadStart(mt.Go));
    t.Start(  );
    mt.Go(  );
  }
  void Go(  ) {
    for ( char c='a'; c<='z'; c++)
      lock(this) {
        Console.Write(c);
        Monitor.Pulse(this);
        Monitor.Wait(this);
      }
  }
}

Running MonitorTest produces the following result:

aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz

The Pulse method tells the monitor to wake up the next thread that is waiting to get a lock on that object as soon as the current thread has released it. The current thread typically releases the monitor in one of two ways. First, execution may leave the scope of the lock statement block. The second way is to call the Wait method, which temporarily releases the lock on an object and makes the thread fall asleep until another thread wakes it up by pulsing the object.

Deadlocks

The MonitorTest example actually contains a bug. When you run the program, it prints the correct output, but then the console window locks up. This occurs because the last thread printing z goes to sleep but never gets pulsed. You can solve the problem by replacing the Go method with this new implementation:

void Go(  ) {
  for ( char c='a'; c<='z'; c++)
    lock(this) {
      Console.Write(c);
      Monitor.Pulse(this);
      if (c<'z')
        Monitor.Wait(this);
    }
}

In general, the danger of using locks is that two threads may both end up being blocked waiting for a resource held by the other thread. This situation is known as a deadlock. Most common deadlock situations can be avoided by ensuring that you always acquire resources in the same order.
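
As a sketch of the ordering rule, both threads below take the two locks in the same order, so neither can hold one lock while waiting for a thread that holds the other. The lock objects, the counter, and the iteration count are hypothetical:

```csharp
using System;
using System.Threading;
class LockOrder {
  static readonly object first = new object();
  static readonly object second = new object();
  static int transfers;

  // Every caller takes 'first' before 'second'.
  static void Transfer() {
    for (int i = 0; i < 1000; i++)
      lock (first)
        lock (second)
          transfers++;
  }
  public static int Run() {
    transfers = 0;
    Thread t = new Thread(new ThreadStart(Transfer));
    t.Start();
    Transfer();
    t.Join(); // wait for the second thread to finish
    return transfers;
  }
  static void Main() {
    Console.WriteLine(Run()); // prints 2000
  }
}
```

If one of the threads instead acquired second before first, each thread could end up holding one lock and waiting forever for the other.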

Atomic operations

Atomic operations are operations the system promises will not be interrupted. In the previous examples, the Go method isn’t atomic because it can be interrupted while it is running so another thread can run. However, a simple read or write of a variable that is 32 bits or smaller (or of an object reference) is atomic because the operation is guaranteed to complete without control being passed to another thread; a compound update such as x++ is not. The Interlocked class provides additional atomic operations, which allow basic operations to be performed without requiring a lock. This can be useful, since acquiring a lock is many times slower than a simple atomic operation.
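
Here is a minimal sketch of the Interlocked class in use. Two threads increment a shared counter without a lock; a plain count++ is a read-modify-write sequence that could lose updates, whereas Interlocked.Increment performs the whole update atomically. The class name and iteration count are illustrative:

```csharp
using System;
using System.Threading;
class AtomicCounter {
  static int count;

  static void Work() {
    for (int i = 0; i < 100000; i++)
      Interlocked.Increment(ref count); // atomic read-modify-write
  }
  public static int Run() {
    count = 0;
    Thread t = new Thread(new ThreadStart(Work));
    t.Start();
    Work();
    t.Join();
    return count;
  }
  static void Main() {
    Console.WriteLine(Run()); // prints 200000
  }
}
```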

Common Thread Types

Much of the functionality of threads is provided through the classes in the System.Threading namespace. The most basic thread class to understand is the Monitor class, which is explained in the following section.

Monitor Class

The System.Threading.Monitor class provides an implementation of Hoare’s Monitor that allows you to use any reference-type instance as a monitor.

Enter and Exit methods

The Enter and Exit methods, respectively, obtain and release a lock on an object. If the object is already held by another thread, Enter waits until the lock is released or the thread is interrupted by a ThreadInterruptedException. Every call to Enter for a given object on a thread should be matched with a call to Exit for the same object on the same thread.

TryEnter methods

The TryEnter methods are similar to the Enter method but don’t require that the lock be obtained in order to proceed. These methods return true if the lock is obtained and false if it isn’t, and optionally accept a timeout parameter that specifies the maximum time to wait for other threads to relinquish the lock.
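
The following sketch shows a timed TryEnter from a second thread, once while the main thread holds the lock and once while it doesn't. The 50-millisecond timeout and all the names are assumptions chosen for the example:

```csharp
using System;
using System.Threading;
class TryEnterDemo {
  static readonly object gate = new object();
  static bool gotLock;

  static void Worker() {
    // Wait at most 50 ms for the lock instead of blocking indefinitely.
    gotLock = Monitor.TryEnter(gate, 50);
    if (gotLock)
      Monitor.Exit(gate); // match the successful TryEnter with an Exit
  }
  // Runs Worker on a second thread, optionally while this thread
  // holds the lock, and reports whether Worker obtained it.
  public static bool WorkerGetsLock(bool holdLock) {
    Thread t = new Thread(new ThreadStart(Worker));
    if (holdLock) {
      lock (gate) {
        t.Start();
        t.Join(); // the worker times out while the lock is held here
      }
    }
    else {
      t.Start();
      t.Join();
    }
    return gotLock;
  }
  static void Main() {
    Console.WriteLine(WorkerGetsLock(true));  // prints False
    Console.WriteLine(WorkerGetsLock(false)); // prints True
  }
}
```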

Wait methods

The thread holding a lock on an object may call one of the Wait methods to temporarily release the lock and block itself while it waits for another thread to notify it by executing a pulse on the monitor. This approach can tell a worker thread that there is work to perform on that object. The overloaded versions of Wait allow you to specify a timeout that reactivates the thread if a pulse hasn’t arrived within the specified duration. When the thread wakes up, it reacquires the monitor for the object (potentially blocking until the monitor becomes available). Wait returns true if the thread is reactivated by another thread pulsing the monitor and returns false if the Wait call times out without receiving a pulse.

Pulse and PulseAll methods

A thread holding a lock on an object may call Pulse on that object to wake up a blocked thread as soon as the thread calling Pulse has released its lock on the monitor. If multiple threads are waiting on the same monitor, Pulse activates only the first in the queue (successive calls to Pulse wake up other waiting threads, one per call). The PulseAll method successively wakes up all the threads.
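
A common way to combine Wait and Pulse is to guard a shared condition, as in this sketch. The waiting thread re-checks a flag in a loop around Wait, so it behaves correctly whether or not the pulse has already been sent; the ready flag, the helper names, and the 50-millisecond timeout are all hypothetical:

```csharp
using System;
using System.Threading;
class SignalDemo {
  static readonly object sync = new object();
  static bool ready;

  static void Producer() {
    lock (sync) {
      ready = true;
      Monitor.Pulse(sync); // wake one waiter once this lock is released
    }
  }
  // Blocks until Producer signals, then returns the flag it observed.
  public static bool WaitForSignal() {
    ready = false;
    Thread t = new Thread(new ThreadStart(Producer));
    lock (sync) {
      t.Start();
      while (!ready)        // re-check the condition after each wakeup
        Monitor.Wait(sync); // releases the lock while waiting
    }
    t.Join();
    return ready;
  }
  // Nobody pulses this object, so the timed Wait returns false.
  public static bool TimedWait() {
    object o = new object();
    lock (o)
      return Monitor.Wait(o, 50);
  }
  static void Main() {
    Console.WriteLine(WaitForSignal()); // prints True
    Console.WriteLine(TimedWait());     // prints False
  }
}
```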

Assemblies

An assembly is a logical package (similar to a DLL in Win32) that consists of a manifest, a set of one or more modules, and an optional set of resources. This package forms the basic unit of deployment and versioning, and creates a boundary for type resolution and security permissioning.

Elements of an Assembly

Every .NET application consists of at least one assembly, which is in turn built from a number of basic elements.

The manifest contains a set of metadata that describes everything the runtime needs to know about the assembly. This information includes:

  • The textual name of the assembly

  • The version number of the assembly

  • An optional shared name and signed assembly hash

  • The list of files in the assembly with file hashes

  • The list of referenced assemblies, including versioning information and an optional public key

  • The list of types included in the assembly, with a mapping to the module containing the type

  • The set of minimum and optional security permissions requested by the assembly

  • The set of security permissions explicitly refused by the assembly

  • Culture, processor, and OS information

  • A set of custom attributes to capture details such as product name, owner information, etc.

Modules contain types described using metadata and implemented using MSIL.

Resources contain nonexecutable data that is logically included with the assembly. Examples of this include bitmaps, localizable text, persisted objects, etc.

Packaging

The simplest assembly contains a manifest and a single module containing the application’s types, packaged as an EXE with a Main entry point. More complex assemblies can include multiple modules (Portable Executable (PE) files), separate resource files, a manifest, etc.

The manifest is generally included in one of the existing PE files in the assembly, although the manifest can also be a standalone PE file.

Modules are PE files, typically DLLs or EXEs. Only one module in an assembly can contain an entry point (either Main, WinMain, or DllMain).

An assembly may also contain multiple modules. This technique can reduce the working set of the application, because the CLR loads only the required modules. In addition, each module can be written in a different language, allowing a mixture of C#, VB.NET, and raw MSIL. Although not common, a single module can also be included in several different assemblies.

Finally, an assembly may contain a set of resources, which can be kept either in standalone files or included in one of the PE files in the assembly.

Deployment

An assembly is the smallest .NET unit of deployment. Due to the self-describing nature of a manifest, deployment can be as simple as copying the assembly (and in the case of a multifile assembly, all the associated files) into a directory.

This is a vast improvement over traditional COM development in which components, their supporting DLLs, and their configuration information are spread out over multiple directories and the Windows registry.

Generally, assemblies are deployed into the application directory and are not shared. These are called private assemblies. However, assemblies can also be shared between applications; these are called shared assemblies. To share an assembly you need to give it a shared name (also known as a “strong” name) and deploy it in the global assembly cache.

The shared name consists of a name, a public key, and a digital signature. The shared name is included in the assembly manifest and forms the unique identifier for the assembly.

The global assembly cache is a machine-wide storage area that contains assemblies intended for use by multiple applications.

For more information on working with shared assemblies and the global assembly cache, see Section E.1.4 in Appendix E.

Versioning

The manifest of an application contains a version number for the assembly and a list of all the referenced assemblies with associated version information. Assembly version numbers are divided into four parts and look like this:

<major>.<minor>.<build>.<revision>

This information is used to mitigate versioning problems as assemblies evolve over time.
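
The four-part format can be explored with the System.Version class, as in the sketch below. It illustrates only how the parts are parsed and compared; it does not reproduce the CLR's binding policy. The IsNewer helper and the second version number are made up for the example:

```csharp
using System;
class VersionDemo {
  // Compares two four-part version strings part by part.
  public static bool IsNewer(string candidate, string baseline) {
    return new Version(candidate).CompareTo(new Version(baseline)) > 0;
  }
  static void Main() {
    Version v = new Version("1.0.3300.0");
    Console.WriteLine("{0} {1} {2} {3}",
                      v.Major, v.Minor, v.Build, v.Revision); // 1 0 3300 0
    Console.WriteLine(IsNewer("1.0.3300.0", "1.0.2914.16"));  // True
  }
}
```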

At runtime, the CLR uses the version information specified in the manifest and a set of versioning policies defined for the machine to determine which versions of each dependent, shared assembly to load.

The default versioning policy for shared assemblies restricts you to the version that your application was built against. By changing a configuration file, the application or an administrator can override this behavior.

Private assemblies have no versioning policy, and the CLR simply loads the newest assemblies found in the application directory.

Type Resolution

The unique identifier for a type (known as a TypeRef) consists of a reference to the assembly it was defined in and the fully qualified type name (including any namespaces). For example, this local variable declaration:

System.Xml.XmlReader xr;

is represented in MSIL assembly language as follows:

.assembly extern System.Xml { .ver 1:0:3300:0 ... }
.locals init (class [System.Xml]System.Xml.XmlReader V_0)

In this example, the System.Xml.XmlReader type is scoped to the System.Xml shared assembly, which is identified using a shared name and associated version number.

When your application starts, the CLR attempts to resolve all static TypeRefs by locating the correct versions of each dependent assembly (as determined by the versioning policy) and verifying that the types actually exist (ignoring any access modifiers).

When your application attempts to use the type, the CLR verifies that you have the correct level of access and throws runtime exceptions if there is a versioning incompatibility.

Security Permissions

The assembly forms a boundary for security permissioning.

The assembly manifest contains hashes for any referenced assemblies (determined at compile time), a list of the minimum set of security permissions the assembly requires in order to function, a list of the optional permissions that it requests, and a list of the permissions that it explicitly refuses (i.e., never wants to receive).

To illustrate how these permissions might be used, imagine an email client similar to Microsoft Outlook, developed using the .NET Framework. It probably requires the ability to communicate over the network on ports 110 (POP3), 25 (SMTP), and 143 (IMAP4). It might request the ability to run JavaScript functions in a sandbox to allow for full interactivity when presenting HTML emails. Finally, it probably refuses ever being granted the ability to write to disk or read the local address book, thus avoiding scripting attacks such as the ILoveYou virus.

Essentially, the assembly declares its security needs and assumptions, but leaves the final decision on permissioning up to the CLR, which enforces local security policy.

At runtime the CLR uses the hashes to determine if a dependent assembly has been tampered with and combines the assembly permission information with local security policy to determine whether to load the assembly and which permissions to grant it.

This mechanism provides fine-grained control over security and is a major advantage of the .NET Framework over traditional Windows applications.

Reflection

Many of the services available in .NET and exposed via C# (such as late binding, serialization, remoting, attributes, etc.) depend on the presence of metadata. Your own programs can also take advantage of this metadata and even extend it with new information.

Manipulating existing types via their metadata is termed reflection and is done using a rich set of types in the System.Reflection namespace. Creating new types (and associated metadata) is termed Reflection.Emit and is done via the types in the System.Reflection.Emit namespace. You can extend the metadata for existing types with custom attributes. For more information, see Section 3.11 later in this chapter.

Type Hierarchy

Reflection involves traversing and manipulating an object model that represents an application, including all its compile-time and runtime elements. Consequently, it is important to understand the various logical units of a .NET application and their roles and relationships.

The fundamental units of an application are its types, which contain members and nested types. In addition to types, an application contains one or more modules and one or more assemblies. All these elements are static and are described in metadata produced by the compiler at compile time. The one exception to this rule is elements (such as types, modules, assemblies, etc.) that are created on the fly via Reflection.Emit, which is described later in Section 3.10.8.

At runtime, these elements are all contained within an AppDomain. An AppDomain isn’t described with metadata, yet it plays an important role in reflection because it forms the root of the type hierarchy of a .NET application at runtime.

In any given application, the relationship between these units is hierarchical, as depicted by the following diagram:

AppDomain (runtime root of hierarchy)
  Assemblies
    Modules
      Types
        Members
        Nested types

Each of these elements is discussed in the following sections.
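
The hierarchy can also be walked in code. This sketch starts at the current AppDomain and lists every loaded assembly, but drills into modules, types, and members only for the program's own assembly to keep the output short. The HierarchyDemo and Walk names are illustrative:

```csharp
using System;
using System.Reflection;
class HierarchyDemo {
  // Walks AppDomain -> assemblies -> modules -> types -> members and
  // returns true if this class is found along the way.
  public static bool Walk() {
    bool foundSelf = false;
    Assembly self = typeof(HierarchyDemo).Assembly;
    foreach (Assembly a in AppDomain.CurrentDomain.GetAssemblies()) {
      Console.WriteLine(a.GetName().Name);
      if (a.FullName != self.FullName)
        continue; // drill into this program's assembly only
      foreach (Module m in a.GetModules()) {
        Console.WriteLine("  " + m.Name);
        foreach (Type t in m.GetTypes()) {
          Console.WriteLine("    " + t.FullName);
          if (t == typeof(HierarchyDemo))
            foundSelf = true;
          foreach (MemberInfo mi in t.GetMembers())
            Console.WriteLine("      " + mi.Name);
        }
      }
    }
    return foundSelf;
  }
  static void Main() {
    Walk();
  }
}
```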

Types, members, and nested types

The most basic element that reflection deals with is the type, which is represented by the System.Type class. This class holds the metadata for each type declaration in an application (both predefined and user-defined types).

Types contain members, which include constructors, fields, properties, events, and methods. In addition, types may contain nested types, which exist within the scope of an outer type and are typically used as helper classes. Types are grouped into modules, which are, in turn, contained within assemblies.

Assemblies and modules

Assemblies are the logical equivalent of DLLs in Win32 and the basic units of deployment, versioning, and reuse for types. In addition, assemblies create a security, visibility, and scope resolution boundary for types (see Section 3.9).

A module is a physical file such as a DLL, an EXE, or a resource file (such as a GIF or JPG). While it isn’t common practice, an assembly can be composed of multiple modules, allowing you to control application working set size, use multiple languages within one assembly, and share a module across multiple assemblies.

AppDomains

From the perspective of reflection, an AppDomain is the root of the type hierarchy and serves as the container for assemblies and types when they are loaded into memory at runtime. A helpful way to think about an AppDomain is to view it as the logical equivalent of a process in a Win32 application.

AppDomains provide isolation, creating a hard boundary for managed code just like the process boundary under Win32. Similar to processes, AppDomains can be started and stopped independently, and application faults take down only the AppDomain the fault occurs in, not the process hosting the AppDomain.

Retrieving the Type for an Instance

At the heart of reflection is System.Type, which is an abstract base class that provides access to the metadata of a type.

You can access the Type class for any instance using GetType, which is a nonvirtual method of System.Object. When you call GetType, the method returns a concrete subtype of System.Type, which can reflect over and manipulate the type.
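As a minimal sketch of this behavior, the following program calls GetType on values held in variables of type object. The GetTypeDemo class and its RuntimeTypeName helper are names invented for this illustration; only GetType and the FullName property come from the FCL.

```csharp
using System;

public class GetTypeDemo {
  // Returns the full name of the runtime type of any instance
  public static string RuntimeTypeName(object o) {
    return o.GetType().FullName;
  }

  public static void Main() {
    object boxed = 42;  // static type is object, runtime type is Int32
    Console.WriteLine(RuntimeTypeName(boxed));    // System.Int32
    Console.WriteLine(RuntimeTypeName("hello"));  // System.String

    // GetType reports the runtime type, so it matches typeof(int)
    Console.WriteLine(boxed.GetType() == typeof(int));  // True
  }
}
```

Note that GetType reports the type of the object itself, not the declared type of the variable that refers to it.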

Retrieving a Type Directly

You can also retrieve a Type class by name (without needing an instance) using the static method GetType on the Type class, as follows:

Type t = Type.GetType("System.Int32");

Finally, C# provides the typeof operator, which returns the Type class for any type known at compile time:

Type t = typeof(System.Int32);

The main difference between these two approaches is that Type.GetType is evaluated at runtime and is more dynamic, binding by name, while the typeof operator is evaluated at compile time, uses a type token, and is slightly faster to call.
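The practical consequence of this difference can be sketched as follows. The TypeLookupDemo class and its Lookup helper are hypothetical names used only for this illustration. A misspelled name passed to Type.GetType is detected only at runtime (the single-argument overload returns null), whereas a misspelled name in a typeof expression is a compile-time error.

```csharp
using System;

public class TypeLookupDemo {
  // Looks up a type by name at runtime; returns null if it isn't found
  public static Type Lookup(string name) {
    return Type.GetType(name);
  }

  public static void Main() {
    Type byName  = Lookup("System.Int32");
    Type byToken = typeof(int);

    // Both approaches yield the same Type
    Console.WriteLine(byName == byToken);               // True

    // A name that doesn't resolve produces null rather than an error
    Console.WriteLine(Lookup("System.Int33") == null);  // True
  }
}
```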

Reflecting Over a Type Hierarchy

Once you have retrieved a Type instance you can navigate the application hierarchy described earlier, accessing the metadata via types that represent members, modules, assemblies, namespaces, AppDomains, and nested types. You can also inspect the metadata and any custom attributes, create new instances of the types, and invoke members.

Here is an example that uses reflection to display the members in three different types:

using System;
using System.Reflection;
class Test {
  static void Main(  ) {
    object o = new Object(  );
    DumpTypeInfo(o.GetType(  ));
    DumpTypeInfo(typeof(int));
    DumpTypeInfo(Type.GetType("System.String"));
  }
  static void DumpTypeInfo(Type t) {
    Console.WriteLine("Type: {0}", t);

    // Retrieve the list of members in the type
    MemberInfo[] miarr = t.GetMembers(  );

    // Print out details on each of them
    foreach (MemberInfo mi in miarr)
      Console.WriteLine("  {0}={1}", mi.MemberType, mi);
  }
}
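The same Type instance can also be used to move up and down the hierarchy. The following sketch starts from a nested type and reaches its declaring type, module, and assembly. HierarchyDemo and Inner are names invented for this example, and the module and assembly names printed depend on how the program is compiled.

```csharp
using System;
using System.Reflection;

public class HierarchyDemo {
  public class Inner { }  // a nested type, used as the starting point

  public static void Main() {
    Type t = typeof(Inner);
    Console.WriteLine("Type:      {0}", t.FullName);       // HierarchyDemo+Inner
    Console.WriteLine("Declarer:  {0}", t.DeclaringType);  // HierarchyDemo

    // Move up: type -> module -> assembly
    Module m = t.Module;
    Assembly a = t.Assembly;
    Console.WriteLine("Module:    {0}", m.Name);
    Console.WriteLine("Assembly:  {0}", a.GetName().Name);

    // Move back down: outer type -> public nested types
    foreach (Type nested in typeof(HierarchyDemo).GetNestedTypes())
      Console.WriteLine("Nested:    {0}", nested.Name);    // Inner
  }
}
```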

Late Binding to Types

Reflection can also perform late binding, in which the application dynamically loads, instantiates, and uses a type at runtime. This provides greater flexibility at the expense of invocation overhead.

In this section, we create an example that uses very late binding, dynamically discovers new types at runtime, and uses them.

The example loads one or more assemblies by name (as specified on the command line) and iterates through the types in each assembly, looking for subtypes of the Greeting abstract base class. When one is found, the type is instantiated and its SayHello method is invoked, which displays an appropriate greeting.

To perform the runtime discovery of types, we use an abstract base class that’s compiled into an assembly as follows (see the source comment for filename and compilation information):

// Greeting.cs - compile with /t:library
public abstract class Greeting { 
  public abstract void SayHello(  );
}

Compiling this code produces a file named Greeting.dll, which the other parts of the sample can use.

We now create a new assembly containing two concrete subtypes of the abstract type Greeting, as follows (see the source comment for filename and compilation information):

// English.cs - compile with /t:library /r:Greeting.dll
using System;
public class AmericanGreeting : Greeting {
  private string msg = "Hey, dude. Wassup!";
  public override void SayHello(  ) {
    Console.WriteLine(msg);
  }
}
public class BritishGreeting : Greeting {
  private string msg = "Good morning, old chap!";
  public override void SayHello(  ) {
    Console.WriteLine(msg);
  }
}

Compiling the source file English.cs produces a file named English.dll, which the main program can now dynamically reflect over and use.

Now we create the main sample, as follows (see the source comment for filename and compilation information):

// SayHello.cs - compile with /r:Greeting.dll
// Run with SayHello.exe <dllname1> <dllname2> ... <dllnameN>
using System;
using System.Reflection;
class Test {
  static void Main (string[] args) {

    // Iterate over the cmd-line options,
    // trying to load each assembly
    foreach (string s in args) {
      Assembly a = Assembly.LoadFrom(s);
      
      // Pick through all the types, looking for
      // subtypes of the abstract base class Greeting
      foreach (Type t in a.GetTypes(  ))
        if (t.IsSubclassOf(typeof(Greeting))) {

          // Having found an appropriate subtype, create it
          object o = Activator.CreateInstance(t);

          // Retrieve the SayHello MethodInfo and invoke it
          MethodInfo mi = t.GetMethod("SayHello");
          mi.Invoke(o, null);
        }
    }
  }
}

Running the sample now with SayHello English.dll produces the following output:

Hey, dude. Wassup!
Good morning, old chap!

The interesting aspect of the preceding sample is that it’s completely late-bound; i.e., long after the SayHello program is shipped you can create a new type and have SayHello automatically take advantage of it by simply specifying it on the command line. This is one of the key benefits of late binding via reflection.

Activation

In the previous examples, we loaded an assembly by hand and used the System.Activator class to create a new instance based on a type. There are many overloads of the CreateInstance method (and of the related CreateInstanceFrom method) that provide a wide range of creation options, including the ability to short-circuit the process and create a type directly from an assembly file. These overloads return an ObjectHandle, which must be unwrapped to get at the new instance:

object o = Activator.CreateInstanceFrom("Assem1.dll",
                                        "Friendly.Greeting").Unwrap(  );

Other capabilities of the Activator type include creating types on remote machines, creating types in specific AppDomains (sandboxes), and creating types by invoking a specific constructor (rather than using the default constructor as these examples show).
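As a brief illustration of invoking a specific constructor, the following sketch passes an argument array to Activator.CreateInstance, which selects the constructor whose parameters match. The Greeter type is hypothetical and exists only for this example.

```csharp
using System;

public class Greeter {
  private string name;
  public Greeter() { name = "world"; }               // default constructor
  public Greeter(string name) { this.name = name; }  // specific constructor
  public string Greet() { return "Hello, " + name; }
}

public class ActivatorDemo {
  public static void Main() {
    Type t = typeof(Greeter);

    // No arguments: the default constructor is used
    Greeter g1 = (Greeter) Activator.CreateInstance(t);

    // The argument array is matched against the available constructors
    Greeter g2 = (Greeter) Activator.CreateInstance(
                   t, new object[] { "reflection" });

    Console.WriteLine(g1.Greet());  // Hello, world
    Console.WriteLine(g2.Greet());  // Hello, reflection
  }
}
```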

Advanced Uses of Reflection

The preceding example demonstrates the use of reflection, but doesn’t perform any tasks you can’t accomplish using normal C# language constructs. However, reflection can also manipulate types in ways not supported directly in C#, as is demonstrated in this section.

While the CLR enforces access controls on type members (specified using access modifiers such as private and protected), these restrictions don’t apply to reflection. Assuming you have the correct set of permissions, you can use reflection to access and manipulate private data and function members, as this example using the Greeting subtypes from the previous section shows (see the source comment for filename and compilation information):

// InControl.cs - compile with /r:Greeting.dll,English.dll
using System;
using System.Reflection;
class TestReflection {
  // Note: This method requires the ReflectionPermission perm.
  static void ModifyPrivateData(object o, string msg) {

    // Get a FieldInfo type for the private data member
    Type t = o.GetType(  ); 
    FieldInfo fi = t.GetField("msg", BindingFlags.NonPublic|
                                     BindingFlags.Instance);

    // Use the FieldInfo to adjust the data member value
    fi.SetValue(o, msg);
  }
  static void Main(  ) {
    // Create instances of both types
    BritishGreeting bg = new BritishGreeting(  );
    AmericanGreeting ag = new AmericanGreeting(  );

    // Adjust the private data via reflection
    ModifyPrivateData(ag, "Things are not the way they seem");
    ModifyPrivateData(bg, "The runtime is in total control!");
    
    // Display the modified greeting strings
    ag.SayHello(  ); // "Things are not the way they seem"
    bg.SayHello(  ); // "The runtime is in total control!"
  }
}

When run, this sample generates the following output:

Things are not the way they seem
The runtime is in total control!

This demonstrates that the private msg data members in both types are modified via reflection, although there are no public members defined on the types that allow that operation. Note that while this technique can bypass access controls, it still doesn’t violate type safety.

Although this is a somewhat contrived example, the capability can be useful when building utilities such as class browsers and test suite automation tools that need to inspect and interact with a type at a deeper level than its public interface.
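The same approach works for private function members. The following sketch, which assumes the caller has the necessary reflection permission, looks up a private method with the NonPublic and Instance binding flags and invokes it. The Vault type, its Secret method, and the CallPrivate helper are invented for this illustration.

```csharp
using System;
using System.Reflection;

public class Vault {
  // Hypothetical private member with no public way to reach it
  private int Secret(int x) { return x * 2; }
}

public class PrivateInvokeDemo {
  // Finds a private instance method by name and invokes it
  public static object CallPrivate(object target, string method,
                                   params object[] args) {
    MethodInfo mi = target.GetType().GetMethod(method,
        BindingFlags.NonPublic | BindingFlags.Instance);
    return mi.Invoke(target, args);
  }

  public static void Main() {
    Console.WriteLine(CallPrivate(new Vault(), "Secret", 21));  // 42
  }
}
```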

Creating New Types at Runtime

The System.Reflection.Emit namespace contains classes that can create entirely new types at runtime. These classes can define a dynamic assembly in memory; define a dynamic module in the assembly; define a new type in the module, including all its members; and emit the MSIL opcodes needed to implement the application logic in the members.

Here is an example that creates and uses a new type called AgentSmith with a member called SayHello:

using System;
using System.Reflection;
using System.Reflection.Emit;
public class Test {
  static void Main(  )  {
    // Create a dynamic assembly in the current AppDomain
    AppDomain ad = AppDomain.CurrentDomain;
    AssemblyName an = new AssemblyName(  );
    an.Name = "DynAssembly";
    AssemblyBuilder ab = 
      ad.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
    
    // Create a module in the assembly and a type in the module
    ModuleBuilder modb = ab.DefineDynamicModule("DynModule");
    TypeBuilder tb = modb.DefineType("AgentSmith", 
                                     TypeAttributes.Public);
 
    // Add a SayHello member to the type 
    MethodBuilder mb = tb.DefineMethod("SayHello",        
                                       MethodAttributes.Public,
                                       null, null);
                                        
    // Generate the MSIL for the SayHello Member
    ILGenerator ilg = mb.GetILGenerator(  );
    ilg.EmitWriteLine("Never send a human to do a machine's job.");
    ilg.Emit(OpCodes.Ret);

    // Finalize the type so we can create it
    Type t = tb.CreateType(  );

    // Create an instance of the new type
    object o = Activator.CreateInstance(t);
    
    // Prints "Never send a human to do a machine's job."
    t.GetMethod("SayHello").Invoke(o, null);
  }
}

A common example using Reflection.Emit is the regular expression support in the FCL, which can emit new types that are tuned to search for specific regular expressions, eliminating the overhead of interpreting the regular expression at runtime.

Other uses of Reflection.Emit in the FCL include dynamically generating transparent proxies for remoting and generating types that perform specific XSLT transforms with the minimum runtime overhead.

Custom Attributes

Types, members, modules, and assemblies all have associated metadata that is used by all the major CLR services, is considered an indivisible part of an application, and can be accessed via reflection (see Section 3.10).

A key characteristic of metadata is that it can be extended. You extend the metadata with custom attributes, which allow you to “decorate” a code element with additional information stored in the metadata associated with the element.

This additional information can then be retrieved at runtime and used to build services that work declaratively, which is the way that the CLR implements core features such as serialization and interception.

Language Support for Custom Attributes

Decorating an element with a custom attribute is known as specifying the custom attribute and is done by writing the name of the attribute enclosed in brackets ([]) immediately before the element declaration as follows:

[Serializable] public class Foo {...}

In this example, the Foo class is specified as serializable. This information is saved in the metadata for Foo and affects the way the CLR treats an instance of this class.

A useful way to think about custom attributes is that they expand the built-in set of declarative constructs such as public, private, and sealed in the C# language.

Compiler Support for Custom Attributes

In reality, custom attributes are simply types derived from System.Attribute with language constructs for specifying them on an element (see Section 2.16 in Chapter 2).

These language constructs are recognized by the compiler, which emits a small chunk of data into the metadata. This custom data includes a serialized call to the constructor of the custom attribute type (containing the values for the positional parameters) and a collection of property set operations (containing the values for the named parameters).

The compiler also recognizes a small number of pseudo-custom attributes. These are special attributes that have direct representation in metadata and are stored natively (i.e., not as chunks of custom data). This is primarily a runtime performance optimization, although it has some implications for retrieving attributes via reflection, as discussed later.

To understand this, consider the following class with two specified attributes:

[Serializable, Obsolete]
class Foo {...}

When compiled, the metadata for the Foo class looks like this in MSIL:

.class private auto ansi serializable beforefieldinit Foo
       extends [mscorlib]System.Object
{
  .custom instance void 
  [mscorlib]System.ObsoleteAttribute::.ctor(  ) = ( 01 00 00 00 ) 
  ...
}

Compare the different treatment by the compiler of the Obsolete attribute, which is a custom attribute and represented by a .custom directive containing the serialized attribute parameters, to the treatment of the Serializable attribute, which is a pseudo-custom attribute represented directly in the metadata with the serializable token.

Runtime Support for Custom Attributes

At runtime the core CLR services such as serialization and remoting inspect the custom and pseudo-custom attributes to determine how to handle an instance of a type.

In the case of custom attributes, this is done by creating an instance of the attribute (invoking the relevant constructor call and property-set operations), and then performing whatever steps are needed to determine how to handle an instance of the type.

In the case of pseudo-custom attributes, this is done by simply inspecting the metadata directly and determining how to handle an instance of the type. Consequently, handling pseudo-custom attributes is more efficient than handling custom attributes.

Note that none of these steps is initiated until a service or user program actually tries to access the attributes, so there is little runtime overhead unless it is required.

Predefined Attributes

The .NET Framework makes extensive use of attributes for purposes ranging from simple documentation to advanced support for threading, remoting, serialization, and COM interop. These attributes are all defined in the FCL and can be used, extended, and retrieved by your own code.

However, certain attributes are treated specially by the compiler and the runtime. Three attributes considered general enough to be defined in the C# specification are AttributeUsage, Conditional, and Obsolete. Other attributes such as CLSCompliant, Serializable, and NonSerialized are also treated specially.

AttributeUsage attribute

Syntax:

[AttributeUsage(target-enum
  [, AllowMultiple=[true|false]]?
  [, Inherited=[true|false]]?
)] (for classes)

The AttributeUsage attribute is applied to a new attribute class declaration. It controls how the new attribute should be treated by the compiler, specifically, what set of targets (classes, interfaces, properties, methods, parameters, etc.) the new attribute can be specified on, whether multiple instances of this attribute may be applied to the same target, and whether this attribute propagates to subtypes of the target.

target-enum is a bitwise mask of values from the System.AttributeTargets enum, which looks like this:

namespace System {
  [Flags]
  public enum AttributeTargets {
    Assembly     = 0x0001,
    Module       = 0x0002,
    Class        = 0x0004,
    Struct       = 0x0008,
    Enum         = 0x0010,
    Constructor  = 0x0020,
    Method       = 0x0040,
    Property     = 0x0080,
    Field        = 0x0100,
    Event        = 0x0200,
    Interface    = 0x0400,
    Parameter    = 0x0800,
    Delegate     = 0x1000,
    ReturnValue  = 0x2000,
    All          = 0x3fff,
  }
}
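To make the three settings concrete, here is a small sketch that restricts a hypothetical AuthorAttribute to classes and methods, allows only one instance per target, and lets it propagate to subtypes. The attribute and the Base and Derived classes are assumptions made purely for illustration.

```csharp
using System;

// Valid only on classes and methods, at most once per target,
// and visible on subtypes of the decorated class
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method,
                AllowMultiple = false, Inherited = true)]
public class AuthorAttribute : Attribute {
  private string name;
  public AuthorAttribute(string name) { this.name = name; }
  public string Name { get { return name; } }
}

[Author("Ada")]
public class Base { }

public class Derived : Base { }  // specifies no attribute of its own

public class UsageDemo {
  public static void Main() {
    // Searching the inheritance chain finds the attribute on Base
    object[] attrs =
      typeof(Derived).GetCustomAttributes(typeof(AuthorAttribute), true);
    Console.WriteLine(attrs.Length);                       // 1
    Console.WriteLine(((AuthorAttribute) attrs[0]).Name);  // Ada

    // Without searching the inheritance chain, nothing is found
    attrs = typeof(Derived).GetCustomAttributes(typeof(AuthorAttribute), false);
    Console.WriteLine(attrs.Length);                       // 0
  }
}
```

Specifying Author on a field, or twice on the same class, would be rejected by the compiler because of the AttributeUsage settings.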

Conditional attribute

Syntax:

[Conditional(symbol)] (for methods)

The Conditional attribute can be applied to any method with a void return type. The presence of this attribute tells the compiler to conditionally omit calls to the method unless symbol is defined in the calling code. This is similar to wrapping every call to the method with #if and #endif preprocessor directives, but Conditional has the advantage of needing to be specified only in one place. Conditional can be found in the System.Diagnostics namespace.
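The following sketch shows the effect. The TRACE_CALLS symbol and the ConditionalDemo class are hypothetical names chosen for this example. As written, the symbol isn't defined, so the compiler omits the call to Log; adding #define TRACE_CALLS as the first line of the file (or compiling with /d:TRACE_CALLS) restores it.

```csharp
using System;
using System.Diagnostics;

public class ConditionalDemo {
  public static int calls = 0;

  // Calls to Log are compiled only where TRACE_CALLS is defined
  [Conditional("TRACE_CALLS")]
  public static void Log(string msg) {
    calls++;
    Console.WriteLine("log: " + msg);
  }

  public static void Main() {
    Log("starting");  // omitted by the compiler unless TRACE_CALLS is defined
    Console.WriteLine("calls = {0}", calls);
  }
}
```

The method itself is always compiled into the assembly; only the call sites are affected, which is why the symbol is evaluated in the calling code rather than where the method is declared.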

Obsolete attribute

Syntax:

[Obsolete(message[, true | false])] (for all attribute targets)

Applied to any valid attribute target, the Obsolete attribute indicates that the target is obsolete. Obsolete can include a message that explains which alternative types or members to use and a flag that tells the compiler to treat the use of this type or member as either a warning or an error.

For example, referencing type Bar in the following example causes the compiler to display an error message and halt compilation:

[Obsolete("Don't try this at home", true)]
class Bar { ... }

CLSCompliant attribute

Syntax:

[CLSCompliant(true|false)] (for all attribute targets)

Applied to an assembly, the CLSCompliant attribute tells the compiler whether to validate CLS compliance for all the exported types in the assembly. Applied to any other attribute target, this attribute allows the target to declare if it should be considered CLS-compliant. In order to mark a target as CLS-compliant, the entire assembly needs to be considered as such.

In the following example, the CLSCompliant attribute is used to specify an assembly as CLS-compliant and a class within it as not CLS-compliant:

[assembly:CLSCompliant(true)]

[CLSCompliant(false)]
public class Bar { 
  public ushort Answer { get {return 42;} } 
}

Serializable attribute

Syntax:

[Serializable] (for classes, structs, enums, delegates)

Applied to a class, struct, enum, or delegate, the Serializable attribute marks it as being serializable. This attribute is a pseudo-custom attribute and is represented specially in the metadata.

NonSerialized attribute

Syntax:

[NonSerialized] (for fields)

Applied to a field, the NonSerialized attribute prevents it from being serialized along with its containing class or struct. This attribute is a pseudo-custom attribute and is represented specially in the metadata.

Defining a New Custom Attribute

In addition to using the predefined attributes supplied by the .NET Framework, you can also create your own.

To create a custom attribute:

  1. Derive a class from System.Attribute or from a descendant of System.Attribute. By convention the class name should end with the word “Attribute,” although this isn’t required.

  2. Provide the class with a public constructor. The parameters to the constructor define the positional parameters of the attribute and are mandatory when specifying the attribute on an element.

  3. Declare public-instance fields, public-instance read/write properties, or public-instance write-only properties to specify the named parameters of the attribute. Unlike positional parameters, these are optional when specifying the attribute on an element.

    The types that can be used for attribute constructor parameters and properties are bool, byte, char, double, float, int, long, short, string, object, the Type type, enum, or a one-dimensional array of the aforementioned types.

  4. Finally, define what the attribute may be specified on using the AttributeUsage attribute, as described in the preceding section.

Consider the following example of a custom attribute, CrossRefAttribute, which removes the limitation that the CLR metadata contain information about statically linked types but not dynamically linked ones:

// XRef.cs - cross-reference custom attribute
// Compile with: csc /t:library XRef.cs
using System;
[AttributeUsage(AttributeTargets.All, AllowMultiple=true)]
public class CrossRefAttribute : Attribute {
  Type   xref;
  string desc = "";
  public string Description { set { desc=value; } } 
  public CrossRefAttribute(Type xref) { this.xref=xref; }
  public override string ToString(  ) {
    string tmp = (desc.Length>0) ? " ("+desc+")" : "";
    return "CrossRef to "+xref.ToString(  )+tmp;
  }
}

From the attribute user’s perspective, this attribute can be applied to any target multiple times (note the use of the AttributeUsage attribute to control this). CrossRefAttribute takes one mandatory positional parameter (namely the type to cross-reference) and one optional named parameter (the description), and is used as follows:

[CrossRef(typeof(Bar), Description="Foos often hang around Bars")]
class Foo {...}

Essentially, this attribute embeds cross-references to dynamically linked types (with optional descriptions) in the metadata. This information can then be retrieved at runtime by a class browser to present a more complete view of a type’s dependencies.

Retrieving a Custom Attribute at Runtime

Retrieving attributes at runtime is done using reflection via one of System.Attribute’s GetCustomAttribute or GetCustomAttributes overloads. This is one of the few circumstances in which the difference between custom attributes and pseudo-custom attributes becomes apparent, since pseudo-custom attributes can’t be retrieved with GetCustomAttribute.

Here is an example that uses reflection to determine which attributes are on a specific type:

using System;
[Serializable, Obsolete]
class Test {
  static void Main(  ) {
    Type t = typeof(Test);
    object[] caarr = Attribute.GetCustomAttributes(t);
    Console.WriteLine("{0} has {1} custom attribute(s)",
                      t, caarr.Length);
    foreach (object ca in caarr)
      Console.WriteLine(ca);
  }
}

Although the Test class of the preceding example has two attributes specified, the sample produces the following output:

Test has 1 custom attribute(s)
System.ObsoleteAttribute

This demonstrates how the Serializable attribute (a pseudo-custom attribute) isn’t accessible via reflection, while the Obsolete attribute (a custom attribute) still is.
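When you are interested in one particular attribute rather than all of them, the GetCustomAttribute overload that takes the attribute type is usually more convenient. The sketch below retrieves a hypothetical NoteAttribute and reads the value passed to its constructor; NoteAttribute, Fragile, and the NoteFor helper are names invented for this example.

```csharp
using System;

[AttributeUsage(AttributeTargets.Class)]
public class NoteAttribute : Attribute {  // hypothetical custom attribute
  private string text;
  public NoteAttribute(string text) { this.text = text; }
  public string Text { get { return text; } }
}

[Note("handle with care")]
public class Fragile { }

public class RetrieveDemo {
  // Returns the note text for a type, or null if no NoteAttribute
  // is specified on it
  public static string NoteFor(Type t) {
    NoteAttribute na =
      (NoteAttribute) Attribute.GetCustomAttribute(t, typeof(NoteAttribute));
    return (na == null) ? null : na.Text;
  }

  public static void Main() {
    Console.WriteLine(NoteFor(typeof(Fragile)));         // handle with care
    Console.WriteLine(NoteFor(typeof(string)) == null);  // True
  }
}
```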

Automatic Memory Management

Almost all modern programming languages allocate memory in two places: on the stack and on the heap.

Memory allocated on the stack stores local variables, parameters, and return values, and is generally managed automatically by the operating system.

Memory allocated on the heap, however, is treated differently by different languages. In C and C++, memory allocated on the heap is managed manually. In C# and Java, however, memory allocated on the heap is managed automatically.

While manual memory management has the advantage of being simple for runtimes to implement, it has drawbacks that tend not to exist in systems that offer automatic memory management. For example, a large percentage of bugs in C and C++ programs stem from using an object after it has been deleted (dangling pointers) or from forgetting to delete an object when it is no longer needed (memory leaks).

The process of automatically managing memory is known as garbage collection. While generally more complex for runtimes to implement than traditional manual memory management, garbage collection greatly simplifies development and eliminates many common errors related to manual memory management.

For example, it is almost impossible to generate a traditional memory leak in C#, and common bugs such as circular references in traditional COM development simply go away.

The Garbage Collector

C# depends on the CLR for many of its runtime services, and garbage collection is no exception.

The CLR includes a high-performing generational mark-and-compact garbage collector (GC) that performs automatic memory management for type instances stored on the managed heap.

The GC is considered to be a tracing garbage collector in that it doesn’t interfere with every access to an object, but rather wakes up intermittently and traces the graph of objects stored on the managed heap to determine which objects can be considered garbage and therefore collected.

The GC generally initiates a garbage collection when a memory allocation occurs and there is too little free memory to fulfill the request. This process can also be initiated manually using the System.GC type. Initiating a garbage collection freezes all threads in the process to allow the GC time to examine the managed heap.

The GC begins with the set of object references considered roots and walks the object graph, marking all the objects it touches as reachable. Once this process is complete, all objects that have not been marked are considered garbage.

Objects that are considered garbage and don’t have finalizers are immediately discarded, and the memory is reclaimed. Objects that are considered garbage and do have finalizers are flagged for additional asynchronous processing on a separate thread to invoke their Finalize methods before they can be considered garbage and reclaimed at the next collection.

Objects considered still live are then shifted down to the bottom of the heap (compacted), hopefully freeing space to allow the memory allocation to succeed.

At this point the memory allocation is attempted again, the threads in the process are unfrozen, and either normal processing continues or an OutOfMemoryException is thrown.
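As noted above, a collection can also be requested through the System.GC type. The following sketch allocates some short-lived garbage, forces a collection, and reports the memory the GC believes is allocated. CollectDemo and AllocateAndCollect are hypothetical names, and the numbers printed vary from run to run and between runtime versions, so no particular output should be expected.

```csharp
using System;

public class CollectDemo {
  // Allocates short-lived garbage, forces a collection, and returns
  // the number of bytes the GC reports as still allocated
  public static long AllocateAndCollect() {
    for (int i = 0; i < 10000; i++) {
      byte[] junk = new byte[1024];  // unreachable after each iteration
      junk[0] = 1;
    }
    GC.Collect();                    // request a full collection
    GC.WaitForPendingFinalizers();   // let pending finalizers run
    return GC.GetTotalMemory(true);  // true = collect before measuring
  }

  public static void Main() {
    long before = GC.GetTotalMemory(false);
    long after  = AllocateAndCollect();
    Console.WriteLine("before={0} bytes, after={1} bytes", before, after);
  }
}
```

Forcing collections like this is useful for experiments and diagnostics, but in production code it generally works against the GC's own tuning.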

Optimization Techniques

Although this may sound like an inefficient process compared to simply managing memory manually, the GC incorporates various optimization techniques to reduce the time an application is frozen waiting for the GC to complete (known as pause time).

The most important of these optimizations is what makes the GC generational. This technique takes advantage of the fact that while many objects tend to be allocated and discarded rapidly, certain objects are long-lived and thus don’t need to be traced during every collection.

Basically, the GC divides the managed heap into three generations. Objects that have just been allocated are considered to be in Gen0, objects that have survived one collection cycle are considered to be in Gen1, and all other objects are considered to be in Gen2.

When it performs a collection, the GC initially collects only Gen0 objects. If not enough memory is reclaimed to fulfill the request, both Gen0 and Gen1 objects are collected, and if that fails as well, a full collection of Gen0, Gen1, and Gen2 objects is attempted.
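You can observe generations from your own code using GC.GetGeneration and GC.MaxGeneration. The sketch below tracks a single object across two forced collections. GenerationDemo is a name invented for this example. On a typical CLR the object is reported in Gen0, then Gen1, then Gen2, but promotion is an implementation detail of the runtime, so treat the exact values as an assumption rather than a guarantee.

```csharp
using System;

public class GenerationDemo {
  public static void Main() {
    object o = new object();
    Console.WriteLine("Highest generation: Gen{0}", GC.MaxGeneration);
    Console.WriteLine("After allocation:   Gen{0}", GC.GetGeneration(o));

    GC.Collect();  // o survives because it is still referenced
    Console.WriteLine("After 1 collection: Gen{0}", GC.GetGeneration(o));

    GC.Collect();
    Console.WriteLine("After 2 collections: Gen{0}", GC.GetGeneration(o));

    GC.KeepAlive(o);  // ensure o is considered live up to this point
  }
}
```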

Many other optimizations are also used to enhance the performance of automatic memory management, and generally, a GC-based application can be expected to approach the performance of an application that uses manual memory management.

Finalizers

When implementing your own types, you can choose to give them finalizers (via C# destructors), which are methods called asynchronously by the GC once an object is determined to be garbage.

Although this is required in certain cases, generally, there are many good technical reasons to avoid the use of finalizers.

As described in the previous section, objects with finalizers incur significant overhead when they are collected, requiring asynchronous invocation of their Finalize methods and taking two full GC cycles for their memory to be reclaimed.

Other reasons not to use finalizers include:

  • Objects with finalizers take longer to allocate on the managed heap than objects without finalizers.

  • Objects with finalizers that refer to other objects (even those without finalizers) can prolong the life of the referred objects unnecessarily.

  • It’s impossible to predict in what order the finalizers for a set of objects will be called.

  • You have limited control over when (or even if!) the finalizer for an object will be called.

In summary, finalizers are somewhat like lawyers: while there are cases when you really need them, generally, you don’t want to use them unless absolutely necessary, and if you do use them, you need to be 100% sure you understand what they are doing for you.

If you have to implement a finalizer, follow these guidelines or have a very good reason for not doing so:

  • Ensure that your finalizer executes quickly.

  • Never block in your finalizer.

  • Free any unmanaged resources you own.

  • Don’t reference any other objects.

  • Don’t throw any unhandled exceptions.
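As a sketch of a finalizer that follows these guidelines, consider a hypothetical NativeBuffer type that owns a block of unmanaged memory obtained from Marshal.AllocHGlobal. Its finalizer does nothing but release that block: it runs quickly, doesn't block, and touches no other managed objects. The type and its members are invented for this illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public class NativeBuffer {
  private IntPtr mem;  // unmanaged resource owned by this object
  private int size;

  public NativeBuffer(int bytes) {
    mem = Marshal.AllocHGlobal(bytes);
    size = bytes;
  }

  public int Size { get { return size; } }

  ~NativeBuffer() {
    // Free only what this object owns; reference nothing else
    if (mem != IntPtr.Zero) {
      Marshal.FreeHGlobal(mem);
      mem = IntPtr.Zero;
    }
  }

  public static void Main() {
    NativeBuffer b = new NativeBuffer(256);
    Console.WriteLine("Allocated {0} unmanaged bytes", b.Size);
    // The memory is released whenever the GC finalizes b
  }
}
```

Because the timing of finalization is unpredictable, a type like this would normally also offer a way to release the memory explicitly.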

Dispose and Close Methods

It is generally desirable to explicitly call clean-up code once you have determined that an object will no longer be used. Microsoft recommends that you write a method named either Dispose or Close (depending on the semantics of the type) to perform the cleanup required. If you also have a destructor, include a special call to the static SuppressFinalize method on the System.GC type to indicate that the destructor no longer needs to be called. Typically, the real destructor is written to call the Dispose/Close method, as follows:

using System;
public class Worker : IDisposable {
  bool disposed = false;
  int id;
  public Worker(int id) {
    this.id=id;
  }
  // ...
  protected virtual void Dispose(bool disposing) {
    if (!this.disposed) { // don't dispose more than once
      if (disposing) {
        // disposing==true means you're not in the finalizer, so 
        // you can reference other objects here
        Console.WriteLine("#{0}: OK to clean up other objects.", id);
      }
      Console.WriteLine("#{0}: disposing.", id);
      // Perform normal cleanup
    }
    this.disposed = true;
  }
  public void Dispose(  ) {
    Dispose(true); 
    // Mark this object finalized
    GC.SuppressFinalize(this); // no need to destruct this instance 
  }
  ~Worker(  ) {
    Dispose(false);
    Console.WriteLine("#{0}: destructing.", id);
  }
  public static void Main(  ) {
    // create a worker and call Dispose when we're done.
    using(Worker w1 = new Worker(1)) {
      // ...
    }
    // create a worker that will get cleaned up when the CLR
    // gets around to it.
    Worker w2 = new Worker(2);
  }
}

If you run this code, you will see that Worker 1 is never finalized, since its Dispose( ) method is implicitly called (the using block guarantees this). Worker 2 is finalized and disposed when the CLR gets around to it, but it’s never given a chance to clean up other objects. The disposable pattern gives you a way to close or dispose of any external objects you might be using, such as an I/O stream. The sample produces the following output:

#1: OK to clean up other objects.
#1: disposing.
#2: disposing.
#2: destructing.
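
As a minimal sketch of cleaning up an external object, the following hypothetical LogFile class (not part of the FCL) owns a TextWriter and closes it from Dispose(bool) only when disposing is true. The class name, and the use of a StringWriter in place of a real file stream, are assumptions made to keep the example self-contained:

```csharp
using System;
using System.IO;

// Hypothetical wrapper that owns another disposable object (a TextWriter).
public class LogFile : IDisposable {
  TextWriter writer;
  bool disposed = false;

  public LogFile(TextWriter writer) {
    this.writer = writer;
  }

  public void Write(string message) {
    if (disposed) throw new ObjectDisposedException("LogFile");
    writer.WriteLine(message);
  }

  protected virtual void Dispose(bool disposing) {
    if (disposed) return;   // don't dispose more than once
    if (disposing) {
      // Called from Dispose(): safe to touch other managed objects
      writer.Dispose();
    }
    // When disposing is false we are in the finalizer; the writer may
    // already have been finalized, so it is left alone.
    disposed = true;
  }

  public void Dispose() {
    Dispose(true);
    GC.SuppressFinalize(this); // the destructor has nothing left to do
  }

  ~LogFile() {
    Dispose(false);
  }
}

public class LogFileDemo {
  public static void Main() {
    StringWriter target = new StringWriter(); // stands in for a file stream
    using (LogFile log = new LogFile(target)) {
      log.Write("started");
    }
    // The text was written before the writer was closed
    Console.Write(target.ToString());
  }
}
```

Calling Dispose a second time is harmless here, because the disposed flag short-circuits the cleanup.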

Interop with Native DLLs

PInvoke, short for Platform Invocation Services, lets C# access functions, structs, and callbacks in unmanaged DLLs. For example, perhaps you wish to call the MessageBox function in the Windows user32.dll:

int MessageBox(HWND hWnd, LPCTSTR lpText, 
               LPCTSTR lpCaption, UINT uType);

To call this function, write a static extern method decorated with the DllImport attribute:

using System.Runtime.InteropServices;
class MsgBoxTest {
  [DllImport("user32.dll")]
  static extern int MessageBox(int hWnd, string text, 
                               string caption, int type);
  public static void Main(  ) {
    MessageBox(0, "Please do not press this button again.",
                  "Attention", 0);
  }
}

PInvoke then finds and loads the required Win32 DLLs and resolves the entry point of the requested function. The CLR includes a marshaler that knows how to convert parameters and return values between .NET types and unmanaged types. In this example the int parameters translate directly to the four-byte integers that the function expects (on 64-bit Windows a window handle is pointer-sized, so IntPtr is the safer declaration for hWnd), and the string parameters are converted to null-terminated arrays of characters. Because no CharSet is specified, C# marshals the strings as one-byte ANSI characters; with CharSet.Auto, the marshaler uses ANSI characters under Win9x and two-byte Unicode characters under Windows NT, 2000, and XP.

Marshaling Common Types

The CLR marshaler is a .NET facility that knows about the core types used by COM and the Windows API and provides default translations to CLR types for you. The bool type, for instance, can be translated into a four-byte Windows BOOL type or a two-byte VARIANT_BOOL type. You can override a default translation using the MarshalAs attribute:

using System.Runtime.InteropServices;
static extern int Foo([MarshalAs(UnmanagedType.LPStr)]
                      string s);

In this case, the marshaler was told to use LPStr, so it will always use ANSI characters. Array classes and the StringBuilder class will copy the marshaled value from an external function back to the managed value, as follows:

using System;
using System.Text;
using System.Runtime.InteropServices;
class Test {
  [DllImport("kernel32.dll")]
  static extern int GetWindowsDirectory(StringBuilder sb,
                                        int maxChars);
   static void Main(  ) {
      StringBuilder s = new StringBuilder(256);
      GetWindowsDirectory(s, 256);
      Console.WriteLine(s);
   }
}

Marshaling Classes and Structs

Passing a class or struct to a C function requires marking the struct or class with the StructLayout attribute:

using System;
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential)]
struct SystemTime {
   public ushort wYear; 
   public ushort wMonth;
   public ushort wDayOfWeek; 
   public ushort wDay; 
   public ushort wHour; 
   public ushort wMinute; 
   public ushort wSecond; 
   public ushort wMilliseconds; 
}
class Test {
   [DllImport("kernel32.dll")]
   static extern void GetSystemTime(ref SystemTime t);
   static void Main(  ) {
      SystemTime t = new SystemTime(  );
      GetSystemTime(ref t);
      Console.WriteLine(t.wYear);
   }
}

In both C and C#, fields in an object are located at n number of bytes from the address of that object. The difference is that a C# program finds this offset by looking it up using the field name; C field names are compiled directly into offsets. For instance, in C, wDay is just a token to represent whatever is at the address of a SystemTime instance plus 6 bytes (it follows three two-byte fields).

For access speed and future widening of a datatype, these offsets are usually in multiples of a minimum width, called the pack size. For .NET types, the pack size is usually set at the discretion of the runtime, but by using the StructLayout attribute, field offsets can be controlled. The default pack size when using this attribute is 8 bytes, but it can be set to 1, 2, 4, 8, or 16 bytes (pass Pack=packsize to the StructLayout constructor), and there are also explicit options to control individual field offsets. This lets a .NET type be passed to a C function.
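
The effect of sequential layout and pack size can be inspected from managed code. The following sketch uses Marshal.OffsetOf and Marshal.SizeOf to print the unmanaged offsets and sizes; DefaultPacked and TightlyPacked are hypothetical types introduced only to show what Pack changes:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct SystemTime {
  public ushort wYear, wMonth, wDayOfWeek, wDay;
  public ushort wHour, wMinute, wSecond, wMilliseconds;
}

// Hypothetical types used only to show the effect of Pack
[StructLayout(LayoutKind.Sequential)]
struct DefaultPacked { public byte tag; public int value; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct TightlyPacked { public byte tag; public int value; }

class LayoutDemo {
  static void Main() {
    // wDay is the fourth two-byte field, so it sits 6 bytes in
    Console.WriteLine(Marshal.OffsetOf(typeof(SystemTime), "wDay"));      // 6
    Console.WriteLine(Marshal.SizeOf(typeof(SystemTime)));                // 16

    // With default packing, value is aligned on a 4-byte boundary
    Console.WriteLine(Marshal.OffsetOf(typeof(DefaultPacked), "value"));  // 4
    Console.WriteLine(Marshal.SizeOf(typeof(DefaultPacked)));             // 8

    // With Pack=1 the padding disappears
    Console.WriteLine(Marshal.OffsetOf(typeof(TightlyPacked), "value"));  // 1
    Console.WriteLine(Marshal.SizeOf(typeof(TightlyPacked)));             // 5
  }
}
```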

In and Out Marshaling

The previous Test example works when SystemTime is a struct and t is a ref parameter, but it marshals more than the call needs, since GetSystemTime only writes to t:

struct SystemTime {...}
static extern void GetSystemTime(ref SystemTime t);

This is because the marshaler must always create fresh values for external parameters, so the previous method copies t when going in to the function and then copies the marshaled t when coming out of the function. By default, pass-by-value parameters are copied in, C# ref parameters are copied in/out, and C# out parameters are copied out, but there are exceptions for the types that have custom conversions. For instance, array classes and the StringBuilder class require copying when coming out of a function, so they are in/out. It is occasionally useful to override this behavior with the In and Out attributes. For example, if an array should be read-only, the In attribute tells the marshaler to copy only the array going into the function, and not the one coming out of it:

static extern void Foo([In] int[] array);
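
As a sketch of choosing the copy direction, GetSystemTime can be declared with a C# out parameter, which asks the marshaler to copy the struct out only. The platform check is an addition for this example so that it exits cleanly when run somewhere other than Windows:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct SystemTime {
  public ushort wYear, wMonth, wDayOfWeek, wDay;
  public ushort wHour, wMinute, wSecond, wMilliseconds;
}

class OutDemo {
  // GetSystemTime only fills in the struct, so an out parameter
  // (copied out only) describes the call more precisely than ref.
  [DllImport("kernel32.dll")]
  static extern void GetSystemTime(out SystemTime t);

  static void Main() {
    if (Environment.OSVersion.Platform != PlatformID.Win32NT) {
      Console.WriteLine("This sketch needs Windows.");
      return;
    }
    SystemTime t;               // no need to initialize an out argument
    GetSystemTime(out t);
    Console.WriteLine("{0}-{1:00}-{2:00}", t.wYear, t.wMonth, t.wDay);
  }
}
```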

Callbacks from Unmanaged Code

C# can not only call C functions but can also be called by C functions, using callbacks. In C# a delegate type is used in place of a function pointer:

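using System;
using System.Runtime.InteropServices;
class Test {
   // HWND and LPARAM are pointer-sized, so they are declared as IntPtr
   delegate bool CallBack(IntPtr hWnd, IntPtr lParam);
   [DllImport("user32.dll")]
   static extern int EnumWindows(CallBack callback, IntPtr lParam);
   // Called by Windows once for each top-level window
   static bool PrintWindow(IntPtr hWnd, IntPtr lParam) {
      Console.WriteLine(hWnd);
      return true; // true means continue enumerating
   }
   static void Main(  ) {
      CallBack e = new CallBack(PrintWindow);
      EnumWindows(e, IntPtr.Zero);
   }
}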
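
When a delegate is passed to unmanaged code, the marshaler hands over a plain function pointer. The following sketch makes that pointer visible with Marshal.GetFunctionPointerForDelegate and calls back through it, without needing any native DLL. The Transform delegate and Twice method are hypothetical names used for illustration:

```csharp
using System;
using System.Runtime.InteropServices;

class CallbackPointerDemo {
  delegate int Transform(int value);

  static int Twice(int value) { return value * 2; }

  public static int CallThroughPointer(int input) {
    Transform managed = new Transform(Twice);

    // The address unmanaged code would receive for this delegate
    IntPtr fnPtr = Marshal.GetFunctionPointerForDelegate(managed);

    // Wrap the raw pointer in a delegate again and call through it
    Transform viaPointer = (Transform)
      Marshal.GetDelegateForFunctionPointer(fnPtr, typeof(Transform));
    int result = viaPointer(input);

    // Keep the original delegate reachable for as long as unmanaged
    // code might still use fnPtr
    GC.KeepAlive(managed);
    return result;
  }

  static void Main() {
    Console.WriteLine(CallThroughPointer(21)); // 42
  }
}
```

The garbage collector does not know that unmanaged code is holding the function pointer. If a native library stores a callback for later use, keep the delegate in a field (or otherwise reachable) for at least that long.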

Predefined Interop Support Attributes

The FCL provides a set of attributes you can use to mark up your objects with information used by the CLR marshaling services to alter their default marshaling behavior.

This section describes the most common attributes you will need when interoperating with native Win32 DLLs. These attributes all exist in the System.Runtime.InteropServices namespace.

DllImport attribute

Syntax:

[DllImport (dll-name
  [, EntryPoint=function-name]?
  [, CharSet=charset-enum]?
  [, SetLastError=true|false]?
  [, ExactSpelling=true|false]?
  [, PreserveSig=true|false]?
  [, CallingConvention=callconv-enum]?
)] (for methods)

The DllImport attribute annotates an external function that defines a DLL entry point. The parameters for this attribute are:

dll-name

A string specifying the name of the DLL.

function-name

A string specifying the function name in the DLL. This is useful if you want the name of your C# function to be different from the name of the DLL function.

charset-enum

A CharSet enum, specifying how to marshal strings. In C#, the default value is CharSet.Ansi. CharSet.Auto converts strings to ANSI characters on Win9x and Unicode characters on Windows NT, 2000, and XP.

SetLastError

If true, preserves the Win32 error info. The default is false.

ExactSpelling

If true, the EntryPoint must exactly match the function. If false, name-matching heuristics are used. The default is false.

PreserveSig

If true, the method signature is preserved exactly as it was defined. If false, an HRESULT transformation is performed. The default is true.

callconv-enum

A CallingConvention enum, specifying the mode to use with the EntryPoint. The default is Winapi, which on Windows means StdCall.
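
The following sketch combines several of these parameters in one declaration. ShowMessage is a hypothetical C# name chosen to show EntryPoint at work, and the platform check is an addition so that the program exits cleanly on other operating systems:

```csharp
using System;
using System.Runtime.InteropServices;

class DllImportDemo {
  // EntryPoint lets the C# name differ from the exported name.
  // ExactSpelling=true tells the runtime to bind to "MessageBoxW"
  // exactly, without searching for other name variants.
  [DllImport("user32.dll", EntryPoint = "MessageBoxW",
             CharSet = CharSet.Unicode, ExactSpelling = true,
             SetLastError = true)]
  static extern int ShowMessage(IntPtr hWnd, string text,
                                string caption, uint type);

  static void Main() {
    if (Environment.OSVersion.Platform != PlatformID.Win32NT) {
      Console.WriteLine("This sketch needs Windows.");
      return;
    }
    int result = ShowMessage(IntPtr.Zero, "Hello from C#", "DllImport", 0);
    if (result == 0) {
      // MessageBox returns 0 on failure; SetLastError=true makes the
      // Win32 error code available through GetLastWin32Error
      Console.WriteLine("Error: {0}", Marshal.GetLastWin32Error());
    }
  }
}
```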

StructLayout attribute

Syntax:

[StructLayout(layout-enum
  [, Pack=packing-size]?
  [, CharSet=charset-enum]?
  [, Size=absolute-size]?
)] (for classes, structs)

The StructLayout attribute specifies how the data members of a class or struct should be laid out in memory. Although this attribute is commonly used when declaring structures that are passed to or returned from native DLLs, it can also define data structures suited to file and network I/O. The parameters for this attribute are:

layout-enum

A LayoutKind enum, which can be 1) Sequential, which lays out fields one after the next with a minimum pack size; 2) Explicit, which lets each field have a custom offset (giving several value-type fields an offset of 0 produces a union); or 3) Auto, which leaves the layout to the runtime and therefore isn't suitable for types passed to unmanaged code.

packing-size

An int specifying whether the packing size is 1, 2, 4, 8, or 16 bytes. The default value is 8.

charset-enum

A CharSet enum, specifying how to marshal strings. In C#, the default value is CharSet.Ansi. CharSet.Auto converts strings to ANSI characters on Win9x and Unicode characters on Windows NT, 2000, and XP.

absolute-size

Specifies the size of the struct or class. This has to be at least as large as the sum of all the members.

FieldOffset attribute

Syntax:

[FieldOffset (byte-offset)] (for fields)

The FieldOffset attribute is used within a class or struct that has explicit field layout. This attribute can be applied to a field and specifies the field offset in bytes from the start of the class or struct. Note that these offsets don’t have to be strictly increasing and can overlap, thus creating a union data structure.
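
As a small sketch of such a union, the hypothetical FloatBits struct below places a float and a uint at the same offset, so writing one field changes what the other reads:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union: both fields start at offset 0 and share storage
[StructLayout(LayoutKind.Explicit)]
struct FloatBits {
  [FieldOffset(0)] public float Value;
  [FieldOffset(0)] public uint Bits;
}

class UnionDemo {
  static void Main() {
    FloatBits fb = new FloatBits();
    fb.Value = 1.0f;
    // Reads the IEEE 754 bit pattern of 1.0f through the other field
    Console.WriteLine("0x{0:X8}", fb.Bits);                // 0x3F800000
    // The two four-byte fields overlap, so the struct is four bytes
    Console.WriteLine(Marshal.SizeOf(typeof(FloatBits)));  // 4
  }
}
```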

MarshalAs attribute

Syntax:

[MarshalAs(unmanaged-type
  [, named-parameters]?
)] (for fields, parameters, return values)

The MarshalAs attribute overrides the default marshaling behavior that the marshaler applies to a parameter or field. The unmanaged-type value is taken from the UnmanagedType enum; see the following list for the permissible values:

Bool      LPStr        VBByRefStr
I1        LPWStr       AnsiBStr
U1        LPTStr       TBStr
I2        ByValTStr    VariantBool
U2        IUnknown     FunctionPtr
I4        IDispatch    LPVoid
U4        Struct       AsAny
I8        Interface    RPrecise
U8        SafeArray    LPArray
R4        ByValArray   LPStruct
R8        SysInt       CustomMarshaler
BStr      SysUInt      NativeTypeMax
Error     Currency

For a detailed description of how and when to use each of these enum values, as well as other legal named-parameters, see the .NET Framework SDK documentation.
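
As one illustration of a named parameter, SizeConst gives the element count for fixed-size strings and arrays embedded in a struct. The Record type below is hypothetical and is meant to mirror a C struct declared as char name[16]; int scores[4];:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical record laid out like the C struct
//   struct Record { char name[16]; int scores[4]; };
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
struct Record {
  [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 16)]
  public string Name;
  [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
  public int[] Scores;
}

class MarshalAsDemo {
  static void Main() {
    // 16 one-byte characters followed by 4 four-byte integers
    Console.WriteLine(Marshal.SizeOf(typeof(Record)));              // 32
    Console.WriteLine(Marshal.OffsetOf(typeof(Record), "Scores"));  // 16
  }
}
```

Without the MarshalAs attributes, the string and the array would be marshaled as pointers rather than as data stored inline in the struct.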

In attribute

Syntax:

[In] (for parameters)

The In attribute specifies that data should be marshaled from the caller into the called method and can be combined with the Out attribute.

Out attribute

Syntax:

[Out] (for parameters)

The Out attribute specifies that data should be marshaled out from the called method to the caller and can be combined with the In attribute.

Interop with COM

The CLR provides support both for exposing C# objects as COM objects and for using COM objects from C#.

Binding COM and C# Objects

Interoperating between COM and C# works through either early or late binding. Early binding allows you to program with types known at compile time, while late binding forces you to program with types via dynamic discovery, using reflection on the C# side and IDispatch on the COM side.

When calling COM programs from C#, early binding works by providing metadata in the form of an assembly for the COM object and its interfaces. TlbImp.exe takes a COM type library and generates the equivalent metadata in an assembly. With the generated assembly, it’s possible to instantiate and call methods on a COM object just as you would on any other C# object.

When calling C# programs from COM, early binding works via a type library. Both TlbExp.exe and RegAsm.exe allow you to generate a COM type library from your assembly. You can then use this type library with tools that support early binding via type libraries such as Visual Basic 6.
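
Late binding from C# can be sketched with reflection alone. The example below assumes the Scripting.FileSystemObject COM class, which ships with the Windows Script runtime, is registered on the machine; no TlbImp-generated assembly is involved. The helper method name is hypothetical:

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

class LateBindingDemo {
  // Returns a temporary file name obtained through a late-bound COM
  // call, or null if the COM class is unavailable.
  public static string TempNameViaLateBinding() {
    if (Environment.OSVersion.Platform != PlatformID.Win32NT)
      return null; // COM is a Windows facility

    // Look up the COM class by ProgID at runtime
    Type t = Type.GetTypeFromProgID("Scripting.FileSystemObject");
    if (t == null) return null; // ProgID not registered

    object fso = Activator.CreateInstance(t);
    try {
      // The method is located by name at runtime (IDispatch on the
      // COM side), not checked by the compiler
      return (string)t.InvokeMember("GetTempName",
        BindingFlags.InvokeMethod, null, fso, null);
    } finally {
      // Release the COM object without waiting for the garbage collector
      Marshal.ReleaseComObject(fso);
    }
  }

  static void Main() {
    string name = TempNameViaLateBinding();
    Console.WriteLine(name == null ? "COM object not available" : name);
  }
}
```

A misspelled method name in the InvokeMember call compiles without complaint and fails only at runtime, which is the usual trade-off of late binding.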

Exposing COM Objects to C#

When you instantiate a COM object you are actually working with a proxy known as the Runtime Callable Wrapper (RCW). The RCW is responsible for managing the lifetime requirements of the COM object and translating the methods called on it into the appropriate calls on the COM object. When the garbage collector finalizes the RCW, it releases all references to the object it was holding. For situations in which you need to release the COM object without waiting for the garbage collector to finalize the RCW, you can use the static ReleaseComObject method of the System.Runtime.InteropServices.Marshal type.

The following example demonstrates changing the friendly name of the user with MSN Instant Messenger from C# via COM Interop:

// SetFN.cs - compile with /r:Messenger.dll
// Run SetFN.exe <Name> to set the FriendlyName for
//   the currently logged-in user
// Run TlbImp.exe "C:\Program Files\Messenger\msmsgs.exe"
//   to create Messenger.dll
using Messenger; // COM API for MSN Instant Messenger
public class MyApp {
  public static void Main(string[] args) {
    MsgrObject mo = new MsgrObject(  );
    IMsgrService im = mo.Services.PrimaryService;
    im.FriendlyName = args[0];
  }
}

Exposing C# Objects to COM

Just as an RCW proxy wraps a COM object when you access it from C#, code that accesses a C# object as a COM object must do so through a proxy as well. When your C# object is marshaled out to COM, the runtime creates a COM Callable Wrapper (CCW). The CCW follows the same lifetime rules as other COM objects, and as long as it is alive, a CCW maintains a traceable reference to the object it wraps, which keeps the object alive when the garbage collector is run.

The following example shows how you can export both a class and an interface from C# and control the assigned Globally Unique Identifiers (GUIDs) and Dispatch IDs (DISPIDs). After compiling IRunInfo and StackSnapshot you can register both using RegAsm.exe.

// IRunInfo.cs
// Compile with:
// csc /t:library IRunInfo.cs
using System;
using System.Runtime.InteropServices;
[GuidAttribute("aa6b10a2-dc4f-4a24-ae5e-90362c2142c1")]
public interface IRunInfo {
  [DispId(1)]
  string GetRunInfo(  );
}

// StackSnapshot.cs
// compile with: csc /t:library /r:IRunInfo.dll StackSnapshot.cs
using System;
using System.Runtime.InteropServices;
using System.Diagnostics;
[GuidAttribute("b72ccf55-88cc-4657-8577-72bd0ff767bc")]
public class StackSnapshot : IRunInfo {
  public StackSnapshot(  ) {
    st = new StackTrace(  );
  }
  [DispId(1)]
  public string GetRunInfo(  ) {
    return st.ToString(  );
  }
  private StackTrace st;
}

COM Mapping in C#

When you use a COM object from C#, the RCW makes a COM method look like a normal C# instance method. In COM, methods normally return an HRESULT to indicate success or failure and use an out parameter to return a value. In C#, however, methods normally return their result values and use exceptions to report errors. The RCW handles this by checking the HRESULT returned from the call to a COM method and throwing a C# exception when it finds a failure result. With a success result, the RCW returns the parameter marked as the return value in the COM method signature.

Tip

For more information on the argument modifiers and default mappings from COM type library types to C# types, see Appendix D.
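
The RCW performs this translation internally, but the Marshal class exposes a comparable HRESULT-to-exception mapping that can be tried directly. The following sketch is not the RCW's own code; it only shows the kind of mapping involved, using the standard COM codes S_OK and E_NOTIMPL:

```csharp
using System;
using System.Runtime.InteropServices;

class HResultDemo {
  const int S_OK = 0;
  const int E_NOTIMPL = unchecked((int)0x80004001);

  static void Main() {
    // A success code maps to no exception at all
    Console.WriteLine(Marshal.GetExceptionForHR(S_OK) == null);

    // A failure code maps to a .NET exception type
    Exception ex = Marshal.GetExceptionForHR(E_NOTIMPL);
    Console.WriteLine(ex.GetType().Name);

    // ThrowExceptionForHR is what a hand-written wrapper might call
    try {
      Marshal.ThrowExceptionForHR(E_NOTIMPL);
    } catch (NotImplementedException) {
      Console.WriteLine("caught NotImplementedException");
    }
  }
}
```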

Common COM Interop Support Attributes

The FCL provides a set of attributes you can use to mark up your objects with information needed by the CLR interop services to expose managed types to the unmanaged world as COM objects.

This section describes the most common attributes you will use for this purpose. These attributes all exist in the System.Runtime.InteropServices namespace.

ComVisible attribute

Syntax:

[ComVisible(true|false)] (for assemblies, classes, structs, enums, interfaces, delegates)

When generating a type library, all public types in an assembly are exported by default. The ComVisible attribute specifies that particular public types (or even the entire assembly) should not be exposed.

DispId attribute

Syntax:

[DispId(dispatch-id)] (for methods, properties, fields)

The DispId attribute specifies the DispID assigned to a method, field, or property for access via an IDispatch interface.

ProgId attribute

Syntax:

[ProgId(progid)] (for classes)

The ProgId attribute specifies the COM ProgID to be used for your class.

Guid attribute

Syntax:

[GuidAttribute(guid)] (for assemblies, modules, classes, structs, enums, interfaces, delegates)

The Guid attribute specifies the COM GUID to be used for your class or interface. This attribute should be specified using its full type name to avoid clashes with the Guid type.

InterfaceType attribute

Syntax:

[InterfaceType(ComInterfaceType)] (for interfaces)

By default, interfaces are generated as dual interfaces in the type library, but you can use this attribute to select one of the three COM interface types (dual, dispatch, or a traditional IUnknown-derived interface).

ComRegisterFunction attribute

Syntax:

[ComRegisterFunction] (for methods)

Requests that RegAsm.exe call a method during the process of registering your assembly. If you use this attribute, you must also specify an unregistration method that reverses all the changes you made in the registration function. Use the ComUnregisterFunction attribute to mark that method.
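
The attributes in this section can be combined on one type, as in the following sketch. IGreeter, Greeter, the ProgID, and both GUIDs are made up for illustration, and the registration methods only print a message where real code would write and remove its own registry entries:

```csharp
using System;
using System.Runtime.InteropServices;

[ComVisible(true)]
[GuidAttribute("6b3f7c1e-2d4a-4e8b-9f10-3a5c7e9b1d2f")] // made-up GUID
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IGreeter {
  string Greet(string name);
}

[ComVisible(true)]
[GuidAttribute("0c8d5a42-7e13-4b6f-a2d9-5f1e3c7b9a04")] // made-up GUID
[ProgId("Samples.Greeter")]                             // made-up ProgID
public class Greeter : IGreeter {
  public string Greet(string name) {
    return "Hello, " + name;
  }

  // Called by RegAsm.exe when the assembly is registered.
  // The method must be static and take the Type being registered.
  [ComRegisterFunction]
  static void Register(Type t) {
    Console.WriteLine("Registering {0}", t.FullName);
  }

  // Called by RegAsm.exe /unregister; should undo whatever Register did
  [ComUnregisterFunction]
  static void Unregister(Type t) {
    Console.WriteLine("Unregistering {0}", t.FullName);
  }
}

public class GreeterDemo {
  public static void Main() {
    // From C# the class is used like any other; the attributes only
    // matter when it is exposed to COM
    Console.WriteLine(new Greeter().Greet("COM"));
  }
}
```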
