Memory Management

This section looks at one of the larger underlying elements of managed code. One of the reasons why .NET applications are referred to as “managed” is that memory deallocation is handled automatically by the system. Developers are accustomed to worrying about memory management only in an abstract sense. The basic rule has been that every object created and every section of memory allocated needs to be released (destroyed). The CLR introduces a garbage collector (GC), which simplifies this paradigm. Gone are the days when a misbehaving component—for example, one that fails to properly dispose of its object references or allocates and never releases memory—could crash a machine.

However, the use of a GC introduces new questions about when and if objects need to be explicitly cleaned up. There are two elements in manually writing code to allocate and deallocate memory and system resources. The first is the release of any shared resources, such as file handles and database connections. This type of activity needs to be managed explicitly and is discussed shortly. The second element of manual memory management involves letting the system know when memory is no longer in use by your application. Developers in unmanaged languages like C are accustomed to explicitly disposing of memory references. While you can explicitly show your intent to destroy the object by setting it to Nothing manually, this is not good practice.

The .NET GC automatically manages the cleanup of allocated memory. You don't need to carry out memory management as an explicit action. The GC will reclaim objects, at times in the middle of executing the code in a method. Fortunately, the system ensures that collection happens only as long as your code doesn't reference the object later in the method.

For example, you could actually end up extending the amount of time an object is kept in memory just by setting that object to Nothing. Thus, setting a variable to Nothing at the end of the method prevents the garbage collection mechanism from proactively reclaiming objects, and therefore is discouraged.

Given this change in paradigms, the next few sections look at the challenges of traditional memory management and peek under the covers to reveal how the garbage collector works, including a quick look at how the GC eliminates these challenges from your list of concerns. In particular, you should understand how you can interact with the garbage collector and why the Using command, for example, is recommended over a finalization method in .NET.

Traditional Garbage Collection

An unmanaged runtime environment provides limited memory management. Once all the references are released on an object, the runtime automatically releases the memory. However, in some situations objects that are no longer referenced by an application are not properly cleaned up. One cause of this is the circular reference.

Circular References

One of the most common situations in which the unmanaged runtime is unable to ensure that objects are no longer referenced by the application is when these objects contain a circular reference. An example of a circular reference is when object A holds a reference to object B and object B holds a reference to object A.

Circular references are problematic because the unmanaged environment typically relies on a reference counting mechanism to determine whether an object can be deactivated. Each object may be responsible for maintaining its own reference count and for destroying itself once the reference count reaches zero. Clients of the object are responsible for updating the reference count appropriately.

However, in a circular reference scenario, object A continues to hold a reference to object B, and vice versa, so the internal cleanup logic of these components is never triggered. In addition, problems can occur if the clients do not properly maintain the object's reference count.

The application can invalidate its references to A and B by setting the associated variables equal to Nothing. However, even though objects A and B are no longer referenced by the application, the unmanaged runtime isn't notified to remove them because A and B still reference each other.

The CLR garbage collector solves the problem of circular references because it looks for a reference from the root application or thread to every object, and all objects that cannot be reached this way are marked for deletion, regardless of any other references they might still maintain.

The CLR's Garbage Collector

The .NET garbage collection mechanism is complex, and the details of its inner workings are beyond the scope of this book, but it is important to understand the principles behind its operation. The GC is responsible for collecting objects that are no longer referenced.

At certain times, and based on internal rules, a task will run through all the objects looking for those that no longer have any references from the root application thread or one of the worker threads. Those objects may then be terminated; thus, the garbage is collected.

As long as all references to an object are either implicitly or explicitly released by the application, the GC will take care of freeing the memory allocated to it. Managed objects in .NET are not responsible for maintaining their reference count, and they are not responsible for destroying themselves. Instead, the GC is responsible for cleaning up objects that are no longer referenced by the application. The GC periodically determines which objects need to be cleaned up by leveraging the information the CLR maintains about the running application. The GC obtains a list of objects that are directly referenced by the application. Then, the GC discovers all the objects that are referenced (both directly and indirectly) by the “root” objects of the application. Once the GC has identified all the referenced objects, it is free to clean up any remaining objects.

The GC relies on references from an application to objects; thus, when it locates an object that is unreachable from any of the root objects, it can clean up that object. Any other references to that object will be from other objects that are also unreachable. Thus, the GC automatically cleans up objects that contain circular references.
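The behavior described above can be observed with a small console sketch. The Node class, the CreateCycle helper, and the module name are hypothetical names introduced only for this illustration. A WeakReference is used to watch the object without keeping it alive, and the forced collection is for demonstration only. On a typical run the final line reports False, although the runtime makes no guarantee about exactly when an object is reclaimed.

```vb
Imports System
Imports System.Runtime.CompilerServices

Module CircularReferenceDemo

  ' Hypothetical illustration type: each Node can point at another Node.
  Private Class Node
    Public Partner As Node
  End Class

  ' Builds two objects that reference each other and returns only a weak
  ' reference, so no strong reference survives once this method returns.
  <MethodImpl(MethodImplOptions.NoInlining)> _
  Public Function CreateCycle() As WeakReference
    Dim a As New Node()
    Dim b As New Node()
    a.Partner = b
    b.Partner = a
    Return New WeakReference(a)
  End Function

  Sub Main()
    Dim tracker As WeakReference = CreateCycle()

    ' Forcing a collection is done here only to make the effect visible.
    GC.Collect()
    GC.WaitForPendingFinalizers()
    GC.Collect()

    ' A and B still reference each other, but neither is reachable from a root.
    Console.WriteLine("Cycle still alive: {0}", tracker.IsAlive)
  End Sub
End Module
```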

Note that just because you eliminate all references to an object doesn't mean that it will be terminated immediately. It simply remains in memory until the garbage collection process gets around to locating and destroying it, a behavior called nondeterministic finalization.

This nondeterministic nature of CLR garbage collection provides a performance benefit. Rather than expend the effort to destroy objects as they are dereferenced, the destruction process can occur when the application is otherwise idle, often decreasing the impact on the user.

It is possible to explicitly invoke the GC by calling GC.Collect, but this process takes time and bypasses the automated optimization built into the CLR, so it is not the sort of behavior to invoke in a typical application. For example, you could call GC.Collect each time you set an object variable to Nothing, so that the object would be destroyed almost immediately, but this forces the GC to scan all the objects in your application, which is a very expensive operation in terms of performance.

It's far better to design applications such that it is acceptable for unused objects to sit in memory for some time before they are terminated. That way, the garbage collector can run based on its optimal rules, collecting many dereferenced objects at the same time. This means you need to design objects that don't maintain expensive resources in instance variables. Database connections, open files on disk, and large chunks of memory (such as an image) are all examples of expensive resources. If you rely on the destruction of the object to release this type of resource, then the system might be keeping the resource tied up for a lot longer than you expect; in fact, on a lightly utilized Web server, it could literally be days.

The first principle is working with object patterns that incorporate cleaning up such pending references before the object is released. Examples of this include calling the close method on an open database connection or a file handle. In most cases, it's possible for applications to create classes that do not risk keeping these handles open. However, certain requirements, even with the best object design, can create a risk that a key resource will not be cleaned up correctly. In such an event, there are two occasions when the object could attempt to perform this cleanup: when the final reference to the object is released and immediately before the GC destroys the object.

One option is to implement the IDisposable interface. This interface gives client code a standard way to release persistent resources as soon as they are no longer needed, and it is the preferred method for releasing resources. The second option is to add a method to your class that the system runs immediately before an object is destroyed. This option is not recommended for several reasons, including the fact that many developers fail to remember that the garbage collector is nondeterministic, meaning that you can't, for example, safely reference a SqlConnection object from your custom object's finalizer.

Finally, as part of .NET 2.0, Visual Basic introduced the Using command. The Using command is designed to change the way that you think about object cleanup. Instead of encapsulating your cleanup logic within your object, the Using command creates a window around the code that is referencing an instance of your object. When your application's execution reaches the end of this window, the system automatically calls the Dispose method of the IDisposable interface for your object to ensure that it is cleaned up correctly.

The Finalize Method

Conceptually, the GC calls an object's Finalize method immediately before it collects an object that is no longer referenced by the application. Classes can override the Finalize method to perform any necessary cleanup. The basic concept is to create a method that fills the same need as what in other object-oriented languages is referred to as a destructor. A Finalize method is recognized by the GC, and its presence prevents a class from being cleaned up until after the finalization method is completed. The following example shows the declaration of a finalization method:

Protected Overrides Sub Finalize()
  ' clean up code goes here
  MyBase.Finalize()
End Sub

This code uses both Protected scope and the Overrides keyword. Notice that not only does custom cleanup code go here (as indicated by the comment), but this method also calls MyBase.Finalize, which causes any finalization logic in the base class to be executed as well. Any class implementing a custom Finalize method should always call the base class's Finalize method.

Be careful, however, not to treat the Finalize method as if it were a destructor. A destructor is based on a deterministic system, whereby the method is called when the object's last reference is removed. In the GC system, there are key differences in how a finalizer works:

  • Because the GC is optimized to clean up memory only when necessary, there is a delay between the time when the object is no longer referenced by the application and when the GC collects it. Therefore, the same expensive resources that are released in the Finalize method may stay open longer than they need to be.
  • The GC doesn't actually run Finalize methods. When the GC finds an object with a Finalize method, it queues the object for a separate finalizer thread, which executes the method. This means that an object is not cleaned up during the current GC pass. Because of how the GC is optimized, this can result in the object remaining in memory for a much longer period.
  • The GC is usually triggered when available memory is running low. As a result, execution of the object's Finalize method is likely to incur performance penalties. Therefore, the code in the Finalize method should be as short and quick as possible.
  • There's no guarantee that a service you require is still available. For example, if the system is closing and you have a file open, then .NET may have already unloaded the object required to close the file, and thus a Finalize method shouldn't rely on an instance of any other .NET object.
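The second point in the list above can be sketched with a short console program. The Tracked class, its shared FinalizerThreadId field, and the helper names are hypothetical and exist only for this illustration. The finalizer records the ID of the thread that ran it, so you can compare that ID with the one for the thread running Main. The forced collection is for demonstration only, and the helper returns -1 if the finalizer has not run by the time it checks.

```vb
Imports System
Imports System.Runtime.CompilerServices
Imports System.Threading

Module FinalizerTimingDemo

  ' Hypothetical class: its finalizer only records which thread ran it.
  Public Class Tracked
    Public Shared FinalizerThreadId As Integer = -1

    Protected Overrides Sub Finalize()
      Try
        ' Keep finalizer code short; this assignment is for demonstration only.
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId
      Finally
        MyBase.Finalize()
      End Try
    End Sub
  End Class

  ' Creates an object and lets its only reference go out of scope.
  <MethodImpl(MethodImplOptions.NoInlining)> _
  Private Sub CreateAndDrop()
    Dim t As New Tracked()
  End Sub

  ' Returns the ID of the thread that ran Finalize, or -1 if it has not run.
  Public Function ObserveFinalizerThread() As Integer
    CreateAndDrop()
    GC.Collect()                   ' The unreachable object is queued for finalization.
    GC.WaitForPendingFinalizers()  ' Wait for the finalizer thread to drain its queue.
    Return Tracked.FinalizerThreadId
  End Function

  Sub Main()
    Console.WriteLine("Main thread ID:      {0}", Thread.CurrentThread.ManagedThreadId)
    Console.WriteLine("Finalizer thread ID: {0}", ObserveFinalizerThread())
  End Sub
End Module
```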

The IDisposable Interface

In some cases, the Finalize behavior is not acceptable. For an object that is using an expensive or a limited resource, such as a database connection, a file handle, or a system lock, it is best to ensure that the resource is freed as soon as the object is no longer needed.

One way to accomplish this is to implement a method to be called by the client code to force the object to clean up and release its resources. This is not a perfect solution, but it is workable. This cleanup method must be called directly by the code using the object or via the use of the Using statement. The Using statement enables you to encapsulate an object's life span within a limited range, and automate the calling of the IDisposable interface.

The .NET Framework provides the IDisposable interface to formalize the declaration of cleanup logic. Be aware that implementing the IDisposable interface also implies that the object has overridden the Finalize method. Because there is no guarantee that the Dispose method will be called, it is critical that Finalize triggers your cleanup code if it was not already executed.

All cleanup activities should be placed in the Finalize method, but objects that require timely cleanup should implement a Dispose method that can then be called by the client application just before setting the reference to Nothing:

Class DemoDispose

  Private m_disposed As Boolean = False

  Public Sub Dispose()
    If (Not m_disposed) Then
      ' Call cleanup code in Finalize.
      Finalize()
      ' Record that object has been disposed.
      m_disposed = True
      ' The GC no longer needs to call Finalize for this object.
      GC.SuppressFinalize(Me)
    End If
  End Sub

  Protected Overrides Sub Finalize()
    ' Perform cleanup here
    MyBase.Finalize()
  End Sub
End Class

The DemoDispose class implements a Finalize method but uses a Dispose method that calls Finalize to perform any necessary cleanup. To ensure that the Dispose method calls Finalize only once, the value of the private m_disposed field is checked. Once Finalize has been run, this value is set to True. The class then calls GC.SuppressFinalize to ensure that the GC does not call the Finalize method on this object when the object is collected. If you need to implement a Finalize method, this is the preferred implementation pattern.

Having a custom finalizer ensures that once released, the garbage collection mechanism will eventually find and terminate the object by running its Finalize method. However, when handled correctly, the IDisposable interface ensures that any cleanup is executed immediately, so resources are not consumed beyond the time they are needed.

Note that any class that derives from System.ComponentModel.Component automatically inherits the IDisposable interface. This includes all of the forms and controls used in a Windows Forms UI, as well as various other classes within the .NET Framework. Because those classes inherit their implementation, the best way to see how the interface works is to review a custom implementation. You can leverage the code download or, preferably, create a new console application project. Add a new class to your project and name it Person. Once your new class has been generated, go to the editor window and below the class declaration add the code to implement the IDisposable interface:

Public Class Person
   Implements IDisposable

The IDisposable interface itself defines a single method, Dispose. Finalize is not part of the interface; it is inherited from System.Object and overridden as part of the disposable pattern. What's important is that Visual Studio automatically inserts both of these methods into your code (code file: ProVB_DisposablePerson.vb):

#Region " IDisposable Support "
    Private disposedValue As Boolean ' To detect redundant calls
        
    ' IDisposable
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If Not Me.disposedValue Then
            If disposing Then
                ' TODO: dispose managed state (managed objects).
            End If
            ' TODO: free unmanaged resources (unmanaged objects)
            '   and override Finalize() below.
            ' TODO: set large fields to null.
        End If
        Me.disposedValue = True
    End Sub
        
    ' TODO: override Finalize() only if Dispose(ByVal disposing As Boolean) above
    '       has code to free unmanaged resources.
    Protected Overrides Sub Finalize()
        ' Do not change this code.  Put cleanup code in
        '     Dispose(ByVal disposing As Boolean) above.
        Dispose(False)
        MyBase.Finalize()
    End Sub
        
    ' This code added by Visual Basic to correctly implement the disposable pattern.
    Public Sub Dispose() Implements IDisposable.Dispose
        ' Do not change this code.  Put cleanup code in
        '     Dispose(ByVal disposing As Boolean) above.
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
#End Region

Notice the use of the Overridable and Overrides keywords. The automatically inserted code is following a best-practice design pattern for implementation of the IDisposable interface and the Finalize method. The idea is to centralize all cleanup code into a single method that is called by either the Dispose method or the Finalize method as appropriate.

Accordingly, you can add the cleanup code as noted by the TODO: comments in the inserted code. As mentioned in Chapter 1, the TODO: keyword is recognized by Visual Studio's text parser, which triggers an entry in the task list to remind you to complete this code before the project is complete.

Generally, it is up to your client code to call the Dispose method at the appropriate time to ensure that cleanup occurs. Typically, this should be done as soon as the code is done using the object.

This is not always as easy as it might sound. In particular, an object may be referenced by more than one variable, and just because code in one class is done with the object doesn't mean that it isn't referenced by other variables. If the Dispose method is called while other references remain, then the object may become unusable and cause errors when invoked via those other references.
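The risk described above can be sketched as follows. The Resource class and its Read method are hypothetical names used only for this illustration. The class throws ObjectDisposedException when it is used after disposal, which is one common way to surface the problem early, though it is not something the generated pattern does for you.

```vb
Imports System

Module SharedReferenceDemo

  ' Hypothetical resource wrapper used only for this illustration.
  Public Class Resource
    Implements IDisposable

    Private disposedValue As Boolean

    Public Function Read() As String
      ' Refuse to work once the object has been disposed.
      If disposedValue Then
        Throw New ObjectDisposedException("Resource")
      End If
      Return "data"
    End Function

    Public Sub Dispose() Implements IDisposable.Dispose
      disposedValue = True
    End Sub
  End Class

  ' Returns True when using the second reference after disposal fails.
  Public Function SecondReferenceFails() As Boolean
    Dim first As New Resource()
    Dim second As Resource = first   ' Same object, second variable.

    first.Dispose()                  ' The code holding "first" is finished.

    Try
      second.Read()
      Return False
    Catch ex As ObjectDisposedException
      Return True
    End Try
  End Function

  Sub Main()
    Console.WriteLine("Second reference failed: {0}", SecondReferenceFails())
  End Sub
End Module
```

Running this prints "Second reference failed: True", because both variables refer to the same disposed instance.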

Using IDisposable

One way to work with the IDisposable interface is to manually insert the calls to the interface implementation everywhere you reference the class (code file: ProVB_DisposableModule1.vb):

    CType(mPerson, IDisposable).Dispose()

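Note that Dispose is called here through the IDisposable interface, using the CType function to access that specific interface. The conversion is required only when the implementing method is not exposed publicly on the class; with the generated code shown earlier, where Dispose is declared Public, calling mPerson.Dispose() directly works as well.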

This solution works fine for patterns where the object implementing IDisposable is used within a method, but it is less useful for other patterns—for example, an open database connection passed between methods or when the object is used as part of a Web service. In fact, even for client applications, this pattern is somewhat limited in that it requires the application to define the object globally with respect to its use.

For these situations, .NET 2.0 introduced a new command keyword: Using. The Using keyword is a way to quickly encapsulate the life cycle of an object that implements IDisposable, and ensure that the Dispose method is called correctly (code file: ProVB_DisposableModule1.vb):

        Using mPerson As Person = New Person
            'Use the mPerson in custom method calls

        End Using

The preceding statements create a new instance of the Person class and assign it to mPerson. The Using command then instructs the compiler to automatically call Dispose on this instance when execution leaves the block at End Using, even if an exception is thrown inside the block. The result is a much cleaner way to ensure that the IDisposable interface is called.
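To make the effect of Using concrete, the following sketch compares a Using block with a roughly equivalent hand-written Try...Finally block. The LoggedPerson class is a hypothetical stand-in for Person that records each step in a list, and the two function names are assumptions made for this example.

```vb
Imports System
Imports System.Collections.Generic

Module UsingExpansionDemo

  ' Hypothetical stand-in for the Person class; it records calls in a log.
  Public Class LoggedPerson
    Implements IDisposable

    Private ReadOnly m_log As List(Of String)

    Public Sub New(ByVal log As List(Of String))
      m_log = log
      m_log.Add("created")
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
      m_log.Add("disposed")
    End Sub
  End Class

  ' Cleanup is handled by the Using block.
  Public Function WithUsing() As List(Of String)
    Dim log As New List(Of String)()
    Using p As New LoggedPerson(log)
      log.Add("working")
    End Using
    Return log
  End Function

  ' Roughly what the Using block above amounts to when written by hand.
  Public Function WithTryFinally() As List(Of String)
    Dim log As New List(Of String)()
    Dim p As New LoggedPerson(log)
    Try
      log.Add("working")
    Finally
      If p IsNot Nothing Then
        p.Dispose()
      End If
    End Try
    Return log
  End Function

  Sub Main()
    Console.WriteLine(String.Join(", ", WithUsing().ToArray()))
    Console.WriteLine(String.Join(", ", WithTryFinally().ToArray()))
  End Sub
End Module
```

Both lines print "created, working, disposed". Because the cleanup sits in a Finally block, Dispose also runs when the code inside the block throws an exception.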

Faster Memory Allocation for Objects

The CLR introduces the concept of a managed heap. Objects are allocated on the managed heap, and the CLR is responsible for controlling access to these objects in a type-safe manner. One of the advantages of the managed heap is that memory allocations on it are very efficient. When unmanaged code allocates memory on the unmanaged heap, it typically scans through some sort of data structure in search of a free chunk of memory that is large enough to accommodate the allocation.

The managed heap maintains a reference to the end of the most recent heap allocation. When a new object needs to be created on the heap, the CLR allocates memory starting from the end of the heap, and then increments the reference to the end of heap allocations accordingly. Figure 2.5 illustrates a simplification of what takes place in the managed heap for .NET.

Figure 2.5 Memory Map State diagram

  • State 1: A compacted memory heap with a reference to the endpoint on the heap.
  • State 2: Object B, although no longer referenced, remains in its current memory location. The memory has not been freed and does not alter the allocation of memory or other objects on the heap.
  • State 3: Even though there is now a gap between the memory allocated for object A and object C, the memory allocation for object D still occurs on the top of the heap. The unused fragment of memory on the managed heap is ignored at allocation time.
  • State 4: After one or more allocations, before there is an allocation failure, the garbage collector runs. It reclaims the memory that was allocated to object B and repositions the remaining valid objects. This compacts the active objects to the bottom of the heap, creating more space for additional object allocations (refer to Figure 2.5).

This is where the power of the GC really shines. Before the CLR reaches a point where it is unable to allocate memory on the managed heap, the GC is invoked. The GC not only collects objects that are no longer referenced by the application, but also has a second task: compacting the heap. This is important, because if the GC only cleaned up objects, then the heap would become progressively more fragmented. When heap memory becomes fragmented, you can wind up with the common problem of having a memory allocation fail—not because there isn't enough free memory, but because there isn't enough free memory in a contiguous section of memory. Thus, not only does the GC reclaim the memory associated with objects that are no longer referenced, it also compacts the remaining objects. The GC effectively squeezes out all of the spaces between the remaining objects, freeing up a large section of managed heap for new object allocations.
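The four states can be mimicked with a toy model. This is only a sketch of the idea and not how the CLR is implemented: the ToyHeap and Block classes, the Referenced flag, and the block sizes are all invented for illustration. Allocation simply advances an end-of-heap pointer, and collection removes unreferenced blocks and slides the survivors down.

```vb
Imports System
Imports System.Collections.Generic

Module ToyHeapDemo

  ' Toy model only: a block has a name, an offset, a size, and a flag
  ' saying whether the application still references it.
  Public Class Block
    Public Name As String
    Public Offset As Integer
    Public Size As Integer
    Public Referenced As Boolean = True
  End Class

  Public Class ToyHeap
    Public ReadOnly Blocks As New List(Of Block)()
    Public NextFree As Integer = 0   ' The "end of heap" pointer.

    ' Allocation: place the block at NextFree and advance the pointer.
    Public Function Allocate(ByVal name As String, ByVal size As Integer) As Block
      Dim blk As New Block()
      blk.Name = name
      blk.Offset = NextFree
      blk.Size = size
      Blocks.Add(blk)
      NextFree += size
      Return blk
    End Function

    ' Collection: drop unreferenced blocks, then slide the survivors down.
    Public Sub CollectAndCompact()
      Blocks.RemoveAll(Function(candidate) Not candidate.Referenced)
      NextFree = 0
      For Each survivor As Block In Blocks
        survivor.Offset = NextFree
        NextFree += survivor.Size
      Next
    End Sub
  End Class

  Sub Main()
    Dim heap As New ToyHeap()
    heap.Allocate("A", 16)                         ' State 1
    Dim objB As Block = heap.Allocate("B", 32)
    heap.Allocate("C", 16)
    objB.Referenced = False                        ' State 2: B is garbage but still in place.
    Dim objD As Block = heap.Allocate("D", 8)      ' State 3: D goes on top; the gap is ignored.
    Console.WriteLine("D at {0}, next free {1}", objD.Offset, heap.NextFree)
    heap.CollectAndCompact()                       ' State 4: B is reclaimed, survivors move down.
    Console.WriteLine("D at {0}, next free {1}", objD.Offset, heap.NextFree)
  End Sub
End Module
```

With the sizes chosen here, the program prints "D at 64, next free 72" before the collection and "D at 32, next free 40" after it.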

Garbage Collector Optimizations

The GC uses a concept known as generations, the primary purpose of which is to improve its performance. The theory behind generations is that objects that have been recently created tend to have a higher probability of being garbage-collected than objects that have existed on the system for a longer time.

To understand generations, consider the analogy of a mall parking lot where cars represent objects created by the CLR. People have different shopping patterns when they visit the mall. Some people spend a good portion of their day in the mall, and others stop only long enough to pick up an item or two. Applying the theory of generations to trying to find an empty parking space for a car yields a scenario in which the highest probability of finding a parking space is a function of where other cars have recently parked. In other words, a space that was occupied recently is more likely to be held by someone who just needed to quickly pick up an item or two. The longer a car has been parked, the higher the probability that its owner is an all-day shopper and the lower the probability that the parking space will be freed up anytime soon.

Generations provide a means for the GC to identify recently created objects versus long-lived objects. An object's generation is basically a counter that indicates how many times it has successfully avoided garbage collection. An object's generation counter starts at zero and can have a maximum value of two, after which the object's generation remains at this value regardless of how many times it is checked for collection.

You can put this to the test with a simple Visual Basic application. From the File menu, select either File ⇒ New ⇒ Project, or open the sample from the code download. Select a console application, provide a name and directory for your new project, and click OK. Within the Main module, add the following code snippet (code file: ProVB_MemoryModule1.vb):

Module Module1
  Sub Main()
    Dim myObject As Object = New Object()
    Dim i As Integer
    For i = 0 To 3
      Console.WriteLine(String.Format("Generation = {0}", _
                        GC.GetGeneration(myObject)))
      GC.Collect()
      GC.WaitForPendingFinalizers()
    Next i
    Console.Read()
  End Sub
End Module

This code sends its output to the .NET console. For a Windows application, this console defaults to the Visual Studio Output window. When you run this code, it creates an instance of an object and then iterates through a loop four times. For each loop, it displays the current generation count of myObject and then calls the GC. The GC.WaitForPendingFinalizers method blocks execution until the finalizer thread has finished running any finalizers queued by the collection.

As shown in Figure 2.6, each time the GC was run, the generation counter was incremented for myObject, up to a maximum of 2.

Figure 2.6 Progression of generations

Each time the GC is run, the managed heap is compacted, and the reference to the end of the most recent memory allocation is updated. After compaction, objects of the same generation are grouped together. Generation-2 objects are grouped at the bottom of the managed heap, and generation-1 objects are grouped next. New generation-0 objects are placed on top of the existing allocations, so they are grouped together as well.

This is significant because recently allocated objects have a higher probability of having shorter lives. Because objects on the managed heap are ordered according to generations, the GC can opt to collect newer objects. Running the GC over a limited portion of the heap is quicker than running it over the entire managed heap.

It's also possible to invoke the GC with an overloaded version of the Collect method that accepts a generation number. The GC will then collect all objects no longer referenced by the application that belong to the specified (or younger) generation. The version of the Collect method that accepts no parameters collects objects that belong to all generations.
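A minimal sketch of the overloaded call follows. It uses GC.Collect(Integer), GC.CollectionCount, and GC.MaxGeneration from the .NET Framework; the module and function names are hypothetical. The counts it prints depend on the runtime and on what else the process has allocated, so no particular output should be expected.

```vb
Imports System

Module GenerationCollectDemo

  ' Forces a generation-0 collection and returns how many generation-0
  ' collections the runtime recorded while doing so.
  Public Function Gen0CollectionsAfterCollect() As Integer
    Dim before As Integer = GC.CollectionCount(0)
    GC.Collect(0)   ' Collect generation 0 only.
    Return GC.CollectionCount(0) - before
  End Function

  Sub Main()
    Console.WriteLine("Highest generation: {0}", GC.MaxGeneration)

    Dim gen2Before As Integer = GC.CollectionCount(2)
    Console.WriteLine("Gen 0 collections recorded: {0}", Gen0CollectionsAfterCollect())
    Console.WriteLine("Gen 2 collections recorded: {0}", _
                      GC.CollectionCount(2) - gen2Before)

    ' Collecting the highest generation also collects every younger one,
    ' which matches the reach of the parameterless GC.Collect().
    GC.Collect(GC.MaxGeneration)
    Console.WriteLine("Gen 2 collections recorded: {0}", _
                      GC.CollectionCount(2) - gen2Before)
  End Sub
End Module
```

As with the earlier example, forcing collections like this is useful for experimentation but is not something a typical application should do.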

Another hidden GC optimization results from the fact that a reference to an object may implicitly go out of scope, at which point the object can be collected by the GC. This optimization is difficult to illustrate, and it applies only if there are no additional references to the object and the object does not have a finalizer. However, if an object is declared and used at the top of a method and not referenced again in that method, then in a release build the metadata will indicate that the variable is not referenced in the later portion of the code. Once the last reference to the object is made, its logical scope ends; and if the garbage collector runs, the memory for that object, which will no longer be referenced, can be reclaimed before it has gone out of its physical scope.
