Garbage collection, and memory management in general, will be the first and last things you work on. It is the apparent source of the most obvious performance problems, those that are quickest to fix, and something that you will need to monitor constantly to keep in check. I say “apparent source” because, as we will see, many problems are actually due to an incorrect understanding of the garbage collector’s behavior and expectations. You need to think about memory performance just as much as CPU performance. This is also true for unmanaged code performance, but in .NET it is a little more prominent, as well as easier to deal with. It is so fundamental to smooth .NET operation that the most significant chunk of this book’s content is dedicated to just this topic.
Many people get very nervous when they think of the overhead garbage collection can cause. Once you understand it, though, it becomes straightforward to optimize your program for its operation. In the Introduction, you saw that the garbage collector can actually give you better overall heap performance in many cases because it deals with allocation and fragmentation better. In many ways, .NET’s memory management strategy, including the garbage collector, can actually be a benefit to your application, not a drawback.
I am covering garbage collection at the beginning of the book because so many of the concepts that come later will relate back to this chapter. Understanding the effect your program has on the garbage collector is so fundamental to achieving good performance, that it affects nearly everything else.
There are significant differences between how typical native heaps work and how the CLR’s garbage-collected heaps work. The native heap in Windows maintains free lists to know where to put new allocations. Many long-running native code applications struggle with fragmentation. Time spent in memory allocation gradually increases as the allocator spends more and more time traversing the free lists looking for an open spot. Memory use continues to grow and, inevitably, the process will need to be restarted to begin the cycle anew. Some native programs deal with this by replacing the default implementation of malloc with custom allocation schemes that work hard to reduce this fragmentation. Windows also provides low-fragmentation heaps, which the CLR uses internally.
In .NET, memory allocation is trivial because it usually happens at the end of a memory segment and is not much more than a few instructions, such as additions, decrements, and a comparison in the normal case. In these simple cases, there are no free lists to traverse and little possibility of fragmentation. GC heaps can actually be more efficient because objects allocated together in time tend to be near one another on the heap, improving locality.
In the default allocation path, a small code stub will check the desired object’s size against the space remaining in a small allocation buffer. As long as the allocation fits, it is extremely fast and has no contention. Once the allocation buffer is exhausted, the GC allocator will take over and find a spot for the object (this may involve the use of free lists). Then a new allocation buffer will be reserved for future allocation requests.
The assembly code for this process is only a handful of instructions and useful to examine.
The C# to demonstrate this is just a simple allocation:
class MyObject
{
    int x;
    int y;
    int z;
}

static void Main(string[] args)
{
    var x = new MyObject();
}
Here is the breakdown of the calling code for the allocation:
; Copy method table pointer for the class into
; ecx as argument to new()
; You can use !dumpmt to examine this value.
mov ecx,3F3838h
; Call new
call 003e2100
; Copy return value (address of object) into a register
mov edi,eax
Here is the actual allocation:
; NOTE: Most code addresses removed for formatting reasons.
;
; Set eax to value 0x14, the size of the object to
; allocate, which comes from the method table
mov eax,dword ptr [ecx+4] ds:002b:003f383c=00000014
; Put allocation buffer information into edx
mov edx,dword ptr fs:[0E30h]
; edx+40 contains the address of the next available byte
; for allocation. Add that value to the desired size.
add eax,dword ptr [edx+40h]
; Compare the intended allocation against the
; end of the allocation buffer.
cmp eax,dword ptr [edx+44h]
; If we spill over the allocation buffer,
; jump to the slow path
ja 003e211b
; update the pointer to the next free
; byte (0x14 bytes past old value)
mov dword ptr [edx+40h],eax
; Subtract the object size from the pointer to
; get to the start of the new obj
sub eax,dword ptr [ecx+4]
; Put the method table pointer into the
; first 4 bytes of the object.
; eax now points to new object
mov dword ptr [eax],ecx
; Return to caller
ret
; Slow Path - call into CLR method
003e211b jmp clr!JIT_New (71763534)
In summary, this involves one direct method call and only nine instructions in the helper stub. That is hard to beat.
If you are using certain configuration options, such as server GC, then there is not even contention on the fast or the slow allocation path, because there is a heap for every processor. .NET trades simplicity in the allocation path for more complexity during de-allocation, but you do not have to deal with this complexity directly. You just need to learn how to optimize for it, which is what this chapter will teach you.
There are some ways to force the allocator to go down the slow path, however. If the allocation buffer is not large enough or the end of the segment has been reached, then the slow path will be called. In addition, if the object being allocated implements a finalizer, then the garbage collector needs to do more bookkeeping to track the object’s lifetime, thus it will call the slow path as well.
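To see the finalizer effect for yourself, here is a small benchmark sketch (the type names and iteration count are arbitrary choices of mine, not from any framework) comparing allocation of a plain object against one with a finalizer, which must take the slow path so the GC can register it for finalization:

```csharp
using System;
using System.Diagnostics;

class Plain { public int X; }
class Finalizable { public int X; ~Finalizable() { } }

class AllocationCost
{
    static void Main()
    {
        const int Count = 1_000_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Count; i++)
        {
            var o = new Plain();  // fast path: bump the allocation pointer
        }
        Console.WriteLine($"plain:       {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < Count; i++)
        {
            // slow path: the GC must also put the object
            // on the finalization queue
            var o = new Finalizable();
        }
        Console.WriteLine($"finalizable: {sw.ElapsedMilliseconds} ms");
    }
}
```

On most machines the finalizable loop is noticeably slower, and beyond the allocation cost itself, each object also stays alive until its finalizer runs, adding GC pressure.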
The details of how the garbage collector makes decisions are continually being refined, especially as .NET becomes more prevalent in high-performance systems. The following explanation may contain details that will change in upcoming .NET versions, but the overall picture is unlikely to change drastically in the near future.
In a managed process, there are two types of heaps: unmanaged and managed. Unmanaged heaps are allocated with the VirtualAlloc Windows API and used by the operating system and CLR for unmanaged memory such as that for the Windows API, OS data structures, and even much of the CLR itself. The CLR allocates all managed .NET objects on the managed heap, also called the GC heap, because the objects on it are subject to garbage collection.
The managed heap is further divided into two types of heaps: the small object heap and the large object heap (LOH). Each one is assigned its own segments, which are blocks of memory belonging to that heap. Both the small object heap and the large object heap can have multiple segments assigned to them. The size of each segment can vary depending on your configuration and hardware platform.
| Configuration | 32-bit Segment Size | 64-bit Segment Size |
|---|---|---|
| Workstation GC | 16 MB | 256 MB |
| Server GC | 64 MB | 4 GB |
| Server GC with > 4 logical processors | 32 MB | 2 GB |
| Server GC with > 8 logical processors | 16 MB | 1 GB |
The small object heap segments are further divided into generations. There are three generations, referenced casually as gen 0, gen 1, and gen 2. Gen 0 and gen 1 are always in the same segment, but gen 2 can span multiple segments, as can the large object heap. The segment that contains gen 0 and gen 1 is called the ephemeral segment.
To start with, the small object heap is made up of one segment and the large object heap is another segment. Gen 2 and gen 1 start off at only a few bytes in size because they are empty so far.
Objects allocated on the small object heap pass through a lifetime process that needs some explanation. The CLR allocates all objects that are less than 85,000 bytes in size on the small object heap. They are always allocated in gen 0, usually at the end of the current used space. This is why allocations in .NET are extremely fast, as seen at the beginning of this chapter. If the fast allocation path fails, then the objects may be placed anywhere they can fit inside gen 0’s boundaries. If it will not fit in an existing spot, then the allocator will expand the current boundaries of gen 0 to accommodate the new object. This expansion occurs at the end of the used space towards the end of the segment. If this pushes past the end of the segment, it may trigger a garbage collection. The existing gen 1 space is untouched.
For small objects (less than 85,000 bytes), objects always begin their life in gen 0. As long as they are still alive, the GC will promote them to subsequent generations each time a collection happens. Garbage collections of gen 0 and gen 1 are sometimes called ephemeral collections.
When a garbage collection occurs, a compaction may occur, in which case the GC physically moves the objects to a new location to free space in the segment. If no compaction occurs, the boundaries are merely redrawn: the individual objects have not moved, but the boundary lines have.
Compaction may occur in the collection of any generation and this is a relatively expensive process because the GC must fix up all of the references to those objects so they point to the new location, which may require pausing all managed threads. Because of this expense, the garbage collector will only do compaction when it is productive to do so, based on some internal metrics.
Once an object reaches gen 2, it remains there for the remainder of its lifetime. This does not mean that gen 2 grows forever—if the objects in gen 2 finally die off and an entire segment has no live objects, then the garbage collector can return the segment to the operating system or it can just hold on to it for future use. Process working set memory is not guaranteed to drop during a collection.
So what does alive mean? If the GC can reach the object via any of the known GC roots, following the graph of object references, then it is alive. Roots include your program’s static variables, thread stacks (which hold references to the local variables of all running methods), strong GC handles (such as pinned handles), and the finalizer queue. Note that you may have objects that no longer have any roots, but if those objects are in gen 2, a gen 0 collection will not clean them up. They will have to wait for a full collection.
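A minimal sketch of reachability in action (assuming a release build, where the JIT does not artificially extend local lifetimes): an object referenced only through a WeakReference has no root, while one referenced from a static field does:

```csharp
using System;
using System.Runtime.CompilerServices;

class RootsDemo
{
    static object staticRoot;   // static fields are GC roots

    // NoInlining keeps the temporary array's lifetime contained
    // in this method, so no stack slot in Main roots it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference AllocateUnrooted()
    {
        return new WeakReference(new byte[1000]);
    }

    static void Main()
    {
        staticRoot = new byte[1000];           // rooted: survives collections
        WeakReference weak = AllocateUnrooted();

        GC.Collect();                          // full blocking collection

        // The unrooted array is typically gone; the rooted one never is.
        Console.WriteLine($"unrooted alive: {weak.IsAlive}");
        Console.WriteLine($"rooted alive:   {staticRoot != null}");
    }
}
```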
If gen 0 ever starts to fill up a segment and a collection cannot compact it enough, then the GC will allocate a new segment. The new segment will house a new gen 1 and gen 0 while the old segment is converted to gen 2. Everything from the old generation 0 becomes part of the new generation 1 and the old generation 1 is likewise promoted to generation 2 (which conveniently does not have to be copied).
If gen 2 continues to grow, then it can span multiple segments. The LOH can also span multiple segments. Regardless of how many segments there are, generations 0 and 1 will always exist in the same segment. This knowledge of segments will come in handy later when we are trying to figure out which objects live where on the heap.
The large object heap obeys different rules. Any object that is at least 85,000 bytes in size is allocated on the LOH automatically and does not pass through the generational model—put another way, it is allocated directly in gen 2. The only types of objects that normally exceed this size are arrays and strings. For performance reasons, the LOH is not automatically compacted during collection and is thus easily susceptible to fragmentation. However, starting in .NET 4.5.1, you can compact it on-demand. Like gen 2, if memory in the LOH is no longer needed, then it can be reclaimed for other portions of the heap, but we will see later that ideally you do not want memory on the large object heap to be garbage collected at all.
In the LOH, the garbage collector always uses a free list to determine where to best place allocated objects. We will explore some techniques in this chapter to reduce fragmentation on this heap.
Note If you go poking around at the objects in the LOH in a debugger, you will notice that not only can the entire heap be smaller than 85,000 bytes in size, but that it can also have objects that are smaller than that size allocated on that heap. These objects are usually allocated by the CLR and you can ignore them.
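You can observe the 85,000-byte threshold directly with GC.GetGeneration; a minimal sketch, with array sizes chosen to straddle the threshold:

```csharp
using System;

class LohThresholdDemo
{
    static void Main()
    {
        var small = new byte[84000];  // below the threshold: small object heap, gen 0
        var large = new byte[85000];  // at the threshold: large object heap

        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2 (LOH reports as gen 2)
    }
}
```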
A garbage collection runs for a specific generation and all generations below it. If it collects gen 1, it will also collect gen 0. If it collects gen 2, then all generations are collected, and the large object heap is collected as well. When a gen 0 or gen 1 collection occurs, the program is paused for the duration of the collection. For a gen 2 collection, portions of the collection can occur on a background thread, depending on the configuration options.
There are four phases to a garbage collection:
1. Suspension—All managed threads are forced to pause before a collection can occur. The CLR suspends each managed thread at a safe point, such as a ret instruction. Native threads are not suspended and will keep running unless they transition into managed code, at which point they too will be suspended. If you have a lot of threads, a significant portion of garbage collection time can be spent just suspending them.
2. Mark—Starting from each root, the GC traverses the object graph and marks every reachable object as live. The mark phase does not actually need to touch every object on the heap; it will only go through the target portion of the heap. For example, a gen 0 collection considers objects only from gen 0, a gen 1 collection will mark objects in both gen 0 and gen 1, and a gen 2, or full, collection will need to traverse every live object in the heap, making it potentially very expensive.
3. Compact—Live objects are relocated next to one another and all references to them are updated. This phase is optional and does not happen in every collection.
4. Resume—The managed threads are allowed to run again.
An additional wrinkle here is that an object in a higher generation may be a root for an object in a lower generation. To track objects across all generations, the GC uses a card table that summarizes the heap with an array of bits that each represent a heap range. The bit is set “dirty” on a memory write in the corresponding range. When a collection happens, the GC will also consider any objects located in a dirty region as roots. This enables the GC to traverse only a subset of objects in the higher generation and it is not as expensive as a full collection for that generation.
There are a couple of important consequences to the behavior described above.
First, the time it takes to do a garbage collection is almost entirely dependent on the number of live objects in the collected generation, not the number of objects you allocated. This means that if you allocate a tree of a million objects, as long as you cut off that root reference before the next GC, those million objects contribute nothing to the amount of time the GC takes.
Second, the frequency of a garbage collection is primarily determined by how much memory is allocated in a specific generation. Once that amount passes an internal threshold, a GC will happen for that generation. The threshold continually changes and the GC adapts to your process’s behavior. If doing a collection on a particular generation is productive (it promotes many objects), then it will happen more frequently, and the converse is true. Another trigger for GCs is the total available memory on a machine, independent of your application. If available memory drops below a certain threshold, garbage collection may happen more frequently in an attempt to reduce the overall heap size.
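You can watch this behavior with GC.CollectionCount. The following sketch (allocation size and count are arbitrary) churns through short-lived garbage and reports how many collections each generation ran; typically gen 0 runs many times while gen 2 runs rarely or never, exactly because nothing survives:

```csharp
using System;

class CollectionCountDemo
{
    static void Main()
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        // Roughly 100 MB of garbage; nothing survives past each iteration.
        for (int i = 0; i < 1_000_000; i++)
        {
            var temp = new byte[100];
        }

        Console.WriteLine($"gen 0: {GC.CollectionCount(0) - gen0}");
        Console.WriteLine($"gen 1: {GC.CollectionCount(1) - gen1}");
        Console.WriteLine($"gen 2: {GC.CollectionCount(2) - gen2}");
    }
}
```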
From this description, it may feel like garbage collections are out of your control. This could not be farther from the truth! You can usually manipulate GC behavior by controlling your memory allocation patterns. Doing so requires an understanding of how the GC works, your allocation rate, how well you control object lifetimes, and what configuration options are available to you. Let’s take a closer look at those configuration options next.
Now that we have seen how it works conceptually, you can examine the heap layout in more detail for yourself in a debugger session via the !eeheap -gc command.
The .NET Framework does not give you very many ways to configure the garbage collector out of the box. It is best to think of this as “less rope to hang yourself with.” For the most part, the garbage collector configures and tunes itself based on your hardware configuration, available resources, and application behavior. The few options that are provided control very high-level behaviors, and the right choices are mainly determined by the type of program you are developing.
The most important choice you have is whether to use workstation or server garbage collection.
Workstation GC is the default. In this mode, all GCs happen on the same thread that triggered the collection and run at the same priority. For simple apps, especially those that run on interactive workstations where many managed processes run, this makes the most sense. For computers with a single processor, this is the only option and trying to configure anything else will not have any effect.
Server GC creates a dedicated thread for each logical processor or core. These threads run at the highest priority (THREAD_PRIORITY_HIGHEST), but are kept suspended until a GC is required. All garbage collections happen on these threads, not the application’s threads. After the GC, they sleep again.
In addition, the CLR creates a separate heap for each processor. Within each processor heap, there is a small object heap and a large object heap. From your application’s perspective, this is all logically the same heap—your code does not know which heap objects belong to and object references exist between all the heaps (they all share the same virtual address space).
Having multiple heaps gives a couple of advantages:
- Garbage collections happen in parallel, with each GC thread collecting one of the heaps.
- Allocations can be faster, especially on the large object heap, because each processor allocates from its own heap without contending with the others.
There are other internal differences as well, such as larger segment sizes, which can mean a longer time between garbage collections.
You configure server GC in the app.config file inside the <runtime> element:
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
Should you use workstation or server GC? If your app is running on a multi-processor machine dedicated to just your application, then the choice is clear: server GC. It will provide the lowest latency collection in most situations. However, server GC also means a much higher working set which means you will get closer to physical memory limits. With more objects in memory, garbage collections may start taking longer, eating away at the advantage.
On the other hand, if you need to share the machine with multiple managed processes, the choice is not so clear. Server GC creates many high-priority threads, and if multiple apps do that, they can all negatively affect one another with conflicting thread scheduling. In this case, it might be better to use workstation GC.
If you really want to use server GC in multiple applications on the same machine, another option is to affinitize the competing applications to specific processors. The CLR will create heaps only for the processors which are enabled for that application.
Whichever one you pick, most of the tips in this book apply to both types of collection.
Background GC changes how the garbage collector performs gen 2 collections by allowing it to occur more often in the background while other threads are executing. Gen 0 and gen 1 collections remain foreground GCs that block all application threads from executing.
Background GC works by having a dedicated thread for garbage collecting generation 2. For server GC there will be an additional thread per logical processor, in addition to the one already created for server GC in the first place. Yes, this means if you use server GC and background GC, you will have two threads per processor dedicated to GC, but this is not particularly concerning. It is not a big deal for processes to have many threads, especially when most of them are doing nothing most of the time. One thread is for foreground GC and runs at highest priority, but it is suspended most of the time. The thread for background GC runs at a lower priority concurrently with your application’s threads and will be suspended when the foreground GC threads become active, so that you do not have competing GC modes occurring simultaneously.
If you are using workstation GC, then background GC is always enabled. Starting with .NET 4.5, it is enabled on server GC by default, but you do have the ability to turn it off.
This configuration will turn off the background GC:
<configuration>
  <runtime>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
In practice, there should rarely ever be a reason to disable background GC. It will usually cause worse performance and more frequent foreground collections. If you want to prevent these background GC threads from ever taking CPU time from your application, but do not mind a potential increase in full, blocking GC latency or frequency, then you can turn this off. You should measure the impact carefully.
The garbage collector has a number of latency modes, most of them accessed via the GCSettings.LatencyMode property. The mode should rarely be changed, but the options can be useful at times.
Interactive is the default GC latency mode when concurrent garbage collection is enabled (which it is by default). This mode allows collections to run in the background.
Batch mode disables all concurrent garbage collection and forces collections to occur in a single batch. It is intrusive because it forces your program to stop completely during all GCs. It should not regularly be used, especially in programs with a user interface.
There are two low-latency modes you can use for a limited time. If you have periods of time that require critical performance, you can tell the GC not to perform expensive gen 2 collections.
- LowLatency: For workstation GC only; it suppresses gen 2 collections.
- SustainedLowLatency: For workstation and server GC; it suppresses full blocking gen 2 collections, but allows background gen 2 collections. You must enable background GC for this option to take effect.
Both modes will greatly increase the size of the managed heap because compaction will not occur. If your process uses a lot of memory, you should avoid this feature.
Right before entering one of these modes, it is a good idea to force a last full GC by calling GC.Collect(2, GCCollectionMode.Forced). Once your code leaves this mode, do another full GC.
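That enter/leave pattern can be sketched as a helper. This is illustrative only; the RunLowLatencySection method is my own name, not a framework API:

```csharp
using System;
using System.Runtime;

static class LatencyHelper
{
    public static void RunLowLatencySection(Action criticalWork)
    {
        GCLatencyMode oldMode = GCSettings.LatencyMode;

        // Clean the heap before entering the low-latency window...
        GC.Collect(2, GCCollectionMode.Forced);
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        try
        {
            criticalWork();
        }
        finally
        {
            // ...and restore the mode and collect again afterward.
            GCSettings.LatencyMode = oldMode;
            GC.Collect(2, GCCollectionMode.Forced);
        }
    }
}
```

The finally block guarantees the mode is restored even if the critical work throws.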
You should never use either of the low-latency modes by default. They are designed for applications that must run without serious interruptions for a long time, but not 100% of the time. A good example is stock trading: during market hours, you do not want full garbage collections happening. When the market closes, you turn this mode off and perform full GCs until the market reopens.
Only turn on a low-latency mode if all of the following criteria apply:
Finally, starting in .NET 4.6, you can declare regions where garbage collections are disallowed, using the NoGCRegion mode. This attempts to put the GC in a mode where it will not allow a collection to happen at all. This mode cannot be set via the GCSettings.LatencyMode property, however. Instead, you must use the TryStartNoGCRegion method.
There are some significant caveats:
There are a number of overloads of TryStartNoGCRegion, but the following example demonstrates the one with all of the options:
bool success = GC.TryStartNoGCRegion(
    totalSize: 2000000,
    lohSize: 1000000,
    disallowFullBlockingGC: true);
if (success)
{
    try
    {
        // do allocations
    }
    finally
    {
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
        {
            GC.EndNoGCRegion();
        }
    }
}
The totalSize parameter is the total number of bytes that you expect to allocate in the region. The lohSize parameter indicates how many of those bytes you expect to be on the large object heap. The difference between totalSize and lohSize is the amount you expect to allocate on the ephemeral segment, and it must be less than or equal to the ephemeral segment size (whose size was given at the beginning of this chapter). By default, if the CLR cannot allocate the requested memory, it will do a full blocking GC to attempt to free some space; the disallowFullBlockingGC parameter disables this behavior.
You should only call EndNoGCRegion if the previous call to TryStartNoGCRegion succeeded. You cannot nest calls to TryStartNoGCRegion.
If your memory allocations go over the amount you reserved, the guarantee is no longer honored and a garbage collection could happen.
Note The low-latency or no-GC modes are not absolute guarantees. If the system is running low on memory and the garbage collector has the choice between doing a full collection or throwing an OutOfMemoryException, it will perform a full collection regardless of your mode setting.
Alternative latency modes are rarely used and you should think twice about using them because of the potential unintended consequences. If you think it is useful, perform careful measurement to make sure. Tweaking the latency mode may cause other performance problems such as having more ephemeral collections (gen 0 and 1) in an attempt to deal with the lack of full collections. You may just trade one set of problems for another.
By default, arrays are limited to both UInt32.MaxValue in number of elements and 2 GB in actual size. Using a configuration option, you can allow larger array sizes, but the maximum number of elements remains the same.
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
This allows 64-bit processes to have arrays that span more than 2 GB in size. However, the maximum number of elements in an array remains UInt32.MaxValue (4,294,967,295).

Some GC options must be configured before the process starts because they are required during CLR initialization. In general, these settings will very rarely be necessary and you should strongly consider whether you need them.
These settings are configured via environment variables which are set on the command-line before you launch the process (which will receive a copy of the current environment).
In server GC, there is a heap and at least one thread created for each processor. There may be times when you want to use fewer processors for GC, perhaps in tandem with changing the application’s processor affinity mask.
// Limit to using the first 16 processors
Process currentProcess = Process.GetCurrentProcess();
long mask = (long)currentProcess.ProcessorAffinity;
mask &= 0xFFFF;
currentProcess.ProcessorAffinity = (IntPtr)mask;
If the application is launched with processor affinity already applied, then server GC will automatically restrict the number of heaps and threads it creates for garbage collecting.
However, this limits the number of processors the application can use for general work as well. If you want your application to use all of the processors for its own work, but only run GC on a subset of those processors, you need to set the GCHeapCount variable, which was introduced to CoreCLR in mid-2016 and to .NET Framework 4.7.
SET COMPLUS_GCHeapCount=<n>
This option is only valid when using server GC. Replace <n> with a number less than the number of logical processors in use.
You may want to use this if you need the benefits of server GC, but need to limit the amount of CPU used during GCs. Because server GCs run at a high priority, having a thread per core will stall all other processes on the machine. Usually, this is by design and there is an assumption that a server GC app “owns” the machine, but this option is there if you want to free up some processors. For example, you may have a 64-processor server and you want the parallelism and dedicated, fast GC threads that come with server GC, but 64 heaps may be overkill if you need to be more frugal and ensure other processes do not starve during GCs. In addition, you will lessen the amount of memory overhead if your total memory requirements are more modest.
In normal circumstances, each server GC thread is affinitized to run on a specific logical processor. This means that during a GC, it is a virtual guarantee that the GC thread will take over the processor as the highest-priority running thread.
With the following setting, you can turn off affinitization, which will allow GC threads to run on any available processor. This will ensure that the server GC process will cooperate better with other processes.
SET COMPLUS_GCNoAffinitize=1
This setting is designed to work well with COMPLUS_GCHeapCount when you are improving the cooperation between your server GC application and other processes on the machine.
By turning this on, you are explicitly stating that you want more cooperation and less exclusivity. This means there is no chance that this setting improves your application’s performance, but it might improve your overall system performance.
When optimizing code to achieve the highest levels of performance, it is unfortunately common to take shortcuts that can lead to bugs such as corrupting program state or even the heap structure itself. Heap corruption in .NET applications is almost always the result of buggy unmanaged code in the same process. However, it is still possible in managed-only applications, where it can indicate a bug within the CLR itself. When this happens, it can be extremely hard to debug because the crash will not happen at a deterministic place.
You can use the !VerifyHeap command to verify the heap within the debugger.
0:006> !VerifyHeap
object 04b05980: bad member 00000066 at 04B05984
Last good object: 04B057E4.
0:006> !do 04B057E4
Name: System.Int32[]
MethodTable: 62281938
EEClass: 61e09600
Size: 412(0x19c) bytes
Array: Rank 1, Number of elements 100, Type Int32
Fields:
None
Also, it can be tricky to deliberately get the heap into a state where problems manifest reliably. The heap can be in an in-between state while a GC is happening so you need to take care to ensure you validate the heap only outside of a GC.
Thankfully, there is an easy way to do this outside of the debugger. You can turn on an option to cause the heap to be verified before and after every GC.
SET COMPLUS_HeapVerify=1
Turning on heap verification will cause performance to suffer as each GC will now force the heap to be validated, a process which will take longer depending on the size of your heap. If corruption is detected, an exception will be thrown and the process will be terminated.
This almost goes without saying, but if you reduce the amount of memory you allocate, you reduce the pressure on the garbage collector. You can reduce memory fragmentation and CPU usage as well. It can take some creativity to achieve this goal, and it might conflict with other design goals.
Critically examine each object and ask yourself whether you really need it at all and whether you can reduce its size (convert an Int64 to an Int32, for example).

Story In a server that handled user requests, we found out that one type of common request caused more memory to be allocated than the size of a heap segment. Since the CLR caps the maximum size of segments and gen 0 must exist in a single segment, we were guaranteed a GC on every single request. This is not a good spot to be in because there are few options besides reducing memory allocations.
There is one fundamental rule for high-performance programming with regard to the garbage collector. In fact, the garbage collector was explicitly designed with this idea in mind:
Collect objects in gen 0 or not at all.
Put differently, you want objects to have an extremely short lifetime so that the garbage collector will never touch them at all, or, if you cannot do that, they should go to gen 2 as fast as possible and stay there forever, never to be collected. This means that you maintain a reference to long-lived objects forever. Often, this also means pooling reusable objects, especially anything on the large object heap.
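As a sketch of that pooling idea (the BufferPool class here is illustrative, not a framework type), large buffers are allocated once, migrate to gen 2 or the LOH, and are reused rather than ever becoming garbage:

```csharp
using System.Collections.Concurrent;

// Minimal buffer pool: large arrays are allocated once and reused,
// so they never become garbage for the GC to collect.
class BufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        _bufferSize = bufferSize;
    }

    public byte[] Rent()
    {
        // Reuse a pooled buffer if one is available; otherwise allocate.
        return _buffers.TryTake(out byte[] buffer) ? buffer : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        // Only pool buffers of the expected size.
        if (buffer != null && buffer.Length == _bufferSize)
        {
            _buffers.Add(buffer);
        }
    }
}
```

A pool like this requires discipline: forgetting to return buffers turns it back into plain allocation, and returning a buffer that is still in use elsewhere is a subtle bug.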
Garbage collections get more expensive with each generation. You want to ensure there are many gen 0/1 collections and very few gen 2 collections. Even with background GC for gen 2, there is still a CPU cost that you would rather not pay: a processor that the rest of your program could otherwise be using.
Note You may have heard the myth that you should have 10 gen 0 collections for each gen 1 collection and 10 gen 1 collections for each gen 2 collection. This is not true. Just understand that you want to have lots of fast gen 0 collections and very few of the expensive gen 2 collections.
You want to avoid objects being promoted to gen 1 because those that are will tend to also be promoted to gen 2 in due course. Gen 1 is a sort of buffer before you get to gen 2.
Ideally, every object you allocate goes out of scope by the time the next gen 0 collection comes around. You can measure how long that interval is and compare it to the duration that data is alive in your application. See the end of the chapter for how to use tools to discover this information.
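As a rough sketch of how you might measure this without external tools (the helper name here is hypothetical), you can sample GC.CollectionCount(0) around a representative workload to estimate the average time between gen 0 collections:

```csharp
using System;
using System.Diagnostics;

public static class Gen0IntervalEstimator
{
    // Estimates the average time between gen 0 collections by sampling
    // GC.CollectionCount(0) before and after a representative workload.
    public static double EstimateGen0IntervalMs(Action workload)
    {
        int before = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        workload();
        sw.Stop();
        int collections = GC.CollectionCount(0) - before;

        // If no gen 0 GC happened, the interval is at least the window.
        return collections == 0
            ? sw.Elapsed.TotalMilliseconds
            : sw.Elapsed.TotalMilliseconds / collections;
    }

    public static void Main()
    {
        double interval = EstimateGen0IntervalMs(() =>
        {
            for (int i = 0; i < 1000000; i++)
            {
                var temp = new byte[1024]; // short-lived garbage
                GC.KeepAlive(temp);
            }
        });
        Console.WriteLine($"Approximate gen 0 interval: {interval:F3} ms");
    }
}
```

Compare the measured interval to how long your objects actually stay referenced; if objects routinely outlive it, they are being promoted.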
Obeying this rule requires a fundamental shift in your mindset if you are not used to it. It will inform nearly every aspect of your application, so get used to it early and think about it often.
The shorter an object’s lifetime, the less chance it has of being promoted to the next generation when a GC comes along. In general, you should not allocate objects until right before you need them. The exception would be when the cost of object creation is so high it makes sense to create them at an earlier point when it will not interfere with other processing.
On the other side of the object use, you want to make sure that objects go out of scope as soon as possible. For local variables, this can be after the last local usage, even before the end of the method. You can lexically scope it narrower by using the { }
brackets, but this will probably not make a practical difference because the compiler will generally recognize when a local object is no longer used anyway. If your code spreads out operations on an object, try to reduce the time between the first and last uses so that the GC can collect the object as early as possible.
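As a small illustration of this advice (the method and its logic are hypothetical), note how the temporary object is allocated right before its first use and goes out of scope immediately after its last use:

```csharp
using System;
using System.Text;

public static class ReportGenerator
{
    // The StringBuilder is allocated right before its first use and is
    // unreferenced immediately after its last use, giving it the best
    // chance of dying in a gen 0 collection.
    public static string BuildReport(int[] data)
    {
        long sum = 0;
        foreach (int value in data)
        {
            sum += value; // work that does not need the builder yet
        }

        // Narrow lexical scope: after this block the builder is no
        // longer referenced anywhere.
        {
            var builder = new StringBuilder(256);
            builder.Append("Report: sum=").Append(sum);
            return builder.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildReport(new[] { 1, 2, 3 })); // Report: sum=6
    }
}
```

As the text notes, the explicit braces rarely change what the compiler already infers, but keeping first and last uses close together is what matters.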
Rarely, you may find a need to explicitly null
out a reference to a temporary object if it is a member or static field on a long-lived object. You would do this only if you want to prevent this object from being promoted by the garbage collector. First, try to change the design to make the reference a local variable where object life time is not as much an issue. If you decide to null
out a field, this may make the code slightly more complicated because you will have more checks for null
values scattered around. This can also create a tension between efficiency and always having full state available, particularly for debugging. One option to get around that problem is to convert the object you want to null
out to another form. For example, serialize an XML document hierarchy to a string, or a temporary state object to a log message that can more efficiently record the state for debugging later. This technique is usually only necessary for large, temporary object graphs that are in fields for convenience purposes.
Another way to manage this balance is to have variable behavior: run your program (or a specific portion of your program, say for a specific request) in a mode that does not null
out references but keeps them around as long as possible for easier debugging.
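A minimal sketch of the "convert then null out" idea described above (class, fields, and summary format are all hypothetical):

```csharp
using System;
using System.Collections.Generic;

// A long-lived object holding a large temporary graph in a field. Rather
// than letting the graph get promoted with its owner, we convert it to a
// compact summary string for debugging and then null the field.
class RequestProcessor
{
    private List<string> tempWorkingSet;  // large, temporary
    private string workingSetSummary;     // compact, kept for debugging

    public void Process()
    {
        tempWorkingSet = new List<string> { "itemA", "itemB", "itemC" };
        // ... use tempWorkingSet ...

        // Record the state cheaply, then release the big graph so it
        // can die in gen 0 instead of being promoted.
        workingSetSummary =
            $"{tempWorkingSet.Count} items, last={tempWorkingSet[tempWorkingSet.Count - 1]}";
        tempWorkingSet = null;
    }

    public string Summary => workingSetSummary;
}

class Program
{
    static void Main()
    {
        var p = new RequestProcessor();
        p.Process();
        Console.WriteLine(p.Summary); // 3 items, last=itemC
    }
}
```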
As described at the beginning of this chapter, the GC works by following object references. In server GC, it does this on multiple threads at once. You want to exploit parallelism as much as possible, and if one thread hits a very long chain of nested objects, the entire garbage collection process will not finish until that long-running thread is complete. In addition, if a particular thread allocates more memory than others, it will trigger a GC more often than if the same allocations were spread across multiple heaps.
Thankfully, there are load-balancing algorithms. For allocations, when the GC detects that heaps are becoming unbalanced, it will start forcing allocations to occur on different heaps. This functionality has existed for the small object heap for many CLR versions, but balancing the large object heap has only happened since version 4.5. On the collection side, cores that run out of collection work can steal work from other heaps.
Problems with unbalanced heaps are less common now with these GC features, but if you observe too-frequent or overly long GC pauses, it may be worth examining your code for deep object trees or a thread bias in allocations.
If you do find that a single thread is responsible for most of the allocations, investigate ways to spread this responsibility around. Ensure that you are using Task
objects or the thread pool to even out the possibility of different threads handling different requests. Avoid the pattern of a single thread processing a queue of requests and doing the bulk of allocations before handing off the work to other threads to finish processing.
Objects that have many references to other objects will take more time for the garbage collector to traverse. A long GC pause time is often an indication of a large, complex object graph.
Another danger is that it becomes much harder to predict object lifetimes if you cannot easily determine all of the possible references to them. Reducing this complexity is a worthy goal just for sane code practices, but it also makes debugging and fixing performance problems easier.
Also, be aware that references between objects of different generations can cause inefficiencies in the garbage collector, specifically references from older objects to newer objects. For example, if an object in generation 2 has a reference to an object in generation 0, then every time a gen 0 GC occurs, a portion of gen 2 objects will also have to be scanned to see if they are still holding onto this reference to a generation 0 object. It is not as expensive as a full GC, but it is still unnecessary work if you can avoid it.
Pinning an object fixes it in place so that the garbage collector cannot move it. Pinning exists so that you can safely pass managed memory references to unmanaged code. It is most commonly used to pass arrays or strings to unmanaged code, but is also used to gain direct fixed
memory access to data structures or fields. If you are not doing interop with unmanaged code and you do not have any unsafe
code, then you should not have the need to pin at all. However, even if you avoid explicit pinning in your own code, there are plenty of common APIs that need to do it anyway.
While the pinning operation itself is inexpensive, it throws a bit of a wrench into the garbage collector’s operation by increasing the likelihood of fragmentation. The garbage collector tracks those pinned objects so that it can use the free spaces between them, but if you have excessive pinning, it can still cause fragmentation and heap growth.
Pinning can be either explicit or implicit. Explicit pinning is performed with use of a GCHandle
of type GCHandleType.Pinned
or the fixed
keyword and must be inside code marked as unsafe
. The difference between using fixed
or a handle is analogous to the difference between using
and explicitly calling Dispose
. fixed
is more convenient, but cannot be used in asynchronous situations, whereas you can pass around a handle and dispose of it in the callback.
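To make the two explicit-pinning styles concrete, here is a sketch of both (method names are hypothetical; the fixed version requires compiling with unsafe code enabled):

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningExamples
{
    // fixed: the pin lasts only for the block, keeping the pinned
    // lifetime as short as possible. Requires an unsafe context.
    public static unsafe long SumWithFixed(int[] values)
    {
        long sum = 0;
        fixed (int* p = values)
        {
            for (int i = 0; i < values.Length; i++)
            {
                sum += p[i];
            }
        }
        return sum;
    }

    // GCHandle: the pin can outlive the current method (useful for
    // asynchronous unmanaged operations) but must be freed explicitly.
    public static long SumWithHandle(int[] values)
    {
        GCHandle handle = GCHandle.Alloc(values, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            long sum = 0;
            for (int i = 0; i < values.Length; i++)
            {
                sum += Marshal.ReadInt32(address, i * sizeof(int));
            }
            return sum;
        }
        finally
        {
            handle.Free(); // analogous to calling Dispose
        }
    }

    public static void Main()
    {
        var data = new[] { 1, 2, 3, 4 };
        Console.WriteLine(SumWithFixed(data));  // 10
        Console.WriteLine(SumWithHandle(data)); // 10
    }
}
```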
Implicit pinning is more common, but can be harder to see and more difficult to remove. The most obvious source of pinning will be any objects passed to unmanaged code via Platform Invoke (P/Invoke). This is not just your own code—managed APIs that you call can, and often do, call native code, which will require pinning.
The CLR will also have pinned objects in its own data structures, but these should normally not be a concern.
Ideally, you should eliminate as much pinning as you can. If you cannot quite do that, follow the same rules for garbage collection: keep lifetime as short as possible. If objects are only pinned briefly then there is less chance for them to affect the next garbage collection. You also want to avoid having very many pinned objects at the same time. Pinning objects located in gen 2 or the LOH is generally fine because these objects are unlikely to move anyway. This can lead to a strategy of either allocating large buffers on the large object heap and giving out portions of them as needed, or allocating small buffers on the small object heap, but before pinning, ensure they are promoted to gen 2. This takes a bit of management on your part, but it can completely avoid the issue of having pinned buffers during a gen 0 GC.
Never implement a finalizer unless it is required. Finalizers are code, triggered by the garbage collector, that cleans up unmanaged resources. Finalizers are called from a single thread, one after the other, and only after the garbage collector declares the object dead after a collection. This means that if your class implements a finalizer, you are guaranteeing that it will stay in memory even after the collection that should have killed it. There is also additional bookkeeping on each GC, as the finalizer list must be continually updated when objects are relocated. All of this decreases overall GC efficiency and ensures that your program will dedicate more CPU resources to cleaning up your objects.
Not only that, but an object with a finalizer is slower to allocate. Instead of the “fast path” allocator, it must do extra bookkeeping to ensure the GC tracks the object for its lifetime.
If you do implement a finalizer, you must also implement the IDisposable
interface to enable explicit cleanup, and call GC.SuppressFinalize(this)
in the Dispose
method to remove the object from the finalization queue. As long as you call Dispose
before the next collection, it will clean up the object properly without the need for the finalizer to run. The following example correctly demonstrates this pattern. Note that you can (and often should) implement the Dispose pattern without implementing a finalizer.
class Foo : IDisposable
{
    private bool disposed = false;
    private IntPtr handle;
    private IDisposable managedResource;

    ~Foo() // Finalizer
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (this.disposed)
        {
            return;
        }
        if (disposing)
        {
            // Not safe to do this from the finalizer
            this.managedResource.Dispose();
        }
        // Clean up unmanaged resources that are safe to
        // release from a finalizer
        UnsafeClose(this.handle);
        // If the base class is IDisposable,
        // make sure you call base.Dispose(disposing);
        this.disposed = true;
    }
}
All cleanup logic is centralized in the Dispose(bool)
method. Everything else just calls it. The disposing
variable indicates whether a developer explicitly called Dispose
. If they did, then it is safe to Dispose
of all resources. However, if this method is called via the finalizer, then there is no guarantee any referenced objects are still valid, so only those unmanaged resources explicitly owned by this object can be safely cleaned up in this method. In the context of a finalizer, very few assumptions can be made about the state of objects referenced by this object. The code must be simple and touch only memory guaranteed to belong only to this object and still be valid. Typically this means that you should not access any other finalizable object, or any other disposable object (unless you can guarantee its validity).
Only mark the protected version of Dispose as virtual, allowing it to be overridden by child types. The disposed field tracks whether the object has already been disposed, allowing the Dispose method to be safely called more than once.
Dispose
methods and finalizers should never throw exceptions. If an exception occurs during a finalizer's execution, the process will terminate. Finalizers should also avoid any kind of I/O, even something as simple as logging.
Properly implementing this pattern is important to ensure that it works correctly with polymorphic types. You will have to exercise judgment on whether to implement finalizers on base types that themselves do not have unmanaged resources, but may have derived types that do have such resources. It may be required in some cases to take the performance hit for correctness, but this should be avoided if at all possible.
Any type that contains instances of other IDisposable
types must itself implement IDisposable
. In this way, IDisposable
has a way of spreading through your data structures. Properly implemented, it should be easy to dispose of all the resources merely by calling the root IDisposable
’s Dispose
method.
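A sketch of this propagation (the wrapping type and its members are hypothetical): a type that owns disposable members implements IDisposable and forwards the call, but needs no finalizer because it holds no unmanaged resources directly.

```csharp
using System;
using System.IO;

// LogSession owns IDisposable members, so it implements IDisposable and
// forwards Dispose. No finalizer is needed: it holds no unmanaged
// resources of its own.
class LogSession : IDisposable
{
    private readonly MemoryStream buffer = new MemoryStream();
    private readonly StreamWriter writer;
    private bool disposed;

    public LogSession()
    {
        writer = new StreamWriter(buffer);
    }

    public void Log(string message) => writer.WriteLine(message);

    public void Dispose()
    {
        if (disposed) return;
        writer.Dispose(); // also disposes the underlying stream
        disposed = true;
    }
}

class Program
{
    static void Main()
    {
        using (var session = new LogSession())
        {
            session.Log("hello");
        } // one Dispose call at the root cleans up the whole chain
    }
}
```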
Note You may have heard that finalizers are guaranteed to run. This is generally true, but not absolutely so. If a program is force-terminated then no more code runs and the process dies immediately. The finalizer thread is triggered by a garbage collection, so if there are no garbage collections, finalizers will not run. There is also a time limit to how long all of the finalizers are given on process shutdown. If your finalizer is at the end of the list, it may be skipped. Moreover, because finalizers execute sequentially, if another finalizer has an infinite loop bug in it, then no finalizers after it will ever run. This can lead to memory leaks. For all these reasons, you should not rely on finalizers to clean up state external to your process.
Not all allocations go to the same heap. Objects over a certain size will go to the large object heap and immediately be in gen 2. The boundary for large object allocations was set at 85,000 bytes by doing a statistical analysis of programs of the day. Any object of that size or greater is judged to be “large” and it goes on a separate heap.
You want to avoid allocations on the large object heap as much as possible. Not only is collecting garbage from this heap more expensive, it is more likely to fragment, causing unbounded memory increases over time. Continuous allocations to the large object heap send a strong signal to the garbage collector to do continuous garbage collections—not a good place to be in.
To avoid these problems, you need to strictly control what your program allocates on the large object heap. What does go there should last for the lifetime of your program and be reused as necessary in a pooling scheme.
The large object heap does not automatically compact, but you may tell it to do so programmatically starting with .NET 4.5.1. However, you should use this only as a last resort, as it will cause a very long pause. Before explaining how to do that, the next few sections will explain how to avoid getting into that situation in the first place.
You should usually avoid copying data whenever you can. For example, suppose you have read file data into a MemoryStream
(preferably a pooled one if you need large buffers). Once you have that memory allocated, treat it as read-only and every component that needs to access it will read from the same copy of the data.
A common requirement, then, is to refer to sub-ranges of a buffer, array, or memory range. .NET provides two ways to accomplish this at present.
The first option, available only for arrays, is the ArraySegment<T>
struct to represent just a portion of the underlying array. This ArraySegment
can be passed around to APIs independent of the original stream, and you can even attach a new MemoryStream
to just that segment. Throughout all of this, no copy of the data has been made.
var memoryStream = new MemoryStream(2048);
var segment = new ArraySegment<byte>(memoryStream.GetBuffer(),
                                     100,
                                     1024);
...
var blockStream = new MemoryStream(segment.Array,
                                   segment.Offset,
                                   segment.Count);
The biggest problem with copying memory is not the CPU necessarily, but the GC. If you find yourself needing to copy a buffer, then try to copy it into another pooled or existing buffer to avoid any new memory allocations.
A newer option for representing pieces of existing buffers is the Span<T>
struct. Span<T>
is still in a pre-release phase at the time of this writing, but it will likely become finalized with the release of C# 7.2 or future upgrades to the runtime. To use this library, you will need to consume the System.Memory NuGet package and use Visual Studio 2017.
Span<T>
is like an array in the sense that it represents a contiguous block of memory, but it has the distinction of being able to wrap managed memory, unmanaged memory, and stack memory with the same abstraction. For unmanaged memory, you can think of it as a smart wrapper that does pointer arithmetic for you.
The following examples of Span<T>
come from the Span project in the accompanying sample code.
The first example creates a standard byte
array on the managed heap and creates a span from a sub-portion of that array. (It could just as easily have spanned the entire array.)
{
    ...
    byte[] array = new byte[] {0, 1, 2, 3};
    Span<byte> byteSpan = new Span<byte>(array, 1, 2);
    PrintSpan(byteSpan);
    ...
}

private static void PrintSpan<T>(Span<T> span)
{
    for (int i = 0; i < span.Length; i++)
    {
        ref T val = ref span[i];
        Console.Write(val);
        if (i < span.Length - 1) { Console.Write(", "); }
    }
    Console.WriteLine();
}
This produces the following output:
1, 2
This example uses a Span<T>
to wrap a stack-allocated array:
unsafe
{
    int* stackMem = stackalloc int[4];
    Span<int> intSpan = new Span<int>(stackMem, 4);
    for (int i = 0; i < intSpan.Length; i++)
    {
        intSpan[i] = 13 + i;
    }
    PrintSpan(intSpan);
}
As you can see, it uses the exact same semantic to wrap this array, and the same helper method can be used to print the values. Its output is:
13, 14, 15, 16
The next example is slightly more complex. When you allocate from the native heap, you must specify the number of bytes you are allocating and when you wrap unmanaged memory in a Span<T>
, you are assigning types to that memory, so the length of the span is specified in the count of objects, not length of bytes. This example accounts for that by multiplying the size of the objects we want by the count before we allocate.
unsafe
{
    const int ObjectCount = 4;
    int memSize = sizeof(int) * ObjectCount;
    IntPtr hNative = Marshal.AllocHGlobal(memSize);
    Span<int> unmanagedSpan = new Span<int>(hNative.ToPointer(),
                                            ObjectCount);
    for (int i = 0; i < unmanagedSpan.Length; i++)
    {
        unmanagedSpan[i] = 100 + i;
    }
    PrintSpan(unmanagedSpan);
    Marshal.FreeHGlobal(hNative);
}
The output is:
100, 101, 102, 103
The final example makes use of one of the extension methods included in the library to convert a string into a ReadOnlySpan<char>
. Unfortunately, there is no relationship between Span<T>
and ReadOnlySpan<T>
because Span<T>
utilizes ref
-return semantics to avoid copying values. That means, we have to have a separate utility method to print the values.
{
    ...
    ReadOnlySpan<char> subString =
        "NonAllocatingSubstring".AsSpan().Slice(13);
    PrintSpan(subString);
    ...
}

private static void PrintSpan<T>(ReadOnlySpan<T> span)
{
    for (int i = 0; i < span.Length; i++)
    {
        T val = span[i];
        Console.Write(val);
        if (i < span.Length - 1) { Console.Write(", "); }
    }
    Console.WriteLine();
}
The output of this code is:
S, u, b, s, t, r, i, n, g
There are also utility methods to convert from arrays and ArraySegment
structs to Span<T>
structs.
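A brief sketch of those conversions (assuming a runtime or System.Memory package where these extension methods are available):

```csharp
using System;

class SpanConversions
{
    static void Main()
    {
        int[] array = { 10, 20, 30, 40 };

        // Implicit conversion from an array to a Span<T>.
        Span<int> whole = array;

        // AsSpan extension with an offset and length.
        Span<int> middle = array.AsSpan(1, 2);

        // An ArraySegment<T> converts to a span covering the same range.
        var segment = new ArraySegment<int>(array, 2, 2);
        Span<int> tail = segment.AsSpan();

        Console.WriteLine(whole.Length); // 4
        Console.WriteLine(middle[0]);    // 20
        Console.WriteLine(tail[0]);      // 30
    }
}
```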
Remember the cardinal rule from earlier: Objects live very briefly or forever. They should either go away in gen 0 collections or last forever in gen 2. Some objects are essentially static—they are created and last the lifetime of the program naturally. Other objects do not obviously need to last forever, but their natural lifetime in the context of your program ensures they will live longer than the period of a gen 0 (and maybe gen 1) garbage collection. These types of objects are candidates for pooling. Another strong candidate for pooling is any object that you allocate on the large object heap, typically collections.
There is no single way to pool and there is no standard pooling API you can rely on. It really is up to you to develop a way that works for your application and the specific objects you need to pool.
One way to think about poolable objects is that you are turning a normally managed resource (memory) into something that you have to manage explicitly. .NET already has a pattern for dealing with finite managed resources: the IDisposable
pattern. See earlier in this chapter for the proper implementation of this pattern. A reasonable design is to derive a new type and have it implement IDisposable
, where the Dispose
method puts the pooled object back in the pool. This will be a strong clue to users of that type that they need to treat this resource specially.
Implementing a good pooling strategy is not trivial and can depend entirely on how your program needs to use it, as well as what types of objects need to be pooled. Here is code that shows one example of a simple pooling class to give you some idea of what is involved. This code is from the PooledObjects sample program.
interface IPoolableObject : IDisposable
{
    int Size { get; }
    void Reset();
    void SetPoolManager(PoolManager poolManager);
}

class PoolManager
{
    private class Pool
    {
        public int PooledSize { get; set; }
        public int Count { get { return this.Stack.Count; } }
        public Stack<IPoolableObject> Stack { get; private set; }

        public Pool()
        {
            this.Stack = new Stack<IPoolableObject>();
        }
    }

    const int MaxSizePerType = 10 * (1 << 20); // 10 MB

    Dictionary<Type, Pool> pools = new Dictionary<Type, Pool>();

    public int TotalCount
    {
        get
        {
            int sum = 0;
            foreach (var pool in this.pools.Values)
            {
                sum += pool.Count;
            }
            return sum;
        }
    }

    public T GetObject<T>()
        where T : class, IPoolableObject, new()
    {
        Pool pool;
        T valueToReturn = null;
        if (pools.TryGetValue(typeof(T), out pool))
        {
            if (pool.Stack.Count > 0)
            {
                valueToReturn = pool.Stack.Pop() as T;
                // Only adjust the pool's size when the object actually
                // came out of the pool.
                pool.PooledSize -= valueToReturn.Size;
            }
        }
        if (valueToReturn == null)
        {
            valueToReturn = new T();
        }
        valueToReturn.SetPoolManager(this);
        return valueToReturn;
    }

    public void ReturnObject<T>(T value)
        where T : class, IPoolableObject, new()
    {
        Pool pool;
        if (!pools.TryGetValue(typeof(T), out pool))
        {
            pool = new Pool();
            pools[typeof(T)] = pool;
        }
        if (value.Size + pool.PooledSize <= MaxSizePerType)
        {
            pool.PooledSize += value.Size;
            value.Reset();
            pool.Stack.Push(value);
        }
    }
}

class MyObject : IPoolableObject
{
    private PoolManager poolManager;
    public byte[] Data { get; set; }
    public int UsableLength { get; set; }

    public int Size
    {
        get { return Data != null ? Data.Length : 0; }
    }

    void IPoolableObject.Reset()
    {
        UsableLength = 0;
    }

    void IPoolableObject.SetPoolManager(PoolManager poolManager)
    {
        this.poolManager = poolManager;
    }

    public void Dispose()
    {
        this.poolManager.ReturnObject(this);
    }
}
It may seem a burden to force pooled objects to implement a custom interface, but this highlights a very important fact: in order to use pooling and reuse objects, you must be able to fully understand and control them. Your code must reset them to a known, safe state every time they go back into the pool. This means you should not naively pool third-party objects directly. By implementing your own objects with a custom interface, you are providing a very strong signal that the objects are special. You should be especially wary of pooling objects from the .NET Framework.
It is particularly tricky to pool collections because of their nature—you do not want to destroy the actual data storage (that is the whole point of pooling, after all), but you must be able to signify an empty collection with available space. Thankfully, most collection types make this distinction with separate size (Count or Length) and Capacity
properties. Given the dangers of pooling the existing .NET collection types, it is better to implement your own collection types using the standard collection interfaces such as IList<T>
, ICollection<T>
, and others. See Chapter 6 for general guidance on creating your own collection types.
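A minimal sketch of the size-versus-capacity distinction for a poolable collection (the wrapper type is hypothetical): resetting it empties the logical contents while keeping the backing storage for reuse.

```csharp
using System;
using System.Collections.Generic;

// "Clearing" this poolable list resets Count but keeps the underlying
// storage (Capacity), so the allocation is reused on the next checkout.
class PoolableList<T>
{
    private readonly List<T> items = new List<T>(capacity: 1024);

    public int Count => items.Count;
    public int Capacity => items.Capacity;

    public void Add(T item) => items.Add(item);

    // Reset for reuse: List<T>.Clear keeps the backing array.
    public void Reset() => items.Clear();
}

class Program
{
    static void Main()
    {
        var list = new PoolableList<int>();
        for (int i = 0; i < 100; i++) list.Add(i);
        Console.WriteLine($"{list.Count}/{list.Capacity}"); // 100/1024
        list.Reset();
        Console.WriteLine($"{list.Count}/{list.Capacity}"); // 0/1024
    }
}
```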
An additional strategy is to have your poolable types implement a finalizer as a safety mechanism. If the finalizer runs, it means that Dispose
was never called, which is a bug. You can choose to write something to the log, crash, or otherwise signal the problem. You must be very careful with this signaling, though, because touching memory that has been invalidated by the GC will cause a crash or hang.
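One safe way to do that signaling (the type is hypothetical) is to have the finalizer touch nothing but static state, recording the leak for monitoring code to report:

```csharp
using System;
using System.Threading;

// A pooled type with a finalizer as a leak detector. The finalizer runs
// only if Dispose was forgotten, and it touches nothing but static
// state, which is safe in a finalizer context.
class PooledBuffer : IDisposable
{
    // Read by monitoring code; a non-zero value indicates Dispose bugs.
    public static int LeakedCount;

    public byte[] Data = new byte[4096];

    public void Dispose()
    {
        // Return Data to the pool here (omitted in this sketch), then
        // suppress the safety net since cleanup happened correctly.
        GC.SuppressFinalize(this);
    }

    ~PooledBuffer()
    {
        // Reaching here means the object leaked instead of being
        // returned to the pool. Record it without touching other objects.
        Interlocked.Increment(ref LeakedCount);
    }
}

class Program
{
    static void Main()
    {
        using (var buffer = new PooledBuffer())
        {
            // use buffer.Data ...
        } // Dispose suppresses the finalizer; no leak is recorded
        Console.WriteLine($"Leaked: {PooledBuffer.LeakedCount}"); // Leaked: 0
    }
}
```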
Remember that a pool that never dumps objects is indistinguishable from a memory leak. Your pool should have a bounded size (in either bytes or number of objects), and once that has been exceeded, it should drop objects for the GC to clean up. Ideally, your pool is large enough to handle normal operations without dropping anything and the GC is only needed after brief spikes of unusual activity. Depending on the size and number of objects contained in your pool, dropping them may lead to long, full GCs. It is important to make sure your pool is tunable for your situation.
I do not usually turn to pooling as a default solution. As a general-purpose mechanism, it is clunky and error-prone. However, you may find that your application will benefit from pooling just a few types.
I once worked on an application that managed federation to thousands of back-end network resources per second. Most of its work was reading bytes off the network or writing to it. Nearly 90% of all allocated memory was going towards MemoryStream
objects that were being allocated and resized all over the place: string encoding, marshaling, unmarshaling, temporary buffers, and more. As a result, we were spending a phenomenal amount of time just doing GC—nearly 25% of all CPU time! Doing memory and CPU profiling quickly revealed the need for a better way to handle bytes than MemoryStream
.
This section will discuss the design and some implementation details of a pooled MemoryStream
class, called RecyclableMemoryStream
. You can download the code at https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream or use it directly from Visual Studio with a NuGet package.
Our requirements for the replacement were: Dispose semantics that return pooled buffers, and API compatibility with MemoryStream, as much as possible. These requirements were all met and led to the features and implementation details described below.
The devil is in the details, as the saying goes, so let’s dive into some of the implementation.
Before you can allocate a RecyclableMemoryStream
, you must create the pool manager, a RecyclableMemoryStreamManager
object. This is the class that actually manages the buffer pools and tracks resource usage. Think of it like a miniature heap inside the CLR’s heap. On this class, you set all of your configuration options, like default buffer sizes, maximum size of the heap, and more. There is typically one manager object per process and it lives for the lifetime of the process. However, if you have wildly different usage scenarios, there is no problem using multiple RecyclableMemoryStreamManager
objects.
The RecyclableMemoryStreamManager
maintains two categories of buffers: the Small Pool and the Large Pool. The Small Pool is made of lots of equal-sized buffers. The “Small” in Small Pool refers to the size of the individual buffer, not the size of the pool. The buffers in the Small Pool are called blocks (because they are combined to form the longer stream). The Large Pool contains larger buffers, but far fewer of them, and is designed to be used less frequently (only when GetBuffer
is called). Both pools use uniform buffer sizes to reduce the likelihood of heap fragmentation.
Using this library is easy:
var sourceBuffer = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
var manager = new RecyclableMemoryStreamManager();
using (var stream = manager.GetStream("Test"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}
This code creates a RecyclableMemoryStreamManager
with default settings, grabs a stream, writes some bytes to it, and then returns the stream’s blocks to the pool with the Dispose
call. This example passes the tag “Test
” to the stream’s constructor. This tag is not unique per-stream, but serves to identify the location in code where it is allocated, which can help in debugging. It is not required to use tags, but they are useful. Internally, each stream is also assigned a unique GUID that does serve to uniquely identify the stream, which can be useful when tracing concurrent usage of multiple streams.
Internally, the RecyclableMemoryStream
will grab a block from the manager. As more data is written to the stream, more blocks are chained together and the stream’s APIs will make this look like a single contiguous block of memory. As the length of the stream grows, the total memory usage only grows by the block size (and that is assuming the blocks were not already pooled). This is in contrast to MemoryStream
’s implementation, which doubles the stream’s capacity as it grows, leading to potentially massive memory waste, which is fine on a small scale, but not on a massive scale.
As long as just Read
and Write
methods are used, only blocks will be used. However, sometimes it is necessary to get a single contiguous buffer. For this, there is the GetBuffer
API, inherited from MemoryStream
. When GetBuffer
is called, a contiguous block must be returned. If there is only one block in use, then a reference to it is returned. If multiple blocks are used, then the Large Pool is used to satisfy the request, and bytes are copied from the blocks to the larger buffer. If the buffer requested is larger than the maximum buffer size of the pool, then a memory allocation occurs to satisfy the request.
It is worthwhile noting that the buffer returned is at least as large as the data contained in it—it may in fact be much larger. You must use the stream’s Length
property to determine how much data is actually in it. Naive users of the library sometimes ignore this and write huge buffers to the network or to files. After converting the stream to a buffer, with an associated data length, it may be useful to wrap them in an ArraySegment<byte>
struct.
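The same buffer-plus-length pairing applies to a plain MemoryStream, which the pooled stream is API-compatible with. A minimal sketch:

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        var stream = new MemoryStream(256);
        byte[] payload = { 1, 2, 3, 4, 5 };
        stream.Write(payload, 0, payload.Length);

        // GetBuffer returns the whole internal buffer, which is usually
        // larger than the data; pair it with Length to bound the range.
        var segment = new ArraySegment<byte>(
            stream.GetBuffer(), 0, (int)stream.Length);

        Console.WriteLine(segment.Count);             // 5
        Console.WriteLine(stream.GetBuffer().Length); // 256
    }
}
```

Downstream consumers that take the segment cannot accidentally read past the valid data.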
The ToArray
method is much less useful in a pooling scenario. It is required to return an array of exactly the right size, which means that an allocation (possibly on the large object heap) will occur, as well as a memory copy. Because of these inefficiencies, ToArray
should be avoided entirely.
I encourage you to study the code at the link provided earlier because it will be beneficial to understanding how the library attempts to avoid allocations while balancing the need for other requirements.
Once we implemented this library in production code, we saw allocation on the large object heap drop 99%. Worrying about expensive gen 2 collections became a thing of the past. The time spent in garbage collection dropped from 25% to less than 1%.
If you cannot completely avoid large object heap allocations, then you want to do your best to avoid fragmentation.
The large object heap can grow indefinitely if you are not careful, but this is mitigated by the free list. To take advantage of the free list, you want to increase the likelihood that memory allocations can be satisfied from holes in the heap.
One way to do this is to ensure that all allocations on the LOH are of uniform size, or at least multiples of some standard size. For example, a common need for LOH allocations is for buffer pools. Rather than have a hodge-podge of buffer sizes, ensure that they are all the same size, or in multiples of some well-known number such as one megabyte. This way, if one of the buffers does need to get garbage collected, there is a high likelihood that the next buffer allocation can fill its spot rather than going to the end of the heap.
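A simple way to enforce uniform sizes is to round every requested buffer size up to a multiple of a standard block size before allocating. A sketch (the helper and the 1 MB block size are illustrative choices, not a prescribed value):

```csharp
using System;

public static class BufferSizing
{
    // Rounds a requested buffer size up to a multiple of a standard
    // block size so that freed LOH slots can be reused by later
    // allocations of the same rounded size.
    private const int BlockSize = 1 << 20; // 1 MB

    public static int RoundUp(int requestedBytes)
    {
        if (requestedBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(requestedBytes));
        long blocks = (requestedBytes + (long)BlockSize - 1) / BlockSize;
        return checked((int)(blocks * BlockSize));
    }

    public static void Main()
    {
        Console.WriteLine(RoundUp(1));             // 1048576
        Console.WriteLine(RoundUp(1 << 20));       // 1048576
        Console.WriteLine(RoundUp((1 << 20) + 1)); // 2097152
    }
}
```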
In nearly all cases, you should not force collections to happen outside of their normal schedule as determined by the GC itself. Doing so disrupts the automatic tuning the garbage collector performs and may lead to worse behavior overall. However, there are some considerations in a high-performance system that may cause you to reconsider this advice in very specific situations.
In general, it may be beneficial to force a GC to occur during a more optimal time to avoid a GC occurring during a worse time later on. Note that we are only talking about the expensive, ideally rare, full GCs. Gen 0 and gen 1 GCs can and should happen frequently to avoid building up a too-large gen 0 size.
Some situations may merit a forced collection: after a large, non-recurring operation (such as reloading a large data set), during scheduled low-load periods, or just before entering a window where a pause would be unacceptable. These cases are all about avoiding full GCs during specific times by forcing them at other, better times. One further case is reducing your overall heap size when you have significant fragmentation on the LOH. If your scenario does not fit into one of those categories, you should not consider this a useful option.
To perform a full collection, call the GC.Collect method with the generation of the collection you want it to perform. Optionally, you can specify a value of the GCCollectionMode enumeration argument to tell the GC to decide for itself whether to do the collection. There are three possible values:

- Default: Currently, Forced.
- Forced: Tells the garbage collector to start the collection immediately.
- Optimized: Allows the garbage collector to decide if now is a good time to run.

GC.Collect(2);
// equivalent to:
GC.Collect(2, GCCollectionMode.Forced);
Story: This exact situation existed on a server that took user queries. Every few hours we needed to reload over a gigabyte of data, replacing the existing data. Since this was an expensive operation and we were already reducing the number of requests the machine was receiving, we also forced two full GCs after the reload happened. This removed the old data and ensured that everything allocated in gen 0 either got collected or made it to gen 2 where it belonged. Then, once we resumed a full query load, there would not be a huge, full GC to affect the first queries.
Even if you do pooling, it is still possible that there are allocations you cannot control and the large object heap will become fragmented over time. Starting in .NET 4.5.1, you can tell the GC to compact the large object heap on the next full collection.
GCSettings.LargeObjectHeapCompactionMode =
GCLargeObjectHeapCompactionMode.CompactOnce;
Depending on the size of the large object heap, this can be a slow operation, up to multiple seconds. You may want to put your program in a state where it stops doing real work and force an immediate collection with the GC.Collect
method.
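Putting those two pieces together might look like this sketch. The `LohCompaction` wrapper name is an assumption for the example; the API calls themselves are the ones described in the text.

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Request a one-time compaction of the large object heap and
    // trigger it immediately with a full blocking collection.
    // Call this only at a known-quiet point in the application,
    // since compaction can take multiple seconds on a large LOH.
    public static void CompactLargeObjectHeap()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```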
This setting only affects the next full GC that happens. Once the next full collection occurs, GCSettings.LargeObjectHeapCompactionMode
resets automatically to GCLargeObjectHeapCompactionMode.Default
.
Because of the expense of this operation, I recommend you reduce the number of large object heap allocations to as little as possible and pool those that you do make. This will significantly reduce the need for compaction. View this feature as a last resort and only if fragmentation and very large heap sizes are an issue.
If your application absolutely should not be impacted by gen 2 collections, then you can tell the GC to notify you when a full GC is approaching. This will give you a chance to stop processing temporarily, perhaps by shunting requests off the machine, or otherwise putting the application into a more favorable state.
It may seem like this notification mechanism is the answer to all GC woes, but I recommend extreme caution. You should only implement this after you have optimized as much as you can in other areas. You can only take advantage of GC notifications if all of the following statements are true:
Gen 2 collections will happen rarely only if you have large object allocations minimized and little promotion beyond gen 0, so it will still take a fair amount of work to get to the point where you can reliably take advantage of GC notifications.
Unfortunately, because of the imprecise nature of GC triggering, you can only specify the pre-trigger time in an approximate way with a number in the range 1–99. With a number that is very low, you will be notified much closer to when the GC will happen, but you risk having the GC occur before you can react to it. With a number that is too high, the GC may be quite far away and you will get a notification far too frequently, which is quite inefficient. It all depends on your allocation rate and overall memory load. Note that you specify two numbers: one for the gen 2 threshold and one for the large object heap threshold. As with other features, this notification is a best effort by the garbage collector. The garbage collector never guarantees you can avoid doing a collection.
To use this mechanism, follow these general steps:
1. Call the GC.RegisterForFullGCNotification method with the two threshold values.
2. Poll the GC with the GC.WaitForFullGCApproach method. This can wait forever or accept a timeout value.
3. If the WaitForFullGCApproach method returns Success, put your program in a state acceptable for a full GC (e.g., turn off requests to the machine).
4. Induce a collection yourself with the GC.Collect method.
5. Call GC.WaitForFullGCComplete (again with an optional timeout value) to wait for the full GC to complete before continuing.
6. When you are done, call the GC.CancelFullGCNotification method.

Because this requires a polling mechanism, you will need to run a thread that can do this check periodically. Many applications already have some sort of “housekeeping” thread that performs various actions on a schedule. This may be an appropriate task, or you can create a separate dedicated thread.
Here is a full example from the GCNotification
sample project demonstrating this behavior in a simple test application that allocates memory continuously. See the accompanying source code project to test this.
class Program
{
static void Main(string[] args)
{
const int ArrSize = 1024;
var arrays = new List<byte[]>();
GC.RegisterForFullGCNotification(25, 25);
// Start a separate thread to wait for GC notifications
Task.Run(() => WaitForGCThread(null));
Console.WriteLine("Press any key to exit");
while (!Console.KeyAvailable)
{
try
{
arrays.Add(new byte[ArrSize]);
}
catch (OutOfMemoryException)
{
Console.WriteLine("OutOfMemoryException!");
arrays.Clear();
}
}
GC.CancelFullGCNotification();
}
private static void WaitForGCThread(object arg)
{
const int MaxWaitMs = 10000;
while (true)
{
// There is also an overload of WaitForFullGCApproach
// that waits indefinitely
GCNotificationStatus status =
GC.WaitForFullGCApproach(MaxWaitMs);
bool didCollect = false;
switch (status)
{
case GCNotificationStatus.Succeeded:
Console.WriteLine("GC approaching!");
Console.WriteLine(
"-- redirect processing to another machine -- ");
didCollect = true;
GC.Collect();
break;
case GCNotificationStatus.Canceled:
Console.WriteLine("GC Notification was canceled");
break;
case GCNotificationStatus.Timeout:
Console.WriteLine("GC notification timed out");
break;
}
if (didCollect)
{
do
{
status = GC.WaitForFullGCComplete(MaxWaitMs);
switch (status)
{
case GCNotificationStatus.Succeeded:
Console.WriteLine("GC completed");
Console.WriteLine(
"-- accept processing on this machine again --");
break;
case GCNotificationStatus.Canceled:
Console.WriteLine(
"GC Notification was canceled");
break;
case GCNotificationStatus.Timeout:
Console.WriteLine(
"GC completion notification timed out");
break;
}
// Looping isn't necessary, but it is useful if you want
// to check other state before waiting again.
} while (status == GCNotificationStatus.Timeout);
}
}
}
}
Another possible reason is to compact the large object heap, but you could trigger this based on memory usage instead, which may be more appropriate.
Weak references are references to an object that still allow the garbage collector to clean up the object. They are in contrast to the default strong references, which prevent collection completely (for that object). They are mostly useful for caching expensive objects that you would like to keep around, but are willing to let go if there is enough memory pressure. Weak references are a core CLR concept that are exposed through a couple of .NET classes:
- WeakReference
- WeakReference<T>

You should ignore the first one in favor of the generic version that was introduced in .NET 4.5. The non-generic version has API weaknesses that are resolved in the newer version, and I will only discuss the generic version here.
An example of a simple usage:
// The underlying Foo object can be garbage collected at any time!
WeakReference<Foo> weakRef = new WeakReference<Foo>(new Foo());
...
// Create a strong reference to the object,
// now no longer eligible for GC
Foo myFoo;
if (weakRef.TryGetTarget(out myFoo))
{
...
}
Note that the reference to the WeakReference<T>
object itself is strong, which means that it will not be collected out from under you—it is only the underlying target object that is weakly referenced. If you are memory-conscious enough to use WeakReference<T>
then you might rightly be leery of continually allocating new WeakReference<T>
objects. Thankfully, you can reuse these wrapper objects by using the SetTarget
method to replace the underlying value as needed.
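Reusing a wrapper via SetTarget can be sketched like this. `Foo` here is a stand-in class for the example, matching the earlier snippet, not a library type.

```csharp
using System;

public class Foo
{
    public int Value;
    public Foo(int value) { Value = value; }
}

public static class WeakRefReuse
{
    public static Foo Demo()
    {
        var first = new Foo(1);
        var second = new Foo(2);
        var weakRef = new WeakReference<Foo>(first);

        // Reuse the same WeakReference<T> wrapper for a new target
        // instead of allocating a fresh wrapper object.
        weakRef.SetTarget(second);

        Foo target;
        bool found = weakRef.TryGetTarget(out target);
        // Keep the strong reference alive until after the lookup so
        // the target cannot be collected out from under the test.
        GC.KeepAlive(second);
        return found ? target : null;
    }
}
```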
You can still have other references, both strong and weak, to the same object. Collection will only happen if the only references to it are weak (or non-existent).
Most applications do not need to use weak references at all, but there are some criteria that may indicate good usage, chief among them that the object is expensive to create but can be recreated on demand. (Otherwise, why bother with WeakReference<T> objects in the first place?)

Following are two examples of using WeakReference<T> for efficient caching.
A good way to use WeakReference<T>
is as part of a cache. Objects start out held through a strong reference, but after enough time of not being used (or some other criteria of your choosing), they can be demoted to being held by weak references, which may eventually disappear through garbage collection.
This example shows a simple cache that internally manages two levels of caches.
public class HybridCache<TKey, TValue> where TValue : class
{
class ValueContainer<T>
{
public T value;
public long additionTime;
public long demoteTime;
}
private readonly TimeSpan maxAgeBeforeDemotion;
// Values live here until they hit their maximum age
private readonly ConcurrentDictionary<TKey,
ValueContainer<TValue>>
strongReferences =
new ConcurrentDictionary<TKey, ValueContainer<TValue>>();
// Values are moved here after they hit their maximum age
private readonly ConcurrentDictionary<
TKey,
WeakReference<ValueContainer<TValue>>>
weakReferences =
new ConcurrentDictionary<
TKey,
WeakReference<ValueContainer<TValue>>>();
public int Count
{
get
{
return this.strongReferences.Count;
}
}
public int WeakCount
{
get
{
return this.weakReferences.Count;
}
}
public HybridCache(TimeSpan maxAgeBeforeDemotion)
{
this.maxAgeBeforeDemotion = maxAgeBeforeDemotion;
}
public void Add(TKey key, TValue value)
{
RemoveFromWeak(key);
var container = new ValueContainer<TValue>();
container.value = value;
container.additionTime = Stopwatch.GetTimestamp();
container.demoteTime = 0;
this.strongReferences.AddOrUpdate(
key,
container,
(k, existingValue) => container);
}
private void RemoveFromWeak(TKey key)
{
WeakReference<ValueContainer<TValue>> oldValue;
weakReferences.TryRemove(key, out oldValue);
}
public bool TryGetValue(TKey key, out TValue value)
{
value = null;
ValueContainer<TValue> container;
if (this.strongReferences.TryGetValue(key, out container))
{
AttemptDemotion(key, container);
value = container.value;
return true;
}
WeakReference<ValueContainer<TValue>> weakRef;
if (this.weakReferences.TryGetValue(key, out weakRef))
{
if (weakRef.TryGetTarget(out container))
{
value = container.value;
return true;
}
else
{
RemoveFromWeak(key);
}
}
return false;
}
/// <summary>
/// Call this method periodically from another thread.
/// </summary>
public void DemoteOldObjects()
{
var demotionList =
new List<KeyValuePair<TKey,
ValueContainer<TValue>>>();
long now = Stopwatch.GetTimestamp();
foreach (var kvp in this.strongReferences)
{
var age = CalculateTimeSpan(kvp.Value.additionTime,
now);
if (age > this.maxAgeBeforeDemotion)
{
demotionList.Add(kvp);
}
}
foreach (var kvp in demotionList)
{
Demote(kvp.Key, kvp.Value);
}
}
private void AttemptDemotion(TKey key,
ValueContainer<TValue> container)
{
long now = Stopwatch.GetTimestamp();
var age = CalculateTimeSpan(container.additionTime, now);
if (age > this.maxAgeBeforeDemotion)
{
Demote(key, container);
}
}
private void Demote(TKey key,
ValueContainer<TValue> container)
{
ValueContainer<TValue> oldContainer;
this.strongReferences.TryRemove(key, out oldContainer);
container.demoteTime = Stopwatch.GetTimestamp();
var weakRef =
new WeakReference<ValueContainer<TValue>>(container);
this.weakReferences.AddOrUpdate(key,
weakRef,
(k, oldRef) => weakRef);
}
private static TimeSpan CalculateTimeSpan(long offsetA,
long offsetB)
{
long diff = offsetB - offsetA;
double seconds = (double)diff / Stopwatch.Frequency;
return TimeSpan.FromSeconds(seconds);
}
}
This example uses weak references to make updates to a simple database more efficient by avoiding immediate, potentially expensive index updates.
class Person
{
public string Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public DateTime Birthday { get; set; }
}
class PersonDatabase
{
private Dictionary<string, Person> index =
new Dictionary<string, Person>();
private Dictionary<DateTime,
List<WeakReference<Person>>>
birthdayIndex =
new Dictionary<DateTime, List<WeakReference<Person>>>();
public bool NeedsIndexRebuild { get; private set; }
public void AddPerson(Person person)
{
this.index[person.Id] = person;
List<WeakReference<Person>> birthdayList;
if (!this.birthdayIndex.TryGetValue(person.Birthday,
out birthdayList))
{
birthdayIndex[person.Birthday]
= birthdayList
= new List<WeakReference<Person>>();
}
birthdayList.Add(new WeakReference<Person>(person));
}
public void RemovePerson(string id)
{
index.Remove(id);
}
public bool TryGetById(string id, out Person person)
{
return this.index.TryGetValue(id, out person);
}
public bool TryGetByBirthday(DateTime birthday,
out List<Person> people)
{
people = null;
List<WeakReference<Person>> weakPeople;
if (this.birthdayIndex.TryGetValue(birthday,
out weakPeople))
{
var list = new List<Person>(weakPeople.Count);
foreach(var wp in weakPeople)
{
Person person;
if (wp.TryGetTarget(out person))
{
list.Add(person);
}
else
{
// we got a null reference --
// we need to rebuild the indexes
this.NeedsIndexRebuild = true;
}
}
if (list.Count > 0)
{
people = list;
return true;
}
}
return false;
}
}
There is an overload of WeakReference<T>
’s constructor that takes a Boolean
value called trackResurrection
:
WeakReference<MyObject> weakRef =
new WeakReference<MyObject>(myObj, trackResurrection: true);
Resurrection is when you do something like this in a class’s finalizer:
class MyObject
{
static MyObject myObj;
~MyObject()
{
myObj = this;
}
}
By doing this, you are taking an object that had no more references to it (which is why the finalizer ran) and re-adding a reference to it. This technique is sometimes used in advanced caching scenarios, but it has a number of drawbacks:
You must call GC.ReRegisterForFinalizer on the object, or the finalizer will not run again for it.

You should just consider this technique a bug unless you really understand the state of the objects you are dealing with. There are better ways to reuse objects.
If you do use it, then you can tell WeakReference<T>
to allow longer access to the underlying object. If your object does not have a finalizer, then this parameter has no effect.
Instead of allocating memory from the heap, it is possible to allocate dynamically sized buffers on the stack using stackalloc
. Such allocations are faster than heap allocations and incur no garbage collection. However, there are some significant caveats:
To demonstrate how stackalloc
works, see the StackAlloc sample program, which contains this code:
private static unsafe void DoStackAlloc(int size)
{
int* buffer = stackalloc int[size];
for (int i = 0; i < size; i++)
{
buffer[i] = i;
}
}
The rest of the program runs this code in a loop, asking for input for how much to allocate. A sample run looks like this:
Enter size to stackalloc ('q' to exit): 100
Allocated 100-size array
Enter size to stackalloc ('q' to exit): 200
Allocated 200-size array
Enter size to stackalloc ('q' to exit): 100000
Allocated 100000-size array
Enter size to stackalloc ('q' to exit): 1000000
Process is terminated due to StackOverflowException.
StackOverflowException
has the notable distinction of being an exception that your program cannot catch. The sample code wraps the allocation in an exception handler, but to no avail. When this exception is thrown, your application will immediately exit. If you run under a debugger, however, it can catch it.
Despite the risks and limitations, stackalloc
is a valuable tool when you want small, dynamically sized arrays in your methods without the overhead of a heap allocation.
In this section, you will learn many tips and techniques to investigate what is happening on the GC heap. In many cases, multiple tools can give you the same information. I will endeavor to describe the use of a few in each scenario, where applicable.
.NET supplies a number of Windows performance counters, all in the .NET CLR Memory category. All of these counters except for Allocated Bytes/sec are updated at the end of a collection. If you notice values getting stuck, it is likely because collections are not happening very often.
One of these counters, # Induced GC, displays the number of times GC.Collect was called to explicitly start garbage collection.

The CLR publishes numerous events about GC behavior. In most cases, you can rely on the tools to analyze these in aggregate for you, but it is still useful to understand how this information is logged in case you need to track down specific events and relate them to other events in your application. You can examine these in detail in PerfView with the Events view. Here are some of the most important:
The order of events that are received is important. For a normal, foreground GC of any generation, the sequence is:
If you want to analyze these events in your own applications or utilities, see the sections on TraceEvent and PerfView in Chapter 8 for an easy-to-use library. Through judicious use and analysis of ETW events you can detect whether operations in your application are being affected by GC (or any other type of external influence).
WinDbg can give you a few different views of the heap. First, by segment:
!eeheap -gc
The output will look something like this:
Number of GC Heaps: 1
generation 0 starts at 0x05824e2c
generation 1 starts at 0x0532100c
generation 2 starts at 0x05321000
ephemeral segment allocation context: none
segment begin allocated size
05320000 05321000 05891ff4 0x570ff4(5705716)
Large object heap starts at 0x06321000
segment begin allocated size
06320000 06321000 07312c80 0xff1c80(16718976)
07900000 07901000 088ee660 0xfed660(16701024)
08a30000 08a31000 09a1e660 0xfed660(16701024)
09c80000 09c81000 0ac6e660 0xfed660(16701024)
0ac80000 0ac81000 0bc6e540 0xfed540(16700736)
...more segments...
Total Size: Size: 0x213b9d94 (557555092) bytes.
------------------------------
GC Heap Size: Size: 0x213b9d94 (557555092) bytes.
If the process is running server GC, there will be more than one heap, each with its own set of ephemeral, gen 2, and large object segments.
Another view is provided with the !HeapStat
command, which aggregates across all segments to break down the sizes of each generation, including free space.
0:007> !HeapStat
Heap Gen0 Gen1 Gen2 LOH
Heap0 446920 5258784 12 551849376
Free space: Percentage
Heap0 12 1948 0 15936
SOH: 0% LOH: 0%
This output shows that there is very little in the gen 2 heap, and there is an insignificant amount of free space (fragmentation) on the heap. The letters SOH mean “Small Object Heap”, which means every segment other than the Large Object Heap segments.
The !VMMap
command shows information about virtual address regions and the levels of protection applied to them:
0:000> !VMMap
Start Stop Length AllocProtect Protect State Type
00000000-00f5ffff 00f60000 NA Free
00f60000-00f60fff 00001000 ExWrCp Rd Commit Image
00f61000-00f61fff 00001000 ExWrCp Reserve Image
00f62000-00f62fff 00001000 ExWrCp Rd Commit Image
00f63000-00f63fff 00001000 ExWrCp Reserve Image
00f64000-00f64fff 00001000 ExWrCp Rd Commit Image
00f65000-00f65fff 00001000 ExWrCp Reserve Image
00f66000-00f66fff 00001000 ExWrCp Rd Commit Image
00f67000-00f67fff 00001000 ExWrCp Reserve Image
...
The !VMStat
command will take that information and summarize it by State:
0:000> !VMStat
TYPE MINIMUM MAXIMUM AVERAGE BLK COUNT TOTAL
==== ======= ======= ======= ========= =====
Free:
Small 8K 64K 43K 30 1,315K
Medium 84K 996K 332K 10 3,323K
Large 1,152K 2,090,816K 204,209K 17 3,471,563K
Summary 8K 2,090,816K 60,986K 57 3,476,203K
Reserve:
Small 4K 64K 34K 34 1,183K
Medium 68K 1,012K 299K 56 16,779K
Large 1,376K 32,768K 12,073K 7 84,515K
Summary 4K 32,768K 1,056K 97 102,479K
Commit:
Small 4K 64K 12K 204 2,575K
Medium 68K 964K 347K 44 15,307K
Large 1,048K 16,332K 12,716K 47 597,671K
Summary 4K 16,332K 2,086K 295 615,555K
Private:
Small 4K 64K 19K 88 1,716K
Medium 68K 1,012K 285K 57 16,267K
Large 1,376K 32,768K 15,215K 41 623,851K
Summary 4K 32,768K 3,450K 186 641,835K
Mapped:
Small 4K 64K 25K 8 204K
Medium 68K 1,004K 374K 6 2,247K
Large 1,540K 18,320K 5,442K 5 27,211K
Summary 4K 18,320K 1,561K 19 29,663K
Image:
Small 4K 64K 12K 142 1,839K
Medium 68K 964K 366K 37 13,571K
Large 1,048K 15,712K 3,890K 8 31,124K
Summary 4K 15,712K 248K 187 46,535K
The SysInternals tool VMMap can also give you a good summary of all the segments in a process. Once you have selected the process, highlight the Managed Heap in the table, and you will see a list of all segments in the process.
The GC records many events about its operation. You can use PerfView to examine these events in a very efficient way.
To see statistics on GC, start the AllocateAndRelease sample program.
Start PerfView and follow these steps:
For each process, you will find a list of data points and a set of tables summarizing GC behavior.
At the top of each section is a list of items describing the overall information.
GC Trace Summary Item | Description |
---|---|
CommandLine | The exact command that executed the process |
Runtime Version | The version of the CLR that is executing |
CLR Startup Flags | Flags controlling the behavior of the GC, such as CONCURRENT_GC or SERVER_GC |
Total CPU Time | Total time, in milliseconds, taken by the process during the profile |
Total GC CPU Time | Total time, in milliseconds, spent doing garbage collection |
Total Allocs | Amount of allocation you have done |
GC CPU MSec/MB Alloc | How much time, in milliseconds, the GC spent processing 1 MB of memory |
Total GC Pause | Amount of time, in milliseconds, the process was paused for GC |
% Time paused for Garbage Collection | GC pause time, expressed as a percent of total CPU time |
% CPU Time spent Garbage Collecting | CPU time can be different than % Time if you are running server GC |
Max GC Heap Size | Maximum size of the GC heap during profiling |
Peak Process Working Set | Maximum size of the working set during profiling |
Peak Virtual Memory Usage | Maximum virtual memory reserved during profiling |
Below that, you will find a table summarizing all the generations of GC.
GC Summary Info Column | Description |
---|---|
Gen | Generation, including ALL, which aggregates all GCs into a single set of stats. |
Count | Number of collections. |
Max Pause | Longest time, in milliseconds, that GC was paused. |
Max Peak MB | Maximum size of the generation on the heap. |
Max Alloc MB/sec | Peak allocation rate. |
Total Pause | Sum of all pause times, in milliseconds. |
Total Alloc MB | Amount of memory allocated. |
Alloc MB/MSec GC | Amount of memory allocated per millisecond of GC time. This is a measure of GC efficiency. Higher numbers mean a more effective (or less intrusive) GC. |
Survived MB/MSec GC | Amount of memory that survives a GC, per millisecond of GC time. This is another measure of GC efficiency. Higher numbers mean more memory is surviving. |
Mean Pause | Average pause time, in milliseconds. |
Induced | Count of explicit GC invocations (GC.Collect ). |
Below this table, you will find even more detailed tables listing specific GC instances in various categories, such as “Pauses > 200 MSec”, “LOH Allocation Pause (due to background GC) > 200 MSec”, “Gen 2”, and “All GC Events”.
GC Details Column | Description |
---|---|
GC Index | The order in which the GC occurred |
Pause Start | Time stamp, in milliseconds, from when the profile started, of when the GC occurred |
Trigger Reason | Reason the GC happened. |
Gen | Generation and letter code indicating the type of GC. Gen is 0-2. N=NonConcurrent, B=Background, F=Foreground, I=Induced, i=induced, not forced. |
Suspend Msec | The number of milliseconds required to suspend running threads |
Pause Msec | Total time process is paused for GC |
% Pause Time | % of time in GC, since previous GC |
% GC | % of CPU time used by GC |
Gen0 Alloc MB | Amount allocated since previous GC |
Gen0 Alloc Rate MB/sec | Allocation rate since previous GC |
Peak MB | Peak size of heap during GC |
After MB | Size of heap after GC is complete |
Ratio Peak/After | Efficiency, higher is better |
Promoted MB | Amount of memory that survived the GC |
Gen0 MB | Gen 0 size after this GC is complete |
Gen0 Survival Rate % | % of objects in gen 0 that survived GC |
Gen 0 Frag % | % of gen 0 that is free space |
Gen 1 MB | Gen 1 size after this GC is complete |
Gen1 Survival Rate % | % of objects in gen 1 that survived GC |
Gen1 Frag % | % of generation 1 that is free space |
Gen2 MB | Gen 2 size after this GC is complete |
Gen2 Survival Rate % | % of objects in gen 2 that survived GC |
Gen2 Frag % | % of gen 2 that is free space |
LOH MB | LOH size after this GC is complete |
LOH Survival Rate % | % of objects on LOH that survived GC |
LOH Frag % | % of the LOH that is free space |
Finalizable Surv MB | Finalizable object size that survived GC |
Pinned Obj | # of pinned objects this GC promoted. Fewer is better. |
As you can see, there is a wealth of information with each GC event which you can use to analyze GC performance.
Visual Studio can track .NET memory allocations via ETW sampling. Note that this is completely different than the Memory Usage profiler. That report is essentially a heap dump analyzer, which shows static snapshots of the objects on the heap and their ownership references, back to the root. The .NET memory allocation report in Performance Wizard uses ETW events to track which methods are actually doing the allocation, regardless of who ends up holding onto the references. It uses the GCAllocationTick_V2 ETW event that the CLR emits every 100KB of allocations.
Clicking on a method name will again take you to the familiar Function Details view. Just remember that you are looking at memory allocations rather than CPU time.
This report has many other views to drill-down along different dimensions. The Allocation view in particular is interesting. It is what is shown when you click on a type name in the main summary view.
This view aggregates by type and shows you which methods contribute to their allocations most frequently.
Another option is PerfView, which can show you the same information as Visual Studio, and much more, though the interface is not quite as polished.
See Chapter 1 for more information on using PerfView’s interface to get the most out of the view.
Using the above information, you should be able to find the stacks for all the allocations that occur in the test program, and their relative frequency. For example, in my trace, string allocation accounts for roughly 59.5% of all memory allocations.
You can also use CLR Profiler to find this information and display it in a number of ways.
Once you have collected a trace and the Summary window opens, click on the Allocation Graph button to open up a graphical trace of object allocations and the methods responsible for them.
The most frequently allocated objects are also the ones most likely to be triggering garbage collections. Reduce these allocations and the rate of GCs will go down.
All of the tools in this section will use the LargeMemoryUsage sample program, reproduced here:
class Program
{
const int ArraySize = 1000;
static object[] staticArray = new object[ArraySize];
static void Main(string[] args)
{
var localArray = new object[ArraySize];
var rand = new Random();
for (int i = 0; i < ArraySize; i++)
{
staticArray[i] = GetNewObject(rand.Next(0, 4));
localArray[i] = GetNewObject(rand.Next(0, 4));
}
Console.WriteLine("Examine heap now. Press any key to exit.");
Console.ReadKey();
// This will prevent localArray from being
// garbage collected before you take the snapshot
Console.WriteLine(staticArray.Length);
Console.WriteLine(localArray.Length);
}
private static Base GetNewObject(int type)
{
Base obj = null;
switch (type)
{
case 0: obj = new A(); break;
case 1: obj = new B(); break;
case 2: obj = new C(); break;
case 3: obj = new D(); break;
}
return obj;
}
}
class Base
{
private byte[] memory;
protected Base(int size) { this.memory = new byte[size]; }
}
class A : Base { public A() : base(1000) { } }
class B : Base { public B() : base(10000) { } }
class C : Base { public C() : base(100000) { } }
class D : Base { public D() : base(1000000) { } }
This simple program just allocates a random mix of the four classes and waits for you to analyze the heap before exiting.
There are a number of ways to analyze this heap, starting with a very low level.
Using WinDbg, you could execute the !DumpHeap
command to just dump a list of every single object on the heap:
0:007> !DumpHeap
Address MT Size
02aa1000 00b2ac70 10 Free
02aa100c 00b2ac70 10 Free
02aa1018 00b2ac70 10 Free
02aa1024 71911eac 84
02aa1078 71912000 84
02aa10cc 71912044 84
02aa1120 71912088 84
02aa1174 719120cc 84
02aa11c8 719120cc 84
02aa121c 71912104 12
02aa1228 71911d64 14
The MT column specifies the address of the Method Table, which is essentially equivalent to the class.
You can dump a specific object to get its information:
0:007> !DumpObj /d 02bb8cf4
Name: LargeMemoryUsage.A
MethodTable: 00de4f6c
EEClass: 00de196c
Size: 12(0xc) bytes
File: D:SampleCode...LargeMemoryUsage.exe
Fields:
MT Field Offset Type VT Attr Value Name
719160e8 4000003 4 System.Byte[] 0 instance 02bb8d00 memory
Dumping every object in the heap will usually be overwhelming. Thankfully, you can filter the output a bit, such as by type:
0:007> !DumpHeap -type LargeMemoryUsage.A
Address MT Size
02aaba98 00de4f6c 12
02ab82cc 00de4f6c 12
02ab86cc 00de4f6c 12
02ab8acc 00de4f6c 12
If you have a heap range, you can filter the output to only objects within a specified range:
!DumpHeap -type LargeMemoryUsage.A 02aaba98 02ab86cc
DumpHeap Parameter | Description |
---|---|
-min | Display objects at least the given size. |
-max | Display objects at most the given size. |
-startAtLowerBound | Start scanning the heap at the specified address (must be the address of an object). |
-type | Does a substring match of the argument against the type name. |
-mt | Displays only objects with the given method table address. This is a more precise way to get output of a specific type of objects compared to -type , which can match on different types. |
-short | Outputs object addresses only. |
-strings | Displays a summary of strings in the heap. |
-stat | Only displays the statistical summary. |
While there is a rudimentary scripting language in WinDbg, doing advanced heap analysis can be difficult. Another option is to use CLR MD to analyze the objects.
private static void PrintAllObjects(ClrRuntime clr)
{
    var heap = clr.Heap;
    foreach (var obj in heap.EnumerateObjects())
    {
        Console.WriteLine($"0x{obj.Address:x} - {obj.Type.Name}");
    }
}
Because you have programmatic access to the same properties of an object as in WinDbg, you can filter by the same criteria, or even come up with more complex criteria to find and analyze objects.
So far, we have looked at ways to analyze each discrete object. That certainly comes in handy while debugging, but often when we are analyzing overall behavior, we want to consider all of the objects in aggregate.
Starting with Visual Studio 2013 Premium Edition (Enterprise Edition starting in Visual Studio 2015), there is a managed heap analyzer. You can access it after opening a managed memory dump by selecting “Debug Managed Memory”.
From here, you can do three things:
There is also a feature in the Performance Profiler to get heap snapshots during runtime. To access this, go to the Analyze | Performance Profiler, then select Memory Usage. The output is a graph of memory usage against garbage collections. While the analysis is running, you can take snapshots of the heap whenever you want.
Clicking the size or object count in a snapshot will take you to a table of all the objects on the heap, and their paths to the root (what is keeping them alive).
Each snapshot also allows you to see just the objects that changed from the previous snapshot, helping you analyze allocations over time.
These options give you a fairly basic, but useful overview of your heap. If you need more analytical power, then I recommend PerfView. PerfView will not show you individual objects, but its ability to show aggregated object relationships is unparalleled.
To use this feature in PerfView:
You should see a table like this:
It tells you immediately that D accounts for 88% of the program’s memory at 462 MB with 924 objects. You can also see that local variables are holding on to 258 MB of memory and the staticArray object is holding on to 263 MB of memory.
PerfView is somewhat unique in that you can control how the sub-objects contribute to the size of their parent objects. This is done with the folding configuration. You can specify a folding percentage, below which all memory is attributed to the parent object, or a folding pattern to specify that certain object types are always folded into their parent objects (they effectively disappear from analysis). See Chapter 1 for more details on how to use PerfView.
You can also get a graphical view of the same information with CLR Profiler. While the program is running, click the Show Heap Now button to capture a heap sample.
There are many ways memory can leak, and all of the sections under “Investigating Memory and GC” in this chapter can help you narrow problems down, but there are a few general ways memory can leak in managed applications:
In Visual Studio (Premium or Enterprise editions), you can open up the heap dump and debug the managed heap. When you click on a type, the tabs below will allow you to see which other types are referencing those objects.
You can use PerfView for more detailed analysis:
Once the snapshot is completed, a file will show up in the left-hand pane. Double-click this file to open a view of the types in the heap. You can manipulate this view as any of the other stack views in PerfView (e.g., with grouping, folding, and filtering). You can double-click the entry for a type, which switches to the Referred-From view.
This view clearly shows that the D objects belong to the staticArray variable and a local variable (local variables lose their names during compilation).
You can generally get a good sense for what is on the heap from this view. If you take two dumps separated in time, then you can use the Diff menu to calculate a difference between the two snapshots. This can give you an idea for what is accumulating uncollected, if anything.
Visual Studio and PerfView are mostly useful for aggregate analysis. PerfView is a sampling profiler, even when it analyzes the heap, so it will sometimes give a skewed picture of what the heap looks like. If you need to drill down onto a specific object, or get the absolute truth about the whole picture, then you need to start using the debugger or CLR MD.
In WinDbg, to get a quick summary of what is on the heap, run the !DumpHeap -stat
command:
0:023> !DumpHeap -stat
...
71f718f8 8752 525120 System.Reflection.RuntimeMethodInfo
139e5424 15138 544968 System.Collections.Immutable.Sort...
71f7ffe4 11294 573796 System.Object[]
1370f7d0 4605 626280 Microsoft.VisualStudio.Compositio...
13707114 6190 990400 Microsoft.VisualStudio.Compositio...
1370f24c 5482 1227968 Microsoft.VisualStudio.Compositio...
71f8419c 4799 4684529 System.Byte[]
71f7fbf0 108732 8303452 System.String
00586810 30707 72014878 Free
It will produce a lot of output. I usually scroll to the end of the object summary to look at the largest consumers of heap space. (Note that after the object summary, it prints a list of objects that appear after free blocks—you want to scroll above that.)
If you do this a few times, letting the application run (and presumably leak) in between, you can get a sense of which objects are taking up space. If you see the Free size increasing, that is an indication of either no collections happening or heap fragmentation. See later in this chapter for how to diagnose fragmentation.
The downside of WinDbg is that it is harder to get an overall picture of object ownership, especially for common objects like System.Byte[] or System.String. For this, use PerfView as described above.
If you want to analyze a single object, you will need to get its address first. To get the addresses of objects, use the !DumpStackObjects command, or use !DumpHeap to find objects of interest on the heap, as in this example:
0:004> !DumpHeap -type LargeMemoryUsage.C
Address MT Size
021b17f0 007d3954 12
021b664c 007d3954 12
...
Statistics:
MT Count TotalSize Class Name
007d3954 475 5700 LargeMemoryUsage.C
Total 475 objects
Once you have the object’s address, you can use the !gcroot command:
0:003> !gcroot 02ed1fc0
HandleTable:
012113ec (pinned handle)
-> 03ed33a8 System.Object[]
-> 02ed1fc0 System.Random
Found 1 unique roots (run '!GCRoot -all' to see all roots).
!gcroot is often adequate, but it may miss some cases, in particular if your object is rooted from an older generation. For this, you will need to use the !findroots command.
In order for this command to work you first need to set a breakpoint in the GC, right before a collection is about to happen, which you can do by executing:
!findroots -gen 0
g
This sets a breakpoint right before the next gen 0 GC happens. It then loses effect and you will need to run the command again to break on the following GC.
Once the code breaks, you need to find the object you are interested in and execute this command with its address:
!findroots 027624fc
If the object is already in a higher generation than the current collection generation, you will see output like this:
Object 027624fc will survive this collection:
gen(0x27624fc) = 1 > 0 = condemned generation.
If the object itself is in the current generation being collected, but it has roots from an older generation, you will see something like this:
older generations::Root: 027624fc (object)->
023124d4(System.Collections.Generic.List`1
[[System.Object, mscorlib]])
If that is too tedious, you can build your own !gcroot command using CLR MD.
const string TargetType = "LargeMemoryUsage.D";
private static void PrintRootsOfObjects(ClrRuntime clr)
{
PrintHeader("Roots of Object");
Dictionary<ulong, ClrObject> childToParents =
new Dictionary<ulong, ClrObject>();
var heap = clr.Heap;
// Find an arbitrary object for demo purposes
ClrObject targetObject = FindObjectOfType(clr, TargetType);
if (targetObject.Address == 0)
{
Console.WriteLine(
$"Could not find any objects of type {TargetType}");
return;
}
// Analyze all objects, build up reference map
foreach (var obj in heap.EnumerateObjects())
{
foreach (var objRef in obj.EnumerateObjectReferences())
{
childToParents[objRef.Address] = obj;
}
}
// Walk up the chain of references
ClrObject currentObj = targetObject;
int indentSize = 0;
while(true)
{
Console.Write(new string(' ', indentSize));
Console.WriteLine(
$"0x{currentObj.Address:x} - {currentObj.Type.Name}");
ClrObject parentObject;
if (!childToParents.TryGetValue(currentObj.Address,
out parentObject))
{
break;
}
currentObj = parentObject;
indentSize += 4;
}
}
private static ClrObject FindObjectOfType(ClrRuntime clr,
                                          string typeName)
{
    foreach (var obj in clr.Heap.EnumerateObjects())
    {
        // Compare against the requested type name, not the global constant
        if (obj.Type.Name == typeName)
        {
            return obj;
        }
    }
    return new ClrObject();
}
This produces output similar to the following:
Roots of Object
===============
0x2e46bfc - LargeMemoryUsage.D
0x2e43428 - System.Object[]
Calculating an object’s size is a bit tricky. Do you mean the size of all the fields in that object? What if there is a reference to another object, such as an array? Is that included? What if two objects both refer to each other?
Thankfully, most tools that show object size follow the same basic approach: an object’s exclusive size covers only the object itself and its fields, while its inclusive (total) size adds every object it references, directly or indirectly, with each reached object counted only once no matter how many references point to it.
To get object sizes in Visual Studio, use the Memory Usage profiler:
If you do not see the level of detail you expect, make sure that the table’s view options have “Collapse Small Objects” and “Just My Code” turned off.
In WinDbg, there are a couple of SOS commands that can show the same information. The !DumpObj command can show the exclusive size of an object:
0:007> !DumpObj /d 058e8230
Name: LargeMemoryUsage.D
MethodTable: 035d4e74
EEClass: 035d1870
Size: 12(0xc) bytes
File: D:\HighPerformanceDotNetBook\...\LargeMemoryUsage.exe
Fields:
MT Field Offset Type VT Attr Value Name
71b54080 4000003 4 System.Byte[] 0 instance 2a895510 memory
You can see that it does not take into account the owned byte array. For that, use the !ObjSize command:
0:007> !ObjSize 058e8230
sizeof(058e8230) = 1000028 (0xf425c) bytes (LargeMemoryUsage.D)
If you run !ObjSize without any parameters, it will show a list of all threads and GC handles, totaling up the size of objects rooted by each one.
0:007> !ObjSize
...
Thread 5580 (LargeMemoryUsage.Program.Main(System.String[])
[D:\HighPerformanceDotNetBook\...\Program.cs @ 29]):
ebp+1c: 012ff37c -> 05383448: 283846000 (0x10eb2570) bytes
(System.Object[])
...
Handle (pinned): 035b13ec -> 06383510: 286744176 (0x11175e70) bytes
(System.Object[])
Handle (pinned): 035b13f0 -> 06382500: 8864 (0x22a0) bytes
(System.Object[])
Handle (pinned): 035b13f4 -> 063822e0: 640 (0x280) bytes
(System.Object[])
Handle (pinned): 035b13f8 -> 0538121c: 12 (0xc) bytes
(System.Object)
Handle (pinned): 035b13fc -> 06381020: 8440 (0x20f8) bytes
(System.Object[])
CLR MD can also calculate this size, though you have to do the work of traversing the objects yourself.
private static void PrintObjectSize(ClrRuntime clr)
{
PrintHeader("Object Size");
var obj = FindObjectOfType(clr, TargetType);
Console.WriteLine($"0x{obj.Address:x} - {obj.Type.Name}");
var heap = clr.Heap;
// Evaluation stack
Stack<ulong> stack = new Stack<ulong>();
HashSet<ulong> considered = new HashSet<ulong>();
int count = 0;
ulong size = 0;
stack.Push(obj.Address);
while (stack.Count > 0)
{
var objAddr = stack.Pop();
if (considered.Contains(objAddr))
continue;
considered.Add(objAddr);
ClrType type = heap.GetObjectType(objAddr);
if (type == null)
{
continue;
}
count++;
size += type.GetSize(objAddr);
type.EnumerateRefsOfObject(objAddr,
delegate (ulong child,
int offset)
{
if (child != 0 && !considered.Contains(child))
stack.Push(child);
});
}
Console.WriteLine($"Object Size: {obj.Size}");
Console.WriteLine($"Full size: {size}");
}
The output looks like this:
Object Size
===========
0x4636c24 - LargeMemoryUsage.D
Object Size: 12
Full size: 1000024
If you are interested only in aggregate object sizes, then PerfView can give you this information and allow you to aggregate sub-objects in multiple ways to get very fine-grained analysis. This was described in the previous section.
Understanding which objects are being allocated on the large object heap is critical to ensuring a well-performing system. The important rule discussed earlier in this chapter states that all objects should be cleaned up in a gen 0 collection, or they need to live forever.
Large objects are cleaned up only by an expensive gen 2 GC, so they violate that rule out of the gate.
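You can observe this from code. The snippet below assumes the default LOH threshold of 85,000 bytes and uses GC.GetGeneration, which reports large objects as belonging to gen 2:

```csharp
// Arrays below the LOH threshold start in gen 0; those at or above it
// go straight to the large object heap, reported as gen 2.
var small = new byte[10000];
var large = new byte[100000];
Console.WriteLine(GC.GetGeneration(small));  // usually 0 (fresh allocation)
Console.WriteLine(GC.GetGeneration(large));  // 2 (large object heap)
```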
To find out which objects are on the LOH, use PerfView and follow the previously given instructions for getting a GC event trace. In the resulting GC Heap Alloc Stacks view, in the By Name tab, you will find a special node that PerfView creates called “LargeObject.” Double-click on this to go to the Callers view, which shows which “callers” LargeObject has. In the sample program, they are all Int32 arrays. Double-clicking on those in turn will show where the allocations occurred.
CLR MD can also tell you which objects are in the large object heap.
private static void PrintLOHObjects(ClrRuntime clr)
{
PrintHeader("LOH Objects (limit:10)");
int objectCount = 0;
const int MaxObjectCount = 10;
if (clr.Heap.CanWalkHeap)
{
foreach (var segment in clr.Heap.Segments)
{
if (segment.IsLarge)
{
for (ulong objAddr = segment.FirstObject;
objAddr != 0;
objAddr = segment.NextObject(objAddr))
{
var type = clr.Heap.GetObjectType(objAddr);
if (type == null)
{
continue;
}
var obj = new ClrObject(objAddr, type);
if (++objectCount > MaxObjectCount)
{
break;
}
Console.WriteLine(
$"{obj.Address} {obj.Type.Name}");
}
}
}
}
}
As covered earlier, a performance counter will tell you how many pinned objects the GC encounters during a collection, but that will not help you determine which objects are being pinned.
Use the Pinning sample project, which pins things via explicit fixed statements and by calling some Windows APIs.
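For reference, explicit pinning with fixed looks like this. This is a minimal sketch, not the sample project’s actual code, and it requires compiling with unsafe code enabled:

```csharp
// While inside the fixed block, buffer is pinned: the GC cannot move it,
// so the pointer could safely be handed to native code.
static unsafe void FillBuffer(byte[] buffer)
{
    fixed (byte* p = buffer)
    {
        for (int i = 0; i < buffer.Length; i++)
        {
            p[i] = (byte)i;
        }
    }
    // Once the fixed block exits, the object becomes movable again.
}
```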
Use WinDbg to view pinned objects with the !gchandles command:
0:010> !gchandles
Handle Type Object Size Data Type
...
003511f8 Strong 01fa5dbc 52 System.Threading.Thread
003511fc Strong 01fa1330 112 System.AppDomain
003513ec Pinned 02fa33a8 8176 System.Object[]
003513f0 Pinned 02fa2398 4096 System.Object[]
003513f4 Pinned 02fa2178 528 System.Object[]
003513f8 Pinned 01fa121c 12 System.Object
003513fc Pinned 02fa1020 4420 System.Object[]
003514fc AsyncPinned 01fa3d04 64
System.Threading.OverlappedData
You will usually see lots of System.Object[] objects pinned. The CLR uses these arrays internally for things like statics and other pinned objects. In the case above, you can see one AsyncPinned handle. This object is related to the FileSystemWatcher in the sample project.
Unfortunately, the debugger will not tell you why something is pinned, but often you can examine the pinned object and trace it back to the object that is responsible for it.
The following WinDbg session demonstrates tracing through object references to find higher-level objects that may give a clue to the origins of the pinned object. See if you can follow the trail of object references, starting with the address of the AsyncPinned handle from above.
0:010> !do 01fa3d04
Name: System.Threading.OverlappedData
MethodTable: 64535470
EEClass: 646445e0
Size: 64(0x40) bytes
File: C:\windows\Microsoft.Net\...\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
64927254 4000700 4 System.IAsyncResult 0 instance 020a7a60
m_asyncResult
64924904 4000701 8 ...ompletionCallback 0 instance 020a7a70
m_iocb
...
0:010> !do 020a7a70
Name: System.Threading.IOCompletionCallback
MethodTable: 64924904
EEClass: 6463d320
Size: 32(0x20) bytes
File: C:\windows\Microsoft.Net\...\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
649326a4 400002d 4 System.Object 0 instance 01fa2bcc _target
...
0:010> !do 01fa2bcc
Name: System.IO.FileSystemWatcher
MethodTable: 6a6b86c8
EEClass: 6a49c340
Size: 92(0x5c) bytes
File: C:\windows\Microsoft.Net\...\System.dll
Fields:
MT Field Offset Type VT Attr Value Name
649326a4 400019a 4 System.Object 0 instance 00000000 __identity
6a699b44 40002d2 8 ...ponentModel.ISite 0 instance 00000000 site
...
While the debugger gives you the maximum power, it is cumbersome at best. Instead, you can use PerfView, which can simplify a lot of the drudgery.
With a PerfView trace, you will see a view called “Pinning at GC Time Stacks” that will show you stacks of the objects being pinned across the observed collections.
You can also approach pinning problems by looking at the free space holes created in the various heaps, which is covered in the next section.
Fragmentation occurs when there are freed blocks of memory inside segments containing used blocks of memory. Fragmentation can occur at multiple levels: inside a GC heap segment, or at the virtual memory level for the whole process. Fragmentation becomes a problem when there are so many small free blocks that they are not usable for future allocations.
Fragmentation in gen 0 is usually not an issue unless you have a very severe pinning problem, where you have pinned so many objects that each block of free space is too small to fulfill any new allocations. This will cause the size of the small object heap to grow, and more garbage collections will occur.
Fragmentation is usually more of an issue in gen 2 or the large object heap, especially if you are not using background GC. You may see fragmentation rates that seem high, perhaps even 50%, but this is not necessarily an indication of a problem. Consider the size of the overall heap, and if it is acceptable and not growing over time, you probably do not need to take action.
First, you will want to know if fragmentation is happening at all. WinDbg can show you what percentage of a heap is free space using the !HeapStat command:
0:023> !HeapStat
Heap Gen0 Gen1 Gen2 LOH
Heap0 2870384 2423640 93212392 9692760
Free space: Percentage
Heap0 177940 21480 65552412 6324464 SOH: 66% LOH: 65%
This prints each heap and tells you the percentage of free space in both small and large object heaps. For large object heap fragmentation, you can often deduce the likely culprits just by looking at which objects are on the large object heap and examining their sizes and related code. See earlier in this chapter for information on how to find this out.
You can get a summary of types and objects that are adjacent to free blocks with the !DumpHeap -stat command. At the very end of the heap summary, there will be some output like this:
Fragmented blocks larger than 0.5 MB:
Addr Size Followed by
16b61000 1.7MB 16d08948 System.Byte[]
16d08d7c 1.7MB 16ec4aa4 System.Byte[]
16f530c4 6.0MB 1755fb10 System.Byte[]
175e978c 0.6MB 17680ae0 System.Byte[]
176b9694 1.8MB 1787fff4 System.Byte[]
1e461000 1.5MB 1e5d7300 System.Byte[]
1e5d7734 1.4MB 1e74660c System.Byte[]
1e746a40 2.4MB 1e9a20d8 System.Byte[]
If you need detailed information about fragmentation, including which specific objects are causing the free space holes, you can use other WinDbg commands.
Get a list of free blocks with !DumpHeap -type Free:
0:010> !DumpHeap -type Free
Address MT Size
02371000 008209f8 10 Free
0237100c 008209f8 10 Free
02371018 008209f8 10 Free
023a1fe8 008209f8 10 Free
023a3fdc 008209f8 22 Free
023abdb4 008209f8 574 Free
023adfc4 008209f8 46 Free
023bbd38 008209f8 698 Free
023bdfe0 008209f8 18 Free
023d19c0 008209f8 1586 Free
023d3fd8 008209f8 26 Free
023e578c 008209f8 2150 Free
...
For each block, figure out which heap segment it is in with !eeheap -gc.
0:010> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x02371018
generation 1 starts at 0x0237100c
generation 2 starts at 0x02371000
ephemeral segment allocation context: none
segment begin allocated size
02370000 02371000 02539ff4 0x1c8ff4(1871860)
Large object heap starts at 0x03371000
segment begin allocated size
03370000 03371000 03375398 0x4398(17304)
Total Size: Size: 0x1cd38c (1889164) bytes.
------------------------------
GC Heap Size: Size: 0x1cd38c (1889164) bytes.
Dump all of the objects in that segment, or within a narrow range around the free space.
0:010> !DumpHeap 0x02371000 02539ff4
Address MT Size
02371000 008209f8 10 Free
0237100c 008209f8 10 Free
02371018 008209f8 10 Free
02371024 713622fc 84
02371078 71362450 84
023710cc 71362494 84
02371120 713624d8 84
02371174 7136251c 84
023711c8 7136251c 84
0237121c 71362554 12
...
This is a manual and tedious process, but it does come in handy and you should understand how to do it. You can write scripts to process the output and generate the WinDbg commands for you based on previous output, but CLR Profiler can show you the same information in a graphical, aggregated manner that may be good enough for your needs.
PerfView can also tell you when fragmentation is occurring in the GCStats view. Look at the Frag % columns. However, it does not tell you why, exactly.
The CLR MD library allows you to build your own tool to highlight fragmentation. Each ClrObject has a Type property, which in turn has an IsFree boolean property indicating whether that type represents free space on the heap.
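A minimal sketch of such a tool, assuming the same CLR MD heap-walking APIs used earlier in this chapter (the method name is illustrative):

```csharp
// Walk each segment, totaling free vs. allocated bytes to estimate
// fragmentation per segment.
private static void PrintFragmentation(ClrRuntime clr)
{
    foreach (var segment in clr.Heap.Segments)
    {
        ulong freeBytes = 0;
        ulong totalBytes = 0;
        for (ulong objAddr = segment.FirstObject;
             objAddr != 0;
             objAddr = segment.NextObject(objAddr))
        {
            ClrType type = clr.Heap.GetObjectType(objAddr);
            if (type == null)
            {
                continue;
            }
            ulong size = type.GetSize(objAddr);
            totalBytes += size;
            if (type.IsFree)
            {
                freeBytes += size;
            }
        }
        if (totalBytes > 0)
        {
            Console.WriteLine(
                $"Segment 0x{segment.Start:x}: " +
                $"{100.0 * freeBytes / totalBytes:F1}% free");
        }
    }
}
```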
You may also get virtual memory fragmentation, which can cause an unmanaged allocation to fail because it cannot find a large enough range to satisfy the request. This can include allocating a new GC heap segment, which means your managed memory allocations will fail.
Use VMMap (part of SysInternals) to get a visual representation of your process. It will divide the process’s memory into managed, native, and free regions. Selecting the Free portion will show you all free segments. If the maximum size is insufficient for your requested memory allocation, you will get an OutOfMemoryException.
VMMap also has a fragmentation view that can show where these blocks fit in the overall process space.
You can also retrieve this information in WinDbg:
!address -summary
This command produces this output:
...
-- Largest Region by Usage -- Base Address -- Region Size --
Free 26770000 49320000 (1.144 Gb)
...
You can retrieve information about specific blocks with the command:
!address -f:Free
This produces output similar to:
BaseAddr EndAddr+1 RgnSize Type State Protect Usage
--------------------------------------------------------------
0 150000 150000 MEM_FREE PAGE_NOACCESS Free
Virtual memory fragmentation is more likely in 32-bit processes, where you are limited to just two gigabytes of address space for your program by default. The biggest symptom of this is an OutOfMemoryException. The easiest way to fix this is to convert your application to a 64-bit process, with its 128-terabyte address space. If you cannot do this, your only choice is to become far more efficient in memory allocations. You will need to compact the heaps and you may need to implement significant pooling.
You can retrieve this information from inside your app’s own code by using the GC.GetGeneration method and passing it the object in question.
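For example (a trivial sketch; the exact generation after a collection depends on GC behavior at that moment):

```csharp
// Query an object's current generation from within the application.
var data = new byte[1024];
Console.WriteLine(GC.GetGeneration(data));  // usually 0 for a fresh allocation
GC.Collect();
// If data survives the collection, it is promoted to a higher generation.
Console.WriteLine(GC.GetGeneration(data));
```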
In WinDbg, once you obtain the address of the object of interest (say, from !DumpStackObjects or !DumpHeap), use the !gcwhere command:
0:003> !gcwhere 02ed1fc0
Address Gen Heap segment begin allocated size
02ed1fc0 1 0 02ed0000 02ed1000 02fe5d4c 0x14(20)
In CLR MD, you can use the ClrHeap.GetGeneration method:
foreach(var obj in heap.EnumerateObjects())
{
int gen = heap.GetGeneration(obj.Address);
}
The simplest way to do this is to enumerate all objects that are in the gen 1 or gen 2 portions of the heap.
CLR MD can do this for you with minimal code:
foreach(var obj in heap.EnumerateObjects())
{
int gen = heap.GetGeneration(obj.Address);
if (gen > 0)
{
// do some analysis
}
}
On a big heap, it would be extremely inefficient to iterate through every object in the heap. If you are interested in just the gen 1 heap, for example, you can make this a little bit better by walking the heap per segment.
private static void PrintGen1ObjectsByHeapSegment(ClrRuntime clr)
{
PrintHeader("Gen1 Objects by Heap Segment");
if (clr.Heap.CanWalkHeap)
{
foreach(var segment in clr.Heap.Segments)
{
// Only the ephemeral segment contains gen0 and gen1
if (segment.IsEphemeral)
{
//get range of gen 1
ulong start = segment.Gen1Start;
ulong end = start + segment.Gen1Length;
Console.WriteLine(
$"Segment Info: Start: {start}, End {end}");
for (ulong objAddr = segment.FirstObject;
objAddr != 0;
objAddr = segment.NextObject(objAddr))
{
if (objAddr >= start && objAddr < end)
{
var type =
clr.Heap.GetObjectType(objAddr);
if (type == null)
{
continue;
}
var obj = new ClrObject(objAddr, type);
Console.WriteLine(
$"{obj.Address} {obj.Type.Name}");
}
}
break;
}
}
}
}
On the other hand, perhaps you want to debug which objects survive a specific garbage collection—perhaps you are in the debugger, sitting at a breakpoint, and you want to know what happens after the next GC. This is possible in WinDbg, but it is fairly involved.
In WinDbg, execute these commands:
!FindRoots -gen 0
g
This will set a breakpoint right before the next gen 0 collection begins. Once it breaks, you can send whatever commands you want to dump the objects on the heap. You can simply do:
!DumpHeap
This will dump every object on the heap, which may be excessive. Optionally, you can add the -stat parameter to limit output to a summary of the found objects (their counts, sizes, and types). However, if you want to limit your analysis to just gen 0, the !DumpHeap command allows you to specify an address range. Recall the description of memory segments from the top of the chapter and that gen 0 is at the end of the segment.
To get a list of heaps and segments, you can use the !eeheap -gc command:
0:003> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x02ef0400
generation 1 starts at 0x02ed100c
generation 2 starts at 0x02ed1000
ephemeral segment allocation context: none
segment begin allocated size
02ed0000 02ed1000 02fe5d4c 0x114d4c(1133900)
Large object heap starts at 0x03ed1000
segment begin allocated size
03ed0000 03ed1000 041e2898 0x311898(3217560)
Total Size: Size: 0x4265e4 (4351460) bytes.
------------------------------
GC Heap Size: Size: 0x4265e4 (4351460) bytes.
This command will give you a printout of each generation and each segment. The segment that contains gen 0 and gen 1 is called the ephemeral segment. !eeheap tells you the start of gen 0. To get the end of it, you merely need to find the segment that contains the start address; each segment’s listing includes its begin and allocated addresses and its size. In the example above, the ephemeral segment starts at 02ed0000 and ends at 02fe5d4c. Therefore, the range of gen 0 on this heap is 02ef0400 - 02fe5d4c.
Now that you know this, you can put some limits on the !DumpHeap command and print only the objects in gen 0:
!DumpHeap 02ef0400 02fe5d4c
Once you have done that, you will want to compare what happens as soon as the GC is complete. This is a little trickier. You will need to set a breakpoint on an internal CLR method. This method is called when the CLR is ready to resume managed code. If you are using workstation GC, call:
bp clr!WKS::GCHeap::RestartEE
For server GC:
bp clr!SVR::GCHeap::RestartEE
Once you have set the breakpoints, continue execution (F5 or the g command). Once the GC is complete, the program will break again and you can repeat the !eeheap -gc and !DumpHeap commands.
Now you have two sets of outputs and you can compare them to see what changed and which objects are remaining after a GC. By using the other commands and techniques in this section, you can see who maintains a reference to that object.
Note: If you use server GC, remember that there will be multiple heaps. To do this kind of analysis, you will need to repeat the commands for each heap. The !eeheap command will print information for every heap in the process.
When code explicitly calls GC.Collect, it is called an “induced” garbage collection, and there are counters and ETW events that surface this information. However, they will not tell you who is calling it. You can easily search your own code base in Visual Studio or any advanced text editor, but if that turns up nothing, you will need to set a breakpoint on the GC.Collect method itself to see how your program gets to it.
In WinDbg, set a managed breakpoint on the GC class’s Collect method:
!bpmd mscorlib.dll System.GC.Collect
Continue executing. Once the breakpoint is hit, to see the stack trace of who called the explicit collection, do:
!DumpStack
Because weak references are a type of GC handle, you can use the !gchandles command in WinDbg to find them:
0:003> !gchandles
Handle Type Object Size Data Type
006b12f4 WeakShort 022a3c8c 100 System.Diagnostics.Tracing...
006b12fc WeakShort 022a3afc 52 System.Threading.Thread
006b10f8 WeakLong 022a3ddc 32 Microsoft.Win32.UnsafeNati...
006b11d0 Strong 022a3460 48 System.Object[]
...
Handles:
Strong Handles: 11
Pinned Handles: 5
Weak Long Handles: 1
Weak Short Handles: 2
Weak Short handles are the normal weak references you may use. Weak Long handles track whether a finalized object has been resurrected (objects without a finalizer always have short handles). Resurrection can occur when an object has been finalized, and rather than letting the GC clean it up, you decide to reuse it by assigning the object to a new reference from the finalizer. This can be relevant for pooling scenarios. However, it is possible to do pooling without finalization, and given the complexities of resurrection, just avoid this in favor of deterministic methods.
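If you prefer programmatic access, CLR MD can enumerate handles as well. This sketch assumes the ClrRuntime.EnumerateHandles API and its HandleType enumeration, filtering down to the weak handles:

```csharp
// Print all weak references in the target process, the programmatic
// counterpart of scanning !gchandles output.
private static void PrintWeakHandles(ClrRuntime clr)
{
    foreach (var handle in clr.EnumerateHandles())
    {
        if (handle.HandleType == HandleType.WeakShort ||
            handle.HandleType == HandleType.WeakLong)
        {
            Console.WriteLine(
                $"0x{handle.Address:x} {handle.HandleType} " +
                $"-> {handle.Type?.Name}");
        }
    }
}
```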
WinDbg’s !FinalizeQueue command will show you all objects that are registered for finalization, as well as a summary of their types.
0:042> !FinalizeQueue
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------
generation 0 has 13 finalizable objects (288603b4->288603e8)
generation 1 has 6 finalizable objects (2886039c->288603b4)
generation 2 has 57247 finalizable objects (28828520->2886039c)
Ready for finalization 0 objects (288603e8->288603e8)
Statistics for all finalizable objects
(including all objects ready for finalization):
MT Count TotalSize Class Name
72753184 1 12 System.WeakReference`1...
6df6bea8 1 12 System.Windows.Forms.VisualStyles...
6df68c44 1 12 System.Windows.Forms.ImageList...
584582f0 1 12 System.WeakReference`1...
58443158 1 12 Microsoft.Build.BackEnd.Components...
...
If you want to see a summary of objects that are ready for finalization, you can execute:
!FinalizeQueue -detail
This will show you a list of type names that are currently “freachable” (that is, eligible to have their finalizers called). If you want to get the specific objects that are in that category, you can use the address range given in the output to dump all objects within the “ready for finalization” range:
!DumpHeap 288603e8 288606c4
In CLR MD, you can use the EnumerateFinalizableObjectAddresses method to enumerate all finalizable objects:
private static void PrintFinalizableObjects(ClrRuntime clr)
{
foreach (var objAddr in
clr.Heap.EnumerateFinalizableObjectAddresses())
{
ClrType type = clr.Heap.GetObjectType(objAddr);
if (type == null)
{
continue;
}
ClrObject obj = new ClrObject(objAddr, type);
// Do something with the object...
}
}
Unfortunately, this does not tell you whether those objects are ready for finalization.
You need to understand garbage collection in depth to truly optimize your applications. Choose the right configuration settings for your application, such as server GC if your application is the only one running on the machine, but be wary of advanced configuration settings. Ensure that object lifetimes are short, allocation rates are low, and that any objects that must live longer than the average GC frequency are pooled or otherwise kept alive in gen 2 forever. Make judicious use of stackalloc to avoid heap allocations.
Avoid pinning and finalizers if possible. Any LOH allocations should be pooled and kept around forever to avoid full GCs. Reduce LOH fragmentation by keeping objects a uniform size and occasionally compacting the heap on-demand. Consider GC notifications to avoid having full collections impact application processing at inopportune times.
The garbage collector is a deterministic component and you can control its operation by closely managing your object allocation rate and lifetime. You are not giving up control by adopting .NET’s garbage collector, but it does require a little more subtlety and analysis.