JIT Compilation

.NET code is distributed as assemblies of Microsoft Intermediate Language (MSIL, or just IL for short). This language is somewhat assembly-like, but simpler. If you wish to learn more about IL or other CLR standards, do an Internet search for “ECMA C# CLI standards.”

When your managed program executes, it loads the CLR, which starts running some wrapper code, all of it machine code. The first time a managed method is called from your assembly, what actually runs is a stub that invokes the Just-in-Time (JIT) compiler, which converts the IL for that method into the hardware’s machine instructions. This process is called just-in-time compilation (“JITting”). The stub is then replaced, and the next time the method is called, the machine instructions run directly. This means that the first time any method is called, there is always a performance hit. In most cases, this hit is small and can be ignored. Every time after that, the code executes directly and incurs no overhead.

Compilation and JITting flow.

While all code in a method will be converted to assembly instructions when JITted, some pieces may be placed into “cold” code sections of memory, separate from the method’s normal execution path. These rarely executed paths will thus not push out other code from the “warm” sections, allowing for better overall performance as the commonly executed code is kept in memory, but the cold pages may be paged out. For this reason, rarely used things like error and exception-handling paths can be expensive.

In most cases, a method will only need to be JITted once. The exception is when the method has generic type arguments. If any of the type arguments are value types, then new code must be generated for each type. If all the type parameters are reference types, then only one copy of the native code is generated because, despite the different types, in machine code, each reference just looks like a standard pointer (4 or 8 bytes). Note that this does not mean the fundamental types are the same—the type system still maintains integrity and, for example, List<string> is still distinct from List<Regex>. It is just that the underlying machine code implementation of the methods does not need to care about those distinctions, so it does not generate duplicate code.
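
As a hypothetical illustration (the method and type names here are made up), consider which instantiations of a generic method force new native code to be generated:

using System;
using System.Collections.Generic;

static class GenericSharingDemo
{
    // One generic method; how many native bodies get JITted depends on the
    // type arguments actually used at runtime.
    static T FirstOrDefaultItem<T>(List<T> list)
    {
        return list.Count > 0 ? list[0] : default(T);
    }

    static void Main()
    {
        // Value-type arguments: a separate native body per instantiation.
        FirstOrDefaultItem(new List<int> { 1 });      // code generated for <int>
        FirstOrDefaultItem(new List<double> { 2.0 }); // separate code for <double>

        // Reference-type arguments: one shared native body, even though
        // List<string> and List<Uri> remain distinct types to the type system.
        FirstOrDefaultItem(new List<string> { "a" });
        FirstOrDefaultItem(new List<Uri>());
    }
}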

You need to be concerned about JIT costs if this first-time warm-up cost is important to your application or its users. Most applications only care about steady-state performance, but if you must have extremely high availability, JIT can be an issue that you will need to optimize for. In this case, NGEN may be the right solution for you. This chapter will tell you how to take advantage of it, and how to decide whether it is even the right choice.

Benefits of JIT Compilation

Code that is just-in-time compiled has some significant advantages over compiled unmanaged code.

  1. Good Locality of Reference: Code that is used together will often be in the same page of memory or processor cache line, preventing expensive page faults and relatively expensive main memory accesses.
  2. Potentially Reduced Memory Usage: There is CLR overhead for managed DLLs and their metadata, but it only compiles those methods that are actually used.
  3. Cross-assembly Inlining: Methods from other DLLs, including the .NET Framework, can be inlined into your own application, which can be a significant time savings at runtime.

There is also a benefit of hardware-specific optimizations, but in practice there are only a few actual optimizations for specific platforms. However, it is becoming increasingly possible to target multiple platforms with the same code, and perhaps we will see more aggressive platform-specific optimizations in the future.

Most code optimizations in .NET do not take place in the language compiler (the transformation from C#/VB.NET to IL); rather, they occur on-the-fly in the JIT compiler.

JIT in Action

You can easily see the IL-to-assembly-code transformation in action. As a simple example, here is the JitCall sample program that demonstrates the code fix-up that JIT does behind the scenes:

static void Main(string[] args)
{      
  int val = A();
  int val2 = A();
  Console.WriteLine(val + val2);
}

[MethodImpl(MethodImplOptions.NoInlining)]
static int A()
{
  return 42;
}

This method is decorated with a MethodImpl attribute specifying MethodImplOptions.NoInlining. The reason for this is to force the method call to remain, even in optimized code. If it is not present, the call gets optimized out completely, as does the addition, leaving you with a constant value being put into the argument for the WriteLine method:

029D0450  mov         ecx,54h  
029D0455  call        72B2CE9C  
029D045A  ret  

We want to look at the JIT behavior of a simple method, so applying the attribute MethodImplOptions.NoInlining is a simple way of ensuring that the method sticks around for us to analyze.

To see what happens, first get the disassembly of Main. Getting to this point is a little bit of a trick. First, we need to launch the program and interrupt it before Main executes.

  1. Launch WinDbg.
  2. File | Open Executable… (Ctrl+E).
  3. Navigate to the JitCall binary. Make sure you pick the Release version of the binary or the assembly code will look quite different than what is printed here.
  4. The debugger will immediately break.
  5. Run the command: sxe ld clrjit. This will cause the debugger to break when clrjit.dll is loaded. This is convenient because once this is loaded you can set a breakpoint on the Main method before it is executed.
  6. Run the command: g.
  7. The program will execute until clrjit.dll is loaded and you see output similar to the following:
ModLoad: 6fe50000 6fecd000
  C:\Windows\Microsoft.NET\Framework\v4.0.30319\clrjit.dll

Next, we’ll break into Main:

  1. Run the command: .loadby sos clr.
  2. Run the command: !bpmd JitCall Program.Main. This sets the breakpoint at the beginning of the Main function.
  3. Run the command: g.
  4. WinDbg will break right inside the Main method. You should see output similar to this:
(11b4.10f4): CLR notification exception 
  - code e0444143 (first chance)
JITTED JitCall!JitCall.Program.Main(System.String[])
Setting breakpoint: bp 007A0050 
  [JitCall.Program.Main(System.String[])]
Breakpoint 0 hit

Finally, we are in the right place. Open the Disassembly window (Alt+7). You may also find the Registers window interesting (Alt+4).

The disassembly of Main looks like this:

push  ebp
mov   ebp,esp
push  edi
push  esi

; Call A
call  dword ptr ds:[0E537B0h] ds:002b:00e537b0=00e5c015
mov   edi,eax
call  dword ptr ds:[0E537B0h]
mov   esi,eax

call  mscorlib_ni+0x340258 (712c0258)
mov   ecx,eax
add   edi,esi
mov   edx,edi
mov   eax,dword ptr [ecx]
mov   eax,dword ptr [eax+38h]

; Call Console.WriteLine
call  dword ptr [eax+14h]
pop   esi
pop   edi
pop   ebp
ret

There are two calls to the same pointer (the specific values you see will differ). This is the function call to A. Set break points on both of these lines and start stepping through the code one instruction at a time, making sure to step into the calls. The pointer at 0E537B0h will get updated after the first call.

Stepping into the first call to A, you can see that it is little more than a jmp to the CLR method ThePreStub. There is no return from this method here because ThePreStub will do the return.

mov   al,3
jmp   00e5c01d
mov   al,6
jmp   00e5c01d
(00e5c01d) movzx   eax,al
shl   eax,2
add   eax,0E5379Ch
jmp   clr!ThePreStub (72102af6)

On the second call to A, you can see that the function address of the original pointer was updated and the code at the new location looks more like a real method. Notice the 2Ah (our decimal 42 constant value from the source) being assigned and returned via the eax register.

012e0090 55        push  ebp
012e0091 8bec      mov   ebp,esp
012e0093 b82a000000    mov   eax,2Ah
012e0098 5d        pop   ebp
012e0099 c3        ret

For most applications, this first-time, or warm-up, cost is not significant, but there are certain types of code that lend themselves to high JIT time, which we will examine in the next few sections.

JIT Optimizations

The JIT compiler will perform some standard optimizations such as method inlining and array range check elimination, but there are things that you should be aware of that can prevent the JIT compiler from optimizing your code. Some of these topics have their own treatments in Chapter 5. Note that because the JIT compiler executes during runtime, it is limited in how much time it can spend doing optimizations. Despite this, it can do many important improvements.

One of the biggest classes of optimizations is method inlining, which puts the code from the method body into the call site, avoiding a method call in the first place. Inlining is critical for small methods which are called frequently, where the overhead of a function call is larger than the function’s own code.

All of these things prevent inlining:

  • Virtual methods.
  • Interfaces with diverse implementations in a single call site. See Chapter 5 for a discussion of the interface dispatch problem.
  • Loops.
  • Exception handling.
  • Recursion.
  • Method bodies larger than 32 bytes of IL. You can use an IL analysis tool (discussed in Chapter 1) to view the size of methods.
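
Conversely, if a small, hot method exceeds the JIT’s size heuristic, you can hint that it should be inlined anyway with MethodImplOptions.AggressiveInlining, available since .NET 4.5. A minimal sketch (the class and method names are mine):

using System.Runtime.CompilerServices;

static class InliningHints
{
    // Prevents inlining; handy when you need the method to remain visible
    // for profiling or analysis (as in the JitCall sample earlier).
    [MethodImpl(MethodImplOptions.NoInlining)]
    internal static int NotInlined(int x)
    {
        return x * 2;
    }

    // Hints that the JIT should inline this method even if it is larger than
    // the usual heuristic allows. The JIT can still decline, for example if
    // the method is virtual or contains loops or exception handling.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static int PreferInlined(int x)
    {
        return x * 2;
    }
}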

As of .NET 4.6, a new version of the JIT compiler is used. Known as RyuJIT, it features significantly faster code generation as well as improved quality of the generated code, particularly for 64-bit code.

Be careful of calling properties or methods inside loops. In most cases, the JIT cannot optimize these calls out. You should take care to make loop bodies as cheap as possible and do these optimizations yourself. If a method or property can be called outside of a loop, then it usually should, with the result being stored in a local variable.
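
For example, here is a small sketch (the type and data are made up for illustration) showing the same loop before and after hoisting the calls into locals:

using System.Collections.Generic;

class LoopHoistingDemo
{
    private readonly List<int> items = new List<int> { 5, 12, 7, 30 };

    // Hypothetical property; the JIT typically cannot prove it is loop-invariant.
    public int Threshold { get { return 10; } }

    public int CountAboveThresholdSlow()
    {
        int count = 0;
        for (int i = 0; i < items.Count; i++)   // property call every iteration
        {
            if (items[i] > Threshold)            // re-evaluated every iteration
            {
                count++;
            }
        }
        return count;
    }

    public int CountAboveThresholdFast()
    {
        int count = 0;
        int length = items.Count;     // hoisted out of the loop
        int threshold = Threshold;    // hoisted out of the loop
        for (int i = 0; i < length; i++)
        {
            if (items[i] > threshold)
            {
                count++;
            }
        }
        return count;
    }
}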

Reducing JIT and Startup Time

The other major factor in considering the JIT compiler is the amount of time it takes to generate the code. This mostly comes down to a factor of how much code needs to be JITted.

For example, large methods will take longer to JIT than shorter methods. If you have a single large method with a lot of branching in it, you still pay the JIT cost even if most of the method never executes. Breaking it up into smaller methods may help your up front JIT cost.

There are also language features and APIs that cause a large amount of code to be generated. In particular, be aware of the following situations:

  • LINQ
  • The dynamic keyword
  • async and await
  • Regular expressions
  • Code generation
  • Many types of serializers

All of these have one thing in common: far more code may be generated and executed than is apparent from your source. All of that hidden code may require significant time to be JITted. With regular expressions and generated code in particular, there is likely to be a pattern of large, repetitive blocks of code.

While code generation is usually something you would write for your own purposes, there are some areas of the .NET Framework that will do this for you, the most common being regular expressions and XML serialization. Before execution, regular expressions can be converted to an IL state machine in a dynamic assembly and then JITted. This takes more time up front, but saves a lot of time with repeated execution (as long as you make the regular expression static). You usually want to enable this option, but you probably want to defer it until it is needed so that the extra compilation does not impact application start time. Regular expressions can also trigger some complex algorithms in the JIT that take longer than normal, most of which are improved in the JIT that ships in .NET 4.6. As with everything else in the book, the only way to know for sure is to measure. See Chapter 6 for more discussion of regular expressions.
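
As a sketch of this pattern (the pattern string and class name are illustrative), you can combine RegexOptions.Compiled with lazy initialization so that the compilation and JIT cost is paid on first use rather than at application startup:

using System;
using System.Text.RegularExpressions;

static class LogParser
{
    // RegexOptions.Compiled emits the matcher as IL in a dynamic assembly,
    // which is then JITted: faster per match, but more work up front.
    // Lazy<T> defers that cost until the first time the pattern is needed.
    private static readonly Lazy<Regex> GuidRegex = new Lazy<Regex>(
        () => new Regex(
            @"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}",
            RegexOptions.Compiled));

    public static bool ContainsGuid(string line)
    {
        return GuidRegex.Value.IsMatch(line);
    }
}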

Even though code generation is implicated as a potential exacerbation of JIT challenges, as we will see in Chapter 5, code generation in a different context can get you out of some other performance issues.

LINQ’s syntactic simplicity can belie the amount of code that actually runs for each query. It can also hide things like delegate creation, memory allocations, and more. Simple LINQ queries may be OK, but as in most things, you should definitely measure.

The primary issue with dynamic code is, again, the sheer amount of code that it translates to. Jump to Chapter 5 to see what dynamic code looks like under the covers.

There are other factors besides JIT, such as I/O, that can increase your warm-up costs, and it behooves you to do an accurate investigation before assuming JIT is the only issue. Each assembly has a cost in terms of disk access for reading the file, internal overhead in the CLR data structures, and type loading. You may be able to reduce some of the I/O cost by combining many small assemblies into one large one, but type loading is likely to consume as much time as JITting.

If you do have a lot of JIT happening, you should see stacks containing calls to PreStubWorker and other methods that eventually end up inside clrjit.dll.

PerfView’s CPU profiling will show you any JIT stubs that are being called.

Later in this chapter, we will see how PerfView can show you exactly which methods are being JITted and how long each one took.

Optimizing JITting with Profiling (Multicore JIT)

.NET 4.5 added an API that tells .NET to profile your application’s startup and store the results on disk for future reference. On subsequent startups, this profile is used to start generating the machine code before it is executed, on a dedicated thread separate from your own code’s threads (which gives this feature its nickname, Multicore JIT). The saved profiles allow this pre-generated code to have all the same locality benefits as normal JITting. The profiles are updated automatically on each execution of your program.

To use it, simply call this at the beginning of your program:

ProfileOptimization.SetProfileRoot(@"C:\MyAppProfile");
ProfileOptimization.StartProfile("default");

Note that the profile root folder must already exist, and you can name your profiles, which is useful if your app has different modes with substantially different execution profiles.
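
Putting it together, a minimal sketch might look like this (the folder path, profile names, and command-line switch are purely illustrative):

using System.IO;
using System.Runtime;

class Program
{
    static void Main(string[] args)
    {
        // The root folder must exist before StartProfile is called.
        string profileRoot = @"C:\MyAppProfile";
        Directory.CreateDirectory(profileRoot);
        ProfileOptimization.SetProfileRoot(profileRoot);

        // Pick a profile per major mode so each gets an accurate startup trace.
        string profileName = (args.Length > 0 && args[0] == "/batch")
            ? "batch"
            : "interactive";
        ProfileOptimization.StartProfile(profileName);

        // ...rest of startup...
    }
}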

If you do use this feature, keep this in mind while doing your own profiling of startup performance: the applied optimizations will change your measurements. Depending on what you are looking for, you may want to temporarily disable multicore JIT.

When to Use NGEN

If application startup or warm-up costs are too high and the profile optimization mentioned earlier does not satisfy your performance requirements, then NGEN may be appropriate.

NGEN stands for Native Image Generator. It works by converting your IL assembly to a native image—in effect, running the JIT compiler and saving the results to a native image assembly cache. This results in faster startup time and less JIT overall. This native image should not be confused with native code in the sense of unmanaged code. Despite the fact that the image is now mostly assembly language, it is still a managed assembly because it must run under the CLR.

If your original assembly is called foo.dll, NGEN will generate a file called foo.ni.dll and put it in the native image cache. Whenever foo.dll is requested to be loaded, the CLR will inspect the cache for a matching .ni.dll file and verify that it matches the IL exactly. It does this using a combination of time stamps, names, and GUIDs to ensure that it is the correct file to load.

NGEN has its place, but it does have some disadvantages. The first is that you lose locality of reference, as all the code in an assembly is placed sequentially, regardless of how it is actually executed. In addition, you can lose certain optimizations such as cross-assembly inlining. You can get most of these optimizations back if all of the assemblies are available to NGEN at the same time. You must also update the native images every time there is a change—not a big deal, but an extra step for deployment. NGEN can be very slow and native images can be significantly larger than their managed counterparts. Sometimes, JIT will produce more optimized code, especially for commonly executed paths. Before deciding to use NGEN, remember the prime directive of performance: Measure, Measure, Measure! See the tips at the end of this chapter for how to measure JIT costs in your application.

Most usages of generics can be successfully NGENed, but there are cases where the compiler cannot figure out the right generic types ahead of time. This code will still be JITted at runtime. And of course, any time you rely on dynamic type loading or generation, those pieces cannot be NGENed ahead of time.

To NGEN an assembly from the command line, execute this command:

D:\>ngen install ReflectionExe.exe

1> Compiling assembly D:\...\ReflectionExe.exe (CLR v4.0.30319) ...
2> Compiling assembly ReflectionInterface, Version=1.0.0.0, 
     Culture=neutral, PublicKeyToken=null (CLR v4.0.30319) ...

From the output you see that there are actually two files being processed. NGEN will automatically look in the target file’s directory and NGEN any dependencies it finds. It does this by default to allow the code to make cross-assembly calls in an efficient way (such as inlining small methods). You can suppress this behavior with the /NoDependencies flag, but there may be a significant performance hit at runtime.

To remove an assembly’s native image from the machine’s native image cache, you can run:

D:\>ngen uninstall ReflectionExe.exe

Uninstalling assembly D:\...\ReflectionExe.exe

You can verify that a native image was created by displaying the native image cache:

D:\>ngen display ReflectionExe

NGEN Roots:
D:\Book\ReflectionExe\bin\Release\ReflectionExe.exe
NGEN Roots that depend on "ReflectionExe":
D:\Book\ReflectionExe\bin\Release\ReflectionExe.exe
Native Images:
ReflectionExe, Version=1.0.0.0, Culture=neutral, 
  PublicKeyToken=null

You can also display all cached native images by running the command:

ngen display

Note that a native image file is always larger than the purely managed IL version of it. The IL version is contained wholly inside the native image, and in addition, x86/x64 code can be more verbose than IL. For example, for the ReflectionExe.exe file used above, its file size is 5,632 bytes, while the native image in the GAC is 11,264 bytes. I have sometimes seen as much as a 4x file size increase, depending on the amount of metadata and type of code present in the managed assemblies.

Optimizing NGEN Images

I said above that one of the things you lose with NGEN is locality of reference. Starting with .NET 4.5, you can use a tool called Managed Profile Guided Optimization (MPGO) to fix this problem to a large extent. Similar to Profile Optimization for JIT, this is a tool that you manually run to profile your application’s startup (or whatever scenario you want). NGEN will then use the profile to create a native image that is better optimized for the common function chains.

MPGO is included with Visual Studio 2012 and higher. To use, run the command:

Mpgo.exe -scenario MyApp.exe -assemblyList *.* -OutDir c:\Optimized

This will cause MPGO to run on some framework assemblies and then it will execute MyApp.exe. Now the application is in training mode. You should exercise the application appropriately and then shut it down. This will cause a new, optimized assembly to be created in the C:\Optimized directory.

To take advantage of the optimized assembly, you must run NGEN on it:

Ngen.exe install C:\Optimized\MyApp.exe

This will create optimized images in the native image cache. Next time the application is run, these new images will be used.

To use the MPGO tool effectively, you will need to incorporate it into your build system so that its output is what gets shipped with your application.

.NET Native

If you are building Universal Windows Platform applications, then you can use .NET Native, a compiler that transforms your compiled managed application into native code, similar to NGEN, but with these advantages:

  • A newer compiler, based on the Visual C++ native compiler.
  • Self-contained applications. There is no CLR dependency, once compiled. The CLR is reduced to a single DLL that ships with your app. Any framework code that you actually use is statically linked inside your executable.

The compiler creates the CLR-in-a-DLL by running a dependency reducer engine on your code. This process is informally known as “tree shaking.” It analyzes your code, configuration, XAML files, type arguments, and more to determine everything that could possibly run. This produces fast, compact applications that start up with very little lag.

There are, however, a number of downsides, mostly driven by the requirement that no JITting is allowed:

  • No reflection
  • No dynamic assembly loading or code invocation
  • No serialization or deserialization
  • No COM interop (regular P/Invoke is OK)
  • Only works for Universal Windows Platform apps now

To use .NET Native, you need to create a new Universal Windows Platform application in Visual Studio. Release builds will automatically build with .NET Native.

Universal Windows Platform applications allow you to specify whether .NET Native is used. It is on by default for release builds.

Custom Warmup

If the other techniques mentioned in this chapter do not work, you can take matters into your own hands and implement a system to warm up your code by executing it before it has to handle a production workload. Just executing each method causes it to be JITted. This is a popular approach in online scenarios where wall-clock time is vitally important. You may have internal timeouts of a few milliseconds, or a client that gives up after a few seconds. If you have a large code base, the first request (or first many requests) may fail outright because of JIT delays.

If you have a significant amount of JIT, profile-guided optimization may not be good enough. You may just need to exercise the code in a test or offline mode before it handles a real workload.

Before deciding this is the right approach, it would be good to ask yourself some guiding questions:

  1. Is your code designed in such a way that it can easily be called in a warmup scenario?
  2. How long does warmup take? Can your application afford to take longer to startup?
  3. If you have a lot of code to warm up, can you JIT in parallel?
  4. If you need data for warmup, can it be generated automatically?
  5. Does your warmup code replay safe data (i.e., non-production, non-customer, replayable data)?
  6. Does your warmup code impact other systems? Even if your application can handle 32 cores of solid JITting, will this impact external systems in a negative manner?
  7. Will warming up code cause metrics to be impacted? Minimize these, split them out, or otherwise tune metrics to exclude warmup data.

Answering some of these questions may require some prototyping. Implementing warmup does not have to be a huge feature, but it is likely to be non-trivial.

Warmup is unlikely to cause every single piece of necessary code to JIT, but if it gets to a significant percentage (enough to avoid errors or timeouts later), then it can be good enough.
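
One alternative to replaying real requests is to force JITting directly with RuntimeHelpers.PrepareMethod, which compiles a method without executing it. This is only a sketch, not a complete solution: it does not run type initializers, populate caches, or warm any data paths the way real execution would, and some methods cannot be prepared at all.

using System;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

static class Warmup
{
    // Pre-JIT every non-generic method with a body in the given assembly.
    public static void PreJitAssembly(Assembly assembly)
    {
        var methods = assembly.GetTypes()
            .SelectMany(t => t.GetMethods(
                BindingFlags.Public | BindingFlags.NonPublic |
                BindingFlags.Static | BindingFlags.Instance |
                BindingFlags.DeclaredOnly))
            .Where(m => !m.ContainsGenericParameters
                     && m.GetMethodBody() != null);

        // JIT in parallel to shorten the warmup window.
        Parallel.ForEach(methods, m =>
        {
            try
            {
                RuntimeHelpers.PrepareMethod(m.MethodHandle);
            }
            catch (Exception)
            {
                // Some methods cannot be prepared; skip them.
            }
        });
    }
}

During startup, before taking traffic, you would call something like Warmup.PreJitAssembly(typeof(Program).Assembly) for each assembly you care about.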

When JIT Cannot Compete

The JIT is great for most applications. If there are performance problems, there are usually bigger issues than pure code generation quality or speed. However, there are some areas where the JIT has room to improve.

One major situation where the JIT compiler is not going to be quite as good as a native code compiler is with direct native memory access vs. managed array access. For one, accessing native memory directly usually means you can avoid the memory copy that will come with marshaling it to managed code. While there are ways around this with things like UnmanagedMemoryStream, which will wrap a native buffer inside a Stream, you are really just making an unsafe memory access.

If you do transfer the bytes to a managed buffer, the code that accesses the buffer will have boundary checks. In many cases, these checks can be optimized away, but it is not guaranteed. With managed buffers, you can wrap a pointer around them and do some unsafe access to get around some of these checks.
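
A minimal sketch of that technique follows (compile with /unsafe; the method names are mine):

static class BufferSum
{
    // Indexer access: the JIT eliminates the bounds check for this simple
    // pattern, but that is not guaranteed in more complex loops.
    public static long SumSafe(byte[] buffer)
    {
        long sum = 0;
        for (int i = 0; i < buffer.Length; i++)
        {
            sum += buffer[i];
        }
        return sum;
    }

    // Pinning the array and walking a raw pointer removes the checks entirely,
    // at the cost of safety.
    public static unsafe long SumUnsafe(byte[] buffer)
    {
        long sum = 0;
        fixed (byte* start = buffer)
        {
            byte* p = start;
            byte* end = start + buffer.Length;
            while (p < end)
            {
                sum += *p;
                p++;
            }
        }
        return sum;
    }
}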

If you find that unmanaged code really is more efficient at this kind of processing, you can try marshaling the entire data set to a native function via P/Invoke, compute the results with a highly optimized C++ DLL, and then return the results back to managed code. You will have to profile it to see if the data transfer cost is worth it.

Mature C++ compilers may also be better at other types of optimizations such as inlining or optimal register usage, but this is more likely to change with future versions of the JIT compiler.

With applications that do an extreme amount of array or matrix manipulation, you will have to consider this trade-off between performance and safety. For most applications, frankly, you will not have to care and the boundary checks are not a significant overhead. However, if you are doing significant mathematical manipulation, one possible option is to make explicit use of Single Instruction, Multiple Data (SIMD) instructions, which became available to the JIT compiler in .NET 4.6. See Chapter 6 for examples of how to use these.

Investigating JIT Behavior

Performance Counters

The CLR publishes a number of counters in the .NET CLR Jit category, including:

  • # of IL Bytes Jitted
  • # of Methods Jitted
  • % Time in Jit
  • IL Bytes Jitted / sec
  • Standard Jit Failures
  • Total # of IL Bytes Jitted (exactly the same as “# of IL Bytes Jitted”)

Those are all fairly self-explanatory except Standard Jit Failures. Failures can occur only if the IL cannot be verified or there is an internal JIT error.

Closely related to JITting, there is also a category for loading, called .NET CLR Loading. A few of them are:

  • % Time Loading
  • Bytes in Loader Heap
  • Total Assemblies
  • Total Classes Loaded
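
If you want to read these counters from code rather than PerfMon, here is a minimal sketch. The category and counter names are the ones listed above; the instance-name convention (process name without the extension) is my assumption and is worth verifying in PerfMon for your process.

using System;
using System.Diagnostics;

static class JitCounters
{
    public static void Print()
    {
        string instance = Process.GetCurrentProcess().ProcessName;

        using (var methodsJitted = new PerformanceCounter(
            ".NET CLR Jit", "# of Methods Jitted", instance, readOnly: true))
        using (var ilBytesJitted = new PerformanceCounter(
            ".NET CLR Jit", "# of IL Bytes Jitted", instance, readOnly: true))
        {
            Console.WriteLine("Methods JITted:  {0}", methodsJitted.NextValue());
            Console.WriteLine("IL bytes JITted: {0}", ilBytesJitted.NextValue());
        }
    }
}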

ETW Events

With ETW events, you can get extremely detailed performance information on every single method that gets JITted in your process, including the IL size, native size, and the amount of time it took to JIT.

  • MethodJittingStarted: A method is being JIT-compiled. Fields include:
    • MethodID: Unique ID for this method.
    • ModuleID: Unique ID for the module to which this method belongs.
    • MethodILSize: Size of the method’s IL.
    • MethodNameSpace: Full class name to which this method belongs.
    • MethodName: Name of the method.
    • MethodSignature: Comma-separated list of type names from the method signature.
  • MethodLoad_V1: A method is done JITting and has been loaded. Generic and dynamic methods do not use this version. Fields include:
    • MethodID: Unique ID for this method.
    • ModuleID: Unique ID for the module to which this method belongs.
    • MethodSize: Size of the compiled assembly code after JIT.
    • MethodStartAddress: Start address of the method.
    • MethodFlags:
      • 0x1: Dynamic method
      • 0x2: Generic method
      • 0x4: JIT-compiled (if missing, it was NGENed)
      • 0x8: Helper method
  • MethodLoadVerbose_V1: A generic or dynamic method has been JITted and loaded.
    • It has most of the same fields as MethodLoad_V1 and MethodJittingStarted.
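
If you want to consume these events yourself rather than through PerfView, the Microsoft.Diagnostics.Tracing.TraceEvent NuGet library can subscribe to them from a live ETW session (this requires administrative rights). The following is only a rough sketch, assuming the TraceEvent parser and property names mirror the ETW event and field names listed above:

using System;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Parsers;
using Microsoft.Diagnostics.Tracing.Session;

class JitEventMonitor
{
    static void Main()
    {
        // Creating a real-time ETW session requires elevation.
        using (var session = new TraceEventSession("JitMonitorSession"))
        {
            session.EnableProvider(
                ClrTraceEventParser.ProviderGuid,
                TraceEventLevel.Verbose,
                (ulong)ClrTraceEventParser.Keywords.Jit);

            var clrParser = new ClrTraceEventParser(session.Source);
            clrParser.MethodJittingStarted += data =>
            {
                Console.WriteLine("JIT start: {0}.{1} ({2} bytes of IL)",
                    data.MethodNamespace, data.MethodName, data.MethodILSize);
            };

            session.Source.Process();   // blocks until the session is disposed
        }
    }
}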

What Code Is Jitted?

If you need to audit the code in your process, perhaps to see which assembly uses the most memory after JIT, you will need to examine the IL and native code sizes of all the methods in the process.

Using CLR MD, you can analyze every method in the process, seeing how large the IL is, as well as the amount of native code JITted from the IL. Here is a method that prints the top 10 largest methods in a process:

class MethodSize
{
    public string Module { get; set; }
    public string TypeName { get; set; }
    public string Name { get; set; }
    public ulong ILSize { get; set; }
    public ulong NativeSize { get; set; }
}

const string TargetProcessName = "LargeMemoryUsage.exe";

private static void PrintTop10BiggestMethods(ClrRuntime clr)
{
    PrintHeader("Top 10 Methods");
    List<MethodSize> methods = new List<MethodSize>();
    
    for (int i = 0; i < clr.Modules.Count; i++)
    {
        // Only look at our own methods
        var module = clr.Modules[i];        
        if (!module.FileName.EndsWith(TargetProcessName))
        {
            continue;
        }
        string filename = Path.GetFileName(module.FileName);
        
        foreach (var type in module.EnumerateTypes())
        {
            for (var iMethod = 0; 
                 iMethod < type.Methods.Count; 
                 iMethod++)
            {
                ulong ilSize = 0;
                ulong nativeSize = 0;

                var method = type.Methods[iMethod];
                
                if (method.IL != null)
                {
                    ilSize += (ulong)method.IL.Length;

                    if (method.ILOffsetMap != null)
                    {
                        for (var iOffset = 0; 
                             iOffset < method.ILOffsetMap.Length; 
                             iOffset++)
                        {
                          var entry = method.ILOffsetMap[iOffset];
                          var size = entry.EndAddress - 
                                       entry.StartAddress;
                          nativeSize += size;
                        }
                    }
                }
                var methodSize = new MethodSize()
                {
                    Module = filename,
                    TypeName = type.Name,
                    Name = method.Name,
                    ILSize = ilSize,
                    NativeSize = nativeSize
                };
                methods.Add(methodSize);
            }
        }                
    }

    methods.Sort((a, b) =>
    {
        return -a.NativeSize.CompareTo(b.NativeSize);
    });

    Console.WriteLine(
      "Module, Type, Method, IL Size, Native Size");
    Console.WriteLine(
      "------------------------------------------");
    for (int i=0;i<Math.Min(10, methods.Count);i++)
    {
        var method = methods[i];
        Console.WriteLine(
          $"{method.Module}, {method.TypeName}, {method.Name}, " + 
          $"{method.ILSize}, {method.NativeSize}");
    }
}

This produces output similar to:

Top 10 Methods
==============
Module, Type, Method, IL Size, Native Size
------------------------------------------
LargeMemoryUsage.exe,LargeMemoryUsage.Program,Main,116,348
LargeMemoryUsage.exe,LargeMemoryUsage.Program,GetNewObject,67,250
LargeMemoryUsage.exe,LargeMemoryUsage.Base,.ctor,21,113
LargeMemoryUsage.exe,LargeMemoryUsage.Program,.cctor,16,88
LargeMemoryUsage.exe,LargeMemoryUsage.C,.ctor,14,70
LargeMemoryUsage.exe,LargeMemoryUsage.D,.ctor,14,69
LargeMemoryUsage.exe,LargeMemoryUsage.B,.ctor,14,69
LargeMemoryUsage.exe,LargeMemoryUsage.A,.ctor,14,69
LargeMemoryUsage.exe,LargeMemoryUsage.D,ToString,12,51
LargeMemoryUsage.exe,LargeMemoryUsage.C,ToString,12,51

What Methods and Modules Take the Longest To JIT?

In general, JIT time is directly proportional to the amount of IL instructions in a method, but this is complicated by the fact that type loading time can also be included in this time, especially the first time a module is used. Some patterns can also trigger complex algorithms in the JIT compiler, which may run longer. You can use PerfView to get very detailed information about JITting activity in your process. If you collect the standard .NET events, you will get a special view called “JITStats.” Here is some of the output from running it on the PerfCountersTypingSpeed sample project:

Name                          JitTime msec  Num Methods  IL Size  Native Size
PerfCountersTypingSpeed.exe           12.9            8    1,756        3,156

JitTime msec  IL Size  Native Size  Method Name
         9.7       22           45  PerfCountersTypingSpeed.Program.Main()
         0.3      176          313  PerfCountersTypingSpeed.Form1..ctor()
         1.4    1,236        2,178  PerfCountersTypingSpeed.Form1.InitializeComponent()
         0.8      107          257  PerfCountersTypingSpeed.Form1.CreateCustomCategories()
         0.3      143          257  PerfCountersTypingSpeed.Form1.timer_Tick(class System.Object,class System.EventArgs)
         0.1       23           27  PerfCountersTypingSpeed.Form1.OnKeyPress(class System.Object,class System.Windows.Forms.KeyPressEventArgs)
         0.2       19           36  PerfCountersTypingSpeed.Form1.OnClosing(class System.ComponentModel.CancelEventArgs)
         0.1       30           43  PerfCountersTypingSpeed.Form1.Dispose(bool)

The only method that takes more time to JIT than its IL size would suggest is Main, which makes sense because this is where you will pay for more loading costs.

Examine JITted Code

In WinDbg or Visual Studio, you can easily see the disassembled code around the current instruction location, and from there jump to anywhere else as well. In Visual Studio, when you are at a debug break point, right-click anywhere in the source and select “Go to disassembly.”

You can also easily get an annotated disassembly of a specific method directly in WinDbg using the !U command. To do this, you need the method’s MethodDesc structure pointer.

0:000> !DumpStack
OS Thread Id: 0x5580 (0)
Current frame: ntdll!NtDeviceIoControlFile+0xc
ChildEBP RetAddr  Caller, Callee
...
012ff2e4 7217c50a (MethodDesc 716f0d54 +0xe6 
  System.Console.ReadKey(Boolean)), calling 71995a48
012ff374 039b0514 (MethodDesc 035d4d64 +0x9c 
  LargeMemoryUsage.Program.Main(System.String[])), 
  calling (MethodDesc 716f0d54 +0 System.Console.ReadKey(Boolean))
012ff398 72cceb16 clr!CallDescrWorkerInternal+0x34
...

From this, we will use the MethodDesc value for the Main method, 035d4d64:

0:000> !U 035d4d64
Normal JIT generated code
LargeMemoryUsage.Program.Main(System.String[])
Begin 039b0478, size bc

D:\...\LargeMemoryUsage\Program.cs @ 16:
039b0478 55              push    ebp
039b0479 8bec            mov     ebp,esp
039b047b 57              push    edi
039b047c 56              push    esi
039b047d 53              push    ebx
039b047e 83ec10          sub     esp,10h
039b0481 b9926a6371      mov     ecx,offset
  mscorlib_ni!System.Collections.IStructuralEquatable.Equals+0x99 
  (71636a92)
039b0486 bae8030000      mov     edx,3E8h
039b048b e8202dc1ff      call    035c31b0 
  (JitHelp: CORINFO_HELP_NEWARR_1_OBJ)
039b0490 8945e4          mov     dword ptr [ebp-1Ch],eax

D:\...\LargeMemoryUsage\Program.cs @ 18:
039b0493 b9dc7eb471      mov     ecx,offset 
  mscorlib_ni+0x517edc (71b47edc) (MT: System.Random)
039b0498 e82b2cc1ff      call    035c30c8 
  (JitHelp: CORINFO_HELP_NEWSFAST)
039b049d 8bf0            mov     esi,eax
039b049f e8dc70326f      call    clr!SystemNative::GetTickCount 
  (72cd7580)
039b04a4 8bd0            mov     edx,eax
039b04a6 8bce            mov     ecx,esi
039b04a8 e817d40c6e      call    mscorlib_ni+0x44d8c4 (71a7d8c4) 
  (System.Random..ctor(Int32), mdToken: 060010dc)
...

Summary

To minimize the impact of JIT, carefully examine any areas of large amounts of generated code, whether from regular expressions, code generation, dynamic, or any other source. Use profile-guided optimization to decrease application startup time by pre-JITting the most useful code in parallel. Ensure you are using the latest version of .NET to take advantage of improvements in the JIT compiler.

To encourage function inlining, avoid things like virtual methods, loops, exception handling, recursion, or large method bodies. But do not sacrifice the integrity of your application by over-optimizing in this area.

Consider using NGEN for large applications or situations where you cannot afford the JIT cost during startup. Use MPGO to optimize the native images before using NGEN.

For Universal Windows Platform applications, ensure you are using .NET Native.

When nothing else works, develop a custom warmup strategy for your application to exercise your hot paths before they are needed for real work.
