13
DECOMPILING AND REVERSING MANAGED ASSEMBLIES

Mono and .NET use a VM, much as Java does, to run compiled executables. The executable format for .NET and Mono uses a higher-level bytecode than native x86 or x86_64 assembly, and executables in this format are called managed assemblies. This is in contrast to the native, unmanaged executables from languages like C and C++. Because managed assemblies are written in a higher-level bytecode, decompiling them is fairly straightforward if you use a few libraries that are not a part of the standard library.

In this chapter, we will write a short decompiler that accepts a managed assembly and writes the source code back to a specified folder. This is a very useful tool for malware researchers, reverse engineers, or anyone needing to perform binary diffing (comparing two compiled binaries or libraries for differences at the byte level) between two .NET libraries or applications. We will then briefly cover a program shipped with Mono called monodis that is very useful for analyzing assemblies outside of source code analysis for potential backdoors and other nefarious code.

Decompiling Managed Assemblies

A number of easy-to-use .NET decompilers exist. However, their UIs tend to use toolkits like WPF (Windows Presentation Foundation) that keep them from being cross-platform, so they mostly run only on Windows. Many security engineers, analysts, and pentesters run Linux or OS X, so these tools are of limited use to them. ILSpy is one example of a good Windows decompiler; it uses the cross-platform ICSharpCode.Decompiler and Mono.Cecil libraries for decompilation, but its UI is Windows specific, so it isn’t usable on Linux or OS X. Luckily, we can build a simple tool that takes an assembly as an argument and uses these two open source libraries to decompile it and write the resulting source code back to disk for later analysis.

Both of these libraries are available in NuGet. Installation will depend on your IDE; if you are using Xamarin Studio or Visual Studio, you can manage NuGet packages in the Solution Explorer for each project in the solution. Listing 13-1 details the whole class, with the methods required to decompile a given assembly.

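using System;
using System.Collections.Generic;
using System.IO;
using ICSharpCode.Decompiler;     // DecompilerContext, PlainTextOutput
using ICSharpCode.Decompiler.Ast; // AstBuilder (part of the 2.x releases of the library)
using Mono.Cecil;                 // AssemblyDefinition, ReaderParameters, ReadingMode

class MainClass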
{
  public static void Main(string[] args)
  {
    if (args.Length != 2)
    {
      Console.Error.WriteLine("Dirty C# decompiler requires two arguments.");
      Console.Error.WriteLine("decompiler.exe <assembly> <path to directory>");
      return;
    }

    IEnumerable<AssemblyClass> klasses = GenerateAssemblyMethodSource(args[0]);
    foreach (AssemblyClass klass in klasses)
    {
      string outdir = Path.Combine(args[1], klass.namespase);
      if (!Directory.Exists(outdir))
        Directory.CreateDirectory(outdir);

      string path = Path.Combine(outdir, klass.name + ".cs");
      File.WriteAllText(path, klass.source);
    }
  }

  private static IEnumerable<AssemblyClass> GenerateAssemblyMethodSource(string assemblyPath)
  {
    AssemblyDefinition assemblyDefinition = AssemblyDefinition.ReadAssembly(assemblyPath,
        new ReaderParameters(ReadingMode.Deferred) { ReadSymbols = true });
    foreach (var defmod in assemblyDefinition.Modules)
    {
      // Types lists the top-level types defined in this module.
      foreach (var typeInAssembly in defmod.Types)
      {
        AssemblyClass klass = new AssemblyClass();
        klass.name = typeInAssembly.Name;
        klass.namespase = typeInAssembly.Namespace;

        // Build an AST for this one type, using the module that defines it as context.
        AstBuilder astBuilder = new AstBuilder(new DecompilerContext(defmod)
            { CurrentType = typeInAssembly });
        astBuilder.AddType(typeInAssembly);

        using (StringWriter output = new StringWriter())
        {
          // Render the AST as C# source text.
          astBuilder.GenerateCode(new PlainTextOutput(output));
          klass.source = output.ToString();
        }
        yield return klass;
      }
    }
  }
}

public class AssemblyClass
{
  public string namespase;
  public string name;
  public string source;
}

Listing 13-1: The dirty C# decompiler

Listing 13-1 is pretty dense, so let’s go through the big points. In the MainClass, we first create a Main() method that will be run when we run the program. It begins by checking how many arguments are specified. If anything other than two arguments is specified, it prints the usage and exits. If two arguments are specified, we assume that the first is the path to the assembly we want to decompile and that the second is the folder where the resulting source code should be written. Finally, we pass the first argument to the GenerateAssemblyMethodSource() method, which is implemented just below the Main() method.

In the GenerateAssemblyMethodSource() method, we use the Mono.Cecil method ReadAssembly() to return an AssemblyDefinition. Basically, this is a class from Mono.Cecil that fully represents an assembly and allows you to programmatically probe it. Once we have the AssemblyDefinition for the assembly we want to decompile, we have what we need to generate C# source code that is functionally equivalent to the raw bytecode instructions in the assembly. We generate our C# code from the AssemblyDefinition by having the AstBuilder class from ICSharpCode.Decompiler create an abstract syntax tree (AST). I won’t go into ASTs (there are college courses dedicated to this subject), but you should know that an AST is a tree representation of a program’s structure and that ICSharpCode.Decompiler can build the AST of a .NET program from the types that Mono.Cecil reads.
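
To make “programmatically probe” a little more concrete, here is a minimal sketch that walks an AssemblyDefinition and prints each top-level type with its methods. It assumes the Mono.Cecil NuGet package is referenced; the CecilProbe class, the Summarize() helper, and the probe.exe name are hypothetical names made up for this illustration and are not part of Listing 13-1.

```csharp
using System;
using System.Collections.Generic;
using Mono.Cecil;

public static class CecilProbe
{
    // Hypothetical helper: returns one line per type and one indented line per method.
    public static List<string> Summarize(string assemblyPath)
    {
        var lines = new List<string>();
        AssemblyDefinition assembly = AssemblyDefinition.ReadAssembly(assemblyPath);

        foreach (ModuleDefinition module in assembly.Modules)
        {
            // module.Types holds top-level types only (including the special <Module> type).
            foreach (TypeDefinition type in module.Types)
            {
                lines.Add(type.FullName);
                foreach (MethodDefinition method in type.Methods)
                {
                    // Abstract and extern methods have no body, so guard the access.
                    int count = method.HasBody ? method.Body.Instructions.Count : 0;
                    lines.Add("  " + method.Name + " (" + count + " IL instructions)");
                }
            }
        }
        return lines;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: probe.exe <assembly>");
            return;
        }
        foreach (string line in Summarize(args[0]))
            Console.WriteLine(line);
    }
}
```

Because Mono.Cecil only parses the file and never executes it, a probe along these lines should be reasonable to point at a binary you don’t trust.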

This process must be repeated for every class in the assembly. Basic assemblies have only one or two classes, but complex applications can have dozens or more. That would be a pain to code individually, so we create a foreach loop to do the work for us. It iterates these steps over each class in the assembly and creates a new AssemblyClass (which is defined below MainClass) based on the current class information.

The part to note here is that the GenerateCode() method does the heavy lifting of the whole program: it takes the AST we created and gives us a C# source code representation of the class in the assembly. We then assign the generated C# source code to the source field on the AssemblyClass, alongside the name of the class and its namespace. Each finished class is handed back to the caller of the GenerateAssemblyMethodSource() method, in this case our Main() method. As we iterate over each class returned by GenerateAssemblyMethodSource(), we create a new file per class and write the source code for the class into the file. We use the yield keyword in GenerateAssemblyMethodSource() to return each class, one at a time, as we iterate in the foreach loop rather than building a full list of all the classes and then processing them. For binaries with a lot of classes, this means we never hold every class’s source in memory at once, and the first files are written before the rest have been decompiled.
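
The one-at-a-time behavior of yield can be seen in isolation with a small sketch that needs nothing beyond the standard library. The LazyYieldDemo class and its Decompile() method are hypothetical stand-ins for GenerateAssemblyMethodSource(); no real decompilation happens here. The log simply records the order in which the producer and the consumer run.

```csharp
using System;
using System.Collections.Generic;

public static class LazyYieldDemo
{
    // Hypothetical stand-in for GenerateAssemblyMethodSource(): each item is
    // "decompiled" only when the caller asks for the next element.
    public static IEnumerable<string> Decompile(IEnumerable<string> typeNames, List<string> log)
    {
        foreach (string name in typeNames)
        {
            log.Add("decompiled " + name);
            yield return "// source of " + name;
        }
    }

    public static void Main()
    {
        var log = new List<string>();

        // The consumer plays the role of the foreach loop in Main() from Listing 13-1.
        foreach (string source in Decompile(new[] { "A", "B", "C" }, log))
            log.Add("wrote " + source);

        // The log alternates between producer and consumer entries,
        // because nothing is decompiled ahead of time.
        Console.WriteLine(string.Join(Environment.NewLine, log));
    }
}
```

Calling Decompile() on its own does no work at all; the body only starts running on the first MoveNext(), which is what foreach calls behind the scenes.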

Testing the Decompiler

Let’s take a time-out to test this by writing a Hello World–esque application. Make a new project with the simple class in Listing 13-2 and then compile it.

using System;
namespace hello_world
{
  class MainClass
  {
    public static void Main(string[] args)
    {
      Console.WriteLine("Hello World!");
      Console.WriteLine(2 + 2);
    }
  }
}

Listing 13-2: A simple Hello World application before decompilation

After compiling the project, we point our new decompiler at it to see what it comes out with, as shown in Listing 13-3.

$ ./decompiler.exe ~/projects/hello_world/bin/Debug/hello_world.exe hello_world
$ cat hello_world/hello_world/MainClass.cs
using System;

namespace hello_world
{
  internal class MainClass
  {
    public static void Main(string[] args)
    {
      Console.WriteLine("Hello World!");
      Console.WriteLine(4);
    }
  }
}

Listing 13-3: The decompiled Hello World source code

Pretty close! The only real difference is the second WriteLine() method call. In the original code, we had 2 + 2, but the decompiled version outputs 4. This is not a problem. At compile time, the C# compiler evaluates constant expressions and stores only the result in the binary, so 2 + 2 gets written as 4 in the assembly. This is something to keep in mind when dealing with assemblies that perform a lot of math to achieve a given result.
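
One way to check the constant folding for yourself is to compare the raw IL of two methods through reflection. The sketch below uses only the standard library; ConstantFoldingDemo, Folded(), Literal(), and IlOf() are hypothetical names for this illustration. It assumes a runtime where MethodBase.GetMethodBody() is available, which is the case on Mono and the desktop .NET runtimes.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ConstantFoldingDemo
{
    // Written as an expression in source; the compiler folds it to a constant.
    public static int Folded() { return 2 + 2; }

    // Written as a literal in source.
    public static int Literal() { return 4; }

    // Returns the raw IL bytes of one of the public static methods above.
    public static byte[] IlOf(string methodName)
    {
        MethodInfo method = typeof(ConstantFoldingDemo).GetMethod(methodName);
        return method.GetMethodBody().GetILAsByteArray();
    }

    public static void Main()
    {
        byte[] folded = IlOf("Folded");
        byte[] literal = IlOf("Literal");

        Console.WriteLine("Folded:  " + BitConverter.ToString(folded));
        Console.WriteLine("Literal: " + BitConverter.ToString(literal));
        Console.WriteLine("Identical IL: " + folded.SequenceEqual(literal));
    }
}
```

Both methods should compile to the same bytes, with no add instruction (opcode 0x58) in either, which is why the decompiler has no way to recover the original 2 + 2.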

Using monodis to Analyze an Assembly

Say we want to do some cursory investigation into a malicious binary before decompiling it. The monodis tool that ships with Mono gives us a lot of power for doing this. It has specific strings-type options (strings is a common Unix utility that prints any human-readable string of characters found in a given file) and can list and export resources compiled into the assembly such as config files or private keys. The monodis usage output can be cryptic and hard to read, as shown in Listing 13-4 (though the man page is a little better).

$ monodis
monodis -- Mono Common Intermediate Language Disassembler
Usage is: monodis [--output=filename] [--filter=filename] [--help] [--mscorlib]
[--assembly] [--assemblyref] [--classlayout]
[--constant] [--customattr] [--declsec] [--event] [--exported]
[--fields] [--file] [--genericpar] [--interface] [--manifest]
[--marshal] [--memberref] [--method] [--methodimpl] [--methodsem]
[--methodspec] [--moduleref] [--module] [--mresources] [--presources]
[--nested] [--param] [--parconst] [--property] [--propertymap]
[--typedef] [--typeref] [--typespec] [--implmap] [--fieldrva]
[--standalonesig] [--methodptr] [--fieldptr] [--paramptr] [--eventptr]
[--propertyptr] [--blob] [--strings] [--userstrings] [--forward-decls] file ..

Listing 13-4: The monodis usage output

Running monodis on an assembly with no other options prints a full disassembly of the assembly in Common Intermediate Language (CIL) bytecode, or you can write the disassembly straight into a file with the --output option. Listing 13-5 shows some of the disassembly output of the ICSharpCode.Decompiler.dll assembly, which is roughly analogous to the x86 assembly language you may see for a natively compiled application.

$ monodis ICSharpCode.Decompiler.dll | tail -n30 | head -n10
   IL_000c:  mul
   IL_000d:  call class [mscorlib]System.Collections.Generic.EqualityComparer`1<!0> class [mscorlib]System.Collections.Generic.EqualityComparer`1<!'<expr>j__TPar'>::get_Default()
   IL_0012:  ldarg.0
   IL_0013:  ldfld !0 class '<>f__AnonymousType5`2'<!0,!1>::'<expr>i__Field'
   IL_0018:  callvirt instance int32 class [mscorlib]System.Collections.Generic.EqualityComparer`1<!'<expr>j__TPar'>::GetHashCode(!0)
   IL_001d:  add
   IL_001e:  stloc.0
   IL_001f:  ldc.i4 -1521134295
   IL_0024:  ldloc.0
   IL_0025:  mul
$

Listing 13-5: Some CIL disassembly from ICSharpCode.Decompiler.dll

That’s nice, but not very useful if you don’t know what you’re looking at. Notice that the output code looks similar to x86 assembly. This is actually raw intermediate language (IL), which is kind of like Java bytecode in JAR files, and it can seem a bit arcane. You’ll likely find this most useful when diffing two versions of a library to see what was changed.

The monodis tool has other great features that aid in reverse engineering. For instance, you can run the GNU strings utility on an assembly to see which strings are stored inside, but you always get cruft you don’t want, such as random byte sequences that just happen to be ASCII printable. If, on the other hand, you pass the --userstrings argument to monodis, it will print any strings that are stored for use in the code, such as variable assignments or constants, as Listing 13-6 shows. Since monodis actually parses the assembly to determine what strings have been programmatically defined, it can produce much cleaner results with a higher signal-to-noise ratio.

$ monodis --userstrings ~/projects/hello_world/bin/Debug/hello_world.exe
User Strings heap contents
00: ""
01: "Hello World!"
1b: ""
$

Listing 13-6: Using the --userstrings argument for monodis

You can also combine --userstrings with --strings (used for metadata and other things), which will output all strings stored in the assembly that aren’t the random garbage that GNU strings picks up. This is very useful when you look for encryption keys or credentials hardcoded into assemblies.

However, my favorite monodis flags are --manifest and --mresources. The first, --manifest, lists all the embedded resources in the assembly. These are usually images or configuration files, but sometimes you’ll find private keys and other sensitive material. The second argument, --mresources, saves each embedded resource to the current working directory. Listing 13-7 shows this in practice.

$ monodis --manifest ~/projects/hello_world/bin/Debug/hello_world.exe
Manifestresource Table (1..1)
1: public 'hello_world.til_neo.png' at offset 0 in current module
$ monodis --mresources ~/projects/hello_world/bin/Debug/hello_world.exe
$ file hello_world.til_neo.png
hello_world.til_neo.png: PNG image data, 1440 x 948, 8-bit/color RGBA, non-interlaced
$

Listing 13-7: Saving an embedded resource to the filesystem with monodis

Apparently, someone hid a picture of Neo in my Hello World application! To be sure, monodis is a favorite tool when I’m messing with an unknown assembly and I want to gain a little bit more information about it, such as methods or specific strings in the binary.
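
If you would rather do the same thing from C#, the standard reflection API exposes the same manifest resources. The following is a minimal sketch under a few assumptions: ResourceDumper and its methods are hypothetical names, and Assembly.LoadFrom() loads the target into the current process, so unlike monodis this approach is better kept for assemblies you already trust.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ResourceDumper
{
    // Rough equivalent of --manifest: the names of the embedded resources.
    public static string[] ListResources(Assembly assembly)
    {
        return assembly.GetManifestResourceNames();
    }

    // Rough equivalent of --mresources: writes each resource into outputDirectory
    // and returns how many files were written.
    public static int SaveResources(Assembly assembly, string outputDirectory)
    {
        Directory.CreateDirectory(outputDirectory);
        int saved = 0;
        foreach (string name in assembly.GetManifestResourceNames())
        {
            using (Stream input = assembly.GetManifestResourceStream(name))
            {
                if (input == null)
                    continue; // resource is not readable from this assembly

                // GetFileName guards against resource names containing path separators.
                string path = Path.Combine(outputDirectory, Path.GetFileName(name));
                using (FileStream output = File.Create(path))
                    input.CopyTo(output);
                saved++;
            }
        }
        return saved;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: resdump.exe <assembly> <output directory>");
            return;
        }
        Assembly assembly = Assembly.LoadFrom(args[0]);
        foreach (string name in ListResources(assembly))
            Console.WriteLine(name);
        Console.WriteLine("Saved " + SaveResources(assembly, args[1]) + " resource(s).");
    }
}
```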

Finally, we have one of the most useful arguments to monodis, --method, which lists all the methods and arguments available in a library or binary (see Listing 13-8).

$ monodis --method ch1_hello_world.exe
Method Table (1..2)
########## ch1_hello_world.MainClass
1: instance default void '.ctor' ()  (param: 1 impl_flags: cil managed )
2: default void Main (string[] args)  (param: 1 impl_flags: cil managed )

Listing 13-8: Demonstrating the --method argument for monodis

When you run monodis --method on the Hello World program from Chapter 1, you will notice that monodis prints two method lines. The first line is the constructor for the MainClass class that contains the Main() method, which appears on line 2. So, not only does this argument list all the methods (and which class those methods are in), but it also prints the class constructors! This can offer great insight into how a program may work: method names are often good descriptions of what is going on internally.
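
A similar listing can be produced from C# with reflection. This is only a sketch: MethodLister, Describe(), and the Sample class are hypothetical names for this illustration, and the output format is not meant to match monodis. As with any reflection-based approach, the type has to be loaded into the running process, so it is not a substitute for monodis when the binary is untrusted.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// A small hypothetical class to inspect; it gets a compiler-generated constructor.
public class Sample
{
    public void Run() { }
    private static int Helper(string input) { return input.Length; }
}

public static class MethodLister
{
    // Lists constructors and methods declared directly on the given type.
    public static List<string> Describe(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static | BindingFlags.DeclaredOnly;

        var lines = new List<string>();
        foreach (ConstructorInfo ctor in type.GetConstructors(flags))
            lines.Add(type.FullName + "::" + ctor.Name); // constructors are named ".ctor"
        foreach (MethodInfo method in type.GetMethods(flags))
            lines.Add(type.FullName + "::" + method.Name);
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(typeof(Sample)))
            Console.WriteLine(line);
    }
}
```

The DeclaredOnly flag keeps inherited members such as ToString() out of the list, which is closer to what the method table in the assembly actually contains.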

Conclusion

In the first part of this chapter, we discussed how to utilize the open source ICSharpCode.Decompiler and Mono.Cecil libraries to decompile an arbitrary assembly back into C# code. By compiling a small Hello World application, we saw one difference between the code that results from a decompiled assembly and that of the original source. Other differences may occur, such as the keyword var being replaced with the actual type of the object being created. However, the generated code should still be functionally equivalent, even if it isn’t completely the same source code as before.

Then, we used the monodis tool to see how to dissect and analyze assemblies to glean more information from a rogue application than we would easily have been able to do otherwise. Hopefully, these tools can decrease the time it takes to go from “What happened?” to “How do we fix it?” when something goes wrong or a new piece of malware is found.
