6
APPLICATION REVERSE ENGINEERING

If you can analyze an entire network protocol just by looking at the transmitted data, your job is relatively easy. But that isn’t possible with every protocol, especially those that use custom encryption or compression schemes. However, if you can get hold of the executables for the client or server, you can use binary reverse engineering (RE) to determine how the protocol operates and to search for vulnerabilities.

The two main kinds of reverse engineering are static and dynamic. Static reverse engineering is the process of disassembling a compiled executable’s native machine code and using the resulting disassembly to understand how the executable works. Dynamic reverse engineering involves executing an application and then using tools, such as debuggers and function monitors, to inspect the application’s runtime operation.

In this chapter, I’ll walk you through the basics of taking apart executables to identify and understand the code areas responsible for network communication.

I’ll focus on the Windows platform first, because you’re more likely to find applications without source code on Windows than you are on Linux or macOS. Then, I’ll cover the differences between platforms in more detail and give you some tips and tricks for working on alternative platforms; however, most of the skills you’ll learn will be applicable on all platforms. As you read, keep in mind that it takes time to become a good reverse engineer, and I can’t possibly cover the broad topic of reverse engineering in one chapter.

Before we delve into reverse engineering, I’ll discuss how developers create executable files and then provide some details about the omnipresent x86 computer architecture. Once you understand the basics of x86 architecture and how it represents instructions, you’ll know what to look for when you’re reverse engineering code.

Finally, I’ll explain some general operating system principles, including how the operating system implements networking functionality. Armed with this knowledge, you should be able to track down and analyze network applications.

Let’s start with background information on how programs execute on a modern operating system and examine the principles of compilers and interpreters.

Compilers, Interpreters, and Assemblers

Most applications are written in a higher-level programming language, such as C/C++, C#, Java, or one of the many scripting languages. When an application is developed, the raw language is its source code. Unfortunately, computers don’t understand source code, so the high-level language must be converted into machine code (the native instructions the computer’s processor executes) by interpreting or compiling the source code.

The two common ways of developing and executing programs are interpreting the original source code and compiling a program to native code. The way a program executes determines how we reverse engineer it, so let’s look at these two distinct methods of execution to get a better idea of how they work.

Interpreted Languages

Interpreted languages, such as Python and Ruby, are sometimes called scripting languages, because their applications are commonly run from short scripts written as text files. Interpreted languages are dynamic and speed up development time. But interpreters execute programs more slowly than code that has been converted to machine code, which the computer understands directly. To convert source code to a more native representation, the programming language can instead be compiled.

Compiled Languages

Compiled programming languages use a compiler to parse the source code and generate machine code, typically by first generating an intermediate form. For native code generation, that intermediate form is usually an assembly language specific to the CPU on which the application will run (such as 32- or 64-bit assembly). Assembly language is a human-readable representation of the underlying processor’s instruction set; it is converted to machine code using an assembler. For example, Figure 6-1 shows how a C compiler works.

image

Figure 6-1: The C language compilation process

To reverse a native binary to the original source code, you need to reverse the compilation using a process called decompilation. Unfortunately, decompiling machine code is quite difficult, so reverse engineers typically reverse just the assembly process using a process called disassembly.
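If you want to see these stages for yourself, you can compile a trivial C file and inspect each intermediate form. The commands in the comments below are the usual GCC and binutils invocations on a Unix-like system; adjust them for your own toolchain.

/* example.c: a trivial function for observing the compilation pipeline.
 *
 * Typical commands on a Unix-like system (adjust for your toolchain):
 *   gcc -S example.c      # compile to human-readable assembly (example.s)
 *   gcc -c example.c      # assemble to native machine code (example.o)
 *   objdump -d example.o  # disassemble the machine code back to assembly
 */
int add_one(int x)
{
    return x + 1;
}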

Static vs. Dynamic Linking

With extremely simple programs, the compilation process might be all that is needed to produce a working executable. But in most applications, a lot of code is imported into the final executable from external libraries by linking—a process that uses a linker program after compilation. The linker takes the application-specific machine code generated by the compiler, along with any necessary external libraries used by the application, and embeds everything in a final executable by statically linking any external libraries. This static linking process produces a single, self-contained executable that doesn’t depend on the original libraries.

Certain operations are handled in very different ways on different operating systems, so statically linking all code into one big binary isn’t always a good idea: the OS-specific implementation could change. For example, writing a file to disk involves very different operating system calls on Windows than it does on Linux. Therefore, compilers commonly link an executable to operating system–specific libraries by dynamic linking: instead of embedding the library’s machine code in the final executable, the compiler stores only a reference to the dynamic library and the required function. The operating system must resolve these linked references when the application runs.
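The resolution that the loader performs for dynamically linked imports can also be requested explicitly at runtime. The sketch below uses the POSIX dlopen and dlsym APIs to load a dynamic library and look up a function by name; the library name libm.so.6 is Linux specific, and on older C libraries you might need to link with -ldl, so treat this as an illustration rather than portable code.

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    // Load a dynamic library at runtime and resolve a symbol by name,
    // mirroring what the dynamic linker does for load-time imports.
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Look up the address of the cos function inside the loaded library.
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine != NULL) {
        printf("cos(0.0) = %f\n", cosine(0.0));
    }

    dlclose(handle);
    return 0;
}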

The x86 Architecture

Before getting into the methods of reverse engineering, you’ll need some understanding of the basics of the x86 computer architecture. For a computer architecture that is over 30 years old, x86 is surprisingly persistent. It’s used in the majority of desktop and laptop computers available today. Although the PC has been the traditional home of the x86 architecture, it has found its way into Mac computers, game consoles, and even smartphones.

The original x86 architecture was released by Intel in 1978 with the 8086 CPU. Over the years, Intel and other manufacturers (such as AMD) have improved its performance massively, moving from supporting 16-bit operations to 32-bit and now 64-bit operations. The modern architecture has barely anything in common with the original 8086, other than processor instructions and programming idioms. Because of its lengthy history, the x86 architecture is very complex. We’ll first look at how the x86 executes machine code, and then examine its CPU registers and the methods used to determine the order of execution.

The Instruction Set Architecture

When discussing how a CPU executes machine code, it’s common to talk about the instruction set architecture (ISA). The ISA defines how the machine code works and how it interacts with the CPU and the rest of the computer. A working knowledge of the ISA is crucial for effective reverse engineering.

The ISA defines the set of machine language instructions available to a program; each individual machine language instruction is represented by a mnemonic instruction. The mnemonics name each instruction and determine how its parameters, or operands, are represented. Table 6-1 lists the mnemonics of some of the most common x86 instructions. (I’ll cover many of these instructions in greater detail in the following sections.)

Table 6-1: Common x86 Instruction Mnemonics

Instruction               Description
MOV destination, source   Moves a value from source to destination
ADD destination, value    Adds an integer value to the destination
SUB destination, value    Subtracts an integer value from a destination
CALL address              Calls the subroutine at the specified address
JMP address               Jumps unconditionally to the specified address
RET                       Returns from a previous subroutine
RETN size                 Returns from a previous subroutine and then increments the stack by size
Jcc address               Jumps to the specified address if the condition indicated by cc is true
PUSH value                Pushes a value onto the current stack and decrements the stack pointer
POP destination           Pops the top of the stack into the destination and increments the stack pointer
CMP valuea, valueb        Compares valuea and valueb and sets the appropriate flags
TEST valuea, valueb       Performs a bitwise AND on valuea and valueb and sets the appropriate flags
AND destination, value    Performs a bitwise AND on the destination with the value
OR destination, value     Performs a bitwise OR on the destination with the value
XOR destination, value    Performs a bitwise Exclusive OR on the destination with the value
SHL destination, N        Shifts the destination to the left by N bits (with left being higher bits)
SHR destination, N        Shifts the destination to the right by N bits (with right being lower bits)
INC destination           Increments destination by 1
DEC destination           Decrements destination by 1

These mnemonic instructions take one of three forms depending on how many operands the instruction takes. Table 6-2 shows the three different forms of operands.

Table 6-2: Intel Mnemonic Forms

Number of operands  Form                Examples
0                   NAME                POP, RET
1                   NAME input          PUSH 1; CALL func
2                   NAME output, input  MOV EAX, EBX; ADD EDI, 1

The two common ways to represent x86 instructions in assembly are Intel and AT&T syntax. Intel syntax, originally developed by the Intel Corporation, is the syntax I use throughout this chapter. AT&T syntax is used in many development tools on Unix-like systems. The syntaxes differ in a few ways, such as the order in which operands are given. For example, the instruction to add 1 to the value stored in the EAX register would be written ADD EAX, 1 in Intel syntax and addl $1, %eax in AT&T syntax.

CPU Registers

The CPU has a number of registers for very fast, temporary storage of the current state of execution. In x86, each register is referred to by a two- or three-character label. Figure 6-2 shows the main registers for a 32-bit x86 processor. It’s essential to understand the types of registers the processor supports, because each serves a different purpose and you’ll need to know them to understand how the instructions operate.

image

Figure 6-2: The main 32-bit x86 registers

The x86’s registers are split into four main categories: general purpose, memory index, control, and selector.

General Purpose Registers

The general purpose registers (EAX, EBX, ECX, and EDX in Figure 6-2) are temporary stores for nonspecific values of computation, such as the results of addition or subtraction. The general purpose registers are 32 bits in size, although instructions can access them in 16- and 8-bit versions using a simple naming convention: for example, a 16-bit version of the EAX register is accessed as AX, and the 8-bit versions are AH and AL. Figure 6-3 shows the organization of the EAX register.

image

Figure 6-3: EAX general purpose register with small register components
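As a rough illustration of the layout in Figure 6-3, the following union shows how the smaller register names overlap the full 32-bit value on a little endian machine. It’s a sketch of the naming convention only, not a definition used by any real toolchain.

#include <stdint.h>

// AL is the lowest byte of EAX, AH the next byte up, and AX the low 16 bits.
union eax_layout {
    uint32_t eax;        // full 32-bit register
    uint16_t ax;         // low 16 bits
    struct {
        uint8_t al;      // bits 0-7
        uint8_t ah;      // bits 8-15
    } bytes;
};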

Memory Index Registers

The memory index registers (ESI, EDI, ESP, EBP, EIP) are mostly general purpose except for the ESP and EIP registers. The ESP register is used by the PUSH and POP instructions, as well as during subroutine calls, to indicate the current top of the stack in memory.

Although you can utilize the ESP register for purposes other than indexing into the stack, it’s usually unwise to do so because it might cause memory corruption or unexpected behavior. The reason is that some instructions implicitly rely on the value of the register. On the other hand, the EIP register cannot be directly accessed as a general purpose register because it indicates the next address in memory where an instruction will be read from.

The only way to change the value of the EIP register is by using a control instruction, such as CALL, JMP, or RET.

Control Registers

The control registers track and configure the processor’s execution state. For this discussion, the important control register is EFLAGS. EFLAGS contains a variety of Boolean flags that indicate the results of instruction execution, such as whether the last operation resulted in the value 0. These Boolean flags implement conditional branches on the x86 processor. For example, if you subtract two values and the result is 0, the Zero flag in the EFLAGS register will be set to 1, and flags that do not apply will be set to 0.

The EFLAGS register also contains important system flags, such as whether interrupts are enabled. Not all instructions affect the value of EFLAGS. Table 6-3 lists the most important flag values, including the flag’s bit position, its common name, and a brief description.

Table 6-3: Important EFLAGS Status Flags

Bit  Name           Description
0    Carry flag     Indicates whether a carry bit was generated from the last operation
2    Parity flag    The parity of the least-significant byte of the last operation
6    Zero flag      Indicates whether the last operation has zero as its result; used in comparison operations
7    Sign flag      Indicates the sign of the last operation; effectively, the most-significant bit of the result
11   Overflow flag  Indicates whether the last operation overflowed

Selector Registers

The selector registers (CS, DS, ES, FS, GS, SS) address memory locations by indicating a specific block of memory into which you can read or write. The real memory address used in reading or writing the value is looked up in an internal CPU table.

NOTE

Selector registers are usually only used in operating system–specific operations. For example, on Windows, the FS register is used to access memory allocated to store the current thread’s control information.

Memory is accessed using little endian byte order. Recall from Chapter 3 that little endian order means the least-significant byte is stored at the lowest memory address.

Another important feature of the x86 architecture is that it doesn’t require its memory operations to be aligned. All reads and writes to main memory on an aligned processor architecture must be aligned to the size of the operation. For example, if you want to read a 32-bit value, you would have to read from a memory address that is a multiple of 4. On aligned architectures, such as SPARC, reading an unaligned address would generate an error. Conversely, the x86 architecture permits you to read from or write to any memory address regardless of alignment.
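The following short sketch demonstrates both points: how a 32-bit value is laid out in little endian memory, and an unaligned 32-bit read of the kind x86 tolerates (memcpy is used so the C itself stays well defined; the alignment point is about the underlying hardware access).

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // On little endian x86, the least-significant byte is stored first, so
    // 0x01020304 appears in memory as the byte sequence 04 03 02 01.
    uint32_t value = 0x01020304;
    uint8_t bytes[8] = {0};
    memcpy(bytes, &value, sizeof(value));
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);

    // Reading a 32-bit value starting at bytes[1] is an unaligned access.
    // x86 performs it transparently; a strict-alignment architecture such
    // as SPARC would fault on the equivalent load instruction.
    uint32_t unaligned;
    memcpy(&unaligned, bytes + 1, sizeof(unaligned));
    printf("unaligned read: %08x\n", unaligned);
    return 0;
}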

Unlike architectures such as ARM, which use specialized instructions to load and store values between the CPU registers and main memory, many of the x86 instructions can take memory addresses as operands. In fact, the x86 supports a complex memory-addressing format for its instructions: each memory address reference can contain a base register, an index register, a multiplier for the index (between 1 and 8), or a 32-bit offset. For example, the following MOV instruction combines all four of these referencing options to determine which memory address contains the value to be copied into the EAX register:

MOV EAX, [ESI + EDI * 8 + 0x50]   ; Read 32-bit value from memory address

When a complex address reference like this is used in an instruction, it’s common to see it enclosed in square brackets.
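In C terms, that addressing mode computes a byte offset from a base pointer. The hypothetical helper below mirrors the calculation, assuming the base register holds a valid pointer and the index register holds an element index:

#include <stdint.h>

// Mirrors MOV EAX, [ESI + EDI * 8 + 0x50]: ESI acts as a base pointer, EDI
// as an index scaled by 8, with a fixed displacement of 0x50 added on.
uint32_t read_scaled(const uint8_t *base /* ESI */, uint32_t index /* EDI */)
{
    return *(const uint32_t *)(base + (index * 8) + 0x50);
}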

Program Flow

Program flow, or control flow, is how a program determines which instructions to execute. The x86 has three main types of program flow instructions: subroutine calling, conditional branches, and unconditional branches. Subroutine calling redirects the flow of the program to a subroutine—a specified sequence of instructions. This is achieved with the CALL instruction, which changes the EIP register to the location of the subroutine. CALL places the memory address of the next instruction onto the current stack, which tells the program flow where to return after it has performed its subroutine task. The return is performed using the RET instruction, which changes the EIP register to the top address in the stack (the one CALL put there).

Conditional branches allow the code to make decisions based on prior operations. For example, the CMP instruction compares the values of two operands (perhaps two registers) and calculates the appropriate values for the EFLAGS register. Under the hood, the CMP instruction does this by subtracting one value from the other, setting the EFLAGS register as appropriate, and then discarding the result. The TEST instruction does the same except it performs an AND operation instead of a subtraction.

After the EFLAGS value has been calculated, a conditional branch can be executed; the address it jumps to depends on the state of EFLAGS. For example, the JZ instruction will conditionally jump if the Zero flag is set (which would happen if, for instance, the CMP instruction compared two values that were equal); otherwise, the instruction is a no-operation. Keep in mind that the EFLAGS register can also be set by arithmetic and other instructions. For example, the SHL instruction shifts the value of a destination by a certain number of bits from low to high.
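To make this concrete, here’s a simple C comparison alongside a hedged sketch of the kind of CMP and conditional jump a compiler might emit for it (the exact instructions vary by compiler and optimization level):

// Returns the larger of two values.
int max_value(int a, int b)
{
    if (a >= b) {   // conceptually: CMP a, b then JL (jump if less) to the else path
        return a;   // fall-through path when the comparison finds a >= b
    }
    return b;       // the conditional branch lands here when a < b
}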

Unconditional branching program flow is implemented through the JMP instruction, which just jumps unconditionally to a destination address. There’s not much more to be said about unconditional branching.

Operating System Basics

Understanding a computer’s architecture is important for both static and dynamic reverse engineering. Without this knowledge, it’s difficult to ever understand what a sequence of instructions does. But architecture is only part of the story: without the operating system handling the computer’s hardware and processes, the instructions wouldn’t be very useful. Here I’ll explain some of the basics of how an operating system works, which will help you understand the processes of reverse engineering.

Executable File Formats

Executable file formats define how executable files are stored on disk. Operating systems need to specify the executables they support so they can load and run programs. Unlike earlier operating systems, such as MS-DOS, which had no restrictions on what file formats would execute (when run, files containing instructions would load directly into memory), modern operating systems have many more requirements that necessitate more complex formats.

Some requirements of a modern executable format include:

• Memory allocation for executable instructions and data

• Support for dynamic linking of external libraries

• Support for cryptographic signatures to validate the source of the executable

• Maintenance of debug information to link executable code to the original source code for debugging purposes

• A reference to the address in the executable file where code begins executing, commonly called the start address (necessary because the program’s start address might not be the first instruction in the executable file)

Windows uses the Portable Executable (PE) format for all executables and dynamic libraries. Executables typically use the .exe extension, and dynamic libraries use the .dll extension. Windows doesn’t actually need these extensions for a new process to work correctly; they are used just for convenience.

Most Unix-like systems, including Linux and Solaris, use the Executable and Linkable Format (ELF) as their primary executable format. The major exception is macOS, which uses the Mach-O format.

Sections

Memory sections are probably the most important information stored in an executable. All nontrivial executables will have at least three sections: the code section, which contains the native machine code for the executable; the data section, which contains initialized data that can be read and written during execution; and a special section to contain uninitialized data. Each section has a name that identifies the data it contains. The code section is usually called text, the data section is called data, and the uninitialized data is called bss.

Every section contains four basic pieces of information (sketched as a structure after this list):

• A text name

• A size and location of the data for the section contained in the executable file

• The size and address in memory where the data should be loaded

• Memory protection flags, which indicate whether the section can be written or executed when loaded into memory
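A hypothetical record capturing that per-section information might look like the following; the field names are illustrative, and real formats such as PE and ELF define their own layouts.

#include <stdint.h>

// Illustrative only: the kind of per-section metadata an executable stores.
struct section_info {
    char     name[8];       // e.g. ".text", ".data", ".bss"
    uint32_t file_offset;   // where the section's bytes live in the file
    uint32_t file_size;     // how many bytes of data are stored in the file
    uint32_t load_address;  // where the data should be loaded in memory
    uint32_t load_size;     // how much memory to reserve (can exceed file_size)
    uint32_t protection;    // read/write/execute flags for the loaded memory
};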

Processes and Threads

An operating system must be able to run multiple instances of an executable concurrently without them conflicting. To do so, operating systems define a process, which acts as a container for an instance of a running executable. A process stores all the private memory the instance needs to operate, isolating it from other instances of the same executable. The process is also a security boundary, because it runs under a particular user of the operating system and security decisions can be made based on this identity.

Operating systems also define a thread of execution, which allows the operating system to rapidly switch between multiple processes, making it seem to the user that they’re all running at the same time. This is called multitasking. To switch between processes, the operating system must interrupt what the CPU is doing, store the current process’s state, and restore an alternate process’s state. When the CPU resumes, it is running another process.

A thread defines the current state of execution. It has its own block of memory for a stack and somewhere to store its state when the operating system stops the thread. A process will usually have at least one thread, and the limit on the number of threads in the process is typically controlled by the computer’s resources.

To create a new process from an executable file, the operating system first creates an empty process with its own allocated memory space. Then the operating system loads the main executable into the process’s memory space, allocating memory based on the executable’s section table. Next, a new thread is created, which is called the main thread.

The dynamic linking program is responsible for linking in the main executable’s system libraries before jumping back to the original start address. When the operating system launches the main thread, the process creation is complete.

Operating System Networking Interface

The operating system must manage a computer’s networking hardware so it can be shared between all running applications. The hardware knows very little about higher-level protocols, such as TCP/IP, so the operating system must provide implementations of these higher-level protocols.

The operating system also needs to provide a way for applications to interface with the network. The most common network API is the Berkeley sockets model, originally developed for the BSD operating system at the University of California, Berkeley, in the early 1980s. All Unix-like systems have built-in support for Berkeley sockets. On Windows, the Winsock library provides a very similar programming interface. The Berkeley sockets model is so prevalent that you’ll almost certainly encounter it on a wide range of platforms.

Creating a Simple TCP Client Connection to a Server

To get a better sense of how the sockets API works, Listing 6-1 shows how to create a simple TCP client connection to a remote server.

int port = 12345;
const char* ip = "1.2.3.4";
sockaddr_in addr = {0};

int s = socket(AF_INET, SOCK_STREAM, 0);

addr.sin_family = AF_INET;
addr.sin_port = htons(port);
inet_pton(AF_INET, ip, &addr.sin_addr);

if(connect(s, (sockaddr*)&addr, sizeof(addr)) == 0)
{
    char buf[1024];
    int len = recv(s, buf, sizeof(buf), 0);

    send(s, buf, len, 0);
}

close(s);

Listing 6-1: A simple TCP network client

The first API call creates a new socket. The AF_INET parameter indicates we want to use the IPv4 protocol. (To use IPv6 instead, we would write AF_INET6). The second parameter SOCK_STREAM indicates that we want to use a streaming connection, which for the internet means TCP. To create a UDP socket, we would write SOCK_DGRAM (for datagram socket).
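For reference, those variants look like the following sketch (assuming the usual sockets headers):

#include <sys/socket.h>

// An IPv6 streaming (TCP) socket and an IPv4 datagram (UDP) socket.
int make_ipv6_tcp_socket(void)
{
    return socket(AF_INET6, SOCK_STREAM, 0);
}

int make_ipv4_udp_socket(void)
{
    return socket(AF_INET, SOCK_DGRAM, 0);
}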

Next, we construct a destination address with addr, an instance of the system-defined sockaddr_in structure. We set up the address structure with the address family, the TCP port, and the IP address. The call to inet_pton converts the string representation of the IP address in ip to its 32-bit binary form.

Note that when setting the port, the htons function is used to convert the value from host byte order (which for x86 is little endian) to network byte order (always big endian). The IP address must also be in network byte order; inet_pton produces it directly in that form, so the address 1.2.3.4 is stored as the big endian integer 0x01020304.

The final step is the call to connect to the destination address. This is the main point of failure, because the operating system now has to make an outbound connection attempt to see whether anything is listening at that address. Once the socket connection is established, the program can read and write data on the socket as if it were a file, via the recv and send calls. (On Unix-like systems, you can also use the general read and write calls, but those don’t work for sockets on Windows.)

Creating a Simple TCP Server That Accepts Client Connections

Listing 6-2 shows a snippet of the other side of the network connection, a very simple TCP socket server.

sockaddr_in bind_addr = {0};

int s = socket(AF_INET, SOCK_STREAM, 0);

bind_addr.sin_family = AF_INET;
bind_addr.sin_port = htons(12345);
inet_pton(AF_INET, "0.0.0.0", &bind_addr.sin_addr);

bind(s, (sockaddr*)&bind_addr, sizeof(bind_addr));
listen(s, 10);

sockaddr_in client_addr;
socklen_t socksize = sizeof(client_addr);
int newsock = accept(s, (sockaddr*)&client_addr, &socksize);

// Do something with the new socket

Listing 6-2: A simple TCP socket server

The first important step when setting up a TCP socket server is to bind the socket to an address on the local network interface. This is effectively the reverse of the client case in Listing 6-1: inet_pton() converts a string IP address into its binary form, and the bind call then binds the socket to that local address. Here the socket is bound to all network addresses, as signified by "0.0.0.0", on port 12345, although it could instead be bound to a specific address. By binding to all interfaces, we ensure the server socket will be accessible from outside the current system, such as over the internet, assuming no firewall is in the way.

Finally, the listing asks the operating system to listen for new incoming connections on the bound socket and then calls accept, which blocks until a client connects and returns a new socket for that connection. As with the client, this new socket can be read from and written to using the recv and send calls.

When you encounter native applications that use the operating system network interface, you’ll have to track down all these function calls in the executable code. Your knowledge of how programs are written at the C programming language level will prove valuable when you’re looking at your reversed code in a disassembler.

Application Binary Interface

The application binary interface (ABI) is an interface defined by the operating system that describes the conventions of how an application calls an API function, including how parameters are passed. On 32-bit x86, parameters are generally placed on the stack, laid out left to right so that the leftmost parameter in the original source code sits at the lowest stack address. Because the stack grows downward, this means that when the parameters are built by pushing them onto the stack, the last (rightmost) parameter is pushed first.

Another important consideration is how the return value is provided to the function’s caller when the API call is complete. In the x86 architecture, as long as the value is less than or equal to 32 bits, it’s passed back in the EAX register. If the value is between 32 and 64 bits, it’s passed back in a combination of EAX and EDX.
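For instance, with a typical 32-bit x86 compiler, a function returning a 64-bit integer hands the result back split across the two registers. A minimal sketch:

#include <stdint.h>

// On common 32-bit x86 ABIs, a 64-bit return value comes back with the low
// 32 bits in EAX and the high 32 bits in EDX.
uint64_t make_id(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}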

Both EAX and EDX are considered scratch registers in the ABI, meaning that their register values are not preserved across function calls: in other words, when calling a function, the caller can’t rely on any value stored in these registers to still exist when the call returns. This model of designating registers as scratch is done for pragmatic reasons: it allows functions to spend less time and memory saving registers, which might not be modified anyway. In fact, the ABI specifies an exact list of which registers must be saved into a location on the stack by the called function.

Table 6-4 contains a quick description of each register’s typical purpose in the ABI. The table also indicates whether the called function must save the register, restoring its original value before the function returns.

Table 6-4: Saved Register List

Register  ABI usage                                                 Saved?
EAX       Used to pass the return value of the function             No
EBX       General purpose register                                  Yes
ECX       Used for local loops and counters, and sometimes used
          to pass object pointers in languages such as C++          No
EDX       Used for extended return values                           No
EDI       General purpose register                                  Yes
ESI       General purpose register                                  Yes
EBP       Pointer to the base of the current valid stack frame      Yes
ESP       Pointer to the top of the stack                           Yes

Figure 6-4 shows an add() function being called in the assembly code for the print_add() function: it places the parameters on the stack (PUSH 10), calls the add() function (CALL add), and then cleans up afterward (ADD ESP, 8). The result of the addition is passed back from add() through the EAX register, which is then printed to the console.

image

Figure 6-4: Function calling in assembly code
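The C source behind Figure 6-4 isn’t reproduced in the text, but it would look something like the sketch below. The argument values are illustrative, chosen so that the caller pushes 10 as shown in the figure and then removes eight bytes of arguments.

#include <stdio.h>

// The result of the addition comes back to the caller in EAX.
int add(int a, int b)
{
    return a + b;
}

void print_add(void)
{
    // With the cdecl convention the caller pushes the arguments from right
    // to left (PUSH 10, then PUSH 1), executes CALL add, and afterward
    // removes the two 4-byte arguments from the stack with ADD ESP, 8.
    printf("%d\n", add(1, 10));
}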

Static Reverse Engineering

Now that you have a basic understanding of how programs execute, we’ll look at some methods of reverse engineering. Static reverse engineering is the process of dissecting an application executable to determine what it does. Ideally, we could reverse the compilation process to the original source code, but that’s usually too difficult to do. Instead, it’s more common to disassemble the executable.

Rather than attacking a binary with only a hex editor and a machine code reference, you can use one of many tools to disassemble binaries. One such tool is the Linux-based objdump, which simply prints the disassembled output to the console or to a file. Then it’s up to you to navigate through the disassembly using a text editor. However, objdump isn’t very user friendly.

Fortunately, there are interactive disassemblers that present disassembled code in a form that you can easily inspect and navigate. By far, the most fully featured of these is IDA Pro, developed by Hex-Rays. IDA Pro is the go-to tool for static reversing, and it supports many common executable formats as well as almost any CPU architecture. The full version is pricey, but a free edition is also available. Although the free version only disassembles x86 code and can’t be used in a commercial environment, it’s perfect for getting you up to speed with a disassembler. You can download the free version of IDA Pro from the Hex-Rays website at https://www.hex-rays.com/. The free version is only for Windows, but it should run well under Wine on Linux or macOS. Let’s take a quick tour of how to use IDA Pro to dissect a simple network binary.

A Quick Guide to Using IDA Pro Free Edition

Once it’s installed, start IDA Pro and then choose the target executable by selecting File ▸ Open. The Load a new file window should appear (see Figure 6-5).

This window displays several options, but most are for advanced users; you only need to consider a couple of them. The first option allows you to choose the executable format you want to inspect. The default shown in the figure, Portable executable, is usually the correct choice, but it’s always best to check. The Processor type option specifies the processor architecture, which defaults to x86. This option is especially important when you’re disassembling binary data for unusual processor architectures. When you’re sure the options are correct, click OK to begin disassembly.

Your choices for the first and second options will depend on the executable you’re trying to disassemble. In this example, we’re disassembling a Windows executable that uses the PE format with an x86 processor. For other platforms, such as macOS or Linux, you’ll need to select the appropriate options. IDA will make its best efforts to detect the format necessary to disassemble your target, so normally you won’t need to choose. During disassembly, it will do its best to find all executable code, annotate the decompiled functions and data, and determine cross-references between areas of the disassembly.

image

Figure 6-5: Options for loading a new file

By default, IDA attempts to provide annotations for variable names and function parameters if it knows about them, such as when calling common API functions. For cross-references, IDA will find the locations in the disassembly where data and code are referenced: you can look these up when you’re reverse engineering, as you’ll soon see. Disassembly can take a long time. When the process is complete, you should have access to the main IDA interface, as shown in Figure 6-6.

There are three important areas to pay attention to in IDA’s main interface. The first is the default disassembly view. In this example, it shows the IDA Pro graph view, which is often a very useful way to view an individual function’s flow of execution. To display a native view, which shows the disassembly in a linear format based on the loading addresses of the instructions, press the spacebar. The second area shows the status of the disassembly process as well as any errors that occur if you try to perform an operation IDA doesn’t understand. The third area contains the tabs of the currently open windows.

You can open additional windows in IDA by selecting View ▸ Open subviews. Here are some windows you’ll almost certainly need and what they display:

IDA View  Shows the disassembly of the executable

Exports  Shows any functions exported by the executable

Imports  Shows any functions dynamically linked into this executable at runtime

Functions  Shows a list of all functions that IDA Pro has identified

Strings  Shows a list of printable strings that IDA Pro has identified during analysis

image

Figure 6-6: The main IDA Pro interface

image

Figure 6-7: The back button for the IDA Pro disassembly window

Of the five window types listed, the last four are basically just lists of information. The IDA View is where you’ll spend most of your time when you’re reverse engineering, because it shows you the disassembled code. You can easily navigate around the disassembly in IDA View. For example, double-click anything that looks like a function name or data reference to navigate automatically to the location of the reference. This technique is especially useful when you’re analyzing calls to other functions: for instance, if you see CALL sub_400100, just double-click the sub_400100 portion to be taken directly to the function. You can go to the original caller by pressing the ESC key or the back button, highlighted in Figure 6-7.

In fact, you can navigate back and forth in the disassembly window as you would in a web browser. When you find a reference you want to follow, such as a function or data value, move the text cursor onto it and press X, or right-click and choose Jump to xref to operand, to bring up a cross-reference dialog listing all locations in the executable that reference that function or data value. Double-click an entry to navigate directly to the reference in the disassembly window.

NOTE

By default, IDA will generate automatic names for referenced values. For example, functions are named sub_XXXX, where XXXX is their memory address; the name loc_XXXX indicates branch locations in the current function or locations that are not contained in a function. These names may not help you understand what the disassembly is doing, but you can rename these references to make them more meaningful. To rename references, move the cursor to the reference text and press N or right-click and select Rename from the menu. The changes to the name should propagate everywhere it is referenced.

Analyzing Stack Variables and Arguments

Another feature in IDA’s disassembly window is its analysis of stack variables and arguments. When I discussed calling conventions in “Application Binary Interface” on page 123, I indicated that parameters are generally passed on the stack, but that the stack also stores temporary local variables, which are used by functions to store important values that can’t fit into the available registers. IDA Pro will analyze the function and determine how many arguments it takes and which local variables it uses. Figure 6-8 shows these variables at the start of a disassembled function as well as a few instructions that use these variables.

image

Figure 6-8: A disassembled function showing local variables and arguments

You can rename these local variables and arguments and look up all their cross-references, but cross-references for local variables and arguments will stay within the same function.

Identifying Key Functionality

Next, you need to determine where the executable you’re disassembling handles the network protocol. The most straightforward way to do this is to inspect all parts of the executable in turn and determine what they do. But if you’re disassembling a large commercial product, this method is very inefficient. Instead, you’ll need a way to quickly identify areas of functionality for further analysis. In this section, I’ll discuss four typical approaches for doing so, including extracting symbolic information, looking up which libraries are imported into the executable, analyzing strings, and identifying automated code.

Extracting Symbolic Information

Compiling source code into a native executable is a lossy process, especially when the code includes symbolic information, such as the names of variables and functions or the form of in-memory structures. Because this information is rarely needed for a native executable to run correctly, the compilation process may just discard it. But dropping this information makes it very difficult to debug problems in the built executable.

All compilers support retaining symbolic information by generating debug symbols, which record the original source code line associated with each instruction in memory as well as type information for functions and variables. However, developers rarely leave debug symbols in intentionally, choosing instead to remove them before a public release to prevent people from discovering their proprietary secrets (or bad code). Still, sometimes developers slip up, and you can take advantage of those slipups to aid reverse engineering.

IDA Pro loads debug symbols automatically whenever possible, but sometimes you’ll need to hunt down the symbols on your own. Let’s look at the debug symbols used by Windows, macOS, and Linux, as well as where the symbolic information is stored and how to get IDA to load it correctly.

When a Windows executable is built using common compilers (such as Microsoft Visual C++), the debug symbol information isn’t stored inside the executable; instead, it’s stored in a section of the executable that provides the location of a program database (PDB) file. In fact, all the debug information is stored in this PDB file. The separation of the debug symbols from the executable makes it easy to distribute the executable without debug information while making that information readily available for debugging.

PDB files are rarely distributed with executables, at least in closed-source software. But one very important exception is Microsoft Windows. To aid debugging efforts, Microsoft releases public symbols for most executables installed as part of Windows, including the kernel. Although these PDB files don’t contain all the debug information from the compilation process (Microsoft strips out information they don’t want to make public, such as detailed type information), the files still contain most of the function names, which is often what you want. The upshot is that when reverse engineering Windows executables, IDA Pro should automatically look up the symbol file on Microsoft’s public symbol server and process it. If you happen to have the symbol file (because it came with the executable), load it by placing it next to the executable in a directory and then have IDA Pro disassemble the executable. You can also load PDB files after initial disassembly by selecting File ▸ Load File ▸ PDB File.

Debug symbols are most significant in reverse engineering in IDA Pro when naming functions in the disassembly and Functions windows. If the symbols also contain type information, you should see annotations on the function calls that indicate the types of parameters, as shown in Figure 6-9.

image

Figure 6-9: Disassembly with debug symbols

Even without a PDB file, you might be able to access some symbolic information from the executable. Dynamic libraries, for example, must export some functions for another executable to use: that export will provide some basic symbolic information, including the names of the external functions. From that information, you should be able to drill down to find what you’re looking for in the Exports window. Figure 6-10 shows what this information would look like for the ws2_32.dll Windows network library.

image

Figure 6-10: Exports from the ws2_32.dll library

Debug symbols work similarly on macOS, except debugging information is contained in a debugging symbols package (dSYM), which is created alongside the executable rather than in a single PDB file. The dSYM package is a separate macOS package directory and is rarely distributed with commercial applications. However, the Mach-O executable format can store basic symbolic information, such as function and data variable names, in the executable. A developer can run a tool called Strip, which will remove all this symbolic information from a Mach-O binary. If they do not run Strip, then the Mach-O binary may still contain useful symbolic information for reverse engineering.

On Linux, ELF executable files package all debug and other symbolic information into a single executable file by placing debugging information into its own section in the executable. As with macOS, the only way to remove this information is with the Strip tool; if the developer fails to do so before release, you might be in luck. (Of course, you’ll have access to the source code for most programs running on Linux.)

Viewing Imported Libraries

On a general purpose operating system, calls to network APIs aren’t likely to be built directly into the executable. Instead, functions will be dynamically linked at runtime. To determine what an executable imports dynamically, view the Imports window in IDA Pro, as shown in Figure 6-11.

In the figure, various network APIs are imported from the ws2_32.dll library, which is the BSD sockets implementation for Windows. When you double-click an entry, you should see the import in a disassembly window. From there, you can find references to that function by using IDA Pro to show the cross-references to that address.

image

Figure 6-11: The Imports window

In addition to network functions, you might also see that various cryptographic libraries have been imported. Following these references can lead you to where encryption is used in the executable. By using this import information, you may be able to trace back to the original callers to find out how the cryptography is being used. Common encryption libraries include OpenSSL and the Windows Crypt32.dll.

Analyzing Strings

Most applications contain strings with printable text information, such as text to display during application execution, text for logging purposes, or text left over from the debugging process that isn’t used. The text, especially internal debug information, might hint at what a disassembled function is doing. Depending on how the developer added debug information, you might find the function name, the original C source code file, or even the line number in the source code where the debug string was printed. (Most C and C++ compilers support a syntax to embed these values into a string during compilation.)

IDA Pro tries to find printable text strings as part of its analysis process. To display these strings, open the Strings window. Click a string of interest, and you’ll see its definition. Then you can attempt to find references to the string that should allow you to trace back to the functionality associated with it.

String analysis is also useful for determining which libraries an executable was statically linked with. For example, the ZLib compression library is commonly statically linked, and the linked executable should always contain the following string (the version number might differ):

inflate 1.2.8 Copyright 1995-2013 Mark Adler

By quickly discovering which libraries are included in an executable, you might be able to successfully guess the structure of the protocol.

Identifying Automated Code

Certain types of functionality lend themselves to automated identification. For example, encryption algorithms typically have several magic constants (numbers defined by the algorithm that are chosen for particular mathematical properties) as part of the algorithm. If you find these magic constants in the executable, you know a particular encryption algorithm is at least compiled into the executable (though it isn’t necessarily used). For example, Listing 6-3 shows the initialization of the MD5 hashing algorithm, which uses magic constant values.

void md5_init( md5_context *ctx )
{
    ctx->state[0] = 0x67452301;
    ctx->state[1] = 0xEFCDAB89;
    ctx->state[2] = 0x98BADCFE;
    ctx->state[3] = 0x10325476;
}

Listing 6-3: MD5 initialization showing magic constants

Armed with knowledge of the MD5 algorithm, you can search for this initialization code in IDA Pro by selecting a disassembly window and choosing Search ▸ Immediate value. Complete the dialog as shown in Figure 6-12 and click OK.

image

Figure 6-12: The IDA Pro search box for MD5 constant

If MD5 is present, your search should display a list of places where that unique value is found. Then you can switch to the disassembly window to try to determine what code uses that value. You can also use this technique with algorithms, such as the AES encryption algorithm, which uses special s-box structures that contain similar magic constants.

However, locating algorithms using IDA Pro’s search box can be time consuming and error prone. For example, the search in Figure 6-12 will pick up MD5 as well as SHA-1, which uses the same four magic constants (and adds a fifth). Fortunately, there are tools that can do these searches for you. One example, PEiD (available from http://www.softpedia.com/get/Programming/Packers-Crypters-Protectors/PEiD-updated.shtml), determines whether a Windows PE file is packed with a known packing tool, such as UPX. It includes a few plug-ins, one of which will detect potential encryption algorithms and indicate where in the executable they are referenced.

To use PEiD to detect cryptographic algorithms, start PEiD and click the top-right button to choose a PE executable to analyze. Then run the plug-in by clicking the button on the bottom right and selecting Plugins ▸ Krypto Analyzer. If the executable contains any cryptographic algorithms, the plug-in should identify them and display a dialog like the one in Figure 6-13. You can then enter the referenced address value into IDA Pro to analyze the results.

image

Figure 6-13: The result of PEiD cryptographic algorithm analysis

Dynamic Reverse Engineering

Dynamic reverse engineering is about inspecting the operation of a running executable. This method of reversing is especially useful when analyzing complex functionality, such as custom cryptography or compression routines. The reason is that instead of staring at the disassembly of complex functionality, you can step through it one instruction at a time. Dynamic reverse engineering also lets you test your understanding of the code by allowing you to inject test inputs.

The most common way to perform dynamic reverse engineering is to use a debugger to halt a running application at specific points and inspect data values. Although several debugging programs are available to choose from, we’ll use IDA Pro, which contains a basic debugger for Windows applications and synchronizes the static disassembly with the debugger view. For example, if you rename a function in the debugger, that change will be reflected in the static disassembly.

NOTE

Although I use IDA Pro on Windows in the following discussion, the basic techniques are applicable to other operating systems and debuggers.

To run the currently disassembled executable in IDA Pro’s debugger, press F9. If the executable needs command line arguments, add them by selecting Debugger ▸ Process Options and filling in the Parameters text box in the displayed dialog. To stop debugging a running process, press CTRL-F2.

Setting Breakpoints

The simplest way to use a debugger’s features is to set breakpoints at places of interest in the disassembly, and then inspect the state of the running program at these breakpoints. To set a breakpoint, find an area of interest and press F2. The line of disassembly should turn red, indicating that the breakpoint has been set correctly. Now, whenever the program tries to execute the instruction at that breakpoint, the debugger should stop and give you access to the current state of the program.

Debugger Windows

By default, the IDA Pro debugger shows three important windows when the debugger hits a breakpoint.

The EIP Window

The first window displays a disassembly view centered on the instruction pointed to by the EIP register, which is the instruction currently being executed (see Figure 6-14). This window works much like the disassembly window does during static reverse engineering. You can quickly navigate from this window to other functions and rename references (changes that are reflected in your static disassembly). When you hover the mouse pointer over a register, you should see a quick preview of its value, which is very useful if the register points to a memory address.

image

Figure 6-14: The debugger EIP window

The ESP Window

The debugger also shows an ESP window that reflects the current location of the ESP register, which points to the top of the current thread’s stack. Here you can identify the parameters being passed to function calls or the values of local variables. For example, Figure 6-15 shows the stack values just before a call to the send function, with the four parameters highlighted. As with the EIP window, you can double-click references to navigate to that location.

image

Figure 6-15: The debugger ESP window

The State of the General Purpose Registers

The General registers window shows the current state of the general purpose registers. Recall that registers are used to store the current values of various program states, such as loop counters and memory addresses. For registers containing memory addresses, this window provides a convenient way to navigate to a memory view window: click the arrow next to an address to point the last active memory window at the memory address held in that register.

To create a new memory window, right-click the address value and select Jump in new window. You’ll see the condition flags from the EFLAGS register on the right side of the window, as shown in Figure 6-16.

image

Figure 6-16: The General registers window

Where to Set Breakpoints?

Where are the best places to set breakpoints when you’re investigating a network protocol? A good first step is to set breakpoints on calls to the send and recv functions, which send and receive data from the network stack. Cryptographic functions are also a good target: you can set breakpoints on functions that set the encryption key or the encryption and decryption functions. Because the debugger synchronizes with the static disassembler in IDA Pro, you can also set breakpoints on code areas that appear to be building network protocol data. By stepping through instructions with breakpoints, you can better understand how the underlying algorithms work.

Reverse Engineering Managed Languages

Not all applications are distributed as native executables. For example, applications written in managed languages like .NET and Java compile to an intermediate machine language, which is commonly designed to be CPU and operating system agnostic. When the application is executed, a virtual machine or runtime executes the code. In .NET this intermediate machine language is called common intermediate language (CIL); in Java it’s called Java byte code.

These intermediate languages contain substantial amounts of metadata, such as the names of classes and all internal- and external-facing method names. Also, unlike for native-compiled code, the output of managed languages is fairly predictable, which makes them ideal for decompiling.

In the following sections, I’ll examine how .NET and Java applications are packaged. I’ll also demonstrate a few tools you can use to reverse engineer .NET and Java applications efficiently.

.NET Applications

The .NET runtime environment is called the common language runtime (CLR). A .NET application relies on the CLR as well as a large library of basic functionality called the base class library (BCL).

Although .NET is primarily a Microsoft Windows platform (it is developed by Microsoft after all), a number of other, more portable versions are available. The best known is the Mono Project, which runs on Unix-like systems and covers a wide range of CPU architectures, including SPARC and MIPS.

If you look at the files distributed with a .NET application, you’ll see files with .exe and .dll extensions, and you’d be forgiven for assuming they’re just native executables. But if you load these files into an x86 disassembler, you’ll be greeted with a message similar to the one shown in Figure 6-17.

image

Figure 6-17: A .NET executable in an x86 disassembler

As it turns out, .NET only uses the .exe and .dll file formats as convenient containers for the CIL code. In the .NET runtime, these containers are referred to as assemblies.

Assemblies contain one or more classes, enumerations, and/or structures. Each type is referred to by a name, typically consisting of a namespace and a short name. The namespace reduces the likelihood of conflicting names but can also be useful for categorization. For example, any types under the namespace System.Net deal with network functionality.

Using ILSpy

You’ll rarely, if ever, need to interact with raw CIL because tools like Reflector (https://www.red-gate.com/products/dotnet-development/reflector/) and ILSpy (http://ilspy.net/) can decompile CIL data into C# or Visual Basic source and display the original CIL. Let’s look at how to use ILSpy, a free open source tool that you can use to find an application’s network functionality. Figure 6-18 shows ILSpy’s main interface.

The interface is split into two windows. The left window is a tree-based listing of all assemblies that ILSpy has loaded; you can expand the tree view to see the namespaces and types each assembly contains. The right window shows decompiled source code for whatever you select in the left window.

To work with a .NET application, load it into ILSpy by pressing CTRL+O and selecting the application in the dialog. If you open the application’s main executable file, ILSpy should automatically load any assembly referenced in the executable as necessary.

With the application open, you can search for the network functionality. One way to do so is to search for types and members whose names sound like network functions. To search all loaded assemblies, press F3. A new window should appear on the right side of your screen, as shown in Figure 6-19.

image

Figure 6-18: The ILSpy main interface

image

Figure 6-19: The ILSpy Search window

Enter a search term to filter the loaded types, which are displayed in the window below the search box. You can also search for members or constants instead of types by selecting the appropriate entry from the drop-down list; for example, to search for literal strings, select Constant. When you’ve found an entry you want to inspect, such as TcpNetworkListener, double-click it and ILSpy should automatically decompile the type or method.

Rather than directly searching for specific types and members, you can also search an application for areas that use built-in network or cryptography libraries. The base class library contains a large set of low-level socket APIs and libraries for higher-level protocols, such as HTTP and FTP. If you right-click a type or member in the left window and select Analyze, a new window should appear, as shown at the right side of Figure 6-20.

Figure 6-20: ILSpy analyzing a type

This new window contains a tree that, when expanded, shows the types of analysis that can be performed on the item you selected in the left window. Your options depend on what you chose to analyze. For example, analyzing a type shows three options, although you’ll typically need only the following two forms of analysis:

Instantiated By Shows which methods create new instances of this type

Exposed By Shows which methods or properties use this type in their declaration or parameters

If you analyze a member, such as a method or a property, you’ll get two options:

Uses Shows what other members or types the selected member uses

Used By Shows what other members use the selected member (say, by calling the method)

You can expand each entry in the tree to drill down further into the results.

And that’s pretty much all there is to statically analyzing a .NET application. Find some code of interest, inspect the decompiled code, and then start analyzing the network protocol.

NOTE

Most of .NET’s core functionality is in the base class library distributed with the .NET runtime environment and available to all .NET applications. The assemblies in the BCL provide several basic network and cryptographic libraries, which applications are likely to need if they implement a network protocol. Look for areas that reference types in the System.Net and System.Security.Cryptography namespaces. These are mostly implemented in the MSCORLIB and System assemblies. If you can trace back from calls to these important APIs, you’ll discover where the application handles the network protocol.
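
As a rough illustration of the kind of decompiled code worth tracing back from, here is a hedged C# sketch. The SecureChannel class and its members are invented, but the System.Net.Sockets and System.Security.Cryptography types are the real BCL APIs to search for, and code that mixes the two is a strong candidate for the application’s protocol handling.

using System.Net.Sockets;
using System.Security.Cryptography;

// SecureChannel and SendPacket are made-up names standing in for whatever
// the decompiler shows; the BCL types are what you actually search for.
public class SecureChannel
{
    private readonly TcpClient _client;   // System.Net.Sockets
    private readonly Aes _aes;            // System.Security.Cryptography

    public SecureChannel(string host, int port, byte[] key, byte[] iv)
    {
        _client = new TcpClient(host, port);
        _aes = Aes.Create();
        _aes.Key = key;
        _aes.IV = iv;
    }

    public void SendPacket(byte[] payload)
    {
        NetworkStream stream = _client.GetStream();
        ICryptoTransform encryptor = _aes.CreateEncryptor();
        using (var crypto = new CryptoStream(stream, encryptor, CryptoStreamMode.Write))
        {
            crypto.Write(payload, 0, payload.Length);
        } // disposing the CryptoStream flushes the final block and closes the stream
    }
}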

Java Applications

Java applications differ from .NET applications in that the Java compiler doesn’t merge all types into a single file; instead, it compiles each class into its own file with a .class extension. Because separate class files scattered across filesystem directories aren’t very convenient to transfer between systems, Java applications are often packaged into a Java archive, or JAR. A JAR file is just a ZIP file with a few additional files to support the Java runtime. Figure 6-21 shows a JAR file opened in a ZIP decompression program.

Figure 6-21: An example JAR file opened with a ZIP application
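
Because a JAR really is just a ZIP archive, you can list its contents with any ZIP tool or library. The following minimal C# sketch uses .NET’s built-in ZIP support to print the .class entries in a JAR; the default path is a placeholder.

using System;
using System.IO.Compression; // ZipFile, ZipArchive, and ZipArchiveEntry live here
using System.Linq;

class JarLister
{
    static void Main(string[] args)
    {
        // The path is a placeholder; point it at any JAR you want to inspect.
        string jarPath = args.Length > 0 ? args[0] : "example.jar";

        // A JAR is an ordinary ZIP archive, so ZipFile can open it directly.
        using (ZipArchive jar = ZipFile.OpenRead(jarPath))
        {
            foreach (ZipArchiveEntry entry in
                     jar.Entries.Where(e => e.FullName.EndsWith(".class")))
            {
                Console.WriteLine(entry.FullName);
            }
        }
    }
}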

To decompile Java programs, I recommend using JD-GUI (http://jd.benow.ca/), which works in essentially the same way as ILSpy does for .NET applications. I won’t cover JD-GUI in depth; instead, I’ll just highlight a few important areas of the user interface in Figure 6-22 to get you up to speed.

Figure 6-22: JD-GUI with an open JAR File

Figure 6-22 shows the JD-GUI user interface with the JAR file jce.jar open; this file is installed by default with Java and can usually be found in JAVA_HOME/lib. You can open individual class files or multiple JAR files at one time, depending on the structure of the application you’re reverse engineering. When you open a JAR file, JD-GUI parses the metadata and the list of classes, which it presents in a tree structure. In Figure 6-22, we can see two important pieces of information JD-GUI has extracted. The first is a package named javax.crypto, which defines the classes for various Java cryptographic operations. Underneath the package name is a list of the classes defined in that package, such as CryptoAllPermissionCollection.class. If you click a class name in the left window, a decompiled version of the class is shown on the right. You can scroll through the decompiled code, or click the fields and methods exposed by the class to jump to them in the decompiled code window.

The second important thing to note is that you can click any underlined identifier in the decompiled code to navigate to its definition. For example, if you clicked the underlined all_allowed identifier, the view would jump to the definition of the all_allowed field in the current decompiled class.

Dealing with Obfuscation

All the metadata included with a typical .NET or Java application makes it easier for a reverse engineer to work out what an application is doing. However, commercial developers who employ special “secret sauce” network protocols tend not to like that their applications are so easy to reverse engineer. The ease with which these languages are decompiled also makes it relatively straightforward to discover horrible security holes in custom network protocols. Some developers would rather you not find out, so they turn to obscurity as a security solution.

You’ll likely encounter applications that are intentionally obfuscated using tools such as ProGuard for Java or Dotfuscator for .NET. These tools apply various modifications to the compiled application that are designed to frustrate a reverse engineer. The modification might be as simple as changing all the type and method names to meaningless values, or it might be more elaborate, such as employing runtime decryption of strings and code. Whatever the method, obfuscation will make decompiling the code more difficult. For example, Figure 6-23 shows an original Java class next to its obfuscated version, which was obtained after running it through ProGuard.

Figure 6-23: Original and obfuscated class file comparison
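
To give a feel for the comparison in Figure 6-23, here is a hedged C# sketch of the same kind of transformation applied to a .NET class: the original class and its names are invented, and the “obfuscated” version shows the sort of meaningless identifiers a renaming pass from a tool such as Dotfuscator tends to leave behind (real output varies by tool and settings).

using System.Text;

// Before obfuscation: the meaningful names survive compilation and appear
// directly in the decompiler (all names here are invented).
public class LoginPacketBuilder
{
    private readonly string _username;

    public LoginPacketBuilder(string username)
    {
        _username = username;
    }

    public byte[] BuildLoginPacket(string password)
    {
        return Encoding.UTF8.GetBytes(_username + ":" + password);
    }
}

// After a renaming pass: the structure and logic are identical, but every
// name the runtime doesn't strictly need has been reduced to a meaningless
// identifier, so the decompiled output tells you far less.
public class a
{
    private readonly string b;

    public a(string c)
    {
        b = c;
    }

    public byte[] a0(string c)
    {
        return Encoding.UTF8.GetBytes(b + ":" + c);
    }
}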

If you encounter an obfuscated application, it can be difficult to determine what it’s doing with normal decompilers; after all, that’s the point of the obfuscation. However, here are a few tips for tackling obfuscated applications:

• Keep in mind that external library types and methods (such as core class libraries) cannot be obfuscated. Calls to the socket APIs must exist in the application if it does any networking, so search for them.

• Because .NET and Java are easy to load and execute dynamically, you can write a simple test harness to load the obfuscated application and run its string or code decryption routines (see the sketch after this list).

• Use dynamic reverse engineering as much as possible to inspect types at runtime to determine what they’re used for.
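
For the .NET case, such a harness can be as simple as loading the obfuscated assembly over reflection and invoking the routine you suspect decrypts strings. In the minimal sketch below, the assembly path, type name, method name, and argument are all placeholders you’d replace with the obfuscated names and signature you see in the decompiler.

using System;
using System.Reflection;

class DecryptHarness
{
    static void Main(string[] args)
    {
        // All of these values are placeholders: point them at the obfuscated
        // assembly and the suspected string-decryption method you found in
        // the decompiler (often something terse like "a.b(int)").
        string assemblyPath = args[0]; // e.g. the obfuscated .exe or .dll
        string typeName     = args[1]; // e.g. "a"
        string methodName   = args[2]; // e.g. "b"

        Assembly asm = Assembly.LoadFrom(assemblyPath);
        Type type = asm.GetType(typeName, throwOnError: true);
        MethodInfo method = type.GetMethod(methodName,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static);

        // Many string-decryption routines are static and take a single integer
        // or string argument; adjust the argument list to match the signature
        // shown in the decompiled code.
        object result = method.Invoke(null, new object[] { int.Parse(args[3]) });
        Console.WriteLine(result);
    }
}

With a harness like this, you can dump the application’s protected strings without having to reimplement its decryption routine yourself.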

Reverse Engineering Resources

The following URLs provide access to excellent information resources for reverse engineering software. These resources provide more details on reverse engineering or other related topics, such as executable file formats.

• OpenRCE Forums: http://www.openrce.org/

• ELF File Format: http://refspecs.linuxbase.org/elf/elf.pdf

• macOS Mach-O Format: https://web.archive.org/web/20090901205800/http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html

• PE File Format: https://msdn.microsoft.com/en-us/library/windows/desktop/ms680547(v=vs.85).aspx

For more information on the tools used in this chapter, including where to download them, turn to Appendix A.

Final Words

Reverse engineering takes time and patience, so don’t expect to learn it overnight. It takes time to understand how the operating system and the architecture work together, to untangle the mess that optimized C can produce in the disassembler, and to statically analyze your decompiled code. I hope I’ve given you some useful tips on reverse engineering an executable to find its network protocol code.

The best approach when reverse engineering is to start on small executables that you already understand. You can compare the source of these small executables to the disassembled machine code to better understand how the compiler translated the original programming language.

Of course, don’t forget about dynamic reverse engineering and using a debugger whenever possible. Sometimes just running the code will be a more efficient method than static analysis. Not only will stepping through a program help you to better understand how the computer architecture works, but it will also allow you to analyze a small section of code fully. If you’re lucky, you might get to analyze a managed language executable written in .NET or Java using one of the many tools available. Of course, if the developer has obfuscated the executable, analysis becomes more difficult, but that’s part of the fun of reverse engineering.
