This section will demonstrate some use cases where the preceding algorithms and techniques are used to support the investigator. For this chapter, we use two very common and interesting examples, Mobile Malware and the National Software Reference Library (NSRL).
In this example, we will check the applications installed on an Android smartphone against an online analysis system, Mobile-Sandbox (http://www.mobilesandbox.org). Mobile-Sandbox is a website that checks Android files for viruses or suspicious behavior free of charge. It is connected to VirusTotal, which uses up to 56 different antivirus products and scan engines to detect viruses that the user's antivirus solution may have missed and to verify suspected false positives. Additionally, Mobile-Sandbox uses custom techniques to detect applications that behave in a potentially malicious way. Antivirus software vendors, developers, and researchers behind Mobile-Sandbox can receive copies of the files to help improve their software and techniques.
In the example, we will use two steps to successfully compare the installed applications with the already tested apps on the Mobile-Sandbox web service.
The first step is to get the hash sums of the installed applications on the device. This is very important as these values can help to identify the apps and check them against the online services. For this example, we will use an application from Google Play, AppExtract (https://play.google.com/store/apps/details?id=de.mspreitz.appextract). The forensically more correct way of getting these values can be found in Chapter 6, Using Python for Mobile Forensics.
AppExtract for Android generates a list of installed and running apps with a large amount of metadata that can help in identifying unwanted or even malicious applications. This metadata contains the SHA256 hash sum of the application packages, an indicator whether the app has been installed by the user or the system itself, and a lot of additional data that can help in identifying if the app is benign or not. These lists can be transferred via your favorite email app for further analysis. Once you receive the plain-text email with the generated lists, you just need to copy the list that contains all the installed applications to a CSV file. This file can be used for an automated analysis or opened with LibreOffice Calc in the lab environment. You can see the metadata of the current version of the Chrome Browser for Android in the following:
Type;App_Name;md5;TargetSdkVersion;Package_Name;Process_Name;APK_Location;Version_Code;Version_Name;Certificate_Info;Certificate_SN;InstallTime;LastModified
SystemApp;Chrome;4e4c56a8a7d8d6b1ec3e0149b3918656;21;com.android.chrome;com.android.chrome;/data/app/com.android.chrome-2.apk;2311109;42.0.2311.109;CN=Android, OU=Android, O=Google Inc., L=Mountain View, ST=California, C=US;14042372374541250701;unknown;unknown
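Because the list is semicolon-separated, the hash sums can be pulled out with Python's standard csv module instead of copying them by hand. The following is a minimal sketch, written for Python 3; the function name read_app_hashes is our own choice for illustration, and we assume that the file keeps the header row exactly as shown above:

```python
import csv
import io

# Sample taken from the AppExtract list shown above (header plus one record).
SAMPLE = (
    "Type;App_Name;md5;TargetSdkVersion;Package_Name;Process_Name;"
    "APK_Location;Version_Code;Version_Name;Certificate_Info;"
    "Certificate_SN;InstallTime;LastModified\n"
    "SystemApp;Chrome;4e4c56a8a7d8d6b1ec3e0149b3918656;21;"
    "com.android.chrome;com.android.chrome;"
    "/data/app/com.android.chrome-2.apk;2311109;42.0.2311.109;"
    "CN=Android, OU=Android, O=Google Inc., L=Mountain View, "
    "ST=California, C=US;14042372374541250701;unknown;unknown\n"
)


def read_app_hashes(csv_text):
    """Return a list of (package_name, md5) tuples from the app list.

    The column names are taken from the header row, so the function
    relies on the 'Package_Name' and 'md5' columns being present.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=';')
    return [(row['Package_Name'], row['md5'].lower()) for row in reader]


if __name__ == '__main__':
    for package, md5 in read_app_hashes(SAMPLE):
        print(package, md5)
```

The certificate information contains commas but no semicolons, so splitting on the semicolon keeps that field intact.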
The second step is to compare the hash sums from the device (third column in our CSV file) with the Mobile-Sandbox database. This can be done with the help of the following script that we will save as get_infos_mobilesandbox.py:
#!/usr/bin/env python

import sys, requests

# Authentication Parameters
# if you need an API key and user name please contact @m_spreitz
API_FORMAT = 'json'
API_USER = ''
API_KEY = ''

# parsing input parameters
if (len(sys.argv) < 3):
    print("Get infos to a specific Android app from the Mobile-Sandbox.")
    print("Usage: %s [type (md5,sha256)] [value]" % sys.argv[0])
    sys.exit(0)

# building the payload
payload = {'format':API_FORMAT,
           'username':API_USER,
           'api_key':API_KEY,
           'searchType':str(sys.argv[1]),   # has to be md5 or sha256
           'searchValue':str(sys.argv[2])}

# submitting sample hash and getting meta data
print("--------------------------------")
r = requests.get("http://mobilesandbox.org/api/bot/queue/get_info/", params=payload)

# printing the result (errors in red, report values in blue)
if not r.status_code == requests.codes.ok:
    print("query result: \033[91m" + r.text + "\033[0m")
else:
    for key, value in r.json().items():
        print(key + ": \033[94m" + str(value) + "\033[0m")

print("--------------------------------")
The script can be used as shown in the following:
(labenv)user@lab:~$ ./get_infos_mobilesandbox.py md5 4e4c56a8a7d8d6b1ec3e0149b3918656
--------------------------------
status: done
min_sdk_version: 0
package_name: com.android.chrome
apk_name: Chrome.apk
AV_detection_rate: 0 / 56
drebin_score: benign (1.38173)
sample_origin: user upload
android_build_version: Android 1.0
ssdeep: 196608:ddkkKqfC+ca8eE/jXQewwn5ux1aDn9PpvPBic6aQmAHQXPOo:dBKZaJYXQE5u3ajtpvpeaQm1
sha256: 79de1dc6af66e6830960d6f991cc3e416fd3ce63fb786db6954a3ccaa7f7323c
malware_family: ---
md5: 4e4c56a8a7d8d6b1ec3e0149b3918656
--------------------------------
With the help of these three tools, it is possible to quickly check if an application on a mobile device is potentially infected (see the highlighted parts in the response) or at least where to start with the manual investigation if an application hasn't been tested before.
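When many apps have to be checked, the relevant fields of each response can be evaluated automatically. The following Python 3 sketch is only an illustration: the function name summarize_report is hypothetical, and we assume that the JSON values have the same textual form as in the output above (for example "0 / 56" and "benign (1.38173)"). Treating every Drebin verdict that does not start with "benign" as suspicious is also our own assumption, not documented behavior of the service.

```python
import re


def summarize_report(info):
    """Return (detections, engines, suspicious) for one report dictionary.

    detections and engines are None if the detection rate cannot be parsed.
    """
    rate = str(info.get('AV_detection_rate', ''))
    match = re.match(r'\s*(\d+)\s*/\s*(\d+)', rate)
    detections = int(match.group(1)) if match else None
    engines = int(match.group(2)) if match else None

    # Assumption: a verdict other than "benign ..." deserves a closer look.
    drebin = str(info.get('drebin_score', '')).strip().lower()
    drebin_flag = drebin != '' and not drebin.startswith('benign')

    suspicious = bool(detections) or drebin_flag
    return detections, engines, suspicious


if __name__ == '__main__':
    report = {'AV_detection_rate': '0 / 56',
              'drebin_score': 'benign (1.38173)'}
    print(summarize_report(report))
```

Such a summary can only prioritize the manual investigation; a clean result does not prove that an app is harmless.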
To increase efficiency in the forensic analysis, it is crucial to sort out any files that belong to known software and have not been modified. The National Software Reference Library (NSRL) maintains multiple lists of hash sums for known content. The NSRL is maintained by the U.S. National Institute of Standards and Technology (NIST) and supported by the U.S. Department of Homeland Security; further details are available at http://www.nsrl.nist.gov/. It is important to understand that these lists of hash sums merely indicate that a file was not modified as compared to the version that was submitted to the NSRL. Consequently, it is normal that a lot of files that are to be analysed during a forensic investigation are not listed in the NSRL. On the other hand, even listed files can be used and deployed by an attacker as a tool. For example, psexec.exe is a program provided by Microsoft for remote administration and is listed in the NSRL. Nevertheless, an attacker may have deployed it for malicious purposes.
The minimal set is offered free of charge to download on the NIST homepage. The download consists of a single ZIP file with the hash list and a list of supported software products as the most prominent contents.
The hashes are stored in the NSRLFile.txt file that holds one file hash per line, for example:
"3CACD2048DB88F4F2E863B6DE3B1FD197922B3F2","0BEA3F79A36B1F67B2CE0F595524C77C","C39B9F35","TWAIN.DLL",94784,14965,"358",""
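The fields of this record are as follows:
- The SHA-1 hash sum of the file.
- The MD5 hash sum of the file.
- The CRC32 checksum of the file.
- The file name.
- The file size in bytes.
- A product code denoting the software product that the file belongs to. The NSRLProd.txt file contains a list of all products and can be used to look up the product code. In the previous example, the code 14965 denotes Microsoft Picture It!.
- An operating system code denoting the operating system on which the file is expected. The list of operating system codes can be found in NSRLOS.txt.
- A special code indicating whether the file is considered normal (""), malicious ("M"), or special ("S").
More details about the file specifications can be found at http://www.nsrl.nist.gov/Documents/Data-Formats-of-the-NSRL-Reference-Data-Set-16.pdf.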
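A record in this format can be split into its eight fields with Python's csv module, which takes care of the quoting. The following Python 3 sketch uses the sample record from above; the names NsrlRecord and parse_nsrl_line, as well as the attribute names, are our own and not part of the NSRL distribution:

```python
import csv
from collections import namedtuple

# Attribute names are chosen for this example only.
NsrlRecord = namedtuple('NsrlRecord', [
    'sha1', 'md5', 'crc32', 'file_name',
    'file_size', 'product_code', 'os_code', 'special_code'])

SAMPLE_LINE = ('"3CACD2048DB88F4F2E863B6DE3B1FD197922B3F2",'
               '"0BEA3F79A36B1F67B2CE0F595524C77C","C39B9F35",'
               '"TWAIN.DLL",94784,14965,"358",""')


def parse_nsrl_line(line):
    """Parse one line of NSRLFile.txt into an NsrlRecord."""
    fields = next(csv.reader([line]))
    if len(fields) != 8:
        raise ValueError('expected 8 fields, got %d' % len(fields))
    sha1, md5, crc32, name, size, product, os_code, special = fields
    # File size and product code are unquoted numbers in the sample record.
    return NsrlRecord(sha1, md5, crc32, name,
                      int(size), int(product), os_code, special)


if __name__ == '__main__':
    record = parse_nsrl_line(SAMPLE_LINE)
    print(record.file_name, record.md5, record.product_code)
```

The product code returned here is the value that has to be looked up in NSRLProd.txt.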
Currently, the NSRL database contains more than 40 million distinct hashes in the minimal set. A text-based search would take minutes, even on an up-to-date workstation. Therefore, it is important to perform lookups in that database efficiently. Rob Hansen's tool nsrlsvr provides a server that supports efficient lookups. It is available at https://rjhansen.github.io/nsrlsvr/.
To compile the software on a Linux system, the automake, autoconf, and C++ compiler tools must be installed. The detailed installation instructions, including all the requirements, are provided in the INSTALL file.
Installing nsrlsvr in a non-default directory
The installation directory of nsrlsvr can be changed by calling the configure script with the --prefix parameter. The parameter value denotes the target directory. If a user-writable directory is specified, the installation does not require root privileges and can be completely removed by removing the installation directory.
nsrlsvr maintains its own copy of all the MD5 hash sums of the NSRL database. Therefore, the hash database has to be initialized first. The required nsrlupdate tool is provided with nsrlsvr.
user@lab:~$ nsrlupdate your/path/to/NSRLFile.txt
After the database is fully populated, the server can be started by simply calling:
user@lab:~$ nsrlsvr
If everything is installed correctly, this command returns without providing any output and the server starts listening on TCP port 9120 for requests.
There is also a client tool for using nsrlsvr called nsrllookup. The client is written in C++ and available at https://rjhansen.github.io/nsrllookup/. However, a client for interacting with nsrlsvr can easily be implemented in native Python. This section explains the protocol and shows a sample implementation of such a client.
The nsrlsvr implements a text-oriented protocol on its network port 9120. Every command consists of one line of text followed by a newline (CR LF). The following commands are supported:
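- version: 2.0: The version command is used for the initial handshake between the client and nsrlsvr. The client provides its protocol version after the colon. The server responds with OK followed by a line break.
- query: The query command is used to query the NSRL database of the server. The keyword query is followed by one or more MD5 hash sums that are separated by spaces. The server responds with OK followed by a sequence of zeroes and ones. A 1 indicates that the MD5 hash sum was found in the database and a 0 indicates that there was no match. For example, a query with three MD5 hash sums could lead to the following answer:
OK 101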
This means that the first and the last MD5 hashes were found in NSRL, but the middle hash sum could not be found.
Consequently, the following Python routine is sufficient to efficiently query the NSRL database:
#!/usr/bin/env python

import socket

NSRL_SERVER = '127.0.0.1'
NSRL_PORT = 9120


def nsrlquery(md5hashes):
    """Query the NSRL server and return a list of booleans.

    Arguments:
    md5hashes -- The list of MD5 hashes for the query.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((NSRL_SERVER, NSRL_PORT))

    try:
        f = s.makefile('r')

        # handshake: announce the protocol version and expect "OK"
        s.sendall(b"version: 2.0\r\n")
        response = f.readline()
        if response.strip() != 'OK':
            raise RuntimeError('NSRL handshake error')

        # send all hashes in one query line, terminated by CR LF
        query = 'query ' + ' '.join(md5hashes) + "\r\n"
        s.sendall(query.encode('ascii'))
        response = f.readline()
        if response[:2] != 'OK':
            raise RuntimeError('NSRL query error')

        # the answer is "OK " followed by one 0 or 1 per hash
        return [c == '1' for c in response[3:].strip()]
    finally:
        s.close()
Using this module is as easy as shown here:
import nsrlquery

hashes = ['86d3d86902b09d963afc08ea0002a746',
          '3dcfe9688ca733a76f82d03d7ef4a21f',
          '976fe1fe512945e390ba10f6964565bf']

nsrlquery.nsrlquery(hashes)
This code queries the NSRL server and returns a list of booleans, each indicating whether the corresponding MD5 hash has been found in the NSRL file list.
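In an investigation, the hashes usually have to be calculated from the files under examination first, and only the files that are not known to the NSRL are of further interest. The following Python 3 sketch shows one possible way to prepare the query and to evaluate the returned list of booleans. The helper names md5_of_file and unknown_files are our own; the sketch does not contact the server itself.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 hash sum of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def unknown_files(paths, known_flags):
    """Return the paths whose corresponding flag is False.

    known_flags is a list of booleans in the same order as paths,
    for example the result of the nsrlquery() routine.
    """
    if len(paths) != len(known_flags):
        raise ValueError('paths and known_flags differ in length')
    return [path for path, known in zip(paths, known_flags) if not known]


if __name__ == '__main__':
    # Demonstration with a temporary file and hand-made flags.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'hello')
    try:
        print(md5_of_file(tmp.name))
    finally:
        os.remove(tmp.name)
    print(unknown_files(['a.dll', 'b.exe', 'c.sys'], [True, False, True]))
```

With a running server, the flags would come from a call such as nsrlquery.nsrlquery([md5_of_file(p) for p in paths]). Reading the files in chunks keeps the memory consumption low, even for large files.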