Real-world scenarios

This section demonstrates use cases in which the preceding algorithms and techniques support the investigator. We use two common examples: mobile malware and the National Software Reference Library (NSRL).

Mobile Malware

In this example, we will check the installed applications on an Android smartphone against an online analysis system, Mobile-Sandbox. Mobile-Sandbox is a website (http://www.mobilesandbox.org) that checks Android files for viruses and suspicious behavior free of charge. It is connected to VirusTotal, which uses up to 56 different antivirus products and scan engines to detect malware that the user's antivirus solution may have missed and to help rule out false positives. Additionally, Mobile-Sandbox uses custom techniques to detect applications that behave in a potentially malicious way. Antivirus software vendors, developers, and researchers behind Mobile-Sandbox can receive copies of the submitted files to help improve their software and techniques.

In the example, we will use two steps to successfully compare the installed applications with the already tested apps on the Mobile-Sandbox web service.

The first step is to get the hash sums of the applications installed on the device. These values are important because they identify the apps and allow us to check them against online services. For this example, we will use an application from Google Play, AppExtract (https://play.google.com/store/apps/details?id=de.mspreitz.appextract). A forensically sounder way of getting these values can be found in Chapter 6, Using Python for Mobile Forensics.

AppExtract for Android generates a list of installed and running apps with a large amount of metadata that can help in identifying unwanted or even malicious applications. This metadata contains a hash sum of each application package (an MD5 value in the sample record shown here), an indicator of whether the app was installed by the user or by the system itself, and additional data that can help in deciding whether the app is benign. These lists can be transferred via your favorite email app for further analysis. Once you receive the plain-text email with the generated lists, you just need to copy the list that contains all the installed applications to a CSV file. This file can be used for an automated analysis or opened with LibreOffice Calc in the lab environment. The metadata of the current version of the Chrome Browser for Android looks as follows:

Type;App_Name;md5;TargetSdkVersion;Package_Name;Process_Name;APK_Location;Version_Code;Version_Name;Certificate_Info;Certificate_SN;InstallTime;LastModified

SystemApp;Chrome;4e4c56a8a7d8d6b1ec3e0149b3918656;21;com.android.chrome;com.android.chrome;/data/app/com.android.chrome-2.apk;2311109;42.0.2311.109;CN=Android, OU=Android, O=Google Inc., L=Mountain View, ST=California, C=US;14042372374541250701;unknown;unknown
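For an automated analysis, the hash column has to be read from this semicolon-separated list. The following is a minimal sketch, not part of AppExtract, that uses Python's csv module for this. The function name read_app_hashes and the HASH_COLUMN constant are hypothetical names chosen for illustration; the column names are taken from the header line of the sample above.

```python
import csv
import io

# Column name taken from the header line of the sample list shown above.
HASH_COLUMN = 'md5'

def read_app_hashes(csv_file):
    """Return a list of (app name, package name, hash) tuples.

    csv_file -- an open text file (or any iterable of lines) holding the
                semicolon-separated list, including its header line.
    """
    reader = csv.DictReader(csv_file, delimiter=';')
    apps = []
    for row in reader:
        # Skip truncated lines that carry no hash value.
        if not row.get(HASH_COLUMN):
            continue
        apps.append((row['App_Name'], row['Package_Name'],
                     row[HASH_COLUMN].strip().lower()))
    return apps

# Self-contained demonstration using the sample record from the text.
SAMPLE = (
    "Type;App_Name;md5;TargetSdkVersion;Package_Name;Process_Name;"
    "APK_Location;Version_Code;Version_Name;Certificate_Info;"
    "Certificate_SN;InstallTime;LastModified\n"
    "SystemApp;Chrome;4e4c56a8a7d8d6b1ec3e0149b3918656;21;"
    "com.android.chrome;com.android.chrome;"
    "/data/app/com.android.chrome-2.apk;2311109;42.0.2311.109;"
    "CN=Android, OU=Android, O=Google Inc., L=Mountain View, "
    "ST=California, C=US;14042372374541250701;unknown;unknown\n"
)

apps = read_app_hashes(io.StringIO(SAMPLE))
for name, package, digest in apps:
    print("%s (%s): %s" % (name, package, digest))
```

Because the delimiter is a semicolon, the commas inside the Certificate_Info field do not split the record. For a real list, pass an opened CSV file instead of the in-memory sample.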

The second step is to compare the hash sums from the device (third column in our CSV file) with the Mobile-Sandbox database. This can be done with the help of the following script that we will save as get_infos_mobilesandbox.py:

#!/usr/bin/env python

import sys, requests

# Authentication Parameters
# if you need an API key and user name please contact @m_spreitz
API_FORMAT = 'json'
API_USER = ''
API_KEY = ''

# parsing input parameters
if (len(sys.argv) < 3):
    print("Get information about a specific Android app from the Mobile-Sandbox.")
    print("Usage: %s [type (md5,sha256)] [value]" % sys.argv[0])
    sys.exit(0)

# building the payload
payload = {'format':API_FORMAT,
           'username':API_USER,
           'api_key':API_KEY,
           'searchType':str(sys.argv[1]),   # has to be md5 or sha256
           'searchValue':str(sys.argv[2])}

# submitting sample hash and getting meta data
print("--------------------------------")
r = requests.get("http://mobilesandbox.org/api/bot/queue/get_info/", params=payload)

# printing the result (errors in red, metadata values in blue)
if r.status_code != requests.codes.ok:
    print("query result: \033[91m" + r.text + "\033[0m")
else:
    for key, value in r.json().items():
        print(key + ": \033[94m" + str(value) + "\033[0m")
print("--------------------------------")

The script can be used as shown in the following:

(labenv)user@lab:~$ ./get_infos_mobilesandbox.py md5 4e4c56a8a7d8d6b1ec3e0149b3918656

--------------------------------
status: done
min_sdk_version: 0
package_name: com.android.chrome
apk_name: Chrome.apk
AV_detection_rate: 0 / 56
drebin_score: benign (1.38173)
sample_origin: user upload
android_build_version: Android 1.0
ssdeep: 196608:ddkkKqfC+ca8eE/jXQewwn5ux1aDn9PpvPBic6aQmAHQXPOo:dBKZaJYXQE5u3ajtpvpeaQm1
sha256: 79de1dc6af66e6830960d6f991cc3e416fd3ce63fb786db6954a3ccaa7f7323c
malware_family: ---
md5: 4e4c56a8a7d8d6b1ec3e0149b3918656
--------------------------------

With the help of these three tools, it is possible to quickly check whether an application on a mobile device is potentially infected (see, for example, the AV_detection_rate, drebin_score, and malware_family fields in the response) or, if an application has not been tested before, at least to know where to start the manual investigation.
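When many apps are checked, reading every response by hand becomes tedious. The following sketch shows one possible way to flag responses that deserve a closer look. It is not part of Mobile-Sandbox: the function names are hypothetical, and the field names and value formats ('0 / 56', 'benign (...)', '---') are taken only from the single sample response above, so treating them as stable is an assumption.

```python
def parse_detection_rate(text):
    """Split a string such as '0 / 56' into (detections, engines)."""
    hits, engines = str(text).split('/')
    return int(hits), int(engines)

def suspicious_fields(info):
    """Return the names of response fields that deviate from a benign result.

    info -- dictionary as returned by r.json() in the script above.
    Assumption: value formats match the sample response in the text.
    """
    reasons = []
    if 'AV_detection_rate' in info:
        hits, _ = parse_detection_rate(info['AV_detection_rate'])
        if hits > 0:
            reasons.append('AV_detection_rate')
    if 'drebin_score' in info:
        if not str(info['drebin_score']).startswith('benign'):
            reasons.append('drebin_score')
    if 'malware_family' in info:
        if str(info['malware_family']) != '---':
            reasons.append('malware_family')
    return reasons

# Fields copied from the sample response for the Chrome package.
sample = {
    'status': 'done',
    'package_name': 'com.android.chrome',
    'AV_detection_rate': '0 / 56',
    'drebin_score': 'benign (1.38173)',
    'malware_family': '---',
}
print(suspicious_fields(sample))
```

An empty list only means that none of the three fields raised a flag; it does not replace a manual review, and missing fields are silently ignored by this sketch.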

NSRLquery

To increase efficiency in the forensic analysis, it is crucial to sort out any files that belong to known software and have not been modified. The National Software Reference Library (NSRL) maintains multiple lists of hash sums for known content. NSRL is a project of the National Institute of Standards and Technology (NIST), supported by the U.S. Department of Homeland Security; further details are available at http://www.nsrl.nist.gov/. It is important to understand that a match in these lists merely indicates that a file has not been modified compared to the version that was submitted to the NSRL. Consequently, it is normal that many of the files analysed during a forensic investigation are not listed in NSRL. Conversely, even listed files can be deployed by an attacker as tools. For example, psexec.exe is a program provided by Microsoft for remote administration and is listed in NSRL. Nevertheless, an attacker may have deployed it for malicious purposes.

Tip

Which NSRL list should be used?

NSRL consists of several hash sets. It is highly recommended to begin with the minimal set. This set lists each hash sum only once, even if the same file ships with several products.

The minimal set can be downloaded free of charge from the NIST homepage. The download is a single ZIP file whose most prominent contents are the hash list and a list of the supported software products.

The hashes are stored in the NSRLFile.txt file that holds one file hash per line, for example:

"3CACD2048DB88F4F2E863B6DE3B1FD197922B3F2","0BEA3F79A36B1F67B2CE0F595524C77C","C39B9F35","TWAIN.DLL",94784,14965,"358",""

The fields of this record are as follows:

  • The hash sum of the file that is calculated with SHA-1, a predecessor to the SHA-256 algorithm described earlier.
  • The hash sum of the file that is calculated with MD5.
  • The CRC32 checksum of the file.
  • The file name.
  • The file size in bytes.
  • A product code denoting the software product this file belongs to. The NSRLProd.txt file contains a list of all products and can be used to look up the product code. In the previous example, the code 14965 denotes Microsoft Picture It!.
  • The operating system where this file is to be expected. The list of operating system codes can be found in NSRLOS.txt.
  • An indicator of whether this file is to be considered normal (""), malicious ("M"), or special ("S"). While this flag is part of the specification, all the files of the current NSRL minimal set are marked as normal.

More details about the file specifications can be found at http://www.nsrl.nist.gov/Documents/Data-Formats-of-the-NSRL-Reference-Data-Set-16.pdf.
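The record layout described above can be read with Python's csv module, which handles the quoted fields. The following is a minimal sketch under stated assumptions: the NsrlRecord type, its field names, and the parse_nsrl_lines function are hypothetical names chosen for illustration, and the check for a header line starting with SHA-1 is an assumption about the file that you should verify against your download.

```python
import csv
from collections import namedtuple

# Illustrative field names following the order described in the text.
NsrlRecord = namedtuple(
    'NsrlRecord',
    'sha1 md5 crc32 file_name file_size product_code os_code special_code')

def parse_nsrl_lines(lines):
    """Yield one NsrlRecord per well-formed line of an NSRLFile.txt.

    lines -- an open text file or any iterable of lines.
    """
    for fields in csv.reader(lines):
        # Skip malformed lines and a header line, if one is present.
        if len(fields) != 8 or fields[0] == 'SHA-1':
            continue
        record = NsrlRecord(*fields)
        # Size and product code are unquoted numbers in the file.
        yield record._replace(file_size=int(record.file_size),
                              product_code=int(record.product_code))

# The sample record from the text.
SAMPLE_LINE = ('"3CACD2048DB88F4F2E863B6DE3B1FD197922B3F2",'
               '"0BEA3F79A36B1F67B2CE0F595524C77C","C39B9F35",'
               '"TWAIN.DLL",94784,14965,"358",""')

record = next(parse_nsrl_lines([SAMPLE_LINE]))
print(record.file_name, record.md5, record.file_size, record.product_code)
```

Using a generator keeps memory usage flat, which matters for a file with tens of millions of lines.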

Downloading and installing nsrlsvr

Currently, the NSRL database contains more than 40 million distinct hashes in the minimal set. A text-based search would take minutes, even on an up-to-date workstation. Therefore, efficient lookups in that database are important. Robert J. Hansen's tool nsrlsvr provides a server that supports such lookups. It is available at https://rjhansen.github.io/nsrlsvr/.

Note

There are also public NSRL servers on the Internet that you can use. These are usually provided on an as-is basis. To test smaller sets of hashes, you may use Robert J. Hansen's public server nsrllookup.com and continue reading with the next section.

To compile the software on a Linux system, automake, autoconf, and a C++ compiler must be installed. Detailed installation instructions, including all the requirements, are provided in the INSTALL file.

Tip

Installing nsrlsvr in a non-default directory

The installation directory of nsrlsvr can be changed by calling the configure script with the --prefix parameter. The parameter value denotes the target directory. If a user-writable directory is specified, the installation does not require root privileges and can be completely removed by removing the installation directory.

The nsrlsvr server maintains its own copy of all the MD5 hash sums of the NSRL database. Therefore, the hash database has to be initialized first. The required nsrlupdate tool is provided with nsrlsvr.

user@lab:~$ nsrlupdate your/path/to/NSRLFile.txt

After the database is fully populated, the server can be started by simply calling:

user@lab:~$ nsrlsvr

If everything is installed correctly, this command returns without providing any output and the server starts listening on TCP port 9120 for requests.
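Since the server starts silently, it can be useful to confirm that something is actually accepting connections on the port. The following is a small sketch using only the standard socket module. The function name is_listening and the two-second timeout are hypothetical choices, and a successful connection only shows that some service listens on the port, not that it is nsrlsvr.

```python
import socket

def is_listening(host='127.0.0.1', port=9120, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True

print(is_listening())
```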

Writing a client for nsrlsvr in Python

There is also a client tool for using nsrlsvr called nsrllookup. The client is written in C++ and available at https://rjhansen.github.io/nsrllookup/. However, a client for interacting with nsrlsvr can easily be implemented in native Python. This section explains the protocol and shows a sample implementation of such a client.

The nsrlsvr implements a text-oriented protocol on its network port 9120. Every command consists of one line of text followed by a newline (CR LF). The following commands are supported:

  • version: 2.0: The version command is used for the initial handshake between the nsrl client and nsrlsvr. The client is supposed to provide its version after the colon. The server will always respond with OK followed by a line break.
  • query 5CB360EF546633691912089DB24A82EE 908A54EB629F410C647A573F91E80775 BFDD76C4DD6F8C0C2474215AD5E193CF: The query command is used for actually querying the NSRL database from the server. The keyword query is followed by one or multiple MD5 hash sums. The server will respond with OK followed by a sequence of zeroes and ones. A 1 indicates that the MD5 hash sum was found in the database and a 0 indicates that there was no match. For example, the query shown previously would lead to the following answer:
    OK 101

    This means that the first and the last MD5 hashes were found in NSRL, but the middle hash sum could not be found.

  • BYE: The bye command terminates the connection to the nsrlsvr.
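The message formats described in this list can be sketched as two small helper functions that work without a network connection. The names build_query and parse_query_response are hypothetical, and the length check is an extra safeguard that is not mandated by the protocol description above.

```python
def build_query(md5hashes):
    """Return the query command for a list of MD5 hex strings as bytes."""
    return ('query ' + ' '.join(md5hashes) + '\r\n').encode('ascii')

def parse_query_response(line, expected):
    """Turn a response line such as 'OK 101' into a list of booleans.

    line     -- the text line received from the server
    expected -- the number of hashes that were sent in the query
    """
    parts = line.strip().split()
    if not parts or parts[0] != 'OK':
        raise ValueError('unexpected response: %r' % line)
    flags = parts[1] if len(parts) > 1 else ''
    # Every hash must be answered by exactly one '0' or '1'.
    if len(flags) != expected or set(flags) - set('01'):
        raise ValueError('malformed result flags: %r' % flags)
    return [flag == '1' for flag in flags]

hashes = ['5CB360EF546633691912089DB24A82EE',
          '908A54EB629F410C647A573F91E80775',
          'BFDD76C4DD6F8C0C2474215AD5E193CF']
print(build_query(hashes))
print(parse_query_response('OK 101\r\n', len(hashes)))
```

Keeping the formatting and parsing separate from the socket handling makes both parts easy to test in isolation.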

Consequently, the following Python routine is sufficient to efficiently query the NSRL database:

#!/usr/bin/env python

import socket

NSRL_SERVER='127.0.0.1'
NSRL_PORT=9120

def nsrlquery(md5hashes):
    """Query the NSRL server and return a list of booleans.

    Arguments:
    md5hashes -- The list of MD5 hashes for the query.
    """

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((NSRL_SERVER, NSRL_PORT))

    try:
        f = s.makefile('r')

        # initial handshake: announce the client version
        s.sendall(b"version: 2.0\r\n")
        response = f.readline()
        if response.strip() != 'OK':
            raise RuntimeError('NSRL handshake error')

        # send all hashes in a single query command
        query = 'query ' + ' '.join(md5hashes) + "\r\n"
        s.sendall(query.encode('ascii'))
        response = f.readline()

        if response[:2] != 'OK':
            raise RuntimeError('NSRL query error')

        # one character per hash: '1' = found, '0' = not found
        return [c == '1' for c in response[3:].strip()]
    finally:
        s.close()

If the routine is saved as nsrlquery.py, using the module is as easy as shown here:

import nsrlquery
hashes = ['86d3d86902b09d963afc08ea0002a746',
          '3dcfe9688ca733a76f82d03d7ef4a21f',
          '976fe1fe512945e390ba10f6964565bf']
nsrlquery.nsrlquery(hashes)

This code queries the NSRL server and returns a list of booleans, each indicating whether the corresponding MD5 hash has been found in the NSRL file list.
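In an investigation, the hashes usually come from files on an evidence image rather than from a hard-coded list. The following sketch shows one possible way to hash files in chunks and keep only those that the server does not know. The names md5_of_file and unknown_files are hypothetical, and the batch size of 500 is an arbitrary assumption, not a documented limit of nsrlsvr. The query function is passed in as a parameter, so the nsrlquery routine from above can be used, or a stand-in for testing without a server.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def unknown_files(paths, query, batch_size=500):
    """Return the paths whose MD5 hash is not reported as known.

    paths -- iterable of file paths to check
    query -- callable that takes a list of MD5 hex strings and returns
             a list of booleans, such as the nsrlquery routine above
    """
    paths = list(paths)
    unknown = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        hashes = [md5_of_file(path) for path in batch]
        results = query(hashes)
        # Pair every path with the boolean answer for its hash.
        unknown.extend(path for path, known in zip(batch, results)
                       if not known)
    return unknown

# Hypothetical usage with the module from this section:
#   import nsrlquery
#   remaining = unknown_files(list_of_paths, nsrlquery.nsrlquery)
```

The files that remain after this filter are the ones that still need attention. As noted earlier, a file that is listed in NSRL can nevertheless have been deployed by an attacker.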
