Chapter 11. Malware Analysis

Detecting the presence of malicious code is one of the most fundamental and challenging activities in cybersecurity operations. You have two main options when analyzing a piece of code: static and dynamic. During static analysis you analyze the code itself to determine whether indicators of malicious activity exist. During dynamic analysis, you execute the code and then look at its behavior and impact on a system to determine its functionality. In this chapter, we focus on static analysis techniques.

Warning

When dealing with potentially malicious files, be sure to perform any analysis on a system that is not connected to a network and does not contain any sensitive information. Afterward, assume that the system has been infected, and completely wipe and reimage the system before introducing it back into your network.

Commands in Use

In this chapter, we introduce curl to interact with websites, vi to edit files, and xxd to perform base conversions and file analysis.

curl

The curl command can be used to transfer data over a network between a client and a server. It supports multiple protocols, including HTTP, HTTPS, FTP, SFTP, and Telnet. curl is extremely versatile. The command options presented next represent only a small fraction of the capabilities available. For more information, be sure to check out the Linux man page for curl.

Common command options

-A

Specify the HTTP user agent string to send to the server

-d

Data to send with an HTTP POST request

-G

Use an HTTP GET request to send data rather than a POST

-I

Fetch only the protocol (HTTP, FTP) header

-L

Follow redirects

-s

Do not show error messages or progress bar

Command example

To fetch a standard web page, you need to pass in only the URL as the first argument. By default, curl will display the contents of the web page to standard out. You can redirect the output to a file by using a redirect or the -o option:

curl https://www.digadel.com

Tip

Not sure where a potentially dangerous shortened URL goes? Expand it with curl:

curl -ILs http://bitly.com/1k5eYPw | grep '^Location:'
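The one-liner above can be wrapped in a small function. The following is a minimal sketch, not part of the chapter's scripts: the names last_location and expand_url are made up for illustration. It matches the header name case-insensitively, because servers speaking HTTP/2 send header names in lowercase, and it reports only the final hop when a chain of redirects is followed.

```bash
#!/bin/bash
# Hypothetical helpers for illustration only.

# Read HTTP response headers on stdin and print the value of the last
# Location header, i.e., the final hop in a chain of redirects.
last_location() {
    grep -i '^location:' | tail -n 1 | tr -d '\r' |
        sed 's/^[^:]*:[[:space:]]*//'
}

# Fetch only the headers (-I), follow redirects (-L), and stay quiet (-s).
expand_url() {
    curl -ILs "$1" | last_location
}
```

To use it, call expand_url with the shortened URL as its only argument, for example expand_url http://bitly.com/1k5eYPw.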

vi

vi is not your typical command, but rather a full-featured command-line text editor. It is highly capable and even supports plug-ins.

Command example

To open the file somefile.txt in vi:

vi somefile.txt

When you are in the vi environment, hit the Esc key and then type i to enter Insert mode so you can edit the text. To exit Insert mode, press Esc.

To enter Command mode, hit the Esc key. You can enter one of the commands in Table 11-1 and press Enter for it to take effect.

Table 11-1. Common vi commands
Command Purpose

b

Back one word

cc

Replace current line

cw

Replace current word

dw

Delete current word

dd

Delete current line

:w

Write/save the file

:w filename

Write/save the file as filename

:q!

Quit without saving

ZZ

Save and quit

:set number

Show line numbers

/

Search forward

?

Search backward

n

Find next occurrence

A full overview of vi is beyond the scope of this book. For more information, you can visit the Vim editor page.

xxd

The xxd command displays a file to the screen in binary or hexadecimal format.

Common command options

-b

Display the file using binary rather than hexadecimal output

-l

Print n number of bytes

-s

Start printing at byte position n

Command example

To display somefile.txt, start at byte offset 35 and print the next 50 bytes:

xxd -s 35 -l 50 somefile.txt

Reverse Engineering

The details of how to reverse engineer a binary are beyond the scope of this book. However, we do cover how the standard command line can be used to enable your reverse-engineering efforts. This is not meant to be a replacement for reverse-engineering tools like IDA Pro or OllyDbg; rather, it is meant to provide techniques that can be used to augment those tools or provide you with some capability if they are not available.

Tip

For detailed information on malware analysis, see Practical Malware Analysis by Michael Sikorski and Andrew Honig (No Starch Press). For more information on IDA Pro, see The IDA Pro Book by Chris Eagle (No Starch Press).

Hexadecimal, Decimal, Binary, and ASCII Conversions

When analyzing files, it is critical to be able to translate easily between decimal, hexadecimal, and ASCII. Thankfully, this can easily be done on the command line. Take the starting hexadecimal value 0x41. You can use printf to convert it to decimal by using the format string "%d":

$ printf "%d" 0x41

65

To convert the decimal 65 back to hexadecimal, replace the format string with %x:

$ printf "%x" 65

41

To convert from ASCII to hexadecimal, you can pipe the character into the xxd command from printf:

$ printf 'A' | xxd

00000000: 41

To convert from hexadecimal to ASCII, use the xxd command’s -r option:

$ printf 0x41 | xxd -r

A

To convert from ASCII to binary, you can pipe the character into xxd and use the -b option:

$ printf 'A' | xxd -b

00000000: 01000001

Tip

The printf command is purposely used in the preceding examples rather than echo. That is because the echo command automatically appends a line feed that adds an extraneous character to the output. This can be seen here:

$ echo 'A' | xxd

00000000: 410a
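If xxd is not at hand, the same conversions can be approximated with printf alone. The following is a sketch under that assumption; the function names hex2dec, dec2hex, chr2hex, and hex2chr are invented for this example and are not standard commands.

```bash
#!/bin/bash
# Hypothetical conversion helpers built only on printf.

# Hexadecimal to decimal; accepts 41 or 0x41.
hex2dec() { printf '%d\n' "0x${1#0x}"; }

# Decimal to hexadecimal.
dec2hex() { printf '%x\n' "$1"; }

# Single character to hexadecimal. A leading quote tells printf to use
# the numeric value of the character that follows it.
chr2hex() { printf '%x\n' "'$1"; }

# Hexadecimal to character. The inner printf builds an octal escape
# such as \101, which the outer printf then prints as a character.
hex2chr() { printf "$(printf '\\%03o' "0x${1#0x}")\n"; }

hex2dec 0x41    # 65
dec2hex 65      # 41
chr2hex A       # 41
hex2chr 0x41    # A
```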

Next, let’s look further at the xxd command and how it can be used to analyze a file such as an executable.

Analyzing with xxd

The executable helloworld will be used to explore the functionality of xxd. The source code is shown in Example 11-1. The file helloworld was compiled for Linux into Executable and Linkable Format (ELF) by using the GNU C Compiler (GCC).

Example 11-1. helloworld.c
#include <stdio.h>

int main()
{
  printf("Hello World!\n");
  return 0;
}

The xxd command can be used to examine any part of the executable. As an example, you can look at the file’s magic number, which begins at position 0x00 and is 4 bytes in size. To do that, use -s for the starting position (in decimal), and -l for the number of bytes (in decimal) to return. The starting offset and length can also be specified in hexadecimal by prepending 0x to the number (e.g., 0x2A). As expected, the ELF magic number is seen:

$ xxd -s 0 -l 4 helloworld

00000000: 7f45 4c46                                .ELF

The fifth byte of the file will tell you whether the executable is 32-bit (0x01) or 64-bit (0x02) architecture. In this case, it is a 64-bit executable:

$ xxd -s 4 -l 1 helloworld

00000004: 02

The sixth byte tells you whether the file is little-endian (0x01) or big-endian (0x02). In this case, it is little-endian:

$ xxd -s 5 -l 1 helloworld

00000005: 01
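The three checks above (magic number, class byte, and byte-order byte) can be folded into one function. This is a sketch only: elf_ident is a hypothetical name, and it expects the first six bytes of the file as a plain lowercase hex string, which is what xxd prints with its -p option.

```bash
#!/bin/bash
# Hypothetical helper: interpret the first six bytes of a file, given
# as a plain hex string such as 7f454c460201.
elf_ident() {
    local hex="$1" class order

    # Bytes 1-4: the ELF magic number 0x7f 'E' 'L' 'F'.
    case $hex in
        7f454c46*) ;;
        *) echo "not an ELF file"; return 1 ;;
    esac

    # Byte 5: 0x01 means 32-bit, 0x02 means 64-bit.
    case $hex in
        7f454c4601*) class="32-bit" ;;
        7f454c4602*) class="64-bit" ;;
        *)           class="unknown-class" ;;
    esac

    # Byte 6: 0x01 means little-endian, 0x02 means big-endian.
    # Each ? skips one hex digit, so ten of them skip five bytes.
    case $hex in
        ??????????01*) order="little-endian" ;;
        ??????????02*) order="big-endian" ;;
        *)             order="unknown-endianness" ;;
    esac

    echo "ELF $class $order"
}

elf_ident 7f454c460201    # ELF 64-bit little-endian
```

Against a real file, the argument could be produced with elf_ident "$(xxd -l 6 -p helloworld)".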

The format and endianness are critical pieces of information for analyzing the rest of the file. For example, the 8 bytes starting at offset 0x20 of a 64-bit ELF file specify the offset of the program header:

$ xxd -s 0x20 -l 8 helloworld

00000020: 4000 0000 0000 0000

You know that the offset of the program header is 0x40 because the file is little-endian. That offset can then be used to display the program header, which should be 0x38 bytes in length for a 64-bit ELF file:

$ xxd -s 0x40 -l 0x38 helloworld

00000040: 0600 0000 0500 0000 4000 0000 0000 0000  ........@.......
00000050: 4000 4000 0000 0000 4000 4000 0000 0000  @.@.....@.@.....
00000060: f801 0000 0000 0000 f801 0000 0000 0000  ................
00000070: 0800 0000 0000 0000                      ........
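Reading the offset 0x40 out of the bytes 4000 0000 0000 0000 means reversing the byte order by eye. A short sketch of that step in bash follows; le_hex_to_dec is a made-up name, and the input is assumed to be a plain hex string such as the one xxd -p produces. Values of 2^63 or more would overflow the shell's integer arithmetic, so treat it as suitable for file offsets and sizes only.

```bash
#!/bin/bash
# Hypothetical helper: convert a little-endian hex string to decimal.
le_hex_to_dec() {
    local hex="$1" be=""
    # Take two hex digits (one byte) at a time and prepend them, so the
    # byte that came first in the file ends up last (least significant).
    while [[ -n $hex ]]; do
        be="${hex:0:2}${be}"
        hex="${hex:2}"
    done
    printf '%d\n' "0x${be}"
}

le_hex_to_dec 4000000000000000    # 64, that is, 0x40
```

With the example file, the program header offset could be obtained with le_hex_to_dec "$(xxd -s 0x20 -l 8 -p helloworld)".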

For more information on the Linux ELF file format, see the Tool Interface Standard (TIS) Executable and Linking format (ELF) Specification.

For more information on the Windows executable file format, see the Microsoft portable executable file format documentation.

Hex editor

Sometimes you may need to display and edit a file in hexadecimal. You can combine xxd with the vi editor to do just that. First, open the file you want to edit as normal with vi:

vi helloworld

After the file is open, enter the vi command:

:%!xxd

In vi, the % symbol represents the address range of the entire file, and the ! symbol can be used to execute a shell command, replacing the original lines with the output of the command. Combining the two as shown in the preceding example will run the current file through xxd (or any shell command) and leave the results in vi:

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 3004 4000 0000 0000  ..>.....0.@.....
00000020: 4000 0000 0000 0000 efbf bd19 0000 0000  @...............
00000030: 0000 0000 0000 4000 3800 0900 4000 1f00  ......@.8...@...
00000040: 1c00 0600 0000 0500 0000 4000 0000 0000  ..........@.....
.
.
.

After you have made your edits, you can convert the file back to normal by using the vi command :%!xxd -r. Write out these changes (ZZ) when you are done. Of course, you can just quit without writing (:q!) at any time, and the file will be left unchanged.

Tip

To convert a file loaded in vi to Base64 encoding, use :%!base64. To convert back from Base64, use :%!base64 -d.

Extracting Strings

One of the most basic approaches to analyzing an unknown executable is to extract any ASCII strings contained in the file. This can often yield information such as filenames or paths, IP addresses, author names, compiler information, URLs, and other information that might provide valuable insight into the program’s functionality or origin.

A command called strings can extract ASCII data for us, but it is not available by default on many distributions, including Git Bash. To solve this more universally, we can use our good friend egrep:

egrep -a -o '[[:print:]]{2,}' somefile.exe

This regular expression searches the specified file for two or more (that’s the {2,} construct) printable characters in a row that appear as their own contiguous word. The -a option processes the binary executable as if it were a text file. The -o option will output only the matching text rather than the entire line, thereby eliminating any of the nonprintable binary data. The search is for two or more characters because single characters are quite likely in any binary byte and thus are not significant.

To make the output even cleaner, you can pipe the results into sort with the -u option to remove any duplicates:

egrep -a -o '[[:print:]]{2,}' somefile.exe | sort -u
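A minimum length of two produces a lot of noise on real binaries, so it can help to make the threshold adjustable. The following sketch wraps the pipeline in a function; extract_strings is a hypothetical name, and grep -E is used as the equivalent of egrep. Setting LC_ALL=C is an assumption made here so that every byte is treated as a single character regardless of the system locale.

```bash
#!/bin/bash
# Hypothetical helper: print the unique runs of printable characters
# in a file. The second argument is the minimum run length (default 2).
extract_strings() {
    local file="$1" min="${2:-2}"
    LC_ALL=C grep -a -o -E "[[:print:]]{${min},}" "$file" | LC_ALL=C sort -u
}
```

For example, extract_strings somefile.exe 6 lists only strings that are at least six characters long.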

It may also be useful to sort the strings from longest to shortest, as the longest strings are more likely to contain interesting information. The sort command does not provide a way to do this natively, so you can use awk to augment it:

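egrep -a -o '[[:print:]]{2,}' somefile.exe | sort -u |
              awk '{print length(), $0}' | sort -rn

Here, you first remove duplicates with sort -u, and then send the output to awk to have it prepend the length of each string on each line. This output is then sorted in reverse numerical order. Deduplicating in a separate step matters: combined with -n, the -u option compares only the numeric key, so sort -rnu would keep just one string of each length.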

The approach of extracting strings from an executable does have its limitations. If a string is not contiguous, meaning that nonprintable characters separate one or more characters, the string will print out as individual characters rather than the entire string. This is sometimes just an artifact of how an executable is constructed, but it can also be done intentionally by malware developers to help avoid detection. Malware developers may also use encoding or encryption to similarly mask the existence of strings in a binary file.

Interfacing with VirusTotal

VirusTotal is a commercial online tool used to upload files and run them against a battery of antivirus engines and other static analysis tools to determine whether they are malicious. VirusTotal can also provide information on how often a particular file has been seen in the wild, or if anyone else has identified it as malicious; this is known as a file’s reputation. If a file has never been seen before in the wild, and therefore has a low reputation, it is more likely to be malicious.

Warning

Be cautious when uploading files to VirusTotal and similar services. Those services maintain databases of all files uploaded, so files with potentially sensitive or privileged information should never be uploaded. Additionally, in certain circumstances, uploading malware files to public repositories could alert an adversary that you have identified their presence on your system.

VirusTotal provides an API that can be used to interface with the service by using curl. To use the API you must have a unique API key. To obtain a key, go to the VirusTotal website and request an account. After you create an account, log in and go to your account settings to view your API key. A real API key will not be used for the examples in this book due to security concerns; instead, we will use the text replacewithapikey anywhere your API key should be substituted.

Tip

The full VirusTotal API can be found in the VirusTotal documentation.

Searching the Database by Hash Value

VirusTotal uses a Representational State Transfer (REST) request to interact with the service over the internet. Table 11-2 lists some of the REST URLs for VirusTotal’s basic file-scanning functionality.

Table 11-2. VirusTotal file API
Description Request URL Parameters

Retrieve a scan report

https://www.virustotal.com/vtapi/v2/file/report

apikey, resource, allinfo

Upload and scan a file

https://www.virustotal.com/vtapi/v2/file/scan

apikey, file

VirusTotal keeps a history of all files that have been previously uploaded and analyzed. You can search the database by using a hash of your suspect file to determine whether a report already exists; this saves you from having to actually upload the file. The limitation with this method is that if no one else has ever uploaded the same file to VirusTotal, no report will exist.

VirusTotal accepts MD5, SHA-1, and SHA-256 hash formats, which you can generate using md5sum, sha1sum, and sha256sum, respectively. Once you have generated the hash of your file it can be sent to VirusTotal by using curl and a REST request.
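As a quick illustration, the three hashes can be generated in one step. This is a sketch; file_hashes is an invented name, and it assumes the GNU coreutils versions of md5sum, sha1sum, and sha256sum, which print the hash followed by the filename.

```bash
#!/bin/bash
# Hypothetical helper: print the MD5, SHA-1, and SHA-256 hashes of a file.
file_hashes() {
    local file="$1"
    # Each tool prints "<hash>  <filename>"; cut keeps only the hash.
    printf 'MD5:     %s\n' "$(md5sum "$file" | cut -d' ' -f1)"
    printf 'SHA-1:   %s\n' "$(sha1sum "$file" | cut -d' ' -f1)"
    printf 'SHA-256: %s\n' "$(sha256sum "$file" | cut -d' ' -f1)"
}
```

Any one of the three values printed by file_hashes somefile.exe can be used to look up the file.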

The REST request is in the form of a URL that begins with https://www.virustotal.com/vtapi/v2/file/report and has the following three primary parameters:

apikey

Your API key obtained from VirusTotal

resource

The MD5, SHA-1, or SHA-256 hash of the file

allinfo

If true, will return additional information from other tools
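To show how the three parameters fit together, here is a small sketch that assembles the request URL. The function name vt_report_url is hypothetical; the base URL and parameter names are the ones listed above.

```bash
#!/bin/bash
# Hypothetical helper: build the file report URL from its parameters.
# Arguments: API key, file hash, and allinfo (defaults to false).
vt_report_url() {
    local apikey="$1" resource="$2" allinfo="${3:-false}"
    printf '%s?apikey=%s&resource=%s&allinfo=%s\n' \
        'https://www.virustotal.com/vtapi/v2/file/report' \
        "$apikey" "$resource" "$allinfo"
}
```

The result can be handed straight to curl, as in curl "$(vt_report_url replacewithapikey "$HASH")", where HASH is a variable holding the hash of the suspect file.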

As an example, we will use a sample of the WannaCry malware, which has an MD5 hash of db349b97c37d22f5ea1d1841e3c89eb4:

curl 'https://www.virustotal.com/vtapi/v2/file/report?apikey=replacewithapikey'\
'&resource=db349b97c37d22f5ea1d1841e3c89eb4&allinfo=false' > WannaCry_VirusTotal.txt

The resulting JSON response contains a list of all antivirus engines the file was run against and their determination of whether the file was detected as malicious. Here, we can see the responses from the first two engines, Bkav and MicroWorld-eScan:

{"scans":
  {"Bkav":
    {"detected": true,
     "version": "1.3.0.9466",
     "result": "W32.WannaCrypLTE.Trojan",
     "update": "20180712"},
   "MicroWorld-eScan":
    {"detected": true,
     "version": "14.0.297.0",
     "result": "Trojan.Ransom.WannaCryptor.H",
     "update": "20180712"}
      .
      .
      .

Although JSON is great for structuring data, it is a little difficult for humans to read. You can extract some of the important information, such as whether the file was detected as malicious, by using grep:

$ grep -Po '{"detected": true.*?"result":.*?,' WannaCry_VirusTotal.txt

{"detected": true, "version": "1.3.0.9466", "result": "W32.WannaCrypLTE.Trojan",
{"detected": true, "version": "14.0.297.0", "result": "Trojan.Ransom.WannaCryptor.H",
{"detected": true, "version": "14.00", "result": "Trojan.Mauvaise.SL1",

The -P option for grep is used to enable the Perl engine, which allows you to use the pattern .*? as a lazy quantifier. This lazy quantifier matches only the minimum number of characters needed to satisfy the entire regular expression, thus allowing you to extract the response from each of the antivirus engines individually rather than in a large clump.
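The difference between the lazy and greedy forms is easiest to see side by side. The sketch below runs both against a single made-up line; the engine names E1 and E2 and their results are invented sample data, not VirusTotal output. It assumes a grep built with Perl regular expression support, as the -P option requires.

```bash
#!/bin/bash
# Invented sample data: two engine results on one line.
sample='"E1": {"detected": true, "result": "A", "update": "1"}, "E2": {"detected": true, "result": "B", "update": "2"}'

# Lazy quantifiers (.*?) stop at the first place the pattern can end,
# so each engine is reported separately.
lazy=$(printf '%s\n' "$sample" |
    grep -Po '\{"detected": true.*?"result":.*?,')

# Greedy quantifiers (.*) run to the last possible comma, so both
# engines are swallowed into a single match.
greedy=$(printf '%s\n' "$sample" |
    grep -Po '\{"detected": true.*"result":.*,')

echo "lazy:";   echo "$lazy"
echo "greedy:"; echo "$greedy"
```

The lazy version yields two matches, one per engine, while the greedy version yields one long match spanning both.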

Although this method works, a much better solution can be created using a bash script, as shown in Example 11-2.

Example 11-2. vtjson.sh
#!/bin/bash -
#
# Rapid Cybersecurity Ops
# vtjson.sh
#
# Description:
# Search a JSON file for VirusTotal malware hits
#
# Usage:
# vtjson.sh [<json file>]
#   <json file> File containing results from VirusTotal
#               default: Calc_VirusTotal.txt
#

RE='^.(.*)...{.*detect..(.*),..vers.*result....(.*).,..update.*$'     # 1

FN="${1:-Calc_VirusTotal.txt}"
sed -e 's/{"scans": {/&\n /' -e 's/},/&\n/g' "$FN" |          # 2
while read ALINE
do
    if [[ $ALINE =~ $RE ]]                                     # 3
    then
        VIRUS="${BASH_REMATCH[1]}"                             # 4
        FOUND="${BASH_REMATCH[2]}"
        RESLT="${BASH_REMATCH[3]}"
        if [[ $FOUND =~ .*true.* ]]                            # 5
        then
            echo $VIRUS "- result:" $RESLT
        fi
    fi
done
1

This complex regular expression (or RE) is looking for lines that contain DETECT and RESULT and UPDATE in that sequence on a line. More importantly, the RE is also locating three substrings within any line that matches those three keywords. The substrings are delineated by the parentheses; the parentheses are not to be found in the strings that we’re searching, but rather are syntax of the RE to indicate a grouping.

Let’s look at the first group in this example. The RE is enclosed in single quotes. There may be lots of special characters, but we don’t want the shell to interpret them as special shell characters; we want them passed through literally to the regex processor. The next character, the ^, says to anchor this search to the beginning of the line. The next character, the ., matches any character in the input line. Then comes a group of any character, the . again, repeated any number of times, indicated by the *.

So how many characters will fill in that first group? We need to keep looking along the RE to see what else has to match. What has to come after the group is three characters followed by a left brace. So we can now describe that first grouping as all the characters beginning at the second character of the line, up to, but not including, the three characters before the left brace.

It’s similar with the other groupings; they are constrained in their location by the dots and keywords. Yes, this does make for a rather rigid format, but in this case we are dealing with a rather rigid (predictable) format. This script could have been written to handle a more flexible input format. See the exercises at the end of the chapter.

2

The sed command is preparing our input for easier processing. It puts the initial JSON keyword scans and its associated punctuations on a line by itself. It then also puts a newline at the end of each right brace (with a comma after it). In both edit expressions, the ampersand on the righthand side of a substitution represents whatever was matched on the left side. For example, in the second substitution, the ampersand is shorthand for a right brace and comma.

3

Here is where the regular expression is put into use. Be sure not to put the $RE inside quotes, or it will match for those special characters as literals. To get the regular expression behavior, put no quotes around it.

4

If any parentheses are used in the regular expression, they delineate a substring that can be retrieved from the shell array variable BASH_REMATCH. Index 1 holds the first substring, etc.

5

This is another use of the regular expression matching. We are looking for the word true anywhere in the line. This makes assumptions about our input data—that the word doesn’t appear in any other field than the one we want. We could have made it more specific (locating it near the word detected, for example), but this is much more readable and will work as long as the four letters t-r-u-e don’t appear in sequence in any other field.
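To see BASH_REMATCH in isolation, the following sketch matches a single sample line taken from the response shown earlier. The regular expression here is an alternative written for this illustration, not the one used in vtjson.sh: it names the punctuation explicitly instead of counting dots, and it assumes the result field is a quoted string.

```bash
#!/bin/bash
# One line of input, as it looks after the sed preprocessing step.
line='"Bkav": {"detected": true, "version": "1.3.0.9466", "result": "W32.WannaCrypLTE.Trojan", "update": "20180712"},'

# Group 1: engine name, group 2: true or false, group 3: result string.
re='^"([^"]+)": \{"detected": (true|false),.*"result": "([^"]*)"'

if [[ $line =~ $re ]]; then
    engine="${BASH_REMATCH[1]}"
    detected="${BASH_REMATCH[2]}"
    result="${BASH_REMATCH[3]}"
    echo "$engine - detected: $detected - result: $result"
fi
```

Index 0 of BASH_REMATCH, not used here, holds the entire portion of the line that matched.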

You don’t necessarily need to use regular expressions to solve this problem. Here is a solution using awk. Now awk can make powerful use of regular expressions, but you don’t need them here because of another powerful feature of awk: the parsing of the input into fields. Example 11-3 shows the code.

Example 11-3. vtjson.awk
# Cybersecurity Ops with bash
# vtjson.awk
#
# Description:
# Search a JSON file for VirusTotal malware hits
#
# Usage:
# vtjson.awk <json file>
#   <json file> File containing results from VirusTotal
#

FN="${1:-Calc_VirusTotal.txt}"
sed -e 's/{"scans": {/&\n /' -e 's/},/&\n/g' "$FN" |    # 1
awk '
NF == 9 {                                       # 2
    COMMA=","
    QUOTE="\""                                  # 3
    if ( $3 == "true" COMMA ) {                 # 4
        VIRUS=$1                                # 5
        gsub(QUOTE, "", VIRUS)                  # 6

        RESLT=$7
        gsub(QUOTE, "", RESLT)
        gsub(COMMA, "", RESLT)

        print VIRUS, "- result:", RESLT
    }
}'
1

We begin with the same preprocessing of the input as we did in the previous script. This time, we pipe the results into awk.

2

Only input lines with nine fields will execute the code inside these braces.

3

We set up variables to hold these string constants. Note that we can’t use single quotes around the one double-quote character. Why? Because the entire awk script is being protected (from the shell interpreting special characters) by being enclosed in single quotes. (Look back three lines, and at the end of this script.) Instead, we “escape” the double quote by preceding it with a backslash.

4

This compares the third field of the input line to the string "true," because in awk, juxtaposition of strings implies concatenation. We don’t use a plus sign to “add” the two strings as we do in some languages; we just put them side by side.

5

As with the $3 used in the if clause, the $1 here refers to a field number of the input line—the first word, if you will, of the input. It is not a shell variable referring to a script parameter. Remember the single quotes that encase this awk script.

6

gsub is an awk function that does a global substitution. It replaces all occurrences of the first argument with the second argument when searching through the third argument. Since the second argument is the empty string, the net result is that it removes all quote characters from the string in the variable VIRUS (which was assigned the value of the first field of the input line).

The rest of the script is much the same, doing those substitutions and then printing the results. Remember, too, that in awk, it keeps reading stdin and running through the code once for each line of input, until the end of the input.
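It can help to see exactly how awk splits one of these lines into fields, since the script depends on the field count and positions. The sketch below prints each field with its number for a sample line from the earlier response; show_fields is a made-up helper name.

```bash
#!/bin/bash
# One line of input, as it looks after the sed preprocessing step.
line='"Bkav": {"detected": true, "version": "1.3.0.9466", "result": "W32.WannaCrypLTE.Trojan", "update": "20180712"},'

# Hypothetical helper: number every whitespace-separated field.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

printf '%s\n' "$line" | show_fields
```

The line splits into nine fields. Field 3 is true followed by a comma, which is why the script compares against "true" COMMA, and field 7 is the quoted result with its trailing comma, which is why both characters are stripped with gsub.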

Scanning a File

You can upload new files to VirusTotal to be analyzed if information on them does not already exist in the database. To do that, you need to use an HTTP POST request to the URL https://www.virustotal.com/vtapi/v2/file/scan. You must also provide your API key and a path to the file to upload. The following is an example using the Windows calc.exe file that can typically be found in the c:\Windows\System32 directory:

curl --request POST --url 'https://www.virustotal.com/vtapi/v2/file/scan' \
--form 'apikey=replacewithapikey' --form 'file=@/c/Windows/System32/calc.exe'

When uploading a file, you do not receive the results immediately. What is returned is a JSON object, such as the following, that contains metadata on the file that can be used to later retrieve a report using the scan ID or one of the hash values:

{
"scan_id": "5543a258a819524b477dac619efa82b7f42822e3f446c9709fadc25fdff94226-1...",
"sha1": "7ffebfee4b3c05a0a8731e859bf20ebb0b98b5fa",
"resource": "5543a258a819524b477dac619efa82b7f42822e3f446c9709fadc25fdff94226",
"response_code": 1,
"sha256": "5543a258a819524b477dac619efa82b7f42822e3f446c9709fadc25fdff94226",
"permalink": "https://www.virustotal.com/file/5543a258a819524b477dac619efa82b7...",
"md5": "d82c445e3d484f31cd2638a4338e5fd9",
"verbose_msg": "Scan request successfully queued, come back later for the report"
}
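To retrieve the report later, you need to pull the scan ID or a hash out of this response. The following is a rough sketch using sed; json_field is a hypothetical name, and the approach handles only simple quoted string values like the ones shown above (it will not match the numeric response_code, and a dedicated JSON parser would be more robust).

```bash
#!/bin/bash
# Hypothetical helper: read JSON on stdin and print the quoted string
# value of the named key. Suitable only for flat, simple responses.
json_field() {
    sed -n 's/.*"'"$1"'": *"\([^"]*\)".*/\1/p'
}
```

If the upload response was saved to a file, the SHA-256 hash could be recovered with json_field sha256 < response.json (the filename is an example) and then passed as the resource parameter of a report request.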

Scanning URLs, Domains, and IP Addresses

VirusTotal also has features to perform scans on a particular URL, domain, or IP address. All of the API calls are similar in that they make an HTTP GET request to the corresponding URL listed in Table 11-3 with the parameters set appropriately.

Table 11-3. VirusTotal URL API
Description Request URL Parameters

URL report

https://www.virustotal.com/vtapi/v2/url/report

apikey, resource, allinfo, scan

Domain report

https://www.virustotal.com/vtapi/v2/domain/report

apikey, domain

IP report

https://www.virustotal.com/vtapi/v2/ip-address/report

apikey, ip

Here is an example of requesting a scan report on a URL:

curl 'https://www.virustotal.com/vtapi/v2/url/report?apikey=replacewithapikey'\
'&resource=www.oreilly.com&allinfo=false&scan=1'

The parameter scan=1 will automatically submit the URL for analysis if it does not already exist in the database.

Summary

The command line alone cannot provide the same level of capability as full-fledged reverse-engineering tools, but it can be quite powerful for inspecting an executable or file. Remember to analyze suspected malware only on systems that are disconnected from the network, and be cognizant of confidentiality issues that may arise if you upload files to VirusTotal or other similar services.

In the next chapter, we look at how to improve data visualization post gathering and analysis.

Workshop

  1. Create a regular expression to search a binary for single printable characters separated by single nonprintable characters. For example, p.a.s.s.w.o.r.d, where . represents a nonprintable character.

  2. Search a binary file for instances of a single printable character. Rather than printing the ones that you find, print all the ones that you don’t find. For a slightly simpler exercise, consider only the alphanumeric characters rather than all printable characters.

  3. Write a script to interact with the VirusTotal API via a single command. Use the options -h to check a hash, -f to upload a file, and -u to check a URL. For example:

    $ ./vt.sh -h db349b97c37d22f5ea1d1841e3c89eb4
    
    Detected: W32.WannaCrypLTE.Trojan

Visit the Cybersecurity Ops website for additional resources and the answers to these questions.
