Chapter 4. Packet Analysis

Twas brillig, and the Protocols
     Did USER-SERVER in the wabe.
All mimsey was the FTP,
     And the RJE outgrabe,

Beware the ARPANET, my son;
     The bits that byte,
     the heads that scratch...

—R. Merryman, “ARPAWOCKY” (RFC 527)¹

1. R. Merryman, “ARPAWOCKY” (RFC 527), IETF, June 1973, http://rfc-editor.org/rfc/rfc527.txt.

Once you have captured network traffic, what do you do with it? Depending on the nature of the investigation, you might want to analyze the protocols in use, search for a specific string, or carve out files.

Perhaps you received an alert from an IDS about suspicious traffic from a particular host and you would like to identify the cause. Or perhaps you are concerned that an employee is exporting confidential data and you need to search outbound communications for specific keywords. Or perhaps you’ve discovered traffic on the network that you can’t recognize or interpret and you want to figure out the cause.

In all of these cases, it is useful to understand the fundamentals of protocol analysis, packet analysis, and multipacket stream analysis. In this chapter, we learn how to analyze fields within protocols, protocols within packets, and packets within streams, and then how to reconstruct higher-layer protocol data from streams. Throughout this discussion we provide examples of common tools and techniques that investigators can use to accomplish specific goals.

Investigators face many challenges when conducting packet analysis. It is not always possible to recover all protocol information or contents from a packet. The packet data may be corrupted or truncated, the contents may be encrypted at different layers, or the protocols in use may be undocumented. More and more often, the sheer volume of traffic makes it difficult to find the useful packets in the first place. Fortunately, the tools available for packet analysis are becoming increasingly sophisticated. As a forensic investigator, you should be familiar with a variety of tools and techniques so that you can select the right tool for the job.

4.1 Protocol Analysis

Protocol analysis refers to the art and science of understanding how a particular communications protocol works, what it’s used for, how to identify it, and how to dissect it. This may not be as straightforward as you might expect. In an ideal world, all protocols would be neatly cataloged, publicized, and implemented according to specification. In reality, none of this is true. Many protocols are deliberately kept secret by their inventors, whether to protect intellectual property, to keep out competition, or to support security and covert communications. Other protocols are simply not well documented because no one has taken the time.

Some protocols are publicly documented, such as the IETF-specified standards (which we discuss in greater detail shortly). However, that does not mean that hardware and software vendors have chosen to implement them properly. Manufacturers often implement protocols before the standards have been formally ratified, or implement them only partially. Engineers and programmers also make mistakes that result in behavior that does not comply with the standards.

Regardless of whether protocol specifications are formally published, you never know what you’ll find actually traversing the networks. It is very rare for software developers or equipment manufacturers to completely adhere to every aspect of a published standard (even their own!). This fact is routinely exploited by attackers to bypass intrusion detection systems and firewall rules, smuggle data in strange places, and generally create mayhem.

The bottom line is that protocol analysis is a challenging art.

Protocol analysis—Examination of one or more fields within a protocol’s data structure. Protocol analysis is commonly conducted for the purposes of research (e.g., in the case of an unpublished protocol specification) or network investigation.

4.1.1 Where to Get Information on Protocols

When you are searching for documentation about a particular protocol, there are many possible places to look. Perhaps the most well known is the IETF’s large, public repository of documented protocols. Other standards bodies, vendors, and researchers also maintain public and private caches of protocol documentation.

4.1.1.1 IETF Request for Comments (RFC)

In 1969, with the emergence of the ARPANET, a small group of researchers began distributing “requests for comments” (RFCs), which were initially defined as “any thought, suggestion, etc. related to the HOST software or other aspect of the [ARPANET] network.” Each RFC was numbered and sent to network engineers at different organizations involved in the creation of the ARPANET. The original distributors wrote that “we hope to promote the exchange and discussion of considerably less than authoritative ideas.”²

2. S. Crocker, “RFC 10—Documentation Conventions,” IETF, July 29, 1969, http://www.rfc-editor.org/rfc/rfc10.txt.

Over time, RFCs have emerged as a way to develop, communicate, and define international standards for internetworking. They are developed and distributed by the Internet Engineering Task Force (IETF), a “loosely self-organized group of people who contribute to the engineering and evolution of Internet technologies . . . the principal body engaged in the development of new Internet standard specifications.”³

3. P. Hoffman and S. Harris, “RFC 4677—The Tao of IETF: A Novice’s Guide to the Internet Engineering Task Force,” IETF, September 2006, http://www.ietf.org/rfc/rfc4677.txt.

A subset of RFCs are approved by the IETF as Internet Standards, labeled the “STD” subseries. Internet Standards RFCs are heavily vetted and tested by the community and are subject to stricter documentation requirements than other types of RFCs. RFCs that are “standards-track” documents have different maturity levels: “Proposed Standard,” “Draft Standard,” and “Internet Standard.” The last of these is given an STD series number upon publication (Internet Standards also retain their RFC numbers).⁴

4. S. Bradner, “RFC 2026—The Internet Standards Process—Revision 3,” IETF, October 1996, http://www.rfc-editor.org/rfc/rfc2026.txt.

The IETF specifies in RFC 4677 that “The IETF makes standards that are often adopted by Internet users, but it does not control, or even patrol, the Internet.” This has important implications for network forensic investigators; although the IETF may publish an Internet Standard, no one monitors network traffic or inspects networking software to ensure that standards are actually being followed. Instead, what exists on networks is what individual organizations, companies, users, and attackers create.

Founded in 1992, the Internet Society (ISOC) now sponsors the IETF, and chartered both the Internet Architecture Board (IAB) and the Internet Assigned Numbers Authority (IANA).

The canonical repository of RFCs on the Internet is at “http://www.rfc-editor.org” (per RFC 4677).⁵

5. P. Hoffman and S. Harris, “RFC 4677—The Tao of IETF: A Novice’s Guide to the Internet Engineering Task Force,” IETF, September 2006, http://www.rfc-editor.org/rfc/rfc4677.txt.

Steve Crocker, author of RFC 1, said: “Instead of authority-based decision-making, we relied on a process we called ‘rough consensus and running code.’⁶ Everyone was welcome to propose ideas, and if enough people liked it and used it, the design became a standard.”

6. Ibid.

“After all, everyone understood there was a practical value in choosing to do the same task in the same way.”⁷

7. Stephen D. Crocker, “Op-Ed Contributor—How the Internet Got Its Rules,” New York Times, April 6, 2009, http://www.nytimes.com/2009/04/07/opinion/07crocker.html?_r=2&em.

4.1.1.2 Other Standards Bodies

The IETF isn’t the only standards body, of course. There are other standards organizations that publish communications protocols for networking equipment and software. These include:

• Institute of Electrical and Electronics Engineers Standards Association (IEEE-SA): The IEEE Standards Association develops standards for a broad range of electrical engineering topics, ranging from consumer electronics to wired and wireless networking. Some of the most well-known IEEE specifications include those developed by the IEEE LAN/MAN Standards Committee, such as the “802.11” suite of protocols for wireless networking⁸ and 802.3 (Ethernet).⁹

8. IEEE, “Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications IEEE Standard for Information Technology—Telecommunications and Information Exchange between Systems—Local and Metropolitan Area Networks—Specific Requirements,” IEEE Standards Association (June 12, 2007), http://standards.ieee.org/getieee802/download/802.11-2007.pdf.

9. “IEEE-SA -IEEE Get 802 Program,” IEEE Standards Association, 2010, http://standards.ieee.org/getieee802/802.3.html.

• International Organization for Standardization (ISO): The ISO is an international nonprofit organization composed of representatives of national standards institutes from 163 countries. It publishes standards for a variety of industries, including information technology and communications.¹⁰

10. “ISO,” 2011, http://www.iso.org/iso/iso_catalogue.htm.

4.1.1.3 Vendors

Many vendors develop their own proprietary protocols for equipment, software, and communications. Some vendors work with standards bodies to encourage widespread adoption of their standards and help ensure compatibility. For example, Cisco employees worked as part of the IETF to develop RFC 2784, “Generic Routing Encapsulation,”¹¹ which describes an open tunneling protocol. Microsoft publishes an online library of its technical specifications, which include communications protocols used by Windows servers and clients.¹²

11. D. Farinacci et al., “RFC 2784—Generic Routing Encapsulation (GRE),” IETF, March 2000, http://www.rfc-editor.org/rfc/rfc2784.txt.

12. Microsoft Corporation, “Windows Communication Protocols (MCPP),” MSDN, 2011, http://msdn.microsoft.com/en-us/library/cc216513%28v=prot.10%29.aspx.

Other times, vendors work hard to keep their protocols secret, usually to reduce competition. America Online, for example, refused for many years to publish its proprietary instant messaging protocol (Open System for CommunicAtion in Realtime, or OSCAR), and worked hard to prevent competitors from developing AIM-compatible chat clients. Over time, many vendors and individual engineers reverse-engineered the protocols. Eventually, AOL published information about some aspects of the OSCAR protocol.

4.1.1.4 Researchers

Researchers at many universities, private institutions, and their parents’ garages have analyzed networking protocols and traffic. Often, this is undertaken in an attempt to reverse-engineer a secret protocol. For example, Russian researcher Alexandr Shutko has published his “Unofficial” OSCAR (ICQ v7/v8/v9) protocol documentation. He conducted the research to “have some fun.”¹³

13. “OSCAR (ICQ v7/v8/v9) Protocol Documentation,” July 2, 2005, http://iserverd.khstu.ru/oscar/.

You can conduct your own protocol research. In many cases, all you need is a laptop, some off-the-shelf networking equipment, and free software tools. To build your own protocol analysis lab, first set up a small network. An older-model, inexpensive hub works very well for sniffing traffic. Then set up endpoints that use the protocol you are interested in analyzing and a system for intercepting the traffic.

In some cases it can be hard to get vendor-specific equipment due to budget restrictions. Depending on your target protocol, you may not be able to replicate the protocol in your lab. However, there are many widely used protocols that are easily accessible and fun to tear apart.

4.1.2 Protocol Analysis Tools

Rather than reinvent the wheel, it’s a good idea to familiarize yourself with tools and languages specifically designed for protocol analysis. Tools such as Wireshark and tshark include built-in protocol dissectors for hundreds of different protocols, using the NetBee PDML and PSML languages as a foundation. These can save you a lot of time and effort.

4.1.2.1 Packet Details Markup Language and Packet Summary Markup Language

The Packet Details Markup Language (PDML) defines a standard for expressing packet details for Layers 2–7 in an XML format. The syntax is essentially a compromise in readability between computer and human parsers. Computer software must be programmed to interpret the markup language; humans can learn to read it, or employ other tools to make use of it.¹⁴ The Packet Summary Markup Language (PSML) is a similar XML language for expressing the most important details about a packet.

14. “[NetBee] Pdml Specification,” NetBee, http://www.nbee.org/doku.php?id=netpdl:pdml_specification.

PDML and PSML are both part of the NetBee library, which is designed to support packet processing.¹⁵ PDML and PSML were created and remain copyrighted by the Net-Group at the Politecnico di Torino in Italy, where WinPcap was also first developed. These specifications are used by Wireshark and tshark as a foundation for protocol dissection and display.

15. “The NetBee Library [NetBee],” NetBee, August 13, 2010, http://www.nbee.org/doku.php.
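Because tshark can emit PDML (with the -T pdml option covered in the tshark section of this chapter), PDML is also a convenient input for your own scripts. The following is a minimal Python sketch, assuming a PDML document in which each <packet> element contains <proto> elements whose <field> children carry name and show attributes. The SAMPLE_PDML fragment is hand-written for illustration (it is not taken from a real capture), and pdml_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative, hand-written PDML fragment (not from a real capture).
# Real "tshark -T pdml" output carries many more attributes and layers.
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="ip" showname="Internet Protocol Version 4">
      <field name="ip.version" show="4"/>
      <field name="ip.proto" show="6"/>
      <field name="ip.src" show="64.12.24.50"/>
    </proto>
    <proto name="tcp" showname="Transmission Control Protocol">
      <field name="tcp.srcport" show="443"/>
    </proto>
  </packet>
</pdml>"""


def pdml_fields(pdml_text):
    """Return one dict per <packet>, mapping field name -> 'show' value."""
    root = ET.fromstring(pdml_text)
    packets = []
    for packet in root.iter("packet"):
        fields = {}
        # Fields can be nested inside other fields, so walk all descendants.
        for field in packet.iter("field"):
            name = field.get("name")
            if name:
                fields[name] = field.get("show")
        packets.append(fields)
    return packets


for fields in pdml_fields(SAMPLE_PDML):
    print(fields["ip.src"], fields["ip.proto"], fields["tcp.srcport"])
```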

4.1.2.2 Wireshark

Wireshark is an excellent tool for protocol analysis. It includes built-in protocol dissectors that automatically interpret and display protocol details within individual packets, and allows you to filter on specific fields within supported protocols. You can also write your own packet dissectors, either for inclusion in the main Wireshark program or for distribution as plugins.

By default, Wireshark displays packets in three panels:

• Packet List: This panel shows captured packets, one per line, with very brief details about each. These typically include the time the packet was captured, the source and destination IP addresses, the highest-level protocol in use (according to Wireshark’s heuristics for analyzing protocols), and a brief snippet of protocol data.

• Packet Details: For the packet highlighted in the Packet List panel, this panel shows the details of the protocols in all layers that Wireshark can interpret.

• Packet Bytes: This panel shows the hexadecimal and ASCII representation of the packet, including Layer 2 data.

As you can see in Figure 4-1, Wireshark automatically decodes protocols at various layers in the frame and displays the details in the Packet Details window, ordered by layer.

Figure 4-1 Screenshot of the main Wireshark panels. The top panel is “Packet List,” the middle panel is “Packet Details,” and the bottom panel is “Packet Bytes.”

In this example, you can see that within Frame 24, Wireshark has identified “Ethernet II,” “Internet Protocol,” and “Transmission Control Protocol.” It shows a summary of important information for each protocol (such as “Src Port” and “Dest Port” for TCP), and also allows you to drill down to see extensive details for each protocol. In the Packet List window, Wireshark automatically displays the name of the highest-layer protocol found in the packet (in this case, TCP).

Wireshark includes hundreds of built-in protocol dissectors.¹⁶ You can view a list of protocol dissectors that are enabled in your version of Wireshark by going to Analyze→Enabled Protocols. You can also modify dissector settings for specific protocols in Edit→Preferences→Protocols. If you are analyzing a protocol that Wireshark doesn’t currently support, you can also write your own dissector. Wireshark plugins are commonly written in C. You can also write Wireshark plugins in Lua, but for performance reasons, Wireshark developers recommend using Lua only for prototyping dissectors.¹⁷

16. “ProtocolReference—The Wireshark Wiki,” Wireshark, March 7, 2011, http://wiki.wireshark.org/ProtocolReference.

17. “Lua—The Wireshark Wiki,” Wireshark, May 13, 2011, http://wiki.wireshark.org/Lua.

Wireshark automatically decides which protocol dissectors to use for a specific packet. If you have packets that you would like to dissect using a different protocol dissector, you can use Wireshark’s “Analyze→Decode As” function to modify the dissector in use.

4.1.2.3 tshark

Tshark uses Wireshark’s protocol dissection code, and therefore includes much of the same functionality, with a command-line interface.¹⁸ In tshark, the default output displays the information produced by PSML. When the “-V” flag is used, tshark displays the PDML information.

18. “tshark—The Wireshark Network Analyzer 1.5.0,” Wireshark, http://www.wireshark.org/docs/man-pages/tshark.html.

Here are some examples of using tshark for protocol analysis:

• Basic usage for reading from a capture file:

$ tshark -r capturefile.pcap

• Disable network object name resolution using the -n flag, so that you see actual IP addresses and port numbers. It’s often a good idea to disable name resolution because the lookups can slow down the display.

$ tshark -n -r capturefile.pcap

• Select an output format using the -T flag. Options include pdml, psml, ps, text, and fields. The default is “text.” In the example below, we print the output in PDML, which provides verbose packet protocol details in XML format.

$ tshark -r capturefile.pcap -T pdml

• To print a specific field (as defined by Wireshark’s protocol dissectors), use the -e flag combined with the “-T fields” option. The example below is from the “tshark” man page; it prints the frame number, the IP addresses, and information about the UDP layer.

$ tshark -r capturefile.pcap -T fields -e frame.number -e ip.addr -e udp

• Tshark also includes an option, -d, which implements Wireshark’s “Decode As” functionality. This option allows you to manually configure tshark to interpret the indicated traffic as a specific protocol. For example, if you want to configure tshark to treat all traffic in a packet capture on TCP port 29008 as HTTP, you would use the command:

$ tshark -r capturefile.pcap -d tcp.port==29008,http

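• Tshark also accepts Wireshark’s display filters. Older releases of tshark took display filters with -R; current releases use -Y for display filters and reserve -R for read filters in two-pass mode (-2). Here, only packets to or from 192.168.1.1 are displayed:

$ tshark -r capturefile.pcap -Y 'ip.addr == 192.168.1.1'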
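If you run the same tshark queries repeatedly, it can help to script them. The sketch below is a hedged Python example: tshark_fields_cmd and run_tshark are hypothetical helper names, and the code assumes a current tshark release, which takes display filters with -Y rather than the older -R. It only assembles a command line from the -n, -r, -T fields, -e, and -d flags described above and, if tshark is installed, runs it and splits the tab-separated output into columns.

```python
import shutil
import subprocess


def tshark_fields_cmd(pcap, fields, display_filter=None, decode_as=None):
    """Build a tshark command that prints the requested fields, tab-separated."""
    cmd = ["tshark", "-n", "-r", pcap, "-T", "fields"]  # -n: no name resolution
    for field in fields:
        cmd += ["-e", field]                 # one -e flag per field
    if decode_as:
        cmd += ["-d", decode_as]             # e.g. "tcp.port==29008,http"
    if display_filter:
        cmd += ["-Y", display_filter]        # display filter (current tshark)
    return cmd


def run_tshark(cmd):
    """Run tshark (if installed) and split each output line into columns."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("tshark not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line.split("\t") for line in result.stdout.splitlines()]


cmd = tshark_fields_cmd(
    "capturefile.pcap",
    ["frame.number", "ip.src", "ip.dst"],
    display_filter="ip.addr == 192.168.1.1",
)
print(" ".join(cmd))
# rows = run_tshark(cmd)  # requires tshark and the capture file
```

Passing the arguments as a list (rather than one shell string) keeps the display filter, with its spaces and quotes, intact as a single argument.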

4.1.3 Protocol Analysis Techniques

Recall that protocol analysis is the examination of one or more fields within a protocol’s data structure. Protocol analysis is often necessary for packet analysis because investigators must be able to properly interpret the communications structures in order to understand the contents and analyze packets or streams.

Fortunately, much of this analysis has already been done by a worldwide community of developers and analysts, who have created freely available tools such as Wireshark, tshark, tcpdump, and the specification languages upon which they are based. When using these tools, keep in mind that you are standing on the shoulders of giants (and that, occasionally, giants make mistakes).

Network investigators must also be prepared to handle protocols that have not yet been publicly dissected and included in common tools. Furthermore, hackers may sometimes develop new protocols, or extensions to old ones, in order to communicate covertly or add functionality that the original protocol authors never intended.

Protocol analysis is a deep art and we only scratch the surface here. During the next few sections, we describe the fundamentals of protocol identification, decoding, data exportation, and metadata extraction.

4.1.3.1 Protocol Identification

How do you identify which protocols are in use in a packet capture? Here are some common ways that you can identify a protocol:

• Search for common binary/hexadecimal/ASCII values that are typically associated with a specific protocol

• Leverage information in the encapsulating protocol

• Leverage the TCP/UDP port number, many of which are associated with standard default services

• Analyze the function of the source or destination server (specified by IP address or hostname)

• Test for the presence of recognizable protocol structures

Throughout this chapter, we will use the following scenario to illustrate tools and techniques. This scenario is “Puzzle #1: Ann’s Bad AIM,” from the “Network Forensics Puzzle Contest” web site.¹⁹ You can download the original contest materials and the corresponding packet capture from http://ForensicsContest.com.

19. Sherri Davidoff, Jonathan Ham, and Eric Fulton, “Network Forensics Puzzle Contest—Puzzle #1: Ann’s Bad AIM,” September 25, 2009, http://forensicscontest.com/2009/09/25/puzzle-1-anns-bad-aim.

The Case: Anarchy-R-Us, Inc. suspects that one of their employees, Ann Dercover, is really a secret agent working for their competitor. Ann has access to the company’s prize asset, the secret recipe. Security staff are worried that Ann may try to leak the company’s secret recipe.

Security staff have been monitoring Ann’s activity for some time, but haven’t found anything suspicious until now. Today an unexpected laptop briefly appeared on the company wireless network. Staff hypothesize it may have been someone in the parking lot because no strangers were seen in the building. Ann’s computer (192.168.1.158) sent IMs over the wireless network to this computer. The rogue laptop disappeared shortly thereafter.

“We have a packet capture of the activity,” said security staff, “but we can’t figure out what’s going on. Can you help?”

Let’s explore these protocol identification techniques one at a time. Getting back to our scenario, Ann’s Bad AIM, let’s take another look at the packet capture and figure out what protocols are in use. Throughout this discussion and the remainder of this book, remember to count byte offsets starting from 0.

Search for common binary/hexadecimal/ASCII values that are typically associated with a specific protocol

Most protocols contain sequences of bits that are commonly, if not always, present in packets associated with that protocol, in predictable places. An excellent example is the hexadecimal sequence “0x4500,” which often marks the beginning of an IPv4 packet. The following text shows a section of the Ann’s Bad AIM packet capture, produced using tcpdump:

$ tcpdump -nn -AX -r evidence01.pcap
22:57:22.022972 IP 64.12.24.50.443 > 192.168.1.158.51128: Flags [.], ack 6,
    win 64240, length 0
        0x0000:  4500 0028 b43d 0000 7f06 6d0e 400c 1832  E..(.=....m.@..2
        0x0010:  c0a8 019e 01bb c7b8 07e9 60db 336b d2c9  ..........`.3k..
        0x0020:  5010 faf0 61f2 0000 0000 0000 0000       P...a.........

Why does “0x4500” commonly appear at the beginning of IP packets? The high-order nibble of the byte at offset zero of an IP packet represents the IP version. There are only two versions of the IP protocol in common use today: IP version 4 (IPv4), and version 6 (IPv6). IPv4 is still the most widely implemented IP protocol, and so it is common to see the value “4” as the high-order nibble of Byte 0.

The low-order nibble of Byte 0 specifies the number of 32-bit words in the IPv4 header. IPv4 specifies that there are 20 bytes of required fields in the IPv4 header. It is possible to include optional header fields, which would cause the header length to increase. However, there are few legitimate purposes for IP options in normal use, and so few operating systems set them, and few firewalls allow them. Consequently, the low-order nibble of Byte 0 in an IPv4 packet is typically “5” (remember, there are 4 bytes in one 32-bit word, so 5 words is equal to 20 bytes).

Byte 1 of the IPv4 header is really a set of multibit fields, commonly grouped together and referred to as the “Type of Service” field. It is not widely used today, and so most IPv4 packets carry all zeros at Byte 1.²⁰

20. Richard W. Stevens, “The Protocols,” in TCP/IP Illustrated, vol. 1, Addison-Wesley, 1994.
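To make the bit arithmetic concrete, here is a small Python sketch that decodes Byte 0 and Byte 1 of an IPv4 header. It assumes the input begins exactly at the first byte of the IP header; decode_first_bytes is a hypothetical helper name.

```python
def decode_first_bytes(header: bytes):
    """Decode the version/IHL byte and the Type of Service byte of an IPv4 header."""
    version = header[0] >> 4        # high-order nibble of Byte 0: IP version
    ihl_words = header[0] & 0x0F    # low-order nibble: header length in 32-bit words
    header_len = ihl_words * 4      # 4 bytes per 32-bit word
    tos = header[1]                 # Byte 1: Type of Service
    return version, ihl_words, header_len, tos


# First two bytes of the packet in the tcpdump output above (0x4500).
print(decode_first_bytes(bytes.fromhex("4500")))  # (4, 5, 20, 0)
```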

As a result of these factors, IPv4 packets commonly begin with the value “0x4500.” Experienced network forensic investigators can view raw tcpdump output and manually pick out the beginnings of IPv4 packets without the assistance of automated tools.
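Automated tools can do the same thing, with an extra sanity check. The Python sketch below goes one step beyond the manual technique described here (the checksum test is our addition, not part of the technique above): it searches a raw byte buffer for 0x4500 and keeps only candidates whose 20-byte header passes the IPv4 header checksum, which weeds out many coincidental matches. It assumes option-free headers (low-order nibble 5), and both function names are hypothetical. As a check, the 20-byte header shown in the tcpdump output above (from 4500 through c0a8 019e) does carry a valid checksum.

```python
def ipv4_checksum_ok(header: bytes) -> bool:
    """True if the one's-complement sum of the header's 16-bit words is 0xFFFF."""
    if len(header) < 20 or len(header) % 2:
        return False
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def find_ipv4_headers(data: bytes):
    """Yield offsets where 0x4500 starts a 20-byte header with a valid checksum."""
    start = 0
    while True:
        off = data.find(b"\x45\x00", start)
        if off == -1 or off + 20 > len(data):
            return
        if ipv4_checksum_ok(data[off:off + 20]):
            yield off
        start = off + 1


# The 20-byte IPv4 header from the tcpdump output above, preceded by junk bytes
# that include a false 0x4500 match at offset 2.
header = bytes.fromhex("45000028b43d00007f066d0e400c1832c0a8019e")
buffer = bytes.fromhex("aabb4500ff") + header
print(list(find_ipv4_headers(buffer)))  # [5]
```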

Leverage information in the encapsulating protocol

Protocols often contain information that indicates the type of encapsulated protocol. Recall that in the OSI model, lower-layer protocol fields typically indicate the higher-layer protocol that may be encapsulated, in order to facilitate proper processing.

Figure 4-2 is a screenshot of the same Ann’s Bad AIM packet displayed within Wireshark. Notice that Wireshark has indeed identified the Layer 3 protocol as IP version 4, just as we guessed in the previous section. Byte 9 of the IP header (highlighted) indicates the protocol encapsulated within the IP packet. In this case, the value at Byte 9 is 0x06, which corresponds with TCP. (The protocol numbers used in the IPv4 “Protocol” field and the IPv6 “Next Header” field are assigned and maintained by IANA.²¹) Based on this information, it is reasonable to assume that the protocol encapsulated within this IP packet is TCP.

21. “Protocol Numbers,” May 3, 2011, http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xml.

Figure 4-2 IP protocol details displayed within Wireshark. Notice that the IP packet contains information about the encapsulated protocol (in this case, 0x06, or TCP).
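Once you know the offset, reading the encapsulated protocol is a one-byte lookup. This Python sketch reads Byte 9 of an IPv4 header and maps it to a name using a small, hand-picked subset of the IANA protocol-number registry (not the full list). The header bytes are those of the packet shown in the tcpdump output earlier in this section; encapsulated_protocol is a hypothetical helper name.

```python
# A few IANA-assigned IP protocol numbers (a small subset of the registry).
IP_PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP",
                47: "GRE", 50: "ESP", 58: "IPv6-ICMP"}


def encapsulated_protocol(ip_header: bytes) -> str:
    """Return the name of the protocol indicated by Byte 9 of an IPv4 header."""
    number = ip_header[9]
    return IP_PROTOCOLS.get(number, f"unknown ({number})")


header = bytes.fromhex("45000028b43d00007f066d0e400c1832c0a8019e")
print(encapsulated_protocol(header))  # TCP
```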

Leverage the TCP/UDP port number, many of which are associated with standard default services

A simple and common way to identify protocols is by examining the TCP or UDP port number in use. There are 65,535 possible port numbers for each of TCP and UDP. IANA has published a list of TCP/UDP port numbers that correspond with specific higher-layer network services. You can view the canonical list on IANA’s web site,²² and much of the same information is also stored in the /etc/services file on most UNIX/Linux systems.

22. “Port Numbers,” July 6, 2011, http://www.iana.org/assignments/port-numbers.

In Figure 4-3, we can see that Frame 8 of the Ann’s Bad AIM packet capture transmits data over UDP port 123. According to IANA, UDP port 123 is assigned to the Network Time Protocol (NTP), a service that synchronizes time between systems. Wireshark automatically displays the default service associated with a particular port, if one is assigned, as you can see in the highlighted line of the Packet List pane.

Figure 4-3 UDP Protocol Details displayed within Wireshark. Notice that Wireshark automatically associates the UDP port, 123, with its IANA-assigned default service, NTP.

If you look further down in the Packet Details pane, you can also see that Wireshark identified various fields within the higher-layer NTP protocol, such as “Peer Clock Stratum” and “Originate Time Stamp.” This provides corroborating evidence that the protocol is indeed NTP.
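Port-based identification is easy to automate. The following Python sketch checks a small, hand-picked table of IANA assignments first (the three ports discussed in this chapter) and then falls back to the local services database through the standard socket.getservbyport function, which on UNIX/Linux systems typically reads /etc/services. guess_service is a hypothetical name, and its result is only a hint about the protocol, not proof.

```python
import socket

# Hand-picked subset of IANA port assignments discussed in this chapter.
WELL_KNOWN = {(123, "udp"): "ntp", (443, "tcp"): "https", (5190, "tcp"): "aol"}


def guess_service(port: int, proto: str) -> str:
    """Guess a service name from a port number. A hint, not an identification."""
    if (port, proto) in WELL_KNOWN:
        return WELL_KNOWN[(port, proto)]
    try:
        # Fall back to the local services database (e.g., /etc/services).
        return socket.getservbyport(port, proto)
    except OSError:
        return "unassigned/unknown"


print(guess_service(123, "udp"))  # ntp
```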

Identifying protocols by port number is not always reliable because servers can easily be configured to use nonstandard port numbers for specific services. Take a look at Figure 4-4, in which we see a TCP segment (Frame 167 of the Ann’s Bad AIM packet capture) transmitted to destination port 443. According to IANA, TCP port 443 is assigned to the “http protocol over TLS/SSL,” or “https” service. Accordingly, Wireshark has presented and interpreted it as “https” or “Secure Socket Layer.”

Figure 4-4 TCP packet details displayed within Wireshark. Notice that Wireshark automatically associates TCP port 443 with its IANA-assigned default service, HTTPS. However, this interpretation is INCORRECT (as evidenced by the fact that the packet contents are not encrypted, and no protocol details are displayed under the heading “Secure Socket Layer”).

However, look carefully at the Packet Bytes window in Figure 4-4. Notice that the payload of this packet is clearly not encrypted! You can see cleartext ASCII (“thanks dude”) as well as HTML tags. Looking back up in the Packet Details window, notice that although Wireshark has identified the highest-layer protocol as “Secure Socket Layer,” there are no protocol details listed in this section. Wireshark identified the protocol in use based solely on the TCP port number (443), but the protocol is NOT actually SSL/TLS. As a result, Wireshark was unable to decode the protocol and display details under the “Secure Socket Layer” heading.
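One way to catch this kind of misidentification in bulk is to test whether a payload actually looks like SSL/TLS. The Python sketch below is a heuristic of our own, not Wireshark’s method: it checks whether a payload starts with a plausible SSL 3.0/TLS record header (a content type from 20 to 23, followed by a major version of 3). It can give false negatives when a TCP segment begins in the middle of a record, so treat its output as a lead to investigate. looks_like_tls_record is a hypothetical name.

```python
# TLS record content types: change_cipher_spec, alert, handshake, application_data.
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def looks_like_tls_record(payload: bytes) -> bool:
    """Heuristic: does the payload start with a plausible SSL 3.0/TLS record header?

    A record header is 5 bytes: content type (1), version (2), length (2).
    """
    if len(payload) < 5:
        return False
    content_type, major, minor = payload[0], payload[1], payload[2]
    length = int.from_bytes(payload[3:5], "big")
    return content_type in TLS_CONTENT_TYPES and major == 3 and minor <= 3 and length > 0


print(looks_like_tls_record(b"thanks dude"))  # False
```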

Analyze the function of the source or destination server (specified by IP address or hostname)

Often, server hostnames and domains provide clues as to their functions, which can in turn help investigators identify likely protocols in use.

So far we’ve determined that the NTP protocol is likely present within the Ann’s Bad AIM packet capture. We’ve also found that there is unidentified traffic traversing TCP port 443. Although Wireshark identified this traffic as SSL/TLS, we found that this was incorrect. Perhaps the IP addresses or hostnames in use will provide us with clues regarding the highest-layer protocol. . . .

Looking at Figure 4-4, the server IP address in Frame 167 is “64.12.24.50.” If you do a WHOIS lookup on this address, you will see that it is owned by “America Online Inc.” (at the time of this writing). Here is a snippet of the WHOIS registration information:

$ whois 64.12.24.50
...
NetRange:       64.12.0.0 - 64.12.255.255
CIDR:           64.12.0.0/16
...
OrgName:        America Online, Inc.
OrgId:          AMERIC-158
Address:        10600 Infantry Ridge Road
City:           Manassas
StateProv:      VA
PostalCode:     20109
Country:        US
RegDate:        1999-12-13
Updated:        1999-12-16
Ref:            http://whois.arin.net/rest/org/AMERIC-158

It would be reasonable for a network forensic investigator to hypothesize that the traffic associated with this server could be traffic commonly used to support AOL’s services, such as HTTP or AIM.
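When a capture contains many hosts, you can apply the same reasoning systematically by checking addresses against netblocks you have already looked up. This Python sketch uses the standard ipaddress module and the CIDR block from the WHOIS output above; in_netblock is a hypothetical helper, and because WHOIS ownership can change over time, the netblock reflects that lookup only.

```python
import ipaddress

# CIDR block taken from the WHOIS record above (ownership may have changed since).
AOL_NETBLOCK = ipaddress.ip_network("64.12.0.0/16")


def in_netblock(addr: str, network=AOL_NETBLOCK) -> bool:
    """True if addr falls inside the given network."""
    return ipaddress.ip_address(addr) in network


for addr in ("64.12.24.50", "192.168.1.158"):
    print(addr, in_netblock(addr))
```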

The AOL Instant Messenger (AIM) service was created by America Online in 1997, and remains one of the most popular and portable instant messaging services today. The earliest versions were designed for use by AOL employees, before AOL began providing the service to the general public. AIM software is very portable and can be used on Linux, Windows, or Apple systems.

AIM is based on the Open System for Communication in Realtime (OSCAR) protocol,²³ a proprietary, closed-source protocol developed by AOL. The company has worked to prevent others from building compatible clients. Even so, many have tried to reverse-engineer the protocol, with some success. Published documentation is very limited, but tools such as Wireshark include protocol dissectors, which are helpful as a guide for forensic analysts. AOL has licensed the OSCAR protocol for use in Apple’s iChat software as well as AIM.²⁴

23. “AOL Instant Messenger—Wikipedia, the free encyclopedia,” http://en.wikipedia.org/wiki/AOL_Instant_Messenger.

24. “OSCAR Protocol—Wikipedia, the free encyclopedia,” http://en.wikipedia.org/wiki/OSCAR_protocol.

To transfer a file, the sender and receiver initially communicate using Inter Client Basic Messages (ICBM) through a third-party messaging server. They use ICBM messages on channel 2 to negotiate the IP address, port, and other details of a direct file transfer. Once the direct connection is set up between sender and recipient on the specified port, OSCAR File Transfer (OFT) is used to transfer the file.

There is a good description of the process in the comments of the source code for Pidgin, a multiplatform chat client (pidgin-2.5.8/libpurple/protocols/oscar/oft.c).²⁵

25. “File List,” SourceArchive, http://pidgin.sourcearchive.com/documentation/1:2.5.8-1ubuntu2/files.html.

The OFT protocol is relatively simple. To transfer a file between two clients, the sender sends an OFT “prompt” header, with the “Type” set to “0x0101” to indicate it is ready to begin sending. The recipient sends an OFT “acknowledge” header back, with the “Type” set to “0x0202” to indicate that it is ready to receive data. Next, the sender sends the raw data to the recipient. Finally, the recipient responds with an OFT “done” packet, with the “Type” set to “0x0204” to indicate that it has finished receiving data. Figure 4-5 illustrates the OFT protocol in use.

Figure 4-5 Illustration of the OFT protocol used to transfer files through AIM.
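To make the ordering concrete, here is a minimal Python sketch (our own illustration, not code from Pidgin or any AIM client) that checks whether a sequence of observed events follows the prompt, acknowledge, data, done exchange described above. It assumes you have already extracted each OFT header's Type value and collapsed the raw file data into a single "data" event; only the three Type values named in the text are modeled.

```python
# Hypothetical model of the OFT exchange described in the text. Only the three
# header Type values named there are included; "data" stands for the raw file
# bytes, which are not wrapped in an OFT header.
OFT_TYPES = {0x0101: "prompt", 0x0202: "acknowledge", 0x0204: "done"}

EXPECTED_SEQUENCE = [
    ("sender", "prompt"),
    ("recipient", "acknowledge"),
    ("sender", "data"),
    ("recipient", "done"),
]


def label(event):
    """Map an OFT Type code to its name; the string "data" passes through."""
    if event == "data":
        return "data"
    return OFT_TYPES.get(event, f"unknown(0x{event:04x})")


def follows_oft_exchange(events):
    """events: (role, Type code or "data") pairs in capture order.

    Assumes all raw data segments were collapsed into one "data" event.
    """
    return [(role, label(ev)) for role, ev in events] == EXPECTED_SEQUENCE


transfer = [("sender", 0x0101), ("recipient", 0x0202),
            ("sender", "data"), ("recipient", 0x0204)]
print(follows_oft_exchange(transfer))  # True
```

In a real capture the file data usually spans many TCP segments, which is why the sketch expects them to be merged into one event before the check.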

Figure 4-6 is an example of an OFT2 header message. Note that, among other data, the header contains the protocol version (0x4F465432 is “OFT2” in ASCII), the “Type” (which indicates the purpose of the header), the transmitted file size, and the file name.²⁶

26. Jonathan Clark, “On Sending Files via OSCAR,” 2005, http://www.cs.cmu.edu/jhclark/aim/On%20Sending%20Files%20via%20OSCAR.odt.

Figure 4-6 Illustration of the OFT2 header structure.

Test for the presence of recognizable protocol structures

You can experimentally test for the presence of a specific protocol in a packet capture if you have some idea of the structure and common header values. For example, Figure 4-7 shows Frame 112 from the Ann’s Bad AIM packet capture. In this frame, we can see that there is traffic from source port 5190 (which Wireshark associates with “aol”), but this traffic is not decoded above Layer 4 (TCP). How can we identify the higher-layer protocol in use?

Figure 4-7 Frame #112 from “Ann’s Bad AIM.” Notice that Wireshark does not automatically identify a protocol higher than Layer 4 (TCP). However, analysis of the TCP payload shown in the Packet Bytes window reveals structures that correlate with the OSCAR File Transfer protocol.

Given that the traffic is associated with a port used by AOL, and we have prior evidence of AOL-related traffic, we might hypothesize that this unidentified traffic could be a proprietary AOL protocol that does not have a dissector included in our Wireshark distribution, such as OSCAR traffic.

To test this hypothesis, let’s examine the TCP packet payload. As you can see in Wireshark, the payload contents begin with 0x4F465432, or “OFT2” in ASCII. This matches the information that independent researchers have published regarding the OSCAR protocol header, shown in Figure 4-6.

According to publicly available documentation, Bytes 6 and 7 of the OFT2 header should indicate the “Type.” In Figure 4-7, we can see that Bytes 6 and 7 of our packet payload are “0x0101,” which do correspond with a valid Type value (“Prompt”). This indicates that the sender is ready to transmit a file.
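The same test can be automated. The sketch below is a hypothetical helper (the function name and the small type table are ours) that applies exactly the two checks we just made by hand: the first four payload bytes must be “OFT2” (0x4F465432), and Bytes 6 and 7, read in network byte order, must hold a Type value we recognize. A positive result supports the hypothesis but does not prove it, since an unrelated payload could happen to begin with these bytes.

```python
import struct

# Type values mentioned in this section (not an exhaustive list).
KNOWN_OFT_TYPES = {0x0101: "Prompt", 0x0202: "Acknowledge", 0x0204: "Done"}


def guess_oft2(payload: bytes):
    """Return the OFT Type name if a TCP payload looks like an OFT2 header, else None."""
    if len(payload) < 8 or payload[:4] != b"OFT2":
        return None
    (oft_type,) = struct.unpack(">H", payload[6:8])  # Bytes 6-7, big-endian
    return KNOWN_OFT_TYPES.get(oft_type)


# "OFT2", length 0x0100 (256), Type 0x0101, followed by the rest of the header.
print(guess_oft2(bytes.fromhex("4f46543201000101") + bytes(248)))  # Prompt
```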

4.1.3.2 Protocol Decoding

Protocol decoding is the technique of interpreting the data in a frame according to a specific, known structure, which allows you to correctly understand the meaning of each bit in the communication. Some bits in a protocol are used for describing the protocol itself or facilitating the communication. Other bits actually describe an encapsulated higher-layer protocol or its payload. In any case, understanding the purpose of specific bits within a frame and interpreting them according to a known protocol structure is a fundamental network forensics technique.

We rely very heavily on widely available tools for protocol decoding because few investigators have the time or resources to conduct in-depth protocol analysis, and in most cases it is not necessary to reinvent the wheel.

To decode network traffic according to a specific protocol specification, you can:

• Leverage publicly available automated decoders and tools

• Refer to publicly available documentation and manually decode the traffic

• Write your own decoder
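To give a sense of what the third option involves, here is a minimal Python decoder for a handful of OFT2 header fields. It is a sketch, not Franck Guénichot’s dissector or a complete implementation of the protocol: the field offsets are our own derivation from the PDML output in Section 4.1.3.3 (each pos value minus 54, the offset at which the OFT header begins in Frame 112), and we assume the ID string and filename are null-padded ASCII within the 256-byte header.

```python
import struct

# Offsets are relative to the start of the OFT2 header (the TCP payload),
# derived from the PDML "pos" values for Frame 112 minus 54.
FILENAME_OFFSET = 192


def decode_oft2(payload: bytes) -> dict:
    """Decode a subset of OFT2 header fields from a TCP payload."""
    if len(payload) < 8 or payload[:4] != b"OFT2":
        raise ValueError("not an OFT2 header")
    hdr_len, oft_type = struct.unpack_from(">HH", payload, 4)  # bytes 4-7
    if len(payload) < max(hdr_len, FILENAME_OFFSET):
        raise ValueError("truncated OFT2 header")
    (total_files,) = struct.unpack_from(">H", payload, 20)
    total_size, size, mod_time, checksum = struct.unpack_from(">IIII", payload, 28)
    (encoding,) = struct.unpack_from(">H", payload, 188)
    # ID string (32 bytes) and filename are null-padded in this capture.
    id_string = payload[68:100].split(b"\x00", 1)[0].decode("ascii", "replace")
    filename = payload[FILENAME_OFFSET:hdr_len].split(b"\x00", 1)[0]
    return {
        "length": hdr_len,
        "type": oft_type,
        "cookie": payload[8:16].hex(),
        "total_files": total_files,
        "total_size": total_size,
        "size": size,
        "mod_time": mod_time,
        "checksum": checksum,
        "id_string": id_string,
        "encoding": encoding,
        "filename": filename.decode("ascii", "replace"),  # assumes encoding 0x0000
    }
```

If the offsets are right, decoding the TCP payload of Frame 112 should agree with the values tshark reports for that frame, such as a total size of 12008 and the filename recipe.docx.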

Let’s return to Frame #167 in our packet capture, which Wireshark mistakenly identified as “Secure Socket Layer” traffic. Based on the source IP address and port number, we have hypothesized that this packet may actually contain traffic relating to an AOL protocol.

Figure 4-8 illustrates how we can use Wireshark to decode this packet as “AIM” traffic. The results are shown in Figure 4-9. Notice that extensive details regarding the “AOL Instant Messenger” protocol are now visible, including Channel ID, Sequence Number, and ICBM Cookie. The presence of this information, and the fact that the values make sense given the context, indicates that we have correctly identified and decoded this as AIM traffic.

Figure 4-8 Frame #167 from “Ann’s Bad AIM.” Wireshark incorrectly decoded this packet as “Secure Socket Layer.” This screenshot illustrates how to use Wireshark’s “Decode As” function to interpret the data as AIM traffic instead.

Figure 4-9 In this screenshot, Frame 167 has been decoded as AIM traffic. Notice that extensive details regarding the AIM protocol are now visible, indicating that our selection of AIM as the protocol was probably correct.

Earlier, we manually identified the encapsulated protocol in Frame #112 as OFT2 traffic. However, decoding OFT with Wireshark is significantly more difficult than decoding general AIM traffic because, as of the time of this writing, Wireshark does not include a built-in OFT protocol decoder.

Fortunately, the Wireshark suite includes extensive support and documentation for writing new plugins, and many additional plugins have also been freely published online by independent developers. For example, to solve the Ann’s Bad AIM puzzle, independent developer Franck Guénichot wrote his own OFT protocol dissector in Lua.²⁷ (Note that in order to use an external Lua plugin, you first need to enable Lua support in Wireshark’s init.lua configuration file.) Using Franck’s OFT Lua protocol dissector, we can automatically print details of the OFT file transfer within the packet capture. Here is an example from Franck’s OFT dissector in action, used with tshark:

27. Franck Guénichot, “Index of /contest01/Finalists/Franck_Guenichot,” December 17, 2009, http://forensicscontest.com/contest01/Finalists/Franck_Guenichot/.

$ tshark -r evidence01.pcap -X lua_script:oft-tsk.lua -R "oft" -n -R frame.number==112
112  61.054884 192.168.1.158 -> 192.168.1.159 OFT OFT2 Type: Prompt

As shown above, for frame #112 of Ann’s Bad AIM, tshark identified the higher-layer protocol as an OFT2 “Prompt” message, confirming our manual findings from earlier.

4.1.3.3 Exporting Fields

Once you have identified the protocol in use and determined your method for decoding it, the next step is to extract the values of specific fields of interest. It is trivial to do this graphically using Wireshark if you have the appropriate protocol dissector installed.

Let’s say that you would like to extract the AIM message sent between users in packet #167 of Ann’s Bad AIM. Once you configure Wireshark to decode packet #167 as AIM traffic (rather than SSL traffic), you can view the “Message” field in the Packet Details window and the corresponding bytes in the Packet Bytes window below. You can also use Wireshark’s “Export Selected Packet Bytes” function to save the contents of the selected field as a file that you can manipulate and analyze with other tools. Figure 4-10 shows the AIM Message field within packet #167 of Ann’s Bad AIM.

Figure 4-10 In this screenshot, Frame 167 has been decoded as AIM traffic. Wireshark displays individual fields such as the Message field, along with corresponding bytes, and allows the user to save the contents of these fields.

You can also use tshark to print any or all fields defined within the protocol dissection. Here is an example using Franck Guénichot’s Lua plugin to export all OFT details from Frame #112, such as the OFT filename, size, and more:

$ tshark -r evidence01.pcap -X lua_script:oft-tsk.lua -R "oft" -n -R frame.number==112 -V
Frame 112 (310 bytes on wire, 310 bytes captured)
...
Oscar File Transfer Protocol (256)
    Version: OFT2
    Length: 256
    Type: Prompt (0x0101)
    Cookie: 0000000000000000
    Encryption: None (0)
    Compression: None (0)
    Total File(s): 1
    File(s) Left: 1
    Total Parts: 1
    Parts Left: 1
    Total Size: 12008
    Size: 12008
    Modification Time: 0
    Checksum: 0xb1640000
    Received Resource Fork Checksum: 0xffff0000
    Ressource Fork: 0
    Creation Time: 0
    Resource Fork Checksum,base: 0xffff0000
    Bytes Received: 0
    Received Checksum: 0xffff0000
    Identification String: Cool FileXfer
    Flags: 0x00
    List Name Offset: 0
    List Size Offset: 0
    Dummy Block: 000000000000000000000000000000000000000000000000...
    Mac File Information:
    Encoding: ASCII (0x0000)
    Encoding Subcode: 0x0000
    Filename: recipe.docx

The following invocation demonstrates how to use tshark to produce PDML output using the -T option. First, we see general information and Layer 2 frame data. This is followed by Layer 3 IP protocol details, and then higher-layer encapsulated protocol information such as the OSCAR File Transfer Protocol information. The last field shown below is the name of the file that was transferred using OFT. Note that the output below has been abridged to show notable fields.

$ tshark -r evidence01.pcap -X lua_script:oft-tsk.lua -R "oft" -n -R frame.number==112 -T pdml
<?xml version="1.0"?>
<pdml version="0" creator="wireshark/1.2.11">
<packet>
  <proto name="geninfo" pos="0" showname="General information" size="310">
    <field name="num" pos="0" show="112" showname="Number" value="70" size="310"/>
    <field name="len" pos="0" show="310" showname="Frame Length" value="136" size="310"/>
    <field name="caplen" pos="0" show="310" showname="Captured Length" value="136" size="310"/>
    <field name="timestamp" pos="0" show="Aug 12, 2009 23:58:04.206379000" showname="Captured Time" value="1250143084.206379000" size="310"/>
  </proto>
...

  <proto name="ip" showname="Internet Protocol, Src: 192.168.1.158 (192.168.1.158), Dst: 192.168.1.159 (192.168.1.159)" size="20" pos="14">
    <field name="ip.version" showname="Version: 4" size="1" pos="14" show="4" value="45"/>
    <field name="ip.hdr_len" showname="Header length: 20 bytes" size="1" pos="14" show="20" value="45"/>
...
  <proto name="oft" showname="Oscar File Transfer Protocol (256)" size="256" pos="54">
    <field name="oft.version" showname="Version: OFT2" size="4" pos="54" show="OFT2" value="4f465432"/>
    <field name="oft.length" showname="Length: 256" size="2" pos="58" show="256" value="0100"/>
    <field name="oft.type" showname="Type: Prompt (0x0101)" size="2" pos="60" show="0x0101" value="0101"/>
    <field name="oft.cookie" showname="Cookie: 0000000000000000" size="8" pos="62" show="00:00:00:00:00:00:00:00" value="0000000000000000"/>
    <field name="oft.encrypt" showname="Encryption: None (0)" size="2" pos="70" show="0" value="0000"/>
    <field name="oft.compress" showname="Compression: None (0)" size="2" pos="72" show="0" value="0000"/>
    <field name="oft.totfil" showname="Total File(s): 1" size="2" pos="74" show="1" value="0001"/>
    <field name="oft.totsize" showname="Total Size: 12008" size="4" pos="82" show="12008" value="00002ee8"/>
    <field name="oft.size" showname="Size: 12008" size="4" pos="86" show="12008" value="00002ee8"/>
    <field name="oft.modtime" showname="Modification Time: 0" size="4" pos="90" show="0" value="00000000"/>
    <field name="oft.checksum" showname="Checksum: 0xb1640000" size="4" pos="94" show="0xb1640000" value="b1640000"/>
    <field name="oft.idstring" showname="Identification String: Cool FileXfer" size="32" pos="122" show="Cool FileXfer" value="436f6f6c2046696c655866657200000000000000000000000000000000000000"/>
    <field name="oft.encoding" showname="Encoding: ASCII (0x0000)" size="2" pos="242" show="0x0000" value="0000"/>
    <field name="oft.encsubcode" showname="Encoding Subcode: 0x0000" size="2" pos="244" show="0x0000" value="0000"/>
    <field name="oft.filename" showname="Filename: recipe.docx" size="12" pos="246" show="recipe.docx" value="7265636970652e646f637800"/>
  </proto>
</packet>

</pdml>

To show only specific fields, such as the filename and total size, you can use tshark’s -T and -e flags:

$ tshark -r evidence01.pcap -X lua_script:oft-tsk.lua -R "oft" -n -T fields -e "oft.filename" -e oft.totsize -R frame.number==112
recipe.docx 12008

Now we can easily see the filename (“recipe.docx”) and the file size (12,008 bytes). This filename is especially interesting given that Anarchy-R-Us is concerned that Ann Dercover may have leaked their secret recipe.
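Because PDML is ordinary XML, you can also post-process it with a general-purpose language. The sketch below uses Python’s standard xml.etree.ElementTree module to collect the show value of every field in a chosen protocol, one dictionary per packet. The function name is our own, and it assumes you saved complete tshark -T pdml output to a file; the abridged listing above contains literal “...” lines and would not parse as-is.

```python
import xml.etree.ElementTree as ET


def pdml_fields(pdml_text: str, proto_name: str) -> list:
    """Return one {field name: "show" value} dict per matching protocol layer."""
    root = ET.fromstring(pdml_text)
    results = []
    for packet in root.iter("packet"):
        for proto in packet.iter("proto"):
            if proto.get("name") == proto_name:
                # iter() also walks any nested <field> elements in this proto.
                results.append({f.get("name"): f.get("show") for f in proto.iter("field")})
    return results


# Usage (the file name is only an example):
# with open("frame112.pdml", encoding="utf-8") as f:
#     for fields in pdml_fields(f.read(), "oft"):
#         print(fields["oft.filename"], fields["oft.totsize"])
```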

4.2 Packet Analysis

Packet analysis refers to the art and science of inspecting the protocols within a set of packets. Network analysts and investigators often conduct packet analysis in order to identify packets of interest and understand their structure and relationship to gather evidence and facilitate further analysis.

To identify packets of interest, investigators generally use filtering techniques to isolate packets based on protocol fields or their contents. In addition, investigators may search for strings or patterns in packet contents to identify targets for further inspection, even if the protocol in use is not yet known.

Understanding the packet structure is very important for reconstructing communications, transferred files, or any other flow-based transaction. Careful dissection of a single packet or a small group of packets will often help investigators identify which tools are appropriate for evidence extraction and reconstruction.

Packet Analysis—Examination of contents and/or metadata of one or more packets. Packet analysis is typically conducted in order to identify packets of interest and develop a strategy for flow analysis and content reconstruction.

4.2.1 Packet Analysis Tools

Packet analysis is fun! Using widely available tools, you can dissect packets and extract all kinds of details. In this section, we talk about Wireshark/tshark display filters, which make it easy to isolate packets of interest. We also touch on “ngrep” (one of our favorite tools), which is essentially “grep” for packet captures. Finally, we introduce hex editors in the context of packet analysis, and show how we can use them to manually slice and splice packet data.

4.2.1.1 Wireshark/tshark Display Filters

Wireshark (and its command-line counterpart, tshark) includes a “display filter” language that allows the end user to isolate packets of interest based on protocol fields.²⁸

28. “Wireshark,” http://www.wireshark.org/docs/man-pages/wireshark-filter.html.

According to Wireshark, there are over 105,000 display filters that users can choose from—which is to say that there are over 105,000 protocol fields that Wireshark’s parser understands, as it is designed to allow you to filter on any field in any protocol it can parse.²⁹

29. “Wireshark—Display Filter Reference,” http://www.wireshark.org/docs/dfref/.

As more protocol parsers are included in Wireshark, its ability to filter based on those protocols continues to develop. Furthermore, Wireshark provides an open plugin architecture that allows anyone to build a new protocol parser to filter with as well.

In addition to the published documentation released online by Wireshark, the graphical Wireshark tool includes handy references for looking up display filters. You can click on the “Expression” button in the Filter Toolbar to access a box that will help you build a filter of your choice (see Figure 4-11). Even more intuitively, if you identify a field of interest in the Packet Details window, you can right-click on it and click “Apply As Filter” to filter using that field and its value. You can also view a list of supported protocols and display filters by going to Help → Supported Protocols.

Figure 4-11 Screenshot of the “Filter Expression” box accessed from the Filter toolbar in Wireshark.

You can use Wireshark’s display filters from the command line in tshark using the -R option. For example:

$ tshark -r capturefile.pcap -R "ip.src==192.168.1.158 && ip.dst==10.1.1.10"

4.2.1.2 ngrep

Ngrep is an excellent libpcap-based tool designed for identifying packets of interest based on the presence (or absence) of particular strings, binary sequences, or patterns anywhere within the packet. According to the author, Jordan Ritter, ngrep is designed “to provide most of GNU grep’s common features, applying them to the network layer.”³⁰

30. Jordan Ritter, “ngrep(8): network grep—Linux man page,” http://linux.die.net/man/8/ngrep.

Using ngrep, you can search a set of packets for literal ASCII strings, or binary sequences (specified in hex), that appear anywhere in a packet’s payload. It can write out the packets that match to a separate file. It also recognizes common protocols such as IP, TCP, UDP, and ICMP, and prints out summary details such as IP addresses and port numbers for matching packets.

Ngrep doesn’t include flow reconstruction capabilities, so it inspects each packet atomically for the specified match criteria. This has two important implications for forensic investigators. First, if the matching content spans packets, ngrep will not detect it. Second, if ngrep finds a match, it will only indicate the matching packet, rather than the matching flow.³¹ Nevertheless, for a first-pass analysis or a quick dirty-word search, ngrep is invaluable; other tools can be used to reconstruct the contents of the flows that contain the matching packets.

31. Jordan Ritter, “ngrep—network grep,” November 18, 2006, http://ngrep.sourceforge.net.
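The first limitation is easy to demonstrate. In this toy example (the segment contents are made up for illustration), the word “recipe” is split across two TCP segments. Searching each payload on its own, as ngrep does, finds nothing; searching the concatenated stream finds the match. The sketch assumes the segments are already in order and belong to a single flow.

```python
import re


def per_packet_matches(payloads, pattern):
    """ngrep-style: test each packet payload on its own; return matching indexes."""
    rx = re.compile(pattern)
    return [i for i, p in enumerate(payloads) if rx.search(p)]


def stream_matches(payloads, pattern):
    """Search the concatenated payloads of one in-order stream; return offsets."""
    return [m.start() for m in re.finditer(pattern, b"".join(payloads))]


# Made-up segments: the keyword "recipe" straddles a packet boundary.
segments = [b"Here's the secret rec", b"ipe... I just downloaded it"]
print(per_packet_matches(segments, rb"recipe"))  # []
print(stream_matches(segments, rb"recipe"))      # [18]
```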

Basic usage for analyzing a capture file with ngrep:

$ ngrep -I capturefile.pcap "string to search for"

The above command searches for the string “string to search for” and prints all matching packets in ASCII format. To include hexadecimal output as well, add the -x flag.

Many libpcap-based tools have interfaces for specifying BPF filters, and ngrep is no exception. The command below runs ngrep with a BPF filter designed to include only packets with a source IP address of 192.168.1.20 and destination port 80.

$ ngrep -I capturefile.pcap "string to search for" 'src host 192.168.1.20 and dst port 80'

For more information, see the excellent examples Jordan Ritter provides for the use of his tool.³²

32. “ngrep—network grep,” February 10, 2005, http://ngrep.sourceforge.net/usage.html.

4.2.1.3 Hex Editors

Hex editors allow you to view and modify the raw bits of data, including packet capture contents. Although you can use tools such as tcpdump and Wireshark to view the raw bits of specific packets and protocol fields, hex editors are indispensable for analyzing and isolating packet fragments and for file carving (which we discuss later in this chapter).

Sometimes tools like Wireshark, NetworkMiner, and tcpxtract can automatically export files and events for you with little effort, but not always. There are times when the only way to reconstruct data is with manual effort, reading the protocol specification, extracting data, and rebuilding content by hand.

For example, sometimes the higher-layer protocol used for transmission is not recognized by a forensic analysis tool. The tool Loki tunnels data over ICMP, but common protocol dissectors are not designed to recognize the Loki tunneling protocol, correctly interpret the tunneled data, and reconstruct such data streams. Other times, an additional layer of complexity may prevent a tool from automatically extracting evidence. For instance, HTTP servers sometimes compress content using algorithms such as gzip. Forensic analysis tools often cannot “see” inside the compressed data to extract images or content that would have been viewed by the end-user. Instead, the forensic investigator may need to manually recognize that the content is compressed and use an alternate tool to carve and decompress the data.
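As a rough illustration of that manual step, the Python sketch below locates a gzip header (the bytes 0x1F 0x8B followed by compression method 0x08, deflate) inside a blob such as a reassembled HTTP response, and decompresses from that point. It assumes the body is a single gzip member that is not split up by chunked transfer encoding; a chunked body would need its chunk-size lines removed first. The function name is our own.

```python
import gzip
import zlib

# gzip ID bytes (0x1F 0x8B) followed by compression method 8 (deflate).
GZIP_MAGIC = b"\x1f\x8b\x08"


def carve_gzip(blob: bytes) -> bytes:
    """Locate the first gzip member in blob and return its decompressed contents."""
    start = blob.find(GZIP_MAGIC)
    if start < 0:
        raise ValueError("no gzip header found")
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    # Bytes after the end of the member are left in .unused_data, and a
    # truncated member yields the output produced so far instead of raising.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decomp.decompress(blob[start:])


# Synthetic example: an HTTP response whose body is gzip-compressed.
response = (b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
            + gzip.compress(b"<html>secret</html>"))
print(carve_gzip(response))  # b'<html>secret</html>'
```

We use zlib.decompressobj rather than gzip.decompress because captured streams are often incomplete, and the streaming decompressor tolerates missing trailing bytes.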

Find a hex editor that you feel comfortable with (such as the nice GUI tool “Bless”) and become proficient with it.

Figure 4-12 shows an example of the hex editor “Bless.” In this example, we’ve used Bless to open a PDF file. Bless is a powerful and easy-to-use hex editor. You can search for binary sequences or strings in the file. Here we searched for the string “PDF,” which we find at byte offset 0x01 through 0x03 (see the bottom right panel). The “magic number” “0x25504446,” or “%PDF” in ASCII, indicates the beginning of a PDF document. Using our hex editor, we can modify, add, or delete sections of the file.

Figure 4-12 Screenshot of the Bless hex editor.
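Recognizing magic numbers is a large part of this kind of work. The short Python sketch below checks a buffer against a small, illustrative table of well-known signatures and reports every offset where one occurs, much like a search in a hex editor. The table is ours and far from complete. Note that .docx files such as recipe.docx are ZIP containers and therefore begin with the ZIP signature “PK\x03\x04”.

```python
# Small, illustrative signature table ("magic numbers" found at offset 0).
SIGNATURES = [
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP container (.zip, .docx, .xlsx, ...)"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x1f\x8b", "gzip data"),
]


def identify(data: bytes) -> str:
    """Name the file type whose signature appears at offset 0, if any."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def find_signatures(data: bytes) -> list:
    """Return sorted (offset, type) pairs for every signature anywhere in data."""
    hits = []
    for magic, name in SIGNATURES:
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


print(identify(b"%PDF-1.4\n"))                    # PDF document
print(f"{int.from_bytes(b'%PDF', 'big'):#010x}")  # 0x25504446
```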

4.2.2 Packet Analysis Techniques

There are three basic techniques that are indispensable when conducting packet analysis:

• Pattern Matching—Identify packets of interest by matching specific values within the packet capture.

• Parsing Protocol Fields—Extract the contents of protocol fields.

• Packet Filtering—Separate packets based on the values of fields in protocol meta-data.

While there are certainly other packet analysis techniques, these three are fundamental and incorporated into nearly every investigation that involves packet analysis.

4.2.2.1 Pattern Matching

Is Ann Dercover really a secret agent? Did she export Anarchy-R-Us’s secret recipe? We’ve analyzed the packet capture of her activities and discovered that the AIM protocols were in use, and that a file may have been transferred. This doesn’t bode well for Anarchy-R-Us!

Let’s quickly search the packet capture for strings or patterns of interest to us. In hard drive forensics, it is common to do a “dirty word search” when searching for data of interest on the hard drive. A “dirty word list” is a list of strings, names, patterns, etc., that may be related to the suspicious activities under investigation. Network forensic investigators can leverage the concept of a “dirty word list” to identify packets or data of interest within a network traffic capture.

“Ngrep” is one of the best tools for launching a dirty word search on a packet capture. Let’s use it to examine Ann’s Bad AIM packet capture.

First, we’ll create a short list of “dirty words” that are of interest to us. Given the context of the investigation, let’s include:

• recipe

• secret

• Ann

Next, let’s use these dirty words in a regular expression as the input to ngrep, and print packets that match:

$ ngrep -I evidence01.pcap 'secret|recipe|Ann'
input: evidence01.pcap
match: secret|recipe|Ann
#################
T 192.168.1.158:51128 -> 64.12.24.50:443 [AP]
  *..a...........E4628778....Sec558user1....................Here's the secret
       recipe... I just downloaded it from the file server. Jus
  t copy to a thumb drive and you're good to go >:-)....
###############################################################
T 192.168.1.158:51128 -> 64.12.24.50:443 [AP]
  *..c.z.........G7174647....Sec558user1.......R..7174647..F.CL...."DEST
      .......................F..........'...........recipe.docx.
##################
T 192.168.1.158:5190 -> 192.168.1.159:1272 [AP]
  OFT2.....................................d..........................Cool
      FileXfer...................................................
  ............................................................recipe.docx
      .....................................................
#####
T 192.168.1.159:1272 -> 192.168.1.158:5190 [AP]
  OFT2....7174647..........................d..........................Cool
      FileXfer................... ...............................
  ............................................................recipe.docx
      .....................................................
################
T 192.168.1.159:1272 -> 192.168.1.158:5190 [AP]
  OFT2....7174647..........................d.......................d..Cool
      FileXfer................... ...............................
  ............................................................recipe.docx
      .....................................................
#########################################################################
    exit

In the output above, we see a snippet of a conversation indicating “Here’s the secret recipe.” Very interesting! The source IP address of the packet containing this snippet of conversation was “192.168.1.158” (known to be Ann’s computer) and the destination was “64.12.24.50.” Subsequently, we see strings associated with an OFT2 file transfer (“recipe.docx”). The destination IP address of the OFT2 file transfer is a system on the local network, 192.168.1.159, but the source is still the same: Ann’s computer.

4.2.2.2 Parsing Protocol Fields

Parsing protocol fields is the practice of extracting the contents of protocol fields within packets of interest. Let’s use tshark to extract all the AIM message data from this packet capture. Perhaps we will see other interesting conversation snippets.

$ tshark -r evidence01.pcap -d tcp.port==443,aim -T fields -n -e "aim.messageblock.message"
Here's the secret recipe... I just downloaded it from the file server. Just copy to a thumb drive and you're good to go >:-)
<HTML><BODY><FONT FACE="Arial" SIZE=2 COLOR=#000000>thanks dude</FONT></BODY></HTML>
<HTML><BODY><FONT FACE="Arial" SIZE=2 COLOR=#000000>can't wait to sell it on ebay</FONT></BODY></HTML>
see you in hawaii!

Fascinating! It appears from this conversation that the recipient of the file is planning on selling it on eBay, and that the two are planning a rendezvous in Hawaii.
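Two of the messages arrived wrapped in HTML formatting. When you extract many messages, it can help to strip the markup programmatically. Here is a minimal sketch using Python’s standard html.parser module; it keeps only text content and discards tags such as <FONT> and <BODY>. The helper names are our own.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Accumulates character data and ignores all tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def message_text(raw: str) -> str:
    """Return the visible text of an AIM message, with any HTML markup removed."""
    collector = _TextCollector()
    collector.feed(raw)
    collector.close()  # flush any buffered text
    return "".join(collector.parts).strip()


raw = '<HTML><BODY><FONT FACE="Arial" SIZE=2 COLOR=#000000>thanks dude</FONT></BODY></HTML>'
print(message_text(raw))  # thanks dude
```

Plain-text messages, such as the first and last lines above, pass through unchanged.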

4.2.2.3 Packet Filtering

Packet filtering is the art of separating packets based on the values of fields in protocol metadata or payload. Commonly, investigators filter packets using either BPF filters or Wireshark display filters. We demonstrate each of these in turn.

Filtering with BPF

Earlier we identified two hosts, 192.168.1.158 (Ann’s computer) and 64.12.24.50 (likely an AOL AIM server), that exchanged an interesting conversation snippet, “Here’s the secret recipe . . .”. We subsequently found other conversation snippets in the same packet capture.

To reduce the volume of traffic that we have to inspect, let’s use tcpdump with a BPF filter to dump out all packets between the same source and destination IP address as the first suspicious conversation snippet that we found.

$ tcpdump -s 0 -r evidence01.pcap -w evidence01-talkers.pcap 'host 64.12.24.50 and host 192.168.1.158'
Reading from file evidence01.pcap, link-type EN10MB (Ethernet)
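To see what a filter like this does under the hood, the same selection can be sketched in Python. The code below is a simplified stand-in for the BPF expression 'host 64.12.24.50 and host 192.168.1.158', not a BPF implementation: it reads only the classic libpcap file format with microsecond timestamps (not pcapng), assumes Ethernet framing without VLAN tags, and examines IPv4 packets only, whereas BPF’s host primitive also matches ARP and RARP. The function names are our own.

```python
import socket
import struct

PCAP_GLOBAL_HDR_LEN = 24
PCAP_RECORD_HDR_LEN = 16


def iter_pcap(data: bytes):
    """Yield (record header, packet bytes) from a classic libpcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":    # little-endian, microsecond timestamps
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":  # big-endian, microsecond timestamps
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    offset = PCAP_GLOBAL_HDR_LEN
    while offset + PCAP_RECORD_HDR_LEN <= len(data):
        rec_hdr = data[offset:offset + PCAP_RECORD_HDR_LEN]
        # Record header fields: ts_sec, ts_usec, incl_len (bytes saved), orig_len.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", rec_hdr)
        start = offset + PCAP_RECORD_HDR_LEN
        yield rec_hdr, data[start:start + incl_len]
        offset = start + incl_len


def ipv4_addrs(frame: bytes):
    """Return (src, dst) for an untagged Ethernet/IPv4 frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType 0x0800
        return None
    # The IP header starts after the 14-byte Ethernet header; source and
    # destination addresses sit 12 and 16 bytes into it.
    return socket.inet_ntoa(frame[26:30]), socket.inet_ntoa(frame[30:34])


def filter_host_pair(data: bytes, host_a: str, host_b: str) -> bytes:
    """Keep IPv4 packets exchanged between host_a and host_b (either direction)."""
    out = [data[:PCAP_GLOBAL_HDR_LEN]]  # reuse the original global header
    for rec_hdr, frame in iter_pcap(data):
        addrs = ipv4_addrs(frame)
        if addrs is not None and set(addrs) == {host_a, host_b}:
            out.append(rec_hdr + frame)
    return b"".join(out)


# Usage (the output file name is only an example):
# with open("evidence01.pcap", "rb") as f:
#     talkers = filter_host_pair(f.read(), "64.12.24.50", "192.168.1.158")
# with open("evidence01-talkers-py.pcap", "wb") as f:
#     f.write(talkers)
```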

Wireshark Display Filters

Let’s open this filtered packet capture (evidence01-talkers.pcap) in Wireshark and see if we can get more information about the context of this conversation. As shown in Figure 4-13, we only have to browse to Frame 3 to see something we’ve seen before: cleartext traffic on port 443. We can also see that the Packet Bytes window displays the message containing “Here’s the secret recipe . . .”. Just as before, we can identify the protocols in use and manually configure Wireshark to decode this packet as AIM traffic with the “Decode As . . .” feature (see Figure 4-8).

Figure 4-13 The filtered packet capture, evidence01-talkers.pcap, opened in Wireshark. The traffic relating to TCP port 443 has been decoded as AIM.

In order to negotiate an OFT file transfer, the sender’s AIM client begins listening for direct connections on a specific port and sends a channel 2 ICBM packet to the AOL server. This packet contains the port and IP address to which the remote peer should connect to receive the file.

Let’s use Wireshark’s display filters to search for channel 2 ICBM packets to see if we can find the packet that contains the file transfer negotiation details. This will allow us to link the conversation snippet, “Here’s the secret recipe . . .” to an actual file transfer between two direct peers.

Figure 4-14 shows the display filter aim_messaging.channelid==0x0002, which picks out ICBM channel 2 messages. After applying this filter, three packets are displayed. The first packet in the list shows the string “recipe.docx” in the Packet Bytes window. Suspicious!

Figure 4-14 A screenshot of Wireshark with a display filter that selects AIM ICBM channel 2 messages.

In Figure 4-15, we scroll down in the Packet Details window to show the inbound IP address (192.168.1.158) and port (TCP port 5190) offered by the initiating client. If the file transfer is accepted by the remote peer, we should expect to see a subsequent direct connection from the remote peer’s IP address to the IP address and port specified in this packet. (Later, in Section 4.3.2, we will see that this direct connection does exist, and we will analyze it.) This packet is what links the AIM conversation, transmitted through AOL’s servers, to a subsequent direct OFT file transfer between two peers.

Figure 4-15 A screenshot of Wireshark displaying the IP address and port offered by the sender for an OFT file transfer.

4.3 Flow Analysis

Flow analysis is the practice of examining related groups of packets in order to identify patterns, analyze higher-layer protocols, or extract data. Once you have identified and studied the protocols of interest, you will likely want to reconstruct sequences of events and recover the original data. With some well-known higher-layer protocols, such as HTTP, there are many tools that can do this automatically. With other protocols, especially those that have been bent or broken, you may have to do your own independent analysis, and perhaps even write some code to extract the data.

Flow analysis—Examination of sequences of related packets (“flows”). Flow analysis is typically conducted in order to identify traffic patterns, isolate suspicious activity, analyze higher-layer protocols, or extract data.

As the networking industry evolves, so too has the use of the term “flow.” Increasingly, the term “flow analysis” in certain contexts has been used to refer to statistical analysis of flow records or flow metadata. We discuss flow statistics extensively in the next chapter. In this chapter, we focus on the analysis of the full flow content rather than simply the metadata.

The term “stream” is increasingly used interchangeably with “flow” in common parlance, particularly where data segment reassembly and subsequent content analysis are the main goals (as opposed to the statistical analysis of the sequence of segments or packets involved). In particular, Marty Roesch’s labeling of Snort’s flow reassembly processing module as “Stream” (which originally only processed TCP flows, but which now has evolved to handle flows over UDP as well) led to much of the IDS community referring to such methods as “stream reassembly.” Similarly, Wireshark’s “Follow TCP Stream” function has been augmented by “Follow UDP Stream” and “Follow SSL Stream.” We use the term “stream” in all of these contexts throughout this book.

In RFC 3697, a “flow” is defined as “a sequence of packets sent from a particular source to a particular unicast, anycast, or multicast destination that the source desires to label as a flow. A flow could consist of all packets in a specific transport connection or a media stream. However, a flow is not necessarily 1:1 mapped to a transport connection.” For practical purposes, identifying all of the TCP segments that make up a flow can be done strictly within the TCP protocol itself at Layer 4, because TCP has built-in mechanisms to provide this accounting for the purposes of reliability, flow control, and state maintenance. Flows can also be constructed on top of other transport-layer protocols, including UDP, which is connectionless and provides no comparable mechanisms to track or define flows. As implied by RFC 3697, if an endpoint “desires” to define a flow, the flow itself will be defined by some protocol that manages connection establishment, maintenance, and completion. Likewise, an intermediary system (such as a switch, router, or firewall) can be configured to track and define flows.

Regardless of whether traffic is transmitted via TCP, a specific flow is defined by some protocol that establishes the connection, keeps track of the data segments sent or received, and terminates the connection when appropriate.
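A common way to put this into practice for TCP and UDP is to group packets by their 5-tuple (protocol, source address and port, destination address and port), treating both directions as one flow. The sketch below is illustrative: the packet list is hypothetical, although the addresses and ports come from the ngrep output earlier in this chapter. It ignores connection setup and teardown, so two separate connections that reuse the same 5-tuple would be merged into one flow; real tools also use TCP flags and timeouts to split them.

```python
from collections import defaultdict


def flow_key(src, sport, dst, dport, proto):
    """Direction-independent 5-tuple: both directions map to the same key."""
    a, b = (src, sport), (dst, dport)
    # Order the two endpoints consistently so A->B and B->A collide.
    return (proto,) + (a + b if a <= b else b + a)


def group_flows(packets):
    """packets: iterable of (src, sport, dst, dport, proto) tuples.

    Returns {flow key: [packet indexes]} in capture order.
    """
    flows = defaultdict(list)
    for i, (src, sport, dst, dport, proto) in enumerate(packets):
        flows[flow_key(src, sport, dst, dport, proto)].append(i)
    return flows


packets = [
    ("192.168.1.158", 51128, "64.12.24.50", 443, "tcp"),
    ("192.168.1.158", 5190, "192.168.1.159", 1272, "tcp"),
    ("192.168.1.159", 1272, "192.168.1.158", 5190, "tcp"),
    ("64.12.24.50", 443, "192.168.1.158", 51128, "tcp"),
]
for key, idx in group_flows(packets).items():
    print(key, idx)
```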

4.3.1 Flow Analysis Tools

There are several popular tools available that are capable of isolating, reconstructing, and exporting flows. These include Wireshark, tshark, tcpflow, and pcapcat. The tcpxtract tool takes flow reconstruction to the next level, and automatically carves files out of TCP streams using magic numbers. We discuss each of these tools below.
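Header/footer carving of the sort tcpxtract performs can be sketched in a few lines. This Python example is not tcpxtract’s code or configuration format; it simply returns every byte range that begins with a caller-supplied header signature and ends at the next matching footer, with a size cap to avoid runaway matches. The JPEG signatures in the usage example (0xFFD8FF start, 0xFFD9 end) are common choices, but files that contain an earlier copy of the footer, such as JPEGs with embedded thumbnails, will be cut short by this naive approach.

```python
def carve(stream: bytes, header: bytes, footer: bytes, max_len: int = 10_000_000):
    """Return every byte range starting with `header` and ending at the next `footer`."""
    found = []
    start = stream.find(header)
    while start != -1:
        # Look for the footer only within max_len bytes of the header.
        end = stream.find(footer, start + len(header), start + max_len)
        if end != -1:
            found.append(stream[start:end + len(footer)])
            start = stream.find(header, end + len(footer))
        else:
            start = stream.find(header, start + 1)  # no footer: try the next header
    return found


jpeg_like = b"\xff\xd8\xff\xe0" + b"body" + b"\xff\xd9"  # synthetic, not a real image
stream = b"GET /a.jpg" + jpeg_like + b"junk" + jpeg_like
print(len(carve(stream, b"\xff\xd8\xff", b"\xff\xd9")))  # 2
```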

4.3.1.1 Wireshark: Follow TCP Stream

Wireshark has a function called “Follow TCP Stream” (shown in Figure 4-16) which allows you to select any packet that is part of a TCP stream in the Packet List view, and have Wireshark automatically reconstruct the full duplex contents of that stream from beginning to end if it is contained within the packet capture. Those contents can then be saved in different forms. Either side of the conversation can be saved independently. In that way, conversations, transactions, and file transfers that span multiple packets in a stream can be reconstructed in their entirety, as long as the packet capture contains all the necessary data.

Figure 4-16 An example of Wireshark’s “Follow TCP Stream” function.

Figure 4-17 is a screenshot of Wireshark’s “Follow TCP Stream” feature used to reconstruct and display both sides of a TCP stream (dark gray text represents traffic flowing in one direction, and light gray is the other direction). Notice the long button across the bottom, which allows you to select “Entire Conversation” or only one direction. You can also view in ASCII, Hex, Raw, or other formats.

Figure 4-17 Wireshark’s “Follow TCP Stream” function in action. The dark gray text represents traffic flowing in one direction, whereas the light gray text represents traffic flowing in the opposite direction. (In real life these colors were pink and blue, respectively; the image was modified to print in grayscale.)

At the time of this writing, “Follow TCP Stream” was one of the features of the graphical Wireshark interface that had no corresponding command-line option in tshark.

4.3.1.2 Conversations in Wireshark and tshark

Wireshark and tshark define a term, “conversation,” which is similar to a flow. According to the published Wireshark manual, “A network conversation is the traffic between two specific endpoints.”³³

33. “8.4. Conversations,” Wireshark, 2011, http://www.wireshark.org/docs/wsug_html_chunked/ChStatConversations.html.

Both Wireshark and tshark provide support for displaying statistics for different types of conversations in the packet capture, including IP, TCP, and UDP conversations.
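To make the idea concrete, here is a minimal Python sketch of the bookkeeping behind a TCP conversation report such as tshark’s “-z conv,tcp”. It assumes packets have already been decoded into hypothetical (source IP, source port, destination IP, destination port, frame length) tuples, and it illustrates the concept rather than Wireshark’s implementation. Each conversation is keyed on the unordered pair of endpoints, so traffic in both directions lands in the same entry while the per-direction counts stay separate.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Endpoint = Tuple[str, int]                 # (IP address, port)
Packet = Tuple[str, int, str, int, int]    # hypothetical: (src IP, src port, dst IP, dst port, frame length)
ConvKey = Tuple[Endpoint, Endpoint]


def conversations(packets: Iterable[Packet]) -> Dict[ConvKey, Dict[str, List[int]]]:
    """Group packets by unordered endpoint pair, counting [frames, bytes] per direction."""
    stats: Dict[ConvKey, Dict[str, List[int]]] = defaultdict(
        lambda: {"a_to_b": [0, 0], "b_to_a": [0, 0]}
    )
    for src, sport, dst, dport, frame_len in packets:
        s, d = (src, sport), (dst, dport)
        a, b = sorted((s, d))          # canonical order: both directions share one key
        direction = "a_to_b" if s == a else "b_to_a"
        counters = stats[(a, b)][direction]
        counters[0] += 1               # frames
        counters[1] += frame_len       # bytes, counting whole frames (headers included)
    return dict(stats)
```

Sorting the endpoints only gives each pair a stable key; it says nothing about which host initiated the connection, which is a separate question.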

4.3.1.3 tcpflow

The quintessential command-line tool for extracting the data from TCP streams is “tcpflow.” Originally released by Jeremy Elson in 1999, tcpflow can parse nonfragmented IP packets and reassemble and extract the payloads of any TCP stream it finds in a libpcap packet capture. In that way its output is similar to Wireshark’s “Follow TCP Stream” function, but it extracts all the streams in one fell swoop, saving their contents to files identified by the quartet of socket elements: source IP and port and destination IP and port.³⁴

34. Jeremy Elson, “CircleMUD,” August 7, 2003, http://www.circlemud.org/jelson/software/tcpflow/.

4.3.1.4 pcapcat

A more recent and similarly useful tool is pcapcat by Kristinn Guðjónsson. Released in September 2009 as part of his winning entry in the inaugural ForensicsContest.com contest,³⁵ pcapcat is a Perl script that by default reads a libpcap capture and lists all of the streams it sees. It allows for interactive analysis and can also dump out individual streams upon selection.³⁶

35. Kristinn Guðjónsson, “Forensics Contest,” August 14, 2009, http://forensicscontest.com/contest01/Finalists/Kristinn_Gudjonsson/answer.txt.

36. Kristinn Guðjónsson, “pcapcat,” 2009, http://blog.kiddaland.net/dw/pcapcat.

Just as most protocols can be identified by well-known sequences of bytes near the zero offset (as discussed above), almost all file formats have “headers” that lead with a few identifying bytes that can be catalogued. Many have common terminating sequences, or “footers,” as well.

These are often referred to as “magic numbers,” and if they can be catalogued they can be searched for in any arbitrary volume of bytes. The filesystem forensics tool “Foremost” takes exactly this approach. Upon finding an instance of a magic number, it begins carving contiguous data bytes until a preconfigured limit is reached, or the footer for the file (if any is specified) is encountered. This data is saved to a file with the appropriate file extension appended to the name.³⁷

37. “Foremost,” March 1, 2010, http://foremost.sourceforge.net/.
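The header/footer approach can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (exact byte signatures with no wildcards, and the footer included in the carved output), not Foremost’s actual code; the function name carve is hypothetical.

```python
from typing import Iterator, Optional, Tuple


def carve(data: bytes, header: bytes, footer: Optional[bytes],
          max_len: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, carved_bytes) for every occurrence of header in data.

    Carving stops just after the first footer found within max_len bytes of the
    header; otherwise it stops after max_len bytes or at the end of data.
    """
    start = data.find(header)
    while start != -1:
        limit = min(start + max_len, len(data))
        end = limit
        if footer is not None:
            f = data.find(footer, start + len(header), limit)
            if f != -1:
                end = f + len(footer)      # keep the footer in the carved file
        yield start, data[start:end]
        start = data.find(header, start + 1)
```

For example, list(carve(blob, b"GIF89a", b"\x00\x3b", 3000000)) would return every GIF89a candidate in blob along with its offset.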

4.3.1.5 tcpxtract

Tcpxtract may be one of the coolest tools for the network forensic analyst. Released into the public domain in 2005 by Nick Harbour, tcpxtract is a libpcap-based tool designed specifically to extract and reconstruct payload data based on file signatures. Analogous to “foremost” for data-layer recovery from filesystem images, tcpxtract inspects reconstructed TCP streams for the beginning sequences of file formats that it knows about and carves the corresponding data out of the packets, to the extent that it is able. It has a configuration file that is very similar to foremost’s (and comes with a script to convert foremost configuration files to the format used by tcpxtract).³⁸

38. Nick Harbour, “tcpxtract,” October 13, 2005, http://tcpxtract.sourceforge.net/.

Here’s a snippet of a configuration file:

#----------------------------------------------------
# GRAPHICS FILES
#---------------------------------------------------
#
# AOL ART files
art(150000,\x4a\x47\x04\x0e, \xcf\xc7\xcb);
art(150000,\x4a\x47\x03\x0e, \xd0\xcb\x00\x00);
# GIF and JPG files (very common)
gif(3000000, \x47\x49\x46\x38\x37\x61, \x00\x3b);
gif(3000000, \x47\x49\x46\x38\x39\x61, \x00\x00\x3b);
jpg(1000000, \xff\xd8\xff\xe0\x00\x10, \xff\xd9);
jpg(1000000, \xff\xd8\xff\xe1);
jpg(1000000, \xff\xd8\xff\xe0);
# PNG (used in web pages)
png(1000000, \x50\x4e\x47?, \xff\xfc\xfd\xfe);

Each line describes one variant of a file format, beginning with the file type name, which is also used as the extension of carved files. The first parameter inside the parentheses is the number of bytes to carve before giving up on trying to find the end sequence, if one is provided. The second parameter is the hexadecimal byte sequence that, upon discovery, triggers the carving of data assumed to belong to the named file type. The third, optional parameter is the byte sequence that the file format uses to signal “end of file” (EOF), if one is known to exist.
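A small parser for lines in this format shows how the three parameters fit together. This Python sketch is based only on the format described above, not on tcpxtract’s own parser; in particular, treating “?” (as in the PNG line) as a one-byte wildcard is our assumption. The names Signature, parse_pattern, and parse_line are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

Pattern = List[Optional[int]]   # None stands for a one-byte wildcard (assumed meaning of "?")


class Signature(NamedTuple):
    ext: str                    # file type name, also the carved file's extension
    max_len: int                # bytes to carve before giving up on the footer
    header: Pattern
    footer: Optional[Pattern]


LINE_RE = re.compile(r"^(\w+)\s*\(\s*(\d+)\s*,\s*([^,)]+?)\s*(?:,\s*([^,)]+?)\s*)?\)\s*;")
TOKEN_RE = re.compile(r"\\x([0-9a-fA-F]{2})|(\?)|(.)", re.S)


def parse_pattern(text: str) -> Pattern:
    """Turn hex-escape text into byte values, with None for a '?' wildcard."""
    out: Pattern = []
    for hex_byte, wildcard, literal in TOKEN_RE.findall(text):
        if hex_byte:
            out.append(int(hex_byte, 16))
        elif wildcard:
            out.append(None)
        else:
            out.append(ord(literal))
    return out


def parse_line(line: str) -> Optional[Signature]:
    """Parse one 'type(max_len, header[, footer]);' line; None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    ext, max_len, header, footer = m.groups()
    return Signature(ext, int(max_len), parse_pattern(header),
                     parse_pattern(footer) if footer else None)
```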

Basic usage for analyzing a capture file:

$ tcpxtract -f capturefile.pcap

This will extract all recognizable files from the packet capture. You can also specify an output directory:

$ tcpxtract -f capturefile.pcap -o output_dir/

4.3.2 Flow Analysis Techniques

Basic flow analysis techniques include:

• List conversations and flows—List all conversations and/or flows within a packet capture, or only specific flows based on their characteristics.

• Export a flow—Isolate a flow, or multiple flows, and store the flow(s) of interest to disk for further analysis.

• File and data carving—Extract files or other data of interest from the reassembled flow.

We discuss each of these, and show examples, in the sections below.

4.3.2.1 List Conversations

Continuing with the Ann’s Bad AIM case, we know from packet analysis that the AIM client at 192.168.1.158 initiated a file transfer on port 5190 in order to transfer the file “recipe.docx”. Did a remote peer ever connect to this port to complete the file transfer?

We can use tshark to quickly view packet capture conversation statistics, as shown below:

$ tshark -qn -z conv,tcp -r evidence01.pcap
===============================================================================

TCP Conversations
Filter:<No Filter>
                                          |     <-      |     ->      |    Total    |
                                           Frames  Bytes Frames  Bytes Frames  Bytes
192.168.1.159:1271  <-> 205.188.13.12:443      31  29717     16   1451     47  31168
192.168.1.159:1221  <-> 64.12.25.91:443        24   4206     16   1799     40   6005
192.168.1.158:51128 <-> 64.12.24.50:443        20   2622     20   1681     40   4303
192.168.1.158:5190  <-> 192.168.1.159:1272      9   1042     15  13100     24  14142
192.168.1.159:1273  <-> 64.236.68.246:80        5   1545      5   1964     10   3509
192.168.1.2:54419   <-> 192.168.1.157:80        3    206      4    272      7    478
192.168.1.2:55488   <-> 192.168.1.30:22         2    292      3    246      5    538
===============================================================================

Scrolling through this list, we can see that there was indeed one conversation that involved 192.168.1.158 on TCP port 5190. The other host involved in the conversation was 192.168.1.159, and 13,100 bytes were transferred to it (tshark counts whole frames here, so this figure includes packet headers as well as payload).

4.3.2.2 List TCP Flows

Let’s identify the specific flows of interest so that we can get ready to extract the higher-layer protocol data. During our earlier packet analysis, we saw that the sender (192.168.1.158) initiated the file transfer during the time frame of the packet capture. The actual file transfer must have occurred subsequently. Hence, we can limit our search to TCP flows that were created during the time of the packet capture (i.e., we can safely ignore any TCP flows that were initiated before our packet capture began).

Conveniently, the default behavior of pcapcat is to list only the TCP flows that were created during the capture time. In the output below, we see that there were only four TCP flows created within the packet capture.

$ pcapcat -r evidence01.pcap
[1] TCP 192.168.1.2:54419 -> 192.168.1.157:80
[2] TCP 192.168.1.159:1271 -> 205.188.13.12:443
[3] TCP 192.168.1.159:1272 -> 192.168.1.158:5190
[4] TCP 192.168.1.159:1273 -> 64.236.68.246:80
Enter the index number of the conversation to dump or press enter to quit:

Of these, only one flow was related to 192.168.1.158 on port 5190 (flow number 3). The other endpoint of this flow is a host on the local network, 192.168.1.159. This corroborates our findings from the tshark conversation statistics:

[3] TCP 192.168.1.159:1272 -> 192.168.1.158:5190
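How might a tool decide which flows were created during the capture? One plausible approach, sketched below in Python, is to look for each connection’s opening segment: a TCP segment with SYN set and ACK clear. This is our illustration of the idea rather than pcapcat’s actual logic, and it assumes packets have been decoded into hypothetical (source IP, source port, destination IP, destination port, flags) tuples.

```python
from typing import Iterable, List, Tuple

TcpPacket = Tuple[str, int, str, int, int]   # hypothetical: (src, sport, dst, dport, flags)
Flow = Tuple[str, int, str, int]

SYN, ACK = 0x02, 0x10                        # bit values in the TCP flags field


def flows_started_in_capture(packets: Iterable[TcpPacket]) -> List[Flow]:
    """Return flows, in order of first appearance, whose opening SYN was captured.

    A segment with SYN set and ACK clear is a connection request, so its
    (src, sport, dst, dport) gives the initiator -> responder direction.
    """
    seen: List[Flow] = []
    for src, sport, dst, dport, flags in packets:
        if flags & SYN and not flags & ACK:
            flow = (src, sport, dst, dport)
            if flow not in seen:             # ignore retransmitted SYNs
                seen.append(flow)
    return seen
```

Flows that were already established when the capture began never show such a segment, so they drop out of the list, which matches the behavior described above.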

4.3.2.3 Export TCP Flow

We’ve identified the flow that most likely contains the OFT file transfer. Now let’s export that flow so that we can attempt to recover any data transferred within it. Using pcapcat, we can export the flow of interest, this time using a BPF filter to quickly narrow down the search:

$ pcapcat -r evidence01.pcap -w internal-stream.dump -f 'host 192.168.1.158 and port 5190'
[1] TCP 192.168.1.159:1272 -> 192.168.1.158:5190
Enter the index number of the conversation to dump or press enter to quit: 1
Dumping index value 1

You can also use the tool “tcpflow” to automatically extract any or all flows from a packet capture, as shown below. Here we will use tcpflow with a BPF filter to extract any TCP flows that relate to 192.168.1.158 on port 5190:

$ tcpflow -r evidence01.pcap 'host 192.168.1.158 and port 5190'
tcpflow[25586]: tcpflow version 0.21 by Jeremy Elson <[email protected]>
tcpflow[25586]: looking for handler for datalink type 1 for interface
    evidence01.pcap
tcpflow[25586]: found max FDs to be 16 using OPEN_MAX
tcpflow[25586]: 192.168.001.159.01272-192.168.001.158.05190: new flow
tcpflow[25586]: 192.168.001.158.05190-192.168.001.159.01272: new flow
tcpflow[25586]: 192.168.001.158.05190-192.168.001.159.01272: opening new
    output file
tcpflow[25586]: 192.168.001.159.01272-192.168.001.158.05190: opening new
    output file

$ ls -l
total 13
-rwx------ 1 student student 12264 2011-01-08 20:53
    192.168.001.158.05190-192.168.001.159.01272
-rwx------ 1 student student   512 2011-01-08 20:53
    192.168.001.159.01272-192.168.001.158.05190

Notice that tcpflow extracted two half-duplex flows and saved them to files. As you can see, the filenames indicate the source and destination IP addresses and ports, and also indicate the direction of traffic flow.
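The naming scheme is easy to reproduce, which helps when scripting against a directory of tcpflow output. The Python sketch below builds and parses names in the format shown in the listing above: each IPv4 octet zero-padded to three digits, each port to five, source before destination. It is based only on the output of tcpflow 0.21 shown here (other versions may name files differently), and the function names are hypothetical.

```python
from typing import Tuple


def _endpoint(ip: str, port: int) -> str:
    octets = ".".join(f"{int(o):03d}" for o in ip.split("."))
    return f"{octets}.{port:05d}"


def tcpflow_name(src_ip: str, sport: int, dst_ip: str, dport: int) -> str:
    """Build a name like 192.168.001.158.05190-192.168.001.159.01272."""
    return f"{_endpoint(src_ip, sport)}-{_endpoint(dst_ip, dport)}"


def parse_tcpflow_name(name: str) -> Tuple[str, int, str, int]:
    """Recover (src IP, src port, dst IP, dst port) from such a name."""
    def split(part: str) -> Tuple[str, int]:
        fields = part.split(".")
        if len(fields) != 5:
            raise ValueError(f"unexpected endpoint format: {part!r}")
        return ".".join(str(int(f)) for f in fields[:4]), int(fields[4])

    src, dst = name.split("-")
    return (*split(src), *split(dst))
```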

You can also export flows manually in Wireshark, although this does not scale as well for large packet captures. In this case, to export the flow of interest using Wireshark, begin by clicking on any packet that is part of the flow of interest. Frame #109, as shown in Figure 4-18, is part of the connection between 192.168.1.158 and 192.168.1.159 on TCP port 5190.

Figure 4-18 Frame #109 shown in Wireshark. This frame is part of the identified conversation of interest between 192.168.1.158 and 192.168.1.159 on TCP port 5190.

Select packet #109, right-click it, and choose “Follow TCP Stream.” This produces a screen that displays the raw contents of the full-duplex flow. (We can select one side of the conversation, as shown in Figure 4-19, to see only the data transferred from 192.168.1.158 (TCP port 5190) to 192.168.1.159.) Based on protocol markers such as the “OFT2” header, this appears to contain an OFT2 file transfer of a file called “recipe.docx.” Click “Save As” to save the full-duplex flow in “raw” format.

Figure 4-19 Wireshark’s “Follow TCP Stream” function, shown in this screenshot, is used to reconstruct and isolate the TCP stream between 192.168.1.158 and 192.168.1.159 on TCP port 5190. We have selected one direction of the stream, which appears to contain an OFT2 file transfer of the file “recipe.docx.”

4.3.2.4 File and Data Carving

What was the file that Ann’s AIM client transferred to 192.168.1.159? Now that we have exported the flow that is most likely to contain this OFT2 file transfer, let’s see if we can carve the file itself out of the captured network traffic.

Figure 4-20 shows the exported flow, internal-stream.dump, opened in the Bless hex editor. Notice the first 4 bytes in the stream are “OFT2,” which mark the beginning of an OFT header. Bytes 6–7 (“Type”) are 0x0101, which indicates that the sender is ready to transfer data.

Figure 4-20 The beginning of our exported flow, internal-stream.dump. Notice the first 4 bytes in the stream are “OFT2,” which mark the beginning of an OFT header. Bytes 6–7 (“Type”) are 0x0101, which indicates that the sender is ready to transfer data.

As shown in Figure 4-21, bytes 28–31 represent the “Total Size” of the file that the sender intends to transfer. In this case, the “Total Size” is “0x00002EE8,” or 12,008 bytes in decimal. The “Filename” string begins at Byte 192 (0xc0), and in this case it is “recipe.docx” (padded with nulls to 64 bytes).

Figure 4-21 Our exported flow, internal-stream.dump, shown in Bless. Bytes 28–31 represent the “Total Size” of the file that the sender intends to transfer. In this case, the “Total Size” is “0x00002EE8,” or 12,008 bytes in decimal. Further below, we can see the filename “recipe.docx,” which begins at Byte 192 of the OFT header.
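The fields we have read by hand can also be pulled out programmatically. The Python sketch below parses a 256-byte OFT2 header using only the offsets described in this section (the “OFT2” marker at bytes 0–3, “Type” at 6–7, “Total Size” at 28–31, and the null-padded “Filename” starting at byte 192), plus the 4-byte size field at byte 32 that the text examines in the “Done” message. It assumes multi-byte fields are big-endian, which agrees with the 0x00002EE8 value displayed in Bless; OftHeader and parse_oft2 are hypothetical names, and the type labels are the ones used in the text.

```python
import struct
from typing import NamedTuple

OFT_HEADER_LEN = 256
OFT_TYPES = {0x0101: "sender ready", 0x0202: "Acknowledge", 0x0204: "Done"}


class OftHeader(NamedTuple):
    msg_type: int       # bytes 6-7
    total_size: int     # bytes 28-31
    size: int           # bytes 32-35
    filename: str       # bytes 192-255, null-padded


def parse_oft2(buf: bytes, offset: int = 0) -> OftHeader:
    """Parse the OFT2 header starting at offset (big-endian fields assumed)."""
    hdr = buf[offset:offset + OFT_HEADER_LEN]
    if len(hdr) < OFT_HEADER_LEN or hdr[:4] != b"OFT2":
        raise ValueError(f"no complete OFT2 header at offset {offset:#x}")
    (msg_type,) = struct.unpack_from(">H", hdr, 6)
    total_size, size = struct.unpack_from(">II", hdr, 28)
    filename = hdr[192:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return OftHeader(msg_type, total_size, size, filename)
```

Applied to internal-stream.dump at offsets 0, 0x100, and 0x30E8, this should walk through the three headers discussed in this section.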

Scrolling down to Byte 256 (0x100), we see the beginning of another 256-byte OFT2 header (Figure 4-22). This time, the “Type” is “0x0202,” or “Acknowledge.” OFT “Acknowledge” packets are sent by the recipient to the sender to indicate that the recipient is ready to receive the file transfer. Since we are looking at the full-duplex communication, we see OFT messages from both the sender and recipient. If we were looking at only one direction of the flow (such as the single side selected in Figure 4-19), we would not have seen this message from the recipient.

Figure 4-22 Our exported flow, internal-stream.dump, shown in Bless. Here at Byte 256 (0x100) we see a second OFT header with “Type” 0x0202, or “Acknowledge.” OFT “Acknowledge” packets are sent by the recipient to the sender to indicate that the recipient is ready to receive the file transfer.

Now let’s look for the beginning of the transferred .docx file. We can identify it by the “magic number” at the start of the file, “0x504B,” or “PK” in ASCII. Scrolling down in Bless to Byte 512 (0x200), as shown in Figure 4-23, we see this magic number at byte offsets 0x200 to 0x201 (2 bytes). The Bless hex editor displays the byte locations for us when the bytes are highlighted (see the bottom panel for the offset in hexadecimal).

Figure 4-23 Our exported flow, internal-stream.dump, shown in Bless. The magic number for a .docx file, 0x504B, appears at Bytes 512–513 (0x200–0x201).

To find the end of the transferred file, we can add the initial byte offset of the file (0x0200) to the expected size (0x2EE8), which means that the file should end right before byte offset 0x30E8. This matches perfectly with what we see in Figure 4-24; scrolling to the offset in the exported flow data, we see that the next OFT header does in fact begin at byte offset 0x30E8. Notice that the transferred file appears to end with four null bytes: “00 00 00 00.”

Figure 4-24 Our exported flow, internal-stream.dump, shown in Bless. The .docx file ends just before Byte 0x30E8. We see that it ends with 4 null bytes, and is immediately followed by an OFT2 “Done” header (Type “0x0204”).

Let’s double-check and make sure that the file was transferred fully. Examining the OFT message at byte offset 0x30E8, shown in Figure 4-25, we can see that it has type 0x0204, or “Done.” OFT messages of this type are sent by the recipient to indicate that it has finished receiving the file. (This is consistent with the tcpflow output above: the 512 bytes sent from 192.168.1.159 to 192.168.1.158 are exactly two 256-byte OFT headers, the “Acknowledge” and this “Done.”) We can also check the size of the data that has already been transferred. This is a 4-byte field at Byte 32 of the OFT header (byte offset 0x3108 of the exported flow). The value of this field in the OFT “Done” message is 0x2EE8, which matches the total size that the sender originally indicated it intended to transfer. From this, we can conclude that the file was fully transferred in one exchange.

Figure 4-25 Our exported flow, internal-stream.dump, shown in Bless. The OFT2 “Done” header begins at byte offset 0x30E8. Note the “Size” 0x2EE8 (12,008 bytes), which indicates the amount of data that has already been transferred. This matches the total file size indicated by the sender, which means our file transfer is complete.

Now that we have identified the beginning and end of the transferred file (Bytes 0x200 through 0x30E7), we can easily carve it out using Bless’ “Cut” tool. Figure 4-26 shows how to use Bless to snip off the extra data before the transferred file. Next, we also snip off the extra data after the transferred file, and save the resulting data as “recipe.docx”.

Figure 4-26 In this screenshot, we are using Bless to snip off the extra data before the transferred .docx file.
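The same carve can be scripted, which also makes the checks we performed by eye repeatable. This Python sketch assumes, as in this case, that the file starts with a “PK” magic number and is immediately followed by another OFT2 header; carve_range and hashes are hypothetical helper names.

```python
import hashlib
from typing import Dict


def carve_range(stream: bytes, start: int, size: int) -> bytes:
    """Return stream[start:start + size] after repeating the checks done by eye."""
    end = start + size
    if end > len(stream):
        raise ValueError("stream is shorter than the advertised file size")
    if stream[start:start + 2] != b"PK":
        raise ValueError(f"no 'PK' magic number at {start:#x}")
    if stream[end:end + 4] != b"OFT2":
        raise ValueError(f"no OFT2 header immediately after the data at {end:#x}")
    return stream[start:end]


def hashes(blob: bytes) -> Dict[str, str]:
    """MD5 and SHA-256 digests, for comparison with md5sum/sha256sum output."""
    return {"md5": hashlib.md5(blob).hexdigest(),
            "sha256": hashlib.sha256(blob).hexdigest()}
```

For this case, you would read internal-stream.dump into stream and call carve_range(stream, 0x200, 0x2EE8); if the result is byte-identical to the file carved in Bless, hashes() of it should match the md5sum and sha256sum values that follow.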

Let’s collect cryptographic hashes of the file so that we can later verify it has not changed since we carved it out:

$ sha256sum recipe.docx
f0f74a982a814640aedaa5fd6542ac810e8c5e257552bcc024a5c808343bccf9 recipe.docx

$ md5sum recipe.docx
8350582774e1d4dbe1d61d64c89e0ea1 recipe.docx

Double-check the size of the file we just carved:

$ ls -l recipe.docx
-rwx------ 1 student student 12008 2010-09-03 16:36 recipe.docx

The file is 12,008 bytes in size, which matches our expectations. Next, verify the file type:

$ file recipe.docx
/home/student/.magic, 15022: Warning: using regular magic file `/etc/magic'
recipe.docx: Zip archive data, at least v2.0 to extract

This result, “Zip archive data,” makes sense, because .docx files are a type of zip file.

In a real investigation, you will want to store this file on media where it cannot easily be modified. As you progress in your forensic analysis, routinely store cryptographically hashed copies of important data (carved-out flows or files) in places where you cannot accidentally modify them.

Finally, we have carved out the transferred file! Let’s open a copy of it with a document editor and see what’s inside. Remember, programs that you use to view a file may also modify it. Always maintain hashed, write-protected copies of the important data you have extracted during analysis.

In Figure 4-27, we can see Anarchy-R-Us’s “Recipe for Disaster.” This is their secret recipe! We have found strong evidence that the user of 192.168.1.158 (Ann) transferred this file to someone at 192.168.1.159 through her AIM client.

Figure 4-27 Here is the file we carved out of the OFT transfer, recipe.docx. The contents are shown in OpenOffice.

Whew! Manually carving out that file was a lot of work. Imagine if you had a much bigger packet capture with lots of file transfers. This manual method is good for understanding the underlying system, but in practice it really doesn’t scale.

Let’s try using tcpxtract to automatically carve the transferred file out for us:

$ tcpxtract -f evidence01.pcap
...
Found file of type "zip" in session [205.188.13.12:47873 ->
    192.168.1.159:63236], exporting to 00000022.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000023.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000024.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000025.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000026.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000027.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000028.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000029.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000030.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000031.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000032.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000033.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000034.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000035.zip
Found file of type "zip" in session [192.168.1.158:17940 ->
    192.168.1.159:63492], exporting to 00000036.zip

$ ls -l
...
-rwx------ 1 student student 12020 2011-01-08 11:22 00000023.zip
-rwx------ 1 student student 11068 2011-01-08 11:22 00000024.zip
-rwx------ 1 student student 10264 2011-01-08 11:22 00000025.zip
-rwx------ 1 student student  9670 2011-01-08 11:22 00000026.zip
-rwx------ 1 student student  8775 2011-01-08 11:22 00000027.zip
-rwx------ 1 student student  7038 2011-01-08 11:22 00000028.zip
-rwx------ 1 student student  6209 2011-01-08 11:22 00000029.zip
-rwx------ 1 student student  5691 2011-01-08 11:22 00000030.zip
-rwx------ 1 student student  5380 2011-01-08 11:22 00000031.zip
-rwx------ 1 student student  3485 2011-01-08 11:22 00000032.zip
-rwx------ 1 student student  2807 2011-01-08 11:22 00000033.zip
-rwx------ 1 student student  2585 2011-01-08 11:22 00000034.zip
-rwx------ 1 student student  1974 2011-01-08 11:22 00000035.zip
-rwx------ 1 student student  1737 2011-01-08 11:22 00000036.zip
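The session labels in the output above do not show the ports we identified earlier (5190 and 1272). They appear to be byte-swapped: 17940 is 0x4614, the reverse of 5190 (0x1446), and 63492 is 0xF804, the reverse of 1272 (0x04F8). That would be consistent with a tool printing 16-bit ports without converting them from network byte order, though this is our inference rather than documented tcpxtract behavior. A quick check in Python (swap16 is a hypothetical helper):

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value, e.g. 0x4614 -> 0x1446."""
    return int.from_bytes(value.to_bytes(2, "big"), "little")


# Ports as printed in tcpxtract's session labels above
for printed in (47873, 63236, 17940, 63492):
    print(printed, "->", swap16(printed))
```

The same swap maps 47873 to 443 and 63236 to 1271, matching the 192.168.1.159:1271 <-> 205.188.13.12:443 conversation in the tshark statistics.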

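If you search the flow of interest for the hex string “0x504B0304,” which tcpxtract uses by default to mark the beginning of a zip file, you will see that this string was actually present 14 times within the flow. Only the first instance marks the beginning of our file transfer for recipe.docx; tcpxtract carved it out as file “00000023.zip.” The remaining instances fall inside the .docx itself: a .docx file is a zip archive, and each member file stored within it begins with a local file header carrying the same signature. Each carve appears to run to the end of the flow, which would explain why files 00000024.zip through 00000036.zip are progressively smaller.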
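Counting the signature occurrences is easy to script. This Python sketch lists every offset of the zip local file header signature (“PK\x03\x04”) in a buffer, and the size each carve would have if, as assumed above, it ran from its offset to the end of the data. The helper names find_all and carve_to_end_sizes are hypothetical.

```python
from typing import List

ZIP_LOCAL_HEADER = b"PK\x03\x04"   # 0x504B0304


def find_all(data: bytes, signature: bytes) -> List[int]:
    """Return every offset at which signature occurs in data."""
    offsets: List[int] = []
    i = data.find(signature)
    while i != -1:
        offsets.append(i)
        i = data.find(signature, i + 1)
    return offsets


def carve_to_end_sizes(data: bytes, offsets: List[int]) -> List[int]:
    """Size of each carve if it runs from its offset to the end of data."""
    return [len(data) - off for off in offsets]
```

Run against the 192.168.1.158-to-192.168.1.159 file that tcpflow produced earlier, we would expect 14 offsets, the first at 0x100, just past the opening OFT2 header.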

Let’s try saving “00000023.zip” as “recipe-tcpxtract.docx” and see what happens when we open it in a document editor. Figure 4-28 shows the .docx file carved out by tcpxtract, which indeed appears to contain the same content as the recipe.docx file that we carved out manually earlier.

Figure 4-28 Here is a file we carved out of the OFT transfer using tcpxtract. The contents are shown in OpenOffice. Note that the contents appear to be the same as the file we carved out earlier manually—but as we will see, they are not quite the same.

But when we take the cryptographic hashes, we see immediately that this file is not the same as the one we carved out earlier:

$ md5sum recipe-tcpxtract.docx
a217badfdb530bd55d1dbd2280cb3e2b  recipe-tcpxtract.docx

$ sha256sum recipe-tcpxtract.docx
3472a1720544098caf9872401932d10a021dfc56ea34cad5db0f80c92080ba82  recipe-tcpxtract.docx

In fact, the file size is slightly different as well. Here we see that the file carved out by tcpxtract is 12,020 bytes, whereas the one we carved earlier was only 12,008 bytes.

$ ls -l recipe-tcpxtract.docx
-rwx------ 1 student student 12020 2011-01-08 11:58 recipe-tcpxtract.docx

Which is correct, our manual extraction or tcpxtract’s automated extraction?

Let’s open both files in a hex editor, as shown in Figure 4-29. We can see visually that the beginnings of the files are the same, but the file carved by tcpxtract has 12 bytes of extra data at the end. This makes sense, since we know that recipe-tcpxtract.docx was 12,020 bytes in size, compared to recipe.docx, which was only 12,008 bytes in size.

Figure 4-29 This screenshot shows the ends of the two files that we carved out. The top file (a) was carved by tcpxtract, and the bottom file (b) was carved manually. Notice that the file carved by tcpxtract had 16 nulls at the end, but the manually carved file only had 4.

Given that the highest-layer protocol, OFT, reported the expected file size as “0x00002EE8,” or 12,008 bytes, it’s likely that the file we carved out manually is correct. Why, then, did tcpxtract pad the file with 12 bytes of mysterious trailing nulls?

Carefully examining the relevant TCP stream in Wireshark, we can see that after the end of the file transfer, 192.168.1.158 sent two additional TCP packets to 192.168.1.159 (#134 and #139). Neither of these packets contained any data within the TCP payload. In packet #139 of the evidence file (shown in Figure 4-30), you can see that the “Total Length” of the IP packet is 40 bytes. The IP “Header length” is 20 bytes, and the TCP “Header length” is 20 bytes. Adding up the IP and TCP header lengths, we find that the total length of the IP + TCP headers is 40 bytes. Since that is also the total length of the whole IP packet, there must be zero data in the TCP payload. You can confirm this by looking in the Packet Details window, where the TCP protocol is marked as “Len:0,” to indicate zero payload length.
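The same arithmetic can be done directly from the header bytes. In this Python sketch, the IPv4 header length comes from the IHL field and the TCP header length from the data offset field (both counted in 32-bit words), and the payload length is the IPv4 Total Length minus both. It assumes an IPv4 packet carrying TCP that starts at offset 0 of the buffer; tcp_payload_len is a hypothetical name.

```python
import struct


def tcp_payload_len(ip_packet: bytes) -> int:
    """TCP payload length computed from header fields, not from captured length.

    Bytes after the end of the IP packet (such as Ethernet padding) are ignored,
    because the IPv4 Total Length field bounds the packet.
    """
    if ip_packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    if ip_packet[9] != 6:
        raise ValueError("IPv4 packet does not carry TCP (protocol 6)")
    ip_hdr_len = (ip_packet[0] & 0x0F) * 4                  # IHL, in 32-bit words
    (total_len,) = struct.unpack_from("!H", ip_packet, 2)   # Total Length, bytes 2-3
    tcp_hdr_len = (ip_packet[ip_hdr_len + 12] >> 4) * 4     # TCP data offset, in 32-bit words
    return total_len - ip_hdr_len - tcp_hdr_len
```

For a packet like #139 (Total Length 40, 20-byte IP header, 20-byte TCP header), this returns 0 regardless of any padding bytes that follow the packet in the frame.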

Figure 4-30 After the end of the file transfer, 192.168.1.158 sent two additional TCP packets to 192.168.1.159 (#134 and #139). Neither of these packets contains any data within the TCP payload. Each packet is padded with six nulls at the end by the network card, to reach the minimum Ethernet payload size (46 bytes).

The minimum payload size of an Ethernet frame is 46 bytes.³⁹ Consequently, the network device driver padded the Ethernet frame’s payload with 6 null bytes, since TCP/IP only took up 40 bytes total. You can see these 6 null bytes at the end of Wireshark’s Packet Bytes frame in Figure 4-30. Notice how they occur immediately after the TCP header. These days, TCP options are so prevalent that minimum-length 20-byte TCP headers are not as common as they used to be. As a result, padded Ethernet frames are relatively rare.

39. “IEEE-SA -IEEE Get 802 Program,” http://standards.ieee.org/about/get/802/802.3.html.

There are two padded packets in this flow sent from 192.168.1.158 to 192.168.1.159 after the end of the file transfer, which together contain 12 null bytes of Ethernet padding. Note that if no footer magic number is present, tcpxtract will continue to carve TCP data until the end of the flow. It is likely that tcpxtract included the 6 null bytes of padding from each of these two packets in the reassembled TCP stream: rather than calculating the size of each TCP payload from the IP and TCP header fields, it appears to have simply grabbed all bytes after the TCP header.
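This hypothesis is easy to demonstrate. The Python sketch below builds a frame like packet #139 (a 14-byte Ethernet header with no VLAN tag, a 20-byte IPv4 header, a 20-byte TCP header, no payload, and 6 bytes of padding) and compares two ways of extracting the TCP payload. We have not inspected tcpxtract’s source; payload_naive only models the behavior we suspect, and all names here are hypothetical.

```python
import struct
from typing import Tuple

ETH_HDR_LEN = 14          # assumes an untagged Ethernet II header
MIN_ETH_PAYLOAD = 46


def build_frame(ip_packet: bytes) -> bytes:
    """Prepend a dummy Ethernet header and null-pad the payload to 46 bytes."""
    padding = b"\x00" * max(0, MIN_ETH_PAYLOAD - len(ip_packet))
    return b"\x00" * ETH_HDR_LEN + ip_packet + padding


def _header_lengths(ip: bytes) -> Tuple[int, int]:
    ihl = (ip[0] & 0x0F) * 4             # IPv4 header length in bytes
    thl = (ip[ihl + 12] >> 4) * 4        # TCP header length in bytes
    return ihl, thl


def payload_naive(frame: bytes) -> bytes:
    """Everything after the TCP header: picks up link-layer padding."""
    ip = frame[ETH_HDR_LEN:]
    ihl, thl = _header_lengths(ip)
    return ip[ihl + thl:]


def payload_correct(frame: bytes) -> bytes:
    """Stop at the end of the IP packet, as given by the Total Length field."""
    ip = frame[ETH_HDR_LEN:]
    ihl, thl = _header_lengths(ip)
    (total_len,) = struct.unpack_from("!H", ip, 2)
    return ip[ihl + thl:total_len]


# A 40-byte IPv4+TCP packet with no payload, like packets #134 and #139
ip_hdr = bytearray(20)
ip_hdr[0] = 0x45                          # version 4, IHL 5 (20 bytes)
struct.pack_into("!H", ip_hdr, 2, 40)     # Total Length
tcp_hdr = bytearray(20)
tcp_hdr[12] = 0x50                        # data offset 5 (20 bytes)
frame = build_frame(bytes(ip_hdr + tcp_hdr))
print(len(payload_naive(frame)), len(payload_correct(frame)))
```

The naive version returns 6 null bytes per padded packet; across packets #134 and #139 that would be exactly the 12 extra bytes found at the end of recipe-tcpxtract.docx.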

The moral of the story: Trust, but verify.

There was a big flap in 2003 when AtStake released a whitepaper detailing how certain improperly written network device drivers were padding Ethernet frames with bytes from previously handled frames or kernel memory. This resulted in potentially sensitive snippets of network traffic and kernel memory leaking to other hosts on the LAN. This vulnerability, which was found across a wide range of device drivers from different sources, was dubbed “Etherleak.”⁴⁰

40. “Atstake,” 2003, http://www.atstake.com/research/advisories/2003/a010603-1.txt.

4.4 Higher-Layer Traffic Analysis

If an investigator has been successful in isolating and reconstructing the payloads of a transport-layer flow, it becomes possible to inspect the higher-layer protocols. Common examples of higher-layer protocols include:

• Hypertext Transfer Protocol (HTTP)

• Simple Mail Transfer Protocol (SMTP)

• Domain Name System (DNS)

• Dynamic Host Configuration Protocol (DHCP)

Of course, there are many, many others.

4.4.1 A Few Common Higher-Layer Protocols

In this section, we briefly review important details of a few of the most common higher-layer protocols: HTTP, DHCP, SMTP, and DNS. This background will be helpful throughout the remainder of the book.

4.4.1.1 Hypertext Transfer Protocol (HTTP)

The Hypertext Transfer Protocol (HTTP) was originally developed in the early 1990s by Tim Berners-Lee at the European Organization for Nuclear Research (CERN).⁴¹ Berners-Lee wanted to help scientists at CERN share information in a flexible, distributed way. In a 1989 proposal, he wrote:⁴²

41. Joshua Quittner, “Network Designer Tim Berners-Lee,” Time, March 29, 1999, http://www.time.com/time/magazine/article/0,9171,990627,00.html.

42. Tim Berners-Lee, “Information Management: A Proposal,” March 1989, http://www.w3.org/History/1989/proposal.html.

CERN is a wonderful organisation. It involves several thousand people, many of them very creative, all working toward common goals. Although they are nominally organised into a hierarchical management structure, this does not constrain the way people will communicate, and share information, equipment and software across groups.

The actual observed working structure of the organisation is a multiply connected ‘web’ whose interconnections evolve with time. . . . [I]nformation is constantly being lost. The introduction of the new people demands a fair amount of their time and that of others before they have any idea of what goes on. The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found.

. . . It was for these reasons that I first made a small linked information system, not realising that a term had already been coined for the idea: ‘hypertext.’

. . . ‘Hypertext’ is a term coined in the 1950s by Ted Nelson [. . .], which has become popular for these systems, although it is used to embrace two different ideas. One idea (which is relevant to this problem) is the concept: ‘Hypertext’: Human-readable information linked together in an unconstrained way.

Along with HTTP, Berners-Lee also developed the first web browser and the HyperText Markup Language (HTML), a language used by web browsers to exchange and display information.⁴³

43. Tim Berners-Lee, “HyperText Transfer Protocol,” June 12, 1992, http://www.w3.org/History/19921103-hypertext/hypertext/WWW/Protocols/HTTP.html.

HTTP operates according to a request/response model, where the HTTP client sends a request to a remote server, and the server processes the request and sends a response. By default, HTTP servers operate over TCP port 80, although they are often configured to listen on different ports. A Uniform Resource Identifier (URI) is a string used to specify the location of a resource.⁴⁴

44. R. Fielding et al., “Hypertext Transfer Protocol—HTTP/1.1,” IETF, June 1999, http://www.ietf.org/rfc/rfc2616.txt.

HTTP requests and responses are referred to as messages. HTTP messages can include a message header (which typically contains fields that store metadata about the transaction) and a message body (which normally contains content used in the request or response).

The HTTP protocol uses methods to perform operations. For example, the “GET” method is used to retrieve a web page and the “POST” method is used to send data to a server.
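The message structure is simple enough to build and take apart by hand, which is often what an investigator is doing when reading reconstructed streams. This Python sketch follows the RFC 2616 layout (a start line, header fields, an empty line, then an optional body). It is deliberately minimal and assumes a complete message: it ignores repeated header fields, obsolete line folding, and chunked transfer coding. The names build_request and split_message are hypothetical.

```python
from typing import Dict, Tuple

CRLF = b"\r\n"


def build_request(method: str, uri: str, host: str, body: bytes = b"") -> bytes:
    """Assemble a minimal HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {uri} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    return "\r\n".join(lines).encode("ascii") + CRLF + CRLF + body


def split_message(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a complete message into (start line, headers, body)."""
    head, sep, body = raw.partition(CRLF + CRLF)
    if not sep:
        raise ValueError("no empty line terminating the header section")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # field names are case-insensitive
    return start_line, headers, body
```

The same split_message call works on a response, where the start line is a status line rather than a request line.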

There are eight methods defined by the RFC (and the protocol is extensible, so custom methods can be defined). Only the “GET” and “HEAD” methods must be supported by a web server; these are designed simply for information retrieval. Other methods are optional. Methods defined by RFC 2616 include:

• OPTIONS—Obtain information about communicating with the remote server.

• GET—Retrieve the information identified by the URI.

• HEAD—Retrieve the information identified by the URI, without returning a message-body. Helpful for URI validation and debugging.

• POST—Send data to the resource specified by the URI for processing.

• PUT—Upload information to be stored under the specified URI.

• DELETE—Delete the resource specified by the URI.

• TRACE—Echo a request message back to the client. Helpful for debugging (can also facilitate cross-site scripting attacks).

• CONNECT—Reserved “for use with a proxy that can dynamically switch to being a tunnel.”⁴⁵

45. “Hypertext Transfer Protocol—HTTP/1.1.”
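To make the message structure concrete, here is a minimal Python sketch that assembles a raw HTTP/1.1 request as bytes: a request line (method, URI, version), header fields, an empty line, and an optional message body. The function name `build_http_request`, the host name, the paths, and the form fields are all hypothetical; real clients send many more headers.

```python
def build_http_request(method, host, path="/", body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, empty line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # HTTP/1.1 requires Host
    if body:
        # Content-Length tells the server where the message body ends.
        lines.append(f"Content-Length: {len(body)}")
    lines.append("Connection: close")
    # The header section ends with an empty line (CRLF CRLF); the body follows it.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


# Hypothetical examples: a GET carries no body, while a POST carries form data.
get_request = build_http_request("GET", "www.example.com", "/index.html")
post_request = build_http_request("POST", "www.example.com", "/login", b"user=ann&pass=secret")
print(get_request.decode("ascii"))
```

Sent over a TCP connection to port 80, bytes like these are what you would expect to find in the client-to-server half of the flow. The empty line separating headers from body is a useful landmark when carving message bodies out of reassembled streams.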

The HTTP response sent from the server includes a three-digit status code, which indicates the status of the request, and a reason phrase, which provides a human-readable description of the status. Common status codes include 200 (“OK”), which means that the request was successfully processed, and 404 (“Not Found”), which means that the URI requested was not found on the server.

There are five categories for status codes, organized by the first digit:⁴⁶

46. Ibid.

- 1xx: Informational - Request received, continuing process

- 2xx: Success - The action was successfully received,
  understood, and accepted

- 3xx: Redirection - Further action must be taken in order to
  complete the request

- 4xx: Client Error - The request contains bad syntax or cannot
  be fulfilled

- 5xx: Server Error - The server failed to fulfill an apparently
  valid request
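Because the category is encoded in the first digit, classifying responses while reviewing traffic is mechanical. The following Python sketch splits a status line into its parts and maps the code to the five categories above. The helper name `parse_status_line` is our own, and it assumes a well-formed status line in which the space before the reason phrase is present.

```python
# Category names keyed by the first digit of the status code (RFC 2616).
STATUS_CATEGORIES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_status_line(line):
    """Split an HTTP response status line into (version, code, reason, category)."""
    # Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a three-digit status code: {code!r}")
    return version, int(code), reason, STATUS_CATEGORIES.get(code[0], "Unknown")


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
# ('HTTP/1.1', 404, 'Not Found', 'Client Error')
```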

HTTP is designed to run on top of a reliable, connection-oriented protocol such as TCP, but HTTP itself is stateless: it does not include built-in mechanisms to track session state. Over time, software engineers have developed ways to track the state of HTTP sessions, such as cookies, which are small pieces of data that a server asks the client to store and send back with subsequent requests.

For more details about web page caching and web proxies, please see Chapter 10.

4.4.1.2 Dynamic Host Configuration Protocol (DHCP)

To communicate over modern Ethernet networks, every computer has a network card, which is assigned a 6-byte MAC address by the manufacturer. In order to take advantage of IP-based routing, each computer also needs to be assigned a 32- or 128-bit IP address (for IPv4 or IPv6, respectively). Once upon a time, network administrators configured each computer manually with a static IP address that did not change. As large-scale deployments and mobile devices became increasingly common, it became clear that there was a need to dynamically assign IP addresses so that computers could be deployed and moved throughout the network without requiring manual IP address configuration changes.

These days, most organizations use Dynamic Host Configuration Protocol (DHCP) to dynamically assign IP addresses to network cards, rather than configuring static IP addresses on each system. DHCP is a Layer 7 protocol that facilitates automatic configuration of network details, including IP address, gateway, DNS servers, and more. There are separate versions of DHCP for IPv4⁴⁷ and IPv6.⁴⁸ DHCP for IPv4 operates over UDP, with servers listening on port 67 and clients on port 68; DHCPv6 uses UDP ports 546 (clients) and 547 (servers and relay agents).

47. R. Droms, “RFC 2131—Dynamic Host Configuration Protocol,” IETF, March 1997, http://rfc-editor.org/rfc2131.txt.

48. R. Droms et al., “RFC 3315—Dynamic Host Configuration Protocol for IPv6 (DHCPv6),” IETF, July 2003, http://rfc-editor.org/rfc3315.txt.

For forensic investigators, DHCP records are often the “glue” between a computer’s traffic and the physical realm. DHCP server logs and DHCP communications in packet captures can contain valuable information, including the client MAC address (which can provide clues to the hardware manufacturer), client hostname, routing information, and more.

MAC Addresses

As discussed, every network card on an Ethernet network has a 6-byte MAC address, assigned by the manufacturer. Ethernet frames include a 6-byte field for the destination MAC address and another for the source MAC address. Computers on the local subnet typically listen for Ethernet frames destined for their network card’s MAC address (or the broadcast MAC address, ff:ff:ff:ff:ff:ff).
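The layout of those header fields is easy to see in code. This Python sketch pulls the destination and source MAC addresses out of the first 14 bytes of a frame. It assumes an untagged Ethernet II frame (no 802.1Q VLAN tag), the helper names are our own, and the sample frame is made up for illustration.

```python
import struct

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def format_mac(raw):
    """Render 6 raw bytes as a colon-separated lowercase MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame):
    """Return (destination MAC, source MAC, EtherType) from an untagged Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than a 14-byte Ethernet header")
    # Destination (6 bytes), source (6 bytes), EtherType (2 bytes, big-endian)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), ethertype


# Hypothetical frame: broadcast destination, source 00:12:79:3f:27:e5, EtherType 0x0800 (IPv4)
frame = bytes.fromhex("ffffffffffff" "0012793f27e5" "0800") + bytes(46)
dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype), dst == BROADCAST_MAC)
```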

Each manufacturer has their own 3-byte registered identification number, known as an Organizationally Unique Identifier (OUI), which they set as the high-order 3 bytes of the MAC address on each card they manufacture, as shown in Figure 4-31. When you encounter a MAC address, you can look up the OUI to determine the likely manufacturer of the network card. For the purposes of forensics, it is often helpful to have evidence that indicates the manufacturer of client hardware. However, the end-user can change the network card’s MAC address from within their operating system, although in practice most people do not.

Figure 4-31 Every network card on an Ethernet network has a 6-byte MAC address, assigned by the manufacturer. Manufacturers have their own 3-byte registered identification number (known as an OUI), which they set as the higher-order 3 bytes of the MAC address.

For example, let’s suppose we have a client with the MAC address “00:12:79:3f:27:e5.” The first 3 bytes represent the manufacturer’s OUI. The IEEE maintains a public listing of OUI assignments. In the IEEE’s public file, the bytes are separated by hyphens rather than colons, so we will search for “00-12-79.” This OUI is assigned to Hewlett-Packard (HP), so HP is the likely manufacturer of the network card. Remember, the MAC address can be changed by the end-user, although in practice this is rare.⁴⁹

49. IEEE, “OUI List,” July 7, 2011, http://standards.ieee.org/develop/regauth/oui/oui.txt.
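The lookup described above can be scripted. This Python sketch normalizes a MAC address to the IEEE hyphenated OUI form and searches lines of an OUI listing for it. It assumes listing lines shaped like `00-12-79   (hex)<tabs>Vendor Name`; the helper names and the one-line sample listing are hypothetical. It also checks the universal/local bit (0x02 in the first byte): manufacturer-assigned addresses have it clear, and a user-chosen address may set it, but a spoofed address does not have to, so a clear bit does not prove the address is original.

```python
import re
from typing import Optional


def oui_key(mac):
    """Return a MAC address's OUI (high-order 3 bytes) in the IEEE 'XX-XX-XX' style."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(digits) != 12:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return "-".join(digits[i:i + 2] for i in range(0, 6, 2))


def is_locally_administered(mac):
    """True if the U/L bit (0x02 in the first byte) marks the address as locally assigned."""
    first_byte = int(re.sub(r"[^0-9A-Fa-f]", "", mac)[:2], 16)
    return bool(first_byte & 0x02)


def lookup_vendor(mac, oui_lines) -> Optional[str]:
    """Search lines of an OUI listing for the MAC's OUI; return the vendor name or None."""
    key = oui_key(mac)
    for line in oui_lines:
        # Assumed line format: 'XX-XX-XX   (hex)<tabs>Vendor Name'
        if line.upper().startswith(key) and "(hex)" in line:
            return line.split("(hex)", 1)[1].strip()
    return None


sample_listing = ["00-12-79   (hex)\t\tHewlett-Packard"]  # hypothetical excerpt
print(oui_key("00:12:79:3f:27:e5"), lookup_vendor("00:12:79:3f:27:e5", sample_listing))
```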

DHCP

When a computer is connected to the network, it broadcasts a DHCP request. The local DHCP server will answer with a unicast reply (at Layer 3). This is much like ARP, where the ARP request is broadcast (at Layer 2) and the reply is unicast. In its response, the DHCP server offers a DHCP lease, which is an IP address assignment for a limited amount of time. That lease includes the IP address assigned, the netmask and gateway address, often DNS server addresses, and always the amount of time that the lease is valid. Most clients will request a lease renewal before the lease expires, thereby maintaining the same IP address for an extended period of time. As a result, in practice DHCP leases are fairly stable, meaning that a computer may be assigned the same IP address for days, weeks, or months. This is not guaranteed, but it is often the case.

When analyzing DHCP traffic, you will frequently see the following exchange:

• Client: DHCPDISCOVER (Layer 2 broadcast)

• Server: DHCPOFFER

• Client: DHCPREQUEST

• Server: DHCPACK

This is the process by which a client discovers the DHCP server and receives a DHCP lease. By default, the client will attempt to renew its DHCP lease when half of the lease time has expired.
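When building a timeline, it helps to know when to expect a host's next DHCP packets. This Python sketch computes the default renewal (T1), rebinding (T2), and expiry times from a lease's start time and duration. The 0.5 and 0.875 factors are the RFC 2131 defaults; at T2 the client falls back to broadcasting its request to any server. A server can override both values with explicit options (58 and 59), so check the packet first. The lease start and duration below are hypothetical.

```python
from datetime import datetime, timedelta


def lease_timers(lease_start, lease_seconds):
    """Default renewal (T1), rebinding (T2), and expiry times for a DHCP lease.

    RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease. Explicit
    Renewal/Rebinding Time Value options in the server's reply take precedence.
    """
    renew = lease_start + timedelta(seconds=0.5 * lease_seconds)
    rebind = lease_start + timedelta(seconds=0.875 * lease_seconds)
    expire = lease_start + timedelta(seconds=lease_seconds)
    return renew, rebind, expire


# Hypothetical one-hour lease granted at 13:00:00
renew, rebind, expire = lease_timers(datetime(2011, 5, 17, 13, 0, 0), 3600)
print(renew.time(), rebind.time(), expire.time())  # 13:30:00 13:52:30 14:00:00
```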

The following excerpt from the IETF’s RFC 2131 “Dynamic Host Configuration Protocol” summarizes the purpose of various DHCP messages:⁵⁰

50. “RFC 2131—Dynamic Host Configuration Protocol.”

  Message         Use
  -------         ---
  DHCPDISCOVER -  Client broadcast to locate available servers.

  DHCPOFFER    -  Server to client in response to DHCPDISCOVER with offer
                  of configuration parameters.

  DHCPREQUEST  -  Client message to servers either (a) requesting
                  offered parameters from one server and implicitly
                  declining offers from all others, (b) confirming
                  correctness of previously allocated address after,
                  e.g., system reboot, or (c) extending the lease on
                  a particular network address.

  DHCPACK      -  Server to client with configuration parameters,
                  including committed network address.

  DHCPNAK      -  Server to client indicating client's notion of network
                  address is incorrect (e.g., client has moved to new
                  subnet) or client's lease has expired.

  DHCPDECLINE  -  Client to server indicating network address is
                  already in use.

  DHCPRELEASE  -  Client to server relinquishing network address and
                  cancelling remaining lease.

  DHCPINFORM   -  Client to server, asking only for local configuration
                  parameters; client already has externally
                  configured network address.
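In a capture, the message type is not a fixed header field. It is carried in DHCP option 53, defined in RFC 2132. This Python sketch walks the options that follow the fixed 236-byte BOOTP header and the 4-byte magic cookie. Along with the message type, it pulls out a few fields that are handy for linking a MAC address to a host: the client hardware address (chaddr), Host Name (option 12), and Requested IP Address (option 50). It expects only the UDP payload. The function names and the sample packet are hypothetical.

```python
import socket

MAGIC_COOKIE = bytes([99, 130, 83, 99])
MESSAGE_TYPES = {1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST", 4: "DHCPDECLINE",
                 5: "DHCPACK", 6: "DHCPNAK", 7: "DHCPRELEASE", 8: "DHCPINFORM"}


def parse_options(data):
    """Walk DHCP options as code/length/value triples; return {code: value}."""
    options, i = {}, 0
    while i < len(data):
        code = data[i]
        if code == 0:                           # Pad: one byte, no length field
            i += 1
            continue
        if code == 255 or i + 1 >= len(data):   # End option, or truncated data
            break
        length = data[i + 1]
        options[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options


def summarize_dhcp(payload):
    """Summarize a BOOTP/DHCP UDP payload: type, client MAC, hostname, requested IP."""
    if payload[236:240] != MAGIC_COOKIE:        # fixed BOOTP header is 236 bytes
        raise ValueError("no DHCP magic cookie: plain BOOTP or not DHCP at all")
    hlen = payload[2]                            # hardware address length (6 for Ethernet)
    options = parse_options(payload[240:])
    msg = options.get(53)
    msg_type = msg[0] if msg else None
    summary = {
        "type": MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})"),
        "client_mac": ":".join(f"{b:02x}" for b in payload[28:28 + hlen]),  # chaddr
    }
    if 12 in options:                            # option 12: Host Name
        summary["hostname"] = options[12].decode("ascii", "replace")
    if 50 in options:                            # option 50: Requested IP Address
        summary["requested_ip"] = socket.inet_ntoa(options[50])
    return summary


# Hypothetical DHCPREQUEST payload
header = bytearray(236)
header[0], header[1], header[2] = 1, 1, 6        # BOOTREQUEST, Ethernet, 6-byte MAC
header[28:34] = bytes.fromhex("0012793f27e5")
opts = bytes([53, 1, 3, 12, 6]) + b"laptop" + bytes([50, 4, 192, 168, 1, 50, 255])
summary = summarize_dhcp(bytes(header) + MAGIC_COOKIE + opts)
print(summary)
```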

The DHCP message exchange for a new network address allocation is illustrated in the table below, from RFC 2131:⁵¹

51. “RFC 2131—Dynamic Host Configuration Protocol.”

4.4.1.3 Simple Mail Transfer Protocol (SMTP)

The Simple Mail Transfer Protocol (SMTP) was designed as a protocol that clients could use to deliver outbound email to servers, and that servers could use to relay messages hop by hop to the end mail transfer agent (MTA). The end MTA either took responsibility for local delivery to another process, or rejected the missive. SMTP originally ran on TCP port 25, which is still used for relaying between mail servers; it is also commonly seen on TCP port 587, where it is typically used with SMTP Authentication.⁵²

52. J. Klensin, “RFC 5321—Simple Mail Transfer Protocol,” IETF, October 2008, http://rfc-editor.org/rfc/rfc5321.txt.

Basic Use

As with “snail mail” (see Figure 4-32), email messages go through many different stations on their way between sender and recipient. When you send a physical letter, you might place it in your own mailbox. Then it gets picked up and brought to your local post office, then transferred to a regional post office, then transferred a long distance to another regional post office, then the local post office at the destination town, and finally the recipient’s mailbox. Similarly, when you send an email, the message might travel from your local mail client, to a mail submission agent (MSA) within your organization, to a mail transfer agent (MTA), to the mail exchanger (MX) for the destination domain, to the local mail delivery agent (MDA), and finally to the recipient’s mailbox.^{53, 54}

53. “RFC 5321—Simple Mail Transfer Protocol.”

54. “Simple Mail Transfer Protocol,” Wikipedia, July 22, 2011, http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#Outgoing_mail_SMTP_server.

Figure 4-32 As with snail mail, an email message goes through multiple stations on the way to its destination.

Here are definitions for important terms used to describe devices which implement SMTP:

• Mail User Agent (MUA)—The end-user’s mail client

• Mail Submission Agent (MSA)—Handles local mail submission

• Mail Transfer Agent (MTA)—Transfers mail between mail servers

• Mail eXchanger (MX)—Accepts incoming messages for a domain

• Mail Delivery Agent (MDA)—Handles local mail delivery

Basic commands

Each SMTP transaction includes four command sequences:

• HELO—Opens the connection

• MAIL—Specifies the return address

• RCPT—Specifies the recipient address(es)

• DATA—The contents of the message. This includes the message header and the message body. The message header contains data such as the “From,” “To,” “Date,” and “Subject” lines, which typically appear in the mail client’s display. The message body contains the actual contents of the message. The message header and message body are separated by an empty line.

At the end of each command sequence, the mail server has the opportunity to accept or reject the values entered by the client.

A simple transcript

Here is a simple transcript of an SMTP exchange:

$ telnet 10.1.34.114 25
Trying 10.1.34.114...
Connected to 10.1.34.114.
Escape character is '^]'.
220 smtp.lmgsecurity.com
HELO client.example.com
250 smtp.lmgsecurity.com
MAIL FROM: [email protected]
250 2.1.0 Ok
RCPT TO: [email protected]
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
From: Jonathan Ham <[email protected]>
To: Sherri Davidoff <[email protected]>
Date: Mon, 06 September 2010 01:20:45 -0700
Subject: nice SMTP example!

That's the best SMTP example I've ever seen! Simply amazing. Two thumbs up.

/jonathan
.
250 2.0.0 Ok: queued as B0F6618C06D
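The client's half of that transcript follows a fixed pattern. One detail matters when you reconstruct messages from captured SMTP. A line containing only "." ends the DATA section, so RFC 5321 has the client add an extra leading "." to any message line that starts with one. This is called dot-stuffing, and the receiver removes the extra dot. This Python sketch generates the client commands for a message. The function name and addresses are hypothetical. It uses the RFC's angle-bracket form `MAIL FROM:<address>`, although some servers also accept the looser form typed in the transcript above. It does not read or check server replies.

```python
def smtp_client_commands(helo_name, mail_from, recipients, message):
    """Build the client side of a HELO/MAIL/RCPT/DATA exchange as CRLF-terminated text.

    Server replies (220, 250, 354, ...) are ignored here; a real client must
    read each reply and stop if a command is rejected.
    """
    lines = [f"HELO {helo_name}", f"MAIL FROM:<{mail_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # Dot-stuffing: a data line that begins with "." gets one extra "."
        # so the server does not mistake it for the end-of-data marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # <CRLF>.<CRLF> ends the DATA section
    return "\r\n".join(lines) + "\r\n"


message = "From: ann@example.com\nSubject: test\n\n.hidden line\nbye"
session = smtp_client_commands("client.example.com", "ann@example.com", ["bob@example.net"], message)
print(session)
```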

SMTP Authentication

During the 1990s, email spam became a serious problem. At the time, most mail servers did not require authentication before accepting messages for relaying, so spammers worldwide submitted large volumes of messages to be transferred through other people’s mail servers. The SMTP Authentication standard was developed in the 1990s to provide a standard protocol for SMTP authentication, so that servers could block illegitimate senders from using third-party mail servers to relay spam. The specification has been improved upon and refined over time. In the example below, note that the PLAIN authentication mechanism is used. With PLAIN, the credentials are merely Base64-encoded, not encrypted. The string in the example, “dGVzdAB0ZXN0ADEyMzQ=,” decodes to the strings “test,” “test,” and “1234,” separated by NUL (zero) bytes; these are the authorization identity, the authentication identity, and the password. (You can use the “base64” command on many Linux systems to decode it.)

Here’s a nice example from RFC 4954:⁵⁵

55. R. Siemborski et al., “RFC 4954—SMTP Service Extension for Authentication,” IETF, July 2007, http://rfc-editor.org/rfc/rfc4954.txt.

S: 220-smtp.example.com ESMTP Server
C: EHLO client.example.com
S: 250-smtp.example.com Hello client.example.com
S: 250-AUTH GSSAPI DIGEST-MD5
S: 250-ENHANCEDSTATUSCODES
S: 250 STARTTLS
C: STARTTLS
S: 220 Ready to start TLS
... TLS negotiation proceeds, further commands
protected by TLS layer ...
C: EHLO client.example.com
S: 250-smtp.example.com Hello client.example.com
S: 250 AUTH GSSAPI DIGEST-MD5 PLAIN
C: AUTH PLAIN dGVzdAB0ZXN0ADEyMzQ=
S: 235 2.7.0 Authentication successful
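Because PLAIN only encodes credentials, recovering them from cleartext traffic is trivial. Under the PLAIN mechanism (RFC 4616), the decoded string is an optional authorization identity, the authentication identity, and the password, separated by NUL bytes. This Python sketch decodes the string from the example; the function name is our own. In this particular transcript the AUTH command is sent after STARTTLS, so in a real capture you would only see it if the TLS session could be decrypted.

```python
import base64


def decode_auth_plain(b64):
    """Decode a SASL PLAIN response: [authzid] NUL authcid NUL password."""
    raw = base64.b64decode(b64)
    authzid, authcid, password = raw.split(b"\x00")  # exactly two NUL separators
    return authzid.decode("utf-8"), authcid.decode("utf-8"), password.decode("utf-8")


print(decode_auth_plain("dGVzdAB0ZXN0ADEyMzQ="))  # ('test', 'test', '1234')
```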

4.4.1.4 Domain Name System (DNS)

The Domain Name System (DNS) protocol provides a hierarchical, distributed database for mapping the names that people prefer to use to 32-bit IPv4 or 128-bit IPv6 numerical addresses.⁵⁶ For both the original top-level domains (TLDs) and the newer country-code and vanity TLDs, there are TLD servers established that understand the distribution and delegation of control. Hence querying the “.edu” TLD server for “computer.mit.edu” will not result in an authoritative answer, but rather a pointer to the delegated nameserver for “mit.edu,” which may know the authoritative answer.

56. P. Mockapetris, “RFC 1035—Domain Names—Implementation and Specification,” IETF, November 1987, http://rfc-editor.org/rfc/rfc1035.txt.

DNS is a query-response protocol. The client typically asks a question within a single UDP packet and the server’s response similarly fits within a single UDP packet. The DNS protocol defines the structure of both the query and response. In the structure, there are fields for many types of data that could be requested and the corresponding answers. Some of these fields can contain arbitrary data, and therefore can be used as the basis for a bidirectional covert channel.

Since the underlying Layer 4 (transport) protocol is UDP, it is up to the tunneled protocols to provide reliable delivery, flow control, and so forth. As a consequence, it is common to see TCP traffic, carried over IP, tunneled inside DNS over UDP.
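To see which parts of a DNS message a covert channel can abuse, it helps to build one by hand. This Python sketch constructs a minimal query following the RFC 1035 layout: a 12-byte header followed by one question. The helper names and the query ID are arbitrary. The queried name is the most obvious carrier. A tunneling tool can encode outbound data into the labels of names under a domain it controls, with each label up to 63 bytes, and the controlling nameserver can return data in its answers.

```python
import struct


def encode_qname(name):
    """Encode a domain name as DNS length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label length in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_dns_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: 12-byte header plus one question (class IN)."""
    flags = 0x0100  # QR=0 (query), opcode 0 (standard query), RD=1 (recursion desired)
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)  # one question, no records
    return header + encode_qname(name) + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


query = build_dns_query("www.google.com")  # qtype 1 = A record
print(query.hex())
```

Sending these bytes in a UDP datagram to port 53 of a resolver (for example, with Python's `socket.sendto`) should produce a response that echoes the query ID with the QR bit set.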

It is possible to carry normal DNS traffic over TCP. For example, this commonly occurs when the server’s response to a query is too large to fit within a single UDP packet. In this event, the server, as per the DNS protocol, sets the “truncated” (TC) bit in its DNS reply to indicate that the response has been truncated to fit in the UDP-encapsulated reply. Usually the client resubmits the query via a TCP connection so that it can retrieve the entirety of the reply. TCP is also used for AXFR queries, also known as DNS “zone transfers.” This query essentially asks the server to tell the client everything it knows about a particular domain. These types of requests provide valuable reconnaissance information for attackers and are often a precursor to attack. As a result, DNS over TCP port 53 is often blocked at the perimeter or monitored. Therefore, attackers seeking to use DNS for a covert channel are likely to stick with UDP.⁵⁷

57. J. Postel, “RFC 792—Internet Control Message Protocol,” September 1981, http://www.rfc-editor.org/rfc/rfc792.txt.
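The truncated bit is one of several flag bits in the fixed DNS header, so spotting UDP responses that should trigger a TCP retry is a matter of masking bits. This Python sketch decodes the header flags. It also shows the 2-byte length prefix that RFC 1035 places in front of each DNS message sent over TCP, which you will see when examining DNS-over-TCP streams. The helper names are our own, and the sample header is made up.

```python
import struct


def dns_header_flags(message):
    """Decode the flag bits of a DNS message's 12-byte header (RFC 1035, section 4.1.1)."""
    _qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "qr": bool(flags & 0x8000),     # 1 = response, 0 = query
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": bool(flags & 0x0400),     # authoritative answer
        "tc": bool(flags & 0x0200),     # truncated: the full reply did not fit
        "rd": bool(flags & 0x0100),     # recursion desired
        "ra": bool(flags & 0x0080),     # recursion available
        "rcode": flags & 0x000F,        # 0 = no error, 3 = name error (NXDOMAIN)
        "ancount": ancount,
    }


def frame_for_tcp(message):
    """Over TCP, each DNS message is prefixed with a 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


# Hypothetical truncated response header: QR, TC, RD, and RA set; no answers included
response = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
flags = dns_header_flags(response)
if flags["qr"] and flags["tc"]:
    print("truncated reply: expect the client to retry the query over TCP port 53")
```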

DNS Recursion

DNS queries that seek to map names to IP addresses (which is the most common case) start out with a client making a query of the server it considers to be most knowledgeable. That server then crawls the hierarchical space for the answer, returns it, and then also caches the result to expedite any subsequent client queries for the same information. To recurse from the root of “.edu” to an authoritative answer about, say, computer.mit.edu, a number of queries might be required.
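The recursive lookup can be modeled as a loop that keeps following referrals until some server returns an answer, then caches the result. This Python sketch is a toy model rather than a real resolver. The server names and the delegation table are hypothetical, and the address 192.0.2.10 comes from a range reserved for documentation. It also ignores TTLs, so cached entries never expire. In this simplified hierarchy, the walk starts at a root server.

```python
# Hypothetical delegation data. Each server either answers for a name ("answer")
# or refers the resolver to a nameserver for a more specific zone ("referral").
SERVERS = {
    "root-server": {"edu.": ("referral", "edu-server")},
    "edu-server": {"mit.edu.": ("referral", "mit-server")},
    "mit-server": {"computer.mit.edu.": ("answer", "192.0.2.10")},
}


def resolve(name, cache, start="root-server"):
    """Follow referrals from the starting server; return (address, queries sent)."""
    name = name.rstrip(".") + "."
    if name in cache:                   # cached: no queries needed
        return cache[name], 0
    server, queries = start, 0
    while True:
        queries += 1
        table = SERVERS[server]
        # Answers must match exactly; referrals cover every name inside the zone.
        candidates = [zone for zone, (kind, _) in table.items()
                      if zone == name or (kind == "referral" and name.endswith("." + zone))]
        if not candidates:
            raise LookupError(f"{server} has no data for {name}")
        kind, value = table[max(candidates, key=len)]  # most specific match wins
        if kind == "answer":
            cache[name] = value
            return value, queries
        server = value                  # referral: ask the delegated nameserver next


cache = {}
print(resolve("computer.mit.edu", cache))  # ('192.0.2.10', 3)
print(resolve("computer.mit.edu", cache))  # ('192.0.2.10', 0)
```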

DNS Queries

It can be extremely important to collect name-to-address mappings. This can be accomplished by querying any number of DNS servers, both internal and external. The result will be a snapshot of the answers being offered at the time, based on caching parameters. The simplest request can take the form of a command-line query such as the following:

$ dig www.google.com

This will result in an address record (A) query to the default nameserver. Specific nameservers can be queried with dig as follows:

$ dig @ns.google.com www.google.com

Reverse “PTR” records and other records such as nameserver delegations can be obtained as well:

$ dig @ns.modwest.com -x 204.11.246.86
$ dig lmgsecurity.com NS
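A PTR lookup does not query the IP address itself. It queries a specially constructed name: the address's octets in reverse order under in-addr.arpa, or, for IPv6, the address's hex digits reversed under ip6.arpa. dig's -x option builds this name for you. Python's standard `ipaddress` module exposes the same construction through the `reverse_pointer` attribute, as this short sketch shows; the wrapper function name is our own.

```python
import ipaddress


def reverse_lookup_name(ip):
    """Return the name a PTR query asks about: in-addr.arpa for IPv4, ip6.arpa for IPv6."""
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_lookup_name("204.11.246.86"))  # 86.246.11.204.in-addr.arpa
```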

4.4.2 Higher-Layer Analysis Tools

There are a wide variety of tools available that interpret higher-layer protocols and automatically print important details, carve out files, decode data, or even produce professional-quality forensic reports. Some of these tools are highly specialized for use with one specific higher-layer protocol, whereas others are multipurpose.

In this section, we review a few examples of tools that are designed for higher-layer protocol analysis.

4.4.2.1 oftcat

Written by Kristinn , oftcat is a dedicated OFT protocol dissector and data-carving tool. It expects as input a data file that contains a single flow’s reassembled transport-layer payload, such as tcpflow or pcapcat might produce. As output, oftcat produces a protocol summary of all of the OFT activity it can identify. It also recovers any files transferred within the data stream and saves them with their correct names, based on the OFT metadata.⁵⁸ You can see this tool in action in Section 4.4.3.1.

58. Kristinn , “oftcat,” August 18, 2009, http://blog.kiddaland.net/dw/oftcat.

4.4.2.2 smtpdump

The smtpdump tool was written by Franck Guénichot, the co-winner of Puzzle #2, “Ann Skips Bail.”⁵⁹ Smtpdump’s “help” display speaks for itself:

59. Franck Guénichot, “Index of /contest02/Finalists/Franck_Guenichot,” December 17, 2009, http://forensicscontest.com/contest02/Finalists/Franck_Guenichot/.

$ smtpdump

    smtpdump version 0.1,
    Copyright (C) 2009 Franck GUENICHOT
    smtpdump comes with ABSOLUTELY NO WARRANTY;
    This is free software, and you are welcome
    to redistribute it under certain conditions.
    (GPL v3)

    Usage: smtpdump [$options] -r <pcap_file>
    -A, --auth                       Display SMTP Auth informations (only
        LOGIN method)
    -e, --info                       Display E-mail informations
    -b, --brief                      Display minimum e-mail informations
    -x, --xtract                     Extract e-mail attachments

    -m, --md5                        Display extracted attachment MD5 Hash
    -s, --save                       Save raw e-mail to file
    -f, --flow-index <index>         Filters only given index flow
    -r, --read <pcap_file>           Read the given pcap file [REQUIRED]
    -v, --version                    Display version information
    -h, --help                       Display this screen

4.4.2.3 findsmtpinfo.py

Jeremy Rossi, co-winner of Puzzle Contest #2 (“Ann Skips Bail”),⁶⁰ created an SMTP analysis tool that is extremely simple to use. You specify the pcap file, and it automatically extracts authentication data, decodes Base64-encoded credentials, prints mail header information, extracts each attachment, computes its MD5 hash, and produces a report. The report is detailed and well suited for the appendix of a forensic examiner’s findings report.

60. Jeremy Rossi, “Index of /contest02/Finalists/Jeremy_Rossi,” December 17, 2009, http://forensicscontest.com/contest02/Finalists/Jeremy_Rossi/.

Here’s a description of findsmtpinfo.py written by Jeremy Rossi for the contest:⁶¹

61. Ibid.

    The initial idea was to write the entire process in Python, but after
    starting to write the code, I found that tcpflow can handle parsing the
    pcap the Python code can be used to present the data and analyze the
    output. I called the Python script findsmtpinfo.py.

    The script creates a report of the SMTP information, stores any emails in
    msg format, stores any attachments from the emails, decompresses them if
    they are a compressed format (zip, docx), checks MD5 hashes of all files
    including the msg files (and generates MD5 hash of output report).

    So the Python script makes use of three arguments:

    -p|--pcap     This argument specifies the pcap file
    -r|--report   This argument specifies the report output directory
                  [Defaults to ./report]
    -f|--force    If the report directory has files already, this argument
                  is required to allow overwriting of files

findsmtpinfo.py -p evidence02.pcap -r ./report

4.4.2.4 NetworkMiner

NetworkMiner, by Erik Hjelmvik, is a fantastic multipurpose traffic analysis tool. It runs on Windows and has both an intuitive GUI and a powerful protocol analysis engine. There is a free, open-source edition of NetworkMiner, which you can download from http://networkminer.sourceforge.net/.⁶²

62. Erik Hjelmvik, “NETRESEC NetworkMiner—The Network Forensics Analysis Tool,” 2011, http://www.netresec.com/?page=NetworkMiner.

We explore NetworkMiner in more detail in Section 4.4.3.2.

4.4.3 Higher-Layer Analysis Techniques

There are two general strategies to consider when conducting automated analysis of higher-layer protocols:

• Use small, specialized tools that are each specifically designed to analyze a particular higher-layer protocol; or

• Use a multipurpose tool to quickly gather a wide range of information about the traffic.

Each strategy comes with benefits and tradeoffs, as we will discuss momentarily.

4.4.3.1 Small, Specialized Tools

As we have seen, tools such as oftcat and smtpdump are specifically designed for analysis of one higher-layer protocol. They are small and have a limited range of options, but they provide more support for analyzing their targeted protocols than most, if not all, multipurpose tools.

Small and specialized tools are very useful when you have a clear understanding of the higher-layer protocol contained within the packet capture. They are also usually designed to interface easily with other tools so that output can be redirected into another program for further processing.

Let’s see what happens when we use “oftcat” to analyze the OFT file transfer in “Ann’s Bad AIM.” Recall that carving out the transferred file earlier was a painstaking manual process and our attempt to use tcpxtract was unfortunately thwarted due to erroneous TCP stream reconstruction. Perhaps a small tool, specifically designed to analyze the OFT protocol, will help us carve out the transferred file correctly and efficiently.

Here we are using oftcat to interpret the half-duplex flow from 192.168.1.158 to 192.168.1.159, which tcpflow carved out:

$ oftcat -r 192.168.001.158.05190-192.168.001.159.01272
------------------------------------------------------------
  File name: 192.168.001.158.05190-192.168.001.159.01272

------------------------------------------------------------
Parsing OFT (Oscar File Transfer) header

Name of file transferred:
  Total number of files 1
  Files left: 1
  Total parts: 1
  Parts left: 1
  Total size: 12008
  Size: 12008
  Checksum: 2976120832
  ID string 'Cool FileXfer'
  Type: PEER_TYPE_GETFILE_RECEIVELISTING, PEER_TYPE_RESUMEACK,
      PEER_TYPE_RESUME, PEER_TYPE_GETFILE_REQUESTLISTING,
      PEER_TYPE_RESUMEACCEPT, PEER_TYPE_GETFILE_ACKLISTING, PEER_TYPE_PROMPT,
  Name offset 0
------------------------------------------------------------
parsing file information
Final header (after file transfer)

File: recipe.docx saved in file recipe.docx

As you can see, with a single command oftcat automatically identified an OFT message, printed important OFT protocol details to the screen, and carved out the transferred file (“recipe.docx”). We can subsequently check the file size and compute the cryptographic hashes:

$ sha256sum recipe.docx
f0f74a982a814640aedaa5fd6542ac810e8c5e257552bcc024a5c808343bccf9 recipe.docx

$ md5sum recipe.docx
8350582774e1d4dbe1d61d64c89e0ea1 recipe.docx

$ ls -l recipe.docx
-rwx------ 1 student student 12008 2011-01-08 23:42 recipe.docx

The cryptographic hashes and the file size (12,008 bytes) match the values we found earlier when we carved the file out by hand. The file size also matches the value that oftcat reported as “Total size” (12008).
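When you carve many files, comparing sizes and hashes by hand gets tedious. This Python sketch uses the standard `hashlib` module to compute a file's size, MD5, and SHA-256 in one streaming pass, so the results can be compared against values recorded earlier. The helper names are our own. The expected values are the ones reported above for recipe.docx.

```python
import hashlib


def fingerprint_chunks(chunks):
    """Compute size, MD5, and SHA-256 over an iterable of byte chunks."""
    md5, sha256, size = hashlib.md5(), hashlib.sha256(), 0
    for chunk in chunks:
        md5.update(chunk)
        sha256.update(chunk)
        size += len(chunk)
    return {"size": size, "md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def file_fingerprint(path, chunk_size=65536):
    """Fingerprint a file without loading it into memory all at once."""
    with open(path, "rb") as f:
        return fingerprint_chunks(iter(lambda: f.read(chunk_size), b""))


# Values reported above for the carved file. If a re-carve reproduces it exactly,
#   file_fingerprint("recipe.docx") == EXPECTED
# should be True.
EXPECTED = {
    "size": 12008,
    "md5": "8350582774e1d4dbe1d61d64c89e0ea1",
    "sha256": "f0f74a982a814640aedaa5fd6542ac810e8c5e257552bcc024a5c808343bccf9",
}
```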

Carving the file out with oftcat was much easier than doing so manually, and it produced the same result. Imagine if you had a packet capture with lots of OFT file transfers! You would want a reliable tool that automates the job, such as oftcat.

4.4.3.2 Multipurpose Tool

Multipurpose tools such as NetworkMiner include support for a wide variety of protocols, including higher-layer protocols such as SMTP, AIM, and others. Multipurpose tools are especially useful when you want to gather a wide variety of information from a packet capture, potentially relating to multiple higher-layer protocols, or if you are not sure exactly what you are looking for. Multipurpose tools are designed to provide a broad spectrum of information about a packet capture.

As an example, let’s use NetworkMiner to analyze the Ann’s Bad AIM evidence file. In Figure 4-33, you can see a screenshot of NetworkMiner in action. We simply dragged and dropped the evidence file (evidence01.pcap) into NetworkMiner and it automatically interpreted the results. The first tab, “Hosts,” lists all of the IP addresses of the hosts that transmitted IP traffic in the packet capture.

Figure 4-33 NetworkMiner with the “Ann’s Bad AIM” evidence01.pcap loaded.

In NetworkMiner’s “Files” tab (shown in Figure 4-34), you can see all the files that NetworkMiner carved out of the packet capture, including the “recipe.docx” file. By right-clicking on the filename, you can open it up and view the contents or save it to a folder.

Figure 4-34 NetworkMiner has already extracted recipe.docx.

Easy! While a multipurpose tool such as NetworkMiner may not always support the specific higher-layer protocol of interest to you, it is often a good place to start.

4.5 Conclusion

Packet analysis is a complex art. From identifying protocols within packets, to isolating packets of interest, to reconstructing flows and carving out higher-layer protocol data, mastering packet analysis takes patience and keen attention to detail. Many tools exist for packet analysis, but they are not always accurate, especially because the underlying network protocols and specifications are constantly changing.

In this chapter, we began by studying network protocols. We discussed where protocols are defined and documented, with the caveat that there is no guarantee that what is on paper matches reality! We introduced a variety of protocol and packet analysis tools and took an in-depth look at a few important higher-layer protocols, which we will refer to throughout the book. Although our focus is always on teaching you the low-level techniques so that you can understand what your tools are doing, we also introduced multipurpose packet analysis tools, which are very powerful and popular.

4.6 Case Study: Ann’s Rendezvous

The Case: After being released on bail, Ann Dercover disappears! Fortunately, investigators were carefully monitoring her network activity before she skipped town. “We believe Ann may have communicated with her secret lover, Mr. X, before she left,” says the police chief. “The packet capture may contain clues to her whereabouts.”

Challenge: You are the forensic investigator. Your mission is to analyze the packet capture and gather information about Ann’s activities and plans.

The following questions will help guide your investigation:

• Provide any online aliases or addresses and corresponding account credentials that may be used by the suspect under investigation.

• Who did Ann communicate with? Provide a list of email addresses and any other identifying information.

• Extract any transcripts of Ann’s conversations and present them to investigators.

• If Ann transferred or received any files of interest, recover them.

• Are there any indications of Ann’s physical whereabouts? If so, provide supporting evidence.

Network:

• Internal network: 192.168.30.0/24

• DMZ: 10.30.30.0/24

• The “Internet”: 172.30.1.0/24 [Note that for the purposes of this case study, we are treating the 172.30.1.0/24 subnet as “the Internet.” In real life, this is a reserved nonroutable IP address space.]

Evidence: Investigators provide you with a packet capture from Ann’s home network, “evidence-packet-analysis.pcap.” They also inform you that in the course of their monitoring, they have found that Ann’s laptop has the MAC address 00:21:70:4D:4F:AE.

4.6.1 Analysis: Protocol Summary

There are many possible ways to approach this investigation. Let’s begin by taking a quick high-level look at the protocols contained in this packet capture. Using Wireshark, we open the packet capture and go to Statistics→Protocol Hierarchy. As you can see in Figure 4-35, 100% of the traffic in the evidence packet capture is Ethernet (at Layer 2) and IP (at Layer 3).

Figure 4-35 This screenshot provides a high-level look at the protocols contained in this packet capture, using Wireshark’s Protocol Hierarchy function.

Note the presence of “Bootstrap Protocol,” which is used to transfer DHCP requests and responses. This might allow us to easily link the given MAC address to an assigned IP address and other network configuration information. Since we are looking for communications protocols, it is also worth pointing out that the most voluminous higher-layer protocols in the packet capture are DNS (13.19% of the packet capture), IMAP (11.70%), and SMTP (10.82%). These three protocols are commonly seen together when email is exchanged over a network.

4.6.2 DHCP Traffic

Now let’s examine the “Bootstrap Protocol” traffic. Using the Wireshark Display Filter “eth.addr == 00:21:70:4d:4f:ae and bootp,” we can narrow down the packet capture so that we are only viewing “Bootstrap Protocol” traffic that relates to the MAC address “00:21:70:4d:4f:ae,” which we refer to as “Ann’s computer” based on the given information.
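Outside Wireshark, the logic of that display filter can be approximated in a few lines. This Python sketch checks whether a raw frame has the given MAC address as its source or destination and carries UDP on port 67 or 68, the ports normally used to identify BOOTP/DHCP traffic. It assumes untagged Ethernet II frames carrying IPv4 and ignores IP fragmentation. The function name and the sample frame are hypothetical, and Wireshark's dissector is considerably more thorough.

```python
import struct


def is_dhcp_for_mac(frame, mac):
    """Approximate 'eth.addr == <mac> and bootp' for one raw Ethernet II frame."""
    if len(frame) < 14:
        return False
    target = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if target not in (dst, src) or ethertype != 0x0800:  # MAC match, IPv4 only
        return False
    ip = frame[14:]
    if len(ip) < 20 or ip[9] != 17:                       # IPv4 protocol 17 = UDP
        return False
    ihl = (ip[0] & 0x0F) * 4                              # IPv4 header length in bytes
    if len(ip) < ihl + 4:
        return False
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return bool({src_port, dst_port} & {67, 68})          # DHCP server/client ports


# Hypothetical broadcast frame from 00:21:70:4d:4f:ae: minimal IPv4 header, UDP 68 -> 67
ipv4 = bytes([0x45]) + bytes(8) + bytes([17]) + bytes(10)
udp = struct.pack("!HHHH", 68, 67, 8, 0)
frame = bytes.fromhex("ffffffffffff" "0021704d4fae" "0800") + ipv4 + udp
print(is_dhcp_for_mac(frame, "00:21:70:4D:4F:AE"))  # True
```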

As you can see in Figure 4-36, Wireshark automatically looks up the registered OUI, “00:21:70,” and displays the corresponding manufacturer, Dell. (We can verify this using the IEEE’s published documentation.) The first message shown is a broadcast DHCP Request from Ann’s computer. In this message, we can see that the Requested IP Address is 192.168.30.108, which is likely the address that Ann’s computer was assigned the last time it was on a network. The hostname is listed as “ann-laptop”—nice evidence to corroborate our theory that this system belongs to Ann.

Figure 4-36 A DHCP Request packet from Ann’s computer. Notice that Wireshark automatically looks up the registered OUI, “00:21:70,” and displays the corresponding manufacturer, Dell.

After three DHCP Requests, we see a DHCP ACK from 192.168.30.10, as shown in Figure 4-37. Based on the options listed in the DHCP ACK, we can see that the client 00:21:70:4d:4f:ae has indeed been assigned the IP address 192.168.30.108. The DHCP Server Identifier is 192.168.30.10 (which makes sense, given that the DHCP ACK packet originated from 192.168.30.10). The DNS server is listed as 10.30.30.20, so unless Ann’s computer has been manually configured to use a different DNS server, we can expect to see her system sending DNS requests to 10.30.30.20. The “Router” is listed as 192.168.30.10, and the Subnet Mask is 255.255.255.0, so again, unless Ann’s computer has been manually configured to use a different gateway, her system will send traffic destined for outside the local subnet to 192.168.30.10. The IP Address Lease Time is set to 1 hour. According to RFC 2131, we should see a DHCP renewal request after half the lease time has expired, as corroborated by the explicit Renewal Time Value of 30 minutes.

Figure 4-37 A DHCP ACK packet from the DHCP server to Ann’s computer. Based on the options listed in the DHCP ACK, we can see that the client 00:21:70:4d:4f:ae has indeed been assigned the IP address 192.168.30.108.
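To see where these values live in the packet, here is a minimal sketch of a parser for the DHCP options field, which is a sequence of code/length/value entries following a four-byte magic cookie (RFC 2132). The options buffer below is a hypothetical reconstruction built from the values described above, not bytes copied from the evidence file, and only the handful of option codes discussed here are decoded. The renewal_timers() helper falls back to the RFC 2131 defaults (T1 = 0.5 and T2 = 0.875 of the lease time) when the server does not send explicit values.

```python
import ipaddress
import struct

MAGIC_COOKIE = bytes([99, 130, 83, 99])  # precedes the options field

# Option codes (RFC 2132) for the fields discussed in the text
NAMES = {1: "subnet_mask", 3: "router", 6: "dns", 51: "lease_time",
         53: "message_type", 54: "server_id", 58: "renewal_t1", 59: "rebinding_t2"}


def parse_options(data: bytes) -> dict:
    """Decode DHCP options (code, length, value) that follow the magic cookie."""
    if not data.startswith(MAGIC_COOKIE):
        raise ValueError("missing DHCP magic cookie")
    opts, i = {}, len(MAGIC_COOKIE)
    while i < len(data):
        code = data[i]
        if code == 0:                # Pad option: one byte, no length field
            i += 1
            continue
        if code == 255:              # End option
            break
        if i + 1 >= len(data):
            raise ValueError("truncated option")
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        i += 2 + length
        name = NAMES.get(code, f"option_{code}")
        if code in (1, 54):                      # single IPv4 address
            opts[name] = str(ipaddress.IPv4Address(value))
        elif code in (3, 6):                     # list of IPv4 addresses
            opts[name] = [str(ipaddress.IPv4Address(value[j:j + 4]))
                          for j in range(0, length, 4)]
        elif code in (51, 58, 59):               # 32-bit seconds, network byte order
            opts[name] = struct.unpack("!I", value)[0]
        elif code == 53:                         # 5 = DHCPACK
            opts[name] = value[0]
        else:
            opts[name] = value
    return opts


def renewal_timers(opts: dict) -> tuple:
    """Return (T1, T2) in seconds, using the RFC 2131 defaults if not supplied."""
    lease = opts["lease_time"]
    t1 = opts.get("renewal_t1", lease // 2)         # 0.5 * lease
    t2 = opts.get("rebinding_t2", lease * 7 // 8)   # 0.875 * lease
    return t1, t2


def tlv(code: int, value: bytes) -> bytes:
    return bytes([code, len(value)]) + value


def ip(addr: str) -> bytes:
    return ipaddress.IPv4Address(addr).packed


# Hypothetical options field rebuilt from the values described in the text
ack = (MAGIC_COOKIE
       + tlv(53, bytes([5]))
       + tlv(54, ip("192.168.30.10"))
       + tlv(51, struct.pack("!I", 3600))   # 1-hour lease
       + tlv(58, struct.pack("!I", 1800))   # explicit 30-minute T1
       + tlv(1, ip("255.255.255.0"))
       + tlv(3, ip("192.168.30.10"))
       + tlv(6, ip("10.30.30.20"))
       + bytes([255]))

opts = parse_options(ack)
print(opts["server_id"], opts["dns"], renewal_timers(opts))
```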

4.6.3 Keyword Search

A keyword search, or “dirty word” search, is often an efficient way to identify traffic of interest within a packet capture. In this case, since we are looking for communications relating to a known suspect, we might start by searching the packet capture for references to her name, “Ann Dercover.” (We could have chosen other keywords to begin, such as simply “Ann,” or other known aliases, but short keywords often turn up a lot of extra matches that are not what the forensic investigator intends. It is generally more effective to begin with longer, unique keywords and to execute additional searches as needed based on the results.) The “ngrep” example below illustrates one way to search for a keyword within a packet capture.

$ ngrep "Ann Dercover" -N -t -q -I evidence-packet-analysis.pcap
input: evidence-packet-analysis.pcap
match: Ann Dercover

T(6) 2011/05/17 13:33:07.203874 192.168.30.108:1684 -> 64.12.168.40:587 [A]
  Message-ID: <00ab01cc14c9$227de600$6b1ea8c0@annlaptop>..From: "Ann Dercover
  " <sneakyg33ky@aol.com>..To: <inter0pt1c@aol.com>..Subject: need a favor..D
  ate: Tue, 17 May 2011 13:32:17 -0600..MIME-Version: 1.0..Content-Type: mult
  ipart/alternative;...boundary="----=_NextPart_000_00A8_01CC1496.D700DE30"..
  X-Priority: 3..X-MSMail-Priority: Normal..X-Mailer: Microsoft Outlook Expre
  ss 6.00.2900.2180..X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
  ....This is a multi-part message in MIME format.....------=_NextPart_000_00
  A8_01CC1496.D700DE30..Content-Type: text/plain;...charset="iso-8859-1"..Con
  tent-Transfer-Encoding: quoted-printable....Hey, can you hook me up quick w
  ith that fake passport you were taking =..about? - Ann....------=_NextPart_
  000_00A8_01CC1496.D700DE30..Content-Type: text/html;...charset="iso-8859-1"
  ..Content-Transfer-Encoding: quoted-printable....<!DOCTYPE HTML PUBLIC "-//
  W3C//DTD HTML 4.0 Transitional//EN">..<HTML><HEAD>..<META http-equiv=3DCont
  ent-Type content=3D"text/html; =..charset=3Diso-8859-1">..<META content=3D"
  MSHTML 6.00.2900.2853" name=3DGENERATOR>..<STYLE></STYLE>..</HEAD>..<BODY b
  gColor=3D#ffffff>..<DIV><FONT face=3DArial size=3D2>Hey, can you hook me up
   quick with that =..fake=20..passport you were taking about? - Ann</FONT></
  DIV>..<DIV><FONT face=3DArial size=3D2></FONT> </DIV></BODY></HTML>...
  .------=_NextPart_000_00A8_01C

T(6) 2011/05/17 13:33:08.648555 192.168.30.108:1685 -> 205.188.58.10:143 [AP]
  From: "Ann Dercover" <sneakyg33ky@aol.com>..To: <inter0pt1c@aol.com>..Subje
  ct: need a favor..Date: Tue, 17 May 2011 13:32:17 -0600..MIME-Version: 1.0.
  .Content-Type: multipart/alternative;...boundary="----=_NextPart_000_00A8_0
  1CC1496.D700DE30"..X-Priority: 3..X-MSMail-Priority: Normal..X-Mailer: Micr
  osoft Outlook Express 6.00.2900.2180..X-MimeOLE: Produced By Microsoft Mime
  OLE V6.00.2900.2180....This is a multi-part message in MIME format.....----
  --=_NextPart_000_00A8_01CC1496.D700DE30..Content-Type: text/plain;...charse
  t="iso-8859-1"..Content-Transfer-Encoding: quoted-printable....Hey, can you
   hook me up quick with that fake passport you were taking =..about? - Ann..
  ..------=_NextPart_000_00A8_01CC1496.D700DE30..Content-Type: text/html;...c
  harset="iso-8859-1"..Content-Transfer-Encoding: quoted-printable....<!DOCTY
  PE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">..<HTML><HEAD>..<MET
   http-equiv=3DContent-Type content=3D"text/html; =..charset=3Diso-8859-1">
  ..<META content=3D"MSHTML 6.00.2900.2853" name=3DGENERATOR>..<STYLE></STYLE
  >..</HEAD>..<BODY bgColor=3D#ffffff>..<DIV><FONT face=3DArial size=3D2>Hey,
   can you hook me up quick with that =..fake=20..passport you were taking ab
  out? - Ann</FONT></DIV>..<DIV><FONT face=3DArial size=3D2></FONT> </DI
  V></BODY></HTML>....------=_NextPart_000_00A8_01CC1496.D700DE30--..

T(6) 2011/05/17 13:33:21.259662 205.188.58.10:143 -> 192.168.30.108:1686 [AP]
  * 2 FETCH (UID 272 INTERNALDATE "17-May-2011 15:32:17 -0400" RFC822.SIZE 13
  42 ENVELOPE ("Tue, 17 May 2011 13:32:17 -0600" "need a favor" (("Ann Dercov
  er" NIL "sneakyg33ky" "aol.com")) (("Ann Dercover" NIL "sneakyg33ky" "aol.c
  om")) (("Ann Dercover" NIL "sneakyg33ky" "aol.com")) ((NIL NIL "inter0pt1c"
   "aol.com")) NIL NIL NIL NIL) BODY[HEADER.FIELDS (REFERENCES X-REF X-PRIORI
  TY X-MSMAIL-PRIORITY X-MSOESREC NEWSGROUPS)] {44}..X-Priority: 3..X-MSMail-
  Priority: Normal.... FLAGS ($Submitted XAOL-SENT Seen))..p8sr OK UID FETCH
   completed..
T(6) 2011/05/17 13:34:16.481132 192.168.30.108:1687 -> 64.12.168.40:587 [A]
  Message-ID: <00b701cc14c9$4bc95710$6b1ea8c0@annlaptop>..From: "Ann Dercover
  " <sneakyg33ky@aol.com>..To: <[email protected]>..Subject: lunch next w
  eek..Date: Tue, 17 May 2011 13:33:26 -0600..MIME-Version: 1.0..Content-Type
  : multipart/alternative;...boundary="----=_NextPart_000_00B4_01CC1497.004EC
  040"..X-Priority: 3..X-MSMail-Priority: Normal..X-Mailer: Microsoft Outlook
   Express 6.00.2900.2180..X-MimeOLE: Produced By Microsoft MimeOLE V6.00.290
  0.2180....This is a multi-part message in MIME format.....------=_NextPart_
  000_00B4_01CC1497.004EC040..Content-Type: text/plain;...charset="iso-8859-1
  "..Content-Transfer-Encoding: quoted-printable....Sorry-- I can't do lunch
  next week after all. Heading out of town. =..Another time!....-Ann..------=
  _NextPart_000_00B4_01CC1497.004EC040..Content-Type: text/html;...charset="i
  so-8859-1"..Content-Transfer-Encoding: quoted-printable....<!DOCTYPE HTML P
  UBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">..<HTML><HEAD>..<META http-eq
  uiv=3DContent-Type content=3D"text/html; =..charset=3Diso-8859-1">..<META c
  ontent=3D"MSHTML 6.00.2900.2853" name=3DGENERATOR>..<STYLE></STYLE>..</HEAD
  >..<BODY bgColor=3D#ffffff>..<DIV><FONT face=3DArial size=3D2>Sorry-- I can
  't do lunch next week =..after all.=20..Heading out of town. Another time!<
  /FONT></DIV>..<DIV><FONT face=3DArial size=3D2></FONT> </DIV>..<DIV><F
  ONT face=3DArial size=3D2>-Ann

T(6) 2011/05/17 13:34:18.050493 192.168.30.108:1688 -> 205.188.58.10:143 [A]
  From: "Ann Dercover" <sneakyg33ky@aol.com>..To: <[email protected]>..Su
  bject: lunch next week..Date: Tue, 17 May 2011 13:33:26 -0600..MIME-Version
  : 1.0..Content-Type: multipart/alternative;...boundary="----=_NextPart_000_
  00B4_01CC1497.004EC040"..X-Priority: 3..X-MSMail-Priority: Normal..X-Mailer
  : Microsoft Outlook Express 6.00.2900.2180..X-MimeOLE: Produced By Microsof
  t MimeOLE V6.00.2900.2180....This is a multi-part message in MIME format...
  ..------=_NextPart_000_00B4_01CC1497.004EC040..Content-Type: text/plain;...
  charset="iso-8859-1"..Content-Transfer-Encoding: quoted-printable....Sorry-
  - I can't do lunch next week after all. Heading out of town. =..Another tim
  e!....-Ann..------=_NextPart_000_00B4_01CC1497.004EC040..Content-Type: text
  /html;...charset="iso-8859-1"..Content-Transfer-Encoding: quoted-printable.
  ...<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">..<HTML><H
  EAD>..<META http-equiv=3DContent-Type content=3D"text/html; =..charset=3Dis
  o-8859-1">..<META content=3D"MSHTML 6.00.2900.2853" name=3DGENERATOR>..<STY
  LE></STYLE>..</HEAD>..<BODY bgColor=3D#ffffff>..<DIV><FONT face=3DArial siz
  e=3D2>Sorry-- I can't do lunch next week =..after all.=20..Heading out of t
  own. Another time!</FONT></DIV>..<DIV><FONT face=3DArial size=3D2></FONT>&n
  bsp;</DIV>..<DIV><FONT face=3DArial size=3D2>-Ann</FONT></DIV></BODY></HTML
  >....------=_NextPart_000_00B4

T(6) 2011/05/17 13:35:16.962873 192.168.30.108:1689 -> 64.12.168.40:587 [A]
  Message-ID: <00bc01cc14c9$6fd1bc60$6b1ea8c0@annlaptop>..From: "Ann Dercover
  " <sneakyg33ky@aol.com>..To: <[email protected]>..Subject: rendezvous..
  Date: Tue, 17 May 2011 13:34:26 -0600..MIME-Version: 1.0..Content-Type: mul
  tipart/mixed;...boundary="----=_NextPart_000_00B8_01CC1497.244B3EB0"..X-Pri
  ority: 3..X-MSMail-Priority: Normal..X-Mailer: Microsoft Outlook Express 6.
  00.2900.2180..X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180....T
  his is a multi-part message in MIME format.....------=_NextPart_000_00B8_01
  CC1497.244B3EB0..Content-Type: multipart/alternative;...boundary="----=_Nex
  tPart_001_00B9_01CC1497.244B3EB0"......------=_NextPart_001_00B9_01CC1497.2
  44B3EB0..Content-Type: text/plain;...charset="iso-8859-1"..Content-Transfer
  -Encoding: quoted-printable....Hi sweetheart! Bring your fake passport and
  a bathing suit. Address =..attached. love, Ann..------=_NextPart_001_00B9_0
  1CC1497.244B3EB0..Content-Type: text/html;...charset="iso-8859-1"..Content-
  Transfer-Encoding: quoted-printable....<!DOCTYPE HTML PUBLIC "-//W3C//DTD H
  TML 4.0 Transitional//EN">..<HTML><HEAD>..<META http-equiv=3DContent-Type c
  ontent=3D"text/html; =..charset=3Diso-8859-1">..<META content=3D"MSHTML 6.0
  0.2900.2853" name=3DGENERATOR>..<STYLE></STYLE>..</HEAD>..<BODY bgColor=3D#
  ffffff>..<DIV><FONT face=3DArial size=3D2>Hi sweetheart! Bring your fake pa
  ssport =..and a=20..bathing su

T(6) 2011/05/17 13:35:24.092584 192.168.30.108:1690 -> 205.188.58.10:143 [A]
  From: "Ann Dercover" <sneakyg33ky@aol.com>..To: <[email protected]>..Su
  bject: rendezvous..Date: Tue, 17 May 2011 13:34:26 -0600..MIME-Version: 1.0
  ..Content-Type: multipart/mixed;...boundary="----=_NextPart_000_00B8_01CC14
  97.244B3EB0"..X-Priority: 3..X-MSMail-Priority: Normal..X-Mailer: Microsoft
   Outlook Express 6.00.2900.2180..X-MimeOLE: Produced By Microsoft MimeOLE V
  6.00.2900.2180....This is a multi-part message in MIME format.....------=_N
  extPart_000_00B8_01CC1497.244B3EB0..Content-Type: multipart/alternative;...
  boundary="----=_NextPart_001_00B9_01CC1497.244B3EB0"......------=_NextPart_
  001_00B9_01CC1497.244B3EB0..Content-Type: text/plain;...charset="iso-8859-1
  "..Content-Transfer-Encoding: quoted-printable....Hi sweetheart! Bring your
   fake passport and a bathing suit. Address =..attached. love, Ann..------=_
  NextPart_001_00B9_01CC1497.244B3EB0..Content-Type: text/html;...charset="is
  o-8859-1"..Content-Transfer-Encoding: quoted-printable....<!DOCTYPE HTML PU
  BLIC "-//W3C//DTD HTML 4.0 Transitional//EN">..<HTML><HEAD>..<META http-equ
  iv=3DContent-Type content=3D"text/html; =..charset=3Diso-8859-1">..<META co
  ntent=3D"MSHTML 6.00.2900.2853" name=3DGENERATOR>..<STYLE></STYLE>..</HEAD>
  ..<BODY bgColor=3D#ffffff>..<DIV><FONT face=3DArial size=3D2>Hi sweetheart!
   Bring your fake passport =..and a=20..bathing suit. Address attached. love
  , Ann</FONT></DIV></BODY></HTM
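Conceptually, a per-packet keyword search like the one ngrep performed is simple: read each captured frame and test whether it contains the byte string. The following minimal sketch parses the classic libpcap file format with nothing but the Python standard library (pcapng files, the default format for recent versions of Wireshark, are not handled). Unlike ngrep, it searches the raw frame, headers included, and like any per-packet search it will miss a keyword that is split across two TCP segments. The build_pcap() helper exists only to make the example self-contained.

```python
import io
import struct

# Classic pcap magic numbers as they appear on disk: (byte order, nanosecond?)
MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", False),  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", False),  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", True),   # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", True),   # big-endian, nanosecond timestamps
}


def iter_pcap(f):
    """Yield (timestamp, frame_bytes) for each record in a classic pcap file."""
    header = f.read(24)
    if len(header) < 24 or header[:4] not in MAGICS:
        raise ValueError("not a classic pcap file (pcapng is not supported)")
    endian, nanosec = MAGICS[header[:4]]
    divisor = 1e9 if nanosec else 1e6
    while True:
        record = f.read(16)
        if len(record) < 16:
            return
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        yield ts_sec + ts_frac / divisor, f.read(incl_len)


def keyword_search(f, keyword: bytes):
    """Return (frame_number, timestamp) for every frame that contains keyword."""
    return [(n, ts) for n, (ts, frame) in enumerate(iter_pcap(f), start=1)
            if keyword in frame]


def build_pcap(frames):
    """Build a small in-memory pcap (little-endian, Ethernet) for demonstration."""
    out = io.BytesIO()
    out.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for ts, frame in frames:
        sec = int(ts)
        usec = round((ts - sec) * 1e6)
        out.write(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
        out.write(frame)
    out.seek(0)
    return out


capture = build_pcap([(1.0, b"unrelated traffic"),
                      (2.5, b'From: "Ann Dercover" <...>')])
print(keyword_search(capture, b"Ann Dercover"))
```

Against a real evidence file, the capture would be opened in binary mode (for example, with open("evidence-packet-analysis.pcap", "rb")) and the file object passed to keyword_search(), provided the file is in classic pcap format.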

Well, that looks interesting! We see from the output above that our keyword search for “Ann Dercover” matched seven packets. Three packets were part of conversations with 64.12.168.40 over TCP port 587 (each from a different client source port). TCP 587 is assigned by IANA as the “submission” port, commonly used by mail clients to submit messages via SMTP to Mail Submission Agents (MSAs).⁶³ Four packets were part of conversations with 205.188.58.10 over TCP port 143. TCP 143 is assigned by IANA to the Internet Message Access Protocol (IMAP) service, which is used for retrieving email.

63. R. Gellens and J. Klensin, “Message Submission,” IETF, December 1998, http://www.ietf.org/rfc/rfc2476.txt.

Each of the packets matched appears to contain email addresses and content.
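With only seven matches, counting by eye is easy, but the same bookkeeping can be scripted when a search returns hundreds of hits. This sketch parses the header line that ngrep -t prints before each match; the regular expression is written against the output format shown above and is an assumption that may need adjusting for other ngrep versions or options. Matches are grouped by whichever endpoint uses a known service port, and only the two ports discussed in the text are listed in SERVICES.

```python
import re
from collections import Counter

# Header line format as printed by "ngrep -t" in the output above (assumed)
HEADER = re.compile(
    r"^T\(\d+\) (?P<date>\S+) (?P<time>\S+) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+)")

# Server ports discussed in the text (IANA assignments)
SERVICES = {587: "submission", 143: "imap"}


def tally(lines):
    """Count matches per (server_ip, server_port)."""
    counts = Counter()
    for line in lines:
        m = HEADER.match(line)
        if not m:
            continue  # payload lines and blank lines
        src, sport, dst, dport = m["src"], int(m["sport"]), m["dst"], int(m["dport"])
        # Treat whichever endpoint uses a known service port as the server.
        server = (dst, dport) if dport in SERVICES else (src, sport)
        counts[server] += 1
    return counts


ngrep_headers = [
    "T(6) 2011/05/17 13:33:07.203874 192.168.30.108:1684 -> 64.12.168.40:587 [A]",
    "T(6) 2011/05/17 13:33:08.648555 192.168.30.108:1685 -> 205.188.58.10:143 [AP]",
    "T(6) 2011/05/17 13:33:21.259662 205.188.58.10:143 -> 192.168.30.108:1686 [AP]",
    "T(6) 2011/05/17 13:34:16.481132 192.168.30.108:1687 -> 64.12.168.40:587 [A]",
    "T(6) 2011/05/17 13:34:18.050493 192.168.30.108:1688 -> 205.188.58.10:143 [A]",
    "T(6) 2011/05/17 13:35:16.962873 192.168.30.108:1689 -> 64.12.168.40:587 [A]",
    "T(6) 2011/05/17 13:35:24.092584 192.168.30.108:1690 -> 205.188.58.10:143 [A]",
]

for (server_ip, port), n in tally(ngrep_headers).items():
    print(f"{n} packet(s) with {server_ip}:{port} ({SERVICES.get(port, 'unknown')})")
```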

4.6.4 SMTP Analysis—Wireshark

Next, let’s examine the SMTP traffic of interest in Wireshark. Beginning with the first packet at 2011/05/17 13:33:07.203874, we can use Wireshark’s “Follow TCP Stream” feature to isolate only the packets that were part of the same TCP flow. Conveniently, “Follow TCP Stream” automatically extracts the Layer 7 contents from each packet in the stream, reassembles them, and displays the result in an easy-to-read format, as shown in Figure 4-38.

Figure 4-38 An SMTP conversation containing the first packet matched as part of our ngrep search. Wireshark’s “Follow TCP Stream” feature automatically extracts the Layer 7 contents from each packet in the stream, reassembles them, and displays the result in an easy-to-read format.

In the TCP stream shown in Figure 4-38, we see an authenticated SMTP conversation between a mail client and an MSA. In its response to the client’s EHLO command, the server advertises the authentication mechanisms it supports, including LOGIN and PLAIN:

250-AUTH=XAOL-UAS-MB LOGIN PLAIN

Both LOGIN and PLAIN send credentials in plain text: they are only Base64-encoded, not encrypted. Let’s decode Ann’s credentials now.

Here’s Ann’s username:

$ echo "c25lYWt5ZzMza3k=" | base64 -d
sneakyg33ky

Here’s her password:

$ echo "czAwcGVyczNrcjF0" | base64 -d
s00pers3kr1t
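The same decoding is easy to script. A minimal sketch, assuming the client’s side of the conversation is available as a list of lines, as in the reconstructed stream shown later in this section: with AUTH LOGIN, the username and password follow on the next two lines, each Base64-encoded. The PLAIN mechanism, which this server also advertises, instead packs an authorization identity, username, and password into a single Base64 string separated by NUL bytes (RFC 4616). The sketch does not handle the variant in which the client sends its first response on the AUTH LOGIN line itself.

```python
import base64


def b64_text(s: str) -> str:
    """Strictly decode one Base64 string to text."""
    return base64.b64decode(s.strip(), validate=True).decode("utf-8")


def auth_login_credentials(client_lines):
    """Return (username, password) from an AUTH LOGIN exchange.

    Assumes the username and password were sent on the two lines that
    follow a bare "AUTH LOGIN" command.
    """
    lines = [line.strip() for line in client_lines]
    start = next((n for n, line in enumerate(lines) if line.upper() == "AUTH LOGIN"), None)
    if start is None or start + 2 >= len(lines):
        raise ValueError("no complete AUTH LOGIN exchange found")
    return b64_text(lines[start + 1]), b64_text(lines[start + 2])


def auth_plain_credentials(blob: str):
    """Return (authzid, username, password) from an AUTH PLAIN response."""
    authzid, user, password = base64.b64decode(blob, validate=True).split(b"\x00")
    return authzid.decode("utf-8"), user.decode("utf-8"), password.decode("utf-8")


client_side = ["EHLO annlaptop", "AUTH LOGIN", "c25lYWt5ZzMza3k=", "czAwcGVyczNrcjF0"]
print(auth_login_credentials(client_side))
```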

The “MAIL FROM” and “RCPT TO” SMTP commands shown below indicate that the email sender was sneakyg33ky@aol.com, and the recipient was inter0pt1c@aol.com. Note that these envelope values can differ from the “From” and “To” headers that are displayed in the message itself. The “From” and “To” headers do not affect where an MTA routes an email message.

MAIL FROM: <sneakyg33ky@aol.com>
250 2.1.0 Ok
RCPT TO: <inter0pt1c@aol.com>
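Because the envelope and the headers are independent, comparing them is a quick way to spot forged “From” headers or blind-copied recipients. The sketch below assumes the client’s half of an SMTP session is available as a list of lines (as in a tcpflow reconstruction) and compares the MAIL FROM/RCPT TO addresses with the From/To headers parsed by Python’s standard email package. It undoes SMTP dot-stuffing but ignores Cc headers, MAIL FROM parameters, and other refinements. The sample session is a hypothetical, shortened transcript with example.com addresses, not data from the evidence.

```python
from email.parser import HeaderParser
from email.utils import getaddresses, parseaddr


def envelope_vs_headers(client_lines):
    """Compare SMTP envelope addresses with the message's From/To headers."""
    env_from, env_rcpt, data, in_data = None, [], [], False
    for line in client_lines:
        if in_data:
            if line == ".":                      # end of DATA
                break
            # Undo dot-stuffing: a leading "." was added by the client
            data.append(line[1:] if line.startswith(".") else line)
        elif line.upper().startswith("MAIL FROM:"):
            env_from = parseaddr(line[len("MAIL FROM:"):])[1]
        elif line.upper().startswith("RCPT TO:"):
            env_rcpt.append(parseaddr(line[len("RCPT TO:"):])[1])
        elif line.strip().upper() == "DATA":
            in_data = True
    headers = HeaderParser().parsestr("\n".join(data))
    hdr_from = parseaddr(headers.get("From", ""))[1]
    hdr_to = [addr for _, addr in getaddresses(headers.get_all("To", []))]
    return {
        "envelope_from": env_from, "header_from": hdr_from,
        "envelope_rcpt": env_rcpt, "header_to": hdr_to,
        "from_matches": env_from == hdr_from,
        "rcpt_matches": sorted(env_rcpt) == sorted(hdr_to),
    }


# Hypothetical session: forged From header plus a recipient missing from To
session = [
    "MAIL FROM: <mallory@example.com>",
    "RCPT TO: <alice@example.com>",
    "RCPT TO: <hidden@example.com>",
    "DATA",
    'From: "The Boss" <boss@example.com>',
    "To: <alice@example.com>",
    "Subject: hypothetical example",
    "",
    "body text",
    ".",
]
result = envelope_vs_headers(session)
print(result["from_matches"], result["rcpt_matches"])
```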

Finally, we see the body of the email, as shown after the “DATA” SMTP command. In this section, we can infer that the user sneakyg33ky@aol.com is representing herself as “Ann Dercover.” The “Subject” of this email was “need a favor,” and the message contained in the body was short but intriguing: “Hey, can you hook me up quick with that fake passport you were taking [sic] about? - Ann.” Could our suspect have been planning an unexpected, illicit trip abroad?

DATA
354 End data with <CR><LF>.<CR><LF>
Message-ID: <00ab01cc14c9$227de600$6b1ea8c0@annlaptop>
From: "Ann Dercover" <sneakyg33ky@aol.com>
To: <inter0pt1c@aol.com>
Subject: need a favor
Date: Tue, 17 May 2011 13:32:17 -0600
MIME-Version: 1.0
Content-Type: multipart/alternative;
.boundary="----=_NextPart_000_00A8_01CC1496.D700DE30"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

This is a multi-part message in MIME format.

------=_NextPart_000_00A8_01CC1496.D700DE30
Content-Type: text/plain;
.charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hey, can you hook me up quick with that fake passport you were taking =
about? - Ann
...
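The trailing “=” characters in the message text are not part of Ann’s message: they are quoted-printable soft line breaks, as announced by the “Content-Transfer-Encoding: quoted-printable” header. Likewise, “=3D” encodes a literal “=” and “=20” a space in the HTML part. A mail client reverses this encoding before display, and Python’s standard quopri module does the same. In this short sketch the sample bytes are taken from the ngrep output earlier in this section, assuming each “..” in ngrep’s output stands for a CR LF pair; the HTML sample is a shortened fragment.

```python
import quopri


def qp_decode(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Undo quoted-printable encoding, then decode the declared character set."""
    return quopri.decodestring(raw).decode(charset)


plain_part = (b"Hey, can you hook me up quick with that fake passport "
              b"you were taking =\r\nabout? - Ann")
html_fragment = b'<META http-equiv=3DContent-Type content=3D"text/html">'

print(qp_decode(plain_part))
print(qp_decode(html_fragment))
```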

In a similar fashion, we can use Wireshark’s “Follow TCP Stream” function to examine the TCP stream containing the second SMTP packet we matched using ngrep, at 2011/05/17 13:34:16.481132. The results are shown in Figure 4-39. In this conversation, we see that the same user, sneakyg33ky@aol.com, sent an email to [email protected], with the subject “lunch next week.” In this email, the author writes:

Sorry-- I can't do lunch next week after all. Heading out of town. =
Another time!

-Ann

Figure 4-39 An SMTP conversation containing the second packet matched as part of our ngrep search.

Apparently the sender’s plans have changed!

4.6.5 SMTP Analysis—TCPFlow

Finally, let’s examine the third matching SMTP packet from our ngrep results, captured at 2011/05/17 13:35:16.962873. We could do this in the same way as the others, using Wireshark’s “Follow TCP Stream,” but instead let’s practice using command-line tools (which typically scale better to large captures).

Recall from our ngrep output that the matched TCP packet was sent from 192.168.30.108:1689 to 64.12.168.40:587 (the conversation that contained this packet was, of course, bidirectional). In the output below, we run tcpflow with the BPF filter “port 1689 and port 587” to reconstruct the TCP flows between those two ports. As you can see, tcpflow reconstructed two unidirectional data streams, one from 192.168.30.108:1689 to 64.12.168.40:587, and the other in the opposite direction.

$ tcpflow -v -r evidence-packet-analysis.pcap  'port 1689 and port 587'
tcpflow[333]: tcpflow version 0.21 by Jeremy Elson <[email protected]>
tcpflow[333]: looking for handler for datalink type 1 for interface evidence-packet-analysis.pcap
tcpflow[333]: found max FDs to be 16 using OPEN_MAX
tcpflow[333]: 192.168.030.108.01689-064.012.168.040.00587: new flow
tcpflow[333]: 064.012.168.040.00587-192.168.030.108.01689: new flow
tcpflow[333]: 064.012.168.040.00587-192.168.030.108.01689: opening new output file
tcpflow[333]: 192.168.030.108.01689-064.012.168.040.00587: opening new output file
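tcpflow names each output file after the flow it contains: source IP and port, a hyphen, then destination IP and port, with every IPv4 octet zero-padded to three digits and every port to five (as in the log above; newer tcpflow releases may use a different naming scheme). When working with many flows, a small helper that converts between these names and ordinary addresses avoids mistakes, such as opening the wrong direction of a connection. A minimal sketch:

```python
import ipaddress
import re

# Flow file name format as written by tcpflow 0.21 (see the log above)
FLOW_NAME = re.compile(r"^(\d{3}(?:\.\d{3}){3})\.(\d{5})-(\d{3}(?:\.\d{3}){3})\.(\d{5})$")


def unpad_ip(padded: str) -> str:
    # int() strips the zero padding first; recent Python versions reject
    # leading zeros in ipaddress.IPv4Address, which then validates the result.
    return str(ipaddress.IPv4Address(".".join(str(int(o)) for o in padded.split("."))))


def pad_ip(addr: str) -> str:
    return ".".join(f"{int(o):03d}" for o in addr.split("."))


def parse_flow_name(name: str):
    """Return (src_ip, src_port, dst_ip, dst_port) for a tcpflow file name."""
    m = FLOW_NAME.match(name)
    if not m:
        raise ValueError(f"not a tcpflow IPv4 flow name: {name!r}")
    src, sport, dst, dport = m.groups()
    return unpad_ip(src), int(sport), unpad_ip(dst), int(dport)


def flow_name(src: str, sport: int, dst: str, dport: int) -> str:
    return f"{pad_ip(src)}.{sport:05d}-{pad_ip(dst)}.{dport:05d}"


def reverse_flow_name(name: str) -> str:
    """Name of the file holding the opposite direction of the same connection."""
    src, sport, dst, dport = parse_flow_name(name)
    return flow_name(dst, dport, src, sport)


outbound = "192.168.030.108.01689-064.012.168.040.00587"
print(parse_flow_name(outbound))
print(reverse_flow_name(outbound))
```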

Let’s examine the reconstructed content sent from 192.168.30.108 (Ann’s computer) to the remote server 64.12.168.40. If the contents are indeed SMTP data as we expect, then we will be most interested in the outbound content from Ann’s computer to the MSA, which would contain the SMTP “MAIL FROM” and “RCPT TO” commands and authentication data, as well as the body of the email.

Viewing the outbound content reconstructed by tcpflow, we see the following:

$ less 192.168.030.108.01689-064.012.168.040.00587
EHLO annlaptop
AUTH LOGIN
c25lYWt5ZzMza3k=
czAwcGVyczNrcjF0
MAIL FROM: <sneakyg33ky@aol.com>
RCPT TO: <[email protected]>
DATA
Message-ID: <00bc01cc14c9$6fd1bc60$6b1ea8c0@annlaptop>
From: "Ann Dercover" <sneakyg33ky@aol.com>
To: <[email protected]>
Subject: rendezvous
Date: Tue, 17 May 2011 13:34:26 -0600
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="----=_NextPart_000_00B8_01CC1497.244B3EB0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

This is a multi-part message in MIME format.

------=_NextPart_000_00B8_01CC1497.244B3EB0
Content-Type: multipart/alternative;
        boundary="----=_NextPart_001_00B9_01CC1497.244B3EB0"

------=_NextPart_001_00B9_01CC1497.244B3EB0
Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi sweetheart! Bring your fake passport and a bathing suit. Address =
attached. love, Ann
------=_NextPart_001_00B9_01CC1497.244B3EB0
Content-Type: text/html;
...

In the SMTP conversation shown above, we see the outbound portion of a communication with a remote SMTP server. The “MAIL FROM” and “RCPT TO” SMTP commands shown above indicate that the email sender was sneakyg33ky@aol.com, and the recipient was [email protected]. In the message headers, we can see that the sender has represented herself once again as “Ann Dercover.” The “Subject” of this email was “rendezvous.” As we saw above, the message in the body of the email was as follows:

Hi sweetheart! Bring your fake passport and a bathing suit. Address =
attached. love, Ann

As the message suggests, there is an attachment to the email. Scrolling further in the tcpflow output, we see the following section:

------=_NextPart_000_00B8_01CC1497.244B3EB0
Content-Type: application/octet-stream;
name="secretrendezvous.docx"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="secretrendezvous.docx"

UEsDBBQABgAIAAAAIQCht/xGcgEAAFIFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
...

Based on the line, “Content-Transfer-Encoding: base64,” it appears that the attachment to the message is Base64-encoded, and the filename of the attachment is “secretrendezvous.docx.”

4.6.6 SMTP Analysis—Attachment File Carving

Now, let’s carve the attachment out of the message body. To do this manually, open the stream dump in the Bless hex editor:

$ bless 192.168.030.108.01689-064.012.168.040.00587

Cut the SMTP and MIME protocol information off the top and bottom, as shown in Figure 4-40. The email contains multiple parts, as indicated by the mail headers:

MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_00B8_01CC1497.244B3EB0"

Figure 4-40 This screenshot shows how we carve Ann’s email attachment out of the “Ann Skips Bail” packet capture. The top image (a) shows us cropping the top part of the attachment, and the bottom image (b) shows us cropping the end of the attachment.

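The attachment of greatest interest is contained in the part labeled:

Content-Type: application/octet-stream;
name="secretrendezvous.docx"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="secretrendezvous.docx"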

Immediately after these lines, there are TWO carriage-return/linefeeds (CRLF). (In hexadecimal, a carriage return is “0x0D” and a linefeed is “0x0A.” Please see Section 10.5.2.1 for additional discussion of file carving using CRLF sequences as markers.) At the end of the message part, we see another sequence of TWO CRLFs. To carve out the attachment, we begin carving immediately after the first set of CRLFs and end just before the second set.⁶⁴ Let’s save the carved file as “evidence-packet-analysis-smtp3-attachment.”

64. N. Borenstein and N. Freed, “MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies,” IETF, June 1992, http://www.ietf.org/rfc/rfc1341.txt.

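The carved attachment still uses DOS-style line endings (CRLF). The carriage returns are not valid Base64 characters, and GNU base64 -d rejects them unless told to ignore non-alphabet characters, so we convert the line endings to Unix style (LF) before decoding. For this purpose, we’ll use the “fromdos” command by Christopher Heng, which is distributed as part of the “tofrodos” Debian package (the -b option keeps a backup copy of the original file):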

$ fromdos -b evidence-packet-analysis-smtp3-attachment

Now, we can decode the file and recreate the original attachment:

$ base64 -d evidence-packet-analysis-smtp3-attachment > secretrendezvous.docx

Check the file type:

$ file secretrendezvous.docx
secretrendezvous.docx: Zip archive data, at least v2.0 to extract

This makes sense, since .docx files are ZIP archives. Let’s take the cryptographic checksums:

$ md5sum secretrendezvous.docx
9049b6d9e26fe878680eb3f28d72d1d2  secretrendezvous.docx

$ sha256sum secretrendezvous.docx
24601c174587be4ddfff0b9e6d598618c6abfcfadb16f7dd6dbd7a24aed6fec8  secretrendezvous.docx
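The manual carving above is worth understanding, but a MIME parser can locate the part boundaries and undo the Base64 encoding for us, which avoids off-by-one mistakes at the CRLF markers. The following is a minimal sketch using Python’s standard email package. It assumes the raw message (the lines after “DATA” up to, but not including, the terminating “.”, with any SMTP dot-stuffing undone) has already been cut out of the tcpflow output. The demonstration message and its tiny payload are hypothetical, not the real attachment. The is_zip check looks for the same signature that file(1) relies on for ZIP data: ZIP archives begin with the bytes “PK\x03\x04”.

```python
import email
import hashlib
from email.message import EmailMessage

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a ZIP file


def extract_attachments(raw_message: bytes):
    """Return one dict per MIME part marked Content-Disposition: attachment."""
    msg = email.message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        payload = part.get_payload(decode=True) or b""   # undoes Base64 / QP
        found.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(payload),
            "is_zip": payload.startswith(ZIP_MAGIC),
            "md5": hashlib.md5(payload).hexdigest(),
            "sha256": hashlib.sha256(payload).hexdigest(),
            "payload": payload,
        })
    return found


# Hypothetical demonstration message; its attachment is a fake ZIP-like blob.
demo = EmailMessage()
demo["From"] = "sender@example.com"
demo["To"] = "recipient@example.com"
demo["Subject"] = "demo"
demo.set_content("Address attached.")
demo.add_attachment(ZIP_MAGIC + bytes(26), maintype="application",
                    subtype="octet-stream", filename="example.docx")

for att in extract_attachments(demo.as_bytes()):
    print(att["filename"], att["content_type"], att["size"], att["is_zip"], att["md5"])
```

Run against the real message, the resulting digests can be compared with the md5sum and sha256sum output above as a cross-check of the manual carve.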

4.6.7 Viewing the Attachment

Naturally, we want to view the contents of the attachment, which has been labeled with a .docx extension. Always remember that mainstream document-viewing programs can modify a file even when they are used only to view it. When viewing a file, make sure that you are working on a copy of the evidence, and not the evidence itself. You can also use filesystem permissions, hardware write blockers, and other mechanisms to reduce the risk that the application will modify the source file.
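A minimal sketch of that routine in Python, assuming a POSIX filesystem: hash the evidence, copy it, confirm that the copy’s hash matches, and clear the write bits on the original. The file names in the demonstration are hypothetical. Permission bits are a safeguard rather than a guarantee (a privileged user or process can still write to the file), which is why re-hashing the original after the examination is worthwhile.

```python
import hashlib
import os
import shutil
import stat
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_working_copy(evidence: str, working_copy: str) -> str:
    """Copy the evidence, verify the copy, make the original read-only; return its hash."""
    original_hash = sha256_file(evidence)
    shutil.copy2(evidence, working_copy)          # copy2 also preserves timestamps
    if sha256_file(working_copy) != original_hash:
        raise RuntimeError("working copy does not match the evidence")
    mode = os.stat(evidence).st_mode
    os.chmod(evidence, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return original_hash


# Demonstration with a throwaway file standing in for the evidence
with tempfile.TemporaryDirectory() as workdir:
    evidence = os.path.join(workdir, "evidence.docx")
    copy = os.path.join(workdir, "evidence-copy.docx")
    with open(evidence, "wb") as f:
        f.write(b"PK\x03\x04 stand-in content")
    digest = make_working_copy(evidence, copy)
    print(digest == sha256_file(copy), bool(os.stat(evidence).st_mode & stat.S_IWUSR))
```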

In this case, we open the attachment using OpenOffice, as shown in Figure 4-41. At the top of the document is the text, “Meet me at the fountain near the rendezvous point. Address below. I’m bringing all the cash.” This is followed by an image of what appears to be a map with an address:

Playa del Carmen
1 Av Constituyentes 1 Calle 10 x la 5ta
Avenida
Playa del Carmen, 77780, Mexico
01 984 873 4000

Taking things one step further, we can carve out the image file embedded in the document. This can be useful for correlating with evidence from other sources, such as hard drive analysis. Carving embedded images is easy with .docx files. First, let’s unzip a working copy of the file, secretrendezvous-copy.docx:

$ unzip secretrendezvous-copy.docx
Archive:  secretrendezvous-copy.docx
  inflating: [Content_Types].xml
  inflating: _rels/.rels
  inflating: word/_rels/document.xml.rels
  inflating: word/document.xml
extracting: word/media/image1.png
  inflating: word/theme/theme1.xml
  inflating: word/settings.xml
  inflating: word/webSettings.xml
  inflating: docProps/core.xml
  inflating: word/styles.xml
  inflating: word/fontTable.xml
  inflating: docProps/app.xml

As you can see, there appears to be only one image file in this document, which is labeled with the extension “.png”. Take the cryptographic checksums of the embedded image file:

$ md5sum word/media/image1.png
aadeace50997b1ba24b09ac2ef1940b7  word/media/image1.png

$ sha256sum word/media/image1.png
7dc8b5a245a0ba9e7a456e718316c4fb0ab5ae59d34833f415a429a5b8a6b437  word/media/image1.png
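Extracting an untrusted archive onto the examination system is not strictly necessary: Python’s zipfile module can read members straight out of the .docx, which also sidesteps archive members with malicious paths such as “../”. This sketch, which loosely mirrors what docxtract does later in this section, hashes every file under word/media/. The demonstration archive is a hypothetical stand-in, not the real attachment; with the real file (read in binary mode), the digests should match the md5sum and sha256sum output above.

```python
import hashlib
import io
import zipfile


def embedded_media(docx_bytes: bytes) -> dict:
    """Map each member under word/media/ to (size, md5, sha256), without writing to disk."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for info in archive.infolist():
            if info.filename.startswith("word/media/") and not info.is_dir():
                data = archive.read(info)
                results[info.filename] = (len(data),
                                          hashlib.md5(data).hexdigest(),
                                          hashlib.sha256(data).hexdigest())
    return results


# Hypothetical stand-in for a .docx: a ZIP with a document part and one "image"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document/>")
    z.writestr("word/media/image1.png", b"\x89PNG\r\n\x1a\n stand-in")

for name, (size, md5, sha256) in embedded_media(buf.getvalue()).items():
    print(name, size, md5)
```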

Open a copy of the image to confirm that it is the one we are looking for, as shown in Figure 4-42.

4.6.8 Finding Ann the Easy Way

That was a lot of work! Wouldn’t it be nice if there were a program that could analyze SMTP traffic and carve files out for us automatically?

4.6.8.1 NetworkMiner

As we’ve discussed, NetworkMiner is a great multipurpose tool for analyzing network traffic.⁶⁵ When the “Ann Skips Bail” (Puzzle #2) SMTP contest was released on ForensicsContest.com, Erik Hjelmvik, the author of NetworkMiner, entered NetworkMiner as his submission. To solve the puzzle, he extended NetworkMiner to parse SMTP traffic, and also added a new “Messages” tab. This was a great example of how members of the network forensics community can step up and contribute tools to solve challenges that all of us face.

65. “NETRESEC NetworkMiner—The Network Forensics Analysis Tool.”

In Figure 4-43, you can see a screenshot of NetworkMiner in action. We simply dragged and dropped the evidence file (evidence-packet-analysis.pcap) into NetworkMiner, and it automatically interpreted the results. The first tab, “Hosts,” lists all of the IP addresses of the hosts that transmitted IP traffic in the packet capture.

Figure 4-43 Here is a screenshot of NetworkMiner in action. We simply dragged and dropped the evidence file into NetworkMiner, and it automatically interpreted the results.

The “Messages” tab, which Erik added specifically to solve this puzzle (lucky us!), is shown in Figure 4-44. As you can see, it automatically parses SMTP headers and displays them along with the body of each message.

Figure 4-44 Erik added the “Messages” tab for the “Ann Skips Bail” SMTP puzzle on ForensicsContest.com. Notice that NetworkMiner automatically parses headers and displays them along with the body of each SMTP message. It can also show other types of messages, such as IMs.

In NetworkMiner’s “Files” tab, you can see all the files that NetworkMiner carved out of the packet capture. In Figure 4-45 you can see that NetworkMiner identified the “secretrendezvous.docx” file, and by right-clicking on it, you can open it up and view the contents.

Figure 4-45 The “Files” tab displays files that NetworkMiner automatically carved out of the packet capture.

NetworkMiner even automatically recovers the client’s authentication credentials. As shown in Figure 4-46, it automatically decodes the Base64-encoded SMTP Authentication credentials and lists them in the “Credentials” tab.

Figure 4-46 NetworkMiner automatically extracts SMTP Authentication credentials, and even Base64 decodes them for you.

4.6.8.2 smtpdump and docxtract

In this example, we use “smtpdump” to automatically analyze the evidence file, print SMTP authentication information, extract the attachments, and print corresponding cryptographic checksums.

The command below tells smtpdump to analyze SMTP flow #3 in the packet capture (-f 3), extract attachments (-x), print the MD5sum (-m), and print authentication data (-A).

$ smtpdump -f 3 -x -m -A -r evidence-packet-analysis.pcap
[3] 192.168.30.108:1689 => 64.12.168.40:587
  === Authentication infos ===
    Found LOGIN method
    Username: sneakyg33ky
    Password: s00pers3kr1t

  === Attachments infos ===

    Type: multipart/alternative
    Type: application/octet-stream
    Saving file to disk: secretrendezvous.docx

    File: secretrendezvous.docx (MD5: 0x9049b6d9e26fe878680eb3f28d72d1d2)

Subsequently, we can use “docxtract” to extract all images from the carved .docx attachment (-x -i), and print the corresponding cryptographic checksums (-m).

$ docxtract -x -i -m secretrendezvous.docx
Extracting: image1.png (194124 bytes)
MD5: aadeace50997b1ba24b09ac2ef1940b7

4.6.8.3 findsmtpinfo.py

We can use “findsmtpinfo.py” to:

• Print SMTP authentication information

• Extract all messages from the packet capture

• Extract all attachments from the messages

• Print the MD5 sums for each of the attachments

• Extract the files embedded within the .docx file

• Print the MD5 sums for each of the embedded files

The “findsmtpinfo.py” tool produces these results automatically and creates a series of reports suitable for the appendix of a professional forensics report. This is a nice example of how a small, sharp tool can be used to extract evidence very efficiently, as shown below:

$ findsmtpinfo.py -p evidence-packet-analysis.pcap
Found SMTP Session data
----------------------------------------
Report: 192.168.030.108.01687-064.012.168.040.00587
----------------------------------------

Found SMTP Session data
SMTP AUTH Login: sneakyg33ky
SMTP AUTH Password: s00pers3kr1t
SMTP MAIL FROM: <sneakyg33ky@aol.com>
SMTP RCPT TO: <[email protected]>
Found email Messages
- Writing to file: ./report/messages/1/192.168.030.108.01687-064.012.168.040.00587.msg
- MD5 of msg: 2141fcb61af1fd3985f18c3ca2b985b2
   - Found Attachment
     - Writing to filename: ./report/messages/1/part-001.ksh
     - Type of Attachement: text/plain
     - MDS of Attachement: 4bd7b649e5f2b3fd18fd3e3cd40f1fae
   - Found Attachment
     - Writing to filename: ./report/messages/1/part-001.html
     - Type of Attachement: text/html
     - MDS of Attachement: 1cf5b25642406fa49b1912cccc93cde2
----------------------------------------
Report: 192.168.030.108.01688-205.188.058.010.00143
----------------------------------------
Found SMTP Session data
...

The “findsmtpinfo.py” tool even automatically unzips the attached .docx file, extracts each of the embedded files, and prints MD5sums for each, as shown below:

...
Found SMTP Session data
SMTP AUTH Login: sneakyg33ky
SMTP AUTH Password: s00pers3kr1t
SMTP MAIL FROM: <sneakyg33ky@aol.com>
SMTP RCPT TO: <[email protected]>
Found email Messages
- Writing to file: ./report/messages/2/192.168.030.108.01689-064.012.168.040.00587.msg
- MD5 of msg: 94d957fda87d9114016e65e5afc12cef
   - Found Attachment
     - Writing to filename: ./report/messages/2/part-001.ksh
     - Type of Attachement: text/plain
     - MDS of Attachement: ba2c98f65f3f678b6a71570adcf362f4
   - Found Attachment
     - Writing to filename: ./report/messages/2/part-001.html
     - Type of Attachement: text/html
     - MDS of Attachement: d07c3b721fed36a725c01e4827c1a563
   - Found Attachment
     - Writing to filename: ./report/messages/2/secretrendezvous.docx
     - Type of Attachement: application/octet-stream
     - MDS of Attachement: 9049b6d9e26fe878680eb3f28d72d1d2
       - ZIP Archive attachment extracting
         - Found file
           - Writing to filename: ./report/messages/2/secretrendezvous.docx.unzipped/[Content_Types].xml
           - Type of file: application/xml
           - MDS of File: 8af852cc1775236bac8e8495564a4ef2
         - Found file
           - Writing to filename: ./report/messages/2/secretrendezvous.docx.unzipped/_rels/.rels
           - Type of file: None
           - MDS of File: 77bf61733a633ea617a4db76ef769a4d
         - Found file
           - Writing to filename: ./report/messages/2/secretrendezvous.docx.unzipped/word/_rels/document.xml.rels
           - Type of file: None
           - MDS of File: 1445296e2c18e6f42da2f6c4455d6a6c
         - Found file
           - Writing to filename: ./report/messages/2/secretrendezvous.docx.unzipped/word/document.xml
           - Type of file: application/xml
           - MDS of File: ac21289076d00f81ec885509e27b6d2f
         - Found file
           - Writing to filename: ./report/messages/2/secretrendezvous.docx.unzipped/word/media/image1.png
           - Type of file: image/png
           - MDS of File: aadeace50997b1ba24b09ac2ef1940b7
...

4.6.9 Timeline

Based on our analysis so far, we can create a timeline of events. As always, this is simply a working hypothesis based on the available evidence, and can change with the introduction of new evidence.

The timeline below summarizes the events of interest that we have identified so far. Note that the times are those listed within the packet capture, as recorded by the sniffing system. (The dates and times in the “Date” headers of the emails we retrieved were set by the system under investigation, and should be considered less reliable than the timestamps recorded by our packet-capturing system.)

All times listed below occurred on May 17, 2011.

• 13:32:01.419886—Packet capture begins

• 13:32:03.166396—First DHCP Request from 00:21:70:4d:4f:ae (Ann’s computer)

• 13:32:03.167145—DHCP ACK from 192.168.30.10 to Ann’s computer, assigning 00:21:70:4d:4f:ae the IP address 192.168.1.108 with a 1-hour lease time.

• 13:33:05.834649–13:33:07.847758—First SMTP conversation. Email sent from Ann’s computer with sender [email protected] and recipient [email protected].

• 13:34:15.110657–13:34:17.204721—Second SMTP conversation. Email sent from Ann’s computer with sender [email protected] and recipient [email protected].

• 13:35:15.504697–13:35:23.263802—Third SMTP conversation. Email sent from Ann’s computer with sender [email protected] and recipient [email protected].

• 13:35:23.263802—Packet capture ends
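Because every timestamp in this timeline comes from the same capture clock, arithmetic on them is meaningful. The sketch below computes how long each SMTP conversation lasted. It assumes the times are transcribed as HH:MM:SS.ffffff strings; no time zone is recorded here, so it uses naive datetimes, and the helper names are hypothetical.

```python
from datetime import datetime

# Start and end times copied from the timeline above (capture-system clock).
CAPTURE_DATE = "2011-05-17"
SMTP_CONVERSATIONS = [
    ("First SMTP conversation", "13:33:05.834649", "13:33:07.847758"),
    ("Second SMTP conversation", "13:34:15.110657", "13:34:17.204721"),
    ("Third SMTP conversation", "13:35:15.504697", "13:35:23.263802"),
]

def parse_ts(hms, day=CAPTURE_DATE):
    """Turn an HH:MM:SS.ffffff capture time into a datetime on the capture date."""
    return datetime.strptime(f"{day} {hms}", "%Y-%m-%d %H:%M:%S.%f")

def conversation_durations(conversations):
    """Return (label, start datetime, duration) tuples, sorted by start time."""
    rows = [(label, parse_ts(start), parse_ts(end) - parse_ts(start))
            for label, start, end in conversations]
    return sorted(rows, key=lambda row: row[1])

for label, start, duration in conversation_durations(SMTP_CONVERSATIONS):
    print(f"{start:%H:%M:%S.%f}  {label}: {duration.total_seconds():.6f} s")
# The three durations work out to 2.013109 s, 2.094064 s, and 7.759105 s.
```

The third conversation, the one that carried the .docx attachment, took nearly four times as long as the other two. That is consistent with a larger message, although on its own it is not proof of one.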

4.6.10 Theory of the Case

Now that we have put together a timeline of events, let’s summarize our theory of the case. Again, this is a working hypothesis strongly supported by the evidence, references, and experience:

• Ann Dercover connected her laptop (“ann-laptop”) to the network on May 17, 2011, at 13:32:03. Her computer was probably manufactured by Dell.

• At 13:33:05, Ann sent email from her AOL account, [email protected], to [email protected], asking the recipient, “Hey, can you hook me up quick with that fake passport you were talking about?”

• At 13:34:15, Ann sent email from her AOL account, [email protected], to [email protected], informing the recipient, “Sorry—I can’t do lunch next week after all. Heading out of town. Another time!”

• At 13:35:15, Ann sent email from her AOL account, [email protected], to [email protected], with the message, “Hi sweetheart! Bring your fake passport and a bathing suit. Address attached. love, Ann.” The email had a .docx attachment that contained an address and a map.
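The statement that Ann's laptop was “probably manufactured by Dell” rests on the first three octets of its MAC address (00:21:70), the Organizationally Unique Identifier (OUI) that identifies the vendor. The minimal sketch below shows such a lookup. Its one-entry table is hypothetical and simply encodes the case study's own attribution, so a real investigation should consult the IEEE OUI registry. It also checks the locally administered bit, because a MAC address can be changed in software, which is one reason the attribution is only “probable.”

```python
# Hypothetical one-entry table: it only encodes this case study's attribution
# of the 00:21:70 prefix to Dell. Check real prefixes against the IEEE registry.
OUI_VENDORS = {"00:21:70": "Dell"}

def oui_prefix(mac):
    """Normalize a MAC address (any common separator) and return its OUI."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 6, 2))

def is_locally_administered(mac):
    """True if the U/L bit (0x02 in the first octet) is set, so no vendor OUI applies."""
    return bool(int(oui_prefix(mac)[:2], 16) & 0x02)

def lookup_vendor(mac, table=OUI_VENDORS):
    if is_locally_administered(mac):
        return "locally administered (no vendor)"
    return table.get(oui_prefix(mac), "unknown")

print(lookup_vendor("00:21:70:4d:4f:ae"))  # prints Dell
```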

4.6.11 Response to Challenge Questions

Now, let’s answer the investigative questions posed to us at the beginning of the case.

• Provide any online aliases or addresses and corresponding account credentials that may be used by the suspect under investigation.

Based on our SMTP analysis, there are indications that Ann Dercover uses the email address [email protected], and that her password is “s00pers3kr1t.”
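Credentials can surface in an SMTP capture because some authentication mechanisms only Base64-encode them. The sketch below assumes the client used AUTH LOGIN (one of several SMTP AUTH mechanisms), sent no initial response, and that the client and server lines have already been separated into (speaker, line) pairs. The function name and the sample address are hypothetical; only the password comes from this case.

```python
import base64

def decode_auth_login(exchange):
    """Recover credentials from an SMTP AUTH LOGIN exchange.

    `exchange` is a list of (speaker, line) tuples, where speaker is "C"
    (client) or "S" (server) and line has its CRLF stripped. Assumes a bare
    "AUTH LOGIN" and that each 334 challenge is answered on the next client line.
    """
    creds = {}
    prompt = None      # decoded text of the most recent 334 challenge
    in_auth = False
    for speaker, line in exchange:
        if speaker == "C" and line.upper() == "AUTH LOGIN":
            in_auth = True
        elif in_auth and speaker == "S" and line.startswith("334 "):
            # Server prompts are Base64 too, e.g. "VXNlcm5hbWU6" -> "Username:".
            prompt = base64.b64decode(line[4:]).decode("utf-8", "replace")
        elif in_auth and speaker == "C" and prompt is not None:
            key = prompt.rstrip(":").strip().lower()
            creds[key] = base64.b64decode(line).decode("utf-8", "replace")
            prompt = None
        elif in_auth and speaker == "S" and line[:3] in ("235", "535"):
            # 235 = authentication succeeded, 535 = credentials rejected.
            creds["result"] = "accepted" if line.startswith("235") else "rejected"
            in_auth = False
    return creds
```

The server's 235 or 535 reply matters as much as the decoded strings: it indicates whether the captured password was actually accepted.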

• Who did Ann communicate with? Provide a list of email addresses and any other identifying information.

We have seen that [email protected] sent emails to the following recipients:

– [email protected]

• Extract any transcripts of Ann’s conversations and present them to investigators.

Here is a quick summary of Ann’s conversations, sent via SMTP:

– SMTP Message #1
Sender: [email protected]
Recipient: [email protected]
Date [beginning of SMTP conversation]: May 17, 2011 13:33:05
Subject: need a favor
Message [formatting removed]: Hey, can you hook me up quick with that fake passport you were talking about? - Ann
Attachments of interest: None

– SMTP Message #2
Sender: [email protected]
Recipient: [email protected]
Date [beginning of SMTP conversation]: May 17, 2011 13:34:15
Subject: lunch next week
Message [formatting removed]: Sorry—I can’t do lunch next week after all. Heading out of town. Another time! - Ann
Attachments of interest: None

– SMTP Message #3
Sender: [email protected]
Recipient: [email protected]
Date [beginning of SMTP conversation]: May 17, 2011 13:35:15
Subject: rendezvous
Message [formatting removed]: Hi sweetheart! Bring your fake passport and a bathing suit. Address attached. love, Ann
Attachments of interest: secretrendezvous.docx
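Summaries like these can be produced from each message's DATA payload with Python's standard email package. The sketch below assumes the payload has already been extracted from the SMTP stream, with the terminating "." line removed and dot-stuffing undone. The function name and output fields are hypothetical choices for illustration.

```python
from email import policy
from email.parser import BytesParser

def summarize_message(raw: bytes) -> dict:
    """Summarize one RFC 5322 message: key headers, body text, attachment names."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "date": str(msg.get("Date", "")),  # set by the sender's system; less reliable
        "subject": str(msg.get("Subject", "")),
        "text": body.get_content().strip() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```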

• If Ann transferred or received any files of interest, recover them.

We recovered one Office Open XML Document (.docx) file, attached to Ann’s email to [email protected].

The MD5 checksum of the .docx file was:
9049b6d9e26fe878680eb3f28d72d1d2

The SHA256 checksum was:
24601c174587be4ddfff0b9e6d598618c6abfcfadb16f7dd6dbd7a24aed6fec8
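Both checksums can be computed in a single pass over the recovered file. The minimal sketch below reads the file in chunks so that large carved files do not need to fit in memory. The function name and chunk size are arbitrary choices.

```python
import hashlib

def file_digests(path, chunk_size=1 << 20):
    """Return (MD5, SHA-256) hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

If a second carving pass, or a different tool, produces digests that match the values recorded above, the two copies are byte-for-byte identical. Recording SHA-256 alongside MD5 helps because MD5 is no longer collision-resistant.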

The document began with the text, “Meet me at the fountain near the rendezvous point. Address below. I’m bringing all the cash.” This was followed by a PNG image of a map with an address.
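Opening the document in a word processor is the simplest way to view it. The text and embedded images can also be pulled out directly, because the body text lives in word/document.xml and images under word/media/. The sketch below is a minimal text extractor under those assumptions. It collects only <w:t> run text and ignores layout, and its function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the w: prefix in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text_and_media(source):
    """Return (paragraph texts, embedded media names) from a .docx file.

    `source` is a path or binary file-like object. For hostile input,
    consider a hardened XML parser such as defusedxml.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        media = [name for name in zf.namelist() if name.startswith("word/media/")]
    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph's text may be split across several runs (<w:r><w:t>).
        text = "".join(run.text or "" for run in para.iter(W + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs, media
```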

• Are there any indications of Ann’s physical whereabouts? If so, provide supporting evidence.

The document that Ann sent to [email protected] indicates that she would like to meet him at the following address:

Playa del Carmen
1 Av Constituyentes 1 Calle 10 x la 5ta Avenida
Playa del Carmen, 77780, Mexico
01 984 873 4000

Of course, there is no guarantee that Ann and/or the email recipient ever traveled to this location. Perhaps Ann was trying to throw us off her trail!

4.6.12 Next Steps

By analyzing a short packet capture from May 17, 2011, we’ve retrieved significant information about Ann’s communications, including an email address that she likely uses for personal correspondence and its password, the addresses of three correspondents, and an attachment that may contain clues to her physical whereabouts. What are our next steps? Here are some possibilities:

• Further Analysis of the Packet Capture: Recall that this packet capture also contains IMAP and HTTP traffic. Each of these protocols may include additional communications. For example, we can analyze the IMAP traffic to recover more email correspondence. The HTTP traffic might contain web-based email account activity, or clues to Ann’s interests, perhaps even travel-related information such as searches for hotels, restaurants, or plane fares.

• Email Account Monitoring: If there is sufficient evidence, and relevant laws and/or ISP policies allow, it may be possible to monitor activity relating to a suspect’s email account on an ongoing basis. This may include a history of connections from client IP addresses, which is very useful information if you are trying to track down the suspect’s physical location. It may also include message header information such as recipient, date, and time, or in some cases, the actual contents of the emails. Even if you have the suspect’s password, be absolutely sure to consult with legal counsel before using a suspect’s credentials to gain access to their account. In general, when possible, it is best to work with the ISP and obtain information through their standard channels.
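For the first of these next steps, a quick way to gauge how much IMAP and HTTP traffic the capture holds is to tally TCP packets by well-known port. The sketch below does this with only Python's standard library. It assumes a classic libpcap file (not pcapng) with Ethernet framing and untagged IPv4. The port table is an assumption, since services can run on nonstandard ports, and Wireshark's dissectors and display filters identify protocols far more reliably. Function names are hypothetical.

```python
import struct
from collections import Counter

# Assumed default TCP ports; port numbers are only a heuristic.
PORT_LABELS = {25: "SMTP", 80: "HTTP", 143: "IMAP"}

# Classic pcap magic numbers as they appear on disk, mapped to struct byte order.
PCAP_MAGIC = {
    b"\xd4\xc3\xb2\xa1": "<",  # microsecond timestamps, little-endian
    b"\xa1\xb2\xc3\xd4": ">",  # microsecond timestamps, big-endian
    b"\x4d\x3c\xb2\xa1": "<",  # nanosecond timestamps, little-endian
    b"\xa1\xb2\x3c\x4d": ">",  # nanosecond timestamps, big-endian
}

def classify_frame(frame):
    """Label an Ethernet frame by TCP port, or return None (IP fragments not handled)."""
    if len(frame) < 14 or frame[12:14] != b"\x08\x00":
        return None  # not untagged IPv4 (VLAN tags and IPv6 are skipped)
    ip = frame[14:]
    if len(ip) < 20 or ip[9] != 6:
        return None  # not TCP
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if len(ip) < ihl + 4:
        return None  # truncated before the TCP ports
    sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
    return PORT_LABELS.get(dport) or PORT_LABELS.get(sport)

def tcp_protocol_counts(path):
    """Count TCP packets per labeled protocol in a classic pcap capture."""
    counts = Counter()
    with open(path, "rb") as f:
        endian = PCAP_MAGIC.get(f.read(4))
        if endian is None:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        header = f.read(20)  # version, thiszone, sigfigs, snaplen, network
        (linktype,) = struct.unpack(endian + "I", header[16:20])
        if linktype != 1:
            raise ValueError("only Ethernet captures (linktype 1) are handled")
        while True:
            record = f.read(16)  # ts_sec, ts_frac, incl_len, orig_len
            if len(record) < 16:
                break
            incl_len = struct.unpack(endian + "IIII", record)[2]
            label = classify_frame(f.read(incl_len))
            if label:
                counts[label] += 1
    return counts

# Usage (hypothetical filename): print(tcp_protocol_counts("evidence.pcap"))
```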

