Chapter 11. Document Forensics

In This Chapter

  • Finding data about data

  • Finding the CAM

  • Where documents are found

What a document says about the person who created it is almost as important as what the document's intended purpose appears to be. You have a document that has smoking gun evidence, but how do you really know that the suspect wrote the document and when it was written? Just extracting a document and intending to use it as evidence of a crime aren't enough to do a complete analysis. You must link the evidence to the suspect in some way, and that's where document forensics and the use of metadata come into play.

Metadata is simply data about data. Because the computer field is huge, metadata necessarily differs from one computing field or domain to another. Document metadata, for example, is much different from Web page metadata, but both describe in some form the characteristics of the data they represent. One piece of metadata for a digital photo, for instance, is the time stamp indicating when the photo was taken.

When you're doing an investigation, one of the classic questions any television investigator would ask is, "Where were you on January 2, 2008, and can you prove it?" Computer forensics and, by association, document forensics have the same goal as your regular physical forensic counterpart — computer forensics wants the truth, but needs hard digital evidence to prove that truth. The key with computers is not only knowing the right question to ask, but how a computer answers the question. Although a computer-generated document or file cannot literally speak, what the document or file has to say about who, what, when, where, why, and how is often much more credible than any human witness testimony. This chapter is all about finding the clues that a document might be hiding that can tell you whether the human and computer versions of the story are the same.

Finding Evidential Material in Documents: Metadata

Documents are arguably one of the most important areas where metadata is found. The rapidly growing field of e-discovery has figuratively found a goldmine with metadata and the use of it to win court cases. It is no understatement to say that in addition to attorneys wanting to find the memorandum, they want to also find the metadata to prove who wrote the memorandum and when.

The following list describes the basic types of metadata found in a typical word processing document:

  • Author: Regardless of whether this information comes from the operating system or from the installation of the word processing software, a name is embedded as part of the document for all to see.

  • Organization: This information is usually acquired by the word processing software from the same sources as the author information. If information is listed during the installation of the operating system or the word processing software, chances are good that it's embedded in the document.

  • Revisions: As part of creating the revision log, the previous authors can be listed as well as the path where the file was stored.

  • Previous authors: Documents often have a history of users who worked on the document.

  • Template: This piece of data shows which template is embedded within the document.

  • Computer name: This name connects the document with the computer on which it was typed.

  • Hard disk: This data often includes the hard drive name and the path where the file was located.

  • Network server: An extension of the hard drive information — if a file is stored on a network server, the metadata reflects the network path name.

  • Time: This type of metadata often indicates how long the document was open for editing.

  • Deleted text: Some metadata logs text that has been deleted.

  • Visual Basic objects: Objects used and created by Visual Basic are often part of a macro execution and are saved and hidden from the user.

  • Time stamps: This type of data is usually based on the operating system time stamp and covers the created, accessed, and modified time stamps (CAM). We discuss CAM facts in more detail later in this chapter, in the section "Honing In on CAM (Create, Access, Modify) Facts."

  • Printed: Metadata often tells you when the document was last printed.

Although Microsoft products are the focus of much metadata extraction, metadata can be found in almost all application software. You can find metadata in Adobe PDF files, multimedia files, Web pages, databases, and even geographic software applications. The type and amount of metadata you find varies depending on the application and on how thorough the user is at entering his life story on a metadata form (by filling in every empty box with personal information). That information gives you the first link between the document and the user.

As attorneys request more and more data as part of the e-discovery process, organizations have begun to clean their documents of any metadata that could possibly prove embarrassing. The issue has reached such proportions that Microsoft has published methods to remove metadata from its documents for organizations that feel they need to wipe their documents clean.

Metadata located within the document falls into two distinct areas: viewable by the user and not viewable by the user. If you can't view information, you have to extract it.

Viewing metadata

This list describes the information you can find when you're looking at user viewable metadata:

  • Basic user information: A typical Microsoft Word document populates various fields in the Properties section that generally show basic user information as it relates to the document. Figure 11-1 shows general information about a document.

  • Document statistics: Statistical information that's often useful to determine timelines and corroborate whereabouts is also often found in the Properties dialog box depending on which tab (General, Summary, Statistics) you choose. Figure 11-2 shows the statistics of the document itself, such as how many pages or paragraphs the document has, although you're often seeking the other information, such as time stamps and path information.

Figure 11.1. The Microsoft Word Properties dialog box.

Figure 11.2. The Microsoft Word Properties dialog box with file statistics.

Extracting metadata

When you're extracting metadata, you have to use special software tools, such as Metadata Analyzer (www.smartpctools.com) or iScrub (www.esqinc.com), to extract the data that you can't easily see. These tools can analyze the document at a binary level for revision logs, Visual Basic objects, or deleted text that might still be present in the document. Figure 11-3 shows the information that Metadata Analyzer can extract. Esquire's iScrub is a bit more powerful and can even find drafting history to see changes made to a document.

Figure 11.3. The Metadata Analyzer main screen.
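
To give a rough feel for what such tools do, here is a minimal sketch for one newer format only. It assumes the document is a Word .docx file, which is a ZIP package that keeps its core properties in a part named docProps/core.xml. The older binary Word format stores its metadata differently and isn't handled here. The function name read_core_properties and the FIELDS table are hypothetical names chosen for this illustration, and the sketch is no substitute for a dedicated forensic tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Metadata label -> element path inside docProps/core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
}


def read_core_properties(docx_file):
    """Return the core metadata fields found in a .docx package (path or file object)."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, path in FIELDS.items():
        element = root.find(path, NS)
        # Skip fields that are absent or empty rather than inventing values.
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Always run a script like this against a working copy of the evidence, never against the original file.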

Note

The highly publicized arrest of Dennis Rader, also known as the BTK Killer, is a classic example of using metadata to find evidence or information in documents. Beginning in 1974, the self-nicknamed serial killer taunted police and the media with a series of letters detailing his murders. After 30 years and numerous letters, Rader gave the police their first major break in the case when he mailed a purple floppy disk along with several other articles to a local television station in 2005. Unbeknownst to him, a document he had deleted had the name Dennis embedded in the metadata, and another area of metadata listed the church where he was president of the congregation council. The police quickly put together the pieces of this circumstantial evidence to gather hard DNA evidence linking Rader to several BTK murders.

Honing In on CAM (Create, Access, Modify) Facts

The create, access, and modify (CAM) time stamps often help you track a document and establish timelines. CAM information is stored in different places, such as directory entries or inodes, depending on the operating system. With CAM metadata, you can build a timeline of a suspect's whereabouts, reconstruct a file's history, or even track a file across a network, and that timeline is often part of the circumstantial evidence that helps support other aspects of a case.

You need to understand exactly what these time stamps really mean:

  • Create: Shows the date and time that the file was created on that particular storage media. Keep in mind that this time stamp changes whenever a file is copied to new media — even within the same storage device.

  • Access: Specifies the last time the file was opened or accessed, but not changed in any form.

  • Modify: Indicates the date and time that a file was modified or changed. On files that have been copied to new media, the modified time stamps might be older than the created time stamps. The reason is that the file in its original location had been modified before it was copied to the new location and thus created at a later date in the new location.

Figure 11-4 shows the CAM information for a word processing file. This application also noted the last time the file was printed, which can also be very helpful.

Figure 11.4. CAM entries for a document.
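
The three time stamps described above can also be read programmatically. The following is a minimal sketch that uses Python's standard os.stat call. The function names cam_times and looks_copied are hypothetical. Two assumptions matter here: not every platform exposes a creation time (on Linux, st_ctime is the inode-change time, so the sketch reports no creation time there), and reading a file on a live system can itself alter the access time, so run code like this only against a forensic copy.

```python
import os
from datetime import datetime, timezone


def cam_times(path):
    """Return created/accessed/modified times for a file as UTC datetimes.

    'created' is None when the platform does not expose a creation time.
    """
    info = os.stat(path)

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    # st_birthtime is available on some platforms only. On Windows, st_ctime
    # has traditionally held the creation time; on Linux it is the
    # inode-change time, so it is NOT treated as "created" here.
    birth = getattr(info, "st_birthtime", None)
    if birth is None and os.name == "nt":
        birth = info.st_ctime
    return {
        "created": to_utc(birth) if birth is not None else None,
        "accessed": to_utc(info.st_atime),
        "modified": to_utc(info.st_mtime),
    }


def looks_copied(times):
    """True when the modified time predates the created time (a copy indicator)."""
    created, modified = times["created"], times["modified"]
    return created is not None and modified < created
```

A modified time that is older than the created time matches the copy scenario described in the Modify bullet. It suggests a copy but doesn't prove one.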

The dates and times associated with the CAM information are from the operating system clock, so if the clock is wrong, your time stamps are wrong too. A wide-ranging debate on the issue of time and date settings for computers takes place in the forensic community because crimes don't occur in just one time zone. The central issue isn't whether the time and date are accurate on the computer but, rather, whether the date and time are local or Zulu based, for example.

Note

The world is divided into time zones that are denoted by letters of the English alphabet; time zone Z (Zulu) indicates the clock at Greenwich, England, and corresponds to Coordinated Universal Time (UTC). Aviation has long used Zulu time as the standard so that no matter which time zone you're in, you know what time it is. The issue you run into is that a file might be created in Hong Kong, and then transmitted to London, and then copied to New York, all within a one-second period. This range of local time zones tends to be confusing unless you know exactly what you're looking at; if you're using Zulu time, it's extremely easy to figure out the timeline of the file copy.

For most people, using local or Zulu time is a semantic argument. For investigators, however, the issue is one of accuracy and reliability of the time and date stamps. Essentially, if your case is a local case with no crossing of time zones, using a third-party clock to check the accuracy of your suspect computer will usually suffice. If, on the other hand, you have an international case, using Zulu time might be the best strategy because you can track the file times more easily by using time zone Z as your baseline.
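
The benefit of a Zulu baseline can be shown with a short sketch. The sightings below are invented purely for illustration, and the function name to_zulu is hypothetical. The sketch assumes that you already know the UTC offset that each computer's clock was using, which in a real case you would have to establish and document yourself.

```python
from datetime import datetime, timedelta, timezone


def to_zulu(local_time, utc_offset_hours):
    """Convert a naive local time stamp to UTC (Zulu), given the clock's UTC offset."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Hypothetical sightings of one file: (location, local time stamp, UTC offset in hours).
sightings = [
    ("New York", datetime(2008, 1, 2, 4, 0, 2), -5),
    ("Hong Kong", datetime(2008, 1, 2, 17, 0, 0), 8),
    ("London", datetime(2008, 1, 2, 9, 0, 1), 0),
]

# Sorting by the normalized time recovers the order in which the file travelled.
timeline = sorted((to_zulu(stamp, offset), place) for place, stamp, offset in sightings)
for when, place in timeline:
    print(when.strftime("%Y-%m-%d %H:%M:%SZ"), place)
```

The local time stamps of 17:00, 09:00, and 04:00 look unrelated, but after normalization they fall within two seconds of each other and sort as Hong Kong, London, and then New York.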

Note

In all cases, choose a method that standardizes your procedures for time and date checking, and stick to that method for the duration of your case.

Because the CAM information has become critical to computer forensic cases, criminals have begun to scramble this data to hide or camouflage their digital footprints. Several software packages scramble the CAM data fields with random numbers or with random dates and times, or they just plain eliminate them. This turn of events has made the computer forensic field a bit more challenging because you have to rely not on the time-and-date stamps of the files themselves but, rather, on the time-and-date stamps of secondary sources, such as e-mail servers or other trusted points, that a file might have passed. The Metasploit project (www.metasploit.com) has some helpful information on the subject of antiforensics and a test project in the works.

Discovering Documents

For most users, the place to save documents from day to day is usually in the My Documents folder on their local computers. Most people don't even give a second thought to where they save their files — as long as they can find them. Unfortunately for forensic investigators, documents can be stored in an endless number of places, and even hidden in plain sight. Even experienced investigators can miss these clever hiding places from time to time.

Luring documents out of local storage

The first place to look for documents is the application in which they were created. Most application software keeps a list of recent documents that tells you in which folder or directory these recent files were last saved. Figure 11-5 shows a Microsoft Word menu that lists the most recent files used in this application. The files' paths are listed, which makes it much easier for you to find the place where the files are saved. You don't have to hunt for the files over an entire storage device, and you gain a good idea of where other files might be located. This method also has the advantage of rapidly pointing out whether you also need to look at external storage devices.

Figure 11.5. Microsoft Word drop-down menu with the most recent file entries.

If you're matching wits with a computer user who has fairly good technical knowledge, the file history has most likely been erased. In this case, your next step is to use a forensic software suite to locate all the files on the local machine that match the type you're looking for. Forensic software, such as FTK and EnCase, has features that allow you to rapidly sort files by type and make your work much easier when dealing with large numbers of files. Figure 11-6 shows a typical list of files sorted by file type.

Figure 11.6. Forensic software grouping files by type.

If you're looking for files of a certain type, this search is easy to perform. You see Microsoft Word files in the My Documents folder, and your first assumption is that they really are Microsoft Word files. Unfortunately, when you're dealing with savvy computer criminals, that assumption can lead you to overlook evidence that's in plain sight. If you're looking for JPEG files and you find only Word files, you're likely to pass over the Word files. That's not a good thing if the suspect changed the extension or the file header. You need to take the additional step of matching file headers to their file extensions. If they match but you still can't open the file, the header itself may have been altered, and you have to repair it.

Matching file headers to extensions

To figure out whether a file's extension has been tampered with, you have to understand the way files are recognized by operating systems and application software. An application program generally recognizes a file by either its file header or file extension, whereas operating systems tend to rely mostly on the file extension to determine file type.

A file header is usually a sequence of characters at the beginning of a file that signifies what type of file it really is. Literally thousands of different file types now exist, so finding file headers can be a challenge if the file is created by an obscure program. Fortunately, most files come from popular software vendors such as Microsoft, Novell, Adobe, or Sun. If you do happen to be working with an oddball file and need to know which headers go with that file, a good place to start your search is www.fileinfo.net.

Figure 11-7 shows a file header from Microsoft Word. The character sequence for this file is always the same even if the file extension changes! Look carefully at the beginning of the file and you can see a character sequence that looks like a funny-looking D followed by a strange-looking I (the bytes D0 and CF in hexadecimal, which many viewers display as Ð and Ï). That's your file header character sequence.

Figure 11.7. Microsoft Word file header.

Figure 11-8 shows the file header for a picture file, which begins with the character sequence GIF89.

Figure 11.8. File header for a GIF picture file.

Note

If a user changes a file extension to fool the operating system, the application software still opens the file because the file header is still intact. Changing the file extension makes no difference to the software application.

Most forensic software programs perform a signature analysis to determine whether the file header and the file extension match. If the extension and header match, the file is probably what it claims to be; if you have a mismatch between the header and the extension, the file might have been changed to conceal its true identity. The trick is to let the forensic software do the heavy lifting and identify which files are suspect. You can study those files further to determine whether the extensions have been changed. When dealing with hundreds of thousands of files at a time, you begin to appreciate the power of forensic software.
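
The idea behind a signature analysis can be sketched in a few lines. This is a simplified illustration, not the way any particular forensic suite works. The SIGNATURES table holds only a handful of well-known headers, and the function names check_signature and check_file are hypothetical.

```python
import os

# A few well-known signatures (magic numbers) and the extensions normally seen with them.
# This table is a tiny illustrative subset, not a complete signature database.
SIGNATURES = {
    b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1": {".doc", ".xls", ".ppt"},  # OLE2 compound file
    b"GIF87a": {".gif"},
    b"GIF89a": {".gif"},
    b"\xFF\xD8\xFF": {".jpg", ".jpeg"},
    b"%PDF": {".pdf"},
}


def check_signature(filename, header):
    """Compare a file's leading bytes with its extension.

    Returns 'match', 'mismatch', or 'unknown' (header not in the table).
    """
    extension = os.path.splitext(filename)[1].lower()
    for magic, extensions in SIGNATURES.items():
        if header.startswith(magic):
            return "match" if extension in extensions else "mismatch"
    return "unknown"


def check_file(path):
    """Read the first bytes of a file on disk and run the signature check."""
    with open(path, "rb") as handle:
        return check_signature(path, handle.read(16))
```

For example, a file named holiday.doc whose first bytes are GIF89a comes back as a mismatch, which marks it for a closer look.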

Modifying the file header

Just because the extension and the file header match doesn't mean that the file is exactly what it seems to be. A user can also modify the file header to hide the file in plain sight.

If a user changes the file header, file extension, and filename to look like a Windows system file, a signature analysis only confirms that the extension and header match, so the software doesn't flag the file as suspicious. If you run the file — if it's an executable file, for example — an error message says that the file doesn't work or cannot be used. This message doesn't normally set off any red flags because many executable files don't work correctly if they're run by themselves.

To determine whether this hiding technique has been used:

  • See whether you can open the file.

    If you try to open the file from an application, an error message states that the file cannot be opened because it is an unknown file type. The only way you can open the file at this point is to know which file header to insert at the beginning of the file to make it work again. Take the header from a file that you know works and insert it at the beginning of a working copy of the suspect file.

  • Use hash values of known files to eliminate them from consideration.

    Libraries of hash values exist for almost every operating system and the support files it uses. Most popular application software, such as Microsoft Word or Excel, is also included (along with its support files) in many hash libraries so that you can eliminate those files as potential hidden files.

    If a user tries to hide a file by disguising it in this fashion, it stands out as a file with a hash value that doesn't match any standard files for that operating system or application. The National Software Reference Library (NSRL) is a helpful source of information regarding hash values of known files and how to use these hashes. You can download the hash libraries directly from NSRL (www.nsrl.nist.gov) and incorporate them into your forensic software to filter out known files. Keep in mind that the NSRL doesn't contain hash values for anything other than known files! If you need hashes for illicit files, you might have to contact your local law enforcement representative for a source of these types of hashes.

  • Look for files that have been modified recently or quite often.

    The user has to modify the file to open it and then modify it again to hide it. Keep in mind that literally thousands of files are modified on modern computers every day, so this option is a last resort.
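
The hash-elimination step in the preceding list can be sketched as follows. The sketch assumes that you have already loaded a set of known-file hashes (called known_hashes here, a hypothetical name) as uppercase hexadecimal SHA-1 strings; how you load them depends on the hash library and on your forensic software. The function names are also hypothetical.

```python
import hashlib


def sha1_of(data_stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: data_stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest().upper()


def unknown_files(paths, known_hashes):
    """Return the paths whose SHA-1 is not in the set of known-file hashes."""
    leftovers = []
    for path in paths:
        with open(path, "rb") as handle:
            if sha1_of(handle) not in known_hashes:
                leftovers.append(path)
    return leftovers
```

Whatever remains after filtering isn't necessarily suspicious. It's simply the smaller set of files that still needs your attention.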

Tip

If the suspect is capable of modifying file headers and extensions, clues such as hex editors and viewers are often tip-offs that you might have to look closely at file headers.

Finding links and external storage

When a file is stored or copied externally to the local computer, a link file is generated so that the operating system knows where the file is located. Quite often, link files are your only clue that an external storage device was connected to the computer. Figure 11-9 shows a typical link file that shows the path where a file might be found; in this case, it's the G volume.

Figure 11.9. Typical structure of a link file with file path information.

You can find link files with forensic software. You can even use forensic software to find deleted link files! Figure 11-10 shows a typical listing of link files. Depending on the forensic software, the detailed steps on retrieving link files vary, but what you are after is establishing a trail or link from one computing device to another. This part of your digital chain is necessary to show the connection between where the evidence was found in relation to where it resided before. If your suspect had or has access to one or both locations, you can reason the suspect has access to the evidence in question. Very few cases have a black and white smoking gun, but rather have an accumulation of evidence of which link files can be a part.

Figure 11.10. List of link files and related information found by forensic software.

Finding external storage options

When you find a link file, the first thing to do is determine whether any external storage devices are within arm's length of the computer you're working on. If you're in the lab, refer to your documentation of the original scene; if you're still on scene, double-check the area for any type of storage device, such as

  • A thumb drive

  • An external hard drive

  • A camera

  • An audio recorder

  • An answering machine

  • A digital copier

Any of these electronic devices has the potential to be a storage device. We talk more about retrieving forensic evidence from these devices in Chapter 14. On a Microsoft Windows computer, the best place to look for previously connected external devices and correlate them with link files is in the Windows Registry. Chapter 10 covers that little-known area in more detail.

Finding external networks

If your link file points to a network path (for example, \\server\test.doc), your job becomes a bit more complicated because you now have to track down a computer that may or may not be on the premises. If the computer resides within your local jurisdiction or control, obtaining the permissions or warrants should be fairly easy. If the files are stored on a computer located a couple of continents away, you might have some trouble getting the local Russian or Chinese law enforcement officials to see things your way.

Warning

Network leads tend to fade quickly on the Internet. Always pursue files and their associated digital footprints on a network or the Internet with all due haste lest the trail go cold!

Tip

If you see a wireless system or router, assume that a wireless computer is nearby that may have files saved on it that you might be interested in — even if you don't find any link files on the computer you're working on. Remember that newer smart phones have WiFi capability, so they also count as wireless network devices!

In any case, link files provide almost as much information as the file itself with regard to time and date stamps. The link files are literally linked to the suspect file so that whenever the suspect uses the file in question, the link file mirrors this action as well. Figure 11-11 shows that the type of CAM information you can find in a link file is just as detailed as it is from the actual file.

Figure 11.11. Link file CAM details.
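
To show where time stamps like these can come from, here is a minimal sketch that reads them from the fixed header of a Windows shell link (.lnk) file. It relies on the layout in Microsoft's published shell link format: a 76-byte header whose creation, access, and write times for the link target are stored as 64-bit FILETIME values starting at byte offset 28. The function names are hypothetical, and the sketch ignores everything after the header, including the path information.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(filetime):
    """Convert a Windows FILETIME (100-nanosecond ticks since 1601) to a UTC datetime."""
    if filetime == 0:
        return None  # zero means the time stamp was not recorded
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def link_target_times(header):
    """Read the target's creation/access/write times from a .lnk header (first 76 bytes)."""
    if len(header) < 76 or struct.unpack_from("<I", header, 0)[0] != 0x4C:
        raise ValueError("not a shell link header")
    # Three little-endian 64-bit FILETIME values start at offset 28.
    created, accessed, written = struct.unpack_from("<QQQ", header, 28)
    return {
        "created": filetime_to_utc(created),
        "accessed": filetime_to_utc(accessed),
        "modified": filetime_to_utc(written),
    }
```

These values are stored in UTC, which makes them convenient to line up against a Zulu-based timeline.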

Rounding up backups

In organizations of any substantial size, you tend to find a data backup system of some type. Most organizational users have no idea how the backup system works until they lose a file or storage device and even then forget rather quickly about the backup systems that are in place. For computer forensics, backup systems can be a bonanza of information because they tend to be snapshots in time of the computer systems and are often kept long after the physical computers are discarded.

For criminals who are quite tech savvy and know how to hide their digital tracks on computers they control, analysis of backup media is often quite productive because they usually have little control over the backup systems — if they even know of their existence.

Data can be backed up in several ways, and each one has pros and cons with regard to computer forensic analysis. You can back up information on duplicate storage devices, tape drives, and even network storage services. Here are the points to consider:

  • Backups done on duplicate storage devices and network storage devices usually follow the same file system formats as the original versions.

    Your job becomes much easier because the file system formats are fairly standardized.

    The problem with these methods of backing up data is that they tend to be expensive and have the same failure points as the storage media they're backing up.

  • Using tape for backups is by far the most popular and cost-effective method.

    Tape backups cost pennies per linear foot, are relatively stable, have been around for decades, and are portable, so you can take them offsite to further protect your data, if necessary.

    The disadvantage is that many different standards exist for tape backup systems. Another issue to consider is that quite a number of legacy tape backup solutions exist. The problem with these systems is twofold:

    • The company that made the equipment may no longer be around, leaving you with no fallback support.

    • You might not be able to find the equipment to read the tapes easily.

Because literally dozens of tape drive types have differing standards and a multitude of software applications to run the tape backup drives, you have to know exactly what type of tape backup system you're working with. Otherwise, extracting data from a tape backup is extremely challenging.

The best-case scenario is to use the same tape drive and tape software to extract a list of files from the tape to create an index of the files that reside on that tape. By using the same equipment that saved the data, you eliminate any problems with different data archiving standards. And, by creating an index, you can scan terabytes of data rather quickly to make a list of the files you really want. By a happy coincidence, the best way to create an index of files on a tape is to scan the metadata for items such as file type, CAM, and any particular file attributes you're looking for. After you figure out which files you need to extract, restoring them from the tape is just a matter of pulling the right tape and extracting the file or files.
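
As a rough illustration of building such an index from metadata alone, the following sketch assumes that the backup was written in the tar format, which Python's standard tarfile module can read. Many backup products use their own proprietary formats instead, so treat this as one possible case and not a general solution. The function names index_archive and filter_by_extension are hypothetical.

```python
import tarfile
from datetime import datetime, timezone


def index_archive(archive_stream):
    """List every regular file in a tar archive without extracting any data."""
    index = []
    with tarfile.open(fileobj=archive_stream, mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                index.append({
                    "name": member.name,
                    "size": member.size,
                    "modified": datetime.fromtimestamp(member.mtime, tz=timezone.utc),
                })
    return index


def filter_by_extension(index, extension):
    """Pick out index entries of one file type, for example '.doc'."""
    return [entry for entry in index if entry["name"].lower().endswith(extension)]
```

Because only the member headers are read, the index can be built without restoring any file contents, and you can then restore just the entries you select. Keep in mind that an extension-based filter inherits the weaknesses of file extensions described earlier in this chapter.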

If you happen to be the unlucky computer forensic analyst who is tasked with finding data on a tape backup set that has an unknown history, your job becomes a little more difficult.

Note

Before you start, make a duplicate of the tape you're testing, or at least make sure that the write protection is enabled on the tape!

If a tape backup is handed to you and you don't know its history, you need to follow several basic steps to determine its format:

  1. Determine which tape drive was used.

    Most tape backup systems use standard tapes that are compatible with particular tape drives. You must first match the tape to the tape drive. The number one problem that trips up most investigators in this step is that certain tape drives accept tapes that are incompatible even though they fit physically. This is usually because tapes are available in various storage capacities, which leads to different recording densities even though the tapes look identical on the outside. A familiar analogy is the old-fashioned 3½-inch floppy disk, which was either high density or low density. Both types of floppy disk looked nearly identical externally, but their internal structures were so different that mixing up the two disks often led to confusion and lost data.

  2. Determine which tape backup software was used.

    After you know the physical components of the tape backup system, you must determine exactly which type of tape backup software was used to archive this data. This step might not be as easy as it sounds because many tape backup drives are designed to work with almost any type of tape backup software. The most logical approach is to find the most popular software used with that particular tape drive and work your way down the list by popularity until you find a software package that recognizes the tape in the drive. The reason it's often difficult to identify which software program wrote the archive is that no real standards exist for how backup software writes data to tape. Each tape software vendor is free to create its own file backup structure that only its software can read or write.

  3. Determine the structure of the file system.

    Because tape archival is a specialized area of computer forensics, few computer forensic professionals dig down to this level. At this point, you're essentially writing your own tape restoration software to read and analyze the contents of the tape. If you're in this situation, seek out a computer forensic firm that handles this area of computer forensics. Chances are good that you aren't the first person to have this problem, and reinventing the wheel usually isn't necessary or even advisable.
