Chapter 5. Data Collection

Data is the lifeblood of nearly every defensive security operation. Data tells you the current state of the system, what has happened in the past, and even what might happen in the future. Data is needed for forensic investigations, verifying compliance, and detecting malicious activity. Table 5-1 describes data that is commonly relevant to defensive operations and where it is typically located.

Table 5-1. Data of interest

Logfiles
  Description: Details on historical system activity and state. Interesting logfiles include web and DNS server logs; router, firewall, and intrusion detection system logs; and application logs.
  Location: In Linux, most logfiles are located in the /var/log directory. On a Windows system, logs are found in the Event Log.

Command history
  Description: List of recently executed commands.
  Location: In Linux, the location of the history file can be found by executing echo $HISTFILE. This file is typically named .bash_history and located in the user's home directory.

Temporary files
  Description: Various user and system files that were recently accessed, saved, or processed.
  Location: In Windows, temp files can be found in c:\windows\temp and %USERPROFILE%\AppData\Local. In Linux, temp files are typically located in /tmp and /var/tmp. The Linux temporary directory can also be found with the command echo $TMPDIR.

User data
  Description: Documents, pictures, and other user-created files.
  Location: User files are typically located in /home/ in Linux and c:\Users in Windows.

Browser history
  Description: Web pages recently accessed by the user.
  Location: Varies widely based on operating system and browser.

Windows Registry
  Description: Hierarchical database that stores settings and other data critical to the operation of Windows and applications.
  Location: Stored on disk in registry hive files; accessed through tools such as regedit and the reg command.

Throughout this chapter, we explore various methods to gather data, locally and remotely, from both Linux and Windows systems.

Commands in Use

We introduce the commands cut, file, and head, as well as the Windows commands reg and wevtutil, which are used to select and gather data of interest from local and remote systems.

cut

cut is a command used to extract select portions of a file. It reads a supplied input file line by line and parses the line based on a specified delimiter. If no delimiter is specified, cut will use a tab character by default. The delimiter characters divide each line of a file into fields. You can use either the field number or character position number to extract parts of the file. Fields and characters start at position 1.

Common command options

-c

Specify the character(s) to extract.

-d

Specify the character used as a field delimiter. By default, the delimiter is the tab character.

-f

Specify the field(s) to extract.

Command example

The file cutfile.txt is used to demonstrate the cut command. It consists of two lines, each with three columns of data, as shown in Example 5-1.

Example 5-1. cutfile.txt
12/05/2017 192.168.10.14 test.html
12/30/2017 192.168.10.185 login.html

In cutfile.txt, each field is delimited using a space. To extract the IP address (field position 2), you can use the following command:

$ cut -d' ' -f2 cutfile.txt

192.168.10.14
192.168.10.185

The -d' ' option specifies the space as the field delimiter. The -f2 option tells cut to return the second field, in this case, the IP address.
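
You can also pass -f a comma-separated list to extract multiple fields at once. For example, to extract the date and page (fields 1 and 3) from cutfile.txt:

$ cut -d' ' -f1,3 cutfile.txt

12/05/2017 test.html
12/30/2017 login.html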

Warning

The cut command considers each delimiter character as separating a field. It doesn’t collapse whitespace. Consider the following example:

Pat   25
Pete  12

If we use cut on this file, we would define the delimiter to be a space. In the first record there are three spaces between the name (Pat) and the number (25). Thus, the number is in field 4. However, for the next line, the name (Pete) is in field 3, since there are only two space characters between the name and the number. For a data file like this, it would be better to separate the name from the numbers with a single tab character and use that as the delimiter for cut.
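
If you cannot reformat the data file, one common workaround is to squeeze each run of repeated spaces down to a single space with tr -s before passing the data to cut. Assuming the two lines above are saved in a file named datafile (a name chosen just for this example):

$ tr -s ' ' < datafile | cut -d' ' -f2

25
12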

file

The file command is used to help identify a given file’s type. This is particularly useful in Linux, as most files are not required to have an extension that can be used to identify their type (unlike Windows, which uses extensions such as .exe). The file command looks deeper than the filename by reading and analyzing the first block of data, which contains the file’s magic number. Even if you rename a .png image file to end with .jpg, the file command is smart enough to figure that out and tell you the correct file type (in this case, a PNG image file).

Common command options

-f

Read the list of files to analyze from a given file.

-k

Do not stop on the first match; list all matches for the file type.

-z

Look inside compressed files.

Command example

To identify the file type, pass the filename to the file command:

$ file unknownfile

unknownfile: Microsoft Word 2007+

head

The head command displays the first few lines or bytes of a file. By default, head displays the first 10 lines.

Common command options

-n

Specify the number of lines to output. To show 15 lines, you can specify -n 15 or -15.

-c

Specify the number of bytes to output.
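
The -c option is handy for peeking at the first few bytes of a file, such as when manually checking a magic number (discussed later in this chapter). For example, assuming the xxd hex-dump utility is available (it ships with most Linux distributions and with Git Bash), you can view the first 4 bytes of a PNG file:

$ head -c4 Title.png | xxd

00000000: 8950 4e47                                .PNG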

reg

The reg command is used to manipulate the Windows Registry and is available in Windows XP and later.

Common command parameters

add

Add an entry to the registry

export

Copy the specified registry entries to a file

query

Return a list of subkeys below the specified path

Command example

To list all of the root keys in the HKEY_LOCAL_MACHINE hive:

$ reg query HKEY_LOCAL_MACHINE

HKEY_LOCAL_MACHINE\BCD00000000
HKEY_LOCAL_MACHINE\HARDWARE
HKEY_LOCAL_MACHINE\SAM
HKEY_LOCAL_MACHINE\SECURITY
HKEY_LOCAL_MACHINE\SOFTWARE
HKEY_LOCAL_MACHINE\SYSTEM

wevtutil

Wevtutil is a command-line utility used to view and manage system logs in the Windows environment. It is available in most modern versions of Windows and is callable from Git Bash.

Common command parameters

el

Enumerate available logs

qe

Query a log’s events

Common command options

/c

Specify the maximum number of events to read

/f

Format the output as text or XML

/rd

Read direction—if set to true, it will read the most recent logs first

Warning

In the Windows Command Prompt, only a single / is needed before command options. In the Git Bash terminal, two slashes (//) are needed (e.g., //c) because of the way Git Bash processes commands.

Command example

To list all of the available logs:

wevtutil el

To view the most recent event in the System log via Git Bash:

wevtutil qe System //c:1 //rd:true
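
By default, events are output as XML. To get a more readable plain-text rendering, add the /f option (shown here with the doubled slashes required by Git Bash):

wevtutil qe System //c:1 //rd:true //f:text
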
Tip

For additional information about the wevtutil command, see Microsoft’s documentation.

Gathering System Information

One of the first steps in defending a system is understanding the state of the system and what it is doing. To accomplish this, you need to gather data, either locally or remotely, for analysis.

Executing a Command Remotely Using SSH

The data you want may not always be available locally. You may need to connect to a remote system such as a web, File Transfer Protocol (FTP), or SSH server to obtain the desired data.

Commands can be executed remotely and securely by using SSH if the remote system is running the SSH service. In its basic form (no options), you can just add ssh and a hostname in front of any shell command to run that command on the specified host. For example, ssh myserver who will run the who command on the remote machine myserver. If you need to specify a different username, ssh username@myserver who or ssh -l username myserver who both do the same thing. Just replace username with the username you would like to use to log in. You can redirect the output to a file on your local system, or to a file on the remote system.

To run a command on a remote system and redirect the output to a file on your local system:

ssh myserver ps > /tmp/ps.out

To run a command on a remote system and redirect the output to a file on the remote system:

ssh myserver ps \> /tmp/ps.out

The backslash will escape the special meaning of the redirect (in the current shell) and simply pass the redirect character as the second word of the three words sent to myserver. When executed on the remote system, it will be interpreted by that shell and redirect the output on the remote machine (myserver) and leave it there.
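
Alternatively, you can quote the entire remote command; the local shell then passes the redirect through untouched, and it is interpreted on the remote system just as with the backslash:

ssh myserver 'ps > /tmp/ps.out'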

In addition, you can take scripts that reside on your local system and run them on a remote system using SSH. You’d use this command to run the osdetect.sh script remotely:

ssh myserver bash < ./osdetect.sh

This runs the bash command on the remote system, but passes into it the lines of the osdetect.sh script directly from your local system. This avoids the need for a two-step process of, first, transferring the script to the remote system and, then, running that copied script. Output from running the script comes back to your local system and can be captured by redirecting stdout, as we have shown with many other commands.

Gathering Linux Logfiles

Logfiles for a Linux system are normally stored in the /var/log/ directory. To easily collect the logfiles into a single file, use the tar command:

tar -czf ${HOSTNAME}_logs.tar.gz /var/log/

The option -c is used to create an archive file, -z to zip the file, and -f to specify a name for the output file. The HOSTNAME variable is a bash variable that is automatically set by the shell to the name of the current host. We include it in our filename so the output file will be given the same name as the system, which will help later with organization if logs are collected from multiple systems. Note that you will need to be logged in as a privileged user or use sudo in order to successfully copy the logfiles.
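
To sanity-check the archive after collection, you can list its contents with the tar -t option:

tar -tzf ${HOSTNAME}_logs.tar.gz | head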

Table 5-2 lists some important and common Linux logs and their standard locations.

Table 5-2. Linux logfiles

Log location       | Description
-------------------+-------------------------------------------------------------------------
/var/log/apache2/  | Access and error logs for the Apache web server
/var/log/auth.log  | Information on user logins, privileged access, and remote authentication
/var/log/kern.log  | Kernel logs
/var/log/messages  | General noncritical system information
/var/log/syslog    | General system logs

To find more information on where logfiles are being stored for a given system, refer to /etc/syslog.conf or /etc/rsyslog.conf on most Linux distributions.

Gathering Windows Logfiles

In the Windows environment, wevtutil can be used to manipulate and gather logfiles. Luckily, this command is callable from Git Bash. The winlogs.sh script, shown in Example 5-2, uses the wevtutil el parameter to list all available logs, and then the epl parameter to export each log to a file.

Example 5-2. winlogs.sh
#!/bin/bash -
#
# Cybersecurity Ops with bash
# winlogs.sh
#
# Description:
# Gather copies of Windows log files
#
# Usage:
# winlogs.sh [-z]
#   -z Tar and zip the output
#

TGZ=0
if (( $# > 0 ))						1
then
    if [[ ${1:0:2} == '-z' ]]				2
    then
	TGZ=1	# tgz flag to tar/zip the log files
	shift
    fi
fi
SYSNAM=$(hostname)
LOGDIR=${1:-/tmp/${SYSNAM}_logs}			3

mkdir -p $LOGDIR					4
cd ${LOGDIR} || exit -2

wevtutil el | while read ALOG				5
do
    ALOG="${ALOG%$'
'}"				6
    echo "${ALOG}:"					7
    SAFNAM="${ALOG// /_}"				8
    SAFNAM="${SAFNAM////-}"
    wevtutil epl "$ALOG" "${SYSNAM}_${SAFNAM}.evtx"
done

if (( TGZ == 1 ))					9
then
    tar -czf ${SYSNAM}_logs.tgz *.evtx			10
fi
1

The script begins with a simple initialization and then an if statement, one that checks to see whether any arguments were provided to the script. The $# is a special shell variable whose value is the number of arguments supplied on the command line when this script is invoked. This conditional for the if is an arithmetic expression, because of the double parentheses. Therefore, the comparison can use the greater-than character (>) and it will do a numerical comparison. If that symbol is used in an if expression with square brackets rather than double parentheses, > does a comparison of lexical ordering—alphabetical order. You would need to use -gt for a numerical comparison inside square brackets.
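
The difference is easy to demonstrate at the command line:

# inside [[ ]], > compares strings: "9" sorts after "10"
[[ 9 > 10 ]] && echo 'lexically greater'

# inside (( )), > compares numbers: 9 is less than 10
(( 9 > 10 )) || echo 'numerically less'

Both echo statements will print.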

For this script, the only argument we are supporting is a -z option to indicate that the logfiles should all be zipped up into a single TAR file when it’s done collecting logfiles. This also means that we can use a simple type of argument parsing. We will use a more sophisticated argument parser (getopts) in an upcoming script.

2

This check takes a substring of the first argument ($1) starting at the beginning of the string (an offset of 0 bytes), 2 bytes long. If the argument is, in fact, a -z, we will set a flag. The script also does a shift to remove that argument. What was the second argument, if any, is now the first. The third, if any, becomes the second, and so on.

3

If the user wants to specify a location for the logs, it can be specified as an argument to the script. The optional -z argument, if supplied, has already been shift-ed out of the way, so any user-supplied path would now be the first argument. If no value was supplied on the command line, the expression inside the braces will return a default value as indicated to the right of the minus sign. We use the braces around SYSNAM because the _logs would otherwise be considered part of the variable name.

4

The -p option to mkdir will create the directory and any intervening directories. It will also not give an error message if the directory exists. On the next line, we invoke cd to make that directory the current directory, where the logfiles will be saved; if the cd should fail, the program will exit with an error code.

5

Here we invoke wevtutil el to list all the possible logfiles. The output is piped into a while loop that will read one line (one log filename) at a time.

6

Since this is running on a Windows system, each line printed by wevtutil will end with both a newline (\n) and a return (\r) character. We remove the \r character from the right side of the string by using the % operator. To specify the (nonprinting) return character, we use the $'string' construct, which substitutes certain backslash-escaped characters with nonprinting characters (as defined in the ANSI C standard). So the two characters of \r are replaced with an ASCII 13 character, the return character.

7

We echo the filename to provide an indication to the user of progress being made and which log is currently being fetched.

8

To create the filename into which we want wevtutil to store its output (the logfile), we make two edits to the name. First, since the name of the log as provided may have blanks, we replace any blank with an underscore character. While not strictly necessary, the underscore avoids the need for quotes when using the filename. The syntax, in general, is ${VAR/old/new} to retrieve the value of VAR with a substitution: replacing old with new. Using a double slash, ${VAR//old/new} replaces all occurrences, not just the first.

Warning

A common mistake is to type ${VAR/old/new/}, but the trailing slash is not part of the syntax and will simply be added to the resulting string if a substitution is made. For example, if VAR=embolden then ${VAR/old/new/} would return embnew/en.

Second, some Windows logfile names have a slash character in them. In bash, however, the / is the separator between directories when used in a pathname. It shouldn’t be used in a filename, so we make another substitution using the ${VAR/old/new} syntax, to replace any / with a - character. Notice, though, that we have to “escape” the meaning of the / in our substitution so that bash doesn’t think it’s part of the substitution syntax. We use \/ to indicate that we want a literal slash.

9

This is another arithmetic expression, enclosed in double parentheses. Within those expressions, bash doesn’t require the $ in front of most variable names. It would still be needed for positional parameters like $1 to avoid confusion with the integer 1.

10

Here we use tar to gather all the .evtx files into one archive. We use the -z option to compress the data, but we don’t use the -v option, so that tar does its work silently (since our script already echoed each log name as it exported it).

The script runs in a subshell, so although we have changed directories inside the script, once the script exits, we are back in the directory where we started. If we needed to be back in the original directory inside the script, we could use the cd - command to return to the previous directory.

Gathering System Information

If you are able to arbitrarily execute commands on a system, you can use standard OS commands to collect a variety of information about the system. The exact commands you use will vary based on the operating system you are interfacing with. Table 5-3 shows common commands that can yield a great deal of information from a system. Note that the command may be different depending on whether it is run within the Linux or Windows environment.

Table 5-3. Local data-gathering commands

Linux command     | Windows Git Bash equivalent | Purpose
------------------+-----------------------------+-------------------------------------------------
uname -a          | uname -a                    | Operating system version information
cat /proc/cpuinfo | systeminfo                  | Display system hardware and related info
ifconfig          | ipconfig                    | Network interface information
route             | route print                 | Display routing table
arp -a            | arp -a                      | Display Address Resolution Protocol (ARP) table
netstat -a        | netstat -a                  | Display network connections
mount             | net share                   | Display filesystems
ps -e             | tasklist                    | Display running processes

The script getlocal.sh, shown in Example 5-3, is designed to identify the operating system type using osdetect.sh, run the various commands appropriate for the operating system type, and record the results to a file. The output from each command is stored in Extensible Markup Language (XML) format, i.e., delimited with XML tags, for easier processing later. Invoke the script like this: bash getlocal.sh < cmds.txt, where the file cmds.txt contains a list of commands similar to those shown in Table 5-3. The format it expects is those same fields, separated by vertical bars, plus an additional field: the XML tag with which to mark the output of the command. (Also, lines beginning with a # are considered comments and will be ignored.)

Here is what a cmds.txt file might look like:

# Linux Command  |MSWin  Bash |XML tag    |Purpose
#----------------+------------+-----------+------------------------------
uname -a         |uname -a    |uname      |O.S. version etc
cat /proc/cpuinfo|systeminfo  |sysinfo    |system hardware and related info
ifconfig         |ipconfig    |nwinterface|Network interface information
route            |route print |nwroute    |routing table
arp -a           |arp -a      |nwarp      |ARP table
netstat -a       |netstat -a  |netstat    |network connections
mount            |net share   |diskinfo   |mounted disks
ps -e            |tasklist    |processes  |running processes

Example 5-3 shows the source for the script.

Example 5-3. getlocal.sh
#!/bin/bash -
#
# Cybersecurity Ops with bash
# getlocal.sh
#
# Description:
# Gathers general system information and dumps it to a file
#
# Usage:
# bash getlocal.sh < cmds.txt
#   cmds.txt is a file with list of commands to run
#

# SepCmds - separate the commands from the line of input
function SepCmds()
{
      LCMD=${ALINE%%|*}                   11
      REST=${ALINE#*|}                    12
      WCMD=${REST%%|*}                    13
      REST=${REST#*|}
      TAG=${REST%%|*}                     14

      if [[ $OSTYPE == "MSWin" ]]
      then
         CMD="$WCMD"
      else
         CMD="$LCMD"
      fi
}

function DumpInfo ()
{                                                              5
    printf '<systeminfo host="%s" type="%s"' "$HOSTNAME" "$OSTYPE"
    printf ' date="%s" time="%s">\n' "$(date '+%F')" "$(date '+%T')"
    readarray CMDS                           6
    for ALINE in "${CMDS[@]}"                7
    do
       # ignore comments
       if [[ ${ALINE:0:1} == '#' ]] ; then continue ; fi     8

      SepCmds

      if [[ ${CMD:0:3} == N/A ]]             9
      then
          continue
      else
          printf "<%s>
" $TAG               10
          $CMD
          printf "</%s>
" $TAG
      fi
    done
    printf "</systeminfo>
"
}

OSTYPE=$(./osdetect.sh)                     1
HOSTNM=$(hostname)                          2
TMPFILE="${HOSTNM}.info"                    3

# gather the info into the tmp file; errors, too
DumpInfo  > $TMPFILE  2>&1                  4
1

After the two function definitions the script begins here, invoking our osdetect.sh script (from Chapter 2). We’ve specified the current directory as its location. You could put it elsewhere, but then be sure to change the specified path from ./ to wherever you put it and/or add that location to your PATH variable.

Note

To make things more efficient, you can include the code from osdetect.sh directly in getlocal.sh.

2

Next, we run the hostname program in a subshell to retrieve the name of this system for use in the next line but also later in the DumpInfo function.

3

We use the hostname as part of the temporary filename where we will put all our output.

4

Here is where we invoke the function that will do most of the work of this script. We redirect both stdout and stderr (to the same file) when invoking the function so that the function doesn’t have to put redirects on any of its output statements; it can write to stdout, and this invocation will redirect all the output as needed. Another way to do this is to put the redirect on the closing brace of the DumpInfo function definition. Redirecting stdout might instead be left to the user who invokes this script; it would simply write to stdout by default. But if the user wants the output in a file, the user has to create a tempfile name and has to remember to redirect stderr as well. Our approach is suitable for a less experienced user.

5

Here is where the “guts” of the script begins. This function begins with output of an XML tag called <systeminfo>, which will have its closing tag written out at the end of this function.

6

The readarray command in bash will read all the lines of input (until end-of-file or on keyboard input until Ctrl-D). Each line will be its own entry in the array named, in this case, CMDS.

7

This for loop will loop over the values of the CMDS array—over each line, one at a time.

8

This line uses the substring operation to take the character at position 0, of length 1, from the variable ALINE. The hashtag (#), or pound sign, is in quotes so that the shell doesn’t interpret it as the start of the script’s own comment.

If the line is not a comment, the script will call the SepCmds function. More about that function later; it separates the line of input into CMD and TAG, where CMD will be the appropriate command for a Linux or Windows system, depending on where we run the script.

9

Here, again, we use the substring operation from the start of the string (position 0) of length 3 to look for the string that indicates there is no appropriate operation on this particular operating system for the desired information. The continue statement tells bash to skip to the next iteration of the loop.

10

If we do have an appropriate action to take, this section of code will print the specified XML tag on either side of the invocation of the specified command. Notice that we invoke the command by retrieving the value of the variable CMD.

11

Here we isolate the Linux command from a line of our input file by removing all the characters to the right of the vertical bar, including the bar itself. The %% says to make the longest match possible on the right side of the variable’s value and remove it from the value it returns (i.e., ALINE isn’t changed).

12

Here the # removes the shortest match from the left side of the variable’s value. Thus, it removes the Linux command that was just put in LCMD.

13

Again, we remove everything to the right of the vertical bar, but this time we are working with REST, modified in the previous statement. This gives us the MSWindows command.

14

Here we extract the XML tag by using the same substitution operations we’ve seen twice already.

All that’s left in this function is the decision, based on the operating system type, as to which value to return as the value in CMD. All variables are global unless explicitly declared as local within a function. None of ours are local, so they can be used (set, changed, or used) throughout the script.

When running this script, you can use the cmds.txt file as shown or change its values to get whatever set of information you want to collect. You can also run it without redirecting the input from a file; simply type (or copy/paste) the input after the script is invoked.

Gathering the Windows Registry

The Windows Registry is a vast repository of settings that define how the system and applications will behave. Specific registry key values can often be used to identify the presence of malware and other intrusions. Therefore, a copy of the registry is useful when later performing analysis of the system.

To export the entire Windows Registry to a file using Git Bash:

regedit //E ${HOSTNAME}_reg.bak

Note that two forward slashes are used before the E option because we are calling regedit from Git Bash; only one would be needed if using the Windows Command Prompt. We use ${HOSTNAME} as part of the output filename to make it easier to organize later.

If needed, the reg command can also be used to export sections of the registry or individual subkeys. To export the HKEY_LOCAL_MACHINE hive using Git Bash:

reg export HKEY_LOCAL_MACHINE ${HOSTNAME}_hklm.bak
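
Individual keys can be queried the same way. For example, because the Run key is a common persistence mechanism for malware, listing the programs configured to start at user login is often worthwhile (a minimal sketch; the exact subkeys present will vary by system):

reg query 'HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run'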

Searching the Filesystem

The ability to search the system is critical for everything from organizing files, to incident response, to forensic investigation. The find and grep commands are extremely powerful and can be used to perform a variety of search functions.

Searching by Filename

Searching by filename is one of the most basic search methods. This is useful if the exact filename is known, or a portion of the filename is known. To search the Linux /home directory and subdirectories for filenames containing the word password:

find /home -name '*password*'

Note that the use of the * character at the beginning and end of the search string designates a wildcard, meaning it will match any (or no) characters. This is a shell pattern and is not the same as a regular expression. Additionally, you can use the -iname option instead of -name to make the search case-insensitive.

To perform a similar search on a Windows system using Git Bash, simply replace /home with /c/Users.
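
For example, to perform a case-insensitive search of all Windows user profiles, suppressing permission errors as described in the following tip:

find /c/Users -iname '*password*' 2>/dev/null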

Tip

If you want to suppress errors, such as Permission Denied, when using find you can do so by redirecting stderr to /dev/null or to a logfile:

find /home -name '*password*' 2>/dev/null

Searching for Hidden Files

Hidden files are often interesting as they can be used by people or malware looking to avoid detection. In Linux, names of hidden files begin with a period. To find hidden files in the /home directory and subdirectories:

find /home -name '.*'

Tip

The .* in the preceding example is a shell pattern, which is not the same as a regular expression. In the context of find, the “dot-star” pattern will match on any file that begins with a period and is followed by any number of additional characters (denoted by the * wildcard character).

In Windows, hidden files are designated by a file attribute, not the filename. From the Windows Command Prompt, you can identify hidden files on the c: drive as follows:

dir c:\ /S /A:H

The /S option tells dir to recursively traverse subdirectories, and the /A:H displays files with the hidden attribute. Unfortunately, Git Bash intercepts the dir command and instead executes ls, which means it cannot easily be run from bash. This can be solved by using the find command’s -exec option coupled with the Windows attrib command.

The find command has the ability to run a specified command for each file that is found. To do that, you can use the -exec option after specifying your search criteria. find replaces any curly braces ({}) with the pathname of the file that was found. The escaped semicolon (\;) terminates the command expression:

$ find /c -exec attrib '{}' \; | egrep '^.{4}H.*'

A   H                C:\Users\Bob\scripts\hist.txt
A   HR               C:\Users\Bob\scripts\winlogs.sh

The find command will execute the Windows attrib command for each file it identifies on the c: drive (denoted as /c), thereby printing out each file’s attributes. The egrep command is then used with a regular expression to identify lines where the fifth character is the letter H, which will be true if the file’s hidden attribute is set.

If you want to clean up the output further and display only the file path, you can do so by piping the output of egrep into the cut command:

$ find . -exec attrib '{}' \; | egrep '^.{4}H.*' | cut -c22-

C:\Users\Bob\scripts\hist.txt
C:\Users\Bob\scripts\winlogs.sh

The -c option tells cut to use character position numbers for slicing. 22- tells cut to begin at character 22, which is the beginning of the file path, and continue to the end of the line (-). This can be useful if you want to pipe the file path into another command for further processing.

Searching by File Size

The find command’s -size option can be used to find files based on file size. This can be useful to help identify unusually large files, or to identify the largest or smallest files on a system.

To search for files greater than 5 GB in size in the /home directory and subdirectories:

find /home -size +5G

To identify the largest files in the system, you can combine find with a few other commands:

find / -type f -exec ls -s '{}' \; | sort -n -r | head -5

First, we use find / -type f to list all of the files in and under the root directory. Each file is passed to ls -s, which will identify its size in blocks (not bytes). The list is then sorted from highest to lowest, and the top five are displayed using head. To see the smallest files in the system, tail can be used in place of head, or you can remove the reverse (-r) option from sort.
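
If your version of du supports the -a option (GNU and BSD versions do), you can get a similar result without find; note that du also reports directory totals, which will appear in the output alongside individual files:

du -a /home 2>/dev/null | sort -n -r | head -5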

Tip

In the shell, you can use !! to represent the last command that was executed. You can use it to execute a command again, or include it in a series of piped commands. For example, suppose you just ran the following command:

find / -type f -exec ls -s '{}' \;

You can then use !! to run that command again or feed it into a pipeline:

!! | sort -n -r | head -5

The shell will automatically replace !! with the last command that was executed. Give it a try!

You can also use the ls command directly to find the largest file and completely eliminate the use of find, which is significantly more efficient. To do that, just add the -R option for ls, which will cause it to recursively list the files under the specified directory:

ls / -R -s | sort -n -r | head -5

Searching by Time

The filesystem can also be searched based on when files were last accessed or modified. This can be useful when investigating incidents to identify recent system activity. It can also be useful for malware analysis, to identify files that have been accessed or modified during program execution.

To search for files in the /home directory and subdirectories modified less than 5 minutes ago:

find /home -mmin -5

To search for files modified less than 24 hours ago:

find /home -mtime -1

The number specified with the mtime option is a multiple of 24 hours, so 1 means 24 hours, 2 means 48 hours, etc. A negative number here means “less than” the number specified, a positive number means “greater than,” and an unsigned number means “exactly.”
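
For example, to match files whose age rounds down to exactly two 24-hour periods, i.e., modified between 48 and 72 hours ago:

find /home -mtime 2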

To search for files modified more than 2 days (48 hours) ago:

find /home -mtime +2

To search for files accessed less than 24 hours ago, use the -atime option:

find /home -atime -1

To search for files in the /home directory accessed less than 24 hours ago and copy (cp) each file to the current working directory (./):

find /home -type f -atime -1 -exec cp '{}' ./ \;

The use of -type f tells find to match only ordinary files, ignoring directories and other special file types. You may also copy the files to any directory of your choosing by replacing the ./ with an absolute or relative path.

Warning

Be sure that your current working directory is not somewhere in the /home hierarchy, or you will have the copies found and thus copied again.

Searching for Content

The grep command can be used to search for content inside files. To search for files in the /home directory and subdirectories that contain the string password:

grep -i -r /home -e 'password'

The -r option recursively searches all directories below /home, -i specifies a case-insensitive search, and -e specifies the regex pattern string to search for.

Tip

The -n option can be used to identify which line in the file contains the search string, and -w can be used to match only whole words.
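
For example, to search recursively for the whole word password and print the matching line numbers:

grep -rnw /home -e 'password' 2>/dev/null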

You can combine grep with find to easily copy matching files to your current working directory (or any specified directory):

find /home -type f -exec grep 'password' '{}' \; -exec cp '{}' . \;

First, we use find /home -type f to identify all of the files in and below the /home directory. Each file found is passed to grep to search for password within its content. Each file matching the grep criteria is then passed to the cp command to copy the file to the current directory (indicated by the dot). This combination of commands may take a considerable amount of time to execute and is a good candidate to run as a background task.

Searching by File Type

Searching a system for specific file types can be challenging. You cannot rely on the file extension, if one even exists, as that can be manipulated by the user. Thankfully, the file command can help identify types by comparing the contents of a file to known patterns called magic numbers. Table 5-4 lists common magic numbers and their starting locations inside files.

Table 5-4. Magic numbers

File type                      | Magic number (hex) | Magic number (ASCII) | File offset (bytes)
-------------------------------+--------------------+----------------------+--------------------
JPEG                           | FF D8 FF DB        | ÿØÿÛ                 | 0
DOS executable                 | 4D 5A              | MZ                   | 0
Executable and Linkable Format | 7F 45 4C 46        | .ELF                 | 0
Zip file                       | 50 4B 03 04        | PK..                 | 0

To begin, you need to identify the type of file for which you want to search. Let’s assume you want to find all PNG image files on the system. First, you would take a known-good file such as Title.png, run it through the file command, and examine the output:

$ file Title.png

Title.png: PNG image data, 366 x 84, 8-bit/color RGBA, non-interlaced

As expected, file identifies the known-good Title.png file as PNG image data and also provides the dimensions and various other attributes. Based on this information, you need to determine what part of the file command output to use for the search, and generate the appropriate regular expression. In many cases, such as with forensic discovery, you are likely better off gathering more information than less; you can always further filter the data later. To do that, you will use a very broad regular expression that simply searches for the word PNG in the output from the file command: 'PNG'.

You can, of course, make more-advanced regular expressions to identify specific files. For example, if you wanted to find PNG files with dimensions of 100 × 100:

'PNG.*100x100'

If you want to find PNG and JPEG files:

'(PNG|JPEG)'

Once you have the regular expression, you can write a script to run the file command against every file on the system looking for a match. When a match is found, typesearch.sh, shown in Example 5-4, will print the file path to standard output.

Example 5-4. typesearch.sh
#!/bin/bash -
#
# Cybersecurity Ops with bash
# typesearch.sh
#
# Description:
# Search the file system for a given file type. It prints out the
# pathname when found.
#
# Usage:
# typesearch.sh [-c dir] [-i] [-R|r] <pattern> <path>
#   -c Copy files found to dir
#   -i Ignore case
#   -R|r Recursively search subdirectories
#   <pattern> File type pattern to search for
#   <path> Path to start search
#

DEEPORNOT="-maxdepth 1"		# just the current dir; default

# PARSE option arguments:
while getopts 'c:irR' opt; do                         1
  case "${opt}" in                                    2
    c) # copy found files to specified directory
        COPY=YES
        DESTDIR="$OPTARG"                             3
        ;;
    i) # ignore u/l case differences in search
        CASEMATCH='-i'
        ;;
    [Rr]) # recursive                                 4
        unset DEEPORNOT;;                             5
    *)  # unknown/unsupported option                  6
        # error mesg will come from getopts, so just exit
        exit 2 ;;
  esac
done
shift $((OPTIND - 1))                                 7


PATTERN=${1:-PDF document}                            8
STARTDIR=${2:-.}	# by default start here

find $STARTDIR $DEEPORNOT -type f | while read FN     9
do
    file $FN | egrep -q $CASEMATCH "$PATTERN"          10
    if (( $? == 0 ))   # found one                    11
    then
        echo $FN
        if [[ $COPY ]]                               12
        then
            cp -p $FN $DESTDIR                       13
        fi
    fi
done
1

This script supports options that alter its behavior, as described in the opening comments of the script. The script needs to parse these options to tell which ones have been provided and which are omitted. For anything more than a single option or two, it makes sense to use the getopts shell built-in. With the while loop, we will keep calling getopts until it returns a nonzero value, telling us that there are no more options. The options we want to look for are provided in that string c:irR. Whichever option is found is returned in opt, the variable name we supplied.

2

We are using a case statement here that is a multiway branch; it will take the branch that matches the pattern provided before the left parenthesis. We could have used an if/elif/else construct, but this reads well and makes the options so clearly visible.

3

The c option has a colon (:) after it in the list of supported options, which indicates to getopts that the user will also supply an argument for that option. For this script, that optional argument is the directory into which copies will be made. When getopts parses an option with an argument like this, it puts the argument in the variable named OPTARG, and we save it in DESTDIR because another call to getopts may change OPTARG.

4

The script supports either an uppercase R or lowercase r for this option. Case statements specify a pattern to be matched, not just a simple literal, so we wrote [Rr]) for this case, using the brackets construct to indicate that either letter is considered a match.

5

The other options set variables to cause their action to occur. In this case, we unset the previously set variable. When that variable is referenced later as $DEEPORNOT, it will have no value, so it will effectively disappear from the command line where it is used.

6

Here is another pattern, *, which matches anything. If no other pattern has been matched, this case will be executed. It is, in effect, an “else” clause for the case statement.

7

When we’re done parsing the options, we can get rid of the ones we’ve already processed with a shift. Just a single shift gets rid of a single argument so that the second argument becomes the first, the third becomes the second, and so on. Specifying a number like shift 5 will get rid of the first five arguments so that $6 becomes $1, $7 becomes $2, and so on. Calls to getopts keep track of which arguments to process in the shell variable OPTIND. It refers to the next argument to be processed. By shifting by this amount, we get rid of any/all of the options that we parsed. After this shift, $1 will refer to the first nonoption argument, whether or not any options were supplied when the user invoked the script.

8

The two possible arguments that aren’t in -option format are the pattern we’re searching for and the directory where we want to start our search. When we refer to a bash variable, we can add a :- to say, “If that value is empty or unset, return this default value instead.” We give a default value for PATTERN as PDF document, and the default for STARTDIR is ., which refers to the current directory.
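
The :- behavior is easy to see in isolation with a scratch variable:

$ unset VAL
$ echo "${VAL:-fallback}"
fallback
$ VAL=set
$ echo "${VAL:-fallback}"
set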

9

We invoke the find command, telling it to start its search in $STARTDIR. Remember that $DEEPORNOT may be unset and thus add nothing to the command line, or it may be the default -maxdepth 1, telling find not to go any deeper than this directory. We’ve added a -type f so that we find only plain files (not directories or special device files or FIFOs). That isn’t strictly necessary, and you could remove it if you want to be able to search for those kinds of files. The names of the files found are piped in to the while loop, which will read them one at a time into the variable FN.

10

The -q option to egrep tells it to be quiet and not output anything. We don’t need to see what phrase it found, only that it found it.

11

The $? construct is the value returned by the previous command. A successful result means that egrep found the pattern supplied.

12

This checks to see whether COPY has a value. If it is null, the if will be false.

13

The -p option to the cp command will preserve the mode, ownership, and timestamps of the file, in case that information is important to your analysis.

If you are looking for a lighter-weight but less-capable solution, you can perform a similar search using the find command’s exec option as shown in this example:

find / -type f -exec file '{}' \; | egrep 'PNG' | cut -d' ' -f1

Here we send each item found by the find command into file to identify its type. We then pipe the output of file into egrep and filter it, looking for the PNG keyword. The use of cut is simply to clean up the output and make it more readable.

Warning

Be cautious if using the file command on an untrusted system. The file command uses the magic pattern file located in /usr/share/misc/. A malicious user could modify this file such that certain file types would not be identified. A better option is to mount the suspect drive to a known-good system and search from there.

Searching by Message Digest Value

A cryptographic hash function is a one-way function that transforms an input message of arbitrary length into a fixed-length message digest. Common hash algorithms include MD5, SHA-1, and SHA-256. Consider the two files in Examples 5-5 and 5-6.

Example 5-5. hashfilea.txt
This is hash file A
Example 5-6. hashfileb.txt
This is hash file B

Notice that the files are identical except for the last letter in the sentence. You can use the sha1sum command to compute the SHA-1 message digest of each file:

$ sha1sum hashfilea.txt hashfileb.txt

6a07fe595f9b5b717ed7daf97b360ab231e7bbe8 *hashfilea.txt
2959e3362166c89b38d900661f5265226331782b *hashfileb.txt

Even though there is only a small difference between the two files, they generated completely different message digests. Had the files been the same, the message digests would have also been the same. You can use this property of hashing to search the system for a specific file if you know its digest. The advantage is that the search will not be influenced by the filename, location, or any other attributes; the disadvantage is that the files need to be exactly the same. If the file contents have changed in any way, the search will fail. The script hashsearch.sh, shown in Example 5-7, recursively searches the system, starting at the location provided by the user. It performs a SHA-1 hash of each file that is found and then compares the digest to the value provided by the user. If a match is found, the script outputs the file path.

Example 5-7. hashsearch.sh
#!/bin/bash -
#
# Cybersecurity Ops with bash
# hashsearch.sh
#
# Description:
# Recursively search a given directory for a file that
# matches a given SHA-1 hash
#
# Usage:
# hashsearch.sh <hash> <directory>
#   hash - SHA-1 hash value of the file to find
#   directory - Top directory to start search
#

HASH=$1
DIR=${2:-.}	# default is here, cwd

# convert pathname into an absolute path
function mkabspath ()				6
{
    if [[ $1 == /* ]]				7
    then
    	ABS=$1
    else
    	ABS="$PWD/$1"				8
    fi
}

find $DIR -type f |				1
while read fn
do
    THISONE=$(sha1sum "$fn")			2
    THISONE=${THISONE%% *}			3
    if [[ $THISONE == $HASH ]]
    then
	mkabspath "$fn"				4
	echo $ABS				5
    fi
done
1

We’ll look for any plain file for our hash. We need to avoid special files; reading a FIFO would cause our program to hang as it waited for someone to write into the FIFO. Reading a block special or character special file would also not be a good idea. The -type f ensures that we get only plain files. It prints those filenames, one per line, to stdout, which we redirect via a pipe into the while read commands.

2

This computes the hash value in a subshell and captures its output (i.e., whatever it writes to stdout) and assigns it to the variable. The quotes are needed in case the filename has spaces in its name.

3

This reassignment removes from the righthand side the largest substring beginning with a space. The output from sha1sum is both the computed hash and the filename. We want only the hash value, so we remove the filename with this substitution.

4

We call the mkabspath function, putting the filename in quotes. The quotes make sure that the entire filename shows up as a single argument to the function, even if the filename has one or more spaces in the name.

5

Remember that shell variables are global unless declared to be local within a function. Therefore, the value of ABS that was set in the call to mkabspath is available to us here.

6

This is our declaration of the function. When declaring a function, you can omit either the keyword function or the parentheses, but not both.

7

For the comparison, we are using shell pattern matching on the righthand side. This will check whether the first parameter begins with a slash. If it does, this is already an absolute pathname and we need do nothing further.

8

When the parameter is only a relative path, it is relative to the current location, so we prepend the current working directory, thereby making it absolute. The variable PWD is a shell variable that is set to the current directory via the cd command.
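
If you do not need the absolute-path handling that hashsearch.sh provides, a lighter-weight alternative is a find one-liner, similar to the one shown earlier for file types. For example, to search for the digest of hashfilea.txt computed earlier:

find . -type f -exec sha1sum '{}' \; | grep '6a07fe595f9b5b717ed7daf97b360ab231e7bbe8'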

Transferring Data

Once you have gathered all of the desired data, the next step is to move it off the origin system for further analysis. To do that, you can copy the data to a removable device or upload it to a centralized server. If you are going to upload the data, be sure to do so using a secure method such as Secure Copy (SCP). The following example uses scp to upload the file some_system.tar.gz to the home directory of user bob on remote system 10.0.0.45:

scp some_system.tar.gz bob@10.0.0.45:/home/bob/some_system.tar.gz

For convenience, you can add a line at the end of your collection scripts to automatically use scp to upload data to a specified host. Remember to give your files unique names, so as to not overwrite existing files as well as to make analysis easier later.
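
One simple approach is to build the hostname and collection date into the destination filename at upload time (host and user are the same hypothetical values as above):

scp some_system.tar.gz bob@10.0.0.45:/home/bob/${HOSTNAME}_$(date +%F)_data.tar.gz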

Warning

Be cautious of how you perform SSH or SCP authentication within scripts. It is not recommended that you include passwords in your scripts. The preferred method is to use SSH certificates. The keys and certificates can be generated using the ssh-keygen command.
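
A minimal key-based setup, assuming the same remote host as in the earlier example, looks like this:

# generate a key pair, accepting the default location
ssh-keygen -t rsa -b 4096

# install the public key on the remote system (prompts for the password once)
ssh-copy-id bob@10.0.0.45

# subsequent transfers authenticate with the key instead of a password
scp some_system.tar.gz bob@10.0.0.45:/home/bob/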

Summary

Gathering data is an important step in defensive security operations. When collecting data, be sure to transfer and store it by using secure (i.e., encrypted) methods. As a general rule, gather all data that you think is relevant; you can easily delete data later, but you cannot analyze data you did not collect. Before collecting data, first confirm that you have permission and/or legal authority to do so.

Also be aware that when dealing with adversaries, they will often try to hide their presence by deleting or obfuscating data. To counter that, be sure to use multiple methods when searching for files (name, hash, contents, etc.).

In the next chapter, we explore techniques for processing data and preparing it for analysis.

Workshop

  1. Write the command to search the filesystem for any file named dog.png.

  2. Write the command to search the filesystem for any file containing the text confidential.

  3. Write the command to search the filesystem for any file containing the text secret or confidential and copy the file to your current working directory.

  4. Write the command to execute ls -R / on the remote system 192.168.10.32 and write the output to a file named filelist.txt on your local system.

  5. Modify getlocal.sh to automatically upload the results to a specified server by using SCP.

  6. Modify hashsearch.sh to have an option (-1) to quit after finding a match. If the option is not specified, it will keep searching for additional matches.

  7. Modify hashsearch.sh to simplify the full pathname that it prints out:

    1. If the string it outputs is /home/usr07/subdir/./misc/x.data, modify it to remove the redundant ./ before printing it out.

    2. If the string is /home/usr/07/subdir/../misc/x.data, modify it to remove the ../ and also the subdir/ before printing it out.

  8. Modify winlogs.sh to indicate its progress by printing the logfile name over the top of the previous logfile name. (Hint: Use a return character rather than a newline.)

  9. Modify winlogs.sh to show a simple progress bar of plus signs building from left to right. Use a separate invocation of wevtutil el to get the count of the number of logs and scale this to, say, a width of 60.

  10. Modify winlogs.sh to tidy up; that is, to remove the extracted logfiles (the .evtx files) after it has tar’d them up. There are two very different ways to do this.

Visit the Cybersecurity Ops website for additional resources and the answers to these questions.
