Data is the lifeblood of nearly every defensive security operation. Data tells you the current state of the system, what has happened in the past, and even what might happen in the future. Data is needed for forensic investigations, verifying compliance, and detecting malicious activity. Table 5-1 describes data that is commonly relevant to defensive operations and where it is typically located.
Throughout this chapter, we explore various methods to gather data, locally and remotely, from both Linux and Windows systems.
We introduce cut, file, and head, and for Windows systems reg and wevtutil, to select and gather data of interest from local and remote systems.
cut is a command used to extract select portions of a file. It reads a supplied input file line by line and parses each line based on a specified delimiter. If no delimiter is specified, cut will use a tab character by default. The delimiter characters divide each line of a file into fields. You can use either the field number or the character position number to extract parts of the file. Fields and characters start at position 1.
The file cutfile.txt is used to demonstrate the cut command. It consists of two lines, each with three columns of data, as shown in Example 5-1.
12/05/2017 192.168.10.14 test.html
12/30/2017 192.168.10.185 login.html
In cutfile.txt, each field is delimited by a space. To extract the IP address (field position 2), you can use the following command:
$ cut -d' ' -f2 cutfile.txt

192.168.10.14
192.168.10.185
The -d' ' option specifies the space as the field delimiter. The -f2 option tells cut to return the second field, in this case, the IP address.
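As a quick illustration using the same sample line, cut can also pull several fields at once with a comma-separated list, or slice by character position with -c:

```shell
# Extract fields 1 and 3 (the date and the page) in one pass
echo '12/30/2017 192.168.10.185 login.html' | cut -d' ' -f1,3
# Slice by character position instead: the first 10 characters are the date
echo '12/30/2017 192.168.10.185 login.html' | cut -c1-10
```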
The cut command considers each delimiter character as separating a field. It doesn't collapse whitespace. Consider the following example:
Pat   25
Pete  12
If we use cut on this file, we would define the delimiter to be a space. In the first record there are three spaces between the name (Pat) and the number (25). Thus, the number is in field 4. However, for the next line, the number (12) is in field 3, since there are only two space characters between the name and the number. For a data file like this, it would be better to separate the names from the numbers with a single tab character and use that as the delimiter for cut.
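If you cannot reformat the data file itself, one common workaround (not part of the original example) is to squeeze each run of spaces down to a single space with tr -s before handing the line to cut:

```shell
# Collapse repeated spaces so the number is always in field 2
printf 'Pat   25\nPete  12\n' | tr -s ' ' | cut -d' ' -f2
```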
The file command is used to help identify a given file's type. This is particularly useful in Linux, where most files are not required to have an extension that identifies their type (unlike Windows, which uses extensions such as .exe). The file command looks deeper than the filename by reading and analyzing the first block of data, which contains the file's magic number. Even if you rename a .png image file to end with .jpg, the file command is smart enough to figure that out and tell you the correct file type (in this case, a PNG image file).
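You can see this behavior with a small experiment. This sketch (the path and filename are ours, not from the chapter) fabricates a file containing only the PNG signature plus a minimal IHDR chunk for a 1 x 1 image, then gives it a misleading .jpg extension; file still reports it as PNG data because it reads the magic number, not the name:

```shell
# Fabricate a minimal PNG header (signature + IHDR chunk for a 1x1 image)
# under a misleading .jpg extension
printf '\x89PNG\r\n\x1a\n\x00\x00\x00\x0dIHDR' > /tmp/suspect.jpg
printf '\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00' >> /tmp/suspect.jpg
file /tmp/suspect.jpg
```

Despite the extension, the output names PNG image data.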
Wevtutil is a command-line utility used to view and manage system logs in the Windows environment. It is available in most modern versions of Windows and is callable from Git Bash.
To list all of the available logs:
wevtutil el
To view the most recent event in the System log via Git Bash:
wevtutil qe System //c:1 //rd:true
For additional information about the wevtutil command, see Microsoft's documentation.
One of the first steps in defending a system is understanding the state of the system and what it is doing. To accomplish this, you need to gather data, either locally or remotely, for analysis.
The data you want may not always be available locally. You may need to connect to a remote system such as a web, File Transfer Protocol (FTP), or SSH server to obtain the desired data.
Commands can be executed remotely and securely by using SSH if the remote system is running the SSH service. In its basic form (no options), you can just add ssh and a hostname in front of any shell command to run that command on the specified host. For example, ssh myserver who will run the who command on the remote machine myserver. If you need to specify a different username, ssh username@myserver who or ssh -l username myserver who both do the same thing. Just replace username with the username you would like to use to log in. You can redirect the output to a file on your local system, or to a file on the remote system.
To run a command on a remote system and redirect the output to a file on your local system:
ssh myserver ps > /tmp/ps.out
To run a command on a remote system and redirect the output to a file on the remote system:
ssh myserver ps \> /tmp/ps.out
The backslash will escape the special meaning of the redirect (in the current shell) and simply pass the redirect character as the second word of the three words sent to myserver. When executed on the remote system, it will be interpreted by that shell and redirect the output on the remote machine (myserver) and leave it there.
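An equivalent to escaping just the redirect character is to quote the entire remote command, so the local shell never interprets any of it. The sketch below demonstrates the principle locally, using bash -c as a stand-in for the remote shell that ssh would invoke (myserver is the book's example hostname; the /tmp path is ours):

```shell
# ssh myserver 'ps > /tmp/ps.out'   # quoted form: redirect runs on myserver
# Local stand-in: bash -c plays the role of the remote shell
bash -c 'echo remote-output > /tmp/remote_demo.out'
cat /tmp/remote_demo.out            # the inner shell performed the redirect
```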
In addition, you can take scripts that reside on your local system and run them on a remote system using SSH. You’d use this command to run the osdetect.sh script remotely:
ssh myserver bash < ./osdetect.sh
This runs the bash command on the remote system, but passes into it the lines of the osdetect.sh script directly from your local system. This avoids the need for a two-step process of, first, transferring the script to the remote system and, then, running that copied script. Output from running the script comes back to your local system and can be captured by redirecting stdout, as we have shown with many other commands.
Logfiles for a Linux system are normally stored in the /var/log/ directory. To easily collect the logfiles into a single file, use the tar command:
tar -czf ${HOSTNAME}_logs.tar.gz /var/log/
The option -c is used to create an archive file, -z to zip the file, and -f to specify a name for the output file. The HOSTNAME variable is a bash variable that is automatically set by the shell to the name of the current host. We include it in our filename so the output file will be given the same name as the system, which will help later with organization if logs are collected from multiple systems. Note that you will need to be logged in as a privileged user or use sudo in order to successfully copy the logfiles.
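After collecting logs this way, it is worth verifying the archive before moving on. A short sketch (the /tmp/logdemo directory is our own stand-in for /var/log/); tar's -t option lists an archive's contents without extracting it:

```shell
# Build a small stand-in log directory and archive it
mkdir -p /tmp/logdemo
echo 'sample entry' > /tmp/logdemo/auth.log
tar -czf /tmp/demo_logs.tar.gz -C /tmp logdemo
# -t lists the archive's contents without extracting anything
tar -tzf /tmp/demo_logs.tar.gz
```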
Table 5-2 lists some important and common Linux logs and their standard locations.
Log location | Description
---|---
/var/log/apache2/ | Access and error logs for the Apache web server
/var/log/auth.log | Information on user logins, privileged access, and remote authentication
/var/log/kern.log | Kernel logs
/var/log/messages | General noncritical system information
/var/log/syslog | General system logs
To find more information on where logfiles are being stored for a given system, refer to /etc/syslog.conf or /etc/rsyslog.conf on most Linux distributions.
In the Windows environment, wevtutil can be used to manipulate and gather logfiles. Luckily, this command is callable from Git Bash. The winlogs.sh script, shown in Example 5-2, uses the wevtutil el parameter to list all available logs, and then the epl parameter to export each log to a file.
#!/bin/bash -
#
# Cybersecurity Ops with bash
# winlogs.sh
#
# Description:
# Gather copies of Windows log files
#
# Usage:
# winlogs.sh [-z]
#   -z  Tar and zip the output
#
TGZ=0
if (( $# > 0 ))
then
    if [[ ${1:0:2} == '-z' ]]
    then
        TGZ=1    # tgz flag to tar/zip the log files
        shift
    fi
fi
SYSNAM=$(hostname)
LOGDIR=${1:-/tmp/${SYSNAM}_logs}

mkdir -p $LOGDIR
cd ${LOGDIR} || exit -2

wevtutil el | while read ALOG
do
    ALOG="${ALOG%$'\r'}"
    echo "${ALOG}:"
    SAFNAM="${ALOG// /_}"
    SAFNAM="${SAFNAM//\//-}"
    wevtutil epl "$ALOG" "${SYSNAM}_${SAFNAM}.evtx"
done

if (( TGZ == 1 ))
then
    tar -czf ${SYSNAM}_logs.tgz *.evtx
fi
The script begins with a simple initialization and then an if
statement, one that checks to see whether any arguments were provided to the script. The $#
is a special shell variable whose value is the number of arguments supplied on the command line when this script is invoked. This conditional for the if
is an arithmetic expression, because of the double parentheses. Therefore, the comparison can use the greater-than character (>
) and it will do a numerical comparison. If that symbol is used in an if
expression with square brackets rather than double parentheses, >
does a comparison of lexical ordering—alphabetical order. You would need to use -gt
for a numerical comparison inside square brackets.
For this script, the only argument we are supporting is a -z
option to indicate that the logfiles should all be zipped up into a single TAR file when it’s done collecting logfiles. This also means that we can use a simple type of argument parsing. We will use a more sophisticated argument parser (getopts
) in an upcoming script.
This check takes a substring of the first argument ($1
) starting at the beginning of the string (an offset of 0 bytes), 2 bytes long. If the argument is, in fact, a -z
, we will set a flag. The script also does a shift
to remove that argument. What was the second argument, if any, is now the first. The third, if any, becomes the second, and so on.
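The two operations can be seen in isolation. In this minimal sketch (the argument values are our own), set -- loads test arguments into the positional parameters, ${1:0:2} takes the 2-character substring, and shift slides the remaining arguments down:

```shell
set -- -zfoo /tmp/mylogs      # simulate: winlogs.sh -zfoo /tmp/mylogs
echo "${1:0:2}"               # substring of $1: offset 0, length 2 -> -z
shift                         # discard $1; /tmp/mylogs becomes $1
echo "$1"
```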
If the user wants to specify a location for the logs, it can be specified as an argument to the script. The optional -z argument, if supplied, has already been shift-ed out of the way, so any user-supplied path would now be the first argument. If no value was supplied on the command line, the expression inside the braces will return a default value as indicated to the right of the minus sign. We use the braces around SYSNAM because the _logs would otherwise be considered part of the variable name.
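The ${parameter:-default} form is easy to try on its own. In this sketch we clear the positional parameters to simulate running the script with no path argument (variable names mirror the script's):

```shell
set --                              # simulate: no arguments supplied
SYSNAM=$(hostname)
LOGDIR=${1:-/tmp/${SYSNAM}_logs}    # $1 is unset, so the default is used
echo "$LOGDIR"
```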
The -p
option to mkdir
will create the directory and any intervening directories. It will also not give an error message if the directory exists. On the next line, we invoke cd
to make that directory the current directory, where the logfiles will be saved; if the cd
should fail, the program will exit with an error code.
Here we invoke wevtutil el
to list all the possible logfiles.
The output is piped into a while
loop that will read one line (one log filename) at a time.
Since this is running on a Windows system, each line printed by wevtutil will end with both a newline (\n) and a return (\r) character. We remove the return character from the right side of the string by using the % operator. To specify the (nonprinting) return character, we use the $'string' construct, which substitutes certain backslash-escaped characters with nonprinting characters (as defined in the ANSI C standard). So the two characters of \r are replaced with an ASCII 13 character, the return character.
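Here is the trimming operation on its own, with a sample log name (ours) that carries a trailing return character:

```shell
ALOG=$'Application\r'        # $'...' lets us embed the nonprinting \r
ALOG="${ALOG%$'\r'}"         # %: remove the shortest match from the right
printf '[%s]\n' "$ALOG"      # brackets make the cleaned boundary visible
```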
We echo the filename to provide an indication to the user of progress being made and which log is currently being fetched.
To create the filename into which we want wevtutil
to store its output (the logfile), we make two edits to the name. First, since the name of the log as provided may have blanks, we replace any blank with an underscore character. While not strictly necessary, the underscore avoids the need for quotes when using the filename. The syntax, in general, is ${VAR/old/new}
to retrieve the value of VAR
with a substitution: replacing old
with new
. Using a double slash, ${VAR//old/new}
replaces all occurrences, not just the first.
A common mistake is to type ${VAR/old/new/}
, but the trailing slash is not part of the syntax and will simply be added to the resulting string if a substitution is made. For example, if VAR=embolden
then ${VAR/old/new/}
would return embnew/en
.
Second, some Windows logfile names have a slash character in them. In bash, however, the / is the separator between directories when used in a pathname. It shouldn't be used in a filename, so we make another substitution using the ${VAR/old/new} syntax, to replace any / with a - character. Notice, though, that we have to "escape" the meaning of the / in our substitution so that bash doesn't think it's part of the substitution syntax. We use \/ to indicate that we want a literal slash.
This is another arithmetic expression, enclosed in double parentheses. Within those expressions, bash doesn’t require the $
in front of most variable names. It would still be needed for positional parameters like $1
to avoid confusion with the integer 1.
Here we use tar
to gather all the .evtx
files into one archive. We use the -z
option to compress the data, but we don’t use the -v
option so that tar
does its work silently (since our script already echoed the filenames as it extracted them).
The script runs in a subshell, so although we have changed directories inside the script, once the script exits, we are back in the directory where we started. If we needed to be back in the original directory inside the script, we could use the cd -
command to return to the previous directory.
If you are able to arbitrarily execute commands on a system, you can use standard OS commands to collect a variety of information about the system. The exact commands you use will vary based on the operating system you are interfacing with. Table 5-3 shows common commands that can yield a great deal of information from a system. Note that the command may be different depending on whether it is run within the Linux or Windows environment.
Linux command | Windows Git Bash equivalent | Purpose
---|---|---
uname -a | uname -a | Operating system version information
cat /proc/cpuinfo | systeminfo | Display system hardware and related info
ifconfig | ipconfig | Network interface information
route | route print | Display routing table
arp -a | arp -a | Display Address Resolution Protocol (ARP) table
netstat -a | netstat -a | Display network connections
mount | net share | Display filesystems
ps -e | tasklist | Display running processes
The script getlocal.sh, shown in Example 5-3, is designed to identify the operating system type using osdetect.sh, run the various commands appropriate for the operating system type, and record the results to a file.
The output from each command is stored in Extensible Markup Language (XML) format, i.e., delimited with XML tags, for easier processing later. Invoke the script like this: bash getlocal.sh < cmds.txt
, where the file cmds.txt contains a list of commands similar to those shown in Table 5-3. The format it expects is those fields, separated by vertical bars, plus an additional field: the XML tag with which to mark the output of the command. (Also, lines beginning with a # are considered comments and will be ignored.)
Here is what a cmds.txt file might look like:
# Linux Command   |MSWin Bash  |XML tag    |Purpose
#-----------------+------------+-----------+------------------------------
uname -a          |uname -a    |uname      |O.S. version etc
cat /proc/cpuinfo |systeminfo  |sysinfo    |system hardware and related info
ifconfig          |ipconfig    |nwinterface|Network interface information
route             |route print |nwroute    |routing table
arp -a            |arp -a      |nwarp      |ARP table
netstat -a        |netstat -a  |netstat    |network connections
mount             |net share   |diskinfo   |mounted disks
ps -e             |tasklist    |processes  |running processes
Example 5-3 shows the source for the script.
#!/bin/bash -
#
# Cybersecurity Ops with bash
# getlocal.sh
#
# Description:
# Gathers general system information and dumps it to a file
#
# Usage:
# bash getlocal.sh < cmds.txt
#   cmds.txt is a file with list of commands to run
#

# SepCmds - separate the commands from the line of input
function SepCmds ()
{
    LCMD=${ALINE%%|*}
    REST=${ALINE#*|}
    WCMD=${REST%%|*}
    REST=${REST#*|}
    TAG=${REST%%|*}

    if [[ $OSTYPE == "MSWin" ]]
    then
        CMD="$WCMD"
    else
        CMD="$LCMD"
    fi
}

function DumpInfo ()
{
    printf '<systeminfo host="%s" type="%s"' "$HOSTNAME" "$OSTYPE"
    printf ' date="%s" time="%s">\n' "$(date '+%F')" "$(date '+%T')"

    readarray CMDS
    for ALINE in "${CMDS[@]}"
    do
        # ignore comments
        if [[ ${ALINE:0:1} == '#' ]] ; then continue ; fi

        SepCmds
        if [[ ${CMD:0:3} == N/A ]]
        then
            continue
        else
            printf "<%s>\n" $TAG
            $CMD
            printf "</%s>\n" $TAG
        fi
    done
    printf "</systeminfo>\n"
}

OSTYPE=$(./osdetect.sh)
HOSTNM=$(hostname)
TMPFILE="${HOSTNM}.info"

# gather the info into the tmp file; errors, too
DumpInfo > $TMPFILE 2>&1
After the two function definitions the script begins here, invoking our osdetect.sh script (from Chapter 2). We’ve specified the current directory as its location. You could put it elsewhere, but then be sure to change the specified path from ./
to wherever you put it and/or add that location to your PATH
variable.
To make things more efficient, you can include the code from osdetect.sh directly in getlocal.sh.
Next, we run the hostname
program in a subshell to retrieve the name of this system for use in the next line but also later in the DumpInfo
function.
We use the hostname as part of the temporary filename where we will put all our output.
Here is where we invoke the function that will do most of the work of this script. We redirect both stdout and stderr (to the same file) when invoking the function so that the function doesn’t have to put redirects on any of its output statements; it can write to stdout, and this invocation will redirect all the output as needed. Another way to do this is to put the redirect on the closing brace of the DumpInfo
function definition. Redirecting stdout might instead be left to the user who invokes this script; it would simply write to stdout by default. But if the user wants the output in a file, the user has to create a tempfile name and has to remember to redirect stderr as well. Our approach is suitable for a less experienced user.
Here is where the “guts” of the script begins. This function begins with output of an XML tag called <systeminfo>
, which will have its closing tag written out at the end of this function.
The readarray
command in bash will read all the lines of input (until end-of-file or on keyboard input until Ctrl-D). Each line will be its own entry in the array named, in this case, CMDS
.
This for
loop will loop over the values of the CMDS
array—over each line, one at a time.
This line uses the substring operation to take the character at position 0, of length 1, from the variable ALINE
. The hashtag (#
), or pound sign, is in quotes so that the shell doesn’t interpret it as the start of the script’s own comment.
If the line is not a comment, the script will call the SepCmds
function. More about that function later; it separates the line of input into CMD
and TAG
, where CMD
will be the appropriate command for a Linux or Windows system, depending on where we run the script.
Here, again, we use the substring operation from the start of the string (position 0) of length 3 to look for the string that indicates there is no appropriate operation on this particular operating system for the desired information. The continue
statement tells bash to skip to the next iteration of the loop.
If we do have an appropriate action to take, this section of code will print the specified XML tag on either side of the invocation of the specified command. Notice that we invoke the command by retrieving the value of the variable CMD
.
Here we isolate the Linux command from a line of our input file by removing all the characters to the right of the vertical bar, including the bar itself. The %%
says to make the longest match possible on the right side of the variable’s value and remove it from the value it returns (i.e., ALINE
isn’t changed).
Here the # removes the shortest match from the left side of the variable's value. Thus, it removes the Linux command that was just put in LCMD.
Again, we remove everything to the right of the vertical bar, but this time we are working with REST, modified in the previous statement. This gives us the MSWindows
command.
Here we extract the XML tag by using the same substitution operations we’ve seen twice already.
All that’s left in this function is the decision, based on the operating system type, as to which value to return as the value in CMD
. All variables are global unless explicitly declared as local within a function. None of ours are local, so they can be used (set, changed, or used) throughout the script.
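The whole parsing sequence can be traced with one sample line from cmds.txt; the brackets in the output make the retained trailing spaces visible:

```shell
ALINE='uname -a |uname -a |uname |O.S. version etc'
LCMD=${ALINE%%|*}      # longest match of |* removed from the right
REST=${ALINE#*|}       # shortest match of *| removed from the left
WCMD=${REST%%|*}
REST=${REST#*|}
TAG=${REST%%|*}
printf '[%s][%s][%s]\n' "$LCMD" "$WCMD" "$TAG"
```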
When running this script, you can use the cmds.txt file as shown or change its values to get whatever set of information you want to collect. You can also run it without redirecting the input from a file; simply type (or copy/paste) the input after the script is invoked.
The Windows Registry is a vast repository of settings that define how the system and applications will behave. Specific registry key values can often be used to identify the presence of malware and other intrusions. Therefore, a copy of the registry is useful when later performing analysis of the system.
To export the entire Windows Registry to a file using Git Bash:
regedit //E ${HOSTNAME}_reg.bak
Note that two forward slashes are used before the E
option because we are calling regedit
from Git Bash; only one would be needed if using the Windows Command Prompt. We use ${HOSTNAME}
as part of the output filename to make it easier to organize later.
If needed, the reg
command can also be used to export sections of the registry or individual subkeys. To export the HKEY_LOCAL_MACHINE
hive using Git Bash:
reg export HKEY_LOCAL_MACHINE ${HOSTNAME}_hklm.bak
The ability to search the system is critical for everything from organizing files, to incident response, to forensic investigation. The find and grep commands are extremely powerful and can be used to perform a variety of search functions.
Searching by filename is one of the most basic search methods. This is useful if the exact filename is known, or a portion of the filename is known. To search the Linux /home directory and subdirectories for filenames containing the word password:
find /home -name '*password*'
Note that the use of the *
character at the beginning and end of the search string designates a wildcard, meaning it will match any (or no) characters. This is a shell pattern and is not the same as a regular expression. Additionally, you can use the -iname
option instead of -name
to make the search case-insensitive.
To perform a similar search on a Windows system using Git Bash, simply replace /home
with /c/Users
.
Hidden files are often interesting as they can be used by people or malware looking to avoid detection. In Linux, names of hidden files begin with a period. To find hidden files in the /home directory and subdirectories:
find /home -name '.*'
The .*
in the preceding example is a shell pattern, which is not the same as a regular expression. In the context of find
, the “dot-star” pattern will match on any file that begins with a period and is followed by any number of additional characters (denoted by the *
wildcard character).
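A quick sketch (the /tmp/hiddemo directory and filenames are ours); the -type f test is our addition, used here just to restrict the match to regular files:

```shell
mkdir -p /tmp/hiddemo
touch /tmp/hiddemo/.secret /tmp/hiddemo/visible.txt
find /tmp/hiddemo -name '.*' -type f    # only the dot-file matches
```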
In Windows, hidden files are designated by a file attribute, not the filename. From the Windows Command Prompt, you can identify hidden files on the c: drive as follows:
dir c: /S /A:H
The /S
option tells dir
to recursively traverse subdirectories, and the /A:H
displays files with the hidden attribute. Unfortunately, Git Bash intercepts the dir
command and instead executes ls
, which means it cannot easily be run from bash. This can be solved by using the find
command’s -exec
option coupled with the Windows attrib
command.
The find command has the ability to run a specified command for each file that is found. To do that, you can use the -exec option after specifying your search criteria. find replaces any curly braces ({}) with the pathname of the file that was found, and the escaped semicolon (\;) terminates the command expression:
$ find /c -exec attrib '{}' \; | egrep '^.{4}H.*'

A  H         C:\Users\Bob\scripts\hist.txt
A  HR        C:\Users\Bob\scripts\winlogs.sh
The find
command will execute the Windows attrib
command for each file it identifies on the c: drive (denoted as /c
), thereby printing out each file’s attributes. The egrep
command is then used with a regular expression to identify lines where the fifth character is the letter H, which will be true if the file’s hidden attribute is set.
If you want to clean up the output further and display only the file path, you can do so by piping the output of egrep
into the cut
command:
$ find . -exec attrib '{}' \; | egrep '^.{4}H.*' | cut -c22-

C:\Users\Bob\scripts\hist.txt
C:\Users\Bob\scripts\winlogs.sh
The -c
option tells cut
to use character position numbers for slicing. 22-
tells cut
to begin at character 22
, which is the beginning of the file path, and continue to the end of the line (-
). This can be useful if you want to pipe the file path into another command for further processing.
The find
command’s -size
option can be used to find files based on file size. This can be useful to help identify unusually large files, or to identify the largest or smallest files on a system.
To search for files greater than 5 GB in size in the /home directory and subdirectories:
find /home -size +5G
To identify the largest files in the system, you can combine find
with a few other commands:
find / -type f -exec ls -s '{}' \; | sort -n -r | head -5
First, we use find / -type f
to list all of the files in and under the root directory. Each file is passed to ls -s
, which will identify its size in blocks (not bytes). The list is then sorted from highest to lowest, and the top five are displayed using head
. To see the smallest files in the system, tail
can be used in place of head
, or you can remove the reverse (-r
) option from sort
.
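Here is the same pipeline scoped to a small throwaway directory (ours), which keeps the run fast and shows the shape of the output without scanning the whole filesystem:

```shell
mkdir -p /tmp/sizedemo
head -c 8192 /dev/zero > /tmp/sizedemo/big.bin     # 8 KB file
head -c 16   /dev/zero > /tmp/sizedemo/small.bin   # 16-byte file
# Size in blocks, largest first; head -1 keeps only the biggest
find /tmp/sizedemo -type f -exec ls -s '{}' \; | sort -n -r | head -1
```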
In the shell, you can use !!
to represent the last command that was executed. You can use it to execute a command again, or include it in a series of piped commands. For example, suppose you just ran the following command:
find / -type f -exec ls -s '{}' \;
You can then use !!
to run that command again or feed it into a pipeline:
!! | sort -n -r | head -5
The shell will automatically replace !!
with the last command that was executed. Give it a try!
You can also use the ls
command directly to find the largest file and completely eliminate the use of find
, which is significantly more efficient. To do that, just add the -R
option for ls
, which will cause it to recursively list the files under the specified directory:
ls / -R -s | sort -n -r | head -5
The filesystem can also be searched based on when files were last accessed or modified. This can be useful when investigating incidents to identify recent system activity. It can also be useful for malware analysis, to identify files that have been accessed or modified during program execution.
To search for files in the /home directory and subdirectories modified less than 5 minutes ago:
find /home -mmin -5
To search for files modified less than 24 hours ago:
find /home -mtime -1
The number specified with the mtime
option is a multiple of 24 hours, so 1 means 24 hours, 2 means 48 hours, etc. A negative number here means “less than” the number specified, a positive number means “greater than,” and an unsigned number means “exactly.”
To search for files modified more than 2 days (48 hours) ago:
find /home -mtime +2
To search for files accessed less than 24 hours ago, use the -atime
option:
find /home -atime -1
To search for files in the /home directory accessed less than 24 hours ago and copy (cp
) each file to the current working directory (./
):
find /home -type f -atime -1 -exec cp '{}' ./ \;
The use of -type f
tells find
to match only ordinary files, ignoring directories and other special file types. You may also copy the files to any directory of your choosing by replacing the ./
with an absolute or relative path.
Be sure that your current working directory is not somewhere in the /home hierarchy, or you will have the copies found and thus copied again.
The grep
command can be used to search for content inside files. To search for files in the /home
directory and subdirectories that contain the string password:
grep -i -r /home -e 'password'
The -r
option recursively searches all directories below /home
, -i
specifies a case-insensitive search, and -e
specifies the regex pattern string to search for.
The -n
option can be used identify which line in the file contains the search string, and -w
can be used to match only whole words.
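A small demonstration of both options (the sample file and its contents are ours):

```shell
printf 'password=123\nmy_password_file\nsecret word\n' > /tmp/grepdemo.txt
# -n prefixes the line number; -w rejects 'password' buried in a larger word,
# so the my_password_file line does not match
grep -n -w 'password' /tmp/grepdemo.txt
```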
You can combine grep
with find
to easily copy matching files to your current working directory (or any specified directory):
find /home -type f -exec grep 'password' '{}' \; -exec cp '{}' . \;
First, we use find /home/ -type f
to identify all of the files in and below the /home directory. Each file found is passed to grep
to search for password within its content. Each file matching the grep
criteria is then passed to the cp
command to copy the file to the current directory (indicated by the dot). This combination of commands may take a considerable amount of time to execute and is a good candidate to run as a background task.
Searching a system for specific file types can be challenging. You cannot rely on the file extension, if one even exists, as that can be manipulated by the user. Thankfully, the file
command can help identify types by comparing the contents of a file to known patterns called magic numbers. Table 5-4 lists common magic numbers and their starting locations inside files.
File type | Magic number pattern (hex) | Magic number pattern (ASCII) | File offset (bytes)
---|---|---|---
JPEG | FF D8 FF DB | ÿØÿÛ | 0
DOS executable | 4D 5A | MZ | 0
Executable and linkable format | 7F 45 4C 46 | .ELF | 0
Zip file | 50 4B 03 04 | PK.. | 0
To begin, you need to identify the type of file for which you want to search. Let’s assume you want to find all PNG image files on the system. First, you would take a known-good file such as Title.png, run it through the file
command, and examine the output:
$ file Title.png

Title.png: PNG image data, 366 x 84, 8-bit/color RGBA, non-interlaced
As expected, file identifies the known-good Title.png file as PNG image data and also provides the dimensions and various other attributes. Based on this information, you need to determine what part of the file command output to use for the search, and generate the appropriate regular expression. In many cases, such as with forensic discovery, you are likely better off gathering more information than less; you can always further filter the data later. To do that, you will use a very broad regular expression that simply searches for the word PNG in the output from the file command:

'PNG'
You can, of course, make more-advanced regular expressions to identify specific files. For example, if you wanted to find PNG files with dimensions of 100 × 100:
'PNG.*100x100'
If you want to find PNG and JPEG files:
'(PNG|JPEG)'
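You can check the alternation against mocked-up file command output (the filenames and descriptions below are ours):

```shell
# Keeps the PNG and JPEG lines, drops the plain-text file
printf 'a.png: PNG image data\nb.txt: ASCII text\nc.jpg: JPEG image data\n' |
    egrep '(PNG|JPEG)'
```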
Once you have the regular expression, you can write a script to run the file
command against every file on the system looking for a match. When a match is found, typesearch.sh, shown in Example 5-4, will print the file path to standard output.
#!/bin/bash -
#
# Cybersecurity Ops with bash
# typesearch.sh
#
# Description:
# Search the file system for a given file type. It prints out the
# pathname when found.
#
# Usage:
# typesearch.sh [-c dir] [-i] [-R|r] <pattern> <path>
#   -c Copy files found to dir
#   -i Ignore case
#   -R|r Recursively search subdirectories
#   <pattern> File type pattern to search for
#   <path> Path to start search
#

DEEPORNOT="-maxdepth 1"    # just the current dir; default

# PARSE option arguments:
while getopts 'c:irR' opt; do
    case "${opt}" in
      c) # copy found files to specified directory
         COPY=YES
         DESTDIR="$OPTARG"
         ;;
      i) # ignore u/l case differences in search
         CASEMATCH='-i'
         ;;
      [Rr]) # recursive
         unset DEEPORNOT
         ;;
      *)  # unknown/unsupported option
          # error mesg will come from getopts, so just exit
          exit 2
          ;;
    esac
done
shift $((OPTIND - 1))

PATTERN=${1:-document}
STARTDIR=${2:-.}        # by default start here

find $STARTDIR $DEEPORNOT -type f | while read FN
do
    file $FN | egrep -q $CASEMATCH "$PATTERN"
    if (( $? == 0 ))    # found one
    then
        echo $FN
        if [[ $COPY ]]
        then
            cp -p $FN $DESTDIR
        fi
    fi
done
This script supports options that alter its behavior, as described in the opening comments of the script. The script needs to parse these options to tell which ones have been provided and which are omitted. For anything more than a single option or two, it makes sense to use the getopts
shell built-in. With the while
loop, we will keep calling getopts
until it returns a nonzero value, telling us that there are no more options. The options we want to look for are provided in that string c:irR
. Whichever option is found is returned in opt
, the variable name we supplied.
We use a case statement here, which is a multiway branch; it takes the branch whose pattern (the text before the closing parenthesis) matches. We could have used an if/elif/else construct, but this reads well and makes the options clearly visible.
The c
option has a colon (:
) after it in the list of supported options, which indicates to getopts
that the user will also supply an argument for that option. For this script, that optional argument is the directory into which copies will be made. When getopts
parses an option with an argument like this, it puts the argument in the variable named OPTARG
, and we save it in DESTDIR
because another call to getopts
may change OPTARG
.
The script supports either an uppercase R
or lowercase r
for this option.
Case statements specify a pattern to be matched, not just a simple literal, so we wrote [Rr])
for this case, using the brackets construct to indicate that either letter is considered a match.
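Case patterns are easy to exercise in isolation. This tiny function (our own example, not from typesearch.sh) shows both the bracket pattern and the catch-all branch discussed below:

```shell
# classify: demonstrate case pattern matching. [Rr] matches either
# letter, and * acts as the catch-all "else" branch.
classify() {
    case "$1" in
        [Rr]) echo recursive ;;
        c)    echo copy ;;
        *)    echo unknown ;;
    esac
}

classify R    # prints: recursive
classify r    # prints: recursive
classify z    # prints: unknown
```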
The other options set variables to cause their action to occur. In this case, we unset
the previously set variable. When that variable is referenced later as $DEEPORNOT
, it will have no value, so it will effectively disappear from the command line where it is used.
Here is another pattern, *
, which matches anything. If no other pattern has been matched, this case will be executed. It is, in effect, an “else” clause for the case
statement.
When we’re done parsing the options, we can get rid of the ones we’ve already processed with a shift
. Just a single shift
gets rid of a single argument so that the second argument becomes the first, the third becomes the second, and so on. Specifying a number like shift 5
will get rid of the first five arguments so that $6
becomes $1
, $7
becomes $2
, and so on. Calls to getopts
keep track of which arguments to process in the shell variable OPTIND
. It refers to the next argument to be processed. By shifting by this amount, we get rid of any/all of the options that we parsed. After this shift, $1
will refer to the first nonoption argument, whether or not any options were supplied when the user invoked the script.
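The interplay of getopts and shift can be seen in a stripped-down sketch. The option letters -v and -o here are illustrative only, not taken from typesearch.sh:

```shell
# parseargs: after the getopts loop finishes, shift $((OPTIND - 1))
# discards the already-parsed options so that $1 is the first
# nonoption argument, whether or not any options were supplied.
parseargs() {
    local opt OPTIND=1 verbose='' outfile=''
    while getopts 'vo:' opt; do
        case "$opt" in
            v) verbose=YES ;;
            o) outfile="$OPTARG" ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    echo "first nonoption arg: $1"
}

parseargs -v -o log.txt target1 target2   # prints: first nonoption arg: target1
parseargs lonearg                         # prints: first nonoption arg: lonearg
```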
The two possible arguments that aren’t in -option format are the pattern we’re searching for and the directory where we want to start our search. When we refer to a bash variable, we can add a :- to say, “If that value is empty or unset, return this default value instead.” We give PATTERN a default value of document, and the default for STARTDIR is a dot (.), which refers to the current directory.
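The :- default expansion is simple to try on its own; the variable names in this sketch mirror the script’s:

```shell
# With no arguments, each ${n:-default} expansion falls back to its
# default value; with arguments supplied, the defaults are ignored.
showdefaults() {
    local pattern=${1:-document} startdir=${2:-.}
    echo "$pattern $startdir"
}

showdefaults            # prints: document .
showdefaults PNG /tmp   # prints: PNG /tmp
```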
We invoke the find command, telling it to start its search in $STARTDIR. Remember that $DEEPORNOT may be unset and thus add nothing to the command line, or it may be the default -maxdepth 1, telling find not to go any deeper than this directory. We’ve added -type f so that we find only plain files (not directories, special device files, or FIFOs). That isn’t strictly necessary, and you could remove it if you want to be able to search for those kinds of files. The names of the files found are piped into the while loop, which reads them one at a time into the variable FN.
The -q
option to egrep
tells it to be quiet and not output anything. We don’t need to see what phrase it found, only that it found it.
The $?
construct is the value returned by the previous command. A successful result means that egrep
found the pattern supplied.
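A quick sketch of this exit-status behavior (the sample strings are our own):

```shell
# egrep -q prints nothing; success or failure is reported only
# through the exit status, which $? captures.
echo 'PNG image data' | egrep -q 'PNG'
echo "match: $?"       # prints: match: 0  (pattern found)
echo 'ASCII text' | egrep -q 'PNG'
echo "no match: $?"    # prints: no match: 1  (pattern not found)
```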
This checks whether COPY has a value. If it is null, the if is false.
The -p
option to the cp
command will preserve the mode, ownership, and timestamps of the file, in case that information is important to your analysis.
If you are looking for a lighter-weight but less-capable solution, you can perform a similar search using the find
command’s exec
option as shown in this example:
find / -type f -exec file '{}' \; | egrep 'PNG' | cut -d' ' -f1
Here we send each item found by the find
command into file
to identify its type. We then pipe the output of file
into egrep
and filter it, looking for the PNG
keyword. The use of cut
is simply to clean up the output and make it more readable.
Be cautious if using the file command on an untrusted system. The file command relies on its magic pattern file, typically located under /usr/share/misc/. A malicious user could modify this file so that certain file types are not identified. A better option is to mount the suspect drive on a known-good system and search from there.
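If you must run file on the suspect system anyway, one hedged mitigation is file’s -m option, which points it at a magic database you supply yourself. The sketch below builds a one-line magic rule of our own invention just to show the mechanism; in practice you would carry a known-good copy of the real database:

```shell
# Build a trivial magic rule (fields: offset, type, test, message)
# and ask file to use it instead of the system's magic database.
tmp=$(mktemp -d)
printf '0\tstring\tEVIDENCE\tcustom evidence format\n' > "$tmp/goodmagic"
printf 'EVIDENCE payload' > "$tmp/sample"
file -m "$tmp/goodmagic" "$tmp/sample"   # identifies sample via our rule, not the system database
```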
A cryptographic hash function is a one-way function that transforms an input message of arbitrary length into a fixed-length message digest. Common hash algorithms include MD5, SHA-1, and SHA-256. Consider the two files in Examples 5-5 and 5-6.
This is hash file A
This is hash file B
Notice that the files are identical except for the last letter in the sentence. You can use the sha1sum
command to compute the SHA-1 message digest of each file:
$ sha1sum hashfilea.txt hashfileb.txt
6a07fe595f9b5b717ed7daf97b360ab231e7bbe8 *hashfilea.txt
2959e3362166c89b38d900661f5265226331782b *hashfileb.txt
Even though there is only a small difference between the two files, they generated completely different message digests. Had the files been the same, the message digests would have also been the same. You can use this property of hashing to search the system for a specific file if you know its digest. The advantage is that the search will not be influenced by the filename, location, or any other attributes; the disadvantage is that the files need to be exactly the same. If the file contents have changed in any way, the search will fail. The script hashsearch.sh, shown in Example 5-7, recursively searches the system, starting at the location provided by the user. It performs a SHA-1 hash of each file that is found and then compares the digest to the value provided by the user. If a match is found, the script outputs the file path.
#!/bin/bash -
#
# Cybersecurity Ops with bash
# hashsearch.sh
#
# Description:
# Recursively search a given directory for a file that
# matches a given SHA-1 hash
#
# Usage:
# hashsearch.sh <hash> <directory>
#   hash - SHA-1 hash value of the file to find
#   directory - Top directory to start search
#

HASH=$1
DIR=${2:-.}    # default is here, cwd

# convert pathname into an absolute path
function mkabspath ()
{
    if [[ $1 == /* ]]
    then
        ABS=$1
    else
        ABS="$PWD/$1"
    fi
}

find $DIR -type f | while read fn
do
    THISONE=$(sha1sum "$fn")
    THISONE=${THISONE%% *}
    if [[ $THISONE == $HASH ]]
    then
        mkabspath "$fn"
        echo $ABS
    fi
done
We hash only plain files. We need to avoid special files; reading a FIFO would cause our program to hang as it waited for someone to write into the FIFO, and reading a block special or character special file would also not be a good idea. The -type f ensures that we get only plain files. find prints those filenames, one per line, to stdout, which we redirect via a pipe into the while read loop.
This computes the hash value in a subshell and captures its output (i.e., whatever it writes to stdout), assigning it to the variable. The quotes are needed in case the filename contains spaces.
This reassignment removes from the righthand side the largest substring beginning with a space. The output from sha1sum
is both the computed hash and the filename. We want only the hash value, so we remove the filename with this substitution.
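You can watch this expansion work on a literal string; the digest below is the one computed for hashfilea.txt earlier:

```shell
# sha1sum output is "<digest>  <filename>"; ${line%% *} removes the
# longest suffix beginning with a space, keeping only the digest.
line='6a07fe595f9b5b717ed7daf97b360ab231e7bbe8  hashfilea.txt'
echo "${line%% *}"   # prints: 6a07fe595f9b5b717ed7daf97b360ab231e7bbe8
```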
We call the mkabspath
function, putting the filename in quotes. The quotes make sure that the entire filename shows up as a single argument to the function, even if the filename has one or more spaces in the name.
Remember that shell variables are global unless declared to be local within a function. Therefore, the value of ABS
that was set in the call to mkabspath
is available to us here.
This is our declaration of the function. When declaring a function, you can omit either the keyword function
or the parentheses, but not both.
For the comparison, we are using shell pattern matching on the righthand side. This will check whether the first parameter begins with a slash. If it does, this is already an absolute pathname and we need do nothing further.
When the parameter is only a relative path, it is relative to the current location, so we prepend the current working directory, thereby making it absolute. PWD is a shell variable that the cd command keeps set to the current working directory.
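Here is the same prepend-if-relative logic in isolation (the function name mkabspath2 is our own). Note that GNU coreutils also provides realpath for this job, which additionally resolves any . and .. components:

```shell
# If the argument already starts with /, keep it as is; otherwise
# prefix the current working directory to make it absolute.
mkabspath2() {
    if [[ $1 == /* ]]; then
        ABS=$1
    else
        ABS="$PWD/$1"
    fi
}

mkabspath2 /etc/passwd   # ABS is /etc/passwd (already absolute)
cd /tmp
mkabspath2 notes.txt     # ABS is /tmp/notes.txt
```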
Once you have gathered all of the desired data, the next step is to move it off the origin system for further analysis. To do that, you can copy the data to a removable device or upload it to a centralized server. If you are going to upload the data, be sure to do so using a secure method such as Secure Copy (SCP). The following example uses scp
to upload the file some_system.tar.gz to the home directory of user bob
on remote system 10.0.0.45
:
scp some_system.tar.gz [email protected]:/home/bob/some_system.tar.gz
For convenience, you can add a line at the end of your collection scripts to automatically use scp
to upload data to a specified host. Remember to give your files unique names, so as to not overwrite existing files as well as to make analysis easier later.
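One simple way to generate such unique names (our own sketch; the user and host in the commented upload line are the example values from above):

```shell
# Build a filename that embeds the collecting host and a timestamp,
# so uploads from different machines or runs never collide.
OUTFILE="$(hostname)_$(date +%Y%m%d_%H%M%S).tar.gz"
echo "$OUTFILE"
# scp "$OUTFILE" [email protected]:/home/bob/"$OUTFILE"   # actual upload; needs a reachable host
```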
Gathering data is an important step in defensive security operations. When collecting data, be sure to transfer and store it by using secure (i.e., encrypted) methods. As a general rule, gather all data that you think is relevant; you can easily delete data later, but you cannot analyze data you did not collect. Before collecting data, first confirm that you have permission and/or legal authority to do so.
Also be aware that when dealing with adversaries, they will often try to hide their presence by deleting or obfuscating data. To counter that, be sure to use multiple methods when searching for files (name, hash, contents, etc.).
In the next chapter, we explore techniques for processing data and preparing it for analysis.
Write the command to search the filesystem for any file named dog.png.
Write the command to search the filesystem for any file containing the text confidential.
Write the command to search the filesystem for any file containing the text secret or confidential and copy the file to your current working directory.
Write the command to execute ls -R /
on the remote system 192.168.10.32
and write the output to a file named filelist.txt on your local system.
Modify getlocal.sh to automatically upload the results to a specified server by using SCP.
Modify hashsearch.sh to have an option (-1
) to quit after finding a match. If the option is not specified, it will keep searching for additional matches.
Modify hashsearch.sh to simplify the full pathname that it prints out:
If the string it outputs is /home/usr07/subdir/./misc/x.data
, modify it to remove the redundant ./
before printing it out.
If the string is /home/usr/07/subdir/../misc/x.data
, modify it to remove the ../
and also the subdir/
before printing it out.
Modify winlogs.sh to indicate its progress by printing the logfile name over the top of the previous logfile name. (Hint: Use a carriage return rather than a newline.)
Modify winlogs.sh to show a simple progress bar of plus signs building from left to right. Use a separate invocation of wevtutil el
to get the count of the number of logs and scale this to, say, a width of 60.
Modify winlogs.sh to tidy up; that is, to remove the extracted logfiles (the .evtx
files) after it has tar’d them up. There are two very different ways to do this.
Visit the Cybersecurity Ops website for additional resources and the answers to these questions.