Chapter 5. Using Python for Virtualization Forensics

Virtualization is one of the most widely adopted concepts in modern IT. For forensic analysis, it introduces new challenges as well as new techniques.

In this chapter, we will show how virtualization introduces the following:

  • New attack vectors
  • New opportunities for gathering evidence
  • New targets for forensic analysis such as the virtualization layer
  • New sources for forensic data

Considering virtualization as a new attack surface

Before we start a forensic analysis, it is important to understand what to look for. Virtualization introduces new attack vectors and scenarios. In the following sections, we will describe some of these scenarios and how to look for the corresponding evidence.

Virtualization as an additional layer of abstraction

Virtualization is the technique of emulating IT systems such as servers, workstations, networks, and storage. The component that is responsible for the emulation of virtual hardware is called the hypervisor. The following figure depicts the two main types of system virtualization that are used today:

[Figure: bare-metal (Type 1) hypervisor architecture on the left, desktop virtualization (Type 2) hypervisor architecture on the right]

The architecture on the left-hand side is called bare-metal hypervisor architecture and is also known as a Type 1 hypervisor. In this architecture, the hypervisor replaces the operating system and runs directly on the bare-metal hardware. Examples of Type 1 hypervisors are VMware ESXi and Microsoft Hyper-V.

The right-hand side of the figure depicts an architecture that is usually referred to as desktop virtualization or a Type 2 hypervisor. In this architecture, a standard operating system runs on the hardware, for example, a standard Windows 8 or Linux desktop system. The hypervisor runs directly on this operating system, alongside other native applications. Some functionality of the hypervisor may interact directly with the underlying hardware, for example, by providing special drivers. For Type 2 hypervisors, the operating system that runs directly on the hardware is called the host OS, while the operating system that runs on a virtual machine is called the guest OS. Examples of Type 2 hypervisors are Oracle VirtualBox and VMware Workstation. These hypervisors can be installed just like any other application on an existing operating system.

Note

While Hyper-V seems like Type 2, it actually converts the host OS into just another guest OS during the installation and establishes a Type 1 architecture.

A common feature of almost all virtualization environments is the ability to create snapshots. A snapshot of a virtual system contains a frozen-in-time state of the system. All changes to the system that are happening after the snapshot creation can be undone by the hypervisor to roll back to the point in time when the snapshot was taken. Furthermore, most systems allow having multiple snapshots of a single system and rolling back and forward to arbitrary snapshots. Snapshots can be utilized as a source of forensic data, which we will demonstrate in the Using virtualization as source of evidence section.

Tip

For forensics, snapshots are to be treated like independent machines!

If a system is subject to forensic analysis, always check whether it is a virtual system and whether snapshots exist. If snapshots exist, the forensic analysis has to be repeated for every single snapshot as if it were an independent virtual machine. The rationale behind this requirement is that it is most likely unknown when the system was compromised, when the attacker tried to destroy evidence, and most importantly, what version of the machine was running during the attack.
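
Since every snapshot has to be analyzed separately, it helps to enumerate them first. The following is a minimal sketch, assuming the pyVmomi SDK that is introduced later in this chapter, where a virtual machine object exposes its snapshots through snapshot.rootSnapshotList and every entry offers the attributes name, createTime, and childSnapshotList. The helper names walk_snapshots and list_snapshots as well as the demo data are made up for this illustration:

```python
from types import SimpleNamespace


def walk_snapshots(snapshot_list, depth=0):
    """Yield (depth, name, create_time) for every snapshot in the tree.

    Each node is expected to offer the attributes name, createTime and
    childSnapshotList.
    """
    for node in snapshot_list or []:
        yield depth, node.name, node.createTime
        yield from walk_snapshots(node.childSnapshotList, depth + 1)


def list_snapshots(vm):
    """Return all snapshots of vm, or an empty list if there are none."""
    info = getattr(vm, 'snapshot', None)
    if info is None:
        return []
    return list(walk_snapshots(info.rootSnapshotList))


# Stand-in objects (hypothetical data) that mimic the attribute layout,
# so that the sketch can be tried without a vSphere connection.
child = SimpleNamespace(name='after-patch', createTime='2015-07-02',
                        childSnapshotList=[])
root = SimpleNamespace(name='clean-install', createTime='2015-07-01',
                       childSnapshotList=[child])
demo_vm = SimpleNamespace(snapshot=SimpleNamespace(rootSnapshotList=[root]))

for depth, name, created in list_snapshots(demo_vm):
    print('%s%s (created %s)' % ('  ' * depth, name, created))
```

Every entry of the resulting list marks one machine state that should be examined on its own.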

Most virtualization environments consist of more than one hypervisor. To ease the management of multiple hypervisors and to enable additional features, for example, moving machines between hypervisors for failover, load balancing, and power saving, these environments provide a central management component for all hypervisors. In the case of VMware vSphere, this management component is called vCenter Server, as shown in the following figure:

[Figure: vCenter Server as the central management component for multiple hypervisors]

If vCenter Server is used, then all administrative tasks are supposed to be handled via this vCenter Server instance.

How does this new hypervisor layer influence attack scenarios and forensics?

The hypervisor layer can be used to manipulate virtual systems without detection, and it is itself a new layer that can be attacked. In the following sections, we will provide some sample scenarios for attacks that are committed through the hypervisor.

Creation of rogue machines

If an attacker gains access to the hypervisor, they can simply create new virtual resources. These resources can act as a bridgehead in the network or just steal memory and compute resources from the environment. Therefore, it is crucial to reconstruct the creation and disposal of virtual resources during a forensic analysis of the hypervisor environment.

Fortunately, every widespread virtualization environment offers APIs and language bindings to enumerate the virtual machines and other virtual resources of the environment. In this chapter, we chose to use VMware vSphere as the prominent example of a virtualization environment.

Note

VMware vSphere is one of the most used virtualization environments for on-premises virtualization. Its basic structure consists of one central management instance called vCenter Server and one or multiple systems that are actually hosting the virtual environment (hypervisors), called ESXi servers. To programmatically control a vSphere environment with Python, pyVmomi is used. This Python SDK is available on GitHub at https://github.com/vmware/pyvmomi.

In the following, we will use pyVmomi to create a list of all virtual machines. It is recommended to run such an inventory scan at regular intervals to compare the list of existing virtual assets with your local inventory database.

We recommend installing pyVmomi using pip:

user@lab:~$ pip install --upgrade pyVmomi

Tip

Sample code for pyVmomi

There is a GitHub project that provides community-contributed sample code for pyVmomi. More information about these samples is available at https://vmware.github.io/pyvmomi-community-samples/.

A script such as the following can then be used to enumerate all systems of the vSphere environment:

#!/usr/bin/env python

from pyVim import connect
import sys


def print_vm_info(vm):
    """
    Print the information for the given virtual machine.
    If vm is a folder, recurse into that folder.
    """

    # check if this is a folder...
    if hasattr(vm, 'childEntity'):
        for child in vm.childEntity:
            print_vm_info(child)
        # a folder has no summary of its own, so stop here
        return

    vm_info = vm.summary

    print('Name:      ', vm_info.config.name)
    print('State:     ', vm_info.runtime.powerState)
    print('Path:      ', vm_info.config.vmPathName)
    print('Guest:     ', vm_info.config.guestFullName)
    print('UUID:      ', vm_info.config.instanceUuid)
    print('Bios UUID: ', vm_info.config.uuid)
    print('----------\n')


if __name__ == '__main__':
    if len(sys.argv) < 5:
        print('Usage: %s host user password port' % sys.argv[0])
        sys.exit(1)

    service = connect.SmartConnect(host=sys.argv[1],
                                   user=sys.argv[2],
                                   pwd=sys.argv[3],
                                   port=int(sys.argv[4]))

    # access the inventory
    content = service.RetrieveContent()
    children = content.rootFolder.childEntity

    # iterate over inventory
    for child in children:
        if hasattr(child, 'vmFolder'):
            dc = child
        else:
            # no folder containing virtual machines -> ignore
            continue

        vm_folder = dc.vmFolder
        vm_list = vm_folder.childEntity
        for vm in vm_list:
            print_vm_info(vm)

    # close the session once the inventory has been listed
    connect.Disconnect(service)

This script creates a connection to the vCenter Server platform. However, it can also be used to connect to a single ESXi hypervisor instance. This is possible because the API offered to the script is identical for both management variants.

Note

The API used by pyVmomi is the vSphere Web Service API. A detailed description is available in the vSphere Web Services SDK via https://www.vmware.com/support/developer/vc-sdk/.

The print_vm_info function uses recursion to enumerate all virtual machines. This is necessary because in VMware vSphere, virtual machines can be put into nested groups.

Here is a sample call of this script with the output of a single virtual machine:

user@lab:~$ python enumerateVMs.py 192.168.167.26 'readonly' 'mypwd' 443
Name:     vCenterServer
State:      poweredOff
Path:      [datastore1] vCenterServer/vCenterServer.vmx
Guest:     Microsoft Windows Server 2012 (64-bit)
UUID:      522b96ec-7987-a974-98f1-ee8c4199dda4
Bios UUID: 564d8ec9-1b42-d235-a67c-d978c5107179
----------

The output lists the name of the virtual machine, its current state, the path of its configuration file, a hint for the guest operating system, and the unique IDs for the instance and the BIOS configuration. The path information is especially valuable because it shows where to find the configuration and data files of the virtual machine.
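
The regular comparison with a local inventory database that was recommended earlier can be sketched as follows. This is only an illustration under assumptions: both sides are represented as dictionaries that map the instance UUID to the machine name, the function name compare_inventory is made up, and all entries except the vCenterServer entry from the sample output are hypothetical:

```python
def compare_inventory(found, known):
    """Compare enumerated machines with the local inventory.

    Both arguments map instance UUIDs to machine names. Returns a tuple
    (rogue, missing): machines that exist but are not in the inventory,
    and machines that are in the inventory but were not found.
    """
    rogue = {uuid: name for uuid, name in found.items() if uuid not in known}
    missing = {uuid: name for uuid, name in known.items() if uuid not in found}
    return rogue, missing


# Result of an enumeration run (second entry is hypothetical)
found = {
    '522b96ec-7987-a974-98f1-ee8c4199dda4': 'vCenterServer',
    '00000000-0000-0000-0000-000000000001': 'unknown-vm',
}
# Content of the local inventory database (second entry is hypothetical)
known = {
    '522b96ec-7987-a974-98f1-ee8c4199dda4': 'vCenterServer',
    '00000000-0000-0000-0000-000000000002': 'fileserver',
}

rogue, missing = compare_inventory(found, known)
for uuid, name in sorted(rogue.items()):
    print('Not in inventory database: %s (%s)' % (name, uuid))
for uuid, name in sorted(missing.items()):
    print('Not found in environment:  %s (%s)' % (name, uuid))
```

Machines in the first group are candidates for rogue systems, while machines in the second group may have been deleted or moved and deserve a closer look as well.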

Cloning of systems

In the previous section, we used the API of the hypervisor to get the forensic data. In this section, we will look for traces of abuse of this API. To do so, we will analyze the log information of the vSphere installation.

Note

Collect log information on a central log system

In this section, we will assume that the log information is stored with the default settings of the vSphere installation. However, when setting up a system, we recommend storing the log information on a dedicated logging system. This makes it more difficult for an attacker to manipulate system logs, as they require access not only to the target system but also to the central log collection system. Another advantage of many central log collection systems is the built-in log analysis function.

While a copy of all system logs is highly recommended for a forensically sound analysis, single events can also be reviewed using the event browser of VMware vSphere, as follows:

[Figure: the event browser of VMware vSphere]

The vSphere environment can collect and store all log files in an archive. Perform the following steps to get an archive of all the available log data:

  • Use the Windows version of vSphere Web Client and log in to the vCenter Server.
  • In the Administration menu, select Export System Logs.
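  • Select one or multiple vCenter Servers to export the logs from, as shown in the following figure:
    [Figure: selecting the vCenter Servers for the log export]
  • When asked to Select System Logs, ensure that all log types are selected, as shown in the following figure:
    [Figure: the Select System Logs dialog with all log types selected]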

The log files are saved as compressed archives. One archive represents the log information of one system, that is, vCenter Server or ESXi host.

First, we will extract the collected log archive using tar:

user@lab:~$ tar xfz [email protected]

The filename of this archive follows the format Host/IP-vcsupport-timestamp, where vcsupport indicates a vCenter Server. The directory in this archive follows the vc-Hostname-Timestamp naming scheme, for example, vc-winserver-2015-07-05--02.19. The timestamps of the archive name and the contained directory usually do not match. This can be caused by clock drift and by the time required to transmit and compress the logs.
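
Before working with an exported archive, it can be useful to record its checksum and to check which top-level directory it contains. The following is a minimal sketch that uses only the Python standard library. The function name describe_archive is made up for this illustration, and recording a SHA-256 checksum is our assumption about a reasonable way to document the integrity of the evidence:

```python
import hashlib
import posixpath
import tarfile


def describe_archive(path):
    """Return (sha256 hex digest, sorted top-level entries) of an archive.

    The archive is only read, nothing is extracted.
    """
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # read in chunks so that large archives do not have to fit in memory
        for chunk in iter(lambda: f.read(1024 * 1024), b''):
            digest.update(chunk)

    top_level = set()
    # 'r:*' lets tarfile detect the compression transparently
    with tarfile.open(path, 'r:*') as tar:
        for name in tar.getnames():
            first = posixpath.normpath(name).split('/')[0]
            if first not in ('', '.'):
                top_level.add(first)

    return digest.hexdigest(), sorted(top_level)
```

Calling describe_archive with the path of an exported archive returns the checksum for the case documentation together with the name of the contained directory, which is expected to follow the vc-Hostname-Timestamp scheme.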

In the following, we will use the vCenter Server logs to reconstruct events indicating the cloning of virtual machines. In this example, we will exploit the redundancy of the logs and use the log data from one of the core services of vCenter Server: vpxd, that is, the core vCenter daemon:

#!/usr/bin/env python

import gzip
import os
from os.path import join
import re
import sys


# used to map session IDs to users and source IPs
session2user_ip = {}

def _logopen(filename):
    """Helper to provide transparent decompressing of compressed logs,
       if indicated by the file name.
    """
    if filename.endswith('.gz'):
        # 'rt' yields text lines instead of bytes
        return gzip.open(filename, 'rt')

    return open(filename, 'r')

def collect_session_data(vpxlogdir):
    """Uses vpx performance logs to map the session ID to
       source user name and IP"""
    extract = re.compile(r"SessionStats/SessionPool/Session/Id='([^']+)'/Username='([^']+)'/ClientIP='([^']+)'")

    logfiles = [x for x in os.listdir(vpxlogdir) if 'vpxd-profiler-' in x]
    for fname in logfiles:
        fpath = join(vpxlogdir, fname)
        with _logopen(fpath) as f:
            for line in f:
                m = extract.search(line)
                if m:
                    session2user_ip[m.group(1)] = (m.group(2), m.group(3))

def print_cloning_hints(basedir):
    """Print timestamp, user, and IP address for VM cloning by
       reconstructing them from the vpxd logs instead of accessing
       the 'official' event logs"""
    vpxlogdir = join(basedir, 'ProgramData',
                              'vCenterServer',
                              'logs',
                              'vmware-vpx')
    collect_session_data(vpxlogdir)

    extract = re.compile(r'^([^ ]+).*BEGIN task-.*?vim\.VirtualMachine\.clone -- ([0-9a-f-]+).*')

    # main vpxd logs only, for example vpxd-1.log or vpxd-1.log.gz
    logfiles = sorted(x for x in os.listdir(vpxlogdir)
                      if re.match(r'vpxd-[0-9]+\.log(\.gz)?$', x))

    for fname in logfiles:
        fpath = join(vpxlogdir, fname)
        with _logopen(fpath) as f:
            for line in f:
                m = extract.match(line)
                if m is None:
                    continue

                timestamp = m.group(1)
                session = m.group(2)
                (user, ip) = session2user_ip.get(session, ('***UNKNOWN***', '***UNKNOWN***'))
                print('Hint for cloning at %s by %s from %s' % (timestamp, user, ip))

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: %s vCenterLogDirectory' % sys.argv[0])
        sys.exit(1)

    print_cloning_hints(sys.argv[1])

First, this script reads the so-called performance log of vpxd. This log contains data about client sessions and we use it to extract a mapping from the unique session identifier to the client username and the IP address that the client is connecting from. In the second step, the main log of vpxd is searched for the start of tasks of vim.VirtualMachine.clone type, that is, the cloning of virtual machines on the server side. The session information is then looked up in the mapping that is harvested from the performance log to retrieve the data about possible cloning events, as follows:

user@lab:~$ python extractCloning.py vc-winserver-2015-07-05--02.19/
Hint for cloning at 2015-07-05T01:30:01.071-07:00 by VSPHERE.LOCAL\Administrator from 192.168.167.26

In the example, the script revealed that the Administrator account was used to clone a virtual machine. This hint can be correlated with the event log of vCenter Server and it will show up there as well. If it does not, then this is a strong indicator of a compromised environment.
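
The correlation described above can be sketched in a few lines. The following example assumes that the timestamps of the official events have already been exported as ISO 8601 strings that include a UTC offset. The function name uncorrelated_hints, the tolerance of five seconds, and the event timestamp in the demo data are assumptions made for this illustration:

```python
from datetime import datetime, timedelta


def uncorrelated_hints(hint_times, event_times, tolerance_seconds=5):
    """Return all hint timestamps without a matching event timestamp.

    Both lists contain ISO 8601 strings with a UTC offset. Two timestamps
    match if they differ by at most tolerance_seconds.
    """
    tolerance = timedelta(seconds=tolerance_seconds)
    events = [datetime.fromisoformat(t) for t in event_times]

    missing = []
    for raw in hint_times:
        ts = datetime.fromisoformat(raw)
        if not any(abs(ts - event) <= tolerance for event in events):
            missing.append(raw)
    return missing


# The first hint is taken from the sample output, the second hint and
# the event timestamp are hypothetical.
hints = ['2015-07-05T01:30:01.071-07:00',
         '2015-07-05T03:12:44.000-07:00']
events = ['2015-07-05T08:30:02+00:00']

for ts in uncorrelated_hints(hints, events):
    print('No official event found for cloning hint at %s' % ts)
```

Because the timestamps carry their UTC offset, hints and events that were recorded in different time zones are compared correctly.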

Note

Depending on your system environment, operations such as cloning and exporting virtual machines may be a part of daily operations. In that case, the previous script or its variants may be used to detect unusual users or source IPs that are performing these operations.
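
One possible variant for detecting unusual users or source IPs is sketched below. It assumes that the hints have been collected as (timestamp, user, ip) tuples instead of being printed. The function names, the list of expected users, the expected network, and the second hint are hypothetical values chosen for this illustration:

```python
import ipaddress


def is_expected(user, ip, allowed_users, allowed_networks):
    """Check whether user and source IP are on the lists of expected values."""
    if user not in allowed_users:
        return False
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # for example '***UNKNOWN***' if the session could not be resolved
        return False
    return any(address in ipaddress.ip_network(net)
               for net in allowed_networks)


def unusual_operations(hints, allowed_users, allowed_networks):
    """Return all (timestamp, user, ip) hints that are not expected."""
    return [hint for hint in hints
            if not is_expected(hint[1], hint[2],
                               allowed_users, allowed_networks)]


# The first hint is taken from the sample output, the second is hypothetical.
hints = [
    ('2015-07-05T01:30:01.071-07:00', 'VSPHERE.LOCAL\\Administrator',
     '192.168.167.26'),
    ('2015-07-05T03:12:44.000-07:00', 'VSPHERE.LOCAL\\Administrator',
     '10.1.2.3'),
]
allowed_users = {'VSPHERE.LOCAL\\Administrator'}
allowed_networks = ['192.168.167.0/24']

for timestamp, user, ip in unusual_operations(hints, allowed_users,
                                              allowed_networks):
    print('Unusual cloning at %s by %s from %s' % (timestamp, user, ip))
```

Hints whose session could not be resolved are reported as unusual as well, because an unknown user or source IP cannot be verified against the lists.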

Similar searches and correlations can be used for other events of interest. Copying files from the datastore and exporting virtual machines are promising candidates.
