16
Forensics‐as‐a‐Service (FaaS) in the State‐of‐the‐Art Cloud

Avinash Srinivasan1 and Frank Ferrese2

1 Computer and Information Sciences, Temple University, Philadelphia, PA, USA

2 Electrical and Computer Engineering, Temple University, Philadelphia, PA, USA

16.1 Introduction

Advances and fundamental changes in the computing and communications industries have resulted in significant challenges to current digital forensic analysis practices, policies, and regulations. Consequently, the forensic analysis process faces roadblocks stemming not only from unclear cyberlaws and regulations, but also from significant technology challenges. Integrity is the key requirement in the forensic analysis process. To further complicate matters, computer forensic analysis is fundamentally a serial process and therefore suffers from inherent scalability challenges. Most importantly, parallel and distributed forensic analysis tools must be designed so that their results can withstand the Daubert test at trial (Ball 2017). In light of this web of challenges, case backlogs are growing at an increasing rate. As noted in (Hitchcock et al. 2016), the backlog is commonly on the order of 6–18 months, but can reach significantly higher numbers in some jurisdictions.

One instance of a key paradigm shift in the computing industry is the advent of cloud computing. In recent years, cloud computing capabilities have advanced significantly, evolving from a merely plausible concept into a hard business reality for many industries. It has brought along numerous business opportunities; and everyone, from start‐ups and small businesses to Fortune 100 companies, is embracing cloud computing, though from different viewpoints and with varying business needs. Some of the attractive benefits of cloud computing include a reduced in‐house infrastructure burden, minimized maintenance and update pressures, and the ability to quickly scale as computing needs increase. A 2013 Gartner report predicted that the cloud‐based security services market, which includes secure e‐mail and web gateways, identity and access management (IAM), remote vulnerability assessment, and security information and event management, would surpass $4 billion by 2017 (Messmer 2013).

16.1.1 Current State of Cloud Computing

Cloud computing is undoubtedly one of the most significant advances in twenty‐first‐century computing. The dawn of the cloud computing paradigm brought three service delivery models: Software‐as‐a‐Service (SaaS), Platform‐as‐a‐Service (PaaS), and Infrastructure‐as‐a‐Service (IaaS). However, innovation and advancement fueled by growing consumer and business needs led to the birth of numerous other delivery models, such as Scanning‐as‐a‐Service (Gionta et al. 2014) and Monitoring‐as‐a‐Service (Alhamazani et al. 2015).

On the flip side, the cloud computing platform presents some very serious security and privacy concerns. The vast resource pool it offers has been and continues to be exploited by malicious actors, who can easily harness resources in real time for malicious ends. This situation has transformed matters from bad to worse for law enforcement (LE) and the intelligence community (IC). The threat posed by the cloud computing platform is evident in offerings such as Cybercrime‐as‐a‐Service (Robinson 2016), Malware‐as‐a‐Service (Drozhzhin 2016), Attacks‐as‐a‐Service (Lemos 2010), Crimeware‐as‐a‐Service (Sood and Enbody 2013), and Exploit‐as‐a‐Service (Grier et al. 2012). Today's cloud computing architectures, though very popular, are not designed to meet some of the stringent digital forensics requirements for electronic evidence. The most important requirements impacted by cloud computing are chain of custody and data provenance.

Numerous works have focused on this area, including (Reilly et al. 2010; Birk and Wegener 2011; Marty 2011; Taylor et al. 2011; Dykstra and Sherman 2012; Sibiya et al. 2012; Zawoad and Hasan 2012; Grispos et al. 2012; Zawoad et al. 2013; Zawoad and Hasan 2013b). Of particular relevance is (Zawoad and Hasan 2013a), which notes that many of the assumptions digital forensics makes about tools and techniques are not valid in cloud computing. In (Chen et al. 2013), the authors evaluate the implementation of a cloud‐based security center for network security forensic analysis that processes stored network traffic on cloud computing platforms to find malicious attacks.

16.1.2 What Is This Chapter About?

This chapter's primary focus is the ever‐increasing number of criminal and civil cases that involve electronic evidence, growing data and storage device sizes, and devices connected to the Internet of Things (IoT) that have been and continue to be foot soldiers for geographically remote cybercriminals and nation states. Some in the forensics community – LE and IC agencies, researchers, and practitioners – have turned toward parallel and distributed computing paradigms in the hope of overcoming the seemingly insurmountable case backlog. One specific direction of interest is cloud computing. It is now clear that utilization of cloud resources to accelerate the turnaround times of forensic investigations is inevitable, and its adoption at massive scale is imminent. Some of the early works toward this aim include (Wei 2004; Richard and Roussev 2006a; Richard and Roussev 2006b; Beebe and Clark 2007; Liebrock et al. 2007; Marziale et al. 2007; Ayers 2009; Roussev et al. 2009; Reilly et al. 2010).

16.1.3 Chapter Road Map

The remainder of this chapter is organized as follows. In Section 16.2, we discuss relevant background and present the necessary preliminaries for this chapter. Then, in Section 16.3, we review existing state‐of‐the‐art work on parallel and distributed digital forensic analysis, followed by a discussion of its limitations in Section 16.3.2 and the key requirements for offering cloud‐based FaaS in Section 16.3.3. Finally, in Section 16.4, we conclude the chapter with future research directions.

16.2 Background and Motivation

16.2.1 Limitations of Traditional Computer Forensics – Now and Forever

Today, it is not uncommon for laptops and desktops to be equipped with terabyte‐sized storage. Similarly, in contrast to a decade ago, digital forensic analysts today deal not only with a significantly larger average disk size, but also with an extremely large variety of devices. Consequently, the amount of data that needs to be processed can run into tens of terabytes. Adding to this problem is the number of cases today that require computer forensic analysis. As witnessed in recent crimes, attackers' level of sophistication has advanced significantly from the days of the Rabbit virus “fork bomb” and the Morris worm (Chen and Robert 2004) to the state‐of‐the‐art Petya and Mirai. Even widely used commercial forensics suites such as EnCase (www.guidancesoftware.com) and Forensic Toolkit (FTK; accessdata.com) are not keeping pace with the increased complexity and data volumes of modern investigations. The growing burden on computer forensic analysts is evident from the reports published by the FBI Regional Computer Forensics Laboratories (RCFLs) and Computer Analysis Response Team (CART). According to the 2010 RCFL annual report (RCFL 2011), a total of 6564 examinations were conducted, requiring the processing of 3086TB of data, with an average case size of 0.4TB. In its 2011 annual report (RCFL 2012), the RCFL reported a total of 7629 examinations analyzing 4263TB of data. During fiscal year 2012, the FBI CART supported nearly 10 400 investigations, with over 13 300 computer forensic examinations, processing data volumes in excess of 10 500 TB (FBI‐CART 2013).

Numerous works in recent years have tested the limits of traditional computer forensic tools and techniques against evolving technology. Conventional wisdom suggests that computer systems should make investigations much faster simply by virtue of being able to perform billions of operations per second. In reality, however, ever‐increasing drive sizes necessitate significant (pre)processing times that far outweigh the benefits of those billions of operations per second.

Limitations of first‐generation computer forensic tools are presented in (Ayers 2009) along with metrics for measuring the efficacy and performance of good tools. The author further lays out a broad set of requirements for second‐generation tools and presents a high‐level work‐in‐progress design for a second‐generation computer forensic analysis system. The goal is to implement and test the prototype using two different processing architectures: (i) Beowulf cluster, and (ii) IBM BlueGene/L supercomputer.

(Lee and Hong 2011) present a forensic cloud: a framework for a forensic index‐based search application. While it takes substantial effort to construct an index database, the authors argue that searching the indexed database returns a query response in a fraction of the time the same query would take without indexing. Later, (Lee and Un 2012) present a case study supporting forensic indexed search as a service, along with a work‐in‐progress model.

In their experiments, they achieve significantly better performance (≈56 MB/s) when the target data to be processed is more than 56GB. When a 1TB drive is analyzed with bigrams, their system takes ≈2 hours. Their system can also retrieve results from compressed text document formats at an average of ≈25 MB/s for a single query. Processing this query against a 1.27TB target took the authors ≈13 hours. However, they argue that this performance indeed outperforms existing forensic bitwise search methods by a significant margin. Further, the authors note that forensic bitwise search methods take ≈18.5 hours to perform a single keyword search on a 1TB drive.
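To put such throughput figures in perspective, raw streaming time alone dominates turnaround. The following back‐of‐the‐envelope sketch (our illustration, not the authors' code) converts sustained throughput into hours:

```python
# Back-of-the-envelope turnaround estimates from sustained throughput.
# Illustrative arithmetic only; not taken from Lee and Un (2012).

def hours_to_stream(target_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream target_gb of data at throughput_mb_s."""
    return (target_gb * 1024 / throughput_mb_s) / 3600

print(f"{hours_to_stream(1024, 56):.1f} h")  # 1TB at ~56 MB/s -> ~5.2 h of raw I/O
print(f"{hours_to_stream(1300, 25):.1f} h")  # 1.27TB at ~25 MB/s -> ~14.8 h, near the reported ~13 h
```

Any analysis beyond raw streaming only adds to these times, which is why building an index up front pays off for repeated queries.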

This conclusion is supported further in (Roussev and Richard 2004), where the authors argue that a large part of the processing time is “think” time, i.e. the time needed for the human investigator to analyze the data. While it may be possible for a system to accumulate experience and reduce this time through machine learning, they are confident that the processing time the system needs to execute investigator‐issued queries depends largely on how well those queries are constructed.

In summary, the limitations of current forensic tools and techniques are deeply rooted in the following: (i) data diversity and abstraction, (ii) input/output (I/O) and processing speed, (iii) I/O‐intensive tasks, (iv) lack of automation, (v) inability to scale, and (vi) reliance on promising open source tools that have not yet been validated and approved.

16.2.2 Potential of Looking Up to the Cloud – Forensics‐as‐a‐Service

Cyberspace is highly dynamic and will not cease to evolve in its applications, sophistication, and reach. Consequently, the LE community will continue to work against the odds as forensic analysis becomes ever more challenging. (Marziale et al. 2007) present compelling real‐world use cases justifying the need for more advanced tools. Their use cases clearly demonstrate the inadequate capacity of traditional forensic investigation tools executing on a single workstation.

The time has come for a paradigm shift in computer forensic analysis. We require an adaptive, widely available, priority‐driven parallel and distributed computing architecture. While the Cloud is inherently a distributed computing paradigm, its resourcefulness as a parallel computing paradigm has also been established (Ekanayake and Fox 2009). This migration to the Cloud is necessary both to clear current backlogs and to keep future backlogs manageable.

The Advanced Forensics Format (AFF) was proposed as an alternative to proprietary disk image formats. AFF is an open and extensible format for storing images of hard disks and other kinds of storage devices (Stevens et al. 2006). The authors also present AFFLIB, an open source library that implements AFF. This work also proposes Advanced Imager (AIMAGE), a new disk‐image‐acquisition program that compares favorably with existing alternatives. (Cohen et al. 2009) later redesigned the format as AFF4, which builds on the well‐supported ZIP file format specification, making it simple to implement while remaining backward compatible with existing AFF files.

16.3 State of the Art in Parallel and Distributed Forensic Analysis

16.3.1 GPU‐Based Distributed Forensic Analysis

(Gao et al. 2004) discuss user and software engineering requirements for on‐the‐spot digital forensics tools to overcome time‐consuming, in‐depth forensic examinations. They present their Bluepipe architecture (shown in Figure 16.1) for on‐the‐spot investigation along with the remote forensics protocol they have developed.


Figure 16.1 Bluepipe architecture (Gao et al. 2004).

The feasibility of using graphics processing units (GPUs) for accelerating the traditional digital forensic analysis process is explored in (Marziale et al. 2007). They note that the current generation of GPUs contains a large number of general‐purpose processors, in sharp contrast to previous generations of designs, where special‐purpose hardware units such as texture and vertex shaders were commonly used. This fact, combined with the prevalence of multicore general‐purpose central processing units (CPUs) in modern workstations, suggests that performance‐critical software such as digital forensics tools should be “massively” threaded to take advantage of all available computational resources.

Results from a number of experiments that evaluate the effectiveness of offloading processing common to digital forensics tools to a GPU, using “massive” numbers of threads to parallelize the computation, are presented in (Marziale et al. 2007). These results are compared to speedups obtainable by simple threading schemes appropriate for multicore CPUs, indicating that in many cases, the use of GPUs can substantially increase the performance of digital forensics tools.

(Roussev and Richard 2004) present the impact of evidence data size on analysis turnaround time. They evaluate the performance of the very popular commercial tool FTK by opening a case containing an old 6GB hard disk using the default options of the tool. During their study, FTK took approximately 2 hours to just open the case with the 6GB image. Using this time as the baseline, with a conservative assumption that the processing time grows linearly as a function of size, the authors conclude that it would take the state‐of‐the‐art commercial tool approximately 60 hours to simply open a case with a 200GB disk image. However, in reality, when they tested their estimation on an 80GB image, it took FTK over 4 days (96+ hours) just to open the image. Therefore, there are indications that the tool does not scale linearly with increasing sizes of disk images.
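Their scaling argument can be reproduced with a quick extrapolation from the 6GB baseline. The following sketch is a worked check of that arithmetic, not the authors' code:

```python
# Linear extrapolation of FTK case-open time from the 6GB baseline reported
# by Roussev and Richard (2004); a worked check of the arithmetic above.

BASELINE_GB, BASELINE_HOURS = 6, 2   # FTK took ~2 h to open a 6GB image

def linear_estimate_hours(image_gb: float) -> float:
    """Case-open time if processing grew linearly with image size."""
    return BASELINE_HOURS * image_gb / BASELINE_GB

print(f"200GB (linear): {linear_estimate_hours(200):.0f} h")  # ~67 h; the paper rounds to ~60 h
print(f" 80GB (linear): {linear_estimate_hours(80):.0f} h")   # ~27 h expected
print(" 80GB (observed): 96+ h")                              # over 4 days observed: super-linear growth
```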

Finally, (Roussev and Richard 2004) weigh in on the long‐standing debate on whether to adopt a generic distributed framework (GDF) for distributed digital forensics (DDF) purposes or to develop a more specialized solution. They conclude that a specialized solution is the better approach for the following reasons. First, specialized solutions are more amenable to optimization for a specific purpose and, hence, can achieve better performance with less overhead. Second, specialized solutions minimize requirements for preinstalled infrastructure on machines, enabling regular users to run the system with ease and minimal administrative overhead. Finally, GDFs expose specialized programming interfaces that demand considerable effort and experience from the operator.

In summary, they conclude that workstation‐class systems have been pushed to their fundamental processing and performance limits. Consequently, efforts focusing on task and resource optimizations will yield only marginal improvements, if any, in execution time.

16.3.1.1 XML Information Retrieval Approach to Digital Forensics (XIRAF)

(Alink et al. 2006) propose XIRAF, a prototype system for forensic analysis with an Extensible Markup Language (XML) based implementation aimed at managing and querying forensic traces extracted from digital evidence. XIRAF systematically applies forensic analysis tools to evidence files. Each forensic analysis tool produces output consisting of structured XML annotations capable of referring to regions in the corresponding evidence file. Such annotations are stored in a persistent back end, an XML database (DB), that can be queried at a later time. To query XIRAF's XML database, the authors developed a custom query tool based on XQuery, the standard XML query language.

XIRAF's XML‐based forensic analysis platform provides the forensic investigator with a powerful, feature‐rich query environment in which browsing, searching, and predefined query templates are all expressed as XQuery queries – XML DB queries. The authors address two key data‐processing problems that occur during the feature‐extraction and analysis phases of a computer system investigation:

  • Evidence quantity: Modern computer systems are routinely equipped with hundreds of gigabytes of storage, and a large investigation will often involve multiple systems, so the amount of data to process can run into terabytes. The amount of time available for processing this data is often limited (e.g. because of legal limitations). Also, the probability that a forensic investigator will miss important traces increases every day because there are simply too many objects to keep track of.
  • Evidence diversity: A disk image contains a plethora of programs and file formats. This complicates processing and analysis and has led to a large number of special‐purpose forensic analysis tools such as browser history analyzers, file carvers, file‐system analyzers, Internet relay chat (IRC) analysis tools, registry analysis tools, etc. While it is clear that the output of different tools can and should be combined in meaningful ways, it is difficult today to obtain an integrated view of the output from different tools. Furthermore, even if proprietary and commercial tools are approved and acceptable, it is highly unlikely that any forensic investigator would have the time and the knowledge to apply the relevant tools to the case and evidence at hand. Hence the authors propose their XIRAF framework, which has the following key properties: (i) clean separation between feature extraction and analysis; (ii) a single, XML‐based output format for all forensic analysis tools; (iii) an XML DB for storing the XML annotations; and (iv) a custom XQuery‐based tool for querying the analysis tools' XML output. (A minimal sketch of this annotation model follows the list.)
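The sketch below illustrates the annotation model: every tool writes findings in one XML format, annotations reference byte regions of the evidence, and the analyst later queries the combined store. XIRAF itself uses an XML database queried with XQuery; this sketch only approximates the idea with Python's ElementTree and XPath, and all element and attribute names are illustrative rather than XIRAF's actual schema.

```python
# Illustrative sketch of XIRAF-style XML annotations (not XIRAF's real schema).
import xml.etree.ElementTree as ET

store = ET.Element("annotations", evidence="disk01.dd")

def annotate(tool: str, kind: str, offset: int, length: int, **attrs: str) -> None:
    """Record one tool finding as an annotation over a region of the image."""
    ET.SubElement(store, "annotation", tool=tool, kind=kind,
                  offset=str(offset), length=str(length), **attrs)

# Different tools write to the same store in the same format.
annotate("browser_history", "url", 4096, 58, value="http://example.com")
annotate("file_carver", "jpeg", 1048576, 20480)

# Analysis phase: one integrated view across all tools' output.
for a in store.findall(".//annotation[@kind='jpeg']"):
    print(a.get("tool"), a.get("offset"), a.get("length"))
```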

Since December 2010, the Netherlands Forensic Institute has been using XIRAF – a service‐based approach for processing and investigating high volumes of seized digital material. Service‐based XIRAF has evolved significantly over the years and has become standard practice in hundreds of criminal cases involving over a thousand investigators, both in the Netherlands and in other parts of the world. The authors note the impact of the XIRAF system and the paradigm shift it is causing, having processed over a petabyte of data with it.

XIRAF was originally primarily aimed at identifying and developing techniques for automating (parts of) the data analysis process for forensics investigations. It was never meant to be an operational system for processing large volumes of data, and most definitely not data volumes in petabytes. Consequently, design considerations made during the development of XIRAF leave significant room for improvement.

16.3.1.2 Hansken: Digital Forensics as a Service (DFaaS) Successor to XIRAF

Hansken was well defined and designed from its inception, with a proof of concept (PoC) based on new principles and ideas, and a production version intended to replace XIRAF (Alink et al. 2006). The forensic drivers behind the design and development of Hansken have been to provide a service that processes high volumes of digital material in a forensic context while providing easy and secure access to the processed results. The Hansken forensics framework is designed around the following three drivers: (i) minimization of case lead time, (ii) maximization of trace coverage, and (iii) specialization of the people involved (Van Beek et al. 2015).

Processing seized material must be automated to provide the investigation team access to critical data. This impacts the way digital material is handled (Van Baar et al. 2014). Furthermore, the results of this automated process must be made available to the investigation team directly, rather than only to specialized digital investigators. To further speed up the investigation, analysts should be able to annotate or tag interesting traces, such as those that need further analysis or those that are not clear to the investigator who tagged them. Such annotations/tags should be available to other analysts so that the case can be solved through collaborative analysis.

The design of Hansken (Figure 16.2) supports distributed extraction of traces from images. XIRAF, the precursor to Hansken, applies multiple tools to a forensic image on a single machine. This is iterative in nature and hence does not scale well. Most importantly, XIRAF's design amounts to taking the data to the tools, since tools are applied sequentially, with each tool having dedicated access to the image. To overcome this limitation of sequential processing, Hansken's design was driven toward taking the tools to the data. Hansken uses distributed technology, making it possible to process one forensic image using multiple machines. As soon as data is read from the image, it is kept in memory and all tools are applied to it. Once a trace is fully processed, the results are stored in a database so it can be queried while other traces are still being extracted. This makes the first traces available in minutes, with more traces becoming available for querying continuously, minimizing idle time. (A minimal sketch of this extraction model follows Figure 16.2.)


Figure 16.2 Hansken architecture (Van Beek et al. 2015).
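The sketch below illustrates the “tools to the data” extraction model: each chunk of the image is read once, every tool is applied while the chunk is still in memory, and completed traces are stored immediately so they become queryable while extraction continues. The tool and store interfaces are illustrative assumptions, and Hansken itself distributes this work across many machines rather than across local threads.

```python
# A minimal, single-machine sketch of "take the tools to the data."
from concurrent.futures import ThreadPoolExecutor

TOOLS = []   # extraction callables: bytes -> list of trace dicts (assumed shape)

def process_chunk(offset: int, chunk: bytes) -> list:
    traces = []
    for tool in TOOLS:            # data is read once; every tool visits it in memory
        traces.extend(tool(chunk))
    return traces

def extract(image_path: str, store, chunk_size: int = 64 * 1024 * 1024) -> None:
    """Read the image chunk by chunk and hand finished traces to the store."""
    with open(image_path, "rb") as img, ThreadPoolExecutor() as pool:
        offset, futures = 0, []
        while chunk := img.read(chunk_size):
            futures.append(pool.submit(process_chunk, offset, chunk))
            offset += len(chunk)
        for future in futures:
            store(future.result())   # first traces become queryable within minutes
```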

Another key feature of Hansken is its data‐driven acquisition, such that analysts can start the process of extracting traces from a forensic image as soon as the first bits of a device are uploaded to the central system. To support this feature, the authors have designed an image format that splits the image data into encrypted blocks. Such a format supports processing unordered blocks, which makes it possible to implement dynamic pipelining where the extraction process influences the imaging process by asking for certain blocks of data to become available with priority.
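A minimal sketch of the prioritized, out‐of‐order block delivery this image format enables is shown below. The scheduler interface is an illustrative assumption, not Hansken's actual protocol:

```python
# Sketch of data-driven acquisition: independent (in Hansken's format,
# encrypted) blocks may arrive out of order, and extraction can request
# specific blocks with priority.
import heapq

class BlockScheduler:
    """Serve image blocks in priority order; extraction may bump priorities."""
    def __init__(self, total_blocks: int, default_priority: int = 100):
        self._heap = [(default_priority, b) for b in range(total_blocks)]
        heapq.heapify(self._heap)
        self._served = set()

    def prioritize(self, block_id: int, priority: int = 0) -> None:
        """Extraction found a lead and wants this block next (lower = sooner)."""
        heapq.heappush(self._heap, (priority, block_id))

    def next_block(self) -> int:
        while True:
            _, block_id = heapq.heappop(self._heap)
            if block_id not in self._served:   # skip stale duplicate entries
                self._served.add(block_id)
                return block_id

sched = BlockScheduler(total_blocks=1000)
sched.prioritize(512)              # e.g. a partition table points at block 512
assert sched.next_block() == 512   # the high-priority block is imaged first
```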

16.3.1.3 MPI MapReduce (MMR)

(Roussev et al. 2009) present three possible approaches for increasing the amount of forensic data that can be processed in a fixed amount of time. The first is the development of improved algorithms and tools that make better and more efficient use of available machine resources. The second is the deployment of additional hardware resources. The third is facilitating human collaboration, taking advantage of human expertise in problem solving. All three approaches are mutually independent and support large‐scale forensics in complementary ways.

The authors propose an open implementation of the MapReduce processing model that they call MPI MapReduce (MMR). The proposed MMR falls under the second category since it supports the use of additional hardware in the form of commodity distributed computational resources to speed up forensic investigations.

MMR's performance has been evaluated through a proof‐of‐concept implementation using two key technologies. The first is the Phoenix shared‐memory implementation of MapReduce. The second is the Message Passing Interface (MPI) distributed communication standard. In summary, MMR provides linear scaling for CPU‐intensive processing and super‐linear scaling for indexing‐related workloads.
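The data flow MMR implements can be illustrated with a toy MPI program: rank 0 scatters chunks of the evidence, every rank runs the map phase locally, and the results are reduced back to rank 0. MMR itself couples the Phoenix shared‐memory MapReduce with MPI in C; the mpi4py sketch below only mirrors the pattern (and, for simplicity, ignores keyword matches that span chunk boundaries):

```python
# Toy MapReduce-over-MPI keyword count, mirroring MMR's data flow.
# Run with: mpiexec -n 4 python mmr_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
KEYWORD = b"password"

if rank == 0:
    data = open("evidence.dd", "rb").read()     # path is illustrative
    step = len(data) // size
    chunks = [data[i * step: len(data) if i == size - 1 else (i + 1) * step]
              for i in range(size)]
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)                   # distribute map inputs
local_hits = chunk.count(KEYWORD)                      # map: per-rank scan
total = comm.reduce(local_hits, op=MPI.SUM, root=0)    # reduce: sum of counts

if rank == 0:
    print(f"total hits: {total}")
```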

16.3.1.4 GRR Rapid Response Framework

(Cohen et al. 2011) present the GRR Rapid Response framework (GRR), a new multiplatform, open source tool for enterprise forensic investigations. A key feature of GRR is its ability to support remote raw disk and memory access. GRR is designed to be scalable and takes a distributed approach to remote live access. However, it is not a cloud‐based solution; rather, it is a live forensics tool geared toward preserving volatile evidence. Yet another remote‐access technique is presented in (Cohen 2005). The advantage of this technique is that the client side is very simple, while the server side performs the complex forensic analysis.

A key challenge to automating analysis is that it may require executing many sequential steps. Current solutions create a dedicated console process that waits for the client to complete each step before issuing the next step. This limits scalability because the server needs to allocate resources for each client and wait until the entire analysis is complete. In GRR, the authors use state serialization to suspend execution for analysis processes for each client. These serialized forms are then stored dormant on disk until the client responds. Consequently, this approach resolves the problem of a resource drain imposed on servers. In GRR, such constructions are referred to as flows. A flow is simply a state machine with well‐defined serialization points where it is possible to suspend its execution.
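The following sketch illustrates the flow idea: a multi‐step analysis is modeled as a state machine whose state is serialized to disk between steps, so no server process sits idle waiting for a client. Class and field names are illustrative, not GRR's actual API:

```python
# Sketch of a serializable two-step flow (illustrative, not GRR's API).
import pickle

class ListHomeDirFlow:
    """Request a directory listing, then fetch files of interest."""
    def __init__(self, client_id: str):
        self.client_id = client_id
        self.state = "await_listing"

    def on_response(self, response):
        if self.state == "await_listing":
            self.targets = [f for f in response if f.endswith(".docx")]
            self.state = "await_files"
            return {"action": "fetch", "paths": self.targets}  # next client request
        if self.state == "await_files":
            self.state = "done"
            return None    # flow complete

def suspend(flow, path: str) -> None:
    """Serialize the flow; nothing stays in server memory while the client works."""
    with open(path, "wb") as f:
        pickle.dump(flow, f)

def resume(path: str, response):
    """Rehydrate the flow only when the client actually responds."""
    with open(path, "rb") as f:
        flow = pickle.load(f)
    return flow, flow.on_response(response)
```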

The architecture of GRR addresses auditing and privacy issues by allowing for nonintrusive automated analysis with audited access to retrieved data. However, it strives to achieve a balance between protecting access to user data and warranted, forensically sound analysis. It also provides a secure and scalable platform that facilitates employing a variety of forensic analysis solutions. The authors support the usefulness and practicality of GRR through the following four case studies: (i) investigation of intellectual property leaks, (ii) isolation of a targeted malware attack, (iii) discovery request compliance, and (iv) periodic snapshots of system states.

16.3.1.5 A Scalable File‐Based Data Store for Forensic Analysis

(Cruz et al. 2015) present a specific implementation of the GRR Rapid Response (GRR) framework (Cohen et al. 2011): a new data store back end (Figure 16.3) that can be used as a storage layer for the AFF4 Resolver. GRR's AFF4 Resolver stores AFF4 objects permanently in a NoSQL data store, enabling the application to deal only with high‐level objects. The proposed distributed data store partitions data into database files that can be accessed independently, enabling scalable distributed forensic analysis. Furthermore, the authors discuss utilizing the National Software Reference Library (NSRL) software reference database in tandem with their distributed data store to avoid wasting resources when collecting/processing benign files. The following two functionalities must be implemented by the data store in order to support an AFF4 Resolver.

  • Single‐object access: Simplifies the partitioning of data because operations never deal with multiple objects.
  • Support for synchronous and asynchronous operations: Synchronous operations block until the data store returns the results, while asynchronous operations are scheduled to be performed at some point in the future. GRR systems require synchronous operations to guarantee globally deterministic ordering; asynchronous operations improve program concurrency and provide a huge performance advantage, and hence are heavily used. (A sketch of both access modes follows Figure 16.3.)

Figure 16.3 GRR architecture (Cruz et al. 2015).
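A minimal sketch of these two access modes is given below; the interface names are illustrative assumptions, not GRR's actual data store API:

```python
# Sketch of a data store exposing blocking and queued (batched) writes.
import queue
import threading

class DataStore:
    def __init__(self):
        self._db = {}
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def set_sync(self, subject, attribute, value):
        """Blocks until applied; used where deterministic ordering matters."""
        self._db[(subject, attribute)] = value
        return value

    def set_async(self, subject, attribute, value):
        """Returns immediately; applied later by the background writer."""
        self._pending.put((subject, attribute, value))

    def _drain(self):
        while True:
            subject, attribute, value = self._pending.get()
            self._db[(subject, attribute)] = value
            self._pending.task_done()

    def flush(self):
        self._pending.join()   # barrier: wait for all queued writes to land
```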

The SQLite data store originally provided by GRR exhibits two limitations: (i) the capacity of each individual worker degrades as new workers are added to the GRR system, due to contention at the data store, which limits horizontal scaling; and (ii) since existing data stores rely on a central database server, storage on a single server can only grow to a certain extent, which limits storage scaling.

(Cruz et al. 2015) reason that these limitations stem from file‐lock contention at the central server. Therefore, they resolve the problem by completely dividing the AFF4 namespace into independent storage files, which mitigates the file‐lock contention. The benefits of their approach are evident in their validation results.
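The remedy can be sketched as deterministic hashing of each AFF4 URN onto one of many independent database files, so writers contend only for the single partition they touch. The partition count, layout, and schema below are illustrative assumptions:

```python
# Sketch of namespace partitioning across independent SQLite files.
import hashlib
import os
import sqlite3

NUM_PARTITIONS = 64
os.makedirs("datastore", exist_ok=True)

def partition_for(urn: str) -> str:
    """Map an AFF4 URN deterministically to one partition file."""
    digest = hashlib.sha1(urn.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
    return f"datastore/part_{index:03d}.sqlite"

def write_attribute(urn: str, attribute: str, value: bytes) -> None:
    # Each partition is self-contained: lock scope is one file, not the store.
    with sqlite3.connect(partition_for(urn)) as db:
        db.execute("CREATE TABLE IF NOT EXISTS aff4 (urn TEXT, attr TEXT, val BLOB)")
        db.execute("INSERT INTO aff4 VALUES (?, ?, ?)", (urn, attribute, value))

write_attribute("aff4:/C.1000/fs/os/home", "st_size", b"4096")
```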

16.3.1.6 Forensics‐as‐a‐Service

(Wen et al. 2013) propose a domain‐specific cloud environment that leverages the emerging trend of service‐based computing to support forensic applications. The proposed cloud‐based forensics framework (Figure 16.4) is specifically designed for dealing with large volumes of forensic data. Furthermore, their approach enables the sharing of interoperable forensic software and provides tools for forensic investigators to create and customize forensic data‐processing workflows. The authors have conducted experiments using their forensic cloud framework on Amazon's Elastic Compute Cloud (EC2) service.


Figure 16.4 Digital forensic‐as‐a‐service software stack (Wen et al. 2013).

The experimental infrastructure is based on Hadoop 0.20 and HBase 0.20 and is managed by Cloudera (www.cloudera.com). For evaluations, the workloads are parallelized, and the results show that their approach can reduce forensic data analysis time considerably. They also argue that the overhead for investigators to design and configure complex forensic workflows is greatly minimized. Finally, they claim their proposed workflow management solution can save up to 87% of analysis time in the tested scenarios.

16.3.1.7 Data Deduplication Driven Acceleration of Forensic Analysis

(Scanlon 2016; Wolahan et al. 2016) present a unique perspective on combating the digital forensic backlog. The proposed method explores a data deduplication framework that eliminates redundancy in the reacquisition, storage, and analysis of previously processed data. The authors' primary objective is to design a system that alleviates some of the backlog by minimizing duplicated effort, while providing a number of enhancements over the functionality available with the traditional alternative. (Wolahan et al. 2016) explore alternatives to the traditional evidence‐acquisition model by using a forensic data deduplication system. This work also presents the advantages of a deduplicated approach along with preliminary results from a prototype implementation.
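The core idea can be sketched as hash‐based filtering during acquisition: files whose hashes are already known, either from previously processed cases or from a benign reference set such as the NSRL, are skipped rather than reacquired and reanalyzed. The known‐hash sets below are stand‐ins for real databases:

```python
# Sketch of hash-based deduplication during acquisition.
import hashlib
from pathlib import Path

known_benign = set()      # e.g. hashes loaded from an NSRL reference data set
already_acquired = set()  # hashes seen in previously processed cases

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def acquire(root: Path):
    """Yield only files that actually need acquisition and analysis."""
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        digest = sha256(path)
        if digest in known_benign or digest in already_acquired:
            continue                 # redundant: skip reacquisition and storage
        already_acquired.add(digest)
        yield path, digest
```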

16.3.2 Limitations in State‐of‐the‐Art Research and Tools

The speed of digital forensic evidence acquisition is traditionally limited by two main factors: (i) the read speed of the storage device being acquired and (ii) the write speed of the system the evidence is being acquired to. None of the key research on distributed and parallel forensic analysis discussed here has addressed this issue. The researchers assume that the data used is collected from different sources in a distributed way, including using the Cloud during acquisition. This is not a realistic assumption.

Typically, first responders collect and image all of the evidence, and only then is it uploaded to the Cloud. Unless multiple systems are being imaged, using the Cloud for acquiring evidence images does not yield better results, due to the system I/O limitations noted previously. Additionally, the authors have not tested their frameworks on disk images without ground truth. Processing an image whose ground truth is known, with a workflow tailored accordingly, will naturally yield good results. Therefore, the claims of process efficiency and increased speed are questionable.

In (Cohen et al. 2011), the proposed system provisions remote access to networked systems. However, the tool is specifically designed for remote live forensics of the networked systems. In (Wen et al. 2013), the authors note that the data they use for experiments is collected from different sources in a distributed way using the Cloud. They further note that the forensic data manager provides support for uploading evidence files to the Cloud. However, the upload time for evidence files is not considered when evaluating the performance of their framework. Therefore, the increased speed they report in their results does not truly reflect the actual increase in speed, since uploading the evidence files is one of the most time‐consuming steps in forensic analysis.

Another key area that has not been addressed is the difficulty in merging analysis results from various tools into a single case report. Note that frameworks (Van Beek et al. 2015) that facilitate execution of various tools on the evidence images need a streamlined approach for consolidating the output into a meaningful analysis report.

16.3.3 Cloud‐Based Forensics‐as‐a‐Service (FaaS)

16.3.3.1 Security and Privacy Requirements

FaaS service providers must assure all stakeholders – suspects, victims, judge, and jury – that their implementation and the operations of FaaS meet the regulatory standards for security and privacy of data and the integrity requirements of the forensics processes. A FaaS service provider is expected to assure its stakeholders of the three core security requirements – confidentiality, integrity, and availability – with regard to case and evidence data security and privacy. The service provider must ensure that resource pooling in a multitenant environment does not risk the fundamental requirements of the security triad.

Confidentiality of case‐relevant information and evidence data is a key requirement. The service provider must ensure that appropriate control mechanisms (Figure 16.5) are in place to prevent unauthorized access and accidental or intentional data disclosure or leaks, either during or after the case analysis. Furthermore, any confidentiality violation can have a domino effect, resulting in secondary violations of the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), etc. Similarly, third‐party tools must be thoroughly vetted to detect any potential data leaks.


Figure 16.5 Security enforcement in FaaS.

The integrity of case‐relevant data is of even greater significance in the realm of computer forensic analysis. The FaaS provider must enforce well‐established and tested integrity controls to counter any risk of accidental or intentional alteration of case information and, more importantly, evidence data. Failure to implement strong integrity‐preserving security mechanisms can be catastrophic to digital investigations, with the potential of rendering all evidence data inadmissible.
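As a minimal illustration of such a control, the sketch below hashes evidence on ingest and re‐verifies the digest before and after each analysis session; the manifest layout is an illustrative assumption, not a prescribed FaaS interface:

```python
# Sketch of a basic integrity control: hash on ingest, verify on access.
import hashlib
import json

def digest_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def record_on_ingest(evidence_path: str, manifest_path: str) -> None:
    """Write the reference digest when evidence first enters the FaaS store."""
    manifest = {"evidence": evidence_path, "sha256": digest_file(evidence_path)}
    with open(manifest_path, "w") as f:
        json.dump(manifest, f)

def verify(evidence_path: str, manifest_path: str) -> bool:
    """Re-verify before and after each session; a mismatch means quarantine."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return digest_file(evidence_path) == manifest["sha256"]
```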

Finally, case information and evidence data should be available whenever authorized users need access. Though non‐availability is not a critical security concern for the investigation, it can affect it indirectly: downtime delays analysis, and information discovered late that would warrant additional seizures may come too late, after that evidence has been irreversibly destroyed. Also, at the completion of the analysis, there must be proper procedures for backup and archiving to ensure the availability of case‐relevant data for future (re)appeals or other legal purposes.

16.3.3.2 Regulatory and Legal Requirements

Compliance in the realm of information security is a fundamental requirement. A majority of enterprise forensics investigations involve noncompliance matters concerning employees or the employer. Forensics investigations can span the whole spectrum of possibilities – from enterprise policy violations to insider threats, harassing e‐mails to cyberstalking, robbery to vandalism, and suicide to homicide. Digital forensics investigations should comply with the following key requirements:

  • There must be strict control over the cloud infrastructure and resources, ensuring consistency in jurisdiction and applicable laws of the FaaS platform.
  • The FaaS platform and the entire analysis process must be monitored and logged at appropriate granularity, enabling audits by a neutral, trusted third party. The logs must themselves be secured such that the auditing third party does not gain access to any sensitive information, such as personally identifiable information (PII), during the course of the audit. (A sketch of a tamper‐evident, PII‐redacting audit log follows this list.)
  • All methods, tools, and techniques must be validated and approved by appropriate government authorities. One of the key approvals often comes from the National Institute of Standards and Technology (NIST) Computer Forensics Tool Testing (CFTT) program. Failure to prove the integrity and reproducibility of the process would render all efforts futile in a court of law.
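The logging requirement above can be sketched as a hash‐chained audit log with field‐level redaction: every entry commits to its predecessor, so any tampering is detectable by a third‐party auditor, while obvious PII fields are redacted before they are written. Field names and the redaction list are illustrative assumptions:

```python
# Sketch of a tamper-evident (hash-chained), PII-redacting audit log.
import hashlib
import json
import time

REDACTED_FIELDS = {"ssn", "dob", "victim_name"}   # assumption: site-specific

class AuditLog:
    def __init__(self):
        self.entries = []
        self.head = "0" * 64            # genesis value of the hash chain

    def append(self, actor: str, action: str, details: dict) -> None:
        safe = {k: ("<redacted>" if k in REDACTED_FIELDS else v)
                for k, v in details.items()}
        entry = {"ts": time.time(), "actor": actor, "action": action,
                 "details": safe, "prev": self.head}
        self.head = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.entries.append(entry)

    def verify(self) -> bool:
        """Auditor recomputes the chain; any edit breaks every later link."""
        head = "0" * 64
        for e in self.entries:
            if e["prev"] != head:
                return False
            head = hashlib.sha256(
                json.dumps(e, sort_keys=True).encode()).hexdigest()
        return self.head == head
```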

16.3.3.3 Design Requirements

Some of the key requirements for designing a parallel, distributed digital forensics toolkit framework are delineated next:

  • Modular: Since the entire forensic analysis is a complex process, a modular design facilitates a systematic breakdown of the complex process. Subsequently, tools and techniques can be developed for smaller tasks at a level of granularity and abstraction that supports the case hypothesis. A modular design enables flexibility and extensibility, two key requirements to cope with an evolving technology and threat landscape. It also enables rapid development of newer tools and their easy integration into the master tool framework.
  • Scalable: The architecture of the FaaS should be capable of scaling well with increasing numbers and sizes of cases and associated evidence. An increasing workload should not compromise resource allocation and execution capabilities. A digital forensic analysis process is scalable if it can keep the average time per investigation constant in the face of growing target sizes and diversity (Roussev and Quates 2012).
  • Platform independent: FaaS should be able to handle forensics tools independent of the tools' needs for a specific hardware/software platform. Furthermore, for the FaaS framework, it should be possible to pool the machine resources of a group of investigators working on the same case, to speed up the processing of critical evidence (Roussev and Richard 2004).
  • Extensible: Cloud‐based FaaS frameworks should be free of vendor lock‐in for functionality and capability expansion. This is critical for keeping FaaS relevant and capable as technology needs and caseloads evolve. Note that this is a standard software engineering requirement and mandates that it should be easy to add new functions or replace existing ones (Roussev and Richard 2004).

16.3.3.4 Benefits of Provisioning Cloud‐Based Forensics‐as‐a‐Service

By migrating the computer forensic analysis process to the cloud, the digital forensic science discipline will experience a broad spectrum of benefits. The first and foremost benefit will be more efficient utilization of limited manpower with required skills. This would also mean improved consistency in results from forensic analysis. Since the Cloud already offers metered services, migrating the forensic analysis process to the cloud will result in improved resource utilization while minimizing costs. Since the Cloud as a computing platform is ubiquitous and widely accessible, it enables better interagency and intra‐agency information and resource sharing. Furthermore, FaaS will offer consistent analysis platforms and resource allocation through an established baseline. Finally, the most important benefit of FaaS will be provisioning accreditation and certification bodies with convenient access to tools and processes for validation and certification.

16.4 Conclusion and Future Research Direction

Current trends in computing and communications technologies are putting vast amounts of disk storage and abundant bandwidth in the hands of ordinary computer users. These trends have long surpassed the capabilities of traditional workstation‐based platforms for computer forensics. The existing body of work offers ample evidence of the limitations of the current generation of tools and technologies, examined from different perspectives. However, timely processing of digital data remains fundamental to computer forensic analysis. Consequently, large‐scale distributed computing resources, coupled with the flexibility to customize forensics processing, are critical.

There have been some initial attempts to use parallel and distributed computing paradigms to address the plethora of challenges faced by computer forensics analysts. (Roussev et al. 2009) developed MPI MapReduce (MMR) as an alternative to Hadoop and demonstrated that the basic building blocks of many forensic tools can be efficiently realized using the MapReduce framework. Nonetheless, the true power of cloud computing has yet to be fully explored through a ubiquitous Forensics‐as‐a‐Service platform. The future of accelerating digital forensic analysis, keeping pace with ever‐evolving technology and the complexities of computer forensic analysis, lies inevitably in the direction of parallel and distributed computing. In particular, the ubiquitous and plentiful resources available in the Cloud are the most promising option to alleviate most – if not all – of the problems currently faced.

References

  1. Alhamazani, K., Ranjan, R., Jayaraman, P.P. et al. (2015). Cross‐layer multi‐cloud real‐time application qos monitoring and benchmarking as‐a‐service framework. arXiv preprint arXiv:1502.00206.
  2. Alink, W., Bhoedjang, R., Boncz, P.A., and de Vries, A.P. (2006). XIRAF – XML‐based indexing and querying for digital forensics. Digital Investigation 3: 50–58.
  3. Ayers, D. (2009). A second generation computer forensic analysis system. Digital Investigation 6: S34–S42.
  4. Beebe, N.L. and Clark, J.G. (2007). Digital forensic text string searching: improving information retrieval effectiveness by thematically clustering search results. Digital Investigation 4: 49–54.
  5. Birk, D. and Wegener, C. (2011). Technical issues of forensic investigations in cloud computing environments. In: 2011 IEEE Sixth International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE), 1–10. IEEE.
  6. Ball, C.E. (2017). Swaying the jury: the effect of expert witness testimony on jury verdicts in rape trials. Senior Capstone thesis. Arcadia University.
  7. Chen, T. and Robert, J.‐M. (2004). The evolution of viruses and worms.
  8. Chen, Z., Han, F., Cao, J. et al. (2013). Cloud computing‐based forensic analysis for collaborative network security management system. Tsinghua Science and Technology 18 (1): 40–50.
  9. Cohen, M. (2005). Hooking io calls for multi‐format image support. http://www.sleuthkit.org/informer/sleuthkit‐informer‐19.txt.
  10. Cohen, M., Bilby, D., and Caronni, G. (2011). Distributed forensics and incident response in the enterprise. Digital Investigation 8: S101–S110.
  11. Cohen, M., Garfinkel, S., and Schatz, B. (2009). Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow. Digital Investigation 6: S57–S68.
  12. Cruz, F., Moser, A., and Cohen, M. (2015). A scalable file based data store for forensic analysis. Digital Investigation 12: S90–S101.
  13. Drozhzhin, A. (2016). Adwind malware‐as‐a‐service hits more than 400,000 users globally. Kaspersky Lab.
  14. Dykstra, J. and Sherman, A.T. (2012). Acquiring forensic evidence from infrastructure‐as‐a‐service cloud computing: exploring and evaluating tools, trust, and techniques. Digital Investigation 9: S90–S98.
  15. Ekanayake, J. and Fox, G. (2009). High performance parallel computing with clouds and cloud technologies. In: International Conference on Cloud Computing, 20–38. Springer.
  16. FBI‐CART (2013). Piecing together digital evidence – the computer analysis response team. http://www.fbi.gov/news/stories/2013/january/piecing‐together‐digital‐evidence.
  17. Gao, Y., Richard, G.G., and Roussev, V. (2004). Bluepipe: A scalable architecture for on‐the‐spot digital forensics. International Journal of Digital Evidence (IJDE) 3.
  18. Gionta, J., Azab, A., Enck, W. et al. (2014). Seer: practical memory virus scanning as a service. In: Proceedings of the 30th Annual Computer Security Applications Conference, 186–195. ACM.
  19. Grier, C., Ballard, L., Caballero, J. et al. (2012). Manufacturing compromise: the emergence of exploit‐as‐a‐service. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, 821–832. ACM.
  20. Grispos, G., Storer, T., and Glisson, W.B. (2012). Calm before the storm: the challenges of cloud computing in digital forensics. International Journal of Digital Crime and Forensics (IJDCF) 4 (2): 28–48. https://arxiv.org/pdf/1410.2123.pdf.
  21. Hitchcock, B., Le‐Khac, N.‐A., and Scanlon, M. (2016). Tiered forensic methodology model for digital field triage by non‐digital evidence specialists. Digital Investigation 16: S75–S85.
  22. Lee, J. and Hong, D. (2011). Pervasive forensic analysis based on mobile cloud computing. In: 2011 Third International Conference on Multimedia Information Networking and Security (MINES), 572–576. IEEE.
  23. Lee, J. and Un, S. (2012). Digital forensics as a service: A case study of forensic indexed search. In: 2012 International Conference on ICT Convergence (ICTC), 499–503.
  24. Lemos, R. (2010). Criminals 'go cloud' with attacks‐as‐a‐service. Technical report, University of Zurich, Department of Informatics.
  25. Liebrock, L.M., Marrero, N., Burton, D.P. et al. (2007). A preliminary design for digital forensics analysis of terabyte size data sets. In: Proceedings of the 2007 ACM Symposium on Applied Computing, 190–191. ACM.
  26. Marty, R. (2011). Cloud application logging for forensics. In: Proceedings of the 2011 ACM Symposium on Applied Computing, 178–184. ACM.
  27. Marziale, L., Richard, G.G., and Roussev, V. (2007). Massive threading: using GPUs to increase the performance of digital forensics tools. Digital Investigation 4: 73–81.
  28. Messmer, E. (October 2013). Calm before the storm: The challenges of cloud computing in digital forensics. Network World.
  29. RCFL (2011). Annual report for fiscal year 2010.
  30. RCFL (2012). Annual report for fiscal year 2011.
  31. Reilly, D., Wren, C., and Berry, T. (2010). Cloud computing: Forensic challenges for law enforcement. In: 2010 International Conference for Internet Technology and Secured Transactions (ICITST), 1–7. IEEE.
  32. Richard, G.G. and Roussev, V. (2006a). Digital forensics tools: the next generation. In: Digital Crime and Forensic Science in Cyberspace, 75–90. IGI Global.
  33. Richard, G.G. and Roussev, V. (2006b). Next‐generation digital forensics. Communications of the ACM 49 (2): 76–80.
  34. Robinson, R.M. (2016). Cybercrime‐as‐a‐service poses a growing challenge. https://securityintelligence.com/cybercrime‐as‐a‐service‐poses‐a‐growing‐challenge.
  35. Roussev, V. and Quates, C. (2012). Content triage with similarity digests: the m57 case study. Digital Investigation 9: S60–S68.
  36. Roussev, V. and Richard, G.G. (2004). Breaking the performance wall: The case for distributed digital forensics. In: Proceedings of the 2004 Digital Forensics Research Workshop (DFRWS) 94.
  37. Roussev, V., Wang, L., Richard, G., and Marziale, L. (2009). A Cloud Computing Platform for Large‐Scale Forensic Computing, 201–214. Berlin, Heidelberg: Springer Berlin Heidelberg.
  38. Scanlon, M. (2016). Battling the digital forensic backlog through data deduplication. In: 2016 Sixth International Conference on Innovative Computing Technology (INTECH), 10–14. IEEE.
  39. Sibiya, G., Venter, H.S., and Fogwill, T. (2012). Digital forensic framework for a cloud environment. IST‐Africa 2012.
  40. Sood, A.K. and Enbody, R.J. (2013). Crimeware‐as‐a‐service – a survey of commoditized crimeware in the underground market. International Journal of Critical Infrastructure Protection 6 (1): 28–38.
  41. Stevens, C., Malan, D., Garfinkel, S. et al. (2006). Advanced forensic format: An open, extensible format for disk imaging. In: Advances in Digital Forensics II, 13–27. Springer.
  42. Taylor, M., Haggerty, J., Gresty, D., and Lamb, D. (2011). Forensic investigation of cloud computing systems. Network Security 2011 (3): 4–10.
  43. Van Baar, R., Van Beek, H., and van Eijk, E. (2014). Digital forensics as a service: a game changer. Digital Investigation 11: S54–S62.
  44. Van Beek, H., Van Eijk, E., Van Baar, R. et al. (2015). Digital forensics as a service: game on. Digital Investigation 15: 20–38.
  45. Wei, R. (2004). A framework of distributed agent‐based network forensics system. In: Digital Forensic Research Workshop (DFRWS), 11–13.
  46. Wen, Y., Man, X., Le, K. et al. (2013). Forensics‐as‐a‐service (FaaS): computer forensic workflow management and processing using cloud. In: Fifth International Conferences on Pervasive Patterns and Applications, 1–7.
  47. Wolahan, H., Lorenzo, C. C., Bou‐Harb, E. et al. (2016). Towards the leveraging of data deduplication to break the disk acquisition speed limit. In: 2016 8th IFIP International Conference on New Technologies, Mobility and Security (NTMS), 1–5.
  48. Zawoad, S., Dutta, A. K., and Hasan, R. (2013). SecLaaS: secure logging‐as‐a‐service for cloud forensics. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, 219–230. ACM.
  49. Zawoad, S. and Hasan, R. (2012). I have the proof: providing proofs of past data possession in cloud forensics. In: 2012 International Conference on Cyber Security (CyberSecurity), 75–82. IEEE.
  50. Zawoad, S. and Hasan, R. (2013a). Cloud forensics: A meta‐study of challenges, approaches, and open problems. arXiv preprint arXiv:1302.6312.
  51. Zawoad, S. and Hasan, R. (2013b). Digital forensics in the cloud. Technical report, Defense Technical Information Center DTIC document. http://www.dtic.mil/dtic/tr/fulltext/u2/a590911.pdf.