13
Data Acquisition in the Cloud

Nhien‐An Le‐Khac1, Michel Mollema2, Robert Craig3, Steven Ryder4, and Lei Chen5

1University College Dublin, Dublin, Ireland

2Dutch National High Tech Crime Unit, Driebergen‐Rijsenburg, The Netherlands

3Walworth County Sheriff's Office, Elkhorn, WI, USA

4Europol, The Hague, The Netherlands

5Georgia Southern University, Statesboro, GA, USA

13.1 Introduction

The Cloud can be considered “a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, and services)” (Mell and Grance 2011). In the context of digital forensics, this could mean that, for example, a given collection of illicit material is stored on the servers of a cloud provider or across multiple cloud providers. Bear in mind that the cloud provider's country of operation does not necessarily allow conclusions about the geographic location of one or more of its servers, which could be hosted in multiple countries – not to mention that the original cloud provider may have subcontracted its services to other providers (Nishawala 2013). In practical terms, as far as gaining access is concerned, the illicit material stored by the suspect may as well be scattered in the clouds: law enforcement may be required to identify the various locations, issue specific and separate requests for mutual legal assistance, and then hope for a swift response from the various jurisdictions contacted (Dykstra 2013).

The challenges include, in order of the discovery of a suspect/suspected activity, first the identification of the use of cloud services for storage. This could be something obvious, such as a Dropbox account that is prominent on the desktop, or a Google Drive icon, even though provided by the manufacturer of the device; but it could also be a less obvious solution that is not immediately considered, such as using an e‐mail account to store data in, for example, unsent messages or drafts. If the use of cloud storage is suspected and a specific provider is identified, the next challenge is the location of any stored material and its identification. If the suspect does not cooperate, the challenges increase – to such an extent that non‐cooperation by a suspect as regards the provision of passwords has been made a criminal offense in a very pragmatic manner in some locations.

Once the cloud provider has been identified, cooperation from the provider is essential. This may involve a non‐judicial means such as a simple request for information: some providers are open to this, while others require a court order, and even pride themselves on not divulging any information about suspected clients in its absence. Cloud providers often overwrite their own data, or may remove it without involving law enforcement if it breaches their own terms and conditions, so speed is of the essence once cloud storage is suspected. The first step is a preservation order, forbidding the provider to delete or remove any content. As a follow‐up or at the same time, an order is issued for the production of information regarding all activity of the client, together with a request for access to or provision of the material stored by the suspect.

During cybercrime investigations in a data‐dense environment such as the Cloud, investigators are frequently sent to data centers to collect evidence from computer hosts located within these data centers. In an increasing number of cases, it is becoming difficult to locate the data center that houses the computer host of interest. While it is common to reach out to the hosting provider and ask for the data center’s address, there are situations in which it is not possible to contact the hosting provider ahead of time. There are also several reasons why a hosting provider cannot be trusted with the details of the involved computer host. For instance, an Internet service provider (ISP) could inform the user of the computer host prior to the investigator’s arrival. The user could then alter or remove evidence before the investigator collects or intercepts it. These so‐called non‐law‐enforcement‐friendly ISPs require a different approach.

So far, most of the research on cloud forensics has focused on challenges, theory models, forensic services, and process or forensic frameworks (Plunkett et al. 2015). There is very little research on data acquisition in the Cloud. Therefore, in this chapter, we tackle challenges related to forensic acquisition and analysis of artifacts in the Cloud. We first discuss different legal perspectives related to cloud service providers and data storage. Next, we describe how to locate the data center that houses the computer host of interest. We also propose an efficient approach to tackling this challenge: a new three‐phase guideline that builds on known techniques and combines them with investigative techniques. Finally, we show the forensic acquisition and analysis of a popular cloud storage platform: Amazon Web Services (AWS) S3. The preliminary result is promising and provides useful suggestions.

13.2 Background

13.2.1 Inside the Internet

A typical website such as cnn.com provides world news to its audience. Technically, a website like cnn.com has an Internet‐connected host behind it to provide its content to the Internet. Cnn.com is a domain name with an owner; this domain name is linked to the Internet Protocol (IP) address of the Internet‐connected host using the Domain Name System (DNS). DNS translates an easy‐to‐remember domain name into the IP address, which is more difficult to remember. The Internet‐connected host of cnn.com also has an owner, but not necessarily the same owner as that of the domain name cnn.com. The Internet connection, and the location (data center) of the host, can be owned by different entities.
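The DNS step described above can be illustrated with a minimal sketch. It assumes Python's standard socket module and the operating system's configured resolver; the function name resolve_addresses and its injectable resolver parameter are illustrative choices, not part of any tool discussed in this chapter.

```python
import socket
from typing import Callable, List


def resolve_addresses(domain: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IP addresses that DNS reports for a domain.

    The resolver argument is hypothetical plumbing: it defaults to the system
    resolver but can be replaced so the function can be exercised offline.
    """
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the textual IP address for both IPv4 and IPv6.
    records = resolver(domain, None)
    return sorted({record[4][0] for record in records})
```

Calling resolve_addresses("cnn.com") performs a live lookup, so the addresses returned depend on when and from where the query is made. The result identifies the Internet‐connected host only; it says nothing about who owns the domain name, the host, or the data center.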

Many hosting providers are very transparent in advertising their whereabouts. They enter the correct data in the regional Internet registry (RIR) database (https://www.nro.net/about/rirs) and provide contact and network details on their websites. They will disclose the correct subscriber information if they receive a court order. Other providers take less care in providing their host locations to the RIR database and have fewer details on their websites. This could be due to lack of proper administration and to save on costs. For instance, a small hosting company can save money by not having to answer phone calls, so it only allows contact via e‐mail. It may still disclose the correct information if it receives a court order. Finally, there are so‐called bulletproof hosts. These hosts advertise to their customers that they will not respond to abuse requests and law‐enforcement requests. They do not keep extensive logs about their customers, and they put incorrect data on their whereabouts in the RIR database. Usually they do not mention details about their data center locations on their websites. Some do not even have a website; they get new customers by word of mouth or by advertising in the cyber underground.

13.2.2 Law Enforcement Interventions in Cybercrime

Law enforcement investigators look at the Internet from a different perspective than normal users. They try to see where traces of evidence can be found. After this, physically locating and gathering the evidence is one of the most important steps the investigators are interested in. Law enforcement and other public cybersecurity organizations and private companies around the world conduct investigations on cyber‐related matters. For instance, these could be investigations into e‐mail spam by a security company, or into vulnerabilities on Internet‐connected devices. Usually cybercrimes are investigated by law enforcement organizations. Normally, the cybercrime investigation unit of the law enforcement agency is responsible for investigating the highest level of high‐tech criminal investigations. Typically, a perpetrator uses many server hosts to perform crimes. Within cyber investigations, all types of hosting companies are encountered. The strategy for locating data centers depends on the type of hosting company encountered, as mentioned earlier.

13.3 Data Center as a Source of Evidence

Investigators often find themselves on their way to data centers to collect relevant forensic evidence on Internet‐connected hosts. Based on the IP address, many of these hosts can be pinpointed to a data center's physical address (Nicolls et al. 2016). Contacting the involved hosting provider is usually enough to get the address details of the data center. Data centers are facilities where computer systems are housed and data is stored. Governments, universities, and large businesses typically have their own data centers. Commercial data centers provide hosting of websites and storage of large quantities of data. Nowadays, data centers are also used to provide cloud services, such as cloud storage or cloud computing. A data center can be as big as a large factory. A large data center consumes as much energy as a small town (Mittal 2014).

Data centers are complex facilities. Since they need power, cooling, and connectivity, a lot of infrastructure needs to be in place. Modern data centers allow for redundancy on all these aspects. Furthermore, a typical computer system in a data center is no typical desktop PC; most of the time, it is a 19‐inch‐wide server system. These server systems can have server‐specific hardware like SAS‐hard drives. A server can be connected to the power grid by one or more power supplies; it connects to the Internet or network with one or more network cables. Servers can host one operating system or several at the same time. Servers can also be interconnected to form clusters. These interconnected server systems can also connect to each other across the Internet to share resources for cloud services. The server systems are located in racks, which can hold over 40 server systems each.

Not every hosting provider can be trusted. These non‐law‐enforcement‐friendly ISPs cannot be contacted ahead of time. This could mean that the hosting company is not yet known to law enforcement, and therefore no objective assessment is available based on experience with the hosting company. After a first encounter with such a hosting company, it may be considered a trusted hosting provider from then on. Sometimes the hosting provider is the subject of the investigation or is thought to inform customers about law enforcement contacting the hosting provider about a specific host. This poses the risk of the customer being able to alter or remove evidence from the involved host. Such bulletproof hosts are non‐law‐enforcement‐friendly ISPs and require a different approach by law enforcement. They cannot be contacted ahead of time because of the chance they will contact their customers prior to the arrival of law enforcement officials. Investigators tend to use other methods like traceroute and WHOIS queries to find the location of the data center involved, but these have proven to be less accurate than required.

The term non‐law‐enforcement‐friendly ISP is sometimes used in combination with the term bulletproof host (BPH) (Bernaards et al. 2012). These BPH companies willingly provide services to Internet criminals to facilitate activities like hosting child pornography, e‐mail spamming, and distribution of malware. The term non‐law‐enforcement‐friendly ISP can also mean to a law enforcement officer that there is no knowledge yet about how cooperative the ISP is. So, it doesn't necessarily mean that the ISP is willingly facilitating criminals. A different example of a non‐law‐enforcement‐friendly ISP is when an ISP is very hesitant to cooperate with law enforcement and insists on informing customers if law enforcement asks questions about them. This could be due to a transparency policy, or out of political motivations. Sometimes the term non‐law‐enforcement‐friendly ISP is used for one specific occurrence, like WikiLeaks. The same ISP may be considered law enforcement friendly during other encounters.

13.4 Cloud Service Providers: Essential Requirements, Governance, and Challenges

We need to establish the status of cloud providers and the circumstances in which they operate. Equally relevant is the constantly changing nature of their operating environment, the changes that are ongoing as well as those that are coming, and the varying legislation they face. Once we have fully understood the scope and strategies in which cloud providers operate, we can then assess the similar environment for law enforcement – which will then lead to an analysis of where there are joint needs as well as competing ones. We then try to assess which needs will prevail.

13.4.1 Business Model

The business models used by providers of cloud services vary and are hard to summarize due to the vast range of services they provide. In general, they mainly provide Infrastructure‐as‐a‐Service (IaaS), Software‐as‐a‐Service (SaaS), and Platform‐as‐a‐Service (PaaS) (Grispos et al. 2012). For the purposes of this chapter, the focus will be on storage providers, which fall under the general concept of IaaS, and specifically on the consumer market rather than commercial or business‐oriented solutions. With regard to law enforcement activity, currently the majority of suspects using cloud services do so in a private capacity, rather than through business‐oriented solutions. The major providers of personal cloud storage, such as Apple, Dropbox, Amazon, Google, and Microsoft, operate on a global level.

Cloud storage is a massive business area. The biggest difference, compared to traditional data storage services, is the scale to which providers outsource their own services and act more as intermediaries, rather than full‐service providers using their own infrastructure. This brings with it a number of specific challenges, when taking into account the need or ability for private companies to preserve or access their own logs or infrastructure for forensic purposes. In addition, due to heightened public awareness concerning data retention and collection, companies are protecting themselves by ensuring that they are unable to provide logs or forensic evidence to law enforcement.

13.4.2 Legal Environment of Operations

The legal governance of cloud providers is a very new challenge (Ryder and Le‐Khac 2016). The field is nearly unique in its overlap with and often contradiction of legal compliance, interwoven with and dependent on various judicial jurisdictions. An assessment will be made of the specific areas of law that present unique challenges and the consequences these may have on the operations of a cloud provider.

13.4.2.1 Jurisdictional Issues

As already stated, and as is often accepted, cloud storage by definition is not bound by geographical boundaries. The client that uploads data has very little control over where the data is ultimately stored, either in its entirety or in parts. This did not concern consumers much until the Snowden revelations, which are ongoing. Since then, entire markets have arisen, ranging from offers that make a feature of letting consumers choose the geographical place of storage, and therefore the applicable jurisdiction from a legal perspective, to new offers such as hybrid‐cloud storage systems (http://www.fujitsu.com/global/services/hybrid‐cloud).

While most clients have been generally unaware of the differences in legal regimes that may apply to their data, depending on its storage location in a physical sense, the differences, especially as regards privacy and in that sense data protection (not from a security perspective, but from a personal data protection point of view) are vast.

Overall, the primary cloud storage providers are based in and/or storing their data in the EU, in the United States, or in other countries. Where a number of locations are chosen or available to a provider, as is the case for a significant number of them, the choice of location for the following analysis will follow the weakest‐link concept – depending, naturally, on whose perspective it is viewed from. Based on the applicability of US law, companies that operate in the US are under an obligation to comply with US‐issued court orders and warrants for all data under their control. In that sense, a reliance on the fact that other legislation is applicable to their other data centers may be a valid point, albeit a moot one, as it will not overrule US law and the obligation of the provider to supply the data ordered (In Re Grand Jury Proceedings the Bank of Nova Scotia 1984).

Further, the contrast between the EU and the US rests not only on market distribution but also on fundamental differences in the balance of powers between law enforcement and data protection or privacy rights: the EU takes a stronger stance on privacy, at the cost of law enforcement, while the reverse is true for the US, mainly as a result of the events of September 11, 2001 (Fuchs 2013). For the purposes of the following evaluation and comparison, we place countries in three groups: those under EU legislation, the US, and other countries.

The EU can be placed in one group, as the majority of legislation applicable is of an EU nature and consequently is applicable across the EU. The same applies to data‐transfer agreements with third states, which are concluded between the EU and a third state, and are binding on all of the EU's member states – just like, for example, mutual legal assistance agreements or extradition agreements, or any other agreement the EU concludes. These are all binding in their entirety on each of the individual Member States. Therefore, a specific examination per country is superfluous. The only exception, as far as EU legislation concerning law enforcement cooperation, and law enforcement specifically, is concerned, is the United Kingdom; where applicable, it will be highlighted separately, along with other deviations of interest (Ryder and Le‐Khac 2016).

The second group, the US, is even more homogeneous as regards its legislation and specifically as regards its law enforcement powers. This includes the perceived ability to have US courts issue US law enforcement near‐global mandates, regardless of the physical location of whichever virtual object or concrete physical device for storage or other use they desire to interact with or remove – as long as there is some connection to the United States (Hiller 2015).

The third group, the generically titled third states, is a catch‐all for assorted countries based on their lack of law enforcement actions and cooperation, for practical purposes. They are in essence bulletproof hosting providers, and may even specifically market themselves as being locations renowned for lack of cooperation – be it due to the lack of will of law enforcement, lack of ability, or even cooperation or tacit acceptance by the host state. On the one hand, this speaks for strong privacy guarantees. In practical terms, however, such providers will often be used purely for criminal activities, such as botnets, child abuse material and its dissemination, or open terrorism propaganda – all safe in the knowledge that the reach of law enforcement does not stretch to them or their users (Goncharov 2015).

As concerns data protection and the ability to outsource/subcontract cloud storage, users have the greatest control if they choose a company within the EU. This will prohibit outsourcing of storage and ensure that data stays outside of the physical jurisdiction of the US, which seems to be one of the major privacy concerns of the public at present. The trend can also be observed in the number of EU‐based cloud storage providers promoting themselves to US or other non‐EU based citizens, as well as EU citizens, with slogans such as “It's Better in Europe,” or moving their headquarters to the EU to avoid association with US law enforcement. This also applies to a number of US‐headquartered firms moving data centers to the EU to address concerns of data sovereignty, such as IBM, Google, Amazon, and VMware (Ryder and Le‐Khac 2016).

To avoid the perceived overreach of the US onto data stored in the cloud, a solely EU‐based cloud storage company, using its own infrastructure, is a definitive safeguard. As this includes not only criminal activities or considerations, but also those of businesses and corporations, we will most likely see the trend moving toward the EU being the preferred option for cloud storage; while boosting the market in the EU, this will have a detrimental impact on that of the US.

13.4.2.2 Permissibility of Encryption and Expectation of Privacy

The central aspect of this thesis as regards the likelihood of service providers moving to encryption naturally depends heavily on the legal permissibility of doing so. As argued earlier and further later in this chapter as regards possible countermeasures, outlawing encryption is an option that could be considered; but we believe it is a nonstarter based simply on the dramatic impact this would have on a civil transparent society – which would be made transparent by legislation. Equally, encryption is one of the essential manners in which businesses protect themselves against criminal activity. It is of specific relevance and interest to assess the current manner in which, from a law enforcement perspective, legislation exists to hinder, deter, or deal with the question of permissibility of encryption.

From a contextual practical perspective, encryption not only hinders swift law enforcement examination of seized media and online accounts or storage, but it also links to a need to gain access to e‐mails, media devices, etc. that are protected by passwords, especially when dealing with live systems. Pragmatic approaches exist most notably in the EU in the United Kingdom, but in general the consequence of the seizure of encrypted devices leads to frustration, resignation, and dramatic delays in the forensic and evidential analysis of seized media, or to clear clashes with the basic right against self‐incrimination.

13.4.2.3 Summary

The needs, requirements, and terms of operation of a cloud service provider, especially as regards the storage of data, can be simplified easily from the previous analysis and descriptions. They will also form the basis for benchmarking against proposals made to address the problem of law enforcement needs compared with those of cloud providers.

The first realization, as just discussed, is the fact that cloud services will continue to increase. The manner in which they increase will require a basis of security for providers to operate, which – based on the previous discussion and elaborated further later – is the need for clarity about the expectations providers can have about cooperation and demands from law enforcement, as well as the legal framework applicable to them.

13.5 Cloud Storage Forensics

Quick et al. (2014) describe their research on forensic analysis of different cloud storage providers such as Dropbox, Google Drive, and Microsoft SkyDrive. They used Windows‐based virtual machines (VMs) as their test machines and examined the random access memory (RAM) using the VM's memory file instead of live acquisitions of memory. They also used a control VM as a base. The researchers downloaded client‐side apps to interface with the cloud storage. These apps are designed to interact with the cloud storage and leave artifacts such as registry entries, specific log files, and folders created on the computers. The research showed artifacts left by the client software. There were also artifacts of cloud storage activity found in the user folder AppData. The authors also examined Internet history. They used the Internet Explorer, Mozilla Firefox, and Google Chrome web browsers when they did not use a client‐side app. Information about the use of cloud storage was found in the index.dat files and temporary Internet files. URLs were located, referencing transactions of cloud storage. Unencrypted passwords were located in the RAM analysis of SkyDrive. The authors found that crucial evidence might be stored in a cloud storage account that is not available on the computer itself. They therefore addressed collecting evidence from cloud storage directly. Their first step is to understand the focus of the investigation. The second step is to determine the legal authority to gain access to the cloud storage in question. The third step is to identify the cloud storage account, such as Dropbox. The fourth step is the actual collection of the evidence. They discuss using a VM on a host machine with Internet access, to protect the host from any malware. A packet‐capture tool such as Wireshark captures traffic between the VM and server, and they suggest using a screen‐capture tool to video‐record the process. The files are then downloaded onto the VM, and the VM is paused. In step five, the researchers conduct the analysis of the VM and packet captures. Step six reports the findings. The seventh step is to complete the backup files and reports.

Roussev et al. (2016) discussed the traditional acquisition of data on a client‐side computer. They acknowledge that using only client‐side data could leave out critical data. Their research used an alternative approach of acquiring evidence on the cloud storage side by using an application programming interface (API). They indicate that APIs are well documented and used by many application developers. They created a prototype called kumodd, with a command‐line tool and a graphical user interface (GUI) mode. The user's credentials (username and password) are still required to gain access to the cloud storage. A Python script uses the API to communicate with the cloud storage drive. Directly accessing the cloud storage drive gives access to the metadata and allows the contents to be ascertained by downloading them. The tool will also show revisions of files and download them. The authors acknowledge that this is a logical acquisition of files. Roussev et al. (2016) believe that this is a forensically acceptable way to acquire evidence without physical acquisition and is justified by current storage developments. The cloud side could have data divided up onto different servers or, with the advent of solid‐state drives (SSDs), use wear leveling (overwriting unallocated space) that makes recovering deleted data difficult.
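The idea of a logical, API‐based acquisition can be sketched as follows. This is not the prototype's code and not any provider's real API: the CloudClient interface, the RemoteFile record, and the acquire function are hypothetical names introduced only for illustration, and recording a SHA‐256 digest per downloaded revision is an added assumption about how an examiner might document the result.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical metadata record for one file in a cloud drive."""
    file_id: str
    name: str
    revisions: Tuple[str, ...]  # revision identifiers, oldest first


class CloudClient(Protocol):
    """Hypothetical wrapper around a provider API (an assumption, not a real SDK)."""
    def list_files(self) -> Iterable[RemoteFile]: ...
    def download(self, file_id: str, revision: str) -> bytes: ...


def acquire(client: CloudClient, out_dir: str) -> List[Dict[str, str]]:
    """Download every revision of every listed file and return a manifest."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    manifest = []
    for remote in client.list_files():
        for revision in remote.revisions:
            content = client.download(remote.file_id, revision)
            # Keep only the final path component so a remote name cannot
            # escape the output directory.
            safe_name = Path(remote.name).name
            local = target / f"{remote.file_id}_{revision}_{safe_name}"
            local.write_bytes(content)
            manifest.append({
                "file_id": remote.file_id,
                "name": remote.name,
                "revision": revision,
                "sha256": hashlib.sha256(content).hexdigest(),
                "path": str(local),
            })
    return manifest
```

Because only what the API exposes is collected, a sketch like this yields a logical acquisition: deleted data and provider‐side storage structures remain out of reach.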

Hale (2013) did research on Amazon Cloud Drive (ACD). Note that ACD is a different service than Amazon Simple Storage Service (AS3). AS3 is a storage service for the Internet that provides a web services interface so that users can store files and access them, whereas ACD is a consumer frontend that the user needs an Amazon account to access; there is also a pricing difference (Head in the Cloud 2014). ACD is closer to consumer cloud storage, similar to Dropbox. Hale's research was done using a web‐based interface and the desktop application for ACD. ACD is marketed as an online MP3 player and storage system. An Amazon account is created, and the account credentials are used to access the storage. According to Hale (2013), the browsing history files were the most forensically rewarding and left artifacts showing how the user interacted with the web‐based interface. Hale found within the web browser cache a specific file with useful information. These cache files hold the server response to getInfoById, which is issued after an upload or delete operation. The content of the cache files begins with the text getInfoByIdResponse, followed by a number of fields: File Name, Object ID, Amazon Customer ID, File Creation Date, File Last Updated Date, Cloud Path, File Size, and MD5. Hale's findings show that artifacts are left from a web‐based interface and can show dates and times of file transfers. As would be expected, numerous persistent artifacts were located when the desktop application was used. Artifacts were found in the registry and the application‐specific file AdriveNativeClientService.log.
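A first triage step for the cache artifact described above could look like the following sketch. It assumes that browser cache entries have already been exported from a forensic image as individual, uncompressed files; the function name find_acd_responses is illustrative, and the tolerance for leading whitespace is an assumption, since the exact on‐disk layout is not specified here.

```python
from pathlib import Path
from typing import List

# Marker text that, per the description above, opens the cached server response.
MARKER = b"getInfoByIdResponse"


def find_acd_responses(cache_dir: str) -> List[Path]:
    """Return exported cache files whose content begins with the marker text."""
    hits = []
    for path in sorted(Path(cache_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Files are opened read-only; only the first bytes are needed.
            with path.open("rb") as handle:
                head = handle.read(len(MARKER) + 16)
        except OSError:
            continue  # unreadable entries are skipped rather than treated as fatal
        # Assumption: allow leading whitespace before the marker.
        if head.lstrip().startswith(MARKER):
            hits.append(path)
    return hits
```

The files returned are candidates for manual review of the fields listed above (file name, object ID, dates, cloud path, size, and MD5); the sketch deliberately does not guess at how those fields are encoded.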

13.6 Case Study 1: Finding Data Centers on the Internet in Data‐Dense Environments

During investigations in a data‐dense environment such as cloud computing, cybercrime investigators are frequently sent to data centers to collect evidence from computer hosts located in those data centers. It is becoming increasingly difficult to locate the data center where the computer host of interest is running. While it is common to reach out to the involved hosting provider to ask for the data center’s address, there are situations in which it is not possible to contact the hosting provider ahead of time.

Law enforcement investigators are known for their experience with wiretaps to collect evidence. Although wiretapping could be an option, successfully wiretapping a server in a data center cannot be done without cooperation of the entity and access to the data center.

There are several reasons why a hosting provider might not be trusted with details of the involved computer host. For instance, an ISP might inform the user of the involved computer host prior to the investigator’s arrival. This user could then alter or remove evidence before the investigator could collect or intercept it.

Non‐law‐enforcement‐friendly ISPs require a different approach. Typically, an investigator will try to perform an online query like a WHOIS and/or a traceroute to find information about the host's physical location. These techniques have a relatively low rate of success. To address this problem of finding the (geo)location of the involved data center (Tillekens et al. 2016), this chapter proposes a method for law enforcement members that significantly increases the success rate of finding the correct data center location, up to 80%.

During the research, the techniques currently used were evaluated, together with other techniques found during the research. The evaluation focuses on several indicators, including accuracy and usability for law enforcement. Data analytics are performed on the results of both a questionnaire and the geolocation techniques currently used by European law enforcement.

Based on the results, a new three‐phase guideline is introduced, which builds on known techniques and combines them with investigative techniques. The preliminary result is promising and provides useful suggestions for when the data center cannot be accurately pinpointed. The recommended three‐phase guideline is more accurate and tailored for law enforcement purposes.

The following approach is used:

  1. To allow for better insight in the techniques used by law enforcement, a questionnaire is completed by the digital investigators.
  2. The results of this questionnaire are combined with the results of the literature survey to formulate an overall state of the art of the most promising geolocation techniques for law enforcement.
  3. The most promising techniques are then reviewed using mostly publicly available tools and resources. The review is done using the target IPs from a test set. The test set contains IPs that are already geographically located by their owners.
  4. A new procedure is proposed and tested using the same method involving the combination of the most accurate techniques.
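The review against a test set of owner‐located IPs can be sketched as a simple scoring routine. The names haversine_km and success_rate, and in particular the tolerance_km threshold that decides when a prediction counts as correct, are assumptions made for illustration; the chapter does not state which distance criterion was applied.

```python
import math
from typing import Dict, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two coordinates, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def success_rate(predicted: Dict[str, Coordinate],
                 ground_truth: Dict[str, Coordinate],
                 tolerance_km: float = 1.0) -> float:
    """Fraction of test-set IPs placed within tolerance_km of the owner-supplied position."""
    if not ground_truth:
        raise ValueError("test set is empty")
    hits = 0
    for ip, truth in ground_truth.items():
        guess = predicted.get(ip)  # a missing prediction counts as a miss
        if guess is not None and haversine_km(guess, truth) <= tolerance_km:
            hits += 1
    return hits / len(ground_truth)
```

Scoring each technique, and the combined procedure, against the same ground truth keeps the comparison consistent; the threshold should be chosen to match the precision an investigator actually needs to identify a single data center.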

13.6.1 Traditional Techniques

Normalizing the results of the questionnaire and the literature survey generates an overall state of the art. It gives us an overview of the most promising techniques. The techniques were selected based on the number of times the respondents and literature mentioned the technique as useful. The level of accuracy of the technique was also used to make this selection. An overview of the greatest number of times the technique was used and the highest level of accuracy per technique helped to formulate the top five most promising techniques. Relevant techniques used in this case study in comparison with our three‐phase guideline are described in the following subsections: (i) traceroute analysis; (ii) WHOIS analysis; (iii) open source intelligence; (iv) routing analysis; (v) hop analysis; and (vi) previous data reported.

13.6.1.1 Traceroute

Broadly, a traceroute can be run from a single location, from multiple locations, or from locations whose coordinates are known (landmarks). A traceroute provides several things: the number of hops needed to reach the target, the round trip time (RTT) to each hop, and the fully qualified domain name (FQDN) of each hop. The delays (RTTs) can help provide insight into how distant the target is from the hops.

13.6.1.2 WHOIS Analysis

For IP addresses assigned in EU countries, the RIPE database (https://www.ripe.net/manage‐ips‐and‐asns/db) stores details of the entity to which each address is assigned. A lookup in this database is also called a network WHOIS. It should typically provide the autonomous system (AS) number, name, address, e‐mail, and phone number of the entity to which the IP address is assigned.
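
A network WHOIS response is essentially a list of "key: value" lines, so its fields can be collected with a few lines of code. The following is a minimal sketch, not a full parser: the function name `parse_whois_object` and the sample record are assumptions made for illustration (the record uses the reserved documentation range 192.0.2.0/24 and is not real data), and continuation lines are ignored for simplicity.

```python
from collections import defaultdict


def parse_whois_object(text: str) -> dict[str, list[str]]:
    """Collect the 'key: value' lines of a WHOIS response into a dict of lists."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (starting with '%' or '#').
        if not line or line[0] in "%#":
            continue
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()].append(value.strip())
    return dict(fields)


# Hypothetical record for illustration only; not taken from a real database.
SAMPLE = """\
% made-up record
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET
descr:          Example Hosting B.V.
country:        NL
tech-c:         EX1234-RIPE
"""

record = parse_whois_object(SAMPLE)
print(record["netname"], record["country"])
```

Keeping every value in a list preserves attributes that occur more than once in a record, such as repeated descr lines.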

13.6.1.3 Open Source Intelligence

Open source intelligence is a technique of gathering as much knowledge on a specific target as possible. It involves extensive use of Internet search engines like Google. Normally, it is combined with a WHOIS to determine the entity the IP address is assigned to. It also involves visiting websites known to be closely tied to the targets. Relevant, previously unknown, online databases can also be part of this technique.

13.6.1.4 Routing Analysis

A routing analysis provides information about how IP address blocks (autonomous system numbers) are announced to the Internet. It also involves analysis of the peers of these IP blocks. Adjacent IP addresses in the same block typically are geolocated in the same datacenter, so it might be relevant to analyze them as well.
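
The idea of looking at adjacent addresses in the same block can be sketched with Python's standard `ipaddress` module. The helper names `same_block` and `neighbours` are hypothetical, and the addresses come from the reserved documentation ranges; this is an illustration of the reasoning, not a tool used in the case study.

```python
import ipaddress


def same_block(ip_a: str, ip_b: str, prefix: str) -> bool:
    """Return True when both addresses fall inside the announced prefix."""
    network = ipaddress.ip_network(prefix)
    return (ipaddress.ip_address(ip_a) in network
            and ipaddress.ip_address(ip_b) in network)


def neighbours(ip: str, count: int = 2) -> list[str]:
    """List up to `count` addresses directly below and above `ip`."""
    addr = ipaddress.ip_address(ip)
    return [str(addr + offset)
            for offset in range(-count, count + 1) if offset != 0]


# Documentation addresses, used here purely as placeholders.
print(same_block("192.0.2.10", "192.0.2.200", "192.0.2.0/24"))
print(neighbours("192.0.2.10", 1))
```

The neighbouring addresses returned this way are only candidates for further analysis; being in the same block does not by itself prove that two hosts share a data center.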

13.6.1.5 Hop Analysis

Hop analysis examines the FQDNs connected to the target IP address, or to the IP addresses on the route to the target. Part of this technique is also the last‐known‐hop approach. The hops (almost) neighboring the target are considered to be physically very close to the target and are thus interesting subjects for geolocation as well. Analyzing DNS records is also considered part of this technique.

13.6.1.6 Previous Data Reported

This approach queries police systems. Since law enforcement investigations are showing an increase in the use of digital evidence, the chances of finding relevant information on the target IP address's location are increasing, as well. In addition, relevant information, like Chamber of Commerce records, can be obtained through police systems.

13.6.2 Three‐Phase Approach

In this section, we introduce a new procedure. It uses combinations of the techniques just reviewed and tries to minimize the amount of data that must be gathered. Overall, this new method should be faster and easier to deploy than gathering all the data from the six techniques just described and combining them into an end result. It is a three‐phase guideline involving: (i) data gathering; (ii) answering questions based on the gathered results; and (iii) making choices.

13.6.2.1 Phase One: Data Gathering

The gathering of the required data is performed first. To avoid unnecessary online queries, first a check is performed to see if the target IP has been queried before. The following steps are performed:

  1. Query the RIPEstat web page or RIPE API (https://stat.ripe.net) to collect this info: country, autonomous system number (ASN), AS‐name, prefix, Inetnum: netname, Inetnum: descr, Reverse DNS: PTR record, BGPlay: AS‐numbers closest to target ASN (up to three), and Registry browser: tech‐c.
  2. Query police databases for the following: target IP address location, target IP block location(s), if available the corresponding date time(s), and if available the connected entity and its address details.
  3. Perform RIPE Atlas traceroute measurements using the web page or API. Try to use probes in the same ASN as the target or in its closest peers. Record the probe(s) that use the minimum number of hops to the target, the probe that has the minimum RTT to the target, the geocoordinates of these probes, the FQDN of the penultimate hop, and the ASN of the penultimate hop.
  4. Perform hop‐naming analysis to obtain the domain name of the host and hints in the FQDN referring to the location of the hops.
  5. Query peeringDB.com to obtain routing information for finding the web page of the ASN and extract peer facilities of the target ASN.
  6. Perform online open source intelligence (OSINT) queries to obtain the following: hits on “AS‐name” and “data center” and information on the hop hints found. Visit the website of the entity of the last‐known hop for the data center location info, visit the website of the host used by the target for data‐center location information, and visit the website running on the target IP (anonymously) for data‐center location information.
  7. Analyze all results: look for the same or similar address records and mark them as validated where applicable, convert found addresses to geocoordinates where necessary, compare the geocoordinates found, and plot them on a map, for instance using Google Maps.
  8. Compare the results with online records of data from different online sources such as http://www.datacentrumgids.nl/overzicht/nederland, www.datacentermap.com, http://www.telewiki.nl/Lijst_van_datacentra_in_Nederland_op_volgorde_van_plaatsnaam, http://map.ring.nlnog.net, Google Maps, and Yellow Pages.
  9. Generate an overview of obtained results, including target IP, owning entity including contact details, country of the target IP, police records, top‐x list of most‐probable locations of data centers including validation scores, and the degree of separation between the target IP and the suggested data center.
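
The "top‐x list of most‐probable locations including validation scores" in the last step can be illustrated with a simple count of how many techniques point to each data center. This is only a sketch under stated assumptions: the scoring rule (one point per independent technique), the function name `rank_candidates`, and the data center names are all hypothetical and are not specified by the guideline.

```python
from collections import Counter


def rank_candidates(suggestions: dict[str, list[str]],
                    top: int = 3) -> list[tuple[str, int]]:
    """Score each data center by the number of techniques that suggest it."""
    counts = Counter()
    for centers in suggestions.values():
        # A technique counts once per data center, however often it repeats it.
        counts.update(set(centers))
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]


# Made-up results of phase one, keyed by technique.
suggestions = {
    "whois": ["DC-Alpha"],
    "osint": ["DC-Alpha", "DC-Beta"],
    "hop_analysis": ["DC-Beta"],
    "routing": ["DC-Alpha", "DC-Gamma"],
}
ranking = rank_candidates(suggestions)
print(ranking)
```

A candidate confirmed by several techniques ends up at the top of the list, which mirrors the preference for multiple validations.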

13.6.2.2 Phase Two: Answering Questions Based on the Gathered Results

By answering four questions, this guideline aims to help the user make the right decisions in phase three of the guideline (Figure 13.1). The answers are specifically relevant to law enforcement to do the following:

  • Help to determine if they can claim jurisdiction.
  • Shed light on previous encounters with law enforcement.
  • Find multiple validations of the result, which is preferred. It tells us that when different sources/techniques come up with the same result, the result is of a higher level of accuracy than if the result was provided by one source.
  • Provide insight into whether further research or action is required to locate the target.

Figure 13.1 Flowchart of phases two and three of the three‐phase guideline.

With the obtained data, the following questions need to be answered:

  1. Is the target IP located in this country?
  2. Was the target IP visited before by law enforcement?
  3. Is the location of the target IP validated?
  4. Are extra options needed to locate the host?

13.6.2.3 Phase Three: Making Choices About What to Do Next

Depending on the outcome of the previously answered questions, decisions can be made:

  1. If Yes, proceed with question 2. If No, consider sending a legal request for assistance to the involved country.
  2. If Yes, this gives the highest level of validation. See if the location was visited recently and what the level of cooperation was. This knowledge could also change the status from non‐law enforcement friendly to law enforcement friendly. If No, proceed to question 3.
  3. If Yes, consider visiting the data center to perform digital forensics. If No, proceed to question 4.
  4. If Yes, the best option is to try to repeat step 3 from phase one for the cities where the suggested data centers are located. Redoing steps 6 and 8 of phase one to get a more detailed measurement could follow this step.
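
The four questions and the choices that follow from them form a short decision chain, which can be written down as a function. This is a sketch of the flow as described in the text, with a hypothetical function name `next_step`; the final fallback for a "No" on question 4 is an assumption, since the guideline as given does not describe that case.

```python
def next_step(in_country: bool, visited_before: bool,
              validated: bool, need_more_options: bool) -> str:
    """Walk through questions 1 to 4 and return the suggested action."""
    if not in_country:
        return "Consider sending a legal request for assistance to the country involved."
    if visited_before:
        return ("Highest level of validation: check how recent the visit was "
                "and what the level of cooperation was.")
    if validated:
        return "Consider visiting the data center to perform digital forensics."
    if need_more_options:
        return ("Repeat step 3 of phase one for the cities of the suggested "
                "data centers, then redo steps 6 and 8.")
    # Assumption: the text does not say what to do when question 4 is answered No.
    return "No further step is described by the guideline."


print(next_step(in_country=True, visited_before=False,
                validated=False, need_more_options=True))
```

Evaluating the questions strictly in order reflects that each "No" (or, for question 1, each "Yes") hands over to the next question.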

Although this new method is not flawless in pinpointing a single data center, it can serve additional purposes. For instance, if it cannot tell the exact location of the needed data center, it could still reduce the number of possible data centers to a workable amount. This could mean that digital investigators could visit two or three data centers at the same time to allow for control over the possible loss of evidence. This method could also give investigators indicators of which entity to contact for more information about the target IP. For instance, if the IP belongs to a client of a shady reseller, this reseller has most likely rented servers from a third party. This third party, an ISP, has its servers running in a data center belonging to another entity. The owner of this data center could be then contacted and asked for more information on the IP range, without revealing the target IP itself.

13.6.3 Experiments

A method is to be developed to pinpoint the location of data centers based on the IP address of a host. The method needs to be usable in data‐dense environments. Since this research is focusing on non‐law enforcement friendly ISPs, the method needs to be as undetectable as possible. The solution needs to be as accurate as possible; an error of 10 km or less is considered accurate. The solution needs to be easy to use and must work when only a limited time frame is available. It needs to be applicable for law enforcement so that its results can be used for court purposes. This means it should be repeatable and easy to explain in both reports and court, preferably by the investigators themselves.
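
Checking a result against the 10 km criterion requires the distance between an estimated location and the known location. One common way to compute this from geocoordinates is the haversine great‐circle formula; the sketch below uses it as an assumption, since the text does not say how distances were calculated. The function names and the example coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_accurate(estimate: tuple[float, float], truth: tuple[float, float],
                threshold_km: float = 10.0) -> bool:
    """Apply the 'within 10 km' criterion to an estimated location."""
    return haversine_km(*estimate, *truth) <= threshold_km


# Illustrative coordinates only, not locations from the test set.
print(round(haversine_km(52.0, 4.0, 52.05, 4.0), 2))
print(is_accurate((52.0, 4.0), (52.05, 4.0)))
```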

A suitable test set of hosts at data centers needs to be available to review the known available techniques. It was not possible to obtain a test set of target IP addresses that are known to belong to non‐law enforcement friendly ISPs. This is due to the confidentiality of police records. Thus another, more neutral test set was found.

13.6.3.1 Platform

The Netherlands Network Operators Group (NLNOG) Ring is a network of hosts distributed over 51 countries. It has a total of 418 nodes, of which 72 are located in The Netherlands. All of them are tagged with geocoordinates. The main purpose of the NLNOG Ring is to give network operators remote shell access on all these nodes, to allow for network testing.

In this chapter, the PlanetLab test bed was used. PlanetLab has both a global platform and an EU‐based platform. The global platform has 1,353 nodes at 717 sites. The EU‐based platform is part of this global test bed and has 288 nodes, only 7 of which are based in The Netherlands.

RIPE Atlas (https://atlas.ripe.net/about) is a global network of hardware devices called probes and anchors that actively measure Internet connectivity. Anyone can access this data via Internet traffic maps, streaming data visualizations, and an API. RIPE Atlas users can also perform customized measurements to gain valuable data about their own networks. Probe owners collect credits by hosting a probe; they can share or use these credits to perform their own measurements. RIPE also has other options for obtaining these credits. RIPE Atlas has 925 probes in The Netherlands. Due to privacy concerns, the exact location of each probe is obfuscated by up to 1 km. The probes are deployed in data centers as well as in domestic and business locations. This platform was chosen for several reasons, mainly because it has many probes in The Netherlands, but also because it has an API and can generate results in a computer‐readable format, which allows tasks to be automated. Since the platform is open to anyone, law enforcement is also allowed to use it. Due to its public nature, RIPE Atlas also makes the results of every measurement public. After a measurement has been performed, it can be set to private.

RIPEstat (https://stat.ripe.net/index/about‐ripestat) is a web‐based interface that provides information on the IP address space, ASNs, and related information for hostnames and countries. It uses several sources, including the RIPE database (WHOIS), databases of other RIRs, RIPE Atlas, and MaxMind. Since it can output in a computer‐readable format and has an API, it will be easier to automate these tasks.

The rest of the testing, which did not have to rely on delay measurements and real‐time telemetry, was performed on a regular computer with an Internet connection. For instance, a browser was used to perform the Google queries and to visit websites for OSINT purposes. Where necessary, a virtual private network (VPN) or Tor connection was invoked to allow for anonymization. Command‐line tools were also used on this computer.

13.6.3.2 Findings and Analysis

13.6.3.2.1 Traceroute

The minimum hop counts in this test were between one and eight. 80% of the tested targets have a hop count greater than four. The lowest measured RTTs to the targets are between 0.459 and 5.48 ms. Two out of 10 targets show a hop count of less than three when tested: target ID5 has a hop count of one, and target ID10 has a hop count of two. So although the NLNOG test set is handled by a different entity than the RIPE Atlas test bed, it appears that target ID5 and the RIPE Atlas probe are the exact same host. Interestingly, this probe reports a mean latency (RTT) of 3.535 ms, which corresponds to a distance of 176 km between the target and the probe.

Approaching the results from the perspective that all probes are landmarked, the distances from the probes to the targets are between 0.5 and 123.4 km. The two probes with the lowest hop count, for targets ID5 and ID10, have the lowest distance between probe and target, respectively 0.5 and 4.7 km. Considering the privacy obfuscation of up to 1 km for each probe, these two results are considered accurate enough for further investigative actions. Target ID3 has a distance to the probe of 14.3 km. Depending on the location in The Netherlands, for instance in a rural area with few data centers, this might be accurate enough. The mentioned Randstad area, where the target is actually located, has a need for better accuracy.

Although this technique is widely used by academia and the digital investigators of the LE agency, it does not prove to be highly accurate. This is where the variable of the data‐dense environment comes into play. The Dutch infrastructure is fast and complex at the same time. Even when we know that both the targets and probes are located in The Netherlands, the minimum hop count is sometimes still eight hops. And although the minimum RTTs are quite low, the corresponding distances are still between 18.3 and 273.8 km. Overall, among the tested traceroute techniques, landmark‐based tracerouting is the most accurate, with 20% of the results proven usable.
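
The relation between a measured RTT and an estimated distance can be made explicit. The text does not state the conversion that was used; the figures reported (for example, 5.48 ms against 273.8 km) suggest roughly 50 km per millisecond of RTT, and the sketch below adopts that value purely as an assumption. The function name `rtt_to_distance_km` is likewise hypothetical.

```python
# Assumption: about 50 km per millisecond of RTT, inferred from the reported
# figures rather than stated in the text.
KM_PER_MS_RTT = 50.0


def rtt_to_distance_km(rtt_ms: float, km_per_ms: float = KM_PER_MS_RTT) -> float:
    """Convert a round trip time in milliseconds to an estimated distance in km."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    return rtt_ms * km_per_ms


for rtt in (0.459, 5.48):
    print(f"{rtt} ms -> about {rtt_to_distance_km(rtt):.1f} km")
```

Because queuing and indirect routing add delay, such a figure is better read as an upper bound on the distance than as a precise estimate, which fits the observation that low RTTs still translate into large distances.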

13.6.3.2.2 WHOIS Queries

The second‐most‐used technique of digital investigators has a higher success rate: 50% of the targets are geolocated based on the RIPE database entries. This level of accuracy can be debated, though. Because of the nature of the test set, the hosts all need to be geolocated before they can be added to the NLNOG Ring. It is assumed that the owning entities have also paid extra attention to entering the correct details in the RIPE database. Results from the questionnaire state that in 30% of their encounters with non‐law enforcement friendly ISPs, the respondents had to deal with bad RIPE database records. This was the second‐most‐common difficulty the investigators mentioned. Overall, the WHOIS technique can provide accurate data up to 50% of the time; taking into consideration the nature of non‐law enforcement friendly ISPs, this percentage might drop.

13.6.3.2.3 OSINT

Since gathering WHOIS data, which could also be part of the OSINT process, was already done, the possible locations obtained from WHOIS records were omitted. For 90% of the targets, locations of possible data centers were found. The number of possible locations found ranged between one and four per target. Five of the targets (50%) had the correct data center location in the results. Two of the targets (20%) had information in the results that was not exactly geolocated, but within 4 km. One target (10%) had enough hints in the results to come up with an entity that has more than one data center in The Netherlands. The OSINT technique was more difficult to review; this is why the proposed method for OSINT in subsection 4.4.3 used limited queries to allow for better reviewing.

Open source gathering of intelligence is almost limitless in its options. There is no guideline available for an OSINT process for finding data centers belonging to target hosts. Every investigator uses their own set of OSINT sources to query, based on personal experience. Overall, the OSINT technique can provide accurate data up to 50% of the time; results depend on the OSINT skills of the investigator.

13.6.3.2.4 Hop Analysis

Hop analysis showed relevant hints in the results for 30% of the targets. The FQDN of the last hop before the target, the penultimate hop, was the source for these results. The hints were not sufficient to provide hard evidence. An example is 80ge.br3‐cr1.smartdc.rtd.i3d.net, where two hints – “smartdc” and “rtd” – were extracted. “rtd” could be short for the city of Rotterdam and “smartdc” for the SmartDC company that owns two data‐center facilities, one of which is located in Rotterdam. The last hop (the target) and the antepenultimate hop did not provide hints on the location in the FQDN. Overall, the hop‐analysis technique can provide indicative data, but typically not 100% accurate data.
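
Extracting hints from a hop FQDN amounts to splitting the name into its labels and comparing them with a list of known abbreviations. The sketch below illustrates this with the example name from the text; the hint table `LOCATION_HINTS` and the function `extract_hints` are hypothetical, and in practice the table would have to be curated by the investigator.

```python
import re

# Hypothetical hint table built from the example in the text.
LOCATION_HINTS = {
    "rtd": "possibly Rotterdam",
    "smartdc": "possibly the SmartDC company",
}


def extract_hints(fqdn: str, hints: dict[str, str] = LOCATION_HINTS) -> dict[str, str]:
    """Return the labels of an FQDN that match a known location hint."""
    # Split on dots and hyphens so that 'br3-cr1' yields 'br3' and 'cr1'.
    tokens = re.split(r"[.\-]", fqdn.lower())
    return {token: hints[token] for token in tokens if token in hints}


print(extract_hints("80ge.br3-cr1.smartdc.rtd.i3d.net"))
```

The output is indicative only: a matching label shows that a hint exists, not that the hop is actually located there.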

13.6.3.2.5 Routing Information

One target (10%) could be accurately pinpointed to a data center using routing information from peeringdb.com. A further 30% of the targets could be narrowed down to a few options, including the correct one, using the peeringdb.com source. 20% of the targets did not have data entered in the peeringdb.com database, so no geolocation was possible using this resource. 40% had results, but the correct data center was not among them. Overall, the routing technique provided accurate data up to 10% of the time.

13.6.3.2.6 Previous Data Reported

None of the target IP addresses appeared in police databases. But 30% of the target classless inter‐domain routing (CIDR) prefixes appeared in police systems. Unfortunately, none of them had a report about a visit to the involved data center. Nor was there any reporting of contact with the entity the IP block belonged to. During an assessment of the overall trustworthiness of this entity, no relevant data was found in police systems about the target. This could be due to the fact that the test set did not consist of known non‐law enforcement friendly ISPs. This technique may prove successful in the future, though. Police systems can be queried more extensively: for instance, Europol databases can be added to these queries. It is basically a cycle: if police officers report all encounters with an IP address correctly into police databases, eventually more hits will be generated when the geolocation of an IP is queried. If better reporting on previous geolocation efforts is done in police systems, including the results of visits to data centers, this technique may show better results when investigating real non‐law enforcement friendly ISPs.

13.6.3.2.7 Three‐Phase Approach

Using the three‐phase guideline, 80% of the targets can be accurately located to a specific data center. The two added techniques in phase one (comparison with online databases with data center locations) and phase three (using new measurements from RIPE Atlas probes) are responsible for the improvement. The remaining 20% of the targets, ID4 and ID10, did get more suggestions about which data center(s) to look at. Interestingly, the most difficult target appears to be ID10. Based on the techniques used, it appears to be located in a data center 4.7 km from the real location provided by the NLNOG Ring test‐set host. The real NLNOG Ring test‐set location corresponds with the headquarters of the target ID10. The RIPE Atlas probe, which appears to be geographically closest to target ID10, is owned by the same entity as the target host, but located at a university at the given distance. This could mean the measurement is not correct, but it could also mean the entered location of ID10 is not correct (anymore). ID4 gets two suggestions for its data center location; they are geographically close to each other, about 4 km apart. In this scenario, an option could be to visit both data centers simultaneously. This allows the investigators to look for the exact location of the target host. By visiting the data centers at the same moment, the investigators have control over the possible altering or removal of evidence, by keeping an eye on the personnel of the data centers and not allowing them to communicate with customers about the investigators' presence. The guideline helps investigators remember which data to gather and helps to avoid errors while gathering data.

Both the state of the art and the review of available techniques show that combining techniques is the best approach to pinpoint data centers. Only one of the researched six techniques provided no results. This was the previous‐data‐reported technique. Querying police systems did not give the desired results. The other five techniques combined provide accuracy up to 70%. This included at least one validation. The three‐phase guideline helps investigators with a workflow for gathering all the data necessary to obtain at least the previously mentioned 80% accuracy. The guideline prevents investigators from forgetting relevant steps. It also helps to avoid unnecessary steps during the geolocation process and guides investigators through a decision process to suggest possible next steps. The guideline is easy to use and not necessarily meant for digital investigators alone. Regular investigators with some knowledge of online investigations will be able to pinpoint data centers as well, with this guideline in hand. The guideline can also be used to pinpoint geographically dispersed data centers of cloud service providers. This could be relevant to a broader audience with interest in where their data resides (Table 13.1).

Table 13.1 Accuracy of reviewed techniques/methods.

Technique reviewed               Accuracy (%)
Traceroute                       20
WHOIS queries                    50
OSINT                            50
Hop analysis                     N/A
Routing information              10
Previous data reported           N/A
Toward proposal, combinations    70
Three‐phase guideline            80

13.7 Case Study 2: Cloud Forensics for the Amazon Simple Storage Service

Cloud storage services are often free to customers. The customer just needs to sign up with an e‐mail address and will receive a limited amount of storage space on the Cloud. This is considered cloud computing: a shared collection of configurable computing resources, such as networks, servers, storage, applications, and services. Customers can log in to the cloud service via a web browser and upload or download files. Customers can also increase storage space. The cloud service providers (CSPs) provide servers and storage space. To ensure service, CSPs maintain data centers around the world (Ruan et al. 2011). Data may not necessarily be located in just one place, and the location of the company headquarters does not necessarily mean data is stored there. This adds a layer of jurisdictional complexity. In the United States, for example, one state's law enforcement agency may have to have a federal law enforcement agency produce a subpoena or search warrant because the “data” resides across state lines.

In a cloud investigation, there is the client side, the CSP side, and the matter of law (whether the investigator / digital forensic examiner [DFE] has the right to search and seize). The client side is the traditional type of computer/digital forensics, where the investigator has access to the suspect's computer or mobile device (Faheem et al. 2015). The device is at a physical location, and jurisdiction is relatively easy to show. The device in question contains artifacts. This client approach has worked well in the past, but files are no longer in a persistent state on the client side and have shifted to being web‐app based and leaving little trace (Roussev and McCulley 2016). Jurisdiction needs to be addressed. This is not an attempt to give legal advice in any form; it merely reflects the difficulties of multi‐jurisdiction investigations and difficulties encountered when dealing with cloud computing. As mentioned before, CSPs use servers and data centers not centrally located or confined to one location.

An example of this difficulty is seen with Amazon Web Services (AWS). With AWS, the user can select where they would like their data stored. The corporate headquarters may not necessarily be the location of the data. CSPs intentionally hide the location of the data from customers to facilitate movement and replication.

Another scenario would be for the investigator to connect to the Cloud and download files with the user's credentials. This raises a question of legal authority. If the user gives consent freely and willingly to search the account, then the investigator can connect and download the contents. However, if the user does not give consent, then the investigator would need some kind of legal authority to connect to the cloud storage/account with the user's credentials and download data.

There is also the possibility the owner of the cloud account will cooperate and give full authority for a search. If the investigator can gain access to an account in their jurisdiction, they can intercept the communication.

13.7.1 Approach and Experiments

The examination began with the client side (user side). The focus of this research was Amazon S3 (AS3). A search was conducted for any logs, images, and Internet history left from user activity, both in a persistent state and in volatile memory (RAM). An AS3 account was set up using a non‐test computer, that is, a computer not used as a base machine or in testing. The account and its contents were therefore already established, and had been accessed from a different computer, before testing began. AS3 uses buckets to store files. These buckets are similar to file folders. For the experiment, the buckets were created and the region set as São Paulo, Brazil. A region outside the US was used for storage to see what artifacts, if any, could be seen and to show that the files were actually stored outside the US. The ease of accessing and viewing the files contained in the bucket was also examined. The main target bucket was named minionswi. Images of the well‐known cartoon Minions were used as substitutes for contraband images or files. The username was the e‐mail address **********@gmail.com (redacted).

Some initial testing was done using the bucket something1, downloading and renaming files stored in the bucket. Once the files were in the bucket, AS3 provided information on the files. AS3 showed that Amazon uses an ETag, which in this case is an MD5 hash value. AWS documentation references the use of MD5 hash values. According to Amazon, the ETag may or may not be an MD5 hash value, depending on how the file was created; a file created by a multipart upload is one such case (http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html). The online reference provided by AWS says that AS3 compares the returned ETag with the MD5 hash value. Hash values of the downloaded files were calculated using a hash calculator, HashCalc v2.02. The hash values remained the same after upload and download. As verification, they were compared with the hash of the original file on the computer, which was also the same. Also, changing the name in AS3 did not change the hash or ETag. For the client‐side part of the investigation, VMs were used for testing. The VMs were created using VMware Workstation Player 12.1.0. These VMs simulated what would typically be found in a home computer. The VM RAM file, *.vmem, was used for RAM analysis.
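
The comparison between a file's MD5 hash and its ETag can be reproduced with Python's standard `hashlib` module instead of a separate hash calculator. The following is a minimal sketch; the function names are hypothetical, and the check for a hyphenated ETag rests on the assumption that multipart uploads produce an ETag of the form "digest-partcount", which is not a plain MD5 value.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the content as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def etag_matches(data: bytes, etag: str) -> bool:
    """Compare the MD5 of the content with an ETag value."""
    # ETags are often quoted in HTTP headers; strip quotes and normalize case.
    cleaned = etag.strip().strip('"').lower()
    if "-" in cleaned:
        # Assumption: a '-<n>' suffix marks a multipart upload, whose ETag
        # cannot be compared with a simple MD5 of the whole file.
        raise ValueError("ETag does not look like a plain MD5 digest")
    return md5_hex(data) == cleaned


content = b"hello"  # stand-in for the bytes of a downloaded file
print(md5_hex(content))
print(etag_matches(content, '"5d41402abc4b2a76b9719d911017c592"'))
```

For large evidence files, the same digest can be computed incrementally by feeding the file to `hashlib.md5()` in chunks rather than reading it into memory at once.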

FTK Imager v 3.4.26 was used to create an evidence file from the test VMs. The evidence file was analyzed in Autopsy 4. For collection on the cloud service side, we used CloudBerry Drive. This program was used because it will mount cloud storage as a network drive. The drive can be mounted as read only.

There are some considerations with this method. AWS uses a security key for the individual, and this key needs to be known. To access AS3, the investigator needs to know the user's credentials (username and password) and log in to the AWS console. The security key can then be accessed through the user console, where there is an option to download an access‐key file. A user may store this key file on the computer. If a preservation order has been sent to Amazon for the account, and Amazon blocks the user, it is likely the investigator/DFE cannot access the account by this method using the user's credentials. The security key is needed for CloudBerry Drive to connect.

13.7.2 Findings and Analysis

The first VM started with the base VM clone. IE 11 was used to search for images or videos of Minions. A video of the Minions movie trailer was downloaded to the desktop of the first VM. Images of Minions were downloaded to the desktop of the first VM. IE11 was used to log in to the AS3 account. Four image files and the .mp4 file were uploaded to the AS3 bucket minionswi.

When examining the first VM, it was clear the Minion files had been saved onto the VM desktop. Temporary Internet files contained the images in question. The Internet history showed a connection to AS3. IE changed with the introduction of IE 10; prior to IE 10, history resided in the index.dat file. The biggest change was that IE now used an Extensible Storage Engine (ESE) database named WebCacheV01.dat, stored in the WebCache folder together with other files (transaction logs) that work with it (Malmstrom and Teveldal 2013). When using a web browser to access AS3, there is a web page for the login credentials.

Information regarding files was located in a WebCache log, V01.log (path: Users/RCRAIG/AppData/Local/Microsoft/Windows/WebCache/V01.log). The ETag (md5Hash) of the Minion files was recorded. The temporary Internet files of the Minions could have been created by the Google search for such images, that is, by normal web‐browsing activity, and may not have been directly related to the AS3 account. Links were found with the ETag, filename, and file path pointing to the AS3 web URL. There was also a reference to the date and time in this artifact.

The second VM would give a better idea what was created just by viewing the files using IE 11 but never downloading the images of Minions. Using Autopsy 4, artifacts were indeed found from viewing through the web browser. V01.logs correlated with the Minion images.

To begin the activity of the user on the second VM, we went to the AS3 console using IE 11, on 15 June 2016 at 4:31 p.m. (UTC‐5). Once logged in, minionswi was opened. The video in the bucket, 518752464_2.mp4, was opened. It automatically downloaded and was opened in the default player, Windows Media Player. At 4:38 p.m. (UTC‐5), each file in the bucket was opened to view. All the viewing was done through the web browser. The files were viewed in this order: Beefeater.jpg, cornerpicture.jpg, download.jpg, napoleanic.jpg, th2P068ZMZ.jpg, th4Z9C28AE.jpg, thCA0PWCAL.jpg, and thCaS1V76V.jpg. The file napoleonic.jpg was downloaded to the desktop. Analysis of the second VM showed activity in the V01.log. A record of the V01.log is shown in Figure 13.2.

Figure 13.2 V01.log.

This record correlates with the viewing of the image napoleanic.jpg. The relevant portions are highlighted. Note that the URL path is the same as the link in the bucket. The blue highlighted data is an actual link to the file. This was compared to the download link in AS3 in an open browser. The bucket paths were the same, but the date and time in the link indicate when the link was created, whether by viewing/opening or by downloading. To make this clearer: when a user wants to view or download a file, they need to right‐click the file and choose either to open it or to download it. When they choose Open, the link is created and opens in a new browser window. When Download is chosen, the user needs to select Save Link As and choose where to save the file. There is a reference to a date and time, 20160615T213912Z. This time appears to be in UTC (UTC±0). The viewing of this image correlates with the time period 15 June 2016 at 4:38 p.m. (UTC‐5). There is also a reference to the ETag, and it is the correct ETag for that file. This record also corresponds to the creation of the temporary Internet file napoleanic.jpg. Even the MD5 hash value is the same as the ETag. The image is viewed and is cached in the temporary Internet files. The RAM file was searched for the MD5 hash value in text. Records were found in the RAM that appear to be the same as those found in the V01.log (Figure 13.2). Similar findings were located that detail the same information regarding viewing of the other files in question. The napoleonic.jpg file was also downloaded onto the desktop. A link file also shows it located on the desktop. The file in question was downloaded at the approximate modified, accessed, created (MAC) dates and times.
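
The correlation between the timestamp in the record and the local viewing time can be checked by converting the UTC value to UTC‐5. The sketch below assumes the value follows the compact "YYYYMMDDTHHMMSSZ" layout seen in the record; the function name `parse_utc_timestamp` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_utc_timestamp(stamp: str) -> datetime:
    """Parse a compact 'YYYYMMDDTHHMMSSZ' timestamp as a UTC datetime."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


LOCAL_ZONE = timezone(timedelta(hours=-5))  # UTC-5, as used in the text

utc_time = parse_utc_timestamp("20160615T213912Z")
local_time = utc_time.astimezone(LOCAL_ZONE)
print(local_time.isoformat())  # 2016-06-15T16:39:12-05:00
```

The converted value, 4:39:12 p.m. local time, falls within a minute or so of the 4:38 p.m. point at which viewing of the files began, which is consistent with the files being opened one after another.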

The /AppData/Roaming/Microsoft/Windows/IEDownloadHistory/container.dat file did not contain information regarding the download.

A third VM was created, called Browse_Deleted. In this scenario, a user logged in to AS3 using the web browser and viewed the files bunchofminions.jpg, th2P068ZMZ.jpg, and twominions.jpg in the bucket “minions.” The web‐browser history was then deleted. This activity was done to see what artifacts were left. The three files viewed were located in unallocated space, and each had its original MD5 hash value. Figure 13.3 shows the file information of the files located in unallocated space. Since the three files were found in unallocated space, there was no filename.

A box displaying a minion with the text Original file name: th2P068ZMZ, MD5Hashb:67818…, SourceBrowse_Deleted_History.E01- (Unallocated Clusters), and Located At Physical Sector 64135944.

Figure 13.3 Example of file information for the files located in unallocated space.

Even though there was an attempt to delete the Internet history, references were found in the WebCache folder in a log. This was consistent with a previous test that showed viewing of a file through AS3 using the web browser. Looking closely, the file length listed for the entries is the correct file size.

13.7.2.1 Collecting Evidence via Internet

CloudBerry Drive is able to mount an AS3 account as a read‐only network drive. To do this, an investigator/examiner must have the user's credentials (username and password). Using CloudBerry Drive also requires the access key and secret access key to be known. With these keys, anyone can log in to the buckets, so a user could share buckets with others. Logging in to the AS3 account with the user's credentials, an investigator could find the access key but not the secret access key. This creates a problem, because a user can only have two access keys at one time. In order to gain access, a new access key and secret access key will need to be created. If only one set exists, an additional one can be created. If two exist, the oldest key would have to be deleted before a new set could be created. This is all done through the web browser in the AWS console (Figure 13.4).

Create Access Key dialog box displaying the expanded list button labeled Hide Access Key with a blank text box for Access Key ID and Secret Access Key. At the bottom are 2 command buttons for Download Key File and Close.

Figure 13.4 Access key.

For testing, CloudBerry v2.0.1.6 was installed on an examination computer with Internet access. The evidence file was verified successfully. MD5 hash values were compared with the files in minionswi, and the hashes were the same. One point not yet mentioned is that AS3 has an option to turn file versioning on or off. If a file is deleted, AS3 does save an older version. When the option is on, deleted files can be viewed using the web browser interface. CloudBerry Drive did not show these older versions, whether the option was on or off. That could be important if the user of the account has tried to delete evidence.
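To illustrate why older versions matter, the sketch below picks out deleted files that still have retained versions. It assumes a version listing shaped like the response of the `list_object_versions` call in the AWS SDK for Python (boto3), which returns `Versions` and `DeleteMarkers` lists whose entries carry `Key`, `VersionId`, and `IsLatest`. The function name and the sample listing are hypothetical and were not part of the testing described here. The sketch does not contact AWS, and it does not handle listings that span several pages.

```python
from collections import defaultdict


def recoverable_versions(listing: dict) -> dict:
    """Map each deleted key to the version IDs that are still retained for it.

    A key counts as deleted when its latest entry is a delete marker.
    """
    deleted_keys = {
        marker["Key"]
        for marker in listing.get("DeleteMarkers", [])
        if marker.get("IsLatest")
    }
    retained = defaultdict(list)
    for version in listing.get("Versions", []):
        if version["Key"] in deleted_keys:
            retained[version["Key"]].append(version["VersionId"])
    return dict(retained)


# Hypothetical listing for illustration only; keys and version IDs are made up.
sample_listing = {
    "Versions": [
        {"Key": "kept.jpg", "VersionId": "v1", "IsLatest": True},
        {"Key": "removed.jpg", "VersionId": "v7", "IsLatest": False},
    ],
    "DeleteMarkers": [
        {"Key": "removed.jpg", "VersionId": "v8", "IsLatest": True},
    ],
}
print(recoverable_versions(sample_listing))  # {'removed.jpg': ['v7']}
```

A tool that only presents the current view of a bucket would show kept.jpg and nothing else, whereas the version listing still exposes the retained copy of removed.jpg.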

13.7.3 Discussion

AS3 is ideal for web service providers, but anybody can create a free account and have storage. If the person wants to expand their storage, they can always purchase more. If the person is sharing files, they could give out the access keys and share with other users. The user does not have to download a desktop application to view, download, or upload files. All the person needs is a computer with an Internet connection and a web browser.

The first hurdle in investigating a suspect is the legal authority to seize evidence that is stored on cloud storage. This is not just a problem in the United States, but also in other countries such as the UK. One way to seize evidence stored on cloud storage would be to obtain a search warrant and serve the CSP. This could be an issue because the CSP may be in a foreign country and may not recognize the courts of another country. This was the scenario during testing, when the buckets were owned by Amazon, but the files were located in the region of São Paulo, Brazil. This process relies on the CSP to provide the evidence. It is important that the search warrant also asks for the right files. One thing found in the research is that AS3 and other CSPs keep older versions of files (Roussev et al. 2016). The older versions actually contained deleted files. Any warrant served should contain language such as, “all versions of files attached to the account.”

The person the account belongs to could give permission, but they will need to provide certain credentials to access the account. The person also has to freely consent to such a search.

Cloud storage accounts can be anywhere in the world. During an investigation, credential information might be located, such as a key file. Files can be considered electronic communication. If the target files are located in another jurisdiction, and the investigator could obtain authorization for a wiretap, they could log in to the target account and acquire data.

Testing showed that AS3 uses an ETag that is an MD5 hash. There are numerous references to the ETag in the Internet history, along with a specific URL that details the date and time, the file path being viewed, and the file length. Files that were located in the bucket but were not viewed did not create URLs or ETag references. The image itself is stored in the temporary Internet files, and when it is hashed there, it has the same hash as the ETag. A user viewing a contraband image will therefore leave artifacts. A contraband image found in unallocated space usually carries no reference to its origin. In the case of a suspect using AS3, taking the MD5 hash value of the contraband image found in unallocated space and doing a keyword search can provide a reference to that image. The Internet history, or AS3 URLs located in RAM or unallocated space, can tell an investigator the time and date viewed and the bucket file path. This activity shows that the user actively opened individual files to view. This information at a minimum could be used to obtain a search warrant for the AS3 account. If no contraband images were found, the investigator could still do a keyword search with known “bad” MD5 hash values and get results indicating viewing of contraband images through the URLs. This would point to the AS3 account and the likelihood that it contains contraband.

Using an application such as CloudBerry Drive has some limitations. The user credentials are a must, and the user may refuse to cooperate, even if there is a court order. The access keys are needed and may not be accessible. Deleting one of the sets of keys to get new ones may be outside legal authority. Connecting to cloud storage and acquiring the data maintains chain of custody and does not rely on the CSP. However, some files could be missed, such as older versions.

Client‐side forensic examination of the suspect's computer yields useful information and direct correlation of evidence with activity. It can be said with a high degree of certainty that a user viewed a contraband image when the URL, WebCache log files, ETag, date and time, and image MD5 hash values are all connected.
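The two checks at the heart of this correlation, comparing a recovered file's MD5 hash with an ETag and searching a raw image or RAM capture for the hash as text, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the testing. The function names are hypothetical. The sketch treats an ETag containing a hyphen as a non-match, because ETags produced by multipart uploads are not plain MD5 values. It searches for the hash in both ASCII and UTF‐16LE, on the assumption that Windows artifacts may store the text in either form.

```python
import hashlib
import mmap
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(etag: str, md5_digest: str) -> bool:
    """Compare an ETag (possibly quoted) with an MD5 digest, ignoring case."""
    cleaned = etag.strip().strip('"').lower()
    if "-" in cleaned:
        # Assumption: a hyphenated ETag comes from a multipart upload
        # and is not a plain MD5 of the object.
        return False
    return cleaned == md5_digest.lower()


def find_hash_offsets(image_path: Path, md5_digest: str) -> list:
    """Return sorted (byte offset, encoding) pairs where the hash text occurs."""
    hits = []
    with image_path.open("rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as data:
            # Try lower- and upper-case hex; a set avoids duplicate searches.
            for text in {md5_digest.lower(), md5_digest.upper()}:
                for encoding in ("ascii", "utf-16-le"):
                    needle = text.encode(encoding)
                    position = data.find(needle)
                    while position != -1:
                        hits.append((position, encoding))
                        position = data.find(needle, position + 1)
    return sorted(hits)
```

In use, an examiner would hash the carved or cached image with `md5_hex`, confirm it against the ETag seen in the WebCache record with `etag_matches`, and then call `find_hash_offsets` on the disk image or RAM capture to locate the surrounding URL records for manual review.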

13.8 Conclusion

The use of cloud storage for criminal purposes is well known and will most likely not end. The eradication of crime is a noble cause, yet not one that seems to be a realistic prospect, without any intention of entering into a philosophical discourse on human nature and societies. It is equally acknowledged that criminals and their methods will continue to evolve in pace with, and take advantage of, technological development and innovation. In this chapter, we discussed aspects related to data acquisition in cloud computing platforms. We also showed an efficient approach to locate the data center where the computer host of interest to investigators is running. In addition, we described a forensic acquisition and analysis of Amazon Web Services, one of the most popular cloud storage platforms.

Obtaining data stored on cloud storage is problematic. The cross‐jurisdictional nature of the CSP creates issues, and no specific laws deal with cloud storage. Existing laws are used and may not be the most useful. There is still a dependency on the CSP complying and providing the contents of cloud storage. Further research would have to be done on how a folder is treated when it is uploaded to AS3. In addition, the number of non‐law enforcement friendly ISPs is increasing over time. Further research into this phenomenon is advised, including better reporting of encounters with non‐law enforcement friendly ISPs. More intrusive methods can also be explored: for example, wiretaps on upstream connections (Schut et al. 2015), not for content but only for IP addresses, might help to improve results.

References

  1. Bernaards, F., Monsma, E., and Zinn, P. (2012). Hightech Crime: Crime Image Analysis. Driebergen: Korps Landelijke Politiediensten.
  2. Dykstra, J. (2013). Seizing electronic evidence from cloud computing environments. In: Cybercrime and Cloud Forensics: Applications for Investigation Processes (ed. K. Ruan), 156–185. Hershey: IGI Global.
  3. Faheem, M., Kechadi, M.‐T., and Le‐Khac, N.‐A. (2015). The state of the art forensic techniques in mobile cloud environment: a survey, challenges and current trend. International Journal of Digital Crime and Forensics 7 (2): 1–19.
  4. Fuchs, C. (2013). Privacy and security in Europe. The Privacy & Security Research Paper Series, No. 6.
  5. Goncharov, M. (2015). Criminal Hideouts for Lease: Bulletproof Hosting Services. Irving: TrendLabs.
  6. Grispos, G., Storer, T., and Glisson, W.B. (2012). Calm before the storm: the challenges of cloud computing in digital forensics. International Journal of Digital Crime and Forensics 4 (2): 28–48.
  7. Hale, J.S. (2013). Amazon Cloud Drive forensic analysis. The International Journal of Digital Forensics & Incident Response 10 (3): 259–265.
  8. Hiller, J. (2015). Civil Cyberconflict: Microsoft, cybercrime and botnets. Santa Clara High Technology Law Journal 31 (2): 163–214.
  9. In Re Grand Jury Proceedings the Bank of Nova Scotia. (1984). United States of America, Plaintiff‐appellee, v. the Bank of Nova Scotia, Defendant‐appellant, 740 F.2d 817 (11th Cir.).
  10. Head in the Cloud (2014). Amazon S3 and Amazon Cloud Drive: what's the difference? https://web.archive.org/web/20140608040735/http://www.headinthecloudstorage.com/amazon‐ec2‐and‐s3‐whats‐the‐difference.
  11. Malmstrom, B. and Teveldal, P. (2013). Forensic analysis of the ESE database in Internet Explorer 10. Bachelors thesis: School of Information Science, Computer and Electrical Engineering, Halmstad, Sweden.
  12. Mell, P. and Grance, T. (2011). The NIST definition of cloud computing. National Institute of Standards and Technology.
  13. Mittal, S. (2014). Power management techniques for data centers: a survey. Technical report. https://pdfs.semanticscholar.org/c389/9a699ecd188e658374c861b3b21e80c364e4.pdf (accessed July 2017).
  14. Nicolls, V., Le‐Khac, N‐A., Chen, L. et al. (2016). IPv6 security and forensics. The 6th IEEE International Conference on Innovative Computing Technology, Dublin, Ireland, August 2016.
  15. Nishawala, V.N. (2013). Subcontracting in the Cloud. Pillsbury Global Sourcing. http://www.sourcingspeak.com/2013/06/subcontracting‐in‐the‐cloud.html (accessed May 2017).
  16. Plunkett, J., Le‐Khac, N.‐A., and Kechadi, M.‐T. (2015). Digital forensic investigations in the Cloud: a proposed approach for Irish law enforcement. 11th Annual IFIP WG 11.9 International Conference on Digital Forensics (IFIP119 2015), Orlando, Florida, 26–28 January 2015.
  17. Quick, D., Martini, B., and Choo, K.R. (2014). Cloud Storage Forensics. Waltham, MA: Syngress.
  18. Roussev, V. and McCulley, S. (2016). Forensic analysis of cloud‐native artifacts. The International Journal of Digital Forensics & Incident Response.
  19. Roussev, V., Barreto, A., and Ahmed, I. (2016). Forensic acquisition of cloud drives. In: Advances in Digital Forensics XII (ed. G. Peterson and S. Shenoi), 5–8. Springer.
  20. Ruan, K., Carthy, J., Kechadi, T. et al. (2011). Cloud forensics. IFIP Advances in Information and Communication Technology 361: 35–46.
  21. Ryder, S. and Le‐Khac, N.‐A. (2016). The end of effective law enforcement in the cloud? To encrypt, or not to encrypt. The 9th IEEE International Conference on Cloud Computing, San Francisco, CA, USA, June 2016.
  22. Schut, H., Farina, J., Scanlon, M. et al. (2015). Towards the forensic identification and investigation of cloud hosted servers through noninvasive wiretaps. The 10th International Workshop on Frontiers in Availability, Reliability and Security (FARES 2015), Toulouse, France, August 24–28, 2015.
  23. Tillekens, A., Le‐Khac, N‐A., and Pham‐Thi, T‐T. (2016). A bespoke forensics GIS tool. IEEE International Symposium on Mobile Computing, Wireless Networks, and Security, Nevada, USA, Dec 2016.