CHAPTER 2
The Proliferation of Data Available for Discovery

James P. Martin and Harry Cendrowski

The same technologies that make cloud computing possible (i.e., the pervasive availability of high-speed wide-area network connections for devices of all kinds) have also enabled the capture of data related to virtually every aspect of our lives. Smart devices are in communication with central data repositories exchanging information about our personal habits and preferences, where we go, and what we do. E-mail and other communication are available at our fingertips, no matter where we are, and a host of social media applications keep us in constant communication with friends and business contacts; this data is also correlated in databases designed to help marketers serve us with advertising intended to be specific to our needs.

The industry and technologies that capture and collate this expansive set of personal information are often referred to as Big Data. Big Data vendors promise enhanced customer interactions, specifically the ability to target potential buyers more accurately at the moment they are considering a purchase; this is their stated reason for collecting all that data. Device and operating system manufacturers design their systems to capture and share data with Big Data vendors and, in turn, use Big Data to refine their offerings. Advertising revenue driven by Big Data is the engine that makes many of the services available in the cloud “free” of cost.

From a litigation and investigation perspective, there is more data available about a subject than at any previous time in history. Much of the data captured would be of direct interest to investigators or parties to litigation depending on the particular legal issue at hand. Cases emerge every day in which attorneys and investigators apply an innovative investigative analysis to data to create a picture of the subject or to support theories about events leading up to the case. Today, investigators and attorneys need to understand the types of data available about subjects, and how to access such data.

Data can reside on a local device (for example, a personal computer, tablet, or smartphone), or it can reside in the hands of a third-party service provider such as Google. Technically, accessing data on a local device is relatively easy, and accepted forensic procedures are defined to allow such data to be used in a legal forum. From a legal perspective, accessing data on a local device can become complicated depending on specific circumstances of the case and the device; many of the nuances of such discovery are covered in detail in this book.

For civil litigation, discovery often hinges on ownership or control of a device. Some devices have clear ownership: for example, a company-owned computer installed at an employee’s workstation. Many companies have computer use policies that clarify the ownership of the device and all data included on the device. Under current laws and regulations in the United States, a company should have little trouble examining the device for relevant data and using that discovery for litigation purposes.

On the other hand, some companies allow employees to bring a personal tablet or smartphone to the workplace and to use it for business purposes. This has become known as Bring Your Own Device (or BYOD), and it tends to muddy the discovery process. Courts are still deciding whether the company has a right to access the device without the owner’s permission, and to what extent such data can be used. Shared devices raise similar questions: divorce cases frequently involve the family computer, one that is used by both parties to the litigation and that may contain financial information about the family unit or about each party individually. Under many circumstances, either party would be able to access the device and obtain the underlying data; however, in certain circumstances, access may become complicated.

On the criminal side, examination of a device may fall under criminal search and seizure procedures that may not yet be universally applied. If a suspect is arrested, do the police have a right to examine the contents of a smartphone found in his possession? What about computer drives or flash drives? Some law enforcement agencies have claimed a digital device is a “container” that is subject to search the same as a suitcase or purse. Courts have differed on the interpretation, with some jurisdictions requiring an additional search warrant and others allowing the search.

Accessing hosted applications, including e-mail systems, social media sites, and cloud computing solutions, is often driven by identification of the “subscriber” who would have control over the account. The subscriber is often the person who can authorize disclosure of the contents of the systems, and courts differ on the procedures for compelled disclosure. Discovery of data hosted by third-party service providers is generally covered by the restrictions of the Stored Communications Act (SCA) of 1986, which prohibits a service provider from releasing information to a third party. Accessing data held by a third-party provider is frequently the subject of litigation, and courts frequently differ about the type of data protected by the SCA and procedures to access that data.

An Example of Third-Party Data: Google Search Engine

Google is not alone in its data-gathering activities; its practices are highlighted here as an illustration of the extensive types of information captured.

The Google search engine records and collates every search a user has ever performed when logged into Google services, irrespective of the device with which the search was initiated or the Google service that was used. Google services include web search, image search, news search, Gmail (both personal and corporate), YouTube, and Maps, among others. Once a user signs into Google services on a device, Google starts tracking search activity. Let’s say a user signs into their personal Gmail account at work and performs some web searches using Google. Then they go to lunch and use Google Maps to find the restaurant. Both the web searches and the map search will be recorded in the search history for that Google account. Google also accesses and stores information directly from the mobile device, including call information and location data.

Google describes collecting such information in terms of providing a better experience, like providing advertising specific to your interests or finding the people who matter most to you online. The extent of data collected by Google is expansive, and is described in their privacy policy:

  • Information we get from your use of our services. We may collect information about the services that you use and how you use them, like when you visit a website that uses our advertising services or you view and interact with our ads and content. This information includes:
    • Device information
    • We may collect device-specific information (such as your hardware model, operating system version, unique device identifiers, and mobile network information including phone number). Google may associate your device identifiers or phone number with your Google Account.
    • Log information
    • When you use our services or view content provided by Google, we may automatically collect and store certain information in server logs. This may include:
      • Details of how you used our service, such as your search queries.
      • Telephone log information like your phone number, calling-party number, forwarding numbers, time and date of calls, duration of calls, SMS routing information, and types of calls.
      • Internet protocol address.
      • Device event information such as crashes, system activity, hardware settings, browser type, browser language, the date and time of your request, and referral URL.
      • Cookies that may uniquely identify your browser or your Google Account.
    • Location information
    • When you use a location-enabled Google service, we may collect and process information about your actual location, like GPS signals sent by a mobile device. We may also use various technologies to determine location, such as sensor data from your device that may, for example, provide information on nearby Wi-Fi access points and cell towers.
    • Local storage
    • We may collect and store information (including personal information) locally on your device using mechanisms such as browser web storage (including HTML 5) and application data caches.
    • Cookies and anonymous identifiers
    • We use various technologies to collect and store information when you visit a Google service, and this may include sending one or more cookies or anonymous identifiers to your device. We also use cookies and anonymous identifiers when you interact with services we offer to our partners, such as advertising services or Google features that may appear on other sites.1

A user’s Google search history is available at www.google.com/history; the user must sign in with their Google credentials to view their account. Once in the account, specific items may be flagged and removed by the user.

Consideration of Data Points in Discovery

In a data-driven world, discovery procedures must be updated to consider data points from new sources. They should also be flexible, based on the particular circumstances. The age-old concept of “modus operandi” now should include a subject’s interface with the digital world; in many cases, the subject may not even realize he is creating data points. Considering available data points is a different process from attempting to access the data, which will present technical and legal hurdles; the first step is to consider what data may be available regarding the subject. Like many other areas of litigation, this is an arduous task, as the possibilities may seem limitless. However, to facilitate the process, it is often helpful to break the possibilities down into more manageable subcategories. The subcategories below give common examples of the data created in each; it would be impossible to enumerate all possible data sources within a single book. Also, please consider that systems and applications change extremely rapidly, and newly introduced technologies create new categories of data.

User-Created Data

This category includes data that is overtly created by the subject, and includes things like e-mail accounts and social media accounts. Depending on the nature of the case, accessing data of this type could be helpful to prove intent, prior knowledge of an event, involvement, or planning with others. Social media is commonly thought of as Facebook, Twitter, and LinkedIn. However, many other sites should be given consideration in certain circumstances:

  • Photo hosting sites. A common cloud application is the photograph server. These sites allow a user to host pictures and videos for free or for an annual fee. In certain cases, such photographs or videos may provide valuable evidence regarding the case.
  • Dating sites. An offshoot of social media is the dating site; many sites exist that cater to a particular market segment, for example, christianmingle.com, ourtime.com, or farmersonly.com. One site, ashleymadison.com, is specifically intended for married persons to meet other married persons. Particularly in divorce actions, identification of a dating profile could be key to the case.
  • People-meeting sites. As more of an informal dating site, apps like Tinder or Skout allow a user to “broadcast” their picture and brief biographical information to the surrounding area. Users may “like” a profile, and users that like each other are placed together in a chat.

While many online users are careful with their online posts, others are not so cautious. Many cases are solved through social media content. For example, Anthony James Lescowitch, Jr. was wanted by Freeland, Pennsylvania, police for several months. Lescowitch was apprehended within two hours of sharing the police department’s wanted poster of himself on Facebook. An officer posed as a woman wanting to meet Lescowitch, and when he arrived at the meeting, he was promptly arrested.2

E-mail accounts are most frequently identified by cross-e-mailing from known e-mail accounts (e.g., a person uses their work e-mail account to forward a document to a personal e-mail account). Additionally, it is common for an e-mail site to use another e-mail account as a password recovery option. For example, in the event of a forgotten password, a Yahoo! account may forward a password recovery link to the user’s Gmail account; the Gmail account may therefore have content identifying the Yahoo! account. In such a case, the recovery e-mail account would have been sent a confirming e-mail that may still exist on the account. Web searches including the subject’s name and interests may also identify previously undiscovered e-mail accounts. Many hosted e-mail platforms also come with document storage capabilities, for example Google Drive or Microsoft Hosted SharePoint. Identification of an e-mail account should include determining what other services are provided with the account that may be of interest to the case.

Another cloud application with particular importance to litigation is hosted accounting and financial systems. These may range from detailed ledger systems to financial tracking applications to online investment accounts. These are frequently discovered by analysis of e-mail records that provide updates and sales information to the user.

Given the wide variety of sites and applications available through the cloud, it is impossible to create an exhaustive list of all potential sites of interest. Clues to the use of a cloud application may be found in e-mail accounts, as well as through the devices themselves. For example, examination of browser favorites may reveal saved links to dating sites or financial sites. Desktop icons may link to web pages of interest. Internet browser history may additionally reveal prior visits to sites of interest. Also, credit card statements should be examined for payments to online services, including dating sites, financial sites, and online archives.
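
As an illustration only, the review of browser history for sites of interest might be sketched as follows. The watch list, the category names, and the function names are hypothetical, and the sketch assumes the history has already been exported from a forensic image as a plain list of URLs; it is not a substitute for a forensic tool.

```python
from urllib.parse import urlparse

# Hypothetical watch list for a matter: category -> domains of interest.
WATCH_LIST = {
    "dating": {"ashleymadison.com", "ourtime.com"},
    "document storage": {"drive.google.com"},
}

def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)

def flag_history(urls, watch_list=WATCH_LIST):
    """Return {category: sorted list of matching hosts} for exported URLs."""
    hits = {}
    for url in urls:
        # hostname is None for text that is not a URL; treat that as no host.
        host = (urlparse(url).hostname or "").lower()
        for category, domains in watch_list.items():
            if any(host_matches(host, d) for d in domains):
                hits.setdefault(category, set()).add(host)
    return {category: sorted(hosts) for category, hosts in hits.items()}
```

Matching on the host name rather than on a substring of the URL avoids flagging unrelated sites whose addresses merely contain a watched name.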

It has become increasingly common to provide an interrogatory asking the person to identify any and all cloud-based solutions utilized by the subject, including social media and e-mail accounts, and including account information (such as handle or user ID) and dates the accounts were defined. The resulting list can be compared to the results of investigative actions for accuracy and completeness.
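
The comparison described above can be sketched in a few lines. This is a minimal illustration, assuming each account is recorded as a (service, handle) pair; the function names and the three result labels are hypothetical and not drawn from any particular review platform.

```python
def normalize(service, handle):
    """Lower-case and trim so 'Gmail' and ' gmail ' compare as equal."""
    return (service.strip().lower(), handle.strip().lower())

def reconcile(disclosed, discovered):
    """Compare accounts listed in interrogatory answers with accounts
    identified through investigation.

    Both arguments are iterables of (service, handle) pairs.
    """
    listed = {normalize(*account) for account in disclosed}
    found = {normalize(*account) for account in discovered}
    return {
        "confirmed": sorted(listed & found),    # disclosed and independently found
        "undisclosed": sorted(found - listed),  # found but missing from the answers
        "unverified": sorted(listed - found),   # disclosed but not yet corroborated
    }
```

Accounts in the "undisclosed" group are the natural starting point for follow-up requests; accounts in the "unverified" group may simply not have surfaced yet in the investigation.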

Data Created about the User

This category includes data created through different real-world transactions and events that may be important to a case. We create data as we move throughout the world and conduct routine business transactions; often we don’t even stop to think about the data we create and what it says about us. Depending on the nature of the case, this data can be extremely interesting.

Different courts have interpreted data of this type in different ways. Some courts have held that this is data protected by the Stored Communications Act, while others have held that these are the business records of the company, and that authorization to capture the data is provided as a part of using the service. Care should be exercised when attempting to access data of this type:

  • Electronic security systems. Electronic security systems have become pervasive, and have replaced keys in many environments. These systems provide a near-field token or RFID chip in a card or key fob that is placed in proximity to a reader (card reader or proximity sensor) to activate the locking mechanism. The security system can be defined to limit access based on the day of the week or time of day. The system also maintains a log of the card or fob presented to the reader and the time stamp it was presented. This allows a report to be created of (1) all the cards that were presented at a reader and the time they were presented, and (2) the presentation history of a single card; in essence, a history of a person accessing doors within the organization.

    The use of such systems has grown beyond the corporate setting. Many colleges and universities have implemented an electronic security system that requires residents living in the residence halls to have a card to access their hall or floor.

  • Transportation systems. Many manual toll collection facilities have been replaced by electronic toll collection facilities. On the tollway, transponders are available that capture the date and time the transponder moves through the sensor gate, often while the vehicle is traveling at normal driving speed; a common system is called E-ZPass. Subways, buses, and other mass-transit systems have implemented electronic cards linked to a bank account that can be used to pay the fare. These systems log the card swipes, including date, time, and location.

    A recent article in Forbes revealed that E-ZPass transponders are read throughout lower Manhattan, even where no toll is collected. According to the TransCore spokesperson quoted by the article, the E-ZPass tag ID is scrambled to make the identity of the tag anonymous. The tag data is gathered and accumulated with data from other readers to measure traffic flow and conditions over an interval; the tag readers are placed at strategic locations to help measure the average travel times through the area. The spokesperson offered assurance that data pertaining to an individual are not retained:

    • Tag sightings (reads) age off the system after several minutes or after they are paired and are not stored because they are of no value. Hence the system cannot identify the tag user and does not keep any record of the tag sightings.3

    Irrespective of the purpose for which the New York system reads E-ZPass transponders, the article demonstrates that the devices can be read at any time.

  • Frequent shopper cards. Many stores require a person to register as a frequent shopper to obtain sale prices on merchandise; the frequent shopper has an electronic card or barcode tag that is presented at checkout to receive the discounts. These systems maintain a purchase history of items purchased, including date and time of purchase and location. At pharmacy chains, this includes prescription drug purchases.
  • Credit card metadata. Beyond the amounts spent on a credit card, credit card statements include interesting information about the location, time, and type of purchase made. This can include travel information (e.g., airfare, auto rental, hotels), dining habits (type of restaurant preferred), and manner of dress.
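
To make the two access-log reports described under electronic security systems concrete, the following is a minimal sketch. The log layout (card ID, reader name, timestamp), the sample rows, and the function names are all hypothetical; real systems export their logs in vendor-specific formats.

```python
from datetime import datetime

# Hypothetical export from a door-access system: (card ID, reader, timestamp).
ACCESS_LOG = [
    ("C-1001", "Lobby", "2015-03-02T08:01:00"),
    ("C-2002", "Lobby", "2015-03-02T08:05:00"),
    ("C-1001", "Server Room", "2015-03-02T22:47:00"),
    ("C-1001", "Lobby", "2015-03-03T07:58:00"),
]

def parse(log):
    """Convert timestamp strings to datetime objects and sort by time."""
    rows = [(card, reader, datetime.fromisoformat(ts)) for card, reader, ts in log]
    return sorted(rows, key=lambda row: row[2])

def report_by_reader(log, reader):
    """Report 1: every card presented at one reader, in time order."""
    return [(ts, card) for card, rdr, ts in parse(log) if rdr == reader]

def report_by_card(log, card_id):
    """Report 2: every reader at which one card was presented, in time order."""
    return [(ts, rdr) for card, rdr, ts in parse(log) if card == card_id]
```

Note that the log identifies a card, not a person; tying a presentation history to an individual still depends on showing who held the card at the time.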

Data in this category tends to indicate broad behavioral actions. Often, observation of data of this type over time can reveal changes in behavior that could be related to a case. For example, in a divorce action, changes in dining habits or travel could indicate marital stress prior to the divorce action. Drastic changes may also indicate intervention of another party. For example, in one elder abuse investigation, behavioral information indicated that the elder person was frugal and did not frequently eat at restaurants or purchase expensive clothing. Credit card balances were minimal and paid off routinely. After care was transferred to a new person, transactional information changed, including meals at expensive restaurants and purchases at trendy clothing stores. Data of this type should be viewed with a long-term perspective.
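
A before-and-after comparison of this kind might be sketched as follows. This is an illustration under stated assumptions: transactions are already categorized as (date, category, amount) records, both periods begin and end on the first day of a month, and the threefold threshold is an arbitrary value chosen for the example rather than an accepted standard.

```python
from collections import defaultdict
from datetime import date

def monthly_average(transactions, start, end):
    """Average monthly spend per category for start <= date < end.

    Assumes start and end each fall on the first day of a month.
    """
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if months <= 0:
        raise ValueError("end must be at least one month after start")
    totals = defaultdict(float)
    for when, category, amount in transactions:
        if start <= when < end:
            totals[category] += amount
    return {category: total / months for category, total in totals.items()}

def flag_changes(before, after, ratio=3.0):
    """Return categories that are new or whose monthly average grew by
    at least `ratio` (a hypothetical threshold), as (old, new) averages."""
    flagged = {}
    for category, new_avg in after.items():
        old_avg = before.get(category, 0.0)
        if old_avg == 0.0 or new_avg / old_avg >= ratio:
            flagged[category] = (old_avg, new_avg)
    return flagged
```

A flagged category is only a lead for further inquiry; it does not by itself establish who made the purchases or why the pattern changed.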

Data Created by Devices with Which We Interact

Hardware devices include personal computers, tablets, and smartphones, and they create activity logs showing access times and other information that could be relevant to a case. Additionally, since local files on a device are not protected by the SCA, it may be important to identify, catalog, and request copies of all devices with which a subject may interact.

  • Smartphones and tablets. Smartphones and tablets can be a treasure-trove of information from a litigation perspective. These devices interface with frequently used applications, including call logs, social media, e-mail, text messaging, calendaring, and photos and videos, and additionally can log where a user has been. Most smartphones have privacy notification procedures that tell a user when an app enables GPS tracking; however, researchers have found log files indicating that iOS (Apple’s operating system) continuously tracks and records the device’s location.4

    Industry standard procedures are utilized by digital forensics specialists to “image” a smartphone, allowing analysis of all contents and application data. Digital search tools can also be employed by the specialist to automate the review of the phone contents.

  • Automobiles. Higher-end automobiles frequently include an “infotainment” system that syncs with smart devices in the vehicle via cord or Bluetooth connection. An emerging area of eDiscovery includes analysis of these infotainment systems. Specialty vendors are currently introducing analytical tools and procedures to image and analyze vehicle systems. Vendors claim they can acquire forensic artifacts including: tracking points, tracking logs, recent destinations, favorite locations, paired device history, contacts, call logs, analysis of speed, altitude, direction of travel, and time-line analysis.

    Device information is often critical to a case; smartphones are emerging as a first-person witness in many criminal and civil cases. Device acquisition, however, can become expensive, and care must be exercised based on the nature and magnitude of the case to ensure that digital forensic expenditures are in line with the case objectives.

Creating an eDiscovery Plan in a Cloud-Based World

Cloud computing means that important systems and data may be hosted outside the four walls of the organization. Traditionally, investigators could image key computers and servers within the organization and have a copy of all the organization’s data. Today, cloud applications mean that key applications and archives may not be easily identifiable. However, a methodical approach should still identify all key systems. One key difference is that investigators will be looking for clues to the existence of relationships with cloud systems, not necessarily the systems themselves. Additionally, discovery may become a multistep process, with identification of systems separate from attempts to access the data. As more data is produced, additional hosted systems may be discovered.

The following is a generic outline of considerations for developing a discovery process that accounts for the possibility of cloud-based systems. Given the wide variety of litigation and criminal investigations, this should not be considered definitive for a particular matter and does not constitute legal advice:

  1. Provide interrogatories and production requests regarding use of hosted systems and cloud-based applications; interrogatories should require identification of the system, vendor, purpose, subscriber to the service, and internal administrator of the service.
    1. As a key system, specific interrogatories should ask about e-mail systems employed by the organization. If the matter is related to an individual, interrogatories should ask them to identify each and every e-mail account utilized by the individual.
    2. The interrogatories should be as broad as possible, and differentiate between systems subscribed by (under the control of) the company versus systems subscribed by the individual.
    3. Interrogatory answers may not identify all the cloud systems in place, but they will be important in any attempt to extend discovery if additional systems are noticed late in the process.
  2. Interrogatories should specifically identify document retention and archival procedures and vendors used in the retention process for both paper-based and electronic documents. Cloud solutions frequently include data archive systems, and archive information may contain useful information. Identify the vendors used, retention requirements, and litigation hold procedures.
  3. Categorize systems between internal systems and cloud-based systems.
    1. Internal systems and data may generally be accessed through production requests and traditional discovery procedures.
    2. Data hosted with third-party service providers (i.e., cloud-based systems) may fall under the restrictions of the SCA, and may require authorization of the opposing party to allow the third party to produce the data.
  4. Catalog all devices used by a subject. The device list should include assigned computers and workstations, company-issued tablets, and smartphones.
    1. Additionally, identify any personal device used to conduct business transactions, including home computers or BYOD devices.
    2. Prioritize devices for examination; work with a digital forensics specialist to obtain images of key devices.
    3. Images of the devices should be examined for evidence of undisclosed cloud systems, including shortcuts, saved favorites, and browsing history.
  5. Obtain and examine e-mail records for evidence of other systems. This may include discussion e-mails, updates from the vendor, changes in service notifications, etc. Be sure to follow up on any previously unknown systems.
  6. Examine payables records and credit card statements for payments to cloud vendors.
  7. Deposition questions should be based on an understanding of the business, and include questions about how key transactions are processed.
    1. Note and follow up on any differences from previous discovery procedures.
    2. Consider deposing a lower level person in the organization for questions regarding procedures to record transactions and process functions; often they will be the users of systems and will describe the systems used.
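
One way to keep track of the systems identified under steps 1 and 3 is a simple inventory record. The sketch below is illustrative only: the record fields mirror the items the interrogatories are meant to elicit (system, vendor, purpose, subscriber, and internal administrator), while the class name, the cloud_hosted flag, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemRecord:
    system: str
    vendor: str
    purpose: str
    subscriber: str      # who controls the account, e.g. "company" or "individual"
    administrator: str   # internal administrator of the service
    cloud_hosted: bool   # hypothetical flag: True if hosted by a third party

def categorize(records):
    """Split the inventory into internal and cloud-hosted systems (step 3)."""
    return {
        "internal": [r for r in records if not r.cloud_hosted],
        "cloud": [r for r in records if r.cloud_hosted],
    }

def cloud_by_subscriber(records):
    """Group cloud-hosted systems by subscriber, since the subscriber is
    often the party who can authorize disclosure by the provider."""
    groups = {}
    for record in records:
        if record.cloud_hosted:
            groups.setdefault(record.subscriber, []).append(record.system)
    return groups
```

As later steps surface additional systems from device images, e-mail, or payables records, new records can be appended and the inventory re-categorized.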

Production of Cloud Data

Beyond the restrictions of the SCA, production of data from a cloud system is different from traditional eDiscovery procedures. Traditionally, an investigator could image the entire computer and obtain a copy of the entire drive; procedures could then be applied to the data to find content related to the case. In a cloud computing model, organizational data is stored in a virtual server containing the data of possibly thousands of other clients; obtaining an image of the entire server is neither technically nor legally feasible. Investigators typically do not directly search the cloud computers for the information requested. Instead, the warrant, subpoena, or authorization by the subscriber directs the service provider to produce all the content of the account or accounts desired, and the information produced is then reviewed by investigators for items that fall within the scope of items to be produced.

As the cloud computing market is still under development, forensic procedures to be applied to a cloud-hosted system are still being built as well. As more data is moved from internally hosted devices to the cloud, the challenges of litigation will continue to grow. Vendors of analytical toolsets will continue to improve and expand their product offerings as demand grows, and it is a good practice to remain abreast of developments in the cloud computing market as well as in the digital forensics solutions market.

Notes
