Data Collection Techniques for Forensic Investigation in Cloud

The Internet plays a vital role in providing various services to people all over the world, and its usage has grown tremendously over the years. To provide services efficiently at low cost, cloud computing has emerged as one of the most prominent technologies. It provides on-demand services to users by allocating virtual instances and software services, thereby reducing customers' operating costs. However, the availability of massive computation power and storage at very low cost also motivates malicious individuals to launch attacks from machines either inside or outside the cloud. Such attacks cause high resource consumption and can result in prolonged unavailability of cloud services. This chapter presents a systematic survey of the forensic process, the challenges in cloud forensics, and in particular the data collection techniques used in the cloud environment. Data collection techniques play a major role in identifying the source of attacks by acquiring evidence from various sources such as cloud storage (Google Drive, Dropbox, and Microsoft SkyDrive), cloud logs, Web browsers, and physical evidence acquisition.


Introduction
In today's world, users are highly dependent on cyberspace for their day-to-day activities. With the widespread use of Internet technology, cloud computing plays a vital role in providing services to users. Cloud computing enables vendors (Amazon EC2, Google, etc.) to provide on-demand services (e.g., CPU, memory, network bandwidth, storage, and applications) to users by renting out physical machines on an hourly basis or by dynamically allocating virtual machine (VM) instances and software services [1][2][3]. Cloud computing moves application software and databases to large data centers, where the outsourcing of sensitive data and services is not fully trustworthy. This exposes the cloud to various security threats and attacks. For instance, attackers can use stolen employee login information to access accounts remotely through the cloud [4]. Besides attacking cloud infrastructure, adversaries can also use the cloud to launch attacks on other systems. For example, an adversary can rent hundreds of VM instances to launch a distributed denial-of-service (DDoS) attack. A criminal can also keep illicit files, such as child pornography or terrorist documents, in cloud storage so that their own devices remain clean. To investigate such crimes, investigators have to carry out forensic investigations in the cloud environment. This gives rise to the need for cloud forensics, which is a subset of network forensics. Cloud forensics

Types of forensics
The forensic process is initiated after a crime occurs, as a post-incident activity. It follows a set of predefined steps to identify the source of evidence. Forensics can be categorized into five types, namely digital forensics, network forensics, Web forensics, cloud forensics, and mobile forensics.
• Digital forensics: According to the National Institute of Standards and Technology (NIST), it is the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data.
• Network forensics: It identifies and analyzes evidence from the network, including which network ports were used to access information.
• Web forensics: It identifies evidence from the user history, temporary log files, registry, chat logs, session logs, cookies, etc., since many digital crimes occur on the client side through a Web browser.
• Cloud forensics: It is the application of digital forensics in the cloud, and it is a subset of network forensics. Identifying evidence in cloud infrastructure is harder because the data are located in different geographical areas. Examples of evidence sources include system logs, application logs, user authentication logs, and database logs.
• Mobile forensics: It is the branch of digital forensics that identifies evidence on mobile devices, such as call history, SMS messages, and memory contents.

Cloud forensic process flow
The cloud forensic process flow is shown in Figure 1 and is described as follows:
• Identification: The investigator determines whether a crime has occurred.
• Evidence collection: The investigator identifies evidence from the three cloud service models (SaaS, IaaS, and PaaS) [8]. In SaaS, the VM information of each user is monitored through log files such as the application log, access log, error log, authentication log, transaction log, and data volume. In IaaS, the sources include system-level logs, hypervisor logs, raw virtual machine files, unencrypted RAM snapshots, firewalls, network packets, storage logs, and backups. In PaaS, evidence is identified from application-specific logs accessed through the API, patches, operating system exceptions, malware warnings, etc.
• Examination and analysis: The analyst inspects the collected evidence and merges, correlates, and assimilates data to produce a reasoned conclusion. The analyst examines the evidence in both the physical and the logical files where it resides.
• Preservation: The information is protected from tampering. A chain of custody must be maintained for the log files, since the information is located in different geographical areas.
• Presentation and reporting: The investigator prepares an organized report stating the findings of the case.
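Preservation depends on being able to show that evidence was not altered between acquisition and presentation. The following is a minimal sketch of one common approach, assuming Python and a simple in-memory log: each evidence item is identified by its SHA-256 digest, and each custody entry also stores the hash of the previous entry, so editing an earlier entry breaks verification. The `CustodyLog` class, `sha256_file()`, and the field names are illustrative assumptions, not part of any forensic standard or tool.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash an evidence file in chunks so that large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _entry_digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class CustodyLog:
    """Append-only custody log; each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def record(self, evidence_sha256: str, handler: str, action: str) -> dict:
        body = {
            "evidence_sha256": evidence_sha256,
            "handler": handler,
            "action": action,
            "time_utc": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else GENESIS,
        }
        entry = dict(body, entry_hash=_entry_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; editing, reordering, or deleting a non-final entry fails."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev or _entry_digest(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True
```

In use, `sha256_file()` would be run on each acquired image or log bundle and `record()` called at every hand-off. Because removing only the last entry is not detectable from the chain itself, the latest `entry_hash` would also need to be written somewhere independent, such as the investigator's report.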

Evidence collection
Evidence collection plays a vital role in identifying and accessing data from various sources in the cloud environment for forensic investigation. Evidence is no longer stored on a single physical host; it is distributed across different geographical areas, so when a crime occurs, identifying the evidence is very difficult. Evidence is collected from various sources under forensic investigation, such as routers, switches, servers, hosts, VMs, and browser artifacts, and from internal storage media such as hard disks, RAM images, and physical memory. Evidence is also collected through log file analysis, cloud storage data collection, Web browser artifact analysis, and physical memory analysis.

Cloud log analysis
Logging is considered a security control that helps to identify operational issues, security incidents, policy violations, and fraudulent activities [9,10]. Logging is mainly used to monitor the system and to investigate various kinds of malicious attacks. Cloud log analysis helps to identify the source of evidence generated at different time intervals by devices such as routers, switches, servers, and VM instances, and by internal components such as hard disks, RAM images, physical memory, and log files. Information about different types of attacks is stored in various log files, such as application logs, system logs, security logs, setup logs, network logs, Web server logs, audit logs, and VM logs, which are described as follows:
• Application log is created by developers, who insert logging events into the program. Application logs help system administrators understand the state of an application running on the server.
• System log contains the date and time of log creation, the type of message (e.g., debug or error), system-generated messages related to the occurrence, and the processes affected by the event.
• Firewall log provides information related to source-routed packets, rejected IP addresses, outbound activities from internal servers, and unsuccessful logins.
• Network log contains detailed information about events on the network, such as malicious traffic, packet drops, and bandwidth delays. The network administrator monitors and troubleshoots daily activities by analyzing network logs for intrusion attempts.
• Web server log records entries related to the Web pages served by the Web server. Each entry contains the requested page, client IP address, date and time, HTTP status code, and bytes served for the request.
• Audit log records unauthorized access to the system or network in chronological order. It helps security administrators analyze malicious activities at the time of an attack. The information in audit log files includes source and destination addresses, user login information, and timestamps.
• VM log records information specific to the instances running on the VM, such as the startup configuration, operations, and the time the VM instance finishes its execution. It also records the number of instances running on the VM, the execution time of each application, and application migration, helping the CSP find malicious activities that occur during an attack.
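The Web server log fields listed above (client IP address, date and time, request, HTTP status code, and bytes served) are exactly those of the Apache Common Log Format (`%h %l %u %t "%r" %>s %b`). As a minimal sketch, assuming the server uses that format rather than a custom one, the following function splits one line into those fields; the regular expression and the `WebLogEntry` type are illustrative.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Apache Common Log Format: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


class WebLogEntry(NamedTuple):
    client_ip: str
    timestamp: datetime  # timezone-aware, taken from the log's own offset
    request: str         # e.g. "GET /index.html HTTP/1.1"
    status: int          # HTTP status code
    bytes_sent: int      # "-" (no body) is mapped to 0


def parse_clf_line(line: str) -> Optional[WebLogEntry]:
    """Return the fields of one access-log line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return WebLogEntry(
        client_ip=m["ip"],
        timestamp=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        request=m["request"],
        status=int(m["status"]),
        bytes_sent=0 if m["size"] == "-" else int(m["size"]),
    )
```

For the sample line `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326`, the function yields client IP `127.0.0.1`, status `200`, and `2326` bytes. Because the pattern is anchored only at the start, Combined Log Format lines (which append referer and user agent) also parse, with the extra fields ignored.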
As network usage grows and new software is released in the cloud, the number of vulnerabilities and attacks in the cloud increases, and these attacks are reflected in various log files. Application layer attacks are reflected in logs such as the access log, network log, and authentication log, and also in the log file traces stored on the Apache server. These logs are used in forensic examination to detect application layer attacks. Table 1 lists the various attacks and the tools used for log analysis of each type of attack. Figure 2 shows a sample access log trace.
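Since application layer attacks leave traces in access logs, a first-pass triage of those logs can be scripted. The sketch below flags two illustrative patterns: a client that sends an unusually high number of requests within a short sliding window (a possible HTTP flood), and request paths that, once URL-decoded, match a few naive injection signatures. It assumes the log has already been parsed into `(client_ip, timestamp, path)` tuples sorted by time. The threshold, window, and signature list are hypothetical values chosen for illustration, not taken from Table 1, and would need tuning for a real deployment.

```python
import re
from collections import defaultdict, deque
from datetime import timedelta
from urllib.parse import unquote_plus

# Naive, illustrative signatures only (SQL injection and reflected XSS probes).
SIGNATURES = re.compile(r"'\s*or\s+1=1|union\s+select|<script", re.IGNORECASE)


def flag_flooding_ips(records, max_requests=100, window=timedelta(seconds=10)):
    """Return client IPs that exceed max_requests within any sliding time window.

    records: iterable of (client_ip, timestamp, path) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # client_ip -> timestamps inside the current window
    flagged = set()
    for ip, ts, _path in records:
        times = recent[ip]
        times.append(ts)
        while ts - times[0] > window:  # discard requests older than the window
            times.popleft()
        if len(times) > max_requests:
            flagged.add(ip)
    return flagged


def flag_suspicious_requests(records):
    """Return (client_ip, path) pairs whose URL-decoded path matches a signature."""
    return [(ip, path) for ip, _ts, path in records
            if SIGNATURES.search(unquote_plus(path))]
```

Decoding with `unquote_plus` before matching matters because probes such as `id=1%27+OR+1%3D1` only reveal the `' OR 1=1` pattern after percent- and plus-decoding.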

Evidence collection from cloud storage
Evidence collection from cloud storage is the process of collecting evidence from cloud storage services such as Dropbox, Microsoft SkyDrive, and Google Drive, using the Web browser and also by downloading files with existing software tools [11][12][13]. It helps to identify illegal modification of, or access to, cloud storage while file contents are uploaded or downloaded, and also checks whether the attacker has altered the timestamp information in the user's account. Forensic investigators use the Virtual Forensic Computing (VFC) tool to identify evidence from a VM image file. The evidence for each account is accessed using the Web browser running in the cloud environment, and the encoded value of the VM image is recorded. Packets of each VM instance running on the hosts are captured using network packet tools such as Wireshark and snappy. The account information is synchronized and downloaded using the client software of each device, which is used to identify the source of evidence. For Dropbox, the evidence is isolated from the files found in the VM under "C:\Users\[username]\Dropbox\", as shown in Figure 3. The zip file contains the name of the folder, which can be accessed via the browser to determine the effect on timestamps in a drive. If an attacker modifies the contents of a file, the evidence is found by analyzing the VM hard drive, the history of files stored in the cloud, and the cache. Modification can also be detected by computing the hash value of the VM image. The evidence from Google Drive cloud storage is depicted in Figure 4.
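Detecting modified files and altered timestamps in a synchronized folder can be approached by comparing two snapshots of it. The following is a minimal sketch, assuming the investigator can read the local sync folder (for Dropbox, the location given in the text) at two points in time: each file is recorded with its SHA-256 digest and modification time, and the diff separates content changes from timestamp-only changes, the latter being a possible sign of timestamp manipulation. The function names are hypothetical, and modification time is only one of the timestamps a full examination would consider.

```python
import hashlib
import os
from datetime import datetime, timezone


def snapshot(root: str) -> dict:
    """Map each file's path (relative to root) to (SHA-256 digest, mtime in UTC)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            mtime = datetime.fromtimestamp(os.stat(full).st_mtime, tz=timezone.utc)
            result[os.path.relpath(full, root)] = (digest.hexdigest(), mtime)
    return result


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify files as added, deleted, modified (content), or timestamp-only changes."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in common if before[p][0] != after[p][0]),
        # Identical content with a different mtime can indicate timestamp tampering.
        "timestamp_only": sorted(p for p in common
                                 if before[p][0] == after[p][0] and before[p][1] != after[p][1]),
    }
```

Snapshots would normally be taken on a forensic copy of the folder rather than the live one, since simply opening files on some systems can update access times.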

Evidence collection via a Web browser
Clients communicate with servers in the cloud environment through a Web browser to perform various tasks, such as checking email and news, online shopping, and information retrieval [14][15][16][17][18]. Web browser history is therefore a critical source of evidence. Evidence is found by analyzing the URLs in the browser history, performing timeline analysis, and examining user browsing behavior and URL encoding; it can also be recovered from deleted information. Examples of such Web browser URLs are https://www.nitt.edu/en#files:/Documents/<Folder name> and https://www.nitt.edu/en#files:/E:<Folder ID>. Similarly, evidence stored in the Web browser cache at the root directory of a Web application is used to identify the source of an attack. Table 3 indicates the evidence collection process and recovery method for various Web browsers. For Google Chrome, for example, a Chrome forensic tool captures and analyzes the data stored by the browser, including history, Web logins, bookmarks, cookies, and archived history. It identifies the evidence from C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default. Figure 5 depicts the Google Chrome analysis forensic tool.
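The history and timeline analysis described above can be sketched against Chrome's History file, which is an SQLite database in the profile directory. This sketch assumes the `urls` table with `url`, `title`, `visit_count`, and `last_visit_time` columns, and that `last_visit_time` counts microseconds since 1601-01-01 UTC, as in recent Chrome versions; both should be checked against the browser version under examination. It decodes percent-encoded URLs (such as folder names in URLs like those shown above) and orders visits into a timeline. It is an illustration only, not the tool shown in Figure 5.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path
from urllib.parse import unquote

# Chrome stores visit times as microseconds since 1601-01-01 UTC (the WebKit epoch).
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(value: int) -> datetime:
    return WEBKIT_EPOCH + timedelta(microseconds=value)


def browsing_timeline(history_db: str, keyword: str = "") -> list:
    """Return (last visit, decoded URL, title, visit count) tuples, oldest first.

    history_db should be a working copy of the profile's History file; it is
    opened read-only so that the copy is not modified during examination.
    """
    uri = Path(history_db).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT url, title, visit_count, last_visit_time FROM urls "
            "WHERE url LIKE ? ORDER BY last_visit_time",
            (f"%{keyword}%",),
        ).fetchall()
    finally:
        con.close()
    return [(webkit_to_datetime(ts), unquote(url), title, count)
            for url, title, count, ts in rows]
```

Working on a copy matters because Chrome keeps the live History file locked while running, and opening it read-only avoids SQLite writing journal changes into the evidence.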

Table 4. Evidence collection process for cloud forensics.
Phase | Cloud storage | Cloud log analysis
Evidence analysis | Identifying patterns from the evidence collection process to determine the source of attacks in the cloud environment | Determining attack patterns from cloud log files and analyzing these patterns using a cloud traceback mechanism to identify the source of evidence
Evidence presentation and reporting | The forensic investigator examines the evidence and presents it in court | Identifying the evidence from the analysis and reporting it

Physical memory analysis
Physical memory analysis can provide traces of cloud computing usage that would otherwise be lost without passive monitoring, such as network socket information, encryption keys, and in-memory database contents. These traces are analyzed from the physical memory dump using the "pslist" function, which recovers the process name, process identifier, parent process identifier, and process start time. The processes can be differentiated by their process names: "exe" on Windows and "sync" on Ubuntu and Mac OS. Table 4 indicates the evidence collection process for cloud forensics in cloud storage and cloud log analysis.
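Once a pslist-style listing has been extracted from the memory dump (for example, with the Volatility framework's pslist plugin), the records can be filtered for cloud storage clients and traced back to the process that launched them. The sketch below assumes the listing has already been converted into records holding the four fields named above; the `ProcRecord` type and the keyword list are hypothetical, and matching on names is only a starting point, since process names can be spoofed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProcRecord:
    name: str        # process name
    pid: int         # process identifier
    ppid: int        # parent process identifier
    start: datetime  # process start time


# Hypothetical name fragments of cloud storage clients.
CLIENT_KEYWORDS = ("dropbox", "googledrive", "onedrive")


def find_sync_clients(procs, keywords=CLIENT_KEYWORDS):
    """Return processes whose names contain a client keyword, oldest first."""
    hits = [p for p in procs if any(k in p.name.lower() for k in keywords)]
    return sorted(hits, key=lambda p: p.start)


def parent_chain(procs, pid):
    """Follow PPID links upward from pid and return the process names on the way.

    Assumes PIDs are unique within the listing; 'seen' guards against PPID loops.
    """
    by_pid = {p.pid: p for p in procs}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        chain.append(by_pid[pid].name)
        pid = by_pid[pid].ppid
    return chain
```

Sorting hits by start time helps place the client's launch on the incident timeline, and the parent chain shows whether it was started normally (e.g., by the desktop shell) or by an unexpected process.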

Cloud forensics challenges
This section elucidates the forensic challenges in private and public clouds. The literature shows that most of these challenges apply to the public cloud, while fewer apply to the private cloud environment.

Accessibility of logs
Logs are generated in different layers of the cloud infrastructure [2][3][4][5][6][7]. System administrators require relevant logs to troubleshoot the system, developers need logs to fix errors, and forensic investigators need relevant logs to investigate a case. With the help of an access control mechanism, the logs can be made accessible to all the parties, that is, the user, the CSP, and the forensic investigator.
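The access control idea can be illustrated with a minimal role-based filter. In this sketch, which is an assumption rather than a mechanism described in the cited works, each party (user, CSP, investigator) may read only certain log categories, unknown roles are denied by default, and users additionally see only their own tenant's records. The role names, categories, and `POLICY` table are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical policy: which log categories each party may read.
POLICY = {
    "user": {"application", "authentication"},
    "csp": {"application", "authentication", "hypervisor", "network"},
    "investigator": {"application", "authentication", "hypervisor", "network", "storage"},
}


@dataclass(frozen=True)
class LogRecord:
    tenant: str
    category: str
    message: str


def readable_logs(records, role, tenant=None):
    """Return the records a party may read under POLICY (deny by default).

    Users are further restricted to their own tenant's records.
    """
    allowed = POLICY.get(role, set())
    out = []
    for r in records:
        if r.category not in allowed:
            continue
        if role == "user" and r.tenant != tenant:
            continue
        out.append(r)
    return out
```

Keeping the policy as data rather than code makes it easier to show a court or auditor exactly who could read which logs at the time of acquisition.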

Physical inaccessibility
The data are stored on hardware located in different geographical areas. Physical access to these resources is difficult because the data reside with different CSPs, and it may be impossible to collect the evidence from the device on which it is stored. In a private cloud environment, all the devices can be acquired immediately when an incident occurs, since the organization has full control over the resources. The same methods cannot be used to access the data in a public cloud environment.

Volatility of data
Data stored in a VM instance in the cloud is lost when the VM is turned off. This leads to the loss of important evidence such as syslog entries, network logs, registry entries, and temporary Internet files. An attacker can launch an attack and then turn off the VM instance, making these traces unavailable for forensic investigation. It is therefore important to preserve snapshots of VM instances so that logs can be retrieved from terminated VMs.

Identification of evidence at client side
Evidence must be identified not only on the provider's side but also on the client side. Users communicate with other clients through the Web browser. An attacker can send malicious programs through a Web browser that communicates with third parties to access the services running in the cloud, which in turn can lead to the destruction of all the evidence in the cloud. One way of collecting evidence is from cookies, user agents, etc., but it is difficult to obtain all the information since client-side VM instances are geographically distributed.

Dependence of CSP trust
Consumers depend blindly on CSPs to acquire the logs for investigation. A problem arises when CSPs do not provide valid information to the consumer about data that resides on their premises. Moreover, CSPs sign agreements with other CSPs to use their services, which in turn can lead to the loss of confidential data.

Multitenancy
In cloud infrastructures, multiple VMs share the same physical infrastructure, so the logs are distributed across various VMs. The investigator needs to present the logs in court, proving the malicious activities that occurred across the different service providers, while also preserving the privacy of the other tenants.
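One way to reconcile presenting shared logs with tenant privacy is keyed pseudonymization: the suspect tenant's records stay intact, while other tenants' identifiers are replaced with HMAC-SHA256 tokens. Because the same input always yields the same token, activity by an unrelated tenant can still be correlated across records without revealing who it is, and the key holder can later confirm whether a given tenant or IP corresponds to a token. This is a sketch under assumptions: the record fields (`tenant`, `src_ip`, `event`) are hypothetical, and free-text event fields may themselves contain personal data that needs separate review.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed token; truncated to 16 hex characters (64 bits) for readability."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def redact_other_tenants(records, suspect_tenant, key):
    """Keep the suspect tenant's records; pseudonymize other tenants' IDs and IPs.

    records: list of dicts with 'tenant', 'src_ip', and 'event' keys (assumed schema).
    """
    out = []
    for r in records:
        if r["tenant"] == suspect_tenant:
            out.append(dict(r))
        else:
            out.append({
                "tenant": pseudonymize(r["tenant"], key),
                "src_ip": pseudonymize(r["src_ip"], key),
                "event": r["event"],
            })
    return out
```

A keyed HMAC is used instead of a plain hash because IP addresses and tenant names come from small, guessable spaces; without the key, an unkeyed SHA-256 token could be reversed by brute force.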

Decentralization
In cloud infrastructures, log information is located on different servers in different geographical locations. The log information of multiple users may be colocated or spread across several layers and tiers in the cloud. The application log, network log, operating system log, and database log all produce valuable information for a forensic investigation. The decentralized nature of the cloud makes synchronizing these logs a challenge.
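Synchronizing decentralized logs starts with putting every event on a common clock. The following is a minimal sketch, assuming each server's UTC offset and any measured clock skew are known: local timestamps are converted to UTC, and per-server lists that are already time-ordered are merged into one timeline with `heapq.merge`, which avoids re-sorting everything. The function names and the `(utc_time, server, message)` tuple layout are illustrative.

```python
import heapq
from datetime import datetime, timedelta, timezone


def to_utc(ts: str, fmt: str, utc_offset_hours: float,
           clock_skew: timedelta = timedelta(0)) -> datetime:
    """Convert a naive local timestamp to UTC.

    clock_skew is how far the server's clock ran ahead of true time (assumed measured).
    """
    local = datetime.strptime(ts, fmt).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc) - clock_skew


def merge_sources(*sources):
    """Merge per-server event lists, each sorted by UTC time, into one timeline.

    Each event is a (utc_datetime, server, message) tuple.
    """
    return list(heapq.merge(*sources, key=lambda event: event[0]))
```

For example, an event logged at 10:00:05 by a server at UTC+5:30 and one logged at 23:30:10 the previous day by a server at UTC-5 are only 5 seconds apart in UTC, which is easy to miss when comparing raw local timestamps.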

Absence of standard format of logs
Logs are available in heterogeneous formats from the different layers of a cloud at the CSP. Logs provide information such as by whom, when, where, and why an incident occurred, but the lack of a standard format is an important bottleneck to providing a generic solution for all CSPs and all types of logs. Table 5 surveys the literature that deals with the challenges of cloud forensics, mainly for the evidence collection process.
Table 5. Challenges of cloud forensics.
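A common mitigation for heterogeneous formats is to map each source into one shared record before analysis. The sketch below assumes two hypothetical input formats, a JSON audit record and a `key=value` line, and maps both onto who/when/where/what fields; "why" usually has to be inferred by the analyst rather than parsed. Neither format is a specific CSP's real log schema, and further formats would be supported by registering another parser in `PARSERS`.

```python
import json
import shlex
from typing import NamedTuple


class CommonEvent(NamedTuple):
    who: str
    when: str   # timestamp string, kept exactly as the source wrote it
    where: str  # e.g. source IP address
    what: str


def from_json_audit(line: str) -> CommonEvent:
    """Hypothetical JSON audit format: {"user", "time", "source_ip", "action"}."""
    rec = json.loads(line)
    return CommonEvent(rec["user"], rec["time"], rec["source_ip"], rec["action"])


def from_key_value(line: str) -> CommonEvent:
    """Hypothetical key=value format, e.g. ts=... user=... src=... msg="...".

    shlex.split keeps quoted values containing spaces together.
    """
    fields = dict(token.split("=", 1) for token in shlex.split(line))
    return CommonEvent(fields.get("user", "-"), fields["ts"], fields["src"], fields["msg"])


PARSERS = {"json": from_json_audit, "kv": from_key_value}


def normalize(lines_by_format):
    """Map (format_name, raw_line) pairs onto CommonEvent records."""
    return [PARSERS[fmt](line) for fmt, line in lines_by_format]
```

Keeping the original timestamp string (rather than parsing it here) leaves time normalization as a separate, auditable step.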

Forensic tools
There are many tools to identify, collect, and analyze forensic data for investigation. Juels et al. developed PORs (proofs of retrievability) for online archives, providing integrity and privacy of files [19]. Dykstra et al. proposed a forensic tool for acquiring cloud-based data through the management plane [6], which ensures trust in cloud infrastructures. Moreover, the EnCase and AccessData FTK toolkits are used to identify trusted data to acquire the evidence. Similarly, tools such as Evidence Finder and F-Response are used to find evidence related to social networks. Dykstra et al. also proposed FROST, an open-source forensic tool for the OpenStack cloud platform that identifies evidence from virtual disks, API logs, firewall logs, etc. [20].

Open research problems in cloud forensics
Many researchers have proposed solutions to mitigate the challenges of cloud forensics, and some have proposed new approaches to test attacks in real-time environments. However, CSPs have not yet adopted the proposed solutions. Customers and investigators do not have direct physical access, so they rely on CSPs to collect the necessary logs and other information from the registry, hard disk, memory, log files, etc. Although various forensic acquisition processes have been proposed, the dependence on CSPs remains unsolved. Another critical issue is the usage of bandwidth resources: the larger the volume of cloud storage to be acquired, the more bandwidth is consumed. There is insufficient work on preserving the chain of custody through secure provenance, and there is no ideal solution for cyber crime scene reconstruction and preservation of evidence. A further critical issue is that modifying existing forensic tools may result in lost evidence. Some researchers have proposed logging as a service to provide confidentiality, integrity, and authentication [3]; however, this solution is not suitable for IaaS clouds.
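The bandwidth concern can be made concrete with a back-of-the-envelope estimate. Assuming, purely for illustration, 10 TB of cloud-stored data to be acquired over a dedicated 1 Gbit/s link with no protocol overhead, the transfer time is:

```latex
t = \frac{D}{B}
  = \frac{10 \times 10^{12}\,\text{bytes} \times 8\,\text{bits/byte}}{1 \times 10^{9}\,\text{bits/s}}
  = 8 \times 10^{4}\,\text{s}
  \approx 22.2\,\text{h}
```

The time scales linearly with the data volume D and inversely with the bandwidth B, and real acquisitions take longer once protocol overhead, rate limiting, and shared links are taken into account.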

Case study
This section introduces a hypothetical forensic case study related to a cloud storage service and also describes a forensic investigation of the case.

Case study: cloud storage
Organization "X" found that its document named "X_new.pdf", about the new release of a product, had been leaked to a competitor [21][22][23][24]. "Mr. Morgan" was managing the credential files of the document stored in the cloud, so at the initial stage of the investigation he was the suspect in the leaked file case. The forensic investigator has to identify the suspect by checking the organization's network, analyzing log files, or collecting traces of the relevant file in the network. Mr. Morgan's network did not reveal any clue about the secrets, since he used only a personal computer (PC) and an Android phone for business. To identify the suspect, the forensic investigator seized the PC and the Android phone, since these were the target devices used by the adversary. The leaked file was not detected on these devices. Later, the investigator analyzed the unallocated area in the file system, the operating system, external devices such as hard drives and tablets, and the Web service, but no evidence was found. The investigator then found that Dropbox was installed on the PC and that five config.db files had been accessed recently. The forensic investigator issued

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Author details
Thankaraja Raja Sree* and Somasundaram Mary Saira Bhanu Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India *Address all correspondence to: trajasree87@gmail.com

Conclusion
Cloud computing offers on-demand services (CPU, memory, network bandwidth, storage, applications, etc.) to users by allocating virtual instances and software services. Security is a major concern in the cloud, where investigating security attacks and crimes is very difficult. Because attacks and crimes in the cloud are distributed in nature, efficient security mechanisms are needed. Cloud logs are spread across different virtual and physical machines (VM instances), switches, routers, etc., and the customer (end user) is not aware of the activities of the VM instances; cybercriminals exploit these sources to exhaust the resources running in the cloud. Hence, evidence collection plays a crucial role in identifying suspects. However, collecting logs from the cloud infrastructure is extremely difficult, because the investigator or security analyst has to depend on CSPs to collect the logs and has little control over the infrastructure. Therefore, to help identify suspicious activity in the cloud, this chapter surveys the various forensic processes, the evidence collection techniques for cloud forensics, and the challenges faced in the cloud environment during forensic investigation.