2
Integration of Big Data Analytics Into Cyber-Physical Systems

Nandhini R.S.* and Ramanathan L.

Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

Abstract

Evolving Cyber-Physical Systems (CPS) technology drives advances in big data analytics and processing. This chapter discusses the Big Data topics that Cyber-Physical Systems require across all data streams, including the integration of heterogeneous data resources. It further considers challenges such as integrating data generated from multiple sources into cyber-physical systems, the limits of conventional databases and offline processing for big data, and scalability. The control and management of big data are aided by a cyber-physical system architecture with a cyber layer, a physical layer and a communication layer, which not only integrates the data but also helps the cyber-physical system in decision making. A case study that aids big data processing and analytics in cyber-physical systems is also presented.

Keywords: Cyber-Physical systems, big data analytics and processing, Internet of Things, data mining

2.1 Introduction

The rapid growth in the number of devices, in particular sensors and actuators, has made it possible to control smart physical things, smart objects and digital technologies such as machines in smart manufacturing and structures in smart cities. Communication technologies and physical devices are merged to produce systems that are effective, productive and safe, called intelligent systems, whose integrations and interactions combine to create a global cyber-physical system. A cyber-physical system is an association of cyber and physical components that have been specifically engineered to monitor, coordinate and control based on computational algorithms. It is a 3C technology: communication, computation, and control. Cyber-physical systems capture and monitor data from wireless sensor devices and control the physical devices through actuators based on that physical data, thus interacting with both the physical and the cyber world in a real environment. These systems are interconnected on a universal scale using different network and communication resources. Physical control is efficient when the data collected from the sensors are processed into information with data mining techniques. When the features of a cyber-physical system are considered, the interactions among the users (from a context perspective), the surroundings of the physical devices and the processes in the cyber-physical system are observed. However, the integration rules, the interoperation among devices and the control of the cyber-physical system are functions that are globally distributed and networked in real time [1]. Such systems are used extensively in many applications, including industry, transport and the vehicular industry, medical and health management, smart grids, military applications and weather forecasting.

An enormous amount of data is generated by digital technologies such as wireless detectors and sensors, mobile phones and internet-connected storage devices, which produce a continuous data stream. The computational capability of a cyber-physical system needs to scale to remain efficient, because the growing number of networked sensors, digital technologies and devices creates a huge volume of data. To develop a system that is more efficient, intelligent, reliable, trustworthy and self-adaptable, the integration of big data into the cyber-physical system is mandatory. Computing resources are small compared with the huge volume of data generated from various sources. Big data analytics techniques aim to examine, process and handle data with big data characteristics in order to identify patterns and relationships in the data sets and obtain the information that is needed; innovative forms of data can also be obtained for decision making and process control. Insights into how to model, capture, specify, transfer, organize and manage the data efficiently can be discovered [2]. Conventional data analytics processes the whole data set, whatever its size or type, whereas big data analytics collects, processes and manages data with low latency. It handles typical big data such as unorganized data, sensor data with spatiotemporal characteristics and data produced in real time as a stream, and delivers faster results during real-time processing. Machine learning (ML), artificial neural networks (ANN), statistics, dynamic Bayesian networks (DBN), deep learning, and natural language processing are some of the advanced big data techniques. Merging big data analytics with the cyber-physical system is inevitable, as it is the key to a productive, efficient and adaptable cyber-physical system that can sustain itself.

The chapter is organized as follows. Section 2.2 presents the architecture of the cyber-physical system and a big data model for it. Section 2.3 explains the integration of CPS and BDA, its control and management, and the issues and challenges that arise when big data is integrated with a cyber-physical system. The storage and communication of big data for cyber-physical systems are covered in Section 2.4. Section 2.5 covers data processing techniques and models for big data, such as cloud and multi-cloud processing, clustering in big data and cyber-physical systems, and big data analytics models. Section 2.6 covers applications of big data-enabled CPS, particularly manufacturing, smart grids and smart cities, healthcare and smart transportation. Data security and privacy in CPS applications and the loopholes that cause cyber threats in big data analytics are then discussed.

2.2 Big Data Model for Cyber-Physical System

The characteristics of big data can be understood through the 5V architecture: volume, variety, veracity, velocity and value [3]. Big data analytics (BDA) is applied in many distinct domains, such as e-commerce and enterprise, to predict patterns of customers’ interest, and weather forecasting, where changes in the weather are analyzed and patterns are predicted from past data. Because data characteristics vary and aggregating data is costly, smart data was proposed. The concept of smart data is to eliminate noise so that important and relevant data can be obtained; the cyber-physical system can then use this data to monitor and control, making accurate decisions that affect the physical devices in the real-time environment [4]. Present BDA models focus on data mining, data processing functions, storage and visualization instead of exploring how smart data can be acquired from raw data, which makes the integration vulnerable and lowers the analytic capabilities of the system. The BDA architecture should improve the effectiveness and intelligence of the cyber-physical system. To this end, a communication layer is included in the system architecture for smart data purposes, and a data source layer is included in the BDA model, which integrates smart methods for the data mining and visualization layers and so aids the integration of the collection, pre-processing, storage, mining and visualization functions in CPS [5].

2.2.1 Cyber-Physical System Architecture

The BDA-enabled CPS design comprises three layers: a physical layer, a cyber layer and a communication layer.

Physical layer: Sensors distributed locally across the CPS application fields generate data that is accumulated in this layer for further processing. This raw data is noisy and uncertain and needs to be processed.

Communication layer: This layer pre-processes the raw data into smart data and converts the decisions from the cyber layer into executable commands.

Cyber layer: Controlling and monitoring decisions are made by analysing the data, and they are reflected in the infrastructure of the physical layer.

State sensing, intelligent analysis in real-time, accurate execution and self-optimization are some of the main functions of the architecture from a data processing perspective.
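
The flow through the three layers can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the sliding-median noise filter, the temperature setpoint and the command format are all hypothetical and are not taken from the cited architecture.

```python
from statistics import median


def physical_layer(raw_readings):
    """Physical layer: hand over raw, possibly noisy sensor readings."""
    return list(raw_readings)


def communication_uplink(raw, window=3):
    """Communication layer (uplink): turn raw data into 'smart data'.

    A sliding median stands in for the noise filter (an assumption).
    """
    half = window // 2
    return [median(raw[max(0, i - half): i + half + 1]) for i in range(len(raw))]


def cyber_layer(smart, setpoint=22.0, tolerance=1.0):
    """Cyber layer: analyse the smart data and make a control decision."""
    latest = smart[-1]
    if latest > setpoint + tolerance:
        return "cool"
    if latest < setpoint - tolerance:
        return "heat"
    return "hold"


def communication_downlink(decision):
    """Communication layer (downlink): convert a decision into a command."""
    commands = {
        "cool": {"actuator": "hvac", "power": -1},
        "heat": {"actuator": "hvac", "power": 1},
        "hold": {"actuator": "hvac", "power": 0},
    }
    return commands[decision]


def control_cycle(raw_readings):
    """One sense, analyse, act cycle across the three layers."""
    raw = physical_layer(raw_readings)
    smart = communication_uplink(raw)
    decision = cyber_layer(smart)
    return communication_downlink(decision)
```

In this sketch a single spike in the middle of a window is suppressed by the median before the cyber layer sees it, which is the role the text assigns to pre-processing raw data into smart data.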

2.2.2 Big Data Analytics Model

The BDA model is the other part of the architecture: a vast amount of raw data is processed so that decisions are made faster and better. The learning process in the BDA model is inspired by the human brain and uses techniques (support vector machines, fuzzy clustering, convolutional neural networks, auto-encoders, deep learning models) that are integrated with data processing techniques [6]. The big data analytics model contains four layers: the data source layer, the smart data warehouse layer, the smart data mining layer and the smart visualization layer.

Data source layer: Many technologies are used to gather data in this layer. Raw data is collected from the physical CPS devices through distributed wireless sensors, and from sources such as industrial applications, social media and the internet.

Smart data warehouse layer: This layer manages and maintains historical data that aids decision making and provides an environment for analysing information [7]. The raw data is processed into information with the aid of a data cleaning module that removes inaccurate records, a data integration module that integrates data in different formats, a data reduction module that reduces data to a simpler form, a data transformation module that converts raw data to a common format, and a data discretization module that converts attributes to discrete intervals.

Smart data mining layer: This layer consists of five modules: the extraction model, training model, analytic model, data mining model and prediction model. Different BDA techniques are used in each model for better results.

Smart visualization layer: This layer can be designed according to users’ preferences. The analytic results are displayed through visualization techniques to give insight into the modelled data.
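
The five modules of the smart data warehouse layer can be sketched as a small Python pipeline. This is a hedged illustration rather than the design from [7]: the record fields (`sensor`, `temp`, `unit`), the plausibility range, the bin edges and the order in which the modules are chained are all assumptions made for the example.

```python
def to_celsius(record):
    """Transformation helper: express a reading in a common unit (Celsius)."""
    if record.get("unit") == "F":
        return (record["temp"] - 32.0) * 5.0 / 9.0
    return record["temp"]


def integrate(*sources):
    """Data integration: merge records from several sources into one list."""
    return [dict(r) for src in sources for r in src]


def clean(records):
    """Data cleaning: drop missing or physically implausible readings."""
    return [r for r in records
            if r.get("temp") is not None and -50.0 <= to_celsius(r) <= 150.0]


def transform(records):
    """Data transformation: convert every record to the same format."""
    return [{"sensor": r["sensor"], "temp_c": to_celsius(r)} for r in records]


def reduce_by_sensor(records):
    """Data reduction: keep one mean value per sensor."""
    groups = {}
    for r in records:
        groups.setdefault(r["sensor"], []).append(r["temp_c"])
    return {sensor: sum(v) / len(v) for sensor, v in groups.items()}


def discretize(value, edges=(15.0, 25.0), labels=("low", "normal", "high")):
    """Data discretization: map a continuous value to an interval label."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def warehouse_pipeline(*sources):
    """Chain the five modules (the order used here is an assumption)."""
    uniform = transform(clean(integrate(*sources)))
    return {s: discretize(v) for s, v in reduce_by_sensor(uniform).items()}
```

For example, two sources that report in Celsius and Fahrenheit are merged, a missing value and an implausible reading are dropped, and each sensor ends up with a single discrete label.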

2.3 Big Data and Cyber-Physical System Integration

Big data analytics is necessary for a cyber-physical system because the system dynamically produces a massive amount of data, which needs to be explored and examined to obtain useful information and predict patterns. The integration of BDA into CPS is therefore inevitable. A big data-enabled CPS must process all of this complex data to ensure that the correct operation is carried out, so that the system can make decisions and control the dynamic, continuously changing behavior of the physical devices. Implementing a big data-enabled CPS requires adapting and introducing many concepts, such as data structures, big data features and characteristics, and spatial and temporal constraints. However, conventional offline data processing solutions do not fit this integration, because the system deals with the real world, where decisions are critical and take place in real time. The consequences of big data in real time need to be resolved by a suitable non-classic, vertically integrated solution that handles real-time stream processing for control purposes and batch processing for learning purposes.
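
The split between a stream path for control and a batch path for learning can be sketched in a few lines of Python. The class name, the threshold rule (mean plus k standard deviations) and the command strings are hypothetical choices for illustration; the chapter does not prescribe a specific mechanism.

```python
from collections import deque
from statistics import mean, pstdev


class StreamBatchController:
    """Hypothetical sketch: stream path for control, batch path for learning."""

    def __init__(self, threshold=30.0, history_size=1000):
        self.threshold = threshold                  # limit used on the stream path
        self.history = deque(maxlen=history_size)   # bounded store for batch learning

    def on_reading(self, value):
        """Stream path: decide on each reading as it arrives (low latency)."""
        self.history.append(value)
        return "shutdown" if value > self.threshold else "run"

    def relearn(self, k=3.0):
        """Batch path: periodically re-estimate the limit from stored history."""
        if len(self.history) >= 2:
            self.threshold = mean(self.history) + k * pstdev(self.history)
        return self.threshold
```

The stream path never waits for the batch path: `on_reading` uses whatever limit the last `relearn` call produced, so control decisions stay fast while the model improves in the background.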

2.3.1 Big Data Analytics and Cyber-Physical System

When a cyber-physical system is integrated with big data analytics, the CPS part focuses on the streaming data produced by the sensors, and the data analytics part is where the computation and communication systems collect that data. The features of big data need to be considered in the integration process: Volume is the total amount of data, Velocity is the pace at which the data is created and aggregated, Variety is the richness of the data representation, and Value is the information that can be extracted from the raw data to make decisions. In addition, spatial data is taken into account, as it plays an important part in big data-enabled cyber-physical systems.

2.3.1.1 Integration of CPS With BDA

To enable the integration of the two systems, a combination of the Architecture Analysis and Design Language (AADL) [8], the Modelica Modeling Language (ModelicaML) [9] and clock theory [10] ensures that the requirements of big data are met and implemented on big data platforms and that its properties are considered [11]. To aid the integration, a vector-logical big data processing approach that lets cyber-physical systems control operations has been proposed, along with a computing automation model that affects performance and hardware complexity [12].

2.3.1.2 Control and Management of Cyber-Physical System With Big Data Analytics

Such a system controls and manages the interconnected devices and systems that link the physical environment with the computational capabilities in a real-time dynamic environment. Self-awareness, self-configuration and self-repair are some of the abilities that a cyber-physical system has to acquire in order to sustain itself.

The big data environment handles data as a service. This service should be able to manage big data characteristics such as volume, velocity and variety while gathering the data generated by the sensors and the machine controls, organize the data in multi-dimensional feature spaces, and apply it in Industry 4.0 settings [13]. Some of the challenges faced here are big data acquisition and storage, widespread data relevance, data stream elaboration, data analysis and machine similarity identification, application-specific human–machine interfaces (HMI), and feedback-control mechanisms.

The management and control of a cyber-physical system usually depend on models created by humans, but these are hard to verify and maintain because they are incomplete. This leads to data-driven approaches, in which models are learned automatically from the huge amount of data collected by the CPS. A cognitive reference architecture is preferred in this context [14]. This architecture comprises different interfaces that connect to each other. The first interface is the big data platform, where all the relevant raw data from the machines and sensors is gathered and prepared for analysis and interpretation. The next interface is the learning algorithms, which perform anomaly detection on the data for condition monitoring and predictive maintenance. In a conceptual layer, the information provided by the learning algorithm interface is combined with domain-specific knowledge to identify faults, and semantic context is added to the results. In the task-specific HMI, the results from the conceptual layer are converted into a human-understandable form to achieve better standardization, efficiency and repeatability. Another conceptual layer uses knowledge to determine the actions that need to be taken in response to the users’ decisions, which are communicated to the next interface. The final interface is the adaption layer, where commands are computed in real time and changes are communicated to the control system, which reflects them in the physical device.

Modeling a cyber-physical system with big data should take into account the chaotic behavior that can arise in its control: because the system deals with a vast amount of data and its control, it may produce unpredicted results. The cyber-physical system responds to every minor change and disturbance, which makes it sensitive. A fuzzy feedback linearization model followed by a time prediction algorithm has been proposed to tackle chaotic control problems in CPS, including the synchronization control problem [15].

2.3.2 Issues and Challenges for Big Data-Enabled Cyber-Physical System

A big data-driven CPS must consider the special characteristics and attributes, restrictions, demands and constraints that arise when a particular system domain is integrated with big data, along with the basic big data properties: volume, velocity, variety, volatility, value, veracity and validity. The functional components of big data in CPS, which should be considered during the integration, are system infrastructure and data analytics. The system infrastructure component covers real-time communication between the physical and cyber devices, including data capture, monitoring of the database and its functionality, and distributed computing. Data analytics deals with product actualization, resource efficiency and organization, and predictive and descriptive analysis. Other important issues that concern both components are adaptability, flexibility, security and reliability.

In a cyber-physical system, a vast amount of data is collected from the physical environment by networked sensors, machines and other embedded devices. These data-producing devices are not restricted to a particular time or place, and their data comes in several categories and forms, such as temperature, speed, geographical data, environmental data, astronomical data, and health and logistics data, from different sectors as well as from digital equipment, transportation, public facilities and smart homes. This leads to spatiotemporal data requirements: because the system mostly functions in a real-time environment, both spatial and temporal data must be considered. Geographical data, time-series data, data from remote locations and moving-object trajectories (data that contains the movement history of objects) are all considered spatiotemporal data.

Time and space correlations are important features of cyber-physical system data, and the dimensions of such data are examined during analysis and processing. Heterogeneous data is common in cyber-physical systems, and an appropriate data representation and model makes the data more insightful. Challenges in real-time support, availability of sensing and communication services, maintenance, system infrastructure, evolvability and modularity persist when the integration takes place. The integration also calls the infrastructure of the cyber-physical system into question, so its communication and computational capabilities need to be inspected. Security is another important challenge, as its standards vary between applications when they interact with different devices. Control decisions, the trustworthiness of data, and the authentication and management of devices all require the protocols and the approach to the system to be interpreted for the specific application, as security demands [16].

2.4 Storage and Communication of Big Data for Cyber-Physical System

The management of data and communication in the real environment is key for a system to function successfully and sustain its efficiency. Storage management for a cyber-physical system with big data solutions should be considered alongside caching and routing, because there is a huge amount of traffic from social media applications, personal health data, traffic and weather monitoring applications and smart home appliances, which has led researchers to seek solutions for the storage and communication of big data in CPS. From a storage perspective, enhancing system performance requires improving data collection and data processing techniques.

2.4.1 Big Data Storage for Cyber-Physical System

Storing persistent and continual data from numerous sources demands approaches that are efficient and effective in terms of scalability, cost and flexibility. Combining cloud/edge computing facilities with big data analytics can give significant results for data storage objectives. Innovative measures should be applied, such as proactive content caching in networks, which predicts user behaviour; this is the motivation for big data-enabled architectures in which data analysis, statistical analysis and visualization methods are applied at base stations. To satisfy users, the data is used for content popularity estimation and content caching, in which cyber-physical systems have a high interest [17].

Pre-cache technologies are used with big data to achieve higher performance during the transfer of data from sensors to servers, since a cyber-physical system generates a vast amount of data that causes network traffic. Two algorithms, the Data Filter Algorithm (DFA) and the Data Assembler on Server Algorithm (DASA), are used to reduce network traffic during data transfer [18]. This can be regarded as an optimal trade-off solution that effectively addresses both the network traffic problem and the data accuracy problem, in which the data captured by the sensors varies slightly because of the limited accuracy of the sensors. Data accuracy is handled by choosing the relevant parameters and algorithm functions, placing filters in the sensors and assigning a measure to each, before the data is sent to the servers.
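
The general idea of filtering at the sensor and reassembling at the server can be sketched as follows. This is not the DFA or DASA from [18], whose details are not given here; it is a simple deadband filter paired with a hold-last-value assembler, and the function names and the `tolerance` parameter are assumptions chosen for illustration.

```python
def sensor_filter(readings, tolerance=0.5):
    """Sensor side: transmit (index, value) only when the value moves more
    than `tolerance` away from the last value that was transmitted."""
    sent, last = [], None
    for i, value in enumerate(readings):
        if last is None or abs(value - last) > tolerance:
            sent.append((i, value))
            last = value
    return sent


def server_assemble(sent, length):
    """Server side: rebuild the full series by holding the last received value."""
    updates = dict(sent)
    series, current = [], None
    for i in range(length):
        current = updates.get(i, current)
        series.append(current)
    return series
```

With this scheme the reconstructed series differs from the original by at most `tolerance` at every sample, so the tolerance can be matched to the accuracy of the sensor: changes smaller than the sensor can reliably measure are never transmitted.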

Performing caching on wireless sensor networks, wireless device-to-device networks and other data generation devices such as base stations, rather than in the cloud, has a positive impact on data management. Coded multi-transmission is used at the base stations for caching in a realistic environment. Its analysis gives a sharp characterization of the throughput of the sensors in the asymptotic regime, based on a simple protocol model that uses geometric link conflict constraints to capture the elementary aspects of interference and spatial spectrum reuse [19]. The integration of big data with real-time CPS benefits from these caching and storage techniques, where reliability and predictability come first. Different strategies can be used to enhance CPS performance by speeding up data collection, processing and distribution, and the correct use of caching techniques makes the system more manageable.

2.4.2 Big Data Communication for Cyber-Physical System

A cyber-physical system makes decisions based on the data newly generated by sensors and other digital technologies, which provides information and is used for processing. Innovations in big data technologies provide new insights into the effect of strategic communication: the communication process needs to be analyzed and controlled along with the management of information under real-time evaluation. Modern ways of thinking and decision making are among the prominent promises of big data computing. The data is always made available to users so that optimal decisions can be made from the latest information, which gives more accurate results. Big data delivery can be a key technology for better computing. Big data transmission requirements must be considered and met alongside the other big data characteristics, which makes processing challenging because transmission capabilities are limited.

The big data environment should be adapted to cyber-physical systems by proposing new architectures, network infrastructure and other services that have become vital. Data delivery performance should be improved to benefit device-to-device (D2D) communications, in which data is exchanged among the nodes without support from the network infrastructure or central control units. The data delivery capacity of D2D communications is limited when the quality and mobility of the nodes are considered. Integrating cognitive radio technology with D2D communications improves the data delivery capacity and makes D2D an alternative that acts as a supporting system for big data applications [20]. The routing algorithms for D2D cognitive radio networks should be chosen appropriately, along with the communication scheme. Integrating wireless sensor networks (WSN) with mobile cloud computing (MCC) creates significant advantages. A WSN has spatially distributed sensors that monitor physical conditions such as temperature, sound, pressure, motion and light, which has changed the way interaction with the physical world takes place, whereas mobile cloud computing is a new computing model offering efficient, powerful and unique computing resources such as processors, storage, applications and services over networks that can be accessed easily on demand. Low operating cost, high scalability, easy accessibility and low maintenance expense are some of the advantages of MCC. In an integrated WSN and MCC system, the WSN collects information from the deployed sensors and passes it to the cloud, where powerful cloud computing is used to store and process it and deliver it to users on demand, so that the information is available to them through simple devices [21]. The sensory data processing framework reduces the amount of information stored on the sensors and the traffic load during data transmission through the WSN, so that transmissions are fast, reliable and secure. The analytical approach to big data technologies for communication in a real-time environment involves the fusion of data models in big data, such as relational, semantic, and data- and metadata-based models, along with the provision of distributed computing [22].

These technologies help to handle data speed and data processing from a storage perspective, and ease the communication and transmission of data for such systems.

2.5 Big Data Processing in Cyber-Physical System

Big data management improves when processing is fast. Computing and clustering support parallel processing, execution, queries and task scheduling in a real-time cloud environment. Big data analytics lets the cyber-physical system discover patterns, correlations and useful information in the data collected from the physical environment through relevant techniques.

2.5.1 Data Processing

It is impractical to process huge volumes of dynamic data using conventional methods in a centralized manner. The data needs to be distributed to speed up processing. Parallel processing techniques are applied in place of traditional processing techniques to handle the data and its characteristics, scalability issues, availability of resources, programming inefficiency, and the limitations of databases that cannot keep up with the latest processing techniques. The following processing methods and techniques can be used to overcome these limitations: cloud and multi-cloud data processing, and clustering in big data.

2.5.1.1 Data Processing in the Cloud and Multi-Cloud Computing

Parallel processing methods have an advantage over conventional processing because they have dedicated servers to process the data. Processing data in large amounts remains challenging in many respects, and because efficient data processing has become mandatory, the computing and networking infrastructures need to be reconsidered. Big data processing methods differ across cloud servers, and the public cloud proves to be more efficient in terms of resource provisioning, task scheduling and the impact of networking on big data performance. It lets users rent resources such as computing and storage in a pay-as-you-go manner [23].

A distributed algorithm is used to adapt the allocated resources and to support the query rate. This resource allocation algorithm carries out computations in a dynamic environment in the presence of queries. The communication bandwidth and the computation capacity limit the query rate when network computation performance is limited. The interaction of big data with the network resources is modelled with a communication network graph, computation nodes that balance the computation loads and network nodes that schedule the transmission of processed data [24]. Because of the velocity and volume of big data and the lack of efficiency and scalability in data processing, a similarity-check-based compression technique that uses a weighted fast compression distance has been adopted in the cloud instead of traditional data compression techniques. The technique is based on similarity calculations among partitioned data chunks, together with restoration functions and predictions, and improves efficiency at the cost of an affordable data loss [25].

In many big data, IoT and CPS applications, the data processing flow, including data collection, generation, computation and analysis, is broken up and assigned to individual computing entities. Data-intensive and computing-intensive workflows are deployed in multi-cloud computing, where data transfer within the cloud affects workflow performance. In multi-cloud environments, mathematical models examine the intra- and inter-cloud execution of the workflow and optimize its network performance [26]. The distributed virtual machines run on both single- and multi-level platforms. An asynchronous deployment protocol is used in the multi-cloud framework to accelerate the deployment process. Global big data analytics for IoT and other cloud models, such as private, hybrid and multi-cloud, use this framework, which is based on a domain-specific language (DSL) [27]. A large-scale multi-cloud environment also has multiple data centers for the big data processing platform. The computing requirement for multi-cloud services across multiple data centers is addressed by the data-driven and feedback-enhanced trust (DFET) design. Monitored data indicators form the basis of a computing pattern that integrates the service indicators into computing so that it can be applied to service-oriented cloud applications. Hierarchical feedback is associated with this computing model, which considers the relationships among users, monitors and service providers and enhances robustness and reliability [28].

2.5.1.2 Clustering in Big Data

Clustering is an unsupervised learning method used in statistical data analysis. Data clustering partitions a set of objects into groups with similar features. It categorizes the data and reveals hidden patterns. Data clustering is needed in big data applications, where a vast amount of data has to be analyzed. Clustering helps distribute data for storage and task execution. Hierarchical clustering and centroid-based clustering are frequently used in big data applications. Multiple clustering analysis is a technique used in automation systems to explore patterns in big data while taking different clustering requirements into account. Tensor-based multiple clustering (TMC) and a multiple-services analytic framework cluster dissimilar data objects in cyber-physical systems, measuring the importance of attribute combinations [29]. With the popular k-means clustering algorithm, the number of iterations needed for convergence increases as the number of clusters increases, which makes the traditional algorithm inadvisable for big data applications. Enhanced versions of the traditional clustering algorithm should be used instead, for example by supervising the choice of initial centroids and data points.

Big data relies on distributed computing, which can be achieved using MapReduce on the Hadoop platform. For its initial computations, the enhanced k-means algorithm averages the data points to select the initial cluster centroids rather than selecting them at random, which makes it more efficient than the traditional k-means algorithm and improves the accuracy of cluster formation [30]. Clustering based on summary statistics (coss) is an algorithm built on summary statistics. Because the threshold differs from one micro-cluster to another, an adaptive threshold-setting mechanism is used. All the micro-clusters are then combined using a suitable clustering algorithm, which yields efficient and refined clustering [31]. These are a few of the data clustering algorithms that can be used in big data applications.
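
The effect of replacing random initial centroids with averaged ones can be illustrated with a single-machine Python sketch. It is one plausible reading of the idea, not the algorithm from [30] and not a MapReduce implementation: the initialization scheme (sort the points by distance from the origin, split them into k chunks, average each chunk) and all function names are assumptions.

```python
def mean_point(points):
    """Component-wise average of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def averaged_initial_centroids(points, k):
    """Assumed scheme: sort by distance from the origin, split into k chunks,
    and use the average of each chunk as an initial centroid."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    ordered = sorted(points, key=lambda p: sum(x * x for x in p))
    size = len(ordered) // k
    chunks = [ordered[i * size:(i + 1) * size] for i in range(k - 1)]
    chunks.append(ordered[(k - 1) * size:])  # last chunk takes the remainder
    return [mean_point(c) for c in chunks]


def kmeans(points, k, max_iter=100):
    """Standard Lloyd iterations, started from the averaged centroids.

    Returns (centroids, clusters, number_of_iterations_used).
    """
    centroids = averaged_initial_centroids(points, k)
    clusters = []
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            return centroids, clusters, iteration
        centroids = updated
    return centroids, clusters, max_iter
```

Because the starting centroids are deterministic, repeated runs on the same data give the same clusters, and on well-separated data the iterations stop almost immediately. In a MapReduce setting the assignment step would correspond to the map phase and the centroid update to the reduce phase.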

2.5.1.3 Clustering in Cyber-Physical System

CPS combined with big data has numerous applications, such as traffic control systems, railway signaling systems, intelligent transportation and military applications. The sensor nodes in a large-scale sensor-based system are intelligently organized and designed for a long network lifetime. An inter-cluster communication relay algorithm forms the set of clusters over the structured sensor network, so that energy efficiency depends on the distance between the cluster heads and the base stations, also called data collection centers [32].

FlockStream, a density-based data stream clustering algorithm, is used to monitor the data streams of a big data-enabled cyber-physical system. The agents in this algorithm are modeled on flocking behavior [33]. Because CPS applications in big data deal with large volumes of real-time data, the data is processed in cloud and multi-cloud environments.

2.5.2 Big Data Analytics

A big data analytical approach is used to meet the requirements of cloud computing services, so that the data can be processed and analysed efficiently and service performance improved. The following concepts are useful for the integration.

Data mining: Data mining is the process of obtaining information from raw data. It reduces data complexity by capturing the important data. Before useful information can be obtained, the raw data passes through processing steps such as selection, pre-processing and data transformation.

Automated decision making and control are key characteristics of the system. Cyber-physical system objects are expected to interact with other objects, perform computations, make decisions and let those decisions take effect in the real world. Data mining techniques are used to turn the huge volume of data collected from the physical environment into information. Dimensionality reduction is one important technique for CPS applications; it transforms the features of the data.

Principal Component Analysis (PCA) is a dimensionality reduction technique used to reduce the dimensions of very large data sets. The data collected from sensors may contain errors, because different collection methods are used, as well as heterogeneous dynamic patterns. The data may also be noisy and multi-dimensional. Neural networks, when combined with a basic clustering method through PCA, deal with the complexity issues in the data [34]. Raw data with unknown patterns and correlations needs to be transformed into useful information through knowledge discovery in databases. Predictive analysis assigns scores to data based on the data attributes to analyze data behavior.
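As a concrete illustration of PCA as dimensionality reduction, the sketch below reduces two correlated sensor channels to a single score per sample. It is a simplified example under stated assumptions: the sample values are invented, only the first principal component is computed, and power iteration stands in for the full eigendecomposition that a numerical library would normally provide.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Reduce each row to a single score along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Hypothetical readings from two channels, where the second is roughly twice the first.
samples = [[2.0, 4.1], [4.0, 7.9], [6.0, 12.2], [8.0, 15.8]]
centered = center(samples)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)  # one value per sample instead of two
```

The direction `pc1` captures the shared trend of the two channels, so the one-dimensional `scores` retain most of the variance while halving the amount of data passed to later stages.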

Real-time analytics: Real-time analytics lets big data-enabled CPS deal with the challenges of data gathered from real-time CPS, which is unstructured and must be converted to structured data before analysis. Data from social networks, medical devices, traffic monitoring, household appliances, etc. falls under this category.

Spatial–temporal analysis: The collected data poses challenges in storage, scalability and efficiency. Data collected from spatiotemporally distributed cyber-physical system sensor nodes is used to determine location information. Artificial intelligence algorithms such as Particle Swarm Optimization (PSO) are used to assess, detect and update the locations [35].
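To make the role of PSO in location estimation more concrete, the following sketch estimates a two-dimensional position from range measurements to fixed anchor nodes. It is not the fingerprinting system of [35]: the anchor coordinates, the simulated true position, the cost function and the PSO parameters (`w`, `c1`, `c2`, swarm size) are assumptions chosen only for illustration.

```python
import random
from math import dist

# Hypothetical anchor sensors at known positions (metres).
ANCHORS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
TRUE_POS = (3.0, 4.0)  # used only to simulate noise-free range readings
RANGES = [dist(a, TRUE_POS) for a in ANCHORS]


def range_error(pos):
    """Sum of squared differences between measured and implied distances."""
    return sum((dist(a, pos) - r) ** 2 for a, r in zip(ANCHORS, RANGES))


def pso(cost, bounds, particles=30, iterations=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO in two dimensions; positions are not clamped to bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    best = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(particles), key=best_cost.__getitem__)
    gbest, gbest_cost = best[g][:], best_cost[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(2):
                # Inertia plus pulls toward the personal best and the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best[i], best_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


estimate, error = pso(range_error, bounds=(0.0, 10.0))
```

In a tracking setting the same search would be repeated as new range readings arrive, with the previous estimate used to seed the swarm.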

2.6 Applications of Big Data for Cyber-Physical System

Big data-enabled cyber-physical system applications affect our daily lives in fields such as autonomous cars, smart manufacturing, smart grids, transportation, medicine and healthcare, smart cities, disaster event applications and military applications. These applications produce a huge volume of data, which must be processed to improve their performance.

Manufacturing, smart grids and smart cities, and healthcare are some of the emerging applications of big data-enabled CPS.

2.6.1 Manufacturing

Digital manufacturing integrates manufacturing methods with computer-based technology, computation and communication to create a product. Analytics and visualization together form the computer-based technologies of digital manufacturing. Combining digital manufacturing with control and automation defines cyber-physical system based manufacturing. The fourth industrial revolution, Industry 4.0, has a great impact on manufacturing. Its advantages include flexibility, reduced time, adaptation to customer needs, and services [36].

Decentralized factory environments are created using interconnected cyber-physical systems combined with tracking technology and a component-based assembly line. Agile techniques are applied throughout the development phase. A Cyber-Physical Human System (CPHS) makes it possible to modify a product during manufacturing [37]. RFID tags, sensors, microprocessors and embedded systems are the physical entities in a CPS; they collect data from the environment and process it by connecting and communicating with other systems for further processing. Logistics, human-robot interaction and surveillance as a service are some of the CPS applications used in manufacturing [38]. Manufacturing productivity can be optimized using predictive production systems [39].

2.6.2 Smart Grids and Smart Cities

Smart grids: Advances in sensing and signal processing are making sustainable energy environments more common. Home sensors and appliances generate a huge volume of data and communicate with embedded power sensors. Sensing technologies are used in smart grids; however, large-scale challenges such as data processing, analysis and information management must be considered. Smart grids promise improved efficiency and reliability. A big data architecture consisting of data resources, transmission, storage and analysis elements is used in smart grids [40]. In a smart grid cyber-physical environment, communication with the grid goes through the control center. This modern cyber-physical system has a hierarchical architecture with a cyber plane and a physical plane: all the smart devices are located in the physical plane, while the control center is in the cyber plane [41].

Smart cities: Smart cities include traffic management, automatic operation of street lights, electricity management, water management, green city maintenance, garbage collection and automatic disposal, identification of situations that threaten citizens, etc. Smart cities are made possible by deploying sensors in the environment and are emerging rapidly. Analyzing traffic patterns so that users can reach their destinations faster is also part of a smart city. Smart transportation can be achieved through a graph-oriented mechanism: road sensors are deployed to obtain the overall traffic information together with the location and speed of each vehicle, and the collected information is processed using big data tools [42].
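A graph-oriented view of road traffic can be sketched by treating junctions as nodes and road segments as edges weighted by travel time derived from sensed speeds. The example below is only an illustration of that idea, not the mechanism of [42]: the segment table, the junction names and the use of Dijkstra's algorithm are assumptions.

```python
import heapq

# Hypothetical road segments: (from, to) -> (length in km, average observed speed in km/h)
SEGMENTS = {
    ("A", "B"): (4.0, 20.0),   # short but congested
    ("A", "C"): (6.0, 60.0),
    ("C", "B"): (3.0, 60.0),
    ("B", "D"): (5.0, 50.0),
}


def build_graph(segments):
    """Adjacency map with travel time in minutes as the edge weight (two-way roads)."""
    graph = {}
    for (u, v), (km, kmh) in segments.items():
        minutes = km / kmh * 60.0
        graph.setdefault(u, []).append((v, minutes))
        graph.setdefault(v, []).append((u, minutes))
    return graph


def fastest_route(graph, source, target):
    """Dijkstra's algorithm; returns (total minutes, list of junctions)."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == target:
            return minutes, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, cost in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (minutes + cost, neighbour, path + [neighbour]))
    return float("inf"), []


minutes, route = fastest_route(build_graph(SEGMENTS), "A", "D")
```

With these figures the direct segment A to B takes 12 minutes because of congestion, so the detour through C is faster overall; updating the speeds from road sensors changes the edge weights and hence the recommended route.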

IoT is associated with smart things and hence with smart cities. An architecture based on that of IoT supports smart city applications. It contains the following layers: technologies, middleware, management and services [43].

2.6.3 Healthcare

Cyber-physical systems play a prominent role in medical and healthcare systems. Wireless sensor networks and dedicated medical sensors for blood pressure, EEG, oxygen saturation, heart rate, magnetic field, temperature, etc. have been developed along with computation techniques for healthcare applications of CPS. More people now depend on smart health devices to track their daily activities; smartwatches are the most commonly used wearables and track heart rate, step count and physical activities such as running, jogging, swimming and sports. They also monitor sleep patterns and water intake. Various biomedical sensors are designed for specific purposes to monitor a patient’s condition and daily progress.

In a Medical Cyber-Physical System (MCPS), heterogeneous data from different kinds of medical sensors and other medical devices is analyzed and shared seamlessly so that appropriate and accurate decisions can be made [44]. The cyber-physical system is integrated with a Wireless Body Area Network (WBAN), in which patients use wearable devices. These wearable devices offer local action and data collection. They can be applied to elderly people who require constant care and to people with mild cognitive impairment or disabilities [45].

2.6.4 Smart Transportation

For transportation to be smart or intelligent, it must be connected to the internet. Vehicles and other objects at different locations should be interconnected for data communication and information exchange, and they need to be connected to the cloud. Smart transportation can be achieved by incorporating the CPS mechanisms of communication, computation and control and by combining the cyber world with the real world. It relies on technological advancements such as the growing number of sensors and embedded systems.

A Vehicular Cyber-Physical System (VCPS) uses a reinforcement approach to deal with smart transportation. It treats the whole challenge of internet-connected transportation as a game in which a Nash Equilibrium (NE) balances the problem, making the solution faster and better; the past behavior and mistakes of other players are taken as input for tackling the present situation. The information about the past is accessed from the cloud [46]. Social media information such as local traffic information, drivers’ condition, other vehicle parameters and information about the surrounding infrastructure is uploaded to the cloud. Big data analysis processes the data in the cloud, and relevant information, such as the predicted destination or the driving skill of the driver, is passed to drivers through an interface, which benefits both the customer and the driver [47]. Artificial Intelligence (AI) in real-time applications, control and computing lets embedded Cyber-Physical Vehicle Systems (CPVSs) sustain themselves and overcome these challenges. Optimization in CPVS is done only at design and run time, considering the co-optimization of the cyber and physical systems and the response of both. The patterns over time, the feedback and the control of the cyber-physical vehicle system are observed to improve intelligent transportation [48].

Smart transportation supports progress in autonomous vehicles, real-time communication between vehicles, robotic transportation and aerospace applications, and helps address other transportation challenges.

2.7 Security and Privacy

The physical devices gather the raw data and send it to the cyber part, where the processing takes place. During transmission, the data is exposed to security threats. Because the data is stored in cloud networks in real time, it is crucial to protect the system from internal and external attacks. Data storage, access, processing and analysis all need security, as they are exposed to cyber threats and attacks.

Privacy invasion and malicious attacks are possible in the cloud, where the continuous stream of data from CPS applications is stored. The cloud operators and the third parties providing cloud services may have access to the data. Security for cloud services must be revised and strengthened, and inspections must be frequent, so that important data from businesses, industries or government departments stays protected and the cloud’s credibility does not decrease. Protecting the security and privacy of cloud data is of utmost importance. A file to be stored in the cloud can be broken into several parts that are stored at different locations on the cloud’s distributed servers; the cloud operators then do not have access to the entire file, which improves the security of the stored data and keeps the information private [49]. Big data is complex to deal with: it is distributed and has many characteristics, and big data models also process data into information and predict situations and outcomes from past and present data. All of this data is processed and kept in the cloud, where cryptographic techniques can be used to overcome security and privacy issues. Data can be encrypted when it is sent to the cloud and decrypted when it is retrieved for use [50].
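The file-splitting idea can be illustrated with a short sketch in which a payload is cut into chunks that are spread over several storage servers, while the ordering information stays with the data owner. This is a simplified illustration and not the scheme of [49]: the server names, the round-robin placement and the in-memory dictionaries standing in for cloud servers are assumptions, and each chunk is still readable plaintext unless it is also encrypted before upload.

```python
def split_across_servers(data: bytes, servers: list, chunk_size: int):
    """Cut data into chunks and place them round-robin; return (stores, manifest)."""
    stores = {name: {} for name in servers}
    manifest = []  # ordered (server, chunk_id) pairs, kept by the data owner only
    for chunk_id, start in enumerate(range(0, len(data), chunk_size)):
        server = servers[chunk_id % len(servers)]
        stores[server][chunk_id] = data[start:start + chunk_size]
        manifest.append((server, chunk_id))
    return stores, manifest


def reassemble(stores, manifest) -> bytes:
    """Fetch the chunks in manifest order and join them back into the original data."""
    return b"".join(stores[server][chunk_id] for server, chunk_id in manifest)


# Hypothetical CPS log record distributed over three hypothetical servers.
payload = b"sensor-log: pump=ok; valve=open; temp=71.3C"
stores, manifest = split_across_servers(payload, ["s1", "s2", "s3"], chunk_size=8)
restored = reassemble(stores, manifest)
```

No single server in `stores` holds the complete payload, and only the holder of `manifest` knows how the pieces fit together.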

A secure big data analytics model provides strong trust in the cybersecurity and privacy of the data. Certain measures should be observed to provide this security, such as the following [51].

  • New security and privacy strategies should be developed for businesses, financial industries and government agencies.
  • A centralized data management infrastructure should be adopted, and frequent security checks on the analytics model should be carried out.
  • Network monitoring and alerts on suspicious activity must be implemented, and security must be ensured at all times for priority databases such as government and military databases.
  • Real-time stream data must be monitored, and anomaly detection in network traffic must be guaranteed.
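The last measure, anomaly detection on real-time stream data, can be sketched with a rolling statistical baseline. The example below is a minimal illustration rather than a technique taken from [51]: the window size, the three-sigma threshold and the sample traffic values are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(stream, window=10, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for index, value in enumerate(stream):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
                continue  # keep the outlier out of the baseline
        recent.append(value)


# Hypothetical packets-per-second samples with one burst at index 12.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 950, 100, 102]
alerts = list(detect_anomalies(traffic))
```

Because the detector is a generator, it can consume an unbounded stream and raise alerts as readings arrive instead of waiting for a complete batch.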

Different strategies that ensure advanced security for the stored data must be applied to the cloud to provide robustness, reliability and privacy to cyber-physical system applications, which generate a stream of dynamic and sensitive data.

2.8 Conclusion

Big data-enabled CPS technology benefits both big data processing and analysis and the technological advancement of the system. A cyber-physical system is the integration of a physical system with a cyber system, aided by communication, computation and control. To make faster and better control decisions, the BDA-enabled CPS architecture is used; it adapts to the characteristics of the system by including a communication layer. The chapter discussed the integration of big data analytics with cyber-physical systems, focusing on the basic characteristics of big data and some specialized features of cyber-physical systems, together with the issues and challenges of integrating big data and CPS and their control and management. Storage and communication in big data-enabled CPS, data processing in cloud and multi-cloud technologies, clustering methods in big data and CPS, and big data analytics such as data mining, real-time analytics and spatial–temporal analytics help overcome the computational challenges. CPS big data collection, storage and transmission were considered. Applications of big data-driven CPS such as manufacturing, smart grids, smart cities, healthcare and smart transportation were discussed from the cyber-physical system perspective. The data generated by cyber-physical systems through sensors, digital networks and physical devices for applications such as military, government, smart grids, manufacturing and aerospace is of high importance, so its security and privacy must be ensured. The cloud that holds this sensitive data must be strongly protected, along with the big data analytical techniques that process the data and store it in the cloud.

References

1. Broy, M., Engineering cyber-physical systems: Challenges and foundations. CSD&M, pp. 1–13, 2013.

2. Chen, M., Mao, S., Liu, Y., Big data: A survey. Mobile Netw Appl., 19, 2, 171–209, 2014.

3. Chan, J.O., An architecture for big data analytics. CIIMA, 13, 2, 1, 2013.

4. García-Gil, D., Luengo, J., García, S., Herrera, F., Enabling smart data: noise filtering in big data classification. Inf. Sci., 479, 135–152, 2019.

5. Luo, S., Liu, H., Qi, E., Big data analytics–enabled cyber-physical system: model and applications. Ind. Manage. Data. Syst., 119, 5, 1072–1088, 2019.

6. Hawkins, J., George, D., Niemasik, J., Sequence memory for prediction, inference and behaviour. Philos. Trans. R. Soc B., 364, 1521, 1203–1209, 2009.

7. Golfarelli, M. and Rizzi, S., A survey on temporal data warehousing. IJDWM., 5, 1, 1–17, 2009.

8. Zhang, L., Designing big data driven cyber physical systems based on AADL. IEEE SMCS., 3072–3077, 2014, October.

9. Schamai, W., Modelica modeling language (ModelicaML): A UML profile for Modelica, Linköping University Electronic Press, Linköping, 2009.

10. Jifeng, H., A clock-based framework for construction of hybrid systems. ICTAC, Springer, Berlin, Heidelberg, pp. 22–41, 2013, September.

11. Zhang, L., A framework to model big data driven complex cyber physical control systems, in: ICAC, pp. 283–288, IEEE, Cranfield, UK, 2014, September.

12. Hahanov, V., Gharibi, W., Litvinova, E., Chumachenko, S., Big data driven cyber analytic system. IEEE BigData Congress, 615–622, 2015, June.

13. Marini, A. and Bianchini, D., Big Data As A Service For Monitoring Cyber-Physical Production Systems. ECMS, pp. 579–586, 2016, May.

14. Niggemann, O., Biswas, G., Kinnebrew, J.S., Khorasgani, H., Volgmann, S., Bunte, A., Data-Driven Monitoring of Cyber-Physical Systems Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control. Proceedings of the 26th International Workshop on Principles of Diagnosis, 185–192, 2015, August. DX@ Safeprocess.

15. Liu, L., Zhao, S., Yu, Z., Dai, H., A big data inspired chaotic solution for fuzzy feedback linearization model in cyber-physical systems. Ad. Hoc. Netw., 35, 97–104, 2015.

16. Ray, I. and Ray, I., Proc. NSF Workshop Cyber-Phys. Syst., 1–5, 2009, July.

17. Zeydan, E., Bastug, E., Bennis, M., Kader, M.A., Karatepe, I.A., Er, A.S., Debbah, M., Big data caching for networking: Moving from cloud to edge. IEEE Commun. Mag., 54, 9, 36–42, 2016.

18. Zhao, H., Gai, K., Li, J., He, X., Novel differential schema for high performance big data telehealth systems using pre-cache. IEEE HPCC. CSS. ICESS, pp. 1412–1417, 2015, August.

19. Ji, M., Caire, G., Molisch, A.F., Wireless device-to-device caching networks: Basic principles and system performance. IEEE. J. Sel. Areas Commun., 34, 1, 176–189, 2015.

20. Huang, J., Wang, S., Cheng, X., Bi, J., Big data routing in D2D communications with cognitive radio capability. IEEE Wirel. Commun., 23, 4, 45–51, 2016.

21. Zhu, C., Wang, H., Liu, X., Shu, L., Yang, L.T., Leung, V.C., A novel sensory data processing framework to integrate sensor networks with mobile cloud. ISJ., 10, 3, 1125–1136, 2014.

22. Jabbar, S., Malik, K.R., Ahmad, M., Aldabbas, O., Asif, M., Khalid, S., Ahmed, S.H., A methodology of real-time data fusion for localized big data analytics. IEEE Access, 6, 24510–24520, 2018.

23. Wang, D. and Liu, J., Optimizing big data processing performance in the public cloud: opportunities and approaches. IEEE Netw., 29, 5, 31–35, 2015.

24. Destounis, A., Paschos, G.S., Koutsopoulos, I., Streaming big data meets backpressure in distributed network computation. IEEE INFOCOM, pp. 1–9, 2016, April.

25. Yang, C. and Chen, J., A scalable data chunk similarity based compression approach for efficient big sensing data processing on cloud. IEEE Trans. Knowl. Data Eng., 29, 6, 1144–1157, 2016.

26. Wu, C.Q. and Cao, H., Optimizing the performance of big data workflows in multi-cloud environments under budget constraint. IEEE SCC, pp. 138–145, 2016, June.

27. Pham, L.M., Tchana, A., Donsez, D., Zurczak, V., Gibello, P.Y., De Palma, N., An adaptable framework to deploy complex applications onto multi-cloud platforms. IEEE RIVF ICC – RIVF, pp. 169–174, 2015, January.

28. Li, X., Ma, H., Yao, W., Gui, X., Data-driven and feedback-enhanced trust computing pattern for large-scale multi-cloud collaborative services. IEEE Trans. Serv. Comput., 11, 4, 671–684, 2015.

29. Zhao, Y., Yang, L.T., Zhang, R., A tensor-based multiple clustering approach with its applications in automation systems. IEEE Trans. Ind. Informat., 14, 1, 283–291.

30. Shettar, R. and Purohit, B.V., A MapReduce framework to implement enhanced K-means algorithm. IEEE ICATCCT., 361–363, 2015, October.

31. Fu, J., Liu, Y., Zhang, Z., Xiong, F., Big data clustering based on summary statistics. IEEE CCITSA., 87–91, 2015, December.

32. Cao, J. and Li, H., Energy-efficient structuralized clustering for sensor-based cyber physical systems. 234–239, 2009, July.

33. Spezzano, G. and Vinci, A., Pattern detection in cyber-physical systems. Proc. Comput. Sci., 52, 1016–1021, 2015.

34. Chen, T.C., Sanga, S., Chou, T.Y., Cristini, V., Edgerton, M.E., Neural network with K-means clustering via PCA for gene expression profile analysis. IEEE WRI. CSIE., 3, 670–673, 2009, March.

35. Ding, G., Tan, Z., Wu, J., Zeng, J., Zhang, L., Indoor fingerprinting localization and tracking system using particle swarm optimization and Kalman filter. IEICE., 98, 3, 502–514, 2015.

36. Wang, L. and Wang, G., Big data in cyber-physical systems, digital manufacturing and industry 4.0. IJEM., 6, 4, 1–8, 2016.

37. Scheuermann, C., Verclas, S., Bruegge, B., Agile factory-an example of an industry 4.0 manufacturing process. IEEE CPSNA, pp. 43–47, 2015, August.

38. Thoben, K.D., Wiesner, S., Wuest, T., “Industrie 4.0” and smart manufacturing-a review of research issues and application examples. Int. J. Autom. Technol., 11, 1, 4–16, 2017.

39. Lee, J., Jin, C., Bagheri, B., Cyber physical systems for predictive production systems. J. Prod. Eng., 11, 2, 155–165, 2017.

40. Wang, K., Wang, Y., Hu, X., Sun, Y., Deng, D.J., Vinel, A., Zhang, Y., Wireless big data computing in smart grid. IEEE Wirel. Commun., 24, 2, 58–64, 2017.

41. Kumar, N., Zeadally, S., Misra, S.C., Mobile cloud networking for efficient energy management in smart grid cyber-physical systems. IEEE Wirel. Commun., 23, 5, 100–108, 2016.

42. Rathore, M.M., Ahmad, A., Paul, A., Jeon, G., Efficient graph-oriented smart transportation using internet of things generated big data. IEEE SITIS, pp. 512–519, 2015.

43. Moreno, M.V., Terroso-Sáenz, F., González-Vidal, A., Valdés-Vela, M., Skarmeta, A.F., Zamora, M.A., Chang, V., Applicability of big data techniques to smart cities deployments. IEEE Tran. Ind. Informat., 13, 2, 800–809, 2016.

44. Alhumud, M.A., Hossain, M.A., Masud, M., Perspective of health data interoperability on cloud-based medical cyber-physical systems. IEEE ICMEW, pp. 1–6, 2016, July.

45. De Venuto, D., Annese, V.F., Sangiovanni-Vincentelli, A.L., The ultimate IoT application: A cyber-physical system for ambient assisted living. IEEE ISCAS, pp. 2042–2045, 2016, May.

46. Kumar, N., Bali, R.S., Iqbal, R., Chilamkurti, N., Rho, S., Optimized clustering for data dissemination using stochastic coalition game in vehicular cyber-physical systems. J. Supercomput., 71, 9, 3258–3287, 2015.

47. Nawa, K., Chandrasiri, N.P., Yanagihara, T., Oguchi, K., Cyber physical system for vehicle application. Trans. Inst. Meas. Control., 36, 7, 898–905, 2014.

48. Bradley, J.M. and Atkins, E.M., Optimization and control of cyber-physical vehicle systems. Sensors, 15, 9, 23020–23049, 2015.

49. Gai, K., Qiu, M., Zhao, H., Security-aware efficient mass distributed storage approach for cloud systems in big data. IEEE BigDataSecurity HPSC. IDS, pp. 140–145, 2016, April.

50. Sekar, K. and Padmavathamma, M., Comparative study of encryption algorithm over big data in cloud systems. IEEE INDIACom, pp. 1571–1574, 2016, March.

51. Mahmood, T. and Afzal, U., Security analytics: Big data analytics for cybersecurity: A review of trends, techniques and tools. IEEE NCIA, pp. 129–134, 2013, December.

  1. *Corresponding author: [email protected]