Data Analytics and Machine Learning in the Cloud and in the Fog

The value of an IoT system does not lie in a single sensor event, or even in a million sensor events archived away. Much of the value of IoT lies in the interpretation of that data and the decisions made from it. While a world of billions of things connected and communicating with each other and the cloud is well and good, the value lies in what is in the data, what is not in the data, and what the patterns in the data tell us. These are the data science and data analytics portions of IoT, and probably the most valuable area for the customer.

Analytics for the IoT segment deals with three kinds of data:

  • Structured data (SQL storage), which has a predictable format.
  • Unstructured data (raw video data or signals), which has a high degree of randomness and variance.
  • Semi-structured data (Twitter feeds), which has some degree of variance and randomness in form.

Data may also need to be interpreted and analyzed in real time as a streaming dataflow, or it may be archived and retrieved later for deep analytics in the cloud. This is the data ingest phase. Depending on the use case, the data may need to be correlated with other sources in flight. In other cases, the data is simply logged and dumped to a data lake, such as one built on Hadoop.

Next comes some type of staging, in which a messaging system like Kafka routes data to a stream processor, a batch processor, or perhaps both. Stream processing handles a continuous stream of data. Processing is typically constrained and very fast, as the data is processed in memory. Processing must therefore be at least as fast as the rate at which data enters the system. Stream processing provides near real-time processing in the cloud, but it does not provide the hard real-time operating characteristics that applications such as industrial machinery and self-driving cars demand.
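As a rough illustration of constrained, in-memory stream processing, the sketch below keeps only a fixed-size window of recent readings and updates a running mean as each event arrives. The function name `windowed_mean`, the window size, and the sample readings are assumptions made for this example; a production system would typically rely on a dedicated stream processing framework fed by a message broker.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def windowed_mean(readings: Iterable[float], window: int) -> Iterator[Tuple[float, float]]:
    """Yield (reading, mean of the last `window` readings) for each event."""
    recent = deque(maxlen=window)  # bounded in-memory state, oldest value evicted first
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # subtract the value that append() is about to evict
        recent.append(value)
        total += value
        yield value, total / len(recent)


if __name__ == "__main__":
    # Hypothetical temperature readings arriving one at a time.
    for reading, mean in windowed_mean([20.0, 22.0, 24.0, 30.0], window=3):
        print(reading, mean)
```

Because the state never grows beyond the window size, the work per event is constant, which is one way a stream processor can keep pace with the rate of incoming data.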

Batch processing, on the other hand, is efficient at handling high-volume data. It is particularly useful when IoT data needs to be correlated against historical data.
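One way to picture correlating a batch against historical data is the sketch below, which compares each device's mean reading in a new batch with that device's archived readings and flags large deviations. The function name `flag_deviations`, the device names, and the z-score limit of 3.0 are all assumptions chosen for illustration, not a prescribed method.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_deviations(
    history: Dict[str, List[float]],
    batch: Dict[str, List[float]],
    z_limit: float = 3.0,  # assumed threshold, tune per use case
) -> Dict[str, float]:
    """Return {device: z-score} for devices whose batch mean departs from history."""
    flagged = {}
    for device, past in history.items():
        current = batch.get(device)
        if not current or len(past) < 2:
            continue  # nothing to compare
        spread = pstdev(past)
        if spread == 0:
            continue  # constant history, z-score undefined
        z = (mean(current) - mean(past)) / spread
        if abs(z) > z_limit:
            flagged[device] = z
    return flagged


if __name__ == "__main__":
    history = {"pump-1": [10.0, 10.0, 12.0, 12.0], "pump-2": [5.0, 5.0, 7.0, 7.0]}
    batch = {"pump-1": [15.0, 15.0], "pump-2": [6.0, 6.0]}
    print(flag_deviations(history, batch))
```

Unlike the streaming case, this job reads the full history for each device, which is acceptable for a scheduled batch run but would be too slow to execute on every incoming event.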

After this phase, there may be a prediction and response phase, in which information is presented on some form of dashboard or logged, or the system responds back to the edge device so that corrective action can be applied to resolve an issue.
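The response phase can be sketched as a simple decision step that maps a prediction to one or more of the outcomes just described. The `Prediction` type, the action names, and both thresholds are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    device_id: str
    failure_probability: float  # hypothetical model output in [0, 1]


def respond(prediction: Prediction, alert_at: float = 0.5, act_at: float = 0.9) -> List[str]:
    """Choose responses: always log, alert the dashboard, or command the edge device."""
    actions = ["log"]
    if prediction.failure_probability >= alert_at:
        actions.append("dashboard_alert")
    if prediction.failure_probability >= act_at:
        actions.append("command_edge_device")  # corrective action sent back to the edge
    return actions


if __name__ == "__main__":
    print(respond(Prediction("pump-1", 0.95)))
```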

This chapter discusses various data analysis models, from complex event processing to machine learning. Several use cases are presented to help generalize where one model works and where others may fail.
