Deploying analytics

Although analytics should be agnostic about how data is fed to the platform, we have to consider several potential pitfalls that can affect their efficiency. There are several strategies we can use to feed I-IoT data to the platform:

  • Bulk ingestion, for example, one file daily
  • Small portions, for example, one file every five minutes
  • Data streams, where data is fed continuously with low latency

Data is also affected by several issues:

  • It might arrive in the wrong order. For example, a data point timestamped 18:00 might be sent at 18:10, while a data point timestamped 17:59 might be sent at 18:11.
  • It might be of bad quality.
  • It might have holes (missing samples) in it.
  • It might have anomalous spikes in it.
  • It might be frozen, meaning the value stays suspiciously flat for a long time.

These issues are illustrated in the following diagram:

Data being affected by issues
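
Two of these issues, out-of-order arrival and frozen values, lend themselves to simple detection logic. The following is a minimal sketch in Python, assuming samples arrive one at a time as (timestamp, value) pairs; the IssueDetector class and its freeze_limit threshold are illustrative, not part of any specific platform:

```python
from typing import List, Optional

class IssueDetector:
    """Flags out-of-order and frozen samples in an incoming time series."""

    def __init__(self, freeze_limit: int = 12):
        # Number of consecutive identical values before the signal is
        # considered frozen; purely an illustrative threshold.
        self.freeze_limit = freeze_limit
        self._latest_ts: Optional[float] = None
        self._last_value: Optional[float] = None
        self._repeat_count = 0

    def check(self, ts: float, value: float) -> List[str]:
        issues = []
        # Out of order: this timestamp is older than the latest one seen.
        if self._latest_ts is not None and ts < self._latest_ts:
            issues.append("out_of_order")
        # Frozen: the same value has repeated for a suspiciously long run.
        if self._last_value is not None and value == self._last_value:
            self._repeat_count += 1
            if self._repeat_count >= self.freeze_limit:
                issues.append("frozen")
        else:
            self._repeat_count = 0
        self._latest_ts = ts if self._latest_ts is None else max(ts, self._latest_ts)
        self._last_value = value
        return issues

# The 17:59 point arriving after the 18:00 point is flagged as out of order.
detector = IssueDetector()
detector.check(18 * 3600, 42.0)
print(detector.check(17 * 3600 + 59 * 60, 41.8))  # ['out_of_order']
```

Bad quality and anomalous spikes usually require domain-specific checks, so they are left out of this sketch.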

Data might also be delayed for a long period of time. We know this from personal experience in the oil and gas industry: one particular customer reactivated their connection to the cloud after six months of being disconnected, and the data from their sensors filled the data lake in three days. Unfortunately, the analytics processed the data from the entire period and detected a whole series of anomalies and alerts. These alerts were not useful at all because they dated from the time in which the customer was disconnected, so the operations center was flooded with junk alerts.

To build a real I-IoT platform, we have to develop our architecture with sufficient robustness to address these issues. For example, we can adopt a timeout for data that arrives too late, or we can pre-process the data and mark any points that are in the wrong order or frozen. Moreover, we can interpolate the data before feeding it to the analytics to avoid holes.
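
As an illustration, here is a minimal pre-processing sketch using pandas. It assumes each point carries both a measurement timestamp (ts) and an arrival timestamp (received); the column names, the six-hour timeout, and the five-minute interpolation limit are all assumptions for the example, not values prescribed by any platform:

```python
import pandas as pd

def preprocess(df: pd.DataFrame,
               timeout: pd.Timedelta = pd.Timedelta(hours=6),
               max_gap_minutes: int = 5) -> pd.DataFrame:
    """Drops data that arrived too late and interpolates small holes."""
    df = df.sort_values("ts").set_index("ts")
    # Timeout: discard points that reached the platform too long after
    # they were measured.
    on_time = df[(df["received"] - df.index) <= timeout]
    # Holes: resample onto a regular one-minute grid, then linearly
    # interpolate gaps of up to max_gap_minutes; larger holes stay NaN
    # so that downstream analytics can see them.
    regular = on_time["value"].resample("1min").mean()
    filled = regular.interpolate(method="linear", limit=max_gap_minutes)
    return filled.to_frame("value")
```

Note that preprocess returns a new, analytics-ready series; as discussed next, the stored raw data should be left untouched.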

We should avoid manipulating the raw data during pre-processing. The best approach is to simply mark the data points or events so that we can restore the raw data if there are errors.
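
One way to follow this advice is to keep the raw value immutable and attach quality flags next to it. The schema below is a hypothetical sketch; real historians and platforms define their own quality codes:

```python
from dataclasses import dataclass
from enum import Flag, auto

class Quality(Flag):
    """Hypothetical quality flags; combine them with the | operator."""
    GOOD = 0
    OUT_OF_ORDER = auto()
    FROZEN = auto()
    SPIKE = auto()
    INTERPOLATED = auto()

@dataclass
class Sample:
    ts: float            # measurement timestamp (epoch seconds)
    raw_value: float     # never modified by pre-processing
    quality: Quality = Quality.GOOD

# Pre-processing only adds flags; restoring the raw series simply means
# reading raw_value and ignoring the flags.
s = Sample(ts=1_500_000_000.0, raw_value=21.5)
s.quality |= Quality.FROZEN
print(s.raw_value, s.quality)  # 21.5 Quality.FROZEN
```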

In the previous section, we learned about the technologies required to build analytics. Now we need to deploy them, assuming that the infrastructure supports our use case. We then need to define how to trigger the analytics. There are three methods we can use to do this:

  • Stream analytics
  • Micro-batch analytics
  • Condition-based analytics
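
Before looking at each method, the schematic sketch below shows how they differ: what changes is the event that fires the analytic. All function names here are placeholders, not APIs of any particular platform:

```python
from typing import List

def run_analytics(samples: List[float]) -> None:
    """Placeholder for the real analytic."""
    print(f"analytics invoked on {len(samples)} sample(s)")

# Stream analytics: the analytic fires on every incoming data point.
def on_new_sample(value: float) -> None:
    run_analytics([value])

# Micro-batch analytics: points accumulate in a buffer and the analytic
# fires on a fixed schedule (for example, every five minutes).
buffer: List[float] = []
def on_schedule() -> None:
    run_analytics(list(buffer))
    buffer.clear()

# Condition-based analytics: the analytic fires only when the data
# meets a condition, such as a threshold crossing.
def on_condition(value: float, threshold: float = 100.0) -> None:
    if value > threshold:
        run_analytics([value])
```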