Chapter 8
IN THIS CHAPTER
Grasping IoT vocabulary and technology components
Seeing how data science supports the IoT
Grasping the powerful combination of AI and IoT
The Internet of things (IoT) is a network of connected devices that use the Internet to communicate with each other. That sounds sort of scary, right? It brings to mind the movie Her, where machine-to-machine communications allow machines to begin thinking and acting autonomously. But actually, IoT represents the next level of insight, efficiency, and virtual assistance: the stuff we modern humans love and crave.
The rise of IoT has been facilitated by three major factors:
The good news for data scientists is that data science is at the root of each of these three factors, making the IoT an ideal area for data scientists to develop expertise.
Just like data science, IoT itself is not the endgame. What’s most inspiring and impressive about IoT is how it’s deployed within different vertical markets — niche areas of commercial and industrial application (for example, manufacturing, oil and gas, retail, banking, and so on). For some examples, consider the following types of emerging technologies:
Read on to learn how IoT works, the technologies that support it, and the advancements it promises to foster.
The Internet of things is its own class of technology. It has its own vocabulary and its own set of underlying technologies. Before getting into IoT data science, take a moment to familiarize yourself with them in the next four sections.
Before delving into the data science and innovation related to IoT, you need a grasp of the fundamental vocabulary. The fog (or IoT cloud) is a network of cloud services that connect to IoT-enabled devices. These IoT cloud services handle the big data processing and analytics that the IoT requires, and they use that processing to support intelligent, adaptive, and autonomous device operations.
Edge devices are the IoT-enabled devices that are connected to the IoT cloud. Besides being connected to the fog, these devices all share one other trait: They generate data through any number of appliances, including sensors, odometers, cameras, contact sensors, pressure sensors, laser scanners, thermometers, smoke detectors, microphones, electric meters, gas meters, water flow meters, and much more.
The good news is that not all the data produced on edge devices needs to be moved to the cloud for processing, storage, and analytics. In fact, most edge devices come equipped with device-embedded applications that can process the data created by device appliances and derive insights from it locally, in real time. Local data processing and analytics deployment is called edge processing, and it helps save resources by
Whether processing happens locally or in the cloud, IoT analytic applications that implement adaptive machine learning algorithms are called adaptive IoT applications. These adaptive IoT applications enable devices to adjust and adapt to the local conditions in which they operate. Later in this chapter, you can see an overview of popular machine learning methods for data science in IoT. Figure 8-1 illustrates some of these components to help pull them together into a conceptual schematic.
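To make edge processing and adaptivity concrete, here is a minimal Python sketch of a device-embedded routine, assuming a single numeric sensor reading per time step. It keeps an exponentially weighted moving mean and variance, so its notion of "normal" adapts to local conditions, and it flags readings that stray more than `k` standard deviations from that baseline. The class name `AdaptiveEdgeDetector` and the parameters `alpha`, `k`, and `warmup` are hypothetical choices for illustration, not part of any IoT platform.

```python
import math


class AdaptiveEdgeDetector:
    """Hypothetical edge routine (not a real platform API): learns the local
    baseline of one sensor and flags readings that deviate sharply from it."""

    def __init__(self, alpha=0.1, k=4.0, warmup=5):
        self.alpha = alpha    # how quickly the baseline adapts to new conditions
        self.k = k            # flag when |reading - mean| > k * std
        self.warmup = warmup  # readings to observe before any flag is raised
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        """Process one reading locally; return True if it looks anomalous."""
        self.count += 1
        if self.count == 1:
            self.mean = x  # first reading seeds the baseline
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        anomalous = self.count > self.warmup and abs(diff) > self.k * std
        # Exponentially weighted updates of mean and variance track slow drift.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


readings = [20.0, 20.2, 19.9, 20.1, 20.0, 20.1, 19.8, 35.0, 20.0]
detector = AdaptiveEdgeDetector()
flagged = [i for i, r in enumerate(readings) if detector.update(r)]
print(flagged)  # → [7], the 35.0 spike
```

Only flagged readings (or periodic summaries) would need to leave the device, which is the kind of resource saving edge processing aims for. A smaller `alpha` makes the baseline adapt more slowly, so a single outlier drags it around less.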
As with most other things related to the IoT, IoT professionals are a breed of their own. IoT cloud application developers are data scientists and engineers who focus exclusively on building adaptive IoT applications for deployment on local devices. The more general IoT developer, on the other hand, is responsible for building products and systems that serve the greater needs of the IoT cloud at large, including all its connected IoT devices, data sources, and cloud computing environments.
IoT platforms fall into two categories: hardware platforms and software platforms. IoT hardware platforms are hardware components that you can use to connect devices to the IoT cloud, to stream data, and to manage device operations locally. Each platform offers its own set of core features, so you'll need to research which one meets your specific needs; popular IoT hardware platforms include Raspberry Pi, Intel Edison, and Arduino products. IoT software platforms offer services such as device management, integration, security, data collection protocols, analytics, and limited data visualization. Again, each solution offers its own unique blend of features, so do the research; major offerings include the AWS IoT platform and IBM IoT Foundation Device Cloud.
Spark is an ideal framework for integrated real-time big data processing and analysis. With respect to the IoT, each IoT sensor stream can be transformed into a Spark DStream (discretized stream), the fundamental data abstraction in the Spark Streaming module, which is the part of Spark that processes live data streams. After your data is in DStreams, it's fairly simple to build automated analytical operations that filter, process, and detect patterns based on DStream content. Depending on what's detected, real-time notifications and alerts about mission-critical insights are issued back to IoT applications. You can apply Spark Streaming window operations to DStream sources to aggregate processing and alerting over any regular time interval you choose. Lastly, for comparative analytics, you can use Spark's Resilient Distributed Datasets (RDDs), immutable collections of objects that serve as Spark's fundamental data structure, to store relevant historical datasets in memory.
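Spark Streaming window operations aggregate over a sliding window defined by a window length and a slide interval. The sketch below does not use Spark at all; it is a plain-Python simulation of that windowing idea, assuming hypothetical `(timestamp_seconds, sensor_id, value)` tuples, a 30-second window, a 10-second slide, and an alert threshold of 50.0, so the logic can be inspected without a cluster. The function name `windowed_max` is made up for illustration.

```python
def windowed_max(events, window=30, slide=10, threshold=50.0):
    """Simulate sliding-window aggregation over sensor events.

    events: list of (timestamp_seconds, sensor_id, value) tuples (hypothetical format).
    Each window covers the half-open interval (end - window, end], with window
    ends at slide, 2 * slide, ... until the last event is covered.
    Returns a list of (window_end, {sensor_id: max_value}, alerted_sensor_ids).
    """
    if slide <= 0 or window <= 0:
        raise ValueError("window and slide must be positive")
    results = []
    if not events:
        return results
    last_ts = max(ts for ts, _, _ in events)
    end = slide
    while True:
        maxima = {}
        for ts, sensor, value in events:
            if end - window < ts <= end:
                maxima[sensor] = max(value, maxima.get(sensor, value))
        # Alert on any sensor whose windowed maximum exceeds the threshold.
        alerts = sorted(s for s, v in maxima.items() if v > threshold)
        results.append((end, maxima, alerts))
        if end >= last_ts:
            break
        end += slide
    return results


events = [(5, "a", 40.0), (12, "a", 55.0), (18, "b", 30.0), (27, "b", 45.0)]
for end, maxima, alerts in windowed_max(events):
    print(end, maxima, alerts)
```

In actual Spark Streaming code, a per-key maximum like this would typically be expressed with `reduceByKeyAndWindow` (or `window` followed by a reduction) on a keyed DStream, where the window and slide durations must be multiples of the batch interval.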
Major IoT advancements are being made in contextual awareness, where sensors generate data that the same device can use to deliver real-time, context-aware services. This context awareness is facilitated by sensor fusion, in which a microcontroller fuses data from several different sensors to produce a broader, more detailed view of what's happening in a local environment. Vendors and technologies that support sensor fusion include EM Microelectronic, NXP, and even Apache Flink.
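One simple, well-known form of sensor fusion is inverse-variance weighting: when several sensors measure the same quantity with independent noise, weighting each reading by 1/variance yields a combined estimate whose variance is lower than that of any single sensor. The sketch below assumes you already know each sensor's noise variance; fusion on real microcontrollers (for example, Kalman filtering of accelerometer and gyroscope data) is considerably more involved. The function name `fuse` and the sample numbers are hypothetical.

```python
def fuse(readings):
    """Inverse-variance weighted fusion of independent measurements.

    readings: list of (value, variance) pairs for the same quantity,
    one pair per sensor. Returns (fused_value, fused_variance).
    """
    if not readings:
        raise ValueError("need at least one reading")
    if any(var <= 0 for _, var in readings):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    # The fused variance is never larger than the smallest input variance.
    return value, 1.0 / total


# Two thermometers in the same room: one precise, one noisy (hypothetical numbers).
print(fuse([(21.4, 0.25), (22.0, 1.0)]))
```

The more precise thermometer dominates the result, which is the intended behavior: noisier sensors still contribute, but less.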
If you want to build predictive IoT models and applications, you need to know Python and SQL, covered in Chapter 14 and Chapter 16, respectively. You can use Python for data wrangling, visualization, time series analysis, and machine learning. Knowing SQL is useful for querying data from traditional databases or from the Hadoop Distributed File System. (I tell you more about this topic in Chapter 2.) Read on to learn more about specific analytical methods as they relate to IoT data science.
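As a small illustration of pairing SQL with Python, the sketch below uses Python's built-in `sqlite3` module and an in-memory table of hypothetical sensor readings to compute an hourly average per device, the kind of query that often precedes time series modeling. The table name `readings`, its columns, and the function `hourly_averages` are made up for this example; a production IoT pipeline would query a real database or a SQL-on-Hadoop engine instead.

```python
import sqlite3


def hourly_averages(rows):
    """rows: iterable of (device_id, unix_seconds, value) tuples.

    Returns a list of (device_id, hour_index, avg_value, reading_count).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id TEXT, ts INTEGER, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    # Integer division of INTEGER columns in SQLite buckets timestamps by hour.
    query = """
        SELECT device_id, ts / 3600 AS hour, AVG(value), COUNT(*)
        FROM readings
        GROUP BY device_id, ts / 3600
        ORDER BY device_id, hour
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [("d1", 0, 10.0), ("d1", 1800, 20.0), ("d1", 3600, 30.0), ("d2", 100, 5.0)]
for row in hourly_averages(sample):
    print(row)
```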
Most IoT sensor data takes the form of time series, so you should be adept at building and using time series models. One way time series models are useful in the IoT is in decreasing the data transmission overhead of a wireless sensor network. (You'll understand why after you read the following list.) These two time series models are important for IoT data science:
Just as sensor nodes create data labeled with a timestamp, they also produce data labeled with a geospatial location stamp. Each observation occurs at a given time and place, so location is a big deal in the IoT. Many IoT applications consider an edge device's location, and its nearness to other connected devices. All of this requires multidimensional geospatial data processing and analytics capabilities, which is exactly what a GIS (geographic information system) application is designed to offer. GIS, coupled with IoT network and data technologies, facilitates real-time geo-space-time analytics, delivering geo-insights at the right time and place, precisely when they're actionable. Real-time geospatial analytics generate serious, sometimes life-saving, results when you use them to do things like this:
Deep learning is an exciting development within IoT because it enables adaptive, autonomous operation of the machine network. As you may recall from Chapter 4, deep learning is a machine learning method that uses layers of hierarchical neural networks to learn from data in an iterative and adaptive way. Similar to how the moving average and ARIMA models update on their own, deep learning models can adjust to and learn from data, despite changes and irregularities in incoming sensor data.
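The self-updating moving average models mentioned above also show how time series models can cut transmission overhead in a wireless sensor network. In one common approach, often called dual prediction, the sensor node and the receiving gateway run the same simple forecaster; the node transmits a reading only when it differs from the shared forecast by more than a tolerance, and otherwise the gateway substitutes the forecast. The sketch below assumes a simple moving average of the last `n` values both sides know (a forecaster, not the MA term of an ARIMA model); the function name `dual_prediction` and the parameters `n` and `tolerance` are hypothetical.

```python
from collections import deque


def dual_prediction(readings, n=3, tolerance=0.5):
    """Simulate a node that transmits only when the shared moving-average
    forecast misses the actual reading by more than `tolerance`.

    Returns (transmitted, reconstructed): the (index, value) pairs actually
    sent, and the series the gateway reconstructs.
    """
    history = deque(maxlen=n)  # values known to BOTH the node and the gateway
    transmitted, reconstructed = [], []
    for i, x in enumerate(readings):
        if len(history) < n:
            send = True  # not enough shared history yet: always transmit
        else:
            forecast = sum(history) / n
            send = abs(x - forecast) > tolerance
        value = x if send else forecast
        if send:
            transmitted.append((i, x))
        # Both sides append the same value, so their histories stay in sync.
        history.append(value)
        reconstructed.append(value)
    return transmitted, reconstructed


readings = [20.0, 20.1, 20.2, 20.1, 20.3, 25.0, 20.2]
sent, gateway_view = dual_prediction(readings)
print(f"sent {len(sent)} of {len(readings)} readings")
```

Every reconstructed value stays within `tolerance` of the true reading by construction, so the tolerance directly trades accuracy for radio traffic.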
I’ve listed some of the requirements that a deep learning model will face when deployed in the IoT environment:
To understand artificial intelligence and its place in the IoT, you first need to grasp some key differences between the terms artificial intelligence, machine learning, and IoT. The term artificial intelligence (AI) refers to built systems that mimic human behavior by making insightful decisions derived from the outputs of artificial neural network models. Many AI technologies implement deep learning or reinforcement learning, but traditionally, the driving intelligence behind AI was artificial neural networks. As I explain in Chapter 4, neural nets are one type of machine learning method among many. So, to be clear, machine learning is not AI, but it encompasses a few methods that drive the decisions made by AI technologies. In itself, machine learning is simply the practice of applying algorithmic models to data to discover hidden patterns or trends that you can use to make predictions.
The IoT is a network of connected, smart devices, many of which depend on output from machine learning models to direct and manage device operations. In this sense, some IoT devices are considered a form of artificial intelligence. But not all devices connected to the IoT are AI technologies. Some connected devices are managed by traditional control systems that don't include machine learning or advanced analytics, such as SCADA (Supervisory Control and Data Acquisition) systems. These are still IoT devices, but they aren't considered AI-driven technologies.
Artificial intelligence has been around for a while (since the 1940s, in fact). Some of the more recent AI-driven innovations include these objects:
IoT is ushering in its own breed of AI advancements, though. One type of innovation that's already available is the smart home. To understand how IoT combines with AI to produce a smart home, imagine that it's summertime and very hot outside. You always turn off your air conditioning when you leave for work, so when you get home at 5 p.m., it takes a long time to cool the house. With IoT and AI advancements, you can connect your phone's GPS, an outdoor temperature sensor, and the air conditioner. The network can learn which features indicate your impending arrival (like departure from work, time of departure, and direction of travel) to predict that you'll arrive by a certain time. The network can use the outdoor temperature reading to learn how long the air conditioner should run, and at what temperature, to bring the room down to the temperature setting you've selected. So, when you arrive home, your house is at the perfect temperature without you having to wait or turn the systems on or off. The connected systems act autonomously, based on what they've learned from the various connected devices and on the parameters you set for them.
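The pre-cooling decision in this scenario can be sketched as one learned quantity plus simple arithmetic. The sketch below assumes, hypothetically, that the system has logged past runs as (outdoor temperature, observed cooling rate in degrees per minute) pairs, fits a least-squares line to predict the cooling rate from the outdoor temperature, and then works backward from a predicted arrival time. The arrival prediction itself (from GPS, departure time, and direction of travel) is taken as given, and every name and number here is illustrative.

```python
from datetime import datetime, timedelta


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need history at more than one outdoor temperature")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def minutes_before_arrival(history, outdoor_temp, indoor_temp, setpoint):
    """Minutes before arrival that the AC should start (hypothetical model)."""
    a, b = fit_line([t for t, _ in history], [rate for _, rate in history])
    rate = a + b * outdoor_temp  # predicted degrees of cooling per minute
    if rate <= 0:
        raise ValueError("model predicts no cooling at this outdoor temperature")
    return max(0.0, indoor_temp - setpoint) / rate


# Hypothetical logged runs: hotter afternoons mean slower cooling.
history = [(25.0, 0.30), (30.0, 0.20), (35.0, 0.10)]
lead = minutes_before_arrival(history, outdoor_temp=32.0, indoor_temp=30.0, setpoint=22.0)
arrival = datetime(2024, 7, 1, 17, 0)  # predicted arrival at 5 p.m.
start = arrival - timedelta(minutes=round(lead))
print(f"Start cooling about {round(lead)} minutes early, at {start:%H:%M}")
```

A straight-line model is the simplest possible choice here; a real system would likely learn from more features (insulation, sun exposure, humidity) and refit as new runs are logged.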