Distributed systems – single points of failure

Designing distributed systems also means taking into account link defects, unreachable gateways, and other failures. Embedded devices should not stop working when disconnected from the internet, but rather offer fall-back mechanisms, based on local gateways. Consider, for example, a demotic IoT system for controlling all the heating and cooling units in a house, accessible from portable devices and coordinated remotely using any network access. Temperature sensors, heaters, and coolers are controlled by a mesh network of embedded devices, while the central control is on remote cloud servers. The system can control the actuators remotely, based on user settings and sensor readings. This gives us the possibility of accessing the service even from a remote location, allowing the user to tune the system to set the desired temperature in each room, based on the commands sent from user interfaces, which are processed and relayed by the cloud to reach their destination in the embedded devices. As long as all the components are connected to the internet, the IoT system works as expected.

Nevertheless, in the case of connection failure, users would not be able to control the system or activate any function. Terminating the application service on a local device within the local area network ensures the continuity of the services across failures of the link to the internet and any issues that would prevent the local network from accessing the remote cloud device. If such a mechanism is in place, a system disconnected from the internet would still provide a failover alternative to access sensors and actuators, assuming that all the actors in play are connected to a common LAN. Moreover, having a local system processing and relaying settings and commands reduces the latency of the actions requested, because requests do not have to travel across the internet to be processed and forwarded back to the same network. Designing reliable IoT networks must include a careful assessment of the single points of failure among all the links and devices used to provide services, and this must include the backbone link used to reach services, message brokers, and remote devices that can cause malfunctions or other issues to the entire system.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset