9

Distributed Systems and IoT Architecture

By accessing communication peripherals, such as network controllers and radio interfaces, microcontrollers are able to establish data communication with nearby devices and even with remote servers through the internet.

A set of embedded targets connected together and interacting with each other can be seen as a self-contained distributed system. Homogeneous machine-to-machine communication can be implemented using non-standard, and even proprietary, protocols.

Depending on the set of standard protocols it implements, an embedded system may be able to communicate with heterogeneous, remote systems. Implementing protocols that are standardized or widely supported makes it possible to interact with gateways in the same geographic area, and with remote cloud servers across the internet.

The connectivity range of small, embedded devices may include remote coordination using Information Technology (IT) systems. The encounter between the two worlds has changed the modern interpretation of distributed systems: low-power, inexpensive devices can now be part of services with solid roots in IT, which, in turn, can extend their branches into localized and specialized sensors and actuators, creating what has been known as the Internet of Things (IoT).

This technological step, considered revolutionary by many, is capable of changing the way we access technology, and human-to-machine interaction processes, forever. Unfortunately, the security aspects of IoT communication have too often been neglected, leading to unpleasant incidents, which may compromise the confidentiality and integrity of the data transmitted and permit attackers to take control of remote devices.

This chapter analyzes the telecommunication technologies and protocols that can be integrated into embedded targets, using them to better understand the design from the point of view of the whole embedded system, up to integration within IoT networks.

We will learn about the networking model, starting from the physical layer and the possible technologies for establishing wireless or wired links, up to tailored embedded applications that can establish secure communication with cloud services, using standard communication protocols.

In particular, we will look at the following:

  • Network interfaces
  • The Internet protocols
  • TLS
  • The application protocols

By the end of this chapter, you will have an in-depth understanding of the IoT capabilities of today’s microcontrollers.

Technical requirements

In this chapter, we assume that you are familiar with general concepts of modern computer networking, although no previous experience with distributed applications is required. For a more complete background on network programming, which is relevant to the content of this chapter, we suggest, as further reading, Hands-On Network Programming with C (L. Van Winkle, Packt Publishing, 2019). There are no specific examples provided in the book’s repository for this chapter. More complete examples of TCP and Transport Layer Security (TLS) client/server communication can be found in the source code distribution of the open source projects presented here.

Network interfaces

Embedded devices often integrate one or more communication interfaces. Many microcontrollers integrate the Media Access Control (MAC) portion of an Ethernet interface, so connecting a Physical Layer Transceiver (PHY) would enable LAN access. Some devices are coupled with radio transceivers, operating at fixed frequency ranges and implementing one or more protocols to communicate over wireless links. Frequently used frequencies for wireless communication are the 2.4 GHz band, in use by Bluetooth and 802.11 Wi-Fi, and some specific ISM ranges of frequency below 1 GHz, which depend on local regulations. Usable sub-GHz frequencies include the 868 MHz ISM band in the European Union and the 915 MHz ISM band in the US. Transceivers are usually designed to access the physical layer according to specific link protocols, regulating shared access to the physical media among two or more devices. While two interfaces accessing the same media can have different configurations, the MAC model implemented must follow the same specifications on all the endpoints in order to establish point-to-point communication. Part of the MAC layer may be implemented in the device itself, which, in turn, can use a parallel or a serial interface to transfer data to and from the microcontroller.

Hardware manufacturers may distribute the device drivers to access the link layer. When the full source code is made available, it is easier for a developer to customize the media access, integrate the device communication features, and tailor the communication to any protocol stack supported by the media. However, many device drivers are only partially open source, sometimes limiting the possibilities for integration with open standards. Moreover, integrating third-party proprietary code into an embedded system impacts the project maintenance and often requires workarounds for known issues or to enable features not foreseen by the manufacturer, and definitely impacts the security model of the system.

The implementation of device drivers in embedded systems, for either wired or wireless network interfaces, includes integrating the relevant access control mechanism in the communication logic and dealing with specific channel features. Some characteristics of the link may affect the design of higher-level communication, thus impacting the architecture of the entire distributed system. Alongside a reliable interaction with the MAC mechanisms, aspects such as bit rate, latency, and maximum packet size must be evaluated in the design phase to estimate the resources required based on the goals of the system.

The next section offers an overview of some popular network interfaces in the embedded world, typically used by connected devices to communicate with the other components of a broader distributed system. The subsequent section will suggest some criteria to navigate through the options for selecting the best technology for a specific purpose during the design of the communication infrastructures and protocols.

MAC

The most important components to establish successful communication links over any physical media are grouped in the MAC logic, the implementation of which is often a shared responsibility between the software and hardware. Different technologies have evolved to define standards to access the links that are used nowadays for machine-to-machine communication, while only a few can scale within the context of a geographically distributed IoT system without intermediate gateways performing protocol conversions.

Some of the standards are directly derived from the IT world and consist of adaptations of existing TCP/IP technologies capable of scaling down to fit within the limited resources available on embedded systems. Other standards have evolved entirely within the context of small, embedded devices, and interaction with the classic IT infrastructure is achieved through the modeling of TCP/IP protocols on top of low-power wireless technologies. In both cases, the research for convergence is dictated by the need for broader integration of small, inexpensive, self-powered devices into IoT services.

There is no such thing as a definitive one-size-fits-all solution to define network access for embedded systems. The differences in requirements across the embedded industry have encouraged the development of MAC protocols and technologies, both standardized and proprietary, each of them tailored to respond to the need for specific features or to a specific range of embedded systems.

In the following subsections, some of the most successful MAC technologies for machine-to-machine communication are described, taking into consideration the aspects related to the adoption of the technology and the modes of integration.

Ethernet

Even though it may sound a little impractical for contexts in which the size of the whole system is comparable to an RJ-45 connector, Ethernet is still the most reliable and fastest channel of communication available to integrate into embedded systems.

Many Cortex-M microcontrollers are equipped with one Ethernet MAC controller, which must be integrated with an external PHY. Other link-layer protocols implement the same mechanism for link-layer addressing, consisting of a 14-byte header attached to each packet transmitted, indicating the destination and source link addresses and the type of payload contained in the packet being transported. The MAC addresses are rewritten every time a packet is routed toward an Ethernet-like interface by the TCP/IP stack so that they match the next link that the packet must cross in its journey toward its final destination.
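The 14 bytes of link-layer addressing information described above can be sketched in C as follows. This is a minimal illustration and not code taken from any specific driver: the names eth_hdr and eth_parse are hypothetical, while the EtherType constants are the standard values for IPv4, ARP, and IPv6.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ETH_ADDR_LEN   6
#define ETH_HDR_LEN    14       /* 6 (destination) + 6 (source) + 2 (type) */
#define ETH_TYPE_IPV4  0x0800
#define ETH_TYPE_ARP   0x0806
#define ETH_TYPE_IPV6  0x86DD

/* Hypothetical in-memory representation of the parsed header */
struct eth_hdr {
    uint8_t  dst[ETH_ADDR_LEN];  /* destination link address */
    uint8_t  src[ETH_ADDR_LEN];  /* source link address */
    uint16_t type;               /* payload type, in host byte order */
};

/* Parse the header at the start of a received frame.
 * Returns 0 on success, -1 if the frame is too short. */
static int eth_parse(const uint8_t *frame, size_t len, struct eth_hdr *out)
{
    if (len < ETH_HDR_LEN)
        return -1;
    memcpy(out->dst, frame, ETH_ADDR_LEN);
    memcpy(out->src, frame + ETH_ADDR_LEN, ETH_ADDR_LEN);
    /* The type field is transmitted in network (big-endian) byte order */
    out->type = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

Reading the type field byte by byte avoids both alignment and endianness issues, which is why the sketch does not simply cast the buffer to a structure pointer.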

Device drivers can activate filters to discard all the traffic that does not involve the host, which would otherwise impact the amount of background data communication unnecessarily being processed by the TCP/IP stack.
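Filtering is normally configured in the registers of the MAC controller, which are device-specific. As a rough software equivalent, the following sketch shows the decision such a filter makes for each frame. The function name and the accept_multicast parameter are assumptions introduced for this example.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Decide whether a frame with the given destination address should be
 * passed to the TCP/IP stack. Hypothetical helper, for illustration only. */
static bool eth_frame_is_for_us(const uint8_t *dst, const uint8_t *own_mac,
                                bool accept_multicast)
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

    if (memcmp(dst, own_mac, 6) == 0)
        return true;                 /* unicast frame addressed to this host */
    if (memcmp(dst, bcast, 6) == 0)
        return true;                 /* broadcast, needed for example by ARP */
    if (accept_multicast && (dst[0] & 0x01))
        return true;                 /* group bit set: multicast frame */
    return false;                    /* traffic for other hosts: discard */
}
```

Multicast acceptance is left optional here because IPv6 neighbor discovery depends on it, while an IPv4-only node may be able to drop such frames.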

Wi-Fi

Among all the possibilities in the wireless universe, 802.11 Wi-Fi is chosen for its high-speed, low-latency channel, and for the widest possible topological compatibility, including with personal computers and mobile devices. However, the power requirement of a Wi-Fi transceiver can sometimes be difficult to afford for low-power devices. The complexity of protocols and mechanisms to regulate media access requires a considerable amount of controlling software, which is often distributed in binary form, and thus impossible to debug and maintain without the support of the manufacturers.

Wi-Fi provides large bandwidth and reasonably low latency and may implement authentication and encryption at the data-link level.

While it is technically possible to realize a local mesh network configuring the Wi-Fi transceivers to operate in an ad hoc mode, embedded systems equipped with 802.11 technology are mostly used to connect to existing infrastructures to interact with other portable devices and access the internet.

Several embedded low-cost platforms are available on the market, equipped with a TCP/IP stack and a built-in RTOS, which can be used as a standalone platform or integrated into complete systems to access wireless LAN, either as a station or to provide an access point.

Low-Rate Wireless Personal Area Networks (LR-WPANs)

Sensor mesh networks make extensive use of wireless technology to establish communication in a local geographical area. The 802.15.4 standard regulates the access to 2.4 GHz and sub-GHz frequencies to provide limited-range local area networks with a typical maximum bit rate of 250 kbps, which can be accessed using low-cost, low-power transceivers. The media access is not based on infrastructure and supports contention resolution and collision avoidance at the MAC level, using a beaconing system. Each node can be addressed using 2 bytes, and the special address of 0xFFFF is reserved for broadcast traffic to reach all the nodes in visibility. The maximum size of an 802.15.4 frame is fixed at 127 bytes, and thus it is not possible to encapsulate full-size IP packets routed from an Ethernet or a wireless LAN link. Network protocol implementations that are capable of communicating through 802.15.4 interfaces are either application-specific and do not support IP networking, or offer fragmentation and compression mechanisms to transmit and receive each packet across multiple wireless frames.

While 802.15.4 is not specifically designed for the IoT, and is not directly compatible with classic IP infrastructures, there are multiple choices available to build networks on top of it. In fact, while the standard specifies the MAC protocol for exchanging frames among nodes that are in visibility, multiple link-layer technologies, standard and non-standard, have been developed to define networks on top of 802.15.4.

LR-WPAN industrial link-layer extensions

Thanks to the flexibility of the transceivers, and the capability of transmitting and receiving 802.15.4 raw frames, it is relatively easy to implement networking protocols for LR-WPANs.

In the pre-IoT era, the process automation industry was the first to adopt the 802.15.4 technology and had been searching for a standard protocol stack to enable compatibility among devices from different manufacturers for a long time. The Zigbee protocol stack endeavored to become a de facto, industry-imposed standard for 802.15.4 networking, with noticeable success, considering its proprietary, closed-source nature and the royalties applicable to its commercial use. In a parallel effort, the International Society of Automation (ISA) has created a proposal for the open standard ISA100.11a, which aims to define the guidelines for building networks based on 802.15.4 links to be used in industrial automation processes. Another industrial automation protocol, originally developed by a consortium of enterprises and then approved by the International Electrotechnical Commission (IEC) as a standard for industrial automation, is WirelessHART.

Technologies such as Zigbee, ISA100.11a, and WirelessHART define the entire protocol stack above 802.15.4, including network definition and transport mechanisms, providing custom address mechanisms and communication models, and exporting an API that can be used to integrate applications. From the perspective of the design of the distributed system, enabling internet connectivity for devices in a custom network that does not implement the IP stack requires one or more devices to act as a gateway, rerouting and transforming each packet for the custom LR-WPAN protocol stack. The transformation procedure, however, violates the end-to-end semantics of TCP/IP communication, impacting various aspects of the communication, including end-to-end security.

6LoWPAN

6LoWPAN, described in RFC 4944, is the IETF-standardized 802.15.4 link protocol that can transport IPv6 packets, and it is the established standard for IP-compatible LR-WPANs. 6LoWPAN makes it possible for embedded systems to access the internet using 802.15.4 interfaces, as long as the nodes implement TCP/IP networking, and the link layer provides mechanisms to transmit and receive full-size IP packets using short LR-WPAN frames. The content of the packet is fragmented and transmitted into consecutive transport units, and the network and transport headers are optionally compressed to reduce the transmission overhead.
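To get a feeling for the cost of fragmentation, the following sketch estimates how many 802.15.4 frames are needed to carry one IPv6 datagram. It relies on several simplifying assumptions: header compression is not used, the first fragment carries a 4-byte fragmentation header plus a 1-byte dispatch value, subsequent fragments carry a 5-byte fragmentation header, and every fragment except the last carries a multiple of 8 bytes of the datagram, as described in RFC 4944. The function name and the link_payload parameter (the bytes left in a frame after the 802.15.4 MAC overhead) are introduced only for this example.

```c
#include <stddef.h>

#define FRAG1_HDR_LEN 4  /* first-fragment header (RFC 4944) */
#define FRAGN_HDR_LEN 5  /* subsequent-fragment header (RFC 4944) */
#define DISPATCH_LEN  1  /* assumption: uncompressed IPv6 dispatch byte */

/* Estimate the number of link frames needed for one IPv6 datagram.
 * link_payload is the room left in each frame after the MAC overhead.
 * Returns 0 if the link payload is too small to carry any fragment. */
static size_t lowpan_frame_count(size_t datagram_len, size_t link_payload)
{
    size_t first, next, remaining;

    if (link_payload < FRAGN_HDR_LEN + 8)
        return 0;
    /* No fragmentation needed if the datagram fits in a single frame */
    if (datagram_len + DISPATCH_LEN <= link_payload)
        return 1;
    /* Fragment offsets are expressed in units of 8 bytes, so each
     * fragment except the last carries a multiple of 8 bytes */
    first = ((link_payload - FRAG1_HDR_LEN - DISPATCH_LEN) / 8) * 8;
    next  = ((link_payload - FRAGN_HDR_LEN) / 8) * 8;
    remaining = datagram_len - first;
    return 1 + (remaining + next - 1) / next;
}
```

Under these assumptions, a 1280-byte datagram (the minimum MTU that IPv6 requires) over frames with 102 bytes available takes 14 frames, and 18 frames when link-layer security reduces the available space to 81 bytes. Header compression lowers the overhead of the first fragment, so actual figures may differ slightly.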

There is currently no IPv4 counterpart of the 6LoWPAN standard; however, IETF is evaluating proposals adopting a similar approach to enable legacy IPv4 connectivity for embedded nodes.

6LoWPAN is part of several network stack implementations, and it is part of a recent attempt to create an industrial alliance, the Thread group, whose goal is to promote a fully IPv6, low-power mesh network technology based on open-standard protocols designed for the IoT. Multiple free and open source TCP/IP stacks and embedded operating systems support 6LoWPAN and can access 802.15.4 transceivers to provide the necessary link infrastructure to build IP networks based on the functionalities and the protocol implemented.

Mesh networking can optionally be added to the link layer to provide a transparent bridge mechanism called mesh-under, where all the frames are repeated by the link layer to the remote corners of the mesh until their destination is reached.

Because 6LoWPAN provides the infrastructure for building the network topology, mesh networking can be approached differently, using application-level protocols to update the routing tables at the IP level. These mechanisms, known as route-over mesh networking, are based on standardized dynamic routing mechanisms, and may also be used to extend the mesh network across different physical links.

Bluetooth

Another machine-to-machine connectivity technology in constant evolution is Bluetooth. Its physical layer is based on 2.4 GHz communication to establish host/device communication or provide the infrastructure for PAN supporting multiple protocols, including TCP/IP communication. Thanks to its longtime success and its consequent wide adoption in the market of personal computers and portable devices, Bluetooth connectivity has started to gain popularity in the universe of embedded microcontrollers, mostly due to the recent evolution of the standard in the direction of lower power consumption.

Initially designed as a wireless replacement for serial communication for devices at a close range, the classic Bluetooth technology has evolved to support integrated dedicated channels, including TCP/IP-capable network interfaces and dedicated audio and video streaming links.

A low-power variant of the protocol stack, introduced with version 4 of the standard definition, has been designed to limit energy consumption for embedded sensor nodes and introduces a new set of services. A sensor device may export a Generic Attribute Profile (GATT) that can be accessed by a client (usually a host machine) to establish communication with a device. When the transceiver on the target is inactive, it consumes a small amount of power, while it remains possible for a client to discover its attributes and initiate a GATT transfer. Bluetooth is mostly used nowadays for short-range communication; to access sensor nodes from personal computers and portable devices; to exchange multimedia content with remote audio devices such as speakers, headsets, and hands-free automotive voice interfaces; and in several healthcare applications, thanks to some profiles being specifically designed for this purpose.

Mobile networks

Connecting remote devices that have no fixed infrastructure available in their surroundings has been made possible using the same technology that portable devices use to access the internet over mobile networks, such as GSM/GPRS, 3G, and LTE. The increasing complexity, cost, and energy requirements characterizing the devices that access broadband mobile connectivity have increased the impact of integrating this sort of network communication into microcontroller-based embedded devices. Mobile networks support TCP/IP protocols natively and provide direct connectivity to the internet, or in some cases, to restricted networks provided by the access infrastructure.

Although still popular in some specific markets, such as automotive and railway, broadband network access profiles are usually overkill for transferring a small amount of information from remote sensor devices, while simpler modems to access older, narrow-bandwidth technologies are slowly disappearing from the market.

While mobile network technologies evolve, focusing on the requirements of the mobile phone market, embedded device architects are in search of new technologies that better match the needs of distributed IoT systems: low-power, cost-effective, long-distance communication.

Low-Power Wide Area Networks (LPWANs)

LPWANs are a family of emerging technologies that fill the market gap for cost-effective, low-power, long-distance, narrow-band communication. As for LR-WPANs, different industrial alliances have been formed in an attempt to conquer the market, and in some cases, establish a standard protocol stack for universal LPWAN networks. This process has led to healthy competition on features, costs, and power consumption.

LPWAN technologies are usually based on sub-GHz physical channels, but use different radio settings, allowing for an increased range. Devices can communicate with each other over the air, and, in some cases, use an infrastructure to increase coverage, even across thousands of kilometers, when in visibility of a base station.

The most noticeable emerging technologies in this field include the following:

  • LoRa/LoRaWAN: Based on patented wireless radio access mechanisms and a fully proprietary protocol stack, this technology provides long-distance communication with a high bit rate compared to similar technologies. While it offers several interesting features, such as local node-to-node communication in the absence of infrastructure, the closed-protocol design makes this technology less appealing for the embedded market, and less likely to keep its place in the LPWAN competition, eventually in favor of more open standards.
  • Sigfox: This ultra-narrow-band radio technology requires an infrastructure to operate, and offers a particularly low bit rate on very long ranges. Regulated infrastructure access allows a limited number of bytes to be transferred from or to a node every day, and the payload of the messages is fixed at 12 bytes. While the physical layer implementation is proprietary, the protocol stack is distributed in source code form. Radio regulations in some countries are still an open point, though, and may impact the development of this technology worldwide, despite its considerable success in the European market.
  • Weightless: Another technology based on ultra-narrow-band, Weightless is a fully open standard for LPWAN operating in the sub-GHz range. Similar to Sigfox in terms of range and performance, it provides an improved security model as an alternative to the classic pre-shared keys deploying mechanisms, allowing for over-the-air security key negotiation mechanisms.
  • DASH7: The youngest of the technologies described here is based on a fully open design. The source code for the entire lightweight protocol stack is provided by the DASH7 alliance, which allows for easier integration of the technology into embedded systems. This protocol stack is designed to provide flexibility while designing distributed systems, due to the multiple choices in defining the network topology.
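Payload limits as tight as the 12 bytes mentioned for Sigfox shape the way applications encode their data. The sketch below packs a set of sensor readings into a 12-byte message using fixed-point fields in big-endian order. The sensor_sample structure, its fields, and their units are purely hypothetical and chosen for this example, and the resulting buffer would still have to be handed to whatever transmit function the radio vendor provides.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define UPLINK_PAYLOAD_MAX 12  /* payload size quoted above for Sigfox */

/* Hypothetical sensor readings, stored as integers to avoid floats */
struct sensor_sample {
    int16_t  temp_centi;        /* temperature in 0.01 degrees Celsius */
    uint16_t humidity_permille; /* relative humidity in 0.1 percent */
    uint16_t battery_mv;        /* battery voltage in millivolts */
    uint32_t uptime_s;          /* seconds since boot */
};

static size_t put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
    return 2;
}

static size_t put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
    return 4;
}

/* Serialize one sample; unused trailing bytes are left as zero.
 * Returns the number of meaningful bytes written (10 here). */
static size_t sample_pack(const struct sensor_sample *s,
                          uint8_t out[UPLINK_PAYLOAD_MAX])
{
    size_t i = 0;

    memset(out, 0, UPLINK_PAYLOAD_MAX);
    i += put_be16(out + i, (uint16_t)s->temp_centi);
    i += put_be16(out + i, s->humidity_permille);
    i += put_be16(out + i, s->battery_mv);
    i += put_be32(out + i, s->uptime_s);
    return i;
}
```

Serializing field by field, instead of copying the structure as it is laid out in memory, keeps the message format independent of compiler padding and CPU endianness.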

LPWAN protocols are not directly compatible with IP and require one of the nodes on the network to generate TCP/IP traffic based on the long-range communication data acquired from the other nodes. The sporadic, low-bit-rate characteristics of the network traffic make these technologies operate in their own field, and a gateway capable of rerouting data from the nodes is required when the architecture of the distributed system foresees accessing remote nodes on the internet.

Selecting the appropriate network interfaces

Depending on the use case, each embedded system may benefit from the communication facilities offered by the technologies described in this section. Due to the high specialization of some embedded devices, a design tailored to specific use cases may even go beyond this classification and use technologies that are designed for one specific use case. Wireless communication is impossible in some cases, due to emission regulations in some environments, and when the media is not capable of transporting radio waves reliably, such as underwater or through the human body.

Submarines may communicate via specific transceivers, using sound waves to represent the data. Other widespread technologies are available for wired communication as well. Power line communication allows existing wires to be reused to refit older devices and brings local network connectivity, extending Ethernet or serial interface buses using high-frequency modulation that does not impact the original purpose of the wires used.

As it turns out, embedded devices have a broad range of possibilities when it comes to connectivity. The optimal choice always depends on the specific use case and the resources available on the system to implement protocols and standards required to reach the other endpoints of the communication. When selecting a communication technology, there might be several aspects to take into account:

  • The range of communication
  • The bit rate required for data transfer
  • The total cost of ownership (transceiver price, integration effort, and service costs)
  • Media-specific limitations, such as any latency introduced by the transceiver
  • The impact of RF interference on the hardware design requirements
  • The maximum transfer unit
  • Power consumption and energy footprint
  • Protocols or standards supported for compatibility with third-party systems
  • Compliance with Internet protocols for integration in IoT systems
  • Topology flexibility, dynamic routing, and mesh network feasibility
  • The security model
  • The resources required to implement drivers and protocols for a specific technology
  • The use of open standards to avoid lock-in for long-lived projects

Each technology for connected devices offers a different take on how these aspects are addressed in its intrinsic design, also depending on whether the technology has been borrowed from a different context, such as Ethernet or GSM/LTE, or has been designed with low-power embedded systems in mind, as in LR-WPAN and LPWAN protocols.

Selecting the appropriate communication channels when designing distributed systems is an operation that requires strict collaboration between hardware and software design. Creating connected devices involves one more level of complexity, especially in the low-power domain.

The next section focuses on how the implementation of Internet protocols can be adapted to scale down to embedded devices to produce network endpoints that operate within standards and are rich in features. TCP/IP stack implementation can be extended and configured to meet the requirements of a distributed IoT system. Cases in which non-IP protocols are translated by a border gateway to integrate non-standard communication in IoT systems (edge gateways) are not covered here, as they often involve larger dedicated systems with multiple network interfaces.

As we have observed, the embedded industry is specialized enough to operate at the edge of the standards, but a new research trend is bringing TCP/IP communication back to its original position as the established standard for network communication, due to the increasing influence of the existing IT infrastructure in distributed systems, including small, low-power, cost-effective embedded systems. This has also recently extended in the market to standard security functionality, increasing the presence of secure end-to-end communication protocols such as TLS and DTLS on embedded systems.

The Internet protocols

Standardized at the beginning of the 1980s, the IP stack, mostly referred to nowadays as TCP/IP, is a family of network, transport, and application protocols providing standard communication over a wide range of technologies and interfaces. In the upcoming subsections, we will discuss the integration of these standard protocols into embedded systems, describe the interfaces that embedded applications use to communicate with remote endpoints, and learn how to interact with the different layers of the stack, from the network interfaces up to the socket abstraction to establish connections or connectionless sessions with a remote peer.

Standard protocols, custom implementations

Designing distributed communication using non-standard protocol stacks is, in almost all cases, not worth the effort required to reinvent state-of-the-art technology. TCP/IP standards have been the subject of extensive research for many decades, and have been the main building block for the internet as we know it today, integrating billions of heterogeneous devices. Equipping an embedded system with TCP/IP capabilities is no longer a pioneering task, as several open source implementations exist, and they can easily be integrated into small embedded systems, as long as they can access physical communication channels providing data transfer capabilities between two or more endpoints.

Sockets are the standard way to access transport-layer communication from network applications. The Berkeley socket model, later standardized by POSIX, includes a naming standard for functions and components and the behavior in a UNIX operating system. If the TCP/IP stack is integrated with the operating system, the scheduler can provide a mechanism to suspend the caller while waiting for a specific input, and the socket call API can be implemented to match POSIX specifications. In a bare-metal event-based application, however, synchronization with the sockets is done using callbacks, in order to follow the event-based model of the main loop. For this reason, writing applications that interact with network protocols is slightly different in terms of the APIs and paradigms. In a non-blocking network application within a single thread, no operation should keep the CPU busy while waiting for events, except the main loop function itself. Socket function calls are no exception, requiring a mechanism to initiate an operation, register a callback function to handle the end of it, and then immediately return to the main loop.
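The callback-based pattern can be sketched as follows. This is a simplified, self-contained model and not the API of any specific TCP/IP stack: the ev_socket structure, the EV_ flags, and all function names are hypothetical. A real stack would set the pending events itself, and a system where events are raised from interrupt context would also need to protect the pending field.

```c
#include <stdint.h>
#include <stddef.h>

#define EV_CONNECTED 0x01u
#define EV_READABLE  0x02u
#define EV_CLOSED    0x04u

struct ev_socket;
typedef void (*ev_callback)(struct ev_socket *s, uint16_t events, void *arg);

/* Hypothetical socket object: pending events plus the registered handler */
struct ev_socket {
    uint16_t    pending;
    ev_callback cb;
    void       *arg;
};

/* Register the callback and return immediately, without waiting */
static void ev_socket_register(struct ev_socket *s, ev_callback cb, void *arg)
{
    s->pending = 0;
    s->cb = cb;
    s->arg = arg;
}

/* Invoked by the stack when something happens: only records the event */
static void ev_socket_notify(struct ev_socket *s, uint16_t events)
{
    s->pending |= events;
}

/* One iteration of the main loop: dispatch pending events, never block */
static void main_loop_once(struct ev_socket *socks, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint16_t ev = socks[i].pending;
        if (ev && socks[i].cb) {
            socks[i].pending = 0;
            socks[i].cb(&socks[i], ev, socks[i].arg);
        }
    }
}

/* Example application callback: counts how often data became readable */
static void app_on_event(struct ev_socket *s, uint16_t events, void *arg)
{
    unsigned *rx_count = arg;

    (void)s;
    if (events & EV_READABLE)
        (*rx_count)++;
}
```

The application would call ev_socket_register once during setup and then keep invoking main_loop_once, so the only place where the CPU waits is the main loop itself.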

The TCP/IP stack

A modern TCP/IP stack is perhaps the most fundamental part of a distributed embedded system. The reliability of the communications depends on how accurately the standard protocols are implemented, and the security of the services running on the device may be compromised by defects hidden in the TCP/IP stack implementation, its interface drivers, and the glue code to provide socket abstractions.

The most popular open source TCP/IP library for embedded devices is the lightweight IP stack, best known as lwIP. Integrated with many real-time OSes and even distributed in a bundle by hardware manufacturers, lwIP provides the IPv4 and IPv6 network, UDP and TCP socket communication, DNS and DHCP client, and a rich bundle of application-layer protocols that can be integrated into an embedded system using just a few tens of KBs of memory. Despite being tailored for small microcontrollers, the resources required by a fully featured stack, such as lwIP, are out of range for some smaller devices, including most sensor processing targets with ultra-low power characteristics.

Micro IP, mostly referred to as uIP, is a minimalistic TCP/IP implementation based on the unusual but brilliant intuition of processing one single buffer at a time. Not having to allocate multiple buffers in memory keeps the amount of RAM needed for TCP/IP communication as limited as possible, and reduces the complexity of the implementation of TCP and other protocols, and, as a result of this, the code size of the entire stack. uIP is not designed to scale up to a higher bit rate or for implementing advanced features, but it is sometimes the best compromise to connect nodes with very limited resources, mostly to LR-WPAN networks.

picoTCP is a free software TCP/IP stack with a more recent history. It shares similar resource footprints and features lists with lwIP, but has a different modular design and a stronger focus on IoT protocols, providing dynamic routing, IP filtering, and NAT capabilities. With native support for 6LoWPAN over 802.15.4 devices, picoTCP can be used to build mesh networks, using either the mesh-under capabilities in 6LoWPAN, or a more classic route-over approach, using dynamic routing protocols, such as OLSR and AODV, provided in the modules.

Other implementations exist for both open source and proprietary TCP/IP stacks, which can be integrated into both bare-metal applications and embedded operating systems, often providing similar APIs for integrating interface drivers and interacting with the system to provide socket communication to higher-level applications. An embedded TCP/IP stack is connected to network devices through a device driver, providing a function to send frames to the network, and capable of delivering the received packets through an entry point function, which the TCP/IP stack uses to take charge of the packet. The packets that are currently being handled by the TCP/IP stack may require asynchronous operations, so the application, or the OS, must ensure that the stack loop function is called periodically so that it can process the packets in the buffers. Finally, a socket interface is provided by the transport layer for the application to create and use the socket to communicate with remote endpoints.
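The three integration points described above (a send function supplied by the driver, an entry point for received frames, and a periodic loop function) can be modeled with the sketch below. All the names, the buffer sizes, and the mini_stack structure are hypothetical and do not correspond to any of the stacks mentioned here. The processing step is reduced to a counter where a real stack would parse the frame.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RX_SLOTS  4    /* ring of 4 slots holds up to 3 queued frames */
#define FRAME_MAX 128

/* Driver side: the function the stack calls to transmit a frame */
struct net_dev {
    int (*send)(struct net_dev *dev, const uint8_t *buf, size_t len);
};

struct rx_slot {
    uint8_t data[FRAME_MAX];
    size_t  len;
};

/* Stack side: receive queue filled by the driver, drained by the loop */
struct mini_stack {
    struct rx_slot rx[RX_SLOTS];
    size_t   head, tail;
    unsigned processed;
};

/* Entry point called by the driver for each received frame.
 * Returns 0 if the frame was queued, -1 if it had to be dropped. */
static int stack_input(struct mini_stack *st, const uint8_t *buf, size_t len)
{
    size_t next = (st->head + 1) % RX_SLOTS;

    if (len > FRAME_MAX || next == st->tail)
        return -1;
    memcpy(st->rx[st->head].data, buf, len);
    st->rx[st->head].len = len;
    st->head = next;
    return 0;
}

/* Loop function, to be called periodically by the application or the OS */
static void stack_tick(struct mini_stack *st)
{
    while (st->tail != st->head) {
        st->processed++;               /* a real stack parses the frame here */
        st->tail = (st->tail + 1) % RX_SLOTS;
    }
}

/* Transmit path: hand a frame to the driver, if one is attached */
static int stack_output(struct net_dev *dev, const uint8_t *buf, size_t len)
{
    if (!dev || !dev->send)
        return -1;
    return dev->send(dev, buf, len);
}
```

If stack_tick is not called often enough, the queue fills up and stack_input starts dropping frames, which is why the periodic call matters.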

Network device drivers

In order to integrate a driver for a network interface, the TCP/IP stack exposes an interface to its lowest layers, sending and receiving buffers containing frames or packets. If the device supports the link-layer Ethernet address, TCP/IP stacks must connect an additional component to deal with Ethernet frames, and activate the neighbor discovery protocols to find the MAC address of the receiving device before initiating any IP communication.

lwIP provides a netif structure, describing a network interface, which must be allocated by the driver code, but is then initialized automatically by the stack using the netif_add function:

struct netif *netif_add(struct netif *mynetif,
    struct ip_addr *ipaddr,
    struct ip_addr *netmask,
    struct ip_addr *gw, void *state,
    err_t (* init)(struct netif *netif),
    err_t (* input)(struct pbuf *p, struct netif *netif));

The ipaddr, netmask, and gw arguments can be used to set an initial IPv4 configuration for the link created through this interface. lwIP supports one IPv4 address and three IPv6 addresses per interface, but all of them can be reconfigured at a later stage by accessing the relevant fields in the netif structure. The IP address can be configured either statically or through a mechanism that assigns it automatically, such as DHCP negotiation, or by deriving it from link-local addresses.

The state variable is a user-defined pointer that can create an association between the net device and a private field that can be accessed using the netif->state pointer in the driver code.

The function pointer provided as the init argument is called during the initialization of the stack, with the same netif pointer, and it must be used by the driver to initialize the remaining fields for the netif device.

The function pointer provided through the input argument describes the internal action that the stack has to perform when it receives a packet from the network. If the device communicates using Ethernet frames, the ethernet_input function should be supplied to indicate that additional processing for the Ethernet frame would be required before parsing the frame content and that the network supports neighbor discovery protocols to associate IP addresses to MAC addresses before transmitting the data. If the driver is handling naked IP packets instead, the receiving function to associate is ip_input.

The device driver initialization is finalized in the init function, which must also assign a value to other important fields in the netif structure:

  • hwaddr: Containing the MAC address for the Ethernet device, if supported.
  • mtu: The maximum transfer unit size allowed by this interface.
  • name/num: For device identification on the system.
  • output: This function pointer is called by the stack to append a custom link header to the IP packet ready for transmission. For Ethernet devices, this should point to etharp_output to trigger neighbor discovery mechanisms.
  • link_output: This function pointer is called by the stack when a buffer is ready to be transmitted.

After the link has been marked as up by calling netif_set_up, the device driver can call the input function upon the reception of new packets, and the stack itself will call the output/link_output functions to interact with the driver.

picoTCP exports a similar interface to implement device drivers, but it supports multiple addresses per interface, so the IP configuration is separate from the device drivers. Each device has a list of associated IPv4 and IPv6 links, each with its own IP configuration, to implement multi-homed services. A device driver structure in picoTCP must begin with a physical entry of the pico_device structure as its first field. This way, both structures point to the same address and the device can maintain its own private fields at the end of the pico_device structure. To initialize the device, the structure is allocated in the driver, and pico_device_init is called:

int pico_device_init(struct pico_device *dev, const char *name, const uint8_t *mac);

The three arguments required are the pre-allocated device structure, a name used for identification within the system, and the Ethernet MAC address, if present. If the MAC is null, the stack bypasses the Ethernet protocol, and all the traffic handled by the driver is naked IP packets with no link-layer extensions. The driver must implement the send function that is used by the stack to deliver the frames or packets to be transmitted by the interface, and input is managed through the pico_stack_recv function:

int32_t pico_stack_recv(struct pico_device *dev, uint8_t *buffer, uint32_t len);

The device is passed again as an argument so that the stack automatically recognizes whether the interface is receiving an Ethernet frame or a raw IP packet with no headers, and reacts accordingly. IP addresses can be configured using pico_ipv4_link_add and pico_ipv6_link_add, and the routing table is accessed through its API to add gateways and static routes to specific networks.
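The first-member layout described for picoTCP drivers can be illustrated in isolation. The following is a minimal sketch using stand-in types: net_device and my_driver are hypothetical names and are not part of the picoTCP API. It only shows why a pointer to the generic structure can be converted back to the driver's own structure to reach private fields:

```c
#include <stdint.h>

/* Stand-in for the generic device structure owned by the stack
 * (hypothetical; the real picoTCP type is struct pico_device). */
struct net_device {
    char name[16];
    int (*send)(struct net_device *dev, const void *buf, int len);
};

/* Driver-specific structure: the generic part MUST be the first member,
 * so both structures start at the same address. */
struct my_driver {
    struct net_device dev;   /* generic part, seen by the stack */
    uint32_t tx_frames;      /* private fields follow */
    uint32_t tx_bytes;
};

/* The stack only knows about struct net_device; the driver converts the
 * pointer back to reach its private counters. */
static int my_driver_send(struct net_device *dev, const void *buf, int len)
{
    struct my_driver *drv = (struct my_driver *)dev;
    (void)buf;               /* a real driver would copy buf to the hardware */
    drv->tx_frames++;
    drv->tx_bytes += (uint32_t)len;
    return len;
}
```

Because the C standard guarantees that a pointer to a structure also points to its first member, the cast in my_driver_send is well defined as long as the generic structure stays at offset zero.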

Running the TCP/IP stack

To integrate a network stack, the system must generally provide a few commodities, such as timekeeping and heap-memory management. All the system features required by the stack are associated at compile time using a system-specific configuration header, which associates functions and global values accordingly.

Depending on the characteristics of the physical channels and the throughput to achieve, a TCP/IP stack may become very demanding in terms of heap memory used, allocating space for new incoming buffers until the upper layers can process them. Assigning separate memory pools to TCP/IP stack operations might help in some designs to keep the memory usage of the stack under control by placing thresholds and hard limits without impacting the functionality of the other components on the system.
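A dedicated pool with a hard limit can be sketched as follows. This is an illustrative fixed-block allocator, not the allocator used by lwIP or picoTCP; the names buf_pool, pool_alloc, and pool_free, as well as the block count and size, are assumptions chosen for the example:

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     4      /* hard limit on buffers held by the stack */
#define POOL_BLOCK_SIZE 1536   /* enough for one full-size Ethernet frame */

struct buf_pool {
    uint8_t storage[POOL_BLOCKS][POOL_BLOCK_SIZE];
    uint8_t in_use[POOL_BLOCKS];
};

/* Return a free block, or NULL when the pool is exhausted. */
static void *pool_alloc(struct buf_pool *p)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->storage[i];
        }
    }
    return NULL;   /* caller drops the frame instead of growing the heap */
}

/* Give a block back to the pool; pointers outside the pool are ignored. */
static void pool_free(struct buf_pool *p, void *ptr)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (ptr == (void *)p->storage[i]) {
            p->in_use[i] = 0;
            return;
        }
    }
}
```

When the pool is exhausted, the allocation fails immediately, so a burst of incoming traffic results in dropped frames rather than in memory starvation for the rest of the system.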

Most libraries implement their own internal timers using a monotonic counter, provided by the system and increased independently by another component in the system. The time tracking value can be increased using the SysTick interrupt, providing an acceptable accuracy at which the stack can organize timed operations for the protocols. For lwIP, it is sufficient to provide a function called sys_now, which returns the time elapsed from booting, expressed in milliseconds. For picoTCP, the port must define a macro or an inline function called PICO_TIME_MS returning the same value. Both stacks expect that the main loop of the application provides recurring entry points, by calling a function in the core API, required to manage the internal states of the system protocols.

To check whether any of the pending timers have expired, the system calls sys_check_timeouts in lwIP, or pico_stack_tick in picoTCP, from the main event loop or a dedicated thread when running within an OS. The interval between consecutive calls may impact timer accuracy, and, in general, should not be longer than a few milliseconds to ensure that the network stack is responsive to timed events.
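The timekeeping side can be sketched as follows, assuming that SysTick is configured to fire once per millisecond. The names jiffies, time_ms, and timer_expired are hypothetical and only illustrate the mechanism; the comparison shows one common way to check a deadline so that it remains correct when the 32-bit counter wraps around:

```c
#include <stdint.h>

/* Millisecond counter, incremented once per SysTick interrupt.
 * Assumes SysTick is configured to fire every 1 ms. */
static volatile uint32_t jiffies;

void SysTick_Handler(void)
{
    jiffies++;
}

/* Time since boot in milliseconds, in the form a stack port would return. */
uint32_t time_ms(void)
{
    return jiffies;
}

/* Wrap-safe expiry check: the unsigned difference is interpreted as a
 * signed distance, so it stays correct when the 32-bit counter rolls over
 * (after about 49.7 days at 1 kHz), provided that deadlines are set less
 * than 2^31 ms ahead. */
int timer_expired(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}
```

A plain `now >= deadline` comparison would misfire around the rollover, which is why the signed difference is used here.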

Network interfaces must also be polled for input from the network, either continuously or through an appropriate interrupt handling implemented in the system. When new data is available, the device drivers allocate new buffers and initiate the processing by calling the input functions of the data link or the network layer.
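One way to combine interrupt-driven reception with a polled main loop is a small ring of frame buffers, filled by the interrupt handler and drained from the main loop. The sketch below assumes a single-core target with one producer (the interrupt) and one consumer (the main loop); rx_ring, rx_ring_put, rx_ring_poll, and fake_stack_input are hypothetical names, and the input callback stands in for the entry point of the stack:

```c
#include <stdint.h>
#include <string.h>

#define RX_SLOTS  4u      /* must be a power of two */
#define FRAME_MAX 1536u

struct rx_ring {
    uint8_t  data[RX_SLOTS][FRAME_MAX];
    uint16_t len[RX_SLOTS];
    volatile uint32_t head;   /* written only by the interrupt handler */
    volatile uint32_t tail;   /* written only by the main loop */
};

/* Called from the receive interrupt: store the frame, or drop it if the
 * ring is full. Returns 1 if stored, 0 if dropped. */
int rx_ring_put(struct rx_ring *r, const uint8_t *frame, uint16_t len)
{
    if (len > FRAME_MAX || (r->head - r->tail) == RX_SLOTS)
        return 0;
    uint32_t slot = r->head & (RX_SLOTS - 1u);
    memcpy(r->data[slot], frame, len);
    r->len[slot] = len;
    r->head++;
    return 1;
}

/* Called from the main loop: hand every queued frame to the stack input
 * function, then return the number of frames delivered. */
int rx_ring_poll(struct rx_ring *r,
                 void (*input)(const uint8_t *frame, uint16_t len))
{
    int delivered = 0;
    while (r->tail != r->head) {
        uint32_t slot = r->tail & (RX_SLOTS - 1u);
        input(r->data[slot], r->len[slot]);
        r->tail++;
        delivered++;
    }
    return delivered;
}

/* Stand-in for the stack entry point; it only counts bytes here. */
static uint32_t bytes_seen;
static void fake_stack_input(const uint8_t *frame, uint16_t len)
{
    (void)frame;
    bytes_seen += len;
}
```

Keeping the interrupt handler limited to a copy and an index update leaves all protocol processing in the main loop context, where the stack functions are expected to run.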

A typical bare-metal application using lwIP begins by performing all the initialization steps for the stack and the device driver. The structure for the network interface is allocated in the main function stack and initialized with a static IPv4 configuration. The following code assumes that the device driver exports a function called driver_netdev_create, which populates the interface-specific fields and callbacks:

void main(void)
{
  struct netif netif;
  struct ip_addr ipaddr, gw, netmask;
  IP4_ADDR(&ipaddr, 192,168,0,2);
  IP4_ADDR(&gw, 192,168,0,1);
  IP4_ADDR(&netmask, 255,255,255,0);
  lwip_init();
  netif_add(&netif, &ipaddr, &netmask, &gw, NULL,
  driver_netdev_create, ethernet_input);
  netif_set_default(&netif);

The network interface is then activated in the TCP/IP stack:

  netif_set_up(&netif);

Before entering the main loop, the application initializes the communication by creating and configuring the sockets, and associating the callbacks:

  application_init_sockets();

The main loop relies on the driver to export a function called driver_netdev_poll in this case, which is the function where the driver calls ethernet_input whenever a new frame is received. Finally, sys_check_timeouts is called so that lwIP can keep track of the pending timers:

  while (1) {
   /* poll netif, pass packet to lwIP */
   driver_netdev_poll(&netif);
   sys_check_timeouts();
   WFI();
  }
}

A similar procedure is expected from bare-metal applications running picoTCP. The initialization of the device driver is independent of the stack, and the driver is expected to call pico_device_init on a pico_device struct contained in the custom driver_device type as the mandatory first member. The only function exported by the driver is driver_netdev_create, which also associates its specific network-polling function pointer, which will be called by pico_stack_tick. The stack expects a callback to pico_stack_recv whenever the poll function of the driver has new incoming packets to process:

void main(void)
{
  struct driver_device dev;
  struct pico_ip4 ipaddr, netmask, gw, any;
  pico_string_to_ipv4("192.168.0.2", &ipaddr.addr);
  pico_string_to_ipv4("255.255.255.0", &netmask.addr);
  pico_string_to_ipv4("192.168.0.1", &gw.addr);
  any.addr = 0;
  pico_stack_init();
  driver_netdev_create(&dev);

The IPv4 address configuration is performed by accessing the API of the IPv4 module. Applications may associate one or more IP address configurations by calling pico_ipv4_link_add and specifying the address and netmask. A route in the IP protocol is created automatically to reach all the neighbors in the subnet through the interface:

  pico_ipv4_link_add((struct pico_device *)&dev, ipaddr, netmask);

To add a default route, the gateway is associated with the 0.0.0.0 address (indicating any host) with a metric of 1. The default gateway can be later overridden by defining more specific routes for other subnetworks:

  pico_ipv4_route_add(any, any, gw, 1, NULL);

As in the previous example, the application can now initialize its sockets and associate callbacks that will be called by the stack when needed:

  application_init_sockets();

This simple main loop calls pico_stack_tick repeatedly, which will poll all the associated network interfaces in a round-robin, and perform all the pending actions in all protocol modules:

  while (1) {
    pico_stack_tick();
    WFI();
  }
}

All the TCP/IP actions are associated with socket callbacks, which are called whenever the application is expected to react to network and timeout events, and timeouts are set up automatically by the stack when required to manage the internal states of the single protocols. The interface that is provided to access the socket communication in the absence of an operating system, as previously mentioned, is based on custom callbacks, depending on the implementation of the specific stack. The next section shows how to use non-blocking socket APIs in two different TCP/IP stack implementations.

Socket communication

The interface provided by lwIP for bare-metal socket communication, also called the raw socket API, consists of custom calls, each specifying a callback whenever an event is expected from the stack. When a specific event occurs, lwIP will call the callback from the main loop function.

The description of a TCP socket in lwIP is contained in a TCP-specific protocol control block structure, tcp_pcb. To allocate a new control block for the listening TCP socket, the following function is used:

struct tcp_pcb *tcp_new(void);

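To accept a TCP connection, a bare-metal lwIP TCP server would first call this:

err_t tcp_bind(struct tcp_pcb *pcb, ip_addr_t *ipaddr,
    u16_t port);
struct tcp_pcb *tcp_listen(struct tcp_pcb *pcb);

These non-blocking functions bind the socket to a local address and put it into a listening state. Note that tcp_listen returns a new, smaller control block for the listening socket and releases the one passed as its argument, so the application must use the returned pointer from that point on.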

At this point, a POSIX application using blocking sockets would call the accept function, which would wait indefinitely for the next incoming connection on the socket. A lwIP bare-metal application instead calls the following:

void tcp_accept(struct tcp_pcb *pcb,
    err_t (* accept)(void *arg, struct tcp_pcb *newpcb,
    err_t err)
);

This simply indicates that the server is ready to accept new connections, and registers the accept function passed as a parameter as the callback that the stack invokes when a new incoming connection is established.

Using the same mechanism, to receive the next data segment, the application calls the following:

void tcp_recv(struct tcp_pcb *pcb,
    err_t (* recv)(void *arg, struct tcp_pcb *tpcb,
    struct pbuf *p, err_t err)
);

This indicates to the TCP/IP stack that the application is ready to receive the next segment over the TCP connection. When a new buffer is available, the stack calls the recv function that was passed as the argument to tcp_recv.

Similarly, picoTCP associates one callback with each socket object. The callback is a common point to react to any socket-related events, such as a new incoming TCP connection, new data to be read on the socket buffer, or the end of the previous write operation.

The callback is specified when the socket is created:

struct pico_socket *pico_socket_open(uint16_t net,
    uint16_t proto,
    void (*wakeup)(uint16_t ev,
    struct pico_socket *s));

The preceding function creates a new socket object for use in the specified network and transport protocol context, the net and proto arguments respectively, and reacts to all socket events by calling the wakeup function that is provided by the application. Using this mechanism, picoTCP successfully detects half-closed socket connections and other events that are not specifically related to the current operation in progress but may occur due to a state change in the socket communication model.

A TCP socket server can be configured on the newly created socket using these functions:

int pico_socket_bind(struct pico_socket *s,
    void *local_addr,
    uint16_t *port);
int pico_socket_listen(struct pico_socket *s, int backlog);

At this point, the application has to wait for the incoming connections without calling accept. An event is generated, which calls the wakeup function, whenever a new incoming connection is established, and the application can finally call accept to generate the new socket object, corresponding to the incoming connection:

struct pico_socket *pico_socket_accept(
    struct pico_socket *s,
    void *orig,
    uint16_t *local_port);

The first argument passed to the picoTCP wakeup callback is a bitmask indicating the event types that occurred on the socket. Events may be as follows:

  • EV_RD: Indicating that there is data to read on the incoming data buffer.
  • EV_CONN: Indicating that a new connection has been established, after calling connect, or while waiting in a listening state, before calling accept.
  • EV_CLOSE: Triggered when the other side of the connection sends a FIN TCP segment, indicating that it has finished its transmission. The socket is in the CLOSE_WAIT state, meaning that the application may still send data before terminating the connection.
  • EV_FIN: Indicating that the socket has been closed, and it is not usable anymore after returning from the callback.
  • EV_ERR: An error occurred.
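Since the event argument is a bitmask, more than one event can be reported in a single call. The following sketch shows one way to structure the body of a wakeup callback; the bit values, the conn_state structure, and the handle_events function are placeholders introduced for illustration, and a real application would use the constants defined in the stack headers:

```c
#include <stdint.h>

/* Placeholder bit values for illustration only; a real application uses
 * the constants defined in the stack headers. */
#define EV_RD    0x01u
#define EV_CONN  0x04u
#define EV_CLOSE 0x08u
#define EV_FIN   0x10u
#define EV_ERR   0x80u

struct conn_state {
    int connected;
    int readable;
    int peer_closed;
    int dead;
};

/* Several events can be reported in one call, so every bit is tested
 * independently instead of using a switch statement. */
void handle_events(uint16_t ev, struct conn_state *c)
{
    if (ev & EV_ERR) {
        c->dead = 1;
        return;              /* nothing else is meaningful after an error */
    }
    if (ev & EV_CONN)
        c->connected = 1;
    if (ev & EV_RD)
        c->readable = 1;     /* application should drain the socket buffer */
    if (ev & EV_CLOSE)
        c->peer_closed = 1;  /* peer sent FIN; we may still send */
    if (ev & EV_FIN) {
        c->connected = 0;
        c->dead = 1;         /* socket must not be used after this */
    }
}
```

Recording the events in a per-connection state structure keeps the callback short, and lets the main loop decide when to read, write, or release the connection.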

The callback interface provided by the TCP/IP stacks may be a little obscure to use at the beginning, but it is a very efficient way to achieve higher throughput when correctly implemented in the application.

Both the TCP/IP stacks we have analyzed are capable of providing more standardized APIs in combination with an operating system, by running the TCP/IP library main loop in a separate thread and providing access to the sockets using system calls.

Socket communication is only one of the APIs exposed by the TCP/IP stacks. Other protocols implemented by the stack provide their own function signatures; these are described in both libraries’ manuals.

Connectionless protocols

TCP is a widely used transport protocol, wherever the connection-oriented paradigm makes sense for the application. Its connectionless counterpart, UDP, is mostly used to solve a different range of problems, but it can, in some cases, cover all the needs of a small, resource-limited embedded system. TCP implementations are, in fact, large and, on some platforms, they take up a considerable portion of the available flash space. This is due to the complex internal mechanisms of TCP, which result in a lot of code to include to manage retransmissions, timeouts, and acknowledgments; organize buffers; and keep track of multiple state machines for each socket.

UDP, on the other hand, is quite simple and applies few transformations to the data from the socket interface to the network and vice-versa. Typically, UDP implementations are much smaller in size and due to the lack of reliability requirements, do not need to keep track of the order and gaps in the data already transmitted or received, impacting the runtime RAM usage as well. When the network characteristics permit it, using UDP for low-traffic redundant data transmission is often a viable option.

Mesh networks and dynamic routing

As previously mentioned, a link-layer protocol may be able to implement mesh-under mechanisms, which hide the complexity of the topology for the upper layers. A different approach is applied when the link-layer protocol does not implement this feature, or whenever the mesh solution may be extended across different network interfaces, and thus must implement a standard protocol that is interface-agnostic. Each link connects two devices in direct visibility, which, in turn, coordinates to detect the optimal network path to reach a remote node, based on the detected topology. Intermediate nodes along the path are configured to route the traffic toward the destination, based on the information available on the current topology:

Figure 9.1 – Example of a mesh network topology (node A chooses node C to route packets towards I, after detecting the optimal four-hop route)

In some scenarios, the topology is not fixed but evolves when nodes in the path become unavailable or change their location, altering their direct visibility with adjacent nodes. Mesh networks with non-static topology are referred to as Mobile Ad Hoc Networks (MANETs). Dynamic routing mechanisms designed for MANETs must be able to react to topology changes and update their routes accordingly, as the network is in continuous evolution.

Route-over mesh mechanisms are implemented within the TCP/IP stack because they must be able to reconfigure the IP routing table at runtime, and access socket communication. Mesh networks based on dynamic IP routing rely on different protocols, which can be divided into two categories:

  • Proactive dynamic-routing protocols: Each network node sends a broadcast message to announce its presence on the network, and other nodes can detect a neighbor’s presence by reading the messages, and communicating the neighbor list to the neighbors. The mesh network is ready to use at all times and requires a fixed reconfiguration time on topology changes.
  • Reactive dynamic-routing protocols: Nodes can be idling when there is no data to exchange, and then the path is configured by querying every neighbor, asking for a route to the destination. The message is then repeated, increasing a counter to keep track of the hops, until it reaches the destination, at which point, using the reply, the network can define the path requested by the sender. These mechanisms imply that dynamic routes are formed on demand, so the first messages of the communication can suffer an additional delay; on the other hand, they require less power and may react faster to topology changes.

The most widely used protocols in the former group are the following:

  • Optimized Link-State Routing (OLSR), standardized by IETF in RFC3626 and RFC7181
  • Better Approach to Mobile Ad Hoc Networking (B.A.T.M.A.N.)
  • Babel (IETF RFC6126)
  • Destination-Sequenced Distance Vector (DSDV)

The reactive, on-demand routing protocols standardized by IETF are the following:

  • Ad-hoc, On-demand, Distance Vector (AODV), RFC3561
  • Dynamic Source Routing (RFC4728)

The choice of a routing protocol depends, once again, on the requirements of the mesh network that needs to be built. Reactive, on-demand protocols are the best fit in networks with sporadic data and battery-powered nodes, where a longer reaction time from the routing protocol is acceptable. Always-on embedded systems may benefit from proactive routing mechanisms instead, which ensure that the routing tables are always updated to the last known state of the network, and that each node knows the best route toward each possible destination at all times. At the same time, these mechanisms require regular updates to travel across the network in the form of broadcast packets, constantly refreshing the status of the network nodes and their neighbors.

picoTCP, which has been designed to provide advanced routing technologies for IoT devices, supports one mesh-under mechanism, in the 6LoWPAN link layer, and two route-over protocols, namely OLSR (proactive) and AODV (reactive), giving broader choices for integrating TCP/IP communication into mobile, ad hoc networks. To enable OLSR, for example, it is sufficient to compile the stack with support for OLSR, and the OLSR daemon service will automatically be enabled and run within the main TCP/IP stack loop. All the devices that must participate in the definition of the mesh network must be added by calling pico_olsr_add:

pico_olsr_add(struct pico_device *dev);

AODV networking can be enabled similarly, and the interfaces are added using the pico_aodv_add function:

pico_aodv_add(struct pico_device *dev);

In both cases, the services will run transparently for the user and alter the routing table every time a new node is detected on the network in the case of OLSR, or, in the case of AODV, every time that we request communication to a remote node and an on-demand route is created to reach it. Nodes that are not in direct visibility are reached through a first-hop gateway that guarantees that the destination node can be reached. The routing metric is used as an indication of the number of hops, so that when a new, shorter route is found, the existing route is replaced and the communication can continue, ideally with no disruptions caused by the replacement.
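The replacement rule based on the hop count can be sketched with a small, self-contained routing table. This is not the picoTCP routing table API; the route structure and the route_update and route_gateway functions are hypothetical, and addresses are reduced to plain integers to keep the example short:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 8

struct route {
    uint32_t dest;      /* destination node address */
    uint32_t gateway;   /* first hop toward the destination */
    uint8_t  metric;    /* hop count */
    uint8_t  valid;
};

static struct route table[MAX_ROUTES];

/* Install a route, or replace the existing one only if the new path is
 * shorter. Returns 1 if the table changed, 0 otherwise. */
int route_update(uint32_t dest, uint32_t gateway, uint8_t metric)
{
    struct route *free_slot = NULL;
    for (int i = 0; i < MAX_ROUTES; i++) {
        struct route *r = &table[i];
        if (r->valid && r->dest == dest) {
            if (metric >= r->metric)
                return 0;            /* keep the current route */
            r->gateway = gateway;
            r->metric = metric;
            return 1;
        }
        if (!r->valid && !free_slot)
            free_slot = r;
    }
    if (!free_slot)
        return 0;                    /* table full */
    free_slot->dest = dest;
    free_slot->gateway = gateway;
    free_slot->metric = metric;
    free_slot->valid = 1;
    return 1;
}

/* Look up the first hop for a destination; returns 0 if unknown. */
uint32_t route_gateway(uint32_t dest)
{
    for (int i = 0; i < MAX_ROUTES; i++)
        if (table[i].valid && table[i].dest == dest)
            return table[i].gateway;
    return 0;
}
```

Because the entry is updated in place, traffic that is already flowing toward the destination simply starts using the new first hop on the next lookup.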

Routing protocols, such as OLSR, can consider other parameters rather than the number of hops when calculating the best path to a given destination in the mesh network. It is possible, for instance, to integrate information about the wireless link quality, such as the signal-to-noise ratio or the indication of the received signal strength, when calculating the best path. This allows us to select routes based on multiple parameters, and always select the best option available in terms of a wireless signal.

Route-over mesh network strategies do not foresee mechanisms to forward broadcast packets, which must be repeated by the link-layer protocol in order to reach all the nodes in the network. However, it is known that implementing such a mechanism can easily trigger a ping-pong effect where a single packet is bounced across two or more nodes, so broadcast-forwarding mechanisms implemented in the link layer must avoid retransmitting the same frame twice by keeping track of the last few frames forwarded this way.
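A duplicate-suppression cache of this kind can be sketched as follows. The example assumes that each broadcast frame can be identified by the address of its originator and a sequence number assigned by that node; this identification scheme, like the seen_entry structure and the should_forward function, is an assumption made for illustration and does not describe a specific link-layer protocol:

```c
#include <stdint.h>

#define SEEN_SLOTS 8   /* how many recent frames are remembered */

struct seen_entry {
    uint16_t origin;   /* address of the node that created the frame */
    uint16_t seq;      /* sequence number assigned by the originator */
    uint8_t  used;
};

static struct seen_entry seen[SEEN_SLOTS];
static unsigned next_slot;

/* Returns 1 if the broadcast frame should be forwarded, 0 if it was already
 * forwarded recently. New frames overwrite the oldest entry. */
int should_forward(uint16_t origin, uint16_t seq)
{
    for (unsigned i = 0; i < SEEN_SLOTS; i++) {
        if (seen[i].used && seen[i].origin == origin && seen[i].seq == seq)
            return 0;
    }
    seen[next_slot].origin = origin;
    seen[next_slot].seq = seq;
    seen[next_slot].used = 1;
    next_slot = (next_slot + 1) % SEEN_SLOTS;
    return 1;
}
```

The size of the cache is a trade-off: it must be large enough to cover the frames that can still be bouncing between neighbors, while staying small enough for the RAM budget of the node.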

For IoT systems in the real world, communication requires implementing security for data in transfer. This includes, but is not limited to, encryption to guarantee the confidentiality of the data transmitted.

Implementing standard security protocols guarantees interoperability between heterogeneous components in the network (for example, between the device and a remote server), in an end-to-end fashion, relying on software solutions that are fully compatible with the protocols used in the classic IT world. The next section approaches transport layer security.

TLS

Link-layer protocols often provide some basic security mechanisms to guarantee the authentication of the client connecting to a specific network and encrypt data by using symmetric ciphers such as AES. In most cases, authentication at the link layer is sufficient to guarantee a basic level of security. Nevertheless, the pre-shared, well-known keys often used in LR-WPAN network stacks may be vulnerable to multiple kinds of attacks, and if a pre-shared key is compromised, an attacker can decipher any traffic that has previously been captured on the same link. In other scenarios, encryption alone is not sufficient to guarantee that the other endpoint is what it claims to be, or that the data flow has not been altered during transmission.

A device that takes part in an IoT distributed system is required to implement a higher grade of security, especially in embedded devices that do not protect the memory in any way and where any backdoor means that attackers can take control of the device, and retrieve all the sensitive information, such as private keys used for authentication and encryption in the communication with remote systems. TLS is a set of cryptographic protocols designed to provide secure communication over standard TCP/IP sockets. The responsibilities of this component are mostly focused on three key requirements for secure communication in distributed systems:

  • The confidentiality of communication between the parties involved through the use of symmetric cryptography. TLS defines cryptographic techniques aimed at generating one-time symmetric keys, which lose their validity at the end of the session they were generated for.
  • The authentication of the parties involved in the communication, using public-key cryptography to sign and verify a challenge payload. Due to the properties of asymmetric keys, only the party that owns the secret private key is able to sign a payload, while anyone can verify the authenticity of the signature by checking it with the public key counterpart of the key that signed the message.
  • The integrity of the communication, using message digests, which verify that the message has not been modified along its path.

A few open source implementations of the required protocol suite to enable standard cryptography algorithms and strategies for secure socket communications are available for the embedded market.

Note

Closed-source, proprietary implementations for security components should be avoided in this context as much as possible because security issues are much harder to track down in a closed system, and the source of the implementation has to be blindly trusted in terms of vulnerability management.

One of the most complete and up-to-date implementations is provided by the free and open source software library, wolfSSL. The library offers the latest standard version of both TLS and DTLS and is designed for performance and reliability on small embedded systems, including support for hardware accelerators and random number generators for many embedded platforms designed for system security.

wolfSSL implements the cryptographic primitives in its core library (wolfCrypt) and groups them in cipher suites used by TLS sockets that can be easily integrated into both bare-metal network applications and any embedded operating system that provides a transport socket communication API. These cryptography primitives are optimized for embedded devices, and use assembly code for the most performance-critical operations.

The main advantage of a TLS/SSL library designed for microcontrollers is that it implements the same protocols as any PC or server on the internet, but with a fraction of the code size, and keeps the resources usage, such as memory usage during the most expensive cryptographic operations, under control at all times.

The adoption of a TLS library with support for bleeding-edge cryptography algorithms allows perfect integration with the security measures implemented in the classic IT infrastructure components of the IoT network. On the cloud side, services meant to be accessed by remote embedded systems should allow the selection of more efficient cipher suites based on elliptic curves, as the classic RSA-based public key encryption requires larger keys and complex calculations to reach the same level of security. New standards for public-key cryptography, such as Curve25519, are included in the TLS 1.3 specifications to provide more efficient key handling for systems with fewer resources while keeping the same security level as older algorithms. Selecting the right set of cryptographic algorithms for TLS communication among heterogeneous systems must take into account the computation times of the operations performed on the target, such as encryption, session key generation, payload signing, and verification.

Securing socket communication

wolfSSL has built-in support for many embedded operating systems, to adapt to the specific memory configurations and socket interfaces provided by different paradigms, and can also be integrated into a bare-metal system with any compatible TCP/IP stack, or easily adapted thanks to a generic, callback-based Input/Output (I/O) interface.

In either case, bare-metal or OS, the application must be designed to access the Secure Socket Layer (SSL) to communicate with the remote system, while the library is responsible for providing the abstraction for the secure communication channel through the transport layer. To integrate TLS sessions on top of an existing bare-metal TCP/IP implementation, wolfSSL can be configured to work in non-blocking mode, polling the system for new packets received on the socket, which must be processed by the TLS layer. The application initiates a TCP connection as usual, either by connecting to a remote socket in the client mode or by accepting new connections from a local listening socket. After the connection is established, wolfSSL assigns a context to it when the application calls wolfSSL_accept or wolfSSL_connect, in server mode or in client mode, respectively, to initiate the TLS handshake with the remote system. Data communication is then available using the wolfSSL_read and wolfSSL_write functions, instead of the normal socket read/write functions exported by the TCP/IP stack so that the stream can be processed by the additional SSL built by the TLS library on top.

The following usage example refers to using wolfSSL to create a TLS socket on top of a TCP connection. The approach for creating a DTLS socket, the TLS equivalent for connectionless sockets, on top of UDP is quite similar, and still uses the same connect/accept paradigm as TLS, despite UDP usually being used in a peer-to-peer fashion that does not draw as clear a distinction between the client and the server side as TCP does. More information about creating DTLS connectionless secure sockets can be found in the wolfSSL user manual (https://www.wolfssl.com/documentation/manuals/wolfssl/index.html).

In our simple usage example, the library is first initialized before accessing any API, using wolfSSL_Init. This is the only requirement to initialize and create new objects that are commonly called contexts. A single context implements one specific method (the TLS v. 1.2 server in this example) and will be associated with one or more existing sockets through a different abstraction called SSL, which, in the case of the wolfSSL implementation, is represented by a variable of type WOLFSSL. Multiple SSL objects generated from the same context share the same set of cryptography keys and I/O callback functions that wolfSSL can use to query the system for incoming data, or transmit the processed data through the socket connection:

wolfSSL_Init();
WOLFSSL_CTX *ctx;
ctx = wolfSSL_CTX_new(wolfTLSv1_2_server_method());
wolfSSL_SetIORecv(ctx, wolfssl_recv_cb);
wolfSSL_SetIOSend(ctx, wolfssl_send_cb);

The two callbacks are implemented in the system to access socket communication in the TCP/IP stack, by using the system-specific TCP socket API. Suppose, for example, that a custom TCP implementation exports read and write functions as tcp_socket_read and tcp_socket_write in a bare-metal context, and that these functions return 0 when no action is taken because the TCP/IP stack is busy or not ready to process the buffers. The wolfssl_send_cb callback can be implemented to return the size of the processed data in case of success, or the WOLFSSL_CBIO_ERR_WANT_WRITE special value, which indicates that the I/O operation could not be completed without blocking:

int wolfssl_send_cb(WOLFSSL *ssl, char *buf, int sz, void *sk_ctx)
{
  tcp_ip_socket *sk = (tcp_ip_socket *)sk_ctx;
  int ret = tcp_socket_write(sk, buf, sz);
  if (ret > 0)
    return ret;
  else
    return WOLFSSL_CBIO_ERR_WANT_WRITE;
}

The read callback follows the same pattern, and uses the WOLFSSL_CBIO_ERR_WANT_READ special value to indicate that no data is available to process from the stack:

int wolfssl_recv_cb(WOLFSSL *ssl, char *buf, int sz, void *sk_ctx)
{
  tcp_ip_socket *sk = (tcp_ip_socket *)sk_ctx;
  int ret = tcp_socket_read(sk, buf, sz);
  if (ret > 0)
    return ret;
  else
    return WOLFSSL_CBIO_ERR_WANT_READ;
}

For most commonly used operating systems and TCP/IP stack APIs, wolfSSL already provides default I/O callbacks, so implementing custom callback functions is not required as long as you activate the correct configuration options.

The wolfSSL_CTX object, associated with SSL objects for every connection, must be equipped with a set of certificates and keys prior to initiating any communication. In a more complex system, certificates and keys are stored in the filesystem and can be accessed when wolfSSL has been integrated to use file operations. In embedded systems where filesystems are often not supported, certificates and keys can be stored in memory instead, and loaded into the context using pointers to their locations in memory:

/* certificate_len and key_len are the sizes in bytes of the two DER buffers */
wolfSSL_CTX_use_certificate_buffer(ctx, certificate, certificate_len, SSL_FILETYPE_ASN1);
wolfSSL_CTX_use_PrivateKey_buffer(ctx, key, key_len, SSL_FILETYPE_ASN1);

The socket context that is passed to the callbacks is set after the underlying TCP connection is established. For a server, this can be done right after the accept function returns, while a client can associate the socket with the specific SSL object after the connect function has returned successfully. Accepting an SSL connection on the server side requires the application to call wolfSSL_accept so that the SSL handshake can be finalized before any actual data is transferred. The SSL accept procedure should follow the socket accept call, after the pointer to the TCP/IP socket object has been associated with the SSL object as its context. This pointer is then passed as the sk_ctx argument to the callbacks related to this socket:

tcp_ip_socket *new_sk = accept(listen_sk, origin);
WOLFSSL *ssl = wolfSSL_new(ctx);
if (new_sk) {
  /* The socket pointer is handed back to the I/O callbacks as sk_ctx */
  wolfSSL_SetIOReadCtx(ssl, new_sk);
  wolfSSL_SetIOWriteCtx(ssl, new_sk);

wolfSSL_accept is called after setting the socket context, because the accept mechanism may already need to call the underlying stack to progress through its states:

  int ret = wolfSSL_accept(ssl);

If the SSL handshake is successful, wolfSSL_accept returns the WOLFSSL_SUCCESS special value, so the secure socket is now ready for communication through the wolfSSL_read and wolfSSL_write functions. When running in a bare-metal application, wolfSSL_read and wolfSSL_write must be used in non-blocking mode, by setting this flag at runtime on the SSL session object:

  wolfSSL_set_using_nonblock(ssl, 1);

Using non-blocking I/O for wolfSSL functions ensures that the event-driven main loop model previously described for transport sockets can be kept, because calling library functions never stalls the system. API functions in wolfSSL are designed to return immediately and report specific conditions (such as WOLFSSL_ERROR_WANT_WRITE and WOLFSSL_ERROR_WANT_READ, which can be retrieved through wolfSSL_get_error) to indicate that the operation is in progress, and that the associated function (for example, wolfSSL_accept in this case) should be called again later, when new data from the underlying TCP socket is available.
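The retry logic described here can be sketched as a small state machine driven by the main loop. The following is only an illustration under stated assumptions: the hs_result values, the secure_conn structure, and the fake_attempt function are hypothetical names that are not part of wolfSSL. In a real integration, the attempt function would wrap wolfSSL_accept and translate the value reported by wolfSSL_get_error into one of these results.

```c
#include <stddef.h>

/* Hypothetical outcome of one non-blocking handshake attempt. In a wolfSSL
 * integration these would be derived from the return value of
 * wolfSSL_accept() and from wolfSSL_get_error(). */
enum hs_result { HS_DONE, HS_WANT_READ, HS_WANT_WRITE, HS_FAILED };

enum conn_state { CONN_HANDSHAKE, CONN_READY, CONN_CLOSED };

struct secure_conn {
    enum conn_state state;
    enum hs_result (*attempt)(void *session); /* one non-blocking attempt */
    void *session;                            /* opaque session object */
};

/* Called by the main loop whenever the stack reports socket activity. */
void secure_conn_on_event(struct secure_conn *c)
{
    if (c->state != CONN_HANDSHAKE)
        return;
    switch (c->attempt(c->session)) {
    case HS_DONE:
        c->state = CONN_READY;
        break;
    case HS_WANT_READ:
    case HS_WANT_WRITE:
        /* Not an error: go back to the main loop, retry on the next event */
        break;
    default:
        c->state = CONN_CLOSED;
        break;
    }
}

/* Stand-in for a real handshake, used only to exercise the state machine:
 * it needs *remaining more socket events before it completes. */
enum hs_result fake_attempt(void *session)
{
    int *remaining = session;
    if (*remaining > 0) {
        (*remaining)--;
        return HS_WANT_READ;
    }
    return HS_DONE;
}
```

With this structure, the main loop never waits for the handshake. Each socket event triggers at most one attempt, and the connection moves to CONN_READY only when the attempt reports completion.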

Once the communication between transport endpoints is secured, it is possible to exchange data using secure socket communications. What follows is an overview of some of the most common application protocols used by IoT systems.

Application protocols

In order to be able to communicate with remote devices and cloud servers in a distributed scenario, embedded systems must implement standard protocols that are compatible with the existing infrastructure. Two of the most common approaches taken when designing remote services are as follows:

  • Web-based services
  • Message protocols

The former is mainly the classic, client-server, Representational State Transfer (REST)-based communication that is popular in web services accessed through personal computers or portable devices. Web services require no particular adaptation on the cloud side to support embedded systems, except for the choice of an embedded-friendly set of cipher suites, as described in the Securing socket communication section. However, the request-reply communication model introduces some restrictions on the design of distributed applications. If both endpoints agree, an HTTP connection can be upgraded to WebSocket, a protocol that provides the abstraction of a symmetric, bidirectional channel on top of HTTP services.

Message protocols are a different approach that better reflects the functions of an embedded system acting as a sensor or an actuator, where information is exchanged using short binary messages, which can be relayed by intermediate agents and gathered or distributed by server nodes. Message protocols are the preferred choice when the network includes smaller nodes, because of the simpler representation of the data. Web services, by contrast, are mostly based on human-readable strings, which add a much larger overhead to the size of the data transported and to the memory footprint of the targets that have to handle the ASCII strings.
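To give a rough idea of the difference in size, the following sketch encodes the same temperature reading both as a short binary message and as a human-readable string. The 3-byte frame layout, the field names, and the function names are assumptions made up for this illustration, and do not belong to any standard protocol.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical 3-byte binary frame:
 *   byte 0:    sensor id
 *   bytes 1-2: temperature in hundredths of a degree, big-endian */
size_t reading_encode_binary(uint8_t *out, size_t size,
                             uint8_t sensor_id, int16_t temp_c100)
{
    uint16_t raw = (uint16_t)temp_c100;
    if (size < 3)
        return 0;
    out[0] = sensor_id;
    out[1] = (uint8_t)(raw >> 8);
    out[2] = (uint8_t)(raw & 0xFF);
    return 3;
}

/* The same reading as a human-readable string.
 * Returns the length written, or 0 if the buffer is too small. */
size_t reading_encode_text(char *out, size_t size,
                           uint8_t sensor_id, int16_t temp_c100)
{
    int n = snprintf(out, size, "{\"sensor\":%u,\"temp_c100\":%d}",
                     (unsigned)sensor_id, (int)temp_c100);
    if (n < 0 || (size_t)n >= size)
        return 0;
    return (size_t)n;
}
```

For sensor 7 reporting 21.50 degrees (the value 2150), the binary frame takes 3 bytes, while the text form takes 29 bytes, before any HTTP headers are added.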

In both cases, TLS should be supported at the infrastructure- and device-level, for end-to-end encryption and reliable device identification. Plaintext authentication and pre-shared key encryption are obsolete techniques and thus should not be part of the security strategy of modern distributed systems.

Message protocols

Message-based communication protocols are not a novelty in computer networking software, but they have found a particularly good match in distributed IoT systems. This is especially true in scenarios where a one-to-many, message-based model makes it possible to reach many devices at a time and establish bidirectional communication, or where multiple devices at different locations communicate with each other through an external server that acts as a communication broker. The lack of standardization in this area has led to several different models, each one with its own API and network protocol definition.

Some open standards in particular, however, have been designed to implement secure distributed messaging specifically tailored for systems with reduced resources and networks with limited bandwidth, by including specifications that are feasible to implement within a small code footprint. This is the case with the Message Queuing Telemetry Transport (MQTT) protocol. Thanks to its publisher-subscriber model and the possibility to interconnect embedded devices at different physical locations over TCP/IP, MQTT has become widely used and is supported by several cloud architectures.

The protocol relies on TCP for establishing connections to a central broker, which dispatches messages from publishers to subscribers. Publishers push data for a certain topic, identified by a hierarchical, path-like topic name, and subscribers can specify filters for the topics they want to follow upon connection, so that the broker forwards only the messages matching the filters.
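In MQTT, topic names are organized in levels separated by the '/' character, and a filter may contain two wildcards: '+' matches exactly one level, while '#' as the last level matches any number of remaining levels. The matching performed on the broker side can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the filter is assumed to be well formed, and the special handling of topics starting with '$' is left out.

```c
#include <stdbool.h>

/* Returns true if 'topic' matches the subscription 'filter'.
 * Assumes a well-formed filter: no validation is performed. */
bool mqtt_topic_matches(const char *filter, const char *topic)
{
    while (*filter) {
        if (*filter == '#') {
            /* Multi-level wildcard: valid only as the last character,
             * matches whatever is left of the topic */
            return filter[1] == '\0';
        }
        if (*filter == '+') {
            /* Single-level wildcard: skip one whole topic level */
            while (*topic && *topic != '/')
                topic++;
            filter++;
            continue;
        }
        if (*topic == '\0') {
            /* Topic ended first: "a/#" still matches the parent "a" */
            return filter[0] == '/' && filter[1] == '#' && filter[2] == '\0';
        }
        if (*filter != *topic)
            return false;
        filter++;
        topic++;
    }
    /* Filter consumed: it is a match only if the topic ended too */
    return *topic == '\0';
}
```

For example, the filter home/+/temperature matches home/kitchen/temperature but not home/kitchen/humidity, while home/# matches every topic below home, as well as home itself.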

A few client library implementations exist for small, embedded devices too, although many of them lack support for security mechanisms. The protocol supports a plaintext password-authentication mechanism, which is not a valid security measure and should never be used on top of clear TCP/IP communication, because passwords can easily be intercepted along the path.

According to the standard, as an alternative to plain TCP communication through the IANA-registered TCP port 1883, it is possible to establish an SSL session, which uses TCP port 8883. A secure implementation that uses SSL sessions on top of TCP is provided by wolfSSL, in a separate GPL library called wolfMQTT. This library offers secure MQTT socket connections by default. It is capable of implementing both client and server authentication through certificates and public keys, and provides symmetric-key encryption through the established session.

The REST architectural pattern

REST is a term introduced by Roy Fielding to describe the pattern used by web services to communicate with remote systems using a stateless protocol. In a REST-compliant system, resources are accessed in the form of HTTP requests targeting a specific URI, using the same protocol stack as web pages obtained through a request from a remote browser. In fact, REST requests are extended HTTP requests, representing all data as encoded strings, transported through TCP in a readable HTTP stream.

Adopting this pattern provides a number of architectural benefits on the server side, and allows us to build distributed systems with very high scalability. Although the pattern is not very efficient, and was definitely not designed with the resources of embedded systems in mind, embedded systems can interact with remote web services exposed by a RESTful system by implementing a simple REST client.
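As a minimal sketch of what such a client has to produce, the following function formats an HTTP/1.1 GET request into a caller-provided buffer. The function name, the host, and the resource path used in the usage note are placeholders chosen for this illustration. The resulting buffer would then be sent through the socket, or through wolfSSL_write when the connection is secured with TLS.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats a GET request for 'path' on 'host' into 'out'.
 * Returns the request length, or 0 if it does not fit in 'size' bytes. */
size_t rest_build_get(char *out, size_t size,
                      const char *host, const char *path)
{
    int n = snprintf(out, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Accept: application/json\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return 0;
    return (size_t)n;
}
```

Calling rest_build_get(buf, sizeof(buf), "api.example.com", "/rooms/kitchen/temperature") produces a request of a little over 100 bytes for a reading that would fit in a few bytes, which illustrates the overhead mentioned earlier. Parsing the reply, which is also a text stream, is left out of this sketch.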

Distributed systems – single points of failure

Designing distributed systems also means taking into account link defects, unreachable gateways, and other failures. Embedded devices should not stop working when disconnected from the internet, but rather offer fallback mechanisms based on local gateways. Consider, for example, a domotic (home automation) IoT system for controlling all the heating and cooling units in a house, accessible from portable devices and coordinated remotely using any network access. Temperature sensors, heaters, and coolers are controlled using a mesh network of embedded devices, while the central control is on remote cloud servers. The system can control the actuators remotely based on user settings and sensor readings. This makes the service accessible even from a remote location, allowing the user to set the desired temperature in each room through commands sent from user interfaces, which are processed and relayed by the cloud to reach their destination in the embedded devices. As long as all the components are connected to the internet, the IoT system works as expected.

Nevertheless, in the case of connection failure, users will not be able to control the system or activate any function. Terminating the application service on a local device within the local area network ensures the continuity of the services across failures of the link to the internet and any issues that would prevent the local network from accessing the remote cloud device. If this kind of mechanism is in place, a system disconnected from the internet would still provide a failover alternative to access sensors and actuators, assuming that all the actors at play are connected to a common LAN. Moreover, having a local system processing and relaying settings and commands reduces the latency of the actions requested because requests do not have to travel across the internet to be processed and forwarded back to the same network. Designing reliable IoT networks must include a careful assessment of the single points of failure among all the links and devices used to provide services, and this must include the backbone link used to reach services, message brokers, and remote devices that can cause malfunctions or other issues on the entire system.
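One possible fallback policy on the device side can be sketched as follows: the device keeps track of consecutive failures to reach the cloud service, and switches to the local gateway once a threshold is exceeded. All the names and the threshold value are assumptions made for this illustration. A real system would also need to decide how often to probe the cloud link again while running on the local gateway.

```c
#include <stdbool.h>

#define CLOUD_MAX_FAILURES 3 /* hypothetical threshold */

enum endpoint { ENDPOINT_CLOUD, ENDPOINT_LOCAL_GATEWAY };

struct link_monitor {
    unsigned cloud_failures; /* consecutive failed attempts to reach the cloud */
};

/* Records the outcome of the latest attempt to reach the cloud service. */
void link_report(struct link_monitor *m, bool cloud_ok)
{
    if (cloud_ok)
        m->cloud_failures = 0;
    else if (m->cloud_failures < CLOUD_MAX_FAILURES)
        m->cloud_failures++;
}

/* Selects where the next request should be sent. */
enum endpoint link_select(const struct link_monitor *m)
{
    if (m->cloud_failures >= CLOUD_MAX_FAILURES)
        return ENDPOINT_LOCAL_GATEWAY;
    return ENDPOINT_CLOUD;
}
```

In this sketch, a single successful exchange with the cloud is enough to switch back. A system that favors low latency could invert the policy and prefer the local gateway whenever it is reachable.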

Summary

This chapter has given us an overview of the design of machine-to-machine distributed systems and IoT services, including connected embedded devices, with a focus on security elements that are too often overlooked or underestimated in embedded development. The technology proposed allows full, professional-grade, secure, and fast TCP/IP connectivity on very small targets and uses state-of-the-art technology, such as the most recent version of the TLS cipher suites. Several approaches have been considered, both in terms of hardware and software technologies available for microcontroller-based targets, for a broader view of the technologies, protocols, and security algorithms available for building distributed embedded systems.

The next chapter will illustrate the multitasking possibilities of modern embedded microcontrollers by explaining how to write a small scheduler for Cortex-M microprocessors from scratch, and will summarize the key roles of a real-time operating system running on an embedded target.
