Ruchika Gupta*; Pankaj Gupta†; Jaswinder Singh‡ * Software Architect, AMP & Digital Networking, NXP Semiconductor Pvt. Ltd., Noida, India
† Senior Software Staff, AMP & Digital Networking, NXP Semiconductor Pvt. Ltd., Noida, India
‡ Software Director, AMP & Digital Networking, NXP Semiconductor Pvt. Ltd., Noida, India
Networking, access, and industrial embedded systems are multibillion-dollar industries. However, the economic impact of these embedded systems becoming unavailable (or practically unavailable due to a lack of trust in their operation) is many times larger. Embedded systems have historically been developed as closed systems, using proprietary code, developed in-house, running on uniprocessors with OSs protecting applications from each other. The trend is now increasingly toward multicore processors, running independent OSs, open to licensed applications or possibly even end-customer developed code. Open-source code is increasingly being treated as modules, to be downloaded and plugged into the holes in an OEM’s own software offerings, despite having unknown origins and without thorough analysis for backdoors. This trend brings security vulnerabilities. With the advent of the IoT, all devices are connected. Security is once again the prime concern when everything is connected in such a way—whether it be the security of the connected device itself or the security of the data that is being processed by the device.
This chapter explains the threats faced by connected embedded devices. Readers will also gain insight into how to build a secure embedded device.
Security; Security policy; Confidentiality; Integrity; Cryptology; Embedded systems
The word “security” can have different meanings to different readers. A few definitions, which may come to mind when you hear the word “security,” are provided here:
Combining all these definitions of security into one gives:
Security is the set of measures taken by an entity to provide a degree of resistance to, or protection from, harm to the resources for which that entity bears responsibility.
An embedded system is an electronic product that comprises a microprocessor or multiple microprocessors executing software instructions stored on a memory module to perform an essential function or functions within a larger entity.
Now consider the securing entity to be an embedded system: one that takes measures to provide resistance to, or protection from, harm to the resources of the overall system, to which the embedded system is either:
Embedded security trends clearly show increasing system complexity.
With ever-growing demands for enhanced capability, increased digitization of new manual and mechanical functions, and turning dumb embedded devices into smart ones via interconnectivity, the complexity of the embedded system is increasing. Although these electronic complexities bring betterment to mankind they also bring security vulnerabilities.
The vulnerability of security cannot necessarily be attributed to electronic complexity if such complexity can be effectively managed.
Complexity gives birth to flaws, which are later misused to circumvent system security.
Complexity cannot be measured by code size or transistor count alone. Let's take a few examples to understand this better:
To evaluate an electronic product's security strength, it is important first to understand its security policies, so that the robustness of the product's security can be judged by its adherence to those policies.
Security policies are a set of defined steps/measures to achieve a defined level of protection for specific resources. Policies are simply created to counter threats.
Hence, to define a security policy, a prerequisite is to identify the resource requiring protection and granularize the expected protection offered by the entity, considering:
Each of the defined security policies can be mapped to one, two, or all three of the CIA Triad:
To minimize the impact of a violation of security policy, another aspect is added to the CIA Triad, called isolation (Fig. 1).
If the CIA Triad is broken on one isolated execution plane it will not hamper the CIA Triad on the other isolated execution planes.
There can be separate security policies governing the communication between two isolated execution planes. Security policies depend on the following questions:
Physical security policies are security policies detailing countermeasures to the physical threat to embedded systems.
Cryptology is the science of secure data communication and data storage in a secret form. It comprises cryptography and cryptanalysis (Fig. 2).
Let’s consider the problem of two legitimate people, Alice and Bob, who want to communicate data secretly over a communication channel. This channel is deemed unsecure as any illegitimate user, say Eve (an eavesdropper), has access to the channel and can easily hamper confidentiality and data integrity (Fig. 3).
Alice and Bob can encode/encrypt the data while sending and decode/decrypt the data upon receiving. This would block the illegitimate user Eve from decoding the data sent over the unsecure channel. This technique is called cryptography. Cryptography refers to communication techniques derived from mathematical concepts and a set of rule-based calculations, called algorithms, to transform messages in ways that are hard to decipher, for secure communication (Fig. 4).
In an ideal world, Alice and Bob would keep the algorithm/technique used to encrypt and decrypt the data secret, so that Eve cannot decode it. However, keeping the algorithm secret is neither sensible nor practical. Moreover, making the algorithm public hardens it by allowing cryptanalysts to evaluate and challenge it. Using an algorithm that has not been publicly disclosed is never recommended (Fig. 5).
However, now that the algorithm used to obfuscate the data is public, we need ways to prevent Eve from decrypting the message. The solution is that Alice and Bob should have a preshared secret which Eve is unaware of. This preshared secret is called the key to the algorithm. The security of this key is paramount (Fig. 6).
Without access to the key, Eve's next resort is brute force.
A brute-force attack is a trial-and-error technique in which the attacker tries every permutation and combination of keys until the ciphertext deciphers into meaningful plain text. This attack is both time and resource consuming; since it is trial and error, the time it takes depends on the key space.
What is key space? The key space is the finite set of all possible keys that can be applied to an algorithm to decipher meaningful content.
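To make the role of key space concrete, the sketch below brute-forces a deliberately tiny key space: a single-byte XOR "cipher" (a hypothetical toy, not a real algorithm) with only 256 possible keys. Real ciphers make this enumeration infeasible; AES-128 alone has 2^128 keys.

```python
# Toy single-byte XOR "cipher" (hypothetical, for illustration only).
secret_key = 0x5A
plaintext = b"launch code 1234"
ciphertext = bytes(b ^ secret_key for b in plaintext)

# Brute force: enumerate the entire key space (just 256 keys here) and
# stop when a candidate deciphers into "meaningful" plain text.
recovered = None
for guess in range(256):
    candidate = bytes(b ^ guess for b in ciphertext)
    if candidate.isascii() and b"code" in candidate:
        recovered = guess
        break

assert recovered == secret_key  # found by pure trial and error
```

Each extra key bit doubles the search, which is why the time taken by a brute-force attack depends directly on the size of the key space.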
The robustness of any algorithm is governed by the following three things:
A cryptosystem is deemed secure even if every detail about the cryptosystem is public knowledge, except the keys (https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle).
There are different ways of classifying cryptographic algorithms. One is based on the number of keys used in encryption and decryption. Three types of algorithm, each of which will be briefly elaborated, are given below:
Symmetric cryptography uses a single key for encryption and decryption. It is mainly deployed in scenarios which require privacy and confidentiality.
The biggest difficulty with this approach is the distribution of the key.
Symmetric cryptography can be further categorized into stream ciphers and block ciphers.
Stream ciphers operate on a single bit at a time and have a feedback mechanism such that the key is constantly changing.
In this category of symmetric key cipher, digitized plain data digits are combined sequentially with a pseudorandom digit stream (also called a keystream). Each plain data digit is ciphered by XORing it with the corresponding bit of the keystream, one at a time. These ciphers are also called state ciphers, since the encryption of each plain data bit depends on the current state of the cipher.
The simplest example of a stream cipher is the binary additive stream cipher.
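As an illustration, here is a minimal binary additive stream cipher sketch. The keystream generator shown (hashing key, nonce, and a counter) is an assumed construction for demonstration only; real stream ciphers such as ChaCha20 use dedicated keystream designs.

```python
import hashlib

def keystream(key, nonce, length):
    # Pseudorandom keystream built by hashing key || nonce || counter.
    # (An assumed construction for illustration; real stream ciphers
    # use dedicated designs.)
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data, key, nonce):
    # Binary additive stream cipher: output = input XOR keystream.
    # Applying it twice with the same key and nonce restores the input.
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

ct = xor_cipher(b"attack at dawn", b"shared-secret", b"nonce-1")
pt = xor_cipher(ct, b"shared-secret", b"nonce-1")
assert pt == b"attack at dawn"
```

Note that a nonce must never be reused with the same key: XORing two ciphertexts produced with the same keystream cancels the keystream entirely and exposes the plaintexts.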
Block ciphers represent an encryption method where blocks of data are encrypted with a deterministic algorithm using a symmetric key that has been securely exchanged. DES and AES are two popular examples of block ciphers.
Key size: 56 (+ 8 parity) bits.
Block size: 64 bits.
Successors of DES ciphers are: Triple DES, G-DES, DES-X, LOKI89, ICE.
Key size: 128, 192, or 256 bits.
Block size: 128 bits.
Common authenticated modes built on AES include AES-GCM and AES-CCM.
Asymmetric cryptography uses one key for encryption and another for decryption. It is usually used for cases which require authentication, key exchange, and nonrepudiation. This cryptography scheme is also referred to as public key cryptography.
In this type of cipher, a pair of keys is used to encrypt and decrypt the data. The key pair consists of two distinct but mathematically related large numbers: one that can be shared with everyone, called the public key, and one that is never shared, called the private key.
A digital signature scheme can also be implemented using public key cryptography. This scheme has two algorithms, one for signing and the other for verification. Signing requires the use of a secret/private key while verification is done using a matching public key. RSA and DSA are two popular digital signature schemes.
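To see how signing and verification split across the key pair, here is a toy textbook-RSA signature sketch with tiny primes and a stand-in "hash." This is purely illustrative: real RSA uses keys of 2048 bits or more, a genuine hash such as SHA-256, and padding such as PSS.

```python
# Toy textbook-RSA signature with tiny primes (illustration only).
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse of e)

def toy_hash(message, n):
    # Stand-in for a real hash such as SHA-256, reduced into the modulus.
    return sum(message) % n

def sign(message):
    # Signing uses the private key d.
    return pow(toy_hash(message, n), d, n)

def verify(message, signature):
    # Verification uses only the public key (e, n).
    return pow(signature, e, n) == toy_hash(message, n)

sig = sign(b"hello")
assert verify(b"hello", sig)
assert not verify(b"hello!", sig)
```

The essential property is visible even at this scale: anyone holding the public key can verify the signature, but only the holder of the private key could have produced it.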
These algorithms are computationally expensive. The hardness of RSA rests on integer factorization, while that of DSA rests on the discrete logarithm problem. More recently, elliptic curve cryptography (ECC) has been developed and is gaining traction over the RSA algorithm for the following two reasons:
Hash functions use mathematical transformations to irreversibly condense information into a fixed-size digest, without the use of any key. They are typically used where message integrity is required, for example, taking a digital fingerprint of a file's contents or protecting stored passwords. These functions are also called message digests or one-way functions. Some popular examples of hash functions are:
A good random number generator is fundamental to all secure systems. Lots of security protocols require random bits to remain secure. You may often find the word “unpredictable” is used instead of “random.” Either way, the idea is to make it difficult for the attacker to guess a value. Random numbers may be required for applications that use one of the following:
There are three types of RNGs:
Cryptographic algorithms can be implemented:
The layered architecture of the Linux kernel's CryptoAPI, as exercised through OpenSSL, is one example that can be used to understand how cryptographic algorithms are realized (Fig. 7).
All cryptographic algorithms are implemented in software as part of the Software Algorithm Block in the Linux kernel, as shown in Fig. 8.
In Fig. 8, the block above the Algorithm Implementation Block, the algorithm management block, provides:
This is the block which is responsible for off-loading requested cryptographic operations to:
There are multiple reasons why a hardware accelerator is preferred to software implementations:
The strengths of a software implementation are:
The abovementioned strengths become less effective with:
Security has for a long time been focused upon in the world of enterprise. With the advent of the IoT, security in embedded devices is becoming a hot topic. OEMs are usually in a hurry to get products to market and security is usually an afterthought. However, security negligence right from the very start of product development makes devices vulnerable, exposing them to a vast number of attacks.
To ensure the security of embedded systems against a wide range of attacks the entire ecosystem must be protected. Typically, the life cycle of an embedded device involves phases, such as design and development, product deployment, maintenance, and decommissioning. Security needs to be embedded at each of these phases.
The design and development phase starts with conceptualization of the product. This phase further incorporates stages such as requirement gathering, design, development, validation, and integration. Security requirements need to be added at the requirement gathering stages and mitigations need to be propagated through all the different phases. Threat analysis is an important step which needs to be added at this stage. This ensures the correct security requirements are identified and propagated throughout the development process. Another important aspect at this stage is to ensure the secure design of the device. Based on threat analysis a list of security requirements is made available to the product developer. Devices need to be made secure at this stage by utilizing both hardware and software solutions.
Once a product is developed, it is produced on a mass scale and needs to be deployed to the customer’s premises. The manufacturing/production environment also needs to be secured. The possible threats at this stage can be overproduction or cloning of the product.
Once the product is deployed at the customer’s premises, the equipment might require software upgrades. The maintenance phase typically involves taking care of secure upgrades. Furthermore, secure disposal needs to be ensured at the decommissioning stage of the product (Fig. 10).
Secure design practices need to be incorporated at various stages throughout the SDLC.
Given below are various secure design practices that apply at different levels of the development cycle (Fig. 11).
One of the important aspects during the design phase of an embedded product is to get the requirements right. “Threat modeling” helps define the security requirements for an embedded product. We will look in detail at how to do a risk assessment, followed by threat modeling, in the next section.
It is important to keep the design as simple and small as possible. Complex designs increase the likelihood that errors will be made during implementation, configuration, and use. Additionally, the effort required to achieve an appropriate level of assurance increases dramatically as security mechanisms become more complex.
It is essential to decide and follow certain coding standards that govern how developers write code. The coding standard helps in increasing reliability by advocating good coding practices.
Given below are some practices for secure coding that can be added to the coding guidelines.
Static code analyzers help to find code sequences that could result in buffer overflows, memory leaks, and other security and reliability problems. These are designed to analyze an application’s source, bytecode, or binary code to find security vulnerabilities. These tools find security flaws in the source code automatically.
Listed below are the most common errors static code analyzers can detect:
In order to identify all software flaws via static analysis, organizations should use multiple tools from different vendors.
Peer reviews are an important part of the development phase and help in catching issues at a much earlier stage. Most code reviews are aimed at finding coding bugs, design flaws, and coding standard violations. Though these ensure reliable software, emphasis should also be placed on security analysis.
The reviewer should also consider security-relevant characteristics in the code like:
A comprehensive test suite that includes functional, performance, regression, and coverage testing is well known to be one of the best mechanisms to assure software is reliable and secure.
Before final penetration tests are performed by specialized teams, a formal security evaluation stage is needed. This is accomplished through dynamic runtime testing methods [1, 2], for example, fault injection systems can also be used to check for the presence of flaws.
The term “threat analysis” refers to the organized, systematic process of examining an embedded system’s points of target and sources from which it might be attacked. A thorough threat analysis is required before any design decisions can be made regarding what methods of protection to use against any attack on the system.
A sound security policy needs to be established as the foundation for embedded system design. To establish this security policy, all threats to the system need to be identified and possible mitigations should be integrated within the product’s life cycle.
The following section provides steps that can be taken to complete threat analysis when designing an embedded system.
Hunter et al. [3] demonstrate an iterative threat modeling flow for an embedded system. Initially, the first point of weakness and attack needs to be identified. This is followed by a detailed review of security requirements and objectives. Next, modeling must occur for each security requirement, considering the three points of view of an attack scenario discussed above, that is, asset, attacker, and mitigation/defense built in to address the threat. Both the modeling of the specific security requirements and system objects are then iteratively evaluated until a high level of certainty is reached that the model developed provides adequate security against the identified threats and that no new vulnerabilities have been introduced.
Attack vectors are ways in which an attacker attempts to gain access to a system and exploit its vulnerabilities to achieve an objective. To understand how to attack a system it is important to understand the objectives of such attacks. Ravi et al. [4] broadly classify attacks into three categories, based on their functional objective:
These attacks can be launched on either the hardware or the software of an embedded system. Let’s first look into some of the most common attacks. These attacks can be:
There are several ways an attacker, who has physical access to a system, may tamper with it. This tampering can be used to extract secret information or destroy the device. Methods used to achieve this include removing or adding material to the IC to access information. Etching or FIB can be used to remove such materials. Optical inspection can be carried out to read internal signals, probe bus, or memory to extract secret information.
The goal of tamper resistance is to prevent any attempt by an attacker to perform an unauthorized physical or electronic action against the device. Tamper mechanisms are divided into four groups: resistance, evidence, detection, and response. They are most effectively used in layers to prevent access to critical components, and as the primary facet of physical security for embedded systems they must be properly implemented to be successful. From the design perspective, the cost of mounting a successful attack should outweigh its potential rewards.
Specialized materials are used in tamper resistance to make access to the physical components of a device difficult. These include features such as locks, encapsulation, hardened-steel enclosures, or sealing.
Tamper evidence helps ensure that visible evidence is left after tampering has occurred. This can include special seals and tapes which are easily broken, making it obvious that the system has been physically tampered with.
Tamper detection enables hardware devices to be aware they are being tampered with. Sensors, switches, or circuitry can be used for this purpose.
Tamper responses are the countermeasures enacted upon detection of tampering. Measures that can be taken by hardware devices include deletion of secret information and shutting down to prevent an attacker from accessing information.
Side-channel attacks (SCAs) are typically noninvasive attacks in which timing information, power consumption, or electromagnetic radiation from the system is used to extract secret information. As the name suggests, the attacker does not tamper with the device under attack in any way but uses side-channel observations to mount a successful attack. The observations can be made either remotely or physically, using the right tools. The most common side-channel attacks are architectural/cache, timing, power-dissipation, and electromagnetic-emission attacks. Let's try to understand these attacks in a little detail to appreciate how secrets can be extracted through these side channels. The underlying idea of SCAs is to look at the way cryptographic algorithms are implemented, rather than at the algorithms themselves.
SCAs can be instigated because it is possible to find a correlation between the physical measurements taken during computations and the internal state of an embedded device, which itself is related to a secret key. It is this correlation—with a secret key—that the SCA tries to find.
The power consumption of a device can provide information about the operations that take place and the parameters involved. An attack of this type is applicable only to hardware implementations of cryptographic algorithms. Such attacks fall into two categories: Simple Power Analysis (SPA) and Differential Power Analysis (DPA). In an SPA attack, the attacker essentially tries to guess from the power trace which instruction is being executed at a given time and what values the inputs and outputs have; attackers therefore need precise knowledge of the implementation to mount such an attack. A DPA attack does not need knowledge of implementation details and instead exploits statistical methods in the analysis process. DPA is one of the most powerful SCAs that can be mounted, using very few resources. These attacks were introduced by Kocher et al. in 1999 [5]; in that case a hardware DES implementation was under attack.
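The following sketch simulates a difference-of-means DPA in the spirit of Kocher et al.: the "device" leaks the Hamming weight of an S-box output plus noise, and the attacker, knowing only the public S-box and the plaintexts, recovers the key byte. The S-box, leakage model, and parameters are all simplified assumptions for illustration, not any real cipher.

```python
import random

random.seed(42)                # deterministic demo
SBOX = list(range(256))
random.shuffle(SBOX)           # public, nonlinear toy S-box
SECRET_KEY = 0x3C              # held inside the simulated "device"

def hw(x):
    return bin(x).count("1")   # Hamming weight

def power_sample(pt):
    # Simulated power measurement: Hamming weight of the S-box output
    # plus Gaussian noise (a simplified leakage model).
    return hw(SBOX[pt ^ SECRET_KEY]) + random.gauss(0, 0.5)

traces = [(pt, power_sample(pt))
          for pt in (random.randrange(256) for _ in range(3000))]

best_guess, best_diff = None, -1.0
for guess in range(256):
    # Partition the traces by the LSB of the intermediate value predicted
    # under this key guess, then compare mean power of the two groups.
    ones = [p for pt, p in traces if SBOX[pt ^ guess] & 1]
    zeros = [p for pt, p in traces if not SBOX[pt ^ guess] & 1]
    if not ones or not zeros:
        continue
    diff = abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
    if diff > best_diff:
        best_guess, best_diff = guess, diff
```

Only the correct guess partitions the traces consistently with the real leakage, so it produces by far the largest difference of means; wrong guesses average out to near zero.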
Popular countermeasures against SCAs include masking and hiding. Masking breaks the link between the value actually processed and the algorithmic intermediate value: an intermediate value is concealed by combining it with a random number (the mask). To remove the mask at the end, its effect must be tracked through the intervening operations. The mask is unpredictable and unknown to the attacker. Masking can be done at the architectural or chip level.
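A minimal sketch of Boolean masking follows, assuming a purely linear (XOR) operation so the mask passes through unchanged; real implementations must also carry the mask through nonlinear steps such as S-boxes, which is where most of the complexity of masking lies.

```python
import secrets

def masked_key_mix(plaintext_byte, key_byte):
    # Boolean masking: the raw intermediate (plaintext ^ key) never
    # appears; only mask-randomized values are processed.
    mask = secrets.randbelow(256)       # fresh, unpredictable mask per run
    masked = plaintext_byte ^ mask      # conceal the input
    masked_inter = masked ^ key_byte    # XOR is linear: the mask passes through
    return masked_inter ^ mask          # remove the mask at the very end

# Same result as the direct computation, but the intermediate values
# (and hence their power signatures) change on every execution.
assert masked_key_mix(0x3C, 0xA5) == 0x3C ^ 0xA5
```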
“Hiding” helps break the link between processed intermediate values and emitted side-channel information. It tries to make the power consumption of a device independent of the processed intermediates. There are two different strategies adopted for hiding:
No perfect solution has been found so far for hiding. A list of countermeasures proposed by various authors is available for the further study of this topic [6].
Timing attacks are popular and occur when the time taken by cryptographic operations can be used to derive a secret. Cryptographic implementation can be done via hardware or software libraries—both implementation types are vulnerable to this kind of attack.
OpenSSL is a popular crypto library which is often used in Linux web servers to provide SSL functions. Brumley and Boneh [7] demonstrated that timing attacks can reveal RSA private keys from an OpenSSL-based web server over a local network.
In recent times there have been cross-VM timing attacks on OpenSSL AES implementations [8, 9], using cache Flush + Reload measurements. These attacks take advantage of the fact that the executable section of code is shared between processes. When a process is run for the first time, the operating system loads it into physical memory. If another user launches the same process, the operating system sets the page tables of the second process to use the copy already loaded into memory for the first. By measuring the time it takes to access data in shared memory, it is possible to determine whether another process has accessed it.
In the Flush + Reload attack, both the attacker and victim have some shared memory mapped into their own virtual space. The attacker flushes the lines it is interested in, waits for some clock cycles, and then calculates the time it takes to read those lines again. If the read is fast, it means that the victim’s process accessed these lines. These lines can be either in the code area of the victim or some other data the attacker is interested in.
One possible countermeasure for these attacks is to adopt timing-invariant implementations of the algorithms in hardware as well as software.
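As a small software illustration of timing invariance, compare a naive byte-by-byte comparison, whose running time depends on the position of the first mismatch, with Python's constant-time `hmac.compare_digest`:

```python
import hmac

def variable_time_equal(a, b):
    # Naive comparison: returns as soon as a byte differs, so the running
    # time leaks how many leading bytes of the guess were correct.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a, b):
    # hmac.compare_digest inspects every byte regardless of where the
    # first mismatch occurs, removing this timing side channel.
    return hmac.compare_digest(a, b)

expected_mac = b"\x12\x34\x56\x78"
assert constant_time_equal(expected_mac, b"\x12\x34\x56\x78")
assert not constant_time_equal(expected_mac, b"\x12\x34\x56\x00")
```

An attacker comparing guessed MACs against the naive version could recover the secret one byte at a time from timing alone; the constant-time version denies that signal.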
In recent times, these cache attacks have been used in the well-known Meltdown and Spectre attacks, which we will discuss in more detail in a case study later in this chapter.
Fault attacks are active attacks against cryptographic algorithms where the hardware is exposed to random and rare hardware and software computation faults. These faults result in errors which in turn can be used to expose secrets on the chip.
The most common fault injection techniques include underpowering, temperature variation, voltage bursts, clock glitches, optical attacks, and electromagnetic fault injection. All these fault injection methods manipulate the physical layer of the device causing the transistors to switch abnormally.
Clock glitches and voltage bursts/spikes are the most popular form of noninvasive fault attacks where no damage is done to the equipment. Clock glitches can be introduced by supplying a deviated clock signal to chips while voltage spikes are introduced by varying the power supply to the chip. Both these techniques can affect the program as well as dataflow. These glitches can cause the processor to skip the execution of instructions, can change the program counter, or tamper with loops and conditional statements. Effects on the dataflow include the possibility of invalid data in memory reads, computation errors, and corrupt memory pointers. Both these fault attacks are easy to implement and are very inexpensive. Some other examples of noninvasive attacks include exposing the chip to very high or low temperatures and underpowering the device.
Another type of fault attack is the optical attack. These are semi-invasive/invasive in nature: a decapsulated chip is exposed to a strong light source. Such attacks require expensive equipment and a complex setup. With a focused laser beam it is possible to set or unset a bit in memory.
An external electromagnetic field can also be used to change memory content. These EM field changes induce eddy currents on the surface of the chip and can cause single bit failures.
Fault attacks can be used to change program flow by attacking critical jumps. For example, say we have an authentication code sequence where a decision needs to be made to pass control to the next image if authentication passes or to stop in the case of failure. An attack on the authentication check can be critical to the execution flow (Fig. 12).
We have just considered an example where attacks on program flow can lead to the skipping of security branches or bypassing of security settings.
There are attacks which are launched on I/O loops in the code. Typically, all programs have I/O loops where copies are happening from buffers. Attacks can be launched on the copy loops to:
These attacks can lead to the wrong initialization of data or keys.
Countermeasures against fault attacks can be applied at the hardware level to prevent fault injections. Examples include active and passive shields. An active shield can consist of a wire mesh over the chip to detect any interruptions of the wire. Passive shields are metal layers that cover the chip completely or partially to prevent optical injection or probe attacks. Light sensors and frequency detectors can be added to the chip to detect clock and voltage glitches. However, these countermeasures are costly, and attackers are always on the lookout for ways to bypass them and come up with novel fault injection methods.
Other countermeasures include protecting the software and hardware, so that faults can be detected. These employ redundancy checks to check if a computation has been tampered with and incorporate fault checks to detect and report faults. A detailed discussion of these countermeasures can be found in [10]. In embedded systems, these faults can be detected in different parts of a processor.
If inputs are supplied to an algorithm or implementation externally, any miss on checking these input parameters can result in a fault. For example, typically in a Chain of Trust, for authenticating an image, signatures, public keys, and their lengths are provided externally. If the software does not do bound checks on these externally supplied parameters, attacks on copy loops may result. Say the user supplies a public key length which is greater than the buffer length allocated internally in the software. If no bound checks are performed on key length this would result in the copying of a key greater than the buffer allocated to it. An attacker can intelligently use this buffer overflow attack and modify some decision-making data lying in the periphery of this buffer. Thus proper checks on any input parameters are essential to prevent these kinds of attacks. Validity checking of input parameters is essential before doing any computation.
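The bound-check discipline described above can be sketched as follows; the function name, buffer size, and error handling here are hypothetical illustrations, not any particular Chain-of-Trust API:

```python
MAX_KEY_LEN = 512  # bytes reserved internally for the key (assumed size)

def load_public_key(key_bytes, declared_len):
    # Validate every externally supplied parameter BEFORE any copy.
    if declared_len != len(key_bytes):
        raise ValueError("declared length does not match supplied data")
    if declared_len > MAX_KEY_LEN:
        # Without this check, the copy below would overrun the buffer
        # in a language without memory safety, letting an attacker
        # clobber decision-making data adjacent to it.
        raise ValueError("key length exceeds internal buffer")
    buffer = bytearray(MAX_KEY_LEN)
    buffer[:declared_len] = key_bytes
    return bytes(buffer[:declared_len])

assert load_public_key(b"\x01" * 64, 64) == b"\x01" * 64
```

Rejecting the oversized input up front is what prevents the buffer overflow attack described above.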
Attacks on processing parts usually attempt to change the program flow by skipping instructions or modifying memory content. Some of the countermeasures against such attacks include:
Meltdown [11] breaks the most fundamental isolation between user applications and the operating system. This attack allows a program to access kernel memory, and thus the secrets of other programs and of the operating system. One way an attacker can trigger a Meltdown attack is by making use of an exception. An exception occurs when a user process tries to access kernel memory. Architecturally, the isolation mechanisms will not allow the access, but in the short window before the exception is actually handled, some of the user's subsequent instructions may be executed out of order. These transient instructions can deliberately access kernel memory. Because of the out-of-order execution, the content, though never architecturally visible to the user process, ends up in the processor's caches. The processor performs a cleanup before returning to the user process after the exception is handled, but the caches are not cleaned up as part of this process. From user space, an attacker can then run the Flush + Reload attack described earlier to extract this information.
You might be wondering about the practical aspects of this attack. In the Linux kernel, keys are usually stored and can be used for various purposes. One purpose being disk encryption. A user may be able to access this key by using the attack outlined above.
Spectre [11] is like Meltdown in the sense that data accessed speculatively ends up in the cache, where it is vulnerable to cache attacks looking to extract it. While Meltdown breaks the isolation between the kernel and a user process, Spectre attacks the isolation between different applications. It allows an attacker to trick error-free programs into leaking their secrets. Examples in the white paper [11] show how Spectre attacks can be used to violate browser sandboxing: JavaScript code can be written to read data from the address space of the browser process running it. There are two variants of the Spectre attack, one which exploits conditional branch misprediction and another which leverages misprediction of the targets of indirect branches. We will discuss these variants in some detail in the following text.
Given below is the example stated in the Specter white paper [12].
Let’s try and define what we are going to steal using this attack. Let us assume that the target program has some secret data stored right after array [13]. This is what the attacker wants to get his hands on.
In the code snipper above, input to the program is x. Bound checking is done on “x” as expected so that extra data beyond the array [13] index in array2 does not get accessed. When the code executes, if the array1_size variable is not in cache, the processor will speculatively fetch and execute the next set of instructions. In this case, if the value of x is greater than array1_size, due to speculative fetch, bound check would be by-passed and the processor would fetch the data in array2 at the location denoted by array1[x]. However, his doesn’t solve our problem, does it? The attacker wants to find out the value of array1[x], with x being his invalid input pointing to some secret data in the target program. How does he find that value? For this he would utilize cache timing attacks as discussed in the previous sections.
Assuming array1 is a uint8_t-type variable, the possible values of array1[x] range from 0 to 255. This means the access into array2 happens at one of the locations 0 * 256 through 255 * 256. The attacker can use cache attacks (Prime + Probe) to fill the entire cache with his own values. If the value at array1[x] is 0x20, then the victim's access to array2[0x20 * 256] evicts the attacker's data occupying that cache set. The attacker can then measure the time needed to re-fetch his own data and infer the value. This is a very simplified example to help users understand how the attack can be implemented. For further details refer to the Spectre white paper [12].
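The attack logic described above can be illustrated with a toy simulation. The snippet below is not a real exploit: the cache is modeled as a Python set of line indices, and the names (`victim_speculative_access`, `attacker_probe`) are hypothetical. It only demonstrates how a byte is recovered from the footprint a speculative access leaves in the cache.

```python
# Toy simulation of the Spectre cache covert channel (not a real exploit).
# A hypothetical "cache" is modeled as a set of cached line indices; the
# victim's speculative access loads array2[secret * 256], and the attacker
# probes every candidate line to see which one became "fast" (cached).

CACHE_LINE = 256  # stride so each possible byte value maps to its own line

def victim_speculative_access(cache, array1, x):
    # Models the speculative read past the bounds check: array2[array1[x]*256]
    secret_byte = array1[x]              # out-of-bounds read under speculation
    cache.add(secret_byte * CACHE_LINE)  # the access leaves a cache footprint

def attacker_probe(cache):
    # Probe step: find which of the 256 candidate lines ended up cached
    for guess in range(256):
        if guess * CACHE_LINE in cache:
            return guess
    return None

# array1 holds public data; the "secret" lives just past its end
array1 = [1, 2, 3, 4]
secret = [0x20]                          # hypothetical secret adjacent in memory
memory = array1 + secret

cache = set()                            # starts "flushed"
victim_speculative_access(cache, memory, len(array1))  # x is out of bounds
recovered = attacker_probe(cache)
print(hex(recovered))                    # → 0x20
```

A real attack must additionally train the branch predictor, evict `array1_size` from the cache, and distinguish cached from uncached lines by timing loads; all of that is abstracted away here.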
Systems with microprocessors utilizing speculative execution and indirect branch prediction may allow unauthorized disclosure of information to an attacker with local user access via side-channel analysis. Here the attacker trains the branch predictor and thereby influences the code that will be speculatively executed. Processors usually have a branch target buffer (BTB) to store branch target predictions.
Embedded devices, specifically smartphones, provide an extensible operating system giving the user the ability to install applications and do a variety of things. With this flexibility comes a wide range of security threats. This highly extensible and flexible environment is usually referred to as the rich execution environment (REE). To protect the assets of the system and assure the integrity of the code being executed alongside the REE, we need a trusted execution environment (TEE). In simple words, a TEE can be defined as the hardware support that is required for platform security.
Global Platform defines a TEE as “a secure area that resides in the main processor and ensures that sensitive data is stored, processed and protected in a trusted environment” (Fig. 13).
A TEE needs to ensure that:
To realize a TEE, an external security coprocessor, such as a TPM, can be connected to the SoC. Since this is a separate chip, it provides complete isolation.
Another way of realizing a TEE is to have an on-chip security subsystem which can fulfill its requirements.
Another architecture can be such that the processor and the other peripherals provide a secure environment without the need for a different entity. ARM TrustZone is an example of such a TEE. This will be considered in detail in the following sections.
The Trusted Computing Group (TCG) is a not-for-profit organization formed to develop, define, and promote open, vendor-neutral, global industry specifications and standards, supportive of a hardware-based Root of Trust, for interoperable trusted computing platforms. It defines the standards for what is called the TPM (Trusted Platform Module). A TPM is a dedicated secure crypto-processor designed to secure hardware or software by integrating cryptographic keys into a device. TPM chips are passive and execute commands from the CPU.
The main objectives of TPM include:
Given below is a high-level block diagram of a TPM chip (Fig. 14).
TPM must be physically protected from tampering. In PCs this can be accomplished by binding it to the motherboard.
There is an I/O port which connects the TPM chip to the main processor. The data transmitted over this I/O port follows standards specified by the TCG. The I/O block is responsible for the flow of information between the components inside TPM, and between TPM and the external bus.
TPM consists of several cryptographic blocks which help provide cryptographic isolation. The RNG block is a true random bit stream generator. Random numbers produced by the RNG can be used to construct keys, provide nonces, etc. A SHA-1 engine calculates hashes which can be used for PCR extension, integrity, and authorization. An RSA engine executes the RSA algorithm, which can be used for signing, encryption, and decryption.
TPM also has some nonvolatile storage to store long-term keys. Two long-term keys are the Endorsement Key and the Storage Root Key; these form the basis of a key hierarchy designed to manage secure storage. NV (nonvolatile) storage is also used to store authorization data such as owner passwords, which are set during the process of taking ownership of the TPM. Some persistent flags related to access control and the Opt-In mechanism are also stored here.
Platform Configuration Registers (PCRs) are used to store integrity measurements. They are reset every time the system loses power or restarts.
A dedicated area of memory is used to hold keys for cryptographic operations; before a key can be used, it must be loaded into the TPM.
TPM acts as the Hardware Root of Trust, providing:
A secure element (SE) is a tamper-resistant platform (typically a one-chip secure microcontroller) capable of securely hosting applications and their confidential and cryptographic data (e.g., key management) in accordance with the rules and security requirements set forth by a set of well-identified, trusted authorities.
The main features of a SE are:
ARM TrustZone technology provides protective measures in the ARM processor, bus fabric, and system peripheral IPs to provide system-wide security. TrustZone technology is implemented in most modern ARM processors, including ARM Cortex-A cores and the latest Cortex-M23- and Cortex-M33-based systems.
ARM TrustZone starts at the hardware level by creating two worlds that can run simultaneously on a single core: a secure world and a nonsecure world. Software either resides in the secure world or the nonsecure world. A switch between the two worlds can be done via a monitor call in Cortex-A processors or using core-logic in Cortex-M processors. This partitioning extends beyond the ARM core to memory, bus, interrupts, and peripherals within an SoC.
The ARM core or processor can run in two modes: secure and nonsecure mode. This state of the processor is indicated by a flag called NS. This flag is propagated to the peripherals through the bus (AMBA3 AXI system bus). Between the bus and the various peripherals, such as external memory and I/O peripherals, sits a gatekeeper. This gatekeeper allows/restricts access to these external resources based on the policies set. Examples of these gatekeepers are:
TrustZone technology within Cortex-A-based application processors is commonly used to run a trusted boot and a trusted OS to create a TEE. Typical use cases include the protection of authentication mechanisms, cryptography, key material, and digital rights management (DRM). Applications that run in the secure world are called Trusted Apps [14].
TrustZone for Cortex-M is used to protect firmware, peripherals, and I/O, as well as provide isolation for Secure Boot, trusted update, and Root of Trust implementations while providing the deterministic real-time response expected for embedded solutions [14].
What is the definition of a trustworthy embedded system? A trustworthy system is a system which does what its stakeholders expect it to do, resisting attackers with both remote and physical access, or else fails safely.
Such a system should have features that allow its stakeholders to prevent or strongly mitigate an attacker’s ability to achieve the following attacks:
A trusted system can be built by rooting the trust in hardware and continuing the Chain of Trust to ensure only authentic software runs on the system (Fig. 15).
Root of Trust begins with a piece of immutable code which cannot be changed during the life cycle of an embedded product. This code lies in the ROM (read-only memory) of the embedded system and is the first to execute after boot. It is the responsibility of this code to ensure the authenticity of the next-level code before passing control to it. This next-level image is responsible for authenticating the next image, and this is how the Chain of Trust continues. Typically, ROM code authenticates the boot loader, the boot loader authenticates the operating system, and the operating system in turn authenticates user-space applications (Figs. 16 and 17).
This CoT is also referred to as a Secure Boot Chain of Trust. Let’s try and understand the significance of authenticating the images. Authentication ensures that the image is from a genuine stakeholder who has the required private key.
Images are typically signed offline using a private key from an asymmetric key pair (e.g., RSA). This signature is then verified using the corresponding public key on the silicon before the image is executed. It is essential to tie the public key used in the authentication process to the underlying hardware Root of Trust. This can be done by comparing the hash of this public key with a hash stored in some secure, immutable memory on the SoC. This memory can be in the form of one-time programmable fuses which are programmed during production when a device is manufactured.
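A minimal sketch of this key-binding step, with hypothetical names throughout and SHA-256 assumed as the hash: a real boot ROM would recover the signed digest by performing an RSA public-key operation on the signature, a step elided here.

```python
import hashlib

def verify_boot_image(public_key: bytes, fused_key_hash: bytes,
                      image: bytes, signed_digest: bytes) -> bool:
    # Step 1: tie the public key to the hardware Root of Trust by comparing
    # its hash against the hash burned into one-time-programmable fuses.
    if hashlib.sha256(public_key).digest() != fused_key_hash:
        return False
    # Step 2: check that the image matches the digest recovered from the
    # signature. A real ROM recovers this digest via an RSA public-key
    # operation on the signature; that cryptographic step is elided here.
    return hashlib.sha256(image).digest() == signed_digest

# Hypothetical values standing in for fuse contents and a signed image
pubkey = b"example-rsa-public-key"
fuses = hashlib.sha256(pubkey).digest()   # programmed at manufacturing time
image = b"bootloader-binary"
digest = hashlib.sha256(image).digest()

print(verify_boot_image(pubkey, fuses, image, digest))        # True
print(verify_boot_image(pubkey, fuses, b"tampered", digest))  # False
```

Note that swapping in a rogue public key fails step 1, and modifying the image fails step 2, which is exactly the property the Chain of Trust relies on.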
To build a Root of Trust in embedded systems, two things are essential:
What we have talked about up to now is referred to as “Secure Boot” where one component authenticates the next component before execution.
There is another commonly used term called “Measured Boot.” Both “Secure Boot” and “Measured Boot” ensure that a platform is executing code which has not been compromised.
Both Secure Boot and Measured Boot start with the Root of Trust and extend a “Chain of Trust.” The CoT starts in the root, and spreads to the boot loader(s), the operating system, and the applications. Once a Root of Trust is established, Secure Boot and Measured Boot do things differently.
In the case of Measured Boot, the current running component measures or calculates the hash of the next component which is going to execute on the platform. This hash is then stored in a way that it can be retrieved later to find out which components ran on the system. The Measured Boot does not make any decision in terms of good or bad, neither does it stop the platform from running. These measurements are used for attestation with remote servers to ensure that the required software is running on the SoC.
One of the main requirements for Measured Boot is that these hashes (measurements) need to be stored in a location which can be trusted and not easily manipulated. This location would serve as Root of Trust for Storage. The TPM is typically used to store these measurements.
The TPM is a small self-contained security processor that can be attached to a system bus as a simple peripheral. More details on the TPM are available in the next section. Here we will discuss in brief the function provided by the TPM which helps in the Measured Boot. One of the functions a TPM provides is called PCRs, used for storing hashes.
These registers in the TPM are cleared only at hardware reset and cannot be written to directly. The value in these PCRs can be “extended,” that is, the existing value of the PCR is taken along with the new value, and they are concatenated, producing a 40-byte value. Then, the hash of that value is taken and stored in the PCR. Thus as the platform boots, each measurement is stored in the PCRs in a way that unambiguously shows which modules were loaded.
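The extend operation described above can be sketched as follows (TPM 1.2 semantics with SHA-1; the helper name `pcr_extend` is an assumption):

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length used by TPM 1.2 PCRs

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    # new_pcr = SHA-1(old_pcr || SHA-1(measured component))
    digest = hashlib.sha1(measurement).digest()
    return hashlib.sha1(pcr + digest).digest()  # hash of the 40-byte value

pcr = b"\x00" * PCR_SIZE          # PCRs start zeroed at hardware reset
pcr = pcr_extend(pcr, b"bootloader image")
pcr = pcr_extend(pcr, b"kernel image")
# The final value depends on every measurement and on their order, so it
# unambiguously reflects which modules were loaded and in what sequence.
print(pcr.hex())
```

Because the old value is folded into each new one, no sequence of extends can be undone or reordered to forge a desired PCR value, which is what makes the stored measurements trustworthy for attestation.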
TPM can report these values, signed by a key that only the TPM can access. The resulting data structure, called a Quote, gives the PCR values and a signature, allowing them to be sent to a Remote Attestation server via an untrusted channel. The server can examine the PCRs and associated logs to determine if the platform is running an acceptable image.
Secure Boot with CoT and Measured Boot together help to create an architecture which is resistant to any malware in the boot software, generally referred to as rootkits.
We have come a long way from the time when embedded systems were meant to run a single application, to where present-day embedded systems behave like mini computers. The smartphone market is an obvious example of this. Gone are the days when devices used to operate in isolated environments. With the advent of the IoT (Internet of Things), we have devices which are always connected to a variety of public or proprietary networks. Connection to the Internet increases the exposure of embedded devices to cyberattacks. Such attacks are not limited to the Internet; even proprietary networks are vulnerable, a good example of this being the Stuxnet attack, in which traditional malware techniques were used to take over a proprietary network in a locked-down facility in Iran.
Over time operating systems have become more and more complex. This increase in size and complexity means that it is not possible to examine all the OS software for security vulnerabilities and issues. These cyberattacks increase the need for built-in security in OSs as these attacks can easily work their way into devices through OS vulnerabilities.
The OS needs to have built-in security features to thwart these attacks regardless of how they enter the system. These features need to focus on:
These can be achieved if the operating system can enforce fine-grained separation between the user and access to resources. This can be done by defining security policies. Furthermore, it is important to ensure that this separation is effective by making sure that execution is completed through a trusted execution path. This path should be free from any flaws and vulnerabilities.
The following sections describe some key security features that can be built in to operating systems for application and data security.
Access control is required to ensure that only authorized users and processes can access resources they are entitled to access. These resources include not only data files but memory, I/O peripherals, and other critical resources of the system.
At a high level, access control policies of a system can be divided into following categories:
DAC, as the name suggests, is discretionary, that is, at the discretion of the user. The user decides the policies on their objects. For example, when a user creates a file, they decide who gets which permissions on that file. These permissions are stored in the inode associated with the file. Thus each object on a DAC-based system has what is called an ACL (access control list). The ACL holds the complete list of users and groups to which the creator of an object has granted access. Here the user who owns the resource gets total control, whether they want it or not. One important point to note is that a user can set/change permissions only for resources they own. There is another category, the superuser, for which the DAC policy can be bypassed when managing the system.
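A toy model of a DAC check, with hypothetical names and a dict standing in for the per-object ACL, might look like this:

```python
# Toy DAC check: each object carries an ACL set by its owner, mapping
# users to permission sets. A hypothetical "root" superuser bypasses it.

def dac_allows(acl: dict, owner: str, user: str, perm: str) -> bool:
    if user == "root":                 # superuser bypasses DAC
        return True
    if user == owner:                  # owner retains total control
        return True
    return perm in acl.get(user, set())

acl = {"bob": {"read"}}                # alice granted bob read-only access
print(dac_allows(acl, "alice", "bob", "read"))    # True
print(dac_allows(acl, "alice", "bob", "write"))   # False
print(dac_allows(acl, "alice", "root", "write"))  # True (superuser)
```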
In MAC, it is not the user but the system administrator or a central user that controls what resources each user gets access to. It is stricter than DAC and helps in containing bugs in the user space software. In MAC, the object is associated with a security label instead of an ACL. The label contains two pieces of information:
Each user account also has a classification and category associated with it. A user is allowed access to an object only if both the category and classification of the object match those of the user. SELinux, AppArmor, and SMACK are some of the widely used implementations of MAC in the Linux world [15].
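The matching rule described above can be sketched as follows. This is a deliberate simplification with hypothetical label fields; real MAC systems such as SELinux enforce far richer policies.

```python
def mac_allows(user_label: dict, object_label: dict) -> bool:
    # Simple rule from the text: both classification and category must match.
    # Real MAC implementations (e.g., SELinux type enforcement) go well
    # beyond this exact-match model.
    return (user_label["classification"] == object_label["classification"]
            and user_label["category"] == object_label["category"])

user = {"classification": "secret", "category": "finance"}
doc_ok = {"classification": "secret", "category": "finance"}
doc_wrong_level = {"classification": "top-secret", "category": "finance"}
doc_wrong_cat = {"classification": "secret", "category": "hr"}

print(mac_allows(user, doc_ok))           # True
print(mac_allows(user, doc_wrong_level))  # False
print(mac_allows(user, doc_wrong_cat))    # False
```

The key contrast with DAC is that neither the object's owner nor the accessing user can change these labels; only the central policy authority can.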
Application sandboxing helps to isolate applications from critical system resources thus adding a security layer to prevent malware from affecting systems. Sandboxing is also sometimes referred to as “jailing.” It provides a safe environment which is monitored and controlled, such that the unknown software cannot do any harm to the host computer system [16]. It ensures that a fault in one application doesn’t bring down the complete system.
Virtualization is one way of achieving application sandboxing. Virtualization is the use of hypervisors or virtual machine monitors to create and manage individual partitions that contain guest OSs on a single real machine. The hypervisor allocates system hardware resources, such as memory, I/O, and processor cores, to each partition while maintaining the necessary separation between operating environments. A hypervisor enables hosting multiple virtual environments over a single physical environment. A critical function of the hypervisor from a security standpoint is to maintain isolation between partitions so that one environment continues running even if another OS crashes [17]. The ability to maintain isolation is highly dependent on the robustness of the underlying hypervisor. There are two types of hypervisors:
Type 1 hypervisors run on bare metal while Type 2 hypervisors have an underlying operating system. Since the security of a Type 2 hypervisor depends on the underlying host operating system, these hypervisors are not used in mission-critical deployments.
CPU hardware assists are generally used for implementation of hypervisors in embedded systems. Popular CPU architectures like PowerPC and ARM have hypervisor extensions defined in them.
Apart from CPU extensions, the ARM architecture also provides another capability called ARM TrustZone, which partitions the system into two zones: secure and nonsecure. Trusted software, which uses secrets such as keys, performs cryptographic operations, runs digital rights management software, and so on, can execute in the secure world, isolated from the nonsecure world. Further details about ARM TrustZone are provided in Section 5.1.
Apart from virtualization, Linux containers are also used to provide application isolation. While hypervisors are used to provide virtualization, containers use the functionality of the underlying OS, such as namespaces, to restrict applications from accessing certain system resources, files, etc. This effectively means that applications share the same operating system but have separate user spaces. With containers, the operating system, not the physical hardware, is virtualized (Fig. 18).
Let’s discuss the security aspects of the two approaches. If an application running in a container has some vulnerability and affects the operating system, all other applications running in containers would be affected. In a similar situation in the case of a virtual machine, only the guest OS running that application is affected, leaving the host OS and other VMs unaffected. While a container uses software mechanisms to achieve isolation, virtualization is backed by hardware and provides more security. However, this added security comes at a price: performance is lower in the case of VMs since many context switches are involved. Containers have lower overheads and are less resource-heavy. So, you need to choose the right sandboxing methodology based on your use case and requirements. For a constrained embedded device, not capable of running virtual machines, containers are often the first practical virtualization technology.
Container adoption is on the rise in IoT devices that have limited storage space, bandwidth, and computing power. Docker is a popular container technology which is built on LXC and has an “easy button” to enable developers to build, package, and deploy applications without requiring a hypervisor.
Attackers usually try to modify existing code or inject malicious code into a system, tricking the user to run it. This can be prevented if applications are authenticated before execution. Authentication helps ensure that application code is from a trusted source and has not been modified. A typical way of doing this is by using certificates and signatures. The hash of an application can be compared with the hash present in the certificate that comes along with the application. The certificate always comes from a trusted authority. For example, Apple iOS implements this by enforcing all applications through the App Store.
Normally asymmetric cryptography, using public and private keys, is used for authenticity, but the same effect can be achieved using symmetric key hashes too, such as HMAC. Both methodologies have their pros and cons. When using a signature, the confidentiality of the private key needs to be ensured by a single authority, and the system just needs to protect the integrity of the public key used for verification. However, since the system doesn’t have the private key, it cannot re-sign the application or file in case any changes are made to it. This scheme can therefore be used for read-only files and applications which don’t change during the lifetime of a system. Security-critical files which do change during the lifetime of a system need to be protected by a local symmetric key instead. The caveat is that this local symmetric key needs to be carefully protected to prevent attackers from using it to sign a malicious application.
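A minimal sketch of the symmetric (HMAC) approach, with a hypothetical key and file contents; HMAC-SHA256 is assumed as the algorithm:

```python
import hashlib
import hmac

# Labeling side (done locally with a carefully protected symmetric key):
key = b"device-local-secret-key"      # hypothetical protected key
contents = b"#!/bin/sh\necho hello\n"
tag = hmac.new(key, contents, hashlib.sha256).hexdigest()

# Verification side, before allowing the file to be used:
def verify(key: bytes, contents: bytes, expected_tag: str) -> bool:
    computed = hmac.new(key, contents, hashlib.sha256).hexdigest()
    return hmac.compare_digest(computed, expected_tag)  # constant-time compare

print(verify(key, contents, tag))     # True: file is authentic
print(verify(key, b"tampered", tag))  # False: file was modified
```

Unlike the signature scheme, the same key both creates and verifies the tag, so the system can re-label files that legitimately change, but leaking this key lets an attacker label malicious files too.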
Linux IMA (Integrity Measurement Architecture) and EVM (Extended Verification Module) are example frameworks which have been built into Linux for application integrity and authenticity. These frameworks have been available in the Linux kernel since 2.6.30. Linux integrity frameworks provide the capability to detect whether files have been accidentally or maliciously altered, either remotely or locally, to appraise a file’s measurement against a “good” value stored as an extended attribute, and to enforce local file integrity. These goals are complementary to the Mandatory Access Control (MAC) protections provided by LSM modules, such as SELinux and Smack, which, depending on the policy, attempt to protect file integrity [17].
At a very high level, the IMA and EVM provide the following functionality:
The IMA maintains a runtime measurement list which can be anchored in hardware (e.g., TPM) to maintain an aggregate integrity over the list. Hardware anchoring in TPM, or in any other way, helps to ensure that the measurement lists can’t be compromised by a software attack. Further details about this infrastructure can be found in the Linux kernel documentation [17].
The Secure Boot mechanism (as shown below) is provided by NXP on its Layerscape trust architecture SoCs. This mechanism establishes a Chain of Trust with every image being validated before execution.
Root of Trust is established in the boot ROM execution phase, by validating the boot loader image before passing control for its execution. Each firmware image is appended with a header. This header contains security information related to the image, such as an image signature or public key. The Chain of Trust ends after validation of the FIT image or kernel image (Fig. 19).
Rootfs is another important entity that needs to be authenticated before passing control for its execution. Rootfs can be used in following ways:
One such mechanism provided by the Linux kernel for validating rootfs content is the IMA EVM feature. This provides file-level authentication as discussed in the previous section.
The IMA EVM uses an encrypted-type key. An encrypted key blob for a user is derived by the kernel using the master key. The master key can be a:
A secure key is generated using the CAAM security engine and consists of random bytes. The key contents are stored in kernel space and are not visible to the user. User space will only be able to see the key blob.
Blobs are special data structures for holding encrypted data, along with an encrypted version of the key used to decrypt the data. Typically, blobs are used to hold data to be stored in external storage (such as flash memory or in an external file system), making the contents of a blob a semipersistent secret. The secrecy of a blob depends on a device-specific 256-bit master key, which is derived from the OTMPK or ZMK on Layerscape trust architecture–based SoCs.
The IMA EVM operates in two modes: fixed mode and enforced mode. Fixed mode is meant to be executed at the factory setup stage; subsequently, the SoC is always booted in enforced mode. Fixed mode generates the key and labels the file system with the IMA EVM security attributes. In enforced mode the attribute values are authenticated or appraised, and access to a file is denied if any mismatch is found in the security attribute values.
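The two modes can be modeled at a very high level as follows. This is a toy using a plain dict as the extended-attribute store and unsigned SHA-256 hashes; real IMA/EVM stores signed or HMACed hashes in the security.ima/security.evm extended attributes.

```python
import hashlib

# Fixed mode labels each file with a hash attribute; enforced mode recomputes
# the hash and denies access on any mismatch.

def label(xattrs: dict, path: str, contents: bytes) -> None:
    # "Fixed" mode behavior: record the good value at factory setup
    xattrs[path] = hashlib.sha256(contents).hexdigest()

def appraise(xattrs: dict, path: str, contents: bytes) -> bool:
    # "Enforced" mode behavior: recompute and compare against the label
    stored = xattrs.get(path)
    return stored == hashlib.sha256(contents).hexdigest()

xattrs = {}
label(xattrs, "/etc/app.conf", b"mode=production")
print(appraise(xattrs, "/etc/app.conf", b"mode=production"))  # True: allowed
print(appraise(xattrs, "/etc/app.conf", b"mode=hacked"))      # False: denied
```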
On Layerscape trust architecture–based SoCs the secure key (tied to CAAM hardware) along with an encrypted key type can be used to support the IMA EVM–based authentication mechanism.
The IMA EVM can be enabled on SoCs using a small initramfs image which is validated in the Chain of Trust. Initramfs is meant to perform the following tasks:
The Chain of Trust with the IMA EVM is shown in Fig. 20.
Attackers have in the past exploited (and will probably continue to exploit) applications through user-supplied input. One of the most common and oldest forms of attack is the “buffer overrun,” where user-supplied input goes unchecked and ends up writing directly to operating system and application memory that is normally used to store application execution code and temporary and global data. The attacker supplies sufficient data to take control of application execution (by manipulating the stack pointer) and executes, within the application context, the data and code they have written to memory, rather than letting the application continue to execute. To mitigate this attack, several platforms and operating systems, such as Windows (XP onwards), Apple iOS, Android, and SELinux, mark application data as nonexecutable, so that even if the attacker manages to write data to memory they will struggle to execute it.
Where the overwritten data cannot be executed, or where the space available is too small to contain all the malicious instructions, attackers turn to another technique, “return-to-libc/return-oriented programming.” In this case, they attempt to use already-loaded libraries and operating system code, referencing them in a sequence that executes their desired function. To mitigate this attack, several operating systems have adopted the technique of address space layout randomization (ASLR). By randomizing the memory locations at which executable code and libraries are loaded, the ability of an attacker to readily guess and access the code sequences they need is significantly reduced.
Data on an embedded device can be in following states:
Each of these states has unique security challenges which need to be addressed.
Data at rest is data which is stored on a device and is not actively moving from device to device or network to network. This includes data stored on a hard drive or flash drive, or archived/stored in some other way. Protecting data at rest aims to secure inactive data stored on any device or network. This data may include credit card PINs, passwords, etc., found on a mobile device. Mobile devices are often subject to specific security protocols, such as data encryption, to protect data at rest from unauthorized access when a device is lost or stolen. The encryption methods used should be strong; AES is usually preferred for data at rest. Encryption keys should be stored separately from the data and should be highly protected, since the total security of the data lies with the encryption key. Periodic auditing of sensitive data should be part of the security policy and should occur at scheduled intervals.
Given below are some open-source encryption solutions in Linux:
Both are supported by Ubuntu, SLES, RedHat, Debian, and CentOS.
Before choosing the right solution for your data at rest you need to answer a fundamental question: what exactly is the data you want to protect? Do you mean the complete hard drive, or a file containing sensitive information?
Full disk encryption or authentication involves encrypting and/or verifying the contents of the entire disk at the block level. In the case of Linux, this is performed by the kernel’s device mapper (dm) modules. This method can be used with block devices only (e.g., eMMC and NAND). The software is called dm-crypt and works by encrypting data and writing it onto a storage device (by way of the device driver) using a storage format called LUKS.
Linux Unified Key Setup (LUKS) is a specification for block device encryption. It establishes an on-disk format for the data, as well as a passphrase/key management policy. LUKS uses the kernel device mapper subsystem via the dm-crypt module. This arrangement provides low-level mapping that handles encryption and decryption of device data. User-level operations, such as creating and accessing encrypted devices, are accomplished using the cryptsetup utility (Fig. 21).
Data can be protected at the directory or file level too. Some mechanisms available in Linux that offer this protection are:
The key which is used for encryption needs to be protected. Mechanisms that can be used to protect this key include:
To protect data in motion, an encrypted channel needs to be created for moving the data. This encrypted channel can exist at the application layer or at the transport layer. We often select transport layer protections given our desire for code reuse and the wealth of battle-hardened encryption technologies available at the transport layer. The two most widely used mechanisms for transport layer encryption are Transport Layer Security (TLS) and IPsec.
IPsec, TLS, and SSH share a common goal, that is, to provide a secure connection between two peers/devices/endpoints. The difference between them is the layer at which they execute. There is no preferred protocol in that they all offer certain benefits. To decide which one to use, you really need to understand what you are trying to secure. Once you understand that, the choice of which network security protocol to use becomes easy! The security services provided by these protocols include:
IPsec (RFCs 2401, 2406, 2409, 2411) is a protocol suite that runs at the networking layer (L3). It provides confidentiality, integrity protection, data origin authentication, and replay protection for each message by encrypting and signing each one. IPsec is a combination of many RFCs and defines two main protocols: Authentication Header (AH) and Encapsulating Security Payload (ESP). ESP is the preferred choice as it provides both authentication and confidentiality, while AH doesn’t provide confidentiality. ESP has two modes of operation: Transport and Tunnel mode. Transport mode is intended for host-to-host connections and doesn’t hide the original packet’s header information. In comparison, Tunnel mode fully encapsulates the IP packet inside a new IP packet, with completely new headers. ESP Tunnel mode is the choice when maximum security and confidentiality are required. Transport mode is used for secure sessions between end devices, while Tunnel mode is used between security gateways.
In order to establish an IPsec Security Association (SA) between two endpoints, the SAs need to be dynamically established via a key management protocol. This is normally done via IKEv1/IKEv2 in the Internet world. The peer who wants to establish a secure connection with a remote host sends the host its identification information in the form of a certificate. It also sends random data to check that messages are live and not being replayed. The data is signed by the initiator to assert its origin. The receiving peer verifies the signature to authenticate the sender, then signs the same data and sends it back to the initiator for the converse operation. Each peer computes session keys based on the exchanged data and an agreed algorithm, typically a variant of the Diffie-Hellman algorithm. These keys are used during session communication.
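The key computation can be sketched with a toy finite-field Diffie-Hellman exchange. Real IKE uses standardized groups with primes of 2048 bits or more, so the small 64-bit prime below is for illustration only, and authentication of the exchanged values (the certificate and signature steps above) is omitted.

```python
import secrets

# Toy finite-field Diffie-Hellman; parameters are for illustration only.
p = 0xFFFFFFFFFFFFFFC5   # small demo prime (2**64 - 59)
g = 5                    # generator

a = secrets.randbelow(p - 2) + 1      # initiator's private value
b = secrets.randbelow(p - 2) + 1      # responder's private value

A = pow(g, a, p)                      # public values, exchanged in the clear
B = pow(g, b, p)

shared_initiator = pow(B, a, p)       # each side combines the peer's public
shared_responder = pow(A, b, p)       # value with its own private value
assert shared_initiator == shared_responder
# Session keys are then derived from this shared secret (plus nonces),
# without the secret itself ever crossing the wire.
```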
New generation embedded processors have security engines/crypto accelerators that help accelerate IPsec performance dramatically. For example, NXP QorIQ Layerscape family processors have IPsec off-load engines. These engines have flow-through capability, which means that they can handle the bulk of the IPsec processing without intervention by the core.
Transport Layer Security (TLS—RFC 2246, 4346, and 5246) is based on SSLv3. It is a Layer 4 protocol as it runs directly on top of TCP only. It uses PKI to provide user authentication as well as symmetric keying for confidentiality protection. It is designed to prevent eavesdropping, tampering, and message forgery. It establishes a secure connection between two peers using origin authentication and session establishment. TLS authentication can be mutual authentication or server-side authentication only:
OpenSSL is a popular open-source SSL/TLS stack. It consists of two major components: libssl, an implementation of the SSL/TLS protocols, and libcrypto, a cryptographic library. GnuTLS is another open-source SSL/TLS library.
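A minimal sketch of setting up a verifying TLS client context, using the Python standard library's ssl module (which wraps OpenSSL's libssl/libcrypto); the file and host names in the comments are hypothetical.

```python
import ssl

# A default client context loads the system CA certificates and enables
# server-side authentication: the client verifies the server's certificate
# chain and checks that the hostname matches the certificate.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True

# For mutual authentication the client would also present its own
# certificate, e.g.: ctx.load_cert_chain("client.crt", "client.key")
# (hypothetical file names).

# To actually connect (sketch; requires network access):
# import socket
# with socket.create_connection(("example.com", 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#         print(tls.version())   # negotiated protocol version
```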