The 21st century's flurry of connected devices has transformed the way we live. It can be hard to remember the days without the convenience of smartphones, smartwatches, personal digital assistants (such as Amazon Alexa), connected cars, smart thermostats, or other devices.
This adoption is not going to slow down anytime soon, as industry forecasts project over 25 billion IoT devices globally in the next few years. With the increased adoption of connected technologies, the new normal is to have always-on devices. In other words, the devices should work all the time. Not only that, but we also expect these devices to continuously get smarter and stay secure throughout their life cycles with new features, enhancements, and bug fixes. But how do you make that happen reliably and at scale? Werner Vogels, Amazon's chief technology officer and vice president, often says that "everything fails all the time." It's challenging to keep any technological solution up and running all the time.
With IoT, these challenges are elevated and more complicated, as the edge devices are deployed in diverse operating conditions, exposed to environmental interference, and have multiple layers of connectivity, communication, and latency. Thus, it's critical to build an edge-to-cloud continuum mechanism to collect feedback from the deployed fleet of edge devices and act on it quickly. This is where DevOps for IoT helps. DevOps is short for development and operations. It facilitates an agile approach to performing continuous integration and continuous deployment (CI/CD) from the cloud to the edge.
In this chapter, we will focus on how DevOps capabilities can be leveraged for IoT workloads. We will also expand our discussion to MLOps at the edge, which implies implementing agile practices for machine learning (ML) workloads. You learned about some of these concepts in the previous chapter when you built an ML pipeline. The focus of this chapter will be on deploying and operating those models efficiently.
You are already familiar with developing local processes on the edge or deploying components from the cloud in a decoupled way. In this chapter, we will explain how to stitch those pieces together using DevOps principles that will help automate the development, integration, and deployment workflow for a fleet of edge devices. This will allow you to efficiently operate an intelligent distributed architecture on the edge (that is, a Greengrass-enabled device) and help your organization achieve a faster time to market for rolling out different products and features.
In this chapter, we will be covering the following topics:
Now, let's dive into this chapter.
DevOps has transformed the way companies do business in today's world. Companies such as Amazon, Netflix, Google, and Facebook conduct hundreds or more deployments every week to push different features, enhancements, or bug fixes. The deployments themselves are typically transparent to the end customers in that they don't experience any downtime from these constant deployments.
DevOps is a methodology that brings developers and operations closer together to deliver quantifiable technical and business benefits, with a faster time to market through shorter development cycles and increased release frequency. A common misunderstanding is that DevOps is only a set of new technologies to build and deliver software faster. DevOps also represents a cultural shift that promotes ownership, collaboration, and cohesiveness across different teams to foster innovation across the organization. DevOps has been adopted by organizations and companies of all sizes for distributed workloads to deliver innovation, enhancements, and operational efficiency faster. The following diagram shows the virtuous cycle of software delivery:
For the sake of brevity, we are not going to dive deeper into the concepts of DevOps or Agile practices here. Instead, we will focus on introducing the high-level concepts surrounding DevOps and discuss its relevance for IoT workloads.
DevOps brings together different tools and best practices, as follows:
Common toolchains for monitoring include Amazon CloudWatch, AWS X-Ray, Splunk, and New Relic.
Now that we have covered the basics of DevOps, let's understand its relevance to IoT and the edge.
The evolution of edge computing from simple radio frequency identification systems to the microcontrollers and microprocessors of today has opened up different use cases across industry segments that require building a distributed architecture on the edge. For example, the connected HBS hub has a diverse set of functionalities, such as the following:
That's a lot of work on the edge! Thus, the traditional ways of developing and delivering embedded software are not sustainable anymore. So, let's discuss the core activities in the life cycle of an IoT device, as depicted in the following table, to understand the relevance of DevOps:
The key components of DevOps, such as CI/CD/CM, are equally relevant for IoT workloads. This set of activities is often referred to as EdgeOps and, as we observed earlier, they are applied differently between the edge and the cloud. For example, CI is different for the edge because we need to test device software on the same hardware that will be deployed in the field. However, because of the higher costs and risks associated with edge deployments, it is common to reduce the frequency of updating devices at the edge. It is also common for organizations to have different sets of hardware for prototyping versus production runtimes.
Now that you understand how to map DevOps phases to different IoT activities, let's expand on those a bit more. The following diagram shows the workflow that's typically involved in the life cycle of a device, from its creation to being decommissioned:
Here, you can see some key differences between an IoT workload and other cloud-hosted workloads. Let's take a look.
The manufacturing process is involved:
Distributed workloads such as web apps, databases, and APIs use the underlying infrastructure provided by the cloud platform. Software developers can use IaC practices and integrate them with other CI/CD mechanisms to automatically provision the cloud resources required to host their workload. For edge workloads, the product lives beyond the boundaries of any data center. Although it's possible to run edge applications on virtual infrastructure provided by the cloud platform during the testing or prototyping phases, the real product is always hosted on hardware (such as a Raspberry Pi for this book's project). There is always a dependency on the contract manufacturer (or other vendors in the supply chain) to manufacture the hardware as per the required specifications and program it with the device firmware. Although the firmware can be developed on the cloud using DevOps practices, flashing the firmware image is done at manufacturing time only. This hinders the end-to-end automation common in traditional DevOps workflows, where the infrastructure (such as an Amazon EC2 instance) is readily imaged and available for application deployments. The following diagram shows the typical life cycle of device manufacturing and distribution:
Securing the hardware is quintessential:
Some of the key vulnerabilities for edge workloads that are listed by The Open Web Application Security Project (OWASP) are as follows:
Although distributed workloads may have similar challenges, mitigating them using cloud-native controls makes them easier to automate than IoT workloads. Using AWS as an example, all communications within AWS infrastructure (such as across data centers) are encrypted in transit by default and require no action. Data at rest can be encrypted with a one-click option (or automation) using the key management infrastructure provided by AWS (or customers can bring their own keys). Every service (or hosted workload) can enable access controls for authentication and authorization through cloud-native identity and access management services, which can also be automated through IaC. Every service (or hosted workload) can take advantage of observability and traceability through cloud-native monitoring services (such as AWS CloudTrail or Amazon CloudWatch).
On the contrary, for edge workloads, all of the preceding requirements must be fulfilled while manufacturing, assembling, and registering the device, putting more onus on the supply chain to implement them manually rather than through one-click or automated workflows. For example, as a best practice, edge devices should perform mutual authentication with the cloud over TLS 1.2 using credentials such as X.509 certificates, rather than usernames and passwords or symmetric credentials. In addition, the credentials should have least-privileged access implemented using the right set of permissions (through policies). This helps ensure that the devices implement the required access controls to protect the device's identity and that the data in transit is fully encrypted. In addition, device credentials (such as X.509 certificates) on the edge should reside inside a secure element or trusted platform module (TPM) to reduce the risk of unauthorized access and identity compromise. Additionally, secure mechanisms are required to separate the filesystems on the device and encrypt the data at rest using cryptographic utilities such as dm-crypt, GPG, or BitLocker. Observability and traceability implementations for different edge components are left to the respective owners.
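To make the credential-provisioning burden concrete, the following shell sketch shows the kind of step a supply chain (or factory provisioning tool) must perform to give each device a unique X.509 identity. The file names and the device identifier in the Common Name are illustrative assumptions; in production, the private key would be generated inside a secure element or TPM and never leave the device:

```shell
# Generate a 2048-bit RSA private key for the device
# (illustrative only -- in production this stays inside a secure element/TPM)
openssl genrsa -out device.key 2048

# Create a certificate signing request (CSR) to be signed by your CA;
# the Common Name "device001" is a placeholder device identifier
openssl req -new -key device.key -out device.csr -subj "/CN=device001"

# Sanity-check that the CSR is well formed before sending it to the CA
openssl req -in device.csr -noout -verify
```

The CA-signed certificate that comes back from this request is what the device later presents during mutual TLS authentication with the cloud.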
Lack of standardized frameworks for the edge:
Edge components are no longer limited to routers, switches, miniature servers, or workstations. Instead, the industry is moving toward building distributed architectures on the edge in different ways, as follows:
The following diagram shows an edge-to-cloud workflow that includes various technology capabilities that are common in distributed architectures:
The edge architecture's standards are still evolving. Considering there are different connectivity interfaces, communication protocols, and topologies, there are heterogeneous ways of solving different use cases. For example, connectivity interfaces may include different short-range (such as BLE, Wi-Fi, and Ethernet) or long-range radio networks (such as cellular, NB-IoT, and LoRa). The connectivity interface needs to be determined during the hardware design phase and is implemented as a one-time process. Communication protocols may include different transport layer protocols over TCP (connection-oriented, such as MQTT and HTTPS) or UDP (connectionless, such as CoAP). Recall the layers of the Open Systems Interconnection (OSI) model, which we reviewed in Chapter 2, Foundations of Edge Workloads. The choice of communication protocols can be flexible, so long as the underlying Layer 4 protocols are supported on the hardware. For example, if the hardware supports UDP, it can be activated with configuration changes, along with installing additional Layer 7 software (such as a CoAP client) as required. Thus, this step can be performed through a cloud-to-edge DevOps workflow (that is, an OTA update). Bringing more intelligence to the edge requires dealing with the challenges of running distributed topologies on computing infrastructure with low horsepower. Thus, it's necessary to define standards and design principles to design, deploy, and operate optimized software workloads on the edge (such as brokers, microservices, containers, caches, and lightweight databases).
Hopefully, this has helped you understand the unique challenges for edge workloads from a DevOps perspective. In the next section, you will understand how AWS IoT Greengrass can help you build and operate distributed workloads on the edge.
In the previous chapters, you learned how to develop and deploy native processes, data streams, and ML models on the edge locally and then deployed them at scale using Greengrass's built-in OTA mechanism. We will explain the reverse approach here; that is, building distributed applications on the cloud using DevOps practices and deploying them to the edge. The following diagram shows the approach to continuously build, test, integrate, and deploy workloads using the OTA update mechanism:
The two most common ways to build a distributed architecture on the edge using AWS IoT Greengrass are by using the AWS Lambda service or Docker containers.
I want to make it clear, to avoid any confusion, that the Lambda architecture, which was introduced in Chapter 5, Ingesting and Streaming Data from the Edge, is an architectural pattern that's used to operate streaming and batch workflows on the edge. AWS Lambda, on the contrary, is a serverless compute service that offers a runtime for executing any type of application with no infrastructure administration. It allows developers to focus on the business logic, write code in different programming languages (such as Python, Java, Node.js, and Go), and upload it as a ZIP file. The service takes it from there, provisioning the underlying infrastructure resources and scaling based on incoming requests or events.
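As a minimal sketch of that workflow, the following shell commands write a trivial Python handler and package it as a ZIP file ready for upload. The handler contents, file names, function name, and role ARN are all illustrative assumptions, not values from this book's project; the final `aws lambda create-function` step is shown commented out since it requires an AWS account and IAM role:

```shell
# Write a minimal Python handler -- business logic only, no server code
cat > lambda_function.py <<'EOF'
def handler(event, context):
    # Echo back a hypothetical device identifier from the event payload
    return {"status": "ok", "device": event.get("device_id")}
EOF

# Package the code as a ZIP file for upload (using Python's stdlib zipfile CLI)
python3 -m zipfile -c function.zip lambda_function.py

# Hypothetical upload step -- function name and role ARN are placeholders:
# aws lambda create-function --function-name hbs-hub-fn \
#   --runtime python3.9 --handler lambda_function.handler \
#   --zip-file fileb://function.zip \
#   --role arn:aws:iam::123456789012:role/lambda-role

# Confirm the handler made it into the archive
python3 -m zipfile -l function.zip
```

From here, the service (or, at the edge, AWS IoT Greengrass) takes over running the handler in response to events.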
AWS Lambda has been a popular compute choice for designing event-based architectures for real-time processing, batch, and API-driven workloads. Due to this, AWS decided to extend Lambda runtime support to edge processing through AWS IoT Greengrass.
So, are you wondering what the value of implementing AWS Lambda at the edge is?
You are not alone! Considering automated hardware provisioning is not an option for the edge, as explained earlier in this chapter, the value here is around interoperability, consistency, and continuity from the cloud to the edge. It's very common for IoT workloads to have different code bases for the cloud (distributed stack) and the edge (embedded stack), which leads to additional complexity around code integration, testing, and deployment. This results in additional operational overhead and a delayed time to market.
AWS Lambda aimed to bridge this gap so that the cloud and embedded developers can use similar technology stacks for software development and have interoperable solutions. Therefore, building a DevOps pipeline from the cloud to the edge using a common toolchain becomes feasible.
There are several benefits of running Lambda functions on the edge, as follows:
The following diagram shows how an AWS Lambda function that's been deployed on the edge can interact with different components on the physical (such as the filesystem) or abstracted layer (such as stream manager on AWS IoT Greengrass):
Here, you can see that Lambda provides some distinct value propositions out of the box that you have to build yourself with native processes.
As you have understood by now, every solution or architecture has a trade-off. AWS Lambda is not an exception either and can have the following challenges:
Now, let's understand containers for the edge.
A container is a unit of software that packages the necessary code with the required dependencies for the application to run reliably across different computing environments. Essentially, a container provides an abstraction layer to its hosted applications from the underlying OS (such as Ubuntu, Linux, or Windows) or architecture (such as x86 or ARM). In addition, since containers are lightweight, a single server or a virtual machine can run multiple containers. For example, you can run a 3-tier architecture (web, app, and a database) on the same server (or VM) using their respective container images. The two most popular open source frameworks for container management are Docker and Kubernetes.
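To make the 3-tier example concrete, here is a minimal Docker Compose sketch, with illustrative image tags and port mapping, that runs all three tiers on a single host:

```yaml
services:
  web:
    image: "nginx:latest"          # presentation tier
    ports:
      - "8080:80"                  # expose the web tier on the host
  app:
    image: "hello-world:latest"    # application tier (placeholder image)
  db:
    image: "redis:latest"          # data tier
```

Each service runs in its own container but shares the host's kernel, which is what keeps the footprint light compared to running three virtual machines.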
In this section, we will primarily discuss Docker as it's the only option that's supported natively by AWS IoT Greengrass at the time of writing. Similar to Lambda, Docker supports an exhaustive set of programming languages and toolchains for the developers to develop, operate, and deploy their applications in an agile fashion. The following diagram shows the reference architecture for a Docker-based workload deployed on AWS IoT Greengrass:
So, why run containers over Lambda on the edge?
Containers can bring all of the benefits that Lambda does (and more), along with being heterogeneous (supporting different platforms), open source, and better optimized for edge resources. Containers have a broader developer community as well. Since containers provide their own orchestration and abstraction layer, they are not dependent on other runtimes such as AWS IoT Greengrass. So, if your organization decides to move to another edge solution, containers are more portable than Lambda functions.
Running containers at the edge using Greengrass has the following benefits:
The following diagram shows how containerized applications can be developed using a CI/CD approach and be deployed on the edge while running AWS IoT Greengrass:
Next, let's learn about the challenges with Docker on the edge.
Running containers on the edge has some tradeoffs that need to be considered, as follows:
In the lab section of this chapter, you will get your hands dirty by deploying a Docker-based application to the edge using AWS IoT Greengrass.
Similar to other AWS services, AWS IoT Greengrass also supports integration with various IaC solutions such as CloudFormation, CDK, and Terraform. All these tools can help you create cloud-based resources and integrate with different CI/CD pipelines for supporting cloud-to-edge deployments.
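As an illustrative sketch of what that integration can look like, the following CloudFormation fragment declares a Greengrass component version from an inline recipe, so that the same pipeline that builds the application can also publish it. The resource name, component version, and recipe contents are hypothetical, and the truncated recipe would need the full manifest shown later in this chapter:

```yaml
Resources:
  HubContainerComponent:
    Type: AWS::GreengrassV2::ComponentVersion
    Properties:
      InlineRecipe: |
        ---
        RecipeFormatVersion: '2020-01-25'
        ComponentName: com.hbs.hub.Container
        ComponentVersion: '1.0.1'
        ComponentDescription: 'Published via IaC as part of a CI/CD pipeline.'
        ComponentPublisher: Amazon
```

Deploying a new stack version through the pipeline then registers a new component version in the cloud, ready to be rolled out to the fleet over the air.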
Now that you are familiar with the benefits and tradeoffs of the DevOps toolchain, let's learn how that extends to machine learning.
Machine Learning Operations (MLOps) aims to integrate agile methodologies into the end-to-end process of running machine learning workloads. MLOps brings together best practices from data science, data engineering, and DevOps to streamline model design, development, and delivery across the machine learning development life cycle (MLDLC).
As per the MLOps special interest group (SIG), MLOps is defined as "the extension of the DevOps methodology to include machine learning and data science assets as first-class citizens within the DevOps ecology." MLOps has gained rapid momentum in the last few years among ML practitioners and is a language-, framework-, platform-, and infrastructure-agnostic practice.
The following diagram shows the virtuous cycle of the MLDLC:
The preceding diagram shows how Operations is a fundamental block of the ML workflow. We introduced some of the concepts of ML design and development in Chapter 7, Machine Learning Workloads at the Edge, so in this section, we will primarily focus on the Operations layer.
There are several benefits of MLOps, as follows:
So, now that you understand what MLOps is, are you curious to know how it's related to IoT and the edge? Let's take a look.
As an IoT/edge SME, you will NOT be owning the MLOps process. Rather, you need to ensure that the dependencies are met on the edge (at the hardware and software layer) for the ML engineers to perform their due diligence in setting up and maintaining this workflow. Thus, don't be surprised by the brevity of this section, as our goal is to only introduce you to the fundamental concepts and the associated services available today on AWS for this subject area. We hope to give you a quick ramp-up so that you are adept at having better conversations with ML practitioners in your organization.
So, with that background, let's consider the scenario where the sensors from the connected HBS hub are reporting various anomalies from different customer installations. This is leading to multiple technician calls and thereby impacting the customer experience and bottom line. Thus, your CTO has decided to build a predictive maintenance solution using ML models to rapidly identify and fix faults through remote operations. The models should be able to identify data drift dynamically and collect additional information around the reported anomalies. Thus, the goal for ML practitioners here is to build an MLOps workflow so that models can be frequently and automatically trained on the collected data, followed by deploying it to the connected HBS hub.
In addition, it's essential to monitor the performance of the ML models that are deployed on the edge to understand their efficiency; for example, to see how many false positives are being generated. Similar to the DevOps workflow, the ML workflow includes different components such as source control for versioning, a training pipeline for CI/CD, testing for model validation, packaging for deployment, and monitoring for assessing efficiency. If this project is a success, it will help the company add more ML intelligence to the edge and mitigate issues predictively to improve customer experience and reduce costs. The following reference architecture depicts a workflow we can use to implement the predictive maintenance of ML models on AWS IoT Greengrass v2:
If we want to implement the preceding architecture, we must try to foresee some common challenges.
Quite often, the most common questions that are asked by edge and ML practitioners related to MLOps are as follows:
This is not an exhaustive list as it continues to expand with ML being adopted more and more on the edge. The answers to some of those questions are a mix of cultural and technical shifts within an organization. Let's look at some examples:
Thus, becoming an ML organization requires time, training, and co-development exercises to be completed across different teams to produce fruitful results.
At the same time, though, ML workflows have certain dependencies, such as big data workflows or the applications required for inferencing. This means that MLOps is a combination of a traditional CI/CD workflow and another workflow engine. This can often become tricky without a robust pipeline and the required toolsets.
With IoT, this problem acts as the force multiplier, as it's required to consider various optimization strategies for different hardware and runtimes before deploying the ML models. For example, in Chapter 7, Machine Learning Workloads at the Edge, you learned how to optimize ML models using Amazon SageMaker Neo so that they can run efficiently in your working environment.
We have gone through the MLOps challenges for the edge in this section. In the next section, we will understand the MLOps toolchain for the edge.
In Chapter 7, Machine Learning Workloads at the Edge, you learned how to develop ML models using Amazon SageMaker, optimize them through SageMaker Neo, and deploy them on the edge using AWS IoT Greengrass v2. In this chapter, I would like to introduce you to another service in the SageMaker family called Edge Manager, which can help address some of the preceding MLOps challenges and provides the following capabilities out of the box:
As you can see, Edge Manager brings robust MLOps capabilities out of the box, along with native integrations with different AWS services, such as AWS IoT Greengrass. The following is a reference architecture for Edge Manager integrating with various other services that you were exposed to earlier in this book, such as SageMaker and S3:
Note
MLOps is still emerging and can be complicated to implement without the involvement of ML practitioners. If you would like to learn more about this subject, please refer to other books that have been published on this subject.
Now that you have learned the fundamentals of DevOps and MLOps, let's get our hands dirty so that we can apply some of these practices and operate edge workloads in an agile fashion.
In this section, you will learn how to deploy multiple Docker applications to the edge that have already been developed using CI/CD best practices in the cloud. These container images are available in a Docker repository called Docker Hub. The following diagram shows the architecture for this hands-on exercise, where you will complete Steps 1 and 2 to integrate the HBS hub with an existing CI/CD pipeline (managed by your DevOps org), configure the Docker containers, and then deploy and validate them so that they can operate at the edge:
The following are the services you will use in this exercise:
Your objectives for this hands-on section are as follows:
Let's get started.
In this section, you will learn how to deploy a pre-built container from the cloud to the edge:
cd hbshub/artifacts
docker --version
docker-compose --version
If Docker Engine and docker-compose are not installed, please refer to the documentation from Docker for your respective platform to complete this before proceeding.
services:
  web:
    image: "nginx:latest"
  app:
    image: "hello-world:latest"
  db:
    image: "redis:latest"
cd ~/hbshub/recipes
nano com.hbs.hub.Container-1.0.0.yaml
---
RecipeFormatVersion: '2020-01-25'
ComponentName: com.hbs.hub.Container
ComponentVersion: '1.0.0'
ComponentDescription: 'A component that uses Docker Compose to run images from Docker Hub.'
ComponentPublisher: Amazon
ComponentDependencies:
  aws.greengrass.DockerApplicationManager:
    VersionRequirement: ~2.0.0
Manifests:
  - Platform:
      os: all
    Lifecycle:
      Startup:
        RequiresPrivilege: true
        Script: |
          cd {artifacts:path}
          docker-compose up -d
      Shutdown:
        RequiresPrivilege: true
        Script: |
          cd {artifacts:path}
          docker-compose down
sudo /greengrass/v2/bin/greengrass-cli deployment create \
  --recipeDir ~/hbshub/recipes \
  --artifactDir ~/hbshub/artifacts \
  --merge "com.hbs.hub.Container=1.0.0"
sudo /greengrass/v2/bin/greengrass-cli component list
docker container ls
You should see the following output:
In our example here, the app container (hello-world) is a one-time process, so it has already exited. But the remaining two containers are still up and running. If you want to check all the container processes that have run so far, use the following command:
docker ps -a
You should see the following output:
Congratulations – you now have multiple containers successfully deployed on the edge from a Docker repository (Docker Hub). In the real world, if you want to run a local web application on the HBS hub, this pattern can be useful.
Challenge zone (Optional)
Can you figure out how to deploy a Docker image from Amazon ECR or Amazon S3? Although Docker Hub is useful for storing public container images, enterprises will often use a private repository for their home-grown applications.
Hint: You need to make changes to docker-compose with the appropriate URI for the container images and provide the required permissions to the Greengrass role.
With that, you've learned how to deploy any number of containers on the edge (so long as the hardware resource permits it) from heterogeneous sources to develop a multi-faceted architecture on the edge. Let's wrap up this chapter with a quick summary and a set of knowledge check questions.
In this chapter, you were introduced to the DevOps and MLOps concepts that are required to bring operational efficiency and agility to IoT and ML workloads at the edge. You also learned how to deploy containerized applications from the cloud to the edge. This functionality allowed you to build an intelligent, distributed, and heterogeneous architecture on the Greengrass-enabled HBS hub. With this foundation, your organization can continue to innovate with different kinds of workloads, as well as deliver features and functionalities to the end consumers throughout the life cycle of the product. In the next chapter, you will learn about the best practices of scaling IoT operations as your customer base grows from thousands to millions of devices globally. Specifically, you will learn about the different techniques surrounding fleet provisioning and fleet management that are supported by AWS IoT Greengrass.
Before moving on to the next chapter, test your knowledge by answering these questions. The answers can be found at the end of the book:
For more information regarding the topics that were covered in this chapter, take a look at the following resources: