Table of Contents

Preface

Part 1: Introduction to Model Serving

1

Introducing Model Serving

Technical requirements

What is serving?

What are models?

What is model serving?

Understanding the importance of model serving

Using existing tools to serve models

Summary

2

Introducing Model Serving Patterns

Design patterns in software engineering

Understanding the value of model serving patterns

ML serving patterns

Serving philosophy patterns

Patterns of serving approaches

Summary

Further reading

Part 2: Patterns and Best Practices of Model Serving

3

Stateless Model Serving

Technical requirements

Understanding stateful and stateless functions

Stateless functions

Stateful functions

Extracting states from stateful functions

Using stateful functions

States in machine learning models

Using input data as states

Mitigating the impact of states from the ML model

Summary

4

Continuous Model Evaluation

Technical requirements

Introducing continuous model evaluation

What to monitor in model evaluation

Challenges of continuous model evaluation

The necessity of continuous model evaluation

Monitoring errors

Deciding on retraining

Enhancing serving resources

Understanding business impact

Common metrics for training and monitoring

Continuous model evaluation use cases

Evaluating a model continuously

Collecting the ground truth

Plotting metrics on a dashboard

Selecting the threshold

Setting a notification for performance drops

Monitoring model performance when predicting rare classes

Summary

Further reading

5

Keyed Prediction

Technical requirements

Introducing keyed prediction

Exploring keyed prediction use cases

Multi-threaded programming

Multiple instances of the model running asynchronously

Why the keyed prediction model is needed

Exploring techniques for keyed prediction

Passing keys with features from the clients

Removing keys before the prediction

Tagging predictions with keys

Creating keys

Summary

Further reading

6

Batch Model Serving

Technical requirements

Introducing batch model serving

What is batch model serving?

Different types of batch model serving

Manual triggers

Automatic periodic triggers

Using continuous model evaluation to retrain

Serving for offline inference

Serving for on-demand inference

Example scenarios of batch model serving

Case 1 – recommendation

Case 2 – sentiment analysis

Techniques in batch model serving

Setting up a periodic batch update

Storing the predictions in a persistent store

Pulling predictions by the server application

Limitations of batch serving

Summary

Further reading

7

Online Learning Model Serving

Technical requirements

Introducing online model serving

Serving requests

Use cases for online model serving

Case 1 – recommending the nearest emergency center during a pandemic

Case 2 – predicting the favorite soccer team in a tournament

Case 3 – predicting the path of a hurricane or storm

Case 4 – predicting the estimated delivery time of delivery trucks

Challenges in online model serving

Challenges in using newly arrived data for training

Underperforming of the model after online training

Overfitting and class imbalance

Increasing of latency

Handling concurrent requests

Implementing online model serving

Summary

Further reading

8

Two-Phase Model Serving

Technical requirements

Introducing two-phase model serving

Exploring two-phase model serving techniques

Quantized phase one model

Training and saving an MNIST model

Full integer quantization of the model and saving the converted model

Comparing the size and accuracy of the models

Separately trained phase one model with reduced features

Separately trained different models

Use cases of two-phase model serving

Case 4 – route planners

Summary

Further reading

9

Pipeline Pattern Model Serving

Technical requirements

Introducing the pipeline pattern

A DAG

Stages of the machine learning pipeline

Introducing Apache Airflow

Getting started with Apache Airflow

Creating and starting a pipeline using Apache Airflow

Demonstrating a machine learning pipeline using Airflow

Advantages and disadvantages of the pipeline pattern

Summary

Further reading

10

Ensemble Model Serving Pattern

Technical requirements

Introducing the ensemble pattern

Using ensemble pattern techniques

Model update

Aggregation

Model selection

Combining responses

End-to-end dummy example of serving the model

Summary

11

Business Logic Pattern

Technical requirements

Introducing the business logic pattern

Type of business logic

Technical approaches to business logic in model serving

Data validation

Feature transformation

Prediction post-processing

Summary

Part 3: Introduction to Tools for Model Serving

12

Exploring TensorFlow Serving

Technical requirements

Introducing TensorFlow Serving

Servable

Loader

Source

Aspired versions

Manager

Using TensorFlow Serving to serve models

TensorFlow Serving with Docker

Using advanced model configurations

Summary

Further reading

13

Using Ray Serve

Technical requirements

Introducing Ray Serve

Deployment

ServeHandle

Ingress deployment

Deployment graph

Using Ray Serve to serve a model

Using the ensemble pattern in Ray Serve

Using Ray Serve with the pipeline pattern

Summary

Further reading

14

Using BentoML

Technical requirements

Introducing BentoML

Preparing models

Services and APIs

Bento

Using BentoML to serve a model

Summary

Further reading

Part 4: Exploring Cloud Solutions

15

Serving ML Models using a Fully Managed AWS SageMaker Cloud Solution

Technical requirements

Introducing Amazon SageMaker

Amazon SageMaker features

Using Amazon SageMaker to serve a model

Creating a notebook in Amazon SageMaker

Serving the model using Amazon SageMaker

Summary

Index

Other Books You May Enjoy
