Preface

We wrote this book for data engineers and data scientists who are building machine learning systems and models that they want to move to production. If you’ve ever trained an excellent model only to ask yourself how to deploy it into production or keep it up to date once it gets there, this is the book for you. We hope this gives you the tools to replace Untitled_5.ipynb with something that works relatively reliably in production.

This book is not intended to serve as your first introduction to machine learning. The next section points to some resources that may be useful if you are just getting started on your machine learning journey.

Our Assumption About You

This book assumes that you either understand how to train models locally, or are working with someone who does. If you don’t, there are many excellent introductory books on machine learning to get you started, including Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Second Edition, by Aurélien Géron (O’Reilly).

The goal of this book is to teach you how to do machine learning in a repeatable way, and how to automate training and deployment of your models. One challenge is that Kubeflow, and therefore this book, covers a wide range of topics, and it is more than reasonable if you are not intimately familiar with every topic we touch on.

While it is out of scope for us to delve deeply into every topic, we would like to provide you with a short list of some of our favorite primers on several of the topics this book covers at a high level:

Of course, there are many others, but those should get you started. Please don’t be overwhelmed by this list: you certainly don’t need to be an expert in each of these topics to effectively deploy and manage Kubeflow. In fact, Kubeflow exists to streamline many of the tasks presented above. However, there may be some topic you wish to explore in more depth, so the above should be thought of as a “getting started” list.

Containers and Kubernetes are a wide and rapidly evolving area of practice. If you want to deepen your knowledge of Kubernetes, we recommend looking at the following:

Your Responsibility as a Practitioner

This book helps you put your machine learning models into production to solve real-world problems. Solving real-world problems with machine learning is great, but as you go forth and apply your skills to the real world, remember to think about the impact.

First, it’s important to make sure your models are sufficiently accurate, and there are great tools for this in Kubeflow, covered in “Training and Deploying a Model”. Even the best tools will not save you from all mistakes, however. One example is tuning hyperparameters on the same dataset that you then use to report final cross-validation results.
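To make that last mistake concrete, here is a minimal sketch in plain Python (not a Kubeflow API). It assumes a toy dataset and a one-parameter threshold "model"; the helper names `split_holdout`, `accuracy`, and `tune_threshold` are hypothetical and exist only for illustration. The idea is to pick the hyperparameter on one portion of the data and report the final score on a held-out portion that tuning never saw.

```python
import random


def split_holdout(rows, test_frac=0.25, seed=0):
    """Shuffle rows and split them into a tuning set and a held-out test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, rows):
    """Fraction of (x, label) rows where the rule `x > threshold` matches the label."""
    return sum((x > threshold) == label for x, label in rows) / len(rows)


def tune_threshold(candidates, tune_rows):
    """Pick the candidate threshold with the best accuracy on the tuning set only."""
    return max(candidates, key=lambda t: accuracy(t, tune_rows))


# Toy data (an assumption for this sketch): the label is x > 0.5, with ~10% label noise.
data_rng = random.Random(42)
data = []
for _ in range(200):
    x = data_rng.random()
    label = x > 0.5
    if data_rng.random() < 0.1:
        label = not label
    data.append((x, label))

tune_rows, test_rows = split_holdout(data)
candidates = [i / 10 for i in range(1, 10)]
best = tune_threshold(candidates, tune_rows)

# The tuning-set score is optimistic because `best` was chosen to maximize it.
print("chosen threshold:", best)
print("accuracy on tuning set:", accuracy(best, tune_rows))
# The held-out score is the one to report.
print("accuracy on held-out test set:", accuracy(best, test_rows))
```

The same separation applies however the tuning is done: whatever data drives the choice of hyperparameters should not also be the data behind the number you report.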

Even models with significant predictive power can still have unintended effects and biases that may not show up during the regular training-evaluation phase. Unintended biases can be hard to discover, but there are many stories (e.g., Amazon’s machine learning system, which decided to only hire men) that demonstrate the profound potential implications of our work. Failing to address these issues early on can force you to abandon your work entirely, as demonstrated by IBM’s decision to stop its facial recognition program and similar pauses across the industry after the implications of racial bias in facial recognition in the hands of law enforcement became clear.

Even data that initially seems unbiased, like raw purchase records, can turn out to have intense biases that result in incorrect recommendations or worse. Just because a dataset is public and widely available does not mean it is unbiased. The well-known practice of word embeddings has been shown to have many types of bias, including sexist, anti-LGBTQ, and anti-immigrant bias. When looking at a new dataset, it is crucial to look for examples of bias in your data and attempt to mitigate them as much as possible. With the most popular public datasets, various mitigation techniques are often discussed in the research literature, and you can use these to guide your own work.

While this book does not have the tools to solve bias, we encourage you to think critically about potential biases in your system and explore solutions before going into production. If you don’t know where to start, check out Katharine Jarmul’s excellent introductory talk. IBM has a collection of tools and examples in their AI Fairness 360 Open Source Toolkit that can be a great place to start your exploration. A critical step to helping reduce bias in your models is to have a diverse team to notice potential issues early. As Jeff Dean said: “AI is full of promise, with the potential to revolutionize so many different areas of modern society. In order to realize its true potential, our field needs to be welcoming to all people. As it stands today, it is definitely not. Our field has a problem with inclusiveness.”

Tip

It’s also important to note that none of these tasks is “one and done”: model performance can degrade, and biases can be introduced over time, even if you don’t personally change anything.1

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

We will use warnings to indicate any situations where the resulting pipeline is likely to be non-portable and call out portable alternatives that you can use.

Code Examples

Supplemental material (code examples, etc.) is available for download at https://github.com/intro-to-ml-with-kubeflow. These code examples are available under either an Apache 2 license or the terms described in “Using Code Examples,” at your choice.

There are additional examples under their own respective licenses that you may find useful. The Kubeflow project has an example repo, which at the time of writing is available under an Apache 2 license. Canonical also has a set of resources that may be of special interest to MicroK8s users.

Using Code Examples

If you have a technical question or a problem using the code examples, please send email to .

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

Additional details on licensing can be found in the repos.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Kubeflow for Machine Learning by Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris Lublinsky (O’Reilly). Copyright 2021 Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris Lublinsky, 978-1-492-05012-4.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

O’Reilly Online Learning

Note

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

How to Contact the Authors

For feedback, email us at . For random ramblings, occasionally about Kubeflow, follow us online:

  • Holden
  • Trevor
  • Ilan
  • Richard: Twitter, GitHub
  • Boris

Tip

Early feedback can make a huge difference in the direction and quality of the book. We want to hear from you, especially if there are things you find confusing or wish we covered.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://www.oreilly.com/catalog/9781492050124.

Email to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

The authors would like to thank everyone at O’Reilly Media, especially our editor Amelia Blevins, and the Kubeflow community for making this book possible. Clive Cox and Alejandro Saucedo from Seldon made amazing contributions to Chapter 8, without which this book would be missing key parts. We’d like to thank Google Cloud Platform for resources that allowed us to ensure examples worked on GCP. Perhaps most importantly, we’d like to thank our reviewers, without whom this book would not exist in its current form. This includes Taka Shinagawa, Pete MacKinnon, Kevin Hass, Chris Albon, Hannes Hapke, and more. To all early readers and reviewers of books: thank you for your contributions.

Holden

Would like to thank her girlfriend Kris Nóva for her help debugging her first Kubeflow PR, as well as the entire Kubeflow community for being so welcoming. She would also like to thank her wife Carolyn DeSimone, her puppy Timbit DeSimone-Karau (pictured in Figure P-1), and her stuffed animals for the support needed to write. She would like to thank the doctors at SF General and UCSF for fixing up her hands so she could finish writing this book (although she does wish the hands did not hurt anymore) and everyone who came to visit her in the hospital and nursing home. A special thank you to Ann Spencer, the first editor who showed her how to have fun writing. Finally, she would like to thank her datefriend Els van Vessem for their support in recovering after her accident, especially reading stories and reminding her of her love of writing.

Figure P-1. Timbit the dog

Ilan

Would like to thank all his colleagues at Bloomberg who took the time to review, mentor, and encourage him to write and contribute to open source. The list includes, but is not limited to: Kimberly Stoddard, Dan Sun, Keith Laban, Steven Bower, and Sudarshan Kadambi. He would also like to thank his family: Galia, Yuriy, and Stan for their unconditional love and support.

Richard

Would like to thank the Google Kubeflow team, including but not limited to Jeremy Lewi, Abhishek Gupta, Thea Lamkin, Zhenghui Wang, Kunming Qu, Gabriel Wen, Michelle Casbon, and Sarah Maddox, without whose support none of this would have been possible. He would also like to thank his cat Tina (see Figure P-2) for her support and understanding during COVID-19.

Figure P-2. Tina the cat

Boris

Would like to thank his colleagues at Lightbend, especially Karl Wehden, for their support in writing the book and for their suggestions and proofreads of early versions of the text, and his wife Marina for putting up with his long hours and feeding him during those hours.

Trevor

Would like to thank…

Grievances

The authors would also like to acknowledge the struggles of API changes, which made writing this book so frustrating. If you ever struggle with API changes, know that you are not alone; they are annoying to almost everyone.

Holden would also like to acknowledge the times Timbit DeSimone-Karau was a little sh*t and dug up the yard while she was working. We have a special grievance to vent with the person who hit her with their car, slowing down the release of this book.

Trevor has a grievance to air with his girlfriend, who has been badgering him (with increasing persistence) to propose to her throughout this entire project, and while he has been “working on it”—if he hasn’t asked her to marry him by the time this book comes out: Katie, will you marry me?

1 Remember the Twitter bot that became a Nazi with reinforcement learning in less than a weekend?
