front matter

preface

Given the popularity of social media and online streaming services, you have probably experienced how far machine learning (ML) can go in delivering personalized services. Even though this exciting field opens doors to many new possibilities and has become an indispensable part of our lives, training ML models requires the use of significant amounts of data collected in various ways. When this data is processed through ML models, it is essential to preserve the confidentiality and privacy of individuals and to maintain the trust of those who use these models.

While preserving privacy in ML is essential, it is also challenging. In late 2014, Dr. Chang and his former PhD student, Dr. Mohammad Al-Rubaie, started to investigate the problem of privacy leakage in ML techniques and explored the possibilities of mitigating such leaks. As they continued their research, the Defense Advanced Research Projects Agency (DARPA) initiated a new research program in early 2015 called “Brandeis” (BAA-15-29) with a $50 million initial budget. The program was named after Louis Brandeis, a US Supreme Court Justice who published the “Right to Privacy” paper in the Harvard Law Review journal in 1890. The main goal of the Brandeis program was to seek and develop the technical means to protect private information by enabling safe and predictable sharing of data in which privacy is preserved. Dr. Chang and his team participated in the Brandeis program and developed several privacy-preserving technologies based on compressive privacy and differential privacy techniques.

In 2019, with another research award from US Special Operations Command (USSOCOM), Drs. Chang, Zhuang, and Samaraweera and their team tried to put these privacy and security-enhancing technologies for ML into practical applications by utilizing the latest ML models and tools. With their years of hands-on experience participating in the Brandeis program and many other DoD-sponsored research programs since 2015, they believe it’s now time to put together the techniques they’ve developed. Thus, this book is different from other technical books: it discusses the fundamental concepts of ML and the privacy-preserving techniques for ML, and it offers intuitive examples and implementation code showing how to use them in practice.

They believe this book is the first to provide comprehensive coverage of privacy-preserving ML. Herein, they take you on an exciting journey that covers all the essential concepts, techniques, and hands-on details. If you want to know more after reading this book, you can follow up with the references cited in each chapter and listed at the end of the book.

acknowledgments

A big thanks goes to the DARPA Brandeis program and the project managers with whom we had the pleasure of working on developing new paradigms of privacy-preserving machine learning. We are also very grateful to our team, who worked on the Brandeis project, especially Dr. Sun-Yuan Kung and Dr. Pei-yuan Wu. Furthermore, this book would not have been possible if not for the former and current PhD students of Dr. Chang’s research group at the University of South Florida, especially Dr. Mohammad Al-Rubaie and Sen Wang.

Another huge thanks goes to the editorial and production staff at Manning for all their hard work in producing this book. Finally, we would like to thank all the reviewers for their support and valuable feedback on improving the discussions in the manuscript: Abe Taha, Aditya Kaushik, Alain Couniot, Bhavani Shankar Garikapati, Clifford Thurber, Dhivya Sivasubramanian, Erick Nogueira do Nascimento, Frédéric Flayol, Harald Kuhn, Jaganadh Gopinadhan, James Black, Jeremy Chen, Joseph Wang, Kevin Cheung, Koushik Vikram, Mac Chambers, Marco Carnini, Mary Anne Thygesen, Nate Jensen, Nick Decroos, Pablo Roccatagliata, Raffaella Ventaglio, Raj Sahu, Rani Sharim, Richard Vaughan, Rohit Goswami, Satej Sahu, Shankar Garikapati, Simeon Leyzerzon, Simon Tschöke, Simone Sguazza, Sriram Macharla, Stephen Oates, Tim Kane, Vidhya Vinay, Vincent Ngo, Vishwesh Ravi Shrimali, and Xiangbo Mao. Your suggestions helped make this a better book.

about this book

Privacy-Preserving Machine Learning is a comprehensive guide written to help machine learning (ML) enthusiasts avoid data privacy leakages through their ML applications. It begins with some practical use cases and scenarios involving privacy considerations in modern data-driven applications. Then it introduces different techniques for building privacy-assured ML applications.

Who should read this book

Privacy-Preserving Machine Learning is designed for intermediate-level data science and ML enthusiasts (i.e., people with some experience in ML) and anyone who would like to learn about privacy-preserving techniques for ML so they can integrate them into their own applications. Although privacy and security concepts are generally mathematical and hard to follow, this book breaks complex algorithms into digestible pieces to make them easy to follow, and it provides a series of hands-on exercises and examples.

How this book is organized: A road map

The book has three parts with 10 chapters.

Part 1 explains what privacy-preserving ML is and how differential privacy can be used in practical use cases:

  • Chapter 1 discusses privacy considerations in ML with an emphasis on how severe the privacy threats are when private data is exposed.

  • Chapter 2 introduces the core concepts of differential privacy and formulates the widely adopted differential privacy mechanisms in use today that have served as essential building blocks in various privacy-preserving algorithms and applications.

  • Chapter 3 mainly covers the advanced design principles of differentially private ML algorithms. Toward the latter part of the chapter, we introduce a case study that walks you through the process of designing and analyzing a differentially private algorithm.

Part 2 extends the discussion to a localized variant of differential privacy, called local differential privacy, and discusses how to generate synthetic data for privacy assurance purposes:

  • Chapter 4 introduces the core concepts and definitions of local differential privacy.

  • Chapter 5 covers the advanced mechanisms of local differential privacy by considering various data types and real-world application scenarios. We also provide a case study on local differential privacy that guides you through the process of designing and analyzing the algorithm.

  • Chapter 6 introduces the concepts and techniques involved in synthetic data generation by discussing how to design a privacy-preserving synthetic data generation scheme for ML tasks.

Part 3 covers the next-level core concepts required to build privacy-assured ML applications:

  • Chapter 7 introduces the importance of privacy preservation in data mining applications, widely used privacy-protection mechanisms, and their characteristics in data mining operations.

  • Chapter 8 extends the discussion of privacy assurance in data mining by covering common privacy models, their characteristics in data mining operations when processing and publishing data, and various threats and vulnerabilities.

  • Chapter 9 introduces compressive privacy for ML and its design and implementation.

  • Chapter 10 puts these concepts together to design a privacy-enhanced platform for research data protection and sharing.

In general, we encourage you to read the first few chapters carefully so you understand the core concepts and the importance of privacy preservation for ML applications. The remainder of the chapters discuss different levels of the core concepts and best practices, and they can mostly be read out of order, based on your particular needs. At the end of each core topic, a case study is introduced with a more comprehensive and thorough analysis of a selected algorithm, which will be of particular interest to readers who want to know more about the process of designing and analyzing privacy-enhanced ML algorithms.

About the code

This book contains many examples of source code, both in numbered listings and in line with normal text. In both cases, the source code is formatted in a fixed-width font like this to separate it from ordinary text. Most of the code is written in the Python language, but some of the use case experiments are presented in Java. Readers are expected to know the basic syntax and how to write and debug Python and Java code. Readers must also be familiar with certain Python scientific computations and machine learning packages, such as NumPy, scikit-learn, PyTorch, TensorFlow, and so on.
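To give you a feel for the style of code used throughout the book, here is a minimal, hypothetical sketch (not one of the book's listings; the function and variable names are our own) of the Laplace mechanism for differential privacy, implemented with only NumPy:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return a differentially private estimate of a numeric query result.

    Noise is drawn from a Laplace distribution whose scale is
    sensitivity / epsilon, the standard calibration for epsilon-DP.
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale)
    return true_value + noise

# Example: privately release the count of records in a dataset.
# A counting query has sensitivity 1, because adding or removing
# one person changes the count by at most 1.
ages = np.array([34, 45, 23, 67, 41, 29, 52])
true_count = len(ages)
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(f"True count: {true_count}, private count: {private_count:.2f}")
```

The listings in the book follow this general pattern: short, self-contained functions accompanied by annotations that explain the privacy-related parameters.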

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In some cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/privacy-preserving-machine-learning. The complete code for the examples in the book is available for download from the Manning website at https://www.manning.com/books/privacy-preserving-machine-learning, and from GitHub at https://github.com/nogrady/PPML/.

liveBook discussion forum

Purchase of Privacy-Preserving Machine Learning includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the authors and other users. To access the forum, go to https://livebook.manning.com/book/privacy-preserving-machine-learning/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contributions to the forum remain voluntary (and unpaid). We suggest you try asking them some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

about the authors

J. Morris Chang has been a professor in the electrical engineering department at the University of South Florida since 2016, with prior faculty positions at Iowa State University (2001-2016), the Illinois Institute of Technology (1995-2001), and the Rochester Institute of Technology (1993-1995). Before joining academia, he was a computer engineer at AT&T Bell Labs (1988-1990). His recent research covers a broad spectrum of cybersecurity subject areas, from authentication and malware detection to privacy-enhancing technologies and security in machine learning, and has been funded by various Department of Defense (DoD) agencies. Dr. Chang received his PhD in computer engineering from North Carolina State University. He received the IIT University Excellence in Teaching Award in 1999 and was inducted into the NC State University ECE Alumni Hall of Fame in 2019. Over the past 10 years, he has served as the lead principal investigator on various projects funded by DoD agencies. Morris has over 196 publications in refereed journals and conference proceedings. He has also served the Institute of Electrical and Electronics Engineers (IEEE) in various positions, including associate editor in chief of IEEE’s IT Professional magazine (2014-2018), associate editor of the IEEE Transactions on Reliability journal (2022), and program chair in chief of COMPSAC 2019 (the IEEE Computer Society signature conference on Computers, Software, and Applications).

Di Zhuang is a security engineer at Snap Inc. His degrees include a bachelor of information security and a bachelor of laws from Nankai University, Tianjin, China, and a PhD in electrical engineering from the University of South Florida. He is an energetic, skilled security and privacy researcher with interest, expertise, and experience in privacy by design, differential privacy, privacy-preserving machine learning, social network science, and network security. He conducted privacy-preserving machine learning research under the DARPA Brandeis program from 2015 to 2018.

Dumindu Samaraweera is an assistant research professor at the University of South Florida. Dumindu received his BS in computer systems and networking from Curtin University, Australia; a BS in information technology from the Sri Lanka Institute of Information Technology, Sri Lanka; and an MS in enterprise application development from Sheffield Hallam University, UK. He obtained his PhD in electrical engineering from the University of South Florida (USF), concentrating his research on cybersecurity and data science. His doctoral dissertation, “Security and Privacy Enhancement Technologies in the Deep Learning Era,” addresses the privacy and security issues identified in today’s data-driven applications and provides in-depth solutions to mitigate such problems. Over the years, he has worked on multiple large-scale US Department of Defense-funded cybersecurity research projects. Before joining USF, he worked in industry as a software engineer/electrical engineer for more than six years, managing and deploying enterprise-level solutions.

The technical editor for this book, Wilko Henecka, is a senior software engineer at Ambiata, building privacy-preserving software. He holds a PhD in mathematics from Adelaide University and a master’s degree in IT security from Ruhr-University Bochum.

about the cover illustration

The figure on the cover of Privacy-Preserving Machine Learning is “Femme Acadienne”, or “Acadian Woman”, taken from a collection by Jacques Grasset de Saint-Sauveur, published in 1788. Each illustration is finely drawn and colored by hand.

In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
