Preface

Machine learning is the most important new technology today for getting more out of data. It can, for example, reveal patterns that aren’t obvious, but it requires data – lots of it. Data gathering isn’t just about data, either: it affects users, and it requires applications to clean, manipulate, and analyze the data. Scientists use machine learning to discover new techniques and to create new kinds of data, such as art generated from existing inputs, or to advance medicine through better imaging. Businesses use machine learning to perform tasks such as detecting credit card fraud, monitoring networks, and implementing factory processes, and to achieve all sorts of other goals where humans and AI work side by side.

Hackers don’t always damage data; sometimes they steal it or use it to perform social attacks on a business. Sometimes they simply want money or other goods, and machine learning offers an avenue for acquiring them. A hacker may not steal anything at all – perhaps the target is someone’s reputation. It may surprise you to learn that hackers often use machine learning applications themselves to perform a kind of dance with your machine learning-based security to overcome it. However, hackers have behavioral patterns, and knowing how to detect those patterns is important in the modern computing environment.

Obtaining data in an ethical manner is important because the very act of behaving ethically reduces the security risk associated with data. However, hackers don’t necessarily target users and their data. Perhaps they’re interested in your organization’s trade secrets or in committing fraud. They might simply be interested in lurking in the background and committing mischief. So, just keeping your data secure isn’t enough to protect your machine learning investment. You need to do more.

This book helps you get the big picture from a machine learning perspective, drawing on the latest research into the methods that hackers use to break into your system. It’s about the whole system, not just your application. You will discover techniques that help you gather data ethically and keep it safe, while also preventing all sorts of illegal access methods from even occurring. In fact, you will use machine learning as a tool to keep hackers at bay and to discover their true intent toward your organization.

Who this book is for

Whether you’re a data scientist, researcher, or manager interested in machine learning techniques from various perspectives, you will need this book because security has already become a major headache for all three groups. The problem with most resources is that they’re written by Ph.D. candidates in a language that only they understand. This book presents security in a way that’s easy to understand and employs a host of diagrams to explain concepts to visual learners. The emphasis is on real-world examples at both theoretical and hands-on levels. You’ll find links to a wealth of examples of real-world break-ins and explanations of why and how they occurred and, most importantly, how you can overcome them.

This book does assume that you’re familiar with machine learning concepts, and it helps if you already know a programming language, preferably Python. The hands-on Python code is mostly meant to provide details for data scientists and researchers who need to see security concepts in action, rather than only at a theoretical level. A few examples, such as the Pix2Pix GAN in Chapter 10, require an intermediate level of programming knowledge, but most examples are written in a manner that everyone can use.

What this book covers

Chapter 1, Defining Machine Learning Security, provides an overview of what machine learning is all about, how it’s affected by security issues, and what impact security can have on the use of your applications. This chapter also contains guidelines on how to configure your system for use with the source code examples.

Chapter 2, Mitigating Risk at Training by Validating and Maintaining Datasets, explores why it’s essential to ensure that the data you’re using is actually the data you think you’re using, because various forms of corruption and data manipulation can skew your model.

Chapter 3, Mitigating Inference Risk by Avoiding Adversarial Machine Learning Attacks, gives an overview of the various methods that attackers use to interfere directly with machine learning models, through techniques such as evasion attacks and model poisoning.

Chapter 4, Considering the Threat Environment, provides an overview of how hackers target machine learning models and what goals they have in doing so. You will also discover some basic coded techniques for avoiding many machine learning attacks by using standard methodologies.

Chapter 5, Keeping Your Network Clean, gives detailed information on how network attacks work and what you can do to detect them in various ways, including machine learning techniques as your defense. In addition, you will discover how you can use predictive techniques to determine where a hacker is likely to strike next.

Chapter 6, Detecting and Analyzing Anomalies, provides the details on determining whether outliers in your data are anomalies that need mitigation or novelties that require observation as part of a new trend. You will see how to perform anomaly detection using machine learning techniques.

Chapter 7, Dealing with Malware, covers the various kinds of malware and what to look for in your own environment. This chapter shows how to take an executable apart so that you can see how it’s put together and then use what you learn to generate machine learning features for use in detection algorithms.

Chapter 8, Locating Potential Fraud, explores the sources of fraud today (and it’s not just hackers), what you can do to detect potential fraud, and how you can ensure that the model you build will actually detect the fraud with some level of precision. The techniques this chapter uses to assess model quality also apply to other kinds of machine learning models.

Chapter 9, Defending Against Hackers, examines the psychology of hackers by looking at their goals and motivations. You will come to understand why simply building the security wall higher and higher doesn’t work, and what else you can do in addition to building new security protections for your system.

Chapter 10, Considering the Ramifications of Deepfakes, looks at the good and the bad of deepfake technology. You will get an overview of the ramifications of deepfake technology for research, business, and personal use today. This chapter also demonstrates one technique for creating a deepfake model in detail.

Chapter 11, Leveraging Machine Learning for Hacking, explains how hackers view machine learning and how they’re apt to build their own models to use against your organization. We will consider the smart bot threat in detail.

Chapter 12, Embracing and Incorporating Ethical Behavior, explains how behaving ethically not only ensures that you meet the privacy and security requirements that the law may specify, but also improves security directly, because properly sanitized datasets have natural security features of their own. In addition, you will discover how using properly vetted datasets saves you time, money, and effort and produces models that actually perform better.

To get the most out of this book

This book assumes that you’re a manager, researcher, or data scientist with at least a passing understanding of machine learning and machine learning techniques, but it doesn’t assume detailed knowledge. To use the example code, it also helps to have some experience working with Python, because the book doesn’t provide any Python tutorials. All of the coded examples have been tested with both Google Colab and Anaconda. The Setting up for the book section of Chapter 1, Defining Machine Learning Security, provides detailed setup instructions for the book examples.

The advantages of using Google Colab are that you can code anywhere (even on your smartphone or television set, both of which other readers have tested) and you don’t have to set anything up. The disadvantages are that not all of the book examples will run in this environment (especially those in Chapter 7) and your code will tend to run slower (especially in Chapter 10). When working with Google Colab, all you need to do is direct your browser to https://colab.research.google.com/notebooks/welcome.ipynb and create a new notebook.
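Because some examples (especially those in Chapter 7) won’t run in Google Colab, it can help to have a notebook check where it’s running before doing anything environment-specific. The following is a minimal sketch, not code from the book’s repository. It assumes the common convention that a Colab runtime has already imported the google.colab module, and the function name running_in_colab() is hypothetical.

```python
import sys


def running_in_colab(modules=None):
    """Return True if google.colab is loaded, which we assume means a Colab runtime.

    The modules parameter exists only so the check can be exercised outside
    Colab; by default it inspects sys.modules for the current interpreter.
    """
    if modules is None:
        modules = sys.modules
    return "google.colab" in modules


if __name__ == "__main__":
    if running_in_colab():
        print("Running in Google Colab: some book examples may not work here.")
    else:
        print("Running in a local environment such as Anaconda.")
```

Checking sys.modules rather than attempting an import avoids loading anything new; it simply reports what the runtime has already loaded.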

The advantage of using Anaconda is that you have more control over your work environment and you can perform more tasks. The disadvantage is that, for most of the book examples, you need a desktop system with the required hardware and software, as described in the following table. (The MLSec; 01; Check Versions.ipynb example shows how to verify the version numbers of your software.) Some examples have additional setup requirements, which are covered as part of the example description (for example, when creating the Pix2Pix GAN in Chapter 10, you need to install and configure TensorFlow).

General software covered in the book:

  - Anaconda 3, 2020.07
  - Python 3.8 or higher (version 3.9.x is highly recommended, versions above 3.10.7 aren’t recommended or tested)
  - NumPy 1.18.5 or greater (version 1.21.x is highly recommended)
  - Scikit-learn 0.23.1 or greater (version 1.0.x is highly recommended)
  - Pandas 1.1.3 or greater (version 1.4.x is highly recommended)

Operating system and hardware requirements:

  - Windows 7, 10, or 11
  - macOS 10.13 or above
  - Linux (Ubuntu, RedHat, and CentOS 7+ all tested)
  - The test system uses this hardware, which is considered minimal:
      - Intel i7 CPU
      - 8 GB RAM
      - 500 GB hard drive
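If you want a quick check of your own setup against the minimum versions listed above, the following is a minimal sketch. It is not the book’s MLSec; 01; Check Versions.ipynb notebook. It uses importlib.metadata from the standard library (Python 3.8 and later) and looks packages up by their distribution names, so scikit-learn rather than sklearn. The helper names parse_version(), meets_minimum(), and check_installed() are hypothetical, and the version parsing is deliberately simplified.

```python
import platform
from importlib import metadata

# Minimum versions taken from the requirements list in this preface.
MINIMUMS = {"numpy": "1.18.5", "scikit-learn": "0.23.1", "pandas": "1.1.3"}
MIN_PYTHON = "3.8"


def parse_version(text):
    """Turn a version string such as '1.21.6' or '2.0.0rc1' into a tuple of ints.

    Only the leading digits of each dot-separated piece are kept, so
    pre-release suffixes like 'rc1' are ignored (a simplification).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.18' and '1.18.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Map each package name to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    python_version = platform.python_version()
    print(f"Python {python_version}: ok={meets_minimum(python_version, MIN_PYTHON)}")
    for name, (version, ok) in check_installed().items():
        print(f"{name} {version or 'not installed'}: ok={ok}")
```

For anything beyond a quick sanity check, the third-party packaging library’s Version class implements the full Python versioning rules and is a more robust choice than this hand-rolled comparison.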

When working with any version of the book, downloading the source code is highly recommended to avoid typos. Copying and pasting code from the digital version of the book will very likely result in errors. Remember that Python depends on formatting (indentation) to define structure and to show where programming constructs such as for loops begin and end. The source code downloading instructions appear in the next section.
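To see why lost formatting matters, consider the two hypothetical functions below. They contain exactly the same statements; the only difference is the indentation of one line, which moves it from inside the for loop to after it and changes the result.

```python
def doubled_inside(values):
    """Append inside the loop: runs once per item."""
    result = []
    for value in values:
        doubled = value * 2
        result.append(doubled)  # indented: part of the loop body
    return result


def doubled_after(values):
    """Append after the loop: runs once, using only the last item."""
    result = []
    for value in values:
        doubled = value * 2
    result.append(doubled)  # dedented: runs after the loop ends
    return result


print(doubled_inside([1, 2, 3]))  # [2, 4, 6]
print(doubled_after([1, 2, 3]))   # [6]
```

The second version also fails with an UnboundLocalError if values is empty, because doubled is never assigned. A copy-and-paste that shifts a single line’s indentation can produce exactly this kind of silent change.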

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-Security-Principles or from John’s website at http://www.johnmuellerbooks.com/source-code/. Whenever the code is updated, the update appears both in the GitHub repository and on John’s website.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “For example, Remove_Stop_Words() relies on a list comprehension to perform the actual processing.”

A block of code is set as follows:

import getpass
user = getpass.getuser()
pwd = getpass.getpass("User Name : %s" % user)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

import getpass
user = getpass.getuser()
pwd = getpass.getpass("User Name : %s" % user)

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message. If you have a book content-specific question, please contact John at [email protected] for quick and courteous service. Your feedback is essential to helping me produce better books!

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Expanded book content: As I get input from readers, I often provide additional book insights and updated procedures on my blog at http://blog.johnmuellerbooks.com/.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Machine Learning Security Principles, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below

https://packt.link/free-ebook/9781804618851

  2. Submit your proof of purchase
  3. That’s it! We’ll send your free PDF and other benefits to your email directly