Chapter 10. Adversarial examples


This chapter covers

A fascinating research area that precedes GANs and has an interwoven history
Deep learning approaches in a computer vision setting
Our own adversarial examples with real images and noise

Over the course of this book, you have come to understand GANs as an intuitive concept. However, in 2014, GANs seemed like a massive leap of faith, especially for those unfamiliar with the emerging field of adversarial examples, including Ian Goodfellow’s and others’ work in this field.^[1] This chapter dives into adversarial examples—specially constructed examples that make other classification algorithms fail catastrophically.

¹

See “Intriguing Properties of Neural Networks,” by Christian Szegedy et al., 2014, https://arxiv.org/pdf/1312.6199.pdf.

We also talk about their connections to GANs and how and why adversarial learning is still largely an unsolved problem in ML—an important but rarely discussed flaw of the current approaches. That is true even though adversarial examples have an important role to play in ML robustness, fairness, and (cyber)security.

There is no denying we have made substantial progress in machine learning’s capacity to match and surpass human-level performance over the last five years, for example, in computer vision (CV) classification tasks or in playing games.^[2] However, looking only at metrics and ROC curves^[3] is insufficient for understanding (a) why neural networks make the decisions they do (how they work) and (b) what errors they are prone to making. This chapter touches on the first and dives into the second. Before we begin, note that although this chapter deals almost exclusively with CV problems, adversarial examples have been identified in diverse areas such as text, and even in humans.^[4]

²

What constitutes human-level performance in vision-classification tasks is a complicated topic. However, at least in, for example, Dota 2 and Go, AI has beaten human experts by a substantial margin.

³

A receiver operating characteristic (ROC) curve shows the trade-offs between false positives and false negatives as the decision threshold changes. We also encountered them in chapter 2. For more details, Wikipedia has an excellent explanation.

⁴

See “Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey,” by Wei Emma Zhang et al., 2019, http://arxiv.org/abs/1901.06796. See also “Adversarial Examples That Fool Both Computer Vision and Time-Limited Humans,” by Gamaleldin F. Elsayed et al., 2018, http://arxiv.org/abs/1802.08195.

First of all, when we speak about neural networks’ performance, we frequently read that their error rate is lower than that of humans on the large ImageNet dataset. This often-cited statistic—which started more as an academic joke than anything else—belies the performance differences hidden underneath this average. While humans’ error rate tends to be driven mostly by their inability to distinguish between different breeds of dogs that appear prominently in this dataset, the machine learning failures are much more ominous. Upon further investigation, adversarial examples were born.

Unlike humans, CV algorithms make mistakes that are very different in nature, and they can make them on inputs that are very close to the training data. Because the algorithm has to make predictions for every possible picture, it has to extrapolate between the isolated and far-apart individual instances it has seen in the training data, even if we have lots of them.

When we have trained networks such as Inception V3 and VGG-19, we have found an amazing way of making image classification work on a thin manifold around the training data. But when people tried to poke holes in the classification ability of these algorithms, they discovered a cosmic crater—current machine learning algorithms get easily fooled by even minor distortions. Virtually all major successful machine learning algorithms to date suffer from this flaw to some extent, and, indeed, some speculate that is why machine learning works at all.

Note

In supervised settings, think of our training set. We have a training manifold, which is just a fancy word describing the high-dimensional distribution in which our examples live. For example, our 300 × 300 pixel images live in a 270,000-dimensional space (300 × 300 × 3 colors). That makes training very complicated.

10.1. Context of adversarial examples

To start, we want to quickly touch on why we included this chapter toward the end of the book:

With adversarial examples, we are typically trying to generate new examples that fool our existing systems into misclassifying the input. We usually do this either as evil attackers or simply as researchers checking how robustly our system will behave. Adversarial examples are about as closely related to GANs as a topic gets, though important differences exist.
This will give you a sense of why GANs can be so hard to train and why our existing systems are so fragile.
Adversarial examples allow for a different set of applications from GANs, and we hope to give you at least the basics of their capabilities.

In terms of applications, adversarial examples are interesting for several reasons:

As discussed, adversarial examples can be used for malicious purposes, so it is important to test for robustness in critical systems. What if an attacker could easily fool a facial-recognition system to gain access to your phone?
They help us understand machine learning fairness, which is a topic of growing importance. We can use adversarially learned representations that are useful for classification but do not allow an attacker to recover protected facts; this is probably one of the best ways of ensuring that our ML is not discriminating against anyone.
In a similar vein, we can use adversarial learning to protect the privacy of sensitive—perhaps medical or financial—information about individuals. In this case, we are simply focusing on information about individuals not being recoverable.

As current research stands, learning about adversarial examples is the only way to start to understand adversarial defenses, as most papers begin with a description of the types of attacks they defend against and only then try to solve them. At the time of writing this book, no universal defenses work against all types of attack. But whether this is a good reason to study them depends on your view on adversarial examples. We decided not to cover defenses in detail, beyond the high-level ideas toward the end of this chapter, because anything more is beyond the scope of this book.

10.2. Lies, damned lies, and distributions

To truly understand adversarial examples, we must come back to the domain of CV classification tasks, partially to understand how difficult a task it is. Recall that going from raw pixels to classifying sets of images is challenging.

This is in part because, in order to have a truly generalizable algorithm, we have to make sensible predictions on data nowhere near anything that we have seen in the training set. Moreover, the pixel-level differences between the image at hand and the closest image in the training set of the same class are large, even when we slightly change the angle at which the picture was taken.

When we have our training set of 100,000 examples of 300 × 300 images in RGB space, we have to somehow deal with 270,000 dimensions. When we consider all possible images (not the ones that we actually observe, but the ones that could happen), the pixel value of each dimension is independent of the other dimensions, because we can always generate a valid picture by rolling a hypothetical 256-sided die 270,000 times. Therefore, we theoretically have 256^270,000 possible images (a number that is 650,225 digits long) in 8-bit color space.
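The two numbers in this paragraph can be checked in a few lines of Python. This is a small sketch: rather than materializing 256^270,000, it counts the decimal digits with a base-10 logarithm, using digits(n) = floor(log10(n)) + 1.

```python
import math

# One axis per pixel per color channel for a 300 x 300 RGB image.
height, width, channels = 300, 300, 3
dims = height * width * channels

# Each of the `dims` values can take 256 levels, so there are 256**dims images.
# Count the decimal digits of that number via log10 instead of building it.
log10_count = dims * math.log10(256)
digits = math.floor(log10_count) + 1

print(f"{dims:,} dimensions")
print(f"256**{dims:,} has {digits:,} decimal digits")
```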

We would need a lot of examples to cover even 1% of this space. Of course, most of these images would not make any sense. Frequently, our training set is a lot sparser than that, so we need our algorithms to train on this relatively limited data and still extrapolate into regions they have not seen at all. For most possible inputs, the algorithm has most likely seen nothing nearby in the training set.

Note

Having 100,000 examples is frequently cited as a minimum at which deep learning algorithms should really start to shine.

Algorithms have to generalize meaningfully: they must be able to fill in the huge parts of the space where they have not seen any examples. Computer vision algorithms work mostly because they can come up with good guesses for these vast swaths of unseen space, but this strength is also their greatest weakness.

10.3. Use and abuse of training

In this section, we introduce two ways of thinking about adversarial examples: one from first principles and the other by analogy. The first way to think about adversarial examples is to start from the way machine learning classifiers are trained. Remember that these are networks with tens of millions of parameters. Throughout training, we update them so that the predicted class matches the label provided in the training set. We need to find just the right parameter updates, which is what stochastic gradient descent (SGD) allows us to do.

Now think back to the simple classifier days, before you knew a lot about GANs. Here we have some sort of learnable classification function f_θ(x) (for example, a deep neural network, or DNN), which is parametrized by θ (the parameters of the DNN), takes x (for example, an image) as input, and produces a classification ŷ. At training time, we then take ŷ and compare it with the true y, which is how we get our loss (L). We then update the parameters of f_θ(x) such that the loss is minimized. Equations 10.1, 10.2, and 10.3 summarize.^[5]

⁵

Please remember, this is just a quick summary, and we have to skip over some details, so if you can point them out—great. If not, we suggest picking up a book such as Deep Learning with Python by François Chollet (Manning, 2017) to brush up on the specifics.

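equation 10.1.

$$\hat{y} = f_\theta(x)$$

equation 10.2.

$$L = \lVert y - \hat{y} \rVert$$

equation 10.3.

$$\min_\theta \lVert y - \hat{y} \rVert \quad \text{s.t.} \quad \hat{y} = f_\theta(x)$$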

In essence, we have defined prediction as the output of the neural net after being fed an example (equation 10.1). Loss is some form of the difference between the true and predicted label (equation 10.2). The overall problem is then phrased as trying to minimize the difference between the true and predicted labels over the parameters of the DNN, which then constitute the prediction given an example (equation 10.3).

This all works great, but how do we actually minimize our classification loss? How do we solve the optimization problem as phrased in equation 10.3? We usually use an SGD-based method on batches of x: we take the derivative of the loss function with respect to the current parameters (θ_t), multiply it by our learning rate (α), and subtract the result from θ_t to obtain our new parameters (θ_{t+1}). See equation 10.4.

equation 10.4.

$$\theta_{t+1} = \theta_t - \alpha \frac{\partial L}{\partial \theta_t}$$
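To make equations 10.1 through 10.4 concrete, here is a minimal sketch in plain Python. It assumes a tiny logistic-regression model standing in for the DNN f_θ and uses binary cross-entropy as the "form of difference" in the loss; the four-point data set is made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Equation 10.1: y_hat = f_theta(x), here a logistic-regression 'network'."""
    w, b = theta
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(y, y_hat):
    """Equation 10.2: binary cross-entropy, one concrete difference between y and y_hat."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(y_hat + eps) + (1 - y) * math.log(1 - y_hat + eps))

def sgd_step(theta, x, y, lr):
    """Equation 10.4: theta_{t+1} = theta_t - alpha * dL/dtheta_t (one example)."""
    w, b = theta
    err = predict(theta, x) - y  # dL/dz for sigmoid + cross-entropy
    return [wi - lr * err * xi for wi, xi in zip(w, x)], b - lr * err

# Hypothetical toy data: two features per example, binary labels.
data = [([0.0, 1.0], 0), ([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.1, 0.8], 0)]
theta = ([0.0, 0.0], 0.0)

def mean_loss(theta):
    return sum(loss(y, predict(theta, x)) for x, y in data) / len(data)

before = mean_loss(theta)
for _ in range(200):  # Equation 10.3: minimize the loss over theta
    for x, y in data:
        theta = sgd_step(theta, x, y, lr=0.5)
after = mean_loss(theta)
print(f"mean loss before: {before:.3f}, after: {after:.3f}")
```

With all parameters at zero, every prediction is 0.5, so the starting loss is ln 2 ≈ 0.693; the SGD loop drives it down.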

This was the quickest introduction to deep learning you will ever find. But now that you have this context, think about whether this powerful tool (SGD) could be used for other purposes as well. For instance, what happens when we take a step up the loss space rather than down? It turns out that maximizing the error, rather than minimizing it, is much easier, and also important. And like many great discoveries, it started as a seeming bug that turned into a hack: what if we start updating the pixels rather than the weights? If we update them maliciously, adversarial examples happen.
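The "update the pixels, not the weights" idea can be sketched on the same kind of toy model. This sketch assumes a frozen logistic-regression classifier with hypothetical weights; it differentiates the loss with respect to the input instead of the parameters, and adds the gradient (gradient ascent) instead of subtracting it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already-trained weights; they stay frozen throughout.
w = [2.0, -3.0, 1.5]
b = 0.1

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(y, x):
    p = predict(x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(x, y):
    """dL/dx for sigmoid + cross-entropy: (p - y) * w.
    Same chain rule as for dL/dw, but with respect to the input."""
    err = predict(x) - y
    return [err * wi for wi in w]

x = [0.8, 0.1, 0.5]  # a clean input whose true label is 1
y = 1
alpha = 0.1

x_adv = list(x)
for _ in range(10):
    grad = input_gradient(x_adv, y)
    # Gradient ascent: move the input *up* the loss surface.
    x_adv = [xi + alpha * gi for xi, gi in zip(x_adv, grad)]

print(f"loss on clean input:    {loss(y, x):.3f}")
print(f"loss on modified input: {loss(y, x_adv):.3f}")
```

Because this loss is convex in x, every small ascent step is guaranteed to increase it, which makes the effect easy to see on a toy model.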

Some of you may be confused about this quick recap of SGD, so let’s remind ourselves what a typical loss space could look like in figure 10.1.

Figure 10.1. A typical loss space, the kind our deep learning algorithms actually have to navigate. On the left are 2D contour lines of equal loss; on the right is a 3D rendering of what a loss space may look like. Remember the mountaineering analogy from chapter 6?

(Source: “Visualizing the Loss Landscape of Neural Nets,” by Tom Goldstein et al., 2018, https://github.com/tomgoldstein/loss-landscape.)

The second useful (though imperfect) mental model for adversarial examples is by analogy. You may think of adversarial examples as Conditional GANs like those we encountered in the preceding two chapters. With adversarial examples, we are conditioning on an entire image and trying to produce a domain-transferred or similar image, except in a domain that fools the classifier. The “generator” can be plain stochastic gradient ascent that adjusts the image to fool some other classifier.

Whichever of the two ways makes sense to you, let’s now dive straight into adversarial examples and what they look like. They were discovered when researchers observed how easily slightly altered images get misclassified. One of the first methods to achieve this is the fast gradient sign method (FGSM), which is as simple as our previous description.

You start with the gradient from the update in equation 10.4, but taken with respect to the input pixels rather than the weights, look at its sign, and then make a small step in the opposite direction of the descent update, so that the loss goes up. The resulting images frequently look (almost) identical to the originals! A picture is worth a thousand words to show you how little noise is needed; see figure 10.2.

Figure 10.2. A bit of noise makes a lot of difference. The picture in the middle has the noise (difference) applied to it (the picture to the right). Of course, the right picture is heavily amplified—approximately 300 times—and shifted so that it can create a meaningful image.
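A one-step FGSM can be sketched without any library, assuming a hypothetical linear (logistic) classifier whose input gradient is known in closed form. This is not foolbox's implementation; the weights, the four "pixels" scaled to [0, 1], and epsilon = 0.2 are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained linear classifier over four pixel values in [0, 1].
w = [1.2, -0.8, 0.5, -1.0]
b = 0.0

def predict(x):
    """Probability of label 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, epsilon):
    """x_adv = clip(x + epsilon * sign(dL/dx)) for sigmoid + cross-entropy."""
    err = predict(x) - y                 # dL/dz
    grad = [err * wi for wi in w]        # dL/dx
    return [min(1.0, max(0.0, xi + epsilon * sign(gi))) for xi, gi in zip(x, grad)]

x = [0.6, 0.4, 0.5, 0.3]  # clean input, true label 1
y = 1
x_adv = fgsm(x, y, epsilon=0.2)

print(f"clean:       p(label 1) = {predict(x):.3f}")
print(f"adversarial: p(label 1) = {predict(x_adv):.3f}")
print("largest per-pixel change:", max(abs(a - c) for a, c in zip(x_adv, x)))
```

Every pixel moves by exactly epsilon (unless clipped), yet the decision flips from label 1 to label 0, which is the FGSM trade-off in miniature: a bounded change per pixel, chosen in the direction that hurts most.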

Now we run a ResNet-50 pretrained classifier on this unmodified vacation image and check the top three predictions, shown in table 10.1; drumroll, please.

Table 10.1. Original image predictions

Order	Class	Confidence
First	mountain_tent	0.6873
Second	promontory	0.0736
Third	valley	0.0717

The top three are all sensible, with mountain_tent taking the top spot, as it should. Table 10.2 shows the adversarial image predictions. The top three miss mountain_tent completely, with some suggestions that at least match the outdoors, but even the modified image is clearly not a suspension bridge.

Table 10.2. Adversarial image predictions

Order	Class	Confidence
First	volcano	0.5914
Second	suspension_bridge	0.1685
Third	valley	0.0869

This is how much we can distort the prediction, with a budget of only approximately 200 pixel values—the equivalent of taking a single almost-black pixel and turning it into an almost-white pixel—spread across the whole image.
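Perturbation budgets are usually stated as a norm of the difference between the two images. The text does not say which norm its 200-value budget uses; assuming it is the L1 norm (the sum of absolute per-channel changes), this small sketch shows that the same L1 budget can be one drastic change or many imperceptible ones. The flat gray image and the helper name `perturbation_norms` are hypothetical.

```python
def perturbation_norms(original, perturbed):
    """Size of a perturbation under a few common norms (flat lists of 0-255 values)."""
    diff = [p - o for p, o in zip(perturbed, original)]
    return {
        "l0": sum(1 for d in diff if d != 0),    # how many values changed
        "l1": sum(abs(d) for d in diff),         # total change
        "linf": max(abs(d) for d in diff),       # largest single change
    }

size = 224 * 224 * 3          # a flattened RGB image at ResNet-50's input size
original = [30] * size        # a flat, dark image, for illustration only

concentrated = list(original)
concentrated[0] = 230         # one almost-black value turned almost white

spread = list(original)
for i in range(200):
    spread[i] += 1            # 200 values, each nudged by one level

print(perturbation_norms(original, concentrated))  # {'l0': 1, 'l1': 200, 'linf': 200}
print(perturbation_norms(original, spread))        # {'l0': 200, 'l1': 200, 'linf': 1}
```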

A somewhat scary thing is how little code it takes to create this whole example. In this chapter, we’ll use an amazing library called foolbox, which is designed specifically to make adversarial attacks easier and provides many convenience methods for creating adversarial examples. Without further ado, let’s dive into it. We start with our well-known imports, plus foolbox.

Listing 10.1. Our trusty imports

import numpy as np
from keras.applications.resnet50 import ResNet50
from foolbox.criteria import Misclassification, ConfidentMisclassification
from keras.preprocessing import image as img
from keras.applications.resnet50 import preprocess_input, decode_predictions
import matplotlib.pyplot as plt
import foolbox
import pprint as pp
import keras
%matplotlib inline

Next, we define a convenience function to load in more images.

Listing 10.2. Helper function

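def load_image(img_path: str):
  # Load the file and resize it to the 224 x 224 input that ResNet-50 expects
  image = img.load_img(img_path, target_size=(224, 224))
  plt.imshow(image)              # display the image for reference
  x = img.img_to_array(image)    # PIL image -> float32 array, shape (224, 224, 3), RGB
  return x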

image = load_image('DSC_0897.jpg')

Next, we put Keras into inference mode, download a pretrained ResNet-50 through the Keras convenience function, and wrap it in a foolbox model.

Listing 10.3. Creating tables 10.1 and 10.2

keras.backend.set_learning_phase(0)                                      # 1
kmodel = ResNet50(weights='imagenet')
preprocessing = (np.array([104, 116, 123]), 1)

fmodel = foolbox.models.KerasModel(kmodel, bounds=(0, 255),              # 2
     preprocessing=preprocessing)                                        # 2

to_classify = np.expand_dims(image, axis=0)                              # 3
# preprocess_input converts RGB to BGR and subtracts the ImageNet mean; it can
# modify its argument in place, so pass a copy to keep `image` untouched
preds = kmodel.predict(preprocess_input(to_classify.copy()))             # 4
print('Predicted:')
pp.pprint(decode_predictions(preds, top=20)[0])
label = np.argmax(preds)                                                 # 5

image = image[:, :, ::-1]                                                # 6
attack = foolbox.attacks.FGSM(fmodel, threshold=.9,                      # 7
     criterion=ConfidentMisclassification(.9))                           # 7
adversarial = attack(image, label)                                       # 8

# adversarial is already BGR (like image now), so only subtract the mean
new_preds = kmodel.predict(np.expand_dims(adversarial, axis=0) - preprocessing[0])  # 9
print('Predicted:')
pp.pprint(decode_predictions(new_preds, top=20)[0])

1 Puts Keras into inference mode (learning phase 0) and instantiates the model
2 Creates the foolbox model object from the Keras model
3 We make the image (1, 224, 224, 3) so that it fits ResNet-50, which expects images for predictions to be in batches.
4 We call predict and print the results.
5 Gets the index of the highest number, as a label to be used later
6 ::-1 reverses the color channels, because Keras ResNet-50 expects BGR instead of RGB.
7 Creates the attack object, setting high misclassification criteria
8 Applies attack on source image
9 Gets the new predictions on the adversarial image

That’s how easy it is to create these examples! Now you may be thinking that maybe only ResNet-50 suffers from these examples. Well, we have some bad news for you. As we tested various code setups for this chapter, ResNet not only proved to be the hardest classifier to break, but it is also the uncontested winner on DAWNBench in every ImageNet category (ImageNet being the most challenging task in the CV category on DAWNBench), as shown in figure 10.3.^[6]

⁶

See “Image Classification on ImageNet,” at DAWNBench, https://dawn.cs.stanford.edu/benchmark/#imagenet.

Figure 10.3. DAWNBench is a great place to see the current state-of-the-art models and ResNet-50 dominance, at least as of early July 2019.

But the biggest problem of adversarial examples is their pervasiveness. Adversarial examples generalize beyond deep learning and transfer to different ML techniques. If we generate an adversarial example against one technique, there is a reasonable chance it will work even on another model we are trying to attack, as illustrated in figure 10.4.

Figure 10.4. The numbers here denote the percentage of adversarial examples crafted to fool the classifier in that row that also fooled that column’s classifier. The methods are deep neural networks (DNNs), logistic regression (LR), support-vector machine (SVM), decision trees (DT), nearest neighbors (kNN), and ensembles (Ens.).

(Source: “Transferability in Machine Learning: from Phenomena to Black-Box Attacks Using Adversarial Samples,” by Nicolas Papernot et al., 2016, https://arxiv.org/pdf/1605.07277.pdf.)

10.4. Signal and the noise

Worse yet, many adversarial examples are so easy to construct that we can just as easily fool the classifier with Gaussian noise sampled from np.random.normal. On the other hand, and to support our earlier point about ResNet-50 being a fairly robust architecture, we will show you that other architectures suffer from this issue much more.

Figure 10.5 shows the result of running ResNet-50 on pure Gaussian noise. However, we can also run an adversarial attack on the noise itself and see how quickly the image gets misclassified.

Figure 10.5. On naively sampled noise, we clearly do not get a confident classification as a wrong class in most cases, which is a point in ResNet-50’s favor. On the left, we include the mean and standard deviation we used so that you can see their impact.

Next, we’ll use a projected gradient descent (PGD) attack, illustrated in figure 10.6. Although this is still a simple attack, it warrants a high-level explanation. Unlike with the previous attacks, we now take a step regardless of where it may lead us, even to “invalid” pixel values, and then project back onto the feasible space. Now let’s apply the PGD attack to our Gaussian noise in figure 10.7 and run ResNet-50 to see how we do.

Figure 10.6. Projected gradient descent takes a step in the optimal direction, wherever it may lead, and then uses projection to find the nearest equivalent point in the feasible set. In this case, we are trying to ensure that we still end up with a valid picture: we take an example x^{(k)}, take the optimal step to y^{(k+1)}, and then project it onto the set of valid images as x^{(k+1)}.

Figure 10.7. When we run ResNet-50 on adversarial noise, we get a different story: most of the items are misclassified after applying a PGD attack—still a simple attack.
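The step-then-project loop from figure 10.6 can be sketched in plain Python. This is an illustrative sketch, not foolbox's PGD: it assumes a hypothetical linear classifier over four pixel values in [0, 255], uses the sign of the gradient for each step (the common L∞ variant), and picks epsilon, step, and iterations arbitrarily. Because the attacker maximizes the loss, each step goes up the loss surface; the "descent" in the name refers to the general projected-gradient scheme.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained linear classifier over four pixel values in [0, 255].
w = [0.02, -0.015, 0.01, -0.02]
b = 0.0

def predict(x):
    """Probability of label 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def project(x_new, x_orig, epsilon):
    """Map a point back to the feasible set: within epsilon of the original in
    every value (an L-infinity ball) and inside the valid range [0, 255]."""
    projected = []
    for xn, xo in zip(x_new, x_orig):
        xn = min(xo + epsilon, max(xo - epsilon, xn))  # back into the epsilon ball
        projected.append(min(255.0, max(0.0, xn)))     # back into valid pixel values
    return projected

def pgd(x, y, epsilon, step, iterations):
    x_k = list(x)
    for _ in range(iterations):
        err = predict(x_k) - y                 # dL/dz for sigmoid + cross-entropy
        grad = [err * wi for wi in w]          # dL/dx
        # Unconstrained signed step to y^(k+1); it may leave the feasible set.
        y_next = [xi + step * ((g > 0) - (g < 0)) for xi, g in zip(x_k, grad)]
        x_k = project(y_next, x, epsilon)      # x^(k+1) = projection of y^(k+1)
    return x_k

x = [240, 250, 120, 100]  # clean input, true label 1
x_adv = pgd(x, y=1, epsilon=16, step=4, iterations=10)
print("clean p(label 1):      ", round(predict(x), 3))
print("adversarial p(label 1):", round(predict(x_adv), 3))
print("adversarial pixels:    ", x_adv)
```

Note how the second pixel wants to move past 255: the projection clips it, exactly the "invalid pixel value, then project back" behavior described above.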

To demonstrate that most architectures are even worse, we’ll look into Inception V3—an architecture that has earned fame in the CV community. Indeed, this network has been deemed so reliable that we touched on it in chapter 5. In figure 10.8, you can see that even something that gave birth to the inception score still fails on trivial examples. To dispel any doubts, Inception V3 is still one of the better pretrained networks out there and does have superhuman accuracy.

Figure 10.8. Inception V3 applied to Gaussian noise. Notice that we are not using any attacks; this noise is just sampled from the distribution.

Note

This was just regular Gaussian noise. You can see in the code for yourself that no adversarial step was applied. Sure, you could argue that the noise could have been preprocessed better. But even that is a massive adversarial weakness.

If you are anything like us, you are thinking, no way, I want to see for myself. Well, now we give you the code to reproduce those figures. Because the code for each is similar, we go through it only once and for next time promise DRYer code.

Note

For an explanation of don’t repeat yourself (DRY) code, see Wikipedia at https://en.wikipedia.org/wiki/Don%27t_repeat_yourself.

Listing 10.4. Gaussian noise

fig = plt.figure(figsize=(20,20))
# max_vals is not defined in this listing; it is assumed to be a table (e.g., a
# pandas DataFrame) with mu and sigma columns, one row per row of the figure
sigma_list = list(max_vals.sigma)                                          # 1
mu_list = list(max_vals.mu)
conf_list = []

def make_subplot(x, y, z, new_row=False):                                  # 2
    rand_noise = np.random.normal(loc=mu, scale=sigma, size=(224,224, 3))  # 3
    rand_noise = np.clip(rand_noise, 0, 255.)                              # 4
    noise_preds = kmodel.predict(np.expand_dims(rand_noise, axis=0))       # 5
    prediction, num = decode_predictions(noise_preds, top=20)[0][0][1:3]   # 6
    num = round(num * 100, 2)
    conf_list.append(num)
    ax = fig.add_subplot(x,y,z)                                            # 7
    ax.annotate(prediction, xy=(0.1, 0.6),
            xycoords=ax.transAxes, fontsize=16, color='yellow')
    ax.annotate(f'{num}%' , xy=(0.1, 0.4),
            xycoords=ax.transAxes, fontsize=20, color='orange')
    if new_row:
        # raw f-string so matplotlib's mathtext renders the Greek letters
        ax.annotate(rf'$\mu$:{mu}, $\sigma$:{sigma}',
                    xy=(-.2, 0.8), xycoords=ax.transAxes,
                    rotation=90, fontsize=16, color='black')
    ax.imshow(rand_noise / 255)                                            # 8
    ax.axis('off')


for i in range(1,101):                                                     # 9
    if (i-1) % 10==0:
        mu = mu_list.pop(0)
        sigma = sigma_list.pop(0)
        make_subplot(10,10, i, new_row=True)
    else:
        make_subplot(10,10, i)

plt.show()

1 Lists of means and standard deviations as floats
2 The core function that renders figure 10.8
3 Sample noise for each mean and variance
4 Only 0–255 pixel values permitted
5 Gets our first prediction
6 Gets the predicted class and confidence, respectively
7 Sets up annotating code for figure 10.8 and then adds the annotations and text
8 Division by 255 to convert [0, 255] to [0, 1]
9 The main for loop that allows us to insert subplots into the figure

10.5. Not all hope is lost

Some people now start to worry about the security implications of adversarial examples. However, it is important to keep this in a meaningful perspective of a hypothetical attacker. If the attacker can change every pixel slightly, why not change the whole image?^[7] Why not just feed in another one that is completely different? Why does the passed-in example have to be imperceptibly—rather than visibly—different?

⁷

See “Motivating the Rules of the Game for Adversarial Example Research,” by Justin Gilmer et al., 2018, http://arxiv.org/abs/1807.06732.

Some people give the example of self-driving cars and adversarially perturbed stop signs. But if attackers can do that, why wouldn’t they simply spray-paint over the stop signs or physically cover them with a high speed-limit sign for a little while? These “traditional attacks,” unlike adversarial examples, work 100% of the time, whereas an adversarial attack works only when it transfers well and is not distorted by the preprocessing.

This does not mean that when you have a mission-critical ML application, you can just ignore this problem. However, in most cases, adversarial attacks require far more effort than more commonplace vectors of attack, so bearing that in mind is worthwhile.

Yet, as with most security issues, adversarial attacks also have adversarial defenses that attempt to defend against the many types of attacks. The attacks covered in this chapter have been some of the easier ones, but even simpler ones exist, such as drawing a single line through an MNIST digit. Even that is sufficient to fool most classifiers.

Adversarial defenses are an ever-evolving game, in which many good defenses are available against some types of attacks, but not all. The turnaround can be so quick that just three days after the submission deadline for ICLR 2018, seven of the eight proposed and examined defenses were broken.^[8]

⁸

ICLR is the International Conference on Learning Representations, one of the smaller but excellent machine learning conferences. See Anish Athalye on Twitter in 2018, http://mng.bz/ad77. It should be noted that there were three more defenses unexamined by the author.

10.6. Adversaries to GANs

To make the connection with GANs even clearer, imagine a system generating adversarial examples, and another one saying how good that example is—depending on whether the example managed to fool the system or not. Doesn’t that remind you of a Generator (adversary) and a Discriminator (classification algorithm)? These two algorithms are again competing: the adversary is trying to fool the classifier with slight perturbations of the image, and the classifier is trying to not get fooled. Indeed, a way to think of GANs is almost as ML-in-the-loop adversarial examples that eventually come up with images.

On the other hand, you can think of iterated adversarial attacks as if you took a GAN and, rather than specifying that the objective is to generate the most realistic examples, you specify that the objective is to generate examples that will fool the classifier. Of course, you have to always remember that important differences exist, and typically you have a fixed classifier in deployed systems. But that does not preclude us from using this idea in adversarial training in which some implementations even include a repeated retraining of the classifier based on the adversarial examples that fooled it. These techniques are then moving closer to a typical GANs setup.
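The retraining loop described in this paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any specific paper's method: a logistic-regression classifier, FGSM as the attack, and a made-up, linearly separable four-point data set. In each step, the attack plays the "generator" role against the classifier as it currently is, and the classifier then learns from both the clean and the adversarial input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    w, b = theta
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(theta, x, y, epsilon):
    """Adversarial copy of x against the *current* model (no pixel clipping here)."""
    w, _ = theta
    err = predict(theta, x) - y
    return [xi + epsilon * sign(err * wi) for xi, wi in zip(x, w)]

def sgd_step(theta, x, y, lr):
    w, b = theta
    err = predict(theta, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)], b - lr * err

def adversarial_training(data, epochs, lr, epsilon):
    theta = ([0.0] * len(data[0][0]), 0.0)
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(theta, x, y, epsilon)     # attack the classifier as it is now
            theta = sgd_step(theta, x, y, lr)      # learn from the clean example
            theta = sgd_step(theta, x_adv, y, lr)  # and from the example that attacked it
    return theta

def accuracy(theta, data, epsilon=0.0):
    correct = 0
    for x, y in data:
        x_eval = fgsm(theta, x, y, epsilon) if epsilon else x
        correct += int((predict(theta, x_eval) >= 0.5) == bool(y))
    return correct / len(data)

data = [([1.0, 1.2], 1), ([1.5, 0.8], 1), ([-1.0, -1.1], 0), ([-1.3, -0.7], 0)]
theta = adversarial_training(data, epochs=50, lr=0.1, epsilon=0.3)
print("clean accuracy:", accuracy(theta, data))
print("accuracy under FGSM (epsilon=0.3):", accuracy(theta, data, epsilon=0.3))
```

Unlike a real GAN, the "generator" here has no parameters of its own; it is a fixed attack recomputed against each new version of the classifier.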

To give you an example, let’s take a look at one technique that has held its ground for a while as a viable defense. In the Robust Manifold Defense, we take the following steps to defend against the adversarial examples:^[9]

⁹

See “The Robust Manifold Defense: Adversarial Training Using Generative Models,” by Ajil Jalal et al., 2019, https://arxiv.org/pdf/1712.09196.pdf.

We take an image x (adversarial or regular) and
1. Project it back to the latent space z.
2. Use the generator G to generate an example similar to x, called x* = G(z).
3. Use the classifier C to classify this example, C(x*), which generally misclassifies far less often than running the classifier directly on x.
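The three steps can be sketched with toy stand-ins so the effect is easy to verify. Everything here is an assumption for illustration: a linear map W replaces the trained GAN generator G (so every image it can produce has a third value of 0), step 1 is done by gradient descent on ||G(z) - x||^2 (one common way to project into latent space), and C is a hypothetical classifier that wrongly relies on the off-manifold direction.

```python
# Hypothetical linear stand-in for a trained generator G: R^2 -> R^3.
# Every output lies on the plane where the third value is 0 (the "manifold").
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

def G(z):
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def project_to_latent(x, steps=200, lr=0.1):
    """Step 1: find z minimizing ||G(z) - x||^2 by gradient descent."""
    z = [0.0] * len(W[0])
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(G(z), x)]
        # Gradient of ||W z - x||^2 with respect to z is 2 * W^T (W z - x).
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

def C(x):
    """Hypothetical classifier that is sensitive to the off-manifold third value."""
    score = x[0] + x[1] - 2.0 * x[2]
    return 1 if score >= 0 else 0

x_clean = [0.5, 0.2, 0.0]
x_adv = [0.5, 0.2, 1.0]            # perturbation that leaves the generator's manifold

z_star = project_to_latent(x_adv)  # step 1: project to latent space
x_star = G(z_star)                 # step 2: regenerate x* = G(z)
print("C(x_clean) =", C(x_clean))  # 1
print("C(x_adv)   =", C(x_adv))    # 0, fooled
print("C(x*)      =", C(x_star))   # 1, step 3 classifies the reconstruction
```

Because G cannot produce the off-manifold component, the reconstruction x* drops it, and C recovers the clean label. With a real nonlinear generator, the projection is only approximate, which is one reason the ambiguous cases below remain.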

However, the authors of this defense found that there are still some ambiguous cases in which the classifier does get fooled by minor perturbations. Still, we encourage you to check out their paper, as these cases tend to be unclear to humans as well, which is a sign of a robust model. To fix this, we apply adversarial training on the manifold: we add some of these adversarial cases to the training set so the classifier learns to distinguish them from the real training data.

This paper demonstrates that using GANs can give us classifiers that do not completely break down after minor perturbations, even against some of the most sophisticated methods. As with most of these defenses, the performance of the downstream classifier does drop, because our classifier now has to be trained to implicitly deal with these adversarial cases. And even at this cost, it is not a universal defense.

Adversarial training, of course, has some interesting applications. For example, for a while, the best results—state of the art—in semi-supervised learning were achieved by using adversarial training.^[10] This was subsequently challenged by GANs (remember chapter 7?) and other approaches, but that does not mean that by the time you are reading these lines, adversarial training will not be the state of the art again.

¹⁰

See “Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning,” by Takeru Miyato et al., 2018, https://arxiv.org/pdf/1704.03976.pdf.

Hopefully, this gave you another reason to study GANs and adversarial examples—partially because in mission-critical classification tasks, GANs may be the best defense going forward or because of other applications beyond the scope of this book.^[11] That is best left for a hypothetical Adversarial Examples in Action.

¹¹

This was a hotly debated topic at ICLR 2019. Though most of these conversations were informal, using (pseudo) invertible generative models as a way to classify “out-of-sample”ness of an image seems like a fruitful avenue.

To sum up, we have laid out the notion of adversarial examples and made the connection to GANs even more specific. This is an underappreciated connection, but one that can solidify your understanding of this challenging subject. Furthermore, GANs themselves are one of the defenses against adversarial examples!^[12] So GANs also have the potential to close the very gap that likely led to their existence in the first place.

¹²

See Jalal et al., 2019, https://arxiv.org/pdf/1712.09196.pdf.

10.7. Conclusion

Adversarial examples are an important field, because even commercial computer vision products have suffered from this shortcoming and can still be fooled easily by academics.^[13] Beyond security and machine learning explainability applications, many practical uses remain in fairness and robustness.

¹³

See “Black-Box Adversarial Attacks with Limited Queries and Information,” by Andrew Ilyas et al., 2018, https://arxiv.org/abs/1804.08598.

Furthermore, adversarial examples are an excellent way of solidifying your own understanding of deep learning and GANs. Adversarial examples take advantage of the difficulty in training classifiers in general and the relative ease of fooling the classifier in one particular case. The classifier has to make predictions for many images, and crafting a special offset to fool the classifier exactly right is easy because of the many degrees of freedom. As a result, we can easily get adversarial noise that completely changes the label of a picture without changing the image perceptibly.

Adversarial examples can be found in many domains and many areas of AI, not just deep learning or computer vision. But as you saw in the code, creating the ones in computer vision is not challenging. Defenses against these examples exist, and you saw one using GANs, but adversarial examples are far from being solved completely.

Summary

Adversarial examples, which come from abusing the dimensionality of the problem space, are an important aspect of machine learning because they show us why GANs work and why some classifiers can be easily broken.
We can easily generate our own adversarial examples with real images and noise.
Few meaningful attack vectors can be used with adversarial examples.
Applications of adversarial examples include cybersecurity and machine learning fairness, and we can defend against them by using GANs.
