Chapter 12. Looking ahead

This chapter covers

  • The ethics of generative models
  • Three recent improvements that we expect to be dominant in the years to come:

    • Relativistic GAN (RGAN)
    • Self-Attention GAN (SAGAN)
    • BigGAN
  • Further reading for three more cutting-edge techniques
  • A summary of the key themes and takeaways from this book

In this final chapter, we want to give you a brief overview of our thoughts about the ethics of GANs. Then we will talk about some recent innovations that we expect to become even more significant in the years ahead. This chapter includes high-level ideas that we expect to define the future of GANs; it does not feature full code tutorials, only short, illustrative sketches. We want you to be prepared for the GANtastic journey ahead—even for advances that are yet to be published at the time of writing. Lastly, we will wrap up and say our teary-eyed goodbyes.

12.1. Ethics

The world is beginning to realize that AI ethics—GANs included—is an important issue. Some institutions have decided to not release their expensive, pretrained models for fear of misuse as a tool for generating fake news.[1] Numerous articles describe the ways in which GANs specifically may have potential malicious uses.[2]

1

See “An AI That Writes Convincing Prose Risks Mass-Producing Fake News,” by Will Knight, MIT Technology Review, 2019, http://mng.bz/RPGj.

2

See “Inside the World of AI that Forges Beautiful Art and Terrifying Deepfakes,” by Karen Hao, MIT Technology Review, 2019, http://mng.bz/2JA8. See also “AI Gets Creative Thanks to GANs Innovations,” by Jakub Langr, Forbes, 2019, http://mng.bz/1w71.

We all understand that misinformation can be a huge problem and that GANs capable of producing photorealistic synthetic images could pose a danger. Imagine a synthesized video of a world leader saying they are about to launch a military strike on another country. Would the corrective information spread quickly enough to soothe the panic that would follow?

This is not a book about AI ethics, so we touch on this topic only briefly. But we strongly believe that it is important for all of us to think about the ethics of what we are doing and about the risks and unintended consequences that our work could have. Given that AI is such a scalable technology, it is vital to think through whether we are helping to create a world we want to live in.

We urge you to think about your principles and to go through at least one of the more evolved ethical frameworks. We are not going to argue which one is better than another—after all, humans have not yet agreed on a moral framework for much more mundane matters—but please put the book down and read at least one of these if you have not already.

Note

You can read about Google’s AI principles at https://ai.google/principles. The Institute for Ethical AI & ML details its principles at https://ethical.institute/principles.html. See also “IBM’s Rometty Lays Out AI Considerations, Ethical Principles,” by Larry Dignan, 2017, ZDNet, http://mng.bz/ZeZm.

For example, the technology known as DeepFakes—although not originally based on GANs—has been cited by many as a source of concern.[3] DeepFakes—a portmanteau of deep learning and fake imagery—has already proven controversial by generating fake political videos and synthetic involuntary pornographic content. Soon, this technology may reach a point where it is impossible to tell whether a video or image is authentic. Given GANs’ ability to synthesize new images, they may soon dominate this domain.

3

See “The Liar’s Dividend, and Other Challenges of Deep-Fake News,” by Paul Chadwick, The Guardian, 2018, http://mng.bz/6wN5. See also “If You Thought Fake News Was a Problem, Wait for DeepFakes,” by Roula Khalaf, 2018, Financial Times, http://mng.bz/PO8Y.

To say that everyone should think about the consequences of their research and code seems insufficient, but the reality is that there is no silver bullet. We should consider these implications regardless of whether we are working in research or industry, even when the initial intent was entirely ethical. We do not want to give you a dull lecture or an unsubstantiated, media-grabbing forecast, but this is a problem we care deeply about.

AI ethics is already a real problem, and we have presented three concrete examples here—AI-generated fake news, synthesized political proclamations, and involuntary pornography. Many more exist, such as Amazon’s AI recruiting tool that showed bias against women.[4] The practical landscape is complicated: some suggest that GANs have a tendency to favor images of women in face generation. Yet another angle is that GANs also have the potential to help make AI more ethical—by synthesizing the underrepresented class in, for example, face-recognition problems in a semi-supervised setup, thereby improving the quality of classification for less-represented communities.

4

See “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women,” by Jeffrey Dastin, 2018, Reuters, http://mng.bz/Jz8K.

We are writing this book partially to make everyone more aware of the possibilities and possible misuses of GANs. We are excited by the future academic and practical applications of GANs and the ongoing research, but we are also aware that some applications may have negative uses. Because it is impossible to “uninvent” a technology, we have to be aware of its capabilities. By no means are we saying that the world would be better off if GANs did not exist—but GANs are just a tool, and as we all know, tools can be misused.

We feel morally compelled to talk about the promises and dangers of this technology, because otherwise misusing it becomes easier for a narrow group of the initiated. Although this book is not written for the general public, we hope that it is one stepping stone toward broader awareness—beyond the mostly academic circles that have dominated the field of GANs until now. Equally, much of the public outreach we are doing is—we hope—contributing to greater knowledge of and discussion about this topic.

As more people become aware of this technology, even existing malicious actors will no longer be able to catch anyone by surprise. We hope that GANs will never be a source of malicious acts, but that may be too idealistic. The next best thing is for knowledge of GANs to be available to everyone—not just academics and deeply invested malicious parties. We also hope (and all evidence thus far seems to point to this reality) that GANs will, overall, contribute positively to art, science, and engineering. Furthermore, people are working on DeepFake detection, incorporating ideas from GANs and adversarial examples. But we have to be cautious: any classifier that can detect fakes with some degree of accuracy lends all the more credibility to an example that manages to fool it.

In many ways, we are also hoping to start a more thorough conversation without any grandstanding—this is an invitation to connect with us through our book forums or our Twitter accounts. We are aware that we need a diverse range of perspectives to keep checking our moral framework. We are also aware that these views will evolve over time, especially as use cases become clearer. Indeed, some people—such as Benedict Evans of a16z—argue that regulating or debating the ethics of AI makes no more sense than debating the ethics of databases: what matters is the use case, not the technology.

12.2. GAN innovations

Speaking of use cases, we are aware that GANs are an ever-evolving field. In this section, we want to quickly introduce techniques that are not yet as established in the community as the topics of prior chapters, but that we expect to be significant in the future. In the spirit of keeping this practical, we have picked three GAN innovations that each have an interesting practical angle: a practical paper (RGAN), a GitHub project (SAGAN), or an artistic application (BigGAN).

12.2.1. Relativistic GAN

Not often do we get to see an update so simple and elegant that it could have been in the original paper, yet powerful enough to beat many state-of-the-art algorithms. The Relativistic GAN (RGAN) is one such example. The core idea of the RGAN is that, starting from the original GAN (specifically, the NS-GAN that you may recall from chapter 5), we add an extra term to the Generator’s loss—forcing it to make the generated data seem more real than the real data.

In other words, the Generator should, in addition to making fake data seem more real, make real data seem comparatively less real, thereby also increasing the stability of the training. But of course, the only data the Generator has control over is the synthetic data, so the Generator can achieve this only comparatively.

The RGAN’s author describes it as a generalized version of the WGAN, which we discussed previously. Let’s start with the simplified loss functions from table 5.1 in chapter 5:

equation 12.1. $L_D = -\mathbb{E}_x[\log D(x)] - \mathbb{E}_z[\log(1 - D(G(z)))]$

equation 12.2. $L_G = -\mathbb{E}_z[\log D(G(z))]$

Recall that equation 12.1 describes the loss function for the Discriminator—where we measure the difference between the real data (D(x)) and the generated ones (D(G(z))). Equation 12.2 then describes the loss function of the Generator, where we are trying to make the Discriminator believe that the samples it is seeing are real.
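For concreteness, here is a minimal sketch of these two losses written with TensorFlow primitives. The function and argument names (`d_real` for D(x), `d_fake` for D(G(z))) are ours, introduced purely for illustration:

```python
import tensorflow as tf

# Minimal sketch of the NS-GAN losses in equations 12.1 and 12.2.
# d_real = D(x) and d_fake = D(G(z)) are Discriminator outputs in (0, 1);
# in practice, you would add a small epsilon inside the logs for stability.
def nsgan_discriminator_loss(d_real, d_fake):
    # Push D(x) toward 1 and D(G(z)) toward 0
    return -tf.reduce_mean(tf.math.log(d_real) + tf.math.log(1.0 - d_fake))

def nsgan_generator_loss(d_fake):
    # Non-saturating trick: maximize log D(G(z))
    # rather than minimizing log(1 - D(G(z)))
    return -tf.reduce_mean(tf.math.log(d_fake))
```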

Turning to the RGAN’s closest predecessor, remember that the WGAN tries to minimize the amount of probability mass we would have to move to make the generated distribution look like the real one. In this sense, the RGAN has many similarities (for example, the Discriminator is frequently called the critic, and the WGAN is presented as a special case of the RGAN in the paper). Ultimately, both measure the current state of play as a single number—remember the earth mover’s distance?

The innovation of the RGAN is that we no longer get the previous unhelpful dynamic of the Generator always playing catch-up. In other words, the Generator is trying to generate data that looks more realistic than the real data so that it is not always on the defensive. As a result, D(x) can be interpreted as the probability that the real data is more realistic than the generated data.

Before we delve into the difference at a high level, we will introduce slightly different notation that approximates the paper’s notation while simplifying it. In equations 12.3 and 12.4, C(x) acts as a critic similar to the WGAN setup,[5] and you may think of it as a Discriminator. Furthermore, a() is defined as log(sigmoid()). In the paper, G(z) is replaced by xf for fake samples, and x gets the subscript r to indicate real samples, but we will follow the simpler notation from the earlier chapters.

5

Because we are skipping over some details, we want to equip you with the high-level idea and keep the notation consistent so that you can fill in the blanks yourself.

equation 12.3. $L_D = -\mathbb{E}_{x,z}\left[a\left(C(x) - C(G(z))\right)\right]$

equation 12.4. $L_G = -\mathbb{E}_{x,z}\left[a\left(C(G(z)) - C(x)\right)\right]$

Importantly, in these equations, we see only one key difference in the Generator: the real data now enters its loss function. This seemingly simple trick aligns the Generator’s incentives so that it is not at a permanent disadvantage. To understand this and two other perspectives in an idealized setting, let’s plot the different Discriminator outputs as in figure 12.1.

Figure 12.1. Under divergence minimization (a), the Generator is always playing catch-up with the Discriminator (because divergence is always ≥ 0). In (b), we see what “good” NS-GAN training looks like; again, the Generator cannot win. In (c), we can see that the Generator can now win, but more important, the Generator always has something to strive for (and therefore receives a useful gradient), no matter the stage of training.

(Source: “The Relativistic Discriminator: A Key Element Missing from Standard GAN,” by Alexia Jolicoeur-Martineau, 2018, http://arxiv.org/abs/1807.00734.)
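To make the contrast with equations 12.1 and 12.2 concrete, here is a minimal sketch of the relativistic losses in equations 12.3 and 12.4. We assume `c_real` and `c_fake` hold the critic’s raw, pre-sigmoid outputs C(x) and C(G(z)); again, the names are ours:

```python
import tensorflow as tf

# Sketch of the relativistic losses from equations 12.3 and 12.4, where
# a() = log(sigmoid()). c_real = C(x) and c_fake = C(G(z)) are raw
# (pre-sigmoid) critic outputs.
def rgan_discriminator_loss(c_real, c_fake):
    # The critic is rewarded when real data looks MORE real than fake data
    return -tf.reduce_mean(tf.math.log_sigmoid(c_real - c_fake))

def rgan_generator_loss(c_real, c_fake):
    # The Generator is rewarded when fake data looks MORE real than real
    # data -- note that the real data now enters the Generator's loss
    return -tf.reduce_mean(tf.math.log_sigmoid(c_fake - c_real))
```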

You may be wondering why just adding this term should be noteworthy. Well, this simple addition makes training significantly more stable at little extra computational cost. This matters, especially when you remember the “Are GANs Created Equal?” paper from chapter 5, in which the authors argue that the major GAN architectures considered so far offer only limited improvement over the original GAN once adjusted for the extra processing they require. Many new GAN architectures do better only at huge computational cost, which makes them less useful; the RGAN has the potential to improve GAN architectures across the board.

Always be aware of this trade-off: even if a method needs fewer update steps, if each step takes twice as long because of the extra computation, is it really worth it? The peer-review process at most conferences is not immune to this weakness, so you have to be careful.

Application

Your next question may be, why should this matter in practice? In less than a year, the paper gathered more than 50 citations[6]—a lot for a new paper from a previously unknown author. Moreover, people have already used the RGAN to achieve state-of-the-art speech enhancement (that is, the best performance ever achieved), beating other GAN-based and non-GAN-based methods.[7]

6

The following link names all the papers that cite the RGAN paper: http://mng.bz/omGj.

7

See “SERGAN: Speech Enhancement Using Relativistic Generative Adversarial Networks with Gradient Penalty,” by Deepak Baby and Sarah Verhulst, 2019, IEEE-ICASSP, https://ieeexplore.ieee.org/document/8683799.

By the time you read this, the paper should be readily available, so feel free to take a look. Explaining it with all the necessary background, however, is beyond the scope of this book.

12.2.2. Self-Attention GAN

The next innovation we believe is going to change the landscape is the Self-Attention GAN (SAGAN). Attention is based on a very human idea of how we look at the world—through small patches of focus at a time.[8] Your mind can consciously focus on only a small part of, say, a table, but your brain stitches the whole table together through quick, minor eye movements called saccades, while still attending to only a subset of the scene at any moment. Attention in a GAN works similarly.

8

See The Mind Is Flat: The Illusion of Mental Depth and the Improvised Mind by Nick Chater (Penguin, 2018).

The computational equivalent has been used in many fields, including natural language processing (NLP) and computer vision. Attention can help us solve, for example, the problem of convolutional neural networks (CNNs) ignoring much of the picture. As we know, CNNs rely on a small receptive field, as determined by the size of the convolution. However, as you may recall from chapter 5, in GANs this limited receptive field is likely to cause problems—such as cows with multiple heads or bodies—that the GAN does not consider strange.

This is because when generating or evaluating a subset of the image, we may see that a leg is present in one field but not see that other legs are already present in another. This could be because the convolution ignores the structure of the object, or because legs and leg rotations are represented by different higher-level neurons that do not talk to each other. Seasoned data scientists will remember that this is what Hinton’s CapsuleNets were attempting to solve, but they never really took off. For everyone else, the short story is that no one can say with absolute certainty why attention fixes this, but a good way to think about it is that we can now create feature detectors with a flexible receptive field (shape) that can focus on several key aspects of a given picture (see figure 12.2).

Figure 12.2. The output pixel (2 × 2 patch) ignores anything except the small highlighted region. Attention helps us solve that.

(Source: “Convolution Arithmetic,” by Vincent Dumoulin, 2016, https://github.com/vdumoulin/conv_arithmetic.)

Recall that this is especially a problem when our images are, say, 512 × 512, but the largest commonly used convolution kernels are only 7 × 7—that is loads of ignored context! Even in higher-level nodes, the neural network may not be appropriately checking for, say, a head in the right place. As a result, as long as the cow has a cow head next to a cow body, the network does not care whether there is another head elsewhere in the image—even though the structure is wrong.

These higher-level representations are harder to reason about, and so even researchers disagree as to exactly why this happens, but empirically, the network does not seem to pick it up. Attention allows us to pick out the relevant regions—whatever the shape or size—and consider them appropriately. To see the types of regions that attention can flexibly focus on, consider figure 12.3.

Figure 12.3. Here, we can see the regions of the image that the attention mechanism pays most attention to, given a representative query location. We can see that the attention mechanism generally cares about regions of different shapes and sizes, which is a good sign, given that we want it to pick out the regions of the image that indicate the kind of object it is.

(Source: “Self-Attention Generative Adversarial Networks,” by Han Zhang et al., 2018, http://arxiv.org/abs/1805.08318.)
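To give you a feel for the mechanism, here is a minimal sketch of a SAGAN-style self-attention block in Keras. The layer is simplified relative to the paper (we omit, for example, the pooling on keys and values and the final 1 × 1 convolution), and all names are ours:

```python
import tensorflow as tf
from tensorflow.keras import layers

class SelfAttention(layers.Layer):
    """Simplified SAGAN-style self-attention over feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.query = layers.Conv2D(channels // 8, 1)  # f(x)
        self.key   = layers.Conv2D(channels // 8, 1)  # g(x)
        self.value = layers.Conv2D(channels, 1)       # h(x)
        # gamma starts at 0, so the block initially acts as the identity
        self.gamma = tf.Variable(0.0, trainable=True)

    def call(self, x):
        b = tf.shape(x)[0]
        h, w, c = tf.shape(x)[1], tf.shape(x)[2], x.shape[-1]
        q = tf.reshape(self.query(x), [b, h * w, -1])
        k = tf.reshape(self.key(x),   [b, h * w, -1])
        v = tf.reshape(self.value(x), [b, h * w, -1])
        # Every location attends to every other location, so the effective
        # receptive field can take any shape or size
        attn = tf.nn.softmax(tf.matmul(q, k, transpose_b=True), axis=-1)
        out = tf.reshape(tf.matmul(attn, v), [b, h, w, c])
        return self.gamma * out + x  # residual connection aids stability
```

You would drop such a block between convolutional layers of the Generator or Discriminator, for example, `x = SelfAttention(channels=64)(x)`.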

Application

DeOldify (https://github.com/jantic/DeOldify) is one of the popular applications of the SAGAN that was made by Jason Antic, a student of Jeremy Howard’s fast.ai course. DeOldify uses the SAGAN to colorize old images and drawings to an amazing level of accuracy. As you can see in figure 12.4, you can turn famous historic photographs and paintings into fully colorized versions.

Figure 12.4. Deadwood, South Dakota, 1877. The image on the right has been colorized . . . for a black-and-white book. Trust us. If you do not believe us, check out the online liveBook on Manning’s website to see for yourself!

12.2.3. BigGAN

Another architecture that has taken the world by storm is BigGAN.[9] BigGAN has achieved highly realistic 512 × 512 images on all 1,000 classes of ImageNet—a feat previously deemed almost impossible with the current generation of GANs. BigGAN achieved three times the previous best inception score. In brief, BigGAN builds on the SAGAN and spectral normalization and has further innovated in five directions:

9

See “Large Scale GAN Training for High Fidelity Natural Image Synthesis,” by Andrew Brock et al., 2019, https://arxiv.org/pdf/1809.11096.pdf.

  • Scaling up GANs to previously unbelievable computational scale. The BigGAN authors trained with eight times the batch size of prior work, which was a large part of their success—this alone gave a 46% boost in inception score. Theoretically, the resources required to train a BigGAN add up to an estimated $59,000 worth of compute.[10]

    10

    See Mario Klingemann’s Twitter post at http://mng.bz/wll2.

  • BigGAN’s architecture has 1.5 times the number of channels (feature maps) in each layer relative to the SAGAN architecture. This may be due to the complexity of the dataset used.
  • Improving the stability of the Generator and the Discriminator by controlling the adversarial process, which leads to overall better results. The underlying mathematics are unfortunately beyond the scope of this book; if you’re interested, we recommend starting by understanding spectral normalization. For those who are not, take solace in the fact that even the authors abandon this strategy in later parts of training and let the model collapse because of the computational costs.
  • Introducing a truncation trick to give us a way of controlling the trade-off between variety and fidelity. The truncation trick achieves better-quality results if we sample closer to the middle of the distribution (truncate it). It makes sense that this would yield better samples, as this is where BigGAN has the “most experience” (see the sketch after this list).
  • The authors introduce a further three theoretical advancements. According to the authors’ own performance table, however, these seem to have only a marginal effect on the scores and frequently lead to less stability. They are useful for computational efficiency, but we will not discuss them.
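Here is the sketch of the truncation trick promised earlier. Resampling out-of-range values is one common way to implement it; the threshold value of 0.5 is our illustrative choice, not the paper’s:

```python
import numpy as np

# Sketch of the truncation trick via resampling: any latent component
# that falls outside [-threshold, threshold] is redrawn, so samples stay
# near the middle of the distribution, trading variety for fidelity.
def truncated_z(batch_size, z_dim, threshold=0.5):
    z = np.random.randn(batch_size, z_dim)
    while True:
        outliers = np.abs(z) > threshold
        if not outliers.any():
            return z
        z[outliers] = np.random.randn(outliers.sum())
```

Lowering `threshold` pushes samples toward the mode of the latent distribution, yielding higher-fidelity but less varied images.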
Application

One fascinating artistic application of BigGAN is the Ganbreeder app, which was made possible thanks to the pretrained models and Joel Simon’s hard work. Ganbreeder is an interactive web-based (free!) way to explore the latent space of BigGAN. It has been used in numerous artistic applications as a way to come up with new images.

You can either explore the adjacent latent space or use linear interpolation between the latent vectors of two samples to create new images (a short sketch follows figure 12.5). Figure 12.5 shows an example of creating Ganbreeder offspring.

Figure 12.5. Every time you click the Make Children button, Ganbreeder gives you a selection of mutated images in the nearby latent space, producing the three images below. You may start from your own sample or someone else’s—thereby making it a collaborative exercise. This is what the Crossbreed section is for, where you can select another interesting sample from other parts of the space and mix the two samples. Lastly, in Edit-Genes, you can edit parameters (such as Castle and Stone Wall, in this case) and add more or less of that feature into the picture.

(Source: Ganbreeder, http://mng.bz/nv28.)
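For reference, linear interpolation between two latent vectors takes only a few lines of code; the `generator` in the usage comment is a hypothetical pretrained model, not a specific API:

```python
import numpy as np

# Sketch of linear interpolation between two latent vectors z_a and z_b.
# Feeding each intermediate vector to a trained generator produces a
# smooth morph between the two corresponding images.
def interpolate(z_a, z_b, steps=8):
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z_a + a * z_b for a in alphas])

# Hypothetical usage: images = generator.predict(interpolate(z_a, z_b))
```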

BigGAN is further notable because DeepMind has given us all this compute for free and uploaded pretrained models onto TensorFlow Hub—a machine learning code repository that we used in chapter 6.

12.3. Further reading

We wanted to cover many other topics that seem to be gaining popularity among academics and practitioners, but we did not have the space. Here, we list three of them for interested readers; we hope we have equipped you with all you need to understand these papers. We picked just three, as we expect this list to change quickly:

  • StyleGAN (http://arxiv.org/abs/1812.04948) merges ideas from GANs and “traditional” style transfer to give users much more control over the output they generate. This Conditional GAN from NVIDIA has managed to produce stunning full-HD results with several levels of control—from fine details to the overall image. This work builds on chapter 6, so you may want to reread it before delving into this paper.
  • Spectral normalization (http://arxiv.org/abs/1802.05957) is a complex regularization technique that requires somewhat advanced linear algebra. For now, just remember the use case—stabilizing training by normalizing the weights in a network to satisfy a particular property, one that is even formally required by the WGAN (touched on in chapter 5). Spectral normalization acts somewhat similarly to gradient penalties. A minimal sketch of the core computation follows this list.
  • SPADE, aka GauGAN (https://arxiv.org/pdf/1903.07291.pdf) is cutting-edge work published in 2019 to synthesize photorealistic images based solely on a semantic map of the image, as you may recall from the start of chapter 9. The images can be up to 512 × 256 in resolution, but knowing NVIDIA, this may increase before the end of the year. This may be the most challenging technique of the three, but also one that has gathered the most media attention—probably because of how impressive the tech demo is!
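To give you a flavor of spectral normalization before you tackle the paper, here is a minimal sketch of its core computation: estimating a weight matrix’s largest singular value by power iteration and dividing the weights by it. All names are ours, and real implementations persist `u` across training steps rather than redrawing it:

```python
import numpy as np

# Sketch of spectral normalization's core step: approximate the largest
# singular value (the spectral norm) of W by power iteration, then
# rescale W so that its spectral norm is 1.
def spectrally_normalize(W, iterations=1):
    u = np.random.randn(W.shape[0])
    for _ in range(iterations):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # approximate largest singular value of W
    return W / sigma
```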

There is so much going on in the world of GANs that it may be impossible to stay up-to-date all the time. However, we hope that in terms of both ethical frameworks and the latest interesting papers, we have given you the resources you need to navigate this ever-evolving space. We do not know whether all of the innovations presented in this chapter and listed in this section will become part of the routine bag of tricks that people use, but we think they might.

12.4. Looking back and closing thoughts

We hope that the cutting-edge techniques we’ve discussed will give you enough subject material to continue exploring GANs even as our book comes to an end. Before we send you off, however, it is worth looking back and recapping all that you have learned.

We started off with a basic explanation of what GANs are and how they work (chapter 1) and implemented a simple version of this system (chapter 3). We introduced you to generative models in an easier setting with autoencoders (chapter 2). We covered the theory of GANs (chapters 3 and 5) as well as their shortcomings and some of the ways to overcome them (chapter 5). This provided the foundation and tools for the later, advanced chapters.

We implemented several of the most canonical and influential GAN variants—Deep Convolutional GAN (chapter 4) and Conditional GAN (chapter 8)—as well as a few of the most advanced and complex ones—Progressive GANs (chapter 6) and CycleGANs (chapter 9). We also implemented Semi-Supervised GANs (chapter 8), a GAN variant designed to tackle one of the most severe shortcomings in machine learning: the lack of large, labeled datasets. We also explored several of the many practical and innovative applications of GANs (chapter 11), and presented adversarial examples (chapter 10), which are a challenge for all of machine learning.

Along the way, you expanded your theoretical and practical toolbox. From inception score and Fréchet inception distance (chapter 5) to pixel-wise feature normalization (chapter 6), batch normalization (chapter 4), and dropout (chapter 7), you learned about concepts and techniques that will serve you well for GANs and beyond.

As we look back, it is worth highlighting a few themes that came up time and time again as we explored GANs:

  • GANs are tremendously versatile, in terms of both practical use cases and resilience against theoretical requirements and constraints. This was perhaps most apparent in the case of CycleGAN in chapter 9. This technique not only is unconstrained by the need for paired data that burdened its predecessors, but also can translate between examples in virtually any domain, from apples and oranges to horses and zebras. The versatility of GANs was also evident in chapter 6, where you saw that Progressive GANs can learn to generate images as disparate as human faces and medical mammograms equally well, and in chapter 7, where we needed to make only a handful of adjustments to turn the Discriminator into a multiclass classifier.
  • GANs are as much an art as they are a science. The beauty and the curse of GANs—and, indeed, deep learning in general—is that our understanding of what makes them work so well in practice is limited. Few known mathematical guarantees exist, and most achievements are experimental only. This makes GANs susceptible to many training pitfalls, such as mode collapse, which you may recall from our discussion in chapter 5. Fortunately, researchers have found many tips and tricks that greatly mitigate these challenges—everything from input preprocessing to the choice of optimizer and activation functions—many of which you learned about and even saw firsthand in code tutorials throughout the book. Indeed, as the GAN variants covered in this chapter show, the techniques to improve GANs continue to evolve.

In addition to difficulties in training, it is crucial to keep in mind that even techniques as powerful and versatile as GANs have other important limitations. GANs have been hailed by many as the technique that gave machines the gift of creativity. This is true to a degree—in a few short years, GANs have become the undisputed state-of-the-art technique in synthesizing fake data; however, they fall short of what human creativity can do.

Indeed, as we showed time and time again throughout this book, GANs can mimic the features of almost any existing dataset and come up with examples that look as though they came from that dataset. However, by their very nature, GANs will not stray far from the training data. For instance, if we have a training dataset of classical art masterpieces, the examples our GAN will produce will look more like Michelangelo than Jackson Pollock. Until a new AI paradigm comes along that gives machines true autonomy, it will be ultimately up to the (human) researcher to guide the GAN to the desired end goal.

As you experiment with GANs and their applications, bear in mind not only the practical techniques, tips, and tricks covered throughout this book, but also the ethical considerations discussed in this chapter. With that, we wish you all the best in the GANtastic journey ahead.

—Jakub and Vladimir

Summary

  • We touched on AI and GAN ethics and discussed the moral frameworks, need for awareness, and openness of discussion.
  • We equipped you with the innovations we believe will drive the future of GANs, and we gave you the high-level idea behind the following:

    • Relativistic GAN, which now ensures that the Generator considers the relative likelihood of real and generated data
    • SAGAN, with attention mechanisms that act similarly to human perception
    • BigGAN, which allowed us to generate images of unprecedented quality across all 1,000 ImageNet classes
  • We highlighted two key recurring themes of our book: (1) the versatility of GANs and (2) the necessity for experimentation because, much like the rest of deep learning, GANs are as much an art as they are a science.