Chapter 11. Practical applications of GANs

This chapter covers

  • Use of GANs in medicine
  • Use of GANs in fashion

As captivating as generating handwritten digits and turning apples into oranges may be, GANs can be used for a lot more. This chapter explores practical applications of GANs in areas where they have been harnessed for real-world use. After all, one of our main goals with this book is to give you the knowledge and tools necessary not only to understand what has been accomplished with GANs to date, but also to empower you to find new applications of your choosing. There is no better place to start that journey than with several successful examples of just that.

You have already seen several innovative use cases of GANs. Chapter 6 showed how Progressive GANs can create not only photorealistic renditions of human faces, but also samples of, arguably, much greater practical importance: medical mammograms. Chapter 9 showed how the CycleGAN can create realistic simulated virtual environments by translating clips from a video game into movie-like scenes, which can then be used to train self-driving cars.

This chapter reviews GAN applications in greater detail. We will walk through what motivated these applications, what makes them uniquely suited to benefit from the advances made possible by GANs, and how their creators went about implementing them. Specifically, we will look at GAN applications in medicine and fashion. We chose these two fields based on the following criteria:

  • They showcase not only academic but also, and primarily, the business value of GANs. They represent how the academic advances achieved by GAN researchers can be applied to solve real-world problems.
  • They use GAN models that are understandable with the tools and techniques discussed in this book. Instead of introducing new concepts, we will look at how the models we have already implemented can be applied to use cases beyond the MNIST dataset.
  • They are understandable without the need for specialized domain expertise. For example, GAN applications in chemistry and physics tend to be hard to comprehend for anyone without a strong background in the given field.

Moreover, the chosen fields and the examples we selected serve to illustrate the versatility of GANs. In medicine, we show how GANs can be useful in situations with limited data. In fashion, we present the other extreme and explore GAN applications in scenarios where extensive datasets are available. Even if you have no interest in medicine or fashion, the tools and approaches that you will learn about in this chapter are applicable to countless other use cases.

Sadly, as is all too often the case, the practical applications we will review are virtually impossible to reproduce in a coding tutorial because of the proprietary or otherwise hard-to-obtain nature of the training data. Instead of a full coding tutorial like the ones throughout this book, we can provide only a detailed explanation of the GAN models and the implementation choices behind them. Even so, by the end of this chapter, you should be fully equipped to implement any of the applications covered here by making only small modifications to the GAN models we implemented earlier and feeding them a dataset for the given use case or one similar to it. With that, let’s dive in.

11.1. GANs in medicine

This section presents applications of GANs in medicine. Namely, we look at how to use GAN-produced synthetic data to enlarge a training dataset to help improve diagnostic accuracy.

11.1.1. Using GANs to improve diagnostic accuracy

Machine learning applications in medicine face a range of challenges that make the field especially well suited to benefit from GANs. Perhaps most important, it is difficult to procure training datasets large enough for supervised machine learning algorithms because of the difficulties involved in collecting medical data.[1] Obtaining samples of medical conditions tends to be prohibitively expensive and impractical.

1

See “Synthetic Data Augmentation Using GAN for Improved Liver Lesion Classification,” by Maayan Frid-Adar et al., 2018, http://mng.bz/rPBg.

Unlike datasets of handwritten letters for optical character recognition (OCR) or footage of roads for self-driving cars, which anyone can procure, examples of medical conditions are harder to come by, and they often require specialized equipment to collect. Add to that the all-important patient-privacy considerations that limit how medical data can be collected and used.

In addition to difficulties in obtaining medical datasets, it is also challenging to properly label this data, a process that often requires annotations by people with expert knowledge of a given condition.[2] As a result, many medical applications have been unable to benefit from advances in deep learning and AI.

2

Ibid.

Many techniques have been developed to help address the problem of small labeled datasets. In chapter 7, you learned how GANs can be used to enhance the performance of classification algorithms in a semi-supervised setting. You saw how the SGAN achieved superior accuracy while using only a tiny subset of labels for training. This, however, addresses only half of the problem medical researchers face. Semi-supervised learning helps in situations in which we have a large dataset, but only a small portion of it is labeled. In many medical applications, having labels for a tiny portion of the dataset is only part of the problem—this small portion is often the only data we have! In other words, we do not have the luxury of thousands of additional samples from the same domain just waiting to be labeled or used in a semi-supervised setting.

Medical researchers strive to overcome the challenge of insufficient datasets by using data-augmentation techniques. For images, these include small tweaks and transformations such as scaling (zooming in and out), translations (moving left/right and up/down), and rotations.[3] These strategies allow a single example to be used to create many others, thereby expanding the dataset size. Figure 11.1 shows examples of data augmentations commonly used in computer vision.

3

Ibid.

Figure 11.1. Techniques used to enlarge a dataset by altering existing data include scaling (zooming in and out), translations (moving left/right and up/down), and rotations. Although effective at increasing dataset sizes, classic data augmentation techniques bring only limited additional data diversity.

(Source: “Data Augmentation: How to Use Deep Learning When You Have Limited Data,” by Bharath Raj, 2018, http://mng.bz/dxPD.)
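The transformations in figure 11.1 are straightforward to sketch in code. The following is a minimal NumPy illustration of translations and a flip; the helper name `classic_augment` is our own, and real pipelines (for example, Keras’s ImageDataGenerator) add richer options such as rotations and zooms:

```python
import numpy as np

def classic_augment(image):
    """Return simple translated and flipped variants of a single image.

    A minimal sketch of classic data augmentation using only NumPy.
    """
    variants = [image]
    # Translations: shift the image a couple of pixels in each direction.
    for shift in (-2, 2):
        variants.append(np.roll(image, shift, axis=0))  # up/down
        variants.append(np.roll(image, shift, axis=1))  # left/right
    # Horizontal flip (appropriate for some, but not all, medical images).
    variants.append(image[:, ::-1])
    return np.stack(variants)

image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
augmented = classic_augment(image)
print(augmented.shape)  # one original + five variants: (6, 64, 64)
```

Note how a single example yields six training examples; this is the dataset-enlargement effect described above, along with its limitation: every variant is a small perturbation of the same underlying image.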

As you may imagine, standard data augmentation has many limitations. For one, small modifications yield examples that do not diverge far from the original image. As a result, the additional examples do not add much variety to help the algorithm learn to generalize.[4] In the case of handwritten digits, for example, we want to see the number 6 rendered in different writing styles, not just permutations of the same underlying image.

4

Ibid.

In the case of medical diagnostics, we want different examples of the same underlying pathology. Enriching a dataset with synthetic examples, such as those produced by GANs, has the potential to further enrich the available data beyond traditional augmentation techniques. That is precisely what the Israeli researchers Maayan Frid-Adar, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan set out to investigate.

Encouraged by GANs’ ability to synthesize high-quality images in virtually any domain, Frid-Adar and her colleagues decided to explore the use of GANs for medical data augmentation. They chose to focus on improving the classification of liver lesions. One of their primary motivations for focusing on the liver is that this organ is one of the three most common sites for metastatic cancer, with over 745,000 deaths caused by liver cancer in 2012 alone.[5] Accordingly, tools and machine learning models that would help doctors diagnose at-risk patients have the potential to save lives and improve outcomes for countless patients.

5

See “Cancer Incidence and Mortality Worldwide: Sources, Methods, and Major Patterns in GLOBOCAN 2012,” by J. Ferlay et al., 2015, International Journal of Cancer, https://www.ncbi.nlm.nih.gov/pubmed/25220842.

11.1.2. Methodology

Frid-Adar and her team found themselves in a catch-22 situation: their goal was to train a GAN to augment a small dataset, but GANs themselves need a lot of data to train. In other words, they wanted to use GANs to create a large dataset, but they needed a large dataset to train the GAN in the first place.

Their solution was ingenious. First, they used standard data-augmentation techniques to create a larger dataset. Second, they used this dataset to train a GAN to create synthetic examples. Third, they used the augmented dataset from step 1 along with the GAN-produced synthetic examples from step 2 to train a liver lesion classifier.
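The three-step data flow can be sketched as follows. Every function here is a hypothetical stand-in that only mimics the shape of the data moving between stages; the real stages are full training pipelines (chapters 4 and 7 cover actual GAN and classifier implementations):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_classically(images):
    """Step 1: enlarge the small dataset with simple transformations
    (here, just horizontal flips)."""
    flipped = images[:, :, ::-1]
    return np.concatenate([images, flipped])

def train_gan_and_sample(images, n_synthetic):
    """Step 2: train a GAN on the augmented data, then draw synthetic
    samples from its Generator (here: placeholder noise images)."""
    return rng.normal(size=(n_synthetic,) + images.shape[1:])

def train_classifier(dataset):
    """Step 3: train the lesion classifier on the combined data
    (here we only report how many training examples it receives)."""
    return len(dataset)

scarce_real_data = rng.normal(size=(100, 64, 64))        # small dataset
augmented = augment_classically(scarce_real_data)        # step 1: 200 examples
synthetic = train_gan_and_sample(augmented, n_synthetic=300)  # step 2
n_examples = train_classifier(np.concatenate([augmented, synthetic]))  # step 3
print(n_examples)  # 500
```

The key design point is that step 1 feeds step 2: the classically augmented data is what makes the GAN trainable in the first place, and the classifier then benefits from both sources.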

The GAN model the researchers used was a variation on the Deep Convolutional GAN (DCGAN) covered in chapter 4. Attesting to the applicability of GANs across a wide array of datasets and scenarios, Frid-Adar et al. had to make only minor tweaks and customizations to make the DCGAN work for their use case. As evidenced by figure 11.2, the only parts of the model that needed adjustment were the dimensions of the hidden layers and the dimensions of the output from the Generator and input into the Discriminator network.

Figure 11.2. The DCGAN model architecture employed by Frid-Adar et al. to generate synthetic images of liver lesions to augment their dataset, aiming to improve classification accuracy. The model architecture is similar to the DCGAN in chapter 4, underscoring the applicability of GANs across a wide array of datasets and use cases. (Note that the figure shows only the GAN flow for fake examples.)

(Source: Frid-Adar et al., 2018, http://mng.bz/rPBg.)

Instead of 28 × 28 × 1-sized images like those in the MNIST dataset, this GAN deals with images that are 64 × 64 × 1. As noted in their paper, Frid-Adar et al. also used 5 × 5 convolutional kernels—but then again, that is also only a small change to the network hyperparameters. Except for the image size, which is given by the training data, all these adjustments were in all likelihood determined by trial and error. The researchers kept tweaking the parameters until the model produced satisfactory images.
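To see how a Generator reaches a 64 × 64 output, it helps to work out the spatial dimensions of each transposed-convolution (“deconv”) layer. The following helper uses the standard output-size formulas (matching Keras’s Conv2DTranspose); the particular 4 → 64 upsampling path shown is illustrative, and the exact layer sizes in the paper may differ:

```python
def deconv_output_size(input_size, stride, kernel_size, padding="same"):
    """Spatial output size of a transposed ('deconv') convolution.

    With 'same' padding, the output is simply input_size * stride;
    with 'valid' padding, it is (input_size - 1) * stride + kernel_size.
    """
    if padding == "same":
        return input_size * stride
    return (input_size - 1) * stride + kernel_size

# An assumed upsampling path from a projected 4x4 feature map to the
# 64x64x1 liver-lesion images, using 5x5 kernels with stride 2.
size = 4
path = [size]
for _ in range(4):
    size = deconv_output_size(size, stride=2, kernel_size=5)
    path.append(size)
print(path)  # [4, 8, 16, 32, 64]
```

This doubling-per-layer pattern is exactly what the chapter 4 DCGAN Generator does for MNIST, just continued for two more layers to reach 64 × 64 instead of 28 × 28.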

Before we review how well the approach devised by Frid-Adar and her team worked, let’s pause for a moment and appreciate how far your understanding of GANs has progressed. As early as chapter 4 in this book, you had already learned enough about GANs to apply them to a real-world scenario, discussed in a paper presented at the 2018 International Symposium on Biomedical Imaging.[6]

6

See Frid-Adar et al., 2018, http://mng.bz/rPBg.

11.1.3. Results

Using DCGAN for data augmentation, Frid-Adar and her team achieved a significant improvement in classification accuracy compared to the baseline (standard data augmentation only).[7] Their results are summarized in figure 11.3, which shows the classification accuracy (y-axis) as the number of training examples (x-axis) increases.

7

Ibid.

Figure 11.3. Classification accuracy as new examples are added under two dataset-augmentation strategies: classic data augmentation, and augmentation using synthetic examples produced by the DCGAN. With standard augmentation alone (dotted line), classification performance peaks at around 80%. Adding GAN-created examples (dashed line) boosts accuracy to over 85%.

(Source: Frid-Adar et al., 2018, http://mng.bz/rPBg.)

The dotted line depicts classification performance for classic data augmentation. The performance improves as the quantity of new (augmented) training examples increases; however, the improvement plateaus at around 80% accuracy, beyond which additional examples fail to yield improvement.

The dashed line shows the additional increase in accuracy achieved by augmenting the dataset using GAN-produced synthetic examples. Starting from the point beyond which additional classically augmented examples stopped improving accuracy, Frid-Adar et al. added synthetic data produced by their DCGAN. The classification performance improved from around 80% to over 85%, demonstrating the usefulness of GANs.

Improved classification of liver lesions is only one of many data-constrained use cases in medicine that can benefit from data augmentation with GAN-produced synthetic examples. For example, a team of British researchers led by Christopher Bowles of Imperial College London harnessed GANs (in particular, the Progressive GANs discussed in chapter 6) to boost performance on brain-segmentation tasks.[8] Crucially, an improvement in performance can unlock a model’s usability in practice, especially in fields like medicine, where accuracy may mean the difference between life and death.

8

See “GAN Augmentation: Augmenting Training Data Using Generative Adversarial Networks,” by Christopher Bowles et al., 2018, https://arxiv.org/abs/1810.10863.

Let’s switch gears and explore applications of GANs in a field with much lower stakes and a whole different set of considerations and challenges: fashion.

11.2. GANs in fashion

Unlike medicine, for which data is hard to obtain, researchers in fashion are fortunate to have huge datasets at their disposal. Sites like Instagram and Pinterest have countless images of outfits and clothing items, and retail giants like Amazon and eBay have data on millions of purchases of everything from socks to dresses.

In addition to data availability, many other characteristics make fashion well-suited to AI applications. Fashion tastes vary greatly from customer to customer, and the ability to personalize content has the potential to unlock significant business benefits. In addition, fashion trends change frequently, and it is vital for brands and retailers to react quickly and adapt to customers’ shifting preferences.

In this section, we explore some of the innovative uses of GANs in fashion.

11.2.1. Using GANs to design fashion

From drone deliveries to cashier-less grocery stores, Amazon is no stranger to headline news about its futuristic endeavors. In 2017, Amazon earned another one, this time about the company’s ambition to develop an AI fashion designer powered by none other than GANs.[9] The story, published in MIT Technology Review, is unfortunately short on details beyond the mention of using GANs to design new products matching a particular style.

9

See “Amazon Has Developed an AI Fashion Designer,” by Will Knight, 2017, MIT Technology Review, http://mng.bz/VPqX.

Luckily, researchers from Adobe and the University of California, San Diego, published a paper in which they set out to accomplish the same goal.[10] Their approach can give us a hint about what goes on behind the secretive veil of Amazon’s AI research labs seeking to reinvent fashion. Using a dataset of hundreds of thousands of users, items, and reviews scraped from Amazon, lead author Wang-Cheng Kang and his collaborators trained two separate models: one that recommends fashion and the other that creates it.[11]

10

See “This AI Learns Your Fashion Sense and Invents Your Next Outfit,” by Jackie Snow, 2017, MIT Technology Review, http://mng.bz/xlJ8.

11

See “Visually-Aware Fashion Recommendation and Design with Generative Image Models,” by Wang-Cheng Kang et al., 2017, https://arxiv.org/abs/1711.02231.

For our purposes, we can treat the recommendation model as a black box. The only thing we need to know is what it does: for any person-item pair, it returns a preference score; the greater the score, the better the item matches the person’s taste. Nothing too unusual.

The latter model is a lot more novel and interesting—not only because it uses GANs, but also thanks to the two creative applications Kang and his colleagues devised:

  • Creating new fashion items matching the fashion taste of a given individual
  • Suggesting personalized alterations to existing items based on an individual’s fashion preferences

In this section, we explore how Kang and his team achieved these goals.

11.2.2. Methodology

Let’s start with the model. Kang and his colleagues use a Conditional GAN (CGAN), with a product’s category as the conditioning label. Their dataset has six categories: tops (men’s and women’s), bottoms (men’s and women’s), and shoes (men’s and women’s).

Recall that in chapter 8, we used MNIST labels to teach a CGAN to produce any handwritten digit we wanted. In a similar fashion (pun intended), Kang et al. use the category labels to train their CGAN to generate fashion items belonging to a specified category. Even though we are now dealing with shirts and pants instead of threes and fours, the CGAN model setup is almost identical to the one we implemented in chapter 8. The Generator uses random noise z and conditioning information (label/category c) to synthesize an image, and the Discriminator outputs a probability that a particular image-category pair is real rather than fake. Figure 11.4 details the network architecture Kang et al. used.

Figure 11.4. The architectures of the CGAN Generator and Discriminator networks that Kang et al. use in their study. The label c represents the category of clothing. The researchers use it as the conditioning label to guide the Generator to synthesize an image matching the given category, and the Discriminator to identify real image-category pairs.

(Source: Kang et al., 2017, https://arxiv.org/abs/1711.02231.)

Each box represents a layer; fc stands for fully connected layer; st denotes the strides of the convolutional kernel, whose dimensions (width × height) are given by the first two numbers in the conv/deconv layers; and deconv and conv denote the kind of layer used: a transposed convolution or a regular convolution, respectively. The number directly after conv or deconv sets the depth of the layer or, equivalently, the number of convolutional filters used. BN indicates that batch normalization is applied to the output of the given layer. Also, notice that Kang et al. chose to use least squares loss instead of cross-entropy loss.
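The conditioning mechanism can be sketched in a few lines. Here we assume the chapter 8 scheme of concatenating the noise vector with a one-hot category label (Kang et al. may wire the label in differently, for example through a separate fully connected branch), and we include the least squares Discriminator loss mentioned above:

```python
import numpy as np

N_CATEGORIES = 6   # men's and women's tops, bottoms, and shoes
Z_DIM = 100        # assumed noise-vector size, as in the chapter 8 CGAN

def one_hot(category, n_categories=N_CATEGORIES):
    v = np.zeros(n_categories, dtype=np.float32)
    v[category] = 1.0
    return v

def generator_input(z, category):
    """Condition the Generator by concatenating the noise vector z with
    a one-hot category label."""
    return np.concatenate([z, one_hot(category)])

def lsgan_discriminator_loss(real_outputs, fake_outputs):
    """Least squares GAN loss for the Discriminator: push outputs for
    real image-category pairs toward 1 and for fake pairs toward 0."""
    return 0.5 * np.mean((real_outputs - 1.0) ** 2) + \
           0.5 * np.mean(fake_outputs ** 2)

z = np.random.uniform(-1, 1, size=Z_DIM)
conditioned = generator_input(z, category=2)
print(conditioned.shape)  # (106,)
```

A perfect Discriminator (outputting 1 for real pairs and 0 for fakes) would incur zero loss under `lsgan_discriminator_loss`, just as with cross-entropy; the least squares formulation mainly changes how heavily confident mistakes are penalized.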

Equipped with a CGAN capable of producing realistic clothing items for each of the top-level categories in their dataset, Kang and his colleagues tested it on two applications with significant practical potential: creating new personalized items and making personalized alterations to existing items.

11.2.3. Creating new items matching individual preferences

To ensure that the produced images are customized to an individual’s fashion taste, Kang and his colleagues came up with an ingenious approach. They started off with the following insight: given that their recommendation model assigns scores to existing items based on how much a person would like the given item, the ability to generate new items maximizing this preference score would likely yield items matching the person’s style and taste.[12]

12

Ibid.

Borrowing a term from economics and choice theory,[13] Kang et al. call this process preference maximization. What is unique about Kang et al.’s approach is that their universe of possible items is not limited to the corpus of training data or even the entire Amazon catalog. Thanks to their CGAN, they can fine-tune the generation of new items to virtually infinite granularity.

13

See “Introduction to Choice Theory,” by Jonathan Levin and Paul Milgrom, 2004, http://mng.bz/AN2p.

The next problem Kang and his colleagues had to solve was ensuring that the CGAN Generator would produce a fashion item maximizing individual preference. After all, their CGAN was trained to produce realistic-looking images for only a given category, not a given person. One possible option would be to keep generating images and check their preference score until we happen upon one whose score is sufficiently high. However, given the virtually infinite variations of the images that can be generated, this approach would be extremely inefficient and time-consuming.

Instead, Kang and his team solved the issue by framing it as an optimization problem: in particular, constrained maximization. The constraint (the boundary within which their algorithm had to operate) is the latent space, given by the size of the vector z. Kang et al. used the standard size (a 100-dimensional vector), with each element in the [–1, 1] range. To keep the elements within this range while making them differentiable for use in a gradient-based optimization algorithm, the authors expressed each element of z as the output of a tanh function applied to a randomly initialized variable.[14]

14

See Kang et al., 2017, https://arxiv.org/abs/1711.02231.

The researchers then employed gradient ascent. Gradient ascent is just like gradient descent, except that instead of minimizing a cost function by iteratively moving in the direction of the steepest decrease, we are maximizing a reward function (in this case, the score given by the recommendation model) by iteratively moving in the direction of the steepest increase.
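The tanh parameterization and the gradient ascent step can be sketched together. In this toy version, a fixed linear scoring function stands in for the composite recommender(Generator(z)); in the real pipeline, the gradient is backpropagated through both trained networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for recommender(Generator(z)): a linear score.
w_pref = rng.normal(size=100)

def preference_score(z):
    return float(w_pref @ z)

# Parameterize z = tanh(theta): z stays in [-1, 1] while theta is free.
theta = rng.normal(size=100)
learning_rate = 0.05
score_before = preference_score(np.tanh(theta))

for _ in range(200):
    z = np.tanh(theta)
    grad = w_pref * (1.0 - z ** 2)   # d(score)/d(theta), by the chain rule
    theta += learning_rate * grad    # ascend: move toward a higher score

score_after = preference_score(np.tanh(theta))
print(score_before < score_after)  # True
```

Note the sign of the update: we add the gradient rather than subtract it, which is the only mechanical difference between gradient ascent and the gradient descent used everywhere else in this book.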

Kang et al.’s results are shown in figure 11.5, which compares the top three images from the dataset with the top three generated images for six different individuals. Attesting to the ingenuity of Kang et al.’s solution, the examples they produced have higher preference scores, suggesting that they are a better match for the shoppers’ style and preferences.

Figure 11.5. In the results Kang et al. present in their paper, every image is annotated with its preference score. Each row shows results for a different shopper and product category (men’s and women’s tops, men’s and women’s bottoms, and men’s and women’s shoes).

(Source: Kang et al., 2017, https://arxiv.org/abs/1711.02231.)

The three columns on the left show the items from the dataset with the highest scores; the three columns on the right show generated items with the highest scores. Based on the preference score, the generated images are a better match for the shoppers’ preferences.

Kang and his team didn’t stop there. In addition to creating new items, they explored whether the model they developed could be used to make changes to existing items, tailored to an individual’s style. Given the highly subjective nature of fashion shopping, having the ability to alter a garment until it is “just right” has significant potential business benefits. Let’s see how Kang et al. went about solving this challenge.

11.2.4. Adjusting existing items to better match individual preferences

Recall that the numbers in the latent space (represented by the input vector z) have real-world meaning, and that vectors that are mathematically close to one another (as measured by their distance in the high-dimensional space they occupy) tend to produce images that are similar in content and style. Accordingly, as Kang et al. point out, in order to generate variations of some image A, all we need to do is find the latent vector zA that the Generator would use to create that image. We can then feed neighboring vectors to the Generator to produce similar images.

To make this a little less abstract, let’s look at a concrete example using our favorite dataset, the MNIST. Consider an input vector z’ that, when fed into the Generator, produces an image of the number 9. If we then feed in a vector z” that is, mathematically speaking, very close to z’ in the 100-dimensional latent space the vectors occupy, z” will produce another, slightly different, image of the number 9. This is illustrated in figure 11.6. You saw a little bit of this back in chapter 2: in the context of variational autoencoders, the intermediate/compressed representation plays the same role that z does in the world of GANs.

Figure 11.6. Variations on the digit 9 obtained by moving around in the latent space (image reproduced from chapter 2). Nearby vectors produce variations on the same digit. For example, notice that as we move from left to right in the first row, the numeral 9 starts off being slightly right-slanted but eventually turns fully upright. Also notice that as we move far enough away, the number 9 morphs into another, visually similar digit. Progressive variations like these apply equally to more complex datasets, where the variations tend to be more nuanced.

Of course, in fashion, things are more nuanced. After all, a photo of a dress is incomparably more complex than a grayscale image of a numeral. Moving in the latent space around a vector producing, say, a T-shirt, can produce a T-shirt in different colors, patterns, and styles (V-neck as opposed to crew-neck, for example). It all depends on the types of encodings and meanings the Generator has internalized during training. The best way to find out is to try.
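Sampling such latent-space neighbors is a one-liner. In this sketch, the helper name and the radius are our own illustrative choices; a trained Generator would then map each neighbor to a candidate variation of the original item:

```python
import numpy as np

rng = np.random.default_rng(0)

def latent_neighbors(z, n=5, radius=0.1):
    """Sample n latent vectors near z. Feeding these to a trained
    Generator yields variations on the image that z produces; 'radius'
    controls how far the variations stray."""
    noise = rng.normal(scale=radius, size=(n, z.shape[0]))
    return np.clip(z + noise, -1.0, 1.0)  # stay within the training range

# A vector that (hypothetically) renders a T-shirt.
z_tshirt = rng.uniform(-1, 1, size=100)
neighbors = latent_neighbors(z_tshirt, n=5, radius=0.1)
print(neighbors.shape)  # (5, 100)
```

A small radius tends to produce near-duplicates; a larger one trades similarity for diversity, eventually crossing into different items altogether, just as figure 11.6 shows for digits.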

This brings us to the next challenge Kang and his team had to overcome. In order for the preceding approach to work, we need the vector z for the image we want to alter. This would be straightforward if we wanted to modify a synthetic image: we can just record the vector z each time we generate an image so that we can refer to it later. What complicates the situation in our scenario is that we want to modify a real image.

By definition, a real image cannot have been produced by the Generator, so there is no vector z. The best we can do is find the latent-space representation of a generated image that is as close as possible to the image we seek to modify. Put differently, we have to find a vector z that the Generator uses to synthesize an image similar to the real image, and use it as a proxy for the hypothetical z that would have produced the real image.

That is precisely what Kang et al. did. Just as before, they start by formulating the scenario as an optimization problem. They define a loss function in terms of the so-called reconstruction loss (a measure of the difference between two images; the greater the loss, the more different a given pair of images is from one another).[15] Having formulated the problem in this way, Kang et al. then iteratively find the closest possible generated image for any real image by using gradient descent (minimizing the reconstruction loss). Once we have a fake image that is similar to the real image (and hence also the vector z used to produce it), we can modify it through the latent space manipulations.

15

Ibid.
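Under strong simplifying assumptions, this projection step can be sketched as follows. A fixed linear map stands in for the trained Generator; the real method instead backpropagates the reconstruction loss through the CGAN Generator’s layers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for the trained Generator: G(z) = A @ z.
A = rng.normal(size=(64, 100))

def generate(z):
    return A @ z

# A 'real' image we want to project into the latent space.
real_image = generate(rng.uniform(-1, 1, size=100)) + \
             rng.normal(scale=0.01, size=64)

z = np.zeros(100)          # start from an arbitrary latent vector
learning_rate = 5e-4
losses = []
for _ in range(500):
    residual = generate(z) - real_image
    losses.append(float(residual @ residual))     # reconstruction loss
    z -= learning_rate * 2.0 * (A.T @ residual)   # gradient descent step

print(losses[0] > losses[-1])  # True
```

Once the loss is low, `z` is the proxy latent vector we were after, and the latent-space manipulations described above can be applied to it.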

This is where the approach Kang and his colleagues devised shows its full potential. We can move around the latent space to points that generate images similar to the one we want to modify, while also optimizing for the preferences of the given user. We can see this process in figure 11.7: as we move from left to right in each row, the shirts and pants get progressively more personalized.

Figure 11.7. The personalization process for six shoppers (three male and three female) using the same starting image: a polo shirt for the men and a pair of pants for the women.

(Source: Kang et al., 2017, https://arxiv.org/abs/1711.02231.)

For instance, the person in the first row was looking for a more colorful option; as Kang et al. observed, the person in row 5 seems to prefer brighter colors and a more distressed look; and the last person, it appears, prefers skirts over jeans. This is hyperpersonalization at its finest. No wonder Amazon took notice.

The leftmost photo shows the real product from the training dataset; the second photo from the left shows a generated image closest to the real photo that was used as a starting point for the personalization process. Each image is annotated with its preference score. As we move from left to right, the item is progressively optimized for the given individual. As evidenced by the increasing scores, the personalization process improves the likelihood that the item matches the given shopper’s style and taste.

11.3. Conclusion

The applications covered in this chapter only scratch the surface of what is possible with GANs. Countless other use cases exist in medicine and fashion alone, not to mention other fields. What is certain is that GANs have expanded far beyond academia, with myriad applications leveraging their ability to synthesize realistic data.

Summary

  • Thanks to their versatility, GANs can be harnessed for a wide array of nonacademic applications and easily repurposed for use cases beyond the MNIST dataset.
  • In medicine, GANs produce synthetic examples that can improve classification accuracy beyond what is possible with standard dataset augmentation strategies.
  • In fashion, GANs can be used to create new items and to alter existing items to better match an individual’s personal style. This is accomplished by generating images that maximize the preference score produced by a recommendation algorithm.