This chapter covers
Finally, a technological breakthrough of almost universal appeal, seeing as everyone seems to love comparing apples to oranges. In this chapter, you will learn how! But this is no small feat, so we will need at least two sets of Discriminators and two Generators to achieve this. That obviously complicates the architecture, so we will have to spend more time discussing it, but at the very least, it is a great point to start thinking in a fully object-oriented programming (OOP) way.
One fascinating area of GANs’ application that we touched on at the end of the previous chapter is image-to-image translation. In this use, GANs have been massively successful—in video, static images, or even style transfer. Indeed, GANs have been at the forefront of many of these applications as they enable almost a new class of uses. Because of their visual nature, the more successful GAN variants typically make their rounds on YouTube and Twitter, so if you have not seen these videos, we encourage you to check them out by searching for pix2pix, CycleGAN, or vid2vid.
This type of translation in practice means that our input to the Generator is a picture, because we need our Generator (translator) to start from this image. In other words, we are mapping an image from one domain to another. Previously, the latent vector seeding the generation was typically a somewhat uninterpretable vector. Now we are swapping that for an input image.
A good way to think of image-to-image translation is as a special case of the Conditional GAN (presented in chapter 8). However, in this case, we are conditioning on a complete image (rather than just a class), typically of the same dimensionality as the output image, that is then provided to the network as a kind of label. One of the first famous examples in this space was an image-translation work coming out of the University of California, Berkeley, as shown in figure 9.1.
(Source: “Image-to-Image Translation with Conditional Adversarial Networks,” by Phillip Isola, https://github.com/phillipi/pix2pix.)
As you can see, we can map from any of the following:
The idea is clearly powerful and versatile; however, the issue lies with the need for paired data. From chapter 8, you understand that we need labels for the Conditional GAN. Because in this case we are using another image as a label, the mapping does not make sense unless we’re mapping to the corresponding image—the exact same image, except in the other domain.
So, the night image needs to be taken from exactly the same place as the day image. The fashion item’s outline needs to have the exact match of a fully colored/synthesized item in the training set in the other domain. In other words, during training, the GAN needs to have access to corresponding labels of the items in the original domain.
This is typically done—for example, in the case of black-and-white images—by first taking loads of colored pictures, applying the B&W filter on all of them, and then using the unmodified image as one domain and the B&W-filtered images as the other. This ensures that we have the corresponding images in both domains. Then we can apply the trained GAN anywhere, but if we do not have an easy way of generating these “perfect” pairs, we are out of luck!
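The pairing trick described above can be sketched in a few lines. This is a minimal illustration, not the pix2pix authors' pipeline: the function names `to_grayscale` and `make_pairs` are hypothetical, the random arrays stand in for real photos, and the grayscale conversion assumes the standard ITU-R BT.601 luma weights.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to (H, W, 1) with BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb @ weights              # weighted sum over the channel axis -> (H, W)
    return gray[..., np.newaxis]      # keep an explicit channel axis

def make_pairs(color_images):
    """Build (B&W, color) pairs: one image per domain, perfectly aligned."""
    return [(to_grayscale(img), img) for img in color_images]

rng = np.random.default_rng(0)
color_images = [rng.random((4, 4, 3)) for _ in range(3)]  # stand-ins for photos
pairs = make_pairs(color_images)
```

Each pair is aligned pixel for pixel by construction, which is exactly the property that is hard to obtain for domains such as summer and winter photographs.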
The genius insight of this UC Berkeley group was that we do not, in fact, need perfect pairs.[1] Instead, we simply complete the cycle: we translate from one domain to another and then back again. For example, we go from summer picture (domain A) of a park to a winter one (domain B) and then back again to summer (domain A). Now we have essentially created a cycle, and, ideally, the original picture ($a$) and the reconstructed picture ($\hat{a}$) are the same. If they are not, we can measure their loss on a pixel level, thereby getting the first loss of our CycleGAN: cycle-consistency loss, which is depicted in figure 9.2.
See “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” by Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf.
(Source: Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf.)
A common analogy is thinking about the process of back-translation—a sentence in Chinese that is translated to English and then back again to Chinese should give back the same sentence. If not, we can measure the cycle-consistency loss by how much the first and the third sentences differ.
To be able to use the cycle-consistency loss, we need to have two Generators: one translating from A to B, called GAB, sometimes referred to as simply G, and then another one translating from B to A, called GBA, referred to as F for brevity. There are technically two losses (forward cycle-consistency loss and backward cycle-consistency loss), but because all they mean is that $\hat{a} = F(G(a)) \approx a$ as well as $\hat{b} = G(F(b)) \approx b$, you may think of these as essentially the same, but off by one.
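To make the forward and backward cycle-consistency losses concrete, here is a minimal NumPy sketch. The "Generators" are hypothetical stand-ins (simple pixel transforms, not trained networks), and the loss is taken as the mean absolute difference per pixel.

```python
import numpy as np

def cycle_consistency_loss(real, generator_there, generator_back):
    """Mean absolute (L1) difference between an image and its round trip."""
    reconstructed = generator_back(generator_there(real))
    return np.mean(np.abs(reconstructed - real))

# Hypothetical stand-in Generators: simple pixel transforms, not trained models.
g_ab = lambda a: 1.0 - a          # "translate" A -> B
g_ba_good = lambda b: 1.0 - b     # exact inverse of g_ab
g_ba_bad = lambda b: 0.5 * b      # not an inverse of g_ab

a = np.full((2, 2, 3), 0.25)      # toy image from domain A
b = np.full((2, 2, 3), 0.50)      # toy image from domain B

forward_good = cycle_consistency_loss(a, g_ab, g_ba_good)   # A -> B -> A
forward_bad = cycle_consistency_loss(a, g_ab, g_ba_bad)
backward_good = cycle_consistency_loss(b, g_ba_good, g_ab)  # B -> A -> B
```

When the second Generator undoes the first, the loss is zero; the further the round trip drifts from the original, the larger the penalty. The backward loss is the same function with the two Generators swapped.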
In addition to the cycle-consistency loss, we still have the adversarial loss. Every translation with a Generator GAB has a corresponding Discriminator DB, and GBA has Discriminator DA. The way to think about it is that we are always testing, when translating to domain A, whether the picture looks real; hence we use DA and vice versa.
This is the same idea as with simpler architectures, but now, because of the two losses, we have two Discriminators. We need to make sure that not only the translation from apple to orange looks real, but also that the translation from our estimated orange back to reconstructed apple looks real. Recall that the adversarial loss ensures that the images look real, and as a result, it is still key for the CycleGAN to work. Hence adversarial loss is presented as second. The first Discriminator in the cycle is especially important—otherwise, we’d simply get noise that would help the GAN memorize what it should reconstruct.[2]
In practice, this is a little bit more complicated and would depend on, for example, whether you include both forward and backward cyclical loss. But you may use this as a mental model for how to think of the importance of the adversarial loss—remembering that we have both mappings A-B-A and B-A-B, so both Discriminators get to be the first one at some point.
The idea of identity loss is simple: we want to enforce that CycleGAN preserves the overall color structure (or temperature) of the picture. So we introduce a regularization term that helps us keep the tint of the picture consistent with the original image. Imagine this as a way of ensuring that even after applying many filters onto your image, you still can recover the original image.
This is done by feeding the images already in domain A to the Generator from B to A (GBA), because the CycleGAN should understand that they are already in the correct domain. In other words, we penalize unnecessary changes to the image: if we feed in a zebra and are trying to “zebrafy” an image, we get the same zebra back, as there is nothing to do.[3] Figure 9.3 illustrates the effects of identity loss.
See Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf. More at http://mng.bz/loE8.
Even though identity loss is not, strictly speaking, required for the CycleGAN to work, we include it for completeness. Both our implementation and the CycleGAN authors’ latest implementation contain it, because frequently this adjustment leads to empirically better results and enforces a constraint that seems reasonable. But even the CycleGAN paper itself mentions it only briefly as a seeming ex-post justification, so we do not cover it extensively.
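As a rough illustration of identity loss, the sketch below feeds images that are already in the target domain through toy "Generators" and measures how much they change. The functions `leaves_alone` and `adds_tint` are hypothetical stand-ins, not trained models.

```python
import numpy as np

def identity_loss(real_a, real_b, g_ab, g_ba):
    """L1 penalty for altering images already in a Generator's target domain."""
    loss_a = np.mean(np.abs(g_ba(real_a) - real_a))  # G_BA should leave A images alone
    loss_b = np.mean(np.abs(g_ab(real_b) - real_b))  # G_AB should leave B images alone
    return loss_a + loss_b

# Hypothetical stand-in Generators.
leaves_alone = lambda img: img          # ideal behavior: nothing to do
adds_tint = lambda img: img + 0.1       # shifts every pixel, changing the tint

real_a = np.zeros((2, 2, 3))
real_b = np.full((2, 2, 3), 0.5)

no_penalty = identity_loss(real_a, real_b, leaves_alone, leaves_alone)
tint_penalty = identity_loss(real_a, real_b, adds_tint, adds_tint)
```

A Generator that shifts the tint of an image that needed no translation is penalized in proportion to the shift, which is the regularizing effect described above.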
Table 9.1 summarizes the losses you’ve learned about in this chapter.
Loss | Calculation | Measures | Ensures |
---|---|---|---|
Adversarial loss | $L_{GAN}(G, D_B, A, B) = \mathbb{E}_{b \sim p(b)}[\log D_B(b)] + \mathbb{E}_{a \sim p(a)}[\log(1 - D_B(G_{AB}(a)))]$ (This is just the good old NS-GAN presented in chapter 5.) | As in previous cases, the loss measures two terms: first is the likelihood of a given image being the real one rather than the translated image. Second is the part where the Generator may get to fool the Discriminator. Note that this formulation is only for $D_B$, with an equivalent term for $D_A$ that comes into the final loss. | That the translated images look real, sharp, and indistinguishable from the real ones. |
Cycle-consistency loss: forward pass | Difference between $a$ and $\hat{a}$ (denoted by $\lVert a - \hat{a} \rVert_1$) [a] | The difference between the images from the original domain $a$ and the twice-translated images $\hat{a}$. | That the original image and the twice-translated image are the same. If this fails, we may not have a coherent mapping A-B-A. |
Cycle-consistency loss: backward pass | Difference between $b$ and $\hat{b}$ (denoted by $\lVert b - \hat{b} \rVert_1$) | The difference between the images from the original domain $b$ and the twice-translated images $\hat{b}$. | That the original image and the twice-translated image are the same. If this fails, we may not have a coherent mapping B-A-B. |
Overall loss | $L = L_{GAN}(G, D_B, A, B) + L_{GAN}(F, D_A, B, A) + \lambda L_{cyc}(G, F)$ | All of the four losses combined (2× adversarial because of two Generators) plus cyclical loss: forward and backward in one term. | That the overall translation is photorealistic and makes sense (provides matching pictures). |
Identity loss (outside the overall loss, for consistency with the CycleGAN paper notation) | $L_{identity} = \mathbb{E}_{a \sim p(a)}[\lVert G_{BA}(a) - a \rVert_1] + \mathbb{E}_{b \sim p(b)}[\lVert G_{AB}(b) - b \rVert_1]$ | The difference between the image in B and $G_{AB}(b)$ and vice versa. | That the CycleGAN changes parts of the image only when it needs to. |
[a] This notation may be unfamiliar to some, but it represents the L1 norm between the two items. For simplicity, you may think of this as, for each pixel, the absolute difference between it and the corresponding pixel in the reconstructed image.
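The way the rows of table 9.1 fit together can be sketched numerically. This is an illustration under stated assumptions, not the authors' implementation: the Discriminator outputs are made-up probabilities, `eps` is a hypothetical guard against log(0), and the L1 norm is averaged per pixel. In the actual objective, the Generators try to minimize this quantity while the Discriminators try to maximize the adversarial terms.

```python
import numpy as np

def adversarial_term(d_real, d_fake, eps=1e-12):
    """E[log D(real)] + E[log(1 - D(fake))] for a single Discriminator."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

def l1(x, y):
    """Per-pixel mean absolute difference (the L1 norm, averaged)."""
    return np.mean(np.abs(x - y))

def overall_loss(d_b_real, d_b_fake, d_a_real, d_a_fake,
                 a, a_hat, b, b_hat, lam=10.0):
    """Two adversarial terms plus lambda times forward + backward cycle loss."""
    adversarial = (adversarial_term(d_b_real, d_b_fake)
                   + adversarial_term(d_a_real, d_a_fake))
    cycle = l1(a, a_hat) + l1(b, b_hat)
    return adversarial + lam * cycle

half = np.full(4, 0.5)                 # a Discriminator that cannot decide
a = np.zeros((2, 2, 3))
b = np.ones((2, 2, 3))

perfect = overall_loss(half, half, half, half, a, a, b, b)
drifted = overall_loss(half, half, half, half, a, a + 0.1, b, b)
```

With a reconstruction that is off by 0.1 per pixel and a weight of 10, the cycle term adds 1.0 to the total, which shows how strongly the lambda factor ties the reconstruction to the original.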
The CycleGAN setup builds directly on the CGAN architecture and is, in essence, two CGANs joined together—or, as the CycleGAN authors themselves point out, an autoencoder. Recall from chapter 2 that we had an input image x and the reconstructed image x*, which was the result of reconstruction after being fed through the latent space z; see figure 9.4.
To translate this diagram into the CycleGAN’s world, $a$ is an image in the A domain, $b$ is an image in B, and $\hat{a}$ is reconstructed A. In CycleGAN’s case, however, we are dealing with a latent space (step 2) of equal dimensionality. It just happens to be another meaningful domain (B) that the CycleGAN has to find. Even with the autoencoder, the latent space was just another domain, though it was not as easily interpretable.
Compared to what we know from chapter 2, the main new concept is the introduction of the adversarial losses. These and many other mixtures of autoencoders and GANs are an active area of research in themselves! So that is also a good area for interested researchers. But for now, think of the two mappings as two autoencoders: F(G(a)) and G(F(b)). We take the basic idea of the autoencoder—including a kind of explicit loss function as substituted by the cycle-consistency loss—and add Discriminators to it. The two Discriminators, one at each step, ensure that both translations (including into the kind of latent space) look like real images in their respective domains.
Before we jump into the actual implementation of the CycleGAN, let’s briefly look at the overall simplified implementation depicted in figure 9.5. There are two flows: in the top diagram, the flow A-B-A starts from an image in domain A, and in the bottom diagram, the flow B-A-B starts with an image in domain B.
(Source: “Understanding and Implementing CycleGAN in TensorFlow,” by Hardik Bansal and Archit Rathore, 2017, https://hardikbansal.github.io/CycleGANBlog/.)
The image then follows two paths: it is (1) fed to the Discriminator to get our decision as to whether it is real or not, and (2) (i) fed to the Generator to translate it to B, then (ii) evaluated by the Discriminator B to see if it looks real in domain B, and eventually (iii) translated back to A to allow us to measure the cyclic loss.
The bottom image is basically an off-by-one cycle of the top image and follows all the same fundamental steps. We’ll use the apple2orange dataset, but many other datasets are available, including the famous horse2zebra dataset, which you can easily use by making a slight modification to the code and downloading the data by using the bash script provided.
To summarize figure 9.5 in another representation for further clarity, table 9.2 reviews all four major networks.
Network | Input | Output | Goal |
---|---|---|---|
Generator: from A to B | We load either a real picture from A or a translation from B to A. | We translate it to domain B. | Try to create realistic-looking images in domain B. |
Generator: from B to A | We load either a real picture from B or a translation from A to B. | We translate it to domain A. | Try to create realistic-looking images in domain A. |
Discriminator A | We provide a picture in the A domain, either translated or real. | The probability that the picture is real. | Try to not get fooled by the Generator from B to A. |
Discriminator B | We provide a picture in the B domain, either translated or real. | The probability that the picture is real. | Try to not get fooled by the Generator from A to B. |
Figure 9.6 shows the architecture of the Generator. We have re-created the diagram by using the variable names from our code and included the shapes for your benefit. This is an example of a U-Net architecture, because when you draw it in a way that each resolution gets its own level, the network looks like a U.
A couple of things to note here:
As you will see, this just means we concatenate the entire block/tensor to the equivalently colored tensor in the decoder part of the Generator.
The autoencoder is a useful teaching tool for the architecture of the Generator alone as well, because the Generator has an encoder-decoder architecture:
To clarify, the autoencoder model here is useful in two ways. First, the overall CycleGAN architecture can be viewed as training two autoencoders.[5] Second, the U-Net itself has parts referred to as encoder and decoder.
See Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf.
You may also be a bit puzzled by the downscaling and the subsequent upscaling. The purpose is to compress the image to its most meaningful representation while still being able to add back all the detail. It’s the same reasoning as with the autoencoder, except now we also have a path to remember the nuances. This architecture (the U-Net architecture) has been empirically shown in several domains to perform better on various segmentation tasks. The key idea is that although during downsampling we can focus on classification and understanding of large regions, including higher-resolution skip connections preserves the detail that can then be accurately segmented.
In our implementation of CycleGAN, we’ll use the U-Net architecture with skip connections as shown in figure 9.6, which is more readable. However, many CycleGAN implementations use the ResNet architecture, which you can implement yourself with a bit more work.
The main advantage of ResNet is that it uses fewer parameters and introduces a step in the middle called transformer, which has residual connections in lieu of our encoder-decoder skip connections.
Based on our testing on the apple2orange dataset, the results remain the same with either architecture. Instead of explicitly defining the transformer, we provide skip connections (as used in the diagram) from the convolutional to the deconvolutional layers. We will mention these similarities again in the code; for now, just keep them in mind.
The CycleGAN’s Discriminator is based on the PatchGAN architecture—we will dive into the technical details in the code section. One thing that may be confusing is that we do not get a single float as an output of this Discriminator, but rather a set of single-channel values that may be thought of as a set of mini-discriminators that we then average together.
Ultimately, this allows the design of the CycleGAN to be fully convolutional, meaning that it can scale relatively easily to higher resolutions. Indeed, in the examples of translating video games to reality or vice versa, the CycleGAN authors have used an upscaled version of the CycleGAN, with only minor modifications thanks to the fully convolutional design. Other than that, the Discriminator should be a relatively straightforward implementation of the Discriminators you have seen before, except there are now two of them.
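The "set of mini-discriminators" idea can be illustrated with plain arrays. The sketch below assumes a 128 × 128 input passed through four stride-2 convolutions (the configuration used by the implementation later in this chapter) and uses made-up patch scores; `patch_output_side` and `average_patch_score` are hypothetical helper names.

```python
import numpy as np

def patch_output_side(img_side, num_stride2_layers):
    """Side of the patch grid after repeated halving by stride-2 convolutions."""
    return img_side // 2 ** num_stride2_layers

def average_patch_score(patch_scores):
    """Collapse per-patch scores (N, H, W, 1) into one score per image."""
    return patch_scores.mean(axis=(1, 2, 3))

side = patch_output_side(128, 4)          # 128 -> 64 -> 32 -> 16 -> 8

scores = np.zeros((2, side, side, 1))     # made-up Discriminator outputs
scores[0] = 1.0                           # image 0: every patch judged real
scores[1, :side // 2] = 1.0               # image 1: only the top half judged real
per_image = average_patch_score(scores)
```

Because each output value depends only on a local region of the input, nothing in this scheme is tied to a fixed image size, which is what makes the fully convolutional design easy to scale.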
We have always used objects in TensorFlow and object-oriented programming (OOP) in our code, but we have usually treated the architectures more functionally, because they were generally simple. In the CycleGAN’s case, the architecture is complex, and as a result, we need a structure that allows us to keep accessing the original attributes and methods that we have defined. As a result, we will write out the CycleGAN as a Python class of its own with methods to build the Generator and Discriminator, and run the training.
In this tutorial, we’ll use the Keras-GAN implementation and use Keras with a TensorFlow backend.[6] Tested as late as Keras 2.2.4 and TensorFlow 1.12.0, Keras_contrib was installed from the hash 46fcdb9384b3bc9399c651b2b43640aa54098e64. This time, we have to use a different dataset (also to show you that despite our joke from chapter 2, we do know other datasets). But for educational purposes, we will keep using one of the simpler datasets—apple2orange. Let’s jump right into it by doing all our usual imports, as shown in the following listing.
See the Keras-GAN GitHub repository by Erik Linder-Norén, 2017, https://github.com/eriklindernoren/Keras-GAN.
from __future__ import print_function, division
import scipy
from keras.datasets import mnist
from keras_contrib.layers.normalization import InstanceNormalization
from keras.layers import Input, Dense, Reshape, Flatten, Dropout, Concatenate
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
import datetime
import matplotlib.pyplot as plt
import sys
from data_loader import DataLoader
import numpy as np
import os
As promised, we’ll use the object-oriented style of programming. In the following listing, we create a CycleGAN class with all the initializing parameters, including the data loader. The data loader is defined in the GitHub repository for our book. It simply loads the preprocessed data.
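class CycleGAN():
    def __init__(self):
        # Input shape
        self.img_rows = 128
        self.img_cols = 128
        self.channels = 3
        self.img_shape = (self.img_rows, self.img_cols, self.channels)

        # Configures the data loader
        self.dataset_name = 'apple2orange'
        # Uses the DataLoader object to import a preprocessed dataset
        self.data_loader = DataLoader(dataset_name=self.dataset_name,
                                      img_res=(self.img_rows, self.img_cols))

        # Calculates the output shape of the Discriminator (PatchGAN)
        patch = int(self.img_rows / 2**4)
        self.disc_patch = (patch, patch, 1)

        self.gf = 32    # Number of filters in the first layer of the Generator
        self.df = 64    # Number of filters in the first layer of the Discriminator

        self.lambda_cycle = 10.0                      # Cycle-consistency loss weight
        self.lambda_id = 0.9 * self.lambda_cycle      # Identity loss weight

        optimizer = Adam(0.0002, 0.5)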
Two new terms are lambda_cycle and lambda_id. The second hyperparameter influences identity loss. The CycleGAN authors themselves note that this value influences how dramatic the changes are, especially early in the training process.[7] Setting a lower value leads to unnecessary changes: for example, completely inverting the colors early on. We selected this value by rerunning the training process for apple2orange several times. Frequently, the process is theory-driven alchemy.
See “pytorch-CycleGAN-and-pix2pix Frequently Asked Questions,” by Jun-Yan Zhu, April 2019, http://mng.bz/BY58.
The first hyperparameter—lambda_cycle—controls how strictly the cycle-consistency loss is enforced. Setting this value higher will ensure that your original and reconstructed images are as close together as possible.
So now that we have our basic parameters out of the way, we will build the basic network, as shown in listing 9.3. We will start from the high-level view and move down. This entails the following:
        # Builds and compiles the Discriminators
        self.d_A = self.build_discriminator()
        self.d_B = self.build_discriminator()
        self.d_A.compile(loss='mse',
                         optimizer=optimizer,
                         metrics=['accuracy'])
        self.d_B.compile(loss='mse',
                         optimizer=optimizer,
                         metrics=['accuracy'])

        # Builds the Generators
        self.g_AB = self.build_generator()
        self.g_BA = self.build_generator()

        # Input images from both domains
        img_A = Input(shape=self.img_shape)
        img_B = Input(shape=self.img_shape)

        # Translates images to the other domain
        fake_B = self.g_AB(img_A)
        fake_A = self.g_BA(img_B)

        # Translates images back to the original domain
        reconstr_A = self.g_BA(fake_B)
        reconstr_B = self.g_AB(fake_A)

        # Identity mapping of images
        img_A_id = self.g_BA(img_A)
        img_B_id = self.g_AB(img_B)

        # For the combined model, we train only the Generators
        self.d_A.trainable = False
        self.d_B.trainable = False

        # Discriminators determine the validity of translated images
        valid_A = self.d_A(fake_A)
        valid_B = self.d_B(fake_B)

        # Combined model trains the Generators to fool the Discriminators
        self.combined = Model(inputs=[img_A, img_B],
                              outputs=[valid_A, valid_B,
                                       reconstr_A, reconstr_B,
                                       img_A_id, img_B_id])
        self.combined.compile(loss=['mse', 'mse',
                                    'mae', 'mae',
                                    'mae', 'mae'],
                              loss_weights=[1, 1,
                                            self.lambda_cycle, self.lambda_cycle,
                                            self.lambda_id, self.lambda_id],
                              optimizer=optimizer)
One last thing to clarify from the preceding code: the outputs from the combined model come in lists of six. This is because we always get validities (from the Discriminator), reconstruction, and identity losses—one for A-B-A and one for the B-A-B cycle—hence six. The first two are squared errors, and the rest are mean absolute errors. The relative weights are influenced by the lambda factors described earlier.
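As a small worked example of how the six outputs are weighted, the sketch below reproduces the weighted sum implied by `loss_weights`. The individual loss values are made up for illustration, and `combined_loss` is a hypothetical helper, not part of the chapter's code.

```python
def combined_loss(losses, lambda_cycle=10.0, lambda_id=9.0):
    """Weighted sum mirroring loss_weights=[1, 1, l_cyc, l_cyc, l_id, l_id]."""
    weights = [1.0, 1.0, lambda_cycle, lambda_cycle, lambda_id, lambda_id]
    if len(losses) != len(weights):
        raise ValueError("expected six losses: 2 validity, 2 cycle, 2 identity")
    return sum(w * loss for w, loss in zip(weights, losses))

# Made-up per-output losses, in the same order as the combined model's outputs:
# [valid_A, valid_B, reconstr_A, reconstr_B, img_A_id, img_B_id]
example_losses = [0.5, 0.5, 0.1, 0.1, 0.05, 0.05]
total = combined_loss(example_losses)   # 1.0 + 2.0 + 0.9
```

Here lambda_id is 0.9 × lambda_cycle = 9.0, as in the class definition. Even small reconstruction and identity errors contribute as much as the adversarial terms once the weights are applied.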
Next, we build the Generator code in listing 9.4, which uses the skip connections as we described in section 9.5.2. This is the U-Net architecture. This architecture is simpler to write than the ResNet architecture, which some implementations use. Within our Generator function we first define the helper functions:
Instance normalization is similar to the batch normalization in chapter 4, except that instead of normalizing based on information from the entire batch, we normalize each feature map within each channel separately. Instance normalization often results in better-quality images for tasks such as style transfer or image-to-image translation—just what we need for the CycleGAN!
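To see the difference between the two normalizations, here is a minimal NumPy sketch. It omits the learnable scale and shift parameters that real layers typically have, and the function names are hypothetical; only the choice of axes over which statistics are computed is being illustrated.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) feature map over height and width.

    x has shape (N, H, W, C).
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalize each channel using statistics pooled over the whole batch."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 3))
x[1] = x[1] * 10 + 5                 # second image is brighter and higher contrast

inst = instance_norm(x)              # each image normalized on its own
batch = batch_norm(x)                # images still differ from one another
```

With instance normalization, every image ends up with zero mean and unit variance per channel regardless of what else is in the batch, so the contrast of one image cannot leak into another.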
Here, transposed convolution is—some argue—a more correct term. However, just think of it as the opposite of convolution, or deconvolution.
In step 2d, we’re using a simple UpSampling2D, which is not a learned parameter, but rather uses the nearest neighbors interpolation.
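Nearest-neighbor upsampling has no weights to learn: each pixel is simply repeated. A minimal NumPy sketch of the idea for a single (H, W, C) image follows; `upsample_nearest` is a hypothetical helper used for illustration, not the Keras layer itself.

```python
import numpy as np

def upsample_nearest(x, size=2):
    """Repeat every pixel `size` times along height and width. x: (H, W, C)."""
    return x.repeat(size, axis=0).repeat(size, axis=1)

x = np.array([[1, 2],
              [3, 4]])[..., np.newaxis]   # a 2 x 2 single-channel image
up = upsample_nearest(x)                  # a 4 x 4 single-channel image
```

Each source pixel becomes a 2 × 2 block in the output, so the resolution doubles while the convolution that follows is left to do the learned part of the work.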
    def build_generator(self):
        """U-Net Generator"""

        def conv2d(layer_input, filters, f_size=4):
            """Layers used during downsampling"""
            d = Conv2D(filters, kernel_size=f_size,
                       strides=2, padding='same')(layer_input)
            d = LeakyReLU(alpha=0.2)(d)
            d = InstanceNormalization()(d)
            return d

        def deconv2d(layer_input, skip_input, filters, f_size=4,
                     dropout_rate=0):
            """Layers used during upsampling"""
            u = UpSampling2D(size=2)(layer_input)
            u = Conv2D(filters, kernel_size=f_size, strides=1,
                       padding='same', activation='relu')(u)
            if dropout_rate:
                u = Dropout(dropout_rate)(u)
            u = InstanceNormalization()(u)
            # Skip connection: concatenate with the matching encoder output
            u = Concatenate()([u, skip_input])
            return u

        # Image input
        d0 = Input(shape=self.img_shape)

        # Downsampling
        d1 = conv2d(d0, self.gf)
        d2 = conv2d(d1, self.gf * 2)
        d3 = conv2d(d2, self.gf * 4)
        d4 = conv2d(d3, self.gf * 8)

        # Upsampling
        u1 = deconv2d(d4, d3, self.gf * 4)
        u2 = deconv2d(u1, d2, self.gf * 2)
        u3 = deconv2d(u2, d1, self.gf)

        u4 = UpSampling2D(size=2)(u3)
        output_img = Conv2D(self.channels, kernel_size=4,
                            strides=1, padding='same', activation='tanh')(u4)

        return Model(d0, output_img)
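It can help to trace the tensor shapes through the Generator by hand. The following sketch does only the shape arithmetic for the listing above, with no Keras involved; `trace_unet_shapes` is a hypothetical helper, and it assumes an input side that is divisible by 16 so that every stride-2 step halves the resolution exactly.

```python
def trace_unet_shapes(img_side=128, channels=3, gf=32):
    """Return the (height, width, channels) of each named tensor in the U-Net."""
    shapes = {"d0": (img_side, img_side, channels)}

    # Encoder: each stride-2 convolution halves the resolution.
    side = img_side
    for i, mult in enumerate([1, 2, 4, 8], start=1):
        side //= 2
        shapes[f"d{i}"] = (side, side, gf * mult)

    # Decoder: upsample x2, convolve to `filters` channels, then concatenate
    # the skip input, which adds the skip tensor's channels.
    for i, (mult, skip) in enumerate([(4, "d3"), (2, "d2"), (1, "d1")], start=1):
        side *= 2
        skip_channels = shapes[skip][2]
        shapes[f"u{i}"] = (side, side, gf * mult + skip_channels)

    # Final upsampling keeps the channel count; the last convolution maps
    # back to the image channels.
    side *= 2
    shapes["u4"] = (side, side, shapes["u3"][2])
    shapes["output_img"] = (side, side, channels)
    return shapes

shapes = trace_unet_shapes()
```

Note how each decoder tensor has twice the channels that its convolution produces, because the skip connection concatenates the encoder tensor of the same resolution.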
Now for the Discriminator method, which uses a helper function that creates layers formed of 2D convolutions, LeakyReLU, and optionally, InstanceNormalization.
We apply these layers the following way, as shown in listing 9.5:
    def build_discriminator(self):

        def d_layer(layer_input, filters, f_size=4, normalization=True):
            """Discriminator layer"""
            d = Conv2D(filters, kernel_size=f_size,
                       strides=2, padding='same')(layer_input)
            d = LeakyReLU(alpha=0.2)(d)
            if normalization:
                d = InstanceNormalization()(d)
            return d

        img = Input(shape=self.img_shape)

        # Four stride-2 layers: the first one skips normalization
        d1 = d_layer(img, self.df, normalization=False)
        d2 = d_layer(d1, self.df * 2)
        d3 = d_layer(d2, self.df * 4)
        d4 = d_layer(d3, self.df * 8)

        # Single-channel PatchGAN output: one validity value per patch
        validity = Conv2D(1, kernel_size=4, strides=1, padding='same')(d4)

        return Model(img, validity)
With all networks written, now we will implement the method that creates our training loop. For the CycleGAN training algorithm, the details of each training iteration are as follows.
For each training iteration do
End for
The following listing implements this CycleGAN training algorithm.
    def train(self, epochs, batch_size=1, sample_interval=50):
        start_time = datetime.datetime.now()

        # Adversarial loss ground truths
        valid = np.ones((batch_size,) + self.disc_patch)
        fake = np.zeros((batch_size,) + self.disc_patch)

        for epoch in range(epochs):
            for batch_i, (imgs_A, imgs_B) in enumerate(
                    self.data_loader.load_batch(batch_size)):

                # Translates images to the opposite domain
                fake_B = self.g_AB.predict(imgs_A)
                fake_A = self.g_BA.predict(imgs_B)

                # Trains the Discriminators (original images = real /
                # translated = fake)
                dA_loss_real = self.d_A.train_on_batch(imgs_A, valid)
                dA_loss_fake = self.d_A.train_on_batch(fake_A, fake)
                dA_loss = 0.5 * np.add(dA_loss_real, dA_loss_fake)

                dB_loss_real = self.d_B.train_on_batch(imgs_B, valid)
                dB_loss_fake = self.d_B.train_on_batch(fake_B, fake)
                dB_loss = 0.5 * np.add(dB_loss_real, dB_loss_fake)

                # Total Discriminator loss
                d_loss = 0.5 * np.add(dA_loss, dB_loss)

                # Trains the Generators
                g_loss = self.combined.train_on_batch([imgs_A, imgs_B],
                                                      [valid, valid,
                                                       imgs_A, imgs_B,
                                                       imgs_A, imgs_B])

                # If at the sample interval, saves generated image samples
                if batch_i % sample_interval == 0:
                    self.sample_images(epoch, batch_i)
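Two details of the training loop are easy to miss: the ground-truth labels have the shape of the PatchGAN output rather than a single number per image, and the Discriminator losses are averaged element-wise because a model compiled with an accuracy metric returns a [loss, accuracy] pair from `train_on_batch`. The sketch below illustrates both with made-up numbers and no Keras involved.

```python
import numpy as np

batch_size = 2
disc_patch = (8, 8, 1)                            # 128 / 2**4 = 8 patches per side

# One label per patch, not one per image.
valid = np.ones((batch_size,) + disc_patch)       # targets for real images
fake = np.zeros((batch_size,) + disc_patch)       # targets for translated images

# Made-up [loss, accuracy] pairs standing in for train_on_batch return values.
d_loss_real = [0.30, 0.80]
d_loss_fake = [0.50, 0.60]
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)   # element-wise average

# With 'mse' as the loss, an undecided Discriminator that outputs 0.5 for
# every patch is penalized equally against either label.
undecided = np.full(valid.shape, 0.5)
mse_vs_valid = np.mean((undecided - valid) ** 2)
mse_vs_fake = np.mean((undecided - fake) ** 2)
```

Averaging the real and fake results gives one loss and one accuracy per Discriminator, which is convenient for logging progress during training.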
We have written all of this complicated code and are now ready to instantiate a CycleGAN object and look at some results, from the sampled images:
gan = CycleGAN()
gan.train(epochs=100, batch_size=64, sample_interval=10)
Figure 9.7 shows some results of our hard work.
When you run this code and look at the results, we hope you will be as impressed as we were. Because of the absolutely astonishing results, lots of researchers flocked to improve on the technique. This section details a CycleGAN extension and then discusses some CycleGAN applications.
“Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data” is a really neat extension to standard CycleGAN that injects latent space information during both translations. Presented at ICML 2018 in Stockholm, Augmented CycleGAN gives us extra variables that drive the generative process.[10] In the same way that we have used latent space in Conditional GANs’ case, we can use it in the CycleGAN setting over and above what CycleGAN already does.
See “Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation,” by Ehsan Hosseini-Asl, 2019, https://arxiv.org/pdf/1807.00374.pdf.
For example, if we have an outline of a shoe in the A domain, we can generate a sample in the B domain, where the same type of shoe is blue. In traditional CycleGAN’s case, it would always be blue. But now, with the latent variables at our disposal, it can be orange, yellow, or whatever we choose.
This is also a useful framework to think about the limitations of the original CycleGAN: because we are not given any extra seeding parameters (such as an extra latent vector z), we cannot control or alter what comes out the other end. If from a particular handbag outline we get an image that is orange, it will always be orange. Augmented CycleGAN gives us more control over the outcomes, as shown in figure 9.8.
(Source: “Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data,” by Amjad Almahairi et al., 2018, http://arxiv.org/abs/1802.10151.)
Many CycleGAN (or CycleGAN-inspired) applications have been proposed in the short time it has been around. They usually revolve around creating simulated virtual environments and subsequently making them photorealistic. For example, imagine you need more training data for a self-driving car company: just simulate it in Unity or a GTA 5 graphics engine and then use CycleGAN to translate the data.
This works especially well if you need to have particular risk situations that are expensive or time-consuming to re-create (for example, car crashes, or fire trucks speeding to reach a destination), but you need them in your dataset. For a self-driving car company, this could be extremely useful to balance the dataset with at-risk situations, which are rare but in which correct behavior is all the more important.
One example of this kind of framework is Cycle Consistent Adversarial Domain Adaptation (CyCADA).[11] Unfortunately, a full explanation of the way it works is beyond the scope of this chapter. This is because there are many more such frameworks: some even experiment with CycleGAN in language, music, or other forms of domain adaptation. To give you a sense of the complexity, figure 9.9 shows the architecture and design of CyCADA.
See “CyCADA: Cycle-Consistent Adversarial Domain Adaptation,” by Judy Hoffman et al., 2017, https://arxiv.org/pdf/1711.03213.pdf.