CHAPTER 6

Federated Transfer Learning

We have discussed horizontal federated learning (HFL) and vertical federated learning (VFL) in Chapters 4 and 5, respectively. HFL requires all participating parties to share the same feature space, while VFL requires them to share the same sample space. In practice, however, we often face situations in which there are not enough shared features or samples among the participating parties. In those cases, one can still build a federated learning model by combining federated learning with transfer learning, which transfers knowledge among the parties to achieve better performance. We refer to the combination of federated learning and transfer learning as Federated Transfer Learning (FTL). In this chapter, we provide a formal definition of FTL and discuss the differences between FTL and traditional transfer learning. We then introduce a secure FTL framework proposed in Liu et al. [2019], and conclude this chapter with a summary of the challenges and open issues.

6.1    HETEROGENEOUS FEDERATED LEARNING

Both HFL and VFL require that all participants share either the same feature space or the same sample space in order to build an effective shared machine learning (ML) model. In more practical scenarios, however, the datasets maintained by the participants may be highly heterogeneous in one way or another.

•  Datasets may share only a handful of samples and features.

•  Distributions among those datasets could be quite different.

•  The size of those datasets could vary greatly.

•  Some participants may have data with no labels or only limited labels.

To address these issues, federated learning can be combined with transfer learning techniques [Pan and Yang, 2010] to enable a broader range of businesses and applications that have only small data (few overlapping samples and features) and weak supervision (few labels) to build effective and accurate ML models while complying with data privacy and security laws [Yang et al., 2019, Liu et al., 2019]. We refer to this combination of federated learning and transfer learning as FTL, which deals with problems that exceed the scope of the existing HFL and VFL settings.

6.2    FEDERATED TRANSFER LEARNING

Transfer learning is a learning technique that provides solutions for cross-domain knowledge transfer. In many applications, we only have a small amount of labeled data or weak supervision, so that ML models cannot be built reliably [Pan and Yang, 2010]. In such situations, we can still build high-performance ML models by leveraging and adapting models from similar tasks or domains. In recent years, there has been an increasing number of research works on applying transfer learning to various fields, ranging from image classification [Zhu et al., 2011] to natural language understanding and sentiment analysis [Li et al., 2017, Pan et al., 2010].

The essence of transfer learning is to find the invariant between a resource-rich source domain and a resource-scarce target domain, and to exploit that invariant to transfer knowledge from the source domain to the target domain. Based on the approaches used to conduct transfer learning, Pan and Yang [2010] divide transfer learning into three main categories: (i) instance-based transfer, (ii) feature-based transfer, and (iii) model-based transfer. FTL extends traditional transfer learning to the privacy-preserving distributed machine learning (DML) paradigm. Here, we describe how these three categories of transfer learning techniques can be applied to HFL and VFL, respectively.

•  Instance-based FTL. For HFL, the data of the participating parties are typically drawn from different distributions, which may lead to poor performance of ML models trained on those data. Participating parties can selectively pick or re-weight training data samples to mitigate the distribution differences so that the objective loss function can be minimized more effectively. For VFL, participating parties may have quite different business objectives. Thus, the aligned samples and some of their features may have a negative impact on federated transfer learning, an effect referred to as negative transfer [Pan and Yang, 2010]. In this scenario, participating parties can selectively choose the features and samples to use in order to avoid negative transfer.

•  Feature-based FTL. Participating parties collaboratively learn a common feature representation space, in which the distribution and semantic differences among the feature representations transformed from raw data can be reduced so that knowledge becomes transferable across different domains. For HFL, the common feature representation space can be learned by minimizing the maximum mean discrepancy (MMD) [Pan et al., 2009] among samples of the participating parties, as sketched in the example after this list. For VFL, the common feature representation space can be learned by minimizing the distance between the representations of aligned samples belonging to different parties.

•  Model-based FTL. Participating parties collaboratively learn shared models that can benefit transfer learning. Alternatively, participating parties can utilize pre-trained models as the whole or part of the initial model for a federated learning task. HFL is itself a kind of model-based FTL, since during training a shared global model is learned from the data of all parties, and that shared global model serves as a pre-trained model to be fine-tuned by each party in each communication round [McMahan et al., 2016a]. For VFL, predictive models can be learned from aligned samples for inferring missing features and labels (i.e., the blank spaces in Figure 1.4). The enlarged training set can then be used to train a more accurate shared model.
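To make the feature-based variant concrete, the following minimal sketch computes an empirical maximum mean discrepancy (MMD) between the hidden representations of two parties, a quantity that could be added to each party's training loss to align feature distributions. It is an illustration only; the function names and the Gaussian-kernel bandwidth `sigma` are our own assumptions, not part of any framework described in this chapter.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of x and rows of y.
    sq_dists = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2(rep_a, rep_b, sigma=1.0):
    # Biased empirical estimate of the squared MMD between two sets of
    # hidden representations (one matrix of representations per party).
    k_aa = gaussian_kernel(rep_a, rep_a, sigma)
    k_bb = gaussian_kernel(rep_b, rep_b, sigma)
    k_ab = gaussian_kernel(rep_a, rep_b, sigma)
    return k_aa.mean() + k_bb.mean() - 2 * k_ab.mean()

# Toy usage: two parties' hidden representations drawn from shifted distributions.
rng = np.random.default_rng(0)
u_a = rng.normal(0.0, 1.0, size=(100, 16))   # party A representations
u_b = rng.normal(0.5, 1.0, size=(80, 16))    # party B representations
print("empirical squared MMD:", mmd2(u_a, u_b))
```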

Formally, FTL aims to provide solutions for situations when:

$$X_i \neq X_j, \quad Y_i \neq Y_j, \quad I_i \neq I_j, \quad \forall\, D_i, D_j,\ i \neq j, \tag{6.1}$$

where X_i and Y_i denote the feature space and the label space of the i-th party, respectively; I_i stands for the sample space; and the matrix D_i represents the dataset held by the i-th party [Yang et al., 2019]. The objective is to predict labels for newly incoming samples or existing unlabeled samples as accurately as possible.

In Section 6.3, we will introduce a secure feature-based FTL framework proposed by Liu et al. [2019] that helps predict labels for the target domain by exploiting knowledge transferred from the source domain.

From the technical perspective, FTL differs from traditional transfer learning mainly in the following two ways.

•  FTL builds models based on data distributed among multiple parties, and the data belonging to each party cannot be gathered together or exposed to other parties. Traditional transfer learning has no such constraint.

•  FTL requires the preservation of user privacy and the protection of data (and model) security, which is not a significant concern in traditional transfer learning.

FTL brings traditional transfer learning into the privacy-preserving DML paradigm. Therefore, we should define the security that an FTL system must guarantee.

Definition 6.1 Security definition of an FTL system. An FTL system typically involves two parties, namely the source domain party and the target domain party. A multi-party FTL system can be regarded as a combination of multiple two-party FTL subsystems. It is assumed that both parties are honest-but-curious. That is, all parties in the federation follow the federation protocols and rules, but they will try to deduce information from the data they receive. Consider a threat model with a semi-honest adversary who can corrupt at most one of the two parties of a two-party FTL system. For a protocol P performing (OA, OB) = P(IA, IB), where OA and OB are party A’s and party B’s respective outputs, and IA and IB are their respective inputs, P is secure against party A if there exists an infinite number of (I′B, O′B) pairs such that (OA, O′B) = P(IA, I′B). Such a security definition has been adopted in Du et al. [2004]. It provides a practical way to control information disclosure, as compared to complete zero-knowledge security.

6.3    THE FTL FRAMEWORK

In this section, we introduce a secure feature-based FTL framework proposed by Liu et al. [2019]. Figure 6.1 illustrates this FTL framework in which a predictive model learned from feature representations of aligned samples belonging to party A and party B is utilized to predict labels for unlabeled samples of party B.

Figure 6.1: Illustration of FTL [Yang et al., 2019]. A predictive model learned from feature representations of aligned samples belonging to party A and party B is utilized to predict labels for unlabeled samples of party B.

Consider a source domain party A with a labeled dataset D_A := {(x_i^A, y_i^A)}_{i=1}^{N_A}, where x_i^A ∈ R^a and y_i^A ∈ {+1, -1} is the i-th label, and a target domain party B with a dataset D_B := {x_j^B}_{j=1}^{N_B}, where x_j^B ∈ R^b. D_A and D_B are separately held by the two private parties and cannot be exposed to each other. We also assume that there exists a limited set of co-occurring samples D_AB := {(x_i^A, x_i^B)}_{i=1}^{N_AB} and a small set of labels for B’s data in party A: D_c := {(x_i^B, y_i^A)}_{i=1}^{N_c}, where N_c is the number of available target labels.

Without loss of generality, we assume that all the labels are in party A, but all the descriptions here can be adapted to the case where the labels exist in party B. One can find the commonly shared sample IDs in a privacy-preserving setting by masking data IDs with encryption techniques such as the RSA scheme. Here, we assume that A and B have already found, and both know, their commonly shared sample IDs. Given the above setting, the objective is for the two parties to collaboratively build a transfer learning model to predict labels for the target-domain party B as accurately as possible without exposing data to each other.

In recent years, DNNs have been widely adopted in transfer learning to find the implicit transfer mechanism [Oquab et al., 2014]. Here, we explore a general scenario in which the hidden representations of A and B are produced by two neural networks, u_i^A = Net^A(x_i^A) and u_j^B = Net^B(x_j^B), where u^A ∈ R^{N_A×d} and u^B ∈ R^{N_B×d}, and d is the dimension of the hidden representation layer. Figure 6.2 illustrates the architecture of the two neural networks.

Figure 6.2: The architecture of neural networks of source and target domains.

To label the data in the target domain, a general approach is to introduce a prediction function φ(u_j^B) = φ(u_1^A, y_1^A, …, u_{N_A}^A, y_{N_A}^A, u_j^B). For example, Shu et al. [2015] used a translator function, φ(u_j^B) = (1/N_A) Σ_{i=1}^{N_A} y_i^A u_i^A (u_j^B)^T. We can then write the training objective function using the available labeled dataset as:

$$\mathop{\mathrm{argmin}}_{\Theta^A,\,\Theta^B} \; L_1 = \sum_{i=1}^{N_c} \ell_1\!\left(y_i^A, \varphi(u_i^B)\right), \tag{6.2}$$

where Θ^A and Θ^B are the training parameters of Net^A and Net^B, respectively. Let L_A and L_B be the number of layers of Net^A and Net^B, respectively. Then, Θ^A = {θ_l^A}_{l=1}^{L_A} and Θ^B = {θ_l^B}_{l=1}^{L_B}, where θ_l^A and θ_l^B are the training parameters of the l-th layer. ℓ_1 denotes the loss function. For logistic loss, ℓ_1(y, φ) = log(1 + e^{-yφ}).
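As a concrete reference, the following numpy sketch implements the translator-style prediction function and the logistic loss ℓ_1 described above, in the plaintext (non-federated) setting. It is a minimal sketch under our own naming assumptions (`translator_phi`, `logistic_loss`); the actual FTL framework never evaluates these quantities on pooled raw data.

```python
import numpy as np

def translator_phi(u_a, y_a, u_b_j):
    # phi(u_j^B) = (1 / N_A) * sum_i y_i^A * (u_i^A . u_j^B)
    # u_a:   (N_A, d) hidden representations of party A
    # y_a:   (N_A,)   labels in {+1, -1} held by party A
    # u_b_j: (d,)     hidden representation of one sample of party B
    return np.mean(y_a * (u_a @ u_b_j))

def logistic_loss(y, phi):
    # l_1(y, phi) = log(1 + exp(-y * phi))
    return np.log1p(np.exp(-y * phi))

# Toy usage on random representations.
rng = np.random.default_rng(0)
u_a = rng.normal(size=(50, 8))
y_a = rng.choice([-1.0, 1.0], size=50)
u_b_j = rng.normal(size=8)
phi = translator_phi(u_a, y_a, u_b_j)
print(phi, logistic_loss(1.0, phi))
```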

In addition, we aim to minimize the alignment loss between A and B:

$$\mathop{\mathrm{argmin}}_{\Theta^A,\,\Theta^B} \; L_2 = \sum_{i=1}^{N_{AB}} \ell_2\!\left(u_i^A, u_i^B\right), \tag{6.3}$$

where ℓ_2 denotes the alignment loss, which can be represented, for example, as -u_i^A (u_i^B)^T or ||u_i^A - u_i^B||_F^2. For simplicity, we assume that it can be expressed in the form ℓ_2(u_i^A, u_i^B) = ℓ_2^A(u_i^A) + ℓ_2^B(u_i^B) + κ u_i^A (u_i^B)^T, where κ is a constant.

The final objective function is:

$$\mathop{\mathrm{argmin}}_{\Theta^A,\,\Theta^B} \; L = L_1 + \gamma L_2 + \frac{\lambda}{2}\left(L_3^A + L_3^B\right), \tag{6.4}$$

where γ and λ are the weight parameters, and L_3^A = Σ_{l=1}^{L_A} ||θ_l^A||_F^2 and L_3^B = Σ_{l=1}^{L_B} ||θ_l^B||_F^2 are the regularization terms.
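Putting the pieces together, the short sketch below evaluates the total objective of Equation (6.4) in plaintext, using the squared Frobenius (Euclidean) distance as the alignment loss ℓ_2. It reuses `translator_phi` and `logistic_loss` from the previous sketch; the weights `gamma` and `lam` (standing for γ and λ) and the toy index arguments are illustrative assumptions.

```python
def total_loss(u_a, y_a, u_b, labeled_idx, y_c, aligned_idx_a, aligned_idx_b,
               params_a, params_b, gamma=0.1, lam=0.01):
    # L_1: prediction loss on the N_c labeled overlapping samples of party B.
    l1 = sum(logistic_loss(y, translator_phi(u_a, y_a, u_b[j]))
             for j, y in zip(labeled_idx, y_c))
    # L_2: alignment loss over the N_AB co-occurring samples,
    # here chosen as the squared Euclidean distance between representations.
    l2 = sum(np.sum((u_a[i] - u_b[j]) ** 2)
             for i, j in zip(aligned_idx_a, aligned_idx_b))
    # L_3^A and L_3^B: Frobenius-norm regularizers over each party's layer weights.
    l3_a = sum(np.sum(theta ** 2) for theta in params_a)
    l3_b = sum(np.sum(theta ** 2) for theta in params_b)
    return l1 + gamma * l2 + 0.5 * lam * (l3_a + l3_b)
```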

The next step is to obtain the gradients for updating Θ^A and Θ^B through backpropagation. For i ∈ {A, B}, we have

$$\frac{\partial L}{\partial \theta_l^i} = \frac{\partial L_1}{\partial \theta_l^i} + \gamma \frac{\partial L_2}{\partial \theta_l^i} + \lambda\, \theta_l^i. \tag{6.5}$$

Under the condition that A and B must not expose their raw data, privacy-preserving approaches need to be developed to compute the loss in Equation (6.4) and the gradients in Equation (6.5). We describe, at a high level, two secure federated transfer learning approaches for computing Equations (6.4) and (6.5). One is based on homomorphic encryption [Acar et al., 2018] and the other is based on secret sharing. In both approaches, we adopt a second-order Taylor approximation for computing (6.4) and (6.5).
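The second-order Taylor approximation mentioned here replaces the logistic loss with a polynomial that homomorphic encryption and secret sharing can evaluate using only additions and multiplications. Expanding ℓ_1(y, φ) = log(1 + e^{-yφ}) around φ = 0 gives ℓ_1 ≈ log 2 - (1/2) y φ + (1/8) y² φ². The sketch below simply compares the exact loss with this approximation; it illustrates the idea and is not code from the framework.

```python
import numpy as np

def logistic_loss_exact(y, phi):
    return np.log1p(np.exp(-y * phi))

def logistic_loss_taylor2(y, phi):
    # Second-order Taylor expansion of log(1 + exp(-y*phi)) around phi = 0:
    # log(2) - (1/2)*y*phi + (1/8)*(y*phi)**2
    return np.log(2.0) - 0.5 * y * phi + 0.125 * (y * phi) ** 2

for phi in [-1.0, -0.3, 0.0, 0.3, 1.0]:
    print(phi, logistic_loss_exact(1.0, phi), logistic_loss_taylor2(1.0, phi))
```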

6.3.1    ADDITIVELY HOMOMORPHIC ENCRYPTION

Additively homomorphic encryption [Acar et al., 2018] and polynomial approximations have been widely used for privacy-preserving ML. The trade-offs between efficiency and privacy introduced by such approximations have been discussed in detail in Aono et al. [2016], Kim et al. [2018], and Phong et al. [2018]. Applying Equations (6.4) and (6.5) and additively homomorphic encryption (denoted as ⟦·⟧, see also Section 2.4.2), we obtain the privacy-preserving loss function and the corresponding gradients for the two domains as:

$$\llbracket L \rrbracket = \sum_{i=1}^{N_c} \Big\llbracket \ell_1\!\left(y_i^A, \varphi(u_i^B)\right) \Big\rrbracket + \gamma \sum_{i=1}^{N_{AB}} \Big\llbracket \ell_2\!\left(u_i^A, u_i^B\right) \Big\rrbracket + \Big\llbracket \tfrac{\lambda}{2} L_3^A \Big\rrbracket + \Big\llbracket \tfrac{\lambda}{2} L_3^B \Big\rrbracket, \tag{6.6}$$

$$\Big\llbracket \frac{\partial L}{\partial \theta_l^B} \Big\rrbracket = \Big\llbracket \frac{\partial L_1}{\partial \theta_l^B} \Big\rrbracket + \gamma \Big\llbracket \frac{\partial L_2}{\partial \theta_l^B} \Big\rrbracket + \big\llbracket \lambda\, \theta_l^B \big\rrbracket, \tag{6.7}$$

$$\Big\llbracket \frac{\partial L}{\partial \theta_l^A} \Big\rrbracket = \Big\llbracket \frac{\partial L_1}{\partial \theta_l^A} \Big\rrbracket + \gamma \Big\llbracket \frac{\partial L_2}{\partial \theta_l^A} \Big\rrbracket + \big\llbracket \lambda\, \theta_l^A \big\rrbracket. \tag{6.8}$$

Let ⟦·⟧_A and ⟦·⟧_B denote homomorphic encryption under the public keys of A and B, respectively. Party A computes and encrypts a set of intermediate components (denoted {⟦h_k^A⟧_A}) for calculating ⟦∂L/∂θ_l^B⟧_A, and party B computes and encrypts a set of intermediate components (denoted {⟦h_k^B⟧_B}) for calculating ⟦∂L/∂θ_l^A⟧_B and the loss ⟦L⟧_B.

Note that we omit the mathematical details of the loss and gradient calculations here and focus on the collaboration between the participating parties. We refer interested readers to Liu et al. [2019] for a detailed explanation of the secure FTL framework.
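For readers unfamiliar with additively homomorphic encryption, the snippet below illustrates the two properties the protocol relies on: adding two ciphertexts and multiplying a ciphertext by a plaintext scalar, both without decryption. It assumes the open-source python-paillier package (imported as `phe`); the package choice and the toy values are illustrative assumptions, not part of the framework in Liu et al. [2019].

```python
from phe import paillier  # pip install phe (python-paillier)

# Party B generates a key pair and shares only the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Party B encrypts two intermediate values and sends the ciphertexts to A.
c1 = public_key.encrypt(3.5)
c2 = public_key.encrypt(-1.25)

# Party A operates on ciphertexts without ever seeing the plaintexts:
# ciphertext + ciphertext, and plaintext scalar * ciphertext.
c_sum = c1 + c2          # encrypts 3.5 + (-1.25)
c_scaled = 0.5 * c1      # encrypts 0.5 * 3.5

# Only party B, who holds the private key, can decrypt the results.
print(private_key.decrypt(c_sum))     # 2.25
print(private_key.decrypt(c_scaled))  # 1.75
```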

6.3.2    THE FTL TRAINING PROCESS

With Equations (6.6), (6.7), and (6.8), we can now design a federated algorithm for training the FTL model. The training process contains the following steps.

•  Step 1: Party A and party B initialize and run their respective neural networks Net^A and Net^B locally to obtain the hidden representations u_i^A and u_i^B.

•  Step 2: Party A computes and encrypts a list of intermediate components, denoted as {⟦h_k^A⟧_A}, and sends them to B to assist with the calculation of the gradients ∂L/∂θ_l^B. Meanwhile, party B computes and encrypts a list of intermediate components, denoted as {⟦h_k^B⟧_B}, and sends them to A to assist with the calculation of the gradients ∂L/∂θ_l^A and the loss L.

•  Step 3: Based on {⟦h_k^B⟧_B} received, party A computes ⟦∂L/∂θ_l^A⟧_B and ⟦L⟧_B via Equations (6.8) and (6.6). Party A then creates a random mask m^A and adds it to ⟦∂L/∂θ_l^A⟧_B to obtain ⟦∂L/∂θ_l^A + m^A⟧_B. Party A sends ⟦∂L/∂θ_l^A + m^A⟧_B and ⟦L⟧_B to B. Based on {⟦h_k^A⟧_A} received, party B computes ⟦∂L/∂θ_l^B⟧_A via Equation (6.7). Party B then creates a random mask m^B and adds it to ⟦∂L/∂θ_l^B⟧_A to obtain ⟦∂L/∂θ_l^B + m^B⟧_A, which it sends to A.

•  Step 4: Party A decrypts ⟦∂L/∂θ_l^B + m^B⟧_A and sends ∂L/∂θ_l^B + m^B to B, while party B decrypts ⟦∂L/∂θ_l^A + m^A⟧_B and ⟦L⟧_B, and sends ∂L/∂θ_l^A + m^A and L to A.

•  Step 5: Party A and party B remove their random masks to obtain the gradients ∂L/∂θ_l^A and ∂L/∂θ_l^B, respectively. The two parties then update their respective models with the decrypted gradients.

•  Step 6: Party A sends a termination signal to B once the loss L converges. Otherwise, the process goes back to Step 1 to continue training.

Recently, there have been a large number of works discussing the potential risks of indirect privacy leakage through gradients [Bonawitz et al., 2016, Hitaj et al., 2017, McSherry, 2017, Phong et al., 2018, Shokri and Shmatikov, 2015]. To prevent the two parties from knowing each other’s gradients, A and B further mask their own gradients with an encrypted random value. A and B then exchange the encrypted masked gradients and loss and obtain the decrypted values. Here, the encryption step prevents a malicious third party from eavesdropping on the transmissions, while the masking step prevents A and B from learning each other’s exact gradient values.
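The following sketch captures the mask-then-decrypt exchange of Steps 3 to 5 for a single gradient vector, again assuming the python-paillier package used in the earlier snippet. The variable names and the use of a simple numpy vector in place of real layer gradients are illustrative assumptions.

```python
import numpy as np
from phe import paillier

# Party B's key pair: A computes its own gradient under B's public key.
pub_b, priv_b = paillier.generate_paillier_keypair(n_length=2048)

# Suppose party A has assembled its encrypted gradient [[dL/dtheta^A]]_B
# from the intermediate components received from B (toy values here).
grad_a_plain = np.array([0.12, -0.07, 0.30])
enc_grad_a = [pub_b.encrypt(g) for g in grad_a_plain]

# Step 3 (A): add a random mask m^A so B cannot see the true gradient.
mask_a = np.random.default_rng(0).normal(size=grad_a_plain.shape)
enc_masked = [c + pub_b.encrypt(m) for c, m in zip(enc_grad_a, mask_a)]

# Step 4 (B): decrypt the masked gradient and return it to A.
masked_plain = np.array([priv_b.decrypt(c) for c in enc_masked])

# Step 5 (A): remove the mask to recover the gradient and update the model.
recovered_grad = masked_plain - mask_a
print(np.allclose(recovered_grad, grad_a_plain))  # True (up to encoding precision)
```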

6.3.3    THE FTL PREDICTION PROCESS

Once the FTL model has been trained, it can be used to provide predictions for unlabeled data in party B. The prediction process for each unlabeled data point involves the following steps.

•  Step 1: Party B computes u_j^B with its trained neural network parameters Θ^B, and sends the encrypted ⟦u_j^B⟧_B to party A.

•  Step 2: Party A evaluates ⟦φ(u_j^B)⟧_B, masks the result with a random value m^A, and sends the encrypted and masked ⟦φ(u_j^B) + m^A⟧_B to B.

•  Step 3: Party B decrypts ⟦φ(u_j^B) + m^A⟧_B and sends φ(u_j^B) + m^A back to party A.

•  Step 4: Party A removes the mask to obtain φ(u_j^B) and the label y_j^B, and sends the label y_j^B to B.

Note that the only source of performance loss in the secure FTL process is the second-order Taylor approximation of the final loss function, rather than approximations at every nonlinear activation layer of the neural networks [Hesamifard et al., 2017]. The computations inside the networks are unaffected. As demonstrated in Liu et al. [2019], the errors in loss and gradient calculations, as well as the loss in accuracy caused by this approach, are minimal. Therefore, the approach is scalable and flexible with respect to changes in the neural network structures.
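The prediction exchange above can be made concrete with the same python-paillier toolkit used earlier: party A evaluates the translator function directly on the ciphertexts it receives, since that only needs ciphertext additions and plaintext-scalar multiplications. The names, the toy sizes, and the final sign thresholding used to turn the score into a label are our own illustrative assumptions.

```python
import numpy as np
from phe import paillier

pub_b, priv_b = paillier.generate_paillier_keypair(n_length=2048)

# Step 1 (B): encrypt the hidden representation of one unlabeled sample.
u_b_j = np.array([0.2, -0.5, 0.1, 0.7])
enc_u_b_j = [pub_b.encrypt(v) for v in u_b_j]

# Party A holds its own representations and labels in plaintext, so the
# translator weights w = (1/N_A) * sum_i y_i^A u_i^A are plaintext for A.
rng = np.random.default_rng(1)
u_a = rng.normal(size=(20, 4))
y_a = rng.choice([-1.0, 1.0], size=20)
w = (y_a[:, None] * u_a).mean(axis=0)

# Step 2 (A): evaluate [[phi(u_j^B)]]_B homomorphically (scalar * ciphertext,
# then sum) and mask the result before sending it to B.
enc_phi = sum(w_k * c_k for w_k, c_k in zip(w, enc_u_b_j))
mask_a = 0.42
enc_masked_phi = enc_phi + pub_b.encrypt(mask_a)

# Steps 3-4 (B then A): B decrypts and returns the masked score;
# A removes the mask and derives a label (sign thresholding assumed here).
phi = priv_b.decrypt(enc_masked_phi) - mask_a
label = 1 if phi >= 0 else -1
print(phi, label)
```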

6.3.4    SECURITY ANALYSIS

As demonstrated in Liu et al. [2019], both the FTL training process and the FTL prediction process are secure under our security definition (see Definition 6.1), provided that the underlying additively homomorphic encryption scheme is secure.

During training, raw data DA and DB, as well as the local models NetA and NetB are never exposed and only the encrypted hidden representations are exchanged. In each iteration, the only non-encrypted values party A and party B receive are the gradients of the model parameters, which are aggregated from all variables and masked by random numbers. At the end of the training process, each party (A or B) remains oblivious to the data structure of the other party and each obtains model parameters associated only with its own features. At inference time, the two parties need to collaborate in order to compute the prediction results.

Note that the protocol does not deal with a malicious party. If party A fakes its inputs and submits only one non-zero input, it may be able to tell the value of u_i^B at the position of that input. It still cannot tell x_i^B or Θ^B, and neither party will be able to obtain correct results.

6.3.5    SECRET SHARING-BASED FTL

Homomorphic encryption techniques are capable of providing a high level of security for the information or knowledge shared among parties, thereby protecting the privacy of the data and models belonging to each party. However, homomorphic encryption typically needs extensive computational resources and massive parallelization to scale, which makes it impractical in many applications that require real-time throughput.

An alternative secure protocol to homomorphic encryption is secret sharing. The biggest advantages of the secret sharing approach are that (i) there is no accuracy loss, and (ii) computation is much more efficient than with the homomorphic encryption approach. The drawback of the secret sharing approach is that one has to generate and store many triplets offline before the online computation.
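To illustrate what these precomputed triplets are used for, the sketch below performs one secure multiplication of two additively shared values using a Beaver triple, the building block that lets A and B jointly compute cross terms without revealing their inputs. The share splitting, the toy modulus, and the helper names are illustrative assumptions; production systems work over fixed-point encodings and let a trusted dealer or an offline protocol generate the triples.

```python
import random

P = 2**61 - 1  # toy prime modulus for additive secret sharing

def share(x):
    # Split x into two additive shares: x = x0 + x1 (mod P).
    x0 = random.randrange(P)
    return x0, (x - x0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_multiply(x_shares, y_shares, triple):
    # Given shares of x and y and a precomputed triple (a, b, c) with c = a*b,
    # compute shares of x*y. The parties open d = x - a and e = y - b.
    (a0, a1), (b0, b1), (c0, c1) = triple
    d = reconstruct((x_shares[0] - a0) % P, (x_shares[1] - a1) % P)
    e = reconstruct((y_shares[0] - b0) % P, (y_shares[1] - b1) % P)
    # Each party computes its share of x*y locally from the opened d and e.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1

# Offline phase: a triple (a, b, c = a*b) is generated and shared in advance.
a, b = random.randrange(P), random.randrange(P)
triple = (share(a), share(b), share(a * b % P))

# Online phase: multiply two secret values without revealing them.
x, y = 123456, 789
zx, zy = beaver_multiply(share(x), share(y), triple)
print(reconstruct(zx, zy) == (x * y) % P)  # True
```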

To facilitate the description of the secret sharing-based FTL algorithm, we rewrite Equations (6.6), (6.7), and (6.8) as follows:

$$L = L^A + L^B + L^{AB}, \tag{6.9}$$

$$\frac{\partial L}{\partial \theta_l^A} = \left(\frac{\partial L}{\partial \theta_l^A}\right)^{\!A} + \left(\frac{\partial L}{\partial \theta_l^A}\right)^{\!AB}, \tag{6.10}$$

$$\frac{\partial L}{\partial \theta_l^B} = \left(\frac{\partial L}{\partial \theta_l^B}\right)^{\!B} + \left(\frac{\partial L}{\partial \theta_l^B}\right)^{\!AB}, \tag{6.11}$$

where L^A and (∂L/∂θ_l^A)^A are computed solely by party A, and L^B and (∂L/∂θ_l^B)^B are computed solely by party B. The cross terms L^{AB}, (∂L/∂θ_l^A)^{AB}, and (∂L/∂θ_l^B)^{AB} are computed collaboratively by A and B through the secret sharing scheme.

The whole process of computing (6.9), (6.10), and (6.11) can be performed securely through secret sharing with the help of Beaver’s triples. The secret sharing-based FTL training process is summarized in the following steps.

•  Step 1: Party A and party B initialize and run their respective neural networks Net^A and Net^B locally to obtain the hidden representations u_i^A and u_i^B.

•  Step 2: Party A and party B collaboratively compute L^{AB} through secret sharing. Party A computes L^A and sends it to party B. Party B computes L^B and sends it to party A.

•  Step 3: Party A and party B individually reconstruct the loss L via Equation (6.9).

•  Step 4: Party A and party B collaboratively compute (∂L/∂θ_l^A)^{AB} and (∂L/∂θ_l^B)^{AB} through secret sharing.

•  Step 5: Party A computes its gradients via Equation (6.10) and updates its local model Θ^A, while at the same time party B computes its gradients via Equation (6.11) and updates its local model Θ^B.

•  Step 6: Party A sends a termination signal to B once the loss L converges. Otherwise, the process goes back to Step 1 to continue training.

After the training is completed, we proceed to the prediction phase. At a high level, the prediction process is quite simple. It involves the following two steps.

•  Step 1: Party A and party B run their trained neural networks Net^A and Net^B locally to obtain the hidden representations u_i^A and u_j^B.

•  Step 2: Based on u_i^A and u_j^B, party A and party B collaboratively reconstruct φ(u_j^B) through secret sharing and calculate the label y_j^B.

Note that in both the training and prediction processes, the only information that any party receives regarding any private value of the other party is a share of that private value, generated according to the secret sharing scheme. Therefore, no party is able to learn any information about the private values it is not supposed to learn.

6.4    CHALLENGES AND OUTLOOK

Traditional transfer learning is typically conducted in a sequential or centralized way. Sequential transfer learning [Ruder, 2019] means that transferable knowledge is first learned on a source task and then applied to a target domain to improve the performance of the target model. Sequential transfer learning is ubiquitous and effective in computer vision, where it is typically practiced in the form of models pretrained on large image datasets such as ImageNet [Deng et al., 2009]. It is also commonly used in natural language processing to encode language units (e.g., words, sentences, or documents) in the form of distributed representations. Centralized transfer learning means that the models and data involved in transfer learning are located in one place. Thus, traditional transfer learning is not applicable in many practical applications where data is scattered among multiple parties and its privacy is a major concern. FTL is a feasible and promising solution to address those issues.

Research work on incorporating transfer learning into federated learning framework is fast-growing. However, for practical applications, FTL still faces many challenges. We list three of them as follows.

•  We need to develop schemes to learn transferable knowledge in a way that captures well the invariance among participants. Different from sequential and centralized transfer learning, where the transferable knowledge is typically represented in one universal pre-trained model, the transferable knowledge in FTL is distributed among local models. Each participant has total control over designing and training its local model. A balance should be struck between this autonomy and the generalization performance of the FTL models.

•  We need to determine how to learn a representation of transferable knowledge in a distributed environment while preserving the privacy of the shared representation for all participants. Under the federated learning framework, the transferable knowledge is not only learned in a distributed manner, but is also typically not allowed to be exposed to any participant. Thus, we need to figure out precisely what each participant contributes to the shared representation in the federation and consider how to preserve the privacy of that shared representation.

•  We need to design efficient secure protocols that can be employed in federated transfer learning. FTL usually requires closer interactions among participants in terms of communication frequency and the size of the transferred data. Careful consideration should be taken when designing or choosing secure protocols in order to strike a balance between security and overhead.

There are certainly many other challenges waiting for researchers and engineers to address. We envision that, with the high practical value brought by FTL, more and more institutes and enterprises will invest resources and efforts into the research and implementation of FTL.
