CHAPTER 6

Federated Transfer Learning

We have discussed horizontal federated learning (HFL) and vertical federated learning (VFL) in Chapters 4 and 5, respectively. HFL requires all participating parties to share the same feature space, while VFL requires them to share the same sample space. In practice, however, we often face situations in which there are not enough shared features or samples among the participating parties. In those cases, one can still build a federated learning model by combining federated learning with transfer learning, which transfers knowledge among the parties to achieve better performance. We refer to the combination of federated learning and transfer learning as Federated Transfer Learning (FTL). In this chapter, we provide a formal definition of FTL and discuss the differences between FTL and traditional transfer learning. We then introduce a secure FTL framework proposed in Liu et al. [2019], and conclude this chapter with a summary of the challenges and open issues.

6.1    HETEROGENEOUS FEDERATED LEARNING

Both HFL and VFL require that all participants share either the same feature space or the same sample space in order to build an effective shared machine learning (ML) model. In more practical scenarios, however, the datasets maintained by the participants may be highly heterogeneous in one way or another.

•  Datasets may share only a handful of samples and features.

•  Distributions among those datasets could be quite different.

•  The size of those datasets could vary greatly.

•  Some participants may have data with no labels or only limited labels.

To address these issues, federated learning can be combined with transfer learning techniques [Pan and Yang, 2010] to enable a broader range of businesses and applications that have only small data (few overlapping samples and features) and weak supervision (few labels) to build effective and accurate ML models while complying with data privacy and security laws [Yang et al., 2019, Liu et al., 2019]. We refer to this combination of federated learning and transfer learning as FTL, which deals with problems that exceed the scope of the existing HFL and VFL settings.

6.2    FEDERATED TRANSFER LEARNING

Transfer learning is a learning technique that provides solutions for cross-domain knowledge transfer. In many applications, we only have a small amount of labeled data or weak supervision, so that ML models cannot be built reliably [Pan and Yang, 2010]. In such situations, we can still build high-performance ML models by leveraging and adapting models from similar tasks or domains. In recent years, there has been an increasing number of research works on applying transfer learning to various fields, ranging from image classification [Zhu et al., 2011] to natural language understanding and sentiment analysis [Li et al., 2017, Pan et al., 2010].

The essence of transfer learning is to find the invariant between a resource-rich source domain and a resource-scarce target domain, and to exploit that invariant to transfer knowledge from the source domain to the target domain. Based on the approaches used to conduct transfer learning, Pan and Yang [2010] divide transfer learning into three main categories: (i) instance-based transfer, (ii) feature-based transfer, and (iii) model-based transfer. FTL extends traditional transfer learning to the privacy-preserving distributed machine learning (DML) paradigm. Here, we describe how these three categories of transfer learning techniques can be applied to HFL and VFL, respectively.

•  Instance-based FTL. For HFL, the data of the participating parties are typically drawn from different distributions, which may lead to poor performance of ML models trained on those data. Participating parties can selectively pick or re-weight training data samples to mitigate the distribution differences so that the objective loss function can be minimized more effectively. For VFL, participating parties may have quite different business objectives. Thus, the aligned samples and some of their features may have a negative impact on federated transfer learning, an effect referred to as negative transfer [Pan and Yang, 2010]. In this scenario, participating parties can selectively choose the features and samples to use in order to avoid negative transfer.

•  Feature-based FTL. Participating parties collaboratively learn a common feature representation space, in which the distribution and semantic differences among the feature representations transformed from raw data can be reduced so that knowledge becomes transferable across different domains. For HFL, the common feature representation space can be learned by minimizing the maximum mean discrepancy (MMD) [Pan et al., 2009] among samples of the participating parties, as sketched in the example after this list. For VFL, the common feature representation space can be learned by minimizing the distance between the representations of aligned samples belonging to different parties.

•  Model-based FTL. Participating parties collaboratively learn shared models that can benefit transfer learning. Alternatively, participating parties can utilize pre-trained models as the whole or part of the initial model for a federated learning task. HFL is itself a kind of model-based FTL, since during training a shared global model is learned from the data of all parties, and that shared global model serves as a pre-trained model to be fine-tuned by each party in each communication round [McMahan et al., 2016a]. For VFL, predictive models can be learned from aligned samples for inferring missing features and labels (i.e., the blank spaces in Figure 1.4). The enlarged training set can then be used to train a more accurate shared model.
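To make the feature-based variant concrete, the following minimal sketch computes an empirical maximum mean discrepancy (MMD) between the hidden representations of two parties, a quantity that could be added to each party's training loss to align feature distributions. It is an illustration only; the function names and the Gaussian-kernel bandwidth `sigma` are our own assumptions, not part of any framework described in this chapter.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of x and rows of y.
    sq_dists = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2(rep_a, rep_b, sigma=1.0):
    # Biased empirical estimate of the squared MMD between two sets of
    # hidden representations (one matrix of representations per party).
    k_aa = gaussian_kernel(rep_a, rep_a, sigma)
    k_bb = gaussian_kernel(rep_b, rep_b, sigma)
    k_ab = gaussian_kernel(rep_a, rep_b, sigma)
    return k_aa.mean() + k_bb.mean() - 2 * k_ab.mean()

# Toy usage: two parties' hidden representations drawn from shifted distributions.
rng = np.random.default_rng(0)
u_a = rng.normal(0.0, 1.0, size=(100, 16))   # party A representations
u_b = rng.normal(0.5, 1.0, size=(80, 16))    # party B representations
print("empirical squared MMD:", mmd2(u_a, u_b))
```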

Formally, FTL aims to provide solutions for situations when:

$$X_i \neq X_j, \quad Y_i \neq Y_j, \quad I_i \neq I_j, \quad \forall\, D_i, D_j,\ i \neq j, \tag{6.1}$$

where X_i and Y_i denote the feature space and the label space of the i-th party, respectively; I_i stands for the sample space; and the matrix D_i represents the dataset held by the i-th party [Yang et al., 2019]. The objective is to predict labels for newly incoming samples or existing unlabeled samples as accurately as possible.

In Section 6.3, we will introduce a secure feature-based FTL framework proposed by Liu et al. [2019] that helps predict labels for the target domain by exploiting knowledge transferred from the source domain.

From the technical perspective, FTL differs from traditional transfer learning mainly in the following two ways.

•  FTL builds models based on data distributed among multiple parties, and the data belonging to each party cannot be gathered together or exposed to other parties. Traditional transfer learning has no such constraint.

•  FTL requires the preservation of user privacy and the protection of data (and model) security, which is not a significant concern in traditional transfer learning.

FTL brings traditional transfer learning into the privacy-preserving DML paradigm. Therefore, we should define the security that an FTL system must guarantee.

Definition 6.1 Security definition of an FTL system. An FTL system typically involves two parties, namely the source domain party and the target domain party. A multi-party FTL system can be regarded as a combination of multiple two-party FTL subsystems. It is assumed that both parties are honest-but-curious. That is, all parties in the federation follow the federation protocols and rules, but they will try to deduce information from the data they receive. Consider a threat model with a semi-honest adversary who can corrupt at most one of the two parties of a two-party FTL system. For a protocol P performing (OA, OB) = P(IA, IB), where OA and OB are party A’s and party B’s respective outputs, and IA and IB are their respective inputs, P is secure against party A if there exists an infinite number of (I′B, O′B) pairs such that (OA, O′B) = P(IA, I′B). Such a security definition has been adopted in Du et al. [2004]. It provides a practical way to control information disclosure, as compared to complete zero-knowledge security.

6.3    THE FTL FRAMEWORK

In this section, we introduce a secure feature-based FTL framework proposed by Liu et al. [2019]. Figure 6.1 illustrates this FTL framework in which a predictive model learned from feature representations of aligned samples belonging to party A and party B is utilized to predict labels for unlabeled samples of party B.

Figure 6.1: Illustration of FTL [Yang et al., 2019]. A predictive model learned from feature representations of aligned samples belonging to party A and party B is utilized to predict labels for unlabeled samples of party B.

Consider a source domain party A with a labeled dataset D_A := {(x_i^A, y_i^A)}_{i=1}^{N_A}, where x_i^A ∈ R^a and y_i^A ∈ {+1, -1} is the i-th label, and a target domain party B with a dataset D_B := {x_j^B}_{j=1}^{N_B}, where x_j^B ∈ R^b. D_A and D_B are separately held by the two private parties and cannot be exposed to each other. We also assume that there exists a limited set of co-occurring samples D_AB := {(x_i^A, x_i^B)}_{i=1}^{N_AB} and a small set of labels for B’s data in party A: D_c := {(x_i^B, y_i^A)}_{i=1}^{N_c}, where N_c is the number of available target labels.

Without loss of generality, we assume that all the labels are in party A, but all the descriptions here can be adapted to the case where the labels exist in party B. One can find the commonly shared sample IDs in a privacy-preserving setting by masking data IDs with encryption techniques such as the RSA scheme. Here, we assume that A and B have already found, and both know, their commonly shared sample IDs. Given the above setting, the objective is for the two parties to collaboratively build a transfer learning model to predict labels for the target-domain party B as accurately as possible without exposing data to each other.

In recent years, DNNs have been widely adopted in transfer learning to find the implicit transfer mechanism [Oquab et al., 2014]. Here, we explore a general scenario in which the hidden representations of A and B are produced by two neural networks, u_i^A = Net^A(x_i^A) and u_j^B = Net^B(x_j^B), where u^A ∈ R^{N_A×d} and u^B ∈ R^{N_B×d}, and d is the dimension of the hidden representation layer. Figure 6.2 illustrates the architecture of the two neural networks.

Figure 6.2: The architecture of neural networks of source and target domains.

To label the data in the target domain, a general approach is to introduce a prediction function φ(u_j^B) = φ(u_1^A, y_1^A, …, u_{N_A}^A, y_{N_A}^A, u_j^B). For example, Shu et al. [2015] used a translator function, φ(u_j^B) = (1/N_A) Σ_{i=1}^{N_A} y_i^A u_i^A (u_j^B)^T. We can then write the training objective function using the available labeled dataset as:

$$\mathop{\mathrm{argmin}}_{\Theta^A,\,\Theta^B} \; L_1 = \sum_{i=1}^{N_c} \ell_1\!\left(y_i^A, \varphi(u_i^B)\right), \tag{6.2}$$

where Θ^A and Θ^B are the training parameters of Net^A and Net^B, respectively. Let L_A and L_B be the number of layers of Net^A and Net^B, respectively. Then, Θ^A = {θ_l^A}_{l=1}^{L_A} and Θ^B = {θ_l^B}_{l=1}^{L_B}, where θ_l^A and θ_l^B are the training parameters of the l-th layer. ℓ_1 denotes the loss function. For logistic loss, ℓ_1(y, φ) = log(1 + e^{-yφ}).
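As a concrete reference, the following numpy sketch implements the translator-style prediction function and the logistic loss ℓ_1 described above, in the plaintext (non-federated) setting. It is a minimal sketch under our own naming assumptions (`translator_phi`, `logistic_loss`); the actual FTL framework never evaluates these quantities on pooled raw data.

```python
import numpy as np

def translator_phi(u_a, y_a, u_b_j):
    # phi(u_j^B) = (1 / N_A) * sum_i y_i^A * (u_i^A . u_j^B)
    # u_a:   (N_A, d) hidden representations of party A
    # y_a:   (N_A,)   labels in {+1, -1} held by party A
    # u_b_j: (d,)     hidden representation of one sample of party B
    return np.mean(y_a * (u_a @ u_b_j))

def logistic_loss(y, phi):
    # l_1(y, phi) = log(1 + exp(-y * phi))
    return np.log1p(np.exp(-y * phi))

# Toy usage on random representations.
rng = np.random.default_rng(0)
u_a = rng.normal(size=(50, 8))
y_a = rng.choice([-1.0, 1.0], size=50)
u_b_j = rng.normal(size=8)
phi = translator_phi(u_a, y_a, u_b_j)
print(phi, logistic_loss(1.0, phi))
```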

In addition, we aim to minimize the alignment loss between A and B:

$$\mathop{\mathrm{argmin}}_{\Theta^A,\,\Theta^B} \; L_2 = \sum_{i=1}^{N_{AB}} \ell_2\!\left(u_i^A, u_i^B\right), \tag{6.3}$$

where ℓ_2 denotes the alignment loss, which can be represented, for example, as -u_i^A (u_i^B)^T or ||u_i^A - u_i^B||_F^2. For simplicity, we assume that it can be expressed in the form ℓ_2(u_i^A, u_i^B) = ℓ_2^A(u_i^A) + ℓ_2^B(u_i^B) + κ u_i^A (u_i^B)^T, where κ is a constant.

The final objective function is:

$$\mathop{\mathrm{argmin}}_{\Theta^A,\,\Theta^B} \; L = L_1 + \gamma L_2 + \frac{\lambda}{2}\left(L_3^A + L_3^B\right), \tag{6.4}$$

where γ and λ are the weight parameters, and L_3^A = Σ_{l=1}^{L_A} ||θ_l^A||_F^2 and L_3^B = Σ_{l=1}^{L_B} ||θ_l^B||_F^2 are the regularization terms.
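Putting the pieces together, the short sketch below evaluates the total objective of Equation (6.4) in plaintext, using the squared Frobenius (Euclidean) distance as the alignment loss ℓ_2. It reuses `translator_phi` and `logistic_loss` from the previous sketch; the weights `gamma` and `lam` (standing for γ and λ) and the toy index arguments are illustrative assumptions.

```python
def total_loss(u_a, y_a, u_b, labeled_idx, y_c, aligned_idx_a, aligned_idx_b,
               params_a, params_b, gamma=0.1, lam=0.01):
    # L_1: prediction loss on the N_c labeled overlapping samples of party B.
    l1 = sum(logistic_loss(y, translator_phi(u_a, y_a, u_b[j]))
             for j, y in zip(labeled_idx, y_c))
    # L_2: alignment loss over the N_AB co-occurring samples,
    # here chosen as the squared Euclidean distance between representations.
    l2 = sum(np.sum((u_a[i] - u_b[j]) ** 2)
             for i, j in zip(aligned_idx_a, aligned_idx_b))
    # L_3^A and L_3^B: Frobenius-norm regularizers over each party's layer weights.
    l3_a = sum(np.sum(theta ** 2) for theta in params_a)
    l3_b = sum(np.sum(theta ** 2) for theta in params_b)
    return l1 + gamma * l2 + 0.5 * lam * (l3_a + l3_b)
```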

The next step is to obtain the gradients for updating Θ^A and Θ^B through backpropagation. For i ∈ {A, B}, we have

$$\frac{\partial L}{\partial \theta_l^i} = \frac{\partial L_1}{\partial \theta_l^i} + \gamma \frac{\partial L_2}{\partial \theta_l^i} + \lambda\, \theta_l^i. \tag{6.5}$$

Under the condition that A and B must not expose their raw data, privacy-preserving approaches need to be developed to compute the loss in Equation (6.4) and the gradients in Equation (6.5). We describe, at a high level, two secure federated transfer learning approaches for computing Equations (6.4) and (6.5). One is based on homomorphic encryption [Acar et al., 2018] and the other is based on secret sharing. In both approaches, we adopt a second-order Taylor approximation for computing (6.4) and (6.5).
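The second-order Taylor approximation mentioned here replaces the logistic loss with a polynomial that homomorphic encryption and secret sharing can evaluate using only additions and multiplications. Expanding ℓ_1(y, φ) = log(1 + e^{-yφ}) around φ = 0 gives ℓ_1 ≈ log 2 - (1/2) y φ + (1/8) y² φ². The sketch below simply compares the exact loss with this approximation; it illustrates the idea and is not code from the framework.

```python
import numpy as np

def logistic_loss_exact(y, phi):
    return np.log1p(np.exp(-y * phi))

def logistic_loss_taylor2(y, phi):
    # Second-order Taylor expansion of log(1 + exp(-y*phi)) around phi = 0:
    # log(2) - (1/2)*y*phi + (1/8)*(y*phi)**2
    return np.log(2.0) - 0.5 * y * phi + 0.125 * (y * phi) ** 2

for phi in [-1.0, -0.3, 0.0, 0.3, 1.0]:
    print(phi, logistic_loss_exact(1.0, phi), logistic_loss_taylor2(1.0, phi))
```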

6.3.1    ADDITIVELY HOMOMORPHIC ENCRYPTION

Additively homomorphic encryption [Acar et al., 2018] and polynomial approximations have been widely used for privacy-preserving ML. The trade-offs between efficiency and privacy introduced by such approximations have been discussed in detail in Aono et al. [2016], Kim et al. [2018], and Phong et al. [2018]. Applying Equations (6.4) and (6.5) and additively homomorphic encryption (denoted as ⟦·⟧, see also Section 2.4.2), we obtain the privacy-preserving loss function and the corresponding gradients for the two domains as:

$$\llbracket L \rrbracket = \sum_{i=1}^{N_c} \Big\llbracket \ell_1\!\left(y_i^A, \varphi(u_i^B)\right) \Big\rrbracket + \gamma \sum_{i=1}^{N_{AB}} \Big\llbracket \ell_2\!\left(u_i^A, u_i^B\right) \Big\rrbracket + \Big\llbracket \tfrac{\lambda}{2} L_3^A \Big\rrbracket + \Big\llbracket \tfrac{\lambda}{2} L_3^B \Big\rrbracket, \tag{6.6}$$

$$\Big\llbracket \frac{\partial L}{\partial \theta_l^B} \Big\rrbracket = \Big\llbracket \frac{\partial L_1}{\partial \theta_l^B} \Big\rrbracket + \gamma \Big\llbracket \frac{\partial L_2}{\partial \theta_l^B} \Big\rrbracket + \big\llbracket \lambda\, \theta_l^B \big\rrbracket, \tag{6.7}$$

$$\Big\llbracket \frac{\partial L}{\partial \theta_l^A} \Big\rrbracket = \Big\llbracket \frac{\partial L_1}{\partial \theta_l^A} \Big\rrbracket + \gamma \Big\llbracket \frac{\partial L_2}{\partial \theta_l^A} \Big\rrbracket + \big\llbracket \lambda\, \theta_l^A \big\rrbracket. \tag{6.8}$$

Let ⟦·⟧_A and ⟦·⟧_B denote homomorphic encryption under the public keys of A and B, respectively. Party A computes and encrypts a set of intermediate components (denoted {⟦h_k^A⟧_A}) for calculating ⟦∂L/∂θ_l^B⟧_A, and party B computes and encrypts a set of intermediate components (denoted {⟦h_k^B⟧_B}) for calculating ⟦∂L/∂θ_l^A⟧_B and the loss ⟦L⟧_B.

Note that we omit the mathematical details of the loss and gradient calculations here and focus on the collaboration between the participating parties. We refer interested readers to Liu et al. [2019] for a detailed explanation of the secure FTL framework.
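For readers unfamiliar with additively homomorphic encryption, the snippet below illustrates the two properties the protocol relies on: adding two ciphertexts and multiplying a ciphertext by a plaintext scalar, both without decryption. It assumes the open-source python-paillier package (imported as `phe`); the package choice and the toy values are illustrative assumptions, not part of the framework in Liu et al. [2019].

```python
from phe import paillier  # pip install phe (python-paillier)

# Party B generates a key pair and shares only the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Party B encrypts two intermediate values and sends the ciphertexts to A.
c1 = public_key.encrypt(3.5)
c2 = public_key.encrypt(-1.25)

# Party A operates on ciphertexts without ever seeing the plaintexts:
# ciphertext + ciphertext, and plaintext scalar * ciphertext.
c_sum = c1 + c2          # encrypts 3.5 + (-1.25)
c_scaled = 0.5 * c1      # encrypts 0.5 * 3.5

# Only party B, who holds the private key, can decrypt the results.
print(private_key.decrypt(c_sum))     # 2.25
print(private_key.decrypt(c_scaled))  # 1.75
```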

6.3.2    THE FTL TRAINING PROCESS

With Equations (6.6), (6.7), and (6.8), we can now design a federated algorithm for training the FTL model. The training process contains the following steps.

•  Step 1: Party A and party B initialize and run their respective neural networks Net^A and Net^B locally to obtain the hidden representations u_i^A and u_i^B.

•  Step 2: Party A computes and encrypts a list of intermediate components, denoted as {⟦h_k^A⟧_A}, and sends them to B to assist with the calculation of the gradients ∂L/∂θ_l^B. Meanwhile, party B computes and encrypts a list of intermediate components, denoted as {⟦h_k^B⟧_B}, and sends them to A to assist with the calculation of the gradients ∂L/∂θ_l^A and the loss L.

•  Step 3: Based on {⟦h_k^B⟧_B} received, party A computes ⟦∂L/∂θ_l^A⟧_B and ⟦L⟧_B via Equations (6.8) and (6.6). Party A then creates a random mask m^A and adds it to ⟦∂L/∂θ_l^A⟧_B to obtain ⟦∂L/∂θ_l^A + m^A⟧_B. Party A sends ⟦∂L/∂θ_l^A + m^A⟧_B and ⟦L⟧_B to B. Based on {⟦h_k^A⟧_A} received, party B computes ⟦∂L/∂θ_l^B⟧_A via Equation (6.7). Party B then creates a random mask m^B and adds it to ⟦∂L/∂θ_l^B⟧_A to obtain ⟦∂L/∂θ_l^B + m^B⟧_A, which it sends to A.

•  Step 4: Party A decrypts ⟦∂L/∂θ_l^B + m^B⟧_A and sends ∂L/∂θ_l^B + m^B to B, while party B decrypts ⟦∂L/∂θ_l^A + m^A⟧_B and ⟦L⟧_B, and sends ∂L/∂θ_l^A + m^A and L to A.

•  Step 5: Party A and party B remove their random masks to obtain the gradients ∂L/∂θ_l^A and ∂L/∂θ_l^B, respectively. The two parties then update their respective models with the decrypted gradients.

•  Step 6: Party A sends a termination signal to B once the loss L converges. Otherwise, the process goes back to Step 1 to continue training.

Recently, there have been a large number of works discussing the potential risks of indirect privacy leakage through gradients [Bonawitz et al., 2016, Hitaj et al., 2017, McSherry, 2017, Phong et al., 2018, Shokri and Shmatikov, 2015]. To prevent the two parties from knowing each other’s gradients, A and B further mask their own gradients with an encrypted random value. A and B then exchange the encrypted masked gradients and loss and obtain the decrypted values. Here, the encryption step prevents a malicious third party from eavesdropping on the transmissions, while the masking step prevents A and B from learning each other’s exact gradient values.
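The following sketch captures the mask-then-decrypt exchange of Steps 3 to 5 for a single gradient vector, again assuming the python-paillier package used in the earlier snippet. The variable names and the use of a simple numpy vector in place of real layer gradients are illustrative assumptions.

```python
import numpy as np
from phe import paillier

# Party B's key pair: A computes its own gradient under B's public key.
pub_b, priv_b = paillier.generate_paillier_keypair(n_length=2048)

# Suppose party A has assembled its encrypted gradient [[dL/dtheta^A]]_B
# from the intermediate components received from B (toy values here).
grad_a_plain = np.array([0.12, -0.07, 0.30])
enc_grad_a = [pub_b.encrypt(g) for g in grad_a_plain]

# Step 3 (A): add a random mask m^A so B cannot see the true gradient.
mask_a = np.random.default_rng(0).normal(size=grad_a_plain.shape)
enc_masked = [c + pub_b.encrypt(m) for c, m in zip(enc_grad_a, mask_a)]

# Step 4 (B): decrypt the masked gradient and return it to A.
masked_plain = np.array([priv_b.decrypt(c) for c in enc_masked])

# Step 5 (A): remove the mask to recover the gradient and update the model.
recovered_grad = masked_plain - mask_a
print(np.allclose(recovered_grad, grad_a_plain))  # True (up to encoding precision)
```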

6.3.3    THE FTL PREDICTION PROCESS

Once the FTL model has been trained, it can be used to provide predictions for unlabeled data in party B. The prediction process for each unlabeled data point involves the following steps.

•  Step 1: Party B computes u_j^B with its trained neural network parameters Θ^B, and sends the encrypted ⟦u_j^B⟧_B to party A.

•  Step 2: Party A evaluates ⟦φ(u_j^B)⟧_B, masks the result with a random value m^A, and sends the encrypted and masked ⟦φ(u_j^B) + m^A⟧_B to B.

•  Step 3: Party B decrypts ⟦φ(u_j^B) + m^A⟧_B and sends φ(u_j^B) + m^A back to party A.

•  Step 4: Party A removes the mask to obtain φ(u_j^B) and the label y_j^B, and sends the label y_j^B to B.

Note that the only source of performance loss in the secure FTL process is the second-order Taylor approximation of the final loss function, rather than approximations at every nonlinear activation layer of the neural networks [Hesamifard et al., 2017]. The computations inside the networks are unaffected. As demonstrated in Liu et al. [2019], the errors in loss and gradient calculations, as well as the loss in accuracy caused by this approach, are minimal. Therefore, the approach is scalable and flexible with respect to changes in the neural network structures.
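The prediction exchange above can be made concrete with the same python-paillier toolkit used earlier: party A evaluates the translator function directly on the ciphertexts it receives, since that only needs ciphertext additions and plaintext-scalar multiplications. The names, the toy sizes, and the final sign thresholding used to turn the score into a label are our own illustrative assumptions.

```python
import numpy as np
from phe import paillier

pub_b, priv_b = paillier.generate_paillier_keypair(n_length=2048)

# Step 1 (B): encrypt the hidden representation of one unlabeled sample.
u_b_j = np.array([0.2, -0.5, 0.1, 0.7])
enc_u_b_j = [pub_b.encrypt(v) for v in u_b_j]

# Party A holds its own representations and labels in plaintext, so the
# translator weights w = (1/N_A) * sum_i y_i^A u_i^A are plaintext for A.
rng = np.random.default_rng(1)
u_a = rng.normal(size=(20, 4))
y_a = rng.choice([-1.0, 1.0], size=20)
w = (y_a[:, None] * u_a).mean(axis=0)

# Step 2 (A): evaluate [[phi(u_j^B)]]_B homomorphically (scalar * ciphertext,
# then sum) and mask the result before sending it to B.
enc_phi = sum(w_k * c_k for w_k, c_k in zip(w, enc_u_b_j))
mask_a = 0.42
enc_masked_phi = enc_phi + pub_b.encrypt(mask_a)

# Steps 3-4 (B then A): B decrypts and returns the masked score;
# A removes the mask and derives a label (sign thresholding assumed here).
phi = priv_b.decrypt(enc_masked_phi) - mask_a
label = 1 if phi >= 0 else -1
print(phi, label)
```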

6.3.4    SECURITY ANALYSIS

As demonstrated in Liu et al. [2019], both the FTL training process and the FTL prediction process are secure under our security definition (see Definition 6.1), provided that the underlying additively homomorphic encryption scheme is secure.

During training, raw data DA and DB, as well as the local models NetA and NetB are never exposed and only the encrypted hidden representations are exchanged. In each iteration, the only non-encrypted values party A and party B receive are the gradients of the model parameters, which are aggregated from all variables and masked by random numbers. At the end of the training process, each party (A or B) remains oblivious to the data structure of the other party and each obtains model parameters associated only with its own features. At inference time, the two parties need to collaborate in order to compute the prediction results.

Note that the protocol does not deal with a malicious party. If party A fakes its inputs and submits only one non-zero input, it may be able to tell the value of u_i^B at the position of that input. It still cannot tell x_i^B or Θ^B, and neither party will be able to obtain correct results.

6.3.5    SECRET SHARING-BASED FTL

Homomorphic encryption techniques are capable of providing a high level of security for the information or knowledge shared among parties, thereby protecting the privacy of the data and models belonging to each party. However, homomorphic encryption typically needs extensive computational resources and massive parallelization to scale, which makes it impractical in many applications that require real-time throughput.

An alternative secure protocol to homomorphic encryption is secret sharing. The biggest advantages of the secret sharing approach are that (i) there is no accuracy loss, and (ii) computation is much more efficient than with the homomorphic encryption approach. The drawback of the secret sharing approach is that one has to generate and store many triplets offline before the online computation.
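To illustrate what these precomputed triplets are used for, the sketch below performs one secure multiplication of two additively shared values using a Beaver triple, the building block that lets A and B jointly compute cross terms without revealing their inputs. The share splitting, the toy modulus, and the helper names are illustrative assumptions; production systems work over fixed-point encodings and let a trusted dealer or an offline protocol generate the triples.

```python
import random

P = 2**61 - 1  # toy prime modulus for additive secret sharing

def share(x):
    # Split x into two additive shares: x = x0 + x1 (mod P).
    x0 = random.randrange(P)
    return x0, (x - x0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_multiply(x_shares, y_shares, triple):
    # Given shares of x and y and a precomputed triple (a, b, c) with c = a*b,
    # compute shares of x*y. The parties open d = x - a and e = y - b.
    (a0, a1), (b0, b1), (c0, c1) = triple
    d = reconstruct((x_shares[0] - a0) % P, (x_shares[1] - a1) % P)
    e = reconstruct((y_shares[0] - b0) % P, (y_shares[1] - b1) % P)
    # Each party computes its share of x*y locally from the opened d and e.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1

# Offline phase: a triple (a, b, c = a*b) is generated and shared in advance.
a, b = random.randrange(P), random.randrange(P)
triple = (share(a), share(b), share(a * b % P))

# Online phase: multiply two secret values without revealing them.
x, y = 123456, 789
zx, zy = beaver_multiply(share(x), share(y), triple)
print(reconstruct(zx, zy) == (x * y) % P)  # True
```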

To facilitate the description of the secret sharing-based FTL algorithm, we rewrite Equations (6.6), (6.7), and (6.8) as follows:

$$L = L^A + L^B + L^{AB}, \tag{6.9}$$

$$\frac{\partial L}{\partial \theta_l^A} = \left(\frac{\partial L}{\partial \theta_l^A}\right)^{\!A} + \left(\frac{\partial L}{\partial \theta_l^A}\right)^{\!AB}, \tag{6.10}$$

$$\frac{\partial L}{\partial \theta_l^B} = \left(\frac{\partial L}{\partial \theta_l^B}\right)^{\!B} + \left(\frac{\partial L}{\partial \theta_l^B}\right)^{\!AB}, \tag{6.11}$$

where L^A and (∂L/∂θ_l^A)^A are computed solely by party A, and L^B and (∂L/∂θ_l^B)^B are computed solely by party B. The cross terms L^{AB}, (∂L/∂θ_l^A)^{AB}, and (∂L/∂θ_l^B)^{AB} are computed collaboratively by A and B through the secret sharing scheme.

The whole process of computing (6.9), (6.10), and (6.11) can be performed securely through secret sharing with the help of Beaver’s triples. The secret sharing-based FTL training process is summarized in the following steps.

•  Step 1: Party A and party B initialize and run their respective neural networks Net^A and Net^B locally to obtain the hidden representations u_i^A and u_i^B.

•  Step 2: Party A and party B collaboratively compute L^{AB} through secret sharing. Party A computes L^A and sends it to party B. Party B computes L^B and sends it to party A.

•  Step 3: Party A and party B individually reconstruct the loss L via Equation (6.9).

•  Step 4: Party A and party B collaboratively compute (∂L/∂θ_l^A)^{AB} and (∂L/∂θ_l^B)^{AB} through secret sharing.

•  Step 5: Party A computes its gradients via Equation (6.10) and updates its local model Θ^A, while at the same time party B computes its gradients via Equation (6.11) and updates its local model Θ^B.

•  Step 6: Party A sends a termination signal to B once the loss L converges. Otherwise, the process goes back to Step 1 to continue training.

After the training is completed, we proceed to the prediction phase. At a high level, the prediction process is quite simple. It involves the following two steps.

•  Step 1: Party A and party B run their trained neural networks Net^A and Net^B locally to obtain the hidden representations u_i^A and u_j^B.

•  Step 2: Based on u_i^A and u_j^B, party A and party B collaboratively reconstruct φ(u_j^B) through secret sharing and calculate the label y_j^B.

Note that in both the training and prediction processes, the only information that any party receives regarding any private value of the other party is a share of that private value, generated according to the secret sharing scheme. Therefore, no party is able to learn any information about the private values it is not supposed to learn.

6.4    CHALLENGES AND OUTLOOK

Traditional transfer learning is typically conducted in a sequential or centralized way. Sequential transfer learning [Ruder, 2019] means that transferable knowledge is first learned on a source task and then applied to a target domain to improve the performance of the target model. Sequential transfer learning is ubiquitous and effective in computer vision, where it is typically practiced in the form of models pretrained on large image datasets such as ImageNet [Deng et al., 2009]. It is also commonly used in natural language processing to encode language units (e.g., words, sentences, or documents) in the form of distributed representations. Centralized transfer learning means that the models and data involved in transfer learning are located in one place. Thus, traditional transfer learning is not applicable in many practical applications where data is scattered among multiple parties and its privacy is a major concern. FTL is a feasible and promising solution to address those issues.

Research work on incorporating transfer learning into federated learning framework is fast-growing. However, for practical applications, FTL still faces many challenges. We list three of them as follows.

•  We need to develop schemes to learn transferable knowledge in a way that captures well the invariance among participants. Different from sequential and centralized transfer learning, where the transferable knowledge is typically represented in one universal pre-trained model, the transferable knowledge in FTL is distributed among local models. Each participant has total control over designing and training its local model. A balance should be struck between this autonomy and the generalization performance of the FTL models.

•  We need to determine how to learn a representation of transferable knowledge in a distributed environment while preserving the privacy of the shared representation for all participants. Under the federated learning framework, the transferable knowledge is not only learned in a distributed manner, but is also typically not allowed to be exposed to any participant. Thus, we need to figure out precisely what each participant contributes to the shared representation in the federation and consider how to preserve the privacy of that shared representation.

•  We need to design efficient secure protocols that can be employed in federated transfer learning. FTL usually requires closer interactions among participants in terms of communication frequency and the size of the transferred data. Careful consideration should be taken when designing or choosing secure protocols in order to strike a balance between security and overhead.

There are certainly many other challenges waiting for researchers and engineers to address. We envision that, with the high practical value brought by FTL, more and more institutes and enterprises will invest resources and efforts into the research and implementation of FTL.
