Training to be fair

There are multiple ways to train models to be fairer. A simple approach would be to use the fairness measures we listed in the previous section as an additional loss. In practice, however, this approach has turned out to have several issues, such as poor performance on the actual classification task.

An alternative approach is to use an adversarial network. Back in 2016, Louppe, Kagan, and Cranmer published the paper Learning to Pivot with Adversarial Networks, available at https://arxiv.org/abs/1611.01046. This paper showed how to use an adversarial network to train a classifier to ignore a nuisance parameter, such as a sensitive feature.

In this example, we will train a classifier to predict whether an adult makes over $50,000 in annual income. The challenge is to make our classifier unbiased with respect to race and gender, so that it bases its decisions only on features we may legitimately discriminate on, such as occupation and capital gains.

To this end, we must train a classifier and an adversarial network. The adversarial network aims to predict the sensitive attributes, A (gender and race), from the predictions of the classifier:

[Figure: Making an unbiased classifier to detect the income of an adult]

The classifier aims to classify by income but also aims to fool the adversarial network. The classifier's minimization objective formula is as follows:

$$\min_{\theta_{clf}}\left[L_y(\theta_{clf}) - \lambda \, L_A(\theta_{clf}, \theta_{adv})\right]$$

Within that formula, $L_y(\theta_{clf})$ is the binary cross-entropy loss of the classification, while $L_A(\theta_{clf}, \theta_{adv})$ is the adversarial loss. $\lambda$ is a hyperparameter that we can use to amplify or reduce the impact of the adversarial loss.

Note

This implementation of the adversarial fairness method follows an implementation by Stijn Tonk and Henk Griffioen. You can find the code for this chapter on Kaggle at https://www.kaggle.com/jannesklaas/learning-how-to-be-fair.

Stijn's and Henk's original blogpost can be found here: https://blog.godatadriven.com/fairness-in-ml.

To train this model fairly, we not only need data X and targets y, but also data about the sensitive attributes, A. In the example we're going to work on, we'll be taking data from the 1994 US census provided by the UCI repository: https://archive.ics.uci.edu/ml/datasets/Adult.

To make loading the data easier, it has been transformed into a CSV file with column headers. As a side note, please refer to the online version to view the data, as it would be difficult to display in the format of the book.
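The code snippets in this section assume the usual imports have been made. A minimal set might look like the following; depending on your setup, the Keras layers may instead be imported from tensorflow.keras:

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, roc_auc_score

from keras.layers import Input, Dense, Dropout
from keras.models import Model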

First, we load the data. The dataset contains data about people from a number of different races, but for the simplicity of this task, we will only be focusing on white and black people for the race attribute. To do this, we need to run the following code:

path = '../input/adult.csv'
input_data = pd.read_csv(path, na_values="?")
input_data = input_data[input_data['race'].isin(['White', 'Black'])]

Next, we select the sensitive attributes, race and gender, into our sensitive dataset, A. We one-hot encode the data so that "Male" equals one for the gender attribute and "White" equals one for the race attribute. We can achieve this by running the following code:

sensitive_attribs = ['race', 'gender']
A = input_data[sensitive_attribs]
A = pd.get_dummies(A,drop_first=True)
A.columns = sensitive_attribs
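To verify that the encoding matches this convention, it can help to peek at the resulting DataFrame. This quick check is not part of the original notebook:

print(A.head())   # one column per sensitive attribute
print(A.mean())   # share of rows encoded as 1, that is, White and Male respectively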

Our target is the income attribute. Therefore, we need to encode >50K as 1 and everything else as zero, which is achieved by writing this code:

y = (input_data['income'] == '>50K').astype(int)

To get our training data, we firstly remove the sensitive and target attributes. Then we fill all of the missing values and one-hot encode all of the data, as you can see in the following code:

X = input_data.drop(labels=['income', 'race', 'gender'],axis=1)

X = X.fillna('Unknown')

X = pd.get_dummies(X, drop_first=True)

Finally, we split the data into train and test sets. As seen in the following code, we stratify the data to ensure that the same share of high earners ends up in both the training and test data:

X_train, X_test, y_train, y_test, A_train, A_test = \
    train_test_split(X, y, A, test_size=0.5, stratify=y, random_state=7)

To ensure the data works nicely with the neural network, we're now going to scale the data using scikit-learn's StandardScaler:

scaler = StandardScaler().fit(X_train)

X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
                       
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

We need a metric for how fair our model is. We use the disparate impact criterion, implemented in the p_rule method: it calculates the share of people classified as making over $50,000 in each group and then returns the ratio of the selection rate in the disadvantaged group to the selection rate in the advantaged group, expressed as a percentage.

The goal is for p_rule to return at least 80% for both race and gender, in order to meet the four-fifths rule. Note that this function is used only for monitoring, not as a loss function:

def p_rule(y_pred, a_values, threshold=0.5):
    y_a_1 = y_pred[a_values == 1] > threshold if threshold else y_pred[a_values == 1]                                           #1
    y_a_0 = y_pred[a_values == 0] > threshold if threshold else y_pred[a_values == 0] 
    odds = y_a_1.mean() / y_a_0.mean()                          #2
    return np.min([odds, 1/odds]) * 100

Let's explore this code in some more detail. The two numbered comments in the preceding code block mark the two key steps:

  1. Firstly, we determine who gets selected, given a threshold. Here, we classify everyone to whom the model assigns a chance of over 50% of making $50,000 or more as a high earner.
  2. Secondly, we calculate the selection rate of both demographics and divide the selection rate of one group by that of the other. By returning the minimum of the odds and one divided by the odds, we ensure a value of at most one, which is multiplied by 100 to express it as a percentage. A quick check on synthetic data follows below.
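To get a feeling for the metric, we can run p_rule on synthetic data. This check is purely illustrative and not part of the original notebook; with random scores that are independent of the group attribute, the p_rule value should come out close to 100%:

rng = np.random.RandomState(42)
y_pred_demo = rng.rand(10000)              # random "model scores" in [0, 1]
a_demo = rng.randint(0, 2, size=10000)     # random binary group membership

print(p_rule(y_pred_demo, a_demo))         # close to 100, since the scores ignore the group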

To make the model setup a bit easier, we need to define the number of input features and the number of sensitive features. This is something that is simply done by running these two lines:

n_features = X_train.shape[1]
n_sensitive = A_train.shape[1]

Now we set up our classifier. Note how this classifier is a standard classification neural network. It features three hidden layers, some dropout, and a final output layer with a sigmoid activation, since this is a binary classification task. The classifier is written in the Keras functional API.

To make sure you understand how the API works, go through the following code example and ensure you understand why the steps are taken:

clf_inputs = Input(shape=(n_features,))
x = Dense(32, activation='relu')(clf_inputs)
x = Dropout(0.2)(x)
x = Dense(32, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(32, activation='relu')(x)
x = Dropout(0.2)(x)
outputs = Dense(1, activation='sigmoid', name='y')(x)
clf_net = Model(inputs=[clf_inputs], outputs=[outputs])
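If you want to double-check the architecture before moving on, you can print a summary of the network (output omitted here):

clf_net.summary()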

The adversarial network is a classifier with two heads: one to predict the applicant's race from the model output, and one to predict the applicant's gender:

adv_inputs = Input(shape=(1,))
x = Dense(32, activation='relu')(adv_inputs)
x = Dense(32, activation='relu')(x)
x = Dense(32, activation='relu')(x)
out_race = Dense(1, activation='sigmoid')(x)
out_gender = Dense(1, activation='sigmoid')(x)
adv_net = Model(inputs=[adv_inputs], outputs=[out_race,out_gender])

As with generative adversarial networks, we have to make the networks trainable and untrainable multiple times. To make this easier, the following function will create a function that makes a network and all its layers either trainable or untrainable:

def make_trainable_fn(net):              #1
    def make_trainable(flag):            #2
        net.trainable = flag             #3
        for layer in net.layers:
            layer.trainable = flag
    return make_trainable                #4

From the preceding code, there are four key features that we should take a moment to explore:

  1. The function accepts a Keras neural network, for which the train switch function will be created.
  2. Inside the function, a second function is created. This second function accepts a Boolean flag (True/False).
  3. When called, the second function sets the network's trainability to the flag. If False is passed, the network is not trainable. Since the layers of the network can also be used in other networks, we ensure that each individual layer is not trainable, too.
  4. Finally, we return the function.

Using a function to create another function might seem convoluted at first, but this allows us to create "switches" for the neural network easily. The following code snippet shows us how to create switch functions for the classifier and the adversarial network:

trainable_clf_net = make_trainable_fn(clf_net)
trainable_adv_net = make_trainable_fn(adv_net)

To make the classifier trainable, we can use the function with the True flag:

trainable_clf_net(True)

Now we can compile our classifier. As you will see later on in this chapter, it is useful to keep the classifier network as a separate variable from the compiled classifier with which we make predictions:

clf = clf_net
clf.compile(loss='binary_crossentropy', optimizer='adam')

Remember that to train our classifier, we need to run its predictions through the adversary, obtain the adversary loss, and apply the negative adversary loss to the classifier. This is best done by packing the classifier and adversary into one network.

To do this, we must first create a new model that maps from the classifier inputs to the classifier and adversary outputs. We define the adversary output to be a nested function of the adversarial network and the classifier network. This way, the predictions of the classifier get immediately passed on to the adversary:

adv_out = adv_net(clf_net(clf_inputs))

We then define the classifier output to be the output of the classifier network, just as we would for classification:

clf_out = clf_net(clf_inputs)

Then, we define the combined model to map from the classifier input, that is, the data about an applicant, to the classifier output and adversary output:

clf_w_adv = Model(inputs=[clf_inputs], outputs=[clf_out]+adv_out)

When training the combined model, we only want to update the weights of the classifier, as we will train the adversary separately. We can use our switch functions to make the classifier network trainable and the adversarial network untrainable:

trainable_clf_net(True)
trainable_adv_net(False)

Remember the hyperparameter, $\lambda$, from the preceding minimization objective. We need to set this parameter manually for both sensitive attributes. As it turns out, the networks train best if the lambda for race is set much higher than the lambda for gender.

With the lambda values in hand, we can create the weighted loss:

lambdas = [130., 30.]   # lambda for race and lambda for gender

loss_weights = [1.] + [-lambda_param for lambda_param in lambdas]

The preceding expression leads to loss weights of [1.,-130,-30]. This means the classification error has a weight of 1, the race prediction error of the adversary has a weight of -130, and the gender prediction error of the adversary has a weight of -30. Since the losses of the adversary's predictions have negative weights, gradient descent will optimize the parameters of the classifier to increase these losses.
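Spelled out with the notation from the minimization objective above, the weighted sum that the combined model minimizes with these loss weights is:

$$L_{combined} = L_y - 130 \cdot L_{race} - 30 \cdot L_{gender}$$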

Finally, we can compile the combined network:

clf_w_adv.compile(loss='binary_crossentropy', loss_weights=loss_weights, optimizer='adam')

With the classifier and combined classifier-adversarial model in place, the only thing missing is a compiled adversarial model. To get this, we'll first define the adversarial model to map from the classifier inputs to the outputs of the nested adversarial-classifier model:

adv = Model(inputs=[clf_inputs], outputs=adv_net(clf_net(clf_inputs)))

Then, when training the adversarial model, we want to optimize the weights of the adversarial network and not of the classifier network, so we use our switch functions to make the adversarial trainable and the classifier not trainable:

trainable_clf_net(False)
trainable_adv_net(True)

Finally, we compile the adversarial model just like we would with a regular Keras model:

adv.compile(loss='binary_crossentropy', optimizer='adam')

With all the pieces in hand, we can now pretrain the classifier. This means we train the classifier without any special fairness considerations:

trainable_clf_net(True)
clf.fit(X_train.values, y_train.values, epochs=10)

After we have trained the model, we can make predictions on the validation set to evaluate both the model's fairness and accuracy:

y_pred = clf.predict(X_test)

Now we'll calculate the model's accuracy and p_rule for both gender and race. In all calculations, we're going to use a cutoff point of 0.5:

acc = accuracy_score(y_test,(y_pred>0.5))* 100
print('Clf acc: {:.2f}'.format(acc))

for sens in A_test.columns:
    pr = p_rule(y_pred,A_test[sens])
    print('{}: {:.2f}%'.format(sens,pr))
out:
Clf acc: 85.44
race: 41.71%
gender: 29.41%

As you can see, the classifier achieves a respectable accuracy, 85.44%, in predicting incomes. However, it is deeply unfair. Women are only 29.4% as likely as men to be predicted to make over $50,000.

Equally, it discriminates strongly on race. If we used this classifier to judge loan applications, for instance, we would be vulnerable to discrimination lawsuits.

Note

Neither gender nor race was included in the features of the classifier. Yet, the classifier discriminates strongly on them. If the sensitive attributes can be inferred from the remaining features, simply dropping the sensitive columns is not enough.

To get out of this mess, we will pretrain the adversarial network before training both networks to make fair predictions. Once again, we use our switch functions to make the classifier untrainable and the adversarial trainable:

trainable_clf_net(False)
trainable_adv_net(True)

As the distributions of race and gender in the data might be skewed, we're going to use class weights to adjust for this:

class_weight_adv = compute_class_weights(A_train)
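The compute_class_weights helper is defined in the notebook accompanying this chapter; its implementation is not shown in the text. A minimal sketch of what it might look like, assuming it computes balanced class weights for each sensitive column with scikit-learn, is the following:

from sklearn.utils.class_weight import compute_class_weight

def compute_class_weights(data_set):
    # One weight dictionary per sensitive column, balancing classes 0 and 1
    class_values = [0, 1]
    class_weights = []
    for col in data_set.columns:
        balanced = compute_class_weight(class_weight='balanced',
                                        classes=np.array(class_values),
                                        y=data_set[col])
        class_weights.append(dict(zip(class_values, balanced)))
    return class_weights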

We then train the adversary to predict race and gender from the training data through the predictions of the classifier:

adv.fit(X_train.values, np.hsplit(A_train.values, A_train.shape[1]), class_weight=class_weight_adv, epochs=10)

NumPy's hsplit function splits the 2D A_train matrix into two vectors that are then used to train the two model heads.
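To illustrate, here is a small standalone example of what np.hsplit does with a matrix shaped like A_train; this demo is not part of the original code:

demo = np.array([[1, 0],
                 [0, 1],
                 [1, 1]])                    # two sensitive columns, three rows

race_col, gender_col = np.hsplit(demo, 2)    # split into two (3, 1) arrays
print(race_col.ravel())                      # [1 0 1]
print(gender_col.ravel())                    # [0 1 1]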

With the classifier and adversary pretrained, we will now train the classifier to fool the adversary, while the adversary keeps getting better at spotting the classifier's discrimination. Before we start, we need to do some setup. We want to train for 250 iterations, with a batch size of 128, and two sensitive attributes:

n_iter = 250
batch_size = 128
n_sensitive = A_train.shape[1]

The combined network of the classifier and adversarial also needs some class weights. The weights for the income predictions, less/more than $50,000, are both one. For the adversarial heads of the combined model, we use the preceding computed adversarial class weights:

class_weight_clf_w_adv = [{0:1., 1:1.}]+class_weight_adv

To keep track of metrics, we set up one DataFrame for validation metrics, accuracy, and area under the curve, as well as for the fairness metrics. The fairness metrics are the p_rule values for race and gender:

val_metrics = pd.DataFrame()
fairness_metrics = pd.DataFrame()

Inside the main training loop, three steps are performed: training the adversarial network, training the classifier to be fair, and printing out validation metrics. For better readability, the three steps are shown separately here.

Within the code, you will find them in the same loop, where idx is the current iteration:

for idx in range(n_iter):

The first step is to train the adversarial network. To this end, we're going to make the classifier untrainable, the adversarial network trainable, and then train the adversarial network just as we did before. To do this, we need to run the following code block:

trainable_clf_net(False)
trainable_adv_net(True)
adv.fit(X_train.values, np.hsplit(A_train.values, A_train.shape[1]), batch_size=batch_size, class_weight=class_weight_adv, epochs=1, verbose=0)

Training the classifier to be a good classifier but also to fool the adversary and be fair involves three steps. Firstly, we make the adversary untrainable and the classifier trainable:

trainable_clf_net(True)
trainable_adv_net(False)

Then we sample a batch from X, y, and A:

indices = np.random.permutation(len(X_train))[:batch_size]
X_batch = X_train.values[indices]
y_batch = y_train.values[indices]
A_batch = A_train.values[indices]

Finally, we train the combined adversary and classifier. Since the adversarial network is set to not be trainable, only the classifier network will be trained. However, the loss from the adversarial network's predictions of race and gender gets backpropagated through the entire network, so that the classifier learns to fool the adversarial network:

clf_w_adv.train_on_batch(X_batch,
                         [y_batch] + np.hsplit(A_batch, n_sensitive),
                         class_weight=class_weight_clf_w_adv)

Finally, we want to keep track of progress by first making predictions on the test set:

y_pred = pd.Series(clf.predict(X_test).ravel(), index=y_test.index)

We then calculate the area under the curve (ROC AUC) and the accuracy of the predictions, and save them in the val_metrics DataFrame:

roc_auc = roc_auc_score(y_test, y_pred)
acc = accuracy_score(y_test, (y_pred>0.5))*100

val_metrics.loc[idx, 'ROC AUC'] = roc_auc
val_metrics.loc[idx, 'Accuracy'] = acc

Next up, we calculate p_rule for both race and gender and save those values in the fairness metrics:

for sensitive_attr in A_test.columns:
    fairness_metrics.loc[idx, sensitive_attr] = p_rule(y_pred, A_test[sensitive_attr])
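The plotting code itself is not shown in the chapter; a minimal sketch with matplotlib that produces a comparable figure might look like this (the styling of the original plot may differ):

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

fairness_metrics.plot(ax=ax1, title='Fairness (p-rule in %)')
ax1.axhline(80, linestyle='--', color='grey')   # four-fifths rule threshold

val_metrics.plot(ax=ax2, title='Validation metrics')

plt.show()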

If we plot both the fairness and validation metrics, we'll arrive at the following plot:

[Figure: Pivot train progress]

As you can see, the fairness scores of the classifier steadily increase with training. After about 150 iterations, the classifier satisfies the four-fifths rule, and the p-rule values end up well over 90%. This increase in fairness comes at only a small decrease in accuracy and area under the curve. The classifier trained in this manner is clearly a fairer classifier with similar performance, and is thus preferable to a classifier trained without fairness criteria.

The pivot approach to fair machine learning has a number of advantages. Yet, it cannot rule out unfairness entirely. What if, for example, there was a group that the classifier discriminates against that we did not think of yet? What if it discriminates on treatment, instead of impact? To make sure our models are not biased, we need more technical and social tools, namely interpretability, causality, and diverse development teams.

In the next section, we'll discuss how to train machine learning models that learn causal relationships, instead of just statistical associations.
