Credit risk

Credit risk is the risk that a borrower fails to repay a loan to the lender. This may happen because of the borrower's poor financial condition, and the lender bears the consequences: loss of some or all of the amount lent, disruption of cash flows, and increased collection costs. The loss may be complete or partial. There are several scenarios in which a lender can suffer a loss, some of which are listed here:

  • A customer not making a payment on a mortgage loan, credit card, line of credit, or other type of loan
  • A business or consumer not paying a trade invoice when due
  • A business not paying an employee's earned wages when due
  • A business or government bond issuer not making a payment on a coupon or principal when due
  • An insurance company not meeting a policy obligation when due
  • A bank not returning depositors' funds

Credit risk management is the practice of mitigating such losses by understanding the adequacy of a bank's capital and loan loss reserves at any given time. To reduce credit risk, the lender needs a mechanism for performing a credit check on the prospective borrower. Banks generally quantify credit risk using two metrics: expected loss (EL) and economic capital (EC). Expected loss is the value of a possible loss multiplied by the probability of that loss occurring. Economic capital is the amount of capital necessary to cover unexpected losses. Three risk parameters are essential in the process of calculating the EL and EC measurements: the probability of default (PD), loss given default (LGD), and exposure at default (EAD). Since calculating PD is the most important of the three, we will focus on it.
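To see how these parameters combine, expected loss for a single exposure is commonly written as EL = PD x LGD x EAD. The following is a minimal sketch with hypothetical figures (the numbers are illustrative, not taken from any dataset):

```r
# Expected loss for one hypothetical exposure: EL = PD * LGD * EAD
PD  <- 0.02     # probability of default (2%)
LGD <- 0.45     # loss given default (45% of the exposure is lost)
EAD <- 100000   # exposure at default, in currency units

EL <- PD * LGD * EAD
EL
```

With these figures, the lender would expect to lose 900 currency units on this exposure on average.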

For building the PD model, let us use a subset of the German Credit Data available in the caret package in R. The data used for the analysis is loaded by executing the following code:

> library(caret) 
> data(GermanCredit) 
> LRData <- GermanCredit[, 1:10] 

Before starting the modeling, we need to understand the data, which can be done by executing the following code:

> str(LRData) 

This gives us the column types and the kinds of values they hold, as shown here:


Figure 7.11: Column description of the dataset

In this example, our target variable is Class. Class = Good denotes a non-defaulter and Class = Bad denotes a defaulter. Now, to understand the distribution of all the numeric variables, we can compute the basic statistics of the numeric attributes. This can be done by executing the following code:

> summary(LRData) 

A sample of the output generated by the preceding code is displayed here:


Figure 7.12: Basic statistics of numeric variables

Now let us prepare our data for modeling by executing the following code:

> set.seed(100) 
> library(caTools) 
> res = sample.split(LRData$Class, SplitRatio = 0.6) 
> Train_data = subset(LRData, res == TRUE) 
> Test_data = subset(LRData, res == FALSE) 

The preceding code generates the Train and Test datasets for modeling.

The proportion used to split the Train and Test data is quite subjective; here, 60 percent of the observations go to Train. Next, we can compute basic statistics for imputing missing and outlier values, and perform exploratory analysis (such as information value analysis and a correlation matrix) of the independent variables with respect to the dependent variable, to understand the relationships.
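As a minimal sketch of the correlation part of this exploratory step (which pairs to inspect is left to the analyst), a correlation matrix of the numeric predictors can be computed as follows, assuming the data is loaded from the caret package as before:

```r
library(caret)       # provides the GermanCredit data
data(GermanCredit)
LRData <- GermanCredit[, 1:10]

# correlation matrix of the numeric predictors; Class, the factor
# target, is excluded by the is.numeric filter
num_vars <- LRData[, sapply(LRData, is.numeric)]
round(cor(num_vars), 2)
```

Highly correlated pairs flagged here are candidates to drop or combine before fitting, which helps with multicollinearity later on.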

Now let us try to fit the model on the Train data, which can be done by executing the following code:

> lgfit = glm(Class ~ ., data = Train_data, family = "binomial") 
> summary(lgfit) 

It generates the summary of the model as displayed here:


Figure 7.13: Output summary of logistic regression

As the p-values in the summary show, the model contains both significant and insignificant attributes. Keeping attribute significance and multicollinearity in mind, we can iterate on the model to find the best one. In our case, let us rerun the model with only the significant attributes.
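One simple multicollinearity check is the variance inflation factor (VIF). The following is a hedged sketch in base R; the set of predictors inspected here is an assumption for illustration, and packages such as car also provide a ready-made vif() function:

```r
library(caret)       # GermanCredit data
data(GermanCredit)
LRData <- GermanCredit[, 1:10]

# VIF for each predictor: regress it on the others and compute
# 1 / (1 - R^2); values well above 5-10 signal collinearity
predictors <- c("Duration", "Amount", "InstallmentRatePercentage", "Age")
vifs <- sapply(predictors, function(v) {
  fit <- lm(reformulate(setdiff(predictors, v), response = v), data = LRData)
  1 / (1 - summary(fit)$r.squared)
})
round(vifs, 2)
```

A predictor with a high VIF duplicates information already carried by the others and is a candidate for removal before the next iteration.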

This can be done by executing the following code:

> lgfit = glm(Class ~ Duration + InstallmentRatePercentage + Age, data = Train_data, family = "binomial") 
> summary(lgfit) 

It generates the summary output as follows:


Figure 7.14: Output summary of logistic regression having only significant attributes

The output summary shows that all the attributes considered in the model are significant.

Logistic regression offers many statistics for checking model accuracy; in this case, we will use the ROC curve and the confusion matrix.

We could compute the classification threshold using the KS statistic, but here let us assume a threshold value of 0.5 and score our Train sample by executing the following code:

> Train_data$predicted.risk = predict(lgfit, newdata=Train_data, type="response") 
> table(Train_data$Class, as.numeric(Train_data$predicted.risk >= 0.5)) 

It generates the confusion matrix as displayed here:


Figure 7.15: Confusion matrix for logistic regression
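For reference, the KS statistic mentioned earlier is the maximum gap between the cumulative true-positive and false-positive rates, and the score at which that gap peaks can serve as the classification threshold. The following is a minimal sketch using ROCR on synthetic scores (the labels and scores here are illustrative, not the credit data):

```r
library(ROCR)

# synthetic scores: class 1 tends to score higher than class 0
set.seed(1)
labels <- rep(c(0, 1), each = 100)
scores <- c(rnorm(100, mean = 0.4, sd = 0.15),
            rnorm(100, mean = 0.6, sd = 0.15))

pred <- prediction(scores, labels)
perf <- performance(pred, "tpr", "fpr")

gap    <- perf@y.values[[1]] - perf@x.values[[1]]  # TPR - FPR at each cutoff
ks     <- max(gap)                                 # the KS statistic
cutoff <- perf@alpha.values[[1]][which.max(gap)]   # score where the gap peaks
c(KS = ks, cutoff = cutoff)
```

Applied to the credit model, the same calculation on predicted.risk would suggest a data-driven alternative to the 0.5 threshold assumed above.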

Now, let us compute the AUC by executing the following code:

> library(ROCR) 
> pred = prediction(Train_data$predicted.risk, Train_data$Class) 
> as.numeric(performance(pred, "auc")@y.values) 

It gives the value of the AUC, as shown here:

0.67925265 

Now, let us plot the ROC curve by executing the following code:

> predict_Train = predict(lgfit, type="response") 
> ROCpred = prediction(predict_Train, Train_data$Class) 
> ROCperf = performance(ROCpred, "tpr", "fpr") 
> plot(ROCperf) 

It plots the ROC curve as shown here:


Figure 7.16: ROC curve

To validate the model, we can use the same model fit created on Train_data to score Test_data, and then check whether the accuracy measures are in the same range as on the training sample.
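A sketch of that validation step, assuming the lgfit model, the Test_data split, and the ROCR library from the preceding steps:

```r
# score the hold-out sample with the model fitted on Train_data
Test_data$predicted.risk = predict(lgfit, newdata = Test_data, type = "response")

# confusion matrix at the same 0.5 threshold used on Train_data
table(Test_data$Class, as.numeric(Test_data$predicted.risk >= 0.5))

# hold-out AUC, for comparison with the training value of ~0.679
pred_test = prediction(Test_data$predicted.risk, Test_data$Class)
as.numeric(performance(pred_test, "auc")@y.values)
```

A hold-out AUC close to the training AUC suggests the model generalizes; a large drop would indicate overfitting and call for another modeling iteration.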
