Loan progress columns

Our goal is to predict inherent risk from loan application data, but some of the columns describe loan payment progress or were assigned by Lending Club itself. In this example, for simplicity, we will drop them and focus only on columns that are part of the loan-application process. In real-life scenarios, these columns could carry information useful for prediction (for example, payment progress). However, we want to build our model on the initial loan application, not on a loan that has already been accepted and has accumulated a payment history, since none of that would be known when the application is received. Based on the data dictionary, we detected the following columns:

val loanProgressColumns = Seq("funded_amnt", "funded_amnt_inv", "grade", "initial_list_status",
                              "issue_d", "last_credit_pull_d", "last_pymnt_amnt", "last_pymnt_d",
                              "next_pymnt_d", "out_prncp", "out_prncp_inv", "pymnt_plan",
                              "recoveries", "sub_grade", "total_pymnt", "total_pymnt_inv",
                              "total_rec_int", "total_rec_late_fee", "total_rec_prncp")

Now we can collect all the columns that we need to remove, since they either bring no value for modelling or would not be known at application time:

val columnsToRemove = (idColumns ++ constantColumns ++ stringColumns ++ loanProgressColumns)
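
To make the effect of this list concrete, the following is a minimal sketch in plain Scala of how it could be applied to a dataset header. The values of idColumns, constantColumns, and stringColumns, the sample header in allColumns, and the prune helper are hypothetical placeholders chosen for illustration; only loanProgressColumns is taken from the text above.

```scala
object ColumnPruningSketch {
  // Taken from the text above.
  val loanProgressColumns: Seq[String] = Seq(
    "funded_amnt", "funded_amnt_inv", "grade", "initial_list_status",
    "issue_d", "last_credit_pull_d", "last_pymnt_amnt", "last_pymnt_d",
    "next_pymnt_d", "out_prncp", "out_prncp_inv", "pymnt_plan",
    "recoveries", "sub_grade", "total_pymnt", "total_pymnt_inv",
    "total_rec_int", "total_rec_late_fee", "total_rec_prncp")

  // Hypothetical placeholders: the real lists are derived elsewhere.
  val idColumns: Seq[String] = Seq("id", "member_id")
  val constantColumns: Seq[String] = Seq("policy_code")
  val stringColumns: Seq[String] = Seq("desc", "title")

  val columnsToRemove: Seq[String] =
    idColumns ++ constantColumns ++ stringColumns ++ loanProgressColumns

  // Hypothetical sample header, not the full dataset schema.
  val allColumns: Seq[String] = Seq(
    "id", "member_id", "loan_amnt", "funded_amnt", "grade",
    "annual_inc", "desc", "policy_code", "total_pymnt")

  // Keep only the columns that are not marked for removal,
  // preserving the original column order.
  def prune(all: Seq[String], remove: Seq[String]): Seq[String] = {
    val removeSet = remove.toSet
    all.filterNot(removeSet.contains)
  }

  def main(args: Array[String]): Unit = {
    val kept = prune(allColumns, columnsToRemove)
    println(kept.mkString(", "))
  }
}
```

If the data is held in a Spark DataFrame named df (an assumption, since the loading code is not shown here), the same list could be passed to df.drop(columnsToRemove: _*), which ignores names that are not present in the schema.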