The data sets represented here cover thousands of loans made through the Lending Club platform, which allows individuals to lend to other individuals. Of course, not all loans are created equal. Someone who is essentially a sure bet to pay back a loan will have an easier time getting a loan with a low interest rate than someone who appears to be riskier. And for people who are very risky? They may not even get a loan offer, or they may not have accepted the loan offer due to a high interest rate. It is important to keep that last part in mind, since this data set only represents loans actually made, i.e. do not mistake this data for loan applications!
data("loans_full_schema")
loans_full_schema is a data frame with 10,000 observations on the following 56 variables.
loan50 is a data frame with 50 observations that includes 15 of the following variables.
emp_title: Job title.
emp_length: Number of years in the job, rounded down. If longer than 10 years, then this is represented by the value 10.
state: Two-letter state code.
home_ownership: The ownership status of the applicant's residence.
annual_income: Annual income.
verified_income: Type of verification of the applicant's income.
debt_to_income: Debt-to-income ratio.
annual_income_joint: If this is a joint application, then the annual income of the two parties applying.
verification_income_joint: Type of verification of the joint income.
debt_to_income_joint: Debt-to-income ratio for the two parties.
delinq_2y: Delinquencies on lines of credit in the last 2 years.
months_since_last_delinq: Months since the last delinquency.
earliest_credit_line: Year of the applicant's earliest line of credit.
inquiries_last_12m: Inquiries into the applicant's credit during the last 12 months.
total_credit_lines: Total number of credit lines in this applicant's credit history.
open_credit_lines: Number of currently open lines of credit.
total_credit_limit: Total available credit, e.g. if only credit cards, then the total of all the credit limits. This excludes a mortgage.
total_credit_utilized: Total credit balance, excluding a mortgage.
num_collections_last_12m: Number of collections in the last 12 months. This excludes medical collections.
num_historical_failed_to_pay: The number of derogatory public records, which roughly means the number of times the applicant failed to pay.
months_since_90d_late: Months since the last time the applicant was 90 days late on a payment.
current_accounts_delinq: Number of accounts where the applicant is currently delinquent.
total_collection_amount_ever: The total amount that the applicant has had against them in collections.
current_installment_accounts: Number of installment accounts, which are (roughly) accounts with a fixed payment amount and period. A typical example might be a 36-month car loan.
accounts_opened_24m: Number of new lines of credit opened in the last 24 months.
months_since_last_credit_inquiry: Number of months since the last credit inquiry on this applicant.
num_satisfactory_accounts: Number of satisfactory accounts.
num_accounts_120d_past_due: Number of current accounts that are 120 days past due.
num_accounts_30d_past_due: Number of current accounts that are 30 days past due.
num_active_debit_accounts: Number of currently active bank cards.
total_debit_limit: Total of all bank card limits.
num_total_cc_accounts: Total number of credit card accounts in the applicant's history.
num_open_cc_accounts: Total number of currently open credit card accounts.
num_cc_carrying_balance: Number of credit cards that are carrying a balance.
num_mort_accounts: Number of mortgage accounts.
account_never_delinq_percent: Percent of all lines of credit where the applicant was never delinquent.
tax_liens: A numeric vector.
public_record_bankrupt: Number of bankruptcies listed in the public record for this applicant.
loan_purpose: The category for the purpose of the loan.
application_type: The type of application: either individual or joint.
loan_amount: The amount of the loan the applicant received.
term: The number of months of the loan the applicant received.
interest_rate: Interest rate of the loan the applicant received.
installment: Monthly payment for the loan the applicant received.
grade: Grade associated with the loan.
sub_grade: Detailed grade associated with the loan.
issue_month: Month the loan was issued.
loan_status: Status of the loan.
initial_listing_status: Initial listing status of the loan. (I think this has to do with whether the lender provided the entire loan or if the loan is across multiple lenders.)
disbursement_method: Disbursement method of the loan.
balance: Current balance on the loan.
paid_total: Total that has been paid on the loan by the applicant.
paid_principal: The difference between the original loan amount and the current balance on the loan.
paid_interest: The amount of interest paid so far by the applicant.
paid_late_fees: Late fees paid by the applicant.
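Two of the definitions above are simple enough to write out. The sketch below uses made-up toy values, not rows from loans_full_schema. It applies the stated rule for emp_length (round down, then cap at 10) and the stated definition of paid_principal (original loan amount minus current balance). The names `years_in_job` and `toy` are hypothetical and introduced only for illustration.

```r
# Hypothetical job tenures in years; values are made up for illustration.
years_in_job <- c(0.5, 3.9, 10, 27)

# emp_length as described: rounded down, with anything above 10 recorded as 10.
emp_length <- pmin(floor(years_in_job), 10)

# Toy loans with made-up amounts; not taken from loans_full_schema.
toy <- data.frame(
  loan_amount = c(10000, 5000, 20000),
  balance     = c(8000, 0, 19500)
)

# paid_principal as described: original loan amount minus current balance.
toy$paid_principal <- toy$loan_amount - toy$balance

emp_length
toy
```

Whether the published columns were computed exactly this way is an assumption. The sketch only restates the definitions given in the variable list.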
This data comes from Lending Club (https://www.lendingclub.com/info/download-data.action), which provides a large set of data on the people who received loans through their platform.
library(openintro)  # provides loans_full_schema, loan50, and boxPlot()
head(loans_full_schema)
head(loan50)
d <- loans_full_schema
# _____ Summarizing Data Sections _____ #
x <- d$homeownership
y <- d$application_type
(t1 <- table(x, y))
apply(t1, 1, sum)
prop.table(t1, 1)
prop.table(t1, 2)
barplot(t1)
barplot(t1, beside = TRUE)
barplot(prop.table(t1, 2))
mosaicplot(apply(t1, 2, sum))
mosaicplot(t(t1))
mosaicplot(t1)
pie(apply(t1, 1, sum))
barplot(apply(t1, 1, sum))
# _____ Multiple regression, initial fit and variable selection _____ #
d$credit_util <- round(ifelse(d$total_credit_limit == 0, 0,
    d$total_credit_utilized / d$total_credit_limit), 4)
d$past_bankr <- (d$public_record_bankrupt > 0) + 0
dim(d)
par(mfrow = c(2, 4))
boxPlot(d$interest_rate, d$verified_income)
plot(d$debt_to_income, d$interest_rate)
boxPlot(d$interest_rate, d$term)
plot(d$current_installment_accounts, d$interest_rate)
# boxPlot(d$interest_rate, d$issue_month)
plot(d$total_debit_limit, d$interest_rate)
plot(d$inquiries_last_12m, d$interest_rate)
# Renaming some variables for convenience
d$ver_income <- ifelse(d$verified_income == "Verified", "verified",
    ifelse(d$verified_income == "Not Verified", "not", "source_only"))
d$credit_checks <- d$inquiries_last_12m
d$issued <- gsub("-", "", d$issue_month, fixed = TRUE)
these <- d$annual_income %in% 0:1
d$debt_to_income[these] <- d$total_credit_utilized[these] /
    d$annual_income_joint[these]
# Variables for the model
co <- c(
    "ver_income",
    "debt_to_income",
    "credit_util",
    "past_bankr",
    "term",
    "issued",
    # "total_debit_limit",
    "credit_checks")
head(d[, c("interest_rate", co)])
# Build the model formula from a subset of the predictor names in x
F <- function(x, sub = seq_along(x)) {
    as.formula(paste("interest_rate ~", paste(x[sub], collapse = "+")))
}
# Extract the adjusted R-squared from a fitted model
AdjR2 <- function(x) {
    summary(x)$adj.r.squared
}
m <- lm(F(co), data = d); summary(m); AdjR2(m)
# m <- lm(F(co, -4), data = d); summary(m); AdjR2(m)
m <- lm(F(co, -6), data = d); summary(m); AdjR2(m)
# m <- lm(F(co, -c(4, 6)), data = d); summary(m); AdjR2(m)
# Diagnostics
## Not run:
# Normality (not critical here)
qqnorm(m$residuals)
qqline(m$residuals)
# Homoskedasticity
library(ggplot2)
my_geom <- c("point", "smooth")
qplot(m$fitted, abs(m$residuals), geom = my_geom)
# Residuals against predictors
# boxPlot(m$residuals, d$ver_income)
qplot(d$debt_to_income, m$residuals, geom = my_geom)
qplot(d$credit_util, m$residuals, geom = my_geom)
# boxPlot(m$residuals, d$past_bankr)
# boxPlot(m$residuals, d$term)
qplot(d$total_debit_limit, m$residuals, geom = my_geom)
qplot(d$credit_checks, m$residuals, geom = my_geom)
## End(Not run)
# Refitting with debt-to-income capped at 50
# (the sqrt(total_debit_limit) term is left commented out below).
d$debt_to_income_capped_at_50 <-
    ifelse(d$debt_to_income > 50, 50, d$debt_to_income)
co <- c(
    "ver_income",
    "debt_to_income_capped_at_50",
    "credit_util",
    "past_bankr",
    "term",
    "issued",
    # "sqrt(total_debit_limit)",
    "credit_checks")
m <- lm(F(co), data = d); summary(m); AdjR2(m)
# m <- lm(F(co, -4), data = d); summary(m); AdjR2(m)
m <- lm(F(co, -6), data = d); summary(m); AdjR2(m)
# m <- lm(F(co, -c(4, 6)), data = d); summary(m); AdjR2(m)
# Diagnostics for refitted model
## Not run:
qplot(d$debt_to_income_capped_at_50, m$residuals, geom = my_geom)
qplot(d$credit_util, m$residuals, geom = my_geom)
qplot(sqrt(d$total_debit_limit), m$residuals, geom = my_geom)
qplot(d$credit_checks, m$residuals, geom = my_geom)
## End(Not run)
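The variable-selection pattern in the examples (build a formula from a vector of predictor names, drop one by negative index, compare adjusted R-squared) can be tried without the package data. The following is a minimal sketch on simulated data. The data frame `toy`, the column `noise`, and the helpers `build_formula` and `adj_r2` are hypothetical stand-ins, and the coefficients used to simulate `interest_rate` are arbitrary.

```r
# Simulated stand-in data; all values are made up for illustration only.
set.seed(1)
n <- 200
toy <- data.frame(
  debt_to_income = runif(n, 0, 40),
  credit_checks  = rpois(n, 2),
  noise          = rnorm(n)   # unrelated to the response by construction
)
toy$interest_rate <- 8 + 0.05 * toy$debt_to_income +
  0.3 * toy$credit_checks + rnorm(n)

# Build "interest_rate ~ a + b + ..." from a character vector of predictors.
# `sub` selects entries of `x`; a negative index drops them instead.
build_formula <- function(x, sub = seq_along(x)) {
  as.formula(paste("interest_rate ~", paste(x[sub], collapse = " + ")))
}

# Adjusted R-squared of a fitted linear model.
adj_r2 <- function(model) summary(model)$adj.r.squared

predictors <- c("debt_to_income", "credit_checks", "noise")

full    <- lm(build_formula(predictors), data = toy)
reduced <- lm(build_formula(predictors, -3), data = toy)  # drop "noise"

# Compare the two fits side by side.
c(full = adj_r2(full), reduced = adj_r2(reduced))
```

Dropping a predictor by position, as in `build_formula(predictors, -3)`, mirrors the `F(co, -6)` calls above. It is compact, but it depends on the order of the name vector, so selecting by name may be safer once the predictor list changes.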