Exercise: South German Credit

file <- "data/german.rds"
stopifnot(file.exists(file))
data <- readRDS(file)
head(data)

In this exercise we will use the 'South German Credit' data set. It contains a classification of the credit risk of 1000 individuals into 'good' and 'bad' together with 20 additional attributes.

Download the file german.rds, or get it from the data set's homepage, http://archive.ics.uci.edu/ml/datasets/South+German+Credit, which also provides more detailed information on the data set.

We can import/read this file using data <- readRDS(...). The file contains the following information:

The Tasks

We would like to find out how a person's credit risk depends on the additional attributes of the person and of the credit itself. The response is therefore the binary variable credit_risk; the covariates are the 20 additional variables (17 categorical, 3 numeric).
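To check the split into categorical and numeric covariates, one can count the column types. The following is a minimal sketch on a hypothetical toy data frame (`toy`, with made-up values), assuming that categorical variables are stored as factors; the same calls could be applied to `data` once it is loaded.

```r
# Toy stand-in for the credit data (hypothetical values, not the real data set)
toy <- data.frame(
  credit_risk = factor(c("good", "bad", "good")),
  duration    = c(12L, 24L, 6L),
  purpose     = factor(c("repairs", "car (new)", "repairs"))
)

# Everything except the response is a covariate
covariates <- toy[, setdiff(names(toy), "credit_risk"), drop = FALSE]

# Count covariates by type (assumes categorical variables are factors)
n_categorical <- sum(vapply(covariates, is.factor, logical(1)))
n_numeric     <- sum(vapply(covariates, is.numeric, logical(1)))

# Class balance of the binary response
table(toy$credit_risk)
```

On the real data, the two counts should match the 17 categorical and 3 numeric covariates mentioned above, provided the factor assumption holds.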

Apply the forest-building function cforest to build a forest model as described in the following points:

# data <- readRDS("data/german.rds")
f <- credit_risk ~ status + duration + credit_history + purpose + amount +
  savings + employment_duration + installment_rate + personal_status_sex +
  other_debtors + present_residence + property + age +
  other_installment_plans + housing + number_credits + job + people_liable +
  telephone + foreign_worker

library("partykit")
cf <- cforest(formula = f, data = data, ntree = 50)

newclient <- data.frame(status = "no checking account",
                        duration = 12,
                        credit_history = "no credits taken/all credits paid back duly",
                        purpose = "repairs",
                        amount = 5000,
                        savings = "unknown/no savings account",
                        employment_duration = "4 <= ... < 7 yrs",
                        installment_rate =  "< 20",
                        personal_status_sex = "male : married/widowed",
                        other_debtors = "none",
                        present_residence = "1 <= ... < 4 yrs",
                        property = "real estate",
                        age = 40,
                        other_installment_plans = "none",
                        housing = "own",
                        number_credits = "1",
                        job = "skilled employee/official",
                        people_liable = "0 to 2",
                        telephone = "no",
                        foreign_worker = "no"
                        )

newclient2 <- newclient
newclient2$purpose <- "furniture/equipment"

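# Compare the predictions for two clients who differ only in the credit purpose
predict(cf, newdata = newclient)
predict(cf, newdata = newclient2)

# Split the data into a training set (667 observations) and a test set (the rest)
set.seed(4)
trainid <- sample(seq_len(NROW(data)), size = 667, replace = FALSE)
train <- data[trainid, ]
test <- data[-trainid, ]
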
library("ranger")
library("caret")

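# Out-of-bag confusion matrices for random forests with 50 and 500 trees
rf <- ranger(formula = f, data = train, num.trees = 50)
rf$confusion.matrix
rf <- ranger(formula = f, data = train, num.trees = 500)
rf$confusion.matrix

# Refit with 50 trees for the comparison with cforest
rf <- ranger(formula = f, data = train, num.trees = 50)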

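# cf above was fitted on the full data; refit on the training set only
# so that the test set stays unseen
cf_train <- cforest(formula = f, data = train, ntree = 50)
pred_cf <- predict(cf_train, newdata = test)
confusionMatrix(pred_cf, test$credit_risk)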

pred_rf <- predict(rf, data = test)$predictions
confusionMatrix(pred_rf, test$credit_risk)

# Variable importance: permutation-based for cforest, impurity-based for ranger
varimp(cf)
rf <- ranger(f, data = train, num.trees = 50, importance = "impurity")
importance(rf)
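The test-set comparison above relies on confusionMatrix from caret. As a rough sketch of the main quantity it reports, the accuracy can be computed in base R from a cross-tabulation of predicted and true classes. The label vectors `truth` and `pred` below are hypothetical and only illustrate the calculation; they are not output of the fitted forests.

```r
# Hypothetical true and predicted labels (illustrative only, not model output)
truth <- factor(c("good", "good", "bad", "bad",  "good", "bad"),
                levels = c("bad", "good"))
pred  <- factor(c("good", "bad",  "bad", "good", "good", "bad"),
                levels = c("bad", "good"))

# Cross-tabulate predictions (rows) against the reference classes (columns)
cm <- table(Prediction = pred, Reference = truth)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)
```

With real predictions, `pred` would be `pred_cf` or `pred_rf` and `truth` would be `test$credit_risk`.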



partykit documentation built on April 14, 2023, 5:09 p.m.