Exercise: South German Credit

file <- "data/german.rds"
stopifnot(file.exists(file))
data <- readRDS(file)
head(data)

In this exercise we will use the 'South German Credit' data set. It contains a classification of the credit risk of 1000 individuals into 'good' and 'bad' together with 20 additional attributes.

Download the file german.rds, or obtain the data from the corresponding homepage http://archive.ics.uci.edu/ml/datasets/South+German+Credit, which also provides more detailed information on the data set.

We can import/read this file using data <- readRDS(...). The file contains the following information:

The Tasks

We would like to find out how the credit risk of a person depends on the provided additional attributes of the person and the considered credit itself. By employing a tree model we are looking for a separation into homogeneous subgroups based on the additional information.

Our response in this case is the binary variable credit_risk; as covariates we have 20 additional variables (17 categorical, 3 numeric).
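The split of the covariates into categorical and numeric ones can be checked by tabulating the column classes. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical and not taken from the credit data); with the real data the same calls would be applied to `data`.

```r
# Hypothetical toy data mimicking the structure of the credit data:
# a factor response, two factor covariates and two numeric covariates.
toy <- data.frame(
  credit_risk = factor(c("good", "bad", "good", "good")),
  status      = factor(c("a", "b", "a", "c")),
  housing     = factor(c("own", "rent", "own", "rent")),
  duration    = c(24, 12, 36, 6),
  amount      = c(4000, 1500, 9000, 800)
)

# All columns except the response are covariates.
covariates <- setdiff(names(toy), "credit_risk")

# Flag numeric covariates; everything else is treated as categorical.
is_num <- vapply(toy[covariates], is.numeric, logical(1))
type_counts <- table(ifelse(is_num, "numeric", "categorical"))
print(type_counts)
```

For the credit data one would expect this tabulation to report 17 categorical and 3 numeric covariates, as stated above.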

Apply the CTree algorithm to build the tree models described in the following steps:

# data <- readRDS("data/german.rds")
formula <- credit_risk ~  status + duration + credit_history + purpose  + amount + savings + employment_duration + installment_rate + personal_status_sex + other_debtors + present_residence + property + age + other_installment_plans + housing + number_credits + job + people_liable + telephone + foreign_worker

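library("partykit")
# Tree with the default control settings
ct <- ctree(formula, data = data)
# Tree with a stricter significance level, larger terminal nodes and limited depth
ct <- ctree(formula, data = data, control = ctree_control(alpha = 0.04, minbucket = 15, maxdepth = 4))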

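library("caret")
# confusionMatrix() expects the predictions first and the observed classes as reference
caret::confusionMatrix(data = predict(ct, newdata = data), reference = data$credit_risk)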
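To see what the confusion matrix summarizes, it can help to build one by hand with base R. The sketch below uses made-up label vectors (`observed` and `predicted` are hypothetical, not results from the credit data) and computes the accuracy as the share of cases on the diagonal.

```r
# Hypothetical observed and predicted labels (not taken from the real data).
lev <- c("bad", "good")
observed  <- factor(c("good", "good", "bad", "bad",  "good", "bad"), levels = lev)
predicted <- factor(c("good", "bad",  "bad", "good", "good", "bad"), levels = lev)

# Cross-tabulate: rows are predicted classes, columns are observed classes.
cm <- table(predicted = predicted, observed = observed)
print(cm)

# Accuracy is the proportion of correctly classified cases (the diagonal).
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The off-diagonal cells are the two kinds of misclassification, which is why it matters which argument is treated as the prediction and which as the reference.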

newclient <- data.frame(status = "no checking account",
                        duration = 24,
                        credit_history = "no credits taken/all credits paid back duly",
                        purpose = "repairs",
                        amount = 4000,
                        savings = "unknown/no savings account",
                        employment_duration = "4 <= ... < 7 yrs",
                        installment_rate =  "25 <= ... < 35",
                        personal_status_sex = "male : married/widowed",
                        other_debtors = "none",
                        present_residence = "1 <= ... < 4 yrs",
                        property = "real estate",
                        age = 35,
                        other_installment_plans = "none",
                        housing = "own",
                        number_credits = "1",
                        job = "skilled employee/official",
                        people_liable = "0 to 2",
                        telephone = "no",
                        foreign_worker = "no"
                        )
predict(ct, newdata = newclient)

newclient2 <- newclient
newclient2$duration <- 12
predict(ct, newdata = newclient2)
plot(ct)

set.seed(4)
trainid <- sample(1:NROW(data), size = 667, replace = FALSE)
train <- data[trainid,]
test <- data[-trainid,]
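As a sanity check on the hold-out split, the index logic can be tried without the data itself. This is a minimal sketch that assumes only the 1000 rows mentioned in the introduction; `n` and `testid` are names introduced here for illustration.

```r
set.seed(4)
n <- 1000  # number of rows, as stated for the credit data

# Draw 667 distinct row indices for training; the rest form the test set.
trainid <- sample(seq_len(n), size = 667, replace = FALSE)
testid  <- setdiff(seq_len(n), trainid)

# The two index sets must be disjoint and together cover all rows.
stopifnot(length(intersect(trainid, testid)) == 0)
stopifnot(length(trainid) + length(testid) == n)
print(c(train = length(trainid), test = length(testid)))
```

Negative indexing with `-trainid`, as used above, selects exactly the rows in `testid`.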

ctrain <- ctree(formula, data = train)
predtest <- predict(ctrain, newdata = test)

library("caret")
# Predictions go first, the observed test classes are the reference
caret::confusionMatrix(data = predtest, reference = test$credit_risk)


ctrain <- ctree(formula, data = train, control = ctree_control(alpha = 0.01))
plot(ctrain)
predtest <- predict(ctrain, newdata = test)
caret::confusionMatrix(data = predtest, reference = test$credit_risk)


partykit documentation built on April 11, 2023, 6:12 p.m.