GermanCredit: Statlog German Credit
In evtree: Evolutionary Learning of Globally Optimal Trees


The dataset contains data of past credit applicants. The applicants are rated as good or bad. Models of this data can be used to determine if new applicants present a good or bad credit risk.

data("GermanCredit")

A data frame containing 1,000 observations on 21 variables.

status: factor variable indicating the status of the existing checking account, with levels ... < 0 DM, 0 <= ... < 200 DM, ... >= 200 DM/salary for at least 1 year and no checking account.
duration: duration in months.
credit_history: factor variable indicating credit history, with levels no credits taken/all credits paid back duly, all credits at this bank paid back duly, existing credits paid back duly till now, delay in paying off in the past and critical account/other credits existing.
purpose: factor variable indicating the credit's purpose, with levels car (new), car (used), furniture/equipment, radio/television, domestic appliances, repairs, education, retraining, business and others.
amount: credit amount.
savings: factor variable indicating savings account/bonds, with levels ... < 100 DM, 100 <= ... < 500 DM, 500 <= ... < 1000 DM, ... >= 1000 DM and unknown/no savings account.
employment_duration: ordered factor indicating the duration of the current employment, with levels unemployed, ... < 1 year, 1 <= ... < 4 years, 4 <= ... < 7 years and ... >= 7 years.
installment_rate: installment rate in percentage of disposable income.
personal_status_sex: factor variable indicating personal status and sex, with levels male:divorced/separated, female:divorced/separated/married, male:single, male:married/widowed and female:single.
other_debtors: factor variable indicating other debtors, with levels none, co-applicant and guarantor.
present_residence: time at the present residence ("present residence since" in the original data description).
property: factor variable indicating the client's highest valued property, with levels real estate, building society savings agreement/life insurance, car or other and unknown/no property.
age: client's age.
other_installment_plans: factor variable indicating other installment plans, with levels bank, stores and none.
housing: factor variable indicating housing, with levels rent, own and for free.
number_credits: number of existing credits at this bank.
job: factor indicating employment status, with levels unemployed/unskilled - non-resident, unskilled - resident, skilled employee/official and management/self-employed/highly qualified employee/officer.
people_liable: number of people the applicant is liable to provide maintenance for.
telephone: binary variable indicating if the customer has a registered telephone number.
foreign_worker: binary variable indicating if the customer is a foreign worker.
credit_risk: binary variable indicating credit risk, with levels good and bad.

The use of a cost matrix is suggested for this dataset. It is worse to classify a customer as good when they are bad (cost = 5) than to classify a customer as bad when they are good (cost = 1).
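
The cost matrix described above can be written down and applied to a confusion table as in the following minimal sketch. The costs 5 and 1 come from the text; the zero cost for correct classifications and the toy `actual` and `predicted` vectors are assumptions made for illustration and are not taken from the data.

```r
# Cost matrix: rows = actual class, columns = predicted class.
# Costs 5 and 1 are from the text; 0 for correct predictions is assumed.
cost <- matrix(c(0, 1,
                 5, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("good", "bad"),
                               predicted = c("good", "bad")))

# Hypothetical toy labels, for illustration only (not from GermanCredit).
actual    <- factor(c("good", "good", "bad", "bad", "good", "bad"),
                    levels = c("good", "bad"))
predicted <- factor(c("good", "bad", "good", "bad", "good", "good"),
                    levels = c("good", "bad"))

# Confusion table laid out like the cost matrix (actual in rows).
confusion <- table(actual = actual, predicted = predicted)

# Total misclassification cost: elementwise product of counts and costs.
total_cost <- sum(confusion * cost)

# One way to express the same 5:1 cost ratio as per-observation case weights.
case_weights <- ifelse(actual == "bad", 5, 1)
```

The confusion table must use the same row and column orientation as the cost matrix; a table built as `table(predicted, actual)` would need to be transposed before multiplying.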

The original data was provided by:

Professor Dr. Hans Hofmann, Institut fuer Statistik und Oekonometrie, Universitaet Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13

The dataset has been taken from the UCI Repository Of Machine Learning Databases at

http://archive.ics.uci.edu/ml/.

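library("evtree")
data("GermanCredit")
summary(GermanCredit)
## Not run: 
## case weights reflecting the 5:1 cost: bad customers get weight 5
gcw <- array(1, nrow(GermanCredit))
gcw[GermanCredit$credit_risk == "bad"] <- 5
## use the random number generator settings of R 3.5.0 before seeding
suppressWarnings(RNGversion("3.5.0"))
set.seed(1090)
## fit the tree on all predictors, using the case weights
gct <- evtree(credit_risk ~ ., data = GermanCredit, weights = gcw)
gct
## confusion table: predicted class in rows, observed class in columns
table(predict(gct), GermanCredit$credit_risk)
plot(gct)

## End(Not run)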