Statlog German Credit

Description

The dataset contains data on past credit applicants, each rated as a good or bad credit risk. Models fitted to this data can be used to assess whether new applicants present a good or bad credit risk.

Usage

data("GermanCredit")

Format

A data frame containing 1,000 observations on 21 variables.

status

factor variable indicating the status of the existing checking account, with levels ... < 100 DM, 0 <= ... < 200 DM, ... >= 200 DM/salary for at least 1 year and no checking account.

duration

duration in months.

credit_history

factor variable indicating credit history, with levels no credits taken/all credits paid back duly, all credits at this bank paid back duly, existing credits paid back duly till now, delay in paying off in the past and critical account/other credits existing.

purpose

factor variable indicating the credit's purpose, with levels car (new), car (used), furniture/equipment, radio/television, domestic appliances, repairs, education, retraining, business and others.

amount

credit amount.

savings

factor variable indicating the savings account/bonds, with levels ... < 100 DM, 100 <= ... < 500 DM, 500 <= ... < 1000 DM, ... >= 1000 DM and unknown/no savings account.

employment_duration

ordered factor indicating the duration of the current employment, with levels unemployed, ... < 1 year, 1 <= ... < 4 years, 4 <= ... < 7 years and ... >= 7 years.

installment_rate

installment rate in percentage of disposable income.

personal_status_sex

factor variable indicating personal status and sex, with levels male:divorced/separated, female:divorced/separated/married, male:single, male:married/widowed and female:single.

other_debtors

factor variable indicating other debtors, with levels none, co-applicant and guarantor.

present_residence

time at the present residence (original attribute: "present residence since").

property

factor variable indicating the client's highest valued property, with levels real estate, building society savings agreement/life insurance, car or other and unknown/no property.

age

client's age.

other_installment_plans

factor variable indicating other installment plans, with levels bank, stores and none.

housing

factor variable indicating housing, with levels rent, own and for free.

number_credits

number of existing credits at this bank.

job

factor indicating employment status, with levels unemployed/unskilled - non-resident, unskilled - resident, skilled employee/official and management/self-employed/highly qualified employee/officer.

people_liable

number of people for whom the client is liable to provide maintenance.

telephone

binary variable indicating if the customer has a registered telephone number.

foreign_worker

binary variable indicating if the customer is a foreign worker.

credit_risk

binary variable indicating credit risk, with levels good and bad.

Details

The use of a cost matrix is suggested for this dataset. It is worse to classify a customer as good when they are bad (cost = 5) than to classify a customer as bad when they are good (cost = 1).
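The cost matrix described above can be written down and applied to a confusion table as in the following minimal sketch. It uses base R only. The label vectors truth and predicted are hypothetical values made up for illustration and are not taken from GermanCredit; the zero cost for correct classifications is an assumption, as the text only states the two misclassification costs.

```r
# Cost matrix from the Details section: rows are the true class,
# columns are the predicted class. Correct decisions are assumed to cost 0.
cost <- matrix(c(0, 1,
                 5, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(truth = c("good", "bad"),
                               predicted = c("good", "bad")))

# Hypothetical labels, for illustration only (not from GermanCredit).
lev <- c("good", "bad")
truth     <- factor(c("good", "good", "bad", "bad", "good", "bad"), levels = lev)
predicted <- factor(c("good", "bad", "good", "bad", "good", "bad"), levels = lev)

# Confusion table with the same layout as the cost matrix.
conf <- table(truth = truth, predicted = predicted)

# Total misclassification cost: each cell count times its cost, summed.
total_cost <- sum(conf * cost)
total_cost
```

With real predictions, the same calculation should apply to a table of true versus predicted credit_risk, provided the rows and columns are arranged in the same order as the cost matrix.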

Source

The original data was provided by:

Professor Dr. Hans Hofmann, Institut fuer Statistik und Oekonometrie, Universitaet Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13

The dataset has been taken from the UCI Repository Of Machine Learning Databases at

http://archive.ics.uci.edu/ml/.

Examples

## the evtree package provides evtree(), which is called below
library("evtree")
data("GermanCredit")
summary(GermanCredit)
## Not run: 
## case weights: "bad" customers count 5 times as much as "good" ones
gcw <- array(1, nrow(GermanCredit))
gcw[GermanCredit$credit_risk == "bad"] <- 5
set.seed(1090)
gct <- evtree(credit_risk ~ ., data = GermanCredit, weights = gcw)
gct
table(predict(gct), GermanCredit$credit_risk)
plot(gct)

## End(Not run)