The dataset contains records of past credit applicants, each rated as a good or bad credit risk. Models fitted to these data can be used to assess whether new applicants present a good or bad credit risk.
A data frame containing 1,000 observations on 21 variables.
factor variable indicating the status of the existing checking account, with levels
... < 0 DM,
0 <= ... < 200 DM,
... >= 200 DM/salary for at least 1 year and
no checking account.
duration in months.
factor variable indicating credit history, with levels
no credits taken/all credits paid back duly,
all credits at this bank paid back duly,
existing credits paid back duly till now,
delay in paying off in the past and
critical account/other credits existing.
factor variable indicating the credit's purpose, with levels
factor variable indicating savings account/bonds, with levels
... < 100 DM,
100 <= ... < 500 DM,
500 <= ... < 1000 DM,
... >= 1000 DM and
unknown/no savings account.
ordered factor indicating the duration of the current employment, with levels
... < 1 year,
1 <= ... < 4 years,
4 <= ... < 7 years and
... >= 7 years.
installment rate in percentage of disposable income.
factor variable indicating personal status and sex, with levels
factor variable indicating other debtors, with levels
present residence since?
factor variable indicating the client's highest valued property, with levels
building society savings agreement/life insurance,
car or other and
factor variable indicating other installment plans, with levels
factor variable indicating housing, with levels
number of existing credits at this bank.
factor variable indicating employment status, with levels
unemployed/unskilled - non-resident,
unskilled - resident,
skilled employee/official and
management/self-employed/highly qualified employee/officer.
number of people for whom the applicant is liable to provide maintenance.
binary variable indicating if the customer has a registered telephone number.
binary variable indicating if the customer is a foreign worker.
binary variable indicating credit risk, with levels good and bad.
The use of a cost matrix is suggested for this dataset: classifying a customer as good when they are bad (cost = 5) is worse than classifying a customer as bad when they are good (cost = 1).
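As a minimal sketch of how the suggested costs might be applied, the Python snippet below sums misclassification costs over a set of predictions and derives the probability cutoff those costs imply. The function names (total_cost, bad_risk_threshold, classify) are hypothetical, and the zero cost for correct classifications is an assumption, since only the two misclassification costs are stated above.

```python
# Misclassification costs keyed by (true class, predicted class).
# The costs 5 and 1 come from the text; the zero cost for correct
# decisions is an assumption made for this sketch.
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,   # good customer classed as bad
    ("bad", "good"): 5,   # bad customer classed as good
    ("bad", "bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification costs over paired true/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


def bad_risk_threshold():
    """Probability of 'bad' above which predicting 'bad' has lower expected cost.

    Predicting 'good' costs p_bad * 5 in expectation, predicting 'bad'
    costs (1 - p_bad) * 1, so 'bad' is cheaper when p_bad > 1 / (1 + 5).
    """
    c_bad_as_good = COST[("bad", "good")]
    c_good_as_bad = COST[("good", "bad")]
    return c_good_as_bad / (c_good_as_bad + c_bad_as_good)


def classify(p_bad):
    """Pick the label with the lower expected cost for a given P(bad)."""
    return "bad" if p_bad > bad_risk_threshold() else "good"


if __name__ == "__main__":
    actual = ["good", "bad", "bad", "good"]
    predicted = ["good", "good", "bad", "bad"]
    print(total_cost(actual, predicted))
    print(bad_risk_threshold())
    print(classify(0.2), classify(0.1))
```

Under these costs the cutoff falls at 1/6 rather than 1/2, so an applicant with an estimated 20% chance of being a bad risk would already be classed as bad.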
The original data was provided by:
Professor Dr. Hans Hofmann, Institut fuer Statistik und Oekonometrie, Universitaet Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13
The dataset has been taken from the UCI Repository Of Machine Learning Databases at