bank | R Documentation
Direct marketing campaigns (phone calls) of a Portuguese banking institution aimed at getting clients to subscribe to a term deposit.
data(bank)
The data contains 41188 observations and 19 variables. See the UCI Machine Learning Repository for details.
The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:
- the variable duration has been dropped in order to learn a realistic predictive model, since the duration of a call is not known before the call is performed;
- the variable pdays has been dropped because it is not defined for the vast majority of samples (it is coded as 999 for clients who were not previously contacted);
- observations where loan is "unknown" have been dropped because the corresponding regression coefficient estimated by glm() is NA;
- the three observations where default is "yes" have been dropped to avoid errors in cross-validation: if all three of them fall in the test fold, the model fitted on the training folds has no coefficient for that level and cannot compute predictions for them.
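The cross-validation failure behind the last exclusion can be reproduced on a toy data frame. This is a sketch with made-up data (the data frames train and test are hypothetical); it shows that predict() on a glm() fit stops with an error when the new data contain a factor level that was absent from the training fold.

```r
# Toy data (hypothetical): the level "yes" of default occurs only in the
# test fold, as when all three "yes" observations end up there.
train = data.frame(
  subscribed = factor(c("no", "yes", "no", "yes", "no", "yes")),
  default = factor(c("no", "unknown", "no", "no", "unknown", "unknown"))
)
test = data.frame(default = factor(c("yes", "no")))

# logistic regression fitted on the training fold only
fit = glm(subscribed ~ default, data = train, family = binomial)

# predict() has no coefficient for "yes" and stops with an error,
# which is captured here as a character string
result = tryCatch(
  predict(fit, newdata = test, type = "response"),
  error = function(e) conditionMessage(e)
)
print(result)
```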
In that paper, subscribed is the response variable, age is the sensitive attribute and the remaining variables are used as predictors.
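The exclusions listed above, followed by the split into response, sensitive attribute and predictors, can be sketched as a small helper. This is a minimal sketch, not the package's own pre-processing code: the function preprocess_bank() is hypothetical, and it assumes a raw data frame laid out like the UCI file (dotted column names such as emp.var.rate and a response column named y), for instance as read with read.csv("bank-additional-full.csv", sep = ";", stringsAsFactors = TRUE). To stay self-contained, it runs on a four-row toy frame.

```r
# Hypothetical helper: applies the exclusions listed above to a data frame
# laid out like the raw UCI file (dotted column names, response "y").
preprocess_bank = function(raw) {
  # use the underscore names of the documented data set
  names(raw) = gsub(".", "_", names(raw), fixed = TRUE)
  names(raw)[names(raw) == "y"] = "subscribed"
  # drop duration (unknown before the call) and pdays (mostly undefined)
  raw = raw[, setdiff(names(raw), c("duration", "pdays")), drop = FALSE]
  # drop rows with an unknown loan status and the rare default == "yes" rows
  raw = raw[raw$loan != "unknown" & raw$default != "yes", , drop = FALSE]
  # remove the now-empty factor levels
  droplevels(raw)
}

# four-row toy frame standing in for the raw UCI data
raw = data.frame(
  age = c(30, 45, 52, 61),
  default = factor(c("no", "unknown", "yes", "no")),
  loan = factor(c("no", "yes", "no", "unknown")),
  duration = c(120, 300, 45, 80),
  pdays = c(999, 999, 6, 999),
  emp.var.rate = c(1.1, -1.8, 1.4, -0.1),
  y = factor(c("no", "yes", "no", "no"))
)
bank_toy = preprocess_bank(raw)
str(bank_toy)

# the roles used by Zafar et al. (2019)
r = bank_toy[, "subscribed"]
s = bank_toy[, "age"]
p = bank_toy[, setdiff(names(bank_toy), c("subscribed", "age"))]
```

Calling droplevels() at the end matters: subsetting rows keeps the emptied levels (such as loan = "unknown"), and an unused level would otherwise still be carried into the model.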
The data contains the following variables:
- age, a numeric variable;
- job, a factor with 12 levels ranging from "blue-collar" to "services";
- marital, a factor with levels "divorced", "married", "single" and "unknown";
- education, a factor with 8 levels ranging from "basic.4y" to "university.degree";
- default, a factor with levels "no" and "unknown";
- housing, a factor with levels "yes" and "no";
- loan, a factor with levels "yes" and "no";
- contact, a factor with levels "cellular" and "telephone";
- month, a factor with 12 levels for the months of the year;
- day_of_week, a factor with 7 levels for the days of the week;
- campaign, the number of contacts performed during this campaign;
- previous, the number of contacts performed before this campaign;
- poutcome, the outcome of the previous campaign, a factor with levels "failure", "nonexistent" and "success";
- emp_var_rate, the (numeric) quarterly employment variation rate;
- cons_price_idx, the (numeric) monthly consumer price index;
- cons_conf_idx, the (numeric) monthly consumer confidence index;
- euribor3m, the (numeric) Euribor 3-month rate;
- nr_employed, a numeric variable with the number of employees in the company in that quarter;
- subscribed, a factor with levels "yes" and "no".
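Since most predictors are factors, the model matrix built by glm() is wider than the variable list suggests: with R's default treatment contrasts, a factor with k levels contributes k - 1 indicator columns. The sketch below counts the columns for the 17 predictors other than age (the split used in the example), assuming the level counts listed above match the factors shipped with the data, and cross-checks the count with model.matrix() on a synthetic data frame of the same shape.

```r
# Level counts as documented above (assumed to match the shipped factors).
factor_levels = c(job = 12, marital = 4, education = 8, default = 2,
                  housing = 2, loan = 2, contact = 2, month = 12,
                  day_of_week = 7, poutcome = 3)
numeric_vars = c("campaign", "previous", "emp_var_rate", "cons_price_idx",
                 "cons_conf_idx", "euribor3m", "nr_employed")

# intercept + (k - 1) indicators per factor + one column per numeric variable
n_cols = 1 + sum(factor_levels - 1) + length(numeric_vars)
print(n_cols)  # 52

# cross-check with model.matrix() on a synthetic data frame of the same shape
n = 12
toy = as.data.frame(lapply(factor_levels, function(k)
  factor(rep_len(seq_len(k), n), levels = seq_len(k))))
for (v in numeric_vars) toy[[v]] = rnorm(n)
stopifnot(ncol(model.matrix(~ ., data = toy)) == n_cols)
```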
UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/bank+marketing
library(fairml)
data(bank)
# remove observations with unknown loan status; the corresponding coefficient is NA in glm().
bank = bank[bank$loan != "unknown", ]
# short-hand variable names.
r = bank[, "subscribed"]
s = bank[, "age"]
p = bank[, setdiff(names(bank), c("subscribed", "age"))]
# fit the model of Zafar et al. (2019), allowing an unfairness of 0.05.
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)
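Zafar et al. (2019) quantify unfairness through the covariance between the sensitive attribute z and the signed distance from the decision boundary, which for a linear model is the linear predictor θᵀx; its empirical estimate is (1/N) Σ (z_i − z̄) θᵀx_i. The sketch below computes that quantity for an unconstrained logistic regression on simulated data. The simulation and variable names are hypothetical, and the sketch does not reproduce how zlrm() translates its unfairness argument into a constraint.

```r
set.seed(42)
n = 1000
# hypothetical sensitive attribute (an "age") and a predictor correlated with it
age = rnorm(n, mean = 40, sd = 10)
x1 = rnorm(n) + 0.05 * (age - 40)
y = rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x1))

# unconstrained logistic regression that does not use age as a predictor
fit = glm(y ~ x1, family = binomial)
eta = predict(fit, type = "link")  # theta' x for each observation

# empirical decision boundary covariance of Zafar et al.
boundary_cov = function(z, d) mean((z - mean(z)) * d)
print(boundary_cov(age, eta))
```

Because x1 carries information about age, the covariance is clearly positive even though age is not a predictor; Zafar et al. bound its absolute value while fitting, which is what makes the constrained model less dependent on the sensitive attribute.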