synthetic: Synthetic Data Set to Test Fair Models
In fairml: Fair Models in Machine Learning

synthetic data sets

R Documentation

Synthetic Data Set to Test Fair Models

Description

Synthetic data set used as test cases in the fairml package.

Usage

data(vu.test)

Format

The data are stored a list with following three elements:

gaussian, binomial, poisson, coxph and multinomial are response variables for the different families;
X, a numeric matrix containing 3 predictors called X1, X2 and X3;
S, a numeric matrix containing 3 sensitive attributes called S1, S2 and S3.

Note

This data set is called vu.test because it is generated from very unfair models in which sensitive attributes explain the lion's share of the overall explained variance or deviance.

The code used to generate the predictors and the sensitive attributes is as follows.

library(mvtnorm)
sigma = matrix(0.3, nrow = 6, ncol = 6)
diag(sigma) = 1
n = 1000
X = rmvnorm(n, mean = rep(0, 6), sigma = sigma)
S = X[, 4:6]
X = X[, 1:3]
colnames(X) = c("X1", "X2", "X3")
colnames(S) = c("S1", "S2", "S3")

The continuous response in gaussian is produced as follows.

gaussian = 2 + 2 * X[, 1] + 3 * X[, 2] + 4 * X[, 3] + 5 * S[, 1] +
               6 * S[, 2] + 7 * S[, 3] + rnorm(n, sd = 10)

The discrete response in binomial is produced as follows.

nu = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] + 0.8 * S[, 1] +
         0.9 * S[, 2] + 1.0 * S[, 3]
binomial = rbinom(n = nrow(X), size = 1, prob = exp(nu) / (1 + exp(nu)))
binomial = as.factor(binomial)

The log-linear response in poisson is produced as follows.

nu = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] + 0.8 * S[, 1] +
         0.9 * S[, 2] + 1.0 * S[, 3]
poisson = rpois(n = nrow(X), lambda = exp(nu))

The response for the Cox proportional hazards coxph is produced as follows.

fx = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] + 0.8 * S[, 1] +
         0.9 * S[, 2] + 1.0 * S[, 3]
hx = exp(fx)
ty = rexp(length(fx), hx)
tcens = rbinom(n = length(fx), prob = 0.3, size = 1)
coxph = cbind(time = ty, status = 1 - tcens)

The discrete response in multinomial is produced as follows.

nu1 = 1 + 0.5 * X[, 1] + 0.6 * X[, 2] + 0.7 * X[, 3] + 0.8 * S[, 1] +
          0.9 * S[, 2] + 1.0 * S[, 3]
nu2 = 1 + 0.2 * X[, 1] + 0.2 * X[, 2] + 0.2 * X[, 3] + 0.6 * S[, 1] +
          0.6 * S[, 2] + 0.6 * S[, 3]
nu3 = 1 + 0.7 * X[, 1] + 0.6 * X[, 2] + 0.5 * X[, 3] + 0.1 * S[, 1] +
          0.1 * S[, 2] + 0.1 * S[, 3]
nu4 = 1 + 0.4 * X[, 1] + 0.4 * X[, 2] + 0.4 * X[, 3] + 0.4 * S[, 1] +
          0.4 * S[, 2] + 0.4 * S[, 3]
norm = exp(nu1) + exp(nu2) + exp(nu3) + exp(nu4)
probs = matrix(c(exp(nu1) / norm, exp(nu2) / norm,
                 exp(nu3) / norm, exp(nu4) / norm),
        ncol = 4, byrow = FALSE)
multinomial = apply(probs, MARGIN = 1,
                function(x) sample(letters[1:4], size = 1, prob = x))
multinomial = factor(multinomial, labels = letters[1:4])

Author(s)

Marco Scutari

Examples

summary(fgrrm(response = vu.test$gaussian, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "gaussian"))
summary(fgrrm(response = vu.test$binomial, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "binomial"))
summary(fgrrm(response = vu.test$poisson, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "poisson"))
summary(fgrrm(response = vu.test$coxph, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "cox"))
summary(fgrrm(response = vu.test$multinomial, predictors = vu.test$X,
  sensitive = vu.test$S, unfairness = 1, family = "multinomial"))

fairml documentation built on June 8, 2025, 11:38 a.m.