In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

h2o.infogram_train_subset_models

R Documentation

Train models over subsets selected using infogram

Description

Train models on subsets of features selected using an infogram, and summarize the intersectional fairness of each model on the test frame.
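The selection step can be pictured roughly as follows. This is a minimal base-R sketch, not the h2o implementation. The page only says that the features are sorted using the `feature_selection_metrics` columns and a stats::dist `metric`. This sketch makes three assumptions: a single metric is sorted from high to low, several metrics are combined through their distance from an ideal point of all ones, and models are trained on nested prefixes of the ranking. The data frame `scores` and the helper `rank_features` are hypothetical.

```r
# Hypothetical admissible scores, shaped like an infogram's admissible_score
# (illustrative values; check the columns of your own ig@admissible_score)
scores <- data.frame(
  column          = c("PAY_0", "LIMIT_BAL", "BILL_AMT1", "AGE"),
  relevance_index = c(1.00, 0.40, 0.20, 0.10),
  safety_index    = c(0.30, 0.90, 0.60, 0.15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank features by one or more admissibility metrics
rank_features <- function(scores, metrics = "safety_index", method = "euclidean") {
  m <- as.matrix(scores[metrics])
  if (length(metrics) == 1) {
    # Single metric: higher score first
    return(scores$column[order(m[, 1], decreasing = TRUE)])
  }
  # Several metrics: distance of each feature from the assumed ideal point
  # c(1, 1, ...), computed with stats::dist; smaller distance first
  d <- as.matrix(stats::dist(rbind(ideal = 1, m), method = method))[1, -1]
  scores$column[order(d)]
}

ranked <- rank_features(scores, c("safety_index", "relevance_index"))

# Nested candidate subsets: top-1, top-2, ..., all ranked features
subsets <- lapply(seq_along(ranked), function(k) ranked[seq_len(k)])
```

In this sketch, the `method` argument plays the role that `metric` plays in the real function. Other stats::dist methods, such as "manhattan", can produce a different order.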

Usage

h2o.infogram_train_subset_models(
  ig,
  model_fun,
  training_frame,
  test_frame,
  y,
  protected_columns,
  reference,
  favorable_class,
  feature_selection_metrics = c("safety_index"),
  metric = "euclidean",
  air_metric = "selectedRatio",
  alpha = 0.05,
  ...
)

Arguments

`ig`	An infogram object trained with the same protected columns.
`model_fun`	Function that creates the models, for example h2o.automl or h2o.gbm.
`training_frame`	Training frame
`test_frame`	Test frame
`y`	Response column
`protected_columns`	Protected columns
`reference`	List of values defining the reference group, one value for each protected column. If set to NULL, the biggest group is used as the reference.
`favorable_class`	Positive/favorable outcome class of the response.
`feature_selection_metrics`	One or more columns from the infogram's admissible_score slot (ig@admissible_score) used to select the features.
`metric`	Distance metric supported by stats::dist, used to sort the features.
`air_metric`	Metric used for the Adverse Impact Ratio calculation. Defaults to "selectedRatio".
`alpha`	Significance level: the probability of rejecting the null hypothesis that the protected group and the reference come from the same population when that null hypothesis is true.
`...`	Additional parameters passed to model_fun.
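`model_fun` is called once for each feature subset, and any extra arguments in `...` are forwarded to it. The sketch below imitates that contract in base R. The stand-in `fit_glm` is hypothetical: it takes the same x, y and training_frame arguments as h2o.gbm, but it fits a stats::glm logistic regression on mtcars. The exact call that h2o makes internally is an assumption.

```r
# Hypothetical stand-in for model_fun: same (x, y, training_frame, ...) shape
# as h2o.gbm, but fits a base-R logistic regression instead
fit_glm <- function(x, y, training_frame, ...) {
  glm(reformulate(x, response = y), data = training_frame,
      family = binomial(), ...)
}

# Feature subsets, e.g. prefixes of an infogram-based ranking
subsets <- list("wt", c("wt", "hp"))

# One model per subset; `control` travels through `...` to glm()
models <- lapply(subsets, fit_glm, y = "am", training_frame = mtcars,
                 control = glm.control(maxit = 50))
```

With h2o, the same pattern corresponds to passing, for example, `ntrees = 50` through `...` so that it reaches every h2o.gbm call.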

Value

A frame containing aggregated intersectional fairness metrics for each of the trained models.
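The per-model fairness summary is built from group-wise Adverse Impact Ratios. The base-R sketch below shows one way to compute them from predictions, under these assumptions:

- Groups are intersections of the protected columns.
- `selectedRatio` is the share of a group that is predicted as the favorable class.
- AIR divides that share by the reference group's share.
- `air_min` and `air_max` are the extremes over the non-reference groups.

The helper `air_table`, the toy predictions, and the use of Fisher's exact test for the `alpha` check are all hypothetical. h2o's exact definitions may differ.

```r
# Hypothetical helper: intersectional Adverse Impact Ratio (AIR) per group,
# using selectedRatio = share of the group predicted as the favorable class.
# Expects a "predict" column, as in h2o prediction frames.
air_table <- function(df, protected_columns, favorable_class,
                      reference = NULL, alpha = 0.05) {
  # Intersectional group label, e.g. "1_2" for SEX = "1", EDUCATION = "2"
  group <- do.call(paste, c(df[protected_columns], sep = "_"))
  fav <- df$predict == favorable_class
  counts <- table(group)
  # Mirrors the documented default: NULL means "use the biggest group"
  ref <- if (is.null(reference)) {
    names(which.max(counts))
  } else {
    paste(reference[protected_columns], collapse = "_")
  }
  sr <- tapply(fav, group, mean)
  selected_ratio <- setNames(as.vector(sr), names(sr))
  air <- selected_ratio / selected_ratio[[ref]]
  # Illustrative test of each group against the reference at level alpha;
  # Fisher's exact test is an assumption, not necessarily what h2o uses
  p_value <- unname(vapply(names(selected_ratio), function(g) {
    if (g == ref) return(NA_real_)
    keep <- group %in% c(g, ref)
    tab <- table(factor(group[keep] == g, levels = c(FALSE, TRUE)),
                 factor(fav[keep], levels = c(FALSE, TRUE)))
    fisher.test(tab)$p.value
  }, numeric(1)))
  data.frame(group = names(selected_ratio), n = as.vector(counts),
             selected_ratio = unname(selected_ratio), air = unname(air),
             is_reference = names(selected_ratio) == ref,
             p_value = p_value,
             significant = !is.na(p_value) & p_value < alpha,
             stringsAsFactors = FALSE)
}

# Toy predictions (made-up values for illustration)
preds <- data.frame(
  SEX       = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2"),
  EDUCATION = c("2", "2", "2", "2", "2", "1", "1", "2", "2", "2", "2"),
  predict   = c("0", "0", "0", "0", "1", "0", "1", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)
res <- air_table(preds, c("SEX", "EDUCATION"), favorable_class = "0")

# Per-model aggregation over the non-reference groups (assumed definition)
others <- res[!res$is_reference, ]
agg <- c(air_min = min(others$air), air_max = max(others$air))
```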

Examples

## Not run: 
library(h2o)
h2o.init()  # start, or connect to, a local H2O cluster
data <- h2o.importFile(paste0("https://s3.amazonaws.com/h2o-public-test-data/smalldata/",
                              "admissibleml_test/taiwan_credit_card_uci.csv"))
x <- c('LIMIT_BAL', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1',
       'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2',
       'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6')
y <- "default payment next month"
protected_columns <- c('SEX', 'EDUCATION')

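# Treat the response and the protected columns as categorical
for (col in c(y, protected_columns)) {
  data[[col]] <- as.factor(data[[col]])
}

splits <- h2o.splitFrame(data, ratios = 0.8, seed = 1234)  # 80/20 split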
train <- splits[[1]]
test <- splits[[2]]
reference <- c(SEX = "1", EDUCATION = "2")  # university educated man
favorable_class <- "0" # no default next month

ig <- h2o.infogram(x, y, train, protected_columns = protected_columns)
print(ig@admissible_score)
plot(ig)

infogram_models <- h2o.infogram_train_subset_models(ig, h2o.gbm, train, test, y,
                                                    protected_columns, reference,
                                                    favorable_class)

pf <- h2o.pareto_front(infogram_models, x_metric = "air_min",
                       y_metric = "AUC", optimum = "top right")
plot(pf)
pf@pareto_front

## End(Not run)
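h2o.pareto_front with optimum = "top right" keeps the models that no other model beats on both plotted metrics at once. The following is a minimal base-R sketch of that idea, assuming that both metrics are to be maximized. The helper `pareto_top_right` and the numbers in `res` are hypothetical and are not output from the example above.

```r
# Hypothetical per-model results (made-up numbers for illustration only)
res <- data.frame(
  model   = paste0("gbm_", 1:5),
  air_min = c(0.95, 0.80, 0.85, 0.70, 0.90),
  AUC     = c(0.70, 0.78, 0.76, 0.77, 0.74),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows not dominated by any other row when both
# metrics are maximised ("top right" optimum)
pareto_top_right <- function(df, x, y) {
  keep <- vapply(seq_len(nrow(df)), function(i) {
    # Row j dominates row i if it is at least as good on both metrics
    # and strictly better on at least one
    dominated <- df[[x]] >= df[[x]][i] & df[[y]] >= df[[y]][i] &
                 (df[[x]] > df[[x]][i] | df[[y]] > df[[y]][i])
    !any(dominated)
  }, logical(1))
  df[keep, ]
}

front <- pareto_top_right(res, "air_min", "AUC")
```

Here gbm_4 is dropped because gbm_2 has both a higher air_min and a higher AUC. The remaining models trade fairness against accuracy.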

h2o documentation built on May 29, 2024, 4:26 a.m.