hmda.partition: Partition Data for HMDA Analysis
In HMDA: Holistic Multimodel Domain Analysis for Exploratory Machine Learning

Description

Partitions a data frame into training, testing, and optionally validation sets, and uploads each set to a local H2O server. If an outcome column y is provided and is a factor or character vector, a stratified split is used; otherwise, a random split is performed. The proportions of all requested sets must sum to 1.

Usage

hmda.partition(
  df,
  y = NULL,
  train = 0.8,
  test = 0.2,
  validation = NULL,
  seed = 2025
)

Arguments

`df`	A data frame to partition.
`y`	Optional string with the name of the outcome column. Must match a column in `df`. Default is `NULL`, in which case a random split is performed.
`train`	A numeric value for the proportion of the training set. Default is 0.8.
`test`	A numeric value for the proportion of the testing set. Default is 0.2.
`validation`	Optional numeric value for the proportion of the validation set. Default is `NULL`. If specified, train + test + validation must equal 1.
`seed`	A numeric seed for reproducibility. Default is 2025.
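
The sum-to-1 requirement on the proportions can be illustrated with a small base-R check. This is only a sketch: `check_proportions()` is a hypothetical helper, not a function exported by HMDA, and the package's own validation may differ. It compares the sum with a numeric tolerance because sums such as 0.7 + 0.15 + 0.15 are not guaranteed to equal 1 exactly in floating-point arithmetic.

```r
# Hypothetical helper (not part of HMDA): checks that split proportions sum to 1.
check_proportions <- function(train, test, validation = NULL) {
  # c() drops NULL, so `validation` only enters the sum when it is supplied
  props <- c(train = train, test = test, validation = validation)

  # all.equal() compares with a tolerance instead of exact equality
  if (!isTRUE(all.equal(sum(props), 1))) {
    stop("proportions must sum to 1, got ", sum(props))
  }
  invisible(TRUE)
}

check_proportions(train = 0.8, test = 0.2)
check_proportions(train = 0.7, test = 0.15, validation = 0.15)
```

A call such as `check_proportions(train = 0.8, test = 0.3)` stops with an error, since the proportions sum to 1.1.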

Details

This function uses the splitTools package to perform the partition. When y is provided and is a factor or character, a stratified split is performed to preserve class proportions. Otherwise, a basic random split is used. The partitions are then converted to H2O frames using h2o::as.h2o().
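
To make the difference between the two split types concrete, the following base-R sketch draws row indices either at random or within each class of the outcome. It is an illustration of the idea only: HMDA delegates this step to splitTools, and `split_indices()` and its argument `p` are hypothetical names introduced here, so the exact rows selected will not match the package's output.

```r
# Illustrative base-R sketch; HMDA itself delegates this step to splitTools.
# `split_indices()` and `p` are hypothetical names, not part of the package.
split_indices <- function(df, y = NULL, p = c(train = 0.8, test = 0.2),
                          seed = 2025) {
  set.seed(seed)

  # Shuffle the row indices in `idx` and cut them into consecutive
  # chunks whose sizes follow the proportions in `p`
  assign_rows <- function(idx) {
    idx <- idx[sample.int(length(idx))]
    cuts <- round(cumsum(p) * length(idx))
    labels <- rep(names(p), times = diff(c(0, cuts)))
    split(idx, factor(labels, levels = names(p)))
  }

  # Stratify only when the outcome is a factor or character column
  stratify <- !is.null(y) && (is.factor(df[[y]]) || is.character(df[[y]]))
  if (!stratify) {
    return(assign_rows(seq_len(nrow(df))))
  }

  # Split within each class, then merge the per-class pieces by partition
  per_class <- lapply(split(seq_len(nrow(df)), df[[y]]), assign_rows)
  lapply(setNames(names(p), names(p)), function(part) {
    sort(unlist(lapply(per_class, `[[`, part), use.names = FALSE))
  })
}

parts <- split_indices(iris, y = "Species",
                       p = c(train = 0.7, test = 0.15, validation = 0.15))
lengths(parts)                       # rows per partition
table(iris$Species[parts$train])     # class counts in the training rows
```

Because the stratified branch splits each class separately, every partition keeps roughly the class proportions of the full data, which a purely random split does not guarantee for small or imbalanced samples.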

Value

A named list containing the partitioned data frames and their corresponding H2O frames:

hmda.train: Training set (data frame).
hmda.test: Testing set (data frame).
hmda.validation: Validation set (data frame), if a validation proportion was specified.
hmda.train.hex: Training set as an H2O frame.
hmda.test.hex: Testing set as an H2O frame.
hmda.validation.hex: Validation set as an H2O frame, if a validation proportion was specified.

Author(s)

E. F. Haghish

Examples

## Not run: 
  # Assumes the HMDA package is installed and a local H2O server is reachable
  library(HMDA)

  # Example: Random split (80% train, 20% test) using iris data
  data(iris)
  splits <- hmda.partition(
              df = iris,
              train = 0.8,
              test = 0.2,
              seed = 2025
            )
  train_data <- splits$hmda.train
  test_data  <- splits$hmda.test

  # Example: Stratified split (70% train, 15% test, 15% validation)
  # using iris data, stratified by Species
  splits_strat <- hmda.partition(
                     df = iris,
                     y = "Species",
                     train = 0.7,
                     test = 0.15,
                     validation = 0.15,
                     seed = 2025
                   )
  train_strat <- splits_strat$hmda.train
  test_strat  <- splits_strat$hmda.test
  valid_strat <- splits_strat$hmda.validation

## End(Not run)

HMDA documentation built on April 4, 2025, 6:06 a.m.