hmda.partition: Partition Data for HMDA Analysis

hmda.partition    R Documentation

Description

Partition a data frame into training, testing, and optionally validation sets, and upload each set to a local H2O server. If an outcome column y is provided and is a factor or character, a stratified split is used; otherwise, a random split is performed. The proportions train, test, and (if specified) validation must sum to 1.

Usage

hmda.partition(
  df,
  y = NULL,
  train = 0.8,
  test = 0.2,
  validation = NULL,
  seed = 2025
)

Arguments

df

A data frame to partition.

y

A string with the name of the outcome column. Must match a column in df. Default is NULL, in which case a random split is performed.

train

A numeric value for the proportion of the training set. Default is 0.8.

test

A numeric value for the proportion of the testing set. Default is 0.2.

validation

Optional numeric value for the proportion of the validation set. Default is NULL. If specified, train + test + validation must equal 1.

seed

A numeric seed for reproducibility. Default is 2025.

Details

This function uses the splitTools package to perform the partition. When y is provided and is a factor or character, a stratified split is performed to preserve class proportions. Otherwise, a basic random split is used. The partitions are then converted to H2O frames using h2o::as.h2o().
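The splitting step described above can be sketched with splitTools directly. This is a minimal illustration, not the function's actual source: it assumes the split is delegated to splitTools::partition(), and the variable names (p, idx, train_df, and so on) are hypothetical. The upload to H2O is left out so that the sketch runs without a server.

```r
library(splitTools)

data(iris)

# Proportions for each partition; the names become the names of the result.
p <- c(train = 0.7, test = 0.15, validation = 0.15)

# Compare with a tolerance, since the proportions are floating-point values.
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Stratified split on a factor outcome: returns a named list of row indices.
idx <- partition(iris$Species, p = p, type = "stratified", seed = 2025)

train_df <- iris[idx$train, ]
test_df  <- iris[idx$test, ]
valid_df <- iris[idx$validation, ]

# Class proportions in the training set, for comparison with the full data.
prop.table(table(train_df$Species))
prop.table(table(iris$Species))

# Basic random split, as used when no categorical outcome is given.
# With type = "basic", only the length of the first argument matters.
idx_basic <- partition(seq_len(nrow(iris)),
                       p = c(train = 0.8, test = 0.2),
                       type = "basic", seed = 2025)

# With a running H2O server, each data frame could then be uploaded with
# h2o::as.h2o(), e.g. h2o::as.h2o(train_df).
```

Comparing the two prop.table() outputs shows how closely the stratified training set preserves the class proportions of the full data.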

Value

A named list containing the partitioned data frames and their corresponding H2O frames:

hmda.train

Training set (data frame).

hmda.test

Testing set (data frame).

hmda.validation

Validation set (data frame), if validation is specified.

hmda.train.hex

Training set as an H2O frame.

hmda.test.hex

Testing set as an H2O frame.

hmda.validation.hex

Validation set as an H2O frame, if validation is specified.

Author(s)

E. F. Haghish

Examples

## Not run: 
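  # Assumes the HMDA and h2o packages are installed. The partitions are
  # uploaded to a local H2O server, so one is started here first.
  library(HMDA)
  h2o::h2o.init()

  # Example: Random split (80% train, 20% test) using iris data
  data(iris)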
  splits <- hmda.partition(
              df = iris,
              train = 0.8,
              test = 0.2,
              seed = 2025
            )
  train_data <- splits$hmda.train
  test_data  <- splits$hmda.test

  # Example: Stratified split (70% train, 15% test, 15% validation)
  # using iris data, stratified by Species
  splits_strat <- hmda.partition(
                     df = iris,
                     y = "Species",
                     train = 0.7,
                     test = 0.15,
                     validation = 0.15,
                     seed = 2025
                   )
  train_strat <- splits_strat$hmda.train
  test_strat  <- splits_strat$hmda.test
  valid_strat <- splits_strat$hmda.validation

## End(Not run)


HMDA documentation built on April 4, 2025, 6:06 a.m.