scale_data: Standardize Training and Application Data

View source: R/data_preprocessing.R

scale_data    R Documentation

Standardize Training and Application Data

Description

This function standardizes the numeric columns of train_data to zero mean and unit standard deviation, then applies the same scaling (the means and standard deviations computed from train_data) to the corresponding columns of apply_data. It returns both standardized datasets together with the scaling parameters. Standardization is particularly important for neural network approaches, which tend to become numerically unstable and perform worse on unscaled inputs.
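The idea can be illustrated with a minimal base R sketch. The helper standardize_sketch and the toy data frames below are hypothetical and are not the ubair implementation; they only mirror the behaviour described above, under the assumption that non-numeric columns are left unchanged.

```r
# Hypothetical helper for illustration only; not the ubair source code.
standardize_sketch <- function(train_data, apply_data) {
  # Select the numeric columns of the training data
  num_cols <- names(train_data)[vapply(train_data, is.numeric, logical(1))]

  # Scaling parameters are estimated from the training data only
  means <- vapply(train_data[num_cols], mean, numeric(1), na.rm = TRUE)
  sds <- vapply(train_data[num_cols], sd, numeric(1), na.rm = TRUE)

  for (col in num_cols) {
    train_data[[col]] <- (train_data[[col]] - means[[col]]) / sds[[col]]
    # Reuse the training parameters for the apply data (assumed behaviour)
    if (col %in% names(apply_data)) {
      apply_data[[col]] <- (apply_data[[col]] - means[[col]]) / sds[[col]]
    }
  }

  list(train = train_data, apply = apply_data, means = means, sds = sds)
}

# Toy data: one numeric column and one non-numeric column
toy_train <- data.frame(temp = c(10, 12, 14, 16), site = c("a", "b", "a", "b"))
toy_apply <- data.frame(temp = c(13, 19), site = c("a", "b"))
res <- standardize_sketch(toy_train, toy_apply)
```

Because the parameters come from the training data alone, the scaled apply data will generally not have mean 0 and standard deviation 1. This avoids leaking information from apply_data into the scaling step.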

Usage

scale_data(train_data, apply_data)

Arguments

train_data

A data frame containing the training dataset to be standardized. It must contain numeric columns.

apply_data

A data frame containing the dataset to which the scaling from train_data will be applied.

Value

A list containing the following elements:

train

The standardized training data.

apply

The apply_data, scaled using the means and standard deviations computed from train_data.

means

The means of the numeric columns in train_data.

sds

The standard deviations of the numeric columns in train_data.
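One plausible use of the returned parameters is to map standardized values back to their original units. The sketch below uses made-up numbers standing in for a single entry of means and sds; the variable names are illustrative and not part of the package.

```r
# Illustrative values standing in for one entry of `means` and `sds`
train_mean <- 13
train_sd <- 2

x <- c(10, 13, 17)

# Forward transformation: standardize with the training parameters
z <- (x - train_mean) / train_sd

# Inverse transformation: recover the original scale
x_back <- z * train_sd + train_mean
```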

Examples

data(mock_env_data)
detrended_list <- list(
  train = mock_env_data[1:80, ],
  apply = mock_env_data[81:100, ]
)
scale_result <- scale_data(
  train_data = detrended_list$train,
  apply_data = detrended_list$apply
)
scaled_train <- scale_result$train
scaled_apply <- scale_result$apply

ubair documentation built on April 12, 2025, 2:12 a.m.