preProcessData: Preprocess a Dataset Using Specified Methods

View source: R/utils.R

preProcessData    R Documentation

Preprocess a Dataset Using Specified Methods

Description

This function preprocesses a dataset by applying the requested transformation methods, such as centering, scaling, imputation, or dimensionality reduction. Users can exclude specific columns from preprocessing, and the function applies the selected methods in the proper order.

Usage

preProcessData(
  data,
  outcome,
  excludeClasses,
  methods = c("center", "scale"),
  settings
)

Arguments

data

A data frame or matrix representing the dataset to be preprocessed.

outcome

A character string representing the outcome variable, if any, for outcome-based transformations.

excludeClasses

A character vector specifying the column names to exclude from preprocessing. Default is NULL, meaning all columns are included in the preprocessing.

methods

A character vector specifying the preprocessing methods to apply. Default methods are c("center", "scale"). Available methods include:

  • "medianImpute": Impute missing values with the median.

  • "bagImpute": Impute missing values using bootstrap aggregation.

  • "knnImpute": Impute missing values using k-nearest neighbors.

  • "center": Subtract the mean from each feature.

  • "scale": Divide features by their standard deviation.

  • "pca": Principal Component Analysis for dimensionality reduction.

  • Other methods such as "BoxCox", "YeoJohnson", "range", etc.
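To make three of the listed methods concrete, the following base R sketch reproduces what "medianImpute", "center", and "scale" do to a small numeric table. It does not call preProcessData and is not the package's implementation. The toy data, the variable names, and the choice to impute before centering and scaling are assumptions made for illustration.

```r
# Toy data with one missing value (illustrative only).
df <- data.frame(a = c(1, 2, NA, 4, 8), b = c(10, 20, 30, 40, 50))

# "medianImpute": replace each NA with the median of its column.
imputed <- as.data.frame(lapply(df, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}))

# Parameters estimated from the imputed data.
centers <- colMeans(imputed)
sds <- vapply(imputed, sd, numeric(1))

# "center": subtract the column mean; "scale": divide by the column SD.
centered <- sweep(as.matrix(imputed), 2, centers, "-")
processed <- sweep(centered, 2, sds, "/")
```

Imputing first means the means and standard deviations are computed on complete columns. After both steps each column of processed has mean 0 and standard deviation 1.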

settings

A named list containing settings for the analysis. If NULL, defaults will be used. The settings list may contain:

  • seed: An integer seed value for reproducibility.
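A seed matters for methods that involve random resampling, such as bootstrap aggregation. The sketch below shows the general idea with base R's set.seed: the same seed yields the same bootstrap sample. Whether preProcessData calls set.seed internally in this way is an assumption; draw_bootstrap is a hypothetical helper and the seed value is arbitrary.

```r
# A settings list shaped as described above (the value is illustrative).
settings <- list(seed = 1337)

# Hypothetical helper: draw one bootstrap sample of row indices.
draw_bootstrap <- function(n, seed) {
  set.seed(seed)
  sample.int(n, size = n, replace = TRUE)
}

first <- draw_bootstrap(10, settings$seed)
second <- draw_bootstrap(10, settings$seed)

# Same seed, same resample.
same <- identical(first, second)
```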

Details

The function applies the transformations requested in methods and ensures they are applied in the correct order to maintain data integrity and consistency. Columns named in excludeClasses are left out of preprocessing; if fewer than two columns remain after this exclusion, the function halts and returns NULL. Categorical columns are skipped rather than transformed. An outcome variable can also be specified for outcome-based transformations.
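The column handling described above can be sketched in base R as follows. This is a minimal illustration of the documented behavior, not the package's code: select_columns is a hypothetical helper, and treating every non-numeric column as categorical is an assumption.

```r
# Hypothetical helper mirroring the documented column handling.
select_columns <- function(data, exclude = NULL) {
  data <- as.data.frame(data)

  # Drop the columns the caller asked to exclude.
  kept <- data[, setdiff(names(data), exclude), drop = FALSE]

  # Fewer than two remaining columns: give up and return NULL.
  if (ncol(kept) < 2) {
    return(NULL)
  }

  # Numeric columns would be transformed; the rest would be skipped.
  is_num <- vapply(kept, is.numeric, logical(1))
  list(numeric = names(kept)[is_num], skipped = names(kept)[!is_num])
}

toy <- data.frame(
  id = 1:3,
  x = c(0.1, 0.5, 0.9),
  y = c(5, 3, 1),
  grp = c("a", "b", "a")
)

res <- select_columns(toy, exclude = "id")
too_few <- select_columns(toy, exclude = c("id", "x", "grp"))
```

Here res$numeric holds "x" and "y", res$skipped holds "grp", and too_few is NULL because only one column is left after exclusion.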

Value

A list containing:

  • processedMat: The preprocessed dataset.

  • preprocessParams: The preprocessing parameters that were applied to the dataset.


immunaut documentation built on April 12, 2025, 1:22 a.m.