tl_prepare_data: Data Preprocessing for tidylearn

View source: R/preprocessing.R

tl_prepare_data    R Documentation

Data Preprocessing for tidylearn

Description

Unified preprocessing functions that work with both supervised and unsupervised workflows. Prepares data for machine learning.

Usage

tl_prepare_data(
  data,
  formula = NULL,
  impute_method = "mean",
  scale_method = "standardize",
  encode_categorical = TRUE,
  remove_zero_variance = TRUE,
  remove_correlated = FALSE,
  correlation_cutoff = 0.95
)

Arguments

data

A data frame

formula

Optional formula (for supervised learning)

impute_method

Method for missing value imputation: "mean", "median", "mode", "knn"

scale_method

Scaling method: "standardize", "normalize", "robust", "none"

encode_categorical

Whether to encode categorical variables (default: TRUE)

remove_zero_variance

Remove zero-variance features (default: TRUE)

remove_correlated

Remove highly correlated features (default: FALSE)

correlation_cutoff

Correlation threshold for removal (default: 0.95)
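
The option names above are not defined further on this page. As a rough guide, the following base R sketch shows the conventional meaning of mean and median imputation, of the three scaling names, and of one-hot encoding. The helper names (impute_mean, standardize, and so on) are hypothetical and exist only for illustration; tidylearn's own implementation may differ, for example in how "robust" scaling or categorical encoding is defined.

```r
# Illustrative base R helpers; these are NOT tidylearn internals.

# Replace missing values with the mean or median of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Conventional definitions of the three scaling names (assumed, not confirmed).
standardize <- function(x) (x - mean(x)) / sd(x)             # mean 0, sd 1
normalize   <- function(x) (x - min(x)) / (max(x) - min(x))  # range [0, 1]
robust      <- function(x) (x - median(x)) / IQR(x)          # median 0, IQR 1

x <- c(2, 4, NA, 6, 8)
x_mean <- impute_mean(x)      # 2 4 5 6 8
z      <- standardize(x_mean)
n      <- normalize(x_mean)
r      <- robust(x_mean)      # -1.5 -0.5 0.0 0.5 1.5

# One-hot encoding of a factor: one indicator column per level.
onehot <- model.matrix(~ Species - 1, data = iris)
dim(onehot)                   # 150 3
```

Median-based scaling is less sensitive to outliers than the mean and standard deviation, which is the usual reason to prefer a "robust" option on skewed data.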

Details

A comprehensive preprocessing pipeline covering imputation, scaling, categorical encoding, and feature engineering, including optional removal of zero-variance and highly correlated features.
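
To make the two feature-removal steps concrete, here is a minimal base R sketch of a zero-variance filter and a correlation filter with a cutoff. The functions drop_zero_variance and drop_correlated are hypothetical names introduced for this illustration, and the rule of dropping the later column of each highly correlated pair is an assumption; the package may select columns differently.

```r
# Hypothetical helpers for illustration; not tidylearn's implementation.

# Drop columns that take a single value (zero variance).
drop_zero_variance <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, keep, drop = FALSE]
}

# Drop the later column of any pair whose absolute correlation exceeds cutoff.
drop_correlated <- function(df, cutoff = 0.95) {
  cm <- abs(cor(df))
  cm[lower.tri(cm, diag = TRUE)] <- 0   # inspect each pair only once
  drop <- apply(cm, 2, function(col) any(col > cutoff))
  df[, !drop, drop = FALSE]
}

df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 10.5),  # almost perfectly correlated with a
  c = c(5, 3, 4, 1, 2),
  k = rep(1, 5)             # constant column
)

filtered <- drop_correlated(drop_zero_variance(df), cutoff = 0.95)
names(filtered)             # "a" "c"
```

The zero-variance filter runs first because the correlation of a constant column is undefined and would otherwise produce NA values in the correlation matrix.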

Value

A list containing the processed data (the data element, as used in the example below) and preprocessing metadata

Examples


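library(tidylearn)
# Preprocess iris for a supervised workflow, then fit a model on the processed data
processed <- tl_prepare_data(iris, Species ~ ., scale_method = "standardize")
model <- tl_model(processed$data, Species ~ ., method = "logistic")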


tidylearn documentation built on Feb. 6, 2026, 5:07 p.m.