get_resampled_df: Resampling Methods for Data Processing

get_resampled_df    R Documentation

Resampling Methods for Data Processing

Description

This function applies several resampling methods to each column of the input data to prepare it for analysis. These methods reshape the data distribution, which can improve model fitting.

Usage

get_resampled_df(
  data,
  resample_size,
  data_degree = NULL,
  resample_only = FALSE
)

Arguments

data

A data frame to be resampled.

resample_size

An integer specifying the size of the resample.

data_degree

A numeric vector indicating the degree of each column in the data (optional).

resample_only

A logical value indicating whether to return only the resampled data (default is FALSE).
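
As a rough illustration of what column-wise resampling means for data and resample_size, the sketch below draws values from each column independently, with replacement. The helper resample_columns is hypothetical and not part of the package; it assumes plain sampling with replacement, which may differ from what get_resampled_df does internally.

```r
# Hypothetical helper (not part of the catalytic package): draws
# `resample_size` values from each column independently, with replacement.
resample_columns <- function(data, resample_size) {
  stopifnot(is.data.frame(data), resample_size >= 1)
  resampled <- lapply(data, function(col) {
    # Index-based sampling avoids the sample() pitfall for length-one vectors.
    col[sample.int(length(col), size = resample_size, replace = TRUE)]
  })
  as.data.frame(resampled, stringsAsFactors = FALSE)
}

set.seed(1)
toy <- data.frame(x = c(1.5, 2.0, 3.2, 4.8), g = c("a", "b", "a", "c"))
resampled_toy <- resample_columns(toy, resample_size = 10)
dim(resampled_toy)  # 10 rows, 2 columns
```

Because each column is sampled on its own, row-wise relationships between columns are not preserved in this sketch.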

Details

  • Coordinate: This method refers to the preservation of the original data values as reference coordinates during processing. It ensures that the transformations applied are based on the initial structure of the data.

  • Deskewing: Deskewing adjusts the data distribution to reduce skewness, making it more symmetric. If the absolute value of the skewness is greater than or equal to 1, deskewing techniques are applied to normalize the distribution, which can enhance model performance.

  • Smoothing: Smoothing techniques reduce noise in the data by averaging or modifying data points. This is especially useful when there are many unique values in the original data column, as it helps to stabilize the dataset and prevent overfitting during model training.

  • Flattening: Flattening modifies the data to create a more uniform distribution across its range. It is used when certain categories of a categorical variable occur infrequently: some original values are replaced with randomly selected unique values from the dataset to reduce sparsity.

  • Symmetrizing: Symmetrizing adjusts the data so that it becomes more balanced around its mean. This is crucial for achieving better statistical properties and improving the robustness of the model fitting process.
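
The skewness threshold in the Deskewing item can be sketched in base R as follows. This is a minimal illustration only: sample_skewness and needs_deskewing are hypothetical helpers, and the moment-based estimator used here is an assumption, since the page does not state which skewness estimator or deskewing transformation the function uses.

```r
# Moment-based sample skewness (assumed estimator; the package may use another).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Threshold rule from the Deskewing item: |skewness| >= 1.
needs_deskewing <- function(x, threshold = 1) {
  abs(sample_skewness(x)) >= threshold
}

symmetric <- c(-2, -1, 0, 1, 2)
right_skewed <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 100)

needs_deskewing(symmetric)     # FALSE: skewness is 0
needs_deskewing(right_skewed)  # TRUE: one large value gives skewness of about 2.67
```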

Value

A list containing:

resampled_df

A data frame of resampled data.

resampled_df_log

A data frame recording the resampling process for each column.
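
A minimal usage sketch, assuming the catalytic package is installed. The input data frame input_df is made up for illustration, and the call is guarded so the snippet does nothing when the package is unavailable. The two list elements are accessed by the names given above.

```r
set.seed(42)
# Illustrative input: one roughly symmetric column and one right-skewed column.
input_df <- data.frame(x = rnorm(20), y = rexp(20))

if (requireNamespace("catalytic", quietly = TRUE)) {
  result <- catalytic::get_resampled_df(data = input_df, resample_size = 50)

  # Resampled data frame.
  str(result$resampled_df)

  # Per-column record of the resampling process.
  print(result$resampled_df_log)
}
```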


catalytic documentation built on April 4, 2025, 5:51 a.m.