preprocess: Preprocessing Functions for Data Normalization and...

preprocess R Documentation

Preprocessing Functions for Data Normalization and Standardization

Description

A collection of functions for preprocessing numeric data, including standardization, L2 norm normalization, Min-Max normalization, centered-type normalization, interval-type normalization, and negative-to-positive transformation. Each function transforms a numeric vector to a standardized or normalized scale, handling different types of indicators (positive, negative, centered, or interval-based).

Usage

standardize(x, center = TRUE, scale = TRUE)

normalize(x)

rescale(x, type = "+", a = 0, b = 1)

rescale_middle(x, m)

rescale_interval(x, a, b)

to_positive(x, type = "minmax")

Arguments

center

Logical or numeric scalar, passed to base::scale for centering (for standardize). Default is TRUE.

scale

Logical or numeric scalar, passed to base::scale for scaling (for standardize). Default is TRUE.

type

Character scalar indicating the transformation direction or type:

"+"

Positive direction (larger values are better, for rescale).

"-"

Negative direction (smaller values are better, for rescale).

"reciprocal"

Reciprocal transformation (for to_positive).

"minmax"

Min-max transformation (for to_positive).

a

Numeric scalar, lower bound of the output range (for rescale) or of the optimal interval (for rescale_interval).

b

Numeric scalar, upper bound of the output range (for rescale) or of the optimal interval (for rescale_interval).

m

Numeric scalar, the optimal value for centered-type normalization (for rescale_middle).

x

Numeric vector to be preprocessed.

switch

Character scalar indicating the specific transformation for to_positive:

"reciprocal"

Applies reciprocal transformation (1/x).

"minmax"

Applies min-max transformation (max(x) - x).
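
The two transformations named above can be written out in base R. This is only a minimal sketch, not the package's implementation: the helper name to_positive_sketch is hypothetical, and skipping NA when taking the maximum is an assumption.

```r
# Hypothetical helper illustrating the two documented formulas.
# Assumption: NA values are ignored when computing max(x) and stay NA in the output.
to_positive_sketch <- function(x, type = c("minmax", "reciprocal")) {
  type <- match.arg(type)
  switch(type,
    minmax     = max(x, na.rm = TRUE) - x,  # max(x) - x
    reciprocal = 1 / x                      # 1/x; gives Inf where x is 0
  )
}

x <- c(4, 1, NA, 5, 8)
to_positive_sketch(x)                       # 4 7 NA 3 0
to_positive_sketch(x, type = "reciprocal")  # 0.25 1 NA 0.2 0.125
```

In both cases the ordering of the values is reversed, so the smallest input becomes the largest output.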

Details

These functions are tailored for different indicator types in data analysis:

  • standardize: Applies Z-score standardization, transforming data to have mean 0 and standard deviation 1. Suitable for normally distributed data or when equalizing variances.

  • normalize: Normalizes data by dividing by the L2 (Euclidean) norm, scaling the vector to unit length. Useful for machine learning or similarity computations.

  • rescale: Performs Min-Max normalization, scaling data to a specified range (default [0, 1]). Supports positive or negative indicators.

  • rescale_middle: Normalizes centered-type indicators, where values closer to an optimal value m are better. Output is in [0, 1].

  • rescale_interval: Normalizes interval-type indicators, where values in the optimal interval [a, b] are best. Output is in [0, 1].

  • to_positive: Converts negative indicators to positive using either reciprocal transformation (1/x) or min-max transformation (max(x) - x). The type and switch parameters must match (e.g., both "reciprocal" or both "minmax").
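
The first three methods in the list above can be sketched in base R as follows. The *_sketch names are hypothetical and the code is not taken from the package; in particular, the handling of NA and the exact formula used for the negative direction are assumptions.

```r
# Hypothetical base-R sketches; the *_sketch names are not part of the package.
# Assumption: NA values are ignored in summaries and stay NA in the output.

# Z-score: (x - mean) / sd, which is the default behaviour of base::scale().
standardize_sketch <- function(x) {
  as.numeric(scale(x, center = TRUE, scale = TRUE))
}

# L2 normalization: divide by the Euclidean norm sqrt(sum(x^2)).
normalize_sketch <- function(x) {
  x / sqrt(sum(x^2, na.rm = TRUE))
}

# Min-max scaling to [a, b]; type = "-" reverses the direction,
# so that the smallest input maps to b.
rescale_sketch <- function(x, type = "+", a = 0, b = 1) {
  rng <- range(x, na.rm = TRUE)
  u <- (x - rng[1]) / (rng[2] - rng[1])  # positive direction on [0, 1]
  if (type == "-") u <- 1 - u
  a + (b - a) * u
}

x <- c(4, 1, NA, 5, 8)
z <- standardize_sketch(x)
mean(z, na.rm = TRUE)                     # approximately 0
sd(z, na.rm = TRUE)                       # 1
sum(normalize_sketch(x)^2, na.rm = TRUE)  # 1
round(rescale_sketch(x), 3)               # 0.429 0.000 NA 0.571 1.000
round(rescale_sketch(x, type = "-"), 3)   # 0.571 1.000 NA 0.429 0.000
```

Note that the min-max sketch divides by max(x) - min(x), so a constant vector would produce NaN.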

Value

A numeric vector of the same length as x, transformed according to the specified method:

  • standardize: Standardized values (mean = 0, sd = 1).

  • normalize: Normalized values using L2 norm (Euclidean norm).

  • rescale: Normalized values in [a, b] (default [0, 1]).

  • rescale_middle: Normalized values in [0, 1], where 1 indicates x = m.

  • rescale_interval: Normalized values in [0, 1], where 1 indicates x in [a, b].

  • to_positive: Transformed values where negative indicators are converted to positive using either reciprocal or min-max transformation.
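
The [0, 1] ranges stated for the centered and interval types can be illustrated with a distance-based formula: score = 1 - d / max(d), where d is the distance from x to the optimal value m or to the optimal interval [a, b]. This is one common formulation and is assumed here. The package may compute these differently, and the *_sketch names are hypothetical.

```r
# Hypothetical sketches of a common distance-based formulation (an assumption,
# not confirmed to match the package's implementation).

# Centered type: 1 at x = m, 0 at the value farthest from m.
rescale_middle_sketch <- function(x, m) {
  d <- abs(x - m)
  1 - d / max(d, na.rm = TRUE)
}

# Interval type: 1 for x inside [a, b], 0 at the value farthest from the interval.
rescale_interval_sketch <- function(x, a, b) {
  d <- pmax(a - x, 0, x - b)  # distance to [a, b]; 0 inside the interval
  1 - d / max(d, na.rm = TRUE)
}

PH <- 6:9
rescale_middle_sketch(PH, 7)                       # 0.5 1.0 0.5 0.0

Temp <- c(35.2, 35.8, 36.6, 37.1, 37.8, 38.4)
round(rescale_interval_sketch(Temp, 36, 37), 3)    # 0.429 0.857 1.000 0.929 0.429 0.000
```

With this formulation the result is NaN when every value already sits at m or inside [a, b], because the largest distance is then 0.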

Examples

# Standardization
x = c(4, 1, NA, 5, 8)
standardize(x)

# L2 norm normalization
normalize(x)

# Min-Max normalization
rescale(x)                # Positive direction, scaled to [0, 1]
rescale(x, type = "-", a = 0.002, b = 0.996)  # Negative direction, scaled to [0.002, 0.996]

# Negative-to-positive transformation
to_positive(x)                       # Min-max transformation
to_positive(x, type = "reciprocal")  # Reciprocal transformation

# Centered-type normalization
PH = 6:9
rescale_middle(PH, 7)

# Interval-type normalization
Temp = c(35.2, 35.8, 36.6, 37.1, 37.8, 38.4)
rescale_interval(Temp, 36, 37)


zhjx19/mathmodels documentation built on June 2, 2025, 12:18 a.m.