balance_data: Balance Binary Data by Resampling: Under-Over Sampling

balance_data    R Documentation

Balance Binary Data by Resampling: Under-Over Sampling

Description

This function balances a given data.frame by resampling it so that the two values of a binary feature appear at a given rate relative to each other.

Usage

balance_data(df, var, rate = 1, target = "auto", seed = 0, quiet = FALSE)

Arguments

df

Vector or Dataframe. Contains different variables in each column, separated by a specific character

var

Variable. Which variable should be used to resample the dataset?

rate

Numeric. How many X do we need for every Y? Default: 1. If there are more than 2 unique values, rate represents the percentage of the total number of rows to keep.

target

Character. If binary, which value should be reduced? If left as "auto", the most frequent value will be reduced.

seed

Numeric. Seed used to make the resampling reproducible.

quiet

Boolean. Keep quiet? If FALSE, messages will be printed.

Value

data.frame. A resampled, reduced data.frame in which the values of the selected variable appear at the requested rate.
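
To make the meaning of rate and target more concrete, here is a minimal base R sketch of binary under-sampling. It is not the lares implementation: the helper undersample_binary() and the toy data.frame are hypothetical, and the sketch assumes that rate means "rows of the reduced value per row of the kept value" and that the column has no missing values.

```r
# Hypothetical helper for illustration only; NOT the lares implementation.
# Assumption: `rate` = rows of the reduced value per row of the kept value.
undersample_binary <- function(df, col, rate = 1, target = "auto", seed = 0) {
  set.seed(seed)  # same seed, same sampled rows
  counts <- sort(table(df[[col]]), decreasing = TRUE)
  stopifnot(length(counts) == 2)  # this sketch only handles binary columns

  # With target = "auto", the most frequent value is the one reduced
  reduce <- if (identical(target, "auto")) names(counts)[1] else as.character(target)
  stopifnot(reduce %in% names(counts))

  is_reduced <- as.character(df[[col]]) == reduce
  kept <- df[!is_reduced, , drop = FALSE]
  pool <- df[is_reduced, , drop = FALSE]

  # Sampling without replacement cannot return more rows than the pool holds
  n_draw <- min(nrow(pool), round(rate * nrow(kept)))
  sampled <- pool[sample(nrow(pool), n_draw), , drop = FALSE]

  rbind(kept, sampled)
}

# Toy data: 20 TRUE rows and 80 FALSE rows
toy <- data.frame(id = 1:100, y = rep(c(TRUE, FALSE), times = c(20, 80)))
balanced <- undersample_binary(toy, "y", rate = 1)
table(balanced$y)
```

With rate = 1 and target = "auto", the more frequent value (FALSE) is sampled down to match the 20 TRUE rows, so both values end up with the same number of rows.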

See Also

Other Data Wrangling: categ_reducer(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()

Examples

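library(lares)
data(dft) # Titanic dataset
# Balance on Survived with a rate of 0.5; the most frequent value is reduced
df <- balance_data(dft, Survived, rate = 0.5)
# Explicitly reduce the rows where Survived is TRUE, with a rate of 0.1
df <- balance_data(dft, .data$Survived, rate = 0.1, target = "TRUE")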

laresbernardo/lares documentation built on Oct. 23, 2024, 12:05 p.m.