balance_data | R Documentation |
This function balances a given data.frame by resampling it so that the values of a binary feature appear at a given ratio (rate).
balance_data(df, var, rate = 1, target = "auto", seed = 0, quiet = FALSE)
df |
Vector or Dataframe. Contains different variables in each column, separated by a specific character |
var |
Variable. Which variable should we use to resample the dataset? |
rate |
Numeric. How many X do we need for every Y? Default: 1. If there are more than 2 unique values, rate will instead represent a percentage of the number of rows |
target |
Character. If binary, which value should be reduced? If kept in
|
seed |
Numeric. Seed to replicate and obtain same values |
quiet |
Boolean. Keep quiet? If not, messages will be printed |
data.frame. Reduced sampled data.frame following the rate of appearance of a specific variable.
Other Data Wrangling: categ_reducer(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()
data(dft) # Titanic dataset
df <- balance_data(dft, Survived, rate = 0.5)
df <- balance_data(dft, .data$Survived, rate = 0.1, target = "TRUE")
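To make the role of rate in the binary case more concrete, the base R sketch below downsamples the more frequent value of a two-valued column. It is not the package's implementation: the helper name downsample_binary, its col argument, and the toy data are hypothetical, and it assumes that rate means "rows of the reduced value kept per row of the other value", which is only one plausible reading of the argument description.

```r
# Hypothetical helper, not part of the package: reduce the more frequent
# value of a two-valued column so that roughly `rate` of its rows remain
# for every row of the other value (assumed meaning of `rate`).
downsample_binary <- function(df, col, rate = 1, seed = 0) {
  set.seed(seed)                                # reproducible sampling
  counts <- table(df[[col]])
  stopifnot(length(counts) == 2)                # sketch covers the binary case only
  majority <- names(counts)[which.max(counts)]  # value that will be reduced
  is_major <- as.character(df[[col]]) == majority
  # Never ask for more majority rows than actually exist
  n_keep <- min(sum(is_major), round(rate * sum(!is_major)))
  major_rows <- which(is_major)
  keep_major <- major_rows[sample.int(length(major_rows), n_keep)]
  # Keep every minority row plus the sampled majority rows, in original order
  df[sort(c(which(!is_major), keep_major)), , drop = FALSE]
}

# Toy data: 8 rows labelled "no" and 2 rows labelled "yes"
toy <- data.frame(id = 1:10, label = rep(c("no", "yes"), times = c(8, 2)))
balanced <- downsample_binary(toy, "label", rate = 1)
table(balanced$label)
```

With rate = 1 the toy data ends up with as many "no" rows as "yes" rows; under the same assumption, a larger rate would keep proportionally more rows of the reduced value.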