subsample | R Documentation |
Subsampling imbalanced data using up-sampling, down-sampling, or SMOTE.
subsample(
data,
class,
sampling = c("none", "up", "down", "smote"),
seed_samp = NULL
)
data |
data frame with rows as samples, columns as features |
class |
true/reference class vector used for supervised learning |
sampling |
the default is "none", in which case no subsampling is performed. Other options are "up" (up-sampling the minority class), "down" (down-sampling the majority class), and "smote" (generating synthetic points for the minority class and down-sampling the majority class). Subsampling is only applicable to the training set. |
seed_samp |
random seed used for reproducibility in subsampling training sets for model generation |
To deal with class imbalances, we can subsample the data so that the class proportions are more uniform.
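As a rough illustration of what up-sampling and down-sampling mean, the following base R sketch balances classes by resampling row indices. It is not the code behind subsample(): the helper resample_classes() and its arguments are hypothetical names introduced only for this example, and SMOTE is omitted because it requires generating synthetic points.

```r
# Illustrative base R sketch only; NOT the implementation of subsample().
# resample_classes() is a hypothetical helper introduced for this example.
resample_classes <- function(data, class, method = c("up", "down"), seed = NULL) {
  method <- match.arg(method)
  if (!is.null(seed)) set.seed(seed)
  class <- factor(class)                          # drop unused levels
  idx_by_class <- split(seq_along(class), class)  # row indices per class
  sizes <- lengths(idx_by_class)
  # "up" grows every class to the largest size; "down" shrinks to the smallest
  target <- if (method == "up") max(sizes) else min(sizes)
  picked <- lapply(idx_by_class, function(idx) {
    n <- length(idx)
    if (n < target) {
      # keep all original rows and add rows drawn with replacement
      c(idx, idx[sample.int(n, target - n, replace = TRUE)])
    } else if (n > target) {
      # keep a random subset drawn without replacement
      idx[sample.int(n, target)]
    } else {
      idx
    }
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

iris_imbal <- iris[1:130, ]  # 50 setosa, 50 versicolor, 30 virginica
up <- resample_classes(iris_imbal, iris_imbal$Species, "up", seed = 1)
down <- resample_classes(iris_imbal, iris_imbal$Species, "down", seed = 1)
table(up$Species)    # 50 rows per class
table(down$Species)  # 30 rows per class
```

In this sketch, up-sampling keeps every original row and duplicates minority rows at random, while down-sampling discards majority rows at random, so the two methods trade off duplicated information against lost information.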
A subsampled dataset where corresponding strata of class are more balanced. The resulting class variable is not included in the data output.
Derek Chiu
# Create imbalanced version of iris dataset
iris_imbal <- iris[1:130, ]
# Up-sampling
iris_up <- subsample(iris_imbal, iris_imbal$Species, sampling = "up")
nrow(iris_up)
# Down-sampling
iris_down <- subsample(iris_imbal, iris_imbal$Species, sampling = "down")
nrow(iris_down)
# SMOTE
iris_smote <- subsample(iris_imbal, iris_imbal$Species, sampling = "smote")
nrow(iris_smote)