undersample_tomek (R Documentation)
A Tomek link is a pair of instances, one minority and one majority, that are each other's nearest neighbors. This function removes instances of cls that belong to Tomek links until m instances of cls remain. If there are too few Tomek links to reach m, the remaining excess instances of cls can optionally be discarded at random (see force_m) so that exactly m rows are returned.
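The Tomek link definition above can be illustrated with a small base-R sketch. The helper `find_tomek_links()` is hypothetical and not part of this package. It assumes numeric features and Euclidean distance via `dist()`, and it simply treats any mutual nearest-neighbor pair with different labels as a link, which may not match how the `tomek` argument decides which points count as minority instances.

```r
# Hypothetical helper (not part of the package): return the row pairs that
# form Tomek links, i.e. mutual nearest neighbors with different labels.
find_tomek_links <- function(x, y) {
  d <- as.matrix(dist(x))        # Euclidean distances between all rows
  diag(d) <- Inf                 # a point is not its own neighbor
  nn <- apply(d, 1, which.min)   # index of each row's nearest neighbor
  i <- seq_along(nn)
  # mutual nearest neighbors whose labels differ
  is_link <- nn[nn] == i & y != y[nn]
  pairs <- unname(cbind(i[is_link], nn[is_link]))
  # each link appears twice (i -> j and j -> i); keep it once
  pairs[pairs[, 1] < pairs[, 2], , drop = FALSE]
}

# Toy data: two tight same-label pairs plus one overlapping mixed pair.
x <- data.frame(a = c(0, 0.1, 5, 5.1, 2.5, 2.6),
                b = c(0, 0.1, 5, 5.1, 2.5, 2.5))
y <- c("maj", "maj", "min", "min", "maj", "min")
find_tomek_links(x, y)
```

Only rows 5 and 6 are mutual nearest neighbors with different labels, so they form the single link; the two same-label pairs are not links.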
undersample_tomek(data, cls, cls_col, m, tomek = "minor", force_m = TRUE, ...)
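| Argument | Description |
|---|---|
| data | Dataset to be undersampled. |
| cls | Majority class to be undersampled. |
| cls_col | Column in data containing class memberships. |
| m | Desired number of instances of cls in the undersampled dataset. |
| tomek | Definition used to determine whether a point is considered a minority instance in the Tomek link definition. The default is "minor"; the examples also use "diff". |
| force_m | If TRUE, instances of cls are randomly discarded to yield m rows when there are insufficient Tomek links in the data. |
| ... | Additional arguments passed to |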
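The way m and force_m interact can be sketched as follows. `undersample_tomek_sketch()` is a hypothetical illustration, not this package's implementation. It assumes all non-class columns are numeric, uses Euclidean distance, counts a row of cls as Tomek-linked when its mutual nearest neighbor has a different class, and drops linked rows in index order before any random fallback.

```r
# Hypothetical re-implementation for illustration only; the package's own
# distance settings, tomek handling, and removal order may differ.
undersample_tomek_sketch <- function(data, cls, cls_col, m, force_m = TRUE) {
  x <- data[, setdiff(names(data), cls_col), drop = FALSE]
  y <- data[[cls_col]]
  d <- as.matrix(dist(x))        # Euclidean distances between all rows
  diag(d) <- Inf                 # ignore self-distances
  nn <- apply(d, 1, which.min)   # nearest neighbor of each row
  i <- seq_len(nrow(data))
  # rows of cls whose mutual nearest neighbor belongs to another class
  linked <- i[y == cls & nn[nn] == i & y[nn] != cls]
  keep <- i[y == cls]
  # drop linked rows (in index order), but never go below m
  n_drop <- min(length(linked), length(keep) - m)
  if (n_drop > 0) keep <- setdiff(keep, linked[seq_len(n_drop)])
  # optional fallback: random undersampling down to exactly m rows
  if (force_m && length(keep) > m) keep <- keep[sample.int(length(keep), m)]
  data[keep, , drop = FALSE]
}

set.seed(1)
res <- undersample_tomek_sketch(iris, "versicolor", "Species", 20)
nrow(res)
res2 <- undersample_tomek_sketch(iris, "versicolor", "Species", 20, force_m = FALSE)
nrow(res2)
```

With force_m = TRUE the result always has m rows when cls starts with more than m; with force_m = FALSE only Tomek-linked rows are removed, so more than m rows can remain.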
Undersampled dataframe containing only cls.
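```r
# Class counts in iris (50 rows per species)
table(iris$Species)
# Remove Tomek-linked "setosa" rows, then randomly discard rows until 15 remain
undersamp <- undersample_tomek(iris, "setosa", "Species", 15, tomek = "diff", force_m = TRUE)
nrow(undersamp)
# Without force_m, only Tomek-linked rows are removed, so more than 15 rows may remain
undersamp2 <- undersample_tomek(iris, "setosa", "Species", 15, tomek = "diff", force_m = FALSE)
nrow(undersamp2)
```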