Description
A Tomek link is a pair formed by a minority instance and a majority instance that are each other's nearest neighbor. This function removes instances of cls that belong to Tomek links until m instances of cls remain. If desired, further instances are discarded at random to yield m rows when the data contain too few Tomek links.
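The link definition above can be sketched in base R. This is a minimal illustration, not the package's implementation. It assumes numeric feature columns, computes distances with stats::dist (whose default method is "euclidean"), and treats every class other than cls as the minority side, which is only one possible reading of the tomek options. The helper name tomek_members is hypothetical.

```r
# Hypothetical helper, not part of the package: returns the row indices of
# instances of `cls` that form a Tomek link with an instance of another class.
tomek_members <- function(x, labels, cls, method = "euclidean") {
  labels <- as.character(labels)
  d <- as.matrix(stats::dist(x, method = method))
  diag(d) <- Inf                      # a point is not its own neighbour
  nn <- apply(d, 1, which.min)        # index of each row's nearest neighbour
  mutual <- nn[nn] == seq_along(nn)   # i and nn[i] are each other's nearest neighbour
  cross <- labels != labels[nn]       # the pair spans two different classes
  unname(which(mutual & cross & labels == cls))
}

# Toy data: rows 1 and 2 are mutual nearest neighbours from different classes,
# rows 3 and 4 are mutual nearest neighbours from the same class.
x <- data.frame(a = c(0, 1, 10, 11))
lab <- c("A", "B", "A", "A")
tomek_members(x, lab, "A")   # [1] 1
```

The same helper can be called on the iris data used in the examples, e.g. tomek_members(iris[1:4], iris$Species, "setosa"), to see how many setosa rows sit in Tomek links under this particular definition.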
Usage

undersample_tomek(
  data,
  cls,
  cls_col,
  m,
  tomek = "minor",
  force_m = TRUE,
  dist_calc = "euclidean"
)
Arguments

data
    Dataset to be undersampled.
cls
    Majority class to be undersampled.
cls_col
    Column in data containing class memberships.
m
    Desired number of samples in the undersampled dataset.
tomek
    Definition used to decide whether a point counts as a minority instance when identifying Tomek links (default "minor"; "diff" is also accepted, as in the examples).
force_m
    If TRUE, instances of cls are randomly discarded to yield m rows when the data contain too few Tomek links.
dist_calc
    Distance calculation method. See
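The interaction between m, force_m and the Tomek links can be sketched as follows. The helper trim_to_m is hypothetical and is not the package's code: it takes already-computed Tomek-link row indices (linked), removes those rows of cls first, and only when force_m is TRUE discards further random rows of cls to reach m. Which Tomek-link members are removed first when there are more than needed, and how the random rows are chosen, are assumptions here.

```r
# Hypothetical sketch, not the package's code: keep at most m rows of class
# `cls`, removing Tomek-link members first and, when force_m is TRUE,
# randomly discarding further rows of `cls` until exactly m remain.
trim_to_m <- function(data, cls_col, cls, linked, m, force_m = TRUE) {
  rows <- which(data[[cls_col]] == cls)             # rows of the majority class
  n_drop <- max(length(rows) - m, 0)                # how many rows must go
  to_drop <- head(intersect(linked, rows), n_drop)  # Tomek-link members first
  if (force_m && length(to_drop) < n_drop) {
    rest <- setdiff(rows, to_drop)
    extra <- rest[sample.int(length(rest), n_drop - length(to_drop))]
    to_drop <- c(to_drop, extra)                    # top up with random rows
  }
  data[setdiff(rows, to_drop), , drop = FALSE]      # only rows of `cls` are returned
}

set.seed(1)
toy <- data.frame(y = rep(c("A", "B"), c(10, 5)), v = 1:15)
nrow(trim_to_m(toy, "y", "A", linked = c(1, 2, 12), m = 5, force_m = TRUE))   # [1] 5
nrow(trim_to_m(toy, "y", "A", linked = c(1, 2, 12), m = 5, force_m = FALSE))  # [1] 8
```

With force_m = FALSE only the two Tomek-link members of class "A" are removed, so 8 of the 10 "A" rows remain instead of the requested 5.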
Value

Undersampled data frame containing only rows of cls.
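Examples

# class counts before undersampling
table(iris$Species)
# force_m = TRUE: random discards make up any shortfall in Tomek links
undersamp <- undersample_tomek(iris, "setosa", "Species", 15, tomek = "diff", force_m = TRUE)
nrow(undersamp)
# force_m = FALSE: only Tomek-link members are removed, so more than 15 rows may remain
undersamp2 <- undersample_tomek(iris, "setosa", "Species", 15, tomek = "diff", force_m = FALSE)
nrow(undersamp2)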