BLSMOTE | R Documentation |
BLSMOTE()
applies Borderline-SMOTE (BLSMOTE), a variation of the SMOTE algorithm
that generates synthetic samples only in the vicinity of the borderline
instances in imbalanced datasets.
BLSMOTE(x, y, k1 = 5, k2 = 5, type = "type1")
x |
feature matrix or data.frame. |
y |
a factor class variable with two classes. |
k1 |
number of neighbors to link. Default is 5. |
k2 |
number of neighbors to determine safe levels. Default is 5. |
type |
"type1" or "type2". Default is "type1". |
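The difference between the two `type` values can be illustrated with a small base R sketch. It assumes that "type1" and "type2" correspond to borderline-SMOTE1 and borderline-SMOTE2 in Han et al. (2005); this page does not state that mapping. The helper `interpolate()` and its argument `neighbor_is_minority` are hypothetical names used only for illustration, not part of the package.

```r
# Sketch only: create one synthetic point between a borderline minority
# point `p` and one of its neighbors `q`.
# Assumption: "type1"/"type2" follow borderline-SMOTE1/2 of Han et al. (2005).
interpolate <- function(p, q, neighbor_is_minority = TRUE) {
  # Minority neighbor: the gap is drawn from (0, 1).
  # Majority neighbor (used only by borderline-SMOTE2): the gap is drawn
  # from (0, 0.5), so the new point stays closer to the minority point p.
  upper <- if (neighbor_is_minority) 1 else 0.5
  gap <- runif(1, min = 0, max = upper)
  p + gap * (q - p)
}

set.seed(1)
p     <- c(5, 5)   # a borderline minority point
q_min <- c(6, 4)   # a minority neighbor
q_maj <- c(3, 3)   # a majority neighbor

s1 <- interpolate(p, q_min, neighbor_is_minority = TRUE)
s2 <- interpolate(p, q_maj, neighbor_is_minority = FALSE)
```

Under this reading, "type1" would call only the first form, while "type2" would use both.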
BLSMOTE works by focusing on the instances that are near the decision boundary between the minority and majority classes, known as borderline instances. These instances are more informative and potentially more challenging for classification, so generating synthetic samples in their vicinity can be more effective than generating them around every minority instance, as plain SMOTE does.
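How borderline instances can be identified is sketched below in base R, using the rule from Han et al. (2005): a minority point is "noise" if all of its k nearest neighbors belong to the majority class, "danger" (borderline) if at least half but not all of them do, and "safe" otherwise. The function `find_borderline()` is a hypothetical helper written for this illustration; it is not the package's internal code and makes no attempt to be fast.

```r
# Sketch only: label each minority point as "safe", "danger" or "noise"
# following the rule in Han et al. (2005).
find_borderline <- function(x, y, positive, k = 5) {
  x <- as.matrix(x)
  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(d) <- Inf            # a point is not its own neighbor
  pos_idx <- which(y == positive)
  # Count majority-class points among the k nearest neighbors
  n_maj <- vapply(pos_idx, function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    sum(y[nn] != positive)
  }, integer(1))
  status <- ifelse(n_maj == k, "noise",
                   ifelse(n_maj >= k / 2, "danger", "safe"))
  data.frame(index = pos_idx, n_majority = n_maj, status = status)
}

# Small illustrative dataset (smaller than the one in the examples)
set.seed(1)
x_demo <- rbind(matrix(rnorm(200, 3, 1), ncol = 2),
                matrix(rnorm(40, 5, 1), ncol = 2))
y_demo <- factor(c(rep("negative", 100), rep("positive", 20)))
b <- find_borderline(x_demo, y_demo, positive = "positive", k = 5)
table(b$status)
```

Only the points labeled "danger" would then serve as seeds for synthetic samples.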
Note: Much faster than smotefamily::BLSMOTE().
a list containing the resampled dataset.
x_new |
Resampled feature matrix. |
y_new |
Resampled target variable. |
x_syn |
Generated synthetic data. |
C |
Number of synthetic samples generated for each positive class sample. |
Fatih Saglam, saglamf89@gmail.com
Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005, Hefei, China, August 23-26, 2005, Proceedings, Part I (pp. 878-887). Springer Berlin Heidelberg.
set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))
plot(x, col = y)
# resampling
m <- BLSMOTE(x = x, y = y, k1 = 5, k2 = 5)
plot(m$x_new, col = m$y_new)