SMOTEWB | R Documentation |
Resampling with SMOTE with boosting (SMOTEWB).
SMOTEWB(x, y, n_weak_classifier = 100, class_weights = NULL, k_max = NULL, ...)
x |
feature matrix. |
y |
a factor class variable with two classes. |
n_weak_classifier |
number of weak classifiers for boosting. |
class_weights |
numeric vector of length two. The first number is the weight for the
positive class and the second is the weight for the negative class. The higher
a class's relative weight, the less noise is detected in that class. By default, |
k_max |
to increase maximum number of neighbors. Default is
|
... |
additional inputs for ada::ada(). |
SMOTEWB (Saglam & Cengiz, 2022) is a SMOTE-based oversampling method which can handle noisy data and adaptively decides the appropriate number of neighbors to link during resampling with SMOTE.
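The interpolation step that SMOTE-based methods build on can be sketched in base R. This is a minimal illustration of plain SMOTE with a fixed number of neighbors, not the SMOTEWB procedure itself: it does no boosting-based noise detection and does not adapt the neighbor count per sample. The helper name `smote_interpolate` and its arguments `k` and `n_syn` are assumptions made for this sketch.

```r
# Plain SMOTE-style interpolation (NOT the full SMOTEWB procedure).
# `smote_interpolate` is a hypothetical helper written for this sketch.
smote_interpolate <- function(x_pos, k = 5, n_syn = 10) {
  n <- nrow(x_pos)
  stopifnot(n > k)
  d <- as.matrix(dist(x_pos))   # pairwise Euclidean distances
  diag(d) <- Inf                # a point is not its own neighbor
  x_syn <- matrix(NA_real_, nrow = n_syn, ncol = ncol(x_pos))
  for (s in seq_len(n_syn)) {
    i  <- sample.int(n, 1)              # pick a minority sample
    nn <- order(d[i, ])[seq_len(k)]     # indices of its k nearest neighbors
    j  <- nn[sample.int(k, 1)]          # pick one of those neighbors
    u  <- runif(1)                      # random position along the segment
    x_syn[s, ] <- x_pos[i, ] + u * (x_pos[j, ] - x_pos[i, ])
  }
  x_syn
}

set.seed(1)
x_pos <- matrix(rnorm(100, 5, 1), ncol = 2)   # 50 minority samples
x_syn <- smote_interpolate(x_pos, k = 5, n_syn = 20)
dim(x_syn)
```

Each synthetic point lies on the segment between a minority sample and one of its neighbors, which is why a noisy sample or a poorly chosen neighbor count can place synthetic data inside the majority class region.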
Models trained on data resampled with this method are reported to achieve significantly better Matthews correlation coefficient (MCC) scores than models trained with competing resampling methods.
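For readers unfamiliar with the Matthews correlation coefficient, the sketch below computes it from a binary confusion matrix in base R. The helper `mcc` and its `positive` argument are hypothetical names introduced here, not part of the SMOTEWB package, and returning 0 when the denominator is zero is a common convention rather than something this documentation specifies.

```r
# `mcc` is a hypothetical helper, not part of the SMOTEWB package.
mcc <- function(truth, pred, positive = "positive") {
  tp <- as.numeric(sum(truth == positive & pred == positive))
  tn <- as.numeric(sum(truth != positive & pred != positive))
  fp <- as.numeric(sum(truth != positive & pred == positive))
  fn <- as.numeric(sum(truth == positive & pred != positive))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(0)   # convention when a row or column is empty
  (tp * tn - fp * fn) / denom
}

truth <- factor(c(rep("negative", 8), rep("positive", 2)))
always_negative <- factor(rep("negative", 10), levels = levels(truth))

mcc(truth, always_negative)   # accuracy is 0.8, yet MCC is 0
mcc(truth, truth)             # perfect prediction gives 1
```

Unlike accuracy, the MCC stays at 0 for a classifier that ignores the minority class, which makes it a sensible score for imbalanced problems.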
A list containing the resampled dataset.
x_new |
Resampled feature matrix. |
y_new |
Resampled target variable. |
x_syn |
Generated synthetic data. |
w |
Boosting weights for original dataset. |
k |
Number of nearest neighbors for positive class samples. |
C |
Number of synthetic samples for each positive class sample. |
Fatih Saglam, saglamf89@gmail.com
Sağlam, F., & Cengiz, M. A. (2022). A novel SMOTE-based resampling technique trough noise detection and the boosting procedure. Expert Systems with Applications, 200, 117023.
Currently works with two classes only.
set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))
plot(x, col = y)
# resampling
m <- SMOTEWB(x = x, y = y, n_weak_classifier = 150)
plot(m$x_new, col = m$y_new)
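As a follow-up to the example above, the returned list can be inspected to see how the class balance changed. This sketch assumes the SMOTEWB package and its dependencies are installed and skips the call otherwise. It uses only the components documented in the value list (`y_new`, `x_syn`, `k`, `C`).

```r
set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))

# Assumption: the SMOTEWB package is installed; skipped if it is not.
if (requireNamespace("SMOTEWB", quietly = TRUE)) {
  m <- SMOTEWB::SMOTEWB(x = x, y = y, n_weak_classifier = 150)
  print(table(y))         # class counts before resampling
  print(table(m$y_new))   # class counts after resampling
  print(dim(m$x_syn))     # size of the generated synthetic data
  print(summary(m$k))     # neighbors used per positive class sample
  print(summary(m$C))     # synthetic samples per positive class sample
}
```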