sampling_target | R Documentation |
To address class imbalance, perform sampling on the train set of a split_df object.
sampling_target(
  .data,
  method = c("ubUnder", "ubOver", "ubSMOTE"),
  seed = NULL,
  perc = 50,
  k = ifelse(method == "ubSMOTE", 5, 0),
  perc.over = 200,
  perc.under = 200
)
.data |
an object of class "split_df", usually the result of a call to split_by(). |
method |
character. The sampling method. "ubUnder" is under-sampling, "ubOver" is over-sampling, and "ubSMOTE" is SMOTE (Synthetic Minority Over-sampling TEchnique). |
seed |
integer. The random seed used for sampling. |
perc |
integer. The percentage of the positive (minority) class in the final dataset. It is used only in under-sampling. The default is 50, and perc cannot exceed 50. |
k |
integer. It is used only in over-sampling and SMOTE. For over-sampling, if k = 0, the minority class is sampled with replacement until both classes have the same number of instances; if k > 0, the minority class is sampled with replacement until there are k times the original number of minority instances. For SMOTE, k is the number of neighbours considered as the pool from which the new examples are generated. |
perc.over |
integer. It is used only in SMOTE. perc.over/100 is the number of new instances generated for each rare instance. If perc.over < 100, a single instance is generated. |
perc.under |
integer. It is used only in SMOTE. perc.under/100 is the number of "normal" (majority class) instances that are randomly selected for each smoted observation. |
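The arithmetic implied by perc, k, perc.over and perc.under can be sketched in a few lines of R. The helper below, expected_counts(), is hypothetical and not part of the package; it only restates the argument descriptions above, assumes perc.over >= 100 for SMOTE, and ignores any rounding or capping that the underlying implementation may apply.

```r
# Hypothetical helper (not part of the package): approximate class sizes
# after sampling, derived only from the argument descriptions above.
expected_counts <- function(n_minority, n_majority,
                            method = c("ubUnder", "ubOver", "ubSMOTE"),
                            perc = 50, k = 0,
                            perc.over = 200, perc.under = 200) {
  method <- match.arg(method)
  switch(method,
    ubUnder = {
      # All minority rows are kept; the majority class is reduced so that
      # the minority class makes up perc% of the final data.
      stopifnot(perc > 0, perc <= 50)
      c(minority = n_minority,
        majority = min(n_majority, round(n_minority * (100 - perc) / perc)))
    },
    ubOver = {
      # k = 0: minority grows to the majority size.
      # k > 0: minority grows to k times its original size.
      c(minority = if (k == 0) n_majority else k * n_minority,
        majority = n_majority)
    },
    ubSMOTE = {
      # Assumption: perc.over >= 100, so perc.over/100 synthetic rows
      # are generated per original minority row.
      stopifnot(perc.over >= 100)
      synthetic <- n_minority * perc.over / 100
      c(minority = n_minority + synthetic,
        majority = round(synthetic * perc.under / 100))
    }
  )
}

# Example with 100 minority and 900 majority rows
expected_counts(100, 900, "ubUnder", perc = 40)
expected_counts(100, 900, "ubOver", k = 10)
expected_counts(100, 900, "ubSMOTE")
```

Comparing these estimates with the output of count() on the sampled data is a quick sanity check that the arguments have the intended effect.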
To address the class imbalance problem, the train set is resampled using one of three methods: under-sampling, over-sampling, or SMOTE.
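As a rough illustration of the first two methods, random under-sampling and over-sampling can be written in base R on a toy data frame. This is only a sketch of the general idea, not the implementation used by sampling_target(); the data frame toy and its columns x and y are made up for the example.

```r
set.seed(1234)

# Toy imbalanced data: 90 "No" rows and 10 "Yes" rows (made-up example data)
toy <- data.frame(
  x = rnorm(100),
  y = factor(rep(c("No", "Yes"), times = c(90, 10)))
)

idx_min <- which(toy$y == "Yes")  # minority class rows
idx_maj <- which(toy$y == "No")   # majority class rows

# Under-sampling: keep every minority row and draw the same number of
# majority rows without replacement.
under <- toy[c(idx_min, sample(idx_maj, length(idx_min))), ]

# Over-sampling: keep every majority row and draw minority rows with
# replacement until both classes have the same size.
over <- toy[c(idx_maj, sample(idx_min, length(idx_maj), replace = TRUE)), ]

table(under$y)
table(over$y)
```

Under-sampling discards majority rows, so it shrinks the train set; over-sampling keeps all rows but repeats minority rows. SMOTE instead generates synthetic minority rows from nearest neighbours and is not reproduced here.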
An object of class train_df.
The attributes of the train_df class are as follows:
sample_seed : integer. random seed used for sampling
method : character. sampling methods.
perc : integer. perc argument value
k : integer. k argument value
perc.over : integer. perc.over argument value
perc.under : integer. perc.under argument value
binary : logical. whether the target variable is a binary class
target : character. target variable name
minority : character. the level of the minority class
majority : character. the level of the majority class
library(alookr)  # provides split_by() and sampling_target()
library(dplyr)

# Credit Card Default Data
head(ISLR::Default)

# Generate data for the example
sb <- ISLR::Default %>%
  split_by(default)

# under-sampling with random seed
under <- sb %>%
  sampling_target(seed = 1234L)
under %>%
  count(default)

# under-sampling with random seed, and minority class frequency is 40%
under40 <- sb %>%
  sampling_target(seed = 1234L, perc = 40)
under40 %>%
  count(default)

# over-sampling with random seed
over <- sb %>%
  sampling_target(method = "ubOver", seed = 1234L)
over %>%
  count(default)

# over-sampling with random seed, and k = 10
over10 <- sb %>%
  sampling_target(method = "ubOver", seed = 1234L, k = 10)
over10 %>%
  count(default)

# SMOTE with random seed
smote <- sb %>%
  sampling_target(method = "ubSMOTE", seed = 1234L)
smote %>%
  count(default)

# SMOTE with random seed, and perc.under = 250
smote250 <- sb %>%
  sampling_target(method = "ubSMOTE", seed = 1234L, perc.under = 250)
smote250 %>%
  count(default)