ubUnder | R Documentation |
The function randomly removes some instances from the majority (negative) class and keeps all instances of the minority (positive) class in order to obtain a more balanced dataset. It supports two ways of undersampling: i) setting the percentage of positives wanted after undersampling (percPos method), or ii) setting the sampling rate on the negatives (percUnder method). For percPos, "perc" must satisfy (N.1/N * 100) <= perc <= 50, where N.1 is the number of positive instances and N the total number of instances. For percUnder, "perc" must satisfy (N.1/N.0 * 100) <= perc <= 100, where N.1 is the number of positive instances and N.0 the number of negative instances.
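The bounds on "perc" stated above can be computed from the class counts before calling the function. The following is a minimal base-R sketch; the helper name perc_bounds is hypothetical and is not part of the unbalanced package, and it assumes Y is coded 0/1 as described for the Y argument.

```r
# Hypothetical helper (not part of the unbalanced package): valid range of
# `perc` for each method, computed from the class counts in Y.
perc_bounds <- function(Y) {
  n1 <- sum(Y == 1)   # N.1: number of positive (minority) instances
  n0 <- sum(Y == 0)   # N.0: number of negative (majority) instances
  n  <- n1 + n0       # N: total number of instances
  list(
    percPos   = c(min = n1 / n  * 100, max = 50),
    percUnder = c(min = n1 / n0 * 100, max = 100)
  )
}

# Toy response with 10 positives and 90 negatives
Y <- factor(c(rep(1, 10), rep(0, 90)), levels = c(0, 1))
bounds <- perc_bounds(Y)
bounds$percPos     # lower bound is N.1/N * 100
bounds$percUnder   # lower bound is N.1/N.0 * 100
```

A value of perc can then be compared against bounds$percPos or bounds$percUnder before it is passed to ubUnder.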
ubUnder(X, Y, perc = 50, method = "percPos", w = NULL)
X |
the input variables of the unbalanced dataset. |
Y |
the response variable of the unbalanced dataset. It must be a binary factor where the majority class is coded as 0 and the minority as 1. |
perc |
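percentage of sampling. Its meaning depends on method: the percentage of positives wanted after undersampling for "percPos", or the sampling rate on the negatives for "percUnder". |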
method |
method used to perform undersampling ("percPos" or "percUnder"). |
w |
weights used for sampling the majority class; if NULL, all majority instances are sampled with equal weights. |
The function returns a list:
X |
input variables |
Y |
response variable |
id.rm |
index of instances removed |
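To illustrate the shape of the returned list and how id.rm relates to the original rows, here is a minimal base-R sketch of the percPos idea. The function under_sketch is a hypothetical stand-in written for this illustration, not the package's implementation; the number of negatives kept is derived from the percPos definition, and rounding down with floor is an assumption.

```r
# Hypothetical stand-in for illustration only; NOT the package's implementation.
under_sketch <- function(X, Y, perc = 50) {
  i.1 <- which(Y == 1)   # minority rows: all kept
  i.0 <- which(Y == 0)   # majority rows: randomly subsampled
  # Negatives to keep so that positives make up about `perc` percent of the
  # result (rounding down is an assumption made in this sketch).
  n.keep <- min(length(i.0), floor(length(i.1) * (100 - perc) / perc))
  keep.0 <- i.0[sample.int(length(i.0), n.keep)]
  id.keep <- sort(c(i.1, keep.0))
  list(X = X[id.keep, , drop = FALSE],
       Y = Y[id.keep],
       id.rm = setdiff(seq_along(Y), id.keep))  # indices of removed rows
}

set.seed(1)
X <- data.frame(a = rnorm(100), b = runif(100))
Y <- factor(c(rep(1, 10), rep(0, 90)), levels = c(0, 1))
res <- under_sketch(X, Y, perc = 40)

table(res$Y)                # class counts after undersampling
removed <- X[res$id.rm, ]   # rows of the original data that were dropped
```

In this sketch, the kept rows and the removed indices together account for every row of the original data, and only majority instances appear among the removed rows.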
ubBalance
library(unbalanced)
data(ubIonosphere)
n <- ncol(ubIonosphere)
output <- ubIonosphere$Class
input <- ubIonosphere[, -n]
data <- ubUnder(X = input, Y = output, perc = 40, method = "percPos")
newData <- cbind(data$X, data$Y)