ubUnder: Under-sampling


ubUnder R Documentation

Under-sampling

Description

The function randomly removes some instances from the majority (negative) class and keeps all instances of the minority (positive) class in order to obtain a more balanced dataset. It supports two ways of undersampling: i) setting the percentage of positives wanted after undersampling (percPos method), and ii) setting the sampling rate on the negatives (percUnder method). For percPos, "perc" must satisfy (N.1/N * 100) <= perc <= 50, where N.1 is the number of positive instances and N the total number of instances. For percUnder, "perc" must satisfy (N.1/N.0 * 100) <= perc <= 100, where N.1 is the number of positive instances and N.0 the number of negative instances.
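The two interpretations of "perc" can be made concrete with a little arithmetic. The sketch below is not the package's code: n_neg_kept is a hypothetical helper, and the use of floor() for rounding is an assumption. It only illustrates how many negatives would be retained under each method, given the bounds stated above.

```r
# Hypothetical helper (not part of the unbalanced package): number of
# negatives retained for a given perc. Rounding with floor() is an assumption.
n_neg_kept <- function(n_pos, n_neg, perc, method = c("percPos", "percUnder")) {
  method <- match.arg(method)
  n <- n_pos + n_neg
  if (method == "percPos") {
    # perc = percentage of positives wanted after undersampling
    stopifnot(perc >= n_pos / n * 100, perc <= 50)
    floor(n_pos * (100 - perc) / perc)
  } else {
    # perc = percentage of negatives to keep
    stopifnot(perc >= n_pos / n_neg * 100, perc <= 100)
    floor(n_neg * perc / 100)
  }
}

# 100 positives and 900 negatives
kept_pos   <- n_neg_kept(n_pos = 100, n_neg = 900, perc = 40, method = "percPos")
kept_under <- n_neg_kept(n_pos = 100, n_neg = 900, perc = 50, method = "percUnder")
```

With these numbers, percPos at 40 keeps 150 negatives (100 positives out of 250 instances is 40%), while percUnder at 50 keeps half of the 900 negatives. The lower bounds correspond to the case where no negative would need to be removed (percPos) or where the negatives would be reduced to the number of positives (percUnder).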

Usage

ubUnder(X, Y, perc = 50, method = "percPos", w = NULL)

Arguments

X

the input variables of the unbalanced dataset.

Y

the response variable of the unbalanced dataset. It must be a binary factor where the majority class is coded as 0 and the minority as 1.

perc

percentage used for sampling; its interpretation depends on method (see Description).

method

method used to perform undersampling ("percPos" or "percUnder").

w

weights used for sampling the majority class. If NULL, all majority instances are sampled with equal weights.

Value

The function returns a list:

X

input variables

Y

response variable

id.rm

indices of the removed instances
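To show how the three returned elements relate to each other, here is a minimal base-R sketch of random undersampling. It is an illustration under stated assumptions, not the package implementation: under_sample_sketch and its n_keep argument are hypothetical, w is assumed to hold one weight per majority instance, and the kept rows are returned in their original order by choice.

```r
# Hypothetical base-R sketch; under_sample_sketch is not a package function.
under_sample_sketch <- function(X, Y, n_keep, w = NULL) {
  i_pos <- which(Y == 1)   # minority rows, all kept
  i_neg <- which(Y == 0)   # majority rows, sub-sampled
  stopifnot(n_keep <= length(i_neg))
  # Assumption: w, if given, has one weight per majority instance
  keep_neg <- i_neg[sample.int(length(i_neg), n_keep, prob = w)]
  id_keep <- sort(c(i_pos, keep_neg))
  id_rm <- setdiff(seq_along(Y), id_keep)
  list(X = X[id_keep, , drop = FALSE], Y = Y[id_keep], id.rm = id_rm)
}

set.seed(1)
X <- data.frame(a = rnorm(20), b = runif(20))
Y <- factor(c(rep(1, 4), rep(0, 16)), levels = c(0, 1))
res <- under_sample_sketch(X, Y, n_keep = 6)
table(res$Y)
```

In this sketch every index in id.rm refers to a row of the original data, and all removed rows belong to the majority class, so X[res$id.rm, ] recovers the discarded instances.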

See Also

ubBalance

Examples

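library(unbalanced)
data(ubIonosphere)

# the response (Class) is the last column; all other columns are inputs
n <- ncol(ubIonosphere)
output <- ubIonosphere$Class
input <- ubIonosphere[, -n]

# under-sample the majority class so that positives make up about 40% of the data
data <- ubUnder(X = input, Y = output, perc = 40, method = "percPos")
newData <- cbind(data$X, data$Y)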

dalpozz/unbalanced documentation built on June 3, 2022, 2:42 a.m.