Over-/Undersample | R Documentation |
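For classification purposes we might want to have a balanced dataset. If the prevalence of the response variable is not 50%, we can resample the records so that there are as many cases of response A as of response B. Oversampling does this by drawing records of the less frequent response, with replacement, until their number matches that of the more frequent response. Undersampling instead draws from the records of the more frequent response only as many cases as there are of the less frequent one.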
OverSample(x, vname)
UnderSample(x, vname)
x | a data frame containing predictors and response |
vname | the name of the response variable to be used to over/undersample |
a data frame with balanced response variable
Andri Signorell <andri@signorell.net>
BestCut
OverSample(d.pima2, "diabetes")
UnderSample(d.pima2, "diabetes")
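
To make the resampling idea concrete, here is a minimal base R sketch of both strategies for a two-class response. It is not the DescTools implementation: the helper `balance_sketch`, its `method` argument, and the toy data frame `d` are hypothetical names introduced only for illustration, and the sketch assumes a binary response without missing values.

```r
# Hypothetical helper (NOT the DescTools code): balance a two-class
# response by resampling row indices.
balance_sketch <- function(x, vname, method = c("over", "under")) {
  method <- match.arg(method)
  tab <- table(x[[vname]])
  stopifnot(length(tab) == 2)            # sketch handles a binary response only

  minor <- names(tab)[which.min(tab)]    # label of the less frequent class
  i_min <- which(x[[vname]] == minor)    # rows of the less frequent class
  i_maj <- which(x[[vname]] != minor)    # rows of the more frequent class

  if (method == "over") {
    # draw minority rows with replacement until both classes have equal size
    i_min <- i_min[sample.int(length(i_min), length(i_maj), replace = TRUE)]
  } else {
    # keep only as many majority rows as there are minority rows
    i_maj <- i_maj[sample.int(length(i_maj), length(i_min))]
  }
  x[c(i_maj, i_min), , drop = FALSE]
}

# Toy data (assumed for illustration): 7 "neg" and 3 "pos" records
set.seed(1)
d <- data.frame(x1 = rnorm(10),
                y  = factor(rep(c("neg", "pos"), c(7, 3))))

table(balance_sketch(d, "y", "over")$y)   # class counts after oversampling
table(balance_sketch(d, "y", "under")$y)  # class counts after undersampling
```

With this toy data, oversampling returns 14 rows (7 per class, with some "pos" rows repeated) and undersampling returns 6 rows (3 per class). Which rows are drawn depends on the random seed.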