This function is identical to the function of the same name available in the
R package UBL.
The function performs a random under-sampling strategy for
imbalanced regression problems. Essentially, a percentage of the
cases in the "class(es)" (bumps below a user-defined relevance threshold)
selected by the user is randomly removed. Alternatively, the strategy
can be applied either to balance all the existing "classes" or to
"smoothly invert" the frequency of the examples in each "class".
RandUnderRegress(form, dat, rel = "auto", thr.rel = 0.5,
                 C.perc = "balance", repl = FALSE)
form |
A formula describing the prediction problem. |
dat |
A data frame containing the original imbalanced data set. |
rel |
The relevance function, which can be determined automatically ("auto", the default) or provided by the user as a matrix of interpolating points. |
thr.rel |
A number indicating the relevance threshold below which a case is considered as belonging to the normal "class". |
C.perc |
A vector containing the under-sampling percentage(s) to apply to all or each "class" (bump) obtained with the relevance threshold. Examples are randomly removed from the "class(es)". If only one percentage is provided, this value is reused for all the "classes" that have values below the relevance threshold. A different percentage can also be provided for each "class"; in this case, the percentages should be given in ascending order of target variable value. Each under-sampling percentage should be a number below 1, meaning that the normal cases (cases below the threshold) are under-sampled by the corresponding percentage. If the number 1 is provided, those examples are not changed. Alternatively, C.perc may be set to "balance" (the default) or "extreme", in which case the under-sampling percentages are automatically estimated to either balance or invert the frequencies of the examples in the "classes" (bumps). |
repl |
A logical value controlling whether examples may be repeated in the under-sampled data set. Defaults to FALSE. |
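As a rough illustration of how per-"class" percentages in C.perc could be read, the sketch below labels bumps as consecutive runs of target values (in ascending order) that stay on the same side of the relevance threshold, and then keeps a fraction of each normal bump. It is a base R sketch under stated assumptions, not the package's implementation: the helper under_sample_by_bump, the toy relevance values phi, and the reading of each percentage as the fraction of cases kept (consistent with 1 meaning "not changed") are all introduced here for illustration only.

```r
# Hypothetical helper (not part of any package): apply one keep-fraction
# per normal bump, with bumps ordered by ascending target value.
under_sample_by_bump <- function(y, relevance, perc, thr.rel = 0.5) {
  ord   <- order(y)                            # ascending target values
  below <- relevance[ord] < thr.rel            # TRUE for "normal" cases
  runs  <- rle(below)                          # consecutive runs = bumps
  bump  <- rep(seq_along(runs$lengths), runs$lengths)
  normal_bumps <- unique(bump[below])          # already in ascending order of y
  perc  <- rep_len(perc, length(normal_bumps)) # a single value is reused
  keep  <- ord[!below]                         # cases at/above threshold stay
  for (i in seq_along(normal_bumps)) {
    idx    <- ord[bump == normal_bumps[i]]
    n_keep <- round(perc[i] * length(idx))
    # sample positions within idx, so a bump of size 1 is handled correctly
    keep   <- c(keep, idx[sample.int(length(idx), n_keep)])
  }
  sort(keep)                                   # row indices to retain
}

set.seed(42)
y   <- c(1, 2, 3, 4, 20, 21, 40, 41, 42, 43)
phi <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0)         # assumed toy relevance values
kept <- under_sample_by_bump(y, phi, perc = c(0.5, 1))
y[kept]
```

With these toy values there are two normal bumps (low and high target values); the first is halved and the second is left unchanged, while the two relevant cases in the middle are always kept.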
The only difference between this function and the original function is in the requirements
imposed on the argument C.perc.
This function performs a random under-sampling strategy for dealing with
imbalanced regression problems. The examples removed are randomly selected among the
examples belonging to the normal "class(es)" (bumps with relevance below the defined threshold).
The user can choose one or more bumps to be under-sampled.
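The core idea described above can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions, not the code used by the function: under_sample_sketch is a hypothetical helper, the toy data frame is made up, and a simple 0/1 vector stands in for the relevance function that the package derives from rel.

```r
# Hypothetical helper (not part of any package): keep every case whose
# relevance is at or above thr.rel and a random fraction `perc` of the rest.
under_sample_sketch <- function(dat, relevance, thr.rel = 0.5, perc = 0.5,
                                repl = FALSE) {
  stopifnot(length(relevance) == nrow(dat), perc > 0, perc <= 1)
  normal <- which(relevance < thr.rel)    # candidates for removal
  rare   <- which(relevance >= thr.rel)   # always kept
  n_keep <- round(perc * length(normal))
  # repl = TRUE allows the same normal case to be drawn more than once
  kept   <- normal[sample.int(length(normal), n_keep, replace = repl)]
  dat[sort(c(rare, kept)), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(x = rnorm(100),
                  y = c(rnorm(90, 10, 1), rnorm(10, 30, 1)))
toy_rel <- as.numeric(toy$y > 20)         # assumed toy relevance: 1 = rare
reduced <- under_sample_sketch(toy, toy_rel, thr.rel = 0.5, perc = 0.5)
nrow(reduced)  # 10 rare cases + 45 of the 90 normal cases = 55
```

Only the normal cases are touched, so the rare, high-relevance part of the target distribution gains weight in the resulting data frame.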
The function returns a data frame with the new data set resulting from the application of the random under-sampling strategy.
Paula Branco, Rita P. Ribeiro, Luis Torgo (2016). UBL: an R Package for Utility-Based Learning. CoRR abs/1604.08079 [cs.MS]. URL: http://arxiv.org/abs/1604.08079
RandUnderRegress, RandOverRegress