WERCSRegress | R Documentation |
This function handles imbalanced regression problems by re-sampling the data set according to the relevance function provided. The relevance function is used to introduce replicas of the most important examples and to remove the least important ones.
WERCSRegress(form, dat, rel = "auto", thr.rel = NA,
C.perc = "balance", O = 0.5, U = 0.5)
form |
A formula describing the prediction problem |
dat |
A data frame containing the original (unbalanced) data set |
rel |
The relevance function, which can be determined automatically ("auto", the default) or provided by the user as a matrix of interpolating points. |
thr.rel |
The default is NA, which means that no threshold is used when performing the over/under-sampling. In this case, over-sampling is performed by assigning a higher selection probability to the examples with higher relevance, while under-sampling is performed by removing the examples with lower relevance. The user may choose a number between 0 and 1 indicating the relevance threshold above which a case is considered as belonging to the rare "class". |
C.perc |
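A list containing the percentage(s) of under- and/or over-sampling to apply to each "class" obtained with the threshold. As the usage and examples on this page show, it may also be given as the string "balance" (the default) or "extreme". This parameter is only used when a relevance threshold (thr.rel) is set; otherwise it is ignored. |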
O |
A number expressing the importance given to over-sampling when the thr.rel parameter is NA. When O increases, the number of examples to include during the over-sampling step also increases. Defaults to 0.5. |
U |
A number expressing the importance given to under-sampling when the thr.rel parameter is NA. When U increases, the number of examples selected during the under-sampling step also increases. Defaults to 0.5. |
The function returns a data frame with the new data set resulting from the application of the importance sampling strategy.
Paula Branco paobranco@gmail.com, Rita Ribeiro rpribeiro@dcc.fc.up.pt and Luis Torgo ltorgo@dcc.fc.up.pt
RandUnderRegress, RandOverRegress
if (requireNamespace("DMwR2", quietly = TRUE)) {
  data(algae, package = "DMwR2")
  clean.algae <- data.frame(algae[complete.cases(algae), ])

  # defining a threshold on the relevance
  IS.ext <- WERCSRegress(a7 ~ ., clean.algae, rel = "auto",
                         thr.rel = 0.7, C.perc = "extreme")
  IS.bal <- WERCSRegress(a7 ~ ., clean.algae, rel = "auto", thr.rel = 0.7,
                         C.perc = "balance")
  myIS <- WERCSRegress(a7 ~ ., clean.algae, rel = "auto", thr.rel = 0.7,
                       C.perc = list(0.2, 6))

  # neither threshold nor C.perc defined
  IS.auto <- WERCSRegress(a7 ~ ., clean.algae, rel = "auto")
}
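The behaviour described for thr.rel = NA can be illustrated with a small base R sketch. It is not the code used inside WERCSRegress: the toy data, the relevance definition (distance from the median, rescaled to [0, 1]) and the names toy, relevance, add_idx and rem_idx are all assumptions made for illustration. It also assumes that the examples "selected" during under-sampling are the ones removed.

```r
# Illustrative sketch only; this is not the code used inside WERCSRegress.
set.seed(42)
toy <- data.frame(x = runif(200), y = rexp(200))  # skewed target variable

# Hypothetical relevance: distance from the median, rescaled to [0, 1].
relevance <- abs(toy$y - median(toy$y))
relevance <- relevance / max(relevance)

O <- 0.5  # importance given to over-sampling
U <- 0.5  # importance given to under-sampling
n <- nrow(toy)

# Over-sampling: draw O * n replicas, favouring high-relevance rows.
add_idx <- sample(n, size = round(O * n), replace = TRUE, prob = relevance)

# Under-sampling: pick U * n rows to drop, favouring low-relevance rows.
rem_idx <- sample(n, size = round(U * n), replace = FALSE,
                  prob = 1 - relevance)

# New data set: original rows minus the dropped ones, plus the replicas.
resampled <- rbind(toy[-rem_idx, ], toy[add_idx, ])
```

Under these assumptions, raising O adds more replicas of high-relevance rows and raising U drops more low-relevance rows, so the share of rare target values in the result would grow with both parameters.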