View source: R/sampling_methods.R
Based on RandOverRegress (R package UBL).
This function performs a random over-sampling strategy for imbalanced regression problems, with a bias based on spatio-temporal contextual information. The cases of the "class(es)" selected by the user (bumps above a defined relevance threshold) are randomly over-sampled by a given percentage, with a sampling bias based on a spatio-temporal weight. Alternatively, the function can either balance all the existing "classes" (the default) or "smoothly invert" the frequency of the examples in each class.
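The biased draw can be pictured as follows. This is a minimal Python sketch, not the package's implementation: it assumes that each case already has a temporal weight and a spatial weight in [0, 1] plus a relevance value phi, and that the default "add" bias combines them additively using the alpha, beta and epsilon parameters listed under Arguments. Which component alpha and beta multiply is an assumption, and the helper name `resampling_probs` is hypothetical.

```python
# Hypothetical sketch (not the package's code): combine temporal weights,
# spatial weights and relevance (phi) into re-sampling probabilities.
from typing import List


def resampling_probs(
    temporal_w: List[float],
    spatial_w: List[float],
    phi: List[float],
    alpha: float = 0.5,
    beta: float = 0.9,
    epsilon: float = 1e-4,
) -> List[float]:
    """Return normalised sampling probabilities, assuming an additive bias."""
    if not (len(temporal_w) == len(spatial_w) == len(phi)):
        raise ValueError("all weight vectors must have the same length")
    weights = []
    for t_w, s_w, p in zip(temporal_w, spatial_w, phi):
        # Assumption: alpha weighs the temporal part, (1 - alpha) the spatial part.
        st_w = alpha * t_w + (1 - alpha) * s_w
        # Assumption: beta weighs the spatio-temporal weight, (1 - beta) the relevance.
        w = beta * st_w + (1 - beta) * p
        # epsilon keeps every case selectable, even when all its weights are 0.
        weights.append(w + epsilon)
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `resampling_probs([1.0, 0.0], [0.0, 0.0], [1.0, 1.0])` gives the first case roughly 85% of the probability mass, because only it carries a non-zero spatio-temporal weight; the epsilon term is what still leaves the second case a chance of being drawn.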
| Argument | Description |
|---|---|
| form | A model formula. |
| dat | The original training set (with the imbalanced distribution). |
| alpha | Weighting parameter for the temporal and spatial re-sampling probabilities. Default 0.5. |
| beta | Weighting parameter for the spatio-temporal weight and phi (the relevance) in the re-sampling probabilities. Default 0.9. |
| rel | The relevance function, either determined automatically with the uba package (the default) or provided by the user. |
| thr.rel | Relevance threshold above which a case is considered to belong to a rare "class". |
| epsilon | Minimum weight added to all observations. Default 1E-4. |
| C.perc | A vector containing the over-sampling percentage(s) to apply to all/each "class" (bump) obtained with the relevance threshold. Replicas of the examples are randomly added in each "class". If only one percentage is provided, this value is reused for all the "classes" with values above the relevance threshold. A different percentage can be provided for each "class"; in this case, the percentages should be given in ascending order of the target variable value. The over-sampling percentage(s) should be numbers above 0, meaning that the important cases (cases above the threshold) are over-sampled by the corresponding percentage; for example, if 1 is provided, the number of extreme examples is doubled. Alternatively, C.perc may be set to "balance" or "extreme", in which case the over-sampling percentages are estimated automatically to either balance or invert the frequencies of the examples in the "classes" (bumps). |
| repl | Whether sampling with replacement is allowed. |
| type | A character string indicating the type of bias used. Default is "add"; more types may be added in future work. |
| site_id | The name of the column containing the location IDs. |
| time | The name of the column containing the time-stamps. |
| sites_sf | An sf object containing the station IDs and the geometry points of the locations. As an alternative, provide lon, lat and crs. |
| lon | The name of the column containing the location's longitude. |
| lat | The name of the column containing the location's latitude. |
| crs | The code of the Coordinate Reference System. |
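The interplay of thr.rel and C.perc can be illustrated with a short Python sketch. It assumes that bumps are contiguous runs of cases, sorted by target value, that lie on the same side of thr.rel, and that a numeric C.perc adds round(C.perc * bump size) replicas, so that 1 doubles a bump as described for C.perc. The "balance" rule shown (growing each rare bump toward the average bump size), the treatment of phi equal to thr.rel as rare, and the helper names `find_bumps` and `replicas_per_bump` are all hypothetical; "extreme" is not sketched.

```python
# Hypothetical sketch: group cases into "classes" (bumps) by relevance and
# work out how many replicas to add to each rare bump from C.perc.
from typing import List, Sequence, Union


def find_bumps(y: Sequence[float], phi: Sequence[float], thr_rel: float) -> List[List[int]]:
    """Split case indices, sorted by target value, into runs on the same side of thr_rel."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    bumps: List[List[int]] = []
    for i in order:
        rare = phi[i] >= thr_rel  # treating phi == thr_rel as rare is an assumption
        if bumps and (phi[bumps[-1][-1]] >= thr_rel) == rare:
            bumps[-1].append(i)  # same side of the threshold: extend the current bump
        else:
            bumps.append([i])  # relevance crossed thr_rel: start a new bump
    return bumps


def replicas_per_bump(
    bumps: List[List[int]],
    phi: Sequence[float],
    thr_rel: float,
    c_perc: Union[float, List[float], str],
) -> List[int]:
    """Number of replicas to add to each bump (0 for bumps below thr_rel)."""
    if not bumps:
        return []
    rare = [k for k, b in enumerate(bumps) if phi[b[0]] >= thr_rel]
    n_total = sum(len(b) for b in bumps)
    if c_perc == "balance":
        # Assumption: grow each rare bump towards the average bump size.
        target = round(n_total / len(bumps))
        percs = [max(target / len(bumps[k]) - 1, 0) for k in rare]
    elif isinstance(c_perc, str):
        raise NotImplementedError("only 'balance' is sketched here")
    elif isinstance(c_perc, (int, float)):
        percs = [float(c_perc)] * len(rare)  # one value reused for every rare bump
    else:
        if len(c_perc) != len(rare):
            raise ValueError("need one percentage per rare bump")
        percs = list(c_perc)  # given in ascending order of target value
    counts = [0] * len(bumps)
    for k, perc in zip(rare, percs):
        counts[k] = round(perc * len(bumps[k]))  # perc = 1 doubles the bump
    return counts
```

With ten cases whose three largest targets are relevant, a single C.perc of 1 adds three replicas to the rare bump and none to the common one.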
The function returns a data frame with the new data set resulting from the application of the spatio-temporally biased over-sampling strategy.
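Conceptually, the returned data set is the original rows plus biased random replicas of the rare cases. The Python sketch below is only an illustration (rows are dictionaries rather than an R data frame); the function name `oversample`, its `seed` argument, and the use of Efraimidis-Spirakis keys for weighted sampling when repl is FALSE are assumptions, not the package's documented behaviour.

```python
# Hypothetical sketch: build the over-sampled data set by appending biased
# random replicas of each rare bump to the original rows.
import random
from typing import Dict, List, Sequence


def oversample(
    rows: List[Dict],
    bumps: List[List[int]],
    counts: Sequence[int],
    probs: Sequence[float],
    repl: bool = True,
    seed: int = 0,
) -> List[Dict]:
    rng = random.Random(seed)
    new_rows = list(rows)  # the original cases are always kept
    for bump, n_new in zip(bumps, counts):
        if n_new == 0:
            continue
        # Bias the draw towards high-probability cases; weights must be > 0
        # (an epsilon-style floor guarantees this).
        weights = [probs[i] for i in bump]
        if repl:
            picked = rng.choices(bump, weights=weights, k=n_new)
        else:
            if n_new > len(bump):
                raise ValueError("cannot draw more cases than the bump holds without replacement")
            # Weighted sampling without replacement (Efraimidis-Spirakis keys):
            # key = u ** (1 / w), keep the n_new largest keys.
            keys = {i: rng.random() ** (1.0 / w) for i, w in zip(bump, weights)}
            picked = sorted(bump, key=lambda i: keys[i], reverse=True)[:n_new]
        new_rows.extend(dict(rows[i]) for i in picked)
    return new_rows
```

Replicas are copied so that later edits to the output cannot alter the original rows, mirroring how the R function returns a new data frame.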
Paula Branco, Rita P. Ribeiro, Luis Torgo (2016). UBL: an R Package for Utility-Based Learning. CoRR abs/1604.08079 [cs.MS]. URL: http://arxiv.org/abs/1604.08079