RandomResample: Random re-sampling for imbalanced regression problems (with...


Description

Random re-sampling for imbalanced regression problems (with spatio-temporal bias)

Usage

RandomResample(
  form,
  dat,
  type,
  C.perc,
  thr.rel,
  rel = "auto",
  cf = 1.5,
  repl = ifelse(type == "under", FALSE, TRUE),
  pert = 0.1,
  time = NULL,
  site_id = NULL,
  bias = FALSE,
  ...
)
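
As an illustration, a call for under-sampling might look like the sketch below. The toy data, the column names, and the values chosen for C.perc and thr.rel are assumptions made for this example and do not come from the package documentation. The call is guarded so that it only runs when a function named RandomResample is available in the session.

```r
# Toy data; the column names and values are made up for illustration.
set.seed(123)
toy <- data.frame(x1 = rnorm(100), x2 = runif(100))
toy$y <- 2 * toy$x1 + rexp(100)   # right-skewed target with a few high values

# Arguments follow the Usage signature above; the values are assumptions.
args <- list(
  form    = y ~ x1 + x2,
  dat     = toy,
  type    = "under",   # under-sample the normal cases
  C.perc  = 0.5,       # assumed to be a proportion: keep half of each normal bump
  thr.rel = 0.9        # assumed relevance threshold
)

# Only call the function if the package providing it has been loaded.
if (exists("RandomResample", mode = "function")) {
  resampled <- do.call(RandomResample, args)
  str(resampled)
}
```

Since type = "under" and bias is left at FALSE, time and site_id are not supplied, in line with the argument descriptions.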

Arguments

form

A formula describing the prediction problem.

dat

A data frame containing the original imbalanced data set.

type

Type of re-sampling to apply. Can be one of "under", "over", and "gauss", depending on whether the user wants to under-sample normal cases, over-sample extreme cases or add Gaussian noise to replicated extreme cases.

C.perc

Vector containing percentage values (or a single value that will be used for all bumps). In under-sampling, C.perc of the size of each bump of normal values will be kept in the final data set. In the case of over-sampling and Gaussian noise, C.perc of the size of each bump of extreme values will be added to the final data set. Bumps are ordered in ascending order of the target value.
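
The effect of C.perc on a single bump in under-sampling can be sketched in base R as follows. The helper undersample_bump is hypothetical and not part of the package, and the sketch assumes that C.perc is expressed as a proportion (0.5 meaning 50%).

```r
# Hypothetical helper, not part of the package: keep a fraction `perc`
# of the row indices in one bump of normal cases.
undersample_bump <- function(bump_idx, perc, repl = FALSE) {
  n_keep <- round(perc * length(bump_idx))
  # sample.int avoids the surprising behaviour of sample() on length-1 input
  bump_idx[sample.int(length(bump_idx), size = n_keep, replace = repl)]
}

set.seed(1)
normal_bump <- 1:200   # row indices of one bump of normal cases
kept <- undersample_bump(normal_bump, perc = 0.5)
length(kept)           # half of the bump is kept
```

With repl = FALSE, the default for under-sampling, no row is selected twice.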

thr.rel

A number indicating the relevance threshold below which a case is considered as belonging to the normal "class".

rel

The relevance function which can be automatically ("auto") determined (the default) or may be provided by the user through a matrix with the interpolating points.

cf

Parameter needed if rel = 'auto'. The default is 1.5.

repl

A Boolean value controlling whether replication is allowed when re-sampling observations. Defaults to FALSE when under-sampling and to TRUE when over-sampling or adding Gaussian noise.

pert

Standard deviation of the Gaussian noise, as a percentage of each variable's original standard deviation. Only necessary if type = "gauss".
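
A minimal base R sketch of this kind of perturbation is given below. The helper add_gauss_noise and the toy data are hypothetical, and the sketch assumes that pert is a proportion and that the standard deviation is taken from the original data set; the package's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: add zero-mean Gaussian noise
# to replicated cases, with standard deviation pert * sd of the same column
# in the original data.
add_gauss_noise <- function(replicas, orig, cols, pert = 0.1) {
  for (col in cols) {
    noise_sd <- pert * sd(orig[[col]])
    replicas[[col]] <- replicas[[col]] +
      rnorm(nrow(replicas), mean = 0, sd = noise_sd)
  }
  replicas
}

set.seed(42)
orig <- data.frame(x1 = c(10, 12, 15, 11, 30), x2 = c(0.2, 0.9, 0.4, 0.7, 0.1))
extremes <- orig[5, , drop = FALSE]                # one extreme case
replicas <- extremes[rep(1, 3), , drop = FALSE]    # replicated three times
noisy <- add_gauss_noise(replicas, orig, cols = c("x1", "x2"), pert = 0.1)
```

Setting pert = 0 leaves the replicated cases unchanged, which corresponds to plain over-sampling with replication.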

time

Column name of the time-stamp (if available). Only necessary if bias = TRUE or type = "gauss".

site_id

Column containing location IDs (if available). Only necessary if bias = TRUE or type = "gauss".

bias

Boolean indicating whether spatio-temporal bias should be factored in while re-sampling.

...

Parameters to feed to sample_wts in case bias = TRUE.

References

Paula Branco, Rita P. Ribeiro, Luis Torgo (2016). UBL: an R Package for Utility-Based Learning. CoRR abs/1604.08079 [cs.MS]. URL: http://arxiv.org/abs/1604.08079

See Also

RandUnderRegress, sample_wts.


mrfoliveira/STResampling-JDSA2020 documentation built on June 28, 2021, 7:01 p.m.