econModel: Meaningful Social and Economic Data from ALFRED and elsewhere


From an xts object, produce a rebalanced data set: observations are removed, replicated exactly, or copied as "jittered" values near existing observations. The workhorse is the CRAN package UBL (Utility-Based Learning) and its *Regress functions; this function is a smart(er) wrapper around them. The UBL functions are:

GaussNoiseRegress : function (form, dat, rel = "auto", thr.rel = 0.5,
                              C.perc = "balance", pert = 0.1, repl = FALSE)

# default
# from current data, makes "exact replicated" copies
ImpSampRegress : function (form, dat, rel = "auto", thr.rel = NA,
                           C.perc = "balance", O = 0.5, U = 0.5)

RandOverRegress : function (form, dat, rel = "auto", thr.rel = 0.5,
                            C.perc = "balance", repl = TRUE)

RandUnderRegress : function (form, dat, rel = "auto", thr.rel = 0.5,
                             C.perc = "balance", repl = FALSE)

# from current data, makes "jittered" copies
SmoteRegress : function (form, dat, rel = "auto", thr.rel = 0.5,
                         C.perc = "balance", k = 5, repl = FALSE,
                         dist = "Euclidean", p = 2)

UtilOptimRegress : function (form, train, test, type = "util",
                             strat = "interpol",
                             strat.parms = list(method = "bilinear"),
                             control.parms, m.pts, minds, maxds, eps = 0.1)

# Help with the control.parms parameter of UtilOptimRegress (just above)

    phi.control : function (y, method = "extremes", extr.type = "both",
                            coef = 1.5, control.pts = NULL)

rebalanceData(
  x,
  x2 = NULL,
  Fmla = NULL,
  TrainDates = NULL,
  TestDates = NULL,
  UBLFunction = NULL,
  ...
)

`x`	xts object of training data. Required; there is no default.
`x2`	xts object of testing data. Default is NULL. Required by UtilOptimRegress and used only with it; supplying it with any other UBL function is an error.
`Fmla`	Formula that is sent to the UBL function. Required, although the default is NULL.
`TrainDates`	Default is NULL. Not required. Absolute training start and end dates (times), given as a vector holding one pair. Alternatively, this can be a list of such vectors.
`TestDates`	Default is NULL. Not required. This parameter can only be used with UtilOptimRegress. Absolute testing start and end dates (times), given as a vector holding one pair. Alternatively, this can be a list of such vectors.
`UBLFunction`	Default is NULL, in which case the ImpSampRegress function is used. Not required. A *Regress function from the R package UBL. Enter the function name either enclosed in a "string" or as a bare function name.
`...`	Dots passed to the UBL function. The defaults follow. thr.rel = 0.5. C.perc = list(1, 2), which keeps the not-important data at its current size (100%) and doubles the important data (200%). Relevance function (rel): xts coredata values greater than zero are important; xts coredata values less than zero are not important.
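The effect of the default C.perc can be checked with plain arithmetic. The sketch below is an illustration only and calls neither UBL nor rebalanceData. It assumes that the relevance split is simply "value greater than zero" and that the two entries of C.perc multiply the sizes of the not-important and the important group, in that order. The names `important`, `c_perc`, `before` and `expected` are hypothetical.

```r
# Illustrative arithmetic only; no UBL call is made here.
set.seed(1L)
y <- rnorm(1000, 0, 1)

# Assumed split: values greater than zero are "important".
important <- y > 0

# C.perc = list(1, 2): keep the not-important group as is,
# double the important group.
c_perc <- list(1, 2)

# Group sizes before rebalancing.
before <- c(not_important = sum(!important), important = sum(important))

# Group sizes one would expect after rebalancing under the assumptions above.
expected <- c(
  not_important = before[["not_important"]] * c_perc[[1]],
  important     = before[["important"]]     * c_perc[[2]]
)

print(before)
print(expected)
```

With C.perc = list(0.5, 1) the same arithmetic would halve the not-important group and leave the important group unchanged.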

A modified xts object from which observations have been removed and/or to which observations have been added. Added observations appear as duplicate (or multiple) index entries at existing time points, carrying either "jittered" or "exact replicated" coredata values.
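Because the result may hold several rows at the same time point, it can help to count how often each index value occurs. The base R sketch below uses a hypothetical Date vector `idx` that stands in for the index of a rebalanced result (for an xts object the index could be obtained with `zoo::index()`). The names `counts`, `repeated` and `extra_rows` are likewise hypothetical.

```r
# Hypothetical index standing in for the index of a rebalanced result.
idx <- as.Date(c("2021-01-01", "2021-01-02", "2021-01-02",
                 "2021-01-03", "2021-01-03", "2021-01-03"))

# Count the rows per time point.
counts <- table(idx)

# Time points that occur more than once (replicas or jittered copies).
repeated <- as.Date(names(counts)[as.vector(counts) > 1])

# Logical flag per row: TRUE for every row after the first
# one seen at the same time point.
extra_rows <- duplicated(idx)

print(repeated)
print(sum(extra_rows))
```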

SmoteRegress challenges #2 https://github.com/paobranco/UBL/issues/2

question about new/replicated UBL data and range of creation area #3 https://github.com/paobranco/UBL/issues/3

P. Branco, L. Torgo and R.P. Ribeiro, Pre-processing approaches for imbalanced distributions in regression, Neurocomputing, https://doi.org/10.1016/j.neucom.2018.11.100 https://web.cs.dal.ca/~branco/PDFfiles/j14.pdf

Proceedings of Machine Learning Research, Volume 74, 11 October 2017 https://github.com/mlresearch/v74

(BROKEN LINK) Luis Torgo: Learning with Imbalanced Domains, a tutorial, 2nd International Workshop on Learning with Imbalanced Domains: Theory and Applications Co-located with ECML/PKDD 2018 http://lidta.dcc.fc.up.pt/Slides/TutorialLIDTA.pdf

Paula Branco, Rita P. Ribeiro, Luis Torgo: UBL: an R package for Utility-based Learning, (Submitted on 27 Apr 2016 (v1), last revised 12 Jul 2016 (this version, v2)) https://arxiv.org/abs/1604.0807

Ribeiro, R.P.: Utility-based Regression. PhD thesis, Dep. Computer Science, Faculty of Sciences - University of Porto (2011), Chapter 3 Utility-based Regression https://www.dcc.fc.up.pt/~rpribeiro/publ/rpribeiroPhD11.pdf

Paula Branco, Luis Torgo and Rita Ribeiro: A Survey of Predictive Modeling on Imbalanced Domains, ACM Comput. Surv., 2016, volume 49, number 2, article 31 https://web.cs.dal.ca/~ltorgo/publication/2016_btr16/2016_BTR16.pdf

## Not run: 
set.seed(1L)
DataValues <- data.frame(x = as.numeric(seq_len(1000)), y = rnorm(1000, 0, 1))
row.names(DataValues) <- seq_len(1000)

table(DataValues$y > 0.00)
# FALSE  TRUE
#   518   482

# Relevance function
Rlvce <- matrix(c(-0.01, 0, 0, 0.00, 0.5, 0.5, 0.01, 1, 0), ncol = 3, byrow = TRUE,
                dimnames = list(
                  yvalues = character(),
                  col = c("yvalues", "relevance", "slope_of_y_values")
                )
)

# Relevant observations: important to me.
# I want MORE of these "relevant" observations
# (compared to "not very relevant" observations.)
#
# yvalues: negative(-) values are not VERY relevant
# yvalues: positive(+) values are VERY relevant
# relevance column:  0 - not very relevant, 1 - very relevant
#
# The relevance function defines a smooth, non-straight line.
# It is built from exactly these three columns: yvalues, relevance,
# and slope_of_y_values.
# See the references.
# This relevance function is a curved line shaped like half of a hill.
#
#
Rlvce
# +/-
#        col
# yvalues yvalues relevance slope_of_y_values
#    [1,]   -0.01       0.0               0.0 # yvalues less than thr.rel (bottom of hill)
#    [2,]    0.00       0.5               0.5 # relevance col: thr.rel = 0.5
#    [3,]    0.01       1.0               0.0 # yvalues greater than thr.rel (top of hill)

# default "threshold of relevance" (thr.rel) between the "not very relevant"
# and "relevant" ranges
# "thr.rel = 0.5"                                                # [1,]->[2,] [2,]->[3,]
Results <- UBL::SmoteRegress(y ~ ., DataValues, rel = Rlvce, C.perc = list(0.5, 2.5))

# no change
Results <- UBL::SmoteRegress(y ~ ., DataValues, rel = Rlvce, C.perc = list(1, 1))
identical(sort(DataValues[,"x"]), sort(Results[,"x"]))
# [1] TRUE
identical(sort(DataValues[,"y"]), sort(Results[,"y"]))
# [1] TRUE

# new jitters of the current data
#
# double the number of (important) relevant observations
# default "thr.rel = 0.5"
#                                           # 100% percent, # 200% percent
Results <- UBL::SmoteRegress(y ~ ., DataValues, rel = Rlvce, C.perc = list(1, 2))

table(Results$y > 0.00)
# FALSE  TRUE
#   518   964

# new replicas of the current data
#
# ImpSampRegress defaults to thr.rel = NA; set thr.rel = 0.5
# to create/destroy observations the way SmoteRegress does
Results <- UBL::ImpSampRegress(y ~ ., DataValues, rel = Rlvce, thr.rel = 0.5, C.perc = list(1, 2))
table(Results$y > 0.00)
# FALSE  TRUE
#   518   964

# see the replicated data points
tail(Results[order(Results$x),],30)
# Results[order(as.integer(row.names(Results))),]

# halve the number of (un-important) not very relevant observations
#
Results <- UBL::SmoteRegress(y ~ ., DataValues, rel = Rlvce, C.perc = list(0.5, 1))
table(Results$y > 0.00)
# FALSE  TRUE
#   259   482

Results <- UBL::ImpSampRegress(y ~ ., DataValues, rel = Rlvce, thr.rel = 0.5, C.perc = list(0.5, 1))
table(Results$y > 0.00)
# FALSE  TRUE
#   259   482

# xts object

DataIndex  <- zoo::as.Date(0L:999L)
DataXts <- xts::as.xts(DataValues, DataIndex, dateFormat= "Date")
table(DataXts[,"y"] > 0.00)

# Arguments are named so that they match the Usage signature
# (x is the xts object, Fmla is the formula).

# double the "important" data (jitters)
ResultsXts <- rebalanceData(x = DataXts, Fmla = y ~ ., UBLFunction = "UBL::SmoteRegress")
table(ResultsXts[,"y"] > 0.00)

# double the "important" data (exact data)
ResultsXts <- rebalanceData(x = DataXts, Fmla = y ~ .)
table(ResultsXts[,"y"] > 0.00)

# halve the "not important" data
ResultsXts <- rebalanceData(x = DataXts, Fmla = y ~ ., C.perc = list(0.5, 1))
table(ResultsXts[,"y"] > 0.00)

## End(Not run)
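
The relevance matrix in the examples lists control points as (y value, relevance, slope). One way to picture the resulting "half of a hill" curve is a cubic Hermite interpolant through those points. The sketch below uses `stats::splinefunH` from base R as a stand-in. It is an assumption that this approximates the curve UBL builds internally; UBL's own interpolation may differ in detail. The names `ctrl`, `phi_sketch` and `y_probe` are hypothetical.

```r
# Control points copied from the Rlvce matrix in the examples:
# y value, relevance at that value, slope of the relevance curve there.
ctrl <- matrix(c(-0.01, 0.0, 0.0,
                  0.00, 0.5, 0.5,
                  0.01, 1.0, 0.0),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("yvalues", "relevance", "slope")))

# Cubic Hermite interpolant through the control points
# (a stand-in for illustration, not UBL's own code).
phi_sketch <- stats::splinefunH(ctrl[, "yvalues"],
                                ctrl[, "relevance"],
                                ctrl[, "slope"])

# Relevance at a few y values. Outside the control points the curve is
# continued linearly with the end slopes, which are zero here, so it stays flat.
y_probe <- c(-1, -0.01, 0, 0.01, 1)
print(phi_sketch(y_probe))

# Plot the curve around zero to see the half-hill shape (optional):
# curve(phi_sketch(x), from = -0.02, to = 0.02, xlab = "y", ylab = "relevance")
```

Under these assumptions, strongly negative values map to a relevance near 0 and strongly positive values to a relevance near 1, with the thr.rel = 0.5 crossing at y = 0.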

AndreMikulec/econModel documentation built on June 30, 2021, 9:48 a.m.

rebalanceData: Create/Remove More or Less Observations
In AndreMikulec/econModel: Meaningful Social and Economic Data from ALFRED and elsewhere
