Generic function for contaminating data.
contaminate(x, control, ...)

## S4 method for signature 'data.frame,ContControl'
contaminate(x, control, i)
x
the data to be contaminated.
control
a control object of a class inheriting from the virtual class
i
an integer giving the element of the slot
...
if
With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers.
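The two steps can be illustrated in base R. This is only a minimal sketch under stated assumptions, not simFrame's implementation: the helper twoStepContaminate() is hypothetical, observations are selected completely at random, and the outliers are drawn from a normal distribution.

```r
# Hypothetical helper (not part of simFrame) illustrating the two steps.
twoStepContaminate <- function(x, target, epsilon, mean, sd) {
  n <- nrow(x)
  # Step 1: select the observations to be contaminated
  # (assumption: a fixed number chosen completely at random)
  nCont <- round(epsilon * n)
  idx <- sample.int(n, nCont)
  # Step 2: model the distribution of the outliers
  # (assumption: normal distribution with the given mean and sd)
  x[idx, target] <- rnorm(nCont, mean = mean, sd = sd)
  # logical indicator of the contaminated observations
  x$.contaminated <- seq_len(n) %in% idx
  x
}

set.seed(1)
dat <- data.frame(id = 1:20, eqIncome = rlnorm(20, meanlog = 10))
res <- twoStepContaminate(dat, "eqIncome", epsilon = 0.05,
                          mean = 5e+05, sd = 10000)
```

With 20 observations and a contamination level of 0.05, one observation is selected and replaced.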
In order to extend the framework by a user-defined control class "MyContControl" (which must extend "VirtualContControl"), a method contaminate(x, control, i) with signature 'data.frame, MyContControl' needs to be implemented. In case the contaminated observations need to be identified at a later stage of the simulation, e.g., if conflicts with inserting missing values should be avoided, a logical indicator variable ".contaminated" should be added to the returned data set.
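The extension pattern might look roughly like the sketch below. So that it runs without simFrame installed, it uses stand-ins with hypothetical names: "SketchVirtualContControl" plays the role of "VirtualContControl" and contaminateSketch() plays the role of the contaminate generic. The slot names target, epsilon and multiplier are assumptions made for this illustration.

```r
library(methods)

# Stand-ins for the virtual control class and the generic (hypothetical names)
setClass("SketchVirtualContControl",
         representation("VIRTUAL", target = "character", epsilon = "numeric"))
setGeneric("contaminateSketch",
           function(x, control, ...) standardGeneric("contaminateSketch"))

# User-defined control class: scales the selected values by a constant
setClass("MyContControl", contains = "SketchVirtualContControl",
         representation(multiplier = "numeric"))

# Method with signature 'data.frame, MyContControl'
setMethod("contaminateSketch",
          signature(x = "data.frame", control = "MyContControl"),
          function(x, control, i = 1) {
            n <- nrow(x)
            # use the i-th contamination level
            nCont <- round(control@epsilon[i] * n)
            idx <- sample.int(n, nCont)
            x[idx, control@target] <- x[idx, control@target] * control@multiplier
            # indicator variable for later stages of the simulation
            x$.contaminated <- seq_len(n) %in% idx
            x
          })

set.seed(2)
foo <- data.frame(V1 = rnorm(10, mean = 0, sd = 2))
ctrl <- new("MyContControl", target = "V1", epsilon = 0.2, multiplier = 100)
out <- contaminateSketch(foo, ctrl)
```

In an actual extension, the class would extend "VirtualContControl" and the method would be defined for contaminate itself.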
A data.frame containing the contaminated data. In addition, the column ".contaminated", which consists of logicals indicating the contaminated observations, is added to the data.frame.
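As a rough illustration of how the ".contaminated" column could be used at a later stage, the base R sketch below inserts missing values only into observations that were not contaminated. The data values are made up for the example, and the selection of two rows is an arbitrary assumption.

```r
# Toy data in the shape of a returned data set (values are made up)
res <- data.frame(eqIncome = c(21000, 5e+05, 18500, 23000, 19750),
                  .contaminated = c(FALSE, TRUE, FALSE, FALSE, FALSE))

set.seed(3)
# candidates for missing values: rows that were not contaminated
clean <- which(!res$.contaminated)
# sample positions within 'clean' so that a single candidate is handled safely
naIdx <- clean[sample.int(length(clean), 2)]
res$eqIncome[naIdx] <- NA
```

The contaminated observation keeps its value, so outliers and missing values do not conflict.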
x = "data.frame", control = "character"
contaminate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "ContControl"
contaminate data as defined by the control object control.
x = "data.frame", control = "missing"
contaminate data using a control object of class "ContControl". Its slots may be supplied as additional arguments.
Since version 0.3, contaminate no longer checks whether the auxiliary variable with probability weights is numeric and contains only finite positive values (sample still throws an error in these cases). This check was removed to improve computational performance in simulation studies.
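If such a check is still wanted, it can be done once before the simulation starts. The sketch below uses a hypothetical helper validWeights(), which is not part of simFrame, and shows with base R's sample() that a negative weight results in an error.

```r
# Hypothetical helper: TRUE if w is numeric with only finite positive values
validWeights <- function(w) {
  is.numeric(w) && all(is.finite(w)) && all(w > 0)
}

good <- c(1, 2, 3, 4)
bad <- c(1, -2, 3, 4)

okGood <- validWeights(good)
okBad <- validWeights(bad)

# sample() signals an error for the negative weight; capture its message
err <- tryCatch(sample(4, 2, prob = bad),
                error = function(e) conditionMessage(e))
```

Checking the weights once up front keeps the cost out of the repeated simulation runs.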
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
"DCARContControl"
, "DARContControl"
,
"ContControl"
, "VirtualContControl"
library("simFrame")

## distributed completely at random
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)

# using a control object
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)

# supply slots of control object as arguments
contaminate(sam, target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000))

## distributed at random
foo <- generate(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))

# using a control object
darc <- DARContControl(target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, darc)

# supply slots of control object as arguments
contaminate(foo, "DARContControl", target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)