contaminate: Contaminate data


Description

Generic function for contaminating data.

Usage

contaminate(x, control, ...)

## S4 method for signature 'data.frame,ContControl'
contaminate(x, control, i)

Arguments

x

the data to be contaminated.

control

a control object of a class inheriting from the virtual class "VirtualContControl" or a character string specifying such a control class (the default being "DCARContControl").

i

an integer giving the element of the slot epsilon of control to be used as contamination level.

...

if control is a character string or missing, the slots of the control object may be supplied as additional arguments. See "DCARContControl" and "DARContControl" for details on the slots.

Details

With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers.
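The two steps can be illustrated with a minimal base R sketch. This is not the simFrame implementation: the data, the contamination level, the rounding rule for the number of outliers, and the outlier distribution are all assumptions chosen for illustration.

```r
# Base R illustration of the two-step idea (not simFrame's internal code)
set.seed(1)
x <- data.frame(id = 1:20, value = rnorm(20, mean = 100, sd = 10))
epsilon <- 0.1  # assumed contamination level

# Step 1: select the observations to be contaminated
# (rounding rule is an assumption made for this sketch)
n_cont <- round(epsilon * nrow(x))
idx <- sample(nrow(x), n_cont)

# Step 2: model the distribution of the outliers
x$value[idx] <- rnorm(n_cont, mean = 1000, sd = 10)

# Flag the contaminated observations
x$.contaminated <- seq_len(nrow(x)) %in% idx
```

Here the selection in step 1 does not depend on the data, which roughly corresponds to outliers distributed completely at random.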

In order to extend the framework by a user-defined control class "MyContControl" (which must extend "VirtualContControl"), a method contaminate(x, control, i) with signature 'data.frame, MyContControl' needs to be implemented. In case the contaminated observations need to be identified at a later stage of the simulation, e.g., if conflicts with inserting missing values should be avoided, a logical indicator variable ".contaminated" should be added to the returned data set.
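A rough sketch of such an extension is given below. So that it runs on its own, it defines simplified stand-ins for "VirtualContControl" and the generic contaminate; with simFrame attached, those two definitions would be dropped and the package's own class and generic used instead. The slot shift and the additive contamination rule are hypothetical and serve only as an example.

```r
library(methods)

# Simplified stand-ins so the sketch is self-contained.
# With simFrame attached, omit these two definitions.
setClass("VirtualContControl",
    representation("VIRTUAL", target = "character", epsilon = "numeric"))
setGeneric("contaminate",
    function(x, control, ...) standardGeneric("contaminate"))

# Hypothetical user-defined control class with one extra slot
setClass("MyContControl", contains = "VirtualContControl",
    representation(shift = "numeric"))

setMethod("contaminate",
    signature(x = "data.frame", control = "MyContControl"),
    function(x, control, i = 1) {
        eps <- control@epsilon[i]          # i-th contamination level
        n <- round(eps * nrow(x))          # number of outliers (assumed rule)
        idx <- sample(nrow(x), n)          # select observations
        # shift the selected values of the target variable
        x[idx, control@target] <- x[idx, control@target] + control@shift
        # indicator for later stages of the simulation
        x$.contaminated <- seq_len(nrow(x)) %in% idx
        x
    })

# usage
ctrl <- new("MyContControl", target = "value", epsilon = 0.2, shift = 100)
dat <- data.frame(value = rnorm(10))
res <- contaminate(dat, ctrl)
```

The returned data set carries the logical column ".contaminated", so a later step can identify the modified observations.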

Value

A data.frame containing the contaminated data. In addition, the column ".contaminated", which consists of logicals indicating the contaminated observations, is added to the data.frame.

Methods

x = "data.frame", control = "character"

contaminate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.

x = "data.frame", control = "ContControl"

contaminate data as defined by the control object control.

x = "data.frame", control = "missing"

contaminate data using a control object of class "ContControl". Its slots may be supplied as additional arguments.

Note

Since version 0.3, contaminate no longer checks whether the auxiliary variable with probability weights is numeric and contains only finite positive values (sample still throws an error in these cases). The check was removed to improve computational performance in simulation studies.
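If such a check is still wanted, it can be run once before the simulation starts. The helper below is a hypothetical sketch in base R and is not part of simFrame; the name check_aux and its arguments are assumptions.

```r
# Hypothetical helper (not part of simFrame): returns TRUE if the auxiliary
# variable is numeric and contains only finite positive values
check_aux <- function(x, aux) {
    w <- x[[aux]]
    is.numeric(w) && all(is.finite(w)) && all(w > 0)
}

ok <- check_aux(data.frame(w = c(0.5, 1, 2)), "w")
bad <- check_aux(data.frame(w = c(1, NA, -3)), "w")
```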

Author(s)

Andreas Alfons

References

Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.

Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.

Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.

Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.

See Also

"DCARContControl", "DARContControl", "ContControl", "VirtualContControl"

Examples

## outliers distributed completely at random (DCAR)
library(simFrame)
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)

# using a control object
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)

# supply slots of control object as arguments
contaminate(sam, target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000))


## outliers distributed at random (DAR)
foo <- generate(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))

# using a control object
darc <- DARContControl(target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, darc)

# supply slots of control object as arguments
contaminate(foo, "DARContControl", target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)

simFrame documentation built on Oct. 14, 2021, 5:24 p.m.