simulateData: Simulate missing/censored data

View source: R/simulateData.R

Description

Data generator for missing/censored data with Normal distribution.

Usage

simulateData(n, param.formula = list(mu = ~exp(x1) + x2, sigma =
  ~sqrt(x2)), name = "x1", subset = NULL, prob = 0.8, damage = 1/3,
  family = "NO", correlation = NULL)
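
As a reading aid, the default 'param.formula' can be pictured with a few lines of base R. This is a minimal sketch, not the package's implementation: it assumes the covariates x1 and x2 are standard uniform draws, it uses rnorm() as a stand-in for the gamlss 'NO' family (sigma is read as a standard deviation), and the name 'sketch_true' is hypothetical.

```r
# Sketch only: mimics the default param.formula with base R.
set.seed(1)                      # reproducible draws
n  <- 100                        # number of generated observations
x1 <- runif(n)                   # assumption: covariates are standard uniform
x2 <- runif(n)
mu    <- exp(x1) + x2            # mu    = ~exp(x1) + x2
sigma <- sqrt(x2)                # sigma = ~sqrt(x2)
y  <- rnorm(n, mean = mu, sd = sigma)   # stand-in for family = "NO"
sketch_true <- data.frame(y = y, x1 = x1, x2 = x2)   # hypothetical name
```

Each observation gets its own mean and standard deviation, which is what the per-parameter formulas express.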

Arguments

n

Number of generated observations.

param.formula

list. Formulas of the parameters to be estimated.

name

character. Name of the variable to be defected.

subset

formula. States a condition (e.g. ~x1 > 0.6) that selects the observations eligible to be defected.
Default: the entire dataset is potentially subject to defect. Note that if 'subset' does not exclusively use the 'name' variable, the independence assumption of MICE is violated (on purpose).
Example of such a condition: (x2 < 0.3 & x3 < 0.2).

prob

numeric value. Specifies the binomial probability for each observation in 'subset' to be defected.

damage

Defined by the user; specifies the type of defect and how the data are defected.
'damage' = NA generates missing data.
A single value in [0, 1] implies right censoring (e.g. 'damage' = 1/3); a value of 1 or greater implies left censoring. The true value of 'name' is multiplied by this value to defect the data.
The generalization to fixed interval factors is 'damage' = list(1/3, 4/3), where the values specify the factors for the lower and the upper bound respectively.
More realistic examples can be generated with a vector-valued 'damage': if 'damage' = c(0.1, 1) is a vector of length 2, it specifies the minimum and maximum of a uniform distribution from which a factor is drawn at random for each observation; the true value is multiplied by that factor.
The generalization to random interval factors is 'damage' = list(c(0.2, 1), c(1, 3)), where the first vector specifies the uniform interval for factors affecting the lower bound and the second the interval for factors affecting the upper bound.
NOTE: if a list is provided, both members must be either vectors or single values.

family

character. Specifies the gamlss family from which the dependent variable is drawn, e.g. 'NO'.

correlation

matrix. If a correlation/covariance matrix is provided, the variables are still drawn uniformly, but are correlated according to this matrix.
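
The interplay of 'subset', 'prob' and a single-valued 'damage' described above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: x1 is assumed to be standard uniform, and the names 'in_subset', 'hit' and 'observed' are hypothetical.

```r
# Sketch only: one way to read 'subset', 'prob' and a scalar 'damage'.
set.seed(2)
n  <- 200
x1 <- runif(n)                        # assumption: true values of 'name'
in_subset <- x1 > 0.6                 # subset = ~x1 > 0.6
prob <- 0.8
# Each observation in the subset is defected with binomial probability 'prob'.
hit <- in_subset & (rbinom(n, size = 1, prob = prob) == 1)
damage   <- 1/3                       # factor below 1: right censoring
observed <- ifelse(hit, damage * x1, x1)   # true value times the factor
```

Because the factor is below 1, every defected value lies below its true value, which is why the true value is right censored at the observed one.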

Value

List of data frames. 'truedata' and 'defected' are data frames containing the dependent variable (generated according to the param.formula list), the generated covariates, and a censoring/missing 'indicator'. The only difference between the two data frames is that 'defected' has artificially generated censored/missing values according to the 'defect' specification.

Examples

# missing: damage = NA
# right: damage = ~ 1/3*x1
# rightRandom:  damage = c(0.01, 1)
# left:  damage = 4/3
# intervalfix: damage = list(1/3, 4/3)
# intervalRandom: damage = list(c(0.01, 1), c(1.01, 2))
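
To make the numeric and list forms of 'damage' listed above concrete, the following base R sketch turns a 'damage' value into per-observation factors. The helper 'damage_factors' is hypothetical and not part of the package; returning identical lower and upper factors for the non-list case is a choice made for this illustration only. The formula form is not covered.

```r
# Sketch only: hypothetical helper, not part of the package.
damage_factors <- function(damage, n) {
  one <- function(d) {
    if (length(d) == 2) {
      runif(n, min = d[1], max = d[2])   # random factor per observation
    } else {
      rep(d, n)                          # fixed factor (or NA for missing)
    }
  }
  if (is.list(damage)) {
    # first member: lower-bound factors, second member: upper-bound factors
    data.frame(lower = one(damage[[1]]), upper = one(damage[[2]]))
  } else {
    f <- one(damage)
    data.frame(lower = f, upper = f)     # illustration-only convention
  }
}

set.seed(3)
fix  <- damage_factors(list(1/3, 4/3), n = 5)                 # intervalfix
rand <- damage_factors(list(c(0.01, 1), c(1.01, 2)), n = 5)   # intervalRandom
left <- damage_factors(4/3, n = 5)                            # left
miss <- damage_factors(NA, n = 5)                             # missing
```

Multiplying the true values of 'name' by these factors would give the defected bounds; an NA factor yields NA, that is, a missing value.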

TiStat/Imputegamlss documentation built on May 20, 2019, 9:25 a.m.