addMissing: Add missing values to a vector given a MCAR, MAR, or MNAR...
In SimDesign: Structure for Organizing Monte Carlo Simulation Designs

addMissing

R Documentation

Add missing values to a vector given a MCAR, MAR, or MNAR scheme

Description

Given an input vector, replace elements of this vector with missing values according to some scheme. Default method replaces input values with a MCAR scheme (where on average 10% of the values will be replaced with NAs). MAR and MNAR are supported by replacing the default FUN argument.

Usage

addMissing(y, fun = function(y, rate = 0.1, ...) rep(rate, length(y)), ...)

Arguments

`y`	an input vector that should contain missing data in the form of `NA`'s
`fun`	a user defined function indicating the missing data mechanism for each element in `y`. Function must return a vector of probability values with the length equal to the length of `y`. Each value in the returned vector indicates the probability that the respective element in y will be replaced with `NA`. Function must contain the argument `y`, representing the input vector, however any number of additional arguments can be included
`...`	additional arguments to be passed to `FUN`

Details

Given an input vector y, and other relevant variables inside (X) and outside (Z) the data-set, the three types of missingness are:

MCAR: Missing completely at random (MCAR). This is realized by randomly sampling the values of the input vector (y) irrespective of the possible values in X and Z. Therefore missing values are randomly sampled and do not depend on any data characteristics and are truly random
MAR: Missing at random (MAR). This is realized when values in the dataset (X) predict the missing data mechanism in y; conceptually this is equivalent to P(y = NA | X). This requires the user to define a custom missing data function
MNAR: Missing not at random (MNAR). This is similar to MAR except that the missing mechanism comes from the value of y itself or from variables outside the working dataset; conceptually this is equivalent to P(y = NA | X, Z, y). This requires the user to define a custom missing data function

Value

the input vector y with the sampled NA values (according to the FUN scheme)

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.20982/tqmp.16.4.p248")}

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10691898.2016.1246953")}

Examples

## Not run: 

set.seed(1)
y <- rnorm(1000)

## 10% missing rate with default FUN
head(ymiss <- addMissing(y), 10)

## 50% missing with default FUN
head(ymiss <- addMissing(y, rate = .5), 10)

## missing values only when female and low
X <- data.frame(group = sample(c('male', 'female'), 1000, replace=TRUE),
                level = sample(c('high', 'low'), 1000, replace=TRUE))
head(X)

fun <- function(y, X, ...){
    p <- rep(0, length(y))
    p[X$group == 'female' & X$level == 'low'] <- .2
    p
}

ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missingness as a function of elements in X (i.e., a type of MAR)
fun <- function(y, X){
   # missingness with a logistic regression approach
   df <- data.frame(y, X)
   mm <- model.matrix(y ~ group + level, df)
   cfs <- c(-5, 2, 3) #intercept, group, and level coefs
   z <- cfs %*% t(mm)
   plogis(z)
}

ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missing values when y elements are large (i.e., a type of MNAR)
fun <- function(y) ifelse(abs(y) > 1, .4, 0)
ymiss <- addMissing(y, fun=fun)
tail(cbind(y, ymiss), 10)


## End(Not run)

SimDesign documentation built on April 3, 2025, 6:06 p.m.