delete_MAR_rank: Create MAR values using a ranking mechanism

delete_MAR_rank    R Documentation

Create MAR values using a ranking mechanism

Description

Create missing at random (MAR) values in a data frame or a matrix using a ranking mechanism.

Usage

delete_MAR_rank(
  ds,
  p,
  cols_mis,
  cols_ctrl,
  n_mis_stochastic = FALSE,
  ties.method = "average",
  miss_cols,
  ctrl_cols
)

Arguments

ds

A data frame or matrix in which missing values will be created.

p

A numeric vector of length one or of the same length as cols_mis; the probability that a value is missing.

cols_mis

A vector of column names or indices of columns in which missing values will be created.

cols_ctrl

A vector of column names or indices of the columns that control the creation of missing values in cols_mis. Must be of the same length as cols_mis.

n_mis_stochastic

Logical; should the number of missing values be stochastic? If n_mis_stochastic = TRUE, the number of missing values in a column cols_mis[i] is a random variable with expected value nrow(ds) * p[i]. If n_mis_stochastic = FALSE, the number of missing values is deterministic: normally, the number of missing values in a column cols_mis[i] is round(nrow(ds) * p[i]). Possible deviations from this value, if any exist, are documented in Details.

ties.method

How ties are handled. Passed to rank.

miss_cols

Deprecated, use cols_mis instead.

ctrl_cols

Deprecated, use cols_ctrl instead.

Details

This function creates missing at random (MAR) values in the columns specified by the argument cols_mis. The probability for missing values is controlled by p. If p is a single number, then the overall probability for a value to be missing will be p in all columns of cols_mis. (Internally p will be replicated to a vector of the same length as cols_mis. So, all p[i] in the following sections will be equal to the given single number p.) Otherwise, p must be of the same length as cols_mis. In this case, the overall probability for a value to be missing will be p[i] in the column cols_mis[i]. The position of the missing values in cols_mis[i] is controlled by cols_ctrl[i]. The following procedure is applied for each pair of cols_ctrl[i] and cols_mis[i] to determine the positions of missing values:

First, the probability for a value to be missing is calculated. The probability of a missing value in a row of cols_mis[i] is proportional to the rank of the value in cols_ctrl[i] in the same row. If n_mis_stochastic = FALSE, these probabilities are passed to the prob argument of sample. If n_mis_stochastic = TRUE, they are scaled to sum to nrow(ds) * p[i]. Then a uniformly distributed random number is generated for each probability. If this random number is less than the probability, the value in cols_mis[i] is set to NA.
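The procedure above can be sketched in base R for a single pair of columns. This is a minimal illustration, not the package's implementation: the helper name sketch_mar_rank is hypothetical, it assumes ds is a data frame, and it skips the argument checks and warnings that delete_MAR_rank performs.

```r
# Hypothetical helper (not part of missMethods): illustrates the rank mechanism
# for one pair of columns in a data frame.
sketch_mar_rank <- function(ds, p, col_mis, col_ctrl,
                            n_mis_stochastic = FALSE,
                            ties.method = "average") {
  n <- nrow(ds)
  # Missingness weights are proportional to the ranks of the control column.
  ranks <- rank(ds[[col_ctrl]], ties.method = ties.method)
  if (n_mis_stochastic) {
    # Scale the weights so they sum to n * p, then draw one uniform per row.
    probs <- ranks / sum(ranks) * n * p
    na_rows <- which(runif(n) < probs)
  } else {
    # Fixed number of missing values; the ranks act as sampling weights.
    n_mis <- round(n * p)
    na_rows <- sample.int(n, size = n_mis, prob = ranks)
  }
  ds[na_rows, col_mis] <- NA
  ds
}

set.seed(123)
ds <- data.frame(X = 1:20, Y = 101:120)
res_fixed <- sketch_mar_rank(ds, 0.2, "X", "Y")
res_random <- sketch_mar_rank(ds, 0.2, "X", "Y", n_mis_stochastic = TRUE)
sum(is.na(res_fixed$X))   # always round(20 * 0.2) = 4
sum(is.na(res_random$X))  # varies between runs; 4 in expectation
```

In both branches, rows with large values in the control column are more likely to lose their value in the target column, which is what makes the mechanism MAR.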

The ranks are calculated via rank. The argument ties.method is directly passed to this function. Possible choices for ties.method are documented in rank.
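Because the ranks determine the missingness probabilities, the choice of ties.method matters whenever the control column contains repeated values. The short base R comparison below uses a made-up vector x to show how tied values are ranked under a few of the available methods.

```r
x <- c(10, 20, 20, 30)  # illustrative control values with one tie

r_avg   <- rank(x, ties.method = "average")  # 1.0 2.5 2.5 4.0
r_min   <- rank(x, ties.method = "min")      # 1 2 2 4
r_max   <- rank(x, ties.method = "max")      # 1 3 3 4
r_first <- rank(x, ties.method = "first")    # 1 2 3 4
```

With "average", "min" or "max", tied rows receive the same rank and hence the same weight; "first" breaks ties by position, so tied rows get different weights.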

For high values of p, it is mathematically impossible to obtain probabilities proportional to the ranks. In this case, a warning is issued. This warning can be silenced by setting the option missMethods.warn.too.high.p to FALSE.
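To see why high values of p are a problem, consider the scaling described for n_mis_stochastic = TRUE: once the rank-proportional probabilities are scaled to sum to nrow(ds) * p, the largest one can exceed 1. The sketch below is a base R illustration under that reading, using a control column without ties; the helper scaled_probs is hypothetical, and the exact condition under which the package issues its warning is not stated here.

```r
n <- 20
ranks <- rank(101:120)  # 1, 2, ..., 20 (no ties)

# Hypothetical helper: rank-proportional probabilities that sum to n * p.
scaled_probs <- function(p) ranks / sum(ranks) * n * p

max(scaled_probs(0.2))  # about 0.381, a valid probability
max(scaled_probs(0.8))  # about 1.524, larger than 1, so not a valid probability

# Without ties, the largest scaled probability is 2 * n * p / (n + 1),
# which exceeds 1 once p is larger than (n + 1) / (2 * n).
(n + 1) / (2 * n)       # 0.525

# The option named above can be set with options():
options(missMethods.warn.too.high.p = FALSE)
```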

Value

An object of the same class as ds with missing values.

References

Santos, M. S., Pereira, R. C., Costa, A. F., Soares, J. P., Santos, J., & Abreu, P. H. (2019). Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access, 7, 11651-11667.

See Also

rank, delete_MNAR_rank

Other functions to create MAR: delete_MAR_1_to_x(), delete_MAR_censoring(), delete_MAR_one_group()

Examples

ds <- data.frame(X = 1:20, Y = 101:120)
delete_MAR_rank(ds, 0.2, "X", "Y")
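A further example, sketched here with made-up column names, pairs two target columns with two control columns and uses a different p for each pair. It assumes the missMethods package is installed; the data set is invented for illustration.

```r
library(missMethods)

set.seed(1)
ds <- data.frame(X1 = 1:20, X2 = 21:40, Y1 = 101:120, Y2 = rnorm(20))

# X1 is controlled by Y1 with p = 0.2; X2 is controlled by Y2 with p = 0.4.
ds_mis <- delete_MAR_rank(ds, p = c(0.2, 0.4),
                          cols_mis = c("X1", "X2"),
                          cols_ctrl = c("Y1", "Y2"))

# Number of missing values per column; the control columns stay complete.
colSums(is.na(ds_mis))
```

With the default n_mis_stochastic = FALSE, the counts for X1 and X2 should normally be round(nrow(ds) * p[i]), as described under Arguments.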

missMethods documentation built on Sept. 16, 2022, 5:08 p.m.