delete_MAR_rank: R Documentation
Create missing at random (MAR) values using a ranking mechanism in a data frame or a matrix
delete_MAR_rank( ds, p, cols_mis, cols_ctrl, n_mis_stochastic = FALSE, ties.method = "average", miss_cols, ctrl_cols )
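ds: A data frame or matrix in which missing values will be created.
p: A numeric vector with length one or equal to length(cols_mis); the probability that a value is missing.
cols_mis: A vector of column names or indices of columns in which missing values will be created.
cols_ctrl: A vector of column names or indices of columns that control the creation of missing values in cols_mis. Must have the same length as cols_mis.
n_mis_stochastic: Logical; should the number of missing values be stochastic? If TRUE, the number of missing values in cols_mis[i] is random with expected value nrow(ds) * p[i]. If FALSE, the number of missing values is deterministic.
ties.method: How ties are handled. Passed to rank().
miss_cols: Deprecated, use cols_mis instead.
ctrl_cols: Deprecated, use cols_ctrl instead.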
This function creates missing at random (MAR) values in the columns specified by the argument cols_mis. The probability for missing values is controlled by p.

If p is a single number, then the overall probability for a value to be missing will be p in all columns of cols_mis. (Internally, p is replicated to a vector of the same length as cols_mis, so every p[i] in the following sections equals the given single number p.) Otherwise, p must be of the same length as cols_mis. In this case, the overall probability for a value to be missing will be p[i] in the column cols_mis[i].
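The recycling rule for p can be sketched in a few lines of base R. The helper recycle_p() is hypothetical and not part of missMethods; it only mirrors the rule stated above: a length-one p is replicated, and any other length must match cols_mis.

```r
# Hypothetical helper (not part of missMethods): expand p to one
# probability per column in cols_mis, following the rule described above.
recycle_p <- function(p, cols_mis) {
  if (length(p) == 1L) {
    # a single number is replicated for every column in cols_mis
    p <- rep(p, length(cols_mis))
  } else if (length(p) != length(cols_mis)) {
    stop("p must have length one or length(cols_mis)")
  }
  p
}

recycle_p(0.2, c("X", "Z"))         # 0.2 0.2
recycle_p(c(0.1, 0.3), c("X", "Z")) # 0.1 0.3
```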
The position of the missing values in cols_mis[i] is controlled by cols_ctrl[i]. The following procedure is applied to each pair of cols_ctrl[i] and cols_mis[i] to determine the positions of the missing values:

First, the probability for a value to be missing is calculated. The probability for a missing value in a row of cols_mis[i] is proportional to the rank of the value of cols_ctrl[i] in the same row. Thus, rows with larger values in cols_ctrl[i] are more likely to receive a missing value in cols_mis[i].
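As a small illustration in base R (the control values are made up for this example), rank-proportional probabilities for a single control column look like this. Here they are normalized to sum to one; how they are used afterwards depends on n_mis_stochastic.

```r
# Illustrative control column; larger values get larger ranks
ctrl <- c(5, 1, 3, 2, 4)
r <- rank(ctrl, ties.method = "average")  # 5 1 3 2 4
prob <- r / sum(r)                         # proportional to rank, sums to 1
round(prob, 3)                             # 0.333 0.067 0.200 0.133 0.267
```

The first row, which holds the largest control value, is the most likely to become missing.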
If n_mis_stochastic = FALSE, these probabilities are passed to the prob argument of sample(). If n_mis_stochastic = TRUE, they are scaled to sum to nrow(ds) * p[i]. Then, for each probability, a uniformly distributed random number is generated. If this random number is less than the probability, the value in cols_mis[i] is set to NA.
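The whole procedure can be sketched in base R. This is an illustration under stated assumptions, not the missMethods implementation: mar_rank_sketch() is a hypothetical name, it omits argument checks and the high-p warning, and it assumes that the deterministic mode draws round(nrow(ds) * p[i]) rows without replacement.

```r
# Hypothetical sketch of the ranking mechanism (not the missMethods code).
# Assumption: with n_mis_stochastic = FALSE, round(nrow(ds) * p[i]) rows
# are drawn without replacement, weighted by rank.
mar_rank_sketch <- function(ds, p, cols_mis, cols_ctrl,
                            n_mis_stochastic = FALSE,
                            ties.method = "average") {
  if (length(p) == 1L) p <- rep(p, length(cols_mis))
  n <- nrow(ds)
  # each pair (cols_ctrl[i], cols_mis[i]) is processed independently
  for (i in seq_along(cols_mis)) {
    # probabilities proportional to the ranks in the control column
    r <- rank(ds[, cols_ctrl[i]], ties.method = ties.method)
    if (!n_mis_stochastic) {
      # fixed number of missing values, rows sampled with weights r
      n_mis <- round(n * p[i])
      na_rows <- sample.int(n, n_mis, prob = r)
    } else {
      # scale so the probabilities sum to n * p[i], then compare each
      # one with a uniform random number
      prob <- r / sum(r) * n * p[i]
      na_rows <- which(runif(n) < prob)
    }
    ds[na_rows, cols_mis[i]] <- NA
  }
  ds
}

set.seed(123)
ds <- data.frame(X = 1:20, Y = 101:120)
res <- mar_rank_sketch(ds, 0.2, "X", "Y")
sum(is.na(res$X))  # 4; the NAs are more likely in rows with large Y
```

Because indexing with ds[rows, col] works for both data frames and matrices, the sketch returns an object of the same class as its input.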
The ranks are calculated via rank(). The argument ties.method is passed directly to this function. The possible choices for ties.method are documented in rank().
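For a control column with repeated values, the choice of ties.method changes the ranks and therefore the missingness probabilities. The values below are illustrative; the calls are plain base R rank().

```r
# Illustrative control column with a tie at 20
ctrl <- c(10, 20, 20, 30)
rank(ctrl, ties.method = "average")  # 1.0 2.5 2.5 4.0
rank(ctrl, ties.method = "min")      # 1 2 2 4
rank(ctrl, ties.method = "max")      # 1 3 3 4
rank(ctrl, ties.method = "first")    # 1 2 3 4
```

With "average", "min", or "max", tied rows share a rank and so share the same probability; with "first", the earlier of the tied rows gets the smaller rank and a lower probability.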
For high values of p, it is mathematically impossible to obtain probabilities proportional to the ranks. In this case, a warning is given. This warning can be silenced by setting the option missMethods.warn.too.high.p to FALSE.
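A rough check of why high p is a problem, assuming no ties (ranks 1 to n) and the stochastic scaling described above: the largest scaled probability is 2 * n * p / (n + 1), which exceeds 1 once p > (n + 1) / (2 * n), i.e. slightly above 0.5. The helper max_rank_prob() is hypothetical, and the exact condition checked by missMethods may differ from this sketch.

```r
# Hypothetical helper: largest rank-proportional probability when the
# probabilities for ranks 1..n are scaled to sum to n * p
max_rank_prob <- function(n, p) {
  r <- seq_len(n)
  max(r / sum(r) * n * p)  # equals 2 * n * p / (n + 1)
}

max_rank_prob(20, 0.2)  # about 0.381, a valid probability
max_rank_prob(20, 0.6)  # about 1.143, not a valid probability
(20 + 1) / (2 * 20)     # largest feasible p for n = 20: 0.525
```

When such a warning is expected and acceptable, options(missMethods.warn.too.high.p = FALSE) turns it off, as stated above.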
The function returns an object of the same class as ds with missing values.
See also: rank(), delete_MNAR_rank()
Other functions to create MAR: delete_MAR_1_to_x(), delete_MAR_censoring(), delete_MAR_one_group()
ds <- data.frame(X = 1:20, Y = 101:120)
delete_MAR_rank(ds, 0.2, "X", "Y")