na.test: Little's Missing Completely at Random (MCAR) Test

Description

This function performs Little's Missing Completely at Random (MCAR) test.

Usage

na.test(x, digits = 2, p.digits = 3, as.na = NULL, check = TRUE, output = TRUE)

Arguments

x

a matrix or data frame with incomplete data, where missing values are coded as NA.

digits

an integer value indicating the number of decimal places to be used for displaying results.

p.digits

an integer value indicating the number of decimal places to be used for displaying the p-value.

as.na

a numeric vector indicating user-defined missing values, i.e. these values are converted to NA before conducting the analysis.

check

logical: if TRUE, argument specification is checked.

output

logical: if TRUE, output is shown.

Details

Little (1988) proposed a multivariate test of Missing Completely at Random (MCAR) that tests for mean differences on every variable in the data set across subgroups that share the same missing data pattern. The test compares the observed variable means for each missing data pattern with the expected population means estimated using the expectation-maximization (EM) algorithm (i.e., EM maximum likelihood estimates). The test statistic is the sum of the squared standardized differences between the subsample means and the expected population means, weighted by the estimated variance-covariance matrix and the number of observations within each subgroup (Enders, 2010). Under the null hypothesis that data are MCAR, the test statistic asymptotically follows a chi-square distribution with \sum k_j - k degrees of freedom, where k_j is the number of complete variables for missing data pattern j, and k is the total number of variables. A statistically significant result provides evidence against MCAR.
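The degrees of freedom described above can be checked by hand. The following base R sketch groups the rows of the airquality data (the data set used in the Examples) by missing data pattern and computes \sum k_j - k. It is only an illustration and not the code used internally by na.test; the object names pattern, n_j, k_j, and dof are chosen here for illustration.

```r
# Illustrative sketch in base R; not the internal implementation of na.test()
x <- airquality

# Missing data indicator matrix: TRUE where a value is missing
miss <- is.na(x)

# Label each row with its missing data pattern,
# e.g. "100000" means that only the first variable (Ozone) is missing
pattern <- apply(miss, 1, function(r) paste(as.integer(r), collapse = ""))

# Number of observations within each missing data pattern (n_j)
n_j <- table(pattern)

# Number of observed variables within each missing data pattern (k_j);
# all rows in a pattern share the same count, so the first one is used
k_j <- tapply(rowSums(!miss), pattern, function(v) v[1])

# Total number of variables (k) and degrees of freedom of the test
k <- ncol(x)
dof <- sum(k_j) - k

n_j
dof
```

For airquality this yields four missing data patterns with 6, 5, 5, and 4 observed variables, i.e., 20 - 6 = 14 degrees of freedom.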

Note that Little's MCAR test has a number of problems (see Enders, 2010). First, the test does not identify the specific variables that violate MCAR, i.e., the test does not identify potential correlates of missingness (i.e., auxiliary variables). Second, the test assumes multivariate normality, i.e., under departures from the normality assumption the test might be unreliable unless the sample size is large, and it is not suitable for categorical variables. Third, the test investigates mean differences assuming that the missing data patterns share a common covariance matrix, i.e., the test cannot detect covariance-based deviations from MCAR stemming from a Missing at Random (MAR) or Missing Not at Random (MNAR) mechanism because MAR and MNAR mechanisms can also produce missing data subgroups with equal means. Fourth, simulation studies suggest that Little's MCAR test suffers from low statistical power, particularly when the number of variables that violate MCAR is small, the relationship between the data and missingness is weak, or the data are MNAR (Thoemmes & Enders, 2007). Fifth, the test can only reject, but cannot prove, the MCAR assumption, i.e., a result that is not statistically significant means only that the null hypothesis was not rejected and does not prove that the data are MCAR. Finally, a result that is not statistically significant is consistent with data that are MCAR or MNAR, while a statistically significant result indicates that the data are MAR or MNAR, i.e., MNAR cannot be ruled out regardless of the result of the test.

This function is based on the prelim.norm function in the norm package, which can handle about 30 variables. With more than 30 variables specified in the argument x, the prelim.norm function might run into numerical problems, leading to results that are not trustworthy. In this case it is recommended to reduce the number of variables specified in the argument x. If the number of variables cannot be reduced, it is recommended to use the LittleMCAR function in the BaylorEdPsych package, which can deal with up to 50 variables. However, this package was removed from the CRAN repository and needs to be obtained from the archive along with the mvnmle package, which is required by the LittleMCAR function. Note that the mcar_test function in the naniar package is also based on the prelim.norm function, so its results are not trustworthy whenever the warning message In norm::prelim.norm(data) : NAs introduced by coercion to integer range is printed on the console.
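A simple guard can flag data sets that exceed this guideline before the test is run. The sketch below is a hypothetical helper written in base R: the function name check_n_vars and the argument max.vars are assumptions made for illustration and are not part of the misty package, and the default of 30 only mirrors the "about 30 variables" guideline stated above.

```r
# Hypothetical helper, not part of misty: warn when x has more variables
# than prelim.norm is said to handle reliably (about 30)
check_n_vars <- function(x, max.vars = 30) {
  n.vars <- ncol(x)
  if (n.vars > max.vars) {
    warning("x has ", n.vars, " variables; results based on prelim.norm ",
            "might not be trustworthy with more than ", max.vars,
            " variables.")
  }
  # TRUE if the number of variables is within the guideline, FALSE otherwise
  invisible(n.vars <= max.vars)
}

# airquality has 6 variables, so no warning is issued
check_n_vars(airquality)
```

The helper returns its result invisibly so that it can be used inside a condition, e.g. to select a subset of variables before calling na.test.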

Value

Returns an object of class misty.object, which is a list with the following entries:

call

function call

type

type of analysis

data

matrix or data frame specified in x

args

specification of function arguments

result

result table
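The entries listed above can be inspected by storing the returned object. The following is a minimal sketch that assumes the misty package is installed; the object name res is chosen for illustration, and the exact layout of the result table is not shown here because it is not described on this page.

```r
# Assumes the misty package is installed
library(misty)

# Suppress the printed output and keep the returned object
res <- na.test(airquality, output = FALSE)

# Class of the returned object and names of its entries
class(res)
names(res)

# Result table of the test
res$result
```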

Note

Code is adapted from the R function by Eric Stemmler: tinyurl.com/r-function-for-MCAR-test

Author(s)

Takuya Yanagida takuya.yanagida@univie.ac.at

References

Enders, C. K. (2010). Applied missing data analysis. Guilford Press.

Thoemmes, F., & Enders, C. K. (2007, April). A structural equation model for testing whether data are missing completely at random. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Little, R. J. A. (1988). A test of Missing Completely at Random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202. https://doi.org/10.2307/2290157

See Also

as.na, na.as, na.auxiliary, na.coverage, na.descript, na.indicator, na.pattern, na.prop.

Examples

na.test(airquality)
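The remaining arguments can be combined as in the following sketch, which assumes the misty package is installed. The value -99 and the object name aq are made up for illustration: missing values in a copy of airquality are first recoded to -99 to mimic a data set with a user-defined missing value code, and as.na then converts them back to NA before the analysis.

```r
# Assumes the misty package is installed
library(misty)

# Copy of airquality in which NA is recoded to the made-up code -99
aq <- airquality
aq[is.na(aq)] <- -99

# Treat -99 as missing; display the results with three decimal places
# and the p-value with four decimal places
na.test(aq, as.na = -99, digits = 3, p.digits = 4)
```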

misty documentation built on Nov. 15, 2023, 1:06 a.m.
