na.test: Missing Completely at Random (MCAR) Test
In misty: Miscellaneous Functions 'T. Yanagida'

na.test

R Documentation

Missing Completely at Random (MCAR) Test

Description

This function performs Little's Missing Completely at Random (MCAR) test and Jamshidian and Jalal's approach for testing the MCAR assumption. By default, the function performs Little's MCAR test only.

Usage

na.test(data, ..., print = c("all", "little", "jamjal"),
        impdat = NULL, delete = 6, method = c("npar", "normal"),
        m = 20, seed = 123, nrep = 10000, n.min = 30,
        pool = c("m", "med", "min", "max", "random"),
        alpha = 0.05, digits = 2, p.digits = 3, as.na = NULL,
        write = NULL, append = TRUE, check = TRUE, output = TRUE)

Arguments

`data`	a data frame with incomplete data, where missing values are coded as `NA`.
`...`	an expression indicating the variable names in `data`, e.g., `na.test(dat, x1, x2, x3)`. Note that the operators `.`, `+`, `-`, `~`, `:`, `::`, and `!` can also be used to select variables, see 'Details' in the `df.subset` function.
`print`	a character vector indicating which results are printed on the console, i.e., `"all"` for Little's MCAR test and Jamshidian and Jalal's approach, `"little"` (default) for Little's MCAR test, and `"jamjal"` for Jamshidian and Jalal's approach.
`impdat`	an object of class `mids` from the mice package to provide a data set multiply imputed in the mice package. When this argument is specified, the function does not impute the data set specified in the argument `data` and instead uses the imputed data sets provided in `impdat` for performing Jamshidian and Jalal's approach. Note that the argument `data` still needs to be specified because the variables used for the analysis are extracted from the data frame specified in `data`.
`delete`	an integer value: missing data patterns consisting of `delete` or fewer cases are removed from Jamshidian and Jalal's approach. The default setting is `delete = 6`.
`method`	a character string indicating the imputation method, i.e., `"npar"` for using a non-parametric imputation method by Srivastava and Dolatabadi (2009) or `"normal"` for imputing missing data assuming that the data come from a multivariate normal distribution (see Jamshidian & Jalal, 2010).
`m`	an integer value indicating the number of multiple imputations. The default setting is `m = 20`.
`seed`	an integer value that is passed to the `set.seed` function to seed the random number generator before performing Jamshidian and Jalal's approach. The default setting is `seed = 123`. Set the value to `NULL` to use a system-selected seed.
`nrep`	an integer value indicating the number of replications used to simulate the Neyman distribution to determine the cut-off value for the Neyman test. Larger values increase the accuracy of the Neyman test. The default setting is `nrep = 10000`.
`n.min`	an integer value indicating the minimum number of cases in a group that triggers the use of the asymptotic chi-square distribution in place of the empirical distribution in the Neyman test of uniformity. The default setting is `n.min = 30`.
`pool`	a character string indicating the pooling method, i.e., `"m"` for computing the average test statistic and p-values, `"med"` for computing the median test statistic and p-values, `"min"` for computing the maximum test statistic and minimum p-values, `"max"` for computing the minimum test statistic and maximum p-values, and `"random"` for randomly choosing a test statistic and corresponding p-value from repeated complete data analyses. The default setting is `pool = "med"`, i.e., median test statistic and p-values are computed as suggested by Eekhout, Wiel and Heymans (2017).
`alpha`	a numeric value between 0 and 1 indicating the significance level of the Hawkins test. The default setting is `alpha = 0.05`, i.e., the Anderson-Darling non-parametric test is provided when the p-value of the Hawkins test is less than or equal to `0.05`.
`digits`	an integer value indicating the number of decimal places to be used for displaying results. The default setting is `digits = 2`.
`p.digits`	an integer value indicating the number of decimal places to be used for displaying the p-value. The default setting is `p.digits = 3`.
`as.na`	a numeric vector indicating user-defined missing values, i.e., these values are converted to `NA` before conducting the analysis.
`write`	a character string naming a text file with file extension `".txt"` (e.g., `"Output.txt"`) for writing the output into a text file.
`append`	logical: if `TRUE` (default), output will be appended to an existing text file with extension `.txt` specified in `write`; if `FALSE`, an existing text file will be overwritten.
`check`	logical: if `TRUE` (default), argument specification is checked.
`output`	logical: if `TRUE` (default), output is shown.

Details

Little's MCAR Test

Little (1988) proposed a multivariate test of Missing Completely at Random (MCAR) that tests for mean differences on every variable in the data set across subgroups of cases that share the same missing data pattern. The test compares the observed variable means for each missing data pattern with the expected population means estimated using the expectation-maximization (EM) algorithm (i.e., EM maximum likelihood estimates). The test statistic is the sum of the squared standardized differences between the subgroup means and the expected population means, weighted by the estimated variance-covariance matrix and the number of observations within each subgroup (Enders, 2010). Under the null hypothesis that data are MCAR, the test statistic asymptotically follows a chi-square distribution with \sum_j k_j - k degrees of freedom, where k_j is the number of complete (i.e., observed) variables for missing data pattern j, and k is the total number of variables. A statistically significant result provides evidence against MCAR.
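The computation of the test statistic and its degrees of freedom can be sketched in base R as follows. This is a minimal illustration and not the code used by `na.test`: the helper name `little_d2` is hypothetical, and the mean vector `mu` and covariance matrix `sigma` are assumed to be EM maximum likelihood estimates obtained elsewhere. For each missing data pattern j with n_j cases, the sketch adds n_j times the quadratic form of the mean differences on the observed variables.

```r
# Hypothetical helper (not part of misty): Little's d^2 statistic given
# estimates `mu` (mean vector) and `sigma` (covariance matrix), which are
# assumed to come from an EM algorithm run on the incomplete data.
little_d2 <- function(data, mu, sigma) {
  x <- as.matrix(data)
  # Encode the missing data pattern of each case as a string
  pattern <- apply(is.na(x), 1, paste, collapse = " ")
  d2 <- 0
  k.obs <- 0
  for (p in unique(pattern)) {
    rows <- x[pattern == p, , drop = FALSE]
    obs <- !is.na(rows[1, ])          # variables observed in this pattern
    if (!any(obs)) next               # skip cases without any observed value
    diff <- colMeans(rows[, obs, drop = FALSE]) - mu[obs]
    sigma.j <- sigma[obs, obs, drop = FALSE]
    # n_j * (mean difference)' Sigma_j^{-1} (mean difference)
    d2 <- d2 + nrow(rows) * drop(t(diff) %*% solve(sigma.j) %*% diff)
    k.obs <- k.obs + sum(obs)         # accumulates the sum of k_j
  }
  df <- k.obs - ncol(x)               # sum of k_j minus k
  list(statistic = d2, df = df,
       p.value = pchisq(d2, df, lower.tail = FALSE))
}

# Toy illustration only: `mu` and `sigma` are set by hand here; in a real
# analysis they would be the EM estimates from the incomplete data.
set.seed(123)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$x2[1:10] <- NA
res <- little_d2(toy, mu = c(0, 0), sigma = diag(2))
res$df   # two patterns with 2 and 1 observed variables: (2 + 1) - 2 = 1
```

Because the estimates are passed in as arguments, the sketch separates the EM step (handled in `na.test` via the mvnmle package) from the pattern-wise summation described above.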

Note that Little's MCAR test has a number of problems (see Enders, 2010).

First, the test does not identify the specific variables that violate MCAR, i.e., the test does not identify potential correlates of missingness (i.e., auxiliary variables).
Second, the test is based on multivariate normality, i.e., the test is not suitable for categorical variables and might be unreliable under departures from the normality assumption unless the sample size is large.
Third, the test investigates mean differences assuming that the missing data patterns share a common covariance matrix, i.e., the test cannot detect covariance-based deviations from MCAR stemming from a Missing at Random (MAR) or Missing Not at Random (MNAR) mechanism because MAR and MNAR mechanisms can also produce missing data subgroups with equal means.
Fourth, simulation studies suggest that Little's MCAR test suffers from low statistical power, particularly when the number of variables that violate MCAR is small, the relationship between the data and missingness is weak, or the data are MNAR (Thoemmes & Enders, 2007).
Fifth, the test can only reject, but cannot confirm the MCAR assumption, i.e., a statistically non-significant result means failing to reject the null hypothesis, which does not prove that the data are MCAR.
Sixth, a statistically non-significant result is consistent with data that are MCAR or MNAR, while a statistically significant result indicates that missing data are MAR or MNAR, i.e., MNAR cannot be ruled out regardless of the result of the test.

The function for performing Little's MCAR test is based on the `mlest` function from the mvnmle package, which can handle up to 50 variables. Note that the `mcar_test` function in the naniar package is based on the `prelim.norm` function from the norm package. This function can handle about 30 variables, but with more than 30 variables specified in the argument `data`, the `prelim.norm` function might run into numerical problems leading to results that are not trustworthy (i.e., `p.value = 1`). In that case, the warning message `In norm::prelim.norm(data) : NAs introduced by coercion to integer range` is printed on the console.

Jamshidian and Jalal's Approach for Testing MCAR

Jamshidian and Jalal (2010) proposed an approach for testing the Missing Completely at Random (MCAR) assumption based on two tests of multivariate normality and homogeneity of covariances among groups of cases with identical missing data patterns:

In the first step, missing data are multiply imputed (`m = 20` times by default) using a non-parametric imputation method (`method = "npar"` by default) by Srivastava and Dolatabadi (2009) or using a parametric imputation method assuming multivariate normality of data (`method = "normal"`) for each group of cases sharing a common missing data pattern.
In the second step, a modified Hawkins test for multivariate normality and homogeneity of covariances applicable to complete data consisting of groups with a small number of cases is performed. A statistically not significant result indicates no evidence against multivariate normality of data or homogeneity of covariances, while a statistically significant result provides evidence against multivariate normality of data or homogeneity of covariances (i.e., violation of the MCAR assumption). Note that the Hawkins test is a test of multivariate normality as well as homogeneity of covariance. Hence, a statistically significant test is ambiguous unless the researcher assumes multivariate normality of data.
In the third step, if the Hawkins test is statistically significant, the Anderson-Darling non-parametric test is performed. A statistically not significant result indicates evidence against multivariate normality of data but no evidence against homogeneity of covariances, while a statistically significant result provides evidence against homogeneity of covariances (i.e., violation of the MCAR assumption). However, no conclusions can be made about the multivariate normality of data when the Anderson-Darling non-parametric test is statistically significant.
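The grouping of cases by missing data pattern that underlies these steps, together with the removal of small groups controlled by the `delete` argument, can be sketched in base R. The helper `drop_small_patterns` is a hypothetical name introduced for illustration; it is not the code used by `na.test`.

```r
# Hypothetical helper (not part of misty): keep only cases whose missing
# data pattern is shared by more than `delete` cases.
drop_small_patterns <- function(data, delete = 6) {
  # One string per case, e.g. "100000" = first variable missing
  pattern <- apply(is.na(data), 1,
                   function(r) paste(as.integer(r), collapse = ""))
  size <- table(pattern)                     # number of cases per pattern
  keep <- as.vector(size[pattern] > delete)  # pattern size looked up per case
  data[keep, , drop = FALSE]
}

# airquality has four missing data patterns with 111, 35, 5, and 2 cases
kept <- drop_small_patterns(airquality, delete = 6)
nrow(kept)   # 146 of 153 cases remain; the patterns with 5 and 2 cases are dropped
```

Very small groups carry little information about their covariance matrix, which is presumably why patterns with few cases are excluded before the Hawkins test is applied to the remaining groups.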

In summary, a statistically significant result of both the Hawkins test and the Anderson-Darling non-parametric test provides evidence against the MCAR assumption. The test statistics and the significance values of the Hawkins test and the Anderson-Darling non-parametric test based on multiply imputed data sets are pooled by computing the median test statistic and significance value (`pool = "med"` by default) as suggested by Eekhout, Wiel, and Heymans (2017).
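The pooling options described for the `pool` argument can be sketched in base R as follows. The helper `pool_tests` is a hypothetical name and the input values are made up for illustration; the sketch only mirrors the verbal description of the options and is not the code used by `na.test`.

```r
# Hypothetical helper (not part of misty): pool test statistics and p-values
# obtained from the m imputed data sets.
pool_tests <- function(stat, pval,
                       pool = c("med", "m", "min", "max", "random")) {
  pool <- match.arg(pool)   # defaults to "med"
  switch(pool,
    m      = c(statistic = mean(stat),   p.value = mean(pval)),
    med    = c(statistic = median(stat), p.value = median(pval)),
    # "min" refers to the p-value, hence the maximum test statistic
    min    = c(statistic = max(stat),    p.value = min(pval)),
    # "max" refers to the p-value, hence the minimum test statistic
    max    = c(statistic = min(stat),    p.value = max(pval)),
    random = {
      i <- sample(seq_along(stat), 1)   # one randomly chosen analysis
      c(statistic = stat[i], p.value = pval[i])
    })
}

# Made-up statistics and p-values from three imputed data sets
stat <- c(10, 12, 20)
pval <- c(0.30, 0.20, 0.01)
pool_tests(stat, pval)                 # median: statistic 12, p-value 0.20
pool_tests(stat, pval, pool = "min")   # statistic 20, p-value 0.01
```

The median is less sensitive than the mean to a single imputed data set with an extreme test statistic, which is one practical reason to prefer `pool = "med"`.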

Note that of the problems listed for Little's MCAR test, the first, second (i.e., the approach is not suitable for categorical variables), fifth, and sixth problems also apply to Jamshidian and Jalal's approach for testing the MCAR assumption.

In practice, rejecting or not rejecting the MCAR assumption may not be relevant as modern missing data handling methods like full information maximum likelihood (FIML) estimation, Bayesian estimation, or multiple imputation are asymptotically valid under the missing at random (MAR) assumption (Jamshidian & Yuan, 2014). It is more important to distinguish MAR from missing not at random (MNAR), but MAR and MNAR mechanisms cannot be distinguished without auxiliary information.

Value

Returns an object of class `misty.object`, which is a list with the following entries:

`call`	function call
`type`	type of analysis
`data`	matrix or data frame specified in `data`
`args`	specification of function arguments
`result`	list with result tables, i.e., `little` for the result table of the Little's MCAR test, `jamjal` for the list with results of the Jamshidian and Jalal's approach, `hawkins` for the result table of the Hawkins test, and `anderson` for the result table of the Anderson-Darling non-parametric test

Note

The code for Little's MCAR test is a modified copy of the LittleMCAR function in the BaylorEdPsych package by A. Alexander Beaujean. The code for Jamshidian and Jalal's approach is a modified copy of the TestMCARNormality function in the MissMech package by Mortaza Jamshidian, Siavash Jalal, Camden Jansen, and Mao Kobayashi (2024).

Author(s)

Takuya Yanagida takuya.yanagida@univie.ac.at

References

Beaujean, A. A. (2012). BaylorEdPsych: R Package for Baylor University Educational Psychology Quantitative Courses. R package version 0.5. http://cran.nexr.com/web/packages/BaylorEdPsych/index.html

Eekhout, I., Wiel, M. A., & Heymans, M. W. (2017). Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: Power and applicability analysis. BMC Medical Research Methodology, 17:129. https://doi.org/10.1186/s12874-017-0404-7

Enders, C. K. (2022). Applied missing data analysis (2nd ed.). The Guilford Press.

Little, R. J. A. (1988). A test of Missing Completely at Random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202. https://doi.org/10.2307/2290157

Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika, 75(4), 649-674. https://doi.org/10.1007/s11336-010-9175-3

Jamshidian, M., & Yuan, K.-H. (2014). Examining missing data mechanisms via homogeneity of parameters, homogeneity of distributions, and multivariate normality. WIREs Computational Statistics, 6(1), 56-73. https://doi.org/10.1002/wics.1287

Jamshidian, M., Jalal, S., Jansen, C., & Kobayashi, M. (2024). MissMech: Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random. R package version 1.0.4. https://doi.org/10.32614/CRAN.package.MissMech

Srivastava, M. S., & Dolatabadi, M. (2009). Multiple imputation and other resampling scheme for imputing missing observations. Journal of Multivariate Analysis, 100, 1919-1937. https://doi.org/10.1016/j.jmva.2009.06.003

Thoemmes, F., & Enders, C. K. (2007, April). A structural equation model for testing whether data are missing completely at random. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Examples

# Example 1: Perform Little's MCAR test (default setting)
na.test(airquality)

# Example 2: Perform Jamshidian and Jalal's approach
na.test(airquality, print = "jamjal")

## Not run: 
# Example 3: Write results into a text file
na.test(airquality, write = "NA_Test.txt")

## End(Not run)

misty documentation built on Aug. 18, 2025, 5:16 p.m.
