na.test | R Documentation |
This function performs Little's Missing Completely at Random (MCAR) test and Jamshidian and Jalal's approach for testing the MCAR assumption. By default, the function performs Little's MCAR test.
na.test(..., data = NULL, print = c("all", "little", "jamjal"),
impdat = NULL, delete = 6, method = c("npar", "normal"),
m = 20, seed = 123, nrep = 10000, n.min = 30,
pool = c("m", "med", "min", "max", "random"),
alpha = 0.05, digits = 2, p.digits = 3, as.na = NULL,
write = NULL, append = TRUE, check = TRUE, output = TRUE)
... |
a matrix or data frame with incomplete data, where missing
values are coded as |
data |
a data frame when specifying one or more variables in the
argument |
print |
a character vector indicating which results to print on
the console, i.e. |
impdat |
an object of class |
delete |
an integer value indicating missing data patterns consisting
of |
method |
a character string indicating the imputation method, i.e.,
|
m |
an integer value indicating the number of multiple imputations.
The default setting is |
seed |
an integer value that is used as argument by the |
nrep |
an integer value indicating the replications used to simulate
the Neyman distribution to determine the cut off value for the
Neyman test. Larger values increase the accuracy of the Neyman
test. The default setting is |
n.min |
an integer value indicating the minimum number of cases in a group that triggers the use of asymptotic Chi-square distribution in place of the empirical distribution in the Neyman test of uniformity. |
pool |
a character string indicating the pooling method, i.e.,
|
alpha |
a numeric value between 0 and 1 indicating the significance
level of the Hawkins test. The default setting is |
digits |
an integer value indicating the number of decimal places to be used for displaying results. |
p.digits |
an integer value indicating the number of decimal places to be used for displaying the p-value. |
as.na |
a numeric vector indicating user-defined missing values, i.e. these values are converted to NA before conducting the analysis. |
write |
a character string naming a text file with file extension
|
append |
logical: if |
check |
logical: if |
output |
logical: if |
Little (1988) proposed a multivariate test of Missing Completely at Random
(MCAR) that tests for mean differences on every variable in the data set
across subgroups that share the same missing data pattern by comparing the
observed variable means for each pattern of missing data with the expected
population means estimated using the expectation-maximization (EM) algorithm
(i.e., EM maximum likelihood estimates). The test statistic is the sum of
the squared standardized differences between the subsample means and the
expected population means weighted by the estimated variance-covariance
matrix and the number of observations within each subgroup (Enders, 2010).
Under the null hypothesis that data are MCAR, the test statistic follows
asymptotically a chi-square distribution with \sum k_j - k degrees of
freedom, where k_j is the number of complete variables for missing data
pattern j, and k is the total number of variables. A statistically
significant result provides evidence against MCAR.
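The degrees of freedom can be checked by hand. The following is a minimal sketch in base R, assuming the built-in airquality data and counting every distinct missing data pattern; it does not reproduce the internal code of na.test, and the object names are illustrative only.

```r
# Sketch: degrees of freedom of Little's MCAR test computed by hand.
# 'airquality' ships with base R (datasets package).
observed <- !is.na(airquality)   # TRUE where a value is observed

# Each distinct row of 'observed' is one missing data pattern j
patterns <- unique(observed)

k_j <- rowSums(patterns)         # number of observed variables per pattern
k   <- ncol(airquality)          # total number of variables

# Degrees of freedom: sum of k_j over all patterns minus k
df <- sum(k_j) - k
df
```

For airquality this gives four missing data patterns, so the sum runs over four values of k_j.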
Note that Little's MCAR test has a number of problems (see Enders, 2010).
First, the test does not identify the specific variables that violate MCAR, i.e., the test does not identify potential correlates of missingness (i.e., auxiliary variables).
Second, the test is based on multivariate normality, i.e., under departure from the normality assumption the test might be unreliable unless the sample size is large, and the test is not suitable for categorical variables.
Third, the test investigates mean differences assuming that the missing data patterns share a common covariance matrix, i.e., the test cannot detect covariance-based deviations from MCAR stemming from a Missing at Random (MAR) or Missing Not at Random (MNAR) mechanism because MAR and MNAR mechanisms can also produce missing data subgroups with equal means.
Fourth, simulation studies suggest that Little's MCAR test suffers from low statistical power, particularly when the number of variables that violate MCAR is small, the relationship between the data and missingness is weak, or the data are MNAR (Thoemmes & Enders, 2007).
Fifth, the test can only reject, but cannot prove the MCAR assumption, i.e., a statistically non-significant result (failing to reject the null hypothesis) does not prove that the data are MCAR.
Sixth, under the null hypothesis the data are actually MCAR or MNAR, while a statistically significant result indicates that missing data are MAR or MNAR, i.e., MNAR cannot be ruled out regardless of the result of the test.
The function for performing Little's MCAR test is based on the mlest
function from the mvnmle package which can handle up to 50 variables.
Note that the mcar_test function in the naniar package is based on the
prelim.norm function from the norm package. This function can handle
about 30 variables, but with more than 30 variables specified in the
argument data, the prelim.norm function might run into numerical problems
leading to results that are not trustworthy (i.e., p.value = 1). In that
case, the warning message
In norm::prelim.norm(data) : NAs introduced by coercion to integer range
is printed on the console.
Jamshidian and Jalal (2010) proposed an approach for testing the Missing Completely at Random (MCAR) assumption based on two tests of multivariate normality and homogeneity of covariances among groups of cases with identical missing data patterns:
In the first step, missing data are multiply imputed (m = 20 times by
default) using a non-parametric imputation method (method = "npar" by
default) by Srivastava and Dolatabadi (2009) or using a parametric
imputation method assuming multivariate normality of data
(method = "normal") for each group of cases sharing a common missing
data pattern.
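The grouping that the first step relies on can be illustrated in base R. This is only a sketch of how cases might be split by missing data pattern, assuming the built-in airquality data; the imputation itself is not shown, and pattern_id and groups are illustrative names rather than objects used by na.test.

```r
# Sketch: group cases by missing data pattern (base R only).
# One label per case: "1" marks a missing value, "0" an observed value,
# e.g. "100000" means the first variable is missing and the rest observed.
pattern_id <- apply(is.na(airquality), 1, function(row) {
  paste(as.integer(row), collapse = "")
})

# List of data frames, one per missing data pattern
groups <- split(airquality, pattern_id)

# Number of cases in each group
sapply(groups, nrow)
```

Groups with very few cases are the reason the second step uses a test that is applicable to small groups.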
In the second step, a modified Hawkins test for multivariate normality and homogeneity of covariances applicable to complete data consisting of groups with a small number of cases is performed. A statistically not significant result indicates no evidence against multivariate normality of data or homogeneity of covariances, while a statistically significant result provides evidence against multivariate normality of data or homogeneity of covariances (i.e., violation of the MCAR assumption). Note that the Hawkins test is a test of multivariate normality as well as homogeneity of covariance. Hence, a statistically significant test is ambiguous unless the researcher assumes multivariate normality of data.
In the third step, if the Hawkins test is statistically significant, the Anderson-Darling non-parametric test is performed. A statistically not significant result indicates evidence against multivariate normality of data but no evidence against homogeneity of covariances, while a statistically significant result provides evidence against homogeneity of covariances (i.e., violation of the MCAR assumption). However, no conclusions can be made about the multivariate normality of data when the Anderson-Darling non-parametric test is statistically significant.
In summary, a statistically significant result of both the Hawkins and the
Anderson-Darling non-parametric test provides evidence against the MCAR assumption.
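The decision logic of the second and third step can be written down as a small helper. This is a hedged sketch only: interpret_jamjal is a hypothetical function that is not part of any package, and it assumes that the two p-values have already been pooled across the imputed data sets.

```r
# Sketch: interpretation of the two-stage procedure.
# 'interpret_jamjal' is a hypothetical helper, not a package function.
interpret_jamjal <- function(p_hawkins, p_ad, alpha = 0.05) {
  if (p_hawkins >= alpha) {
    # Step 2: Hawkins test not significant
    "no evidence against normality or homogeneity of covariances"
  } else if (p_ad >= alpha) {
    # Step 3: Hawkins significant, Anderson-Darling not significant
    "evidence against normality, no evidence against homogeneity of covariances"
  } else {
    # Step 3: both tests significant
    "evidence against homogeneity of covariances (MCAR violated)"
  }
}

interpret_jamjal(p_hawkins = 0.01, p_ad = 0.30)
interpret_jamjal(p_hawkins = 0.01, p_ad = 0.02)
```

Only the last branch corresponds to evidence against the MCAR assumption.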
The test statistic and the significance values of the Hawkins test and the
Anderson-Darling non-parametric test based on multiply imputed data sets are
pooled by computing the median test statistic and significance value
(pool = "med" by default) as suggested by Eekhout, Wiel, and Heymans (2017).
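Pooling across imputed data sets amounts to summarizing one value per imputation. The sketch below uses a hypothetical helper pool_values and made-up p-values for illustration; treating "m" as the arithmetic mean is an assumption, and the function does not reproduce the internal code of na.test.

```r
# Sketch: pooling values obtained from m imputed data sets.
# 'pool_values' is a hypothetical helper, not a package function.
pool_values <- function(x, method = c("med", "m", "min", "max", "random")) {
  method <- match.arg(method)
  switch(method,
         med    = median(x),
         m      = mean(x),   # assumption: "m" denotes the arithmetic mean
         min    = min(x),
         max    = max(x),
         random = x[sample.int(length(x), 1)])  # one value drawn at random
}

# Made-up p-values from three imputed data sets (illustration only)
p_values <- c(0.01, 0.03, 0.20)

pool_values(p_values, "med")
pool_values(p_values, "max")
```

The median is less sensitive than the mean to a single imputed data set with an extreme result.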
Note that, of the problems listed for Little's MCAR test, the first, second (i.e., the approach is not suitable for categorical variables), fifth, and sixth also apply to Jamshidian and Jalal's approach for testing the MCAR assumption.
In practice, rejecting or not rejecting the MCAR assumption may not be relevant as modern missing data handling methods like full information maximum likelihood (FIML) estimation, Bayesian estimation, or multiple imputation are asymptotically valid under the missing at random (MAR) assumption (Jamshidian & Yuan, 2014). It is more important to distinguish MAR from missing not at random (MNAR), but MAR and MNAR mechanisms cannot be distinguished without auxiliary information.
Returns an object of class misty.object, which is a list with the following
entries:
call |
function call |
type |
type of analysis |
data |
matrix or data frame specified in |
args |
specification of function arguments |
result |
list with result tables, i.e., |
The code for Little's MCAR test is a modified copy of the LittleMCAR
function in the BaylorEdPsych package by A. Alexander Beaujean. The code
for Jamshidian and Jalal's approach is a modified copy of the TestMCARNormality
function in the MissMech package by Mortaza Jamshidian, Siavash Jalal,
Camden Jansen, and Mao Kobayashi (2024).
Takuya Yanagida takuya.yanagida@univie.ac.at
Beaujean, A. A. (2012). BaylorEdPsych: R Package for Baylor University Educational Psychology Quantitative Courses. R package version 0.5. http://cran.nexr.com/web/packages/BaylorEdPsych/index.html
Eekhout, I., Wiel, M. A., & Heymans, M. W. (2017). Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: Power and applicability analysis. BMC Medical Research Methodology, 17:129. https://doi.org/10.1186/s12874-017-0404-7
Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
Little, R. J. A. (1988). A test of Missing Completely at Random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202. https://doi.org/10.2307/2290157
Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika, 75(4), 649-674. https://doi.org/10.1007/s11336-010-9175-3
Jamshidian, M., & Yuan, K.H. (2014). Examining missing data mechanisms via homogeneity of parameters, homogeneity of distributions, and multivariate normality. WIREs Computational Statistics, 6(1), 56-73. https://doi.org/10.1002/wics.1287
Jamshidian, M., Jalal, S., Jansen, C., & Kobayashi, M. (2024). MissMech: Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random. R package version 1.0.4. https://doi.org/10.32614/CRAN.package.MissMech
Srivastava, M.S., & Dolatabadi, M. (2009). Multiple imputation and other resampling scheme for imputing missing observations. Journal of Multivariate Analysis, 100, 1919-1937. https://doi.org/10.1016/j.jmva.2009.06.003
Thoemmes, F., & Enders, C. K. (2007, April). A structural equation model for testing whether data are missing completely at random. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
as.na, na.as, na.auxiliary, na.coverage, na.descript, na.indicator,
na.pattern, na.prop.
# Example 1a: Perform Little's MCAR test and Jamshidian and Jalal's approach
na.test(airquality)
# Example 1b: Alternative specification using the 'data' argument
na.test(., data = airquality)
# Example 2: Perform Jamshidian and Jalal's approach
na.test(airquality, print = "jamjal")
## Not run:
# Example 3: Write results into a text file
na.test(airquality, write = "NA_Test.txt")
## End(Not run)