| na.auxiliary | R Documentation |
This function computes (1) a matrix with Pearson product-moment correlation for continuous variables, multiple correlation coefficient for categorical and continuous variables, and Phi coefficient and Cramer's V for categorical variables to identify variables related to the incomplete variable (i.e., correlates of incomplete variables), (2) a matrix with Cohen's d, Phi coefficient and Cramer's V for comparing cases with and without missing values, and (3) semi-partial correlations of an outcome variable conditional on the predictor variables of a substantive model with a set of candidate auxiliary variables to identify correlates of an incomplete outcome variable as suggested by Raykov and West (2016).
na.auxiliary(data, ..., model = NULL, categ = NULL, estimator = c("ML", "MLR"),
missing = c("fiml", "two.stage", "robust.two.stage", "doubly.robust"),
adjust = TRUE, weighted = FALSE, correct = FALSE,
tri = c("both", "lower", "upper"), digits = 2, p.digits = 3,
as.na = NULL, write = NULL, append = TRUE,
check = TRUE, output = TRUE)
data |
a data frame with incomplete data, where missing
values are coded as |
... |
an expression indicating the variable names in |
model |
a character string specifying the substantive model predicting
a continuous outcome variable using a set of predictor variables
to estimate semi-partial correlations between the outcome
variable and a set of candidate auxiliary variables. The default
setting is |
categ |
a character vector specifying the variables that are treated
as categorical (see 'Details'). Note that variables that are
factors or character vectors will be automatically added to
the argument |
estimator |
a character string indicating the estimator to be used
when estimating semi-partial correlation coefficients, i.e.,
|
missing |
a character string indicating how to deal with missing data
when estimating semi-partial correlation coefficients,
i.e., |
adjust |
logical: if |
weighted |
logical: if |
correct |
logical: if |
tri |
a character string indicating which triangle of the correlation
matrix to show on the console, i.e., |
digits |
an integer value indicating the number of decimal places to be used for displaying correlation coefficients and Cohen's d estimates. |
p.digits |
an integer value indicating the number of decimal places to be used for displaying the p-value. |
as.na |
a numeric vector indicating user-defined missing values,
i.e. these values are converted to |
write |
a character string naming a file for writing the output into
either a text file with file extension |
append |
logical: if |
check |
logical: if |
output |
logical: if |
The function computes matrices with statistical measures depending on the level of measurement of the variables involved in the analysis:
Continuous variables: Product-moment correlation coefficient is computed for continuous variables.
Continuous and categorical variable: Multiple correlation coefficient (R) is computed based on a linear model with a dummy-coded categorical variable as predictor, where the multiple correlation coefficient is the square root of the coefficient of determination of this model. Note that the multiple R for a binary predictor variable is equivalent to the point-biserial correlation coefficient between the binary variable and the continuous outcome.
Categorical variables: Phi coefficient is computed for two dichotomous variables, while Cramer's V is computed when one of the categorical variables is polytomous.
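The measures above can be illustrated with a minimal base R sketch. The helper names `multiple.r` and `cramers.v` are hypothetical and not part of the package, the `mtcars` data are used only for illustration, and the values are the plain textbook versions without any adjustment or correction, so they need not match the function's output under all argument settings.

```r
# Multiple R: square root of R-squared from a linear model with a
# dummy-coded categorical predictor (hypothetical helper)
multiple.r <- function(y, x) {
  sqrt(summary(lm(y ~ factor(x)))$r.squared)
}

# Binary predictor: multiple R equals the absolute point-biserial correlation
r.binary <- multiple.r(mtcars$mpg, mtcars$am)
r.pb <- abs(cor(mtcars$mpg, mtcars$am))

# Cramer's V from the uncorrected chi-square statistic (hypothetical helper)
cramers.v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Polytomous variables: Cramer's V
v.poly <- cramers.v(mtcars$cyl, mtcars$gear)

# Two dichotomous variables: V reduces to the absolute Phi coefficient
phi.abs <- cramers.v(mtcars$am, mtcars$vs)
```

For two 0/1 variables, `phi.abs` coincides with the absolute Pearson correlation between them, which is one way to check the helper.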
Continuous variable: Cohen's d is computed to investigate mean differences in the continuous variable between cases with and without missing values.
Categorical variable: Phi coefficient is computed to investigate the association between the grouping variable (0 = observed, 1 = missing) and a dichotomous variable, while Cramer's V is computed when the categorical variable is polytomous.
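A minimal base R sketch of the comparison of cases with and without missing values is given below. The helper `cohens.d` and its `weighted` switch are hypothetical; the direction of the difference (missing minus observed) and the way the two pooling formulas relate to the function's own `weighted` argument are assumptions made for illustration.

```r
# Missing data indicator for the incomplete variable
# (0 = observed, 1 = missing)
miss <- as.integer(is.na(airquality$Ozone))

# Cohen's d for a continuous variable across the two groups
# (hypothetical helper; sign convention is an assumption)
cohens.d <- function(y, group, weighted = TRUE) {
  y0 <- y[group == 0 & !is.na(y)]
  y1 <- y[group == 1 & !is.na(y)]
  n0 <- length(y0)
  n1 <- length(y1)
  sd.pooled <- if (weighted) {
    # Pooled SD weighted by group size
    sqrt(((n0 - 1) * var(y0) + (n1 - 1) * var(y1)) / (n0 + n1 - 2))
  } else {
    # Unweighted average of the two group variances
    sqrt((var(y0) + var(y1)) / 2)
  }
  (mean(y1) - mean(y0)) / sd.pooled
}

# Do cases with missing Ozone differ in Wind from cases with observed Ozone?
d.wind <- cohens.d(airquality$Wind, miss)
d.wind.unweighted <- cohens.d(airquality$Wind, miss, weighted = FALSE)
```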
Categorical variables are removed before computing semi-partial correlations based on the approach suggested by Raykov and West (2016).
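The idea of a semi-partial correlation can be sketched in base R as the correlation between a candidate auxiliary variable and the outcome after the predictors of the substantive model have been partialled out of the outcome. Note that this sketch uses listwise deletion and ordinary least squares, which is a simplification and not the missing data handling selected by the `estimator` and `missing` arguments, so the values will generally differ from the function's output. The choice of `Temp` as the candidate auxiliary variable is for illustration only.

```r
# Complete cases only (listwise deletion); a simplifying assumption
dat <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Residualize the outcome on the predictors of the substantive model
res.ozone <- resid(lm(Ozone ~ Solar.R + Wind, data = dat))

# Semi-partial correlation between the outcome and the candidate
# auxiliary variable, conditional on the predictors
sp.temp <- cor(res.ozone, dat$Temp)
```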
Note that factors and characters are treated as categorical variables regardless
of the specification of the argument categ, while numeric vectors in the data
frame are treated as continuous variables if they are not specified in the
argument categ.
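A quick way to see which columns would be picked up as categorical without naming them is to check the column types directly. This is a minimal sketch, and the object name `auto.categ` is hypothetical.

```r
# TRUE for columns that are factors or character vectors
auto.categ <- sapply(airquality, function(x) is.factor(x) || is.character(x))

# Names of the columns that are factors or character vectors
names(auto.categ)[auto.categ]
```

In `airquality` all columns are numeric, so a variable such as `Month` is treated as continuous unless it is named in the argument categ.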
Returns an object of class misty.object, which is a list with the following
entries:
call |
function call |
type |
type of analysis |
data |
data frame used for the current analysis |
model |
lavaan model syntax for estimating the semi-partial correlations |
model.fit |
fitted lavaan model for estimating the semi-partial correlations |
args |
specification of function arguments |
result |
list with result tables |
Takuya Yanagida takuya.yanagida@univie.ac.at
Enders, C. K. (2022). Applied missing data analysis (2nd ed.). The Guilford Press.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576. https://doi.org/10.1146/annurev.psych.58.110405.085530
Raykov, T., & West, B. T. (2016). On enhancing plausibility of the missing at random assumption in incomplete data analyses via evaluation of response-auxiliary variable correlations. Structural Equation Modeling, 23(1), 45–53. https://doi.org/10.1080/10705511.2014.937848
van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman & Hall.
as.na, na.as, na.coverage,
na.descript, na.indicator, na.pattern,
na.prop, na.test
# Example 1a: Auxiliary variables
na.auxiliary(airquality)
# Example 1b: Auxiliary variables, "Month" as categorical variable
na.auxiliary(airquality, categ = "Month")
# Example 2: Semi-partial correlation coefficients
na.auxiliary(airquality, model = "Ozone ~ Solar.R + Wind")
## Not run:
# Example 3a: Write Results into a text file
na.auxiliary(airquality, write = "NA_Auxiliary.txt")
# Example 3b: Write Results into an Excel file
na.auxiliary(airquality, write = "NA_Auxiliary.xlsx")
## End(Not run)