na.auxiliary | R Documentation |
This function computes (1) a matrix with Pearson product-moment correlation for continuous variables, multiple correlation coefficient for categorical and continuous variables, and Phi coefficient and Cramer's V for categorical variables to identify variables related to the incomplete variable (i.e., correlates of incomplete variables), (2) a matrix with Cohen's d, Phi coefficient and Cramer's V for comparing cases with and without missing values, and (3) semi-partial correlations of an outcome variable conditional on the predictor variables of a substantive model with a set of candidate auxiliary variables to identify correlates of an incomplete outcome variable as suggested by Raykov and West (2016).
na.auxiliary(data, ..., model = NULL, categ = NULL, estimator = c("ML", "MLR"),
missing = c("fiml", "two.stage", "robust.two.stage", "doubly.robust"),
adjust = TRUE, weighted = FALSE, correct = FALSE,
tri = c("both", "lower", "upper"), digits = 2, p.digits = 3,
as.na = NULL, write = NULL, append = TRUE,
check = TRUE, output = TRUE)
data |
a data frame with incomplete data, where missing
values are coded as |
... |
an expression indicating the variable names in |
model |
a character string specifying the substantive model predicting
a continuous outcome variable using a set of predictor variables
to estimate semi-partial correlations between the outcome
variable and a set of candidate auxiliary variables. The default
setting is |
categ |
a character vector specifying the variables that are treated
as categorical (see 'Details'). Note that variables that are
factors or character vectors will be automatically added to
the argument |
estimator |
a character string indicating the estimator to be used
when estimating semi-partial correlation coefficients, i.e.,
|
missing |
a character string indicating how to deal with missing data
when estimating semi-partial correlation coefficients,
i.e., |
adjust |
logical: if |
weighted |
logical: if |
correct |
logical: if |
tri |
a character string indicating which triangle of the correlation
matrix to show on the console, i.e., |
digits |
an integer value indicating the number of decimal places to be used for displaying correlation coefficients and Cohen's d estimates. |
p.digits |
an integer value indicating the number of decimal places to be used for displaying the p-value. |
as.na |
a numeric vector indicating user-defined missing values,
i.e. these values are converted to |
write |
a character string naming a file for writing the output into
either a text file with file extension |
append |
logical: if |
check |
logical: if |
output |
logical: if |
The function computes matrices with statistical measures depending on the level of measurement of the variables involved in the analysis:
Continuous variables: Product-moment correlation coefficient is computed for continuous variables.
Continuous and categorical variable: Multiple correlation coefficient (R) is computed based on a linear model with a dummy-coded categorical variable as predictor, where the multiple correlation coefficient is the square root of the coefficient of determination of this model. Note that the multiple R for a binary predictor variable is equivalent to the point-biserial correlation coefficient between the binary variable and the continuous outcome.
Categorical variables: Phi coefficient is computed for two dichotomous variables, while Cramer's V is computed when one of the categorical variables is polytomous.
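The measures described above can be reproduced by hand in base R. The following is a minimal sketch, not the function's internal code: it uses the built-in mtcars data as an arbitrary illustration, and the object names (dat, mult.R, r.pb, V) are hypothetical. It computes the unadjusted Cramer's V only, and because the multiple R is a square root it is compared with the absolute value of the point-biserial correlation.

```r
# Illustrative data: continuous outcome and a binary grouping variable
dat <- data.frame(y = mtcars$mpg, g = factor(mtcars$am))

# Multiple R: square root of R^2 from a linear model with a
# dummy-coded categorical predictor
fit <- lm(y ~ g, data = dat)
mult.R <- sqrt(summary(fit)$r.squared)

# Point-biserial correlation: Pearson correlation with the 0/1 variable
r.pb <- cor(dat$y, as.numeric(dat$g))

# Multiple R is non-negative, so it matches the absolute value
all.equal(mult.R, abs(r.pb))

# Unadjusted Cramer's V for two polytomous variables
tab <- table(mtcars$cyl, mtcars$gear)
chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
V <- sqrt(as.numeric(chi2) / (sum(tab) * (min(dim(tab)) - 1)))
V
```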
Continuous variable: Cohen's d is computed to investigate mean differences in the continuous variable between cases with and without missing values.
Categorical variable: Phi coefficient is computed to investigate the association between the grouping variable (0 = observed, 1 = missing) and a dichotomous variable, while Cramer's V is computed when the categorical variable is polytomous.
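The comparison of cases with and without missing values can be sketched by hand in base R. This is an illustration under stated assumptions rather than the function's implementation: it uses the ordinary pooled standard deviation and computes the difference as missing minus observed, whereas the pooling, small-sample correction and sign convention used by na.auxiliary may differ. The object names (miss, sd.pooled, d) are hypothetical.

```r
# Missing data indicator for Ozone: 0 = observed, 1 = missing
miss <- as.integer(is.na(airquality$Ozone))

# Continuous variable to compare across the two groups (Wind is complete)
x <- airquality$Wind

# Group means, variances and sizes, named "0" and "1"
m <- tapply(x, miss, mean)
v <- tapply(x, miss, var)
n <- tapply(x, miss, length)

# Pooled standard deviation (assumed pooling, see note above)
sd.pooled <- sqrt(((n["0"] - 1) * v["0"] + (n["1"] - 1) * v["1"]) /
                  (sum(n) - 2))

# Cohen's d: standardized mean difference, missing minus observed
d <- unname((m["1"] - m["0"]) / sd.pooled)
d
```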
Categorical variables are removed before computing semi-partial correlations based on the approach suggested by Raykov and West (2016).
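The idea behind the semi-partial correlations can be illustrated with a rough base R sketch. It rests on assumptions that differ from the function: it uses listwise deletion and ordinary least squares, while na.auxiliary estimates the coefficients with a lavaan model and a missing data method, so the values will generally not match. Temp is picked arbitrarily as the candidate auxiliary variable, and the object names (cc, res.y, sp) are hypothetical.

```r
# Complete cases on the outcome, the predictors and the candidate
# auxiliary variable (listwise deletion, for illustration only)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
cc <- airquality[complete.cases(airquality[, vars]), vars]

# Residualize the outcome on the predictors of the substantive model
res.y <- resid(lm(Ozone ~ Solar.R + Wind, data = cc))

# Correlate the outcome residual with the auxiliary variable
sp <- cor(res.y, cc$Temp)
sp
```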
Note that factors and characters are treated as categorical variables regardless of the specification of the argument categ, while numeric vectors in the data frame are treated as continuous variables if they are not specified in the argument categ.
Returns an object of class misty.object, which is a list with the following entries:
call |
function call |
type |
type of analysis |
data |
data frame used for the current analysis |
model |
lavaan model syntax for estimating the semi-partial correlations |
model.fit |
fitted lavaan model for estimating the semi-partial correlations |
args |
specification of function arguments |
result |
list with result tables |
Takuya Yanagida takuya.yanagida@univie.ac.at
Enders, C. K. (2022). Applied missing data analysis (2nd ed.). The Guilford Press.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576. https://doi.org/10.1146/annurev.psych.58.110405.085530
Raykov, T., & West, B. T. (2016). On enhancing plausibility of the missing at random assumption in incomplete data analyses via evaluation of response-auxiliary variable correlations. Structural Equation Modeling, 23(1), 45–53. https://doi.org/10.1080/10705511.2014.937848
van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman & Hall.
as.na, na.as, na.coverage, na.descript, na.indicator, na.pattern, na.prop, na.test
# Example 1a: Auxiliary variables
na.auxiliary(airquality)
# Example 1b: Auxiliary variables, "Month" as categorical variable
na.auxiliary(airquality, categ = "Month")
# Example 2: Semi-partial correlation coefficients
na.auxiliary(airquality, model = "Ozone ~ Solar.R + Wind")
## Not run:
# Example 3a: Write results into a text file
na.auxiliary(airquality, write = "NA_Auxiliary.txt")
# Example 3b: Write results into an Excel file
na.auxiliary(airquality, write = "NA_Auxiliary.xlsx")
## End(Not run)