item.cfa: Confirmatory Factor Analysis

View source: R/item.cfa.R

item.cfa R Documentation

Confirmatory Factor Analysis

Description

This function is a wrapper function for conducting confirmatory factor analysis with continuous and/or ordered-categorical indicators by calling the cfa function in the R package lavaan.

Usage

item.cfa(..., data = NULL, model = NULL, rescov = NULL, hierarch = FALSE,
         meanstructure = TRUE, ident = c("marker", "var", "effect"),
         parameterization = c("delta", "theta"), ordered = NULL, cluster = NULL,
         estimator = c("ML", "MLM", "MLMV", "MLMVS", "MLF", "MLR",
                       "GLS", "WLS", "DWLS", "WLSM", "WLSMV",
                       "ULS", "ULSM", "ULSMV", "DLS", "PML"),
         missing = c("listwise", "pairwise", "fiml",
                     "two.stage", "robust.two.stage", "doubly.robust"),
         print = c("all", "summary", "coverage", "descript", "fit", "est",
                   "modind", "resid"),
         mod.minval = 6.63, resid.minval = 0.1, digits = 3, p.digits = 3,
         as.na = NULL, write = NULL, append = TRUE, check = TRUE, output = TRUE)

Arguments

...

a matrix or data frame. If model = NULL, confirmatory factor analysis based on a measurement model with one factor labeled f comprising all variables in the matrix or data frame is conducted. Note that the cluster variable is excluded from x when specifying cluster. If model is specified, the matrix or data frame needs to contain all variables used in the argument model and the cluster variable when specifying cluster. Alternatively, an expression indicating the variable names in data e.g., item.cfa(x1, x2, x3, data = dat). Note that the operators ., +, -, ~, :, ::, and ! can also be used to select variables, see 'Details' in the df.subset function.

data

a data frame when specifying one or more variables in the argument .... Note that the argument is NULL when specifying a vector, factor, matrix, array, data frame, or list for the argument ....

model

a character vector specifying a measurement model with one factor, or a list of character vectors for specifying a measurement model with more than one factor, e.g., model = c("x1", "x2", "x3", "x4") for specifying a measurement model with one factor labeled f comprising four indicators, or model = list(factor1 = c("x1", "x2", "x3", "x4"), factor2 = c("x5", "x6", "x7", "x8")) for specifying a measurement model with two latent factors labeled factor1 and factor2 each comprising four indicators. Note that the name of each list element is used to label factors, i.e., all list elements need to be named, otherwise factors are labeled with "f1", "f2", "f3" and so on.

rescov

a character vector or a list of character vectors for specifying residual covariances, e.g. rescov = c("x1", "x2") for specifying a residual covariance between items x1 and x2, or rescov = list(c("x1", "x2"), c("x3", "x4")) for specifying residual covariances between items x1 and x2, and items x3 and x4.

hierarch

logical: if TRUE, a second-order factor model is specified, provided that at least three first-order factors were specified in model. Note that it is not possible to specify more than one second-order factor.

meanstructure

logical: if TRUE (default), intercepts/means of observed variables and means of latent variables will be added to the model. Note that meanstructure = FALSE is only applicable when the argument missing is "listwise", "pairwise", or "doubly.robust".

ident

a character string indicating the method used for identifying and scaling latent variables, i.e., "marker" for the marker variable method fixing the first factor loading of each latent variable to 1, "var" for the fixed variance method fixing the variance of each latent variable to 1, or "effect" for the effects-coding method using equality constraints so that the average of the factor loadings for each latent variable equals 1. By default, the fixed variance method is used when hierarch = FALSE, whereas the marker variable method is used when hierarch = TRUE.

parameterization

a character string indicating the method used for identifying and scaling latent variables when indicators are ordered, i.e., "delta" (default) for delta parameterization and "theta" for theta parameterization.

ordered

if NULL (default), all indicators of the measurement model are treated as continuous. If TRUE, all indicators of the measurement model are treated as ordered (ordinal). Alternatively, a character vector indicating which variables to treat as ordered (ordinal) variables can be specified.

cluster

either a character string indicating the variable name of the cluster variable in ... or data, or a vector representing the nested grouping structure (i.e., group or cluster variable) for computing cluster-robust standard errors. Note that cluster-robust standard errors are not available when treating indicators of the measurement model as ordered (ordinal).

estimator

a character string indicating the estimator to be used (see 'Details'). By default, "MLR" is used for CFA models with continuous indicators (i.e., ordered = FALSE) and "WLSMV" is used for CFA models with ordered-categorical indicators (i.e., ordered = TRUE).

missing

a character string indicating how to deal with missing data, i.e., "listwise" for listwise deletion, "pairwise" for pairwise deletion, "fiml" for full information maximum likelihood method, "two.stage" for two-stage maximum likelihood method, "robust.two.stage" for robust two-stage maximum likelihood method, and "doubly.robust" for doubly-robust method (see 'Details'). By default, "fiml" is used for CFA models with continuous indicators which are estimated by using estimator = "MLR", and "pairwise" is used for CFA models with ordered-categorical indicators which are estimated by using estimator = "WLSMV" by default.

print

a character string or character vector indicating which results to show on the console, i.e., "all" for all results, "summary" for a summary of the specification of the estimation method and missing data handling in lavaan, "coverage" for the variance-covariance coverage of the data, "descript" for descriptive statistics, "fit" for model fit, "est" for parameter estimates, "modind" for modification indices, and "resid" for the residual correlation matrix and standardized residual means. By default, a summary of the specification, model fit, and parameter estimates are printed.

mod.minval

numeric value to filter modification indices and only show modifications with a modification index value equal to or higher than this minimum value. By default, modification indices equal to or higher than 6.63 are printed. Note that a modification index value of 6.63 corresponds to the critical value of the chi-square distribution with one degree of freedom at a significance level of alpha = .01.

resid.minval

numeric value indicating the minimum absolute residual correlation coefficients and standardized means to highlight in boldface. By default, absolute residual correlation coefficients and standardized means equal to or higher than 0.1 are highlighted. Note that highlighting can be disabled by setting the minimum value to 1.

digits

an integer value indicating the number of decimal places to be used for displaying results.

p.digits

an integer value indicating the number of decimal places to be used for displaying the p-value.

as.na

a numeric vector indicating user-defined missing values, i.e., these values are converted to NA before conducting the analysis. Note that the as.na() function is only applied to x but not to cluster.

write

a character string naming a file for writing the output into either a text file with file extension ".txt" (e.g., "Output.txt") or Excel file with file extension ".xlsx" (e.g., "Output.xlsx"). If the file name does not contain any file extension, an Excel file will be written.

append

logical: if TRUE (default), output will be appended to an existing text file with extension .txt specified in write; if FALSE, an existing text file will be overwritten.

check

logical: if TRUE (default), argument specification is checked.

output

logical: if TRUE (default), output is shown.

Details

Estimator

The R package lavaan provides seven estimators that affect the estimation, namely "ML", "GLS", "WLS", "DWLS", "ULS", "DLS", and "PML". All other options for the argument estimator combine these estimators with various standard error and chi-square test statistic computation. Note that the estimators also differ in how missing values can be dealt with (e.g., listwise deletion, pairwise deletion, or full information maximum likelihood, FIML).

  • "ML": Maximum likelihood parameter estimates with conventional standard errors and conventional test statistic. For both complete and incomplete data using pairwise deletion or FIML.

  • "MLM": Maximum likelihood parameter estimates with conventional robust standard errors and a Satorra-Bentler scaled test statistic that are robust to non-normality. For complete data only.

  • "MLMV": Maximum likelihood parameter estimates with conventional robust standard errors and a mean and a variance adjusted test statistic using a scale-shifted approach that are robust to non-normality. For complete data only.

  • "MLMVS": Maximum likelihood parameter estimates with conventional robust standard errors and a mean and a variance adjusted test statistic using the Satterthwaite approach that are robust to non-normality. For complete data only.

  • "MLF": Maximum likelihood parameter estimates with standard errors approximated by first-order derivatives and conventional test statistic. For both complete and incomplete data using pairwise deletion or FIML.

  • "MLR": Maximum likelihood parameter estimates with Huber-White robust standard errors and a test statistic which is asymptotically equivalent to the Yuan-Bentler T2* test statistic that are robust to non-normality and non-independence of observations when specifying a cluster variable using the argument cluster. For both complete and incomplete data using pairwise deletion or FIML.

  • "GLS": Generalized least squares parameter estimates with conventional standard errors and conventional test statistic that uses a normal-theory based weight matrix. For complete data only.

  • "WLS": Weighted least squares parameter estimates (sometimes called ADF estimation) with conventional standard errors and conventional test statistic that uses a full weight matrix. For complete data only.

  • "DWLS": Diagonally weighted least squares parameter estimates which uses the diagonal of the weight matrix for estimation with conventional standard errors and conventional test statistic. For both complete and incomplete data using pairwise deletion.

  • "WLSM": Diagonally weighted least squares parameter estimates which uses the diagonal of the weight matrix for estimation, but uses the full weight matrix for computing the conventional robust standard errors and a Satorra-Bentler scaled test statistic. For both complete and incomplete data using pairwise deletion.

  • "WLSMV": Diagonally weighted least squares parameter estimates which uses the diagonal of the weight matrix for estimation, but uses the full weight matrix for computing the conventional robust standard errors and a mean and a variance adjusted test statistic using a scale-shifted approach. For both complete and incomplete data using pairwise deletion.

  • "ULS": Unweighted least squares parameter estimates with conventional standard errors and conventional test statistic. For both complete and incomplete data using pairwise deletion.

  • "ULSM": Unweighted least squares parameter estimates with conventional robust standard errors and a Satorra-Bentler scaled test statistic. For both complete and incomplete data using pairwise deletion.

  • "ULSMV": Unweighted least squares parameter estimates with conventional robust standard errors and a mean and a variance adjusted test statistic using a scale-shifted approach. For both complete and incomplete data using pairwise deletion.

  • "DLS": Distributionally-weighted least squares parameter estimates with conventional robust standard errors and a Satorra-Bentler scaled test statistic. For complete data only.

  • "PML": Pairwise maximum likelihood parameter estimates with Huber-White robust standard errors and a mean and a variance adjusted test statistic using the Satterthwaite approach. For both complete and incomplete data using pairwise deletion.

Missing Data

The R package lavaan provides six methods for dealing with missing data:

  • "listwise": Listwise deletion, i.e., all cases with missing values are removed from the data before conducting the analysis. This is only valid if the data are missing completely at random (MCAR).

  • "pairwise": Pairwise deletion, i.e., each element of a variance-covariance matrix is computed using cases that have data needed for estimating that element. This is only valid if the data are missing completely at random (MCAR).

  • "fiml": Full information maximum likelihood (FIML) method, i.e., the likelihood is computed case by case using all available data from that case. The FIML method is only applicable for the following estimators: "ML", "MLF", and "MLR".

  • "two.stage": Two-stage maximum likelihood estimation, i.e., sample statistics are estimated using the EM algorithm in the first step. Then, these estimated sample statistics are used as input for a regular analysis. Standard errors and test statistics are adjusted correctly to reflect the two-step procedure. The two-stage method is only applicable for the following estimators: "ML", "MLF", and "MLR".

  • "robust.two.stage": Robust two-stage maximum likelihood estimation, i.e., two-stage maximum likelihood estimation with standard errors and a test statistic that are robust against non-normality. The robust two-stage method is only applicable for the following estimators: "ML", "MLF", and "MLR".

  • "doubly.robust": Doubly-robust method only applicable for pairwise maximum likelihood estimation (i.e., estimator = "PML").
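The pairing of estimator and missing data method described above can be sketched as follows. This is a minimal illustration rather than a recommended workflow: it assumes that the packages misty and lavaan are installed, and the data frame dat with artificially introduced missing values is hypothetical and only serves to make the missing argument matter.

```r
library(misty)

data("HolzingerSwineford1939", package = "lavaan")

# Hypothetical data set: four indicators with 20 artificially deleted values in x1
dat <- HolzingerSwineford1939[, c("x1", "x2", "x3", "x4")]
set.seed(123)
dat$x1[sample(nrow(dat), size = 20)] <- NA

# Robust maximum likelihood combined with FIML
# (the documented default for continuous indicators)
item.cfa(dat, estimator = "MLR", missing = "fiml")

# "MLM" is documented as being for complete data only,
# so it is paired with listwise deletion here
item.cfa(dat, estimator = "MLM", missing = "listwise")
```

Note that listwise deletion in the second call assumes that the data are missing completely at random, as stated in the list above.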

Convergence and model identification checks

In line with the R package lavaan, this function provides several checks for model convergence and model identification:

  • Degrees of freedom: An error message is printed if the number of degrees of freedom is negative, i.e., the model is not identified.

  • Model convergence: An error message is printed if the optimizer has not converged, i.e., results are most likely unreliable.

  • Standard errors: An error message is printed if the standard errors could not be computed, i.e., the model might not be identified.

  • Variance-covariance matrix of the estimated parameters: A warning message is printed if the variance-covariance matrix of the estimated parameters is not positive definite, i.e., the smallest eigenvalue of the matrix is smaller than zero or very close to zero.

  • Negative variances of observed variables: A warning message is printed if the estimated variances of the observed variables are negative.

  • Variance-covariance matrix of observed variables: A warning message is printed if the estimated variance-covariance matrix of the observed variables is not positive definite, i.e., the smallest eigenvalue of the matrix is smaller than zero or very close to zero.

  • Negative variances of latent variables: A warning message is printed if the estimated variances of the latent variables are negative.

  • Variance-covariance matrix of latent variables: A warning message is printed if the estimated variance-covariance matrix of the latent variables is not positive definite, i.e., the smallest eigenvalue of the matrix is smaller than zero or very close to zero.

Note that unlike the R package lavaan, the item.cfa function does not provide any results when the degrees of freedom are negative, the model has not converged, or standard errors could not be computed.
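Checks of this kind can be reproduced by hand on a fitted lavaan object. The following sketch calls lavaan directly and is not the internal implementation of item.cfa; the object names (model, fit, df, converged, vcov.eigen, post.check) are chosen for illustration only.

```r
library(lavaan)

data("HolzingerSwineford1939", package = "lavaan")

# Three-factor measurement model in lavaan syntax
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit <- cfa(model, data = HolzingerSwineford1939, estimator = "MLR")

# Degrees of freedom: a negative value indicates a model that is not identified
df <- unname(fitMeasures(fit, "df"))

# Model convergence: TRUE if the optimizer has converged
converged <- lavInspect(fit, "converged")

# Variance-covariance matrix of the estimated parameters:
# a smallest eigenvalue at or below zero indicates a matrix that is not positive definite
vcov.eigen <- min(eigen(lavInspect(fit, "vcov"), symmetric = TRUE,
                        only.values = TRUE)$values)

# lavaan's own post-estimation check for negative variances and
# non-positive definite covariance matrices (TRUE if no problem was found)
post.check <- lavInspect(fit, "post.check")

list(df = df, converged = converged, vcov.eigen = vcov.eigen,
     post.check = post.check)
```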

Model Fit

The item.cfa function provides the chi-square test, incremental fit indices (i.e., CFI and TLI), and absolute fit indices (i.e., RMSEA, and SRMR) to evaluate overall model fit. However, different versions of the CFI, TLI, and RMSEA are provided depending on the estimator. Unlike the R package lavaan, the different versions are labeled with Standard, Scaled, and Robust in the output:

  • "Standard": CFI, TLI, and RMSEA without any non-normality corrections. These fit measures based on the normal theory maximum likelihood test statistic are sensitive to deviations from multivariate normality of endogenous variables. Simulation studies by Brosseau-Liard et al. (2012), and Brosseau-Liard and Savalei (2014) showed that the uncorrected fit indices are affected by non-normality, especially at small and medium sample sizes (e.g., n < 500).

  • "Scaled": Population-corrected robust CFI, TLI, and RMSEA with ad hoc non-normality corrections that simply replace the maximum likelihood test statistic with a robust test statistic (e.g., mean-adjusted chi-square). These fit indices change the population value being estimated depending on the degree of non-normality present in the data. Brosseau-Liard et al. (2012) demonstrated that the ad hoc corrected RMSEA increasingly accepts poorly fitting models as non-normality in the data increases, while the effect of the ad hoc correction on the CFI and TLI is less predictable with non-normality making fit appear worse, better, or nearly unchanged (Brosseau-Liard & Savalei, 2014).

  • "Robust": Sample-corrected robust CFI, TLI, and RMSEA with non-normality corrections based on formulas provided by Li and Bentler (2006) and Brosseau-Liard and Savalei (2014). These fit indices do not change the population value being estimated and can be interpreted the same way as the uncorrected fit indices would be if the data had been normal.

In conclusion, the use of sample-corrected fit indices (Robust) instead of population-corrected fit indices (Scaled) is recommended. Note that when sample size is very small (e.g., n < 200), non-normality correction does not appear to adjust fit indices sufficiently to counteract the effect of non-normality (Brosseau-Liard & Savalei, 2014).
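To see the three versions side by side, the fit measures can be extracted from a fitted lavaan object. The sketch below uses lavaan directly with the "MLR" estimator; the arrangement into a matrix with the column labels Standard, Scaled, and Robust is only an illustration of the labeling described above, not the output format of item.cfa. The same fitMeasures call could be applied to the model.fit entry of an item.cfa return object.

```r
library(lavaan)

data("HolzingerSwineford1939", package = "lavaan")

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit <- cfa(model, data = HolzingerSwineford1939, estimator = "MLR")

# Uncorrected, scaled, and robust versions as named by lavaan
fit.ind <- fitMeasures(fit, c("cfi", "cfi.scaled", "cfi.robust",
                              "tli", "tli.scaled", "tli.robust",
                              "rmsea", "rmsea.scaled", "rmsea.robust"))

# One row per fit index, one column per version
fit.tab <- matrix(fit.ind, nrow = 3, byrow = TRUE,
                  dimnames = list(c("CFI", "TLI", "RMSEA"),
                                  c("Standard", "Scaled", "Robust")))

round(fit.tab, digits = 3)
```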

Modification Indices and Residual Correlation Matrix

The item.cfa function provides modification indices and the residual correlation matrix when requested by using the print argument. Modification indices (aka score tests) are univariate Lagrange Multipliers (LM) representing a chi-square statistic with a single degree of freedom. The LM approximates the amount by which the chi-square test statistic would decrease if a fixed or constrained parameter were freely estimated (Kline, 2023). However, (standardized) expected parameter change (EPC) values should also be inspected since modification indices are sensitive to sample size. EPC values are an estimate of how much the parameter would be expected to change if it were freely estimated (Brown, 2023).

The residual correlation matrix is computed by separately converting the sample covariance and model-implied covariance matrices to correlation matrices before calculating differences between observed and predicted correlations (i.e., type = "cor.bollen"). As a rule of thumb, absolute correlation residuals greater than 0.10 indicate possible evidence for poor local fit, whereas correlation residuals smaller than 0.05 indicate a negligible degree of model misfit (Maydeu-Olivares, 2017).

There is no reliable connection between the size of diagnostic statistics (i.e., modification indices and residuals) and the type or amount of model misspecification since (1) diagnostic statistics are themselves affected by misspecification, (2) misspecification in one part of the model distorts estimates in other parts of the model (i.e., error propagation), and (3) equivalent models have identical residuals but contradict the pattern of causal effects (Kline, 2023). Note that according to Kline (2023), "any report of the results without information about the residuals is deficient" (p. 172).
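Two of the quantities discussed here can be illustrated with base R alone. The first part shows where the default mod.minval = 6.63 comes from, i.e., the critical value of a chi-square distribution with one degree of freedom at alpha = .01. The second part sketches the residual correlation computation on two small covariance matrices; S and Sigma are made-up values for illustration, not results from any fitted model.

```r
# Critical value of the chi-square distribution with df = 1 at alpha = .01
crit <- qchisq(1 - 0.01, df = 1)
round(crit, digits = 2)  # 6.63

# Hypothetical sample covariance matrix (S) and model-implied covariance matrix (Sigma)
vars <- c("x1", "x2", "x3")
S <- matrix(c(1.00, 0.60, 0.50,
              0.60, 1.44, 0.70,
              0.50, 0.70, 2.25), nrow = 3, dimnames = list(vars, vars))
Sigma <- matrix(c(1.00, 0.48, 0.60,
                  0.48, 1.44, 0.72,
                  0.60, 0.72, 2.25), nrow = 3, dimnames = list(vars, vars))

# Convert each matrix to a correlation matrix separately, then take the difference
resid.cor <- cov2cor(S) - cov2cor(Sigma)

# Flag absolute residual correlations above the rule-of-thumb value of 0.10
abs(resid.cor) > 0.10
```

Because both matrices are rescaled to unit diagonals before subtraction, the diagonal of the residual correlation matrix is zero by construction.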

Value

Returns an object of class misty.object, which is a list with following entries:

call

function call

type

type of analysis

data

matrix or data frame specified in x

args

specification of function arguments

model

specified model

model.fit

fitted lavaan object (mod.fit)

check

results of the convergence and model identification check

result

list with result tables, i.e., "summary" for the specification of the estimation method and missing data handling in lavaan, "coverage" for the variance-covariance coverage of the data, "descript" for descriptive statistics, "itemfreq" for absolute frequencies (freq), percentages (perc), and valid percentages (v.perc), "fit" for model fit, "param" for parameter estimates, and "modind" for modification indices.

Note

The function uses the functions cfa, lavInspect, lavTech, modindices, parameterEstimates, and standardizedsolution provided in the R package lavaan by Yves Rosseel (2012).

Author(s)

Takuya Yanagida takuya.yanagida@univie.ac.at

References

Brosseau-Liard, P. E., Savalei, V., & Li, L. (2012). An investigation of the sample performance of two nonnormality corrections for RMSEA. Multivariate Behavioral Research, 47, 904-930. https://doi.org/10.1080/00273171.2012.715252

Brosseau-Liard, P. E., & Savalei, V. (2014). Adjusting incremental fit indices for nonnormality. Multivariate Behavioral Research, 49, 460-470. https://doi.org/10.1080/00273171.2014.933697

Brown, T. A. (2023). Confirmatory factor analysis. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (2nd ed.) (pp. 361–379). The Guilford Press.

Kline, R. B. (2023). Principles and practice of structural equation modeling (5th ed.). Guilford Press.

Li, L., & Bentler, P. M. (2006). Robust statistical tests for evaluating the hypothesis of close fit of misspecified mean and covariance structural models. UCLA Statistics Preprint #506. University of California.

Maydeu-Olivares, A. (2017). Assessing the size of model misfit in structural equation models. Psychometrika, 82(3), 533–558. https://doi.org/10.1007/s11336-016-9552-7

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48, 1-36. https://doi.org/10.18637/jss.v048.i02

See Also

item.alpha, item.omega, item.scores

Examples

## Not run: 
# Load data set "HolzingerSwineford1939" in the lavaan package
data("HolzingerSwineford1939", package = "lavaan")

#----------------------------------------------------------------------------
# Measurement model with one factor

# Example 1a: Specification using the argument 'x'
item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3")])

# Example 1b: Alternative specification using the 'data' argument
item.cfa(x1:x3, data = HolzingerSwineford1939)

# Example 1c: Alternative specification using the argument 'model'
item.cfa(HolzingerSwineford1939, model = c("x1", "x2", "x3"))

# Example 1d: Alternative specification using the 'data' and 'model' argument
item.cfa(., data = HolzingerSwineford1939, model = c("x1", "x2", "x3"))

# Example 1e: Alternative specification using the argument 'model'
item.cfa(HolzingerSwineford1939, model = list(visual = c("x1", "x2", "x3")))

# Example 1f: Alternative specification using the 'data' and 'model' argument
item.cfa(., data = HolzingerSwineford1939, model = list(visual = c("x1", "x2", "x3")))

#----------------------------------------------------------------------------
# Measurement model with three factors

# Example 2: Specification using the argument 'model'
item.cfa(HolzingerSwineford1939,
         model = list(visual = c("x1", "x2", "x3"),
                      textual = c("x4", "x5", "x6"),
                      speed = c("x7", "x8", "x9")))

#----------------------------------------------------------------------------
# Residual covariances

# Example 3a: One residual covariance
item.cfa(HolzingerSwineford1939,
         model = list(visual = c("x1", "x2", "x3"),
                      textual = c("x4", "x5", "x6"),
                      speed = c("x7", "x8", "x9")),
         rescov = c("x1", "x2"))

# Example 3b: Two residual covariances
item.cfa(HolzingerSwineford1939,
         model = list(visual = c("x1", "x2", "x3"),
                      textual = c("x4", "x5", "x6"),
                      speed = c("x7", "x8", "x9")),
         rescov = list(c("x1", "x2"), c("x4", "x5")))

#----------------------------------------------------------------------------
# Second-order factor model based on three first-order factors

# Example 4
item.cfa(HolzingerSwineford1939,
         model = list(visual = c("x1", "x2", "x3"),
                      textual = c("x4", "x5", "x6"),
                      speed = c("x7", "x8", "x9")),
         hierarch = TRUE)

#----------------------------------------------------------------------------
# Measurement model with ordered-categorical indicators

# Example 5
item.cfa(round(HolzingerSwineford1939[, c("x4", "x5", "x6")]), ordered = TRUE)

#----------------------------------------------------------------------------
# Cluster-robust standard errors

# Load data set "Demo.twolevel" in the lavaan package
data("Demo.twolevel", package = "lavaan")

# Example 6a: Specification using a variable in 'x'
item.cfa(Demo.twolevel[, c("y4", "y5", "y6", "cluster")], cluster = "cluster")

# Example 6b: Specification of the cluster variable in 'cluster'
item.cfa(Demo.twolevel[, c("y4", "y5", "y6")], cluster = Demo.twolevel$cluster)

# Example 6c: Alternative specification using the 'data' argument
item.cfa(y4:y6, data = Demo.twolevel, cluster = "cluster")

#----------------------------------------------------------------------------
# Print argument

# Example 7a: Request all results
item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3")], print = "all")

# Example 7b: Request modification indices with value equal or higher than 5
item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3", "x4")],
         print = "modind", mod.minval = 5)

#----------------------------------------------------------------------------
# lavaan summary of the estimated model

# Example 8
mod <- item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3")], output = FALSE)

lavaan::summary(mod$model.fit, standardized = TRUE, fit.measures = TRUE)

#----------------------------------------------------------------------------
# Write Results

# Example 9a: Write results into a text file
item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3")], write = "CFA.txt")

# Example 9b: Write results into an Excel file
item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3")], write = "CFA.xlsx")

# Example 9c: Assign results to an object and write them into an Excel file
result <- item.cfa(HolzingerSwineford1939[, c("x1", "x2", "x3")], output = FALSE)
write.result(result, "CFA.xlsx")

## End(Not run)

misty documentation built on Oct. 24, 2024, 5:10 p.m.
