lavCor: Polychoric, polyserial and Pearson correlations

View source: R/lav_cor.R

lavCorR Documentation

Polychoric, polyserial and Pearson correlations


Fit an unrestricted model to compute polychoric, polyserial and/or Pearson correlations.


lavCor(object, ordered = NULL, group = NULL, missing = "listwise", 
       ov.names.x = NULL, sampling.weights = NULL,
       se = "none", test = "none", 
       estimator = "two.step", baseline = FALSE, ..., 
       cor.smooth = FALSE, cor.smooth.tol = 1e-04, output = "cor")



Either a data.frame, or an object of class lavaan. If the input is a data.frame, and some variables are declared as ordered factors, lavaan will treat them as ordinal variables.


Character vector. Only used if object is a data.frame. Treat these variables as ordered (ordinal) variables. Importantly, all other variables will be treated as numeric (unless they are declared as ordered in the original data frame.)


Only used if object is a data.frame. Specify a grouping variable.


If "listwise", cases with missing values are removed listwise from the data frame. If "direct" or "ml" or "fiml" and the estimator is maximum likelihood, an EM algorithm is used to estimate the unrestricted covariance matrix (and mean vector). If "pairwise", pairwise deletion is used. If "default", the value is set depending on the estimator and the mimic option.


Only used if object is a data.frame. Specify a variable containing sampling weights.


Only used if object is a data.frame. Specify variables that need to be treated as exogenous. Only used if at least one variable is declared as ordered.


Only used if output (see below) contains standard errors. See lavOptions for possible options.


Only used if output is "fit" or "lavaan". See lavOptions for possible options.


If "none" or "two.step" or "two.stage", only starting values are computed for the correlations (and thresholds), without any further estimation. If all variables are continuous, the starting values are the sample covariances (converted to correlations if output = "cor"). If at least one variable is ordered, the thresholds are computed using univariate information only. The polychoric and/or polyserial correlations are computed in a second stage, keeping the values of the thresholds constant. If an estimator (other than "two.step" or "two.stage") is specified (for example estimator = "PML"), these starting values are further updated by fitting the unrestricted model using the chosen estimator. See the lavaan function for alternative estimators.


Only used if output is "fit" or "lavaan". If TRUE, a baseline model is also estimated. Note that the test argument should also be set to a value other than "none".


Optional parameters that are passed to the lavaan function.


Logical. Only used if output = "cor". If TRUE, ensure the resulting correlation matrix is positive definite. The following simple method is used: an eigenvalue decomposition is computed; then, eigenvalues smaller than cor.smooth.tol are set to be equal to cor.smooth.tol, before the matrix is again reconstructed. Finally, the matrix (which may no longer have unit diagonal elements) is converted to a correlation matrix using cov2cor.


Numeric. Smallest eigenvalue used when reconstructing the correlation matrix after an eigenvalue decomposition.


If "cor", the function returns the correlation matrix only. If "cov", the function returns the covariance matrix (this only makes a difference if at least one variable is numeric). If "th" or "thresholds", only the thresholds are returned. If "sampstat", the output equals the result of lavInspect(fit, "sampstat") where fit is the unrestricted model. If "est" or "pe" or "parameterEstimates", the output equals the result of parameterEstimates(fit). Finally, if output is "fit" or "lavaan", the function returns an object of class lavaan.


This function is a wrapper around the lavaan function, but where the model is defined as the unrestricted model. The following free parameters are included: all covariances/correlations among the variables, variances for continuous variables, means for continuous variables, thresholds for ordered variables, and if exogenous variables are included (ov.names.x is not empty) while some variables are ordered, also the regression slopes enter the model.


By default, if output = "cor" or output = "cov", a symmetric matrix (of class "lavaan.matrix.symmetric", which only affects the way the matrix is printed). If output = "th", a named vector of thresholds. If output = "fit" or output = "lavaan", an object of class lavaan.


Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.

Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.

See Also



# Holzinger and Swineford (1939) example
HS9 <- HolzingerSwineford1939[,c("x1","x2","x3","x4","x5",

# Pearson correlations

# ordinal version, with three categories
HS9ord <- lapply(HS9, cut, 3, labels = FALSE) )

# polychoric correlations, two-stage estimation
lavCor(HS9ord, ordered=names(HS9ord))

# thresholds only
lavCor(HS9ord, ordered=names(HS9ord), output = "th")

# polychoric correlations, with standard errors
lavCor(HS9ord, ordered=names(HS9ord), se = "standard", output = "est")

# polychoric correlations, full output
fit.un <- lavCor(HS9ord, ordered=names(HS9ord), se = "standard", 
                 output = "fit")

lavaan documentation built on July 26, 2023, 5:08 p.m.