generateData: Generate data from a lavaan model syntax
In M-E-Rademaker/cSEM.DGP: Generate Data for Structural Equation Models



Generate data based on the parameters of a structural equation model in lavaan model syntax.

generateData(
 .model                    = NULL,
 .empirical                = FALSE,
 .handle_negative_definite = c("stop", "drop", "set_NA"),
 .return_type              = c("data.frame", "matrix", "cor"),
 .N                        = 200,
 .skewness                 = NULL,
 .kurtosis                 = NULL,
 ...
 )

`.model`	A model in lavaan model syntax.
`.empirical`	Logical. If `TRUE`, mu and Sigma of the normal distribution specify the empirical rather than the population mean and covariance matrix. Ignored if `.return_type = "cor"`. Defaults to `FALSE`.
`.handle_negative_definite`	Character string. How should negative definite indicator correlation matrices be handled? One of `"stop"`, `"drop"`, or `"set_NA"`, in which case an `NA` is produced. Defaults to `"stop"`.
`.return_type`	Character string. One of `"data.frame"`, `"matrix"`, or `"cor"`, in which case the indicator correlation matrix is returned. Defaults to `"data.frame"`.
`.N`	Integer. The number of observations to generate. Ignored if `.return_type = "cor"`. Defaults to `200`.
`.skewness`	List. List of predefined values for the skewness of the indicators.
`.kurtosis`	List. List of predefined values for the kurtosis of the indicators.
`...`	`"name" = vector_of_values` pairs. `"name"` is a character string giving the label used for the parameter of interest. `vector_of_values` is a numeric vector of values to use for the parameter given by `"name"`.
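To illustrate the situation `.handle_negative_definite` is meant for, the following base R sketch checks whether a set of indicator correlations forms a valid correlation matrix. It assumes that "negative definite" is to be read as "not positive definite" and that an eigenvalue check is a reasonable way to detect this; the check actually used by the package may differ, and the matrix and object names are purely illustrative.

```r
# Hypothetical indicator correlations that cannot occur jointly:
# y1 correlates strongly and positively with y2 and y3,
# yet y2 and y3 are strongly negatively correlated.
S <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

# A valid correlation matrix has only positive eigenvalues.
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
is_pd <- all(ev > 1e-10)

ev     # the smallest eigenvalue is negative
is_pd  # FALSE: no data can be generated from this matrix
```

With `.handle_negative_definite = "stop"` such a specification is expected to raise an error, whereas `"drop"` and `"set_NA"` are presumably most useful when many parameter combinations are generated at once.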

Generate data for structural equation models including up to 8 constructs if a structural model is given or an unlimited number if only the correlation between constructs is needed. To be precise, if users specify a structural model we support a maximum of 5 exogenous constructs. Depending on the number of exogenous constructs the following number of endogenous constructs is allowed:

If there is 1 exogenous construct : a maximum of 7 endogenous constructs is allowed
If there are 2 exogenous constructs: a maximum of 6 endogenous constructs is allowed
If there are 3 exogenous constructs: a maximum of 5 endogenous constructs is allowed
If there are 4 exogenous constructs: a maximum of 4 endogenous constructs is allowed
If there are 5 exogenous constructs: a maximum of 4 endogenous constructs is allowed

The reason for the limitation is that data is generated such that the model-implied variances of the constructs are always unity. The model-implied construct covariance matrix is a complex function of the structural residual variances, which are in turn a complex function of the path coefficients, so the equation for each construct variance grows massively with each additional construct. Because the number of possible model specifications grows rapidly with the number of constructs, we solved the variance equations symbolically as a function of the path coefficients in Mathematica. With more than 8 constructs, these symbolic representations become too large to be computationally feasible.
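For a small recursive model the unit-variance constraint can be worked out by hand. The following base R sketch uses the path coefficients of the first example in the Examples section (eta2 ~ 0.4*eta1 and eta3 ~ 0.4*eta1 + 0.35*eta2). It is only an illustration of the idea, not the package's internal code, and all object names are made up for this sketch.

```r
# Path coefficients
b21 <- 0.4    # eta2 ~ b21*eta1
b31 <- 0.4    # eta3 ~ b31*eta1 + b32*eta2
b32 <- 0.35

# With Var(eta1) = Var(eta2) = 1, Cov(eta1, eta2) equals b21.
cov_12 <- b21

# Structural residual variances that make Var(eta2) and Var(eta3) equal one.
var_zeta2 <- 1 - b21^2
var_zeta3 <- 1 - (b31^2 + b32^2 + 2 * b31 * b32 * cov_12)

# Check with the reduced form: V = (I - B)^-1 Psi (I - B)^-T
B <- matrix(0, 3, 3)
B[2, 1] <- b21
B[3, 1] <- b31
B[3, 2] <- b32
Psi <- diag(c(1, var_zeta2, var_zeta3))  # Var(eta1) and residual variances
A   <- solve(diag(3) - B)
V   <- A %*% Psi %*% t(A)

round(diag(V), 10)  # all construct variances are one
```

Even in this three-construct case the expression for `var_zeta3` already involves a construct covariance that itself depends on another path coefficient, which hints at why the closed-form expressions grow so quickly for larger models.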

Generation is based on parameter values given in lavaan model syntax. Currently, linear models and models containing second order constructs are supported. Supplying a model containing nonlinear terms causes an error.

For the structural model equations (~) values are interpreted as path coefficients. For measurement model equations values are taken to be loadings if the concept is modeled as a common factor (=~). If the concept is modeled as a composite (<~) values are interpreted as (unscaled) weights. In the latter case, indicators are allowed to be arbitrarily correlated. Hence, the correlations between indicators need to be set as well. Indicator correlations, measurement error correlations, and correlations between exogenous constructs are set using the (~~) operator. Note that when writing, for instance, x1 ~~ 0.2*x2 (where x1 and x2 are indicators of some construct eta1), the interpretation depends on whether eta1 is modeled as a composite or a common factor. In the former case x1 ~~ 0.2*x2 is a correlation between indicators; in the latter case it is interpreted as a measurement error correlation.
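What "unscaled" weights mean can be sketched in base R using the composite eta1 from the Examples section. The sketch assumes that weights are rescaled so that the composite has unit variance, in line with the unit construct variances described above; whether the package rescales in exactly this way is an assumption, and the object names are illustrative.

```r
# Unscaled weights and within-block indicator correlations of eta1
w <- c(y11 = 0.7, y12 = 0.9, y13 = 0.8)
S <- matrix(c(1.0, 0.2, 0.3,
              0.2, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(names(w), names(w)))

# Variance of the composite built with the unscaled weights: w' S w
var_unscaled <- drop(t(w) %*% S %*% w)

# Rescale the weights so that the composite has unit variance
# (assumed convention, see the lead-in).
w_scaled <- w / sqrt(var_unscaled)

var_unscaled                          # larger than one for these values
drop(t(w_scaled) %*% S %*% w_scaled)  # one, up to rounding
```

The sketch also shows why the indicator correlations have to be supplied for composites: without `S` the variance of the weighted sum, and hence the scaling factor, is not determined.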

In addition to supplying numeric values, variable values for parameters are allowed. To achieve this, the package makes use of lavaan's labeling capabilities. Users may replace a given parameter in, e.g., the structural model by a symbolic name and assign a vector of values to that name by passing a "name" = vector_of_values argument to generateData(). Data is then generated for every possible combination of these values, with the remaining parameters held fixed.
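The number of data sets implied by a set of variable parameters can be previewed with base R. The sketch below uses the labels gamma and epsilon from the Examples section and enumerates all combinations with expand.grid(); how the package enumerates and orders the combinations internally is an assumption.

```r
# Values assigned to the two labels used in the model syntax
gamma   <- c(-0.4, -0.2, 0, 0.2, 0.4)
epsilon <- c(0.1, 0.2, 0.3)

# Every combination of the supplied values, one row per data set
grid <- expand.grid(gamma = gamma, epsilon = epsilon)

nrow(grid)     # 5 * 3 = 15 combinations
head(grid, 3)
```

Because the number of combinations is the product of the vector lengths, adding further labels or longer vectors increases the number of generated data sets multiplicatively.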

If .return_type is "data.frame" or "matrix", normally distributed data with zero mean is generated. Its variance-covariance matrix equals the population indicator correlation matrix, i.e., the matrix that would be returned if .return_type = "cor".
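The relation between the indicator correlation matrix and the generated data can be sketched in base R. The example builds the correlation matrix of a single common factor with the loadings 0.8, 0.9, and 0.8 and draws data from it, once in the population sense and once so that the sample covariance reproduces the matrix exactly, which is the idea behind `.empirical = TRUE`. The Cholesky-based generator and all object names are assumptions made for this sketch; the package may rely on a different routine.

```r
set.seed(1)

# Indicator correlation matrix implied by one common factor with unit
# variance: cor(y_i, y_j) = lambda_i * lambda_j for i != j
lambda <- c(0.8, 0.9, 0.8)
Sigma  <- tcrossprod(lambda)
diag(Sigma) <- 1

N <- 200
p <- ncol(Sigma)
Z <- matrix(rnorm(N * p), nrow = N, ncol = p)

# Population sense: Sigma is the covariance of the distribution,
# so the sample covariance only approximates it.
X_pop <- Z %*% chol(Sigma)

# Empirical sense: center and whiten Z first, so that the sample mean is
# zero and the sample covariance equals Sigma up to numerical error.
Zc    <- scale(Z, center = TRUE, scale = FALSE)
Zw    <- Zc %*% solve(chol(cov(Zc)))
X_emp <- Zw %*% chol(Sigma)

max(abs(cov(X_pop) - Sigma))  # sampling error, clearly above zero
max(abs(cov(X_emp) - Sigma))  # numerically zero
```

The second variant is useful when estimates computed from the generated data should match the parameters of the data generating process exactly rather than only on average.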

The generated data, either as a data.frame (.return_type = "data.frame"), a numeric matrix (.return_type = "matrix"), or a correlation matrix (.return_type = "cor"). If variable parameters have been set, a nested tibble is returned.

# ==============================================================================
# Without variable parameters
# ==============================================================================
## DGP with constructs modeled as common factors
dgp <- "
# Structural model
eta2 ~ 0.4*eta1
eta3 ~ 0.4*eta1 + 0.35*eta2

# Measurement model
eta1 =~ 0.8*y11 + 0.9*y12 + 0.8*y13
eta2 =~ 0.7*y21 + 0.7*y22 + 0.9*y23
eta3 =~ 0.9*y31 + 0.8*y32 + 0.7*y33
"

dat <- generateData(dgp, .return_type = "cor")
dat

## DGP with a construct modeled as a composite
# If the model contains composites, within-block indicator correlation
# needs to be set as well.
dgp <- "
# Structural model
eta2 ~ 0.2*eta1
eta3 ~ 0.4*eta1 + 0.35*eta2

# Measurement model
eta1 <~ 0.7*y11 + 0.9*y12 + 0.8*y13
eta2 =~ 0.7*y21 + 0.7*y22 + 0.9*y23
eta3 =~ 0.9*y31 + 0.8*y32 + 0.7*y33

# Within block indicator correlation of eta1
y11 ~~ 0.2*y12
y11 ~~ 0.3*y13
y12 ~~ 0.5*y13
"

dat <- generateData(dgp, .return_type = "matrix")
dat[1:4, ]

# ==============================================================================
# With variable parameters
# ==============================================================================
### Linear DGP -----------------------------------------------------------------
# Add a label to each variable parameter and assign values to each label
dgp <- "
# Structural model
eta2 ~ 0.2*eta1
eta3 ~ gamma*eta1 + 0.35*eta2

# Measurement model
eta1 <~ 0.7*y11 + 0.9*y12 + 0.8*y13
eta2 =~ 0.7*y21 + 0.7*y22 + 0.9*y23
eta3 =~ 0.9*y31 + 0.8*y32 + 0.7*y33

# Within block indicator correlation
y11 ~~ 0.2*y12
y11 ~~ 0.3*y13
y12 ~~ epsilon*y13
"

dat <- generateData(dgp,
                    "gamma" = c(-0.4, -0.2, 0, 0.2, 0.4),
                    "epsilon" = c(0.1, 0.2, 0.3), .return_type = "data.frame")
dat

### DGP containing a second order construct ------------------------------------
# Second order constructs are supported as well.
dgp_2ndorder <- "
## Path model / Regressions
eta2 ~ 0.5*eta1
eta3 ~ 0.35*eta1 + 0.4*eta2

## Composite model
eta1 <~ 0.8*y41 + 0.6*y42 + 0.6*y43
eta2 <~ 2*y51 + 3*y52 + 5*y53
c1   <~ 0.8*y11 + 0.4*y12
c2   <~ 0.5*y21 + 0.3*y22 + 0.2*y23 + 0.4*y24

## Higher order composite
eta3 <~ 0.4*c1 + 0.4*c2

## Composite indicator correlations
# eta1
y41 ~~ 0.5*y42
y41 ~~ 0.5*y43
y42 ~~ 0.5*y43

# eta2
y51 ~~ 0.2*y52
y51 ~~ 0.3*y53
y52 ~~ 0.4*y53

# eta3 (the 2nd order construct)
c1 ~~ 0.49*c2

# c1-c2
y11 ~~ 0.3125*y12

y21 ~~ 0.4*y22
y21 ~~ 0.3*y23
y21 ~~ 0.31*y24
y22 ~~ 0.28*y23
y22 ~~ 0.31*y24
y23 ~~ 0.3*y24
"

dat <- generateData(dgp_2ndorder, .return_type = "data.frame", .empirical = TRUE)
dat[1:5, ]

## Estimate using cSEM
require(cSEM)

aa <- cSEM::csem(dat, dgp_2ndorder)
cSEM::summarize(aa) ## parameter estimates are identical to the DGP