Simulate Data from Structural Equation Model

Description

Interprets the input graph as a structural equation model, generates random path coefficients, and simulates data from the model. This is a very bare-bones function and probably not very useful except for quick validation purposes (e.g. checking that an implied vanishing tetrad truly vanishes in simulated data). For more elaborate simulation studies, please use the lavaan package or similar facilities in other packages.

Usage

simulateSEM(x, b.default = NULL, b.lower = -0.6, b.upper = 0.6, eps = 1,
  N = 500, standardized = TRUE)

Arguments

x

the input graph, a DAG (which may contain bidirected edges).

b.default

default path coefficient applied to arrows for which no coefficient is defined in the model syntax.

b.lower

lower bound for random path coefficients, applied if b.default=NULL.

b.upper

upper bound for random path coefficients, applied if b.default=NULL.

eps

residual variance (only meaningful if standardized=FALSE).

N

number of samples to generate.

standardized

whether a standardized output is desired (all variables have variance 1).

If standardized=TRUE, all path coefficients are interpreted as standardized coefficients. However, not every set of standardized coefficients is compatible with every graph structure. For instance, the graph structure z <- x -> y -> z is incompatible with standardized coefficients of 0.9 on all three arrows, since the variance of z explained by x and y alone would then exceed 1. For large graphs with many parallel paths, it can be very difficult to find coefficients that work.
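The arithmetic behind the 0.9 example can be checked with a few lines of base R. This sketch assumes that all three arrows carry the same standardized coefficient b; the helper implied_var_z is hypothetical and not part of dagitty.

```r
# Explained variance of z in the graph z <- x -> y -> z when every arrow
# carries the same standardized coefficient b (x and y have variance 1).
implied_var_z <- function(b) {
  cov_xy <- b  # x -> y with standardized coefficient b, so Cov(x, y) = b
  # z = b*x + b*y + e, so Var(b*x + b*y) = b^2 + b^2 + 2 * b * b * Cov(x, y)
  b^2 + b^2 + 2 * b * b * cov_xy
}

v <- implied_var_z(0.9)
print(v)       # explained variance of z, already larger than 1
print(1 - v)   # residual variance needed for Var(z) = 1; negative, so impossible

# Largest common coefficient for which the residual variance stays non-negative
b_max <- uniroot(function(b) implied_var_z(b) - 1, c(0, 1))$root
print(b_max)
```

With b = 0.9 the explained variance of z is about 3.08, so no non-negative residual variance can bring Var(z) back to 1. Under the equal-coefficient assumption, feasibility ends at roughly b = 0.565.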

Details

Data are generated in the following manner. Each directed arrow is assigned a path coefficient, which can be given using the attribute "beta" in the model syntax (see the examples). All coefficients not set in this manner are set to the b.default argument or, if that is not given, are chosen uniformly at random from the interval given by b.lower and b.upper (inclusive; set both parameters to the same value for constant path coefficients). Each bidirected arrow a <-> b is replaced by a substructure a <- L -> b, where L is an exogenous latent variable. Path coefficients on such substructures are set to sqrt(x), where x is again chosen at random from the given interval; if x is negative, one path coefficient is set to -sqrt(-x) and the other to sqrt(-x), so that their product equals x. All residual variances are set to eps.
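The generating procedure described above can be sketched in base R for a small hypothetical graph x -> y -> z with an added bidirected edge x <-> z. This illustrates the documented steps for standardized=FALSE only and is not dagitty's actual implementation: the use of runif, the unit variance of the latent L, and all variable names are assumptions, and the rescaling performed for standardized=TRUE is omitted.

```r
set.seed(1)
b.lower <- -0.6; b.upper <- 0.6; eps <- 1; N <- 500

# Random coefficient from [b.lower, b.upper] (assumed: runif)
draw_b <- function() runif(1, b.lower, b.upper)

# Directed arrows of the hypothetical graph x -> y -> z
b_xy <- draw_b()
b_yz <- draw_b()

# Bidirected edge x <-> z becomes x <- L -> z; the loadings multiply to c_xz
c_xz <- draw_b()
l_x <- sqrt(abs(c_xz))
l_z <- if (c_xz < 0) -sqrt(-c_xz) else sqrt(c_xz)

# Generate variables in topological order; residuals have variance eps
L <- rnorm(N)                                  # latent L, unit variance (assumed)
x <- l_x * L + rnorm(N, sd = sqrt(eps))
y <- b_xy * x + rnorm(N, sd = sqrt(eps))
z <- b_yz * y + l_z * L + rnorm(N, sd = sqrt(eps))

d <- data.frame(x, y, z)                       # the latent L is not returned
print(cov(d))
```

Because L has unit variance here, the covariance it contributes between x and z is l_x * l_z, which equals the drawn value c_xz whatever its sign.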

Value

Returns a data frame containing N values for each variable in x.

Examples

## Simulate data with pre-defined path coefficients of -.6
g <- dagitty('dag{z -> x [beta=-.6] x <- y [beta=-.6] }')
x <- simulateSEM( g ) 
cov(x)
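A quick validation of the kind mentioned in the Description (checking that an implied vanishing tetrad vanishes in simulated data) can also be sketched without dagitty. The following base R code simulates a hypothetical one-factor model with four indicators x1 to x4, loadings of 0.7, a unit-variance latent factor, and unit residual variances; all of these settings are assumptions chosen for illustration.

```r
set.seed(42)
N <- 10000
lambda <- 0.7

# Latent factor (unit variance) with four indicators x1..x4
f_lat <- rnorm(N)
d <- as.data.frame(sapply(1:4, function(i) lambda * f_lat + rnorm(N)))
names(d) <- paste0("x", 1:4)

S <- cov(d)
# Under one factor, sigma12 * sigma34 - sigma13 * sigma24 = 0 in the population
tetrad <- S[1, 2] * S[3, 4] - S[1, 3] * S[2, 4]
print(tetrad)  # close to 0, up to sampling error
```

The population value of the tetrad is exactly zero because every covariance factors as lambda^2, so any deviation in the sample reflects sampling error only.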