Simulate Data from Structural Equation Model


Interprets the input graph as a structural equation model, generates random path coefficients, and simulates data from the model. This is a very bare-bones function and probably not very useful except for quick validation purposes (e.g. checking that an implied vanishing tetrad truly vanishes in simulated data). For more elaborate simulation studies, please use the lavaan package or similar facilities in other packages.


simulateSEM(x, b.default = NULL, b.lower = -0.6, b.upper = 0.6, eps = 1,
  N = 500, standardized = TRUE)



x: the input graph, a DAG (which may contain bidirected edges).


b.default: default path coefficient applied to arrows for which no coefficient is defined in the model syntax.


b.lower: lower bound for random path coefficients, applied if b.default=NULL.


b.upper: upper bound for random path coefficients, applied if b.default=NULL.


eps: residual variance (only meaningful if standardized=FALSE).


N: number of samples to generate.


standardized: whether a standardized output is desired (all variables have variance 1).

If standardized=TRUE, all path coefficients are interpreted as standardized coefficients. However, not every set of standardized coefficients is compatible with every graph structure. For instance, the graph structure z <- x -> y -> z is incompatible with standardized coefficients of 0.9 on all three arrows, since this would force the variance of z above 1. For large graphs with many parallel paths, it can be very difficult to find coefficients that work.
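To make the incompatibility concrete, the following base-R sketch computes the variance that the parents of z alone would contribute in the graph z <- x -> y -> z. It assumes that all three coefficients equal 0.9 and that x and y each have variance 1. The variable names are illustrative and not part of the package.

```r
# Illustrative check (not part of dagitty): variance forced on z in
# z <- x -> y -> z when every standardized coefficient is 0.9.
b_xy <- 0.9  # x -> y
b_xz <- 0.9  # x -> z
b_yz <- 0.9  # y -> z

# With var(x) = var(y) = 1, cov(x, y) equals the coefficient on x -> y.
cov_xy <- b_xy

# Variance of z explained by its parents, before any residual is added.
explained_var_z <- b_xz^2 + b_yz^2 + 2 * b_xz * b_yz * cov_xy

# Residual variance that would be needed to reach var(z) = 1.
required_residual <- 1 - explained_var_z

print(explained_var_z)    # approximately 3.078
print(required_residual)  # approximately -2.078
```

Because the required residual variance is negative, no valid standardized model exists for these coefficients.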


Data are generated in the following manner. Each directed arrow is assigned a path coefficient that can be given using the attribute "beta" in the model syntax (see the examples). All coefficients not set in this manner are set to the b.default argument, or if that is not given, are chosen uniformly at random from the interval given by b.lower and b.upper (inclusive; set both parameters to the same value for constant path coefficients). Each bidirected arrow a <-> b is replaced by a substructure a <- L -> b, where L is an exogenous latent variable. Path coefficients on such substructures are set to sqrt(|x|), where x is again chosen at random from the given interval; if x is negative, one path coefficient is set to -sqrt(|x|) and the other to sqrt(|x|), so that the product of the two coefficients equals x in either case. All residual variances are set to eps.
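The procedure described above can be sketched in base R for a small example. This is not the dagitty implementation: the graph (x -> y together with y <-> z), the helper names, and the choice to give the latent L variance eps are assumptions made for illustration. The sketch covers only the unstandardized case and takes the square root of the absolute value of the drawn number, which is an assumed reading of the rule for negative draws.

```r
# Minimal base-R sketch of the data-generating steps (unstandardized case).
# Graph, names, and the variance of L are illustrative assumptions.
set.seed(1)
N <- 500
eps <- 1
b_lower <- -0.6
b_upper <- 0.6

# Directed arrow x -> y: coefficient drawn uniformly from [b_lower, b_upper].
b_xy <- runif(1, b_lower, b_upper)

# Bidirected arrow y <-> z becomes y <- L -> z with a latent L.
# Draw v from the same interval; both paths get magnitude sqrt(|v|),
# and one path carries the sign of v so that their product equals v.
v <- runif(1, b_lower, b_upper)
b_Ly <- sqrt(abs(v))
b_Lz <- sign(v) * sqrt(abs(v))

# Generate variables in topological order; every residual has variance eps.
resid <- function() rnorm(N, sd = sqrt(eps))
x <- resid()
L <- resid()
y <- b_xy * x + b_Ly * L + resid()
z <- b_Lz * L + resid()

# In this sketch the latent L is left out of the result.
sim <- data.frame(x = x, y = y, z = z)
str(sim)
```

Splitting the drawn value into two square-root coefficients means the latent contributes a covariance of v times the variance of L between y and z, whatever the sign of v.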


Returns a data frame containing N values for each variable in x.


library(dagitty)

## Simulate data with pre-defined path coefficients of -.6
g <- dagitty('dag{z -> x [beta=-.6] x <- y [beta=-.6] }')
x <- simulateSEM( g )
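The quick-validation use mentioned in the description (checking that an implied vanishing tetrad vanishes in simulated data) might look like the following sketch. The graph and variable names are illustrative, and the common cause L is treated as an ordinary observed variable here to keep the example simple. Only dagitty() and simulateSEM() with the arguments shown in the usage are taken from the package; the tetrad is computed by hand with base R.

```r
library(dagitty)

# One common cause L with four effects; every arrow gets coefficient 0.5
# through b.default. Graph and variable names are illustrative.
g <- dagitty('dag{ L -> a L -> b L -> c L -> d }')
set.seed(42)
d <- simulateSEM(g, b.default = 0.5, N = 10000)

# Tetrad difference that the single-common-cause structure implies to vanish.
S <- cov(d[, c("a", "b", "c", "d")])
tetrad <- S["a", "b"] * S["c", "d"] - S["a", "c"] * S["b", "d"]
print(tetrad)  # should be close to 0, up to sampling noise
```

A larger N shrinks the sampling noise, so the tetrad difference should move closer to zero as N grows.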
