This function generates data, analyzes the generated data, and summarizes the results in a result object in which parameter estimates, standard errors, fit indices, and other characteristics of each replication are saved.
sim(nRep, model, n, generate = NULL, ..., rawData = NULL, miss = NULL, datafun = NULL,
lavaanfun = "lavaan", outfun=NULL, outfundata = NULL, pmMCAR = NULL,
pmMAR = NULL, facDist = NULL, indDist = NULL, errorDist = NULL,
sequential = FALSE, saveLatentVar = FALSE, modelBoot = FALSE, realData = NULL,
covData = NULL, maxDraw = 50, misfitType = "f0", misfitBounds = NULL,
averageNumMisspec = FALSE, optMisfit=NULL, optDraws = 50,
createOrder = c(1, 2, 3), aux = NULL, group = NULL, mxFit = FALSE,
mxMixture = FALSE, citype = NULL, cilevel = 0.95, seed = 123321, silent = FALSE,
multicore = options('simsem.multicore')[[1]], cluster = FALSE,
numProc = NULL, paramOnly = FALSE, dataOnly=FALSE, smartStart=FALSE,
previousSim = NULL, completeRep = FALSE, stopOnError = FALSE)

nRep 
Number of replications. If any of the 
model 
There are three options for this argument: 1. 
n 
Sample size. Either a single value, or a list of values to vary sample size across replications. The 
generate 
There are four options for this argument: 1. 
rawData 
There are two options for this argument: 1. a list of data frames to be used in simulations or 2. a population data. If a list of data frames is specified, the 
miss 
A missing data template created using the 
datafun 
A function to be applied to each generated data set across replications. 
lavaanfun 
The character of the function name used in running lavaan model ( 
outfun 
A function to be applied to the 
outfundata 
A function to be applied to the 
pmMCAR 
The percentage of data completely missing at random (0 <= pmMCAR < 1). Either a single value or a vector of values in order to vary pmMCAR across replications (with length equal to nRep or a divisor of nRep). The 
pmMAR 
The percentage of data missing at random (0 <= pmMAR < 1). Either a single value or a vector of values in order to vary pmMAR across replications (with length equal to nRep or a divisor of nRep). The 
facDist 
Factor distributions. Either a list of 
indDist 
Indicator distributions. Either a list of 
errorDist 
An object or list of objects of type 
sequential 
If 
saveLatentVar 
If 
modelBoot 
When specified, a model-based bootstrap is used for data generation (for use with the 
realData 
A data.frame containing real data. Generated data will follow the distribution of this data set. 
covData 
A data.frame containing covariate data, which can have any distributions. This argument is required when users specify 
maxDraw 
The maximum number of attempts to draw a valid set of parameters (i.e., no negative error variances and no standardized coefficients over 1). 
misfitType 
Character vector indicating the fit measure used to assess the misfit of a set of parameters. Can be "f0", "rmsea", "srmr", or "all". 
misfitBounds 
Vector that contains upper and lower bounds of the misfit measure. Sets of parameters drawn that are not within these bounds are rejected. 
averageNumMisspec 
If 
optMisfit 
Character vector of either "min" or "max", indicating whether the minimum or the maximum misfit is sought. If not NULL, the set of parameters with the minimum or maximum misfit (of the given misfit type) among the "optDraws" draws will be returned. 
optDraws 
Number of parameter sets to draw if optMisfit is not null. The set of parameters with the maximum or minimum misfit will be returned. 
createOrder 
The order of 1) applying equality/inequality constraints, 2) applying misspecification, and 3) filling unspecified parameters (e.g., residual variances when total variances are specified). This argument is specified as a vector giving an ordering of 1 (constraint), 2 (misspecification), and 3 (filling parameters). For example, 
aux 
The names of auxiliary variables saved in a vector. 
group 
The name of the group variable. This argument is used when 
mxFit 
A logical value indicating whether to compute an extensive list of fit measures (which will be slower). This argument is applicable when 
mxMixture 
A logical value indicating whether the analysis model is a mixture model. This argument is applicable when 
citype 
Type of confidence interval. For the current version, this argument will be forwarded to the 
cilevel 
Confidence level. For the current version, this argument will be forwarded to the 
seed 
Random number seed. Note that the seed number is always fixed in the 
silent 
If 
multicore 
Users may put 
cluster 
Not currently applicable. Intended for specifying nodes on a high-performance computing (HPC) cluster so that the simulation can be run in parallel. 
numProc 
Number of processors to use for parallel processing. If it is 
paramOnly 
If 
dataOnly 
If 
smartStart 
Defaults to FALSE. If TRUE, population parameter values that are real numbers will be used as starting values. When tested on small models, the time spent setting population values as starting values was greater than the time saved during analysis, and convergence rates were not affected. 
previousSim 
A result object to which users wish to add the results of the current simulation 
completeRep 
If 
stopOnError 
If 
... 
Additional arguments to be passed to 
This function is executed as follows:
1. Parameters are drawn from the specified data-generation model (applicable to the simsem model template, SimSem, only).
2. The drawn (or specified) parameters are used to create data.
3. The data can be transformed using the datafun argument.
4. The specified missingness (if any) is imposed.
5. The data are analyzed using the specified analysis model.
6. Parameter estimates, standard errors, fit indices, and other characteristics of the replication are extracted.
7. Additional outputs (if any) are extracted using the outfun argument.
8. Results across replications are summarized in a result object (SimResult).
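The sequence of steps above can be pictured with a small hand-rolled Monte Carlo loop in base R. This is only a sketch of the general workflow, not how sim() is implemented: the regression model, the values of n_rep, n, and slope, the 10 percent missingness rate, and the helper one_replication are all assumptions made for illustration. Step 1 (drawing parameters) is skipped because the population value is fixed here.

```r
# Illustrative sketch only: a base-R loop that mirrors the steps listed above.
# It does not reproduce the internals of sim().
set.seed(123321)

n_rep <- 5      # number of replications (hypothetical value)
n     <- 200    # sample size per replication (hypothetical value)
slope <- 0.5    # fixed population parameter (assumed)

one_replication <- function(n, slope) {
  # Step 2: create data from the population parameter
  y1 <- rnorm(n)
  y2 <- slope * y1 + rnorm(n, sd = sqrt(1 - slope^2))
  dat <- data.frame(y1 = y1, y2 = y2)

  # Step 3: transform the data (here, to standard scores)
  dat <- as.data.frame(scale(dat))

  # Step 4: impose missingness completely at random on y2 (about 10 percent)
  dat$y2[runif(n) < 0.10] <- NA

  # Step 5: analyze the data (lm() drops incomplete rows by default)
  fit <- lm(y2 ~ y1, data = dat)

  # Step 6: extract parameter estimates and standard errors
  list(coef = coef(fit), se = sqrt(diag(vcov(fit))))
}

# Step 8: collect and summarize the results across replications
results   <- replicate(n_rep, one_replication(n, slope), simplify = FALSE)
estimates <- do.call(rbind, lapply(results, `[[`, "coef"))
colMeans(estimates)
```

In the package itself, these steps are driven by the arguments of sim() (for example datafun, miss, and outfun) rather than written out by hand.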
There are six ways to provide or generate data in this function:
1. SimSem can be used as a template to generate data; it can be created by the model function. The SimSem object can be specified in the generate argument.
2. A lavaan script, a parameter table for the lavaan package, or a list of arguments for the simulateData function. The lavaan script can be specified in the generate argument.
3. An MxModel object from the OpenMx package. The MxModel object can be specified in the generate argument.
4. A list of raw data sets, one for each replication, can be provided in the rawData argument. The sim function will analyze each data set and summarize the results. Note that generate, n, and nRep must not be specified if a list of raw data is provided.
5. Population data can be provided in the rawData argument. The sim function will randomly draw sample data sets from the population data and analyze them. Note that n and nRep must be specified if population data are provided. The generate argument must not be specified.
6. A function can be used to generate data. The function must take the sample size as a numeric value (or a numeric vector for multiple groups) and return a data frame containing the generated data set. Note that parameter values and their standardized values can be provided as attributes of the resulting data set. That is, users can assign parameter values and standardized parameter values to attr(data, "param") and attr(data, "stdparam").
Note that all generated or provided data can be transformed based on the Bollen-Stine approach by providing real data in the realData argument if any of the first three methods is used.
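As an illustration of the function-based way of generating data, the sketch below defines a generating function in base R. The name generateFUN, the regression structure, the population slope of 0.5, and the names used inside the "param" and "stdparam" attributes are all assumptions made for this example; the text above only requires that the function take the sample size and return a data frame.

```r
# Hypothetical data-generating function: takes the sample size and
# returns a data frame, with population values attached as attributes.
generateFUN <- function(n) {
  slope <- 0.5                                   # assumed population value
  y1 <- rnorm(n, mean = 0, sd = 1)
  y2 <- slope * y1 + rnorm(n, mean = 0, sd = sqrt(1 - slope^2))
  dat <- data.frame(y1 = y1, y2 = y2)

  # Optional: attach parameter values so they travel with the data set.
  # Both variables have unit population variance, so the standardized
  # slope equals the raw slope in this setup.
  attr(dat, "param")    <- c(slope = slope)
  attr(dat, "stdparam") <- c(slope = slope)
  dat
}

set.seed(123321)
dat <- generateFUN(200)
dim(dat)
attr(dat, "param")
```

A function of this shape would presumably be supplied through the generate argument, with n and nRep given as usual.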
There are four ways to analyze the data sets for each replication, depending on how the model argument is set:
1. SimSem can be used as a template for data analysis.
2. A lavaan script, a parameter table for the lavaan package, or a list of arguments for the lavaan, sem, cfa, or growth function. Note that the function used to analyze the data can be specified in the lavaanfun argument; the default is the lavaan function.
3. An MxModel object from the OpenMx package. The object does not need to have data inside. Note that if users need an extensive list of fit indices, the mxFit argument should be specified as TRUE. If users wish to analyze the data with a mixture model, the mxMixture argument should be TRUE so that the sim function knows how to handle the data.
4. A function that takes a data set and returns a list. The list must contain at least three objects: a vector of parameter estimates (coef), a vector of standard errors (se), and the convergence status as TRUE or FALSE (converged). There are seven optional objects in the list: a vector of fit indices (fit), a vector of standardized estimates (std), a vector of standard errors of standardized estimates (stdse), fraction missing type I (FMI1), fraction missing type II (FMI2), lower bounds of confidence intervals (cilower), and upper bounds of confidence intervals (ciupper). Note that coef, se, std, stdse, FMI1, FMI2, cilower, and ciupper must be named vectors, and the names must be the same across these objects. Users may optionally include other objects in the list; however, the results of the other objects will not be automatically combined. Users need to specify the outfun argument to get the extra objects. For example, researchers may include residuals in the list. The outfun argument should then be a function such as function(obj) obj$residuals.
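The structure of the list returned by an analysis function can be sketched as follows. The function analyzeFUN2, the use of lm() and confint(), and the toy data are assumptions chosen for illustration; only the element names (coef, se, converged, fit, cilower, ciupper) and the residuals example come from the description above.

```r
# Hypothetical analysis function: a simple regression fitted with lm().
analyzeFUN2 <- function(data) {
  out <- lm(y2 ~ y1, data = data)
  ci  <- confint(out, level = 0.95)
  list(
    coef      = coef(out),                            # required: named estimates
    se        = sqrt(diag(vcov(out))),                # required: named standard errors
    converged = TRUE,                                 # required: lm() has no convergence flag, so TRUE is assumed
    fit       = c(loglik = as.numeric(logLik(out))),  # optional: fit indices
    cilower   = ci[, 1],                              # optional: named lower bounds
    ciupper   = ci[, 2],                              # optional: named upper bounds
    residuals = residuals(out)                        # extra object, not combined automatically
  )
}

# A function suitable for outfun: pulls the extra object out of the list
extractResiduals <- function(obj) obj$residuals

# Check the function on toy data before using it in a simulation
set.seed(123321)
toy <- data.frame(y1 = rnorm(50))
toy$y2 <- 0.5 * toy$y1 + rnorm(50)
res <- analyzeFUN2(toy)

# The named vectors must share the same names
stopifnot(identical(names(res$coef), names(res$se)),
          identical(names(res$coef), names(res$cilower)))
length(extractResiduals(res))
```

Checking the names on a single data set first is a cheap way to catch mismatches before running many replications.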
Any combination of data-generation methods and data-analysis methods is valid. For example, data can be simulated using a lavaan script and analyzed using an MxModel object. Parallel processing can be enabled using the multicore argument.
A result object (SimResult)
Patrick Miller (University of Notre Dame), Sunthud Pornprasertmanit
SimResult for a description of the resulting output
# Please go to www.simsem.org for more examples.

# Example of using simsem model template
library(simsem)
library(lavaan)

loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)

latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)

RTE <- binds(diag(6))

VY <- bind(rep(NA, 6), 2)

CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model, n = 200)
summary(Output)

# Example of using lavaan model syntax
popModel <- "
f1 =~ 0.7*y1 + 0.7*y2 + 0.7*y3
f2 =~ 0.7*y4 + 0.7*y5 + 0.7*y6
f1 ~~ 1*f1
f2 ~~ 1*f2
f1 ~~ 0.5*f2
y1 ~~ 0.49*y1
y2 ~~ 0.49*y2
y3 ~~ 0.49*y3
y4 ~~ 0.49*y4
y5 ~~ 0.49*y5
y6 ~~ 0.49*y6
"

analysisModel <- "
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
"

Output <- sim(5, model = analysisModel, n = 200, generate = popModel, std.lv = TRUE, lavaanfun = "cfa")
summary(Output)

# Example of using population data
pop <- data.frame(y1 = rnorm(100000, 0, 1), y2 = rnorm(100000, 0, 1))

covModel <- "
y1 ~~ y2
"

Output <- sim(5, model = covModel, n = 200, rawData = pop, lavaanfun = "cfa")
summary(Output)

# Example of data transformation: Transforming to standard score
fun1 <- function(data) {
  temp <- scale(data)
  as.data.frame(temp)
}

# Example of additional output: Extract modification indices from lavaan
fun2 <- function(out) {
  inspect(out, "mi")
}

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model, n = 200, datafun = fun1, outfun = fun2)
summary(Output)

# Get modification indices
getExtraOutput(Output)

# Example of additional output: Comparing latent variable correlation
outfundata <- function(out, data) {
  predictcor <- inspect(out, "coef")$psi[2, 1]
  latentvar <- attr(data, "latentVar")[, c("f1", "f2")]
  latentcor <- cor(latentvar)[2, 1]
  # Difference between the sample latent correlation and the estimate
  latentcor - predictcor
}
Output <- sim(5, CFA.Model, n = 200, sequential = TRUE, saveLatentVar = TRUE,
              outfundata = outfundata)
getExtraOutput(Output)

# Example of analyzing data using a function
analyzeFUN <- function(data) {
  out <- lm(y2 ~ y1, data = data)
  coef <- coef(out)
  se <- sqrt(diag(vcov(out)))
  fit <- c(loglik = as.numeric(logLik(out)))
  converged <- TRUE # Assume to be convergent all the time
  return(list(coef = coef, se = se, fit = fit, converged = converged))
}

Output <- sim(5, model = analyzeFUN, n = 200, rawData = pop, lavaanfun = "cfa")
summary(Output)
