View source: R/sim_n_datasets.r
| sim_n_datasets | R Documentation |
DAG object
This function takes a single DAG object and generates a list of multiple datasets, possibly using parallel processing.
sim_n_datasets(dag, n_sim, n_repeats, n_cores=1,
data_format="raw", data_format_args=list(),
seed=stats::runif(1), progressbar=TRUE, ...)
dag |
A |
n_sim |
A single number specifying how many observations per dataset should be generated. |
n_repeats |
A single number specifying how many datasets should be generated. |
n_cores |
A single number specifying the number of cores that should be used. If |
data_format |
An optional character string specifying the output format of the generated datasets. If |
data_format_args |
An optional list of named arguments passed to the function specified by |
seed |
A seed for the random number generator. By supplying a value to this argument, the results will be replicable, even if parallel processing is used to generate the datasets (using |
progressbar |
Either |
... |
Further arguments passed to the |
Generating a number of datasets from a single defined DAG object is usually the first step when conducting Monte Carlo simulation studies. This convenience function automates that step and uses parallel processing if specified.
Note that this function may not be ideal for more complex Monte Carlo simulations. Because it can only handle a single DAG, the user cannot vary aspects of the data-generation mechanism inside the main for loop. For example, to simulate n_repeats datasets with confounding and n_repeats datasets without confounding, the function has to be called twice. This is not optimal, because setting up the clusters for parallel processing takes some processing time. If many different DAGs should be used, it makes more sense to write a single function that generates the DAG itself for each of the desired settings. This step cannot be automated by the package.
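The workaround described above could be sketched roughly as follows. The helper name make_dag, the node names and all coefficient values are hypothetical and chosen only for illustration; only empty_dag, node and sim_from_dag come from the package. The sketch runs sequentially and does not use parallel processing.

```r
library(simDAG)

# hypothetical helper: builds one DAG per setting
make_dag <- function(confounding) {
  dag <- empty_dag() +
    node("age", type="rnorm", mean=10, sd=2)

  if (confounding) {
    # age influences the treatment, so it confounds treatment and death
    dag <- dag +
      node("treatment", type="binomial", parents="age", betas=0.2,
           intercept=-2)
  } else {
    # treatment is assigned independently of age
    dag <- dag +
      node("treatment", type="rbernoulli", p=0.5)
  }

  # the outcome depends on age and treatment in both settings
  dag + node("death", type="binomial", parents=c("age", "treatment"),
             betas=c(1, 2), intercept=-10)
}

settings <- c(confounded=TRUE, unconfounded=FALSE)
n_repeats <- 5

set.seed(42)

# one list of n_repeats datasets per setting
out <- lapply(settings, function(confounding) {
  dag <- make_dag(confounding)
  lapply(seq_len(n_repeats), function(i) sim_from_dag(dag, n_sim=100))
})
```

Here out$confounded and out$unconfounded each hold n_repeats datasets. Building the DAG inside the loop is what allows the data-generation mechanism to change between settings.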
Returns a list of length n_repeats containing datasets generated according to the supplied dag object.
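As a small illustration of the return value and of the seed argument, the following sketch generates the same set of datasets twice and compares them. The DAG, the seed value and the object names out1 and out2 are arbitrary choices for this example. The comparison assumes that a fixed seed yields replicable results, as stated in the argument description.

```r
library(simDAG)

# small example DAG with two root nodes
dag <- empty_dag() +
  node("age", type="rnorm", mean=10, sd=2) +
  node("sex", type="rbernoulli", p=0.5)

# two runs with the same (arbitrary) seed, without parallel processing
out1 <- sim_n_datasets(dag, n_sim=50, n_repeats=3, n_cores=1,
                       seed=1234, progressbar=FALSE)
out2 <- sim_n_datasets(dag, n_sim=50, n_repeats=3, n_cores=1,
                       seed=1234, progressbar=FALSE)

length(out1)       # one list element per repetition
nrow(out1[[1]])    # each dataset has n_sim rows

# should be TRUE if the seed makes the results replicable
isTRUE(all.equal(out1, out2))
```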
Robin Denz
empty_dag, node, node_td, sim_from_dag, sim_discrete_time, sim2data
library(simDAG)
# some example DAG
dag <- empty_dag() +
node("death", type="binomial", parents=c("age", "sex"), betas=c(1, 2),
intercept=-10) +
node("age", type="rnorm", mean=10, sd=2) +
node("sex", parents="", type="rbernoulli", p=0.5) +
node("smoking", parents=c("sex", "age"), type="binomial",
betas=c(0.6, 0.2), intercept=-2)
# generate 10 datasets without parallel processing
out <- sim_n_datasets(dag, n_repeats=10, n_cores=1, n_sim=100)
if (requireNamespace("doSNOW") && requireNamespace("doRNG") &&
requireNamespace("foreach")) {
# generate 10 datasets with parallel processing
out <- sim_n_datasets(dag, n_repeats=10, n_cores=2, n_sim=100)
}
# generate 10 datasets and transform the output
# (using the sim2data function internally)
dag <- dag + node_td("CV", type="time_to_event", prob_fun=0.01)
out <- sim_n_datasets(dag, n_repeats=10, n_cores=1, n_sim=100,
max_t=20, data_format="start_stop")