simcausal | R Documentation
The simcausal R package is a tool for the specification and simulation of complex longitudinal data structures that are based on structural equation models. It supports transparent and reproducible simulation studies, with a particular emphasis on the types of data and interventions frequently encountered in causal inference problems, such as observational data with time-dependent confounding, selection bias, and random monitoring processes. The package interface allows for concise expression of complex functional dependencies among a large number of nodes, where each node may represent a time-varying random variable. The package also allows for the specification and simulation of counterfactual data under various user-specified interventions (e.g., static, dynamic, deterministic, or stochastic). In particular, the interventions may represent exposures to treatment regimens, the occurrence or non-occurrence of right-censoring events, or the occurrence or non-occurrence of clinical monitoring events. simcausal enables the computation of a selected set of user-specified features of the distribution of the counterfactual data that represent common causal quantities of interest, such as treatment-specific means, average treatment effects, and coefficients from working marginal structural models. For additional details and examples, please see the package vignette and the function-specific documentation.
To see the package vignette use: vignette("simcausal_vignette", package="simcausal")
To see all available package documentation use: help(package = 'simcausal')
The following routines are typically invoked by the user, in the order presented below.
DAG.empty
Initiates an empty DAG object that contains no nodes.
node
Defines node(s) in the structural equation model and its conditional distribution(s) using a language of vector-like R expressions. A call to node can specify either a single node or multiple nodes at once.
add.nodes or +node
Provide two equivalent ways of growing the structural equation model by adding new nodes and their conditional distributions. Nodes are defined sequentially in the DAG object, with each node representing the outcomes of one or more structural equation(s), which together make up the causal model of interest.
set.DAG
Performs consistency checks and locks the DAG object so that no additional nodes can be subsequently added to the structural equation model.
sim or simobs
Simulates iid observations of the complete node sequence defined by the DAG object. The output dataset is stored as a data.frame and is referred to as the observed data.
add.action or +action
Provide two equivalent ways to define one or more actions. Each action modifies the conditional distribution for a subset of nodes in the original DAG object. The resulting data generating distribution is referred to as the post-intervention distribution. It is saved in the DAG object alongside the original structural equation model (DAG object).
sim or simfull
Simulates independent observations from one or more post-intervention distribution(s). Produces a named list of data.frames, collectively referred to as the full data. The number of output data.frames is equal to the number of post-intervention distributions specified in the actions argument, where each data.frame object is an iid sample from a particular post-intervention distribution.
set.targetE and set.targetMSM
Define two distinct types of target causal parameters. The function set.targetE defines causal parameters as the expected value(s) of DAG node(s) under one post-intervention distribution or the contrast of such expected value(s) from two post-intervention distributions. The function set.targetMSM defines causal parameters based on a user-specified working marginal structural model.
eval.target
Evaluates the previously defined causal parameter using simulated full data.
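The sequence of routines above can be sketched end to end on a small point-treatment model. This is only an illustrative sketch: the node names (W, A, Y), the coefficients, the action names (A1, A0), the sample sizes, and the seed are hypothetical choices made for this example, not values from the package documentation.

```r
library(simcausal)

# Hypothetical model: W = baseline covariate, A = binary treatment, Y = binary outcome.
# All names and coefficients below are assumptions made for illustration.
D <- DAG.empty()
D <- D +
  node("W", distr = "rbern", prob = 0.4) +
  node("A", distr = "rbern", prob = plogis(-0.5 + 0.8 * W)) +
  node("Y", distr = "rbern", prob = plogis(-1 + 1.2 * A + 0.5 * W))

# Run consistency checks and lock the DAG object.
Dset <- set.DAG(D)

# Observed data: iid draws from the pre-intervention distribution.
Odat <- sim(DAG = Dset, n = 500, rndseed = 1)

# Two static interventions on A: always treat (A1) and never treat (A0).
A1 <- node("A", distr = "rbern", prob = 1)
Dset <- Dset + action("A1", nodes = A1)
A0 <- node("A", distr = "rbern", prob = 0)
Dset <- Dset + action("A0", nodes = A0)

# Full data: a named list with one data.frame per action.
Xdat <- sim(DAG = Dset, actions = c("A1", "A0"), n = 500, rndseed = 1)

# Target parameter: contrast of E[Y] under A1 versus A0, evaluated on the full data.
Dset <- set.targetE(Dset, outcome = "Y", param = "A1-A0")
ate <- eval.target(Dset, data = Xdat)
ate$res
```

Because the value in ate$res is computed from a finite simulated sample, it approximates the true contrast only up to Monte Carlo error; a larger n should tighten it.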
The package produces the following most common types of output:
parameterized causal DAG model - object that specifies the structural equation model, along with interventions and the causal target parameter of interest.
observed data - data simulated from the (pre-intervention) distribution specified by the structural equation model.
full data - data simulated from one or more post-intervention distributions defined by actions on the structural equation model.
causal target parameter - the true value of the causal target parameter evaluated with full data.
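To make the four output types above concrete, the sketch below builds each one and inspects its R type. It uses the function-call forms (add.nodes, add.action, simobs, simfull) rather than the + operators. The model, the action name "treat", and the sample sizes are hypothetical, and the argument usage shown (in particular passing A(Dset) to simfull) is an assumption that should be checked against the function-specific help pages.

```r
library(simcausal)

# Hypothetical two-node model: A = binary treatment, Y = continuous outcome.
D <- DAG.empty()
D <- add.nodes(D, node("A", distr = "rbern", prob = 0.5))
D <- add.nodes(D, node("Y", distr = "rnorm", mean = 1 + 2 * A, sd = 1))
Dset <- set.DAG(D)

# One static action, named "treat" here purely for illustration.
Dset <- add.action(Dset, "treat", nodes = node("A", distr = "rbern", prob = 1))
Dset <- set.targetE(Dset, outcome = "Y", param = "treat")

# 1. Parameterized causal DAG model: one object holding model, actions, and target.
class(Dset)

# 2. Observed data: a single data.frame.
Odat <- simobs(Dset, n = 200, rndseed = 42)
is.data.frame(Odat)

# 3. Full data: a named list of data.frames, one per action.
Xdat <- simfull(A(Dset), n = 200, rndseed = 42)
names(Xdat)

# 4. Causal target parameter: the value estimated from the full data.
psi <- eval.target(Dset, data = Xdat)$res
psi
```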
Check for updates and report bugs at https://github.com/osofr/simcausal.
Maintainer: Fred Gruber fgruber@gmail.com [contributor]
Authors:
Oleg Sofrygin oleg.sofrygin@gmail.com
Mark J. van der Laan laan@berkeley.edu
Romain Neugebauer Romain.S.Neugebauer@kp.org
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.