causalSamp: Sample from a causal model

View source: R/sample_data.R

causalSamp    R Documentation

Sample from a causal model

Description

Obtain samples from a causal model using the rejection sampling approach of Evans and Didelez (2024).

Usage

causalSamp(
  n,
  formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)),
  pars,
  family,
  link = NULL,
  dat = NULL,
  method = "rejection",
  control = list(),
  seed
)

Arguments

n

number of samples required

formulas

list of lists of formulas

pars

list of lists of parameters

family

families for Z, X, Y and the copula

link

list of link functions

dat

data frame of covariates

method

sampling method; currently only "rejection" is valid

control

list of options for the algorithm

seed

random seed used for replication

Details

Samples from a given causal model using rejection sampling (or, if everything is discrete, direct sampling).
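For readers unfamiliar with rejection sampling, the following is a minimal generic sketch in base R. It is not the algorithm of Evans and Didelez (2024) and not the code used by causalSamp; the target density (Beta(2, 2)), the uniform proposal, and the names rejection_sample and batch_factor are all assumptions chosen for illustration.

```r
# Generic rejection sampler (illustrative only; NOT the causl implementation).
# Target: Beta(2, 2) density on [0, 1]; proposal: Uniform(0, 1).
set.seed(123)

rejection_sample <- function(n, batch_factor = 10) {
  M <- 1.5                 # upper bound on dbeta(x, 2, 2), attained at x = 0.5
  out <- numeric(0)
  while (length(out) < n) {
    # Draw more proposals than needed, since some will be rejected
    prop <- runif(batch_factor * n)
    # Accept each proposal with probability target(prop) / (M * proposal(prop))
    keep <- runif(length(prop)) < dbeta(prop, 2, 2) / M
    out <- c(out, prop[keep])
  }
  out[seq_len(n)]          # return exactly n accepted draws
}

draws <- rejection_sample(1000)
```

Drawing proposals in batches larger than n reduces the number of passes through the loop, at the cost of discarding some accepted draws.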

The entries for formulas and family should each be a list with four entries, corresponding to Z, X, Y and the copula. formulas determines the model, so it is crucial that every variable to be simulated is represented there exactly once. Each entry of formulas can be either a single formula or a list of formulae. The corresponding entry in family should either have the same length as that entry of formulas, or have length 1 (in which case it is repeated for all the variables therein).
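The structure described above can be sketched in base R as follows. The variable names (z1, z2, x, y) are hypothetical, and the two checks are an illustration of the stated rules rather than the validation that causalSamp itself performs; in particular, reading "represented exactly once" as "appears once as a left-hand side" is an assumption.

```r
# Four entries, in order: Z, X, Y, copula (variable names are illustrative)
formulas <- list(
  list(z1 ~ 1, z2 ~ 1),   # two covariates
  list(x ~ z1 + z2),      # treatment
  list(y ~ x),            # outcome
  list(~ 1)               # copula
)

# One family entry per entry of formulas
family <- list(
  c(1, 1),  # same length as the Z entry: one code per covariate
  1,        # normal
  1,        # normal
  1         # Gaussian copula
)

# Rule: each family entry has length 1 or the length of its formulas entry
lengths_ok <- mapply(function(f, fam) length(fam) %in% c(1, length(f)),
                     formulas, family)

# Assumed reading: each simulated variable is a left-hand side exactly once
var_formulas <- do.call(c, formulas[1:3])
lhs <- vapply(var_formulas, function(f) all.vars(f[[2]]), character(1))
lhs_unique <- anyDuplicated(lhs) == 0
```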

We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.

The family variables for the copula are also numeric and taken from VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.
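Because both sets of codes are plain integers, it can help to keep them in named lookup vectors. The sketch below only restates the codes listed above; the names family_codes and copula_codes are hypothetical helpers and are not part of the package.

```r
# Codes for the Z, X and Y distributions (0 is also listed for binary)
family_codes <- c(binary = 5, normal = 1, t = 2, gamma = 3,
                  beta = 4, lognormal = 6)

# Codes for the copula, as listed above
copula_codes <- c(gaussian = 1, t = 2, clayton = 3, gumbel = 4,
                  frank = 5, joe = 6, fgm = 11)

# Example: normal Z and X, gamma Y, Clayton copula
family <- list(
  family_codes[["normal"]],
  family_codes[["normal"]],
  family_codes[["gamma"]],
  copula_codes[["clayton"]]
)
```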

pars should be a named list whose entries are named either z, x, y and cop, or after the variables on the LHS of the formulae in formulas. Each entry should itself be a list containing beta (a vector of regression parameters) and, possibly, phi (a dispersion parameter). For any discrete treatment variable, you can also specify p, an initial proportion to simulate from (this defaults to 0.5).
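As a sketch, the parameter list for the default formulas can be built and sanity-checked as below. The check assumes that beta needs one coefficient per column of the design matrix implied by the right-hand side of each formula; this is consistent with the example on this page but is an assumption, not documented behaviour.

```r
formulas <- list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~ 1))

pars <- list(
  z   = list(beta = 0, phi = 1),            # intercept only
  x   = list(beta = c(0, 0.5), phi = 1),    # intercept and coefficient of z
  y   = list(beta = c(0, 0.5), phi = 0.5),  # intercept and coefficient of x
  cop = list(beta = 1)                      # copula parameter(s)
)

# Assumption: length(beta) equals the number of design-matrix columns
dummy <- data.frame(z = 0, x = 0, y = 0)    # one row, just to build matrices
n_coef <- vapply(do.call(c, formulas), function(f) {
  rhs <- delete.response(terms(f))          # keep only the right-hand side
  ncol(model.matrix(rhs, dummy))
}, integer(1))

beta_len <- vapply(pars, function(p) length(p$beta), integer(1))
```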

Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity link, and Gamma to the log link. For the Bernoulli distribution, the logit and probit links are available.
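To make the effect of each link concrete, the sketch below uses make.link() from the stats package to map the same linear predictor to a mean under each of the links named above. It illustrates what the links mean mathematically; how causalSamp applies them internally, and the exact format it expects for the link argument, are not shown here.

```r
eta <- c(-1, 0.5, 2)   # example values of the linear predictor

links <- c("identity", "inverse", "log", "logit", "probit")

# Apply each inverse link to eta; result has one column per link
means <- sapply(links, function(l) make.link(l)$linkinv(eta))
```

The logit and probit columns lie in (0, 1), as required for a Bernoulli mean, while the log column is strictly positive, as required for a Gamma mean.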

Control parameters are: oversamp (default value 10); trace (default value 0; increasing it to 1 makes the output more verbose); max_oversamp (default value 1000); warn (which currently does nothing); and max_wt, which is set to 1 and increases each time the function is called again. Control parameters also include cop, a keyword for the copula that defaults to "cop".
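A control list only needs the options you want to change. The sketch below shows one common way such user-supplied options are combined with defaults, using modifyList() from base R; that causalSamp merges them in this way is an assumption, and default_control is a hypothetical name holding the defaults listed above (warn is omitted because its default is not stated).

```r
# Defaults as listed above (hypothetical container, not a package object)
default_control <- list(oversamp = 10, trace = 0, max_oversamp = 1000,
                        max_wt = 1, cop = "cop")

# What a user might pass as the control argument
user_control <- list(trace = 1, oversamp = 20)

# Assumed merge: user-supplied values override defaults, the rest are kept
control <- modifyList(default_control, user_control)
```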

This function is kept largely for the replication of simulations from Evans and Didelez (2024).

Value

A data frame containing the simulated data.

References

Evans, R.J. and Didelez, V. Parameterizing and simulating from causal models (with discussion). Journal of the Royal Statistical Society, Series B, 2024.

Examples

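# Regression coefficients (beta) and dispersions (phi) for z, x and y,
# plus the copula parameter
pars <- list(z=list(beta=0, phi=1),
             x=list(beta=c(0,0.5), phi=1),
             y=list(beta=c(0,0.5), phi=0.5),
             cop=list(beta=1))
# Draw 100 samples using the default formulas z ~ 1, x ~ z, y ~ x
causalSamp(100, pars = pars)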



rje42/causl documentation built on June 1, 2025, 2:50 p.m.