rfrugal: Sample from a causal model
In rje42/causl: Methods for Specifying, Simulating from and Fitting Causal Models

rfrugal

R Documentation

Sample from a causal model

Description

Obtain samples from a causal model parameterized as in Evans and Didelez (2024).

Usage

rfrugal(n, causl_model, control = list())

rfrugalParam(
  n,
  formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)),
  family = c(1, 1, 1, 1),
  pars,
  link = NULL,
  dat = NULL,
  method = "inversion",
  control = list(),
  ...
)

Arguments

`n`	number of samples required
`causl_model`	object of class `causl_model`
`control`	list of options for the algorithm
`formulas`	list of lists of formulas
`family`	families for variables and copula
`pars`	list of lists of parameters
`link`	list of link functions
`dat`	optional data frame of covariates
`method`	either `"inversion"` (the default), `"inversion_mv"`, or `"rejection"`
`...`	other arguments, such as custom families
`estimand`	quantity to control, default is `"ATE"`

Details

Samples from a given causal model under the frugal parameterization.

The formulas and family arguments should each be a list with four entries, corresponding to Z, X, Y and the copula. formulas determines the model, so it is crucial that every variable to be simulated is represented there exactly once. Each entry of formulas can be either a single formula or a list of formulae. The corresponding entry in family should either have the same length as that entry of formulas or have length 1 (in which case it is repeated for all the variables therein).
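As a rough illustration of the length rule above, the following base R sketch builds a four-block list of formulas with two Z variables and checks a matching family list against it. The variable names `z1` and `z2` and the helper `check_lengths()` are hypothetical and are not part of causl; the family codes 1 (gaussian), 3 (Gamma) and 5 (binomial) are the ones listed on this page.

```r
# Four blocks, in order: Z variables, X (treatment), Y (outcome), copula.
formulas <- list(
  list(z1 ~ 1, z2 ~ z1),  # two covariates
  list(x ~ z1 + z2),      # treatment
  list(y ~ x),            # outcome
  list(~ 1)               # copula
)

# One family entry per block; the first gives a code for z1 and one for z2.
family <- list(c(1, 3), 5, 1, 1)

# Hypothetical helper (not part of causl): each family entry must have
# length 1 or the same length as the corresponding block of formulas.
check_lengths <- function(formulas, family) {
  n_form <- lengths(formulas)
  n_fam <- lengths(family)
  all(n_fam == 1 | n_fam == n_form)
}

ok <- check_lengths(formulas, family)
```

A check of this kind can catch a mismatched specification before any simulation is attempted.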

We use the following codes for different families of distributions:

val	family
0	binomial
1	gaussian
2	t
3	Gamma
4	beta
5	binomial
6	lognormal
10	categorical
11	ordinal

The family codes for the copula are also numeric and are taken from VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.

pars should be a named list containing variable names that correspond to the LHS of formulae in formulas. Each of these should themselves be a list containing beta (a vector of regression parameters) and (possibly) phi, a dispersion parameter. For any discrete variable that is a treatment, you can also specify p, an initial proportion to simulate from (otherwise this defaults to 0.5).
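The naming requirement can be checked mechanically. The sketch below, in base R only, extracts the left-hand side of every two-sided formula and compares it with the names of a parameter list. The helper `lhs_names()` is hypothetical and not part of causl; the formulas and parameter values are the ones used elsewhere on this page.

```r
formulas <- list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~ 1))

pars <- list(
  z   = list(beta = 0, phi = 1),
  x   = list(beta = c(0, 0.5), phi = 1),
  y   = list(beta = c(0, 0.5), phi = 0.5),
  cop = list(beta = 1)
)

# Hypothetical helper (not part of causl): collect the left-hand-side
# variable name of every two-sided formula. The one-sided copula formula
# has no left-hand side and is skipped.
lhs_names <- function(formulas) {
  fs <- do.call(c, formulas)                           # flatten the blocks
  two_sided <- Filter(function(f) length(f) == 3, fs)  # y ~ x has length 3
  vapply(two_sided, function(f) all.vars(f[[2]]), character(1))
}

vars <- lhs_names(formulas)
missing_pars <- setdiff(vars, names(pars))  # empty if every variable is covered
```

Here the copula parameters sit under the name `cop`, matching the default copula keyword described in the control options.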

Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity link, and Gamma to the log link. For Bernoulli variables, the logit, probit and log links are available.

A variety of sampling methods are implemented. The inversion method with pair-copulas is the default (method="inversion"), but we can also use a multivariate copula (method="inversion_mv") or even rejection sampling (method="rejection"). Note that the inversion_mv method simulates the entire copula, so it cannot depend upon intermediate variables.
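To give a feel for what sampling by inversion from a multivariate copula involves, here is a generic base R sketch for a bivariate Gaussian copula. It is not causl's internal code: the correlation `rho`, the Gamma and normal margins and all object names are assumptions chosen purely for illustration.

```r
set.seed(1)
n <- 1000
rho <- 0.5                                 # assumed copula correlation
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)

# Step 1: draw correlated standard normals via the Cholesky factor of Sigma.
Zn <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)

# Step 2: map to uniforms; the pairs in U follow a Gaussian copula.
U <- pnorm(Zn)

# Step 3 (inversion): push each uniform through a target quantile function.
z <- qgamma(U[, 1], shape = 2, rate = 1)   # Gamma margin
y <- qnorm(U[, 2], mean = 0, sd = 1)       # standard normal margin
```

All of the dependence is generated in the first two steps, before any margin is applied, which is consistent with the remark above that a copula simulated in one go cannot depend on intermediate variables.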

The control parameters are: cop, a keyword for the copula that defaults to "cop"; quiet, which defaults to FALSE but reduces output if set to TRUE; and (if rejection sampling is selected) careful, a logical that enables the full rejection sampling method, which yields exact samples. The full method is generally very slow, especially if there is an outlying value, so careful defaults to FALSE.

Value

A data frame containing the simulated data.

Functions

rfrugalParam(): old function for simulation

Examples

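# Parameters for the default model (z ~ 1, x ~ z, y ~ x, copula ~ 1):
# beta holds the regression coefficients, phi the dispersion
pars <- list(z=list(beta=0, phi=1),
             x=list(beta=c(0,0.5), phi=1),
             y=list(beta=c(0,0.5), phi=0.5),
             cop=list(beta=1))
# Draw 100 samples using the default formulas and families
rfrugalParam(100, pars = pars)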

rje42/causl documentation built on June 1, 2025, 2:50 p.m.