knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  fig.width=7, 
  fig.height=7
)
library(causl)

Simulation

The former main function for simulating data is causalSamp(). Suppose we wish to have \begin{align} Z &\sim N(0, \; \sigma_z^2)\ X \mid Z=z &\sim N(\mu_x, \; \sigma_x^2)\ Y \mid \operatorname{do}(X=x) &\sim N(\mu_y, \; \sigma_y^2) \end{align} with $\mu_x = 0.5 z$, $\mu_y = 0.3 x$, and $\sigma_z^2 = \sigma_x^2 = \sigma_y^2 = 1$. Then we set:

pars <- list(z = list(beta = 0, phi=1),
             x = list(beta = c(0,0.5), phi=1),
             y = list(beta = c(0,0.3), phi=1),
             cop = list(beta = matrix(c(0.5,0.25), ncol=1)))

Note that the copula parameters have been selected for a linear predictor of the form $\eta = 0.5 + 0.25x$, and the default link function for the Gaussian copula is $g(\eta) = \operatorname{logit}((\eta+1)/2)$.

Then we simulate as follows:

set.seed(124)  # for consistency
dat <- causalSamp(1e3, formulas=list(z~1, x~z, y~x, ~x), family=rep(1,4), pars=pars)
pairs(dat)

Note that the distribution is as we expect. Regressing \code{x} on \code{z} should give a coefficient close to 0.5.

lmXZ <- lm(x ~ z, data=dat)
summary(lmXZ)$coef
# lmYX <- lm(y ~ x, data=dat, weights = 1/dnorm(predict(lmXZ)))
# summary(lmYX)

Note that it is not necessary to use the standard variable names z, x and y, but if you choose not to then you must specify the names you want to use with the formulas argument.

pars <- list(a = list(beta = 0, phi=1),
             b = list(beta = c(0,0.5), phi=1),
             c = list(beta = c(0,0.3), phi=1),
             cop = list(beta = matrix(c(0.5,0.25), ncol=1)))
set.seed(124)
dat2 <- causalSamp(1e3, formulas=list(a ~ 1, b ~ a, c ~ b,  ~ b), pars=pars)
pairs(dat2)

causalSamp() has various default settings. The copula $C_{YZ}$, and the distribution families of $X$, $Y$ and $Z$ all default to Gaussian. It over-samples by a factor of 10 so that some may be rejected later, though this is increased later on if necessary up to the control parameter max_oversamp (with default value 1000).

Fitting Models

We can fit a model using maximum likelihood.

out <- fit_causl(dat, formulas = list(z~1, y~x, ~x))
out

We see that the x coefficient in the regression parameter is correct (even surprisingly so!).

Note that if we allow z to depend upon x we obtain a biased estimate.

out2 <- fit_causl(dat, form = list(y~x, z~x, ~x))
out2

Other families

We can also perform maximum likelihood estimation with other parametric families. Suppose that $Z$ is t-distributed and $Y$ Gamma distributed. We can simulate with

forms <- list(Z ~ 1, X ~ Z, Y ~ X, ~ 1)
fams <- c(2, 5, 3, 1)
pars <- list(Z = list(beta = 0, phi=1, par2=4),
             X = list(beta = c(0,0.5)),
             Y = list(beta = c(1,-0.3), phi=2),
             cop = list(beta = 1))
set.seed(124)
dat <- rfrugalParam(1e3, formulas=forms, family=fams, pars=pars)

Then we can perform maximum likelihood estimation on this data:

(out <- fit_causl(dat, formulas = forms[-2], family = fams[-2], other_pars=list(Z=list(par2=4L))))


rje42/causl documentation built on June 1, 2025, 2:50 p.m.