knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
options(digits=3)

The causl package now allows users to specify custom families with which to perform their simulations. To do this, one must create a function that takes as an argument the link function, and returns a list containing:

The function should also give the output the class "causl_family", so that it is interpreted appropriately.

Example (Poisson distribution)

Suppose we wish to simulate from a Poisson distribution as part of a frugal model. Then we can define a function such as the below:

library(causl)

poisson_causl_fam <- function (link) {
  if (missing(link)) link = "log"

  ## write functions
  dens <- function (x, mu, log=FALSE) dpois(x, lambda=mu, log=log)
  quan <- function (p, mu) qpois(p, lambda=mu)
  sim <- function (n, mu) {
    qx <- runif(n)
    x <- qpois(qx, lambda=mu)
    attr(x, "quantile") <- qx
    return(x)
  }
  probs <- function (x, mu) ppois(x, lambda=mu)

  default <- function(theta) list(x=1, mu=1)

  ## define family
  out <- list(name="poisson", ddist=dens, qdist=quan, rdist=sim, pdist=probs,
              pars=c("mu"), default=default, link=link)
  class(out) <- "causl_family"

  return(out)
}

Some remarks are in order. Note that the ddist function must have a log argument to allow the log-density to be evaluated. In addition, since this is a discrete distribution, if we use it in a copula we must know the exact quantile used in the simulation (not just the one returned by applying ppois). Hence we simulate a uniform and then obtain the Poisson random variables by inversion from qpois. For continuous distributions this is unnecessary, and glm_sim will just apply the pdist function.

Now, let us try to simulate some data using this distribution. We will need to use poisson_causl_fam in our family argument:

forms <- list(Z ~ 1, X ~ Z, Y ~ X, ~ 1)
fams <- list(1, poisson_causl_fam(), 1, 1)
pars <- list(Z = list(beta = 0, phi = 1),
             X = list(beta = c(0.3,1)),
             Y = list(beta = c(-1,0.5), phi = 1),
             cop = list(beta = 1))
cm <- causl_model(formulas = forms, family = fams, pars = pars)

set.seed(123)
dat <- rfrugal(n=1e3, causl_model=cm)

Treatment model

Then we can check that the new variables have been simulated correctly. We ought to have: \begin{align} P(X = x \mid z) &= \frac{1}{x!} \exp{\alpha_0 + \alpha_1 z x - e^{\alpha_0 + \alpha_1 z}}, \end{align} with $\alpha = r pars$X$beta[1]$ and $\alpha_1 = r pars$X$beta[2]$. Let's perform a regression to check the parameters:

modX <- glm(X ~ Z, family=quasipoisson, data=dat)
summary(modX)$coefficients

So we can see that the estimates are well within two standard errors of the true value. In addition,

summary(modX)$dispersion

the dispersion parameter is close to its nominal value of 1.

Outcome model

We can also fit the outcome model using maximum likelihood estimation with sandwich errors.

out <- fit_causl(dat = dat, formulas = list(Y ~ X, Z ~ 1, cop ~ 1),
                 family = c(1, 1, 1))

This gives us estimates of $\beta_0 = r out$par[1]$ (r out$sandwich_se[1]) and $\beta_1 = r out$par[2]$ (r out$sandwich_se[2]), which is consistent with the values $-1$ and $0.5$ that we used to simulate.

Custom Link Functions

The example above works because log is an existing link function specified in the package. Suppose, however, that we want to use the sqrt function as the link; this is not used with any built-in families.

For this we need to specify a custom_links argument to poisson_causl_fam; each entry in this named list should itself be a list, containing:

The names of the entries should correspond to the string used as the name of the link function.

 poisson_causl_fam <- function (link) {
  if (missing(link)) link = "log"

  ## write functions
  dens <- function (x, mu, log=FALSE) dpois(x, lambda=mu, log=log)
  quan <- function (p, mu) qpois(p, lambda=mu)
  sim <- function (n, mu) {
    qx <- runif(n)
    x <- qpois(qx, lambda=mu)
    attr(x, "quantile") <- qx
    return(x)
  }
  probs <- function (x, mu) ppois(x, lambda=mu)

  default <- function(theta) list(x=1, mu=1)

  custom_links <- list(sqrt = list(linkfun = function (x) sqrt(x),
                                   linkinv = function (x) x^2))

  ## define family
  out <- list(name="poisson", ddist=dens, qdist=quan, rdist=sim, pdist=probs,
              pars=c("mu"), default=default, link=link, custom_links=custom_links)
  class(out) <- "causl_family"

  return(out)
}

Now we can suitably modify our earlier example:

fams <- list(1, poisson_causl_fam(link="sqrt"), 1, 1)
pars$X$beta <- c(0.9, 0.05)
cm_sq <- modify.causl_model(cm, family=fams, pars=pars)

set.seed(124)
dat <- rfrugal(1e3, causl_model = cm_sq, control=list(quiet=TRUE))

We can again check that the distribution was correctly sampled.

modX <- glm(X ~ Z, family=quasipoisson(sqrt), data=dat, start=c(1,0))
summary(modX)$coefficients

So we again see that the estimates are (just about) within two standard errors of the true values (0.9 and 0.05), and the dispersion parameter is estimated to be r summary(modX)$dispersion, again close to 1.



rje42/causl documentation built on June 1, 2025, 2:50 p.m.