In postcard: Estimating Marginal Effects with Prognostic Covariate Adjustment

rctglm

R Documentation

Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation using influence functions

Description

The procedure uses plug-in estimation and influence functions to perform robust inference of any specified estimand in the setting of a randomised clinical trial, even when the effect of covariates is heterogeneous across randomisation groups. See "Powering RCTs for marginal effects with GLMs using prognostic score adjustment" by Højbjerre-Frandsen et al. (2025) for more details on the methodology.

Usage

rctglm(
  formula,
  exposure_indicator,
  exposure_prob,
  data,
  family = gaussian,
  estimand_fun = "ate",
  estimand_fun_deriv0 = NULL,
  estimand_fun_deriv1 = NULL,
  cv_variance = FALSE,
  cv_variance_folds = 10,
  verbose = options::opt("verbose"),
  ...
)

Arguments

`formula`	an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in the glm documentation.
`exposure_indicator`	(name of) the binary variable in `data` that identifies randomisation groups. The variable is required to be binary to make the "orientation" of the `estimand_fun` clear.
`exposure_prob`	a `numeric` with the probability of being in "group 1" (rather than group 0) in groups defined by `exposure_indicator`.
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
`family`	a description of the error distribution and link function to be used in the model. For `glm` this can be a character string naming a family function, a family function or the result of a call to a family function. For `glm.fit` only the third option is supported. (See `family` for details of family functions.)
`estimand_fun`	a `function` with arguments `psi1` and `psi0` specifying the estimand. Alternatively, specify "ate" or "rate_ratio" as a `character` to use one of the default estimand functions. See more details in the "Estimands" section of rctglm.
`estimand_fun_deriv0`	a `function` specifying the derivative of `estimand_fun` with respect to `psi0`. By default, the algorithm uses symbolic differentiation to find the derivative from `estimand_fun` automatically.
`estimand_fun_deriv1`	a `function` specifying the derivative of `estimand_fun` with respect to `psi1`. By default, the algorithm uses symbolic differentiation to find the derivative from `estimand_fun` automatically.
`cv_variance`	a `logical` determining whether to estimate the variance using cross-validation (see details of rctglm).
`cv_variance_folds`	a `numeric` with the number of folds to use for cross validation if `cv_variance` is `TRUE`.
`verbose`	`numeric` verbosity level. Higher values mean more information is printed to the console. A value of 0 means nothing is printed to the console during execution. (Defaults to `2`, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE'.)
`...`	Additional arguments passed to `stats::glm()`

Details

The procedure assumes the setup of a randomised clinical trial with observations grouped by a binary exposure_indicator variable, allocated randomly with probability exposure_prob. A GLM is fit and then used to predict the response of all observations in the event that the exposure_indicator is 0 and 1, respectively. Taking the means of these predictions produces the counterfactual means psi0 and psi1, and an estimand r(psi0, psi1) is calculated using any specified estimand_fun.
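
The plug-in step described above can be sketched with base R alone. This is an illustration only, not the package's internal code: the simulated data, the variable names and the use of `stats::glm()` with `predict()` are assumptions made for the example.

```r
# Simulate a small two-arm trial (illustrative data, not from the package)
set.seed(1)
n <- 200
exposure_prob <- 0.5
dat <- data.frame(
  X1 = rnorm(n),
  A = rbinom(n, 1, exposure_prob)
)
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Fit the working GLM
fit <- glm(Y ~ X1 + A, data = dat, family = gaussian)

# Predict the response of every observation under A = 0 and under A = 1
pred0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
pred1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")

# Counterfactual means
psi0 <- mean(pred0)
psi1 <- mean(pred1)

# Plug the means into an estimand function, here the average treatment effect
estimand_fun <- function(psi1, psi0) psi1 - psi0
estimate <- estimand_fun(psi1, psi0)
estimate
```

For this identity-link model without interactions, the plug-in estimate coincides with the fitted coefficient of A. With a non-linear link or exposure-covariate interactions the two generally differ, which is where the marginal estimand becomes informative.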

The variance of the estimand is found by taking the variance of the influence function of the estimand. If cv_variance is TRUE, then the counterfactual predictions for each observation (which are used to calculate the value of the influence function) are obtained as out-of-sample (OOS) predictions using cross validation with the number of folds specified by cv_variance_folds. The cross validation splits are performed using stratified sampling with exposure_indicator as the strata argument in rsample::vfold_cv.
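
A rough base R sketch of an influence-function-based variance estimate follows. It assumes the standard influence function of a counterfactual mean in a randomised trial and combines the two group-wise functions via the delta method; the exact form and scaling used inside rctglm are not spelled out on this page, so the formulas, the division by n and all variable names below are assumptions. No cross validation is used in this sketch.

```r
# Simulate a binary-outcome trial (illustrative data, not from the package)
set.seed(2)
n <- 500
exposure_prob <- 0.5
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, exposure_prob))
dat$Y <- rbinom(n, 1, plogis(-0.5 + dat$X1 + dat$A))

# Fit the GLM and compute in-sample counterfactual predictions and means
fit <- glm(Y ~ X1 + A, data = dat, family = binomial)
pred0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
pred1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")
psi0 <- mean(pred0)
psi1 <- mean(pred1)

# Assumed influence function of each counterfactual mean:
# inverse-probability-weighted residual plus centred prediction
if0 <- (1 - dat$A) / (1 - exposure_prob) * (dat$Y - pred0) + pred0 - psi0
if1 <- dat$A / exposure_prob * (dat$Y - pred1) + pred1 - psi1

# Rate ratio estimand psi1 / psi0 and its partial derivatives
d1 <- 1 / psi0
d0 <- -psi1 / psi0^2

# Delta method: influence function of the estimand
if_estimand <- d1 * if1 + d0 * if0

# Variance and standard error of the estimate (scaling by n is an assumption)
var_est <- var(if_estimand) / n
se_est <- sqrt(var_est)
c(estimate = psi1 / psi0, se = se_est)
```

Replacing pred0 and pred1 with predictions from models fitted on the other folds would give the out-of-sample variant described above.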

Value

rctglm returns an object of class inheriting from "rctglm".

An object of class rctglm is a list containing the following components:

estimand: A data.frame with plug-in estimate of estimand, standard error (SE) estimate and variance estimate of estimand
estimand_funs: A list with
- f: The estimand_fun used to obtain an estimate of the estimand from counterfactual means
- d0: The derivative with respect to psi0
- d1: The derivative with respect to psi1
means_counterfactual: A data.frame with counterfactual means psi0 and psi1
fitted.values_counterfactual: A data.frame with counterfactual mean values, obtained by transforming the linear predictors for each group by the inverse of the link function.
glm: A glm object returned from running stats::glm within the procedure
call: The matched call

Estimands

As noted in the Details section, psi0 and psi1 are the counterfactual means found by prediction using a fitted GLM in the binary groups defined by exposure_indicator.

Default estimand functions can be specified via "ate" (which uses the function `function(psi1, psi0) psi1 - psi0`) and "rate_ratio" (which uses the function `function(psi1, psi0) psi1 / psi0`). See more information on specifying the estimand_fun in vignette("model-fit").

As a default, the Deriv package is used to perform symbolic differentiation to find the derivatives of the estimand_fun.
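
To illustrate what symbolic differentiation of an estimand function yields, the sketch below uses base R's `stats::D()` in place of the Deriv package, so that it runs without extra dependencies. The names `estimand_expr`, `deriv0` and `deriv1` are hypothetical, and this is not how rctglm derives its defaults internally.

```r
# Body of a rate-ratio estimand as an unevaluated expression
estimand_expr <- quote(psi1 / psi0)

# Symbolic partial derivatives with respect to each counterfactual mean
d0_expr <- D(estimand_expr, "psi0")
d1_expr <- D(estimand_expr, "psi1")

# Wrap the derivative expressions in functions with the same arguments
# as an estimand_fun
deriv0 <- function(psi1, psi0) eval(d0_expr, list(psi1 = psi1, psi0 = psi0))
deriv1 <- function(psi1, psi0) eval(d1_expr, list(psi1 = psi1, psi0 = psi0))

# Evaluate the derivatives at example counterfactual means
deriv0(psi1 = 0.6, psi0 = 0.3)
deriv1(psi1 = 0.6, psi0 = 0.3)
```

Functions of this shape are what the estimand_fun_deriv0 and estimand_fun_deriv1 arguments accept when the derivatives are supplied manually.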

Examples

# Generate some data to showcase example
n <- 100
exp_prob <- .5

dat_gaus <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = gaussian()
)

# Fit the model
ate <- rctglm(formula = Y ~ .,
              exposure_indicator = A,
              exposure_prob = exp_prob,
              data = dat_gaus,
              family = gaussian)

# Pull information on estimand
estimand(ate)

## Another example with different family and specification of estimand_fun
dat_binom <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = binomial()
)

# Rate ratio using one of the default estimand functions
rr <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial(),
             estimand_fun = "rate_ratio")

# Odds ratio using a user-specified estimand function
odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))
or <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial,
             estimand_fun = odds_ratio)

postcard documentation built on April 12, 2025, 1:57 a.m.