sgf: Sequential g-formula for continuous multiple time point...
In CICI: Causal Inference with Continuous (Multiple Time Point) Interventions


sgf	R Documentation

Sequential g-formula for continuous multiple time point interventions

Description

Estimation of counterfactual outcomes for multiple values of continuous interventions at different time points using the sequential (weighted) g-formula.

Usage

sgf(X, Anodes, Ynodes, Lnodes = NULL, Cnodes = NULL,
    abar = NULL, survivalY = FALSE, 
    SL.library = "SL.glm", SL.export = NULL,
    Yweights = NULL, calc.support = FALSE, B = 0,
    ncores = 1, verbose = TRUE, seed = NULL, prog = NULL, 
    cilevel = 0.95, ...)

Arguments

`X`	A data frame whose columns follow the time-ordering of the variables.
`Anodes`	A character vector of column names in `X` of the intervention variable(s).
`Ynodes`	A character vector of column names in `X` of the outcome variable(s).
`Lnodes`	A character vector of column names in `X` of the time-dependent variable(s), i.e. those measured after the first treatment.
`Cnodes`	A character vector of column names in `X` of the censoring variable(s).
`abar`	Numeric vector or matrix of intervention values. See Details.
`survivalY`	Logical. If TRUE, then Y nodes are indicators of an event.
`SL.library`	Either a character vector of prediction algorithms or a list containing character vectors. See Details.
`SL.export`	A character vector of user-written learning and screening algorithms that are not part of SuperLearner, but are part of the learning library. Only required if `ncores>1`. See Details.
`Yweights`	A list of the same length as `Ynodes`, likely generated with `calc.weights`.
`calc.support`	Logical. If `TRUE`, both crude and conditional support are estimated.
`B`	An integer specifying the number of bootstrap samples to be used, if any.
`ncores`	An integer for the number of threads/cores to be used. If >1, parallelization will be utilized.
`verbose`	Logical. If `TRUE`, notes and warnings are printed.
`seed`	An integer specifying the seed to be used to create reproducible results for parallel computing (i.e. when `ncores`>1).
`prog`	A character string specifying a path where progress should be saved (typically, when `ncores`>1).
`cilevel`	Numeric value between 0 and 1 specifying the confidence level. Defaults to 95%.
`...`	Further arguments to be passed on.

Details

The function calculates the expected counterfactual outcomes (specified under Ynodes) under the intervention abar.

If abar is a vector, each component defines one intervention in which that value is applied at every time point; that is, the interventions are constant over time. If abar is a matrix (of size 'number of interventions' x 'number of time points'), each row, whose length equals the number of Anodes, defines one time-varying intervention strategy.
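The two forms of abar can be sketched in base R as follows. This is only an illustration: the object names are hypothetical, the five time points are assumed to match the five Anodes of the example further down, and the intervention values are made up.

```r
# Assumption: five time points, matching Anodes efv.0, ..., efv.4 in the Examples.
n_times <- 5

# Vector form: each value defines one intervention held constant over time.
abar_static <- seq(0, 5, 1)

# Matrix form: one row per strategy, one column per time point.
abar_dynamic <- rbind(
  c(0, 0, 0, 0, 0),   # value 0 at every time point
  c(1, 1, 2, 2, 3),   # values increasing over time
  c(5, 4, 3, 2, 1)    # values decreasing over time
)

# Conceptually, the vector form describes a matrix whose rows are constant:
# row i repeats abar_static[i] at every time point.
abar_static_as_matrix <- matrix(abar_static,
                                nrow = length(abar_static),
                                ncol = n_times)

dim(abar_dynamic)
```

Either object could then be supplied as the abar argument; the matrix form is needed only when the intervention value should change between time points.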

The nested iterated outcome models are fitted using super learning. The specified prediction algorithms (possibly coupled with algorithms for prior variable screening) are passed on to package SuperLearner. See ?SuperLearner for examples of permitted structures. Note: User-written prediction algorithms, corresponding S3 prediction functions and screening algorithms need to be specified under SL.export, if parallelization is used.
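As a rough sketch of the two permitted structures, the following defines a learner library once as a character vector and once as a list. The object names are hypothetical; the wrapper names (SL.glm, SL.mean, screen.corP, All) are ones shipped with SuperLearner, and whether they suit a given analysis is an assumption to be checked.

```r
# Character vector: each entry names one prediction algorithm.
sl_simple <- c("SL.glm", "SL.mean")

# List form: in each element, the first entry is the prediction algorithm
# and the remaining entries are screening algorithms applied beforehand.
sl_screened <- list(
  c("SL.glm", "All"),          # GLM on all variables
  c("SL.glm", "screen.corP"),  # GLM after correlation-based screening
  c("SL.mean", "All")          # simple mean as a benchmark learner
)

str(sl_screened)
```

Either object could be passed as SL.library. User-written wrappers would additionally have to be named in SL.export when ncores>1.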

For survival settings, it is required that i) survivalY=TRUE and ii) after a Cnode/Ynode is 1, every variable thereafter is set to NA. See manual for an example. The package intervenes on Cnodes, i.e. calculates counterfactual outcomes under no censoring.
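Requirement ii) can be illustrated with a small toy data set. The helper set_na_after_event() and all column names below are hypothetical and not part of the package; this is a minimal sketch that assumes the columns of the data frame are already in time order.

```r
# Hypothetical helper: once a censoring/outcome indicator equals 1 in a row,
# every later column in that row is set to NA.
set_na_after_event <- function(X, nodes) {
  node_pos <- match(nodes, names(X))
  stopifnot(!anyNA(node_pos))
  for (i in seq_len(nrow(X))) {
    # Column positions of indicators equal to 1 in this row (NA counts as no event)
    is_hit <- vapply(node_pos, function(j) isTRUE(X[i, j] == 1), logical(1))
    hit <- node_pos[is_hit]
    if (length(hit) > 0 && min(hit) < ncol(X)) {
      X[i, (min(hit) + 1):ncol(X)] <- NA
    }
  }
  X
}

# Toy data with made-up values; columns follow the time-ordering.
toy <- data.frame(
  A.0 = c(1.2, 0.8, 2.0),
  C.1 = c(0, 0, 1),      # row 3 is censored at time 1
  Y.1 = c(0, 1, 0),      # row 2 has the event at time 1
  A.1 = c(1.5, 0.9, 2.1),
  C.2 = c(0, 0, 0),
  Y.2 = c(1, 1, 0)
)

toy_clean <- set_na_after_event(toy, nodes = c("C.1", "Y.1", "C.2", "Y.2"))
toy_clean
```

In the result, row 1 is left unchanged, row 2 is NA from A.1 onwards, and row 3 is NA from Y.1 onwards.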

If calc.support=TRUE, conditional and crude support measures (i.e., diagnostics) are calculated as described in Section 3.3.2 of Schomaker et al. (2024).

To parallelize computations automatically, it is sufficient to set ncores>1, as appropriate. No further customization or setup is needed; the package handles everything else. To make estimates under parallelization reproducible, use the seed argument. To monitor the progress of parallelized computations, set a path in the prog argument: a text file then reports on the progress, which is particularly useful if lengthy bootstrapping computations are required.

Value

Returns an object of class ‘gformula’:

`results`	matrix of results
`diagnostics`	list of diagnostics and weights based on the estimated support (if `calc.support=TRUE`)
`SL.weights`	matrix of average super learner weights, at each time point
`boot.results`	matrix of bootstrap results
`setup`	list of chosen setup parameters

Author(s)

Michael Schomaker

References

Schomaker M, McIlleron H, Denti P, Diaz I. (2024) Causal Inference for Continuous Multiple Time Point Interventions, Statistics in Medicine, 43:5380-5400, see also https://arxiv.org/abs/2305.06645.

Examples



data(EFV)
est <- sgf(X = EFV,
           Lnodes = c("adherence.1", "weight.1",
                      "adherence.2", "weight.2",
                      "adherence.3", "weight.3",
                      "adherence.4", "weight.4"),
           Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
           Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
           abar = seq(0, 5, 1)
)

est

CICI documentation built on April 7, 2026, 5:08 p.m.