mcee_helper_2stage_estimation: Two-stage helper for mediated causal excursion effects (MCEE)

View source: R/mcee_helper_estimation.R

mcee_helper_2stage_estimation R Documentation

Description

Fits all nuisance components (Stage 1) and then computes the MCEE parameters (Stage 2) and their sandwich variance. This is a low-level driver used by the high-level wrapper; it assumes 'omega_nrows' and 'f_nrows' are already aligned to the rows of 'data'.

Usage

mcee_helper_2stage_estimation(
  data,
  id_var,
  dp_var,
  outcome_var,
  treatment_var,
  mediator_var,
  avail_var = NULL,
  config_p,
  config_q,
  config_eta,
  config_mu,
  config_nu,
  omega_nrows,
  f_nrows
)

Arguments

data

A long-format 'data.frame' (one row per subject-by-decision point).

id_var

Character scalar. Name of the subject ID column.

dp_var

Character scalar. Name of the decision point column. Values need not be consecutive, and the number of decision points may differ across subjects.

outcome_var

Character scalar. Name of the distal outcome column.

treatment_var

Character scalar. Name of the binary treatment column (coded 0/1).

mediator_var

Character scalar. Name of the mediator column.

avail_var

Character scalar or 'NULL'. Name of the availability column (1 = available, 0 = unavailable). If 'NULL', availability is treated as all 1.

config_p

Configuration for p_t(a\mid H_t) (propensity). A list using one of the following schemas:

  • Known constant(s) (skips fitting): list(known = c(...)) or arm-specific known_a1/known_a0.

  • Model fit: list(formula = ~ rhs, method = m, ...) where method is one of "glm", "gam", "rf", "ranger", "sl", "sl.user-specified-library". Optional fields:

    • family: a GLM/GAM family. If omitted, it is auto-detected: binomial() for config_p and config_q, and gaussian() otherwise.

    • clipping: numeric length-2 c(lo, hi) to clip probabilities into [lo, hi] (useful for stability).

    • For method = "sl": SL.library (character vector of learners); if omitted, a simple default library is used: c("SL.mean", "SL.glm", "SL.gam").
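The schemas above can be sketched as plain R lists. This is an illustration only: the numeric values are invented, the assumption that a single number is accepted for known is mine, and clip_prob() is a hypothetical helper (not part of MRTAnalysis) that only shows what clipping into [lo, hi] means.

```r
# Known randomization probability: no model is fitted.
# (Assumption: a single constant is an acceptable value for `known`.)
config_p_known <- list(known = 0.5)

# Arm-specific known values (invented numbers).
config_p_arms <- list(known_a1 = 0.6, known_a0 = 0.4)

# GLM fit with fitted probabilities clipped into [0.05, 0.95].
config_p_glm <- list(formula = ~ t, method = "glm", clipping = c(0.05, 0.95))

# SuperLearner fit with an explicit learner library.
config_p_sl <- list(
  formula = ~ t, method = "sl",
  SL.library = c("SL.mean", "SL.glm")
)

# Hypothetical helper: clip a probability vector into [lo, hi].
clip_prob <- function(prob, clipping) {
  pmin(pmax(prob, clipping[1]), clipping[2])
}

# Values below lo are raised to lo; values above hi are lowered to hi.
clipped <- clip_prob(c(0.01, 0.50, 0.99), config_p_glm$clipping)
```

Each list would be passed as config_p (or, with the appropriate formula, as any of the other config_* arguments).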

config_q

Configuration for q_t(a\mid H_t, M_t). Same schema as config_p.

config_eta

Configuration for \eta_t(a, H_t) (outcome given A,H). Same schema as config_p; if family is omitted, it defaults to gaussian().

config_mu

Configuration for \mu_t(a, H_t, M_t) (outcome given A,H,M). Same schema as config_p; if family is omitted, it defaults to gaussian().

config_nu

Configuration for \nu_t(a, H_t) (cross-world iterated conditional expectation (ICE) based on \mu). Same schema as config_p; if family is omitted, it defaults to gaussian().

omega_nrows

Numeric vector of length nrow(data). Per-row weights \omega(i,t) \ge 0. Rows are aligned with data (no reordering).

f_nrows

Numeric matrix with nrow(data) rows and p columns. Row r contains f(t_r)^\top (the basis evaluated at the decision point of row r). The same basis is used for both \alpha (NDEE) and \beta (NIEE).
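As a minimal sketch of the alignment requirement for omega_nrows and f_nrows, the following builds both objects directly from a toy long-format data frame. The data, the uniform weights, and the linear-in-time basis f(t) = (1, t) are illustrative assumptions, not package defaults.

```r
# Toy long-format data: two subjects with different, non-consecutive
# decision points.
df <- data.frame(
  id = c(1, 1, 1, 2, 2),
  t  = c(1, 2, 4, 1, 3)
)

# One nonnegative weight per row, in the row order of df.
omega_nrows <- rep(1, nrow(df))

# Basis f(t) = (1, t): row r holds the basis evaluated at df$t[r],
# so the matrix has nrow(df) rows and p = 2 columns.
f_nrows <- cbind(1, df$t)

# Marginal alternative (p = 1): a single column of ones.
f_marginal <- matrix(1, nrow = nrow(df), ncol = 1)
```

Because both objects are built from df itself, they stay aligned with its rows as long as df is not reordered afterwards.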

Details

Availability handling: When avail_var is supplied and a row's availability equals 0, Stage 1 sets the working probabilities for that row to 1 (i.e., \hat{p}_t(1\mid H_t)=1 and \hat{p}_t(0\mid H_t)=1, and similarly for \hat q_t). This prevents division by zero in the estimating equations.
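The availability rule can be illustrated on plain vectors. This is a sketch of the described behaviour, not the package's internal code; the fitted probabilities and the availability indicator are invented.

```r
# Invented fitted propensities for A = 1 and an availability indicator.
p1_hat <- c(0.4, 0.6, 0.5, 0.3)
avail  <- c(1, 0, 1, 0)

# Propensity for A = 0 is the complement before any overwriting.
p0_hat <- 1 - p1_hat

# On unavailable rows, set both working probabilities to 1 so that
# ratios such as 1 / p_hat remain finite.
p1_hat[avail == 0] <- 1
p0_hat[avail == 0] <- 1
```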

Auto-family rules: If family is omitted in a GLM/GAM config, it defaults to binomial() for config_p and config_q, and to gaussian() for config_eta, config_mu, and config_nu.
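The auto-family rule amounts to a small lookup. The helper resolve_family() below is hypothetical (it is not exported by MRTAnalysis, and the component labels "p", "q", "eta", "mu", "nu" are my own shorthand); it only restates the rule in code.

```r
# Hypothetical helper: return the family to use for a nuisance component.
resolve_family <- function(config, component) {
  # An explicit family in the config always wins.
  if (!is.null(config$family)) {
    return(config$family)
  }
  # Otherwise: binomial for p and q, gaussian for eta, mu, and nu.
  if (component %in% c("p", "q")) stats::binomial() else stats::gaussian()
}

config_p   <- list(formula = ~ t, method = "glm")
config_mu  <- list(formula = ~ t + M, method = "glm")
config_eta <- list(formula = ~ t, method = "glm", family = stats::poisson())

fam_p   <- resolve_family(config_p, "p")$family
fam_mu  <- resolve_family(config_mu, "mu")$family
fam_eta <- resolve_family(config_eta, "eta")$family
```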

Learners:

  • "glm": uses stats::glm().

  • "gam": uses mgcv::gam() (supports s() smooths).

  • "rf": uses randomForest::randomForest().

  • "ranger": uses ranger::ranger().

  • "sl": uses SuperLearner::SuperLearner(). If SL.library is not given, a simple default library is used: c("SL.mean", "SL.glm", "SL.gam").
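To make the "glm" learner concrete, here is a rough sketch of how a one-sided config formula might be turned into a fitted propensity model with stats::glm(). How the package actually attaches the response and predicts is an assumption on my part; the simulated data are invented.

```r
set.seed(1)

# Simulated long-format data with a randomized binary treatment A.
df <- data.frame(
  t = rep(1:5, times = 20),
  A = rbinom(100, size = 1, prob = 0.5)
)

config_p <- list(formula = ~ t, method = "glm")

# Attach the response to the one-sided formula: ~ t becomes A ~ t.
full_formula <- stats::update(config_p$formula, A ~ .)

# Fit the propensity model; binomial() is the auto-detected family for p.
fit_p <- stats::glm(full_formula, data = df, family = stats::binomial())

# Fitted probability of A = 1 for every row of df.
p1_hat <- stats::predict(fit_p, newdata = df, type = "response")
```

The other learners would follow the same pattern with the corresponding fitting function swapped in.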

Value

A list with components:

fit

A list with entries alpha_hat, alpha_se, beta_hat, beta_se, and varcov (the 2p\times 2p sandwich variance for (\alpha^\top,\beta^\top)^\top).

nuisance_models

A list of fitted Stage-1 objects: p, q, eta1, eta0, mu1, mu0, nu1, nu0. (For known/known_a0/known_a1, a small descriptor list is returned.)
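A short sketch of how the fit component might be used downstream. The object below is a hand-built mock with p = 1 and invented numbers, and the normal-approximation interval is my choice for illustration; it also assumes the reported standard errors equal the square roots of the diagonal of varcov.

```r
# Mock of the `fit` component for p = 1 (all numbers invented).
fit <- list(
  alpha_hat = 0.20, alpha_se = 0.05,
  beta_hat  = 0.10, beta_se  = 0.04,
  varcov    = matrix(c(0.0025, 0.0005,
                       0.0005, 0.0016), nrow = 2, byrow = TRUE)
)

# Standard errors recovered from the 2p x 2p variance matrix.
se_from_varcov <- sqrt(diag(fit$varcov))

# Normal-approximation 95% confidence intervals.
z <- stats::qnorm(0.975)
alpha_ci <- fit$alpha_hat + c(-1, 1) * z * fit$alpha_se
beta_ci  <- fit$beta_hat  + c(-1, 1) * z * fit$beta_se
```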

See Also

mcee_general for a high-level wrapper that constructs omega_nrows and f_nrows from user-friendly arguments.

Examples

## Not run: 
# Minimal sketch (assuming `df` has columns id, t, A, M, Y, I):
fit <- mcee_helper_2stage_estimation(
    data = df,
    id_var = "id", dp_var = "t", outcome_var = "Y",
    treatment_var = "A", mediator_var = "M", avail_var = "I",
    config_p = list(formula = ~ t, method = "glm"), # binomial auto; p_t conditions on history only
    config_q = list(formula = ~ t + M, method = "glm"), # binomial auto; q_t also conditions on M
    config_eta = list(formula = ~t, method = "gam"), # gaussian auto
    config_mu = list(formula = ~ t + s(M), method = "gam"), # gaussian auto
    config_nu = list(formula = ~t, method = "glm"), # gaussian auto
    omega_nrows = rep(1, nrow(df)),
    f_nrows = matrix(1, nrow = nrow(df), ncol = 1) # marginal (p = 1), one row per row of df
)
fit$fit$alpha_hat
fit$fit$beta_hat

## End(Not run)

MRTAnalysis documentation built on Sept. 9, 2025, 5:41 p.m.