dcee: Distal Causal Excursion Effect (DCEE) Estimation

dcee    R Documentation

Description

Estimates distal causal excursion effects in micro-randomized trials using a **two-stage** estimator: (i) learn the nuisance outcome regressions \mu_a(H_t), one for each treatment value a in {0,1}, with a user-specified learner (parametric or machine learning), optionally with cross-fitting; (ii) solve estimating equations for the distal causal excursion effect parameters \beta.

This wrapper standardizes inputs and delegates computation to 'dcee_helper_2stage_estimation()'.
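To make stage (i) concrete, here is a minimal sketch in base R. It is not the package's internal code: the data are simulated, the toy data frame ignores the longitudinal structure of a real MRT, and 'lm' stands in for whichever learner is chosen. It only illustrates fitting one outcome regression per treatment arm and then predicting both nuisance functions for every row.

```r
# Illustrative sketch only; simulated data, not the package's internals.
set.seed(1)
n <- 200
toy <- data.frame(
  X = rnorm(n),
  A = rbinom(n, 1, 0.5)  # binary treatment coded 0/1
)
toy$Y <- 1 + 0.5 * toy$X + 0.3 * toy$A + rnorm(n)

# Stage 1: one outcome regression per treatment arm.
fit_a0 <- lm(Y ~ X, data = toy[toy$A == 0, ])  # plays the role of mu_0
fit_a1 <- lm(Y ~ X, data = toy[toy$A == 1, ])  # plays the role of mu_1

# Predict both nuisance functions for every row, whatever A was observed.
mu0_hat <- predict(fit_a0, newdata = toy)
mu1_hat <- predict(fit_a1, newdata = toy)

head(cbind(mu0_hat, mu1_hat))
```

Stage (ii) then plugs these predictions into the estimating equations for \beta; that step is left to dcee() itself.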

Usage

dcee(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  control_reg_method = c("gam", "lm", "rf", "ranger", "sl", "sl.user-specified-library",
    "set_to_zero"),
  cross_fit = FALSE,
  cf_fold = 10,
  weighting_function = NULL,
  verbose = TRUE,
  ...
)

Arguments

data

A data.frame in long format.

id

Character scalar: column name for subject identifier.

outcome

Character scalar: column name for proximal/distal outcome.

treatment

Character scalar: column name for binary treatment {0,1}.

rand_prob

Character scalar: column name for randomization probability giving P(A_t=1\mid H_t) (must lie in (0,1)).

moderator_formula

RHS-only formula of moderators of the excursion effect (e.g., '~ 1', '~ Z', or '~ Z1 + Z2').

control_formula

RHS-only formula of covariates for learning nuisance outcome regressions. When 'control_reg_method = "gam"', 's(x)' terms are allowed (e.g., '~ x1 + s(x2)'). For SuperLearner methods, variables are extracted from this formula to build the design matrix 'X'.

availability

Optional character scalar: column name for availability indicator (0/1). If 'NULL', availability is taken as 1 for all rows.

control_reg_method

One of '"gam"', '"lm"', '"rf"', '"ranger"', '"sl"', '"sl.user-specified-library"', '"set_to_zero"'. See Details.

cross_fit

Logical; if 'TRUE', perform K-fold cross-fitting by subject id.

cf_fold

Integer; number of folds if 'cross_fit = TRUE' (default 10).

weighting_function

Either a single numeric constant applied to all rows, or a character column name in 'data' giving decision-point weights \omega_t.

verbose

Logical; print minimal preprocessing messages (default 'TRUE').

...

Additional arguments passed through to the chosen learner (e.g., 'num.trees', 'mtry' for random forests; 'sl.library' when 'control_reg_method = "sl.user-specified-library"').
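The 'cross_fit' and 'cf_fold' arguments describe K-fold cross-fitting by subject id. The following base R sketch shows one plausible way to form such folds, so that all rows of a subject land in the same fold. The toy data, the column name 'fold', and the random assignment scheme are assumptions made for illustration; the package may split subjects differently.

```r
# Illustrative sketch only; the fold assignment scheme is an assumption.
set.seed(42)
toy <- data.frame(
  userid = rep(1:12, each = 3),  # 12 subjects, 3 decision points each
  X = rnorm(36)
)
K <- 4  # plays the role of cf_fold

# Assign each subject (not each row) to one of K folds.
subjects <- unique(toy$userid)
fold_of_subject <- sample(rep(seq_len(K), length.out = length(subjects)))
names(fold_of_subject) <- subjects
toy$fold <- unname(fold_of_subject[as.character(toy$userid)])

for (k in seq_len(K)) {
  train_rows <- toy$fold != k
  test_rows <- toy$fold == k
  # A learner would be fit on toy[train_rows, ] and used to predict
  # nuisance values for toy[test_rows, ] only (out-of-fold prediction).
  stopifnot(!any(toy$userid[train_rows] %in% toy$userid[test_rows]))
}

table(toy$fold)
```

Splitting by subject rather than by row keeps a subject's correlated decision points from appearing in both the training and the prediction set.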

Details

**Learners.**
- 'gam' uses mgcv and supports 's(.)' terms in 'control_formula'.
- 'lm' uses base stats::lm.
- 'rf' uses randomForest; 'ranger' uses ranger.
- 'sl' / 'sl.user-specified-library' use SuperLearner. For the former, 'sl.library = c("SL.mean", "SL.glm", "SL.earth")' is used. For the latter, provide the library via '...', e.g., 'sl.library = c("SL.mean", ...)'.
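As an illustration of the 'sl.user-specified-library' option, the sketch below passes a custom 'sl.library' through '...'. It reuses the dataset and column names from the Examples section; the particular library c("SL.mean", "SL.glm") is an arbitrary choice made for this sketch, and the block is skipped when MRTAnalysis or SuperLearner is not installed.

```r
# Illustrative sketch; the chosen sl.library is an arbitrary example.
if (requireNamespace("MRTAnalysis", quietly = TRUE) &&
    requireNamespace("SuperLearner", quietly = TRUE)) {
  library(MRTAnalysis)
  library(SuperLearner)
  data(data_distal_continuous, package = "MRTAnalysis")

  fit_sl_user <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "sl.user-specified-library",
    sl.library = c("SL.mean", "SL.glm")  # forwarded to the learner via `...`
  )
  summary(fit_sl_user)
}
```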

**Notes.**
- Treatment must be coded 0/1; 'rand_prob' must lie strictly in (0,1).
- 'control_formula = ~ 1' is only valid with 'control_reg_method = "set_to_zero"'.
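The coding requirements on the treatment and randomization probability columns can be checked before calling dcee(). The helper below, 'check_mrt_inputs()', is hypothetical and not part of the package; it is a minimal sketch of such a pre-check in base R.

```r
# Hypothetical helper (not part of MRTAnalysis): pre-check the input columns.
check_mrt_inputs <- function(data, treatment, rand_prob) {
  a <- data[[treatment]]
  p <- data[[rand_prob]]
  # Treatment must be coded 0/1 (NA values also fail this check).
  if (!all(a %in% c(0, 1))) {
    stop("treatment column must be coded 0/1")
  }
  # Randomization probability must lie strictly between 0 and 1.
  if (any(is.na(p)) || any(p <= 0 | p >= 1)) {
    stop("rand_prob column must lie strictly in (0,1)")
  }
  invisible(TRUE)
}

good <- data.frame(A = c(0, 1, 1), prob_A = c(0.5, 0.3, 0.7))
bad <- data.frame(A = c(0, 1, 2), prob_A = c(0.5, 1, 0.7))

check_mrt_inputs(good, "A", "prob_A")  # passes silently
res <- tryCatch(
  check_mrt_inputs(bad, "A", "prob_A"),
  error = function(e) conditionMessage(e)
)
res
```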

Value

An object of class '"dcee_fit"' with components:

call

The matched call to dcee().

fit

A list returned by the two-stage helper with elements:

beta_hat

Named numeric vector of distal causal excursion effect estimates \beta. Names are "Intercept" and the moderator names (if any) from moderator_formula.

beta_se

Named numeric vector of standard errors for beta_hat (same order/names).

beta_varcov

Variance–covariance matrix of beta_hat (square matrix; row/column names match names(beta_hat)).

conf_int

Matrix of large-sample (normal) Wald 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %".

conf_int_tquantile

Matrix of small-sample (t-quantile) 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %"; degrees of freedom are provided in $df of the "dcee_fit" object.

regfit_a0

Stage-1 nuisance regression fit for \mu_0(H_t) (outcome model among A=0), or NULL when control_reg_method = "set_to_zero". Note: when cross_fit = TRUE, this is the learner object from the last fold and is provided for inspection only (do not use for out-of-fold prediction).

regfit_a1

Stage-1 nuisance regression fit for \mu_1(H_t) (outcome model among A=1); same caveats as regfit_a0 regarding cross_fit.

df

Small-sample degrees of freedom used for t-based intervals: number of unique subjects minus length(fit$beta_hat).
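To show how the two kinds of intervals relate, the sketch below builds normal (Wald) and t-quantile 95% intervals from a coefficient vector, its standard errors, and the degrees of freedom defined above. All numbers are made up, and the formula "estimate plus or minus quantile times standard error" is the textbook construction assumed here, not code taken from the package.

```r
# Illustrative sketch with made-up numbers; assumes the textbook interval
# construction estimate +/- quantile * SE.
beta_hat <- c(Intercept = 0.40, Z = -0.15)
beta_se <- c(Intercept = 0.10, Z = 0.08)
n_subjects <- 50

# Degrees of freedom: number of unique subjects minus number of coefficients.
df <- n_subjects - length(beta_hat)

z_crit <- qnorm(0.975)        # large-sample (normal) quantile
t_crit <- qt(0.975, df = df)  # small-sample (t) quantile

conf_int <- cbind(
  "2.5 %" = beta_hat - z_crit * beta_se,
  "97.5 %" = beta_hat + z_crit * beta_se
)
conf_int_tquantile <- cbind(
  "2.5 %" = beta_hat - t_crit * beta_se,
  "97.5 %" = beta_hat + t_crit * beta_se
)

conf_int
conf_int_tquantile
```

Because the t quantile exceeds the normal quantile for any finite df, the t-based intervals are always somewhat wider, which matters most when the number of subjects is small.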

Examples

data(data_distal_continuous, package = "MRTAnalysis")

## Fast example: marginal effect with linear nuisance (CRAN-friendly)
fit_lm <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1, # marginal (no moderators)
    control_formula = ~X, # simple linear nuisance
    availability = "avail",
    control_reg_method = "lm",
    cross_fit = FALSE
)
summary(fit_lm)
summary(fit_lm, show_control_fit = TRUE) # show Stage-1 fit info

## Moderated effect with GAM nuisance (allows smooth terms); may be slower

fit_gam <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~Z, # test moderation by Z
    control_formula = ~ s(X) + Z, # smooth in nuisance via mgcv::gam
    availability = "avail",
    control_reg_method = "gam",
    cross_fit = TRUE, cf_fold = 5
)
summary(fit_gam, lincomb = c(0, 1)) # linear combo: the Z coefficient
summary(fit_gam, show_control_fit = TRUE) # show Stage-1 fit info


## Optional: SuperLearner (runs only if installed)

if (requireNamespace("SuperLearner", quietly = TRUE)) {
    library(SuperLearner)
    fit_sl <- dcee(
        data = data_distal_continuous,
        id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
        moderator_formula = ~1,
        control_formula = ~ X + Z,
        availability = "avail",
        control_reg_method = "sl", # default library: SL.mean, SL.glm, SL.earth
        cross_fit = FALSE
    )
    summary(fit_sl)
}
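The 'lincomb = c(0, 1)' argument in the GAM example above selects a linear combination of the coefficients. The sketch below shows the standard computation such a combination usually involves, using made-up values in place of 'beta_hat' and 'beta_varcov'. That summary() computes exactly this is an assumption; the source only states that the combination picks out the Z coefficient.

```r
# Illustrative sketch with made-up numbers; assumes the usual formulas
# estimate = L' beta and SE = sqrt(L' V L).
beta_hat <- c(Intercept = 0.40, Z = -0.15)
beta_varcov <- matrix(
  c(0.0100, -0.0020,
    -0.0020, 0.0064),
  nrow = 2, byrow = TRUE,
  dimnames = list(names(beta_hat), names(beta_hat))
)

L <- c(0, 1)  # picks out the Z coefficient

est <- sum(L * beta_hat)
se <- sqrt(drop(t(L) %*% beta_varcov %*% L))

c(estimate = est, se = se)
```

With L = c(0, 1) the result reduces to the Z coefficient and the square root of its own variance; other choices of L, such as c(1, 1), would also bring in the covariance term.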


MRTAnalysis documentation built on Sept. 9, 2025, 5:41 p.m.