contglm: contglm Robust generalized linear models for interpretable...

View source: R/contglm.R

contglm    R Documentation

contglm: Robust generalized linear models for interpretable causal inference for continuous or ordered treatments. Currently supports only models for the CATE, the log OR, and the log RR of the form '1(A>0) * f(W) + A * g(W)', with 'f' and 'g' user-specified.

Description

Robust generalized linear models for interpretable causal inference for continuous or ordered treatments. Currently supports only models for the CATE, the log OR, and the log RR of the form '1(A>0) * f(W) + A * g(W)', with 'f' and 'g' user-specified.

Usage

contglm(
  formula_continuous,
  formula_binary = formula_continuous,
  data,
  W,
  A,
  Y,
  estimand = c("CATE", "OR", "RR"),
  learning_method = c("HAL", "SuperLearner", "glm", "glmnet", "gam", "mars", "ranger",
    "xgboost"),
  cross_fit = FALSE,
  sl3_Learner_A = NULL,
  sl3_Learner_Y = NULL,
  formula_Y = as.formula(paste0("~ . + . *", A)),
  formula_HAL_Y = paste0("~ . + h(.,", A, ")"),
  HAL_args_Y = list(smoothness_orders = 1, max_degree = 2, num_knots = c(15, 10, 1)),
  HAL_fit_control = list(parallel = F),
  delta_epsilon = 0.025,
  verbose = TRUE,
  ...
)
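
A minimal sketch of a call, assuming the causalGLM package is installed. The column names W1, A and Y and the simulated data are hypothetical. The treatment is simulated as zero-inflated and continuous so that both the '1(A>0)' and the 'A' components of the working model carry information. The call is skipped when the package is not available.

```r
# Simulated data (hypothetical): zero-inflated continuous treatment A.
set.seed(1)
n  <- 500
W1 <- runif(n, -1, 1)
A  <- rbinom(n, 1, plogis(W1)) * runif(n, 0.5, 2)  # A = 0 for untreated units
Y  <- W1 + 1 * (A > 0) + A * (1 + W1) + rnorm(n)
sim_data <- data.frame(W1 = W1, A = A, Y = Y)

# Only arguments listed in the Usage signature are passed.
if (requireNamespace("causalGLM", quietly = TRUE)) {
  fit <- causalGLM::contglm(
    formula_continuous = ~ 1 + W1,  # g(W): interaction with A
    formula_binary     = ~ 1,       # f(W): interaction with 1(A > 0)
    data = sim_data, W = "W1", A = "A", Y = "Y",
    estimand = "CATE", learning_method = "glm", verbose = FALSE
  )
  print(fit)
}
```

Here learning_method = "glm" is chosen only to keep the sketch fast; the default method is "HAL".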

Arguments

formula_continuous

An R formula object specifying the continuous component of the parametric form of the continuous treatment estimand. That is (using the CATE as an example), formula_continuous specifies the interaction with 'A', i.e. 'g(W)', in the model 'E[Y|A=a,W] - E[Y|A=0,W] = 1(a>0) * f(W) + a * g(W)'.

formula_binary

An R formula object specifying the binary component of the parametric form of the continuous treatment estimand. That is (using the CATE as an example), formula_binary specifies the interaction with '1(A>0)', i.e. 'f(W)', in the model 'E[Y|A=a,W] - E[Y|A=0,W] = 1(a>0) * f(W) + a * g(W)'. By default, the same as formula_continuous.
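
As a rough illustration of how the two formulas combine into the working model, the sketch below builds the two design matrices by hand in base R. It is not the package's internal code; the data, column names and coefficient values are made up for the example.

```r
# Toy data with hypothetical column names.
toy <- data.frame(W1 = c(0.2, -0.5, 1.0), A = c(0, 1.5, 0.7))

formula_binary     <- ~ 1        # f(W) = b0
formula_continuous <- ~ 1 + W1   # g(W) = c0 + c1 * W1

# Basis for f(W) scaled by 1(A > 0); basis for g(W) scaled by A.
X_binary     <- model.matrix(formula_binary, toy) * as.numeric(toy$A > 0)
X_continuous <- model.matrix(formula_continuous, toy) * toy$A

# Working-model value 1(A>0) * f(W) + A * g(W) for made-up coefficients.
beta_binary     <- 1
beta_continuous <- c(2, 0.5)
cate <- drop(X_binary %*% beta_binary + X_continuous %*% beta_continuous)
cate
```

Rows with A = 0 contribute nothing to either component, which is why the model describes a contrast against the A = 0 level.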

data

A data.frame or matrix containing the numeric values corresponding to the nodes W, A and Y. Can also be an npglm fit/output object, in which case the machine-learning fits are reused (see vignette).

W

A character vector of covariates contained in data

A

A character name for the treatment assignment variable contained in data

Y

A character name for the outcome variable contained in data (outcome can be continuous, nonnegative or binary depending on method)

estimand

Estimand/parameter to estimate. Options are:
'CATE': conditional treatment effect using working model 'CATE(a,W) = E[Y|A=a,W] - E[Y|A=0,W] = 1(a>0) * f(W) + a * g(W)'
'OR': conditional odds ratio using working model 'log OR(a,W) = log[P(Y=1|A=a,W)/P(Y=0|A=a,W)] - log[P(Y=1|A=0,W)/P(Y=0|A=0,W)] = 1(a>0) * f(W) + a * g(W)'
'RR': conditional relative risk using working model 'log RR(a,W) = log E[Y|A=a,W] - log E[Y|A=0,W] = 1(a>0) * f(W) + a * g(W)'
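
For concreteness, the three estimands can be computed from a pair of conditional means at a fixed W. The values below are hypothetical and assume a binary outcome so that all three quantities are defined.

```r
# Hypothetical conditional means at one fixed W (binary Y).
p_a <- 0.6   # E[Y | A = a, W]
p_0 <- 0.4   # E[Y | A = 0, W]

cate   <- p_a - p_0                                    # CATE(a, W)
log_or <- log(p_a / (1 - p_a)) - log(p_0 / (1 - p_0))  # log OR(a, W)
log_rr <- log(p_a) - log(p_0)                          # log RR(a, W)

c(CATE = cate, log_OR = log_or, log_RR = log_rr)
```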

learning_method

Machine-learning method to use. This is overridden if argument sl3_Learner_A or sl3_Learner_Y is provided. Options are:
"SuperLearner": A stacked ensemble of all of the below that uses cross-validation to adaptively choose the best learner.
"HAL": Adaptive robust automatic machine-learning using the Highly Adaptive Lasso (hal9001).
"glm": Fit nuisances with a parametric model.
"glmnet": Learn using lasso with glmnet.
"gam": Learn using generalized additive models with mgcv.
"mars": Multivariate adaptive regression splines with earth.
"ranger": Robust random forests with the package ranger.
"xgboost": Learn using a default cross-validation-tuned xgboost library with max_depths 3 to 7.
Note that speed can vary considerably depending on the learner chosen.

cross_fit

Whether to cross-fit the initial estimator. This is always set to FALSE if argument sl3_Learner_A and/or sl3_Learner_Y is provided. learning_method = 'SuperLearner' is always cross-fitted (default). learning_method = 'xgboost' and 'ranger' are always cross-fitted regardless of the value of cross_fit. All other learning methods are cross-fitted only if 'cross_fit = TRUE'. Note that it is not necessary to cross-fit glm, glmnet, gam or mars as long as the dimension of W is not too high. In smaller samples and lower dimensions, cross-fitting may in fact hurt performance.
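
The general idea of cross-fitting can be sketched in base R as follows: each observation's prediction comes from a model fitted on the other folds. This illustrates the concept only and is not the package's implementation; the data, the number of folds K and the glm outcome model are assumptions made for the example.

```r
# Simulated data (hypothetical) with a zero-inflated continuous treatment.
set.seed(2)
n <- 200
d <- data.frame(W1 = rnorm(n))
d$A <- rbinom(n, 1, plogis(d$W1)) * runif(n, 0.5, 2)
d$Y <- d$W1 + d$A + rnorm(n)

K    <- 5
fold <- sample(rep(seq_len(K), length.out = n))  # random, balanced fold labels
pred <- rep(NA_real_, n)

for (k in seq_len(K)) {
  train <- d[fold != k, ]                   # fit on the other K - 1 folds
  fit_k <- glm(Y ~ W1 * A, data = train)    # Gaussian outcome regression
  pred[fold == k] <- predict(fit_k, newdata = d[fold == k, ])  # out-of-fold
}
```

After the loop, pred holds an out-of-fold estimate of E[Y|A,W] for every row.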

sl3_Learner_A

An sl3 Learner object used to estimate the nuisance functions 'P(A>0|W)' and 'E[A|W]' with machine-learning. Note that cross_fit is automatically set to FALSE if this argument is provided. If you wish to cross-fit the learner sl3_Learner_A, then do: sl3_Learner_A <- Lrnr_cv$new(sl3_Learner_A). Cross-fitting is recommended for all tree-based algorithms such as random forests and gradient boosting.

sl3_Learner_Y

An sl3 Learner object used to nonparametrically estimate E[Y|A,W] with machine-learning. Note that cross_fit is automatically set to FALSE if this argument is provided. Cross-fitting is recommended for all tree-based algorithms such as random forests and gradient boosting.

formula_Y

Only used if 'learning_method ...'. By default, 'formula_Y = ~ . + . * A' so that additive learners still model treatment interactions.
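
To see which terms the default generates, the default expression from the Usage signature can be expanded against a small data frame. The column names W1, W2 and A are hypothetical, and how the package itself expands the formula internally is not shown here.

```r
A <- "A"                                          # treatment column name
formula_Y <- as.formula(paste0("~ . + . *", A))   # default from Usage

toy  <- data.frame(W1 = c(0.1, 0.4, 0.9), W2 = c(1, 0, 1), A = c(0, 0.5, 2))
labs <- attr(terms(formula_Y, data = toy), "term.labels")
labs
```

The expansion contains the main effects plus an interaction of each covariate with the treatment.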

formula_HAL_Y

A HAL formula string to be passed to fit_hal. See the 'formula' argument of fit_hal for syntax and example use.

HAL_args_Y

A list of parameters for the semiparametric Highly Adaptive Lasso estimator for E[Y|A,W]. Should contain the parameters:
1. 'smoothness_orders': Smoothness order for the HAL estimator of E[Y|A,W] (see fit_hal). 'smoothness_orders = 1' is piece-wise linear. 'smoothness_orders = 0' is piece-wise constant.
2. 'max_degree': Max interaction degree for the HAL estimator of E[Y|A,W] (see fit_hal).
3. 'num_knots': A vector of the number of knots by interaction degree for the HAL estimator of E[Y|A,W] (see fit_hal). Used to generate spline basis functions.

HAL_fit_control

See the argument 'fit_control' of fit_hal.

delta_epsilon

Step size of the iterative targeted maximum likelihood estimator. 'delta_epsilon = 1' leads to large step sizes and fast convergence. 'delta_epsilon = 0.01' leads to slower convergence but possibly better performance. A large value can be useful in high dimensions.

verbose

Passed to tmle3 routines. Prints additional information if TRUE.

...

Not used


Larsvanderlaan/causalGLM documentation built on April 14, 2022, 12:51 a.m.