View source: R/rctglm_with_prognosticscore.R
rctglm_with_prognosticscore | R Documentation |
The procedure uses fit_best_learner to fit a prognostic model to historical data, then uses that model to produce counterfactual predictions. These predictions form a prognostic score, which is adjusted for as a covariate in the rctglm procedure. See "Powering RCTs for marginal effects with GLMs using prognostic score adjustment" by Højbjerre-Frandsen et al. (2025) for more details.
rctglm_with_prognosticscore(
formula,
exposure_indicator,
exposure_prob,
data,
family = gaussian,
estimand_fun = "ate",
estimand_fun_deriv0 = NULL,
estimand_fun_deriv1 = NULL,
cv_variance = FALSE,
cv_variance_folds = 10,
...,
data_hist,
prog_formula = NULL,
cv_prog_folds = 5,
learners = default_learners(),
verbose = options::opt("verbose")
)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in the glm documentation. |
exposure_indicator |
(name of) the binary variable in |
exposure_prob |
a |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. |
family |
a description of the error distribution and link
function to be used in the model. For |
estimand_fun |
a |
estimand_fun_deriv0 |
a |
estimand_fun_deriv1 |
a |
cv_variance |
a |
cv_variance_folds |
a |
... |
Additional arguments passed to |
data_hist |
a |
prog_formula |
an object of class "formula" (or one that can be coerced to that class):
a symbolic description of the prognostic model to be fitted to |
cv_prog_folds |
a |
learners |
a |
verbose |
|
Prognostic covariate adjustment involves training a prognostic model (using fit_best_learner) on historical data (data_hist) to predict the response in that data.
Assuming that the historical data is representative of the comparator group in a "new" data set (group 0 of the binary exposure_indicator in data), we can use the prognostic model to predict the counterfactual outcome of all observations. This includes the observations in the comparator group, for which the prediction of the counterfactual outcome coincides with a prediction of the actual outcome.
This prediction, called the prognostic score, is then used as an adjustment covariate in the GLM: the prognostic score is added to formula before rctglm is called with the modified formula.
See the reference in the description for more details.
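The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: lm() with a polynomial term stands in for the learner selected by fit_best_learner, the simulated data and the names hist_dat, trial_dat and prog are hypothetical, and the coefficient on A from an identity-link model is used as a simple stand-in for the marginal effect that rctglm estimates.

```r
# Minimal base-R sketch of prognostic score adjustment.
# Assumptions: lm() replaces the learner chosen by fit_best_learner, and
# all data and object names below are illustrative, not part of the package.
set.seed(1)
n <- 200

# Historical data: comparator-only observations (no exposure)
hist_dat <- data.frame(W1 = runif(n, min = -2, max = 2))
hist_dat$Y <- 1 + 1.5 * abs(sin(hist_dat$W1)) + rnorm(n)

# "New" trial data with a randomised binary exposure A
trial_dat <- data.frame(
  W1 = runif(n, min = -2, max = 2),
  A = rbinom(n, 1, 0.5)
)
trial_dat$Y <- 1 + 1.5 * abs(sin(trial_dat$W1)) + 2 * trial_dat$A + rnorm(n)

# Step 1: fit the prognostic model on the historical data
prog_model <- lm(Y ~ poly(W1, 4), data = hist_dat)

# Step 2: predict the outcome under no exposure for every trial observation;
# this prediction is the prognostic score
trial_dat$prog <- predict(prog_model, newdata = trial_dat)

# Step 3: add the prognostic score to the formula as an adjustment covariate
fit <- glm(Y ~ A + W1 + prog, family = gaussian(), data = trial_dat)

# Estimated exposure effect (the simulation uses a true effect of 2)
coef(fit)["A"]
```

In the sketch, the score is computed for both exposure groups from a model that never saw exposed observations, which mirrors the assumption that the historical data represents the comparator group.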
rctglm_with_prognosticscore returns an object of class rctglm_prog, which inherits from rctglm.
An rctglm_prog object is a list with the same components as an rctglm object (see the Value section of rctglm for a breakdown of the structure), but with one additional list element:
prognostic_info: List with information about the prognostic model fitted on historical data. It has components:
formula: The formula with symbolic description of how the response is modelled as a function of covariates in the models
model_fit: A trained workflow, the result of fit_best_learner
learners: A list of learners used for the discrete super learner
cv_folds: The number of folds used for cross validation
data: The historical data used for cross validation when fitting and testing models
See prog for the method to extract information about the prognostic model. The function used to fit the prognostic model is fit_best_learner().
See rctglm() for the function and class this inherits from.
# Generate some data
n <- 100
b0 <- 1
b1 <- 1.5
b2 <- 2
W1 <- runif(n, min = -2, max = 2)
exp_prob <- .5
dat_treat <- glm_data(
Y ~ b0+b1*abs(sin(W1))+b2*A,
W1 = W1,
A = rbinom(n, 1, exp_prob)
)
dat_notreat <- glm_data(
Y ~ b0+b1*abs(sin(W1)),
W1 = W1
)
learners <- list(
mars = list(
model = parsnip::set_engine(
parsnip::mars(
mode = "regression", prod_degree = 3
),
"earth"
)
),
lm = list(
model = parsnip::set_engine(
parsnip::linear_reg(),
"lm"
)
)
)
ate <- rctglm_with_prognosticscore(
formula = Y ~ .,
exposure_indicator = A,
exposure_prob = exp_prob,
data = dat_treat,
family = gaussian(),
estimand_fun = "ate",
data_hist = dat_notreat,
learners = learners)
# Pull information on estimand
estimand(ate)