```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```r
library(postcard)
library(magrittr)
withr::local_seed(1395878)
withr::local_options(list(postcard.verbose = 0))
```
For introductory examples of how to use the package, see `vignette("postcard")`.
In this vignette, we explain how to alter the default behavior of the model fitting functions `rctglm()` and `rctglm_with_prognosticscore()`.
As in `vignette("postcard")`, we simulate data using the `glm_data()` function from the package.
```r
n <- 1000
b0 <- 1
b1 <- 3
b2 <- 2

dat_gaus <- glm_data(
  Y ~ b0 + b1 * sin(W)^2 + b2 * A,
  W = runif(n, min = -2, max = 2),
  A = rbinom(n, 1, prob = 1/2)
)

dat_gaus_hist <- glm_data(
  Y ~ b0 + b1 * sin(W)^2,
  W = runif(n, min = -2, max = 2)
)

dat_pois <- glm_data(
  Y ~ b0 + b1 * sin(W)^2 + b2 * A,
  W = runif(n, min = -2, max = 2),
  A = rbinom(n, 1, 1/2),
  family = poisson(link = "log")
)
```
See the package-level options documentation in `options()` for information on how to change package behavior through options and environment variables. The only option is `verbose`, which controls the amount of information printed to the console. By default, `verbose = 2`, meaning various information is printed throughout the algorithm. Set `verbose = 1` for a little less information or `verbose = 0` for no information.
Below we showcase the information that is printed with different specifications of verbosity.
```r
# Default amount of printing
ate <- rctglm(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  verbose = 2)
ate_prog <- rctglm_with_prognosticscore(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  data_hist = dat_gaus_hist,
  verbose = 2)

# A little less printing
ate <- rctglm(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  verbose = 1)
ate_prog <- rctglm_with_prognosticscore(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  data_hist = dat_gaus_hist,
  verbose = 1)

# No printing
ate <- rctglm(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  verbose = 0)
ate_prog <- rctglm_with_prognosticscore(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  data_hist = dat_gaus_hist,
  verbose = 0)
```
Verbosity is suppressed in the rest of the vignette by setting the option `postcard.verbose` to `0`.
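For reference, the same suppression can be achieved for an entire session with base R's `options()`; the setup chunk of this vignette uses `withr::local_options()` to keep the change local to the vignette:

```r
# Suppress all printing from the package's functions for the current session
options(postcard.verbose = 0)
```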
The default `estimand_fun` in `rctglm()` and `rctglm_with_prognosticscore()` is the average treatment effect (ATE).
However, it's possible to specify any estimand by giving a function with the two named arguments `psi0` and `psi1`. Note that in addition to `estimand_fun`, the functions also take the arguments `estimand_fun_deriv0` and `estimand_fun_deriv1`, which are the derivatives of `estimand_fun` with respect to `psi0` and `psi1`, respectively. By default, these are `NULL`, meaning symbolic differentiation is performed on `estimand_fun` to derive them automatically.
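As a sketch of what supplying the derivatives manually could look like, the ATE and its derivatives (the constants -1 and 1) could be passed directly. The exact signature expected of the derivative functions is an assumption here, so the sketch uses `...` to accept whatever arguments are passed:

```r
# Sketch: supply the ATE estimand function together with its derivatives,
# bypassing the symbolic differentiation. The derivative functions use `...`
# since their exact expected signature is assumed, not documented here.
ate_manual_derivs <- rctglm(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  estimand_fun = function(psi1, psi0) psi1 - psi0,
  estimand_fun_deriv0 = function(...) -1, # d/dpsi0 of (psi1 - psi0)
  estimand_fun_deriv1 = function(...) 1   # d/dpsi1 of (psi1 - psi0)
)
```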
Note that when `verbose > 0`, information about the result of the symbolic differentiation is printed to the console. Although verbosity is otherwise muted in this vignette, we run the code below with `verbose = 1` to showcase this.
Built in are the ATE and the rate ratio, which can be specified with character strings. As is apparent from the documentation of `rctglm()` and `rctglm_with_prognosticscore()`, the default `estimand_fun` is `"ate"`, and similarly the user can specify `estimand_fun = "rate_ratio"` to use the estimand function `psi1 / psi0`, as seen below:
```r
rate_ratio <- rctglm(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_pois,
  family = "poisson",
  estimand_fun = "rate_ratio",
  verbose = 1)
rate_ratio$estimand_funs
```
Below is an example showing the specification of a custom-defined estimand function with arguments `psi0` and `psi1`.
```r
nonsense_estimand_fun <- function(psi1, psi0) {
  psi1 / sqrt(psi0) * 2 - 1
}

nonsense_estimand <- rctglm(
  formula = Y ~ A * W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_pois,
  family = poisson(),
  estimand_fun = nonsense_estimand_fun,
  verbose = 1)
nonsense_estimand$estimand_funs
```
The variance is estimated as the variance of the influence function of the marginal effect. During the calculation of this function, counterfactual predictions are made for all observations, using a GLM to predict their outcome as if they were in exposure group 0 and 1, respectively. `cv_variance` is an argument in `rctglm()` and `rctglm_with_prognosticscore()` that enables obtaining these counterfactual predictions as out-of-sample (OOS) predictions by using cross-validation.
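As a minimal sketch (assuming `cv_variance` is a logical flag), requesting out-of-sample counterfactual predictions for the variance estimation could look like this:

```r
# Sketch: use cross validation to obtain out-of-sample counterfactual
# predictions when estimating the variance of the influence function.
# We assume cv_variance takes TRUE/FALSE; the number of folds is left at
# its default.
ate_oos <- rctglm(
  formula = Y ~ A + W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  cv_variance = TRUE)
```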
The `rctglm_with_prognosticscore()` function uses `fit_best_learner()` to fit a prognostic model to the historical data, `data_hist`. Thereafter, the model is used to predict prognostic scores for all observations in `data`, before using these scores as a covariate when performing plug-in estimation in a GLM using `rctglm()`.
The default behavior of `fit_best_learner()`, and thus of fitting the prognostic model on `data_hist` within `rctglm_with_prognosticscore()`, is to fit a discrete super learner (discrete to avoid overfitting) by finding the model with the lowest RMSE among a list of candidate models. The algorithm uses a default of 5 folds for cross-validation (`cv_prog_folds`), and if no formula is given for the prognostic model (`prog_formula`), the function attempts to model a response with the same name as given in `formula` using an intercept and main effects of all variables in `data_hist`.
`fit_best_learner()` has a list of default models to use for fitting the discrete super learner, which can be seen in the section below. However, it's easy for the user to specify a list of other learners to train the discrete super learner. The package utilises the tidymodels framework, and below it can be seen what such a list of models can look like.
Below we show the code of the unexported `default_learners` function, which creates the list of default learners used in `fit_best_learner()` and `rctglm_with_prognosticscore()`.
The body of the function thus represents a valid way of specifying the `learners` argument.
```r
default_learners
```
A listing of models is available at the tidymodels website, and the user can specify a list of any of those models as the `learners` argument.
Below is an example of fitting the prognostic model as a discrete super learner that selects the model with the best RMSE among a random forest and a linear support vector machine.
```r
learners <- list(
  rf = list(
    model = parsnip::rand_forest(
      mode = "regression",
      trees = 500,
      min_n = parsnip::tune("min_n")
    ) %>%
      parsnip::set_engine("ranger"),
    grid = data.frame(
      min_n = 1:10
    )
  ),
  svm.linear = list(
    model = parsnip::svm_linear(
      mode = "regression",
      cost = parsnip::tune("cost"),
      margin = parsnip::tune("margin")) %>%
      parsnip::set_engine("LiblineaR"),
    grid = data.frame(
      cost = 1:5,
      margin = seq(0.1, 0.5, 0.1)
    )
  )
)

model_own_learners <- rctglm_with_prognosticscore(
  formula = Y ~ A * W,
  exposure_indicator = A,
  exposure_prob = 1/2,
  data = dat_gaus,
  data_hist = dat_gaus_hist,
  learners = learners)
```
It's possible to view information regarding the fit of the prognostic model in the `rctglm_prog` class object that `rctglm_with_prognosticscore()` returns by looking at the list element `prognostic_info`. A shorthand way of doing this is to use the method `prog()`.
Inside this list element are the elements:

- `formula`: The formula used as `preproc` when fitting models in `fit_best_learner()`
- `model_fit`: The result of `fit_best_learner()`
- `learners`: The list of learners used
- `cv_folds`: The number of folds used for cross-validation
- `data`: The data given as `data_hist`, which the prognostic model is fitted on

Note that we overwrite the value of `data` to only show the first rows, so the printout does not take up too much space in the vignette.
```r
prog_info <- prog(model_own_learners)
prog_info$data <- head(prog_info$data)
prog_info
```