library(polle)
This vignette is a guide to policy_eval() and some of the associated S3 methods. The purpose of policy_eval() is to estimate (evaluate) the value of a user-defined policy or a policy learning algorithm. For details on the methodology, see the associated paper [@nordland2023policy].
We consider a fixed two-stage problem as a general setup. We simulate data using sim_two_stage() and create a policy_data object using policy_data():
d <- sim_two_stage(n = 2e3, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
User-defined policies are created using policy_def(). In this case we define a simple static policy always selecting action '1':
p1 <- policy_def(policy_functions = '1', reuse = TRUE, name = "A=1")
As we want to apply the same policy function at both stages, we set reuse = TRUE.
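If the stages call for different policy functions, my understanding is that policy_def() also accepts a list with one entry per stage (leaving reuse at its default); a sketch:
## sketch: stage-specific policy functions (list input assumed)
p1_list <- policy_def(
  policy_functions = list('1', '1'), # one entry per stage
  name = "A=1 (stage-specific)"
)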
policy_eval() implements three types of policy evaluations: inverse probability weighting estimation, outcome regression estimation, and doubly robust (DR) estimation. As doubly robust estimation is a combination of the two other types, we focus on this approach. For details on the implementation, see Algorithm 1 in [@nordland2023policy].
(pe1 <- policy_eval(policy_data = pd, policy = p1, type = "dr"))
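For comparison, the two other estimator types can be requested via the type argument; a sketch, assuming the type labels "ipw" and "or":
## sketch: the other estimator types (type labels assumed)
policy_eval(policy_data = pd, policy = p1, type = "ipw") # inverse probability weighting
policy_eval(policy_data = pd, policy = p1, type = "or")  # outcome regression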
policy_eval() returns an object of class policy_eval, which prints like a lava::estimate object. The policy value estimate and variance are available via coef() and vcov():
coef(pe1)
vcov(pe1)
policy_eval objects
The policy_eval object behaves like a lava::estimate object, and the underlying estimate object can be accessed directly using estimate().
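A minimal sketch of extracting the underlying estimate object (est1 is just an illustrative name):
(est1 <- estimate(pe1))
class(est1)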
estimate objects make it easy to work with estimates that have an iid decomposition given by the influence curve/function; see the estimate vignette. The influence curve is available via IC():
IC(pe1) |> head()
Merging estimate objects allows the user to get inference for transformations of the estimates via the Delta method. Here we get inference for the average treatment effect, both as a difference and as a ratio:
p0 <- policy_def(policy_functions = 0, reuse = TRUE, name = "A=0")
pe0 <- policy_eval(policy_data = pd, policy = p0, type = "dr")
(est <- merge(pe0, pe1))
estimate(est, function(x) x[2] - x[1], labels = "ATE-difference")
estimate(est, function(x) x[2] / x[1], labels = "ATE-ratio")
So far we have relied on the default generalized linear models for the nuisance g-models and Q-models. By default, a single g-model is trained across all stages using the state/Markov-type history; see the policy_data vignette. Use get_g_functions() to access the fitted model:
(gf <- get_g_functions(pe1))
The g-functions can be used as input to a new policy evaluation:
policy_eval(policy_data = pd, g_functions = gf, policy = p0, type = "dr")
or we can get the associated predicted values:
predict(gf, new_policy_data = pd) |> head(6)
Similarly, we can inspect the Q-functions using get_q_functions():
get_q_functions(pe1)
Note that a model is trained for each stage. Again, we can predict from the Q-models using predict().
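A sketch of predicting from the fitted Q-functions, assuming the same predict() interface as for the g-functions above:
qf <- get_q_functions(pe1)
predict(qf, new_policy_data = pd) |> head(6)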
Usually, we want to specify the nuisance models ourselves using the g_models and q_models arguments:
pe1 <- policy_eval(pd,
                   policy = p1,
                   g_models = list(
                     g_sl(formula = ~ BB + L_1, SL.library = c("SL.glm", "SL.ranger")),
                     g_sl(formula = ~ BB + L_1 + C_2, SL.library = c("SL.glm", "SL.ranger"))
                   ),
                   g_full_history = TRUE,
                   q_models = list(
                     q_glm(formula = ~ A * (B + C_1)),      # including action interactions
                     q_glm(formula = ~ A * (B + C_1 + C_2)) # including action interactions
                   ),
                   q_full_history = TRUE)
Here we train a super learner g-model for each stage using the full available history, and a generalized linear model for each of the Q-models. The formula argument is used to construct the model frame passed to the model for training (and prediction). The valid formula terms, which depend on g_full_history and q_full_history, are available via get_history_names():
get_history_names(pd)            # state/Markov history
get_history_names(pd, stage = 1) # full history
get_history_names(pd, stage = 2) # full history
Remember that the action variable at the current stage is always named A. Some models, like glm, require interactions to be specified via the model frame. Thus, for some models, it is important to include action interaction terms for the Q-models.
The value of a learned policy is an important performance measure, and policy_eval() allows for direct evaluation of a given policy learning algorithm. For details, see Algorithm 4 in [@nordland2023policy].
In polle, policy learning algorithms are specified using policy_learn(); see the associated vignette. These functions can be directly evaluated in policy_eval():
policy_eval(pd, policy_learn = policy_learn(type = "ql"))
In the above example we evaluate the policy estimated via Q-learning. Alternatively, we can first learn the policy and then pass it to policy_eval():
p_ql <- policy_learn(type = "ql")(pd, q_models = q_glm())
policy_eval(pd, policy = get_policy(p_ql))
A key feature of policy_eval() is that it allows for easy cross-fitting of the nuisance models as well as the learned policy. Here we specify two-fold cross-fitting via the M argument:
pe_cf <- policy_eval(pd, policy_learn = policy_learn(type = "ql"), M = 2)
Specifically, both the nuisance models and the optimal policy are fitted on each training fold. Subsequently, the doubly robust value score is calculated on the validation folds. The policy_eval object now consists of a list of policy_eval objects associated with each fold:
pe_cf$folds$fold_1 |> head()
pe_cf$cross_fits$fold_1
In order to save memory, particularly when cross-fitting, it is possible not to save the nuisance models via the save_g_functions and save_q_functions arguments.
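A sketch of a memory-light cross-fitted evaluation using these arguments (assuming they are logical flags):
policy_eval(pd,
            policy_learn = policy_learn(type = "ql"),
            M = 2,
            save_g_functions = FALSE, # assumed logical flag
            save_q_functions = FALSE) # assumed logical flag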
future.apply
It is easy to parallelize the cross-fitting procedure via the future.apply package:
library(future.apply)
plan("multisession") # local parallel processing
library("progressr") # progress bar
handlers(global = TRUE)
policy_eval(pd,
            policy_learn = policy_learn(type = "ql"),
            q_models = q_rf(),
            M = 20)
plan("sequential") # resetting to sequential processing
sessionInfo()