policy_data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(polle)

This vignette is a guide to policy_data(). As the name suggests, the function creates a policy_data object with a specific data structure that makes it easy to use in combination with policy_def(), policy_learn(), and policy_eval(). The vignette is also a guide to some of the associated S3 functions that transform or access parts of the data; see ?policy_data and methods(class="policy_data").
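
The available S3 methods can be listed directly:

methods(class = "policy_data")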

We will start by looking at a simple single-stage example, then consider a fixed two-stage example with varying action sets and data in wide format, and finally we will look at an example with a stochastic number of stages and data in long format.

Single-stage: wide data

Consider a simple single-stage problem with covariates/state variables $(Z, L, B)$, binary action variable $A$, and utility outcome $U$. We use sim_single_stage() to simulate data:

(d <- sim_single_stage(n = 5e2, seed=1)) |> head()

We give instructions to policy_data() which variables define the action, the state covariates, and the utility variable:

pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U")
pd

In the single-stage case the history $H$ is just $(B, Z, L)$. We access the history and actions using get_history():

get_history(pd)$H |> head()
get_history(pd)$A |> head()

Similarly, we access the utility outcomes $U$:

get_utility(pd) |> head()
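
To see how the policy_data object feeds into the rest of the package, we can define a simple static policy via policy_def() and evaluate it with policy_eval(). This is only a minimal sketch using the default settings; see ?policy_def and ?policy_eval for details:

# static policy that always assigns action 1 (see ?policy_def):
p_static <- policy_def(1)
# estimate the value of the policy under the default settings (see ?policy_eval):
policy_eval(policy_data = pd, policy = p_static)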
rm(list = ls())

Two-stage: wide data

Consider a two-stage problem with observations $O = (B, BB, L_{1}, C_{1}, U_{1}, A_{1}, L_{2}, C_{2}, U_{2}, A_{2}, U_{3})$. Following the general notation introduced in Section 3.1 of [@nordland2023policy], $(B, BB)$ are the baseline covariates, $S_{k} = (L_{k}, C_{k})$ are the state covariates at stage $k$, $A_{k}$ is the action at stage $k$, and $U_{k}$ is the reward at stage $k$. The utility is the sum of the rewards, $U = U_{1} + U_{2} + U_{3}$.

We use sim_two_stage_multi_actions() to simulate data:

d <- sim_two_stage_multi_actions(n=2e3, seed = 1)
colnames(d)

Note that the data is in wide format. The data is transformed using policy_data() with instructions on which variables define the actions, baseline covariates, state covariates, and the rewards:

pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd

The length of the character vector action determines the number of stages $K$ (in this case 2). If the number of stages is 2 or more, the covariates argument must be a named list. Each element must be a character vector with length equal to the number of stages. If a covariate is not available at a given stage, we insert an NA value, e.g., L = c(NA, "L_2").

Finally, the utility argument must be a single character string (the utility is observed after stage $K$) or a character vector of length $K+1$ with the names of the rewards.
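
As an illustration, suppose that L had only been measured at the second stage and that only a final utility outcome had been recorded. A hypothetical variant of the call above could then look as follows (in the simulated data, L_1, U_1, and U_2 are of course available):

# variant call with a missing stage 1 covariate and a single utility outcome:
pd_alt <- policy_data(d,
                      action = c("A_1", "A_2"),
                      baseline = c("B", "BB"),
                      covariates = list(L = c(NA, "L_2"),
                                        C = c("C_1", "C_2")),
                      utility = "U_3")
pd_alt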

In this example, the observed action sets vary for each stage. get_action_set() returns the global action set and get_stage_action_sets() returns the action set for each stage:

get_action_set(pd)
get_stage_action_sets(pd)

The full histories $H_1 = (B, BB, L_{1}, C_{1})$ and $H_2=(B, BB, L_{1}, C_{1}, A_{1}, L_{2}, C_{2})$ are available using get_history() and full_history = TRUE:

get_history(pd, stage = 1, full_history = TRUE)$H |> head()
get_history(pd, stage = 2, full_history = TRUE)$H |> head()

Similarly, we access the associated actions at each stage via list element A:

get_history(pd, stage = 1, full_history = TRUE)$A |> head()
get_history(pd, stage = 2, full_history = TRUE)$A |> head()

Alternatively, the state/Markov type history and actions are available using full_history = FALSE:

get_history(pd, full_history = FALSE)$H |> head()
get_history(pd, full_history = FALSE)$A |> head()

Note that policy_data() renames the action variables to A_1, A_2, ... in the full-history case and to A in the state/Markov history case.

As in the single-stage case, we access the utility, i.e., the sum of the rewards, using get_utility():

get_utility(pd) |> head()

Multi-stage: long data

In this example we illustrate how polle handles decision processes with a stochastic number of stages, see Section 3.5 in [@nordland2023policy]. The data is simulated using sim_multi_stage(). Detailed information on the simulation is available in ?sim_multi_stage. We simulate data from 2000 iid subjects:

d <- sim_multi_stage(2e3, seed = 1)

As described, the stage data is in long format:

d$stage_data[, -(9:10)] |> head()

The id variable identifies which rows belong to each subject. The baseline data uses the same id variable:

d$baseline_data |> head()

The data is transformed using policy_data() with type = "long". The names of the id, stage, event, action, and utility variables must be specified. The event variable, inspired by the event variable in survival::Surv(), is 0 whenever an action occurs and 1 for a terminal event.

pd <- policy_data(data = d$stage_data,
                  baseline_data = d$baseline_data,
                  type = "long",
                  id = "id",
                  stage = "stage",
                  event = "event",
                  action = "A",
                  utility = "U")
pd
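
The history, action, and utility accessors shown in the previous examples work in the same way for this policy_data object, for example:

get_history(pd)$A |> head()
get_utility(pd) |> head()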

In some cases we are only interested in analyzing a subset of the decision stages. partial() trims the maximum number of decision stages:

pd3 <- partial(pd, K = 3)
pd3

SessionInfo

sessionInfo()

References


