T-RL is a tree-based reinforcement learning method that directly estimates optimal dynamic treatment regimes (DTRs) in a multi-stage, multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the optimization problem with multiple treatment comparisons, through a purity measure constructed from augmented inverse probability weighted (AIPW) estimators.
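As a rough illustration of the ingredients behind that purity measure, the sketch below computes AIPW pseudo-outcomes from an outcome vector, observed treatments, an estimated propensity matrix, and regression-based conditional means. This is a hypothetical Python sketch of the general AIPW construction, not the package's R implementation; all function names are made up for illustration.

```python
import numpy as np

def aipw_pseudo_outcomes(Y, A, pis_hat, mus_hat):
    """AIPW estimate of each subject's outcome under every treatment.

    Y       : (n,) observed outcomes
    A       : (n,) observed treatments, coded 0..K-1
    pis_hat : (n, K) estimated propensity scores
    mus_hat : (n, K) regression estimates of E[Y | H, A=k]

    Returns an (n, K) matrix whose column k estimates the outcome
    each subject would have had under treatment k.
    """
    n, K = pis_hat.shape
    # Indicator matrix: 1 where subject i actually received treatment k.
    indicator = (A[:, None] == np.arange(K)[None, :]).astype(float)
    # Inverse-probability-weighted residual plus the regression term.
    return indicator / pis_hat * (Y[:, None] - mus_hat) + mus_hat

def node_purity(pseudo, idx):
    """Purity of a node: the mean pseudo-outcome achieved by assigning
    everyone in the node the single best treatment."""
    return pseudo[idx].mean(axis=0).max()
```

The pseudo-outcome matrix is doubly robust in the usual AIPW sense: it is consistent if either the propensity model or the conditional-mean model is correctly specified.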
Arguments

Y           A vector of the outcome of interest.
A           A vector of observed treatment options.
H           A matrix of covariates before assigning the final treatment, excluding previous treatment variables.
pis.hat     Estimated propensity score matrix.
m.method    Method for estimating the conditional mean outcome.
mus.reg     Regression-based estimates of the conditional mean outcome.
depth       Maximum tree depth.
lambda.pct  Minimum percentage change in the purity measure required to make a split.
minsplit    Minimum node size.
lookahead   Whether or not to look one step further ahead in the splitting search to find the best split.
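To show how the tree-growing controls above (minsplit, lambda.pct) might interact with the AIPW purity measure, here is a greedy single-split search over a covariate matrix of pseudo-outcomes. This is a hypothetical Python sketch under simplified assumptions (exhaustive cutpoint search, purity assumed positive), not the package's R code; `best_split` and its arguments are invented for illustration.

```python
import numpy as np

def best_split(pseudo, X, minsplit=20, lambda_pct=0.02):
    """Greedy search for one split of a node.

    pseudo : (n, K) AIPW pseudo-outcome matrix for the node's subjects
    X      : (n, p) covariate matrix (the H argument above)

    Returns (feature_index, cutpoint) for the best split, or None if no
    split leaves both children with at least `minsplit` subjects and
    improves the purity by at least `lambda_pct` (relative change,
    assuming positive purity values).
    """
    n = X.shape[0]
    # Purity of the unsplit node: one best treatment for everyone.
    parent = pseudo.mean(axis=0).max()
    best = (None, parent)
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= c
            nl = left.sum()
            if nl < minsplit or n - nl < minsplit:
                continue  # child node too small
            # Weighted purity when each child gets its own best treatment.
            purity = (nl * pseudo[left].mean(axis=0).max()
                      + (n - nl) * pseudo[~left].mean(axis=0).max()) / n
            if purity > best[1]:
                best = ((j, c), purity)
    split, purity = best
    if split is None or purity < parent * (1 + lambda_pct):
        return None
    return split
```

Recursing on the two children down to the maximum depth yields a tree; the lookahead option described above would extend this search by also evaluating the best split of each candidate child before committing.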