## Description

This function implements Q-learning for estimating general K-stage dynamic treatment regimes (DTRs). A lasso penalty can be applied for variable selection at each stage.
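To make the method concrete, below is a minimal, hand-rolled schematic of the backward induction that Q-learning performs, using simulated toy data and plain `lm()` fits. Every name in it is illustrative; `ql()` generalizes this recursion to K stages and adds lasso-penalized variable selection.

```r
## Minimal 2-stage Q-learning schematic with unpenalized linear models.
## All data and names here are illustrative toys.
set.seed(1)
n  <- 200
x1 <- rnorm(n); a1 <- sample(c(-1, 1), n, replace = TRUE)
r1 <- 0.5 * x1 * a1 + rnorm(n)
x2 <- rnorm(n); a2 <- sample(c(-1, 1), n, replace = TRUE)
r2 <- 0.8 * x2 * a2 + rnorm(n)
d  <- data.frame(x1, a1, r1, x2, a2, r2)

## Stage 2: model Q2 = E[R2 | H2, A2] with H2 = (x1, a1, r1, x2)
fit2  <- lm(r2 ~ (x1 + a1 + r1 + x2) * a2, data = d)
q_pos <- predict(fit2, transform(d, a2 =  1))   # fitted Q2 under a2 = +1
q_neg <- predict(fit2, transform(d, a2 = -1))   # fitted Q2 under a2 = -1

## Stage 1: regress the pseudo-outcome R1 + max_a Q2 on (H1, A1)
d$pseudo <- r1 + pmax(q_pos, q_neg)
fit1 <- lm(pseudo ~ x1 * a1, data = d)
```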

## Usage

```r
ql(H, AA, RR, K, pi = "estimated", lasso = TRUE, m = 4)
```

## Arguments

| Argument | Description |
| --- | --- |
| `H` | subject history information before treatment for all subjects at the K stages. It can be a vector or a matrix when only baseline information is used; otherwise it is a list of K elements, one per stage. Standardize the variables in `H` to mean 0 and standard deviation 1 before use. See Details for how to construct `H`. |
| `AA` | observed treatment assignments for all subjects at the K stages. It is a vector if K = 1, or a list of K vectors corresponding to the K stages. |
| `RR` | observed reward outcomes for all subjects at the K stages. It is a vector if K = 1, or a list of K vectors corresponding to the K stages. |
| `K` | number of stages. |
| `pi` | treatment assignment probabilities of the observed treatments for all subjects at the K stages. It is a vector if K = 1, or a list of K vectors corresponding to the K stages. It can be user-specified when the assignment probabilities are known; the default, `pi = "estimated"`, estimates them by lasso-penalized logistic regressions with H_k as predictors at each stage k (see the sketch after this table). |
| `lasso` | whether to add a lasso penalty at each stage when fitting the model. The default is `lasso = TRUE`. |
| `m` | number of folds in the m-fold cross-validation used to tune the lasso penalty; applies when `lasso = TRUE`. The default is `m = 4`. |
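The default `pi = "estimated"` corresponds to stage-wise lasso-penalized logistic regressions of the observed treatment on the history. A minimal sketch of that idea using the glmnet package is given below; the helper and variable names (`estimate_pi_k`, `Hk`, `Ak`) are hypothetical, and the package's internal implementation may differ in detail.

```r
library(glmnet)

## Sketch: estimate P(A_k = observed treatment | H_k) at one stage by a
## lasso-penalized logistic regression. Hk is an n x p history matrix and
## Ak a vector in {-1, 1}. Hypothetical helper, not ql()'s internal code.
estimate_pi_k <- function(Hk, Ak, m = 4) {
  cv <- cv.glmnet(Hk, factor(Ak), family = "binomial", nfolds = m)
  p1 <- as.numeric(predict(cv, newx = Hk, s = "lambda.min", type = "response"))
  ifelse(Ak == 1, p1, 1 - p1)  # probability of the treatment actually received
}
```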

## Details

A patient's history information prior to the treatment at stage k can be constructed recursively as *H_k = (H_{k-1}, A_{k-1}, R_{k-1}, X_k)* with *H_1 = X_1*, where *X_k* denotes the subject-specific variables collected at stage k just prior to the treatment, *A_k* is the treatment at stage *k*, and *R_k* is the outcome observed after the treatment at stage *k*. Higher-order or interaction terms can also be easily incorporated in *H_k*, e.g., *H_k = (H_{k-1}, A_{k-1}, R_{k-1}, X_k, H_{k-1}A_{k-1}, R_{k-1}A_{k-1}, X_kA_{k-1})*; a small construction example follows.
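As a concrete (hypothetical) illustration of this construction for K = 2, with all inputs standardized as the `H` argument requires:

```r
## Illustrative 2-stage history construction; all data are simulated here.
set.seed(1)
n  <- 100
X1 <- matrix(rnorm(n * 3), n, 3)           # stage-1 covariates
A1 <- sample(c(-1, 1), n, replace = TRUE)  # stage-1 treatment
R1 <- rnorm(n)                             # stage-1 outcome
X2 <- matrix(rnorm(n * 2), n, 2)           # new covariates at stage 2

H1 <- scale(X1)                            # H_1 = X_1
H2 <- scale(cbind(H1, A1, R1, X2))         # H_2 = (H_1, A_1, R_1, X_2)

## with interaction terms in H_2, as in the expression above:
H2i <- scale(cbind(H1, A1, R1, X2, H1 * A1, R1 * A1, X2 * A1))
```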

## Value

A list of results is returned as an object. It contains the following components:

| Component | Description |
| --- | --- |
| `stage1` | a list of stage 1 results (see below) |
| ... | ... |
| `stageK` | a list of stage K results |
| `valuefun` | overall empirical value function under the estimated DTR |
| `benefit` | overall empirical benefit function under the estimated DTR |
| `pi` | treatment assignment probabilities of the assigned treatments for each subject at the K stages. If `pi = "estimated"` was specified as input, the probabilities estimated by the stage-wise lasso-penalized logistic regressions are returned. |

Each stage's result is a list consisting of:

| Component | Description |
| --- | --- |
| `co` | the estimated coefficients of (1, H, A, H*A), the variables in the model at this stage |
| `treatment` | the estimated optimal treatment at this stage for each subject in the sample. If no tailoring variables are selected under the lasso penalty, treatment is assigned randomly with equal probability (a sketch of reading the rule off the coefficients follows this table). |
| `Q` | the estimated optimal outcome increment from this stage to the end (the estimated optimal Q-function at this stage) for each subject in the sample |
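For intuition on how `treatment` relates to `co`, the following sketch shows how an optimal rule could be read off a fitted stage model in (1, H, A, H*A) with treatments coded -1/1. The coefficient names are illustrative, not the package's internal ones.

```r
## Sketch: derive the stage rule from coefficients of a model in
## (1, H, A, H*A). If Q(H, A) = b0 + H %*% bH + A * (bA + H %*% bHA),
## the Q-maximizing treatment is the sign of the tailoring term.
## b0, bH, bA, bHA are illustrative names, not ql()'s internals.
opt_treatment <- function(H, bA, bHA) {
  tailoring <- bA + as.vector(H %*% bHA)
  ifelse(tailoring > 0, 1, -1)
}
```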

## Author(s)

Yuan Chen, Ying Liu, Donglin Zeng, Yuanjia Wang

Maintainer: Yuan Chen <yc3281@columbia.edu>, <irene.yuan.chen@gmail.com>

## References

Watkins, C. J. C. H. (1989). Learning from delayed rewards. Doctoral dissertation, University of Cambridge.

Qian, M., & Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics, 39(2), 1180–1210.

## Examples

```r
# simulate 2-stage training and test sets
n_train = 100
n_test = 500
n_cluster = 10
pinfo = 10
pnoise = 20
train = sim_Kstage(n_train, n_cluster, pinfo, pnoise, K=2)
# construct standardized stage histories: H1 = X1, H2 = (H1, A1, H1*A1)
H1_train = scale(train$X)
H2_train = scale(cbind(H1_train, train$A[[1]], H1_train * train$A[[1]]))
pi_train = list(rep(0.5, n_train), rep(0.5, n_train))
# simulate a test set from the same cluster centroids as the training set
test = sim_Kstage(n_test, n_cluster, pinfo, pnoise, train$centroids, K=2)
H1_test = scale(test$X)
H2_test = scale(cbind(H1_test, test$A[[1]], H1_test * test$A[[1]]))
pi_test = list(rep(0.5, n_test), rep(0.5, n_test))
# fit Q-learning on the training set (3-fold CV for the lasso penalty)
ql_train = ql(H=list(H1_train, H2_train), AA=train$A, RR=train$R, K=2, pi=pi_train, m=3)
# evaluate the estimated DTR on the test set
ql_test = predict(ql_train, H=list(H1_test, H2_test), AA=test$A, RR=test$R, K=2, pi=pi_test)
```
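After fitting, the components documented under Value can be inspected directly, for example (assuming the predict method returns the analogous value components for the test set):

```r
# inspect the fitted DTR via the components documented above
ql_train$valuefun                 # empirical value function on the training set
ql_train$stage1$co                # estimated stage-1 coefficients
head(ql_train$stage1$treatment)   # estimated optimal stage-1 treatments
ql_test$valuefun                  # value of the estimated DTR on the test set
```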
