## Description

This function implements Q-learning for estimating general K-stage dynamic treatment regimes (DTRs). A lasso penalty can be applied for variable selection at each stage.
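To make the method concrete, below is a minimal, hand-rolled schematic of the backward induction that Q-learning performs, using simulated toy data and plain `lm()` fits. Every name in it is illustrative; `ql()` generalizes this recursion to K stages and adds lasso-penalized variable selection.

```r
## Minimal 2-stage Q-learning schematic with unpenalized linear models.
## All data and names here are illustrative toys.
set.seed(1)
n  <- 200
x1 <- rnorm(n); a1 <- sample(c(-1, 1), n, replace = TRUE)
r1 <- 0.5 * x1 * a1 + rnorm(n)
x2 <- rnorm(n); a2 <- sample(c(-1, 1), n, replace = TRUE)
r2 <- 0.8 * x2 * a2 + rnorm(n)
d  <- data.frame(x1, a1, r1, x2, a2, r2)

## Stage 2: model Q2 = E[R2 | H2, A2] with H2 = (x1, a1, r1, x2)
fit2  <- lm(r2 ~ (x1 + a1 + r1 + x2) * a2, data = d)
q_pos <- predict(fit2, transform(d, a2 =  1))   # fitted Q2 under a2 = +1
q_neg <- predict(fit2, transform(d, a2 = -1))   # fitted Q2 under a2 = -1

## Stage 1: regress the pseudo-outcome R1 + max_a Q2 on (H1, A1)
d$pseudo <- r1 + pmax(q_pos, q_neg)
fit1 <- lm(pseudo ~ x1 * a1, data = d)
```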

## Usage

```r
ql(H, AA, RR, K, pi = "estimated", lasso = TRUE, m = 4)
```

## Arguments

| Argument | Description |
| --- | --- |
| `H` | subject history information before treatment for all subjects at the K stages. It can be a vector or a matrix when only baseline information is used; otherwise it is a list of K elements, one per stage. Standardize the variables in `H` to mean 0 and standard deviation 1 before use. See Details for how to construct `H`. |
| `AA` | observed treatment assignments for all subjects at the K stages. It is a vector if K = 1, or a list of K vectors corresponding to the K stages. |
| `RR` | observed reward outcomes for all subjects at the K stages. It is a vector if K = 1, or a list of K vectors corresponding to the K stages. |
| `K` | number of stages. |
| `pi` | treatment assignment probabilities of the observed treatments for all subjects at the K stages. It is a vector if K = 1, or a list of K vectors corresponding to the K stages. It can be user-specified when the assignment probabilities are known; the default, `pi = "estimated"`, estimates them by lasso-penalized logistic regressions with H_k as predictors at each stage k (see the sketch after this table). |
| `lasso` | whether to add a lasso penalty at each stage when fitting the model. The default is `lasso = TRUE`. |
| `m` | number of folds in the m-fold cross-validation used to tune the lasso penalty; applies when `lasso = TRUE`. The default is `m = 4`. |
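The default `pi = "estimated"` corresponds to stage-wise lasso-penalized logistic regressions of the observed treatment on the history. A minimal sketch of that idea using the glmnet package is given below; the helper and variable names (`estimate_pi_k`, `Hk`, `Ak`) are hypothetical, and the package's internal implementation may differ in detail.

```r
library(glmnet)

## Sketch: estimate P(A_k = observed treatment | H_k) at one stage by a
## lasso-penalized logistic regression. Hk is an n x p history matrix and
## Ak a vector in {-1, 1}. Hypothetical helper, not ql()'s internal code.
estimate_pi_k <- function(Hk, Ak, m = 4) {
  cv <- cv.glmnet(Hk, factor(Ak), family = "binomial", nfolds = m)
  p1 <- as.numeric(predict(cv, newx = Hk, s = "lambda.min", type = "response"))
  ifelse(Ak == 1, p1, 1 - p1)  # probability of the treatment actually received
}
```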

## Details

A patient's history information prior to the treatment at stage k can be constructed recursively as *H_k = (H_{k-1}, A_{k-1}, R_{k-1}, X_k)* with *H_1 = X_1*, where *X_k* denotes the subject-specific variables collected at stage k just prior to the treatment, *A_k* is the treatment at stage *k*, and *R_k* is the outcome observed after the treatment at stage *k*. Higher-order or interaction terms can also be easily incorporated in *H_k*, e.g., *H_k = (H_{k-1}, A_{k-1}, R_{k-1}, X_k, H_{k-1}A_{k-1}, R_{k-1}A_{k-1}, X_kA_{k-1})*; a small construction example follows.
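As a concrete (hypothetical) illustration of this construction for K = 2, with all inputs standardized as the `H` argument requires:

```r
## Illustrative 2-stage history construction; all data are simulated here.
set.seed(1)
n  <- 100
X1 <- matrix(rnorm(n * 3), n, 3)           # stage-1 covariates
A1 <- sample(c(-1, 1), n, replace = TRUE)  # stage-1 treatment
R1 <- rnorm(n)                             # stage-1 outcome
X2 <- matrix(rnorm(n * 2), n, 2)           # new covariates at stage 2

H1 <- scale(X1)                            # H_1 = X_1
H2 <- scale(cbind(H1, A1, R1, X2))         # H_2 = (H_1, A_1, R_1, X_2)

## with interaction terms in H_2, as in the expression above:
H2i <- scale(cbind(H1, A1, R1, X2, H1 * A1, R1 * A1, X2 * A1))
```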

## Value

A list of results is returned as an object. It contains the following components:

| Component | Description |
| --- | --- |
| `stage1` | a list of stage 1 results (see below) |
| ... | ... |
| `stageK` | a list of stage K results |
| `valuefun` | overall empirical value function under the estimated DTR |
| `benefit` | overall empirical benefit function under the estimated DTR |
| `pi` | treatment assignment probabilities of the assigned treatments for each subject at the K stages. If `pi = "estimated"` was specified as input, the probabilities estimated by the stage-wise lasso-penalized logistic regressions are returned. |

Each stage's result is a list consisting of:

| Component | Description |
| --- | --- |
| `co` | the estimated coefficients of (1, H, A, H*A), the variables in the model at this stage |
| `treatment` | the estimated optimal treatment at this stage for each subject in the sample. If no tailoring variables are selected under the lasso penalty, treatment is assigned randomly with equal probability (a sketch of reading the rule off the coefficients follows this table). |
| `Q` | the estimated optimal outcome increment from this stage to the end (the estimated optimal Q-function at this stage) for each subject in the sample |
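For intuition on how `treatment` relates to `co`, the following sketch shows how an optimal rule could be read off a fitted stage model in (1, H, A, H*A) with treatments coded -1/1. The coefficient names are illustrative, not the package's internal ones.

```r
## Sketch: derive the stage rule from coefficients of a model in
## (1, H, A, H*A). If Q(H, A) = b0 + H %*% bH + A * (bA + H %*% bHA),
## the Q-maximizing treatment is the sign of the tailoring term.
## b0, bH, bA, bHA are illustrative names, not ql()'s internals.
opt_treatment <- function(H, bA, bHA) {
  tailoring <- bA + as.vector(H %*% bHA)
  ifelse(tailoring > 0, 1, -1)
}
```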

## Author(s)

Yuan Chen, Ying Liu, Donglin Zeng, Yuanjia Wang

Maintainer: Yuan Chen <yc3281@columbia.edu>, <irene.yuan.chen@gmail.com>

## References

Watkins, C. J. C. H. (1989). Learning from delayed rewards. Doctoral dissertation, University of Cambridge.

Qian, M., & Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics, 39(2), 1180–1210.

## Examples

```r
# simulate 2-stage training and test sets
n_train = 100
n_test = 500
n_cluster = 10
pinfo = 10
pnoise = 20
train = sim_Kstage(n_train, n_cluster, pinfo, pnoise, K=2)
# construct standardized stage histories: H1 = X1, H2 = (H1, A1, H1*A1)
H1_train = scale(train$X)
H2_train = scale(cbind(H1_train, train$A[[1]], H1_train * train$A[[1]]))
pi_train = list(rep(0.5, n_train), rep(0.5, n_train))
# simulate a test set from the same cluster centroids as the training set
test = sim_Kstage(n_test, n_cluster, pinfo, pnoise, train$centroids, K=2)
H1_test = scale(test$X)
H2_test = scale(cbind(H1_test, test$A[[1]], H1_test * test$A[[1]]))
pi_test = list(rep(0.5, n_test), rep(0.5, n_test))
# fit Q-learning on the training set (3-fold CV for the lasso penalty)
ql_train = ql(H=list(H1_train, H2_train), AA=train$A, RR=train$R, K=2, pi=pi_train, m=3)
# evaluate the estimated DTR on the test set
ql_test = predict(ql_train, H=list(H1_test, H2_test), AA=test$A, RR=test$R, K=2, pi=pi_test)
```
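After fitting, the components documented under Value can be inspected directly, for example (assuming the predict method returns the analogous value components for the test set):

```r
# inspect the fitted DTR via the components documented above
ql_train$valuefun                 # empirical value function on the training set
ql_train$stage1$co                # estimated stage-1 coefficients
head(ql_train$stage1$treatment)   # estimated optimal stage-1 treatments
ql_test$valuefun                  # value of the estimated DTR on the test set
```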
