Qlearning: Q-learning
In DTRlearn: Learning Algorithms for Dynamic Treatment Regimes


Description

This function implements multi-stage Q-learning, fitting a regression model for the Q-function at each of the K stages.

Usage

Qlearning(X, AA, RR, K, pentype = "lasso", m = 4)

Arguments

`X`	either a single feature matrix shared by all stages, or a list of K feature matrices, one per stage; the matrices for different stages can have different dimensions.
`AA`	a list of length K whose i-th element `AA[[i]]` is the vector of treatment assignments at stage i.
`RR`	a list of length K whose i-th element `RR[[i]]` is the outcome vector at stage i.
`K`	the number of stages.
`pentype`	the type of regression used to fit the Q-functions: 'lasso' (the default) or 'LSE' (least squares estimation).
`m`	the number of cross-validation folds used by `cv.glmnet` to choose the lasso penalty when `pentype = 'lasso'`.
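To make the multi-stage idea concrete, here is a minimal sketch of backward-induction Q-learning with ordinary least squares, in the spirit of `pentype = 'LSE'`. It is not the DTRlearn source: the names `qlearn_lse_sketch` and `recommend` are hypothetical, and the sketch assumes a single covariate matrix shared by all stages, treatments coded -1/+1, and a linear Q-function with treatment-by-covariate interactions at every stage. Starting at stage K, each stage's Q-function is fitted by regression; the regression target for the preceding stage is that stage's own outcome plus the largest predicted stage-k value over the two treatment options.

```r
# Sketch only (hypothetical helper, NOT the DTRlearn implementation).
# Assumptions: one covariate matrix X shared by all stages, treatments coded
# -1/+1, and at every stage a linear Q-function
#   Q_k(x, a) = b0 + x'b + a * (c0 + x'c)
qlearn_lse_sketch <- function(X, AA, RR, K) {
  X <- as.matrix(X)
  # Design matrix: intercept, covariates, treatment, treatment-by-covariate terms
  design <- function(a) cbind(1, X, a, a * X)
  coefs <- vector("list", K)
  Y <- RR[[K]]  # at the last stage the target is the observed outcome
  for (k in K:1) {
    beta <- qr.coef(qr(design(AA[[k]])), Y)  # least-squares fit
    beta[is.na(beta)] <- 0                   # zero out aliased (collinear) columns
    coefs[[k]] <- beta
    if (k > 1) {
      # Pseudo-outcome for stage k-1: its own outcome plus the best
      # predicted stage-k value over the two treatment options
      n <- nrow(X)
      q_pos <- drop(design(rep(1, n)) %*% beta)
      q_neg <- drop(design(rep(-1, n)) %*% beta)
      Y <- RR[[k - 1]] + pmax(q_pos, q_neg)
    }
  }
  coefs
}

# Hypothetical helper: recommended treatment is the sign of the estimated
# treatment effect c0 + x'c, since Q(x, 1) - Q(x, -1) = 2 * (c0 + x'c)
recommend <- function(beta, X) {
  X <- as.matrix(X)
  p <- ncol(X)
  effect <- beta[p + 2] + drop(X %*% beta[(p + 3):(2 * p + 2)])
  ifelse(effect > 0, 1, -1)
}

# Usage on simulated two-stage data
set.seed(1)
n <- 200; p <- 3
X <- matrix(rnorm(n * p), n, p)
AA <- list(sample(c(-1, 1), n, TRUE), sample(c(-1, 1), n, TRUE))
# No stage-1 reward; at stage 2, treatment +1 helps exactly when X[, 1] > 0
RR <- list(rep(0, n), X[, 1] * AA[[2]] + rnorm(n, sd = 0.1))
fit <- qlearn_lse_sketch(X, AA, RR, K = 2)
table(recommend(fit[[2]], X), sign(X[, 1]))
```

Because the last-stage fit is done first and its maximized predictions feed the earlier stage, each stage's model accounts for optimal behavior afterwards; the `'lasso'` option presumably replaces the least-squares step with a penalized fit, which this sketch does not attempt.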

Value

A list of K fitted models, one per stage, of class 'qlearn'.

Author(s)

Ying Liu

References

Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).

Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.

Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294.

See Also

`Qlearning_Single`

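Examples

library(DTRlearn)
# simulation settings: number of clusters, informative features, noise features
n_cluster=10
pinfo=10
pnoise=20
# simulated training data with 200 observations
example2=make_2classification(n_cluster,pinfo,pnoise,200)
# test data generated from the same cluster centroids
test=make_2classification(n_cluster,pinfo,pnoise,200,example2$centroids)
# stage-wise list of 200 ones (not used by the Qlearning call below)
pi=list()
pi[[2]]=pi[[1]]=rep(1,200)
# two-stage Q-learning with the default lasso regression (m = 4 folds)
modelQ=Qlearning(example2$X,example2$A,example2$R,2)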