make_2classification: Data Simulation for 2 stages
In DTRlearn: Learning Algorithms for Dynamic Treatment Regimes


Generates a simulated dataset for testing multiple-stage learning algorithms. The outcomes are generated from a pattern mixture model driven by a latent variable with 4 categories. Within each category, X follows a multivariate normal distribution, and each category is assigned a vector of optimal treatments V. Specifically, the class centroids are drawn from a multivariate normal distribution with mean 0 and std 5. The centroids are then added to the first pinfo dimensions of the feature vectors X, which are simulated from a multivariate normal distribution with pinfo+pnoise dimensions.
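The feature construction described above can be sketched in base R. This is an illustrative sketch, not the package's source: the helper name `sketch_features`, the uniform draw of latent labels, and the use of independent standard normal features are all assumptions made for the example.

```r
# Illustrative sketch only; not the DTRlearn implementation.
sketch_features <- function(n_cluster, pinfo, pnoise, n_sample, centroids = NULL) {
  # Draw centroids with mean 0 and std 5 unless they are supplied
  # (e.g. reused from a training set).
  if (is.null(centroids)) {
    centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5),
                        nrow = n_cluster, ncol = pinfo)
  }
  # Assumption: each sample's latent class is drawn uniformly at random.
  label <- sample.int(n_cluster, n_sample, replace = TRUE)
  # Standard normal features in pinfo + pnoise dimensions.
  X <- matrix(rnorm(n_sample * (pinfo + pnoise)), nrow = n_sample)
  # Shift the first pinfo columns by the centroid of each sample's class.
  X[, seq_len(pinfo)] <- X[, seq_len(pinfo)] + centroids[label, , drop = FALSE]
  list(X = X, label = label, centroids = centroids)
}

set.seed(1)
sim <- sketch_features(n_cluster = 4, pinfo = 10, pnoise = 10, n_sample = 50)
dim(sim$X)
```

Only the first pinfo columns carry information about the latent class; the remaining pnoise columns stay centered at 0.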

Optimal treatments y=(A_1^*, A_2^*) are then assigned to each latent category from (1,1),(1,-1),(-1,-1),(-1,1). The observed treatment assignments A=(A_1,A_2) are completely random, each taking the value 1 or -1 with probability 0.5, and the outcomes are generated as: R_1=0, R_2= A'y+N(0,1). Since A'y = A_1 A_1^* + A_2 A_2^*, the mean optimal outcome $R_1+R_2$ is $2$ when the treatment assignments equal the optimal treatments for the given latent group in both stages.
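The treatment and outcome model above can be checked numerically with a short base R simulation. This is a sketch under stated assumptions, not the package's code: the variable names are hypothetical and the latent categories are assumed to be drawn uniformly at random.

```r
# Illustrative sketch only; not the DTRlearn implementation.
set.seed(2)
n_sample <- 10000
# Optimal treatment pairs (A_1^*, A_2^*), one row per latent category.
optimal <- rbind(c(1, 1), c(1, -1), c(-1, -1), c(-1, 1))
# Assumption: latent categories are drawn uniformly at random.
label <- sample.int(nrow(optimal), n_sample, replace = TRUE)
y <- optimal[label, , drop = FALSE]
# Observed treatments: 1 or -1 with probability 0.5, independently per stage.
A <- matrix(sample(c(-1, 1), 2 * n_sample, replace = TRUE), ncol = 2)
# Outcomes: R_1 = 0 and R_2 = A'y + N(0, 1).
R1 <- rep(0, n_sample)
R2 <- rowSums(A * y) + rnorm(n_sample)
# Mean total outcome among samples treated optimally in both stages.
both_optimal <- rowSums(A == y) == 2
mean((R1 + R2)[both_optimal])
# Mean total outcome over all samples under random assignment.
mean(R1 + R2)
```

The first mean should be close to 2, while the second should be close to 0 because random assignment matches the optimal treatment in each stage only half the time.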

make_2classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)

`n_cluster`	number of clusters.
`pinfo`	number of informative variables, dimensions of the centroids related to the latent class of the sample.
`pnoise`	number of noise variables.
`n_sample`	sample size
`centroids`	For a training set, do not assign centroids; they are generated randomly by the function. For a testing set, assign the same set of centroids as the training set. It is a matrix of dimension n_cluster by p.
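The `centroids` argument can be used to draw a test set from the same clusters as a training set. A minimal sketch, assuming the DTRlearn package is installed; the `requireNamespace` guard is there only so that the snippet does nothing when it is not, and the sample sizes are arbitrary.

```r
n_cluster <- 4
pinfo <- 10
pnoise <- 10
if (requireNamespace("DTRlearn", quietly = TRUE)) {
  # Training set: centroids are generated by the function.
  train <- DTRlearn::make_2classification(n_cluster, pinfo, pnoise, n_sample = 200)
  # Test set: reuse the training centroids so both sets share the same clusters.
  test <- DTRlearn::make_2classification(n_cluster, pinfo, pnoise, n_sample = 100,
                                         centroids = train$centroids)
}
```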

`X`	Feature variable matrix of dimension n_sample by pinfo+pnoise, generated from a multivariate normal distribution. The noise variables have mean 0 and std 1. The informative variables are shifted so that they are centered at the randomly generated centroids.
`A`	List of 2, `A[[1]]` and `A[[2]]` are the treatment assignment vectors for stage 1 and 2.
`y`	List of 2, `y[[1]]` and `y[[2]]` are the true optimal treatment vectors for stage 1 and 2
`R`	List of 2, `R[[1]]` is a vector of `n_sample` zeros (this is the simplified case where the intermediate outcomes are 0), `R[[2]]` is the vector of final outcomes
`centroids`	centers of the clusters, drawn from a pinfo dimensional multivariate normal distribution.

make_classification

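library(DTRlearn)

# Simulation settings
n_cluster=5
pinfo=10
pnoise=10
n_sample=50
# Simulate a two-stage dataset
example2=make_2classification(n_cluster,pinfo,pnoise,n_sample)
# One vector per stage, passed as the pi argument of Olearning and Plearning
pi=list()
pi[[2]]=pi[[1]]=rep(1,n_sample)
set.seed(3)
# Fit the three learners on the simulated data
modelO=Olearning(example2$X,example2$A,example2$R,n_sample,2,pi)
modelP=Plearning(example2$X,example2$A,example2$R,n_sample,2,pi)
modelQ=Qlearning(example2$X,example2$A,example2$R,2)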