make_2classification: Data Simulation for 2 stages

View source: R/make_classification.R

Description

Generates a simulated dataset for testing multiple-stage learning algorithms. The outcomes are generated from a pattern mixture model with a latent variable that has 4 categories. Within each category, X follows a multivariate normal distribution, and each category is assigned a vector of optimal treatments V. Specifically, the class centroids are drawn from a multivariate normal distribution with mean 0 and standard deviation 5. The centroids are then added to the first pinfo dimensions of the feature vectors X, which are simulated from a multivariate normal distribution with pinfo+pnoise dimensions.
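The feature-generation step described above can be sketched in base R as follows. This is an illustration only, not the package's source code: it assumes independent normal coordinates (identity covariance) and latent categories drawn uniformly at random, neither of which is stated in this page, and the variable name latent is hypothetical.

```r
# Illustrative sketch only; not the DTRlearn implementation.
set.seed(1)
n_cluster <- 4; pinfo <- 3; pnoise <- 2; n_sample <- 200

# Centroids: one row per latent category, entries drawn from N(0, 5^2).
centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5), n_cluster, pinfo)

# Latent category of each subject (ASSUMPTION: uniform over categories).
latent <- sample(n_cluster, n_sample, replace = TRUE)

# Features in pinfo + pnoise dimensions (ASSUMPTION: independent N(0, 1) entries).
X <- matrix(rnorm(n_sample * (pinfo + pnoise)), n_sample, pinfo + pnoise)

# Shift the first pinfo columns by the centroid of each subject's category;
# the remaining pnoise columns stay pure noise.
X[, 1:pinfo] <- X[, 1:pinfo] + centroids[latent, ]
```

Only the first pinfo columns carry information about the latent category, which is what makes the remaining columns noise variables.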

Then we assign optimal treatments y=(A_1^*, A_2^*) from (1,1),(1,-1),(-1,-1),(-1,1) to each latent category. The observed treatment assignments A=(A_1,A_2) are completely randomized, each taking the value 1 or -1 with probability 0.5, and the outcomes are generated as R_1=0, R_2=A'y+N(0,1). Therefore the mean optimal outcome $R_1+R_2$ is $2$, attained when the treatment assignments equal the optimal treatments of the given latent group in both stages.
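The outcome model can be checked with a small base R simulation. This is a sketch under stated assumptions, not the package's code: latent categories are assumed to be drawn uniformly, and the names optimal, latent, signal and both are hypothetical. Since A'y = A_1 A_1^* + A_2 A_2^*, the noise-free part of R_2 is 2 when both stages match the optimal treatments, 0 when exactly one stage matches, and -2 when neither does.

```r
# Illustrative sketch only; not the DTRlearn implementation.
set.seed(2)
n_sample <- 1000

# The four optimal treatment pairs (A_1^*, A_2^*), one per latent category.
optimal <- rbind(c(1, 1), c(1, -1), c(-1, -1), c(-1, 1))
latent <- sample(4, n_sample, replace = TRUE)  # ASSUMPTION: uniform categories
y <- optimal[latent, ]

# Observed treatments: each stage is 1 or -1 with probability 0.5.
A <- matrix(sample(c(-1, 1), 2 * n_sample, replace = TRUE), n_sample, 2)

# Outcomes: R_1 = 0 and R_2 = A'y + N(0, 1).
signal <- rowSums(A * y)          # A'y, takes values in {-2, 0, 2}
R1 <- rep(0, n_sample)
R2 <- signal + rnorm(n_sample)

# Subjects whose observed treatments are optimal in both stages.
both <- rowSums(A == y) == 2
mean(R1[both] + R2[both])         # sample mean of the optimal outcome
```

The final line estimates the mean outcome among subjects treated optimally in both stages, which should approximate 2 up to sampling noise.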

Usage

make_2classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)

Arguments

n_cluster

number of clusters.

pinfo

number of informative variables, dimensions of the centroids related to the latent class of the sample.

pnoise

number of noise variables.

n_sample

sample size

centroids

For a training set, do not assign centroids; they are generated randomly by the function. For a testing set, one should assign the same set of centroids as was used for the training set. It is a matrix of dimension n_cluster by p.
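A minimal usage sketch of the train/test convention described above, assuming the DTRlearn package is installed. It relies only on the documented signature and on the centroids element of the returned value; the object names train and test are hypothetical.

```r
library(DTRlearn)

# Training set: centroids are left at their default and generated randomly.
train <- make_2classification(n_cluster = 4, pinfo = 10, pnoise = 10,
                              n_sample = 100)

# Testing set: reuse the training centroids so both sets share the same
# latent classes.
test <- make_2classification(n_cluster = 4, pinfo = 10, pnoise = 10,
                             n_sample = 50, centroids = train$centroids)
```

Reusing train$centroids keeps the mapping between feature clusters and optimal treatments comparable across the two sets.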

Value

X

Feature variable matrix: an n_sample by pinfo+pnoise matrix generated from a multivariate normal distribution. The noise variables have mean 0 and standard deviation 1. The informative variables are shifted so that they are centered at the randomly generated centroids.

A

List of 2, A[[1]] and A[[2]] are the treatment assignment vectors for stage 1 and 2.

y

List of 2, y[[1]] and y[[2]] are the true optimal treatment vectors for stage 1 and 2.

R

List of 2, R[[1]] is a vector of n_sample zeros (this is the simplified case where the intermediate outcomes are 0), R[[2]] is the vector of final outcomes.

centroids

centers of the clusters, drawn from a pinfo-dimensional multivariate normal distribution.

See Also

make_classification

Examples

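library(DTRlearn)
# simulation settings
n_cluster=5
pinfo=10
pnoise=10
n_sample=50
# simulate a two-stage dataset
example2=make_2classification(n_cluster,pinfo,pnoise,n_sample)
# list with one vector per stage, passed as the pi argument of Olearning and Plearning
# (note: this name masks the base R constant pi for the rest of the session)
pi=list()
pi[[2]]=pi[[1]]=rep(1,n_sample)
set.seed(3)
# fit the three learners on the simulated features, treatments and outcomes
modelO=Olearning(example2$X,example2$A,example2$R,n_sample,2,pi)
modelP=Plearning(example2$X,example2$A,example2$R,n_sample,2,pi)
modelQ=Qlearning(example2$X,example2$A,example2$R,2)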

DTRlearn documentation built on April 6, 2018, 1:04 a.m.