View source: R/make_classification.R
Generates a simulated dataset for testing multiple-stage learning algorithms.
The outcomes are generated based on a pattern mixture model using a latent variable with 4 categories. For each category, X has a multivariate normal distribution and each category is assigned a vector of optimal treatments V.
Specifically, we generate the centroids of the classes from a multivariate normal distribution with mean 0 and standard deviation 5. We add the centroids to the first pinfo dimensions of the feature vectors X, which are simulated from a multivariate normal distribution with pinfo+pnoise dimensions.
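The feature-generation step described above can be sketched in base R as follows. This is an illustration only, not the package's source code: the variable name `latent`, the uniform draw of latent classes, and the use of independent normal components are assumptions made for the example.

```r
# Illustrative sketch of the feature-generation step (not the package source).
set.seed(1)
n_cluster <- 4
pinfo <- 3
pnoise <- 2
n_sample <- 20

# One centroid per latent class: mean 0, standard deviation 5 in each
# informative dimension (independent components are an assumption).
centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5),
                    nrow = n_cluster, ncol = pinfo)

# Latent class of every subject (uniform sampling is an assumption).
latent <- sample(seq_len(n_cluster), n_sample, replace = TRUE)

# Standard normal features in all pinfo + pnoise dimensions.
X <- matrix(rnorm(n_sample * (pinfo + pnoise)), nrow = n_sample)

# Shift only the first pinfo columns by each subject's class centroid.
X[, seq_len(pinfo)] <- X[, seq_len(pinfo)] + centroids[latent, , drop = FALSE]

dim(X)
```

The last pnoise columns stay pure noise, so they carry no information about the latent class.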
Then we assign the optimal treatments y=(A_1^*, A_2^*), chosen from (1,1), (1,-1), (-1,-1), (-1,1), to each latent category. The observed treatment assignments A=(A_1,A_2) are completely randomized, each taking the value 1 or -1 with probability 0.5, and the outcomes are generated as R_1=0, R_2=A'y+N(0,1). Therefore the mean optimal outcome $R_1+R_2$ is $2$, attained when the treatment assignments equal the optimal treatments for the given latent group in both stages.
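The outcome model can be checked numerically with a short base R sketch. It is a hedged illustration rather than the function's implementation: the names `optimal_pairs`, `latent`, and `both_optimal` are hypothetical, and latent classes are drawn uniformly for simplicity.

```r
# Illustrative sketch of the treatment and outcome model (not the package source).
set.seed(2)
n_sample <- 1000

# The four optimal treatment pairs, one per latent category.
optimal_pairs <- rbind(c(1, 1), c(1, -1), c(-1, -1), c(-1, 1))
latent <- sample(1:4, n_sample, replace = TRUE)  # uniform classes: an assumption
y <- optimal_pairs[latent, , drop = FALSE]       # n_sample x 2 optimal treatments

# Observed treatments: 1 or -1 with probability 0.5 at each stage.
A <- matrix(sample(c(-1, 1), 2 * n_sample, replace = TRUE), ncol = 2)

# Outcomes: R_1 = 0 and R_2 = A'y + N(0, 1).
R1 <- rep(0, n_sample)
R2 <- rowSums(A * y) + rnorm(n_sample)

# Subjects whose observed treatments are optimal at both stages.
both_optimal <- rowSums(A == y) == 2
mean(R1[both_optimal] + R2[both_optimal])  # should be close to 2
```

When both stages match, A'y = 1 + 1 = 2, which is the source of the mean optimal outcome of 2. Each mismatched stage contributes -1 instead of +1, lowering the mean by 2.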
make_2classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)
n_cluster | Number of clusters.
pinfo | Number of informative variables, i.e. the dimension of the centroids related to the latent class of the sample.
pnoise | Number of noise variables.
n_sample | Sample size.
centroids | For a training set, do not assign centroids; they are generated randomly by the function. For a testing set, assign the same set of centroids as the training set. It is a matrix of dimension n_cluster by pinfo.
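A possible way to build a matching training and testing pair with the centroids argument is sketched below. It assumes the function is provided by the DTRlearn package (the package name does not appear on this page) and relies only on the signature and the centroids return value documented here. The names `train` and `test` are hypothetical.

```r
# Assumption: make_2classification() comes from the DTRlearn package.
if (requireNamespace("DTRlearn", quietly = TRUE)) {
  library(DTRlearn)
  set.seed(10)

  # Training set: centroids are left unassigned and generated by the function.
  train <- make_2classification(n_cluster = 5, pinfo = 10, pnoise = 10,
                                n_sample = 50)

  # Testing set: reuse the training centroids so both sets share cluster centers.
  test <- make_2classification(n_cluster = 5, pinfo = 10, pnoise = 10,
                               n_sample = 100, centroids = train$centroids)

  # Per the Value description, X is n_sample by pinfo + pnoise.
  print(dim(test$X))
}
```

Reusing the centroids keeps the latent-class structure of the features the same in both sets, so a rule learned on the training data can be evaluated on the testing data.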
X | Feature variable matrix: an n_sample by pinfo+pnoise matrix generated from a multivariate normal distribution. The noise variables have mean 0 and standard deviation 1; the informative variables are shifted to be centered at the randomly generated centroids.
A | List of 2,
y | List of 2,
R | List of 2,
centroids | Centers of each cluster, drawn from a pinfo-dimensional multivariate normal distribution.
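# Assumes the DTRlearn package, which provides the functions used below
library(DTRlearn)
# Simulation settings
n_cluster=5
pinfo=10
pnoise=10
n_sample=50
# Simulate a two-stage dataset
example2=make_2classification(n_cluster,pinfo,pnoise,n_sample)
# One vector per stage, passed as the pi argument below
# (note that this name masks base R's constant pi)
pi=list()
pi[[2]]=pi[[1]]=rep(1,n_sample)
set.seed(3)
# Fit the three learning algorithms on the two-stage data
modelO=Olearning(example2$X,example2$A,example2$R,n_sample,2,pi)
modelP=Plearning(example2$X,example2$A,example2$R,n_sample,2,pi)
modelQ=Qlearning(example2$X,example2$A,example2$R,2)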