make_classification: Data Simulation for a Single Stage


View source: R/make_classification.R

Description

Generates simulated datasets for testing single-stage DTR learning algorithms. The outcomes are generated from a pattern mixture model using a latent variable with 2 categories: category 1 has optimal treatment y=1, and category 2 has y=-1. The feature variables X follow a multivariate normal distribution. Specifically, the centroids of the classes are generated from a multivariate normal distribution with mean 0 and std 5. The centroids are then added to the first pinfo dimensions of the feature vectors X, which are simulated from a multivariate normal distribution with pinfo+pnoise dimensions. The observed treatment assignments A are completely random, taking the values 1 and -1 with probability 0.5 each, and the outcomes are generated as: R_1=0, R_2=1.5A*y+N(0,1).
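The generative process described above can be sketched in base R. This is an illustrative sketch, not the package's implementation: the helper name simulate_single_stage is hypothetical, the covariance is assumed to be the identity, samples are assumed to be assigned to clusters uniformly at random, clusters are assumed to alternate between y=1 and y=-1, and the single returned outcome is assumed to follow the 1.5A*y+N(0,1) formula.

```r
# Hypothetical helper (not part of DTRlearn): sketch of the described process.
simulate_single_stage <- function(n_cluster, pinfo, pnoise, n_sample,
                                  centroids = NULL) {
  if (is.null(centroids)) {
    # One centroid per cluster, entries drawn from N(0, 5^2) (std 5).
    centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5),
                        nrow = n_cluster, ncol = pinfo)
  }
  # ASSUMPTION: odd-numbered clusters have optimal treatment 1, even ones -1.
  cluster_label <- ifelse(seq_len(n_cluster) %% 2 == 1, 1, -1)
  # ASSUMPTION: each sample belongs to a uniformly random cluster.
  cluster <- sample.int(n_cluster, n_sample, replace = TRUE)
  y <- cluster_label[cluster]

  # Features: independent N(0, 1) draws in all pinfo + pnoise dimensions
  # (ASSUMPTION: identity covariance).
  X <- matrix(rnorm(n_sample * (pinfo + pnoise)),
              nrow = n_sample, ncol = pinfo + pnoise)
  # Shift the first pinfo columns by the centroid of each sample's cluster.
  X[, seq_len(pinfo)] <- X[, seq_len(pinfo), drop = FALSE] +
    centroids[cluster, , drop = FALSE]

  # Treatment: 1 or -1 with probability 0.5 each.
  A <- sample(c(-1, 1), n_sample, replace = TRUE)
  # Outcome: 1.5 * A * y plus standard normal noise.
  R <- 1.5 * A * y + rnorm(n_sample)

  list(X = X, A = A, y = y, R = R, centroids = centroids)
}

set.seed(1)
train <- simulate_single_stage(10, 10, 20, 100)
# Reuse the training centroids so the test set shares the same clusters.
test <- simulate_single_stage(10, 10, 20, 100, centroids = train$centroids)
```

Passing the training centroids to the second call mirrors the intended use of the centroids argument: both sets then share the same cluster structure.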

Usage

make_classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)

Arguments

n_cluster

number of clusters.

pinfo

number of informative variables, dimensions of the centroids related to the latent class of the sample.

pnoise

number of noise variables.

n_sample

sample size.

centroids

a matrix of dimension n_cluster by p, where p is the dimension of the centroids (pinfo). For a training set, leave this argument unassigned so that the function generates the centroids randomly. For a testing set, pass the centroids returned for the training set so that both sets share the same centroids.

Value

X

The feature variable matrix, an n_sample by pinfo+pnoise matrix generated from a multivariate normal distribution. The noise variables have mean 0 and std 1. The informative variables are shifted to be centered at the randomly generated centroids.

A

The treatment assignment vector

y

The true optimal treatment vector

R

Outcomes vector

centroids

The centroids of the clusters, drawn from a pinfo-dimensional multivariate normal distribution.

Author(s)

Ying Liu yl2802@cumc.columbia.edu

References

This function borrows the idea from the comparable Python function make_classification in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification

See Also

make_2classification for generating simulation data for 2 stages

Examples

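library(DTRlearn)

# Simulation settings: 10 clusters, 10 informative and 20 noise variables
n_cluster=10
pinfo=10
pnoise=20

# Training set: the centroids are generated randomly
example1=make_classification(n_cluster,pinfo,pnoise,100)
# Testing set: reuse the training centroids
test=make_classification(n_cluster,pinfo,pnoise,100,example1$centroids)

# model1: O-learning with the default linear kernel
model1=Olearning_Single(example1$X,example1$A,example1$R)
Atp=predict(model1,test$X)
# Empirical value: mean outcome where predicted and observed treatments agree
V1=mean(test$R[Atp==test$A])

# model2: weighted SVM with an rbf kernel
model2=wsvm(example1$X,example1$A,example1$R,'rbf',0.05)
Atp=predict(model2,test$X)
V2=mean(test$R[Atp==test$A])

# In this highly non-linear case, comparing V1 with V2 (the empirical values
# on the testing set) shows the advantage of model2 (rbf kernel) over
# model1 (linear kernel). The true optimal value here is 1.5.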

Example output

Loading required package: kernlab
Loading required package: MASS
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16

Loading required package: ggplot2

Attaching package: 'ggplot2'

The following object is masked from 'package:kernlab':

    alpha
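
The example's value estimate, mean(test$R[Atp==test$A]), and its stated optimum of 1.5 can be checked with a small base-R simulation. This is a sketch under the assumptions that outcomes follow 1.5A*y+N(0,1) and treatments are randomized with probability 0.5; the helper empirical_value is hypothetical and not part of DTRlearn.

```r
set.seed(42)
n <- 10000
y <- sample(c(-1, 1), n, replace = TRUE)  # true optimal treatment
A <- sample(c(-1, 1), n, replace = TRUE)  # randomized treatment, P(A = 1) = 0.5
R <- 1.5 * A * y + rnorm(n)               # outcome as given in the Description

# Hypothetical helper: mean outcome among samples whose observed treatment
# agrees with the treatment recommended by the rule.
empirical_value <- function(R, A, A_pred) mean(R[A_pred == A])

V_oracle <- empirical_value(R, A, y)   # rule that always recommends y
V_worst  <- empirical_value(R, A, -y)  # rule that always recommends -y
```

When the rule recommends y and the observed treatment agrees, A*y equals 1, so the outcome averages 1.5; V_oracle should therefore be close to 1.5 and V_worst close to -1.5. With equal treatment probabilities, no inverse-probability weighting is needed for this comparison.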

DTRlearn documentation built on April 6, 2018, 1:04 a.m.