data_generator: Functions for Simulating Data


Description

The following three data generators are used in simulations that investigate the properties of GEM. Each constructs a specific type of data set for the case of two treatment groups. For details, see Petkova E, Tarpey T, Su Z, Ogden RT. Generated effect modifiers (GEMs) in randomized clinical trials. Biostatistics (first published online July 27, 2016). doi: 10.1093/biostatistics/kxw035.

Usage

data_generator1(d, R2, v2, n, co, beta1, inter)

data_generator2(n, co, R2, bet, inter)

data_generator3(n, co, bet, inter)

Arguments

d

A scalar indicating the effect size of the GEM when the data is generated under a GEM model

R2

A scalar indicating the proportion of explained variance R^2 for the entire data set

v2

A scalar indicating the proportion of explained variance R^2 for the first treatment group

n

A scalar indicating the number of observations in each treatment group, assumed to be the same for both groups.

co

A p by p positive semidefinite matrix indicating the covariance matrix of the covariates

beta1

A vector of length p giving the regression coefficients for the first treatment group

inter

A vector of length 2 recording the intercepts β_{10},β_{20} for the two treatment groups respectively

bet

A list with two elements, each a vector of length p, giving the regression coefficients for the two treatment groups respectively
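The argument shapes described above can be sketched in base R as follows. This is a minimal illustration only: the dimension p = 5 and all numeric values are arbitrary choices, and is_psd is a hypothetical helper variable, not part of the package.

```r
# Illustrative argument values; p = 5 and all numbers are arbitrary choices.
p <- 5

# Compound-symmetric covariance: unit variances, common covariance 0.2.
co <- matrix(0.2, p, p)
diag(co) <- 1

# co should be symmetric positive semidefinite: no negative eigenvalues
# (a small tolerance absorbs floating-point error).
eig <- eigen(co, symmetric = TRUE, only.values = TRUE)$values
is_psd <- isSymmetric(co) && all(eig >= -1e-8)

# Coefficients for the first group (beta1) and for both groups (bet).
beta1 <- rep(1, p)
bet <- list(beta1, 0.5 * beta1)

# Intercepts for the two treatment groups.
inter <- c(0, 0)
```

Checking the eigenvalues before simulating is a cheap way to catch an invalid covariance matrix early.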

Details

data_generator1 is used to create data where the outcome is a linear function of the covariates

y_j = β_{j0} + Xβ_j + ε, j = 1, 2,

and the coefficients of the covariates are proportional between the two treatment groups: β_2 = b * β_1. This type of data set matches the motivation of the GEM algorithm exactly. β_1 is supplied as an argument of the function, while β_2 = b * β_1 is derived by controlling the R^2 of the whole data set and the effect size. For details, see Kraemer, H. C. (2013). Discovering, comparing, and combining moderators of treatment on outcome after randomized clinical trials: a parametric approach. Statistics in Medicine, 32(11), 1964-1973.
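The proportional-coefficient model can be sketched in base R as below. This is not the package's implementation: here the proportionality constant b and the error standard deviation sigma are fixed by hand (both hypothetical values), whereas data_generator1 derives them from R2, v2 and d. The helper draw_X is also hypothetical.

```r
set.seed(1)
n <- 200                      # observations per treatment group
p <- 4                        # number of covariates
co <- matrix(0.2, p, p)
diag(co) <- 1
beta1 <- rep(1, p)
inter <- c(0, 0)
b <- 0.5                      # hypothetical proportionality constant
sigma <- 1                    # hypothetical error standard deviation

# Coefficients of the second group are proportional to those of the first.
beta2 <- b * beta1

# Draw n covariate rows with covariance co via the Cholesky factor.
# Note: chol() needs co to be positive definite, not just semidefinite.
draw_X <- function(n, co) matrix(rnorm(n * ncol(co)), n) %*% chol(co)

X1 <- draw_X(n, co)
X2 <- draw_X(n, co)

# Outcome: y_j = beta_{j0} + X beta_j + eps, for j = 1, 2.
y1 <- inter[1] + drop(X1 %*% beta1) + rnorm(n, sd = sigma)
y2 <- inter[2] + drop(X2 %*% beta2) + rnorm(n, sd = sigma)

# Treatment index first, outcome second, covariates after that.
sim <- data.frame(trt = rep(1:2, each = n), y = c(y1, y2), rbind(X1, X2))
```

The column order mirrors the layout described for dat, but the column names here are not necessarily those used by the package.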

data_generator2 is similar to data_generator1, except that the coefficients of the covariates are not necessarily proportional. Hence both coefficient vectors, β_1 and β_2, must be specified as arguments of the function (through bet).
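To illustrate why R^2 has to be handled with care when the coefficients are not proportional, the sketch below uses the standard single-group relation R^2 = β'Σβ / (β'Σβ + σ^2). How the package pools this over the two groups is not stated here, so the sketch does not attempt it; signal_var and sigma_for_R2 are hypothetical helpers.

```r
# Variance of the linear part X %*% beta when Cov(X) = co.
signal_var <- function(beta, co) drop(t(beta) %*% co %*% beta)

# Error sd that gives a target R2 within a single group.
sigma_for_R2 <- function(beta, co, R2) {
  sqrt(signal_var(beta, co) * (1 - R2) / R2)
}

p <- 3
co <- matrix(0.2, p, p)
diag(co) <- 1

# Two coefficient vectors that are not proportional to each other.
bet <- list(c(1, 0, 0), c(0, 1, 1))

# Calibrate sigma so that the first group has R2 = 0.5 ...
sigma1 <- sigma_for_R2(bet[[1]], co, R2 = 0.5)

# ... then compute the implied R2 in each group under that common sigma.
r2_group <- sapply(bet, function(beta) {
  s <- signal_var(beta, co)
  s / (s + sigma1^2)
})
```

With a common error standard deviation, the two groups generally end up with different R^2 values because their signal variances differ.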

data_generator3 constructs a data set in which the outcome under each treatment condition is given for all subjects, and no error is added to the mean outcome. This makes the generator useful for obtaining the "true" value of a treatment decision. The model is the same as in data_generator2, without the error term:

y_j = β_{j0} + Xβ_j, j = 1,2.
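The error-free outcomes and the two summary values can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it assumes that a larger outcome is better, so the optimal assignment for a subject is the treatment with the larger outcome, and all numeric settings are arbitrary.

```r
set.seed(2)
n <- 1000
p <- 3
co <- matrix(0.2, p, p)
diag(co) <- 1
bet <- list(c(1, 0, 0), c(0, 1, 1))
inter <- c(0, 0)

# Covariates with covariance co (chol() requires a positive definite co).
X <- matrix(rnorm(n * p), n) %*% chol(co)

# Error-free outcome of every subject under each treatment.
y0 <- inter[1] + drop(X %*% bet[[1]])
y1 <- inter[2] + drop(X %*% bet[[2]])

# Assumption: larger outcomes are better.
# Mean outcome if everyone receives their better treatment ...
oracle <- mean(pmax(y0, y1))
# ... and if everyone receives their worse treatment.
invOracle <- mean(pmin(y0, y1))
```

Because both outcomes are known for every subject, any treatment decision rule can be scored exactly against these two bounds.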

Value

The output differs by function:

For the function data_generator1

  1. dat A data frame whose first and second columns are the treatment group index and the outcome, respectively; each remaining column is a covariate.

  2. bet A list with two elements, each a vector of length p, giving the regression coefficients for the two treatment groups respectively

  3. error_12 A vector of length three representing the standard deviation of ε and the variance explained by the linear part for the first and second treatment groups, respectively.

For the function data_generator2

  1. dat A data frame whose first and second columns are the treatment group index and the outcome, respectively; each remaining column is a covariate.

  2. bet A list with two elements, each a vector of length p, giving the regression coefficients for the two treatment groups respectively

  3. error A scalar representing the standard deviation of ε

For the function data_generator3

  1. y0 Outcome vector under the first treatment assignment

  2. y1 Outcome vector under the second treatment assignment

  3. X Design matrix for the covariates

  4. oracle Average of the outcome if each subject takes the optimal treatment assignment

  5. invOracle Average of the outcome if each subject does not take the optimal treatment assignment

Examples

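# Construct the covariance matrix (unit variances, covariance 0.2 between covariates)
co <- matrix(0.2, 30, 30)
diag(co) <- 1

# Simulate data with proportional coefficients
dataEx <- data_generator1(d = 0.3, R2 = 0.5, v2 = 1, n = 3000,
                          co = co, beta1 = rep(1, 30), inter = c(0, 0))

# Check the R squared of the simulated data set
dat <- dataEx[[1]]
summary(lm(V2 ~ factor(trt) * (V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 + V11 + V12 +
  V13 + V14 + V15 + V16 + V17 + V18 + V19 + V20 + V21 + V22 + V23 + V24 + V25 +
  V26 + V27 + V28 + V29 + V30 + V31 + V32), data = dat))

# Error-free outcomes under both treatments, reusing the coefficients from dataEx
bigData <- data_generator3(n = 10000, co = co, bet = dataEx[[2]], inter = c(0, 0))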

suzhesuzhe/GEM documentation built on May 26, 2017, 11 p.m.