gendata: Simulation Scenario from Bhatnagar et al. (2018+) sail paper


View source: R/simulations.R

Description

Generates data for the simulation scenarios presented in the accompanying paper. This function requires the truncnorm package to be installed.

Usage

gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE,
  SNR, parameterIndex)

Arguments

n

number of observations

p

number of main effect variables (X)

corr

correlation between predictors

E

simulated environment vector of length n. Can be continuous or integer valued. Factors must be converted to numeric. Default: truncnorm::rtruncnorm(n, a = -1, b = 1)

betaE

exposure effect size

SNR

signal to noise ratio

parameterIndex

simulation scenario index. See details for more information.

Details

We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.

Heredity Property

Truth obeys strong hierarchy (parameterIndex = 1)

Y* = ∑_{j=1}^{4} f_j(X_{j}) + β_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})

Truth obeys weak hierarchy (parameterIndex = 2)

Y* = f_1(X_{1}) + f_2(X_{2}) + β_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})

Truth only has interactions (parameterIndex = 3)

Y* = X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})

Non-linearity

Truth is linear (parameterIndex = 4)

Y* = ∑_{j=1}^{4}β_j X_{j} + β_E * X_{E} + X_{E} * X_{3} + X_{E} * X_{4}

Interactions

Truth only has main effects (parameterIndex = 5)

Y* = ∑_{j=1}^{4} f_j(X_{j}) + β_E * X_{E}

The functions are from the paper by Lin and Zhang (2006):

f1

f1 <- function(t) 5 * t

f2

f2 <- function(t) 3 * (2 * t - 1)^2

f3

f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))

f4

f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)
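As a rough illustration of how these component functions enter the scenarios above, the following sketch evaluates Y* for the strong hierarchy case (parameterIndex = 1) and the main-effects-only case (parameterIndex = 5). The inputs `X`, `E`, and `betaE` are made-up stand-ins (uniform draws), not the sampling scheme used by `gendata`, and the names `y_star_strong` and `y_star_main` are hypothetical.

```r
# Component functions as listed above (Lin and Zhang, 2006)
f1 <- function(t) 5 * t
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
                         0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 +
                         0.5 * sin(2 * pi * t)^3)

set.seed(1)
n <- 50
# Assumption: illustrative inputs only, not how gendata draws X and E
X <- matrix(runif(n * 4), nrow = n, ncol = 4)
E <- runif(n, min = -1, max = 1)
betaE <- 2

# Sum of the four main effects, shared by both scenarios
main_effects <- f1(X[, 1]) + f2(X[, 2]) + f3(X[, 3]) + f4(X[, 4])

# Strong hierarchy (parameterIndex = 1): main effects, exposure, and
# interactions of E with f3(X3) and f4(X4)
y_star_strong <- main_effects + betaE * E + E * f3(X[, 3]) + E * f4(X[, 4])

# Main effects only (parameterIndex = 5): no interaction terms
y_star_main <- main_effects + betaE * E
```

The two linear predictors differ only by the interaction terms, so `y_star_strong - y_star_main` recovers `E * (f3(X[, 3]) + f4(X[, 4]))`.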

The response is generated as

Y = Y* + k*error

where Y* is the linear predictor, the error term is drawn from a standard normal distribution, and k is chosen so that the signal-to-noise ratio equals SNR = Var(Y*)/Var(k*error); that is, the variance of Y due to the error term is 1/SNR times the variance of Y due to Y*.
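One way to obtain a k consistent with this description is to solve Var(Y*) / (k^2 * Var(error)) = SNR for k using sample variances. The sketch below does that; `y_star` is an arbitrary stand-in for the linear predictor, and this is an illustration under that assumption rather than a copy of the package source.

```r
set.seed(42)
n <- 200
SNR <- 2

# Assumption: placeholder linear predictor; in gendata this would be Y*
y_star <- rnorm(n, mean = 1, sd = 3)

# Standard normal error term
error <- rnorm(n)

# Solve var(y_star) / (k^2 * var(error)) = SNR for k
k <- sqrt(var(y_star) / (SNR * var(error)))

# Response on the requested signal-to-noise scale
y <- y_star + k * error

# Empirical signal-to-noise ratio; matches SNR up to floating-point error
snr_check <- var(y_star) / var(k * error)
```

Because k is computed from the sample variances, the empirical ratio matches `SNR` for the simulated data set itself, not just in expectation.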

The covariates are simulated as described in Huang et al. (2010). First, we generate w_1,…, w_p, u, v independently from a Normal(0,1) distribution truncated to the interval [0,1], for each observation i = 1,…,n. Then we set x_j = (w_j + t*u)/(1 + t) for j = 1,…, 4 and x_j = (w_j + t*v)/(1 + t) for j = 5,…, p, where the parameter t controls the amount of correlation among predictors. This leads to a compound symmetry correlation structure where Corr(x_j,x_k) = t^2/(1+t^2) for j ≠ k within the same block, i.e. for 1 ≤ j,k ≤ 4 and for 5 ≤ j,k ≤ p, while the covariates of the nonzero components (j ≤ 4) and the zero components (j ≥ 5) are independent of each other.
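The construction above can be sketched in base R as follows. The helpers `rtnorm01` and `sim_covariates` are hypothetical names introduced for illustration; the truncated normal draws use an inverse-CDF sampler instead of the truncnorm package so that the example has no dependencies. The sketch takes t directly as an argument, since how the `corr` argument of `gendata` maps to t is not stated here.

```r
# Hypothetical helper: Normal(0, 1) truncated to [a, b] via the inverse CDF
rtnorm01 <- function(n, a = 0, b = 1) {
  qnorm(runif(n, min = pnorm(a), max = pnorm(b)))
}

# Hypothetical helper: covariates following the Huang et al. (2010) scheme
sim_covariates <- function(n, p, t) {
  stopifnot(p >= 5)
  W <- matrix(rtnorm01(n * p), nrow = n, ncol = p)
  u <- rtnorm01(n)  # shared component for columns 1 to 4
  v <- rtnorm01(n)  # shared component for columns 5 to p
  X <- W
  # Adding a length-n vector to an n-row matrix recycles it down each column
  X[, 1:4] <- (W[, 1:4] + t * u) / (1 + t)
  X[, 5:p] <- (W[, 5:p] + t * v) / (1 + t)
  colnames(X) <- paste0("X", seq_len(p))
  X
}

set.seed(123)
t_val <- 1
X <- sim_covariates(n = 20000, p = 10, t = t_val)

# Theoretical within-block correlation: t^2 / (1 + t^2)
rho_theory <- t_val^2 / (1 + t_val^2)

rho_within <- cor(X[, 1], X[, 2])   # should be close to rho_theory
rho_between <- cor(X[, 1], X[, 5])  # should be close to 0
```

With a large n, `rho_within` should sit near `rho_theory` and `rho_between` near zero, which is a quick sanity check on the block structure.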

Value

A list with the following elements:

x

matrix of dimension n x p of simulated main effects

y

simulated response vector of length n

e

simulated exposure vector of length n

Y.star

linear predictor vector of length n

f1

the function f1 evaluated at x_1 (f1(X1))

f2

the function f2 evaluated at x_2 (f2(X2))

f3

the function f3 evaluated at x_3 (f3(X3))

f4

the function f4 evaluated at x_4 (f4(X4))

betaE

the value for β_E

f1.f

the function f1

f2.f

the function f2

f3.f

the function f3

f4.f

the function f4

X1

vector of length n containing the first predictor

X2

vector of length n containing the second predictor

X3

vector of length n containing the third predictor

X4

vector of length n containing the fourth predictor

scenario

character string giving the simulation scenario identifier as described in Bhatnagar et al. (2018+)

causal

character vector of causal variable names

not_causal

character vector of noise variables

References

Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.

Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.

Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.

Examples

DT <- gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1, parameterIndex = 1)

sahirbhatnagar/sail documentation built on July 17, 2021, 5:10 a.m.