gendata: Simulation Scenario from Bhatnagar et al. (2018+) sail paper
In sahirbhatnagar/sail: Sparse Additive Interaction Learning

Description Usage Arguments Details Value References Examples

Generates data for the different simulation studies presented in the accompanying paper. This function requires the truncnorm package to be installed.

gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE, SNR, parameterIndex)

`n`	number of observations
`p`	number of main effect variables (X)
`corr`	correlation between predictors
`E`	simulated environment vector of length `n`. Can be continuous or integer valued. Factors must be converted to numeric. Default: `truncnorm::rtruncnorm(n, a = -1, b = 1)`
`betaE`	exposure effect size
`SNR`	signal to noise ratio
`parameterIndex`	simulation scenario index. See details for more information.

We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.

Heredity Property

Truth obeys strong hierarchy (parameterIndex = 1)

Y* = ∑_{j=1}^{4} f_j(X_{j}) + β_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})

Truth obeys weak hierarchy (parameterIndex = 2)

Y* = f_1(X_{1}) + f_2(X_{2}) + β_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})

Truth only has interactions (parameterIndex = 3)

Y* = X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})

Non-linearity

Truth is linear (parameterIndex = 4)

Y* = ∑_{j=1}^{4}β_j X_{j} + β_E * X_{E} + X_{E} * X_{3} + X_{E} * X_{4}

Interactions

Truth only has main effects (parameterIndex = 5)

Y* = ∑_{j=1}^{4} f_j(X_{j}) + β_E * X_{E}

The functions are from the paper by Lin and Zhang (2006):

f1: f1 <- function(t) 5 * t
f2: f2 <- function(t) 3 * (2 * t - 1)^2
f3: f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4: f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)
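As a rough illustration of how these component functions combine into the linear predictor, the sketch below builds Y* for the main-effects-only scenario (parameterIndex = 5) and the strong-hierarchy scenario (parameterIndex = 1) in base R. The uniform covariates, the uniform exposure, and the names `X`, `XE`, `main_effects`, `ystar_main` and `ystar_strong` are assumptions made for illustration; this is not the package's internal code.

```r
# Component functions as listed above (Lin and Zhang, 2006)
f1 <- function(t) 5 * t
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
                       0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 +
                       0.5 * sin(2 * pi * t)^3)

set.seed(1)
n <- 200
betaE <- 2

# Assumption: simple stand-ins for the four causal covariates and the exposure
X  <- matrix(runif(n * 4), nrow = n, ncol = 4)
XE <- runif(n, min = -1, max = 1)

# Main effects shared by scenarios 1 and 5
main_effects <- f1(X[, 1]) + f2(X[, 2]) + f3(X[, 3]) + f4(X[, 4])

# parameterIndex = 5: main effects only
ystar_main <- main_effects + betaE * XE

# parameterIndex = 1: main effects plus the two interactions with the exposure
ystar_strong <- ystar_main + XE * f3(X[, 3]) + XE * f4(X[, 4])
```

The other scenarios follow the same pattern by dropping or linearizing terms according to the formulas above.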

The response is generated as

Y = Y* + k*error

where Y* is the linear predictor, the error term is generated from a standard normal distribution, and k is chosen such that the signal-to-noise ratio is SNR = Var(Y*)/Var(k*error), i.e., the variance of the response variable Y due to the noise term is 1/SNR times the variance of Y due to Y*.
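The scaling constant can be worked out directly from the definition: setting Var(Y*)/Var(k*error) = SNR gives k = sqrt(Var(Y*) / (SNR * Var(error))). The following is a minimal sketch of that calculation in base R, assuming an arbitrary stand-in for Y* and assuming the sample variance of the error is used in the denominator (with the population variance of 1, the formula reduces to sqrt(Var(Y*)/SNR)). The variable names are hypothetical.

```r
set.seed(2)
n <- 500
SNR <- 2

# Assumption: an arbitrary stand-in for the linear predictor Y*
ystar <- 3 * sin(seq(0, 2 * pi, length.out = n))
error <- rnorm(n)

# Scale the noise so that var(ystar) / var(k * error) equals SNR in this sample
k <- sqrt(var(ystar) / (SNR * var(error)))
y <- ystar + k * error

# Realized signal-to-noise ratio; matches SNR up to floating-point error
realized_snr <- var(ystar) / var(k * error)
```

A larger SNR yields a smaller k, so the response tracks the linear predictor more closely.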

The covariates are simulated as described in Huang et al. (2010). First, we generate w_1, …, w_p, u, v independently from a Normal(0,1) distribution truncated to the interval [0,1], for each observation i = 1, …, n. Then we set x_j = (w_j + t*u)/(1 + t) for j = 1, …, 4 and x_j = (w_j + t*v)/(1 + t) for j = 5, …, p, where the parameter t controls the amount of correlation among predictors. This leads to a compound symmetry correlation structure within each block: Corr(x_j, x_k) = t^2/(1 + t^2) for 1 ≤ j < k ≤ 4 and for 5 ≤ j < k ≤ p. The covariates of the nonzero components (j ≤ 4) and of the zero components (j ≥ 5) are independent of each other.
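The construction above can be sketched in a few lines of base R to check the stated correlation empirically. This is an illustration only: the inverse-CDF helper `rtnorm01` is a hypothetical stand-in for `truncnorm::rtruncnorm(n, a = 0, b = 1)`, and `t_corr` plays the role of the parameter t in the text.

```r
# Assumption: inverse-CDF sampler for a Normal(0,1) truncated to [0, 1],
# used here instead of truncnorm::rtruncnorm to keep the sketch in base R
rtnorm01 <- function(n) qnorm(runif(n, min = pnorm(0), max = pnorm(1)))

set.seed(3)
n <- 20000
p <- 8
t_corr <- 1   # correlation parameter; t_corr = 0 gives independent predictors

W <- matrix(rtnorm01(n * p), nrow = n, ncol = p)
u <- rtnorm01(n)
v <- rtnorm01(n)

X <- matrix(NA_real_, nrow = n, ncol = p)
X[, 1:4] <- (W[, 1:4] + t_corr * u) / (1 + t_corr)   # nonzero components share u
X[, 5:p] <- (W[, 5:p] + t_corr * v) / (1 + t_corr)   # zero components share v

target        <- t_corr^2 / (1 + t_corr^2)   # theoretical within-block correlation
within_block  <- cor(X[, 1], X[, 2])         # should be close to target
between_block <- cor(X[, 1], X[, 5])         # should be close to 0
```

Because every w_j, u and v lies in [0, 1], each x_j is a weighted average of two such values and also lies in [0, 1].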

A list with the following elements:

x: matrix of dimension n x p of simulated main effects
y: simulated response vector of length n
e: simulated exposure vector of length n
Y.star: linear predictor vector of length n
f1: the function f1 evaluated at x_1 (f1(X1))
f2: the function f2 evaluated at x_2 (f2(X2))
f3: the function f3 evaluated at x_3 (f3(X3))
f4: the function f4 evaluated at x_4 (f4(X4))
betaE: the value for β_E
f1.f: the function f1
f2.f: the function f2
f3.f: the function f3
f4.f: the function f4
X1: an n length vector of the first predictor
X2: an n length vector of the second predictor
X3: an n length vector of the third predictor
X4: an n length vector of the fourth predictor
scenario: a character representing the simulation scenario identifier as described in Bhatnagar et al. (2018+)
causal: character vector of causal variable names
not_causal: character vector of noise variables

Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.

Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.

Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.

DT <- gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1, parameterIndex = 1)
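One way to inspect the returned list is sketched below, using only the element names documented in the Value section. The sketch assumes that `gendata` is exported from the sail namespace and that both sail and truncnorm are installed; the guard skips the call otherwise.

```r
# Runs only if both packages are installed; otherwise the block is skipped
if (requireNamespace("sail", quietly = TRUE) &&
    requireNamespace("truncnorm", quietly = TRUE)) {
  set.seed(4)
  DT <- sail::gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1,
                      parameterIndex = 1)
  print(dim(DT$x))       # dimensions of the simulated main-effects matrix
  print(length(DT$y))    # length of the simulated response
  print(length(DT$e))    # length of the simulated exposure
  print(DT$causal)       # names of the causal variables
}
```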

sahirbhatnagar/sail documentation built on July 17, 2021, 5:10 a.m.
