
Generates data for the different simulation studies presented in the
accompanying paper. This function requires the `truncnorm` package to be
installed.

```
gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE,
  SNR, parameterIndex)
```

| Argument | Description |
|---|---|
| `n` | number of observations |
| `p` | number of main effect variables (X) |
| `corr` | correlation between predictors |
| `E` | simulated environment vector of length `n` |
| `betaE` | exposure effect size |
| `SNR` | signal-to-noise ratio |
| `parameterIndex` | simulation scenario index. See details for more information. |

We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.

- Heredity Property
  - Truth obeys strong hierarchy (`parameterIndex = 1`): $Y^* = \sum_{j=1}^{4} f_j(X_j) + \beta_E X_E + X_E f_3(X_3) + X_E f_4(X_4)$
  - Truth obeys weak hierarchy (`parameterIndex = 2`): $Y^* = f_1(X_1) + f_2(X_2) + \beta_E X_E + X_E f_3(X_3) + X_E f_4(X_4)$
  - Truth only has interactions (`parameterIndex = 3`): $Y^* = X_E f_3(X_3) + X_E f_4(X_4)$
- Non-linearity
  - Truth is linear (`parameterIndex = 4`): $Y^* = \sum_{j=1}^{4} \beta_j X_j + \beta_E X_E + X_E X_3 + X_E X_4$
- Interactions
  - Truth only has main effects (`parameterIndex = 5`): $Y^* = \sum_{j=1}^{4} f_j(X_j) + \beta_E X_E$

The functions are from the paper by Lin and Zhang (2006):

- f1: `f1 <- function(t) 5 * t`
- f2: `f2 <- function(t) 3 * (2 * t - 1)^2`
- f3: `f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))`
- f4: `f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)`
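As a rough illustration of how the component functions combine into the linear predictor of each scenario, the sketch below builds Y* for the non-linear scenarios (1, 2, 3 and 5). The helper `linear_predictor` and the toy inputs `X` and `E` are hypothetical and not part of the package; scenario 4 is left out because the coefficients β_j are not given here.

```r
# Component functions as listed above (Lin and Zhang, 2006)
f1 <- function(t) 5 * t
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
  0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)

# Hypothetical helper (not part of the package): linear predictor Y* for the
# non-linear scenarios. X is an n x p matrix with p >= 4, E has length n.
linear_predictor <- function(X, E, betaE, parameterIndex) {
  main12 <- f1(X[, 1]) + f2(X[, 2])
  main34 <- f3(X[, 3]) + f4(X[, 4])
  inter <- E * f3(X[, 3]) + E * f4(X[, 4])
  switch(as.character(parameterIndex),
    "1" = main12 + main34 + betaE * E + inter, # strong hierarchy
    "2" = main12 + betaE * E + inter,          # weak hierarchy
    "3" = inter,                               # interactions only
    "5" = main12 + main34 + betaE * E,         # main effects only
    stop("scenario not covered by this sketch")
  )
}

set.seed(1)
X <- matrix(runif(20 * 6), nrow = 20) # toy covariates in [0, 1]
E <- runif(20, -1, 1)                 # toy exposure values
Y.star <- linear_predictor(X, E, betaE = 2, parameterIndex = 1)
```

The strong and weak hierarchy scenarios differ only in the main effects of the third and fourth predictors, which is why those terms are grouped separately in the helper.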

The response is generated as

$Y = Y^* + k \cdot \text{error}$

where $Y^*$ is the linear predictor, the error term is generated from a standard normal distribution, and $k$ is chosen such that the signal-to-noise ratio is $SNR = Var(Y^*) / Var(k \cdot \text{error})$, i.e., the variance of the response variable $Y$ due to error is $1/SNR$ of the variance of $Y$ due to $Y^*$. Equivalently, $k = \sqrt{Var(Y^*) / (SNR \cdot Var(\text{error}))}$.
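The scaling of the error term can be sketched as follows. This is a minimal illustration that assumes `k` is computed from sample variances; the stand-in linear predictor and the variable names are chosen for the example only.

```r
set.seed(123)
n <- 200
SNR <- 2

# Stand-in linear predictor; any numeric vector of length n works here
Y.star <- 5 * runif(n)

# Standard normal errors, scaled so that Var(Y*) / Var(k * error) equals SNR
error <- rnorm(n)
k <- sqrt(var(Y.star) / (SNR * var(error)))
Y <- Y.star + k * error

# Empirical signal-to-noise ratio of the simulated response
snr_hat <- var(Y.star) / var(k * error)
```

Because `k` is derived from the same sample variances, `snr_hat` matches the requested `SNR` up to floating-point error.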

The covariates are simulated as described in Huang et al. (2010). First, we
generate $w_1, \ldots, w_p, u, v$ independently from $Normal(0,1)$ truncated
to the interval `[0,1]` for $i = 1, \ldots, n$. Then we set
$x_j = (w_j + t u)/(1 + t)$ for $j = 1, \ldots, 4$ and
$x_j = (w_j + t v)/(1 + t)$ for $j = 5, \ldots, p$, where the parameter $t$
controls the amount of correlation among predictors. This leads to a compound
symmetry correlation structure where $Corr(x_j, x_k) = t^2/(1+t^2)$ for
$1 \le j, k \le 4$ with $j \ne k$, and $Corr(x_j, x_k) = t^2/(1+t^2)$ for
$5 \le j, k \le p$ with $j \ne k$, but the covariates of the nonzero and zero
components are independent.
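The covariate construction can be checked numerically with a short sketch. To keep it self-contained, the truncated normal draws use a hypothetical inverse-CDF helper `rtnorm` in base R rather than the `truncnorm` package, and `t_corr` stands in for the parameter t; both names are assumptions made for this example.

```r
set.seed(42)
n <- 20000  # large n so the empirical correlations are stable
p <- 6
t_corr <- 1 # theoretical within-block correlation: t^2 / (1 + t^2) = 0.5

# Hypothetical helper: Normal(0, 1) truncated to [a, b] via the inverse CDF
rtnorm <- function(n, a = 0, b = 1) qnorm(runif(n, pnorm(a), pnorm(b)))

W <- matrix(rtnorm(n * p), nrow = n, ncol = p)
u <- rtnorm(n)
v <- rtnorm(n)

X <- matrix(NA_real_, nrow = n, ncol = p)
X[, 1:4] <- (W[, 1:4] + t_corr * u) / (1 + t_corr) # first block shares u
X[, 5:p] <- (W[, 5:p] + t_corr * v) / (1 + t_corr) # second block shares v

within_first <- cor(X[, 1], X[, 2])  # close to t^2 / (1 + t^2)
within_second <- cor(X[, 5], X[, 6]) # close to t^2 / (1 + t^2)
across <- cor(X[, 1], X[, 5])        # close to 0
```

With `t_corr = 0` the shared terms vanish and all covariates are independent; larger values pull the within-block correlation toward 1.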

A list with the following elements:

- x: matrix of dimension `n x p` of simulated main effects
- y: simulated response vector of length `n`
- e: simulated exposure vector of length `n`
- Y.star: linear predictor vector of length `n`
- f1: the function `f1` evaluated at `x_1` (`f1(X1)`)
- f2: the function `f2` evaluated at `x_2` (`f2(X2)`)
- f3: the function `f3` evaluated at `x_3` (`f3(X3)`)
- f4: the function `f4` evaluated at `x_4` (`f4(X4)`)
- betaE: the value for $\beta_E$
- f1.f: the function `f1`
- f2.f: the function `f2`
- f3.f: the function `f3`
- f4.f: the function `f4`
- X1: vector of length `n` containing the first predictor
- X2: vector of length `n` containing the second predictor
- X3: vector of length `n` containing the third predictor
- X4: vector of length `n` containing the fourth predictor
- scenario: a character string giving the simulation scenario identifier as described in Bhatnagar et al. (2018+)
- causal: character vector of causal variable names
- not_causal: character vector of noise variable names

Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.

Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.

Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.

```
DT <- gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1, parameterIndex = 1)
```
