Function that generates data for the different simulation studies presented in the accompanying paper. This function requires the truncnorm package to be installed.
gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE,
  SNR, parameterIndex)
n | number of observations
p | number of main effect variables (X)
corr | correlation between predictors
E | simulated environment vector of length n
betaE | exposure effect size
SNR | signal to noise ratio
parameterIndex | simulation scenario index. See details for more information.
We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.
Truth obeys strong hierarchy (parameterIndex = 1)
Y* = ∑_{j=1}^{4} f_j(X_{j}) + β_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})
Truth obeys weak hierarchy (parameterIndex = 2)
Y* = f_1(X_{1}) + f_2(X_{2}) + β_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})
Truth only has interactions (parameterIndex = 3)
Y* = X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4})
Truth is linear (parameterIndex = 4)
Y* = ∑_{j=1}^{4} β_j X_{j} + β_E * X_{E} + X_{E} * X_{3} + X_{E} * X_{4}
Truth only has main effects (parameterIndex = 5)
Y* = ∑_{j=1}^{4} f_j(X_{j}) + β_E * X_{E}
The functions are from the paper by Lin and Zhang (2006):
f1 <- function(t) 5 * t
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)
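As an illustration of how the scenario formulas combine these functions, here is a minimal sketch. The helper `linear_predictor()` is hypothetical and not part of the package; it only covers the nonlinear scenarios (1, 2, 3 and 5), because the coefficients β_j of the linear scenario are not given here. The inputs `X` and `E` are arbitrary stand-ins drawn with `runif()`, not the covariates produced by `gendata()`.

```r
# Component functions as listed above (Lin and Zhang, 2006)
f1 <- function(t) 5 * t
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
                         0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 +
                         0.5 * sin(2 * pi * t)^3)

# Hypothetical helper (an assumption, not the package's code): builds the
# linear predictor Y* for scenarios 1, 2, 3 and 5 from the formulas above.
linear_predictor <- function(X, E, betaE, parameterIndex) {
  x1 <- X[, 1]; x2 <- X[, 2]; x3 <- X[, 3]; x4 <- X[, 4]
  switch(as.character(parameterIndex),
    "1" = f1(x1) + f2(x2) + f3(x3) + f4(x4) + betaE * E +
          E * f3(x3) + E * f4(x4),                       # strong hierarchy
    "2" = f1(x1) + f2(x2) + betaE * E +
          E * f3(x3) + E * f4(x4),                       # weak hierarchy
    "3" = E * f3(x3) + E * f4(x4),                       # interactions only
    "5" = f1(x1) + f2(x2) + f3(x3) + f4(x4) + betaE * E, # main effects only
    stop("scenario not covered by this sketch")
  )
}

# Stand-in inputs for illustration only
set.seed(1)
n <- 10
X <- matrix(runif(n * 4), nrow = n)
E <- runif(n, min = -1, max = 1)

y_star_1 <- linear_predictor(X, E, betaE = 2, parameterIndex = 1)
y_star_5 <- linear_predictor(X, E, betaE = 2, parameterIndex = 5)
```

Scenarios 1 and 5 differ only by the two interaction terms, so `y_star_1 - y_star_5` equals `E * (f3(X[, 3]) + f4(X[, 4]))`.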
The response is generated as
Y = Y* + k*error
where Y* is the linear predictor, the error term is generated from a standard normal distribution, and k is chosen such that the signal-to-noise ratio is SNR = Var(Y*)/Var(k*error), i.e., the variance of the response variable Y due to error is 1/SNR of the variance of Y due to Y*.
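The scaling constant can be worked out directly from the definition: setting Var(Y*)/Var(k*error) = SNR gives k = sqrt(Var(Y*) / (SNR * Var(error))). The sketch below is one way to compute it using sample variances; the vector `y_star` is an arbitrary stand-in, and the package's own code may compute k differently.

```r
set.seed(42)
n <- 200
snr <- 2                        # target signal-to-noise ratio

y_star <- rnorm(n, sd = 3)      # stand-in for the linear predictor Y*
error  <- rnorm(n)              # standard normal noise

# Solve Var(Y*) / Var(k * error) = SNR for k (sample variances)
k <- sqrt(var(y_star) / (snr * var(error)))

y <- y_star + k * error

# Empirical signal-to-noise ratio of the generated response
empirical_snr <- var(y_star) / var(k * error)
```

Because k is computed from the sample variances, `empirical_snr` matches `snr` up to floating-point error.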
The covariates are simulated as described in Huang et al. (2010). First, we generate w_1,…, w_p, u, v independently from Normal(0,1) truncated to the interval [0,1] for i = 1,…, n. Then we set x_j = (w_j + t*u)/(1 + t) for j = 1,…, 4 and x_j = (w_j + t*v)/(1 + t) for j = 5,…, p, where the parameter t controls the amount of correlation among predictors. This leads to a compound symmetry correlation structure where Corr(x_j,x_k) = t^2/(1+t^2) for 1 ≤ j ≤ 4, 1 ≤ k ≤ 4, j ≠ k, and Corr(x_j,x_k) = t^2/(1+t^2) for 5 ≤ j ≤ p, 5 ≤ k ≤ p, j ≠ k, but the covariates of the nonzero and zero components are independent.
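The construction above can be sketched in base R as follows. To keep the example self-contained, the truncated normal draws use inverse-CDF sampling through the hypothetical helper `rtrunc01()` rather than `truncnorm::rtruncnorm()`; the values of `n`, `p` and `t_par` are arbitrary choices for illustration.

```r
set.seed(123)
n <- 2000      # observations
p <- 8         # predictors
t_par <- 1     # the parameter t in the text (arbitrary value here)

# Hypothetical helper: Normal(0,1) truncated to [0,1] via inverse-CDF
# sampling, standing in for truncnorm::rtruncnorm(m, a = 0, b = 1)
rtrunc01 <- function(m) qnorm(runif(m, min = pnorm(0), max = pnorm(1)))

W <- matrix(rtrunc01(n * p), nrow = n, ncol = p)  # w_1, ..., w_p
u <- rtrunc01(n)                                  # shared by x_1, ..., x_4
v <- rtrunc01(n)                                  # shared by x_5, ..., x_p

X <- W
# A length-n vector added to an n-row matrix is recycled down each column
X[, 1:4] <- (W[, 1:4] + t_par * u) / (1 + t_par)
X[, 5:p] <- (W[, 5:p] + t_par * v) / (1 + t_par)

within_block  <- cor(X[, 1], X[, 2])   # expected near t^2 / (1 + t^2)
between_block <- cor(X[, 1], X[, 5])   # expected near 0
```

With `t_par <- 1` the within-block correlation should be close to 0.5, while predictors from different blocks should be nearly uncorrelated.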
A list with the following elements:
matrix of dimension n x p of simulated main effects
simulated response vector of length n
simulated exposure vector of length n
linear predictor vector of length n
the function f1 evaluated at x_1 (f1(X1))
the function f2 evaluated at x_2 (f2(X2))
the function f3 evaluated at x_3 (f3(X3))
the function f4 evaluated at x_4 (f4(X4))
the value for β_E
the function f1
the function f2
the function f3
the function f4
a vector of length n containing the first predictor
a vector of length n containing the second predictor
a vector of length n containing the third predictor
a vector of length n containing the fourth predictor
a character representing the simulation scenario identifier as described in Bhatnagar et al. (2018+)
character vector of causal variable names
character vector of noise variables
Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.
Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.
Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.
DT <- gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1, parameterIndex = 1)