GenSyntheticHighCorr | R Documentation |
Generates a synthetic dataset as follows: 1) Generate a correlation matrix, SIG, where item [i, j] = A^|i-j| (A is base_cor). 2) Draw X from a multivariate normal distribution with mean mu and covariance SIG. 3) Generate a vector B in which roughly every (p/k)-th entry is set to 1 and the rest are zeros. 4) Sample every element of the noise vector e from N(0,1). 5) Set y = XB + b0 + e.
GenSyntheticHighCorr( n, p, k, seed, rho = 0, b0 = 0, snr = 1, mu = 0, base_cor = 0.9 )
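The generation steps in the description can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name gen_high_corr_sketch is hypothetical, the multivariate normal draw uses a Cholesky factor, the noise is assumed to be scaled so that Var(XB)/Var(e) is about snr, and the rho thresholding step is left out.

```r
# Hypothetical helper for illustration only; not the package's own code.
gen_high_corr_sketch <- function(n, p, k, seed, b0 = 0, snr = 1,
                                 mu = 0, base_cor = 0.9) {
  set.seed(seed)

  # 1) Correlation matrix with entry [i, j] = base_cor^|i - j|
  SIG <- base_cor^abs(outer(seq_len(p), seq_len(p), "-"))

  # 2) Draw n rows from N(mu, SIG) using the Cholesky factor R,
  #    where t(R) %*% R equals SIG
  R <- chol(SIG)
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- sweep(Z %*% R, 2, rep_len(mu, p), "+")

  # 3) Roughly every (p/k)-th coefficient is 1, the rest are 0
  B <- numeric(p)
  B[seq(from = 1, to = p, by = max(1, floor(p / k)))] <- 1

  # 4) Gaussian noise, scaled so Var(XB) / Var(e) is about snr (assumption)
  signal <- as.vector(X %*% B)
  e <- rnorm(n, sd = sqrt(var(signal) / snr))

  # 5) Response
  y <- signal + b0 + e

  list(X = X, y = y, B = B, e = e, b0 = b0)
}

d <- gen_high_corr_sketch(n = 200, p = 20, k = 4, seed = 1)
```

The stride used in step 3 can yield slightly more or fewer than k non-zeros when p is not a multiple of k, which is consistent with the "~p/k" wording of the description.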
n |
Number of samples |
p |
Number of features |
k |
Number of non-zeros in true vector of coefficients |
seed |
The seed used for randomly generating the data |
rho |
The threshold for setting values to 0: if |X(i, j)| < rho, then X(i, j) <- 0. The default rho = 0 leaves X unchanged. |
b0 |
Intercept value added to y (y = XB + b0 + e). |
snr |
Desired signal-to-noise ratio, defined as SNR = Var(XB)/Var(e). This sets the magnitude of the error term e. |
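As a rough illustration of the SNR definition, the noise standard deviation that gives a target ratio can be derived from the signal variance. The snippet below is a sketch under stated assumptions: signal is a stand-in for XB drawn at random, and sd_e and empirical_snr are hypothetical names, not part of the function's return value.

```r
set.seed(42)
n <- 10000

# Stand-in for the signal XB (assumption: any numeric vector works here)
signal <- rnorm(n, sd = 3)

snr <- 4

# SNR = Var(XB) / Var(e)  =>  sd(e) = sqrt(Var(XB) / SNR)
sd_e <- sqrt(var(signal) / snr)
e <- rnorm(n, sd = sd_e)

# The empirical ratio should be close to the target, up to sampling error
empirical_snr <- var(signal) / var(e)
```

Larger snr values shrink the noise relative to the signal, so the true support of B becomes easier to recover.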
mu |
The mean for drawing from the Multivariate Normal Distribution. A scalar or a vector of length p. |
base_cor |
The base correlation A: entry [i, j] of the correlation matrix is A^|i-j|. |
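To see what this correlation structure looks like, a small matrix can be built directly in base R. This is only a sketch: the variable names are illustrative, and the package may construct the matrix differently internally.

```r
base_cor <- 0.9
p <- 4

# Entry [i, j] = base_cor^|i - j|: ones on the diagonal,
# with correlation decaying geometrically away from it
SIG <- base_cor^abs(outer(seq_len(p), seq_len(p), "-"))
```

Here SIG[1, 2] is 0.9 and SIG[1, 4] is 0.9^3 = 0.729, so neighbouring features are highly correlated and distant ones less so.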
A list containing: the data matrix X, the response vector y, the coefficients B, the error vector e, the intercept term b0.