gen_data | R Documentation |
This function is designed to scale efficiently to high dimensions, and therefore imposes some restrictions. For example, the correlation coefficient rho cannot be negative.
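One way to see why a sign restriction on the correlation can make generation cheap: exchangeable correlation with a non-negative coefficient can be produced from a single shared factor per observation, with no p x p matrix and no factorization. The base R sketch below only illustrates that idea. `sim_exchangeable` is a hypothetical helper, not part of the documented interface, and the function's actual implementation may differ.

```r
# Hypothetical helper: n x p Gaussian features with pairwise correlation rho.
# The square roots below are why rho must lie in [0, 1] for this construction.
sim_exchangeable <- function(n, p, rho) {
  stopifnot(rho >= 0, rho <= 1)
  z <- rnorm(n)                       # one shared factor per observation (row)
  E <- matrix(rnorm(n * p), n, p)     # independent noise for every entry
  # z has length n, so it is recycled down each column: row i gets z[i]
  sqrt(rho) * z + sqrt(1 - rho) * E
}

set.seed(1)
X <- sim_exchangeable(2000, 5, 0.7)
round(cor(X), 2)   # off-diagonal entries should be close to 0.7
```

Each column has variance rho + (1 - rho) = 1 and each pair of columns has covariance rho, so the cost is linear in n * p.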
gen_data(
n,
p,
p1 = floor(p/2),
beta,
family = c("gaussian", "binomial"),
SNR = 1,
signal = c("homogeneous", "heterogeneous"),
corr = c("exchangeable", "autoregressive"),
rho = 0
)
n | Sample size |
p | Number of features |
p1 | Number of nonzero features |
beta | Vector of regression coefficients in the generating model, or, if a scalar, the value of each nonzero regression coefficient. |
family | Generate |
SNR | Signal to noise ratio |
signal | Should the beta coefficients be homogeneous (default) or heterogeneous |
corr | Correlation structure between features ('exchangeable' | 'autoregressive') |
rho | Correlation coefficient |
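To make the two `corr` options concrete, the sketch below builds the population correlation matrices they are usually taken to mean. It assumes 'autoregressive' refers to an AR(1) pattern, in which correlation decays as rho^|i - j|; the documentation above does not spell this out. The names `exch` and `ar1` are illustrative only.

```r
# Assumed population correlation matrices for p features
p <- 4
rho <- 0.7

# Exchangeable: every pair of distinct features has correlation rho
exch <- matrix(rho, p, p)
diag(exch) <- 1

# Autoregressive, assumed AR(1): correlation decays with distance |i - j|
ar1 <- rho^abs(outer(1:p, 1:p, "-"))

exch
round(ar1, 3)
```

Under this reading, features far apart in index are nearly uncorrelated in the autoregressive case, while in the exchangeable case all pairs are equally correlated.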
Note that if beta is not supplied, this function must calculate the SNR to determine an appropriate coefficient size. This will be slow if the dimension is large and beta is not sparse.
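A minimal sketch of the kind of calculation this note describes, under assumptions the documentation does not confirm: the SNR is taken to be the quadratic form beta' Sigma beta divided by a noise variance of 1, the signal is homogeneous, and the correlation is exchangeable. All variable names here are illustrative.

```r
# Assumptions (not confirmed by the documentation):
#   SNR = t(beta) %*% Sigma %*% beta / sigma^2, with sigma^2 = 1
p <- 10
p1 <- 5
rho <- 0.7
SNR <- 2

Sigma <- matrix(rho, p, p)               # exchangeable correlation
diag(Sigma) <- 1
shape <- c(rep(1, p1), rep(0, p - p1))   # homogeneous pattern: p1 nonzero entries

# Quadratic form for the unit-size pattern; with a dense pattern this
# touches all p^2 entries of Sigma, which is what gets slow in high dimensions
q <- drop(crossprod(shape, Sigma %*% shape))

b <- sqrt(SNR / q)                       # common size of each nonzero coefficient
beta <- b * shape

drop(crossprod(beta, Sigma %*% beta))    # equals SNR up to floating-point rounding
```

When the pattern is sparse, only the p1 x p1 block of Sigma for the nonzero features contributes, which is consistent with the note that the calculation is slow mainly when beta is not sparse.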
dat <- gen_data(100, 100, 10)
dim(dat$X)
head(dat$y)
head(dat$beta)
gen_data(100, 10, 5)$beta
gen_data(100, 10, 5, SNR=2)$beta
gen_data(100, 10, 5, SNR=2, corr='exch', rho=0.7)$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.7)$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.7, signal='het')$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.1, signal='het')$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.1, signal='het', beta=1)$beta
gen_data(10, 10, 5, family='binomial')$y
gen_data(1000, 10, rho=0.0, corr='exch')$X |> cor() |> round(digits=2)
gen_data(1000, 10, rho=0.7, corr='exch')$X |> cor() |> round(digits=2)
gen_data(1000, 10, rho=0.7, corr='auto')$X |> cor() |> round(digits=2)
gen_data(1000, 3, 3, rho=0)$X |> cor() |> round(digits=2)