gen_data: Simulate data for regression models

gen_data    R Documentation

Simulate data for regression models

Description

Simulates a feature matrix X and a response y from a linear or logistic regression model. The function is designed to scale efficiently to high dimensions and therefore imposes some restrictions; for example, the correlation rho cannot be negative.
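The restriction on the sign of the correlation is natural if exchangeable features are built from a shared latent factor, which avoids ever forming a p x p covariance matrix. The following is a minimal sketch under that assumption; it is not confirmed to be the package's implementation, and sim_exchangeable is a hypothetical helper, not part of hdrm.

```r
# Hypothetical helper (not part of hdrm): simulate an n x p matrix of
# standard normal features with pairwise correlation rho by mixing one
# shared latent factor with independent noise.
sim_exchangeable <- function(n, p, rho) {
  stopifnot(rho >= 0, rho <= 1)          # sqrt(rho) requires rho >= 0
  z <- rnorm(n)                          # one shared factor per observation
  e <- matrix(rnorm(n * p), n, p)        # independent noise for each entry
  sqrt(rho) * z + sqrt(1 - rho) * e      # z is recycled down each column
}

set.seed(1)
X <- sim_exchangeable(2000, 5, rho = 0.7)
round(cor(X), 2)                         # off-diagonal entries should be near 0.7
```

The cost is O(np) in time and memory, which is why an approach of this kind scales to large p, and why it cannot represent negative correlation.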

Usage

gen_data(
  n,
  p,
  p1 = floor(p/2),
  beta,
  family = c("gaussian", "binomial"),
  SNR = 1,
  signal = c("homogeneous", "heterogeneous"),
  corr = c("exchangeable", "autoregressive"),
  rho = 0
)

Arguments

n

Sample size

p

Number of features

p1

Number of features with nonzero coefficients (default: floor(p/2))

beta

Vector of regression coefficients in the generating model, or, if a scalar, the value of each nonzero regression coefficient.

family

Generate y according to a linear ("gaussian") or logistic ("binomial") model

SNR

Signal-to-noise ratio; used to determine the coefficient size when beta is not supplied

signal

Should the nonzero beta coefficients be homogeneous (default) or heterogeneous?

corr

Correlation structure between features ('exchangeable' | 'autoregressive')

rho

Correlation coefficient between features (default 0); negative values are not supported
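To make the two corr options concrete, the sketch below builds the population correlation matrix each one is usually taken to mean. It assumes the standard definitions (a common correlation rho for every pair under 'exchangeable', and AR(1)-style decay rho^|i - j| under 'autoregressive'); the documentation above does not spell these out, so treat them as assumptions. The names S_exch and S_auto are illustrative only.

```r
p   <- 4
rho <- 0.7

# Exchangeable: every pair of distinct features has the same correlation rho
S_exch <- matrix(rho, p, p)
diag(S_exch) <- 1

# Autoregressive (assumed AR(1) form): correlation decays as rho^|i - j|
S_auto <- rho^abs(outer(1:p, 1:p, "-"))

S_exch
round(S_auto, 3)
```

Under these definitions, features 1 and 3 have correlation rho in the exchangeable case but rho^2 in the autoregressive case.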

Details

If beta is not supplied, the function must calculate the SNR in order to determine an appropriate coefficient size. This calculation will be slow if the dimension is large and beta is not sparse.
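The calculation can be sketched as follows, assuming the SNR is defined as t(beta) %*% Sigma %*% beta divided by a noise variance of 1, where Sigma is the feature correlation matrix. That definition is an assumption, since it is not stated here, and scale_to_snr is a hypothetical helper rather than a function from hdrm.

```r
# Hypothetical helper: rescale a coefficient pattern b so that
# t(beta) %*% Sigma %*% beta equals the target SNR (noise variance assumed 1).
scale_to_snr <- function(b, Sigma, SNR = 1) {
  nz  <- which(b != 0)                          # only nonzero entries contribute
  bsb <- drop(crossprod(b[nz], Sigma[nz, nz, drop = FALSE] %*% b[nz]))
  b * sqrt(SNR / bsb)
}

p <- 10; p1 <- 5; rho <- 0.7
Sigma <- matrix(rho, p, p); diag(Sigma) <- 1    # exchangeable correlation
b     <- c(rep(1, p1), rep(0, p - p1))          # homogeneous signal pattern
beta  <- scale_to_snr(b, Sigma, SNR = 2)

drop(t(beta) %*% Sigma %*% beta)                # recovers the target SNR
```

The quadratic form only involves the nonzero coefficients, so its cost grows with the square of their number; this is consistent with the calculation being slow when beta is not sparse.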

Examples

dat <- gen_data(100, 100, 10)
dim(dat$X)
head(dat$y)
head(dat$beta)

gen_data(100, 10, 5)$beta
gen_data(100, 10, 5, SNR=2)$beta
gen_data(100, 10, 5, SNR=2, corr='exch', rho=0.7)$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.7)$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.7, signal='het')$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.1, signal='het')$beta
gen_data(100, 10, 5, SNR=2, corr='auto', rho=0.1, signal='het', beta=1)$beta

gen_data(10, 10, 5, family='binomial')$y

gen_data(1000, 10, rho=0.0, corr='exch')$X |> cor() |> round(digits=2)
gen_data(1000, 10, rho=0.7, corr='exch')$X |> cor() |> round(digits=2)
gen_data(1000, 10, rho=0.7, corr='auto')$X |> cor() |> round(digits=2)
gen_data(1000, 3, 3, rho=0)$X |> cor() |> round(digits=2)

pbreheny/hdrm documentation built on Jan. 17, 2024, 8:53 p.m.