gen_data_abn: Simulate data according to Causal/Correlated/Noise paradigm


gen_data_abn    R Documentation

Simulate data according to Causal/Correlated/Noise paradigm

Description

Simulates data under the Causal/Correlated/Noise paradigm. The function is designed to scale efficiently to high dimensions and therefore imposes some restrictions; for example, correlations must be positive.

Usage

gen_data_abn(
  n = 100,
  p = 60,
  a = 6,
  b = 2,
  rho = 0.5,
  family = c("gaussian", "binomial"),
  signal = c("homogeneous", "heterogeneous"),
  noise = c("exchangeable", "autoregressive"),
  rho.noise = 0,
  beta,
  SNR = 1
)

Arguments

n

Sample size

p

Number of features

a

Number of causal ('A') variables

b

Number of correlated ('B') variables per causal ('A') variable

rho

Correlation between 'A' and 'B' variables

family

Generate y according to a linear ("gaussian") or logistic ("binomial") model

signal

Should the groups be heterogeneous (in beta) or homogeneous?

noise

Correlation structure between features ('exchangeable' | 'autoregressive')

rho.noise

Correlation parameter for noise variables

beta

Regression coefficients in the generating model. Either a scalar, in which case it is the value of each nonzero regression coefficient, or a vector of length a.

SNR

Signal-to-noise ratio
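
The way p, a and b partition the features can be illustrated without calling the package. The following is a minimal base-R sketch, not the package's implementation. The helper name abn_layout is hypothetical. It assumes that each 'A' variable is immediately followed by its b 'B' variables (this matches the first eight entries of varType checked in the Examples) and that every remaining feature is labelled 'N' for noise.

```r
# Hypothetical helper (not part of hdrm): label p features as A, B or N.
abn_layout <- function(p = 60, a = 6, b = 2) {
  n_signal <- a * (b + 1)   # each A variable brings b correlated B variables
  stopifnot(n_signal <= p)  # the A/B blocks must fit within p features
  c(rep(c("A", rep("B", b)), times = a),  # a blocks of the form A, B, ..., B
    rep("N", p - n_signal))               # assumed: the rest are noise features
}

layout <- abn_layout(p = 20, a = 2, b = 3)
table(layout)
```

Under these assumptions, the number of noise features is p - a * (b + 1), so increasing a or b reduces the noise block unless p grows with them.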

Details

Note that if beta is not supplied, this function must calculate the SNR to determine an appropriate coefficient size. This will be slow if the dimension is large and beta is not sparse.
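
To see why the coefficient size depends on the correlation structure, consider the sketch below. It is an illustration under stated assumptions, not the package's code: it assumes a linear model with unit error variance, so that the SNR equals t(beta) %*% Sigma %*% beta, where Sigma is the feature covariance matrix. The source does not confirm this definition. The function scale_beta and the small matrix Sigma are hypothetical.

```r
# Assumed definition (not confirmed by the source):
#   SNR = t(beta) %*% Sigma %*% beta, for a linear model with error variance 1.
# Hypothetical helper: rescale a coefficient pattern to hit a target SNR.
scale_beta <- function(direction, Sigma, SNR = 1) {
  # SNR implied by the unscaled coefficient pattern
  current <- drop(crossprod(direction, Sigma %*% direction))
  direction * sqrt(SNR / current)
}

p <- 5
Sigma <- matrix(0.5, p, p)      # exchangeable correlation with rho = 0.5
diag(Sigma) <- 1
direction <- c(1, 1, 0, 0, 0)   # two nonzero coefficients of equal size

beta <- scale_beta(direction, Sigma, SNR = 1)
drop(crossprod(beta, Sigma %*% beta))   # recovers the target SNR
```

The quadratic form involves every pair of nonzero coefficients, which is consistent with the remark that the calculation becomes slow when the dimension is large and beta is not sparse.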

Examples

library(hdrm)      # provides gen_data_abn()
library(testthat)  # provides expect_equal()
Data <- gen_data_abn(n=100, p=20, a=2, b=3)
expect_equal(dim(Data$X), c(100, 20))
expect_equal(length(Data$y), 100)
expect_equal(Data$varType[1:8], rep(c('A', 'B', 'B', 'B'), 2))
with(Data, data.frame(beta, varType))
gen_data_abn(100, 10, 2, 1)$beta
gen_data_abn(100, 10, 2, 1, rho=0.9)$beta
gen_data_abn(100, 10, 2, 1, rho=0.9, rho.noise=0.9)$beta
gen_data_abn(100, 10, 2, 1, SNR=3)$beta
gen_data_abn(100, 10, 2, 1, SNR=3, signal='het')$beta
gen_data_abn(100, 10, 2, 1, beta=3)$beta
gen_data_abn(100, 10, 2, 1, beta=2:1)$beta

gen_data_abn(10, 20, 2, 3, family='binomial')$y

gen_data_abn(1000, 10, 2, 2, rho=0.25, rho.noise=0.0, noise='exch')$X |> cor() |> round(digits=2)
gen_data_abn(1000, 10, 2, 2, rho=0.5, rho.noise=0.5, noise='exch')$X |> cor() |> round(digits=2)
gen_data_abn(1000, 10, 2, 2, rho=0.75, rho.noise=0.9, noise='auto')$X |> cor() |> round(digits=2)
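
The last three examples above print empirical correlation matrices. For comparison, the two nominal structures named by the noise argument can be written down directly. This is a base-R sketch using the standard definitions of exchangeable and first-order autoregressive correlation. The function names are hypothetical, and the sketch makes no claim about which features the package applies each structure to.

```r
# Exchangeable: every pair of distinct features has the same correlation rho.
exchangeable <- function(p, rho) {
  S <- matrix(rho, p, p)
  diag(S) <- 1
  S
}

# Autoregressive (order 1): correlation decays as rho^|i - j|.
autoregressive <- function(p, rho) {
  idx <- seq_len(p)
  rho^abs(outer(idx, idx, "-"))
}

exchangeable(4, 0.5)
round(autoregressive(4, 0.9), 2)
```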

pbreheny/hdrm documentation built on Jan. 17, 2024, 8:53 p.m.