gen_data_grp: Simulate grouped data for regression models

View source: R/gen_data_grp.R

gen_data_grp R Documentation

Simulate grouped data for regression models

Description

This function is designed to scale efficiently to high dimensions and therefore imposes some restrictions. For example, the correlations (rho and rho.g) cannot be negative.

Usage

gen_data_grp(
  n,
  J,
  K = 1,
  beta,
  family = c("gaussian", "binomial"),
  J1 = ceiling(J/2),
  K1 = K,
  SNR = 1,
  signal = c("homogeneous", "heterogeneous"),
  signal.g = c("homogeneous", "heterogeneous"),
  rho = 0,
  rho.g = rho
)

Arguments

n

Sample size

J

Number of groups

K

Number of features per group

beta

Vector of regression coefficients in the generating model, or, if a scalar, the value of each nonzero regression coefficient

family

Generate y according to a linear ("gaussian") or logistic ("binomial") model

J1

Number of nonzero groups

K1

Number of nonzero coefficients per group

SNR

Signal-to-noise ratio

signal

Should the groups be heterogeneous (in beta) or homogeneous?

signal.g

Should the coefficients within a group be heterogeneous or homogeneous?

rho

Correlation between groups

rho.g

Correlation between features within a group
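
To make the roles of rho and rho.g concrete, the sketch below builds one correlation matrix that is consistent with the argument descriptions: unit diagonal, rho.g inside each group, and rho between groups. The helper block_corr is hypothetical and is not part of hdrm; the package may construct its design differently (for instance, without ever forming this matrix). Features are assumed to be ordered group by group.

```r
# Hypothetical helper (not an hdrm function): block-exchangeable correlation
# matrix for J groups of K features, ordered group by group.
block_corr <- function(J, K, rho = 0, rho.g = rho) {
  # Every pair of features starts at the between-group correlation
  between <- matrix(rho, J * K, J * K)
  # Add (rho.g - rho) inside each K x K diagonal block
  within <- kronecker(diag(J), matrix(rho.g - rho, K, K))
  Sigma <- between + within
  diag(Sigma) <- 1
  Sigma
}

Sigma <- block_corr(J = 3, K = 3, rho = 0.3, rho.g = 0.8)
round(Sigma[1:4, 1:4], 2)  # features 1-3 share a group; feature 4 does not
```

Under this assumed structure, the sample correlations of a simulated X with a large n should be close to the corresponding entries of Sigma.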

Details

Note that if beta is not supplied, this function must calculate the SNR to determine an appropriate coefficient size. This will be slow if the dimension is large and beta is not sparse.
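
The cost mentioned above can be illustrated with a minimal sketch. It assumes the SNR is defined as the variance of the linear predictor divided by the noise variance, that is t(beta) %*% Sigma %*% beta / sigma2, where Sigma is the feature covariance; the definition used inside hdrm may differ. The helper scale_to_snr and its arguments are hypothetical. Under this assumption, the quadratic form involves every pair of nonzero coefficients, which is why a dense beta in high dimensions would be expensive.

```r
# Hypothetical helper (not an hdrm function): rescale a coefficient pattern so
# that t(beta) %*% Sigma %*% beta / sigma2 equals the target SNR.
scale_to_snr <- function(pattern, Sigma, SNR = 1, sigma2 = 1) {
  nz <- which(pattern != 0)  # only nonzero coefficients contribute
  # Quadratic form over the nonzero block: cost grows with length(nz)^2
  signal_var <- drop(crossprod(pattern[nz],
                               Sigma[nz, nz, drop = FALSE] %*% pattern[nz]))
  pattern * sqrt(SNR * sigma2 / signal_var)
}

# Two groups of three features: within-group correlation 0.5, between-group 0
Sigma <- kronecker(diag(2), matrix(0.5, 3, 3))
diag(Sigma) <- 1

pattern <- c(1, 1, 0, 0, 0, 0)  # two nonzero coefficients in the first group
beta <- scale_to_snr(pattern, Sigma, SNR = 2)
drop(t(beta) %*% Sigma %*% beta)  # implied SNR when sigma2 = 1
```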

Examples

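library(hdrm)
# The expect_*() assertions come from a testing package; testthat is assumed
# here (tinytest provides functions with the same names)
library(testthat)

Data <- gen_data_grp(100, 10, 5, J1=3, K1=2)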
expect_equal(dim(Data$X), c(100, 50))
head(Data$y)
B <- matrix(Data$beta, ncol=10)
expect_false(any(B[1:2, 1:3]==0))
expect_true(all(B[3:5, 1:3]==0))
expect_true(all(B[, 4:10]==0))
expect_equal(Data$group, rep(1:10, each=5))

gen_data_grp(100, 3, 3, J1=2, K1=2)$beta
gen_data_grp(100, 3, 3, J1=2, K1=2, SNR=2)$beta
gen_data_grp(100, 3, 3, J1=2, K1=2, SNR=2, rho=0.8)$beta
gen_data_grp(100, 3, 3, J1=2, K1=2, SNR=2, rho=0.8, signal='het')$beta
gen_data_grp(100, 3, 3, J1=2, K1=2, SNR=2, rho=0.8, signal='het', signal.g='het')$beta
gen_data_grp(100, 3, 3, J1=2, K1=2, SNR=2, rho=0.8, signal='het', beta=1)$beta

gen_data_grp(1000, 3, 3, rho=0)$X |> cor() |> round(digits=2)
gen_data_grp(1000, 3, 3, rho=0.7)$X |> cor() |> round(digits=2)
gen_data_grp(1000, 3, 3, rho=0.3, rho.g=0.8)$X |> cor() |> round(digits=2)
gen_data_grp(1000, 3, 3, rho=0.1, rho.g=0.5)$X |> cor() |> round(digits=2)

pbreheny/hdrm documentation built on Jan. 17, 2024, 8:53 p.m.