GenSyntheticHighCorr: Generate Exponential Correlated Synthetic Data

View source: R/genhighcorr.R

GenSyntheticHighCorr R Documentation

Generate Exponential Correlated Synthetic Data

Description

Generates a synthetic dataset as follows: 1) Generate a correlation matrix, SIG, where item [i, j] = A^|i-j|. 2) Draw X from a Multivariate Normal Distribution with mean mu and covariance SIG. 3) Generate a vector B with every ~p/k-th entry set to 1 and the rest set to 0. 4) Sample every element of the noise vector e from N(0,1). 5) Set y = XB + b0 + e.
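The steps in the description can be sketched in base R roughly as follows. This is an illustration only, not the L0Learn source: the function name sketch_high_corr is hypothetical, the rho and snr arguments are left out, and the Cholesky-based draw and the placement of the ones in B are assumptions made to keep the example self-contained.

```r
# Illustrative sketch only; NOT the L0Learn implementation.
# sketch_high_corr is a hypothetical name; rho and snr are omitted.
sketch_high_corr <- function(n, p, k, seed, b0 = 0, mu = 0, base_cor = 0.9) {
  set.seed(seed)
  # Step 1: correlation matrix with SIG[i, j] = base_cor^|i - j|
  SIG <- base_cor^abs(outer(seq_len(p), seq_len(p), "-"))
  # Step 2: draw X ~ N(mu, SIG) using the Cholesky factor of SIG
  # (assumes base_cor < 1 so that SIG is positive definite)
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- sweep(Z %*% chol(SIG), 2, rep(mu, length.out = p), "+")
  # Step 3: k ones spaced roughly p/k apart, zeros elsewhere (assumed placement)
  B <- numeric(p)
  B[round(seq(1, p, length.out = k))] <- 1
  # Step 4: standard normal noise
  e <- rnorm(n)
  # Step 5: response
  y <- as.vector(X %*% B) + b0 + e
  list(X = X, y = y, B = B, e = e, b0 = b0)
}

res <- sketch_high_corr(n = 50, p = 20, k = 4, seed = 1)
dim(res$X)     # 50 rows, 20 columns
which(res$B == 1)
```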

Usage

GenSyntheticHighCorr(
  n,
  p,
  k,
  seed,
  rho = 0,
  b0 = 0,
  snr = 1,
  mu = 0,
  base_cor = 0.9
)

Arguments

n

Number of samples

p

Number of features

k

Number of non-zeros in true vector of coefficients

seed

The seed used for randomly generating the data

rho

The threshold for setting values to 0: if |X(i, j)| > rho, then X(i, j) <- 0.

b0

Intercept value added to y (the b0 term in y = XB + b0 + e).

snr

Desired signal-to-noise ratio (SNR). This sets the magnitude of the error term 'e'. SNR is defined as SNR = Var(XB)/Var(e).
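As a hedged illustration of the definition SNR = Var(XB)/Var(e), the sketch below rescales a noise vector so that the sample ratio matches a target value. The variable signal is a stand-in for XB, and the rescaling step is one possible way to hit the target, not necessarily what the package does internally.

```r
# Sketch: rescale noise so that var(signal) / var(e) equals a target SNR.
# 'signal' is a stand-in for XB; this is not taken from the L0Learn source.
set.seed(1)
signal <- rnorm(200, sd = 2)
snr <- 4

e <- rnorm(200)                              # raw N(0, 1) noise
e <- e * sqrt(var(signal) / (snr * var(e)))  # rescale to the target ratio

var(signal) / var(e)                         # matches snr up to rounding
```

A larger snr shrinks the noise relative to the signal, so y tracks XB + b0 more closely.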

mu

The mean for drawing from the Multivariate Normal Distribution. A scalar or a vector of length p.

base_cor

The base correlation A, where SIG[i, j] = A^|i-j|.
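To make the A^|i-j| pattern concrete, the following small sketch builds the matrix for p = 4 with base R's outer(). The variable names are chosen for illustration only.

```r
# Sketch: exponentially decaying correlation matrix for a small p.
base_cor <- 0.9
p <- 4
SIG <- base_cor^abs(outer(seq_len(p), seq_len(p), "-"))
print(round(SIG, 3))
```

Entries on the diagonal are 1, and the correlation between features i and j shrinks geometrically as |i - j| grows, so neighbouring features are the most strongly correlated.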

Value

A list containing: the data matrix X, the response vector y, the coefficients B, the error vector e, the intercept term b0.
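A minimal usage sketch, assuming the L0Learn package is installed. The argument values are arbitrary, and the call is wrapped in a requireNamespace() check so the snippet does nothing when the package is unavailable. The returned list is inspected with str() rather than by relying on particular element names.

```r
# Sketch: call the function with the signature shown under Usage.
# Argument values are arbitrary examples; requires L0Learn to be installed.
if (requireNamespace("L0Learn", quietly = TRUE)) {
  sim <- L0Learn::GenSyntheticHighCorr(n = 100, p = 20, k = 5, seed = 1,
                                       base_cor = 0.9)
  # Show the top-level structure of the returned list
  str(sim, max.level = 1)
}
```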


L0Learn documentation built on March 7, 2023, 8:18 p.m.