simData: Simulating Linear Regression Data

Description

This function simulates a design matrix and a response vector from a linear regression model.

Usage

simData(m1, m, n, rho = 0, type = "equicorr", incrBeta = FALSE, SNR = 1, seed = NULL)

Arguments

m1

number of active variables.

m

total number of variables.

n

number of observations.

rho

correlation parameter used to build the covariance matrix (see Details).

type

type of covariance matrix: either "equicorr" (equicorrelation) or "toeplitz".

incrBeta

logical, TRUE for increasing active coefficients (1,2,3,...), FALSE for active coefficients all equal to 1.

SNR

signal-to-noise ratio, i.e. the ratio between the variance of X beta and the variance of the error term.

seed

seed for random number generation.

Details

The design matrix X contains n independent observations from a multivariate normal (MVN) distribution with mean 0 and covariance matrix Sigma. The entry Sigma(i,j) depends on type:

  • equicorrelation: 1 if i=j, and rho otherwise

  • Toeplitz: rho^|i-j|
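The two covariance structures can be written down directly in base R. The sketch below is illustrative only: makeSigma is a hypothetical helper, not a function exported by splitFlip, and simData may build Sigma differently internally.

```r
# Hypothetical helper (not part of splitFlip): builds the m x m covariance
# matrix Sigma for the two structures listed above.
makeSigma <- function(m, rho, type = c("equicorr", "toeplitz")) {
  type <- match.arg(type)
  if (type == "equicorr") {
    # equicorrelation: 1 on the diagonal, rho everywhere else
    Sigma <- matrix(rho, nrow = m, ncol = m)
    diag(Sigma) <- 1
  } else {
    # Toeplitz: rho^|i-j|, so correlation decays with the index distance
    # (in R, 0^0 is 1, so the diagonal is 1 even when rho = 0)
    Sigma <- rho^abs(outer(seq_len(m), seq_len(m), "-"))
  }
  Sigma
}

makeSigma(4, 0.5, "toeplitz")
```

With rho = 0 both types reduce to the identity matrix, i.e. independent variables.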

Exactly m1 of the m coefficients are non-null, with values depending on incrBeta; the remaining coefficients are 0. The response variable Y is then equal to X beta plus an error term, whose standard deviation is chosen so that the signal-to-noise ratio equals SNR.
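The generative model described above can be sketched in a few lines of base R. This is not the splitFlip implementation: simSketch is a hypothetical function, it covers only the Toeplitz case for brevity, it assumes the active variables are the first m1 columns, and it assumes the SNR is defined through the population variance t(beta) Sigma beta of X beta. The actual simData may choose the active indices or compute the variance differently.

```r
# Hypothetical sketch of the model in Details (not the splitFlip source).
# Assumptions: Toeplitz covariance only, active variables = first m1 columns,
# SNR computed from the population variance t(beta) %*% Sigma %*% beta.
simSketch <- function(m1, m, n, rho = 0, incrBeta = FALSE, SNR = 1) {
  # Toeplitz covariance, rho^|i-j|
  Sigma <- rho^abs(outer(seq_len(m), seq_len(m), "-"))
  # If R = chol(Sigma) (upper triangular, t(R) %*% R = Sigma), then the rows
  # of Z %*% R are independent N(0, Sigma) draws when Z has iid N(0, 1) entries
  Z <- matrix(rnorm(n * m), nrow = n, ncol = m)
  X <- Z %*% chol(Sigma)
  # m1 non-null coefficients: 1, 2, ..., m1 (incrBeta = TRUE) or all equal to 1
  active <- seq_len(m1)
  beta <- numeric(m)
  beta[active] <- if (incrBeta) seq_len(m1) else 1
  # Error sd chosen so that Var(X beta) / Var(error) = SNR
  signalVar <- drop(t(beta) %*% Sigma %*% beta)
  sigma <- sqrt(signalVar / SNR)
  Y <- drop(X %*% beta) + rnorm(n, sd = sigma)
  list(X = X, Y = Y, active = active)
}

set.seed(42)
res <- simSketch(m1 = 2, m = 20, n = 10, rho = 0.5, SNR = 5)
str(res)
```

Using the Cholesky factor avoids a dependency on an external multivariate normal sampler; a larger SNR shrinks sigma, so Y is closer to X beta.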

Value

simData returns a list with elements X (the design matrix, without an intercept column), Y (the response vector) and active (the indices of the active variables).

Author(s)

Anna Vesely.

Examples

library(splitFlip)

# generate linear regression data with 20 variables and 10 observations
res <- simData(m1=2, m=20, n=10, rho=0.5, type="toeplitz", SNR=5, seed=42)
X <- res$X # design matrix
Y <- res$Y # response vector
active <- res$active # indices of active variables

# choose target as twice the number of active variables
target <- 2*length(active)

# standardized scores using the approximate method with Lasso selection of target variables
G <- splitFlip(X, Y, target=target, seed=42)

# maxT algorithm
maxT(G, alpha=0.1)

annavesely/splitFlip documentation built on July 27, 2024, 4:23 a.m.