gen.data: Generate simulated data


View source: R/gen.data.R

Description

Generate data for simulations under the generalized linear model and Cox model.

Usage

gen.data(
  n,
  p,
  k = NULL,
  rho = 0,
  family = c("gaussian", "binomial", "poisson", "cox"),
  beta = NULL,
  cortype = 1,
  snr = 10,
  censoring = TRUE,
  c = 1,
  scal,
  sigma = 1,
  seed = 1
)

Arguments

n

The number of observations.

p

The number of predictors of interest.

k

The number of nonzero coefficients in the underlying regression model. Can be omitted if beta is supplied.

rho

A parameter used to characterize the pairwise correlation between predictors. Default is 0.

family

The distribution of the simulated data: "gaussian" for Gaussian data, "binomial" for binary data, "poisson" for count data, and "cox" for survival data.

beta

The coefficient values in the underlying regression model. If NULL (the default), the coefficients are generated at random as described in Details.

cortype

The correlation structure. cortype = 1 denotes the exponential structure, where the (i,j) entry of the covariance matrix equals rho^{|i-j|}. cortype = 2 denotes the constant structure, where the (i,j) entry of the covariance matrix is rho for every i ≠ j and 1 on the diagonal. cortype = 3 denotes the moving average structure. Details are given below.

snr

A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as the variance of xβ divided by the variance of the Gaussian noise: Var(xβ)/σ^2. The Gaussian noise ε has mean 0 and variance σ^2 and is added to the linear predictor η = xβ. Default is snr = 10. This option is not used when cortype = 3.

censoring

Whether the data are censored. Valid only for family = "cox". Default is TRUE.

c

A parameter controlling the censoring rate: censoring times are drawn from the uniform distribution on [0, c]. Default is 1.

scal

A parameter of the Weibull distribution used to generate survival times; it enters as the exponent 1/scal in the survival-time formula in Details. Only used for the "cox" family.

sigma

A parameter used to control the signal-to-noise ratio. For linear regression, it is the error variance σ^2. For logistic regression and Cox's model, the larger the value of sigma, the higher the signal-to-noise ratio. Valid only for cortype = 3.

seed

The seed used for random number generation.

Details

We generate an n × p random Gaussian design matrix X with mean 0, using one of three correlation structures selected by cortype.

For the exponential structure (cortype = 1), the (i,j) entry of the covariance matrix is ρ^{|i-j|}, where ρ = rho.

For the constant structure (cortype = 2), the (i,j) entry of the covariance matrix is ρ for every i ≠ j and 1 on the diagonal.

For the moving average structure (cortype = 3), we first generate an n × p random Gaussian matrix \bar{X} whose entries are i.i.d. N(0, 1), and then normalize its columns to have length √n. The design matrix X is then generated as X_j = \bar{X}_j + ρ(\bar{X}_{j+1} + \bar{X}_{j-1}) for j = 2, …, p-1.
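The three design-matrix structures can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the function name sim_design is hypothetical, correlated rows are produced with a Cholesky factor of the covariance matrix, and for cortype = 3 the columns 1 and p are left equal to \bar{X}_1 and \bar{X}_p, which is an assumption since the text only defines j = 2, …, p-1.

```r
# Hypothetical sketch of the design matrix structures (not bestridge's code).
sim_design <- function(n, p, rho = 0, cortype = 1) {
  if (cortype == 3) {
    # Moving average structure: i.i.d. N(0, 1) entries, columns scaled to length sqrt(n)
    Xbar <- matrix(rnorm(n * p), n, p)
    Xbar <- sweep(Xbar, 2, sqrt(colSums(Xbar^2) / n), "/")
    X <- Xbar
    # X_j = Xbar_j + rho * (Xbar_{j+1} + Xbar_{j-1}) for j = 2, ..., p - 1;
    # columns 1 and p stay equal to Xbar (assumption)
    for (j in seq_len(max(p - 2, 0)) + 1) {
      X[, j] <- Xbar[, j] + rho * (Xbar[, j + 1] + Xbar[, j - 1])
    }
    return(X)
  }
  if (cortype == 1) {
    # Exponential structure: Sigma[i, j] = rho^|i - j| (R gives 0^0 = 1 on the diagonal)
    Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
  } else {
    # Constant structure: rho off the diagonal, 1 on the diagonal
    Sigma <- matrix(rho, p, p)
    diag(Sigma) <- 1
  }
  # Rows of Z are i.i.d. N(0, I); with R = chol(Sigma), t(R) %*% R = Sigma,
  # so the rows of Z %*% R have covariance Sigma
  Z <- matrix(rnorm(n * p), n, p)
  Z %*% chol(Sigma)
}

set.seed(10)
X <- sim_design(200, 20, rho = 0.4, cortype = 1)
dim(X)
```

The Cholesky approach is used here only because it needs nothing beyond base R; any multivariate normal sampler with the same covariance would serve.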

For family = "gaussian", the data model is

Y = X β + ε.

The underlying regression coefficients β are drawn from the uniform distribution on [m, 100m], where m = 5√(2log(p)/n).
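A minimal sketch of the Gaussian model and of how an SNR target fixes the noise level: the k nonzero coefficients are drawn from U[m, 100m], and σ^2 is set to Var(Xβ)/snr so that the ratio Var(Xβ)/σ^2 equals snr. Assumptions, none of which come from the package: the function name sim_gaussian is hypothetical, the support of β is chosen at random, X has i.i.d. N(0, 1) entries for brevity, and Var(Xβ) is estimated by the sample variance.

```r
# Hypothetical sketch of Y = X beta + eps with a target SNR (not bestridge's code).
sim_gaussian <- function(n, p, k, snr = 10) {
  X <- matrix(rnorm(n * p), n, p)         # simplified design: i.i.d. N(0, 1)
  m <- 5 * sqrt(2 * log(p) / n)           # lower end of the coefficient range
  beta <- numeric(p)
  beta[sample(p, k)] <- runif(k, m, 100 * m)  # assumed: random support of size k
  eta <- drop(X %*% beta)                 # linear predictor X beta
  sigma <- sqrt(var(eta) / snr)           # chosen so that var(eta) / sigma^2 = snr
  y <- eta + rnorm(n, 0, sigma)           # add N(0, sigma^2) noise
  list(x = X, y = y, Tbeta = beta, sigma = sigma)
}

set.seed(10)
d <- sim_gaussian(200, 20, 5, snr = 10)
var(drop(d$x %*% d$Tbeta)) / d$sigma^2
```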

For family = "binomial", the data model is

Prob(Y = 1) = exp(Xβ + ε)/(1 + exp(Xβ + ε)).

The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = 5σ√(2log(p)/n).
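The binomial model can be sketched in the same way. This is an illustration under stated assumptions, not the package's implementation: the function name sim_binomial is hypothetical, sigma is taken directly as the noise standard deviation, the nonzero positions of β are chosen at random, and X has i.i.d. N(0, 1) entries. plogis(q) computes exp(q)/(1 + exp(q)), which matches the success probability above.

```r
# Hypothetical sketch of the binomial model (not bestridge's code).
sim_binomial <- function(n, p, k, sigma = 1) {
  X <- matrix(rnorm(n * p), n, p)         # simplified design: i.i.d. N(0, 1)
  m <- 5 * sigma * sqrt(2 * log(p) / n)
  beta <- numeric(p)
  beta[sample(p, k)] <- runif(k, 2 * m, 10 * m)   # assumed: random support of size k
  eta <- drop(X %*% beta) + rnorm(n, 0, sigma)    # noisy linear predictor X beta + eps
  prob <- plogis(eta)                             # exp(eta) / (1 + exp(eta))
  y <- rbinom(n, size = 1, prob = prob)           # binary response
  list(x = X, y = y, Tbeta = beta)
}

set.seed(10)
b <- sim_binomial(200, 20, 5)
table(b$y)
```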

For family = "poisson", the response is count data drawn from a Poisson distribution whose mean is the exponential of the noisy linear predictor:

Y ~ Poisson(exp(Xβ + ε)).

For family = "cox", the data model is

T = (-log(S(t))/exp(Xβ))^{1/scal}.

The censoring time C is generated from the uniform distribution on [0, c]. The censoring status is then δ = I{T ≤ C} and the observed time is R = min{T, C}. The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = 5σ√(2log(p)/n). In the above models, ε ~ N(0, σ^2), where σ^2 is determined by snr.
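The Cox survival-time formula and the uniform censoring step can be sketched as below. Assumptions not stated in the text: S(t) is drawn as a U(0, 1) random variable (the usual inverse-transform approach), the function name sim_cox and the argument cmax (playing the role of c) are hypothetical, the support of β is chosen at random, and X has i.i.d. N(0, 1) entries. This is not the package's implementation.

```r
# Hypothetical sketch of Weibull survival times with uniform censoring.
sim_cox <- function(n, p, k, scal = 1, cmax = 1, sigma = 1, censoring = TRUE) {
  X <- matrix(rnorm(n * p), n, p)         # simplified design: i.i.d. N(0, 1)
  m <- 5 * sigma * sqrt(2 * log(p) / n)
  beta <- numeric(p)
  beta[sample(p, k)] <- runif(k, 2 * m, 10 * m)   # assumed: random support of size k
  S <- runif(n)                                   # assumed: S(t) ~ U(0, 1)
  event_time <- (-log(S) / exp(drop(X %*% beta)))^(1 / scal)
  if (!censoring) {
    return(list(x = X, time = event_time, status = rep(1, n), Tbeta = beta))
  }
  cens_time <- runif(n, 0, cmax)                  # C ~ U[0, c]
  status <- as.numeric(event_time <= cens_time)   # delta = I{T <= C}
  time <- pmin(event_time, cens_time)             # R = min{T, C}
  list(x = X, time = time, status = status, Tbeta = beta)
}

set.seed(10)
s <- sim_cox(200, 20, 5, scal = 2)
mean(s$status == 0)   # observed censoring proportion
```

Lowering cmax shortens the censoring times and so raises the proportion of censored observations, which is how c controls the censoring rate.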

Value

x

Design matrix of predictors.

y

Response variable.

Tbeta

The coefficients used in the underlying regression model.

Author(s)

Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.

See Also

bsrr, predict.bsrr.

Examples

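# Generate simulated data
library(bestridge)
n <- 200
p <- 20
k <- 5
rho <- 0.4
SNR <- 10
cortype <- 1
seed <- 10
Data <- gen.data(n, p, k, rho, family = "gaussian", cortype = cortype, snr = SNR, seed = seed)
# Use the first 140 observations for training and the remaining 60 for testing
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
# A grid of candidate lambda values (not passed to the call below)
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
# Fit a bsrr model on the training set
lm.bsrr <- bsrr(x, y, method = "pgsection")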

bestridge documentation built on Oct. 10, 2021, 5:06 p.m.