Generate data for simulations under the generalized linear model and Cox model.
n |
The number of observations. |
p |
The number of predictors of interest. |
k |
The number of nonzero coefficients in the underlying regression
model. Can be omitted if |
rho |
A parameter used to characterize the pairwise correlation in
predictors. Default is |
family |
The distribution of the simulated data. |
beta |
The coefficient values in the underlying regression model. |
cortype |
The correlation structure. |
snr |
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as
the variance of xβ divided by the variance of the Gaussian noise: \frac{Var(xβ)}{σ^2}.
The Gaussian noise ε has mean 0 and variance σ^2.
The noise is added to the linear predictor η = xβ. Default is |
censoring |
Whether data is censored or not. Valid only for |
c |
The censoring rate. Default is |
scal |
A parameter in generating survival time based on the Weibull distribution. Only used for the " |
sigma |
A parameter used to control the signal-to-noise ratio. For linear regression,
it is the error variance σ^2. For logistic regression and Cox's model,
the larger the value of sigma, the higher the signal-to-noise ratio. Valid only for |
seed |
Seed to be used in generating the random numbers. |
We generate an n \times p design matrix X under one of three correlation structures: exponential, constant, or moving average. For the exponential and constant structures, X is a random Gaussian matrix with mean 0 and a structured covariance matrix. For the exponential structure, the (i,j) entry of the covariance matrix equals rho^{|i-j|}. For the constant structure, the (i,j) entry of the covariance matrix is rho for every i \neq j and 1 elsewhere. For the moving average structure, we first generate an n \times p random Gaussian matrix \bar{X} whose entries are i.i.d. N(0,1) and then normalize its columns to have length \sqrt{n}. The design matrix X is then generated with X_j = \bar{X}_j + ρ(\bar{X}_{j+1} + \bar{X}_{j-1}) for j = 2, …, p-1.
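The three structures can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the helper `draw_gaussian()` and all variable names are hypothetical, and the first and last columns of the moving-average design are simply left as the normalized columns because the text only specifies j = 2, …, p-1.

```r
# Illustrative sketch only; names and boundary handling are assumptions.
set.seed(1)
n <- 500
p <- 6
rho <- 0.4

# Exponential structure: Sigma[i, j] = rho^|i - j|
idx <- seq_len(p)
Sigma_exp <- rho^abs(outer(idx, idx, "-"))

# Constant structure: rho off the diagonal, 1 on the diagonal
Sigma_const <- matrix(rho, p, p)
diag(Sigma_const) <- 1

# Draw n rows from N(0, Sigma) via the Cholesky factor R, where Sigma = t(R) %*% R
draw_gaussian <- function(n, Sigma) {
  Z <- matrix(rnorm(n * ncol(Sigma)), n, ncol(Sigma))
  Z %*% chol(Sigma)
}
X_exp <- draw_gaussian(n, Sigma_exp)
X_const <- draw_gaussian(n, Sigma_const)

# Moving-average structure: i.i.d. N(0, 1) entries, columns rescaled to length sqrt(n)
X_bar <- matrix(rnorm(n * p), n, p)
X_bar <- sweep(X_bar, 2, sqrt(colSums(X_bar^2)), "/") * sqrt(n)
X_ma <- X_bar  # assumption: columns 1 and p are kept as the normalized columns
for (j in 2:(p - 1)) {
  X_ma[, j] <- X_bar[, j] + rho * (X_bar[, j + 1] + X_bar[, j - 1])
}
```

With a large n, `cor(X_exp)` and `cor(X_const)` should be close to `Sigma_exp` and `Sigma_const` respectively.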
For family = "gaussian", the data model is
Y = X β + ε.
The underlying regression coefficients β are drawn from the uniform distribution on [m, 100m], where m = 5 \sqrt{2 \log(p)/n}.
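A minimal base-R sketch of the Gaussian model may help make the coefficient range concrete. It assumes independent predictors for brevity, a hypothetical fixed noise variance `sigma2`, and that the k nonzero positions are chosen at random; none of these choices is confirmed by the documentation.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(2)
n <- 200
p <- 20
k <- 5
sigma2 <- 1  # hypothetical noise variance

X <- matrix(rnorm(n * p), n, p)  # independent predictors, for brevity

# Lower bound m of the coefficient range [m, 100m]
m <- 5 * sqrt(2 * log(p) / n)

# Assumption: the k nonzero positions are sampled at random
support <- sample(p, k)
beta <- numeric(p)
beta[support] <- runif(k, min = m, max = 100 * m)

# Gaussian data model: Y = X beta + eps
y <- drop(X %*% beta) + rnorm(n, mean = 0, sd = sqrt(sigma2))
```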
For family = "binomial", the data model is
Prob(Y = 1) = \exp(X β + ε)/(1 + \exp(X β + ε)).
The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = 5σ \sqrt{2 \log(p)/n}.
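The binomial model can be sketched in the same way. The values of `sigma` and `noise_sd` below are hypothetical placeholders, the predictors are independent for brevity, and the random choice of nonzero positions is an assumption.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(3)
n <- 200
p <- 20
k <- 5
sigma <- 1     # hypothetical value for the sigma in m = 5 * sigma * sqrt(2 log(p) / n)
noise_sd <- 1  # hypothetical standard deviation of eps

X <- matrix(rnorm(n * p), n, p)

m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
beta[sample(p, k)] <- runif(k, min = 2 * m, max = 10 * m)

# Noisy linear predictor, then the logistic link:
# plogis(eta) = exp(eta) / (1 + exp(eta))
eta <- drop(X %*% beta) + rnorm(n, sd = noise_sd)
prob <- plogis(eta)

# Bernoulli responses with success probability prob
y <- rbinom(n, size = 1, prob = prob)
```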
For family = "poisson", the data is modeled to have an exponential distribution:
Y = Exp(\exp(X β + ε)).
For family = "cox", the data model is
T = (-\log(S(t))/\exp(X β))^{1/scal}.
The censoring time C is generated from the uniform distribution on [0, c],
then we define the censoring status as δ = I\{T \leq C\} and the observed time as R = \min\{T, C\}.
The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = 5σ \sqrt{2 \log(p)/n}.
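The survival-time formula can be read as inverse-transform sampling from a Weibull-type model. The sketch below assumes that S(t) is replaced by a Uniform(0, 1) draw `U`; that reading, the values of `scal`, `c_max` (standing in for c) and `sigma`, and all variable names are assumptions made for illustration.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(4)
n <- 200
p <- 20
k <- 5
scal <- 10   # hypothetical Weibull parameter
c_max <- 1   # hypothetical upper bound of the censoring time, the c in [0, c]
sigma <- 1   # hypothetical value for the sigma in m

X <- matrix(rnorm(n * p), n, p)

m <- 5 * sigma * sqrt(2 * log(p) / n)
beta <- numeric(p)
beta[sample(p, k)] <- runif(k, min = 2 * m, max = 10 * m)
eta <- drop(X %*% beta)

# Assumption: S(t) is drawn as U ~ Uniform(0, 1)
U <- runif(n)
surv_time <- (-log(U) / exp(eta))^(1 / scal)

# Censoring time C ~ Uniform[0, c], status delta = I{T <= C}, observed R = min{T, C}
cens_time <- runif(n, min = 0, max = c_max)
status <- as.integer(surv_time <= cens_time)
obs_time <- pmin(surv_time, cens_time)
```

Here `mean(status == 0)` gives the empirical proportion of censored observations in the simulated sample.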
In the above models, ε \sim N(0, σ^2), where σ^2 is determined by the snr.
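One way to read "σ^2 is determined by the snr" is to solve the SNR definition Var(Xβ)/σ^2 = snr for σ^2. The sketch below does this with the sample variance of Xβ; the use of the sample variance and the illustrative coefficient vector are assumptions, not documented behavior.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(5)
n <- 200
p <- 20
snr <- 10

X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))  # arbitrary coefficients for illustration
eta <- drop(X %*% beta)              # linear predictor X beta

# Solve snr = Var(X beta) / sigma^2 for sigma^2 (sample variance is an assumption)
sigma2 <- var(eta) / snr

# Gaussian noise with mean 0 and variance sigma^2, added to the linear predictor
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2))
y <- eta + eps
```

A larger `snr` gives a smaller `sigma2`, so the response follows the linear predictor more closely.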
x |
Design matrix of predictors. |
y |
Response variable. |
Tbeta |
The coefficients used in the underlying regression model. |
Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.
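# Generate simulated data
n <- 200      # number of observations
p <- 20       # number of predictors
k <- 5        # number of nonzero coefficients
rho <- 0.4    # pairwise correlation parameter
SNR <- 10     # signal-to-noise ratio
cortype <- 1  # correlation structure
seed <- 10    # seed for random number generation
Data <- gen.data(n, p, k, rho, family = "gaussian", cortype = cortype, snr = SNR, seed = seed)
# Split into a training set (first 140 rows) and a test set (last 60 rows)
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
# Log-spaced grid from 5 down to 0.1 (defined here but not passed to bsrr() below)
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
# Fit on the training set
lm.bsrr <- bsrr(x, y, method = "pgsection")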