generate.data | R Documentation |
Generate simulated data under the generalized linear model and Cox proportional hazard model.
generate.data(
  n,
  p,
  support.size = NULL,
  rho = 0,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian", "multinomial", "gamma", "ordinal"),
  beta = NULL,
  cortype = 1,
  snr = 10,
  sigma = NULL,
  weibull.shape = 1,
  uniform.max = 1,
  y.dim = 3,
  class.num = 3,
  seed = 1
)
n |
The number of observations. |
p |
The number of predictors of interest. |
support.size |
The number of nonzero coefficients in the underlying regression
model. Can be omitted if |
rho |
A parameter used to characterize the pairwise correlation in
predictors. Default is |
family |
The distribution of the simulated response. |
beta |
The coefficient values in the underlying regression model.
If it is supplied, |
cortype |
The correlation structure.
|
snr |
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined
as the variance of xβ divided
by the variance of the gaussian noise: \frac{Var(xβ)}{σ^2}.
The gaussian noise ε is set with mean 0 and variance σ^2.
The noise is added to the linear predictor η = xβ. Default is |
sigma |
The variance of the gaussian noise. Default |
weibull.shape |
The shape parameter of the Weibull distribution.
It works only when |
uniform.max |
A parameter controlling the censoring rate.
A large value implies a low censoring rate;
a small value implies a high censoring rate.
It works only when |
y.dim |
The dimension of the response. It works only when |
class.num |
The number of classes. It works only when |
seed |
Random seed. Default: |
For family = "gaussian", the data model is
Y = X β + ε.
The underlying regression coefficient β has uniform distribution [m, 100m], where m = 5 √{2log(p)/n}.
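The gaussian model above can be imitated in base R. This is a minimal sketch, not the package's implementation: the independent standard-normal design, the fixed noise standard deviation, and every variable name are assumptions made for illustration.

```r
# Sketch of Y = X beta + eps with a sparse beta (illustrative only).
set.seed(1)
n <- 200
p <- 20
support.size <- 5

m <- 5 * sqrt(2 * log(p) / n)

# Assumption: independent standard-normal predictors (no correlation structure).
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Nonzero coefficients drawn from the uniform distribution on [m, 100m].
beta <- rep(0, p)
support <- sample(p, support.size)
beta[support] <- runif(support.size, min = m, max = 100 * m)

# Assumption: noise standard deviation fixed at 1 rather than derived from snr.
noise.sd <- 1
y <- drop(x %*% beta) + rnorm(n, sd = noise.sd)
```

The objects `x`, `y`, and `beta` mirror the three components of the returned list, which makes the sketch convenient for checking intuition against the output of `generate.data`.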
For family = "binomial", the data model is
Prob(Y = 1) = \exp(X β + ε)/(1 + \exp(X β + ε)).
The underlying regression coefficient β has uniform distribution [2m, 10m], where m = 5 √{2log(p)/n}.
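As a hedged illustration of the binomial model, the sketch below adds gaussian noise to the linear predictor, maps it through the logistic function, and draws Bernoulli responses. The design matrix, the noise standard deviation, and the variable names are assumptions, not taken from the package source.

```r
# Sketch of the logistic data model with noise on the linear predictor.
set.seed(1)
n <- 200
p <- 20
support.size <- 5

m <- 5 * sqrt(2 * log(p) / n)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: independent predictors

beta <- rep(0, p)
beta[sample(p, support.size)] <- runif(support.size, min = 2 * m, max = 10 * m)

eps <- rnorm(n, sd = 1)                        # assumption: unit noise sd
prob <- plogis(drop(x %*% beta) + eps)         # exp(.) / (1 + exp(.))
y <- rbinom(n, size = 1, prob = prob)
```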
For family = "poisson", the data is modeled to have an exponential distribution:
Y = Exp(\exp(X β + ε)).
The underlying regression coefficient β has uniform distribution [2m, 10m], where m = √{2log(p)/n}/3.
For family = "cox", the model for failure time T is
T = (-\log(U) / \exp(X β))^{1/weibull.shape},
where U is a uniform random variable on [0, 1]. The censoring time C is generated from the uniform distribution on [0, uniform.max]; the censoring status is then defined as δ = I(T ≤ C) and the observed time as R = \min\{T, C\}. The underlying regression coefficient β has uniform distribution [2m, 10m], where m = 5 √{2log(p)/n}.
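The censoring mechanism can be sketched in base R as follows. The sketch uses the standard inverse-transform form for Weibull proportional-hazards failure times, T = (-log(U) / exp(Xβ))^{1/weibull.shape}, which keeps T positive; treating that as the intended formula is an assumption, as are the independent design and the variable names.

```r
# Sketch of failure times, uniform censoring, and the observed (time, status) pair.
set.seed(1)
n <- 200
p <- 20
support.size <- 5
weibull.shape <- 1
uniform.max <- 1

m <- 5 * sqrt(2 * log(p) / n)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: independent predictors

beta <- rep(0, p)
beta[sample(p, support.size)] <- runif(support.size, min = 2 * m, max = 10 * m)
eta <- drop(x %*% beta)

# Failure time by inverse transform (assumed form, see the text above).
u <- runif(n)
failure.time <- (-log(u) / exp(eta))^(1 / weibull.shape)

# Censoring time C ~ Uniform[0, uniform.max]; a larger uniform.max censors less.
censor.time <- runif(n, min = 0, max = uniform.max)

status <- as.integer(failure.time <= censor.time)  # delta = I(T <= C)
obs.time <- pmin(failure.time, censor.time)        # R = min(T, C)

# Empirical censoring rate for this draw.
censor.rate <- mean(status == 0)
```

Rerunning the sketch with a larger `uniform.max` should lower `censor.rate`, which matches the description of that argument.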
For family = "mgaussian", the data model is
Y = X β + E.
The non-zero values of regression matrix β are sampled from uniform distribution [m, 100m], where m = 5 √{2log(p)/n}.
For family = "multinomial", the data model is
Prob(Y = 1) = \exp(X β + E)/(1 + \exp(X β + E)).
The non-zero values of regression coefficient β are sampled from uniform distribution [2m, 10m], where m = 5 √{2log(p)/n}.
In the above models, ε \sim N(0, σ^2) and E \sim MVN(0, σ^2 \times I_{q \times q}),
where σ^2 is determined by the snr and q is y.dim.
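To make the link between snr and σ^2 concrete, the sketch below solves the SNR definition Var(xβ)/σ^2 = snr for σ^2. Using the sample variance of the linear predictor is an assumption about how the quantity is estimated; the variable names are illustrative.

```r
# Sketch: derive the noise variance from a target signal-to-noise ratio.
set.seed(1)
n <- 200
p <- 20
snr <- 10

x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: independent predictors
beta <- c(rep(1, 5), rep(0, p - 5))            # illustrative sparse coefficients
eta <- drop(x %*% beta)

# SNR = Var(x beta) / sigma^2, so sigma^2 = Var(x beta) / SNR.
sigma2 <- var(eta) / snr
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2))

y <- eta + eps
```

A larger `snr` shrinks `sigma2`, so the response tracks the linear predictor more closely.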
A list object comprising:
x |
Design matrix of predictors. |
y |
Response variable. |
beta |
The coefficients used in the underlying regression model. |
Jin Zhu
# Generate simulated data
n <- 200
p <- 20
support.size <- 5
dataset <- generate.data(n, p, support.size)
str(dataset)