generate.data: Generate simulated data.
In bbayukari/StatComp21077: use to test the abess

View source: R/generate.data.R

generate.data

R Documentation

Generate simulated data.

Description

Generate simulated data under the generalized linear model and Cox proportional hazard model.

Usage

generate.data(
  n,
  p,
  support.size = NULL,
  rho = 0,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian", "multinomial",
    "gamma", "ordinal"),
  beta = NULL,
  cortype = 1,
  snr = 10,
  sigma = NULL,
  weibull.shape = 1,
  uniform.max = 1,
  y.dim = 3,
  class.num = 3,
  seed = 1
)

Arguments

`n`	The number of observations.
`p`	The number of predictors of interest.
`support.size`	The number of nonzero coefficients in the underlying regression model. Can be omitted if `beta` is supplied.
`rho`	A parameter used to characterize the pairwise correlation in predictors. Default is `0`.
`family`	The distribution of the simulated response. `"gaussian"` for univariate quantitative response, `"binomial"` for binary classification response, `"poisson"` for count response, `"cox"` for right-censored response, `"mgaussian"` for multivariate quantitative response, `"multinomial"` for multi-class classification response.
`beta`	The coefficient values in the underlying regression model. If it is supplied, `support.size` can be omitted.
`cortype`	The correlation structure. `cortype = 1` denotes the independence structure, where the covariance matrix has (i,j) entry equal to I(i = j). `cortype = 2` denotes the exponential structure, where the covariance matrix has (i,j) entry equal to rho^{|i-j|}. `cortype = 3` denotes the constant structure, where the off-diagonal entries of the covariance matrix are rho and the diagonal entries are 1.
`snr`	A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as the variance of xβ divided by the variance of the Gaussian noise: \frac{Var(xβ)}{σ^2}. The Gaussian noise ε has mean 0 and variance σ^2 and is added to the linear predictor η = xβ. Default is `snr = 10`. Note that this argument's effect is overridden if `sigma` is supplied with a non-null value.
`sigma`	The variance of the Gaussian noise. Default `sigma = NULL` implies it is determined by `snr`.
`weibull.shape`	The shape parameter of the Weibull distribution. It works only when `family = "cox"`. Default: `weibull.shape = 1`.
`uniform.max`	A parameter controlling the censoring rate. A larger value implies a lower censoring rate; a smaller value implies a higher censoring rate. It works only when `family = "cox"`. Default is `uniform.max = 1`.
`y.dim`	The dimension of the response. It works only when `family = "mgaussian"`. Default: `y.dim = 3`.
`class.num`	The number of classes. It works only when `family = "multinomial"`. Default: `class.num = 3`.
`seed`	Random seed. Default: `seed = 1`.
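
The three correlation structures described for `cortype` can be sketched in base R as follows. This is only an illustration of the documented covariance matrices: `make_sigma` is a hypothetical helper, not a function of the package, and drawing predictors through a Cholesky factor is an assumption about one possible way to obtain rows with that covariance.

```r
# Hypothetical helper (not part of the package): build the p x p covariance
# matrix that each documented `cortype` value describes.
make_sigma <- function(p, rho = 0, cortype = 1) {
  if (cortype == 1) {
    diag(p)                                      # independence: identity matrix
  } else if (cortype == 2) {
    rho^abs(outer(seq_len(p), seq_len(p), "-"))  # exponential: rho^|i-j|
  } else if (cortype == 3) {
    Sigma <- matrix(rho, p, p)                   # constant: rho off the diagonal
    diag(Sigma) <- 1                             # and 1 on the diagonal
    Sigma
  } else {
    stop("cortype must be 1, 2, or 3")
  }
}

# Assumed sampling scheme: rows of x are N(0, Sigma), obtained by multiplying
# standard normal draws by the upper-triangular Cholesky factor of Sigma.
set.seed(1)
n <- 500
p <- 4
Sigma <- make_sigma(p, rho = 0.5, cortype = 2)
x <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
round(Sigma, 3)
```

With `cortype = 2` and `rho = 0.5`, entries of `Sigma` decay geometrically with the distance between column indices, so adjacent predictors are the most strongly correlated.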

Details

For family = "gaussian", the data model is

Y = X β + ε.

The underlying regression coefficients β are drawn from the uniform distribution on [m, 100m], where m = 5 √{2log(p)/n}.
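
As a rough illustration of the coefficient scale, the following base R sketch draws a sparse β with magnitudes in [m, 100m]. The way the support is chosen (uniformly at random) and the positive sign of every coefficient are assumptions made for the example; the text above only fixes the range.

```r
set.seed(1)
n <- 200
p <- 20
support.size <- 5

# Scale factor from the text: m = 5 * sqrt(2 * log(p) / n)
m <- 5 * sqrt(2 * log(p) / n)

# Assumption: the nonzero positions are picked uniformly at random.
support <- sample(p, support.size)

beta <- rep(0, p)
beta[support] <- runif(support.size, min = m, max = 100 * m)

# Gaussian model Y = X beta + epsilon, with an arbitrary noise sd of 1
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% beta) + rnorm(n, mean = 0, sd = 1)
```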

For family = "binomial", the data model is

Prob(Y = 1) = \exp(X β + ε)/(1 + \exp(X β + ε)).

The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = 5 √{2log(p)/n}.
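
The binomial model can be sketched in base R as below. The coefficient vector and the noise standard deviation are illustrative choices, not values taken from the package; only the logistic link applied to Xβ + ε follows the text.

```r
set.seed(1)
n <- 200
p <- 20
m <- 5 * sqrt(2 * log(p) / n)

x <- matrix(rnorm(n * p), n, p)

# Illustrative sparse beta: three nonzero entries in [2m, 10m]
beta <- c(runif(3, min = 2 * m, max = 10 * m), rep(0, p - 3))

sigma <- 1                                   # assumed noise standard deviation
eta <- drop(x %*% beta) + rnorm(n, 0, sigma) # noisy linear predictor X beta + epsilon

prob <- plogis(eta)                          # exp(eta) / (1 + exp(eta))
y <- rbinom(n, size = 1, prob = prob)        # binary response
table(y)
```

`plogis` is the logistic distribution function from the `stats` package, so it evaluates the same expression as the formula above while avoiding overflow for large `eta`.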

For family = "poisson", the response is modeled as a count with a Poisson distribution:

Y \sim Poisson(\exp(X β + ε)).

The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = √{2log(p)/n}/3.

For family = "cox", the model for failure time T is

T = (-\log(U) / \exp(X β))^{1/weibull.shape},

where U is a uniform random variable with range [0, 1]. The censoring time C is generated from the uniform distribution on [0, uniform.max]; the censoring status is then defined as δ = I(T ≤ C) and the observed time as R = \min\{T, C\}. The underlying regression coefficients β are drawn from the uniform distribution on [2m, 10m], where m = 5 √{2log(p)/n}.
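
A minimal base R sketch of this censoring mechanism follows. It uses the standard inverse-transform form T = (-log(U) / exp(Xβ))^{1/weibull.shape}, which is an assumption about how the failure-time formula is meant to be read, and the coefficient vector is purely illustrative.

```r
set.seed(1)
n <- 200
p <- 5
weibull.shape <- 1
uniform.max <- 1

x <- matrix(rnorm(n * p), n, p)
beta <- c(1, -1, 0, 0, 0.5)   # illustrative coefficients, not the package's draw

# Failure time by inverse transform of a Weibull proportional hazards model
u <- runif(n)
failure_time <- (-log(u) / exp(drop(x %*% beta)))^(1 / weibull.shape)

# Censoring time C ~ Uniform(0, uniform.max)
censor_time <- runif(n, min = 0, max = uniform.max)

status <- as.integer(failure_time <= censor_time)  # delta = I(T <= C)
observed_time <- pmin(failure_time, censor_time)   # R = min(T, C)

mean(status == 0)  # empirical censoring rate
```

Increasing `uniform.max` pushes the censoring times later, so more failure times are observed and the empirical censoring rate drops.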

For family = "mgaussian", the data model is

Y = X β + E.

The non-zero values of the regression matrix β are sampled from the uniform distribution on [m, 100m], where m = 5 √{2log(p)/n}.

For family = "multinomial", the data model is

Prob(Y = 1) = \exp(X β + E)/(1 + \exp(X β + E)).

The non-zero values of the regression coefficient β are sampled from the uniform distribution on [2m, 10m], where m = 5 √{2log(p)/n}.

In the above models, ε \sim N(0, σ^2 ) and E \sim MVN(0, σ^2 \times I_{q \times q}), where σ^2 is determined by the snr and q is y.dim.
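
To make the link between `snr` and σ^2 concrete, the following base R sketch solves SNR = Var(xβ) / σ^2 for σ^2 on a simulated Gaussian data set. Estimating Var(xβ) with the sample variance of the linear predictor is an assumption of this sketch, and the coefficient vector is illustrative.

```r
set.seed(1)
n <- 1000
p <- 10
snr <- 10

x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 3), rep(0, p - 3))   # illustrative sparse coefficients
eta <- drop(x %*% beta)               # linear predictor x beta

# Assumption: Var(x beta) is estimated by the sample variance of eta.
sigma2 <- var(eta) / snr              # noise variance implied by snr

y <- eta + rnorm(n, mean = 0, sd = sqrt(sigma2))

var(eta) / sigma2                     # recovers snr by construction
```

A larger `snr` shrinks `sigma2`, so the response tracks the linear predictor more closely.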

Value

A list object comprising:

`x`	Design matrix of predictors.
`y`	Response variable.
`beta`	The coefficients used in the underlying regression model.

Author(s)

Jin Zhu

Examples


# Generate simulated data
n <- 200
p <- 20
support.size <- 5
dataset <- generate.data(n, p, support.size)
str(dataset)

bbayukari/StatComp21077 documentation built on March 21, 2022, 2:03 a.m.
