generate_data: Data generation function


View source: R/generate_data.R

Description

Generates a data set with n observations of a primary outcome Y, a secondary outcome K, an exposure X, a measured confounder L, and an unmeasured confounder U. The primary outcome is either a quantitative, normally distributed variable (setting = "GLM") or a censored time-to-event outcome under an accelerated failure time (AFT) model (setting = "AFT"). Under the AFT setting, the observed time-to-event variable T=exp(Y) and the censoring indicator C are also computed. X is generated as a genetic exposure variable in the form of a single nucleotide variant (SNV) in 0-1-2 additive coding with minor allele frequency maf. X can be generated independently of U (X_orth_U = TRUE) or dependent on U (X_orth_U = FALSE). For more details regarding the underlying model, see the vignette.
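As a rough illustration of what 0-1-2 additive coding with a given minor allele frequency means, the sketch below draws genotypes for the case where X is independent of U. It is not taken from the package source: it assumes Hardy-Weinberg equilibrium, so that each of the two alleles is the minor allele with probability maf, and the variable names are chosen for illustration only.

```r
# Hypothetical sketch, not the CIEE implementation:
# draw an SNV in 0-1-2 additive coding, ASSUMING Hardy-Weinberg equilibrium.
set.seed(1)
n   <- 1000   # sample size
maf <- 0.2    # minor allele frequency

# Each individual carries two alleles; each one is the minor allele
# with probability maf, so the minor allele count is Binomial(2, maf).
X <- rbinom(n, size = 2, prob = maf)

table(X)       # genotype counts for 0, 1, and 2 minor alleles
mean(X) / 2    # empirical minor allele frequency, should be close to maf
```

How X is made dependent on U when X_orth_U = FALSE is described in the vignette and is not covered by this sketch.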

Usage

generate_data(setting = "GLM", n = 1000, maf = 0.2, cens = 0.3,
  a = NULL, b = NULL, aXK = 0.2, aXY = 0.1, aXL = 0, aKY = 0.3,
  aLK = 0, aLY = 0, aUY = 0, aUL = 0, mu_X = NULL, sd_X = NULL,
  X_orth_U = TRUE, mu_U = 0, sd_U = 1, mu_K = 0, sd_K = 1, mu_L = 0,
  sd_L = 1, mu_Y = 0, sd_Y = 1)

Arguments

setting

String with value "GLM" or "AFT" indicating whether the primary outcome is generated as a normally-distributed quantitative outcome ("GLM") or censored time-to-event outcome ("AFT").

n

Numeric. Sample size.

maf

Numeric. Minor allele frequency of the genetic exposure variable.

cens

Numeric. Desired proportion of censored individuals (for example, 0.3 for 30%). Has to be specified under the AFT setting. Note that the actual censoring rate is determined by the parameters a and b; cens mainly serves as a check that a and b yield the desired censoring rate (otherwise, a warning is issued).

a

Numeric. Parameter for generating the desired censoring rate under the AFT setting. Has to be specified under the AFT setting.

b

Numeric. Parameter for generating the desired censoring rate under the AFT setting. Has to be specified under the AFT setting.

aXK

Numeric. Size of the effect of X on K.

aXY

Numeric. Size of the effect of X on Y.

aXL

Numeric. Size of the effect of X on L.

aKY

Numeric. Size of the effect of K on Y.

aLK

Numeric. Size of the effect of L on K.

aLY

Numeric. Size of the effect of L on Y.

aUY

Numeric. Size of the effect of U on Y.

aUL

Numeric. Size of the effect of U on L.

mu_X

Numeric. Expected value of X.

sd_X

Numeric. Standard deviation of X.

X_orth_U

Logical. Indicates whether X should be generated independently of U (X_orth_U = TRUE) or dependent on U (X_orth_U = FALSE).

mu_U

Numeric. Expected value of U.

sd_U

Numeric. Standard deviation of U.

mu_K

Numeric. Expected value of K.

sd_K

Numeric. Standard deviation of K.

mu_L

Numeric. Expected value of L.

sd_L

Numeric. Standard deviation of L.

mu_Y

Numeric. Expected value of Y.

sd_Y

Numeric. Standard deviation of Y.

Value

A data frame containing n observations of the variables Y, K, X, L, U. Under the AFT setting, T=exp(Y) and the censoring indicator C (0 = censored, 1 = uncensored) are also computed.
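To make the coding of T and C concrete, the following sketch builds a toy data frame with the same indicator convention (0 = censored, 1 = uncensored) and computes the empirical censoring rate that would be compared against cens. It is an assumption-laden illustration rather than the package's method: it assumes censoring times are drawn uniformly between a and b, ignores all covariate effects on Y, and uses illustrative names such as event_time and cens_time that do not appear in the package.

```r
# Hypothetical sketch, not the CIEE implementation: illustrates the coding of T and C.
set.seed(1)
n <- 1000
a <- 0.2     # ASSUMED lower bound of the censoring-time distribution
b <- 4.75    # ASSUMED upper bound of the censoring-time distribution

Y_latent   <- rnorm(n, mean = 0, sd = 1)   # log event time, no covariate effects here
event_time <- exp(Y_latent)                # time-to-event before censoring
cens_time  <- runif(n, min = a, max = b)   # ASSUMPTION: uniform censoring times

T_obs <- pmin(event_time, cens_time)           # observed follow-up time
C     <- as.numeric(event_time <= cens_time)   # 1 = uncensored, 0 = censored

dat <- data.frame(T = T_obs, C = C)
mean(dat$C == 0)   # empirical censoring rate, to compare with the desired cens
```

The same one-line check, mean(dat$C == 0), can be applied to a data frame returned under the AFT setting to see how close the realized censoring rate is to the requested one.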

Examples

# Generate data under the GLM setting with default values
dat_GLM <- generate_data()
head(dat_GLM)

# Generate data under the AFT setting with default values for all arguments except a and b
dat_AFT <- generate_data(setting = "AFT", a = 0.2, b = 4.75)
head(dat_AFT)

CIEE documentation built on May 2, 2019, 6:39 a.m.