data.simu: Data Simulation


View source: R/data-simu.R

Description

A function to simulate a dataset containing a primary trait, a secondary trait, a genotype and one covariate.

Usage

data.simu(par.ls, sec.type, sd1 = 1, sd2 = 1, N = 1000, maf = 0.3,
  cutoff = 0, qntl = 0.1)

Arguments

par.ls

an R list with elements b0, b1, b3, g0 and g1. See Details for more information.

sec.type

an R character string specifying the secondary trait type: "binary" or "continuous"

sd1

error term standard deviation for primary trait

sd2

error term standard deviation for secondary trait

N

sample size of dataset

maf

minor allele frequency (MAF) of the SNP used to simulate genotypes

cutoff

cutoff to generate binary secondary phenotype

qntl

quantile used to choose y1 and y2; must be between 0 and 0.5. The default value is 0.1, meaning that subjects whose primary phenotype is in the top 10% or bottom 10% are included in the cohort
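
To make the effect of 'qntl' concrete, here is a minimal sketch of tail selection in base R. It is an illustration only, not the package's code: the stand-in trait 'y' and the use of `<=`/`>=` at the thresholds are assumptions.

```r
# Illustration only: a standard normal vector stands in for the primary trait.
set.seed(1)
N <- 1000
qntl <- 0.1
y <- rnorm(N)

# Thresholds at the bottom and top `qntl` quantiles of the trait.
lower <- quantile(y, qntl)
upper <- quantile(y, 1 - qntl)

# Keep subjects in either tail (assumed inclusive comparison).
keep <- y <= lower | y >= upper

# Roughly 2 * qntl * N subjects remain.
sum(keep)
```

With the defaults N = 1000 and qntl = 0.1, the retained cohort is therefore about one fifth of the simulated sample.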

Details

For continuous secondary traits, the dataset is simulated from the following model:

Z = g0+g1[1]G+g1[2]X+e1

Y = b0+b1[1]G+b1[2]X+b1[3]Z+e2

For binary secondary traits, the dataset is simulated from the following model:

Z = g0+g1[1]G+g1[2]X+e1

D = I(Z>cutoff)

Y = b0+b1[1]G+b1[2]X+b1[3]Z+e2

Here 'Z'/'D' is the continuous/binary secondary trait, 'Y' is the primary trait, 'X' is a covariate following a standard normal distribution, and 'G' is the genotype, simulated under Hardy-Weinberg equilibrium (HWE) with a minor allele frequency of 'maf'. The error terms 'e1'/'e2' follow normal distributions with mean 0 and standard deviation 'sd1'/'sd2'. Following an extreme phenotype sampling design, only subjects whose primary phenotype falls in the top or bottom 'qntl' quantile are retained.
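
The model above can be sketched in base R as follows. This is a hedged reading of the equations, not the package's implementation. The function name 'simulate_sketch' is hypothetical, and several points are assumptions: 'b3' from 'par.ls' is taken as the coefficient of 'Z' (written b1[3] in the equations), 'e1' is paired with 'sd1' and 'e2' with 'sd2' as in the text above, the genotype is coded as a minor allele count (0, 1 or 2), and the covariate column is named 'X'.

```r
# Hypothetical helper, not part of STEPS: mirrors the equations in Details.
simulate_sketch <- function(par.ls, sec.type = c("continuous", "binary"),
                            sd1 = 1, sd2 = 1, N = 1000, maf = 0.3,
                            cutoff = 0, qntl = 0.1) {
  sec.type <- match.arg(sec.type)

  # Genotype as minor allele count; two independent allele draws give HWE.
  G <- rbinom(N, size = 2, prob = maf)
  # Covariate following a standard normal distribution.
  X <- rnorm(N)

  # Secondary trait: Z = g0 + g1[1]G + g1[2]X + e1 (assumption: e1 uses sd1).
  Z <- par.ls$g0 + par.ls$g1[1] * G + par.ls$g1[2] * X + rnorm(N, sd = sd1)

  # Primary trait: Y = b0 + b1[1]G + b1[2]X + b3*Z + e2
  # (assumptions: b3 is the coefficient of Z; e2 uses sd2).
  Y <- par.ls$b0 + par.ls$b1[1] * G + par.ls$b1[2] * X +
    par.ls$b3 * Z + rnorm(N, sd = sd2)

  # Binary secondary trait: D = I(Z > cutoff); otherwise report Z itself.
  sec <- if (sec.type == "binary") as.numeric(Z > cutoff) else Z

  # Extreme phenotype sampling: keep the bottom and top `qntl` tails of Y.
  keep <- Y <= quantile(Y, qntl) | Y >= quantile(Y, 1 - qntl)

  out <- cbind(Y = Y, sec, G = G, X = X)[keep, , drop = FALSE]
  colnames(out)[2] <- if (sec.type == "binary") "D" else "Z"
  out
}

# Usage with the same parameter layout as the Examples section.
set.seed(2019)
par.ls <- list(b0 = 0, b1 = rnorm(2), b3 = rnorm(1), g0 = 0, g1 = rnorm(2))
sim.cont <- simulate_sketch(par.ls, "continuous")
sim.bina <- simulate_sketch(par.ls, "binary")
dim(sim.cont)
```

Drawing the genotype with rbinom(N, 2, maf) is one common way to obtain Hardy-Weinberg proportions; whether data.simu does the same is not stated on this page.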

Value

An R matrix with one row per subject. The columns are: 'Y' for the primary trait, 'Z'/'D' for the continuous/binary secondary trait, 'G' for the genotype, and 'E' for the covariate.

Examples

par.ls = list(b0=0,b1=rnorm(2),b3=rnorm(1),g0=0,g1=rnorm(2))
data.cont = data.simu(par.ls,"continuous")
data.bina = data.simu(par.ls,"binary")

WenjianBI/STEPS documentation built on July 22, 2019, 11:12 p.m.