longskat_gene_simulate: Simulation for LSKAT test


Description

Generates simulated data for the LSKAT test (power test or type I error test) using pre-defined parameters.

Usage

longskat_gene_simulate( power.test = TRUE, 
    n.minsect = 3000, 
    n.maxsect = 30000, 
    n.sample = 800,
    n.time = 6, 
    n.gene = 10, 
    plink.format = FALSE, 
    file.plink.prefix = "LSKAT.plink.test", 
    geno.miss = 0.01, 
    pheno.miss = 0.1, 
    pheno.dist = "mn", 
    pheno.cov = "AR1", 
    intercept = FALSE, 
    par = list() )

Arguments

power.test

Logical. If TRUE, the individual random effects and the individual-specific time-dependent random effects are simulated for the power test; if FALSE, the data are simulated for the type I error test.

n.minsect

Numeric, the minimum gene size (unit: bp).

n.maxsect

Numeric, the maximum gene size (unit: bp).

n.sample

Numeric, the sample size, i.e., the number of individuals.

n.time

Numeric, the number of measurement times per individual.

n.gene

Numeric, the number of genes. When simulating for the power test, the 1st gene is the causal gene and the rest are non-causal genes.

plink.format

Logical, indicating whether the data are stored in PLINK files in addition to being returned as a list object with multiple matrices.

file.plink.prefix

String, the file name prefix for the PLINK data set if plink.format is TRUE.

geno.miss

Numeric, the missing rate for the genotype data.

pheno.miss

Numeric, the missing rate for phenotype traits.

pheno.dist

String, the distribution of individual-specific time-dependent random effects. Four values are available: 'mn', 'mt', 'msn' and 'mmn'; see Details.

pheno.cov

String, the covariance structure of individual-specific time-dependent random effects. Three values are available: 'AR1', 'SAD1' and 'CS'; see Details.

intercept

Logical, indicating whether an intercept is included in the phenotypic traits.

par

List, the parameters for the phenotype traits, including covariates and individual-specific time-dependent random effects.
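The missing rates geno.miss and pheno.miss can be pictured with a small base-R sketch. It assumes that each entry is masked independently with the given probability (missing completely at random); the package may apply missingness differently, and `apply_missing` is a hypothetical helper, not an LSKAT function.

```r
# Hypothetical helper (not part of LSKAT): set each entry of a matrix to NA
# independently with probability `rate`. The MCAR mechanism is an assumption.
apply_missing <- function(mat, rate) {
  mask <- matrix(runif(length(mat)) < rate, nrow = nrow(mat))
  mat[mask] <- NA
  mat
}

set.seed(1)
# 800 individuals measured at 6 time points, matching the default arguments
phe <- matrix(rnorm(800 * 6), nrow = 800, ncol = 6)
phe.miss <- apply_missing(phe, rate = 0.1)

# Observed fraction of missing entries; it should be close to 0.1
mean(is.na(phe.miss))
```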

Details

The simulation data are generated by the following formula:

Y_{ij} = intercept + b1 * X1_{ij} + b2*X2_{ij} + a_{i} + r_{ij} + e_{ij}

Here i indexes individuals and j indexes measurement times.

a_{i}: individual random effects

r_{ij}: individual-specific time-dependent random effects

e_{ij}: measurement error

The individual random effects follow the normal distribution with standard deviation sig.a.

The individual-specific time-dependent random effects follow the multivariate normal distribution with one of the covariance structures AR1, SAD1 or CS.

The measurement error follows the t, normal, skew normal or mixed normal distribution.
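As an illustration of the formula, the following base-R sketch draws one data set without genetic effects. It is not the package's implementation: the covariates are assumed constant over time, the AR(1) covariance is assumed to be sig.b^2 * rho^|j - k|, the measurement error is taken as normal, and the sample size is reduced to keep the example fast.

```r
set.seed(42)
n.sample <- 200
n.time   <- 6
par <- list(b0 = 1, b1 = 0.5, b2 = 0.5,
            sig.a = 0.8, sig.b = 0.8, sig.e = 0.8, rho = 0.7)

# Covariates: X1 binary, X2 continuous (held constant over time; an assumption)
X1 <- rbinom(n.sample, size = 1, prob = 0.5)
X2 <- rnorm(n.sample)

# a_i: one normal draw per individual, shared by all of its time points
a <- rnorm(n.sample, sd = par$sig.a)

# r_ij: multivariate normal rows with an assumed AR(1) covariance matrix
Sigma <- par$sig.b^2 * par$rho^abs(outer(1:n.time, 1:n.time, "-"))
r <- matrix(rnorm(n.sample * n.time), n.sample, n.time) %*% chol(Sigma)

# e_ij: independent normal measurement error
e <- matrix(rnorm(n.sample * n.time, sd = par$sig.e), n.sample, n.time)

# Length-n.sample vectors recycle down the columns, so row i of Y
# receives X1[i], X2[i] and a[i] at every time point
Y <- par$b0 + par$b1 * X1 + par$b2 * X2 + a + r + e
dim(Y)
```

Multiplying standard normal rows by the Cholesky factor of Sigma is one common way to obtain correlated draws without extra packages.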

The covariance structure:

AR1

first-order Autoregressive model [AR(1)], parameters: par$rho and par$sig.b

SAD1

first-order structured antedependence [SAD(1)], parameters: par$rho and par$sig.b

CS

compound symmetry model, parameters: par$rho and par$sig.b
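The AR1 and CS structures can be written out as explicit matrices. The sketch below uses the textbook parameterizations (variance sig.b^2 on the diagonal, correlation rho^lag for AR1 and a constant rho for CS), which are an assumption about how the package uses par$rho and par$sig.b; `cov_ar1` and `cov_cs` are hypothetical helper names. SAD1 is left out because its exact form is not given here.

```r
# Hypothetical helpers (not part of LSKAT); parameterizations are assumed.

# AR(1): correlation decays as rho^|j - k| with the lag between time points
cov_ar1 <- function(n.time, rho, sig.b) {
  lag <- abs(outer(seq_len(n.time), seq_len(n.time), "-"))
  sig.b^2 * rho^lag
}

# Compound symmetry: the same correlation rho between any two time points
cov_cs <- function(n.time, rho, sig.b) {
  R <- matrix(rho, n.time, n.time)
  diag(R) <- 1
  sig.b^2 * R
}

round(cov_ar1(3, rho = 0.7, sig.b = 0.8), 4)
round(cov_cs(3, rho = 0.7, sig.b = 0.8), 4)
```

Under these assumptions the two matrices share the diagonal and the lag-1 entries and differ only at longer lags, where AR1 keeps decaying and CS stays constant.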

The distribution of measurement error:

mn

Normal distribution, parameters: par$sig.e

mt

Student's t distribution, parameters: df = 10

msn

Skew normal distribution, parameters: par$sig.e, alpha = 40

mmn

Mixed normal distribution, parameters: par$par.e[1], par$par.e[2], par$par.e[3]
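Three of the four error distributions can be sampled with base R alone, as sketched below. `sim_error` is a hypothetical helper, and the scaling conventions are assumptions: the t draws are left unscaled, and the skew normal draw uses the standard stochastic representation with location 0 and scale sig.e. The mixed normal case is omitted because the meaning of the three par$par.e entries is not stated here.

```r
# Hypothetical sampler (not part of LSKAT); scaling conventions are assumed.
sim_error <- function(n, dist = c("mn", "mt", "msn"), sig.e = 0.8) {
  dist <- match.arg(dist)
  switch(dist,
    # Normal with standard deviation sig.e
    mn = rnorm(n, sd = sig.e),
    # Student's t with 10 degrees of freedom
    mt = rt(n, df = 10),
    # Skew normal with shape alpha = 40, via
    # Z = delta * |U0| + sqrt(1 - delta^2) * U1, delta = alpha / sqrt(1 + alpha^2)
    msn = {
      alpha <- 40
      delta <- alpha / sqrt(1 + alpha^2)
      sig.e * (delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n))
    })
}

set.seed(7)
e.mn  <- sim_error(1000, "mn")
e.mt  <- sim_error(1000, "mt")
e.msn <- sim_error(1000, "msn")
summary(e.msn)
```

With a shape parameter as large as 40 the skew normal draws are almost entirely positive, so this error term is far from symmetric.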

The pre-defined parameters in the package have the following values:

par <- list(b0=1, b1=0.5, b2=0.5, 
    sig.a=0.8, sig.b=0.8, sig.e=0.8,
    rho=0.7, 
    cov.param   = c(0,1, 0.1),
    time.cov    = 0,
    time.effect = c(0.2, -0.08),
    max.common.causal  = 4,
    coef.common.causal = 0.12,
    max.rare.causal    = 10,
    coef.rare.causal   = 0.08,
    positive.ratio     = 1,
    rare.cutoff        = 0.05 );
b0

Numeric, the intercept value if the intercept is enabled.

b1

Numeric, the coefficient of the 1st covariate, binary variable.

b2

Numeric, the coefficient of the 2nd covariate, continuous variable.

sig.a

Numeric, the standard deviation of individual random effects.

sig.b

Numeric, the standard deviation of individual-specific time-dependent random effects.

sig.e

Numeric, the standard deviation of measurement error.

rho

Numeric, the correlation coefficient of the covariance structure.

cov.param

Vector, the other parameters of covariance structure except rho.

time.cov

Numeric, indicating whether time is treated as a covariate: 0 means no time effects, 1 means a time effect, 2 means that both time and time-squared effects are included as covariates, and so on.

time.effect

Numeric vector, the coefficients of the time effects. The 1st item is the coefficient for the time effect, the 2nd item is the coefficient for the time-squared effect, and so on.

max.common.causal

Numeric, the maximum number of common causal SNPs.

coef.common.causal

Numeric, the effect coefficient for common causal SNPs.

max.rare.causal

Numeric, the maximum number of rare causal SNPs.

coef.rare.causal

Numeric, the effect coefficient for rare causal SNPs.

positive.ratio

Numeric, the proportion of causal SNPs that have a positive effect.

rare.cutoff

Numeric, the hard cutoff for rare MAF. The default rare cutoff is calculated by the formula 1/√(2 × sample size).
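Two of the parameters above involve small calculations, sketched here in base R. The cutoff function follows the formula given for rare.cutoff; the time term is one plausible reading of time.cov and time.effect as a polynomial in the measurement time and is an assumption, as are the helper names `rare_cutoff` and `time_term`.

```r
# Rare-variant MAF cutoff from the formula 1 / sqrt(2 * sample size)
rare_cutoff <- function(n.sample) 1 / sqrt(2 * n.sample)
rare_cutoff(800)   # 0.025 for n.sample = 800

# Hypothetical reading of time.cov / time.effect (an assumption):
# the time contribution is sum over k = 1..time.cov of time.effect[k] * t^k
time_term <- function(t, time.cov, time.effect) {
  if (time.cov == 0) return(rep(0, length(t)))
  powers <- outer(t, seq_len(time.cov), "^")   # columns: t, t^2, ...
  as.vector(powers %*% time.effect[seq_len(time.cov)])
}

# Linear plus quadratic time effects at six measurement times
time_term(1:6, time.cov = 2, time.effect = c(0.2, -0.08))
```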

Value

A list object is returned with the following items:

file.plink.bed

String, if plink.format is TRUE, this is the name of the PLINK file containing the packed binary SNP genotype data. It has the extension .bed.

file.plink.bim

String, if plink.format is TRUE, this is the name of the PLINK file containing the SNP descriptions.

file.plink.fam

String, if plink.format is TRUE, this is the name of the PLINK file containing subject (and, possibly, family) identifiers.

file.gene.set

String, if plink.format is TRUE, this is the name of the table file containing the gene definition; the 1st column is the gene name and the 2nd column is the SNP name.

file.phe.cov

String, if plink.format is TRUE, this is the name of the CSV file containing the covariate matrix with m rows (individuals) and n columns (covariates), with the individual IDs as row names.

file.phe.long

String, if plink.format is TRUE, this is the name of the CSV file containing the phenotype traits matrix with m rows (individuals) and n columns (measurements), with the individual IDs as row names.

phe.long

Matrix, the phenotype traits matrix with m rows (individuals) and n columns (measurements), with the individual IDs as row names.

phe.cov

Matrix, the covariate matrix with m rows (individuals) and n columns (covariates), with the individual IDs as row names.

snp.mat

List of matrices, one per gene; each matrix includes all SNPs in that gene.

References

Wang Z., Xu K., Zhang X., Wu X., and Wang Z., (2016) Longitudinal SNP-set association analysis of quantitative phenotypes. Genetic Epidemiology.

Examples

## data simulation for the power test
p0 <- longskat_gene_simulate( plink.format=TRUE, file.plink.prefix="tmp-plink-simulate", 
      power.test=TRUE );

## test all genes in the PLINK data set
r.lskat1 <- longskat_gene_plink(p0$file.plink.bed, p0$file.plink.bim, p0$file.plink.fam, 
      p0$file.phe.long, p0$file.phe.cov, NULL, p0$file.gene.set, options=list(g.maxiter=3 ));


## data simulation for the test of type 1 error
p1 <- longskat_gene_simulate( plink.format=TRUE, file.plink.prefix="tmp-plink-simulate", 
      power.test=FALSE );

## test all genes in the PLINK data set
r.lskat2 <- longskat_gene_plink(p1$file.plink.bed, p1$file.plink.bim, p1$file.plink.fam, 
      p1$file.phe.long, p1$file.phe.cov, NULL, p1$file.gene.set, 
      options=list(g.maxiter=3, plink.path="plink"));

ZWang-Lab/LSKAT documentation built on May 10, 2019, 1:55 a.m.