seqAssocGLMM_SPA: P-value calculation


View source: R/assoc_single.r

Description

Calculates association p-values using a variance approximation, with a saddlepoint approximation (SPA) adjustment.

Usage

seqAssocGLMM_SPA(gdsfile, modobj, maf=NaN, mac=10, missing=0.1, dsnode="",
    spa.pval=0.05, var.ratio=NaN, res.savefn="", res.compress="LZMA",
    parallel=FALSE, verbose=TRUE)

Arguments

gdsfile

a SeqArray GDS filename, or a GDS object

modobj

an R object for SAIGE model parameters

maf

minor allele frequency threshold (checking >= maf), NaN for no filter

mac

minor allele count threshold (checking >= mac), NaN for no filter

missing

missing threshold for variants (checking <= missing), NaN for no filter

dsnode

"" for automatically searching the GDS nodes "genotype" and "annotation/format/DS", or use a user-defined GDS node in the file

spa.pval

the p-value threshold for SPA adjustment, 0.05 by default (since normal approximation performs well when the test statistic is close to the mean)

var.ratio

NaN to use the variance ratio estimated during model fitting, or a user-defined variance ratio

res.savefn

an RData or GDS file name, "" for no saving

res.compress

the compression method for the output file; it should be one of LZMA, LZMA_RA, ZIP, ZIP_RA and none

parallel

FALSE (serial processing), TRUE (multicore processing), a numeric value for the number of cores, or other value; parallel is passed to the argument cl in seqParallel, see seqParallel for more details

verbose

if TRUE, show information
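The three variant filters (maf, mac, missing) can be illustrated with a small base-R sketch. This is not the package's internal code: the toy genotype matrix, the helper pass(), and the reading of "missing" as the per-variant fraction of samples with a missing genotype are assumptions made for illustration only.

```r
# Toy genotype matrix (hypothetical data): rows are samples, columns are
# variants, entries are counts of the alternative allele (NA = missing).
geno <- cbind(
    v1 = c(0, 1, 0, 2, 0, 1, 0, 0, 1, 0),
    v2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
    v3 = c(2, 2, NA, NA, 2, 1, 2, 2, 2, 2)
)

num  <- colSums(!is.na(geno))          # samples with non-missing genotypes
ac   <- colSums(geno, na.rm=TRUE)      # alternative allele count
af   <- ac / (2 * num)                 # alternative allele frequency
maf  <- pmin(af, 1 - af)               # minor allele frequency
mac  <- pmin(ac, 2 * num - ac)         # minor allele count
miss <- colMeans(is.na(geno))          # assumed meaning of the missing rate

# Thresholds; NaN disables a filter, as described in the arguments above.
maf.thres  <- NaN
mac.thres  <- 2
miss.thres <- 0.1

# Hypothetical helper: apply a comparison unless the threshold is NaN.
pass <- function(x, thres, cmp) {
    if (is.nan(thres)) rep(TRUE, length(x)) else cmp(x, thres)
}

keep <- pass(maf, maf.thres, `>=`) &
    pass(mac, mac.thres, `>=`) &
    pass(miss, miss.thres, `<=`)
print(keep)
```

With these toy values only v1 passes: v2 and v3 each have a minor allele count of 1, and v3 also exceeds the missing threshold.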

Details

The original SAIGE R package uses 0.05 as the threshold on unadjusted p-values (based on asymptotic normality): for variants whose unadjusted p-value falls below the threshold, an adjusted p-value is then calculated with the saddlepoint approximation (SPA). If var.ratio=NaN, the average of the variance ratios (mean(modobj$var.ratio$ratio)) is used instead. For more details on the SAIGE algorithm, please refer to the SAIGE paper [Zhou et al. 2018] (see the References section).
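The two rules described above can be sketched in base R. The model object below is a mock holding only the var.ratio$ratio field named in the text, the z-scores are made-up values, and the strict "<" comparison is an assumption; the actual SPA computation is performed inside the package and is not reproduced here.

```r
# Mock model object (hypothetical values); a real one comes from model fitting.
modobj <- list(var.ratio = data.frame(ratio = c(0.98, 1.01, 1.03)))

# var.ratio=NaN falls back to the average of the estimated variance ratios.
var.ratio <- NaN
if (is.nan(var.ratio)) {
    var.ratio <- mean(modobj$var.ratio$ratio)
}

# Two-stage logic: compute normal-approximation p-values first, then flag
# only the small ones for the SPA adjustment.
spa.pval <- 0.05
z <- c(0.5, 1.2, 2.5, 6.0)           # toy standardized test statistics
p.norm <- 2 * pnorm(-abs(z))         # two-sided p-values under normality
needs.spa <- p.norm < spa.pval       # assumed comparison direction
print(data.frame(z, p.norm, needs.spa))
```

Only the last two toy statistics fall below the threshold, so only those would go on to the SPA step; this is why a larger spa.pval costs more computation.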

Value

Return a data.frame with the following components if not saving to a file:

id

variant ID in the GDS file;

chr

chromosome;

pos

position;

rs.id

the RS IDs, if available in the GDS file;

ref

the reference allele;

alt

the alternative allele;

AF.alt

allele frequency for the alternative allele; the minor allele frequency is pmin(AF.alt, 1-AF.alt);

mac

minor allele count; the allele count for the alternative allele is ifelse(AF.alt<=0.5, mac, 2*num-mac);

num

the number of samples with non-missing genotypes;

beta

beta coefficient, odds ratio if binary outcomes (alternative allele vs. reference allele);

SE

standard error for beta coefficient;

pval

adjusted p-value with the Saddlepoint approximation method;

p.norm

p-values based on asymptotic normality (could be 0 if too small, e.g., pnorm(-50) = 0 in R); used for checking only;

converged

whether the SPA algorithm converges or not for adjusted p-values.
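The derived quantities mentioned in the column descriptions can be computed directly from the returned data.frame. The sketch below uses a made-up two-row stand-in for the result, so the numbers are illustrative only; the two formulas are the ones given for AF.alt and mac above.

```r
# Toy stand-in for the returned data.frame (hypothetical values).
assoc <- data.frame(
    AF.alt = c(0.10, 0.80),
    mac    = c(20, 40),
    num    = c(100, 100)
)

# Minor allele frequency from the alternative allele frequency.
assoc$maf <- pmin(assoc$AF.alt, 1 - assoc$AF.alt)

# Allele count for the alternative allele from the minor allele count.
assoc$ac.alt <- ifelse(assoc$AF.alt <= 0.5, assoc$mac, 2 * assoc$num - assoc$mac)
print(assoc)

# Why p.norm can be exactly 0: the normal tail underflows in double precision,
# while the same tail on the log scale stays finite.
p0   <- pnorm(-50)
logp <- pnorm(-50, log.p=TRUE)
print(c(p0, logp))
```

For the second toy variant the alternative allele is the major allele, so its alternative allele count is 2*100 - 40 = 160.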

Author(s)

Xiuwen Zheng

References

Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018 Sep;50(9):1335-1341.

See Also

seqFitNullGLMM_SPA, seqSAIGE_LoadPval

Examples

# load the package
library(SAIGEgds)

# open a GDS file
fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds")
gdsfile <- seqOpen(fn)

# load phenotype
phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds")
pheno <- read.table(phenofn, header=TRUE, as.is=TRUE)
head(pheno)

# fit the null model
glmm <- seqFitNullGLMM_SPA(y ~ x1 + x2, pheno, gdsfile, trait.type="binary")

# p-value calculation
assoc <- seqAssocGLMM_SPA(gdsfile, glmm, mac=10)
head(assoc)

# close the GDS file
seqClose(gdsfile)

SAIGEgds documentation built on Nov. 8, 2020, 7:46 p.m.