seqAssocGLMM_spaBurden: Burden tests
In SAIGEgds: Scalable Implementation of Generalized mixed models using GDS files in Phenome-Wide Association Studies


View source: R/assoc_aggregate.r

Burden p-value calculations using mixed models and the Saddlepoint approximation method for case-control imbalance.


seqAssocGLMM_spaBurden(gdsfile, modobj, units, wbeta=AggrParamBeta,
    summac=3, dsnode="", spa.pval=0.05, var.ratio=NaN, res.savefn="",
    res.compress="LZMA", parallel=FALSE, verbose=TRUE, verbose.maf=TRUE)

`gdsfile`	a SeqArray GDS filename, or a GDS object
`modobj`	an R object for SAIGE model parameters
`units`	a list of units of selected variants, with S3 class `"SeqUnitListClass"` defined in the SeqArray package
`wbeta`	weights for per-variant effects, computed from the beta density `dbeta()` at each variant's MAF; a length-two vector, or a matrix with two rows for multiple sets of beta parameters; by default, both beta(1,1) and beta(1,25) are used
`summac`	a threshold for the weighted sum of minor allele counts (checking `>= summac`)
`dsnode`	"" for automatically searching the GDS nodes "genotype" and "annotation/format/DS", or use a user-defined GDS node in the file
`spa.pval`	the p-value threshold for SPA adjustment, 0.05 by default
`var.ratio`	`NaN` for using the estimated variance ratio in the model fitting, or a user-defined variance ratio
`res.savefn`	an RData or GDS file name, "" for no saving
`res.compress`	the compression method for the output file; it should be one of LZMA, LZMA_RA, ZIP, ZIP_RA and none
`parallel`	`FALSE` (serial processing), `TRUE` (multicore processing), a numeric value for the number of cores, or other value; `parallel` is passed to the argument `cl` in `seqParallel`, see `seqParallel` for more details
`verbose`	if `TRUE`, show information
`verbose.maf`	if `TRUE`, show summary of MAFs in units
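
As an illustration of the `wbeta` argument, the following base-R sketch evaluates `dbeta()` at a few toy MAF values for the two default parameter sets, beta(1,1) and beta(1,25). The MAF values and the variable names are made up for illustration, and the matrix is built by hand here rather than taken from `AggrParamBeta`; the package's internal weighting code may differ in detail.

```r
# Toy minor allele frequencies for five variants in one unit (illustrative values)
maf <- c(0.001, 0.005, 0.01, 0.05, 0.2)

# Two beta parameter sets as a matrix with two rows, one column per set:
# column 1 is beta(1,1), column 2 is beta(1,25)
wbeta <- cbind(c(1, 1), c(1, 25))

# One weight per variant (rows) and per parameter set (columns)
w <- sapply(seq_len(ncol(wbeta)), function(k)
    dbeta(maf, wbeta[1, k], wbeta[2, k]))
colnames(w) <- c("beta_1_1", "beta_1_25")
print(w)
```

With beta(1,1) every variant receives the same weight, whereas beta(1,25) gives rarer variants larger weights than common ones.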

The original SAIGE R package uses 0.05 as the threshold on unadjusted p-values: SPA-adjusted p-values are calculated only for results below that threshold. If `var.ratio=NaN`, the average of the estimated variance ratios, `mean(modobj$var.ratio$ratio)`, is used instead. For more details on the SAIGE algorithm, please refer to the SAIGE paper [Zhou et al. 2018] (see the reference section).
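
The two rules described above can be sketched in a few lines of base R. This is only an illustration: `modobj` is a mocked list containing nothing but the `var.ratio$ratio` field with made-up values, the helper names `resolve_var_ratio` and `needs_spa` are hypothetical, and the strict `<` comparison at the threshold is an assumption rather than something the documentation specifies.

```r
# Hypothetical stand-in for a fitted model object; only the field named in
# the text (modobj$var.ratio$ratio) is mocked, with made-up values
modobj <- list(var.ratio = list(ratio = c(0.95, 1.00, 1.05)))

# Hypothetical helper: use the mean estimated ratio when var.ratio is NaN,
# otherwise use the user-defined value
resolve_var_ratio <- function(modobj, var.ratio = NaN) {
    if (is.nan(var.ratio)) mean(modobj$var.ratio$ratio) else var.ratio
}

# Hypothetical helper: TRUE where the unadjusted p-value is below the
# threshold, i.e. where an SPA adjustment would be attempted
# (strict inequality is an assumption)
needs_spa <- function(p.norm, spa.pval = 0.05) p.norm < spa.pval

resolve_var_ratio(modobj)              # mean of the mocked ratios
resolve_var_ratio(modobj, 0.8)         # user-defined ratio is used as given
needs_spa(c(0.2, 0.01))                # only the second value is below 0.05
```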

Returns a data.frame with the following components if not saving to a file:

`chr`	chromosome
`start`	a starting position
`end`	an ending position
`numvar`	the number of variants in a window
`summac`	the weighted sum of minor allele counts
`beta`	beta coefficient (odds ratio if binary outcomes)
`SE`	standard error for the beta coefficient
`pval`	adjusted p-value with the Saddlepoint approximation
`p.norm`	p-values based on asymptotic normality (could be 0 if too small, e.g., `pnorm(-50) = 0` in R); used for checking only
`cvg`	whether the SPA algorithm converges or not for the adjusted p-value
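
The underflow mentioned for `p.norm` is easy to reproduce in base R. The sketch below shows that the tail probability at z = -50 is stored as exactly 0 in double precision, while the same quantity stays finite on the log scale via `pnorm(..., log.p=TRUE)`. This illustrates the numerical issue only and is not a description of how the package computes `p.norm`.

```r
z <- -50

# Tail probability on the natural scale: underflows to exactly 0
p <- pnorm(z)
print(p == 0)

# Same tail probability on the natural-log scale: remains finite
logp <- pnorm(z, log.p = TRUE)
print(is.finite(logp))

# Convert to log10 for reporting very small p-values
log10p <- logp / log(10)
print(log10p)
```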

Xiuwen Zheng

Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018 Sep;50(9):1335-1341.

seqAssocGLMM_spaACAT_V, seqAssocGLMM_spaACAT_O

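# load the packages providing seqOpen() and the SAIGEgds functions
library(SeqArray)
library(SAIGEgds)

# open a GDS file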
fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds")
gdsfile <- seqOpen(fn)

# load phenotype
phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds")
pheno <- read.table(phenofn, header=TRUE, as.is=TRUE)
head(pheno)

# fit the null model
glmm <- seqFitNullGLMM_SPA(y ~ x1 + x2, pheno, gdsfile, trait.type="binary")

# get a list of variant units for burden tests
units <- seqUnitSlidingWindows(gdsfile, win.size=500, win.shift=250)

# run the burden tests on each unit and show the first results
assoc <- seqAssocGLMM_spaBurden(gdsfile, glmm, units)
head(assoc)

# close the GDS file
seqClose(gdsfile)