Description Usage Arguments Details Value Author(s) References See Also Examples
assocTestSingle
performs genotype association tests
using the null model fit with fitNullModel
.
1 2 3 4 5 6 7 8 9 10 11 12 | ## S4 method for signature 'SeqVarIterator'
assocTestSingle(gdsobj, null.model,
test=c("Score", "Score.SPA", "BinomiRare", "CMP"),
recalc.pval.thresh=0.05, GxE=NULL,
sparse=TRUE, imputed=FALSE,
male.diploid=TRUE, genome.build=c("hg19", "hg38"),
verbose=TRUE)
## S4 method for signature 'GenotypeIterator'
assocTestSingle(gdsobj, null.model,
test=c("Score", "Score.SPA", "BinomiRare", "CMP"),
recalc.pval.thresh=0.05, GxE=NULL,
male.diploid=TRUE, verbose=TRUE)
|
gdsobj |
An object of class |
null.model |
A null model object returned by |
test |
A character string specifying the type of test to be performed. The possibilities are |
recalc.pval.thresh |
If test is not "Score", recalculate p-values using the specified 'test' for variants with a Score p-value below this threshold; return the score p-value for all other variants. |
GxE |
A vector of character strings specifying the names of the variables for which a genotype interaction term should be included.If |
sparse |
Logical indicator of whether to read genotypes as sparse Matrix objects; the default is |
imputed |
Logical indicator of whether to read dosages from the "DS" field containing imputed dosages instead of counting the number of alternate alleles. |
male.diploid |
Logical for whether males on sex chromosomes are coded as diploid. |
genome.build |
A character sting indicating genome build; used to identify pseudoautosomal regions on the X and Y chromosomes. |
verbose |
Logical indicator of whether updates from the function should be printed to the console; the default is |
All samples included in null model
will be included in the association test, even if a different set of samples is present in the current filter for gdsobj
.
The effect size estimate is for each copy of the alternate allele (when gdsobj
is a SeqVarIterator
object) or the "A" allele (when gdsobj
is a GenotypeIterator
object). For multiallelic variants, each alternate (or "A") allele is tested separately.
Sporadic missing genotype values are mean imputed using the allele frequency calculated on all other samples at that variant.
Monomorphic variants (including variants where every sample is a heterozygote) are omitted from the results.
The input GxE
can be used to perform GxE tests. Multiple interaction variables may be specified, but all interaction variables specified must have been included as covariates in fitting the null model with fitNullModel
. When performing GxE analyses, assocTestSingle
will report two tests: (1) the joint Wald test of all genotype interaction terms in the model (this is the test for any genotype interaction effect), and (2) the joint Wald test of the genotype term along with all of the genotype interaction terms (this is the test for any genetic effect). Individual genotype interaction terms can be tested by creating test statistics from the reported effect size estimates and their standard errors (Note: when GxE
contains a single continuous or binary covariate, this test is the same as the test for any genotype interaction effect mentioned above).
The saddle point approximation (SPA), run by using test = "Score.SPA"
, implements the method described by Dey et al. (2017), which was extended to mixed models by Zhou et al. (2018) in their SAIGE software. SPA provides better calibration of p-values when fitting mixed models with a binomial family for a sample with an imbalanced case to control ratio.
The BinomiRare test, run by using test = "BinomiRare"
, and the CMP test, run by using test = "CMP"
are carrier-only, robust tests. Only variants where the alternate allele is minor will be tested. Both tests focuse on carriers of the rare variant allele ("carriers"), and use the estimated probabilities of the binary outcome within the carriers, estimated under the null of not association, and the actual number of observed outcomes, to compute p-values. BinomiRare uses the Poisson-Binomial distribution, and the CMP uses the Conway-Maxwell-Poisson distribution, and is specifically designed for mixed models. (If test = "CMP"
but null.model$family$mixedmodel = FALSE
, the BinomiRare test will be run instead.) These tests provide both a traditional p-value ("pval"
) and a mid-p-value ("midp"
), which is less conservative/more liberal, with the difference being more pronounced for small number of carriers. The BinomiRare test is described in Sofer (2017) and compared to the Score and SPA in various settings in Sofer and Guo (2020).
For the GenotypeIterator
method, objects created with GdsGenotypeReader
or MatrixGenotypeReader
are supported. NcdfGenotypeReader
objects are not supported.
A data.frame where each row refers to a different variant with the columns:
variant.id |
The variant ID |
chr |
The chromosome value |
pos |
The base pair position |
allele.index |
The index of the alternate allele. For biallelic variants, this will always be 1. |
n.obs |
The number of samples with non-missing genotypes |
freq |
The estimated alternate allele frequency |
MAC |
The minor allele count. For multiallelic variants, "minor" is determined by comparing the count of the alternate allele specified by |
If test
is "Score"
:
Score |
The value of the score function |
Score.SE |
The estimated standard error of the Score |
Score.Stat |
The score Z test statistic |
Score.pval |
The score p-value |
Est |
An approximation of the effect size estimate for each additional copy of the alternate allele |
Est.SE |
An approximation of the standard error of the effect size estimate |
PVE |
An approximation of the proportion of phenotype variance explained |
If test
is "Score.SPA"
:
SPA.pval |
The score p-value after applying the saddle point approximation (SPA) |
SPA.converged |
logical indiactor of whether the SPA converged; |
If GxE
is not NULL
:
Est.G |
The effect size estimate for the genotype term |
Est.G:env |
The effect size estimate for the genotype*env interaction term. There will be as many of these terms as there are interaction variables, and "env" will be replaced with the variable name. |
SE.G |
The estimated standard error of the genotype term effect size estimate |
SE.G:env |
The estimated standard error of the genotype*env effect size estimate. There will be as many of these terms as there are interaction variables, and "env" will be replaced with the variable name. |
GxE.Stat |
The Wald Z test statistic for the test of all genotype interaction terms. When there is only one genotype interaction term, this is the test statistic for that term. |
GxE.pval |
The Wald p-value for the test of all genotype interaction terms; i.e. the test of any genotype interaction effect |
Joint.Stat |
The Wald Z test statistic for the joint test of the genotype term and all of the genotype interaction terms |
Joint.pval |
The Wald p-value for the joint test of the genotype term and all of the genotype interaction terms; i.e. the test of any genotype effect |
If test
is "BinomiRare" or "CMP"
:
n.carrier |
Number of individuals with at least one copy of the alternate allele |
n.D.carrier |
Number of cases with at least one copy of the alternate allele |
pval |
p-value |
mid.pval |
mid-p-value |
Matthew P. Conomos, Stephanie M. Gogarten, Tamar Sofer, Ken Rice, Chaoyu Yu
Dey, R., Schmidt, E. M., Abecasis, G. R., & Lee, S. (2017). A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. The American Journal of Human Genetics, 101(1), 37-49. Sofer, T. (2017). BinomiRare: A robust test of the association of a rare variant with a disease for pooled analysis and meta-analysis, with application to the HCHS/SOL. Genetic Epidemiology, 41(5), 388-395. Sofer, T. & Guo, N. (2020). Rare variants association testing for a binary outcome when pooling individual level data from heterogeneous studies. https://www.biorxiv.org/content/10.1101/2020.04.17.047530v1. Zhou, W., Nielsen, J. B., Fritsche, L. G., Dey, R., Gabrielsen, M. E., Wolford, B. N., ... & Bastarache, L. A. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature genetics, 50(9), 1335.
fitNullModel
for fitting the null mixed model needed as input to assocTestSingle
.
SeqVarIterator
for creating the input object with genotypes.
effectAllele
for returning the effect allele for each variant.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | library(SeqVarTools)
library(Biobase)
# open a sequencing GDS file
gdsfile <- seqExampleFileName("gds")
gds <- seqOpen(gdsfile)
# simulate some phenotype data
set.seed(4)
data(pedigree)
pedigree <- pedigree[match(seqGetData(gds, "sample.id"), pedigree$sample.id),]
pedigree$outcome <- rnorm(nrow(pedigree))
# construct a SeqVarIterator object
seqData <- SeqVarData(gds, sampleData=AnnotatedDataFrame(pedigree))
iterator <- SeqVarBlockIterator(seqData)
# fit the null model
nullmod <- fitNullModel(iterator, outcome="outcome", covars="sex")
# run the association test
assoc <- assocTestSingle(iterator, nullmod)
seqClose(iterator)
library(GWASTools)
# open a SNP-based GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
gds <- GdsGenotypeReader(filename = gdsfile)
# simulate some phenotype data
set.seed(4)
pheno <- data.frame(scanID=getScanID(gds),
outcome=rnorm(nscan(gds)))
# construct a GenotypeIterator object
genoData <- GenotypeData(gds, scanAnnot=ScanAnnotationDataFrame(pheno))
iterator <- GenotypeBlockIterator(genoData)
# fit the null model
nullmod <- fitNullModel(iterator, outcome="outcome")
# run the association test
assoc <- assocTestSingle(iterator, nullmod)
close(iterator)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.