or relatedness

assocTestSingle

R Documentation

Genotype Association Testing with Mixed Models

Description

assocTestSingle performs genotype association tests using the null model fit with fitNullModel.

Usage

## S4 method for signature 'SeqVarIterator'
assocTestSingle(gdsobj, null.model,
                test=c("Score", "Score.SPA", "BinomiRare", "CMP"),
                recalc.pval.thresh=0.05, fast.score.SE=FALSE,
                GxE=NULL,
                geno.coding=c("additive", "dominant", "recessive"),
                sparse=TRUE, imputed=FALSE,
                male.diploid=TRUE, genome.build=c("hg19", "hg38"),
                BPPARAM=bpparam(), verbose=TRUE)
## S4 method for signature 'GenotypeIterator'
assocTestSingle(gdsobj, null.model,
                test=c("Score", "Score.SPA", "BinomiRare", "CMP"),
                recalc.pval.thresh=0.05, GxE=NULL,
                geno.coding=c("additive", "dominant", "recessive"),
                male.diploid=TRUE, BPPARAM=bpparam(), verbose=TRUE)

Arguments

`gdsobj`	An object of class `SeqVarIterator` from the package SeqVarTools, or an object of class `GenotypeIterator` from the package GWASTools, containing the genotype data for the variants and samples to be used for the analysis.
`null.model`	A null model object returned by `fitNullModel`.
`test`	A character string specifying the type of test to be performed. The possibilities are `"Score"` (default), `"Score.SPA"`, `"BinomiRare"`, or `"CMP"`; `"Score.SPA"`, `"BinomiRare"`, and `"CMP"` can only be used when the family of the null model fit with `fitNullModel` is binomial.
`recalc.pval.thresh`	If test is not "Score", recalculate p-values using the specified 'test' for variants with a Score p-value below this threshold; return the score p-value for all other variants.
`fast.score.SE`	Logical indicator of whether to use the fast approximation of the score standard error for testing variant association. When `FALSE` (default), the true score SE is calculated. When `TRUE`, the fast score SE approximation from SAIGE is used. This option can only be used with a null model fit with `fitNullModelFastScore` or updated with `nullModelFastScore`. See 'Details' for further information.
`GxE`	A vector of character strings specifying the names of the variables for which a genotype interaction term should be included.If `GxE` is not `NULL`, `test` is ignored and Wald tests of interaction are performed. If `GxE` is `NULL` (default) no genotype interactions are included. See 'Details' for further information.
`geno.coding`	Whether genotypes should be coded as "additive" (0, 1, or 2 copies of the effect allele), "recessive" (1=homozygous for the effect allele, 0 otherwise), or "dominant" (1=heterozygous or homozygous for the effect allele, 0 for no effect allele). For recessive coding on sex chromosomes, males are coded as 1 if they are hemizygous for the effect allele.
`sparse`	Logical indicator of whether to read genotypes as sparse Matrix objects; the default is `TRUE`. Set this to `FALSE` if the alternate allele dosage of the genotypes in the test are not expected to be mostly 0.
`imputed`	Logical indicator of whether to read dosages from the "DS" field containing imputed dosages instead of counting the number of alternate alleles.
`male.diploid`	Logical for whether males on sex chromosomes are coded as diploid. Default is 'male.diploid=TRUE', meaning sex chromosome genotypes for males have values 0/2. If the input `gdsobj` codes males as 0/1 on sex chromosomes, set 'male.diploid=FALSE'.
`genome.build`	A character sting indicating genome build; used to identify pseudoautosomal regions on the X and Y chromosomes. These regions are not treated as sex chromosomes when calculating allele frequencies.
`BPPARAM`	A `BiocParallelParam` object to process blocks of variants in parallel. If not provided, the default back-end returned by `bpparam` will be used.
`verbose`	Logical indicator of whether updates from the function should be printed to the console; the default is `TRUE`.

Details

assocTestSingle uses the BiocParallel package to process iterator chunks in parallel. See the BiocParallel documentation for more information on the default behaviour of bpparam and how to register different parallel backends. If serial execution is desired, set BPPARAM=BiocParallel::SerialParam(). Note that parallel execution requires more RAM than serial execution.

All samples included in null model will be included in the association test, even if a different set of samples is present in the current filter for gdsobj.

The effect size estimate is for each copy of the alternate allele (when gdsobj is a SeqVarIterator object) or the "A" allele (when gdsobj is a GenotypeIterator object). We refer to this as the "effect allele" in the rest of the documentation. For multiallelic variants in SeqVarIterator objects, each alternate (or "A") allele is tested separately.

Sporadic missing genotype values are mean imputed using the allele frequency calculated on all other samples at that variant.

Monomorphic variants (including variants where every sample is a heterozygote) are omitted from the results.

The input GxE can be used to perform GxE tests. Multiple interaction variables may be specified, but all interaction variables specified must have been included as covariates in fitting the null model with fitNullModel. When performing GxE analyses, assocTestSingle will report two tests: (1) the joint Wald test of all genotype interaction terms in the model (this is the test for any genotype interaction effect), and (2) the joint Wald test of the genotype term along with all of the genotype interaction terms (this is the test for any genetic effect). Individual genotype interaction terms can be tested by creating test statistics from the reported effect size estimates and their standard errors (Note: when GxE contains a single continuous or binary covariate, this test is the same as the test for any genotype interaction effect mentioned above).

The saddle point approximation (SPA), run by using test = "Score.SPA", implements the method described by Dey et al. (2017), which was extended to mixed models by Zhou et al. (2018) in their SAIGE software. SPA provides better calibration of p-values when fitting mixed models with a binomial family for a sample with an imbalanced case to control ratio.

The fast approximation to the score standard error for testing variant association used by Zhou et al. (2018) in their SAIGE software is available by setting the fast.score.SE parameter to TRUE. This approximation may be much faster than computing the true score SE in large samples, as it replaces the full covariance matrix in the calculation with the product of a diagonal matrix and a scalar correction factor. This scalar correction factor must be computed beforehand and stored in the input null.model as se.correction, either by fitting the null.model with fitNullModelFastScore, or by updating a null.model previously fit with fitNullModel using the calcScore and nullModelFastScore functions. This approach assumes a constant scalar SE correction factor across all variants. This method is only available when gdsobj is a SeqVarIterator object.

The BinomiRare test, run by using test = "BinomiRare", and the CMP test, run by using test = "CMP" are carrier-only, robust tests. Only variants where the effect allele is minor will be tested. Both tests focuse on carriers of the rare variant allele ("carriers"), and use the estimated probabilities of the binary outcome within the carriers, estimated under the null of not association, and the actual number of observed outcomes, to compute p-values. BinomiRare uses the Poisson-Binomial distribution, and the CMP uses the Conway-Maxwell-Poisson distribution, and is specifically designed for mixed models. (If test = "CMP" but null.model$family$mixedmodel = FALSE, the BinomiRare test will be run instead.) These tests provide both a traditional p-value ("pval") and a mid-p-value ("midp"), which is less conservative/more liberal, with the difference being more pronounced for small number of carriers. The BinomiRare test is described in Sofer (2017) and compared to the Score and SPA in various settings in Sofer and Guo (2020).

For the GenotypeIterator method, objects created with GdsGenotypeReader or MatrixGenotypeReader are supported. NcdfGenotypeReader objects are not supported.

p-values that are calculated using pchisq and are smaller than .Machine\$double.xmin are set to .Machine\$double.xmin.

Value

A data.frame where each row refers to a different variant with the columns:

`variant.id`	The variant ID
`chr`	The chromosome value
`pos`	The base pair position
`allele.index`	The index of the alternate allele. For biallelic variants, this will always be 1.
`n.obs`	The number of samples with non-missing genotypes
`freq`	The estimated effect allele frequency
`MAC`	The minor allele count. For multiallelic variants, "minor" is determined by comparing the count of the allele specified by `allele.index` with the sum of all other alleles.

If geno.coding is "recessive":

n.hom.eff

The number of samples homozygous for the effect allele.

If geno.coding is "dominant":

n.any.eff

The number of samples with any copies of the effect allele.

If test is "Score":

`Score`	The value of the score function
`Score.SE`	The estimated standard error of the Score
`Score.Stat`	The score Z test statistic
`Score.pval`	The score p-value
`Est`	An approximation of the effect size estimate for each additional copy of the effect allele
`Est.SE`	An approximation of the standard error of the effect size estimate
`PVE`	An approximation of the proportion of phenotype variance explained

If test is "Score.SPA":

`SPA.pval`	The score p-value after applying the saddle point approximation (SPA)
`SPA.converged`	logical indiactor of whether the SPA converged; `NA` indicates that the SPA was not applied and the original Score.pval was returned

If GxE is not NULL:

`Est.G`	The effect size estimate for the genotype term
`Est.G:env`	The effect size estimate for the genotype*env interaction term. There will be as many of these terms as there are interaction variables, and "env" will be replaced with the variable name.
`SE.G`	The estimated standard error of the genotype term effect size estimate
`SE.G:env`	The estimated standard error of the genotype*env effect size estimate. There will be as many of these terms as there are interaction variables, and "env" will be replaced with the variable name.
`GxE.Stat`	The Wald Z test statistic for the test of all genotype interaction terms. When there is only one genotype interaction term, this is the test statistic for that term.
`GxE.pval`	The Wald p-value for the test of all genotype interaction terms; i.e. the test of any genotype interaction effect
`Joint.Stat`	The Wald Z test statistic for the joint test of the genotype term and all of the genotype interaction terms
`Joint.pval`	The Wald p-value for the joint test of the genotype term and all of the genotype interaction terms; i.e. the test of any genotype effect

If test is "BinomiRare" or "CMP":

`n.carrier`	Number of individuals with at least one copy of the effect allele
`n.D.carrier`	Number of cases with at least one copy of the effect allele
`pval`	p-value
`mid.pval`	mid-p-value

Author(s)

Matthew P. Conomos, Stephanie M. Gogarten, Tamar Sofer, Ken Rice, Chaoyu Yu

References

Dey, R., Schmidt, E. M., Abecasis, G. R., & Lee, S. (2017). A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. The American Journal of Human Genetics, 101(1), 37-49.

Sofer, T. (2017). BinomiRare: A robust test of the association of a rare variant with a disease for pooled analysis and meta-analysis, with application to the HCHS/SOL. Genetic Epidemiology, 41(5), 388-395.

Sofer, T. & Guo, N. (2020). Rare variants association testing for a binary outcome when pooling individual level data from heterogeneous studies. https://www.biorxiv.org/content/10.1101/2020.04.17.047530v1.

Zhou, W., Nielsen, J. B., Fritsche, L. G., Dey, R., Gabrielsen, M. E., Wolford, B. N., ... & Bastarache, L. A. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature genetics, 50(9), 1335.

Examples

library(SeqVarTools)
library(Biobase)

# open a sequencing GDS file
gdsfile <- seqExampleFileName("gds")
gds <- seqOpen(gdsfile)

# simulate some phenotype data
set.seed(4)
data(pedigree)
pedigree <- pedigree[match(seqGetData(gds, "sample.id"), pedigree$sample.id),]
pedigree$outcome <- rnorm(nrow(pedigree))

# construct a SeqVarIterator object
seqData <- SeqVarData(gds, sampleData=AnnotatedDataFrame(pedigree))
iterator <- SeqVarBlockIterator(seqData)

# fit the null model
nullmod <- fitNullModel(iterator, outcome="outcome", covars="sex")

# run the association test
assoc <- assocTestSingle(iterator, nullmod,
                         BPPARAM=BiocParallel::SerialParam())

# use fast score SE for a null model with a covariance matrix
seqResetFilter(seqData)
grm <- SNPRelate::snpgdsGRM(seqData, verbose=FALSE)
covmat <- grm$grm; dimnames(covmat) <- list(grm$sample.id, grm$sample.id)
set.seed(5)
nullmod <- fitNullModelFastScore(iterator, outcome="outcome", covars="sex", cov.mat=covmat)
assoc.se <- assocTestSingle(iterator, nullmod, fast.score.SE=TRUE,
                            BPPARAM=BiocParallel::SerialParam())

seqClose(iterator)


library(GWASTools)

# open a SNP-based GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
gds <- GdsGenotypeReader(filename = gdsfile)

# simulate some phenotype data
set.seed(4)
pheno <- data.frame(scanID=getScanID(gds),
                    outcome=rnorm(nscan(gds)))

# construct a GenotypeIterator object
genoData <- GenotypeData(gds, scanAnnot=ScanAnnotationDataFrame(pheno))
iterator <- GenotypeBlockIterator(genoData)

# fit the null model
nullmod <- fitNullModel(iterator, outcome="outcome")

# run the association test
assoc <- assocTestSingle(iterator, nullmod,
                         BPPARAM=BiocParallel::SerialParam())

close(iterator)

smgogarten/GENESIS documentation built on Feb. 4, 2025, 4:28 p.m.