Aggregate Association Testing with Sequencing Data in Sliding Windows

Description

assocTestSeqWindow performs aggregate association tests with sequencing data in sliding windows using the null model fit with fitNullMM or fitNullReg.

Usage

1
2
3
4
assocTestSeqWindow(seqData, nullModObj, variant.include = NULL, chromosome = NULL,
                    window.size = 50, window.shift = 20, AF.sample = NULL, AF.range = c(0,1), 
                    weight.beta = c(0.5, 0.5), weight.user = NULL, test = "Burden", 
                    burden.test = "Score", rho = 0, pval.method = "kuonen", verbose = TRUE)

Arguments

seqData

An object of class SeqVarData from the package SeqVarTools containing the sequencing genotype data for variants and samples to be used for the analysis.

nullModObj

A null model object returned by fitNullMM when using mixed models or fitNullReg when using linear or logistic regression.

variant.include

A vector of variant.id values indicating which variants to include in the sliding window analysis. If NULL, see chromosome for further details.

chromosome

A vector specifying which chromosomes to analyze. This parameter is only considred when variant.include is NULL; if chromosome is also NULL, then all variants in seqData are included.

window.size

The size, in kb, of each window to be analyzed. Windows are defined based on physical position.

window.shift

The distance, in kb, that the window is shifted to create each new window.

AF.sample

A vector of sample.id values specifying which samples should be used for allele frequency calculation. When NULL (the default), all samples included in the test are used. Allele frequency calculation will affect variant inclusion based on AF.range and variant weighting based on weight.beta.

AF.range

A numeric vector of length two specifying the lower and upper bounds on the alternate allele frequency for variants to be included in the analysis. Variants with alternate allele frequencies outside of this range are given a weight of 0 (i.e. excluded).

weight.beta

A numeric vector of length two specifying the two parameters of the Beta distribution used to determine variant weights; weights are given by dbeta(AF, a, b), where AF is the alternate allele frequency, and a and b are the two parameters specified here. weight.beta = c(1,25) gives the Wu weights; weight.beta = c(0.5, 0.5) is proportional to the Madsen-Browning weights; and weight.beta = c(1,1) gives a weight of 1 to all variants. This input is ignored when weight.user is not NULL.

weight.user

A character string specifying the name of a variable in the variantData slot of the seqData object to be used as variant weights. When left NULL (the default), the weights specified by weight.beta will be used.

test

A character string specifying the type of test to be performed. The possibilities are "Burden" (default) or "SKAT". When this is set to "SKAT" and the parameter rho has multiple values, a SKAT-O test is performed.

burden.test

A character string specifying the type of Burden test to perform when test = "Burden". The possibilities are "Score", "Wald", and "Firth". "Score" can be used for any nullModObj. "Wald" can not be used when the nullModObj is from a mixed model with a binary outcome variable. "Firth" can only be used when the nullModObj is from a logistic regression with a binary outcome variable.

rho

A numeric value (or vector of numeric values) in [0,1] specifying the rho parameter for SKAT. When rho = 0, a standard SKAT test is performed. When rho = 1, a score burden test is performed. When rho is a vector of values, SKAT-O is performed using each of those values as the search space for the optimal rho.

pval.method

A character string specifying which method to use to calculate SKAT p-values. "kuonen" (the default) uses a saddlepoint method; "davies" uses numerical integration; and "liu" uses a moment matching approximation.

verbose

Logical indicator of whether updates from the function should be printed to the console; the default is TRUE.

Value

A list with the following items:

param

A list with model parameters including:

AF.range

The lower and upper bounds on the alternate allele frequency for variants that were included in the analysis.

weight.beta

The two parameters of the Beta distribution used to determine variant weights if used, NULL otherwise.

weight.user

A character string specifying the name of the variable in the variantData slot of the seqData object used as variant weights if used, NULL otherwise.

family

Either "gaussian" for a continous outcome or "binomial" for a binary outcome.

mixedmodel

Logical indicating whether or not a mixed model was used to fit the null model.

test

Specifies whether Burden, SKAT, or SKAT-O tests were performed.

burden.test

If test = "Burden", specifies if Score, Wald, or Firth tests were performed.

rho

The values of rho used in the SKAT or SKAT-O test.

pval.method

The p-value calculation method used in SKAT or SKAT-O tests.

window

A list with the following values:

size

The size of the windows in kb

shift

The distance each window was shifted, in kb, to create the next window.

nsample

A list with the following values:

analysis

The number of samples included in the analysis.

AF

The number of samples used to calculate allele frequencies.

results

A data.frame containing the results from the main analysis. Each row is a separate aggregate test:

chr

The chromosome that the window is on.

window.start

The base position of the start of the window.

window.stop

The base position of the end of the window.

n.site

The number of variant sites included in the test.

dup

Takes the value 1 if the variants in this window are identical to the variants in the previous window; takes the value 0 otherwise.

If test is "Burden":

burden.skew

The skewness of the burden value for all samples.

If burden.test is "Score":

Score

The value of the score function

Var

The variance of the score function

Score.stat

The score chi-squared test statistic

Score.pval

The score p-value

If burden.test is "Wald":

Est

The effect size estimate for a one unit increase in the burden value

SE

The estimated standard error of the effect size estimate

Wald.stat

The Wald chi-squared test statistic

Wald.pval

The Wald p-value

If burden.test is "Firth":

Est

The effect size estimate for a one unit increase in the burden value

SE

The estimated standard error of the effect size estimate

Firth.stat

The Firth test statistic

Firth.pval

The Firth p-value

If test is "SKAT":

Q_rho

The SKAT test statistic for the value of rho specified. There will be as many of these variables as there are rho values chosen.

pval_rho

The SKAT p-value for the value of rho specified. There will be as many of these variables as there are rho values chosen.

err_rho

Takes value 1 if there was an error in calculating the p-value for the value of rho specified when using the "kunonen" or "davies" methods; 0 otherwise. When there is an error, the p-value returned is from the "liu" method. There will be as many of these variables as there are rho values chosen.

When length(rho) > 1 and SKAT-O is performed:

min.pval

The minimum p-value among the p-values calculated for each choice of rho.

opt.rho

The optimal rho value; i.e. the rho value that gave the minimum p-value.

pval_SKATO

The SKAT-O p-value after adjustment for searching across multiple rho values.

variantInfo

A data.frame providing information on all of the variants used in the aggregate tests across all of the windows. The data.frame contains the following information:

variantID

The variant.id value from seqData.

allele

The index of the allele in seqData.

chr

The chromosome the variant is located on.

pos

The position of the variant on the chromosome.

n.obs

The number of samples with observed genotype values at the variant.

freq

The allele frequency calculated using the samples specified by AF.sample (or all samples if AF.sample is NULL) of the alternate allele given by the allele index at the variant.

weight

The weight assigned to the variant in the analysis. A weight of 0 means the variant was excluded.

Author(s)

Matthew P. Conomos

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
library(SeqVarTools)
library(Biobase)

# open a sequencing GDS file
gdsfile <- seqExampleFileName("gds")
gds <- seqOpen(gdsfile)

# simulate some phenotype data
data(pedigree)
pedigree <- pedigree[match(seqGetData(gds, "sample.id"), pedigree$sample.id),]
pedigree$outcome <- rnorm(nrow(pedigree))

# construct a SeqVarData object
seqData <- SeqVarData(gds, sampleData=AnnotatedDataFrame(pedigree))

# fit the null model
nullmod <- fitNullReg(sampleData(seqData), outcome="outcome", covars="sex")

# burden test
assoc <- assocTestSeqWindow(seqData, nullmod, chromosome=22, test="Burden")
head(assoc$results)
head(assoc$variantInfo)

# SKAT test
assoc <- assocTestSeqWindow(seqData, nullmod, chromosome=22, test="SKAT")
head(assoc$results)

# SKAT-O test
assoc <- assocTestSeqWindow(seqData, nullmod, chromosome=22, test="SKAT", rho=seq(0, 1, 0.25))
head(assoc$results)

# user-specified weights
variant.id <- seqGetData(gds, "variant.id")
weights <- data.frame(variant.id, weight=runif(length(variant.id)))
variantData(seqData) <- AnnotatedDataFrame(weights)
assoc <- assocTestSeqWindow(seqData, nullmod, chromosome=22, test="Burden", weight.user="weight")
head(assoc$results)
head(assoc$variantInfo)

seqClose(seqData)