Aggregate Association Testing with Sequencing Data

Description

assocTestSeq performs aggregate association tests with sequencing data using the null model fit with fitNullMM or fitNullReg.

Usage

1
2
3
4
assocTestSeq(seqData, nullModObj, aggVarList, AF.sample = NULL,
            AF.range = c(0,1), weight.beta = c(0.5, 0.5), weight.user = NULL,
            test = "Burden", burden.test = "Score", rho = 0, 
            pval.method = "kuonen", verbose = TRUE)

Arguments

seqData

An object of class SeqVarData from the package SeqVarTools containing the sequencing genotype data for variants and samples to be used for the analysis.

nullModObj

A null model object returned by fitNullMM when using mixed models or fitNullReg when using linear or logistic regression.

aggVarList

A list specifying the variant aggregation units to be tested; each element of the list represents one aggregate test. Each element of the list should be a data.frame that contains at least two columns: variant.id matching the variant.id in seqData for the variants that should be aggregated, and allele.index specifying which allele in seqData is to be tested at that variant.id. Multiple alleles can be included at the same variant location by including multiple rows with the same variant.id and different allele.index values. allele.index=0 indicates the reference allele, allele.index=1 is the first alternate allele, allele.index=2 is the second alternate allele (for multiallelic variants), and so on up to the number of possible alleles per variant.

AF.sample

A vector of sample.id values specifying which samples should be used for allele frequency calculation. When NULL (the default), all samples included in the test are used. Allele frequency calculation will affect variant inclusion based on AF.range and variant weighting based on weight.beta.

AF.range

A numeric vector of length two specifying the lower and upper bounds on the alternate allele frequency for variants to be included in the analysis. Variants with alternate allele frequencies outside of this range are given a weight of 0 (i.e. excluded).

weight.beta

A numeric vector of length two specifying the two parameters of the Beta distribution used to determine variant weights; weights are given by dbeta(AF, a, b), where AF is the alternate allele frequency, and a and b are the two parameters specified here. weight.beta = c(1,25) gives the Wu weights; weight.beta = c(0.5, 0.5) is proportional to the Madsen-Browning weights; and weight.beta = c(1,1) gives a weight of 1 to all variants. This input is ignored when weight.user is not NULL.

weight.user

A character string specifying the name of a variable in the variantData slot of the seqData object to be used as variant weights. When left NULL (the default), the weights specified by weight.beta will be used.

test

A character string specifying the type of test to be performed. The possibilities are "Burden" (default) or "SKAT". When this is set to "SKAT" and the parameter rho has multiple values, a SKAT-O test is performed.

burden.test

A character string specifying the type of Burden test to perform when test = "Burden". The possibilities are "Score", "Wald", and "Firth". "Score" can be used for any nullModObj. "Wald" can not be used when the nullModObj is from a mixed model with a binary outcome variable. "Firth" can only be used when the nullModObj is from a logistic regression with a binary outcome variable.

rho

A numeric value (or vector of numeric values) in [0,1] specifying the rho parameter for SKAT. When rho = 0, a standard SKAT test is performed. When rho = 1, a score burden test is performed. When rho is a vector of values, SKAT-O is performed using each of those values as the search space for the optimal rho.

pval.method

A character string specifying which method to use to calculate SKAT p-values. "kuonen" (the default) uses a saddlepoint method; "davies" uses numerical integration; and "liu" uses a moment matching approximation.

verbose

Logical indicator of whether updates from the function should be printed to the console; the default is TRUE.

Value

A list with the following items:

param

A list with model parameters including:

AF.range

The lower and upper bounds on the alternate allele frequency for variants that were included in the analysis.

weight.beta

The two parameters of the Beta distribution used to determine variant weights if used, NULL otherwise.

weight.user

A character string specifying the name of the variable in the variantData slot of the seqData object used as variant weights if used, NULL otherwise.

family

Either "gaussian" for a continous outcome or "binomial" for a binary outcome.

mixedmodel

Logical indicating whether or not a mixed model was used to fit the null model.

test

Specifies whether Burden, SKAT, or SKAT-O tests were performed.

burden.test

If test = "Burden", specifies if Score, Wald, or Firth tests were performed.

rho

The values of rho used in the SKAT or SKAT-O test.

pval.method

The p-value calculation method used in SKAT or SKAT-O tests.

nsample

A list with the following values:

analysis

The number of samples included in the analysis.

AF

The number of samples used to calculate allele frequencies.

results

A data.frame containing the results from the main analysis. Each row is a separate aggregate test:

n.site

The number of variant sites included in the test.

n.sample.alt

The number of samples with an observed alternate allele at any variant in the aggregate set.

If test is "Burden":

burden.skew

The skewness of the burden value for all samples.

If burden.test is "Score":

Score

The value of the score function

Var

The variance of the score function

Score.stat

The score chi-squared test statistic

Score.pval

The score p-value

If burden.test is "Wald":

Est

The effect size estimate for a one unit increase in the burden value

SE

The estimated standard error of the effect size estimate

Wald.stat

The Wald chi-squared test statistic

Wald.pval

The Wald p-value

If burden.test is "Firth":

Est

The effect size estimate for a one unit increase in the burden value

SE

The estimated standard error of the effect size estimate

Firth.stat

The Firth test statistic

Firth.pval

The Firth p-value

If test is "SKAT":

Q_rho

The SKAT test statistic for the value of rho specified. There will be as many of these variables as there are rho values chosen.

pval_rho

The SKAT p-value for the value of rho specified. There will be as many of these variables as there are rho values chosen.

err_rho

Takes value 1 if there was an error in calculating the p-value for the value of rho specified when using the "kunonen" or "davies" methods; 0 otherwise. When there is an error, the p-value returned is from the "liu" method. There will be as many of these variables as there are rho values chosen.

When length(rho) > 1 and SKAT-O is performed:

min.pval

The minimum p-value among the p-values calculated for each choice of rho.

opt.rho

The optimal rho value; i.e. the rho value that gave the minimum p-value.

pval_SKATO

The SKAT-O p-value after adjustment for searching across multiple rho values.

variantInfo

A list with as many elements as aggregate tests performed. Each element of the list is a data.frame providing information on the variants used in the aggregate test with results presented in the corresponding row of results. Each of these data.frames has the following information:

variantID

The variant.id value from seqData.

allele

The index of the allele in seqData.

chr

The chromosome the variant is located on.

pos

The position of the variant on the chromosome.

n.obs

The number of samples with observed genotype values at the variant.

freq

The allele frequency calculated using the samples specified by AF.sample (or all samples if AF.sample is NULL) of the alternate allele given by the allele index at the variant.

weight

The weight assigned to the variant in the analysis. A weight of 0 means the variant was excluded.

Author(s)

Matthew P. Conomos