assocTest  R Documentation 
Method for performing a kernelbased association test given a genotype, VCF file, or kernel matrix
## S4 method for signature 'GenotypeMatrix,NullModel' assocTest(Z, model, ranges, kernel=c("linear.podkat", "localsim.podkat", "quadratic.podkat", "linear.SKAT", "localsim.SKAT", "quadratic.SKAT"), width=1000, weights=NULL, weightFunc=betaWeights(), method=NULL, adj=c("automatic", "none", "force"), pValueLimit=0.05) ## S4 method for signature 'matrix,NullModel' assocTest(Z, model, method=NULL, adj=c("automatic", "none", "force"), pValueLimit=0.05) ## S4 method for signature 'TabixFile,NullModel' assocTest(Z, model, ranges, kernel=c("linear.podkat", "localsim.podkat", "quadratic.podkat", "linear.SKAT", "localsim.SKAT", "quadratic.SKAT"), cl=NULL, nnodes=1, batchSize=20, noIndels=TRUE, onlyPass=TRUE, na.limit=1, MAF.limit=1, na.action=c("impute.major", "omit"), MAF.action=c("invert", "omit", "ignore"), sex=NULL, weightFunc=betaWeights(), width=1000, method=NULL, adj=c("automatic", "none", "force"), pValueLimit=(0.1 / length(ranges)), tmpdir=tempdir(), displayProgress=TRUE) ## S4 method for signature 'character,NullModel' assocTest(Z, model, ...)
Z 
an object of class 
model 
an object of class 
ranges 
an object with genomic regions to be tested; may be
an object of class 
kernel 
determines the kernel that should be used for association testing (see Subsection 9.2 of the package vignette for details) 
width 
tolerance radius parameter for positiondependent kernels “linear.podkat”, “quadratic.podkat”, and “localsim.podkat”; must be single positive numeric value; ignored for kernels “linear.SKAT”, “quadratic.SKAT”, and “localsim.SKAT” (see Subsection 9.2 of the package vignette for details) 
weights 
for the method with signature

weightFunc 
function for computing variant weights from minor allele
frequencies (MAFs); see 
method 
identifies the method for computing the pvalues. If the
null model is of type “logistic” and small sample correction
is applied (see argument 
adj 
whether or not to use small sample correction for
logistic models (binary trait with covariates). The choice
“none” turns off small sample correction. If “force”
is chosen, small sample correction is turned on unconditionally.
If “automatic” is chosen (default), small sample correction
is turned on if the number of samples does not exceed 2,000. This
argument is ignored for any type of model except “logistic”
and small sample correction is switched off. For details how to
train a null model for small sample correction, see

pValueLimit 
if the null model is of type “bernoulli”,

cl 
if 
nnodes 
if 
batchSize 
parameter which determines how many
regions of 
noIndels 
if 
onlyPass 
if 
na.limit 
all variants with a missing value ratio above this
threshold in the VCF file 
MAF.limit 
all variants with a minor allele frequency (MAF) above this
threshold in the VCF file 
na.action 
if “impute.major”, all missing values will
be imputed by major alleles before association testing. If
“omit”, all columns containing missing values in the VCF file

MAF.action 
if “invert”, all columns with an MAF exceeding 0.5 will be inverted in the sense that all minor alleles will be replaced by major alleles and vice versa. If “omit”, all variants in the VCF file with an MAF greater than 0.5 are ignored. If “ignore”, no action is taken and MAFs greater than 0.5 are kept as they are. 
sex 
if 
tmpdir 
if computations are parallelized over multiple client
processes (see arguments 
displayProgress 
if 
... 
all other parameters are passed on to the 
The assocTest
method is the main function of the podkat
package. For a given genotype and a null model, it performs the actual
association test(s).
For null models of types “linear” and
“logistic” (see NullModel
and
nullModel
), a variance component score test is used
(see Subsection 9.1 of the package vignette for details).
The test relies on the choice of a particular kernel to measure the
pairwise similarities of genotypes. The choice of the kernel can be
made with the kernel
argument (see computeKernel
and Subsection 9.2 of the package vignette for more details).
For null models of type “linear”, the test statistic follows
a mixture of chisquares distribution. For models of typ
“logistic”, the test statistic approximately follows a mixture
of chisquares distribution. The computation of pvalues for a given
mixture of chisquares can be done according to Davies (1980) (which is
the default), according to Liu et al. (2009), or using a
modified method similar to the one suggested by
Liu et al. (2009) as implemented in the SKAT package,
too. Which method is used can be controlled using the method
argument. If method according to Davies (1980) fails,
assocTest
resorts to the method by Liu et al. (2009).
See also Subsection 9.1 of the package vignette for more details.
For null models of type “logistic”, the assocTest
method also offers the small sample correction suggested by
Lee et al. (2012). Whether small sample correction is applied,
is controlled by the adj
argument. The additional adjustment
of higher moments as suggested by Lee et al. (2012) is
performed whenever resampled null model residuals are available in
the null model model
(slot res.resampled.adj
, see
NullModel
). In this case, the method
argument controls how the excess kurtosis of test statistics sampled
from the null distribution are computed. The
default setting “unbiased” computes unbiased estimates by
using the exact expected value determined from the mixture components.
The settings “population” and
“sample” use almost unbiased and biased sample statistics,
respectively. The choice “SKAT” uses the same method as
implemented in the SKAT package. See Subsection 9.5 of the
package vignette for more details.
If the null model is of type “bernoulli”, the test statistic follows a mixture of Bernoulli distributions. In this case, an exact pvalue is determined that is computed as the probability to observe a test statistic for random Bernoullidistributed traits (under the null hypothesis) that is at least as large as the observed test statistic. For reasons of computational complexity, this option is limited to sample numbers not larger than 100. See Subsection 9.1 of the package vignette for more details.
The podkat package offers multiple interfaces for association
testing all of which require the second argument model
to be a
NullModel
object.
The simplest method is to call assocTest
for an object
of class GenotypeMatrix
as first argument
Z
. If the ranges
argument is not supplied, a single
association test is performed using the entire genotype matrix
contained in Z
and an object of class
AssocTestResult
is returned. In this case,
all variants need to reside on the same chromosome (compare with
computeKernel
). If the ranges
argument is
specified, each region in ranges
is tested separately and the
result is returned as an AssocTestResultRanges
object.
As said, the simplest method is to store the entire genotype in a
GenotypeMatrix
object and to call assocTest
as described above. This approach has the shortcoming that the entire
genotype must be read (e.g. from a VCF file) and kept in memory as a
whole. For large studies, in particular, wholegenome studies, this is
not feasible. In order to be able to cope with large studies, the
podkat package offers an interface that allows for reading from
a VCF file piece by piece without the need to read and store the entire
genotype at once. If Z
is a TabixFile
object or the name of a VCF file, assocTest
reads from the file
in batches of batchSize
regions, performs the association tests
for these regions, and returns the results as an
AssocTestResultRanges
object. This sequential
batch processing can also be parallelized. The user can either set up
a cluster him/herself and pass the
SOCKcluster
object as
cl
argument. If the cl
is NULL
, users can
leave the setup of the cluster to assocTest
. In this case, the
only thing necessary is to determine the number of R client processes
by the nnodes
argument. The variant with the VCF interface
supports the same preprocessing and filter arguments as
readGenotypeMatrix
to control which variants are actually
taken into account and how to handle variants with MAFs greater than
50%.
If the argument Z
is a numeric matrix, Z
is
interpreted as a kernel matrix
\bold{K}. Then a single association test is
performed as described above and the result is returned as an
AssocTestResult
object. This allows the user to
use a custom kernel not currently implemented in the podkat
package. The assocTest
function assumes that row and column
objects in the kernel matrix are in the same order. It does not
perform any check whether row and column names are the same or whether
the kernel matrix is actually positive semidefinite. Users should be
aware that running the function for invalid kernels matrices,
i.e. for a matrix that is not positive semidefinite, produces
meaningless results and may even lead to unexpected errors.
Finally, note that the samples in the null model model
and
in the genotype (GenotypeMatrix
object or VCF file)
need not be aligned to each other. If both the samples in model
and in the genotype are named (i.e. row names are defined
for Z
if it is a GenotypeMatrix
object; VCF files
always contain sample names anyway), assocTest
checks if all
samples in model
are present in the genotype. If so, it
selects only those samples from the genotype that occur in the null
model. If not, it quits with an error. If either the samples in the
null model or the genotypes are not named, assocTest
assumes
that the samples are aligned to each other. This applies only if
the number of samples in the null model and the number of genotypes
are the same or if the number of genotypes equals the number of
samples in the null model plus the number of samples that were omitted
from the null model when it was trained (see
NullModel
and nullModel
).
Otherwise, the function quits with an error.
An analogous procedure is applied if the kernel matrix interface is
used (signature matrix,NullModel
).
an object of class AssocTestResult
or
AssocTestResultRanges
(see details above)
Ulrich Bodenhofer bodenhofer@bioinf.jku.at
http://www.bioinf.jku.at/software/podkat
Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011) Rarevariant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 8293. DOI: 10.1016/j.ajhg.2011.05.029.
Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., NHLBI Exome Sequencing Project  ESP Lung Project Team, Christiani, D. C., Wurfel, M. M., and Lin, X. (2012) Optimal unified approach for rarevariant association testing with application to smallsample casecontrol wholeexome sequencing studies. Am. J. Hum. Genet. 91, 224237. DOI: 10.1016/j.ajhg.2012.06.007.
Davies, R. B. (1980) The distribution of a linear combination of chi^2 random variables. J. R. Stat. Soc. Ser. CAppl. Stat. 29, 323333.
Liu, H., Tang, Y., and Zhang, H. (2009) A new chisquare approximation to the distribution of nonnegative definite quadratic forms in noncentral normal variables. Comput. Stat. Data Anal. 53, 853856.
AssocTestResult
,
AssocTestResultRanges
, nullModel
,
NullModel
, computeKernel
,
weightFuncs
, readGenotypeMatrix
,
GenotypeMatrix
,
plot
, qqplot
, p.adjust
,
filterResult
## load genome description data(hgA) ## partition genome into overlapping windows windows < partitionRegions(hgA) ## load genotype data from VCF file vcfFile < system.file("examples/example1.vcf.gz", package="podkat") Z < readGenotypeMatrix(vcfFile) ## read phenotype data from CSV file (continuous trait + covariates) phenoFile < system.file("examples/example1lin.csv", package="podkat") pheno.c < read.table(phenoFile, header=TRUE, sep=",") ## train null model with all covariates in data frame 'pheno' model.c < nullModel(y ~ ., pheno.c) ## perform association test res < assocTest(Z, model.c, windows) print(res) ## perform association test using the VCF interface res < assocTest(vcfFile, model.c, windows, batchSize=100) print(res) ## create Manhattan plot of adjusted pvalues plot(p.adjust(res), which="p.value.adj") ## read phenotype data from CSV file (binary trait + covariates) phenoFile < system.file("examples/example1log.csv", package="podkat") pheno.b <read.table(phenoFile, header=TRUE, sep=",") ## train null model with all covariates in data frame 'pheno' model.b < nullModel(y ~ ., pheno.b) ## perform association test res < assocTest(Z, model.b, windows) print(res) ## create Manhattan plot of adjusted pvalues plot(p.adjust(res), which="p.value.adj")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.