This function tests for differences in minor allele frequency between groups, based on an extra-binomial variation model for pooled sequencing data.
R |
A matrix with rows indexed by SNPs and columns by pools. The entries are counts of allele 1. |
R.alt |
A similarly formatted matrix containing the counts of allele 2. |
cc |
A case/control indicator vector, of length equal to the number of pools, containing 0 for each control pool and 1 for each case pool. |
n |
Number of chromosomes (twice the number of subjects) in each pooled sample. |
tol |
The maximum difference allowed between coefficient values from successive GLM fits for the iteration to stop. The default is 0.001. |
a.start |
An initial value for the parameter a in the linear regression. The default is 1. |
b.start |
An initial value for the parameter b in the linear regression. The default is 1. |
max.it |
Maximum number of iterations. The default is 1000. |
digits |
The number of significant digits to use for the allele frequencies and the p-value. The default, 'NULL', uses 'getOption("digits")'. |
model.maf |
A logical value indicating whether the modelled error structure is allowed to depend on allele frequency (TRUE, the default) or only on read depth (FALSE). |
R and R.alt contain the read counts for the major allele and the alternative allele respectively, and must have the same dimensions.
The extra-binomial model is defined as: E(R/N)=p, Var(R/N)=p(1-p)(a/n+b/N), where N=R+R.alt.
We denote W=1/(a/n+b/N), which may be interpreted as the adjusted depth of pool j for SNP i. Since E(r2)=1/W=a/n+b/N, the parameters a and b can be estimated by linear regression of r2 on 1/N, giving a/n as the intercept and b as the slope. If model.maf=TRUE, W=1/(a/n+b/N+b2*p+b3*p^2) and two additional parameters (b2 and b3) are estimated. The regression is carried out as a generalized linear model (GLM): a first fit with Gaussian errors provides reasonably good starting values for a and b, and these starting values are then used to fit a GLM with gamma errors and an identity link, because both a and b are positive.
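The text does not define r2. A reading consistent with the model, offered here as an assumption rather than the package's stated definition, is that r is the standardized residual of the observed allele frequency, so that its expected square equals 1/W. The second line works the formula through for hypothetical values a=b=1, n=96 and N=1000:

```latex
\[
r = \frac{R/N - p}{\sqrt{p(1-p)}}, \qquad
E(r^2) = \frac{\mathrm{Var}(R/N)}{p(1-p)} = \frac{a}{n} + \frac{b}{N} = \frac{1}{W}
\]
\[
W = \frac{1}{1/96 + 1/1000} = \frac{96000}{1096} \approx 87.6
\]
```

Under this formula W approaches n/a as N grows, so in this example the adjusted depth cannot exceed 96 however deep the sequencing.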
Since the estimated allele frequency p depends on a and b, the calculations are carried out iteratively.
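The two-stage fit described above can be sketched on simulated data. This is only an illustration under stated assumptions, not the package's internal code: the variable names, the simulated r2 values and the "true" a and b are all hypothetical, model.maf is ignored, and the outer iteration that re-estimates p is left out.

```r
# Illustrative sketch: simulated data and hypothetical names throughout.
set.seed(1)
n <- 96                                  # chromosomes per pool
a.true <- 1                              # used only to simulate data
b.true <- 2
N <- round(runif(2000, 50, 2000))        # simulated read depths
inv.N <- 1 / N
mu <- a.true / n + b.true / N            # E(r2) under the model
# Positive responses with mean mu (gamma distributed, shape 4)
r2 <- rgamma(length(N), shape = 4, rate = 4 / mu)

# Stage 1: Gaussian errors, used only to obtain starting values
fit0 <- glm(r2 ~ inv.N, family = gaussian())

# Stage 2: gamma errors with identity link, started from stage 1
fit1 <- glm(r2 ~ inv.N, family = Gamma(link = "identity"),
            start = coef(fit0))

a.hat <- unname(coef(fit1)[1]) * n       # intercept estimates a/n
b.hat <- unname(coef(fit1)[2])           # slope estimates b
```

The Gaussian fit comes first because a gamma GLM with an identity link needs starting values that give positive fitted means.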
A chi-square test is performed on a 2*2 table using the weighted allele counts to calculate the p-value.
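As a rough sketch of this step for a single SNP, the example below assumes that each pool's weighted allele counts are its allele frequencies multiplied by the adjusted depth W, and that these are summed within controls and within cases. That weighting, the values a=b=1 and the variable names are assumptions made for illustration; the function itself may construct the table differently.

```r
# Illustrative sketch: hypothetical a and b, assumed weighting scheme.
a <- 1
b <- 1
n <- 96
cc <- c(0, 0, 1, 1)
R     <- c(1409, 1530, 1490, 1630)   # allele 1 counts, one SNP, four pools
R.alt <- c(170, 210, 192, 209)       # allele 2 counts
N <- R + R.alt
W <- 1 / (a / n + b / N)             # adjusted depth of each pool

w1 <- W * R / N                      # weighted allele 1 counts
w2 <- W * R.alt / N                  # weighted allele 2 counts

# 2 x 2 table: rows are control/case, columns are the two alleles
tab <- rbind(control = c(sum(w1[cc == 0]), sum(w2[cc == 0])),
             case    = c(sum(w1[cc == 1]), sum(w2[cc == 1])))
colnames(tab) <- c("allele1", "allele2")

test <- chisq.test(tab, correct = FALSE)
test$p.value
```

Because a and b here are placeholders rather than fitted values, the resulting p-value should not be expected to match the output of exbio.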
A list containing the following components:
result |
a data.frame with three columns: the first gives the minor allele frequency in controls, the second gives the minor allele frequency in cases, and the third gives the p-value. Each row corresponds to a SNP. |
parameters |
a character vector giving the values of the parameters a and b (and b2, b3 if model.maf=TRUE) in the linear regression and the number of iterations. |
Xin Yang, Chris Wallace
Yang et al. "Extra-binomial variation approach for analysis of pooled DNA sequencing data", under review.
R <- matrix(c(1409, 1530, 1490, 1630, 924, 998, 1000, 1012), nrow = 2, ncol = 4, byrow = TRUE)
R.alt <- matrix(c(170, 210, 192, 209, 13, 14, 30, 38), nrow = 2, ncol = 4, byrow = TRUE)
cc <- c(0, 0, 1, 1)
n <- 96
exbio(R, R.alt, cc, n, max.it = 100, digits = 3)
##=> p.value = 9.91e-01 for SNP1 and 4.01e-11 for SNP2,
##so association for SNP2 is established, but not for SNP1.