View source: R/nbp.exact.test.R
exact.nb.test | R Documentation |
exact.nb.test
performs the Robinson and Smyth exact
negative binomial (NB) test for differential gene
expression on each gene and summarizes the results using
p-values and q-values (FDR).
exact.nb.test(obj, grp1, grp2, print.level = 1)
obj |
output from |
grp1 |
Identifier of group 1. A number, character or string (should match at least one of the obj$grp.ids). |
grp2 |
Identifier of group 2. A number, character or string (should match at least one of the obj$grp.ids). |
print.level |
a number. Controls the amount of messages printed: 0 for suppressing all messages, 1 for basic progress messages, larger values for more detailed messages. |
The negative binomial (NB) distribution offers a more realistic model for RNA-Seq count variability and still permits an exact (non-asymptotic) test for comparing expression levels in two groups.
For each gene, let S_1, S_2 be the sums of gene counts from all biological replicates in each group. The exact NB test is based on the conditional distribution of S_1|S_1+S_2: a value of S_1 that is too big or too small, relative to the sum S_1+S_2, indicates evidence for differential gene expression. When the effective library sizes are the same in all replicates and the dispersion parameters are known, we can determine the probability functions of S_1, S_2 explicitly. The exact p-value is computed as the total conditional probability of all possible values of (S_1, S_2) that have the same sum as but are more extreme than the observed values of (S_1, S_2).
Note that we assume that the NB dispersion parameters for the two groups are the same and library sizes (column totals of the count matrix) are the same.
the list obj
from the input with the following added
components:
grp1 |
same as input. |
grp2 |
same as input. |
pooled.pie |
estimated pooled mean of relative count frequencies in the two groups being compared. |
expression.levels |
a matrix of estimated gene
expression levels as indicated by mean relative read
frequencies. It has three columns |
log.fc |
base 2 log fold change in mean relative frequency between two groups. |
p.values |
p-values of the exact NB test applied to each gene (row). |
q.values |
q-values (estimated FDR) corresponding to the p-values. |
Before calling exact.nb.test
, the user should
call estimate.norm.factors
to estimate
normalization factors, call prepare.nbp
to
adjust library sizes, and call estimate.disp
to fit a dispersion model. The exact NB test will be
performed using pseudo.counts
in the list
obj
, which are normalized and adjusted to have the
same effective library sizes (column sums of the count
matrix, multiplied by normalization factors).
Users not interested in fine tuning the underlying
statistical model should use nbp.test
instead. The all-in-one function nbp.test
uses sensible approaches to normalize the counts, estimate
the NBP model parameters and test for differential gene
expression.
A test will be performed on a row (a gene) only when the total row count is nonzero, otherwise an NA value will be assigned to the corresponding p-value and q-value.
nbp.test
.
## Load Arabidopsis data data(arab); ## Specify treatment groups ## grp.ids = c(1, 1, 1, 2, 2, 2); # Numbers or strings are both OK grp.ids = rep(c("mock", "hrcc"), each=3); ## Estimate normalization factors norm.factors = estimate.norm.factors(arab); print(norm.factors); ## Prepare an NBP object, adjust the library sizes by thinning the ## counts. For demonstration purpose, only use the first 100 rows of ## the arab data. set.seed(999); obj = prepare.nbp(arab[1:100,], grp.ids, lib.size=colSums(arab), norm.factors=norm.factors); print(obj); ## Fit a dispersion model (NBQ by default) obj = estimate.disp(obj); plot(obj); ## Perform exact NB test ## grp1 = 1; ## grp2 = 2; grp1 = "mock"; grp2 = "hrcc"; obj = exact.nb.test(obj, grp1, grp2); ## Print and plot results print(obj); par(mfrow=c(3,2)); plot(obj);
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.