Fit Negative Binomial Regression Model and Test for a Regression Coefficient

Description

For each row of the input data matrix, nb.glm.test fits an NB log-linear regression model and performs large-sample tests for a one-dimensional regression coefficient.

Usage

1
2
3
4
nb.glm.test(counts, x, beta0, lib.sizes = colSums(counts),
  normalization.method = "AH2010", dispersion.model = "NBQ",
  tests = c("HOA", "LR", "Wald"), alternative = "two.sided",
  subset = 1:dim(counts)[1])

Arguments

counts

an m by n matrix of RNA-Seq read counts with rows corresponding to gene features and columns corresponding to independent biological samples.

x

an n by p design matrix specifying the treatment structure.

beta0

a p-vector specifying the null hypothesis. Non-NA components specify the parameters to test and their null values.

lib.sizes

a p-vector of observed library sizes, usually (and by default) estimated by column totals.

normalization.method

a character string specifying the method for estimating the normalization factors, can be NULL or "AH2010". If method=NULL, the normalization factors will have values of 1 (i.e., no normalization is applied); if method="AH2010", the normalization method proposed by Anders and Huber (2010) will be used.

dispersion.model

a character string specifying the dispersion model, and can be one of "NB2", "NBP", "NBQ" (default), "NBS" or "step".

tests

a character string vector specifying the tests to be performed, can be any subset of "HOA" (higher-order asymptotic test), "LR" (likelihood ratio test), and "Wald" (Wald test).

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

subset

specify a subset of rows to perform the test on

Details

nbp.glm.test provides a simple, one-stop interface to performing a series of core tasks in regression analysis of RNA-Seq data: it calls estimate.norm.factors to estimate normalization factors; it calls prepare.nb.data to create an NB data structure; it calls estimate.dispersion to estimate the NB dispersion; and it calls test.coefficient to test the regression coefficient.

To keep the interface simple, nbp.glm.test provides limited options for fine tuning models/parameters in each individual step. For more control over individual steps, advanced users can call estimate.norm.factors, prepare.nb.data, estimate.dispersion, and test.coefficient directly, or even substitute one or more of them with their own versions.

Value

A list containing the following components:

data

a list containing the input data matrix with additional summary quantities, output from prepare.nb.data.

dispersion

dispersion estimates and models, output from estimate.dispersion.

test

test results, output from test.coefficient.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
## Load Arabidopsis data
data(arab);

## Specify treatment structure
grp.ids = as.factor(c(1, 1, 1, 2, 2, 2));
x = model.matrix(~grp.ids);

## Specify the null hypothesis
## The null hypothesis is beta[1]=0 (beta[1] is the log fold change).
beta0 = c(NA, 0);

## Fit NB regression model and perform large sample tests.
## The step can take long if the number of genes is large
fit = nb.glm.test(arab, x, beta0, subset=1:50);

## The result contains the data, the dispersion estimates and the test results
print(str(fit));

## Show HOA test results for top ten genes
subset = order(fit$test.results$HOA$p.values)[1:10];
cbind(fit$data$counts[subset,], fit$test.results$HOA[subset,]);

## Show LR test results
subset = order(fit$test.results$LR$p.values)[1:10];
cbind(fit$data$counts[subset,], fit$test.results$LR[subset,]);