Largesample Test for a Regression Coefficient in an Negative Binomial Regression Model
Description
test.coefficient
performs largesample tests
(higherorder asymptotic test, likelihood ratio test,
and/or Wald test) for testing regression coefficients in an
NB regression model.
Usage
1 2  test.coefficient(nb, dispersion, x, beta0, tests = c("HOA", "LR", "Wald"),
alternative = "two.sided", subset = 1:m, print.level = 1)

Arguments
nb 
an NB data object, output from

dispersion 
dispersion estimates, output from

x 
an n by p design matrix describing the treatment structure 
beta0 
a pvector specifying the null hypothesis. NonNA components specify the parameters to test and their null values. (Currently, only onedimensional test is implemented, so only one nonNA component is allowed). 
tests 
a character string vector specifying the
tests to be performed, can be any subset of 
alternative 
a character string specifying the
alternative hypothesis, must be one of 
subset 
an index vector specifying on which rows should the tests be performed 
print.level 
a number controlling the amount of messages printed: 0 for suppressing all messages, 1 (default) for basic progress messages, and 2 to 5 for increasingly more detailed message. 
Details
test.coefficient
performs largesample tests for a
onedimensional (q=1) component ψ of the
pdimensional regression coefficient β. The
hypothesized value ψ_0 of ψ is specified
by the nonNA component of the vector beta0
in the
input.
The likelihood ratio statistic,
λ = 2 (l(\hatβ)  l(\tildeβ)),
converges in distribution to a chisquare distribution with 1 degree of freedom. The signed square root of the likelihood ratio statistic λ, also called the directed deviance,
r = sign (\hatψ  ψ_0) √ λ
converges to a standard normal distribution.
For testing a onedimensional parameter of interest, BarndorffNielsen (1986, 1991) showed that a modified directed
r^* = r  \frac{1}{r} \log(z)
is, in wide generality, asymptotically standard normally distributed to a higher order of accuracy than the directed deviance r itself, where z is an adjustment term. Tests based on highorder asymptotic adjustment to the likelihood ratio statistic, such as r^* or its approximation, are referred to as higherorder asymptotic (HOA) tests. They generally have better accuracy than corresponding unadjusted likelihood ratio tests, especially in situations where the sample size is small and/or when the number of nuisance parameters (pq) is large. The implementation here is based on Skovgaard (2001). See Di et al. 2013 for more details.
Value
a list containing the following components:
beta.hat 
an m by p matrix of regression coefficient under the full model 
mu.hat 
an m by n matrix of fitted mean frequencies under the full model 
beta.tilde 
an m by p matrix of regression coefficient under the null model 
mu.tilde 
an m by n matrix of fitted mean frequencies under the null model. 
HOA, LR,
Wald 
each is a list of two mvectors,

References
BarndorffNielsen, O. (1986): "Infereni on full or partial parameters based on the standardized signed log likelihood ratio," Biometrika, 73, 307322
BarndorffNielsen, O. (1991): "Modified signed log likelihood ratio," Biometrika, 78, 557563.
Skovgaard, I. (2001): "Likelihood asymptotics," Scandinavian Journal of Statistics, 28, 332.
Di Y, Schafer DW, Emerson SC, Chang JH (2013): "Higher order asymptotics for negative binomial regression inferences from RNAsequencing data". Stat Appl Genet Mol Biol, 12(1), 4970.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47  ## Load Arabidopsis data
data(arab);
## Estimate normalization factors (we want to use the entire data set)
norm.factors = estimate.norm.factors(arab);
## Prepare the data
## For demonstration purpose, only the first 50 rows are used
nb.data = prepare.nb.data(arab[1:50,], lib.sizes = colSums(arab), norm.factors = norm.factors);
## For real analysis, we will use the entire data set, and can omit lib.sizes parameter)
## nb.data = prepare.nb.data(arab, norm.factors = norm.factors);
print(nb.data);
plot(nb.data);
## Specify the model matrix (experimental design)
grp.ids = as.factor(c(1, 1, 1, 2, 2, 2));
x = model.matrix(~grp.ids);
## Estimate dispersion model
dispersion = estimate.dispersion(nb.data, x);
print(dispersion);
plot(dispersion);
## Specify the null hypothesis
## The null hypothesis is beta[2]=0 (beta[2] is the log fold change).
beta0 = c(NA, 0);
## Test regression coefficient
res = test.coefficient(nb.data, dispersion, x, beta0);
## The result contains the data, the dispersion estimates and the test results
print(str(res));
## Show HOA test results for top ten most differentially expressed genes
top = order(res$HOA$p.values)[1:10];
print(cbind(nb.data$counts[top,], res$HOA[top,]));
## Plot log fold change versus the fitted mean of sample 1 (analagous to an MAplot).
plot(res$mu.tilde[,1], res$beta.hat[,2]/log(2), log="x",
xlab="Fitted mean of sample 1 under the null",
ylab="Log (base 2) fold change");
## Highlight top DE genes
points(res$mu.tilde[top,1], res$beta.hat[top,2]/log(2), col="magenta");
