mglm: Fit Negative Binomial Generalized Linear Models to Multiple...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Fit the same log-link negative binomial or Poisson generalized linear model (GLM) to each row of a matrix of counts.

Usage

1
2
3
4
5
6
7
mglmOneGroup(y, dispersion = 0, offset = 0, weights = NULL,
             coef.start = NULL, maxit = 50, tol = 1e-10, verbose = FALSE)
mglmOneWay(y, design = NULL, group = NULL, dispersion = 0, offset = 0, weights = NULL,
             coef.start = NULL, maxit = 50, tol = 1e-10)
mglmLevenberg(y, design, dispersion = 0, offset = 0, weights = NULL,
             coef.start = NULL, start.method = "null", maxit = 200, tol = 1e-06)
designAsFactor(design)

Arguments

y

numeric matrix containing the negative binomial counts. Rows for genes and columns for libraries.

design

numeric matrix giving the design matrix of the GLM. Assumed to be full column rank. This is a required argument for mglmLevernberg and code{designAsFactor}. For mglmOneWay, it defaults to model.matrix(~0+group) if group is specified and otherwise to model.matrix( ~1).

group

factor giving group membership for oneway layout. If both design and group are both specified, then they must agree in terms of designAsFactor. If design=NULL, then a group-means design matrix is implied.

dispersion

numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.

offset

numeric vector or matrix giving the offset that is to be included in the log-linear model predictor. Can be a scalar, a vector of length equal to the number of libraries, or a matrix of the same size as y.

weights

numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.

coef.start

numeric matrix of starting values for the linear model coefficients. Number of rows should agree with y and number of columns should agree with design. For mglmOneGroup, a numeric vector or a matrix with one column. This argument does not usually need to be set as the automatic starting values perform well.

start.method

method used to generate starting values when coef.stat=NULL. Possible values are "null" to start from the null model of equal expression levels or "y" to use the data as starting value for the mean.

tol

numeric scalar giving the convergence tolerance. For mglmOneGroup, convergence is judged successful when the step size falls below tol in absolute size.

maxit

integer giving the maximum number of iterations for the Fisher scoring algorithm. The iteration will be stopped when this limit is reached even if the convergence criterion hasn't been satisfied.

verbose

logical. If TRUE, warnings will be issued when maxit iterations are exceeded before convergence is achieved.

Details

These functions are low-level work-horses used by higher-level functions in the edgeR package, especially by glmFit. Most users will not need to call these functions directly.

The functions mglmOneGroup, mglmOneWay and mglmLevenberg all fit a negative binomial GLM to each row of y. The row-wise GLMS all have the same design matrix but possibly different dispersions, offsets and weights. These functions are all low-level in that they operate on atomic objects (numeric matrices and vectors).

mglmOneGroup fits an intercept only model to each response vector. In other words, it treats all the libraries as belonging to one group. It implements Fisher scoring with a score-statistic stopping criterion for each gene. Excellent starting values are available for the null model so this function seldom has any problems with convergence. It is used by other edgeR functions to compute the overall abundance for each gene.

mglmOneWay fits a oneway layout to each response vector. It treats the libraries as belonging to a number of groups and calls mglmOneGroup for each group.

mglmLevenberg fits an arbitrary log-linear model to each response vector. It implements a Levenberg-Marquardt modification of the GLM scoring algorithm to prevent divergence. The main computation is implemented in C++.

All these functions treat the dispersion parameter of the negative binomial distribution as a known input.

designAsFactor is used to convert a general design matrix into a oneway layout if that is possible. It determines how many distinct row values the design matrix is capable of computing and returns a factor with a level for each possible distinct value.

Value

mglmOneGroup produces a numeric vector of coefficients.

mglmOneWay produces a list with the following components:

coefficients

matrix of estimated coefficients for the linear models. Rows correpond to rows of y and columns to columns of design.

fitted.values

matrix of fitted values. Of same dimensions as y.

mglmLevenberg produces a list with the following components:

coefficients

matrix of estimated coefficients for the linear models.

fitted.values

matrix of fitted values.

deviance

numeric vector of residual deviances.

iter

number of iterations used.

fail

logical vector indicating genes for which the maximum damping was exceeded before convergence was achieved.

designAsFactor returns a factor of length equal to nrow(design).

Author(s)

Gordon Smyth, Yunshun Chen, Davis McCarthy, Aaron Lun. C++ code by Aaron Lun.

References

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. https://doi.org/10.1093/nar/gks042

See Also

Most users will call either glmFit, the higher-level function offering more object-orientated GLM modelling of DGE data, or else exactTest, which is designed for oneway layouts.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
y <- matrix(rnbinom(1000, mu = 10, size = 2), ncol  =  4)
lib.size <- colSums(y)
dispersion <- 0.1

## Compute intercept for each row
beta <- mglmOneGroup(y, dispersion = dispersion, offset = log(lib.size))

## Unlogged intercepts add to one:
sum(exp(beta))

## Fit the NB GLM to the counts with a given design matrix
f1 <- factor(c(1,1,2,2))
f2 <- factor(c(1,2,1,2))
X <- model.matrix(~ f1 + f2)
fit <- mglmLevenberg(y, X, dispersion = dispersion, offset = log(lib.size))
head(fit$coefficients)

Example output

Loading required package: limma
[1] 1.00014
     (Intercept)         f12         f22
[1,]   -6.230713 -0.04872134  0.59887334
[2,]   -5.934499 -0.08597506 -0.08597506
[3,]   -6.869556  0.16821078  0.58245659
[4,]   -4.376166 -1.14410747 -0.74643541
[5,]   -5.056449 -0.46044808 -0.95480743
[6,]   -6.545261  0.92059705  0.52755817

edgeR documentation built on Jan. 16, 2021, 2:03 a.m.