nb.gof.m: Main Function of Implementing Simulation-based...
In gu-mi/NBGOF: Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

Description Usage Arguments Details Value Author(s) References Examples

This function performs simulation-based goodness-of-fit tests of different negative binomial dispersion models for an RNA-Seq dataset with a known design matrix. Currently supported models are implemented in the NBPSeq and edgeR packages.

1 2	nb.gof.m(counts, x, lib.sizes=colSums(counts), sim=999, model=NULL, method=NULL, min.n=100, prior.df = 10, seed=1, ncores = NULL, ...)

`counts`	an m-by-n count matrix of non-negative integers. For a typical RNA-Seq experiment, this is the read counts with m genes and n samples.
`x`	an n-by-p design matrix.
`lib.sizes`	library sizes of an RNA-Seq experiment. Default is the column sums of the `counts` matrix.
`sim`	number of simulations performed.
`model`	a string of characters specifying the negative binomial dispersion model used to fit the data. Currently the following dispersion models are available to be checked for goodness-of-fit: NBP dispersion model in the `NBPSeq` package (`NBP`) NBQ dispersion model in the `NBPSeq` pacakge (`NBQ`) NBS dispersion model in the `NBPSeq` pacakge (`NBS`) NBSTEP dispersion model in the `NBPSeq` pacakge (`NBSTEP`) NB common dispersion model in the `edgeR` package (`Common`) NB genewise in the `NBPSeq` package (`Genewise`) NB trended (non-parametric) dispersion model in the `edgeR` package (`Trended`) NB tagwise-common dispersion model in the `edgeR` package (`Tagwise-Common`) NB tagwise-trended dispersion model in the `edgeR` package (`Tagwise-Trend`) Users should specify exactly the same characters as the ones in paratheses above for each dispersion model.
`method`	method for estimating dispersions. MAPL: maximum adjusted profile likelihood other estimation methods from `edgeR` can also be specified.
`min.n`	for `Trended` model only: specify the minimim number of genes in a bin (default is 100).
`prior.df`	control of the shrinkage applied for `edgeR` genewise, tagwise (and its variants) dispersion models. Setting `prior.df=0` gives the genewise model without any shrinkage. Setting `prior.df=10` (default in `edgeR`) gives the tagwise model with shrinkage, either towards the common dispersion (`Tagwise-Common`) or the global dispersion trend (`Tagwise-Trend`). In the paper of McCarthy, DJ et al. (2012), in the formula G_0 = prior.df/df = 20/df: df is the "residual df" (equals the number of libraries minus the number of distinct treatment groups), and prior.df=20. For example, if there are six libraries and two distinct groups, df = residual df = 4. If we set prior.df = 20, then G_0 = 5. G_0 is the "prior.n" in `edgeR`'s earlier versions. Users should be careful in choosing appropriate prior.df when using tagwise models.
`seed`	initial random number seed for reproducibility of re-simulations (default is 1).
`ncores`	number of CPU cores to use. If unspecified, use the total number of CPUs minus 1.
`...`	(for future use)

When the response is a count matrix, we can use this function to test the goodness-of-fit of a specified negative binomial dispersion model. Specifically, this function calls model.nbp.m to fit the NBP dispersion model, calls model.nbq.m to fit the NBQ dispersion model, calls model.edgeR.common to fit the common dispersion model, calls model.edgeR.genewise to fit the pure genewise dispersion model (without shrinkage), calls model.edgeR.trended to fit the non-parametric trended dispersion model in edgeR, calls model.edgeR.tagcom to fit the tagwise model with the dispersions shrinking towards a common dispersion, and calls model.edgeR.tagtrd to fit the tagwise model with the dispersions shrinking towards the global dispersion trend.

An object of class "gofm" to which other methods can be applied.

Gu Mi <neo.migu@gmail.com>, Yanming Di, Daniel Schafer

Mi, G, Di, Y, & Schafer, DW (2015). Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data. PLOS ONE, 10(3).

Di Y, Schafer DW, Cumbie JS, and Chang JH (2011): "The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq", Statistical Applications in Genetics and Molecular Biology, 10 (1).

McCarthy DJ, Chen Y and Smyth GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.

Cox, DR, and Reid, N (1987). Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society Series B 49, 1-39.

See https://github.com/gu-mi/NBGOF/wiki/ for more details.

## load package into session:
library(NBGOF)

## basic set-up of the model:
seed = 539
sim = 999
conf.env = 0.95

## basic parameter specifications:
m = 1000 # 1000 genes
n = 6    # 6 samples
s = 1e6  # lib.sizes
offset = log(s)  # make sure offset for NBP functions should take the log

## simulate coefficients:
beta = matrix(0, m, 1)  # beta0 only
set.seed(seed)
beta[,1] = rnorm(m, 5, 2) - offset   # beta0 ~ normal (mean and a based on real RNA-Seq data)

## design matrix (single group only):
x = matrix(rep(1,n))

## specify mean levels:
mu = round(t(s * exp(x %*% t(beta))))
pi = mu/s

## simulate an m-by-n count matrix mimicking a RNA-Seq dataset:
## simulate NB1.8 data with random uniform noise added to the dispersion
alpha1 = -0.2  # NB1.8
phi0 = 0.02
alpha0 = log(phi0)
phi.nbp = phi0 * pi^alpha1
range(phi.nbp)
cbind(mu[,1], phi.nbp[,1])

## add noise:
a = 0.1
set.seed(seed)  
phi.noi.vec = phi.nbp[ ,1] * exp(matrix(runif(m, -a, a), nr=m, nc=1))
phi.noi = matrix(phi.noi.vec, nr=m, nc=n)
range(phi.noi)
cbind(mu[,1], phi.noi[,1])  # make sure phi's are in reasonable range

## generate NBP response with added noise:
set.seed(seed)
y = rnbinom(m * n, mu=mu, size=1/phi.noi)
dim(y) = dim(mu)
rownames(y) = paste("g", seq(1,m), sep="")
colnames(y) = paste("s", seq(1,n), sep="")
plot(mu, phi.noi, log="xy")

## GOF tests for different dispersion models, using parallel computing:
## CAUTION: may be time-consuming depending on the size of data and simulations!
nc = detectCores() - 1
fnbp.noip = nb.gof.m(counts=y, x=x, sim=sim, model="NBP", method="MAPL", seed=1, ncores=nc)
fnbq.noip = nb.gof.m(counts=y, x=x, sim=sim, model="NBQ", method="MAPL", seed=1, ncores=nc)
fcom.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Common", method="CoxReid", seed=1, ncores=nc)
fgen.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Genewise", method="auto", seed=1, ncores=nc)
ftgc.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Tagwise-Common", method="CoxReid", seed=1, ncores=nc)
ftgt.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Tagwise-Trend", method="auto", seed=1, ncores=nc)
ftrd.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Trended", method="auto", seed=1, ncores=nc)

## summarize the GOF test results:
summary(fnbp.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(fnbq.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(fcom.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(fgen.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(ftgc.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(ftgt.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(ftrd.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")

## evaluate GOF p-values using histograms and quantile-quantile uniform plots:
# pdf("plots-NBP-01-1grp.pdf", width=20, height=20)
h1 = diagPlot(fnbp.noip, type="hist", binwidth=0.1)
q1 = diagPlot(fnbp.noip, type="qq")
h2 = diagPlot(fnbq.noip, type="hist", binwidth=0.1)
q2 = diagPlot(fnbq.noip, type="qq")
h3 = diagPlot(fcom.noip, type="hist", binwidth=0.1)
q3 = diagPlot(fcom.noip, type="qq")
h4 = diagPlot(fgen.noip, type="hist", binwidth=0.1)
q4 = diagPlot(fgen.noip, type="qq")
h5 = diagPlot(ftgc.noip, type="hist", binwidth=0.1)
q5 = diagPlot(ftgc.noip, type="qq")
h6 = diagPlot(ftgt.noip, type="hist", binwidth=0.1)
q6 = diagPlot(ftgt.noip, type="qq")
h7 = diagPlot(ftrd.noip, type="hist", binwidth=0.1)
q7 = diagPlot(ftrd.noip, type="qq")
multiplot(h1, q1, h2, q2, h3, q3, h4, q4, h5, q5, h6, q6, h7, q7, cols=4, layout=matrix(seq(1,16), nrow=4, byrow=TRUE))
# dev.off()

gu-mi/NBGOF documentation built on Oct. 25, 2020, 3:30 a.m.

gu-mi/NBGOF index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

gu-mi/NBGOF
Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

nb.gof.m: Main Function of Implementing Simulation-based...
In gu-mi/NBGOF: Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Related to nb.gof.m in gu-mi/NBGOF...

R Package Documentation

Browse R Packages

We want your feedback!

gu-mi/NBGOF Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

nb.gof.m: Main Function of Implementing Simulation-based... In gu-mi/NBGOF: Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Related to nb.gof.m in gu-mi/NBGOF...

R Package Documentation

Browse R Packages

We want your feedback!

gu-mi/NBGOF
Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

nb.gof.m: Main Function of Implementing Simulation-based...
In gu-mi/NBGOF: Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data