Description Usage Arguments Details Value Author(s) References Examples
This function performs simulation-based goodness-of-fit tests of different
negative binomial dispersion models for an RNA-Seq dataset with a known design
matrix. Currently supported models are implemented in the NBPSeq
and edgeR
packages.
1 2 |
counts |
an m-by-n count matrix of non-negative integers. For a typical RNA-Seq experiment, this is the read counts with m genes and n samples. |
x |
an n-by-p design matrix. |
lib.sizes |
library sizes of an RNA-Seq experiment. Default is the column
sums of the |
sim |
number of simulations performed. |
model |
a string of characters specifying the negative binomial dispersion model used to fit the data. Currently the following dispersion models are available to be checked for goodness-of-fit:
Users should specify exactly the same characters as the ones in paratheses above for each dispersion model. |
method |
method for estimating dispersions. MAPL: maximum adjusted profile likelihood other estimation methods from
|
min.n |
for |
prior.df |
control of the shrinkage applied for |
seed |
initial random number seed for reproducibility of re-simulations (default is 1). |
ncores |
number of CPU cores to use. If unspecified, use the total number of CPUs minus 1. |
... |
(for future use) |
When the response is a count matrix, we can use this function to test
the goodness-of-fit of a specified negative binomial dispersion model. Specifically,
this function calls model.nbp.m
to fit the NBP dispersion model,
calls model.nbq.m
to fit the NBQ dispersion model, calls
model.edgeR.common
to fit the common dispersion model, calls
model.edgeR.genewise
to fit the pure genewise dispersion model (without
shrinkage), calls model.edgeR.trended
to fit the non-parametric trended
dispersion model in edgeR
, calls model.edgeR.tagcom
to fit the tagwise
model with the dispersions shrinking towards a common dispersion, and calls
model.edgeR.tagtrd
to fit the tagwise model with the dispersions shrinking
towards the global dispersion trend.
An object of class "gofm" to which other methods can be applied.
Gu Mi <neo.migu@gmail.com>, Yanming Di, Daniel Schafer
Mi, G, Di, Y, & Schafer, DW (2015). Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data. PLOS ONE, 10(3).
Di Y, Schafer DW, Cumbie JS, and Chang JH (2011): "The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq", Statistical Applications in Genetics and Molecular Biology, 10 (1).
McCarthy DJ, Chen Y and Smyth GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.
Cox, DR, and Reid, N (1987). Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society Series B 49, 1-39.
See https://github.com/gu-mi/NBGOF/wiki/ for more details.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 | ## load package into session:
library(NBGOF)
## basic set-up of the model:
seed = 539
sim = 999
conf.env = 0.95
## basic parameter specifications:
m = 1000 # 1000 genes
n = 6 # 6 samples
s = 1e6 # lib.sizes
offset = log(s) # make sure offset for NBP functions should take the log
## simulate coefficients:
beta = matrix(0, m, 1) # beta0 only
set.seed(seed)
beta[,1] = rnorm(m, 5, 2) - offset # beta0 ~ normal (mean and a based on real RNA-Seq data)
## design matrix (single group only):
x = matrix(rep(1,n))
## specify mean levels:
mu = round(t(s * exp(x %*% t(beta))))
pi = mu/s
## simulate an m-by-n count matrix mimicking a RNA-Seq dataset:
## simulate NB1.8 data with random uniform noise added to the dispersion
alpha1 = -0.2 # NB1.8
phi0 = 0.02
alpha0 = log(phi0)
phi.nbp = phi0 * pi^alpha1
range(phi.nbp)
cbind(mu[,1], phi.nbp[,1])
## add noise:
a = 0.1
set.seed(seed)
phi.noi.vec = phi.nbp[ ,1] * exp(matrix(runif(m, -a, a), nr=m, nc=1))
phi.noi = matrix(phi.noi.vec, nr=m, nc=n)
range(phi.noi)
cbind(mu[,1], phi.noi[,1]) # make sure phi's are in reasonable range
## generate NBP response with added noise:
set.seed(seed)
y = rnbinom(m * n, mu=mu, size=1/phi.noi)
dim(y) = dim(mu)
rownames(y) = paste("g", seq(1,m), sep="")
colnames(y) = paste("s", seq(1,n), sep="")
plot(mu, phi.noi, log="xy")
## GOF tests for different dispersion models, using parallel computing:
## CAUTION: may be time-consuming depending on the size of data and simulations!
nc = detectCores() - 1
fnbp.noip = nb.gof.m(counts=y, x=x, sim=sim, model="NBP", method="MAPL", seed=1, ncores=nc)
fnbq.noip = nb.gof.m(counts=y, x=x, sim=sim, model="NBQ", method="MAPL", seed=1, ncores=nc)
fcom.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Common", method="CoxReid", seed=1, ncores=nc)
fgen.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Genewise", method="auto", seed=1, ncores=nc)
ftgc.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Tagwise-Common", method="CoxReid", seed=1, ncores=nc)
ftgt.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Tagwise-Trend", method="auto", seed=1, ncores=nc)
ftrd.noip = nb.gof.m(counts=y, x=x, sim=sim, model="Trended", method="auto", seed=1, ncores=nc)
## summarize the GOF test results:
summary(fnbp.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(fnbq.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(fcom.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(fgen.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(ftgc.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(ftgt.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
summary(ftrd.noip, conf.env=conf.env, data.note="NB1.8-noise(a=0.1)")
## evaluate GOF p-values using histograms and quantile-quantile uniform plots:
# pdf("plots-NBP-01-1grp.pdf", width=20, height=20)
h1 = diagPlot(fnbp.noip, type="hist", binwidth=0.1)
q1 = diagPlot(fnbp.noip, type="qq")
h2 = diagPlot(fnbq.noip, type="hist", binwidth=0.1)
q2 = diagPlot(fnbq.noip, type="qq")
h3 = diagPlot(fcom.noip, type="hist", binwidth=0.1)
q3 = diagPlot(fcom.noip, type="qq")
h4 = diagPlot(fgen.noip, type="hist", binwidth=0.1)
q4 = diagPlot(fgen.noip, type="qq")
h5 = diagPlot(ftgc.noip, type="hist", binwidth=0.1)
q5 = diagPlot(ftgc.noip, type="qq")
h6 = diagPlot(ftgt.noip, type="hist", binwidth=0.1)
q6 = diagPlot(ftgt.noip, type="qq")
h7 = diagPlot(ftrd.noip, type="hist", binwidth=0.1)
q7 = diagPlot(ftrd.noip, type="qq")
multiplot(h1, q1, h2, q2, h3, q3, h4, q4, h5, q5, h6, q6, h7, q7, cols=4, layout=matrix(seq(1,16), nrow=4, byrow=TRUE))
# dev.off()
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.