timeSeq: Statistical Inference for Time Course RNA-Seq Data using a...
In timeSeq: Detecting Differentially Expressed Genes in Time Course RNA-Seq Data


Accurately identifying differentially expressed (DE) genes from time course RNA-seq data is of great significance for building a global picture of cellular function. DE genes in time course RNA-seq data can be classified into two types: parallel DE (PDE) genes and non-parallel DE (NPDE) genes. The former are often biologically irrelevant, whereas the latter are often biologically interesting. This package implements a negative binomial mixed-effects (NBME) model to identify both PDE and NPDE genes in time course RNA-seq data.
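The distinction between the two types can be illustrated with a small simulation. The sketch below is only a toy example and not the NBME model fitted by the package: it assumes that "parallel" means the two conditions share the same trajectory shape on the log scale and differ by a constant shift, while "non-parallel" means the shapes themselves differ. All variable names, trajectory shapes, and the dispersion value are hypothetical.

```r
# Toy illustration only; this is NOT the NBME model fitted by timeSeq.
set.seed(1)
time <- 1:8

# Log-mean trajectory for condition one (hypothetical shape).
log_mu_cond1 <- 3 + 0.3 * time

# PDE (assumed definition): same shape in condition two, shifted by a constant.
log_mu_pde <- log_mu_cond1 + 1

# NPDE (assumed definition): condition two follows a different shape over time.
log_mu_npde <- 3 + 0.3 * time - 0.05 * time^2

# Draw negative binomial counts around a trajectory; `size` is the
# dispersion parameter of rnbinom() (hypothetical value).
simulate_counts <- function(log_mu, size = 10) {
  rnbinom(length(log_mu), mu = exp(log_mu), size = size)
}
counts_cond1 <- simulate_counts(log_mu_cond1)
counts_pde   <- simulate_counts(log_mu_pde)
counts_npde  <- simulate_counts(log_mu_npde)

# The gap between conditions is constant for PDE and varies for NPDE.
gap_pde  <- log_mu_pde - log_mu_cond1
gap_npde <- log_mu_npde - log_mu_cond1
```

Plotting `gap_pde` and `gap_npde` against `time` shows a flat line in the first case and a curve in the second.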

timeSeq(data.count, group.label, gene.names, exon.length = NULL, exon.level = FALSE, pvalue = TRUE)

`data.count`	an n by p matrix of expression values. Data should be appropriately normalized beforehand.
`group.label`	a vector indicating the experimental condition of each time point.
`gene.names`	a vector containing all the gene names.
`exon.length`	a vector containing the lengths of the exons; only used for exon-level data.
`exon.level`	logical, indicating whether this is an exon-level dataset. Default is FALSE.
`pvalue`	logical, indicating whether p-values are returned. Default is TRUE.
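Before fitting, it can help to check that the inputs agree with each other. The following is a minimal sketch with toy data, assuming that the rows of `data.count` correspond to the entries of `gene.names` and that its columns correspond to the time points labelled by `group.label`; that layout is an assumption here, not something stated above. The dimensions and Poisson counts are hypothetical.

```r
# Hypothetical toy inputs; object names mirror the timeSeq() arguments.
set.seed(42)
n_rows <- 4        # assumed: one row per entry of gene.names
n_timepoints <- 6  # assumed: three time points per condition

# Assumption: rows match gene.names, columns are time points.
data.count <- matrix(rpois(n_rows * n_timepoints, lambda = 20),
                     nrow = n_rows, ncol = n_timepoints)
group.label <- c(rep(1, 3), rep(2, 3))
gene.names <- paste0("gene", seq_len(n_rows))

# Consistency checks under the assumed layout.
stopifnot(
  is.matrix(data.count),
  length(group.label) == ncol(data.count),
  length(gene.names) == nrow(data.count)
)
```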

Detects non-parallel differentially expressed (NPDE) genes and parallel differentially expressed (PDE) genes.

A list with components

`sorted`	an object returned by the timeSeq.sort function. It contains the sorted Kullback-Leibler ratios (KLRs) or p-values used to identify DE genes.
`count`	the number of exons or replicates for each gene.
`NPDE`	the NPDE ratios or p-values.
`PDE`	the PDE ratios or p-values.
`genenames`	gene names.
`table`	gene expression values.
`data`	an n by p matrix of expression values.
`gene.names`	a vector including all the gene names.
`group.label`	a vector indicating the experimental conditions of each time point.
`group.length`	the total number of time points.
`group1.length`	the number of time points of condition one.
`group2.length`	the number of time points of condition two.
`exon.level`	logical, indicating whether this is an exon-level dataset. Default is FALSE.
`pvalue`	logical, indicating whether p-values are returned. Default is TRUE.
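When `pvalue = TRUE`, the `NPDE` component can be post-processed like any other vector of p-values. The sketch below is one possible approach rather than part of the package: it builds a mock result list from the ten p-values shown in the example output, attaches hypothetical gene names, and applies a Benjamini-Hochberg adjustment with base R's `p.adjust`. The 0.05 cutoff is an arbitrary choice for illustration.

```r
# Mock result with only the NPDE and genenames components.
# The p-values are those printed in the example output; the gene names
# are hypothetical.
fit <- list(
  NPDE = c(4.056052e-102, 1.601623e-216, 8.095688e-86, 3.998878e-112,
           8.449241e-278, 9.973802e-01, 7.780800e-01, 9.026850e-01,
           1.000000e+00, 9.278024e-01),
  genenames = paste0("gene", 1:10)
)

# Adjust for multiple testing (Benjamini-Hochberg).
padj <- p.adjust(fit$NPDE, method = "BH")

# Rank genes from the smallest to the largest raw p-value.
ranked <- data.frame(gene = fit$genenames, pvalue = fit$NPDE, padj = padj,
                     stringsAsFactors = FALSE)
ranked <- ranked[order(ranked$pvalue), ]

# Candidate NPDE genes at an (arbitrary) adjusted threshold of 0.05.
npde_hits <- ranked$gene[ranked$padj < 0.05]
```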

Fan Gao and Xiaoxiao Sun

Sun, Xiaoxiao, David Dalpiaz, Di Wu, Jun S. Liu, Wenxuan Zhong, and Ping Ma. "Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model." BMC Bioinformatics, 17(1):324, 2016.

Chong Gu. Model diagnostics for smoothing spline ANOVA models. Canadian Journal of Statistics, 32(4):347-358, 2004.

Chong Gu. Smoothing spline ANOVA models. Springer, second edition, 2013.

Chong Gu and Ping Ma. Optimal smoothing in nonparametric mixed-effect models. Annals of Statistics, 1357-1379, 2005.

Simon N. Wood. mgcv: GAMs and generalized ridge regression for R. R News, 1(2):20-25, 2001.

## Data should be appropriately normalized beforehand.
library(timeSeq)

## Exon-level data (p-value calculation is not supported)
data(pAbp)
attach(pAbp)
model.fit <- timeSeq(data.count, group.label, gene.names, exon.length,
                     exon.level = TRUE, pvalue = FALSE)
# NPDE genes have large KLRs
model.fit$NPDE
detach(pAbp)

## Gene-level data (three replicates)
data(simulate.dt)
attach(simulate.dt)
model.fit <- timeSeq(data.count, group.label, gene.names,
                     exon.level = FALSE, pvalue = TRUE)
# p-values
model.fit$NPDE
detach(simulate.dt)

There were 50 or more warnings (use warnings() to see the first 50)
[1] 0.8404626
 [1] 4.056052e-102 1.601623e-216  8.095688e-86 3.998878e-112 8.449241e-278
 [6]  9.973802e-01  7.780800e-01  9.026850e-01  1.000000e+00  9.278024e-01