intSEQ: Implement integrated likelihood ratio test.

Description

intSEQ implements the integrated likelihood ratio test for RNA-seq count data and returns p-values (along with other per-gene statistics) corresponding to the rows of count.data.

Usage

## Default S3 method:
intSEQ(count.data, condition, nullcondition = NULL, nneighbour = 400,
       lambda1 = ncol(count.data), lambda2 = 0.05,
       meanmeth = c("estimator", "local.mean"),
       smoothmethod = c("loess", "spline", "no"),
       normalize = TRUE, offsets = NULL, weights = NULL, constadj = FALSE,
       w1 = max(1 - ncol(count.data) / 100, 0),
       w2 = max(1 - ncol(count.data) / 1000, 0), ...)

## S3 method for class 'DGEList'
intSEQ(count.data, nullcondition = NULL, nneighbour = 400,
       lambda1 = ncol(count.data), lambda2 = 0.05,
       meanmeth = c("estimator", "local.mean"),
       smoothmethod = c("loess", "spline", "no"),
       normalize = TRUE, offsets = NULL, weights = NULL, constadj = FALSE,
       w1 = max(1 - ncol(count.data) / 100, 0),
       w2 = max(1 - ncol(count.data) / 1000, 0), ...)

Arguments

count.data

The count table of RNA-seq data: either a numeric matrix or a DGEList object created by the DGEList function in edgeR.

condition

The conditions of the RNA-seq data under the full model: either a vector whose length equals the number of columns of count.data, or a data frame whose number of rows equals the number of columns of count.data.

nullcondition

The conditions of the RNA-seq data under the null model: either a vector whose length equals the number of columns of count.data, or a data frame whose number of rows equals the number of columns of count.data. The default value, NULL, means the global null model.

nneighbour

Number of neighbours selected to estimate the mean and variance of the normal prior.

lambda1

The first parameter of the prior variance (λ_1, the multiplier of s^2); see Details.

lambda2

The second parameter of the prior variance (λ_2, the additive term); see Details.

meanmeth

Whether to use the estimated dispersion ("estimator") or the local mean ("local.mean") as the mean of the normal prior.

smoothmethod

The smoothing method used to estimate the mean-dispersion trend.

normalize

Whether to use the "TMM" method to normalize the count data.

offsets

Offsets for the log-linear models. Should be a vector of length ncol(count.data).

weights

Optional numeric matrix giving prior weights for the observations (for each library and gene) to be used in the GLM calculations.

constadj

Whether to multiply the joint distribution by a constant to mitigate numerical underflow.

w1

Weight on the local mean when setting the location shift of the quadrature; see Details.

w2

Weight used when setting the rescaling of the quadrature; see Details.

...

Other arguments that are currently not used.

Details

intSEQ implements the integrated likelihood ratio test of RNA-seq data developed by (the new paper).

We first use estimateGLMTrendedDisp and estimateGLMTagwiseDisp in edgeR to estimate the Cox-Reid dispersion. A mean-dispersion trend is then estimated with either loess or a spline.

For each gene, find the nneighbour genes closest to it in mean (the default in Usage is 400). Estimate the local mean of the dispersion from the smoothing fit. Estimate the sample standard deviation by s = \max\{\hat{σ}, 0.05\}, where \hat{σ} is the square root of the MSE of the smoothing fit.
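The neighbour and smoothing steps above can be sketched in base R. This is only an illustration on simulated values, not the package's internal code: the variable names, the simulated trend, the log scales, and the choice of the fitted values as the quantity averaged over the neighbours are all assumptions made for the example.

```r
set.seed(1)
n_genes  <- 500
log_mean <- sort(runif(n_genes, 1, 8))                      # simulated log mean expression
log_disp <- -1 - 0.3 * log_mean + rnorm(n_genes, sd = 0.4)  # simulated log dispersion

# Smooth the mean-dispersion trend (loess is one of the smoothmethod options)
fit <- loess(log_disp ~ log_mean)

# sigma_hat: square root of the MSE of the smoothing fit, floored at 0.05
sigma_hat <- sqrt(mean(residuals(fit)^2))
s <- max(sigma_hat, 0.05)

# Neighbours of one gene with respect to the mean; k plays the role of
# nneighbour, and the gene itself is included (distance zero)
k <- 100
g <- 250
neighbours <- order(abs(log_mean - log_mean[g]))[seq_len(k)]

# Local mean of the dispersion: average of the smoothed values over the neighbours
local_mean <- mean(fitted(fit)[neighbours])
```

With these quantities, s and local_mean are the inputs that the normal prior in the next step would need.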

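Let the prior be π(θ) = N(\bar{θ}, λ_1 s^2 + λ_2), where \bar{θ} is the prior mean selected by meanmeth, and λ_1 and λ_2 are the arguments lambda1 and lambda2.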

Calculate the integrated likelihood under the null and the alternative, L_0 = \int L(Y | μ_0, θ) π(θ) dθ and L_a = \int L(Y | μ_a, θ) π(θ) dθ, where μ_0 and μ_a are the MLEs of the mean under the null and the alternative. Then compare -2(\log L_0 - \log L_a) with a chi-square distribution.
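A minimal base-R sketch of this calculation for a single gene follows. It rests on several assumptions that this page does not confirm: θ is taken to be the log dispersion of a negative binomial likelihood, the counts and prior values are made up, the test uses 1 degree of freedom (two groups), and the quadrature is centred at the prior mean and scaled by the prior variance. The helper gauss_hermite is hypothetical and is not a function of intSEQ.

```r
# Gauss-Hermite nodes and weights (weight function exp(-x^2)) via Golub-Welsch
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}
gh <- gauss_hermite(20)

# Toy counts for one gene: two groups of five samples (made-up numbers)
y     <- c(12, 15, 9, 14, 11, 25, 31, 22, 28, 27)
group <- rep(0:1, each = 5)

# MLEs of the mean: a common mean under the null, group means under the alternative
mu0 <- rep(mean(y), length(y))
mua <- ave(y, group)

# Prior N(theta_bar, lambda1 * s^2 + lambda2) with made-up theta_bar and s
theta_bar <- log(0.1)
prior_var <- 10 * 0.05^2 + 0.05

# log of the integrated likelihood, computed on the log scale to avoid underflow
log_int_lik <- function(mu) {
  theta <- theta_bar + sqrt(2 * prior_var) * gh$nodes
  ll <- sapply(theta, function(t)
    sum(dnbinom(y, mu = mu, size = exp(-t), log = TRUE)))
  m <- max(ll)
  m + log(sum(gh$weights * exp(ll - m))) - 0.5 * log(pi)
}

stat <- -2 * (log_int_lik(mu0) - log_int_lik(mua))
pval <- pchisq(stat, df = 1, lower.tail = FALSE)
```

Working with log-likelihoods and subtracting the maximum before exponentiating is one way to handle the underflow that constadj is meant to address; the package may do this differently.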

Gauss-Hermite quadrature is used for the numerical integration, with a location shift by the mean and a rescaling by the variance. The mean is defined by mu = w1*local_mean + (1-w1)*raw_estimator. The default of w1 is max(1-p/100, 0), where p is the sample size. The variance is defined by w2*1 + (1-w2)*sqrt(1/p). The default of w2 is max(1-p/1000, 0).
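To make the defaults concrete, the weights and the resulting shift and scale can be computed by hand. The sketch below assumes p = ncol(count.data), as in the Usage defaults; the helper default_weights and the values of local_mean and raw_estimator are hypothetical and only serve the illustration.

```r
# Hypothetical helper (not part of intSEQ): default weights for sample size p
default_weights <- function(p) {
  c(w1 = max(1 - p / 100, 0), w2 = max(1 - p / 1000, 0))
}

p <- 10
w <- default_weights(p)

# Location shift: weighted mix of the local mean and the raw estimator
local_mean    <- 0.20   # made-up value
raw_estimator <- 0.35   # made-up value
mu <- w[["w1"]] * local_mean + (1 - w[["w1"]]) * raw_estimator

# Rescaling term, as written in the Details
scale_term <- w[["w2"]] * 1 + (1 - w[["w2"]]) * sqrt(1 / p)

# With 100 or more samples w1 drops to 0, so only the raw estimator is used
w_large <- default_weights(200)
```

Under these defaults, small experiments lean on the local mean, and the raw per-gene estimate takes over as the number of samples grows.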

Value

An object of class "intres": a list containing 3 objects. The first, "restable", stores most of the information; the last two are for internal use only. "restable" contains the following:

logFC

The log fold change.

logCPM

The log-average concentration/abundance for each tag in the two groups being compared.

FDR

The FDR adjusted p values of integrated likelihood.

intLR

The integrated likelihood ratio statistics.

intPValue

The integrated likelihood ratio p values.

ordinaryLR

The ordinary likelihood ratio statistics.

ordinaryPValue

The ordinary likelihood ratio p values.

"parameters" contains the fitted values of the count table and the estimated dispersion parameters. "cond" stores the conditions (covariates) of the RNA-seq data.
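The relationship between the statistic, p-value, and FDR columns of "restable" might look like the sketch below. It uses made-up statistics and assumes a chi-square reference distribution with 1 degree of freedom and the Benjamini-Hochberg adjustment; this page does not state which degrees of freedom or which adjustment method intSEQ actually uses.

```r
# Made-up likelihood ratio statistics for five genes
lr <- c(0.2, 3.1, 7.9, 15.4, 1.1)

# Upper-tail chi-square p-values (df = 1 is an assumption)
p <- pchisq(lr, df = 1, lower.tail = FALSE)

# FDR adjustment (Benjamini-Hochberg is an assumption)
fdr <- p.adjust(p, method = "BH")

toy_table <- data.frame(intLR = lr, intPValue = p, FDR = fdr)
toy_table[order(toy_table$FDR), ]
```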

Author(s)

Yilun Zhang, David M. Rocke

References

our paper

Examples

# Select the first 10 columns of the mont&pick data
data(count.data)
data(condition)
count <- count.data[, 1:10]
cond <- rep(0:1, each = 5)   # two groups of five samples
res <- intSEQ(count, cond)
res$restable[1:20, ]         # results for the first 20 rows

# Do the same thing with a DGEList object
library(edgeR)
dge <- DGEList(counts = count, group = cond)
res2 <- intSEQ(dge)
res2$restable[1:20, ]

lunge111/intSEQ documentation built on May 20, 2019, 9:38 a.m.