fit.mixtNB: Fitting mixtures of Negative Binomials in RNA-Seq data

Description Usage Arguments Details Value Author(s) References Examples

Description

A mixture of K Negative Binomial distributions is fitted on the data with the aim to cluster the genes according to their dispersions. The number of groups must be known in advance.

Usage

1
2
fit.mixtNB(y, cr, K, it = 200, eps = 1e-05, init = NULL, seme = 1, 
           filter = TRUE, quiet = FALSE)

Arguments

y

matrix that contains the (normalized) dataset where the rows contain the set of replicates in the counts in two conditions. Given p the number of genes and n the total number of replicates, the matrix must have dimensions p x n.

cr

a vector with n elements that contains the numerical labels of the conditions

K

the number of mixture components

it

maximum number of iterations for the EM algorithm

eps

a tolerance level for checking the convergence of the EM algorithm

init

a list that may contain the initial values for the EM algorithm. It may contain: a K-vector 'a' that are the sizes or dispersions of the Negative Binomials, 'w' is the vector with the K initial values for the weights of the components; 'lambda' is the matrix of dimension p x 2 with the initial values of the lambda parameters. If init is NULL, a random initialization will be used.

seme

A numerical value to be used in the set.seed function

filter

Logical to indicate the genes with very small counts should be removed

quiet

Logical to indicate if information about the fitting should be provided

Details

A mixture of K components is fitted with the aim of clustering the genes according to their dispersions. Genes with too small number of reads across experiments are filtered out. The default is to filter out genes with no more than 5 reads totally across all experiments, AND with no more than 0.5 reads averagely across all experiments. The EM algorithm stops when the maximum number of iterations are reached or the relative increment of log-likelihood is smaller than eps.

Value

A list containing

y

The filtered data

K

The number of components

cr

Labels denoting the condition for the replicates

cl

Posterior classification of the genes to the components

likelihood

Log-likelihood at each iteration

AIC

Akaike information criterion

BIC

Bayesian information criterion

a

Estimated dispersions

lambda

Estimated means

f.z.y

Estimated posterior probabilities

time.sec

Computational time (in seconds)

variances

Estimated variances

gname

Positions of the filtered genes

Author(s)

Elisabetta Bonafede, Cinzia Viroli

References

E. Bonafede, F. Picard, S. Robin and C. Viroli (2015), Modelling overdispersion heterogeneity in differential expression analysis using mixtures, under revision.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# create a toy data set with 1000 genes, and 5 samples in each of the two conditions. 
# The first 100 genes are DE expressed. The other 900 genes are null. 
 
lambda.de<-matrix(runif(100,0,250),100)
lambda.de=cbind(lambda.de,lambda.de/exp(rnorm(100,0.5,0.125)))
lambda<-rbind(lambda.de,matrix(runif(900,0,250),900,2))
a<-runif(1000,0.5,600)
cr<-rep(1:2,each=5)
y<-matrix(0,1000,10)
for (i in 1:1000) for (l in 1:10) y[i,l]<-rnbinom(1,mu=lambda[i,cr[l]],size=a[i])
fit=fit.mixtNB(y,cr,K=3)

Example output

0 genes has been filtered because they contains too small number of reads across the experiments.
The EM converges after 59 iterations.

Estimation details:
       logL   BIC   AIC EM-iter Elapsed Time
1 -38188.45 90227 80387      59        1.323

mixtNB documentation built on May 2, 2019, 6:53 a.m.