fit.mixtNB: Fitting mixtures of Negative Binomials in RNA-Seq data
In mixtNB: DE Analysis of RNA-Seq Data by Mixtures of NB

Description Usage Arguments Details Value Author(s) References Examples

A mixture of K Negative Binomial distributions is fitted on the data with the aim to cluster the genes according to their dispersions. The number of groups must be known in advance.

1 2	fit.mixtNB(y, cr, K, it = 200, eps = 1e-05, init = NULL, seme = 1, filter = TRUE, quiet = FALSE)

`y`	matrix that contains the (normalized) dataset where the rows contain the set of replicates in the counts in two conditions. Given p the number of genes and n the total number of replicates, the matrix must have dimensions p x n.
`cr`	a vector with n elements that contains the numerical labels of the conditions
`K`	the number of mixture components
`it`	maximum number of iterations for the EM algorithm
`eps`	a tolerance level for checking the convergence of the EM algorithm
`init`	a list that may contain the initial values for the EM algorithm. It may contain: a K-vector 'a' that are the sizes or dispersions of the Negative Binomials, 'w' is the vector with the K initial values for the weights of the components; 'lambda' is the matrix of dimension p x 2 with the initial values of the lambda parameters. If init is NULL, a random initialization will be used.
`seme`	A numerical value to be used in the set.seed function
`filter`	Logical to indicate the genes with very small counts should be removed
`quiet`	Logical to indicate if information about the fitting should be provided

A mixture of K components is fitted with the aim of clustering the genes according to their dispersions. Genes with too small number of reads across experiments are filtered out. The default is to filter out genes with no more than 5 reads totally across all experiments, AND with no more than 0.5 reads averagely across all experiments. The EM algorithm stops when the maximum number of iterations are reached or the relative increment of log-likelihood is smaller than eps.

A list containing

`y`	The filtered data
`K`	The number of components
`cr`	Labels denoting the condition for the replicates
`cl`	Posterior classification of the genes to the components
`likelihood`	Log-likelihood at each iteration
`AIC`	Akaike information criterion
`BIC`	Bayesian information criterion
`a`	Estimated dispersions
`lambda`	Estimated means
`f.z.y`	Estimated posterior probabilities
`time.sec`	Computational time (in seconds)
`variances`	Estimated variances
`gname`	Positions of the filtered genes

Elisabetta Bonafede, Cinzia Viroli

E. Bonafede, F. Picard, S. Robin and C. Viroli (2015), Modelling overdispersion heterogeneity in differential expression analysis using mixtures, under revision.

# create a toy data set with 1000 genes, and 5 samples in each of the two conditions. 
# The first 100 genes are DE expressed. The other 900 genes are null. 
 
lambda.de<-matrix(runif(100,0,250),100)
lambda.de=cbind(lambda.de,lambda.de/exp(rnorm(100,0.5,0.125)))
lambda<-rbind(lambda.de,matrix(runif(900,0,250),900,2))
a<-runif(1000,0.5,600)
cr<-rep(1:2,each=5)
y<-matrix(0,1000,10)
for (i in 1:1000) for (l in 1:10) y[i,l]<-rnbinom(1,mu=lambda[i,cr[l]],size=a[i])
fit=fit.mixtNB(y,cr,K=3)

0 genes has been filtered because they contains too small number of reads across the experiments.
The EM converges after 59 iterations.

Estimation details:
       logL   BIC   AIC EM-iter Elapsed Time
1 -38188.45 90227 80387      59        1.323