This function implements a differential analysis of RNA-seq data using a Poisson mixture model, where one cluster is fixed to represent genes with equal mean in each experimental condition (i.e., a cluster of non-differentially expressed genes).
counts |
(n x q) matrix of observed counts for n genes and q samples, with row names corresponding to gene IDs |
conds |
Vector of length q defining the condition (treatment group) for each variable (column) in counts |
DEclusters |
Number of clusters to include to represent differentially expressed genes (default value of 4), in addition to the cluster fixed to represent non-differentially expressed genes. |
norm |
The estimator to be used for the library size parameter: “ |
epsilon |
Cutoff used to identify whether the log2-ratio of cluster parameters between conditions is sufficiently large to be declared as differentially expressed, with default value 0.8 |
EM.verbose |
If |
... |
Additional parameters to be passed to the HTSCluster package, if desired. These include notably the following: 1) |
In a Poisson mixture model, the data y are assumed to come from g distinct subpopulations (clusters), each of which is modeled separately; the overall population is thus a mixture of these subpopulations. In the case of a Poisson mixture model with g components, the model may be written as
f(y;g,ψ_g) = ∏_{i=1}^n ∑_{k=1}^g π_k ∏_{j=1}^{d}∏_{l=1}^{r_j} P(y_{ijl} ; θ_k)
for i = 1, …, n observations in l = 1, …, r_j replicates of j = 1, …, d conditions (treatment groups), where P(·) is the standard Poisson density, ψ_g = (π_1,…,π_{g-1}, θ′), θ′ contains all of the parameters in θ_1,…,θ_g assumed to be distinct, and π = (π_1,…,π_g)′ are the mixing proportions such that π_k is in (0,1) for all k and ∑_k π_k = 1. The mean of y_{ijl} for a gene in cluster k is parameterized as
μ_{ijlk} = w_i s_{jl} λ_{jk}
where w_i is the overall expression level of gene i, λ_k = (λ_{1k},…,λ_{dk}) contains the clustering parameters that define the profile of cluster k across conditions, and s_{jl} is the normalized library size (a fixed constant) for replicate l of condition j. See Rau et al. (2011) for more details on this model, including parameter estimation, algorithm initialization, and model selection.
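As an illustration of the model above, the following base R sketch evaluates the mixture log-likelihood for a small toy data set. It is not the HTSDiff or HTSCluster implementation: the function name mixture_loglik and all parameter values are hypothetical, and taking w_i as the row sums of the counts and s_{jl} as column-sum proportions is an assumption made only for this example.

```r
## Toy inputs; every value below is made up for illustration.
set.seed(1)
n <- 5
conds <- c(1, 1, 2, 2)                             # condition of each column
y <- matrix(rpois(n * 4, lambda = 20), nrow = n)   # n x q matrix of counts
w <- rowSums(y)                                    # assumed w_i: total count per gene
s <- colSums(y) / sum(y)                           # assumed s_jl: library size proportions
lambda <- cbind(c(1, 1), c(1.5, 0.5))              # d x g matrix; column k = cluster k
pi_k <- c(0.7, 0.3)                                # mixing proportions

## Log-likelihood of the Poisson mixture with means mu_ijlk = w_i * s_jl * lambda_jk.
mixture_loglik <- function(y, conds, w, s, lambda, pi_k) {
  g <- length(pi_k)
  ## n x g matrix: log(pi_k) + sum over columns of log P(y_ijl; mu_ijlk)
  log_terms <- sapply(seq_len(g), function(k) {
    mu <- outer(w, s * lambda[conds, k])           # n x q matrix of means
    logp <- matrix(dpois(y, mu, log = TRUE), nrow = nrow(y))
    log(pi_k[k]) + rowSums(logp)
  })
  ## Sum over clusters on the log scale (log-sum-exp) for numerical stability
  m <- apply(log_terms, 1, max)
  sum(m + log(rowSums(exp(log_terms - m))))
}

ll <- mixture_loglik(y, conds, w, s, lambda, pi_k)
```

Working on the log scale matters here because the product of many Poisson densities underflows quickly for realistic count data.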
In the case of differential analysis, we fix one of the clusters (typically the first, although this choice is arbitrary) to represent non-differentially expressed genes, i.e., λ_{11} = ... = λ_{d1} = 1. By default, the number of remaining clusters (DEclusters) is 4, although this choice may be modified by the user. In addition to the fixed cluster, clusters for which the absolute value of log_2(λ_{1k} / λ_{2k}) is less than epsilon (default value 0.8) are also considered to represent non-differentially expressed genes.
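The epsilon rule can be sketched in a few lines of base R. The λ values below are hypothetical (two conditions, one fixed cluster plus four others), and the object names are chosen for this example only; they are not part of the package.

```r
## Hypothetical lambda estimates: rows = conditions, columns = clusters.
## Column 1 is the cluster fixed at lambda = 1 in both conditions.
lambda <- rbind(c(1, 1.10, 1.60, 0.40, 1.95),
                c(1, 0.90, 0.40, 1.60, 0.05))
epsilon <- 0.8

## log2-ratio of the cluster parameters between the two conditions
log2_ratio <- log2(lambda[1, ] / lambda[2, ])

## Clusters whose absolute log2-ratio falls below epsilon are treated as non-DE
nonDE_cluster <- abs(log2_ratio) < epsilon
nonDE_cluster[1] <- TRUE   # the fixed cluster is non-DE by construction
```

In this toy example the second cluster has a log2-ratio of about 0.29, so it is grouped with the fixed cluster as non-differentially expressed.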
Following clustering, a gene is declared differentially expressed if its conditional probability to be non-differentially expressed (i.e., to belong to a cluster of non-differentially expressed genes) is less than 1e-8.
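A minimal sketch of this decision rule, assuming that the probability of being non-differentially expressed is obtained by summing a gene's conditional probabilities over all non-DE clusters. The probability matrix and gene names are hypothetical, and this is not necessarily how the package computes the result internally.

```r
## Hypothetical conditional probabilities of cluster membership (genes x clusters)
probs <- rbind(geneA = c(0.98, 0.02,  0.0, 0.0,   0),
               geneB = c(0,    1e-10, 0.7, 0.3,   0),
               geneC = c(1e-3, 0,     0.0, 0.999, 0))

## Which clusters represent non-DE genes (fixed cluster plus those below epsilon)
nonDE_cluster <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

## Probability that each gene belongs to any non-DE cluster
prob_nonDE <- rowSums(probs[, nonDE_cluster, drop = FALSE])

## Declare a gene DE when that probability is below the 1e-8 threshold
is_DE <- prob_nonDE < 1e-8
```

Only geneB is declared differentially expressed here; geneC is assigned mostly to a DE cluster but keeps a probability of 1e-3 of being non-DE, which is above the threshold.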
res |
Results data frame containing the following information: |
PMM |
Object of class |
iterations |
Number of iterations run |
logLikeDiff |
Difference in log-likelihood between the last and penultimate iterations of the algorithm |
Andrea Rau <andrea.rau@jouy.inra.fr>
S. Balzergue, G. Rigaill, V. Brunaud, E. Blondet, A. Rau, O. Rogier, J. Caius, C. Maugis-Rabusseau, L. Soubigou-Taconnat, S. Aubourg, C. Lurin, E. Delannoy, and M.-L. Martin-Magniette. (2014) HTSDiff: A Model-Based Clustering Alternative to Test-Based Methods in Differential Gene Expression Analyses by RNA-Seq Benchmarked on Real and Synthetic Datasets (submitted).
set.seed(12345)
## Generate synthetic data: 2000 genes under H0
test <- syntheticData(H0number = 2000)
## Mixture model differential analysis
## DEtest <- HTSDiff(test, c(1,1,2,2))