The "bimodality index" is a continuous measure of the extent to which a set of (univariate) data fits a two-component mixture model. The score is larger if the two components are balanced in size or if the separation between the two modes is larger.
bimodalIndex(dataset, verbose=TRUE)
dataset
A matrix or data.frame, usually with columns representing samples and rows representing genes or proteins.
verbose
A logical value; should the function output a stream of information while it is working?
Identifying genes with bimodal expression patterns from large-scale expression profiling data is an important analytical task, which is often addressed using model-based clustering. That technique commonly uses the Bayesian information criterion (BIC) or the Akaike information criterion (AIC) for model selection. In practice, however, BIC and AIC appear to be overly sensitive and may lead to the identification of bimodally expressed genes that are unreliable or not clinically useful. We propose using a novel criterion, the bimodality index, not only to identify but also to rank meaningful and reliable bimodal patterns.
We model the data as a mixture
π N(μ_1, σ) + (1 - π) N(μ_2, σ)
of two normal components with a common standard deviation. We define the standardized distance between the two means to be
δ = \frac{|μ_1 - μ_2|}{σ}.
We then define the bimodality index as
BI = δ \sqrt{π(1 - π)}.
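As a quick check of the definition, the index can be computed directly from the four mixture parameters. The following is a minimal sketch in base R; the helper name `bi_from_params` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): compute the bimodality
# index from the parameters of a two-component normal mixture with a
# common standard deviation.
bi_from_params <- function(mu1, mu2, sigma, pi) {
  stopifnot(sigma > 0, pi >= 0, pi <= 1)
  delta <- abs(mu1 - mu2) / sigma   # standardized distance between the means
  delta * sqrt(pi * (1 - pi))       # BI = delta * sqrt(pi * (1 - pi))
}

# Same separation (delta = 3), different balance between the components
balanced   <- bi_from_params(mu1 = 0, mu2 = 3, sigma = 1, pi = 0.5)  # 1.5
unbalanced <- bi_from_params(mu1 = 0, mu2 = 3, sigma = 1, pi = 0.1)  # 0.9
```

With the separation held fixed, the balanced mixture scores higher, which matches the statement that the index grows when the two components are closer in size.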
The bimodality index can be computed by first estimating the model parameters, using either a mixture model-based algorithm such as Mclust or Markov chain Monte Carlo (MCMC) techniques. In this package, we rely on the Mclust implementation.
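To make the estimation step concrete, here is a rough sketch of how the parameters for a single gene might be obtained with the mclust package and turned into a bimodality index. It assumes mclust is installed; the helper name `bi_one_gene` and the simulated data are illustrative only, and the package's own code may differ in its details.

```r
library(mclust)

# Hypothetical helper: fit a two-component, equal-variance normal mixture
# to one vector of expression values and derive delta and BI from the fit.
bi_one_gene <- function(x) {
  fit   <- Mclust(x, G = 2, modelNames = "E", verbose = FALSE)
  mu    <- fit$parameters$mean                    # the two component means
  sigma <- sqrt(fit$parameters$variance$sigmasq)  # common standard deviation
  pi    <- fit$parameters$pro[1]                  # mixing proportion of component 1
  delta <- abs(diff(mu)) / sigma
  c(mu1 = mu[[1]], mu2 = mu[[2]], sigma = sigma,
    delta = delta[[1]], pi = pi, BI = delta[[1]] * sqrt(pi * (1 - pi)))
}

# Simulated example: 100 values centred at 0 and 100 centred at 4
set.seed(1)
x   <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1))
est <- bi_one_gene(x)
est
```

The model name "E" requests a univariate mixture with equal variances, which corresponds to the common standard deviation in the model above.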
In the paper by Wang et al. referenced below, we provide a statistical justification for the definition of the bimodality index, based on considerations of power and sample size. Theoretical considerations suggest that, in the limit over the number of samples, a bimodality index of 1.1 or greater is likely to indicate a "useful" bimodal pattern of expression. Higher cutoffs are needed when there are relatively few samples, and can be chosen by simulating from the null distribution. We carried out simulation studies and applied the method to real data from a lung cancer gene expression profiling study. Our findings suggest that BIC behaves like a lax cutoff based on the bimodality index (much smaller than 1), and that the bimodality index provides an objective measure to identify and rank meaningful and reliable bimodal patterns from large-scale gene expression datasets.
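One way to pick a cutoff for a small study is sketched below: simulate unimodal data with the same number of samples, compute the bimodality index for each simulated gene, and take an upper quantile. The sample size, the number of simulated genes, the standard normal null, and the 99% quantile are all assumptions chosen for illustration, not recommendations from the paper.

```r
library(BimodalIndex)

set.seed(2009)
n_samples <- 20    # hypothetical small study
n_genes   <- 200   # number of simulated null "genes"

# Null model (an assumption): every gene is unimodal, drawn from a single
# standard normal distribution.
null_data <- matrix(rnorm(n_genes * n_samples),
                    nrow = n_genes, ncol = n_samples)

null_bi <- bimodalIndex(null_data, verbose = FALSE)

# Use an upper quantile of the null BI values as an empirical cutoff
cutoff <- quantile(null_bi$BI, probs = 0.99, na.rm = TRUE)
cutoff
```

Genes in the real data whose index exceeds `cutoff` would then be the candidates for a bimodal pattern at this sample size.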
Returns a data frame containing six columns, with the rows corresponding to the rows of the original data set. The columns contain the four parameters from the normal mixture model (mu1, mu2, sigma, and pi) along with the standardized distance delta and the bimodal index BI.
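Because the result is an ordinary data frame, ranking and filtering need only base R. The sketch below uses a small hand-made data frame with the six columns described above; the gene names and values are invented for illustration, and the 1.1 threshold is the large-sample guideline mentioned in the details.

```r
# Made-up values for illustration only; a real result would come from
# bimodalIndex(). Column names follow the return value described above.
bi <- data.frame(
  mu1   = c(5.0, 4.2, 6.1, 3.3),
  mu2   = c(5.5, 8.2, 7.0, 9.3),
  sigma = c(0.5, 1.0, 0.6, 1.0),
  delta = c(1.0, 4.0, 1.5, 6.0),
  pi    = c(0.5, 0.5, 0.2, 0.1),
  BI    = c(0.5, 2.0, 0.6, 1.8),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Rank genes from most to least bimodal
ranked <- bi[order(bi$BI, decreasing = TRUE), ]

# Keep genes at or above the large-sample guideline, dropping failed fits (NA)
candidates <- ranked[!is.na(ranked$BI) & ranked$BI >= 1.1, ]
rownames(candidates)
```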
Kevin R. Coombes krc@silicovore.com
Wang J, Wen S, Symmans WF, Pusztai L, Coombes KR.
The bimodality index: A criterion for discovering and ranking bimodal
signatures from cancer gene expression profiling data.
Cancer Informatics, 2009 Aug 5; 7:199–216.
library(BimodalIndex)  # provides bimodalIndex()
library(oompaData)     # provides the lungData example data
data(lungData)         # loads lung.dataset
bi <- bimodalIndex(lung.dataset, verbose=FALSE)
summary(bi)
      mu1                mu2             sigma            delta
 Min.   : 0.02019   Min.   : 2.465   Min.   :0.1584   Min.   : 0.2025
 1st Qu.: 4.35765   1st Qu.: 5.716   1st Qu.:0.3718   1st Qu.: 0.9883
 Median : 6.22044   Median : 7.291   Median :0.4587   Median : 2.0408
 Mean   : 6.03109   Mean   : 7.147   Mean   :0.5171   Mean   : 2.1591
 3rd Qu.: 7.81944   3rd Qu.: 8.816   3rd Qu.:0.6458   3rd Qu.: 2.8737
 Max.   :11.55071   Max.   :12.410   Max.   :1.2767   Max.   :16.1466
 NA's   :1          NA's   :1        NA's   :1        NA's   :1
       pi                 BI
 Min.   :0.007425   Min.   :0.1010
 1st Qu.:0.126952   1st Qu.:0.3835
 Median :0.481653   Median :0.6274
 Mean   :0.442838   Mean   :0.6552
 3rd Qu.:0.659508   3rd Qu.:0.8293
 Max.   :0.993248   Max.   :3.4913
 NA's   :1          NA's   :1