cd09-0-bimIndex: The Bimodality Index


Description

The "bimodality index" is a continuous measure of the extent to which a set of (univariate) data fits a two-component mixture model. The index is larger when the two components are more balanced in size and when the separation between the two modes is larger.

Usage

bimodalIndex(dataset, verbose=TRUE)

Arguments

dataset

A matrix or data.frame, usually with columns representing samples and rows representing genes or proteins.

verbose

A logical value; should the function output a stream of information while it is working?

Details

Identifying genes with bimodal expression patterns from large-scale expression profiling data is an important analytical task, which is often addressed using model-based clustering. That technique commonly uses the Bayesian information criterion (BIC) or the Akaike information criterion (AIC) for model selection. In practice, however, BIC and AIC appear to be overly sensitive and may lead to the identification of bimodally expressed genes that are unreliable or not clinically useful. We propose using a novel criterion, the bimodality index, not only to identify but also to rank meaningful and reliable bimodal patterns.

We model the data as a mixture

π N(μ_1, σ) + (1 - π) N(μ_2, σ)

of two normal components with a common standard deviation. We define the standardized distance between the two means to be

δ = \frac{|μ_1 - μ_2|}{σ}.

We then define the bimodality index as

BI = δ \sqrt{π (1 - π)}.
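
As a small worked illustration of the two definitions above, the following base R sketch computes δ and BI from given mixture parameters. The helper name `bi_from_params` is hypothetical and is not part of the package, and the parameter values are made up for illustration.

```r
# Hypothetical helper (not part of the BimodalIndex package):
# compute delta and BI directly from the mixture parameters.
bi_from_params <- function(mu1, mu2, sigma, pi) {
  delta <- abs(mu1 - mu2) / sigma            # standardized distance
  c(delta = delta, BI = delta * sqrt(pi * (1 - pi)))
}

# Same separation, balanced versus unbalanced components.
balanced   <- bi_from_params(mu1 = 0, mu2 = 3, sigma = 1, pi = 0.5)  # BI = 1.5
unbalanced <- bi_from_params(mu1 = 0, mu2 = 3, sigma = 1, pi = 0.1)  # BI = 0.9

# Same imbalance, twice the separation.
wider      <- bi_from_params(mu1 = 0, mu2 = 6, sigma = 1, pi = 0.1)  # BI = 1.8
```

With the separation held fixed, the balanced mixture scores higher than the unbalanced one, and doubling the separation doubles the index.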

To compute the bimodality index, first estimate the model parameters, either with a mixture model-based algorithm such as Mclust or with Markov chain Monte Carlo (MCMC) techniques. In this package, we rely on the Mclust implementation.
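
To make the estimation step concrete, here is a minimal base R sketch that fits the equal-variance two-component mixture with a hand-written EM loop and then derives δ and BI. This is a stand-in for illustration only: the package itself relies on Mclust, and the function name `fit_two_normals`, the starting values, and the convergence settings are all assumptions.

```r
# Hypothetical helper: fit pi*N(mu1, sigma) + (1 - pi)*N(mu2, sigma) by EM.
# Illustration only; the package uses Mclust, not this loop.
fit_two_normals <- function(x, max_iter = 500, tol = 1e-8) {
  x <- x[!is.na(x)]
  n <- length(x)
  # Crude starting values: split the data at the median.
  med   <- median(x)
  mu1   <- mean(x[x <= med])
  mu2   <- mean(x[x > med])
  sigma <- sd(x) / 2
  p     <- 0.5
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each point comes from component 1.
    d1 <- p * dnorm(x, mu1, sigma)
    d2 <- (1 - p) * dnorm(x, mu2, sigma)
    loglik <- sum(log(d1 + d2))
    w <- d1 / (d1 + d2)
    # M-step: update the mixing proportion, the means, and the common sd.
    p     <- mean(w)
    mu1   <- sum(w * x) / sum(w)
    mu2   <- sum((1 - w) * x) / sum(1 - w)
    sigma <- sqrt(sum(w * (x - mu1)^2 + (1 - w) * (x - mu2)^2) / n)
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  delta <- abs(mu1 - mu2) / sigma
  c(mu1 = mu1, mu2 = mu2, sigma = sigma, pi = p,
    delta = delta, BI = delta * sqrt(p * (1 - p)))
}

# Simulated data: two equally sized groups, six standard deviations apart.
set.seed(1)
x   <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 6, sd = 1))
est <- fit_two_normals(x)
```

For this simulated sample the true values are δ = 6 and π = 0.5, so the estimated BI should land near 3.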

In the paper by Wang et al. referenced below, we provide a statistical justification for the definition of the bimodality index, based on considerations of power and sample size. Theoretical considerations suggest that, as the number of samples grows large, a bimodality index of 1.1 or greater is likely to indicate a "useful" bimodal pattern of expression. Higher cutoffs are needed when there are relatively few samples, and can be chosen by simulating from the null distribution. We carried out simulation studies and applied the method to real data from a lung cancer gene expression profiling study. Our findings suggest that BIC behaves like a lax cutoff on the bimodality index (a cutoff much smaller than 1), and that the bimodality index provides an objective measure to identify and rank meaningful and reliable bimodal patterns from large-scale gene expression datasets.
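
One way to choose a cutoff for a small study is sketched below: simulate unimodal data with the same number of samples, run `bimodalIndex` on it, and take an upper quantile of the resulting BI values. The choice of a standard normal as the null distribution, the sample and gene counts, and the 99th percentile are all assumptions made for illustration, not recommendations from the paper.

```r
library(BimodalIndex)

set.seed(2009)
n_samples <- 30    # assumed number of samples in the real study
n_genes   <- 200   # number of simulated null "genes"

# Null data: every row is drawn from a single normal distribution,
# so any apparent bimodality is due to chance.
null_data <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

null_bi <- bimodalIndex(null_data, verbose = FALSE)

# Assumed rule: use the 99th percentile of the null BI values as the cutoff.
cutoff <- quantile(null_bi$BI, probs = 0.99, na.rm = TRUE)
```

Rows of the real data whose BI exceeds `cutoff` would then be treated as candidate bimodal genes.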

Value

Returns a data frame containing six columns, with the rows corresponding to the rows of the original data set. The columns contain the four parameters from the normal mixture model (mu1, mu2, sigma, and pi) along with the standardized distance delta and the bimodality index BI.
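
Because the result is an ordinary data frame, ranking and filtering by BI needs only base R. The sketch below uses a small made-up data frame as a stand-in for the return value (only the BI column is shown, and the gene names and values are hypothetical); the 1.1 threshold is the large-sample guideline from the Details section.

```r
# Made-up stand-in for the data frame returned by bimodalIndex().
bi <- data.frame(
  BI = c(0.50, 1.96, NA, 1.31),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Rank rows from most to least bimodal, dropping rows where the fit failed (NA).
ranked <- bi[order(bi$BI, decreasing = TRUE, na.last = NA), , drop = FALSE]

# Keep rows at or above the large-sample guideline of 1.1.
candidates <- ranked[ranked$BI >= 1.1, , drop = FALSE]
```

Here `candidates` holds geneB followed by geneD.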

Author(s)

Kevin R. Coombes krc@silicovore.com

References

Wang J, Wen S, Symmans WF, Pusztai L, Coombes KR.
The bimodality index: A criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data.
Cancer Informatics, 2009 Aug 5; 7:199–216.

Examples

library(oompaData)
data(lungData)
bi <- bimodalIndex(lung.dataset, verbose=FALSE)
summary(bi)

Example output

      mu1                mu2             sigma            delta        
 Min.   : 0.02019   Min.   : 2.465   Min.   :0.1584   Min.   : 0.2025  
 1st Qu.: 4.35765   1st Qu.: 5.716   1st Qu.:0.3718   1st Qu.: 0.9883  
 Median : 6.22044   Median : 7.291   Median :0.4587   Median : 2.0408  
 Mean   : 6.03109   Mean   : 7.147   Mean   :0.5171   Mean   : 2.1591  
 3rd Qu.: 7.81944   3rd Qu.: 8.816   3rd Qu.:0.6458   3rd Qu.: 2.8737  
 Max.   :11.55071   Max.   :12.410   Max.   :1.2767   Max.   :16.1466  
 NA's   :1          NA's   :1        NA's   :1        NA's   :1        
       pi                 BI        
 Min.   :0.007425   Min.   :0.1010  
 1st Qu.:0.126952   1st Qu.:0.3835  
 Median :0.481653   Median :0.6274  
 Mean   :0.442838   Mean   :0.6552  
 3rd Qu.:0.659508   3rd Qu.:0.8293  
 Max.   :0.993248   Max.   :3.4913  
 NA's   :1          NA's   :1       

BimodalIndex documentation built on May 7, 2019, 3 a.m.