The Iterative Signature Algorithm
The Iterative Signature Algorithm (ISA) is a biclustering algorithm. Biclustering algorithms simultaneously classify the rows and columns of an input matrix into biclusters, or, as we call them here, modules.
For the impatient

The easiest way to run ISA is to call the isa function with your input matrix as its single argument. This performs all the steps of a typical ISA work flow, with the default parameters.
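For example, the whole work flow might look like this (a sketch assuming the isa2 package; the planted-module data is made up for illustration):

```r
library(isa2)

## Toy input: 100 x 50 noise with one planted 20 x 10 module
set.seed(42)
data <- matrix(rnorm(100 * 50), nrow = 100)
data[1:20, 1:10] <- data[1:20, 1:10] + 3

## Run the complete ISA work flow with the default parameters
modules <- isa(data)

## One column per module found: scores for the rows and the columns
dim(modules$rows)
dim(modules$columns)
```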
ISA biclusters

An ISA module is a pair: a subset of the rows of the input matrix and a subset of its columns. In other words, a bicluster is a block of the reordered input matrix, where reordering means a permutation of both the rows and the columns. (Another bicluster might be a block of the same permuted input matrix, or of one after a different permutation.)
The criteria for a good bicluster are that 1) its rows are significantly different from the other rows when we consider only the positions defined by the columns of the same bicluster, and (symmetrically) 2) its columns are significantly different from the other columns when we consider only the positions defined by the rows of the same bicluster.
In other words, the rows of the bicluster are correlated, but only over the columns defined by the same bicluster; and the opposite is also true: the columns of the bicluster are correlated, but only over the rows defined by the same bicluster.
ISA biclusters are soft: two biclusters may overlap in their rows, their columns, or even both. It is also possible that some rows and/or columns of the input matrix are not part of any ISA bicluster. Depending on the stringency parameters, it might even happen that ISA does not find any biclusters.

ISA row and column scores

ISA biclusters are not only soft; in addition, every row and column of a given bicluster has a score, a number between minus one and one. The further this number is from zero, the stronger the association of the given row or column with the bicluster.
How ISA works

ISA works in an iterative way. For an m x n input matrix E, it starts from a seed vector r(0), which is typically a sparse 0/1 vector of length m. This defines a set of rows of E. Then E' (the transpose of E) is multiplied by r(0) and the result is thresholded. (Please also see 'Normalization' below.)
The thresholding is an important step of the ISA; without thresholding, ISA would be equivalent to a (not too effective) numerical singular value decomposition (SVD). Currently thresholding is done by calculating the mean and standard deviation of the vector and keeping only the elements that are further than a given number of standard deviations from the mean. Based on the direction argument, this means 1) keeping the values that are significantly higher than the mean (direction="up"), 2) keeping the significantly lower ones (direction="down"), or 3) both (direction="updown").
The thresholded vector c(0) is the (column) 'signature' of r(0). Then the (row) signature of c(0) is calculated: E is multiplied by c(0) and the result is thresholded to get r(1).
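One such iteration step can be sketched in base R (a simplified illustration, not the package's implementation; it assumes the matrix is already normalized and uses "up"-type thresholding):

```r
## E: m x n input matrix, r0: sparse 0/1 seed vector of length m
set.seed(1)
E  <- matrix(rnorm(100 * 50), nrow = 100)
r0 <- as.numeric(seq_len(100) %in% sample(100, 5))

## Keep only the entries more than 'thr' standard deviations above the mean
threshold <- function(x, thr) ifelse(x > mean(x) + thr * sd(x), x, 0)

c0 <- threshold(drop(t(E) %*% r0), thr = 2)  # column signature of r(0)
r1 <- threshold(drop(E %*% c0), thr = 2)     # row signature of c(0)
```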
This iteration is performed until it converges, i.e. until r(i) and r(i-1), and likewise c(i) and c(i-1), are "close". The convergence criterion, i.e. what "close" means, is by default defined by high Pearson correlation.
It is quite possible that ISA finds the same module more than once: two or more seeds might converge to the same module. The isa.unique function eliminates from the result of isa.iterate every module that is very similar (in terms of Pearson correlation) to one that was found before it.
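In code, this deduplication step might look like the following (a sketch assuming the isa2 functions isa.normalize, generate.seeds, isa.iterate and isa.unique, with made-up data and thresholds):

```r
library(isa2)
set.seed(2)
data <- matrix(rnorm(100 * 50), nrow = 100)
data[1:20, 1:10] <- data[1:20, 1:10] + 3   # planted module

normed <- isa.normalize(data)
seeds  <- generate.seeds(length = nrow(data), count = 100)
result <- isa.iterate(normed, row.seeds = seeds,
                      thr.row = 2, thr.col = 2)

## Merge the modules that several seeds converged to
result <- isa.unique(normed, result)
```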
Parameters

The two main parameters of ISA are the two thresholds, one for the rows and one for the columns. They essentially define the stringency of the modules. If the row threshold is high, then the modules will have very similar rows; if it is mild, then the modules will be bigger, with less similar rows than in the first case.
Random seeding and smart seeding

By default (i.e. if the isa function is used) the ISA is performed from random sparse starting seeds, generated by generate.seeds. This way the algorithm is completely unsupervised, but also stochastic: it might give different results for different runs.
It is possible to use non-random seeds as well. If you have some knowledge about the data, or you are interested in a particular subset of rows/columns, then you can feed your seeds into the isa.iterate function directly. In this case the algorithm is deterministic: for the same seed you will always get the same module.
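For instance, a non-random seed could be supplied like this (a sketch assuming the isa2 package; the seed rows and thresholds are made up for illustration):

```r
library(isa2)
set.seed(3)
data   <- matrix(rnorm(100 * 50), nrow = 100)
normed <- isa.normalize(data)

## A deterministic, non-random seed: start from rows 1-20
my.seed <- matrix(0, nrow = nrow(data), ncol = 1)
my.seed[1:20, ] <- 1

mod <- isa.iterate(normed, row.seeds = my.seed,
                   thr.row = 2, thr.col = 2)
```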
Normalization

On in silico data we observed that ISA has the best performance if the input matrix is normalized (see isa.normalize). The normalization produces two matrices: Er and Ec. Er is calculated by transposing E and then centering and scaling its rows (see scale). Ec is calculated by centering and scaling the rows of E. Er is used to calculate the column signature of the rows, and Ec is used to calculate the row signature of the columns.
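For example (a sketch assuming isa.normalize from the isa2 package returns the two matrices described above in a named list):

```r
library(isa2)
data   <- matrix(rnorm(20 * 10), nrow = 20)   # E is 20 x 10
normed <- isa.normalize(data)

dim(normed$Er)  # t(E) with rows centered and scaled, so 10 x 20
dim(normed$Ec)  # E with rows centered and scaled, so 20 x 10
```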
It is possible to use a different normalization; in that case the user is requested to supply the normalized input data in a named list containing the two matrices of appropriate dimensions. 'Er' will be used for calculating the signature of the rows and 'Ec' for the signature of the columns. If you want to use the same matrix in both steps, then supply it twice, the first one transposed.
Robustness

As ISA is an unsupervised algorithm, it may very well find some modules even if you feed in noise as the input matrix. To avoid such spurious modules, we defined a robustness measure: a single number for each module that measures how well its rows and columns are correlated.
It is recommended that the user use isa.filter.robust to run ISA on the scrambled input matrix with the same threshold parameters, and then drop every module that has a robustness score lower than the highest robustness score among the modules found in the scrambled matrix.
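Putting the pieces together, the filtering step might look like this (a sketch assuming the isa2 package; the exact arguments, seeds and thresholds here are illustrative assumptions):

```r
library(isa2)
set.seed(4)
data <- matrix(rnorm(100 * 50), nrow = 100)
data[1:20, 1:10] <- data[1:20, 1:10] + 3   # planted module

normed <- isa.normalize(data)
seeds  <- generate.seeds(length = nrow(data), count = 100)
result <- isa.iterate(normed, row.seeds = seeds,
                      thr.row = 2, thr.col = 2)
result <- isa.unique(normed, result)

## Re-run on a scrambled copy of the data and keep only the modules
## that beat the best robustness score from the scrambled run
result <- isa.filter.robust(data, normed, result, row.seeds = seeds)
```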
A typical ISA work flow

Please see the manual page and the source code of isa for a typical ISA work flow. (You can obtain the source code by typing 'isa' (without the apostrophes) at your R prompt and pressing ENTER.)
Author(s)

Gabor Csardi [email protected]
References

Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.

Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22.

Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data. Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
See Also

The vignette in the package, and isa for running ISA.