generateSeeds takes either a matrix or an ExpressionSet object to generate seeds. Seeds are defined as pairs of genes (edges) which share coincident expression levels across samples. The higher the coincidence, the higher the score of the seed. The seeds are generated by comparing each pair of genes in turn. When all seeds have been produced, they are sorted by their coincidence scores and returned as an object. See the details section for notes on implementation.
In the rqubic package, generateSeeds currently supports two data types: ExpressionSet (a type inheriting from eSet) and numeric matrix.
Both methods accept an additional parameter, minColWidth, specifying the minimum number of conditions shared by the two genes of each seed. Its default value is 2. When this default value is used, the minimum coincidence score is defined as max(2, ncol/20), where ncol represents the number of conditions. When a non-default value is provided, that value is used to select seeds.
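The threshold rule above can be sketched in a few lines of R. The helper name seedThreshold is hypothetical and not part of rqubic. The sketch applies the formula exactly as written, and whether the C code truncates ncol/20 to an integer is an assumption left open here.

```r
# Hypothetical helper, not part of rqubic: mirrors the rule described above.
seedThreshold <- function(nCond, minColWidth = 2) {
  if (minColWidth == 2) {
    # default value: scale with the number of conditions, never below 2
    max(2, nCond / 20)
  } else {
    # non-default value: used as given
    minColWidth
  }
}

seedThreshold(26)      # 2, because 26/20 = 1.3 is below 2
seedThreshold(100)     # 5
seedThreshold(100, 8)  # 8
```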
signature(object = "eSet")
An object representing expression data. Note that its exprs must be a matrix of integers; otherwise the method warns and coerces the storage mode of the matrix to integer.
signature(object = "matrix")
A matrix of integers. If it contains non-integers, the method warns and coerces the storage mode to integer.
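One way to avoid the coercion warning is to set the storage mode before calling the method. The following is a minimal sketch in base R, assuming the matrix already holds whole-number levels stored as doubles; the toy matrix m is made up for illustration.

```r
# Toy matrix of discretized levels, stored as doubles by default
m <- matrix(c(-1, 0, 1, 1, 0, -1), nrow = 2)
storage.mode(m)               # "double"

# Coerce the storage mode explicitly; dimensions are preserved
storage.mode(m) <- "integer"
is.integer(m)                 # TRUE
```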
The function compares all pairs of genes, namely all edges of a complete graph composed of genes. The weight of each edge is defined as the number of samples in which two genes have the same expression level. This weight, also known as the coincidence score, reflects the co-regulation relationship between two genes.
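The pairwise scoring can be illustrated in plain R on a toy matrix. This is a sketch of the definition given above, not the package's C implementation: the data and variable names are made up, and it counts every matching level, including 0. Whether the C code treats any level specially is not stated here.

```r
# Toy discretized matrix: 3 genes (rows) x 5 samples (columns)
disc <- matrix(as.integer(c( 1,  0, -1,  1,  0,
                             1,  0,  1,  1, -1,
                            -1,  1,  0, -1,  0)),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), NULL))

# All edges of the complete graph on the genes, one pair per row
pairs <- t(combn(nrow(disc), 2))

# Coincidence score: number of samples in which both genes share a level
# (assumption: all levels count, including 0)
score <- apply(pairs, 1, function(p) sum(disc[p[1], ] == disc[p[2], ]))

edges <- data.frame(gene1 = rownames(disc)[pairs[, 1]],
                    gene2 = rownames(disc)[pairs[, 2]],
                    score = score)
edges
```

Here g1 and g2 agree in three samples, g1 and g3 in one, and g2 and g3 in none, so only the first edge would pass a minimum score of 2.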
Seeds are chosen by picking edges whose scores are higher than the minimum score, which is set by the minColWidth parameter (default: 2).
To implement such a selection algorithm, a Fibonacci heap is constructed in the C code. Its size is predefined as a constant, which should be reduced if the number of genes is too large for the algorithm to run. A new seed, selected for having a higher coincidence score than the minimum, is inserted into the heap. Depending on whether the heap is full, the seed is either put into the heap directly or inserted by squeezing the minimum seed out.
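The insertion logic can be sketched in R as a bounded selection. A plain vector stands in for the Fibonacci heap here, so this only illustrates the behaviour and not the data structure. The function selectSeeds and its arguments are hypothetical, and how the C code handles a new seed that ties with the current minimum of a full heap is an assumption.

```r
# Hypothetical sketch: keep at most 'capacity' scores above 'minScore'.
# A plain vector replaces the Fibonacci heap used in the C code.
selectSeeds <- function(scores, minScore, capacity) {
  kept <- numeric(0)
  for (s in scores) {
    if (s <= minScore) next            # not above the minimum: skip
    if (length(kept) < capacity) {
      kept <- c(kept, s)               # not full: insert directly
    } else if (s > min(kept)) {
      kept[which.min(kept)] <- s       # full: squeeze the minimum out
    }
  }
  sort(kept, decreasing = TRUE)        # return in decreasing order of score
}

selectSeeds(c(3, 7, 2, 9, 5, 4, 8), minScore = 2, capacity = 3)
# 9 8 7
```

Scanning for the minimum of a vector costs time proportional to the capacity on every insertion, which is the cost a heap avoids.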
Once the heap has been filled by examining all pairs of genes, it is dumped into an array of edge pointers, ordered by decreasing score. This array is captured as an external pointer, attached as an attribute of an rqubicSeeds object.
An rqubicSeeds object holds an integer, which records the height of the heap. Besides the class identifier, it has two attributes: one for the external pointer and one for the threshold of the coincidence score.
In the rqubic implementation, the variable arr_c[i][j] holds the level symbols (-1, 0, 1 in the default case), whereas in the QUBIC implementation, this variable holds the index of level symbols, and the level symbols are saved in the global variable symbols.
Jitao David Zhang <jitao_david.zhang@roche.com>
data(sample.ExpressionSet, package="Biobase")
sample.disc <- quantileDiscretize(sample.ExpressionSet)
sample.seeds <- generateSeeds(sample.disc)
sample.seeds
## with a higher threshold of the coincidence score
sample.seeds.higher <- generateSeeds(sample.disc, minColWidth=5)
sample.seeds.higher