This function searches for annotation-driven splits of patients in microarray data, where a split is a partitioning of the patients into two groups. Candidate splits are derived from the genes annotated to GO terms and KEGG pathways. DLD scores are used to judge the quality of a split. In addition, a significance measure can be computed by simulating a random distribution of scores.
mydata |
either an expression set as defined by the package
|
annotation.ids |
a vector of GO or KEGG identifiers in the form "GO:..." or "KEGG:..." respectively. The prefix "KEGG:" is removed from the KEGG-identifiers before accessing the chip's "...PATH2PROBES" hash. |
chip.name |
the name of the chip by which the expression set is measured. |
min.probes |
annotation identifiers with fewer than this many associated genes are skipped. |
max.probes |
annotation identifiers with more than this many associated genes are skipped. The default is ten percent of the genes on the chip. |
B |
the number of random gene set samplings to be performed to compute empirical p-values. |
min.group.size |
filter criterion to avoid splits suggesting tiny groups. Splits where one of the two suggested groups is smaller than this number are removed from the split set. |
ngenes |
number of genes used to compute DLD scores. |
ignore.genes |
number of best scoring genes to be ignored when computing DLD scores. |
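The min.group.size filter described above can be illustrated with a small base R sketch. The toy matrix, the 0/1 coding of the two groups and the names `smaller_group` and `kept` are assumptions made for illustration; they are not taken from the package's internals.

```r
# Toy split matrix: one row per split, one column per patient.
# Group labels are coded 0/1 here (an assumption for this sketch).
cuts <- rbind(
  splitA = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  splitB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  splitC = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
)

min.group.size <- 3

# Size of the smaller of the two groups, for each split.
smaller_group <- apply(cuts, 1, function(z) min(sum(z == 0), sum(z == 1)))

# Keep only splits whose smaller group reaches the threshold.
kept <- cuts[smaller_group >= min.group.size, , drop = FALSE]
rownames(kept)
```

Here splitB is dropped because one of its groups contains a single patient.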
This function applies the same splitting procedure to every annotation identifier provided. First, the genes associated with an identifier are determined and extracted from the expression data. Then the diana2means function is applied to the restricted data, and the resulting splits are collected into a single splitSet object.
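The procedure described above can be sketched in base R on simulated data. This is only an illustration under stated assumptions: `kmeans` with two centers stands in for diana2means, the annotation map `probe_map` and its identifiers are hypothetical placeholders, and the probe limits are arbitrary toy values rather than the function's defaults.

```r
set.seed(1)

# Toy expression matrix: 200 probes (rows) by 20 patients (columns).
expr <- matrix(rnorm(200 * 20), nrow = 200,
               dimnames = list(paste0("probe", 1:200),
                               paste0("patient", 1:20)))

# Hypothetical annotation map: identifier -> associated probes.
# The identifiers are placeholders and do not refer to real terms.
probe_map <- list(
  "GO:0000001" = paste0("probe", 1:40),
  "KEGG:00001" = paste0("probe", 30:90),
  "GO:0000002" = paste0("probe", 1:3)    # too few probes, will be skipped
)

split_by_annotation <- function(expr, probe_map, min.probes, max.probes) {
  cuts <- list()
  for (id in names(probe_map)) {
    probes <- intersect(probe_map[[id]], rownames(expr))
    # Skip identifiers with too few or too many associated probes.
    if (length(probes) < min.probes || length(probes) > max.probes) next
    restricted <- expr[probes, , drop = FALSE]
    # Stand-in for diana2means: 2-means clustering of the patients.
    km <- kmeans(t(restricted), centers = 2, nstart = 5)
    cuts[[id]] <- km$cluster - 1L   # code the two groups as 0/1
  }
  # One row per identifier, one column per patient.
  do.call(rbind, cuts)
}

cuts <- split_by_annotation(expr, probe_map, min.probes = 20, max.probes = 100)
dim(cuts)
```

The result mirrors the layout of the cuts element described under Value, although the real function additionally computes scores and, optionally, p-values.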
Valid annotation identifiers are vectors of identifiers of the form KEGG:nnnnn and GO:nnnnnn. In addition, the keywords "KEGG", "GO" and "all" are allowed, representing all terms in the corresponding ontology.
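As a small illustration of the identifier format, the following base R sketch separates GO and KEGG identifiers and strips the "KEGG:" prefix, as the description of annotation.ids says is done before the chip's pathway hash is accessed. The variable names are assumptions for this sketch, and the code does not touch any annotation package.

```r
# Identifiers taken from the examples on this page.
ids <- c("GO:0006915", "KEGG:00251", "GO:0007165")

is_kegg <- startsWith(ids, "KEGG:")
is_go   <- startsWith(ids, "GO:")

# KEGG identifiers lose their prefix; GO identifiers are kept as they are.
kegg_keys <- sub("^KEGG:", "", ids[is_kegg])
go_keys   <- ids[is_go]

kegg_keys
go_keys
```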
If B is set to an integer, that number of random samplings is used to generate a null distribution of DLD scores. This distribution is used to compute an empirical p-value for each split. If more than one valid split is found, multiple testing is corrected for by applying the Benjamini-Hochberg correction from the multtest package.
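The idea of an empirical p-value followed by a Benjamini-Hochberg adjustment can be sketched as follows. Several things here are assumptions rather than the package's actual computation: the null scores are simulated with `rnorm` instead of being DLD scores of random gene sets, larger scores are taken to be better, a "+1" correction is used so that p-values are never zero, and base R's `p.adjust` is swapped in for the multtest package.

```r
set.seed(42)

B <- 1000
# Stand-in for the scores of B randomly sampled gene sets.
null_scores <- rnorm(B)

# Hypothetical observed scores for three splits.
observed <- c(splitA = 3.5, splitB = 0.1, splitC = 2.0)

# Empirical p-value: share of null scores at least as large as the observed one.
pvalue <- sapply(observed, function(s) (sum(null_scores >= s) + 1) / (B + 1))

# Benjamini-Hochberg adjustment across all splits.
qvalue <- p.adjust(pvalue, method = "BH")

cbind(pvalue, qvalue)
```

With B samplings the smallest attainable p-value under this scheme is 1 / (B + 1), so B bounds the resolution of the significance measure.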
Returns an object of class splitSet with the following list elements:
cuts |
a matrix of split attributions. One row per annotation identifier (GO term or KEGG pathway) for which a split has been generated. One column per object in the dataset. |
score |
one score per generated split. |
pvalue |
one empirical p-value per generated split, or |
qvalue |
one q-value per generated split, computed according to Benjamini-Hochberg's correction for multiple testing, or |
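To show how the list elements above fit together, here is a sketch on a mock object. The object `x` is a plain list built by hand with made-up values, not an actual splitSet returned by adSplit, and the 0.05 threshold and 0/1 group coding are arbitrary choices for illustration.

```r
# Mock result with the element names described above (values are invented).
x <- list(
  cuts   = rbind(split1 = c(0, 0, 1, 1, 1, 0),
                 split2 = c(1, 0, 1, 0, 1, 0)),
  score  = c(4.2, 1.1),
  pvalue = c(0.01, 0.40),
  qvalue = c(0.02, 0.40)
)

# Splits passing a q-value threshold.
significant <- which(x$qvalue <= 0.05)
sig_cuts <- x$cuts[significant, , drop = FALSE]

# Group sizes of the first split.
group_sizes <- table(x$cuts["split1", ])
group_sizes
```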
Claudio Lottaz, Joern Toedling
diana2means, randomDiana2means, image.splitSet
# prepare data
library(golubEsets)
data(Golub_Merge)
# generate annotation-driven splits for apoptosis and signal transduction
x <- adSplit(Golub_Merge, "GO:0006915", "hu6800")
x <- adSplit(Golub_Merge, c("GO:0007165","GO:0006915"), "hu6800", max.probes=7000)
# generate a split for glutamate metabolism including
# an empirical p-value
x <- adSplit(Golub_Merge, "KEGG:00251", "hu6800", B=100)
## Not run:
# generate splits for all KEGG pathways.
x <- adSplit(Golub_Merge, "KEGG", "hu6800")
image(x)
## End(Not run)