
`computeCat` computes the overlap proportions between pairs of ordered vectors of identifiers. The input to this function is a data.frame containing non-redundant identifiers and a number of ranking statistics organized by columns. This function enables comparing all possible pair combinations, or selecting one column as the reference ranking for the remaining ones. The output of this function can be used as the input to `plotCat`, which creates correspondence-at-the-top (CAT) curves, as used in Irizarry et al., Nat Methods (2005), for comparing differential gene expression across platforms and labs.


`data` |
A data.frame produced by |

`size` |
numeric. The number of top ranking statistics
to be considered in the computation of the overlap proportions.
If omitted all rows in |

`idCol` |
numeric or character. The index (by default equal to one), or the name of the column containing the common identifiers (e.g. ENTREZID, SYMBOLS, ...). |

`ref` |
character. The column name corresponding to the ranking statistics to be used as the reference in all pairs of comparisons. |

`method` |
character. The method used to compute the overlap proportion between two ordered vectors of identifiers: either "equalRank" or "equalStat". The first method computes the overlap based on equal ranks, whereas the latter uses equal statistics. |

`decreasing` |
logical. Whether the ranking statistics should be sorted in decreasing or increasing order. |

`computeCat` computes overlapping proportions between pairs of ordered vectors of identifiers. This function first finds all possible pairs of vector combinations, then it computes the corresponding overlapping proportions. If a column is selected as the reference, using the argument `ref`, only the combinations involving this column will be returned.
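
The pairing step can be pictured with base R alone. The sketch below is not the package's code; the column names are hypothetical and `combn` is used only to illustrate how all pairs, or only the pairs involving a reference column, might be enumerated.

```r
# Hypothetical names of three ranking-statistic columns (illustration only)
statCols <- c("dataSetA.t", "dataSetB.t", "dataSetC.t")

# All unordered pairs of columns: one CAT curve would be computed per pair
allPairs <- combn(statCols, 2, simplify = FALSE)

# Keep only the pairs that involve a chosen reference column
ref <- "dataSetA.t"
refPairs <- Filter(function(p) ref %in% p, allPairs)

length(allPairs)  # number of pairs without a reference
length(refPairs)  # number of pairs when a reference is set
```

With three columns there are three possible pairs, two of which involve the reference.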

Briefly, for each CAT curve two vectors of identifiers are first ordered by the ranking statistics of choice, then the overlap between the two vectors is computed by considering more and more identifiers (vector size).
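
As a minimal sketch of this idea, assuming toy data and a hypothetical helper named `catEqualRank` (neither comes from the package), the proportion of shared identifiers among the top k of two ordered vectors could be computed as follows.

```r
# Toy data: hypothetical identifiers and two made-up ranking statistics
toy <- data.frame(
  id    = c("g1", "g2", "g3", "g4", "g5"),
  statA = c(5.0, 4.0, 3.0, 2.0, 1.0),
  statB = c(4.5, 1.5, 5.5, 2.5, 0.5),
  stringsAsFactors = FALSE
)

# Order the identifiers by each statistic, largest first
idsA <- toy$id[order(toy$statA, decreasing = TRUE)]
idsB <- toy$id[order(toy$statB, decreasing = TRUE)]

# For each vector size k, the proportion of identifiers shared by the two top-k lists
catEqualRank <- function(x, y, size = length(x)) {
  vapply(seq_len(size), function(k) {
    length(intersect(x[seq_len(k)], y[seq_len(k)])) / k
  }, numeric(1))
}

catCurve <- catEqualRank(idsA, idsB)
catCurve
```

Plotting `catCurve` against the vector size gives the shape of a CAT curve: the proportion reaches 1 once both top-k lists contain the same identifiers.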

This function can compute overlapping proportions using two distinct methods: "equalRank" or "equalStat". With "equalRank" the overlap is obtained between vectors of the same size using equal ranks, which in turn can correspond to ranking statistics of different magnitude (i.e. the vectors are of the same size, but might have different ranking statistics). With "equalStat" the overlap is obtained between vectors defined by using equal ranking statistics, which can correspond to different ranks, and hence to vectors of different size (i.e. the vectors are of different size, but have similar ranking statistics).
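
The contrast with the equal-statistics approach can be illustrated by selecting identifiers with a common cutoff on the statistic instead of a common rank. This is only one plausible reading, not the package's implementation: the helper `overlapAtCutoff`, the toy data, and the choice of denominator (the size of the larger list) are all assumptions made for this sketch.

```r
# Toy data: hypothetical identifiers and two made-up ranking statistics
toy <- data.frame(
  id    = c("g1", "g2", "g3", "g4", "g5"),
  statA = c(5.0, 4.0, 3.0, 2.0, 1.0),
  statB = c(4.5, 1.5, 5.5, 2.5, 0.5),
  stringsAsFactors = FALSE
)

# Overlap between the identifiers whose statistic reaches a common cutoff.
# ASSUMPTION: the proportion is taken relative to the larger of the two lists;
# the package may normalise differently.
overlapAtCutoff <- function(data, cutoff) {
  idsA <- data$id[data$statA >= cutoff]
  idsB <- data$id[data$statB >= cutoff]
  nLarger <- max(length(idsA), length(idsB))
  shared <- length(intersect(idsA, idsB))
  c(cutoff  = cutoff,
    sizeA   = length(idsA),
    sizeB   = length(idsB),
    overlap = if (nLarger == 0) NA_real_ else shared / nLarger)
}

# One row per cutoff: the two list sizes can differ at the same cutoff
res <- t(vapply(c(4, 2), function(ct) overlapAtCutoff(toy, ct), numeric(4)))
res
```

At the lower cutoff the two lists have different sizes, which is the situation the "equalStat" description refers to.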

A list of lists in which each element corresponds to a CAT curve. If a specific reference column is provided through the `ref` argument, the number of list elements is equal to the number of combinations involving the reference group; otherwise all possible combinations are returned. When the "equalRank" method is used, each list element contains only the overlapping proportion, whereas when the "equalStat" method is used the number of genes with equal statistics is stored along with the overlapping proportion. This output is used to produce CAT curves, using the `plotCat` function, as described in Irizarry et al., Nat Methods (2005).

Given the combinatorial nature of the computation, a long computational time can be necessary if the input `data` contains many columns and many rows (number of features). In such a case, consider limiting the number of rows through the `size` argument.
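
To give a rough sense of the growth, the number of pairwise comparisons for a given number of ranking-statistic columns can be counted with base R. This is a back-of-the-envelope sketch, not a timing of the package; the column counts are arbitrary examples.

```r
# Arbitrary example counts of ranking-statistic columns
nCols <- c(3, 5, 10, 20)

# All unordered pairs of columns versus pairs involving a single reference column
nPairsAll <- choose(nCols, 2)
nPairsRef <- nCols - 1

data.frame(columns = nCols, allPairs = nPairsAll, refPairs = nPairsRef)
```

The count of all pairs grows quadratically with the number of columns, whereas the count with a reference column grows linearly.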

Luigi Marchionni marchion@jhu.edu

Irizarry, R. A.; Warren, D.; Spencer, F.; Kim, I. F.; Biswal, S.; Frank, B. C.; Gabrielson, E.; Garcia, J. G. N.; Geoghegan, J.; Germino, G.; Griffin, C.; Hilmer, S. C.; Hoffman, E.; Jedlicka, A. E.; Kawasaki, E.; Martinez-Murillo, F.; Morsberger, L.; Lee, H.; Petersen, D.; Quackenbush, J.; Scott, A.; Wilson, M.; Yang, Y.; Ye, S. Q. and Yu, W. Multiple-laboratory comparison of microarray platforms. Nat Methods, 2005, 2, 345-350

Ross, A. E.; Marchionni, L.; Vuica-Ross, M.; Cheadle, C.; Fan, J.; Berman, D. M. and Schaeffer, E. M. Gene Expression Pathways of High Grade Localized Prostate Cancer. Prostate, 2011, 71, 1568-1578

Benassi, B.; Flavin, R.; Marchionni, L.; Zanata, S.; Pan, Y.; Chowdhury, D.; Marani, M.; Strano, S.; Muti, P.; and Blandino, G. c-Myc is activated via USP2a-mediated modulation of microRNAs in prostate cancer. Cancer Discovery, 2012, March, 2, 236-247

```
###load the package and the example data
library(matchBox)
data(matchBoxExpression)
###the column name for the identifiers
idCol <- "SYMBOL"
###the column name for the ranking statistics
byCol <- "t"
###use lapply to remove redundancy from all data.frames
###default method is "maxORmin"
newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant, idCol=idCol, byCol=byCol)
###select t-statistics and merge into a new data.frame using SYMBOL
mat <- mergeData(newMatchBoxExpression, idCol=idCol, byCol=byCol)
###Compute CAT for decreasing t-statistics: all genes
cpH2L <- computeCat(mat, idCol=1, decreasing=TRUE, method="equalRank")
###Compute CAT for increasing t-statistics: only the first 300 genes
cpL2H <- computeCat(mat, idCol=1, size=300, decreasing=FALSE, method="equalRank")
###Compute CAT for increasing t-statistics: only the first 300 genes
###use the second column as the reference
cpL2H.ref <- computeCat(mat, idCol=1, size=300, ref="dataSetA.t",
                        decreasing=FALSE, method="equalRank")
```

marchion/matchBox documentation built on May 9, 2019, 4:07 p.m.
