Prior to computing the proportion of overlap between ranked vectors of features
it is necessary to remove the redundant features.
This can be accomplished using a number of methods implemented
in the filterRedundant function, as explained below.
object |
a data.frame from which redundant features (rows) must be removed. |
method |
character. The method used for removing redundancy.
Currently available methods are: maxORmin, mean, median, geoMean, and random. |
idCol |
character or numeric. Name or index of the column containing redundant identifiers (e.g. ENTREZID, SYMBOLS, ...). |
byCol |
character or numeric. Name or index of the column
containing the ranking statistics (used only with |
absolute |
logical. Indicates whether the absolute statistics,
as defined by |
decreasing |
logical. Indicates whether reordering should be
decreasing or not (used only with |
trim |
numeric. Indicates whether a trimmed mean should
be computed (used only with |
... |
further arguments to be passed (not currently implemented). |
The maxORmin method removes
redundant features by selecting the rows
that correspond to the maximum or minimum
value of a selected statistic.
With this approach
the rows are first
ranked in increasing or decreasing order,
as defined by the decreasing argument,
using the ranking statistic defined by byCol,
either on its original or absolute scale,
as defined by the absolute argument.
Subsequently, the data.frame rows corresponding to redundant
identifiers are removed, after these have been identified in
the column defined by idCol
using the duplicated function.
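The ordering-then-deduplication procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper keep_extreme and the toy data are hypothetical, and only the argument names idCol, byCol, absolute, and decreasing are borrowed from the documentation.

```r
# Toy data (hypothetical values): "A" appears twice, "B" three times
df <- data.frame(
  SYMBOL = c("A", "B", "A", "B", "C", "B"),
  t      = c(1.5, -3.0, -2.5, 0.5, 0.8, 2.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT the filterRedundant source code
keep_extreme <- function(x, idCol, byCol, absolute = TRUE, decreasing = TRUE) {
  stat <- x[[byCol]]
  # Optionally rank on the absolute scale
  if (absolute) stat <- abs(stat)
  # Rank all rows by the statistic
  x <- x[order(stat, decreasing = decreasing), , drop = FALSE]
  # Keep only the first (best ranked) row for each identifier
  x[!duplicated(x[[idCol]]), , drop = FALSE]
}

res <- keep_extreme(df, idCol = "SYMBOL", byCol = "t")
print(res)
```

Because duplicated flags every occurrence after the first, the row that survives for each identifier is the one ranked highest by the preceding sort.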
The mean, median, geoMean,
and random methods provide alternative ways
for summarizing numerical values corresponding to
redundant features, as defined by the idCol
argument:
mean takes the average,
median the median,
geoMean the geometric mean,
random selects a random value.
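The four summaries listed above can be illustrated with base R's aggregate function. This is a sketch on hypothetical toy data and does not call filterRedundant; the helper geo_mean is an assumption introduced here and is only meaningful for strictly positive values.

```r
# Toy data (hypothetical values): "A" and "B" each appear twice
df <- data.frame(
  SYMBOL = c("A", "B", "A", "B", "C"),
  t      = c(1, 2, 3, 8, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: geometric mean, valid for positive values only
geo_mean <- function(v) exp(mean(log(v)))

# One summarized row per identifier
by_mean   <- aggregate(t ~ SYMBOL, data = df, FUN = mean)
by_median <- aggregate(t ~ SYMBOL, data = df, FUN = median)
by_geo    <- aggregate(t ~ SYMBOL, data = df, FUN = geo_mean)

# Random choice: pick one of the observed values per identifier
set.seed(1)
by_random <- aggregate(t ~ SYMBOL, data = df,
                       FUN = function(v) v[sample.int(length(v), 1)])

print(by_mean)
print(by_geo)
```

With only two values per identifier the mean and the median coincide, which is one reason to interpret such summaries with care.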
A data.frame with fewer rows than the input one,
unique by the identifier specified by the idCol argument.
filterRedundant is a utility function providing various
methods to remove redundant rows from a data.frame.
The choice of the method depends on the nature of the values
and on the final goal.
Caution should therefore be used when taking the mean
or the median across few values, or when choosing the arguments
for the maxORmin method (for instance, it would
make no sense to use a decreasing ordering if the ranking
statistic is a p-value).
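The p-value caveat can be made concrete with a small base R sketch. The helper first_per_id and the toy p-values are hypothetical and stand in for the ordering step only; they are not the package's code.

```r
# Toy data (hypothetical p-values): two rows per identifier
df <- data.frame(
  SYMBOL  = c("A", "A", "B", "B"),
  P.Value = c(0.001, 0.40, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by byCol, then keep the first row per identifier
first_per_id <- function(x, idCol, byCol, decreasing) {
  x <- x[order(x[[byCol]], decreasing = decreasing), , drop = FALSE]
  x[!duplicated(x[[idCol]]), , drop = FALSE]
}

# Increasing order keeps the smallest (most significant) p-value per identifier
sensible <- first_per_id(df, "SYMBOL", "P.Value", decreasing = FALSE)

# Decreasing order keeps the largest p-value, which is rarely what is wanted
misleading <- first_per_id(df, "SYMBOL", "P.Value", decreasing = TRUE)

print(sensible)
print(misleading)
```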
Luigi Marchionni <marchion@jhu.edu>
See duplicated.
###load the package and the data
library(matchBox)
data(matchBoxExpression)
###check the number of rows in each data.frame before filtering
sapply(matchBoxExpression, nrow)
###the column name for the identifiers
idCol <- "SYMBOL"
###the column name for the ranking statistics
byCol <- "t"
###use lapply to remove redundancy from all data.frames
###default method is "maxORmin"
newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant, idCol=idCol, byCol=byCol)
###recheck number of rows
sapply(newMatchBoxExpression, nrow)