Prior to computing the proportion of overlap between ranked vectors of features,
it is necessary to remove the redundant features.
This can be accomplished using a number of methods implemented
in the filterRedundant function, as explained below.
object |
a data.frame from which redundant features (rows) must be removed. |
method |
character. The method used for removing redundancy.
Currently available methods are: |
idCol |
character or numeric. Name or index of the column containing redundant identifiers (e.g. ENTREZID, SYMBOLS, ...). |
byCol |
character or numeric. Name or index of the column
containing the ranking statistics (used only with |
absolute |
logical. Indicates whether the absolute statistics,
as defined by |
decreasing |
logical. Indicates whether reordering should be
decreasing or not (used only with |
trim |
numeric. Indicates whether a trimmed mean should
be computed (used only with |
... |
further arguments to be passed (not currently implemented). |
The maxORmin method removes redundant features by selecting the rows
that correspond to the maximum or minimum value of a selected statistic.
With this approach, redundant features are first ranked in increasing or
decreasing order, as defined by the decreasing argument, using the ranking
statistic defined by byCol, either on its original or absolute scale,
as defined by the absolute argument.
Subsequently, data.frame rows corresponding to redundant identifiers are
removed, after these have been identified in the column defined by idCol
using the duplicated function.
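The order-then-deduplicate logic described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper keepExtreme, its default argument values, and the toy data are all hypothetical.

```r
# Toy data: hypothetical identifiers and t-statistics (not from the package)
toy <- data.frame(
  SYMBOL = c("A", "B", "A", "C", "B"),
  t      = c(1.5, -3.0, -2.5, 0.4, 2.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper; defaults are assumptions made for this sketch
keepExtreme <- function(x, idCol, byCol, absolute = TRUE, decreasing = TRUE) {
  # Rank on the absolute or the original scale of the statistic
  stat <- if (absolute) abs(x[[byCol]]) else x[[byCol]]
  # Reorder rows so that the preferred row for each identifier comes first
  x <- x[order(stat, decreasing = decreasing), , drop = FALSE]
  # duplicated() flags every occurrence after the first, so only the
  # top-ranked row per identifier is kept
  x[!duplicated(x[[idCol]]), , drop = FALSE]
}

res <- keepExtreme(toy, idCol = "SYMBOL", byCol = "t")
print(res)
```

With absolute = TRUE and decreasing = TRUE, each identifier keeps the row with the largest absolute statistic, whatever its sign.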
The mean, median, geoMean, and random methods provide alternative ways
of summarizing the numerical values corresponding to
redundant features, as defined by the idCol argument:
mean takes the average,
median the median,
geoMean the geometric mean,
and random selects a random value.
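As a rough illustration of these summaries, the sketch below collapses duplicated identifiers in a toy data.frame with base R's aggregate. It does not call filterRedundant and may differ from what the package does internally. The toy data and the geometricMean helper are hypothetical, and the geometric mean is computed under the assumption that all values are strictly positive.

```r
# Toy data: hypothetical identifiers with strictly positive values
toy <- data.frame(
  SYMBOL = c("A", "B", "A", "C", "B"),
  value  = c(2, 4, 8, 5, 16),
  stringsAsFactors = FALSE
)

# Hypothetical helper: geometric mean, assuming strictly positive input
geometricMean <- function(v) exp(mean(log(v)))

# One summarized value per identifier
byMean    <- aggregate(value ~ SYMBOL, data = toy, FUN = mean)
byMedian  <- aggregate(value ~ SYMBOL, data = toy, FUN = median)
byGeoMean <- aggregate(value ~ SYMBOL, data = toy, FUN = geometricMean)

# Random pick: index with sample.int() because sample(v, 1) would draw
# from 1:v when v has length one
set.seed(1)
byRandom <- aggregate(value ~ SYMBOL, data = toy,
                      FUN = function(v) v[sample.int(length(v), 1)])

print(byMean)
print(byGeoMean)
```

The arithmetic and geometric means can differ substantially when the values for one identifier span a wide range, which is one reason the choice of method depends on the nature of the values.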
A data.frame with fewer rows than the input one,
unique by the identifier specified by the idCol argument.
filterRedundant is a utility function providing various
methods to remove redundant rows from a data.frame.
The choice of the method depends on the nature of the values
and on the final goal.
Therefore, caution should be used when taking the mean
or the median across few values, or when passing arguments
with the maxORmin method (for instance, it would
make no sense to use a decreasing ordering if the ranking
statistic is a p-value).
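The p-value caveat can be made concrete with a small base R sketch. The toy data are hypothetical, and the code mimics only the order-then-deduplicate idea, not filterRedundant itself.

```r
# Toy data: hypothetical identifiers and p-values
toyP <- data.frame(
  SYMBOL = c("A", "A", "B"),
  p      = c(0.20, 0.001, 0.04),
  stringsAsFactors = FALSE
)

# Increasing order keeps the smallest p-value per identifier
inc <- toyP[order(toyP$p, decreasing = FALSE), ]
inc <- inc[!duplicated(inc$SYMBOL), ]

# Decreasing order keeps the largest p-value, i.e. the weakest evidence
dec <- toyP[order(toyP$p, decreasing = TRUE), ]
dec <- dec[!duplicated(dec$SYMBOL), ]

print(inc)
print(dec)
```

For identifier A, the increasing ordering retains the row with p = 0.001, whereas the decreasing ordering retains the row with p = 0.20.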
Luigi Marchionni <marchion@jhu.edu>
See duplicated.
###load the package and the data
library(matchBox)
data(matchBoxExpression)
###check the number of rows in each data.frame before filtering
sapply(matchBoxExpression, nrow)
###the column name for the identifiers
idCol <- "SYMBOL"
###the column name for the ranking statistics
byCol <- "t"
###use lapply to remove redundancy from all data.frames
###default method is "maxORmin"
newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant,
                                idCol=idCol, byCol=byCol)
###recheck number of rows
sapply(newMatchBoxExpression, nrow)