keepMaxStatProbe | R Documentation |
The function filters features (commonly probesets) in an ExpressionSet
object. It does not affect genes with only one feature present, or genes
without a valid annotation (see details below). For genes with multiple
probesets, the function calculates the statistic of each probeset across all
samples and keeps only the probeset with the maximum value of that
statistic. The ExpressionSet returned by the function therefore has only one
probeset matching each gene.
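The selection rule described above can be sketched in base R on a plain matrix. This is an illustration only, not the package's implementation: the helper name pick_max_stat_rows and its arguments are hypothetical, and it does not reproduce the keepNAprobes handling (rows whose gene index is NA are simply dropped by split()).

```r
# Hypothetical helper (not part of the package): keep one row per gene,
# namely the row whose row-wise statistic is largest.
pick_max_stat_rows <- function(mat, gene,
                               stat = function(x) mean(x, na.rm = TRUE)) {
  # one statistic per row (per probeset)
  row_stat <- apply(mat, 1, stat)
  # row positions grouped by gene index
  rows_by_gene <- split(seq_len(nrow(mat)), gene)
  # per gene, the position of the largest statistic; genes whose
  # statistics are all NA yield integer(0) and drop out in unlist()
  keep <- unlist(lapply(rows_by_gene,
                        function(i) i[which.max(row_stat[i])]))
  mat[sort(keep), , drop = FALSE]
}

mat <- matrix(c(1,1,3,4, 2,2,3,3, 4,5,6,7, 7,8,9,10),
              ncol = 4, byrow = TRUE,
              dimnames = list(c("1a", "1b", "2", "3"), NULL))
gene <- c(1, 1, 2, 3)

kept_by_sd   <- rownames(pick_max_stat_rows(mat, gene, stat = sd))
kept_by_mean <- rownames(pick_max_stat_rows(mat, gene, stat = mean))
```

With this toy matrix, gene 1 has two rows: "1a" has the larger standard deviation while "1b" has the larger mean, so the choice of statistic changes which row survives.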
keepMaxStatProbe(
eset,
probe.index.name,
keepNAprobes = TRUE,
stat = function(x) mean(x, na.rm = TRUE),
...
)
eset |
An |
probe.index.name |
The column name of the |
keepNAprobes |
Logical, determines whether genes without a valid index name should be kept or left out. See details below. |
stat |
Function or character, a function (or the name referring to it)
which takes a vector of numerical values, and returns one value as the
statistic, e.g. |
... |
Parameters passed to the |
Names of probesets are determined by featureNames(eset).
The probe.index.name column of the fData(eset) data.frame determines the
index of genes, for example the Entrez GeneID, to which probesets are
matched. Genes without a valid index, that is, whose index is either an
empty string or NA, are left out when keepNAprobes=FALSE. If the option is
set to TRUE, these genes are kept in the returned object.
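As a small base R illustration of what counts as an invalid index in the sense described here (empty string or NA), the following sketch flags such entries in a made-up index vector. The variable names are hypothetical, and the function's internal check may be written differently.

```r
# Made-up gene index column, as it might appear in fData(eset)
gene_index <- c("1", "", NA, "3")

# An index is invalid if it is NA or an empty string.
# is.na() is tested first so that NA == "" (which is NA) still gives TRUE.
invalid_index <- is.na(gene_index) | gene_index == ""
```

Here the second and third entries are flagged; these are the features that keepNAprobes decides to keep or drop.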
The stat function takes a vector of numerical values and should return
exactly one statistic, preferably not NA. Most statistics can be calculated
in a robust way by setting na.rm=TRUE, and this option should be used
whenever possible. Otherwise, when a probeset has one or more missing
values, its statistic will probably be NA and the probeset will be
discarded. Even worse, when all probesets matching a gene have NAs, the gene
is filtered out entirely, which is usually not desired. Therefore, set
na.rm=TRUE through the ... option (see examples below) whenever possible.
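The effect of a single missing value on a statistic can be seen in base R without the package. The wrapper robust_sd below is a hypothetical name; a function of this shape (one numeric vector in, one value out) is the kind of object the stat argument is documented to accept.

```r
# One probeset's values across four samples, with one missing value
probe_values <- c(NA, 1, 3, 4)

plain_mean  <- mean(probe_values)                # NA: the missing value propagates
robust_mean <- mean(probe_values, na.rm = TRUE)  # mean of the three observed values

# Hypothetical wrapper that makes sd() robust to missing values
robust_sd <- function(x) sd(x, na.rm = TRUE)
robust_sd_value <- robust_sd(probe_values)
```

Wrapping the statistic this way is an alternative to passing na.rm=TRUE through the ... argument.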
A filtered ExpressionSet.
Note that when the statistics of two or more probesets tie (have the same value), the choice of probeset is effectively arbitrary: the probeset whose name ranks first when the names are converted into a factor vector is kept.
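To make the ordering in this note concrete, the sketch below shows how base R orders names when they are converted to a factor: levels are the sorted unique values. It illustrates only the factor ordering, using made-up probeset names; how the function itself resolves ties internally is not shown here.

```r
# Two tied probesets, given in reverse alphabetical order
probe_names <- c("1b", "1a")

# factor() sorts the unique values to build its levels
name_levels <- levels(factor(probe_names))

# The name that ranks first among the factor levels
first_by_factor <- name_levels[1]
```

For these two names the first level is "1a", regardless of the order in which the names were supplied.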
Jitao David Zhang <jitao_david.zhang@roche.com>
library("Biobase")
example.mat <- matrix(c(1,1,3,4, 2,2,3,3, 4,5,6,7, 7,8,9,10), ncol=4, byrow=TRUE)
example.eset <- new("ExpressionSet", exprs=example.mat)
featureNames(example.eset) <- c("1a","1b","2","3")
fData(example.eset)$geneid <- c(1,1,2,3)
## keep probesets with the maximal standard deviation
example.sd <- keepMaxStatProbe(example.eset, probe.index.name="geneid", stat=sd)
featureNames(example.sd)
## keep probesets with the maximal Median Absolute Deviation (MAD)
example.mad <- keepMaxStatProbe(example.eset, probe.index.name="geneid", stat=mad)
featureNames(example.mad)
## keep probesets with the maximal mean value
example.mean <- keepMaxStatProbe(example.eset,
probe.index.name="geneid", stat=mean)
featureNames(example.mean)
## note that NA values may cause problems; it is good practice to make
## the stat function robust to NA
na.eset <- example.eset
exprs(na.eset)[1,1] <- NA
## Not run:
## prone to error
na.mean <- keepMaxStatProbe(na.eset,
probe.index.name="geneid",stat=mean)
featureNames(na.mean)
## better
na.mean.narm <- keepMaxStatProbe(na.eset,
probe.index.name="geneid",na.rm=TRUE)
featureNames(na.mean.narm)
## End(Not run)