getBestFeatures | R Documentation |
Calls limma on input data to determine features most associated with found clusters (based on an F-statistic, pairwise comparisons, or following a tree that clusters the clusters).
## S4 method for signature 'matrixOrHDF5'
getBestFeatures(
x,
cluster,
contrastType = c("F", "Dendro", "Pairs", "OneAgainstAll"),
dendro = NULL,
pairMat = NULL,
weights = NULL,
contrastAdj = c("All", "PerContrast", "AfterF"),
DEMethod = c("edgeR", "limma", "limma-voom"),
dgeArgs = NULL,
...
)
## S4 method for signature 'ClusterExperiment'
getBestFeatures(
x,
contrastType = c("F", "Dendro", "Pairs", "OneAgainstAll"),
whichCluster = "primary",
whichAssay = 1,
DEMethod,
weights = if ("weights" %in% assayNames(x)) "weights" else NULL,
...
)
x |
data for the test. Can be a numeric matrix or a
|
cluster |
A numeric vector with cluster assignments. “-1” indicates the sample was not assigned to a cluster. |
contrastType |
What type of test to do. ‘F’ gives the omnibus
F-statistic, ‘Dendro’ traverses the given dendrogram and does contrasts of
the samples in each side, ‘Pairs’ does pair-wise contrasts based on the
pairs given in pairMat (if pairMat=NULL, does all pairwise), and
‘OneAgainstAll’ compares each cluster to the average of all others. Passed
to |
dendro |
The dendrogram to traverse if contrastType="Dendro". Note that this should be the dendrogram of the clusters, not of the individual samples, either of class "dendrogram" or "phylo4" |
pairMat |
matrix giving the pairs of clusters for which to do pair-wise
contrasts (must match to elements of cl). If NULL, will do all pairwise of
the clusters in |
weights |
weights to use in by edgeR. If |
contrastAdj |
What type of FDR correction to do for contrasts tests (i.e. if contrastType='Dendro' or 'Pairs'). |
DEMethod |
character vector describing how the differential expression
analysis should be performed (replaces previous argument |
dgeArgs |
a list of arguments to pass to |
... |
If |
whichCluster |
argument that can be a single numeric or character value
indicating the single clustering to be used. Giving values that result in more than one clustering will result in an error. See details of |
whichAssay |
numeric or character specifying which assay to use. See
|
getBestFeatures returns the top ranked features corresponding to a
cluster assignment. It uses either limma or edgeR to fit the models, and
limma/edgeR functions topTable
to find the best features. See the options
of this function to put better control on what gets returned (e.g. only
if significant, only if log-fc is above a certain amount, etc.). In
particular, set 'number=' to define how many significant features to return
(where number is per contrast for the 'Pairs' or 'Dendro' option)
DEMethod
triggers what type of differential expression
analysis will be performed. Three options are available: limma, edgeR, and
limma with a voom corrections. The last two options are only appropriate
for count data. If the input x
is a ClusterExperiment
object,
and DEMethod="limma"
, then the data analyzed for DE will be
after taking the transformation of the data (as given in the
transformation slot of the object). For the options "limma-voom" and
"edgeR", the transformation slot will be ignored and only the counts data
(as specified by the whichAssay
slot) will be passed to the
programs. Note that for "limma-voom" this implies that the data will be
transformed by voom with the function log2(x+0.5). If weights
is not
NULL
, and DEMethod="edgeR"
, then the function
glmWeightedF
from the zinbwave
package is run; otherwise
glmLRT
from edgeR
.
Note that the argument DEMethod
replaces the previous option
isCount
, to decide on the method of DE.
When 'contrastType' argument implies that the best features should be found via contrasts (i.e. 'contrastType' is 'Pairs' or 'Dendro'), then then 'contrastAdj' determines the type of multiple testing correction to perform. 'PerContrast' does FDR correction for each set of contrasts, and does not guarantee control across all the different contrasts (so probably not the preferred method). 'All' calculates the corrected p-values based on FDR correction of all of the contrasts tested. 'AfterF' controls the FDR based on a hierarchical scheme that only tests the contrasts in those genes where the omnibus F statistic is significant. If the user selects 'AfterF', the user must also supply an option 'p.value' to have any effect, and then only those significant at that p.value level will be returned. Note that currently the correction for 'AfterF' is not guaranteed to control the FDR; improvements will be added in the future.
Note that the default option for topTable
is
to not filter based on adjusted p-values (p.value = 1
) and return
only the top 10 most significant (number = 10
) – these are options
the user can change (these arguments are passed via the ...
in
getBestFeatures
). In particular, it only makes sense to set
requireF = TRUE
if p.value
is meaningful (e.g. 0.1 or 0.05);
the default value of p.value = 1
will not result in any effect on
the adjusted p-value otherwise.
A data.frame
in the same format as
topTable
, or topTags
. The output
differs between these two programs, mainly in the naming of columns.
Furthermore, if weights are used, an additional column padjFilter
is
included as the result of running glmWeightedF
with
default option independentFiltering = TRUE
. The following column
names are the same between all of the DE methods.
Feature
This is the column called 'ProbeID' by
topTable
IndexInOriginal
Gives the index of the feature to the original
input dataset, x
Contrast
The contrast that the results corresponds to (if
applicable, depends on contrastType
argument)
ContrastName
The name of the contrast that the results
corresponds to. For dendrogram searches, this will be the node of the tree of
the dendrogram. If x
is a ClusterExperiment
object, this name
will make use of the user defined names of the cluster or node in x
.
InternalName
Only present if x
is a
ClusterExperiment
object. In this case this column will give the name
of the contrast using the internal ids of the clusters and nodes, not the
user-defined names. This provides stability in matching the contrast if the
user has changed the names since running getBestFeatures
P.Value
The unadjusted p-value (changed from PValue
in
topTags
)
adj.P.Val
The adjusted p-value (changed from FDR
or
FWER
in topTags
)
Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. http://nar.oxfordjournals.org/content/43/7/e47
Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. http://genomebiology.com/2014/15/2/R29
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Volume 3, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf
glmLRT
glmWeightedF
topTable
topTags
data(simData)
#create a clustering, for 8 clusters (truth was 4)
cl <- clusterSingle(simData, subsample=FALSE,
sequential=FALSE, mainClusterArgs=list(clusterFunction="pam", clusterArgs=list(k=8)))
#basic F test, return all, even if not significant:
testF <- getBestFeatures(cl, contrastType="F", number=nrow(simData),
DEMethod="limma")
#Do all pairwise, only return significant, try different adjustments:
pairsPerC <- getBestFeatures(cl, contrastType="Pairs", contrastAdj="PerContrast",
p.value=0.05, DEMethod="limma")
pairsAfterF <- getBestFeatures(cl, contrastType="Pairs", contrastAdj="AfterF",
p.value=0.05, DEMethod="limma")
pairsAll <- getBestFeatures(cl, contrastType="Pairs", contrastAdj="All",
p.value=0.05, DEMethod="limma")
#not useful for this silly example, but could look at overlap with Venn
allGenes <- paste("Row", 1:nrow(simData),sep="")
if(require(limma)){
vennC <- vennCounts(cbind(PerContrast= allGenes %in% pairsPerC$Feature,
AllJoint=allGenes %in% pairsAll$Feature, FHier=allGenes %in%
pairsAfterF$Feature))
vennDiagram(vennC, main="FDR Overlap")
}
#Do one cluster against all others
oneAll <- getBestFeatures(cl, contrastType="OneAgainstAll", contrastAdj="All",
p.value=0.05,DEMethod="limma")
#Do dendrogram testing
hcl <- makeDendrogram(cl)
allDendro <- getBestFeatures(hcl, contrastType="Dendro", contrastAdj=c("All"),
number=ncol(simData), p.value=0.05,DEMethod="limma")
# do DE on counts using voom
# compare results to if used simData instead (not on count scale).
testFVoom <- getBestFeatures(simCount, primaryCluster(cl), contrastType="F",
number=nrow(simData), DEMethod="limma-voom")
plot(testF$P.Value[order(testF$Index)],
testFVoom$P.Value[order(testFVoom$Index)],log="xy")
# do DE on counts using edgeR, compare voom
testFEdge <- getBestFeatures(simCount, primaryCluster(cl), contrastType="F",
n=nrow(simData), DEMethod="edgeR")
plot(testFVoom$P.Value[order(testFVoom$Index)],
testFEdge$P.Value[order(testFEdge$Index)],log="xy")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.