Function for finding best features associated with clusters
Description
Calls limma on input data to determine features most associated with found clusters (based on an Fstatistic, pairwise comparisons, or following a tree that clusters the clusters).
Usage
1 2 3 4 5 6 7 8 9  ## S4 method for signature 'matrix'
getBestFeatures(x, cluster, contrastType = c("F", "Dendro",
"Pairs", "OneAgainstAll"), dendro = NULL, pairMat = NULL,
contrastAdj = c("All", "PerContrast", "AfterF"), isCount = FALSE,
normalize.method = "none", ...)
## S4 method for signature 'ClusterExperiment'
getBestFeatures(x, contrastType = c("F",
"Dendro", "Pairs", "OneAgainstAll"), isCount = FALSE, ...)

Arguments
x 
data for the test. Can be a numeric matrix or a

cluster 
A numeric vector with cluster assignments. “1” indicates the sample was not assigned to a cluster. 
contrastType 
What type of test to do. ‘F’ gives the omnibus
Fstatistic, ‘Dendro’ traverses the given dendrogram and does contrasts of
the samples in each side, ‘Pairs’ does pairwise contrasts based on the
pairs given in pairMat (if pairMat=NULL, does all pairwise), and
‘OneAgainstAll’ compares each cluster to the average of all others. Passed
to 
dendro 
The dendrogram to traverse if contrastType="Dendro". Note that this should be the dendrogram of the clusters, not of the individual samples. 
pairMat 
matrix giving the pairs of clusters for which to do pairwise
contrasts (must match to elements of cl). If NULL, will do all pairwise of
the clusters in 
contrastAdj 
What type of FDR correction to do for contrasts tests (i.e. if contrastType='Dendro' or 'Pairs'). 
isCount 
logical as to whether input data is count data, in which case to perform voom correction to data. See details. 
normalize.method 
character value, passed to 
... 
options to pass to 
Details
getBestFeatures returns the top ranked features corresponding to a
cluster assignment. It uses limma to fit the models, and limma's functions
topTable
or topTableF
to find the
best features. See the options of these functions to put better control on
what gets returned (e.g. only if significant, only if logfc is above a
certain amount, etc.). In particular, set 'number=' to define how many
significant features to return (where number is per contrast for the
'Pairs' or 'Dendro' option)
When 'contrastType' argument implies that the best features should be found via contrasts (i.e. 'contrastType' is 'Pairs' or 'Dendro'), then then 'contrastAdj' determines the type of multiple testing correction to perform. 'PerContrast' does FDR correction for each set of contrasts, and does not guarantee control across all the different contrasts (so probably not the preferred method). 'All' calculates the corrected pvalues based on FDR correction of all of the contrasts tested. 'AfterF' controls the FDR based on a hierarchical scheme that only tests the contrasts in those genes where the omnibus F statistic is significant. If the user selects 'AfterF', the user must also supply an option 'p.value' to have any effect, and then only those significant at that p.value level will be returned. Note that currently the correction for 'AfterF' is not guaranteed to control the FDR; improvements will be added in the future.
Note that the default option for topTable
is
to not filter based on adjusted pvalues (p.value = 1
) and return
only the top 10 most significant (number = 10
) – these are options
the user can change (these arguments are passed via the ...
in
getBestFeatures
). In particular, it only makes sense to set
requireF = TRUE
if p.value
is meaningful (e.g. 0.1 or 0.05);
the default value of p.value = 1
will not result in any effect on
the adjusted pvalue otherwise.
isCount
triggers whether the "voom" correction will be
performed in limma
. If the input data is a matrix is counts (or a
'ClusterExperiment' object with counts as the primary data before
transformation) this should be set to TRUE and they will be logtransformed
internally by voom for the differential expression analysis in a way that
accounts for the difference in the meanvariance relationships. Otherwise,
dat should be on the correct (log) scale for differential expression
analysis without a need a variance stabilization (e.g. microarray data).
Currently the default is set to FALSE, simply because the isCount has not
been heavily tested. If the But TRUE with x
being counts really
should be the default for RNASeq data. If the input data is a
'ClusterExperiment' object, setting 'isCount=TRUE' will cause the program
to ignore the internally stored transformation function and instead use
voom with log2(x+0.5). Alternatively, 'isCount=FALSE' for a
'ClusterExperiment' object will cause the DE to be performed with 'limma'
after transforming the data with the stored transformation. Although some
writing about "voom" seem to suggest that it would be appropriate for
arbitrary transformations, the authors have cautioned against using it for
anything other than count data on mailing lists. For this reason we are not
implementing it for arbitrary transformations at this time (e.g.
log(FPKM+epsilon) transformations).
Value
A data.frame
in the same format as
topTable
, except for the following additional or
changed columns:
Feature
This is the column called 'ProbeID' bytopTable
IndexInOriginal
Gives the index of the feature to the original input dataset,x
Contrast
The contrast that the results corresponds to (if applicable, depends oncontrastType
argument)ContrastName
The name of the contrast that the results corresponds to. For dendrogram searches, this will be the node of the tree of the dendrogram.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43  data(simData)
#create a clustering, for 8 clusters (truth was 4)
cl < clusterSingle(simData, clusterFunction="pam", subsample=FALSE,
sequential=FALSE, clusterDArgs=list(k=8))
#basic F test, return all, even if not significant:
testF < getBestFeatures(cl, contrastType="F", number=nrow(simData),
isCount=FALSE)
#Do all pairwise, only return significant, try different adjustments:
pairsPerC < getBestFeatures(cl, contrastType="Pairs", contrastAdj="PerContrast",
p.value=0.05, isCount=FALSE)
pairsAfterF < getBestFeatures(cl, contrastType="Pairs", contrastAdj="AfterF",
p.value=0.05, isCount=FALSE)
pairsAll < getBestFeatures(cl, contrastType="Pairs", contrastAdj="All",
p.value=0.05, isCount=FALSE)
#not useful for this silly example, but could look at overlap with Venn
allGenes < paste("Row", 1:nrow(simData),sep="")
if(require(limma)){
vennC < vennCounts(cbind(PerContrast= allGenes %in% pairsPerC$Feature,
AllJoint=allGenes %in% pairsAll$Feature, FHier=allGenes %in%
pairsAfterF$Feature))
vennDiagram(vennC, main="FDR Overlap")
}
#Do one cluster against all others
oneAll < getBestFeatures(cl, contrastType="OneAgainstAll", contrastAdj="All",
p.value=0.05)
#Do dendrogram testing
hcl < makeDendrogram(cl)
allDendro < getBestFeatures(hcl, contrastType="Dendro", contrastAdj=c("All"),
number=ncol(simData), p.value=0.05)
# do DE on counts using voom
# compare results to if used simData instead (not on count scale).
# Again, not relevant for this silly example, but basic principle useful
testFVoom < getBestFeatures(simCount, primaryCluster(cl), contrastType="F",
number=nrow(simData), isCount=TRUE)
plot(testF$P.Value[order(testF$Index)],
testFVoom$P.Value[order(testFVoom$Index)],log="xy")
