mergeClusters | R Documentation |
Takes as input a hierarchical clustering of clusters, returns estimates of the proportion of non-null hypotheses at each node, and merges the clusters whose estimate falls below a given cutoff.
## S4 method for signature 'matrixOrHDF5'
mergeClusters(
x,
cl,
dendro = NULL,
mergeMethod = c("none", "Storey", "PC", "adjP", "locfdr", "JC"),
plotInfo = "none",
nodePropTable = NULL,
calculateAll = TRUE,
showWarnings = FALSE,
cutoff = 0.05,
plot = TRUE,
DEMethod,
logFCcutoff = 0,
weights = NULL,
...
)
## S4 method for signature 'ClusterExperiment'
mergeClusters(
x,
eraseOld = FALSE,
mergeMethod = "none",
plotInfo = "all",
clusterLabel = "mergeClusters",
leafType = c("samples", "clusters"),
plotType = c("colorblock", "name", "ids"),
plot = TRUE,
whichAssay = 1,
forceCalculate = FALSE,
weights = if ("weights" %in% assayNames(x)) "weights" else NULL,
DEMethod,
...
)
## S4 method for signature 'ClusterExperiment'
nodeMergeInfo(x)
## S4 method for signature 'ClusterExperiment'
mergeCutoff(x)
## S4 method for signature 'ClusterExperiment'
mergeMethod(x)
## S4 method for signature 'ClusterExperiment'
mergeClusterIndex(x)
## S4 method for signature 'ClusterExperiment'
eraseMergeInfo(x)
## S4 method for signature 'ClusterExperiment'
getMergeCorrespond(x, by = c("merge", "original"))
x |
data to perform the test on. It can be a matrix or a
|
cl |
A numeric vector with cluster assignments to compare to. “-1” indicates the sample was not assigned to a cluster. |
dendro |
dendrogram providing hierarchical clustering of clusters in cl.
If x is a matrix, then the default is |
mergeMethod |
method for calculating proportion of non-null that will be used to merge clusters (if 'none', no merging will be done). See details for description of methods. |
plotInfo |
what type of information about the merging will be shown on
the dendrogram. If 'all', then all the estimates of proportion non-null
will be plotted at each node of the dendrogram; if 'mergeMethod', then only
the value used in the |
nodePropTable |
Only for matrix version. Matrix of results from previous
run of |
calculateAll |
logical. Whether to calculate the estimates for all
methods. This reduces computation costs for any future calls to
|
showWarnings |
logical. Whether to show warnings given by the methods.
The 'locfdr' method in particular frequently spits out warnings (which may
indicate that its estimates are not reliable). Setting
|
cutoff |
minimum value required for NOT merging a cluster, i.e. two clusters with a proportion of DE below cutoff will be merged. Must be a value between 0 and 1, where lower values make it harder to merge clusters. |
plot |
logical as to whether to plot the dendrogram with the merge results |
DEMethod |
character vector describing how the differential expression analysis should be performed that will be used in the estimation of the percentage DE per node. See getBestFeatures for current options. See details. |
logFCcutoff |
Relevant only if the |
weights |
weights to be used by edgeR. If
... |
for signature |
eraseOld |
logical. Only relevant if input |
clusterLabel |
a string used to describe the type of clustering. By default it is equal to "mergeClusters", to indicate that this clustering is the result of a call to mergeClusters (only if x is a ClusterExperiment object) |
leafType |
if plotting, whether the leaves should be the clusters or the
samples. Choosing 'samples' allows for visualization of how many samples
are in the merged clusters (only if x is a ClusterExperiment object), which
is the main difference between choosing "clusters" and "samples",
particularly if |
plotType |
if plotting, then whether leaves of dendrogram should be labeled by rectangular blocks of color ("colorblock") or with the names of the leaves ("name") (only if x is a ClusterExperiment object). |
whichAssay |
numeric or character specifying which assay to use. See
|
forceCalculate |
This forces the function to erase previously saved merge results and recalculate the merging. |
by |
indicates whether output from |
Estimation of proportion non-null: "Storey" refers to the method of Storey (2002). "PC" refers to the method of Pounds and Cheng (2004). "JC" refers to the method of Jin and Cai (2007); the implementation of the "JC" method is copied from code available on Jiashun Jin's website as of December 16, 2015 (http://www.stat.cmu.edu/~jiashun/Research/software/NullandProp/). "locfdr" refers to the method of Efron (2004) and is implemented in the package locfdr. "adjP" refers to the proportion of genes found significant based on FDR-adjusted p-values (method "BH") and a cutoff of 0.05. Previous versions offered the method "MB", a method of Meinshausen and Buhlmann (2005), but the package howmany, which provided its implementation, is no longer supported.
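As a rough illustration of what two of these estimates compute, the base R sketch below takes a vector of per-gene p-values for a single node and returns a proportion non-null. It is not the package's code: the function names prop_adjp and prop_storey are hypothetical, the p-values are made up, the strict "<" comparison is an assumption, and the Storey-type estimate uses a single fixed lambda of 0.5, which may differ from how the package chooses it.

```r
# Illustrative sketch only; not the clusterExperiment implementation.

# "adjP"-style estimate: share of tests whose BH-adjusted p-value is
# below alpha (strict inequality is an assumption made here).
prop_adjp <- function(p, alpha = 0.05) {
  p <- p[!is.na(p)]
  mean(p.adjust(p, method = "BH") < alpha)
}

# Storey-type estimate with one fixed lambda (an assumption made here):
# pi0 = #{p > lambda} / (m * (1 - lambda)), proportion non-null = 1 - pi0.
prop_storey <- function(p, lambda = 0.5) {
  p <- p[!is.na(p)]
  pi0 <- sum(p > lambda) / (length(p) * (1 - lambda))
  1 - min(1, pi0)  # cap pi0 at 1 so the result stays in [0, 1]
}

p <- c(0.001, 0.002, 0.003, 0.5, 0.9)  # made-up p-values for one node
prop_adjp(p)
prop_storey(p)
```

Both functions return a single number between 0 and 1, which is the kind of per-node value that is compared against cutoff.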
Control of Plotting: If mergeMethod is not equal to 'none', then the plot will indicate where the clusters will be merged by drawing the edges that are merged together as dotted lines (assuming plotInfo is not 'none'). plotInfo simultaneously controls what information is plotted on the nodes and whether the dotted lines are shown for the merged clusters. Note that the choice of plotInfo (as long as it is not 'none') has no effect on how the dotted edges are drawn; they are always drawn based on mergeMethod. If you set plotInfo to a method other than mergeMethod, you will get a confusing picture in which the dotted edges are based on the clustering created by mergeMethod while the information on the nodes is based on a different method. Note that you can override plotInfo by setting show.node.label=FALSE (passed to plot.phylo), so that no information is plotted on the nodes but the dotted edges are still drawn. If you just want a plot of the dendrogram, with no merging performed or shown on the plot, see plotDendrogram.
Saving and Reusing of results: By default, the function saves the results in the ClusterExperiment object and will not recalculate them unless needed. Note that by default calculateAll=TRUE, which means that regardless of the value of mergeMethod, all the methods will be calculated and their results stored, so that no additional calculations are needed if you later change mergeMethod. Since the computationally intensive step is running the DE method on the genes, this is a substantial saving (all of the methods then calculate the proportion from those results). However, note that if calculateAll=TRUE and ANY of the methods returned NA for any value, the calculation will be redone. Thus if, for example, the locfdr function does not run successfully and returns NA, the function will recalculate every time, even if you do not specifically want the results of locfdr. In this case, it makes sense to set calculateAll=FALSE.
If the dendrogram was made with option unassignedSamples="cluster" (i.e. unassigned samples were clustered in with other samples), then you cannot choose the option leafType='samples'. This is because the current code cannot reliably link up the internal nodes of the sample dendrogram to the internal nodes of the cluster dendrogram when the unassigned samples are intermixed.
When the input is a ClusterExperiment object, the function attempts to update the merge information in that object. This is done by checking that the clustering on which the stored dendrogram was built (recorded in the slot dendro_index) is the same clustering that is recorded in the slot merge_dendrocluster_index. For this reason, new calls to makeDendrogram will erase the merge information saved in the object.
If mergeClusters is run with mergeMethod="none", the function may still calculate the proportions per node if plotInfo is not equal to "none" or calculateAll=TRUE. If the input object was a ClusterExperiment object, the resulting information will still be saved, though no new clustering is created; if there was not an existing merge method, the slot merge_dendrocluster_index will be updated.
If 'x' is a matrix, it returns (invisibly) a list with elements:
clustering: a vector of length equal to ncol(x) giving the integer-valued cluster ids for each sample. "-1" indicates the sample was not clustered.
oldClToNew: A table of the old cluster labels to the new cluster labels.
nodeProp: A table of the proportions that are DE on each node. This table is saved in the merge_nodeProp slot of a ClusterExperiment object and can be accessed along with the nodeMerge info with the nodeMergeInfo function.
nodeMerge: A table indicating for each node whether it was merged and the cluster id in the new clustering that corresponds to the node. Note that a node can be merged and not correspond to a node in the new clustering, if its ancestor node is also merged. But there must be some node that corresponds to a new cluster id if merging has been done. This table is saved in the merge_nodeMerge slot of a ClusterExperiment object and can be accessed along with the nodeProp info with the nodeMergeInfo function.
updatedClusterDendro: The dendrogram on which the merging was based (based on the original clustering).
cutoff: The cutoff value for merging.
If 'x' is a ClusterExperiment, it returns a new ClusterExperiment object with an additional clustering based on the merging. This becomes the new primary clustering. Note that even if mergeMethod="none", the returned object will erase any old merge information, update the work flow numbering, and return the newly calculated merge information.
nodeMergeInfo returns information collected about the nodes during merging as a data.frame with the following entries:
Node: Name of the node.
Contrast: The contrast compared at each node, in terms of the cluster ids.
isMerged: Logical indicating whether the samples from that node were merged into one cluster during merging.
mergeClusterId: If a node corresponds to a new, merged cluster, gives the cluster id it corresponds to. Otherwise NA.
...: The remaining columns give the estimated proportion of genes differentially expressed for each method. A column of NAs means that the method in question hasn't been calculated yet.
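To make the role of the per-method columns concrete, here is a small base R sketch that compares one method's column against a cutoff. The table is hypothetical (node names and values are made up, and the Contrast, isMerged and mergeClusterId columns are omitted), and below_cutoff is a helper invented for this illustration. It only flags nodes whose estimate falls below the cutoff; how the package resolves nested nodes when deciding the final merge is not reproduced here.

```r
# Hypothetical table loosely shaped like the method columns described above;
# node names and all values are made up for illustration.
node_info <- data.frame(
  Node   = c("NodeA", "NodeB", "NodeC"),
  adjP   = c(0.30, 0.02, 0.00),
  Storey = c(0.45, 0.10, 0.03),
  locfdr = NA_real_,  # a column of NAs: method not calculated yet
  stringsAsFactors = FALSE
)
cutoff <- 0.05

# Hypothetical helper: names of nodes whose estimate is below the cutoff.
below_cutoff <- function(tab, method, cutoff) {
  vals <- tab[[method]]
  if (all(is.na(vals))) {
    stop("method '", method, "' has not been calculated")
  }
  tab$Node[!is.na(vals) & vals < cutoff]
}

below_cutoff(node_info, "adjP", cutoff)
below_cutoff(node_info, "Storey", cutoff)
```

With these made-up numbers the two methods flag different sets of nodes, which is why the choice of mergeMethod can change the resulting clustering.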
mergeCutoff returns the cutoff used for the current merging.
mergeMethod returns the method used for the current merge.
mergeClusterIndex returns the index of the clustering used for the current merge.
eraseMergeInfo returns the object with all previously saved merge info removed.
getMergeCorrespond returns the correspondence between the merged cluster and its originating cluster. If by="original", it returns a named vector, where the names of the vector are the cluster ids of the originating clusters and the values of the vector are the cluster ids of the merged clusters. If by="merge", the results returned are organized by the merged clusters. This will generally be a list, with the names of the list equal to the clusterIds of the merged clusters and the entries equal to the clusterIds of the originating clusters. However, if no merging was done (so that the clusters are identical), the output will be a vector, as with by="original".
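The two output shapes of getMergeCorrespond can be pictured with plain base R. The sketch below does not call the package; the cluster ids are made up, and the exact types of the ids in the real output (character versus numeric) are not confirmed here.

```r
# Hypothetical correspondence in the by = "original" shape:
# names are original cluster ids, values are merged cluster ids.
by_original <- c("1" = 1, "2" = 1, "3" = 2, "4" = 3, "5" = 3)

# Regrouping by merged cluster id gives the by = "merge" shape: a list
# named by merged id, each entry holding the originating cluster ids.
by_merge <- split(names(by_original), by_original)
by_merge

# Number of original clusters folded into each merged cluster
lengths(by_merge)
```

In this made-up case, original clusters 1 and 2 were merged, cluster 3 was left alone, and clusters 4 and 5 were merged.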
Jin and Cai (2007) "Estimating the Null and the Proportion of Nonnull Effects in Large-Scale Multiple Comparisons", JASA 102: 495-506.
Efron (2004) "Large-scale simultaneous hypothesis testing: the choice of a null hypothesis", JASA 99: 96-104.
Meinshausen and Buhlmann (2005) "Lower bounds for the number of false null hypotheses for multiple testing of associations", Biometrika 92(4): 893-907.
Storey (2002) "A direct approach to false discovery rates", J. R. Statist. Soc. B 64(3): 479-498.
Pounds and Cheng (2004) "Improving false discovery rate estimation", Bioinformatics 20(11): 1737-1745.
makeDendrogram, plotDendrogram, getBestFeatures
data(simData)

# create a clustering, for 8 clusters (truth was 3)
cl <- clusterSingle(simData, subsample=FALSE,
    sequential=FALSE,
    mainClusterArgs=list(clusterFunction="pam", clusterArgs=list(k=8)))

# give more interesting names to clusters:
newNames <- paste("Cluster", clusterLegend(cl)[[1]][,"name"], sep="")
clusterLegend(cl)[[1]][,"name"] <- newNames

# make dendrogram
cl <- makeDendrogram(cl)

# plot showing the before and after clustering
# (Note argument 'use.edge.length' can improve readability)
merged <- mergeClusters(cl, plotInfo="all",
    mergeMethod="adjP", use.edge.length=FALSE, DEMethod="limma")

# Simpler plot with just dendrogram and single method
merged <- mergeClusters(cl, plotInfo="mergeMethod",
    mergeMethod="adjP", use.edge.length=FALSE, DEMethod="limma",
    leafType="clusters", plotType="name")

# compare merged to original
tableClusters(merged, whichClusters=c("mergeClusters","clusterSingle"))