Given input data, a SummarizedExperiment, or a ClusterExperiment object, this function will find clusters based on a single specification of parameters.
## S4 method for signature 'missing,matrixOrNULL'
clusterSingle(x, diss, ...)
## S4 method for signature 'matrixOrNULL,missing'
clusterSingle(x, diss, ...)
## S4 method for signature 'SummarizedExperiment,missing'
clusterSingle(x, diss, ...)
## S4 method for signature 'ClusterExperiment,missing'
clusterSingle(x,
replaceCoClustering = FALSE, ...)
## S4 method for signature 'matrixOrNULL,matrixOrNULL'
clusterSingle(x, diss, subsample = TRUE,
sequential = FALSE, mainClusterArgs = NULL, subsampleArgs = NULL,
seqArgs = NULL, isCount = FALSE, transFun = NULL,
dimReduce = c("none", "PCA", "var", "cv", "mad"), ndims = NA,
clusterLabel = "clusterSingle", checkDiss = TRUE)

x 
the data on which to run the clustering (features in rows), or a

diss 

... 
arguments to be passed on to the method for signature

replaceCoClustering 
logical. Applicable if 
subsample 
logical as to whether to subsample via

sequential 
logical whether to use the sequential strategy (see details
of 
mainClusterArgs 
list of arguments to be passed for the mainClustering step, see
help pages of 
subsampleArgs 
list of arguments to be passed to the subsampling step
(if 
seqArgs 
list of arguments to be passed to 
isCount 
logical. Whether the data are in counts, in which case the
default 
transFun 
function A function to use to transform the input data matrix before clustering. 
dimReduce 
character A character identifying what type of
dimensionality reduction to perform before clustering. Options are
"none","PCA", "var","cv", and "mad". See 
ndims 
integer An integer identifying how many dimensions to reduce to
in the reduction specified by 
clusterLabel 
a string used to describe the clustering. By default it
is equal to "clusterSingle", to indicate that this clustering is the result
of a call to 
checkDiss 
logical. Whether to check whether the input 
clusterSingle is an 'expert-oriented' function, intended to be used when a user wants to run a single clustering and/or have a great deal of control over the clustering parameters. Most users will find clusterMany more relevant. However, clusterMany makes certain assumptions about the intention of certain combinations of parameters that might not match the user's intent; similarly, clusterMany does not directly take a dissimilarity matrix but only a matrix of values x (though a user can define a distance function to be applied to x in clusterMany).
Unlike clusterMany, most of the relevant arguments for the actual clustering algorithms in clusterSingle are passed to the relevant steps via the arguments mainClusterArgs, subsampleArgs, and seqArgs. These arguments should be named lists with parameters that match the corresponding functions: mainClustering, subsampleClustering, and seqCluster. These functions are not meant to be called by the user, but rather accessed via calls to clusterSingle. However, the user can consult the help files of those functions for more information regarding the parameters that they take.
Only certain combinations of parameters are possible for certain choices of sequential and subsample. These restrictions are documented below.
clusterFunction for mainClusterArgs: The choice of subsample=TRUE also controls what algorithm type of clustering functions can be used in the mainClustering step. When subsample=TRUE, the resulting co-clustering matrix from subsampling is converted to a dissimilarity (specifically, 1 - coclustering values) and is passed to diss of mainClustering. For this reason, the ClusterFunction object given to mainClustering via the argument mainClusterArgs must take input in the form of a dissimilarity. When subsample=FALSE and sequential=TRUE, the clusterFunction passed in the clusterArgs element of mainClusterArgs must define a ClusterFunction object with algorithmType 'K'. When subsample=FALSE and sequential=FALSE, there are no restrictions on the ClusterFunction, and that clustering is applied directly to the input data.
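The conversion from co-clustering to dissimilarity described above can be illustrated in base R. This is a minimal sketch, not the package's internal code: the label matrix `labels` and the helper `coclusterMatrix` are hypothetical names, and every sample is assumed to appear in every subsample, which real subsampling does not guarantee.

```r
# Hypothetical cluster labels for 4 samples (columns) across 3 subsamples (rows).
labels <- rbind(c(1, 1, 2, 2),
                c(1, 1, 1, 2),
                c(2, 2, 1, 1))

# Proportion of subsamples in which each pair of samples shares a cluster.
# (Assumption: every sample is present in every subsample.)
coclusterMatrix <- function(labels) {
  n <- ncol(labels)
  co <- matrix(0, n, n)
  for (i in seq_len(nrow(labels))) {
    co <- co + outer(labels[i, ], labels[i, ], "==")
  }
  co / nrow(labels)
}

coclustering <- coclusterMatrix(labels)

# Dissimilarity as described in the text: 1 - coclustering.
dissFromSubsampling <- 1 - coclustering
```

A matrix of this form is what a dissimilarity-based clustering function would receive in the mainClustering step when subsample=TRUE.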
clusterFunction for subsampleArgs: If the ClusterFunction object given to the clusterArgs of subsampleArgs is missing, the algorithm will use the default for subsampleClustering (currently "pam"). If sequential=TRUE, this ClusterFunction object must be of type 'K'.
Setting k for subsampling: If subsample=TRUE and sequential=TRUE, the current K of the sequential iteration determines the 'k' argument passed to subsampleClustering, so setting 'k=' in the list given to subsampleArgs will have no effect and will produce a warning to that effect (see the documentation of seqCluster).
Setting k for mainClustering step: If sequential=TRUE, then the user should not set k in the clusterArgs argument of mainClusterArgs, because it must be set by the sequential code, which iteratively resets the parameters. Specifically, if subsample=FALSE, then the sequential method iterates over choices of k to cluster the input data. If subsample=TRUE, then the clustering in the mainClustering step (assuming the clustering function is of type 'K') will use the k used in the subsampling step, to make sure that the k used in the mainClustering step is reasonable.
Setting findBestK in mainClusterArgs: If sequential=TRUE and subsample=FALSE, the user should not set 'findBestK=TRUE' in mainClusterArgs. This is because in this case the sequential method changes k; an error message will be given if this combination of options is set. However, if sequential=TRUE and subsample=TRUE, then passing either 'findBestK=TRUE' or 'findBestK=FALSE' via mainClusterArgs will function as expected (assuming the clusterFunction argument passed to mainClusterArgs is of type 'K'). In particular, the sequential step will set the number of clusters k for clustering of each subsample. If findBestK=FALSE, that same k will be used for the mainClustering step that clusters the resulting co-occurrence matrix after subsampling. If findBestK=TRUE, then mainClustering will search for the best k. Note that the default 'kRange' over which mainClustering searches when findBestK=TRUE depends on the input value of k (which is set by the sequential method if sequential=TRUE; see above). The user can change kRange to not depend on k and to be fixed across all of the sequential steps by setting kRange explicitly in the mainClusterArgs list.
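The restrictions on k and findBestK described above can be summarized as a small checklist. The sketch below is only an illustration in base R: `checkSingleArgs` is a hypothetical helper that is not part of the package, the message strings are invented for the example, and it assumes that 'k' may appear either directly in subsampleArgs or in its clusterArgs element.

```r
# Hypothetical helper (not part of clusterExperiment) that restates the
# documented restrictions on k and findBestK as checks on the argument lists.
checkSingleArgs <- function(subsample, sequential,
                            mainClusterArgs = list(), subsampleArgs = list()) {
  notes <- character(0)
  # Use [[ ]] for exact name matching ($ would partially match, e.g. kRange).
  mainK <- mainClusterArgs[["clusterArgs"]][["k"]]
  subK <- c(subsampleArgs[["k"]], subsampleArgs[["clusterArgs"]][["k"]])
  findBestK <- isTRUE(mainClusterArgs[["findBestK"]])

  if (sequential && !is.null(mainK)) {
    notes <- c(notes, "k in mainClusterArgs should not be set: the sequential code sets it")
  }
  if (sequential && subsample && length(subK) > 0) {
    notes <- c(notes, "k in subsampleArgs is ignored (a warning is produced)")
  }
  if (sequential && !subsample && findBestK) {
    notes <- c(notes, "findBestK=TRUE is not allowed (an error is given)")
  }
  notes
}

# Example: sequential clustering without subsampling, but with findBestK=TRUE.
checkSingleArgs(subsample = FALSE, sequential = TRUE,
                mainClusterArgs = list(clusterFunction = "pam", findBestK = TRUE))
```

An empty result means that none of the three documented conflicts applies; the helper does not check the algorithm type of the clustering function.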
To provide a distance matrix via the argument distFunction, the function must be defined to take the distance of the rows of a matrix (internally, the function will call distFunction(t(x))). This is to be compatible with the input for the dist function. as.matrix will be performed on the output of distFunction, so if the object returned has an as.matrix method that will convert the output into a symmetric matrix of distances, this is fine (for example, the class dist for objects returned by dist has such a method). If distFunction=NA, then a default distance will be calculated based on the type of clustering algorithm of clusterFunction. For type "K" the default is to take dist as the distance function. For type "01", the default is to take (1-cor(x))/2.
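The row convention for distance functions can be checked in base R. The following is a minimal sketch under stated assumptions: the data matrix `x` and the function `manhattanDist` are hypothetical, and the calls only mimic the distFunction(t(x)) pattern described above rather than reproducing the package internals.

```r
# Small hypothetical data matrix: 5 features (rows) by 4 samples (columns).
set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)

# A user-supplied distance function must work on the rows of its input,
# like stats::dist; per the text it would be called as distFunction(t(x)).
manhattanDist <- function(m) dist(m, method = "manhattan")
dManhattan <- as.matrix(manhattanDist(t(x)))  # 4 x 4, one row per sample

# Defaults described in the text.
dTypeK <- as.matrix(dist(t(x)))  # type "K": Euclidean distance between samples
dType01 <- (1 - cor(x)) / 2      # type "01": correlation-based, values in [0, 1]
```

Because x has features in rows, transposing before calling dist gives distances between samples, while cor(x) already works on the columns (samples) and needs no transpose.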
A ClusterExperiment object if the input x was a matrix (or an assay of a ClusterExperiment or SummarizedExperiment object).
If the input was diss, then the result is a list with values:
clustering: The vector of clustering results
clusterInfo: A list with information about the parameters run in the clustering
diss: The dissimilarity matrix used in the clustering
clusterMany to compare multiple choices of parameters, and mainClustering, subsampleClustering, and seqCluster for the underlying functions called by clusterSingle.
data(simData)
## Not run:
#following code takes some time.
#use clusterSingle to do sequential clustering
#(same as example in seqCluster only using clusterSingle ...)
clusterFunction="hierarchical01",clusterArgs=list(alpha=0.1)))
## End(Not run)
#use clusterSingle to do just clustering k=3 with no subsampling
clustNothing <- clusterSingle(simData,
subsample=FALSE, sequential=FALSE, mainClusterArgs=list(clusterFunction="pam",
clusterArgs=list(k=3)))
#compare to standard pam
cluster::pam(t(simData),k=3,cluster.only=TRUE)
