For large datasets, we can perform vector quantization (e.g., with k-means clustering) to create centroids. These centroids are then subjected to a slower clustering technique such as graph-based community detection. The label for each cell is set to the label of the centroid to which it was assigned.
TwoStepParam(first = KmeansParam(centers = sqrt), second = NNGraphParam())

## S4 method for signature 'ANY,TwoStepParam'
clusterRows(x, BLUSPARAM, full = FALSE)

first 
A BlusterParam object specifying a fast vector quantization technique. 
second 
A BlusterParam object specifying the second clustering technique on the centroids. 
x 
A numeric matrix-like object where rows represent observations and columns represent variables. 
BLUSPARAM 
A TwoStepParam object. 
full 
Logical scalar indicating whether the clustering statistics from both steps should be returned. 
Here, the idea is to use a fast clustering algorithm to perform vector quantization and reduce the size of the dataset, followed by a slower algorithm that aggregates the centroids for easier interpretation. The exact choice of the number of clusters is less relevant to the first clustering step as long as not too many centroids are generated but the clusters are still sufficiently granular. The second step can take more care (and computational time) summarizing the centroids into meaningful “meta-clusters”.
The default choice is to use k-means for the first step, with the number of clusters set to the square root of the number of observations; and graph-based clustering for the second step, which automatically detects a suitable number of clusters. K-means also eliminates density differences in the data that can introduce variable resolution from graph-based methods.
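The two-step idea can be sketched with base R alone. This is an illustration under stated assumptions, not the package's implementation: hierarchical clustering is swapped in for graph-based community detection in the second step, the number of centroids is rounded up with ceiling(), and the choice of 5 meta-clusters is arbitrary.

```r
set.seed(42)                             # k-means starts from random centres
m <- matrix(runif(10000), ncol = 10)     # 1000 observations, 10 variables

# Step 1: vector quantization with k-means, using about sqrt(n) centroids.
k1 <- ceiling(sqrt(nrow(m)))
km <- kmeans(m, centers = k1, iter.max = 50)

# Step 2: a slower method applied to the centroids only.
# Hierarchical clustering stands in for graph-based clustering here.
hc <- hclust(dist(km$centers), method = "ward.D2")
meta <- cutree(hc, k = 5)                # 5 is an arbitrary illustrative choice

# Propagation: each observation takes the label of its assigned centroid.
labels <- factor(unname(meta[km$cluster]))
table(labels)
```

Only the 32 centroids reach the second step, so its cost no longer depends directly on the number of observations.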
To modify an existing TwoStepParam object x, users can simply call x[[i]] or x[[i]] <- value where i is any argument used in the constructor.
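As a brief sketch of this accessor pattern, the snippet below reads and replaces the two constructor arguments, first and second. The specific values (50 centroids, k = 5 neighbours) are arbitrary choices for illustration and assume the bluster package is installed.

```r
library(bluster)

p <- TwoStepParam()
p[["first"]]                              # inspect the first-step parameters

# Use a fixed number of k-means centroids instead of the sqrt default.
p[["first"]] <- KmeansParam(centers = 50)

# Use a smaller neighbourhood when building the graph on the centroids.
p[["second"]] <- NNGraphParam(k = 5)
p
```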
The TwoStepParam constructor will return a TwoStepParam object with the specified parameters.
The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments. If full=TRUE, a list is returned with a clusters factor and an objects list containing:
first, a list of objects from the first clustering step. This is equal to the objects list in the output of clusterRows with the first BlusterParam.
centroids, a numeric matrix of centroids generated from the first clustering step.
second, a list of objects from the second clustering step on the centroids. This is equal to the objects list in the output of clusterRows with the second BlusterParam.
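The returned structure can be explored as in the following sketch, which assumes the bluster package is installed and uses the default parameters. The seed and the simulated data are arbitrary; no particular cluster counts are implied.

```r
library(bluster)

set.seed(100)
m <- matrix(runif(100000), ncol = 10)     # 10000 observations, 10 variables
out <- clusterRows(m, TwoStepParam(), full = TRUE)

table(out$clusters)           # final label for each observation
names(out$objects)            # components described above
dim(out$objects$centroids)    # centroids produced by the first step
```

The objects list is mainly useful for diagnostics, for example checking how many centroids the first step produced before they were aggregated.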
Aaron Lun
library(bluster)

# 10000 observations (rows) by 10 variables (columns).
m <- matrix(runif(100000), ncol=10)
stuff <- clusterRows(m, TwoStepParam())
table(stuff)
