clusterSweep: Clustering parameter sweeps

View source: R/clusterSweep.R

Description

Perform a sweep across combinations of parameters to obtain different clusterings from the same algorithm.

Usage

clusterSweep(
  x,
  BLUSPARAM,
  ...,
  full = FALSE,
  BPPARAM = SerialParam(),
  args = list()
)

Arguments

x

A numeric matrix-like object where rows represent observations and columns represent variables.

BLUSPARAM

A BlusterParam object specifying the algorithm to use.

...

Named vectors or lists specifying the parameters to sweep over.

full

Logical scalar indicating whether the full clustering output (the objects produced by each run) should be returned for each parameter combination.

BPPARAM

A BiocParallelParam specifying how the sweep should be parallelized.

args

A named list of additional parameters to sweep over, used together with those supplied through the ... argument. This is provided in case a parameter name conflicts with an existing argument in this function's signature.
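As a usage sketch of args, the call below supplies the k values through args instead of ..., while cluster.fun is still passed through .... This assumes bluster is installed and that, as described above, entries in args are swept together with those in ...; the specific k values are arbitrary.

```r
library(bluster)

# 'k' is supplied via 'args' rather than '...'; 'cluster.fun' goes through '...'.
# Assumption: both sets of parameters are combined into one sweep (2 x 2 = 4 runs).
out <- clusterSweep(iris[, 1:4], NNGraphParam(),
    cluster.fun = c("louvain", "walktrap"),
    args = list(k = c(10L, 20L)))

ncol(out$clusters)  # one column per parameter combination
out$parameters      # which k / cluster.fun produced each column
```

The same route works for any parameter whose name would otherwise clash with x, BLUSPARAM, full, BPPARAM or args itself.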

Details

This function allows users to conveniently test a range of clustering parameters in a single call. The name of each argument in ... should be a valid parameter of BLUSPARAM (i.e., a legitimate i in BLUSPARAM[[i]]), and its values will be used to modify the existing setting in BLUSPARAM to obtain a new set of parameters. (For all other parameters, the existing values in BLUSPARAM are used.) If multiple arguments are provided, all combinations are tested.
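To make the "all combinations" rule concrete, the base R sketch below enumerates the grid implied by the first example on this page (centers=4:10 with two algorithm values). It only illustrates how many runs to expect; it is not bluster's code, and the order in which clusterSweep visits the combinations may differ from the row order of expand.grid().

```r
# Illustrative sketch only: enumerate every combination of two swept
# parameters with base R's expand.grid().
sweep <- list(
  centers = 4:10,
  algorithm = c("Lloyd", "Hartigan-Wong")
)
combos <- expand.grid(sweep, stringsAsFactors = FALSE)

nrow(combos)     # 7 values of centers x 2 algorithms = 14 runs
head(combos, 3)  # first few parameter combinations
```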

We attempt to create a unique name for each column based on its parameter combination, using the format <NAME1>.<VALUE1>_<NAME2>.<VALUE2>_... built from the parameter names and values. Note that any non-atomic values are simply represented by the name of their class; no attempt is made to convert these into a compact string.
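The naming scheme can be approximated with a small helper. make_sweep_name() below is hypothetical and not part of bluster; it is a sketch of the documented format, and it simplifies by reducing anything that is not a single atomic value to its class name (the documentation only guarantees this for non-atomic values).

```r
# Hypothetical helper (not part of bluster): build a column name in the
# <NAME1>.<VALUE1>_<NAME2>.<VALUE2> format for one parameter combination.
make_sweep_name <- function(params) {
  parts <- vapply(names(params), function(nm) {
    val <- params[[nm]]
    # Single atomic values are printed directly; anything else is reduced
    # to its class name (a simplification of the documented behaviour).
    label <- if (is.atomic(val) && length(val) == 1L) {
      as.character(val)
    } else {
      class(val)[1L]
    }
    paste0(nm, ".", label)
  }, character(1))
  paste(parts, collapse = "_")
}

make_sweep_name(list(centers = 4L, algorithm = "Lloyd"))
# returns "centers.4_algorithm.Lloyd"
make_sweep_name(list(k = 10L, cluster.args = list(steps = 3L)))
# returns "k.10_cluster.args.list"
```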

If an entry of ... is a named list of vectors, we expand it to generate all possible combinations of its values. For example, if we passed:

    blah.args = list(a = 1:5, b = LETTERS[1:3])

This would be equivalent to manually specifying:

    blah.args = list(list(a = 1, b = "A"), list(a = 1, b = "B"), ...)
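The equivalence above can be reproduced in base R to see exactly which lists are generated. This is a sketch, not bluster's implementation; note that expand.grid() varies the first element fastest, so its order (a = 1, 2, ... before b changes) differs from the order written out above.

```r
# Sketch (not bluster's code): turn a named list of vectors into the
# equivalent list of per-combination lists.
blah.args <- list(a = 1:5, b = LETTERS[1:3])
grid <- expand.grid(blah.args, stringsAsFactors = FALSE)

# One named list per row of the grid, e.g. list(a = 1L, b = "A").
expanded <- lapply(seq_len(nrow(grid)), function(i) lapply(grid, `[[`, i))

length(expanded)  # 5 x 3 = 15 combinations
expanded[[1]]
```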

The auto-expansion mechanism allows us to conveniently test parameter combinations when those parameters are stored as a list inside BLUSPARAM (e.g., cluster.args in NNGraphParam). The algorithm is recursive, so any internal named lists containing vectors are similarly expanded. Expansion can be disabled by wrapping vectors in I(), in which case they are passed verbatim. No expansion is performed for non-vector arguments.
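The recursion and the effect of I() can be illustrated with a hypothetical expander. expand_options() is a sketch under stated assumptions, not bluster's internal function: named lists are expanded into the Cartesian product of their elements, plain atomic vectors contribute one option per element, and I()-wrapped values are kept whole. Treating unnamed lists as a single verbatim option is an assumption of this sketch.

```r
# Hypothetical recursive expander (illustration only, NOT bluster's code).
# Returns a list of options for one parameter value.
expand_options <- function(val) {
  if (inherits(val, "AsIs")) {
    # Wrapped in I(): keep the whole value as one verbatim option.
    list(val)
  } else if (is.list(val) && !is.null(names(val))) {
    # Named list: expand each element recursively, then take the
    # Cartesian product of the per-element options.
    per_elem <- lapply(val, expand_options)
    idx <- expand.grid(lapply(per_elem, seq_along))
    lapply(seq_len(nrow(idx)), function(i) {
      combo <- lapply(names(per_elem), function(nm) per_elem[[nm]][[idx[[nm]][i]]])
      names(combo) <- names(per_elem)
      combo
    })
  } else if (is.atomic(val)) {
    # Plain atomic vector: each element becomes a separate option.
    as.list(val)
  } else {
    # Assumption: other values (functions, unnamed lists) pass through as one option.
    list(val)
  }
}

# Mirrors cluster.args=list(steps=3:4) from the examples: two options.
expand_options(list(steps = 3:4))

# Nested named lists are expanded too: 2 x 2 = 4 options.
length(expand_options(list(x = 1:2, inner = list(y = c("p", "q")))))

# I() suppresses expansion of 'a', so only 'b' is swept: 2 options.
length(expand_options(list(a = I(1:3), b = 1:2)))
```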

Value

A List containing:

  • clusters, a DataFrame with number of rows equal to that of x, where each column corresponds to (and is named after) a specific combination of clustering parameters.

  • parameters, another DataFrame with number of rows equal to the number of columns in the previous clusters DataFrame. Each row contains the specific parameter combination for each column of clusters.

  • If full=TRUE, objects is an additional list of length equal to the number of columns in clusters (i.e., one entry per parameter combination). This contains the objects produced by each run.
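The shapes described above can be checked on a small sweep. This sketch assumes bluster is installed; the k values are arbitrary, and the number of clusters found by each run is printed rather than stated because it depends on the data and algorithm.

```r
library(bluster)

# Small sweep: 2 values of k x 2 community detection algorithms = 4 runs.
out <- clusterSweep(iris[, 1:4], NNGraphParam(),
    k = c(10L, 20L), cluster.fun = c("louvain", "walktrap"), full = TRUE)

dim(out$clusters)     # one row per observation, one column per run
nrow(out$parameters)  # one row per run
length(out$objects)   # one entry per run (only present with full=TRUE)

# Number of clusters produced by each parameter combination.
vapply(as.list(out$clusters), function(cl) length(unique(cl)), integer(1))
```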

Author(s)

Aaron Lun

See Also

clusterRows, which manages the dispatch to specific methods based on BLUSPARAM.

BlusterParam, which determines which algorithm is actually used.

Examples

out <- clusterSweep(iris[,1:4], KmeansParam(10), 
    centers=4:10, algorithm=c("Lloyd", "Hartigan-Wong"))
out$clusters[,1:5]
out$parameters

out <- clusterSweep(iris[,1:4], NNGraphParam(), k=c(5L, 10L, 15L, 20L),
    cluster.fun=c("louvain", "walktrap"))
out$clusters[,1:5]
out$parameters

# Combinations are automatically expanded inside named lists:
out <- clusterSweep(iris[,1:4], NNGraphParam(), k=c(5L, 10L, 15L, 20L), 
    cluster.args=list(steps=3:4))
out$clusters[,1:5]
out$parameters


LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.