ClaraParam-class: Clustering Large Applications

ClaraParam-classR Documentation

Clustering Large Applications

Description

Run the CLARA algorithm, an extension of the PAM method for large datasets.

Usage

ClaraParam(
  centers,
  metric = NULL,
  stand = NULL,
  samples = NULL,
  sampsize = NULL
)

## S4 method for signature 'ANY,ClaraParam'
clusterRows(x, BLUSPARAM, full = FALSE)

Arguments

centers

An integer scalar specifying the number of centers. Alternatively, a function that takes the number of observations and returns the number of centers.

metric, stand, samples, sampsize

Further arguments to pass to clara. Set to the function defaults if not supplied.

x

A numeric matrix-like object where rows represent observations and columns represent variables.

BLUSPARAM

A ClaraParam object.

full

Logical scalar indicating whether the full PAM statistics should be returned.

Details

This class usually requires the user to specify the number of clusters beforehand. However, we can also allow the number of clusters to vary as a function of the number of observations. The latter is occasionally useful, e.g., to allow the clustering to automatically become more granular for large datasets.

To modify an existing ClaraParam object x, users can simply call x[[i]] or x[[i]] <- value where i is any argument used in the constructor.

Note that clusterRows will always use rngR=TRUE, for greater consistency with other algorithms of the FixedNumberParam class; and pamLike=TRUE, for consistency with the PAM implementation from which it was derived.

Value

The ClaraParam constructor will return a ClaraParam object with the specified parameters.

The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments. If full=TRUE, a list is returned with clusters (the factor, as above) and objects (a list containing clara, the direct output of clara).

Author(s)

Aaron Lun

See Also

clara, which actually does all the heavy lifting.

PamParam, for the original PAM algorithm.

Examples

clusterRows(iris[,1:4], ClaraParam(centers=4))
clusterRows(iris[,1:4], ClaraParam(centers=4, sampsize=50))
clusterRows(iris[,1:4], ClaraParam(centers=sqrt))

LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.