CARP: Compute 'CARP' (Convex Clustering) Solution Path

Description Usage Arguments Value Examples

View source: R/carp.R

Description

CARP returns a fast approximation to the Convex Clustering solution path along with visualizations such as dendrograms and cluster paths. CARP solves the Convex Clustering problem via an efficient Algorithmic Regularization scheme.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
CARP(
  X,
  ...,
  weights = sparse_rbf_kernel_weights(k = "auto", phi = "auto", dist.method =
    "euclidean", p = 2),
  labels = rownames(X),
  X.center = TRUE,
  X.scale = FALSE,
  back_track = FALSE,
  exact = FALSE,
  norm = 2,
  t = 1.05,
  npcs = min(4L, NCOL(X), NROW(X)),
  dendrogram.scale = NULL,
  impute_func = function(X) {     if (anyNA(X))          missForest(X)$ximp     else X
    },
  status = (interactive() && (clustRviz_logger_level() %in% c("MESSAGE", "WARNING",
    "ERROR")))
)

Arguments

X

The data matrix (X): rows correspond to the observations (to be clustered) and columns to the variables (which will not be clustered). If X has missing values - NA or NaN values - they will be automatically imputed.

...

Unused arguements. An error will be thrown if any unrecognized arguments as given. All arguments other than X must be given by name.

weights

One of the following:

  • A function which, when called with argument X, returns an b-by-n matrix of fusion weights.

  • A matrix of size n-by-n containing fusion weights

labels

A character vector of length n: observations (row) labels

X.center

A logical: Should X be centered columnwise?

X.scale

A logical: Should X be scaled columnwise?

back_track

A logical: Should back-tracking be used to exactly identify fusions? By default, back-tracking is not used.

exact

A logical: Should the exact solution be computed using an iterative algorithm? By default, algorithmic regularization is applied and the exact solution is not computed. Setting exact = TRUE often significantly increases computation time.

norm

Which norm to use in the fusion penalty? Currently only 1 and 2 (default) are supported.

t

A number greater than 1: the size of the multiplicative update to the cluster fusion regularization parameter (not used by back-tracking variants). Typically on the scale of 1.005 to 1.1.

npcs

An integer >= 2. The number of principal components to compute for path visualization.

dendrogram.scale

A character string denoting how the scale of dendrogram regularization proportions should be visualized. Choices are 'original' or 'log'; if not provided, a data-driven heuristic choice is used.

impute_func

A function used to impute missing data in X. By default, the missForest function from the package of the same name is used. This provides a flexible potentially non-linear imputation function. This function has to return a data matrix with no NA values. Note that, consistent with base R, both NaN and NA are treaded as "missing values" for imputation.

status

Should a status message be printed to the console?

Value

An object of class CARP containing the following elements (among others):

Examples

1
2
3
carp_fit <- CARP(presidential_speech[1:10,1:4])
print(carp_fit)
plot(carp_fit)

jjn13/clustRviz documentation built on Sept. 1, 2020, 7:53 a.m.