fast_clara_jaccard: Fast CLARA-like clustering using Jaccard dissimilarity

In MSCA: Unsupervised Clustering of Multiple Censored Time-to-Event Endpoints

View source: R/fast_clara_jaccard.R

Description

Implements a CLARA (Clustering Large Applications) strategy using Jaccard dissimilarity computed on individual patients' state matrices. The algorithm repeatedly draws random subsets of patients, runs PAM (Partitioning Around Medoids) clustering on each subset, and keeps the set of medoids that minimises the total dissimilarity across the full dataset. Final assignments are made by mapping every patient to the nearest selected medoid.
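The steps above can be sketched in base R as follows. This is a minimal, hypothetical illustration, not the MSCA implementation: it assumes `data` is a 0/1 matrix with one column per patient, uses the standard set-based Jaccard dissimilarity, and swaps in a naive PAM (greedy BUILD followed by SWAP) in place of fastpam. The helper names `jaccard_to`, `simple_pam` and `clara_jaccard_sketch` are made up for this example, and the `frac` option is left out.

```r
# Hypothetical CLARA-style sketch with Jaccard dissimilarity (not MSCA code).
# Assumes `data` is a 0/1 matrix with one column per patient.

# Jaccard dissimilarity between every column of X and every column of Y
jaccard_to <- function(X, Y) {
  X <- (X != 0) * 1
  Y <- (Y != 0) * 1
  inter <- crossprod(X, Y)                             # |A intersect B|
  union <- outer(colSums(X), colSums(Y), "+") - inter  # |A union B|
  d <- 1 - inter / union
  d[union == 0] <- 0                                   # two empty columns count as identical
  d
}

# Naive PAM on a square dissimilarity matrix D: greedy BUILD, then SWAP
simple_pam <- function(D, k) {
  n <- nrow(D)
  total <- function(m) sum(apply(D[, m, drop = FALSE], 1, min))
  med <- which.min(rowSums(D))
  while (length(med) < k) {                            # BUILD: add the best new medoid
    cand <- setdiff(seq_len(n), med)
    costs <- vapply(cand, function(j) total(c(med, j)), numeric(1))
    med <- c(med, cand[which.min(costs)])
  }
  best <- total(med)
  repeat {                                             # SWAP: accept any improving swap
    improved <- FALSE
    for (i in seq_along(med)) {
      for (j in setdiff(seq_len(n), med)) {
        trial <- med
        trial[i] <- j
        cost <- total(trial)
        if (cost < best - 1e-12) {
          med <- trial
          best <- cost
          improved <- TRUE
        }
      }
    }
    if (!improved) break
  }
  med
}

clara_jaccard_sketch <- function(data, k, samples = 20, samplesize = NULL, seed = 123) {
  n <- ncol(data)
  if (is.null(samplesize)) samplesize <- min(50 + 5 * k, n)
  set.seed(seed)
  best <- list(cost = Inf)
  for (s in seq_len(samples)) {
    idx <- sort(sample.int(n, samplesize))             # patients in this sample
    sub <- data[, idx, drop = FALSE]
    med <- idx[simple_pam(jaccard_to(sub, sub), k)]    # medoids as columns of `data`
    d_full <- jaccard_to(data, data[, med, drop = FALSE])  # all patients vs medoids
    cost <- sum(apply(d_full, 1, min))
    if (cost < best$cost) {                            # keep the cheapest medoid set
      best <- list(clustering = max.col(-d_full, ties.method = "first"),
                   medoids = med, sample = idx, cost = cost)
    }
  }
  best
}
```

The returned list mirrors the components described under Value; the naive SWAP step is only practical because each PAM run sees a small sample rather than the whole population.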

Usage

fast_clara_jaccard(
  data,
  k,
  samples = 20,
  samplesize = NULL,
  seed = 123,
  frac = 1
)

Arguments

`data`	A state matrix of censored time-to-event indicators, with one column per patient, as computed by the `make_state_matrix` function.
`k`	Number of clusters to return.
`samples`	Number of random samples drawn from the analysed population (default: 20).
`samplesize`	Number of patients per sample (default: min(50 + 5 * k, ncol(data))).
`seed`	Random seed for reproducibility (default: 123).
`frac`	Fraction of the population to use for cost computation (default: 1).
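The documented default for `samplesize` is simple arithmetic, illustrated here in R. The helper name `default_samplesize` is hypothetical, and `n_patients` stands for `ncol(data)`.

```r
# Default number of patients drawn per sample, as documented for `samplesize`
default_samplesize <- function(k, n_patients) {
  min(50 + 5 * k, n_patients)
}

default_samplesize(k = 3, n_patients = 1000)  # 65
default_samplesize(k = 3, n_patients = 40)    # 40: capped at the population size
```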

Details

This implementation adapts the original CLARA method described by Kaufman and Rousseeuw (1990) in "Finding Groups in Data: An Introduction to Cluster Analysis".

Value

A list containing the cluster assignments, the medoid indices, the indices of the sampled patients, and the cost:

clustering: An integer vector of cluster assignments for each patient.
medoids: Indices of the medoids associated with the lowest cost.
sample: Indices of the sampled columns used in clustering.
cost: Total cost (sum of dissimilarities to assigned medoids).
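Written out, the cost can be expressed as below, assuming the usual set-based Jaccard dissimilarity. Here S_i denotes the set of non-zero entries in patient i's column of the state matrix (an assumption about the encoding, not taken from the MSCA source), M is the set of selected medoids and n the number of patients over which the cost is computed. Each patient is assigned to the medoid that attains the inner minimum.

```latex
d_J(i, j) = 1 - \frac{|S_i \cap S_j|}{|S_i \cup S_j|},
\qquad
\mathrm{cost}(M) = \sum_{i=1}^{n} \min_{m \in M} d_J(i, m)
```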

Note

To improve efficiency, the function uses the fastpam procedure from the fastkmedoids package together with an optimized Jaccard index computation. For simulation purposes, the frac parameter can be used to reduce computation time when computing the cost of each sample. The final cost is then reported for the medoids that achieved the lowest cost on the fractional data. A final analysis using the proper CLARA method should be conducted with frac set to 1.
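To illustrate the role of `frac`, the sketch below evaluates the cost of a candidate medoid set on a random subset of patients only. It is a hypothetical helper (`frac_cost` is not part of MSCA), assumes 0/1 data with one column per patient, and assumes the subset size is rounded up as `ceiling(frac * ncol(data))`; how MSCA actually rounds or draws the subset is not stated here.

```r
# Hypothetical helper: cost of `medoids` (column indices of `data`) evaluated on
# a random fraction of the patients. Assumes 0/1 data, one column per patient.
frac_cost <- function(data, medoids, frac = 1, seed = 123) {
  n <- ncol(data)
  set.seed(seed)                                   # same subset for every call with this seed
  keep <- if (frac < 1) sample.int(n, max(1, ceiling(frac * n))) else seq_len(n)
  X <- (data[, keep, drop = FALSE] != 0) * 1
  M <- (data[, medoids, drop = FALSE] != 0) * 1
  inter <- crossprod(X, M)                         # shared non-zero entries
  union <- outer(colSums(X), colSums(M), "+") - inter
  d <- ifelse(union == 0, 0, 1 - inter / union)    # Jaccard dissimilarity, kept patients x medoids
  sum(apply(d, 1, min))                            # each kept patient counts its nearest medoid
}
```

Reusing the same seed draws the same subset for every candidate medoid set, which keeps the fractional costs comparable across samples; with `frac = 1` the helper falls back to the full population.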

References

Kaufman, L. & Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.

MSCA documentation built on June 8, 2025, 10:52 a.m.
