R package for sequence clustering

This package works with the TraMineR library, which provides the LCS method for distance calculations.
To use this package in R, run the following lines:
install.packages("devtools")
library(devtools)
install_github("Ltochon/CLARA.seq")
If the vignettes have not been built and you cannot see the CLARA.seq vignette, use this line:
devtools::install(build_vignettes = TRUE)
This package contains 3 different algorithms:
- CLARA
- CLARANS
- CLARA-FUZZY
And a quality index:
- Davies-Bouldin Index
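To make the quality index concrete, here is a minimal sketch of a Davies-Bouldin computation on a plain distance matrix. It assumes a medoid-based variant (the classical definition uses centroids), and `davies_bouldin()` is a hypothetical helper written for illustration, not a function exported by CLARA.seq, whose implementation may differ.

```r
# Davies-Bouldin index for a medoid-based partition (lower is better).
# Assumption: medoids stand in for the centroids of the classical definition.
# d: full symmetric distance matrix; medoids: row indices of the medoids;
# clustering: integer vector giving, for each row, the position of its medoid.
davies_bouldin <- function(d, medoids, clustering) {
  k <- length(medoids)
  # Scatter S_i: mean distance from the members of cluster i to its medoid.
  scatter <- vapply(seq_len(k), function(i) {
    mean(d[clustering == i, medoids[i]])
  }, numeric(1))
  # For each cluster, the largest ratio (S_i + S_j) / d(m_i, m_j) over j != i.
  worst <- vapply(seq_len(k), function(i) {
    others <- setdiff(seq_len(k), i)
    max((scatter[i] + scatter[others]) / d[medoids[i], medoids[others]])
  }, numeric(1))
  mean(worst)
}

# Toy example: six points on a line forming two obvious groups.
x <- c(0, 1, 2, 10, 11, 12)
d <- as.matrix(dist(x))
db <- davies_bouldin(d, medoids = c(2, 5), clustering = c(1, 1, 1, 2, 2, 2))
print(db)
```

Compact, well-separated clusters give a small value, so when comparing several clusterings of the same data the one with the lowest index is usually preferred.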
The TraMineR package limits the number of sequences that can be processed at once (about 46,300). To work around this limit, the algorithms in this package cluster subsets of the entire dataset and keep the best clustering found for the full dataset. So far, a dataset with 227,000 sequences has been tested and successfully clustered with the CLARA algorithm.
WARNING: The CLARA algorithm is the most efficient one. CLARANS and CLARA-FUZZY are still in the testing phase.
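The subset idea can be sketched as follows. This is an illustration only, under stated assumptions: a simple Hamming distance stands in for TraMineR's LCS distance, `pam()` from the `cluster` package does the clustering of each subset, and `make_seqs()`, `hamming()` and `clara_sketch()` are hypothetical helpers, not functions from CLARA.seq.

```r
# Sketch of the CLARA idea: cluster small random subsets, keep the best medoids.
# Assumptions: Hamming distance replaces the LCS distance, and none of the
# helpers below belong to CLARA.seq.
library(cluster)  # provides pam(); a recommended package shipped with R

set.seed(42)

# Toy data: sequences of length 8 built from a pattern with random noise.
make_seqs <- function(n, pattern, noise = 0.1) {
  states <- c("A", "B", "C")
  t(replicate(n, {
    s <- pattern
    flip <- runif(length(s)) < noise
    s[flip] <- sample(states, sum(flip), replace = TRUE)
    s
  }))
}
seqs <- rbind(make_seqs(150, rep("A", 8)), make_seqs(150, rep("B", 8)))

# Hamming distance between every row of x and every row of y.
hamming <- function(x, y) {
  apply(y, 1, function(m) rowSums(x != rep(m, each = nrow(x))))
}

clara_sketch <- function(seqs, k, n_samples = 5, sample_size = 40) {
  best <- list(cost = Inf)
  for (i in seq_len(n_samples)) {
    idx <- sample(nrow(seqs), sample_size)
    sub <- seqs[idx, , drop = FALSE]
    # The full pairwise distance matrix is only computed on the subset.
    d_sub <- as.dist(hamming(sub, sub))
    medoids <- idx[pam(d_sub, k = k, diss = TRUE)$id.med]
    # Every sequence of the full dataset is assigned to its nearest medoid.
    d_all <- hamming(seqs, seqs[medoids, , drop = FALSE])
    cost <- mean(apply(d_all, 1, min))
    if (cost < best$cost) {
      best <- list(cost = cost, medoids = medoids,
                   clustering = apply(d_all, 1, which.min))
    }
  }
  best
}

res <- clara_sketch(seqs, k = 2)
print(table(res$clustering))
```

Only `sample_size` by `sample_size` distances plus `nrow(seqs)` by `k` distances are computed per iteration, which is why this kind of approach can scale past the size limit of a full pairwise distance matrix.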
The complete documentation is available in the Documentation folder:
- User guide
- Package Documentation