In this vignette we go through how to use this package. Nowadays clustering becomes more and more important in data science area. Lots of applications such as finding patterns, exploratory data analysis and grouping data together make use of clustering algorithms. People have developed a variety of clustering techniques such as k-means, hierarchical clustering, dbscan, etc to cluster the data. However, different algorithms produce different clustering results, and it's very hard to distinguish which it's better. Also, when you produce clustering results from different algorithms, there is also a problem to combine all these clustering results together. TREC stands for tree reduced ensemble clustering, which is an effective algorithm to combine your clustering results together and produce one final clustering.

How to use TREC

Here is a sample code for how to use TREC:

library(TREC)
# randomly generate our data
data<-matrix(rnorm(100),nrow=50)
# producing clustering result cl1,cl2,cl3
# you may use as many clustering algorithms as you want
cl1<-hclust(dist(data),method='single')
cl2<-kmeans(data,centers=3)
library(dbscan)
cl3<-dbscan(data,eps = .2)
# call combineClusterings function
final_cluster_result <- combineClusterings(cl1,cl2,cl3)
plot(final_cluster_result)

From sample code, we can see there are four steps to use TREC:

  1. load/generate your data

  2. use clustering algorithms to produce clustering result

  3. use function asBranchComponent to transform clustering result into matrix

  4. use function to cbind matrix result and apply function TRECgetComponentsfromClustering

It's necessary to follow above 4 steps to use TREC. Final cluster result is a matrix, which represent a hierarchical clustering result. Each row represents a data point, and each column represents a layer of clustering result. Each number represents the cluster id that data point is assigned to.

more examples

library(TREC)
# randomly generate our data
data<-matrix(rnorm(150),nrow=50)
# producing clustering result cl1,cl2,cl3
# you may use as many clustering algorithms as you want
cl1<-hclust(dist(data),method='complete')
cl2<-kmeans(data,centers=5)
library(mclust)
cl3<-Mclust(data,eps = .2)
# call TREC function
final_cluster_result <- combineClusterings(cl1,cl2,cl3)
plot(final_cluster_result)


rwoldford/trec documentation built on May 15, 2019, 6:29 p.m.