knitr::opts_chunk$set(echo = TRUE)

Intro

Thanks for checking out weightedClustSuite, an open-source R package that I developed.

I created this package to make weighted point-density clustering and validation easy to run. You can assign a weight to each data point and then view and validate the results with several unsupervised clustering algorithms. Everything runs through a single clustering object, so results such as the cluster assignments from different algorithms are easy to access.

The package applies the weights to the Gaussian density estimates and distance matrices before the bulk of each clustering algorithm runs. This is more EFFICIENT, and can handle much LARGER weighted datasets, than the brute-force alternative of manually duplicating each point in the dataset according to its weight.
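To make the efficiency argument concrete, here is a minimal base-R sketch. It is not the package's internal code: the helper `weighted_gaussian_density` and the cutoff `dc` are hypothetical names, and the self-term is included in the density for simplicity. It checks that multiplying a Gaussian kernel by integer weights gives the same local density as physically duplicating each point, while only needing an n x n distance matrix instead of a sum(w) x sum(w) one.

```r
# Hypothetical helper (not part of weightedClustSuite):
# Gaussian local density where each neighbour j contributes w[j] * exp(-(d_ij / dc)^2).
weighted_gaussian_density <- function(x, w, dc) {
  d <- as.matrix(dist(x))        # n x n pairwise distances
  kernel <- exp(-(d / dc)^2)     # Gaussian kernel, diagonal equals 1
  as.vector(kernel %*% w)        # weighted sum over neighbours (self included)
}

set.seed(1)
x <- matrix(runif(20), ncol = 2)            # 10 toy points in 2D
w <- sample(1:5, 10, replace = TRUE)        # integer weights for the comparison

# Weighted version: works on the original 10 points
rho_weighted <- weighted_gaussian_density(x, w, dc = 0.3)

# Brute-force version: duplicate each point w[i] times, then use unit weights
idx   <- rep(seq_len(nrow(x)), times = w)
x_rep <- x[idx, , drop = FALSE]
rho_rep <- weighted_gaussian_density(x_rep, rep(1, nrow(x_rep)), dc = 0.3)

# Compare the density at the first copy of each original point
first_copy <- match(seq_len(nrow(x)), idx)
all.equal(rho_weighted, rho_rep[first_copy])
```

The duplicated dataset has `sum(w)` rows, so its distance matrix grows with the total weight rather than with the number of distinct points, which is the cost the weighted formulation avoids.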

#Run this command to install the package on your machine
#devtools::install_github("dhanujg/weightedClustSuite")

Clustering algorithms currently implemented with weights in the package

Density Based Unsupervised Clustering

Partitioning Based Unsupervised Clustering

Examples of Package Utilization

We will use the package on a dataset from ongoing research. The dataset holds two trait values and an abundance for each species. It has 3 columns: the first 2 are the species trait values and the third is the WEIGHT (species abundance).

Create input Data & Weight & Distance Matrices

# Read in the Data
Colnames <- c('X','Y','Weight')
# Apply the column names on read (the file itself has no header row)
Species_Data <- read.csv(file.choose(), header = FALSE, col.names = Colnames)

#Create Input Matrices for the package
Trait_Species <- as.vector(Species_Data[,1:2])
Weight_Species <- as.vector(Species_Data[,3])
Dist_Species <- dist(Species_Data[,1:2])
Trait_Species[1:5,]
Weight_Species[1:5]
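Before building the clustering object, it can help to confirm that the three inputs line up. The sketch below uses a synthetic stand-in table (illustrative values only, not the research dataset), and the `Example_*` names are hypothetical. The requirement that weights be strictly positive is my assumption for this sketch, not a documented rule of the package.

```r
# Synthetic stand-in for the trait/abundance table (illustrative values only)
set.seed(42)
Example_Data <- data.frame(
  X      = runif(30),
  Y      = runif(30),
  Weight = rpois(30, lambda = 3) + 1   # shifted so every weight is at least 1
)

# Same three inputs as in the vignette: traits, weights, distance matrix
Example_Traits  <- Example_Data[, 1:2]
Example_Weights <- Example_Data[, 3]
Example_Dist    <- dist(Example_Traits)

# Basic consistency checks; stopifnot() raises an error on the first failure
stopifnot(
  nrow(Example_Traits) == length(Example_Weights),      # one weight per point
  attr(Example_Dist, "Size") == nrow(Example_Traits),   # dist covers every point
  is.numeric(Example_Weights),
  all(Example_Weights > 0),                             # assumption: positive weights
  !anyNA(Example_Data)
)
```

The same checks can be run on `Trait_Species`, `Weight_Species`, and `Dist_Species` after reading in the real file.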

Call the package and create the overall Clustering Object

library(weightedClustSuite)

#feed in the input data to create the base clustering object with ClustObj function
SpeciesClust <- ClustObj(Trait_Species, Weight_Species, Dist_Species, gaussian=TRUE, verbose = TRUE)

Quick Run of Clustering Algorithms

We can now update the clustering object (SpeciesClust) with the results of a clustering algorithm. Here we run K-medoids and spectral clustering for a quick look at the data, along with DBSCAN, which is used later in the model comparison.

#SpeciesClust <- findDensityPeakClusters(SpeciesClust, rho=4.01, delta=0.21)
SpeciesClust <- findKMediodsClusters(SpeciesClust, neighbors = 5 )
SpeciesClust <- findSpectralClusters(SpeciesClust, neighbors = 5 )
SpeciesClust <- findDBSCANClusters(SpeciesClust, eps = 0.05)

We can also plot each set of clustering assignments, WITH the point sizes scaled by the weights.

KMediods_2D <- plot2DCluster(SpeciesClust, SpeciesClust$clustersKmediods, type = 'Kmediods')
Spectral_2D <- plot2DCluster(SpeciesClust, SpeciesClust$clustersSpectral, type = 'Spectral')

Clustering assignments are also stored in the clustering object and can be inspected directly, for example:

SpeciesClust$clustersKmediods[1:10]
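Because the assignments are a plain vector, they can be combined with the weights using base R. The sketch below uses small stand-in vectors so it runs on its own. With the real object, I am assuming `SpeciesClust$clustersKmediods` is aligned element-by-element with `Weight_Species`.

```r
# Stand-in vectors; in the vignette these would be
# SpeciesClust$clustersKmediods and Weight_Species (assumed to be aligned).
clusters <- c(1, 1, 2, 2, 2, 3)
weights  <- c(10, 5, 1, 2, 3, 7)

# Number of points and total weight (e.g. total abundance) per cluster
summary_table <- data.frame(
  cluster      = sort(unique(clusters)),
  n_points     = as.vector(table(clusters)),
  total_weight = as.vector(tapply(weights, clusters, sum))
)
summary_table
```

A cluster with few points but a large total weight is easy to miss in an unweighted count, which is one reason to look at both columns.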

Detailed Clustering and Validation : Example using 2014 Density Peak Clustering

Here I walk through the full training and application of weighted Density Peak Clustering (Rodriguez and Laio, 2014).

Create a validation chart to test various rho/delta combinations and find the best classification score

SpeciesChart <- findDensityPeakCluster_validationChart(SpeciesClust, DBCV = FALSE, status = FALSE)
View(SpeciesChart[1:5,])

We can also visualize the density peaks by rho/delta values in this plot

plotDensityPeak(SpeciesClust)

From the validation chart, we chose this rho/delta combination because it gives 5 clusters with no unclassified points. We can then run the Density Peak clustering.

SpeciesClust <- findDensityPeakClusters(SpeciesClust, rho=4.01, delta=0.21, verbose = FALSE)
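For readers unfamiliar with the method, the rho and delta values can be sketched in base R. This is a simplified illustration under my own assumptions, not the code behind `findDensityPeakClusters`: the function name `density_peak_sketch`, the Gaussian cutoff `dc`, and the thresholds below are hypothetical, and the thresholds are unrelated to the rho=4.01, delta=0.21 values used on the species data. Here rho is a weighted Gaussian local density, and delta is the distance to the nearest point with a higher rho.

```r
# Hypothetical sketch of the rho/delta quantities used in density peak clustering
density_peak_sketch <- function(x, w, dc) {
  d   <- as.matrix(dist(x))
  rho <- as.vector(exp(-(d / dc)^2) %*% w)   # weighted Gaussian local density
  n   <- nrow(d)
  delta <- numeric(n)
  for (i in seq_len(n)) {
    higher <- which(rho > rho[i])            # points denser than point i
    delta[i] <- if (length(higher) > 0) {
      min(d[i, higher])                      # distance to nearest denser point
    } else {
      max(d[i, ])                            # densest point: largest distance
    }
  }
  data.frame(rho = rho, delta = delta)
}

# Two well-separated toy blobs with unit weights
set.seed(7)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 1, sd = 0.1), ncol = 2))
w <- rep(1, nrow(x))

peaks <- density_peak_sketch(x, w, dc = 0.2)

# Candidate cluster centers: high density AND far from any denser point
centers <- which(peaks$rho > 5 & peaks$delta > 0.5)
centers
```

Points that pass both thresholds are treated as cluster centers, and raising a point's weight raises the rho of everything near it, which is how abundance can shift where the peaks fall.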

Now we can plot our clustering assignments as a 2D graph, an MDS graph (for higher dimensions), and a t-SNE graph. The cross marks indicate the cluster centers, and the size of each point corresponds to its weight.

DensityPeak_2D = plotDensityPeak2D(SpeciesClust)
DensityPeak_MDS = plotMDS(SpeciesClust)
DensityPeak_TSNE = plotTSNE(SpeciesClust)

Model Comparison

We can compare the results of our Density Peak clustering with DBSCAN clustering, using the same weighted distance cutoff for epsilon.

DensityPeak_2D = plotDensityPeak2D(SpeciesClust)

DBSCAN_2D = plot2DCluster(SpeciesClust, SpeciesClust$clustersDBSCAN, type = 'DBSCAN')
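One way weights can enter DBSCAN is through the core-point test: a point counts as a core point when the total weight inside its epsilon neighbourhood reaches a threshold, rather than the raw number of neighbours. The base-R sketch below illustrates that idea only. The helper `weighted_core_points` and the `min_weight` parameter are hypothetical, and I have not confirmed that `findDBSCANClusters` works this way.

```r
# Hypothetical helper: weighted core-point test for a DBSCAN-style algorithm
weighted_core_points <- function(x, w, eps, min_weight) {
  d <- as.matrix(dist(x))
  within_eps <- (d <= eps) * 1                 # 0/1 indicator, self included
  neighbourhood_weight <- as.vector(within_eps %*% w)
  neighbourhood_weight >= min_weight
}

# Three toy points: two close together, one isolated but heavy
x <- matrix(c(0,    0,
              0.01, 0,
              1,    1), ncol = 2, byrow = TRUE)
w <- c(1, 1, 5)

core <- weighted_core_points(x, w, eps = 0.05, min_weight = 3)
core
# FALSE FALSE TRUE: the isolated point is a core point on its weight alone
```

With unit weights, none of these three points would pass the same threshold, so the weighted and unweighted runs can disagree about which regions count as dense.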

Future

There is a lot more to be added to the package, and it is still being actively edited. Please email me at dhanujg@umich.edu if you have any questions.



DhanujG/weightedClustSuite documentation built on March 3, 2021, 12:29 a.m.