library(LearnClust)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
The LearnClust package allows users to learn how clustering algorithms reach their solutions.
The package implements distances between clusters.
It includes main functions that return the solution obtained by applying each algorithm.
It also contains ".details" functions that explain the process used to reach the solution, helping the user understand how each algorithm works.
We initialize some datasets to use in the algorithms:
cluster1 <- matrix(c(1,2), ncol = 2)
cluster2 <- matrix(c(2,4), ncol = 2)
weight <- c(0.2, 0.8)
vectorData <- c(1,1,2,3,4,7,8,8,8,10)
# vectorData <- c(1:10)
matrixData <- matrix(vectorData, ncol = 2, byrow = TRUE)
print(matrixData)
dfData <- data.frame(matrixData)
print(dfData)
plot(dfData)
cMatrix <- matrix(c(2,4,4,2,3,5,1,1,2,2,5,5,1,0,1,1,2,1,2,4,5,1,2,1), ncol = 3, byrow = TRUE)
cDataFrame <- data.frame(cMatrix)
The package includes different types of distance:
edistance(cluster1,cluster2)
mdistance(cluster1,cluster2)
canberradistance(cluster1,cluster2)
chebyshevDistance(cluster1,cluster2)
octileDistance(cluster1,cluster2)
Each function has a ".details" version that explains how the calculation is done.
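As an illustration of the formulas behind these distance measures, here is a base-R sketch (these one-liners are not the package's internals; the octile formula assumes the usual grid-path definition, where diagonal steps cost sqrt(2)):

```r
# Illustrative base-R versions of the distance formulas (not LearnClust code).
a <- c(1, 2); b <- c(2, 4)
euclidean <- sqrt(sum((a - b)^2))             # sqrt(1 + 4) = sqrt(5)
manhattan <- sum(abs(a - b))                  # 1 + 2 = 3
canberra  <- sum(abs(a - b) / (abs(a) + abs(b)))
chebyshev <- max(abs(a - b))                  # max(1, 2) = 2
# Octile: assumed definition max + (sqrt(2) - 1) * min over the coordinate gaps.
d <- abs(a - b)
octile <- max(d) + (sqrt(2) - 1) * min(d)     # 2 + (sqrt(2) - 1) * 1
```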
There are weighted versions in which a weight is applied to each component. These functions are used by the correlation algorithm:
edistanceW(cluster1,cluster2,weight)
mdistanceW(cluster1,cluster2,weight)
canberradistanceW(cluster1,cluster2,weight)
chebyshevDistanceW(cluster1,cluster2,weight)
octileDistanceW(cluster1,cluster2,weight)
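A minimal sketch of how such weighting typically enters the formulas, assuming each weight multiplies the corresponding component's contribution (this is an assumption for illustration, not the package's exact definition):

```r
# Assumed weighted variants: each component's contribution is scaled by its weight.
a <- c(1, 2); b <- c(2, 4); w <- c(0.2, 0.8)
weightedManhattan <- sum(w * abs(a - b))        # 0.2*1 + 0.8*2 = 1.8
weightedEuclidean <- sqrt(sum(w * (a - b)^2))   # sqrt(0.2*1 + 0.8*4) = sqrt(3.4)
```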
The agglomerative algorithm uses these functions, following the theoretical process:
list <- toList(vectorData)
# list <- toList(matrixData)
# list <- toList(dfData)
print(list)
matrixDistance <- mdAgglomerative(list, 'MAN', 'AVG')
print(matrixDistance)

minDistance <- minDistance(matrixDistance)
print(minDistance)

groupedClusters <- getCluster(minDistance, matrixDistance)
print(groupedClusters)

updatedClusters <- newCluster(list, groupedClusters)
print(updatedClusters)
The complete function that implements the algorithm is:
agglomerativeExample <- agglomerativeHC(dfData, 'EUC', 'MAX')
plot(agglomerativeExample$dendrogram)
print(agglomerativeExample$clusters)
print(agglomerativeExample$groupedClusters)
The package includes some auxiliary functions used to implement the algorithm. These functions are:
cleanClusters <- usefulClusters(updatedClusters)
print(cleanClusters)
distances <- c(2,4,6,8)
clusterDistanceByApproach <- clusterDistanceByApproach(distances, 'AVG')
print(clusterDistanceByApproach)
"clusterDistanceByApproach" get the value using approach type. This type could be "MAX","MIN", and "AVG"
clusterDistance <- clusterDistance(cluster1, cluster2, 'MAX', 'MAN')
print(clusterDistance)
"clusterDistance" get the distance value between each element from one cluster to the other ones using distance type. This type could be "EUC", "MAN", "CAN", "CHE", and "OCT"
The ".details" versions explain every function of this algorithm.
list <- toList.details(vectorData)
# list <- toList(matrixData)
# list <- toList(dfData)
print(list)
matrixDistance <- mdAgglomerative.details(list,'MAN','AVG')
minDistance <- minDistance.details(matrixDistance)
groupedClusters <- getCluster.details(minDistance, matrixDistance)
updatedClusters <- newCluster.details(list, groupedClusters)
The complete function that explains the algorithm is:
agglomerativeExample <- agglomerativeHC.details(vectorData,'EUC','MAX')
The divisive algorithm uses these functions, following the theoretical process:
# list <- toListDivisive(vectorData)
# list <- toListDivisive(matrixData)
list <- toListDivisive(dfData[1:4,])
print(list)
clustersList <- initClusters(list)
print(clustersList)

matrixDistance <- mdDivisive(clustersList, 'MAN', 'AVG', list)
print(matrixDistance)

maxDistance <- maxDistance(matrixDistance)
print(maxDistance)

dividedClusters <- getClusterDivisive(maxDistance, matrixDistance)
print(dividedClusters)
Two new subclusters will be created from the initial one and added to the solution.
Steps 2 to 5 are repeated until no cluster can be divided again.
The complete function that implements the algorithm is:
divisiveExample <- divisiveHC(dfData[1:4,], 'MAN', 'AVG')
print(divisiveExample)
The package uses the same auxiliary functions as the previous algorithm, plus one specific to the divisive case. These functions are:
clusterDistanceByApproach
clusterDistance
complementaryClusters: checks whether the two clusters we are going to divide are complementary, that is, every initial cluster belongs to one or the other, but never to both. This condition ensures that no cluster is lost when the division is done.
data <- c(1,2,1,3,1,4,1,5)
components <- toListDivisive(data)
cluster1 <- matrix(c(1,2,1,3), ncol = 2, byrow = TRUE)
cluster2 <- matrix(c(1,4,1,5), ncol = 2, byrow = TRUE)
cluster3 <- matrix(c(1,6,1,7), ncol = 2, byrow = TRUE)
complementaryClusters(components, cluster1, cluster2)
complementaryClusters(components, cluster1, cluster3)
Its ".details" version, explains how the functions checks this condition:
complementaryClusters.details(components,cluster1,cluster2)
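The check itself amounts to verifying that every initial point lies in exactly one of the two subclusters. A minimal base-R sketch of that idea (a hypothetical helper, not the package's implementation):

```r
# Hypothetical complementarity check: each point must belong to exactly one
# of the two subclusters (xor of the two membership tests).
isComplementary <- function(points, sub1, sub2) {
  inSub <- function(p, m) any(apply(m, 1, function(r) all(r == p)))
  all(apply(points, 1, function(p) xor(inSub(p, sub1), inSub(p, sub2))))
}
points <- matrix(c(1,2, 1,3, 1,4, 1,5), ncol = 2, byrow = TRUE)
s1 <- matrix(c(1,2, 1,3), ncol = 2, byrow = TRUE)
s2 <- matrix(c(1,4, 1,5), ncol = 2, byrow = TRUE)
isComplementary(points, s1, s2)  # TRUE
```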
The ".details" versions explain every function of this algorithm.
# list <- toListDivisive.details(vectorData)
# list <- toListDivisive(matrixData)
list <- toListDivisive(dfData[1:4,])
print(list)
clustersList <- initClusters.details(list)
matrixDistance <- mdDivisive.details(clustersList,'MAN','AVG',list)
maxDistance <- maxDistance.details(matrixDistance)
dividedClusters <- getClusterDivisive.details(maxDistance, matrixDistance)
The complete function that explains the algorithm is:
divisiveExample <- divisiveHC.details(dfData[1:4,], 'MAN', 'AVG')
print(divisiveExample)
This example shows how the correlation algorithm works step by step.
1. Input data is initialized, creating a cluster from each data frame row.
initData <- initData(cDataFrame)
print(initData)
target <- c(1,2,3)
initTarget <- initTarget(target, cDataFrame)
print(initTarget)
weight <- c(5,7,6)
weights <- normalizeWeight(TRUE, weight, cDataFrame)
print(weights)
cluster1 <- matrix(c(1,2,3), ncol = 3)
cluster2 <- matrix(c(2,5,8), ncol = 3)
weight <- c(3,7,4)
distance <- distances(cluster1, cluster2, 'CHE', weight)
print(distance)
Finally, the complete algorithm sorts the distances, and the clusters as well. It presents the solution as a sorted cluster list with the distances, or as a dendrogram.
target <- c(5,5,1)
weight <- c(3,7,5)
correlation <- correlationHC(cDataFrame, target, weight)
print(correlation$sortedValues)
print(correlation$distances)
plot(correlation$dendrogram)
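The sorting step can be sketched in base R: compute each cluster's distance to the target, then order clusters and distances together. This sketch uses plain (unweighted) Euclidean distance for simplicity, whereas the package also applies weights; the row values are taken from cDataFrame above:

```r
# Sketch: order clusters by distance to the target (nearest first).
# Plain Euclidean distance is assumed here; the package's version is weighted.
clusters <- list(c(1, 0, 1), c(2, 5, 5), c(2, 4, 4))  # rows of cDataFrame
target <- c(5, 5, 1)
dist2target <- sapply(clusters, function(cl) sqrt(sum((cl - target)^2)))
ord <- order(dist2target)
sortedClusters <- clusters[ord]
sortedDistances <- dist2target[ord]
```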
This example shows how the algorithm works step by step.
initData <- initData.details(cDataFrame)
targetValid <- c(1,2,3)
targetInvalid <- c(1,2)
initTarget <- initTarget.details(targetValid, cDataFrame)
initTarget <- initTarget.details(targetInvalid, cDataFrame)
weight <- c(5,7,6)
weights <- normalizeWeight.details(TRUE, weight, cDataFrame)
weights <- normalizeWeight.details(FALSE, weight, cDataFrame)
weights <- normalizeWeight.details(FALSE, NULL, cDataFrame)
cluster1 <- matrix(c(1,2,3), ncol = 3)
cluster2 <- matrix(c(2,5,8), ncol = 3)
weight <- c(3,7,4)
distance <- distances.details(cluster1, cluster2, 'CHE', weight)
The complete function that explains the algorithm is:
target <- c(5,5,1)
weight <- c(3,7,5)
correlation <- correlationHC.details(cDataFrame, target, weight)
print(correlation$sortedValues)
print(correlation$distances)
plot(correlation$dendrogram)