`VatAna` is an R package that implements the Visual Assessment of Cluster Tendency (VAT) algorithm proposed by Bezdek & Hathaway (2002).
To cite this vignette and the package `VatAna`, please use whichever of the following formats fits your reference list:
Cebeci, Z. & Yildiz, F. (2015). Görsel Kümelenme Eğilimi Değerlendirmesi ve R'de Uygulaması. Çukurova Üniversitesi Ziraat Fakültesi Dergisi, Vol. 30, no. 2, pp. 1-8. (URL: https://dergipark.org.tr/en/download/article-file/219860)
or in BibTeX format:
@article{cebeci30gorsel,
  title={G{\"o}rsel K{\"u}melenme E{\u{g}}ilimi De{\u{g}}erlendirmesi ve R’de Uygulamas{\i}},
  author={Cebeci, Zeynel and Yildiz, Figen},
  journal={{\c{C}}ukurova {\"U}niversitesi Ziraat Fak{\"u}ltesi Dergisi},
  volume={30},
  number={2},
  pages={1--8},
  year={2015}
}
To install `VatAna` from its GitHub repository, first install the `devtools` package from CRAN. Then install `VatAna` with the `install_github` function of `devtools`, as shown in the R code chunk below:
if(!require(devtools)) {install.packages('devtools'); library(devtools)}
install_github("zcebeci/VatAna")
If you would like compiled versions of the package vignettes, install `VatAna` using `install_github` with the `build_vignettes` argument set to `TRUE`, as shown below:
if(!require(devtools)) {install.packages('devtools'); library(devtools)}
devtools::install_github("zcebeci/VatAna", build_vignettes=TRUE)
If you have not already installed `rmarkdown` and `prettydoc` on your local system, install them before running the install commands above, as follows:
```r
install.packages(c('rmarkdown', 'prettydoc'))
```
If you have already installed `VatAna`, you can load it into the R working environment with the following command:
```r
library(VatAna)
```
We demonstrate clustering validation with the package `VatAna` on the dataset `protcarbo`, which consists of two features (`energy` and `protein`) of 15 human foods.
data(protcarbo)
print(protcarbo)
The following R command displays a scatter plot of the two features.
plot(protcarbo, pch=19, col=4, main="Scatter plot of energy vs protein")
The function `vatimage` in the package generates the original dissimilarity matrix (DM) and the reordered dissimilarity matrix (ODM) as follows. If the argument `disp` is set to `TRUE`, the ODM is displayed as an image (ODI).
resimg <- vatimage(protcarbo, disp=TRUE)
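To make the reordering step more concrete, here is a minimal base R sketch of the VAT ordering described by Bezdek & Hathaway (2002). It is an illustration only and not the code used inside `vatimage`; the function name `vat_order` and the toy data are hypothetical.

```r
# Minimal VAT reordering sketch in base R (not the VatAna implementation).
vat_order <- function(dm) {
  n <- nrow(dm)
  # Start from one endpoint of the largest dissimilarity
  first <- which(dm == max(dm), arr.ind = TRUE)[1, 1]
  ord <- integer(n)
  ord[1] <- first
  remaining <- setdiff(seq_len(n), first)
  for (r in seq_len(n)[-1]) {
    # Distances from the already ordered objects to the remaining ones
    sub <- dm[ord[seq_len(r - 1)], remaining, drop = FALSE]
    # The column holding the overall minimum is the nearest remaining object
    nearest <- remaining[which(sub == min(sub), arr.ind = TRUE)[1, 2]]
    ord[r] <- nearest
    remaining <- setdiff(remaining, nearest)
  }
  ord
}

# Hypothetical toy data: two well-separated groups with interleaved rows
x <- rbind(c(0, 0), c(10, 10), c(0, 1), c(10, 11), c(1, 0), c(11, 10))
dm <- as.matrix(dist(x))   # original dissimilarity matrix (DM)
ord <- vat_order(dm)
odm <- dm[ord, ord]        # reordered dissimilarity matrix (ODM)
print(ord)
print(round(odm, 1))
```

After reordering, the members of each group sit next to each other, so small dissimilarities gather in square blocks along the diagonal of the ODM.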
Above, the result of the `vatimage` run was assigned to the object `resimg` for use in the analyses that follow. The original dissimilarity matrix (DM) is extracted from `resimg` and stored in `dm`, and its first 3 rows are displayed to give an idea of its content.
dm <- resimg$dm
head(dm, 3)
In the following code chunk, the reordered dissimilarity matrix (ODM) is extracted from the result of the `vatimage` run above, and its first 3 rows are displayed to give an idea of its content.
odm <- resimg$odm
head(odm, 3)
The function `vatdisp` of the package displays the original dissimilarity matrix (DM), the reordered dissimilarity matrix (ODM) and the binary reordered dissimilarity matrix (BDM).
vatdisp(protcarbo)
The default color palette for the images displayed by `vatdisp` is `"grey"`. Users can change the palette by assigning one of the following values to the argument `renk`:
renk | Used color palette
-------- | ------------------
`cm` | `cm.colors`
`grey` | gray, 256 colors
`heat` | `heat.colors`
`terrain` | `terrain.colors`
`topo` | `topo.colors`
In the following code chunk, `vatdisp` displays the images using the `terrain.colors()` palette.
vatdisp(protcarbo, renk="terrain")
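As a rough sketch of how a palette name could be turned into a vector of colors, the following base R helper maps each `renk` value from the table to the corresponding `grDevices` palette. The helper `palette_for` is hypothetical and only illustrates the mapping; how `vatdisp` resolves `renk` internally is not shown in this vignette.

```r
# Hypothetical helper: map a palette name to n colors from grDevices.
palette_for <- function(renk = "grey", n = 256) {
  switch(renk,
    cm      = cm.colors(n),
    grey    = grey(seq(0, 1, length.out = n)),
    heat    = heat.colors(n),
    terrain = terrain.colors(n),
    topo    = topo.colors(n),
    stop("unknown palette: ", renk))
}

cols <- palette_for("terrain", 16)
print(head(cols, 3))
```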
Any one of the images in the figure above can also be displayed on its own, as shown for the ODM in the following code chunk.
vatdisp(protcarbo, renk="topo", which=3)
Binary images are produced with the Otsu thresholding method.
vatdisp(protcarbo, renk="grey", which=4)
In the following code chunk, the greyscale image is generated with the usual 256 grey levels.
godm <- greyimage(odm, greylevel=256, disp=TRUE)
In the following code chunk, the greyscale image is generated with 16 grey levels.
godm2 <- greyimage(odm, greylevel=16, disp=TRUE)
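The effect of the number of grey levels can be illustrated with a small base R sketch that rescales a dissimilarity matrix and quantizes it into integer levels. This is an assumption about what such a conversion typically does, not the source of `greyimage`; the function `to_grey` and the toy matrix are hypothetical.

```r
# Sketch of grey-level quantization (not the VatAna source code).
# Assumes the matrix is not constant, otherwise the range is zero.
to_grey <- function(m, greylevel = 256) {
  rng <- range(m)
  scaled <- (m - rng[1]) / (rng[2] - rng[1])   # rescale to [0, 1]
  # Map to integer levels 0 .. greylevel - 1
  pmin(floor(scaled * greylevel), greylevel - 1)
}

# Hypothetical 3 x 3 dissimilarity matrix
m <- matrix(c(0, 2, 4,
              2, 0, 8,
              4, 8, 0), nrow = 3, byrow = TRUE)
g16 <- to_grey(m, greylevel = 16)
print(g16)
```

With fewer grey levels, nearby dissimilarities collapse into the same level, which makes blocks look more uniform but hides finer structure.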
A binary image is generated by the function `binimage` from a greyscale image, which is the output of the function `greyimage`. Several methods are available for building binary images; they are listed in the package manual. In the following code chunk, the displayed binary image is produced with the Otsu thresholding method.
bodm1 <- binimage(godm, method="otsu", disp=TRUE)
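For readers unfamiliar with Otsu's method, the following base R sketch picks the threshold that maximizes the between-class variance of the grey-level histogram. It is a generic textbook formulation under the assumption of integer levels from 0 to 255, and it is not taken from `binimage`; the function `otsu_threshold` and the toy image are hypothetical.

```r
# Generic Otsu threshold sketch for integer grey levels 0 .. levels - 1.
otsu_threshold <- function(g, levels = 256) {
  counts <- tabulate(as.integer(g) + 1L, nbins = levels)  # histogram
  p <- counts / sum(counts)
  lev <- 0:(levels - 1)
  w0 <- cumsum(p)            # weight of the class "level <= threshold"
  mu <- cumsum(p * lev)      # cumulative mean up to each level
  mu_total <- mu[levels]
  # Between-class variance for every candidate threshold
  between <- (mu_total * w0 - mu)^2 / (w0 * (1 - w0))
  between[!is.finite(between)] <- 0
  lev[which.max(between)]
}

# Hypothetical greyscale image with dark (10) and bright (200) pixels
g <- matrix(c(rep(10, 50), rep(200, 50)), nrow = 10)
thr <- otsu_threshold(g)
b <- (g > thr) * 1L          # 0 = dark (block), 1 = bright (background)
print(thr)
print(table(b))
```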
In the following code chunk, the binary image is produced with the `auto` option, which uses an automatic threshold value calculated from the processed ODM.
bodm2 <- binimage(godm, method="auto", disp=TRUE)
In the following code chunk, the binary image is produced with a fixed `t`, a user-defined threshold value.
bodm3 <- binimage(godm, method="fixed", t=40, disp=TRUE)
The package `VatAna` has two functions that propose a number of clusters for a given dataset. They use different techniques to count the blocks in binary dissimilarity matrices.
The function `findk` finds the number of clusters by tracing the horizontal and vertical borders of the blocks in ODIs. In this way both overlapping and well-separated blocks are counted, and the count is proposed as the number of clusters in the dataset. In the following code chunk, the binary matrix `pcbdm`, derived from the ODM `pcodm`, is used as the input argument in the call of `findk`, and the computed `k` is returned and displayed.
resimg <- vatimage(protcarbo, disp=FALSE)
pcodm <- resimg$odm
pcgdm <- greyimage(pcodm)
pcbdm <- binimage(pcgdm, method="otsu")$binimg
k <- findk(pcbdm, disp=TRUE)$k
cat("Proposed number of clusters:", k, "\n")
The function `findk2` finds the number of clusters by tracing only along the diagonal of the blocks in ODIs. In this way only the well-separated blocks can be counted. The count is proposed as the number of clusters in the dataset. In the following code chunk, `findk2` traces `pcbdm`, the binary image of the ordered dissimilarity matrix `pcodm`, along its diagonal and proposes the block count as the number of clusters in the dataset `protcarbo`.
resimg <- vatimage(protcarbo, disp=FALSE)
pcodm <- resimg$odm
pcgdm <- greyimage(pcodm)
pcbdm <- binimage(pcgdm)$binimg
k <- findk2(pcbdm, disp=TRUE)
print(k)
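The idea of counting blocks along the diagonal can be sketched in a few lines of base R. The example below assumes a binary image in which 0 marks a dark (within-block) cell and 1 marks a bright cell, and it starts a new block whenever two neighbouring objects are not joined just above the diagonal. It is a simplified illustration, not the algorithm implemented in `findk2`; the function `count_diag_blocks` and the test image are hypothetical.

```r
# Simplified diagonal block counter (not the findk2 algorithm).
# Assumes b is a square binary matrix: 0 inside a block, 1 elsewhere.
count_diag_blocks <- function(b) {
  n <- nrow(b)
  if (n < 2) return(n)
  # Entries just above the diagonal link object i with object i + 1
  superdiag <- b[cbind(seq_len(n - 1), seq_len(n - 1) + 1)]
  # Every bright entry there marks the border between two blocks
  1L + sum(superdiag == 1)
}

# Hypothetical binary image with three diagonal blocks of sizes 3, 2 and 4
sizes <- c(3, 2, 4)
grp <- rep(seq_along(sizes), sizes)
b <- 1 * outer(grp, grp, "!=")
k_diag <- count_diag_blocks(b)
print(k_diag)
```

Because only cells next to the diagonal are inspected, a counter of this kind cannot see blocks that lie away from the diagonal.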
Binary images generated with a known number of blocks can be used to test the performance of block counting algorithms. In the following code chunk, a binary image is generated with 6 blocks and processed by the functions `findk` and `findk2`, respectively.
bdm1 <- genbinimg(nb=6, seed=4, disp=TRUE)$binimg
k <- findk(bdm1, disp=FALSE)$k
print(k)
Since `findk2` traces the binary image along its diagonal only, it fails to find blocks located off the diagonal. It is therefore recommended to use the function `findk` to detect blocks more reliably.
k <- findk2(bdm1, disp=FALSE)
print(k)
On the other hand, if the blocks are positioned exactly along the diagonal of the binary image, both functions give the same result, as shown below.
The output of the function `findk` for a binary image that has 4 blocks along its diagonal:
bdm2 <- genbinimg(nb=4, seed=65, disp=TRUE)$binimg
k <- findk(bdm2, disp=FALSE)$k
print(k)
The output of the function `findk2` for the same binary image with 4 blocks along its diagonal:
k <- findk2(bdm2, disp=FALSE)
print(k)
Bezdek, J. C., & Hathaway, R. J. (2002). VAT: A tool for visual assessment of (cluster) tendency. In Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No. 02CH37290), Vol. 3, pp. 2225-2230. IEEE.
Cebeci, Z. & Yildiz, F. (2015). Görsel Kümelenme Eğilimi Değerlendirmesi ve R'de Uygulaması. Çukurova Üniversitesi Ziraat Fakültesi Dergisi, 30 (2), 1-8. (URL: https://dergipark.org.tr/en/download/article-file/219860)