The fast optimal circular clustering (FOCC) algorithm [@Debnath21] and the heuristic repeated $K$-means circular clustering (HEUC) algorithm are applied to the CpG sites of the Candidatus Carsonella ruddii genome (GenBank accession number CP019943.1). Both algorithms group the CpG sites into $K = 14$ clusters, as shown in the figure below.
    library(OptCirClust)
    library(ape)
    library(knitr)
    library(graphics)

    opar <- par(mar = c(0, 0, 2, 0))
    opts_chunk$set(fig.width = 6, fig.height = 4)

    Event <- "CG"  # dinucleotide that marks a CpG site
    K <- 14        # number of clusters

    # Alternative: download the sequence from GenBank instead of using the bundled file
    # Seq <- read.GenBank("CP019943.1", as.character = TRUE)[[1]]
    file <- system.file("extdata", "CP019943.1.fasta", package = "OptCirClust")
    Seq <- ape::read.dna(file, format = "fasta", as.matrix = FALSE, as.character = TRUE)
    Seq <- toupper(paste(Seq$`CP019943.1 Candidatus Carsonella ruddii strain BC chromosome, complete genome`, collapse = ''))

    # Start positions of all CpG sites along the circular genome
    V <- gregexpr(Event, Seq)
    O <- sort(V[[1]][1:length(V[[1]])])
    Circumference <- nchar(Seq)

    set.seed(1)

    # Optimal clustering; the orange arrow marks the cluster border discussed below
    result <- CirClust(O, K, Circumference, method = "FOCC")
    plot(result, main = "Optimal circular clustering")
    arrows(0.167, -0.55, 0, -0.145, length = 0.125, angle = 30, code = 1, col = "orange", lwd = 4)

    # Heuristic clustering with the same input
    result_km <- CirClust(O, K, Circumference, method = "HEUC")
    plot(result_km, main = "Heuristic circular clustering")
    arrows(0.135, -0.55, 0, -0.145, length = 0.125, angle = 30, code = 1, col = "orange", lwd = 4)

    par(opar)
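To make the site-extraction step in the code above easier to follow, here is a minimal base-R sketch on a short toy sequence. The sequence `toy_seq` and the variable names are hypothetical and are not part of the package or the genome data; the sketch only illustrates how `gregexpr` yields event positions and how the gaps between neighboring sites behave on a circle.

```r
# Toy circular sequence (hypothetical; not the Carsonella ruddii genome)
toy_seq <- "ACGTTCGAACGCGT"
event <- "CG"

# gregexpr() returns a list with one element per input string.
# The first element holds the 1-based start position of every match
# (or -1 if the pattern does not occur at all).
hits <- gregexpr(event, toy_seq, fixed = TRUE)
sites <- as.integer(hits[[1]])

# The circumference of the circle is the sequence length
circumference <- nchar(toy_seq)

# Gaps between consecutive sites, including the gap that wraps around
# from the last site back to the first one
gaps <- diff(c(sites, sites[1] + circumference))

print(sites)
print(gaps)
```

Because the data are circular, the gaps always add up to the circumference. The wrap-around gap is what separates circular clustering from clustering on a line, where the first and last sites could never share a cluster.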
The clusters obtained by the FOCC algorithm are more compact and easier to justify than those obtained by HEUC. For example, the border between clusters C8 and C9 in the optimal clustering is subjectively more justifiable than the border between clusters C4 and C8 in the heuristic clustering. Orange arrows inside the circular genome point to these cluster borders. A fixed seed for random number generation is used so that $K$-means always returns the same result.
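The effect of fixing the seed can be illustrated with a small sketch. It uses `stats::kmeans` on simulated one-dimensional data as a stand-in, which is an assumption made for illustration: it is not the internal procedure of the HEUC method, and the helper `run_kmeans` is hypothetical.

```r
# Simulated one-dimensional data with three groups (illustrative only)
set.seed(42)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 10))

# Hypothetical helper: reset the random number generator, then run
# K-means with several random starts
run_kmeans <- function(seed) {
  set.seed(seed)
  kmeans(x, centers = 3, nstart = 5)
}

fit_a <- run_kmeans(1)
fit_b <- run_kmeans(1)

# Same seed, same random starting centers, hence the same clustering
same_labels <- identical(fit_a$cluster, fit_b$cluster)
same_cost <- identical(fit_a$tot.withinss, fit_b$tot.withinss)
print(c(same_labels, same_cost))
```

Without the call to `set.seed`, repeated runs may start from different initial centers and can end in different local optima, which makes a side-by-side comparison with the optimal result harder to reproduce.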
This example, which is representative of practical applications, therefore shows the advantage of optimal clustering over heuristic clustering.