Circular genome clustering
In OptCirClust: Circular, Periodic, or Framed Data Clustering: Fast, Optimal, and Reproducible

Optimal versus heuristic cluster borders on CpG sites of a circular bacterial genome

The fast optimal circular clustering (FOCC) [@Debnath21] and the heuristic repeated $K$-means circular clustering (HEUC) algorithms are applied to the CpG sites of the Candidatus Carsonella ruddii genome (GenBank accession number CP019943.1). Each algorithm groups the CpG sites into 14 clusters, as shown in the two plots produced by the code below.
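To make the input to the clustering concrete, here is a minimal sketch of how CpG start positions can be located on a circular sequence with base R. The sequence `toy_seq` and the variable names are made up for illustration and are not part of the package. The last step computes the gaps between consecutive sites around the circle, including the gap that wraps past the origin.

```r
# Hypothetical 12-base circular sequence, used only for illustration
toy_seq <- "ACGTTCGACGGA"

# Start positions of every "CG" dinucleotide (literal match, not a regex)
hits <- gregexpr("CG", toy_seq, fixed = TRUE)[[1]]
positions <- as.integer(hits)      # drops the match.length attribute

# On a circle, the circumference is the sequence length
circumference <- nchar(toy_seq)

# Gaps between consecutive sites, closing the circle by appending the
# first site shifted forward by one full turn
gaps <- diff(c(positions, positions[1] + circumference))

positions
gaps
```

Because the gaps are measured around the whole circle, they always sum to the circumference, which is a quick sanity check on the extracted positions.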

library(OptCirClust)
library(ape)
library(knitr)
library(graphics)

opar <- par(mar=c(0,0,2,0))

opts_chunk$set(fig.width=6, fig.height=4) 

Event <- "CG"

K <- 14

# Seq <- read.GenBank("CP019943.1", as.character = TRUE)[[1]]
file <- system.file("extdata", "CP019943.1.fasta", package = "OptCirClust")

Seq <- ape::read.dna(file, format="fasta", as.matrix=FALSE, as.character = TRUE)

Seq <- toupper(paste(Seq$`CP019943.1 Candidatus Carsonella ruddii strain BC chromosome, complete genome`, collapse = ''))

V <- gregexpr(Event, Seq)

# Start positions of all CpG matches, with the match attributes dropped
O <- sort(as.integer(V[[1]]))

Circumference <- nchar(Seq)

set.seed(1)

result <- CirClust(O, K, Circumference, method = "FOCC")

plot(result, main = "Optimal circular clustering")

# arrows(.58, - 1.75, 0.48, -1.45, length = 0.125, angle = 30, code = 2, col="orange", lwd=4)
# arrows(0, -10, 0, 0, length = 0.125, angle = 30, code = 2, col="orange", lwd=4)
arrows(0.167, -0.55, 0,-0.145, length = 0.125, angle = 30, code = 1, col="orange", lwd=4)

result_km <- CirClust(O, K, Circumference, method = "HEUC")

plot(result_km, main = "Heuristic circular clustering")

# arrows(.58, - 1.75, 0.4, -1.5, length = 0.125, angle = 30, code = 2, col="orange", lwd=4)

arrows(0.135, -0.55, 0,-0.145, length = 0.125, angle = 30, code = 1, col="orange", lwd=4)

par(opar)

The clusters obtained by the FOCC algorithm are more compact and easier to justify than those obtained by HEUC. In particular, the border between clusters C8 and C9 in the optimal clustering is more justifiable, by visual inspection, than the border between clusters C4 and C8 in the heuristic clustering. These borders are indicated by the orange arrows inside the circular genome. A fixed seed for random number generation is used so that $K$-means always returns the same result.
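The role of the fixed seed can be illustrated with `stats::kmeans` on simulated one-dimensional data. This is only a sketch of the general behavior of $K$-means, which starts from randomly chosen centers; it does not reproduce the internals of HEUC, and the data `x` are simulated for the purpose of the example.

```r
# Simulated 1-D data with three loose groups (illustrative only)
set.seed(42)
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 5), rnorm(30, mean = 10))

# Two runs started from the same random number generator state
set.seed(1)
fit_a <- kmeans(x, centers = 3, nstart = 5)

set.seed(1)
fit_b <- kmeans(x, centers = 3, nstart = 5)

# Same seed, same random starts, hence the same clustering
identical(fit_a$cluster, fit_b$cluster)
```

Without resetting the seed, two runs may start from different centers and can, in general, settle on different borders.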

This example, drawn from a practical application, therefore illustrates the advantage of optimal circular clustering over the heuristic algorithm.
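One way to make "more compact" quantitative is to compare the total within-cluster sum of squared distances of two clusterings, measured along the circle. The helper `circ_withinss` below is a hypothetical base R sketch written for this illustration, not a function of the package. It assumes that each cluster occupies a contiguous arc and that compactness is measured by squared distance to the cluster mean. The toy positions and labels are made up.

```r
# Total within-cluster sum of squared distances for points on a circle.
# Each cluster is unrolled at its largest gap, so a cluster that wraps
# past the origin is measured along its arc rather than across the join.
circ_withinss <- function(pos, labels, circumference) {
  per_cluster <- vapply(split(pos, labels), function(x) {
    x <- sort(x)
    if (length(x) > 1) {
      # Gaps between neighbors, with the wrap-around gap last
      gaps <- c(diff(x), x[1] + circumference - x[length(x)])
      cut_at <- which.max(gaps)
      if (cut_at < length(x)) {
        # Largest gap is interior: shift the leading points one full turn
        x[seq_len(cut_at)] <- x[seq_len(cut_at)] + circumference
      }
    }
    sum((x - mean(x))^2)
  }, numeric(1))
  sum(per_cluster)
}

# Toy data on a circle of circumference 100
pos <- c(1, 2, 3, 50, 51, 52, 98, 99)
circumference <- 100

# Clustering A joins 98, 99 with 1, 2, 3 across the origin
labels_a <- c(1, 1, 1, 2, 2, 2, 1, 1)
# Clustering B puts the border at the origin instead
labels_b <- c(1, 1, 1, 2, 2, 2, 2, 2)

ss_a <- circ_withinss(pos, labels_a, circumference)
ss_b <- circ_withinss(pos, labels_b, circumference)
ss_a < ss_b
```

In this toy case the clustering whose border falls in a wide gap has a far smaller sum of squares than the one that cuts at the origin, which is the kind of difference the orange arrows point to in the genome plots.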

References
