In bioinfocz/scdrake: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

K-means {.tabset}

K-means is a generic clustering algorithm that has been used in many application areas. In R, it can be applied via the `stats::kmeans()` function. Typically, it is applied to a reduced-dimension representation of the expression data (most often PCA, because distances in the low-dimensional space remain interpretable). The number of clusters `k` must be defined in advance.
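The workflow described above can be sketched as follows. This is a minimal illustration on simulated data, not the pipeline's own code: the matrix `expr`, the group offsets, the choice of 10 principal components, and `nstart = 25` are all assumptions made for the example.

```r
# Simulated stand-in for a normalized expression matrix (cells in rows, genes in columns).
set.seed(42)
n_cells <- 150
n_genes <- 50
group <- rep(1:3, each = n_cells / 3)
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells, ncol = n_genes)
# Shift the first ten genes by a group-specific offset to create three populations.
expr[, 1:10] <- expr[, 1:10] + group * 3

# Reduce dimensionality with PCA and keep the first 10 principal components.
pca <- stats::prcomp(expr, center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:10]

# k (the `centers` argument) must be chosen up front;
# nstart > 1 reruns the algorithm from several random initializations and keeps the best fit.
km <- stats::kmeans(pcs, centers = 3, nstart = 25)

table(km$cluster)  # number of cells per cluster
km$tot.withinss    # total within-cluster sum of squares
```

Running k-means on the principal components rather than on all genes keeps the distance computation cheap and reduces the influence of noisy dimensions.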

kmeans_used_functions <- "stats::kmeans()"

if (cfg$CLUSTER_KMEANS_KBEST_ENABLED) {
  kmeans_used_functions <- c(kmeans_used_functions, "cluster::clusGap()", "cluster::maxSE()")

  cat(scdrake::str_space(
    "It is also possible to determine an optimal value of `k`.",
    "One way to measure the goodness of clustering is to calculate the within-cluster sum of squares $W$",
    "(i.e. the sum of squared distances between each data point and its cluster center); for a given `k`, a smaller $W$ indicates more compact clusters.",
    "Because $W$ tends to decrease as `k` grows, the gap statistic compares it with the value expected under a null reference distribution.",
    "Here, we used a modified [gap statistic method](https://datasciencelab.wordpress.com/tag/gap-statistic/) described in",
    "[OSCA](https://bioconductor.org/books/3.12/OSCA/clustering.html#base-implementation).\n\n"
  ))
  x <- scdrake::create_a_link(cluster_kmeans_kbest_gaps_plot_file, "**PDF with gap statistics**", href_rel_start = fs::path_dir(report_html_file), do_cat = TRUE)
}
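As a rough illustration of how `cluster::clusGap()` and `cluster::maxSE()` fit together, the sketch below estimates `k` on simulated data. The toy matrix `pcs`, `K.max = 6`, `B = 25`, and `nstart = 10` are assumptions chosen for the example and are not the pipeline's settings.

```r
library(cluster)

set.seed(42)
# Three Gaussian blobs in 5 dimensions, standing in for PCA coordinates.
# The offset vector is recycled down the columns, so each row (cell) is shifted in every dimension.
offsets <- rep(c(0, 6, 12), each = 60)
pcs <- matrix(rnorm(180 * 5), ncol = 5) + offsets

# Gap statistic for k = 1..6; B is the number of simulated reference data sets.
# Extra arguments (here nstart) are passed on to stats::kmeans().
gaps <- clusGap(pcs, FUNcluster = stats::kmeans, K.max = 6, B = 25,
                nstart = 10, verbose = FALSE)

# Choose k from the gap values and their standard errors
# (the default method of maxSE() is "firstSEmax").
best_k <- maxSE(f = gaps$Tab[, "gap"], SE.f = gaps$Tab[, "SE.sim"])
best_k
```

The `gaps$Tab` matrix holds one row per candidate `k`, so it can also be plotted directly with `plot(gaps)` to inspect the curve by eye.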

The relationships between clusters obtained under different values of `k` are visualized in the clustree plot linked below. Clusters that are stable across values of `k` appear as straight or sparsely branched vertical lines.

`r scdrake::create_a_link(cluster_kmeans_k_clustree_file, "**PDF with clustree**", href_rel_start = fs::path_dir(report_html_file))`
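A minimal sketch of the data behind such a plot, assuming toy coordinates in `pcs` and an illustrative range of `k` values (neither comes from the pipeline). The call to `clustree::clustree()` is guarded because the clustree package may not be installed.

```r
set.seed(42)
# Toy stand-in for PCA coordinates: three blobs in two dimensions.
pcs <- matrix(rnorm(150 * 2), ncol = 2) + rep(c(0, 5, 10), each = 50)

# Cluster the same cells for several values of k; one column per k, named k2, k3, ...
ks <- 2:5
assignments <- as.data.frame(stats::setNames(
  lapply(ks, function(k) stats::kmeans(pcs, centers = k, nstart = 10)$cluster),
  paste0("k", ks)
))

# Cross-tabulating two resolutions shows how cells move between clusters,
# which is the information drawn as edges in a clustree.
table(k3 = assignments$k3, k4 = assignments$k4)

# Build the tree only if clustree is available; columns are matched by the "k" prefix.
if (requireNamespace("clustree", quietly = TRUE)) {
  library(clustree)
  p <- clustree(assignments, prefix = "k")
}
```

A cluster whose cells stay together from one `k` to the next shows up as a single dominant cell in each row of the cross-tabulation, and as an unbranched line in the tree.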

`r scdrake::format_used_functions(kmeans_used_functions)`

res <- scdrake::generate_dimred_plots_clustering_section(
  dimred_plots_clustering_files, dimred_plots_clustering_united_files, "kmeans", c("k", "kbest"), fs::path_dir(report_html_file), 3
)

bioinfocz/scdrake documentation built on Sept. 19, 2024, 4:43 p.m.
