find_central_elements_by_cluster: Encapsulation of steps to create clusters and determine most...

View source: R/find_central_clone.R

find_central_elements_by_cluster    R Documentation

Encapsulation of steps to create clusters and determine the most central element of each cluster


Description

Generates clusters using the k-means method and determines the most representative element of each cluster using a PCA analysis (most central feature in PCA space), the mhorn similarity index (most similar feature), or Pearson/Spearman correlation (most correlated feature).
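The clustering step can be pictured with a minimal sketch. It assumes plain stats::kmeans run once per candidate cluster count; the toy feature_df, the cluster_counts range, and the nstart value are illustrative assumptions and are not taken from the package source.

```r
# Toy feature table: rows are features (row names are required), columns are samples.
feature_df <- data.frame(
  s1 = c(1.0, 1.1, 0.9, 5.0, 5.2, 4.9),
  s2 = c(2.0, 2.1, 1.9, 8.0, 8.1, 7.9),
  s3 = c(0.5, 0.4, 0.6, 3.0, 3.1, 2.9),
  row.names = paste0("feature_", 1:6)
)

set.seed(42)  # plays the role of my_seed so the clustering can be reproduced

# Assumption: one k-means run per requested cluster count (here 2 and 3).
cluster_counts <- 2:3
cluster_members <- lapply(cluster_counts, function(k) {
  km <- stats::kmeans(as.matrix(feature_df), centers = k, nstart = 10)
  # Group the feature names by the cluster each one was assigned to.
  split(rownames(feature_df), km$cluster)
})
names(cluster_members) <- paste0("k", cluster_counts)
```

Each entry of cluster_members then holds the feature names of every cluster for one cluster count, which is the input the centrality step needs.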


Usage

find_central_elements_by_cluster(
  feature_df,
  anno_mark_font_size = 8,
  annotate_central_elements = T,
  annotate_central_elements_n_clusters = 40,
  central_element_circle_radius = 1/10,
  centrality_methods = "by-rank",
  cluster_id_width = NA,
  cluster_plot_sizes = NA,
  dist_method = "euclidean",
  file_prefix = "central_elements",
  grid_size = 100,
  grid_units = "mm",
  hclust_method = "complete",
  max_clusters = 40,
  max_depth = NA,
  min_clusters = 1L,
  my_threads = 1,
  my_seed = NA,
  output_central_elements = T,
  output_cumulative_variance = F,
  output_dir = ".",
  output_gmt = T,
  output_heatmap = F,
  output_pc1_vs_pc2 = F,
  output_ranked_central_elements = T,
  rank_clm = "Rank",
  rank_df = NULL,
  row_annotation_lwd = 0.25,
  row_annotation_width = 15,
  row_annotation_width_units = "mm",
  row_dend_lwd = 0.25,
  row_dend_width = 15,
  row_dend_width_units = "mm",
  hm_raster_quality = 5,
  show_hm_row_names = T
)



Arguments

feature_df: data.frame on which to perform the PCA, mhorn, or correlation (Pearson/Spearman) analysis and the k-means clustering. Important: rows must be named after features.


centrality_methods: A character vector with strings specifying the method for selecting the most central feature of a cluster:

  • two-in-a-row - using PCA, selects the feature that shows up two times in a row as the sum of squares is calculated over more and more PCs

  • max-depth - using PCA, selects the feature with the maximum sum of squares calculated across the number of PCs requested as max_depth

  • first-most-frequent - using PCA, determines the feature with the maximum sum of squares for 2 PCs, 3 PCs, 4 PCs, and so on up to N PCs, then picks the feature that showed up the most times across all those calculations

  • mhorn - the feature most similar to the others (i.e., the one with the largest summed mhorn similarity to all other elements) wins

  • spearman - the feature most correlated with the others (i.e., the one with the largest summed Spearman correlation to all other elements) wins

  • pearson - the feature most correlated with the others (i.e., the one with the largest summed Pearson correlation to all other elements) wins

  • by-rank - selects the most significant feature according to rank_df
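As an illustration of the "largest sum to all other elements" rule used by the pearson and spearman options, here is a minimal sketch for a single cluster. The helper most_correlated_feature and the toy cluster_mat are hypothetical and are not part of binfotron; the mhorn option would need a similarity function that is not shown here.

```r
# Toy cluster: three features (rows) measured in four samples (columns).
cluster_mat <- rbind(
  feature_a = c(1, 2, 3, 4),
  feature_b = c(2, 5, 5, 8),
  feature_c = c(1, 3, 2, 4)
)
colnames(cluster_mat) <- paste0("sample_", 1:4)

# Hypothetical helper: return the feature whose correlations with all other
# features in the cluster add up to the largest value.
most_correlated_feature <- function(mat, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  sim <- stats::cor(t(mat), method = method)  # feature-by-feature correlations
  diag(sim) <- 0                              # ignore each feature's self-correlation
  names(which.max(rowSums(sim)))
}

central <- most_correlated_feature(cluster_mat, "pearson")
print(central)  # feature_b, which sits between feature_a and feature_c
```

In this toy data feature_b is the sum of the other two profiles, so it correlates more strongly with each of them than they do with each other.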


cluster_id_width: An integer indicating how many characters to use for cluster group and cluster number IDs. Defaults to one more than the number of characters in max_clusters.


cluster_plot_sizes: Integer vector indicating which cluster groups to save as plots with clusters circled and central elements labeled. Only used if centrality_methods is one of the PCA options.


dist_method: String naming the distance method to pass to stats::dist for clustering


file_prefix: The text to be prepended to the file names for tables and plots


grid_size: Number specifying the size of the heatmap


grid_units: String specifying the units (for example "mm") that grid_size is expressed in


hclust_method: String naming the agglomeration method to pass to stats::hclust for clustering


max_clusters: Integer indicating the maximum number of clusters to split data into


max_depth: Integer indicating the maximum depth across principal components to use for determining the most central element
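One reading of the PCA-based options of centrality_methods is that each depth (number of PCs) yields one winning feature, and the options differ in how those per-depth winners are combined. The sketch below assumes exactly that; the helper names, the winners_by_depth vector, and the tie-breaking by first appearance are illustrative assumptions, and how the package actually computes each depth's winner is not shown.

```r
# Assumed input: the winning feature at each depth. The vector is indexed by
# depth directly for simplicity (the source starts counting at 2 PCs).
winners_by_depth <- c("f2", "f1", "f1", "f3", "f1")

# two-in-a-row: first feature that wins at two consecutive depths.
two_in_a_row <- function(winners) {
  if (length(winners) < 2) return(NA_character_)
  repeated <- which(winners[-1] == winners[-length(winners)])
  if (length(repeated) == 0) return(NA_character_)
  winners[repeated[1]]
}

# first-most-frequent: feature that wins most often; ties are broken in
# favor of the feature that appeared first (an assumption).
first_most_frequent <- function(winners) {
  counts <- table(factor(winners, levels = unique(winners)))
  names(counts)[which.max(counts)]
}

# max-depth: simply the winner at the requested depth.
winner_at_max_depth <- function(winners, max_depth) {
  winners[min(max_depth, length(winners))]
}

two_in_a_row(winners_by_depth)            # "f1"
first_most_frequent(winners_by_depth)     # "f1"
winner_at_max_depth(winners_by_depth, 4)  # "f3"
```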


min_clusters: Integer indicating the minimum number of clusters to split data into


my_threads: Integer value specifying the number of parallel processes to use when calculating mhorn indices. Defaults to 1.


my_seed: The seed to use so that the clustering can be reproduced


output_central_elements: Boolean whether or not to save the table of central elements by cluster group


output_cumulative_variance: Boolean whether to save a plot of the cumulative variance explained by the PCA axes. Only used if centrality_methods is one of the PCA options.
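For reference, cumulative variance explained can be derived from any PCA fit as the running sum of each component's share of the total variance. The sketch below uses stats::prcomp on random toy data; whether the package uses prcomp internally is an assumption.

```r
set.seed(1)
# Toy matrix: 10 features (rows) by 4 samples (columns).
mat <- matrix(
  rnorm(40), nrow = 10,
  dimnames = list(paste0("feature_", 1:10), paste0("s", 1:4))
)

pca <- stats::prcomp(mat, center = TRUE, scale. = FALSE)

# Share of the total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Running total across PC1, PC2, ...; the last value is 1 by construction.
cumulative_variance <- cumsum(var_explained)
```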


output_dir: The base directory to which files and plots will be saved


output_gmt: Boolean whether or not to save the gmt data to file


output_heatmap: Boolean whether to save the correlation heatmap to file. Ignored if centrality_methods is one of the PCA options.


output_ranked_central_elements: Boolean whether to save the table of unique central elements sorted by rank within cluster group


rank_clm: Length-one character vector with the name of the column holding the initial rankings, if any. The column is looked for in rank_df if one was sent, or in feature_df otherwise.


rank_df: data.frame with the feature_df features by row in column one and a rank_clm column holding the numeric default ranking used for tie-breaking. If no rank_df is sent, rank_clm will be looked for in feature_df.
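A small sketch of how a ranking table of this shape could drive a by-rank selection. It assumes that a lower value in rank_clm means a more significant feature, which the source does not state; the toy rank_df and cluster_members are illustrative.

```r
# Toy ranking table: feature names in column one, numeric rank in "Rank".
rank_df <- data.frame(
  feature = c("feature_1", "feature_2", "feature_3", "feature_4"),
  Rank = c(3, 1, 4, 2),
  stringsAsFactors = FALSE
)
rank_clm <- "Rank"

# Toy cluster membership: cluster id -> feature names.
cluster_members <- list(
  `1` = c("feature_1", "feature_3"),
  `2` = c("feature_2", "feature_4")
)

# Assumption: the member with the smallest rank value is the most significant.
central_by_rank <- vapply(cluster_members, function(members) {
  ranks <- rank_df[[rank_clm]][match(members, rank_df[[1]])]
  members[which.min(ranks)]
}, character(1))
```

Here cluster 1 yields feature_1 (rank 3 beats rank 4) and cluster 2 yields feature_2 (rank 1 beats rank 2).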


output_pc1_vs_pc2: Boolean whether to save a plot of the principal component 1 and 2 axes. Only used if centrality_methods is one of the PCA options.


Value

Returns a three-element list with cluster_members, seed, and results. results is a named list with one entry per centrality_methods value, each containing central_elements and either pca or correlations (depending on the method).

Benjamin-Vincent-Lab/binfotron documentation built on Oct. 1, 2024, 8:33 p.m.