ssvSignalClustering: Clustering as for a heatmap. This is used internally by...

View source: R/functions_clusteringKmeans.R

ssvSignalClusteringR Documentation

Clustering as for a heatmap. This is used internally by ssvSignalHeatmap but can also be run before calling ssvSignalHeatmap for greater control and access to clustering results directly.

Description

Clustering is via k-means by default. The number of clusters is determined by nclust. Optionally, k-means can be initialized with a data.frame provided to k_centroids. As an alternative to k-means, a membership table from ssvMakeMembTable can be provided to determine logical clusters.

Usage

ssvSignalClustering(
  bw_data,
  nclust = NULL,
  k_centroids = NULL,
  memb_table = NULL,
  row_ = "id",
  column_ = "x",
  fill_ = "y",
  facet_ = "sample",
  cluster_ = "cluster_id",
  max_rows = 500,
  max_cols = 100,
  clustering_col_min = -Inf,
  clustering_col_max = Inf,
  within_order_strategy = valid_sort_strategies[2],
  dcast_fill = NA,
  iter.max = 30,
  fun.aggregate = "mean"
)

Arguments

bw_data

a GRanges or data.table of bigwig signal. As returned from ssvFetchBam and ssvFetchBigwig

nclust

Number of clusters. Defaults to 6 if nclust, k_centroids, and memb_table are not set.

k_centroids

data.frame of centroids for k-means clusters. Incompatible with nclust or memb_table.

memb_table

Membership table as from ssvMakeMembTable. Logical groups from membership table will be clusters. Incompatible with nclust or k_centroids.

row_

variable name mapped to row, likely id or gene name for ngs data. Default is "id" and works with ssvFetch* output.

column_

varaible mapped to column, likely bp position for ngs data. Default is "x" and works with ssvFetch* output.

fill_

numeric variable to map to fill. Default is "y" and works with ssvFetch* output.

facet_

variable name to facet horizontally by. Default is "sample" and works with ssvFetch* output. Set to "" if data is not facetted.

cluster_

variable name to use for cluster info. Default is "cluster_id".

max_rows

for speed rows are sampled to 500 by default, use Inf to plot full data

max_cols

for speed columns are sampled to 100 by default, use Inf to plot full data

clustering_col_min

numeric minimum for col range considered when clustering, default in -Inf

clustering_col_max

numeric maximum for col range considered when clustering, default in Inf

within_order_strategy

one of "hclust", "sort", "right", "left", "reverse". If "hclust", hierarchical clustering will be used. If "sort", a simple decreasing sort of rosSums. If "left", will atttempt to put high signal on left ("right" is opposite). If "reverse" reverses existing order (should only be used after meaningful order imposed).

dcast_fill

value to supply to dcast fill argument. default is NA.

iter.max

Number of max iterations to allow for k-means. Default is 30.

fun.aggregate

Function to aggregate when multiple values present for facet_, row_, and column_. The function should accept a single vector argument or be a character string naming such a function.

Details

Within each cluster, items will either be sorted by decreasing average signal or hierachically clustered; this is controlled via within_order_strategy.

Value

data.table of signal profiles, ready for ssvSignalHeatmap

Examples

clust_dt = ssvSignalClustering(CTCF_in_10a_profiles_gr)
ssvSignalHeatmap(clust_dt)

clust_dt2 = ssvSignalClustering(CTCF_in_10a_profiles_gr, nclust = 2)
ssvSignalHeatmap(clust_dt2)

#clustering can be targetted to a specific part of the region
clust_dt3 = ssvSignalClustering(CTCF_in_10a_profiles_gr, nclust = 2,
    clustering_col_min = -250, clustering_col_max = -150)
ssvSignalHeatmap(clust_dt3)
clust_dt4 = ssvSignalClustering(CTCF_in_10a_profiles_gr, nclust = 2,
    clustering_col_min = 150, clustering_col_max = 250)
ssvSignalHeatmap(clust_dt4)

jrboyd/seqsetvis documentation built on March 17, 2024, 3:14 p.m.