ssvSignalClustering: Clustering as for a heatmap. This is used internally by...

View source: R/functions_clusteringKmeans.R

ssvSignalClusteringR Documentation

Clustering as for a heatmap. This is used internally by ssvSignalHeatmap but can also be run before calling ssvSignalHeatmap for greater control and access to clustering results directly.

Description

Clustering is via k-means by default. The number of clusters is determined by nclust. Optionally, k-means can be initialized with a data.frame provided to k_centroids. As an alternative to k-means, a membership table from ssvMakeMembTable can be provided to determine logical clusters.

Usage

ssvSignalClustering(
  bw_data,
  nclust = NULL,
  k_centroids = NULL,
  memb_table = NULL,
  row_ = "id",
  column_ = "x",
  fill_ = "y",
  facet_ = "sample",
  cluster_ = "cluster_id",
  max_rows = 500,
  max_cols = 100,
  clustering_col_min = -Inf,
  clustering_col_max = Inf,
  within_order_strategy = valid_sort_strategies[2],
  dcast_fill = NA,
  iter.max = 30,
  fun.aggregate = "mean"
)

Arguments

bw_data

a GRanges or data.table of bigwig signal. As returned from ssvFetchBam and ssvFetchBigwig

nclust

Number of clusters. Defaults to 6 if nclust, k_centroids, and memb_table are not set.

k_centroids

data.frame of centroids for k-means clusters. Incompatible with nclust or memb_table.

memb_table

Membership table as from ssvMakeMembTable. Logical groups from membership table will be clusters. Incompatible with nclust or k_centroids.

row_

variable name mapped to row, likely id or gene name for ngs data. Default is "id" and works with ssvFetch* output.

column_

varaible mapped to column, likely bp position for ngs data. Default is "x" and works with ssvFetch* output.

fill_

numeric variable to map to fill. Default is "y" and works with ssvFetch* output.

facet_

variable name to facet horizontally by. Default is "sample" and works with ssvFetch* output. Set to "" if data is not facetted.

cluster_

variable name to use for cluster info. Default is "cluster_id".

max_rows

for speed rows are sampled to 500 by default, use Inf to plot full data

max_cols

for speed columns are sampled to 100 by default, use Inf to plot full data

clustering_col_min

numeric minimum for col range considered when clustering, default in -Inf

clustering_col_max

numeric maximum for col range considered when clustering, default in Inf

within_order_strategy

one of "hclust", "sort", "right", "left", "none", "reverse". If "hclust", hierarchical clustering will be used. If "sort", a simple decreasing sort of rosSums. If "left", will attempt to put high signal on left ("right" is opposite). If "none", existing order is preserved. If "reverse" reverses existing order.

dcast_fill

value to supply to dcast fill argument. default is NA.

iter.max

Number of max iterations to allow for k-means. Default is 30.

fun.aggregate

Function to aggregate when multiple values present for facet_, row_, and column_. The function should accept a single vector argument or be a character string naming such a function.

Details

Within each cluster, items will either be sorted by decreasing average signal or hierachically clustered; this is controlled via within_order_strategy.

Value

data.table of signal profiles, ready for ssvSignalHeatmap

Examples

data(CTCF_in_10a_profiles_gr)
clust_dt = ssvSignalClustering(CTCF_in_10a_profiles_gr)
ssvSignalHeatmap(clust_dt)

clust_dt2 = ssvSignalClustering(CTCF_in_10a_profiles_gr, nclust = 2)
ssvSignalHeatmap(clust_dt2)

#clustering can be targetted to a specific part of the region
clust_dt3 = ssvSignalClustering(CTCF_in_10a_profiles_gr, nclust = 2,
    clustering_col_min = -250, clustering_col_max = -150)
ssvSignalHeatmap(clust_dt3)

# there are also multiple sorting strategies to apply within each cluster
clust_dt4 = ssvSignalClustering(
  CTCF_in_10a_profiles_gr,
  nclust = 2,
  within_order_strategy = "left"
)
ssvSignalHeatmap(clust_dt4)

clust_dt5 = ssvSignalClustering(
  CTCF_in_10a_profiles_gr,
  nclust = 2,
  within_order_strategy = "sort"
)
ssvSignalHeatmap(clust_dt5)

jrboyd/seqsetvis documentation built on June 1, 2024, 5:39 a.m.