create_clusterball_mapper_object: ClusterBall Mapper

View source: R/baskin_robbins.R

create_clusterball_mapper_object    R Documentation

ClusterBall Mapper

Description

Run Ball Mapper, but cluster non-trivially within the balls. You can use two different distance matrices: one for the balling and one for the clustering.

Usage

create_clusterball_mapper_object(
  data,
  dist1,
  dist2,
  eps,
  clusterer = local_hierarchical_clusterer("single")
)

Arguments

data

A data frame.

dist1

A distance matrix for the data frame; this will be used to ball the data. It can be a dist object or a matrix.

dist2

Another distance matrix for the data frame; this will be used to cluster the data after balling. It can be a dist object or a matrix.

eps

A positive real number for the desired ball radius.

clusterer

A function which accepts a list of distance matrices as input and returns the result of clustering each distance matrix; that is, it should return a list of named vectors, whose names are the names of the data points and whose values are cluster assignments (integers). If this argument is omitted, single-linkage clustering is used (and cutting heights are chosen for you).
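
As an illustration of the clusterer contract described above, here is a minimal sketch of a custom clusterer written with base R only. The name complete_linkage_clusterer and the parameter k are hypothetical and not part of mappeR; the sketch assumes only what the argument description states (a list of distance matrices in, a list of named integer vectors out).

```r
# Hypothetical custom clusterer: complete-linkage clustering cut into at most
# k clusters per distance matrix. Not part of mappeR; a sketch of the contract.
complete_linkage_clusterer <- function(dist_mats, k = 2) {
  lapply(dist_mats, function(d) {
    d <- as.dist(d)                      # accept dist objects or matrices
    n <- attr(d, "Size")
    labels <- attr(d, "Labels")
    if (is.null(labels)) labels <- as.character(seq_len(n))

    # hclust() needs at least two points; a lone point is its own cluster
    if (n < 2) return(setNames(rep(1L, n), labels))

    tree <- hclust(d, method = "complete")
    setNames(as.integer(cutree(tree, k = min(k, n))), labels)
  })
}

# Two well-separated pairs of points on a line
pts <- matrix(c(0, 0.1, 5, 5.1), ncol = 1,
              dimnames = list(c("a", "b", "c", "d"), NULL))
complete_linkage_clusterer(list(dist(pts)))
```

A function like this could then be passed as the clusterer argument, provided it matches the input the package actually supplies.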

Value

A list of two data frames, nodes and edges, which contain information about the Mapper graph constructed from the given parameters.

The node data frame consists of:

  • id: vertex ID

  • cluster_size: number of data points in cluster

  • medoid: the name of the medoid of the vertex

  • mean_dist_to_medoid: mean distance to medoid of cluster

  • max_dist_to_medoid: max distance to medoid of cluster

  • cluster_width: maximum pairwise distance within cluster

  • wcss: sum of squares of distances to cluster medoid

  • data: names of data points in cluster

  • patch: level set ID
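
To make the per-node statistics listed above concrete, the following sketch computes them for a single cluster from a distance matrix. The helper cluster_summary is hypothetical, and the medoid definition used here (the member with the smallest total distance to the other members) is an assumption; the package's internal computation may differ in details such as tie-breaking.

```r
# Hypothetical helper: summary statistics for one cluster, given a distance
# matrix (dist object or matrix) and the names of the cluster's points.
cluster_summary <- function(dist_mat, members) {
  d <- as.matrix(dist_mat)[members, members, drop = FALSE]

  # Assumed medoid definition: smallest total distance to all other members
  medoid <- members[which.min(rowSums(d))]
  to_medoid <- d[medoid, ]

  list(
    cluster_size        = length(members),
    medoid              = medoid,
    mean_dist_to_medoid = mean(to_medoid),
    max_dist_to_medoid  = max(to_medoid),
    cluster_width       = max(d),           # maximum pairwise distance
    wcss                = sum(to_medoid^2)  # sum of squared distances to medoid
  )
}

# Three points on a line at 0, 1 and 3; "b" has the smallest total distance
pts <- matrix(c(0, 1, 3), ncol = 1, dimnames = list(c("a", "b", "c"), NULL))
cluster_summary(dist(pts), c("a", "b", "c"))
```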

The edge data frame consists of:

  • source: vertex ID of edge source

  • target: vertex ID of edge target

  • weight: Jaccard index of edge; this is the size of the intersection of the two vertices divided by the size of their union

  • overlap_data: names of data points in overlap

  • overlap_size: number of data points in overlap
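
The edge columns above can be illustrated with a small worked example. The helper jaccard_weight and the point names are made up for illustration; the sketch only shows the arithmetic behind weight, overlap_data and overlap_size for two vertices.

```r
# Hypothetical helper: Jaccard index of two vertices, each given as a
# character vector of data point names.
jaccard_weight <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

vertex_1 <- c("p1", "p2", "p3", "p4")
vertex_2 <- c("p3", "p4", "p5")

overlap_data <- intersect(vertex_1, vertex_2)   # points shared by both vertices
overlap_size <- length(overlap_data)            # 2 shared points
weight <- jaccard_weight(vertex_1, vertex_2)    # 2 shared / 5 distinct = 0.4
```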

Examples

# Create noisy circle data set
data = data.frame(
  x = cos(1:1000) + runif(1000, 0, .25),
  y = sin(1:1000) + runif(1000, 0, .25)
)
data.dists = dist(data)

# Set ball radius
eps = 1

# Do single-linkage clustering in the balls to produce Mapper graph
create_clusterball_mapper_object(data, data.dists, data.dists, eps)
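
The example above passes the same distance matrix for both balling and clustering. As a further sketch, the two can differ: here Euclidean distance is used for balling and Manhattan distance for clustering. The choice of metrics, the variable names and the sample size are illustrative assumptions, and the call is guarded so the block also runs where mappeR is not installed.

```r
# Smaller noisy circle, with a fixed seed for reproducibility
set.seed(1)
angles <- runif(200, 0, 2 * pi)
circle <- data.frame(
  x = cos(angles) + runif(200, 0, .25),
  y = sin(angles) + runif(200, 0, .25)
)

# Two different distance matrices on the same data (illustrative choice)
ball_dists <- dist(circle, method = "euclidean")     # used to ball the data
cluster_dists <- dist(circle, method = "manhattan")  # used to cluster in balls

# Only call the package function if mappeR is available
if (requireNamespace("mappeR", quietly = TRUE)) {
  mapper_graph <- mappeR::create_clusterball_mapper_object(
    circle, ball_dists, cluster_dists, eps = 1
  )
  names(mapper_graph)
}
```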

mappeR documentation built on Aug. 8, 2025, 6:24 p.m.