Home

/

CRAN

/

genieclust

/

mst: Minimum Spanning Tree of the Pairwise Distance Graph

mst: Minimum Spanning Tree of the Pairwise Distance Graph
In genieclust: Fast and Robust Hierarchical Clustering with Noise Points Detection

mst	R Documentation

Minimum Spanning Tree of the Pairwise Distance Graph

Description

An parallelised implementation of a Jarnik (Prim/Dijkstra)-like algorithm for determining a(*) minimum spanning tree (MST) of a complete undirected graph representing a set of n points with weights given by a pairwise distance matrix.

(*) Note that there might be multiple minimum trees spanning a given graph.

Usage

mst(d, ...)

## Default S3 method:
mst(
  d,
  distance = c("euclidean", "l2", "manhattan", "cityblock", "l1", "cosine"),
  M = 1L,
  cast_float32 = TRUE,
  verbose = FALSE,
  ...
)

## S3 method for class 'dist'
mst(d, M = 1L, verbose = FALSE, ...)

Arguments

`d`	either a numeric matrix (or an object coercible to one, e.g., a data frame with numeric-like columns) or an object of class `dist`, see `dist`
`...`	further arguments passed to or from other methods
`distance`	metric used to compute the linkage, one of: `"euclidean"` (synonym: `"l2"`), `"manhattan"` (a.k.a. `"l1"` and `"cityblock"`), `"cosine"`
`M`	smoothing factor; `M` = 1 gives the selected `distance`; otherwise, the mutual reachability distance is used
`cast_float32`	logical; whether to compute the distances using 32-bit instead of 64-bit precision floating-point arithmetic (up to 2x faster)
`verbose`	logical; whether to print diagnostic messages and progress information

Details

If d is a numeric matrix of size n p, the n (n-1)/2 distances are computed on the fly, so that O(n M) memory is used.

The algorithm is parallelised; set the OMP_NUM_THREADS environment variable Sys.setenv to control the number of threads used.

Time complexity is O(n^2) for the method accepting an object of class dist and O(p n^2) otherwise.

If M >= 2, then the mutual reachability distance m(i,j) with smoothing factor M (see Campello et al. 2013) is used instead of the chosen "raw" distance d(i,j). It holds m(i, j)=\max(d(i,j), c(i), c(j)), where c(i) is d(i, k) with k being the (M-1)-th nearest neighbour of i. This makes "noise" and "boundary" points being "pulled away" from each other. Genie++ clustering algorithm (see gclust) with respect to the mutual reachability distance gains the ability to identify some observations are noise points.

Note that the case M = 2 corresponds to the original distance, but we determine the 1-nearest neighbours separately as well, which is a bit suboptimal; you can file a feature request if this makes your data analysis tasks too slow.

Value

Matrix of class mst with n-1 rows and 3 columns: from, to and dist. It holds from < to. Moreover, dist is sorted nondecreasingly. The i-th row gives the i-th edge of the MST. (from[i], to[i]) defines the vertices (in 1,...,n) and dist[i] gives the weight, i.e., the distance between the corresponding points.

The method attribute gives the name of the distance used. The Labels attribute gives the labels of all the input points.

If M > 1, the nn attribute gives the indices of the M-1 nearest neighbours of each point.

Author(s)

Marek Gagolewski and other contributors

References

Jarnik V., O jistem problemu minimalnim, Prace Moravske Prirodovedecke Spolecnosti 6, 1930, 57-63.

Olson C.F., Parallel algorithms for hierarchical clustering, Parallel Comput. 21, 1995, 1313-1325.

Prim R., Shortest connection networks and some generalisations, Bell Syst. Tech. J. 36, 1957, 1389-1401.

Campello R.J.G.B., Moulavi D., Sander J., Density-based clustering based on hierarchical density estimates, Lecture Notes in Computer Science 7819, 2013, 160-172, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/978-3-642-37456-2_14")}.

Examples

library("datasets")
data("iris")
X <- iris[1:4]
tree <- mst(X)

genieclust documentation built on Sept. 11, 2024, 8:01 p.m.

genieclust index

Package overview

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

genieclust
Fast and Robust Hierarchical Clustering with Noise Points Detection

mst: Minimum Spanning Tree of the Pairwise Distance Graph
In genieclust: Fast and Robust Hierarchical Clustering with Noise Points Detection

Minimum Spanning Tree of the Pairwise Distance Graph

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to mst in genieclust...

R Package Documentation

Browse R Packages

We want your feedback!

genieclust Fast and Robust Hierarchical Clustering with Noise Points Detection

mst: Minimum Spanning Tree of the Pairwise Distance Graph In genieclust: Fast and Robust Hierarchical Clustering with Noise Points Detection

Minimum Spanning Tree of the Pairwise Distance Graph

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to mst in genieclust...

R Package Documentation

Browse R Packages

We want your feedback!

genieclust
Fast and Robust Hierarchical Clustering with Noise Points Detection

mst: Minimum Spanning Tree of the Pairwise Distance Graph
In genieclust: Fast and Robust Hierarchical Clustering with Noise Points Detection