buildKmknn: Pre-cluster points with k-means

View source: R/buildKmknn.R

buildKmknn    R Documentation

Pre-cluster points with k-means

Description

Perform k-means clustering in preparation for a KMKNN nearest-neighbors search.

Usage

buildKmknn(
  X,
  transposed = FALSE,
  distance = c("Euclidean", "Manhattan", "Cosine"),
  ...
)

Arguments

X

A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).

transposed

Logical scalar indicating whether X is transposed, i.e., rows are variables and columns are data points.

distance

String specifying the type of distance to use: one of "Euclidean", "Manhattan" or "Cosine".

...

Further arguments to pass to kmeans.

Details

This function is called automatically by findKmknn and related functions. However, the user can call it directly to save time when multiple queries are to be performed on the same X, as the clustering then only needs to be done once.
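As a rough illustration of this reuse, the sketch below builds the index once and passes it to two searches. It assumes a BiocNeighbors release that exports buildKmknn, findKmknn and queryKmknn and in which the search functions accept a `precomputed` argument; the names Z, pre, nn.self and nn.query are made up for the example.

```r
library(BiocNeighbors)

set.seed(1)
Y <- matrix(rnorm(10000), ncol = 20)   # 500 reference points, 20 dimensions
Z <- matrix(rnorm(400), ncol = 20)     # 20 query points (hypothetical data)

# The k-means clustering is done once, here.
pre <- buildKmknn(Y)

# Both searches are given the prebuilt index via 'precomputed'
# (assumed argument name; check the help page of the installed version).
nn.self  <- findKmknn(Y, k = 5, precomputed = pre)
nn.query <- queryKmknn(Y, query = Z, k = 5, precomputed = pre)
```

Without `precomputed`, each of the two calls would be expected to repeat the clustering of Y.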

Points in X are reordered to improve data locality during the nearest-neighbor search. Specifically, points in the same cluster are contiguous and ordered by increasing distance from the cluster center.
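The reordering can be mimicked in base R. This is only a sketch of the idea, not the package's internal code: the number of clusters (the square root of the number of points), the Euclidean distance, and all variable names are assumptions made for illustration.

```r
set.seed(42)
X <- matrix(rnorm(2000), ncol = 4)       # 500 points, 4 dimensions
N <- ceiling(sqrt(nrow(X)))              # assumed number of clusters
km <- kmeans(X, centers = N, iter.max = 50)

# Euclidean distance from each point to its assigned cluster center.
d2c <- sqrt(rowSums((X - km$centers[km$cluster, , drop = FALSE])^2))

# Order by cluster first, then by increasing distance within each cluster.
ord <- order(km$cluster, d2c)
X.reordered <- X[ord, , drop = FALSE]
cluster.reordered <- km$cluster[ord]
dist.reordered <- d2c[ord]
```

After this step, all points of a cluster occupy one contiguous block of rows in X.reordered, so a search that scans a cluster touches adjacent memory.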

After k-means clustering, the function stores the coordinates of the cluster centers in the output object. It also records a list of extra information with one entry per cluster. Each entry corresponds to a cluster (say, cluster j) and is a list of length 2. The first element is an integer scalar containing the zero-based index of the first point in the reordered data matrix that is assigned to j. The second element is a numeric vector containing the distance of each point in the cluster from the cluster center.
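A list with the layout described above could be assembled in base R as follows. This is a hedged sketch rather than the actual construction used by the package; the fixed choice of 10 clusters and the names sizes, first, sorted.dist and info are hypothetical.

```r
set.seed(100)
X <- matrix(rnorm(1200), ncol = 3)       # 400 points, 3 dimensions
km <- kmeans(X, centers = 10, iter.max = 50)

# Distance of each point to its own center, and the cluster-wise ordering.
d2c <- sqrt(rowSums((X - km$centers[km$cluster, , drop = FALSE])^2))
ord <- order(km$cluster, d2c)

# Zero-based position of the first point of each cluster after reordering.
sizes <- tabulate(km$cluster, nbins = nrow(km$centers))
first <- cumsum(c(0L, sizes[-length(sizes)]))

# Sorted distances to the center, split by cluster.
sorted.dist <- split(d2c[ord], km$cluster[ord])

# One entry per cluster: list(start index, distances to center).
info <- lapply(seq_along(sizes), function(j) {
    list(first[j], unname(sorted.dist[[j]]))
})
```

Here info[[j]][[1]] + 1 gives the one-based row of the first point of cluster j in X[ord, ], and the length of info[[j]][[2]] gives the cluster size.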

Value

A KmknnIndex object containing indexing structures for the KMKNN search.

Author(s)

Aaron Lun

See Also

kmeans, for optional arguments.

KmknnIndex for details on the output class.

findKmknn, queryKmknn and findNeighbors, for dependent functions.

Examples

# Simulate 5000 points in 20 dimensions.
Y <- matrix(rnorm(100000), ncol=20)
# Cluster the points and build the index.
out <- buildKmknn(Y)
out


LTLA/BiocNeighbors documentation built on Jan. 14, 2024, 9:46 p.m.