BiocNeighbors-raw-index: Reporting raw indices


Description

An overview of what raw indices mean for neighbor-search implementations that contain a rearranged matrix in the BiocNeighborIndex object.

What are raw indices?

Consider the following call:

    index <- buildKmknn(vals)
    out <- findKmknn(precomputed=index, k=k, raw.index=TRUE)

This yields the same output as:

    PRE <- bndata(index)
    out2 <- findKmknn(X=t(PRE), k=k)

When raw.index=TRUE in the first call, the indices in the out$index matrix refer to columns of PRE, i.e., to the rows of t(PRE) that serve as X in the second call. Moreover, all function arguments that previously referred to rows of X (e.g., subset) are now considered to refer to columns of PRE.

The same reasoning applies to all functions where precomputed can be specified in place of X. This includes query-based searches (e.g., queryKmknn) and range searches (rangeFindKmknn).
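
The relationship between raw indices and the original row indices can be sketched in base R without any index object. The following is only an illustration under stated assumptions: ord is a hypothetical permutation standing in for whatever reordering an index construction performs, and brute_knn is a hypothetical brute-force helper, not a BiocNeighbors function.

```r
set.seed(42)
vals <- matrix(rnorm(200), ncol = 4)    # 50 points in rows, 4 dimensions

# Hypothetical reordering, standing in for the one applied at index construction.
ord <- sample(nrow(vals))
PRE <- t(vals[ord, , drop = FALSE])     # transposed and reordered: points are columns

# Hypothetical helper: brute-force k nearest neighbours for points stored in
# the rows of X, excluding each point itself.
brute_knn <- function(X, k) {
    D <- as.matrix(dist(X))
    diag(D) <- Inf
    unname(t(apply(D, 1, function(d) order(d)[seq_len(k)])))
}

k <- 3
raw <- brute_knn(t(PRE), k)             # "raw" indices: refer to columns of PRE
orig <- brute_knn(vals, k)              # ordinary indices: refer to rows of vals

# Translating raw results back to the original ordering takes two steps:
# map each neighbour index through 'ord', then put the rows back in the
# original order using the inverse permutation.
remapped <- matrix(ord[raw], nrow = nrow(raw))
remapped <- remapped[order(ord), , drop = FALSE]
```

Here remapped matches orig, which is the re-indexing that the caller avoids by working with raw indices and the reordered data directly.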

Motivation

Setting raw.index=TRUE is intended for scenarios where the reordered data in precomputed is used elsewhere. By returning indices to the reordered data, the user does not need to hold onto the original data and/or switch between the original ordering and that in precomputed. This simplifies downstream code and provides a slight speed boost by avoiding the need for re-indexing.

Neighbor search implementations can only return raw indices if their index construction involves transposing X and reordering its columns. This tends to be the case for most implementations as transposition allows efficient column-major distance calculations and reordering improves data locality. Both the KMKNN and VP tree implementations fulfill these requirements and thus have the raw.index option.
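
As a rough illustration of why the transposed layout is convenient, the base R sketch below computes distances from a single query point to every observation stored as a column. It only demonstrates the memory layout; it is not the package's internal code, and X, PRE and query are names made up for this example.

```r
set.seed(1)
X <- matrix(rnorm(200), ncol = 4)   # 50 points in rows, 4 dimensions
PRE <- t(X)                         # points in columns, as in a transposed index
query <- rnorm(4)

# R stores matrices in column-major order, so each column of PRE is a
# contiguous block of memory holding one point. Recycling 'query' down the
# columns subtracts it from every point.
d_cols <- sqrt(colSums((PRE - query)^2))

# The same distances computed from the row-wise layout, for comparison.
d_rows <- sqrt(rowSums(sweep(X, 2, query)^2))
```

Both vectors hold the same 50 distances; the column-wise version reads each point from one contiguous stretch of memory.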

Note that setting raw.index=TRUE makes little sense when precomputed is not specified. When precomputed=NULL, a temporary index will be constructed that is not visible in the calling scope. As index construction may be stochastic, the raw indices will not refer to anything that is meaningful to the end-user.

Author(s)

Aaron Lun

See Also

findKmknn and findVptree for examples where raw indices are used.

Examples

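library(BiocNeighbors)

vals <- matrix(rnorm(100000), ncol=20)
index <- buildKmknn(vals)

# Raw indices refer to columns of bndata(index).
out <- findKmknn(precomputed=index, raw.index=TRUE, k=5)

# Equivalent search on the reordered data itself.
alt <- findKmknn(t(bndata(index)), k=5)
head(out$index)
head(alt$index)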

BiocNeighbors documentation built on Dec. 9, 2020, 2:01 a.m.