Compute and return distances and indices of rows or columns within a specified distance threshold
with respect to a specified distance metric. The algorithm works best for Euclidean
distance (the default option).
Alternatively, set rank=TRUE to compute the t closest rows, or set columns=TRUE
to compute distances between columns instead, which is somewhat
cheaper for this algorithm than computing row distances.
Increase p to cut down the total number of candidate pairs evaluated,
at the expense of costlier truncated SVDs.
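As a rough illustration of why a larger p prunes more candidate pairs, the base-R sketch below assumes that the candidate test is the Euclidean distance between rows after projection onto the top p right singular vectors. That test, and the names thr, p_values and count_candidates, are assumptions for illustration, not the package's code. Because the projection is onto an orthonormal basis, projected distances never exceed true distances, so no qualifying pair is discarded, and each added dimension can only shrink the candidate set.

```r
# Hypothetical sketch in base R, not the tcor implementation.
set.seed(3)
A <- matrix(rnorm(300 * 20), nrow = 300)
thr <- 4                         # absolute distance threshold (the "t" above)
p_values <- c(1, 2, 5, 10)

# Top right singular vectors; V has orthonormal columns.
V <- svd(A, nu = 0, nv = max(p_values))$v
P <- A %*% V                     # rows of A projected onto span(V)

# Number of row pairs whose distance in the first p projected coordinates
# is within thr; these are the candidates that would still need an exact check.
count_candidates <- function(p) {
  Dp <- as.matrix(dist(P[, seq_len(p), drop = FALSE]))
  sum(Dp[upper.tri(Dp)] <= thr)
}

counts <- sapply(p_values, count_candidates)

# Exact number of row pairs within thr, for comparison.
D <- as.matrix(dist(A))
true_pairs <- sum(D[upper.tri(D)] <= thr)

print(rbind(p = p_values, candidates = counts))
print(true_pairs)
```

The candidate count should fall as p grows while never dropping below true_pairs; the price, as noted above, is a costlier truncated SVD.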
A
An m by n real-valued dense or sparse matrix.
t
A threshold distance value, given either as an absolute distance (the default) or as a rank order (see rank).
p
The projected subspace dimension.
filter
"local" filters the candidate set sequentially; "distributed" computes thresholded distances in a parallel code section, which can be faster but requires that the data matrix is available (see notes).
method
The distance measure to be used, either "euclidean" or "manhattan". Any unambiguous substring can be given.
rank
When TRUE, t is interpreted as a rank order and the t closest rows are computed.
dry_run
Set to TRUE to return the truncated SVD (as restart) without computing the indices.
max_iter
When rank=TRUE, the iteration limit for finding the t closest rows.
columns
Set to TRUE to compute distances between columns instead of rows.
restart
Either output from a previous run of
...
Additional arguments passed to
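To make the rank, columns and method arguments concrete, here is a brute-force reference in base R that returns the k closest row (or column) pairs in a three-column layout like the indices element described below. closest_pairs is a hypothetical helper, not part of the package; it uses k rather than t so that base R's transpose function t() stays unshadowed, and it computes every pairwise distance, which is exactly the cost the projected-SVD algorithm is meant to avoid.

```r
# Hypothetical brute-force reference; compares every pair, so only for small inputs.
closest_pairs <- function(A, k, columns = FALSE,
                          method = c("euclidean", "manhattan")) {
  method <- match.arg(method)          # accepts unambiguous abbreviations, e.g. "man"
  X <- if (columns) t(A) else A        # compare columns by transposing
  D <- as.matrix(dist(X, method = method))
  idx <- which(upper.tri(D), arr.ind = TRUE)   # each unordered pair once, i < j
  out <- cbind(idx, D[idx])
  colnames(out) <- c("i", "j", "distance")
  out <- out[order(out[, "distance"]), , drop = FALSE]
  out[seq_len(min(k, nrow(out))), , drop = FALSE]
}

set.seed(2)
A <- matrix(rnorm(50 * 8), nrow = 50)
top_rows <- closest_pairs(A, 5)                                  # 5 closest row pairs
top_cols <- closest_pairs(A, 3, columns = TRUE, method = "man")  # Manhattan, columns
print(top_rows)
print(top_cols)
```

Such a reference is mainly useful for checking results of a faster method on small test matrices.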
A list with elements:
indices
A three-column matrix. The first two columns contain the
indices of rows meeting the distance threshold t;
the third column contains the corresponding distance value (not returned
when dry_run=TRUE).
restart
A truncated SVD returned by IRLBA, used to restart the
algorithm (only returned when dry_run=TRUE).
tot
The total number of _possible_ vectors that meet
the distance threshold identified by the algorithm.
longest_run
The largest number of successive entries in the
ordered first singular vector within a projected distance defined by the
distance threshold; equivalently, the number of n by p
matrix-vector products employed in the algorithm, not counting the truncated SVD step.
t
The threshold value.
svd_time
Time to compute truncated SVD.
total_time
Total run time.
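The tot and longest_run fields refer to runs along the ordered first singular vector. The base-R sketch below is one plausible reading, assuming that the projected coordinate of row i is its inner product with the first right singular vector v1, and that a run at sorted position k counts the later entries within thr of it. Both assumptions, and the names thr, runs and n_candidates, are illustrative rather than the package's definitions. Because v1 has unit length, the gap between two projected coordinates never exceeds the Euclidean distance between the rows, so this pruning loses no qualifying pair.

```r
# Hypothetical sketch in base R, not the tcor implementation.
set.seed(1)
A <- matrix(rnorm(200 * 5), nrow = 200)
thr <- 1.5                              # absolute distance threshold
n <- nrow(A)

v1 <- svd(A, nu = 0, nv = 1)$v[, 1]     # unit-length first right singular vector
proj <- drop(A %*% v1)                  # equals d1 * u1, the scaled first left singular vector
ord <- order(proj)
s <- proj[ord]                          # projected coordinates in sorted order

# For sorted position k, positions k+1, ..., last[k] lie within thr of s[k].
last <- findInterval(s + thr, s)
runs <- last - seq_len(n)               # run length after position k
longest_run <- max(runs)
n_candidates <- sum(runs)               # exact distance checks performed below

hits <- list()
for (k in seq_len(n)) {
  for (j in k + seq_len(runs[k])) {
    i1 <- ord[k]
    i2 <- ord[j]
    d <- sqrt(sum((A[i1, ] - A[i2, ])^2))
    if (d <= thr) hits[[length(hits) + 1]] <- c(min(i1, i2), max(i1, i2), d)
  }
}
pruned <- do.call(rbind, hits)          # three columns: i, j, distance

# Brute-force check over all choose(n, 2) pairs.
D <- as.matrix(dist(A))
idx <- which(D <= thr & upper.tri(D), arr.ind = TRUE)

print(c(longest_run = longest_run, candidates = n_candidates,
        all_pairs = choose(n, 2), matches = nrow(pruned)))
```

Here n_candidates is the number of exact distance evaluations the pruned loop performs, compared with choose(n, 2) for brute force; projecting onto more singular vectors would tighten the bound further.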
When rank=TRUE, the method returns at least, and perhaps more than, the top t
closest indices and their distances, unless they cannot be found within the iteration
limit max_iter.
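One simple way a rank-based cutoff can yield more than t results is a tie at the cutoff. The tiny base-R example below is a hypothetical illustration, not a description of how the package chooses its cutoff: it takes the k-th smallest pairwise distance as the cutoff and keeps every pair at or below it.

```r
# Hypothetical illustration: a cutoff set at the k-th smallest distance
# keeps every pair at or below it, so ties can push the count above k.
A <- rbind(c(0, 0),   # row 1
           c(1, 0),   # row 2: distance 1 from row 1
           c(0, 1),   # row 3: also distance 1 from row 1
           c(5, 5))   # row 4: far from the others
k <- 1
D <- as.matrix(dist(A))
cutoff <- sort(D[upper.tri(D)])[k]
idx <- which(D <= cutoff & upper.tri(D), arr.ind = TRUE)
print(cutoff)    # 1
print(nrow(idx)) # 2 pairs returned for k = 1
```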
http://arxiv.org/abs/1512.07246 (preprint)