simil: Compute similarity/distance between rows or columns of large...


View source: R/proxy.R

Description

Fast similarity/distance computation function for large sparse matrices. You can floor small similarity values to save computation time and storage space, using either an arbitrary threshold (min_simil) or a rank (rank). Increase the number of threads with setThreadOptions for better performance.
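The effect of the two flooring options can be sketched as follows. This is a minimal illustration, not part of the original page: the matrix size, density, seed, threshold of 0.3 and rank of 5 are arbitrary choices, and the call to RcppParallel::setThreadOptions assumes that the RcppParallel package is installed and is the source of setThreadOptions.

```r
library(Matrix)
library(proxyC)

# Optional: allow more threads (assumes RcppParallel is available)
RcppParallel::setThreadOptions(numThreads = 2)

set.seed(1)
mt <- rsparsematrix(200, 50, density = 0.2)

# No flooring: every pairwise value is kept
full <- simil(mt, method = "cosine")

# Keep only similarities of at least 0.3
by_min <- simil(mt, method = "cosine", min_simil = 0.3)

# Keep only the top-5 similarity values
by_rank <- simil(mt, method = "cosine", rank = 5)

# Compare how many non-zero entries each result holds
c(full = nnzero(full), by_min = nnzero(by_min), by_rank = nnzero(by_rank))
```

The floored results should hold fewer non-zero entries than the full one, which is where the savings in storage space come from.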

Usage

simil(x, y = NULL, margin = 1, method = c("cosine", "correlation",
  "jaccard", "ejaccard", "dice", "edice", "hamman", "simple matching",
  "faith"), min_simil = NULL, rank = NULL, drop0 = FALSE,
  digits = 14)

dist(x, y = NULL, margin = 1, method = c("euclidean", "chisquared",
  "hamming", "kullback", "manhattan", "maximum", "canberra", "minkowski"),
  p = 2, drop0 = FALSE, digits = 14)

Arguments

x

Matrix object

y

if a matrix or Matrix object is provided, proximity between documents or features in x and y is computed.

margin

integer indicating the margin of similarity/distance computation: 1 indicates rows and 2 indicates columns.

method

method to compute similarity or distance

min_simil

the minimum similarity value to be recorded.

rank

an integer value specifying the top-n highest similarity values to be recorded.

drop0

if TRUE, zero values are removed regardless of min_simil or rank.

digits

determines rounding of small values towards zero. Used primarily to correct rounding errors in C++. See zapsmall.

p

weight (the exponent p) for Minkowski distance
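The y, margin and p arguments can be illustrated with a short sketch. The matrices, their sizes and the seed are made up for illustration. The functions are called as proxyC::simil and proxyC::dist so that the example does not depend on whether stats::dist is masked.

```r
library(Matrix)

set.seed(1)
x <- rsparsematrix(10, 20, density = 0.3)
y <- rsparsematrix(5, 20, density = 0.3)

# margin = 1: rows of x against rows of y (both have 20 columns)
s_rows <- proxyC::simil(x, y, margin = 1, method = "cosine")
dim(s_rows)  # 10 x 5

# margin = 2: columns of x against each other
s_cols <- proxyC::simil(x, margin = 2, method = "cosine")
dim(s_cols)  # 20 x 20

# With p = 2 the Minkowski distance reduces to the Euclidean distance,
# so the two results should agree up to rounding
d_mink <- proxyC::dist(x, method = "minkowski", p = 2)
d_eucl <- proxyC::dist(x, method = "euclidean")
all.equal(as.matrix(d_mink), as.matrix(d_eucl))
```

When y is supplied, it must share with x the dimension that is not being compared (here, the 20 columns for margin = 1).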

See Also

zapsmall

Examples

mt <- Matrix::rsparsematrix(100, 100, 0.01)
simil(mt, method = "cosine")[1:5, 1:5]
mt <- Matrix::rsparsematrix(100, 100, 0.01)
dist(mt, method = "euclidean")[1:5, 1:5]

Example output

Attaching package: 'proxyC'

The following object is masked from 'package:stats':

    dist

5 x 5 sparse Matrix of class "dsTMatrix"
              
[1,] . . . . .
[2,] . 1 0 . 0
[3,] . 0 1 . 0
[4,] . . . . .
[5,] . 0 0 . 1
5 x 5 sparse Matrix of class "dsTMatrix"
                                                     
[1,] 0.0000000 0.8220706 0.2832825 0.2800000 1.383040
[2,] 0.8220706 0.0000000 0.7741117 0.7729166 1.559423
[3,] 0.2832825 0.7741117 0.0000000 0.0430000 1.355083
[4,] 0.2800000 0.7729166 0.0430000 0.0000000 1.354400
[5,] 1.3830401 1.5594230 1.3550827 1.3544002 0.000000
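As a cross-check of what the "cosine" method computes, the result can be compared against the textbook cosine formula written in base R. This is a sketch under stated assumptions: the small dense matrix m is invented for the comparison and deliberately has no all-zero rows, so the hand-written formula never divides by zero.

```r
library(Matrix)

set.seed(1)
m <- matrix(runif(30), nrow = 5)   # dense 5 x 6 matrix without all-zero rows
x <- Matrix(m, sparse = TRUE)      # same data as a sparse Matrix object

# Cosine similarity by hand: scale each row to unit length,
# then take all pairwise dot products between rows
unit <- m / sqrt(rowSums(m^2))
by_hand <- tcrossprod(unit)

# Same quantity from the package, converted to a base matrix
from_pkg <- as.matrix(proxyC::simil(x, margin = 1, method = "cosine"))

all.equal(from_pkg, by_hand, check.attributes = FALSE)
```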

proxyC documentation built on July 21, 2019, 9:04 a.m.