Description

Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient, and the appropriate one is selected automatically based on the input data via R's S3 method dispatch. Full descriptions of the algorithms and benchmarks are available in the package vignettes.

Covariance and correlation should largely need no introduction. Cosine similarity is commonly needed in, for example, natural language processing, where the cosine similarity coefficients of all pairs of columns of a term-document or document-term matrix are needed.
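As a point of reference for what "cosine similarity of all columns" means, here is a minimal base R sketch. The helper name `cosine_ref` is hypothetical and is not part of the package; it only illustrates the quantity being computed, not how the package computes it.

```r
# Reference cosine similarity between all pairs of columns of x.
# `cosine_ref` is a hypothetical helper for illustration only.
cosine_ref <- function(x) {
  cp <- crossprod(x)          # t(x) %*% x: all pairwise column dot products
  norms <- sqrt(diag(cp))     # Euclidean norm of each column
  cp / outer(norms, norms)    # divide entry (i, j) by norm_i * norm_j
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
sim <- cosine_ref(x)          # 4 x 4 symmetric matrix with unit diagonal
```

Each entry `sim[i, j]` is the cosine of the angle between columns `i` and `j` of `x`, so the diagonal is 1 up to rounding.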

The `inplace` argument

When computing covariance and correlation with dense matrices, we must operate on the centered and/or scaled input data. When `inplace=FALSE`, a copy of the matrix is made. This allows for very wall-clock efficient processing at the cost of m*n additional double precision numbers allocated. On the other hand, if `inplace=TRUE`, then the wall-clock performance will drop considerably, but at the memory expense of only m+n additional doubles. For perspective, given a 30,000x30,000 matrix, a copy of the data requires an additional 6.7 GiB of data, while the inplace method requires only 469 KiB, a 15,000-fold difference.
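The memory figures quoted above can be checked with a few lines of arithmetic. This sketch assumes 8 bytes per double and binary units (1 GiB = 2^30 bytes, 1 KiB = 2^10 bytes); the variable names are illustrative.

```r
m <- 30000
n <- 30000
bytes_per_double <- 8

copy_bytes    <- m * n * bytes_per_double     # full copy: m*n doubles
inplace_bytes <- (m + n) * bytes_per_double   # in-place: m+n doubles

copy_gib    <- copy_bytes / 2^30              # about 6.7 GiB
inplace_kib <- inplace_bytes / 2^10           # 468.75 KiB
ratio       <- copy_bytes / inplace_bytes     # 15000
```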

Note that cosine is always computed in place.

The `t` functions

The package also includes "t" functions, like `tcosine()`. These relate to their non-t counterparts in the same way that `tcrossprod()` relates to `crossprod()` in base R. So if `cosine()` operates on the columns of the input matrix, then `tcosine()` operates on the rows. Another way to think of it is `tcosine(x) = cosine(t(x))`.
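The column/row relationship can be illustrated in base R without the package. In this sketch, `cosine_ref` and `tcosine_ref` are hypothetical reference helpers built on `crossprod()` and `tcrossprod()`; they are not the package's functions, only stand-ins that follow the same convention.

```r
# Hypothetical reference helpers; not the package's implementation.
cosine_ref <- function(x) {
  cp <- crossprod(x)              # column-by-column dot products
  norms <- sqrt(diag(cp))
  cp / outer(norms, norms)
}

tcosine_ref <- function(x) {
  tcp <- tcrossprod(x)            # row-by-row dot products
  norms <- sqrt(diag(tcp))
  tcp / outer(norms, norms)
}

x <- matrix(c(1, 0, 2,
              0, 3, 1), nrow = 2, byrow = TRUE)

dim(cosine_ref(x))    # similarities among the 3 columns
dim(tcosine_ref(x))   # similarities among the 2 rows

# The row version agrees with the column version applied to the transpose.
same <- isTRUE(all.equal(tcosine_ref(x), cosine_ref(t(x))))
```

As with `tcrossprod()`, the "t" variant avoids forming `t(x)` explicitly.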

Implementation Details

Multiple storage schemes for the input data are accepted. For dense matrices, an ordinary R matrix input is accepted. For sparse matrices, a matrix in COO format, namely `simple_triplet_matrix` from the slam package, is accepted.
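For readers unfamiliar with COO (coordinate, or triplet) storage, the sketch below builds the triplets by hand in base R. It uses a plain list rather than the slam class; the field names `i`, `j`, `v`, `nrow`, and `ncol` are chosen here on the assumption that they mirror the layout slam uses, so treat them as illustrative.

```r
dense <- matrix(c(0, 2, 0,
                  1, 0, 0,
                  0, 0, 3), nrow = 3, byrow = TRUE)

# Row/column positions of the nonzero entries.
nz <- which(dense != 0, arr.ind = TRUE)

# COO triplets: one (row index, column index, value) per nonzero entry.
coo <- list(
  i = nz[, "row"],
  j = nz[, "col"],
  v = dense[nz],
  nrow = nrow(dense),
  ncol = ncol(dense)
)

# Rebuild the dense matrix from the triplets to confirm nothing was lost.
rebuilt <- matrix(0, coo$nrow, coo$ncol)
rebuilt[cbind(coo$i, coo$j)] <- coo$v
```

Only the three nonzero values and their positions are stored, which is what makes the format attractive for very sparse term-document matrices.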

The implementation for dense matrix inputs is dominated by a symmetric rank-k update via the BLAS subroutine `dsyrk`; see the package vignette for a discussion of the algorithm implementation and complexity.
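To see why a symmetric rank-k update dominates, note that covariance reduces to `t(z) %*% z` on the centered data `z`. The base R sketch below is a reference calculation under that observation, not the package's code; `crossprod()` with a single argument stands in for the rank-k update (in many R builds it is itself carried out by `dsyrk`, though that depends on the R configuration).

```r
set.seed(42)
x <- matrix(rnorm(50), nrow = 10, ncol = 5)

# Center each column. This materializes a full m*n copy of the data,
# which is the memory cost that inplace=FALSE accepts.
centered <- sweep(x, 2, colMeans(x))

# Symmetric product t(centered) %*% centered, scaled by 1 / (m - 1).
cov_ref <- crossprod(centered) / (nrow(x) - 1)

# Agrees with base R's cov() up to floating-point rounding.
agrees <- isTRUE(all.equal(cov_ref, cov(x)))
```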

The implementation for two dense vector inputs is dominated by the product `t(x) %*% y`, performed by the BLAS subroutine `dgemm`, and the normalizing products `t(x) %*% x` and `t(y) %*% y`, each computed via the BLAS subroutine `dsyrk`.
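For two vectors, the three products named above combine into a single scalar. The following base R sketch shows that combination; `cosine_vec` is a hypothetical helper, and it uses `crossprod()` rather than calling BLAS routines directly.

```r
# Hypothetical helper: cosine similarity of two numeric vectors.
cosine_vec <- function(x, y) {
  xy <- drop(crossprod(x, y))   # t(x) %*% y
  xx <- drop(crossprod(x))      # t(x) %*% x
  yy <- drop(crossprod(y))      # t(y) %*% y
  xy / sqrt(xx * yy)
}

orthogonal <- cosine_vec(c(1, 0), c(0, 1))        # perpendicular vectors
parallel   <- cosine_vec(c(1, 2, 3), c(2, 4, 6))  # same direction
```

Perpendicular vectors give 0 and vectors pointing in the same direction give 1, regardless of their lengths.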

Author(s)

Drew Schmidt

coop documentation built on Nov. 17, 2017, 4:05 a.m.
