The package contains efficient parallel functions for computation of similarity and distance metrics on various sparse and dense representations. Canonical applications of these functions are natural language processing and recommender systems.
The Simdist package uses a higher-level abstraction for 2d sparse representations than standard sparse matrix software. For every supported 2d representation, a primary and a secondary dimension of variation of the measurement are defined. Every function in this package acts along either the primary or the secondary dimension. The main reason for the primary/secondary division is computational: computing along the primary dimension is usually more efficient than along the secondary dimension. Even for dense matrices, the "mental model" used in the package is that of nested lists: the higher-order grouping (e.g. documents) is stored as entries along the primary dimension, and the inner elements (e.g. terms) are stored across the secondary dimension.
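The nested-list view can be made concrete with a column-compressed sparse matrix, whose storage already groups entries by column. The following is a minimal sketch using only the Matrix package; the toy data and the `as_nested` name are illustrative assumptions and not part of this package.

```r
library(Matrix)

# Toy term-document matrix: rows are terms, columns are documents.
m <- sparseMatrix(
  i = c(1, 3, 2, 3),   # row (term) indices, 1-based
  j = c(1, 1, 2, 3),   # column (document) indices, 1-based
  x = c(2, 1, 5, 4),
  dims = c(3, 3),
  dimnames = list(c("apple", "banana", "cherry"), c("doc1", "doc2", "doc3"))
)

# In a dgCMatrix the non-zero entries of column k occupy the contiguous
# positions (p[k] + 1):p[k + 1] of the @i (0-based row index) and @x slots.
# Reading a whole column is therefore a cheap slice, while reading a whole
# row requires scanning every column.
as_nested <- lapply(seq_len(ncol(m)), function(k) {
  idx <- seq_len(m@p[k + 1] - m@p[k]) + m@p[k]
  setNames(m@x[idx], rownames(m)[m@i[idx] + 1L])
})
names(as_nested) <- colnames(m)

as_nested$doc1   # named vector: apple = 2, cherry = 1
```

Each list element is one entry along the column dimension (a document) and holds its non-zero inner elements (terms), which is the two-level structure described above.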
The supported 2d representations and primary-secondary dimensions are as follows:
primary - columns, secondary - rows
Matrix::dgCMatrix: primary - columns, secondary - rows
Matrix::dgRMatrix: primary - rows, secondary - columns
Matrix::dgTMatrix: primary - rows, secondary - columns
slam::simple_triplet_matrix: primary - rows, secondary - columns (not yet supported)
primary - first id column, secondary - second id column. Id and value columns can be explicitly marked with the psv function.
primary - first list level, secondary - inner vector level (not yet implemented)
To minimize the risk of logical errors due to mismatched dimensions, only distances between objects of the same type are currently implemented.
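As an illustration of how a table with two id columns and a value column maps onto the same two-level structure, here is a minimal base R sketch. It assumes the table is a plain data frame; the column names `doc`, `term` and `value` are hypothetical, and the code does not use any function from this package.

```r
# Long-format table: the first id column plays the role of the primary
# dimension (document), the second id column that of the secondary
# dimension (term). Column names are illustrative assumptions.
long <- data.frame(
  doc   = c("d1", "d1", "d2", "d3"),
  term  = c("apple", "cherry", "banana", "cherry"),
  value = c(2, 1, 5, 4),
  stringsAsFactors = FALSE
)

# Group rows by the primary id, then name each value by its secondary id.
nested <- lapply(split(long, long$doc), function(d) setNames(d$value, d$term))

nested$d1   # named vector: apple = 2, cherry = 1
```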
The primary/secondary dimension distinction allows all representations to be treated as two-level nested lists.
For named matrices, secondary dimensions are matched by name, not positionally. This means that even for matrices the sizes of the secondary dimension need not match. All rows in X that are not in Y are treated as missing (i.e. 0s) in Y, as they would be in a sparse matrix.
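The effect of matching by name can be sketched in base R by expanding both matrices to the union of their row names and filling absent rows with zeros. The `align_rows` helper below is hypothetical and only illustrates the semantics; it is not how the package implements the matching.

```r
# Two dense matrices with named rows (terms) and named columns (documents).
X <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("apple", "banana"), c("x1", "x2")))
Y <- matrix(c(5, 6, 7), nrow = 3,
            dimnames = list(c("banana", "cherry", "apple"), "y1"))

# Hypothetical helper (not part of the package): expand a matrix to a given
# set of row names, treating rows it does not have as zeros.
align_rows <- function(m, terms) {
  out <- matrix(0, nrow = length(terms), ncol = ncol(m),
                dimnames = list(terms, colnames(m)))
  out[rownames(m), ] <- m
  out
}

terms <- union(rownames(X), rownames(Y))
Xa <- align_rows(X, terms)
Ya <- align_rows(Y, terms)

# Dot products now pair up equal names, regardless of row order or row count.
crossprod(Xa, Ya)   # x1: 1*7 + 2*5 = 17, x2: 3*7 + 4*5 = 41
```

Note that X has two rows and Y has three, in a different order, yet the products are well defined because "cherry" is treated as 0 in X.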
No normalization by default. All sim and dist functions accept normalization or scaling functions (transformers) which allow arbitrary transformation of the input matrices.
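As a rough idea of what such a transformer might look like, the sketch below scales every column of a dense matrix to unit L2 norm. The function name is hypothetical, and the way transformers are passed to the sim and dist functions is not shown because it is not described here; the function is simply applied by hand.

```r
# Hypothetical transformer (an assumption, not a function of this package):
# scale every column of a dense matrix to unit L2 norm.
l2_normalize_cols <- function(m) {
  norms <- sqrt(colSums(m^2))
  norms[norms == 0] <- 1   # leave all-zero columns unchanged
  sweep(m, 2, norms, "/")
}

m <- matrix(c(3, 4, 0, 0, 1, 1), nrow = 2)
normalized <- l2_normalize_cols(m)
normalized[, 1]   # 0.6 0.8
```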
The cosine similarity of a vector X with the zero vector is 0, in contrast to the proxy package, where it is 1. This preserves coordinate-wise continuity at 0 and allows for a more efficient implementation.
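The zero-vector convention can be written down in a few lines of base R. This is a sketch of the convention only, with a hypothetical function name, and is not the package's implementation.

```r
# Cosine similarity under the convention that any pair involving a zero
# vector scores 0. Hypothetical name; not a function of this package.
cosine0 <- function(x, y) {
  denom <- sqrt(sum(x^2)) * sqrt(sum(y^2))
  if (denom == 0) return(0)
  sum(x * y) / denom
}

cosine0(c(1, 2, 3), c(0, 0, 0))   # 0
cosine0(c(1, 0), c(1, 0))         # 1
```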