sparse_similarity: Compute similarity between pairs of rows of a matrix

sparse_similarity    R Documentation

Compute similarity between pairs of rows of a matrix

Description

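cosine_sparse computes the cosine similarity between selected pairs of rows of a matrix. pearson_sparse computes the Pearson correlation between selected pairs of rows of a matrix, which is equivalent to the cosine similarity of the same rows after each row has been centered on its mean.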
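For reference, the two measures can be written out as follows. This is the standard textbook definition rather than a statement about how the package computes it; the symbols x and y (two rows of X), p (the number of columns) and the bar notation for row means are introduced here for illustration only.

```latex
% x, y: two rows of X, each with p entries (illustrative notation)
\[
  \operatorname{cosine}(x, y)
    = \frac{\sum_{k=1}^{p} x_k y_k}
           {\sqrt{\sum_{k=1}^{p} x_k^2}\,\sqrt{\sum_{k=1}^{p} y_k^2}}
\]
\[
  \operatorname{pearson}(x, y)
    = \operatorname{cosine}(x - \bar{x},\; y - \bar{y}),
  \qquad
  \bar{x} = \frac{1}{p}\sum_{k=1}^{p} x_k,
  \quad
  \bar{y} = \frac{1}{p}\sum_{k=1}^{p} y_k
\]
```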

Usage

cosine_sparse(X, id1, id2, ...)

pearson_sparse(X, id1, id2, ...)

Arguments

X

matrix

id1

vector of integers specifying the rows of X that form the first member of each pair (first set).

id2

vector of integers specifying the rows of X that form the second member of each pair (second set); must have the same length as id1.

...

arguments passed downstream for parallel processing.

Value

data.frame with one row per pair, that is, as many rows as the length of id1 (and id2), containing the similarity between the paired rows of X: sim[i] == similarity(X[id1[i], ], X[id2[i], ]).
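The contract above can be illustrated with a small base R sketch. The helper pairwise_row_similarity is hypothetical and not part of matric; it loops over the pairs one at a time, so it is a readable reference rather than an efficient implementation. The column names id1, id2 and sim are an assumption inferred from the Examples section.

```r
# Hypothetical reference helper (not part of matric): evaluates a similarity
# function on each requested pair of rows using plain base R.
pairwise_row_similarity <- function(
    X, id1, id2,
    similarity = function(x, y) sum(x * y) / sqrt(sum(x * x) * sum(y * y))) {
  stopifnot(is.matrix(X), length(id1) == length(id2))
  sim <- vapply(
    seq_along(id1),
    # Pair i compares row id1[i] with row id2[i].
    function(i) similarity(X[id1[i], ], X[id2[i], ]),
    numeric(1)
  )
  # Assumed output layout: one row per pair.
  data.frame(id1 = id1, id2 = id2, sim = sim)
}

set.seed(42)
X <- matrix(rnorm(5 * 3), 5, 3)
ref_cosine <- pairwise_row_similarity(X, c(1, 3), c(5, 4))
ref_cosine
```

Passing a different function as similarity, for example stats::cor, gives the Pearson variant with the same output layout.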

Examples


library(magrittr) # provides the %>% pipe used below

set.seed(42)
X <- matrix(rnorm(5 * 3), 5, 3)

id1 <- c(1, 3)
id2 <- c(5, 4)

s1 <- matric::cosine_sparse(X, id1, id2) %>% dplyr::arrange(id1, id2)

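# Reference computation: scale each row to unit length, so that
# tcrossprod(Xn) holds the cosine similarity of every pair of rows.
Xn <- X / sqrt(rowSums(X * X))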

n_rows <- nrow(Xn)

s2 <-
  expand.grid(
    id1 = seq(n_rows),
    id2 = seq(n_rows),
    KEEP.OUT.ATTRS = FALSE
  ) %>%
  dplyr::mutate(sim = as.vector(tcrossprod(Xn))) %>%
  dplyr::inner_join(s1 %>% dplyr::select(id1, id2), by = c("id1", "id2")) %>%
  dplyr::arrange(id1, id2)

s1

all.equal(s1, s2)

# Pearson correlation equals cosine similarity on row-centered data.
Xm <- X - rowMeans(X)
s3 <- matric::cosine_sparse(Xm, id1, id2) %>% dplyr::arrange(id1, id2)
s4 <- matric::pearson_sparse(X, id1, id2) %>% dplyr::arrange(id1, id2)

all.equal(s3, s4)
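The last comparison relies on two facts that can be checked without matric installed: subtracting rowMeans(X) from X centers each row, and the cosine of two centered rows matches stats::cor. The sketch below verifies both for a single pair of rows; the helper name cosine is hypothetical and defined only for this illustration.

```r
set.seed(42)
X <- matrix(rnorm(5 * 3), 5, 3)

# R stores matrices column by column, so a vector of length nrow(X) is
# recycled down each column: every entry of row i is shifted by rowMeans(X)[i].
Xm <- X - rowMeans(X)
centered_ok <- all(abs(rowMeans(Xm)) < 1e-12)

# Hypothetical helper: cosine similarity of two numeric vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x * x) * sum(y * y))

pearson_by_hand <- cosine(Xm[1, ], Xm[5, ])
pearson_by_cor <- stats::cor(X[1, ], X[5, ])

all.equal(pearson_by_hand, pearson_by_cor)
```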

matric documentation built on April 1, 2023, 12:19 a.m.