fit_embeds_to_pairs: Fit embeds to pairs

View source: R/fit_embeds.R

fit_embeds_to_pairs    R Documentation

Fit embeds to pairs

Description

Fit an embedding matrix to a data frame of known pairs of related concepts. Depending on the number of concepts in the matrix (see max_concepts), either all pair-wise similarities are computed, or only those of the known pairs.
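
The two strategies mentioned above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper pair_inprods, the toy matrix and the column names concept1 and concept2 are all hypothetical, and only the inner product is shown.

```r
# Hypothetical helper, not part of kgraph: pick one of two strategies
# depending on the number of concepts (rows) in the embedding matrix.
pair_inprods <- function(m, pairs, max_concepts = 1000) {
  if (nrow(m) <= max_concepts) {
    # Small matrix: compute every pair-wise inner product at once,
    # then look up the known pairs by row and column name.
    all_sims <- tcrossprod(m)
    all_sims[cbind(pairs[[1]], pairs[[2]])]
  } else {
    # Large matrix: compute only the inner products of the known pairs.
    unname(rowSums(m[pairs[[1]], , drop = FALSE] *
                   m[pairs[[2]], , drop = FALSE]))
  }
}

set.seed(1)
m <- matrix(rnorm(5 * 3), nrow = 5, dimnames = list(letters[1:5], NULL))
pairs <- data.frame(concept1 = c("a", "b"), concept2 = c("c", "e"),
                    stringsAsFactors = FALSE)

sims_full  <- pair_inprods(m, pairs, max_concepts = 1000)  # all pairs
sims_pairs <- pair_inprods(m, pairs, max_concepts = 2)     # known pairs only
all.equal(sims_full, sims_pairs)
```

Both branches give the same similarities for the known pairs; the second avoids building the full concepts-by-concepts matrix.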

Usage

fit_embeds_to_pairs(
  m_embeds,
  df_pairs,
  df_pairs_cols = 1:2,
  similarity = c("inprod", "cosine", "cov_simi", "norm_inprod"),
  threshold_projs = 0.9,
  max_concepts = 1000
)

Arguments

m_embeds

Embedding matrix. Row names must match the concept identifiers in df_pairs.

df_pairs

Data frame of known relationships (pairs of related concepts).

df_pairs_cols

Columns of df_pairs holding the concept identifiers that map to the row names of m_embeds.

similarity

Similarity measure to be computed. One of 'inprod' (inner product), 'cosine', 'cov_simi' (covariance similarity), 'norm_inprod' (normalized inner product).

threshold_projs

Specificity threshold to use for projections. The default of 0.9 corresponds to 10 percent false positives; 0.95 corresponds to 5 percent.

max_concepts

Maximum number of concepts for which all pair-wise similarities are computed.
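
As a rough illustration of the similarity and threshold_projs arguments, the base R sketch below computes the inner product and cosine similarity of two vectors, then derives a similarity cutoff at 90 percent specificity from simulated scores. The function names inprod and cosine and the simulated null_sims values are assumptions made for the example; how the package builds its set of unrelated pairs is not documented here.

```r
# Inner product and cosine similarity between two embedding vectors.
inprod <- function(x, y) sum(x * y)
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

x <- c(1, 2, 2)
y <- c(2, 4, 4)
inprod(x, y)  # 18
cosine(x, y)  # 1, since y is a positive multiple of x

# Specificity threshold: a cutoff that 90 percent of unrelated pairs
# fall below, leaving about 10 percent false positives.
set.seed(42)
null_sims <- rnorm(1000)  # simulated stand-in for unrelated-pair similarities
threshold <- quantile(null_sims, probs = 0.9, names = FALSE)
fp_rate <- mean(null_sims > threshold)
fp_rate  # about 0.10
```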

Value

A list with slots roc (the object returned by pROC::roc), sims and truth (kept so that partial AUCs can be recomputed with pROC), threshold_5fp (the threshold at 5 percent false positives), n_concepts (the number of concepts in the embeddings), and df_projs (a data frame listing the pair-wise concept similarities above the threshold set by threshold_projs).
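
Recomputing a partial AUC from sims and truth might look like the sketch below. It requires the pROC package. The vectors are simulated stand-ins rather than output of fit_embeds_to_pairs, and the 0/1 coding of truth is an assumption.

```r
library(pROC)

# Simulated stand-ins for the `sims` and `truth` slots (assumed 0/1 labels);
# these values are not produced by kgraph.
set.seed(1)
truth <- rep(c(0, 1), each = 200)
sims <- c(rnorm(200, mean = 0), rnorm(200, mean = 1))

# Related pairs (1) are expected to have higher similarities than unrelated (0).
roc_obj <- roc(response = truth, predictor = sims,
               levels = c(0, 1), direction = "<", quiet = TRUE)

# Partial AUC restricted to specificities between 0.9 and 1,
# rescaled so that 0.5 is non-discriminant and 1 is perfect.
pauc <- auc(roc_obj, partial.auc = c(1, 0.9),
            partial.auc.focus = "specificity", partial.auc.correct = TRUE)
as.numeric(pauc)
```

With a real result, the simulated vectors would be replaced by the sims and truth slots of the returned list.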


kgraph documentation built on April 12, 2025, 1:42 a.m.