JACCARD_DICE: Jaccard or Dice similarity for text documents

View source: R/utils.R

JACCARD_DICE R Documentation

Description

Computes the Jaccard or the Dice similarity between pairs of tokenized text documents.

Usage

JACCARD_DICE(
  token_list1 = NULL,
  token_list2 = NULL,
  method = "jaccard",
  threads = 1
)

Arguments

token_list1

a list of tokenized text documents, where each document is a character vector of tokens (it should have the same length as token_list2)

token_list2

a list of tokenized text documents, where each document is a character vector of tokens (it should have the same length as token_list1)

method

a character string specifying the similarity metric: either 'jaccard' (the default) or 'dice'
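For reference, the standard set-based definitions of the two metrics are given below, where A and B are the sets of distinct tokens in two paired documents. This page does not say whether JACCARD_DICE treats tokens as sets or how it converts similarity to distance, so both points are assumptions here; 1 - J and 1 - D are the usual distance forms.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
D(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
D = \frac{2J}{1 + J}
```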

threads

a numeric value specifying the number of threads (cores) to use for parallel computation

Details

The function calculates either the Jaccard or the Dice distance for each pair of tokenized documents, pairing the i-th element of token_list1 with the i-th element of token_list2.
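The pairwise computation can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. It assumes tokens are compared as sets, with duplicates ignored, and it returns similarities rather than distances. The helper jaccard_dice_sketch is hypothetical and is not part of textTinyR. The threads argument is imitated with parallel::mclapply, which forks worker processes and therefore supports more than one core only on Unix-alikes.

```r
# Hypothetical helper (NOT part of textTinyR): set-based Jaccard / Dice
# similarity for each pair of documents at the same position in two lists.
jaccard_dice_sketch <- function(token_list1, token_list2,
                                method = c("jaccard", "dice"),
                                threads = 1) {
  method <- match.arg(method)
  stopifnot(length(token_list1) == length(token_list2))

  # Score the i-th pair; tokens are treated as sets, so duplicates are ignored
  # (two empty documents give 0/0 = NaN)
  score_pair <- function(i) {
    a <- unique(token_list1[[i]])
    b <- unique(token_list2[[i]])
    n_common <- length(intersect(a, b))
    if (method == "jaccard") {
      n_common / length(union(a, b))
    } else {
      2 * n_common / (length(a) + length(b))
    }
  }

  # Distribute the pairs over 'threads' forked processes
  # (mc.cores > 1 is not supported on Windows)
  res <- parallel::mclapply(seq_along(token_list1), score_pair,
                            mc.cores = threads)
  as.numeric(unlist(res))  # one value per pair; numeric(0) for empty input
}

lst1 <- list(c("the", "dice", "and", "the", "jaccard"), c("two", "lists"))
lst2 <- list(c("the", "jaccard", "index"), c("same", "sized", "lists"))

jaccard_dice_sketch(lst1, lst2, method = "jaccard")  # c(0.4, 0.25)
jaccard_dice_sketch(lst1, lst2, method = "dice")     # c(4/7, 0.4)
```

If JACCARD_DICE does report distances, as the Details text states, its output would correspond to 1 minus these values under the same set assumption.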

Value

a numeric vector with one value per pair of documents (i.e. of the same length as the input lists)

Examples


library(textTinyR)

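# two lists of tokenized documents of equal length (two documents each)
lst1 = list(c('use', 'this', 'function', 'to'), c('either', 'compute', 'the', 'jaccard'))

lst2 = list(c('or', 'the', 'dice', 'distance'), c('for', 'two', 'same', 'sized', 'lists'))

# compares lst1[[1]] with lst2[[1]] and lst1[[2]] with lst2[[2]]
out = JACCARD_DICE(token_list1 = lst1, token_list2 = lst2, method = 'jaccard', threads = 1)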

mlampros/textTinyR documentation built on Jan. 17, 2024, 1:18 a.m.