JACCARD_DICE: Jaccard or Dice similarity for text documents


View source: R/utils.R

Description

Jaccard or Dice similarity for text documents

Usage

JACCARD_DICE(
  token_list1 = NULL,
  token_list2 = NULL,
  method = "jaccard",
  threads = 1
)

Arguments

token_list1

a list of tokenized text documents (it must have the same length as token_list2)

token_list2

a list of tokenized text documents (it must have the same length as token_list1)

method

a character string specifying the similarity metric; one of 'jaccard' or 'dice' (default 'jaccard')

threads

a numeric value specifying the number of cores to use for parallel computation (default 1)

Details

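The function calculates either the Jaccard or the Dice distance between corresponding pairs of tokenized documents in the two lists: the first element of token_list1 is compared with the first element of token_list2, the second with the second, and so on.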
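As a rough illustration of what the two metrics measure, the sketch below computes the textbook set-based Jaccard and Dice coefficients in base R for pairs of token vectors. The helper names jaccard_sim and dice_sim and the toy lists docs_a and docs_b are hypothetical and not part of textTinyR. The sketch also assumes that duplicate tokens are collapsed and that a similarity (not 1 minus the similarity) is wanted, so its values may differ from what JACCARD_DICE returns.

```r
# Hypothetical helpers for illustration only; not part of textTinyR.
# Both treat each document as a set of unique tokens.

# Jaccard coefficient: |A intersect B| / |A union B|
jaccard_sim <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Dice coefficient: 2 * |A intersect B| / (|A| + |B|)
dice_sim <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Two toy lists of the same length (assumed data, not from the package)
docs_a <- list(c('the', 'dice', 'distance'), c('use', 'this', 'function'))
docs_b <- list(c('the', 'jaccard', 'distance'), c('for', 'two', 'lists'))

# Compare the documents pair by pair: first with first, second with second
jac <- mapply(jaccard_sim, docs_a, docs_b)
dic <- mapply(dice_sim, docs_a, docs_b)

print(jac)
print(dic)
```

In the first pair the documents share two of four distinct tokens, so the Jaccard coefficient is 0.5 and the Dice coefficient is 2/3; the second pair shares no tokens and scores 0 on both. Two empty token vectors would give 0/0 (NaN) in this sketch.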

Value

a numeric vector with one value for each pair of documents

Examples

library(textTinyR)

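# two lists of tokenized documents; both lists have length 2
lst1 = list(c('use', 'this', 'function', 'to'), c('either', 'compute', 'the', 'jaccard'))

lst2 = list(c('or', 'the', 'dice', 'distance'), c('for', 'two', 'same', 'sized', 'lists'))

# compare the documents pair by pair using the jaccard metric on a single core
out = JACCARD_DICE(token_list1 = lst1, token_list2 = lst2, method = 'jaccard', threads = 1)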

mlampros/textTinyR documentation built on Nov. 1, 2021, 8:44 a.m.