Jaccard or Dice similarity for text documents

JACCARD_DICE(
  token_list1 = NULL,
  token_list2 = NULL,
  method = "jaccard",
  threads = 1
)

Arguments

token_list1

a list of tokenized text documents, where each element is a character vector of tokens (it must have the same length as token_list2)

token_list2

a list of tokenized text documents, where each element is a character vector of tokens (it must have the same length as token_list1)

method

a character string specifying the similarity metric. Either 'jaccard' (the default) or 'dice'

threads

a numeric value specifying the number of cores to use for parallel computation

Value

a numeric vector

Details

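The function calculates either the Jaccard or the Dice distance between corresponding pairs of tokenized documents in the two lists, that is, between the i-th element of token_list1 and the i-th element of token_list2.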
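
For reference, the textbook set-based definitions are J(A, B) = |A ∩ B| / |A ∪ B| for Jaccard and D(A, B) = 2 |A ∩ B| / (|A| + |B|) for Dice, where A and B are the sets of unique tokens in two documents; the corresponding distance is one minus the similarity. The base R sketch below illustrates these definitions only. The helpers set_similarity and pairwise_similarity are hypothetical names, not part of textTinyR, and the choices made here (deduplicating tokens, returning NA for two empty documents, returning a similarity rather than a distance) are assumptions that may differ from what JACCARD_DICE does internally.

```r
# Hypothetical helper (not part of textTinyR): set-based Jaccard or Dice
# similarity between two character vectors of tokens.
set_similarity <- function(tokens1, tokens2, method = "jaccard") {
  a <- unique(tokens1)   # assumption: duplicate tokens are ignored
  b <- unique(tokens2)
  n_common <- length(intersect(a, b))

  if (method == "jaccard") {
    num   <- n_common
    denom <- length(union(a, b))
  } else if (method == "dice") {
    num   <- 2 * n_common
    denom <- length(a) + length(b)
  } else {
    stop("method must be 'jaccard' or 'dice'")
  }

  # Two empty documents have no defined ratio; this sketch returns NA.
  if (denom == 0) return(NA_real_)
  num / denom
}

# Hypothetical wrapper: compare the i-th document of each list,
# mirroring the same-length requirement of the documented arguments.
pairwise_similarity <- function(token_list1, token_list2, method = "jaccard") {
  stopifnot(length(token_list1) == length(token_list2))
  mapply(set_similarity, token_list1, token_list2,
         MoreArgs = list(method = method), USE.NAMES = FALSE)
}

docs1 <- list(c("the", "cat", "sat"), c("a", "b"))
docs2 <- list(c("the", "cat", "ran"), c("c", "d"))

sim_jaccard  <- pairwise_similarity(docs1, docs2, method = "jaccard")
sim_dice     <- pairwise_similarity(docs1, docs2, method = "dice")
dist_jaccard <- 1 - sim_jaccard   # distance form of the same measure
```

In this sketch the first pair shares two of four distinct tokens, so its Jaccard similarity is 0.5 and its Dice similarity is 2/3; the second pair shares no tokens, so both similarities are 0 and the distance is 1.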

Examples


library(textTinyR)

lst1 = list(c('use', 'this', 'function', 'to'), c('either', 'compute', 'the', 'jaccard'))

lst2 = list(c('or', 'the', 'dice', 'distance'), c('for', 'two', 'same', 'sized', 'lists'))

out = JACCARD_DICE(token_list1 = lst1, token_list2 = lst2, method = 'jaccard', threads = 1)