Jaccard or Dice similarity for text documents
JACCARD_DICE( token_list1 = NULL, token_list2 = NULL, method = "jaccard", threads = 1 )
Argument | Description |
---|---|
token_list1 | a list of tokenized text documents (must have the same length as token_list2) |
token_list2 | a list of tokenized text documents (must have the same length as token_list1) |
method | a character string specifying the similarity metric: either 'jaccard' or 'dice' |
threads | a numeric value specifying the number of cores to use for parallel computation |
a numeric vector with one value per pair of documents
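The function calculates either the Jaccard or the Dice distance between corresponding pairs of tokenized documents in the two lists: the first element of token_list1 is compared with the first element of token_list2, the second with the second, and so on.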
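For reference, the two coefficients can be sketched in base R as below. This is only an illustration under stated assumptions, not the package's implementation: `set_similarity` is a hypothetical helper, tokens are assumed to be treated as sets (duplicates ignored), and the sketch returns the similarity coefficient. This page describes the result both as a similarity and as a distance; under the usual convention the distance would be 1 minus the value computed here.

```r
# Hypothetical helper (not part of textTinyR): set-based Jaccard and Dice
# coefficients for two character vectors of tokens.
set_similarity <- function(tokens1, tokens2, method = "jaccard") {
  a <- unique(tokens1)                        # assumption: duplicates are ignored
  b <- unique(tokens2)
  n_common <- length(intersect(a, b))
  if (method == "jaccard") {
    n_common / length(union(a, b))            # |A and B| / |A or B|
  } else if (method == "dice") {
    2 * n_common / (length(a) + length(b))    # 2 * |A and B| / (|A| + |B|)
  } else {
    stop("method must be 'jaccard' or 'dice'")
  }
}

docs1 <- list(c("the", "cat", "sat"), c("a", "b"))
docs2 <- list(c("the", "cat", "ran", "away"), c("c", "d"))

# Compare element i of docs1 with element i of docs2
jac  <- mapply(set_similarity, docs1, docs2, MoreArgs = list(method = "jaccard"))
dice <- mapply(set_similarity, docs1, docs2, MoreArgs = list(method = "dice"))

jac    # 0.4 0.0   (2 shared tokens out of 5 distinct; none shared)
dice   # 4/7 and 0
```

Two documents with no tokens in common score 0 under both coefficients, and identical token sets score 1.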
library(textTinyR)

lst1 = list(c('use', 'this', 'function', 'to'), c('either', 'compute', 'the', 'jaccard'))
lst2 = list(c('or', 'the', 'dice', 'distance'), c('for', 'two', 'same', 'sized', 'lists'))

out = JACCARD_DICE(token_list1 = lst1, token_list2 = lst2, method = 'jaccard', threads = 1)