intersection of words or letters in tokenized text
# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)
a numeric vector
This class provides methods for computing the intersection of words or letters between two tokenized texts. If both distinct and letters are FALSE, the simple word intersection (count or ratio) is computed.
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
--------------
count_intersect(distinct = FALSE, letters = FALSE)
--------------
ratio_intersect(distinct = FALSE, letters = FALSE)
https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi
new()
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
token_list1
a list in which each sublist is a tokenized text sequence (token_list1 must have the same length as token_list2)
token_list2
a list in which each sublist is a tokenized text sequence (token_list2 must have the same length as token_list1)
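Since the two lists must have the same length, it can be useful to check the inputs before constructing the object. The following is a minimal base-R sketch; `check_token_lists` is a hypothetical helper introduced here for illustration and is not part of textTinyR.

```r
# Hypothetical helper (not part of textTinyR): verify that two token lists
# are lists of equal length before passing them to text_intersect$new().
check_token_lists <- function(token_list1, token_list2) {
  stopifnot(is.list(token_list1), is.list(token_list2))
  if (length(token_list1) != length(token_list2)) {
    stop("token_list1 and token_list2 must have the same length")
  }
  invisible(TRUE)
}

tok1 <- list(c("compare", "this", "text"), c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"), c("of", "text", "documents"))

check_token_lists(tok1, tok2)  # passes silently for lists of equal length
```

Calling the helper with lists of different lengths, for example `check_token_lists(tok1, tok2[1])`, raises an error instead of returning.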
count_intersect()
text_intersect$count_intersect(distinct = FALSE, letters = FALSE)
distinct
either TRUE or FALSE. If TRUE, the intersection is computed over distinct words (or letters)
letters
either TRUE or FALSE. If TRUE, the intersection of letters (rather than words) in the text sequences is computed
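To make the effect of distinct concrete for the word case, here is a rough base-R sketch. It is not the textTinyR implementation: `word_count_intersect` is a hypothetical function, the handling of repeated words when distinct is FALSE is an assumption, and the letters option is deliberately left out.

```r
# Illustrative sketch only (not the textTinyR implementation): for each pair
# of token sequences, count the words of x that also occur in y.
word_count_intersect <- function(x, y, distinct = FALSE) {
  stopifnot(is.list(x), is.list(y), length(x) == length(y))
  mapply(function(a, b) {
    if (distinct) {
      length(intersect(a, b))  # each shared word is counted once
    } else {
      sum(a %in% b)            # assumption: every occurrence in a is counted
    }
  }, x, y, USE.NAMES = FALSE)
}

tok1 <- list(c("compare", "this", "text"), c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"), c("of", "text", "documents"))

word_count_intersect(tok1, tok2, distinct = TRUE)
```

For these inputs the first pair shares no words and the second pair shares only "text", so the sketch returns 0 and 1. The two modes differ only when a sequence repeats a word: `word_count_intersect(list(c("a", "a", "b")), list("a"))` counts both occurrences of "a", while distinct = TRUE counts it once.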
ratio_intersect()
text_intersect$ratio_intersect(distinct = FALSE, letters = FALSE)
distinct
either TRUE or FALSE. If TRUE, the intersection is computed over distinct words (or letters)
letters
either TRUE or FALSE. If TRUE, the intersection of letters (rather than words) in the text sequences is computed
clone()
The objects of this class are cloneable with this method.
text_intersect$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(textTinyR)

tok1 = list(c('compare', 'this', 'text'), c('and', 'this', 'text'))
tok2 = list(c('with', 'another', 'set'), c('of', 'text', 'documents'))

init = text_intersect$new(tok1, tok2)

init$count_intersect(distinct = TRUE, letters = FALSE)
#> [1] 0 1

init$ratio_intersect(distinct = FALSE, letters = TRUE)
#> [1] 0.0000000 0.1818182