Intersection of words or letters in tokenized text
# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)
Returns a numeric vector with one value per pair of token sequences.
This class provides methods for computing the intersection of words or characters between paired tokenized texts. If both distinct and letters are FALSE (the defaults), the simple (count or ratio) word intersection is computed.
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
--------------
count_intersect(distinct = FALSE, letters = FALSE)
--------------
ratio_intersect(distinct = FALSE, letters = FALSE)

Reference: https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi
new()
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
token_list1: a list in which each sublist is a tokenized text sequence (token_list1 should have the same length as token_list2)
token_list2: a list in which each sublist is a tokenized text sequence (token_list2 should have the same length as token_list1)
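The constructor expects two lists of equal length, each holding one character vector of tokens per text. As a minimal sketch, assuming that lowercasing and splitting on whitespace are acceptable for your data, raw strings can be brought into that shape with base R. `make_token_list()` is a hypothetical helper for illustration only; it is not part of textTinyR.

```r
# Hypothetical helper (not part of textTinyR): lowercase each string and
# split it on runs of whitespace, giving one character vector per text.
make_token_list <- function(texts) {
  tokens <- strsplit(tolower(texts), "[[:space:]]+")
  # drop empty strings produced by leading whitespace
  lapply(tokens, function(tok) tok[nzchar(tok)])
}

tok1 <- make_token_list(c("Compare this text", "and this text"))
tok2 <- make_token_list(c("with another set", "of text documents"))

# one sublist per text pair, so both lists must have the same length
stopifnot(length(tok1) == length(tok2))
```

The resulting `tok1` and `tok2` have the same structure as the hand-written lists in the example at the end of this page.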
count_intersect()
text_intersect$count_intersect(distinct = FALSE, letters = FALSE)
distinct: either TRUE or FALSE. If TRUE, the intersection of distinct words (or letters) is taken into account.
letters: either TRUE or FALSE. If TRUE, the intersection of letters in the text sequences is computed.
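To illustrate what the word-level, distinct case measures, here is a base-R sketch that counts, for each text pair, the distinct words the two token sequences share. `count_distinct_words()` is hypothetical and only approximates the idea; it is not textTinyR's implementation and does not cover the `letters = TRUE` or non-distinct modes. On the token lists from the example at the end of this page it gives the same values as `init$count_intersect(distinct = TRUE, letters = FALSE)`.

```r
# Hypothetical base-R sketch (not textTinyR's implementation): for each
# pair of token vectors, count the distinct words present in both.
count_distinct_words <- function(token_list1, token_list2) {
  stopifnot(length(token_list1) == length(token_list2))
  # mapply walks the two lists in parallel, one text pair at a time
  unname(mapply(function(a, b) length(intersect(unique(a), unique(b))),
                token_list1, token_list2))
}

tok1 <- list(c("compare", "this", "text"), c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"), c("of", "text", "documents"))
count_distinct_words(tok1, tok2)
#> [1] 0 1
```

The first pair shares no words, while the second pair shares only "text".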
ratio_intersect()
text_intersect$ratio_intersect(distinct = FALSE, letters = FALSE)
distinct: either TRUE or FALSE. If TRUE, the intersection of distinct words (or letters) is taken into account.
letters: either TRUE or FALSE. If TRUE, the intersection of letters in the text sequences is computed.
clone()
The objects of this class are cloneable with this method.
text_intersect$clone(deep = FALSE)
deep: whether to make a deep clone.
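library(textTinyR)

# two pairs of tokenized texts: tok1[[i]] is compared with tok2[[i]]
tok1 = list(c('compare', 'this', 'text'), c('and', 'this', 'text'))
tok2 = list(c('with', 'another', 'set'), c('of', 'text', 'documents'))

init = text_intersect$new(tok1, tok2)

# distinct-word intersection count, one value per text pair
init$count_intersect(distinct = TRUE, letters = FALSE)
#> [1] 0 1

# letter-level intersection ratio, one value per text pair
init$ratio_intersect(distinct = FALSE, letters = TRUE)
#> [1] 0.0000000 0.1818182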