intersection of words or letters in tokenized text

# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)

Value

a numeric vector

Details

This class provides methods for computing the intersection of words or letters between paired tokenized text sequences. If both distinct and letters are FALSE, the simple word intersection (as a count or a ratio) is computed.
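To make the idea concrete, here is a minimal base-R sketch of a pairwise distinct-word intersection count. It is not the package's implementation: the helper name distinct_word_overlap is hypothetical, and it only illustrates one reading of the distinct = TRUE, letters = FALSE case, without relying on textTinyR at all.

```r
# Hypothetical helper (not part of textTinyR): for each pair of token
# vectors, count the distinct words that occur in both.
distinct_word_overlap <- function(tokens1, tokens2) {
  # the two lists are compared element by element, so lengths must match
  stopifnot(is.list(tokens1), is.list(tokens2),
            length(tokens1) == length(tokens2))
  vapply(seq_along(tokens1), function(i) {
    # intersect() drops duplicates, so only distinct shared words are counted
    length(intersect(tokens1[[i]], tokens2[[i]]))
  }, integer(1))
}

tok1 <- list(c("compare", "this", "text"),
             c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"),
             c("of", "text", "documents"))

overlap <- distinct_word_overlap(tok1, tok2)
print(overlap)
#> [1] 0 1
```

For these inputs the sketch returns 0 and 1, the same values the Examples section reports for count_intersect(distinct = TRUE, letters = FALSE). How the package handles the non-distinct, letter-based, and ratio variants is not covered by this sketch.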

Methods

text_intersect$new(token_list1 = NULL, token_list2 = NULL)
--------------
count_intersect(distinct = FALSE, letters = FALSE)
--------------
ratio_intersect(distinct = FALSE, letters = FALSE)

References

https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi

Methods

Method `new()`

Usage

text_intersect$new(token_list1 = NULL, token_list2 = NULL)

Arguments

token_list1: a list, where each sublist is a tokenized text sequence (token_list1 must have the same length as token_list2)
token_list2: a list, where each sublist is a tokenized text sequence (token_list2 must have the same length as token_list1)
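As a rough illustration of the expected input shape, the sketch below builds two token lists from raw strings with base R's strsplit() and checks that they have equal length before they would be passed to the constructor. The variable names docs1 and docs2 and the whitespace-only tokenization are assumptions made for this example; real text would usually need a proper tokenizer.

```r
# Assumed raw input: two character vectors of equal length
docs1 <- c("compare this text", "and this text")
docs2 <- c("with another set", "of text documents")

# strsplit() returns a list with one character vector of tokens per document
tok1 <- strsplit(docs1, " ", fixed = TRUE)
tok2 <- strsplit(docs2, " ", fixed = TRUE)

# both arguments must be lists of the same length
stopifnot(is.list(tok1), is.list(tok2), length(tok1) == length(tok2))

str(tok1)
```

Lists built this way have the same structure as tok1 and tok2 in the Examples section.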

Method `count_intersect()`

Usage

text_intersect$count_intersect(distinct = FALSE, letters = FALSE)

Arguments

distinct: either TRUE or FALSE. If TRUE, the intersection is computed over distinct words (or letters)
letters: either TRUE or FALSE. If TRUE, the intersection of letters in the text sequences is computed instead of the intersection of words

Method `ratio_intersect()`

Usage

text_intersect$ratio_intersect(distinct = FALSE, letters = FALSE)

Arguments

distinct: either TRUE or FALSE. If TRUE, the intersection is computed over distinct words (or letters)
letters: either TRUE or FALSE. If TRUE, the intersection of letters in the text sequences is computed instead of the intersection of words

Method `clone()`

The objects of this class are cloneable with this method.

Usage

text_intersect$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

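library(textTinyR)

# two parallel lists of tokenized texts; element i of tok1 is compared with element i of tok2
tok1 = list(c('compare', 'this', 'text'),
            c('and', 'this', 'text'))

tok2 = list(c('with', 'another', 'set'),
            c('of', 'text', 'documents'))

init = text_intersect$new(tok1, tok2)

# count of distinct words shared by each pair
init$count_intersect(distinct = TRUE, letters = FALSE)
#> [1] 0 1

# letter-based intersection ratio for each pair
init$ratio_intersect(distinct = FALSE, letters = TRUE)
#> [1] 0.0000000 0.1818182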
