Character string sequence matching

# init <- SequenceMatcher$new(string1 = NULL, string2 = NULL)

Details

The ratio() method returns a measure of the sequences' similarity as a number in the range [0, 1]. Where T is the total number of elements in both sequences and M is the number of matches, the ratio is 2.0*M / T. It is 1.0 if the sequences are identical and 0.0 if they have nothing in common. It is expensive to compute if get_matching_blocks() or get_opcodes() has not already been called, in which case you may want to try quick_ratio() or real_quick_ratio() first to get an upper bound.
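To make the 2.0*M / T formula concrete, here is a minimal base R sketch that computes the ratio from a list of [i, j, n] matching blocks. The helper name ratio_from_blocks() is hypothetical and not part of fuzzywuzzyR; the blocks are typed by hand from the "abxcd" / "abcd" example in Python's difflib documentation, so this does not assume any particular R structure returned by get_matching_blocks().

```r
# Hypothetical helper (not part of fuzzywuzzyR): ratio = 2 * M / T, where
# M is the sum of the block sizes n and T is the total length of both sequences.
ratio_from_blocks <- function(blocks, len_a, len_b) {
  total <- len_a + len_b
  if (total == 0) return(1)  # two empty sequences count as identical
  matches <- sum(vapply(blocks, function(blk) blk[3], numeric(1)))
  2 * matches / total
}

# Zero-based [i, j, n] triples for a = "abxcd", b = "abcd",
# including the final dummy triple [len_a, len_b, 0].
blocks <- list(c(0, 0, 2), c(3, 2, 2), c(5, 4, 0))
ratio_from_blocks(blocks, nchar("abxcd"), nchar("abcd"))  # 2 * 4 / 9
```

Here M = 4 ("ab" plus "cd") and T = 9, so the ratio is 8/9, about 0.889.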

The quick_ratio() method returns an upper bound on ratio() relatively quickly.
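One way to get such a bound, and the idea behind difflib's quick_ratio(), is to count the characters the two strings share while ignoring their order (a multiset intersection) and plug that count into the same 2 * M / T formula. The base R sketch below is an illustration under that assumption; quick_ratio_sketch() is a hypothetical name, not a fuzzywuzzyR function.

```r
# Hypothetical sketch: count shared characters regardless of order, then apply
# 2 * M / T. Ignoring order can only overcount M, so the result is an upper bound.
quick_ratio_sketch <- function(a, b) {
  total <- nchar(a) + nchar(b)
  if (total == 0) return(1)
  if (nchar(a) == 0 || nchar(b) == 0) return(0)
  ca <- table(strsplit(a, "")[[1]])  # character counts in a
  cb <- table(strsplit(b, "")[[1]])  # character counts in b
  common <- intersect(names(ca), names(cb))
  matches <- sum(pmin(as.vector(ca[common]), as.vector(cb[common])))
  2 * matches / total
}

quick_ratio_sketch("tide", "diet")  # 1: same letters, different order
```

For "tide" and "diet" the bound is 1 even though the order-aware ratio() is much lower (Python's difflib documentation reports 0.25 for this pair), which is why the bound is cheap but loose.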

The real_quick_ratio() method returns an upper bound on ratio() very quickly.
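An even cheaper bound uses only the lengths: at most min(len_a, len_b) characters can match, so 2 * min(len_a, len_b) / T can never be below the true ratio. This is how difflib's real_quick_ratio() is defined; the base R function below is a hypothetical sketch of that idea, not the fuzzywuzzyR implementation.

```r
# Hypothetical sketch: a length-only upper bound on the ratio.
real_quick_ratio_sketch <- function(a, b) {
  la <- nchar(a)
  lb <- nchar(b)
  if (la + lb == 0) return(1)  # two empty strings count as identical
  2 * min(la, lb) / (la + lb)
}

real_quick_ratio_sketch("abc", "abcdef")  # 2 * 3 / 9 = 2/3
```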

The get_matching_blocks() method returns a list of triples describing matching subsequences. Each triple is of the form [i, j, n] and means that a[i:i+n] == b[j:j+n], where a and b are the two sequences and the slices use zero-based, end-exclusive indices. The triples are monotonically increasing in i and j. The last triple is a dummy with the value [len_a, len_b, 0], where len_a and len_b are the lengths of a and b; it is the only triple with n == 0. If [i, j, n] and [i', j', n'] are adjacent triples in the list and the second is not the last triple, then i+n != i' or j+n != j'; in other words, adjacent triples always describe non-adjacent equal blocks.
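Because the triples use zero-based, end-exclusive slices, reading them from R needs a shift to R's 1-based substr(): a[i:i+n] corresponds to substr(a, i + 1, i + n). The sketch below uses a hypothetical helper, block_substrings(), and hand-typed blocks for "abxcd" / "abcd" (from Python's difflib documentation); it assumes the triples are available as plain numeric vectors, which may differ from what the R wrapper actually returns.

```r
# Hypothetical helper: extract the matched substrings described by [i, j, n].
block_substrings <- function(a, b, blocks) {
  out <- character(0)
  for (blk in blocks) {
    i <- blk[1]
    j <- blk[2]
    n <- blk[3]
    if (n == 0) next  # the final dummy triple matches nothing
    piece <- substr(a, i + 1, i + n)              # a[i:i+n]
    stopifnot(piece == substr(b, j + 1, j + n))   # must equal b[j:j+n]
    out <- c(out, piece)
  }
  out
}

blocks <- list(c(0, 0, 2), c(3, 2, 2), c(5, 4, 0))
block_substrings("abxcd", "abcd", blocks)  # "ab" "cd"
```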

The get_opcodes() method returns a list of 5-tuples describing how to turn a into b. Each tuple is of the form [tag, i1, i2, j1, j2]. The first tuple has i1 == j1 == 0, and each remaining tuple has i1 equal to the i2 of the preceding tuple and, likewise, j1 equal to the preceding j2. The tag values are strings with these meanings:

'replace': a[i1:i2] should be replaced by b[j1:j2].
'delete': a[i1:i2] should be deleted. Note that j1 == j2 in this case.
'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.
'equal': a[i1:i2] == b[j1:j2] (the subsequences are equal).
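Replaying the opcodes in order should rebuild b from a, which is a handy way to check one's reading of the tags. The base R sketch below uses a hypothetical helper, apply_opcodes(), and opcodes for "qabxcd" / "abycdf" typed by hand from Python's difflib documentation; the list-of-lists layout is an assumption for illustration, not the structure fuzzywuzzyR is guaranteed to return.

```r
# Hypothetical helper: rebuild b by replaying [tag, i1, i2, j1, j2] opcodes
# (zero-based, end-exclusive indices).
apply_opcodes <- function(a, b, opcodes) {
  pieces <- character(0)
  for (op in opcodes) {
    tag <- op[[1]]
    i1 <- op[[2]]; i2 <- op[[3]]; j1 <- op[[4]]; j2 <- op[[5]]
    piece <- switch(tag,
      equal   = substr(a, i1 + 1, i2),  # keep a[i1:i2]
      replace = substr(b, j1 + 1, j2),  # use b[j1:j2] in place of a[i1:i2]
      insert  = substr(b, j1 + 1, j2),  # add b[j1:j2]
      delete  = "",                     # drop a[i1:i2]
      stop("unknown tag: ", tag)
    )
    pieces <- c(pieces, piece)
  }
  paste0(pieces, collapse = "")
}

ops <- list(
  list("delete",  0, 1, 0, 0),
  list("equal",   1, 3, 0, 2),
  list("replace", 3, 4, 2, 3),
  list("equal",   4, 6, 3, 5),
  list("insert",  6, 6, 5, 6)
)
apply_opcodes("qabxcd", "abycdf", ops)  # "abycdf"
```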

Methods

SequenceMatcher$new(string1 = NULL, string2 = NULL)
--------------
ratio()
--------------
quick_ratio()
--------------
real_quick_ratio()
--------------
get_matching_blocks()
--------------
get_opcodes()

References

https://www.npmjs.com/package/difflib
http://stackoverflow.com/questions/10383044/fuzzy-string-comparison

Methods

Method `new()`

Usage

SequenceMatcher$new(string1 = NULL, string2 = NULL)

Arguments

string1: a character string.
string2: a character string.

Method `ratio()`

Usage

SequenceMatcher$ratio()

Method `quick_ratio()`

Usage

SequenceMatcher$quick_ratio()

Method `real_quick_ratio()`

Usage

SequenceMatcher$real_quick_ratio()

Method `get_matching_blocks()`

Usage

SequenceMatcher$get_matching_blocks()

Method `get_opcodes()`

Usage

SequenceMatcher$get_opcodes()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

SequenceMatcher$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples


try({
  if (reticulate::py_available(initialize = FALSE)) {

    if (fuzzywuzzyR::check_availability()) {

      library(fuzzywuzzyR)

      s1 <- ' It was a dark and stormy night. I was all alone sitting on a red chair.'

      s2 <- ' It was a murky and stormy night. I was all alone sitting on a crimson chair.'

      init <- SequenceMatcher$new(string1 = s1, string2 = s2)

      # Inside braces only the last value is returned, so print() each result
      print(init$ratio())

      print(init$quick_ratio())

      print(init$real_quick_ratio())

      print(init$get_matching_blocks())

      print(init$get_opcodes())

    }
  }
}, silent = TRUE)
