Character string sequence matching

# init <- SequenceMatcher$new(string1 = NULL, string2 = NULL)

Details

The ratio method returns a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences and M is the number of matches, this is 2.0*M / T. The result is 1.0 if the sequences are identical and 0.0 if they have nothing in common. It is expensive to compute if get_matching_blocks() or get_opcodes() has not already been called, in which case you may want to try quick_ratio() or real_quick_ratio() first to get an upper bound.
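As a worked instance of the 2.0*M / T formula, the base R sketch below computes the value by hand for the pair "abcd" and "bcde", where the longest common run "bcd" gives M = 3 and T = 4 + 4 = 8. The helper name similarity_ratio is hypothetical and is not part of this package; returning 1 for two empty sequences is an assumption borrowed from Python's difflib.

```r
# Hypothetical helper (not part of the package): the 2.0*M / T formula.
similarity_ratio <- function(matches, len_a, len_b) {
  total <- len_a + len_b
  # Assumption: two empty sequences count as identical (as in Python's difflib).
  if (total == 0) return(1)
  2.0 * matches / total
}

# "abcd" vs "bcde": M = 3 (the shared run "bcd"), T = 8
similarity_ratio(3, nchar("abcd"), nchar("bcde"))  # 0.75
```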

The quick_ratio method returns an upper bound on ratio() relatively quickly.

The real_quick_ratio method returns an upper bound on ratio() very quickly.
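To give a feel for why such bounds are cheap, here is a base R sketch of two bounds computed the way Python's difflib computes them: one from shared character counts (order ignored), one from the string lengths alone. That this class uses exactly these bounds is an assumption; the function names quick_bound and length_bound are hypothetical.

```r
# Hypothetical helpers, assuming the bounds follow Python's difflib.

# Count how often each character occurs in a string.
char_counts <- function(s) table(strsplit(s, "", fixed = TRUE)[[1]])

# Bound from shared character counts: ignores order, so it can only overestimate M.
quick_bound <- function(a, b) {
  ca <- char_counts(a)
  cb <- char_counts(b)
  shared <- intersect(names(ca), names(cb))
  matches <- sum(pmin(as.integer(ca[shared]), as.integer(cb[shared])))
  2.0 * matches / (nchar(a) + nchar(b))
}

# Bound from lengths only: at most min(length) elements can ever match.
length_bound <- function(a, b) {
  2.0 * min(nchar(a), nchar(b)) / (nchar(a) + nchar(b))
}

quick_bound("abcd", "bcde")   # 0.75
length_bound("abcd", "bcde")  # 1
```

Both values are at least as large as the ratio for the same pair; the looser the bound, the less work it needs.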

The get_matching_blocks method returns a list of triples describing matching subsequences, where a and b denote string1 and string2. Each triple is of the form [i, j, n] and means that a[i:i+n] == b[j:j+n]; the slices use zero-based offsets and exclude the end position. The triples are monotonically increasing in i and j. The last triple is a dummy with the value [length of a, length of b, 0], and it is the only triple with n == 0. If [i, j, n] and [i', j', n'] are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'; in other words, adjacent triples always describe non-adjacent equal blocks.
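The triple notation can be checked by hand in base R. The sketch below uses triples written out manually for the pair "abxcd" and "abcd" (they are not produced by calling this class, and the R structure that get_matching_blocks() returns is not assumed here). The helper block_text is hypothetical; it maps the zero-based, end-exclusive slice a[i:i+n] to R's one-based substr().

```r
a <- "abxcd"
b <- "abcd"

# Hand-written triples [i, j, n] for this pair; the last one is the dummy.
blocks <- list(c(0, 0, 2), c(3, 2, 2), c(5, 4, 0))

# Hypothetical helper: a[i:i+n] (zero-based, end-exclusive) as an R substring.
block_text <- function(s, start, n) substr(s, start + 1, start + n)

# Every triple must describe equal text in both strings.
for (blk in blocks) {
  stopifnot(block_text(a, blk[1], blk[3]) == block_text(b, blk[2], blk[3]))
}

# Summing n over all triples gives the match count M used in 2.0*M / T.
matched <- sum(vapply(blocks, function(blk) blk[3], numeric(1)))
matched                                  # 4
2.0 * matched / (nchar(a) + nchar(b))    # 8 / 9
```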

The get_opcodes method returns a list of 5-tuples describing how to turn a into b. Each tuple is of the form [tag, i1, i2, j1, j2]. The first tuple has i1 == j1 == 0, and each remaining tuple has i1 equal to the i2 of the preceding tuple and, likewise, j1 equal to the preceding j2. The tag values are strings, with these meanings:

'replace': a[i1:i2] should be replaced by b[j1:j2].

'delete': a[i1:i2] should be deleted. Note that j1 == j2 in this case.

'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.

'equal': a[i1:i2] == b[j1:j2] (the sub-sequences are equal).
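The tag meanings above can be exercised with a small base R sketch that rebuilds b from a by walking a list of opcodes. The opcodes here are written out by hand for the pair "qabxcd" and "abycdf", and the list-of-lists layout is an assumption made for illustration, not necessarily the structure that get_opcodes() returns. The helper slice is hypothetical.

```r
a <- "qabxcd"
b <- "abycdf"

# Hand-written opcodes [tag, i1, i2, j1, j2] for this pair (zero-based, end-exclusive).
opcodes <- list(
  list("delete",  0, 1, 0, 0),
  list("equal",   1, 3, 0, 2),
  list("replace", 3, 4, 2, 3),
  list("equal",   4, 6, 3, 5),
  list("insert",  6, 6, 5, 6)
)

# Hypothetical helper: s[from:to] in zero-based, end-exclusive notation.
slice <- function(s, from, to) substr(s, from + 1, to)

rebuilt <- ""
for (op in opcodes) {
  piece <- switch(op[[1]],
    equal   = slice(a, op[[2]], op[[3]]),  # copy the unchanged part of a
    delete  = "",                          # drop a[i1:i2]
    replace = ,                            # falls through to the insert case
    insert  = slice(b, op[[4]], op[[5]])   # take the new text from b
  )
  rebuilt <- paste0(rebuilt, piece)
}

identical(rebuilt, b)  # TRUE
```

Because each tuple starts where the previous one ended, a single left-to-right pass is enough to reconstruct b.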

Methods

SequenceMatcher$new(string1 = NULL, string2 = NULL)
--------------
ratio()
--------------
quick_ratio()
--------------
real_quick_ratio()
--------------
get_matching_blocks()
--------------
get_opcodes()

References

https://www.npmjs.com/package/difflib, http://stackoverflow.com/questions/10383044/fuzzy-string-comparison

Methods


Method new()

Usage

SequenceMatcher$new(string1 = NULL, string2 = NULL)

Arguments

string1

a character string.

string2

a character string.


Method ratio()

Usage

SequenceMatcher$ratio()


Method quick_ratio()

Usage

SequenceMatcher$quick_ratio()


Method real_quick_ratio()

Usage

SequenceMatcher$real_quick_ratio()


Method get_matching_blocks()

Usage

SequenceMatcher$get_matching_blocks()


Method get_opcodes()

Usage

SequenceMatcher$get_opcodes()


Method clone()

The objects of this class are cloneable with this method.

Usage

SequenceMatcher$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples


try({
  if (reticulate::py_available(initialize = FALSE)) {

    # attach the package first so that check_availability() can be found
    library(fuzzywuzzyR)

    if (check_availability()) {

      s1 = ' It was a dark and stormy night. I was all alone sitting on a red chair.'

      s2 = ' It was a murky and stormy night. I was all alone sitting on a crimson chair.'

      init = SequenceMatcher$new(string1 = s1, string2 = s2)

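      # similarity of s1 and s2 in the range [0, 1]
      init$ratio()

      # cheaper upper bounds on ratio()
      init$quick_ratio()

      init$real_quick_ratio()

      # triples [i, j, n] describing the matching subsequences
      init$get_matching_blocks()

      # 5-tuples [tag, i1, i2, j1, j2] describing how to turn s1 into s2
      init$get_opcodes()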

    }
  }
}, silent=TRUE)