In DIGI-VUB/text.alignment: Text Alignment with Smith-Waterman

knitr::opts_chunk$set(echo = TRUE, message = FALSE, comment = NA, eval = TRUE)

Smith Waterman

Smith-Waterman is an algorithm for identifying similarities between sequences. It finds an optimal local alignment between two sequences of letters and is explained in detail at https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm.
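To make the idea of a local alignment concrete, here is a minimal base R sketch of the scoring matrix and traceback. It is an illustration only and not the package's implementation: the function name sw_sketch, the scores (match = 2, mismatch = -1, gap = -1) and the "#" gap marker are assumptions chosen for this example.

```r
# Minimal Smith-Waterman sketch in base R (illustrative, not the package's code).
# The function name, the scores and the "#" gap marker are assumptions.
sw_sketch <- function(a, b, match = 2, mismatch = -1, gap = -1) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # H[i + 1, j + 1] is the best score of a local alignment ending at x[i], y[j]
  H <- matrix(0, nrow = length(x) + 1, ncol = length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      s <- if (x[i] == y[j]) match else mismatch
      H[i + 1, j + 1] <- max(0,
                             H[i, j] + s,        # align x[i] with y[j]
                             H[i, j + 1] + gap,  # x[i] against a gap in b
                             H[i + 1, j] + gap)  # y[j] against a gap in a
    }
  }
  # Trace back from the highest-scoring cell until a zero score is reached
  pos <- which(H == max(H), arr.ind = TRUE)[1, ]
  i <- pos[1]
  j <- pos[2]
  out_a <- character(0)
  out_b <- character(0)
  while (i > 1 && j > 1 && H[i, j] > 0) {
    s <- if (x[i - 1] == y[j - 1]) match else mismatch
    if (H[i, j] == H[i - 1, j - 1] + s) {
      out_a <- c(x[i - 1], out_a)
      out_b <- c(y[j - 1], out_b)
      i <- i - 1
      j <- j - 1
    } else if (H[i, j] == H[i - 1, j] + gap) {
      out_a <- c(x[i - 1], out_a)
      out_b <- c("#", out_b)
      i <- i - 1
    } else {
      out_a <- c("#", out_a)
      out_b <- c(y[j - 1], out_b)
      j <- j - 1
    }
  }
  list(score = max(H),
       a = paste(out_a, collapse = ""),
       b = paste(out_b, collapse = ""))
}

res <- sw_sketch("Tournelly", "Tourelly")
res$score  # 15 under the assumed scores
res$a      # "Tournelly"
res$b      # "Tour#elly"
```

Because every cell is floored at zero, a poorly matching prefix or suffix never drags the score down, which is what makes the alignment local rather than global.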

This package implements the algorithm for sequences of letters as well as sequences of words and is useful for text analytics researchers.

The package uses code similar to the textreuse::local_align function, and it can align character sequences in addition to word sequences.
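The difference between the two modes comes down to what counts as one element of a sequence. The base R lines below are only a sketch of that distinction: the splitting rules shown are assumptions for illustration, and the package's own tokenizer may behave differently.

```r
# Illustrative tokenisation only; the package's own tokenizer may differ.
txt <- "Gaspard Tournelly cardeur"

# Character-level view: every character, including spaces, is one element
chars <- strsplit(txt, "")[[1]]

# Word-level view: every whitespace-separated word is one element
words <- strsplit(txt, "[[:space:]]+")[[1]]

length(chars)  # 25
length(words)  # 3
```

Character-level alignment tolerates spelling variation inside a name, while word-level alignment is better suited to comparing longer passages such as two translations.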

Example usage

The package was set up to make it easy to:

Find names in documents even if they are not correctly spelled
Match 2 texts
Find relevant sequences of texts in other texts

We show some examples of these use cases below.

library(text.alignment)

Example matching 2 names

a <- "Gaspard   Tournelly cardeur à laine"
b <- "Gaspard   Bourelly cordonnier"
smith_waterman(a, b)

a <- "Gaspard   T.  cardeur à laine"
b <- "Gaspard   Tournelly cardeur à laine"
smith_waterman(a, b, type = "characters")

Example matching 2 translations

a <- system.file(package = "text.alignment", "extdata", "example1.txt")
a <- readLines(a)
a <- paste(a, collapse = "\n")
b <- system.file(package = "text.alignment", "extdata", "example2.txt")
b <- readLines(b)
b <- paste(b, collapse = "\n")
cat(a, sep = "\n")
cat(b, sep = "\n")

smith_waterman(a, b, type = "words")

Find relevant sequences of texts in other texts

x <- smith_waterman("Lange rei", b)
x$b$tokens[x$b$alignment$from:x$b$alignment$to]
overview <- as.data.frame(x)
overview$b_from
overview$b_to
substr(overview$b, overview$b_from, overview$b_to)

Get alignment overview as a data.frame

x <- smith_waterman(a, b)
x <- as.data.frame(x, alignment_id = "matching-a-to-b")
str(x)

DIGI-VUB/text.alignment documentation built on Sept. 18, 2023, 7:26 a.m.
