textreuse: Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
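To make the description above concrete, here is a minimal base-R sketch of two of the ideas it names: shingled word n-grams and a similarity function (Jaccard). The helpers `shingle` and `jaccard` are hypothetical names written for this illustration; they are not the package's own functions, and the package's tokenizers and similarity functions may differ in detail.

```r
# Hypothetical helper (not part of textreuse): split text into lowercase
# words and build overlapping ("shingled") word n-grams.
shingle <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]            # drop empty strings from the split
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper (not part of textreuse): Jaccard similarity of two
# token sets, i.e. size of the intersection divided by size of the union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  u <- union(a, b)
  if (length(u) == 0) return(0)
  length(intersect(a, b)) / length(u)
}

doc_a <- "The quick brown fox jumps over the lazy dog"
doc_b <- "The quick brown fox leaps over the lazy dog"

# Similarity of the two documents based on their word trigrams
jaccard(shingle(doc_a), shingle(doc_b))
```

Because the shingles overlap, changing a single word alters every n-gram that contains it, so the score drops faster than a plain word-overlap measure would. That sensitivity to word order is what makes shingling useful for spotting reused passages.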

Package details

Author: Lincoln Mullen [aut, cre] (<https://orcid.org/0000-0001-5103-6917>)
Maintainer: Lincoln Mullen <lincoln@lincolnmullen.com>
License: MIT + file LICENSE
URL: https://docs.ropensci.org/textreuse https://github.com/ropensci/textreuse
Package repository: CRAN
Installation: Install the latest version of this package from CRAN by entering install.packages("textreuse") in R.

textreuse documentation built on July 8, 2020, 6:40 p.m.