textreuse: Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting reused passages. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity and dissimilarity functions; pairwise comparisons; minhash and locality-sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.


Package details
Author	Lincoln Mullen [aut, cre] (<https://orcid.org/0000-0001-5103-6917>)
Maintainer	Lincoln Mullen <lincoln@lincolnmullen.com>
License	MIT + file LICENSE
Version	0.1.5
URL	https://docs.ropensci.org/textreuse https://github.com/ropensci/textreuse
Package repository	CRAN
Installation	Install the latest version of this package by entering the following in R: `install.packages("textreuse")`
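
To make the description above concrete, here is a minimal base R sketch of two of the ideas it names: shingled word n-grams and a Jaccard similarity score. The helpers `shingle_ngrams()` and `jaccard()` are hypothetical names written for this illustration only; they are not the package's implementation, and the package's own tokenizers and similarity functions (for example `tokenize_ngrams()` and `jaccard_similarity()`) should be preferred in real use.

```r
# Hypothetical helper (not part of textreuse): split text into lowercase
# words and return overlapping ("shingled") word n-grams.
shingle_ngrams <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]            # drop empty strings from the split
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: Jaccard similarity of two token sets,
# i.e. size of the intersection divided by size of the union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  total <- length(union(a, b))
  if (total == 0) return(0)                # assumption: two empty sets score 0
  length(intersect(a, b)) / total
}

doc_a <- "The quick brown fox jumps over the lazy dog"
doc_b <- "The quick brown fox leaps over the lazy dog"

a <- shingle_ngrams(doc_a, n = 3)
b <- shingle_ngrams(doc_b, n = 3)
score <- jaccard(a, b)
print(score)
```

Changing a single word alters every trigram that overlaps it, so the two sentences share only 4 of their 10 distinct trigrams. Smaller values of `n` are more forgiving of small edits; larger values demand longer verbatim matches.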