ropensci/textreuse: Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting passages that have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity and dissimilarity functions; pairwise comparisons; minhash and locality-sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
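To make the idea of shingled n-grams and a similarity function concrete, here is a minimal sketch in base R. It does not use the package and is not its implementation; the helper names `shingle_ngrams` and `jaccard` are hypothetical and exist only for this illustration. It assumes simple lowercase, ASCII word splitting.

```r
# Hypothetical helper: split text into overlapping word n-grams (shingles).
# Assumes lowercase ASCII words; anything else is treated as a separator.
shingle_ngrams <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: Jaccard similarity, i.e. the number of shared
# shingles divided by the number of distinct shingles in either document.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  if (length(a) == 0 && length(b) == 0) return(NaN)
  length(intersect(a, b)) / length(union(a, b))
}

a <- shingle_ngrams("The quick brown fox jumps over the lazy dog")
b <- shingle_ngrams("The quick brown fox leaps over the lazy dog")
jaccard(a, b)  # 4 shared trigrams out of 10 distinct ones: 0.4
```

Changing a single word alters every trigram that overlaps it, which is why the two sentences share only 4 of their 10 distinct shingles. Minhash and locality-sensitive hashing are ways to approximate this kind of comparison without scoring every pair of documents exactly.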


Package details

Maintainer
License: MIT + file LICENSE
Version: 0.1.4.9000
URL: https://github.com/ropensci/textreuse
Installation

Install the latest version of this package by entering the following in R:
# Install devtools from CRAN, then use it to install textreuse from GitHub
install.packages("devtools")
library(devtools)
install_github("ropensci/textreuse")
ropensci/textreuse documentation built on July 20, 2018, 8:57 p.m.