
Get a duplicate content score between two web pages. duplicateContentR takes two URLs as input and computes a duplicate content score that can help detect plagiarism.
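This README does not say how the score is computed. Purely as an illustration of what a "duplicate content score" can mean, here is a minimal sketch of one common approach, Jaccard similarity over word shingles (runs of n consecutive words). This is NOT necessarily the method duplicateContentR uses, and the helper names `shingles()` and `jaccard_score()` are hypothetical.

```r
# Hypothetical helper (not part of duplicateContentR):
# split text into lowercase words and return the unique n-word shingles.
shingles <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]          # drop empty tokens left by leading punctuation
  if (length(words) == 0) return(character(0))
  if (length(words) < n) return(paste(words, collapse = " "))
  starts <- seq_len(length(words) - n + 1)
  unique(vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
                character(1)))
}

# Hypothetical helper: Jaccard similarity |A ∩ B| / |A ∪ B| of the two shingle sets.
# Returns a value between 0 (no shared shingles) and 1 (identical shingle sets).
jaccard_score <- function(text_a, text_b, n = 3) {
  a <- shingles(text_a, n)
  b <- shingles(text_b, n)
  union_size <- length(union(a, b))
  if (union_size == 0) return(0)         # both texts empty: treat as no overlap
  length(intersect(a, b)) / union_size
}

jaccard_score("the quick brown fox jumps", "the quick brown fox sleeps")  # 2 shared of 4 shingles: 0.5
```

Shingles of several words, rather than single words, make the score sensitive to copied phrasing instead of shared vocabulary alone.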



Getting started

Run the following lines to load the required packages:

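# Load duplicateContentR and the packages it relies on
# (invisible() suppresses the list that lapply() would otherwise print)
packages <- c("XML", "httr", "textrank", "duplicateContentR")
invisible(lapply(packages, library, character.only = TRUE))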

Call the duplicate_content_score function with three arguments (URL A, URL B, and your user agent*) and enjoy!

# Arguments: URL A, URL B, user agent (put the two pages to compare in the empty strings)
duplicate_content_score("", "", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0")

NB: You can find your user agent by searching Google for "What is my user agent?"
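The README does not explain what the user agent is used for. Presumably it is sent as the HTTP `User-Agent` header when the two pages are downloaded; this is an assumption. Below is a minimal sketch of such a download step, using the httr and XML packages loaded above, that keeps only the text of the `<p>` elements. The helpers `extract_paragraph_text()` and `fetch_page_text()` are hypothetical and are not functions exported by duplicateContentR.

```r
# Hypothetical helper: parse an HTML string and join the text of all <p> elements.
extract_paragraph_text <- function(html) {
  doc <- XML::htmlParse(html, asText = TRUE, encoding = "UTF-8")
  on.exit(XML::free(doc))                       # release the parsed document
  paragraphs <- XML::xpathSApply(doc, "//p", XML::xmlValue)
  paste(unlist(paragraphs), collapse = " ")     # "" when the page has no <p> elements
}

# Hypothetical helper: download a page while identifying with the given user agent.
fetch_page_text <- function(url, ua) {
  resp <- httr::GET(url, httr::user_agent(ua))  # sets the User-Agent request header
  httr::stop_for_status(resp)                   # fail loudly on 4xx/5xx responses
  html <- httr::content(resp, as = "text", encoding = "UTF-8")
  extract_paragraph_text(html)
}
```

Some sites return different content, or refuse the request, depending on the user agent, which is one plausible reason a realistic browser string is asked for.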


Questions and feedback are welcome!

Want to contribute? Open a pull request ;-) If you encounter a bug or want to suggest an enhancement, please open an issue.

remibacha/duplicateContentR documentation built on May 16, 2019, 3:24 p.m.