compare: Compare a document to another document


Description

Compares a document to another document to find similar sentences. Sentences of the two documents are compared using cosine similarity.
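To illustrate what cosine similarity between two sentences measures, here is a minimal base R sketch. It assumes each sentence is turned into a vector of lowercase word counts; how antiplugr actually splits and weights sentences is not documented here, and the helpers `tokenize` and `cosine_sim` are hypothetical names used only for this illustration.

```r
# Hypothetical helper: split a sentence into lowercase words
tokenize <- function(sentence) {
  words <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  words[nzchar(words)]  # drop empty strings left by leading punctuation
}

# Hypothetical helper: cosine similarity of two sentences, assuming a
# plain word-count (term-frequency) representation
cosine_sim <- function(s1, s2) {
  w1 <- tokenize(s1)
  w2 <- tokenize(s2)
  vocab <- union(w1, w2)                                 # shared vocabulary
  v1 <- as.numeric(table(factor(w1, levels = vocab)))    # counts for sentence 1
  v2 <- as.numeric(table(factor(w2, levels = vocab)))    # counts for sentence 2
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))     # dot product / norms
}

cosine_sim("Hansel and Gretel went into the forest.",
           "Hansel and Gretel walked into the dark forest.")
# approx. 0.80 (6 shared words, 6 / sqrt(7 * 8))
```

A value of 1 means both sentences use the same words in the same proportions; 0 means they share no words. Under this assumed representation the pair above would pass the default threshold of 0.6.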

Usage

compare(x, source, cos_sim = 0.6)

Arguments

x

File name or path of the PDF document to be checked.

source

File name or path of the source document that x is compared against. The source must also be a PDF.

cos_sim

Threshold for the cosine similarity. The output contains only sentences whose cosine similarity is greater than or equal to cos_sim. The default is 0.6.
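The effect of the cos_sim threshold can be sketched as follows: compute the similarity of every sentence of x with every sentence of the source and keep the pairs at or above the threshold. This base R sketch assumes the sentences are already extracted into character vectors and uses word-count vectors; it ignores page numbers and is not antiplugr's implementation. The function `match_sentences` and its output columns are hypothetical.

```r
# Hypothetical sketch: all-pairs cosine similarity with a threshold,
# assuming plain word-count vectors for each sentence
match_sentences <- function(x_sentences, source_sentences, cos_sim = 0.6) {
  all_sentences <- tolower(c(x_sentences, source_sentences))
  tokens <- lapply(all_sentences, function(s) {
    w <- strsplit(s, "[^a-z]+")[[1]]
    w[nzchar(w)]
  })
  vocab <- unique(unlist(tokens))
  # Term-frequency matrix: one row per word, one column per sentence
  tf <- matrix(unlist(lapply(tokens, function(w)
          as.numeric(table(factor(w, levels = vocab))))),
        nrow = length(vocab))
  # Scale each column to unit length so dot products become cosine similarities
  tf <- sweep(tf, 2, sqrt(colSums(tf^2)), "/")
  nx <- length(x_sentences)
  # Rows: sentences of x; columns: sentences of the source
  sim <- crossprod(tf[, seq_len(nx), drop = FALSE],
                   tf[, nx + seq_along(source_sentences), drop = FALSE])
  hits <- which(sim >= cos_sim, arr.ind = TRUE)  # NaN/NA pairs are dropped
  data.frame(sentence_x = hits[, 1],
             sentence_source = hits[, 2],
             similarity = sim[hits])
}

x   <- c("Hansel and Gretel went into the forest.",
         "The witch lived in a house of bread.")
src <- c("Hansel and Gretel walked into the dark forest.",
         "Their father was a poor woodcutter.")
match_sentences(x, src)                 # one pair: sentence 1 vs sentence 1
match_sentences(x, src, cos_sim = 0.1)  # a lower threshold keeps more pairs
```

Lowering cos_sim reports looser matches (here, pairs sharing only a single common word), while raising it keeps only near-identical sentences.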

Value

A tibble that contains the measured cosine similarity, the similar sentence of document x, and the location of the match in both documents, given as page number and sentence number.

Examples

# PDF from Book Reports (slightly modified),
# URL: https://www.bookreports.info/hansel-and-gretel-summary/
file1 <- system.file('pdf', 'summary_hansel_and_gretel.pdf', package = 'antiplugr')

# PDF from Short Story America,
# URL: http://www.shortstoryamerica.com/pdf_classics/grimm_hanse_and_gretel.pdf
file2 <- system.file('pdf', 'grimm_hanse_and_gretel.pdf', package = 'antiplugr')

# Find sentences of file1 that are similar to sentences of file2 (default cos_sim = 0.6)
compare(file1, file2)

annamariakl/antiplugr documentation built on May 15, 2019, 11:49 a.m.