search_for: Search for similar sentences


Description

search_for searches the text of a PDF for sentences that are similar or identical to a given sentence.

Usage

search_for(x, sen, exact = FALSE, cos_sim = 0.5)

Arguments

x

File name or path of the PDF.

sen

The sentence to search for in the text of the PDF.

exact

Logical. If TRUE, search_for looks for exact matches of sen. The default is FALSE, in which case cosine similarity is used to measure how similar each sentence in the PDF is to sen.

cos_sim

Threshold for the cosine similarity. The output contains only sentences whose cosine similarity to sen is greater than or equal to cos_sim. The default is 0.5.
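This page does not say how the cosine similarity is computed. The following is a minimal base-R sketch, assuming a simple bag-of-words representation in which each sentence becomes a vector of lowercase word counts and the similarity of two vectors a and b is dot(a, b) / (|a| * |b|). It also shows one way the exact and cos_sim arguments could interact. The helpers tokenize_words(), cosine_sim() and match_sentences() are hypothetical and are not part of antiplugr; the package's own tokenization and weighting may differ.

```r
# Hypothetical helper (not part of antiplugr): split a sentence into
# lowercase word tokens, dropping punctuation.
tokenize_words <- function(s) {
  words <- unlist(strsplit(tolower(s), "[^a-z0-9']+"))
  words[nzchar(words)]
}

# Hypothetical helper: cosine similarity of the word-count vectors of two sentences.
cosine_sim <- function(a, b) {
  ta <- tokenize_words(a)
  tb <- tokenize_words(b)
  vocab <- union(ta, tb)
  va <- as.numeric(table(factor(ta, levels = vocab)))
  vb <- as.numeric(table(factor(tb, levels = vocab)))
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

# Hypothetical helper mirroring the roles of `exact` and `cos_sim`:
# returns the indices of the candidate sentences that match `sen`.
match_sentences <- function(sentences, sen, exact = FALSE, cos_sim = 0.5) {
  if (exact) {
    return(which(sentences == sen))
  }
  sims <- vapply(sentences, cosine_sim, numeric(1), b = sen, USE.NAMES = FALSE)
  which(sims >= cos_sim)
}

# Toy candidate sentences
sentences <- c(
  "When four weeks had passed and Hansel was still thin, the witch got tired.",
  "Hansel and Gretel found their way home."
)
sen_1 <- "When four weeks had passed and Hansel was still thin, impatience overcame her, and she would wait no longer."

match_sentences(sentences, sen_1)                              # 1
match_sentences(sentences, sentences[2], exact = TRUE)         # 2
```

With these inputs the first candidate scores about 0.64 against sen_1 and passes the default threshold of 0.5, while the second scores about 0.25 and is dropped. Raising cos_sim makes the search stricter; lowering it returns looser paraphrases.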

Value

A tibble containing the measured cosine similarity and the location of each match, given as the page number and the sentence number.
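The location columns imply that the PDF text is split into sentences and numbered per page. Below is a minimal base-R sketch of how such locations could be recorded, assuming the text is already available as a character vector with one element per page (the shape returned by pdftools::pdf_text(), although this page does not say which extraction antiplugr uses) and using a naive regex split after ".", "!" or "?". The helper split_pages() and the column names page and sentence_nr are hypothetical, and a plain data.frame stands in for the tibble so the example needs no extra packages.

```r
# Hypothetical helper (not part of antiplugr): one row per sentence,
# recording the page it came from and its position on that page.
# `pages` is assumed to hold one string per PDF page.
split_pages <- function(pages) {
  rows <- lapply(seq_along(pages), function(p) {
    # Naive split: break on whitespace that follows ".", "!" or "?"
    sents <- unlist(strsplit(pages[[p]], "(?<=[.!?])\\s+", perl = TRUE))
    sents <- trimws(sents)
    sents <- sents[nzchar(sents)]
    data.frame(page = rep(p, length(sents)),
               sentence_nr = seq_along(sents),
               sentence = sents,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy page texts standing in for extracted PDF text
pages <- c(
  "Hansel and Gretel lived near a forest. Their stepmother was cruel.",
  "When four weeks had passed and Hansel was still thin, the witch got tired. Gretel pushed her into the oven."
)
located <- split_pages(pages)

# Exact lookup, analogous to exact = TRUE: report where the sentence occurs
sen_2 <- "When four weeks had passed and Hansel was still thin, the witch got tired."
hit <- located[located$sentence == sen_2, c("page", "sentence_nr")]
hit
```

Real PDF text usually contains line breaks, hyphenation and abbreviations such as "Mr.", so a production splitter would need more care than this regex.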

Examples

library(antiplugr)

# PDF from Book Reports,
# URL: https://www.bookreports.info/hansel-and-gretel-summary/
file <- system.file('pdf', 'summary_hansel_and_gretel.pdf', package = 'antiplugr')

# a similar sentence from 'grimm_hanse_and_gretel.pdf' from Short Story America,
# URL: http://www.shortstoryamerica.com/pdf_classics/grimm_hanse_and_gretel.pdf
sen_1 <- "When four weeks had passed and Hansel was still thin, impatience overcame her, and she would wait no longer."

# an exact sentence
sen_2 <- "When four weeks had passed and Hansel was still thin, the witch got tired."

search_for(file, sen_1)
search_for(file, sen_2, exact = TRUE)

annamariakl/antiplugr documentation built on May 15, 2019, 11:49 a.m.