check_embed: Check if text embed is not from OCR


Description

Some PDFs have an embedded text layer that was produced by OCR in the scanner or other equipment that created the PDF. Such documents are more likely to contain fundamental errors, e.g., mis-OCR'd columnar text, that can be fixed by running OCR on the pages rather than extracting the embedded text layer.

Usage

check_embed(file)

Arguments

file

Path to the PDF whose embedded text source should be checked

Value

Logical: TRUE if the embedded text is good (not from OCR), FALSE if it is from OCR

See Also

pdftools::pdf_info

Examples

## Not run: 
# res <- check_embed("test.pdf")

## End(Not run)
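
This page does not say how check_embed decides whether a text layer came from OCR. Since See Also points to pdftools::pdf_info, one plausible approach is to inspect the Producer and Creator metadata for names of OCR or scanner software. The sketch below illustrates that idea only. The helper producer_suggests_ocr and the pattern list are hypothetical and not part of pdftext, and the helper returns TRUE when OCR is suggested, which is the inverse of the check_embed return value.

```r
# Hypothetical patterns (an assumption for illustration): substrings that
# often appear in metadata written by OCR or scanner software.
ocr_patterns <- c("ocr", "scan", "tesseract", "abbyy")

# Hypothetical helper, not part of pdftext. `info` is a list shaped like the
# result of pdftools::pdf_info(), which stores document metadata in `keys`.
producer_suggests_ocr <- function(info, patterns = ocr_patterns) {
  # Collect whichever of the Producer and Creator fields are present
  fields <- unlist(info$keys[c("Producer", "Creator")], use.names = FALSE)
  if (length(fields) == 0) {
    return(FALSE)  # no metadata to inspect, so nothing suggests OCR
  }
  # Case-insensitive match of any pattern against any field
  any(grepl(paste(patterns, collapse = "|"), fields, ignore.case = TRUE))
}

## Not run:
# info <- pdftools::pdf_info("test.pdf")
# producer_suggests_ocr(info)
## End(Not run)
```

Plain substring matching is crude and can misfire on unrelated producer names, so a result like this is best treated as a hint rather than a verdict.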

jacob-ogre/pdftext documentation built on May 18, 2019, 8:01 a.m.