check_embed: Check if text embed is not from OCR
In jacob-ogre/pdftext: Extract Text from Text- and Image-based PDFs


Some PDFs have an embedded text layer that was produced by OCR in the scanner or other equipment that created the PDF. Such text layers are more likely to contain fundamental errors, e.g., mis-OCR'd columnar text, that can be fixed by running OCR on the document rather than extracting the embedded text layer.

check_embed(file)

file

Path to a PDF to check for embedding source

Logical: TRUE if good embed, FALSE if from OCR
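Because the return value is a single logical, it can drive the choice between reading the embedded text layer and running OCR. The following is a minimal sketch of that decision only; `choose_route` and the route labels `"text_layer"` and `"ocr"` are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helper (not part of pdftext): map the logical result of
# check_embed() to an extraction route. The route labels are invented
# for this sketch.
choose_route <- function(good_embed) {
  # isTRUE() treats NA or non-logical input as "not a good embed",
  # so anything other than a clear TRUE falls back to OCR.
  if (isTRUE(good_embed)) "text_layer" else "ocr"
}

route_good <- choose_route(TRUE)
route_ocr  <- choose_route(FALSE)

# Not run: assumes the pdftext package is installed and "test.pdf" exists.
# route <- choose_route(check_embed("test.pdf"))
```

Falling back to OCR on an unclear result is a design choice of this sketch, made on the assumption that OCR is the safer route when the text layer cannot be trusted.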

pdftools::pdf_info
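This page does not say how check_embed decides where a text layer came from. Since it points to pdftools::pdf_info, one plausible approach is to inspect the PDF's Producer and Creator metadata. The sketch below illustrates that idea under stated assumptions: `metadata_suggests_ocr`, its `patterns` argument, and the sample metadata strings are all hypothetical, and this is not presented as the package's actual implementation. Note that it returns TRUE when OCR is suspected, the opposite sense of check_embed.

```r
# Hypothetical heuristic, not the package's implementation: flag a PDF whose
# Producer/Creator metadata mentions scanning or OCR software.
# `info` is a list shaped like the result of pdftools::pdf_info(), which
# stores document metadata in a named list called `keys`.
metadata_suggests_ocr <- function(info,
                                  patterns = c("ocr", "scan", "tesseract")) {
  fields <- unlist(info$keys[c("Producer", "Creator")])
  if (length(fields) == 0) return(NA)  # no metadata available, cannot tell
  any(grepl(paste(patterns, collapse = "|"), fields, ignore.case = TRUE))
}

# Made-up metadata for illustration only.
info_scanned <- list(keys = list(Producer = "Acme Scanner OCR Engine"))
info_typeset <- list(keys = list(Producer = "pdfTeX-1.40.21",
                                 Creator  = "LaTeX with hyperref"))

scanned_flag <- metadata_suggests_ocr(info_scanned)
typeset_flag <- metadata_suggests_ocr(info_typeset)

# Not run: assumes the pdftools package is installed and "test.pdf" exists.
# metadata_suggests_ocr(pdftools::pdf_info("test.pdf"))
```

Metadata can be missing or misleading, so a check like this is at best a first filter.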

## Not run: 
# res <- check_embed("test.pdf")

## End(Not run)

jacob-ogre/pdftext documentation built on May 18, 2019, 8:01 a.m.
