ocr_pages: Perform optical character recognition on PNGs.

Description

Uses Tesseract (which must be installed and on the $PATH) to perform optical character recognition (OCR) on the PNG page images of a PDF. A custom Tesseract config is read from options()$pdftext.tess_conf, which can be set with set_tess_conf.
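
The two prerequisites above (Tesseract on the $PATH, and an optional config stored in the pdftext.tess_conf option) can be checked from base R before calling ocr_pages. The following is a minimal sketch, not the package's own code. It assumes the option holds a path to a Tesseract config file, and my_conf is a hypothetical placeholder. It sets the option directly with options() because the arguments of set_tess_conf are not shown on this page.

```r
# Check that the tesseract binary is discoverable on the PATH.
# Sys.which() returns "" for commands it cannot find.
tess_path <- Sys.which("tesseract")
tess_found <- nzchar(tess_path)
if (!tess_found) {
  message("tesseract not found on PATH; OCR would likely fail")
}

# Hypothetical config file path, for illustration only (assumption:
# the option is expected to hold a path to a Tesseract config file).
my_conf <- file.path(tempdir(), "my_tess_conf")

# Set the option with base R and keep the previous value for later.
old_opts <- options(pdftext.tess_conf = my_conf)

# Read the option back the same way the description refers to it.
current_conf <- options()$pdftext.tess_conf
print(current_conf)

# Restore the previous option value.
options(old_opts)
```

Restoring the previous value at the end keeps the sketch from leaving a placeholder config path in the session.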

Usage

ocr_pages(pngs, fin_file, verbose = TRUE)

Arguments

pngs

A listing of the temp PNG directory for a PDF

fin_file

Path to the 'final' text file to be written

verbose

Whether to print processing messages (default: TRUE)

Value

The path to the OCR'd text file

See Also

pdf_to_txt

Examples

## Not run: 
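# "test_pngs" and "test.txt" are placeholder paths, not package defaults
pngs <- list.files("test_pngs", full.names = TRUE)
res <- ocr_pages(pngs, fin_file = "test.txt")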

## End(Not run)

jacob-ogre/pdftext documentation built on May 18, 2019, 8:01 a.m.