ocr_pages: Wrap optical character recognition around a set of files


View source: R/ocr.R

Description

Uses Tesseract (which must be installed and on the $PATH) to perform optical character recognition (OCR) on a file.
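Because the function depends on an external Tesseract binary, it can help to confirm that the binary is discoverable before running OCR. The following is a minimal sketch using base R only; `has_tesseract()` is a hypothetical helper for illustration, not part of this package, and it assumes the executable is named `tesseract`.

```r
# Hypothetical helper (not part of ocrerrors): report whether the
# tesseract executable can be found on the PATH.
has_tesseract <- function() {
  # Sys.which() returns "" for a command it cannot locate
  path <- unname(Sys.which("tesseract"))
  nzchar(path)
}

if (has_tesseract()) {
  message("Tesseract found at: ", Sys.which("tesseract"))
} else {
  message("Tesseract not found on the PATH; install it before calling ocr_pages().")
}
```

The check only reports whether the binary was found; it does not verify the Tesseract version or the installed language data.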

Usage

ocr_pages(pdf, pages, ext = "png", errfile)

Arguments

pdf

Path to the PDF file to be OCR'd

pages

A vector of the pages that have embedded text in the gold standard

ext

The file extension to look for, either "png" or "tif" (default: "png")

errfile

The file to which Tesseract STDERR is written

Value

A data.frame with

See Also

extract_text

Examples

# Argument values below are illustrative placeholders
# res <- ocr_pages("test.pdf", pages = 1:2, errfile = "tesseract_err.txt")

jacob-ogre/ocrerrors documentation built on May 18, 2019, 8:01 a.m.