ocr_pages: Wrap optical character recognition around a set of files
In jacob-ogre/ocrerrors: Find Optical Character Recognition Errors and Corrections

Uses Tesseract (which must be installed and on the $PATH) to perform optical character recognition (OCR) on a file.
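Because the function depends on an external Tesseract binary, it can help to confirm the binary is discoverable before running OCR. The following is a minimal sketch using base R only; the helper name `tesseract_available` is hypothetical and not part of ocrerrors, and it assumes the executable is named `tesseract`.

```r
# Hypothetical helper (not part of ocrerrors): returns TRUE when an
# executable named "tesseract" can be found on the PATH.
# Sys.which() returns an empty string for programs it cannot locate.
tesseract_available <- function() {
  unname(nzchar(Sys.which("tesseract")))
}

if (tesseract_available()) {
  message("Tesseract found at: ", Sys.which("tesseract"))
} else {
  message("Tesseract not found; install it and add it to the PATH first.")
}
```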

ocr_pages(pdf, pages, ext = "png", errfile)

`pdf`	Path to the PDF file to be OCR'd
`pages`	A vector of pages with embedded text in gold standard
`ext`	The file extension to be found, either png or tif (default: png)
`errfile`	The file to which Tesseract STDERR is written

A data.frame with

extract_text

# res <- tess_ocr("test.pdf")

jacob-ogre/ocrerrors documentation built on May 18, 2019, 8:01 a.m.

