duncantl/Rtesseract: Interface to the tesseract OCR system

Rtesseract provides a flexible optical character recognition (OCR) facility for R via the tesseract C++ library, letting us read text from images. It also lets us analyze the results and possible recognition errors: when the true text is known, we can do data analysis on the errors and explore how to improve the recognition. The package also provides some functionality from the leptonica library for image processing, which allows us, for example, to detect lines in an image, an important step in interpreting tables.

Package details
Author	Duncan Temple Lang, Matt Espe
Maintainer	Duncan Temple Lang <duncan@r-project.org>
License	Apache License
Version	0.6-0
Package repository	GitHub (duncantl/Rtesseract)
Installation	Install the latest version of this package by entering the following in R: `install.packages("remotes"); remotes::install_github("duncantl/Rtesseract")`