duncantl/Rtesseract: Interface to the tesseract OCR system

This package provides a flexible Optical Character Recognition (OCR) facility via the tesseract C++ library, allowing us to read text from images. It also lets us analyze the results and possible errors in the recognition: when the true text is known, we can do data analysis on the errors and explore how to improve the recognition. The package also provides some functionality from the leptonica library for image processing, which allows us, for example, to detect lines in an image, an important step in interpreting tables.

Package details

Author: Duncan Temple Lang, Matt Espe
Maintainer: Duncan Temple Lang <duncan@r-project.org>
License: Apache License
Version: 0.5-0
Package repository: GitHub (duncantl/Rtesseract)
Installation

Install the latest version of this package by entering the following in R:
install.packages("remotes")
remotes::install_github("duncantl/Rtesseract")
duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.