Access Abbyy Cloud OCR from R

Easily OCR images, barcodes, forms, and documents with machine-readable zones (e.g., passports), right from R. Get the results in a wide variety of formats, from plain text files to detailed XML with bounding-box information.

The package provides access to the Abbyy Cloud OCR SDK API. Details about the results of API calls can be found in the Abbyy Cloud OCR SDK documentation.

Installation

To get the latest version from CRAN:

install.packages("abbyyR")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/abbyyR", build_vignettes = TRUE)

Using abbyyR

To get acquainted with some of the important functions, read the vignettes:

# Overview of the package
vignette("introduction", package = "abbyyR")
# Examples of some functions, along with their output
vignette("example", package = "abbyyR")
# How to scrape text from a folder of images
vignette("wiscads", package = "abbyyR")

The quality of the final output varies with the complexity of the layout, the image resolution, the font face, and so on. To measure OCR quality, you can compute the edit distance between the output and a "gold standard" hand-coded sample using recognize. For quick, edit-distance-based search and replace to fix messy data, you can use turbo search and replace.
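
As a rough illustration of the edit-distance idea, here is a minimal sketch in base R. It does not use recognize or any abbyyR function: it relies only on `utils::adist()`, and the helper name `ocr_error_rate` and the sample strings are hypothetical, chosen for illustration.

```r
# Hypothetical helper (not part of abbyyR or recognize):
# edit distance between OCR output and a hand-coded gold standard,
# scaled by the length of the gold standard.
ocr_error_rate <- function(ocr_text, gold_text) {
  # utils::adist() computes the generalized Levenshtein distance
  # and returns a matrix; drop() reduces the 1 x 1 result to a scalar.
  d <- drop(utils::adist(ocr_text, gold_text))
  # Guard against division by zero for an empty gold standard.
  d / max(nchar(gold_text), 1)
}

gold <- "The quick brown fox"   # made-up gold-standard transcription
ocr  <- "The qu1ck brovvn fox"  # made-up OCR output with typical errors
ocr_error_rate(ocr, gold)
```

A value of 0 means the OCR output matches the gold standard exactly; larger values mean more character-level edits are needed per gold-standard character.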

License

Scripts are released under the MIT License.

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on them. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.



soodoku/abbyyR documentation built on July 19, 2023, 8:36 a.m.