ocr: Tesseract OCR
In cpp11tesseract: Open Source OCR Engine

ocr	R Documentation

Tesseract OCR

Description

Extract text from an image. Requires training data for the language you are reading. Works best for images with high contrast, little noise and horizontal text. See the Tesseract wiki and the package vignette for image preprocessing tips.

Usage

ocr(file, engine = tesseract("eng"), HOCR = FALSE, opw = "", upw = "")

ocr_data(file, engine = tesseract("eng"))
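As the Usage lines show, `engine` defaults to `tesseract("eng")`. The sketch below shows the two ways of choosing an engine. It uses the sample image shipped with the package. The first way creates an engine object once and reuses it across calls, which presumably avoids re-initialising Tesseract for every file. The second way passes a language string, which `ocr()` forwards to `tesseract()`. The names `eng`, `text1` and `text2` are illustrative only.

```r
library(cpp11tesseract)

# Sample image bundled with the package
file <- system.file("examples", "test.png", package = "cpp11tesseract")

# Create an engine once and reuse it for several ocr() calls
eng <- tesseract("eng")
text1 <- ocr(file, engine = eng)

# Shorthand: a language string is passed on to tesseract()
text2 <- ocr(file, engine = "eng")

cat(text1)
```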

Arguments

`file`	file path or raw vector (png, tiff, jpeg, etc.).
`engine`	a tesseract engine created with `tesseract()`. Alternatively, a language string, which will be passed to `tesseract()`.
`HOCR`	if `TRUE`, return results as hOCR XML instead of plain text.
`opw`	owner password to open a PDF (pass it as an environment variable to avoid leaking sensitive information).
`upw`	user password to open a PDF (pass it as an environment variable to avoid leaking sensitive information).
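The `opw` and `upw` descriptions recommend passing PDF passwords through environment variables instead of writing them into scripts. Below is a minimal sketch of that pattern. The helper `ocr_protected_pdf()` and the variable names `PDF_OWNER_PW` and `PDF_USER_PW` are hypothetical choices, not part of the package. `Sys.getenv()` returns `""` for an unset variable, which matches the default value of both arguments.

```r
library(cpp11tesseract)

# Hypothetical helper: read PDF passwords from (illustrative) environment
# variables so they never appear in the source code or its history.
ocr_protected_pdf <- function(path, lang = "eng") {
  ocr(path,
      engine = tesseract(lang),
      opw = Sys.getenv("PDF_OWNER_PW"),  # "" if unset, same as the default
      upw = Sys.getenv("PDF_USER_PW"))
}

# Usage (set the variables outside R, e.g. in ~/.Renviron or the shell):
# text <- ocr_protected_pdf("protected.pdf")
```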

Details

The ocr() function returns plain text by default, or hOCR text if HOCR is set to TRUE. The ocr_data() function returns a data frame with a confidence score and a bounding box for each word in the text.
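The following sketch applies both output modes from the Details section to the bundled sample image. The first call returns hOCR markup. The second call returns one row per word, which is then filtered by confidence. The column name `confidence` is an assumption based on the original tesseract package. The threshold of 80 is an arbitrary illustrative value.

```r
library(cpp11tesseract)

file <- system.file("examples", "test.png", package = "cpp11tesseract")

# hOCR: XML/HTML markup that also records layout and word positions
hocr <- ocr(file, HOCR = TRUE)
cat(substr(hocr, 1, 300))

# Word-level results with a confidence score and a bounding box
words <- ocr_data(file)
str(words)

# Keep only words recognised with high confidence
# (assumed column name 'confidence'; threshold chosen for illustration)
confident <- words[words$confidence > 80, ]
print(confident)
```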

Value

Character vector of text extracted from the file. If the file has a TIFF or PDF extension, the vector has one element per page.
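Because multi-page TIFF and PDF inputs produce one element per page, each page can be handled separately. The sketch below writes each page to its own text file. The helper `ocr_to_page_files()` and the `page-001.txt` naming scheme are hypothetical. The usage line runs the helper on the single-page sample image, so it yields a single file.

```r
library(cpp11tesseract)

# Hypothetical helper: OCR a (possibly multi-page) file and write one
# plain-text file per element of the result into `outdir`.
ocr_to_page_files <- function(path, outdir = tempdir()) {
  pages <- ocr(path)
  out <- file.path(outdir, sprintf("page-%03d.txt", seq_along(pages)))
  for (i in seq_along(pages)) {
    writeLines(pages[[i]], out[[i]])
  }
  out
}

# Usage with the single-page example image shipped with the package
file <- system.file("examples", "test.png", package = "cpp11tesseract")
paths <- ocr_to_page_files(file)
print(paths)
```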

References

Tesseract: Improving Quality

Examples

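library(cpp11tesseract)
# Path to the sample image shipped with the package
file <- system.file("examples", "test.png", package = "cpp11tesseract")
# Run OCR with the default English engine and print the result
text <- ocr(file)
cat(text)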

cpp11tesseract documentation built on April 4, 2025, 5:24 a.m.
