pixRead: Read an Image for use with Tesseract

Description

This function reads an image, determines its format, and creates a C++ object that can be passed to Tesseract for optical character recognition.

Usage

pixRead(filename, addFinalizer = TRUE, multi = FALSE, ...)
pixCreate(x, ...)

Arguments

filename

the name of the image file, given as an absolute or relative path as necessary.

x

a vector of length three specifying the width, height and depth of the image.

addFinalizer

a logical value that controls whether the C code for pixRead adds a finalizer that frees the resulting Pix object when it is garbage collected. When a Pix is passed to tesseract via SetImage, that tesseract object will itself attempt to free the image when (and if) it is garbage collected, so the same Pix must not be freed twice.

multi

a logical value. If TRUE and the file is a TIFF file, it is read as a multipage TIFF file. If the file contains multiple images, they are returned as a list; if it contains only one image, that single image is returned. If FALSE, only the first image in the file is returned.

...

additional arguments, currently ignored for pixRead and passed to the methods for pixCreate.
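
Because multi = TRUE can return either a single image or a list of images, calling code may want to normalize the result before looping over pages. The following is a minimal sketch of one way to do that, not part of the package: as_pix_list is a hypothetical helper, the class name "Pix" is assumed from the Value section, and scan.tiff is a placeholder file name.

```r
# Hypothetical helper (not part of Rtesseract): always hand back a list of
# images, whether pixRead() returned one image or a list of them.
# Assumes a single image inherits from class "Pix".
as_pix_list <- function(x) {
  if (is.list(x) && !inherits(x, "Pix")) x else list(x)
}

# Guarded so the sketch can be sourced even where Rtesseract or the
# placeholder file is not available.
tiff_file <- "scan.tiff"  # placeholder path, replace with a real TIFF file
if (requireNamespace("Rtesseract", quietly = TRUE) && file.exists(tiff_file)) {
  pages <- as_pix_list(Rtesseract::pixRead(tiff_file, multi = TRUE))
  length(pages)  # number of images read from the file
}
```

With this wrapper, downstream code can iterate over the pages with lapply without first checking which shape was returned.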

Value

An object of class Pix-class or, when multi is TRUE and the file contains several images, a list of such objects. The result must be assigned to an R variable so that it is not garbage collected before the Recognize function has been called for the TesseractBaseAPI-class instance.

Author(s)

Duncan Temple Lang

References

Leptonica: http://leptonica.com/

Tesseract: https://code.google.com/p/tesseract-ocr/

See Also

SetImage

Examples

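 # Locate the sample image shipped with the package and read it.
 f = system.file("images", "OCRSample2.png", package = "Rtesseract")
 pix = pixRead(f)
 pixGetInputFormat(pix)

 # Display the image, then inspect its dimensions and contents.
 plot(pix)
 dim(pix)
 pix[,]
 
 # Keep pix assigned to a variable while the tesseract object uses it.
 pixGetInputFormat(pix)
 api = tesseract(pix)
 lapply(api, GetAlternatives, "symbol")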

duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.