Old/tesseractFuns: Set and Query the Tesseract Object

tesseractFuns    R Documentation

Set and Query the Tesseract Object

Description

These functions provide ways to both set and query the state of the tesseract object.

Usage

ReadConfigFile(api, files, debug = FALSE, ok = FALSE)
SetImage(api, pix)
SetInputName(api, name, check = TRUE, load = TRUE)
GetInputName(api)
GetImage(api, asArray = FALSE)

SetPageSegMode(api, mode)
GetPageSegMode(api)

SetRectangle(api, ..., dims = sapply(list(...), as.integer))

SetSourceResolution(api, ppi)
GetSourceYResolution(api)

SetOutputName(api, filename)

ProcessPages(filename, api = tesseract(), timeout = 0L, out = tempfile())

Arguments

api

the instance of the TesseractBaseAPI-class in which to perform the operations.

pix

an object of class Pix, as returned by pixRead.

asArray

a logical value. If FALSE, the image is returned as a reference to the internal C++ object. If TRUE, the contents of the C++ image are returned as a 3-dimensional array.

ppi

the resolution of the source image in pixels per inch, as an integer.

dims

a vector of length 4 giving the location of the rectangle as x1, y1, width, height, where (x1, y1) is the top-left corner. This is NOT the coordinates of the top-left and bottom-right corners of the rectangle, i.e., (x1, y1, x2, y2); the 3rd and 4th values are the width and height of the box.

...

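the left, top, width and height of the rectangle, given as individual values. By default, these are coerced to integers and combined to form dims.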

files

a character vector specifying the full or relative paths to the configuration files.

name

the name of the file being processed by the OCR system.

ok

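a logical value, used in ReadConfigFile. tesseract can locate configuration files in its data directory (typically /usr/local/share/tessdata, or the directory specified with the environment variable TESSDATA_PREFIX), so the names in files need not be paths that exist relative to the working directory.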

mode

the page segmentation mode for the tesseract instance. This must correspond to one of the values in PageSegModeValues or the corresponding R variables. Alternatively, one can use the symbolic names (lower or upper case) from this vector, e.g., "psm_auto".

check

a logical value indicating whether to check that the file actually exists.

load

a logical value. If TRUE, the image in the named file is loaded and set as the current image in the tesseract object.

filename,out

the name of a file: either the image file to process, or the output file to which the results of the OCR are written, if that occurs. The output file is rarely of interest, as the same information is available directly from the TesseractBaseAPI object.

timeout

this is almost always 0 and so is rarely specified. It controls how long any particular step in the processing is allowed to take before the entire process is terminated.
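
Details

The dims argument of SetRectangle is easy to get wrong when a box is known by its corners. The following is a minimal sketch in base R of converting corner coordinates to the left, top, width, height form. The helper cornersToDims is hypothetical and not part of the package, and it assumes the width and height are simply the differences of the corner coordinates.

```r
# Hypothetical helper (not part of Rtesseract): convert corner coordinates
# (x1, y1, x2, y2) into the (left, top, width, height) form used for dims.
# Assumes width = x2 - x1 and height = y2 - y1.
cornersToDims <- function(x1, y1, x2, y2) {
  c(left = x1, top = y1, width = x2 - x1, height = y2 - y1)
}

# A box with top-left corner (10, 20) and bottom-right corner (310, 120)
box <- cornersToDims(10, 20, 310, 120)

# Mirrors the default in the usage, dims = sapply(list(...), as.integer):
# each value passed through ... is coerced to an integer.
dims <- sapply(list(box[["left"]], box[["top"]], box[["width"]], box[["height"]]),
               as.integer)
print(dims)

# With a tesseract object named api, the equivalent call would be
#   SetRectangle(api, 10, 20, 300, 100)
```

Passing the four values through ... or passing a single vector as dims should describe the same rectangle; the conversion above only guards against supplying (x1, y1, x2, y2) by mistake.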

Author(s)

Duncan Temple Lang

See Also

tesseract
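
Examples

A short sketch of how the setters and queries listed in the usage might be combined. It uses only the functions named on this page, with the argument defaults shown there. The file name page.png and the resolution of 300 are assumptions for illustration, and the code is skipped unless the Rtesseract package is installed and the file exists.

```r
# Sketch only: the body runs when Rtesseract is installed and the image exists.
imgFile <- "page.png"   # hypothetical image file name
ran <- FALSE

if (requireNamespace("Rtesseract", quietly = TRUE) && file.exists(imgFile)) {
  library(Rtesseract)

  api <- tesseract()                # create the tesseract object

  # With the defaults check = TRUE and load = TRUE, this checks that the file
  # exists and loads it as the current image.
  SetInputName(api, imgFile)
  print(GetInputName(api))

  # Use a symbolic name for the page segmentation mode, then query it.
  SetPageSegMode(api, "psm_auto")
  print(GetPageSegMode(api))

  # Set the resolution after the image is in place (300 is an assumed value).
  SetSourceResolution(api, 300L)
  print(GetSourceYResolution(api))

  # Restrict processing to a 300 x 100 box whose top-left corner is (10, 20).
  SetRectangle(api, 10, 20, 300, 100)

  ran <- TRUE
}
```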


duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.