Rtesseract: Interface to the tesseract OCR system

tesseractFuns

R Documentation

Set and Query the Tesseract Object

Description

These functions provide ways to both set and query the state of the tesseract object.

Usage

ReadConfigFile(api, files, debug = FALSE, ok = FALSE)
SetImage(api, pix)
SetInputName(api, name, check = TRUE, load = TRUE)
GetInputName(api)
GetImage(api, asArray = FALSE)

SetPageSegMode(api, mode)
GetPageSegMode(api)

SetRectangle(api, ..., dims = sapply(list(...), as.integer))

SetSourceResolution(api, ppi)
GetSourceYResolution(api)

SetOutputName(api, filename)

ProcessPages(filename, api = tesseract(), timeout = 0L, out = tempfile())
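The default for `dims` in the `SetRectangle` usage line is `sapply(list(...), as.integer)`. This coerces the four values passed through `...` (left, top, width, height) into an integer vector. The base-R sketch below mirrors that default expression. It also shows how corner coordinates can be converted into the left, top, width, height form that `dims` expects. Both `collectDims` and `cornersToDims` are hypothetical helpers and are not part of Rtesseract. `cornersToDims` assumes width = x2 - x1 and height = y2 - y1, which treats the bottom-right corner as exclusive.

```r
# Hypothetical helper (not in Rtesseract) mirroring the documented default
#   dims = sapply(list(...), as.integer)
# It gathers the separate values in ... into one integer vector
# (as.integer truncates non-integer values toward zero).
collectDims <- function(...) sapply(list(...), as.integer)

# Hypothetical helper (not in Rtesseract): convert corner coordinates
# (x1, y1, x2, y2) into (left, top, width, height).
# Assumption: width = x2 - x1 and height = y2 - y1 (exclusive corner).
cornersToDims <- function(x1, y1, x2, y2) {
  stopifnot(x2 >= x1, y2 >= y1)
  as.integer(c(x1, y1, x2 - x1, y2 - y1))
}

collectDims(100, 250, 400.9, 60)      # returns c(100L, 250L, 400L, 60L)
cornersToDims(100, 250, 500, 310)     # returns c(100L, 250L, 400L, 60L)
```

You can then supply the converted vector through the `dims` argument instead of `...`, e.g. `SetRectangle(api, dims = cornersToDims(100, 250, 500, 310))`. This avoids passing corner coordinates by mistake.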

Arguments

`api`	the instance of the `TesseractBaseAPI-class` in which to perform the operations.
`pix`	an object of class `Pix` from `pixRead`.
`asArray`	a logical value. If `FALSE`, the image is returned as a reference to the internal C++ object. If `TRUE`, the contents of the C++ image are returned as a 3-dimensional array.
`ppi`	the resolution of the source image, in pixels per inch, as an integer.
`dims`	a vector of length 4 giving the location of the rectangle as x1, y1, width, height, i.e. its left and top coordinates followed by its size. These are NOT the coordinates of the top-left and bottom-right corners of the rectangle, i.e. not `(x1, y1, x2, y2)`: the 3rd and 4th values are the width and height of the box.
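`...`	the left, top, width and height of the rectangle, given as four separate values. By default these are coerced to integers and combined to form `dims`.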
`files`	a character vector specifying the full or relative paths to the configuration files.
`name`	the name of the file being processed by the OCR system.
`ok`	in `ReadConfigFile`, tesseract can locate configuration files in its data directory (typically /usr/local/share/tessdata, or the directory specified with the environment variable `TESSDATA`).
`mode`	the value for the page segmentation mode of the tesseract instance. This must be one of the values in `PageSegModeValues` or the corresponding R variables. Alternatively, one can use the symbolic names (in lower or upper case) from this vector, e.g., `"psm_auto"`.
`check`	a logical value. If `TRUE`, check that the file actually exists.
`load`	a logical value. If `TRUE`, load the image in the named file and set it as the current image in the tesseract object.
`filename`, `out`	the name of a file: either the image file to process or the output file to which the results of the OCR will be directed, if that occurs. The latter is rarely of interest, as this information can be obtained directly from the `TesseractBaseAPI` object.
`timeout`	how long any single step in the processing is allowed to take before the entire process is terminated. This is almost always 0 and so is rarely specified.
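According to the `mode` argument, `SetPageSegMode` accepts either a value from `PageSegModeValues` or one of its symbolic names in lower or upper case. The base-R sketch below shows one way such case-insensitive resolution can work. `psmValues` is a hypothetical stand-in for `PageSegModeValues`, with names and codes taken from tesseract's C++ `PageSegMode` enum; the package's own vector may differ. `resolvePageSegMode` is likewise a hypothetical helper, not the package's implementation.

```r
# Hypothetical stand-in for PageSegModeValues (assumption: names and codes
# follow tesseract's C++ PageSegMode enum; check the package's own vector).
psmValues <- c(PSM_OSD_ONLY = 0L, PSM_AUTO_OSD = 1L, PSM_AUTO_ONLY = 2L,
               PSM_AUTO = 3L, PSM_SINGLE_COLUMN = 4L,
               PSM_SINGLE_BLOCK_VERT_TEXT = 5L, PSM_SINGLE_BLOCK = 6L,
               PSM_SINGLE_LINE = 7L, PSM_SINGLE_WORD = 8L,
               PSM_CIRCLE_WORD = 9L, PSM_SINGLE_CHAR = 10L,
               PSM_SPARSE_TEXT = 11L, PSM_SPARSE_TEXT_OSD = 12L,
               PSM_RAW_LINE = 13L)

# Hypothetical resolver (not in Rtesseract): map a symbolic name in any
# case, or a numeric code, to the integer page segmentation mode.
resolvePageSegMode <- function(mode, values = psmValues) {
  stopifnot(length(mode) == 1)
  if (is.character(mode)) {
    # Names are stored in upper case, so upper-case the input before matching.
    i <- match(toupper(mode), names(values))
    if (is.na(i)) stop("unknown page segmentation mode: ", mode)
    return(values[[i]])  # [[ drops the name, leaving a bare integer
  }
  mode <- as.integer(mode)
  if (!(mode %in% values)) stop("invalid page segmentation mode: ", mode)
  mode
}

resolvePageSegMode("psm_auto")         # returns 3L
resolvePageSegMode("PSM_SINGLE_LINE")  # returns 7L
resolvePageSegMode(6)                  # returns 6L
```

This approach matches names against the vector itself instead of a separate table, so the set of accepted names always stays in sync with the values.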

Author(s)

Duncan Temple Lang

duncantl/Rtesseract
Interface to the tesseract OCR system

Old/tesseractFuns: Set and Query the Tesseract Object
In duncantl/Rtesseract: Interface to the tesseract OCR system
