In duncantl/Rtesseract: Interface to the tesseract OCR system

SetPageSegMode

R Documentation

Set and Query the Page Segmentation Mode for Tesseract Instance

Description

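These functions set and query the page segmentation mode of a Tesseract instance. The mode controls how Tesseract divides the image before recognition: for example, as a whole page with automatic layout analysis, or as a single block of text, a single line, a single word, or a single character.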

Usage

SetPageSegMode(api, mode)
GetPageSegMode(api)

Arguments

`api`	an instance of the `TesseractBaseAPI-class` obtained from a call to `tesseract`
`mode`	a value of class `PageSegMode`, or a value that can be coerced to one: an integer from the values of the named vector `PageSegModeValues`, a character string from its names, or one of the corresponding variables, e.g. `PSM_OSD_ONLY`.
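As a sketch of the forms the `mode` argument accepts, the calls below should all request the same mode. This assumes that the exported variable `PSM_AUTO` exists alongside `PSM_OSD_ONLY`, and that `PageSegModeValues` is a named vector whose names are the accepted strings, as described above.

```r
library(Rtesseract)

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)

# The names of PageSegModeValues are the accepted mode strings
names(PageSegModeValues)

# Three ways of requesting the same mode (assumes PSM_AUTO is an exported variable)
SetPageSegMode(ts, "PSM_AUTO")                       # a name, as a character string
SetPageSegMode(ts, PSM_AUTO)                         # the corresponding variable
SetPageSegMode(ts, PageSegModeValues[["PSM_AUTO"]])  # the corresponding value
GetPageSegMode(ts)
```

The string form is usually the most readable; the integer form is convenient when the mode is computed or stored elsewhere.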

Value

`SetPageSegMode` is called for its side effect of setting the page segmentation mode in the Tesseract instance `api`.

`GetPageSegMode` returns the current mode of `api` as an object of class `PageSegMode`.
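A minimal sketch of using the two functions together: because `SetPageSegMode` works by side effect, one can save the value returned by `GetPageSegMode`, change the mode temporarily, and then restore the saved value. This assumes that the saved `PageSegMode` object can be passed back as `mode`, which the argument description permits for values of that class.

```r
library(Rtesseract)

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)

old = GetPageSegMode(ts)         # save the current mode (a PageSegMode object)
SetPageSegMode(ts, "PSM_AUTO")   # called for its side effect on ts
b = GetBoxes(ts)                 # OCR results computed under PSM_AUTO
SetPageSegMode(ts, old)          # restore the saved mode
GetPageSegMode(ts)
```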

Author(s)

Duncan Temple Lang

References

Tesseract `TessBaseAPI` class documentation: http://zdenop.github.io/tesseract-doc/classtesseract_1_1_tess_base_a_p_i.html

Examples

library(Rtesseract)

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)
# Query the current (default) page segmentation mode
GetPageSegMode(ts)
b = GetBoxes(ts)

# See whether any of the recognized elements are blank (whitespace only),
# which may correspond to lines rather than text
i = grep("^[[:space:]]*$", rownames(b))
b[i, ]

# Change to PSM_AUTO mode and recompute the boxes
SetPageSegMode(ts, "PSM_AUTO")
GetPageSegMode(ts)
b1 = GetBoxes(ts)
# Find the potential lines
i = grep("^[[:space:]]*$", rownames(b1))
b1[i, ]

# The last of these is the large black rectangle at the bottom of the
# page, caused by the page not covering the entire scanner bed.

duncantl/Rtesseract documentation built on Sept. 8, 2024, 8:38 a.m.