SetPageSegMode: Set and Query the Page Segmentation Mode for Tesseract...

SetPageSegMode R Documentation

Set and Query the Page Segmentation Mode for Tesseract Instance

Description

These functions set and query the page segmentation mode of a tesseract instance, i.e., the level at which the OCR is done: lines, words, characters, etc.

Usage

SetPageSegMode(api, mode)
GetPageSegMode(api)

Arguments

api

an instance of class TesseractBaseAPI, obtained from a call to tesseract.

mode

a value of class PageSegMode, or a value that can be coerced to that class. This can be an integer or a character string matching one of the values or names of the vector PageSegModeValues, or any of the corresponding variables, e.g. PSM_OSD_ONLY.

Value

SetPageSegMode is called for its side effect of setting the page segmentation mode in the tesseract instance.

GetPageSegMode returns an object of class PageSegMode.

Author(s)

Duncan Temple Lang

References

API http://zdenop.github.io/tesseract-doc/classtesseract_1_1_tess_base_a_p_i.html

See Also

tesseract, Recognize, GetText, etc.

Examples

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)
GetPageSegMode(ts)
b = GetBoxes(ts)

# See if any of the matched elements are blank spaces corresponding to lines
i = grep("^[[:space:]]*$", rownames(b))
b[i,]

# Change to PSM_AUTO mode
SetPageSegMode(ts, 'PSM_AUTO')
GetPageSegMode(ts)
b1 = GetBoxes(ts)
# Find the potential lines
i = grep("^[[:space:]]*$", rownames(b1))
b1[i,]

# The last of these is the large black rectangle at the bottom of the
# page due to the page not covering the entire scanner bed.
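
The description of the mode argument suggests several interchangeable ways of naming a mode. The following is a minimal sketch of those forms, together with saving and restoring the mode. It assumes that the variable PSM_AUTO and the vector PageSegModeValues are visible once Rtesseract is attached, and that the object returned by GetPageSegMode can be passed back to SetPageSegMode unchanged; neither assumption is confirmed by this page.

```r
library(Rtesseract)

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)

# Remember the current mode so that it can be restored afterwards.
old = GetPageSegMode(ts)

# Three ways of requesting the same mode (assumed to be equivalent):
SetPageSegMode(ts, "PSM_AUTO")                     # by name
SetPageSegMode(ts, PSM_AUTO)                       # via the corresponding variable
SetPageSegMode(ts, PageSegModeValues["PSM_AUTO"])  # via the value looked up by name
GetPageSegMode(ts)

# Restore the original mode (assumes a PageSegMode object is accepted as 'mode').
SetPageSegMode(ts, old)
restored = GetPageSegMode(ts)
restored
```

Looking the value up by name in PageSegModeValues avoids hard-coding the underlying integer.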

duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.