SetPageSegMode | R Documentation |
These functions query and set the page segmentation mode of a tesseract instance, which determines the level at which the OCR is done: lines, words, characters, etc.
SetPageSegMode(api, mode)
GetPageSegMode(api)
api |
an instance of the |
mode |
a value of class |
SetPageSegMode is used for its side effect of setting the page segmentation mode in the tesseract instance.
GetPageSegMode returns an object of class PageSegMode.
Duncan Temple Lang
API http://zdenop.github.io/tesseract-doc/classtesseract_1_1_tess_base_a_p_i.html
tesseract, Recognize, GetText, etc.
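# Locate the sample image shipped with the package and create a tesseract instance for it
f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)
# Query the current page segmentation mode, then get the bounding boxes found in that mode
GetPageSegMode(ts)
b = GetBoxes(ts)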
# See if any of the matched elements are blank spaces corresponding to lines
i = grep("^[[:space:]]*$", rownames(b))
# Change to PSM_AUTO mode
SetPageSegMode(ts, 'PSM_AUTO')
GetPageSegMode(ts)
b1 = GetBoxes(ts)
# Find the potential lines
i = grep("^[[:space:]]*$", rownames(b1))
b1[i,]
# The last of these is the large black rectangle at the bottom of the
# page due to the page not covering the entire scanner bed.
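Because SetPageSegMode changes the tesseract instance in place, it can be useful to remember the previous mode and put it back afterwards. The following is a minimal sketch of that pattern, not part of the original examples. It assumes that the PageSegMode object returned by GetPageSegMode can be passed back to SetPageSegMode as the mode argument; the variable name old is purely illustrative.

```r
library(Rtesseract)

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)

# Remember the mode the instance currently uses
old = GetPageSegMode(ts)

# Switch modes temporarily and collect the boxes found in that mode
SetPageSegMode(ts, 'PSM_AUTO')
b_auto = GetBoxes(ts)

# Restore the earlier mode.
# Assumption: SetPageSegMode accepts the PageSegMode object
# returned by GetPageSegMode, not only a mode name.
SetPageSegMode(ts, old)
GetPageSegMode(ts)
```

If the assumption does not hold for your version of the package, pass the mode by name instead, as in the example above.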