SetPageSegMode | R Documentation |
These functions set and query the page segmentation mode, which controls the level at which the OCR is done: lines, words, characters, etc.
SetPageSegMode(api, mode)
GetPageSegMode(api)
api |
an instance of the |
mode |
a value of class |
SetPageSegMode is used for its side-effect of setting the value in the tesseract instance.
GetPageSegMode returns an object of class PageSegMode.
Duncan Temple Lang
API http://zdenop.github.io/tesseract-doc/classtesseract_1_1_tess_base_a_p_i.html
tesseract, Recognize, GetText, etc.
f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)
GetPageSegMode(ts)
b = GetBoxes(ts)

# See if any of the matched elements are blank spaces corresponding to lines
i = grep("^[[:space:]]*$", rownames(b))

# Change to PSM_AUTO mode
SetPageSegMode(ts, 'PSM_AUTO')
GetPageSegMode(ts)
b1 = GetBoxes(ts)

# Find the potential lines
i = grep("^[[:space:]]*$", rownames(b1))
b1[i,]
# The last of these is the large black rectangle at the bottom of the
# page due to the page not covering the entire scanner bed.
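Because SetPageSegMode changes the tesseract instance in place, it can be convenient to switch modes only for the duration of one call. The following is a minimal sketch of that pattern, not part of Rtesseract: the helper name withPageSegMode is hypothetical, and it assumes that SetPageSegMode accepts the PageSegMode object returned by GetPageSegMode as its mode argument.

```r
library(Rtesseract)

# Hypothetical helper (not provided by Rtesseract): apply `fun` to `api`
# under a temporary page segmentation mode, then restore the previous mode.
withPageSegMode = function(api, mode, fun) {
    old = GetPageSegMode(api)
    # Assumption: the object returned by GetPageSegMode() is a valid `mode`
    # argument for SetPageSegMode().
    on.exit(SetPageSegMode(api, old), add = TRUE)
    SetPageSegMode(api, mode)
    fun(api)
}

f = system.file("images", "1990-10.png", package = "Rtesseract")
ts = tesseract(f)

# Collect the boxes under PSM_AUTO without leaving the instance in that mode
b.auto = withPageSegMode(ts, 'PSM_AUTO', GetBoxes)

# Inspect the mode after the helper has returned
GetPageSegMode(ts)
```

Using on.exit means the previous mode is put back even if the call to fun stops with an error.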