compareWord | R Documentation |
If we have the true text, we can use compareWords
and other functions to compare the OCR results to the truth
and determine which symbols were matched incorrectly.
The results can then be displayed on the image and the
incorrect symbols identified.
compareWords
processes a collection of words;
compareWordInfo
processes a single word and the corresponding
true/actual value for the word.
compareWords(ocr, truth)
compareWordInfo(ocr, truth)
ocr |
the words from the OCR classification |
truth |
the true words |
compareWords
returns a data frame
with a row for each symbol/character that was different between
the OCR version and the truth.
The data frame contains the following columns:
ocr |
the character recognized by the OCR system, incorrectly |
truth |
the true value of the character |
position |
the index in the word of the misclassified character/symbol |
wordIndex |
the index of the word in which the misclassification occurred |
trueWord |
the value of the true word |
ocrWord |
the value of the word as recognized by the OCR system |
symbolIndex |
the index of the character/symbol in the entire set of symbols |
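To make the column layout concrete, here is a minimal base-R sketch that builds a data frame of the same shape. It is not the package's implementation: sketchCompareWord and sketchCompareWords are hypothetical names, the words are assumed to have equal length, and symbolIndex is assumed to count characters consecutively across the true words with no separators.

```r
# Hypothetical helper (not from the package): compare one OCR word with its
# true value character by character. Assumes both have the same length.
sketchCompareWord <- function(ocr, truth) {
  stopifnot(nchar(ocr) == nchar(truth))
  o <- strsplit(ocr, "")[[1]]
  t <- strsplit(truth, "")[[1]]
  bad <- which(o != t)                 # positions where the characters differ
  data.frame(ocr = o[bad], truth = t[bad], position = bad,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: apply the comparison to each word pair and add the
# word-level columns described above.
sketchCompareWords <- function(ocr, truth) {
  stopifnot(length(ocr) == length(truth))
  # Assumption: symbols are numbered consecutively across all true words.
  offsets <- cumsum(c(0L, nchar(truth)))[seq_along(truth)]
  rows <- lapply(seq_along(ocr), function(i) {
    d <- sketchCompareWord(ocr[i], truth[i])
    if (nrow(d) == 0L) return(NULL)    # word matched exactly, no rows
    d$wordIndex <- i
    d$trueWord <- truth[i]
    d$ocrWord <- ocr[i]
    d$symbolIndex <- offsets[i] + d$position
    d
  })
  do.call(rbind, rows)                 # NULL if every word matched
}

res <- sketchCompareWords(c("Duncin", "Temple", "Lung"),
                          c("Duncan", "Temple", "Lang"))
print(res)
```

In this sketch, a word that matches exactly contributes no rows, so only the first and third words appear in the result.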
This function does not yet handle the case where the OCR and true words do not have the same length.
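Given this limitation, one possible workaround is to set aside word pairs whose lengths differ before calling compareWords. The snippet below is only a sketch with made-up input; the variable name sameLength is an assumption, and the compareWords call is commented out because it needs the package to be loaded.

```r
ocr   <- c("Duncin", "Temple", "Lng")   # illustrative input; "Lng" is too short
truth <- c("Duncan", "Temple", "Lang")

# TRUE where the OCR word and the true word have the same number of characters
sameLength <- nchar(ocr) == nchar(truth)

# Compare only the pairs that the function can handle (requires the package):
# compareWords(ocr[sameLength], truth[sameLength])

# Indices of the word pairs that were skipped and need separate inspection
which(!sameLength)
```

Note that after filtering, any word index in the result refers to positions in the filtered vectors, not in the original ones.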
Duncan Temple Lang
Tesseract https://code.google.com/p/tesseract-ocr/
tesseract, GetText, GetConfidences
compareWords(c("Duncin", "Temple", "Lung"), c("Duncan", "Temple", "Lang"))