compareWord: Compares OCR words to truth

View source: R/compareWords.R

compareWord    R Documentation

Compares OCR words to truth

Description

If the true text is available, we can use compareWords and related functions to compare the OCR results to the truth and determine which symbols were recognized incorrectly. The results can then be displayed on the image and the incorrect symbols identified.

compareWords processes a collection of words; compareWordInfo processes a single word and the corresponding true (actual) value for that word.

Usage

compareWords(ocr, truth)
compareWordInfo(ocr, truth)

Arguments

ocr

the words from the OCR classification

truth

the true words

Value

compareWords returns a data frame with one row for each symbol/character that differs between the OCR version and the truth. The data frame contains the following columns:

ocr

the character as (incorrectly) recognized by the OCR system

truth

the true value of the character

position

the index in the word of the misclassified character/symbol

wordIndex

the index of the word in which the misclassification occurred

trueWord

the value of the true word

ocrWord

the value of the word as recognized by the OCR system

symbolIndex

the index of the character/symbol in the entire set of symbols
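
To make the column layout concrete, the following is a minimal base R sketch that builds a data frame with the same columns. It is not the Rtesseract implementation: the function name sketchCompareWords is hypothetical, and the sketch assumes that symbolIndex counts characters across all of the words concatenated in order, which the description above does not spell out.

```r
# Hypothetical illustration only; NOT the code in R/compareWords.R.
sketchCompareWords <- function(ocr, truth) {
    # Like the documented function, this sketch requires equal-length word pairs.
    stopifnot(length(ocr) == length(truth), all(nchar(ocr) == nchar(truth)))

    # Number of characters preceding each word in the full symbol sequence
    # (assumed meaning of symbolIndex).
    offsets <- cumsum(c(0L, nchar(truth)))[seq_along(truth)]

    rows <- lapply(seq_along(truth), function(i) {
        o <- strsplit(ocr[i], "")[[1]]     # characters of the OCR word
        t <- strsplit(truth[i], "")[[1]]   # characters of the true word
        bad <- which(o != t)               # positions that disagree
        data.frame(ocr = o[bad],
                   truth = t[bad],
                   position = bad,
                   wordIndex = rep(i, length(bad)),
                   trueWord = rep(truth[i], length(bad)),
                   ocrWord = rep(ocr[i], length(bad)),
                   symbolIndex = offsets[i] + bad,
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
}

res <- sketchCompareWords(c("Duncin", "Temple", "Lung"),
                          c("Duncan", "Temple", "Lang"))
print(res)
```

With these inputs the sketch yields two rows: one for the "i" read in place of "a" at position 5 of the first word, and one for the "u" read in place of "a" at position 2 of the third word. The second word matches and contributes no rows.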

Note

This function does not yet handle the case where the OCR and true words do not have the same length.
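
Given this limitation, one possible workaround is to set aside word pairs of unequal length before calling compareWords. The sketch below uses only base R; the variable names are illustrative, and the final call is left commented out because it requires Rtesseract to be installed.

```r
ocr   <- c("Duncin", "Temple", "Lng")
truth <- c("Duncan", "Temple", "Lang")

# TRUE where the OCR word and the true word have the same number of characters.
sameLength <- nchar(ocr) == nchar(truth)

# Indices of the pairs that cannot be compared character by character.
skipped <- which(!sameLength)

# Keep only the comparable pairs.
ocrOK   <- ocr[sameLength]
truthOK <- truth[sameLength]

# compareWords(ocrOK, truthOK)   # requires the Rtesseract package
```

Note that any wordIndex values returned for the filtered vectors refer to positions in ocrOK, not in the original ocr; which(sameLength) maps them back to the original positions.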

Author(s)

Duncan Temple Lang

References

Tesseract https://code.google.com/p/tesseract-ocr/

See Also

tesseract, GetText, GetConfidences

Examples

library(Rtesseract)
compareWords(c("Duncin", "Temple", "Lung"), c("Duncan", "Temple", "Lang"))

duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.