ocr_dictionary: Check the words in text against a dictionary
In lmullen/ocrquality: Measures of OCR Quality


View source: R/ocr_dictionary.R

This function checks the quality of an OCR text against a dictionary. It returns a number between 0 and 1: the ratio of words found in the dictionary to the total number of words in the document. The higher the number, the better the quality of the OCR. These scores should not be read in an absolute sense: a score of 1 does not indicate perfect OCR. Use them only to compare the relative quality of OCR within a corpus of texts. You can pass a character vector of any length, so if you split a text into chunks, you can evaluate the OCR quality of each chunk.

ocr_dictionary(text, sample_size = -1L)

`text`	A character vector.
`sample_size`	If this value is positive, then this many words from the `text` will be selected for comparison. This is useful for large texts.

A vector of numeric values between 0 and 1.
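
The ratio described above, and the effect of `sample_size`, can be illustrated with a minimal base R sketch. This is not the package's implementation: `dictionary_ratio()` and `toy_dictionary` are hypothetical names introduced here, the tokenization rule is an assumption, and the tiny word list only stands in for a real dictionary.

```r
# Toy word list standing in for a real dictionary (hypothetical).
toy_dictionary <- c("score", "and", "seven", "years", "ago", "our", "fathers")

# dictionary_ratio() is an illustrative helper, not part of ocrquality.
dictionary_ratio <- function(text, dictionary, sample_size = -1L) {
  vapply(text, function(doc) {
    # Lowercase, then split on anything that is not a letter or apostrophe.
    words <- strsplit(tolower(doc), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) == 0L) return(NA_real_)
    # Optionally score only a random sample of the words.
    if (sample_size > 0L && sample_size < length(words)) {
      words <- sample(words, sample_size)
    }
    # Share of words that appear in the dictionary.
    mean(words %in% dictionary)
  }, numeric(1), USE.NAMES = FALSE)
}

# One score per element: 4 of 6 words match in the first, 2 of 2 in the second.
dictionary_ratio(c("Fourr score and sleven years ago", "our fathers"),
                 toy_dictionary)
```

Because a positive `sample_size` draws words at random in this sketch, repeated calls on the same text may give slightly different scores; the full-text ratio is deterministic.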

paragraph <- "Fourr score and sleven years ago our fathers brought
  forth on this continent, a new nation, conceived in Liberty,
  and dedicated to tlhe proposition that all men are created equal."

ocr_dictionary(paragraph)
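
Since `text` can be a character vector of any length, one way to score a text chunk by chunk is to split it first and pass the pieces together. The sketch below assumes fixed-size word chunks; `chunk_words()` is a hypothetical helper, not part of the package, and the call to `ocr_dictionary()` runs only if ocrquality is installed.

```r
# chunk_words() is an illustrative helper: it splits a text into
# consecutive chunks of `size` words each (the last may be shorter).
chunk_words <- function(text, size = 10L) {
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words)]
  groups <- ceiling(seq_along(words) / size)
  unname(vapply(split(words, groups), paste, character(1), collapse = " "))
}

paragraph <- "Fourr score and sleven years ago our fathers brought
  forth on this continent, a new nation, conceived in Liberty,
  and dedicated to tlhe proposition that all men are created equal."

chunks <- chunk_words(paragraph, size = 10L)

# Score the chunks only when the package is available.
if (requireNamespace("ocrquality", quietly = TRUE)) {
  print(ocrquality::ocr_dictionary(chunks))
}
```

Very small chunks give noisy ratios, so the chunk size is a trade-off between locating bad passages and getting a stable score.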