get_embed_pages: Return a vector of pages with embedded text


View source: R/gold_std.R

Description

A PDF with a text layer may also contain image-only pages. To find errors, we need to OCR only the (degraded) pages with embedded text, because finding differences between the BAD and GOLD versions of the text depends on counts of each word. If OCR picks up words on pages that pdf_text cannot see because they are in an image, then GOLD != BAD because of different inputs.

Usage

get_embed_pages(txt)

Arguments

txt

A character vector of page text (one element per page), as returned by pdftools::pdf_text

Value

A vector of pages with an embedded text layer

Examples

# to be added
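
The idea in the Description can be illustrated with a minimal sketch. This is not the package's implementation: `pages_with_text` is a hypothetical helper, the input vector only simulates what pdftools::pdf_text returns (one string per page), and returning page indices rather than page text is an assumption.

```r
# Hypothetical helper, not part of ocrerrors: return the indices of pages
# whose extracted text contains at least one non-whitespace character.
pages_with_text <- function(txt) {
  which(nzchar(trimws(txt)))
}

# Simulated output of pdftools::pdf_text(): one string per page.
# Pages 2 and 4 stand in for image-only pages with no usable text layer.
txt <- c(
  "Degraded OCR text on page one",
  "",
  "More embedded text",
  "  \n "
)

embedded <- pages_with_text(txt)
print(embedded)
```

Only the pages listed in `embedded` would then be passed to OCR, so that the GOLD and BAD word counts are computed from the same set of pages.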

jacob-ogre/ocrerrors documentation built on May 18, 2019, 8:01 a.m.