get_embed_pages: Return a vector of pages with embedded text
In jacob-ogre/ocrerrors: Find Optical Character Recognition Errors and Corrections


A PDF with a text layer may also contain image-only pages. To find errors, we need to OCR only the (degraded) pages that have embedded text, because finding differences between the BAD and GOLD versions of the text depends on the counts of each word. If OCR picks up words on pages that pdf_text cannot see because they are in an image, then GOLD != BAD simply because the inputs differ.
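
The page selection described above might be sketched as follows. This is an illustration only, not the package's actual get_embed_pages implementation: the helper names pages_with_text and embed_pages_sketch are hypothetical, and the sketch assumes that a page counts as having embedded text when pdftools::pdf_text returns a non-blank string for it.

```r
# Hypothetical helper: given one string per page (the shape returned by
# pdftools::pdf_text), return the indices of pages with embedded text.
# Assumption: a page whose text is empty or whitespace-only is image-only.
pages_with_text <- function(page_text) {
  which(nzchar(trimws(page_text)))
}

# Hypothetical wrapper around a PDF file; requires the pdftools package.
embed_pages_sketch <- function(pdf_file) {
  pages_with_text(pdftools::pdf_text(pdf_file))
}

# Stand-in page text: pages 2 and 3 have no usable text layer.
pages <- c("Recovery plan text", "", "  \n", "Literature cited")
pages_with_text(pages)
# [1] 1 4
```

Only the pages returned here would then be degraded and passed to OCR, so the GOLD and BAD word counts are built from the same set of pages.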