get_gold: Extract text from a PDF with embedded text.
In jacob-ogre/ocrerrors: Find Optical Character Recognition Errors and Corrections

Description

Uses pdftools::pdf_text to extract the text layer from the PDF at 'file'. This text serves as the 'gold standard' against which OCR'd versions are compared. The function checks that the text layer was distilled from the original document rather than produced by OCR (e.g., by a scanner that OCRs its output).

Usage

get_gold(file, write = FALSE, save = TRUE)

Arguments

`file`	Path to the PDF to be processed
`write`	Whether to write the text to file [default: FALSE]
`save`	Whether to save the text as a .rda [default: TRUE]

Value

A list of pages with the text layer if the layer is not from OCR; otherwise NULL.

Examples

# res <- get_gold("test.pdf", "GOLDs")
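
The description says the function reads the text layer and rejects layers that came from OCR, but it does not say how that check is made. The following is a minimal sketch of one plausible approach, not the package's actual implementation. It assumes the PDF's Producer metadata hints at the layer's origin. The names `looks_like_ocr` and `gold_sketch` and the pattern list are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: guess whether a PDF "Producer" string suggests that
# the text layer was generated by OCR. The patterns are illustrative
# assumptions, not a list taken from ocrerrors.
looks_like_ocr <- function(producer) {
  if (length(producer) != 1 || is.na(producer) || !nzchar(producer)) {
    return(FALSE)
  }
  grepl("ocr|tesseract|abbyy|scan", producer, ignore.case = TRUE)
}

# Hypothetical sketch of the overall flow: return a list of page texts,
# or NULL if the layer looks OCR-derived or is empty.
gold_sketch <- function(file) {
  info <- pdftools::pdf_info(file)      # document metadata
  if (looks_like_ocr(info$keys$Producer)) {
    return(NULL)
  }
  pages <- pdftools::pdf_text(file)     # one string per page
  if (!any(nzchar(trimws(pages)))) {
    return(NULL)                        # no usable text layer
  }
  as.list(pages)
}
```

With pdftools installed, `gold_sketch("test.pdf")` would return either a list with one element per page or NULL. A metadata-based check is only a heuristic: a producer string can be missing or misleading, so real use would likely need additional checks.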

jacob-ogre/ocrerrors documentation built on May 18, 2019, 8:01 a.m.
