OCR_document: Scan PDF with optical character recognition (OCR)
In arete: Automated REtrieval from TExt

OCR_document

R Documentation

Scan PDF with optical character recognition (OCR)

Description

Extract text stored as images in a PDF using optical character recognition (OCR) software. Two methods are currently available: method = "nougat" and method = "tesseract".

Usage

OCR_document(in_path, out_path, method = "nougat", verbose = TRUE)

Arguments

`in_path`	character. Path to a file with species data in either pdf or txt format, e.g. ./folder/file.pdf
`out_path`	character. Path to the output directory, e.g. path/to/dir
`method`	character. Method used for the OCR, either "nougat" (the default) or "tesseract".
`verbose`	logical. Whether to print the output once processing finishes.
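
The arguments above can be checked before an OCR run is started. The following is a minimal sketch, not part of arete: `check_ocr_args` is a hypothetical helper, and it assumes only that `method` must be one of the two values named in the Description and that `in_path` must point to an existing file.

```r
# Hypothetical pre-flight check (not part of arete): validates the
# arguments described above before they are handed to OCR_document().
check_ocr_args <- function(in_path, out_path,
                           method = c("nougat", "tesseract")) {
  # match.arg() picks the first choice ("nougat") when method is left at
  # its default and raises an error for any value outside the choices.
  method <- match.arg(method)

  # Both paths are expected to be single character strings.
  stopifnot(is.character(in_path), length(in_path) == 1L)
  stopifnot(is.character(out_path), length(out_path) == 1L)

  # Fail early if the input file is missing.
  if (!file.exists(in_path)) {
    stop("Input file not found: ", in_path)
  }

  list(in_path = in_path, out_path = out_path, method = method)
}
```

The returned list could then be passed on, for example `args <- check_ocr_args("path/to/file.pdf", "path/to/dir", "tesseract")` followed by `OCR_document(args$in_path, args$out_path, method = args$method)`.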

Details

For now, OCR processing of documents is only supported on Linux systems.
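
Because of this platform restriction, a script that must run on several operating systems may want to guard the call. The sketch below is one possible approach, not arete's own behaviour: `is_linux` and `ocr_if_supported` are hypothetical helpers, and the check relies on base R's `Sys.info()`.

```r
# Hypothetical helper (not part of arete): TRUE when the reported
# operating system name is Linux. Sys.info() may return NULL on some
# platforms, in which case this yields FALSE.
is_linux <- function(sysname = Sys.info()[["sysname"]]) {
  identical(tolower(sysname), "linux")
}

# Hypothetical wrapper: only attempt OCR on the platform the Details
# section names as supported; otherwise warn and return NULL invisibly.
ocr_if_supported <- function(in_path, out_path, method = "nougat") {
  if (!is_linux()) {
    warning("OCR is documented as Linux-only; skipping ", in_path)
    return(invisible(NULL))
  }
  arete::OCR_document(in_path, out_path, method = method)
}
```

Returning `NULL` instead of stopping lets a batch job continue on unsupported systems; replacing `warning()` with `stop()` would make the failure explicit instead.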

Value

character. Contains the extracted information.

Examples

## Not run: 
OCR_document("path/to/file.pdf", "path/to/dir")

## End(Not run)

arete documentation built on Nov. 5, 2025, 6:31 p.m.
