View source: R/03_OCR_document.R
| OCR_document | R Documentation |
Extract text stored as images in a PDF using optical character
recognition (OCR) software. Two methods are currently available:
method = "nougat" and method = "tesseract".
OCR_document(in_path, out_path, method = "nougat", verbose = TRUE)
in_path |
character. Path to a file with species data in either pdf or txt format, e.g: ./folder/file.pdf |
out_path |
character. Path where the output is written, e.g: path/to/dir |
method |
character. Method used for the OCR, either "nougat" (the default) or "tesseract". |
verbose |
logical. Print output when finished. |
OCR processing of documents is currently supported only on Linux systems.
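Because of the Linux-only restriction, a script may want to check the platform before attempting OCR. The following is a minimal sketch, not part of the package: ocr_supported() is a hypothetical helper that relies only on base R's Sys.info().

```r
# Hypothetical helper (not part of the package): TRUE only when the
# reported operating system name is Linux.
ocr_supported <- function(sysname = Sys.info()[["sysname"]]) {
  identical(tolower(sysname), "linux")
}

if (ocr_supported()) {
  message("Linux detected: OCR processing should be available.")
} else {
  message("OCR processing is documented as Linux-only; skipping.")
}
```

Passing the system name as an argument keeps the check easy to exercise on any platform.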
character. Containing the extracted information.
arete_setup
## Not run:
OCR_document("path/to/file.pdf", "path/to/dir")
## End(Not run)
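The example above uses the default method. As a hedged sketch, the call below selects method = "tesseract" instead and validates the arguments first. check_ocr_args() is a hypothetical helper introduced here for illustration, and the paths are the same placeholders as in the example above.

```r
# Hypothetical helper (not part of the package): validates arguments
# before they are handed to OCR_document().
check_ocr_args <- function(in_path, out_path,
                           method = c("nougat", "tesseract")) {
  # match.arg() stops on anything other than the two documented methods
  method <- match.arg(method)
  if (!file.exists(in_path)) {
    stop("Input file not found: ", in_path)
  }
  list(in_path = in_path, out_path = out_path, method = method)
}

# Placeholder paths; the call only runs if the function and file exist.
if (exists("OCR_document") && file.exists("path/to/file.pdf")) {
  args <- check_ocr_args("path/to/file.pdf", "path/to/dir",
                         method = "tesseract")
  text <- OCR_document(args$in_path, args$out_path,
                       method = args$method, verbose = FALSE)
}
```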