A large batch of PDFs may contain a mix of text-based and image-based files, and the text of every file may need to be extracted for analysis. This package offers a single primary function that extracts text from PDFs by first trying the poppler library through its wrapper in pdftools; if that fails, ImageMagick, unpaper, and Tesseract are used to perform optical character recognition (OCR).
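The fallback described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not this package's actual function: the names `needs_ocr()` and `extract_text_with_fallback()` are hypothetical, "failure" is assumed to mean that no page yields any non-whitespace text, and the OCR step swaps in `pdftools::pdf_ocr_text()` (which renders pages and calls the tesseract package) in place of the ImageMagick and unpaper preprocessing the package itself uses.

```r
# Hypothetical helper: TRUE when the text layer is effectively empty,
# i.e. no page contains any non-whitespace characters.
needs_ocr <- function(pages) {
  sum(nchar(trimws(pages))) == 0
}

# Hypothetical wrapper illustrating the "text layer first, OCR second" idea.
# Requires the pdftools package, plus tesseract for the OCR fallback.
extract_text_with_fallback <- function(pdf_path) {
  # First attempt: read the embedded text layer through poppler (pdftools).
  pages <- tryCatch(
    pdftools::pdf_text(pdf_path),
    error = function(e) character(0)
  )
  if (!needs_ocr(pages)) {
    return(pages)
  }
  # Fallback (assumption): OCR via pdftools' built-in Tesseract bridge,
  # without the unpaper clean-up step mentioned in the description.
  pdftools::pdf_ocr_text(pdf_path, language = "eng")
}
```

Calling `extract_text_with_fallback("scan.pdf")` on a hypothetical file would return a character vector with one element per page, whichever path produced it.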
|License|BSD_2_clause + file LICENSE|
|Package repository|GitHub|