A large batch of PDFs may contain a mix of text-based and image-based files, and the text must be extracted from all of them for analysis. This package offers a single primary function for that task: it first tries the poppler library's wrapper in pdftools; if that fails, ImageMagick, unpaper, and Tesseract are used to perform optical character recognition (OCR).
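
The fallback idea described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's own function: the name `extract_text_with_fallback` and its arguments are hypothetical, and the OCR step here uses `pdftools::pdf_ocr_text()` (which needs the tesseract R package) as a stand-in for the ImageMagick, unpaper, and Tesseract pipeline the package itself uses.

```r
# Hypothetical helper illustrating "text layer first, OCR as fallback".
# Not the package's actual API; both extractor arguments can be swapped out.
extract_text_with_fallback <- function(path,
                                       extract = pdftools::pdf_text,
                                       ocr = pdftools::pdf_ocr_text) {
  # Try the embedded text layer first; treat any error as "no text found".
  pages <- tryCatch(extract(path), error = function(e) character(0))

  # Image-only PDFs typically yield empty or whitespace-only page strings.
  if (length(pages) > 0 && any(nzchar(trimws(pages)))) {
    return(pages)
  }

  # Assumption: OCR via pdftools stands in for the package's own
  # ImageMagick + unpaper + Tesseract steps.
  ocr(path)
}
```

Applied over a directory, this might look like `lapply(list.files("pdfs", pattern = "\\.pdf$", full.names = TRUE), extract_text_with_fallback)`, where `"pdfs"` is a placeholder path. Treating whitespace-only output as a failure is a design choice of this sketch; the package may detect failure differently.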
Package details

| | |
|---|---|
| Maintainer | |
| License | BSD_2_clause + file LICENSE |
| Version | 0.2.1 |
| Package repository | View on GitHub |
| Installation | Install the latest version of this package by entering the following in R: |