jacob-ogre/pdftext: Extract Text from Text- and Image-based PDFs

A large batch of PDFs may contain a mix of text-based and image-based PDFs, and the text from all of these files must be extracted for analysis. This package offers a single primary function for text extraction. It first tries pdftools, the R wrapper around the Poppler library; if that fails, it falls back to ImageMagick, unpaper, and Tesseract to perform optical character recognition (OCR).
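The try-text-first, fall-back-to-OCR strategy can be sketched in R as below. This is a minimal sketch, not pdftext's own code. The helper names `needs_ocr()`, `ocr_pages()`, and `extract_pdf_text()` are hypothetical. The rule for deciding that text extraction "failed" (fewer than `min_chars` characters after trimming whitespace, or an error) is an assumption. The OCR step also differs from the package: it renders pages with `pdftools::pdf_convert()` and runs `tesseract::ocr()` directly, skipping the ImageMagick and unpaper preprocessing that pdftext performs.

```r
# Hypothetical helper (assumption): treat a PDF as image-based when its
# text layer yields almost no characters.
needs_ocr <- function(pages, min_chars = 10) {
  sum(nchar(trimws(pages))) < min_chars
}

# Hypothetical OCR fallback: render each page to PNG, then OCR it with the
# tesseract R package (no ImageMagick/unpaper cleanup, unlike pdftext).
ocr_pages <- function(path, dpi = 300) {
  images <- pdftools::pdf_convert(path, format = "png", dpi = dpi)
  on.exit(unlink(images))  # remove the temporary page images afterwards
  vapply(images, function(img) tesseract::ocr(img), character(1),
         USE.NAMES = FALSE)
}

# Hypothetical entry point: try the Poppler text layer first, OCR otherwise.
# Extractors are arguments so they can be swapped or stubbed; the defaults
# are only evaluated (and the packages only needed) when actually used.
extract_pdf_text <- function(path,
                             text_fn = pdftools::pdf_text,
                             ocr_fn = ocr_pages,
                             min_chars = 10) {
  # An error from the text extractor counts as "no text layer"
  pages <- tryCatch(text_fn(path), error = function(e) character(0))
  if (needs_ocr(pages, min_chars)) {
    pages <- ocr_fn(path)
  }
  pages  # character vector, one element per page
}
```

With pdftools and tesseract installed, `extract_pdf_text("scan.pdf")` would return one string per page, whichever path produced it.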



Package details
Maintainer
License	BSD_2_clause + file LICENSE
Version	0.2.1
Package repository	GitHub: jacob-ogre/pdftext
Installation	Install the latest version of this package by entering the following in R: `install.packages("remotes"); remotes::install_github("jacob-ogre/pdftext")`