PolMine/trickypdf: Turn tricky pdf into txt/xml for corpus preparation

The class 'PDF' in the package includes methods to parse, prune and xmlify PDF documents. The speciality of the package is that it can handle PDF documents with two columns and can correct lines that were scanned at a slight tilt.


Package details

Author: Andreas Blaette
Maintainer: Andreas Blaette <[email protected]>
License: GPL-3
Version: 0.1.1
URL: https://www.github.com/PolMine/pdf2xml
Installation

Install the latest version of this package from GitHub by entering the following in R:
install.packages("devtools")
library(devtools)
install_github("PolMine/trickypdf")
PolMine/trickypdf documentation built on April 7, 2018, 1:02 p.m.