Description

Turn PDF documents into XML for further processing in a corpus preparation pipeline. The package focuses on cleanly extracting text from PDF documents with complex layouts (for example, multi-column pages).

Author(s)

Andreas Blaette (andreas.blaette@uni-due.de)

References

http://polmine.sowi.uni-due.de