cermine: Converts a PDF to a JATS file.

Description

Reads a PDF and converts its content to a Journal Article Tag Suite (JATS) XML file.

Usage

cermine(path, outputs, exts, override, timeout, configuration)

Arguments

path

path to a directory containing PDF files.

outputs

(optional) list of extraction output(s); possible values: "jats" (document metadata and content in NLM JATS format), "text" (raw document text), "zones" (text zones with their labels), "trueviz" (geometric structure in TrueViz format), "images" (images from the document); default: "jats,images".

exts

(optional) a comma-separated list of extensions of the resulting files; the list has to have the same length as output list; default: "cermxml,images".

override

(optional) logical; whether to overwrite previously created files. Default: FALSE.

timeout

(optional) approximate maximum allowed processing time for a PDF file, in seconds; by default, no timeout is used. The value is approximate because in some cases the program might be allowed to slightly exceed this time, say by a second or two.

configuration

(optional) path to a configuration properties file; see https://github.com/CeON/CERMINE for a description of the available configuration properties.

Value

A vector containing the file reference to the JATS XML file.

Author(s)

Jason Mumbulla, jasonmumbulla@gmail.com

Examples

cermine(c("~/pdfdir"))

# overwrite any existing converted JATS files.
cermine(c("~/pdfdir"), override = TRUE)

# convert the PDFs in the directory ~/pdfdir, overwriting any existing
# files; output raw text with the file extension txt.
cermine(c("~/pdfdir"), override = TRUE, outputs = c("text"), exts = c("txt"))
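
The exts argument must have the same length as outputs, so it can help to check the pairing before starting a long conversion. The following is a minimal sketch only: pair_outputs is a hypothetical helper that is not part of cermineR, and the commented call at the end assumes that cermine accepts multi-element character vectors, as the single-element examples above suggest.

```r
# Hypothetical helper (not part of cermineR): pair each output type with a
# file extension and fail early if the two vectors do not line up.
pair_outputs <- function(outputs, exts) {
  if (length(outputs) != length(exts)) {
    stop("outputs and exts must have the same length")
  }
  # Output types listed in the Arguments section above.
  allowed <- c("jats", "text", "zones", "trueviz", "images")
  unknown <- setdiff(outputs, allowed)
  if (length(unknown) > 0) {
    stop("unknown output type(s): ", paste(unknown, collapse = ", "))
  }
  # Named character vector: names are output types, values are extensions.
  stats::setNames(exts, outputs)
}

pairs <- pair_outputs(c("jats", "text"), c("cermxml", "txt"))

# Assumption: with the package loaded, the checked values could be passed on.
# cermine("~/pdfdir", outputs = names(pairs), exts = unname(pairs))
```

Checking the lengths up front reports a mismatch immediately, before any PDF is processed.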

jmumbulla/cermineR documentation built on May 23, 2019, 8:03 p.m.