xml2eco: XML to EcoJSON


Description

Takes an XML file generated from a PDF with pdftohtml and runs the augmented keyword extractor on it. Both the intermediate text (split into sections) and the results can optionally be cached to disk.

Usage

xml2eco(XML, ecoextractLoc = getEcoExtractPyScript(),
        results_dir = character(), cache.dir = character())

Arguments

XML

the name of the XML document generated by converting the PDF document to XML via pdftohtml or indirectly via the convertPDF2XML function in ReadPDF.

ecoextractLoc

the file path to the ecoextract.py script. Defaults to the included script in python/ecoextract.py.

results_dir

optional directory to save the results into. Will be created if it does not exist.

cache.dir

optional directory to save the intermediate text, broken into sections.

Value

A list representation of the JSON object returned by ecoextract.py, with the following elements:

location
date
resolved_keyword
txt

original text

Author(s)

Matt Espe

Examples

f = system.file("sampleDocs", "Rondini-2008-Development of multiplex PCR-liga.xml", package = "SpilloverDA")
tmp = xml2eco(f)
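
The optional results_dir and cache.dir arguments can be exercised in the same way. The following is a minimal sketch, not part of the package's own examples: the directory names are illustrative assumptions, and the call is only attempted when SpilloverDA is installed. Because the function runs ecoextract.py, a working Python installation is presumably needed as well.

```r
# Sketch only: these directory names are hypothetical, not package defaults.
results_dir <- file.path(tempdir(), "eco_results")
cache_dir   <- file.path(tempdir(), "eco_cache")

# The help text says results_dir is created if it does not exist; it makes no
# such statement for cache.dir, so create that one explicitly to be safe.
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

if (requireNamespace("SpilloverDA", quietly = TRUE)) {
  library(SpilloverDA)

  # Same sample document as in the example above.
  f <- system.file("sampleDocs",
                   "Rondini-2008-Development of multiplex PCR-liga.xml",
                   package = "SpilloverDA")

  # Run the extractor, saving results and the sectioned text to disk.
  res <- xml2eco(f, results_dir = results_dir, cache.dir = cache_dir)

  # Inspect the top-level structure of the returned list.
  str(res, max.level = 1)
}
```

Using subdirectories of tempdir() keeps the sketch from writing into the working directory; for real use, persistent paths would be the more natural choice so that cached text can be reused across sessions.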

dsidavis/SpilloverDA documentation built on June 1, 2019, 2:55 p.m.