Run the term extractor on a document
doc2keywords(doc.file, ecoextract = getEcoExtractPyScript(),
  results.dir = character(), results.file = file.path(results.dir,
  gsub("xml$", "rds", basename(doc.file))), cache.dir = character(),
  cache.file = file.path(cache.dir, gsub("xml$", "rds", basename(doc.file))),
  section.text = load_text(doc.file, cache.file, cache.dir))
doc.file: a file to parse, either XML or PDF
ecoextract: file path to the ecoextract.py script
results.dir: optional, directory to store the results as an rds file. If not specified, no results will be saved. If the directory does not currently exist, it will be created.
results.file: optional, file name to use for the results, defaults to the
cache.dir: optional, directory to cache the intermediate text results from
cache.file: optional, file name to use for the cached section text
section.text: a list, with one element per section to be processed
This function runs the term extractor (based on EpiTator https://github.com/ecohealthalliance/EpiTator) on a document. The document can be either XML generated by pdftohtml or a PDF, which is converted internally to an XML document. Alternatively, the raw text can be supplied directly through section.text. The results and the intermediate text, split by section, can optionally be saved.
A list with one element per section, each holding all resolved keywords for that section arranged in a nested list.
Matt Espe and Duncan Temple Lang
txt = "This mentions China"
ans = doc2keywords(section.text = txt)
getLocation(ans)
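The default results.file and cache.file paths in the usage are built with base R only, so the construction can be tried without the package installed. The following is a minimal sketch of that expression; the file and directory names are hypothetical and chosen only for illustration.

```r
# Hypothetical inputs, used only to illustrate the default path construction.
doc.file = "papers/Smith2010.xml"
results.dir = "results"

# Same expression as the results.file default shown in the usage.
results.file = file.path(results.dir, gsub("xml$", "rds", basename(doc.file)))
print(results.file)

# A PDF name does not end in "xml", so gsub() leaves the base name unchanged.
pdf.file = "papers/Smith2010.pdf"
pdf.results = file.path(results.dir, gsub("xml$", "rds", basename(pdf.file)))
print(pdf.results)

# With the default results.dir = character(), file.path() returns a
# zero-length character vector, i.e. no path at all.
no.results = file.path(character(), gsub("xml$", "rds", basename(doc.file)))
print(length(no.results))
```

Going by the usage signature alone, a PDF input keeps its .pdf name under these defaults, so it may be worth passing results.file and cache.file explicitly in that case. The zero-length result for an unset directory appears consistent with the documented behavior that nothing is saved when results.dir is not specified.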