Run the term extractor on a document
doc2keywords(doc.file, ecoextract = getEcoExtractPyScript(),
  results.dir = character(), results.file = file.path(results.dir,
  gsub("xml$", "rds", basename(doc.file))), cache.dir = character(),
  cache.file = file.path(cache.dir, gsub("xml$", "rds", basename(doc.file))),
  section.text = load_text(doc.file, cache.file, cache.dir))
| doc.file | a file to parse, either XML or PDF | 
| ecoextract | file path to the ecoextract.py script | 
| results.dir | optional, directory to store the results as a rds file. If not specified, no results will be saved. If the directory does not currently exist, it will be created. | 
| results.file | optional, file name to use for the results; defaults to the base name of doc.file with a trailing "xml" replaced by "rds", placed in results.dir | 
| cache.dir | optional, directory in which to cache the intermediate section text | 
| cache.file | optional, file name to use for the cached section text | 
| section.text | a list, with one element per section to be processed | 
This function runs the term extractor (based on EpiTator, https://github.com/ecohealthalliance/EpiTator) on a document. The document can be either XML generated by pdftohtml or a PDF, which is converted internally to XML. Alternatively, the raw text can be supplied directly via section.text. The results and the intermediate text, split by section, can optionally be saved.
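The default values of results.file and cache.file in the usage above are built from base R calls only, so they can be previewed without running the extractor. The following is a minimal sketch of that expression; the file name and the "results" directory are hypothetical and chosen only for illustration.

```r
# Hypothetical input path, used only for illustration.
doc.file <- "papers/example.xml"

# Same expression as the default file name in the usage above:
# take the base name and replace a trailing "xml" with "rds".
default_name <- gsub("xml$", "rds", basename(doc.file))

# "results" is an assumed directory name, not one required by the package.
results.file <- file.path("results", default_name)

# With the default results.dir = character(), file.path() yields an
# empty character vector, i.e. there is no path to save to.
no_results <- file.path(character(), default_name)

# The pattern only matches names ending in "xml", so a PDF name is unchanged.
pdf_name <- gsub("xml$", "rds", basename("papers/example.pdf"))
```

Under these assumptions, a PDF input keeps its "pdf" extension in the default file name, so it may be worth passing results.file or cache.file explicitly in that case.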
a list, with one element per section with all resolved keywords arranged in a nested list.
Matt Espe and Duncan Temple Lang
txt = "This mentions China"
ans = doc2keywords(section.text = txt)
getLocation(ans)