usgs_gdd: Test dataset of public USGS documents
In aazaff/geocarrot: R Interface for GeoDeepDive Library

An example of StanfordCoreNLP 3.5.2 output for 5 USGS documents from GeoDeepDive. Note that the data are not normalized: most fields are comma-separated arrays. We recommend using cleanPunctuation() to clean the data before analysis.

usgs_gdd

A character matrix with 14,560 rows and 9 fields:

docid: Document id number, alphanumeric code
sentid: Sentence id number, integer identifying the sentence in the document
wordidx: integer, index number of word in sentence
words: string, the actual words of the sentence
poses: grammatical parts of speech codes
ners: StanfordCoreNLP default named-entities
lemmas: Lemmas (base forms) of the words
dep_paths: Dependency-parse linkages between words
dep_parents: The index numbers of the parent words for the dep_paths
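
Because most fields hold one comma-separated array per sentence, a common first step is to expand each sentence into one row per word. The following is a minimal sketch in base R, not part of the package: the `toy` matrix, its values, and the helper names `split_field` and `expand_sentence` are hypothetical, and the sketch assumes the arrays are plain comma-separated strings with no commas inside individual tokens.

```r
# Toy stand-in for two rows of usgs_gdd (values are invented for illustration).
toy <- matrix(
  c("doc1", "1", "1,2,3", "Basalt,flows,occur", "NN,NNS,VBP",
    "doc1", "2", "1,2",   "Samples,collected",  "NNS,VBN"),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("docid", "sentid", "wordidx", "words", "poses"))
)

# Split one comma-separated field into a character vector (hypothetical helper).
split_field <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

# Expand a single sentence row into one row per word (hypothetical helper).
expand_sentence <- function(row) {
  data.frame(
    docid   = row[["docid"]],
    sentid  = as.integer(row[["sentid"]]),
    wordidx = as.integer(split_field(row[["wordidx"]])),
    word    = split_field(row[["words"]]),
    pos     = split_field(row[["poses"]]),
    stringsAsFactors = FALSE
  )
}

# Apply to every row of the matrix and stack the results.
long <- do.call(
  rbind,
  lapply(seq_len(nrow(toy)), function(i) expand_sentence(toy[i, ]))
)
```

Splitting on every comma will misalign the fields if a token is itself a comma or contains one, so on the real data the punctuation may need to be cleaned first.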