usgs_gdd: Test dataset of public USGS documents


Description

Example Stanford CoreNLP 3.5.2 output for 5 USGS documents from GeoDeepDive. Note that the data are not normalized: most fields are comma-separated arrays. We recommend using cleanPunctuation().
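Because most fields are comma-separated arrays, a row usually has to be split into one value per token before analysis. The following is a minimal sketch on an invented one-row matrix, not on usgs_gdd itself; the values and the helper name split_field() are assumptions made for illustration.

```r
# Toy one-row character matrix mimicking the documented layout.
# All values are invented for illustration.
toy <- matrix(
  c("doc1", "1", "1,2,3", "Granite,is,hard", "NN,VBZ,JJ"),
  nrow = 1,
  dimnames = list(NULL, c("docid", "sentid", "wordidx", "words", "poses"))
)

# Hypothetical helper: split one comma-separated field into a character vector.
split_field <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

words   <- split_field(toy[1, "words"])
poses   <- split_field(toy[1, "poses"])
wordidx <- as.integer(split_field(toy[1, "wordidx"]))

# Normalized view: one row per token.
tokens <- data.frame(wordidx = wordidx, word = words, pos = poses,
                     stringsAsFactors = FALSE)
```

A plain split on "," misaligns the fields whenever a token is itself a comma, so punctuation needs to be handled before splitting real rows.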

Usage

usgs_gdd

Format

A character matrix with 14,560 rows and 9 fields:

docid

Document ID, an alphanumeric code

sentid

Sentence ID number, an integer identifying the sentence within the document

wordidx

Integer, index number of the word within the sentence

words

String, the actual words of the sentence

poses

Part-of-speech codes for each word

ners

Stanford CoreNLP default named-entity tags

lemmas

Lemmas (base forms) of the words

dep_paths

Dependency relations (linkages) between words

dep_parents

The index numbers of the parent words for the dep_paths
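As a rough illustration of how dep_paths and dep_parents relate to words, the sketch below resolves each parent index to its word. It uses invented, already-split values rather than rows from usgs_gdd, and it assumes that parent indices are 1-based positions in the sentence and that 0 marks the root; neither assumption is confirmed by this page.

```r
# Invented, already-split values for a single sentence.
words       <- c("Granite", "is", "hard")
dep_paths   <- c("nsubj", "cop", "root")
dep_parents <- c(3L, 3L, 0L)  # assumption: 1-based word index, 0 = root

# Prepend a ROOT placeholder so that index 0 maps to it after shifting by one.
parent_word <- c("ROOT", words)[dep_parents + 1L]

# One row per dependency edge: child word, relation label, parent word.
edges <- data.frame(word = words, relation = dep_paths, parent = parent_word,
                    stringsAsFactors = FALSE)
```

Shifting the index by one avoids a separate branch for the root and keeps the lookup vectorized.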

Source

https://geodeepdive.org/


aazaff/geocarrot documentation built on May 5, 2019, 9:44 p.m.