corenlp_parse_conll: Parse the CoNLL output of CoreNLP.
In PolMine/bignlp: Fast and Memory-Efficient Annotation of Big Corpora

Read the CoNLL output of CoreNLP from one or more files and return a `data.table` with the annotation data.

corenlp_parse_conll(x, progress = TRUE)

`x`	A filename, or a `character` vector of filenames. If `x` is a `list` (of `character` vectors of filenames), it will be unlisted to yield a `character` vector.
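`progress`	A `logical` value (default `TRUE`), presumably controlling whether progress is reported while the files are parsed.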
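The handling of a `list` passed as `x` can be pictured with base R alone. This is a minimal sketch rather than the package's code: the file names are hypothetical, and `use.names = FALSE` is an assumption made here to keep the result tidy.

```r
# Hypothetical file names, used only to illustrate the flattening step.
files_by_batch <- list(
  batch1 = c("doc_001.conll", "doc_002.conll"),
  batch2 = "doc_003.conll"
)

# unlist() turns the list into a single character vector of filenames;
# use.names = FALSE drops the list names, which are not needed here.
x <- unlist(files_by_batch, use.names = FALSE)

print(x)
```

A vector like `x` is the form in which the filenames are then processed, whether they were supplied as a `character` vector or as a `list`.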

`corenlp_parse_conll()` uses `data.table::fread()` and supplies settings that prevent undesired behaviour. Apart from the `doc_id` column added by bignlp, the resulting `data.table` has the columns "idx", "word", "lemma", "pos", "ner", "headidx" and "deprel", which are described in the documentation of the CoNLLOutputter class of CoreNLP.
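To make the role of such settings concrete, here is a sketch of reading a seven-column CoNLL file directly with `data.table::fread()`. The exact settings bignlp passes are not listed on this page, so the arguments below (`sep`, `header`, `quote`, `blank.lines.skip`) are assumptions about what "undesired behaviour" could mean, and the sample lines are hand-written rather than real CoreNLP output.

```r
library(data.table)

# Hand-written sample in a seven-column, tab-separated layout; sentences
# are separated by a blank line. This is not real CoreNLP output.
conll_lines <- c(
  "1\tShe\tshe\tPRP\tO\t2\tnsubj",
  "2\tsaid\tsay\tVBD\tO\t0\tROOT",
  "3\t\"\t\"\t``\tO\t2\tpunct",
  "",
  "1\tHello\thello\tUH\tO\t0\tROOT"
)
conll_file <- tempfile(fileext = ".conll")
writeLines(conll_lines, conll_file)

dt <- fread(
  file = conll_file,
  sep = "\t",               # CoNLL columns are tab-separated
  header = FALSE,           # the file has no header row
  quote = "",               # treat quotation marks as ordinary tokens
  blank.lines.skip = TRUE,  # drop the empty lines between sentences
  col.names = c("idx", "word", "lemma", "pos", "ner", "headidx", "deprel"),
  showProgress = FALSE
)

print(dt)
```

Disabling quote handling matters because a quotation mark is a legitimate token in annotated text, and skipping blank lines keeps the sentence separators from being parsed as rows.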

A `data.table` with 8 columns:

doc_id: Document id, an integer value.
idx: Token counter, starting at 1 for each new sentence.
word: Word form or punctuation symbol.
lemma: Lemma of word form, or an underscore if not available.
pos: Fine-grained part-of-speech tag, or underscore if not available.
ner: Named Entity tag, or underscore if not available.
headidx: Head of the current token, which is either a value of idx or zero ('0'), or an underscore if not available.
deprel: Dependency relation to the head (see headidx), or underscore if not available.

Note that column 1 is generated by bignlp, whereas columns 2-8 map the CoNLL output of CoreNLP. The description of these columns is taken from the documentation of the CoNLLOutputter class.
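Because idx restarts at 1 for each sentence and headidx points to an idx value within the same sentence, a table of this shape can be enriched with a sentence counter and the word form of each head. The following sketch works on a hand-made table with invented values. The names `sentence_id` and `head_word` are introduced here for illustration only, and the code assumes that idx and headidx are integers, that is, that no underscore placeholders are present.

```r
library(data.table)

# Hand-made table in the eight-column layout described above (invented values).
dt <- data.table(
  doc_id  = c(1L, 1L, 1L, 1L, 1L),
  idx     = c(1L, 2L, 1L, 2L, 3L),
  word    = c("Dogs", "bark", "Cats", "chase", "mice"),
  lemma   = c("dog", "bark", "cat", "chase", "mouse"),
  pos     = c("NNS", "VBP", "NNS", "VBP", "NNS"),
  ner     = "O",
  headidx = c(2L, 0L, 2L, 0L, 2L),
  deprel  = c("nsubj", "ROOT", "nsubj", "ROOT", "obj")
)

# idx restarts at 1 with every sentence, so counting those restarts within
# each document yields a sentence counter.
dt[, sentence_id := cumsum(idx == 1L), by = doc_id]

# Look up the word form of each token's head within its sentence;
# a headidx of 0 has no matching idx and is left as NA.
dt[, head_word := word[match(headidx, idx)], by = .(doc_id, sentence_id)]

print(dt[, .(doc_id, sentence_id, idx, word, headidx, head_word)])
```

If headidx contains underscores, the column is read as `character` and needs to be converted (for instance with `as.integer()`, which turns the placeholders into `NA`) before a lookup of this kind.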

PolMine/bignlp documentation built on Jan. 29, 2021, 1:14 a.m.