NDJSON: Parse Stanford CoreNLP JSON output.

Description

The JSON results of applying the Stanford CoreNLP annotators can be written to a streaming JSON file in NDJSON (newline-delimited JSON) format, with one JSON object per line. The NDJSON class parses such data and makes it available in tabular form.
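
To make the tabular mapping concrete, here is a minimal Python sketch (not the ctk R implementation) that flattens one CoreNLP JSON document into one row per token. It assumes CoreNLP's JSON layout, with a top-level `sentences` array whose entries carry an `index` and a `tokens` array. It also assumes the `sentence` column is filled from the sentence index. The function name `corenlp_to_rows` is hypothetical.

```python
import json

# Hypothetical helper (not part of ctk): flattens ONE CoreNLP JSON document
# into token rows. Assumes CoreNLP's JSON layout ("sentences" -> "tokens")
# and that the "sentence" column comes from each sentence's "index" field.
def corenlp_to_rows(json_str, cols_to_keep=("sentence", "index", "word", "pos", "ner")):
    doc = json.loads(json_str)
    rows = []
    for sent in doc.get("sentences", []):
        for tok in sent.get("tokens", []):
            record = dict(tok)                      # copy all token attributes
            record["sentence"] = sent.get("index")  # attach the sentence index
            # keep only the requested columns; missing keys become None
            rows.append({col: record.get(col) for col in cols_to_keep})
    return rows


line = ('{"sentences": [{"index": 0, "tokens": '
        '[{"index": 1, "word": "Hello", "lemma": "hello", "pos": "UH", "ner": "O"}]}]}')
print(corenlp_to_rows(line))
# [{'sentence': 0, 'index': 1, 'word': 'Hello', 'pos': 'UH', 'ner': 'O'}]
```

Attributes not listed in `cols_to_keep` (here `lemma`) are dropped. This mirrors the role of `colsToKeep`.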

Arguments

x

character vector, the JSON string(s) to be parsed

colsToKeep

character vector, the columns of the parsed Stanford CoreNLP result to keep

destfile

a character string naming the file to write to

logfile

a character string naming the file to write an error log to; if provided, JSON strings that cannot be parsed are written to this file

Fields

destfile

a character string naming the file to write to

logfile

a character string naming the file for the error log

colsToKeep

character vector with columns of the parsed Stanford CoreNLP result to keep

Methods

initialize(destfile = character(), logfile = character(), colsToKeep = c("sentence", "index", "word", "pos", "ner"))

Initialize a new instance of the CoreNLP/NDJSON parser.

jsonToDf(x)

Parse JSON string(s) into a data.frame. If a destfile was defined during initialization, the output is appended to that file; without a destfile, a data.frame is returned. Strings that cannot be parsed are written to the logfile, if one is defined.
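
The behaviour described for `jsonToDf` can be sketched in Python. The sketch returns the rows, or appends them to `destfile`, and logs unparseable strings to `logfile`. It illustrates the control flow under assumptions and is not the package's code. The name `json_to_table`, the tab-separated output with a header row, and the CoreNLP JSON layout (`sentences`, each with `index` and `tokens`) are all assumptions. Skipping bad strings silently when no logfile is set is also a choice of this sketch.

```python
import csv
import json
import os

# Hypothetical analogue of jsonToDf(); not the ctk implementation.
# Assumptions: CoreNLP JSON layout, tab-separated output with a header row.
def json_to_table(x, cols_to_keep=("sentence", "index", "word", "pos", "ner"),
                  destfile=None, logfile=None):
    if isinstance(x, str):
        x = [x]  # accept a single string or a list of strings
    rows = []
    for json_str in x:
        try:
            doc = json.loads(json_str)
        except json.JSONDecodeError:
            if logfile:  # record the failing string, then move on
                with open(logfile, "a", encoding="utf-8") as log:
                    log.write(json_str + "\n")
            continue
        for sent in doc.get("sentences", []):
            for tok in sent.get("tokens", []):
                record = dict(tok, sentence=sent.get("index"))
                rows.append([record.get(col) for col in cols_to_keep])
    if destfile:
        # write the header only when starting a new or empty file
        new_file = not os.path.exists(destfile) or os.path.getsize(destfile) == 0
        with open(destfile, "a", newline="", encoding="utf-8") as out:
            writer = csv.writer(out, delimiter="\t")
            if new_file:
                writer.writerow(cols_to_keep)
            writer.writerows(rows)
        return None  # results went to destfile
    return rows
```

Because output is appended, repeated calls with the same `destfile` accumulate rows in one table. The header is written only once.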

processFiles(filenames)

Process one or more files containing Stanford CoreNLP output in NDJSON format. If a destfile was defined during initialization, results are written or appended to that file; otherwise, a data.frame is returned.
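
Here is a minimal Python sketch of the `processFiles` idea. NDJSON stores one JSON document per line, so each file can be read line by line and every line parsed on its own. A large file therefore never has to be loaded as a single JSON value. The function name `process_files` and the CoreNLP layout it flattens are assumptions. For brevity, the sketch ignores `destfile` and `logfile`, and a malformed line raises `json.JSONDecodeError`.

```python
import json

# Hypothetical analogue of processFiles(); not the ctk implementation.
# Each non-empty line of each file is treated as one CoreNLP JSON document.
def process_files(filenames, cols_to_keep=("sentence", "index", "word", "pos", "ner")):
    rows = []
    for filename in filenames:
        with open(filename, encoding="utf-8") as f:
            for line in f:  # iterate lazily, one NDJSON record at a time
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                doc = json.loads(line)
                for sent in doc.get("sentences", []):
                    for tok in sent.get("tokens", []):
                        record = dict(tok, sentence=sent.get("index"))
                        rows.append({col: record.get(col) for col in cols_to_keep})
    return rows  # token rows from all files, in file and line order
```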


PolMine/ctk documentation built on May 8, 2019, 3:20 a.m.