corenlp_annotate: Annotate a string.
In PolMine/bignlp: Fast and Memory-Efficient Annotation of Big Corpora


Use CoreNLP to annotate strings.

corenlp_annotate(x, ...)

## S4 method for signature 'data.table'
corenlp_annotate(
  x,
  corenlp_dir = getOption("bignlp.corenlp_dir"),
  properties,
  purge = TRUE,
  threads = 1L,
  progress = TRUE,
  verbose = TRUE
)

## S4 method for signature 'character'
corenlp_annotate(
  x,
  corenlp_dir = getOption("bignlp.corenlp_dir"),
  properties,
  byline = NULL,
  output_format = "json",
  threads = 1L,
  progress = TRUE,
  preclean = TRUE,
  verbose = TRUE
)

`x`	Either a `data.table` (required to have the columns 'doc_id' and 'text'), a character vector with input file(s), or a directory. If `x` is a directory, all files in the directory are processed. Files are assumed to be TSV files with two columns ('doc_id' and 'text').
`...`	Further arguments.
`corenlp_dir`	The directory where CoreNLP resides.
`properties`	A properties file to configure annotator.
`purge`	A `logical` value, whether to preprocess input.
`threads`	An integer value, the number of threads to use.
`progress`	Logical, whether to show progress bar.
`verbose`	Logical, whether to output messages.
`byline`	Logical, whether to process files in a line-by-line manner.
`output_format`	The output generated, either "json" (default), "txt", or "xml".
`preclean`	Logical, whether to preprocess string.
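
The file-based input described for `x` can be prepared with base R. The following is a minimal sketch, not taken from the package: the toy texts are made up, and writing a header row with unquoted fields is an assumption about the expected TSV layout that should be checked against the package before relying on it.

```r
# Toy documents, made up for illustration.
docs <- data.frame(
  doc_id = 1:2,
  text = c("Stanford CoreNLP annotates text.", "This is a second document."),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with the two columns 'doc_id' and 'text'.
# Assumption: a header row and unquoted fields are acceptable.
tsv_file <- tempfile(fileext = ".tsv")
write.table(docs, file = tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the column layout before annotating.
check <- read.delim(tsv_file, stringsAsFactors = FALSE)
```

The path in `tsv_file` could then be passed as `x` to the character method, or `docs` could be converted with `data.table::as.data.table()` and passed to the `data.table` method.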

If `threads` is 1 and `output` is `NULL`, the tagging result is returned. If `threads` is greater than 1, `output` should be a directory where the tagging results are stored as NDJSON files.

The target files will be returned, so that they can serve as input to corenlp_parse_ndjson.

library(data.table)
library(bignlp)

# Read the sample file shipped with the package; each line becomes one document
reuters_txt <- readLines(system.file(package = "bignlp", "extdata", "txt", "reuters.txt"))
reuters_dt <- data.table(doc_id = seq_along(reuters_txt), text = reuters_txt)

props <- corenlp_get_properties_file(lang = "en", fast = TRUE)
y <- corenlp_annotate(
  x = reuters_dt,
  properties = props,
  corenlp_dir = corenlp_get_jar_dir(),
  progress = FALSE
)