corenlp_annotate: Annotate a string.


Description

Use CoreNLP to annotate strings.

Usage

corenlp_annotate(x, ...)

## S4 method for signature 'data.table'
corenlp_annotate(
  x,
  corenlp_dir = getOption("bignlp.corenlp_dir"),
  properties,
  purge = TRUE,
  threads = 1L,
  progress = TRUE,
  verbose = TRUE
)

## S4 method for signature 'character'
corenlp_annotate(
  x,
  corenlp_dir = getOption("bignlp.corenlp_dir"),
  properties,
  byline = NULL,
  output_format = "json",
  threads = 1L,
  progress = TRUE,
  preclean = TRUE,
  verbose = TRUE
)
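The two signatures above are S4 methods, so the class of x decides which implementation runs. The base-R sketch below illustrates that dispatch mechanism with a hypothetical generic, annotate_sketch(), which does not call CoreNLP and is not bignlp's implementation. It uses "data.frame" instead of "data.table" so that it needs no extra package.

```r
library(methods)

# Hypothetical generic for illustration only; it does not run CoreNLP.
setGeneric("annotate_sketch", function(x, ...) standardGeneric("annotate_sketch"))

# Method for tabular input: requires the columns 'doc_id' and 'text'.
# A data.table also inherits from data.frame, so it would typically
# be routed here if no more specific method were defined.
setMethod("annotate_sketch", "data.frame", function(x, ...) {
  missing_cols <- setdiff(c("doc_id", "text"), colnames(x))
  if (length(missing_cols) > 0L) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  "data.frame method"
})

# Method for input files or a directory given as a character vector.
setMethod("annotate_sketch", "character", function(x, ...) {
  "character method"
})

res_df <- annotate_sketch(data.frame(doc_id = 1L, text = "Some text."))
res_chr <- annotate_sketch("input.tsv")
```

Dispatching on the class of x keeps one user-facing function name while each input type gets its own argument list, which matches the different defaults shown for the two methods.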

Arguments

x

Either a data.table (which must have the columns 'doc_id' and 'text'), a character vector with one or more input files, or a directory. If input is a directory, all files in the directory are processed. Files are assumed to be tsv files with two columns ('doc_id' and 'text').
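For file or directory input, the documents have to be written to tsv files first. The following base-R sketch prepares such a directory. It assumes that a header row is expected and that the text column must not contain tabs or line breaks; neither assumption is stated in this documentation, and the two sample texts are made up for illustration.

```r
# Assumption: tsv files with a header row and the columns 'doc_id' and 'text'.
input_dir <- file.path(tempdir(), "corenlp_input")
dir.create(input_dir, showWarnings = FALSE)

# Toy documents for illustration only
docs <- data.frame(
  doc_id = 1:2,
  text = c("Reuters reported strong earnings.", "Oil prices rose on Monday."),
  stringsAsFactors = FALSE
)

# Replace tabs and line breaks so that each document stays on one tsv line
docs$text <- gsub("[\t\r\n]+", " ", docs$text)

tsv_file <- file.path(input_dir, "docs.tsv")
write.table(docs, tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Either input_dir or input_files could then be passed as x (not run here)
input_files <- list.files(input_dir, full.names = TRUE)
```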

...

Further arguments.

corenlp_dir

The directory where CoreNLP is installed.

properties

Path to a properties file that configures the CoreNLP annotators.
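A properties file is a plain-text list of key = value pairs. As a hedged sketch, the base-R code below writes a minimal file with the standard CoreNLP keys 'annotators' and 'outputFormat'. Which settings bignlp requires or overrides is an assumption here; the examples on this page obtain a ready-made file with corenlp_get_properties_file() instead.

```r
# Sketch: a minimal CoreNLP properties file in a temporary location.
# Assumption: bignlp reads this file as-is and does not need further keys.
props_file <- tempfile(fileext = ".properties")
writeLines(
  c(
    "annotators = tokenize, ssplit, pos, lemma",
    "outputFormat = json"
  ),
  props_file
)

# Read the key = value pairs back to check which keys were written
props_lines <- readLines(props_file)
props_keys <- trimws(sub("=.*$", "", props_lines))
```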

purge

Logical, whether to preprocess the input.

threads

An integer value, the number of threads to use.

progress

Logical, whether to show progress bar.

verbose

Logical, whether to output messages.

byline

Logical, whether to process files in a line-by-line manner.

output_format

The format of the output, one of "json" (default), "txt", or "xml".

preclean

Logical, whether to preprocess strings before annotation.

Details

If argument threads is 1 and output is NULL, the tagging result is returned. If threads is greater than 1, output should be a directory in which the tagging results are stored as NDJSON files.

Value

The target files will be returned, so that they can serve as input to corenlp_parse_ndjson.
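NDJSON files hold one JSON record per line. As a hedged base-R sketch of inspecting returned file paths before passing them on, the hypothetical helper count_ndjson_records() counts the non-empty lines in each file without parsing the JSON. The toy file is made up for illustration and does not reproduce real CoreNLP output.

```r
# Hypothetical helper: count records in NDJSON files (one JSON object per line).
count_ndjson_records <- function(files) {
  stopifnot(is.character(files), all(file.exists(files)))
  vapply(
    files,
    function(f) {
      lines <- readLines(f, warn = FALSE)
      sum(nzchar(trimws(lines)))  # blank lines are not counted
    },
    integer(1)
  )
}

# Toy NDJSON file for illustration only
tmp <- tempfile(fileext = ".ndjson")
writeLines(c('{"doc_id": 1, "sentences": []}', '{"doc_id": 2, "sentences": []}'), tmp)
n <- count_ndjson_records(tmp)
```

The result is an integer vector named by file path, so empty or truncated output files are easy to spot.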

Examples

library(bignlp)
library(data.table)

# Sample Reuters articles shipped with bignlp, one article per line
reuters_txt <- readLines(system.file(package = "bignlp", "extdata", "txt", "reuters.txt"))
reuters_dt <- data.table(doc_id = seq_along(reuters_txt), text = reuters_txt)

# Properties file for English (fast variant)
props <- corenlp_get_properties_file(lang = "en", fast = TRUE)
y <- corenlp_annotate(
  x = reuters_dt,
  properties = props,
  corenlp_dir = corenlp_get_jar_dir(),
  progress = FALSE
)

PolMine/bignlp documentation built on Jan. 29, 2021, 1:14 a.m.