StanfordCoreNLP: StanfordCoreNLP Annotator Class.
In PolMine/bignlp: Fast and Memory-Efficient Annotation of Big Corpora


StanfordCoreNLP Annotator Class.

The StanfordCoreNLP class exposes the StanfordCoreNLP pipeline for processing text. Its main functionality is made available to R by way of an R6 class. This implementation focuses on using the multithreading capabilities of StanfordCoreNLP from R.

The StanfordCoreNLP pipeline supports multithreading in two ways. The first approach is (a) to process files in parallel. This requires that chunks of text are present as files in one directory. The $process_files() method exposes this functionality. The number of threads to be used is controlled by setting the property "threads" accordingly; see the examples and the vignette. This approach is fast and memory-efficient, as it effectively allows a line-by-line approach.
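The file-based approach can be sketched as follows. This is a minimal illustration, not the package's reference workflow: the sample texts, the directory name and the file naming scheme are hypothetical, and it assumes that each file holds one plain-text chunk. The annotation step only runs if bignlp and a CoreNLP installation are found; the calls used there are the ones shown elsewhere on this page.

```r
# Hypothetical input: three short documents to annotate.
docs <- c(
  "This is the first document.",
  "Here is a second one. It has two sentences.",
  "And a third."
)

# Write one plain-text file per document into a fresh directory
# (assumption: one chunk of text per file).
chunk_dir <- file.path(tempdir(), "corenlp_chunks")
dir.create(chunk_dir, showWarnings = FALSE)
chunk_files <- file.path(chunk_dir, sprintf("doc_%03d.txt", seq_along(docs)))
for (i in seq_along(docs)) writeLines(docs[i], chunk_files[i])

# Only attempt annotation if bignlp and a CoreNLP installation are available.
corenlp_ready <- requireNamespace("bignlp", quietly = TRUE) &&
  isTRUE(nzchar(getOption("bignlp.corenlp_dir", "")))

if (corenlp_ready) {
  library(bignlp)
  props <- properties(corenlp_get_properties_file(lang = "en", fast = TRUE))
  # The "threads" property sets how many threads process files in parallel.
  props$put("threads", "2")
  CNLP <- StanfordCoreNLP$new(output_format = "json", properties = props)
  annotated_files <- CNLP$process_files(dir = chunk_dir)
}
```

Keeping the chunks on disk means that R itself never has to hold the full corpus or the full annotation output in memory at once.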

The second approach to multithreading is (b) to process sentences in parallel: after tokenization and sentence segmentation, further annotation tasks such as part-of-speech (POS) tagging and named entity recognition (NER) are carried out in parallel. Whether this parallelization is used is controlled by setting the properties "pos.nthreads", "ner.nthreads" and the like. See the examples.

bignlp::AnnotationPipeline -> StanfordCoreNLP

pipeline: Instance of the StanfordCoreNLP class.
outputter: An outputter (JSON, CoNLL, XML) to generate string output from annotations.
output_format: Which output format to use ("json", "xml", "conll").
properties: A Properties Java object to control multithreading.

Inherited methods

bignlp::AnnotationPipeline$annotate()

Method `new()`

Usage

StanfordCoreNLP$new(
  corenlp_dir = getOption("bignlp.corenlp_dir"),
  properties,
  output_format = c("xml", "json", "conll")
)

Arguments

corenlp_dir: Directory where StanfordCoreNLP resides.
properties: Either the filename of a properties file or a Java properties object.
output_format: Either "json", "xml", "conll".

Method `process()`

Process a string.

Usage

StanfordCoreNLP$process(txt, purge = TRUE)

Arguments

txt: A (length-one) character vector to process.
purge: A logical value, whether to preprocess input string txt.
doc_id: An ID to prepend.

Returns

If output_format is "json" or "xml", a string is returned; if output_format is "conll", a data.frame is returned.
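Because the return type depends on output_format, code that collects the results of several $process() calls has to handle both cases. The following is a minimal sketch in base R and not part of bignlp: combine_results() is a hypothetical helper, and the token column in the stand-in tables is an assumption made for illustration, since the actual columns of the "conll" data.frame are not listed here.

```r
# Combine per-document results from repeated $process() calls.
# `results` is a named list: names are document ids, values are either
# data.frames ("conll") or length-one character vectors ("json" / "xml").
combine_results <- function(results) {
  stopifnot(is.list(results), !is.null(names(results)))
  if (all(vapply(results, is.data.frame, logical(1)))) {
    # Tabular output: tag each table with its document id, then stack them.
    tagged <- Map(
      function(df, id) {
        data.frame(doc_id = rep(id, nrow(df)), df, stringsAsFactors = FALSE)
      },
      results, names(results)
    )
    out <- do.call(rbind, unname(tagged))
    rownames(out) <- NULL
    return(out)
  }
  # String output: keep one serialized document per list element.
  vapply(results, function(x) as.character(x)[1], character(1))
}

# Stand-in tables instead of real annotation output (hypothetical column).
demo_tables <- list(
  doc1 = data.frame(token = c("Das", "ist"), stringsAsFactors = FALSE),
  doc2 = data.frame(token = "Satz", stringsAsFactors = FALSE)
)
combined <- combine_results(demo_tables)

# Stand-in strings for the "json" / "xml" case.
demo_strings <- combine_results(list(doc1 = "{}", doc2 = "{}"))
```

With real output, the list would be built from calls such as CNLP$process(txt = ...) for each document, using the document ids as list names.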

Method `process_files()`

Process all files in the stated directory (argument dir). Parallel processing is possible if a "threads" key is defined in the properties object and sets the number of cores to use.

Usage

StanfordCoreNLP$process_files(dir)

Arguments

dir: Directory with files to process (in parallel).

Returns

The method returns (invisibly) the files expected to result from the tagging exercise.

Method `verbose()`

Set whether calls of the class shall be verbose.

Usage

StanfordCoreNLP$verbose(x)

Arguments

x: A logical value. If TRUE, all status messages are shown; if FALSE, only error messages are displayed.

Returns

The class is returned invisibly.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

StanfordCoreNLP$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(bignlp)

# Install CoreNLP (German models) if it is not yet available
if (getOption("bignlp.corenlp_dir") == "") corenlp_install(lang = "de")

txt <- "Das ist ein Satz. Und das ist ein zweiter Satz."

props_file <- corenlp_get_properties_file(lang = "de")
CNLP <- StanfordCoreNLP$new(output_format = "json", properties = props_file)
j <- CNLP$process(txt = txt)

CNLP <- StanfordCoreNLP$new(output_format = "xml", properties = props_file)
x <- CNLP$process(txt = txt)

CNLP <- StanfordCoreNLP$new(output_format = "conll", properties = props_file)
c <- CNLP$process(txt = txt)


# Java parallelization - processing sentences in parallel

library(data.table)
reuters_txt <- readLines(system.file(package = "bignlp", "extdata", "txt", "reuters.txt"))
dt <- data.table(doc_id = seq_along(reuters_txt), text = reuters_txt)

# Note: the JVM heap size only takes effect if set before the JVM is started
options(java.parameters = "-Xmx4g")

# Use all but one core for the POS and NER annotators
n_cores <- as.character(parallel::detectCores() - 1L)
properties_file <- corenlp_get_properties_file(lang = "en", fast = TRUE)
props <- properties(properties_file)
props$put("pos.nthreads", n_cores)
props$put("ner.nthreads", n_cores)

CNLP <- StanfordCoreNLP$new(output_format = "conll", properties = props)

# Annotate the first document
y <- CNLP$process(dt[1][["text"]])

PolMine/bignlp documentation built on Jan. 29, 2021, 1:14 a.m.
