AnnotationPipeline: AnnotationPipeline Class.
In PolMine/bignlp: Fast and Memory-Efficient Annotation of Big Corpora


Worker behind the higher-level StanfordCoreNLP class that allows fine-tuned configuration of an annotation pipeline; see the documentation of CoreNLP Pipelines. The $annotate() method supports processing annotations in parallel. Unlike the StanfordCoreNLP$process_files() method, which processes the content of files in parallel, it is a very efficient in-memory operation and the fastest option for processing medium-sized corpora. But because annotations consume a lot of memory, it may not be possible to allocate the heap space required for the parallel in-memory processing of larger corpora. If heap space is insufficient, the process may run endlessly without a telling warning or error message. So use the $annotate() method with appropriate care.
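The heap-space caveat can be addressed before the JVM starts. The following is a minimal sketch, assuming that bignlp reaches CoreNLP through rJava, which reads the `java.parameters` option once when the JVM is initialised. The 4 GB value mirrors the figure mentioned in the examples and is not a recommendation from the package authors.

```r
# Assumption: the JVM is started via rJava when bignlp is loaded, so the
# option must be set before library(bignlp) is called.
options(java.parameters = "-Xmx4g")  # request a maximum heap of 4 GB

# Confirm the setting that a later JVM start would pick up.
heap_setting <- getOption("java.parameters")
print(heap_setting)

# library(bignlp)  # load only after the option is set
```

If the JVM is already running in the session, changing the option has no effect; restarting R first is the safe route.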

pipeline: AnnotationPipeline

Method `new()`

Initialize AnnotationPipeline

Usage

AnnotationPipeline$new(corenlp_dir = getOption("bignlp.corenlp_dir"))

Arguments

corenlp_dir: Directory where StanfordCoreNLP resides.
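Because `corenlp_dir` defaults to the `bignlp.corenlp_dir` option, it can help to check that option before constructing the pipeline. The sketch below uses a hypothetical helper, `resolve_corenlp_dir()`, which is not part of bignlp; it only illustrates one way to fail early with a readable message.

```r
# Hypothetical helper (not part of bignlp): return the configured CoreNLP
# directory, or stop with a readable message if it is unset or missing.
resolve_corenlp_dir <- function(dir = getOption("bignlp.corenlp_dir")) {
  if (is.null(dir) || !nzchar(dir)) {
    stop("Option 'bignlp.corenlp_dir' is not set.")
  }
  if (!dir.exists(dir)) {
    stop("CoreNLP directory does not exist: ", dir)
  }
  normalizePath(dir)
}

# Usage sketch; the constructor call is commented out because it needs a
# CoreNLP installation.
# A <- AnnotationPipeline$new(corenlp_dir = resolve_corenlp_dir())
```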

Method `annotate()`

Annotate a list of strings.

Usage

AnnotationPipeline$annotate(x, threads = NULL, verbose = TRUE)

Arguments

x: A list of character vectors to annotate, an AnnotationList class object or an ArrayList with Annotation objects.
threads: If NULL, all available threads are used; otherwise an integer value giving the number of threads to use.
verbose: A logical value, whether to show progress messages.
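Rather than relying on the `threads = NULL` default, the thread count can be chosen explicitly. The sketch below uses a hypothetical helper, `pick_threads()`, built on `parallel::detectCores()`; leaving one core free is an illustrative choice, and whether fewer threads also eases memory pressure is an assumption, not something the package documents.

```r
library(parallel)  # ships with base R

# Hypothetical helper (not part of bignlp): use all detected cores minus a
# reserve, but never fewer than one thread.
pick_threads <- function(reserve = 1L) {
  n <- detectCores()
  if (is.na(n)) return(1L)  # detectCores() may return NA
  max(1L, as.integer(n) - as.integer(reserve))
}

threads <- pick_threads()

# Usage sketch; commented out because it needs a CoreNLP installation.
# x <- list("A first document.", "A second document.")
# anno <- AnnotationPipeline$new()$annotate(x, threads = threads, verbose = FALSE)
```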

Returns

A Java object.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

AnnotationPipeline$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

A <- AnnotationPipeline$new()
a <- c("This is a sentence.", "Yet another sentence.")
s <- A$annotate(a)
result <- s$as.data.table()

reuters_txt <- readLines(system.file(package = "bignlp", "extdata", "txt", "reuters.txt"))
B <- AnnotationPipeline$new()
r <- B$annotate(reuters_txt)
result <- r$as.data.table()

## Not run: 
# this will NOT work with 512 MB heap space - 4 GB required
library(polmineR)
gparl_by_date <- corpus("GERMAPARL") %>%
  subset(year %in% 1998) %>%
  split(s_attribute = "date") %>% 
  get_token_stream(p_attribute = "word", collapse = " ") %>%
  as.character()
C <- AnnotationPipeline$new()
anno <- C$annotate(gparl_by_date, threads = 4L)
result <- anno$as.data.table()

## End(Not run)

PolMine/bignlp documentation built on Jan. 29, 2021, 1:14 a.m.
