Worker behind the higher-level StanfordCoreNLP class that allows fine-tuned configuration of an annotation pipeline; see the documentation of CoreNLP Pipelines. The $annotate() method supports processing annotations in parallel. Unlike the StanfordCoreNLP$process_files() method, which processes the content of files in parallel, it is a very efficient in-memory operation and the fastest option for processing medium-sized corpora. However, annotations consume a lot of memory, so the heap space that can be allocated limits the parallel in-memory processing of larger corpora. If heap space is insufficient, the process may run indefinitely without an informative warning or error message. Use the $annotate() method with appropriate care.
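Because the heap available to a JVM is fixed when the JVM starts, heap space has to be requested before any annotation is run. The following is a minimal sketch, assuming that bignlp starts its JVM through rJava when the package is loaded (an assumption not stated on this page) and that 4 GB is an adequate size; the value is illustrative only.

```r
# Assumption: the JVM is started via rJava when bignlp is loaded, so the
# maximum heap size must be set before the package is attached.
# "-Xmx4g" requests a maximum heap of 4 GB (illustrative value).
options(java.parameters = "-Xmx4g")
library(bignlp)

# Pipelines created afterwards run inside the JVM configured above.
A <- AnnotationPipeline$new()
```

Adjust the value to the size of the corpus and the memory available on the machine; changing the option after the JVM has started has no effect on the running session.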
pipeline
AnnotationPipeline
new()
Initialize AnnotationPipeline
AnnotationPipeline$new(corenlp_dir = getOption("bignlp.corenlp_dir"))
corenlp_dir
Directory where StanfordCoreNLP resides.
annotate()
Annotate a list of strings.
AnnotationPipeline$annotate(x, threads = NULL, verbose = TRUE)
x
A list of character vectors to annotate, an AnnotationList class object, or an ArrayList with Annotation objects.
threads
If NULL, all available threads are used; otherwise an integer value giving the number of threads to use.
verbose
A logical value, whether to show progress messages.
A Java object.
clone()
The objects of this class are cloneable with this method.
AnnotationPipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Annotate a short character vector and convert the result to a data.table
A <- AnnotationPipeline$new()
a <- c("This is a sentence.", "Yet another sentence.")
s <- A$annotate(a)
result <- s$as.data.table()

# Annotate the sample Reuters text shipped with the package
reuters_txt <- readLines(system.file(package = "bignlp", "extdata", "txt", "reuters.txt"))
B <- AnnotationPipeline$new()
r <- B$annotate(reuters_txt)
result <- r$as.data.table()

## Not run: 
# this will NOT work with 512 MB heap space - 4 GB required
library(polmineR)
# One string per day of parliamentary debate in 1998
gparl_by_date <- corpus("GERMAPARL") %>%
  subset(year %in% 1998) %>%
  split(s_attribute = "date") %>%
  get_token_stream(p_attribute = "word", collapse = " ") %>%
  as.character()
C <- AnnotationPipeline$new()
# Use four threads for parallel annotation
anno <- C$annotate(gparl_by_date, threads = 4L)
result <- anno$as.data.table()
## End(Not run)