StanfordCoreNLP_Pipeline: Stanford 'CoreNLP' annotator pipeline

In NLPclient: Stanford 'CoreNLP' Annotation Client


View source: R/pipeline.R

Create a Stanford CoreNLP annotator pipeline.


StanfordCoreNLP_Pipeline(annotators = c("pos", "lemma"),
  language = "en", control = list(), port = 9000L,
  host = "localhost")

`annotators`	a character vector specifying the annotators to be used in addition to ‘ssplit’ (sentence token annotation) and ‘tokenize’ (word token annotation), with elements `"pos"` (POS tagging), `"lemma"` (lemmatizing), `"ner"` (named entity recognition), `"regexner"` (rule-based named entity recognition over token sequences using Java regular expressions), `"parse"` (constituency parsing), `"depparse"` (dependency parsing), `"sentiment"` (sentiment analysis), `"coref"` (coreference resolution), `"dcoref"` (deterministic coreference resolution), `"cleanxml"` (clean XML tags), or `"relation"` (relation extraction), or unique abbreviations thereof. Ignored for languages other than English.
`language`	a character string giving the ISO-639 code of the language being processed by the annotator pipeline.
`control`	a named or empty (default) list vector with annotator control options, with the names giving the option names. See https://stanfordnlp.github.io/CoreNLP/annotators.html for available control options.
`port`	an integer giving the port (default is `9000L`).
`host`	a character string giving the hostname of the server.
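
The `annotators` argument accepts unique abbreviations of the annotator names. The following is a minimal sketch of how such abbreviations resolve, using base R partial matching with `pmatch()`. The helper `resolve_annotators` is hypothetical and only illustrates the idea; it is not necessarily how the package itself matches names.

```r
## Annotator names as listed in the argument description above.
known_annotators <- c("pos", "lemma", "ner", "regexner", "parse",
                      "depparse", "sentiment", "coref", "dcoref",
                      "cleanxml", "relation")

## Hypothetical helper (not part of NLPclient): expand unique
## abbreviations to full annotator names, failing on ambiguous or
## unknown input.
resolve_annotators <- function(x, table = known_annotators) {
    idx <- pmatch(x, table, duplicates.ok = TRUE)
    if (anyNA(idx))
        stop("ambiguous or unknown annotator(s): ",
             paste(x[is.na(idx)], collapse = ", "))
    table[idx]
}

## "sent" and "par" each match exactly one name.
resolve_annotators(c("sent", "par"))

## "re" matches both "regexner" and "relation", so it is rejected.
try(resolve_annotators("re"), silent = TRUE)
```

Exact names always resolve to themselves, so `"pos"` is accepted even though `"p"` alone would be ambiguous between `"pos"` and `"parse"`.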

An Annotator object providing the annotator pipeline.

See https://stanfordnlp.github.io/CoreNLP/#citing-stanford-corenlp-in-papers for information on citing Stanford CoreNLP.

Using the parse annotator requires a considerable amount of (Java) memory. The Stanford CoreNLP documentation suggests starting the JVM with at least 3GB of memory on 64-bit systems (and, in fact, not using 32-bit systems at all). Hence, we have the JVM started with -Xmx3g unless option java.parameters is set to something non-empty; this option should be set appropriately to accommodate different memory requirements or constraints.

Using the coreference annotators nowadays requires even more (Java) memory. The Stanford CoreNLP documentation suggests starting the JVM with at least 5GB of memory; we find 4GB sufficient. Hence, to use these annotators one needs to set option java.parameters as appropriate before starting the JVM.
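
The memory notes above depend on option java.parameters being set before the JVM starts. A minimal sketch follows, assuming the JVM is launched from the R session as the notes describe. The helper `set_jvm_heap` is hypothetical and not part of NLPclient. If the CoreNLP server runs as a separate process, its heap size has to be given when that process is launched instead.

```r
## Hypothetical helper (not part of NLPclient): request a JVM heap size
## via option java.parameters, but only if the option is currently
## unset or empty, so that an existing user setting is respected.
set_jvm_heap <- function(heap = "4g") {
    current <- getOption("java.parameters")
    if (is.null(current) || !any(nzchar(current)))
        options(java.parameters = paste0("-Xmx", heap))
    invisible(getOption("java.parameters"))
}

## Call this before anything starts the JVM; 4GB is the amount the
## note above reports as sufficient for the coreference annotators.
set_jvm_heap("4g")
getOption("java.parameters")
```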

See https://stanfordnlp.github.io/CoreNLP/ for more information about the Stanford CoreNLP tools.

require("NLP")
library("NLPclient")  # provides ping_nlp_client() and StanfordCoreNLP_Pipeline()
s <- as.String(paste("Stanford University is located in California.",
                     "It is a great university."))
s

## Run the examples only if a CoreNLP server answers on the default
## host and port.
if (ping_nlp_client() == "pong") {
    ## Annotators: ssplit, tokenize:
    p <- StanfordCoreNLP_Pipeline(NULL)
    a <- p(s)
    a

    ## Annotators: ssplit, tokenize, pos, lemma (default):
    p <- StanfordCoreNLP_Pipeline()
    a <- p(s)
    a

    ## Equivalently:
    annotate(s, p)

    ## Annotators: ssplit, tokenize, parse:
    p <- StanfordCoreNLP_Pipeline("parse")
    a <- p(s)
    a

    ## Respective formatted parse trees using Penn Treebank notation
    ## (see <https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html>):
    ptexts <- sapply(subset(a, type == "sentence")$features, `[[`, "parse")
    ptexts

    ## Read into NLP Tree objects.
    ptrees <- lapply(ptexts, Tree_parse)
    ptrees

    ## Basic dependencies:
    depends <- lapply(subset(a, type == "sentence")$features, `[[`,
                      "basic-dependencies")
    depends
    ## Note that the non-zero ids (gid for governor and did for dependent)
    ## refer to word token positions within the respective sentences, and
    ## not the ids of these tokens in the annotation: these can easily be
    ## matched using the sentence constituents features:
    lapply(subset(a, type == "sentence")$features, `[[`, "constituents")

    ## (Similarly for sentence ids used in dcoref document features.)

    ## Note also that the dependencies are returned as a data frame
    ## inheriting from class "Stanford_typed_dependencies" which has print
    ## and format methods for obtaining the usual formatting.
    depends[[1L]]
    ## Use as.data.frame() to strip this class:
    as.data.frame(depends[[1L]])
}
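
The comments in the example state that non-zero gid and did values are word token positions within a sentence, and that the sentence's constituents feature gives the matching annotation ids. A minimal sketch of that lookup follows. It uses toy values rather than real CoreNLP output, and `position_to_id` is a hypothetical helper, not part of NLPclient.

```r
## Toy stand-ins (hypothetical values, not real CoreNLP output):
## annotation ids of the word tokens that make up one sentence ...
constituents <- c(3L, 4L, 5L, 6L, 7L, 8L, 9L)
## ... and a few dependencies given as within-sentence positions,
## where position 0 marks the artificial root.
deps <- data.frame(type = c("root", "compound", "nsubjpass"),
                   gid = c(0L, 2L, 4L),
                   did = c(4L, 1L, 2L))

## Hypothetical helper: map within-sentence positions to annotation
## ids, leaving NA where the position is 0 (root has no token).
position_to_id <- function(pos, constituents) {
    ids <- rep(NA_integer_, length(pos))
    ok <- pos > 0L
    ids[ok] <- constituents[pos[ok]]
    ids
}

deps$governor_id <- position_to_id(deps$gid, constituents)
deps$dependent_id <- position_to_id(deps$did, constituents)
deps
```

With real output, `constituents` would be taken from the features of the sentence annotation that the dependencies belong to.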

Loading required package: NLP
Stanford University is located in California. It is a great university.

NLPclient documentation built on Dec. 16, 2019, 1:18 a.m.
