Maxent_Sent_Token_Annotator: Apache OpenNLP based sentence token annotators


View source: R/sentdetect.R

Description

Generate an annotator which computes sentence annotations using the Apache OpenNLP Maxent sentence detector.

Usage

Maxent_Sent_Token_Annotator(language = "en", probs = FALSE, model = NULL)

Arguments

language

a character string giving the ISO-639 code of the language being processed by the annotator.

probs

a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘prob’ feature.

model

a character string giving the path to the Maxent model file to be used, or NULL to use the default model file for the given language (if available; see Details).
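The following is a minimal sketch of how these arguments might be combined, assuming the NLP and openNLP packages and a working Java runtime are installed. The sample text is made up for illustration, and the path in the commented-out call is hypothetical.

```r
library("NLP")
library("openNLP")

## Illustrative text, not taken from the package documentation.
s <- as.String("Mr. Vinken is chairman of Elsevier N.V. He joined the board in November.")

## Default English model, with probabilities stored in the 'prob' feature.
ann <- Maxent_Sent_Token_Annotator(language = "en", probs = TRUE)
a <- annotate(s, ann)

## One probability per detected sentence.
probs <- sapply(a$features, `[[`, "prob")
probs

## Hypothetical path: point 'model' at a model file you obtained yourself.
## ann_custom <- Maxent_Sent_Token_Annotator(model = "/path/to/en-sent.bin")
```

With probs = FALSE (the default), the features column of the result is simply left empty.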

Details

See http://opennlp.sourceforge.net/models-1.5/ for available model files. For languages other than English, these can conveniently be made available to R by installing the respective openNLPmodels.language package (with language replaced by the language code, e.g. openNLPmodels.de) from the repository at https://datacube.wu.ac.at. For English, no additional installation is required.
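As a rough sketch of the installation route described above, the snippet below assumes German as the target language, so the package name openNLPmodels.de and the sample text are illustrative choices rather than part of the original documentation. The install call is left commented out because it needs network access, and the annotator is only built if the model package is present.

```r
## Assumption: German is the target language; adjust the suffix for others.
## install.packages("openNLPmodels.de",
##                  repos = "https://datacube.wu.ac.at/",
##                  type = "source")

library("NLP")
library("openNLP")

if (requireNamespace("openNLPmodels.de", quietly = TRUE)) {
    ## Illustrative German text.
    s_de <- as.String("Das ist ein Satz. Hier kommt noch einer.")
    a_de <- annotate(s_de, Maxent_Sent_Token_Annotator(language = "de"))
    ## Extract the detected sentences.
    s_de[a_de]
}
```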

Value

An Annotator object giving the generated sentence token annotator.
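Because the returned object is an ordinary Annotator, it can be placed in an annotation pipeline with other annotators. The sketch below assumes the default English models are available and pairs it with Maxent_Word_Token_Annotator() from the same package; the sample text and variable names are illustrative.

```r
library("NLP")
library("openNLP")

## Illustrative text.
s <- as.String("Pierre Vinken will join the board. Mr. Vinken is chairman of Elsevier N.V.")

sent_ann <- Maxent_Sent_Token_Annotator()
word_ann <- Maxent_Word_Token_Annotator()

## Annotators run in order: the word tokenizer needs sentence annotations.
a <- annotate(s, list(sent_ann, word_ann))

## Split the result by annotation type.
sentences <- subset(a, type == "sentence")
words <- subset(a, type == "word")
s[sentences]
```

Running the sentence annotator first matters here, since the word token annotator works on the sentence spans it finds in the existing annotation.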

See Also

https://opennlp.apache.org for more information about Apache OpenNLP.

Examples

## Maxent_Sent_Token_Annotator() is provided by the openNLP package.
library("openNLP")
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)

sent_token_annotator <- Maxent_Sent_Token_Annotator()
sent_token_annotator
a1 <- annotate(s, sent_token_annotator)
a1
## Extract sentences.
s[a1]
## Variant with sentence probabilities as features.
annotate(s, Maxent_Sent_Token_Annotator(probs = TRUE))

Example output

OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
Loading required package: NLP
An annotator inheriting from classes
  Simple_Sent_Token_Annotator Annotator
with description
  Computes sentence annotations using the Apache OpenNLP Maxent
  sentence detector employing the default model for language 'en'.
 id type     start end features
  1 sentence     1  84 
  2 sentence    86 153 
[1] "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29."
[2] "Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group."                
 id type     start end features
  1 sentence     1  84 prob=0.9998197
  2 sentence    86 153 prob=0.9968879
Warning message:
system call failed: Cannot allocate memory 

openNLP documentation built on Oct. 30, 2019, 11:37 a.m.