Generate an annotator which computes word token annotations using the Apache OpenNLP Maxent tokenizer.
Maxent_Word_Token_Annotator(language = "en", probs = FALSE, model = NULL)
language
    a character string giving the ISO-639 code of the language being processed by the annotator.
probs
    a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘prob’ feature.
model
    a character string giving the path to the Maxent model file to be used, or NULL (the default) to use the default model for the given language.
See http://opennlp.sourceforge.net/models-1.5/ for available model files. For languages other than English, these can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at. For English, no additional installation is required.
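As a minimal sketch of the installation route described above, the following assumes German ("de") is the language of interest, so the package name openNLPmodels.de simply follows the openNLPmodels.language pattern; whether a model package exists for a given language should be checked in the repository. The model file path in the last comment is hypothetical.

```r
## Assumption: German ("de") is the target language; the package name
## follows the openNLPmodels.language pattern from the text above.
if (!requireNamespace("openNLPmodels.de", quietly = TRUE)) {
    ## type = "source" assumes the repository serves source packages.
    install.packages("openNLPmodels.de",
                     repos = "https://datacube.wu.ac.at",
                     type = "source")
}

library("NLP")
library("openNLP")

## With the model package installed, the language code selects the model.
word_token_annotator_de <- Maxent_Word_Token_Annotator(language = "de")

## Alternatively, point at a downloaded model file directly
## (hypothetical path, shown for illustration only):
## Maxent_Word_Token_Annotator(model = "/path/to/de-token.bin")
```

For English, none of this is needed, since the default model is available without additional installation.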
An Annotator object giving the generated word token annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
require("NLP")
## The annotator generators used below come from openNLP.
require("openNLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
## Need sentence token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
a1 <- annotate(s, sent_token_annotator)
word_token_annotator <- Maxent_Word_Token_Annotator()
word_token_annotator
a2 <- annotate(s, word_token_annotator, a1)
a2
## Variant with word token probabilities as features.
head(annotate(s, Maxent_Word_Token_Annotator(probs = TRUE), a1))
## Can also perform sentence and word token annotations in a pipeline:
a <- annotate(s, list(sent_token_annotator, word_token_annotator))
head(a)
Loading required package: NLP
An annotator inheriting from classes
Simple_Word_Token_Annotator Annotator
with description
Computes word token annotations using the Apache OpenNLP Maxent
tokenizer employing the default model for language 'en'.
id type start end features
1 sentence 1 84 constituents=<<integer,18>>
2 sentence 86 153 constituents=<<integer,13>>
3 word 1 6
4 word 8 13
5 word 14 14
6 word 16 17
7 word 19 23
8 word 25 27
9 word 28 28
10 word 30 33
11 word 35 38
12 word 40 42
13 word 44 48
14 word 50 51
15 word 53 53
16 word 55 66
17 word 68 75
18 word 77 80
19 word 82 83
20 word 84 84
21 word 86 88
22 word 90 95
23 word 97 98
24 word 100 107
25 word 109 110
26 word 112 119
27 word 121 124
28 word 125 125
29 word 127 129
30 word 131 135
31 word 137 146
32 word 148 152
33 word 153 153
id type start end features
1 sentence 1 84 constituents=<<integer,18>>
2 sentence 86 153 constituents=<<integer,13>>
3 word 1 6 prob=1
4 word 8 13 prob=0.9770575
5 word 14 14 prob=1
6 word 16 17 prob=1
id type start end features
1 sentence 1 84 constituents=<<integer,18>>
2 sentence 86 153 constituents=<<integer,13>>
3 word 1 6
4 word 8 13
5 word 14 14
6 word 16 17
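The annotations printed above only give character offsets. As a hedged sketch of how the token strings and their ‘prob’ features might be pulled out, the following subsets the annotation to word tokens and indexes the String object with the result; the variable names (words, tokens, probs) are illustrative and not part of the package.

```r
library("NLP")
library("openNLP")

s <- as.String("Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.")

## Sentence and word token annotations in one pipeline,
## with token probabilities stored as the 'prob' feature.
a <- annotate(s, list(Maxent_Sent_Token_Annotator(),
                      Maxent_Word_Token_Annotator(probs = TRUE)))

## Keep only the word annotations (drops the sentence annotations).
words <- subset(a, type == "word")

## Indexing a String with annotations returns the covered substrings.
tokens <- s[words]

## Each word annotation carries a feature list; extract its 'prob' entry.
probs <- sapply(words$features, `[[`, "prob")

data.frame(token = tokens, prob = probs)
```

Subsetting to type "word" first matters here because the sentence annotations carry a ‘constituents’ feature rather than ‘prob’.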