Description
Generate an annotator which computes POS tag annotations using the Apache OpenNLP Maxent Part of Speech tagger.
Usage
Maxent_POS_Tag_Annotator(language = "en", probs = FALSE, model = NULL)
Arguments
language: a character string giving the ISO-639 code of the language being processed by the annotator.
probs: a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘POS_prob’ feature.
model: a character string giving the path to the Maxent model file to be used, or NULL, indicating that the default model file for the given language is used (if available, see Details).
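A minimal sketch of the three arguments in use, assuming the openNLP package and a working Java installation are available. The model path is a hypothetical placeholder, so that call is left commented out.

```r
library("openNLP")

## Default: English tagger using the default model for language "en".
pos_default <- Maxent_POS_Tag_Annotator()

## Same tagger, but each word annotation also gets a 'POS_prob' feature.
pos_probs <- Maxent_POS_Tag_Annotator(language = "en", probs = TRUE)

## Explicit model file instead of the language default.
## "/path/to/en-pos-maxent.bin" is a hypothetical placeholder path.
# pos_custom <- Maxent_POS_Tag_Annotator(model = "/path/to/en-pos-maxent.bin")
```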
Details
See http://opennlp.sourceforge.net/models-1.5/ for available model files. For languages other than English, these can conveniently be made available to R by installing the respective openNLPmodels.<language> package (for example, openNLPmodels.de for German) from the repository at https://datacube.wu.ac.at. For English, no additional installation is required.
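For a non-English language, the workflow might look like the sketch below. It assumes network access, that openNLPmodels.de is available from the datacube repository as a source package, and that it ships a Maxent POS model for German; adjust the package name for other languages.

```r
## Install the German model package from the datacube repository
## (assumes network access and that the package is offered as source).
install.packages("openNLPmodels.de",
                 repos = "https://datacube.wu.ac.at/", type = "source")

library("openNLP")

## With the model package installed, language = "de" selects its
## default POS model (assumption: the package includes one).
pos_de <- Maxent_POS_Tag_Annotator(language = "de")
pos_de
```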
Value
An Annotator object giving the generated POS tag annotator.
See Also
https://opennlp.apache.org for more information about Apache OpenNLP.
Examples
require("NLP")
## Provides the Maxent_*_Annotator generators used below.
library("openNLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
"nonexecutive director Nov. 29.\n",
"Mr. Vinken is chairman of Elsevier N.V., ",
"the Dutch publishing group."),
collapse = "")
s <- as.String(s)
## Need sentence and word token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
pos_tag_annotator <- Maxent_POS_Tag_Annotator()
pos_tag_annotator
a3 <- annotate(s, pos_tag_annotator, a2)
a3
## Variant with POS tag probabilities as (additional) features.
head(annotate(s, Maxent_POS_Tag_Annotator(probs = TRUE), a2))
## Determine the distribution of POS tags for word tokens.
a3w <- subset(a3, type == "word")
tags <- sapply(a3w$features, `[[`, "POS")
tags
table(tags)
## Extract token/POS pairs (all of them): easy.
sprintf("%s/%s", s[a3w], tags)
## Extract pairs of word tokens and POS tags for second sentence:
a3ws2 <- annotations_in_spans(subset(a3, type == "word"),
subset(a3, type == "sentence")[2L])[[1L]]
sprintf("%s/%s", s[a3ws2], sapply(a3ws2$features, `[[`, "POS"))
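Building on the probs = TRUE variant above, a short sketch that collects each word token, its tag, and the tag probability into a data frame, so that low-confidence tags are easy to spot. It assumes openNLP and Java are available and, to stay self-contained, re-tags only the second sentence of the example text.

```r
library("NLP")
library("openNLP")

## The second sentence of the example text.
s <- as.String("Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.")

## Sentence and word tokens first, then POS tags with probabilities.
a2 <- annotate(s, list(Maxent_Sent_Token_Annotator(),
                       Maxent_Word_Token_Annotator()))
a3p <- annotate(s, Maxent_POS_Tag_Annotator(probs = TRUE), a2)

## Keep word tokens and pull both features out of each feature list.
w <- subset(a3p, type == "word")
tagged <- data.frame(token = s[w],
                     tag = sapply(w$features, `[[`, "POS"),
                     prob = sapply(w$features, `[[`, "POS_prob"),
                     stringsAsFactors = FALSE)

## Tokens the model was least sure about come first.
tagged[order(tagged$prob), ]
```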
Loading required package: NLP
An annotator inheriting from classes
Simple_POS_Tag_Annotator Annotator
with description
Computes POS tag annotations using the Apache OpenNLP Maxent Part of
Speech tagger employing the default model for language 'en'
id type start end features
1 sentence 1 84 constituents=<<integer,18>>
2 sentence 86 153 constituents=<<integer,13>>
3 word 1 6 POS=NNP
4 word 8 13 POS=NNP
5 word 14 14 POS=,
6 word 16 17 POS=CD
7 word 19 23 POS=NNS
8 word 25 27 POS=JJ
9 word 28 28 POS=,
10 word 30 33 POS=MD
11 word 35 38 POS=VB
12 word 40 42 POS=DT
13 word 44 48 POS=NN
14 word 50 51 POS=IN
15 word 53 53 POS=DT
16 word 55 66 POS=JJ
17 word 68 75 POS=NN
18 word 77 80 POS=NNP
19 word 82 83 POS=CD
20 word 84 84 POS=.
21 word 86 88 POS=NNP
22 word 90 95 POS=NNP
23 word 97 98 POS=VBZ
24 word 100 107 POS=NN
25 word 109 110 POS=IN
26 word 112 119 POS=NNP
27 word 121 124 POS=NNP
28 word 125 125 POS=,
29 word 127 129 POS=DT
30 word 131 135 POS=JJ
31 word 137 146 POS=NN
32 word 148 152 POS=NN
33 word 153 153 POS=.
id type start end features
1 sentence 1 84 constituents=<<integer,18>>
2 sentence 86 153 constituents=<<integer,13>>
3 word 1 6 POS=NNP, POS_prob=0.9476405
4 word 8 13 POS=NNP, POS_prob=0.9692841
5 word 14 14 POS=,, POS_prob=0.9884445
6 word 16 17 POS=CD, POS_prob=0.9926943
[1] "NNP" "NNP" "," "CD" "NNS" "JJ" "," "MD" "VB" "DT" "NN" "IN"
[13] "DT" "JJ" "NN" "NNP" "CD" "." "NNP" "NNP" "VBZ" "NN" "IN" "NNP"
[25] "NNP" "," "DT" "JJ" "NN" "NN" "."
tags
, . CD DT IN JJ MD NN NNP NNS VB VBZ
3 2 2 3 2 3 1 5 7 1 1 1
[1] "Pierre/NNP" "Vinken/NNP" ",/," "61/CD"
[5] "years/NNS" "old/JJ" ",/," "will/MD"
[9] "join/VB" "the/DT" "board/NN" "as/IN"
[13] "a/DT" "nonexecutive/JJ" "director/NN" "Nov./NNP"
[17] "29/CD" "./." "Mr./NNP" "Vinken/NNP"
[21] "is/VBZ" "chairman/NN" "of/IN" "Elsevier/NNP"
[25] "N.V./NNP" ",/," "the/DT" "Dutch/JJ"
[29] "publishing/NN" "group/NN" "./."
[1] "Mr./NNP" "Vinken/NNP" "is/VBZ" "chairman/NN"
[5] "of/IN" "Elsevier/NNP" "N.V./NNP" ",/,"
[9] "the/DT" "Dutch/JJ" "publishing/NN" "group/NN"
[13] "./."