Annotator | R Documentation |
Create annotator (pipeline) objects.
Annotator(f, meta = list(), classes = NULL)
Annotator_Pipeline(..., meta = list())
as.Annotator_Pipeline(x)
f |
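an annotator function, which must have formals s and a giving, respectively, the string with the natural language text to annotate and an annotation object to start from, and must return an annotation object with the computed annotations. |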
meta |
an empty or named list of annotator (pipeline) metadata tag-value pairs. |
classes |
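a character vector or NULL (default) giving classes to be used for the created annotator object in addition to "Annotator". |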
... |
annotator objects. |
x |
an R object. |
Annotator() checks that the given annotator function has the
appropriate formals, and returns an annotator object which inherits
from the given classes and "Annotator". There are print() and
format() methods for such objects, which use the description element
of the metadata if available.
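The following is a minimal sketch of the behaviour described above, assuming the package providing Annotator() is attached as NLP. The names noop_annotator and "Noop_Annotator" are hypothetical, and the last step assumes that the formals check signals an ordinary R error.

```r
library(NLP)

## Hypothetical toy annotator: ignores the text and adds no annotations.
noop_annotator <-
    Annotator(function(s, a = Annotation()) Annotation(),
              meta = list(description = "A toy annotator that adds nothing."),
              classes = "Noop_Annotator")

## The object inherits from the extra class and from "Annotator".
inherits(noop_annotator, "Noop_Annotator")
inherits(noop_annotator, "Annotator")

## format() gives a character representation; the description from
## the metadata is expected to appear in it.
format(noop_annotator)

## A function whose formals are not s and a is assumed to be rejected
## with an error, which is captured here as a message string.
msg <- tryCatch(Annotator(function(x) x),
                error = function(e) conditionMessage(e))
msg
```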
Annotator_Pipeline() creates an annotator pipeline object from the
given annotator objects. Such pipeline objects can be used by
annotate() for successively computing and merging annotations, and
can also be obtained by coercion with as.Annotator_Pipeline(), which
currently handles annotator objects and lists of such (and of course,
annotator pipeline objects).
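As an illustration of successive annotation, here is a small sketch that builds a two-step pipeline, again assuming the package is attached as NLP. The helper make_token_annotator() and the type label "word" are hypothetical and exist only for this example; blankline_tokenizer() and whitespace_tokenizer() are tokenizers shipped with the package.

```r
library(NLP)

## Hypothetical helper: wrap a span tokenizer into an annotator that
## labels every span it finds with the given type.
make_token_annotator <- function(tokenizer, type) {
    force(tokenizer)
    force(type)
    Annotator(function(s, a = Annotation()) {
        spans <- tokenizer(s)
        n <- length(spans)
        ## Continue numbering after the ids already present in a.
        Annotation(seq(from = next_id(a$id), length.out = n),
                   rep.int(type, n),
                   spans$start,
                   spans$end)
    })
}

para_annotator <- make_token_annotator(blankline_tokenizer, "paragraph")
word_annotator <- make_token_annotator(whitespace_tokenizer, "word")

## Build the pipeline directly, or by coercing a list of annotators.
pipeline <- Annotator_Pipeline(para_annotator, word_annotator)
pipeline2 <- as.Annotator_Pipeline(list(para_annotator, word_annotator))

s <- String("First sentence. Second sentence.\n\nSecond paragraph.")

## annotate() runs the annotators in order and merges their results.
a <- annotate(s, pipeline)
a
table(a$type)
```

Because each annotator receives the annotations computed so far, the second step can use next_id() to pick ids that do not clash with those assigned by the first.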
For Annotator(), an annotator object inheriting from the given
classes and class "Annotator".
For Annotator_Pipeline() and as.Annotator_Pipeline(), an annotator
pipeline object inheriting from class "Annotator_Pipeline".
Simple annotator generators for creating “simple” annotator objects based on functions performing basic NLP tasks.
Package StanfordCoreNLP, available from the repository at https://datacube.wu.ac.at, which provides generators for annotator pipelines based on the Stanford CoreNLP tools.
## Use blankline_tokenizer() for a simple paragraph token annotator:
para_token_annotator <-
    Annotator(function(s, a = Annotation()) {
                  spans <- blankline_tokenizer(s)
                  n <- length(spans)
                  ## Need n consecutive ids, starting with the next "free"
                  ## one:
                  from <- next_id(a$id)
                  Annotation(seq(from = from, length.out = n),
                             rep.int("paragraph", n),
                             spans$start,
                             spans$end)
              },
              list(description =
                   "A paragraph token annotator based on blankline_tokenizer()."))
para_token_annotator
## Alternatively, use Simple_Para_Token_Annotator().
## A simple text with two paragraphs:
s <- String(paste(" First sentence. Second sentence. ",
                  " Second paragraph. ",
                  sep = "\n\n"))
a <- annotate(s, para_token_annotator)
## Annotations for paragraph tokens.
a
## Extract paragraph tokens.
s[a]