Annotation | R Documentation |
Creation and manipulation of annotation objects.
Annotation(id = NULL, type = NULL, start, end, features = NULL,
           meta = list())
as.Annotation(x, ...)
## S3 method for class 'Span'
as.Annotation(x, id = NULL, type = NULL, ...)
is.Annotation(x)
id | an integer vector giving the annotation ids, or NULL (the default), which leaves the ids missing. |
type | a character vector giving the annotation types, or NULL (the default), which leaves the types missing. |
start, end | integer vectors giving the start and end positions of the character spans the annotations refer to. |
features | a list of (named or empty) feature lists, or NULL (the default). |
meta | a named or empty list of annotation metadata tag-value pairs. |
x | an R object (an object of class "Span" for the as.Annotation() method for span objects). |
... | further arguments passed to or from other methods. |
A single annotation (of natural language text) is a quintuple with “slots” ‘id’, ‘type’, ‘start’, ‘end’, and ‘features’. These give, respectively, id and type, the character span the annotation refers to, and a collection of annotation features (tag/value pairs).
Annotation objects provide sequences (allowing positional access) of single annotations, together with metadata about these. They have class "Annotation" and, as they contain character spans, also inherit from class "Span". Span objects can be coerced to annotation objects via as.Annotation(), which allows ids and types to be specified (the default values set these to missing), and annotation objects can be coerced to span objects using as.Span().
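The coercions between span and annotation objects described above can be sketched as follows. This is a minimal illustration that assumes the NLP package is attached; the positions, ids, and types are made up for the example.

```r
library(NLP)

## Two character spans (positions chosen for illustration only).
sp <- Span(c(3L, 20L), c(17L, 35L))

## Coerce to annotations, supplying ids and types explicitly.
a <- as.Annotation(sp, id = 1:2, type = rep.int("sentence", 2L))

## Leaving id and type at their NULL defaults gives missing values.
b <- as.Annotation(sp)

## Coerce back to a plain span object.
sp2 <- as.Span(a)
```

Here `a` and `b` cover the same spans and differ only in whether ids and types are filled in.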
The features of a single annotation are represented as named or empty lists.
Subscripting annotation objects via [ extracts subsets of annotations; subscripting via $ extracts the sequence of values of the named slot, i.e., an integer vector for ‘id’, ‘start’, and ‘end’, a character vector for ‘type’, and a list of named or empty lists for ‘features’.
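As a small sketch of the result types involved (assuming the NLP package is attached; the annotation values, including the "pos" feature tag, are invented for illustration):

```r
library(NLP)

a <- Annotation(1:3,
                c("sentence", "word", "word"),
                c(1L, 1L, 7L),
                c(11L, 5L, 11L),
                list(list(), list(pos = "NN"), list(pos = "NN")))

## '[' returns an annotation object holding the selected annotations.
w <- a[a$type == "word"]

## '$' returns the values of one slot across all annotations.
ids   <- a$id        # integer vector
types <- a$type      # character vector
feats <- a$features  # list of named or empty lists
```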
There are several additional methods for class "Annotation": print() and format() (both of which have a values argument that, if FALSE, suppresses the display of the feature map values); c() combines annotations (or objects coercible to these using as.Annotation()); merge() merges annotations by combining the feature lists of annotations with otherwise identical slots; subset() allows subsetting by expressions involving the slot names; and as.list() and as.data.frame() coerce, respectively, to lists (of single annotation objects) and data frames (with annotations and slots corresponding to rows and columns).
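The coercion and printing methods can be tried on a small object. The sketch below assumes the NLP package is attached and uses made-up annotations; it is meant only to show the shape of the results.

```r
library(NLP)

a <- Annotation(1:2,
                rep.int("word", 2L),
                c(1L, 7L),
                c(5L, 11L),
                list(list(pos = "JJ"), list(pos = "NN")))

## One row per annotation, one column per slot.
df <- as.data.frame(a)

## A plain list whose elements are single-annotation objects.
al <- as.list(a)

## Print without showing the feature values.
print(a, values = FALSE)
```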
Annotation() creates annotation objects from the given sequences of slot values: those not NULL must all have the same length (the number of annotations in the object).
as.Annotation() coerces to annotation objects, with a method for span objects.
is.Annotation() tests whether an object inherits from class "Annotation".
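A brief sketch of the constructor and the class test, assuming the NLP package is attached (all values are illustrative):

```r
library(NLP)

## Only start and end are required; id, type and features default to NULL.
a <- Annotation(start = c(1L, 7L), end = c(5L, 11L))

## When given, the other slot vectors have the same length as start and end.
b <- Annotation(id = 1:2, type = c("word", "word"),
                start = c(1L, 7L), end = c(5L, 11L))

is.Annotation(b)
is.Annotation(Span(1L, 5L))   # a plain span object is not an annotation
```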
For Annotation() and as.Annotation(), an annotation object (of class "Annotation", also inheriting from class "Span").
For is.Annotation(), a logical.
## A simple text.
s <- String("  First sentence.  Second sentence.  ")
##           ****5****0****5****0****5****0****5**
## Basic sentence and word token annotations for the text.
a1s <- Annotation(1 : 2,
                  rep.int("sentence", 2L),
                  c( 3L, 20L),
                  c(17L, 35L))
a1w <- Annotation(3 : 6,
                  rep.int("word", 4L),
                  c( 3L,  9L, 20L, 27L),
                  c( 7L, 16L, 25L, 34L))
## Use c() to combine these annotations:
a1 <- c(a1s, a1w)
a1
## Subscripting via '[':
a1[3 : 4]
## Subscripting via '$':
a1$type
## Subsetting according to slot values, directly:
a1[a1$type == "word"]
## or using subset():
subset(a1, type == "word")
## We can subscript string objects by annotation objects to extract the
## annotated substrings:
s[subset(a1, type == "word")]
## We can also subscript by lists of annotation objects:
s[annotations_in_spans(subset(a1, type == "word"),
                       subset(a1, type == "sentence"))]
## Suppose we want to add the sentence constituents (the ids of the
## words in the respective sentences) to the features of the sentence
## annotations. The basic computation is
lapply(annotations_in_spans(a1[a1$type == "word"],
                            a1[a1$type == "sentence"]),
       function(a) a$id)
## For annotations, we need lists of feature lists:
features <-
    lapply(annotations_in_spans(a1[a1$type == "word"],
                                a1[a1$type == "sentence"]),
           function(e) list(constituents = e$id))
## Could add these directly:
a2 <- a1
a2$features[a2$type == "sentence"] <- features
a2
## Note how the print() method summarizes the features.
## We could also write a sentence constituent annotator
## (note that annotators should always have formals 's' and 'a', even
## though for computing the sentence constituents s is not needed):
sent_constituent_annotator <-
    Annotator(function(s, a) {
        i <- which(a$type == "sentence")
        features <-
            lapply(annotations_in_spans(a[a$type == "word"],
                                        a[i]),
                   function(e) list(constituents = e$id))
        Annotation(a$id[i], a$type[i], a$start[i], a$end[i],
                   features)
    })
sent_constituent_annotator(s, a1)
## Can use merge() to merge the annotations:
a2 <- merge(a1, sent_constituent_annotator(s, a1))
a2
## Equivalently, could have used
a2 <- annotate(s, sent_constituent_annotator, a1)
a2
## which merges automatically.