annotate: Annotate text strings


View source: R/annotate.R

Description

Compute annotations by calling each of the given annotators in turn with the given text and the current annotations, and merging each newly computed set of annotations into the current ones.
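The iterate-and-merge behavior described above can be sketched roughly as follows. This is an illustration only, not necessarily how the package implements it: annotate_sketch() and sent_annotator are hypothetical names introduced here, and the sketch assumes that a pipeline can be iterated over like a list and that merge() combines two Annotation objects.

```r
library(NLP)

## Hypothetical helper (not part of NLP): a sketch of the
## iterate-and-merge loop described in the Description.
annotate_sketch <- function(s, f, a = Annotation()) {
    s <- as.String(s)
    ## Coerce a single annotator or a list of annotators to a pipeline.
    f <- as.Annotator_Pipeline(f)
    for (annotator in f) {
        ## Each annotator sees the text and all annotations so far ...
        new <- annotator(s, a)
        ## ... and its result is merged into the running annotations.
        a <- merge(a, new)
    }
    a
}

## A trivial sentence annotator, using the same pattern as in the Examples.
sent_annotator <- Simple_Sent_Token_Annotator(function(s) {
    s <- as.String(s)
    m <- gregexpr("[^[:space:]][^.]*\\.", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
})

a_sketch <- annotate_sketch("  First sentence.  Second sentence.  ",
                            sent_annotator)
a_sketch
```

Because every annotator receives the annotations computed so far, order matters: a word token annotator that relies on sentence annotations has to come after the sentence token annotator.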

Usage

annotate(s, f, a = Annotation())

Arguments

s

a String object, or something coercible to one via as.String() (e.g., a character string with appropriate encoding information).

f

an Annotator or Annotator_Pipeline object, or something coercible to the latter via as.Annotator_Pipeline() (such as a list of annotator objects).

a

an Annotation object giving the annotations to start with.

Value

An Annotation object containing the iteratively computed and merged annotations.
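As a small sketch of how the returned Annotation object might be inspected: the components can be read with $, and subscripting the String object by the annotations extracts the annotated substrings. The names sent_annotator and sentences below are illustrative and not part of the package.

```r
library(NLP)

s <- String("  First sentence.  Second sentence.  ")

## Illustrative sentence annotator (same pattern as in the Examples).
sent_annotator <- Simple_Sent_Token_Annotator(function(s) {
    s <- as.String(s)
    m <- gregexpr("[^[:space:]][^.]*\\.", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
})

a <- annotate(s, sent_annotator)

## Components of the annotations: type and character positions.
a$type
a$start
a$end

## Subscript the text by the annotations to get the annotated substrings.
sentences <- s[a]
sentences
```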

Examples

## A simple text.
s <- String("  First sentence.  Second sentence.  ")
##           ****5****0****5****0****5****0****5**

## A very trivial sentence tokenizer.
sent_tokenizer <-
function(s) {
    s <- as.String(s)
    m <- gregexpr("[^[:space:]][^.]*\\.", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
}
## (Could also use Regexp_Tokenizer() with the above regexp pattern.)
## A simple sentence token annotator based on the sentence tokenizer.
sent_token_annotator <- Simple_Sent_Token_Annotator(sent_tokenizer)

## Annotate sentence tokens.
a1 <- annotate(s, sent_token_annotator)
a1

## A very trivial word tokenizer.
word_tokenizer <-
function(s) {
    s <- as.String(s)
    ## Remove the last character (should be a period when using
    ## sentences determined with the trivial sentence tokenizer).
    s <- substring(s, 1L, nchar(s) - 1L)
    ## Split on whitespace separators.
    m <- gregexpr("[^[:space:]]+", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
}
## A simple word token annotator based on the word tokenizer.
word_token_annotator <- Simple_Word_Token_Annotator(word_tokenizer)

## Annotate word tokens using the already available sentence token
## annotations.
a2 <- annotate(s, word_token_annotator, a1)
a2

## Can also perform sentence and word token annotations in a pipeline:
p <- Annotator_Pipeline(sent_token_annotator, word_token_annotator)
annotate(s, p)

Example output

 id type     start end features
  1 sentence     3  17 
  2 sentence    20  35 
 id type     start end features
  1 sentence     3  17 constituents=<<integer,2>>
  2 sentence    20  35 constituents=<<integer,2>>
  3 word         3   7 
  4 word         9  16 
  5 word        20  25 
  6 word        27  34 
 id type     start end features
  1 sentence     3  17 constituents=<<integer,2>>
  2 sentence    20  35 constituents=<<integer,2>>
  3 word         3   7 
  4 word         9  16 
  5 word        20  25 
  6 word        27  34 
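The examples mention that Regexp_Tokenizer() could be used instead of the hand-written sentence tokenizer, and the Arguments section notes that a list of annotators is coerced to a pipeline. The following sketch combines both. The word pattern "[^[:space:].]+" is an assumption made here so that periods are excluded from words; on this text it should give the same word spans as the hand-written word tokenizer, but it is not equivalent in general. The names ending in _re and a_list are illustrative.

```r
library(NLP)

s <- String("  First sentence.  Second sentence.  ")

## Sentence annotator built from the same regexp as sent_tokenizer.
sent_annotator_re <- Simple_Sent_Token_Annotator(
    Regexp_Tokenizer("[^[:space:]][^.]*\\."))

## Word annotator: runs of characters that are neither whitespace
## nor a period (assumed pattern, see the note above).
word_annotator_re <- Simple_Word_Token_Annotator(
    Regexp_Tokenizer("[^[:space:].]+"))

## A plain list of annotators is coerced to an Annotator_Pipeline.
a_list <- annotate(s, list(sent_annotator_re, word_annotator_re))
a_list

## Extract the text of the word annotations only.
words <- s[subset(a_list, type == "word")]
words
```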

NLP documentation built on Oct. 23, 2020, 6:18 p.m.
