Creation and manipulation of string objects.
x: a character vector with the appropriate encoding information for
String(); an arbitrary R object for as.String() and is.String().
String objects provide character strings encoded in UTF-8 with class
"String", which currently has a useful [ subscript method: with indices
i and j of length one, this gives a
string object with the substring starting at the position given by
i and ending at the position given by j. Subscripting
with a single index which is an object inheriting from class
"Span" or a list of such objects returns a character
vector of substrings with the respective spans, or a list thereof.
Additional methods may be added in the future.
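
The subscripting forms described above can be sketched as follows. This is a minimal illustration, not an exhaustive one: the sample text is the one used in the examples for this page, and the variable names first_word, second_word and both_sentences are made up for the sketch. It assumes the NLP package is installed.

```r
library("NLP")

## Sample text: two leading spaces, two spaces between the sentences,
## two trailing spaces (37 characters in total).
s <- String("  First sentence.  Second sentence.  ")

## Two length-one indices i and j: the substring from position 3 to 7.
first_word <- s[3, 7]

## A single Span index: the substring covered by that span.
second_word <- s[Span(9, 16)]

## A list of Span objects: a list of substrings, one per span.
both_sentences <- s[list(Span(3, 17), Span(20, 35))]
```

Positions are one-based character offsets and both ends are inclusive, so the span from 3 to 7 covers exactly the five characters of the first word.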
String() creates a string object from a given character vector,
taking the first element of the vector and converting it to UTF-8.
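
As a small sketch of this behavior, the following builds a string object from a two-element vector whose first element is declared as Latin-1. The input string is borrowed from the base R documentation of Encoding(); the variable names are illustrative only, and the NLP package is assumed to be installed.

```r
library("NLP")

## A Latin-1 encoded string, declared as such.
x <- "fa\xe7ile"
Encoding(x) <- "latin1"

## String() keeps only the first element and converts it to UTF-8.
s <- String(c(x, "ignored"))

## Inspect the result: number of elements and declared encoding.
n_elements <- length(s)
enc <- Encoding(unclass(s))
```

Inspecting n_elements and enc shows whether the second element was dropped and which encoding the stored string declares.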
as.String() is a generic function to coerce to a string object.
The default method calls
String() on the result of converting its argument
to character and concatenating the elements into a single string,
separated by newlines.
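
The difference from String() can be illustrated with a short sketch, assuming the NLP package is installed and that the default method is the one dispatched for plain character and integer vectors. The variable names are made up for the illustration.

```r
library("NLP")

txt <- c("First line.", "Second line.")

## The default method joins all elements with newlines ...
s <- as.String(txt)

## ... and converts non-character input to character first.
n <- as.String(1:3)

## Printing with writeLines() shows one element per line.
writeLines(s)
```

In contrast to String(), no element is dropped here: every element of the input ends up in the single resulting string.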
is.String() tests whether an object inherits from class "String".
For String() and as.String(), a string object (of class "String").
For is.String(), a logical.
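
A brief sketch of the class test and the return values, assuming the NLP package is installed. The variable names are illustrative only.

```r
library("NLP")

s <- String("some text")

## A string object passes the test; a plain character vector does not.
is_string_obj <- is.String(s)
is_plain_chr <- is.String("some text")

## as.String() also returns an object of class "String".
coerced_class <- class(as.String("some text"))
```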
library("NLP")

## A simple text. The double spaces matter: the character offsets used
## in the annotations below rely on them (see the ruler line).
s <- String("  First sentence.  Second sentence.  ")
##           ****5****0****5****0****5****0****5**

## Basic sentence and word token annotation for the text.
a <- c(Annotation(1 : 2,
                  rep.int("sentence", 2L),
                  c( 3L, 20L),
                  c(17L, 35L)),
       Annotation(3 : 6,
                  rep.int("word", 4L),
                  c( 3L,  9L, 20L, 27L),
                  c( 7L, 16L, 25L, 34L)))

## All word tokens (by subscripting with an annotation object):
s[a[a$type == "word"]]
## Word tokens according to sentence (by subscripting with a list of
## annotation objects):
s[annotations_in_spans(a[a$type == "word"], a[a$type == "sentence"])]