AnnotatedPlainTextDocument: Annotated Plain Text Documents
In NLP: Natural Language Processing Infrastructure

Description

Create annotated plain text documents from plain text and collections of annotations for this text.

Usage

AnnotatedPlainTextDocument(s, a, meta = list())
annotation(x)

Arguments

`s`	a `String` object, or something coercible to this using `as.String()` (e.g., a character string with appropriate encoding information).
`a`	an `Annotation` object with annotations for `s`.
`meta`	a named or empty list of document metadata tag-value pairs.
`x`	an object inheriting from class `"AnnotatedPlainTextDocument"`.

Details

Annotated plain text documents combine plain text with annotations for the text.

A typical workflow is to use annotate() with suitable annotator pipelines to obtain the annotations, and then use AnnotatedPlainTextDocument() to combine these with the text being annotated. This yields an object inheriting from "AnnotatedPlainTextDocument" and "TextDocument", from which the text and annotations can be obtained using, respectively, as.character() and annotation().
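The workflow above can be sketched with the simple annotators that ship with package NLP. This is a minimal illustration, not a recommended pipeline: the two tokenizers `sent_tokenizer` and `word_tokenizer` are deliberately trivial helpers defined here for the example (they are not part of the package), and the sample text is made up.

```r
library(NLP)

## Text to annotate (made-up sample).
s <- as.String("First sentence.  Second sentence.")

## Toy sentence tokenizer (illustrative only): a sentence runs from a
## non-space character up to and including the next period.
sent_tokenizer <- function(s) {
    s <- as.String(s)
    m <- gregexpr("[^[:space:]][^.]*\\.", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
}

## Toy word tokenizer (illustrative only): drop the final period of the
## sentence, then split on whitespace.
word_tokenizer <- function(s) {
    s <- as.String(s)
    s <- substring(s, 1L, nchar(s) - 1L)
    m <- gregexpr("[^[:space:]]+", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
}

## Annotator pipeline: sentences first, then words within sentences.
pipeline <- list(Simple_Sent_Token_Annotator(sent_tokenizer),
                 Simple_Word_Token_Annotator(word_tokenizer))

## Obtain the annotations and combine them with the text.
a <- annotate(s, pipeline)
d <- AnnotatedPlainTextDocument(s, a)

## Recover the text and the annotations from the document.
as.character(d)
annotation(d)

## Structured views that only need sentence and word annotations.
words(d)
sents(d)
```

The word annotator is placed after the sentence annotator because it tokenizes within the sentences already annotated. Views such as tagged_sents() or parsed_sents() would need POS or parse annotations, which these simple annotators do not provide.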

There are methods for class "AnnotatedPlainTextDocument" and generics words(), sents(), paras(), tagged_words(), tagged_sents(), tagged_paras(), chunked_sents(), parsed_sents() and parsed_paras() providing structured views of the text in such documents. These all require the necessary annotations to be available in the annotation object used.

The methods for generics tagged_words(), tagged_sents() and tagged_paras() provide a mechanism for mapping POS tags via the map argument; see section Details in the help page for tagged_words() for more information. The POS tagset used is inferred from the POS_tagset metadata element of the annotation object used.
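As a rough sketch of POS tag mapping, the following uses the pre-built document bundled with package NLP. It assumes that meta() returns the annotation metadata for an Annotation object and that map may also be given as a function applied to the tags, as described in the help page for tagged_words(). The mapper `first_letter` is a hypothetical helper written only for this illustration.

```r
library(NLP)

## Pre-built annotated document shipped with package NLP.
doc <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))

## The tagset recorded in the annotation metadata; this is what a
## named list of maps such as Universal_POS_tags_map is matched against.
tagset <- meta(annotation(doc), "POS_tagset")
tagset

## Original tags, then tags mapped to the universal tagset.
fine <- tagged_words(doc)
universal <- tagged_words(doc, map = Universal_POS_tags_map)

## Hypothetical custom mapper (assumption: a function is accepted as
## 'map'): collapse each tag to its first letter.
first_letter <- function(pos) substr(pos, 1L, 1L)
coarse <- tagged_words(doc, map = first_letter)

fine
universal
coarse
```

A function mapper of this kind does not depend on the POS_tagset metadata, whereas a named list of maps is looked up by tagset name.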

Value

For AnnotatedPlainTextDocument(), an annotated plain text document object inheriting from "AnnotatedPlainTextDocument" and "TextDocument".

For annotation(), an Annotation object.

Examples

## Use a pre-built annotated plain text document obtained by employing an
## annotator pipeline from package 'StanfordCoreNLP', available from the
## repository at <https://datacube.wu.ac.at>, using the following code:
##   require("StanfordCoreNLP")
##   s <- paste("Stanford University is located in California.",
##              "It is a great university.")
##   p <- StanfordCoreNLP_Pipeline(c("pos", "lemma", "parse"))
##   d <- AnnotatedPlainTextDocument(s, p(s))

d <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))

d

## Extract available annotation:
a <- annotation(d)
a

## Structured views:
sents(d)
tagged_sents(d)
tagged_sents(d, map = Universal_POS_tags_map)
parsed_sents(d)

## Add (trivial) paragraph annotation:
s <- as.character(d)
a <- annotate(s, Simple_Para_Token_Annotator(blankline_tokenizer), a)
d <- AnnotatedPlainTextDocument(s, a)
## Structured view:
paras(d)

NLP documentation built on April 12, 2025, 1:36 a.m.