AnnotatedPlainTextDocument | R Documentation |
Create annotated plain text documents from plain text and collections of annotations for this text.
AnnotatedPlainTextDocument(s, a, meta = list())
annotation(x)
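s | a String object, or something coercible to this using as.String() (e.g., a character string with appropriate encoding information). |
a | an Annotation object with annotations for s. |
meta | a named or empty list of document metadata tag-value pairs. |
x | an object inheriting from class "AnnotatedPlainTextDocument". |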
Annotated plain text documents combine plain text with annotations for the text.
A typical workflow is to use annotate() with suitable annotator pipelines to obtain the annotations, and then use AnnotatedPlainTextDocument() to combine these with the text being annotated. This yields an object inheriting from "AnnotatedPlainTextDocument" and "TextDocument", from which the text and annotations can be obtained using, respectively, as.character() and annotation().
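The workflow described above can be tried without any external annotators. The following is a minimal sketch, assuming package NLP is installed. The helpers sent_tokenizer and word_tokenizer are hypothetical and deliberately naive (they are not part of NLP); they return Span objects and are wrapped with Simple_Sent_Token_Annotator() and Simple_Word_Token_Annotator() to form a small pipeline.

```r
library(NLP)

s <- String("  First sentence.  Second sentence.  ")

## Hypothetical, deliberately naive sentence tokenizer: a sentence starts
## at a non-space character and ends at the next period.
sent_tokenizer <- function(s) {
    s <- as.String(s)
    m <- gregexpr("[^[:space:]][^.]*\\.", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
}

## Hypothetical word tokenizer: a word is a run of alphanumeric characters.
word_tokenizer <- function(s) {
    s <- as.String(s)
    m <- gregexpr("[[:alnum:]]+", s)[[1L]]
    Span(m, m + attr(m, "match.length") - 1L)
}

## Annotator pipeline: sentences first, then words within each sentence.
pipeline <- list(Simple_Sent_Token_Annotator(sent_tokenizer),
                 Simple_Word_Token_Annotator(word_tokenizer))
a <- annotate(s, pipeline)

## Combine the text with its annotations.
d <- AnnotatedPlainTextDocument(s, a)

as.character(d)   # the text
annotation(d)     # the annotations
words(d)          # word view
sents(d)          # sentence view
```

Real applications would normally replace the two toy tokenizers with annotators from a dedicated NLP backend; the calls to annotate() and AnnotatedPlainTextDocument() stay the same.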
There are methods for class "AnnotatedPlainTextDocument" and generics words(), sents(), paras(), tagged_words(), tagged_sents(), tagged_paras(), chunked_sents(), parsed_sents() and parsed_paras() providing structured views of the text in such documents. These all require the necessary annotations to be available in the annotation object used.
The methods for generics tagged_words(), tagged_sents() and tagged_paras() provide a mechanism for mapping POS tags via the map argument; see section Details in the help page for tagged_words() for more information. The POS tagset used is inferred from the POS_tagset metadata element of the annotation object used.
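As an illustration of the map argument, the following is a minimal sketch, assuming package NLP is installed and assuming that map may be given as a named character vector whose names are the tags to map from (the help page for tagged_words() is the authority on the accepted forms). The annotation is built by hand with hypothetical Penn-style POS features, and the three-entry mapping pos_map is made up for this example; it is not Universal_POS_tags_map.

```r
library(NLP)

s <- String("The cat sleeps.")

## Hand-built annotation: one sentence and three words carrying POS features.
## Character offsets are 1-based and inclusive.
a <- Annotation(id = 1:4,
                type = c("sentence", "word", "word", "word"),
                start = c(1L, 1L, 5L, 9L),
                end = c(15L, 3L, 7L, 14L),
                features = list(list(),
                                list(POS = "DT"),
                                list(POS = "NN"),
                                list(POS = "VBZ")))

d <- AnnotatedPlainTextDocument(s, a)

## Words with their original POS tags.
tagged_words(d)

## Hypothetical coarse mapping, given as a named character vector.
pos_map <- c(DT = "DET", NN = "NOUN", VBZ = "VERB")
tagged_words(d, map = pos_map)
```

With annotations produced by a real POS tagger, the POS_tagset metadata element lets a list of mappings such as Universal_POS_tags_map select the table that matches the tagset in use.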
For AnnotatedPlainTextDocument(), an annotated plain text document object inheriting from "AnnotatedPlainTextDocument" and "TextDocument".
For annotation(), an Annotation object.
TextDocument for basic information on the text document infrastructure employed by package NLP.
## Use a pre-built annotated plain text document obtained by employing an
## annotator pipeline from package 'StanfordCoreNLP', available from the
## repository at <https://datacube.wu.ac.at>, using the following code:
## require("StanfordCoreNLP")
## s <- paste("Stanford University is located in California.",
## "It is a great university.")
## p <- StanfordCoreNLP_Pipeline(c("pos", "lemma", "parse"))
## d <- AnnotatedPlainTextDocument(s, p(s))
d <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))
d
## Extract available annotation:
a <- annotation(d)
a
## Structured views:
sents(d)
tagged_sents(d)
tagged_sents(d, map = Universal_POS_tags_map)
parsed_sents(d)
## Add (trivial) paragraph annotation:
s <- as.character(d)
a <- annotate(s, Simple_Para_Token_Annotator(blankline_tokenizer), a)
d <- AnnotatedPlainTextDocument(s, a)
## Structured view:
paras(d)