Create text documents from CoNLL-U format files.
CoNLLUTextDocument(con, meta = list())
con: a connection object or a character string.
meta: a named or empty list of document metadata tag-value pairs.
The CoNLL-U format (see the format documentation on the Universal
Dependencies website) is a CoNLL-style format for annotated texts
popularized and employed by the Universal Dependencies project
(https://universaldependencies.org/).
For each “word” in the text, this provides exactly the 10 fields
ID,
FORM (word form or punctuation symbol),
LEMMA (lemma or stem of word form),
UPOSTAG (universal part-of-speech tag),
XPOSTAG (language-specific part-of-speech tag, may be missing),
FEATS (list of morphological features),
HEAD, DEPREL, DEPS, and MISC.
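As an illustration of the ten-field layout, the following base R sketch splits a few tab-separated CoNLL-U lines into named columns. The sample sentence is invented, and the code is only a simplified stand-in for reading such a file, not the package's own implementation.

```r
# Invented three-word sentence in CoNLL-U layout (fields are tab-separated).
lines <- c(
  "# text = Dogs bark.",
  "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
  "2\tbark\tbark\tVERB\tVBP\tNumber=Plur\t0\troot\t_\tSpaceAfter=No",
  "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
  ""
)

# Field names as used in this help page.
field_names <- c("ID", "FORM", "LEMMA", "UPOSTAG", "XPOSTAG",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")

# Keep only the word lines: drop blank lines and '#' comments.
is_record <- nzchar(lines) & !startsWith(lines, "#")

# Split each word line on tabs and check that there are exactly 10 fields.
fields <- strsplit(lines[is_record], "\t", fixed = TRUE)
stopifnot(all(lengths(fields) == 10L))

# One row per word, one column per field.
records <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(records) <- field_names
print(records[, c("ID", "FORM", "UPOSTAG", "XPOSTAG")])
```

An underscore marks an unspecified value, which is why XPOSTAG and the other optional fields always occupy a column even when empty.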
The lines with these fields and optional comments are read from the
given connection and split into fields.
This is combined with consecutive sentence ids into a data frame used
for representing the annotation information, and together with the
given metadata returned as a CoNLL-U text document inheriting from
classes "CoNLLUTextDocument" and "TextDocument".
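The idea of consecutive sentence ids can be sketched in base R as follows. It assumes, as in the CoNLL-U format, that sentences are separated by blank lines; the sample lines and variable names are invented, and the package may derive the ids differently.

```r
# Two invented sentences, each terminated by a blank line.
lines <- c(
  "# text = Hi.",
  "1\tHi\thi\tINTJ\tUH\t_\t0\troot\t_\tSpaceAfter=No",
  "2\t.\t.\tPUNCT\t.\t_\t1\tpunct\t_\t_",
  "",
  "# text = Bye.",
  "1\tBye\tbye\tINTJ\tUH\t_\t0\troot\t_\tSpaceAfter=No",
  "2\t.\t.\tPUNCT\t.\t_\t1\tpunct\t_\t_",
  ""
)

is_blank  <- lines == ""
is_record <- !is_blank & !startsWith(lines, "#")

# Each blank line closes a sentence, so counting the blank lines seen so far
# gives a running sentence number for every word line.
sent <- (cumsum(is_blank) + 1L)[is_record]

# Second field of each word line is the word form.
form <- vapply(strsplit(lines[is_record], "\t", fixed = TRUE), `[`, "", 2L)

annotation <- data.frame(sent = sent, FORM = form, stringsAsFactors = FALSE)
print(annotation)
```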
The complete annotation information data frame can be extracted via
content(). CoNLL-U v2 requires providing the complete texts of
each sentence (or a reconstruction thereof) in ‘# text =’ comment
lines. Where consistently provided, these are made available in the
text attribute of the content data frame.
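In addition, there are methods for generics as.character(), words(),
sents(), tagged_words(), and tagged_sents() for class
"CoNLLUTextDocument", which should be used to access the text in such
text document objects.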
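A minimal usage sketch, assuming package NLP is installed and that its words(), sents() and tagged_words() generics have methods for CoNLL-U text documents. The sample file content and the metadata value are invented for illustration, and no particular output is claimed.

```r
if (requireNamespace("NLP", quietly = TRUE)) {
  # Write an invented one-sentence sample to a temporary file.
  path <- tempfile(fileext = ".conllu")
  writeLines(c(
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\tNumber=Plur\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
    ""
  ), path)

  # 'con' is given as a character string (file path); 'meta' is a named list.
  doc <- NLP::CoNLLUTextDocument(path, meta = list(source = "toy example"))
  unlink(path)

  # Complete annotation information as a data frame.
  ann <- NLP::content(doc)
  str(ann)

  # Sentence texts from the '# text =' comment lines, where provided.
  print(attr(ann, "text"))

  # Viewers for accessing the text.
  print(NLP::words(doc))
  print(NLP::sents(doc))
  print(NLP::tagged_words(doc))                     # universal tags (default)
  print(NLP::tagged_words(doc, which = "XPOSTAG"))  # language-specific tags
}
```

The whole block is wrapped in requireNamespace() so that it does nothing when NLP is not available.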
The CoNLL-U format allows representing both words and (multiword)
tokens (see section ‘Words, Tokens and Empty Nodes’ in the
format documentation), as distinguished by ids being integers or
integer ranges, with the words being annotated further. One can use
as.character() to extract the tokens; all other
viewers listed above use the words. Finally, the viewers
incorporating POS tags take a
which argument to select either
the universal or the language-specific tags, by giving a substring of
"UPOSTAG" (default) or "XPOSTAG".
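The distinction between words and multiword tokens can be illustrated with a small base R sketch. It assumes a single sentence whose ID column has already been read, uses invented German sample data ("zum" as a contraction of "zu" and "dem"), and ignores empty nodes; it shows the id convention only, not how the package's viewers are implemented.

```r
# ID and FORM columns of one invented sentence: "zum Haus".
# The range id "1-2" marks the multiword token that covers words 1 and 2.
ids   <- c("1-2", "1", "2", "3")
forms <- c("zum", "zu", "dem", "Haus")

# Integer ids are words, integer ranges are multiword tokens.
is_word  <- grepl("^[0-9]+$", ids)
is_range <- grepl("^[0-9]+-[0-9]+$", ids)

# Word ids that fall inside some range belong to a multiword token.
bounds  <- lapply(strsplit(ids[is_range], "-", fixed = TRUE), as.integer)
covered <- unlist(lapply(bounds, function(b) seq(b[1], b[2])))

# Tokens are the multiword tokens plus all words not covered by a range.
is_token <- is_range | (is_word & !(ids %in% as.character(covered)))

words  <- forms[is_word]
tokens <- forms[is_token]
print(words)
print(tokens)
```

Here the words are "zu", "dem" and "Haus", while the tokens are "zum" and "Haus", which mirrors the difference between the word-based viewers and as.character() described above.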
An object inheriting from "CoNLLUTextDocument" and "TextDocument".
TextDocument for basic information on the text document
infrastructure employed by package NLP.
https://universaldependencies.org/ for access to the Universal Dependencies treebanks, which provide annotated texts in many different languages using CoNLL-U format.