Create text documents from CoNLL-style files.
con: a connection object or a character string. See scan() for details.
encoding: encoding to be assumed for input strings. See scan() for details.
meta: a named or empty list of document metadata tag-value pairs.
CoNLL-style files use an extended tabular format where empty lines
separate sentences, and non-empty lines consist of whitespace-separated
columns giving the word tokens and their annotations.
In principle, these annotations can vary from corpus to corpus: the
current version of CoNLLTextDocument() assumes a fixed set of 3
columns giving, respectively, the word token and its POS and chunk tags.
The lines are read from the given connection and split into fields
using scan(). From this, a suitable representation of
the provided information is obtained, and returned as a CoNLL text
document object inheriting from classes "CoNLLTextDocument" and "TextDocument".
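The reading step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy input, the column names word, POS and chunk_tag, and the variable names are all made up for the example.

```r
# Toy CoNLL-style input: word, POS tag, chunk tag; a blank line ends a sentence.
lines <- c("He PRP B-NP",
           "reckons VBZ B-VP",
           ". . O",
           "",
           "It PRP B-NP",
           "works VBZ B-VP",
           ". . O")

con <- textConnection(lines)
# fill = TRUE pads short (here: blank) lines with empty fields, and
# blank.lines.skip = FALSE keeps them, so sentence breaks stay visible.
records <- scan(con,
                what = list(word = "", POS = "", chunk_tag = ""),
                quote = "", quiet = TRUE,
                fill = TRUE, blank.lines.skip = FALSE)
close(con)

# A blank line shows up as a record whose word field is empty.
is_break <- records$word == ""

# Number the sentences by counting the breaks seen so far,
# then drop the break records themselves.
tokens <- data.frame(sent = cumsum(is_break) + 1L,
                     word = records$word,
                     POS = records$POS,
                     chunk_tag = records$chunk_tag,
                     stringsAsFactors = FALSE)[!is_break, ]

# Word tokens grouped by sentence number.
split(tokens$word, tokens$sent)
```

Keeping the blank lines as empty records is one simple way to recover sentence boundaries from a flat list of fields; other designs (for example, reading with readLines() and splitting manually) would work as well.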
There are methods for generics words(), sents(), tagged_words(),
tagged_sents(), and chunked_sents() (as well as as.character()),
which should be used to access the text in such text document objects.
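As a hedged usage sketch of these accessors, the following assumes that package NLP is installed and writes a small made-up 3-column file to a temporary location; the file contents and the variable names tf and d are illustrative only.

```r
library(NLP)  # assumes package NLP is installed

# Write a toy 3-column CoNLL-style file (word, POS tag, chunk tag).
tf <- tempfile(fileext = ".conll")
writeLines(c("He PRP B-NP", "reckons VBZ B-VP", ". . O",
             "",
             "It PRP B-NP", "works VBZ B-VP", ". . O"),
           tf)

# A character string is taken as a file name.
d <- CoNLLTextDocument(tf)
unlink(tf)

words(d)          # word tokens
sents(d)          # word tokens grouped by sentence
tagged_words(d)   # word tokens paired with their POS tags
tagged_sents(d)   # the same, grouped by sentence
chunked_sents(d)  # one chunk tree per sentence
as.character(d)   # character representation of the text
```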
The methods for generics tagged_words() and tagged_sents()
provide a mechanism for mapping POS tags via the map argument;
see section Details in the help page for
tagged_words() for more information.
The POS tagset used will be inferred from the POS_tagset
metadata element of the CoNLL-style text document.
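A minimal sketch of such a mapping follows, assuming package NLP is installed. The list my_maps and its three tag translations are hypothetical and cover only the tags in the toy input. The tagset name "en-ptb" is set by hand in the metadata, so that the matching element of the list can be looked up as described in the help page for tagged_words().

```r
library(NLP)  # assumes package NLP is installed

# Toy 3-column CoNLL-style file (word, POS tag, chunk tag).
tf <- tempfile(fileext = ".conll")
writeLines(c("He PRP B-NP", "reckons VBZ B-VP", ". . O",
             "",
             "It PRP B-NP", "works VBZ B-VP", ". . O"),
           tf)

# Record the POS tagset in the document metadata.
d <- CoNLLTextDocument(tf, meta = list(POS_tagset = "en-ptb"))
unlink(tf)

# Hypothetical list of maps keyed by tagset name; each element is a
# named character vector translating original tags to coarser ones.
my_maps <- list("en-ptb" = c(PRP = "PRON", VBZ = "VERB", "." = "."))

# The element named after the document's POS_tagset is used for mapping.
tw <- tagged_words(d, map = my_maps)
tw
```

A plain named character vector or a function can be passed as map instead of a list, in which case no tagset lookup is needed.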
An object inheriting from "CoNLLTextDocument" and "TextDocument".
TextDocument for basic information on the text document
infrastructure employed by package NLP.
http://ifarm.nl/signll/conll/ for general information about CoNLL (Conference on Natural Language Learning), the yearly meeting of the Special Interest Group on Natural Language Learning of the Association for Computational Linguistics.