CoNLLTextDocument: CoNLL-Style Text Documents

Description


Create text documents from CoNLL-style files.


Usage

CoNLLTextDocument(con, encoding = "unknown", meta = list())



Arguments

con
a connection object or a character string. See scan() for details.

encoding
encoding to be assumed for input strings. See scan() for details.

meta
a named or empty list of document metadata tag-value pairs.


Details

CoNLL-style files use an extended tabular format in which empty lines separate sentences, and non-empty lines consist of whitespace-separated columns giving the word tokens and their annotations. In principle, these annotations can vary from corpus to corpus; the current version of CoNLLTextDocument() assumes a fixed set of 3 columns giving, respectively, the word token, its POS tag, and its chunk tag.

The lines are read from the given connection and split into fields using scan(). From this, a suitable representation of the provided information is obtained, and returned as a CoNLL text document object inheriting from classes "CoNLLTextDocument" and "TextDocument".

Class "CoNLLTextDocument" has methods for the generics words(), sents(), tagged_words(), tagged_sents(), and chunked_sents(), as well as as.character(). These methods should be used to access the text in such text document objects.
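As a minimal sketch of the workflow described above, the following writes a small three-column file to a temporary path and reads it back. The sample sentences, the variable names, and the temporary file are illustrative assumptions, not data shipped with package NLP.

```r
library(NLP)

# Hypothetical sample data: token, POS tag, and chunk tag on each line,
# with an empty line separating the two sentences.
conll_lines <- c(
  "Confidence NN B-NP",
  "fell VBD B-VP",
  ". . O",
  "",
  "He PRP B-NP",
  "agreed VBD B-VP",
  ". . O"
)
path <- tempfile(fileext = ".txt")
writeLines(conll_lines, path)

# A character string is passed on to scan() as a file name.
doc <- CoNLLTextDocument(path)

words(doc)          # all word tokens
sents(doc)          # word tokens grouped by sentence
tagged_words(doc)   # word tokens paired with their POS tags
chunked_sents(doc)  # one chunk tree per sentence
```

Going through the accessor methods rather than the object's internals keeps the code independent of how the document is stored.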

The methods for generics tagged_words() and tagged_sents() provide a mechanism for mapping POS tags via the map argument; see section Details in the help page for tagged_words() for more information. The POS tagset used is inferred from the POS_tagset metadata element of the CoNLL-style text document.
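The tag mapping might be used roughly as follows. This sketch assumes that map accepts a named character vector whose names are the source tags and whose values are the target tags; the help page for tagged_words() remains the reference for the accepted forms. The tagset name "en-ptb", the mapping coarse_map, and the sample sentence are hypothetical.

```r
library(NLP)

# Hypothetical one-sentence sample in the three-column format.
path <- tempfile(fileext = ".txt")
writeLines(c("He PRP B-NP", "agreed VBD B-VP", ". . O"), path)

# Record the tagset in the document metadata
# (the name "en-ptb" is an assumption made for this sketch).
doc <- CoNLLTextDocument(path, meta = list(POS_tagset = "en-ptb"))

# Hypothetical coarse mapping: names are source tags, values are target tags.
coarse_map <- c(PRP = "PRON", VBD = "VERB", "." = "PUNCT")

tagged <- tagged_words(doc)                           # original POS tags
tagged_coarse <- tagged_words(doc, map = coarse_map)  # mapped POS tags
```

When the map covers several tagsets, the POS_tagset metadata element is what selects the applicable one, so it is worth setting it when the document is created.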


Value

An object inheriting from "CoNLLTextDocument" and "TextDocument".

See Also

TextDocument for basic information on the text document infrastructure employed by package NLP.

CoNLL (Conference on Natural Language Learning) is the yearly meeting of the Special Interest Group on Natural Language Learning of the Association for Computational Linguistics.

The CoNLL 2000 chunking task provides training and test data sets which can be read in using CoNLLTextDocument().
