as_conllu | R Documentation |
If you have a data.frame with annotations containing one row per token, you can convert it to CONLL-U format with this function. The data frame must have the columns doc_id, sentence_id, sentence, token_id and token, and may optionally have the columns lemma, upos, xpos, feats, head_token_id, dep_rel, deps and misc. These fields have the following meaning:
doc_id: the identifier of the document
sentence_id: the identifier of the sentence
sentence: the text of the sentence that this token is part of
token_id: Word index, integer starting at 1 for each new sentence; may be a range for multiword tokens; may be a decimal number for empty nodes.
token: Word form or punctuation symbol.
lemma: Lemma or stem of word form.
upos: Universal part-of-speech tag.
xpos: Language-specific part-of-speech tag; underscore if not available.
feats: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head_token_id: Head of the current word, which is either a value of token_id or zero (0).
dep_rel: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
deps: Enhanced dependency graph in the form of a list of head-deprel pairs.
misc: Any other annotation.
The tokens in the data.frame should be ordered as they appear in the sentence.
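The column requirements above can be illustrated with a small hand-built data.frame. This is only a minimal sketch: the document id, the sentence and the tags are made-up values, and the optional columns other than upos are deliberately left out, which mirrors the column subset used in the package's own example.

```r
library(udpipe)

# Hypothetical two-token sentence, one row per token, in sentence order.
# Only the required columns plus upos are supplied.
x <- data.frame(
  doc_id      = "doc1",
  sentence_id = 1L,
  sentence    = "Hello world",
  token_id    = c("1", "2"),
  token       = c("Hello", "world"),
  upos        = c("INTJ", "NOUN"),
  stringsAsFactors = FALSE
)

# Convert to a single CONLL-U formatted character string and print it
conllu <- as_conllu(x)
cat(conllu)
```

Building the rows in reading order matters here, since the function expects tokens to be ordered as they appear in the sentence.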
as_conllu(x)
x: a data.frame with columns doc_id, sentence_id, sentence, token_id, token, lemma, upos, xpos, feats, head_token_id, dep_rel, deps, misc
a character string of length 1 containing the data.frame in CONLL-U format. See the example. You can easily save this to disk for processing in other applications.
https://universaldependencies.org/format.html
file_conllu <- system.file(package = "udpipe", "dummydata", "traindata.conllu")
x <- udpipe_read_conllu(file_conllu)
str(x)
conllu <- as_conllu(x)
cat(conllu)

## Not run: 
## Write it to file, making sure it is in UTF-8
cat(as_conllu(x), file = file("annotations.conllu", encoding = "UTF-8"))

## End(Not run)

## Some fields are not mandatory, they will be assumed to be NA
conllu <- as_conllu(x[, c('doc_id', 'sentence_id', 'sentence', 'token_id', 'token', 'upos')])
cat(conllu)
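Since the returned string is meant to be saved to disk for other applications, a round trip through a file may be a useful sanity check. The sketch below is one possible way to do this, not a prescribed workflow: it writes to a temporary file (the path from tempfile() is an arbitrary choice) through an explicitly opened and closed UTF-8 connection, then reads the file back with udpipe_read_conllu.

```r
library(udpipe)

# Read the dummy CONLL-U file shipped with the package
file_conllu <- system.file(package = "udpipe", "dummydata", "traindata.conllu")
x <- udpipe_read_conllu(file_conllu)

# Write the converted annotations to a temporary file in UTF-8,
# opening and closing the connection explicitly
out <- tempfile(fileext = ".conllu")
con <- file(out, open = "w", encoding = "UTF-8")
cat(as_conllu(x), file = con)
close(con)

# Read the file back in as a data.frame with one row per token
y <- udpipe_read_conllu(out)
str(y)
```

Comparing nrow(x) with nrow(y) afterwards is a quick way to see whether all tokens survived the round trip.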