Perform part-of-speech tagging and lemmatisation on 19th-century Southern Dutch texts.
udpipe_vosters(x, tokenizer = c("generic", "basic"), trace = FALSE, ...)
x: a data.frame with columns doc_id and text
tokenizer: either 'generic' to use the generic tokenizer provided by the R package udpipe, or 'basic' to split on spaces
trace: argument passed on to
...: passed on to
a data.frame containing the tokenised text with part-of-speech tags and lemmas, with columns doc_id, sentence_id, token, lemma, upos, xpos, token_id, term_id, start, end. Note that the columns start and end contain only NA values if the 'basic' tokenizer is used.
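To make the 'basic' option more concrete, the following is a minimal base R sketch of splitting text on spaces. It is not the package's implementation: `split_on_spaces` is a hypothetical helper introduced here for illustration, and it only produces a subset of the documented columns. The start and end columns are set to NA simply to mirror the note above.

```r
# Hypothetical helper, NOT part of the package: split each text on
# whitespace and return one row per token.
split_on_spaces <- function(x) {
  stopifnot(is.data.frame(x), all(c("doc_id", "text") %in% names(x)))
  # Split on runs of whitespace; trimws() avoids an empty leading token
  tokens <- strsplit(trimws(x$text), "[[:space:]]+")
  n <- lengths(tokens)
  data.frame(
    doc_id   = rep(x$doc_id, times = n),  # repeat each doc_id once per token
    token_id = sequence(n),               # token index restarts per document
    token    = unlist(tokens),
    start    = NA_integer_,               # assumption: no character offsets tracked
    end      = NA_integer_,
    stringsAsFactors = FALSE
  )
}

# Short excerpts of the example texts used in this help page
x <- data.frame(
  doc_id = c("a", "b"),
  text = c("aan eenen diefstal van Kleedings",
           "eenen langen en mageren persoon"),
  stringsAsFactors = FALSE)
toks <- split_on_spaces(x)
toks
```

A plain space split like this keeps punctuation attached to the neighbouring word, which is one reason to prefer the 'generic' tokenizer when the text contains punctuation.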
tokenize_simple, udpipe_annotate
x <- data.frame(
  doc_id = c("a", "b"),
  text = c("beschuldigd van zich pligtig of ten minsten
door medewerking af verheeling medepligtig gemaakt te hebben
aan eenen diefstal van Kleedings",
           "eenen langen en mageren persoon eenen kantoenen mantel
te beleenen had gebragt"),
  stringsAsFactors = FALSE)
anno <- udpipe_vosters(x, tokenizer = "generic")
anno
anno <- udpipe_vosters(x, tokenizer = "basic")
anno