udpipe_vosters: Perform Parts of Speech tagging and Lemmatisation on 19th...
In DIGI-VUB/udpipe.vosters: UDPipe Models Built on Corpus Vosters



Perform Parts of Speech tagging and Lemmatisation on 19th century Southern Dutch texts

udpipe_vosters(x, tokenizer = c("generic", "basic"), trace = FALSE, ...)

`x`	a data.frame with columns doc_id and text
`tokenizer`	either 'generic' to use a generic tokenizer provided by R package udpipe or 'basic' to split based on spaces
`trace`	argument passed on to `udpipe_annotate`
`...`	passed on to `tokenize_simple`
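To make the 'basic' option more concrete, the following is a minimal base-R sketch of what splitting on spaces amounts to. It is an illustration only, not the package's `tokenize_simple` implementation, whose exact behaviour may differ; the object names `tokens` and `tokenised` are hypothetical and the texts are shortened from the example on this page.

```r
# Illustration only: plain base-R splitting on single spaces.
# This is NOT the implementation used by tokenize_simple.
x <- data.frame(
  doc_id = c("a", "b"),
  text = c("eenen langen en mageren persoon", "te beleenen had gebragt"),
  stringsAsFactors = FALSE)

# Split each text on a literal space character
tokens <- strsplit(x$text, split = " ", fixed = TRUE)

# Reshape to one row per token, repeating doc_id for each token of a document
tokenised <- data.frame(
  doc_id = rep(x$doc_id, times = lengths(tokens)),
  token = unlist(tokens, use.names = FALSE),
  stringsAsFactors = FALSE)
tokenised
```

A split of this kind keeps no record of character offsets, which fits the documented behaviour that start and end are NA with the 'basic' tokenizer.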

a data.frame with one row per token, giving the part-of-speech tags and lemmas, with columns doc_id, sentence_id, token, lemma, upos, xpos, token_id, term_id, start, end. Note that the columns start and end contain only NA values if the 'basic' tokenizer is used.

tokenize_simple, udpipe_annotate

library(udpipe.vosters)
x <- data.frame(
  doc_id = c("a", "b"), 
  text = c("beschuldigd van zich pligtig of ten minsten 
            door medewerking af verheeling medepligtig gemaakt te hebben 
            aan eenen diefstal van Kleedings",
           "eenen langen en mageren persoon eenen kantoenen mantel 
            te beleenen had gebragt"), 
  stringsAsFactors = FALSE)
anno <- udpipe_vosters(x, tokenizer = "generic")
anno
anno <- udpipe_vosters(x, tokenizer = "basic")
anno