udpipe_vosters: Perform Parts of Speech tagging and Lemmatisation on 19th...


View source: R/pkg.R

Description

Perform part-of-speech tagging and lemmatisation on 19th-century Southern Dutch texts.

Usage

udpipe_vosters(x, tokenizer = c("generic", "basic"), trace = FALSE, ...)

Arguments

x

a data.frame with columns doc_id and text

tokenizer

either 'generic' to use the generic tokenizer provided by the R package udpipe, or 'basic' to split the text on spaces

trace

argument passed on to udpipe_annotate

...

passed on to tokenize_simple
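To make the 'basic' option of the tokenizer argument more concrete, the following base R snippet shows what splitting on whitespace looks like. It is only an illustrative sketch and an assumption about the general idea; it is not the code used by tokenize_simple, whose exact behaviour may differ.

```r
# Illustrative only: a plain whitespace split in base R,
# not the implementation used inside the package.
text <- "eenen langen en mageren persoon"

# Split on one or more whitespace characters and take the first (only) element
tokens <- strsplit(text, split = "[[:space:]]+")[[1]]
tokens
```

A split of this kind keeps punctuation attached to the neighbouring word, which is one reason the 'generic' tokenizer may be preferable for running text.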

Value

a data.frame containing the tokenised text with part-of-speech tags and lemmas, with columns doc_id, sentence_id, token, lemma, upos, xpos, token_id, term_id, start, end. Note that the columns start and end contain only NA values if the 'basic' tokenizer is used.
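As a rough sketch of how the returned columns might be used, the snippet below builds a small stand-in data.frame and inspects it. The stand-in is hypothetical: its values are made up for illustration, it contains only a subset of the documented columns, and it is not real output of udpipe_vosters.

```r
# Hypothetical stand-in for the returned data.frame; the values are made up
# for illustration and only some of the documented columns are included.
anno <- data.frame(
  doc_id = c("b", "b", "b"),
  token  = c("eenen", "langen", "persoon"),
  upos   = c("DET", "ADJ", "NOUN"),
  start  = NA_integer_,
  end    = NA_integer_,
  stringsAsFactors = FALSE
)

# TRUE when no character offsets are available,
# which is what the documentation describes for the 'basic' tokenizer
no_offsets <- all(is.na(anno$start)) && all(is.na(anno$end))
no_offsets

# Frequency of each part-of-speech tag per document
table(anno$doc_id, anno$upos)
```

Checking start and end in this way avoids relying on character offsets in downstream code when they are not available.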

See Also

tokenize_simple, udpipe_annotate

Examples

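library(udpipe.vosters)

# Two short documents in 19th century Southern Dutch
x <- data.frame(
  doc_id = c("a", "b"), 
  text = c("beschuldigd van zich pligtig of ten minsten 
            door medewerking af verheeling medepligtig gemaakt te hebben 
            aan eenen diefstal van Kleedings",
           "eenen langen en mageren persoon eenen kantoenen mantel 
            te beleenen had gebragt"), 
  stringsAsFactors = FALSE)

# Annotate using the generic tokenizer from the udpipe package
anno <- udpipe_vosters(x, tokenizer = "generic")
anno

# Annotate using the basic tokenizer (start and end will be NA)
anno <- udpipe_vosters(x, tokenizer = "basic")
anno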

DIGI-VUB/udpipe.vosters documentation built on Sept. 9, 2020, 12:36 a.m.