options(width = 1000)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, comment = NA, eval = TRUE)
Install the R package.
install.packages("udpipe")
Get your language model and start annotating.
library(udpipe)
udmodel <- udpipe_download_model(language = "dutch")
knitr::opts_chunk$set(echo = TRUE, message = FALSE, comment = NA, eval = !udmodel$download_failed)
udmodel <- udpipe_load_model(file = udmodel$file_model)
x <- udpipe_annotate(udmodel, x = "Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.")
x <- as.data.frame(x, detailed = TRUE)
Or just do as follows.
library(udpipe)
x <- udpipe(x = "Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.", object = "dutch")
The annotation returns paragraphs, sentences, and tokens, the location of each token in the original text, morphological elements such as the lemma, the universal part-of-speech tag and the treebank-specific part-of-speech tag, morphosyntactic features, and the dependency relationships. More information is available at https://universaldependencies.org/guidelines.html
str(x)
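Because the result is an ordinary data frame, the usual data-frame operations apply to it. The sketch below is illustrative only: x_demo is a hand-written stand-in holding a few of the columns described above (token, lemma, upos), not actual model output, so it runs without downloading a model.

```r
# Hand-written stand-in for a few rows of an annotation result.
# The values are typed in for illustration and are NOT real model output.
x_demo <- data.frame(
  doc_id      = "doc1",
  sentence_id = 1L,
  token_id    = as.character(1:4),
  token       = c("Ik", "ging", "op", "reis"),
  lemma       = c("ik", "gaan", "op", "reis"),
  upos        = c("PRON", "VERB", "ADP", "NOUN"),
  stringsAsFactors = FALSE
)

# Count the tokens per universal part-of-speech tag
upos_counts <- table(x_demo$upos)

# Keep only the lemmas of nouns and verbs
content_lemmas <- x_demo$lemma[x_demo$upos %in% c("NOUN", "VERB")]
```

The same subsetting should work on the real x, assuming it carries the lemma and upos columns shown by str(x).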
Note that the x argument to udpipe_annotate must be in UTF-8 encoding. You can check the encoding of your text with Encoding('your text'). You can convert your text to UTF-8 using standard R utilities, as in iconv('your text', from = 'latin1', to = 'UTF-8'), where you replace the from part with whichever encoding your text is in, possibly your computer's default as reported by localeToCharset(). So if your text is not already in UTF-8 encoding, annotation would look something like this:
udpipe_annotate(udmodel, x = iconv('your text', to = 'UTF-8')) if your text is in the encoding of the current locale of your computer.
udpipe_annotate(udmodel, x = iconv('your text', from = 'latin1', to = 'UTF-8')) if your text is in latin1 encoding.
udpipe_annotate(udmodel, x = iconv('your text', from = 'CP949', to = 'UTF-8')) if your text is in CP949 encoding.
invisible(if(file.exists(udmodel$file)) file.remove(udmodel$file))
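The encoding check and conversion can be tried out without a model. The following is a minimal sketch using base R only; the sample string and the variable names txt and txt_utf8 are assumptions made for illustration. The string is built from raw bytes so the example does not depend on the encoding of the script file or the current locale.

```r
# Build the latin1 bytes for "café" (0xe9 is "é" in latin1)
txt <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(txt) <- "latin1"   # declare how these bytes should be interpreted

# Convert to UTF-8 before passing the text on for annotation
txt_utf8 <- iconv(txt, from = "latin1", to = "UTF-8")

# Inspect the result: the declared encoding and whether the bytes are valid UTF-8
enc_after <- Encoding(txt_utf8)
is_valid  <- validUTF8(txt_utf8)
```

A string prepared this way could then be passed as the x argument, for example udpipe_annotate(udmodel, x = txt_utf8).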