The goal of rmorphodita is to enable morphological analysis, tagging and generation using MorphoDiTa's Python bindings (contained in the ufal.morphodita Python package).
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("skvrnami/rmorphodita")
First, you need to install MorphoDiTa by running install_morphodita().
library(rmorphodita)
install_morphodita()
Then you need to download a language model to use for tagging, analysis and generation.
Three languages are available: Czech (CZ), Slovak (SK), and English (EN).
The download_models function downloads a .zip file with models from the LINDAT/CLARIAH-CZ repository to a specified directory, unzips it, and returns a list of files with morphological taggers and dictionaries.
cz_models <- download_models(lang = "CZ", dest_folder = "tmp")
cz_models
Next, load a tagger from one of the downloaded files and use it to tag a text:
cz_tagger <- load_tagger(cz_models[8])
tagged_text <- morpho_tag(cz_tagger, "Já bych všechny ty počítače zakázala.", NULL)
tagged_text
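The positional index in cz_models[8] depends on the order in which the files are listed. A minimal sketch of picking the tagger file by name instead is shown below. It assumes that download_models() returns file paths as a character vector and that tagger files end in ".tagger"; the paths in model_files are hypothetical placeholders, not the actual contents of the downloaded archive.

```r
# Hypothetical file list standing in for the value returned by download_models()
model_files <- c(
  "tmp/models/example.dict",
  "tmp/models/example-no_dia.dict",
  "tmp/models/example.tagger",
  "tmp/models/example-no_dia.tagger"
)

# Keep only the files whose name ends in ".tagger"
tagger_files <- grep("\\.tagger$", model_files, value = TRUE)

# Drop the variant without diacritics (assumed here to be marked "no_dia")
full_taggers <- tagger_files[!grepl("no_dia", tagger_files, fixed = TRUE)]

print(full_taggers)
```

The selected path could then be passed to load_tagger() in place of cz_models[8].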
The morpho_analyze function returns all possible morphological analyses (lemmas and tags) of a word form.
morpho_analyze(cz_tagger, "kout")
The morpho_generate function returns all possible forms of a given lemma that comply with the specified tag wildcard. In the example below, it returns all noun forms in the second case (genitive).
morpho_generate(cz_tagger, "kout", tag_wildcard = "N???2?")
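To illustrate how a wildcard such as "N???2?" selects tags, here is a rough base R sketch. It is not the MorphoDiTa implementation: it only handles "?" (one arbitrary character per position) and assumes the wildcard is matched against the start of the tag, which is why a six-character wildcard can match a longer positional tag. The helper wildcard_to_regex and the example tags are introduced here for illustration only.

```r
# Convert a tag wildcard into an anchored regular expression:
# each "?" stands for exactly one arbitrary character.
wildcard_to_regex <- function(wildcard) {
  paste0("^", gsub("?", ".", wildcard, fixed = TRUE))
}

# Example positional tags: genitive noun, nominative noun, verb
tags <- c("NNIS2-----A----", "NNIS1-----A----", "VB-S---3P-AA---")

pattern <- wildcard_to_regex("N???2?")
matches <- grepl(pattern, tags)

# Show which tags satisfy the wildcard
print(data.frame(tag = tags, matches = matches))
```

Only the first tag has "N" in position 1 and "2" in position 5, so it is the only one selected.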
Because the raw tags are hard to read, it is possible to extract and recode them as shown below.
The extract_hm_tags function splits each tag into columns indicating particular grammatical categories such as part of speech (pos), gender, number, case etc.
The recode_tags function then recodes the tag marks into factors with a full description of the tag meaning (using the TAGS list, which stores the meaning of the tag values).
tagged_text %>% extract_hm_tags() %>% recode_tags(., tags_df = TAGS)
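The idea behind splitting and recoding can be sketched in base R as follows. This is only an illustration of the technique, not the code used by extract_hm_tags or recode_tags: it covers just the first five positions of a Czech positional tag, and the column name subpos and the lookup table pos_labels are assumptions made for this example.

```r
# Example positional tags: a genitive noun and a verb
tags <- c("NNIS2-----A----", "VB-S---3P-AA---")

# Names for the first five tag positions (illustrative column names)
positions <- c("pos", "subpos", "gender", "number", "case")

# Split the first five characters of each tag into separate columns
chars <- strsplit(substr(tags, 1, length(positions)), "")
tag_df <- as.data.frame(do.call(rbind, chars), stringsAsFactors = FALSE)
names(tag_df) <- positions

# Recode the part-of-speech marks into a descriptive factor
# (a small hypothetical lookup standing in for the TAGS list)
pos_labels <- c(N = "noun", V = "verb")
tag_df$pos <- factor(unname(pos_labels[tag_df$pos]), levels = unname(pos_labels))

print(tag_df)
```

A full lookup would need one such table per tag position, which is the role the TAGS list plays in the package.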
unlink("tmp", recursive = TRUE)