Note: this package is still under development.
This package is a wrapper around Lexique 3.81, a database for natural language processing in French.
More information at: http://www.lexique.org
Lexique gives access to about 150,000 French words with several annotations: lemma, phonemes, grammatical gender, frequency, number of letters, word neighbours, and more.
The full corpus is a data object contained inside the package, which you can load with:
library(lexiquer)
data("lexique")
You can then left join it with a one-word-per-row data.frame:
library(tidytext)
library(proustr)
library(tidyverse)

sw <- proust_stopwords()
ds <- ducotedechezswann

tm <- unnest_tokens(ds, word, text) %>%
  slice(1:10) %>%
  select(word)

tm %>%
  left_join(lexique, by = c("word" = "ortho")) %>%
  select(lemme, cgramortho) %>%
  na.omit() %>%
  count(lemme, cgramortho) %>%
  top_n(10, n) %>%
  arrange(desc(n))
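To make the join step easier to follow without installing the corpus, here is a minimal sketch on toy data. The `toy_lexique` and `tokens` tables are hypothetical stand-ins invented for illustration; only the column names `ortho`, `lemme` and `cgramortho` are taken from the example above, and the annotation values are made up.

```r
library(dplyr)

# Hypothetical miniature lexicon (assumption: the real `lexique` object
# has many more rows and columns than this).
toy_lexique <- tibble(
  ortho      = c("longtemps", "couché", "bonne", "heure"),
  lemme      = c("longtemps", "coucher", "bon", "heure"),
  cgramortho = c("ADV", "VER", "ADJ", "NOM")
)

# One-word-per-row tokens; "xyz" has no match in the toy lexicon.
tokens <- tibble(word = c("longtemps", "couché", "xyz", "heure", "heure"))

# Keep every token and attach the annotations where the spelling matches.
joined <- tokens %>%
  left_join(toy_lexique, by = c("word" = "ortho"))

# Unmatched tokens carry NA annotations; drop them before counting.
lemma_counts <- joined %>%
  filter(!is.na(lemme)) %>%
  count(lemme, cgramortho, sort = TRUE)
```

A left join keeps every token, so words missing from the lexicon stay in the result with `NA` annotations. With the real corpus, a spelling may match more than one entry, in which case the join returns more rows than the input.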
bind_* wrappers
{lexiquer} provides a series of wrappers to bind specific parts of the corpus to your text. See the bind_* functions for more details.
For example, you can bind the grammatical category of the word:
bind_gram_cat(tm, word)
Or the lemma:
bind_lemme(tm, word)
is_lemme
Test whether a word is a lemma:
is_lemme(tm, word)
count_* wrappers
Several counting functions are available:
count_syll(tm, word)
Install the package from GitHub:
devtools::install_github("ColinFay/lexiquer")
Questions and feedback are welcome!
Want to contribute? Open a PR :) If you encounter a bug or want to suggest an enhancement, please open an issue.