knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-", 
  warning = FALSE, 
  message = FALSE
)

Notes: this package is still under development.

lexiquer

This package is a wrapper around Lexique 3.81, a database for natural language processing in French.

More info on: http://www.lexique.org

What's in this db?

Lexique gives access to ~ 150 000 french words with several annotations: lemme, phoneme, genre, frequency, number of letters, word neighbours...

Getting started

The full corpus is a data object contained inside the package, which you can call with :

library(lexiquer)
data("lexique")

You can then left join it with a one-word-per-row data.frame:

library(tidytext)
library(proustr)
library(tidyverse)

sw <- proust_stopwords()
ds <- ducotedechezswann

tm <- unnest_tokens(ds, word, text) %>%
  slice(1:10) %>%
  select(word)
tm %>%
  left_join(lexique, by = c("word" = "ortho")) %>%
  select(lemme, cgramortho) %>%
  na.omit() %>%
  count(lemme, cgramortho) %>%
  top_n(10, n) %>%
  arrange(desc(n))

bind_* wrappers

{lexiquer} provides a series of wrapper to bind specific part of the corpus to your text. See the bind_* functions for more details.

For example, you can binf the grammatical category of the word:

bind_gram_cat(tm, word)

Or the lemme

bind_lemme(tm, word)

is_lemme

Test if a word is a lemme :

is_lemme(tm, word)

count_* wrappers

Several counting functions are available:

count_syll(tm, word)

Install

devtools::install_github("ColinFay/lexiquer")

Feedbacks

Questions and feedbacks welcome!

You want to contribute ? Open a PR :) If you encounter a bug or want to suggest an enhancement, please open an issue.



ColinFay/lexiquer documentation built on May 6, 2019, 2:19 p.m.