knitr::opts_chunk$set(
  comment = "#>",
  tidy = FALSE,
  error = FALSE)

franc

Detect the Language of Text

Project Status: Active - The project has reached a stable, usable state and is being actively developed. Linux Build Status Windows Build
status CRAN RStudio mirror downloads

Franc has no external dependencies and supports 310 languages; all languages spoken by more than one million speakers. Franc is a port of the JavaScript project of the same name, see https://github.com/wooorm/franc.

Installation

devtools::install_github("mangothecat/franc")

Usage

library(franc)

Simply supply the text, and franc detects its language:

franc("Alle menslike wesens word vry")
franc("এটি একটি ভাষা একক IBM স্ক্রিপ্ট")
franc("Alle mennesker er født frie og")
head(franc_all("O Brasil caiu 26 posições"))

und is the undefined language, this is returned if the input is too short (shorter than 10 characters by default).

franc("the")
franc("the", min_length = 3)

You can provide a whitelist or a blacklist:

franc_all("O Brasil caiu 26 posições",
    whitelist = c("por", "src", "glg", "spa"))
head(franc_all("O Brasil caiu 26 posições",
    blacklist = c("src", "glg", "lav")))

Supported languages

The R version of franc supports 310 languages. By default only the languages with more than 1 million speakers are used, this is 175 languages. The min_speakers argument can relax this, and allows using more languages:

head(franc_all("O Brasil caiu 26 posições"))
head(franc_all("O Brasil caiu 26 posições", min_speakers = 0))

License

MIT © Mango Solutions, Titus Wormer, Maciej Ceglowski, Jacob R. Rideout and Kent S. Johnson.



MangoTheCat/franc documentation built on May 7, 2019, 2:10 p.m.