identify_language: Detect Language

View source: R/identify_language.R

Description

This function detects languages using the Compact Language Detector 2 (CLD2) from the CRAN package cld2. It is vectorised and guesses the language of each string. Note that it is not designed to perform well on very short text, lists of proper names, part numbers, etc. CLD2 has the highest F1 score and is an order of magnitude faster than CLD3.

Usage

identify_language(text)

Arguments

text

A string with text to classify or a connection to read from.

  • cld2: Probabilistically (Naïve Bayesian classifier) detects over 80 languages in plain text.

Value

A character vector with ISO-639-1 two-letter language codes.

Examples

txt <- c("English is a West Germanic language ", "In espaniol, le lingua castilian")
identify_language(txt)
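
Because the detector is not designed for very short or ambiguous strings, it can help to handle undetermined results explicitly. The following is a minimal sketch that calls cld2 directly rather than through identify_language; it assumes the cld2 package is installed and relies on cld2::detect_language returning NA when no language can be reliably determined. The "unknown" label is an illustrative choice, not part of labourR.

```r
# Assumption: the cld2 package is installed from CRAN.
library(cld2)

txt <- c("English is a West Germanic language ", "In espaniol, le lingua castilian")

# lang_code = TRUE requests language codes instead of full language names.
codes <- cld2::detect_language(txt, plain_text = TRUE, lang_code = TRUE)

# Replace undetermined (NA) guesses with an explicit label (illustrative only).
labelled <- ifelse(is.na(codes), "unknown", codes)
labelled
```

Labelling NA values up front keeps later steps, such as filtering or grouping by language, from silently dropping the undetermined strings.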

labourR documentation built on July 18, 2020, 5:06 p.m.