ISO_639: ISO 639 Language Codes
In ISOcodes: Selected ISO Codes

Description

International Organization for Standardization (ISO) codes for the representation of languages. The standard consists of four parts, with further parts in progress. ISO 639-1 consists of 185 two-letter (alpha-2) codes used to identify the world's major languages. ISO 639-2 has three-letter (alpha-3) codes for 485 languages. ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. ISO 639-5 defines alpha-3 codes for language families.

Usage

ISO_639_2
ISO_639_3
ISO_639_3_Retirements
ISO_639_5

Format

ISO_639_2 is a character data frame with variables Alpha_3_B and Alpha_3_T (the ISO 639-2 bibliographic and terminological codes), Alpha_2 (the corresponding ISO 639-1 alpha-2 code if available), and Name (the English name of the language).
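
As a minimal sketch of how these variables can be used together, the following looks up the three-letter codes for a known two-letter code. It assumes the ISOcodes package is installed; the choice of "de" is only an illustration.

```r
library(ISOcodes)

# Look up the ISO 639-2 row for a given ISO 639-1 alpha-2 code.
# %in% is used instead of == so that missing Alpha_2 values count as FALSE
# rather than producing NA rows in the result.
german <- ISO_639_2[ISO_639_2$Alpha_2 %in% "de", ]

# Show the bibliographic code, terminological code, alpha-2 code and name.
german[, c("Alpha_3_B", "Alpha_3_T", "Alpha_2", "Name")]
```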

ISO_639_3 is a data frame with the following variables:

Id: a character vector with the ISO 639-3 3-letter (alpha-3) identifiers.
Part2B: a character vector with the equivalent ISO 639-2 B-code identifiers of the bibliographic applications code set (if existent).
Part2T: a character vector with the equivalent ISO 639-2 T-code identifiers of the terminology applications code set (if existent).
Part1: a character vector with the equivalent ISO 639-1 2-letter (alpha-2) identifiers (if existent).
Scope: a factor with levels "I" (Individual), "M" (Macrolanguage) and "S" (Special).
Type: a factor with levels "L" (Living languages), "E" (Extinct languages), "A" (Ancient languages), "H" (Historic languages), "C" (Constructed languages), and "S" (Special).
Name: a character vector with the reference language names.
Comment: a character vector with a comment relating to one or more of the other variables.
Family: a character vector with the generic English names of the languages' family or macrolanguage.
eng: a character vector with the language names in English.
fra: a character vector with the language names in French (if available).
spa: a character vector with the language names in Spanish (if available).
zho: a character vector with the language names in Chinese (if available).
rus: a character vector with the language names in Russian (if available).
deu: a character vector with the language names in German (if available).

The variables Family and eng through deu are extracted from the Wikipedia ISO 639-3 language code pages.
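
The Scope and Type factors lend themselves to simple subsetting. The following is a sketch only, assuming the ISOcodes package is installed; the object names living and macro are illustrative.

```r
library(ISOcodes)

# Individual living languages: Scope "I" and Type "L".
# %in% works on factors and treats any missing values as FALSE.
living <- ISO_639_3[ISO_639_3$Scope %in% "I" & ISO_639_3$Type %in% "L", ]
nrow(living)

# Macrolanguages, with their ISO 639-1 codes where one exists.
macro <- ISO_639_3[ISO_639_3$Scope %in% "M", c("Id", "Part1", "Name")]
head(macro)
```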

ISO_639_3_Retirements is a data frame giving the languages retired from ISO 639-3, with variables:

Id: a character vector with the retired codes.
Ret_Reason: a factor with levels "C" (change), "D" (duplicate), "N" (non-existent), "S" (split), and "M" (merge).
Change_To: a character vector which in the cases of C, D, and M gives the identifier to which all instances of the Id should be changed.
Ret_Remedy: a character vector with instructions for updating an instance of the retired (split) identifier.
Effective: a Date object giving the date the retirement became effective.
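
One possible use of Change_To is to bring a vector of older identifiers up to date. The helper below, update_639_3, is hypothetical and not part of the package. It assumes that Change_To is missing or empty when there is no single replacement (as for splits and non-existent languages), and it performs only one replacement step, so chains of retirements are not followed.

```r
library(ISOcodes)

# Hypothetical helper (not provided by ISOcodes): replace retired
# ISO 639-3 identifiers by the identifier given in Change_To.
update_639_3 <- function(ids, retirements = ISO_639_3_Retirements) {
  ids <- as.character(ids)
  # Position of each id in the retirement table (NA if not retired).
  idx <- match(ids, retirements$Id)
  new <- as.character(retirements$Change_To)[idx]
  # Only replace where a non-missing, non-empty replacement is given.
  ok <- !is.na(new) & nzchar(new)
  ids[ok] <- new[ok]
  ids
}

# Identifiers that are not retired are returned unchanged.
update_639_3(c("deu", "eng"))
```

Retired identifiers without a single replacement are left as they are; for those, Ret_Remedy describes what to do.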

ISO_639_5 is a data frame with the following variables:

Id: a character vector with the 3-letter (alpha-3) ISO 639-5 identifiers.
English_Name: the family names in English.
French_Name: the family names in French.
Part2: a factor indicating how the family relates to 639-2, with levels "g" (group: consists of several related languages), "r" (rest group: a group of several related languages, from which some specific languages have been excluded), or "" (no 639-2 code).
Hierarchy: an indication of which other language families or groups the current language family or group is a member of (given as 639-5 ids separated by ‘ : ’).
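
Since Hierarchy is stored as a single delimited string, it usually needs splitting before use. A minimal sketch, assuming the ISOcodes package is installed; the names paths and n_ids are illustrative, and the pattern deliberately tolerates optional whitespace around the colon.

```r
library(ISOcodes)

# Split each Hierarchy string into its component ISO 639-5 ids.
paths <- strsplit(as.character(ISO_639_5$Hierarchy), "\\s*:\\s*")
names(paths) <- ISO_639_5$Id

# Number of ids listed in each hierarchy entry.
n_ids <- lengths(paths)
head(sort(n_ids, decreasing = TRUE))
```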

Details

While the ISO 639-2 standard gives most languages a single code, twenty-two of the languages described have two three-letter codes: a “bibliographic” code (ISO 639-2/B, B-code), which is derived from the English name for the language and was a necessary legacy feature, and a “terminological” code (ISO 639-2/T, T-code), which is derived from the native name for the language. The range ‘qaa’ to ‘qtz’ is reserved for local use.
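
The languages with distinct B and T codes can be listed by comparing the two columns of ISO_639_2. This is a sketch assuming the ISOcodes package is installed; the number of rows returned depends on the version of the data shipped with the package.

```r
library(ISOcodes)

# Rows whose bibliographic and terminological codes differ.
# which() drops any comparisons that evaluate to NA.
differs <- which(ISO_639_2$Alpha_3_B != ISO_639_2$Alpha_3_T)
bt <- ISO_639_2[differs, c("Alpha_3_B", "Alpha_3_T", "Name")]
bt
```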

ISO 639-3 is a superset of ISO 639-1 and of the individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections, whereas Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.

ISO 639-2 contains codes for both individual languages and language groups, so every code in it appears in either ISO 639-3 or ISO 639-5. Conversely, some ISO 639-5 families have no ISO 639-2 code.
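
The relationship between the parts can be checked against the data. The sketch below assumes the ISOcodes package is installed and uses the T-codes, since those are the ones ISO 639-3 adopts; entries that do not denote a single code (such as a reserved range) may show up as unmatched.

```r
library(ISOcodes)

# ISO 639-2 T-codes found in neither ISO 639-3 nor ISO 639-5.
codes <- ISO_639_2$Alpha_3_T
in_part3 <- codes %in% ISO_639_3$Id
in_part5 <- codes %in% ISO_639_5$Id
unmatched <- codes[!(in_part3 | in_part5)]
unmatched

# ISO 639-5 families flagged as having no ISO 639-2 code.
no_part2 <- ISO_639_5[ISO_639_5$Part2 %in% "", c("Id", "English_Name")]
head(no_part2)
```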