ECIMCI_profiles | R Documentation |
N-gram profile db for 26 languages based on the European Corpus Initiative Multilingual Corpus I.
ECIMCI_profiles
This profile db was built by Johannes Rauch, using the ECI/MCI corpus (http://www.elsnet.org/eci.html) and the default options employed by package textcat, with all text documents encoded in UTF-8.
The category ids used for the db are the respective IETF language tags
(see language in package NLP), using the ISO 639-2
Part B language subtags and, for Serbian, the script employed (i.e.,
"scc-Cyrl"
and "scc-Latn"
for Serbian written in
Cyrillic and Latin script, respectively; all other languages in the
profile db are written in Latin script.)
S. Armstrong-Warwick, H. S. Thompson, D. McKelvie and D. Petitpierre (1994), Data in Your Language: The ECI Multilingual Corpus 1. In “Proceedings of the International Workshop on Sharable Natural Language Resources” (Nara, Japan), 97–106. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.950
## Languages in the the ECI/MCI profile db: names(ECIMCI_profiles) ## Key options used for the profile: attr(ECIMCI_profiles, "options")[c("n", "size", "reduce", "useBytes")]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.