This function tries to guess the language a text is written in.
guess.lang(
  txt.file,
  udhr.path,
  comp.length = 300,
  keep.udhr = FALSE,
  quiet = TRUE,
  in.mem = TRUE,
  format = "file"
)
txt.file | A character vector pointing to the file with the text to be analyzed.
udhr.path | A character string, either pointing to the directory where you unzipped the translations of the Universal Declaration of Human Rights, or to the ZIP file containing them.
comp.length | Numeric value, giving the number of characters to be used of
keep.udhr | Logical, whether all the UDHR translations should be kept in the resulting object.
quiet | Logical. If
in.mem | Logical. If
format | Either "file" or "obj". If the latter,
To accomplish the task, the method described by Benedetto, Caglioti & Loreto (2002) is used, utilizing both gzip compression and translations of the Universal Declaration of Human Rights[1]. The latter holds the world record for being translated into the most languages, and is publicly available.
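The compression idea behind this approach can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper names gz.size and added.size, the two toy reference sentences, and the sample snippet are all hypothetical, and real use requires the full UDHR translations. The sketch appends the text to each reference, compresses the result with gzip, and picks the language whose compressed size grows the least.

```r
# Size in bytes of a gzip-compressed string (hypothetical helper).
gz.size <- function(txt) {
  length(memCompress(charToRaw(enc2utf8(txt)), type = "gzip"))
}

# Extra bytes needed to compress `snippet` once it is appended to
# `reference` (hypothetical helper).
added.size <- function(reference, snippet) {
  gz.size(paste0(reference, snippet)) - gz.size(reference)
}

# Toy reference texts, far too short for reliable results; they only
# stand in for the UDHR translations here.
refs <- c(
  en = "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience.",
  de = "Alle Menschen sind frei und gleich an Rechten geboren. Sie sind mit Vernunft und Gewissen begabt."
)

# Text to classify, truncated to a fixed number of characters
# (assumed here to play a role similar to comp.length).
snippet <- substr("They are born free and equal in rights.", 1, 300)

# The language whose reference compresses the snippet best is the guess.
diffs <- vapply(refs, added.size, integer(1), snippet = snippet)
guess <- names(which.min(diffs))
print(diffs)
print(guess)
```

With references this short the result is not dependable. The method relies on long reference texts so that the compressor can reuse patterns learned from the reference when encoding the appended text.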
An object of class kRp.lang.
For this implementation the documents provided by the "UDHR in Unicode" project[2] have been used.
Their translations are not part of this package and must be downloaded separately to use guess.lang!
You need the ZIP archive containing all the plain text files from https://unicode.org/udhr/downloads.html.
Benedetto, D., Caglioti, E. & Loreto, V. (2002). Language trees and zipping. Physical Review Letters, 88(4), 048702.
[1] https://www.ohchr.org/EN/UDHR/Pages/UDHRIndex.aspx
## Not run:
# using the still zipped bulk file
guess.lang(
  file.path("~", "data", "some.txt"),
  udhr.path = file.path("~", "data", "udhr_txt.zip")
)
# using the unzipped UDHR archive
guess.lang(
  file.path("~", "data", "some.txt"),
  udhr.path = file.path("~", "data", "udhr_txt")
)
## End(Not run)