The function kakasi is an interface to the external program KAKASI
(KAnji KAna Simple Inverter). It is especially useful when Japanese Kanji
characters need to be converted to Romaji (ASCII) characters.
kakasi(x, kakasi.option = "-Ha -Ka -Ja -Ea -ka",
       ITAIJIDICTPATH = Sys.getenv("ITAIJIDICTPATH", unset = NA),
       KANWADICTPATH = Sys.getenv("KANWADICTPATH", unset = NA))
x: A character vector.
kakasi.option: A character string specifying the options passed to the kakasi library/program.
ITAIJIDICTPATH: A character string specifying the path to itaijidict. It is passed to the kakasi library as an environment variable.
KANWADICTPATH: A character string specifying the path to kanwadict. It is passed to the kakasi library as an environment variable.
Japanese text is often a mixture of Chinese characters (Kanji), Kana
(Hiragana and Katakana) and Romaji (Latin-alphabet transcription). The
external program kakasi converts between these four ways of writing
Japanese. kakasi is especially useful for sanitizing a character vector
by converting Japanese (non-ASCII) characters to ASCII characters.
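kakasi converts only Japanese text, so other non-ASCII characters (accented Latin letters, for instance) can remain in its output. If a strictly ASCII result is required, a post-processing step along the lines of the sketch below may help. Both helpers are hypothetical, not part of Nippon, and they assume UTF-8 input; the kakasi call is commented out because it needs the Nippon package and KAKASI.

```r
# Hypothetical helpers (not part of Nippon); both assume UTF-8 input.

# TRUE for elements that cannot be represented in ASCII.
has_non_ascii <- function(x) {
  is.na(iconv(x, from = "UTF-8", to = "ASCII"))
}

# Drop every byte that cannot be converted to ASCII.
strip_non_ascii <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# In practice x would be the output of kakasi(), e.g.
# x <- kakasi(regions)   # requires Nippon; not run here
x <- c("Tokyo", "Caf\u00e9")
has_non_ascii(x)    # FALSE  TRUE
strip_non_ascii(x)  # "Tokyo" "Caf"
```

Stripping silently loses information, so checking with has_non_ascii first and inspecting the flagged elements is usually the safer order.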
kakasi uses two basic dictionaries: itaijidict and kanwadict. After
installation, these dictionaries are located in doc/share under the Nippon
package directory. Because the kakasi library looks up environment
variables to find its dictionaries, ITAIJIDICTPATH and KANWADICTPATH are
set internally with Sys.setenv the first time kakasi is called. After the
first call, kakasi keeps using these environment variables, which remain
set until the R session ends. To use alternative dictionaries instead of
the bundled ones, set the environment variables with Sys.setenv or pass
them as arguments to kakasi. For permanent settings of environment
variables, see the help page for Renviron.
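A minimal sketch of pointing kakasi at alternative dictionaries. The paths below are hypothetical placeholders; substitute the actual locations of your itaijidict and kanwadict files. The kakasi call is commented out because it needs the Nippon package and KAKASI.

```r
# Hypothetical locations of alternative dictionaries (placeholders only).
my_itaijidict <- "/opt/kakasi/share/itaijidict"
my_kanwadict  <- "/opt/kakasi/share/kanwadict"

# Option 1: set the variables for the rest of the R session. kakasi()
# reads them through its default arguments, which call Sys.getenv().
Sys.setenv(ITAIJIDICTPATH = my_itaijidict,
           KANWADICTPATH  = my_kanwadict)
print(Sys.getenv(c("ITAIJIDICTPATH", "KANWADICTPATH")))

# Option 2: pass the paths directly (requires Nippon; not run here).
# kakasi(x, ITAIJIDICTPATH = my_itaijidict, KANWADICTPATH = my_kanwadict)

# Option 3: for a permanent setting, add lines like these to ~/.Renviron
# (see ?Renviron) and restart R:
# ITAIJIDICTPATH=/opt/kakasi/share/itaijidict
# KANWADICTPATH=/opt/kakasi/share/kanwadict
```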
kakasi returns a character vector of converted strings.
Note that kakasi does not filter out characters that are neither
Japanese nor ASCII.
kakasi warns unless LC_CTYPE is "ja_JP.UTF-8" (Linux or macOS) or
"Japanese_Japan.932" (Windows). It is unclear whether the function works
in other locales.
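Because kakasi warns outside the two locales listed above, it can help to check LC_CTYPE before calling it. The helper is_kakasi_locale below is hypothetical, not part of Nippon; it only compares the current locale name against the two names given in this help page.

```r
# LC_CTYPE values for which kakasi() does not warn, per this help page.
supported_ctype <- c("ja_JP.UTF-8", "Japanese_Japan.932")

# Hypothetical helper (not part of Nippon): TRUE if `ctype` is supported.
is_kakasi_locale <- function(ctype = Sys.getlocale("LC_CTYPE")) {
  ctype %in% supported_ctype
}

if (!is_kakasi_locale()) {
  message("LC_CTYPE is '", Sys.getlocale("LC_CTYPE"),
          "'; kakasi() will warn and may not work as expected.")
  # If the locale is installed, it can be switched for the session, e.g.
  # Sys.setlocale("LC_CTYPE", "ja_JP.UTF-8")         # Linux or macOS
  # Sys.setlocale("LC_CTYPE", "Japanese_Japan.932")  # Windows
}
```

The comparison is an exact string match, so an equivalent spelling such as "ja_JP.utf8" is reported as unsupported by this helper.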
Sys.kakasi was removed in Nippon version 0.6.
Kanji-to-Kana conversion with kakasi is somewhat less accurate than with the MeCab program (http://mecab.sourceforge.net/). Although MeCab cannot convert Kana to Romaji, it may be an option if you need more accurate results. RMeCab, an R interface to MeCab, is available from http://rmecab.jp/wiki/.
Windows users should note that R on Windows can handle strings encoded
in both UTF-8 ("ja_JP.UTF-8") and CP932 ("Japanese_Japan.932"), but
kakasi works only with "Japanese_Japan.932". If your data are encoded
in UTF-8 on Windows, convert them to "Japanese_Japan.932" (CP932) first,
as shown in the examples.
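The Windows conversion used in the examples can be wrapped so that the same script runs unchanged on every platform. to_kakasi_encoding is a hypothetical helper, not part of Nippon, and it assumes the input is UTF-8.

```r
# Hypothetical helper (not part of Nippon): convert UTF-8 strings to CP932
# on Windows only; on other platforms the input is returned unchanged.
to_kakasi_encoding <- function(x) {
  if (.Platform$OS.type == "windows") {
    iconv(x, from = "UTF-8", to = "CP932")
  } else {
    x
  }
}

# Usage (requires Nippon; not run here):
# kakasi(to_kakasi_encoding(regions))
```

iconv returns NA for elements that cannot be represented in CP932, so checking anyNA() on the converted vector before calling kakasi is a cheap sanity check.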
Susumu Tanimura aruminat@gmail.com
KAKASI - Kanji Kana Simple Inverter http://kakasi.namazu.org/
## Not run:
library(Nippon)
data(prefectures)
regions <- unique(prefectures$region)
regions
# Unix-like operating systems
kakasi(regions)
# Windows
regions.cp932 <- iconv(regions, from = "UTF-8", to = "CP932")
kakasi(regions.cp932)
## End(Not run)