zen2han: Convert Japanese characters from fullwidth (zenkaku) to...


Description

These functions convert Japanese characters between fullwidth (zenkaku) and halfwidth (hankaku) forms, either to avoid problems in Japanese string operations or to take advantage of the fullwidth (zenkaku) forms.

Usage

zen2han(s)
han2zen(s)

Arguments

s

A character vector. UTF-8 encoding is preferable.

Details

Japanese graphic characters are traditionally classified into fullwidth (zenkaku) and halfwidth (hankaku) forms. Letters, numbers, and symbols can take either form, while Hiragana and Kanji are available only as fullwidth characters (Katakana also has halfwidth forms, which are discussed below). Mixing the two forms of letters, numbers, and symbols causes trouble in string manipulation such as matching or searching, because a fullwidth character and its halfwidth counterpart are different code points. Character data should therefore be sanitized with this function before such operations.
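The matching problem described above can be seen in a minimal sketch that uses only base R. The variable names `full` and `half` are illustrative, and the fullwidth string is written with Unicode escapes so that the snippet does not depend on the encoding of the source file.

```r
# Fullwidth "ABC123", written with Unicode escapes (U+FF21.. and U+FF11..)
full <- "\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13"
# The same text in ordinary halfwidth (ASCII) characters
half <- "ABC123"

# The two strings look alike on screen but are different code points,
# so exact comparison and fixed-pattern search both fail.
identical(full, half)               # FALSE
grepl("ABC", full, fixed = TRUE)    # FALSE
```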

The zenkaku characters targeted are numbers, letters, punctuation marks, and other special symbols. Katakana is not converted by zen2han, because halfwidth Katakana tends to cause problems of its own.

han2zen performs the reverse conversion. This is useful for Japanese users who need to escape characters that would otherwise be treated specially in strings (e.g., '$' in a character vector).
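As an illustration of the idea, and not the package's actual implementation, the conversion can be sketched in base R using the fact that the Unicode fullwidth forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from the ASCII characters U+0021 to U+007E. The function names `zen2han_sketch` and `han2zen_sketch` are hypothetical, and this sketch does not handle the ideographic space or any Katakana.

```r
# Shift code points inside [lo, hi] by `offset`; leave all others untouched.
# Hypothetical helper, not part of the Nippon package.
shift_range <- function(s, lo, hi, offset) {
  vapply(s, function(x) {
    if (is.na(x)) return(NA_character_)
    cp <- utf8ToInt(enc2utf8(x))        # string -> integer code points
    hit <- cp >= lo & cp <= hi          # which characters are in the range
    cp[hit] <- cp[hit] + offset
    intToUtf8(cp)                       # code points -> single string
  }, character(1), USE.NAMES = FALSE)
}

# Fullwidth U+FF01..U+FF5E -> ASCII U+0021..U+007E
zen2han_sketch <- function(s) shift_range(s, 0xFF01L, 0xFF5EL, -0xFEE0L)

# ASCII U+0021..U+007E -> fullwidth U+FF01..U+FF5E
han2zen_sketch <- function(s) shift_range(s, 0x21L, 0x7EL, 0xFEE0L)

zen2han_sketch("\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13")   # "ABC123"
han2zen_sketch("$")                                      # fullwidth dollar sign, U+FF04
```

Because the helper works element by element, it accepts character vectors of any length and passes through characters outside the targeted range, such as Hiragana and Kanji, unchanged.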

Value

zen2han returns a character vector in which all letters, numbers, and symbols are in their halfwidth form.

han2zen returns a character vector in which all letters, numbers, and symbols are in their fullwidth form.

Author(s)

Susumu Tanimura aruminat@gmail.com

References

Halfwidth and Fullwidth Forms http://www.alanwood.net/unicode/halfwidth_and_fullwidth_forms.html

See Also

han2zen, showNonASCII

Examples


Example output

Loading required package: maptools
Loading required package: sp
Checking rgeos availability: TRUE
$number
[1] "<U+FF10><U+FF11><U+FF12><U+FF13><U+FF14><U+FF15><U+FF16><U+FF17><U+FF18><U+FF19>"

$lower
[1] "<U+FF41><U+FF42><U+FF43><U+FF44><U+FF45><U+FF46><U+FF47><U+FF48><U+FF49><U+FF4A><U+FF4B><U+FF4C><U+FF4D><U+FF4E><U+FF4F><U+FF50><U+FF51><U+FF52><U+FF53><U+FF54><U+FF55><U+FF56><U+FF57><U+FF58><U+FF59><U+FF5A>"

$upper
[1] "<U+FF21><U+FF22><U+FF23><U+FF24><U+FF25><U+FF26><U+FF27><U+FF28><U+FF29><U+FF2A><U+FF2B><U+FF2C><U+FF2D><U+FF2E><U+FF2F><U+FF30><U+FF31><U+FF32><U+FF33><U+FF34><U+FF35><U+FF36><U+FF37><U+FF38><U+FF39><U+FF3A>"

[1] "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Warning message:
In if (Encoding(s) != "UTF-8") s <- iconv(s, from = "", to = "UTF-8") :
  the condition has length > 1 and only the first element will be used
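The warning above arises because `Encoding(s) != "UTF-8"` yields one logical value per element of `s`, while `if` expects a single value; from R 4.2.0 onward this situation is an error, not a warning. A vectorized encoding check might look like the following sketch, where the function name `to_utf8` is hypothetical and the code is not taken from the package.

```r
# Convert only those elements not already marked as UTF-8.
# Hypothetical helper; base R's enc2utf8(s) is a simpler alternative.
to_utf8 <- function(s) {
  not_utf8 <- Encoding(s) != "UTF-8"    # logical vector, one entry per element
  # iconv() treats from = "" as the current locale's native encoding
  s[not_utf8] <- iconv(s[not_utf8], from = "", to = "UTF-8")
  s
}

x <- to_utf8(c("abc", "\uFF21\uFF22\uFF23"))
length(x)   # 2
```

Indexing with the logical vector avoids the length-one condition entirely, so the same code works for a single string and for a long character vector.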

Nippon documentation built on May 2, 2019, 1:03 p.m.