Man pages for OOmegaPPanDDa/tmcn
A text mining toolkit for international characters, especially Chinese.

catUTF8           Print the UTF-8 codes of a string.
createHashmapEnv  Create an environment for hash mapping.
GBK               GBK character set
getCharset        Get the current encoding of the locale.
getWordFreq       Get the word frequency data.frame.
isBIG5            Indicate whether the encoding of input string is BIG5.
isGB18030         Indicate whether the encoding of input string is GB18030.
isGB2312          Indicate whether the encoding of input string is GB2312.
isGBK             Indicate whether the encoding of input string is GBK.
isUTF8            Indicate whether the encoding of input string is UTF-8.
NTUSD             National Taiwan University Semantic Dictionary
revUTF8           Revert UTF-8 string to Chinese character.
setchs            Set locale to Simplified Chinese.
setcht            Set locale to Traditional Chinese.
SIMTRA            Dictionary of simplified and traditional Chinese
stopwordsCN       Return Chinese stop words.
strcap            Mixed case capitalizing.
strextract        Extract matched substrings by regular expression.
strpad            Pad a string to a specified length with a padding character.
strstrip          Trim space of a string.
tmcnTest          Run unit tests.
toPinyin          Convert a Chinese text to pinyin format.
toTrad            Convert a Chinese text from simplified to traditional...
toUTF8            Convert encoding of Chinese string to UTF-8.
OOmegaPPanDDa/tmcn documentation built on May 7, 2019, 8:55 p.m.