tmcn: A text mining toolkit for international characters, especially Chinese.

Author
Jian Li <rweibo@sina.com>
Date of publication
2015-03-02 14:40:04
Maintainer
Jian Li <rweibo@sina.com>
License
LGPL
Version
0.1-4

View on R-Forge

Man pages

catUTF8
Print the UTF-8 codes of a string.
createHashmapEnv
Create an environment for hash mapping.
GBK
GBK character set
getCharset
Get the current encoding of the locale.
getWordFreq
Get the word frequency data.frame.
isBIG5
Indicate whether the encoding of input string is BIG5.
isGB18030
Indicate whether the encoding of input string is GB18030.
isGB2312
Indicate whether the encoding of input string is GB2312.
isGBK
Indicate whether the encoding of input string is GBK.
isUTF8
Indicate whether the encoding of input string is UTF-8.
NTUSD
National Taiwan University Semantic Dictionary
revUTF8
Revert a UTF-8 string to Chinese characters.
setchs
Set locale to Simplified Chinese.
setcht
Set locale to Traditional Chinese.
SIMTRA
Dictionary of simplified and traditional Chinese
stopwordsCN
Return Chinese stop words.
strcap
Mixed case capitalizing.
strextract
Extract matched substrings by regular expression.
strpad
Pad a string to a specified length with a padding character.
strstrip
Trim spaces from a string.
tmcnTest
Run unit tests.
toPinyin
Convert Chinese text to pinyin format.
toTrad
Convert a Chinese text from simplified to traditional...
toUTF8
Convert encoding of Chinese string to UTF-8.
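
The encoding checks (isUTF8, isGBK, isBIG5 and so on) and the toUTF8 conversion listed above can be illustrated with a minimal sketch. It is written in Python only to show the general idea. It is not the package's implementation (which lives in the C++ files under tmcn/src), and the helper names `is_valid_encoding` and `to_utf8` are hypothetical. The assumption is that a byte string "is" in an encoding if it decodes under that encoding without error.

```python
def is_valid_encoding(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error under `encoding`.

    Hypothetical helper: a strict decode attempt is one simple way to test
    a candidate encoding, not necessarily what tmcn does internally.
    """
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode `data` from `source_encoding` to UTF-8 (hypothetical helper)."""
    return data.decode(source_encoding).encode("utf-8")


sample = "中文"
utf8_bytes = sample.encode("utf-8")
gbk_bytes = sample.encode("gbk")

# The UTF-8 bytes pass the UTF-8 check; the GBK bytes do not.
utf8_ok = is_valid_encoding(utf8_bytes, "utf-8")
gbk_as_utf8_ok = is_valid_encoding(gbk_bytes, "utf-8")

# Converting the GBK bytes yields the same bytes as encoding directly to UTF-8.
converted = to_utf8(gbk_bytes, "gbk")

# Plain ASCII is valid under several encodings at once, so a check like this
# can only say "consistent with", never "definitely is".
ascii_bytes = b"abc"
ascii_valid_everywhere = all(
    is_valid_encoding(ascii_bytes, enc)
    for enc in ("utf-8", "gbk", "gb2312", "gb18030", "big5")
)
```

Because one byte sequence can be valid under more than one encoding, checks of this kind are best treated as heuristics and combined with knowledge of where the text came from.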

Files in this package

tmcn/DESCRIPTION
tmcn/NAMESPACE
tmcn/R
tmcn/R/catUTF8.R
tmcn/R/createHashmapEnv.R
tmcn/R/deprecated.R
tmcn/R/getCharset.R
tmcn/R/getWordFreq.R
tmcn/R/isBIG5.R
tmcn/R/isGB18030.R
tmcn/R/isGB2312.R
tmcn/R/isGBK.R
tmcn/R/isUTF8.R
tmcn/R/plotWordcloud.R
tmcn/R/revUTF8.R
tmcn/R/setchs.R
tmcn/R/setcht.R
tmcn/R/stopwordsCN.R
tmcn/R/strcap.R
tmcn/R/strextract.R
tmcn/R/strpad.R
tmcn/R/strstrip.R
tmcn/R/tmcnTest.R
tmcn/R/toPinyin.R
tmcn/R/toTrad.R
tmcn/R/toUTF8.R
tmcn/R/utils.R
tmcn/R/zzz.R
tmcn/data
tmcn/data/GBK.rda
tmcn/data/NTUSD.rda
tmcn/data/SIMTRA.rda
tmcn/demo
tmcn/demo/00Index
tmcn/demo/demo.R
tmcn/inst
tmcn/inst/dic
tmcn/inst/dic/stopwords.txt
tmcn/inst/unittests
tmcn/inst/unittests/runit.strextract.R
tmcn/inst/unittests/runit.strpad.R
tmcn/inst/unittests/runit.strstrip.R
tmcn/man
tmcn/man/GBK.Rd
tmcn/man/NTUSD.Rd
tmcn/man/SIMTRA.Rd
tmcn/man/catUTF8.Rd
tmcn/man/createHashmapEnv.Rd
tmcn/man/getCharset.Rd
tmcn/man/getWordFreq.Rd
tmcn/man/isBIG5.Rd
tmcn/man/isGB18030.Rd
tmcn/man/isGB2312.Rd
tmcn/man/isGBK.Rd
tmcn/man/isUTF8.Rd
tmcn/man/revUTF8.Rd
tmcn/man/setchs.Rd
tmcn/man/setcht.Rd
tmcn/man/stopwordsCN.Rd
tmcn/man/strcap.Rd
tmcn/man/strextract.Rd
tmcn/man/strpad.Rd
tmcn/man/strstrip.Rd
tmcn/man/tmcnTest.Rd
tmcn/man/toPinyin.Rd
tmcn/man/toTrad.Rd
tmcn/man/toUTF8.Rd
tmcn/src
tmcn/src/tmcn_encoding_isbig5.cpp
tmcn/src/tmcn_encoding_isgb18030.cpp
tmcn/src/tmcn_encoding_isgb2312.cpp
tmcn/src/tmcn_encoding_isgbk.cpp
tmcn/src/tmcn_encoding_isutf8.cpp