This is a package for Chinese text segmentation, keyword extraction, and part-of-speech tagging, built on Rcpp and cppjieba.
You can use a custom dictionary. jiebaR can also identify new words on its own, but adding them to the dictionary explicitly gives higher accuracy.
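As a minimal sketch of the custom-dictionary options, the snippet below adds a word to a running engine with `new_user_word()` and, alternatively, loads a user dictionary file through the `user` argument of `worker()`. The sample word, the sample sentence, and the temporary file are illustrative assumptions, written with `\u` escapes because Chinese characters cannot be displayed here. The one-entry-per-line "word tag" file layout is also an assumption about the user dictionary format.

```r
library(jiebaR)

# Option 1: add a word to an existing engine at run time.
engine <- worker()
new_user_word(engine, "\u4e91\u8ba1\u7b97", "n")  # "n" tags the new word as a noun
result <- segment("\u6211\u559c\u6b22\u4e91\u8ba1\u7b97", engine)

# Option 2: load a user dictionary file when the engine is created.
# The temporary file is hypothetical; the "word tag" layout is assumed.
user_dict <- tempfile(fileext = ".txt")
con <- file(user_dict, open = "w", encoding = "UTF-8")
writeLines("\u4e91\u8ba1\u7b97 n", con)
close(con)
engine_user <- worker(user = user_dict)
result_user <- segment("\u6211\u559c\u6b22\u4e91\u8ba1\u7b97", engine_user)
```

Option 1 suits a handful of words added during a session. A dictionary file is easier to maintain and share when the word list grows.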
Qin Wenfeng <http://qinwenfeng.com>
CppJieba: https://github.com/aszxqw/cppjieba
jiebaR: https://github.com/qinwf/jiebaR
### Note: Cannot display Chinese characters here.
## Not run:
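library(jiebaR)
words <- "hello world"
# Default engine (type "mix")
engine1 <- worker()
segment(words, engine1)
# "./temp.txt" is a file path; the contents of the file are segmented
segment("./temp.txt", engine1)
# Hidden Markov Model engine
engine2 <- worker("hmm")
segment("./temp.txt", engine2)
# Write the result to a file when the input is a file path
engine2$write <- TRUE
segment("./temp.txt", engine2)
# Mixed model with a custom dictionary ("dict_path" is a placeholder), keeping symbols
engine3 <- worker(type = "mix", dict = "dict_path", symbol = TRUE)
segment("./temp.txt", engine3)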
## End(Not run)
## Not run:
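### Keyword Extraction
# topn = 1 returns only the top-ranked keyword
engine <- worker("keywords", topn = 1)
keywords(words, engine)
### Part-of-Speech Tagging
tagger <- worker("tag")
tagging(words, tagger)
### Simhash
simhasher <- worker("simhash", topn = 1)
simhash(words, simhasher)
# Distance between the simhash values of two strings
distance("hello world", "hello world!", simhasher)
# Show the directory of the default dictionaries
show_dictpath()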
## End(Not run)