jiebaR: A package for Chinese text segmentation
jiebaR

R Documentation

Description

This is a package for Chinese text segmentation, keyword extraction, and part-of-speech tagging, built on Rcpp and cppjieba.

Details

You can supply a custom dictionary. jiebaR can also identify new words on its own, but adding them to the dictionary generally gives higher accuracy.
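As a minimal sketch of adding new words at run time, the snippet below registers one word on an existing engine with new_user_word() and then segments a sentence that contains it. The sample sentence and the added word are illustrative assumptions, not taken from the package, and the resulting segmentation may vary with the bundled dictionary version.

```r
library(jiebaR)

# Default engine (type = "mix"), using the dictionaries bundled with the package.
engine <- worker()

# Illustrative word (an assumption for this sketch), registered with the noun tag "n".
new_user_word(engine, "自定义词典", tags = "n")

# Segment a sample sentence that contains the newly added word.
result <- segment("结巴分词引擎支持自定义词典", engine)
print(result)
```

Adding words this way affects only the engine object it is called on; a separate worker() would need the words added again.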

Author(s)

Qin Wenfeng <http://qinwenfeng.com>

References

CppJieba: https://github.com/aszxqw/cppjieba

Examples

### Note: Chinese characters cannot be displayed here.
## Not run: 
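words = "hello world"
engine1 = worker()  # default engine, type = "mix"
segment(words, engine1)

# "./temp.txt" is a file path; segment() then processes the file's contents

segment("./temp.txt", engine1)

engine2 = worker("hmm")  # engine using the HMM model
segment("./temp.txt", engine2)

engine2$write = TRUE  # write the result to a file
segment("./temp.txt", engine2)

# "dict_path" stands for the path to a dictionary file; symbol = TRUE keeps symbols
engine3 = worker(type = "mix", dict = "dict_path", symbol = TRUE)
segment("./temp.txt", engine3)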
 
## End(Not run)
 
## Not run: 
### Keyword Extraction
engine = worker("keywords", topn = 1)
keywords(words, engine)

### Part-of-Speech Tagging
tagger = worker("tag")
tagging(words, tagger)

### Simhash
simhasher = worker("simhash", topn = 1)
simhash(words, simhasher)
distance("hello world", "hello world!", simhasher)

show_dictpath()

## End(Not run)

jiebaR documentation built on April 4, 2025, 2:41 a.m.