jiebaR: A package for Chinese text segmentation

Description

This package provides Chinese text segmentation, keyword extraction, and part-of-speech tagging, built on Rcpp and CppJieba.

Details

You can supply a custom dictionary. jiebaR can also identify new words on its own, but adding those words to the dictionary gives higher accuracy.
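As a minimal sketch of the custom-dictionary workflow described above: the file name and the two entries are hypothetical, and the sketch assumes the user dictionary is a UTF-8 text file with one word per line, passed to worker() through its user argument. The segmentation step runs only if jiebaR is installed.

```r
# Hypothetical user dictionary, written to a temporary file for illustration.
user_dict <- file.path(tempdir(), "user_dict.utf8")

# Two example entries, given as Unicode escapes so the source stays ASCII:
# "cloud computing" and "deep learning" in Chinese.
new_words <- c("\u4e91\u8ba1\u7b97", "\u6df1\u5ea6\u5b66\u4e60")

# Write one word per line, encoded as UTF-8.
con <- file(user_dict, open = "w")
writeLines(enc2utf8(new_words), con, useBytes = TRUE)
close(con)

# Use the dictionary only when jiebaR is available.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  engine <- jiebaR::worker(user = user_dict)
  # Sentence: "cloud computing and deep learning".
  sentence <- "\u4e91\u8ba1\u7b97\u548c\u6df1\u5ea6\u5b66\u4e60"
  print(jiebaR::segment(sentence, engine))
}
```

Keeping new words in a separate user dictionary, rather than editing the main dictionary, leaves the bundled dictionary untouched and makes the additions easy to review.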

Author(s)

Qin Wenfeng <http://qinwenfeng.com>

References

CppJieba: https://github.com/aszxqw/cppjieba

See Also

jiebaR: https://github.com/qinwf/jiebaR

Examples

### Note: Chinese characters cannot be displayed here.
## Not run: 
library(jiebaR)

words = "hello world"
engine1 = worker()   # default engine type is "mix"
segment(words, engine1)

# "./temp.txt" is a file path; when the file exists, its contents are segmented

segment("./temp.txt", engine1)

engine2 = worker("hmm")
segment("./temp.txt", engine2)

engine2$write = TRUE   # write the result to a file when the input is a file path
segment("./temp.txt", engine2)

# "dict_path" is a placeholder for the path to a dictionary file
engine3 = worker(type = "mix", dict = "dict_path", symbol = TRUE)
segment("./temp.txt", engine3)
 
## End(Not run)
 
## Not run: 
### Keyword Extraction
engine = worker("keywords", topn = 1)
keywords(words, engine)

### Part-of-Speech Tagging
tagger = worker("tag")
tagging(words, tagger)

### Simhash
simhasher = worker("simhash", topn = 1)
simhash(words, simhasher)
distance("hello world", "hello world!", simhasher)

show_dictpath()

## End(Not run)

jiebaR documentation built on Dec. 16, 2019, 1:19 a.m.