segmentCN: Segment a sentence.

Description

A function that segments a Chinese sentence into words.

Usage

  segmentCN(strwords,
    analyzer = get("Analyzer", envir = .RwordsegEnv),
    nature = FALSE, nosymbol = TRUE,
    returnType = c("vector", "tm"), isfast = FALSE,
    outfile = "", blocklines = 1000)

Arguments

strwords

A Chinese sentence in UTF-8 or the path of a text file.

analyzer

A Java analyzer object.

nature

Whether to recognise the nature (part of speech) of the words.

nosymbol

Whether to remove symbols from the sentence.

returnType

The default, "vector", returns a character vector of words. Choose "tm" to return a single string with the words separated by spaces, so that it can be used by Corpus directly.

isfast

Whether to run the fast analyzer.

outfile

The path of the output file when strwords is a file.

blocklines

The maximum number of lines to read at one time when strwords is a file.

Value

A vector of segmented words (a list if the input is a vector), or the path of the output file.
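
The file-based mode described by strwords, outfile and blocklines could be exercised roughly as follows. This is a minimal sketch and not part of the original page: it assumes the Rwordseg package and a working Java runtime are installed, and the temporary file names, the sample sentences and the blocklines value of 500 are chosen only for illustration.

```r
# Illustrative sketch only; the segmentation step is skipped
# when Rwordseg is not installed.
has_rwordseg <- requireNamespace("Rwordseg", quietly = TRUE)

# Write a small UTF-8 input file (hypothetical sample text).
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
con <- file(infile, open = "w", encoding = "UTF-8")
writeLines(c("今天天气很好", "我喜欢自然语言处理"), con)
close(con)

if (has_rwordseg) {
  library(Rwordseg)
  # strwords is a file path here, so the result should be the
  # path of the output file rather than a vector of words.
  res <- segmentCN(infile, outfile = outfile, blocklines = 500)
}
```

A smaller blocklines value reads the input in smaller chunks, which may help when the input file is large.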

Author(s)

Jian Li <rweibo@sina.com>

Examples

## Not run: 
segmentCN("hello world!")

## End(Not run)
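
The returnType and nature arguments described above can be tried as in the following sketch. It is not one of the original examples: it assumes Rwordseg and a Java runtime are installed, and the sample sentence is only illustrative. No expected output is shown because the result depends on the installed dictionary.

```r
# Illustrative sketch only; skipped when Rwordseg is not installed.
has_rwordseg <- requireNamespace("Rwordseg", quietly = TRUE)

sentence <- "今天天气很好"  # sample sentence chosen for illustration

if (has_rwordseg) {
  library(Rwordseg)

  # Default return type: a character vector of words.
  words <- segmentCN(sentence)

  # "tm" return type: one string with words separated by spaces.
  words_tm <- segmentCN(sentence, returnType = "tm")

  # Ask the analyzer to also recognise the nature of each word.
  words_nature <- segmentCN(sentence, nature = TRUE)
}
```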

OOmegaPPanDDa/Rwordseg documentation built on May 7, 2019, 8:55 p.m.