segmentCN: Segment a sentence.


View source: R/segmentCN.R

Description

A function to segment Chinese sentences into words.

Usage

segmentCN(strwords, package = c("jiebaR", "Rwordseg"), nature = FALSE, 
  nosymbol = TRUE, useStopDic = FALSE, returnType = c("vector", "tm"))
insertWords(inswords, package = c("jiebaR", "Rwordseg"))

Arguments

strwords

A string vector of Chinese sentences in UTF-8.

package

Which package to use for segmentation, either "jiebaR" or "Rwordseg".

nature

Whether to recognise the nature (part of speech) of the words.

nosymbol

Whether to drop symbols (such as punctuation) from the sentence; with the default TRUE, symbols are removed.

useStopDic

Whether to use the default stop words.

returnType

The default, "vector", returns a character vector of words. Choosing "tm" returns a single string with the words separated by spaces, so that it can be passed directly to Corpus in the 'tm' package.

inswords

A character vector of words to be added to the dictionary.
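The two returnType options differ only in the shape of the result. The minimal base-R sketch below shows how the "tm" form relates to the "vector" form. It assumes a hypothetical, hard-coded token vector standing in for real segmenter output; nothing here calls segmentCN. For several input sentences, the one-string-per-sentence "tm" shape shown is likewise an assumption, since the documentation does not spell it out.

```r
# Hypothetical tokens standing in for segmentCN(..., returnType = "vector");
# they are hard-coded here, not produced by a segmenter.
tokens <- c("我", "喜欢", "文本", "挖掘")

# The "tm" shape: the same words joined into one space-separated string.
tm_string <- paste(tokens, collapse = " ")

# With several sentences the "vector" result is a list (one element per
# sentence). Assumed "tm" counterpart: one space-separated string per sentence.
token_list <- list(c("我", "喜欢", "文本"), c("文本", "挖掘"))
tm_strings <- vapply(token_list, paste, character(1), collapse = " ")

print(tm_string)
print(tm_strings)
```

Space-separated strings suit 'tm' because its default tokenization splits on whitespace; such strings would typically be wrapped as Corpus(VectorSource(tm_strings)).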

Details

The function segmentCN originates from the 'Rwordseg' package. If 'Rwordseg' is installed successfully (it requires a JRE and the 'rJava' package), calling 'Rwordseg::segmentCN' directly may be the easiest choice. More details can be found at http://jianl.org/cn/R/Rwordseg.html.

In this package, segmentCN is a wrapper around 'jiebaR', which can be installed easily from CRAN. It provides only some of the basic functionality of 'jiebaR'; more details can be found at http://qinwenfeng.com/jiebaR.
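For comparison, the sketch below segments text with 'jiebaR' directly, using its worker() and segment() functions. The worker settings are an assumption about the kind of call segmentCN wraps, not tmcn's actual code. The example sentence is illustrative, and the block does nothing if 'jiebaR' is not installed.

```r
# Sketch only: direct 'jiebaR' segmentation; settings are assumed, not taken
# from tmcn's source.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  # A "mix" worker with jiebaR's bundled dictionaries; symbol = FALSE drops
  # symbols, similar in spirit to segmentCN's nosymbol = TRUE.
  wk <- jiebaR::worker(type = "mix", symbol = FALSE)

  # segment() takes the text first and the worker second.
  words <- jiebaR::segment("我喜欢文本挖掘。", wk)
  print(words)
}
```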

The function insertWords adds new words to the dictionary temporarily. To manage a dictionary of your own, use either the 'Rwordseg' or the 'jiebaR' package directly for segmentation.
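A short usage sketch of segmentCN and insertWords follows. The sentence and the inserted word are illustrative choices, and no particular segmentation result is claimed. The guard assumes that the installed 'tmcn' still exports segmentCN and that 'jiebaR' is available; otherwise the block does nothing.

```r
# Usage sketch; the sentence and the added word are illustrative only.
if (requireNamespace("tmcn", quietly = TRUE) &&
    requireNamespace("jiebaR", quietly = TRUE) &&
    "segmentCN" %in% getNamespaceExports("tmcn")) {
  sentence <- "我喜欢文本挖掘"

  # Default returnType = "vector": the segmented words.
  res_vec <- tmcn::segmentCN(sentence)

  # returnType = "tm": one space-separated string, ready for tm::Corpus().
  res_tm <- tmcn::segmentCN(sentence, returnType = "tm")

  # Add a word to the dictionary temporarily, then segment again.
  tmcn::insertWords("文本挖掘")
  res_new <- tmcn::segmentCN(sentence)

  print(res_vec)
  print(res_tm)
  print(res_new)
}
```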

Value

A vector of segmented words (a list if the input is a vector), or the path of the output file.

Author(s)

Jian Li


tmcn documentation built on March 18, 2018, 1:44 p.m.