make_stoplist: Input a Filename and Return a Vector of Stop Words



Description

When a filename is provided, the function reads that file and returns a character vector of the stop words it contains. If no filename is provided, it returns part of the stop words used by the package jiebaR. See Details.

Usage

make_stoplist(x = "jiebar", print = TRUE)

Arguments

x

A length-1 character vector specifying a valid stop word file. If it is not provided, or is "jiebar" (the default), "jiebaR" or "auto", the function returns part of the stop words used by the package jiebaR. See Details.

print

TRUE or FALSE, whether to print the first 5 words. The default is TRUE.

Details

In a valid stop word file, each word should occupy its own line. However, a line that contains more than one word is also accepted if the words are separated by blanks, punctuation or numbers, because the function will try to split them. Duplicated words are removed automatically. The encoding of the stop word file is auto-detected by the function.
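
The splitting and de-duplication described above can be pictured with a short base R sketch. This is only an illustration of the documented behaviour, not the source of make_stoplist: the helper name split_stop_lines and the exact regular expression are assumptions, and encoding detection is left out. Note also that which characters count as punctuation under [:punct:] can depend on the locale.

```r
# Illustrative sketch only: NOT the implementation of make_stoplist().
# split_stop_lines() is a hypothetical helper name.
split_stop_lines <- function(lines) {
  # Split each line on runs of whitespace, punctuation or digits
  parts <- unlist(strsplit(lines, "[[:space:][:punct:][:digit:]]+"))
  # Drop empty strings left by blank lines or leading separators
  parts <- parts[nzchar(parts)]
  # Remove duplicates, keeping the first occurrence of each word
  parts <- unique(parts)
  # Mirror the documented return value: NULL when no word is obtained
  if (length(parts) == 0) NULL else parts
}

lines <- c("the", "of and", "a,an;the", "in2on", "")
words <- split_stop_lines(lines)
# words is c("the", "of", "and", "a", "an", "in", "on")
```

The second "the" is dropped as a duplicate, and the empty last line contributes nothing.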

For the stop word list from jiebaR, see jiebaR::STOPPATH. It contains many words that are often removed when analyzing Chinese text. However, the result returned by make_stoplist differs slightly from that list.

Value

A character vector of words. If no word is obtained, NULL is returned.
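
A possible way to call the function, based only on the signature shown under Usage. The temporary file and its contents are made up for illustration, and the call is guarded so that the snippet does nothing harmful when chinese.misc is not installed.

```r
# Write a small stop word file to a temporary location (illustrative content)
stop_file <- tempfile(fileext = ".txt")
writeLines(c("the", "of and", "the"), stop_file)

words <- NULL
if (requireNamespace("chinese.misc", quietly = TRUE)) {
  # Read stop words from the file; print = FALSE skips the 5-word preview
  words <- chinese.misc::make_stoplist(stop_file, print = FALSE)
  # With x left at its default ("jiebar"), the jiebaR-based list is returned
  default_words <- chinese.misc::make_stoplist(print = FALSE)
}

# Clean up the temporary file
unlink(stop_file)
```

Because the Value section allows NULL, code that uses the result may want to check is.null(words) before going further.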


chinese.misc documentation built on Sept. 13, 2020, 5:13 p.m.