filter_segment: Filter segmentation result


View source: R/filter.R

Description

This function removes the specified words from a segmentation result.

Usage

filter_segment(input, filter_words, unit = 50)

Arguments

input

a string vector, typically a segmentation result.

filter_words

a string vector of words to be removed.

unit

the number of words combined into a single regular expression; the default is 50. A long list of words forms one large regular expression, which may or may not be accepted: the POSIX standard only requires support for patterns of up to 256 bytes. unit is therefore used to split the words into groups of that size.
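
The idea behind unit can be sketched in base R. This is a minimal illustration only, not the jiebaR source: the helper name filter_in_units is hypothetical, and the choices to match whole words and to escape regex metacharacters are assumptions made for the sketch.

```r
# Hypothetical helper, NOT the jiebaR implementation: drop every element of
# `input` that exactly equals one of `filter_words`, testing the words in
# groups of at most `unit` so that no single pattern grows too long.
filter_in_units <- function(input, filter_words, unit = 50) {
  filter_words <- unique(filter_words)
  if (length(filter_words) == 0) return(input)

  # Assign each word to a group: the first `unit` words get id 1, the next
  # `unit` words get id 2, and so on.
  group_id <- ceiling(seq_along(filter_words) / unit)
  groups <- split(filter_words, group_id)

  for (words in groups) {
    # Assumption: treat the words literally, so escape regex metacharacters
    # (for example, "." should only match a literal dot).
    escaped <- gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", words, perl = TRUE)
    # One anchored alternation per group, e.g. "^(abc|def)$".
    pattern <- paste0("^(", paste(escaped, collapse = "|"), ")$")
    input <- input[!grepl(pattern, input, perl = TRUE)]
  }
  input
}

res_default <- filter_in_units(c("abc", "def", " ", "."), c("abc"))
res_unit1 <- filter_in_units(c("abc", "def", " ", "."), c("abc", "."), unit = 1)
print(res_default)  # "def" " "   "."
print(res_unit1)    # "def" " "
```

With unit = 1 each word gets its own pattern, so the result is the same as with a single combined pattern; only the size of each regular expression changes.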

Examples

filter_segment(c("abc","def"," ","."), c("abc"))

jiebaR documentation built on Dec. 16, 2019, 1:19 a.m.