preprocess: Text preprocessing

View source: R/RcppExports.R

Description

A minimal text preprocessing utility.

Usage

preprocess(input, erase = "[^.?!:;'[:alnum:][:space:]]", lower_case = TRUE)

Arguments

input

a character vector.

erase

a length one character vector. Regular expression matching the parts of the text to be erased from input. The default removes everything except alphanumeric characters ([A-Za-z0-9]), whitespace (space, tab, vertical tab, newline, form feed, carriage return), apostrophes, and the punctuation characters ".?!:;".

lower_case

a length one logical vector. If TRUE, converts the output to lower case.
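
To see what the default erase pattern keeps and removes, the pattern can be tried directly with base R's gsub(), which the Details section describes as roughly equivalent. This is only a sketch: it does not call preprocess() itself, and the sample string and the stricter second pattern are made up for illustration.

```r
# Default value of the `erase` argument, copied from the Usage section.
erase_default <- "[^.?!:;'[:alnum:][:space:]]"

# Made-up sample text mixing kept and erased characters.
x <- "Don't panic: it's (mostly) fine, really!"

# Erase everything matched by the default pattern:
# parentheses and the comma go, apostrophes and ".?!:;" stay.
kept_punct <- gsub(erase_default, "", x)
print(kept_punct)

# A stricter, hypothetical pattern that also drops all punctuation.
no_punct <- gsub("[^[:alnum:][:space:]]", "", x)
print(no_punct)
```

Passing a pattern like the second one as the erase argument should have a similar effect, as long as the C++ regex engine recognizes it the same way R does.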

Details

The expressions preprocess(x, erase = pattern, lower_case = TRUE) and preprocess(x, erase = pattern, lower_case = FALSE) are roughly equivalent to tolower(gsub(pattern, "", x)) and gsub(pattern, "", x), respectively, provided that the regular expression 'pattern' is correctly recognized by R.
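
The equivalence above can be sketched as a small base-R function. The name preprocess_sketch is hypothetical and not part of kgrams; the sketch assumes the pattern is interpreted the same way by R's gsub() and by the C++ regex engine, which may not hold for every pattern.

```r
# Base-R approximation of preprocess(), following the stated equivalence.
# `preprocess_sketch` is a hypothetical helper, not a kgrams function.
preprocess_sketch <- function(input,
                              erase = "[^.?!:;'[:alnum:][:space:]]",
                              lower_case = TRUE) {
  # Erase every match of the pattern from each element of `input`.
  out <- gsub(erase, "", input)
  # Optionally convert the result to lower case.
  if (lower_case) out <- tolower(out)
  out
}

# Same input as in the Examples section.
res_lower <- preprocess_sketch("#This Is An Example@-@!#")
print(res_lower)

# Keep the original case.
res_cased <- preprocess_sketch("#This Is An Example@-@!#", lower_case = FALSE)
print(res_cased)
```

Like gsub(), the sketch is vectorized over input, so a character vector of several strings is processed element by element.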

Internally, the string 'pattern' is converted into a C++ std::regex object by the constructor std::regex::regex(std::string), called with its default flags.

Value

a character vector containing the processed output.

Author(s)

Valerio Gherardi

Examples

preprocess("#This Is An Example@-@!#")

kgrams documentation built on Nov. 16, 2021, 9:22 a.m.