preprocess: Basic Text Preprocessor

preprocess    R Documentation

Description

A simple text preprocessor for use with the ngram() function.

Usage

preprocess(
  x,
  case = "lower",
  remove.punct = FALSE,
  remove.numbers = FALSE,
  fix.spacing = TRUE
)

Arguments

x

Input text.

case

Option to change the case of the text. Value should be "upper", "lower", or NULL (no change).

remove.punct

Logical; should punctuation be removed?

remove.numbers

Logical; should numbers be removed?

fix.spacing

Logical; should runs of multiple spaces be collapsed and trailing spaces removed?

Details

The input text x must already be in the correct form for ngram(), i.e., a single string (character vector of length 1).
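
If the text is held as a character vector with several elements (for example, lines read from a file), it has to be collapsed into one string first. The following is a minimal sketch using base R's paste(); the variable names are illustrative, and the final call is left commented out because it assumes the ngram package is installed.

```r
# Illustrative input: several lines stored as separate vector elements
lines <- c("Watch out", "for snakes!")

# Collapse into a single string (character vector of length 1)
x <- paste(lines, collapse = " ")

length(x)  # 1

# With ngram installed, x can then be passed on:
# library(ngram)
# preprocess(x)
```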

The case argument can take three possible values: NULL, in which case nothing is done, or "lower" or "upper", in which case the input text is converted to lower or upper case, respectively.
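
The three-way behaviour of case can be illustrated with base R alone. The helper apply_case() below is hypothetical and only mirrors the documented meaning of the argument; it is not the package's own code and may differ from it in detail.

```r
# Hypothetical helper mirroring the documented meaning of `case`;
# not taken from the ngram package.
apply_case <- function(x, case = "lower") {
  if (is.null(case)) return(x)  # NULL: leave the text unchanged
  switch(case,
    lower = tolower(x),
    upper = toupper(x),
    stop("case must be \"upper\", \"lower\", or NULL")
  )
}

apply_case("Watch Out", "lower")  # "watch out"
apply_case("Watch Out", "upper")  # "WATCH OUT"
apply_case("Watch Out", NULL)     # "Watch Out"
```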

Value

preprocess() returns the preprocessed text as a single string.

Examples

library(ngram)

x = "Watch  out    for snakes!  111"
preprocess(x)
preprocess(x, remove.punct=TRUE, remove.numbers=TRUE)
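
For readers who want to see what each option roughly corresponds to, the steps can be approximated in base R. This is a sketch under assumptions: the regular expressions and the order of operations are guesses chosen for illustration, and the output of preprocess() itself may differ in details such as how leading or trailing whitespace is handled.

```r
x <- "Watch  out    for snakes!  111"

# Approximate equivalents of the documented options (assumed, not the
# package's implementation)
y <- tolower(x)                  # case = "lower"
y <- gsub("[[:punct:]]", "", y)  # remove.punct = TRUE
y <- gsub("[[:digit:]]", "", y)  # remove.numbers = TRUE
y <- gsub(" +", " ", y)          # fix.spacing: collapse runs of spaces
y <- sub(" +$", "", y)           # fix.spacing: drop trailing spaces

y  # "watch out for snakes"
```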


ngram documentation built on Nov. 1, 2022, 1:06 a.m.