splitter: Character Splitter

View source: R/splitter.r

splitter R Documentation

Character Splitter

Description

A utility function for use with n-gram modeling. This function splits a string based on various options.

Usage

splitter(
  string,
  split.char = FALSE,
  split.space = TRUE,
  spacesep = "_",
  split.punct = FALSE
)

Arguments

string

An input string.

split.char

Logical; should a split occur after every character?

split.space

Logical; determines if spaces should be preserved as characters in the n-gram tokenization. The character(s) used for spaces are determined by the spacesep argument.

spacesep

The character(s) used to represent a space when split.space=TRUE. This should not itself consist only of spaces.

split.punct

Logical; determines if splits should occur at punctuation.

Details

Note that choosing split.char=TRUE necessarily implies split.punct=TRUE as well, because splitting after every character also separates punctuation marks. It does not imply split.space=TRUE.
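To illustrate why a character-level split also separates punctuation, here is a rough base-R sketch. The helper char_split_sketch is hypothetical and is not part of the ngram package. It only approximates the idea of a character split with spaces marked by a placeholder, and it assumes a single-character placeholder. The output of splitter itself may differ.

```r
# Hypothetical helper, NOT the ngram package's implementation: a base-R
# approximation of a character-level split, for illustration only.
char_split_sketch <- function(string, spacesep = "_") {
  # Replace each literal space with the placeholder so it survives as a token.
  marked <- gsub(" ", spacesep, string, fixed = TRUE)
  # Split into individual characters; punctuation becomes its own token.
  chars <- strsplit(marked, split = "")[[1]]
  # Rejoin with single spaces so every character is a space-delimited token.
  paste(chars, collapse = " ")
}

x <- "watch out! a snake!"
result <- char_split_sketch(x)
print(result)
```

In this sketch each "!" ends up as a separate token, just like the letters, which is the sense in which a split after every character already covers a split at punctuation.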

Value

A string.

Examples

x = "watch out! a snake!"

splitter(x, split.char=TRUE)
splitter(x, split.space=TRUE, spacesep="_")
splitter(x, split.punct=TRUE)



ngram documentation built on Nov. 1, 2022, 1:06 a.m.