tokenize_spaces_punct: Tokenise text into a sequence of words

View source: R/smith_waterman.R

tokenize_spaces_punct    R Documentation

Tokenise text into a sequence of words

Description

Tokenise text into a sequence of words. The function splits x with strsplit and treats characters from the POSIX character classes [:space:] (whitespace, including newlines) and [:punct:] (punctuation) as word delimiters.
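The splitting rule can be illustrated with a small base-R sketch. This is not the package's source: the name tokenize_spaces_punct_sketch is hypothetical, and two choices are assumptions made here. A run of consecutive delimiters (such as " .. ") is treated as a single separator via the + quantifier, and any empty strings left by a leading delimiter are dropped. The packaged function may handle those cases differently.

```r
# Hypothetical sketch for illustration only; not the text.alignment source.
tokenize_spaces_punct_sketch <- function(x) {
  # Mirror the documented contract: a single character string
  stopifnot(is.character(x), length(x) == 1L)
  # Assumption: any run of whitespace and/or punctuation is one delimiter
  tokens <- strsplit(x, split = "[[:space:][:punct:]]+")[[1]]
  # A delimiter at the start of x yields an empty first element; drop it
  tokens[nchar(tokens) > 0]
}

tokenize_spaces_punct_sketch("This just splits. Text.alongside\nspaces right?")
# "This" "just" "splits" "Text" "alongside" "spaces" "right"
```

Because [:punct:] also contains the apostrophe and the hyphen, contractions and hyphenated words are split too; for example, "don't" becomes "don" and "t".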

Usage

tokenize_spaces_punct(x)

Arguments

x

a character string of length 1
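Because x must be a single string, tokenising several texts requires one call per element. The sketch below uses lapply, which returns a list because each text can produce a different number of words. To keep the example self-contained, it defines a hypothetical stand-in, tokenize_one, that applies the same assumed splitting rule. With the package loaded, lapply(docs, tokenize_spaces_punct) would follow the same pattern.

```r
# Hypothetical stand-in for tokenize_spaces_punct (one string in, words out)
tokenize_one <- function(x) {
  tokens <- strsplit(x, split = "[[:space:][:punct:]]+")[[1]]
  tokens[nchar(tokens) > 0]
}

docs <- c(doc1 = "Joske  Vermeulen",
          doc2 = "Also .. multiple punctuations or ??marks")

# lapply passes each element as a length-1 string and keeps the names of docs
tokens_per_doc <- lapply(docs, tokenize_one)
tokens_per_doc$doc1
# "Joske" "Vermeulen"
```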

Value

a character vector containing the words of x, in the order in which they occur

See Also

strsplit

Examples

tokenize_spaces_punct("This just splits. Text.alongside\nspaces right?")
tokenize_spaces_punct("Also .. multiple punctuations or ??marks")
tokenize_spaces_punct("Joske  Vermeulen")

text.alignment documentation built on Sept. 14, 2023, 5:08 p.m.