tokenize_simple: Tokenise text into a sequence of words



Description

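Tokenise text into a sequence of words. The function uses strsplit to split the text on runs of whitespace, matched by the [:space:] character class. Punctuation is not treated as a separator, so it stays attached to the neighbouring word.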
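
The package source is not reproduced on this page, so the following is only a minimal sketch of the behaviour described above, written with base R. The name tokenize_simple_sketch is hypothetical and the input check is an assumption based on the documented argument types.

```r
# Hypothetical stand-in for tokenize_simple; not the package's actual code.
tokenize_simple_sketch <- function(x, split = "[[:space:]]+") {
  # Assumption: x is a single character string, as documented under Arguments.
  stopifnot(is.character(x), length(x) == 1)
  # strsplit returns a list with one element per input string;
  # with a length-1 input, the first element holds all the words.
  strsplit(x, split = split)[[1]]
}

# Two consecutive spaces count as one separator because of the "+" quantifier.
tokenize_simple_sketch("Joske  Vermeulen")
# Newlines belong to [:space:], so they separate words as well.
tokenize_simple_sketch("Text.alongside\nspaces right?")
```

Because the split pattern only matches whitespace, a token such as "Text.alongside" is returned whole, full stop included.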

Usage

tokenize_simple(x, split = "[[:space:]]+")

Arguments

x

a character string of length 1

split

the pattern to split on, passed on to strsplit; defaults to one or more whitespace characters

Value

a character vector with the sequence of words in x

See Also

strsplit

Examples

tokenize_simple("This just splits. Text.alongside\nspaces right?")
tokenize_simple("Also .. multiple punctuations or ??marks")
tokenize_simple("Joske  Vermeulen")

DIGI-VUB/udpipe.vosters documentation built on Sept. 9, 2020, 12:36 a.m.