tokenize_space: Break Text at Spaces

View source: R/tokenize.R

tokenize_space R Documentation

Break Text at Spaces

Description

This is an extremely simple tokenizer that breaks text only and exactly on the space character. It is intended to work in tandem with prepare_text, so that spaces are cleaned up and inserted as necessary before the tokenizer runs. The two functions are combined in prepare_and_tokenize.

Usage

tokenize_space(text)

Arguments

text

A character vector to tokenize.

Value

The text as a list of character vectors (one vector per element of text). Each element of each vector is roughly equivalent to a word.
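The return shape and the "only and exactly on the space character" rule can be sketched in base R. This is a minimal illustration, not piecemaker's implementation: split_on_space is a hypothetical stand-in built on strsplit, and the real tokenize_space may differ in details such as how it treats empty pieces.

```r
# Hypothetical stand-in for illustration only; not piecemaker's implementation.
split_on_space <- function(text) {
  # fixed = TRUE treats " " as a literal space, not a regular expression.
  strsplit(text, " ", fixed = TRUE)
}

tokens <- split_on_space(c("This is some text.", "Tabs\tare not spaces"))

# One character vector per element of the input.
length(tokens)
# → 2

# Punctuation stays attached to its word because only spaces split.
tokens[[1]]
# → "This" "is" "some" "text."

# The tab stays inside a token; only the two spaces act as break points.
tokens[[2]]
# → "Tabs\tare" "not" "spaces"
```

Because nothing other than a space separates tokens in this sketch, punctuation and other whitespace end up inside the pieces. That is why the text is expected to pass through prepare_text first.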

Examples

tokenize_space("This is some text.")

piecemaker documentation built on June 7, 2023, 5:55 p.m.