tokenize_space: Break Text at Spaces
In piecemaker: Tools for Preparing Text for Tokenizers

View source: R/tokenize.R

tokenize_space

R Documentation

Break Text at Spaces

Description

This is an extremely simple tokenizer, breaking only and exactly on the space character. This tokenizer is intended to work in tandem with prepare_text, so that spaces are cleaned up and inserted as necessary before the tokenizer runs. This function and prepare_text are combined together in prepare_and_tokenize.

Usage

tokenize_space(text)

Arguments

text

A character vector to clean.

Value

The text as a list of character vectors (one vector per element of text). Each element of each vector is roughly equivalent to a word.

Examples

tokenize_space("This is some text.")

piecemaker documentation built on June 7, 2023, 5:55 p.m.

piecemaker index

Package overview README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com