split_on_punc: Split text on punctuation.

View source: R/tokenization.R

split_on_punc    R Documentation

Split text on punctuation.

Description

An R implementation of BasicTokenizer._run_split_on_punc from tokenization.py in BERT.

Usage

split_on_punc(text)

Arguments

text

A character scalar, encoded as UTF-8.

Value

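A character vector holding the pieces of the input text, split on punctuation characters. Following the BERT reference implementation, each punctuation character forms its own element.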
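
Examples

The splitting behavior can be illustrated with a minimal base R sketch. It is not the RBERT source: the name split_on_punc_sketch is hypothetical, and it assumes only the ASCII punctuation ranges used by BERT's tokenizer, leaving out the Unicode punctuation categories. Each punctuation character becomes its own element, and runs of other characters (including spaces) stay together.

```r
# Hypothetical sketch of splitting on punctuation; ASCII punctuation only.
split_on_punc_sketch <- function(text) {
  if (!nzchar(text)) return(character(0))

  # Code points and the matching single characters of the input.
  cps <- utf8ToInt(text)
  chars <- intToUtf8(cps, multiple = TRUE)

  # ASCII punctuation ranges: ! to /, : to @, [ to `, { to ~
  is_punc <- (cps >= 33 & cps <= 47) | (cps >= 58 & cps <= 64) |
    (cps >= 91 & cps <= 96) | (cps >= 123 & cps <= 126)

  # A new piece starts at every punctuation character, and at every
  # character that directly follows one (or begins the string).
  follows_punc <- c(TRUE, head(is_punc, -1))
  piece_id <- cumsum(is_punc | follows_punc)

  unname(vapply(split(chars, piece_id), paste, character(1), collapse = ""))
}

split_on_punc_sketch("hello, world!")
# "hello"  ","      " world" "!"
```

Whitespace is not touched in this sketch, so " world" keeps its leading space; in BERT's tokenizer the text is split on whitespace before this step is applied.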


jonathanbratt/RBERT documentation built on Jan. 26, 2023, 4:15 p.m.