tokenize_chinese_chars: Add whitespace around any CJK character.

View source: R/tokenization.R

tokenize_chinese_chars    R Documentation

Add whitespace around any CJK character.

Description

(R implementation of BasicTokenizer._tokenize_chinese_chars from BERT's tokenization.py.) This may result in doubled-up spaces, for example between two consecutive CJK characters, but that matches the behavior of the Python code.
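
The doubled-up spaces arise because each CJK character is padded independently. A minimal sketch of the idea in plain R (not the package's implementation; it uses a PCRE \p{Han} class as a stand-in for the explicit CJK code-point ranges checked by the BERT Python code):

    ## Sketch only: pad every Han character with a space on each side.
    wrap_han <- function(text) {
      gsub("(\\p{Han})", " \\1 ", text, perl = TRUE)
    }
    wrap_han("BERT是预训练模型")
    ## [1] "BERT 是  预  训  练  练..." -> consecutive CJK characters end up
    ## separated by two spaces, which is the doubled-up-space behavior noted above.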

Usage

tokenize_chinese_chars(text)

Arguments

text

A character scalar.

Value

Text with spaces around CJK characters.
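
Examples

A small usage sketch, assuming RBERT is installed; if the function is not exported from the package namespace, it can be reached with RBERT:::tokenize_chinese_chars:

    library(RBERT)
    tokenize_chinese_chars("BERT是预训练模型")
    ## Spaces are added around each CJK character; adjacent CJK characters
    ## are separated by two spaces, mirroring the BERT Python tokenizer.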
