.tokenize_chinese_chars: Add whitespace around any CJK character.

Description

R implementation of BasicTokenizer._tokenize_chinese_chars from BERT's tokenization.py. This may produce doubled-up spaces, but that matches the behavior of the Python code.

Usage

.tokenize_chinese_chars(text)

Arguments

text

A character scalar.

Value

Text with spaces around CJK characters.
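The behavior described above can be sketched as follows. This is an illustrative approximation, not the package's actual code: it uses the Unicode \p{Han} property, whereas BERT's Python implementation checks explicit CJK codepoint ranges. The helper name tokenize_chinese_chars_sketch is hypothetical.

```r
# Sketch (assumed helper name): wrap each Han ideograph in spaces.
# BERT's Python code tests explicit codepoint ranges; \p{Han} is a
# close approximation for illustration.
tokenize_chinese_chars_sketch <- function(text) {
  gsub("(\\p{Han})", " \\1 ", text, perl = TRUE)
}

tokenize_chinese_chars_sketch("ab\u4e2d\u6587cd")
# "ab 中  文 cd" -- note the doubled-up space between adjacent
# CJK characters, matching the Python behavior noted above.
```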


wordpiece documentation built on Feb. 11, 2021, 5:06 p.m.