naiveTokenizer: Naive Tokenizer


View source: R/NaiveTokenizer.R

Description

A simple tokenizer that splits text into words at punctuation and whitespace. If possible, prefer a deep-learning-based tokenizer. WARNING: this tokenizer is built for the English language but can also be applied to other Latin- or Cyrillic-based languages. It does not work on other scripts such as Chinese, Devanagari, Thai, Japanese, Hebrew, or Arabic.

Usage

naiveTokenizer(string)

Arguments

string

A character string to be tokenized.
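The splitting behavior described above can be sketched in base R. This is a hypothetical illustration using POSIX character classes, not the package's actual implementation (see R/NaiveTokenizer.R for that); the function name `naive_tokenize` is an assumption for this example.

```r
# Hypothetical sketch of a naive tokenizer: split a character string
# at runs of whitespace and punctuation, then drop empty tokens.
# Not the actual RolliNLP implementation.
naive_tokenize <- function(string) {
  tokens <- unlist(strsplit(string, "[[:space:][:punct:]]+"))
  tokens[nzchar(tokens)]  # remove empty strings left by leading separators
}

naive_tokenize("Hello, world! This is a test.")
```

As the description warns, a regex-based split like this only makes sense for scripts that delimit words with spaces and punctuation; it cannot segment unspaced scripts such as Chinese or Thai.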


LazerLambda/RolliNLP documentation built on Oct. 17, 2020, 8:54 p.m.