pre_tokenizer_whitespace: This pre-tokenizer simply splits using the following regex:...
In tok: Fast Text Tokenization

pre_tokenizer_whitespace

R Documentation

This pre-tokenizer simply splits using the following regex: `\w+|[^\w\s]+`

Description

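This pre-tokenizer splits text using the regex `\w+|[^\w\s]+`. Each resulting piece is either a run of word characters or a run of characters that are neither word characters nor whitespace (typically punctuation). Whitespace itself is never matched, so it does not appear in the output.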
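To make the effect of the regex concrete, the split can be approximated in base R with `gregexpr()` and `regmatches()` using PCRE (`perl = TRUE`). This is only a sketch of what the pattern matches, not how tok itself performs the split. The helper name `split_like_whitespace` is hypothetical. PCRE's `\w` is ASCII-only by default here, so results may differ from the package for non-ASCII text.

```r
# Hypothetical helper: approximates the pre-tokenizer's split with base R regex.
# Sketch only; this is not tok's implementation.
split_like_whitespace <- function(x) {
  # A match is either a run of word characters (\w+) or a run of characters
  # that are neither word characters nor whitespace ([^\w\s]+).
  # Whitespace is never matched, so it is dropped from the result.
  pattern <- "\\w+|[^\\w\\s]+"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

split_like_whitespace("Hey friend! How are you?!")
# returns c("Hey", "friend", "!", "How", "are", "you", "?!")
```

Note that consecutive punctuation such as `?!` stays together as one piece, because `[^\w\s]+` matches greedily.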

Super class

tok::tok_pre_tokenizer -> tok_pre_tokenizer_whitespace

Methods

Method `new()`

Initializes the whitespace pre-tokenizer.

Usage

pre_tokenizer_whitespace$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

pre_tokenizer_whitespace$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
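A minimal usage sketch that relies only on the two methods documented above, `new()` and `clone()`. It assumes the tok package is installed. The `requireNamespace()` guard lets the snippet run harmlessly otherwise. How the object is attached to a tokenizer is not covered on this page, so that step is left out.

```r
# Construct the pre-tokenizer and make a shallow copy using the documented methods.
# The block is skipped silently if tok is not installed.
if (requireNamespace("tok", quietly = TRUE)) {
  pt <- tok::pre_tokenizer_whitespace$new()
  pt_copy <- pt$clone(deep = FALSE)
  # Per the "Super class" section, the copy is also a tok_pre_tokenizer.
  print(inherits(pt_copy, "tok_pre_tokenizer"))
}
```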
