tokenize_text_by_language: Language-aware tokenizer used across embedders and keyword...
In VectrixDB: Lightweight Vector Database with Embedded Machine Learning Models

tokenize_text_by_language

R Documentation

Language-aware tokenizer used across embedders and keyword search

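Description

Language-aware tokenizer used across embedders and keyword search.

Usage

tokenize_text_by_language(text, language = "en", remove_stopwords = FALSE)

Arguments

`text`	Input text to tokenize
`language`	Language code, either "en" or "ml"; defaults to "en"
`remove_stopwords`	Logical; if TRUE, remove English stopwords. Applies only when language is "en"; defaults to FALSE

Value

A character vector of tokens.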
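
Examples

The following is a minimal usage sketch based only on the signature documented here. It assumes the function is exported from the VectrixDB namespace, which this page does not confirm; the sample sentence and the variable names `sample_text` and `tokens` are illustrative. No expected output is shown because the exact tokenization and stopword rules are not described on this page.

```r
# Illustrative input; any length-one character string should work here.
sample_text <- "The quick brown fox jumps over the lazy dog"

# Run only when the package is installed, so the sketch is safe to source.
if (requireNamespace("VectrixDB", quietly = TRUE)) {
  # Assumption: tokenize_text_by_language() is exported by VectrixDB.
  tokens <- VectrixDB::tokenize_text_by_language(
    sample_text,
    language = "en",          # documented values are "en" and "ml"
    remove_stopwords = TRUE   # documented to apply only when language is "en"
  )

  # The documented return value is a character vector of tokens.
  stopifnot(is.character(tokens))
  print(tokens)
}
```

Calling the function with the defaults (`language = "en"`, `remove_stopwords = FALSE`) should keep stopwords in the result, so comparing the two outputs is a quick way to see which words the package treats as stopwords.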

VectrixDB documentation built on Feb. 20, 2026, 5:09 p.m.
