tokenize_text_by_language: Language-aware tokenizer used across embedders and keyword search

View source: R/utils.R

tokenize_text_by_language    R Documentation

Language-aware tokenizer used across embedders and keyword search

Description

Language-aware tokenizer used across embedders and keyword search

Usage

tokenize_text_by_language(text, language = "en", remove_stopwords = FALSE)

Arguments

text

Input text

language

Language code, either "en" or "ml". Defaults to "en".

remove_stopwords

Logical. If TRUE, remove English stopwords. Applies only when language is "en". Defaults to FALSE.

Value

Character vector of tokens
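
Examples

This page does not describe the tokenization rules themselves, so the following is only a minimal base R sketch of how a function with this signature might behave. It is not the VectrixDB implementation. The name sketch_tokenize_by_language, the splitting rules for both languages, and the short stopword list are all assumptions made for illustration.

```r
# Hypothetical stand-in that mirrors the documented signature.
# NOT the VectrixDB implementation; the rules below are assumptions.
sketch_tokenize_by_language <- function(text, language = "en",
                                        remove_stopwords = FALSE) {
  # Accept only the two documented language codes.
  language <- match.arg(language, c("en", "ml"))

  # Collapse the input to a single string so the result is one flat vector.
  text <- paste(as.character(text), collapse = " ")

  if (language == "en") {
    # Assumed rule: lowercase, then split on runs of non-alphanumerics.
    tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  } else {
    # Assumed rule for "ml": keep case, split on whitespace and punctuation.
    tokens <- strsplit(text, "[[:space:][:punct:]]+")[[1]]
  }

  # Drop empty strings left behind by leading separators.
  tokens <- tokens[nzchar(tokens)]

  # Stopword removal only applies to English, as the argument docs state.
  if (remove_stopwords && language == "en") {
    stopwords_en <- c("a", "an", "and", "the", "is", "of", "to", "in")  # illustrative list
    tokens <- tokens[!tokens %in% stopwords_en]
  }

  tokens
}

# Usage: same call shape as tokenize_text_by_language().
sketch_tokenize_by_language("The quick brown fox is in the garden.")
sketch_tokenize_by_language("The quick brown fox is in the garden.",
                            remove_stopwords = TRUE)
```

In this sketch, remove_stopwords is ignored for "ml", which matches the documented restriction of stopword removal to English. The real function may normalize or split text differently.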


VectrixDB documentation built on Feb. 20, 2026, 5:09 p.m.