process_multiwords_fast: Optimized multiword processing workflow

View source: R/txt_recode_fast.R

process_multiwords_fast    R Documentation

Optimized multiword processing workflow

Description

Complete, optimized workflow for multiword detection and processing. It uses C++ functions and data.table to improve performance.

Usage

process_multiwords_fast(x2, stats, term = c("lemma", "token"))

Arguments

x2

Data frame with token information

stats

Data frame with multiword statistics, including the columns keyword and ngram

term

Type of term to process: "lemma" or "token"

Details

This function replaces the original switch block with an optimized version that uses:

  • C++ functions for text recoding

  • Vectorized operations instead of multiple mutate calls

  • Pre-computed lookups to avoid repeated joins
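
The last point can be illustrated with a minimal base R sketch. It is not the package's implementation: the data and the names stats_demo, ngram_lookup, and tokens_demo are hypothetical, and only the keyword and ngram columns come from the argument description above. It shows how a lookup built once as a named vector can stand in for a join that would otherwise be repeated at each step.

```r
# Hypothetical multiword statistics with the two documented columns.
stats_demo <- data.frame(
  keyword = c("machine learning", "neural network"),
  ngram = c(2L, 2L),
  stringsAsFactors = FALSE
)

# Build the lookup once: names are keywords, values are ngram sizes.
ngram_lookup <- setNames(stats_demo$ngram, stats_demo$keyword)

# Hypothetical candidate terms; "data" is not a known multiword.
tokens_demo <- c("machine learning", "data", "neural network")

# Vectorized lookup: one indexing operation instead of a merge/join.
# Terms without a match come back as NA.
ngram_found <- unname(ngram_lookup[tokens_demo])
```

Because the lookup is a plain named vector, it can be reused for every subsequent recoding step without touching the stats data frame again.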

Value

Data frame with columns: doc_id, term_id, multiword, upos_multiword, ngram

Examples

## Not run: 
result <- process_multiwords_fast(dfTag, multiword_stats, term = "lemma")

## End(Not run)


tall documentation built on Dec. 12, 2025, 5:07 p.m.